前些天的教程:直接為CellPhoneDB創(chuàng)建一個獨立的conda環(huán)境,以及:把Seurat對象里面表達(dá)量矩陣和細(xì)胞表型信息輸出給CellPhoneDB做細(xì)胞通訊,給大家演示了如何對pbmc3k單細(xì)胞數(shù)據(jù)集做細(xì)胞通訊,并且在:CellPhoneDB的單細(xì)胞通訊結(jié)果的理解 給大家演示了細(xì)胞通訊結(jié)果的多個txt文件的含義,前面的代碼是:
conda activate cellphonedb
cellphonedb method statistical_analysis test_meta.txt test_counts.txt \
--counts-data hgnc_symbol --threads 4
# 必須要保證當(dāng)前路徑下面有前面的步驟輸出的out文件夾哦
cellphonedb plot dot_plot
cellphonedb plot heatmap_plot cellphonedb_meta.txt
# 做完后為了跟前面的區(qū)分,我把 out文件夾,修改名字為 out-by-symbol 文件夾啦
把 out文件夾,修改名字為 out-by-symbol文件夾,其結(jié)構(gòu)如下所示:
ls -lh out-by-symbol|cut -d" " -f 7-
1.5K 2 12 09:43 count_network.txt
64K 2 12 09:30 deconvoluted.txt
11K 2 12 09:43 heatmap_count.pdf
11K 2 12 09:43 heatmap_log_count.pdf
113B 2 12 09:43 interaction_count.txt
149K 2 12 09:30 means.txt
803K 2 12 09:43 plot.pdf
128K 2 12 09:30 pvalues.txt
61K 2 12 09:30 significant_means.txt
如果你對這些文件的理解還不夠,繼續(xù)看 :CellPhoneDB的單細(xì)胞通訊結(jié)果的理解
接下來我們對這些文件進(jìn)行高度定制化的可視化!
我們?nèi)庋劭吹模?/p>
all_sum
Memory_CD4_T 101
B 105
CD14_Mono 157
NK 171
CD8_T 100
Naive_CD4_T 68
FCGR3A_Mono 241
DC 180
Platelet 60
同樣是單核細(xì)胞, CD14_Mono 和 FCGR3A_Mono 的總通訊數(shù)量不一樣,那就看看它們跟兩種CD4的T細(xì)胞(Memory_CD4_T和Naive_CD4_T)的通訊情況吧。
主要是因為細(xì)胞亞群太多了,如果全部可視化出來,也很難給大家講清楚。
rm(list = ls())
getwd()
setwd('pbmc3k-CellPhoneDB/')
mypvals <- read.table("./out-by-symbol/pvalues.txt",header = T,sep = "\t",stringsAsFactors = F)
mymeans <- read.table("./out-by-symbol/means.txt",header = T,sep = "\t",stringsAsFactors = F)
# 我們重點看 CD14_Mono 和 FCGR3A_Mono ,以及 Memory_CD4_T和Naive_CD4_T 的通訊情況。
kp = grepl(pattern = "Mono", colnames(mypvals)) & grepl(pattern = "CD4_T", colnames(mypvals))
table(kp)
pos = (1:ncol(mypvals))[kp]
choose_pvalues <- mypvals[,c(c(1,5,6,8,9),pos )]
choose_means <- mymeans[,c(c(1,5,6,8,9),pos)]
logi <- apply(choose_pvalues[,5:ncol(choose_pvalues)]<0.05, 1, sum)
# 只保留具有細(xì)胞特異性的一些相互作用對
choose_pvalues <- choose_pvalues[logi>=1,]
# 去掉空值
logi1 <- choose_pvalues$gene_a != ""
logi2 <- choose_pvalues$gene_b != ""
logi <- logi1 & logi2
choose_pvalues <- choose_pvalues[logi,]
# 同樣的條件保留choose_means
choose_means <- choose_means[choose_means$id_cp_interaction %in% choose_pvalues$id_cp_interaction,]
得到了如下所示的兩個變量,choose_means和 choose_pvalues , 他們前面的5列是一樣的,但是后面只有8列,并不是 4X4的16列,說明很多單細(xì)胞亞群的兩兩組合之間其實并沒有統(tǒng)計學(xué)顯著的細(xì)胞通訊受體和配體對哦。
> head(choose_means,1)
id_cp_interaction gene_a gene_b receptor_a receptor_b CD14_Mono.Memory_CD4_T CD14_Mono.Naive_CD4_T
40 CPI-SS0703338F5 LGALS9 CD44 False True 0.862 0.803
FCGR3A_Mono.Memory_CD4_T FCGR3A_Mono.Naive_CD4_T Memory_CD4_T.CD14_Mono Memory_CD4_T.FCGR3A_Mono
40 0.949 0.89 0.552 0.71
Naive_CD4_T.CD14_Mono Naive_CD4_T.FCGR3A_Mono
40 0.513 0.67
> head(choose_pvalues,1)
id_cp_interaction gene_a gene_b receptor_a receptor_b CD14_Mono.Memory_CD4_T CD14_Mono.Naive_CD4_T
40 CPI-SS0703338F5 LGALS9 CD44 False True 0 0
FCGR3A_Mono.Memory_CD4_T FCGR3A_Mono.Naive_CD4_T Memory_CD4_T.CD14_Mono Memory_CD4_T.FCGR3A_Mono
40 0 0 0.888 0
Naive_CD4_T.CD14_Mono Naive_CD4_T.FCGR3A_Mono
40 1 1
可以看到之前是 302行, 是90多列,也就是說9個細(xì)胞亞群,每個都是自己跟包括自己在內(nèi)的9個單細(xì)胞亞群進(jìn)行通訊分析結(jié)果都存在,81種兩兩組合都存在 ,但是現(xiàn)在只剩下8個單細(xì)胞亞群組合啦,而且僅僅是19個受體配體對啦 :
> dim(mypvals)
[1] 302 92
> dim(mymeans)
[1] 302 92
> dim(choose_means)
[1] 19 13
> dim(choose_pvalues)
[1] 19 13
接下來對這19個受體配體對,在8個單細(xì)胞亞群組合的結(jié)果進(jìn)行簡單的可視化,雖然說簡單,但是代碼也有點長啊 :
# 將choose_pvalues和choose_means數(shù)據(jù)寬轉(zhuǎn)長
library(tidyverse)
meansdf <- choose_means %>% reshape2::melt()
meansdf <- data.frame(interacting_pair = paste0(meansdf$gene_a,"_",meansdf$gene_b),
CC = meansdf$variable,
means = meansdf$value)
pvalsdf <- choose_pvalues %>% reshape2::melt()
pvalsdf <- data.frame(interacting_pair = paste0(pvalsdf$gene_a,"_",pvalsdf$gene_b),
CC = pvalsdf$variable,
pvals = pvalsdf$value)
# 合并p值和mean文件
pvalsdf$joinlab<- paste0(pvalsdf$interacting_pair,"_",pvalsdf$CC)
meansdf$joinlab<- paste0(meansdf$interacting_pair,"_",meansdf$CC)
pldf <- merge(pvalsdf,meansdf,by = "joinlab")
# dotplot可視化
summary((filter(pldf,means >0))$means)
head(pldf)
pcc = pldf%>% filter(means >0) %>%
ggplot(aes(CC.x,interacting_pair.x) )+
geom_point(aes(color=means,size=-log10(pvals+0.0001)) ) +
scale_size_continuous(range = c(1,3))+
scale_color_gradient2(high="red",mid = "yellow",low ="darkblue",midpoint = 1 )+
theme_bw()+
# scale_color_manual(values = rainbow(100))+
theme(axis.text.x = element_text(angle = -45,hjust = -0.1,vjust = 0.8))
pcc
如下所示:

可以看到,從mono到cd4方向的受體配體對,跟從 cd4到mono的,基本上是完全不一樣的哦。這19個受體配體對具體信息如下所示
> unique(pldf$interacting_pair.x)
[1] "ALOX5_ALOX5AP" "ANXA1_FPR1" "C5AR1_RPS19" "CD2_CD58" "CD28_CD86" "CD47_SIRPG"
[7] "CD52_SIGLEC10" "CD74_MIF" "CD99_PILRA" "HLA-F_LILRB1" "HLA-F_LILRB2" "LAMP1_VSTM1"
[13] "LGALS9_CD44" "LGALS9_CD47" "LGALS9_SORL1" "MIF_TNFRSF14" "PLXNB2_SEMA4D" "SCGB3A1_MARCO"
[19] "SELL_SELPLG"
而且,肉眼上看,是 CD74_MIF 和 C5AR1_RPS19 這兩個配對比較明顯, 讓我們看看他們的表達(dá)量情況,代碼如下所示:
library(SeuratData) #加載seurat數(shù)據(jù)集
getOption('timeout')
options(timeout=10000)
#InstallData("sce")
data("pbmc3k")
sce <- pbmc3k.final
library(Seurat)
table(Idents(sce))
sce = sce[,Idents(sce) %in% c('Naive CD4 T', 'Memory CD4 T' , 'CD14+ Mono','FCGR3A+ Mono')]
DimPlot(sce,label = T,repel = T)
cg = unique(unlist(str_split(unique(pldf[,2]),'_')) )
library(ggplot2)
th= theme(axis.text.x = element_text(angle = -45,hjust = -0.1,vjust = 0.8))
pdopt <- DotPlot(sce, features = cg,
assay='RNA' ) + coord_flip() +th
library(patchwork)
pdopt + DimPlot(sce,label = T,repel = T)
pdopt + pcc
出結(jié)果如下所示:

是不是很簡單,前面的CD74_MIF 和 C5AR1_RPS19 這兩個配對情況,在mono到cd4的方向比較明顯,所以它們就是CD74和C5AR1在mono高表達(dá),而MIF和RPS19在cd4的t細(xì)胞里面高表達(dá)。
所謂的細(xì)胞通訊,就是差異分析從一個基因單位到兩個基因
我們差異分析很容易找到不同單細(xì)胞亞群各自的高表達(dá)量基因,但是這些基因在每個單細(xì)胞亞群是獨立的列表,所謂的單細(xì)胞通訊,就是找到那些不同單細(xì)胞亞群的特異性基因的兩兩組合,而且是具有受體配體的生物學(xué)背景知識的那些。