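Most of you have probably heard of metacells. Put simply, a metacell is the result of merging cells: cells that are similar to each other are combined into one. Why bring this up? Mainly because of Compass, the single-cell transcriptome metabolic analysis tool (see our earlier post "(Video tutorial) Compass metabolic analysis: detailed workflow plus Python and R downstream analysis and visualization"). Anyone who has tried Compass knows how painfully slow it is, and the Compass documentation itself says that using metacells may be a good way around this. So here we introduce SuperCell, an R package for metacell analysis that is fairly simple to use.

SuperCell and other single-cell metacell methods differ in how you run them, but the principle is the same: they are all network-based.

1. A single-cell network is computed from the similarity between cells (in transcriptome space).
2. Highly similar cells are those that form dense regions in the single-cell network; they are merged into metacells (coarse-graining).
3. The transcriptome information within each metacell is combined (average or sum).
4. The metacell data can replace the large single-cell dataset in downstream analysis.

SuperCell GitHub: https://github.com/GfellerLab/SuperCell

Let's walk through it. First, install the package:

setwd('D:\\KS项目\\公众号文章\\supercell_单细胞数据合并')
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("GfellerLab/SuperCell")
library(SuperCell)
library(Seurat)

If you have multi-sample data, I recommend running each group separately. This avoids merging cells of the same cell type across different groups, so every metacell is guaranteed to come from a single group and downstream comparisons are not affected.

uterus <- readRDS("D:/KS项目/公众号文章/uterus.rds")
table(uterus$orig.ident)
#   AEH   EEC    HC
#  9525 12033  6356
AEH <- subset(uterus, orig.ident == 'AEH')  # extract one group for this demo

Building metacells from the expression matrix takes a single function:

exp_mat <- GetAssayData(AEH, layer = "data", assay = 'RNA')
AEH <- FindVariableFeatures(AEH)
hvg <- VariableFeatures(AEH)
SC <- SCimplify(exp_mat,          # gene expression matrix
                k.knn = 5,        # number of nearest neighbors to build kNN network
                gamma = 20,       # graining level: ratio of single cells in the initial dataset to metacells in the final dataset
                genes.use = hvg)  # genes used for the PCA dimensionality reduction (here the highly variable genes)

Extract the metacell expression matrix:

SC.GE <- supercell_GE(exp_mat, SC$membership)

Metacell annotation: after merging, we need to assign a cell type to each metacell and evaluate the result.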
SC$cell_line <- supercell_assign(
  clusters = AEH$celltype,               # single-cell annotation
  supercell_membership = SC$membership,  # single-cell assignment to metacells
  method = "jaccard")                    # one of c("jaccard", "relative", "absolute")

# plot network of metacells colored by cell line assignment
supercell_plot(SC$graph.supercells,
               group = SC$cell_line,
               main = "Metacells colored by cell line assignment")

# entropy-based purity: an entropy of 0 means all cells in a metacell share one cell type
purity <- supercell_purity(clusters = AEH$celltype,
                           supercell_membership = SC$membership,
                           method = 'entropy')
hist(purity, main = "Purity of metacells \nin terms of cell line composition")
SC$purity <- purity

SuperCell can also convert the metacell result back into a Seurat object:
Super_sce <- supercell_2_Seurat(
  SC.GE = as.matrix(SC.GE),           # metacell expression matrix from supercell_GE(exp_mat, SC$membership)
  SC = SC,                            # super-cell object (output of the SCimplify function)
  fields = c("cell_line", "purity"))  # metadata to carry over; the most important one is the metacell annotation
# Performing log-normalization
# 0%   10   20   30   40   50   60   70   80   90   100%
# [----|----|----|----|----|----|----|----|----|----|
# **************************************************|
# [1] "Done: NormalizeData"
# [1] "Doing: data to normalized data"
# [1] "Doing: weighted scaling"
# [1] "Done: weighted scaling"
# Computing nearest neighbor graph
# Computing SNN
# Warning message:
# 2116 instances of variables with zero scale detected!

Super_sce <- RunUMAP(Super_sce, dims = 1:10)
Super_sce <- FindClusters(Super_sce, graph.name = "RNA_nn")
Idents(Super_sce) <- 'cell_line'
DimPlot(Super_sce, label = TRUE)
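
To make the coarse-graining idea more concrete, here is a minimal base-R sketch of the "combine the transcriptome within each metacell" step on a toy matrix. The toy values, the membership vector, and the helper name aggregate_by_membership are all made up for illustration; this is not how SuperCell is implemented internally, only the averaging and summing idea.

```r
# Toy genes x cells matrix (values invented for illustration)
toy_mat <- matrix(c(1, 3, 5, 7,
                    2, 2, 4, 8),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4)))

# Hypothetical membership: cells 1-2 go to metacell 1, cells 3-4 to metacell 2
membership <- c(1, 1, 2, 2)

# Hypothetical helper: combine the columns (cells) that share a metacell id
aggregate_by_membership <- function(mat, membership, mode = c("average", "sum")) {
  mode <- match.arg(mode)
  # rowsum() sums rows by group, so transpose to group by cell, then transpose back
  summed <- t(rowsum(t(mat), group = membership))
  if (mode == "sum") return(summed)
  # divide each metacell column by the number of cells it contains
  sizes <- as.vector(table(membership)[colnames(summed)])
  sweep(summed, 2, sizes, "/")
}

mc_avg <- aggregate_by_membership(toy_mat, membership, "average")
mc_sum <- aggregate_by_membership(toy_mat, membership, "sum")

# Graining level of the toy example: single cells per metacell
gamma_toy <- ncol(toy_mat) / ncol(mc_avg)
```

With gamma = 20 as used above, the same ratio means roughly one metacell for every 20 single cells, which is where the speed-up for slow downstream tools comes from.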

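
The annotation and purity steps can be sketched in plain R as well. The labels below are invented, and assign_jaccard and purity_max_prop are hypothetical helpers: they illustrate a Jaccard-style assignment and a "proportion of the most abundant cell type" purity, and may differ in detail from what supercell_assign and supercell_purity do internally.

```r
# Toy single-cell annotation and metacell membership (invented for illustration)
celltype   <- c("T", "T", "T", "B", "B", "B", "B", "B", "T")
membership <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)

# Hypothetical helper: give each metacell the cell type whose set of cells
# has the highest Jaccard index with the metacell's set of cells
assign_jaccard <- function(celltype, membership) {
  types <- unique(celltype)
  sapply(split(seq_along(membership), membership), function(idx) {
    jac <- sapply(types, function(ct) {
      in_ct <- which(celltype == ct)
      length(intersect(idx, in_ct)) / length(union(idx, in_ct))
    })
    types[which.max(jac)]
  })
}

# Hypothetical helper: purity as the share of the most abundant cell type
purity_max_prop <- function(celltype, membership) {
  sapply(split(celltype, membership), function(x) max(table(x)) / length(x))
}

assigned   <- assign_jaccard(celltype, membership)
toy_purity <- purity_max_prop(celltype, membership)
```

In this toy case the third metacell mixes two B cells with one T cell, so it is labeled B with a purity of 2/3. Metacells like this are the ones worth inspecting before trusting the annotation.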
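If your dataset is large, say several hundred thousand cells, and downstream analysis is a struggle, metacell analysis is a good option. In my view this kind of merging is probably more reliable than plain subsampling. Really, our goal is just to obtain the metacell matrix, which can then be fed into Compass or pySCENIC. Here we run a quick check of the metacell result: does it agree with the original data? We pick two cell types and compare their differentially expressed genes. Because the matrix has changed, the results cannot be identical, but the agreement is quite strong!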
Idents(AEH) <- "celltype"  # make sure the identities are the cell type labels used for the annotation
de_orig <- FindMarkers(AEH, ident.1 = "Unciliated epithelial cells", ident.2 = "Ciliated epithelial cells")
de_super <- FindMarkers(Super_sce, ident.1 = "Unciliated epithelial cells", ident.2 = "Ciliated epithelial cells")
library(Vennerable)
library(ggpubr)

# significant DE genes: p_val <= 0.05 and |avg_log2FC| >= 0.25
sig_orig  <- de_orig[de_orig$p_val <= 0.05 & abs(de_orig$avg_log2FC) >= 0.25, ]
sig_super <- de_super[de_super$p_val <= 0.05 & abs(de_super$avg_log2FC) >= 0.25, ]
example <- list(Set1 = rownames(sig_orig), Set2 = rownames(sig_super))

# Venn diagram of the two gene sets
Veenplot <- Venn(example)
Veenplot <- Veenplot[, c("Set1", "Set2")]
plot(Veenplot, doWeights = TRUE)

# genes significant in both analyses: compare their log2 fold changes
same_gene <- unlist(Veenplot@IntersectionSets$`11`)
data <- data.frame(de_orig  = sig_orig[same_gene, "avg_log2FC"],
                   de_super = sig_super[same_gene, "avg_log2FC"])
ggscatter(data, x = "de_orig", y = "de_super",
          add = "reg.line", conf.int = TRUE,
          color = '#0f8096') +
  stat_cor(label.x = 0.2, label.y = 0)
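
If Vennerable is not available, the same consistency check can be approximated with base R alone. The sketch below uses two small made-up marker tables standing in for de_orig and de_super (gene names and values are invented), and compare_de is a hypothetical helper rather than part of Seurat or SuperCell.

```r
# Toy marker tables with the two FindMarkers columns used above (values invented)
de_a <- data.frame(p_val      = c(0.001, 0.02, 0.5, 0.01),
                   avg_log2FC = c(1.2, -0.8, 0.1, 0.6),
                   row.names  = c("g1", "g2", "g3", "g4"))
de_b <- data.frame(p_val      = c(0.003, 0.04, 0.01, 0.01),
                   avg_log2FC = c(1.0, -0.5, 0.9, 0.7),
                   row.names  = c("g1", "g2", "g5", "g4"))

# Hypothetical helper: overlap of significant genes and fold-change agreement
compare_de <- function(a, b, p_cut = 0.05, fc_cut = 0.25) {
  sig <- function(d) rownames(d)[d$p_val <= p_cut & abs(d$avg_log2FC) >= fc_cut]
  shared <- intersect(sig(a), sig(b))
  list(shared  = shared,
       # share of genes found by both analyses among genes found by either
       jaccard = length(shared) / length(union(sig(a), sig(b))),
       # Pearson correlation of log2 fold changes over the shared genes
       cor     = cor(a[shared, "avg_log2FC"], b[shared, "avg_log2FC"]))
}

res <- compare_de(de_a, de_b)
```

On real data you would pass de_orig and de_super instead of the toy tables. A high overlap together with a high fold-change correlation is what "consistent" means in the comparison above.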