[Original] SuperCell: metacell analysis of single-cell transcriptome data (with detailed annotations)

TS的美夢 2024-12-11

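Most of you have probably heard of metacells. Put simply, a metacell is a merge of cells: similar cells are combined into one. The reason for bringing this up is Compass, the metabolic analysis tool for single-cell transcriptomes (see the earlier post "(Video tutorial) Compass metabolic analysis: detailed workflow, with Python and R versions of downstream analysis and visualization"). Anyone who has tried Compass knows how painfully slow it is, and the Compass documentation itself suggests that metacells may be a good way around this. So here we introduce SuperCell, a metacell tool written in R that is fairly simple to use. SuperCell differs from other single-cell metacell methods in how it is operated, but the principle is the same: they are all network-based.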

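1. A single-cell network is computed from cell-to-cell similarity (in transcriptomic space).
2. Highly similar cells are identified as those forming dense regions in the single-cell network, and they are merged into metacells (coarse-graining).
3. The transcriptomic information inside each metacell is combined (average or sum).
4. The metacell data can replace the large-scale single-cell data in downstream analyses.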
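To make the four steps concrete, here is a toy sketch in base R. It is not the SuperCell implementation: hierarchical clustering stands in for the network-based grouping, and the expression matrix, `gamma` value, and variable names are made up for illustration.

```r
# Toy coarse-graining sketch in base R (NOT the SuperCell algorithm).
set.seed(1)
n_genes <- 50
n_cells <- 200

# Hypothetical expression matrix: genes in rows, cells in columns
expr <- matrix(rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("cell", seq_len(n_cells))))

gamma  <- 20                        # average number of cells per metacell
n_meta <- round(n_cells / gamma)    # target number of metacells

# Steps 1-2 (stand-in): group similar cells. Hierarchical clustering on
# Euclidean distances replaces the kNN-network clustering of real tools.
membership <- cutree(hclust(dist(t(expr)), method = "ward.D2"), k = n_meta)

# Step 3: combine expression within each group (sum, then average)
meta_sum <- t(rowsum(t(expr), group = membership))
meta_avg <- sweep(meta_sum, 2, as.vector(table(membership)), "/")

# Step 4: meta_avg (genes x metacells) is what downstream tools would consume
dim(meta_avg)
```

Whether to use the sum or the average depends on the downstream tool: sums keep count-like data, while averages keep the values on the scale of a single cell.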

SuperCell GitHub: https://github.com/GfellerLab/SuperCell

Let's walk through it. First, install the package:

setwd('D:\\KS項目\\公眾號文章\\supercell_單細胞數據合并')
if (!requireNamespace("remotes")) install.packages("remotes")
remotes::install_github("GfellerLab/SuperCell")
library(SuperCell)
library(Seurat)

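If your data contain multiple samples, I recommend running the analysis on each sample (group) separately. This prevents cells of the same cell type but from different groups from being merged, so each metacell comes from a single group and the downstream analysis is not affected.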

uterus <- readRDS("D:/KS項目/公眾號文章/uterus.rds")
table(uterus$orig.ident)
# AEH   EEC    HC
# 9525 12033  6356
AEH <- subset(uterus, orig.ident == 'AEH')  # extract one group to demonstrate a single run

Building metacells from the expression matrix takes a single function:

exp_mat <- GetAssayData(AEH, layer = "data", assay = 'RNA')
AEH <- FindVariableFeatures(AEH)
hvg <- VariableFeatures(AEH)
SC <- SCimplify(exp_mat,         # gene expression matrix
                k.knn = 5,       # number of nearest neighbors to build kNN network
                gamma = 20,      # graining level: ratio of single cells in the initial dataset to metacells in the final dataset
                genes.use = hvg) # genes used for the PCA dimensionality reduction (here the highly variable genes)
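Since `gamma` is the ratio of single cells to metacells, a quick bit of arithmetic shows roughly how many metacells to expect before running anything. The sketch below uses the 9525 AEH cells from the table above; the other `gamma` values are only examples, and the exact count returned by the package may differ slightly.

```r
n_cells <- 9525                 # AEH cells, from table(uterus$orig.ident)
gammas  <- c(5, 10, 20, 50)     # example graining levels (20 is used in this post)

# Approximate number of metacells for each graining level
expected <- data.frame(gamma = gammas,
                       approx_metacells = round(n_cells / gammas))
expected
```

A larger `gamma` gives fewer, bigger metacells (faster downstream analysis, coarser resolution); a smaller one stays closer to the single-cell data.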

Extract the metacell expression matrix:

SC.GE <- supercell_GE(exp_mat, SC$membership)

Metacell annotation: after merging, we need to assign a cell type to each metacell and evaluate the result.

SC$cell_line <- supercell_assign(clusters = AEH$celltype,              # single-cell annotation
                                 supercell_membership = SC$membership, # single-cell assignment to metacells
                                 method = "jaccard")                   # available methods: c("jaccard", "relative", "absolute")

# plot network of metacells colored by cell line assignment
supercell_plot(SC$graph.supercells,
               group = SC$cell_line,
               main = "Metacells colored by cell line assignment")
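For intuition about the "jaccard" option, the base R sketch below shows one way such an assignment can be computed: each metacell gets the cell type with which it has the highest Jaccard index. This is my reading of the idea, not the package's internal code, and the toy labels are hypothetical.

```r
# Hypothetical toy labels: 12 cells, 3 metacells, 2 cell types
membership <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
celltype   <- c("A", "A", "A", "B", "B", "B", "B", "B", "A", "A", "B", "B")

counts <- table(membership, celltype)   # cells shared by metacell m and type c
size_m <- rowSums(counts)               # metacell sizes
size_c <- colSums(counts)               # cell-type sizes

# Jaccard index = |m and c| / |m or c|
union_size <- outer(size_m, size_c, "+") - counts
jaccard    <- counts / union_size

# Assign each metacell to the cell type with the highest Jaccard index
assigned <- colnames(jaccard)[apply(jaccard, 1, which.max)]
names(assigned) <- rownames(jaccard)
assigned
```

Compared with a plain majority vote, the Jaccard index also accounts for how large each cell type is, which helps rare cell types win the metacells they dominate.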

purity <- supercell_purity(clusters = AEH$celltype,
                           supercell_membership = SC$membership,
                           method = 'entropy')
hist(purity, main = "Purity of metacells \nin terms of cell line composition")
SC$purity <- purity
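To help read the purity histogram, here is a small base R sketch of two common ways to score how mixed a metacell is: the proportion of its dominant cell type, and the Shannon entropy of its composition. The toy labels are hypothetical, and I am assuming the package follows these usual definitions. Note that with an entropy score, 0 means a metacell contains a single cell type, so lower is purer.

```r
# Hypothetical toy labels: 12 cells, 3 metacells, 2 cell types
membership <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
celltype   <- c("A", "A", "A", "B", "B", "B", "B", "B", "A", "A", "B", "B")

# Cell-type composition of each metacell (rows sum to 1)
props <- prop.table(table(membership, celltype), margin = 1)

# Score 1: proportion of the dominant cell type (1 = pure)
purity_max <- apply(props, 1, max)

# Score 2: Shannon entropy of the composition (0 = pure)
purity_entropy <- apply(props, 1, function(p) {
  p <- p[p > 0]          # skip absent cell types to avoid log(0)
  -sum(p * log(p))
})

data.frame(metacell = names(purity_max), purity_max, purity_entropy)
```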

SuperCell can also convert the metacell result back into a Seurat object:

Super_sce <- supercell_2_Seurat(SC.GE = as.matrix(SC.GE),          # metacell expression matrix from supercell_GE(exp_mat, SC$membership)
                                SC = SC,                           # super-cell (output of SCimplify function)
                                fields = c("cell_line", "purity")) # metadata to carry over; the metacell annotation is the key one
# Performing log-normalization
# 0%   10   20   30   40   50   60   70   80   90   100%
# [----|----|----|----|----|----|----|----|----|----|
# **************************************************|
# [1] "Done: NormalizeData"
# [1] "Doing: data to normalized data"
# [1] "Doing: weighted scaling"
# [1] "Done: weighted scaling"
# Computing nearest neighbor graph
# Computing SNN
# Warning message:
#   2116 instances of variables with zero scale detected!

Super_sce <- RunUMAP(Super_sce, dims = 1:10)
Super_sce <- FindClusters(Super_sce, graph.name = "RNA_nn")
Idents(Super_sce) <- 'cell_line'
DimPlot(Super_sce, label = TRUE)

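If your dataset is large, say several hundred thousand cells, and downstream analysis becomes a struggle, metacell analysis is a good option. In my view this kind of merging is likely to be more reliable than plain subsampling. Our real goal is simply to obtain the metacell matrix, which can then go into Compass or pySCENIC. Here we do a quick check of the metacells: are the results consistent with the original data? We pick two cell types and compute the differentially expressed genes between them. Because the matrix has changed, the results cannot be identical, but the agreement is quite strong.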

Idents(AEH) <- 'celltype'  # make sure identities are the cell-type labels
de_orig <- FindMarkers(AEH, ident.1 = "Unciliated epithelial cells",
                       ident.2 = "Ciliated epithelial cells")

de_super <- FindMarkers(Super_sce, ident.1 = "Unciliated epithelial cells",
                        ident.2 = "Ciliated epithelial cells")

library(Vennerable)
# Significant DE genes in each analysis (p <= 0.05 and |log2FC| >= 0.25)
Set1 <- as.list(rownames(de_orig[de_orig$p_val <= 0.05 & abs(de_orig$avg_log2FC) >= 0.25, ]))
Set2 <- as.list(rownames(de_super[de_super$p_val <= 0.05 & abs(de_super$avg_log2FC) >= 0.25, ]))
example <- list(Set1 = Set1, Set2 = Set2)
Veenplot <- Venn(example)
Veenplot <- Veenplot[, c("Set1", "Set2")]
plot(Veenplot, doWeights = TRUE)


same_gene <- unlist(Veenplot@IntersectionSets$`11`)  # genes significant in both analyses
# avg_log2FC of the shared genes in each analysis
data <- data.frame(de_orig  = de_orig[same_gene, "avg_log2FC"],
                   de_super = de_super[same_gene, "avg_log2FC"])
library(ggpubr)
ggscatter(data, x = "de_orig", y = "de_super",
          add = "reg.line",
          conf.int = TRUE,
          color = '#0f8096') +
  stat_cor(label.x = 0.2, label.y = 0)
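If Vennerable is hard to install, the same comparison can be done with base R alone. The sketch below uses two made-up DE tables with the `p_val` and `avg_log2FC` columns that FindMarkers returns; the helper `sig_genes()` and all values are hypothetical, so only the pattern carries over to real results.

```r
set.seed(42)
genes <- paste0("gene", 1:100)

# Hypothetical DE tables mimicking the FindMarkers column names
de_a <- data.frame(p_val = 0.01,
                   avg_log2FC = seq(-2, 2, length.out = 100),
                   row.names = genes)
de_b <- data.frame(p_val = 0.01,
                   avg_log2FC = 0.8 * de_a$avg_log2FC + rnorm(100, sd = 0.05),
                   row.names = genes)

# Hypothetical helper: names of genes passing the same cutoffs as above
sig_genes <- function(de, p = 0.05, lfc = 0.25) {
  rownames(de)[de$p_val <= p & abs(de$avg_log2FC) >= lfc]
}

set_a  <- sig_genes(de_a)
set_b  <- sig_genes(de_b)
shared <- intersect(set_a, set_b)

# Overlap of the two gene sets, and agreement of effect sizes on shared genes
overlap   <- length(shared) / length(union(set_a, set_b))
same_sign <- mean(sign(de_a[shared, "avg_log2FC"]) == sign(de_b[shared, "avg_log2FC"]))
r         <- cor(de_a[shared, "avg_log2FC"], de_b[shared, "avg_log2FC"])

c(n_a = length(set_a), n_b = length(set_b), n_shared = length(shared))
```

Looking at the sign agreement alongside the correlation is useful because two analyses can share many genes yet disagree on the direction of change.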