【原】筆記 | GWAS 操作流程3：plink關(guān)聯(lián)分析--完結(jié)篇

育種數(shù)據(jù)分析 2021-11-18

展開全文

飛哥注：具體數(shù)據(jù)文件，查看之前的系列博文。這里使用的是清洗后的數(shù)據(jù)。

「文件」

HapMap_3_r3_11.bed  HapMap_3_r3_11.bim  HapMap_3_r3_11.fam  HapMap_3_r3_11.log

1. 查看數(shù)據(jù)

這里，文件是bed文件，二進制不方便查看，我們將其轉(zhuǎn)化為ped文件和map文件

注意，這里我使用的是ped和map格式，如果ped文件中有表型數(shù)據(jù)（第六列），如果想指定表型數(shù)據(jù)，用--pheno，包括三列：家系，個體，表型值。

plink --bfile HapMap_3_r3_11 --recode --out test

這里的數(shù)據(jù)：

基因型個體：110個
SNP個數(shù)：1073743

2. plink關(guān)聯(lián)分析的類型

?
**參考：**https://www.jianshu.com/p/286050959dbd?utm_campaign=haruki
?

2.1 閾值性狀（1,2）

plink的語境叫“case and control”，即表型值數(shù)據(jù)是兩類數(shù)據(jù)：1,2，其中0和-9都表示缺失?？梢赃x擇的方法有卡方檢驗和邏輯斯蒂回歸（X2關(guān)聯(lián)分析和logistic分析）。

「--assoc」，不允許有協(xié)變量
「--logistic」，允許有協(xié)變量，如果考慮協(xié)變量，速度變慢。比assoc速度慢。

2.2 連續(xù)性狀（定量性狀）

這里的性狀是連續(xù)性狀，也就是除了1,2，0，-9外還有其它數(shù)值，「--assoc」會進行T檢驗（Student's test），還可以用**--linear**進行分析。

「--assoc」，不允許有協(xié)變量，速度快
「--linear」，允許有協(xié)變量，速度慢

3. 控制假陽性

因為plink進行關(guān)聯(lián)分析時常常面對的是大量的SNP數(shù)據(jù)，容易產(chǎn)生假陽性，因此需要矯正。

Bonferroni，使用0.05/n計算出矯正后的p值作為閾值，其中n為檢測SNP的個數(shù)。缺點是SNP之間可能存在LD，而且這種方法過于保守，容易產(chǎn)生假陰性。
FDR，是一種最小化假陽性預(yù)測比例的方法

plink的解決方法是--adjust，生成多種類型的p值。

「文件說明：」

.*.adjusted (basic multiple-testing corrections)
Produced by --adjust.

A text file with a header line, and then one line per set or polymorphic variant with the following 8-11 fields:

CHRChromosome code. Not present with set tests.
'SNP'/'SET'Variant/set identifier
UNADJUnadjusted p-value
GCDevlin & Roeder (1999) genomic control corrected p-value. Requires an additive model.
QQP-value quantile. Only present with 'qq-plot' modifier.
BONFBonferroni correction
HOLMHolm-Bonferroni (1979) adjusted p-value
SIDAK_SS?idák single-step adjusted p-value
SIDAK_SD?idák step-down adjusted p-value
FDR_BHBenjamini & Hochberg (1995) step-up false discovery control
FDR_BYBenjamini & Yekutieli (2001) step-up false discovery control
Variants/sets are sorted in p-value order. (As a result, if the QQ field is present, its values just increase linearly.)

UNADJ：原始p值
GC：基因組矯正P值（依賴加性模型）
QQ：P-value的QQ圖
BONF：Bonferroni 矯正結(jié)果
HOLM：Holm-Bonferroni (1979) adjusted p-value，矯正結(jié)果
FDR_BYBenjamini & Yekutieli (2001) step-up false discovery control，F(xiàn)DR方法

4. 閾值性狀關(guān)聯(lián)分析

數(shù)據(jù)：觀測值一列是1和2，可以用的方法有：--assoc和--logistic

4.1 accoc關(guān)聯(lián)分析未矯正

plink --file test --assoc --out result

「結(jié)果文件說明：」

.assoc, .assoc.fisher (case/control association allelic test report)
Produced by --assoc acting on a case/control phenotype.

A text file with a header line, and then one line per variant typically with the following 9-10 fields:

CHRChromosome code
SNPVariant identifier
BPBase-pair coordinate
A1Allele 1 (usually minor)
F_AAllele 1 frequency among cases
F_UAllele 1 frequency among controls
A2Allele 2
CHISQAllelic test chi-square statistic. Not present with 'fisher'/'fisher-midp' modifier.
PAllelic test p-value
ORodds(allele 1 | case) / odds(allele 1 | control)
If the 'counts' modifier is present, the 5th and 6th fields are replaced with:

C_AAllele 1 count among cases
C_UAllele 1 count among controls
If --ci 0.xy has also been specified, there are three additional fields at the end:

SEStandard error of odds ratio estimate
LxyBottom of xy% symmetric approx. confidence interval for odds ratio
HxyTop of xy% approx. confidence interval for odds ratio

CHR 染色體
SNP SNPID
BP 物理位置
。。。
CHISQ 卡方值
P P值結(jié)果文件：

4.2 assoc關(guān)聯(lián)分析矯正p值

 plink --file test --assoc --adjust --out result_adjust

結(jié)果生成三個文件：

result_adjust.assoc           result_adjust.log
result_adjust.assoc.adjusted

查看矯正后的值：

4.3 logistic不考慮協(xié)變量

plink --file test --logistic --out result_logistic

結(jié)果生成：

result_logistic.assoc.logistic  result_logistic.log

4.4 logistic 不考慮協(xié)變量矯正p值

plink --file test --logistic --adjust --out result_logistic_adjust

結(jié)果：

result_logistic_adjust.assoc.logistic           result_logistic_adjust.log
result_logistic_adjust.assoc.logistic.adjusted

查看矯正值

如果考慮協(xié)變量，加上--covar即可。

5. 可視化

「把文件改名字，方便后面代碼里作圖，這樣不用修改代碼了?！?/strong>

cp result.assoc assoc_results cp result_logistic.assoc.logistic logistic_results.assoc.logistic
注意，logistic里面會有NA，所以我們這里講NA的行去掉。
awk '!/'NA'/' logistic_results.assoc.logistic >logistic_results.assoc2.logistic
5.1 曼哈頓圖
R代碼：注意，如果沒有安裝qqman，就運行代碼：install.packages("qqman")，安裝即可。
#install.packages("qqman",repos="http://cran.cnr./",lib="~" ) # location of installation can be changed but has to correspond with the library location #library("qqman",lib.loc="~") library(qqman) results_log <- read.table("logistic_results.assoc_2.logistic", head=TRUE) jpeg("Logistic_manhattan.jpeg") manhattan(results_log,chr="CHR",bp="BP",p="P",snp="SNP", main = "Manhattan plot: logistic") dev.off() results_as <- read.table("assoc_results.assoc", head=TRUE) jpeg("assoc_manhattan.jpeg") manhattan(results_as,chr="CHR",bp="BP",p="P",snp="SNP", main = "Manhattan plot: assoc") dev.off()
「assoc的卡方檢驗結(jié)果：」
「--logistic邏輯斯蒂回歸結(jié)果：」可以看到兩者結(jié)果類似，assoc檢測到了顯著性位點，logistic沒有檢測到顯著性位點。
5.2 QQ plot
R代碼：
#install.packages("qqman",repos="http://cran.cnr./",lib="~" ) # location of installation can be changed but has to correspond with the library location #library("qqman",lib.loc="~") library(qqman) results_log <- read.table("logistic_results.assoc_2.logistic", head=TRUE) jpeg("QQ-Plot_logistic.jpeg") qq(results_log$P, main = "Q-Q plot of GWAS p-values : log") dev.off() results_as <- read.table("assoc_results.assoc", head=TRUE) jpeg("QQ-Plot_assoc.jpeg") qq(results_as$P, main = "Q-Q plot of GWAS p-values : log") dev.off()
「--assoc結(jié)果：」
「--logistic結(jié)果：」
6. 注意
注意，這是閾值性狀的結(jié)果，分類性狀，可以使用assoc和logistic，連續(xù)性狀的話，如果沒有協(xié)變量就用assoc，如果有協(xié)變量，就用linear即可。
7. 總結(jié)
這是使用plink計算GWAS分析的流程，包括數(shù)據(jù)的清洗，以及建模，以及出結(jié)果，以及可視化。
但是，這只是一個開始，后面再研究一下GEMMA，R中的GAPIT的操作方法，最后能夠通過編碼進行GWAS的分析，才算是掌握了這個分析方法。

完結(jié)感言：

這一系列的學(xué)習筆記，主要是根據(jù)別人的教程，自己重演了一遍，后面使用其它軟件的數(shù)據(jù)，自己操作一遍是最好的。GWAS只是統(tǒng)計模型的應(yīng)用，如果自己能用編程語言實現(xiàn)效果更好。通過強迫自己輸入來輸出，對我而言，是比較有效的方法。反正這些筆記，對我自己是很有收獲的，當然里面也有一些尚未完全理解的地方，還需要后面繼續(xù)研究。希望對你也有一定的收獲，如果有什么問題，歡迎交流。

立個Flag：

嗶哩嗶哩上面，之前上傳了一些油管下載的視頻，留言中呼吁中文版。中文版字幕添加復(fù)雜，倒還不如自己重新錄一遍呢！所以先就這個GWAS教程，錄一下操作流程傳上去吧……我仿佛看到B站大佬向我招手（踹腳）。

GWAS學(xué)習筆記系列：

筆記 | GWAS 操作流程1：下載數(shù)據(jù)

筆記 | GWAS 操作流程2-1：缺失質(zhì)控

筆記 | GWAS 操作流程2-2：性別質(zhì)控

筆記 | GWAS 操作流程2-3：MAF過濾

筆記 | GWAS 操作流程2-4：哈溫平衡檢驗

筆記 | GWAS 操作流程2-5：雜合率檢驗

筆記 | GWAS 操作流程2-6：去掉親緣關(guān)系近的個體