【原】GCTA學習6 | GCTA計算GRM矩陣（G矩陣）

育種數(shù)據(jù)分析 2022-01-17

展開全文

GRM矩陣，全稱：genetic relationship matrix (GRM)。

GCTA計算GRM有兩種方法

默認的Yang，--make-grm-alg 0
Van的方法：--make-grm-alg 1

GCTA計算GRM有兩種形式

默認的二進制形式：--make-grm，或者 --make-grm-bin
文本格式（三元組）：--make-grm-gz

1. GCTA計算GRM：二進制

下面這兩個命令，是等價的。

--make-grm

--make-grm-bin

Estimate the genetic relationship matrix (GRM) between pairs of individuals from a set of SNPs and save the lower triangle elements of the GRM to binary files, e.g. test.grm.bin, test.grm.N.bin, test.grm.id. 結果會生成矩陣的下三角，保存為二進制文件。

「Output file」

test.grm.bin (it is a binary file which contains the lower triangle elements of the GRM). 這個是二進制存儲的GRM
test.grm.N.bin (it is a binary file which contains the number of SNPs used to calculate the GRM). 這個是二進制文件，存儲的是參與計算的SNP個數(shù)
test.grm.id (no header line; columns are family ID and individual ID, see above). 這個是FID和IID的信息。

「運行命令：」

gcta64 --bfile ../test --make-grm --make-grm-alg 0 --out g1

2. R函數(shù)，讀入二進制矩陣

可以通過R語言代碼讀取二進制GRM文件：

# R script to read the GRM binary file
ReadGRMBin=function(prefix, AllN=F, size=4){
  sum_i=function(i){
    return(sum(1:i))
  }
  BinFileName=paste(prefix,".grm.bin",sep="")
  NFileName=paste(prefix,".grm.N.bin",sep="")
  IDFileName=paste(prefix,".grm.id",sep="")
  id = read.table(IDFileName)
  n=dim(id)[1]
  BinFile=file(BinFileName, "rb");
  grm=readBin(BinFile, n=n*(n+1)/2, what=numeric(0), size=size)
  NFile=file(NFileName, "rb");
  if(AllN==T){
    N=readBin(NFile, n=n*(n+1)/2, what=numeric(0), size=size)
  }
  else N=readBin(NFile, n=1, what=numeric(0), size=size)
  i=sapply(1:n, sum_i)
  return(list(diag=grm[i], off=grm[-i], id=id, N=N))
}

3. 將二進制GRM變?yōu)镹*N的矩陣

然后通過下面代碼，轉換為n*n的G矩陣：

aa = ReadGRMBin(prefix = "g1")

G_mat = matrix(0,length(aa$diag),length(aa$diag))
diag(G_mat) = aa$diag
lowerTriangle(G_mat,byrow = T) = aa$off
G_mat = G_mat+t(G_mat)-diag(diag(G_mat))
rownames(G_mat) = colnames(G_mat) = aa$id$V2
G_mat[1:10,1:10]

4. GRM為文本形式

--make-grm-gz

Estimate the GRM, save the lower triangle elements to a compressed text file (e.g. test.grm.gz) and save the IDs in a plain text file (e.g. test.grm.id). 估計的GRM文件，存儲矩陣的下三角，壓縮文件，存儲ID信息

「Output file format」test.grm.gz (no header line; columns are indices of pairs of individuals (row numbers of the test.grm.id), number of non-missing SNPs and the estimate of genetic relatedness) 生成的文件，為壓縮文件，第一列和第二列為編號信息（根據(jù)ID的順序編號，相當于是矩陣的下三角行列信息），第三列是SNP個數(shù)，第四列是相關系數(shù)

1    1    1000    1.0021
2    1    998     0.0231
2    2    999     0.9998
3    1    1000    -0.0031
...

test.grm.id (no header line; columns are family ID and individual ID) 為FID和IID數(shù)據(jù)，第一列為家系信息，第二列是個體ID。

011      0101
012      0102
013      0103
...

5. 將GCTA計算的GRM變?yōu)锳SReml支持的格式

ASReml-R的ginv格式，是矩陣的下三角，第一列是矩陣的行號，第二列是矩陣的列號，第三列是矩陣的數(shù)值（親緣關系系數(shù)）。所以，可以直接根據(jù)GCTA的文本的GRM，進行轉換。

「注意，ASReml計算需要的是G逆矩陣，而GCTA計算的是G矩陣，所以要求逆矩陣之后，才可以利用?！?/strong>

「命令代碼：」

gcta64 --bfile test --make-grm-bin --make-grm-alg 1 --out g1 --maf 0.01
在R語言中讀取二進制G矩陣，并轉化為逆矩陣的三元組形式
ReadGRMBin=function(prefix, AllN=F, size=4){ sum_i=function(i){ return(sum(1:i)) } BinFileName=paste(prefix,".grm.bin",sep="") NFileName=paste(prefix,".grm.N.bin",sep="") IDFileName=paste(prefix,".grm.id",sep="") id = read.table(IDFileName) n=dim(id)[1] BinFile=file(BinFileName, "rb"); grm=readBin(BinFile, n=n*(n+1)/2, what=numeric(0), size=size) NFile=file(NFileName, "rb"); if(AllN==T){ N=readBin(NFile, n=n*(n+1)/2, what=numeric(0), size=size) } else N=readBin(NFile, n=1, what=numeric(0), size=size) i=sapply(1:n, sum_i) return(list(diag=grm[i], off=grm[-i], id=id, N=N)) } aa = ReadGRMBin(prefix = "g1") G_mat = matrix(0,length(aa$diag),length(aa$diag)) diag(G_mat) = aa$diag lowerTriangle(G_mat,byrow = T) = aa$off G_mat = G_mat+t(G_mat)-diag(diag(G_mat)) rownames(G_mat) = colnames(G_mat) = aa$id$V2 #diag(G_mat) = diag(G_mat) + 0.01 ginv = G.inverse(G_mat,sparseform = T)$Ginv head(ginv)
然后就可以進行GBLUP評估了：兩者結果完全一致。
「ASReml的結果：」
「GCTA reml的結果：」