Introduction to Metagenomics (2) ~ Importing Data with QIIME2-2018.2

Hobart_joe 2019-07-18


While learning metagenomics, this expert's Chinese translation can get you up to speed quickly: https://blog.csdn.net/woodcorpse/article/details/78407438

In practice, you still need to adjust things for your own case.
The batch of data I received had already been demultiplexed when it came off the sequencer, so it can be imported directly.

First, a pitfall

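With many samples, if the data the company gives you is faulty, downstream analysis stalls and you may not find the cause for a long time. That is exactly what happened to me: in the raw data from the company, a few paired-end samples had identical read1 and read2 files. I didn't know this at first, so after importing the data, dada2 either would not finish or gave very poor results. I searched for the cause for ages and got no answer on the forum; it took a week to discover that read1 and read2 were in fact duplicates.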

So I wrote a Python script of my own to check whether the data off the sequencer contains duplicates.

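from collections import Counter

def splitline(line):
    # Split one line into its whitespace-separated columns.
    return line.split()

address = input("Enter the file path: ")
f = open(address)
print("File name and path: " + f.name + "\n")

print("Identical entries found in RawData:\n")

dic = {}
i = 0
for line in f.readlines():
    line = line.strip()
    if not line:
        continue  # skip blank lines
    splitlines = splitline(line)
    # Column 2 is the key; column 1 is the value compared for duplicates.
    dic[splitlines[1]] = splitlines[0]
f.close()

# Count how often each column-1 value occurs.
counter = Counter(dic.values())
for item in counter:
    if counter[item] > 1:
        same = item
        i = i + 1

        # Collect every entry that shares this duplicated value.
        match_data = {}
        for (key, value) in dic.items():
            if value == same:
                match_data[key] = value

        print(match_data)

print("\nRawdata contains " + str(len(dic)) + " entries in total")
print("\n" + str(i) + " duplicated values in total")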

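This script runs from the command line. I also wrote a version in jupyter notebook and tested that it works.

Displaying the identical entries

The code isn't perfect, but it does the job.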
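If no two-column listing is at hand, the same check can be run directly on the files. The following is only a minimal sketch, not the script used above: it hashes each file's bytes and reports groups of files with identical content. The function names and the `*.fq.gz` naming pattern are assumptions made for illustration; adjust them to your own file names.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(folder, pattern="*.fq.gz"):
    """Group files under `folder` by content hash; return the groups holding more than one file.

    `pattern` is a hypothetical naming convention, not something QIIME 2 requires.
    """
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob(pattern)):
        groups[file_digest(path)].append(path.name)
    return [names for names in groups.values() if len(names) > 1]
```

Calling `find_duplicate_files("new")` would list any read1/read2 files that are byte-for-byte copies of each other. Because it compares the compressed bytes, it only catches exact copies; two files holding the same reads but compressed separately could slip through.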

Preparing for import

There are many ways to import, depending on what your initial data looks like. See the official site for details:
https://docs./2018.2/tutorials/importing/

I mainly use the "Fastq manifest" formats to import data.

Here are the commands first:
Activate the environment

source activate qiime2-2018.2

Import the data

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path pe-33-manifest.txt \
  --output-path paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred33

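--type selects your data type; mine is paired-end sequencing data.
pe-33-manifest.txt is a file you create yourself; it tells the program what to import and where the files are.
paired-end-demux.qza is the output file that holds the already demultiplexed data.
Phred33 is the usual choice (that is what the official site says, although your data could also be Phred64).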
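If you are unsure whether your files use offset 33 or 64, one common heuristic is to look at the range of quality characters. The sketch below is an assumption-laden illustration rather than anything QIIME 2 provides: it assumes Illumina-style, four-line FASTQ records, and the function names and thresholds (below `;` means 33, above `J` means 64) are my own choices.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def guess_phred_offset(path, max_records=10000):
    """Guess the quality offset from the quality characters of the first records.

    Returns 33, 64, or None when the sampled reads do not settle the question.
    Assumes each record spans exactly four lines.
    """
    lowest = None
    highest = None
    with open_fastq(path) as handle:
        for index, line in enumerate(handle):
            if index // 4 >= max_records:
                break
            if index % 4 == 3:  # every fourth line holds the quality string
                codes = [ord(char) for char in line.strip()]
                if codes:
                    lowest = min(codes) if lowest is None else min(lowest, min(codes))
                    highest = max(codes) if highest is None else max(highest, max(codes))
    if lowest is None:
        return None
    if lowest < 59:    # characters below ';' do not occur in +64 encodings
        return 33
    if highest > 74:   # characters above 'J' are not expected in Illumina +33 data
        return 64
    return None        # range fits both encodings; inspect more reads
```

A result of `None` means the sampled qualities fall in the overlap of the two encodings, in which case checking more reads or asking the sequencing provider is the safer route.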

PairedEndFastqManifestPhred33
In this variant of the fastq manifest format, there must be forward and reverse read fastq.gz / fastq files for each sample id. As a result, each sample id is represented twice in this file: once for its forward reads, and once for its reverse reads. This format assumes that the PHRED offset used for the positional quality scores in all of the fastq.gz / fastq files is 33.
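The "represented twice" rule quoted above is easy to get wrong when the manifest is written by hand, so it can be worth checking before importing. The following is a minimal sketch under stated assumptions: the manifest is comma-separated with the header `sample-id,absolute-filepath,direction`, lines starting with `#` are comments, and `check_manifest` is a hypothetical helper name, not a QIIME 2 command.

```python
import csv
from collections import defaultdict


def check_manifest(manifest_path):
    """Return the sample ids that do not have exactly one forward and one reverse entry.

    The result maps each offending sample id to the list of directions found for it.
    """
    directions = defaultdict(list)
    with open(manifest_path, newline="") as handle:
        # Drop blank lines and comment lines before handing rows to the CSV reader.
        rows = (line for line in handle if line.strip() and not line.startswith("#"))
        for row in csv.DictReader(rows):
            directions[row["sample-id"]].append(row["direction"])
    return {
        sample: found
        for sample, found in directions.items()
        if sorted(found) != ["forward", "reverse"]
    }
```

An empty dictionary from `check_manifest("pe-33-manifest.txt")` means every sample has its forward and reverse line; anything else names the samples to fix.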

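Format of pe-33-manifest.txt

Three columns: the first is the sample name, the second is the file path, and the third tells the program whether the reads are forward or reverse.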
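With many samples, writing the three columns by hand is tedious. Below is a rough sketch of generating the manifest from a folder of raw files. It assumes the comma-separated layout with the header `sample-id,absolute-filepath,direction`; the function name and the `_1.fq.gz` / `_2.fq.gz` suffixes are hypothetical and should be changed to match how your provider names the files.

```python
import csv
from pathlib import Path


def write_manifest(raw_dir, manifest_path, fwd_suffix="_1.fq.gz", rev_suffix="_2.fq.gz"):
    """Write a paired-end manifest for every forward file that has a matching reverse file.

    Returns the list of sample ids written. The suffixes are assumed naming conventions.
    """
    raw_dir = Path(raw_dir).resolve()  # the manifest needs absolute paths
    samples = []
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        for fwd in sorted(raw_dir.glob("*" + fwd_suffix)):
            sample = fwd.name[: -len(fwd_suffix)]
            rev = raw_dir / (sample + rev_suffix)
            if not rev.exists():
                continue  # skip samples that lack a reverse file
            writer.writerow([sample, str(fwd), "forward"])
            writer.writerow([sample, str(rev), "reverse"])
            samples.append(sample)
    return samples
```

For example, `write_manifest("new", "pe-33-manifest.txt")` would scan a raw-data folder called new and write one forward and one reverse line per sample.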

Import succeeded

Once the import succeeds, you can move on to the downstream analysis.


new is the folder that holds the raw data.

