Last month, bioRxiv posted a record 2,600 new preprints, and roughly another 1,000 manuscripts received revised versions. The numbers show that bioRxiv, the most established preprint platform in biology, keeps gaining popularity internationally. Meanwhile, medRxiv, the medical preprint platform that launched only in June, welcomed its 100th preprint at the end of last month. Like bioRxiv, medRxiv says it will reach agreements with journals so that researchers can transfer a medRxiv preprint directly when they formally submit to a peer-reviewed journal. So far, the following journals have signed on: JCO Clinical Cancer Informatics, JCO Precision Oncology, and Genetics in Medicine. Compared with the well-established bioRxiv, medRxiv still lags well behind in both visibility and operations for now, but we trust its managers will quickly roll out more conveniences and build medRxiv into an important platform for medical research.

July is also summer blockbuster season. On the 12th, The Lion King (2019) opened first in mainland China, stirring childhood memories for countless viewers. By coincidence, bioRxiv in July also carried a preprint reporting the latest results from sequencing the African lion genome, as if biologists were paying their own tribute to the classic. Echoing it, bioRxiv last month also posted genome sequencing results for the Siberian tiger. Perhaps most surprising of all, the first author of both papers is the same person: Ellie Armstrong, a postdoc at Stanford University in the United States. Let's take a look.

1. Lion and tiger genomes

1.1 [Genomics] The African lion genome
Long live the king: chromosome-level assembly of the lion (Panthera leo) using linked-read, Hi-C, and long read data (CC-BY-NC-ND 4.0)
The lion (Panthera leo) is one of the most popular and iconic feline species on the planet, yet in spite of its popularity, the last century has seen massive declines for lion populations worldwide. Genomic resources for endangered species represent an important way forward for the field of conservation, enabling high-resolution studies of demography, disease, and population dynamics. Here, we present a chromosome-level assembly for the captive African lion from the Exotic Feline Rescue Center as a resource for current and subsequent genetic work of the sole social species of the Panthera clade. Our assembly is composed of 10x Genomics Chromium data, Dovetail Hi-C, and Oxford Nanopore long-read data. Synteny is highly conserved between the lion, other Panthera genomes, and the domestic cat.
We find variability in the length and levels of homozygosity across the genomes of the lion sequenced here and other previous published resequence data, indicating contrasting histories of recent and ancient small population sizes and/or inbreeding. Demographic analyses reveal similar histories across all individuals except the Asiatic lion, which shows a more rapid decline in population size. This high-quality genome will greatly aid in the continuing research and conservation efforts for the lion.
Figure S1: Circos plot of alignments between tiger (left) and lion (right) chromosomes. Colors represent different chromosomes with bottom chromosome (shown in dark brown) representing A1.

1.2 [Genomics] Sequencing 65 tiger genomes reveals the roles of genetic drift and natural selection in tiger evolution
Recent evolutionary history of tigers highlights contrasting roles of genetic drift and selection
Tigers are among the most charismatic of endangered species, yet little is known about their evolutionary history. We sequenced 65 individual genomes representing extant tiger geographic range. We found strong genetic differentiation between putative tiger subspecies, divergence within the last 10,000 years, and demographic histories dominated by population bottlenecks. Indian tigers have substantial genetic variation and substructure stemming from population isolation and intense recent bottlenecks here. Despite high genetic diversity across India, individual tigers host longer runs of homozygosity, potentially suggesting recent inbreeding here. Amur tiger genomes revealed the strongest signals of selection and over-representation of gene ontology categories potentially involved in metabolic adaptation to cold. Novel insights highlight the antiquity of northeast Indian tigers. Our results demonstrate recent evolution, with differential isolation, selection and drift in extant tiger populations, providing insights for conservation and future survival.

2. [Bioinformatics] PromethION: 11 human genomes, 9 days, 63x coverage, read N50 of 42 kb, all with the brand-new de novo assembler Shasta (a name that puns on "kill him" in Chinese)
Efficient de novo assembly of eleven human genomes using PromethION sequencing and a novel nanopore toolkit (CC-BY 4.0)
Present workflows for producing human genome assemblies from long-read technologies have cost and production time bottlenecks that prohibit efficient scaling to large cohorts. We demonstrate an optimized PromethION nanopore sequencing method for eleven human genomes. The sequencing, performed on one machine in nine days, achieved an average 63x coverage, 42 Kb read N50, 90% median read identity and 6.5x coverage in 100 Kb+ reads using just three flow cells per sample. To assemble these data we introduce new computational tools: Shasta - a de novo long read assembler, and MarginPolish & HELEN - a suite of nanopore assembly polishing algorithms. On a single commercial compute node Shasta can produce a complete human genome assembly in under six hours, and MarginPolish & HELEN can polish the result in just over a day, achieving 99.9% identity (QV30) for haploid samples from nanopore reads alone. We evaluate assembly performance for diploid, haploid and trio-binned human samples in terms of accuracy, cost, and time and demonstrate improvements relative to current state-of-the-art methods in all areas. We further show that addition of proximity ligation (Hi-C) sequencing yields near chromosome-level scaffolds for all eleven genomes.

3. [Evolution] Bielefeld University, Germany: how has the genome of Arabidopsis cells changed after 25 years adrift in suspension culture?
25 years of propagation in suspension cell culture results in substantial alterations of the Arabidopsis thaliana genome (CC-BY 4.0)
Arabidopsis thaliana is one of the best studied plant model organisms. Besides cultivation in greenhouses, cells of this plant can also be propagated in suspension cell culture. At7 is one such cell line that has been established about 25 years ago. Here we report the sequencing and the analysis of the At7 genome.
Large scale duplications and deletions compared to the Col-0 reference sequence were detected. The number of deletions exceeds the number of insertions thus indicating that a haploid genome size reduction is ongoing. Patterns of small sequence variants differ from the ones observed between A. thaliana accessions e.g. the number of single nucleotide variants matches the number of insertions/deletions. RNA-Seq analysis reveals that disrupted alleles are less frequent in the transcriptome than the native ones.

4. [Genomics] 1997 to 2019: 100,000 genomes, 100,000 bacteria
What can we learn from over 100,000 Escherichia coli genomes? (CC-BY-NC-ND 4.0)
The explosion of microbial genome sequences in public databases allows for large-scale population studies of model organisms, such as Escherichia coli. We have examined more than one hundred-thousand E. coli and Shigella genomes. After removing outliers, genomes were classified into two broad clusters based on a semi-automated Mash analysis, which distinguished 14 distinct phylotypes, graphically illustrated by Cytoscape. From a set of more than ten-thousand good quality E. coli and Shigella genomes from GenBank, we find roughly 2,700 gene families in the E. coli species core, and more than 135,000 gene families in the E. coli pan-genome. Based on a set of 2,613 single-copy core proteins taken from one representative genome per phylotype, we constructed a robust phylogenetic tree. This is the largest E. coli genome dataset analyzed to date, and provides valuable insight into the population structure of the species.

5. [Omics] Dekker at the University of Massachusetts Medical School develops a new technique to advance Hi-C studies of chromosome compartmentalization
Compartment-dependent chromatin interaction dynamics revealed by liquid chromatin Hi-C (CC-BY-NC-ND 4.0)
Chromosomes are folded so that active and inactive chromatin domains are spatially segregated. Compartmentalization is thought to occur through polymer phase/microphase separation mediated by interactions between loci of similar type.
The nature and dynamics of these interactions are not known. We developed liquid chromatin Hi-C to map the stability of associations between loci. Before fixation and Hi-C, chromosomes are fragmented removing the strong polymeric constraint to enable detection of intrinsic locus-locus interaction stabilities. Compartmentalization is stable when fragments are over 10-25 kb. Fragmenting chromatin into pieces smaller than 6 kb leads to gradual loss of genome organization. Dissolution kinetics of chromatin interactions vary for different chromatin domains. Lamin-associated domains are most stable, while interactions among speckle and polycomb-associated loci are more dynamic. Cohesin-mediated loops dissolve after fragmentation, possibly because cohesin rings slide off nearby DNA ends. Liquid chromatin Hi-C provides a genome-wide view of chromosome interaction dynamics.

6. [Genomics] GWAS by UC Davis researchers reveals how the environment shapes the genome of Mexican maize
Single-gene resolution of locally adaptive genetic variation in Mexican maize (CC-BY-NC 4.0)
Threats to crop production due to climate change are one of the greatest challenges facing plant breeders today. While considerable adaptive variation exists in traditional landraces, natural populations of crop wild relatives, and ex situ germplasm collections, separating adaptive alleles from linked deleterious variants that impact agronomic traits is challenging and has limited the utility of these diverse germplasm resources. Modern genome editing techniques such as CRISPR offer a potential solution by targeting specific alleles for transfer to new backgrounds, but such methods require a higher degree of precision than traditional mapping approaches can achieve. Here we present a high-resolution genome-wide association analysis to identify loci exhibiting adaptive patterns in a large panel of more than 4500 traditional maize landraces representing the breadth of genetic diversity of maize in Mexico. We evaluate associations between genotype and plant performance in 13 common gardens across a range of environments, identifying hundreds of candidate genes underlying genotype by environment interaction. We further identify genetic associations with environment across Mexico and show that such loci are associated with variation in yield and flowering time in our field trials and predict performance in independent drought trials. Our results indicate that the variation necessary to adapt crops to changing climate exists in traditional landraces that have been subject to ongoing environmental adaptation and can be identified by both phenotypic and environmental association.

7. [Bioinformatics] Scientists at the University of Vienna, Austria, develop new software for detecting adaptive introgression
VolcanoFinder: genomic scans for adaptive introgression (CC-BY-NC-ND 4.0)
The process by which beneficial alleles are introduced into a species from a closely-related species is termed adaptive introgression. We present an analytically-tractable model for the effects of adaptive introgression on non-adaptive genetic variation in the genomic region surrounding the beneficial allele. The result we describe is a characteristic volcano-shaped pattern of increased variability that arises around the positively-selected site, and we introduce an open-source method VolcanoFinder to detect this signal in genomic data. Importantly, VolcanoFinder is a population-genetic likelihood-based approach, rather than a comparative-genomic approach, and can therefore probe genomic variation data from a single population for footprints of adaptive introgression, even from a priori unknown and possibly extinct donor species.

8. [Bioinformatics] Transcriptome assembler StringTie arrives with a major 2.0 upgrade (it can now handle long reads)
Transcriptome assembly from long-read RNA-seq alignments with StringTie2 (CC-BY 4.0)
RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new computational methods to handle the high error rate of long-read sequencing technology, which previous assemblers could not tolerate. It also offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of assemblies. On 33 short-read datasets from humans and two plant species, StringTie2 is 47.3% more precise and 3.9% more sensitive than Scallop.
On multiple long read datasets, StringTie2 on average correctly assembles 8.3 and 2.6 times as many transcripts as FLAIR and Traphlor, respectively, with substantially higher precision. StringTie2 is also faster and has a smaller memory footprint than all comparable tools.

9. [Omics] Kun Zhang at UC San Diego: a splint oligo enables paired ATAC-seq and RNA-seq of the same single cell
Linking transcriptome and chromatin accessibility in nanoliter droplets for single-cell sequencing
Linked profiling of transcriptome and chromatin accessibility from single cells can provide unprecedented insights into cellular status. Here we developed a droplet-based Single-Nucleus chromatin Accessibility and mRNA Expression sequencing (SNARE-seq) assay, that we used to profile neonatal and adult mouse cerebral cortices. To demonstrate the strength of single-cell dual-omics profiling, we reconstructed transcriptome and epigenetic landscapes of cell types, uncovered lineage-specific accessible sites, and connected dynamics of promoter accessibility with transcription during neurogenesis.

10. [Omics] Selected from medRxiv
Early detection of molecular disease progression by whole-genome circulating tumor DNA in advanced solid tumors (CC-BY-ND 4.0)
Purpose: Treatment response assessment for patients with advanced solid tumors is complex and existing methods of assessment require greater precision for early disease assessment. Current guidelines rely on imaging, which has limitations such as the long time required before treatment effectiveness can be determined. Serial changes in whole-genome (WG) circulating tumor DNA (ctDNA) were used to detect disease progression early in the treatment course. Methods: 97 patients with advanced cancer were enrolled, and blood was collected before and after initiation of a new treatment. Plasma cell-free DNA libraries were prepared for either WG or WG bisulfite sequencing.
Longitudinal changes in the fraction of ctDNA were quantified to identify molecular progression or response in a binary manner. Study endpoints were agreement with first follow-up imaging (FUI) and stratification of progression-free survival (PFS). Results: Patients with early molecular progression had shorter PFS (n=14; median 62d) compared to others (n=78; median 263d, HR 12.6 [95% confidence interval 5.8-27.3], log-rank P<10^-10, 5 excluded from analysis). All cases with molecular progression were confirmed by FUI and molecular progression preceded FUI by a median of 40d. Sensitivity for the assay in identifying clinical progression was 54%, median 24d into treatment and specificity was 100%. Conclusions: Molecular progression, based on ctDNA data, detected disease progression for cases on treatment with high specificity approximately 6 weeks before follow-up imaging. This technology may enable early course change to a potentially effective therapy, avoiding side effects and cost associated with cycles of ineffective treatment.
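
For readers less used to diagnostic-accuracy figures, here is a minimal Python sketch of how the sensitivity and specificity quoted in this abstract are defined. The abstract only states that 14 patients showed molecular progression and that all of them were confirmed by imaging (so 14 true positives and 0 false positives). The false-negative and true-negative counts below (12 and 66) are our own assumption, picked so that they add up to the 92 analysed patients and reproduce the reported 54% and 100%; they are not reported in the abstract, and the authors' actual denominators may differ.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true progressors that the assay flagged: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-progressors that the assay left unflagged: TN / (TN + FP)."""
    return tn / (tn + fp)


# From the abstract: 14 molecular-progression calls, all confirmed by imaging.
tp, fp = 14, 0
# ASSUMED for illustration only (not reported in the abstract).
fn, tn = 12, 66

print(f"sensitivity = {sensitivity(tp, fn):.0%}")
print(f"specificity = {specificity(tn, fp):.0%}")
```

With zero false positives, specificity is 100% whatever the number of true negatives, which is why a positive call from this assay can be trusted even though it misses roughly half of the progressors.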