Abstract Detail
Comparative Genomics/Transcriptomics

Gruenstaeudl, Michael [1].

Assessing data quality among archived plastid genomes via novel software tools.

The sequencing and comparative analysis of complete plastid genomes have become a common, almost routine procedure in contemporary botanical research. Dozens, if not hundreds, of complete plastid genomes are now generated per investigation. The number of complete plastid genomes available through public sequence databases has, consequently, skyrocketed in recent years: by the end of March 2021, more than 10,000 unique and complete plastid genomes of angiosperms were stored on NCBI GenBank, representing numerous branches of the angiosperm phylogeny. This large collection of plastid genomes will undoubtedly continue to grow in both size and taxonomic representation. Understandably, many researchers draw on this abundance and employ previously published plastid genomes to supplement their new studies. However, a growing number of studies are reporting potential quality issues with publicly available plastid genome records. Among these issues are indications of bias in the assembly and the sequence annotations of the genomes. Until now, few, if any, investigations have systematically evaluated the quality of archived plastid genomes, despite the ample use of these genomes in contemporary botanical research. Here, I report on the assessment of data quality among plastid genomes archived on NCBI GenBank and present novel software tools to conduct these assessments. First, we assessed the impact that sequence coverage can have on the accuracy and structure of archived plastid genomes. Among other results, we found that sequencing evenness was significantly correlated with assembly quality and that the number of sequence windows with reduced coverage depth differed significantly across the four partitions of these quadripartite genomes. Second, we assessed the presence of sequence annotations of the inverted repeats among archived plastid genomes. We found that only about half of all angiosperm plastid genomes currently stored on GenBank contained sequence annotations for these repeats and that the release year and publication status of the genome records have a significant effect on the frequency of complete repeat annotation. The results of our assessments indicate that the correctness of the assembly and annotation of archived plastid genomes cannot be taken for granted and that several as yet underexplored correlations between genome structure and sequence coverage exist. Based on these results, I highlight the importance of applying bioinformatic quality control tools during the assembly and annotation of plastid genomes to increase their data quality.
1 - Freie Universitaet Berlin, Institute of Biology, Altensteinstr. 6, Berlin, 14195, Germany
Keywords: plastid genome, genome assembly, sequence annotation, sequence coverage, quality control, software tool.
Presentation Type: Oral Paper
Session: CGT1, Comparative Genomics/Transcriptomics I
Location: /
Date: Tuesday, July 20th, 2021
Time: 10:45 AM (EDT)
Number: CGT1004
Abstract ID: 348
Candidate for Awards: None