Abstract Detail

Comparative Genomics/Transcriptomics

Gruenstaeudl, Michael [1].

Assessing data quality among archived plastid genomes via novel software tools.

The sequencing and comparative analysis of complete plastid genomes has become a common, almost routine procedure in contemporary botanical research. Dozens, if not hundreds, of complete plastid genomes are now generated per investigation. Consequently, the number of complete plastid genomes available through public sequence databases has skyrocketed in recent years: by the end of March 2021, more than 10,000 unique and complete plastid genomes of angiosperms were stored on NCBI GenBank, representing numerous branches of the angiosperm phylogeny. This large collection of plastid genomes will undoubtedly continue to grow in both size and taxonomic representation. Understandably, many researchers draw on this abundance and employ previously published plastid genomes to supplement their new studies. However, a growing number of studies are reporting potential quality issues with publicly available plastid genome records. Among these issues are indications of bias in the assembly and the sequence annotations of the genomes. Until now, few, if any, investigations have systematically evaluated the quality of archived plastid genomes, despite the ample use of these genomes in contemporary botanical research. Here, I report on the assessment of data quality among plastid genomes archived on NCBI GenBank and present novel software tools to conduct these assessments. First, we assessed the impact that sequence coverage can have on the accuracy and structure of archived plastid genomes. Among other results, we found that sequencing evenness was significantly correlated with assembly quality and that the number of sequence windows with reduced coverage depth differed significantly across the four partitions of these quadripartite genomes. Second, we assessed the presence of sequence annotations of the inverted repeats among archived plastid genomes. We found that only about half of all angiosperm plastid genomes currently stored on GenBank contained sequence annotations for these repeats and that the release year and publication status of the genome records had a significant effect on the frequency of complete repeat annotation. The results of our assessments indicate that the correctness of the assembly and annotation of archived plastid genomes cannot be taken for granted and that several as-yet underexplored correlations between genome structure and sequence coverage exist. Based on these results, I highlight the importance of applying bioinformatic quality control tools during the assembly and annotation of plastid genomes to increase their data quality.
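To make the two coverage measures mentioned above concrete, the following is a minimal sketch of how windows with reduced coverage depth could be counted per genome partition and how an evenness score could be computed. It is not the software presented in this talk. The function names, the window size, the depth threshold, the partition coordinates, and the particular evenness definition (the mean of min(depth, mean depth) divided by the mean depth) are all assumptions chosen for illustration.

```python
from statistics import mean


def window_means(depths, window_size):
    """Mean depth of each non-overlapping window (the last window may be shorter)."""
    return [
        mean(depths[i:i + window_size])
        for i in range(0, len(depths), window_size)
    ]


def count_low_coverage_windows(depths, window_size, threshold):
    """Number of windows whose mean depth falls below the threshold."""
    return sum(1 for m in window_means(depths, window_size) if m < threshold)


def low_windows_by_partition(depths, partitions, window_size, threshold):
    """Count low-coverage windows separately for each partition.

    `partitions` maps a partition name to a half-open (start, end) range of
    0-based positions; the coordinates are hypothetical inputs here.
    """
    return {
        name: count_low_coverage_windows(depths[start:end], window_size, threshold)
        for name, (start, end) in partitions.items()
    }


def evenness(depths):
    """Illustrative evenness score in [0, 1]; 1 means perfectly uniform depth.

    Assumed definition: sum of min(depth, mean depth) over all positions,
    divided by (mean depth * number of positions).
    """
    avg = mean(depths)
    if avg == 0:
        return 0.0
    return sum(min(d, avg) for d in depths) / (avg * len(depths))


# Toy per-base depth vector with one poorly covered stretch (made-up values).
toy_depths = [10] * 8 + [2] * 4 + [10] * 4
toy_partitions = {"LSC": (0, 8), "IRb": (8, 12), "SSC": (12, 14), "IRa": (14, 16)}

per_partition = low_windows_by_partition(
    toy_depths, toy_partitions, window_size=4, threshold=5
)
score = evenness(toy_depths)
```

In this toy input, only the stretch assigned to the hypothetical IRb partition falls below the threshold, and the evenness score drops below 1 because of that stretch. Real per-base depths would typically be derived from read mappings against the assembled genome.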

1 - Freie Universitaet Berlin, Institute of Biology, Altensteinstr. 6, Berlin, 14195, Germany

plastid genome
genome assembly
sequence annotation
sequence coverage
quality control
software tool

Presentation Type: Oral Paper
Session: CGT1, Comparative Genomics/Transcriptomics I
Location: /
Date: Tuesday, July 20th, 2021
Time: 10:45 AM (EDT)
Number: CGT1004
Abstract ID: 348
Candidate for Awards: None

Copyright © 2000-2021, Botanical Society of America. All rights reserved