| Abstract Detail
Conference Wide
Gruenstaeudl, Michael [1]. Assessing sequence coverage and inverted repeat annotations among complete plastid genomes.
The sequencing and comparative analysis of complete plastid genomes has become a common, almost routine procedure in contemporary botanical research. Researchers can now choose from a plethora of user-friendly software tools for genome assembly and annotation, which enable them to generate dozens, if not hundreds, of complete plastid genomes per investigation. Understandably, this ease of data generation has prompted some researchers to consider the assembly and annotation of plastid genomes trivial and to implicitly assume the correctness of plastid genomes archived in public sequence databases. However, a growing number of studies are reporting potential quality issues with publicly available plastid genomes, and many researchers can cite anecdotal evidence of incorrect genome assembly or annotation despite the use of state-of-the-art tools. The systematic detection of suboptimal plastid genome assemblies or annotations is challenging, and no single method can identify such anomalies comprehensively. However, several bioinformatic strategies have been reported that seem to provide quality indicators for complete plastid genome sequences. In this workshop, we will discuss two of these indicators: sequence coverage and inverted repeat annotation. First, we will discuss the application of the R package PACVr (https://doi.org/10.1186/s12859-020-3475-0), which can visualize sequencing depth and evenness across complete plastid genomes to highlight regions of reduced coverage depth. Second, we will discuss the application of the Python package airpg (https://pypi.org/project/airpg/), which can survey thousands of archived plastid genomes and automatically parse sequence annotations to identify the presence or absence of inverted repeat annotations.
Both tools were designed to assist in the quality control of complete plastid genomes, and their application to land plant plastid genomes will be demonstrated. Participants will be guided through both tools in a step-by-step process. While prepared datasets will be provided, I encourage attendees to bring their own land plant plastid genomes so that they can be evaluated during the workshop. Specifically, users should bring at least one plastid genome (in GenBank flatfile format) as well as the underlying sequence reads of that genome (in FASTQ format). Please note: This workshop is intended for researchers with prior experience in plastid genome assembly and annotation. Participants should join with a UNIX-compatible operating system (macOS or Linux) and should have a basic understanding of the UNIX command line.
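To make the two quality indicators concrete, the following is a minimal Python sketch of the underlying ideas. It does not use or reproduce PACVr or airpg, and it is not the method either tool implements. The function names, the window size, the 0.5 depth threshold, and the text pattern used to recognize inverted repeat annotations are all illustrative assumptions. Per-base depths are assumed to come from an external source, for example the third column of `samtools depth -a` output.

```python
import re
from statistics import mean


def low_coverage_windows(depths, window=250, threshold=0.5):
    """Flag windows whose mean depth is below `threshold` times the
    genome-wide mean depth (both defaults are arbitrary assumptions).

    `depths` is a list of per-base read depths, one value per position.
    Returns a list of (start, end, mean_depth) with 1-based coordinates.
    """
    if not depths:
        return []
    overall = mean(depths)
    flagged = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        chunk_mean = mean(chunk)
        if chunk_mean < threshold * overall:
            flagged.append((start + 1, start + len(chunk), chunk_mean))
    return flagged


# Hypothetical pattern: "inverted" in any case (e.g. /rpt_type=inverted,
# /note="inverted repeat A") or the abbreviations IR, IRa, IRb.
IR_PATTERN = re.compile(r"(?i:inverted)|\bIR[abAB]?\b")


def count_ir_annotations(genbank_text):
    """Count repeat_region features whose qualifiers mention an inverted
    repeat. Counts across all records if the text holds several."""
    features = []  # each entry: [feature_key, joined_qualifier_text]
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith(("ORIGIN", "CONTIG", "//")):
            in_features = False
            continue
        if not in_features:
            continue
        # Feature keys start after exactly five spaces; qualifier lines
        # are indented further and therefore do not match.
        key_match = re.match(r" {5}(\S+)\s+\S", line)
        if key_match:
            features.append([key_match.group(1), ""])
        elif features:
            features[-1][1] += " " + line.strip()
    return sum(
        1
        for key, text in features
        if key == "repeat_region" and IR_PATTERN.search(text)
    )
```

A typical plastid genome would be expected to carry two inverted repeat annotations, so a count of zero or one from a check like this is a cue for closer inspection rather than proof of an error; some lineages have genuinely lost one repeat copy.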
Related Links: https://pypi.org/project/airpg/ https://doi.org/10.1186/s12859-020-3475-0
1 - Freie Universitaet Berlin, Institute of Biology, Altensteinstr. 6, Berlin, 14195, Germany
Keywords: none specified
Presentation Type: Workshop
Session: W14, Assessing sequence coverage and inverted repeat annotations among complete plastid genomes
Location: Virtual/Virtual
Date: Sunday, July 18th, 2021
Time: 1:00 PM (EDT)
Number: W14001
Abstract ID: 374
Candidate for Awards: None |