| Abstract Detail
Comparative Genomics/Transcriptomics

McKibben, Michael [1], Barker, Michael S. [1].

Applying Machine Learning to Classify the Origins of Gene Duplications.

Nearly all lineages of land plants have at least one whole genome duplication (WGD) in their history. The legacy of these ancient WGDs is still observable in the diploidized genomes of extant plants. Genes originating from WGD, known as paleologs, can be maintained in diploid genomes for millions of years. These paleologs have the potential to shape plant evolution through sub- and neofunctionalization, increased genetic diversity, and reciprocal gene loss among lineages. Current methods for classifying paleologs suffer from false positives, require significant computational time, or depend on multiple species with high-quality genome assemblies. These approaches force users to trade accuracy against the range of systems they can study. Here we develop a supervised learning approach to infer paleologs in a broader range of plant genomes. We collected empirical data on syntenic block sizes and other genomic features from 27 plant species with different ages, types, and numbers of WGDs. These features were used to train a gradient boosted decision tree to classify genes as paleologs or non-paleologs. Using this approach, Frackify (Fractionation Classify), we accurately identified and classified paleologs across a broad range of parameter space, including scenarios with multiple overlapping WGDs. We then compared Frackify against other paleolog inference approaches in eight species with tetraploid and hexaploid paleopolyploid ancestry. Frackify provides a methodology to quickly classify paleologs, and its inferences maintain a high degree of overlap with those of more conservative methodologies. Using this tool, users can now explore questions regarding the origins of gene duplications in a wider variety of systems.
1 - University of Arizona, Department of Ecology & Evolutionary Biology, PO Box 210088, Tucson, Arizona, 85721-0088, United States
Keywords: Machine learning; Whole genome duplication; Genetics Tools; Paleologs; genome evolution; polyploidy.
Presentation Type: Oral Paper
Session: CGT1, Comparative Genomics/Transcriptomics I
Location: /
Date: Tuesday, July 20th, 2021
Time: 10:00 AM (EDT)
Number: CGT1001
Abstract ID: 282
Candidate for Awards: None |
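The abstract describes training a gradient boosted decision tree on syntenic block sizes and other genomic features to label genes as paleologs or non-paleologs. The following is only a minimal sketch of that general idea, not Frackify's implementation: the data are synthetic, the two feature columns (a block size and an unnamed second feature) and all function names are hypothetical, and a hand-rolled booster built from one-split trees (stumps) stands in for whatever library and tree depth the authors actually used.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic data are reproducible


def make_gene(is_paleolog):
    """Return [syntenic_block_size, other_feature] for one SYNTHETIC gene.

    The distributions are invented for illustration only.
    """
    if is_paleolog:
        return [random.gauss(30, 8), random.gauss(1.0, 0.3)]
    return [random.gauss(8, 4), random.gauss(0.4, 0.3)]


# 60 synthetic paleologs (label 1) and 60 synthetic non-paleologs (label 0).
y = [1] * 60 + [0] * 60
X = [make_gene(label) for label in y]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the one-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def fit_boosted_stumps(X, y, n_rounds=20, learning_rate=0.5):
    """Gradient boosting on log-loss: each stump fits the current residuals."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1.0 - p))  # initial log-odds of the positive class
    scores = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log-loss with respect to the score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        scores = [s + learning_rate * (lm if row[j] <= t else rm)
                  for s, row in zip(scores, X)]
    return f0, learning_rate, stumps


def predict_proba(model, row):
    """Estimated probability that a gene is a paleolog."""
    f0, lr, stumps = model
    score = f0 + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)
    return sigmoid(score)


model = fit_boosted_stumps(X, y)
predictions = [1 if predict_proba(model, row) >= 0.5 else 0 for row in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(f"training accuracy on synthetic genes: {accuracy:.2f}")
```

The reported accuracy is measured on the same synthetic genes used for fitting, so it says nothing about performance on real genomes; a real evaluation would hold out species or genes, as the comparison across eight species in the abstract suggests.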