

Abstract Detail

Comparative Genomics/Transcriptomics

McKibben, Michael [1], Barker, Michael S. [1].

Applying Machine Learning to Classify the Origins of Gene Duplications.

Nearly all lineages of land plants have at least one whole genome duplication (WGD) in their history. The legacy of these ancient WGDs is still observable in the diploidized genomes of extant plants. Genes originating from WGDs, known as paleologs, can be maintained in diploid genomes for millions of years. These paleologs have the potential to shape plant evolution through sub- and neofunctionalization, increased genetic diversity, and reciprocal gene loss among lineages. Current methods for classifying paleologs suffer from false positives, require significant computational time, or depend on multiple species with high-quality genome assemblies. These approaches force users to trade accuracy against the range of systems they can study. Here we develop a supervised learning approach to infer paleologs in a broader range of plant genomes. We collected empirical data on syntenic block sizes and other genomic features from 27 plant species with different ages, types, and numbers of WGDs. These features were used to train a gradient boosted decision tree to classify genes as paleologs or non-paleologs. Using this approach, Frackify (Fractionation Classify), we were able to accurately identify and classify paleologs across a broad range of parameter space, including scenarios with multiple overlapping WGDs. We then compared Frackify against other paleolog inference approaches in eight species with tetraploid and hexaploid paleopolyploid ancestry. Frackify provides a methodology to quickly classify paleologs, with inferences that maintain a high degree of overlap with those of more conservative methodologies. Using this tool, users are now able to explore questions regarding the origins of gene duplications in a wider variety of systems.
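
The classification step described above can be illustrated with a minimal sketch. This is not Frackify's implementation: it is a toy gradient boosted classifier built from one-split decision trees (stumps) in pure Python, trained on synthetic data. The two features (`block_size` for syntenic block size and `ks` for synonymous divergence), their distributions, and all function names are assumptions made only for illustration. A real analysis would more likely rely on an established gradient boosting library and empirically derived features.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_genes(n, rng):
    """Hypothetical data: paleologs get larger syntenic blocks and higher Ks."""
    X, y = [], []
    for _ in range(n):
        is_paleolog = rng.random() < 0.5
        if is_paleolog:
            block_size, ks = rng.gauss(30, 8), rng.gauss(0.8, 0.15)
        else:
            block_size, ks = rng.gauss(5, 4), rng.gauss(0.3, 0.3)
        X.append([max(block_size, 0.0), max(ks, 0.0)])
        y.append(1 if is_paleolog else 0)
    return X, y


def fit_stump(X, residuals):
    """Pick the feature/threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lmean, rmean)
    return best[1:]


def fit_boosted_stumps(X, y, n_rounds=15, learning_rate=0.3):
    """Gradient boosting on the logistic loss: each stump fits y - p, the negative gradient."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1.0 - p))  # initial score: log-odds of the paleolog class
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lval, rval = fit_stump(X, residuals)
        stumps.append((j, t, lval, rval))
        scores = [s + learning_rate * (lval if row[j] <= t else rval)
                  for row, s in zip(X, scores)]
    return f0, learning_rate, stumps


def predict_proba(model, row):
    """Estimated probability that a gene (feature row) is a paleolog."""
    f0, lr, stumps = model
    score = f0 + sum(lr * (lval if row[j] <= t else rval) for j, t, lval, rval in stumps)
    return sigmoid(score)


rng = random.Random(0)
X_train, y_train = make_synthetic_genes(120, rng)
X_test, y_test = make_synthetic_genes(80, rng)

model = fit_boosted_stumps(X_train, y_train)
predictions = [1 if predict_proba(model, row) >= 0.5 else 0 for row in X_test]
accuracy = sum(int(p == t) for p, t in zip(predictions, y_test)) / len(y_test)
print(f"Held-out accuracy on synthetic genes: {accuracy:.2f}")
```

Each boosting round adds a small correction fitted to the genes the current model still misclassifies, which is the general idea behind gradient boosted trees. The stumps and synthetic features here only stand in for the deeper trees and empirical genomic features a production classifier would use.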


1 - University of Arizona, Department Of Ecology & Evolutionary Biology, PO Box 210088, Tucson, Arizona, 85721-0088, United States

Machine learning
Whole genome duplication
genome evolution

Presentation Type: Oral Paper
Session: CGT1, Comparative Genomics/Transcriptomics I
Location: /
Date: Tuesday, July 20th, 2021
Time: 10:00 AM (EDT)
Number: CGT1001
Abstract ID: 282
Candidate for Awards: None

Copyright © 2000-2021, Botanical Society of America. All rights reserved