Enhancing Quality and Use of Herbarium Collection Data through Community Data Curation

Miller, Joe [1], Nelson, Gil [2], Krimmel, Erika [3], Bruneau, Anne [4], Thiers, Barbara M. [5], Knapp, Sandy [6].

Data aggregation over the last 20 years has made an impressive amount of collection data available for public use, but this represents only a small percentage of the specimens currently in museums. While digitization of these collections must remain a focus, there is a growing realization that the quality of data already digitized can be improved. Demand for high-quality data is strong: over 5,000 peer-reviewed publications have used data mediated by the Global Biodiversity Information Facility (GBIF) in the past 10 years, with use increasing every year. However, some aggregated data lack the level of metadata and precision required for rigorous scientific use. The most common ways for researchers to give data providers feedback about potential data errors are direct emails or data aggregator helpdesk mechanisms such as GitHub, which are indirect and slow. Aggregated herbarium data are used for many research purposes, and after download researchers undertake several rounds of cleaning to improve fitness for purpose. Unfortunately, there is no clear way to round-trip these cleaned data back to the data provider, so the added value of the work is lost and the work is doomed to be repeated. Several recent national and international reports and strategies have emphasized the need to make better use of collection data in developing science-based strategies to protect biodiversity from climate change and other human impacts. Improving the quality of the data extracted from collections is the quickest, highest-impact first step in enhancing collection data use. Currently, data are held in thousands of independent collection management systems and aggregated at multiple geographic and taxonomic levels. This system is not well integrated and precludes simple data annotations by workers outside the immediate data-holding institution.
This colloquium will describe the work of several initiatives to build expert, community-curated annotation projects and systems that improve data quality. These include plant taxonomists and ecologists curating expert taxonomies and occurrence distributions for a particular clade of interest, and initiatives by data aggregators to improve data quality as data enter and exit their domains. The colloquium will also outline recent progress toward an integrated global annotation system built around persistent unique identifiers in the extended/digital specimen framework, as well as current data integrations performed by iDigBio and GBIF. A goal of the colloquium is to inform botanists about this progress and encourage their participation in the process.

