Genomic Biodiversity

GBWG operates as an interest group under both TDWG and the Genomic Standards Consortium (GSC) to foster discussion between the biodiversity and genomics communities.

GitHub

Image by Cameron Venti

Conveners

  • Gabi Droege - Botanical Garden and Botanical Museum, Berlin, Germany
  • Katharine Barker - Smithsonian Institution, National Museum of Natural History, Washington DC, USA
  • John Deck - University of California at Berkeley, CA, USA

Core members

  • Katie Barker
  • John Deck
  • Gabi Droege
  • Chris Hunter
  • Chris Meyer
  • Ramona Walls
  • Thomas Stjernegaard Jeppesen
  • Dmitry Schigel
  • Mari Pent
  • Lynn Schriml
  • Maxime Sweetlove
  • Raïssa Meyer
  • Michael Hope
  • Saara Suominen
  • Pieter Provoost
  • Ward Appeltans

Motivation and scope

Biodiversity genomics is a fast-growing field of study that describes biological variation in all its dimensions from the foundational DNA layer to organisms and ecosystems, phylogeny and function. Much of the data collected in such efforts currently lacks a consistent vocabulary, a standards-based representation, or an agreed mechanism for dissemination and integration in the public domain. This group focuses on metadata about genomic and metagenomic samples, not on the management of the sequence data itself. It also facilitates discussion of use cases, forms task groups to produce specific deliverables, and communicates relevant advances in biodiversity genomics technologies, vocabularies, and standards to the wider community of genomics data managers.

Any TDWG working groups and task groups associated with genomics data will fall under this newly formed interest group, as will biodiversity-related task and interest groups from the Genomic Standards Consortium (GSC).

Goals, outputs, and outcomes

Goals

  • Identify and fill gaps in standards for sharing genomic data and the material samples used to derive them.
  • Coordinate across other working groups and standards, e.g., Global Genome Biodiversity Network (GGBN) data standards task force and GGBN data standard; GSC and Minimum Information about any (x) Sequence (MIxS); International Society for Biological and Environmental Repositories (ISBER) and Sample PREanalytic Code (SPREC); Biospecimen Reporting for Improved Study Quality (BRISQ).
  • Create a task group for contextualizing and handling workflows as well as data standards for environmental DNA samples.
  • Create a task group for high-throughput next-generation sequencing library samples to develop data standards similar to the GGBN Data Standard.

Outputs and outcomes

  • Additional gaps in genomic data uses defined, standards reviewed and ratified, standard data dictionaries updated, white papers published.
  • Reduced redundancy across genomics standards working groups and associated standards, increased efficiency for addressing community needs and filling gaps.

Strategy

The interest group will work with other data aggregators and communities to identify gaps in the pipelines, best practices, and vocabularies for publishing genomic collections data and provide use cases on genomic collections data management.

A list of known gaps and use cases can be found in the GGBN wiki use case collection: https://wiki.ggbn.org/ggbn/Use_Case_Collection

Becoming involved

This group welcomes participation from interested parties with backgrounds in informatics, biodiversity, molecular collections, genetics, technical architecture, or taxonomy. We propose the TDWG website as the organizing point for this group. Prospective members should email the conveners for more information.

The benefit of inclusion in this group is to be informed of, influence, and promote new technologies and standards related to genomic biodiversity collections, data, and research. Members will explore new avenues of research for both biologists and informaticians, and gain the opportunity to work directly with a globally diverse set of participants.

Please subscribe to our open mailing list to be informed about upcoming meetings and news. We will post working activities in the issues area of GitHub.

Resources

Related articles:

  • Deck, et al. (2013) Clarifying Concepts and Terms in Biodiversity Informatics, Standards in Genomic Sciences. https://doi.org/10.4056/sigs.3907833
  • Droege, G., Barker, K., Seberg, O., Coddington, J., Benson, E., Berendsohn, W.G., Bunk, B., Butler, C., Cawsey, E.M., Deck, J., Döring, M., Flemons, P., Gemeinholzer, B., Güntsch, A., Hollowell, T., Kelbert, P., Kostadinov, I., Kottmann, R., Lawlor, R.T., Lyal, C., Mackenzie-Dodds, J., Meyer, C., Mulcahy, D., Nussbeck, S.Y., Ó Tuama, É., Orrell, T., Petersen, G., Robertson, T., Söhngen, C., Whitacre, J., Wieczorek, J., Yilmaz, P., Zetzsche, H., Zhang, Y., Zhou, X. (2016): The Global Genome Biodiversity Network (GGBN) Data Standard specification. Database. baw125. https://doi.org/10.1093/database/baw125
  • Droege, G., Barker, K., Astrin, J., Bartels, P., Butler, C., Cantrill, D., Coddington, J., Forest, F., Gemeinholzer, B., Hobern, D., Mackenzie-Dodds, J., Ó Tuama, É., Petersen, G., Sanjur, O., Schindel, D., Seberg, O. (2014): The Global Genome Biodiversity Network (GGBN) Data Portal. Nucleic Acids Research. 42 (D1): D607-D612. https://doi.org/10.1093/nar/gkt928
  • Kelbert, P., Droege, G., Barker, K., Braak, K., Cawsey, E.M., Coddington, J., Robertson, T., Whitacre, J., Güntsch, A. (2015): B-HIT - A Tool for Harvesting and Indexing Biodiversity Data. PLoS ONE 10 (11): e0142240. https://doi.org/10.1371/journal.pone.0142240
  • Robbins, et al. (2011) RCN4GSC Meeting Report: Initiating a Testbed for Managing Data at the Interface of Biodiversity and Genomics/Metagenomics, Standards in Genomic Sciences.
  • Seberg, O., Droege, G., Barker, K., Coddington, J.A., Funk, A., Gostel, M., Petersen, G. & Smith, P.P. (2016): Global Genome Biodiversity Network: Saving a blueprint of the Tree of Life – A botanical perspective. Annals of Botany. https://doi.org/10.1093/aob/mcw121
  • Wooley, et al. (2009) Extending Standards for Genomics and Metagenomics Data: A Research Coordination Network for the Genomic Standards Consortium (RCN4GSC), Environmental Microbiome.

History and context

GGBN

In October 2011, thirty-two representatives from thirteen organizations across Africa, Australia, Europe, and North, Central, and South America convened for a two-day workshop in Washington, D.C. This workshop produced preliminary plans for an international coordinating mechanism for biodiversity biobanks. Participants agreed to form a global network that would encourage the formulation and use of best practices, common standards, and global accessibility. Between October 2011 and June 2014, a series of interim Executive Committee meetings was held and three task forces were established to develop and implement a strategic plan and program of work. The Global Genome Biodiversity Network (GGBN) was thereby formed. Task Forces were established to address:

  • Data Standards and Data Access for Genomic Samples;
  • Policies and Practices Related to Management and Stewardship of Genomic Samples;
  • Marketing and Outreach (hereafter Communications and Outreach).

Data Standards and Data Access for Genomic Samples Task Force

The GGBN data standards task force was formed to standardize data and associated infrastructure for molecular collections. Both Darwin Core (DwC) and Access to Biological Collection Data (ABCD) lack terms for molecular data. Thus in 2007 GGBN, as part of the precursor project DNA Bank Network, began developing a standard for biodiversity DNA biobanks accompanying natural history and culture collections. The result was the DNA extension of ABCD, ABCDDNA. Between 2012 and 2015, GGBN undertook major revisions of ABCDDNA and incorporated other existing standards related to molecular data or tissue data. The outcome was the GGBN Data Standard, which incorporates all molecular terms of MIxS and can also handle SPREC and large parts of BRISQ. From this, the GGBN data portal was developed in 2015 to provide standardized access to genomic samples for research. We recognize that the standardization of molecular collections is an ongoing process. As a result, this group will assess and address gaps in molecular collections data infrastructure in an effort to streamline pipelines for research.

The Genomic Biodiversity Working Group

The Genomic Biodiversity Working Group has operated as an interest group under both TDWG and the Genomic Standards Consortium (GSC) since 2012 with the purpose of fostering discussion between the biodiversity and genomics communities and promoting thoughtful interactions of standards-based vocabularies as well as ontology development.

Summary

The Biodiversity Genomics Interest Group works on standardizing data and associated infrastructure for all biodiversity genomics data including barcoding, metabarcoding, next-generation sequencing, environmental DNA, and molecular biodiversity collections (e.g. natural history collections, botanic gardens, culture collections, and zoos). The interest group will work with other data aggregators and communities to identify gaps in the pipelines and best practices for publishing molecular collections data and provide use cases on molecular collections data management. This TDWG interest group encompasses the GGBN Data Standards Task Force, which ensures that information associated with biobank management is considered when developing pipelines for publication of molecular data for research. This includes not only the standardization of data and associated data infrastructure, but also the management of physical molecular collections, including legal matters. This TDWG interest group works closely, and overlaps in membership, with the GSC Compliance and Interoperability Group (CIG).

Task groups