A globally synthesised and flagged bee occurrence dataset and cleaning workflow

James B Dorey; Erica E Fischer; Paige R Chesshire; Angela Nava-Bolaños; Robert L O'Reilly; Silas Bossert; Shannon M Collins; Elinor M Lichtenberg; Erika M Tucker; Allan Smith-Pardo; Armando Falcon-Brindis; Diego A Guevara; Bruno Ribeiro; Diego de Pedro; John Pickering; Keng-Lou James Hung; Katherine A Parys; Lindsie M McCabe; Matthew S Rogan; Robert L Minckley; Santiago J E Velazco; Terry Griswold; Tracy A Zarrillo; Walter Jetz; Yanina V Sica; Michael C Orr; Laura Melissa Guzman; John S Ascher; Alice C Hughes; Neil S Cobb

doi:10.1038/s41597-023-02626-w

Sci Data. 2023 Nov 2;10(1):747. doi: 10.1038/s41597-023-02626-w.

Authors

James B Dorey¹, Erica E Fischer², Paige R Chesshire³, Angela Nava-Bolaños⁴, Robert L O'Reilly⁵, Silas Bossert⁶,⁷, Shannon M Collins⁸, Elinor M Lichtenberg⁸, Erika M Tucker⁹, Allan Smith-Pardo¹⁰, Armando Falcon-Brindis¹¹, Diego A Guevara¹², Bruno Ribeiro¹³, Diego de Pedro¹⁴, John Pickering¹⁵, Keng-Lou James Hung¹⁶, Katherine A Parys¹⁷, Lindsie M McCabe¹⁸, Matthew S Rogan¹⁹,²⁰, Robert L Minckley²¹, Santiago J E Velazco²², Terry Griswold¹⁸, Tracy A Zarrillo²³, Walter Jetz¹⁹,²⁰, Yanina V Sica¹⁹,²⁰, Michael C Orr²⁴,²⁵, Laura Melissa Guzman²⁶, John S Ascher²⁷, Alice C Hughes²⁸, Neil S Cobb⁹

Affiliations

¹ College of Science and Engineering, Flinders University, Sturt Rd, Bedford Park, 5042, SA, Australia.
² Centre for the History of Science, Technology, and Medicine, Department of History, King's College London, Strand, WC2R 2LS, London, United Kingdom.
³ Department of Biological Sciences, Northern Arizona University, S Beaver St, Flagstaff, 86011, AZ, USA.
⁴ Unidad Multidisciplinaria de Docencia e Investigación, Facultad de Ciencias, Campus Juriquilla, Universidad Nacional Autónoma de México, Boulevard Juriquilla, Jurica La Mesa, Juriquilla, 76230, Querétaro, México.
⁵ College of Science and Engineering, Flinders University, Sturt Rd, Bedford Park, 5042, SA, Australia.
⁶ Department of Entomology, Washington State University, Dairy Rd, Pullman, 99164-6382, WA, USA.
⁷ Department of Entomology, National Museum of Natural History, Smithsonian Institution, 10th and Constitution Avenue, Washington, 20560, DC, USA.
⁸ Department of Biological Sciences and Advanced Environmental Research Institute, University of North Texas, W Mulberry St, Denton, 76201, TX, USA.
⁹ Biodiversity Outreach Network, W Silver Spruce Ave, Flagstaff, 86001, AZ, USA.
¹⁰ Animal Plant Health Inspection Service (APHIS); Plant Protection and Quarantine (PPQ); Science and Technology (S&T); Pest Identification Technology laboratory (PITL) United States Department of Agriculture (USDA), St. Suite, Sacramento, CA, 95814, USA.
¹¹ Department of Entomology, Research and Education Center, University of Kentucky, University Dr, Lexington, KY, 42445, USA.
¹² Departamento de Biología, Universidad Nacional de Colombia, Bogotá, Cra 45 #268-5, D.C., Colombia.
¹³ Programa de Pós-graduação em Ecologia e Evolução, Universidade Federal de Goiás, Goiânia, Av, Esperança, 74690-900, GO, Brazil.
¹⁴ Ensenada Center for Scientific Research and Higher Education, Carr. Tijuana-Ensenada, Zona Playitas, 22860, Ensenada, Baja California, Mexico.
¹⁵ Discover Life, Blue Heron Drive, Athens, GA, 30605, USA.
¹⁶ Oklahoma Biological Survey, University of Oklahoma, Chesapeake St, Norman, 73019, OK, USA.
¹⁷ USDA ARS Pollinator Health in Southern Crop Ecosystems Research Unit, Experiment Station Rd, Stoneville, 38776, MS, USA.
¹⁸ USDA-ARS Pollinating Insects-Research Unit, Old Main Hill, Logan, 84322, UT, USA.
¹⁹ Center for Biodiversity and Global Change, Yale University, Prospect St, New Haven, 06511, CT, USA.
²⁰ Department of Ecology & Evolutionary Biology, Yale University, Prospect St, New Haven, 06511, CT, USA.
²¹ Department of Biology, University of Rochester, Rochester, 14620, NY, USA.
²² Instituto de Biología Subtropical, Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad Nacional de Misiones, Puerto Iguazú, Misiones, Argentina.
²³ The Connecticut Agricultural Experiment Station, Huntington St, New Haven, 06511, CT, USA.
²⁴ Entomologie, Staatliches Museum für Naturkunde Stuttgart, Rosenstein, Stuttgart, 70191, Baden-Württemberg, Germany.
²⁵ Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beichen West Road, Beijing, 100101, China.
²⁶ Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Trousdale Pkwy, Los Angeles, 90089-0371, CA, USA.
²⁷ Department of Biological Sciences, National University of Singapore, Science Dr, 117558, Singapore, Singapore.
²⁸ School of Biological Sciences, University of Hong Kong, Pok Fu Lam Rd, Lung Fu Shan, Hong Kong.

Abstract

Species occurrence data are foundational for research, conservation, and science communication, but the limited availability and accessibility of reliable data represents a major obstacle, particularly for insects, which face mounting pressures. We present BeeBDC, a new R package, and a global bee occurrence dataset to address this issue. We combined >18.3 million bee occurrence records from multiple public repositories (GBIF, SCAN, iDigBio, USGS, ALA) and smaller datasets, then standardised, flagged, deduplicated, and cleaned the data using the reproducible BeeBDC R-workflow. Specifically, we harmonised species names (following established global taxonomy), country names, and collection dates, and we added record-level flags for a series of potential quality issues. These data are provided in two formats: "cleaned" and "flagged-but-uncleaned". The BeeBDC package, with its online documentation, lets end users modify filtering parameters to suit their research questions. By publishing reproducible R workflows and globally cleaned datasets, we can increase the accessibility and reliability of downstream analyses. This workflow can be implemented for other taxa to support research and conservation.
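The distinction between the "flagged-but-uncleaned" and "cleaned" formats can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not use or reproduce the BeeBDC package: the function names, the flag column names, and the three checks (coordinates present, coordinates in range, not a duplicate) are all hypothetical and chosen only for illustration. The idea shown is that flagging adds one boolean column per check without dropping any record, while cleaning is a later, user-controlled filter on whichever flags matter for a given question.

```python
# Hypothetical illustration only: not the BeeBDC API, and the flag names are invented.

def flag_records(records):
    """Return copies of the records with boolean flag columns added.

    A flag value of True means the record passes that check.
    No record is removed at this stage ("flagged-but-uncleaned").
    """
    seen = set()
    flagged = []
    for rec in records:
        out = dict(rec)
        lat, lon = rec.get("lat"), rec.get("lon")
        out["flag_has_coords"] = lat is not None and lon is not None
        out["flag_coords_in_range"] = (
            out["flag_has_coords"] and -90 <= lat <= 90 and -180 <= lon <= 180
        )
        # Assumed duplicate key: normalised species name, date, and coordinates.
        key = (rec["species"].strip().lower(), rec["date"], lat, lon)
        out["flag_not_duplicate"] = key not in seen
        seen.add(key)
        flagged.append(out)
    return flagged


def clean_records(flagged, checks):
    """Keep only records that pass every check named in `checks` ("cleaned")."""
    return [rec for rec in flagged if all(rec[c] for c in checks)]


records = [
    {"species": "Apis mellifera", "date": "2020-01-05", "lat": -35.0, "lon": 138.6},
    {"species": "apis mellifera ", "date": "2020-01-05", "lat": -35.0, "lon": 138.6},
    {"species": "Bombus terrestris", "date": "2019-06-11", "lat": None, "lon": None},
    {"species": "Megachile rotundata", "date": "2021-07-02", "lat": 95.0, "lon": 10.0},
]

flagged = flag_records(records)
all_checks = ["flag_has_coords", "flag_coords_in_range", "flag_not_duplicate"]
strict = clean_records(flagged, all_checks)
dedup_only = clean_records(flagged, ["flag_not_duplicate"])
```

Because the flags are kept alongside the data, a user who only cares about duplicates can filter on that one column (`dedup_only`), while a user who needs mappable points can apply every check (`strict`).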
