Specify SPNHC Sessions and Talks
We at Specify are excited to announce our participation in the upcoming SPNHC 2025 Annual Meeting. The Society for the Preservation of Natural History Collections (SPNHC) is hosting this year's annual meeting in Specify's own backyard: the University of Kansas in Lawrence, Kansas. The theme for this year's meeting is Sustainable Futures: Challenges and Opportunities for Modern Collections.
Specify staff are excited to get involved and interact with our participating members! We have a full calendar for the meeting, giving us every opportunity to connect with our existing and potential members as well as the community as a whole. Our activities include two Sessions and three talks: we are speaking in each of the two Sessions and giving a demo during DemoCamp.
We are extremely grateful to our Specify members speaking in our Specify Spotlight Session and to the Collections Community members speaking in our Things to Know Before Publishing Your Data Session. We could not have done this without you!
SYM07 - Specify Spotlight: Improving Data Digitization and Management
08:30-10:30 Friday, 30 May, 2025
The Specify Spotlight session aims to showcase the role Specify software has played in improving data digitization and management. The session will feature presentations from several users who will highlight their projects involving Specify software for research, data management, and collaborations. The presentations will cover a diverse range of topics, including collection digitization, data integration, and specimen tracking. In addition to user presentations, a member of the Specify Consortium team will deliver an overview of the latest features, updates, and future developments in Specify 7. Attendees will have the opportunity to ask questions and engage in discussions about best practices, technical support, and potential future collaborations.
Navigating Complexities: Implementing Specify 7 at the Canadian Forest Service
Kathryn Jastremski, Jonathan Tardif, Heryk Julien, Amelie Potvin
Canadian Forest Service, Laurentian Forestry Centre, Quebec, Quebec, Canada
Abstract
The Canadian Forest Service (CFS) at Natural Resources Canada maintains a pan-Canadian network of biological collections comprising over 1M specimens and samples. However, limited digitization and related metadata, and the absence of a consolidated collection management system have hindered the discovery, sharing, and use of these collections. As a result, their potential to inform research, decision-making, and policy – both within the department and for the public – has been significantly constrained.
As a crucial step in modernizing these collections by bringing them into the digital era, CFS has implemented Specify 7 as a consolidated collection management system. This presentation will explore the challenges encountered during the implementation process, particularly those related to the security authorization process and the complex governance structures required to manage decentralized biological collections in different federal scientific organizations.
Beyond enhancing data management, Specify 7 has served as a catalyst for broader organizational change. Previously managed in silos, collections are transitioning toward a more collaborative governance approach, where decisions are made based on disciplines within Specify 7 rather than individual collections. However, sustaining this transformation requires additional financial support and overarching guidelines to ensure equitable management across collections.
Ultimately, the success of CFS’ collections digitization initiative extends beyond its technical implementation – it represents a shift in how collection work is conducted, fostering greater collaboration and information sharing.
Integrating Specify7 into a National Mass Digitization Infrastructure
Fedor Steeman
Statens Naturhistoriske Museum, Copenhagen, Denmark
Abstract
The DaSSCo project is the Danish implementation of the pan-European DiSSCo initiative, aiming to create a unified digital infrastructure for European scientific collections. Currently, three separate institutions in Denmark are actively involved in this effort. The local infrastructure includes physical workstations for rapid imaging and data registration, along with specialized software solutions and data processing pipelines.
The data and media assets generated are destined for various installations of the Specify7 collections management system. Despite facing several challenges, Specify7’s loosely coupled architecture and customization options make it a highly favorable solution for our needs. For instance, we developed the Asset Registry System (ARS) to store media assets and metadata, serving as a more powerful alternative to Specify’s own web asset server. Additionally, Specify7 facilitates quick and easy publication to GBIF, enhancing the accessibility and utility of the digitized collections.
Integrating Specify into an efficient workflow for managing the sampling of herbarium specimens at the Royal Botanic Garden Edinburgh
Robyn Drinkwater, Robert Cubey, Elspeth Haston
Royal Botanic Garden Edinburgh, Edinburgh, United Kingdom
Abstract
Following the migration of the Royal Botanic Garden Edinburgh Herbarium (E) Collection Management System to Specify in 2022, we have developed new workflows and processes for managing the sampling of herbarium specimens.
Sampling of herbarium specimens is playing an increasingly crucial role in the collections’ workflow in the herbarium. Requests for material are increasing at an unprecedented rate as the number of genetic studies continues to grow and new techniques increase the success of genetic sequencing of older preserved material. Mass digitisation and automated data aggregation are also opening up collections enabling specimen discovery.
To streamline this process, we have developed an efficient system that integrates the herbarium team, the wider Science division and requests from external researchers. By connecting specimens to the permits and to information about any restrictions on the material, initial decisions about the suitability of material for sampling can be made. The re-purposed Gift table allows us to effectively manage the sampling process. Sample preparations are created, which can be sent to researchers. Unused material or derivatives, as well as resulting data links (i.e. BOLD & GenBank), can also be managed in this workflow.
Adding Value: Semi-Automated Label Transcription at the Museum für Naturkunde Berlin
Franziska Schuster, Stefanie Krause, Christian Bölling, Margot Belot, Leonardo Preuss
Museum für Naturkunde - Leibniz Institute for Evolution and Biodiversity Research, Berlin, Germany
Abstract
Opening its extensive collections to the scientific world as well as to the broad public is one of the main missions of the Museum für Naturkunde Berlin (MfN). For this purpose, the MfN aims to digitize its 30 million collection objects, making them virtually accessible. The diverse characteristics of collection objects require specific digitization approaches for different parts of the collection. For the 15 million entomological specimens housed at the MfN, the labels assigned to each individual insect are of particular interest. They carry valuable information on each specimen, for example on collecting events, taxonomic decisions and type statuses. Enriching the collection database with this information would allow targeted and detailed queries about the MfN holdings, supporting effective collection management as well as scientific and other projects in a wider range of fields. However, transcribing these label texts is a challenge due to the enormous number of documents and their fragmented nature. The often poor preservation state of the historic labels results in bad readability which is reinforced by the broad mixture of different print types, handwritings and alphabets. This talk will summarize our different experimental approaches to develop and establish semi-automated procedures at the MfN to transcribe insect labels. We will especially focus on our cooperation with the Austrian company READ-COOP SCE, in the course of which we learned a lot about the challenges of OCR for handwriting, transcription rules and the creation of training data. The project is building on a former MfN pilot project to explore a promising mass-digitization approach for insects, resulting in high-resolution images of more than half a million specimens and its corresponding labels.
Data Management at the University of Kansas Invertebrate Paleontology Collection (KUMIP) Using Specify
Natalia López Carranza
Biodiversity Institute, University of Kansas, Lawrence, KS, USA
Abstract
The University of Kansas Biodiversity Institute Invertebrate Paleontology Collection (KUMIP) houses approximately one million fossil invertebrate and microfossil specimens representing all major invertebrate taxonomic groups from every continent and geological time period, with particular emphasis on arthropods, brachiopods, echinoderms, and mollusks. Over the last two decades, digitizing specimens and their associated data has been a top priority at KUMIP. Thanks to various successful digitization projects, over 570,000 specimens from our collection have been digitized, accounting for about 57% of the total holdings. Currently, our records and metadata–including specimens, localities, taxonomies, biographical information, and more–are managed using the Specify Collection Management Platform. In this presentation, we will explore how our database is structured and organized, what specimen information is being captured, how these metadata are encoded in regularized fields mapped to best-practice community vocabularies such as Darwin Core, and how we track and manage multiple types of non-specimen information critical to the operation of our collection (e.g., physical storage, accessions, loans, etc.). We will also highlight the metadata fields that are most relevant to the discipline of invertebrate paleontology. Additionally, we will showcase recent changes implemented to enhance the user interface of our database, making it more intuitive and easier to navigate—a testament to the platform’s customization capabilities. Finally, we will discuss future changes and updates we plan to make to our data management system and workflows, including areas for improvement and new Specify features (e.g., GeoSpecify/Specify 7.10) that we would like to implement in our database going forward.
COGs, COTs, and Other Interesting Acronyms in Specify 7
Grant Fitzsimmons, Theresa Miller
Specify Collections Consortium, Lawrence, KS, USA
Abstract
The Specify Collections Consortium (SCC) supports research collections around the world with an intuitive, robust, and highly-customizable software platform for digitizing their holdings and managing their collections. For over 25 years, the SCC has supported biological research museums, biorepositories, and now geological research museums, with its open-source Specify software for managing, integrating, and publishing collections information.
Recent Specify software updates enable SCC members to better catalog heterogeneous collections, whether that be different discipline types, object types, or preparations. Specify now allows users to differentiate Collection Object Types (COTs) in their collection and assign different catalog number formats, classification trees, data entry forms, and controlled vocabularies to specific types.
Users can also utilize new Collection Object Groups (COGs) which group Collection Object records based on physical or in situ relationships, allowing the user to document the metadata associated with each individual part of the group. COGs can be used to capture information about multiple objects within a microscope slide, herbarium sheet, rock plate, tissue sample, fossil matrix, and more.
The SCC is a collaborative collections community governed, funded, and sustained by its member institutions. Our software and technical support priorities address member and system infrastructure needs. Specify software enables natural history museums to curate, integrate, and publish data from a single, robustly-engineered platform for biological and geological collections.
SYM08 - Things to Know Before Publishing Your Data
08:30-10:30 Thursday, 29 May, 2025
Data Publishing is an essential part of collection management responsibilities. Deciding which data to share, preparing data to be shared, and selecting who to share your data with, all involve a lot of forethought, planning, and knowledge. This session will focus on presentations from members of the data community on data standards, Darwin Core compliance, exporting and uploading data, avoiding common data-sharing mistakes, and various publishing platform options and their tools.
You put what, where?! Challenges, changes and the here and now of the Darwin Core Standard
David Bloom
University of North Carolina Greensboro, Greensboro, NC, USA. TDWG, San Francisco, CA, USA
Abstract
The Darwin Core Standard (DwC), maintained by Biodiversity Information Standards (TDWG), was intended to be a “stable, straightforward and flexible framework” used to compile biodiversity data from a wide range of sources. Has it realized these goals? Mostly, but it is still evolving, just like the data describing the objects it intends to standardize. This presentation will serve two primary purposes. The first will explore how DwC is applied in data quality, mobilization, and aggregation, including how to map to The Standard, common mistakes and misinterpretations, and key terms and vocabularies. The second will explore current changes to Darwin Core and how the process of mobilization may change in the future.
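For readers newer to the mapping process David describes, here is a minimal sketch of what "mapping to The Standard" can look like in practice. The local column names on the left are invented for illustration; the Darwin Core terms on the right are real terms from the standard.

```python
# A hypothetical source record, as it might sit in a local database export.
source = {
    "cat_num": "KU-12345",
    "sci_name": "Perla decipiens",
    "coll_date": "1987-06-14",
    "lat": 34.7465,
    "lon": -92.2896,
}

# Mapping from invented local column names to standard Darwin Core terms.
DWC_MAP = {
    "cat_num": "catalogNumber",
    "sci_name": "scientificName",
    "coll_date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}

def to_darwin_core(record):
    """Rename local fields to Darwin Core terms, dropping anything unmapped."""
    return {dwc: record[local] for local, dwc in DWC_MAP.items() if local in record}

dwc_record = to_darwin_core(source)
print(dwc_record["scientificName"])  # Perla decipiens
```

Real mappings are rarely this clean: dates must be normalized to ISO 8601, coordinates to decimal degrees, and controlled-vocabulary terms (like `basisOfRecord`) chosen carefully, which is exactly where the common mistakes this talk covers tend to creep in.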
Best practices for publishing useful and high quality specimen datasets to GBIF
John Waller
GBIF, Copenhagen, Denmark
Abstract
High-quality biodiversity data is essential for research, conservation, and policy decisions. Data quality is an important consideration in data publishing to GBIF. Museum collections face challenges in terms of handling data quality issues. In my presentation, I will present a best practices guide for publishing specimen records to GBIF.
GBIF aggregates occurrence records from a vast range of publishers, including museums, citizen science projects, and others. Often end users make assumptions about GBIF mediated occurrences that do not hold for specimen based datasets. I will explore publishing strategies that help novice users navigate museum data better.
Specimen records often need to be retrospectively geo-coded, which can be difficult to do well. While automated methods (like GEOLocate) can assist in assigning coordinates, they can sometimes introduce systematic errors. I will highlight some common errors produced by these methods.
Specimen-based occurrence records often have a complex provenance due to the multiple stages of collection, curation, and digitization they undergo. A single specimen may have been collected decades or even centuries ago, exchanged between institutions, re-identified by different taxonomists, and recorded in multiple databases before being published to GBIF. I will present the GBIF related-records tool, which can help identify possible related records in published datasets.
Publishing high-quality specimen records to GBIF requires careful attention. By following best practices in georeferencing, museum publishers can make their data useful to a wide audience.
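The kinds of georeferencing problems John describes are often caught by simple automated checks before (or after) publication. Here is a simplified sketch of such checks; the flag names are invented for illustration and are not GBIF's actual validation flags.

```python
def georeference_flags(lat, lon):
    """Return a list of quality flags for a candidate coordinate pair.

    These checks mirror common aggregator-side validations: out-of-range
    values, the (0, 0) "null island" artifact, and suspiciously low
    precision (whole-degree values often indicate a country or state
    centroid rather than a true collecting locality).
    """
    if lat is None or lon is None:
        return ["MISSING_COORDINATES"]
    flags = []
    if not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        flags.append("COORDINATES_OUT_OF_RANGE")
    if lat == 0 and lon == 0:
        flags.append("ZERO_COORDINATES")
    if lat == int(lat) and lon == int(lon):
        flags.append("LOW_PRECISION")
    return flags

print(georeference_flags(0.0, 0.0))           # ['ZERO_COORDINATES', 'LOW_PRECISION']
print(georeference_flags(34.7465, -92.2896))  # []
```

Checks like these catch blunders, not subtle errors; a coordinate can pass every rule above and still sit in the wrong county, which is why the talk's emphasis on careful georeferencing practice matters.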
Advancing Specimen Data Publishing through Digital Specimen Infrastructure
Wouter Addink, Soulaine Theocharides, Sharif Islam, Ni Yan
Naturalis, Leiden, Netherlands
Abstract
The Distributed System of Scientific Collections (DiSSCo) envisions a transformative approach to data publishing, integrating the concepts of FAIR (Findable, Accessible, Interoperable, Reusable) Digital Objects and Digital Extended Specimen (DES) to enhance the accessibility, interoperability, and usability of biodiversity data. Traditional specimen-based research is evolving with digital infrastructure, ensuring that data associated with physical specimens are enriched and dynamically linked by machine and expert annotations, and made persistently available.
DiSSCo’s perspective on data publishing emphasizes the need for a FAIR and open ecosystem that supports seamless integration of diverse data sources. The Digital Extended Specimen model extends physical specimens with digital annotations about e.g. genomic data, literature, traits and environmental contexts, forming an interconnected knowledge network. The DES acts as a surrogate of the physical specimen with continuously updated and enriched data, fostering enhanced reproducibility and collaborative research.
By leveraging robust digital infrastructures, DiSSCo aims to establish standardised digitisation and data enrichment workflows supported by AI tools, persistent identifiers, and machine-readable data formats, ensuring that scientific collections contribute effectively to global biodiversity knowledge systems. Through these innovations, DiSSCo is not only redefining the role of collections in scientific research but also promoting sustainable and scalable data publishing practices for the future.
Push or pull, give and take: A history of working with technically diverse data providers in the ALA
Peggy Newman1, Patricia Koh2, Mahmoud Sadeghi2, Niels Klazenga3, Rosemary O’Connor4, Pruthviraj Chavan5, Simon Sherrin1
1 Atlas of Living Australia, Melbourne, Victoria, Australia. 2 Atlas of Living Australia, Canberra, ACT, Australia. 3 Royal Botanic Gardens Victoria, Melbourne, VIC, Australia. 4 Atlas of Living Australia, Brisbane, QLD, Australia. 5 Atlas of Living Australia, Melbourne, VIC, Australia
Abstract
The Atlas of Living Australia (ALA) ingests Darwin Core datasets from all major Australian government jurisdictions, museums & collections, citizen science agencies and several smaller projects and data providers with limited technical resources. Instead of managing a single Integrated Publishing Toolkit (IPT) like most GBIF nodes, the ALA's Data team builds and manages our own Python-based Preingestion Framework, a set of tools that automate data transfers and mappings to Darwin Core using many different technologies, allowing the ALA to customise data management solutions for ourselves or for data providers.
This presentation will outline the history of the framework and reflect on our journey and the lessons we've learned, including our technology choices, our decision to favour full-refresh datasets in lieu of deltas, the never-ending battle with whitespace and encoding, managing unique identifiers, and the huge variation in the levels of support that our data providers require from the team. In future we intend for the framework to handle post-occurrence unified-model data, and to give data providers the ability to push and initiate data loads.
Exploring common questions to iDigBio related to data mobilization
Kalina Jakymec1, Caitlin Chapman2, Austin Mast1,3
1 Department of Biological Science, Florida State University, Tallahassee, Florida, USA. 2 Florida Museum of Natural History, University of Florida, Gainesville, Florida, USA. 3 Institute for Digital Information & Scientific Communication, Florida State University, Tallahassee, Florida, USA
Abstract
iDigBio (the U.S. National Science Foundation’s National Resource for Advancing Digitization of Biodiversity Collections) hosts a portal providing access to millions of specimen records and offers tailored support for collections, including mobilization assistance with data and media. In addition, iDigBio offers a range of online professional development courses through the Digitization Academy, serving curators, collection managers, interns, project staff, and others involved in the digitization of biodiversity data. Feedback from this diverse community highlights recurring themes related to data management, standards, and publishing. In this presentation, we plan to explore commonly asked questions and provide an overview of essential principles for effective data mobilization compiled by iDigBio’s Biodiversity Informatics Coordinator and Professional Development Manager.
Preparing & Publishing Museum Specimen Data: a Graduate Student’s Journey
Lily Hart1, R. Edward DeWalt2
1 University of Illinois, Champaign, IL, USA. 2 Illinois Natural History Survey, Champaign, IL, USA
Abstract
In the USA, states partner with the U. S. Fish and Wildlife Service to protect habitat and wildlife through the development of State Wildlife Action Plans (SWAPs). Wildlife species often enter these SWAPs as Species in Greatest Conservation Need (SGCN). This year, Arkansas will renew their SWAP to include an updated list of stonefly SGCN. However, to update this list the state needs a comprehensive dataset based upon museum specimens and historical literature. Here, we have published a preliminary dataset, made available via the Global Biodiversity Information Facility (GBIF), to aid research scientists, regional taxonomists, and agency managers to assess completeness of sampling, conservation status, and temporal shifts in the distribution of Arkansas stonefly species.
This presentation will highlight how the dataset came to be: the need for re-examination of questionable specimens, the merging of multiple datasets from several institutions into one master list, the tools employed to do so (TaxonWorks, Google Sheets, Microsoft Excel, OpenRefine), best practices, and the importance of Darwin Core standards when sharing biodiversity data. As a student, learning all of the skills needed to publish a dataset from beginning to end was a great deal of work and required guidance from many mentors and colleagues.
In total, our dataset included 3,561 specimen records of Plecoptera for the state of Arkansas, and this is only Version 1 of the dataset. As I continue my work on this project, I also continue working to create a standard operating procedure for others to use, as many funding agencies are now encouraging their awardees to have a plan in place for their data to be shared and accessible.
How the Symbiota Support Hub can help publish your extended specimen data
Lindsay Walker1, Edward Gilbert1,2, Mark Fisher1, K. Samanta Orellana1, Katie Pearson1, Gregory Post1, Nikita Salikov1, Logan Wilt1, Jenn Yost3, Nico Franz1
1 University of Kansas, Lawrence, KS, USA. 2 Arizona State University, Tempe, AZ, USA. 3 California Polytechnic State University San Luis Obispo, San Luis Obispo, CA, USA
Abstract
A key feature contributing to Symbiota’s long-term success is its ability to mobilize data beyond the ecosystem of Symbiota-based data portals. While this open-source software has been adopted by many institutions as a collections management system, Symbiota’s built-in data publishing tools allow it to also be used as a mechanism for sharing datasets with other Darwin Core-compliant data aggregators–even for collections that prefer to use other software to manage their canonical occurrence records. Features that can be leveraged prior to data publishing include Symbiota’s built-in taxonomic and geographic cleaning tools, as well as the scoring of traits and creation of data linkages, such as occurrence associations and additional identifiers, in support of the Extended Specimen. The Symbiota Support Hub (SSH) maintains 18 Symbiota data portals that also function as GBIF installations, which collectively contribute over 500 datasets and 19.9+ million occurrences to the GBIF data portal (https://www.gbif.org) from museums, universities, research laboratories, and similar entities worldwide. In addition to increasing community capacity for using related features in Symbiota-based portals, the SSH facilitates the data publishing process as an Associate Participant Node of GBIF, a status that allows the SSH to endorse new data publishers. This presentation will be aimed at collections administrators who are new to data publishing and would like to learn how Symbiota and the Symbiota Support Hub can help them mobilize their specimen data.
Publishing Your Data with Specify 7
Grant Fitzsimmons, Theresa Miller
Specify Collections Consortium, Lawrence, KS, USA
Abstract
The Specify Collections Consortium (SCC) supports research collections around the world with an intuitive, robust, and highly-customizable software platform for digitizing their holdings and managing their collections. For over 25 years, the SCC has supported biological research museums, biorepositories, and now geological research museums, with its open-source Specify software for managing, integrating, and publishing collections information.
Data publishing is critically important to collection managers and is a continuously evolving requirement. The SCC has prioritized improving the Specify data export/publication process in 2025 by facilitating publishing to data aggregators such as the Global Biodiversity Information Facility (GBIF), the Atlas of Living Australia (ALA), the Global Genome Biodiversity Network (GGBN), and Integrated Digitized Biocollections (iDigBio). Publication focuses on the Darwin Core standard and its extensions as defined and maintained by the Biodiversity Information Standards (TDWG) Darwin Core maintenance group. Specify users can take advantage of direct RSS feed publishing to these data aggregators as well as on-demand export creation.
This presentation will preview upcoming changes to Specify 7's data exporting functionality and describe how Specify users can use them to set up one or more export mappings, incorporate limiting criteria, and publish their data in Darwin Core Archive (DwCA) format or other publishing formats.
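For anyone who has never looked inside a Darwin Core Archive, the format itself is simple: a zip file containing a data table plus a `meta.xml` descriptor that maps each column to a Darwin Core term URI. The sketch below builds a minimal one-file archive; it illustrates the container format only, not Specify's own export implementation, and the field selection is an arbitrary example.

```python
import csv
import io
import zipfile

# Column order for the occurrence core; index 0 doubles as the record id.
FIELDS = ["occurrenceID", "scientificName", "eventDate",
          "decimalLatitude", "decimalLongitude"]

# meta.xml tells consumers (e.g. an aggregator's ingester) how to read the
# data file: delimiter, header handling, and the term bound to each column.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/decimalLatitude"/>
    <field index="4" term="http://rs.tdwg.org/dwc/terms/decimalLongitude"/>
  </core>
</archive>
"""

def build_dwca(records, path="occurrence_dwca.zip"):
    """Write records (dicts keyed by Darwin Core terms) into a minimal DwCA zip."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)  # header row, skipped per ignoreHeaderLines="1"
    for rec in records:
        writer.writerow([rec.get(f, "") for f in FIELDS])
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("occurrence.txt", buf.getvalue())
    return path
```

A full archive from a real collection would typically add an `eml.xml` metadata document and extension files (identifications, multimedia, and so on), which is where tooling like Specify's exporter earns its keep.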
DemoCamp
Specify staff are also presenting at the DemoCamp session.
Improving Data Integrity: Enhanced Batch Editing and Bulk Upload Tools in Specify 7
Theresa Miller, Grant Fitzsimmons
Specify Collections Consortium, Lawrence, KS, USA
Abstract
The Specify Collections Consortium (SCC) is excited to demonstrate the new capabilities added over the past year to its web-based, open-source collection management platform, Specify 7.
The SCC has augmented the Specify data model, data processing logic, data cleaning, and customizable web forms interface to streamline and optimize workflows for digitizing, integrating, and publishing collection holdings. These improvements include the Batch Edit tool and the Workbench. The Batch Edit tool allows a user to bulk edit or enhance existing records. The Workbench allows the user to bulk upload new record identifications, new record citations, and tissue information. The Workbench will soon also allow users to upload and view images when entering data to assist users transcribing label and voucher information directly into the app.
The SCC will present additional Specify 7 enhancements and new features, and answer questions about existing and planned capabilities.
In addition to the Sessions and talks, we will also:
- Host a Booth with lots of information and goodies (until we run out)
  - Kansas Room
  - Representatives will be there during the breaks between sessions
- Host an Open House in our office for anyone who wants to visit or ask questions
  - Friday, May 30, 3:30-5:00 PM
  - Specify Office, 606 Natural History Museum
- Host a two-day workshop, Maximizing Your Specify Instance
  - WKSHP09
  - May 31 - June 1
  - KU Natural History Museum
- And of course, Grant and I, along with other Specify staff, will be around for almost all of the meeting to chat, answer questions, or just say “Hi!”
Registration ends April 30. We hope to see you there!