This guide provides a high-level overview of the data conversion process when migrating from any other data management system to Specify.
1. Export data from existing platform
Before Specify can ingest your data, it must be exported from its current home—whether that’s an Excel spreadsheet, a set of CSV exports, Microsoft Access, or another data management system. In most cases you’ll need to get your data into a spreadsheet-style format (CSV, XLSX, etc.), since Specify’s built-in import tools expect tabular input. The exact steps depend on your source system: you might run an “Export to CSV” routine in your legacy CMS, save Access tables as Excel files, or use a database query to dump the data. Once you have these files, you’ll often find that fields are misaligned with those in Specify, column names are inconsistent, and values lack the standard formatting needed to take full advantage of the Specify data model.
2. Map existing schema to Specify schema
Your first step in shaping legacy data for Specify is understanding how your current fields and tables line up with Specify’s data model. Every collections database evolves differently, so it’s common to find fields in your existing system that don’t have an exact one-to-one counterpart in Specify—or vice versa. To get started, take a look at the full Specify schema and spend a little time browsing the table and field names, relationships, and controlled vocabularies that drive Specify’s structure.
In relational database terms, a “schema” is simply the blueprint that defines where and how data lives: which tables exist, what columns they contain, how those tables relate to one another, and what rules (constraints) govern valid values. The Specify schema is designed to balance stability with flexibility. While the core tables and column types are fixed to ensure data integrity, you can tailor field and table labels, control pick-lists (vocabularies), show or hide fields, and adjust input formats to suit your collection’s workflows.
To perform an initial mapping, we recommend exporting your current schema into a spreadsheet where each row represents one field and each column captures properties such as the source table name, field name, data type, sample values, and any notes about usage or required status. In a parallel column, record the Specify table and field where you think that data should live. As you work row-by-row, you’ll discover three common scenarios:
- Direct matches, where your old field corresponds neatly to a Specify field (for example, “CatalogNumber” → “CatalogNumber”).
- Splits or combinations, where your existing column bundles multiple pieces of information—perhaps a full scientific name that needs to be parsed into genus, species, subspecies, author, and year across five Specify fields.
- Custom additions, where you have unique data that doesn’t fit into a default Specify field. In these cases, you can either repurpose a customizable pick-list or text field, or plan to request a new field via Specify’s schema customization tools.
As you map, watch for differences in “required” status. Specify may expect certain fields (like AgentType or LocalityName) that your legacy system didn’t enforce. Likewise, you might have legacy data fields—such as a bespoke project code or an internal note—that you decide to archive rather than import. We recommend that you take notes in your spreadsheet so you can revisit these decisions once your Specify instance is live.
Before you commit to any transformations, take time to dive into the Specify user interface. Open the Data Model Viewer and Schema Config and click through the table definitions, pick-list editors, and format controls. By seeing how fields are displayed and validated in the application, you’ll get a better understanding of what mapping choices are available and minimize surprises when you run your first test import. Once your spreadsheet mapping is complete, you’ll be ready to move on to cleaning, normalizing, and eventually loading your data into Specify.
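The mapping spreadsheet described above can also live in a small script, which makes it easy to flag rows that still lack a Specify destination. This is a minimal sketch: the legacy table and field names are invented for illustration, and only the Specify names follow the actual schema.

```python
import csv
import io

# Hypothetical crosswalk from a legacy system to Specify.
# The "specimens" table and its fields are made-up examples.
field_map = [
    {"src_table": "specimens", "src_field": "CatNo",
     "specify_table": "CollectionObject", "specify_field": "CatalogNumber",
     "notes": "direct match"},
    {"src_table": "specimens", "src_field": "FullName",
     "specify_table": "Taxon", "specify_field": "(split: Genus, Species, Author)",
     "notes": "split across multiple fields"},
    {"src_table": "specimens", "src_field": "ProjectCode",
     "specify_table": "", "specify_field": "",
     "notes": "no counterpart -- archive or use a custom field"},
]

# Write the crosswalk out as a spreadsheet (an in-memory CSV here).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=field_map[0].keys())
writer.writeheader()
writer.writerows(field_map)

# Any row without a Specify field still needs a mapping decision.
unmapped = [m["src_field"] for m in field_map if not m["specify_field"]]
print("Unmapped legacy fields:", unmapped)
```

Keeping the crosswalk in a machine-readable form like this pays off later, when the same mapping drives your conversion scripts.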
3. Determine data sharing within the organization
A key architectural decision is whether to take a centralized or decentralized approach. Do you want a single Specify database that houses specimens from all departments, or separate databases for each collection? You’ll need to decide which data domains to share—taxonomy lists, agent tables, collecting events, georeference localities, and even custom forms or report templates. Striking the right balance is important so you are not forcing unrelated collections into a one-size-fits-all schema.
You can read our guide on how to configure organizational scoping in your database(s).
4. Data Cleaning
Standardizing Data
The first step in data cleaning is to standardize names and codes so they align with the fields in Specify’s data model. This involves making collector and determiner names consistent, such as combining “J. Smith” and “John Smith” into a single, consistent string used in all applicable locations. It also requires resolving synonyms in your taxonomic determinations and ensuring that dates and chronologies follow a uniform format. Data conversion staff typically use tools like OpenRefine, Excel’s built-in text functions, or simple Python scripts to apply transformations across hundreds or thousands of records.
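A simple Python sketch of this kind of standardization pass: a hand-built variant table maps known name forms onto one canonical agent string. The variants and canonical form shown here are illustrative; in practice the table is built by reviewing the data with collections staff.

```python
# Known variants mapped to a single canonical agent string.
canonical = {
    "J. Smith": "Smith, John",
    "John Smith": "Smith, John",
    "Smith, J.": "Smith, John",
}

def standardize_agent(raw: str) -> str:
    """Return the canonical form, or the cleaned original if unknown."""
    cleaned = " ".join(raw.split())  # collapse stray whitespace
    return canonical.get(cleaned, cleaned)

records = ["J. Smith", "John  Smith", "M. Jones"]
print([standardize_agent(r) for r in records])
```

Names not in the table pass through unchanged, so a second pass can review whatever the script could not resolve.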
All work must be coordinated with the collections staff, who understand and own the data, to ensure the changes are accurate and of high quality. Important context about agents (collectors, determiners, catalogers, authors, etc.) is often not accessible from the database records alone. Understanding the specific terminology for the collection is essential to make sure it is accurately translated.
We recommend keeping backups of the original data sets indefinitely for reference.
Normalizing Data
Normalization is about breaking out compound fields into atomic pieces that fit Specify’s relational design. You’ll parse agent names into separate columns (first name, last name, honorifics) so that Specify can link each person to multiple roles across your database. Locality information gets split into “Collecting Event” and “Locality” tables, and scientific names are divided into genus, species, subspecies, author, and year. By reducing redundancy and enforcing consistent foreign-key relationships, you guarantee that your data remains clean, shareable, and easy to query once it’s in Specify.
Agent
Original String: Grant L Fitzsimmons, grant@ku.edu, ORCID: 0009-0002-5293-3761
Parse into the following columns:
| Table | Field | Value |
| --- | --- | --- |
| Agent | FirstName | Grant |
| Agent | MiddleInitial | L |
| Agent | LastName | Fitzsimmons |
| Agent | Email | grant@ku.edu |
| Agent → Agent Identifier | ORCID | 0009-0002-5293-3761 |
```mermaid
flowchart LR
  %% Agent Parsing Diagram
  subgraph AgentParsing [Agent Parsing]
    A[Original string<br/>Grant L Fitzsimmons, grant at ku.edu, ORCID: 0009-0002-5293-3761]
    B1[FirstName:<br/>Grant]
    B2[MiddleInitial:<br/>L]
    B3[LastName:<br/>Fitzsimmons]
    B4[Email:<br/>grant at ku.edu]
    B5[ORCID:<br/>0009-0002-5293-3761]
  end
  A --> B1
  A --> B2
  A --> B3
  A --> B4
  A --> B5
```
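Assuming the legacy export follows exactly this comma-separated layout (name, email, ORCID), the agent split above can be sketched in a few lines of Python. Real exports will need more defensive parsing.

```python
def parse_agent(raw: str) -> dict:
    """Split a 'First M Last, email, ORCID: id' string into
    Specify Agent columns. Assumes exactly this layout."""
    name_part, email, orcid_part = [p.strip() for p in raw.split(",")]
    first, middle, last = name_part.split()
    return {
        "FirstName": first,
        "MiddleInitial": middle,
        "LastName": last,
        "Email": email,
        "ORCID": orcid_part.removeprefix("ORCID:").strip(),
    }

print(parse_agent(
    "Grant L Fitzsimmons, grant@ku.edu, ORCID: 0009-0002-5293-3761"))
```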
Collecting Event & Locality
Original String: DDB 13-14: 2013-08-22: Suriname, Sipaliwini: Small forest wetland about 10 m from Geijskey Creek on top of Tafelberg, 3.9391833, -56.1827333
Parse into the following columns:
| Table | Field | Value |
| --- | --- | --- |
| Collecting Event | StationFieldNumber | DDB 13-14 |
| Collecting Event | StartDate | 2013-08-22 |
| Locality | LocalityName | Small forest wetland about 10 m from Geijskey Creek on top of Tafelberg |
| Locality | Geography → State | Sipaliwini |
| Locality | Geography → Country | Suriname |
| Locality | Latitude1 | 3.9391833 |
| Locality | Longitude1 | -56.1827333 |
```mermaid
flowchart LR
  %% Collecting Event & Locality Parsing Diagram
  subgraph Parsing [Collecting Event & Locality Parsing]
    A[Original string<br/>DDB 13-14: 2013-08-22: Suriname, Sipaliwini: Small forest wetland about 10 m from Geijskey Creek on top of Tafelberg, 3.9391833, -56.1827333]
    subgraph CE [Collecting Event]
      B1[StationFieldNumber:<br/>DDB 13-14]
      B2[StartDate:<br/>2013-08-22]
    end
    subgraph LOC [Locality]
      C1[LocalityName:<br/>Small forest wetland about 10 m from …]
      C2[State:<br/>Sipaliwini]
      C3[Country:<br/>Suriname]
      C4[Latitude1:<br/>3.9391833]
      C5[Longitude1:<br/>-56.1827333]
    end
  end
  A --> B1
  A --> B2
  A --> C1
  A --> C2
  A --> C3
  A --> C4
  A --> C5
```
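If the legacy strings follow this exact colon-and-comma layout, the split above can be sketched as follows. Real legacy data rarely stays this regular, so a production version would need validation and fallbacks for malformed rows.

```python
def parse_event(raw: str) -> dict:
    """Split a 'FieldNo: Date: Country, State: Locality, lat, lon'
    string into Specify Collecting Event and Locality columns.
    Assumes exactly this colon/comma layout."""
    field_no, date, geography, rest = raw.split(": ")
    country, state = geography.split(", ")
    # The locality name may itself contain commas, so split the
    # coordinates off from the right.
    name, lat, lon = rest.rsplit(", ", 2)
    return {
        "CollectingEvent.StationFieldNumber": field_no,
        "CollectingEvent.StartDate": date,
        "Locality.Geography.Country": country,
        "Locality.Geography.State": state,
        "Locality.LocalityName": name,
        "Locality.Latitude1": float(lat),
        "Locality.Longitude1": float(lon),
    }

row = parse_event("DDB 13-14: 2013-08-22: Suriname, Sipaliwini: "
                  "Small forest wetland about 10 m from Geijskey Creek "
                  "on top of Tafelberg, 3.9391833, -56.1827333")
print(row)
```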
Taxonomy (Trees)
Original String: Abrotanella trilobata
Parse into the following columns:
| Table | Field | Value |
| --- | --- | --- |
| Taxon | Species Author | Swenson |
| Taxon | Species Name | trilobata |
| Taxon | Genus Name | Abrotanella |
| Taxon | Family Name | Asteraceae |
| Taxon | Order Name | Asterales |
```mermaid
flowchart TB
  %% Taxonomy Parsing & Lookup Diagram
  subgraph TaxonParsing [Taxonomy Parsing]
    A[Original string:<br/>Abrotanella trilobata]
    B1[GenusName:<br/>Abrotanella]
    B2[SpeciesName:<br/>trilobata]
    B3[SpeciesAuthor:<br/>Swenson]
    B2 --> B3
    B2 --> B1
    B1 --> F[FamilyName:<br/>Asteraceae]
    F --> O[OrderName:<br/>Asterales]
  end
  A --> B2
```
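Note that the author and higher ranks are not present in the original string; they come from a lookup against a taxonomic reference. A minimal sketch of that enrichment step, with a toy two-entry reference standing in for a real taxonomic backbone or authority file:

```python
# Toy reference tables; a real pipeline would consult an
# authority file or taxonomic backbone instead.
backbone = {
    "Abrotanella": {"FamilyName": "Asteraceae", "OrderName": "Asterales"},
}
authors = {
    ("Abrotanella", "trilobata"): "Swenson",
}

def parse_taxon(name: str) -> dict:
    """Split a binomial and fill author and higher ranks by lookup."""
    genus, species = name.split()
    return {
        "GenusName": genus,
        "SpeciesName": species,
        "SpeciesAuthor": authors.get((genus, species), ""),
        **backbone.get(genus, {}),
    }

print(parse_taxon("Abrotanella trilobata"))
```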
5. Error Checking
After each standardization or normalization pass, we recommend that you run validity checks—making sure dates aren’t in the future, catalog numbers aren’t duplicated, latitude and longitude fall within plausible ranges, and required fields aren’t blank. Spot-checking records, running summary pivot tables in Excel, or writing simple SQL or Python tests can reveal data gaps or anomalies before they arrive in Specify.
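The checks above are easy to express as a short script. This sketch runs four of them over a pair of toy records (the field names mirror Specify; the second record is deliberately broken):

```python
from datetime import date

# Toy records standing in for a cleaned export.
records = [
    {"CatalogNumber": "KU-0001", "StartDate": "2013-08-22",
     "Latitude1": 3.9391833, "Longitude1": -56.1827333},
    {"CatalogNumber": "KU-0001", "StartDate": "2099-01-01",
     "Latitude1": 95.0, "Longitude1": -56.18},
]

errors = []
seen = set()
for i, r in enumerate(records):
    if r["CatalogNumber"] in seen:
        errors.append((i, "duplicate catalog number"))
    seen.add(r["CatalogNumber"])
    if date.fromisoformat(r["StartDate"]) > date.today():
        errors.append((i, "date in the future"))
    if not -90 <= r["Latitude1"] <= 90:
        errors.append((i, "latitude out of range"))
    if not -180 <= r["Longitude1"] <= 180:
        errors.append((i, "longitude out of range"))

for idx, msg in errors:
    print(f"record {idx}: {msg}")
```

Running checks like this after every pass keeps errors close to the transformation that introduced them.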
6. Develop data conversion pipelines
Once your mapping and normalization rules are defined, package them into repeatable scripts or workflows. That might mean an OpenRefine recipe for your agent data, a Python ETL (extract-transform-load) script for locality tables, or a series of CSV templates paired with Specify’s built-in import utility (WorkBench). The goal is to automate as much as possible so that if new legacy data arrives—or if you need to re-run a transformation—you can do so reliably and with minimal manual effort.
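The skeleton of such a pipeline can be very small: read the source spreadsheet, apply the transformation rules, and write a Specify-ready file. In this sketch, in-memory CSVs stand in for real files, and `transform` holds a single illustrative rule.

```python
import csv
import io

def transform(row: dict) -> dict:
    """One repeatable cleanup pass; extend with the rules
    from your mapping spreadsheet."""
    row["CatalogNumber"] = row["CatalogNumber"].strip().upper()
    return row

# In a real pipeline these would be files on disk.
source = io.StringIO("CatalogNumber,Collector\n ku-0001 ,J. Smith\n")
dest = io.StringIO()

reader = csv.DictReader(source)
writer = csv.DictWriter(dest, fieldnames=reader.fieldnames)
writer.writeheader()
for row in reader:
    writer.writerow(transform(row))

print(dest.getvalue())
```

Because the whole pass is a script, re-running it on a fresh export is a one-line command rather than a day of manual editing.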
There may be data cleanup that is best accommodated later using some of the tools provided within Specify, such as Record Merging or Batch Edit.
7. Upload Data Following the Described Procedures
With your cleaned and normalized CSV or Excel files ready, you can start using Specify’s bulk data import tool, the Specify WorkBench. Upload your spreadsheets and follow the field-mapping dialogs to align your source columns with Specify’s database fields. Run a small test batch to ensure everything loads as expected, and then proceed with the full import.
If you have developed custom pipelines for data ingestion using the Specify APIs, these can supplement or replace the WorkBench during this process. We do not recommend performing bulk data imports through direct SQL modifications when you can achieve the same task using the APIs or the WorkBench.
Import Order Suggestion
Always keep backups of your pre-import files and, where possible, import in stages (for example: agents first, then taxonomy, localities, collecting events, then occurrences) so you can isolate and correct any issues quickly. If you have any questions about the order of import operations, please feel free to reach out to us at support@specifysoftware.org.
8. Change Management
Technical Expertise
Transitioning to a new collections management system is not just a data exercise—it’s also fundamentally technical. You’ll need staff or consultants comfortable with database design, scripting languages, and Specify’s configuration options to ensure a smooth implementation.
Communication and Collaboration
Equally important is keeping everyone in the loop. Schedule regular check-ins with curators, collections managers, and volunteers. Host hands-on workshops or personalized Specify webinars where staff can ask questions, suggest custom fields, and watch live demonstrations of their own data in Specify.
Human Factors and Buy-in
Ultimately, the success of your rollout depends on people. Engaging your collections team early, soliciting feedback on field labels or form layouts, and celebrating small milestones will build confidence and enthusiasm. When staff see their own data come to life in Specify—and understand how the new workflows make their work easier—they become champions of the system rather than reluctant adopters.