Importing Taxonomy Using the WorkBench

:bookmark: This article explains how to import taxonomy using the WorkBench.

Taxonomy can be uploaded to your taxon trees in spreadsheet format using the WorkBench.

This process is recommended only for users who are familiar with the WorkBench and with managing trees. To learn more about trees, read the article Trees in Specify. For information about the WorkBench, read the article The Specify 7 WorkBench.

[!caution]
We recommend backing up your Specify database before uploading data with the WorkBench.

Tree Hierarchy

The most important principle to understand when importing taxonomy is that all trees in Specify (Taxonomy, Geography, Storage, etc.) are fundamentally hierarchical. Every record, or “node,” in the tree must have a parent node, except for the highest-level “root” nodes. When you upload a tree from a file, you are essentially providing a set of instructions that tells Specify how to build these parent-child relationships.

Configure Tree Ranks

If you have not yet added tree ranks to your taxon tree definition, you must do this before uploading taxonomy. The tree must include every taxonomic rank (e.g., Kingdom, Phylum, Class, Order, Family, Genus, Species) for which you will import data.

Additionally, you should configure any ranks that should be enforced, meaning the rank cannot be skipped when adding child nodes. For example, if the species rank is enforced, every node at the subspecies level must have a parent node at the species level.
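Before uploading, it can help to check your spreadsheet against the enforced ranks yourself. The following is a minimal sketch of such a check, not Specify's own validation logic. The rank list, the set of enforced ranks, and the function name are assumptions chosen for illustration; replace them with the values from your own tree definition.

```python
# Hypothetical rank order and enforced ranks; adjust to your tree definition.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species", "subspecies"]
ENFORCED = {"kingdom", "family", "genus", "species"}


def missing_enforced_ranks(row):
    """Return enforced ranks that are blank above the lowest filled rank of a row."""
    filled = [i for i, rank in enumerate(RANKS) if row.get(rank, "").strip()]
    if not filled:
        return []
    lowest = filled[-1]
    return [
        rank
        for rank in RANKS[:lowest]
        if rank in ENFORCED and not row.get(rank, "").strip()
    ]


# A subspecies row that skips the family and species columns.
row = {"kingdom": "Animalia", "genus": "Boulengerula", "subspecies": "placeholder"}
print(missing_enforced_ranks(row))  # ['family', 'species']
```

Any row that returns a non-empty list skips an enforced rank and should be corrected before it is uploaded.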

To learn more about creating and editing tree ranks, read the article Editing Tree Definitions/Ranks.

Organize Your Spreadsheet

To create a taxonomic tree in Specify, you must generate a spreadsheet (typically a .csv, .tsv, or .xlsx) where each row represents a complete path from the highest rank down to the lowest rank for that particular entry, including authorship and any other details at each level.

The columns in your spreadsheet should correspond to the ranks in your taxon tree definition. The critical rule is that for every row, you must include the data for all preceding higher-level ranks.

Here is an example of a correctly formatted file. Notice how the higher-level ranks are repeated for each subsequent child row.

| kingdom | phylum | class | order | family | genus | species | species author | species source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Animalia | | | | | | | | |
| Animalia | Chordata | | | | | | | |
| Animalia | Chordata | Amphibia | | | | | | |
| Animalia | Chordata | Amphibia | Gymnophiona | | | | | |
| Animalia | Chordata | Amphibia | Gymnophiona | Herpelidae | | | | |
| Animalia | Chordata | Amphibia | Gymnophiona | Herpelidae | Boulengerula | | | |
| Animalia | Chordata | Amphibia | Gymnophiona | Herpelidae | Boulengerula | boulengeri | Tornier, 1896 | https://www.catalogueoflife.org/data/taxon/5WQMJ |
| Animalia | Chordata | Amphibia | Gymnophiona | Herpelidae | Boulengerula | changamwensis | Loveridge, 1932 | https://www.catalogueoflife.org/data/taxon/MR6V |

When Specify processes this file, it reads each row and builds the tree structure. For example, the last row tells the system: “The species changamwensis belongs to the genus Boulengerula, which belongs to the family Herpelidae,” and so on. If a node like Herpelidae has already been created by a previous row, Specify will simply link to the existing node instead of creating a duplicate.
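The idea of walking each row from the highest rank down and reusing nodes that already exist can be sketched in a few lines of Python. This is only an illustration of the principle, not how the WorkBench is implemented: the function name is hypothetical, and nodes are matched by name alone, whereas the WorkBench also compares the other mapped fields.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def build_tree(rows):
    """Build nested dicts keyed by taxon name and count newly created nodes."""
    root = {}
    created = 0
    for row in rows:
        children = root
        for rank in RANKS:
            name = row.get(rank, "").strip()
            if not name:
                break  # the row ends at its lowest filled rank
            if name not in children:
                children[name] = {}  # new node under the current parent
                created += 1
            children = children[name]  # reuse the existing node otherwise
    return root, created


base = {
    "kingdom": "Animalia", "phylum": "Chordata", "class": "Amphibia",
    "order": "Gymnophiona", "family": "Herpelidae", "genus": "Boulengerula",
}
rows = [dict(base, species="boulengeri"), dict(base, species="changamwensis")]

tree, created = build_tree(rows)
print(created)  # 8: seven nodes from the first row, one new species from the second
```

The second row creates only one node because every higher-level name was already present under the same parent.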

Including Other Data

Your data set is not limited to just the names of the ranks. You can include any other fields that belong to the Taxon table, such as the author, source, or common name. You simply need to add more columns to your spreadsheet with headers that match the field names in Specify. In the example above, species author and species source are mapped to the corresponding fields for the taxon record at the species level.

For more detailed examples and comprehensive files that you can use as a template, please visit our official repository of taxonomic data files here, particularly those from 2021:

Index of /taxonfiles

WorkBench Mapping

First, choose the appropriate base table for your upload. The Taxon base table is generally the safest option when uploading taxonomy since it is easier to spot and fix mistakes without extra relationships. If you use a different base table, remember that the same rules of organization still apply; a parent record should be referenced for each new Taxon record, and all Taxon data (author, source, etc.) must be repeated every time a new Taxon record is referenced.

:taxon: Taxon: Use this to create only Taxon records and when importing entire trees.
:collection_object: Collection Object: Use this to link new or existing Taxon records to new Collection Objects using the Determination relationship.
:determination: Determination: Use this to link new or existing Taxon records to new or existing Collection Objects.

Then, match each column from your spreadsheet to the corresponding field in the Taxon table for each rank.

Upload Your Data

When uploading large data sets, you may need the assistance of an IT Administrator to edit the database timeout configuration and increase the maximum packet size. To learn more about this, read this guide. Otherwise, you will need to upload multiple smaller data sets to avoid timeouts when validating and uploading.
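If you need to split a large spreadsheet into smaller data sets, a short script can do the slicing for you. The sketch below assumes the rows have already been read into a list (for example with Python's `csv.DictReader`). The function name and the chunk size are illustrative assumptions, not recommended limits.

```python
def split_rows(rows, size):
    """Split a list of rows into consecutive chunks of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [rows[start:start + size] for start in range(0, len(rows), size)]


# Five placeholder rows split into data sets of at most two rows each.
rows = [{"genus": f"Genus{i}"} for i in range(5)]
chunks = split_rows(rows, 2)
print([len(chunk) for chunk in chunks])  # [2, 2, 1]
```

Because every row carries its full path of higher ranks, each chunk can be uploaded as an independent data set.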

Start with Higher-Level Taxonomy

The tree files available in the Index of /taxonfiles are ready to be imported as a single data set. However, if you created your own spreadsheets, we recommend uploading taxonomy one rank at a time, starting from the highest rank. If you were to upload all taxa in one data set, all data (author, source, etc.) would need to be repeated every time it is used as a parent. If any data is missing in just a single row, the WorkBench will upload it as a new record, resulting in duplicate taxa.
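One way to catch this problem before uploading is to look for taxa whose repeated details differ between rows. The sketch below is a hypothetical pre-upload check, not part of Specify. The `genus author` column and the author value are placeholders invented for the example.

```python
from collections import defaultdict

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def find_inconsistent_taxa(rows, rank, detail_columns):
    """Return taxa at `rank` whose detail columns hold more than one distinct value."""
    depth = RANKS.index(rank) + 1
    seen = defaultdict(set)
    for row in rows:
        path = tuple(row.get(r, "").strip() for r in RANKS[:depth])
        if not path[-1]:
            continue  # this row does not reach the requested rank
        details = tuple(row.get(c, "").strip() for c in detail_columns)
        seen[path].add(details)
    return {path: values for path, values in seen.items() if len(values) > 1}


base = {
    "kingdom": "Animalia", "phylum": "Chordata", "class": "Amphibia",
    "order": "Gymnophiona", "family": "Herpelidae", "genus": "Boulengerula",
}
rows = [
    # "genus author" is a hypothetical column; the value is a placeholder.
    dict(base, **{"genus author": "Example Author, 1900", "species": "boulengeri"}),
    dict(base, **{"genus author": "", "species": "changamwensis"}),  # author left blank
]

problems = find_inconsistent_taxa(rows, "genus", ["genus author"])
for path, values in problems.items():
    print(path[-1], "has", len(values), "different versions of its details")
```

Each taxon reported here would likely be uploaded more than once, so fill in or correct the differing rows first.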

If you upload one rank at a time, you only need to include the name of its existing parent in the spreadsheet, mitigating the risk of creating duplicates. If you notice any mistakes after uploading, you can also roll back individual ranks instead of the entire tree.
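If your data already exists as a single full-path spreadsheet, per-rank data sets can be derived from it. The following sketch extracts the unique taxa at one rank together with the name of their nearest filled parent. The function name and the `parent`/`name` output keys are assumptions for illustration. Matching parents by name alone may be ambiguous if your tree contains homonyms.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def rows_for_rank(rows, rank):
    """Return unique parent/name pairs for one rank, in first-seen order."""
    higher = RANKS[:RANKS.index(rank)]
    result, seen = [], set()
    for row in rows:
        name = row.get(rank, "").strip()
        if not name:
            continue
        # Use the nearest filled higher rank as the parent (empty for root-level taxa).
        parent = next(
            (row[r].strip() for r in reversed(higher) if row.get(r, "").strip()),
            "",
        )
        if (parent, name) not in seen:
            seen.add((parent, name))
            result.append({"parent": parent, "name": name})
    return result


base = {
    "kingdom": "Animalia", "phylum": "Chordata", "class": "Amphibia",
    "order": "Gymnophiona", "family": "Herpelidae", "genus": "Boulengerula",
}
rows = [dict(base, species="boulengeri"), dict(base, species="changamwensis")]

for rank in RANKS:  # highest rank first, matching the recommended upload order
    print(rank, rows_for_rank(rows, rank))
```

Each list can then be saved as its own spreadsheet and uploaded in order, so that every parent already exists when its children are imported.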

Clean Up Your Tree

If you made a mistake during the upload, the Tree Viewer has several tools you can use to clean up your tree without rolling back the data sets. Click the links below for more detailed information about each tool.

:trash_: Delete: Deletes the selected node. This tool only works if the record and its children are not referenced by another record (e.g., a Determination).

:move_: Move: Allows you to move the selected node and its children to another parent.

:merge_: Merge: Allows you to combine two nodes into one record. This is helpful for handling duplicate records after they have already been referenced by other records.