The Advanced Trees helpcast says that we can upload our own taxonomic trees, even in the setup wizard stage, but they have to be formatted correctly. What is the correct formatting?
Hi @vhough,
Getting the formatting correct is the key to making sure your hierarchy is built properly when you begin uploading taxonomy in Specify.
The Core Concept: Hierarchy
The most important principle to understand is that all trees in Specify (Taxonomy, Geography, Storage, etc.) are fundamentally hierarchical. Every record, or “node,” in the tree must have a parent node, except for the highest-level “root” nodes. When you upload a tree from a file, you are essentially providing a set of instructions that tells Specify how to build these parent-child relationships.
File Formatting for Upload
To create a taxonomic tree in Specify, you must generate a spreadsheet (typically a .csv, .tsv, or .xlsx) where each row represents a complete path from the highest rank down to the lowest rank for that particular entry, including authorship and any other details at each level. You then import this spreadsheet using the WorkBench.
The columns in your spreadsheet should correspond to the ranks in your taxon tree definition (e.g., Kingdom, Phylum, Class, Order, Family, Genus, Species). The critical rule is that for every row, you must include the data for all preceding higher-level ranks.
Here is an example of a correctly formatted file. Notice how the higher-level ranks are repeated for each subsequent child row.
| kingdom | phylum | class | order | family | genus | species | species author | species source |
|---|---|---|---|---|---|---|---|---|
| Animalia | | | | | | | | |
| Animalia | Chordata | | | | | | | |
| Animalia | Chordata | Amphibia | | | | | | |
| Animalia | Chordata | Amphibia | Gymnophiona | | | | | |
| Animalia | Chordata | Amphibia | Gymnophiona | Herpelidae | | | | |
| Animalia | Chordata | Amphibia | Gymnophiona | Herpelidae | Boulengerula | | | |
| Animalia | Chordata | Amphibia | Gymnophiona | Herpelidae | Boulengerula | boulengeri | Tornier, 1896 | https://www.catalogueoflife.org/data/taxon/5WQMJ |
| Animalia | Chordata | Amphibia | Gymnophiona | Herpelidae | Boulengerula | changamwensis | Loveridge, 1932 | https://www.catalogueoflife.org/data/taxon/MR6V |
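If you are adapting a third-party file, it can help to check the "no gaps" rule before uploading. The following is a minimal Python sketch, not part of Specify: the function name `find_gaps` is hypothetical, the column names are taken from the example header above, and it assumes every rank listed in `RANKS` is required in your tree definition (trim the list if some ranks are optional in yours).

```python
import csv
import io

# Rank columns from highest to lowest, as in the example header.
# Assumption: all of these are required in your tree definition.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def find_gaps(rows):
    """Return (data_row_number, missing_rank) for every blank higher rank."""
    problems = []
    for number, row in enumerate(rows, start=1):
        values = [(row.get(rank) or "").strip() for rank in RANKS]
        filled = [i for i, value in enumerate(values) if value]
        if not filled:
            # No rank is filled in at all on this row.
            problems.append((number, RANKS[0]))
            continue
        lowest = filled[-1]  # index of the lowest rank that has a value
        for i in range(lowest):
            if not values[i]:
                problems.append((number, RANKS[i]))
    return problems


# Second data row is missing its class, so it breaks the rule.
sample = """kingdom,phylum,class,order,family,genus,species
Animalia,Chordata,Amphibia,Gymnophiona,Herpelidae,Boulengerula,boulengeri
Animalia,Chordata,,Gymnophiona,Herpelidae,Boulengerula,changamwensis
"""
rows = list(csv.DictReader(io.StringIO(sample)))
print(find_gaps(rows))  # [(2, 'class')]
```

Row numbers count data rows only, so the header is not included in the numbering.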
When Specify processes this file, it reads each row and builds the tree structure. For example, the last row tells the system: “The species changamwensis belongs to the genus Boulengerula, which belongs to the family Herpelidae,” and so on. If a node like Herpelidae has already been created by a previous row, Specify will simply link to the existing node instead of creating a duplicate.
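To make the linking behaviour concrete, here is a small Python sketch of the idea. It is only an illustration and not Specify's actual upload code: `build_nodes` is a hypothetical helper that identifies each node by its full path and records the path of its parent.

```python
# Rank columns from highest to lowest, as in the example header.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def build_nodes(rows):
    """Map each node's full path to its parent's path (None for a root)."""
    parents = {}
    for row in rows:
        path = ()
        for rank in RANKS:
            name = (row.get(rank) or "").strip()
            if not name:
                break  # this row ends at a higher rank
            parent = path or None
            path = path + (name,)
            # setdefault leaves an existing node untouched, so repeating
            # the higher ranks on every row never creates duplicates.
            parents.setdefault(path, parent)
    return parents


# Rebuild the eight example rows: six higher-rank rows plus two species.
lineage = ["Animalia", "Chordata", "Amphibia", "Gymnophiona",
           "Herpelidae", "Boulengerula"]
rows = [dict(zip(RANKS, lineage[:n])) for n in range(1, 7)]
rows.append(dict(zip(RANKS, lineage + ["boulengeri"])))
rows.append(dict(zip(RANKS, lineage + ["changamwensis"])))

parents = build_nodes(rows)
print(len(parents))  # one node per distinct path
print(parents[tuple(lineage + ["changamwensis"])])  # path of the genus
```

Although Herpelidae appears in four of the rows, it ends up as a single node, and both species point to the same Boulengerula parent.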
Including Other Data
Your data set is not limited to just the names of the ranks. You can include any other fields that belong to the Taxon table, such as the author, source, or common name. You simply need to add more columns to your spreadsheet with headers that match the field names in Specify. In the example above, species author and species source are mapped to the corresponding fields for the taxon record at the species level.
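If you are generating the file with a script, something like the following Python sketch can write the rank columns together with the extra columns. The helper `to_csv` is hypothetical, and the headers are copied from the example above; how each header maps to a Taxon field is still something you set when importing the data set.

```python
import csv
import io

# Rank columns from the example header, plus the two extra species columns.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]
EXTRA = ["species author", "species source"]


def to_csv(records):
    """Serialize records to CSV text; missing columns are left blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=RANKS + EXTRA, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


records = [
    {
        "kingdom": "Animalia",
        "phylum": "Chordata",
        "class": "Amphibia",
        "order": "Gymnophiona",
        "family": "Herpelidae",
        "genus": "Boulengerula",
        "species": "boulengeri",
        "species author": "Tornier, 1896",
        "species source": "https://www.catalogueoflife.org/data/taxon/5WQMJ",
    },
]
text = to_csv(records)
print(text)
```

Using the `csv` module rather than joining strings by hand matters here because author strings such as "Tornier, 1896" contain a comma, and the writer quotes such values so they stay in one column.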
For more detailed examples and comprehensive files that you can use as a template, please visit our official repository of taxonomic data files here, particularly those from 2021:
Please let us know if you have any more questions as you prepare your data for ingestion!
Thank you. It looks like there are no special secret formatting issues to worry about, then, and so long as you mean “you must include the data for all preceding enforced higher-level ranks” I should have no problem using a 3rd party tree with minimal modification.