Issues with adding the Catalogue of Life 2021 taxon tree during the discipline creation process

Hi @epalatou,

Welcome to the Speciforum! We are glad to have you here! Thank you very much for your question.

I have provided guidance on the format required for building a tree in Specify here:

The official repository for the tree files available in the Specify 6 Wizard can be found here. This includes the dataset that caused the error when building the database in that version:

Specify 6 may struggle with the larger Catalogue of Life taxon trees that the Wizard offers by default. Specify 7 handles tree imports much better; however, you will need to download the appropriate tree from this site yourself and map its columns to the ranks and fields in your Taxon tree.
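Many checklist downloads (Darwin Core Archives, for example) list each taxon once, with a pointer to its parent record, rather than one full path per row. A file like that has to be flattened into rank-path rows before it fits the spreadsheet layout described under File Formatting for Upload. Here is a rough Python sketch of that step. The record tuples, the RANKS list and the flatten/to_csv names are hypothetical illustrations, not the real column names of any Catalogue of Life export, so rename things to match the file you actually get.

```python
import csv
import io

# Spreadsheet columns, highest rank first (assumed; match your own tree definition).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

# Hypothetical parent-ID records: (id, parent_id, rank, name).
# Real exports use their own column names and usually need renaming first.
RECORDS = [
    ("1", None, "kingdom", "Animalia"),
    ("2", "1", "phylum", "Chordata"),
    ("3", "2", "class", "Amphibia"),
    ("4", "3", "order", "Gymnophiona"),
    ("5", "4", "family", "Herpelidae"),
    ("6", "5", "genus", "Boulengerula"),
    ("7", "6", "species", "boulengeri"),
]


def flatten(records, ranks=RANKS):
    """Return one dict per record holding the names of that record and all its ancestors."""
    by_id = {rid: (parent, rank, name) for rid, parent, rank, name in records}
    rows = []
    for rid in by_id:
        row = dict.fromkeys(ranks, "")
        current = rid
        # Walk up the parent chain until we reach a root (parent_id is None).
        while current is not None:
            parent, rank, name = by_id[current]
            if rank in row:  # ranks that are not spreadsheet columns are skipped
                row[rank] = name
            current = parent
        rows.append(row)
    return rows


def to_csv(rows, ranks=RANKS):
    """Serialize the flattened rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ranks, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(flatten(RECORDS)))
```

This assumes every parent ID appears in the file (a missing one raises KeyError) and that there are no cycles. Intermediate ranks that are not spreadsheet columns, such as a subfamily, are simply dropped, so check that this is what you want before uploading.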

Hierarchy

The most important principle to understand is that all trees in Specify (Taxonomy, Geography, Storage, etc.) are fundamentally hierarchical. Every record, or “node,” in the tree must have a parent node, except for the highest-level “root” nodes. When you upload a tree from a file, you are essentially providing a set of instructions that tells Specify how to build these parent-child relationships.

File Formatting for Upload

To create a taxon tree in Specify, prepare a spreadsheet (typically a .csv, .tsv, or .xlsx file) in which each row represents a complete path from the highest rank down to the lowest rank for that entry, including the authorship and any other details at each level. You then import this file using the WorkBench.

The columns in your spreadsheet should correspond to the ranks in your taxon tree definition (e.g., Kingdom, Phylum, Class, Order, Family, Genus, Species). The critical rule is that every row must include the names of all ranks above the lowest rank filled in on that row.
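Before uploading, it can help to check every row against this rule. Here is a minimal Python sketch of a strict reading of it: for a given row, it lists any rank left blank above the lowest rank that has a value. The RANKS list and the missing_higher_ranks name are assumptions for illustration, and if your tree definition allows some ranks to be skipped, this check may be stricter than what Specify actually requires.

```python
# Rank columns, highest first (assumed; match your own tree definition).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def missing_higher_ranks(row, ranks=RANKS):
    """List the ranks left blank above the lowest rank that is filled in on this row."""
    values = [(row.get(rank) or "").strip() for rank in ranks]
    filled = [i for i, value in enumerate(values) if value]
    if not filled:
        return []  # an empty row has nothing to check
    lowest = filled[-1]
    return [ranks[i] for i in range(lowest) if not values[i]]


good = {"kingdom": "Animalia", "phylum": "Chordata", "class": "Amphibia"}
bad = {"kingdom": "Animalia", "genus": "Boulengerula"}
print(missing_higher_ranks(good))  # []
print(missing_higher_ranks(bad))   # ['phylum', 'class', 'order', 'family']
```

To check a whole file, you could feed it the dictionaries produced by csv.DictReader over your spreadsheet, as long as the header names match the entries in RANKS.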

Here is an example of a correctly formatted file. Notice how the higher-level ranks are repeated for each subsequent child row.

kingdom,phylum,class,order,family,genus,species,species author,species source
Animalia,,,,,,,,
Animalia,Chordata,,,,,,,
Animalia,Chordata,Amphibia,,,,,,
Animalia,Chordata,Amphibia,Gymnophiona,,,,,
Animalia,Chordata,Amphibia,Gymnophiona,Herpelidae,,,,
Animalia,Chordata,Amphibia,Gymnophiona,Herpelidae,Boulengerula,,,
Animalia,Chordata,Amphibia,Gymnophiona,Herpelidae,Boulengerula,boulengeri,"Tornier, 1896",https://www.catalogueoflife.org/data/taxon/5WQMJ
Animalia,Chordata,Amphibia,Gymnophiona,Herpelidae,Boulengerula,changamwensis,"Loveridge, 1932",https://www.catalogueoflife.org/data/taxon/MR6V
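If you build a file like this with a script rather than by hand, Python's standard csv module takes care of quoting values that contain commas, such as the authorship strings. This sketch regenerates the example rows above. The HEADER, PATH and SPECIES names are only illustrative, so swap in your own ranks and data.

```python
import csv
import io

HEADER = ["kingdom", "phylum", "class", "order", "family", "genus", "species",
          "species author", "species source"]

# Higher-rank path shared by both species in the example.
PATH = ["Animalia", "Chordata", "Amphibia", "Gymnophiona", "Herpelidae", "Boulengerula"]

# (species epithet, author, source) for each species row.
SPECIES = [
    ("boulengeri", "Tornier, 1896", "https://www.catalogueoflife.org/data/taxon/5WQMJ"),
    ("changamwensis", "Loveridge, 1932", "https://www.catalogueoflife.org/data/taxon/MR6V"),
]


def build_rows():
    """One row per higher-rank node, then one full-path row per species."""
    rows = []
    for depth in range(1, len(PATH) + 1):
        # Repeat every rank above this node and pad the remaining columns with blanks.
        rows.append(PATH[:depth] + [""] * (len(HEADER) - depth))
    for epithet, author, source in SPECIES:
        rows.append(PATH + [epithet, author, source])
    return rows


def to_csv(rows):
    """Serialize rows as CSV; fields containing commas are quoted automatically."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(build_rows()))
```

Without that quoting, "Tornier, 1896" would be split across two columns and shift the source URL out of place.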

When Specify processes this file, it reads each row and builds the tree structure. For example, the last row tells the system: “The species changamwensis belongs to the genus Boulengerula, which belongs to the family Herpelidae,” and so on. If a node like Herpelidae has already been created by a previous row, Specify will simply link to the existing node instead of creating a duplicate.
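To make the node-reuse behaviour concrete, here is a simplified Python model of it. It is not Specify's actual import code: it treats two entries as the same node only when their whole path from the root matches, which keeps identically named taxa under different parents separate, and Specify's own matching rules may differ in detail. The RANKS list and build_tree are hypothetical names.

```python
# Rank columns, highest first (assumed; match your own tree definition).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def build_tree(rows, ranks=RANKS):
    """Map each unique node (its full rank path) to its parent's path; roots map to ()."""
    nodes = {}
    for row in rows:
        path = ()
        for rank in ranks:
            name = (row.get(rank) or "").strip()
            if not name:
                break  # stop at the first blank rank; anything below a gap is ignored
            parent = path
            path = path + ((rank, name),)
            # Reuse the node if an earlier row already created it, otherwise add it.
            nodes.setdefault(path, parent)
    return nodes


higher = ["Animalia", "Chordata", "Amphibia", "Gymnophiona", "Herpelidae", "Boulengerula"]
rows = [dict(zip(RANKS, higher[:depth])) for depth in range(1, len(higher) + 1)]
rows += [dict(zip(RANKS, higher + [epithet])) for epithet in ("boulengeri", "changamwensis")]

tree = build_tree(rows)
print(len(tree))  # 8 unique nodes: six higher-rank taxa plus two species
```

Calling build_tree(rows + rows) still yields 8 nodes, which mirrors the "link to the existing node instead of creating a duplicate" behaviour: repeated higher ranks never produce extra nodes.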

Including Other Data

Your data set is not limited to rank names. You can include any other fields that belong to the Taxon table, such as the author, source, or common name. Simply add more columns to your spreadsheet with headers that match the field names in Specify. In the example above, the "species author" and "species source" columns are mapped to the corresponding author and source fields of the species-level taxon record.