Force full name reconstruction for taxon tree

Hi there,

Is there a way to force the Taxon.fullName reconstruction for every taxon already present in the tree? (for both isPreferred=true taxa and synonyms)

Thanks !

Hi @pverley,

Thank you for your question! Currently, the only way to make Specify rebuild Taxon full names is to modify the ranks in the Taxon tree itself or to edit one of a taxon's ancestors (its parent, grandparent, and so on up the tree). There is no command that rebuilds every full name in one go.

Rebuilding full names for taxon nodes that are not preferred is not possible at all at the moment, as our logic explicitly excludes them when the names are rebuilt.

I’ve added a feature request for this capability to our GitHub, including support for a parameter to rebuild synonymized names as well:

Repair tree does that, doesn’t it? If that is not already in the API, it might be better to add that than doing something special for the names.

Hi @NielsKlazenga,

The “Repair Tree” option only rebuilds the node numbers for the selected tree, not the full names. It might just be the right place to integrate this functionality into the UI!


Technical Details

When you click Repair Tree in the User Tools menu, it runs two functions to renumber the tree and validate that the numbering is correct.

renumber_tree function

This function repairs or rebuilds the tree numbering system by:

  • Updating each node’s rank to match its full name definition in the tree schema
  • Checking for and warning about invalid parent-child rank relationships
  • Creating a complete path enumeration for each node in the tree
  • Assigning new node numbers based on the hierarchical paths
  • Setting proper highest child node numbers for parent nodes
  • Clearing any maintenance flags in the system related to tree nodes
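
For readers unfamiliar with this kind of numbering, the path enumeration and numbering steps listed above can be illustrated with a small standalone sketch. It is not Specify's implementation (which operates on the database); the function name `renumber` and the `id,parentid` input format are assumptions made purely for illustration. It walks the tree depth-first, gives each node the next free number, and records the highest number used inside its subtree.

```shell
# Illustrative only: assign nodenumber / highestchildnodenumber by a
# depth-first walk. Input on stdin: "id,parentid" rows, empty parentid = root.
# Output: "id,nodenumber,highestchildnodenumber", ordered by nodenumber.
renumber() {
  awk -F, '
    function visit(id,    i, n, kids) {
      num[id] = ++counter                  # number the node on the way down
      n = split(children[id], kids, " ")
      for (i = 1; i <= n; i++) visit(kids[i])
      high[id] = counter                   # highest number used in this subtree
    }
    $2 == "" { root = $1; next }
    { children[$2] = children[$2] " " $1 }
    END {
      if (root == "") exit 1               # no root row found
      visit(root)
      for (id in num) print id "," num[id] "," high[id]
    }' | sort -t, -k2,2n
}
```

With this scheme, every descendant of a node has a number between that node's own number and its highest child node number, which is what makes subtree queries cheap.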

validate_tree_numbering function

This function checks if the hierarchical tree structure is valid by:

  • Verifying that all nodes have nodenumber and highestchildnodenumber set
  • Ensuring children have higher ranks than their parents (maintaining proper hierarchy)
  • Confirming that child node numbers are properly nested within their parent’s range
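
As a rough illustration of the first and third checks, here is a minimal sketch that works on exported rows rather than on the database. It assumes that "properly nested" means a child's node number is greater than its parent's and its highest child node number does not exceed the parent's; the function name `check_nesting` and the CSV layout are hypothetical, and the rank check is left out.

```shell
# Illustrative only. Input on stdin:
#   "id,parentid,nodenumber,highestchildnodenumber" (empty parentid = root).
# Prints one line per problem and returns non-zero if any was found.
check_nesting() {
  awk -F, '
    {
      parent[$1] = $2; lo[$1] = $3; hi[$1] = $4
      if ($3 == "" || $4 == "") { print "missing numbers: " $1; bad = 1 }
    }
    END {
      for (id in parent) {
        p = parent[id]
        if (p == "" || !(p in lo)) continue        # root, or parent not in input
        if (lo[id] + 0 <= lo[p] + 0 || hi[id] + 0 > hi[p] + 0) {
          print "bad nesting: " id
          bad = 1
        }
      }
      exit bad
    }'
}
```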

Code References

specify7/specifyweb/specify/tree_views.py at f9cb3421767993a8c0ff504f86e94187d79a3dd9 · specify/specify7 · GitHub
specify7/specifyweb/specify/tree_extras.py at f9cb3421767993a8c0ff504f86e94187d79a3dd9 · specify/specify7 · GitHub


Thank you for the feature request and +1 for adding the feature in the Repair tree: it is actually the first thing I attempted without much thinking, so I guess it is kind of intuitive to expect it here.

The API offers a predict_fullname path and it looked like the perfect opportunity to practice.

:books: Prerequisite: How to use the Specify API as a generic webservice

:warning: Disclaimer: the following drill makes use of PUT requests, which do alter the database. Even though the API implements optimistic locking, which is safer than meddling with the SQL database directly, I'd say it is advisable to back up the database before running any PUT/POST/DELETE API request.

API predict_fullname

/api/specify_tree/{tree}/{parentid}/predict_fullname/ Returns the predicted fullname for a node based on the name field of the node and its parent ({parentid}). Requires GET parameters treedefitemid and name, to indicate the rank (treedefitem) and name of the node, respectively.

URL parameters:

  1. {tree} name of the tree. taxon in this case.
  2. {parentid} ID of the Parent of Taxon taxon.parent.id

GET parameters:

  1. {treedefitemid} : Taxonomic rank ID taxon.taxonomicRank.id
  2. {name} : Name of the taxon taxon.name
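
To make the mapping between these parameters and an actual request concrete, here is a minimal sketch of a single call. The helper names, the `BASE_URL` variable and its placeholder value are assumptions, and `cookies.txt` is assumed to hold an already authenticated session.

```shell
# Hypothetical helpers; adjust BASE_URL to your own Specify instance.
BASE_URL="${BASE_URL:-https://specify.example.org}"

# Build the endpoint URL from the two URL parameters: {tree} and {parentid}.
predict_fullname_url() {
  printf '%s/api/specify_tree/%s/%s/predict_fullname/' "$BASE_URL" "$1" "$2"
}

# Usage: predict_fullname <parentid> <treedefitemid> <name>
# -G sends the --data-urlencode values as GET parameters.
predict_fullname() {
  curl -s -b cookies.txt -G "$(predict_fullname_url taxon "$1")" \
    --data-urlencode "treedefitemid=$2" \
    --data-urlencode "name=$3"
}
```

For example, `predict_fullname 18448 14 alata` would ask for the full name of a rank-14 taxon named alata under parent 18448.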

API taxon

After predicting the taxon fullName, I need to (i) get the taxon version and (ii) update the taxon fullname. Both go through the same path, api/specify/taxon/{taxonid}/: a GET request for the version and a PUT request for the update.
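
The two pieces of glue this needs, reading the version out of the GET response and building the PUT body, can be sketched as small helpers. The function names are hypothetical, and the escaping only covers backslashes and double quotes, so treat it as a sketch; where jq is available it is the more robust way to build JSON.

```shell
# Read a taxon JSON document on stdin and print its first "version" value.
extract_version() {
  grep -o '"version": *[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Escape backslashes and double quotes so a name cannot break the JSON body.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Usage: build_payload <version> <fullname>
build_payload() {
  case "$1" in
    ''|*[!0-9]*) echo "invalid version: $1" >&2; return 1 ;;
  esac
  printf '{"version": %s, "fullname": "%s"}' "$1" "$(json_escape "$2")"
}
```

Refusing to build a payload when the version is empty means a failed GET will not turn into a malformed PUT.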

Requesting synonym taxa

Even though I did not mention it in my initial post, I only need to regenerate full names for infraspecific taxa (subspecies, variety and forma in our case).

I crafted some queries that would give me Taxon ID, Taxon name, Taxonomic Rank ID and Parent of Taxon ID. For instance:

Taxon ID   Taxon name       Taxonomic Rank ID   Parent of Taxon ID
58906      alata            14                  18448
58907      leucostachyus    14                  31783
58909      diffusa          14                  14263
58921      octandra         14                  15064
…          …                14                  …

Queries results were exported as CSV files.
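
Before sending any request, it can be worth checking that the exported file parses the way a read loop expects. The sketch below is a dry run that only prints what would be processed. The function name `preview_rows` is hypothetical, and it assumes one header line followed by unquoted `taxonid,name,treedefitemid,parentid` rows with no embedded commas.

```shell
# Dry run: list the rows a later loop would process, without calling the API.
# Usage: preview_rows <file.csv>
preview_rows() {
  tail -n +2 "$1" | tr -d '\r' |        # drop header, strip Windows line endings
  while IFS=, read -r taxonid name treedefitemid parentid || [ -n "$taxonid" ]; do
    [ -n "$taxonid" ] || continue       # skip blank lines
    printf 'taxon %s: name=%s rank=%s parent=%s\n' \
      "$taxonid" "$name" "$treedefitemid" "$parentid"
  done
}
```

The `|| [ -n "$taxonid" ]` part keeps the last row when the file does not end with a newline, and stripping carriage returns avoids a stray `\r` ending up in the parent ID of every URL.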

API calls

For the sake of clarity and brevity, I assume that connection has been established beforehand.

#!/bin/bash
# path of the CSV file exported from the query
FILE=repair-taxon-fullname_subspecies.csv
# CSRF token (obfuscated here), taken from cookies.txt
csrftoken=*********
while IFS="," read -r taxonid name treedefitemid parentid
do
  echo "parentid: $parentid"
  echo "treedefitemid: $treedefitemid"
  echo "name: $name"
  # generate fullname
  fullname=$(curl -s -b cookies.txt -G "https://specify.herbier-guyane.fr/api/specify_tree/Taxon/${parentid}/predict_fullname/"  --data-urlencode "treedefitemid=${treedefitemid}" --data-urlencode "name=${name}")
  echo "fullname: $fullname"
  # get taxon version
  version=$(curl -s -b cookies.txt -G "https://specify.herbier-guyane.fr/api/specify/taxon/${taxonid}/" | grep -o '\"version\": [0-9]*' | awk '{print $NF}')
  # update fullname
  curl -s -b cookies.txt -X PUT \
    -H "X-CSRFToken: $csrftoken" \
    -H "Referer: https://specify.herbier-guyane.fr/" \
    --data "{\"version\": $version, \"fullname\":\"$fullname\"}" \
    https://specify.herbier-guyane.fr/api/specify/taxon/$taxonid/ \
    | jq '.fullname'
  echo ""
done < <(tail -n +2 "$FILE")

The whole script with login and logout:
repair-taxon-fullname.sh (2.2 KB)

Results

I had to generate and update ~3700 taxa. It ran in a few minutes with outputs such as:

parentid: 18448
treedefitemid: 14
name: alata
fullname: Irlbachia alata subsp. alata
"Irlbachia alata subsp. alata"

parentid: 31783
treedefitemid: 14
name: leucostachyus
fullname: Andropogon virginicus subsp. leucostachyus
"Andropogon virginicus subsp. leucostachyus"

etc.

A query on taxon full names with both isPreferred=Yes and isPreferred=No showed afterward that synonym full names had been reconstructed :partying_face:

My personal conclusion is that there is a learning curve to working with the API, but it is worth it a hundred times over for how useful and efficient it is :innocent:
