Hi there,
Is there a way to force the `Taxon.fullName` reconstruction for every taxon already present in the tree (both `isPreferred=true` taxa and synonyms)?
Thanks!
Hi @pverley,
Thank you for your question! Currently, there is no way to instruct Specify to rebuild all Taxon full name fields other than modifying the ranks in the Taxon tree itself or editing one of a taxon's ancestors (parent, grandparent, great-grandparent, and so on).
Rebuilding full names for taxon nodes that are not preferred is not possible at all at the moment, as our logic explicitly excludes them when the names are rebuilt.
I’ve added a feature request for this capability to our GitHub, including support for a parameter to rebuild synonymized names as well.
Repair tree does that, doesn’t it? If that is not already in the API, it might be better to add that than doing something special for the names.
Hi @NielsKlazenga,
The “Repair Tree” option only rebuilds the node numbers for the selected tree, not the full names. It might just be the right place to integrate this functionality into the UI!
When you click Repair Tree in the User Tools menu, it runs two functions to renumber the tree and validate that the numbering is correct.
- `renumber_tree` function: repairs or rebuilds the tree numbering system.
- `validate_tree_numbering` function: checks whether the hierarchical tree structure is valid; its checks involve whether `nodenumber` and `highestchildnodenumber` are set.

Thank you for the feature request, and +1 for adding the feature to Repair Tree: it is actually the first thing I attempted without much thinking, so I guess it is kind of intuitive to expect it there.
The API offers a `predict_fullname` path, and it looked like the perfect opportunity to practice.
Prerequisite: How to use the Specify API as a generic webservice
Disclaimer: the following drill makes use of `PUT` requests, which do alter the database. Even though the API implements optimistic locking, which is safer than meddling with the SQL database directly, I’d say it is advisable to back up the database before running any `PUT`/`POST`/`DELETE` API request.
`/api/specify_tree/{tree}/{parentid}/predict_fullname/`
Returns the predicted `fullname` for a node based on the `name` field of the node and its parent (`{parentid}`). Requires GET parameters `treedefitemid` and `name`, to indicate the rank (treedefitem) and name of the node, respectively.
URL parameters:
- `{tree}`: name of the tree, `taxon` in this case.
- `{parentid}`: ID of the Parent of Taxon, `taxon.parent.id`

GET parameters:
- `{treedefitemid}`: Taxonomic rank ID, `taxon.taxonomicRank.id`
- `{name}`: Name of the taxon, `taxon.name`
After predicting the taxon fullName, I need to (i) get the taxon version and (ii) update the taxon fullname. This can be achieved with a `GET` and a `PUT` request, respectively, on `api/specify/taxon/{taxonid}/`.
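To make the GET-then-PUT step concrete, here is a minimal offline sketch of pulling the `version` out of a taxon record. The sample response is hypothetical and trimmed to three fields, and `extract_version` is a helper name made up for this illustration. My understanding is that, with optimistic locking, the PUT has to carry the version just read, so it is worth checking that a number was actually found before sending anything.

```shell
#!/bin/bash
# Hypothetical, trimmed sample of a GET api/specify/taxon/{taxonid}/ response.
response='{"id": 58906, "version": 3, "fullname": "Irlbachia alata alata"}'

# Print the integer that follows the "version" key (prints nothing if absent).
extract_version() {
  printf '%s' "$1" | grep -o '"version": *[0-9]*' | head -n 1 | awk -F': *' '{print $2}'
}

version=$(extract_version "$response")
if [[ $version =~ ^[0-9]+$ ]]; then
  echo "version: $version"
else
  echo "no version found, skipping the update" >&2
fi
```

In a real run, `$response` would be the body returned by the GET request rather than a hard-coded string.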
Even though I did not mention it in my initial post, I only need to regenerate full names for infraspecific taxa (subspecies, variety and forma in our case).
I crafted some queries that would give me `Taxon ID, Taxon name, Taxonomic rank ID, Parent of Taxon ID`. For instance:
Taxon ID | Taxon name | Taxonomic Rank ID | Parent of Taxon ID |
---|---|---|---|
58906 | alata | 14 | 18448 |
58907 | leucostachyus | 14 | 31783 |
58909 | diffusa | 14 | 14263 |
58921 | octandra | 14 | 15064 |
… | … | 14 | … |
The query results were exported as CSV files.
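Since every row of the CSV ends up driving a PUT request, a quick sanity check of the export can save some trouble. The sketch below assumes the four-column layout of the table above (header line first); `validate_rows` is a hypothetical helper, and the third sample row is deliberately broken to show what a report looks like.

```shell
#!/bin/bash
# Read CSV from stdin, skip the header, and report rows that do not have
# exactly four fields with numeric IDs. Returns non-zero if any row is bad.
validate_rows() {
  local header line_no=1 bad=0 taxonid name treedefitemid parentid extra
  read -r header   # skip the header line
  while IFS="," read -r taxonid name treedefitemid parentid extra; do
    line_no=$((line_no + 1))
    parentid=${parentid%$'\r'}   # tolerate Windows line endings
    if [[ -n $extra || -z $name || ! $taxonid =~ ^[0-9]+$ \
          || ! $treedefitemid =~ ^[0-9]+$ || ! $parentid =~ ^[0-9]+$ ]]; then
      echo "line $line_no looks malformed: $taxonid,$name,$treedefitemid,$parentid"
      bad=1
    fi
  done
  return "$bad"
}

# Sample data: two rows from the table above plus one made-up broken row.
sample=$'Taxon ID,Taxon name,Taxonomic Rank ID,Parent of Taxon ID\n58906,alata,14,18448\n58907,leucostachyus,14,31783\n58909,diffusa,,14263\n'
printf '%s' "$sample" | validate_rows || echo "fix the rows above before running any PUT"
```

On a real export this would be called as `validate_rows < repair-taxon-fullname_subspecies.csv`. Note that a taxon name containing a comma would be flagged as malformed, which is the intended behaviour here because the simple `IFS=","` split cannot handle it.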
For the sake of clarity and brevity, I assume that connection has been established beforehand.
```shell
#!/bin/bash

# path of the query results (CSV export)
FILE=repair-taxon-fullname_subspecies.csv
# CSRF token obfuscated here, the one from cookies.txt
csrftoken=*********

# read the CSV line by line, skipping the header
while IFS="," read -r taxonid name treedefitemid parentid
do
  echo "parentid: $parentid"
  echo "treedefitemid: $treedefitemid"
  echo "name: $name"
  # generate fullname
  fullname=$(curl -s -b cookies.txt -G "https://specify.herbier-guyane.fr/api/specify_tree/Taxon/${parentid}/predict_fullname/" --data-urlencode "treedefitemid=${treedefitemid}" --data-urlencode "name=${name}")
  echo "fullname: $fullname"
  # get taxon version
  version=$(curl -s -b cookies.txt -G "https://specify.herbier-guyane.fr/api/specify/taxon/${taxonid}/" | grep -o '"version": [0-9]*' | awk '{print $NF}')
  # update fullname
  curl -s -b cookies.txt -X PUT \
    -H "X-CSRFToken: $csrftoken" \
    -H "Referer: https://specify.herbier-guyane.fr/" \
    --data "{\"version\": $version, \"fullname\":\"$fullname\"}" \
    "https://specify.herbier-guyane.fr/api/specify/taxon/${taxonid}/" \
    | jq '.fullname'
  echo ""
done < <(tail -n +2 "$FILE")
```
The whole script with login and logout:
repair-taxon-fullname.sh (2.2 KB)
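One caveat about the `--data` line in the loop: the JSON body is built by plain string interpolation, so a full name containing a double quote or a backslash would produce invalid JSON. A minimal sketch of a safer payload builder follows; `json_escape` and `build_payload` are hypothetical helpers, the second sample name is made up, and only quotes and backslashes are handled (not control characters).

```shell
#!/bin/bash
# Escape backslashes and double quotes so a value can sit inside a JSON string.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}   # backslashes first, so the quote escapes are not doubled
  s=${s//\"/\\\"}   # then double quotes
  printf '%s' "$s"
}

# Build the PUT body from a version number ($1) and a full name ($2).
build_payload() {
  printf '{"version": %s, "fullname": "%s"}' "$1" "$(json_escape "$2")"
}

build_payload 3 'Irlbachia alata subsp. alata'; echo
build_payload 3 'Genus species "cultivar"'; echo
```

In the loop, this would replace the hand-written string with `--data "$(build_payload "$version" "$fullname")"`. Since `jq` is already used by the script, letting `jq` build the JSON object would be another option.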
I had to generate and update ~3700 taxa. It ran in a few minutes with outputs such as:
```text
parentid: 18448
treedefitemid: 14
name: alata
fullname: Irlbachia alata subsp. alata
"Irlbachia alata subsp. alata"

parentid: 31783
treedefitemid: 14
name: leucostachyus
fullname: Andropogon virginicus subsp. leucostachyus
"Andropogon virginicus subsp. leucostachyus"
```
etc.
A query on taxon full names with both `isPreferred=Yes` and `isPreferred=No` showed afterward that synonym full names had been reconstructed as well.
My personal conclusion is that there is a learning curve to working with the API, but it is worth it a hundred times over for how useful and efficient it is.