Thank you for the feature request and +1 for adding the feature in the Repair tree: it is actually the first thing I attempted without much thinking, so I guess it is kind of intuitive to expect it here.
The API offers a predict_fullname path and it looked like the perfect opportunity to practice.
Prerequisite: How to use the Specify API as a generic webservice
Disclaimer: the following drill makes use of PUT requests, which do alter the database. Even though the API implements optimistic locking, which is safer than meddling with the SQL database directly, I'd say it is advisable to back up the database before running any PUT/POST/DELETE API request.
API predict_fullname
/api/specify_tree/{tree}/{parentid}/predict_fullname/
Returns the predicted fullname for a node based on the name field of the node and its parent. Requires the GET parameters treedefitemid and name, to indicate the rank (treedefitem) and name of the node, respectively.
URL parameters:
- {tree}: name of the tree, taxon in this case.
- {parentid}: ID of the Parent of Taxon (taxon.parent.id).
GET parameters:
- {treedefitemid}: Taxonomic rank ID (taxon.taxonomicRank.id).
- {name}: Name of the taxon (taxon.name).
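As a minimal sketch of how the URL and GET parameters fit together, the helper below only assembles the request path and prints what would be sent. `BASE_URL` and the function name `predict_fullname_url` are placeholders chosen for illustration, not part of the Specify API; the IDs come from the example table further down.

```shell
#!/bin/bash
# Hypothetical base URL: replace it with the address of your own Specify instance.
BASE_URL="https://specify.example.org"

# Build the predict_fullname path for a given tree and parent node ID.
predict_fullname_url() {
    local tree=$1 parentid=$2
    printf '%s/api/specify_tree/%s/%s/predict_fullname/\n' "$BASE_URL" "$tree" "$parentid"
}

# Show the request for the name "alata" (rank ID 14) under parent 18448.
url=$(predict_fullname_url taxon 18448)
echo "GET ${url}?treedefitemid=14&name=alata"

# The actual call, assuming an authenticated session stored in cookies.txt:
# curl -s -b cookies.txt -G "$url" \
#     --data-urlencode "treedefitemid=14" --data-urlencode "name=alata"
```

Leaving the query string to `curl -G --data-urlencode` keeps names with spaces or diacritics correctly encoded.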
API taxon
After predicting the taxon fullName, I will need to (i) get the taxon version and (ii) update the taxon fullname. These can be achieved with a GET and a PUT request, respectively, to /api/specify/taxon/{taxonid}/
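The two steps can be sketched offline as follows: read the version out of the GET response, then send it back with the new full name in the PUT body, which is what lets optimistic locking detect a record changed in the meantime. The helper names `get_version` and `build_payload` and the abridged sample response are made up for illustration; the sketch also assumes the response is laid out as `"version": N`.

```shell
#!/bin/bash
# Read a taxon JSON document on stdin and print the first "version" number found.
# Assumes the '"version": N' layout; a JSON-aware tool such as jq is more robust.
get_version() {
    grep -o '"version": *[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Build the PUT body. Backslashes and double quotes in the full name are
# escaped so that the result stays valid JSON.
build_payload() {
    local version=$1 fullname
    fullname=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"version": %s, "fullname": "%s"}\n' "$version" "$fullname"
}

# Abridged, made-up response used only to exercise the helpers.
sample='{"id": 58906, "version": 3, "name": "alata", "fullname": "alata"}'
version=$(printf '%s\n' "$sample" | get_version)
build_payload "$version" "Irlbachia alata subsp. alata"
```

The printed line is the kind of body that would be passed to `curl --data` in the PUT request.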
Requesting synonym taxa
Even though I did not mention it in my initial post, I only need to regenerate full names for infraspecific taxa (subspecies, variety and forma in our case).
I crafted some queries that return Taxon ID, Taxon name, Taxonomic Rank ID and Parent of Taxon ID. For instance:
| Taxon ID | Taxon name | Taxonomic Rank ID | Parent of Taxon ID |
|---|---|---|---|
| 58906 | alata | 14 | 18448 |
| 58907 | leucostachyus | 14 | 31783 |
| 58909 | diffusa | 14 | 14263 |
| 58921 | octandra | 14 | 15064 |
| … | … | 14 | … |
Query results were exported as CSV files.
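Since every row of the export ends up driving a PUT request, it may be worth checking the file first. The sketch below assumes a plain four-column CSV with a header line and no quoted commas, as in the table above; `check_csv` is a hypothetical helper name and the sample rows are made up.

```shell
#!/bin/bash
# Check that every data row has four comma-separated fields and numeric IDs
# in columns 1, 3 and 4. The header line is skipped; offending rows are printed.
check_csv() {
    awk -F',' 'NR > 1 {
        sub(/\r$/, "")    # tolerate Windows line endings
        if (NF != 4 || $1 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/) {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
    }
    END { exit (bad ? 1 : 0) }' "$1"
}

# Small made-up sample: the second data row lacks its parent ID.
sample=$(mktemp)
printf '%s\n' 'Taxon ID,Taxon name,Taxonomic Rank ID,Parent of Taxon ID' \
    '58906,alata,14,18448' \
    '58907,leucostachyus,14,' > "$sample"
check_csv "$sample" || echo "CSV needs fixing before running the update loop"
rm -f "$sample"
```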
API calls
For the sake of clarity and brevity, I assume that the connection (login) has been established beforehand.
#!/bin/bash
# path of the CSV file holding the query results
FILE=repair-taxon-fullname_subspecies.csv
# CSRF token obfuscated here; it is the one from cookies.txt
csrftoken=*********
while IFS="," read -r taxonid name treedefitemid parentid
do
echo "parentid: $parentid"
echo "treedefitemid: $treedefitemid"
echo "name: $name"
# generate fullname
fullname=$(curl -s -b cookies.txt -G "https://specify.herbier-guyane.fr/api/specify_tree/Taxon/${parentid}/predict_fullname/" --data-urlencode "treedefitemid=${treedefitemid}" --data-urlencode "name=${name}")
echo "fullname: $fullname"
# get taxon version
version=$(curl -s -b cookies.txt "https://specify.herbier-guyane.fr/api/specify/taxon/${taxonid}/" | jq -r '.version')
# update fullname
curl -s -b cookies.txt -X PUT \
-H "X-CSRFToken: $csrftoken" \
-H "Referer: https://specify.herbier-guyane.fr/" \
--data "{\"version\": $version, \"fullname\":\"$fullname\"}" \
"https://specify.herbier-guyane.fr/api/specify/taxon/${taxonid}/" \
| jq '.fullname'
echo ""
done < <(tail -n +2 "$FILE")
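With a few thousand updates, a failed PUT (for example one rejected because the version is stale) is easy to miss in the scrolling output. One possible addition, sketched under the assumption that any non-2xx status means the taxon was not updated, is to capture the HTTP status and report it per taxon. `report` is a hypothetical helper and the status codes below are made up.

```shell
#!/bin/bash
# Print a verdict for one taxon based on the HTTP status of its PUT request.
# Returns non-zero when the status is not 2xx so the caller can react.
report() {
    local taxonid=$1 status=$2
    case $status in
        2[0-9][0-9]) echo "taxon $taxonid: updated (HTTP $status)" ;;
        *)           echo "taxon $taxonid: NOT updated (HTTP $status)" >&2
                     return 1 ;;
    esac
}

# Made-up statuses to show both branches.
report 58906 200
report 58907 409 || echo "taxon 58907 should be retried later"

# Inside the loop, the status could be captured like this (needs a live session):
# status=$(curl -s -o /dev/null -w '%{http_code}' -b cookies.txt -X PUT \
#     -H "X-CSRFToken: $csrftoken" -H "Referer: https://specify.herbier-guyane.fr/" \
#     --data "{\"version\": $version, \"fullname\":\"$fullname\"}" \
#     "https://specify.herbier-guyane.fr/api/specify/taxon/${taxonid}/")
# report "$taxonid" "$status" || echo "$taxonid" >> failed-taxa.txt
```

Note that `-o /dev/null` discards the response body, so the `jq '.fullname'` echo from the script would have to be dropped or the body saved to a file instead.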
The whole script with login and logout:
repair-taxon-fullname.sh (2.2 KB)
Results
I had to generate and update ~3700 taxa. It ran in a few minutes with outputs such as:
parentid: 18448
treedefitemid: 14
name: alata
fullname: Irlbachia alata subsp. alata
"Irlbachia alata subsp. alata"
parentid: 31783
treedefitemid: 14
name: leucostachyus
fullname: Andropogon virginicus subsp. leucostachyus
"Andropogon virginicus subsp. leucostachyus"
etc.
A query on taxon full names with both isPreferred=Yes and isPreferred=No showed afterward that synonym full names had been reconstructed.
My personal conclusion is that there is a learning curve to working with the API, but it is worth it a hundred times over for how useful and efficient it is.
