Establishing relationship between synonym and preferred/accepted taxon, en masse

We need to upload a (long and fairly clean) list of new taxa prior to uploading the actual specimens/records that bear those names. I know many of those binomials are synonyms of others (specific ones) in that same list, where each synonym is unequivocally linked to its accepted/preferred name.
However, I can’t figure out how to map that relationship in the WorkBench. Fields like Taxon.acceptedTaxon and Taxon.isAccepted (populating it would help down the line) are not available in the WB. But neither is Determination.preferredTaxon if one is to try an alternative route.
Again, it’s a long list and synonymizing binomials one by one on the taxon tree would be painful.
I must be missing something. Thanks.

Hi @igranzow,

Unfortunately, you are not missing anything. Currently, there is no method to bulk upload synonyms through WorkBench; they need to be set up one by one via the tree interface. Updating Taxon records individually through SQL or the API is feasible but more complex and risky compared to using the user interface.

This has been requested a number of times and I’ve added your comment to the official issue tracking. If you have more comments or want to share your ideal workflow once this request is implemented, please reply to this topic below or on GitHub!

Hi @igranzow,

As mentioned by @Grant, the best way this can be accomplished without individually synonymizing the Taxa records using the Tree Viewer would be through SQL or through a script utilizing the API (or some combination of the two).

If such a solution appeals to you before the ability to upload synonymies via the WorkBench is implemented and you need assistance, can you clarify the structure of the input or provide an (optionally fictitious) sample of the dataset?

For example, what are the mapped column headings (to fields/relationships from CollectionObject, Determination, or Taxon, etc.) in the dataset?

Hi, @jason_m

We have 32.4k taxon names that we wish to import into Specify. 28% (9.1k) of them are synonyms. I’m interested to find out more about the options I have here to avoid forcing our curatorial staff to hand-edit that many synonyms post-launch.

By way of example, our data contains parent/child relationships between members of the taxonomic hierarchy and a relationship between a prior taxon name (a synonym of some kind) and a more recent name, which may itself be a synonym. That is, there may be several “hops” across names before arriving at one or more accepted taxon names.

| NAME_ID | RANK_NAME | IS_CURRENT | NAME | AUTHOR |
|---|---|---|---|---|
| 9024 | Species | false | Boronia machardiana | F.Muell. |
| 9025 | Species | false | Boronia viminea | Lindl. |
| 16636 | Subspecies | true | Boronia crenulata subsp. viminea | (Lindl.) Paul G. Wilson |

| OLD_NAME_ID | NEW_NAME_ID | XREF_TYPE |
|---|---|---|
| 9024 | 16636 | TSY |
| 9025 | 16636 | TSY |

This data documents the following synonym relationships.

  1. Boronia machardiana F.Muell. is a taxonomic synonym of Boronia crenulata subsp. viminea (Lindl.) Paul G. Wilson.
  2. Boronia viminea Lindl. is a taxonomic synonym of Boronia crenulata subsp. viminea (Lindl.) Paul G. Wilson.

I had planned to convert this data into a CSV suitable for uploading to the Workbench and hoped to build sufficient data into that CSV file to allow Specify’s taxon tree to recognise the synonym relationships.
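
For anyone preparing similar data, here is a minimal Python sketch of how the multi-hop chains described above could be flattened before building the CSV. It assumes the cross-reference rows have already been loaded as (OLD_NAME_ID, NEW_NAME_ID) pairs and that IS_CURRENT marks the accepted names. The function name `resolve_accepted` and the extra row 9999 are made up for illustration and are not part of the real dataset.

```python
from collections import defaultdict


def resolve_accepted(xrefs, is_current):
    """Follow OLD_NAME_ID -> NEW_NAME_ID links until current names are reached.

    xrefs: iterable of (old_id, new_id) pairs.
    is_current: dict mapping name id -> bool (the IS_CURRENT column).
    Returns a dict mapping each non-current id to the sorted list of
    current ids that its chain of synonym links leads to.
    """
    successors = defaultdict(set)
    for old_id, new_id in xrefs:
        successors[old_id].add(new_id)

    def walk(name_id, seen):
        if is_current.get(name_id, False):
            return {name_id}
        if name_id in seen:  # guard against circular references
            return set()
        seen = seen | {name_id}
        found = set()
        for next_id in successors.get(name_id, ()):
            found |= walk(next_id, seen)
        return found

    return {
        name_id: sorted(walk(name_id, frozenset()))
        for name_id, current in is_current.items()
        if not current
    }


# Rows from the sample above, plus one hypothetical extra hop (9999 -> 9025).
is_current = {9024: False, 9025: False, 16636: True, 9999: False}
xrefs = [(9024, 16636), (9025, 16636), (9999, 9025)]
resolved = resolve_accepted(xrefs, is_current)
```

A name whose chain ends at more than one current name gets several ids in its list, and a name whose chain never reaches a current name gets an empty list, so both cases can be reviewed before anything is uploaded.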

Thanks Iñigo and Ben for bringing up the issue. We are facing exactly the same problem here in the French Guiana Herbarium (CAY) for migrating our taxon tree to Specify.
We have ~55k taxa to import and ~30k are synonyms, so this is definitely not something that can be dealt with manually.
We +1 the feature request, and in the meantime we are happy to share our dataset and brain power to come up with a SQL-based (semi-)automated solution.
Thanks,
Philippe V. (CAY)

Hi @pverley,

Could you share the dataset of synonyms you want to import into Specify? We will explore and suggest an interim method for importing synonyms until WorkBench fully supports this feature.

Thank you!

Hi Grant.

You asked Philippe for the dataset of synonyms he’s dealing with, so I’ll tag along. I have a couple of them (seed plants & fishes) from the MNCN Biobank database, which I’m close to finishing migrating. The taxa lists are nowhere near the size of what Philippe is talking about, because the Biobank contains only a small subset of taxa, but I will encounter much larger volumes when I tackle the main collections.

… if it’s of any help

Thanks so much.

Íñigo

FishesACCEPTED in MNCN-CSIC biobank.xlsx (37.1 KB)

PlantsACCEPTED in MNCN-CSIC biobank.xlsx (22.2 KB)

Hi everyone!

Thank you all for providing example datasets and being patient.

I have created a repository on GitHub which demonstrates how the API can be used with Python and the requests library to create an application which mass-imports taxonomic data (including synonyms) to a Specify 7 instance.

In short, the demo takes a CSV in the format shown here, creates a Mammalia taxon node if one does not exist, and uploads the taxon records under the Mammalia node (at the ranks specified in the CSV columns):

| Order | Family | Genus | Species | isAccepted | Author | AcceptedGenus | AcceptedSpecies | AcceptedAuthor |
|---|---|---|---|---|---|---|---|---|
| Afrosoricida | Tenrecidae | Microgale | talazaci | Yes | Major, 1896 | | | |
| Afrosoricida | Tenrecidae | Oryzorictes | talpoides | No | G.Grandidier & Petit, 1930 | Oryzorictes | hova | A.Grandidier, 1870 |
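
To make the format concrete, here is a small Python sketch (not the code from the repository) that reads rows in this layout and separates accepted names from synonyms. It assumes a comma-delimited file with quoted author strings, which is a guess at the on-disk format. The helper name `split_rows` is hypothetical.

```python
import csv
import io

# Sample rows from the table above, written as comma-delimited CSV.
# The quoting of the author columns is an assumption about the file format.
SAMPLE = '''Order,Family,Genus,Species,isAccepted,Author,AcceptedGenus,AcceptedSpecies,AcceptedAuthor
Afrosoricida,Tenrecidae,Microgale,talazaci,Yes,"Major, 1896",,,
Afrosoricida,Tenrecidae,Oryzorictes,talpoides,No,"G.Grandidier & Petit, 1930",Oryzorictes,hova,"A.Grandidier, 1870"
'''


def split_rows(csv_text):
    """Return (accepted names, [(synonym, accepted name), ...]) from CSV text."""
    accepted, synonyms = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = f"{row['Genus']} {row['Species']}"
        if row["isAccepted"].strip().lower() == "yes":
            accepted.append(name)
        else:
            target = f"{row['AcceptedGenus']} {row['AcceptedSpecies']}"
            synonyms.append((name, target))
    return accepted, synonyms


accepted, synonyms = split_rows(SAMPLE)

# Accepted names that are referenced by a synonym but have no row of their own.
missing = sorted({target for _, target in synonyms if target not in accepted})
```

Checking `missing` before an import is a cheap way to spot synonyms that point at an accepted name with no row of its own in the file.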

By default, the application is set to connect to https://sp7demofish.specifycloud.org/ using the sp7demofish user and logging into the KUFishvoucher collection, so you can see it in action and independently make edits to the code/data and see the result without worrying about making changes to a live production instance.
(If you plan on developing your own application or adopting the one in the repository, you can use this sp7demofish instance for API testing purposes. The data in the instance should be regularly wiped.)

The code was developed as a minimum viable product (demo) without optimization in mind, so there is room to optimize it.
You can also host a Specify 7 instance locally and have the application connect to that local instance to improve performance.


:warning: If interested, please read the README of the repository

If this is not helpful, or an alternative approach should be considered, a demo using SQL directly to accomplish the same task can be made.

Hi, @jason_m

Thanks for this. I will be able to create rows in that format quite easily.

Some questions:

  • Does it also handle subordinate ranks of Species, e.g. Subspecies, Variety etc.?

  • Also, will the tool cope when there are several accepted names for a taxon with isAccepted = No?

Hi @Benr,

This is not a one-size-fits-all solution and is merely a starting point for a more evolved import system. It would need to be adjusted to accommodate these situations based on my understanding.

The repository provides more details about the specifics. This tool only allows for importing or updating Taxon records and linking them to the appropriate “accepted” Taxon record. While it does not offer additional functionality, it would be great to share this tool with the community if you or someone else further develops it!

I do not know the WorkBench in Specify 7 at all, but if you can upload Taxon Citations in it, that is what I would try.

As of last week, I am managing five different taxonomies, VicFlora, the Australian Plant Census (APC), the World Checklist of Vascular Plants (WCVP/PoWo), AusMoss and Bryonames in the Taxon Tree. Each taxonomy is a Reference Work and in the Taxon Citation I have a field (Text1) in which I record the accepted name according to that taxonomy/Reference Work. From there it is a simple update query to get the synonyms (provided the accepted names are in the Taxon Tree).
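
As a rough illustration of that kind of update query (not the actual query used here), the sketch below runs against a throwaway in-memory SQLite database. The table and column names (`taxon.AcceptedID`, `taxon.IsAccepted`, `taxoncitation.Text1`, `taxoncitation.ReferenceWorkID`) are assumptions modelled loosely on the Specify schema. A real Specify database runs on MySQL/MariaDB, so the statement would need checking and adapting there, and the sample names and ids are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (
    TaxonID INTEGER PRIMARY KEY,
    FullName TEXT,
    IsAccepted INTEGER,
    AcceptedID INTEGER
);
CREATE TABLE taxoncitation (
    TaxonID INTEGER,
    ReferenceWorkID INTEGER,
    Text1 TEXT  -- accepted name according to the reference work
);
INSERT INTO taxon VALUES
    (1, 'Boronia crenulata subsp. viminea', 1, NULL),
    (2, 'Boronia machardiana', 1, NULL),
    (3, 'Boronia viminea', 1, NULL);
INSERT INTO taxoncitation VALUES
    (1, 10, 'Boronia crenulata subsp. viminea'),
    (2, 10, 'Boronia crenulata subsp. viminea'),
    (3, 10, 'Boronia crenulata subsp. viminea');
""")

# Mark a taxon as a synonym when its citation for the chosen reference work
# names a different taxon that already exists in the tree.
UPDATE_SQL = """
UPDATE taxon
SET AcceptedID = (
        SELECT acc.TaxonID
        FROM taxoncitation AS tc
        JOIN taxon AS acc ON acc.FullName = tc.Text1
        WHERE tc.TaxonID = taxon.TaxonID
          AND tc.ReferenceWorkID = :ref_work
          AND acc.TaxonID <> taxon.TaxonID
    ),
    IsAccepted = 0
WHERE EXISTS (
        SELECT 1
        FROM taxoncitation AS tc
        JOIN taxon AS acc ON acc.FullName = tc.Text1
        WHERE tc.TaxonID = taxon.TaxonID
          AND tc.ReferenceWorkID = :ref_work
          AND acc.TaxonID <> taxon.TaxonID
    )
"""
conn.execute(UPDATE_SQL, {"ref_work": 10})

rows = conn.execute(
    "SELECT TaxonID, IsAccepted, AcceptedID FROM taxon ORDER BY TaxonID"
).fetchall()
```

Taxa whose citation names themselves are left untouched, and so are taxa whose accepted name is not yet in the tree, which matches the proviso above.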

I have not had the chance to tie this all up in a bow yet, but I plan to set this up only once and have it update automatically at regular intervals after that. So, @Benr, if you can wait until September, when I am back from long service leave, I can help you out.

I was not very responsive in sending demo data, but our import file is basically just like the one sent by Íñigo: taxon details + accepted taxon details.
I will read the README carefully, but at first glance I understand that I can:

  • either use it to upload new taxa with synonyms
  • or update existing taxa and link them as synonyms

Both ways of using the tool will be very useful. Thank you so much for the hard work!

Thank you so much @jason_m for sharing the Python code as an example of how to use the API programmatically, so handy!

I adapted the code slightly to our needs, since we are importing an entire taxon tree (plants of Guyane) into Specify from an existing database where we already had a synonym index. I’m writing down what I did in case it is useful to others:

| Rank | Name | ID | SynonymRank | SynonymName | SynonymID |
|---|---|---|---|---|---|
| Forma | acrocarpa | 80667 | Subspecies | sphaerocarpa | 73402 |
| Forma | acropteron | 18346 | Species | hemipteron | 18345 |
| Forma | aculeata | 50701 | Variety | aculeata | 50699 |
| Forma | acuminatum | 99284 | Species | auriculatum | 17972 |
| Forma | acutifolia | 81079 | Species | subrevoluta | 1465 |
| Forma | alabamense | 16918 | Species | jenmanii | 16915 |
| Forma | albiflora | 77914 | Species | volubilis | 5452 |
| Forma | albolana | 77954 | Species | pentandra | 838 |
| Forma | alternatum | 99289 | Species | auriculatum | 17972 |
| Forma | althaeoides | 80673 | Species | althaeoides | 5326 |
| Forma | amazonica | 17157 | Species | sellowii | 16554 |
| Forma | amurensis | 100008 | Species | sibirica | 12046 |
| Forma | anadencum | 99217 | Species | campyloptera | 98013 |

That way we could handle synonymy for every taxon rank and handle the fact that the accepted taxon may not be of the same rank.

We synonymized ~30k taxa this way. I split the import into a batch of 60 CSV files of ~500 taxa each to avoid a "Session closed" error and looped the import script over the CSV files. It took a work day to understand and adapt the Python code and prepare the CSV files. The import ran overnight in less than 12 hours :partying_face:
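
For anyone wanting to reproduce the splitting step, here is a minimal sketch using only the Python standard library. It is not the script we actually ran: the function names, the `taxa_001.csv` naming pattern, and the UTF-8 encoding are assumptions for illustration.

```python
import csv
from pathlib import Path


def _write_chunk(out_dir, index, header, rows):
    """Write one numbered chunk file, repeating the header row."""
    path = out_dir / f"taxa_{index:03d}.csv"
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def split_csv(source, out_dir, rows_per_file=500):
    """Split source into CSV files of at most rows_per_file data rows each.

    Returns the list of chunk paths in the order they were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(source, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_file:
                written.append(_write_chunk(out_dir, len(written) + 1, header, chunk))
                chunk = []
        if chunk:  # leftover rows that did not fill a whole file
            written.append(_write_chunk(out_dir, len(written) + 1, header, chunk))
    return written


# Usage sketch; import_one_file is a hypothetical stand-in for whatever
# runs the import for a single CSV:
# for path in split_csv("synonyms.csv", "chunks"):
#     import_one_file(path)
```

Keeping each file small means a dropped session only costs one chunk, and the loop can be restarted from the file that failed.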

Cheers,
Philippe V. (CAY, NOU)
