Workbench won't match with known species

fedoras · March 5, 2025, 3:10pm

While doing mass imports, we have just discovered something very odd with one particular collection (Vascular plants): it refuses to match species names to existing ones in the tree. As a result, it creates multiple duplicate species that we would subsequently need to clean up. It has no such issues with the genus or higher ranks.

For other collections (Entomology), where we use import files that are structured and generated in exactly the same way, we have no such issues.

Is there any way to explain this strange behaviour? Could there be some obscure setting in the collection or discipline that might cause this?

Using version 7.9.6.2

NHMD_Herba_20241113_14_55_JMJ_processed_imported.tsv (248.1 KB)

Grant · March 6, 2025, 6:37pm

Hi @fedoras,

Thanks for sharing the spreadsheet! Could you also share the export mapping? We want to check if there are any additional fields mapped at a lower level (such as author, attribute, citation, etc.) that might lead Specify to treat them as unique.

Thank you!

fedoras · March 7, 2025, 9:06am

Hi @Grant

Here are some of the upload plans used:

NHMD_Herba_export_plan.json (8.8 KB)

NHMD_Herba_20241113_14_55_JMJ_processed_imported.json (6.7 KB)

NHMD_Herba_20250207_15_03_CB_processed_imported.json (7.0 KB)

Thanks in advance

Grant · March 10, 2025, 1:56pm

Hi @fedoras,

Thanks for the files! We’re investigating this now and we’ll get back to you once we identify the issue.

jason_m · March 13, 2025, 5:36am

Hi @fedoras!

Thank you for your patience while this Issue was being looked into!
I do believe I have found the cause of the duplicate Species records.

The Problem

Ultimately, the problem lies in the seemingly innocuous mapping of Taxon -> Rank -> Is Hybrid and in how Specify handles default field values when searching for existing Tree records.

Essentially, when isHybrid is explicitly mapped with the default matching behavior, the search for an existing record always includes the isHybrid field and the row's value for it. If that value is blank (i.e., NULL), Specify includes isHybrid=NULL in the search for existing records.

The problem with this is that isHybrid is marked as not nullable at the database level and is given a default value of False by the datamodel. In other words, if you don’t specify an isHybrid value for a Tree record, Specify will always default to isHybrid=False.

So for each Species in your Data Set with an empty isHybrid cell, Specify searches for an existing Species with isHybrid=NULL. That search can never match, so a new Species is created with isHybrid set to False. Thus, each Species is duplicated.
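To make the mechanism concrete, here is a minimal Python sketch of the behaviour described above. It is a toy in-memory model, not Specify's actual code: `ToyTaxonTree`, `find`, `create`, and `upload` are names invented for illustration. The only facts taken from this thread are that a blank cell is searched as NULL and that a saved record gets isHybrid=False by default.

```python
from dataclasses import dataclass, field
from typing import Optional

# Default the datamodel applies when no isHybrid value is supplied (per this thread).
IS_HYBRID_DEFAULT = False


@dataclass
class ToyTaxonTree:
    """Hypothetical in-memory stand-in for the Taxon table (not Specify code)."""

    rows: list = field(default_factory=list)

    def find(self, name: str, is_hybrid: Optional[bool]) -> Optional[dict]:
        # The search uses the cell value as-is, so a blank cell (None)
        # is compared against stored values that are never None.
        for row in self.rows:
            if row["name"] == name and row["isHybrid"] == is_hybrid:
                return row
        return None

    def create(self, name: str, is_hybrid: Optional[bool]) -> dict:
        # On save, a blank value is replaced by the datamodel default.
        stored = IS_HYBRID_DEFAULT if is_hybrid is None else is_hybrid
        row = {"name": name, "isHybrid": stored}
        self.rows.append(row)
        return row


def upload(tree: ToyTaxonTree, data: list) -> None:
    """Match each (name, isHybrid) pair to an existing row or create a new one."""
    for name, is_hybrid in data:
        if tree.find(name, is_hybrid) is None:
            tree.create(name, is_hybrid)


tree = ToyTaxonTree()
# Two identical rows with a blank isHybrid cell.
upload(tree, [("TestSpecies", None), ("TestSpecies", None)])
print(len(tree.rows))  # 2: the second row did not match the first
```

In this model the second row cannot match because the record stored by the first row has isHybrid=False, while the search still asks for isHybrid=None.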

Below is a video which recreates and (mostly minimally) demonstrates the problem:

To track the problem, I have opened up a GitHub Issue which is slightly more technical and comprehensive in its explanation of the problem and proposes some proper solutions:

github.com/specify/specify7

Datamodel defaults not being respected in Workbench Tree Matching

opened 04:39AM - 13 Mar 25 UTC

melton-jason

2 - WorkBench

**Describe the bug**
If a tree field which has a default value defined in the datamodel (like `isAccepted` or `isHybrid`) is explicitly included in a WorkBench Data Set and does not contain any value for a row (i.e., is blank/null), Specify will still use a NULL value for searching and matching purposes.

For a concrete example, consider three columns for any Tree rank in a Data Set: `Genus -> name`, `Species -> name`, and `Species -> isHybrid`, where `Species -> isHybrid` has the default matching behavior (Never Ignore, Allow Null Values, and Don't use a Default Value):

| Genus Name | Species Name | Species isHybrid |
|------------|--------------|------------------|
| TestGenus  | TestSpecies  |                  |
| TestGenus  | TestSpecies  |                  |

In a simplified explanation that demonstrates the Issue, on the first row Specify will search for any existing Genus records with the name `TestGenus`. If one exists, Specify will match to that record; otherwise it will create one. Specify will also search for a Species record which has the name `TestSpecies`, has the `TestGenus` parent from the previous step, _and_ has an empty (NULL) `isHybrid`, matching to the record if it exists or creating it otherwise (this will always result in creating a new node, as there cannot be a Taxon record without an `isHybrid` value).

When Specify creates the `TestSpecies` node, it "passes through" the datamodel and sees that `isHybrid` has a default value defined: false. Specify replaces the empty `isHybrid` value with `false` before sending it to the database.

The process then repeats for the second row, except that Specify will always match `TestGenus` to an existing Taxon record (it matches to the TestGenus of the previous row if it was created). Specify will not match the TestSpecies because it searches for Taxon records with an empty `isHybrid`.

https://github.com/user-attachments/assets/d7c7a7fe-0df2-4d7c-9c64-ca5de2be27c5

**To Reproduce**
Steps to reproduce the behavior:
1. Create a Data Set which minimally contains two columns (there can be other columns), each mapping to a specific Tree's rank (i.e., both columns map to one of Species, Genus, Family, etc.): one mapping to an identifying/text field (like Name, Author, etc.), and the other mapping to `isHybrid` or `isAccepted`
2. Populate the Data Set with data
   a. Ensure all rows have identical information (same name, author, etc.) for the Tree record mapped with the `isHybrid`/`isAccepted` columns
   b. Leave the `isHybrid` or `isAccepted` columns blank
3. Validate and/or upload the Data Set and observe that rows which should have matched with the Tree record are instead created as new, duplicate records

**Expected behavior**
Specify should do one or more of the following:
- For matching purposes, if a field with a default has a `NULL` value, replace it with the field's default value
- For fields with a default value, automatically have the `Use Default Value` option checked and filled with the field's default value

**Current Workarounds**
- Use the `Ignore When Blank` matching behavior for columns mapped to fields with default values
- Use the `Use Default Value` option and explicitly define a default for the column
- Modify the data in the Data Set such that all rows have data in the `isHybrid` and `isAccepted` columns

Please fill out the following information manually:
- OS: macOS Sonoma (14.3)
- Browser: Google Chrome 134.0.6998.45 (Official Build) (arm64)
- Specify 7 Version: Reproduced in [v7.9.6.2](https://github.com/specify/specify7/tree/v7.9.6.2) and [v7.10.0](https://github.com/specify/specify7/tree/v7.10.0)

**Reported By**
Fedor Steeman from the Natural History Museum of Denmark via the Speciforum: https://discourse.specifysoftware.org/t/workbench-wont-match-with-known-species/2398
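The matching options named in the issue can be pictured with a small Python sketch of how each one changes the set of fields used in the search. This is only an illustration of the idea under stated assumptions: `build_match_filter` and its parameters are hypothetical names, not part of Specify, and the default value `"false"` is just an example string.

```python
from typing import Optional


def build_match_filter(
    cells: dict[str, Optional[str]],
    ignore_when_blank: frozenset[str] = frozenset(),
    defaults: Optional[dict[str, str]] = None,
) -> dict[str, Optional[str]]:
    """Return the field/value pairs a toy matcher would search on (hypothetical)."""
    defaults = defaults or {}
    search: dict[str, Optional[str]] = {}
    for field_name, value in cells.items():
        blank = value is None or value.strip() == ""
        if blank and field_name in defaults:
            # Models "Use Default Value": the blank is replaced before searching.
            search[field_name] = defaults[field_name]
        elif blank and field_name in ignore_when_blank:
            # Models "Ignore When Blank": the field is left out of the search.
            continue
        else:
            # Models the default behavior: a blank is searched as NULL (None).
            search[field_name] = None if blank else value
    return search


row = {"name": "TestSpecies", "isHybrid": ""}

print(build_match_filter(row))
# {'name': 'TestSpecies', 'isHybrid': None}
print(build_match_filter(row, ignore_when_blank=frozenset({"isHybrid"})))
# {'name': 'TestSpecies'}
print(build_match_filter(row, defaults={"isHybrid": "false"}))
# {'name': 'TestSpecies', 'isHybrid': 'false'}
```

Only the first filter asks for an `isHybrid` of NULL, which is the one that can never match a stored record.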

Feel free to provide any of your own feedback either here or on the GitHub Issue!

Solutions

Thankfully, there are some fairly easy workarounds until a proper solution is implemented.
You can do any of the following to resolve the problem:

Use the “Ignore When Blank” matching behavior

Specify a default value for blank/empty cells

Or provide a value in the cell of the isHybrid column for each Species you are uploading
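For the last option, the blank cells can be filled in before the file is imported. Below is a minimal sketch using Python's standard `csv` module. The column header `Is Hybrid`, the fill value `false`, and the function name `fill_blank_cells` are assumptions made for illustration, so adjust them to your own import file and to whichever boolean spelling your Data Set uses.

```python
import csv
import io


def fill_blank_cells(tsv_text: str, column: str, fill_value: str) -> str:
    """Return the TSV text with blank cells in `column` replaced by `fill_value`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # A missing or whitespace-only cell counts as blank.
        if not (row[column] or "").strip():
            row[column] = fill_value
        writer.writerow(row)
    return out.getvalue()


# Tiny made-up sample; the header names are hypothetical.
sample = (
    "Genus\tSpecies\tIs Hybrid\n"
    "TestGenus\tTestSpecies\t\n"
    "TestGenus\tOtherSpecies\ttrue\n"
)
cleaned = fill_blank_cells(sample, "Is Hybrid", "false")
print(cleaned)
```

For a real file, read its contents into a string, pass it through the function, and write the result to a new file so the original export stays untouched.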

fedoras · March 13, 2025, 3:49pm

Marvellous! Thank you for your assistance @jason_m !

NielsKlazenga · March 14, 2025, 11:02am

That’s some very impressive sleuthing @jason_m!

system · March 21, 2025, 11:02am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
