DwCA questions

nfshoobs · February 28, 2025, 9:48pm

Specify v7.9.6.2

I’ve just made a DwCA export and RSS feed for our collection data (attached: DwCA_OSUM_Bivalves.xml, 12.7 KB). I have a few quick questions for those with experience setting these up about configuring the export so that it better conforms to Darwin Core.

1. Since our date data is stored as a start date and an end date, both conforming to ISO 8601 (YYYY-MM-DD), we want to report them in the dwc:eventDate field as the start and end dates joined by “/” when both are present, and as the end date with a “/” prefix when there is no start date. Is there a way to do this without an intermediate processing step?
2. Similar issue: dwc:associatedMedia should include the media associated with both the collection object (CO) and the collecting event (CE), since both are “associated with” an occurrence record. (Really, any table linked to CO with an attachment might be considered associatedMedia by GBIF.) Is there a way to concatenate both aggregated lists of media links (CO Attachments Aggregated and CE Attachments Aggregated) into a single field in the DwCA file? The order doesn’t matter; I just want the two aggregated strings joined with “ | ” between them. I currently have each of them formatted correctly so that it displays a list of links to the attachment files.

For both of the above, I’ve tried a couple of different tricks in the XML (such as the conditional logic or concatenation rules that normally work on labels made in Jaspersoft), but I haven’t been able to make them work in the DwCA file without the export failing with a “traceback:null” error.
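If an intermediate processing step does turn out to be necessary, both transformations are short. The sketch below is plain Python applied to one exported row. The column names (startDate, endDate, coAttachments, ceAttachments) are hypothetical placeholders for whatever the query columns are actually called, and returning the start date alone when there is no end date is an assumption, since the question only specifies the both-dates and end-only cases.

```python
from typing import Optional


def build_event_date(start: Optional[str], end: Optional[str]) -> str:
    """Combine ISO 8601 start/end dates (YYYY-MM-DD) into a dwc:eventDate value."""
    start = (start or "").strip()
    end = (end or "").strip()
    if start and end:
        return f"{start}/{end}"  # interval, e.g. 2020-05-01/2020-05-03
    if end:
        return f"/{end}"  # end date only, prefixed with "/" as requested
    return start  # assumption: start date alone (or "" if neither is present)


def join_media(*aggregated: Optional[str], sep: str = " | ") -> str:
    """Concatenate already-aggregated attachment link lists, skipping empty ones."""
    parts = [a.strip() for a in aggregated if a and a.strip()]
    return sep.join(parts)


# Hypothetical row from the query export.
row = {
    "startDate": "2020-05-01",
    "endDate": "2020-05-03",
    "coAttachments": "https://example.org/a.jpg",
    "ceAttachments": "https://example.org/b.jpg",
}
print(build_event_date(row["startDate"], row["endDate"]))  # 2020-05-01/2020-05-03
print(join_media(row["coAttachments"], row["ceAttachments"]))  # https://example.org/a.jpg | https://example.org/b.jpg
```

Skipping empty strings in join_media avoids a dangling “ | ” when a record has attachments on only one of the two tables.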

3. Our taxon table uses the WoRMS AphiaID to match taxa to WoRMS. In the DwC export, to conform to the dwc:taxonID and dwc:acceptedNameUsageID terms, the AphiaID should be exported with its LSID prefix (urn:lsid:marinespecies.org:taxname:), but I don’t record that prefix in the field because the WoRMS API and URL stems often use the AphiaID by itself. Is there a way to format the AphiaID with a static text prefix that doesn’t use the taxon table formatter? Since I can only put one formatted record per table in a query, I can’t use it for both fields, and I am already using it for another DwC field.
4. The modified term (dcterms:modified) is broader than the function of the Specify CO table’s timestampModified field. The term is meant to capture the last time any of the data in a single row was modified. If I modify a collecting event, the timestampModified field of the related CO records doesn’t update, even though the change fundamentally affects them. Is there a way to have Specify take the most recent timestampModified across the CE, CO, Locality, Preparation, Determination, etc. tables and export that as the modified date? As it stands, I could edit the value of almost every field that appears in a DwCA row without the CO timestampModified changing.

Some general DwCA export questions:

5. The GBIF validator reports “The description of the dataset is missing or too short” (DESCRIPTION_MISSING_OR_TOO_SHORT) and “The EML document does not validate against the schema” (“URI is not absolute”). I can’t find descriptions of these issues online. I set up the EML following the instructions on the forum here (swapping in the relevant information for our collection). I feel like this is probably an easy fix, but I haven’t been able to figure it out yet. EML here: Test EML.xml (1.3 KB)
6. Is there a way to make an export that contains all of the collections using the same query? I assume not, but I do recall reading that some settings allow queries across collections. I don’t normally want queries to access all collections, but in this specific context it would be convenient, because it would mean one occurrence dataset to keep track of on GBIF instead of three.

Thanks!

AndyBentley · February 28, 2025, 10:17pm

#1. You could try using a formatter for the collecting event table if you aren’t already using one for something else.
#2. Yes, but only if you use the Audubon Core extension to Darwin Core. Then you can create a separate mapping for each table, and they will be concatenated into a single file upon export. I am doing this for my fish collection and it works great. Send me an email and I can send you my XML file so you can see how it is done.
#3. I was under the impression that you could create multiple formatters for a table and then specify in the query which formatter to use. I have never done it so am not sure how but @Grant may be able to help.
#4. Could you use the audit log table to get at the modified date field?
#5. Not sure about this one but will give it some more thought. You may want to reach out to the IPT gurus to see if they can help.
#6. Not sure you would want to publish all collections as a single resource as there would be information specific to the collection as well as individual collection codes (I would assume) that you would want to preserve for each collection.

Hope that helps

Andy

AndyBentley · February 28, 2025, 10:19pm

Here is my mapping file for my tissue collection:
DwCA_tissue.xml (18.9 KB)

AndyBentley · February 28, 2025, 10:19pm

It has multiple extension files for DwC.

nfshoobs · March 3, 2025, 4:23pm

#1. Unfortunately, I am already using the CE table formatter to display CE numbers in the proper format in exports.
#2. Thanks for the Audubon Core extension recommendation and XML example. This solves this particular problem!
#3. This is the same problem as #1: since I already use one formatted field, I can’t have it occur more than once. It is true that you can define multiple formatters and select which one to use in query exports, but as far as I can tell you can only do so once per table per query.

> #4. Could you use the audit log table to get at the modified date field?

I don’t think there’s any single audit log field that would work the way the dcterms:modified term implies, but I could be wrong! I guess I could just leave the term out of the export.
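One way to approximate dcterms:modified outside Specify is to export each related table’s timestampModified alongside the record and keep the latest one. A minimal sketch, assuming the timestamps have already been pulled into Python datetime objects (all naive or all timezone-aware, since Python cannot compare a mix of the two). The table keys in the example are hypothetical. Note that this only sees rows that still exist, so deleting a determination or preparation may not be reflected.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional, Union

Stamp = Optional[datetime]


def flatten(values: Iterable[Union[Stamp, list]]) -> Iterator[Stamp]:
    """Yield single timestamps as-is and expand lists (e.g. one per determination)."""
    for value in values:
        if isinstance(value, (list, tuple)):
            yield from value
        else:
            yield value


def latest_modified(timestamps: Iterable[Stamp]) -> Optional[str]:
    """Most recent timestamp as an ISO 8601 string, ignoring missing values."""
    present = [t for t in timestamps if t is not None]
    return max(present).isoformat() if present else None


# Hypothetical timestamps gathered for one occurrence record.
record = {
    "collectionobject": datetime(2024, 1, 10, 9, 30),
    "collectingevent": datetime(2025, 2, 1, 14, 0),
    "locality": None,
    "determinations": [datetime(2023, 6, 5), datetime(2024, 11, 20)],
    "preparations": [],
}
print(latest_modified(flatten(record.values())))  # 2025-02-01T14:00:00
```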

> #6. Not sure you would want to publish all collections as a single resource as there would be information specific to the collection as well as individual collection codes (I would assume) that you would want to preserve for each collection.

Since the query includes collectionCode and I map occurrenceID to a formatted field that includes the collectionCode as part of a GUID, I don’t see a reason not to export them as one DwCA. I have our bivalve, gastropod, and crustacean data stored in separate Specify collections principally because they use separate catalog numbering schemes (i.e., OSUM 1234 refers to a bivalve lot, a gastropod lot, and a crayfish lot that are entirely unrelated to one another). Many other institutions, USNM for example, have duplicate catalog numbers across different phyla but export combined occurrence datasets to GBIF. I don’t feel too strongly about this; it’s not a huge deal if it isn’t possible.
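If the three collections are exported separately, merging them into one core file is mostly a matter of checking that the column layouts match and that occurrenceID stays unique even though catalog numbers overlap. A rough sketch, assuming tab-delimited core files with a header row and no quote character (worth confirming against fieldsTerminatedBy and fieldsEnclosedBy in each archive’s meta.xml). The file contents and identifiers below are made up.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def merge_occurrence_files(files: Iterable[TextIO], id_column: str = "occurrenceID") -> List[Dict[str, str]]:
    """Concatenate per-collection core files, refusing mismatched columns or duplicate IDs."""
    merged: List[Dict[str, str]] = []
    header = None
    seen = set()
    for f in files:
        # Assumes tab-delimited text with a header row and no quote character.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("column layout differs between collection exports")
        for row in reader:
            occ_id = row[id_column]
            if occ_id in seen:
                raise ValueError(f"duplicate {id_column}: {occ_id}")
            seen.add(occ_id)
            merged.append(row)
    return merged


# Made-up exports: the same catalog number in two collections, distinguished by occurrenceID.
bivalves = io.StringIO("occurrenceID\tcatalogNumber\tcollectionCode\nOSUM:Bivalves:1234\t1234\tBivalves\n")
gastropods = io.StringIO("occurrenceID\tcatalogNumber\tcollectionCode\nOSUM:Gastropods:1234\t1234\tGastropods\n")
rows = merge_occurrence_files([bivalves, gastropods])
print(len(rows))  # 2
```

A merged archive would still need a single meta.xml and EML document; this only takes care of the occurrence rows.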

AndyBentley · March 3, 2025, 4:28pm

#6 all depends on whether you want it to appear in GBIF as a single resource with metadata that applies to all of the collections. If not, you would want to publish them separately.

nfshoobs · March 3, 2025, 4:33pm

Yes, the same metadata applies to all of them, because all of our IZ material is managed (by yours truly) by a single “division” of the museum, and I’m the main point of contact and the only full-time staff member in the Division. We even use a shared collecting event table for all three collections.

nfshoobs · April 10, 2025, 1:58pm

@Grant Any advice on the GBIF uploading front?
Issues #1 and #3 are still blocking our publishing to GBIF.

Grant · April 22, 2025, 5:49pm

Hi @nfshoobs,

> Since our date data is stored as a start date and an end date, both conforming to ISO 8601 (YYYY-MM-DD), we want to report them in the dwc:eventDate field as the start and end dates joined by “/” when both are present, and as the end date with a “/” prefix when there is no start date. Is there a way to do this without an intermediate processing step?

As Andy suggested, the only current approach is to use a Collecting Event table format: define one that displays startDate and endDate, then map that formatted field in the query to the dwc:eventDate term. Because you are already using the Collecting Event format for something else, that option isn’t available to you, so intermediate processing is necessary until the software is extended.
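For that intermediate step, a script does not need to hard-code column positions: every DwCA carries a meta.xml descriptor (defined by the Darwin Core text guide) that names the core data file, its delimiter, and which column holds which term. A small sketch that reads it with Python’s standard library; the sample meta.xml is hand-written for illustration, not copied from a Specify export.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Namespace used by meta.xml files (Darwin Core text guide).
NS = "{http://rs.tdwg.org/dwc/text/}"


def core_layout(meta_xml: str) -> Tuple[str, str, Dict[str, int]]:
    """Return (core data file name, field delimiter, {term IRI: column index})."""
    root = ET.fromstring(meta_xml)
    core = root.find(f"{NS}core")
    if core is None:
        raise ValueError("meta.xml has no <core> element")

    location = core.find(f"{NS}files/{NS}location")
    if location is None or not (location.text or "").strip():
        raise ValueError("core has no <files><location>")

    # The delimiter is stored escaped (e.g. "\t"); the guide's default is a comma.
    raw = core.get("fieldsTerminatedBy", ",")
    delimiter = raw.encode("ascii").decode("unicode_escape")

    # Fields with only a default value (no index) have no column, so skip them.
    columns = {
        field.get("term"): int(field.get("index"))
        for field in core.findall(f"{NS}field")
        if field.get("index") is not None
    }
    return location.text.strip(), delimiter, columns


SAMPLE_META = """<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/associatedMedia"/>
  </core>
</archive>"""

name, delim, cols = core_layout(SAMPLE_META)
print(name, repr(delim), cols["http://rs.tdwg.org/dwc/terms/eventDate"])  # occurrence.txt '\t' 1
```

With the column index in hand, a row transform (such as building eventDate from separate start and end columns) can be applied to the core file without depending on the order of fields in the export mapping.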

> Our taxon table uses the WoRMS AphiaID to match taxa to WoRMS. In the DwC export, to conform to the dwc:taxonID and dwc:acceptedNameUsageID terms, the AphiaID should be exported with its LSID prefix (urn:lsid:marinespecies.org:taxname:), but I don’t record that prefix in the field because the WoRMS API and URL stems often use the AphiaID by itself. Is there a way to format the AphiaID with a static text prefix that doesn’t use the taxon table formatter? Since I can only put one formatted record per table in a query, I can’t use it for both fields, and I am already using it for another DwC field.

This is also not currently possible without using a table format, for the same reason described above. I’ve opened a GitHub issue for our team to consider adding this capability:

github.com/specify/specify7

Support multiple table formats and aggregations

opened 05:49PM - 22 Apr 25 UTC

grantfitzsimmons

2 - Queries 2 - Exporting Data

**Is your feature request related to a problem? Please describe.**
At this time, you can only add each table format or aggregation one time. The big issue: [screenshot]

**Describe the solution you'd like**
* An `(aggregated)` or `(formatted)` table relationship should be able to be added more than once
* Each `(aggregated)` or `(formatted)` query field should enable the user to select a distinct table format / aggregation
* Each line in the query should be mappable in an export definition to a different DwC term

[screenshot]

**Describe alternatives you've considered**
Post-processing data is currently necessary to achieve what multiple table formats can provide for presenting data in query exports and during data publishing. This approach is neither sustainable nor automated within the software.

**Reported By**
Our team, Nate Shoobs at Ohio Mollusks on the [Speciforum](https://discourse.specifysoftware.org/t/dwca-questions/2385/5?u=grant), and many others

> 3. Our taxon table uses the WoRMS AphiaID to match taxa to WoRMS. In the DWC export, in order to conform to the dwc:taxonID and dwc:acceptedNameUsageID terms, the AphiaID should be exported with its LSID prefix (urn:lsid:marinespecies.org:taxname:), but I don’t record that prefix in the field because the WoRMS API / URL stems often use the aphiaID by itself. Is there a way I can format the AphiaID with a static text prefix that *doesn’t* use the taxon table formatter? Since I can only put one formatted record per table in a query, I can’t use it for both fields, and I am actually already using it for another dwc field.

nfshoobs · April 28, 2025, 4:38pm

Hey Grant,
Thanks for the clarification. I figured as much but wanted to make sure it wasn’t possible at present without an intermediary processing step.

The ability to make “calculated” fields in the XML query exports seems like it would also solve this problem, but multiple formatters is a good GUI-based solution that makes more intuitive sense! Thanks for making a GitHub issue for it.
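In the meantime, the “calculated field” for the LSID is simple to reproduce in post-processing: a static prefix plus the stored AphiaID. A minimal sketch; the function name is hypothetical, and it deliberately leaves already-prefixed values alone so the script can be rerun safely. The same helper could fill both dwc:taxonID and dwc:acceptedNameUsageID from their respective AphiaID columns.

```python
from typing import Optional

WORMS_LSID_PREFIX = "urn:lsid:marinespecies.org:taxname:"


def aphia_lsid(aphia_id: Optional[str]) -> str:
    """Turn a bare WoRMS AphiaID into its LSID form; blank input stays blank."""
    value = (aphia_id or "").strip()
    if not value:
        return ""
    if value.startswith(WORMS_LSID_PREFIX):
        return value  # already prefixed; avoid doubling it on a rerun
    if not value.isdigit():
        raise ValueError(f"not a numeric AphiaID: {value!r}")
    return WORMS_LSID_PREFIX + value


print(aphia_lsid("141433"))  # urn:lsid:marinespecies.org:taxname:141433
```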

One note: it does seem that the EML and metadata files Specify generates directly for RSS feeds are not GBIF-compliant. This probably deserves its own issue. The simple text field in the RSS feed app resource GUI does not allow the level of detail that the IPT interface does. Specify-generated DwCAs must be missing some field or set of fields, because if I take the same occurrence dataset and mapping file from Specify and generate a DwCA with an IPT, the GBIF validator says it’s good, but uploading the DwCA from a Specify 7 RSS feed to the GBIF validator causes it to throw a few errors saying the EML file is formatted incorrectly (‘does not validate against the schema’).

Grant · April 30, 2025, 9:22pm

Hi @nfshoobs,

On that note, I’ve added your comment to the GitHub issue tracking this structural incompatibility:

github.com/specify/specify7

Update RSS Feed adds date at the end of the eml file

opened 07:08PM - 16 Apr 25 UTC

emenslin

2 - Exporting Data

**Describe the bug**
When updating the RSS Export feed, the published date is put at the very end of the file, but the GBIF validator wants it between the associated party and the language. The date being in the wrong place prevents the metadata from being validated by GBIF.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to any db with RSS export feed set up (e.g. KUfish)
2. Click on update RSS feed
3. When completed, download zip file
4. Open the eml
5. See pubdate is at the end

**Expected behavior**
Published date should be in the correct spot where it can be validated by GBIF

**Screenshots**
Where export feed puts it: ![Image](https://github.com/user-attachments/assets/05ed5b28-6290-4375-9587-02e44afbb5dc)
Where GBIF wants it: ![Image](https://github.com/user-attachments/assets/406e2833-9da9-44ed-8f06-79000d1ed792)

Please fill out the following information manually:
- OS: Windows 11
- Browser: Chrome
- Specify 7 Version: 7.10.2
- Database Name: kufish
- Collection name: KU Fish Voucher
- User Name: spfishadmin
- URL: https://kufish20250214-production.test.specifysystems.org/specify/

**Reported By**
Willem @ SAIAB
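Until that fix is released, one stopgap is to move pubDate into place before uploading. In the EML schema the children of `<dataset>` are ordered, with pubDate coming after alternateIdentifier, shortName, title, creator, metadataProvider and associatedParty, and before language, which matches what the issue describes. A sketch using Python’s ElementTree, assuming the misplaced pubDate sits directly under `<dataset>` or under the root element. ElementTree also drops comments and the XML declaration, so compare the output with the original before relying on it. This does not address the separate DESCRIPTION_MISSING_OR_TOO_SHORT warning.

```python
import io
import xml.etree.ElementTree as ET

# Children of <dataset> that the EML schema places before <pubDate>.
BEFORE_PUBDATE = {"alternateIdentifier", "shortName", "title", "creator",
                  "metadataProvider", "associatedParty"}


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def move_pubdate(eml_xml: str) -> str:
    """Return the EML text with <pubDate> moved to its schema position in <dataset>."""
    data = eml_xml.encode("utf-8")
    # Keep the original namespace prefixes (e.g. "eml") instead of ns0, ns1, ...
    for _, (prefix, uri) in ET.iterparse(io.BytesIO(data), events=("start-ns",)):
        if prefix:
            ET.register_namespace(prefix, uri)

    root = ET.fromstring(data)
    dataset = next((el for el in root if local_name(el.tag) == "dataset"), None)
    if dataset is None:
        raise ValueError("no <dataset> element found")

    # Look for a misplaced <pubDate> directly under <dataset> or under the root.
    for parent in (dataset, root):
        pubdate = next((el for el in parent if local_name(el.tag) == "pubDate"), None)
        if pubdate is not None:
            parent.remove(pubdate)
            break
    else:
        return eml_xml  # nothing to move

    # Insert right after the last element that must precede <pubDate>.
    insert_at = 0
    for i, el in enumerate(dataset):
        if local_name(el.tag) in BEFORE_PUBDATE:
            insert_at = i + 1
    dataset.insert(insert_at, pubdate)
    return ET.tostring(root, encoding="unicode")
```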

system · May 7, 2025, 9:22pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
