DwCA questions

Specify v7.9.6.2

I’ve just made a DwCA export and RSS feed for our collection data (attached): DwCA_OSUM_Bivalves.xml (12.7 KB). I have a few quick questions for those with experience setting these up, about configuring the export so that it better conforms to Darwin Core.

  1. Our date data is stored as a start date and an end date, both conforming to ISO 8601 (YYYY-MM-DD). In the dwc:eventDate field, we want to report the start and end dates joined with “/” if both are present, and the end date with a “/” prefix if there is no start date. Is there a way to do this without an intermediate processing step?

  2. Similar issue: dwc:associatedMedia should include the media associated with both the CO and the CE, since both are “associated with” an occurrence record. (Really, any table linked to CO with an attachment might be considered associatedMedia by GBIF.) Is there a way to concatenate both aggregated lists of media links (CO Attachments Aggregated and CE Attachments Aggregated) into a single field in the DwCA file? The order doesn’t matter; I just want the two aggregated strings joined with “ | ” between them. I currently have them formatted so that each displays a list of links to the attachment files.

For the above two things: I’ve tried a couple of different tricks with the XML (like the conditional logic or concatenation rules that normally work on labels made in Jaspersoft) and haven’t been able to make them work in the DwCA file without the export failing with a traceback:null error.
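If an intermediate processing step turns out to be unavoidable, the two rules described above are simple to apply outside Specify. The following is only a minimal Python sketch of that fallback, not a Specify feature: the function names `event_date` and `associated_media` are hypothetical, and it assumes the start date, end date, and the two aggregated media strings have already been exported as plain text columns.

```python
from typing import Optional


def event_date(start: Optional[str], end: Optional[str]) -> str:
    """Build a dwc:eventDate value from ISO 8601 start and end dates.

    Hypothetical helper: both dates -> "start/end", end only -> "/end",
    start only -> "start", neither -> "".
    """
    start = (start or "").strip()
    end = (end or "").strip()
    if start and end:
        return f"{start}/{end}"
    if end:
        return f"/{end}"
    return start


def associated_media(*aggregated: Optional[str], sep: str = " | ") -> str:
    """Join already-aggregated link strings (e.g. CO and CE attachments).

    Empty or missing strings are skipped so no stray separators appear.
    """
    parts = [s.strip() for s in aggregated if s and s.strip()]
    return sep.join(parts)
```

For example, `event_date(None, "2020-01-05")` gives `/2020-01-05`, and `associated_media(co_links, ce_links)` returns a single string with the non-empty inputs separated by “ | ”.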

  3. Our taxon table uses the WoRMS AphiaID to match taxa to WoRMS. In the DwC export, in order to conform to the dwc:taxonID and dwc:acceptedNameUsageID terms, the AphiaID should be exported with its LSID prefix (urn:lsid:marinespecies.org:taxname:), but I don’t record that prefix in the field because the WoRMS API and URL stems often use the AphiaID by itself. Is there a way to format the AphiaID with a static text prefix without using the taxon table formatter? Since I can only put one formatted record per table in a query, I can’t use the formatter for both fields, and I am already using it for another dwc field.

  4. The Darwin Core modified term (dcterms:modified) is broader than what the Specify CO table’s timestampModified field records. The term is meant to capture the last time that any of the data in a single row was modified. If I modify a collecting event, the CO table’s timestampModified field doesn’t update for the related records, even though the data I have changed affects them. Is there a way to have Specify take the most recent timestampModified of the CE, CO, Locality, Preparations, Determinations, etc. tables and use that as the modified value? As it stands, I could edit the value of almost every single field that appears in the DwCA row, and the CO timestampModified would not change.
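For the AphiaID question, the prefixing itself is trivial if it has to happen outside the formatter. This is a minimal Python sketch under that assumption; the helper name `aphia_lsid` is hypothetical, and the only value taken from the post is the LSID prefix.

```python
WORMS_LSID_PREFIX = "urn:lsid:marinespecies.org:taxname:"


def aphia_lsid(aphia_id) -> str:
    """Return the WoRMS LSID for a bare AphiaID (hypothetical helper).

    Empty values stay empty, and values that already carry the
    prefix are returned unchanged so the step is safe to re-run.
    """
    text = "" if aphia_id is None else str(aphia_id).strip()
    if not text:
        return ""
    if text.startswith(WORMS_LSID_PREFIX):
        return text
    return WORMS_LSID_PREFIX + text
```

The same function can fill both dwc:taxonID and dwc:acceptedNameUsageID, given the AphiaID of the taxon and of its accepted name respectively.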

Some general DwCA export questions:

  5. The GBIF validator reports “The description of the dataset is missing or too short”: “DESCRIPTION_MISSING_OR_TOO_SHORT” and “The EML document does not validate against the schema”: “URI is not absolute”. I can’t find descriptions of these issues online. I set up the EML following the instructions on the forum here (swapping in the relevant info for our collection). I feel like this is probably an easy fix, but I haven’t been able to figure it out yet. EML here: Test EML.xml (1.3 KB)

  6. Is there a way to make an export that contains all of the collections using the same query? I assume not, but I do recall reading that some settings allow queries across collections. I don’t normally want queries to access all collections, but in this specific context it would be convenient because it would mean 1 occurrence dataset to keep track of on GBIF instead of 3.

Thanks!

#1. You could try using a formatter for the collecting event table if you aren’t already using one for something else.
#2. Yes but only if you use the Audubon Core extension to Darwin Core. Then you can create two separate mappings for each table and they will be concatenated into a single file upon export. I am doing this for my fish collection and it works great. Send me an email and I can send you my xml file so you can see how it is done.
#3. I was under the impression that you could create multiple formatters for a table and then specify in the query which formatter to use. I have never done it so am not sure how but @Grant may be able to help.
#4. Could you use the audit log table to get at the modified date field?
#5. Not sure about this one but will give it some more thought. You may want to reach out to the IPT gurus to see if they can help.
#6. Not sure you would want to publish all collections as a single resource as there would be information specific to the collection as well as individual collection codes (I would assume) that you would want to preserve for each collection.

Hope that helps

Andy

Here is my mapping file for my tissue collection:
DwCA_tissue.xml (18.9 KB)

It has multiple extension files for DwC.

#1. Unfortunately, I am already using the CE table formatter in order to display CE numbers in the proper format in exports.
#2. Thanks for the Audubon Core extension recommendation and XML example. This solves this particular problem!
#3. This is the same problem as #1: since I already use one formatted field, I can’t have it occur more than once. It is true that you can define multiple formatters and select which one you want to use in query exports, but you can only do it one time per table per query, as far as I can tell.

#4. Could you use the audit log table to get at the modified date field?

I don’t think there’s any single audit log field that would work in the way that the modified term (dcterms:modified) implies. But I could be wrong! I guess I could just leave the term out of the export.
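One fallback, if the related timestamps can be pulled into the same query row, would be to compute the latest one outside Specify. This is only a sketch of that idea in Python, not something Specify does for you: `latest_modified` is a hypothetical helper, and it assumes each related table’s timestampModified has been exported as an ISO-style string without a time zone (for example `2024-05-06 07:08:09`).

```python
from datetime import datetime
from typing import Iterable, Optional


def latest_modified(timestamps: Iterable[Optional[str]]) -> str:
    """Return the most recent timestamp among related records.

    Assumes naive ISO-style strings (no time zone offsets); missing
    values are ignored. Returns "" when nothing is available.
    """
    parsed = [datetime.fromisoformat(t) for t in timestamps if t]
    return max(parsed).isoformat() if parsed else ""
```

Called with the CO, CE, Locality, Preparation, and Determination timestamps for one row, it returns the single most recent value in ISO 8601 form, which is closer to what the modified term is meant to hold.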

#6. Not sure you would want to publish all collections as a single resource as there would be information specific to the collection as well as individual collection codes (I would assume) that you would want to preserve for each collection.

Since the query includes collectioncode and I map occurrenceID to a formatted field that includes the collectioncode as part of a GUID, I don’t see a reason why I shouldn’t export them as one DwCA. I have our bivalve, gastropod, and crustacean data stored in separate Specify collections principally because they use separate catalog numbering schemes (i.e. OSUM 1234 refers to a bivalve lot, a gastropod lot, and a crayfish lot that are entirely unrelated to one another). Many other institutions, USNM for example, have duplicate catalog numbers for different phyla but export combined occurrence datasets to GBIF. I don’t feel too strongly about this; it’s not a huge deal if it is not possible.
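If the three exports did have to be combined by hand, the one thing worth checking is that occurrenceID stays unique even where catalog numbers repeat. A rough Python sketch of that check follows; `merge_occurrences` is a hypothetical helper, and it assumes each collection’s occurrence table has been read into dictionaries (for instance with `csv.DictReader`) that include an `occurrenceID` column.

```python
from typing import Dict, Iterable, List


def merge_occurrences(
    tables: Iterable[Iterable[Dict[str, str]]]
) -> List[Dict[str, str]]:
    """Concatenate occurrence rows from several collections.

    Raises ValueError if two rows share an occurrenceID; repeated
    catalog numbers are fine as long as the IDs differ.
    """
    merged: List[Dict[str, str]] = []
    seen = set()
    for rows in tables:
        for row in rows:
            occ_id = row["occurrenceID"]
            if occ_id in seen:
                raise ValueError(f"duplicate occurrenceID: {occ_id}")
            seen.add(occ_id)
            merged.append(row)
    return merged
```

With an occurrenceID that embeds the collection code, three lots that all carry catalog number 1234 would merge without a collision.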

#6 all depends on whether you want it to appear in GBIF as a single resource and the metadata applies to all. If not, you would want to publish them separately.

The same metadata applies to all, yes, because all of our IZ material is managed (by yours truly) by one single “division” of the museum, and I’m the main point of contact and the only full-time staff member in the Division. We even use a shared collecting event table for all 3 collections.