Pipelines from attachment server: how best to convert between GUIDs and catalogNumbers

We are in the process of setting up an attachment server for Specify, and I am beginning to think about the best way to automate things once staff save an image through Specify to the attachment server. The server writes the image to disk with the format <GUID>.<extension>; however, other image systems expect the filename to be <catalogNumber>.<extension>. Additionally, the GUID used for the filename doesn’t appear to be the GUID used for the Collection Object (this makes sense, as other tables can have attachments). The question is how to get from the image filename back to the catalogNumber. If anyone has suggestions or examples of pipelines they have built on attachment servers, I would love to hear them. My preliminary thinking is below.

Option 1: CO → Attachment GUIDs

Make a single API call to api/specify/collectionobject per collection, with limit set to 0, filtering for only those records in which attachments is not equal to []. Parse the attachments value in the response to assemble a key-value structure that links each catalogNumber to its attachment GUIDs. This should handle Collection Objects with multiple attachments just fine, as the value could be a list. (It would also work the other way around, with GUIDs as keys and catalogNumbers as values; the difference in computational time is something to consider and test.)

{"B000001": ["f7949d01-c2b2-4546-bfdd-b5f5ce0df124",
"3b0c406d-8783-4345-9ebf-6646505ce7d9", 
"e0aa6155-a9c7-4b19-a006-0cdb2c0d98b2"]}
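
For the filename-to-catalogNumber direction, the structure above can be inverted once so that each lookup is a single dictionary access. This is only a rough sketch: it assumes the API response has already been parsed into the catalogNumber-to-GUID-list shape shown above, and the function name invert_attachment_map is made up for illustration.

```python
def invert_attachment_map(by_catalog_number):
    """Turn {catalogNumber: [guid, ...]} into {guid: catalogNumber}."""
    by_guid = {}
    for catalog_number, guids in by_catalog_number.items():
        for guid in guids:
            # Lower-case the GUID so lookups from filenames ignore case.
            key = guid.lower()
            # A GUID pointing at two different catalogNumbers is a data
            # problem worth surfacing rather than silently overwriting.
            if key in by_guid and by_guid[key] != catalog_number:
                raise ValueError(f"{guid} is linked to more than one catalogNumber")
            by_guid[key] = catalog_number
    return by_guid


# Same example data as above.
by_catalog_number = {
    "B000001": [
        "f7949d01-c2b2-4546-bfdd-b5f5ce0df124",
        "3b0c406d-8783-4345-9ebf-6646505ce7d9",
        "e0aa6155-a9c7-4b19-a006-0cdb2c0d98b2",
    ]
}
by_guid = invert_attachment_map(by_catalog_number)
```

Building the inverted dictionary is one pass over the data, so either orientation should be cheap to construct; the GUID-keyed form just avoids scanning every list when starting from a filename.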

Option 2: Attachment table, use title

(route would be api/specify/attachment)
This seems more convenient at first, especially because it has nice timestamp information that could be used to avoid unnecessary syncing. However, it relies on the user to enter the title in the correct format each time, and manual entry means there will be errors. We would also have to filter out attachments belonging to other tables.
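
If this route were used, the titles would at least need to be validated before being trusted. A minimal sketch of that check follows; the pattern (one letter followed by six digits) is only a guess based on the example B000001, and both the pattern and the function name catalog_number_from_title are assumptions that would need adjusting per collection.

```python
import re

# Assumed catalogNumber shape, inferred from the example "B000001":
# one letter followed by six digits. Adjust per collection.
CATALOG_NUMBER = re.compile(r"[A-Z]\d{6}")


def catalog_number_from_title(title):
    """Return the catalogNumber if the title is exactly one, else None."""
    cleaned = title.strip().upper()
    return cleaned if CATALOG_NUMBER.fullmatch(cleaned) else None


# Titles as staff might type them; anything returning None needs review.
good = catalog_number_from_title(" b000001 ")
bad = catalog_number_from_title("specimen photo")
```

Anything that comes back as None could be written to a report for manual correction rather than renamed automatically.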

Option 3: Keep it simple, silly: query to CSV

Run a query for each collection on the CO table, returning catalogNumber and attachment GUID where the attachment GUID is not empty. Export that to CSV and use it as the lookup table for a script. This involves some manual work (we would likely want to refresh the CSV about once a month), but it would save time on API calls.
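
A rough sketch of what the lookup script could look like, assuming the exported CSV has one row per attachment with columns named catalogNumber and attachmentGuid (the real header names will depend on how the query is exported, and the function names here are made up):

```python
import csv
import io
from pathlib import PurePath


def load_lookup(csv_file):
    """Read rows of catalogNumber,attachmentGuid into {guid: catalogNumber}."""
    lookup = {}
    for row in csv.DictReader(csv_file):
        guid = (row.get("attachmentGuid") or "").strip().lower()
        if guid:  # skip rows with no attachment GUID
            lookup[guid] = (row.get("catalogNumber") or "").strip()
    return lookup


def catalog_filename(image_name, lookup):
    """Map <GUID>.<extension> to <catalogNumber>.<extension>, or None if unknown."""
    path = PurePath(image_name)
    catalog_number = lookup.get(path.stem.lower())
    if catalog_number is None:
        return None
    return f"{catalog_number}{path.suffix}"


# In-memory stand-in for the exported CSV file.
sample_csv = io.StringIO(
    "catalogNumber,attachmentGuid\n"
    "B000001,f7949d01-c2b2-4546-bfdd-b5f5ce0df124\n"
    "B000001,3b0c406d-8783-4345-9ebf-6646505ce7d9\n"
)
lookup = load_lookup(sample_csv)
renamed = catalog_filename("f7949d01-c2b2-4546-bfdd-b5f5ce0df124.jpg", lookup)
missing = catalog_filename("not-in-lookup.png", lookup)
```

One thing this does not solve: a Collection Object with several attachments maps them all to the same <catalogNumber>.<extension>, so some suffix or numbering scheme would still have to be agreed with the downstream image systems.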
