Images still not being served from Asset Server

Hi folks, as referenced in this post: Images not being served from Asset Server - #2 by Grant

I am still getting confusing errors when trying to upload or access images. I’ve checked my logs, restarted Docker containers, etc., but I still can’t figure out why I can only intermittently upload images. Other users from my institution are having this issue as well. Would it be possible to re-open this issue?

Hi @phb,

Thanks for reaching out!

While investigating this, I found a new doc that explains how Specify 7 works with the Web Asset Server to be a helpful resource!

Your Specify 7 instance is hosted on the Specify Cloud platform. I took a look at the requests being made to the server, and found that many of the attachments do not exist on the server.

For example, it looks like the image B020450_egg.tif (stored in the asset server as 4d7ecceb-d040-40d4-bb9f-3f3e12dea957.tif) cannot be reached, resulting in the ‘404 Not Found’ error you may be seeing when viewing images:

Request:

https://assets-ca.specifycloud.org/fileget?coll=beaty&type=O&filename=4d7ecceb-d040-40d4-bb9f-3f3e12dea957.tif&downloadname=B020450_egg.tif

Result:

404 Not Found

Back in March, @markp collaborated with one of our developers to set up a self-managed asset server on your end. Your server has a whitelist of IP addresses, allowing only users within your network and authorized requests from our asset server to make successful requests.

At present, Specify does not appear to point to that asset server. Making a request from our end to your self-hosted asset server works exactly as expected, which is a good sign!

Request (from a machine whitelisted on the asset server):

https://images.beatymuseum.ubc.ca/fileget?coll=beaty&type=O&filename=4d7ecceb-d040-40d4-bb9f-3f3e12dea957.tif&downloadname=B020450_egg.tif

Result:

58M TIF file! 🎉

Proposed Solution:
We need to update the Specify Cloud configuration for your instance to use your asset server (images.beatymuseum.ubc.ca) instead of ours (assets-ca.specifycloud.org). This update will only take a few minutes on our end once you give us the go-ahead!

No new assets have been uploaded to your directory on our server since February of this year. This means that all attachments are available only on your asset server. Once we update the configuration, everything should be visible again and users should no longer have issues uploading or accessing assets.

Yes, I approve pointing the Specify Cloud configuration to our asset server. Confusingly, I have been able to upload images occasionally and even download them, but then the files have gone missing. Now I can’t even upload images, which is further confusing me.

The questions for me are:

  • Why did this change and how do I prevent this from happening again? Did someone or something overwrite a config file?
  • Where did our images go that we supposedly uploaded? There’s a non-zero chance that we’ve lost expensive and important images with the erroneous trust that they were being uploaded.
  • If there haven’t been any uploads to the Cloud directory since February, where have our images been going?

Hi @phb,

I spoke to the developer, and he confirmed that the assets you have been uploading went to a different directory on our asset server and are safe and present! He found that 1,105 assets (JPGs, TIFFs, PNGs, PDFs, etc.) have been uploaded since March 1st of this year. We can work with you to sync the attachments added to our server over to yours, and then we can switch to using yours instead.

Our developer has updated your Specify Cloud configuration to direct it to your custom asset server. However, this change will not take effect until you give us the approval to restart your Specify instance as it will interrupt any ongoing processes/users. Once this switch is made, the 1,105 assets that are visible in Specify now will become broken links until the assets are synced with your server. They will still be present, just inaccessible, and this can be quickly resolved by copying the assets over to your server via rsync or other file transfer options.

Attachments were first added to the Specify Cloud asset server (assets-ca.specifycloud.org) directory after the custom asset server was deployed on May 2nd. No assets were added again until August 13th, 14th, and 19th. Most assets were added in the past week, with a significant addition on September 11th and one added yesterday, September 15th.

I have emailed you the list of all these assets privately along with instructions on how to view them manually. I believe this confirms that everything uploaded to Specify is either on your asset server (pre-May 2nd) or on ours (after May 2nd).

Why did this change and how do I prevent this from happening again? Did someone or something overwrite a config file?

It appears that the custom asset server configuration was removed earlier this year, changing the server Specify used for depositing and retrieving assets. This change took place in early May, and all assets added afterward were deposited on our server. This represented a regression from the custom configuration established on April 7th, according to our emails with Mark.

To prevent this from happening again, our team must ensure that custom configurations remain intact during deployment changes and updates. Since the asset server change occurred, we have introduced more thorough auditing of all Specify instances after updates, and we are putting in place a policy to audit every configuration change. This will help us understand when changes are made and who makes them.

On your end, if all attachments ever become inaccessible or broken, please reach out to us directly by email (via our Help Desk at support@specifysoftware.org). If you need a quick response, feel free to send another message to let us know, and we will do our best to get back to you as soon as possible. The question raised late last month was put on hold because we believed Mark was collaborating with your team to resolve it.

Confusingly, I have been able to upload images occasionally and even download them, but then the files have gone missing. Now I can’t even upload images, which is further confusing me.

Where did our images go that we supposedly uploaded? There’s a non-zero chance that we’ve lost expensive and important images with the erroneous trust that they were being uploaded.

From the application perspective, I believe the missing files are those that were uploaded before May 2nd (present only on the images.beatymuseum.ubc.ca server). The files that are visible and downloadable are from after May 2nd (present only on the assets-ca.specifycloud.org server).

  • I expect that all currently inaccessible assets can be accessed by adjusting the URL to point to the appropriate server.

    For example, missing files can be accessed by changing the base of the URL[1] (https://assets-ca.specifycloud.org/fileget?coll=beaty&type=O&filename=) to the one for your server (https://images.beatymuseum.ubc.ca/fileget?coll=beaty&type=O&filename=):

    https://assets-ca.specifycloud.org/fileget?coll=beaty&type=O&filename=4d7ecceb-d040-40d4-bb9f-3f3e12dea957.tif&downloadname=B020450_egg.tif

    to

    https://images.beatymuseum.ubc.ca/fileget?coll=beaty&type=O&filename=4d7ecceb-d040-40d4-bb9f-3f3e12dea957.tif&downloadname=B020450_egg.tif
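Since only the base of the URL changes, the swap can be scripted. A minimal Bash sketch, where `swap_base` is a hypothetical helper name (not part of Specify or the asset server) and the two bases are the servers named above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite an asset URL so it points at a different
# asset server, leaving the /fileget path and query string untouched.
OLD_BASE="https://assets-ca.specifycloud.org"
NEW_BASE="https://images.beatymuseum.ubc.ca"

swap_base() {
  local url="$1"
  # Only rewrite URLs that actually start with the old base; pass others through.
  if [[ "$url" == "$OLD_BASE"* ]]; then
    printf '%s\n' "${NEW_BASE}${url#"$OLD_BASE"}"
  else
    printf '%s\n' "$url"
  fi
}

swap_base "https://assets-ca.specifycloud.org/fileget?coll=beaty&type=O&filename=4d7ecceb-d040-40d4-bb9f-3f3e12dea957.tif&downloadname=B020450_egg.tif"
```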


The next step for us is to sync the assets from our server (assets-ca.specifycloud.org) to yours (images.beatymuseum.ubc.ca), which our team is happy to help with. Once they are all there, everything should be visible again in Specify from the same server!



  1. This doc describes how the asset server builds URLs.

Hi folks, I’ve confirmed with our collections that they can hold off on uploading attachments for a day or so. You’ve got the go-ahead to restart the server. Once they’ve confirmed the new config is working, yes, let’s coordinate to start the transfer of the attachments.


Hi @phb,

The instance restart is complete, and now Specify is accessing your asset server instead of ours! Please let me know if the images are displaying correctly on your end and confirm whether you can upload attachments again. They should now be deposited correctly on your server.

Great, confirmed that the attachments are being uploaded and displayed on the front end. I’ll check the back-end logs too. What’s the proposal for transferring the other files?

Server logs indicate that the files I tested with are indeed uploaded to our onsite server.

Hi @phb,

That’s great news! There are a few straightforward ways we can sync all of your assets (both originals and thumbnails) from our asset server to your self-hosted one. You can choose whichever option feels most efficient for you, and we can provide you with SSH access to a server that has your assets available.

  1. Compressed Archive + One-Off Transfer
    If remote access complicates things, we can bundle up each directory into a tar.gz, upload the archives to a temporary download location (or even email you links), and you can pull them down in one shot.

  2. SSH/SFTP
    We can spin up a locked-down SSH account on our asset server, chrooted into the beaty asset tree. Once you have your SSH key or password, you’d simply do:

    sftp -i ~/.ssh/specify_cloud_key assetuser@assets-ca.specifycloud.org
    # then in the sftp prompt:
    cd beaty/originals
    lcd /where/you/want/originals
    mget *
    cd ../thumbnails
    lcd /where/you/want/thumbnails
    mget *
    

    This method is simple and you can verify each file as it lands in your local directory.

  3. Rsync over SSH
    If you’re more familiar with it, you can also use rsync (though repeated syncs should not be necessary after the initial transfer):

    rsync -avz --progress \
      -e "ssh -i ~/.ssh/specify_cloud_key" \
      assetuser@assets-ca.specifycloud.org:~/beaty/originals/ \
      /mnt/images/beaty/originals/
    
    rsync -avz --progress \
      -e "ssh -i ~/.ssh/specify_cloud_key" \
      assetuser@assets-ca.specifycloud.org:~/beaty/thumbnails/ \
      /mnt/images/beaty/thumbnails/
    

    You’ll get a running tally of files transferred.
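Whichever option is used, one way to confirm the copy is complete is to compare SHA-256 checksums: build a manifest of the tree on one side and check it on the other. A sketch, assuming GNU coreutils and findutils (`sha256sum`, `find`, `xargs -r`) are available on both machines; `make_manifest` and `verify_manifest` are hypothetical helper names, and the manifest should be written outside the directory being hashed:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for verifying a copied asset tree with SHA-256 checksums.

# make_manifest DIR OUT: write "<hash>  ./<relative path>" lines for every file in DIR.
make_manifest() {
  local dir="$1" out="$2"
  # The redirect is resolved before the subshell changes directory,
  # so a relative OUT path is relative to the caller's working directory.
  (cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) > "$out"
}

# verify_manifest DIR MANIFEST: re-hash the listed files under DIR.
# Prints only failures and returns non-zero if any file is missing or differs.
verify_manifest() {
  local dir="$1" manifest="$2"
  # Feed the manifest on stdin so the path still works after the cd.
  (cd "$dir" && sha256sum --check --quiet) < "$manifest"
}
```

For example, run `make_manifest /path/to/beaty /tmp/beaty.sha256` against the source tree, copy the manifest across, then run `verify_manifest /mnt/images/beaty /tmp/beaty.sha256` on the destination. This catches missing or altered files, but not extra files that exist only at the destination.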

Let me know which approach works best for you (or if you’d like us to provision that SSH account now), and I’ll get you the details/credentials.

So, there’s no business logic that’s needed to ensure that these are correctly linked? It’s simply a drop into a folder?

Hi @phb,

Yes! Specify simply sends a request to the asset server using a constructed URL looking for a file by name (coming from the AttachmentLocation field in the attachment table).

If there is type=O in the URL, it looks for the original in the originals directory. If there is a type=T in the URL, it looks for the thumbnail in the thumbnails directory.

For example, the URL https://assets-test.specifycloud.org/fileget?coll=sp7demofish&type=O&filename=sp64504250189672699317.att.JPG&downloadname=29499.JPG just looks for sp64504250189672699317.att.JPG in the originals directory configured for the instance.
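The lookup described above can be illustrated with a small Bash sketch that reads the `type` and `filename` parameters out of a fileget URL and reports which directory and file the server would look for. This is not the asset server's actual code: the helper names are hypothetical, query values are not URL-decoded, and the thumbnail case is simplified, since thumbnails are also tied to a requested `scale`.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: map a fileget URL to "<directory>/<filename>".

# query_param URL NAME: print the value of NAME from the URL's query string.
query_param() {
  local url="$1" name="$2" query pair
  local -a pairs
  query="${url#*\?}"                      # everything after the first '?'
  IFS='&' read -r -a pairs <<< "$query"   # split on '&' without globbing
  for pair in "${pairs[@]}"; do
    if [[ "${pair%%=*}" == "$name" ]]; then
      printf '%s\n' "${pair#*=}"
      return 0
    fi
  done
  return 1
}

# lookup_path URL: type=O -> originals/, type=T -> thumbnails/.
lookup_path() {
  local url="$1" type filename dir
  type=$(query_param "$url" type) || return 1
  filename=$(query_param "$url" filename) || return 1
  case "$type" in
    O) dir="originals" ;;
    T) dir="thumbnails" ;;
    *) echo "unknown type: $type" >&2; return 1 ;;
  esac
  printf '%s/%s\n' "$dir" "$filename"
}

lookup_path 'https://assets-test.specifycloud.org/fileget?coll=sp7demofish&type=O&filename=sp64504250189672699317.att.JPG&downloadname=29499.JPG'
```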

Right, that makes sense. Yesterday I did a quick curl of the images in the list you sent, using type=O; I assume those would be identical to the ones I would have downloaded? I’ll set type=T to get the thumbnails now. Can you provide a .zip or tarball so that I can cross-reference?

Hi Paul,

Good to hear you already have those originals downloaded!

There is no need to grab the thumbnails: they are not strictly necessary, since the asset server can generate them on demand. However, if they are in place with the appropriate suffix denoting the resolution, the asset server will skip generating them.

I’ve prepared a ZIP file including both the originals and thumbnails directories from our asset server for your instance and shared it to your UBC email directly via Google Drive. If there is a more convenient delivery method for you, just let me know!

Once you have these files, you can simply add them to your directories, and the assets should display as expected in Specify. Hopefully this is the last piece of the puzzle!

OK, great. I’ve compared the zip to the original text file and my curl results, and I see two differences in the file contents. One I think might just have been a network error, but the other is a text document that looks like it was uploaded on Sept 15. It looks like a mistake by one of my collection curators, since it’s just Specify form XML, but I want to double-check that this text document doesn’t indicate a further problem.

Hi @phb,

That was a text document I uploaded to test the asset server and ensure it was responding as expected. The XML is simply the export of a Specify query. It was just a small file (2 KB) I had on hand to confirm that the requests were successful. It can be discarded! 😄

OK, thanks for letting me know. I’ve confirmed that my uploads are successfully being hosted, and I’m in the process of confirming with the other collections curators.

My next question is: How do we ensure that this doesn’t happen again? Our collections curators are understandably questioning whether we can trust Specify to handle our records, since they had been operating under the assumption that if it’s “in the database,” it has been backed up. We do our own internal backups of our assets, but currently we do that from Specify to ensure a single source of truth. We now have to make a plan that assumes Specify cannot be trusted even when it has confirmed an upload. Any ideas?

And on the thumbnail side, I’m seeing that there exist assets that seem to download originals, but not thumbnails. Are you sure that the thumbnails get generated on demand?

Hi @phb,

Thank you for the detailed follow-up. That is a completely valid question, and I want to be very clear about what happened and how we’re ensuring it won’t happen again. You and your curators are right to be concerned, and I apologize for the frustration this has caused. We want to make sure you and your team have confidence in the system.

Why This Happened and How We’ll Prevent It

Your Specify Cloud instance had a custom configuration file that directed it to use your self-hosted asset server (images.beatymuseum.ubc.ca). During a routine software update on our platform in early May, this custom file was inadvertently overwritten by the default configuration, causing your instance to revert to using our standard asset server (assets-ca.specifycloud.org).

At no point was any data lost: the application continued to upload assets successfully, but it sent them to the wrong, default location. This created the confusing “split-brain” scenario where some assets were on your server and newer ones were on ours. This was our mistake, and we have taken immediate steps to correct our processes.

To prevent this from ever happening again, we are implementing a multi-layered approach:

  1. Automated Configuration Checks: We are updating our deployment scripts to specifically protect and verify custom configurations before and after any update. This directly prevents the type of overwrite that caused this problem.
  2. Post-Update Audits: In addition to the automated checks, our team has been conducting a manual audit of all instances with custom configurations after every deployment to provide a second layer of verification. This has been in effect since late July, but it began only after the “split-brain” scenario had already started.
  3. Proactive Monitoring: We are implementing an alerting system that will monitor the connection to custom asset servers like yours. If it detects a connectivity failure, our team will be notified immediately, allowing us to notify you and investigate before it has an impact.

Data Integrity and Trust

Specify is a “single source of truth” where the database record is intrinsically linked to its corresponding digital asset. The record in the database holds the metadata and a pointer to the location of the asset file. The “upload confirmed” message you received was accurate: the record was created and the file was successfully transferred. The failure was in our server configuration, which pointed the transfer to the wrong destination.

The core database records (your metadata) were never at risk and are backed up nightly as part of our standard Specify Cloud hosting. The issue was confined to the storage location of the asset files. The plan we carried out today sent the files from our server to yours so you can consolidate all assets and restore your server as the single, authoritative location for assets.

Thumbnails Not Loading

Regarding your second question, thumbnails are generated on demand by the asset server. When a user’s browser requests a thumbnail, it makes a call to the asset server, which generates and serves a smaller version of the original image. See the “Thumbnails” section of our documentation for the technical details.

I suspect the discrepancy you are seeing arises because you cannot turn an original URL into a thumbnail URL just by changing the O to a T; a thumbnail request also needs a scale, and the app follows a set of construction steps to build each URL:

Specify builds URLs to obtain assets from the Web Asset Server differently depending on the context (either full resolution or thumbnail). The URLs are constructed with the following parameters:

  • Base URL (baseUrl): comes from the attachment server settings (e.g., https://assets-test.specifycloud.org) for the Specify 7 service (configured in Docker).
  • Collection (coll): the collection key for the database (e.g., sp7demofish) (configured in Docker).
  • Filename (filename): the server-side stored filename from each Attachment record (taken directly from the attachmentLocation field of the attachment record).
  • Type (type): T for thumbnails/previews used in the gallery, O for originals. If T, you must also add the scale parameter.
  • Scale (scale): The size of the thumbnail to generate. Only used for thumbnails.
  • Download Name (downloadname): Optional; lets you specify the download name for the image.

Putting it together, a URL like this will make it possible to download the original attachment at full resolution:

${baseUrl}/fileget?coll=${coll}&type=O&filename=${filename}

and this will get you the thumbnail:

${baseUrl}/fileget?coll=${coll}&type=T&filename=${filename}&scale=123
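The two templates can also be wrapped in a small helper that enforces the rule that thumbnails need a scale. A Bash sketch, where `asset_url` is a hypothetical function name, values are inserted without URL-encoding, and `123` simply mirrors the example scale above:

```shell
#!/usr/bin/env bash
# Hypothetical helper that assembles a fileget URL from the parameters listed above.
# Usage: asset_url BASE_URL COLL FILENAME TYPE [SCALE] [DOWNLOADNAME]
# Values are inserted as-is (no URL-encoding is performed).
asset_url() {
  local base="$1" coll="$2" filename="$3" type="$4" scale="${5:-}" downloadname="${6:-}"
  local url="${base%/}/fileget?coll=${coll}&type=${type}&filename=${filename}"
  if [[ "$type" == "T" ]]; then
    # Thumbnails require a scale; originals do not use one.
    if [[ -z "$scale" ]]; then
      echo "type=T requires a scale" >&2
      return 1
    fi
    url+="&scale=${scale}"
  fi
  if [[ -n "$downloadname" ]]; then
    url+="&downloadname=${downloadname}"
  fi
  printf '%s\n' "$url"
}

# Original at full resolution, then a thumbnail of the same file.
asset_url https://assets-test.specifycloud.org sp7demofish sp64504250189672699317.att.JPG O
asset_url https://assets-test.specifycloud.org sp7demofish sp64504250189672699317.att.JPG T 123
```

Passing an empty scale (for example `asset_url BASE COLL FILE O "" 29499.JPG`) adds only a download name to an original.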

We take our role as stewards of your data platform very seriously. Please let me know if this explanation helps.

Best,

Grant

Hi Grant, thanks for the detailed explanation of the problem. What does Specify do right now in terms of redundancy and data integrity checking?

Our working assumption now is that Specify is not reliable as a single source of truth, and that we’ll need a structured plan at the museum to verify and archive assets before they are uploaded to Specify and to run our own data integrity checks afterward.

Hi @phb,

Specify is designed to be a “single source of truth,” where each database record is intrinsically linked to its corresponding digital asset. The database record contains metadata and a pointer to the asset file’s location. As I mentioned, the attachments were always uploaded successfully, and the pointer record was created in the database. The failure did not occur during data capture; rather, it was due to a server configuration error that directed the transfer to the wrong destination. Specify does not periodically check whether all assets are accessible; instead, this check happens in real time whenever an asset is previewed or viewed in the app. If an asset is unavailable, the preview will not display, and the user will see a “404 Not Found” message along with a description of the issue (e.g., Missing original) when attempting to view the full-resolution asset.

To reiterate, your core database records were never at risk. For all our cloud-hosted institutions, we perform nightly backups that include a multi-step verification process. Before the backup runs, our system automatically confirms that all database tables are present. After it completes, the file is checked again for integrity and completeness. Any failure in this process triggers an alert to our technical team. We ensure we always have a sound, complete, and reliable backup of your core records. By default, daily backups are stored for one week, monthly backups for one year, and yearly backups are kept indefinitely.

We ensure data integrity through a multi-layered approach that begins at the database level. The database schema enforces strict relational integrity, which makes it impossible to create orphaned records or link to non-existent data. Layered on top of that, there is business logic in the code to apply more complex validation and institution-specific rules during data entry. This includes uniqueness rules on critical fields, such as catalog numbers, which are enforced to prevent the creation of duplicate records within a collection. The audit log captures changes made to records directly in the software.

The issue we resolved was confined to the storage location of the asset files. Everything should now be consolidated on your server, restoring it as the single, authoritative location. To confirm the data is present, you only need to check that every pointer (the attachmentLocation field in the attachment table) corresponds to an asset file on your server.
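That check can be scripted against the files on disk. A minimal sketch, assuming the attachmentLocation values have already been exported to a text file, one per line (for example via a query on the attachment table over your direct SQL access), and that originals sit in a single directory such as the hypothetical /mnt/images/beaty/originals from the rsync example. It checks only originals, since the asset server can regenerate thumbnails on demand; `check_pointers` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical integrity check: every attachmentLocation value in LIST
# should have a matching file in ORIGINALS_DIR.
# Prints each missing file name and returns 1 if any are missing.
check_pointers() {
  local list="$1" originals_dir="$2" name missing=0
  # "|| [[ -n $name ]]" also processes a final line with no trailing newline.
  while IFS= read -r name || [[ -n "$name" ]]; do
    name="${name%$'\r'}"             # tolerate CRLF line endings from exports
    [[ -z "$name" ]] && continue     # skip blank lines
    if [[ ! -f "$originals_dir/$name" ]]; then
      printf 'missing: %s\n' "$name"
      missing=1
    fi
  done < "$list"
  return "$missing"
}

# Example:
# check_pointers attachment_locations.txt /mnt/images/beaty/originals
```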

Given this, your plan to run data integrity checks is an implementation of best practices that we fully encourage and endorse. You can use the Specify APIs (or use your direct SQL access) to automate these checks and provide precisely the kind of verification and peace of mind that your curators need. If you develop a workflow that may be useful for other users or if you have any questions, please let us know!

Best,
Grant