Linking Attachments via SQL

:book: This guide is for those who need to perform a large-scale import of attachments and have access (or a colleague with access) to the server backend.

See also: Uploading Attachments via API

Recap: How Specify Links an Attachment to a File

Before you begin, make sure to review this documentation on how Specify 7 works with the Web Asset Server:

When you upload an attachment in the Specify 7 interface, it doesn’t store the file in the database. Instead, it performs two key actions:

  1. Stores the File: The file is copied to a designated folder on the Asset Server, and its filename is often changed to a unique string to prevent conflicts.
  2. Creates Database Records: Metadata about the file is saved in the database. This includes records in the attachment table (containing the new unique filename, original filename, MIME type, etc.) and a “joining table” record (like collectionobjectattachment) that links the attachment to a specific data record (like a Collection Object).

The link between the database record and the physical file is the attachmentLocation field in the attachment table. Specify uses this value to construct a URL and request the file from the Asset Server.


The Asset Server File System

For more context, see:

If you navigate to the directory on your server where assets are stored, you will typically find two subdirectories: originals and thumbnails.

  • originals: This folder stores the full-resolution, unmodified files that were uploaded. You can find a specific file here by searching for the filename listed in the attachmentLocation field in your database.
  • thumbnails: This folder caches the smaller, preview-sized images. You do not need to create these manually. Thumbnails are generated on-demand the first time a user requests a specific size. If a thumbnail of that size already exists, the server sends the cached version; otherwise, it creates it, saves it to this directory, and then sends it.

Mass Importing Attachments via SQL

This section directly addresses your scenario: performing a large-scale import and linking of attachments using CSV files and direct server/database access. For a large number of records , this backend method is vastly more efficient than using the WorkBench or any other frontend tool.

[!warning] Backup Your Database First
This procedure involves direct modification of your database. A mistake could lead to data corruption. Before you begin, please ensure you have a complete and recent backup of your Specify database.

Prerequisites

  1. Server Access: You (or your IT colleague) will need shell or SFTP access to the Asset Server to upload files.

  2. Database Access: You’ll need a SQL client (like DBeaver) with credentials to write to your Specify database.

  3. Your Data Map (CSV): You will create a single CSV file that maps each unique filename to the specific CollectionObjectID of the record it should be attached to. Using CollectionObjectID is crucial because it’s a unique identifier, unlike CatalogNumber which can be duplicated across different collections.

  4. Your AgentID: You need your AgentID to attribute the new records. You can find it with this SQL query:

    SELECT
      a.AgentID,
      CONCAT(a.FirstName, ' ', a.LastName) AS AgentName,
      d.Name                         AS DivisionName,
      GROUP_CONCAT(DISTINCT c.CollectionName
                   ORDER BY c.CollectionName
                   SEPARATOR ', ')   AS CollectionsUnderDivision
    FROM agent AS a
      JOIN division AS d
        ON a.DivisionID = d.UserGroupScopeId
      JOIN discipline AS ds
        ON ds.DivisionID = d.UserGroupScopeId
      JOIN collection AS c
        ON c.DisciplineID = ds.UserGroupScopeId
    WHERE a.LastName = 'Bentley' -- Change this to your last name or add another criteria
    GROUP BY
      a.AgentID,
      d.Name;
    

    The result:

    AgentID AgentName DivisionName CollectionsUnderDivision
    1514 Andrew Bentley Ichthyology KU Fish Observation Collection, KU Fish Teaching Collection, KU Fish Tissue Collection, KU Fish Voucher Collection

    The Agent ID for Andrew Bentley in the KU Fish Voucher Collection is 1514. Note that this is specific to the division.


Step 1: Prepare Your Linking Data

Before you can create your CSV map, you need to get the correct CollectionObjectID for each CatalogNumber you want to target. This step is vital to prevent linking attachments to the wrong records, especially if catalog numbers are not globally unique.

You can run the following query to export a list of CollectionObjectIDs and CatalogNumbers for your specific collection. You can get this list from within Specify using the Query Builder as well if desired by adding the ID field for both the collection and collectionobject tables.

-- Export CollectionObjectIDs and CatalogNumbers for a specific collection
SELECT
    co.CollectionObjectID,
    co.CatalogNumber
FROM
    collectionobject co
JOIN
    collection c ON co.CollectionID = c.CollectionID
WHERE
    c.CollectionName = 'KU Fish Voucher Collection'; -- <<-- REPLACE with your exact Collection Name

Export the results of this query as a CSV. You can now use this file (for example, in Excel or Google Sheets using a VLOOKUP formula) to match your list of filenames to the correct, unique CollectionObjectID.

Your final, prepared CSV file (e.g. attachment_import_map.csv) should look like this:

filename,collectionobjectid
KUI-1138a.jpg,83451
KUI-1138b.jpg,83451
KUI-1139.jpg,83452
KUI-1140_notes.pdf,83453
KUI-1141.jpg,84001
KUI-1142a.jpg,84002
KUI-1142b.jpg,84002

[!tip]
If you’d like to add custom titles for your files or other metadata, you can add a column here and adjust the instructions below to insert that as well.


Step 2: Upload Attachments & Import CSV

  1. Upload Files: Have your IT colleague place all your image and document files into the attachments/originals directory on the asset server. The filenames must exactly match those in your attachment_import_map.csv file.

  2. Import CSV to Database: Using your SQL client’s import tool, upload your attachment_import_map.csv into a new, temporary table in your database with the same name.

    This can be done using something like DBeaver, for example:


Step 3: Create attachment Records in the Database

Now, run this SQL query to create the base records in the main attachment table for every file you uploaded. You need to first find the following:

[!help] Table ID

You must find the Table ID for the table you wish to add attachments to. The Table ID for any given table can be found using the Specify Data Model Viewer from within any Specify 7 instance:

Simply find the Table ID for the table you wish to add attachments to (e.g. collectionobject has a Table ID of 1).

[!help] Scope Type

For the ScopeType field, you must to set the appropriate scoping for the attachment. This needs to be based on the scope described in the Specify Institutional Scopings article, under the heading Full Table Scoping.

COLLECTION = 0
DISCIPLINE = 1
DIVISION = 2
INSTITUTION = 3

If I were adding attachments for Collection Object, I would set the ScopeType = 0.
If I were adding attachments to Reference Work, I would set the ScopeType = 3.

[!help] Collection ID

You need to grab the CollectionID for the appropriate collection and add it to the script below:

SELECT CollectionID, CollectionName FROM collection;

In my example, the Collection ID for KU Fish Voucher Collection is 4.

-- This query reads from your temporary table and creates the base attachment records.
INSERT INTO attachment (
    TimestampCreated,
    TimestampModified,
    Version,
    CreatedByAgentID,
    MimeType,       -- IMPORTANT: Adjust if you have mixed MIME types (e.g., 'application/pdf')
    OrigFilename,
    AttachmentLocation,
    TableID,
    GUID,
    ScopeID,
    ScopeType,
    Title,
    IsPublic
)
SELECT
    NOW(),
    NOW(),
    1,
    1514, -- <<-- REPLACE with your AgentID
    'image/jpeg',
    t.filename,
    t.filename,
    1, -- <<-- REPLACE with the Table ID
    UUID(),
    4, -- <<-- REPLACE with your Collection ID
    0, -- <<-- REPLACE with your Scope Type
    t.filename,
    1 -- <<-- Replace with 0 if these should not be public
    
FROM
    `attachment_import_map.csv` t;

[!help] Default MIME Types
Ensure each attachment in your system uses the correct MIME type.
See MDN’s guide to Common media types for a full reference.

Common MIME Types

File Type MIME Type
JPEG image image/jpeg
GIF image image/gif
PNG image image/png
TIFF image image/tiff
WebP image image/webp
BMP image image/bmp
Plain text text/plain
PDF document application/pdf
ZIP archive application/zip
Microsoft Word application/vnd.openxmlformats-officedocument.wordprocessingml.document
Microsoft Excel application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
WAV audio audio/wav
MP4 video video/mp4
WebM video video/webm

Step 4: Link Attachments to Collection Objects

This final query creates the link between the new attachment records and the correct collectionobject records using the unique CollectionObjectID from your map.

-- This query links the new attachments to their corresponding collection objects.
INSERT INTO collectionobjectattachment (
    TimestampCreated,
    TimestampModified,
    Version,
    CollectionMemberID,
    CreatedByAgentID,
    ModifiedByAgentID,
    AttachmentID,
    CollectionObjectID,
    Ordinal
)
SELECT
    NOW(),
    NOW(),
    1,
    co.CollectionID,    -- Correctly and dynamically gets the CollectionID from the object.
    1514,               -- <<-- REPLACE with your AgentID
    1514,               -- <<-- REPLACE with your AgentID
    a.AttachmentID,
    t.collectionobjectid,
    0                   -- Sets display order. Can be customized if needed.
FROM
    attachment a
JOIN
    `attachment_import_map.csv` t ON a.AttachmentLocation = t.filename
JOIN
    collectionobject co ON t.collectionobjectid = co.CollectionObjectID
WHERE
    -- This safety check ensures you only link attachments created very recently. Modify if there has been more time between steps.
    a.TimestampCreated > (NOW() - INTERVAL 1 HOUR);

After running these queries, your attachments are fully imported and linked. You can verify this by navigating to one of the records in the Specify 7 interface. Thumbnails will be generated automatically when you view them for the first time.

2 Likes

To be clear, this is simply the ID of the target table for the attachment, correct? I’m a tiny bit confused because this guide uses CO as the example, but many tables have more complex relationships (i.e., my goal is to create FieldNoteBookPageAttachments, which need to be linked to FieldNoteBookPage, then FieldNoteBookPage needs to be linked one-to-many to the CO table.)

So, for the first query, to create records (FieldNotebookPageAttachments) for jpegs connected to FieldNotebookPage, the Table ID is 85, the ScopeType is 1 for Discipline, and CollectionID is 4 for Bivalves, and my agentID is 1, I would use:

INSERT INTO attachment (
    TimestampCreated,
    TimestampModified,
    Version,
    CreatedByAgentID,
    MimeType,
    OrigFilename,
    AttachmentLocation,
    TableID,
    GUID,
    ScopeID,
    ScopeType,
    Title,
    IsPublic
)
SELECT
    NOW(),
    NOW(),
    1,
    1,
    'image/jpeg',
    t.filename,
    t.filename,
    85,
    UUID(),
    4,
    1,
    t.filename,
    1
    
FROM
    `attachment_import_map.csv` t;

Following the instructions in Step 4, and replacing CollectionObject with FieldNotebookPage, won’t work cleanly, because the FieldNotebookPage records I want don’t exist yet, I need to create them first, and then link them to CO. I think we can probably figure it out, but some guidance on that front for tables where there is a “linking table” like CollectionObjectCitations or FieldNoteBookPage would be appreciated. Can those tables even have records if not linked to a CO?

Hi @nfshoobs,

Think of CollectionObjectID as more of a TargetTableID. The principle is the same: you always want to link your attachment to the unique, auto-incrementing ID of the record, not a human-readable field like CatalogNumber or PageNumber. So for your script, you’ll be using the FieldNotebookPageID. Consider the table to which you are adding attachments as your “target table,” regardless of its relationships with other tables.

You can create FieldNotebokPage (FNP) records without them being attached to a Collection Object (CO). You should first create these via WorkBench or the data entry forms before proceeding with this guide.

You can link an existing CollectionObject with an existing FieldNotebookPage via Batch Edit or a subsequent SQL script since there is a many-to-one relationship between Collection Object and Field Notebook Page (one FNP per CO). The bridge between the two is stored as a foreign key (FieldNotebookPageID) on the collectionobject record.

In some cases, where you have a dependent one-to-many like CollectionObjectCitations, it cannot exist without a parent CO. You can see this by looking at the data model viewer and seeing if a particular many-to-one relationship is required:

I recommend creating those records using the API, data entry forms, or WorkBench instead of SQL.


The query you already wrote is spot on. You correctly identified the TableID for FieldNotebookPage (85), ScopeType (1 for Discipline), and ScopeID (4 for your ‘Bivalves’ collection).


You’ll just need to link the attachments to the FNP pages you created:

INSERT INTO fieldnotebookpageattachment (
    TimestampCreated,
    Version,
    CreatedByAgentID,
    AttachmentID,
    FieldNotebookPageID,
    Ordinal
)
SELECT
    NOW(),
    1,
    1, -- <<-- REPLACE with your AgentID
    a.AttachmentID,
    fnp.FieldNotebookPageID,
    0
FROM
    attachment a
JOIN
    `attachment_import_map.csv` t ON a.AttachmentLocation = t.filename
JOIN
    fieldnotebookpage fnp ON t.pagenumber = fnp.PageNumber
WHERE
    a.TimestampCreated > (NOW() - INTERVAL 1 HOUR);

You could use an UPDATE statement to tell each CollectionObject which FieldNotebookPage belongs to it, but Batch Edit is safer!

1 Like

Great! Sorry, it’s hard to keep straight what is and isn’t possible in the workbench and batch edit since there have been so many expansions in capabilities recently (and concomitantly, so many bugs that prevent certain specific actions :wink:).

I was able to create those FieldNotebookPage records via workbench yesterday, and we ran the SQL statements successfully today, all 69731 records now have a link to one of 2792 FNP records, and all FNP records have an attachment linked.

  • One issue I’ve noticed is that the new attachments do not appear to be present in the “Attachments” tab in the sidebar.

  • Also, the Attachments page seems to have lost the option to sort by date added, weirdly? It used to sort by date added by default, until a few minutes ago when that option disappeared.

  • Slightly off topic, but is it safe to bulk delete the deprecated existing CollectionObjectAttachments by SQL query via the attachmentID and remove the files from the server by deleting based on AttachmentLocation? Or is it better to do that via the API or something?

EDIT: we ended up doing the deletion with this statement:

delete FROM invertebrates.attachment
where AttachmentLocation in (select filename from invertebrates._temp_attachments_to_remove tatr)

and then manually deleting the corresponding deprecated files from the filesystem. It seems to have worked without issues. Still no sign of the FieldNotebookPageAttachments on the “Attachments” tab in the sidebar, though they appear in the individual records where they should.