Find All Columns/Fields with Data in a Specify database

:book: This guide describes how to find all fields containing data in a Specify database using a SQL script run behind-the-scenes by an IT administrator.

[!danger] Read Me First
These steps should only be performed by an IT administrator. Before running any scripts on a database, always make sure to create a complete backup of the database.

To find all of the columns in Specify that contain data, it is necessary to run a script behind-the-scenes. The following procedure can be used to find all columns that contain data as well as return the user-assigned field captions for a given column.

What does this procedure do?

  1. Switch into your target database and choose a custom statement delimiter.
  2. Drop any existing version of the procedure (to avoid “already exists” errors).
  3. Define the new procedure signature, accepting:
    in_schema – the database/schema name
    in_tables – a comma-separated list of tables to scan
  4. Declare all local variables, handlers, and a single cursor up front. The cursor iterates every (table_name, column_name) pair for your chosen tables, but only those base tables with at least one row.
  5. Create a temporary table (tmp_results) to hold the output rows: (table_name, column_name, field_label).
  6. Open the cursor and loop through each table + column:
    a. Fetch one (v_table, v_column) at a time.
    b. If no more rows, exit the loop.
  7. For each column, dynamically count how many non-NULL values it contains.
  8. If the count is > 0:
    a. Look up the user-assigned label in splocaleitemstr.Text (matching table name and column name).
    b. If no label is found, fall back to the raw column name.
    c. Insert (table_name, column_name, field_label) into the temporary results.
  9. Close the cursor once all columns have been checked.
  10. Select and return every row from the temporary results table, ordered by table and column.
  11. Drop the temporary table and end the procedure.
USE db_name;

DELIMITER $$

DROP PROCEDURE IF EXISTS get_nonempty_fields$$
CREATE PROCEDURE get_nonempty_fields(
  IN in_schema VARCHAR(64),
  IN in_tables TEXT    -- comma‐list e.g. 'Accession,AccessionAgent,Address,...'
)
BEGIN
  -- 1) all DECLAREs first
  DECLARE done       INT DEFAULT 0;
  DECLARE v_table    VARCHAR(64);
  DECLARE v_column   VARCHAR(64);
  DECLARE v_count    INT;
  DECLARE v_label    VARCHAR(255);
  DECLARE v_sql      TEXT;

  -- one cursor over every (table,column)
  DECLARE col_cur CURSOR FOR
    SELECT t.table_name, c.column_name
      FROM information_schema.tables AS t
      JOIN information_schema.columns AS c
        ON c.table_schema = t.table_schema
       AND c.table_name   = t.table_name
     WHERE t.table_schema = in_schema
       AND FIND_IN_SET(t.table_name, in_tables)
       AND t.table_type   = 'BASE TABLE'
       AND t.table_rows  >  0
     ORDER BY t.table_name, c.ordinal_position;

  DECLARE CONTINUE HANDLER FOR NOT FOUND
    SET done = 1;

  -- 2) temp results
  CREATE TEMPORARY TABLE IF NOT EXISTS tmp_results (
    table_name  VARCHAR(64),
    column_name VARCHAR(64),
    field_label VARCHAR(255),
    PRIMARY KEY(table_name, column_name)
  ) ENGINE=Memory;
  TRUNCATE tmp_results;

  -- 3) loop
  OPEN col_cur;
  column_loop: LOOP
    FETCH col_cur INTO v_table, v_column;
    IF done = 1 THEN
      LEAVE column_loop;
    END IF;

    -- count non-NULLs, storing into @tmp_cnt
    SET v_sql = CONCAT(
      'SELECT COUNT(*) INTO @tmp_cnt ',
      'FROM `', in_schema,'`.`',v_table,'` ',
      'WHERE `',v_column,'` IS NOT NULL'
    );
    PREPARE stmt_count FROM v_sql;
    EXECUTE stmt_count;
    DEALLOCATE PREPARE stmt_count;
    SET v_count = @tmp_cnt;

    IF v_count > 0 THEN
      -- lookup user‐label or fallback
		SELECT COALESCE(
		  (
		    SELECT GROUP_CONCAT(
		             CONCAT(sps.`Text`, ' (', d.Name, ')')
		             ORDER BY d.UserGroupScopeId
		             SEPARATOR ' | '
		           )
		      FROM splocaleitemstr      AS sps
		      JOIN splocalecontaineritem AS slci
		        ON sps.SpLocaleContainerItemNameID = slci.SpLocaleContainerItemID
		      JOIN splocalecontainer    AS slc
		        ON slci.SpLocaleContainerID      = slc.SpLocaleContainerID
		      JOIN discipline            AS d
		        ON slc.DisciplineID              = d.UserGroupScopeId
		     WHERE slci.Name  = v_column
		       AND slc.Name   = v_table
		  ),
		  v_column
		)
		INTO v_label;

      INSERT IGNORE INTO tmp_results
        (table_name, column_name, field_label)
      VALUES
        (v_table, v_column, v_label);
    END IF;
  END LOOP column_loop;
  CLOSE col_cur;

  -- 4) final output
  SELECT
    table_name,
    column_name,
    field_label
  FROM tmp_results
  ORDER BY table_name, column_name;

  DROP TEMPORARY TABLE tmp_results;
END$$

DELIMITER ;

You can call this procedure by running the following:

CALL get_nonempty_fields(
  'db_name',
  'Accession,AccessionAgent,AccessionAttachment,AccessionAuthorization,AccessionCitations,Address,AddressOfRecord,Agent,AgentAttachment,AgentGeography,AgentIdentifier,AgentSpecialty,AgentVariant,Appraisal,Attachment,AttachmentImageAttribute,AttachmentMetadata,AttachmentTag,AttributeDefinition,Author,AutoNumberingScheme,Borrow,BorrowAgent,BorrowAttachment,BorrowMaterial,BorrowReturnMaterial,CollectingEvent,CollectingEventAttachment,CollectingEventAttribute,CollectingEventAttribute,CollectingEventAuthorization,CollectingTrip,CollectingTripAttachment,CollectingTripAttribute,CollectingTripAuthorization,CollectionObject,CollectionObjectAttachment,CollectionObjectAttribute,CollectionObjectAttribute,CollectionObjectCitation,CollectionObjectProperty,CollectionRelationshipType,CollectionRelationship,Collector,CommonName,CommonNameTxCitation,ConservatorDescription,ConservatorDescriptionAttachment,ConservatorEvent,ConservatorEventAttachment,Container,DNAPrimer,DNASequence,DNASequenceAttachment,DNASequencingRun,DNASequencingRunAttachment,DNASequencingRunCitation,DataType,Deaccession,DeaccessionAgent,DeaccessionAttachment,Determination,DeterminationCitation,Determiner,Discipline,Disposal,DisposalAgent,DisposalAttachment,DisposalPreparation,Division,ExchangeIn,ExchangeInAttachment,ExchangeInPrep,ExchangeOut,ExchangeOutAttachment,ExchangeOutPrep,Exsiccata,ExsiccataItem,Extractor,FieldNotebook,FieldNotebookAttachment,FieldNotebookPage,FieldNotebookPageAttachment,FieldNotebookPageSet,FieldNotebookPageSetAttachment,FundingAgent,GeoCoordDetail,Geography,GeographyTreeDefinition,GeographyTreeDefinitionItem,Chronostratigraphy,GeologicalTimePeriodTreeDefinition,GeologicalTimePeriodTreeDefinitionItem,Gift,GiftAgent,GiftAttachment,GiftPreparation,GroupPerson,InformationRequest,Institution,InstitutionNetwork,Journal,LatLonPolygon,LatLonPolygonPnt,Lithostratigraphy,LithostratigraphyTreeDefinition,LithostratigraphyTreeDefinitionItem,Loan,LoanAgent,LoanAttachment,LoanPreparation,LoanReturnPreparation,Locality,LocalityAttachment,LocalityCitation,LocalityDetail,LocalityNameAlias,MaterialSample,MorphBankView,OtherIdentifier,PaleoContext,PcrPersons,Permit,PermitAttachment,PickList,PickListItem,PreparationType,Preparation,PreparationAttachment,PreparationAttribute,PreparationAttribute,PreparationProperty,Project,RecordSet,RecordSetItem,ReferenceWork,ReferenceWorkAttachment,RepositoryAgreement,RepositoryAgreementAttachment,Shipment,Storage,StorageAttachment,StorageTreeDefinition,StorageTreeDefinitionItem,Taxon,TaxonAttachment,TaxonAttribute,TaxonCitation,TaxonTreeDefinition,TaxonTreeDefinitionItem,TreatmentEvent,TreatmentEventAttachment'
);

The full list of tables searched is here, but this should be modified to add new tables in the future. This list is not comprehensive to prevent unnecessary system tables from being included:

Accession
AccessionAgent
AccessionAttachment
AccessionAuthorization
AccessionCitations
Address
AddressOfRecord
Agent
AgentAttachment
AgentGeography
AgentIdentifier
AgentSpecialty
AgentVariant
Appraisal
Attachment
AttachmentImageAttribute
AttachmentMetadata
AttachmentTag
AttributeDefinition
Author
AutoNumberingScheme
Borrow
BorrowAgent
BorrowAttachment
BorrowMaterial
BorrowReturnMaterial
Collection
CollectingEvent
CollectingEventAttachment
CollectingEventAttribute
CollectingEventAttribute
CollectingEventAuthorization
CollectingTrip
CollectingTripAttachment
CollectingTripAttribute
CollectingTripAuthorization
CollectionObject
CollectionObjectAttachment
CollectionObjectAttribute
CollectionObjectAttribute
CollectionObjectCitation
CollectionObjectProperty
CollectionRelationshipType
CollectionRelationship
Collector
CommonName
CommonNameTxCitation
ConservatorDescription
ConservatorDescriptionAttachment
ConservatorEvent
ConservatorEventAttachment
Container
DNAPrimer
DNASequence
DNASequenceAttachment
DNASequencingRun
DNASequencingRunAttachment
DNASequencingRunCitation
DataType
Deaccession
DeaccessionAgent
DeaccessionAttachment
Determination
DeterminationCitation
Determiner
Discipline
Disposal
DisposalAgent
DisposalAttachment
DisposalPreparation
Division
ExchangeIn
ExchangeInAttachment
ExchangeInPrep
ExchangeOut
ExchangeOutAttachment
ExchangeOutPrep
Exsiccata
ExsiccataItem
Extractor
FieldNotebook
FieldNotebookAttachment
FieldNotebookPage
FieldNotebookPageAttachment
FieldNotebookPageSet
FieldNotebookPageSetAttachment
FundingAgent
GeoCoordDetail
Geography
GeographyTreeDefinition
GeographyTreeDefinitionItem
Chronostratigraphy
GeologicalTimePeriodTreeDefinition
GeologicalTimePeriodTreeDefinitionItem
Gift
GiftAgent
GiftAttachment
GiftPreparation
GroupPerson
InformationRequest
Institution
InstitutionNetwork
Journal
LatLonPolygon
LatLonPolygonPnt
Lithostratigraphy
LithostratigraphyTreeDefinition
LithostratigraphyTreeDefinitionItem
Loan
LoanAgent
LoanAttachment
LoanPreparation
LoanReturnPreparation
Locality
LocalityAttachment
LocalityCitation
LocalityDetail
LocalityNameAlias
MaterialSample
MorphBankView
OtherIdentifier
PaleoContext
PcrPersons
Permit
PermitAttachment
PickList
PickListItem
PreparationType
Preparation
PreparationAttachment
PreparationAttribute
PreparationAttribute
PreparationProperty
Project
RecordSet
RecordSetItem
ReferenceWork
ReferenceWorkAttachment
RepositoryAgreement
RepositoryAgreementAttachment
Shipment
Storage
StorageAttachment
StorageTreeDefinition
StorageTreeDefinitionItem
Taxon
TaxonAttachment
TaxonAttribute
TaxonCitation
TaxonTreeDefinition
TaxonTreeDefinitionItem
TreatmentEvent
TreatmentEventAttachment

This produces the following for all tables and fields that contain data:

  • table_name: This is the name of the table
  • column_name: This is the name of the column at the database level
  • field_label: This is the caption assigned in the Schema Config for that field, with the discipline name after if it exists.

Note that in some cases multiple schema records exist for a single field.

table_name column_name field_label
collectionobjectattribute Text1 Sex (Herpetology) | Sex (Ichthyology)
collectionobjectattribute Text14 Collector\n Number Remarks (Herpetology) | Collector\n Number Remarks (Ichthyology)
collectionobjectattribute Text16 Collector\n Number (Herpetology) | Collector\n Number (Ichthyology)
collectionobjectattribute Text17 Sex\n Remarks (Herpetology) | Sex\n Remarks (Ichthyology)
collectionobjectattribute Text18 Observation Behaviour (Herpetology) | Behaviour (Ichthyology)
collectionobjectattribute Text19 Number of Individuals Remarks (Herpetology) | Tissue Preparation Remarks (Ichthyology)
collectionobjectattribute Text2 Age/Stage\n Remarks (Herpetology) | Age/Stage\n Remarks (Ichthyology)
collectionobjectattribute Text22 Additional Measurements (Herpetology) | Additional Measurements (Ichthyology)
collectionobjectattribute Text26 Other Catalogue\n Number (Herpetology) | Other Catalogue\n Number (Ichthyology)
collectionobjectattribute Text27 Parasite (Herpetology) | Parasite (Ichthyology)
collectionobjectattribute Text28 Other Catalogue\n Number Remarks (Herpetology) | Other Catalogue\n Number Remarks (Ichthyology)
collectionobjectattribute Text29 text29 (Herpetology) | Host Remarks (Ichthyology)

Once you have this information, you can build a query in Specify from the relevant base tables (e.g. Collection Object, Taxon, Storage, etc.) which can be used as the basis of a full data export from the system.

Calling the procedure again in the future will update the results, ensuring that any data added to previously empty columns will be returned as well.

1 Like