This guide describes how to find all fields containing data in a Specify database using a SQL script run behind-the-scenes by an IT administrator.
[!danger] Read Me First
These steps should only be performed by an IT administrator. Before running any scripts on a database, always make sure to create a complete backup of the database.
To find all of the columns in Specify that contain data, it is necessary to run a script behind-the-scenes. The following procedure can be used to find all columns that contain data as well as return the user-assigned field captions for a given column.
What does this procedure do?
- Switch into your target database and choose a custom statement delimiter.
- Drop any existing version of the procedure (to avoid “already exists” errors).
- Define the new procedure signature, accepting:
•in_schema– the database/schema name
•in_tables– a comma-separated list of tables to scan - Declare all local variables, handlers, and a single cursor up front. The cursor iterates every
(table_name, column_name)pair for your chosen tables, but only those base tables with at least one row. - Create a temporary table (
tmp_results) to hold the output rows:(table_name, column_name, field_label). - Open the cursor and loop through each table + column:
a. Fetch one(v_table, v_column)at a time.
b. If no more rows, exit the loop. - For each column, dynamically count how many non-NULL values it contains.
- If the count is > 0:
a. Look up the user-assigned label insplocaleitemstr.Text(matching table name and column name).
b. If no label is found, fall back to the raw column name.
c. Insert(table_name, column_name, field_label)into the temporary results. - Close the cursor once all columns have been checked.
- Select and return every row from the temporary results table, ordered by table and column.
- Drop the temporary table and end the procedure.
USE db_name;
DELIMITER $$
DROP PROCEDURE IF EXISTS get_nonempty_fields$$
CREATE PROCEDURE get_nonempty_fields(
IN in_schema VARCHAR(64),
IN in_tables TEXT -- comma‐list e.g. 'Accession,AccessionAgent,Address,...'
)
BEGIN
-- 1) all DECLAREs first
DECLARE done INT DEFAULT 0;
DECLARE v_table VARCHAR(64);
DECLARE v_column VARCHAR(64);
DECLARE v_count INT;
DECLARE v_label VARCHAR(255);
DECLARE v_sql TEXT;
-- one cursor over every (table,column)
DECLARE col_cur CURSOR FOR
SELECT t.table_name, c.column_name
FROM information_schema.tables AS t
JOIN information_schema.columns AS c
ON c.table_schema = t.table_schema
AND c.table_name = t.table_name
WHERE t.table_schema = in_schema
AND FIND_IN_SET(t.table_name, in_tables)
AND t.table_type = 'BASE TABLE'
AND t.table_rows > 0
ORDER BY t.table_name, c.ordinal_position;
DECLARE CONTINUE HANDLER FOR NOT FOUND
SET done = 1;
-- 2) temp results
CREATE TEMPORARY TABLE IF NOT EXISTS tmp_results (
table_name VARCHAR(64),
column_name VARCHAR(64),
field_label VARCHAR(255),
PRIMARY KEY(table_name, column_name)
) ENGINE=Memory;
TRUNCATE tmp_results;
-- 3) loop
OPEN col_cur;
column_loop: LOOP
FETCH col_cur INTO v_table, v_column;
IF done = 1 THEN
LEAVE column_loop;
END IF;
-- count non-NULLs, storing into @tmp_cnt
SET v_sql = CONCAT(
'SELECT COUNT(*) INTO @tmp_cnt ',
'FROM `', in_schema,'`.`',v_table,'` ',
'WHERE `',v_column,'` IS NOT NULL'
);
PREPARE stmt_count FROM v_sql;
EXECUTE stmt_count;
DEALLOCATE PREPARE stmt_count;
SET v_count = @tmp_cnt;
IF v_count > 0 THEN
-- lookup user‐label or fallback
SELECT COALESCE(
(
SELECT GROUP_CONCAT(
CONCAT(sps.`Text`, ' (', d.Name, ')')
ORDER BY d.UserGroupScopeId
SEPARATOR ' | '
)
FROM splocaleitemstr AS sps
JOIN splocalecontaineritem AS slci
ON sps.SpLocaleContainerItemNameID = slci.SpLocaleContainerItemID
JOIN splocalecontainer AS slc
ON slci.SpLocaleContainerID = slc.SpLocaleContainerID
JOIN discipline AS d
ON slc.DisciplineID = d.UserGroupScopeId
WHERE slci.Name = v_column
AND slc.Name = v_table
),
v_column
)
INTO v_label;
INSERT IGNORE INTO tmp_results
(table_name, column_name, field_label)
VALUES
(v_table, v_column, v_label);
END IF;
END LOOP column_loop;
CLOSE col_cur;
-- 4) final output
SELECT
table_name,
column_name,
field_label
FROM tmp_results
ORDER BY table_name, column_name;
DROP TEMPORARY TABLE tmp_results;
END$$
DELIMITER ;
You can call this procedure by running the following:
CALL get_nonempty_fields(
'db_name',
'Accession,AccessionAgent,AccessionAttachment,AccessionAuthorization,AccessionCitations,Address,AddressOfRecord,Agent,AgentAttachment,AgentGeography,AgentIdentifier,AgentSpecialty,AgentVariant,Appraisal,Attachment,AttachmentImageAttribute,AttachmentMetadata,AttachmentTag,AttributeDefinition,Author,AutoNumberingScheme,Borrow,BorrowAgent,BorrowAttachment,BorrowMaterial,BorrowReturnMaterial,CollectingEvent,CollectingEventAttachment,CollectingEventAttribute,CollectingEventAttribute,CollectingEventAuthorization,CollectingTrip,CollectingTripAttachment,CollectingTripAttribute,CollectingTripAuthorization,CollectionObject,CollectionObjectAttachment,CollectionObjectAttribute,CollectionObjectAttribute,CollectionObjectCitation,CollectionObjectProperty,CollectionRelationshipType,CollectionRelationship,Collector,CommonName,CommonNameTxCitation,ConservatorDescription,ConservatorDescriptionAttachment,ConservatorEvent,ConservatorEventAttachment,Container,DNAPrimer,DNASequence,DNASequenceAttachment,DNASequencingRun,DNASequencingRunAttachment,DNASequencingRunCitation,DataType,Deaccession,DeaccessionAgent,DeaccessionAttachment,Determination,DeterminationCitation,Determiner,Discipline,Disposal,DisposalAgent,DisposalAttachment,DisposalPreparation,Division,ExchangeIn,ExchangeInAttachment,ExchangeInPrep,ExchangeOut,ExchangeOutAttachment,ExchangeOutPrep,Exsiccata,ExsiccataItem,Extractor,FieldNotebook,FieldNotebookAttachment,FieldNotebookPage,FieldNotebookPageAttachment,FieldNotebookPageSet,FieldNotebookPageSetAttachment,FundingAgent,GeoCoordDetail,Geography,GeographyTreeDefinition,GeographyTreeDefinitionItem,Chronostratigraphy,GeologicalTimePeriodTreeDefinition,GeologicalTimePeriodTreeDefinitionItem,Gift,GiftAgent,GiftAttachment,GiftPreparation,GroupPerson,InformationRequest,Institution,InstitutionNetwork,Journal,LatLonPolygon,LatLonPolygonPnt,Lithostratigraphy,LithostratigraphyTreeDefinition,LithostratigraphyTreeDefinitionItem,Loan,LoanAgent,LoanAttachment,LoanPreparation,LoanReturnPreparation,Locality,LocalityAttachment,LocalityCitation,LocalityDetail,LocalityNameAlias,MaterialSample,MorphBankView,OtherIdentifier,PaleoContext,PcrPersons,Permit,PermitAttachment,PickList,PickListItem,PreparationType,Preparation,PreparationAttachment,PreparationAttribute,PreparationAttribute,PreparationProperty,Project,RecordSet,RecordSetItem,ReferenceWork,ReferenceWorkAttachment,RepositoryAgreement,RepositoryAgreementAttachment,Shipment,Storage,StorageAttachment,StorageTreeDefinition,StorageTreeDefinitionItem,Taxon,TaxonAttachment,TaxonAttribute,TaxonCitation,TaxonTreeDefinition,TaxonTreeDefinitionItem,TreatmentEvent,TreatmentEventAttachment'
);
The full list of tables searched is here, but this should be modified to add new tables in the future. This list is not comprehensive to prevent unnecessary system tables from being included:
Accession
AccessionAgent
AccessionAttachment
AccessionAuthorization
AccessionCitations
Address
AddressOfRecord
Agent
AgentAttachment
AgentGeography
AgentIdentifier
AgentSpecialty
AgentVariant
Appraisal
Attachment
AttachmentImageAttribute
AttachmentMetadata
AttachmentTag
AttributeDefinition
Author
AutoNumberingScheme
Borrow
BorrowAgent
BorrowAttachment
BorrowMaterial
BorrowReturnMaterial
Collection
CollectingEvent
CollectingEventAttachment
CollectingEventAttribute
CollectingEventAttribute
CollectingEventAuthorization
CollectingTrip
CollectingTripAttachment
CollectingTripAttribute
CollectingTripAuthorization
CollectionObject
CollectionObjectAttachment
CollectionObjectAttribute
CollectionObjectAttribute
CollectionObjectCitation
CollectionObjectProperty
CollectionRelationshipType
CollectionRelationship
Collector
CommonName
CommonNameTxCitation
ConservatorDescription
ConservatorDescriptionAttachment
ConservatorEvent
ConservatorEventAttachment
Container
DNAPrimer
DNASequence
DNASequenceAttachment
DNASequencingRun
DNASequencingRunAttachment
DNASequencingRunCitation
DataType
Deaccession
DeaccessionAgent
DeaccessionAttachment
Determination
DeterminationCitation
Determiner
Discipline
Disposal
DisposalAgent
DisposalAttachment
DisposalPreparation
Division
ExchangeIn
ExchangeInAttachment
ExchangeInPrep
ExchangeOut
ExchangeOutAttachment
ExchangeOutPrep
Exsiccata
ExsiccataItem
Extractor
FieldNotebook
FieldNotebookAttachment
FieldNotebookPage
FieldNotebookPageAttachment
FieldNotebookPageSet
FieldNotebookPageSetAttachment
FundingAgent
GeoCoordDetail
Geography
GeographyTreeDefinition
GeographyTreeDefinitionItem
Chronostratigraphy
GeologicalTimePeriodTreeDefinition
GeologicalTimePeriodTreeDefinitionItem
Gift
GiftAgent
GiftAttachment
GiftPreparation
GroupPerson
InformationRequest
Institution
InstitutionNetwork
Journal
LatLonPolygon
LatLonPolygonPnt
Lithostratigraphy
LithostratigraphyTreeDefinition
LithostratigraphyTreeDefinitionItem
Loan
LoanAgent
LoanAttachment
LoanPreparation
LoanReturnPreparation
Locality
LocalityAttachment
LocalityCitation
LocalityDetail
LocalityNameAlias
MaterialSample
MorphBankView
OtherIdentifier
PaleoContext
PcrPersons
Permit
PermitAttachment
PickList
PickListItem
PreparationType
Preparation
PreparationAttachment
PreparationAttribute
PreparationAttribute
PreparationProperty
Project
RecordSet
RecordSetItem
ReferenceWork
ReferenceWorkAttachment
RepositoryAgreement
RepositoryAgreementAttachment
Shipment
Storage
StorageAttachment
StorageTreeDefinition
StorageTreeDefinitionItem
Taxon
TaxonAttachment
TaxonAttribute
TaxonCitation
TaxonTreeDefinition
TaxonTreeDefinitionItem
TreatmentEvent
TreatmentEventAttachment
This produces the following for all tables and fields that contain data:
table_name: This is the name of the tablecolumn_name: This is the name of the column at the database levelfield_label: This is the caption assigned in the Schema Config for that field, with the discipline name after if it exists.
Note that in some cases multiple schema records exist for a single field.
| table_name | column_name | field_label |
|---|---|---|
| collectionobjectattribute | Text1 | Sex (Herpetology) | Sex (Ichthyology) |
| collectionobjectattribute | Text14 | Collector\n Number Remarks (Herpetology) | Collector\n Number Remarks (Ichthyology) |
| collectionobjectattribute | Text16 | Collector\n Number (Herpetology) | Collector\n Number (Ichthyology) |
| collectionobjectattribute | Text17 | Sex\n Remarks (Herpetology) | Sex\n Remarks (Ichthyology) |
| collectionobjectattribute | Text18 | Observation Behaviour (Herpetology) | Behaviour (Ichthyology) |
| collectionobjectattribute | Text19 | Number of Individuals Remarks (Herpetology) | Tissue Preparation Remarks (Ichthyology) |
| collectionobjectattribute | Text2 | Age/Stage\n Remarks (Herpetology) | Age/Stage\n Remarks (Ichthyology) |
| collectionobjectattribute | Text22 | Additional Measurements (Herpetology) | Additional Measurements (Ichthyology) |
| collectionobjectattribute | Text26 | Other Catalogue\n Number (Herpetology) | Other Catalogue\n Number (Ichthyology) |
| collectionobjectattribute | Text27 | Parasite (Herpetology) | Parasite (Ichthyology) |
| collectionobjectattribute | Text28 | Other Catalogue\n Number Remarks (Herpetology) | Other Catalogue\n Number Remarks (Ichthyology) |
| collectionobjectattribute | Text29 | text29 (Herpetology) | Host Remarks (Ichthyology) |
Once you have this information, you can build a query in Specify from the relevant base tables (e.g. Collection Object, Taxon, Storage, etc.) which can be used as the basis of a full data export from the system.
Calling the procedure again in the future will update the results, ensuring that any data added to previously empty columns will be returned as well.