WorkBench Uploader

upload_data WorkBench Uploader

The WorkBench is an integrate part of the Specify platform.

Data sets in the Specify WorkBench can be uploaded into Specify 6 with the Uploader tool. This makes the uploader a fast and easy way to transfer data from an MS Excel or CSV file into Specify by first importing the data into the WorkBench.

When a data set is uploaded a record set in Specify will be created using the main table. The record set will be named using the name of the data set as well as the date of the upload.

Guest users are not allowed access to the Upload tool in the default Specify permissions.

Before uploading the data set, please note the following:

  • Check the data set mapping and verify that data set fields are being mapped to the correct fields in Specify 6. When importing MS Excel and CSV files the auto-mapping does not always find the correct fields (especially when using latitude 1, Genus, Species and Author fields) and may require some re-mapping.
  • Check existing data in Specify to verify that data being uploaded will not add a duplicate record because of a typo or abbreviation. One example is Geography. If you have run the Specify Setup Wizard and uploaded the Geography data the United States exists within the Geography table (and tree) as ‘United States’. If your data set includes the United States as anything other than ‘United States’ (for example, ‘US’ or ‘USA’) the Upload function will create a new Geography record (and node on the Geography tree).
  • Columns in imported files that do not map to a specific Specify field should be mapped to an existing Text or Number field. These fields will then need to be renamed accordingly within Specify to display correctly on forms and become searchable in queries. This can be accomplished by changing the field caption in the Schema Configuration tool. These fields can also be renamed for use in future mappings using the WorkBench Schema Configuration tool
  • The data set must have data in fields that are required by Specify 6. For example, if a data set column is mapped to a Taxon field in Specify, all required Taxon fields, including enforced ranks, must be mapped and include data in the data set.
  • At this time the Uploader does not update or change data in existing records, it only adds new records.
  • Any Specify tabs that allow data to be written to the database must be closed when uploading from the WorkBench. In other words, while uploading no user in the Collection can edit Forms, Trees, Tree Definitions, System Configuration or the Schema Configuration. Specify will automatically close any open tabs after prompting the users.
  • In a multi-collection database the WorkBench is unable to verify the data sets for an appropriate collection. It is up to the user to ensure that the data set is being uploaded to the correct Collection.
    Images can be uploaded to the following tables:* Accession, Agent, Borrow, Collecting, Event, Collection, Object, Conservation, Description, Conservation, Event, DNA, Sequence, DNA, Sequencing, Run, Field, Notebook, Field, Notebook, Page, Field, Notebook, Page, Set, Gift, Loan, Locality, Permit, Preparation, Reference, Work, Repository, Taxon

If none of these tables are present in the mapping file for the data set an error message will display informing the user that the image can not be uploaded. Any images not attached to a table in the WorkBench (at the time they are added) will be attached to the ‘main’ table that is present in the data set mapping. The uploader will look for main tables in this order: Collection Object, Accession, Taxon, Collecting Event and then Locality. When more than one of the attachment tables is present in the data set a dialog will display to choose which table to associate with the attachment file.

Note: Images may only be attached to the Agent table when using fields from Agent, or including AgentAddress in the mapping file.

Note: Images may only be attached to Taxon when using fields from the Only Taxon table.

Note: Attachments are not available for tables with multiple field types. For instance, when PrepType1 and Count1 fields are included within the Preparation table images may be attached to the table, but if PrepType2 and Count2 are also added within the Preparation table image attachments are no longer available.

The WorkBench has a limit of 4000 records that can be uploaded at one time. MS Excel files that include more than 4000 records can be separated using the Specify File Slicer application that ships with Specify.

Validate Data

The validation tool compares the data in the data set to the required fields and field formats of the Specify fields to which they are mapped. It then indicates which cells violate requirements, don’t match formats, or will create new data in taxon, geography and agent tables.

When a dataset is opened the validation panel is not automatically displayed. Simply click the validate (validation) icon to show or hide the panel.

validation screen

Validation Panel

Check the Highlight Invalid Cells to turn the validation off and on. This type of validation compares the values in each cell of the WorkBench with the values in Specify fields and highlights cells that contain invalid values in red. The number of invalid cells will be shown in the validation panel.

A tooltip is available within each cell that offers information about the invalid value. Simply hold the mouse over the invalid cell for the tooltip to appear. An example of an invalid value may be a date in an incorrect format or one that does not match the predetermined terms in a Specify Pick List. A more complete list of invalid values is offered in the Data Set Requirement section of this document.

Note: The validation tool can only check one level of hierarchical fields per validation. Column headings that are mapped to trees will be validated from the top of the tree structure. For example, in the Geography tree the validation tool will check the Continent field, then Country, then State, then County and finally City (your geography tree may have a different structure). If your data set contains new data in both country and state cells the validation tool will only alert you to the new data in the Country cells.

Note: The validate function will not clean up or detect bad (misspelled or incorrect) data. For example, it will upload any entry for ‘genus’ even if the data includes both genus and species information, is not a valid name or is misspelt. These should however be highlighted as new data in yellow (see below). The user is responsible for establishing that the data is clean and correct.

And / Or

Check the Hightlight New Records to turn the validation on and off. This type of validation compares the data in each cell of the WorkBench with the data in Specify fields and highlights cells that contain new data in orange. These cells should be checked for spelling and other errors as this may be the only reason these are being highlighted. The number of new data cells will be shown in the validation panel.

An blue highlight indicates that multiple matches have been found in the database e.g. if there is a Fred Smith and Jack Smith in the database and you have Smith mapped to an Agent last name field with no first name the validation tool will not know whether Smith should be matched to Fred or Jack, or be a new Agent.

Note: The WorkBench does not compare the cells within the WorkBench. Therefore, there may be times when multiple instances of the same new data occur in the dataset, but the validation tool will count and display each cell as containing new data. For instance; if the county Douglas is not included in the Geography tree within Specify but is included in 36 data set records, the validation tool will display the information: 36 New Cells, even though Douglas is only one new term.

Note: Validating new cells in large data sets can take some time. If it is necessary to cancel the Highlight New Cells function during a validation simply double click on the status bar.

The buttons updown (up and down) allow you to move between the Invalid or New cells only. You can navigate through each invalid cell using the up and down arrows.

Auto-Fill Auto-Incrementing, Numeric Catalog Numbers

If your database includes numberic catalog numbers that are auto-incrementing, the uploader can auto-fill catalog numbers when uploading data. Simply leave the catalog numbers empty in the WorkBench Data Set and check the Auto-fill Cat # checkbox, located next to the Validate Data tools.

validation screen

Note: if Catalog Number is a required field, and the Auto-fill Cat # box is not checked, the Data Set will not upload and any empty catalog number fields will display as invalid (red) when validating the Data Set.

Data Set Requirement Examples:

Catalog Number

  • Collecting Event and Taxon (not the Taxon Only table) data must be associated with a Catalog Number.

  • The Catalog Number format in the data set must match the field format (including the length) for Catalog Number in Specify.

  • Catalog Numbers that include a numeric sequence that is incremented in Specify can be left blank in the data set and will be created automatically when Uploading into Specify. The new numbers will begin their numbering sequence based on the last number in the database.

  • Catalog Numbers that are not formatted as an incremented number must contain data to upload.

Accession

  • The format of the accession number in the data set must match the field format for Accession Number in Specify.

Locality

  • The Latitude and Longitude must be in one of the accepted formats:

DD.DDDD (32.7619) DD MMMM (32. 45.714) DD MM SS.SS (32 45 42.84) DD.DDDD N/S/E/W (32.7619 N) DD MM.MM N/S/E/W (32 45.714 N) DD MM SS.SS N/S/E/W (32 45 42.84 N)

Formats can be changed using the Lat/Long Converter.

  • Double quotes must be used for seconds (not single quotes).

Taxon, Geography, Lithostratigraphy, Chronostratigraphy and Storage

  • All required ranks must be present. If the ranks below the lowest required rank are present in the data set and contain data, then the lowest required rank must also contain data. If no data is present for ranks below the lowest required rank, then it does not require data to upload.

  • Tree levels (ranks) being uploaded must be present in the Tree Definition in Specify.

  • When uploading only Taxon data (to add to the Taxon tree) use the mapping fields from the Taxon Only data types (in the data set mapping).

  • When uploading Taxon data along with other data the Genus, Species, Subspecies and Variety fields need to come from the Determinations data type (in the data set mapping).

  • If the highest rank in the data set does not match with a rank in the taxon tree new ranks will be created with the name of the data set being uploaded.

Collection Objects, Accessions, Loan

  • Any formats used in the data set must match the format specified for the mapped field in Specify.
  • Any preset values for Pick Lists in Specify must match the values (and format) in the data set or the data will not upload.

Dates

  • Dates can be in any format in the data set and will appear in Specify in the system preference format.
  • Partial Dates are only allowed for Collecting Event Start, Collecting Event End, Cataloged Date, Determination Date and Preparation Date in both the uploader and in the Specify database. Partial dates are uploaded in the US format (month, day, year) and allow zeros or dashes for unknown month or day. Any dashes or zeros for month or day are stored as ‘01’ in the Specify database, which is used for search results.

00 — 1968 (= 1968 in the form but 01 01 1968 in the database) 00-----1968 (= 1968 in the form but 01/01/1968 in the database) 00 00 1999 (= 1999 in the form but 01/01/1999 in the database) 11.00.1989 (= 11/1989 in the form but 11/01/1989 in the database) 00 11 1989 (this is invalid) 11 12 1989 (= 11/12/1989 on the form and in the database) 00/—/2000 (= 2000 in the form but 01/01/2000 in the database) 00 00 04 (= 2004 in the form but 01/01/2004 in the database) 00-00-04 (= 2004 in the form but 01/01/2004 in the database) 04 00 04 (= 04/2004 in the form but 04/01/2004 in the database) 04-00-04 (= 04/2004 in the form but 04/01/2004 in the database) 1989/00/00 (= 1989 in the form but 01/01/1989 in the database) 1999/01/00 (= 1999 in the form but 01/01/1999 in the database) 1999 00 00 (= 1999 in the form but 01/01/1999 in the database) 1999 02 00 (= 02/1999 in the form but 02/01/1989 in the database)

  • Valid separators are ‘.’ (period), ‘/’ (forward slash), ’ ’ (space), and ‘-’ (dash).

Preparations

If the data set does not include a Prep Type field one will not be created automatically.

Pick Lists

Columns in the WorkBench may be mapped to fields that are formatted as pick lists in Specify. In version 6.3 and older values from the data set can be uploaded into an existing pick list. The format of the pick list will determine the requirement for the data being uploaded.

  • Read Only pick lists do not allow new values to be added through the Uploader – they must be added manually in the Pick List Editor. Values in the data set that do not match the values in the existing pick list can not be uploaded. Cells that do not match the predefined pick list values will highlight in red when validated. A tool tip will list the first 13 available pick list values for the field in Specify. To upload the record you must first add the pick list using the Pick List Editor or edit the cell to include one of the available value options.
  • An Entire Table pick list allows new terms to be added to the existing pick list. New values can be uploaded from the data set. When validated the cells containing new values will highlight in yellow.
  • A Field From Table pick list allows new terms to be added to the existing pick list. New values can be uploaded from the data set. When validated the cells containing new values will highlight in yellow.
  • A non-Read Only pick list allows the user to enter new values into the pick list from within the form. New values can be uploaded from the data set. When validated the cells containing new values will NOT highlight in yellow.

Cells that are mapped to fields formatted as pick lists in Specify are also available as pick lists. Simply double click inside the cell and it will change to a pick list. Then click wb_picklist (an arrow) to open the pick list and choose a value.

A tool tip will alert you to the first 13 pick list values for the field in Specify. Simply hold your mouse over the highlighted cell to view the tooltip.

Also, double click on a cell to display a list of valid pick list values. You can then select one of the values from the list.

Upload a Data Set

  1. Open a data set in the Specify 6 WorkBench.
  2. Click the upload_data_20x20(Upload) button found in the Workspace Item Bar. The WorkSpace will split to include the WorkBench Uploader. The WorkBench Uploader includes 3 areas: Affected Tables lists the tables included in the data set file to be uploaded. Messages is used by the Uploader to present progress and other information to the user. Buttons:
    • Validate the data before allowing the upload.
  3. Upload starts the upload.
  4. Settings configures the upload for matching records.
  5. Close will close the Uploader.
  6. If you have not used the Validation tool then click the Validate button to validate the data at this time. This allows the data in the data set to be compared to the required fields and field formats in the Specify fields to which they are mapped. It also ensures that the necessary related fields are present to link multiple tables. When validation errors are present the Uploader will split the Messages window showing validation errors in the top portion and general messages below as well as a View Report button. Click the button to view and print a report of the validation errors. The Upload button will remain inactive until all validation errors have been corrected. Some errors may need to be fixed within the data set and require the user to exit the Uploader tool by clicking the Close button.
  7. Click the Settings button to configure the upload for processing matching records between the data set and the Specify database. It is important to note that when only one record is found in the data set that matches a record in the Specify database, the Uploader does not create a new record, but will use the matched record for any related information. For example, when uploading a Collection Object with Collecting Event data the Collecting Event may already exist in Specify. The Uploader matches the Collecting Event data in the data set to the existing record in Specify, and does not add a new Collecting Event in Specify, but does create a link between the Collection Object being uploaded and the existing Collecting Event in Specify.
  8. Tables lists the tables that are included in the upload. Match Action configures the Uploader response when more than one matching record is found.
    • Prompt will display the matching records in a dialog and require the user to choose which of the matching records to use. A choice to add a new record is also provided.
  9. Add New Record will simply add a new record and not search for matches. This may cause duplicate records in the database.
  10. Pick First will choose the first match found and not prompt the user. A message indicating these will appear in the message portion of the Uploader.
  11. Skip Row will skip the matching data set record.

Remember Choices will treat the first record with matches exactly as a ‘Prompt’ above, but will remember the choice and automatically use it again when identical values are found for succeeding records. Match Empty Cells will search empty fields as part of the search criteria and require a match when set to True and disregard empty fields when set to False.

  1. Click Upload. While the upload is processing, each Table in the Affected Tables window keeps a tally of the number of records updated. The individual records that are updated are listed in the Messages window. The Upload can be paused at any time by clicking the Pause button on the status bar, or stopped by clicking the Pause button and then the Close button. Click the Upload button to start the upload process again. Any uploaded records can be discarded by clicking the Undo button on the Workspace.
  2. Once the upload is complete the data is still not committed to the database. Both Query and Simple Searches can be used to examine the uploaded data before committing it. Records opened in a form from the results window are only available to view and can not be edited. Records in the results window can not be saved as Record Sets. The following buttons will be enabled on the Uploader: View displays the uploaded records in a custom results page, indicating each table that is affected. These results can be printed, viewed in a form, or exported as an MS Excel file. Undo removes all uploaded data from the database. Close will open a dialog to
  3. Click the Save button to commit the uploaded records to the database.

Note: If a tree rank is included in an upload that has not been included in the Tree Definition, Specify will use the name of the file to label the undefined node.

Note: When latitude and longitude values are entered into the WorkBench, the original value is captured into a hidden field to assist with format conversions. When data is uploaded into Specify, data from these fields is copied into the Specify (Source) field and will display on the Locality form.

Suggestion:@SpecifyMembership This list needs commas to separate the table names

@SpecifyMembership: Links on this page do not work. The href information seems to have been lost upon import of the posting.

@mcruz Thank you for your feedback! The links have now been corrected & commas have been added between the table names.

I have always felt that the terminology is backwards in Workbench…

The first step, what is called “Import” and on the button text, is really an upload of the dataset.
And the final step, called “Upload” is in reality the import of the dataset into the relational tables.