Using SGR

In collaboration with the Lifemapper Project (www.lifemapper.org), the Specify Project serves a copy of the online specimen data cached by the Global Biodiversity Information Facility (GBIF) from the world’s online collections. Initially, Specify and Scatter, Gather, Reconcile (SGR) support botanical collections, with entomology and other disciplines to be supported in future releases. The Project indexes GBIF specimen data for fast searches and makes the search facility available through a web service. This network infrastructure is invisible to the user and is used by the SGR plug-in to compare specimen data records with those from other collections. Each time you compare one of your records, or a set of them in a WorkBench Data Set, to GBIF’s holdings, records are returned from the GBIF database and sent to your machine. For this reason, your machine must be connected to a network to run SGR.

For more information about SGR concepts please refer to the Introduction to SGR help page.

Requirements:

  1. To use SGR your machine must be connected to the internet.
  2. You must have a Data Set whose records you want to compare against records in the SGR index. The Data Set does not have to be populated with values (you can use SGR as you enter data), but it must include some of the fields SGR uses to find matching records in the SGR index. The more fields you can match against, the better your results will be.

SGR Search Criteria

SGR search criteria are the data types that SGR uses to logically organize the data in your WorkBench records for matching against the GBIF database.

Many of the SGR search criteria use data from a combination of fields in the WorkBench. The following table lists the SGR search criteria and their corresponding fields and originating tables in the WorkBench:

SGR Search Criterion, followed by the WorkBench Table and Fields used for searching and matching:

Collectors

  • Table: Collecting Event. Fields: Collector First Name, Collector Middle Name, Collector Last Name

Collector/Field #

  • Table: Collection Object. Field: Field Number

  • Table: Collecting Event. Field: Station Field Number

Location

  • Table: Geography. Fields: Country, State, County

  • Table: Locality. Fields: Locality Name, Named Place

  • Table: Collecting Event. Field: Verbatim Locality

Date

  • Table: Collecting Event. Fields: Verbatim Date, Start Date, End Date

Taxon Name

  • Table: Determination. Fields: Genus 1, Species 1, Subspecies 1

SGR Side Bar Tools

Matchers

A Matcher is a recipe for the search process: it lists the criteria and search settings SGR uses to compare your data to the GBIF/SGR cache. Once a Matcher has been created, its settings may not be edited. This allows Matchers to be kept as a property of Match Results. New Matchers can be created and defined at any time.

Create a Matcher

Matchers are created using the Create Matcher dialog. Simply click on the Create Matcher button to open the dialog.

[Figure: Create a Matcher]

 

  • Name is used to name the Matcher. Use a name that relates to the kind of matching data being used.

  • Index refers to the type of index you wish to search.


  • Exclude is used to exclude your institution code and/or collection code. Use the codes that represent any data that you already have in GBIF to ensure that your own data is excluded from the SGR matches. Normally, you would not want to match new WorkBench Data Set records with records you have already published to GBIF.

  • Match Criteria includes the search criteria.

Select the criteria you wish to use to compare your data to the GBIF data indexed by the SGR server by clicking on the pick list next to the field name. You can also choose the amount of emphasis you wish to place on each field:

    Ignore: Do not use this criterion for matching.
    Low: Place a lower emphasis on this criterion.
    Normal: Place an average amount of emphasis on this criterion.
    High: Place a greater emphasis on this criterion.

  • Remarks is a text field for extra information about the Matcher, kept for future reference (for example, what the Matcher was defined to do).
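
To make the emphasis settings concrete, the following Python sketch shows one way a Matcher's choices could be turned into a weighted search query. It is an illustration only: the field names (`taxon_name`, `collectors`, `location`, `date`, `institution_code`), the boost values, the institution code `XYZ` and the `build_query` helper are all assumptions, not SGR's actual implementation. It uses the Lucene/Solr query convention of `field:"value"^boost` for weighting and a leading `-` for exclusion.

```python
# Hypothetical boost per emphasis level; SGR's real weights are not documented here.
EMPHASIS_BOOST = {"Ignore": None, "Low": 0.5, "Normal": 1.0, "High": 2.0}


def quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_query(record, emphasis, exclude_institution=None):
    """Build a boosted query string from one Data Set record.

    record: dict mapping a (hypothetical) index field name to the record's value
    emphasis: dict mapping the same field names to Ignore/Low/Normal/High
    exclude_institution: optional institution code to keep out of the results
    """
    clauses = []
    for field, level in emphasis.items():
        boost = EMPHASIS_BOOST[level]
        value = record.get(field, "").strip()
        if boost is None or not value:
            continue  # ignored criterion, or nothing entered yet
        clauses.append(f"{field}:{quote(value)}^{boost}")
    query = " OR ".join(clauses)
    if query and exclude_institution:
        query = f"({query}) -institution_code:{quote(exclude_institution)}"
    return query


# Example record with an empty location and one ignored criterion.
record = {"taxon_name": "Quercus alba", "collectors": "Smith", "location": ""}
emphasis = {"taxon_name": "High", "collectors": "Normal",
            "location": "Low", "date": "Ignore"}
print(build_query(record, emphasis, exclude_institution="XYZ"))
```

Empty fields are skipped in this sketch, which mirrors the idea that a Matcher can be run on a partly filled Data Set and refreshed as more values are entered.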

 

Run a Matcher

Once a Matcher has been created simply click on it in the side bar and choose a Data Set from the resulting dialog.

[Figure: Choose a Data Set]

Or, drag a Data Set and drop it onto a Matcher on the side bar.

SGR will then use the Solr search engine to compare the records in your Data Set to those in the GBIF/SGR cache. The results will appear in a split screen in the Form View. Your records are located in the left panel and the GBIF/SGR cache records are displayed in the right panel.

Note: The Data Set and matching GBIF/SGR cache records are not saved when simply running a Matcher. To save entire Data Sets along with their matching GBIF/SGR cache records, use the Process Data Set button located on the side bar under Match Results.

Edit a Matcher

Right-click on a matcher and click on the resulting Edit button to edit a Matcher.

Once a Matcher has been created, only the Name and the Remarks may be edited. This allows users to reuse Matchers knowing that they are exactly what was used in the past, and to review a Matcher to see exactly which criteria were used in past searches.

Matchers may not be deleted unless there are no Match Results that refer to them.

The Name of the Matcher can be changed, and will be automatically updated in the properties of Match Results.

 

Match Results

Create a Match Result if you wish to save your Data Set together with its matching SGR/GBIF records.

SGR also reorders the Match Result Data Set so that the records most likely to have accurate matches come first. These records are also color coded based on the probable accuracy of the match, with green representing a good match and red representing an unlikely match.
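
As a rough illustration of this ordering and color coding, the Python sketch below sorts rows by score and blends from red to green. The `sgr_score` key, the linear color blend and both function names are assumptions made for the example; SGR's own scoring scale and palette may differ.

```python
def rank_by_score(rows):
    """Return a new list with the highest-scoring rows first."""
    return sorted(rows, key=lambda row: row["sgr_score"], reverse=True)


def score_to_rgb(score, lowest, highest):
    """Blend linearly from red (lowest score) to green (highest score)."""
    if highest <= lowest:
        return (0, 255, 0)  # all scores equal: nothing to distinguish
    t = (score - lowest) / (highest - lowest)
    t = min(max(t, 0.0), 1.0)  # clamp scores that fall outside the range
    return (round(255 * (1 - t)), round(255 * t), 0)


# Hypothetical rows; only the score matters for ordering and color.
rows = [
    {"id": "A", "sgr_score": 2.5},
    {"id": "B", "sgr_score": 9.0},
    {"id": "C", "sgr_score": 0.5},
]
ranked = rank_by_score(rows)
scores = [row["sgr_score"] for row in ranked]
for row in ranked:
    print(row["id"], score_to_rgb(row["sgr_score"], min(scores), max(scores)))
```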

When a Match Result is open the original Data Set and Match Result will become inactive on the side bar.

SGR adds an SGR Score field to the Match Result file. This allows you to click on the field name and return the Match Result to the original, unsorted state.

Create a Match Result

Click on the Process Data Set button, then choose a Data Set in the resulting dialog.

[Figure: Choose a Data Set]

Next, you will be asked to choose a Matcher.

Or drag and drop a Data Set from the side bar onto the Process Data Set button and choose a Matcher in the resulting dialog.

[Figure: Select Matcher]

Next, give the Match Result a name and Remarks.

While SGR is processing the Data Set, the Match Result name will appear in the side bar with a gear (system) icon next to it. When processing is complete, an SGR icon will appear next to it in the side bar.

When the SGR process is complete, the Data Set will open in Grid View. It is now reordered, with the records that have the most probable matches at the top and the least likely at the bottom.

To view the Data Set records along with the resulting GBIF/SGR cache records, click on the Form View button. The results will appear in a split screen, with your Data Set records editable in the left panel and the GBIF/SGR cache records displayed in the right panel.

 

Context Menu

Right-click on a Match Result for a context menu. The context menu is not available when Match Results are open in SGR. If the desired Match Result is inactive simply close it in SGR.

[Figure: Match Result Context Menu]

  • Edit properties

You can rename the Match Result and/or add notes or comments about it in a properties editor. The properties editor also displays the name of the Matcher used to create the Match Result.

[Figure: Edit Match Result Properties]

  • Stop Processing

Large Data Sets may take a few minutes to process. You may stop SGR processing by clicking the Stop Processing button.

To view the progress, hold your mouse over the Match Result on the side bar. If the process was stopped, two numbers separated by a slash will appear. A completed Match Result will show the number of items.

You may also hold your mouse over a Match Result while it is processing to view how many records have been processed. To update, move your mouse off the Match Result, then place it back over the Match Result, allowing the status to refresh.

 

  • Resume Processing

Resume processing the rest of the Data Set by clicking the Resume Processing button.

 

  • Show Histogram

A histogram offers a graphical view of the distribution of SGR scores. It uses 30 colors to represent the SGR score and plots the number of rows that fall within a particular score (color) range. A histogram which displays a bimodal distribution is often an indication of good results. This type of histogram includes two peaks with a 'neck' in between them. The neck represents a threshold that separates Data Set records with good matches from those that have poorly matching data from GBIF.

  • Delete

Click the Delete button in the context menu or drag and drop a Match Result to the garbage can to delete it.
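
The idea behind the Show Histogram item can be sketched in a few lines of Python. This is a hedged illustration, not SGR's code: the `histogram` and `find_neck` helpers and the valley heuristic are assumptions, and only the choice of 30 score ranges comes from the description above.

```python
def histogram(scores, bins=30):
    """Count how many scores fall into each of `bins` equal-width ranges."""
    lowest, highest = min(scores), max(scores)
    width = (highest - lowest) / bins or 1.0  # avoid zero width if all scores are equal
    counts = [0] * bins
    for score in scores:
        index = min(int((score - lowest) / width), bins - 1)  # top score goes in last bin
        counts[index] += 1
    return counts, lowest, width


def find_neck(counts):
    """Return the index of the deepest valley between two peaks, or None.

    For each interior bin, the depth is how far it sits below the lower of
    the tallest bin to its left and the tallest bin to its right.
    """
    best_index, best_depth = None, 0
    for i in range(1, len(counts) - 1):
        depth = min(max(counts[:i]), max(counts[i + 1:])) - counts[i]
        if depth > best_depth:
            best_index, best_depth = i, depth
    return best_index


# Made-up bin counts with two peaks and a gap between them.
example_counts = [1, 5, 9, 4, 1, 0, 2, 6, 8, 3]
print(find_neck(example_counts))
```

If `find_neck` returns an index, the score at the start of that range (`lowest + index * width`) is a candidate threshold between well-matched and poorly matched records. A result of `None` suggests the distribution has a single peak and no clear threshold.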

 

Using SGR Results

There are two basic data acquisition workflows that SGR supports:

  1. Search GBIF/SGR cache records as you are entering data into an incomplete WorkBench data set. This allows you to find and use existing information rather than typing it all in yourself.

  2. Augment your existing Specify database records with information found in the GBIF/SGR records.

Both workflows use SGR results in the same way, but when working with a complete Data Set, a Match Result can be created before viewing the SGR/GBIF record results. This orders the Data Set records from the most likely to include a correct GBIF/SGR match to the least likely.

Matching GBIF/SGR records are only visible in Form View. In Form View, the workspace window is separated into two panels. Single records in your Data Set are editable in the left panel and matching GBIF/SGR cache records are viewed in the right panel. The 10 best GBIF/SGR cache matches for each record in the data set are returned. Use the slider bar at the bottom of the panel to view all 10 results.

[Figure: SGR Results in Form View]

 

SGR/GBIF cache matches can be updated while in Form View by using the buttons on the Work Space Item Bar. When finding matches in an incomplete Data Set, simply add values to fields, then click on the Matcher button to refresh the search so that it includes the new values. You may also change the Matcher itself and run a new SGR search by clicking on the button and choosing a new Matcher from the drop-down list.

[Figure: Matcher Drop-Down List]

When you find a match, simply copy and paste the necessary GBIF/SGR cache values into the appropriate Data Set fields. Once you save the changes they will be included in your original Data Set.

When all the records in your Data Set have been compared to GBIF/SGR records and the Data Set has been uploaded into the Specify database it is a good practice to delete the Match Result.

 

Work Space Item Bar Tools

The Work Space Item Bar, located under the work space, offers several tools while in Form View.

Record Number Control

  • |< goes to the first record.

  • < goes to the previous record.

  • > goes to the next record.

  • >| goes to the last record.

  • Add adds a record.

  • Remove deletes a record.

Edit Form Properties

Allows the columns to be resized, moved and renamed. See Edit Form Properties.

Refresh/Select Matcher

Refresh the search by clicking on the Matcher button. This can be used when entering new data. Simply enter values for a few fields and run the Matcher. If you do not get the expected results, enter more data, then click on the Matcher button to update the search with the new values.

Click the arrow to expand the Matcher button and select a different Matcher. The window will be refreshed with the results for the new Matcher.

Image Window

Toggles the Image Window on and off.

Configure Carry Forward

Opens the 'Carry Forward' configuration dialog. See Carry Forward.

Grid View

Displays the Data Set in a grid view (resembling a spreadsheet). Matching GBIF records are not displayed in this view.

 

Keyboard Shortcuts

Shift + Page_Up will advance to the next record.

Shift + Page_Down will move to the previous record.