How to use the Specify API as a generic webservice

This is a simple introduction to accessing the Specify API using
the curl command line utility. For legibility, JSON output is piped
through a pretty-printer provided by the Python json library. It is
hoped that these examples illustrate how the API could be utilized by
any environment which supports HTTP and JSON.

These examples illustrate use with a demo server on
localhost. Substitute an appropriate hostname as necessary.

Logging in

To prevent unauthorized access to the API the browser session is
tracked with a session cookie that must be associated with a logged in
user.

A GET request to the login URL provides the available collections.

curl -i -c cookies.txt -b cookies.txt -X GET http://localhost:8000/context/login/
HTTP/1.0 200 OK
Date: Thu, 13 Apr 2017 21:54:55 GMT
Server: WSGIServer/0.1 Python/2.7.10
Cache-Control: no-cache, no-store, must-revalidate, max-age=0
Vary: Cookie
Expires: Thu, 13 Apr 2017 21:54:55 GMT
Content-Type: application/json
Last-Modified: Thu, 13 Apr 2017 21:54:55 GMT
Set-Cookie:  csrftoken=8BetyLjqobHb; expires=Thu, 12-Apr-2018 21:54:55 GMT; Max-Age=31449600; Path=/

{"username": null, "password": null, "collections": {"KUFishtissue": 32768, "KUFishvoucher": 4}, "collection": null}

To log in as a user of the API for one of the listed collections issue
a PUT request to the same URL. The csrftoken from the previous
response must be passed as a header. The Referer header must be included
when using HTTPS, but may be omitted when using HTTP.

curl -i -c cookies.txt -b cookies.txt -X PUT \
    --data '{"username":"sp7demofish","password":"sp7demofish","collection":4}' \
    -H "X-CSRFToken: 8BetyLjqobHb" \
    -H "Referer: http://localhost:8000/" \
    http://localhost:8000/context/login/
HTTP/1.0 204 No Content
Date: Thu, 13 Apr 2017 22:14:02 GMT
Server: WSGIServer/0.1 Python/2.7.10
Cache-Control: no-cache, no-store, must-revalidate, max-age=0
Vary: Cookie
Expires: Thu, 13 Apr 2017 22:14:02 GMT
Content-Type: text/html; charset=utf-8
Last-Modified: Thu, 13 Apr 2017 22:14:02 GMT
Set-Cookie:  csrftoken=b9SUdhWCbMGG; expires=Thu, 12-Apr-2018 22:14:02 GMT; Max-Age=31449600; Path=/
Set-Cookie:  sessionid=44kms3gdg6; httponly; Path=/
Set-Cookie:  collection=4; expires=Fri, 13-Apr-2018 22:14:02 GMT; Max-Age=31536000; Path=/

Note that a new CSRF token is generated when logging in and should be
used in subsequent data modifying requests.

The currently logged in user can be queried from the following resource:

curl -c cookies.txt -b cookies.txt -X GET http://localhost:8000/context/user.json \
    | python -m json.tool
{
    "accumminloggedin": null,
    "agent": {
        "abbreviation": "tcu",
        "addresses": [],
        "agentattachments": [],
        "agentgeographies": [],
        "agentspecialties": [],
        "agenttype": 1,
        "collcontentcontact": null,
        "collectors": "/api/specify/agent/?agent=3",
        "colltechcontact": null,
        "createdbyagent": null,
        "dateofbirth": null,
        "dateofbirthprecision": 1,
        "dateofdeath": null,
        "dateofdeathprecision": 1,
        "datetype": null,
        "division": "/api/specify/division/2/",
        "email": "testuser@ku.edu",
        "firstname": "Test",
        "groups": [],
        "guid": "f2fec95a-7699-45f2-a794-337040b5bbc8",
        "id": 3,
        "initials": null,
        "instcontentcontact": null,
        "insttechcontact": null,
        "interests": null,
        "jobtitle": null,
        "lastname": "User",
        "members": "/api/specify/agent/?member=3",
        "middleinitial": "C",
        "modifiedbyagent": null,
        "organization": null,
        "orgmembers": "/api/specify/agent/?organization=3",
        "remarks": null,
        "resource_uri": "/api/specify/agent/3/",
        "specifyuser": "/api/specify/specifyuser/1/",
        "suffix": null,
        "timestampcreated": "2012-08-09T12:29:16",
        "timestampmodified": "2012-08-09T12:29:16",
        "title": "mr",
        "url": null,
        "variants": [],
        "version": 3
    },
    "agents": "/api/specify/specifyuser/?specifyuser=1",
    "available_collections": [
        [
            4,
            "KUFishvoucher"
        ],
        [
            32768,
            "KUFishtissue"
        ]
    ],
    "createdbyagent": null,
    "email": "testuser@ku.edu",
    "id": 1,
    "isadmin": true,
    "isauthenticated": true,
    "isloggedin": true,
    "isloggedinreport": false,
    "logincollectionname": "KUFishtissue",
    "logindisciplinename": "Ichthyology",
    "loginouttime": "2017-04-11T16:03:08",
    "message_set": "/api/specify/specifyuser/?user=1",
    "modifiedbyagent": null,
    "name": "Sp7demofish",
    "resource_uri": "/api/specify/specifyuser/1/",
    "spappresourcedirs": "/api/specify/specifyuser/?specifyuser=1",
    "spappresources": "/api/specify/specifyuser/?specifyuser=1",
    "spquerys": "/api/specify/specifyuser/?specifyuser=1",
    "tasksemaphores": "/api/specify/specifyuser/?owner=1",
    "timestampcreated": "2012-08-09T12:29:16",
    "timestampmodified": "2012-08-09T12:29:16",
    "usertype": "Manager",
    "version": 349,
    "workbenches": "/api/specify/specifyuser/?specifyuser=1",
    "workbenchtemplates": "/api/specify/specifyuser/?specifyuser=1"
}

Logging out of the session is accomplished using the login URL with
the username and password set to null.

curl -i -c cookies.txt -b cookies.txt -X PUT \
    --data '{"username":null,"password":null,"collection":4}' \
    -H "X-CSRFToken: b9SUdhWCbMGG" \
    -H "Referer: http://localhost:8000/" \
    http://localhost:8000/context/login/

Making API requests

Once a logged-in session has been established and stored in the
cookies.txt file, making API requests is easy. Let’s fetch the
collection object with id 12.

curl -b cookies.txt http://localhost:8000/api/specify/collectionobject/12/ \
    | python -m json.tool
{
    "accession": "/api/specify/accession/614/",
    "altcatalognumber": null,
    "appraisal": null,
    "availability": null,
    "catalogeddate": "2006-06-13",
    "catalogeddateprecision": 1,
    "catalogeddateverbatim": null,
    "cataloger": "/api/specify/agent/66/",
    "catalognumber": "000038407",
    "collectingevent": "/api/specify/collectingevent/8868/",
    "collection": "/api/specify/collection/4/",
    "collectionmemberid": 4,
    "collectionobjectattachments": [],
    "collectionobjectattribute": {
        "bottomdistance": null,
        "collectionmemberid": 4,
        "collectionobjects": "/api/specify/collectionobjectattribute/?collectionobjectattribute=42336",
        "createdbyagent": "/api/specify/agent/1/",
        "direction": null,
        "distanceunits": null,
        "id": 42336,
        "modifiedbyagent": null,
        "number1": null,
        "number10": null,
        "number11": null,
        "number12": null,
        "number13": null,
        "number14": null,
        "number15": null,
        "number16": null,
        "number17": null,
        "number18": null,
        "number19": null,
        "number2": null,
        "number20": null,
        "number21": null,
        "number22": null,
        "number23": null,
        "number24": null,
        "number25": null,
        "number26": null,
        "number27": null,
        "number28": null,
        "number29": null,
        "number3": null,
        "number30": null,
        "number31": null,
        "number32": null,
        "number33": null,
        "number34": null,
        "number35": null,
        "number36": null,
        "number37": null,
        "number38": null,
        "number39": null,
        "number4": null,
        "number40": null,
        "number41": null,
        "number42": null,
        "number5": null,
        "number6": null,
        "number7": null,
        "number8": null,
        "number9": null,
        "positionstate": null,
        "remarks": null,
        "resource_uri": "/api/specify/collectionobjectattribute/42336/",
        "text1": null,
        "text10": null,
        "text11": "45-51mm SL",
        "text12": null,
        "text13": null,
        "text14": null,
        "text15": null,
        "text16": null,
        "text17": null,
        "text18": null,
        "text2": null,
        "text3": null,
        "text4": null,
        "text5": null,
        "text6": null,
        "text7": null,
        "text8": null,
        "text9": null,
        "timestampcreated": "1999-10-25T12:32:15",
        "timestampmodified": null,
        "topdistance": null,
        "version": 0,
        "yesno1": null,
        "yesno2": null,
        "yesno3": null,
        "yesno4": null,
        "yesno5": null,
        "yesno6": null,
        "yesno7": null
    },
    "collectionobjectattrs": [],
    "collectionobjectcitations": [],
    "conservdescriptions": [],
    "container": null,
    "containerowner": null,
    "countamt": 1,
    "createdbyagent": "/api/specify/agent/1/",
    "deaccessioned": false,
    "description": null,
    "determinations": [
        {
            "addendum": null,
            "alternatename": null,
            "collectionmemberid": 4,
            "collectionobject": "/api/specify/collectionobject/12/",
            "confidence": null,
            "createdbyagent": "/api/specify/agent/1/",
            "determinationcitations": [],
            "determineddate": "2006-06-13",
            "determineddateprecision": 1,
            "determiner": "/api/specify/agent/66/",
            "featureorbasis": null,
            "guid": "df1929e3-1ed3-11e3-bfac-90b11c41863e",
            "id": 41841,
            "iscurrent": true,
            "method": null,
            "modifiedbyagent": "/api/specify/agent/1514/",
            "nameusage": null,
            "number1": 212.0,
            "number2": null,
            "preferredtaxon": "/api/specify/taxon/11486/",
            "qualifier": null,
            "remarks": null,
            "resource_uri": "/api/specify/determination/41841/",
            "subspqualifier": null,
            "taxon": "/api/specify/taxon/11486/",
            "text1": null,
            "text2": null,
            "timestampcreated": "1999-10-25T12:32:15",
            "timestampmodified": "2006-09-19T10:37:01",
            "typestatusname": null,
            "varqualifier": null,
            "version": 1,
            "yesno1": false,
            "yesno2": null
        }
    ],
    "dnasequences": [],
    "exsiccataitems": [],
    "fieldnotebookpage": null,
    "fieldnumber": null,
    "guid": "db192e8c-1ed3-11e3-bfac-90b11c41863e",
    "id": 12,
    "integer1": null,
    "integer2": null,
    "inventorydate": null,
    "leftsiderels": "/api/specify/collectionobject/?leftside=12",
    "modifiedbyagent": "/api/specify/agent/1514/",
    "modifier": " ",
    "name": null,
    "notifications": null,
    "number1": null,
    "number2": null,
    "objectcondition": null,
    "ocr": null,
    "otheridentifiers": [],
    "paleocontext": null,
    "preparations": [
        {
            "collectionmemberid": 4,
            "collectionobject": "/api/specify/collectionobject/12/",
            "countamt": 4,
            "createdbyagent": "/api/specify/agent/1/",
            "deaccessionpreparations": "/api/specify/preparation/?preparation=35011",
            "description": null,
            "exchangeinpreps": "/api/specify/preparation/?preparation=35011",
            "exchangeoutpreps": "/api/specify/preparation/?preparation=35011",
            "giftpreparations": "/api/specify/preparation/?preparation=35011",
            "id": 35011,
            "integer1": null,
            "integer2": null,
            "isonloan": false,
            "loanpreparations": "/api/specify/preparation/?preparation=35011",
            "modifiedbyagent": "/api/specify/agent/66/",
            "number1": null,
            "number2": null,
            "preparationattachments": [],
            "preparationattribute": null,
            "preparationattrs": [],
            "preparedbyagent": null,
            "prepareddate": null,
            "prepareddateprecision": 1,
            "preptype": "/api/specify/preptype/2/",
            "remarks": null,
            "reservedinteger3": null,
            "reservedinteger4": null,
            "resource_uri": "/api/specify/preparation/35011/",
            "samplenumber": null,
            "status": null,
            "storage": null,
            "storagelocation": null,
            "text1": null,
            "text2": "Jar",
            "timestampcreated": "2002-07-29T13:38:25",
            "timestampmodified": "2006-06-13T10:13:09",
            "version": 0,
            "yesno1": null,
            "yesno2": null,
            "yesno3": null
        }
    ],
    "projectnumber": null,
    "remarks": null,
    "reservedinteger3": null,
    "reservedinteger4": null,
    "reservedtext": null,
    "reservedtext2": null,
    "reservedtext3": null,
    "resource_uri": "/api/specify/collectionobject/12/",
    "restrictions": null,
    "rightsiderels": "/api/specify/collectionobject/?rightside=12",
    "sgrstatus": null,
    "text1": "known",
    "text2": null,
    "text3": null,
    "timestampcreated": "1999-10-25T11:53:37",
    "timestampmodified": "2006-09-19T10:37:01",
    "totalvalue": null,
    "treatmentevents": [],
    "version": 1,
    "visibility": 0,
    "visibilitysetby": null,
    "yesno1": null,
    "yesno2": null,
    "yesno3": null,
    "yesno4": null,
    "yesno5": null,
    "yesno6": null
}

The result is a JSON representation of the requested object. The first
thing to notice is that related objects can either be referenced by
links to their resources or have their data included in line in the
parent resource.

Example of a link:

"createdbyagent": "/api/specify/agent/1/",

Example of inline:

"determinations": [
  // Array of records here
  ...
]

Whether a related object is inline or not depends on whether the relationship is marked as “Dependent” in the data model.

You can open the “Database Schema” from the “User Tools” menu, then click on the “Collection Object” table, and see that in the list of relationships, “Determinations” is dependent, whereas “Created By Agent” is not.
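A client can tell the two cases apart from the type of the JSON value. The sketch below is illustrative only: `split_related` is a hypothetical helper, not part of Specify, and it assumes that links are strings beginning with /api/specify/ while inline data arrives as objects or arrays, as in the response above.

```python
def split_related(resource):
    """Separate linked fields from inline (dependent) fields in a resource dict."""
    links, inline = {}, {}
    for field, value in resource.items():
        if field == "resource_uri":
            continue  # the resource's own URI, not a relationship
        if isinstance(value, str) and value.startswith("/api/specify/"):
            links[field] = value  # needs a further request to resolve
        elif isinstance(value, (dict, list)):
            inline[field] = value  # data is already present
    return links, inline


# A trimmed-down copy of the collection object shown above.
sample = {
    "resource_uri": "/api/specify/collectionobject/12/",
    "catalognumber": "000038407",
    "createdbyagent": "/api/specify/agent/1/",
    "determinations": [{"id": 41841, "iscurrent": True}],
}
links, inline = split_related(sample)
print(sorted(links))
print(sorted(inline))
```

Plain values such as catalognumber fall into neither group, and empty arrays are still reported as inline.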

The URIs for resources take the form /api/specify/TABLE/ID/. To
retrieve a collection of resources use URIs of the form
/api/specify/TABLE/.

curl -b cookies.txt http://localhost:8000/api/specify/collectionobject/ \
    | python -m json.tool

By default a maximum of twenty rows will be returned when requesting a
collection. The limit query parameter can be used to adjust the
amount. Using limit=0 returns all rows.

curl -b cookies.txt http://localhost:8000/api/specify/taxon/?limit=2 \
    | python -m json.tool

The offset query parameter can be used along with limit to
implement paging.
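As a rough sketch of paging from Python, the hypothetical helper below walks a collection one page at a time. It assumes that `session` is an already logged-in object with a requests-style `get(url, params=...)` method, and that the rows of each page are found under an `objects` key in the JSON response; both are assumptions to check against your own instance.

```python
def iter_collection(session, url, page_size=20):
    """Yield every object from a collection URI, one page at a time."""
    offset = 0
    while True:
        response = session.get(url, params={"limit": page_size, "offset": offset})
        objects = response.json()["objects"]
        yield from objects
        if len(objects) < page_size:
            break  # a short page means there are no more rows
        offset += page_size
```

Keep `page_size` positive here, since limit=0 asks the server for all rows at once.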

Filtering

Often it will be necessary to retrieve a collection of resources that
meet certain criteria. The API implements basic filtering for this
purpose. E.g. to retrieve all preparation records for collection
object with id 12:

curl -b cookies.txt http://localhost:8000/api/specify/preparation/?collectionobject=12 \
    | python -m json.tool

Collection requests can also be filtered by domain, which means
that only records which are in the same collection context as the user
will be returned.

curl -b cookies.txt http://localhost:8000/api/specify/taxon/?domainfilter=true \
    | python -m json.tool
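When building such URLs in code, it is safer to let a library encode the query string. A minimal sketch using only the Python standard library; `collection_url` is a hypothetical helper and the hostname is the demo one used above.

```python
from urllib.parse import urlencode


def collection_url(base_url, table, **filters):
    """Build a collection URI such as /api/specify/preparation/?collectionobject=12."""
    url = f"{base_url.rstrip('/')}/api/specify/{table}/"
    if filters:
        url += "?" + urlencode(filters)
    return url


print(collection_url("http://localhost:8000", "preparation", collectionobject=12))
print(collection_url("http://localhost:8000", "taxon", domainfilter="true", limit=2))
```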

Updates

Resources can be updated by making PUT requests to the corresponding
URI. Because the API implements optimistic locking, it is necessary to
obtain the current version of the resource before attempting an
update.

curl -b cookies.txt http://localhost:8000/api/specify/agent/3/ | grep -o '\"version\": [0-9]*'

If the output is, for example, "version": 3, then the resource
could be modified as follows:

curl -b cookies.txt -X PUT \
    -H "X-CSRFToken: b9SUdhWCbMGG" \
    -H "Referer: http://localhost:8000/" \
    --data '{"version": 3, "remarks":"test"}' \
    http://localhost:8000/api/specify/agent/3/ \
        | python -m json.tool

The response will be the data of the updated resource.

If the version is omitted, the server will return a 400 Bad Request. If the version does not match the current version in the
database, 409 Conflict will be returned.
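The fetch-then-update sequence could be wrapped as follows. This is only a sketch under stated assumptions: `update_resource` is a hypothetical helper, `session` is an already logged-in object with requests-style `get` and `put` methods, and the version is sent in the request body as in the curl example above.

```python
import json


def update_resource(session, url, changes, csrf_token, referer):
    """GET the current version of a resource, then PUT the changes with it."""
    current = session.get(url).json()
    # Copy the caller's changes and attach the version just read.
    body = dict(changes, version=current["version"])
    response = session.put(
        url,
        data=json.dumps(body),
        headers={"X-CSRFToken": csrf_token, "Referer": referer},
    )
    if response.status_code == 409:
        # Someone else modified the resource between our GET and PUT.
        raise RuntimeError("Version conflict: fetch the resource again and retry.")
    return response
```

Reading the version immediately before the PUT keeps the window for a conflict small, but it cannot remove it, so callers should still be prepared for the 409 case.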

Resource creation

Resources can be created by issuing a POST request to the collection
URI representing the target table. To create a new collection object:

curl -b cookies.txt -X POST \
    -H "X-CSRFToken: b9SUdhWCbMGG" \
    -H "Referer: http://localhost:8000/" \
    --data '{"collection":"/api/specify/collection/4/"}' \
    http://localhost:8000/api/specify/collectionobject/ \
        | python -m json.tool

The response will be the complete representation of the newly created
resource.

Resource deletion

Delete resources by issuing DELETE to the resource’s URI. Because no
data is included in a DELETE request, the version information is
placed in the If-Match HTTP header:

curl -b cookies.txt -X DELETE \
    -H 'If-Match: 0' \
    -H "X-CSRFToken: b9SUdhWCbMGG" \
    -H "Referer: http://localhost:8000/" \
    http://localhost:8000/api/specify/collectionobject/123456789/

where 0 and 123456789 are replaced with the appropriate
values. Again, if the version info is missing or out of date, an error
will be returned.
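The same request could be issued from Python roughly as follows. `delete_resource` is a hypothetical helper, and `session` is assumed to be an already logged-in object with a requests-style `delete` method whose responses offer `raise_for_status()`.

```python
def delete_resource(session, url, version, csrf_token, referer):
    """DELETE a resource, passing its version in the If-Match header."""
    headers = {
        "If-Match": str(version),  # header values must be strings
        "X-CSRFToken": csrf_token,
        "Referer": referer,
    }
    response = session.delete(url, headers=headers)
    response.raise_for_status()  # surface a missing or stale version as an exception
    return response
```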

Swagger UI API Pages

Within every instance of Specify 7, there is API documentation using Swagger UI. This allows the REST API to be easily visualized and interacted with.

View our demo here (the login and password are both sp7demofish):
Specify 7 Tables API - Tables API
Specify 7 Operations API - Operations API

To view your instance’s API, replace the base URL with your Specify 7 installation’s hostname. (For example, sp7demofish.specifycloud.org would be replaced with sp7.yourinstitution.org.)

On the Tables API page, you can view and test the parameters associated with each supported database table as well as modify and view the responses.

On the Operations API page, you can view and test the system operations endpoints requests and responses.

How does one access the Swagger UI of one’s very own installation of Specify7 ???

Oh, got it! Just add ‘/documentation/api/tables/’ to the URL…

I’m trying to connect to this API using the Python requests library, but I keep getting a 403 ‘Forbidden’ back even though I get hold of the CSRF token.

import requests 

response = requests.request('GET', "https://specify-test.science.ku.dk/context/login/", verify=False)
csrftoken = response.cookies.get('csrftoken')
headers = {'content-type': 'application/json', 'X-CSRFToken': csrftoken, 'Referer' : 'https://specify-test.science.ku.dk/' }
response = requests.request('PUT', "https://specify-test.science.ku.dk/context/login/", data={'username' : 'test', 'password' : 'redacted', 'collection': 5}, headers=headers,verify=False)

What am I missing here?

From our programmer:

I think he needs to use a persistent session as described here:
Advanced Usage — Requests 2.31.0 documentation
That allows cookies set by the server to be carried over from one request to the next.

interesting information

It would be great to list an example of both here.
Is the “determinations” array an example of in line?
Is “createdbyagent” an example of linked to their resource?

Does this mean that if domainfilter=false, then the query will run against all collections in the database that the user has access to? Does this mean that cross-collection queries are possible?

Correct,

Whether related object is inline or not depends on whether the relationship is marked as “Dependent” in the data model.

You can open the “Database Schema” from the “User Tools” menu, then click on the “Collection Object” table, and see that in the list of relationships, “Determinations” is dependent, whereas “Created By Agent” is not.

I will clarify this better in the documentation

Correct

I was also getting a 403 error attempting to access the API via code (as opposed to on the command line with curl). The examples above rely on storing a cookie file on disk, but when accessing the API programmatically this is not practical.

The trick to getting this to work is to store the raw cookie data in a variable and pass it from one request to the next. But there is a catch that took me a little while to work out.

The raw cookie data from the response is found in the Set-Cookie response header and it needs to be passed in the next request via the Cookie request header. However, it needs to be modified before passing through. The modification is to replace all commas with semicolons using a text replace function, such as replaceAll(',',';').

I hope this saves someone else the couple of hours of frustration that it caused me.
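For readers working in Python without a cookie-aware client, here is a minimal sketch of a related approach. It assumes the individual Set-Cookie lines are available separately, keeps only the name=value part of each, and joins them with semicolons as the Cookie request header expects. `cookie_header` is a hypothetical helper, and the sample values are the ones from the login response earlier in this document.

```python
def cookie_header(set_cookie_lines):
    """Build a Cookie request header value from Set-Cookie response lines."""
    # Everything after the first ';' is an attribute (expires, Path, ...)
    # that belongs to the response only, so it is dropped.
    pairs = [line.split(";", 1)[0].strip() for line in set_cookie_lines]
    return "; ".join(pairs)


received = [
    "csrftoken=b9SUdhWCbMGG; expires=Thu, 12-Apr-2018 22:14:02 GMT; Max-Age=31449600; Path=/",
    "sessionid=44kms3gdg6; httponly; Path=/",
    "collection=4; expires=Fri, 13-Apr-2018 22:14:02 GMT; Max-Age=31536000; Path=/",
]
print(cookie_header(received))
```

Dropping the attributes also sidesteps the commas inside the expires dates.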

I’m looking for the same thing. I am trying this in Postman, so if anyone gets it working in Postman, please share. Sending the data (cookie) via Postman always seems to end in a 403 error.

Can you simplify your test by using curl first to obtain a valid cookie, and then hard-code that value into your application code and pass it through? The idea would be to rule out the need to modify the cookie data by switching commas to semicolons, as the previous post indicated.

Once you verify that your code can send valid cookie data and get a response other than 403, then you can turn your focus towards troubleshooting the next item.

Were you able to accomplish the following?

  1. Make a GET request to obtain the collection list and the csrftoken
  2. Make a PUT request to authenticate
  3. Make an API request to /api/specify/collectionobject

I have done some testing today in Insomnia (another API client, but it should be pretty comparable to Postman) and passing the CSRF token as a header seems to have worked as expected on the sp7demofish instance. You can do the entire workflow in Postman if you like; there shouldn’t be a need to use curl if you don’t want to.

Here was my workflow to test:

Request 1:

Following the steps outlined in this documentation, get an overview of collection ids and the original CSRF token by issuing a GET request to https://sp7demofish.specifycloud.org/context/login. In Insomnia, the token will be returned as a cookie in the respective menu. The same is likely true for Postman.

Request 2:

Login using the demo credentials. Issue a PUT request to https://sp7demofish.specifycloud.org/context/login, with the following as JSON formatted data in the GUI.

{ "username": "sp7demofish",
"password": "sp7demofish",
"collection": 4}

With this request, create two header parameters within the GUI, the first has header X-CSRFToken and the value of the csrf token from request one. The second has header Referer and value https://sp7demofish.specifycloud.org

You will now have three key value pairs returned to you in the cookies menu: collection, csrftoken and sessionid. In all future requests, pass the value of the csrftoken as a header, using X-CSRFToken as the key.

To test that I now had write access, I submitted a PUT request to https://sp7demofish.specifycloud.org/api/specify/collectionobject/12/ with the following as the data

{"version": 1, "remarks": "testing api"}

This was confirmed successful using the following query: https://sp7demofish.specifycloud.org/specify/query/88/

Here is an experimental workflow that I used today to access the API via the Python requests module. There were a couple of quirks to work out:

  1. When issuing the PUT request to update collection object 12, the version had to be passed in as a parameter, not as part of the data dict. Since this increments by one each time an edit is made, I think that a GET request would always be required before the PUT in order to get the most up-to-date version?
  2. json.dumps() seems to be required when passing the dict to the data argument. I think this has to do with how requests or the API handles the JSON encoding (dumps converts a dict to a JSON-formatted string), because a JSONDecodeError was raised when it was not present.
  3. I still have to learn more about the Session class; I am not sure exactly how relevant it is here.

import requests
import json

# Initialize the session
s = requests.Session()

# Get the available collections
available_collections = s.get("https://sp7demofish.specifycloud.org/context/login/")

# Log in using the demo credentials and the csrftoken returned from the previous response
token = available_collections.cookies["csrftoken"]
data = {"username": "sp7demofish", "password": "sp7demofish", "collection": 4}
headers = {"X-CSRFToken": token, "Referer": "https://sp7demofish.specifycloud.org"}
log_in = s.put(
    "https://sp7demofish.specifycloud.org/context/login/",
    data=json.dumps(data),
    headers=headers,
)

# User is now logged in, future requests can be submitted with new csrftoken
token = log_in.cookies["csrftoken"]
headers = {"X-CSRFToken": token, "Referer": "https://sp7demofish.specifycloud.org"}
base_url = "https://sp7demofish.specifycloud.org/api/specify/"

# Test GET request for collection object
get_collectionObject = s.get(
    base_url + "collectionobject/12",
    headers=headers,
)

version = get_collectionObject.json()["version"]

# Test PUT request for collection object
data = {"remarks": "testingpythonapi"}
params = {"version": version}
put_request = s.put(
    base_url + "collectionobject/12/",
    params=params,
    data=json.dumps(data),
    headers=headers,
)

print(put_request.json())

One thing about the requests to the API for any one table is that a lot of the fields will display an api route (such as /api/specify/collectingevent/8868/ above for the collecting event), which limits visibility into some key fields like locality, or the taxonomic identification. I tried to come up with a solution to be able to call a collection object via its GUID, and have fields from across tables show up in one place. Below is a rough sketch of that process, I thought I would share here in case it is useful to others. The example below is with the demo instance.

Step 1: Build a class to control the session that handles the requests to the Specify instance

This will be responsible for logging in, fetching information, and converting between GUIDs and ids. For right now, I have just tested this with one record, but I would want to leverage one session throughout all requests so that login is only done once.

import requests
import json

class SpecifySession:
    """
    Initiates a requests session with a specify instance using the username and password supplied.
    """

    S = requests.Session()

    def __init__(
        self,
        username: str,
        password: str,
        colid: int,
        instance_url: str,
    ):
        self.username = username
        self.password = password
        self.instance_url = instance_url
        self.colid = colid

    def login(self):
        """
        Log into the collection using the credentials supplied. This method must be run before any other requests can be made
        """
        available_collections = self.S.get(self.instance_url + "/context/login/")
        token = available_collections.cookies["csrftoken"]
        data = {
            "username": self.username,
            "password": self.password,
            "collection": self.colid,
        }
        headers = {"X-CSRFToken": token, "Referer": self.instance_url}
        log_in = self.S.put(
            url=(self.instance_url + "/context/login/"),
            data=json.dumps(data),
            headers=headers,
        )
        if log_in.status_code != 204:
            raise RuntimeError("Login failed.")
        self.collection = log_in.cookies["collection"]
        self.sessionid = log_in.cookies["sessionid"]
        self.token = log_in.cookies["csrftoken"]
        self.headers = {
            "X-CSRFToken": self.token,
            "collection": self.collection,
            "sessionid": self.sessionid,
            "Referer": self.instance_url,
        }
        self.base_url = self.instance_url + "/api/specify/"

    def get_from_id(self, id: int, table: str) -> dict:
        """
        Submits a get request to the specified table.
        """
        response = self.S.get(
            self.instance_url + f"/api/specify/{table}/" + str(id) + "/",
            headers=self.headers,
        )
        return response.json()

    def get_from_route(self, route: str) -> dict:
        """
        Gets data for a table from a complete relative route. For example, /api/specify/collectingevent/1/
        """
        response = self.S.get(self.instance_url + route)
        return response.json()

    def uuid_to_id(self, uuid: str, table: str) -> int:
        """
        Converts a uuid to an id for a given table
        """
        response = self.S.get(self.instance_url + f"/api/specify/{table}/?guid={uuid}")
        response_json = response.json()
        first_match = response_json["objects"][0]
        return first_match["id"]

Step 2: Create a class to hold the specify record

I decided not to call it CollectionObject to emphasize that it is not just the CO table, but linked tables as well. Here, the one limitation is how to do the taxonomy efficiently above genus, which I still need to work out. Additionally, this is only built for the first preferred determination. I attempted to minimize calls to the API by making only one call per table, and then deriving all fields possible from that table without making further requests. Attributes are named via Darwin Core, instead of the Specify schema. If Agents were used for recordedBy, identifiedBy, etc., the code would probably look similar to how verbatimCoordinates is done.

class SpecifyRecord:
    def __init__(self, id):
        self.id = id

    def fetch_data(self, specify_session):
        """Fetches data for related tables from a collection object api call"""
        # Collection Object
        self.co = specify_session.get_from_id(self.id, table="collectionobject")
        self.occurrenceID = self.co["guid"]
        self.catalogNumber = self.co["catalognumber"]
        self.otherCatalogNumbers = self.co["altcatalognumber"]
        self.catalogedDate = self.co["catalogeddate"]
        self.dateLastModified = self.co["timestampmodified"]
        # Collecting event
        self.ce = specify_session.get_from_route(self.co["collectingevent"])
        self.recordedBy = self.ce["text1"]
        self.recordNumber = self.ce["stationfieldnumber"]
        self.eventDate = self.ce["startdate"]
        self.verbatimEventDate = self.ce["startdateverbatim"]
        # Locality
        self.loc = specify_session.get_from_route(self.ce["locality"])
        self.locality = self.loc["localityname"]
        self.decimalLatitude = self.loc["latitude1"]
        self.decimalLongitude = self.loc["longitude1"]
        self.verbatimCoordinates = (
            f"{self.loc['verbatimlatitude']} {self.loc['verbatimlongitude']}"
        )
        # Geography
        self.geo = specify_session.get_from_route(self.loc["geography"])
        self.country = self.geo["fullname"].split(",")[0]
        self.stateProvince = self.geo["fullname"].split(",")[1]
        # Determinations
        self.det = self.co["determinations"][0]
        self.identifiedBy = self.det["text1"]
        self.dateIdentified = self.det["determineddate"]
        # Taxon
        self.taxon = specify_session.get_from_route(self.det["preferredtaxon"])
        self.scientificName = self.taxon["fullname"]
        self.genus = self.scientificName.split(" ")[0]
        self.specificEpithet = self.scientificName.split(" ")[1]
        if len(self.scientificName.split(" ")) > 2:
            self.infraspecificEpithet = self.scientificName.split(" ")[2]
        else:
            self.infraspecificEpithet = ""
        self.scientificNameAuthorship = self.taxon["author"]

    def output_json(self, fields: list) -> str:
        """
        Takes a list of fields and creates a single JSON string with those fields as keys, and the values of those fields as values
        """
        record = {}
        for f in fields:
            record[f] = getattr(self, f)
        return json.dumps(record, sort_keys=True, indent=4)
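The name-splitting step in fetch_data can be isolated and tested without a Specify session. The sketch below is a hypothetical standalone helper, not part of the classes above; it assumes simple space-separated names with no rank markers such as "subsp." or "var.", and the three-part sample name is made up.

```python
def split_scientific_name(full_name):
    """Split a taxon full name into genus, specific and infraspecific epithets."""
    parts = full_name.split()
    # Missing parts become empty strings instead of raising IndexError.
    genus = parts[0] if len(parts) > 0 else ""
    specific = parts[1] if len(parts) > 1 else ""
    infraspecific = parts[2] if len(parts) > 2 else ""
    return genus, specific, infraspecific


print(split_scientific_name("Barbus bynni"))
print(split_scientific_name("Genus species subspecies"))  # made-up trinomial
```

Checking the length before indexing means a name with only a genus, or an empty string, still returns three values.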

Step 3: Combine everything together and use on a test record

Note that here I have hard-coded the credentials, but in a non-demo setting they should be injected through some other process.

import helpers.specify as specify
from pygments import highlight, lexers, formatters

# Login
specify_session = specify.SpecifySession(
    username="sp7demofish",
    password="sp7demofish",
    colid=32768, # KU Tissue
    instance_url="https://sp7demofish.specifycloud.org",
)
specify_session.login()

# Define which record we would like to see via its GUID
id = specify_session.uuid_to_id(
    uuid="97cec4c7-0e59-4994-906d-fa047b3c8404", table="collectionobject"
)

# Fetch the data associated with that GUID and print to terminal
test_record = specify.SpecifyRecord(id)
test_record.fetch_data(specify_session=specify_session)
test_json = test_record.output_json(
    [
        "occurrenceID",
        "catalogNumber",
        "otherCatalogNumbers",
        "catalogedDate",
        "dateLastModified",
        "recordedBy",
        "recordNumber",
        "eventDate",
        "verbatimEventDate",
        "locality",
        "decimalLatitude",
        "decimalLongitude",
        "genus",
        "specificEpithet",
        "dateIdentified",
        "country",
        "stateProvince",
    ]
)

colorful_json = highlight(
    test_json,
    lexers.JsonLexer(),
    formatters.TerminalFormatter(),
)
print(colorful_json)

Here is what the output looks like. You could choose to sort the keys alphabetically as I have done here, or you could group them logically by table (displayed in the order in which they are listed when passed to the method).

{
    "catalogNumber": "000001418",
    "catalogedDate": null,
    "country": "Ethiopia",
    "dateIdentified": null,
    "dateLastModified": "2011-05-12T03:52:55",
    "decimalLatitude": "8.1999998093",
    "decimalLongitude": "34.8499984741",
    "eventDate": "1994-11-22",
    "genus": "Barbus",
    "locality": "Bonga River about 40km E of Gambela near Bonga village",
    "occurrenceID": "97cec4c7-0e59-4994-906d-fa047b3c8404",
    "otherCatalogNumbers": null,
    "recordNumber": "AG 94-14",
    "recordedBy": null,
    "specificEpithet": "bynni",
    "stateProvince": " Ilubabor",
    "verbatimEventDate": null
}