Remove Old WorkBench/Batch Edit Data Sets for Any User

:bookmark: This guide describes how to remove old WorkBench and Batch Edit data sets

[!caution]
This guide is intended for administrators. This requires full permissions to the spdataset table in Security & Accounts. Make sure to be responsible and careful when removing data sets using this approach.

WorkBench and Batch Edit data sets can take up quite a large amount of space, so we recommend removing unused data sets after some period of time to free up space in the database.

To start, we need to import a query that searches the spdataset table. This is the table that contains all of the data sets we are looking to clean up. If you have any questions about the import process, consult the guide Export and Import a Query.

  1. Click on Queries :queries_: in the navigation menu.

  2. Click on New in the bottom right of the dialog.

  3. Click on Import in the bottom right of the dialog.

  4. Download, then select and import the following query: Find WorkBench Data Sets.json

  5. After it has been imported, you should see this query:

    This returns the following information:

    • Name: The name given to the data set. If this starts with “Backs”, that means it is the backup data set created automatically by Batch Edit for use with the rollback feature.
    • ImportedFileName: If the data set was created from an external file, this is the name it had before import.
    • UploadResult: This includes the status of the upload. If it includes "success": true, that means it has already been uploaded. This also includes information like the ID for the agent who uploaded it and the timestamp.
    • SpecifyUser: This is the name of the Specify User who currently “owns” this data set. It is not always the same as the one who initially created it as data sets can be transferred between users.
    • CreatedByAgent: The name of the agent who created the data set.
    • ModifiedByAgent: The name of the agent who last modified the data set.
    • TimestampCreated: When the data set was created.
    • TimestampModified: When the data set was last modified.
    • CollectionName: The name of the collection in the database where this data set appears.

    [!warning]
    You will not be able to see the full contents of the data set from the query results. To see a data set in the WorkBench or Batch Edit interface, you will need to log into the account of the ‘Specify User’ who “owns” the data set and access it via the appropriate Collection.

    Running this will return a comprehensive list of all data sets across all collections in the database:

  6. You can now use Browse In Forms or navigate to each result individually to review the contents and choose whether or not to Delete each data set.

Thanks, @Grant, this is very useful!
Is there also a way to inspect the size of the datasets from the query?
That is the most important property for prioritizing the deletion.

Hi @Eyal,

At the moment, there is no way to inspect size using the Query Builder per se as this would need to be calculated dynamically. If you have direct access to the database, you can do this calculation by running a script like this one, but note it may take several minutes to complete:

SELECT
  s.id AS spdataset_id,
  s.name AS dataset_name,
  c.CollectionName AS collection_name,
  u.Name            AS user_name,
  ROUND(OCTET_LENGTH(s.data) / 1024 / 1024, 2) AS data_size_mb
FROM spdataset AS s
JOIN collection AS c
  ON s.collection_id = c.UserGroupScopeId
JOIN specifyuser AS u
  ON s.specifyuser_id = u.SpecifyUserID
ORDER BY data_size_mb DESC;

Once you have this, you can find those data sets using the query above!

If you want to find just empty data sets to start, you can do that from within the query by searching for cases where the Data field equals []:

1 Like

Thanks, @Grant.
I am already using a SQL query to do this, and I hoped there was a simpler way to do so.