Data clean up of duplicate records

I have about 4000 duplicate records in about 116000. The duplicates have the same bar code but different GUID. They were accidentally imported into Specify twice from a spreadsheet. Is there any way to batch delete these duplicates?

2 Likes

I’m not aware of a way to batch delete anything - my suggestion is to just make a big record set, put on a movie, and just start clicking until the record set disappears.

More importantly - have you identified why it happened in the first place?

I have identified how it happened. It was from a gigantic upload of 100,000 specimens in a spreadsheet in batches of less than 10,000.The person doing it appeared to have lost their place in the spreadsheet. Are you saying that records can be permanently deleted from the database using a record set?

Hi @rlombardi and @Carmella_Lombardi,

Recommended solution:

If this data was imported recently and new records are not referencing the imported data, you should do a roll back from the WorkBench and correct the upload data before proceeding. Find which data set(s) contains the duplicated CO records and roll them back. You can export the data set, remove the records, and reupload using the existing record plan.


Alternatives:

It is possible to delete said records from the database manually using MySQL/MariaDB, however I would not advise using that method unless you are very familiar with those technologies. You would need to ensure that you delete released records and clean up the references to the records being deleted.

We do have a request on our Specify 7 GitHub page (#1340) for the ability to bulk delete a record set. This would be a very helpful option, so I have mentioned it again to the programmers. I think it would be a better solution to your issue.

Paul’s suggestion will work for you, but you would need to click “Delete” in the bottom left of every single record.


I hope that you can resolve this fairly easily! Duplicate records can be quite a headache. Please let me know if you have any questions.

Grant Fitzsimmons

1 Like

I don’t think I would like to venture into MySQL land. But would the batch edit tool work? I have used it for smaller editing issues but not for deleting a batch. Not sure how I would go about how to locate them in one place, though.

I reccomend that you roll back the data through the WB and reimport where duplication has occured. You can search a value from the duplicated records on the spreadsheets you have imported in the WorkBench to find where the duplication has occurred.

Batch editing is for modifying fields within a table. Unfortunately it does not support mass deletion.

With due diligence, you can try to do it directly using SQL. This will be easier if there are no dependent tables, otherwise you’ll have to delete those dependent records first. If you’re really brave you temporarily change the constraints to cascade deletions. Can only be recommended if you know EXACTLY what you’re doing and how it will affect the Specify database.