Important: Specify Database Backups Can Fail

Backup Issue with Specify Databases

Specify Collections Consortium, 8 December 2022

Summary

Under specific conditions, a Specify database backup operation can produce a backup file that is incomplete and unusable for a database restore.

Context

This issue affects a small subset of Specify installations, specifically sites that have large Workbench Data Sets in Specify 7. It does not affect collections that only use Specify 6; if you have not used Specify 7, you can safely ignore this notice. The problem also does not affect Specify 7 databases hosted in the Specify Cloud service; backups of those databases are complete and secure.

The issue involves use of the Specify SpBackupRestore utility or the MySQL “mysqldump” utility with large Specify Data Sets, which are usually created for importing large record batches into a Specify 7 database. (Specify 6 Data Sets are stored in a different set of data tables, which excludes them from this problem.) Specify 7 allows much larger Data Sets of tens of thousands of data rows, and these very large sets can cause this backup problem to appear.

The issue affects the operation of the MySQL command “mysqldump”, which is the underlying software for the Backup and Restore functions in the Specify 6 Preferences menu and for the SpBackupRestore utility provided in the Specify 6 installer package. Backup failure happens only when a database has been accessed with both Specify 6 and 7 and contains at least one very large Data Set created in Specify 7. The problem can also occur at Specify 7-only sites with large Data Sets that back up with the SpBackupRestore utility distributed with the Specify 6 installation package or with the MySQL “mysqldump” command-line utility.

Issue

When the Specify Data Set table (SpDataSet) has one or more rows that are larger than the default client/server packet size limit in MySQL (or MariaDB), the backup can produce a SQL dump file that is missing data tables.

When using “mysqldump” on the command line to back up a Specify 7 database that has large entries in the SpDataSet table, you might receive an error message like this:

mysqldump: Error 2013: Lost connection to server during query when dumping table `spdataset` at row: 28
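
For reference, a backup invocation that can produce this error looks something like the following; the user name, database name, and output file here are placeholders, not values from an actual installation:

mysqldump -u specify_user -p specify_db > specify_backup.sql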

A SQL dump file is created, but the backup process stops while copying out the SpDataSet table. When that happens, all data tables after SpDataSet are omitted from the backup file.

This error indicates that the connection to the database server was lost during the backup process. We have found that the issue is related to the max_allowed_packet parameter settings on the MySQL server and client. In the Remedy section below, a link to MySQL documentation explains how to increase this parameter to accommodate future backups with large Data Sets.

In this situation, backup operations fail because all of the data for one Data Set is stored in a single database row whose size exceeds the default parameter value in MySQL or MariaDB for the maximum amount of data that can be transmitted in a single network packet.
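
To get a rough sense of whether your SpDataSet rows approach the packet limit, you can check the table’s average row length in the information schema. This is only a heuristic, since a single row can be far larger than the average:

SELECT TABLE_NAME, AVG_ROW_LENGTH
FROM information_schema.TABLES
WHERE TABLE_NAME = 'spdataset';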

The Specify 6 Backup/Restore tools, both the standalone SpBackupRestore utility and the backup function under the Specify 6 Preferences menu, do not warn that the backup was unsuccessful. In contrast, the MySQL command-line tool “mysqldump” reports an error, as shown above.

Detection

The quickest way to detect whether your database backups are affected by this condition is to open a SQL backup file for your collection in a text editor, such as Notepad++ on Windows, BBEdit on macOS, or Vim/nano on Linux. Your dump file may be large and require substantial memory for your text editor to handle effectively.

The backup was successful if the last line of the file looks like this:

-- Dump completed on 2022-12-08 15:08:05

Alternatively, you can determine whether a backup is complete by checking whether any data tables appear after the SpDataSet table in the file.
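
If the file is too large to open comfortably in an editor, a quick command-line check works as well (the file name below is a placeholder):

tail -n 1 specify_backup.sql
grep 'CREATE TABLE' specify_backup.sql | tail -n 3

The first command should print the “Dump completed” line shown above; the second lists the last tables written to the file, and if spdataset is the last one, the backup is almost certainly truncated.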

Remedy

Because the backups fail due to a Data Set whose size exceeds the default parameter value in MySQL/MariaDB that constrains how much data can be transmitted in a single network packet, a short-term solution is to increase that maximum. This MySQL documentation page describes how to change the value of the maximum allowed packet size parameter:

https://dev.mysql.com/doc/refman/5.7/en/packet-too-large.html

We recommend changing the max_allowed_packet size value to 1024M. It is important to note that both the client and the server have their own max_allowed_packet variable, and that the parameter’s value must be changed in both places.
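
For example, the value can be raised on a running server with a SQL statement and passed explicitly to the backup client; the credentials and database name below are placeholders, and note that a SET GLOBAL change applies only to new connections and is lost when the server restarts:

SET GLOBAL max_allowed_packet = 1073741824;

mysqldump --max-allowed-packet=1024M -u specify_user -p specify_db > specify_backup.sql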

You can edit the MySQL configuration file to include the following under the [mysqld] header to raise the maximum allowed packet size and extend the network timeouts:

[mysqld]

innodb_buffer_pool_size=100M
# raise the packet limit so very large SpDataSet rows can be transmitted
max_allowed_packet=1024M
# keep the connection alive while large rows are dumped
net_read_timeout=3600
net_write_timeout=3600
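
After restarting the MySQL/MariaDB server, you can confirm that the new value took effect; with the setting above, the server should report 1073741824 bytes:

SHOW VARIABLES LIKE 'max_allowed_packet';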

Solution

SCC engineers are working on a permanent fix. We plan to change the way Specify 7 stores Data Sets so that they are spread over multiple database rows. That change will eliminate the need to adjust the MySQL max_allowed_packet parameter and will support Data Sets of unlimited size in a database backup.

Summary

Specify 6-only installations and Specify 7 Cloud installations are not affected by this problem. But if you have a collection database with large Data Sets that has been managed with Specify 7, your backup files may be incomplete. If you are unsure whether your database backups have been affected, we will be happy to inspect your backup files to verify their integrity. To have us do that, SCC members can e-mail support@specifysoftware.org. If you have any questions or concerns, let us know.

Hello, I’m running MariaDB inside a container and having trouble with incomplete backups. I’ve tried using both the SpBackupRestore application and the DBeaver database dump tool, but neither produces a complete backup.

When I set up my MariaDB instance in the container, I configured it with a max_allowed_packet of 1024M. Here’s the command I used:

podman run --detach --pod mariadb_4_sp7_pod --name mariadb_4_sp7 -v /apps/data/lfc/containers/volumes/specify-mariadb:/var/lib/mysql:Z --env-file='./mysql.env' mariadb:10.11 --max-allowed-packet=1024M

When I run the following command to check the configuration, it returns a value of 1073741824:

SHOW VARIABLES LIKE 'max_allowed_packet';

1024 MB equals 1,073,741,824 bytes, so the server-side setting seems OK.

Has anyone encountered this issue with MariaDB in a container and found a solution?

FYI. I could not find a solution for the incomplete Specify MariaDB backups. I just noticed that all my previous backups were incomplete. Unfortunately, I had to create a new database and re-import all my data through the Workbench. :frowning:
In the future I will test my backups once in a while to make sure they are valid.

Hi @Heryk,

I’m still abroad at SPNHC-TDWG 2024, so I apologize that my message may be more brief than usual.

Did you try adding an extra command argument for max_allowed_packet on the client side? For instance, in DBeaver’s dump tool you can pass --max-allowed-packet=1024M as an extra command argument.

The server configuration appears to be set up correctly based on your description, so now I would look to the client.
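
If you prefer to dump from the command line instead of DBeaver, the same client-side flag applies. As a sketch only (the container name matches your podman command above, but the user, password, and database name are placeholders):

podman exec -i mariadb_4_sp7 mysqldump --max-allowed-packet=1024M -u specify_user -pYOUR_PASSWORD specify_db > specify_backup.sql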


Hi @Grant,
Thanks for this very useful info. I will try it the next time I have the same issue.

For now, I rebuilt my database. Too bad that creating backups in the past with the Specify backup/restore tool did not generate any error message when they were incomplete. I will double-check my backups from now on.

If anyone else is wondering where to add these “extra command args” in the DBeaver client, I invite you to read this page: DBeaver Documentation

Hope you have a nice trip back!
Cheers
HĂ©ryk