Backup Issue with Specify Databases
Specify Collections Consortium, 8 December 2022
Summary
Under specialized conditions a Specify database backup operation can produce a backup file that is incomplete and unusable for a database restore.
Context
This issue affects a small subset of Specify installations, specifically sites that have large Workbench Data Sets in Specify 7. It does not affect collections that only use Specify 6. If you have not used Specify 7, you can safely ignore this entire notice. The problem also does not affect Specify 7 databases hosted in the Specify Cloud service, their backups are complete and secure.
The issue involves use of the Specify SpBackupRestore utility or the MySQL “mysqldump” utility with large Specify Data Sets that are usually created for importing large record batches into a Specify 7 database. (Specify 6 Data Sets are stored in a different set of data tables precluding them from this problem.) Specify 7 enables much larger Data Set sizes of tens of thousands of data rows and these very large sets can cause this backup problem to appear.
The issue affects the operation of the MySQL command “mysqldump”, which is the underlying software for the Backup and Restore functions in Specify 6 Preferences Menu and for the SpBackupRestore utility provided in the Specify 6 installer package. Backup failure happens only when a database has been accessed with Specify 6 and 7, and when it has at least one very large Data Set in Specify 7. The problem can also occur with Specify 7-only sites with large Data Sets that use the SpBackupRestore utility distributed with the Specify 6 installation package or the MYSQL “mysqldump” command line utility.
Issue
When the Specify Data Set table (SpDataSet) has one or more rows that are larger than the default client/server packet size limit in MySQL (or MariaDB), it can result in a SQL dump file that is missing data tables.
When using “mysqldump” on the command line to back up a Specify 7 database that has large entries in the SpDataSet table, you might receive an error message like this:
mysqldump: Error 2013: Lost connection to server during query when dumping table `spdataset` at row: 28
A SQL dump file is created but the backup process stops while copying out the SpDataSet table. When that happens, all data tables after SpDataSet are ignored and not included in the backup file.
This error indicates that the connection to the database server was lost during the backup process. We have found that the issue is related to max_allowed_packet
parameter settings on the MySQL server and client. In the Remedy section below, links to MySQL documentation explain how to increase this parameter to accommodate future backups with large Data Sets.
In this situation, backup operations fail because all of the data for one Data Set is stored in a single database row whose size exceeds the default parameter value in MySQL or MariaDB for the maximum amount of data that can be transmitted in a single network packet.
The Specify 6 Backup/Restore tools, both the standalone SpBackupRestore utility and the backup function under the Specify 6 preferences menu do not warn that the backup was not successful. In contrast, the MySQL command line tool “mysqldump” will report an error, as shown above.
Detection
The quickest way to detect if your database backups are affected by this condition is to open a SQL backup file for your collection in a text editor–like Notepad++ on Windows, BBEdit on macOS, or Vim/nano on Linux. Your dump file may be large and require substantial memory for your text editor to work effectively.
The backup was successful if the last line of the file looks like this:
-- Dump completed on 2022-12-08 15:08:05
Alternatively, one can determine if a backup is complete by noting if there are any data tables after the SpDataSet table in the file.
Remedy
Because the backups fail due to a Data Set whose size that exceeds the default parameter value in MySQL/MariaDB that constrains how much data can be transmitted in a single network packet, a short term solution is to increase the maximum amount of information that can be transmitted. This MySQL documentation page describes how to change the value of maximum allowed packet size parameter:
https://dev.mysql.com/doc/refman/5.7/en/packet-too-large.html
We recommend changing the max_allowed_packet size value to 1024M. It is important to note that both the client and the server have their own max_allowed_packet variable, and that the parameter’s value must be changed in both places.
You can edit the MySQL configuration file to include the following under the [mysqld] header to enable the maximum allowed packet size:
[mysqld]
innodb_buffer_pool_size=100MB
max_allowed_packet=1024MB
net_read_timeout=3600
net_write_timeout=3600
Solution
SCC engineers are working on a permanent fix. We plan to change the way Specify 7 stores Data Sets so that they are spread over multiple database rows. That change will eliminate the need to adjust the MySQL max_allowed_packet
parameter and it will support Data Sets of unlimited size in a database backup.
Summary
Specify 6-only installations and Specify 7 Cloud installations are not affected by this problem. But if you have a collection database with large Data Sets that has been managed with Specify 7, your backup files may be incomplete. If you are unsure if your database backups have been affected, we will be happy to inspect your backup files to verify their integrity. To have us do that, SCC members can e-mail: support@specifysoftware.org. If you have any questions or concerns, let us know.