MariaDB database optimisation

We are in the process of moving our Specify instance into the cloud, which also involves changing the database from MySQL to MariaDB.

The performance of the database seems fine with both the Specify 6 and 7 clients (but then I am the only user so far), but with the IPT it is quite a lot slower. Producing a Darwin Core Archive for our main collection (a pretty involved query) now takes well over two hours, while it used to take about 35 minutes. While this, by itself, is not terrible, as it is not like I am sitting there waiting for it, I think some optimisation of the MariaDB server might be warranted.

As I have never had to optimise a MySQL server before, it would be good to get some suggestions for improvements I can make to the configuration. In fact, I think it would be a very good thing to have a config file with the Docker Compositions that is optimised for most situations, with suggestions for tweaks that can be made for particular situations. One of the things I will definitely need to change is the maximum allowed number of connections. It would be very nice if I could do these things in Docker, so I do not have to remind myself to do it inside the container every time the container is re-created.
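For what it is worth, the official MariaDB Docker image reads extra option files from /etc/mysql/conf.d, so one way to make custom settings survive container re-creation is to bind-mount a small config file from the host. A sketch only – the service name and file paths here are my assumptions, not necessarily what the Docker Compositions use:

```yaml
# docker-compose fragment - "mariadb" service name and paths are assumptions
services:
  mariadb:
    volumes:
      # any *.cnf file in conf.d is read by the server at startup
      - ./config/99-custom.cnf:/etc/mysql/conf.d/99-custom.cnf:ro
```

The mounted file is then an ordinary option file, for example:

```ini
# 99-custom.cnf - example values only; tune to your hardware and workload
[mysqld]
max_connections = 200
innodb_buffer_pool_size = 2G
```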

Hi Niels,

Sorry I did not see this message sooner.
Are you aware that you can use S6 to connect to your database and then use the “Backup” feature to create an SQL dump file, which you can then use to seed an S6 MariaDB Docker container using the docker-compose project?

How to create a SQL Dump with S6:
https://discourse.specifysoftware.org/t/send-a-backup-to-specify-pc/767

How to seed a MariaDB database using the docker-compose project:
https://github.com/specify/docker-compositions/tree/production/all-in-one
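For reference, the official MariaDB image runs any *.sql (or *.sql.gz) files placed in /docker-entrypoint-initdb.d the first time it initialises an empty data directory, so seeding can be as simple as mounting the dump there. The service name and paths below are illustrative, not necessarily what the project uses:

```yaml
# docker-compose fragment - names and paths are illustrative
services:
  mariadb:
    volumes:
      # executed once, on first initialisation of an empty data directory
      - ./seed/specify-dump.sql:/docker-entrypoint-initdb.d/specify-dump.sql:ro
```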

I am not sure what “optimization” you are referring to, so perhaps you are trying to do something completely different from the process above, which easily allows you to migrate a MySQL DB to MariaDB.

  • Matthew

Thanks Matthew. We cannot rely on manual backups, so I do my backups from the command line; they happen every night without me lifting a finger, so I do not know if they take any longer than before.
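For context, such a nightly command-line backup can be as simple as a cron entry driving mysqldump inside the container; the container name, credentials, and paths below are placeholders, not my actual setup:

```
# crontab entry: dump the database every night at 02:00
# (container name, user, password, and paths are placeholders)
0 2 * * * docker exec specify-mariadb mysqldump -uroot -psecret specify | gzip > /backups/specify-$(date +\%F).sql.gz
```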

The particular problem I was referring to is getting the data out through the IPT (just as I cannot use the backup facility in Specify 6, I also cannot use the DwCA creator in Specify 7). The time that takes has gone from 30–35 minutes in our production system, which uses MySQL, to 2.5 hours in my development system, which uses the Docker Compositions and MariaDB. I have since found out that external scripts using the Python MySQL implementation are extremely sluggish now and mostly time out. This, for me, is a bigger problem than the IPT being slow, as I come in through the back door a lot (much more than through the Specify clients).

As I said, the Specify clients seem to be fine, so I think it might not necessarily be MariaDB itself, but more likely the JDBC driver or the particular Python+MySQL implementation that does not work as well with MariaDB as with MySQL. I cannot do anything about the JDBC driver, as that comes with the IPT, and I cannot use the native MySQL “bridge” (I am not sure if that is the right word), as I use Docker and therefore do not have MySQL on my computer. Likewise, all implementations specifically for MariaDB require me to have MariaDB on my computer, including the server, which I would never use. The same is actually the case for the native MySQL solution in Python: when you install it, it needs a config file from the MySQL server, without which it crashes.
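As an aside, a possible way around the native-library problem is a pure-Python driver such as PyMySQL, which speaks the MySQL/MariaDB wire protocol directly and therefore needs no client libraries or server config file on the machine. A minimal sketch of pointing SQLAlchemy-style tooling at it (all connection details below are placeholders):

```python
# Build a SQLAlchemy-style URL that selects the pure-Python PyMySQL
# driver ("mysql+pymysql"), so no native MySQL/MariaDB client install
# is required. All connection details are placeholders.

def make_url(user: str, host: str, port: int, db: str) -> str:
    return f"mysql+pymysql://{user}@{host}:{port}/{db}"

print(make_url(user="specify", host="localhost", port=3306, db="specify"))
# mysql+pymysql://specify@localhost:3306/specify
```

With such a URL, `sqlalchemy.create_engine(url)` (or `pymysql.connect(...)` directly) works without any MySQL or MariaDB installation; whether it is any faster than a driver that is timing out is, of course, something to test.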

Also, while the Specify clients work fine now with me as the only user, I think it will be a different story when the number of users goes up from one to 40 (that is how many users we have) from one day to the next. In fact, I know there are going to be problems. When we implemented Specify 6 thirteen years ago, my colleague and I had been working with Specify for a year without problems, but when she gave a training workshop for ten people (the first batch of users we let in), only four could get in. I had to increase the maximum number of database connections then, and I expect I will have to do that again this time.
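For what it is worth, the connection limit can be raised live for testing and then persisted in the option file; the value 200 below is purely an example, to be sized to the user count plus some headroom:

```sql
-- at runtime, for testing (lost when the server restarts)
SHOW VARIABLES LIKE 'max_connections';
SET GLOBAL max_connections = 200;
```

and, to make it survive a restart, the usual option-file line:

```ini
[mysqld]
max_connections = 200
```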

So, custom configuration of the database is necessary no matter what, and it does not seem to make much sense to have Docker orchestrate your setup and then have to go into the database container every time to customise the configuration. But then again, it might be better not to have the database server in the Docker composition at all?

I am not a system administrator or even a proper developer, just a botanist who can hack a little, and it is these sorts of architectural things that I have trouble with – and frankly do not enjoy even having to think about – and need help with.