Taxon tree manipulations lock database

We’ve been having episodes where any larger manipulation done to the taxon tree by anyone locks the database for minutes affecting other users and giving them timeouts, much to their confusion and frustration. We are already soon moving to more powerful servers, but I fear this will not be enough. We need a long term solution to this, because we’re only quickly adding more taxa soon.

I understood the foundational problem with the tree are the child node numbers and the time spent recalculating these upon every change to the tree. When moving entire branches, this process obviously has to traverse all the child branches recursively to ensure the integrity of these numbers.

I was told that a solution is being worked on, where this will be done in a different way, so no performance problems will be noticed. I would like to hear a timeline on when we can expect this to be implemented.

Hi @fedoras,

I spoke to one of our developers and he put together a list of upcoming changes related to your inquiry:

There are two bottlenecks when creating new taxon nodes and performing tree mutations.

  1. The first is rebuilding the full names when changes are made.

    The first bottleneck has been addressed in #3175.

    We have observed that the full name fix currently provides sufficient speed-up. The fix will likely be released in the upcoming v7.9.1 release.

  2. The second bottleneck, which occurs because the entire tree is renumbered each time a change is made, has been partially resolved. You can find an early draft of this fix on the node_numbering_fix branch.

    We are not currently considering implementing this solution because we need to evaluate whether the performance enhancements from #3175 are sufficient for most scenarios.

  3. We have improved the speed of tree count queries by implementing recursive CTE queries in the code. This improvement is included in the pull request #3613. This fix will also be include in Specify 7.9.1.

    If Specify is still using MySQL 5.7 instead of MariaDB, the current query method will be used as CTE queries are unsupported in MySQL 5.7.

Thank you for reaching out to us and inquiring about our progress on these issues! We are optimistic that these upcoming updates will significantly improve tree performance in the next few months.

1 Like

Thanks for sharing the planned fixes - it’ll be great to get to try these out. We have experienced similar issues to Fedoras with users experiencing issues if the there is work being done in the tree. We have also found that workbench uploads including taxa cause users to experience crashes and problems working with taxa - we currently have a period of time once a week where workbench uploads take place to minimise disruption.

1 Like

Oh yes indeed I forgot to mention that this especially occurs when importing with Workbench to the point that we are now forced to do imports outside of office hours.