Different results when running same query in Specify

Hello,
One of the scientists that is using Specify has noticed that the same query has completely differing results when run in Specify 6 vs 7.
In Specify 6 the query returns ~447k results, whilst in Specify 7 it returns on ~29k results.

Here is an example of the query in Specify 6:


VS Specify 7:

Is there an obvious change in functionality between Specify 6 & 7 that would be causing such a difference? The query is the exact same (ran from the saved queries on the same db under the same account.)

We’re running Specify 6.8.03 and Specify v7.9.6.2.
If there’s any other information that might be helpful let me know and I’ll happily supply it - i’m hoping that I’ve missed something simple in the way that the results grouping differs between 6 and 7.

Specify 7 System Information - 2025-01-28T05_35_33.520Z.txt (1.0 MB)

Thank you for your message and for the detailed description!

One of the scientists that is using Specify has noticed that the same query has completely differing results when run in Specify 6 vs 7.
In Specify 6 the query returns ~447k results, whilst in Specify 7 it returns on ~29k results.

There are a couple of likely reasons you are seeing a discrepancy in the query results:

  1. Your Specify 6 instance is configured to search across collections, which does not change query behavior in Specify 7.
  2. This query has filters in Specify 7 that are not visible or usable in Specify 6.

I believe the most likely reason is point 1, but to verify, could you check if the following lines are included in your Global Preferences app resource?

GLOBAL_SEARCH=true
GLOBAL_SEARCH_AVAIL=true

Hi Grant,

Neither GLOBAL_SEARCH or GLOBAL_SEARCH_AVAIL are in the Global App Resources:

Adding them also does not seem to change the numbers in the query either - the count in Specify 7 with the two global preferences added remains at 29K records:


(I made sure to clear the cache after changing the global preferences)

The query was originally written in Specify 6 and wasn’t modified at all in Specify 7 before use (just ran directly from the saved query) - are there any filters that might be added in Specify 7 that would cause the discrepancy?

Cheers,
Leo

Hi @leobrimblecombe,

For us to investigate this discrepancy further, we would likely need to see a copy of your Specify database. Is it possible for you to share this with us?

Hi Grant,
Sorry to bring up an old ticket after not following up, but I’ve found another query that has the same issue as this and was wondering if it could be looked at again.

I sent a copy of the Specify database when we were dealing with the Collection Object not saving back in March - would it be possible to try and see if there’s it’s replicable in there. We have since updated the aggregators for a few fields but the issue has not changed.

Cheers,
Leo

Hi @leobrimblecombe,

I recreated the discrepancy using the backup between Specify 6 and 7, and I created a GitHub issue to track it:

The construction of the database query differs between versions. Upon closer examination, I found that Specify 6 uses significantly fewer joins (7 total) compared to Specify 7 (20 total). The query in Specify 6 is much simpler and focuses on direct relationships rather than filtering on higher taxonomy for all ranks included in the query.

I’ll need a developer to take a look at this sometime soon. Can you share some information about the other query you encountered this issue with? You can also export the query so we can take a look!

Hi Grant,
Thanks for looking into it!

The other query that it came up on was our check Queensland query - exported from both Specify 6 & 7:
CHKQLD Specify 6.xml (4.5 KB)
CHKQLD1 S7.json (11.2 KB)

It’s a pretty similar structure to the other query with differing results.
I’ve actually been trying to rewrite it directly as a database query in SQL - would you be able to send me the what the Specify 6 and 7 database queries look like, like you did in the Github issue for the other query?

I really appreciate your help!
Cheers,
Leo

1 Like

Hello,

Hope all is going well!
Just wondering if there has been any updates on this?
Let me know if you’d rather I follow up on the Github issue if that’s easier.

Best regards,
Leo

1 Like

Hi @leobrimblecombe,

Thanks so much for following up!

This issue needs investigation by one of our developers, but our team has been busy with other projects. I’ve added another comment with the details from your response in April.

SQL Queries

As for your request to provide a direct SQL query, here they are below for the CHKQLD query:

Specify 7

SELECT
  group_concat (DISTINCT collectionobject.`CollectionObjectID`) AS group_concat_1,
  taxon_1.`GroupNumber` AS `taxon_1_GroupNumber`,
  CASE
    WHEN (taxon_1.`TaxonTreeDefItemID` = 11) THEN taxon_1.`Name`
    WHEN (taxon_2.`TaxonTreeDefItemID` = 11) THEN taxon_2.`Name`
    WHEN (taxon_3.`TaxonTreeDefItemID` = 11) THEN taxon_3.`Name`
    WHEN (taxon_4.`TaxonTreeDefItemID` = 11) THEN taxon_4.`Name`
    WHEN (taxon_5.`TaxonTreeDefItemID` = 11) THEN taxon_5.`Name`
    WHEN (taxon_6.`TaxonTreeDefItemID` = 11) THEN taxon_6.`Name`
    WHEN (taxon_7.`TaxonTreeDefItemID` = 11) THEN taxon_7.`Name`
    WHEN (taxon_8.`TaxonTreeDefItemID` = 11) THEN taxon_8.`Name`
    WHEN (taxon_9.`TaxonTreeDefItemID` = 11) THEN taxon_9.`Name`
    WHEN (taxon_10.`TaxonTreeDefItemID` = 11) THEN taxon_10.`Name`
    WHEN (taxon_11.`TaxonTreeDefItemID` = 11) THEN taxon_11.`Name`
    WHEN (taxon_12.`TaxonTreeDefItemID` = 11) THEN taxon_12.`Name`
    WHEN (taxon_13.`TaxonTreeDefItemID` = 11) THEN taxon_13.`Name`
    WHEN (taxon_14.`TaxonTreeDefItemID` = 11) THEN taxon_14.`Name`
    WHEN (taxon_15.`TaxonTreeDefItemID` = 11) THEN taxon_15.`Name`
    WHEN (taxon_16.`TaxonTreeDefItemID` = 11) THEN taxon_16.`Name`
    WHEN (taxon_17.`TaxonTreeDefItemID` = 11) THEN taxon_17.`Name`
    WHEN (taxon_18.`TaxonTreeDefItemID` = 11) THEN taxon_18.`Name`
    WHEN (taxon_19.`TaxonTreeDefItemID` = 11) THEN taxon_19.`Name`
    WHEN (taxon_20.`TaxonTreeDefItemID` = 11) THEN taxon_20.`Name`
  END AS anon_1,
  taxon_1.`Text11` AS `taxon_1_Text11`,
  taxon_1.`Text7` AS `taxon_1_Text7`,
  taxon_1.`Text20` AS `taxon_1_Text20`,
  taxon_1.`Text1` AS `taxon_1_Text1`,
  taxon_1.`Text3` AS `taxon_1_Text3`,
  taxon_1.`Text2` AS `taxon_1_Text2`,
  taxon_1.`Integer1` AS `taxon_1_Integer1`
FROM
  collectionobject
  LEFT OUTER JOIN determination AS determination_1 ON collectionobject.`CollectionObjectID` = determination_1.`CollectionObjectID`
  LEFT OUTER JOIN taxon AS taxon_1 ON taxon_1.`TaxonID` = determination_1.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_2 ON taxon_1.`ParentID` = taxon_2.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_3 ON taxon_2.`ParentID` = taxon_3.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_4 ON taxon_3.`ParentID` = taxon_4.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_5 ON taxon_4.`ParentID` = taxon_5.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_6 ON taxon_5.`ParentID` = taxon_6.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_7 ON taxon_6.`ParentID` = taxon_7.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_8 ON taxon_7.`ParentID` = taxon_8.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_9 ON taxon_8.`ParentID` = taxon_9.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_10 ON taxon_9.`ParentID` = taxon_10.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_11 ON taxon_10.`ParentID` = taxon_11.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_12 ON taxon_11.`ParentID` = taxon_12.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_13 ON taxon_12.`ParentID` = taxon_13.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_14 ON taxon_13.`ParentID` = taxon_14.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_15 ON taxon_14.`ParentID` = taxon_15.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_16 ON taxon_15.`ParentID` = taxon_16.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_17 ON taxon_16.`ParentID` = taxon_17.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_18 ON taxon_17.`ParentID` = taxon_18.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_19 ON taxon_18.`ParentID` = taxon_19.`TaxonID`
  LEFT OUTER JOIN taxon AS taxon_20 ON taxon_19.`ParentID` = taxon_20.`TaxonID`
  LEFT OUTER JOIN collectingevent AS collectingevent_1 ON collectingevent_1.`CollectingEventID` = collectionobject.`CollectingEventID`
  LEFT OUTER JOIN locality AS locality_1 ON locality_1.`LocalityID` = collectingevent_1.`LocalityID`
  LEFT OUTER JOIN geography AS geography_1 ON geography_1.`GeographyID` = locality_1.`GeographyID`
  LEFT OUTER JOIN geography AS geography_2 ON geography_1.`ParentID` = geography_2.`GeographyID`
  LEFT OUTER JOIN geography AS geography_3 ON geography_2.`ParentID` = geography_3.`GeographyID`
  LEFT OUTER JOIN geography AS geography_4 ON geography_3.`ParentID` = geography_4.`GeographyID`
  LEFT OUTER JOIN geography AS geography_5 ON geography_4.`ParentID` = geography_5.`GeographyID`
  LEFT OUTER JOIN collectingeventattribute AS collectingeventattribute_1 ON collectingeventattribute_1.`CollectingEventAttributeID` = collectingevent_1.`CollectingEventAttributeID`
WHERE
  collectionobject.`CollectionID` = 4
  AND determination_1.`IsCurrent` = true
  AND CASE
    WHEN (geography_1.`GeographyTreeDefItemID` = 4) THEN geography_1.`Name`
    WHEN (geography_2.`GeographyTreeDefItemID` = 4) THEN geography_2.`Name`
    WHEN (geography_3.`GeographyTreeDefItemID` = 4) THEN geography_3.`Name`
    WHEN (geography_4.`GeographyTreeDefItemID` = 4) THEN geography_4.`Name`
    WHEN (geography_5.`GeographyTreeDefItemID` = 4) THEN geography_5.`Name`
  END = 'Queensland'
  AND collectingeventattribute_1.`Text13` NOT IN (
    'Cultivated',
    'Possibly cultivated',
    'Presumably cultivated',
    'Unknown'
  )
  AND taxon_1.`RankID` > '180'
  AND (
    taxon_1.`TaxonTreeDefID` IN (1)
    OR taxon_1.`TaxonTreeDefID` IS NULL
    OR geography_1.`GeographyTreeDefID` IN (1)
    OR geography_1.`GeographyTreeDefID` IS NULL
  )
GROUP BY
  taxon_1.`GroupNumber`,
  CASE
    WHEN (taxon_1.`TaxonTreeDefItemID` = 11) THEN taxon_1.`Name`
    WHEN (taxon_2.`TaxonTreeDefItemID` = 11) THEN taxon_2.`Name`
    WHEN (taxon_3.`TaxonTreeDefItemID` = 11) THEN taxon_3.`Name`
    WHEN (taxon_4.`TaxonTreeDefItemID` = 11) THEN taxon_4.`Name`
    WHEN (taxon_5.`TaxonTreeDefItemID` = 11) THEN taxon_5.`Name`
    WHEN (taxon_6.`TaxonTreeDefItemID` = 11) THEN taxon_6.`Name`
    WHEN (taxon_7.`TaxonTreeDefItemID` = 11) THEN taxon_7.`Name`
    WHEN (taxon_8.`TaxonTreeDefItemID` = 11) THEN taxon_8.`Name`
    WHEN (taxon_9.`TaxonTreeDefItemID` = 11) THEN taxon_9.`Name`
    WHEN (taxon_10.`TaxonTreeDefItemID` = 11) THEN taxon_10.`Name`
    WHEN (taxon_11.`TaxonTreeDefItemID` = 11) THEN taxon_11.`Name`
    WHEN (taxon_12.`TaxonTreeDefItemID` = 11) THEN taxon_12.`Name`
    WHEN (taxon_13.`TaxonTreeDefItemID` = 11) THEN taxon_13.`Name`
    WHEN (taxon_14.`TaxonTreeDefItemID` = 11) THEN taxon_14.`Name`
    WHEN (taxon_15.`TaxonTreeDefItemID` = 11) THEN taxon_15.`Name`
    WHEN (taxon_16.`TaxonTreeDefItemID` = 11) THEN taxon_16.`Name`
    WHEN (taxon_17.`TaxonTreeDefItemID` = 11) THEN taxon_17.`Name`
    WHEN (taxon_18.`TaxonTreeDefItemID` = 11) THEN taxon_18.`Name`
    WHEN (taxon_19.`TaxonTreeDefItemID` = 11) THEN taxon_19.`Name`
    WHEN (taxon_20.`TaxonTreeDefItemID` = 11) THEN taxon_20.`Name`
  END,
  taxon_1.`Text11`,
  taxon_1.`Text7`,
  taxon_1.`Text20`,
  taxon_1.`Text1`,
  taxon_1.`Text3`,
  taxon_1.`Text2`,
  taxon_1.`Integer1`
ORDER BY
  taxon_1.`GroupNumber` ASC,
  CASE
    WHEN (taxon_1.`TaxonTreeDefItemID` = 11) THEN taxon_1.`Name`
    WHEN (taxon_2.`TaxonTreeDefItemID` = 11) THEN taxon_2.`Name`
    WHEN (taxon_3.`TaxonTreeDefItemID` = 11) THEN taxon_3.`Name`
    WHEN (taxon_4.`TaxonTreeDefItemID` = 11) THEN taxon_4.`Name`
    WHEN (taxon_5.`TaxonTreeDefItemID` = 11) THEN taxon_5.`Name`
    WHEN (taxon_6.`TaxonTreeDefItemID` = 11) THEN taxon_6.`Name`
    WHEN (taxon_7.`TaxonTreeDefItemID` = 11) THEN taxon_7.`Name`
    WHEN (taxon_8.`TaxonTreeDefItemID` = 11) THEN taxon_8.`Name`
    WHEN (taxon_9.`TaxonTreeDefItemID` = 11) THEN taxon_9.`Name`
    WHEN (taxon_10.`TaxonTreeDefItemID` = 11) THEN taxon_10.`Name`
    WHEN (taxon_11.`TaxonTreeDefItemID` = 11) THEN taxon_11.`Name`
    WHEN (taxon_12.`TaxonTreeDefItemID` = 11) THEN taxon_12.`Name`
    WHEN (taxon_13.`TaxonTreeDefItemID` = 11) THEN taxon_13.`Name`
    WHEN (taxon_14.`TaxonTreeDefItemID` = 11) THEN taxon_14.`Name`
    WHEN (taxon_15.`TaxonTreeDefItemID` = 11) THEN taxon_15.`Name`
    WHEN (taxon_16.`TaxonTreeDefItemID` = 11) THEN taxon_16.`Name`
    WHEN (taxon_17.`TaxonTreeDefItemID` = 11) THEN taxon_17.`Name`
    WHEN (taxon_18.`TaxonTreeDefItemID` = 11) THEN taxon_18.`Name`
    WHEN (taxon_19.`TaxonTreeDefItemID` = 11) THEN taxon_19.`Name`
    WHEN (taxon_20.`TaxonTreeDefItemID` = 11) THEN taxon_20.`Name`
  END ASC,
  taxon_1.`Text20` ASC

Specify 6

select distinct
  taxon2_.GroupNumber as col_0_0_,
  taxon2_.TaxonID as col_1_0_,
  taxon2_.Text11 as col_2_0_,
  taxon2_.Text7 as col_3_0_,
  taxon2_.Text20 as col_4_0_,
  taxon2_.Text1 as col_5_0_,
  taxon2_.Text3 as col_6_0_,
  taxon2_.Text2 as col_7_0_,
  taxon2_.Integer1 as col_8_0_
from
  collectionobject collection0_
  left outer join determination determinat1_ on collection0_.CollectionObjectID = determinat1_.CollectionObjectID
  left outer join taxon taxon2_ on determinat1_.TaxonID = taxon2_.TaxonID
  left outer join collectingevent collecting3_ on collection0_.CollectingEventID = collecting3_.CollectingEventID
  left outer join locality locality4_ on collecting3_.LocalityID = locality4_.LocalityID
  left outer join geography geography5_ on locality4_.GeographyID = geography5_.GeographyID
  left outer join collectingeventattribute collecting6_ on collecting3_.CollectingEventAttributeID = collecting6_.CollectingEventAttributeID
where
  determinat1_.IsCurrent = 1
  and (geography5_.NodeNumber between 203 and 218)
  and (
    collecting6_.Text13 not in (
      'Cultivated',
      'Possibly cultivated',
      'Presumably cultivated',
      'Unknown'
    )
    or collecting6_.Text13 is null
  )
  and taxon2_.RankID > 180
  and collection0_.CollectionMemberID = 4

Explanation

To get the SQL run on the database when a query is constructed by Specify, you can run the following command immediately following you executing the query in either version:

SHOW FULL PROCESSLIST;

From here, you should see the results of all ongoing processes in MariaDB, including the SQL query constructed by Specify in the Info column:

Id User Host db Command Time State Info Progress
68 masteruser localhost:55220 queensland_2025_03_17 Query 0 Sending data select distinct taxon2_.GroupNumber as col_0_0_, taxon2_.TaxonID as col_1_0_, taxon2_.Text11 as col_2_0_, taxon2_.Text7 as col_3_0_, taxon2_.Text20 as col_4_0_, taxon2_.Text1 as col_5_0_, taxon2_.Text3 as col_6_0_, taxon2_.Text2 as col_7_0_, taxon2_.Integer1 as col_8_0_ from collectionobject collection0_ left outer join determination determinat1_ on collection0_.CollectionObjectID=determinat1_.CollectionObjectID left outer join taxon taxon2_ on determinat1_.TaxonID=taxon2_.TaxonID left outer join collectingevent collecting3_ on collection0_.CollectingEventID=collecting3_.CollectingEventID left outer join locality locality4_ on collecting3_.LocalityID=locality4_.LocalityID left outer join geography geography5_ on locality4_.GeographyID=geography5_.GeographyID left outer join collectingeventattribute collecting6_ on collecting3_.CollectingEventAttributeID=collecting6_.CollectingEventAttributeID where determinat1_.IsCurrent=1 and (geography5_.NodeNumber between 203 and 218) and (collecting6_.Text13 not in (‘Cultivated’ , ‘Possibly cultivated’ , ‘Presumably cultivated’ , ‘Unknown’) or collecting6_.Text13 is null) and taxon2_.RankID>180 and collection0_.CollectionMemberID=4 limit 50000 0

From here, you can reformat the query and save it for future use!