WHERE OR vs WHERE IN


First Published 29 Dec 2022                                                                                                                                             Difficulty level :   Moderate

Section Links:
        Introduction
        Tests
        Results
        Query Execution Plans
        Summary of Query Execution Plans
        Further Tests (large dataset)
        System Info
        Downloads
        Conclusions
        Feedback



O Access, Access, Where Or Art Thou In, Access?          (With apologies to William Shakespeare)



1.   Introduction                                                                                                                                         Return To Top

This is the thirteenth in a series of articles discussing various tests done to compare the efficiency of different approaches to coding.

This article follows on from the last speed test Finding Unmatched Records in which, amongst other outcomes, it was shown that a NOT IN subquery could not use indexes.

There are two common methods of filtering data using multiple criteria: OR and IN.

I wondered whether either of these would be more efficient in terms of query performance and carried out the following tests:
a)   Aggregate query using a WHERE clause including OR
b)   Aggregate query using a WHERE clause including IN
c)   Modified aggregate query using a WHERE clause including OR

There are two similar versions of this set of speed tests.
The main difference is the search fields are NOT INDEXED in one and INDEXED in the other.

MainForm
In a real world application, the standard advice is that fields being searched regularly or used in joins should ALWAYS be INDEXED.
The indexes will increase file size but normally reduce search times.
Access searches the index rather than each individual record so results can be retrieved very quickly in a query or SQL statement.

In this set of tests, although the time differences were relatively small, there were some interesting and unexpected outcomes.

TLDR:
a)   Using IN is more efficient than using OR . . . except when it isn't!
b)   Using INDEXING is more efficient than NOT INDEXING . . . except when it isn't!

Clear as mud? There's a (hopefully) better explanation later on . . .



2.   Tests                                                                                                                                                     Return To Top

In this example, each test is designed to count the number of each type of temp and saved query in the test database (excluding SELECT queries) that were updated after a specified date.

The query results in each case were:

QueryResults
In the first set of tests, the system table MSysObjects provides a convenient method of getting the query list.
A second 'helper' table tblSysObjectTypes was used to denote the type of each query. This included two indexed fields used in the query joins

The read only MSysObjects system table is automatically updated whenever objects are added / deleted / edited / renamed.
However, it does NOT have a primary key field and has no indexed fields

These are the 3 sets of queries used. The sections shown in RED indicate the differences

a)   Query qryWhereOR

QueryOR
There are a many OR values that need to be checked in the Flags field.
The only Flags values being excluded are those for SELECT queries (0 and 8)

b)   Query qryWhereIN

QueryIN
Using IN significantly shortens the query SQL and I expected it to be slightly faster

c)   Query qryWhereOR_v2

QueryOR_v2
In this case, I used a more efficient method of checking the Flags values using a greater than (>) OPERATOR

NOTE: Operators such as > cannot be used with IN



3.   Results                                                                                                                                                 Return To Top

As the dataset is relatively small, I looped through each query 50 times and measured the total time taken in each case

For comparison, I created a standard table tblSysObjects with exactly the same fields and same data.
I then added a primary key field and indexed all fields used in joins and to filter the data.

I ran identical queries on the second table, expecting those to all run faster.
I repeated each query 10 times and, to my surprise, those based on the indexed table were ALWAYS slower (by about 4%)

TestResults
These are the average values for each test

AvgResults
Here are the average values displayed in a chart

Report
The results clearly showed that although the time differences were small, using IN was more efficient than OR.
However, taking advantage of the > OPERATOR with OR was faster still.

However, they also showed the the non-indexed queries based on the system table were consistently faster than the indexed equivalent based on a standard table



4.   Query Execution Plans                                                                                                                         Return To Top

I decided to examine the query execution plans (QEP) using the JET SHOWPLAN feature. See my article: Show Plan . . . Run Faster !

a) Comparing the original OR queries

~qryWhereOR (Not indexed)

ShowPlan1
~qryWhereOR (Indexed)

ShowPlan4



Three main points came out of this:
i)   The indexed version IS using the indexing effectively . . .but is still slower
ii)   Both versions use the indexing on the 'helper' table tblSysObjectsTypes
iii)   Although written as WHERE OR, Access executes these as IN to improve efficiency


b) Comparing the IN queries

~qryWhereIN (Not indexed)

ShowPlan2
~qryWhereIN (Indexed)

ShowPlan5



Two main points came out of this:
i)   The indexed version IS again using the indexing effectively . . .but is still slower
ii)   The execution plans are IDENTICAL to those for the WHERE OR versions . . . but in each case run slightly faster


c) Comparing the modified OR queries

~qryWhereOR_v2 (Not indexed)

ShowPlan3
~qryWhereOR_v2 (Indexed)

ShowPlan6



Two main points came out of this:
i)   Both versions run faster due to the simplified criteria used for filtering the Flags values
ii)   The indexed version is now able to use the query optimising Rushmore technology . . . yet it is still slower than the non-indexed equivalent



5.   Summary from the Query Execution Plans                                                                                         Return To Top

The query execution plans for OR & IN are identical.
Perhaps IN is faster as Access doesn't need to convert the query first? However, as Access uses the query execution plan that doesn't really make sense.

The simplified OR query is faster still as it requires much less processing.
The indexed version is able to use Rushmore technology which SHOULD help the query run faster. See my article: Optimising Queries

I was baffled by the effect of indexing causing equivalent queries to run slower in these tests.
I wondered whether system tables created automatically by Access are inherently more efficient than standard tables created by the user

To test this, I decided to make a second non-indexed copy of tblSysObjects for comparison
The average speed test results in this case were:

AvgResultsAltTest
As can be seen, indexing again caused each query to run slower! So that didn't help explain the results



6.   Further Tests (large dataset)                                                                                                               Return To Top

By now, totally perplexed, I decided to repeat all the same tests on a much larger database with well over 3000 temp and saved queries

The query results for the large database were:

QueryResultsLargeDB

The average speed test results for the large database were:

AvgResultsLargeDB

This time, although the relative order of the 3 queries is unchanged, the INDEXED query results are FASTER (as would normally be expected)

ReportLargeDB

Once again, I checked the query execution plans for this large database . . .

a)   Comparing the original OR queries (Large DB)

~qryWhereOR (Not indexed)

ShowPlan1_LargeDB
~qryWhereOR (Indexed)

ShowPlan4_LargeDB



The query execution plans are similar to before in many ways except:
i)   Here, the 'helper' table tblSysObjectsTypes is referenced differently as it is a linked table
ii)   This time, the indexed version this time does use Rushmore . . . so it runs faster


b)   Comparing the IN queries (Large DB)

~qryWhereIN (Not indexed)

ShowPlan2_LargeDB
~qryWhereIN (Indexed)

ShowPlan5_LargeDB



Two main points came out of this:
i)   The indexed version again uses Rushmore ... so once again it is faster
ii)   The QEPs are again IDENTICAL to the WHERE OR versions of this query, but in each case run slightly faster


c) Comparing the modified OR queries (Large DB)

~qryWhereOR_v2 (Not indexed)

ShowPlan3_LargeDB
~qryWhereOR_v2 (Indexed)

ShowPlan6_LargeDB



Two main points came out of this:
i)   Both versions again run faster due to the simplified criteria used for filtering the Flags values
ii)   The indexed version is again able to use the query optimising Rushmore technology . . . so it is the fastest of all versions of this query

So why is the order different in this case?

Clearly the use of Rushmore is the explanation but I am unclear why it wasn't used in the original indexed tests

Indeed, one of the the main reasons for writing this article was to invite feedback to try and explain what is going on . . .

All ideas welcomed!



7.   System Info                                                                                                                                           Return To Top

The test database also allows you to easily obtain some useful system information
These are the system info results for my test workstation

SystemInfo


8.   Downloads                                                                                                                                             Return To Top

Click to download the test database and query execution plans

      WHERE OR vs WHERE IN      Approx 2.9 MB (zipped)

      Query Execution Plans (both sets)      Approx 20 kB (zipped)

NOTE:
The larger database I used is the DEMO version of my School Data Analyser application
It is not included here as the total file size for the front end & back end databases is about 900 MB!



9.   Conclusions                                                                                                                                           Return To Top

Definitely not as clear as I'd hoped / expected when I started preparing this article!

The differences in times between the various tests are not that large and are unlikely to be noticeable in many cases

Using IN is faster than using OR . . . UNLESS the OR filter can be optimised using an OPERATOR
INDEXING doesn't always speed up searches



10.   Feedback                                                                                                                                             Return To Top

Please use the contact form below to let me know whether you found this article interesting/useful or if you have any questions/comments.

Do let me know if there are any errors or if you are able to further explain the results obtained

Please also consider making a donation towards the costs of maintaining this website. Thank you



Colin Riddington                     Mendip Data Systems                           Last Updated 29 Dec 2022



Return to Speed Test List Page 13 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Return to Top