WHERE OR vs WHERE IN
First Published 29 Dec 2022 Difficulty level : Moderate
Section Links:
Introduction
Tests
Results
Query Execution Plans
Summary of Query Execution Plans
Further Tests (large dataset)
System Info
Downloads
Conclusions
Feedback
O Access, Access, Where Or Art Thou In, Access?
(With apologies to William Shakespeare)
1. Introduction Return To Top
This is the thirteenth in a series of articles discussing various tests done to compare the efficiency of different approaches to coding.
This article follows on from the last speed test
Finding Unmatched Records in which, amongst other outcomes, it was shown that a NOT IN subquery could not use indexes.
There are two common methods of filtering data using multiple criteria: OR and IN.
I wondered whether either of these would be more efficient in terms of query performance and carried out the following tests:
a) Aggregate query using a WHERE clause including OR
b) Aggregate query using a WHERE clause including IN
c) Modified aggregate query using a WHERE clause including OR
There are two similar versions of this set of speed tests.
The main difference is the search fields are NOT INDEXED in one and INDEXED in the other.
In a real world application, the standard advice is that fields being searched regularly or used in joins should ALWAYS be INDEXED.
The indexes will increase file size but normally reduce search times.
Access searches the index rather than each individual record so results can be retrieved very quickly in a query or SQL statement.
In this set of tests, although the time differences were relatively small, there were some interesting and unexpected outcomes.
TLDR:
a) Using IN is more efficient than using OR . . . except when it isn't!
b) Using INDEXING is more efficient than NOT INDEXING . . . except when it isn't!
Clear as mud? There's a (hopefully) better explanation later on . . .
2. Tests Return To Top
In this example, each test is designed to count the number of each type of temp and saved query in the test database (excluding SELECT queries) that were updated after a specified date.
The query results in each case were:
In the first set of tests, the system table MSysObjects provides a convenient method of getting the query list.
A second 'helper' table tblSysObjectTypes was used to denote the type of each query. This included two indexed fields used in the query joins
The read only MSysObjects system table is automatically updated whenever objects are added / deleted / edited / renamed.
However, it does NOT have a primary key field and has no indexed fields
These are the 3 sets of queries used. The sections shown in RED indicate the differences
a) Query qryWhereOR
There are a many OR values that need to be checked in the Flags field.
The only Flags values being excluded are those for SELECT queries (0 and 8)
b) Query qryWhereIN
Using IN significantly shortens the query SQL and I expected it to be slightly faster
c) Query qryWhereOR_v2
In this case, I used a more efficient method of checking the Flags values using a greater than (>) OPERATOR
NOTE: Operators such as > cannot be used with IN
3. Results Return To Top
As the dataset is relatively small, I looped through each query 50 times and measured the total time taken in each case
For comparison, I created a standard table tblSysObjects with exactly the same fields and same data.
I then added a primary key field and indexed all fields used in joins and to filter the data.
I ran identical queries on the second table, expecting those to all run faster.
I repeated each query 10 times and, to my surprise, those based on the indexed table were ALWAYS slower (by about 4%)
These are the average values for each test
Here are the average values displayed in a chart
The results clearly showed that although the time differences were small, using IN was more efficient than OR.
However, taking advantage of the > OPERATOR with OR was faster still.
However, they also showed the the non-indexed queries based on the system table were consistently faster than the indexed equivalent based on a standard table
4. Query Execution Plans Return To Top
I decided to examine the query execution plans (QEP) using the JET SHOWPLAN feature. See my article: Show Plan . . . Run Faster !
a) Comparing the original OR queries
~qryWhereOR (Not indexed)
|
~qryWhereOR (Indexed)
|
Three main points came out of this:
i) The indexed version IS using the indexing effectively . . .but is still slower
ii) Both versions use the indexing on the 'helper' table tblSysObjectsTypes
iii) Although written as WHERE OR, Access executes these as IN to improve efficiency
b) Comparing the IN queries
~qryWhereIN (Not indexed)
|
~qryWhereIN (Indexed)
|
Two main points came out of this:
i) The indexed version IS again using the indexing effectively . . .but is still slower
ii) The execution plans are IDENTICAL to those for the WHERE OR versions . . . but in each case run slightly faster
c) Comparing the modified OR queries
~qryWhereOR_v2 (Not indexed)
|
~qryWhereOR_v2 (Indexed)
|
Two main points came out of this:
i) Both versions run faster due to the simplified criteria used for filtering the Flags values
ii) The indexed version is now able to use the query optimising Rushmore technology . . . yet it is still slower than the non-indexed equivalent
5. Summary from the Query Execution Plans Return To Top
The query execution plans for OR & IN are identical.
Perhaps IN is faster as Access doesn't need to convert the query first? However, as Access uses the query execution plan that doesn't really make sense.
The simplified OR query is faster still as it requires much less processing.
The indexed version is able to use Rushmore technology which SHOULD help the query run faster. See my article: Optimising Queries
I was baffled by the effect of indexing causing equivalent queries to run slower in these tests.
I wondered whether system tables created automatically by Access are inherently more efficient than standard tables created by the user
To test this, I decided to make a second non-indexed copy of tblSysObjects for comparison
The average speed test results in this case were:
As can be seen, indexing again caused each query to run slower! So that didn't help explain the results
6. Further Tests (large dataset) Return To Top
By now, totally perplexed, I decided to repeat all the same tests on a much larger database with well over 3000 temp and saved queries
The query results for the large database were:
The average speed test results for the large database were:
This time, although the relative order of the 3 queries is unchanged, the INDEXED query results are FASTER (as would normally be expected)
Once again, I checked the query execution plans for this large database . . .
a) Comparing the original OR queries (Large DB)
~qryWhereOR (Not indexed)
|
~qryWhereOR (Indexed)
|
The query execution plans are similar to before in many ways except:
i) Here, the 'helper' table tblSysObjectsTypes is referenced differently as it is a linked table
ii) This time, the indexed version this time does use Rushmore . . . so it runs faster
b) Comparing the IN queries (Large DB)
~qryWhereIN (Not indexed)
|
~qryWhereIN (Indexed)
|
Two main points came out of this:
i) The indexed version again uses Rushmore ... so once again it is faster
ii) The QEPs are again IDENTICAL to the WHERE OR versions of this query, but in each case run slightly faster
c) Comparing the modified OR queries (Large DB)
~qryWhereOR_v2 (Not indexed)
|
~qryWhereOR_v2 (Indexed)
|
Two main points came out of this:
i) Both versions again run faster due to the simplified criteria used for filtering the Flags values
ii) The indexed version is again able to use the query optimising Rushmore technology . . . so it is the fastest of all versions of this query
So why is the order different in this case?
Clearly the use of Rushmore is the explanation but I am unclear why it wasn't used in the original indexed tests
Indeed, one of the the main reasons for writing this article was to invite feedback to try and explain what is going on . . .
All ideas welcomed!
7. System Info Return To Top
The test database also allows you to easily obtain some useful system information
These are the system info results for my test workstation
8. Downloads Return To Top
Click to download the test database and query execution plans
WHERE OR vs WHERE IN Approx 2.9 MB (zipped)
Query Execution Plans (both sets) Approx 20 kB (zipped)
NOTE:
The larger database I used is the DEMO version of my
School Data Analyser application
It is not included here as the total file size for the front end & back end databases is about 900 MB!
9. Conclusions Return To Top
Definitely not as clear as I'd hoped / expected when I started preparing this article!
The differences in times between the various tests are not that large and are unlikely to be noticeable in many cases
Using IN is faster than using OR . . . UNLESS the OR filter can be optimised using an OPERATOR
INDEXING doesn't always speed up searches
10. Feedback Return To Top
Please use the contact form below to let me know whether you found this article interesting/useful or if you have any questions/comments.
Do let me know if there are any errors or if you are able to further explain the results obtained
Please also consider making a donation towards the costs of maintaining this website. Thank you
Colin Riddington Mendip Data Systems Last Updated 29 Dec 2022
Return to Speed Test List
Page 13
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Return to Top
|
|