SELECT DISTINCT vs GROUP BY
First Posted 28 Feb 2022 Last Updated 16 Mar 2022 Difficulty level : Moderate
This is the ninth in a series of articles discussing various tests done to compare the efficiency of different approaches to coding.
Example databases are provided in each case so the same tests can be done on your own workstations
This article follows a request by Access World Forums moderator CJ_London in this thread: Mythbusters: Having vs Where
He asked me to test the following:
We often see OP's using aggregate queries without any aggregation rather than SELECT DISTINCT - do you have an example to compare a SELECT DISTINCT v GROUP BY with perhaps a criteria or two?
I must confess I'd never thought about comparing these before and asked the following question of other forum members:
What are your predictions before I make the results public?
Before I started, I had a gut feeling that SELECT DISTINCT would be slightly faster than GROUP BY though I couldn't have given any meaningful reason for that viewpoint.
Most of the forum members who responded tended to agree . . . BUT in fact, the opposite was true . . .
I did various tests on 2 different datasets and on each test (without exception), GROUP BY was slightly faster.
The differences were relatively small but they were consistent in every test
First of all, I tested my Postcodes dataset (2.6 million records)
In my first set of tests, I ran the following queries:
SELECT DISTINCT Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector
SELECT Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector
GROUP BY Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector;
Each gave exactly the same output of 14415 records.
All fields in the queries were indexed
I looped through each query 3 times and measured the total time in each case.
I repeated each test & calculated the averages:
I then filtered the data when running each query to see if that would change the results. Same outcome...
I then repeated the tests, but this time appended the query output into a temp table each time to ensure that the query had been loaded fully during the tests.
Same outcome ....
I ran JET ShowPlan to check the execution plan on each of the queries. The results were almost identical:
Next I tried using a different dataset supplied by CJ_London.
It was a smaller dataset of approx 150K records so I looped through each query 20 times.
This time, there were 2 non-indexed fields in the queries
Once again the two queries gave identical outputs = 8324 records
Did that affect the outcomes? NO
Next I ran Jet ShowPlan on each query. The execution plans were identical:
The 'by scanning' reference indicates that Access had to scan the entire dataset in each case due to the inclusion of non-indexed fields
I can't upload my Postcodes dataset as the ACCDB file is about 1.6 GB
However, I have uploaded the example app with CJ_London's dataset (with his agreement)
For more info on Jet ShowPlan, see my article: ShowPlan . . . Run Faster
Also see this excellent article by Mike Wolfe: JetShowPlan: A Primer
UPDATE 16 Mar 2022
Mike Wolfe has just published a related article today which follows up on the likely reasons for the slight differences in times between GROUP BY and SELECT DISTINCT.
Mike used a very detailed analysis tool called Process Monitor to look into the temporary table created when using SELECT DISTINCT but not with GROUP BY
In addition, Scott Prince (AKA Frothingslosh) has just posted some equivalent test results where the data is stored in SQL Server. See this post at Access World Forums.
In Scott's tests, the results are the same as SELECT DISTINCT is also treated as an aggregate query in SQL Server so the query execution plans are identical.
Click to download: Select Distinct vs Group By With Filter Approx 16 MB (zipped)
Please use the contact form below to let me know whether you found this article useful or if you have any questions/comments.
Do let me know if there are any errors
Please also consider making a donation towards the costs of maintaining this website. Thank you
Colin Riddington Mendip Data Systems Last Updated 27 Feb 2019
Return to Speed Test List Page 9 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Return to Top