Click any image to view a larger version



SELECT DISTINCT vs GROUP BY


First Posted 28 Feb 2022           Last Updated 16 Mar 2022                                      Difficulty level :   Moderate


This is the ninth in a series of articles discussing various tests done to compare the efficiency of different approaches to coding.

Example databases are provided in each case so the same tests can be done on your own workstations

This article follows a request by Access World Forums moderator CJ_London  in this thread: Mythbusters: Having vs Where
He asked me to test the following:

We often see OP's using aggregate queries without any aggregation rather than SELECT DISTINCT - do you have an example to compare a SELECT DISTINCT v GROUP BY with perhaps a criteria or two?

I must confess I'd never thought about comparing these before and asked the following question of other forum members:

What are your predictions before I make the results public?

Before I started, I had a gut feeling that SELECT DISTINCT would be slightly faster than GROUP BY though I couldn't have given any meaningful reason for that viewpoint.
Most of the forum members who responded tended to agree . . . BUT in fact, the opposite was true . . .

I did various tests on 2 different datasets and on each test (without exception), GROUP BY was slightly faster.
The differences were relatively small but they were consistent in every test

First of all, I tested my Postcodes dataset (2.6 million records)
In my first set of tests, I ran the following queries:

qryDistinct

SELECT DISTINCT Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector
FROM Postcodes;


qryGroupBy

SELECT Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector
FROM Postcodes
GROUP BY Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector;


Each gave exactly the same output of 12415 records.

All fields in the queries were indexed
I looped through each query 3 times and measured the total time in each case.
I repeated each test & calculated the averages:

IMG1
I then filtered the data when running each query to see if that would change the results. Same outcome...

IMG2
I then repeated the tests, but this time appended the query output into a temp table each time to ensure that the query had been loaded fully during the tests.
Same outcome ....

IMG3
I ran JET ShowPlan to check the execution plan on each of the queries. The results were almost identical:

IMG4
IMG5
Next I tried using a different dataset supplied by CJ_London.
It was a smaller dataset of approx 150K records so I looped through each query 20 times.
This time, there were 2 non-indexed fields in the queries
Once again the two queries gave identical outputs = 8324 records

Did that affect the outcomes? NO

IMG6
Next I ran Jet ShowPlan on each query. The execution plans were identical:

IMG7
IMG8
NOTE:
The 'by scanning' reference indicates that Access had to scan the entire dataset in each case due to the inclusion of non-indexed fields

I can't upload my Postcodes dataset as the ACCDB file is about 1.6 GB
However, I have uploaded the example app with CJ_London's dataset (with his agreement)



Further Reading:

For more info on Jet ShowPlan, see my article:     ShowPlan . . . Run Faster

Also see this excellent article by Mike Wolfe:    JetShowPlan: A Primer

UPDATE 16 Mar 2022
Mike Wolfe has just published a related article today which follows up on the likely reasons for the slight differences in times between GROUP BY and SELECT DISTINCT.
Mike used a very detailed analysis tool called Process Monitor to look into the temporary table created when using SELECT DISTINCT but not with GROUP BY

In addition, Scott Prince (AKA Frothingslosh) has just posted some equivalent test results where the data is stored in SQL Server. See this post at Access World Forums.
In Scott's tests, the results are the same as SELECT DISTINCT is also treated as an aggregate query in SQL Server so the query execution plans are identical.



Download:

Click to download:       Select Distinct vs Group By With Filter     Approx 16 MB (zipped)



Colin Riddington           Mendip Data Systems                 Last Updated 16 Mar 2022



Return to Access Articles Page Return to Top Page 9 of 11 1 2 3 4 5 6 7 8 9 10 11