Regex Or Not
First Published 9 Mar 2022 Last Updated 24 Dec 2022 Difficulty level : Moderate
This is the tenth in a series of articles discussing various tests done to compare the efficiency of different approaches to coding.
This article was written in response to a thread at Access World Forums: Range of ASCII values in criteria
Regular Expressions
Regular expressions (RegEx) are particularly suited to finding complex and repeating patterns in data.
However, regular expressions are NOT part of the standard VBA object library.
To start using RegEx with VBA, you need to select and activate the Microsoft VBScript Regular Expressions 5.5 object reference library from the references list.
To work with RegEx, you need to initiate a RegEx object.
The RegEx object is the mechanism through which you can utilize the RegEx properties and operations. To use the RegEx object, you must declare it through the following syntax:
Dim regexObject As RegExp
Set regexObject = New RegExp
This approach is called early binding and gives the benefit of Intellisense
The alternative is to use late binding in which case the above reference is NOT required
Dim regexObject As Object
Set regexObject = CreateObject(“VBScript.RegExp”)
The attached example uses late binding but this decision should have no effect on the speed of the tests
Tests & Results
In the Range of ASCII values in criteria thread at Access World Forums, the original poster was trying to review text and allow only space, apostrophe, A-Z, a-z, and "Latin-1 Supplement" (other European) letters (À-ÿ = extended ASCII values 192-255).
Two different solutions were suggested by AWF moderators Jack Drawbridge and Galaxiom using standard VBA functions and regular expressions respectively.
Each achieved the same result.
1. Checking the ASCII valueof each character in the text via Select Case (Jack Drawbridge)
2. Using Regex - RegExIllegalSearch procedure (Galaxiom)
The question then arose as to whether there was any significant difference in the execution times.
Galaxiom asked if I would compare the speed of each approach
As with many of the speed comparison tests I have posted, the expectation was that the differences would be minimal.
However, in this case there was a VERY marked difference in the times measured (see below).
For this example, I found a file of World Cities data (12959 records) at https://simplemaps.com/data/world-cities that contained various alphabetic, numeric and punctuation characters for testing and imported several (not all) fields from all records.
Two queries qryNoRegex and qryRegex were written to invoke the approaches mentioned above.
The module RegexRelatedTests includes two functions:
1. Testnames (function written by Jack Drawbridge) which invokes qryNoRegex
2. RegExIllegalSearch (function written by Galaxiom) which invokes qryRegex
In the same module are procedures RunTest which runs the 2 functions; and RepeatTest which executes RunTest 20 times in turn to check for variations in execution times.
Times were measured using clsTimer which uses the api GetTime Alias "timeGetTime".
Output was saved to the table tblResults. Each set of tests were performed 5 times giving 100 sets of results and average times were calculated.
Each set of results was very consistent with minimal variation between values
However, on the first set of tests the average time using Regex was 0.399 seconds whereas the standard VBA approach took an average of 0.067 seconds (approx 1/6 of the time)
The outcomes were so surprising that the tests were run on 3 workstations for comparison:
a) Desktop PC with 16 GB RAM – Access 365 32-bit
b) Laptop with 8 GB RAM – Access 365 64-bit
c) Windows tablet with 4 GB RAM – Access 365 64-bit
In each case, the approach using Regex took between 3 and 6 times longer than the standard VBA approach
Download
Click to download the example app used in the above tests: RegexNoRegex_v2.zip approx 3.4 MB (zipped)
UPDATE 24 Dec 2022
A recent thread at Access World Forums : Replacing Multiple Spaces from the Middle of a String provided an ideal opportunity to revisit this article
The author of that thread wrote:
Is there a simple way of replacing multiple spaces from the middle of string with a single space? This works
Ship To Name: Replace(Replace(Replace(Replace([SHIP TO Name]," "," ")," "," ")," "," ")," "," ")
Is there a cleaner solution?
Several forum members responded with suggestions which fell into two groups:
a) looping through repeatedly replacing double spaces with single spaces until complete
b) looking for patterns using regular expressions
There were 5 different code samples (3 loops & 2 Regex) suggested by different forum members.
I tested each sample, checking that each gave the correct output. I found that all code samples were very fast on such a simple task.
I tested three different strings varying in length from 26 to over 12,000 characters repeating each test 10,000 times and measuring the total time taken
I have used the fastest sample for each approach in the attached app but all 5 code samples are provided.
Loop CODE
Public Function ReplaceMultiSpace(strIN As String) As String
'Code supplied by @MajP
Dim old As String
Dim newout As String
old = Trim(strIN)
newout = old
Do
old = newout
newout = Replace(old, " ", " ")
Loop Until old = newout
ReplaceMultiSpace = newout
End Function
Regex CODE
Public Function OneSpaceOnly(ByVal var As Variant)
'Code supplied by @arnelgp
Static oReg As Object
Dim s As String
On Error GoTo create_object
If IsNull(var) Then Exit Function
s = var
With oReg
.Pattern = " {2,}"
.Global = True
s = .Replace(s, " ")
End With
OneSpaceOnly = s
Exit Function
create_object:
Set oReg = CreateObject("vbscript.regexp")
Resume Next
End Function
The average times for each set of tests were as follows:
For the short string of only 26 characters, there was little difference between the methods. However, for such a short string, it would be just as easy to edit it manually!
However, for the two longer strings, the speed benefits of using regular expressions for a task of this type became very significant.
The results are displayed as a chart below where the Loop results are shown in BLUE and the Regex results in ORANGE.
Many thanks to Access World Forum members arnelgp, ebs17 and MajP for providing code used in this second set of tests
Download
Click to download the example app used in the second set of tests: RegexOrLoop.zip approx 1.3 MB (zipped)
Conclusions
Regular expressions are an extremely powerful tool for developers to make use of.
They can be applied in a wide variety of situations and learning how to use them effectively cn be extremely beneficial
In the first set of tests, using RegEx was clearly disadvantageous. Perhaps its use was overkill or inappropriate for the test done in that case?
By contrast, the second task was ideal for using regular expressions which are designed to work with matching patterns
In certain situations, Regex may provide the best or only method of obtaining results.
Of course, (as with any of the tests in this series), speed isn't the only factor to consider. For example, the additional time needed to devise a Regex solution (for those who are inexperienced in its use) may outweigh the time it may save in terms of execution.
Further Reading about Regular Expressions
https://software-solutions-online.com/vba-regex-guide/
https://www.devhut.net/vba-using-regular-expressions-regex/
https://regex101.com/
https://regexlib.com/Default.aspx
Feedback
I would be grateful for any feedback on this article
I would also welcome any suggestions for other tests in order to further assess the comparative strength of regular expressions against other methods.
Please use the contact form below to let me know whether you found this article useful or if you have any questions/comments.
Please also consider making a donation towards the costs of maintaining this website. Thank you
Colin Riddington Mendip Data Systems Last Updated 24 Dec 2022
Return to Speed Test List
Page 10
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Return to Top
|
|