First Published 14 Jan 2023
This article discusses how best to deal with old, stale data
To start with, this is the answer that I got by asking ChatGPT
Hard to argue with any of that advice.
Or, taking the lyrics of the classic song by the Clash from 1982 totally out of context:
Should I stay or should I go now?
If I go there will be trouble
An' if I stay it will be double
So come on and let me know . . .
Should I stay or should I go
At some point, all database administrators have to decide what to do with old and rarely needed data.
The main choices are:
a) Delete it (after making a backup first)
b) Archive it to separate tables in the same db
c) Archive it to a separate database
d) Mark it as old data (but keep it in the same tables)
Personal Experience . . . or The Benefits of Hindsight
About 20 years ago, I took over responsibility for a large and well-established schools database.
At the time, it was a large split database with an Access backend
Up to 100 staff used it several times each day for managing assessments, reporting and pastoral issues.
It was a largely well structured and mature database which had been in widespread use in multiple schools over a period of several years
However inevitably, performance was poor at times of significant staff use.
The decision was made to move the BE to SQL Server at the first available opportunity to improve stability, security and performance.
However, before doing that, there were other, even more important priorities.
Student data was updated each night from the main school management system (SIMS) with new students added and leavers removed.
The update process meant that all student leaver data was automatically deleted from the database.
A backup was automatically saved every night before the data update.
However, these were only kept for a limited amount of time due to space constraints.
Not only was this probably illegal, it also made management of the old data when needed a major problem
For example, when students move schools or apply for college places or employment, it is standard for a reference to be written.
To do so, it was necessary to load a backup database from just before the student’s leaving date.
At times, references were required many years after a student had left and, often, records were no longer available.
Similarly, if a student leaver returned to the school at a later date, all records from the initial period at the school were unavailable
So, the first (and long overdue) decision made was to retain all data when students left the school.
As the database was already large and well established, the school asked that this be done with as little disruption as possible
It was therefore decided NOT to alter all the existing queries, forms & reports
Instead the decision was made to archive the data:
a) Move assessment & reporting data at the end of each school year into separate archived tables in the same BE
b) Move leavers data into separate leavers tables.
c) This also meant that the leavers data was moved into separate leavers archived tables at the end of each year
This worked perfectly as far as the client school was concerned.
However, for data management purposes, this was a big mistake which I regretted almost from the start
As a result of this decision, instead of one set of tables, we now had four sets of tables to maintain
In order to use all these new tables, we also had to create new queries, forms & reports.
Many of these were largely duplicates of the original objects but with a different record source
Out of a total of 330 tables, there were 18 Archived tables, 62 Leavers tables and 8 LeaversArchived tables with similar knock-on effects for queries, forms & reports
In addition, when the structure of any of the main student tables were altered with e.g. an additional field, all the related archived tables/queries etc, needed to have the same sets of changes made.
Much of this work would have been avoidable by using a different approach.
Mark each student leaver record with a suitable 'flag' but don’t remove the record
The simplest way to do this is to add a boolean field called e.g. Active preferably with a LeavingDate field.
Or better still just use a LeavingDate field where, for students currently on roll, the LeavingDate field is null
Doing that does still require existing queries to be modified by filtering for LeavingDate Is Null (or Not Null for student leavers)
However, now only one set of tables needs to be managed.
Similar methods work for other types of data based on e.g. DeletedDate and DeletedBy fields (or similar). Using this approach significantly reduces the work needed to maintain the database whilst ensuring both accountability and easy access to the old data when it is needed
NOTE: It is usually best not to rely on a LastModifiedDate field / property as many processes can update these automatically
Is there ever a reason to archive data?
In certain cases, it may still be the right decision to move data
For example, where such data is only VERY rarely needed OR when storage space is an issue
Whilst an Access backend database can not exceed 2 GB (and ideally should be kept below about 1 GB), moving the BE to a different RDBMS such as SQL Server provides much greater capacity. Even the free Express version of SQL Server allows for databases of up to 10 GB which is likely to be sufficient for many years for all small to medium sized organisations.
If it is considered necessary to archive old data, I would usually recommend moving it to a separate database.
However, doing this each year is generally unnecessary and often counter-productive
Of course, whichever approach is used, all data should be backed up regularly with backups safely stored in more than one location.
At some point, disaster WILL occur. So you need to plan in advance to deal with such events and with the minimum possible disruption
Back once more to the Clash . . .
This indecision's buggin' me
If you don't want me, set me free
Hmmm. Maybe not the best advice in this case!
Archive Access data - Microsoft Support
Dealing with stale data on File Servers - Microsoft Community Hub
Data Archiving Strategies for SQL Server (microsoft.com)
Please use the contact form below to let me know whether you found this article useful or if you have any questions.
Please also consider making a donation towards the costs of maintaining this website. Thank you
Colin Riddington Mendip Data Systems Last Updated 14 Jan 2023
Return to Access Blog Page
Return to Top