Viewpoints
An accurate, technical explanation
of the PFD data loss
By Norm Snyder
March 22, 2007
Thursday AM
You may have read a story about a Permanent Fund Dividend (PFD)
data loss, following is an accurate, technical explanation of
the occurrence.
The PFD Division uses a pair of clustered Dell servers running
MS SQL 2005 connected to Dell EMC storage arrays with approximately
3 Terabytes (Tb) of usable space. The PFD database consists of
approximately 1.5 Tb of data and images with an additional 300-500Mb
of data and images added each year. The PFD Division uses a pair
of clustered Dell servers running MS SQL 2005 connected to Dell
EMC storage arrays with approximately 3 Terabytes (Tb) of usable
space. The PFD database consists of approximately 1.5 Tb of data
and images with an additional 300-500Mb of data and images added
each year.
The majority of the PFD database consists of paper documents
that have been scanned and added to the database as digital images.
Since the space requirement for images is quite large, the image
portion of the database consists of a partitioned table, which
is divided into file groups, each year being a separate partition
in it's own file group. The active year is a read/write file
group and once closed out, it is marked as read only. Then the
entire database is backed up using filegroup (partial) backups,
followed by a transaction log backup. These database backup files
are then backed up to tape and safely stored at several locations.
The partition table is a new feature of MS SQL 2005, making it
much easier to backup large databases. Since the data in read-only
file groups does not change, it does not have to be backed up
as frequently as the information kept in the active read-write
file groups. This flexibility also adds another layer of complexity
to file management. As new active read/write file groups are
added to a database, it is imperative that they are also backed
up at the same time as the other active file groups, along with
transactions logs, otherwise you will be unable to completely
restore and bring online the database.
In June of 2006 hard errors were reported by one of the disk
drive storage array processors. Our Network Specialist established
a service call with the hardware vendor, and was advised to run
a Background Verify as the last option to repair inconsistencies
in the Raid stripe set. This failed to correct the problem. The
remaining option left was to unbind then bind the two LUN's that
were corrupted. First the Specialist moved the database filegroups
stored on the corrupted LUN's to another LUN in preparation for
the unbind and bind (unbinding and rebinding destroys all data
on the LUN). Unfortunately, it is difficult to correlate which
LUN number corresponds to which drive letter designation on the
server, and some of the files were mistakenly moved to the LUN's
which were to be rebound. The Spec was working on a remote desktop
session with a Dell storage specialist at the time of the unbind
and rebind, and the Dell specialist also missed this error. After
the rebind it was discovered that one of the LUN's rebound had
contained data files and also the SQL backup files.
The Network Specialist then attempted to restore the database
backups from tape. Unfortunately one critical file, the primary
filegroup (MDF) for the PFD Database had inadvertently not been
selected to be written to tape during the normal backup process.
Without a current backup of the primary .MDF file, the database
could not be restored and brought online.
Our IT Staff worked over the weekend restoring all of the Read
Only historical file groups for the years 2000 thru 2005, using
an older backup of the Primary (MDF) file, but was unable to
successfully restore the 2006 filegroup. Even though there was
a current backup for 2006 and transaction log backups, a matching
backup of the Primary file was also needed.
While little actual data was lost, what was lost was 800,000
paper documents that had been received during 2006 and scanned
into the system. It must be made perfectly clear that the loss
of these images was not the fault of the backup software, the
tape library, storage array, or any other components. It was
strictly human error and the consequences of not placing a check
box next to the .MDF file, instructing the tape backup software
to place the file onto tape.
The PFD division retains all paper documents received for at
least one full year. To recover, the database was placed online
with an active (2006) file group containing no data. Four seasonal
employees returned to work in the summer of 2006 and for a period
of 2 1/2 months, the paper documents were rescanned and the active
file group was repopulated with the newly scanned images. The
images that could be OCR'd were automatically linked with their
corresponding data elements. Those that could not be OCR'd were
manually linked. Once again, no data or images were permanently
lost in the process.
Our IT Staff and PFD application programmers are now much more
familiar with what is required to successfully restore this database
from filegroup backups, some of which is new in MS SQL 2005.
We have since added additional disk storage to the disk drive
array and now have a total of 7 Tb usable space making it much
easier to restore databases and perform regular tests. We now
have a formal written backup plan where active file groups and
logs are backed up daily and full backups which include all file
groups are performed quarterly. A team of IT and PFD Programming
staff now review and certify backup logs daily and review all
database properties, backup procedures, and scripts monthly.
Backup cycle rotations are reviewed and each scheduled backup
is now certified by three people as being complete. IT and PFD
Programming staff also perform end to end backup and restore
drills quarterly, verifying that the entire database can be restored
from tapes used in the Quarterly full backups. Future plans include
the development of an offsite file replication system to be located
in our Anchorage facility.
Norm Snyder
IT Manager
Alaska Department of Revenue
Note: Comments published on Viewpoints are the opinions of
the writer
and do not necessarily reflect the opinions of Sitnews.
Publish A Letter on SitNews Read Letters/Opinions
Contact the Editor
SitNews
©2007
Stories In The News
Ketchikan, Alaska
|