1-800-MAGIC: Testing disk backup

Now that I've written a simple disk backup problem, I need to ensure that it is correct. After all, we're dealing with the user data here - terrible things will happen if it will read (or write) a wrong cluster once in a while!

So, obviously, before I can use it, I need to persuade myself that it won't eat my data.

What makes a great test? The best test of all runs the software on all possible data sets in all possible environments, and verifies that it ran correctly. In our case, verification means that the backup can be restored, and the restored files match the files that were backed up perfectly.

Verification is actually is quite straightforward - we'll take a backup of one partition, restore it to another, and then compare the files.

Much harder is to generate all combinations of file systems and all environments (or partition sizes, in this case).

When there's a test that needs a lot of various combination, and I can't foresee the "interesting" data sets, I like the random number generator. I like it for two reasons. First, given enough time, it will come up with things I could not foresee. Second, most random number generators aren't really random - they generate a fixed sequence of numbers, given the starting number (a seed). For testing, this is actually an asset - this means that the test that is driven by a pseudorandom number generator is REPRODUCIBLE. Why is this good (actually, necessary)? Because when the test does not work, you want to repeat the conditions under which the software broke.

What about running in all possible environments? For us, the environment is the partition size. Here we will have to yield to reality - there is no way we can test a lot of partition configurations. We'll just have to go with the sequence I consider "interesting" - partition sizes of powers of two, powers of two +1, powers of two -1, and a random size in between.

Ok, then, the test. It was written in python and run from the same directory where the source for the program is.

"""Test for discio.

Usage:
  python.exe test.py disk_no letter1 letter2
Where:
  disk_no The number of the disk to use.
      The disk must contain no information.
      Everything on the disk will be wiped out!
  letter1 The letter for the first volume to be
      created on the test disk. Must not be
      an existing volume, or else it will be
      wiped out.
  letter2 The letter for the second volume to be
      created on the test disk. Must not be
      an existing volume, or else it will be
      wiped out.

  WARNING: If run incorrectly, this test can
      be enormously destructive! It cleans disks,
      formats volumes etc. If you call it with an
      incorrect disk number, that disk will be gone!
"""

import os
import random
import sys
import tempfile
import time

class Error(Exception):
  """Base error for the test."""
  pass


class GenericFailure(Error):
  """Test has failed. The string describes the failure."""
  pass


__DISKPART = 'c:\\windows\\system32\\diskpart.exe'
__FORMAT = 'c:\\windows\\system32\\format.com'
__DISKIO = 'Release\\discio.exe'
__BACKUPFILE = 'test.bkf'
__SVI = 'System Volume Information'

A function that compares directories and files. If given two files, it just keeps reading them in 2K chunks, and comparing the results. If given two directories, it lists them, sorts them, and then compares the sorted lists in the merge-sort fashion: if the names are the same, match the files (or directories). If not, say that the smaller (in lexicographical order) does not exists, and the advance the pointer to the smaller one.

def Compare(f1, f2):
  """Compares two filesystem objects.

  Prints out the differences.
  Returns:
    True if the files compare OK.
  """
  ret = True

  if (__SVI in f1 and
      __SVI in f2):
    return ret

  if os.path.isfile(f1) and os.path.isfile(f2):
    fd1 = open(f1, 'rb')
    fd2 = open(f2, 'rb')
    while True:
      s1 = fd1.read(2048)
      s2 = fd2.read(2048)
      if s1 != s2:
         ret = False
         break
      if not s1:
         break
    fd1.close()
    fd2.close()
    if not ret:
      print '%s and %s are different' % (f1, f2)
    return ret

  if os.path.isfile(f2):
    print '%s is a dir, %s is a file' % (f1, f2)
    return False

  if os.path.isfile(f1):
    print '%s is a dir, %s is a file' % (f2, f1)
    return False

  #print "%s <-> %s" % (f1, f2)

  files1 = os.listdir(f1)
  files2 = os.listdir(f2)

  files1.sort()
  files2.sort()

  i = 0
  j = 0

  while i < len(files1) and j < len(files2):
    if files1[i] == files2[j]:
      if not Compare(os.path.join(f1, files1[i]),
                     os.path.join(f2, files2[j])):
        ret = False
      i = i + 1
      j = j + 1
      continue

    ret = False

    if files1[i] < files2[j]:
      print 'Missing ' + os.path.join(f2, files1[i])
      i = i + 1
    else:
      print 'Missing ' + os.path.join(f1, files2[j])
      j = j + 1

  while i < len(files1):
    print 'Missing ' + os.path.join(f2, files1[i])
    i = i + 1
    ret = False

  while j < len(files2):
    print 'Missing ' + os.path.join(f1, files2[j])
    j = j + 1
    ret = False

  return ret

This one is the biggest bottleneck in the whole system: imagine generating 4G of data and writing it to files in a SCRIPT! Yet that's exactly what it does, computer time be damned!

def GenerateFiles(base, size):
  """Generates a bunch of random files and directories.

  Args:
    base The base directory where the files should be.
        Must end with '\\'.
    size The size in kilobytes of files that it should
        not exceed.
  """
  if not os.path.exists(base):
    os.mkdir(base)

  if size > 100:
    GenerateFiles(os.path.join(base, 'dir1'), size / 6)
    GenerateFiles(os.path.join(base, 'dir2'), size / 6)
    GenerateFiles(os.path.join(base, 'dir3'), size / 6)
    size = size / 2

  file_num = 0
  size = size * 1024

  while size > 0:
    if size < 2000:
      file_size = size
    else:
      file_size = random.randint(0, size)
    fd = open(os.path.join(base, ('file%d.txt' % file_num)),
              'w')

    str = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    str = str + str.lower() + '\n'

    try:
      for i in xrange(0, file_size/len(str)):
        fd.write(str)

      fd.flush()
    finally:
      fd.close()

    file_num = file_num + 1
    size = size - file_size

The reason I remove some files is to generate holes - unused clusters between the used ones. Otherwise I would be backing up a completely filled disk - almost as uninsteresting as testing backup of an empty disk!

def RemoveSomeFiles(base):
  """Removes every third file from a tree."""
  counter = 0
  for (basepath, dirs, files) in os.walk(base):
    if __SVI in basepath:
      continue
    for file in files:
      if counter % 3 == 0:
        full_file_name = os.path.join(basepath, file)
        os.remove(full_file_name)
      counter = counter + 1


def GenerateFileSystem(base, size):
  """Generates a bunch of random files and directories.

  Args:
    base The base directory where the files should be.
        Must end with '\\'.
    size The size in kilobytes of files that it should
        not exceed.
  """
  try:
    GenerateFiles(base, int(size * 9 / 10))
  except IOError:
    pass

  RemoveSomeFiles(base)

Finally, the test itself. We use diskpart to wipe and repartition the disk on every iteration - which means that we need the whole disk to ourselves to run. If you haven't figured this out on your own by now :-).

def RunTest(size, fs, disk, source, dest):
  """The test.

  Creates a two partitions, on the disk,
  formats one, fills it with a bunch of random
  files, backups it, and restores the results
  on the second partition.

  Then verifies that the files are identical
  on both.

  Args:
    size The size of an individual partition
    fs The file system to use, FAT, FAT32,
        or NTFS
    disk The number of the disk to use in testing
        Everything on this disk will be gone!
    source The letter to use for volume on which
        the test data will be generated
    dest The letter to use for volume to which
        the backup will be restored.

  Throws:
    GenericFailure if the test fails

  """
  print 'Running the test.'
  print '  Physical disk: %d ' % disk
  print '  Parition size: %dM' % size
  print '  File system:   %s' % fs

  (fi, script) = tempfile.mkstemp()
  os.close(fi)
  fd = open(script, 'w')
  fd.write('SELE DIS %d\n' % disk)
  fd.write('CLEAN\n')
  fd.write('CREAT PART PRI SIZE=%d\n' % size)
  fd.write('ASSIGN LETTER=%s\n' % source)
  fd.write('CREAT PART PRI SIZE=%d\n' % size)
  fd.write('ASSIGN LETTER=%s\n' % dest)
  fd.write('EXIT\n')
  fd.close()

  print 'Preparing the partitions'

  ret = os.spawnv(os.P_WAIT, __DISKPART,
                  [__DISKPART,
                   '/s',
                   script])

  os.remove(script)

  if ret != 0:
    raise GenericFailure(
        'Could not partition the disk!')

  ret = os.spawnv(os.P_WAIT, __FORMAT,
                  [__FORMAT,
                   source + ':',
                   '/fs:' + fs,
                   '/v:test',
                   '/y'])

  if ret != 0:
    raise GenericFailure(
        'Could not format the test partition')

  print 'Generating test files'

  GenerateFileSystem(source + ':\\', size * 1024)

  time.sleep(20)

  print 'Running backup'

  ret = os.spawnv(os.P_WAIT, __DISKIO,
                  [__DISKIO,
                   source + ':\\',
                   'readto',
                   __BACKUPFILE]);

  if ret != 0:
    raise GenericFailure('Backup failed!')

  print 'Running restore'

  ret = os.spawnv(os.P_WAIT, __DISKIO,
                  [__DISKIO,
                   dest + ':\\',
                   'writefrom',
                   __BACKUPFILE]);

  if ret != 0:
    raise GenericFailure('Restore failed!')

  if not Compare(source + ':\\', dest + ':\\'):
    raise GenericFailure(
        'Restore does not match backup!')

  print 'Test succeeded!'
  print

And finally, the main. First and foremost it is trying to keep you safe: prevent you from wiping out some useful data.

def main(args):
  """Main entry point.

  Args:
    args: Command line arguments. See usage.

  """

  random.seed(0)

  if len(args) < 3:
    print __doc__
    return
    
  try:
    disk = int(args[0])
  except ValueError:
    print 'Disk should be a number!'
    return

  if disk <= 0:
    print 'Disk cannot be a system disk (0).'
    return

  source = args[1].upper()
  dest = args[2].upper()
  if len(source) != 1 or len(dest) != 1:
    print 'Only use 1 letter to specify drives'
    return

  if (dest < 'K' or dest > 'O'
      or source < 'K' or source > 'O'):
    print 'Use letters K through O for test drives.'
    print 'This is to minimize potential of a'
    print 'collision with the real drives.'
    return

  if dest == source:
    print 'Source and destination should be different!'
    return

  if os.path.exists(source + ':\\'):
    print 'Cannot continue: %s:\\ exists!' % source
    return

  if os.path.exists(dest + ':\\'):
    print 'Cannot continue: %s:\\ exists!' % dest
    return

  print 'Running the test on physical disk %d' % disk
  print 'and using drive letters %s and %s' % (source,
                                               dest)

And here we run through a bunch of partition sizes (in Megabytes) I thought were interesting, and run backup test on 3 file systems: FAT16, FAT32, and NTFS.

  try:
    for size in [50, 100, 128, 200, 256, 300, 315,
                 1000, 1024, 2047, 2048, 2049, 4095,
                 4096, 4097]:
      RunTest(size, "FAT", disk, source, dest)
      RunTest(size, "FAT32", disk, source, dest)
      RunTest(size, "NTFS", disk, source, dest)
  except GenericFailure, f:
    print "Test failed: " + f.message

  
if __name__ == '__main__':
  main(sys.argv[1:])

And - voila! This ran overnight and succeeded!

1-800-MAGIC

Friday, February 29, 2008

Testing disk backup

No comments:

NIST Clock

About Me

Blog Archive

Links

Visitors