Archive+MD5 and other checksum ideas

Kevin Day thekevinday at
Sun Feb 17 13:39:16 PST 2008

This is sort of off topic and not directly related to HLFS, but
considering the subject I thought I'd shoot this out at you guys.

Transparently Transmitting Checksums With Archives
I have recently had my mind on MD5 sums for file downloads and such.
I put very little thought towards md5sums beyond normal use, but I
started to think that this could be made easier.
It seems to me that md5sums would be easier to transmit with the
actual file and not separately.
The idea is to make md5 checksums implicit.

For a given compression format, say gzip, first compress the
particular archive and then make an md5 checksum of the compressed
file.
Once both of those are set up, wrap the compressed archive and its
checksum in a second (tar) archive.
Any application might be able to untar the original file and then run
a checksum automatically and only continue if the checksum passes.
This method would require no installation of anything new, since tar,
gzip, and md5sum are already present on most systems.
Optionally, something could be installed to handle the extraction and
auto-check the checksum on extract.

Here is a bash script I use to create a TMG file (TMG = Tar MD5 Gzip):

for i in "$@" ; do
  # Strip a trailing slash so the name can be used as a file prefix.
  I=$(echo "$i" | sed -e "s|/$||")
  echo "Attempting to archive, compress, and checksum: $I"
  tar --numeric-owner -pcf - "$I" | gzip --best > "$I.tgz" &&
  echo "Archived & Compressed: $I" &&
  md5sum "$I.tgz" > "$I.md5" &&
  echo "Created MD5 Checksum: $I" &&
  tar --numeric-owner -pcf "$I.tmg" "$I.tgz" "$I.md5" &&
  rm -f "$I.tgz" "$I.md5" &&
  echo "Created TMG File: $I" ||
  echo "An error occurred"
done
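
The extraction side (untar, check, and only continue if the checksum
passes) could look roughly like the sketch below. It is not part of
the script above: tmg_extract is a hypothetical helper name, and it
assumes GNU tar and md5sum, and that the TMG file was created from a
plain name in the current directory so that the inner files sit at
the top level of the outer archive.

```shell
# Sketch: unpack a .tmg, verify the inner .tgz, then extract it.
# tmg_extract is a hypothetical name; usage: tmg_extract FILE.tmg [DEST]
tmg_extract() (
  tmg=$1
  dest=${2:-.}
  work=$(mktemp -d) || exit 1
  status=1
  # The outer tar holds NAME.tgz and NAME.md5. md5sum -c looks up the
  # file name recorded in the .md5, so it runs inside the work dir.
  if tar -xf "$tmg" -C "$work" &&
     (cd "$work" && md5sum -c --quiet ./*.md5); then
    mkdir -p "$dest" && tar -xzpf "$work"/*.tgz -C "$dest"
    status=$?
  else
    echo "Unpack or checksum failed, not extracting: $tmg" >&2
  fi
  rm -rf "$work"
  exit "$status"
)
```

Something like "tmg_extract foo.tmg /tmp/out" would then leave the
original tree under /tmp/out only when the checksum matches.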


System Integrity Scans
With checksums, one could also perform regular checkups on the state
of system libraries and programs (binaries in general).
Given that binaries do not normally change, except when they are
updated, one could create a root-only directory (say /checksum with
drwx------) and have the init program run regular checkups on the
state of the binaries.
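
As a rough sketch of that idea: the two helpers below record
checksums for a directory and later verify them. The names
make_manifest and verify_manifest are hypothetical, and keeping one
manifest file per scanned directory is an assumption on my part (one
checksum file per binary would work just as well). GNU md5sum is
assumed.

```shell
# Sketch: store checksums for DIR in a root-only STORE directory.
# Usage: make_manifest DIR STORE   (prints the manifest path)
make_manifest() (
  dir=$1
  store=$2
  mkdir -p "$store" && chmod 700 "$store" || exit 1
  # Name the manifest after the scanned path, e.g. /sbin -> _sbin.md5
  manifest=$store/$(printf '%s' "$dir" | tr '/' '_').md5
  find "$dir" -type f -exec md5sum {} + > "$manifest" &&
  printf '%s\n' "$manifest"
)

# Sketch: re-check every file listed in MANIFEST.
# Prints only failures; exit status is non-zero if any file differs.
verify_manifest() {
  md5sum -c --quiet "$1"
}
```

Run as root, "make_manifest /sbin /checksum" followed later by
"verify_manifest /checksum/_sbin.md5" would be the regular checkup.
Use absolute paths so the recorded names stay valid from any working
directory.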

For security, this integrity check would allow the system to identify
a potentially infected or damaged binary.
In the rare case of a Linux virus, a checksum failure would flag the
potentially infected or insecure binary.
The system could then move the infected file somewhere safe, make the
infected file un-executable, and attempt to replace the infected (or
damaged) file.
On a package managed system, the application performing the checksum
could attempt to download the correct binary and replace the old one.
On a source system with a set of how-to-compile rules, a recompilation
of the infected or damaged file could be auto-performed.
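
The "move it somewhere safe and make it un-executable" step could be
sketched as below. The quarantine function and its directory argument
are hypothetical names for illustration. Replacing the file
afterwards is left to whatever package manager or build rules the
system has.

```shell
# Sketch: move a suspect file into a root-only quarantine directory
# and strip its execute bits. Usage: quarantine FILE QDIR
# Prints the new location of the file on success.
quarantine() (
  file=$1
  qdir=$2
  mkdir -p "$qdir" && chmod 700 "$qdir" || exit 1
  # Append a timestamp so repeated quarantines do not overwrite.
  target=$qdir/$(basename "$file").$(date +%s)
  mv -- "$file" "$target" &&
  chmod a-x "$target" &&
  printf '%s\n' "$target"
)
```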

I imagine this would also be useful to set off alarms on the system to
tell the user to check the hardware and/or filesystem(s) for problems.

For an embedded system, there should be little overhead in performing
these checksums.
A less aggressive approach would be to check timestamps and only
perform checks against those files that changed, but this would not
help as much against hardware or filesystem failures, which can
corrupt a file without touching its timestamp.
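
A sketch of the timestamp variant, assuming the stored checksums live
in a single file in md5sum output format (called MANIFEST here) that
was written using the same DIR path. The function name
recheck_changed is hypothetical.

```shell
# Sketch: re-hash only files modified after MANIFEST was written.
# Usage: recheck_changed DIR MANIFEST
recheck_changed() (
  dir=$1
  manifest=$2
  list=$(mktemp) || exit 1
  # -newer compares each file's mtime against the manifest's mtime.
  find "$dir" -type f -newer "$manifest" > "$list"
  rc=0
  while IFS= read -r f; do
    line=$(md5sum "$f")
    # A changed file passes only if its fresh "hash  path" line is
    # already recorded verbatim in the manifest.
    grep -qxF -- "$line" "$manifest" || { echo "MISMATCH: $f"; rc=1; }
  done < "$list"
  rm -f "$list"
  exit "$rc"
)
```

Files that were only touched still pass, while new or modified files
are reported.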

In a quick test against my /sbin/ which contains 28M of programs:
Checksum Space Usage: 1.1M
Time it took to perform checksum against /sbin/:
  real    0m1.401s
  user    0m0.210s
  sys     0m0.187s

For my unusual /bin/ directory of 168M:
Checksum Space Usage: 4.5M
Time it took to perform checksum against /bin/:
  real    0m4.943s
  user    0m1.133s
  sys     0m1.010s

So on a standard desktop computer, these checksums will take a
negligible amount of time and resources.
Performing the checks once a day at midnight could prove useful.

Kevin Day

More information about the hlfs-dev mailing list