Archive for the ‘md5’ Category

Checksum-based Backup Methodology – Thoughts

Friday, December 10th, 2010

Looks like I’ve evolved a hybrid system:

  1. Proper backup / synch tool: SuperFlexibleFileSynchronizer (abbreviated here to “SuperFlex”).
  2. Procedural, using lower-level tools: drag-and-drop file copying combined with MD5Summer.

The idea is that I would normally use SuperFlex, but where I already have manually created (supposed) mirrors, I can retrospectively check their consistency at the content level, not just by date/time and size.  I have yet to experiment with SuperFlex to see whether it can verify existing copies of files (as opposed to copies it is making).
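For illustration, a retrospective content-level check of an existing mirror could look something like the minimal Python 3 sketch below. It is not how SuperFlex or MD5Summer work internally. It simply walks both folder trees, computes each file's MD5 and matches files by relative path. The function names `md5_of_file`, `tree_digests` and `compare_mirror` are hypothetical.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each file's path relative to root ('/'-separated) to its MD5."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            digests[rel] = md5_of_file(full)
    return digests


def compare_mirror(source, mirror):
    """Hypothetical helper: compare two folder trees by content only.

    Files are matched by relative path and compared via their MD5 digests;
    date/time stamps are never consulted.
    """
    src = tree_digests(source)
    dst = tree_digests(mirror)
    common = src.keys() & dst.keys()
    return {
        "missing": sorted(src.keys() - dst.keys()),  # in source, not in mirror
        "extra": sorted(dst.keys() - src.keys()),    # in mirror, not in source
        "different": sorted(p for p in common if src[p] != dst[p]),
    }
```

For example, `compare_mirror(r"D:\Footage", r"E:\Footage")` (hypothetical paths) returns three lists; if all three are empty, the mirror matches the source file-for-file by content.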

md5 Check-Sum Apps for Windows

Thursday, December 9th, 2010

md5 Checksumming in Windows:

  • Windows has no built-in checksumming tool that I know of, whereas such tools (e.g. md5sum) are standard on Linux etc.
  • The best third-party md5 checksum tool I have found for Windows is MD5Summer (version 1.2.0.11) [http://www.md5summer.org/download.html].
    • When run, it first checks file associations and offers to make itself the default application for [.md5] files.  In Windows 7 (W7) this may be denied, as administrator rights are required.
    • You first select a root folder, then files within it (or all files, recursively).  Highlighting files is not enough to select them: you have to double-click them (or click the Add button) so that they appear in the right-hand pane.  Only then is the OK button enabled.
    • Running it recursively from a root folder, rather than on individual folders, creates a single [.md5] file and avoids polluting the subfolders (e.g. the BPAV folders recorded by XDCAM-EX).
    • I once had a slight issue after manually copying files from an XP machine to a W7 machine, then generating the MD5 file on the source (XP) machine and verifying on the target (W7) machine.  At first the verifier immediately reported every file as a mismatch, so quickly that it seems unlikely to have been performing any computation.  Later it worked as expected.  I don’t know what changed; perhaps simply opening the md5 file in Notepad.
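The single-manifest approach (one [.md5] file at the root, nothing written into subfolders) can be sketched in Python 3 roughly as follows. This is an illustration under assumptions, not MD5Summer's actual output: the helper `write_manifest` and the default name `checksums.md5` are hypothetical, and paths are written relative to the root folder with '/' separators, one "<checksum> *<path>" entry per line.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root, manifest_name="checksums.md5"):
    """Hypothetical helper: write ONE manifest at root covering every file below it.

    Each entry line is '<32 hex digits> *<path relative to root>'.
    Returns the number of entries written.
    """
    manifest_path = os.path.abspath(os.path.join(root, manifest_name))
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a deterministic order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if os.path.abspath(full) == manifest_path:
                continue  # skip a manifest left by a previous run
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append(f"{md5_of_file(full)} *{rel}")
    # The manifest is written only in the root folder, never in subfolders.
    with open(manifest_path, "w", encoding="utf-8", newline="\n") as out:
        out.write("# MD5 manifest (comment lines start with #)\n")
        for line in entries:
            out.write(line + "\n")
    return len(entries)
```

Running `write_manifest(r"E:\Footage")` (a hypothetical path) would leave a single `checksums.md5` in the root and nothing inside subfolders such as BPAV.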

Concepts (to the best of my understanding):

  • A checksum of some data is a numeric “fingerprint” that is, with very high probability, unique to that data.  A fingerprint is useful because it is much smaller than the data it represents yet (virtually) uniquely identifies that data’s content.  A file’s checksum reflects its data contents but is not affected by its name etc.
  • Various checksum schemes/algorithms exist, one of the most popular being md5, which produces a 128-bit fingerprint written as 32 hexadecimal digits.  An md5 fingerprint is extremely unlikely to be the same for any other file, and it will almost certainly change if a file’s contents are changed even slightly.  (Deliberately engineered md5 collisions are known to be feasible, so md5 is no longer trusted for security purposes, but it remains adequate for detecting accidental corruption or incomplete copies.)
  • A typical md5 tool takes one or more specified files and writes their fingerprints to a [.md5] file, one entry (a fingerprint and a file name) per file.  Conversely, given a [.md5] file, it recomputes each listed file’s fingerprint and reports whether it agrees with the stated value.
  • An [.md5] file is a text file in which each entry line consists of a checksum value (in hexadecimal), a space, an asterisk and then a file name, possibly preceded by a folder path relative to the folder containing the [.md5] file.  (The asterisk marks binary mode; GNU md5sum writes a second space instead for text mode.)  It can also contain comment lines, each beginning with a hash (#) character.  Example entries:
    • eb574b236133e60c989c6f472f07827b *fred.exe
    • eb574b236133e60c989c6f472f07827b *tmp/fred.exe
  • Some download sites provide [.md5] files alongside the files they describe; others simply display the fingerprint on the web page.  Typically the purpose is to let the user check whether a download is complete and uncorrupted.
  • The fingerprint computed by a typical checksumming application depends only on a file’s data contents, not on its name, read-only status etc.  Thus it is not, on its own, a complete basis for checking the consistency of a system configuration.
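To make the verify half of the cycle and the [.md5] format above concrete, here is a minimal Python 3 verifier sketch. Its assumptions: the file is UTF-8 or plain ASCII (a leading byte-order mark, such as Notepad may add when saving as UTF-8, is tolerated); either "*" or a second space may follow the checksum; and "/" or "\" is accepted as the folder separator. `verify_md5_file` is a hypothetical name, not part of MD5Summer or any other tool.

```python
import hashlib
import os
import re

# One entry: 32 hex digits, a space, then '*' (binary mode) or a second
# space (text mode), then the file path.
ENTRY = re.compile(r"^([0-9a-fA-F]{32}) [ *](.+)$")


def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_md5_file(md5_path):
    """Hypothetical helper: re-compute each listed file's MD5 and compare.

    Returns a dict mapping each listed path to 'OK', 'MISMATCH' or 'MISSING'.
    Paths are resolved relative to the folder containing the .md5 file.
    """
    base = os.path.dirname(os.path.abspath(md5_path))
    results = {}
    # 'utf-8-sig' strips a byte-order mark if present, else behaves as UTF-8.
    with open(md5_path, encoding="utf-8-sig") as f:
        for raw in f:
            line = raw.rstrip("\r\n")
            if not line.strip() or line.startswith("#"):
                continue  # blank line or comment line
            m = ENTRY.match(line)
            if m is None:
                raise ValueError(f"unrecognised line: {line!r}")
            stated, rel = m.group(1).lower(), m.group(2)
            # Accept either '/' or '\' as the folder separator.
            full = os.path.join(base, *re.split(r"[\\/]", rel))
            if not os.path.isfile(full):
                results[rel] = "MISSING"
            elif md5_of_file(full) == stated:
                results[rel] = "OK"
            else:
                results[rel] = "MISMATCH"
    return results
```

Because only file contents are hashed, a renamed copy of a file has the same checksum as the original, whereas changing even one byte of a listed file turns its result into MISMATCH.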
