DROID is a file format identification tool. It relies on a so-called signature file, an XML file based on data from the PRONOM technical registry. The official DROID signature file currently contains entries for more than 1300 file formats. This is generally a good thing, since it means DROID can identify more than 1300 file formats.
However, file format identification with DROID can be faster when using a signature file that is restricted to identify only those formats that are actually of interest for the analysis at hand. If, for example, a digital archive accepts only a small, well-defined set of file formats for preservation purposes, only those need to be identified exactly in the ingest process. All other formats will be rejected anyway, so it may be sufficient to just label them as "unknown file format" and treat them as corrupt data that has to be revised.
Such a restriction of a signature file based on a list of PUIDs (denoting
the accepted file formats) can be automated by the DROID Signature File
Minimizer, or droidsfmin.
See the website.
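To illustrate the idea of restricting a signature file to a list of PUIDs, here is a minimal sketch in Haskell. It is not the droidsfmin implementation: the `SignatureFile` and `FileFormat` types, the `restrict` function, and the sample data are hypothetical, and the XML structure of a real signature file is reduced to plain records. The sketch assumes that a minimized file keeps the accepted formats plus the internal signatures those formats still reference.

```haskell
module Main (main) where

import Data.List (nub)

-- Simplified, hypothetical model of a signature file. The real DROID
-- signature file is XML and carries far more detail than this.
type PUID = String

data FileFormat = FileFormat
  { formatPuid   :: PUID
  , formatName   :: String
  , signatureIds :: [Int]  -- IDs of the internal signatures for this format
  } deriving (Eq, Show)

data SignatureFile = SignatureFile
  { internalSignatures :: [Int]  -- only the IDs; byte patterns are omitted
  , fileFormats        :: [FileFormat]
  } deriving (Eq, Show)

-- Keep only the formats whose PUID is in the accepted list, and only the
-- internal signatures still referenced by one of the remaining formats.
restrict :: [PUID] -> SignatureFile -> SignatureFile
restrict accepted sf = SignatureFile keptSignatures keptFormats
  where
    keptFormats    = filter ((`elem` accepted) . formatPuid) (fileFormats sf)
    referenced     = nub (concatMap signatureIds keptFormats)
    keptSignatures = filter (`elem` referenced) (internalSignatures sf)

-- Illustrative sample data; the signature IDs are made up.
sample :: SignatureFile
sample = SignatureFile
  { internalSignatures = [1, 2, 3, 4]
  , fileFormats =
      [ FileFormat "fmt/18"    "PDF 1.4"         [1]
      , FileFormat "fmt/11"    "PNG 1.0"         [2]
      , FileFormat "x-fmt/111" "Plain Text File" []
      , FileFormat "fmt/43"    "JFIF 1.01"       [3, 4]
      ]
  }

main :: IO ()
main = do
  let minimized = restrict ["fmt/18", "fmt/11"] sample
  mapM_ (putStrLn . formatPuid) (fileFormats minimized)  -- fmt/18, fmt/11
  print (internalSignatures minimized)                   -- [1,2]
```

With the accepted list `["fmt/18", "fmt/11"]`, the two other formats and the signatures 3 and 4 are dropped, so a tool working on the reduced data has fewer patterns to test against each file.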
The droidsfmin tool is written in Haskell. Executables may be compiled on
and for Linux, Windows and Mac OS X.
You need:
- The Haskell Platform, which includes a Haskell compiler (GHC) and a build environment (Cabal).
- The xml package.
- The droidsfmin source files.
Install the Haskell Platform, cd into the droidsfmin source directory and
run the following commands:
$ cabal sandbox init
$ cabal install xml
$ cabal build
You will find your shiny new executable somewhere below the ./dist/build
directory.
To clean up your workspace use the (surprise, surprise) cabal clean command.
You may create a source distribution (i.e., a *.tar.gz or *.zip file) with
the cabal sdist command, for example to move the whole project from your Linux
development system to a Windows system for building a Windows executable.
Kudos to the https://github.com/KOST-CECO/KaD_SignatureFile project, where the idea this tool is based on had already been implemented manually.