[Bell Historians] Preferred soft copy archiving medium

John Harrison john at jaharrison.me.uk
Sun Apr 19 10:00:10 BST 2020


In article <02b201d615b0$4252d340$c6f879c0$@lovesguide.com>,
   Dickon Love <dickon at lovesguide.com> wrote:
> What is the perceived wisdom regarding the best medium for digitally
> archiving information? Is PDF ok, or there is a preferred format?

It depends on what sort of information is being stored and what failure
mechanisms you want it to survive. 

A PDF file explicitly describes what is where on the page whereas a word
processor file relies on a set of rules to re-assemble the document, which
might or might not end up with things in the same place depending on
hidden constraints.  PDF is more open and less likely to change than
proprietary word processor formats.  OTOH if a PDF file becomes corrupted it can
be very hard to extract the content, whereas a corrupt WP file will contain
most of the original text (but not necessarily all in the right order, as I
discovered when rescuing one last year).
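
As a rough illustration of that kind of rescue, here is a minimal Python
sketch that pulls runs of readable text out of a damaged file.  It assumes
the text was stored as plain, uncompressed ASCII, which is often true of
older word processor files but not of compressed formats.  The function
name salvage_text and the sample bytes are made up for the example.

```python
import re

def salvage_text(data, min_length=4):
    """Return runs of printable ASCII at least min_length characters long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [run.decode("ascii") for run in pattern.findall(data)]

# Hypothetical damaged content: readable text mixed with binary debris.
damaged = b"\x00\x01Treble recast in 1887\xff\xfe\x00\x03ab\x00Tenor rehung\x9c"

recovered = salvage_text(damaged)
for fragment in recovered:
    print(fragment)
```

The fragments come out in file order, which is not necessarily the order
they had in the document.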

For tabular data, a format like CSV holds the data (but not formulae)
in a far more transparent manner than a native spreadsheet file, so will not
only outlive the availability of the spreadsheet but will also enable
better recovery if it becomes corrupt.
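
To show how transparent CSV is, here is a small Python sketch using the
standard csv module.  The rows are invented placeholder data, and the
helper names to_csv_text and from_csv_text are only for illustration.

```python
import csv
import io

# Hypothetical example rows; the values are placeholders, not real records.
rows = [
    ["tower", "bells", "tenor_cwt"],
    ["Example Tower A", 8, 12.5],
    ["Example Tower B", 6, 9.0],
]

def to_csv_text(rows):
    """Write rows as CSV text: one line per row, values separated by commas."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

def from_csv_text(text):
    """Read CSV text back into a list of rows (every value comes back as text)."""
    return list(csv.reader(io.StringIO(text, newline="")))

csv_text = to_csv_text(rows)
print(csv_text)
```

The result is ordinary text that can be read in any editor.  Only values
are stored, so a cell that held a formula would have to be saved as its
calculated result.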

For images I suspect there is a trade-off between compactness and
recoverability but others may be able to comment on the robustness of
different formats.

For complex documents (words, images, diagrams, tables) there is merit in
saving the components separately to reduce reliance on the layer of
integration holding them together.  Note that this is done to a degree with
some modern formats like ODT and the MS XML style files, where what used to
be a black box is now a zip archive containing lots of components - though
it is disguised to look like a single file.  

For the adventurous who would like to test that statement, make a copy of a
.docx file, make sure its file extension is visible and then change it from
.docx to .zip.  Double click it and you will see the contents.
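
The same check can be sketched in Python with the standard zipfile module,
without renaming anything.  So that the example runs on its own, it builds
a tiny stand-in archive in memory rather than opening a real document; the
helper names are invented, and a genuine .docx will contain many more
parts than the two shown.

```python
import io
import zipfile

def list_parts(source):
    """Return the names of the members inside a zip-based document."""
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()

def make_stand_in():
    """Build a minimal in-memory archive laid out like a .docx (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
    buffer.seek(0)
    return buffer

parts = list_parts(make_stand_in())
print(parts)
```

To try it on a real file, pass the path of a copy of the document to
list_parts instead of the stand-in.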

-- 
John Harrison
Website http://jaharrison.me.uk


