[Bell Historians] Preferred soft copy archiving medium

AMH hodgeam at btinternet.com
Sun Apr 19 10:09:24 BST 2020


I am not a specialist in these things, and the question was about PDFs. The experts among you are probably all well aware that another issue in archiving is the storage medium used. People who backed up and archived on floppy disks, tapes and other now-obsolete media may still have the media (even if it has not degraded), but do they still have the hardware in working order to read that media reliably? Cloud storage may now be a better option, but surely there are still risks with that, as it is intangible.

Alison

-----Original Message-----
From: Bell-historians <bell-historians-bounces at lists.ringingworld.co.uk> On Behalf Of John Harrison
Sent: 19 April 2020 10:00
To: bell-historians at lists.ringingworld.co.uk
Subject: Re: [Bell Historians] Preferred soft copy archiving medium

In article <02b201d615b0$4252d340$c6f879c0$@lovesguide.com>,
   Dickon Love <dickon at lovesguide.com> wrote:
> What is the perceived wisdom regarding the best medium for digitally 
> archiving information? Is PDF ok, or is there a preferred format?

It depends on what sort of information is being stored and what failure mechanisms you want it to survive. 

A PDF file explicitly describes what is where on the page, whereas a word processor file relies on a set of rules to re-assemble the document, which might or might not end up with things in the same place depending on hidden constraints.  PDF is more open and less likely to change than proprietary word processor formats.  OTOH if a PDF file becomes corrupted it can be very hard to extract the content, whereas a corrupt WP file will contain most of the original text (but not necessarily all in the right order, as I discovered when rescuing one last year).

For tabular data, a format like CSV holds the data (but not formulae) in a far more transparent form than a native spreadsheet file, so it will not only outlive the availability of the spreadsheet program but will also enable better recovery if it becomes corrupt.
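
To illustrate the recovery point, here is a minimal Python sketch of salvaging the readable rows from a damaged CSV file. The function name recover_rows, the column names and the sample rows are all made up for illustration, and it assumes the simple case of one record per line (no quoted fields containing line breaks).

```python
import csv

def recover_rows(text, expected_columns):
    """Return (good_rows, bad_line_numbers) from possibly damaged CSV text.

    Assumes one record per line, so damage to one line cannot hide the
    others. Quoted fields that span several lines are not handled here.
    """
    good, bad = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        try:
            # Parse each line on its own so one bad line cannot spoil the rest.
            row = next(csv.reader([line]))
        except (csv.Error, StopIteration):
            bad.append(number)
            continue
        if len(row) == expected_columns:
            good.append(row)
        else:
            bad.append(number)
    return good, bad

# Hypothetical sample: line 3 has been damaged, the others are intact.
sample = (
    "tower,bells,tenor_cwt\n"
    "Tower A,8,14\n"
    "Tow##@@\n"
    "Tower C,12,31\n"
)

good, bad = recover_rows(sample, expected_columns=3)
print("recovered rows:", good)
print("damaged line numbers:", bad)
```

Because each record sits on its own line of plain text, the damage stays local. A native spreadsheet file offers no such simple line-by-line fallback.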

For images I suspect there is a trade-off between compactness and recoverability but others may be able to comment on the robustness of different formats.

For complex documents (words, images, diagrams, tables) there is merit in saving the components separately to reduce reliance on the layer of integration holding them together.  Note that this is done to a degree with some modern formats like ODT and the MS XML style files, where what used to be a black box is now a zip archive containing lots of components - though it is disguised to look like a single file.  

For the adventurous who would like to test that statement, make a copy of a .docx file, make sure its file extension is visible and then change it from .docx to .zip.  Double click it and you will see the contents.
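
The same test can be done without renaming anything, since a zip reader does not care about the file extension. Below is a minimal sketch using Python's standard zipfile module. The helper names list_parts and read_part are made up for illustration, and the small in-memory archive is only a stand-in so the example runs without a real document; with a real file you would pass its path instead.

```python
import io
import zipfile

def list_parts(source):
    """Return the names of the components inside a zip-based document."""
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()

def read_part(source, part_name):
    """Return the raw bytes of one named component."""
    with zipfile.ZipFile(source) as archive:
        return archive.read(part_name)

# Stand-in for a real .docx: a tiny zip built in memory, with two entries
# named like the ones a real file contains. The contents are placeholders.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<document>Hello</document>")

print(list_parts(buffer))
print(read_part(buffer, "word/document.xml"))
```

With a real document, list_parts("copy-of-report.docx") (a hypothetical file name) should list the same components you see after the rename-and-double-click test.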

--
John Harrison
Website http://jaharrison.me.uk

_______________________________________________
Bell-historians mailing list
Bell-historians at lists.ringingworld.co.uk
https://lists.ringingworld.co.uk/listinfo/bell-historians



