Data storage to last 100 years: part 1

by PhoenixGames on Mar.20, 2014, under Inventions, Musings

I have previously created a “DataCache” (posts are here), which is basically a solar-powered data storage device with a wireless (Bluetooth) radio for data transmission. Data can be accessed by any Android phone with a Bluetooth chip, and of course, apps could be written for other devices as well. The prototype device which I built does work, but it is not designed for outdoor or extremely long-term use.

I have been interested in long-term data storage ever since I worked on this project, and while researching it I came across this post about this product from SanDisk, which is basically a USB storage device that claims to be able to store data for 100 years. This is quite a claim, since, obviously, the company can’t have tested this product for 100 years. They have presumably just simulated the wear and tear that the product would go through over that period.

However, the above article raises another interesting point. Even if this device, let’s call it a “data vault”, were able to store data for a century, how would that data be accessed? Computer standards, ports, expansion slots, etc., change extremely quickly. How useful, for example, would an 8″ floppy disk be today? I don’t mean the 3.5″ floppy disks that can *possibly* still be used for BIOS updates on some motherboards, I mean the old 8″ diskettes. Even if you could find a working disk drive, finding drivers that work on a modern machine would be quite a challenge. Fast forward a century, and one of today’s USB devices would be next to useless. The interface would have to be as simple as possible, both in terms of hardware and in terms of the software used to handle the data transfer. The schematics would need to be printed on the device, so that any competent computer professional of the future could easily build a compatible interface device for the computer systems of their day.

Then there is the question of file format. Storing data in a .doc or .jpg format might be fine for the next five or ten years, but with future operating systems, future software releases, and new compression algorithms, these formats will likely be completely dead well within a century. Accessing the data would then be essentially like deciphering Egyptian hieroglyphs without the benefit of the Rosetta Stone!

What would be needed is a file format that stores the data in as simple a way as possible, likely just raw 1s and 0s. No compression or optimisation algorithms could be used, since the code to decompress or decode the data may no longer be in use when the data vault is opened. Any file structuring or encoding information that must be used in storing the data should be printed on the data vault itself, in sufficient detail to allow a computer programmer to easily write a program to parse and display the data from the vault.
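To make the idea a little more concrete, here is a minimal sketch in Python of what such a format might look like. Everything in it is my own assumption for illustration: the `VAULT1` marker, the `NAME` and `LENGTH` header fields, and the function names are all hypothetical. The point is only that the whole layout (a few lines of plain ASCII, a blank line, then the raw uncompressed bytes) is short enough to print on the device.

```python
# Hypothetical "data vault" record layout (an assumption, not an existing standard):
#   VAULT1\n
#   NAME <file name>\n
#   LENGTH <number of payload bytes>\n
#   \n
#   <raw, uncompressed payload bytes>
MAGIC = b"VAULT1\n"


def pack_record(name: str, payload: bytes) -> bytes:
    """Build one record: plain-ASCII header, blank line, raw payload."""
    header = f"NAME {name}\nLENGTH {len(payload)}\n\n".encode("ascii")
    return MAGIC + header + payload


def unpack_record(blob: bytes) -> tuple:
    """Parse one record and return (name, payload)."""
    if not blob.startswith(MAGIC):
        raise ValueError("not a vault record")
    # The header ends at the first blank line after the marker.
    header_end = blob.index(b"\n\n", len(MAGIC))
    header_lines = blob[len(MAGIC):header_end].decode("ascii").split("\n")
    fields = dict(line.split(" ", 1) for line in header_lines)
    length = int(fields["LENGTH"])
    start = header_end + 2
    payload = blob[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return fields["NAME"], payload


if __name__ == "__main__":
    record = pack_record("letter.txt", b"Hello")
    print(record)
    print(unpack_record(record))
```

Because the payload length is stored in the header, the payload can contain any bytes at all, including blank lines, and a future programmer would only need the five-line description in the comment to write a reader.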

Another approach would be to store not just the data, but an entire micro-computer system, including an operating system, monitor, keyboard, etc., in the data vault itself. Then, assuming all of the hardware survived, the machine could be powered up and the data accessed. It could then either be manually copied, or the data vault machine could be used to convert the data to a more useful format, possibly by writing a program on the data vault machine itself. Of course, a compiler and all relevant libraries, command references, and APIs would need to be on the data vault as well.

This is similar, in concept at least, to what NASA did with the Pioneer and Voyager space probes. These probes carried plaques (Pioneer) and records (Voyager) which contained data on Earth, humankind, etc., intended to communicate with an extraterrestrial race if one was encountered by the probe. The problem for NASA was that the aliens probably don’t speak English! Therefore, they had to come up with a means to express the data that would be accessible to any intelligent species, despite the fact that that species would have no knowledge of the language or context of the data at all.

They eventually used a kind of pictogram system, based on fundamental mathematical and scientific principles (pictures of the structure of the hydrogen atom, for example, and a star map of Earth’s position). The same approach could be used with the data vault, although this task would be much simpler, since the world will likely still be speaking English in 100 years, and it is somewhat likely that computers will still be based on the binary system.

I think that breaking the data down into its fundamental elements could provide a means for a future technology professional to access the data using whatever technology is then being used.
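As a rough illustration of what “fundamental elements” could mean for a photo, the sketch below stores a greyscale image as nothing more than a width, a height, and one brightness number per pixel, all written as plain text. I have borrowed the layout of the existing plain PGM (“P2”) image format here purely as an example; it is not something the data vault idea depends on, the function names are my own, and this simplified reader does not handle the comment lines that real PGM files may contain.

```python
def encode_plain_pgm(pixels):
    """Write a grid of 0-255 brightness values as plain-text PGM (P2)."""
    height = len(pixels)
    width = len(pixels[0])
    lines = ["P2", f"{width} {height}", "255"]
    for row in pixels:
        lines.append(" ".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


def decode_plain_pgm(text):
    """Read plain-text PGM back into a list of pixel rows (no comment support)."""
    tokens = text.split()
    if tokens[0] != "P2":
        raise ValueError("not a plain PGM image")
    width, height = int(tokens[1]), int(tokens[2])
    values = [int(token) for token in tokens[4:]]  # tokens[3] is the maximum value
    if len(values) != width * height:
        raise ValueError("pixel count does not match the stated size")
    return [values[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    image = [[0, 128, 255],
             [255, 128, 0]]
    text = encode_plain_pgm(image)
    print(text)
    print(decode_plain_pgm(text) == image)
```

The stored result is far larger than a .jpg of the same picture, but anyone who can read numbers separated by spaces can reconstruct the image without needing a decoder.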

This is an interesting area of research, and I think it could become very important in the future. Consider, for example, all of the posts people make to social media, to blogs, to Facebook, and all of the photos people take of their friends, their travels, etc. We live in a digital age, and the record of our lives is stored digitally. In years gone past, a person would leave behind letters and old photos to be remembered by, but in a digital age, we are going to lose that resource. After a person has died, what happens to their digital legacy? Do you take the drive out of their computer and search through it, looking for their old photos to keep? What if you don’t know their password, or where they keep their photos?

Having some kind of long-term data vault to store a record of a person’s digital life is something that I could see as being quite useful. It would essentially be the electronic equivalent of a box in the attic. It could also be useful for storing legal or medical records, or other documents that need to be kept for a long period of time.