Secure and Encrypted Online Backup

This article describes how two different services can be combined to store private data online in encrypted form while limiting the risk of data loss.

Motivation

Storing private data securely can be a big challenge. From my point of view, the most important things to store are digital photos. Nowadays, most photos are never printed and exist only as files on a computer. Both theft and hardware failure threaten the files on a computer, in particular a mobile one.

Simply storing everything "in the cloud" is not the best solution either; I really do not want all my private photos being indexed by a company without having any control over what happens.

For me it was obvious that I should use a system to back up my data that should

Approaches

By chance, I came across two interesting and very different approaches to this problem: git-annex and tarsnap. This is neither a full description of these approaches nor a how-to. The goal is to present what the tools do and to give an idea of how I use them.

git-annex

git-annex is mainly an extension of git for storing binary files in a repository. The idea, however, is not to store different versions of a binary file, but to allow different copies of the repo to hold different sets of files.

For example, one could have a git-annex repo containing all digital photos and videos. One would then simply use the git-annex commands to add all those files to the repo. This moves the file contents into an object store inside the repository and replaces the files with symbolic links to these objects; only the links are tracked by git itself. One can add remote repositories to hold additional copies of the repo. This includes special remotes such as Amazon S3, and git-annex supports GPG encryption of the files stored on a special remote.
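The add step can be pictured with a small sketch. It is only a toy illustration of "store the content under its checksum and leave a symbolic link behind", not git-annex's actual layout or key format; the function name `annex_add` and the `.objects` directory are made up for this example, and it assumes a file system that supports symbolic links.

```python
import hashlib
import os
import tempfile


def annex_add(repo, name):
    """Move the content of `name` into a checksum-named object, leave a symlink."""
    path = os.path.join(repo, name)
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()

    # Hypothetical object directory; the real tool uses its own layout.
    obj_dir = os.path.join(repo, ".objects")
    os.makedirs(obj_dir, exist_ok=True)

    # Move the content away, then put a relative symlink in its place.
    os.replace(path, os.path.join(obj_dir, digest))
    os.symlink(os.path.join(".objects", digest), path)
    return digest


# Small demonstration in a throwaway directory.
demo_repo = tempfile.mkdtemp()
with open(os.path.join(demo_repo, "holiday.jpg"), "wb") as fh:
    fh.write(b"pretend this is a photo")

key = annex_add(demo_repo, "holiday.jpg")
print(os.path.islink(os.path.join(demo_repo, "holiday.jpg")))
print(os.readlink(os.path.join(demo_repo, "holiday.jpg")))
```

Reading `holiday.jpg` afterwards still yields the original bytes, because the link points at the stored object.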

In the most basic usage, all these repos contain the same set of files. But the real strength of git-annex is that you can drop files from a repo (as long as the file still exists in another one). The repo then only keeps the information about where the file is stored; this information is propagated when the repositories are synchronized.

In this way, one could for example have all important photos in a repo on the laptop, in a second one on an external HD and encrypted on Amazon S3. Less important ones would likely only be on the laptop and the HD, as remote repos are slow to access and cost money. Other photos which are just kept around might only be stored on the HD, in order to free up space on the laptop.

git-annex also supports rules other than just "keep at least one copy of each file in one of the connected repos"; e.g., it can ensure that your wedding photos are in each copy of the repo and cannot be dropped.
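The drop rule described above can be sketched as a toy model. This is not git-annex code: the class `Annex` and its methods are invented for illustration, and the only assumption is the rule itself, namely that a copy may be dropped only if at least `numcopies` other copies are known to exist.

```python
# Toy model of location tracking with a minimum-copies rule.
# NOT git-annex code; all names here are hypothetical.

class Annex:
    def __init__(self, numcopies=1):
        self.numcopies = numcopies   # copies that must remain elsewhere
        self.locations = {}          # file name -> set of repo names

    def add(self, name, repo):
        """Record that `repo` holds a copy of `name`."""
        self.locations.setdefault(name, set()).add(repo)

    def drop(self, name, repo):
        """Drop the copy in `repo` only if enough other copies remain."""
        holders = self.locations.get(name, set())
        if repo not in holders:
            return False             # nothing to drop here
        others = holders - {repo}
        if len(others) < self.numcopies:
            return False             # refuse: too few copies elsewhere
        self.locations[name] = others
        return True

    def whereis(self, name):
        """List the repos known to hold `name`."""
        return sorted(self.locations.get(name, set()))


annex = Annex(numcopies=1)
annex.add("snapshot.jpg", "laptop")
annex.add("snapshot.jpg", "hd")

print(annex.drop("snapshot.jpg", "laptop"))  # True: a copy remains on hd
print(annex.drop("snapshot.jpg", "hd"))      # False: would be the last copy
print(annex.whereis("snapshot.jpg"))         # ['hd']
```

Raising `numcopies` for the important photos makes the model refuse a drop earlier, which is the effect the stricter rules are meant to have.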

tarsnap

I came to use tarsnap after seeing it as a sponsor of BSD Now TV. tarsnap is completely different from the above approach. It calls itself "Online backups for the truly paranoid". It achieves this by encrypting and signing all data on the client before it is uploaded, in order to prevent spying and modification.

The UI is modeled after the command line tool tar, which is widely used in the Unix world. After creating an account and a local public/private key pair, one can create archives on the tarsnap server. These archives are encrypted and signed locally and then pushed to the server, where they can no longer be modified unnoticed.
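Why signing on the client protects against modification on the server can be shown with a short sketch. It uses a generic message authentication code (HMAC-SHA256 from Python's standard library) and covers only the signing half; encryption is left out, and nothing here reflects tarsnap's actual archive format or key handling. The names `sign`, `verify` and `stored` are invented for this example.

```python
import hashlib
import hmac
import os


def sign(key: bytes, blob: bytes) -> bytes:
    """Compute an authentication tag for `blob` with a secret key."""
    return hmac.new(key, blob, hashlib.sha256).digest()


def verify(key: bytes, blob: bytes, tag: bytes) -> bool:
    """Check that `blob` still matches the tag created at upload time."""
    return hmac.compare_digest(sign(key, blob), tag)


key = os.urandom(32)   # secret key, never leaves the client
blob = b"archive bytes (these would already be encrypted in practice)"
tag = sign(key, blob)

# What a server would hold: the data and its tag, but not the key.
stored = {"blob": blob, "tag": tag}

print(verify(key, stored["blob"], stored["tag"]))          # True
print(verify(key, stored["blob"] + b"x", stored["tag"]))   # False
```

Without the key, the server cannot produce a valid tag for altered data, so any change shows up when the archive is checked on the client.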

The really interesting part of tarsnap is its ability to deduplicate data across archives, i.e., if two archives contain the same file, it is stored only once. This seems like a detail, but it allows you to do easy incremental, encrypted and signed backups without wasting space. It is even possible to create cron jobs that back up the data regularly and be sure that only new files are actually sent to the server.

One can for example create an archive, again containing all private photos, and store it on the tarsnap server. Then each time new photos are taken, a new archive containing all photos can be generated, but only the new photos will take up additional space on the server.
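The effect of deduplication across archives can be reproduced with a toy block store. This is a simplified sketch, not tarsnap's implementation: it cuts files into fixed-size blocks and keys them by SHA-256, whereas real tools typically use smarter chunking and store the blocks encrypted. `DedupStore` and its methods are names made up for this example.

```python
import hashlib
import os


class DedupStore:
    def __init__(self, block_size=1024):
        self.block_size = block_size
        self.blocks = {}     # sha256 hex digest -> block bytes
        self.archives = {}   # archive name -> {file name: [digests]}

    def create_archive(self, name, files):
        """Store `files` (name -> bytes); return the bytes newly stored."""
        new_bytes = 0
        manifest = {}
        for fname, data in files.items():
            digests = []
            for i in range(0, len(data), self.block_size):
                block = data[i:i + self.block_size]
                digest = hashlib.sha256(block).hexdigest()
                if digest not in self.blocks:   # unseen block: keep it
                    self.blocks[digest] = block
                    new_bytes += len(block)
                digests.append(digest)
            manifest[fname] = digests
        self.archives[name] = manifest
        return new_bytes

    def extract(self, name):
        """Rebuild all files of an archive from the shared blocks."""
        return {
            fname: b"".join(self.blocks[d] for d in digests)
            for fname, digests in self.archives[name].items()
        }


store = DedupStore()
# Random bytes stand in for photo data.
photos = {"a.jpg": os.urandom(3000), "b.jpg": os.urandom(2000)}
first = store.create_archive("photos-monday", photos)

photos["c.jpg"] = os.urandom(1500)   # one new photo, full archive again
second = store.create_archive("photos-tuesday", photos)

print(first, second)
```

Both archives can be extracted in full, but the second one only adds the 1500 bytes of the new photo to the store.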

General observations

I now use both methods at the same time to store private data on my laptop, on external HDs and on both Amazon S3 and the tarsnap servers. This may seem very paranoid, but I do not at all like the idea of losing photos to something as simple as a wrong format command or theft.

Some difficulties of course remain:

Protecting private data from loss and preventing unauthorized access to it should be an important topic for almost everyone today. The two methods I outlined are not easy to use and highlight the often-observed usability problems of strong cryptography. Nevertheless, git-annex in particular strives to offer a more accessible way to organize the repos through its graphical assistant tool, which frees us from using the command line interface.