Tuesday 22 January 2013

(Rsnapshot) backup and security - I see problems

In my previous post I asked for suggestions for backup solutions that are open/free software, back up over the network to a local HDD, are cross-platform so that both Windows and Linux clients work, and are not too CPU/memory hungry (on the server).

Several people suggested rsnapshot, BackupPC, areca-backup, and rsync. Thank you for all your suggestions, you have been a tremendous help. I have decided to give rsnapshot a try, since it was suggested to me that it would actually do what it is supposed to do for Windows clients, too (which was initially my perceived show stopper for rsnapshot).

Still, when I got to the implementation, I was a little disappointed by the very permissive access that needs to be provided on the client machines, since the backup is initiated from the backup server. Even the so-called more secure suggested solutions seem way too permissive for my taste, since losing control of the backup system basically means giving total access to the data on all client machines, which is quite a big problem in my opinion.

The data-transfer mechanism employed by rsnapshot is simply
  1. S ==(connects and reads all data)==> C
  2. S stores data in the final storage area
Am I the only one seeing a problem with this idea? If the server can connect to all your client machines and read all areas as it pleases, even if you restrict it to some directories, the data is already compromised when the backup server is compromised (think .ssh private keys, files with wireless network passwords and so on; I won't say card information - you don't keep credit/debit card information on your computer, or at least not in plain text, do you?).

What I would consider a better alternative would be a server-initiated dialogue which goes a little like this (S is server, C is client, '=' represents connections via ssh):
  1. S ---(requests backup initiation procedure)---> C
  2. S waits for a defined period of time for C to connect back and send (already encrypted) data; if C doesn't connect in time, S aborts
  3. S <===(sends encrypted data to be backed up)=== C
  4. S <-(signals the completion of data transfer)-- C
  5. S stores the data in the final storage area
This way, the server can allow access from the clients only in designated areas (even a chroot is possible) and only from designated clients. Access can even be granted only after a port knocking procedure and only during the backup time frame (since the server initiates the negotiation, it needs to expect the knocks then and only then), so the server is quite well secured. The connection to the server can even be made through an unprivileged account, possibly one account per client machine, each limited to an scponly shell, if you care for that level of security.

On the other hand, the client's information is secure, since it is encrypted on the client machine and sent only after encryption. The client machine decides and controls what it sends, while the backup server can only store what the client provides. Also, if the server is compromised, the clients' data and systems aren't compromised, since the data on the backup machine is encrypted with a key known only on the client (a backup copy of the key can be stored somewhere safe).
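A minimal sketch of the server side of this dialogue, written in Python purely for illustration. The class and method names are hypothetical, the clock is passed in as a plain number, and the actual ssh transport, port knocking and storage steps are left out; this only shows how the acceptance window could be enforced.

```python
import enum


class State(enum.Enum):
    IDLE = "idle"
    WAITING = "waiting"      # step 1 done, waiting for C (step 2)
    RECEIVING = "receiving"  # C is sending encrypted data (step 3)
    STORED = "stored"        # data moved to final storage (step 5)
    ABORTED = "aborted"


class BackupSession:
    """Server-side view of one backup run for one client (hypothetical names)."""

    def __init__(self, client, timeout_s):
        self.client = client
        self.timeout_s = timeout_s
        self.state = State.IDLE
        self.deadline = None

    def request_backup(self, now):
        # Step 1: S asks C to start; the acceptance window opens here.
        self.state = State.WAITING
        self.deadline = now + self.timeout_s

    def client_connected(self, client, now):
        # Step 2: connections are accepted only while a request is pending.
        if self.state is not State.WAITING:
            return False
        if now > self.deadline:
            # The window has closed: abort this run.
            self.state = State.ABORTED
            return False
        if client != self.client:
            # Unexpected client: refuse it, but keep waiting for the right one.
            return False
        self.state = State.RECEIVING
        return True

    def transfer_complete(self):
        # Steps 4-5: C signals completion, S stores the data.
        if self.state is not State.RECEIVING:
            return False
        self.state = State.STORED
        return True
```

Outside the window between `request_backup` and the deadline, every connection attempt is refused, which is the property the port knocking and time frame restrictions are meant to give.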

I am aware this approach can be problematic for permissions (user/group preservation), but that problem goes away if there is a local <-> remote user mapping or if the numeric IDs are simply kept.

I am also aware this means smarter clients, and that the Windows machines might not be able to implement this completely, but a little more security than "here is all my data" can still be achieved, can't it?

What do other people think? Am I insane or paranoid?

I think I can implement this type of protocol in some scripts (at least one for server and one for clients) and use the backup_script feature of rsnapshot to keep this clean and nice within rsnapshot.

What might prove problematic with this approach is that the rsync speedup is lost (might be?) because the copy is done to a temporary directory which, I assume, is empty, so tough luck. Another problem seems to be that every time the backup is done, the client has to encrypt each of the files to back up, which seems to be a real performance penalty, especially if the data to be backed up is quite large.

Is there an encryption layer that does this automatically at file level in the same/similar manner that LUKS does for entire block devices? Having the right file names, but with scrambled/encrypted contents seems to be the ideal solution from this PoV.

Thanks for reading, and for any suggestions you might point me to.

P.S.: I just thought of this: if there were an encryption layer implemented with FUSE and mounted in some directory on the client machine, the default rsnapshot mechanism could actually work. This would mitigate both the data accessibility issue and the performance issue, since that file system could be contained within a chroot and the encryption/scrambling would be done transparently on the client, so no data is accessible in plain form. Does anybody know such a FUSE implementation that does on-the-fly file encryption?

P.P.S.: EncFS does exactly what I want with its --reverse option which is exactly designed for this purpose:
Normally EncFS provides a plaintext view of data on demand. Normally it stores enciphered data and displays plaintext data. With --reverse it takes as source plaintext data and produces enciphered data on-demand. This can be useful for creating remote encrypted backups, where you do not wish to keep the local files unencrypted.
Great!
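As a rough sketch of how this could be wired together (Python is used only for illustration; the paths, the `backup` user and the host name `client1` are made-up placeholders): the client mounts an encrypted view of its data with `encfs --reverse`, and the server's rsnapshot.conf points at that view instead of the plaintext directory. The helpers below only build the command line and the config entry; they do not run anything.

```python
import os
import shlex


def encfs_reverse_argv(plain_dir, cipher_view):
    """Build the argv that mounts an encrypted view of plain_dir at cipher_view."""
    for path in (plain_dir, cipher_view):
        # encfs expects absolute paths for both directories.
        if not os.path.isabs(path):
            raise ValueError(f"absolute path required, got {path!r}")
    return ["encfs", "--reverse", plain_dir, cipher_view]


def rsnapshot_backup_line(user, host, cipher_view, dest):
    """Build one rsnapshot.conf 'backup' entry; the fields are tab-separated."""
    source = f"{user}@{host}:{cipher_view.rstrip('/')}/"
    return "\t".join(["backup", source, dest.rstrip("/") + "/"])


if __name__ == "__main__":
    # Placeholder paths and names, for illustration only.
    print(shlex.join(encfs_reverse_argv("/home/user", "/srv/backup-view")))
    print(rsnapshot_backup_line("backup", "client1", "/srv/backup-view", "client1"))
```

With this layout the server only ever reads `/srv/backup-view`, so what lands in the snapshots is the enciphered data, and rsnapshot itself needs no changes.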

12 comments:

Anonymous said...

This is crazy:

c="c" complete="complete" data="data" is="is" signals="signals" the="the" transfer="transfer"

Looks like a technology failure to me. Your arrow looked like an XML comment and something tried to sanitize it, perhaps?

Anonymous said...

have a look at ecryptfs

eddyp said...

@Anonymous: corrected, thanks.

Bob said...

The backup server security is always most critical. If I were going to try to blackhat steal data from a system I would always start by looking to see if I could steal it from the backup first. In the old days for example if a backup tape were available unsecured then the contents of the entire system would immediately be available all in one place without ever even breaking into the target system. In these days of hotswap devices it isn't unreasonable to worry about someone pulling a device out of the backup machine and walking away with it. And the same for over-the-net security.

Anonymous said...

ecryptfs would be such an encryption layer

Anonymous said...

think .ssh private keys, files with wireless network passwords and so on; I won't say card information - you don't keep credit/debit card information on your computer, or at least not in plain text, do you?

ssh keys shouldn't be stored as plaintext either. You may not be paranoid enough.

The "on-the-fly file encryption" idea feels a bit kludgy to me, and would leak metadata. I'd be more inclined to pick something like obnam that was designed for this. You can run it on the client and have it push backups to the server. (I don't know about Windows support; but the inability to create secure backups on Windows shouldn't stop you from doing it on other systems.)

Anonymous said...

"...Does anybody know such a FUSE implementation that does on-the-fly file encryption..."

encfs - it is fuse, on top of that use rsync or http://www.noah.org/wiki/Rsync_backup or http://www.nongnu.org/rdiff-backup/

Unknown said...

Hi,

Strange that nobody suggested "duplicity" for the backup mechanism. It does remote encrypted backup + rsync algorithm.

All you have to do is open an SSH/FTP/whatever chrooted space on the server, and launch the backup from the client. No need for port knocking or complex handshakes.

You can even wrap it up via the simple "duply" script, which makes its configuration very easy.

regards,

Anonymous said...

@Olivier: Probably because Duplicity has no/bad Windows support.

Anonymous said...

We use rsnapshot, but have the ssh keys on the client use command= and have that set to a script that looks at the original command passed by rsnapshot and verifies it against a config file that lists allowed commands and arguments - that allows you to limit which directories rsnapshot can be run against.
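A minimal sketch of what such a forced-command check could look like, in Python. This is not the commenter's script: the allow-list is hard-coded here instead of read from a config file, the directory names are placeholders, and it assumes the remote rsync is started in the usual `rsync --server --sender <options> . <path>` form with a single source path.

```python
import os
import posixpath
import shlex

# Hypothetical allow-list; a real setup would read this from a config file.
ALLOWED_DIRS = ("/home", "/etc")


def is_allowed(original_command, allowed_dirs=ALLOWED_DIRS):
    """Return True only for an rsync sender call on an allowed directory."""
    try:
        argv = shlex.split(original_command)
    except ValueError:
        return False
    # Expected shape: rsync --server --sender <options> . <path>
    if len(argv) < 5 or argv[0] != "rsync":
        return False
    if argv[1:3] != ["--server", "--sender"] or argv[-2] != ".":
        return False
    # Normalise the path so "/home/../root" cannot slip through.
    path = posixpath.normpath(argv[-1])
    return any(path == d or path.startswith(d + "/") for d in allowed_dirs)


def main():
    # sshd puts the command requested by the remote side in this variable.
    command = os.environ.get("SSH_ORIGINAL_COMMAND", "")
    if not is_allowed(command):
        raise SystemExit("command rejected")
    argv = shlex.split(command)
    os.execvp(argv[0], argv)


# Only act when sshd actually handed over a command.
if __name__ == "__main__" and "SSH_ORIGINAL_COMMAND" in os.environ:
    main()
```

The script would be named in the `command="..."` option of the key's entry in `authorized_keys` on the client. This sketch checks only the shape of the command and the target directory; it does not inspect the rsync options in between or follow symlinks, so a stricter version would pin those down as well.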

Ganneff said...

Hi

even if you did discard bacula in the past already, you should go and look at it again.

Bacula can encrypt your clients' backup data in a way that the backup server has no way of accessing the data. You still have the server initiate the backup and tell the client where to send it (from the client to the storage server; your director (backup controller) can be elsewhere).

Yes, it has its own storage format. Yes, it is a tiny bit more work to set up at first.
Yes, it is totally worth it with the requirements you list.

Ganneff

eddyp said...

@Ganneff: I don't consider bacula a viable solution because I want to be able to restore the backup without the need for a special client/app to provide me access to my data. I know encfs is such an app, but it is trivial to run even on a low-powered system, and my backup server is very slow (266MHz ARM with 32MB of RAM).

Also, due to the nature of my backup server hw limitations, I must exclude anything which relies on high computation power on the server side.