I have been battling for a while to find the best way to outsource my data "in the cloud". I mean personal data: camera pics, documents, backups, etc. What I want to achieve is:
no worry about hard disk failure, theft, fire and so on (you never know - our life is digital by now and this can be very annoying)
no more discrepancies and distributed home storage hassle (some data on laptop 1, other data on desktop 2, "where did you save the last version?", etc.)
remote access when on leave/abroad/at work/on vacation
on-demand scalable storage size
a cheap solution, no $500 SAN/NAS hardware at home with huge electricity bills
Naturally, there are already packaged solutions, especially:
the so-called cloud storage services (Amazon S3, Google Cloud, Rackspace, etc.), OpenStack/Swift compatible in theory. In theory, because of the amazing amount of GitHub boilerplate code you need to make them accessible from Linux, since each of these platforms deviates slightly from the standard. Moreover, the REST container approach to storing my mail, pictures or spreadsheets is oversized and impractical. I tried sshfs, s3fs and other drivers, but it always ends with disconnections, errors, authentication problems, etc. An HTTP layer for storage is also very slow and inefficient for small files and somewhat frequent reads/writes. The same remarks apply to WebDAV solutions (Yandex Disk).
Dropbox and the like - please, I want a real technical solution with full compatibility with standard networking protocols...
SCP/SFTP solutions (HiDrive, ADrive, etc.) seemed interesting. I gave them a try; some are much cheaper than others, which makes me fear that some of them will eventually close, and in that case I am not sure of my data's availability. Moreover, the SSH servers are not always configured as you would like: frequent disconnections, limited directory depth or filename length (you are very happy when you discover these undocumented limits), etc. Upload and download speeds are not mentioned, therefore not guaranteed.
A warm recommendation for rsync.net, one of the only storage providers that run a unison server, allowing very efficient two-way synchronization, as well as remote SSH access, plus a responsive customer service. I was happy with them, but unfortunately they are too expensive for me ($16 per month for 50 GB); still a good service for professional customers.
A remote dedicated server with a 1 TB hard disk. You can find some that are not too expensive. The problem is that these hard disks WILL fail eventually. It is not a question of "if" but of "when". As luck would have it, I experienced a failure, and the disk was replaced within 6 hours. But with an empty new one...
Finally I ended up with a VPS with RAID 5/10/50/100 redundant storage. They are really cheap by now, especially those based on OpenVZ. However, the storage capacity is often small and the price rises really fast if you need, say, 500 GB or more. Moreover, in this kind of remote product there are also many new and (too) cheap providers (see lowendbox.com for a list), which makes you wonder about the availability of your data in two or three years: how many of them will survive? After some googling I was convinced by the reliability of RamNode, which received almost only positive comments.
You can get a VPS with 50 GB of redundant storage for $2 per month. Okay, it has a poor processor and almost no RAM, but that is not what I am looking for anyway. You can install whatever protocol you need: SFTP, rsync, unison, FTP, etc. BUT how do you deal with scalability? By upgrading? Unfortunately, if you want a 200 GB or even 1 TB VPS, you also have to order the multicore processor and the 16 GB of RAM that come with it. And that is not what I want. I just want storage space.
What about another small VPS with 50 GB of storage? Now the problem is that I don't have 100 GB; I have 2x50 GB. And that is quite a hassle for everyday use. Why not 5x50 GB in two months, even 15x50 GB in a year? This is not a satisfying solution. What we need is expandable storage like LVM. But LVM relies on block devices, and we only have logical devices on these VPS.
So let's try to emulate a 50 GB block device on each VPS. This is possible with NBD, a somewhat old protocol which turns out to be very useful here. Why not the modern iSCSI? Because I found out, after many hours of googling, that cheap VPS (running on the OpenVZ virtualisation software) almost never support block protocols, since these require loading a kernel module on the host, which is operated by the provider. And many providers do not include this module for security reasons. Therefore iSCSI does not work. NBD is much lighter on the kernel: the server runs entirely in userland and can export a raw file as block storage. That is the key.
So let's create a 50 GB raw block device in a file on each of the VPS (Debian-based):
openvz1$ sudo aptitude install nbd-server
openvz1$ sudo dd if=/dev/zero of=/root/storage.dev seek=1K bs=50000000 count=1
openvz1$ sudo vi /etc/nbd-server/config
[generic]

[storage]
    exportname = /root/storage.dev
    authfile = /etc/nbd-server/allow
Warning: NBD does not encrypt the traffic between the client and the server. This might not be a problem if you choose the same provider for the client (see below), and we will see below how to encrypt the data itself anyway, so this is not really an issue. Let's nevertheless configure an IP-based access control list.
openvz1$ sudo vi /etc/nbd-server/allow
127.0.0.1/32
<nbd client ip - see below>/32
The same procedure holds for every VPS you have, that is, for each 50 GB bucket. Now we need a frontend VPS which will run LVM and provide access to our personal computers. This single frontend VPS MUST NOT run on OpenVZ, but on the KVM virtualisation layer instead (for the same reasons as above: the NBD client side needs the nbd kernel module). It is therefore a little more expensive (say $4 per month), but you don't need much RAM/processor/storage either. Let's configure the frontend VPS and connect it to the servers.
kvm1$ sudo aptitude install nbd-client
kvm1$ sudo nbd-client <nbd server openvz1 ip> /dev/nbd0 -name storage
kvm1$ sudo nbd-client <nbd server openvz2 ip> /dev/nbd1 -name storage
kvm1$ ... etc.
For automatic NBD connections at boot time we can edit the nbd-client configuration file.
kvm1$ sudo vi /etc/nbd-client
and add one block for each server, ranging from nbd0..n:
NBD_DEVICE=/dev/nbd0
NBD_TYPE=f
NBD_HOST=<nbd server openvz1 ip>
NBD_PORT=
NBD_NAME=storage
NBD_EXTRA=-persist
Now you should have a new block device for nbd0..n:
kvm1$ sudo fdisk /dev/nbd0

Disk /dev/nbd0: 51.2 GB, 51249999872 bytes
202 heads, 43 sectors/track, 11524 cylinders, total 100097656 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0c8f24b3
You have to create a partition on each nbd device such that you get the following table:
     Device Boot      Start         End      Blocks   Id  System
/dev/nbd0p1            2048   100097655    50047804   8e  Linux LVM
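If you prefer to script this step rather than answer fdisk's prompts on every device, a one-liner along these lines should produce the table above (assuming a reasonably recent util-linux sfdisk; adapt the device name for each nbd device):

```shell
# Create one MBR partition starting at sector 2048, type 8e (Linux LVM),
# spanning the rest of the device.
kvm1$ echo 'start=2048, type=8e' | sudo sfdisk /dev/nbd0
```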
Now everything looks as if you had a VPS with n local block devices of 50 GB each. You just have to configure LVM as usual (note that each partition is actually slightly smaller than 50 GB, so instead of asking for an exact size we let the logical volume take all the available space):

kvm1$ sudo pvcreate /dev/nbd0p1 /dev/nbd1p1
kvm1$ sudo vgcreate mycloudvolumegroup /dev/nbd0p1 /dev/nbd1p1
kvm1$ sudo lvcreate -l 100%FREE -n mycloudlogicalvolume mycloudvolumegroup
Now you have a new logical block device over LVM at /dev/mycloudvolumegroup/mycloudlogicalvolume. Just format it as usual:
kvm1$ sudo mkfs.ext4 /dev/mycloudvolumegroup/mycloudlogicalvolume
kvm1$ sudo mkdir /mnt/mycloud
kvm1$ sudo mount -t ext4 /dev/mycloudvolumegroup/mycloudlogicalvolume /mnt/mycloud
And you're done. You can modify /etc/fstab to automatically mount the volume at boot time:
/dev/mycloudvolumegroup/mycloudlogicalvolume /mnt/mycloud ext4 defaults,_netdev 0 2
Whenever you need more storage, just order another 50 GB VPS, set it up as above, connect it to the next free nbd device (say /dev/nbd2), and extend the LVM:

kvm1$ sudo pvcreate /dev/nbd2p1
kvm1$ sudo vgextend mycloudvolumegroup /dev/nbd2p1
kvm1$ sudo lvextend -l +100%FREE /dev/mycloudvolumegroup/mycloudlogicalvolume
kvm1$ sudo resize2fs /dev/mycloudvolumegroup/mycloudlogicalvolume
Now back home. I bought an HP thin client running Debian for $50, which I installed in a cupboard.
home$ sudo mkdir /mnt/cloud
home$ sudo sshfs login@kvm1:/mnt/mycloud /mnt/cloud -o allow_other
We still have not taken care of encryption, and you should not outsource your personal data without proper encryption. We will use encfs, which provides on-the-fly encryption at the file level and is very easy to configure.
home$ sudo mkdir /mnt/cloud.cleartext
home$ sudo encfs --public /mnt/cloud /mnt/cloud.cleartext
Now you can work in the /mnt/cloud.cleartext directory: everything is encrypted on the fly, tunnelled over SSH to your KVM VPS, which transparently distributes it to the NBD servers on the OpenVZ VPS, which store it on the providers' RAID 5/10/50/100 infrastructure. You can then easily share the /mnt/cloud.cleartext directory with all your local computers using Samba. You can even install an SSH server on your home computer, or an OpenVPN server, in order to connect to it from anywhere in the world.
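As a sketch of the Samba part (the share name and the user are up to you; "myuser" below is just a placeholder, and depending on your Debian version the service may be called samba or smbd):

```shell
home$ sudo aptitude install samba
home$ sudo tee -a /etc/samba/smb.conf <<'EOF'
[cloud]
    path = /mnt/cloud.cleartext
    browseable = yes
    read only = no
    valid users = myuser
EOF
home$ sudo smbpasswd -a myuser    # give myuser a Samba password
home$ sudo service samba restart
```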
We have just reached our goals:
no worry about hard disk failure - you get RAID5/10/50/100
no more distributed home storage - you get a central repository shared with Samba
remote access - you can access your central repository by SSH/OpenVPN
on-demand scalable storage size - you can order another 50 GB VPS and add it to (or remove it from) the LVM pool easily
a cheap solution - $4/month for the KVM VPS + $2/month for each 50 GB
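For the record, removing a 50 GB VPS from the pool is the reverse of the extension procedure. The sketch below assumes the remaining physical volumes have enough free space to absorb the data (shrink the filesystem and logical volume first if they don't); here the VPS behind /dev/nbd2 is evacuated:

```shell
kvm1$ sudo pvmove /dev/nbd2p1       # migrate its extents onto the remaining PVs
kvm1$ sudo vgreduce mycloudvolumegroup /dev/nbd2p1
kvm1$ sudo pvremove /dev/nbd2p1
kvm1$ sudo nbd-client -d /dev/nbd2  # disconnect the NBD device
```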