[linux] Efficiently copy whole directory trees

I recently upgraded my A/V storage disk from 1TB to 2TB (the size of modern HDDs is amazing…), and I had to transfer the data from one disk to the other somehow. The easiest way was, of course, to use cp to copy the whole directory structure. But since the USB interface is slow, I wanted to use any method available to speed things up.

From my old days as a sysadmin I remembered that copying, whether local or over the network, was faster if the data was copied in chunks. That was especially true when there were a lot of small files.

So, to kill two birds with one stone, I used… tar. It’s a very versatile tool; at some point I’ll describe how to use it as part of an encrypted network backup solution. Here, though, I used just two instances of tar. Keep in mind that if you give tar an absolute path to the directory you want to transfer, it records every element of that path, stripping only the leading “/”. So there are two ways to go: either record the whole path and strip it when “decompressing”, or don’t include it at all.
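To see what tar actually records in each case, you can list the member names of both kinds of archive. This is only a small sketch on a throwaway directory; the names work, src, music and track.txt are made up for the illustration and have nothing to do with the real disks.

```shell
# Build a tiny throwaway tree (hypothetical names, not the real disks).
work=$(mktemp -d)
mkdir -p "$work/src/music"
echo demo > "$work/src/music/track.txt"

# Absolute path: tar keeps every path element and drops only the leading "/".
# (The warning tar prints about this goes to stderr, so it is silenced here.)
abs_names=$(tar cf - "$work/src" 2>/dev/null | tar tf -)

# Relative to the source directory: no parent path is recorded at all.
rel_names=$(tar cf - -C "$work/src" . | tar tf -)

echo "absolute path given:"
echo "$abs_names"
echo "relative to the source directory:"
echo "$rel_names"
```

The first listing repeats the temporary path without its leading slash, while the second starts directly at the contents of the source directory.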

Let’s assume that we’re copying data from /media/external0 to /media/external1.

First option:

cd /media/external1
tar cf - /media/external0 | tar x -f - --strip-components=2

Second option:

cd /media/external0
# "." rather than "*", so that hidden files in the top directory are copied too
tar cf - . | tar x -C /media/external1 -f -

The first tar command “compresses” the data (it really only archives it; nothing is compressed), taken either from the given directory or from the current one, and sends it to standard output. That output is piped into the second tar’s input, and the second tar unpacks it into the target directory.
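Before trusting the pipeline with a whole disk, it can be tried on a small tree and checked with diff. The sketch below is a variant of the second option that uses -C on both sides instead of cd. The directories are temporary and the file names (videos, clip.txt, .hidden) are invented for the test.

```shell
# Temporary source and target directories (hypothetical test data).
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/videos/2010"
echo clip > "$src/videos/2010/clip.txt"
echo secret > "$src/.hidden"    # a dotfile, to check that it is carried over

# Pack the source tree to stdout, unpack it from stdin into the target.
tar cf - -C "$src" . | tar xf - -C "$dst"

# diff -r exits with status 0 only if both trees have the same contents.
diff -r "$src" "$dst" && echo "trees match"
```

Because the archive is built from “.”, the hidden file ends up in the target as well.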

To see the progress, you can put the cpipe application in the middle, like this:

tar cf - /media/external0 | cpipe -vt | tar x -f - --strip-components=2

One note on the performance of this method: I didn’t run any benchmarks, but it shouldn’t be worse than a standard cp. tar should have the edge if the OS is not doing any smart pre-buffering. I’ve also written this entry as an introduction to a future one about over-the-net backup using tar.
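For anyone who wants to check the performance claim on their own hardware, a rough timing sketch could look like the one below. It is not a benchmark I ran. The data set (200 tiny files in a temporary directory) is purely hypothetical and far too small to show a difference, so point src at a real directory to get meaningful numbers. Results will also depend heavily on what is already sitting in the disk cache.

```shell
# Hypothetical throwaway data set: 200 small files.
src=$(mktemp -d)
dst_cp=$(mktemp -d)
dst_tar=$(mktemp -d)
i=1
while [ "$i" -le 200 ]; do
    echo "file $i" > "$src/file$i.txt"
    i=$((i + 1))
done

# Time a plain recursive copy ("/." copies the contents, dotfiles included).
start=$(date +%s)
cp -a "$src/." "$dst_cp/"
end=$(date +%s)
echo "cp:  $((end - start)) s"

# Time the tar pipeline on the same data.
start=$(date +%s)
tar cf - -C "$src" . | tar xf - -C "$dst_tar"
end=$(date +%s)
echo "tar: $((end - start)) s"

# Make sure both methods produced the same result.
diff -r "$dst_cp" "$dst_tar" && echo "both copies are identical"
```

One-second resolution is coarse, but it is enough for copies that take minutes.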