TIP Fast Copy
From Gentoo Linux Wiki
The normal way
To copy files recursively from src1/ and src2/ to dest/, run:
cp -Rv src1/ src2/ dest/
The piped and pax way
cp copies one file at a time, reading and then writing within a single process. Using the kernel's pipe support, the reading and the writing can instead be split across two processes that run concurrently. tar converts the directories recursively into a single stream: at one end we create the stream from the source directories, and at the other end we extract it into the destination directory. The stream travels between the two ends through a pipe.
tar -cS src1/ src2/ | tar -C dest/ -x
tar is more flexible than cp and may also be much faster. You may wish to add the -v flag on the right side, next to -x, so that tar prints what it is doing. Note, however, that with this option the terminal may limit throughput.
Note that ext2, ext3 and most modern Linux filesystems have a capability called 'sparse files'. This is used to store large files with lots of zeroed content in an efficient manner. Most of the time you should not care about this, but certain programs make extensive use of this feature (e.g. net-p2p/mldonkey). You should be careful when copying sparse files using this method, as your disk usage can explode. Whereas the cp utility handles sparse files automatically, tar does not. Sparse files in the source directory will be written out in full in the destination directory, unless the -S flag is used on the left-hand side of the pipe.
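The effect of -S can be checked locally. The following is a minimal sketch, assuming GNU tar and GNU coreutils (truncate is a coreutils tool); the temporary directory and the file names sparse.img and small.txt are made up for illustration. It passes -f - explicitly so that tar reads and writes the pipe regardless of how the default archive is configured.

```shell
#!/bin/sh
set -eu

# Scratch area (hypothetical layout, removed on exit).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src1" "$work/dest"

# A 64 MiB file that consists of a single hole, plus a small regular file.
truncate -s 64M "$work/src1/sparse.img"
printf 'hello\n' > "$work/src1/small.txt"

# Copy through a pipe; -S on the creating side asks tar to detect holes.
( cd "$work" && tar -cSf - src1/ ) | tar -C "$work/dest" -xf -

# The copy must be byte-identical and keep its apparent size.
cmp "$work/src1/sparse.img" "$work/dest/src1/sparse.img"
apparent=$(wc -c < "$work/dest/src1/sparse.img")
echo "apparent size: $apparent bytes"

# Blocks actually allocated, in KiB (the numbers depend on the filesystem).
du -k "$work/src1/sparse.img" "$work/dest/src1/sparse.img"
```

Running the same script without -S and comparing the du output should show the difference on filesystems that support sparse files.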
pax can also be used, with the same result:
1. cd into the source directory.
2. Check that the destination exists.
3. Issue the following command.
pax -rw . destination_directory
It is also easier to remember.
Network with SSH
- Local to Remote
tar czv ListOfFiles | ssh remote.box.com tar xz -C /home/user/PathToCopy
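- Local to Remote, with a faster cipher. (blowfish has been removed from recent OpenSSH releases; ssh -Q cipher lists the ciphers your version supports.)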
tar czv ListOfFiles | ssh -c blowfish remote.box.com tar xz -C /home/user/PathToCopy
- Remote to Local
ssh remote.box.com tar cz BeginDirCopyFiles | tar xz -C DirToCopy
- Taking tar out of the loop
Remote to local:
scp -C remotebox:path/to/sourcefile .
Local to remote:
scp -C localfiles remotebox:path/to/destination/
cpio can also be used, with the same result:
ssh remotebox "cd /etc/ ; ls -1 issue motd | cpio -oH odc" > archive.cpio
(the cd into /etc/ keeps the archived paths relative)
To extract this cpio archive:
cpio -iv < archive.cpio
Note: to retrieve files from HP-UX, the compatible arguments are "cpio -oc".
Network with netcat
Transfers over netcat need very little CPU, unlike transfers over ssh. However, the data is transmitted without encryption or authentication. Further disadvantages: (1) you have to open a port on the destination system's firewall and close it afterwards; (2) you have to enter commands on both the source and the destination server; (3) for security reasons, netcat should not be installed on a critical system.
Destination box: nc -l -p 2342 | tar -C /target/dir -xzf -
Source box: tar -cz /source/dir | nc Target_Box 2342
To reduce CPU use further, lzop can be used in place of tar's z option. It compresses much faster, but less effectively.
Destination box: nc -l 2342 | lzop -d | tar -C /target/dir -xf -
Source box: tar -c /source/dir | lzop | nc Target_Box 2342
Network with socat
Same idea as Netcat.
Destination box: socat -u tcp4-listen:2342 - | tar x -C /target/dir
Source box: tar c /source/dir | socat -u - tcp4:Target_Box:2342
We can use a variety of compression methods in a general way:
Destination box: socat -u tcp4-listen:2342 - | ${UNZIP} | tar x -C /target/dir
Source box: tar c /source/dir | ${ZIP} | socat -u - tcp4:Target_Box:2342
We then define ZIP as one of the following:
- cat
- lzop
- gzip
- bzip2
and UNZIP as:
- cat
- lzop -d
- gunzip
- bunzip2
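Before involving the network, a chosen ZIP/UNZIP pair can be tried locally by connecting the two halves of the pipeline directly. This is a minimal sketch, assuming gzip and gunzip are installed; the temporary directory and file.txt are hypothetical, and the socat hop is simply left out.

```shell
#!/bin/sh
set -eu

# Pick a pair from the lists above; gzip/gunzip is only the default here.
ZIP=${ZIP:-gzip}
UNZIP=${UNZIP:-gunzip}

# Scratch area (hypothetical layout, removed on exit).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/source/dir" "$work/target"
printf 'payload\n' > "$work/source/dir/file.txt"

# Same pipeline as above, minus socat. ${ZIP} and ${UNZIP} are left
# unquoted on purpose so that a value such as "lzop -d" splits into words.
tar -C "$work/source" -cf - dir | ${ZIP} | ${UNZIP} | tar -C "$work/target" -xf -

# Verify that the tree arrived unchanged.
roundtrip=failed
diff -r "$work/source/dir" "$work/target/dir" && roundtrip=ok
echo "round trip with ${ZIP}: $roundtrip"
```

To try another pair, set both variables when invoking the script, for example ZIP=bzip2 UNZIP=bunzip2.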
Each of these compression programs has a different tradeoff between compression ratio and CPU time, so the best choice will depend on your processor, hard drive, and network speeds. The following results should give you a better feel for which program to use in a given situation:
# load the files into block cache
# (piped through cat because GNU tar skips reading file data when its archive is /dev/null)
tar c /usr/src/linux-2.6.3 | cat > /dev/null
# determine compression ratio
tar c /usr/src/linux-2.6.3 | ${ZIP} > /tmp/foo; ls -sh /tmp/foo
# determine cpu time
time -p tar c /usr/src/linux-2.6.3 | ${ZIP} | cat > /dev/null
| Program | Compressed size | CPU time |
| cat: | 182MB | 1.1sec |
| lzop: | 64MB | 4sec |
| gzip --fast: | 51MB | 9sec |
| gzip: | 41MB | 18sec |
| gzip --best: | 41MB | 49sec |
| bzip2: | 32MB | 134sec |
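The size column can be reproduced on your own data with a small loop. The following is a rough sketch, not the script used for the numbers above: it builds a tiny, made-up sample tree instead of the kernel sources, skips any compressor that is not installed, and measures output size only (timing is left to the time command shown earlier).

```shell
#!/bin/sh
set -eu

# Scratch area (hypothetical layout, removed on exit).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Sample data: a small, repetitive stand-in for a real source tree.
mkdir -p "$work/sample"
i=0
while [ "$i" -lt 200 ]; do
    printf 'line %s of highly repetitive sample text\n' "$i" >> "$work/sample/data.txt"
    i=$((i + 1))
done

# One result line per compressor: name, a tab, compressed size in bytes.
results="$work/results.txt"
: > "$results"
for zip in cat lzop "gzip --fast" gzip "gzip --best" bzip2; do
    prog=${zip%% *}                                   # first word = program name
    command -v "$prog" >/dev/null 2>&1 || continue    # skip if not installed
    bytes=$(tar -C "$work" -cf - sample | $zip | wc -c)
    printf '%s\t%s\n' "$zip" "$bytes" >> "$results"
done

cat "$results"
```

Pointing tar at a real directory instead of the sample tree gives figures that are comparable to the table.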
Insanely fast with rsync
rsync -a source <user>@<host>:<somedir>
rsync -a <user>@<host>:<somedir> <localdir>
--question for clarification: why do you say rsync copies are so insanely fast? I noticed them to be very slow on the first (creating) run ...
--answer: rsync compares the files on both computers (e.g. by last-modified timestamp) and only copies what has changed. That's also why your Portage tree update (emerge sync) is as fast as it is: rsync only downloads the packages that have changed. Why else would it be named rsync (synchronize)? ;-)
--answer2: rsync is slower than netcat or socat on the first run. rsync only gains its speed from its file-compare algorithm _after_ the data has been copied once.
--note: Use tar over ssh to begin with (if you need the initial run to be fast), then use rsync to keep up to date. rsync doesn't keep, and doesn't need, any metadata of its own to work, so there is no requirement that the files were initially transported using rsync.
- In other words, it's only faster when there's going to be some data that can be omitted. A fresh copy will always be slower.
