Fitus.github.io

What are synchronization programs good for?

For Backups, for Transferring Huge Data, for Taking Your Work With You

Backups

Backups are additional copies of your data. If something goes wrong with your primary data (e.g. a disk failure, fire, flood, a cyber-attack or just an accidental deletion), the backups will save you.

What storage media are used for backups today?

The times when backup data was written to magnetic tapes are largely gone, at least outside large data centers. So are the times when backup data was burned to CD-ROMs and DVDs. Today, HDDs, SSDs and USB flash drives offer such large capacities and fast access at such low prices that they are the backup media of choice.

A second HDD/SSD in the PC, used as the backup medium, is now also a common solution. It protects against failure of the primary HDD/SSD, but not against other risks such as fire, flood or a cyber-attack. These risks are addressed by media that are connected only while backups are being made, and are otherwise disconnected and (ideally) stored at a different location.

The role of a synchronization program

Let’s first clarify what a synchronization program actually does (a bit simplified, but sufficient for now):

A synchronization program makes it possible to write to the backup media only those files that have changed (or have been newly created) since the last backup. There is no need to overwrite the entire backup with the entire source data each time.
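The rule above can be sketched in a few lines of Python. This is a minimal illustration, not how Zaloha2.sh works: it assumes that a file has changed whenever its size or modification time differs from its backup copy, the function name sync_changed_files is hypothetical, and it never deletes anything from the backup (which a real synchronization program usually can do).

```python
import os
import shutil


def sync_changed_files(source_dir: str, backup_dir: str) -> list[str]:
    """Copy to backup_dir only the files that are new or changed in source_dir.

    Assumption of this sketch: a file counts as changed when its size or its
    modification time (in whole seconds) differs from the backup copy.
    Files are never deleted from backup_dir.
    Returns the relative paths of the copied files.
    """
    copied = []
    for root, _dirs, files in os.walk(source_dir):
        rel_root = os.path.relpath(root, source_dir)
        target_root = os.path.normpath(os.path.join(backup_dir, rel_root))
        os.makedirs(target_root, exist_ok=True)  # mirror the directory tree
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_root, name)
            src_stat = os.stat(src)
            if os.path.isfile(dst):
                dst_stat = os.stat(dst)
                if (src_stat.st_size == dst_stat.st_size
                        and int(src_stat.st_mtime) == int(dst_stat.st_mtime)):
                    continue  # backup copy is up to date: skip it
            shutil.copy2(src, dst)  # copy2 also carries over the modification time
            copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return copied
```

The first call copies everything. A second call without changes to the source copies nothing, and after one file has been edited, only that file is copied again.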

It is not always that simple

The above should not give the impression that backups are always that simple. When databases and/or high-availability IT systems are involved, backups require complex solutions. Synchronization programs may be part of such solutions, but they are not the solutions themselves.

Transferring Huge Data

We are talking about data so huge (terabytes and more) that transferring it over a network is impractical. There are tools (e.g. rsync) that try to optimize such transfers by splitting files into parts and transferring only the parts that differ. However, in recent years a new trend has emerged (search, for instance, for AWS Snowmobile, an extreme example of this trend): a mobile storage device with sufficient capacity is connected to the source machine, and the data is synchronized to it. The mobile storage is then physically transported to the location of the backup machine and connected to it, and the data is synchronized from the mobile storage to the backup machine.
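The idea of transferring only the parts that differ can be illustrated with fixed-size blocks. This is a deliberately simplified sketch with hypothetical function names: the receiver sends a hash of each block of its old copy, the sender returns only the blocks whose hash does not match, and the receiver rebuilds the new file from its old copy plus those blocks. rsync's actual algorithm also uses a rolling checksum, so it finds matching blocks even after they have shifted to a different offset. This sketch does not do that.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for readability; real tools use much larger blocks


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Receiver side: hash each fixed-size block of the old copy."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old_hashes: list[str], new: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Sender side: return {block_index: data} for the blocks of `new` whose
    hash does not match the receiver's block at the same index."""
    delta = {}
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        if index >= len(old_hashes) or hashlib.sha256(block).hexdigest() != old_hashes[index]:
            delta[index] = block
    return delta


def apply_delta(old: bytes, delta: dict[int, bytes], new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Receiver side: rebuild the new file from the old copy plus the changed blocks."""
    block_count = (new_length + block_size - 1) // block_size
    rebuilt = b"".join(delta.get(i, old[i * block_size:(i + 1) * block_size])
                       for i in range(block_count))
    return rebuilt[:new_length]
```

For example, if the old copy is b"AAAABBBBCCCCDDDD" and the new file is b"AAAAXXXXCCCCDDDDEE", only two blocks (b"XXXX" and b"EE") have to travel. Inserting a single byte at the start of the old data, however, shifts every block, so all of them would be resent. This is exactly the weakness that rsync's rolling checksum addresses.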

In this case, the mobile storage holds a third copy of the data and can be kept at a third location, which is an extra benefit. Additionally, if the mobile storage is used repeatedly, only the changed (or new) files are copied during each synchronization, not the whole source data.

Taking Your Work With You

Imagine you commute (regularly or irregularly) between two workplaces, and at both of them you work with software (e.g. a CAD program) that saves its data to files and directories. When you finish at one location, you can synchronize your work data to a mobile storage device (e.g. a USB flash drive), take it with you, and at the second location synchronize the data from the mobile storage to the work area of the CAD program. This allows you to continue your work at the second location …

Choice of a synchronization program

Dozens of backup and synchronization programs exist. You can choose between commercial and open-source/free programs, and between GUI-based and command-line programs. Many programs also offer special features such as operation over networks, data encryption, automatic scheduling, or special data transfer protocols and/or storage formats aimed at minimizing network traffic and/or storage media use.

This site hosts the Zaloha2.sh synchronizer, along with Zaloha2_Snapshot.sh, an add-on script that creates hardlink-based snapshots of the backup directory.
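The general idea of a hardlink-based snapshot can be sketched in Python. This illustrates the technique under stated assumptions and is not the code of Zaloha2_Snapshot.sh. The function name hardlink_snapshot is hypothetical. The snapshot directory must not exist yet, must lie outside the backup directory, and must be on the same filesystem, because hard links cannot cross filesystems. Each file in the snapshot is just a second directory entry for the same data, so the snapshot takes almost no extra space.

```python
import os


def hardlink_snapshot(backup_dir: str, snapshot_dir: str) -> int:
    """Mirror the directory tree of backup_dir under snapshot_dir, hard-linking
    every file instead of copying it. Returns the number of files linked.

    Assumptions: snapshot_dir does not exist yet, lies outside backup_dir and
    is on the same filesystem (hard links cannot cross filesystems).
    """
    linked = 0
    for root, _dirs, files in os.walk(backup_dir):
        rel_root = os.path.relpath(root, backup_dir)
        target_root = os.path.normpath(os.path.join(snapshot_dir, rel_root))
        os.makedirs(target_root, exist_ok=True)  # directories are real copies
        for name in files:
            # A hard link is a second directory entry for the same data on disk,
            # so no file contents are copied.
            os.link(os.path.join(root, name), os.path.join(target_root, name))
            linked += 1
    return linked
```

A snapshot preserves an old file version only if the synchronization program later replaces a changed file with a new file (a new inode), for example by writing a temporary file and renaming it over the old one. If the file were overwritten in place instead, the change would be visible through both names.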

The following reasons speak for using these programs: