
I've been working on a backup strategy for our Zarafa groupware environment, using Bacula backup & restore software.
Because we have a Zarafa support subscription at Operator Groep Delft, we have access to a utility called zarafa-backup. Unfortunately, this utility only lets you create a Full or an Incremental backup. Here's how the utility works:
This is incompatible with a decent backup strategy. Creating a full backup in a larger environment is rather expensive, so you would only want one every so many months. With small incrementals stacking up over a longer period of time (31 days in a month), though, you can imagine that restoring yesterday's mailbox content can quickly turn into one full backup plus a million-and-one incremental backups: you would need to restore yesterday's .index.zbk file along with all .data.zbk files from your backup suite. The incremental backups may also be spread over several locations, and your expiration policy or storage limitations may require you to expunge data from tape or disk after a couple of months. At some point, restoring becomes such a huge effort that it outweighs the downside of creating a new full backup.
Let's state some numbers, just for fun, and for further reference:
Both attachments and message content (text) count towards the quota. In the most extreme scenario, all mailboxes are full, and the theoretical maximum volume to back up would be about 1TB.
The actual volume will always be lower. Even though some users tend to complain that they run out of quota, a progress bar counting up to their maximum, combined with the simple fact that they stop receiving email once they run out of quota, makes them clean out their mailboxes. That is, provided nobody ever gives in to requests to increase the quota, and you automatically expire everything that is irrelevant (calendar entries from the '90s). The average usage is currently about 15%, so the total footprint is a lot less than the theoretical maximum of 1TB.
However, this "15%" (roughly 150GB) takes a little over 24 hours to export through zarafa-backup. Running the utility impacts the overall performance of the groupware environment: the backend zarafa-server process is heavily utilized, and in that sense zarafa-backup behaves like a big, fat client. If you expect higher average usage, you can see how this might lead to a policy under which a full backup is only created somewhere between Christmas and New Year's. Other reasons for the volume to grow beyond what you can reasonably take a full backup of on a regular basis include company growth, more temporary hires, what-have-you.
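To make these numbers a bit more concrete, here is a back-of-the-envelope sketch. It assumes decimal units (1GB = 10^9 bytes), rounds "a little over 24 hours" down to 24, and assumes export time scales linearly with volume, which a real zarafa-server under load may well not do.

```python
GB = 10**9   # decimal gigabytes (assumption; the figures above are rough anyway)
TB = 10**12


def throughput(volume_bytes: float, hours: float) -> float:
    """Average export rate in bytes per second."""
    return volume_bytes / (hours * 3600)


def hours_needed(volume_bytes: float, bytes_per_second: float) -> float:
    """Hours needed to export `volume_bytes` at a constant rate."""
    return volume_bytes / bytes_per_second / 3600


# ~150GB in a little over 24 hours, rounded down to 24 hours here.
rate = throughput(150 * GB, 24)

print(f"rate: {rate / 1e6:.2f} MB/s")                         # rate: 1.74 MB/s
print(f"full 1TB export: {hours_needed(1 * TB, rate):.0f} h")  # full 1TB export: 160 h
# Largest volume that fits in a ~44 hour weekend window at the same rate.
print(f"fits in 44 h: {rate * 44 * 3600 / GB:.0f} GB")         # fits in 44 h: 275 GB
```

Under these assumptions, a completely full 1TB environment would take roughly 160 hours (over six days) to export, and only about 275GB fits in a weekend.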
Ergo, a proper backup strategy is required: one that lets you restore a mailbox with yesterday's contents, but without the hassle of retrieving 364 incrementals. Somehow, you need to preserve the .index.zbk files along with the .data.zbk files on a daily, weekly, monthly, bi-monthly or even yearly basis, so that you can use them to back up "what has changed since the last full" (a differential), and can then expire all the little incrementals of "what changed per day over the last month".
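The difference between the two approaches shows up in the restore chain. The sketch below is a simplified model (one backup per day, days as plain integers) of the usual Full/Differential/Incremental rule: take the latest Full, then the latest Differential after it (if any), then every Incremental after that. The function name and data layout are illustrative, not part of Bacula or zarafa-backup.

```python
from typing import List, Tuple

Backup = Tuple[int, str]  # (day number, level)


def restore_chain(history: List[Backup], target_day: int) -> List[Backup]:
    """Backups needed to restore the state as of `target_day`."""
    upto = sorted(b for b in history if b[0] <= target_day)
    fulls = [i for i, (_, level) in enumerate(upto) if level == "Full"]
    if not fulls:
        raise ValueError("no Full backup at or before the target day")
    chain = [upto[fulls[-1]]]
    rest = upto[fulls[-1] + 1:]
    # A Differential contains everything since the Full, so only the latest one counts.
    diffs = [i for i, (_, level) in enumerate(rest) if level == "Differential"]
    if diffs:
        chain.append(rest[diffs[-1]])
        rest = rest[diffs[-1] + 1:]
    # Every Incremental after that point is still needed.
    chain.extend(b for b in rest if b[1] == "Incremental")
    return chain


# Full + incrementals only: one Full on day 0, then an Incremental every day.
inc_only = [(0, "Full")] + [(d, "Incremental") for d in range(1, 31)]
# Weekly differentials: a Differential every 7th day, Incrementals in between.
with_diffs = [(0, "Full")] + [
    (d, "Differential" if d % 7 == 0 else "Incremental") for d in range(1, 31)
]

print(len(restore_chain(inc_only, 30)))    # 31
print(len(restore_chain(with_diffs, 30)))  # 4: Full, Diff (day 28), Inc 29, Inc 30
```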
So, let's assume you have the following backup requirements:
This means you can restore any mailbox day-by-day for the past three weeks, week-by-week for the past month, and month-by-month for the past 6 months. Presumably, your Bacula Schedule resource would look like:
Schedule {
  Name = "Zarafa-Monthly"
  # A full backup starts every first weekend and takes a little
  # over 24 hours to complete.
  Run = Level=Full 1st Saturday at 11:59
  # During the work week, we back up every day
  Run = Level=Incremental Monday-Friday at 23:05
  # On weekends other than the first, a differential
  Run = Level=Differential 2nd-5th Saturday at 23:05
}
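As a sanity check on the Schedule above, the sketch below computes which level would run on a given date and how long the Saturday full backup has before Monday 8am. It assumes Bacula's "1st" to "5th" refer to weeks of the month as days 1-7, 8-14 and so on; the function names are hypothetical helpers, not Bacula features.

```python
import datetime as dt
from typing import Optional


def scheduled_level(day: dt.date) -> Optional[str]:
    """Level the "Zarafa-Monthly" Schedule would run on `day`, if any."""
    weekday = day.weekday()  # Monday == 0 ... Sunday == 6
    if weekday == 5:  # Saturday
        # Week of the month, assuming days 1-7 are the 1st week, 8-14 the 2nd, ...
        nth = (day.day - 1) // 7 + 1
        return "Full" if nth == 1 else "Differential"
    if weekday <= 4:  # Monday-Friday
        return "Incremental"
    return None  # nothing scheduled on Sundays


def hours_until_monday_8am(start: dt.datetime) -> float:
    """Hours from `start` until 08:00 on the following Monday."""
    days_ahead = (0 - start.weekday()) % 7 or 7
    monday = (start + dt.timedelta(days=days_ahead)).replace(
        hour=8, minute=0, second=0, microsecond=0
    )
    return (monday - start).total_seconds() / 3600


for d in range(1, 11):
    day = dt.date(2024, 3, d)
    print(day, day.strftime("%a"), scheduled_level(day))

# The Full starts Saturday 2 March 2024 at 11:59.
print(hours_until_monday_8am(dt.datetime(2024, 3, 2, 11, 59)))  # about 44.02 (44 h 1 min)
```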
This starts a full backup every month, a frequency you may want to decrease. I for one specifically don't want to do that; my position is: "If the backup impacts production too much, give me more power." As you can see, though, I start the full backup on Saturdays, so it has just over 44 hours to complete before Monday morning 8am. Admittedly, our environment is mostly idle during the weekend, except for a few ActiveSync clients. Here's the FileSet:
File Set {
  Name = "Zarafa-Bricklevel"
  Include {
    Options {
      Signature = SHA1
    }
    # No trailing slash!
    File = /var/lib/zarafa/bricklevel
  }
}
As you can see, I'm only interested in /var/lib/zarafa/bricklevel. Last but not least, here is the most interesting part of the configuration, the Job and its Pools:
Job {
  Name = "Zarafa-Backup"
  Type = Backup
  Level = Incremental
  Client = "mail01.ogd.nl"
  File Set = "Zarafa-Bricklevel"
  Schedule = "Zarafa-Monthly"
  Storage = "Zarafa-File"
  Messages = "Standard"
  Pool = "Default"
  Full Backup Pool = "Zarafa-Full-Pool"
  Incremental Backup Pool = "Zarafa-Inc-Pool"
  Differential Backup Pool = "Zarafa-Diff-Pool"
  Write Bootstrap = "/var/lib/bacula/bootstraps/Zarafa-Backup.bsr"
  Run Script {
    Runs When = Before
    Runs on Client = Yes
    Fail Job on Error = Yes
    Command = "/usr/local/sbin/zarafa-backup %l --before"
  }
  Run Script {
    Runs When = After
    Runs on Client = Yes
    Runs on Failure = No
    Command = "/usr/local/sbin/zarafa-backup %l --after"
  }
}
Pool {
  Name = "Zarafa-Full-Pool"
  Pool Type = Backup
  Recycle = Yes
  AutoPrune = Yes
  Volume Retention = 6 months
  Maximum Volume Jobs = 1
  Label Format = "Zarafa-Full-"
  Maximum Volumes = 9
}
Pool {
  Name = "Zarafa-Diff-Pool"
  Pool Type = Backup
  Recycle = Yes
  AutoPrune = Yes
  Volume Retention = 40 days
  Maximum Volume Jobs = 1
  Label Format = "Zarafa-Diff-"
  Maximum Volumes = 9
}
Pool {
  Name = "Zarafa-Inc-Pool"
  Pool Type = Backup
  Recycle = Yes
  AutoPrune = Yes
  # Keep a Volume for 3 weeks minimum
  Volume Retention = 21 days
  # Rotate the Volume every week, whether failed jobs
  # are in it or not.
  Maximum Volume Jobs = 5
  Label Format = "Zarafa-Inc-"
  # 5 jobs (i.e. one work week) per Volume;
  # 3 Volumes per 21 days
  # + a little leeway
  Maximum Volumes = 6
}
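A rough way to check whether each Pool's Maximum Volumes leaves enough room for its Volume Retention is to count how many Volumes can still be under retention, plus the one currently being written. This is an estimate under stated assumptions, not Bacula's actual pruning logic (which counts retention from a Volume's last write and prunes on demand): a month is taken as 30 days, so 6 months is 180 days; Fulls on the 1st Saturday can be as little as 28 days apart; Differential Volumes hold one job and arrive at most weekly; Incremental Volumes hold five jobs, i.e. one work week.

```python
import math

# Pool name -> (retention in days, shortest interval in days between new Volumes,
#               Maximum Volumes from the config)
POOLS = {
    "Zarafa-Full-Pool": (180, 28, 9),
    "Zarafa-Diff-Pool": (40, 7, 9),
    "Zarafa-Inc-Pool": (21, 7, 6),
}


def volumes_needed(retention_days: int, interval_days: int) -> int:
    """Rough bound: Volumes still under retention plus the one being written."""
    return math.ceil(retention_days / interval_days) + 1


for name, (retention, interval, maximum) in POOLS.items():
    need = volumes_needed(retention, interval)
    status = "ok" if need <= maximum else "too few Volumes"
    print(f"{name}: need ~{need}, Maximum Volumes = {maximum} ({status})")
```

With these assumptions the estimate comes out at 8, 7 and 4 Volumes respectively, so each Pool's Maximum Volumes leaves some leeway.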
You may have noticed that the RunScripts I'm using call a custom script. That's right! Please see the attached zarafa-backup script.
NOTE: The attached script is just something I whipped up; I haven't properly tested it. Please be careful when implementing it in your environment ;-)
NOTE^2: zarafa-backup is also a perfect utility for restoring only part of a mailbox. You can choose categories of items, such as only a user's Email, Calendar, Tasks or Contacts. This is why you should consider using zarafa-backup rather than a snapshot-and-sync type of backup.
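The RunScripts in the Job pass Bacula's %l (the job level) and a --before or --after flag to the script on the client. The sketch below is a hypothetical argument-handling skeleton, not the attached script: it only validates those two arguments and exits non-zero on anything unexpected, which, with Fail Job on Error = Yes, makes Bacula fail the job instead of backing up a stale directory. The set of accepted levels is an assumption covering the three levels this Job's Schedule uses.

```python
import sys

# Levels this Job's Schedule produces (assumption: VirtualFull/Base are not used).
KNOWN_LEVELS = {"Full", "Differential", "Incremental"}
KNOWN_PHASES = {"--before", "--after"}


def parse_args(argv):
    """Validate '<level> --before|--after' as passed by the RunScript Command."""
    if len(argv) != 2:
        raise ValueError("expected exactly two arguments: <level> --before|--after")
    level, phase = argv
    if level not in KNOWN_LEVELS:
        raise ValueError(f"unsupported job level: {level!r}")
    if phase not in KNOWN_PHASES:
        raise ValueError(f"unsupported phase: {phase!r}")
    return level, phase


def main(argv):
    try:
        level, phase = parse_args(argv)
    except ValueError as exc:
        print(f"wrapper: {exc}", file=sys.stderr)
        # A non-zero exit status lets Bacula mark the job as failed.
        return 2
    # Hypothetical: the level-specific export and .zbk file handling
    # would be dispatched from here.
    print(f"{phase[2:]} {level} backup")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```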
| Attachment | Size |
|---|---|
| zarafa-backup.txt | 9.03 KB |