Data Storage, Backup, and Restore
Exchange Data Storage
Exchange stores its data in the closed, proprietary "Jet" database. This database has known issues, including issues with backup and restore:
- Backup and Restore require some level of freeze/replicate
- Backup is difficult or impossible to make incremental ("just back up what changed since the last backup"), especially with large stores
- Restore is similarly very difficult or impossible to make granular ("just restore this one message")
Other key issues with the Exchange storage model include:
- The combination of Exchange and Jet drives inefficient use of storage - the system reads and re-reads, writes and re-writes the data on the disks, "thrashing" the disk heads and the whole storage subsystem. This in turn drives customers to buy very expensive high-end SANs to support Exchange's storage needs.
- When the "spreading corruption" strikes, Exchange mis-assigns a database ID. Over a period of time (days, weeks, or even months), the corruption spreads within the database until finally the system crashes. Unfortunately, when a recent backup is restored, it too contains corruption, and so the system crashes again shortly thereafter. Depending on what backups are available, it can be difficult or even impossible to rebuild a working system. For those who experience it, the "spreading corruption" issue is one they tend not to forget.
- Database Compaction - Exchange requires the Jet database to be compacted from time to time (compaction removes white space in the database and also attempts to defragment individual data objects). Compaction requires a complete copy of the database to be made on disk, effectively doubling the required size of the storage. The administrator must run it periodically to ensure that the system does not run out of space and/or slow to a crawl.
PostPath Server Data Storage
By contrast, PostPath uses the file system directly to store data, with no intermediate database. Under this storage model, each user has their own folder within the store; each user's folder contains subfolders corresponding to Calendar, Inbox, and so on; and within an individual subfolder each message is represented by a single file (approximately one file per message).
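As a rough illustration of this layout, the following Python sketch builds a tiny store in a temporary directory and lists its contents. The user name (alice), the message file names, and the .msg extension are hypothetical and chosen only for the example; the actual on-disk naming used by the PostPath Server is not described here.

```python
import tempfile
from pathlib import Path


def create_message(store: Path, user: str, folder: str, name: str, body: str) -> Path:
    """Write one message as one file under store/user/folder/ (hypothetical layout)."""
    folder_path = store / user / folder
    folder_path.mkdir(parents=True, exist_ok=True)
    message_path = folder_path / name
    message_path.write_text(body, encoding="utf-8")
    return message_path


def list_messages(store: Path, user: str) -> dict[str, list[str]]:
    """Map each of a user's subfolders to the sorted message file names it holds."""
    user_path = store / user
    return {
        sub.name: sorted(p.name for p in sub.iterdir() if p.is_file())
        for sub in sorted(user_path.iterdir())
        if sub.is_dir()
    }


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    # One folder per user, one subfolder per mailbox folder, one file per message.
    create_message(store, "alice", "Inbox", "0001.msg", "Hello")
    create_message(store, "alice", "Inbox", "0002.msg", "Lunch?")
    create_message(store, "alice", "Calendar", "0001.msg", "Staff meeting")
    layout = list_messages(store, "alice")
    print(layout)
```

Because every message is an ordinary file, any tool that can walk a directory tree can enumerate, copy, or inspect the store.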
Using the file system offers a range of advantages:
- Speed - the store is very fast, not having multiple levels of indirection between the email and the storage subsystem, enabling use of lower cost storage.
- Simplicity of backup - backup becomes extremely simple and can be carried out using any standard file-server backup tool. It is live (no freeze/replicate needed), incremental, and granular down to the message (file) level, and as simple as backing up a file server. There is no need for snapshot-capable hardware, or for specialized Exchange backup or archiving tools.
- Backup is incremental - most file backup tools will easily perform an incremental backup on PostPath's store; backing up messages that have changed since the previous day is simply a matter of backing up files changed from the previous day.
- Restore is granular - restore a single message by restoring a single file, a folder by restoring a folder, a whole user by restoring that user's folder and subfolders, or the whole store simply by moving back the folder tree that contains all the users. There is no need to mount the backup in a PostPath Server - the PostPath file-browser and restore tool can restore files directly from a copy of the old file-set, or even from a subset of the file-set if you know which files or folders you are interested in.
- Reliability - files are independent of one another, so a bad sector would (typically) affect just one message.
- Exploit capabilities of modern filing systems and storage systems. Modern filing systems support features like journaling (for example, used to replay operations following a power cut), replication (data could be replicated to an offsite location), clustering, semi-offline storage (low-cost storage for rarely accessed files), and so on. These features are available on Linux filing systems including ReiserFS and XFS, and on Network-Attached Storage (NAS) as well as on Storage Area Networks (SANs).
- Easy to build low-cost clusters. Given a clustered and/or highly available file system, an active/passive pair of servers in front of the file system provides simple, efficient, and highly available clustering. In the event that the active server goes offline, the passive server can pick up its connections in less than a minute.
- For branch offices, offsite replication can be combined with remote backup. It works like this: if a branch-office PostPath server's storage subsystem replicates its data to HQ, then backup and restore can be run purely on the HQ replicated data, removing the need to reach down to the remote office server to perform these tasks.
- Reduce/eliminate PST files. The very high performance enables the use of much lower cost storage; combined with the efficient backup and restore capabilities, it also enables much larger user stores. Today, administrators limit the size of mailboxes, explicitly or implicitly encouraging users to move their data to local PST files. Those PST files then end up getting copied to network drives, where they are backed up - and of course that backup cannot be incremental: after a single change to a PST file, the whole file has to be backed up again.
- Since the storage for the PostPath Server can be just as cost effective as for an ordinary file server, and since backup is just as easy as on an ordinary file server, enterprises can dispense with PST files and simply give everyone a larger mailbox. Using the PostPath Server, this change simply swaps one kind of file storage (of PST files) for another (file store of PostPath Server). Administrators have better control over the data, and users have a better service.
- Single-instance store at the file-system level. PostPath uses single-instance store for large data objects attached to messages, or even for large email bodies. The PostPath Server separates each such large object into a file by itself and allows it to be linked from multiple places. This "single-instance store" behavior both saves storage space and provides much higher performance, since the file needs to be written only once, not many times.
- Similar to single-instance storage is single-instance transmission. PostPath uses the Postfix mail-transport-agent (MTA) for sending email to, and receiving email from, other email servers. Postfix carefully supports sending only one instance of a given message to each server even if there are multiple recipients of the message on the given server; this is supported whether the receiving server is another PostPath Server, a Microsoft Exchange server, or some other generic SMTP receiver. Again, this saves money by making the best possible use of the available bandwidth. Postfix is widely used in Internet infrastructure, and is recognized as one of the most efficient mail routing agents in existence.
- For even more efficient storage use, PostPath also supports data compression for large data objects. Alternatively, compression can be handled at the filing-system level by a self-compressing filing system.
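The incremental-backup and granular-restore points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PostPath's own tooling: the function names (incremental_backup, restore_message), the directory names, and the .msg extension are hypothetical, and a real deployment would use a standard file-server backup tool rather than a script like this. The sketch selects files by modification time, copies only those changed since a cutoff, and then restores one message by copying one file back.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path


def incremental_backup(store: Path, backup: Path, since: float) -> list[Path]:
    """Copy files modified at or after `since` (epoch seconds) into `backup`,
    preserving relative paths. Returns the relative paths that were copied."""
    copied = []
    for path in sorted(store.rglob("*")):
        if path.is_file() and path.stat().st_mtime >= since:
            rel = path.relative_to(store)
            target = backup / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(rel)
    return copied


def restore_message(backup: Path, store: Path, rel: Path) -> None:
    """Restore a single message by copying a single file back into the store."""
    target = store / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(backup / rel, target)


with tempfile.TemporaryDirectory() as tmp:
    store, backup = Path(tmp, "store"), Path(tmp, "backup")
    inbox = store / "alice" / "Inbox"  # hypothetical user and folder names
    inbox.mkdir(parents=True)
    old_msg = inbox / "0001.msg"
    new_msg = inbox / "0002.msg"
    old_msg.write_text("old message")
    new_msg.write_text("new message")

    # Pretend the first message was last modified two days ago.
    now = time.time()
    two_days_ago = now - 2 * 86400
    os.utime(old_msg, (two_days_ago, two_days_ago))

    # Back up only what changed in the last day.
    copied = incremental_backup(store, backup, since=now - 86400)

    # Simulate accidental deletion, then restore just that one message.
    new_msg.unlink()
    restore_message(backup, store, copied[0])
    restored_text = new_msg.read_text()
    print([p.as_posix() for p in copied], restored_text)
```

Only the message changed within the last day is copied, and restoring it touches no other file in the store, which is the property the file-per-message model provides.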