PostPath's Approach to Storage
To achieve a significant improvement in performance, allow the email system to scale cost-effectively, and decrease system management and administration overhead, it is necessary to adopt a different approach to storage: one that minimizes or completely eliminates the problems listed above. PostPath has done just that. After a review of existing storage approaches, we chose a storage architecture that eliminates the database entirely and leverages the speed, flexibility, reliability, and efficiency of modern filers and filing systems.
Now, rather than having to build an interlocking system of expensive hardware, complex software, and subtle procedures that attempt to contain issues with the Jet database, the PostPath Server™ solves the problem at the source, making the server's storage as easy to manage and maintain as any file server's. With this file-based approach, there is no intermediate database to fragment or become corrupted. Under the filing-system storage model, each user has their own folder within the store, and each user's folder contains subfolders corresponding to Calendar, Inbox, and so on. Within an individual subfolder, each message is represented by a file. Thus the basic approach of the store is "one file per message". Links are used to avoid duplicating file data (i.e. the store is a true "single instance" store).
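The "one file per message" layout with links can be illustrated with a minimal sketch. It is not the PostPath implementation: the `deliver` function, the `<user>/<folder>/<id>.msg` naming, and the use of hard links for whole messages are all assumptions made for illustration. It shows only the general idea that data is written once and then given additional names in other users' folders.

```python
import os
import tempfile
from pathlib import Path


def deliver(store: Path, recipients: list[str], folder: str,
            msg_id: str, body: bytes) -> list[Path]:
    """Write a message once, then hard-link it into each further recipient's folder.

    Hypothetical layout: <store>/<user>/<folder>/<msg_id>.msg
    """
    paths = []
    first = None
    for user in recipients:
        target_dir = store / user / folder
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / f"{msg_id}.msg"
        if first is None:
            target.write_bytes(body)  # the data is written exactly once
            first = target
        else:
            os.link(first, target)    # a second name for the same on-disk data
        paths.append(target)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = deliver(Path(tmp), ["alice", "bob"], "Inbox", "0001", b"Hello")
        # Both paths refer to the same underlying file.
        print(os.path.samefile(paths[0], paths[1]))
```

Hard links only work within a single file system, so a store spread across several file systems would need a different linking scheme.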
Similarly, leveraging the file store in this way offers a significant performance improvement and potential cost savings. Performance improves because the file system does not have the multiple levels of indirection between the email and the storage subsystem that are common with the Jet approach. Cost savings come from being able to use moderate or even low-performance storage systems, rather than the high-performance systems required by Jet.
In the following sections we will review how this file-based approach works with various storage technologies such as direct attached storage (DAS), network attached storage (NAS), and storage area networks (SAN). The storage technology that you choose (or have chosen) for your email and collaboration system will ultimately depend on how you weigh performance and availability against the cost and complexity of the implementation, given the size of your organization and its requirements.
Simply put, using the filing system for storage offers a range of significant advantages that we have categorized below:
- Speed - The store is very fast because it avoids multiple levels of indirection between the email and the storage subsystem. It delivers high performance even on low-cost mass storage.
- Simplicity of backup - Backup becomes as simple as backing up a file server and can be carried out using any standard file-server backup tool. Backup is live (no freeze/snapshot step is required), incremental, and granular down to the message (file) level.
- Backup is incremental - Most file backup tools will easily perform an incremental backup on the PostPath message store; backing up messages that have changed since the previous day is simply a matter of choosing to back up files that have changed since the previous day.
- Restore is granular - Restore a single message by restoring a single file, a folder by restoring a folder, a whole user by restoring that user's folder and subfolders, or the whole store simply by restoring the folder tree that contains all the users.
- No need to load the backup into a standby email server - The PostPath file-browser and restore tool can restore files directly from a copy of the old file-set or from a subset of the file-set to the live running server.
- Restore via command-line, graphical or web UI - The PostPath Backup and Restore Tool supports graphical browsing of both the live and the backup copy of the store; actions can be scripted via the command-line or carried out using the graphical or web interface.
- Flexibility to leverage the capabilities of modern filing and storage systems - Modern filing systems support features like journaling (used, for example, to replay operations following a power cut), replication (data could be replicated to an offsite location), clustering, semi-offline storage (low-cost storage for rarely accessed files), and so on. These features are available on Linux filing systems including ReiserFS and XFS, and on Network-Attached Storage (NAS) as well as on Storage Area Networks (SANs).
- File system support - The PostPath Server™ is agnostic to what filing system is being run, and supports Linux journaling filing systems like XFS, ReiserFS, Reiser4, Ext3 etc. as well as remote NFS-mounted NAS or SAN partitions and more specialized filing systems such as Veritas CFS. It also supports virtualized storage layers, including the LVM/LVM2 logical volume manager used in Red Hat Enterprise Linux. It is also quite possible and practical to run multiple filing systems for a single server, if desired.
- Reliability - Messages are independent of one another, so a bad sector on a disk would (typically) affect just one message rather than initiate the kind of page-level corruption that can cause an Exchange™ Jet database to fail over time. Similarly, file system clusters can be built to achieve any desired level of file system reliability. Customers can take advantage of clustered Linux filing systems like GFS, of Linux replication technologies like DRBD or ENBD, or of commercially available high-availability file storage systems.
- Easy-to-build low-cost server clusters - To create a clustered and/or highly available file system, we recommend an Active/Passive pair of servers in front of the file system to provide simple, efficient, and highly available clustering. In the event the active server goes down, the passive server will pick up its connection in less than a minute.
- Single-instance store at the file system level. PostPath uses single-instance store for large data objects attached to messages or even for large email bodies. The PostPath Server™ separates each such large object into a file by itself and allows it to be linked from multiple places. This "single-instance store" behavior both saves storage space and provides much higher performance, since the file needs to be written only once, not many times.
- For branch offices, offsite replication can be combined with remote backup. It works like this: if the storage subsystem of a branch-office PostPath Server™ replicates its data to HQ, then backup and restore can be run purely on the HQ replicated data, removing the need to reach down to the remote office server to perform these tasks. Customers can also configure a passive server at HQ, so that, in the event of system failure at the remote office, HQ can support the remote office's email needs.
- Eliminate PST files. In many Outlook™/Exchange™ environments, the Exchange™ mailboxes are kept small, because Exchange™ storage is expensive and its backup is difficult. Users will then use a local "PST" file in Outlook™ to retain and manage their email data. Users also copy PST files to network drives for backup, creating another backup headache since PST files are completely non-incremental for backup purposes (touch one email, and the whole PST has to be backed up again). With PostPath Server™, mailboxes can be much larger, since they are not constrained by cost, backup complexity, or performance limitations. This creates the opportunity for users to keep their data in their server mailbox instead of in PST files, and enables administrators to take advantage of incremental backup of this data, much of which does not change frequently.
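The incremental backup and granular restore described in the list above follow directly from the one-file-per-message layout. The sketch below is a rough illustration, not the PostPath Backup and Restore Tool. The function names `changed_since` and `copy_files` and the example paths are hypothetical, and it assumes that file modification times are a reliable indicator of change.

```python
import os
import shutil
import tempfile
from pathlib import Path


def changed_since(store: Path, cutoff: float) -> list[Path]:
    """Return store-relative paths of files modified at or after `cutoff` (epoch seconds)."""
    return sorted(
        p.relative_to(store)
        for p in store.rglob("*")
        if p.is_file() and p.stat().st_mtime >= cutoff
    )


def copy_files(src_root: Path, dst_root: Path, rel_paths: list[Path]) -> None:
    """Copy the given relative paths from one tree to another, keeping timestamps."""
    for rel in rel_paths:
        dst = dst_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_root / rel, dst)
```

An incremental backup is then `copy_files(store, backup, changed_since(store, cutoff))`, and restoring one message is `copy_files(backup, store, [Path("alice/Inbox/0002.msg")])`, where the path is a made-up example. A plain copy like this does not preserve hard links between files, so a real backup tool would need to handle linked files explicitly to keep the single-instance property.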