Weaknesses of Current Enterprise Email Storage

A wide range of demands is placed on modern email servers. In addition to providing high availability and performance, the email store must be flexible, scalable, and cost-effective. Email content and its attachments will continue to grow significantly: as the media we use to communicate become richer, storage requirements increase.

Faced with this growth, many IT professionals have been forced to accept that the storage resources allocated to users must be severely limited. The pressure to limit email storage comes from two primary sources: i) the expense of the storage, driven by the need for high performance, sometimes combined with proprietary failover mechanisms; and ii) the difficulty of backup, especially the backup of large stores.

Yet, despite the limited storage made available to users, email server weaknesses often force the purchase of expensive storage systems to ensure availability, uptime, acceptable performance, and an acceptable level of backup convenience. Furthermore, users are often denied the ability to recover individual lost messages because restore operations are so complex. To work around these problems, users often create their own archives in PST files stored on their local machines. These PST files create security and compliance risks, can waste a significant amount of time, and bring their own backup and restore issues, since backing them up is neither incremental nor granular (the whole PST is backed up as a single object).

Despite common belief, solving these problems should not require buying a more expensive storage system and instituting complex redundancy procedures. Instead, it requires an overhaul of the basic approach the email server takes to storing email.

1. Performance Efficiency and Scalability
The database is often at the heart of the scalability issues. As an example, the MS Exchange™ Jet database requires the storage sub-system to perform large numbers of separate I/O operations to complete a single transaction. This can "thrash" the disk subsystem as it reads and re-reads, writes and re-writes the data on the disks, creating performance and scalability issues, as well as driving the cost of storage subsystems.

While there are some excellent high-end commercial storage systems on the market, many can be expensive to purchase, implement, and support. If they are to be used, purchasing these systems should be part of an overall desire to optimize the infrastructure rather than a requirement to compensate for a deficiency created by the email server.

2. Backup Operations
The Jet database approach also complicates backup and restore operations. A complete backup of an email server using Jet requires that it be stopped and snapshotted, with subsequent post-backup journal replays - a substantially different approach from the more natural one of creating backups live, with the server running; incrementally (backing up only what has changed since the last backup); and granularly, down to the message level.
Jet backup problems grow worse as the size of the store increases, in part because achieving any kind of granular backup becomes harder and harder (essentially, the "granules" get larger and hence less granular). Additionally, the time required to back up the system grows longer and longer, until it overruns the time available to carry out the backup.
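
The overrun described above is simple arithmetic, sketched below. This is only an illustration: the function names, the throughput figure, and the length of the backup window are assumptions chosen for the example, not measurements of any real system.

```python
def backup_hours(store_gb: float, throughput_gb_per_hour: float) -> float:
    """Time needed to copy the whole store at a steady throughput."""
    return store_gb / throughput_gb_per_hour


def overruns_window(store_gb: float, throughput_gb_per_hour: float,
                    window_hours: float) -> bool:
    """True when a full backup no longer fits in the available window."""
    return backup_hours(store_gb, throughput_gb_per_hour) > window_hours


# Hypothetical figures: 100 GB/hour throughput and a 6-hour nightly window.
fits_today = not overruns_window(400, 100, 6)   # 4 hours, fits
fails_later = overruns_window(800, 100, 6)      # 8 hours, overruns
```

Because the whole store is copied each time, the backup time grows in step with the store while the window stays fixed.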
Again, products from a variety of vendors can be added to the environment to mitigate these issues, but at significant expense and complexity they address the symptoms of the problem rather than its root causes. The underlying causes still tend to drive unnecessary restrictions on both mailbox size and the restore services provided for users.
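
To make the "more natural" approach concrete, the following is a minimal sketch of an incremental, message-level backup. It is not the method of any particular product: the store layout (a dictionary of message ID to raw bytes), the manifest of content hashes, and the function name are all hypothetical, chosen only to show the idea.

```python
import hashlib


def incremental_backup(messages: dict, manifest: dict) -> dict:
    """Return only the messages that are new or changed since the last run.

    messages: message ID -> raw message bytes (hypothetical store layout).
    manifest: message ID -> SHA-256 digest recorded at the previous backup;
              it is updated in place so the next run sees the new state.
    """
    changed = {}
    for msg_id, body in messages.items():
        digest = hashlib.sha256(body).hexdigest()
        if manifest.get(msg_id) != digest:
            changed[msg_id] = body      # back up just this one message
            manifest[msg_id] = digest
    return changed


# The server keeps running; each run touches only what has changed.
manifest = {}
store = {"msg-1": b"Quarterly report", "msg-2": b"Lunch on Friday?"}
first_run = incremental_backup(store, manifest)    # both messages
store["msg-3"] = b"Minutes attached"
second_run = incremental_backup(store, manifest)   # only msg-3
```

Because each message is its own backup object, restoring a single lost message means retrieving one entry rather than replaying an entire database.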

3. Restore Operations
The purpose of creating backup records in the first place is to allow records to be easily restored if they are accidentally lost, deleted, or required for compliance or other regulatory purposes. In this case, the Jet approach suffers from even worse difficulties, especially when the administrator is seeking to restore a relatively small amount of data without bringing down the live server. To complete this operation, the system administrator is often required to take the following steps:

This seems overly complex simply to restore a single message, and as a result many IT organizations simply don't restore users' messages that were accidentally deleted. Again, there are tools and systems from a variety of vendors that can partially help with these issues, but most organizations would prefer a solution that does not require them to integrate additional third-party tools.

4. Database Corruption
Because of the way it stores information, data within Jet can become corrupt. This corruption can occur in a variety of ways and generally spreads when Exchange™ mis-assigns a database ID. Over a period of time (days, weeks, or even months), the corruption spreads within the database until the system finally crashes. The natural remedy for this problem is to restore from the most recent backup. Unfortunately, when a recent backup is restored, it too often contains a less widespread form of the same corruption, and so as the corruption spreads again the system crashes once more. Depending on what backups are available, it can be difficult or even impossible to rebuild a working system.

5. Database Compaction
Because of the way information is stored, Exchange™ requires the Jet database to be compacted at various intervals depending on desired performance parameters and the volume of information that is changing. Though there are tools to help here, most often the entire database must be taken off-line for this operation to be completed. Compacting the database removes white space and also attempts to defragment individual data objects. Exchange™ compaction requires a complete copy of the database to be made on disk - effectively doubling the required size of the storage to complete this operation. This can significantly increase the amount of storage required (and not available to users) just to keep the system running, as well as increasing the administrative complexity and total cost of ownership.
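
The storage cost of compaction can be worked through with a short sketch. It assumes, as described above, that compaction needs room for one complete copy of the database on the same volume; the function names and the volume size are illustrative only.

```python
def compaction_headroom_gb(database_gb: float) -> float:
    """Free space needed if compaction writes a complete copy of the database."""
    return database_gb


def max_database_gb(volume_gb: float) -> float:
    """Largest database that can still be compacted in place on the volume."""
    return volume_gb / 2


# Hypothetical 1,000 GB volume: only half of it can hold live mail.
usable = max_database_gb(1000)
reserved = compaction_headroom_gb(usable)
```

Under this assumption, half of the purchased capacity is held in reserve for maintenance rather than offered to users.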

6. Disaster Recovery
Disaster recovery from any of these scenarios involves a set of complex and time-consuming steps. The recommended method involves maintaining a mirrored "parallel universe" of your Domain/AD and Exchange™ infrastructure. It involves building a backup domain controller on your existing network, moving it to an isolated LAN, and promoting it to the PDC. You must then meticulously build an Exchange™ server on that isolated LAN from scratch, without a single spelling mistake, using settings identical to those of your production environment. Only then can this parallel universe be used in the event of a catastrophic failure of the production Exchange™ environment. This procedure is complex, prone to error, and expensive - and now, largely unnecessary.