The PostPath architecture supports full system redundancy, with hot or cold failover, by using active-passive server pairs. Typically, the active server responds to all user requests while the passive server waits in hot standby. If the active server fails, the standby server can take over. With hot failover configured, the switchover takes a few tens of seconds; otherwise, the time depends on how quickly the system administrator issues the switchover command.
The active-passive server pairs take advantage of the file-system nature of the store in one of two ways: either both servers sit in front of the same redundant file-system cluster (or the same redundant NAS or SAN), or each server has its own copy of the file store, with replication carried out between the two copies.
There are essentially two modes of file-system redundancy that a deployment might need.
1. Local redundancy. In this mode, the active and passive servers are located close to one another, with high-bandwidth LAN links between them. Here it is most common to use a single logical clustered file system that either server can access. For direct-attached storage (DAS), we suggest two options:
a. GFS: GFS is an advanced clustered file system for Linux, supported by Red Hat and a number of other distributions. GFS is configured as a highly available file store, and the active and passive servers are both attached to the same GFS cluster. The GFS servers can be separate machines, or alternatively, the active and passive servers can themselves provide the GFS capability to each other.
b. ENBD and RAID-1: In this configuration, the active and passive servers each have their own dedicated disk storage. The passive server's storage is mounted at the active server using ENBD (the Enhanced Network Block Device), and a software RAID-1 array is then created at the active server spanning both its local storage and the storage mounted via ENBD. The active server's RAID-1 (mirroring) facility then keeps the passive server's storage in sync with its own. If the active server fails, the passive server takes over. When the active server is brought back online and the passive server reverts to being passive, RAID-1 brings the disks back in sync with one another, as in normal RAID operation.
GFS is the more sophisticated solution and can be scaled up to multiple levels of redundancy, while ENBD has the advantage of simplicity.
2. Remote or asynchronous redundancy (offsite replication). Here, the active server may be located on-site while the passive server is located off-site. If the network link between them has relatively low bandwidth, GFS or ENBD would still work. However, because those approaches keep two (or more) copies of the store in perfect synchronization, they can slow writes down dramatically in the low-bandwidth case: a write does not complete, as far as the user is concerned, until it has been written successfully to both the local and the remote copy of the store. The network bandwidth therefore becomes a bottleneck for every write operation (although GFS and ENBD still allow high-speed reads in this situation).
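The write-path cost of keeping copies in perfect synchronization, as in the ENBD and RAID-1 setup, can be illustrated with a toy model. This is a minimal Python sketch, not the behavior of any real driver: the `Disk` and `SyncMirror` classes and the `local_ms`/`remote_ms` latency figures are hypothetical, and it assumes both copies are written in parallel, so the user-visible write latency is that of the slower copy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Disk:
    """A toy block store mapping block numbers to data."""
    blocks: dict[int, bytes] = field(default_factory=dict)


@dataclass
class SyncMirror:
    """Hypothetical model of synchronous mirroring: a write is acknowledged
    only after both the local and the remote copy hold it.

    local_ms / remote_ms are assumed per-write latencies in milliseconds for
    the local disk and for the remote copy (network round trip included).
    """
    local: Disk
    remote: Disk
    local_ms: float
    remote_ms: float

    def write(self, block: int, data: bytes) -> float:
        """Write to both copies and return the user-visible latency in ms."""
        self.local.blocks[block] = data
        self.remote.blocks[block] = data
        # Both writes are assumed to be issued in parallel; the acknowledgement
        # waits for the slower one, so a slow link sets the floor for writes.
        return max(self.local_ms, self.remote_ms)

    def read(self, block: int) -> bytes:
        """Reads are served from the local copy, so the link does not slow them."""
        return self.local.blocks[block]


lan = SyncMirror(Disk(), Disk(), local_ms=5.0, remote_ms=5.5)
wan = SyncMirror(Disk(), Disk(), local_ms=5.0, remote_ms=80.0)
print(lan.write(0, b"msg"))  # 5.5
print(wan.write(0, b"msg"))  # 80.0
```

With these hypothetical figures, the same write costs 5.5 ms when the copies share a LAN but 80 ms over a slow link, while reads from the local copy are unaffected.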
The PostPath Server™'s active/passive redundancy does not, in fact, require the two stores to be in perfect sync, which allows the use of another technology for offsite replication: DRBD, part of the Linux-HA (Linux High Availability) collection of technologies. With DRBD, writes to the active server's storage are replicated to the passive server in the same order in which they were made at the active server, but they are allowed to complete at the active server before replication finishes. Write (as well as read) performance is therefore determined by the speed of the active server's storage, not by the speed of the network link between the two servers. DRBD works at the block-device level, and a conventional file system such as XFS is typically layered on top of the DRBD device. DRBD can hence be thought of as "offsite backup to the passive server, as near to real time as the network bandwidth allocation will allow." During periods of heavy load, the passive server's copy of the store might fall a few seconds or even minutes behind; it can do so without slowing down the user experience, and the copy still remains in a valid state if the passive server needs to be brought up.
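The asynchronous, order-preserving replication described above can be sketched as a queue that drains to the passive copy. This is a minimal Python model written for illustration, not DRBD's implementation: `AsyncReplicator` and its `replicate(budget)` method are hypothetical, with `budget` standing in for how many queued writes the link can carry in one interval.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class AsyncReplicator:
    """Hypothetical model of asynchronous, in-order replication.

    A write completes as soon as the primary (active) copy is updated; it is
    then queued and applied to the secondary (passive) copy strictly in the
    order it was submitted.
    """
    primary: dict[int, bytes] = field(default_factory=dict)
    secondary: dict[int, bytes] = field(default_factory=dict)
    pending: deque[tuple[int, bytes]] = field(default_factory=deque)

    def write(self, block: int, data: bytes) -> None:
        # Only the local store is on the write path, so this returns at once.
        self.primary[block] = data
        self.pending.append((block, data))

    def replicate(self, budget: int) -> int:
        """Ship up to `budget` queued writes in submission order and return
        how many are still waiting (the passive copy's lag)."""
        for _ in range(min(budget, len(self.pending))):
            block, data = self.pending.popleft()
            self.secondary[block] = data
        return len(self.pending)


r = AsyncReplicator()
r.write(0, b"old")
r.write(0, b"new")
r.write(1, b"x")
print(r.replicate(budget=1))  # 2
print(r.secondary)            # {0: b'old'}
```

Because queued writes are applied strictly in order, the passive copy always matches some earlier state of the active copy rather than a mix of old and new blocks, which is why it stays usable even while it lags.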
If the active server fails, a copy of the PostPath Server™ on the passive (backup) server can come online. Any data that was written to the active server but not yet replicated to the passive server is not visible until the active server is brought back online; for any individual user, this is likely to be a small amount of data. When the active server is recovered and the passive server reverts to being passive, the two stores resynchronize. Failover from the active to the passive server can be triggered automatically by a heartbeat mechanism or carried out manually following an alert.
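The choice between automatic and manual failover can be sketched as a simple timeout check run on the passive server. This is a minimal Python illustration, not PostPath's or Linux-HA's actual mechanism: `HeartbeatMonitor`, its `deadtime` threshold, and the `auto` flag are hypothetical names, and time is passed in explicitly so the logic is easy to test.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical failover check run on the passive server.

    deadtime: seconds without a heartbeat before the active server is
    presumed dead. auto: True for automatic (hot) failover, False to only
    raise an alert and wait for the administrator.
    """
    deadtime: float
    auto: bool = True
    last_seen: float = 0.0
    role: str = "passive"

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the active server at time `now`."""
        self.last_seen = now

    def check(self, now: float) -> str:
        """Return the current role, or "alert" if manual failover is needed."""
        silent_for = now - self.last_seen
        if self.role != "passive" or silent_for <= self.deadtime:
            return self.role
        if self.auto:
            # Automatic failover: a real system would mount the passive copy
            # of the store and start the server at this point.
            self.role = "active"
            return self.role
        # Manual failover: stay passive until an administrator intervenes.
        return "alert"


m = HeartbeatMonitor(deadtime=10.0)
m.heartbeat(now=0.0)
print(m.check(now=5.0))   # passive
print(m.check(now=12.0))  # active
```

A short `deadtime` shortens the outage but risks failing over on a momentary network hiccup; a longer one trades recovery time for fewer false alarms.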
With offsite replication, if the passive server is located at a main operations center, backups can, if desired, be taken entirely from the passive server, whether the stores are kept in step with GFS, ENBD, or DRBD; there is no need to access the active server to make a backup.