Tandem systems are fault-tolerant computer systems used for ATM networks, banks, stock exchanges, telephone switching centers, and similar commercial transaction processing applications that require maximum uptime and zero data loss. They were originally manufactured by Tandem Computers, Inc. Compaq acquired Tandem in 1997, and the product line moved into the server division of Hewlett-Packard when HP merged with Compaq in 2002.

What is a fault-tolerant computer system?

This is the first question I get whenever I talk about Tandem systems. The answer is easy to see once you consider how much time and data matter to the applications Tandem systems run, which are basically transaction processing applications.

Fault-tolerant systems are designed around a simple concept: they must keep working at the same level of service as they normally do, even in the presence of faults. This property is called fault tolerance.

Tandem systems are designed to handle both hardware failures and software failures. Hardware fault tolerance is the most common application of these systems and is meant to prevent outages caused by failing hardware components. At the most basic level it is provided by redundancy, particularly dual modular redundancy, in which each critical component is duplicated so that one copy can carry on if the other fails.
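To make the redundancy idea concrete, here is a minimal sketch of a mirrored pair of storage units. It is only an illustration under my own assumptions: the names `Disk`, `MirroredPair`, and `ComponentFailure` are hypothetical and are not Tandem interfaces, and a real system detects faults in hardware rather than through a `healthy` flag.

```python
class ComponentFailure(Exception):
    """Raised by a hypothetical component when it detects its own fault."""


class Disk:
    """Hypothetical storage unit; the `healthy` flag simulates a hardware fault."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.blocks = {}

    def write(self, key, value):
        if not self.healthy:
            raise ComponentFailure(self.name)
        self.blocks[key] = value

    def read(self, key):
        if not self.healthy:
            raise ComponentFailure(self.name)
        return self.blocks[key]


class MirroredPair:
    """Writes go to both units; reads are served by any surviving unit."""

    def __init__(self, a, b):
        self.units = [a, b]

    def write(self, key, value):
        written = 0
        for unit in self.units:
            try:
                unit.write(key, value)
                written += 1
            except ComponentFailure:
                continue  # the other unit may still accept the write
        if written == 0:
            raise ComponentFailure("both units failed")

    def read(self, key):
        for unit in self.units:
            try:
                return unit.read(key)
            except ComponentFailure:
                continue  # fall through to the mirror
        raise ComponentFailure("both units failed")


pair = MirroredPair(Disk("disk-a"), Disk("disk-b"))
pair.write("balance:42", 100)
pair.units[0].healthy = False       # simulate a single hardware fault
print(pair.read("balance:42"))      # still answered, by disk-b
```

The caller never learns that disk-a failed. Only a second, simultaneous failure in the same pair makes the data unavailable, which is the trade-off dual redundancy accepts.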

Software fault tolerance is more about nullifying programming errors, using real-time redundancy or static “emergency” subprograms that fill in for programs that crash. There are many ways to handle faults like this, depending on the application and the available hardware.

Tandem’s NonStop systems use a number of independent, identical processors and redundant storage devices and controllers to provide automatic high-speed failover in the case of a hardware or software failure. The first system was the Tandem/16 or T/16, later rebranded NonStop I.

To contain the scope of failures and of corrupted data, Tandem systems are designed as multi-computer systems that have no shared central components, not even main memory. Conventional multi-computer systems typically use shared memory and work directly on shared data objects. NonStop processors instead cooperate by exchanging messages across a reliable fabric, and software takes periodic snapshots of program memory state so that work can be rolled back to a known-good point.
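The combination of message passing and periodic snapshots can be sketched in a few lines. This is a toy model under my own assumptions, not Guardian code: `Primary`, `Backup`, the `inbox` queue, and the sequence-number scheme are all hypothetical names chosen for illustration, and a plain in-process queue stands in for the inter-processor fabric.

```python
from collections import deque


class Backup:
    """Keeps only the snapshots it was sent; shares no memory with the primary."""

    def __init__(self):
        self.inbox = deque()                      # stands in for the message fabric
        self.state = {"total": 0, "last_seq": 0}

    def drain(self):
        # Apply every snapshot message received so far, oldest first.
        while self.inbox:
            self.state = self.inbox.popleft()

    def take_over(self):
        # Resume from the last snapshot that arrived before the failure.
        self.drain()
        return Primary(state=self.state, backup=None)


class Primary:
    def __init__(self, state=None, backup=None):
        self.state = dict(state) if state else {"total": 0, "last_seq": 0}
        self.backup = backup

    def handle(self, seq, amount):
        if seq <= self.state["last_seq"]:
            # A request retried after takeover; it was already applied.
            return self.state["total"]
        self.state["total"] += amount
        self.state["last_seq"] = seq
        if self.backup is not None:
            # Send a *copy* of the state as a message; nothing is shared.
            self.backup.inbox.append(dict(self.state))
        return self.state["total"]


backup = Backup()
primary = Primary(backup=backup)
primary.handle(1, 50)
primary.handle(2, 25)

# Suppose the primary's processor fails here. The backup resumes from the
# last snapshot, and a client retry of request 2 is not applied twice.
survivor = backup.take_over()
after_retry = survivor.handle(2, 25)
after_new = survivor.handle(3, 10)
print(after_retry, after_new)
```

Because the backup holds only copies delivered as messages, a primary that corrupts its own memory cannot damage the backup's state directly. That containment is the point of having no shared memory.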

Besides handling failures well, this “shared-nothing” messaging system design also scales extremely well to the largest commercial workloads.

Each doubling of the total number of processors would double system throughput, up to the maximum configuration of 4000 processors. In contrast, the performance of conventional multiprocessor systems is limited by the speed of some shared memory, bus, or switch. Adding more than 4–8 processors that way gives no further system speedup.
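The difference between the two scaling behaviours can be shown with a deliberately simplified model. The per-processor rate and the bus limit below are made-up illustrative numbers, not Tandem measurements, and the function names are hypothetical. Real systems degrade more gradually than the hard ceiling used here.

```python
def shared_nothing_tps(processors, per_cpu_tps=100):
    """Idealized linear scaling: each processor adds its full throughput."""
    return processors * per_cpu_tps


def shared_bus_tps(processors, per_cpu_tps=100, bus_limit_tps=600):
    """Throughput capped by one shared resource (memory, bus, or switch)."""
    return min(processors * per_cpu_tps, bus_limit_tps)


# Compare the two models as the processor count doubles.
for n in (2, 4, 8, 16):
    print(n, shared_nothing_tps(n), shared_bus_tps(n))
```

With these assumed numbers the shared-bus figure stops growing after six processors, while the shared-nothing figure keeps doubling with every doubling of the processor count.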

Tandem NonStop systems have more often been bought to meet scaling requirements than for extreme fault tolerance. They compete well against IBM’s largest mainframes, despite being built from simpler minicomputer technology.

The Tandem NonStop series ran a custom operating system that was significantly different from Unix. It was initially called T/TOS (Tandem Transactional Operating System) but was soon renamed Guardian for its ability to protect all data from machine faults and software faults.

Unlike most other commercial operating systems of its time, Guardian used message passing as the basic way for all processes to interact, without shared memory, regardless of where the processes were running. This approach scaled easily to multiple-computer clusters and helped isolate corrupted data before it could propagate.