72 relations: ACM Computing Surveys, Airbag, Availability, Chernobyl disaster, Computer, Computer hardware, Control reconfiguration, Corrosion, Crossbar switch, Damage tolerance, Data redundancy, Data storage, Defence in depth, Dual modular redundancy, Ecological resilience, Elegant degradation, Error-tolerant design, Fail-deadly, Fail-fast, Fail-safe, Failing badly, Failover, Failure, Failure semantics, Failure transparency, Fatigue (material), Fault detection and isolation, Fault-tolerant computer system, Firewall (computing), Forward compatibility, Graceful exit, Gravity, High availability, Hot swapping, HTML, Human error, Human spaceflight, John von Neumann, List of system quality attributes, Lockstep (computing), Mean time between failures, Mean time to repair, National Institute of Standards and Technology, NonStop (server computers), Nuclear reactor, Parallel computing, Peter J. Denning, Progressive enhancement, Quorum, RAID, ..., Redundancy (engineering), Replication (computing), Resilience (engineering and construction), Resilience (network), Response time (technology), Reversion (software development), Robustness (computer science), Safe-life design, Safety-critical system, Seat belt, Self-stabilization, Single point of failure, Software, Software brittleness, Synchronization (computer science), System, Tandem Computers, Throughput, Triple modular redundancy, Uptime, Web browser, Western Electric.
ACM Computing Surveys (CSUR) is a peer reviewed scientific journal published by the Association for Computing Machinery.
An airbag is a type of vehicle safety device and is an occupant restraint system.
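In reliability theory and reliability engineering, availability is the degree to which a system is in a specified operable state when it is called upon, often expressed as the proportion of time the system is functioning.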
The Chernobyl disaster, also referred to as the Chernobyl accident, was a catastrophic nuclear accident.
A computer is a device that can be instructed to carry out sequences of arithmetic or logical operations automatically via computer programming.
Computer hardware includes the physical parts or components of a computer, such as the central processing unit, monitor, keyboard, computer data storage, graphic card, sound card and motherboard.
Control reconfiguration is an active approach in control theory to achieve fault-tolerant control for dynamic systems.
Corrosion is a natural process that converts a refined metal into a more chemically stable form, such as its oxide, hydroxide, or sulfide.
In electronics, a crossbar switch (cross-point switch, matrix switch) is a collection of switches arranged in a matrix configuration.
Damage tolerance is a property of a structure relating to its ability to sustain defects safely until repair can be effected.
In computer main memory, auxiliary storage and computer buses, data redundancy is the existence of data that is additional to the actual data and permits correction of errors in stored or transmitted data.
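As an illustration of redundancy that permits correction rather than mere detection, the sketch below implements a Hamming(7,4) code: three parity bits are added to every four data bits, and the pattern of failed parity checks (the syndrome) gives the position of any single flipped bit. The function names are hypothetical; hardware ECC memory generally applies the same principle to wider words.

```python
from typing import List


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c: List[int]) -> List[int]:
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```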
Data storage is the recording (storing) of information (data) in a storage medium.
Defence in depth (also known as deep or elastic defence) is a military strategy that seeks to delay rather than prevent the advance of an attacker, buying time and causing additional casualties by yielding space.
In reliability engineering, dual modular redundancy (DMR) is when components of a system are duplicated, providing redundancy in case one should fail.
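A minimal DMR sketch, assuming each replica is a Python callable: both copies process the same input and their outputs are compared. With only two votes, a mismatch shows that a fault occurred but not which copy is faulty, so this version simply reports the disagreement. The names `dmr_execute` and `ReplicaMismatch` are hypothetical.

```python
class ReplicaMismatch(Exception):
    """Raised when the two replicas disagree; DMR alone cannot tell which is right."""


def dmr_execute(replica_a, replica_b, *args):
    """Run the same computation on two replicas and compare the results."""
    result_a = replica_a(*args)
    result_b = replica_b(*args)
    if result_a != result_b:
        # Two copies can detect a disagreement but cannot outvote a faulty one.
        raise ReplicaMismatch(f"replicas disagree: {result_a!r} != {result_b!r}")
    return result_a
```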
In ecology, resilience is the capacity of an ecosystem to respond to a perturbation or disturbance by resisting damage and recovering quickly.
Elegant degradation is a term used in engineering to describe what occurs to machines which are subject to constant, repetitive stress.
An error-tolerant design (also: human-error-tolerant design) is one that does not unduly penalize user or human errors.
Fail-deadly is a concept in nuclear military strategy that encourages deterrence by guaranteeing an immediate, automatic, and overwhelming response to an attack.
In systems design, a fail-fast system is one which immediately reports at its interface any condition that is likely to indicate a failure.
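A small fail-fast sketch (the function `parse_port` is hypothetical): instead of silently substituting a default when a configuration value looks wrong, the function rejects it at its interface, so the error surfaces where it originated rather than later as a confusing downstream failure.

```python
def parse_port(value: str) -> int:
    """Parse a TCP port number, failing immediately on anything invalid."""
    port = int(value)  # raises ValueError at once for non-numeric input
    if not 1 <= port <= 65535:
        # Report the problem now instead of passing a bad port further along.
        raise ValueError(f"port out of range: {port}")
    return port
```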
A fail-safe in engineering is a design feature or practice that, in the event of a specific type of failure, inherently responds in a way that will cause no or minimal harm to other equipment, the environment or to people.
Failing badly and failing well are concepts in systems security and network security (and engineering in general) describing how a system reacts to failure.
In computing and related technologies such as networking, failover is switching to a redundant or standby computer server, system, hardware component or network upon the failure or abnormal termination of the previously active application, server, system, hardware component, or network.
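Failover is often handled by infrastructure (heartbeats, virtual IP addresses, cluster managers), but the core idea can be sketched on the client side. This sketch assumes each server is represented by a hypothetical zero-argument callable, ordered from the active server to its standbys, and that any exception counts as a failure of that server.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(servers: Sequence[Callable[[], T]]) -> T:
    """Try the active server first; on failure, switch to the next standby."""
    errors = []
    for server in servers:
        try:
            return server()
        except Exception as exc:  # assumption: any exception means this server failed
            errors.append(exc)
    raise RuntimeError(f"all {len(servers)} servers failed: {errors!r}")
```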
Failure is the state or condition of not meeting a desirable or intended objective, and may be viewed as the opposite of success.
In distributed computing, failure semantics is used to describe and classify errors that distributed systems can experience.
In a distributed system, failure transparency refers to the extent to which errors and subsequent recoveries of hosts and services within the system are invisible to users and applications.
In materials science, fatigue is the weakening of a material caused by repeatedly applied loads.
Fault detection, isolation, and recovery (FDIR) is a subfield of control engineering which concerns itself with monitoring a system, identifying when a fault has occurred, and pinpointing the type of fault and its location.
Fault-tolerant computer systems are systems designed around the concepts of fault tolerance.
In computing, a firewall is a network security system that monitors and controls incoming and outgoing network traffic based on predetermined security rules.
Forward compatibility or upward compatibility is a design characteristic that allows a system to accept input intended for a later version of itself.
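A sketch of one common forward-compatibility technique, assuming JSON messages: an older reader checks only for the fields it needs and ignores any it does not recognize, so input produced by a later version that adds fields is still accepted. The field names and `parse_user_v1` are hypothetical.

```python
import json

KNOWN_FIELDS = {"id", "name"}  # fields this (older) version understands


def parse_user_v1(payload: str) -> dict:
    """Parse a user record, ignoring fields added by later versions."""
    record = json.loads(payload)
    missing = KNOWN_FIELDS - set(record)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    # Unknown keys (e.g. ones introduced in a future version) are dropped, not rejected.
    return {key: record[key] for key in KNOWN_FIELDS}
```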
A graceful exit (or graceful handling) is a simple programming idiom wherein a program detects a serious error condition and "exits gracefully" in a controlled manner as a result.
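A minimal graceful-exit sketch (the `run` function and its line-counting task are hypothetical): a serious error such as an unreadable file is caught, reported in one clear message, and turned into a nonzero exit status instead of an uncaught traceback.

```python
import sys


def run(path: str) -> int:
    """Count lines in a file; on a serious error, report it and return a nonzero status."""
    try:
        with open(path, encoding="utf-8") as f:
            line_count = sum(1 for _ in f)
    except OSError as exc:
        # Controlled exit: explain the problem instead of dumping a traceback.
        print(f"error: cannot read {path!r}: {exc}", file=sys.stderr)
        return 1
    print(f"{path}: {line_count} lines")
    return 0
```

A script entry point would typically call `sys.exit(run(path))` so that the status reaches the shell.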
Gravity, or gravitation, is a natural phenomenon by which all things with mass or energy—including planets, stars, galaxies, and even light—are brought toward (or gravitate toward) one another.
High availability is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.
Hot swapping (frequently inaccurately called hot plugging) is replacing or adding components without stopping or shutting down the system.
Hypertext Markup Language (HTML) is the standard markup language for creating web pages and web applications.
Human error has been cited as a primary cause or contributing factor in disasters and accidents in industries as diverse as nuclear power (e.g., the Three Mile Island accident), aviation (see pilot error), space exploration (e.g., the Space Shuttle Challenger disaster and the Space Shuttle Columbia disaster), and medicine (see medical error).
Human spaceflight (also referred to as crewed spaceflight or manned spaceflight) is space travel with a crew or passengers aboard the spacecraft.
John von Neumann (Neumann János Lajos; December 28, 1903 – February 8, 1957) was a Hungarian-American mathematician, physicist, computer scientist, and polymath.
Within systems engineering, quality attributes are realized non-functional requirements used to evaluate the performance of a system.
Lockstep systems are fault-tolerant computer systems that run the same set of operations at the same time in parallel.
Mean time between failures (MTBF) is the predicted elapsed time between inherent failures of a mechanical or electronic system, during normal system operation.
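MTBF is commonly estimated from field or test data as total operating time divided by the number of failures observed. The sketch below assumes that simple point estimate, which is most meaningful when the failure rate is roughly constant; the function name is hypothetical.

```python
def mtbf(total_operating_hours: float, failures: int) -> float:
    """Estimate MTBF as total operating time divided by the number of failures."""
    if failures <= 0:
        raise ValueError("MTBF estimate is undefined with zero observed failures")
    return total_operating_hours / failures
```

For example, ten units that each run 1,000 hours with 4 failures in total give an estimated MTBF of 10,000 / 4 = 2,500 hours.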
Mean time to repair (MTTR) is a basic measure of the maintainability of repairable items.
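MTTR is the average of the observed repair times; combined with MTBF it gives the steady-state (inherent) availability A = MTBF / (MTBF + MTTR). A sketch with hypothetical function names:

```python
from typing import List


def mttr(repair_hours: List[float]) -> float:
    """Mean time to repair: average duration of the observed repair actions."""
    if not repair_hours:
        raise ValueError("no repairs observed")
    return sum(repair_hours) / len(repair_hours)


def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)
```

With repairs of 2, 4 and 6 hours, MTTR is 4 hours; paired with an MTBF of 2,500 hours this gives A = 2500 / 2504, about 0.9984.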
The National Institute of Standards and Technology (NIST) is one of the oldest physical science laboratories in the United States.
NonStop is a series of server computers introduced to market in 1976 by Tandem Computers Inc., beginning with the NonStop product line, which was followed by the Hewlett-Packard Integrity NonStop product line extension.
A nuclear reactor, formerly known as an atomic pile, is a device used to initiate and control a self-sustained nuclear chain reaction.
Parallel computing is a type of computation in which many calculations or the execution of processes are carried out concurrently.
Peter James Denning (born January 6, 1942) is an American computer scientist and writer.
Progressive enhancement is a strategy for web design that emphasizes core webpage content first.
A quorum is the minimum number of members of a deliberative assembly (a body that uses parliamentary procedure, such as a legislature) necessary to conduct the business of that group.
RAID (Redundant Array of Independent Disks, originally Redundant Array of Inexpensive Disks) is a data storage virtualization technology that combines multiple physical disk drive components into one or more logical units for the purposes of data redundancy, performance improvement, or both.
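Parity-based RAID levels such as RAID 4 and RAID 5 store, for each stripe, a parity block equal to the bytewise XOR of the data blocks, so any single lost block equals the XOR of all surviving blocks including parity. This sketch ignores striping layout, parity rotation and disk I/O; the helper names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks (used both to compute and to apply parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: List[bytes]) -> bytes:
    """Reconstruct one lost block from the surviving data blocks plus the parity block."""
    return xor_blocks(surviving)
```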
In engineering, redundancy is the duplication of critical components or functions of a system with the intention of increasing reliability of the system, usually in the form of a backup or fail-safe, or to improve actual system performance, such as in the case of GNSS receivers, or multi-threaded computer processing.
Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.
The resilience concept originated in ecology and was then gradually applied to other fields.
In computer networking, resilience is the ability to provide and maintain an acceptable level of service in the face of faults and challenges to normal operation. Threats and challenges for services can range from simple misconfiguration to large-scale natural disasters and targeted attacks.
In technology, response time is the time a system or functional unit takes to react to a given input.
In software development (and, by extension, in content-editing environments, especially wikis, that make use of the software development process of revision control), reversion or reverting is the abandonment of one or more recent changes in favor of a return to a previous version of the material at hand (typically software source code in the context of application development; HTML, CSS or script code in the context of web development; or content and formatting thereof in the context of wikis).
In computer science, robustness is the ability of a computer system to cope with errors during execution.
In safe-life design products are designed to survive a specific design life with a chosen reserve.
A safety-critical system or life-critical system is a system whose failure or malfunction may result in death or serious injury to people, loss of or severe damage to equipment or property, or environmental harm.
A seat belt (also known as a seatbelt or safety belt) is a vehicle safety device designed to secure the occupant of a vehicle against harmful movement that may result during a collision or a sudden stop.
Self-stabilization is a concept of fault-tolerance in distributed computing.
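Dijkstra's K-state token ring (1974) is the classic example of self-stabilization: from any initial state, the ring converges to a legitimate state in which exactly one machine is privileged (holds the token), and it remains legitimate afterward. The simulation below assumes n machines with K ≥ n states and a central daemon that fires one privileged machine per step, chosen at random here; the function names are hypothetical.

```python
import random
from typing import List


def is_privileged(x: List[int], i: int) -> bool:
    """Dijkstra's K-state rule for whether machine i currently holds the token."""
    if i == 0:
        return x[0] == x[-1]  # bottom machine compares with the last machine in the ring
    return x[i] != x[i - 1]  # others are privileged when they differ from their left neighbour


def move(x: List[int], i: int, k: int) -> None:
    """Fire privileged machine i."""
    if i == 0:
        x[0] = (x[0] + 1) % k  # bottom machine generates a new value
    else:
        x[i] = x[i - 1]  # others copy their left neighbour


def simulate(x: List[int], k: int, steps: int, rng: random.Random) -> List[int]:
    """Let a central daemon (random here) fire one privileged machine per step."""
    x = list(x)
    for _ in range(steps):
        # At least one machine is always privileged, so the choice is never empty.
        privileged = [i for i in range(len(x)) if is_privileged(x, i)]
        move(x, rng.choice(privileged), k)
    return x


def privileged_count(x: List[int]) -> int:
    return sum(is_privileged(x, i) for i in range(len(x)))
```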
A single point of failure (SPOF) is a part of a system that, if it fails, will stop the entire system from working.
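In a network topology, one kind of single point of failure is a node whose loss alone disconnects the remaining nodes (an articulation point). The brute-force sketch below removes each node in turn and checks connectivity with breadth-first search; it assumes an undirected graph given as adjacency sets and models only connectivity, not other dependencies. A depth-first-search algorithm due to Hopcroft and Tarjan finds the same nodes in linear time.

```python
from collections import deque
from typing import Dict, Optional, Set

Graph = Dict[str, Set[str]]


def _connected_without(graph: Graph, removed: Optional[str]) -> bool:
    """BFS with one node taken out; True if the remaining nodes are still connected."""
    nodes = [n for n in graph if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(nodes)


def single_points_of_failure(graph: Graph) -> Set[str]:
    """Nodes whose failure alone disconnects the rest of the (undirected) graph."""
    return {n for n in graph if not _connected_without(graph, n)}
```

In a hypothetical topology where two sites each hang off one switch and the switches are joined by two routers, the redundant routers are not flagged but each switch is.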
Computer software, or simply software, is a generic term that refers to a collection of data or computer instructions that tell the computer how to work, in contrast to the physical hardware from which the system is built, which actually performs the work.
In computer programming and software engineering, software brittleness is the increased difficulty in fixing older software that may appear reliable, but fails badly when presented with unusual data or altered in a seemingly minor way.
In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes, and synchronization of data.
A system is a regularly interacting or interdependent group of items forming an integrated whole.
Tandem Computers, Inc. was the dominant manufacturer of fault-tolerant computer systems for ATM networks, banks, stock exchanges, telephone switching centers, and other similar commercial transaction processing applications requiring maximum uptime and zero data loss.
In general terms, throughput is the maximum rate of production or the maximum rate at which something can be processed.
In computing, triple modular redundancy (TMR), sometimes called triple-mode redundancy, is a fault-tolerant form of N-modular redundancy in which three systems perform a process and the results are processed by a majority-voting system to produce a single output.
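A sketch of the majority voter at the heart of TMR, with hypothetical names: `tmr_vote` compares three whole results, while `tmr_vote_bits` applies the per-bit majority function (a AND b) OR (a AND c) OR (b AND c) commonly used in hardware voters.

```python
def tmr_vote(a, b, c):
    """Return the value produced by at least two of the three modules."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority: all three modules disagree")


def tmr_vote_bits(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit is set if it is set in at least two inputs."""
    return (a & b) | (a & c) | (b & c)
```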
Uptime is a measure of the time a machine, typically a computer, has been working and available.
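Uptime targets are often quoted as "nines" of availability, and the corresponding downtime budget is simple arithmetic. This sketch assumes a 365-day year; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; leap years ignored for simplicity


def allowed_downtime_hours(availability: float) -> float:
    """Yearly downtime permitted by an uptime target such as 0.999 ("three nines")."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR
```

For example, 99.9% allows about 8.76 hours of downtime per year, and 99.999% allows about 5.26 minutes.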
A web browser (commonly referred to as a browser) is a software application for accessing information on the World Wide Web.
Western Electric Company (WE, WECo) was an American electrical engineering and manufacturing company that served as the primary supplier to AT&T from 1881 to 1996.
Alternative names and related terms: Damage tolerant design, Degrade gracefully, Degrades gracefully, Fail gracefully, Fail soft, Fail-soft operation, Failure resistance, Failure tolerance, Fault Tolerance, Fault tolerant, Fault tolerant design, Fault tolerant designs, Fault tolerant system, Fault tolerant systems, Fault-tolerance, Fault-tolerant, Fault-tolerant computing, Fault-tolerant design, Fault-tolerant designs, Fault-tolerant system, Fault-tolerant systems, Graceful degradation, Graceful failure.