2.1 System Design Approach
Architectural approaches to accomplish a desired safety integrity level (EN 50129 annex B.3):
- A high level of safety can be achieved through inherent fail-safety when a function is performed by a single component, provided that all credible failure modes are non-hazardous.
Examples:
- A Zener diode used for overvoltage protection is placed in parallel with the load and in series with a fuse and a resistor. The Zener breakdown voltage (VZK), a manufacturing process parameter, remains consistent as long as the diode operates under nominal conditions. The described structure ensures a high level of safety through its inherent physical characteristics.
- An optocoupler or a transformer used for galvanic isolation can be considered safe because the isolation voltage is a function of the dielectric constant and of the distance between the isolated elements (coils/LED-photodiode). The voltage depends exclusively on the electrical properties of the barrier material (mica foil, air gap, etc.), which remain constant over the operational life. Therefore, the probability that the insulation function collapses in a hazardous way when operating in nominal conditions is extremely low.
As previously mentioned, a hardware system is a complex design, making it unlikely to consist solely of components with inherent fail-safe properties. As a result, the required safety level should be achieved through a combination of subsystems with a reduced Safety Integrity Level (SIL). ISO 26262 refers to this technique as decomposition: “Apportioning of redundant safety requirements to elements, with sufficient independence, conducing to the same safety goal, with the objective of reducing the ASIL of the redundant safety requirements that are allocated to the corresponding elements.” This is similar to the synthesis described by IEC 61508.
It is important to note that IEC 61508-2 Annex E1, NOTE1 states: “At the present state of the art, knowledge and experience, it is not feasible to consider and take measures against all effects related to said element (single IC) to gain sufficient confidence for SIL 4.” Therefore, even if a single element meets all inherent fail-safety conditions, it cannot be used alone to achieve SIL 4.
Another way to achieve a higher level of safety is to adopt one of two topologies that combine subsystems with a lower level of safety:
- Composite fail-safety: Each safety function is performed by at least two independent items. Non-restrictive decisions shall be executed only if multiple items agree. The technique is known as voting logic decomposition, and because the items process in parallel, the system reaction to hazardous faults is usually very fast.
- Reactive fail-safety: The safety function is performed by a single item, and its safe operation is ensured by a second element that detects and negates a hazardous fault and places the system in a Safe State. The two items (with lower safety levels) shall be independent to avoid common-cause failures.
Because the safety level is a system characteristic, the safety goal can be achieved by any combination of the three approaches (inherent, composite, and reactive fail-safety).
The objective is always to place the system in a Safe State within a maximum time (PST) once a permanent component failure is detected. If the PST is exceeded, the system can no longer recover; a permanent failure is acceptable, however, provided the system reaches a Safe State within the PST.
Reactive implies that the system continuously monitors its own integrity and, upon detecting a failure, immediately takes corrective action to prevent a hazardous situation. This means a single element (subsystem) within the structure can be responsible for executing the safety function, provided it is continuously supervised.
The most basic and traditional decomposition method, composite fail-safety, is shown in Figure 2-1. This configuration is known as 2oo2 (two out of two).
The two subsystems are identical in terms of processing power and execute the same safety function in parallel; the output decision is computed by AND-ing the two decisions. As the graph below shows, if a fault occurs in Subsystem 1, the subsystem outputs, Decision 1 and Decision 2, will differ (the voting result is false), and as a result the output is placed in the safe state. Within the required PST, the second subsystem detects the fault in the first subsystem and, if the fault has not been removed, permanently disconnects the output through negation.
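The 2oo2 behaviour described above can be illustrated with a minimal Python sketch. It is not taken from any standard or from the referenced figure: the class name `TwoOutOfTwoVoter`, the parameter `pst_cycles` (the PST expressed as a number of voting cycles), and the latching rule are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TwoOutOfTwoVoter:
    """Hypothetical 2oo2 voter: the output is permissive only if both channels agree on it."""

    pst_cycles: int             # assumption: PST expressed in voting cycles
    disagree_cycles: int = 0    # consecutive cycles in which the decisions differed
    latched_safe: bool = False  # True once the output has been permanently negated

    def vote(self, decision_1: bool, decision_2: bool) -> bool:
        """Return True for a permissive output, False for the safe state."""
        if self.latched_safe:
            return False  # negation is permanent in this sketch
        if decision_1 != decision_2:
            # Disagreement: count how long the fault has persisted.
            self.disagree_cycles += 1
            if self.disagree_cycles >= self.pst_cycles:
                self.latched_safe = True  # fault not removed within the PST
        else:
            self.disagree_cycles = 0  # fault removed, restart the count
        # AND-ing the two decisions: any disagreement yields the safe state.
        return decision_1 and decision_2


voter = TwoOutOfTwoVoter(pst_cycles=3)
print(voter.vote(True, True))   # both channels agree on a permissive decision
print(voter.vote(False, True))  # channels differ, output goes to the safe state
```

In this sketch a transient disagreement forces the safe state only for the cycles in which it occurs, while a disagreement that persists for `pst_cycles` consecutive cycles latches the output off, mirroring the permanent disconnection through negation.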
The second approach is the reactive fail-safe as shown in Figure 2-3. In this scenario, there are two subsystems: Subsystem 1, which is computationally intensive and performs the main safety function, and Subsystem 2, which is less powerful and monitors the health of the main subsystem. For most applications, Subsystem 2 does not require high performance because its reaction can happen within the PST, which is typically much longer than the time required for Subsystem 1 to react. Therefore, Subsystem 2 can evaluate the main safety function response indirectly, using techniques such as time integration.
As shown in the operating diagram, when a fault occurs in Subsystem 1, the output incorrectly enters a temporary permissive state. Subsystem 2 detects the fault using averaging or other non-real-time techniques; if the error has not been recovered when the PST expires, the negation function is applied and sets the system in the safe state.
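A minimal Python sketch of the supervising role of Subsystem 2 is given below. It assumes that Subsystem 2 receives a scalar error sample per period (for example, the deviation between the commanded and the observed output) and averages it over a sliding window as a simple form of time integration. The class `ReactiveMonitor` and the parameters `window`, `threshold`, and `pst_samples` are hypothetical names introduced for this illustration, not part of any standard.

```python
from collections import deque


class ReactiveMonitor:
    """Hypothetical Subsystem 2: supervises Subsystem 1 by averaging an error signal."""

    def __init__(self, window: int, threshold: float, pst_samples: int):
        self.samples = deque(maxlen=window)  # sliding window used for time integration
        self.threshold = threshold           # assumption: limit on the averaged error
        self.pst_samples = pst_samples       # assumption: PST expressed in sample periods
        self.fault_age = 0                   # consecutive samples with the average above the limit
        self.negated = False                 # True once the output has been forced safe

    def update(self, error: float) -> bool:
        """Feed one error sample; return True while the output may remain permissive."""
        if self.negated:
            return False  # negation is permanent in this sketch
        self.samples.append(error)
        mean_error = sum(self.samples) / len(self.samples)
        if mean_error > self.threshold:
            self.fault_age += 1
            if self.fault_age >= self.pst_samples:
                self.negated = True  # error not recovered before the PST expired
                return False
        else:
            self.fault_age = 0  # error recovered, restart the PST count
        return True


monitor = ReactiveMonitor(window=4, threshold=0.5, pst_samples=3)
for sample in [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]:
    print(sample, monitor.update(sample))
```

Because the average reacts more slowly than Subsystem 1 itself, the output remains permissive for several samples after the fault appears. In a real design the detection delay introduced by the averaging window would have to be budgeted inside the PST.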
In both decomposition and synthesis scenarios, Subsystem 1 and Subsystem 2, besides executing the safety function, perform health cross-checking as a method of Latent Failure Detection (LFD) to anticipate circumstances that could lead to a dangerous failure. Additionally, the isolation between the two subsystems reduces the probability of a common-mode failure, as the designed topology presumes that Subsystem 1 and Subsystem 2 will not fail due to a common cause.
