Maintainability, even more than reliability, must be built into complex equipment and systems. This generally has to be done project-specifically, on the basis of a maintenance concept. However, a number of design guidelines for maintainability apply quite generally. They are discussed in this section for complex electronic equipment and systems with high maintainability requirements (see e.g. also [1.22, 5.0, 5.14, 5.28, 6.82] for military applications).
5.2.1 General Guidelines
1. Partition the equipment or system into line replaceable units (LRUs), often PCBs for electronic systems, and apply techniques of modular construction, starting from the functional structure; make modules functionally independent and electrically as well as mechanically separable; develop easily identifiable and replaceable LRUs which can be tested with commonly available test equipment.
2. Plan and implement a concept for automatic detection of faults (failures and defects) and for automatic or semiautomatic localization of faults (isolation and diagnosis) down to the line replaceable unit (LRU) level, including hidden faults (failures & defects) and, as far as possible, software defects.
3. Aim for the greatest possible standardization of parts, tools, and testing equipment; keep the need for external testing facilities to a minimum.
4. Consider environmental conditions (thermal, climatic, mechanical) in field operation as well as during transportation and storage (see Section 5.2.5 for human, ergonomic and safety aspects).
5. Plan and realize appropriate logistic support, including user documentation, training of operating & maintenance personnel, and logistic support in the field.
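Point 2 asks for automatic or semiautomatic fault localization down to the LRU level. One common way to implement this is a fault dictionary that maps the pattern of failing tests (the syndrome) to the LRU or LRUs that can cause it. The Python sketch below assumes such a dictionary; the test names T1 to T3, the LRU names, and the function `localize` are hypothetical and only illustrate the principle.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Hypothetical fault dictionary: set of failing tests (syndrome) -> candidate LRU(s).
# Real entries would come from an analysis (e.g. FMEA, fault simulation) of the equipment.
FAULT_DICTIONARY: Dict[FrozenSet[str], Tuple[str, ...]] = {
    frozenset({"T1"}): ("LRU-A",),
    frozenset({"T2", "T3"}): ("LRU-B",),
    frozenset({"T3"}): ("LRU-C", "LRU-D"),  # ambiguity group: two LRUs remain
}


def localize(failed_tests: Iterable[str]) -> Optional[Tuple[str, ...]]:
    """Return the candidate LRUs for a set of failing tests.

    () means no fault detected; None means a fault was detected but its
    syndrome is not in the dictionary (no automatic localization possible).
    """
    syndrome = frozenset(failed_tests)
    if not syndrome:
        return ()
    return FAULT_DICTIONARY.get(syndrome)


print(localize(["T3", "T2"]))  # ('LRU-B',)
print(localize(["T1", "T2"]))  # None
```

A syndrome mapping to more than one LRU (here T3 alone) shows why the degree of localization can be lower than the degree of detection.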
5.2.2 Testability (see also Section 5.1.5.3)
Testability includes the degree of fault (failure and defect) detection and localization, the correctness of test results, and the test duration (Section 4.2.1).
High testability can be achieved by improving observability (the possibility to check internal signals at the outputs) and controllability (the possibility to modify internal signals from the inputs).
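The degrees of detection and localization can be estimated from a list of possible faults. The Python sketch below assumes that each fault is weighted by its failure rate and that localization is counted only among detected faults; the precise definitions are those of Section 4.2.1, and the fault list, rates, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Fault:
    name: str
    rate: float       # failure rate used as weight (assumption), e.g. in 10^-6/h
    detected: bool    # detected by the automatic test concept
    localized: bool   # localized down to a single LRU (only meaningful if detected)


def degree_of_detection(faults: Sequence[Fault]) -> float:
    """Rate-weighted share of all faults that are detected."""
    total = sum(f.rate for f in faults)
    return sum(f.rate for f in faults if f.detected) / total


def degree_of_localization(faults: Sequence[Fault]) -> float:
    """Rate-weighted share of the detected faults that are localized to one LRU."""
    detected = sum(f.rate for f in faults if f.detected)
    return sum(f.rate for f in faults if f.detected and f.localized) / detected


# Hypothetical fault list of a small equipment
FAULTS: List[Fault] = [
    Fault("power supply short", 4.0, True, True),
    Fault("CPU stuck-at", 3.0, True, True),
    Fault("bus line open", 2.0, True, False),        # detected, not isolated to one LRU
    Fault("spare relay welded", 1.0, False, False),  # hidden fault
]

print(f"{degree_of_detection(FAULTS):.3f}")     # 0.900
print(f"{degree_of_localization(FAULTS):.3f}")  # 0.778
```

Improving observability and controllability, as described above, acts on exactly these two figures: more faults become detectable, and fewer detected faults remain ambiguous.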
1. Avoid asynchronous logic (asynchronous signals should be latched and strobed at the inputs); use only one master clock.
2. Avoid WIRED-ORs and simplify logical expressions as far as possible.
3. Improve testability of connection paths and simple circuitry using ICs with boundary-scan (IEEE STD 1149 [4.13]).
4. Separate analog and digital circuit paths, as well as circuitries with different supply voltages; make power supplies mechanically separable.
5. Make feedback paths separable.
[Figure: feedback path of a logic block (inputs x, outputs y) made separable by a control signal, with a test point]
6. Realize modules to be as self-contained as possible, with small sequential depth, electrically separable and individually testable, in particular where redundancy appears; for mechanically separable modules, assure easy removal and mechanical keying.
[Figure: separation of logic units 1 and 2 with gates (control signal, test point) and with MUXs (control signals 1 and 2)]
7. Allow for external initialization of sequential logic.
[Figure: flip-flop with clock / external-clock selection and clear input, each with test points]
8. Fix acceptable limits for all measurable parameters; identify all parameters that can be measured only indirectly and define appropriate measurement (test) procedures for them.
9. Introduce built-in test (BIT) and the corresponding built-in test equipment (BITE) as necessary to reach the required coverage level, in particular for critical functions and to satisfy operation monitoring (Table 4.1), i.e. implement built-in self-test (BIST); however, minimize the amount of data to be recorded for monitoring purposes.
10. Design BIT/BITE considering worst-case operating conditions, and so that their failure does not influence system operation (FMEA); for critical functions, introduce redundancy also for BIT/BITE.
11. Implement means to identify whether hardware or software has caused a failure message, wherever possible.
12. Introduce test modes also for the detection of hidden faults (e.g. failures or defects in redundant elements); if this is not possible, give appropriate test procedures in the user documentation.
13. Provide manual test sequences to support testability, and describe them clearly in the user documentation.
14. Route critical nodes of LRUs to a connector (to avoid internal probing access) and locate I/O test points close to each other, wherever possible.
15. Provide enough test points (at a minimum on functional-unit inputs and outputs, as well as on bus lines) and support them with pull-up/pull-down resistors (Point 2 on p. 150, Point 10 on p. 152); provide access for a probe, taking into account the capacitive and/or resistive load, reflections, and possible problems related to buffers; document all test points in the user documentation.
16. Make use of a scan path to reduce test time, wherever possible; the basic idea of a scan path is shown on the right-hand side of Fig. 5.1, and the test procedure is as follows:
1. Activate the MUX control signal (connect Z to B).
2. Scan-in an appropriate n-bit test pattern with n clock pulses; this pattern appears in parallel at the FF outputs and can be read out serially with n−1 additional clock pulses (repeat this step to completely test MUXs & FFs).
3. Scan-in with n clock pulses a first test pattern for the combinatorial logic (feedback part) and apply an appropriate pattern also to the inputs x (both patterns are applied to the combinatorial circuit and generate corresponding results, which appear at the outputs y and at the inputs A of the MUXs).
4. Verify the results at the outputs y.
5. Deactivate the MUX control signal (connect Z to A).
6. Give one clock pulse (the feedback results appear in parallel at the FF outputs).
7. Activate the MUX control signal (connect Z to B).
8. Scan-out with n−1 clock pulses and verify the results; at the same time, a second test pattern for the combinatorial circuit can be scanned in.
9. Repeat steps 3 – 8 up to a satisfactory test of the combinatorial part of the circuit (see e.g. [4.17, 4.31] for special test algorithms).
[Figure: left, combinational logic (inputs x, outputs y) with three D flip-flops (D, Q, CP) on a common clock in the feedback path; right, the same circuit with a MUX (inputs A, B, select S, output Z) in front of each D input, a common control signal, Scan in, and Scan out]
Figure 5.1 Basic structure of a synchronous sequential circuit, without (left) and with (right) a scan path (n=3)
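The scan-path test procedure of point 16 can be illustrated with a small cycle-level simulation. The Python sketch models the right-hand circuit of Fig. 5.1 for n = 3: `scan = True` corresponds to connecting Z to B (the FFs form a shift register from Scan in to Scan out), `scan = False` to connecting Z to A (normal operation). The class and function names and the example logic (a 3-bit counter with enable input and carry output) are hypothetical, chosen only to make the steps executable.

```python
from typing import Callable, List, Tuple

Bits = List[int]
# Combinational part: (inputs x, FF outputs q) -> (outputs y, next-state D inputs)
Logic = Callable[[Bits, Bits], Tuple[Bits, Bits]]


class ScanPathCircuit:
    """Cycle-level model of the right-hand circuit of Fig. 5.1 (hypothetical API)."""

    def __init__(self, n: int, logic: Logic) -> None:
        self.n = n
        self.logic = logic
        self.q: Bits = [0] * n   # q[0]: FF fed by Scan in, q[n-1]: drives Scan out
        self.scan = False        # MUX control: False -> Z = A, True -> Z = B

    def y(self, x: Bits) -> Bits:
        return self.logic(x, self.q)[0]

    def scan_out(self) -> int:
        return self.q[-1]

    def clock(self, x: Bits, scan_in: int = 0) -> None:
        if self.scan:   # Z = B: the FFs form a shift register
            self.q = [scan_in] + self.q[:-1]
        else:           # Z = A: the FFs load the feedback results of the logic
            self.q = list(self.logic(x, self.q)[1])


def shift_in(c: ScanPathCircuit, pattern: Bits, x: Bits) -> None:
    """n clock pulses; afterwards q[i] == pattern[i]."""
    for bit in reversed(pattern):
        c.clock(x, bit)


def shift_out(c: ScanPathCircuit, x: Bits) -> Bits:
    """Read the FF contents with n-1 clock pulses (the last FF is visible at once)."""
    bits = [c.scan_out()]
    for _ in range(c.n - 1):
        c.clock(x)              # shifts zeros in behind the data
        bits.append(c.scan_out())
    return bits[::-1]           # reorder as q[0] .. q[n-1]


def check_scan_chain(c: ScanPathCircuit, pattern: Bits, x: Bits) -> bool:
    """Steps 1-2: shift a pattern in and read it back."""
    c.scan = True
    shift_in(c, pattern, x)
    return shift_out(c, x) == pattern


def apply_logic_pattern(c: ScanPathCircuit, x: Bits, pattern: Bits) -> Tuple[Bits, Bits]:
    """Steps 3-8 for one pattern; returns (y, captured feedback results)."""
    c.scan = True              # Z connected to B
    shift_in(c, pattern, x)    # step 3: FF pattern via scan path, x at the inputs
    y = c.y(x)                 # step 4: results at the outputs y
    c.scan = False             # step 5: Z connected to A
    c.clock(x)                 # step 6: feedback results into the FFs
    c.scan = True              # step 7: Z connected to B
    return y, shift_out(c, x)  # step 8: scan out the feedback results


def counter(x: Bits, q: Bits) -> Tuple[Bits, Bits]:
    """Example logic (assumed): 3-bit counter with enable x[0] and carry output."""
    value = q[0] + 2 * q[1] + 4 * q[2]
    nxt = (value + x[0]) % 8
    carry = int(value == 7 and x[0] == 1)
    return [carry], [nxt & 1, (nxt >> 1) & 1, (nxt >> 2) & 1]


c = ScanPathCircuit(3, counter)
print(check_scan_chain(c, [1, 0, 1], [0]))     # True
print(apply_logic_pattern(c, [1], [1, 1, 1]))  # ([1], [0, 0, 0])
```

For simplicity, the sketch shifts zeros in while scanning out, instead of scanning in the next test pattern at the same time as step 8 allows. Each call of `apply_logic_pattern` corresponds to one pass through steps 3 to 8; step 9 amounts to calling it for further patterns.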
5.2.3 Connections, Accessibility, Exchangeability
1. Use preferably indirect plug connectors; distribute power supply and ground over several contacts (about 20% of the contacts, far from signal leads); standardize pin assignment; plan to have reserve contacts (e.g. for test stimuli); avoid any external mechanical stress on connectors; define only one kind of extender for PCBs and plan its use; use fiber-optic connections for critical applications.
2. Standardize connectors and wire colors as far as possible.
3. For connections that are neither soldered nor screwed, give preference to wire wrap.
4. Route wire and cable connections as clearly as possible, avoiding unnecessary overlapping and mechanical strain.
5. Provide self-latching access flaps of sufficient size.
6. Avoid the use of more than 4 fasteners to fix case or covers and the need for special tools; use clamp fastening with torque-set.
7. Assure accessibility of LRUs by considering the frequency of maintenance tasks, and make them accessible without removal of other LRUs.
8. Provide for speedy replaceability by means of plug-out/plug-in techniques.
9. Prevent faulty installation or connection of (not interchangeable) LRUs through mechanical keying.
10. Provide good access to degrading parts (also for cleaning & lubrication).
11. Locate LRU identification and modification plates so that they are easily readable.
5.2.4 Adjustment
1. Limit any form of hardware adjustment (or alignment) in the field.
2. If an adjustment becomes unavoidable, describe the procedure carefully in the user documentation, make the adjustment easily accessible, and avoid sensitive adjustments.
5.2.5 Human, Ergonomic, and Safety Aspects
Human and ergonomic factors can have a great influence on the reliability, maintainability, availability, and safety of complex equipment and systems. Experience shows that safety critical failures at system level are often caused by human errors related to design, manufacturing, installation (incl. handling & transportation), operation, or maintenance.
Errare humanum est (to err is human) should always be kept in mind by the designer. Thus, because of the difficulty of modeling human behavior in emergency situations, prevention must be preferred to modeling and, as for software quality, extensive requirements and design rules become important (see e.g. [5.14, 6.82] for military applications).
The following basic design rules are useful to avoid human errors during development, manufacturing, installation, and use of complex equipment and systems with high reliability and/or safety requirements, or at least to limit their effects (see pp. 10, 294-298 for modeling and further considerations).
1. Clearly define which subfunctions (of the required function) will be performed by the machine and which by the human.
2. Analyze the tasks assigned to the human and partition them into appropriate subtasks, separating operation and maintenance tasks (the analysis focuses on input information to the human, evaluation process, action to be taken, environments & constraints, tools & job aids, skill required, and feedback).
3. For safety critical decisions or subtasks, bypass the human wherever possible, e.g. using majority redundancy also for actuators (series for close, parallel for open) or, at least, introduce two-step actions (the first step being reversible).
4. Design go/no-go or fail-safe circuitries to warn of (or avoid) safety critical failures.
5. Make alarms (acoustic and/or visual) clear and different for each relevant malfunction, so that they can be correctly interpreted by operators & maintainers, taking care of their reaction time (use preferably tones for status indications and speech for all other information); minimize the number of alarms.
6. Use visual presentation for information that is long, complex, or needed later.
7. Limit the use of color-coded information (if necessary, combine color informa- tion with appropriate acoustical signals).
8. Describe system status, detected fault, and action to be accomplished concisely in full text and make them easily readable.
9. Consider ergonomic as well as man-machine aspects to avoid mistakes at operation or maintenance; in particular, select carefully shape & placement of control knobs and the layout of operating consoles.
10. Adapt control and display elements to the required skill for operators and maintainers.
11. In displaying information, consider that the optimal visual field is 15 degrees up, down, left, and right (order information left to right and top to bottom).
12. Simplify operation and maintenance as far as possible.
13. Use high standardization in selecting operational and maintenance tools.
14. Make any labeling simple and clear.
15. Conceive operation and maintenance procedures to be as simple as possible, taking care also of the user’s skill level; order all steps in a logical sequence; wherever possible, support the steps with visual feedback, and describe them clearly and concisely in the user documentation.
16. Fix in the user documentation all assumptions (requirements) regarding skill, training, motivation, and work conditions for operators and maintainers, as well as related organizational controls.
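Rule 3 calls for majority redundancy and two-step actions for safety critical decisions. The Python sketch below shows both ideas in their simplest form: a majority vote over redundant channels (2-out-of-3 in the example) and a command whose first step can be cancelled before the second step executes it. The names `majority` and `TwoStepAction` are hypothetical, and the sketch does not model the series/parallel arrangement of actuators mentioned in rule 3.

```python
from typing import Callable, Generic, Sequence, TypeVar

T = TypeVar("T")


def majority(votes: Sequence[bool]) -> bool:
    """Majority decision: True only if more than half of the channels vote True."""
    return 2 * sum(votes) > len(votes)


class TwoStepAction(Generic[T]):
    """Hypothetical two-step command: step 1 (arm) is reversible, step 2 executes."""

    def __init__(self, action: Callable[[], T]) -> None:
        self._action = action
        self.armed = False

    def arm(self) -> None:      # step 1, reversible
        self.armed = True

    def cancel(self) -> None:   # undo step 1
        self.armed = False

    def confirm(self) -> T:     # step 2, accepted only after step 1
        if not self.armed:
            raise RuntimeError("action not armed")
        self.armed = False
        return self._action()


# 2-out-of-3: a single faulty channel can neither trigger nor block the decision
print(majority([True, True, False]))   # True
print(majority([False, False, True]))  # False
```

The vote removes the single human (or single channel) from the critical path, while the two-step action leaves a window in which an erroneous first step can still be undone.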