Lesson 4: Designing for reliability

Learning Objectives TLO: While reviewing a company’s proposal, consider all the activities that contractors may select in designing for reliability. • ELO 1: Explain why failure rates are easier to deal with than MTBFs. • ELO 2: Distinguish mission reliability from logistics reliability. • ELO 3: Explain the aim of Failure Modes, Effects, and Criticality Analysis. • ELO 4: Explain the aim of Reliability-Centered Maintenance Analysis. • ELO 5: List three purposes for reliability predictions. • ELO 6: Discuss eight other reliability design or analysis activities.

Defense Acquisition Guidebook 4.3.18.19. Reliability and Maintainability Engineering (regarding design for reliability) • The purpose of Reliability and Maintainability (R&M) engineering (Maintainability includes Built-In-Test (BIT)) is to influence system design in order to increase mission capability and availability, and decrease logistics burden and cost over a system’s life cycle.

Design for Reliability • RCMA • Software Reliability • Reliability Prediction • Reliability-Critical Items Analysis • Storage Analysis • Sneak Circuit Analysis • Allocation • Modeling • Integrity Analysis • Parts Selection • Derating • Thermal Design • FMECA

Reliability Allocation: Apportioning the overall system-level reliability requirements. Customer system requirements need to be allocated “top-down” to lower levels of assembly: • Subsystem • Line replaceable unit (LRU), Weapon replaceable assembly (WRA) • Shop replaceable unit (SRU), Shop replaceable assembly (SRA) • Individual components

Uses for Allocation • Establishing goals • Tracking progress • Providing inputs to logistics models

Allocating the System Failure Rate Failure rates are easier to deal with than MTBFs because subsystem failure rates add up to the system failure rate, whereas MTBFs (the reciprocals of the failure rates) do not add. Note: This assumes the subsystems are mission-critical and subject only to randomly occurring failures at a constant failure rate (i.e., they are in the “useful life” phase of the bathtub curve).
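A minimal sketch of why failure rates are the convenient unit, assuming a series system with constant failure rates. The subsystem names and values are hypothetical and chosen only for illustration.

```python
# Hypothetical subsystem failure rates in failures per hour (illustrative only).
subsystem_failure_rates = {"A": 0.001, "B": 0.002, "C": 0.0005}

# Series system with constant failure rates: the rates simply add.
system_failure_rate = sum(subsystem_failure_rates.values())

# MTBF is the reciprocal of the failure rate, so MTBFs do not add.
subsystem_mtbfs = {name: 1.0 / rate for name, rate in subsystem_failure_rates.items()}
system_mtbf = 1.0 / system_failure_rate

print(f"System failure rate: {system_failure_rate:.4f} failures/hour")
print(f"System MTBF: {system_mtbf:.1f} hours")
print(f"Sum of subsystem MTBFs: {sum(subsystem_mtbfs.values()):.1f} hours (not the system MTBF)")
```

Note that the system MTBF comes out lower than the MTBF of the weakest subsystem, which is easy to miss when working with MTBFs directly.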

Allocation Methods • Equal Allocation Method • Weighted Allocation Method • A General Iterative Allocation Method

Equal Allocation Method • Assigns equal reliability requirement to each subsystem • Fails to consider the relative effort required to achieve equal reliabilities • Limited usefulness.

Equal Allocation Method Example Assume: A new car has a system level failure rate requirement of (not to exceed) ten failures per 100,000 miles. Automobile mission critical subsystems are: Electrical (EL), Drive Train (DT), Steering (ST), and Brakes (BK)
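The equal allocation for this car example can be sketched as follows. The requirement and subsystem names come from the example; the variable names are illustrative.

```python
# System requirement from the example: not to exceed 10 failures per 100,000 miles.
SYSTEM_REQUIREMENT = 10.0
subsystems = ["Electrical", "Drive Train", "Steering", "Brakes"]

# Equal allocation: every subsystem receives the same share of the failure rate.
equal_allocation = {name: SYSTEM_REQUIREMENT / len(subsystems) for name in subsystems}

for name, rate in equal_allocation.items():
    print(f"{name}: {rate} failures per 100,000 miles")
```

Each subsystem is allocated 2.5 failures per 100,000 miles, regardless of how hard that target is to reach for any one of them.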

Weighted Allocation Method Considers or weights factors such as complexity of the system, safety/mission criticality, number of parts, physical weight of components, maturity of the technology, past experience, etc. Example: “Weighted” model using expected parts count Situation: Same car requirement as before. Engineers have estimated the following parts count based upon past design history, new functional requirements, and new technologies.

Weighted Allocation Example Allocated requirement:
• Electrical: 10% x 10 failures/100,000 miles = 1 failure/100,000 miles
• Drive Train: 20% x 10 failures/100,000 miles = 2 failures/100,000 miles
• Steering: 20% x 10 failures/100,000 miles = 2 failures/100,000 miles
• Brakes: 50% x 10 failures/100,000 miles = 5 failures/100,000 miles
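A minimal sketch of the parts-count weighting. The original parts count table is not reproduced in this text, so the counts below are hypothetical values chosen only so that they give the 10%, 20%, 20%, and 50% shares stated in the example.

```python
def allocate_by_weight(system_rate, weights):
    """Split a system failure rate in proportion to each subsystem's weight."""
    total_weight = sum(weights.values())
    return {name: system_rate * weight / total_weight for name, weight in weights.items()}

SYSTEM_REQUIREMENT = 10.0  # failures per 100,000 miles (from the example)

# Hypothetical parts counts (assumption): only their ratios matter here.
parts_count = {"Electrical": 100, "Drive Train": 200, "Steering": 200, "Brakes": 500}

weighted_allocation = allocate_by_weight(SYSTEM_REQUIREMENT, parts_count)
for name, rate in weighted_allocation.items():
    print(f"{name}: {rate:.1f} failures per 100,000 miles")
```

The same function works for any other weighting factor (complexity, criticality, technology maturity) as long as it can be expressed as a number per subsystem.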

Iterative Allocation Method
• Set up a system-level model for failure rate.
• Estimate the most probable, achievable failure rate for each subsystem in the model.
• Calculate the resulting system-level failure rate.
• Compare the model result with the actual system-level requirement.
• If the model result is smaller than the required failure rate, the input values become the allocated values.
• Otherwise, “tweak” the model: add redundancy if practical (this requires the use of a probability model), reevaluate the failure rate inputs, and recalculate the model output.
• Repeat this process until the model output is less than the required value.
• If the desired model output cannot be achieved, the system requirement is perhaps unrealistic and may have to be relaxed.
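The iterative procedure can be sketched roughly as below, assuming a simple series model in which subsystem failure rates add. The function names and the two sets of estimates are hypothetical; in practice each new set of estimates would come from engineering re-evaluation, not from a fixed list.

```python
def system_failure_rate(estimates):
    """Series model (assumption): subsystem failure rates add."""
    return sum(estimates.values())

def iterate_allocation(design_iterations, requirement):
    """Return the first set of estimates that meets the requirement, else None."""
    for estimates in design_iterations:
        if system_failure_rate(estimates) <= requirement:
            return estimates  # these inputs become the allocated values
    return None  # requirement may be unrealistic and may need to be relaxed

REQUIREMENT = 10.0  # failures per 100,000 miles, "not to exceed"

# Hypothetical successive engineering estimates (failures per 100,000 miles).
design_iterations = [
    {"Electrical": 2.0, "Drive Train": 3.0, "Steering": 2.5, "Brakes": 4.0},  # totals 11.5
    {"Electrical": 1.5, "Drive Train": 3.0, "Steering": 2.0, "Brakes": 3.0},  # totals 9.5
]

allocated = iterate_allocation(design_iterations, REQUIREMENT)
print(allocated)
```

A model with redundancy would replace `system_failure_rate` with a probability model, as the slide notes.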

Reliability Allocation Example This shows how one company allocated the system reliability requirement to lower levels of indenture.

Reliability Modeling Sometimes a single unit design concept may not appear to provide sufficient safety or mission reliability. The solution sometimes focuses on employing one or more redundant items: • Multiple, identical items • Primary vs. secondary items EXAMPLE: The initial design for a system has three subsystems (A, B, and C) in series (each is critical for the system to be successful). The mission reliability (MR) block diagram can be drawn with a series reliability model.

Mission Reliability What is the mission reliability (MR)? Since all three subsystems must operate: MR = R_A x R_B x R_C = 0.95 x 0.90 x 0.80 = 0.684
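The series calculation can be checked with a few lines of code. This is a minimal sketch that assumes the subsystems fail independently; the function name is illustrative.

```python
from math import prod

def series_reliability(reliabilities):
    """All blocks must work, so the block reliabilities multiply."""
    return prod(reliabilities)

# Subsystem reliabilities A, B, and C from the example.
mission_reliability = series_reliability([0.95, 0.90, 0.80])
print(round(mission_reliability, 3))  # 0.684
```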

Reliability Modeling If the customer’s mission reliability requirement for the system is 80%, will the proposed system meet the user’s need? • What alternatives exist to improve the system? • Delete one or more units and associated functions? • Improve reliability of A and/or B? • Improve reliability of C? • Technological limitations? • What if item C is Government Furnished Equipment (GFE)? • Incorporate redundancy?

Reliability Modeling - Redundancy Since “C” is the most unreliable subsystem, let’s add a second, identical, redundant subsystem “C” so that either can perform mission functions. Reliability (Redundant “C”s) = ?

Reliability Modeling - Redundancy Redundancy Improved the Mission Reliability
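A minimal sketch of the redundancy calculation, assuming the two “C” units are active, identical, and fail independently, and that either one alone can perform the mission function. Function names are illustrative.

```python
from math import prod

def parallel_reliability(reliabilities):
    """Active redundancy: the block fails only if every unit fails."""
    probability_all_fail = prod(1.0 - r for r in reliabilities)
    return 1.0 - probability_all_fail

R_A, R_B, R_C = 0.95, 0.90, 0.80  # values from the example

# Two redundant C units: 1 - (0.20 x 0.20) = 0.96
redundant_c = parallel_reliability([R_C, R_C])

# A and B remain in series with the redundant C block.
mission_reliability = R_A * R_B * redundant_c

print(f"Redundant C block: {redundant_c:.2f}")
print(f"Mission reliability: {mission_reliability:.4f}")
```

Under these assumptions the mission reliability rises from 0.684 to about 0.82, which meets the 80% requirement.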

Redundancy Disadvantage • Improved MISSION reliability is an apparent benefit of redundancy. • What are the DISADVANTAGES of redundancy? • How many mission-critical failures would you expect for the above C1/C2 subsystem in 1000 missions? • How many logistics failures (e.g., maintenance actions or removals) would you expect in the above C1/C2 subsystem in 1000 missions?

Logistics Reliability Logistics Reliability (LR) • is the probability that the mission will generate no corrective maintenance requirements • treats everything as if in series • is aka “Basic Reliability”

Logistics Reliability Logistics Reliability Block Diagram with Active Redundancy Active Redundancy Reduces Logistics Reliability
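The effect on logistics reliability, and the expected failure counts over 1000 missions, can be sketched as below. This assumes each “C” unit has a per-mission reliability of 0.80, that both units operate on every mission, and that failures are independent; variable names are illustrative.

```python
R_A, R_B, R_C = 0.95, 0.90, 0.80  # values from the example
MISSIONS = 1000

# Mission reliability: C1 and C2 in parallel, in series with A and B.
mission_reliability = R_A * R_B * (1.0 - (1.0 - R_C) ** 2)

# Logistics reliability: every unit, including both C units, treated as in series.
logistics_reliability = R_A * R_B * R_C * R_C

# Mission-critical failures of the C1/C2 block: both units must fail.
expected_mission_failures_c = MISSIONS * (1.0 - R_C) ** 2

# Logistics failures of the C1/C2 block: each unit failure needs maintenance.
expected_logistics_failures_c = MISSIONS * 2 * (1.0 - R_C)

print(f"Mission reliability:   {mission_reliability:.4f}")
print(f"Logistics reliability: {logistics_reliability:.4f}")
print(f"Expected mission-critical C failures: {expected_mission_failures_c:.0f}")
print(f"Expected C unit failures (logistics): {expected_logistics_failures_c:.0f}")
```

Under these assumptions, redundancy cuts mission-critical failures of the C function to about 40 per 1000 missions, while the number of C unit failures requiring maintenance doubles to about 400 compared with a single unit.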

Redundancy Redundancy generally: • increases mission reliability. • decreases logistics reliability and therefore increases support costs. Try to improve the reliability of a single unit whenever possible; use redundancy as a last option.

Integrity Analysis • Predict cumulative stresses over expected lifetime • Design enough strength so product will not fail in expected lifetime • Reduces the need for a reliability growth program

Parts Selection Program Objectives: • Select parts of known and high reliability • Minimize the number of new parts entering the supply system Simplify logistic support: • Enhance interchangeability, reliability, maintainability • Conserve resources

Derating The practice of limiting electrical, thermal, and mechanical stresses on parts (devices) to levels below their specified or proven capabilities in order to enhance reliability. The use (application) of parts (devices) in such a manner that applied stresses are less than maximum ratings. Note: Derating and many other R&M-related activities are discussed in MIL-HDBK-338, “Electronic Reliability Design Handbook.”

Derating Design Approaches • Increase Average Strength. This approach is tolerable if size and weight increases can be accepted or if a stronger material is available. • Decrease Average Stress. Occasionally the average stress on bolts can be reduced, by say, increasing the number of bolts. • Decrease Stress Variation. The variation in stress is usually hard to control. However, the stress distribution can be effectively truncated by putting limitations on use conditions. • Decrease Strength Variation. The inherent part-to-part variation in strength can be reduced by improving the basic process, holding tighter control over the process, or by utilizing tests to eliminate the less desirable parts.

Nominal Strength and Stress Distributions [Figure: overlapping stress and strength distributions, with mean stress and mean strength marked.] The region where the stress and strength curves overlap indicates the probability of failure, i.e., the chance that the applied stress exceeds the part’s strength.

Nominal Strength and Stress Distributions 1. Increasing the average strength increases reliability. [Figure: strength distribution shifted away from the stress distribution.]

Nominal Strength and Stress Distributions 2. Decreasing the average stress increases reliability. [Figure: stress distribution shifted away from the strength distribution.]

Nominal Strength and Stress Distributions 3. Decreasing the stress variation increases reliability. [Figure: narrower stress distribution.]

Nominal Strength and Stress Distributions 4. Decreasing the strength variation increases reliability. [Figure: narrower strength distribution.]
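The four approaches can be compared numerically with a stress-strength interference calculation. This is a sketch under the assumption that stress and strength are independent and normally distributed, in which case reliability is the standard normal CDF of (mean strength - mean stress) / sqrt(strength variance + stress variance). All numeric values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def interference_reliability(mean_strength, sd_strength, mean_stress, sd_stress):
    """P(strength > stress) for independent, normally distributed stress and strength."""
    margin_mean = mean_strength - mean_stress
    margin_sd = sqrt(sd_strength ** 2 + sd_stress ** 2)
    return NormalDist().cdf(margin_mean / margin_sd)

# Hypothetical baseline (arbitrary units).
baseline = interference_reliability(100, 10, 70, 10)

cases = {
    "1. Increase average strength": interference_reliability(110, 10, 70, 10),
    "2. Decrease average stress": interference_reliability(100, 10, 60, 10),
    "3. Decrease stress variation": interference_reliability(100, 10, 70, 5),
    "4. Decrease strength variation": interference_reliability(100, 5, 70, 10),
}

print(f"Baseline reliability: {baseline:.4f}")
for label, reliability in cases.items():
    print(f"{label}: {reliability:.4f}")
```

Each of the four changes raises the computed reliability above the baseline, matching the four slides.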

Thermal Design • Heat vs. reliability • All processes are less than 100% efficient and generate heat • Heat raises equipment operating temperature • Higher operating temperatures generally reduce reliability • Need to select “stronger” parts or reduce temperature • Two basic methods for reducing temperature • Generate less heat! More efficient designs, parts • Move or transfer heat from sensitive devices A one-degree Celsius drop in operating temperature reduces the failure rate of electronics by approximately 3%
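To get a feel for the 3% rule of thumb, the sketch below compounds it over a larger temperature drop. Treating the 3% reduction as multiplicative for each successive degree is an assumption made here for illustration; the slide only states the one-degree figure.

```python
def failure_rate_multiplier(temperature_drop_c, reduction_per_degree=0.03):
    """Fraction of the original failure rate remaining after a temperature drop.

    Assumes (illustratively) that each 1 degree C drop removes about 3% of the
    remaining failure rate, i.e. the reductions compound.
    """
    return (1.0 - reduction_per_degree) ** temperature_drop_c

for drop in (1, 5, 10):
    remaining = failure_rate_multiplier(drop)
    print(f"{drop:>2} C cooler: failure rate is about {remaining:.0%} of the original")
```

Under this assumption, a 10 degree drop would leave roughly three quarters of the original failure rate.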

Thermal Design • Predict temperatures of the parts in the operational environment. • Understand how hot the parts can get before failing • Design-in adequate cooling

FMECA: Failure Modes, Effects, and Criticality Analysis. A Failure Modes and Effects Analysis (FMEA) is a procedure for analyzing each potential failure mode in a product to determine the results or effects thereof on the product. When the analysis is extended to classify each potential failure mode according to its severity and probability of occurrence, it is called a Failure Modes, Effects, and Criticality Analysis (FMECA). (Electronic Reliability Design Handbook, MIL-HDBK-338B.) Overall purpose: a safer, more reliable initial design.

Steps in the FMECA Process • What is the function of the system? How does it work? • Parts? • Interfaces? • Software? • How many ways can each item malfunction? • If an item malfunctions, what happens? • To the next higher assemblies? • To the system? • What is the criticality with regard to safety, mission, and cost? • Is there a high probability that it can happen?
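The questions above map naturally onto a worksheet with one row per failure mode. The sketch below is a hypothetical illustration, not a standard FMECA format: the items, effects, severity scale (1 = most severe), and probabilities are all invented for the car example.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    item: str
    mode: str            # how the item can malfunction
    effect: str          # what happens at the system level
    severity: int        # assumed scale: 1 = most severe ... 4 = least severe
    probability: float   # assumed probability of occurrence per mission

# Hypothetical worksheet rows (illustrative values only).
failure_modes = [
    FailureMode("Brake line", "Leak", "Loss of braking", 1, 0.001),
    FailureMode("Headlamp", "Bulb burns out", "Reduced night visibility", 3, 0.02),
    FailureMode("Power steering pump", "Seizes", "Heavy steering", 2, 0.005),
    FailureMode("Brake pad", "Excessive wear", "Longer stopping distance", 2, 0.01),
]

def rank_by_criticality(modes):
    """Most severe first; within a severity level, most probable first."""
    return sorted(modes, key=lambda m: (m.severity, -m.probability))

ranked = rank_by_criticality(failure_modes)
for fm in ranked:
    print(f"Severity {fm.severity}, p={fm.probability}: {fm.item} ({fm.mode}) -> {fm.effect}")
```

Ranking the rows this way points design attention at the failure modes that are both severe and likely.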

FMECA Example

Reliability-Centered Maintenance Analysis (RCMA) • Determines preventive maintenance tasks and intervals • Aim: to attain all the inherent reliability (delay the onset of wear-out)

Hazard Rate Bathtub Curve [Figure: hazard rate bathtub curve.] Extend useful life with preventive maintenance.

Software R&M Definition of software reliability: “The probability of failure-free operation of a software component or system in a specified environment for a specified time.” (C. R. Vick and C. V. Ramamoorthy, Handbook of Software Engineering, Van Nostrand Reinhold Co., Inc., NY.) Note: “An assumption that all software is completely reliable shall be stated in instances where software reliability is not incorporated into the item reliability prediction.” (MIL-HDBK-338) Software errors are introduced during requirements definition, design, and coding.

Software R&M Tools • Good identification of requirements • Modular design • Use of higher order languages • Reusable software: pre-packaged, debugged software packages • Use of a single language • Fault tolerance • FMEA • Review and verification via second team • Functional testing (“debugging” the software) • Good documentation

Software R&M Tools Software Fault Tolerance Via N-Version Programming
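N-version programming runs several independently developed implementations of the same specification and takes a vote on the result. The sketch below is a minimal illustration with majority voting; the three `square_*` functions are hypothetical stand-ins for independently written versions, one of which contains a deliberate fault.

```python
from collections import Counter

def n_version_execute(versions, *args):
    """Run every version and return the result agreed on by a strict majority."""
    results = []
    for version in versions:
        try:
            results.append(version(*args))
        except Exception:
            continue  # a crashed version simply casts no vote
    if not results:
        raise RuntimeError("all versions failed")
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(versions) // 2:
        raise RuntimeError("no majority agreement")
    return value

# Hypothetical independently developed versions of "square a number".
def square_v1(x):
    return x * x

def square_v2(x):
    return x ** 2

def square_v3(x):
    return x + x  # deliberate fault: only correct for x = 0 and x = 2

voted = n_version_execute([square_v1, square_v2, square_v3], 3)
print(voted)  # 9
```

The faulty version is outvoted, so the caller still receives the correct result. The benefit depends on the versions not sharing the same fault.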

Software R&M Tools Another Software Fault Tolerance Technique: Recovery Blocks
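A recovery block runs a primary routine, checks its result with an acceptance test, and falls back to an alternate routine if the test fails. The sketch below is a simplified illustration: the routines are pure functions, so there is no program state to roll back before trying the alternate, which a full implementation would need to handle. All function names are hypothetical.

```python
import math

def recovery_block(alternates, acceptance_test, *args):
    """Try each routine in order; return the first result that passes the test."""
    for routine in alternates:
        try:
            result = routine(*args)
        except Exception:
            continue  # treat a crash like a failed acceptance test
        if acceptance_test(result, *args):
            return result
    raise RuntimeError("all alternates failed the acceptance test")

def primary_sqrt(x):
    return x / 2  # deliberately faulty primary routine

def alternate_sqrt(x):
    return math.sqrt(x)

def sqrt_is_acceptable(result, x):
    """Acceptance test: a non-negative result whose square is close to x."""
    return result >= 0 and abs(result * result - x) < 1e-9

root = recovery_block([primary_sqrt, alternate_sqrt], sqrt_is_acceptable, 9.0)
print(root)  # 3.0
```

Unlike N-version programming, only one routine normally runs; the alternates cost execution time only when the primary result is rejected.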

Reliability Prediction: Overview • Why predict reliability or maintainability characteristics? • Evaluate feasibility of the design • Compare competing designs • Forecast logistics needs (spares, maintenance labor, test equipment, etc.) • Prediction methods • Comparative analysis • Parts Count • Stress Analysis • Translators

Reliability Prediction Using Comparability Analysis Useful when there is not enough data for more sophisticated prediction models. Example: Advanced Tactical Fighter (ATF) Concept Exploration Phase. (Note: the ATF became the F-22 aircraft.) To predict MTBM-Inherent for mechanical components of the Flight Control system: • Choose a “comparable” system: F-16 • Assess a reliability multiplier to account for changes since the comparable system was designed: 1.1 • Extract field data from the database for the comparable system: F-16 MTBM-Inherent = 105 hours • Multiply: Predicted MTBM-Inherent = (1.1) x (105 hours) = 115.5 hours, or approximately 116 hours

Reliability Prediction Parts Count Method: • Useful when the design is still soft • Need: list of parts, failure rates, and quantities
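A parts count prediction multiplies each part type's quantity by its failure rate and sums the results. The sketch below assumes a series model with constant failure rates; the parts list and failure rates are hypothetical and are not taken from any handbook.

```python
# Hypothetical parts list: (part type, quantity, failures per million hours each).
parts_list = [
    ("Resistor", 40, 0.002),
    ("Capacitor", 25, 0.005),
    ("Integrated circuit", 10, 0.05),
]

# Parts count: total failure rate is the sum of quantity x failure rate.
total_failure_rate = sum(quantity * rate for _, quantity, rate in parts_list)

# Failure rates are per million hours, so MTBF in hours is 1e6 / total.
predicted_mtbf_hours = 1_000_000 / total_failure_rate

print(f"Predicted failure rate: {total_failure_rate:.3f} failures per million hours")
print(f"Predicted MTBF: {predicted_mtbf_hours:,.0f} hours")
```

Because only a parts list is needed, the estimate can be refreshed cheaply each time the design changes.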

Reliability Prediction Stress Analysis using MIL-HDBK-217F (Reliability Prediction of Electronic Equipment) • Provides “failure rates” of electronic devices • Ignores factors that can degrade field failure rates (manufacturing variability; packaging, shipping; handling; operator/maintainer error) • So, not a good predictor of absolute field failure rates! • But, of some use as predictor of relative field failure rates

Reliability Prediction Using “Translators” to Predict Field Reliability for a Black Box From Contractual Test Data Given: Contractual test data for a black box (e.g., 4000 operating hours resulting in 4 “failures,” as defined in the contract specification) lead to an MTBF estimate of 1000 operating hours (OH) per “failure.” Question: What field reliability (MFHBR, mean flying hours between removals) can we predict? Answer: Assume the following “translators,” based on historical data for this type of black box. Translator 1 (converting operating hours to flying hours): On average, 1 out of every 2 operating hours is a flying hour. So, 1000 OH/“failure” = 500 flying hours (FH)/“failure.”

Reliability Prediction Using “Translators” (continued) Translator 2 (converting “failures” to removals): On average, 1 out of every 4 removals is for a “failure,” so there are 4 removals for each “failure.” So, 500 FH/“failure” = 500 FH/4 removals = 125 FH/removal. Therefore, MFHBR = 125 flying hours between removals.
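The two translators chain together as simple unit conversions. The numbers below come from the example; the variable names are illustrative.

```python
# Contractual test data from the example.
test_operating_hours = 4000.0
test_failures = 4
mtbf_operating_hours = test_operating_hours / test_failures  # 1000 OH per "failure"

# Translator 1: 1 out of every 2 operating hours is a flying hour.
flying_hours_per_operating_hour = 1 / 2
flying_hours_per_failure = mtbf_operating_hours * flying_hours_per_operating_hour  # 500

# Translator 2: 1 out of every 4 removals is for a "failure".
removals_per_failure = 4
mfhbr = flying_hours_per_failure / removals_per_failure  # 125

print(f"MTBF (test): {mtbf_operating_hours:.0f} OH per failure")
print(f"Field: {flying_hours_per_failure:.0f} FH per failure")
print(f"MFHBR: {mfhbr:.0f} flying hours between removals")
```

Both translators reduce the figure, which is why a field prediction is usually far lower than the contractual test MTBF.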

Reliability Prediction - Summary • Reasons for predicting reliability • Methods for predicting reliability • Comparability analysis • Parts count • Stress Analysis • Translators
