Design for Mechanical Reliability



Presentation Transcript



Chapter Objectives :

Chapter Objectives: introduce the need for design for reliability; list the main causes of reliability failures; explain how failures relate to their mechanisms; describe each failure; propose design guidelines against the failure.

What is Reliability?:

What is Reliability? Reliability is: The ability of an item to perform its required function under defined customer operating conditions for a stated period of time. The probability that no (system) failure will occur in a given time interval In research, the term reliability means "repeatability" or "consistency". A measure is considered reliable if it would give us the same result over and over again

Other Names of DFR:

Other Names of DFR DFR has many aliases: Design for Durability Design for Robustness Design for Useful Life

What do Reliability Engineers Do?:

What do Reliability Engineers Do? Implement Reliability Engineering Programs across all functions: engineering, research, manufacturing, testing, packaging, and field service.

What is Probability?:

What is Probability? Probability is: A measure that describes the chance or likelihood that an event will occur. The probability that event (A) occurs is represented by a number between 0 (zero) and 1. When P(A) = 0, the event cannot occur. When P(A) = 1, the event is certain to occur. When P(A) = 0.5, the event is as likely to occur as it is not.

Why Design for Reliability?:

Why Design for Reliability? Reliability can make or break the long-term success of a product. Reliability that is too high makes the product too expensive. Reliability that is too low drives up warranty and repair costs, and market share is lost as a result.

Cost-Reliability Functions:

Cost-Reliability Functions

What are Noise Factors?:

What are Noise Factors? Noise Factors are sources of disturbing influences that can disrupt the ideal function, causing error states which lead to quality problems.

Reliability Terms:

Reliability Terms Mean Time To Failure (MTTF), for non-repairable systems; Mean Time Between Failures (MTBF), for repairable systems; Reliability Probability (survival), R(t); Failure Probability (cumulative distribution function), F(t) = 1 - R(t); Failure Probability Density, f(t); Failure Rate (hazard rate), λ(t).


MTBF & MTTF Mean Time Between Failures – applies to repairable items. Mean Time To Failure – applies to non-repairable items. Both of these terms indicate the average time an item is expected to function before failure.

Important Relationships:

Important Relationships [slide equations not preserved in the transcript], where λ(t) is the failure rate function.
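As a hedged reconstruction, the standard textbook relationships among the quantities listed under Reliability Terms can be written as follows. These are not recovered from the slide itself, and the integration variable τ is introduced here only for illustration.

```latex
\begin{align}
F(t) &= 1 - R(t) \\
f(t) &= \frac{dF(t)}{dt} = -\frac{dR(t)}{dt} \\
\lambda(t) &= \frac{f(t)}{R(t)} \\
R(t) &= \exp\!\left(-\int_0^t \lambda(\tau)\,d\tau\right) \\
\mathrm{MTTF} &= \int_0^\infty R(t)\,dt
\end{align}
```

For a constant failure rate λ, the fourth relationship reduces to R(t) = exp(-λt) and the fifth gives MTTF = 1/λ.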

Serial reliability:

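Serial reliability System reliability is the simple product of the reliabilities (survival probabilities) of the components, assuming they fail independently. More components = less reliability. Parallel reliability The system fails only if every redundant component fails, so the system failure probability is the product of the component failure probabilities. This dramatically reduces the probability of failure.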
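A minimal sketch of the two arrangements, assuming independent component failures. The function names and the component reliabilities of 0.95 are hypothetical, chosen only to show how a series chain loses reliability while a parallel group gains it.

```python
from math import prod


def series_reliability(reliabilities):
    """Series system works only if every component works: R = product of R_i."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Parallel system fails only if every component fails:
    R = 1 - product of (1 - R_i)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical component reliabilities (not taken from the slides).
components = [0.95, 0.95, 0.95]

r_series = series_reliability(components)
r_parallel = parallel_reliability(components)

print(f"series:   {r_series:.6f}")
print(f"parallel: {r_parallel:.6f}")
```

With three identical components, the series value falls below any single component's reliability, while the parallel value rises above it.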

Reliability Failure Modes:

Reliability Failure Modes Failures may be SUDDEN (non-predictable) or GRADUAL (predictable). They may also be PARTIAL or COMPLETE . A Catastrophic failure is both sudden and complete. A Degradation failure is both gradual and partial. Two root causes: 1. lack of robustness 2. mistakes

Causes of Failure:

Causes of Failure Misuse – Failures attributable to the application of stresses beyond the stated capabilities of the item. Inherent Weakness – Failures attributable to weakness inherent in the item itself when subjected to stresses within the stated capabilities of the item.

Classifications of Reliability Failure:

Classifications of Reliability Failure Early stage failure – Causes of this type of failure are inadequate design, poor manufacturing, and inappropriate usage. These failures can be catastrophic to human life. Overstress Mechanisms – These occur due to an insufficient safety factor in design, higher than expected random loads, human errors, or misapplication. Wearout Mechanisms – These occur late in life and then increase with age. Causes include corrosion, material fatigue, poor maintenance, creep, and degradation in strength.

Failure mechanisms in microelectronic system packages:

Failure mechanisms in microelectronic system packages

Common Measures of Unreliability:

Common Measures of Unreliability % Failure - % of failures in a total population. MTTF (Mean Time To Failure) - the average time of operation to first failure. MTBF (Mean Time Between Failures) - the average time between product failures. Repairs Per Thousand (R/1000). Bq Life - life at which q% of the population will have failed.
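The measures above can be sketched numerically. The example below is illustrative only: the failure times and population counts are hypothetical, and the Bq calculation assumes a constant failure rate (exponential life distribution), which the slides do not specify.

```python
import math


def percent_failed(n_failed, n_total):
    """% Failure: share of the population that has failed."""
    return 100.0 * n_failed / n_total


def mean_time_to_failure(failure_times):
    """MTTF: average operating time to first failure (all units run to failure)."""
    return sum(failure_times) / len(failure_times)


def bq_life_exponential(q_percent, mttf_hours):
    """Bq life under an ASSUMED exponential model, F(t) = 1 - exp(-t / MTTF)."""
    return -mttf_hours * math.log(1.0 - q_percent / 100.0)


# Hypothetical test data: hours to failure for five units.
times = [1200.0, 1500.0, 900.0, 2000.0, 1400.0]

mttf = mean_time_to_failure(times)
b10 = bq_life_exponential(10.0, mttf)
pct = percent_failed(12, 400)

print(f"MTTF = {mttf:.1f} h, B10 = {b10:.1f} h, % failure = {pct:.1f}%")
```

Under the exponential assumption the B10 life is only about a tenth of the MTTF, which is one reason Bq figures are often quoted alongside mean-life figures.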

The Bathtub Curve:

The Bathtub Curve Reliability specialists often describe the lifetime of a population of products using a graphical representation called the bathtub curve. The bathtub curve consists of three periods: an infant mortality period with a decreasing failure rate followed by a normal life period (also known as "useful life") with a low, relatively constant failure rate and concluding with a wear-out period that exhibits an increasing failure rate.
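One common way to illustrate the three periods is with a Weibull hazard function, whose shape parameter controls whether the failure rate falls, stays constant, or rises. This is a sketch under that assumption; the slides do not name a distribution, and the shape and scale values below are hypothetical.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate: (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


SCALE = 1000.0  # hypothetical characteristic life, in hours

# Shape < 1: decreasing rate; shape = 1: constant rate; shape > 1: increasing rate.
regimes = {
    "infant mortality (shape 0.5)": 0.5,
    "useful life (shape 1.0)": 1.0,
    "wear-out (shape 3.0)": 3.0,
}

for label, shape in regimes.items():
    early = weibull_hazard(100.0, shape, SCALE)
    late = weibull_hazard(200.0, shape, SCALE)
    if late < early:
        trend = "decreasing"
    elif late > early:
        trend = "increasing"
    else:
        trend = "constant"
    print(f"{label}: failure rate is {trend}")
```

A real bathtub curve is a combination of all three behaviours over a product's life rather than a single Weibull curve.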


[Figure: probability of dying in the next year (deaths per 1,000) versus age. Source: Statistical Bulletin 79, no. 1, Jan-Mar 1998.]

Steps in Designing for Reliability:

Steps in Designing for Reliability Develop a Reliability Plan Determine Which Reliability Tools are Needed Analyze Noise Factors Tests for Reliability Track Failures and Determine Corrective Actions

Develop a Reliability Plan:

Develop a Reliability Plan Planning for reliability is just as important as planning for design and manufacturing. Why? To determine the useful life of the product and which accelerated life testing to use. Reliability must be as close to perfect as possible for the product's useful life. You MUST know where your product's major points of failure are!

Tools for testing:

Tools for testing Stress Analysis Reliability Predictions (MTBF) FMEA (Failure Mode and Effects Analysis) Fault Tree Analysis Reliability Block Diagrams

Why do Reliability Calculation?:

Why do Reliability Calculation? Reliability calculations quantify how reliable a product is; the resulting figures can be used as a selling feature by the marketing department. They also add to the company's reputation and can be used for comparisons with the competition.

Stress Analysis:

Stress Analysis It establishes the presence of a safety margin thus enhancing system life. Stress analysis provides input data for reliability prediction. It is based on customer requirements.

Reliability Predictions (MTBF):

Reliability Predictions (MTBF) MTBF (Mean Time between Failures) for an existing product can be found by studying field failure data. For a new product however, or if significant changes are made to the design, it may be required to estimate or calculate MTBF before any field data is available.

Failure mode and effects analysis (FMEA):

Failure mode and effects analysis (FMEA) Failure Mode: Consider each component or functional block and how it can fail. Determine the Effect of each failure mode, and its severity on system function. Determine the likelihood of occurrence and of detecting the failure. Calculate the Risk Priority Number (RPN = Severity x Occurrence x Detection). Consider corrective actions (these may reduce severity or occurrence, or increase the probability of detection). Start with the highest RPN values (highest-risk problems) and work down. Recalculate RPN after the corrective actions have been determined; the aim is to minimize RPN.
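The RPN ranking step can be sketched as follows. The failure modes, their scores, and the 1 to 10 rating scale are hypothetical assumptions for illustration; the slides give only the formula RPN = Severity x Occurrence x Detection.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # assumed scale: 1 (negligible) to 10 (hazardous)
    occurrence: int  # assumed scale: 1 (rare) to 10 (almost certain)
    detection: int   # assumed scale: 1 (easily detected) to 10 (undetectable)

    @property
    def rpn(self):
        """Risk Priority Number = Severity x Occurrence x Detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes and scores.
modes = [
    FailureMode("connector corrosion", severity=5, occurrence=3, detection=4),
    FailureMode("solder joint crack", severity=7, occurrence=4, detection=6),
    FailureMode("fan bearing wear", severity=6, occurrence=5, detection=3),
]

# Work down from the highest RPN.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)

for mode in ranked:
    print(f"{mode.name}: RPN = {mode.rpn}")
```

After corrective actions are chosen, the same scores can be updated and the list re-sorted to confirm that the RPN values have dropped.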

CASE STUDY: Network Storage Evaluations Using Reliability Calculations:

CASE STUDY: Network Storage Evaluations Using Reliability Calculations This section uses a case study to introduce concepts and calculations for systematically comparing redundancy and reliability factors as they apply to network storage configurations. We will determine a reliability figure on three very basic architectures. The starting point of our study is the network storage requirements.

Network Storage Requirements:

Network Storage Requirements We want networked storage that has access to one server. Later, this storage will be accessible to other servers. The server is already in place, and has been designed to sustain single component hardware failures (with dual host bus adapters (HBAs), for example). Data on this storage must be mirrored, and the storage access must also stand up to hardware failures. The cost of the storage system must be reasonable, while still providing good performance.

Architecture 1:

Architecture 1 Architecture 1 provides the basic storage necessities we are looking for with the following advantages and disadvantages: Advantages: Storage is accessible if one of the links is down. Storage A is mirrored onto B. Other servers can be connected to the concentrator to access the storage. Disadvantages: If the concentrator fails, we have no more access to the storage. This concentrator is a single point of failure (SPOF).

Architecture 2:

Architecture 2 Architecture 2 has been improved to take into account the previous SPOF. A concentrator has been added. Advantages: If any links or components go down, storage is still accessible (resilient to hardware failures). Data is mirrored (Disk A <-> Disk B). Other servers can be connected to both concentrators to access the storage space.

Architecture 3:

Architecture 3 The main difference is that Disk A and Disk B have only one data path. Disk A is still mirrored to Disk B, as required. This architecture has all the advantages of the previous architectures with the following differences: Disk A can only be accessed through Link C, and Disk B only through Link D. There is no data multi pathing software layer, which results in easier administration and easier troubleshooting.

Determining Reliability:

Determining Reliability Using the reliability formulas, we can determine which architecture has the highest reliability value. For the purpose of this article, we will use sample MTBF values (as obtained by the manufacturer) and AFR* (Annual Failure Rate) values shown in the table below. *The AFR for each component was calculated from the MTBF as AFR = 8760/MTBF, where 8760 is the number of hours in a year. The example MTBF values were taken from real network storage component statistics. However, such values vary greatly, and these numbers are given here purely for illustration.

Determining Reliability:

Determining Reliability

Component                        AFR Variable   Sample MTBF (hours)   AFR
HBA 1, HBA 2                     H              800,000               0.011
Link A, Link B                   L              400,000               0.022
Concentrator 1, Concentrator 2   C              580,000               0.0151
Link C, Link D                   L              400,000               0.022
Disk A, Disk B                   D              1,000,000             0.0088
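The AFR column can be reproduced from the sample MTBF values with the AFR = 8760/MTBF relationship used in this case study. The snippet below is a minimal sketch; the function and dictionary names are introduced here for illustration.

```python
HOURS_PER_YEAR = 8760


def afr_from_mtbf(mtbf_hours):
    """Annual failure rate from MTBF in hours: AFR = 8760 / MTBF."""
    return HOURS_PER_YEAR / mtbf_hours


# Sample MTBF values (hours) from the table, keyed by AFR variable.
sample_mtbf = {"H": 800_000, "L": 400_000, "C": 580_000, "D": 1_000_000}

afr = {name: afr_from_mtbf(hours) for name, hours in sample_mtbf.items()}

for name, value in afr.items():
    print(f"{name}: AFR = {value:.4f}")
```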

Determining Reliability:

Determining Reliability Having the rate of failure of each individual component, we can obtain the system's annual failure rate (AFR) and consequently the system reliability (R) and system MTBF values. The AFR values of redundant components are raised to a power equal to the number of redundant components. The AFR values of non-redundant components are multiplied by the number of those components in series.


Calculation In the case of Architecture 1, the concentrator (C) is the only non-redundant component.
AFR1 = (H+L)^2 + C + L^2 + D^2
AFR1 = (0.011+0.022)^2 + 0.0151 + (0.022)^2 + (0.0088)^2 ≈ 0.0167
R1 = 1 - AFR1 = 1 - 0.0167 = 0.9833, or 98.33%
MTBF1 = 8760/AFR1 = 8760/0.0167 = 524,551 hours.


Calculation Architecture 2 has a different configuration with no non-redundant components.
AFR2 = (H+L+C+L)^2 + D^2
AFR2 = (0.011+0.022+0.0151+0.022)^2 + (0.0088)^2 ≈ 0.0050
R2 = 1 - AFR2 = 1 - 0.0050 = 0.9950, or 99.50%
MTBF2 = 8760/AFR2 = 8760/0.0050 = 1,752,000 hours.


Calculation Architecture 3 has yet another configuration and has no non-redundant components.
AFR3 = (H+L+C+L+D)^2
AFR3 = (0.011+0.022+0.0151+0.022+0.0088)^2 ≈ 0.0062
R3 = 1 - AFR3 = 1 - 0.0062 = 0.9938, or 99.38%
MTBF3 = 8760/AFR3 = 8760/0.0062 = 1,412,903 hours.


Conclusion When the calculations are complete, we compare the data:
Architecture 1 = 98.33%, system MTBF = 524,551 hours
Architecture 2 = 99.50%, system MTBF = 1,752,000 hours
Architecture 3 = 99.38%, system MTBF = 1,412,903 hours
The MTBF figures are the most revealing, and indicate that Architecture 2 is statistically the most reliable of the three.
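The comparison can be re-run as a short script using the case study's AFR formulas and component values. This is a sketch only: the dictionary and variable names are introduced here, and because the script keeps unrounded AFR values, its MTBF figures differ slightly from the slide figures, which divide 8760 by an AFR shortened to four decimal places.

```python
HOURS_PER_YEAR = 8760

# Component AFR values from the case study table.
H, L, C, D = 0.011, 0.022, 0.0151, 0.0088

# System AFR formulas as stated for each architecture.
system_afr = {
    "Architecture 1": (H + L) ** 2 + C + L ** 2 + D ** 2,
    "Architecture 2": (H + L + C + L) ** 2 + D ** 2,
    "Architecture 3": (H + L + C + L + D) ** 2,
}

results = {}
for name, afr in system_afr.items():
    reliability = 1 - afr          # R = 1 - AFR
    mtbf = HOURS_PER_YEAR / afr    # MTBF = 8760 / AFR
    results[name] = (afr, reliability, mtbf)
    print(f"{name}: AFR = {afr:.4f}, R = {reliability:.2%}, MTBF = {mtbf:,.0f} h")

# The architecture with the lowest system AFR has the highest MTBF.
best = min(system_afr, key=system_afr.get)
print(f"Most reliable: {best}")
```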

Failure Effects (What customer experiences):

Failure Effects (What customer experiences) Noise Inoperability Instability Intermittent operation Roughness Excessive effort requirements Unpleasant or unusual odor Poor appearance

Factors Affecting Reliability:

Factors Affecting Reliability Design & Manufacture: Pre-Production Design, Control of Production, Working Tolerances, Material Quality, Component Quality, Component Stress. Installation & Environmental: Temperature, Humidity, Vibration, Chemical Attack, Interconnections.

Design against failure:

Design against failure It is important to understand the failure (why, where, how long, application, etc.). Two methods for design against failure: reducing the stress that causes the failure, or increasing the strength of the component. Either one can be achieved by selecting materials, changing the package geometry, changing the dimensions, or adding protection.

What is Fatigue?:

What is Fatigue? Fatigue is the most common mechanism of failure and is often cited as responsible for about 90% of all structural and electrical failures. It occurs in metals, polymers, and ceramics. Metal paper clip example: bend it in both directions and repeat the process until it breaks.

Typical Fatigue Load Cycle:

Typical Fatigue Load Cycle Stress vs. time, showing the max and min stress, the stress range ΔS, and the stress amplitude S_a. Fatigue cycle – successive maxima/minima in load or stress. The number of fatigue cycles to failure is denoted N_f. The number of fatigue cycles per second is the cyclic frequency. The average of the max and min stress is the mean stress, S_mean.
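The load-cycle quantities above follow directly from the maximum and minimum stress. The sketch below uses hypothetical stress values in MPa and a hypothetical N_f and frequency; the function names are introduced here for illustration.

```python
def load_cycle_parameters(s_max, s_min):
    """Return (stress range, stress amplitude, mean stress) for one cycle."""
    stress_range = s_max - s_min        # delta S
    amplitude = stress_range / 2        # S_a, half the range
    mean = (s_max + s_min) / 2          # S_mean, average of max and min
    return stress_range, amplitude, mean


def time_to_failure_seconds(cycles_to_failure, frequency_hz):
    """Time to reach N_f cycles at a given cyclic frequency (cycles per second)."""
    return cycles_to_failure / frequency_hz


# Hypothetical cycle: +120 MPa maximum, -40 MPa minimum.
delta_s, s_a, s_mean = load_cycle_parameters(120.0, -40.0)
print(f"range = {delta_s} MPa, amplitude = {s_a} MPa, mean = {s_mean} MPa")

# Hypothetical life: N_f = 1,000,000 cycles at 10 cycles per second.
seconds = time_to_failure_seconds(1_000_000, 10.0)
print(f"time to failure = {seconds:.0f} s")
```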

Design Against Fatigue Failure:

Design Against Fatigue Failure Increase fatigue strength. Reduce the amplitude of cyclic loading. Avoid stress concentration regions.

Design Against Brittle Fracture:

Design Against Brittle Fracture Brittle fracture is an overstress failure mechanism that occurs rapidly with little or no warning when the induced stress in the component exceeds the fracture strength of the material. It occurs in brittle materials (ceramics, glasses, and silicon). Applied stress and work can break the atomic bonds.

Design Guidelines to Reduce Brittle Fracture:

Design Guidelines to Reduce Brittle Fracture Choose materials and processing conditions that produce the least stress in brittle materials. Polish the brittle material to remove surface flaws and enhance reliability.

Design Against Creep Failure:

Design Against Creep Failure What is Creep? A time-dependent deformation process under load. Thermally-activated process: the rate of deformation for a given stress level increases significantly with temperature. Deformation depends on The applied load. The duration through which the load is applied Elevated temperature

Design Against Creep Failure:

Design Against Creep Failure Creep can occur at any stress level. Creep is most important at elevated temperatures. Creep fatigue failure in a lead/tin solder circuit board connection

Design Guidelines to Reduce Creep-Induced Failure.:

Design Guidelines to Reduce Creep-Induced Failure. Use materials with high melting point if the application calls for harsh temperature conditions. Reduction of mechanical stress will reduce creep deformation. Creep is a time controlled phenomenon.

Design Against Plastic Deformation:

Design Against Plastic Deformation What is Plastic Deformation? Plastic deformation occurs when the applied mechanical stress exceeds the elastic limit or yield point of a material. It is permanent. Excessive deformation and continued accumulation of plastic strain due to cyclic loading will eventually lead to cracking of the component and make it unusable.

Design Guidelines Against Plastic Deformation:

Design Guidelines Against Plastic Deformation Limit the design stresses in the packaging structure below the yield strength of the materials used. If possible, use materials that have high yield strength. Design and control the local plastic deformation at regions of stress concentrations.
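The first guideline amounts to checking the design stress against the yield strength with some margin. The sketch below is illustrative only: the stress values and the required safety factor of 1.5 are hypothetical assumptions, not figures from the slides.

```python
def yield_safety_factor(yield_strength, design_stress):
    """Ratio of yield strength to design stress (both in the same units)."""
    return yield_strength / design_stress


def is_within_elastic_limit(design_stress, yield_strength, required_factor=1.5):
    """True if the design stress stays below yield by the required factor.

    The default factor of 1.5 is an assumed value for illustration.
    """
    return yield_safety_factor(yield_strength, design_stress) >= required_factor


# Hypothetical material with a yield strength of 250 MPa.
YIELD_MPA = 250.0

ok_design = is_within_elastic_limit(150.0, YIELD_MPA)
marginal_design = is_within_elastic_limit(200.0, YIELD_MPA)

print(f"150 MPa design acceptable: {ok_design}")
print(f"200 MPa design acceptable: {marginal_design}")
```

A design that fails this check can be addressed either by lowering the stress or by choosing a material with a higher yield strength, matching the two guidelines above.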

Chemically Induced Failures:

Chemically Induced Failures What are Chemically Induced Failures? Chemical processes such as electrochemical reactions can result in cracking of vias, traces, or interconnects, leading to electrical failures. Two types: corrosion and intermetallic diffusion.

Design Against Corrosion-Induced Failure:

Design Against Corrosion-Induced Failure What is Chemical Corrosion? The chemical or electrochemical reaction between a material, usually a metal, and its environment that produces a deterioration of the material and its properties.

Design Guidelines to Reduce Corrosion:

Design Guidelines to Reduce Corrosion Metals with a high oxidation potential tend to corrode faster. Use hermetic packages to prevent moisture absorption. Ensure that no moisture or contaminants are trapped during the processing and assembly of the packages.

Design Against Intermetallic Diffusion:

Design Against Intermetallic Diffusion What is Intermetallic Diffusion? During wirebonding and solder reflow, the joining process generates intermetallic layers which are byproducts of the joining process.

Design Guidelines Against Intermetallic Diffusion:

Design Guidelines Against Intermetallic Diffusion Limit the process temperatures and control the time exposed to high temperatures during the joining process. Control the temperature range and cycles of exposure at the high temperature period. Apply a nickel/gold coating to the bare copper pad surfaces.

Achieving reliability growth:

Achieving reliability growth Detect failure causes Feedback Redesign Improved fabrication Verification of redesign


References: A.D.S. Carter, "Mechanical Reliability and Design"; Charles O. Smith, "Introduction to Reliability in Design".
