MAP Project Reliability Program Overview


MAP Project Reliability Program Overview : 

MAP Project Reliability Program Overview
Michael Bay, MAP System Engineer
Michael.Bay@gsfc.nasa.gov
Jackson and Tull, Aerospace Engineering Division
How FMEA, FTA, RBD and PRAs were used on MAP, and how they fit into overall mission success in the Faster Better Cheaper environment.
9 August 2000, with updates September 2000

Current Environment : 

Current Environment

How to Do It? : 

How to Do It?
Understand:
- What makes a Mission Successful?
  - Proper Execution of the Basics in Engineering and Project Management
  - Attention to Detail
  - Appropriate Discipline and Rigor
  - Test, Test, and Retest. Retest after changes.
- What makes a Mission Unsuccessful?
  - Recent Failures have not been due to the unexpected behavior of a breakthrough new technology
  - Recent Failures have been due to missing an important detail at more than one level of assembly or test.
  - Most of the Recent Failures would not have been caught by classical FMEAs, FTAs or PRAs alone. However, it is the mindset, the systems thinking, and all the questions asked in the process that would surface the issues.
- The Devil is in the Details
- Expect the Unexpected
- Do not become complacent
- Beware of inappropriate application of “Heritage” and assumptions that do not fit

How to do More with Less : 

How to do More with Less
- Do not forget the basics of what goes into a Successful Mission
- Understand the Risks in both the Flight/Ground Segments and Project Execution
- Important Distinction for Risk Management
  - Flight/Ground Segments are the end products performing their desired function in their operational environment
    - FMEA, FTA, RBD and PRAs are good tools here
  - Project Execution is the ability to deliver the desired product meeting requirements, on time and within cost.
    - Classical FMEA, FTA, RBD do not apply here, although the techniques could be applied
    - PRAs are appropriate here

Orchestrating a Balance : 

Orchestrating a Balance

Risk Management : 

Risk Management
- Risk is the Uncertainty of Performance, Reliability, Cost or Schedule
- To quantify risk, look at the likelihood and consequence of an event
- Risk Management
  - What Can Go Wrong?
  - How Will We Know Something Has Gone Wrong?
  - When Will We Know that Something Has Gone Wrong?
  - What Will We Do About It?
- Expect the Unexpected so that The Unexpected Becomes Expected
- These Questions Asked Globally Every Day from Design through Manufacturing, Test and Operations will do the most to Assure Mission Success
- Recent Red Team Activities ask these types of questions
  - Try to quantify Risk
  - Attempt to identify if anything was missed in the basics
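
The slide's advice to quantify risk from likelihood and consequence can be sketched as a small risk register. This is only an illustration, not the MAP project's tooling: the `Risk` class, the 1 to 5 consequence scale, the product used as a score, and every entry in the register are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: float   # assumed scale: probability between 0 and 1
    consequence: int    # assumed scale: 1 (negligible) to 5 (mission loss)

    def score(self) -> float:
        # Simple product of likelihood and consequence (an assumed metric).
        return self.likelihood * self.consequence


def rank_risks(risks):
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score(), reverse=True)


# Hypothetical entries, invented purely for illustration.
register = [
    Risk("harness short", likelihood=0.01, consequence=5),
    Risk("late delivery", likelihood=0.30, consequence=2),
    Risk("sensor drift", likelihood=0.10, consequence=1),
]

ranked = rank_risks(register)
for risk in ranked:
    print(f"{risk.name}: {risk.score():.2f}")
```

With these made-up numbers the rare but mission-ending item ranks last, which suggests a plain product should not be the only view: a real register would probably also flag the highest consequence class on its own.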

MAP Observatory : 

MAP Observatory
- Microwave Anisotropy Probe
- Map the Cosmic Microwave Background Radiation
- Follow on to COBE with 50 times the resolution
- Medium Size Explorer, MIDEX
- Operate at L2, Store and forward data every day
- 3 Axis, Scan Sky at 1 revolution every 2 minutes
- 840 kg Estimate, Approx 3.6 meters tall, 5.1 meters across
- 400 Watts, 72 kg Fuel

MAP Reliability Philosophy : 

MAP Reliability Philosophy
- Maximize Science Return for given Cost and Schedule
- AO Direction
  - Due to cost limits systems predominantly nonredundant or “single string”
  - Selective redundancy encouraged where resources allow
  - Redundancy up to each mission and the PI
- Reliability Designed in from the Beginning
- MAP Assurance Requirements Cover Total Program:
  - Design with proper parts application
  - Grade 3 Parts program selected as best value for the MAP Program
  - Workmanship Inspection program to NHB5300 or equivalent
  - A Peer Reviewed, Simple and Robust Design providing graceful degradation
  - Failure Modes and Effects Analysis, Fault Tree Analysis, Reliability Predictions and Probabilistic Risk Assessment used to identify mission ending failures, designs adjusted where possible to shift effect from “mission ending” to “degraded mission”
  - Test program accumulating significant mission specific test time
  - Constant drive to identify and strengthen “weak links” to mission success

Identify Weak Links : 

Identify Weak Links

The Basics, Launch Readiness Flow : 

The Basics, Launch Readiness Flow

Reliability Analysis Flow : 

Reliability Analysis Flow

Reliability Improvement Approach : 

Reliability Improvement Approach
- Identify Weak Links
  - Estimate failure rates for each subsystem element (component or card)
    - Compute failure rates using MIL-HDBK-217 techniques
    - Collect measured failure rates from flight, life test, or vendor data
    - Average Failure Rates where multiple sources exist
  - Evaluate effects/consequence of failure, revisit Failure Modes and Effects Analysis
- Evaluate possible mitigation approaches
  - Total Redundancy
  - Minimal “point design” hardware to augment existing system
  - “Back door” paths to allow backup functions
- Compute System Reliability improvement for each mitigation approach
- Study resources necessary to implement each mitigation approach
  - Mass, Power, Cost, Parts Availability, Schedule, ability to “descope”, Manpower
- Select mitigation approaches to maximize efficiency of total program
  - Total System Reliability Improvement vs Required Resources
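
The approach above (average the failure-rate estimates, compute the system reliability gain for each mitigation, then weigh that gain against the resources it needs) can be sketched roughly as follows. Everything numeric here is hypothetical: the failure rates, the mission duration, the resource costs, the constant-failure-rate model R = exp(-λt), and the choice of full redundancy as the only mitigation are assumptions for illustration, not MAP data or the project's actual method.

```python
import math
from statistics import mean

MISSION_HOURS = 2 * 8760  # hypothetical two-year mission


def reliability(rate_per_hour, hours=MISSION_HOURS):
    # Constant failure rate (exponential) model, an assumption of this sketch.
    return math.exp(-rate_per_hour * hours)


def series(reliabilities):
    # Single string: every element must work.
    return math.prod(reliabilities)


def redundant_pair(r):
    # Two identical units in parallel; independent failures assumed.
    return 1.0 - (1.0 - r) ** 2


# Hypothetical failure-rate estimates (failures/hour) from several sources.
estimates = {
    "transponder": [2.0e-6, 3.0e-6],
    "power_electronics": [1.0e-6],
    "attitude_electronics": [4.0e-6, 6.0e-6],
}
rates = {name: mean(values) for name, values in estimates.items()}
parts = {name: reliability(rate) for name, rate in rates.items()}
baseline = series(parts.values())

# Hypothetical resource cost of making each element redundant (arbitrary units).
costs = {"transponder": 3.0, "power_electronics": 5.0, "attitude_electronics": 4.0}


def improvement(name):
    """System reliability gain if `name` alone is made redundant."""
    upgraded = dict(parts)
    upgraded[name] = redundant_pair(parts[name])
    return series(upgraded.values()) - baseline


ranking = sorted(costs, key=lambda n: improvement(n) / costs[n], reverse=True)
print(f"baseline reliability: {baseline:.3f}")
for name in ranking:
    print(f"{name}: gain {improvement(name):.4f} at cost {costs[name]}")
```

Ranking by gain per unit of resource is one possible reading of "Total System Reliability Improvement vs Required Resources"; the slide does not say which figure of merit was used.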

Probabilistic Risk Assessment : 

Probabilistic Risk Assessment

Reliability Process Overview : 

Reliability Process Overview
- Design
  - FMEA to distinguish mission threatening failures from mission degradation
  - Revise designs to convert “loss of mission” failures into “loss of function” or mission degradation
  - Reliability failure rate analysis used to weigh the relative benefit of one design implementation versus another
  - Verification of proper parts application in design
  - Peer Review Process
- Manufacturing
  - Workmanship Inspection to verify as built hardware meets designer’s intent
  - Materials and Process Control
- Testing
  - Verify as built hardware meets designer’s requirements in the intended application
  - Sufficient Test time to find infant mortality failures
- Operations
  - Onboard Fault Detection and Correction to safe the spacecraft and give the ground time to react and potentially recover from an anomaly
  - Operational Contingency Procedures and Backup Plans for mission critical and recoverable failures
- Reliability Philosophy and MAP Mission Assurance Requirements communicated to MAP Hardware Suppliers (Very important to assure a supplier is not a weak link)

Reliability Process Design and Analysis Phase : 

Reliability Process Design and Analysis Phase
1. Perform System Level FMEA and FTA to determine failures that result in mission loss versus mission degradation.
2. Adjust design or implementation such that failures categorized as mission loss are moved to the degraded mission category. The overall goal is to reduce the number of potential mission failures.
3. Reliability failure rate analysis and Reliability Block Diagrams are used to weigh the relative benefit of one design implementation versus another.
4. Where failures result in graceful degradation and require rapid ground intervention or changes in operational plans to save the spacecraft, prepare contingency procedures or software loads to implement them.
5. Critically review the design of the spacecraft power bus. A short on the primary power bus can take out the whole spacecraft. The power bus is designed such that shorts are considered not credible.
6. Peer Review process for both Hardware and Software to identify potential design and/or implementation problems.

Design and Analysis Phase, Failure Modes and Effects Analysis : 

Design and Analysis Phase, Failure Modes and Effects Analysis
- Reliability and Failure Modes and Effects Analysis have different goals for redundant and single string spacecraft. As a single string spacecraft, MAP strives to minimize the effects of a failure, whereas a redundant spacecraft strives to avoid single point failures.
- For a single string mission, a large number of faults can result in mission loss. However, there are also many failures that may result in partial loss of function or in a reduction in performance. These types of failures result in “graceful degradation”. Look at interfaces and down to the circuit level.
- A redundant spacecraft design focuses primarily on preventing single point failures and focuses less on designing in graceful degradation. The analysis usually stops at interfaces to assure faults do not propagate to the redundant unit.
- For MAP, designing in graceful degradation is much more important since there are minimal redundant units available for backup.
- The FMEA is synchronized with the Fault Tree at the major component functional level (e.g. Transponder Receiver, ACE Safehold, PSE Load Switching)
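
One way to picture the single string FMEA goal (shift failures from mission loss toward graceful degradation) is a worksheet keyed by component function. The sketch below is hypothetical throughout: the `FailureMode` record, the two effect categories as an enum, and the failure modes and effects assigned to each function are invented for illustration and are not taken from the MAP FMEA.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Effect(Enum):
    MISSION_LOSS = "mission loss"
    DEGRADED = "degraded mission"


@dataclass(frozen=True)
class FailureMode:
    function: str   # major component function, the level shared with the fault tree
    mode: str
    effect: Effect


# Hypothetical worksheet rows; the effects assigned here are made up.
worksheet = [
    FailureMode("Transponder Receiver", "receiver output lost", Effect.MISSION_LOSS),
    FailureMode("PSE Load Switching", "one load switch stuck open", Effect.DEGRADED),
    FailureMode("ACE Safehold", "safehold mode unavailable", Effect.DEGRADED),
]


def mission_ending(rows):
    """Rows whose effect is still categorized as mission loss."""
    return [row for row in rows if row.effect is Effect.MISSION_LOSS]


def apply_design_change(rows, function):
    """Return a new worksheet where a design change downgrades `function` to degraded."""
    return [
        replace(row, effect=Effect.DEGRADED) if row.function == function else row
        for row in rows
    ]


revised = apply_design_change(worksheet, "Transponder Receiver")
print(len(mission_ending(worksheet)), "->", len(mission_ending(revised)))
```

Counting the rows left in the mission loss category before and after each design change gives a simple measure of progress toward the goal stated on the slide.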

Integrated Mission Fault Tree : 

Integrated Mission Fault Tree
- White Box: Failure Propagation
- Red Colored Box: Single Point Failures
- Yellow Colored Box: Graceful Failures
- Green Colored Box: Redundancy Failures
- Yellow Outline Box: Ground Contingency Procedure
- Blue Outline Box: Onboard FDC

Design and Analysis Phase, Fault Tree Analysis : 

Design and Analysis Phase, Fault Tree Analysis
- Fault Tree Starts with “Loss of Mission” as the top block.
- Key to this Top Block is Understanding what defines a mission loss, Mission Success Criteria
- Knowing the Design of the System and how it will be operated, Postulate the faults that could result in loss of mission.
- Faults are logically combined and further decomposed until the lowest desired level is reached.
- Lowest level should overlap and be consistent with the FMEA. Typically the component major function level. (e.g. Transponder Receiver, ACE Safehold, PSE Load Switching)
- Requirements for Contingency procedure and Onboard Autonomous Switching (Fault Detection and Correction) should be included to show where action is required.
- The Fault Tree provides a graphical format for organizing postulated failures, understanding their consequence on the system, and understanding their relationship to other systems and subsystems
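
The logical combination of faults under a “Loss of Mission” top block can be sketched with AND and OR gates. This is a minimal illustration, assuming independent basic events and made-up probabilities; the `Event` and `Gate` classes and the tree shape are hypothetical and do not reproduce the MAP fault tree.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Event:
    """Basic event at the lowest level of the tree."""
    name: str
    probability: float

    def p(self) -> float:
        return self.probability


@dataclass
class Gate:
    """Logical combination of child events or gates."""
    name: str
    kind: str                      # "AND" or "OR"
    children: list = field(default_factory=list)

    def p(self) -> float:
        probs = [child.p() for child in self.children]
        if self.kind == "AND":     # all children must fail
            return prod(probs)
        if self.kind == "OR":      # any one child failing is enough
            return 1.0 - prod(1.0 - q for q in probs)
        raise ValueError(f"unknown gate kind: {self.kind}")


# Hypothetical tree: one single point failure, plus a function with a backup path.
top = Gate("Loss of Mission", "OR", [
    Event("single string element fails", 0.02),
    Gate("function and its backup both fail", "AND", [
        Event("primary path fails", 0.10),
        Event("backup path fails", 0.10),
    ]),
])

print(f"P(loss of mission) = {top.p():.4f}")
```

The AND gate shows why a backup path helps: the branch contributes the product of two probabilities instead of one of them alone.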

MAP Reliability Block Diagram : 

MAP Reliability Block Diagram

Design and Analysis Phase, Reliability Block Diagram : 

Design and Analysis Phase, Reliability Block Diagram
- Uncertainty in the absolute number of a total mission reliability prediction. Large error bars.
- Relative comparisons between approaches or implementations are fairly good
  - Indicates the relative improvement of redundancy
  - Comparisons allow selection of more reliable solutions
- Computations based on Schematics and MIL-HDBK-217
  - Some historical data available from operations database
  - Averaging of computations and historical data possible
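
The point that absolute predictions carry large error bars while relative comparisons stay useful can be illustrated with a small calculation. The sketch below assumes a constant failure rate model and independent failures; the nominal rate, the mission duration, and the factor of three error band are hypothetical values chosen only to show the effect.

```python
import math

MISSION_HOURS = 2 * 8760   # hypothetical two-year mission
NOMINAL_RATE = 5.0e-6      # hypothetical failures per hour for one unit


def unit_reliability(rate, hours=MISSION_HOURS):
    # Constant failure rate (exponential) model, an assumption of this sketch.
    return math.exp(-rate * hours)


def single_string(rate):
    return unit_reliability(rate)


def redundant_pair(rate):
    # Two identical units in parallel; independent failures assumed.
    r = unit_reliability(rate)
    return 1.0 - (1.0 - r) ** 2


results = {}
for scale in (0.3, 1.0, 3.0):   # assumed error band on the failure rate
    rate = NOMINAL_RATE * scale
    results[scale] = (single_string(rate), redundant_pair(rate))
    single, redundant = results[scale]
    print(f"rate x{scale}: single={single:.3f} redundant={redundant:.3f}")
```

The absolute numbers move considerably across the assumed band, yet the redundant option comes out ahead at every point, which is the kind of relative conclusion the slide describes as fairly good.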

Reliability Improvement Study Results : 

Reliability Improvement Study Results

Representative FMEA/PRA Summary : 

Representative FMEA/PRA Summary

PRA, Graphical Tree Format : 

PRA, Graphical Tree Format
- White Box: Failure Propagation
- Red Colored Box: High Risk Failure
- Yellow Colored Box: Medium Risk Failure
- Green Colored Box: Low Risk Failures
- Yellow Outline Box: Ground Contingency Procedure
- Blue Outline Box: Onboard FDC

Design and Analysis Phase, New Technology and Mission Success : 

Design and Analysis Phase, New Technology and Mission Success
- Select Approach to Mitigate New Technology Risks
  - Risks to Technical Performance in End Item Application
  - Risks to Project Execution
- Use Risk Management Techniques to weigh benefit of new technology versus the consequence of it not being ready or not working.
- Mitigation Steps
  - Test working hardware/software as soon as possible
  - Early verification through Engineering Test Units (ETUs)
  - Define Alternate or Backup Sources
  - Descope Plan - Prepare to scale back to minimum mission requirements

Reliability Process (cont.) Manufacturing and Inspection Phase : 

Reliability Process (cont.) Manufacturing and Inspection Phase
1. Failures are viewed as mechanical. Whenever an item fails it usually means that something moved, whether internal to a chip, on a circuit card or in a harness. If it worked once and then does not, something moved. (EMI is the exception.)
2. Stress relief against vibration, mechanical motion, and thermal expansion.
3. Clearance to protect against shorts. Close inspection as lower level subassemblies are assembled.
4. The power system electronics are carefully inspected during assembly to screen for potential shorts. Shorts on the power bus are considered not credible following inspection.
5. Eliminate sources and provide barriers to contamination that could cause shorts or degrade the surface properties of instruments or thermal control surfaces.
6. Walkdowns and Inspections for critical items dependent on workmanship. RF Shields and grounding for ESD protection are examples.
7. Manufacturing process control and inspection are as important as they are on a redundant spacecraft. Manufacturing process control may even be more important because there is only one chance to get it right.

Reliability Process (cont.) Test Phase : 

Reliability Process (cont.) Test Phase
1. Accumulate sufficient test time to gain confidence that the infant mortality failure period has passed. Goal is on the order of ≈1000 hours total with the last 100 failure free.
2. Test and/or execute the sequences planned for the mission. Perform steps and send commands in the expected sequence with the expected timing.
3. Command sequences are verified prior to first time execution on orbit. If a sequence is performed on orbit for the first time, analysis should exist that indicates the item will work. Items are tested in “pieces or in steps” instead of relying on analysis alone.
4. Critically test flight and ground software against requirements as well as the intended end item function independent of the “requirements”.
5. Exercise the hardware and software together during environmental test in the modes they are operated during the mission.
6. Specifically seek out “What is not Tested in Flight Configuration”. Review assumptions made in verification program especially where verification is accomplished in pieces or by simulation.
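
The test time goal in item 1 (on the order of 1000 hours total, with the last 100 failure free) can be written as a simple check. The function name, its arguments, and the way failures are logged as accumulated test hours are assumptions made for this sketch; the thresholds are the goal values quoted on the slide.

```python
def infant_mortality_goal_met(total_hours, failure_times,
                              min_total=1000.0, failure_free=100.0):
    """Check accumulated test time against the stated goal.

    total_hours:   accumulated powered test time, in hours
    failure_times: accumulated-hour marks at which failures were logged
    """
    if total_hours < min_total:
        return False
    # No failure may fall inside the final `failure_free` hours of testing.
    window_start = total_hours - failure_free
    return all(t <= window_start for t in failure_times)


# Hypothetical test logs, invented for illustration.
print(infant_mortality_goal_met(1200.0, [50.0, 400.0]))   # early failures only
print(infant_mortality_goal_met(1200.0, [1150.0]))        # failure in last 100 hours
print(infant_mortality_goal_met(900.0, []))               # not enough total time
```

A failure inside the final window would, under this reading of the goal, restart the count of failure free hours rather than end the test program.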

Basics, What is not Tested? : 

Basics, What is not Tested?
- Identify items that cannot be tested in the flight configuration and environment
- Assure that simulations and assumptions are appropriate
- Typical areas applicable to most projects
  - End-to-end Instrument Optical or RF check at flight temperature (Tested in pieces)
  - Loaded Propulsion System and Thruster firings (Component Test)
  - Solar Array deployment in zero G, vacuum, and temperature with gradients (Tested in pieces)
  - Power System working with illuminated solar arrays (Verified against simulator with sufficient margin)
  - ACS Operating in closed loop with flight hardware and flight software (Hardware tested with stimulators open loop, software verified with HDS closed loop, sensor actuator end to end phasing verification)
  - Launch, ascent, separation, and acquisition sequence with the correct timing of external environmental events: Vibration, Thermal, Vacuum, Solar, etc. (Verified in pieces)
  - Inability to test radiation (SEU in particular) environment (Parts testing based on design & engineering judgement)
  - Inability to test surface or internal charging environment (Materials usage and testing based on design & engineering judgement)

Reliability Process (cont.) Operations Phase : 

Reliability Process (cont.) Operations Phase
1. Utilize a simple subset of the total Spacecraft electronics suite to provide an ACS Safehold that allows additional time for the ground to recover from an anomaly
2. Onboard failure detection to minimize the impact of mission threatening anomalies
3. Spacecraft informs ground of serious off nominal situations
4. Contingency procedures prepared for critical subsystems and mission events
5. Ground system capable of identifying adverse trends and/or off nominal performance
6. Training and exercising of the flight and ground systems during prelaunch mission simulations
7. Separation/Deployment and Propulsion Maneuvers performed within ground contact

Summary : 

Summary
- Overall Reliability process addresses total program lifecycle including: Design, Manufacturing, Test and Operations phases
- Reliability built in from the beginning
- FMEA, FTA, RBD, and PRA used as tools in an overall Reliability Assurance Program to optimize the architecture and design
- The PRA is maintained and updated with test results and other changes throughout the project life cycle
- Failure mitigators address Moving Parts, Parts Application, Environments, Software/Operations, Workmanship, and Random failure causes
- As part of the total reliability program MAP has implemented designs that provide graceful degradation and backups in selective areas
- MAP has achieved a balance of Performance, Reliability, Cost and Schedule within the available resources

Acronym List : 

Acronym List
ACS    Attitude Control System
ACE    Attitude Control Electronics
AEU    Analog Electronics Unit (Part of Instrument Electronics)
CSS    Coarse Sun Sensors
DEU    Digital Electronics Unit (Part of Instrument Electronics)
FBC    Faster, Better, Cheaper
FMEA   Failure Modes and Effects Analysis
FTA    Fault Tree Analysis
LVPC   Low Voltage Power Converter
MAC    MIDEX Attitude and C&DH
MAP    Microwave Anisotropy Probe
PDU    Power Distribution Unit (Part of Instrument Electronics)
PRA    Probabilistic Risk Assessment
PSE    Power System Electronics
RBD    Reliability Block Diagram
XRSN   Transponder Remote Services Node