DOT&E Reliability Course Overview
Laura Freeman
Matthew Avery
Jonathan Bell
Rebecca Dickinson
10 January 2018
1/10/2018-1
Course Objective and Overview
Objective
• Provide information to assist DOT&E action officers in their review and assessment of
system reliability.
Overview and Agenda
• Course briefings cover reliability planning and analysis activities that span the acquisition
life cycle. Each briefing discusses review criteria relevant to DOT&E action officers
based on DoD policies and lessons learned from previous oversight efforts
1/10/2018-2
Motivation for Improving System Reliability
[Chart: life-cycle cost breakdown by platform type, with the majority of cost falling in O&S rather than RDT&E: Ground Combat 4% / 28% / 68%; Rotary Wing 4% / 31% / 65%; Surface Ships 1% / 39% / 60%; Fighter Aircraft 5% / 29% / 66%. A companion chart shows the percentage of systems DOT&E reports to Congress annually that were assessed at least partially suitable and at least partially reliable.]
a. RDT&E – Research, Development, Test & Evaluation; b. O&S – Operations and Sustainment; c. Data from AEC/AMSAA Reliability Course Notes, 21 Aug 2011
1/10/2018-3   a. CI – Confidence Interval; b. FY – Fiscal Year; c. OT&E – Operational Test and Evaluation
Motivation for Improving System Reliability
[Bar chart: number of programs assessed in 2017 by suitability outcome (Yes / Mixed / No / Insufficient Data), with availability (9), reliability (7), interoperability (4), and usability (2) noted as the issues most frequently affecting the outcome.]
1/10/2018-4
Design for Reliability (DfR)
Reliability must be designed into the product from the beginning.
A common problem: failure to reach the desired initial system reliability, indicating a failure in the design phase to engineer reliability into the system.
[Chart (assessment years 2013-2015): the percentage of programs that could reasonably demonstrate their reliability requirement (L/R ≥ 30).]
Requirements and Reliability Growth guidance
• Relatively unchanged from TEMP Guidebook 2.1
Reliability Test Planning guidance
• New section of the TEMP Guidebook
• Emphasizes the use of operating characteristic curves for planning operational tests
• Provides guidance on using data collected outside of an operational test for reliability assessments
1/10/2018-7
Topics Covered
IDA Reliability Course Topics:
• RAM Requirements Review
• Reliability Growth Planning
• Importance of Design Reviews in Reliability Growth Planning
• TEMP Review and OT Planning
• Assessment of Reliability in DT
Acronyms: BLRIP – Beyond Low Rate Initial Production; CDD – Capabilities Development Document; CDR – Critical Design Review; CPD – Capabilities Production Document; EMD – Engineering & Manufacturing Development; FOC – Full Operational Capability; IOC – Initial Operational Capability; LRIP – Low Rate Initial Production; RAM – Reliability, Availability, Maintainability; SRR – Systems Requirement Review; PDR – Preliminary Design Review
1/10/2018-8
Topics Covered (cont.)
1/10/2018-9
Institute for Defense Analyses
4850 Mark Center Drive • Alexandria, Virginia 22311-1882
1/10/2018-11
Reliability: the ability of an item to perform a required function,
under given environmental and operating conditions and for a
stated period of time
(ISO 8402, International Standard: Quality Vocabulary, 1986)
1/10/2018-12
Failures come in different levels of severity, which should be
clearly defined by the Failure Definition Scoring Criteria
Operational Mission Failure (OMF) or System Abort (SA): failures that result
in an abort or termination of a mission in progress
– Reliability requirements are typically written in terms of OMFs or SAs.
1/10/2018-13
Traditional reliability analysis assumes that failure rates are constant over time,
although this is often not the case
1/10/2018-14
Timeline
System Acquisition Framework
[Diagram: acquisition framework showing Milestones A, B, and C, IOC, and FOC across Materiel Solution Analysis, Technology Development, Engineering & Manufacturing Development, Production & Deployment, and Operations & Support, with the CDD, CPD, SRR, PDR, CDR, Materiel Development Decision, Pre-EMD Review, Post-CDR Assessment, and FRP Decision Review marked; the IDA Reliability Course Topics are mapped onto this timeline.]
1/10/2018-15
Topics Covered
1/10/2018-16
Requirements are often established early in a program’s life,
so AO involvement should start early, too
The first step in acquiring reliable systems is ensuring that they have
achievable, testable, and operationally meaningful reliability requirements
1/10/2018-17
The way you think about reliability for a system will depend on
the type of system you’re working with
Single-use systems
– System is destroyed upon use
– Missiles, rockets, MALD, etc.
– Reliability is a simple probability (e.g., “Failure Rate < 10%”)
Repairable Systems
– If the system breaks, it will be repaired and usage resumed
– Tanks, vehicles, ships, aircraft, etc.
– Reliability is typically time between events, i.e., failures, critical failures,
aborts, etc.
» A howitzer must have a 75 percent probability of completing an 18-hour
mission without failure.
» A howitzer mean time between failures must exceed 62.5 hours.
One-off systems
– Only a single (or very few) systems will be produced
– Satellites, aircraft carriers, etc.
– Like a repairable system, though often very few chances to improve
reliability once system has been produced
– Often no assembly line leading to different reliability concerns
1/10/2018-18
Reliability requirements may be translated from binary mission
success criteria to continuous time-between-failure metrics,
often making them easier to assess
Radar Program X’s Capabilities Development Document (CDD):
After review, CDD determined that a clarification of the Mean Time Between Operational Mission Failure (MTBOMF)
Key system Attribute (KSA) is appropriate and is rewritten as follows: “Radar Program X shall have a MTBOMF that
supports a 90% probability of successful completion of a 24 Hour operational period (Threshold), 90% probability of
successful completion of a 72 Hour operational period (Objective) to achieve the Operational Availability (Ao) of 90%”
Under the exponential assumption this translates to MTBOMF = −24 / ln(0.9) ≈ 228 hours (threshold) and −72 / ln(0.9) ≈ 683 hours (objective).
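The arithmetic behind this translation is the exponential relationship R(t) = exp(−t/MTBF). The short sketch below (Python, with illustrative function names) performs the conversion in both directions; the howitzer numbers from the earlier slide are included as a cross-check.

```python
import math

def mtbf_from_mission_reliability(mission_hours, reliability):
    """MTBF implied by a mission-reliability requirement, assuming
    exponentially distributed failure times: R(t) = exp(-t / MTBF)."""
    return -mission_hours / math.log(reliability)

def mission_reliability_from_mtbf(mission_hours, mtbf):
    """Probability of completing a mission of the given length without failure."""
    return math.exp(-mission_hours / mtbf)

# Radar Program X threshold: 90% probability over a 24-hour operational period
print(mtbf_from_mission_reliability(24, 0.90))   # ~227.8-hour MTBOMF
# Objective: 90% probability over a 72-hour operational period
print(mtbf_from_mission_reliability(72, 0.90))   # ~683.4-hour MTBOMF
# Howitzer example from the earlier slide: 75% over 18 hours ~ 62.6-hour MTBF
print(mtbf_from_mission_reliability(18, 0.75))   # ~62.6 hours
```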
1/10/2018-19
Care should be taken when translating binary requirements to
continuous failure times
Assumptions in translation
– Mean is an appropriate metric to describe the failure distribution
– The failures are exponentially distributed and therefore the failure rate is
constant
– No degradation (“wear-out”) over time
1/10/2018-20
When systems have both availability and reliability
requirements, it is important to verify that they are consistent
Operational availability is commonly computed as Ao = Σ(Uptime) / [Σ(Uptime) + Σ(Downtime)]. Equivalently, Ao = MTBF / (MTBF + MDT), so an availability requirement implies a minimum reliability: Ao = 0.8, for example, requires an MTBF of at least 4 × the mean downtime (MDT).
Confidence interval methods for Ao are equally valid for operational dependability (Do).
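A minimal sketch of the consistency check implied above, assuming the standard Ao = MTBF/(MTBF + MDT) relationship; the 10-hour mean downtime is purely illustrative.

```python
def operational_availability(total_uptime, total_downtime):
    """Ao = total uptime / (total uptime + total downtime)."""
    return total_uptime / (total_uptime + total_downtime)

def required_mtbf_for_ao(ao, mean_downtime):
    """Minimum MTBF consistent with an Ao requirement, since Ao = MTBF / (MTBF + MDT)."""
    return ao * mean_downtime / (1.0 - ao)

# If Ao must be 0.8 and each failure costs 10 hours of downtime on average,
# the reliability requirement is only consistent if MTBF is at least 40 hours
# (4x the mean downtime).
print(required_mtbf_for_ao(0.8, 10.0))   # 40.0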
1/10/2018-22
Medians and percentiles are typically more relevant than
means when considering the operational context
“The UAS equipment and hardware components shall have a Mean Time to Repair
(MTTR) for hardware of 1 hour.”
Median values and high-percentile requirements can be more meaningful for
systems with highly skewed repair times
– E.g., 90% of failures should be corrected within 5 hours
– Or, the median repair for hardware should be 1 hour
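A small illustration of why the mean can mislead for skewed repair times; the lognormal parameters below are hypothetical, chosen only so that the median repair is about 1 hour.

```python
import numpy as np

# Hypothetical, highly skewed repair times (hours): most repairs are quick,
# a handful take much longer.
rng = np.random.default_rng(1)
repair_times = rng.lognormal(mean=0.0, sigma=1.2, size=1000)  # median ~ 1 hour

print(np.mean(repair_times))            # pulled upward by the few long repairs (~2 hours)
print(np.median(repair_times))          # ~1 hour -- the "typical" repair
print(np.percentile(repair_times, 90))  # supports a "90% of repairs within X hours" statement
```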
1/10/2018-23
The operational context or rationale for suitability requirements
(and requirements in general) should be clearly stated in the
requirements document or the TEMP
1/10/2018-24
DOT&E’s decision on whether a system is reliable is not
dictated by the system’s requirements
Identify the rationale for the reliability requirements and evaluate system
reliability based on this rationale
1/10/2018-25
When requirements are not achievable, understanding the
rationale behind them is crucial for evaluating the system
What is on contract?
– Typically, you will get what you pay for (or less!)
– Identifying what is on contract will help you assess the system’s risk of achieving its
reliability requirement
1/10/2018-27
“DOT&E requires independent scoring of reliability failures –
FDSC should provide guidance only.”
-05 October 2012 DOT&E Guidance Memo
Failure Definitions
– Defines mission essential functions – minimum operational tasks the system
must perform to accomplish assigned mission
Scoring Criteria
– Provides consistent classification criteria applicable across all phases of test
– Determines severity of the failure with minimal room for interpretation
– Specifies chargeability of the failure
» Hardware, software
» Operator error
» Government furnished equipment (GFE)
Conditional Scoring
– The severity or chargeability of a failure should not depend on what was
going on when the failure occurred
1/10/2018-28
Avoid situational scoring
Situational Scoring
– The severity or chargeability of a failure should not depend on what
was going on when the failure occurred
– Models used to estimate reliability assume that failures are agnostic
to the particular conditions on the ground
1/10/2018-29
Action Officers should encourage the use of lower level
reliability requirements for systems with extremely high
mission level requirements and/or with built-in redundancy
1/10/2018-30
Example Program:
UAS Reliability Requirements
System of systems
– Modern systems are often complex and involve multiple
subsystems
– The UAS includes five air vehicles, a STUAS recovery system, a launcher, and four
operator workstations
– Government-Furnished Equipment (GFE) & Commercial Off-
The-Shelf (COTS)
Notional System Configuration
Question: “Once the air vehicle is off station and returning to base (RTB), do critical failures (e.g.,
AV crashes) count against MFHBA (Mean Flight Hours Between Aborts)?”
Answer: YES!!!
1/10/2018-34
Institute for Defense Analyses
4850 Mark Center Drive • Alexandria, Virginia 22311-1882
Jonathan L. Bell
10 January 2018
This briefing provides an overview of the importance
and process of reliability growth planning
[Course roadmap: the Reliability Growth Planning topic highlighted among the IDA Reliability Course Topics, with growth planning and tracking/projection activities shown spanning the TD, EMD, and P&D phases between Milestones A, B, and C and the FRP decision.]
a. CI – Confidence Interval; b. FY – Fiscal Year; c. OT&E – Operational Test and Evaluation
1/10/2018-38
Motivation: Reliable systems work better and cost less
1/10/2018-39
The reliability growth contractual goal often depends on the length of the IOT&E
DOT&E TEMP Guidebook 3.0 gives guidance on
reliability growth planning
*PM2 and Crow Extended models encourage more realistic inputs that are based on the systems engineering and design process.
1/10/2018-40
A well-run reliability growth program requires a
dedicated systems engineering effort
[Diagram: the realistic reliability growth (RG) curve – based on funding and realistic assumptions – is only the “tip of the iceberg.” Beneath it sit the supporting activities:
• Design for reliability: component design for reliability, built-in test, failure mode effects and criticality analysis, reliability predictions, level-of-repair analysis
• Adequate requirements: operational requirements, contract specification, interim thresholds, entrance/exit criteria, an appropriate DT metric, system-level values achieved before fielding
• Adequate testing: dedicated test events for reliability, accelerated life testing, reliability demonstration, logistics demonstration, integration testing
• Data collection, reporting, and tracking: Failure Definition Scoring Criteria, Failure Reporting and Corrective Action System, Failure Review Board, scoring/assessment conferences, root cause analysis, corrective actions, field data, Reliability/Maintainability/Availability Working Group
• Funding and time allotted, with commitment from management, and independent DT/OT data collection]
1/10/2018-41
Reliability growth planning involves several steps
[Diagram: reliability growth planning steps –
• Understand policies and requirements: DoD 5000.02, DTM 11-003, OMS/MP, DOT&E and Service policies, scoring criteria, contract specifications
• Understand the system requirements and the reliability requirement
• Understand contractor reliability engineering practices: design for reliability (DfR), FMEA, HALT, derating, reliability predictions, design reviews, Failure Review Board (FRB)
• Determine the final reliability target: DT/OT and IOT values, consumer risk, producer risk
• Determine reliability growth parameters: initial MTBF, reliability growth potential, ratio of the DT goal to the growth potential, management strategy, fix effectiveness factors, number/length of test phases, rate at which new B-modes surface
• Identify resource needs
A supporting operating characteristic curve plots probability of acceptance against true MTBF for candidate tests (a 1,015-mile test with 1 failure permitted; 1,451 miles / 2 failures; 1,870 miles / 3 failures; 2,680 miles / 5 failures; 4,628 miles / 10 failures; 12,056 miles / 30 failures) at the chosen probability-of-acceptance level.
Acronyms: DfR – Design for Reliability; FMEA – Failure Mode Effects Analysis; FRB – Failure Review Board; HALT – Highly Accelerated Life Testing]
a. Figure adapted from ATEC Presentation on RG Planning, Joint Service RAM WG Meeting, SURVICE Engineering, Aberdeen, MD, 10-13 Jan 2011
1/10/2018-42
Software-intensive systems follow a similar process to
hybrid and hardware-only systems
Requires robust systems engineering support, dedicated testing, adequate funding and time,
reasonable requirements, scoring criteria, data collection and reporting, meetings to assess and
score data, etc.
Can be described using Non-Homogeneous Poisson Process (NHPP) models in relation to
time (e.g., the AMSAA PM2 and Crow Extended models) because of their simplicity, convenience,
and tractability.
The basis for scoring criteria and prioritization can be found in IEEE Standard 12207 for
Systems and Software Engineering — Software Life Cycle Processes:
Priority Applies if the Problem Could
1 Prevents the accomplishment of an essential capability, or jeopardizes safety, security, or requirement designated as critical
2 Adversely affects the accomplishment of an essential capability and no workaround solution is known, or adversely affects
technical, cost, or schedule risks to the project or to life cycle support of the system, and no work-around solution is known
3 Adversely affects the accomplishment of an essential capability but a work-around solution is known, or adversely affects
technical, cost, or schedule risks to the project or to life cycle support of the system, but a work-around solution is known
4 Results in user/operator inconvenience or annoyance but does not affect a required operational or mission essential
capability, or results in inconvenience or annoyance for development or maintenance personnel, but does not prevent the
accomplishment of those responsibilities
5 All other effects
1/10/2018-43
Notional examples of reliability tracking curves for
software intensive systems are shown below
[Charts: notional tracking curves of cumulative Priority 2 SIRs against cumulative test time.]
Operating characteristic curves illustrate the allowable test risks (consumer’s and producer’s risks) for assessing progress against the reliability requirement.
User inputs: reliability requirement = 1,152 miles; confidence (1 − consumer risk) = 0.8; probability of acceptance (1 − producer risk) = 0.8; ratio of the DT reliability goal to the requirement = 1.75.
[Chart: probability of acceptance versus true Mean Time Between Failures (MTBF, miles) for candidate tests – 3,449 miles with 1 failure permitted; 4,929 miles / 2 failures; 6,353 miles / 3 failures; 7,743 miles / 4 failures; 9,108 miles / 5 failures; 15,726 miles / 10 failures; 40,968 miles / 30 failures – with the probability-of-acceptance level and the MTBF requirement marked.]
1/10/2018-46   PM2 – Planning Model based on Projection Methodology
In Class Exercise Using PM2 Model
Planning information (inputs): Goal Mean Time Between Failures (MTBF) = 334 hours; Growth Potential Design Margin = 1.39; Average Fix Effectiveness = 0.70; Management Strategy = 0.95; Discovery Beta = 0.57.
Results: Initial Time t(0) = 84 hours; Initial MTBF = 155 hours; Final MTBF = 336 hours; Time at Goal = 3,677 hours.
[Chart: idealized PM2 projection across DT Phases 1-4 and the Limited User Test (LUT), plotted against cumulative test time (hours), with the termination line and goal value marked.]
Note: Crow Extended does not use OC curves to determine the reliability growth goal.
1/10/2018-48
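The exercise inputs can be checked against the standard PM2 growth-potential relationship, M_GP = M_I / (1 − MS × FEF), taking the Growth Potential Design Margin to be the ratio of the growth potential to the goal MTBF (an assumption about how the margin is defined here). A short sketch:

```python
def pm2_growth_potential(initial_mtbf, management_strategy, avg_fix_effectiveness):
    """PM2 growth potential: the MTBF that would be reached if every B-mode were
    surfaced and fixed, M_GP = M_I / (1 - MS * FEF)."""
    return initial_mtbf / (1.0 - management_strategy * avg_fix_effectiveness)

goal_mtbf = 334.0          # hours, from the exercise inputs
gp_design_margin = 1.39
ms, fef = 0.95, 0.70

growth_potential = goal_mtbf * gp_design_margin       # ~464 hours
initial_mtbf = growth_potential * (1.0 - ms * fef)    # ~156 hours, close to the 155 in the results
print(growth_potential, initial_mtbf)
print(pm2_growth_potential(initial_mtbf, ms, fef))    # recovers ~464 hours
```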
Takeaway Points
Given the DoD's poor track record in producing reliable systems, development of a
comprehensive reliability growth plan is important and is required by policy
Reliability planning is more than producing a growth curve; it requires adequate funding,
schedule time, contractual and systems engineering support, reasonable requirements,
scoring criteria, data collection and assessment, etc.
Reliability growth planning models, such as PM2 and Crow-Extended, provide useful ways to
quantify how efforts by the management can lead to improved reliability growth over time
Reliability growth planning for software intensive systems generally follows a similar process
as planning for hybrid and hardware-only systems, although use of a tracking curve can also
support quantification of growth planning efforts
Programs fail to reach their reliability goals for a variety of reasons; development of a robust
growth plan early on can help avoid some of the common pitfalls
1/10/2018-49
Backup Slides
1/10/2018-50
Common Reasons Why Programs Fail to Reach
Reliability Goals and What We Can Do About It
1. Failure to start on the reliability growth curve due to poor initial reliability of design
2. Failure to achieve sufficient reliability growth during developmental testing (DT)
3. Failure to demonstrate required reliability in operational testing (OT)
Failure to start on the reliability growth curve due to poor initial reliability of design
Common Causes and Recommended DoD Mitigations
– Cause: Poor integration or lack of a “design for reliability” effort. Mitigation: Review the contractor’s reliability engineering processes; establish contractual requirements that encourage systems engineering “best practices.”
– Cause: Unrealistic initial reliability predictions based on MIL-HDBK-217. Mitigation: Review the prediction methodology; require/encourage more realistic prediction methods, such as physics-of-failure methods using validated models and/or test data; have experts review the contractor’s software architecture and specifications.
– Cause: Early contractor testing is carried out in a non-operational environment. Mitigation: Understand how the contractor conducted early testing; encourage the contractor to test the system in an operationally realistic environment as early as possible.
– Cause: Unrealistic reliability goals relative to comparable systems, or poorly stated requirements. Mitigation: Compare reliability goals to similar systems; push for more realistic requirements.
– Cause: Overestimating the reliability of COTS/GOTS in a military environment. Mitigation: Communicate the operational environment to the contractor, who in turn must communicate that information to any subcontractors; if available, consider field data and prior integration experience to estimate reliability.
– Cause: Lack of understanding of the definition of “system failure.” Mitigation: Review system design/scoring criteria early and ensure all parties understand and agree with them; communicate scoring criteria in the Request for Proposal.
– Cause: The reliability requirement is very high and would require impractically long tests to determine the initial reliability with statistical confidence. Mitigation: Consider using “lower-level” reliability measures (e.g., MTBEFF instead of MTBSA); investigate whether the specified level of reliability is really required for the mission; emphasize the importance of a significant design-for-reliability effort.
MTBEFF – Mean Time Between Essential Function Failures; MTBSA – Mean Time Between System Aborts
1/10/2018-51
Common Reasons Why Programs Fail to Reach Reliability
Goals and What We Can Do About It (cont.)
Failure to achieve sufficient reliability growth during developmental testing (DT)
Common Causes and Recommended Mitigations
– Cause: Development of the reliability growth planning curve was a “paper exercise” that was never fully supported by funding, contractual support, and systems engineering activities. Mitigation: Verify the reliability program is included in contracting documents and that there is sufficient funding to support testing and systems engineering activities; ensure the program has processes in place to collect and assess reliability data; investigate the realism of the reliability growth model inputs.
– Cause: Insufficient testing or time to analyze failure modes and devise/implement corrective actions, or urgent fielding of systems that are not ready for deployment. Mitigation: Evaluate how many B-mode failures are expected to surface over the test period; ensure there are sufficient test assets and push for additional assets when the testing timeline is short; evaluate whether there will be sufficient time to understand the cause of failures and to develop, implement, and verify corrective actions.
– Cause: Inadequate tracking of software reliability or testing of patches. Mitigation: Ensure the contract includes provisions to support software tracking and analysis; the TEMP should define how software failures will be tracked and prioritized.
– Cause: System usage conditions or environment changed during testing. Mitigation: Analyze data to see whether the failure mode distributions varied with the changing conditions; consider whether to reallocate resources and conduct additional testing in more challenging conditions.
– Cause: Initial design or manufacturing processes underwent major changes during testing. Mitigation: Discuss whether it is necessary to rebaseline the reliability growth planning curve based on the new design.
– Cause: System or subsystem components reach a wear-out state during testing. Mitigation: Investigate the cause of wear-out; consider recommending redesign for subsystems showing early wear-out, or taking steps to mitigate overstress of these components, if applicable.
– Cause: The reliability requirement is very high and would require impractically long tests to surface failure modes and grow reliability. Mitigation: Consider using “lower-level” reliability measures (e.g., MTBEFF instead of MTBSA); investigate whether the specified level of reliability is really required for the mission; emphasize the importance of a significant design-for-reliability effort.
MTBEFF – Mean Time Between Essential Function Failures; MTBSA – Mean Time Between System Aborts
1/10/2018-52
Common Reasons Why Programs Fail to Reach Reliability
Goals and What We Can Do About It (cont.)
1/10/2018-53
PM2 Continuous RG Curve Risk Assessment
1/10/2018-55
DOT&E TEMP Guide 3.0
1/10/2018-56
DOT&E TEMP Guide 3.0 (cont.)
• The reliability growth program described in the TEMP should contain the
following
− Initial estimates of system reliability and a description of how these estimates were
arrived at
− Reliability growth planning curves (RGPC) illustrating the reliability growth strategy,
and including justification for assumed model parameters (e.g. fix effectiveness
factors, management strategy)
− Estimates with justification for the amount of testing required to surface failure
modes and grow reliability
− Sources of sufficient funding and planned periods of time to implement corrective
actions and test events to confirm effectiveness of those actions
− Methods for tracking failure data (by failure mode) on a reliability growth tracking
curve (RGTC) throughout the test program to support analysis of trends and
changes to reliability metrics
− Confirmation that the Failure Definition Scoring Criteria (FDSC) on which the RGPC
is based is the same FDSC that will be used to generate the RGTC
− Entrance and exit criteria for each phase of testing
− Operating characteristic (OC) curves that illustrate allowable test risks (consumer’s and
producer’s risks) for assessing the progress against the reliability requirement. The risks
should be related to the reliability growth goal.
1/10/2018-57
Institute for Defense Analyses
4850 Mark Center Drive • Alexandria, Virginia 22311-1882
Jonathan L. Bell
10 January 2018
This briefing highlights the importance of design
reviews in the reliability growth planning process
Discusses questions to consider during design review activities, and provides programmatic
examples of this process
[Diagram: acquisition framework timeline (Milestones A, B, and C, IOC, FOC; Materiel Solution Analysis, Technology Maturation and Risk Reduction, Engineering & Manufacturing Development, Production & Deployment, Operations & Support), with the SRR, PDR, and CDR design reviews marked and the course topic “Importance of Design Reviews in Reliability Growth Planning” highlighted among the IDA Reliability Course Topics.]
Design Reviews
Per DOD 5000.02, “any program that is not initiated at Milestone C will include the
following design reviews”:
Per DOD 5000.02, the Program Manager will formulate a comprehensive Reliability and
Maintainability program to ensure reliability and maintainability requirements are achieved; the
program will consist of engineering activities including for example”:
In addition to design reviews, contract deliverables, developed early in a program, might also
provide documentation on the system design and the extent that the contractor had included
reliability in the systems engineering process
1/10/2018-62
Several questions should be addressed during
design reviews
Example programs: AH-64E Apache; OH-58F Kiowa Warrior; Joint Light Tactical Vehicle; F-15 Radar Modernization Program
1/10/2018-65
Ensure estimates of growth and management strategy
are realistic – they should accurately quantify what the
program intends to fix (particularly for system upgrades)
1/10/2018-66
Ensure estimates of growth and management
strategy are realistic – they should accurately
quantify what the program intends to fix
− Mean Time Between Software Anomalies (MTBSA) requirement
− RMP software code maturity
[Chart: MTBSA (hours) versus cumulative test time (flight hours), showing the planned growth curve with PM2 inputs Mg = 37 hours MTBSA and Mi = 5.0 hours MTBSA, and PM2 fit parameters MS = 1.02 and FEF = 1.02 – physically impossible, since both are fractions that cannot exceed 1.]
DOT&E and IDA assessed the program’s stability growth curve as overly aggressive.
Acronyms: FEF – Fix Effectiveness Factor; MS – Management Strategy; Mg – Reliability Growth Goal; Mi – Initial Reliability; PM2 – Planning Model based on Projection Methodology
1/10/2018-67
Ensure estimates of growth and management strategy
are realistic – they should accurately quantify what the
program intends to fix
[Chart: MTBSA (hours) versus cumulative test time (flight hours) on log-log axes, comparing the contractor’s notional planning growth curve with a Duane model fit.]
Military Standard 189C: the historical mean/median growth rate (α) is 0.34/0.32, and the historical range is 0.23-0.53. An α of 0.70 is unrealistically aggressive, particularly for a program that is incorporating mostly mature technology.
1/10/2018-69
Make sure the reliability growth curves are
based on realistic assumptions
[Chart: initial failure intensity apportioned across failure modes 1-10 as a percentage of the total (0-30%), illustrating the assumed distribution of failure modes at the start of test time.]
1/10/2018-70
Consider more inclusive reliability metrics
Mission aborts occur less frequently than Essential Function Failures (EFFs) or
Essential Maintenance Actions (EMAs)
Growth strategies based on EMAs produce a more credible and less resource-
intensive reliability growth strategy by:
− Incorporating a larger share of the failure modes
− Addressing problems before they turn into mission aborts
− Improving the ability to assess and track reliability growth
− Increasing the statistical power and confidence to evaluate reliability in
testing
− Enabling more reasonable reliability growth goals
− Reducing subjectivity that can creep into the reliability scoring process
The AH-64E program decided to focus its growth strategy on Mean Time Between EMAs as well
as Mean Time Between Mission Aborts
1/10/2018-71
Takeaway Points
Discuss requirements: Key Performance Parameters (KPPs) are not always the best basis for reliability growth
planning curves
− Fight inadequate requirements (e.g., F-15 Radar Modernization Program (RMP) Full
Operational Capability reliability requirement)
− In the absence of adequate requirements, compare to legacy performance in testing
(e.g., OH-58F Kiowa Warrior)
− Push for reliability growth planning curves based on EMAs/EFFs
1/10/2018-72
Institute for Defense Analyses
4850 Mark Center Drive • Alexandria, Virginia 22311-1882
Rebecca Dickinson
10 January 2018
1/10/2018-73
This briefing provides an overview of the importance
and process of TEMP Review and OT Planning (for Reliability)
System Acquisition Framework
[Diagram: acquisition framework timeline (Milestones A, B, and C, IOC, FOC) with the course topic “TEMP Review and OT Planning” highlighted among the IDA Reliability Course Topics.]
1/10/2018-74
Topics covered in briefing will help address
the following questions:
Reliability is the chief enabler of operational suitability, and failure to achieve reliability
requirements typically results in a system being assessed "not suitable"; consequently,
its independent evaluation is pivotal to OT&E.
Independent Operational test and Evaluation (OT&E) Suitability Assessments – October 05 2012
DOT&E Memo
1/10/2018-75
TEMP Reliability Policy is Milestone dependent
• The TEMP must include a plan (typically via a working link to the
Systems Engineering Plan) to allocate reliability requirements
down to components and sub-components.
1/10/2018-76
Guidance on documenting and incorporating a
program’s reliability strategy in the TEMP**
– Dedicated test events for reliability such as accelerated life testing, and
maintainability and built-in test demonstrations
1/10/2018-78
The TEMP should contain the following information
with respect to the reliability growth program:
Reliability growth curves are excellent planning tools, but programs will not achieve
their reliability goals if they treat reliability growth as a “paper policy.” Good reliability
planning must be backed up by sound implementation and enforcement.
(DOT&E FY 2014 Annual Report)
1/10/2018-79
Systems not meeting entrance and exit criteria should revise
the reliability growth strategy to reflect current system reliability
A few important questions for evaluating Entrance and Exit criteria in the TEMP:
[Chart: reliability growth planning curve with the IOT&E goal (86 MTBF) and the requirement (69 MTBF) marked, asking “Will we even start on the curve?”]
1/10/2018-80
The TEMP should describe how reliability will be
tracked across the developmental life cycle.
1/10/2018-81
The Reliability Tracking Process looks something like this:
[Charts: cumulative Essential Function Failures (EFFs) and estimated reliability (MTBF) plotted against cumulative operating time as test data accumulate.]
• Good example of one method for updating the reliability growth curve for a Milestone C TEMP
– Reliability for each test event is clearly documented
– Could be improved by including confidence intervals (an indication of uncertainty associated with
estimate)
– Reliability point estimates are consistent with the curve
1/10/2018-83
Updated TEMPs at Milestone C must include updated RGCs
• In many cases we have seen curves that do not reflect existing test data
– Test results are not consistent with reliability growth curve!
1/10/2018-84
Example of a Good TEMP
• Going above and beyond by planning a Pre-IOT&E Reliability Qualification Test (RQT)
– Program Office wants to evaluate system reliability prior to IOT&E.
» Expected 69 hour MTBOMF will be demonstrated with 80% confidence and have a 70% probability of
acceptance during RQT
Reliability Growth
Curve Assumptions
1/10/2018-85
Guidance on Reliability Test Planning**
1/10/2018-87
The reliability requirements of a system
impact test duration
– Pass/Fail
» Probability of a fuse igniting without failure in a weapon system > 90%
– Time/Duration based
» A howitzer must have a 75 percent probability of completing an 18-hour
mission without failure.
1/10/2018-88
Will enough data be collected to adequately
assess system reliability?
– No default criterion is given for the level of statistical confidence and power (it
depends!).
Operating Characteristic (OC) curves are useful for determining the statistical
confidence and power that a test is sized for.
Consumer Risk: the probability that a bad (poor-reliability) system will be accepted
Producer Risk: the probability that a good (high-reliability) system will be rejected
While the statistical properties of a test do not determine its adequacy, they provide an
objective measure of how much we are learning about reliability based on operational
testing.
1/10/2018-89
Operating Characteristic Curves 101
– Power manages producer risk: the higher the power, the less likely it is that a reliable
system fails the test.
In general, the longer the test, the higher the power for a given confidence level
1/10/2018-90
Example: Building an OC Curve
[Embedded Microsoft Excel worksheet used to build the OC curve.]
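The worksheet is not reproduced here, but an OC curve of this kind can be built directly from the Poisson model for failure counts: the probability of acceptance is the probability of observing no more than the allowed number of failures during the test. A sketch (the helper name is illustrative; the 1,152-mile requirement and the 3,449-mile/1-failure test come from the earlier notional inputs):

```python
import numpy as np
from scipy import stats

def prob_accept(true_mtbf, test_length, failures_allowed):
    """Probability the system passes the test: observed failures <= allowed,
    with the failure count modeled as Poisson(test_length / true_mtbf)."""
    return stats.poisson.cdf(failures_allowed, test_length / true_mtbf)

requirement = 1152.0                      # miles between failures
test_length, allowed = 3449.0, 1          # one candidate test from the earlier slide

# Consumer risk: chance a system that only just meets the requirement passes (~0.20 here)
print(prob_accept(requirement, test_length, allowed))
# Producer risk: chance a genuinely better system (e.g., 1.75x the requirement) still fails
print(1.0 - prob_accept(1.75 * requirement, test_length, allowed))

# The full OC curve is the acceptance probability swept over candidate true MTBF values
true_mtbf = np.linspace(0.5 * requirement, 3.0 * requirement, 100)
oc_curve = prob_accept(true_mtbf, test_length, allowed)
```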
1/10/2018-93
A “Rule of Thumb” should not be the strategy
employed to develop or assess a reliability test plan.
For example, Testing to 3x the Requirement may not be a “good rule of thumb” to follow
[Chart: probability of acceptance versus true MTBF (as a multiple of the requirement), with consumer risk fixed at 20% at the requirement, for tests of 1.6x the requirement (0 failures allowed), 3x (1 failure), 5.5x (3 failures), 10x (7 failures), 20x (16 failures), and 50x (43 failures), with the reliability growth goal marked.
Example: if the true MTBF is 2x the requirement, a test lasting 3x the requirement will achieve an 80% lower confidence bound greater than the requirement only 55% of the time (45% producer risk).]
1/10/2018-95
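The test lengths plotted above follow directly from the chi-squared bound used elsewhere in this course: for a time-truncated test with c failures allowed, the test must last roughly χ²(0.80; 2c + 2)/2 times the requirement for the 80% lower confidence bound to reach the requirement. A short check of that relationship (illustrative helper name):

```python
from scipy import stats

def test_length_multiple(failures_allowed, confidence=0.80):
    """Test length, as a multiple of the MTBF requirement, needed so that with
    exactly `failures_allowed` failures the lower confidence bound on MTBF
    still meets the requirement: T = requirement * chi2(conf; 2c + 2) / 2."""
    return stats.chi2.ppf(confidence, 2 * failures_allowed + 2) / 2.0

for c in (0, 1, 3, 7, 16, 43):
    print(c, round(test_length_multiple(c), 1))
# 0 -> ~1.6x, 1 -> ~3.0x, 3 -> ~5.5x, 7 -> ~9.7x, 16 -> ~19.6x, 43 -> ~49.4x,
# matching the test options plotted on this slide.
```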
1/10/2018-95
The TEMP Guidebook 3.0 provides guidance on
the use of DT data for OT evaluations
The conditions the data must be collected under to be acceptable for OT use.
– Developmental testing does not have to be conducted according to the Operational Mode
Summary/Mission Profile (OMS/MP) or Design Reference Mission (DRM), but there must be
a clear consideration of operational conditions in the developmental testing.
Clearly describe the statistical models and methodologies for combining information.
– Data should not simply be pooled together and an average reliability calculated. The
analysis should account for the conditions the reliability data were collected under to the
extent possible.
The methodology for determining adequate operational test duration must be specified.
– Bayesian assurance testing can be used in place of traditional operating characteristic
curves to determine adequate operational testing when prior information will be
incorporated.
CAUTION: Data from different test events should not be combined into one pool
of data and used to calculate an average reliability; rather, advanced analysis
methodologies should be used to combine information from multiple tests.
1/10/2018-96
The OC Curve is not the only method for
assessing test adequacy.
Objective
– Scope an appropriately sized Operational Test (OT) using the
demonstrated reliability and growth of the system under test
1/10/2018-97
A Bayesian assurance testing approach to test planning
may be used to reduce test duration and control
both risk criteria
Note: Bayesian assurance test miles in the table are hypothetical – only to illustrate a proof of concept.
1/10/2018-98
Takeaway Points
Test Planning
– The duration of test depends on the reliability requirement.
– OC Curves can be employed to visualize the risk trade space for a
given test length.
– If additional information will be used in the reliability assessment then
the TEMP needs to clearly outline the source, fidelity, and
methodology for combining the information.
1/10/2018-99
Institute for Defense Analyses
4850 Mark Center Drive • Alexandria, Virginia 22311-1882
Matthew Avery
Rebecca Dickinson
10 January 2018
1/10/2018-101
Timeline
System Acquisition Framework
[Diagram: acquisition framework timeline (Milestones A, B, and C, IOC, FOC) with the IDA Reliability Course Topics mapped onto it.]
1/10/2018-102
Outline
• Reporting on Reliability
– Point & interval estimation
– Comparisons with legacy systems
– Comparisons against requirements
• Reliability Models
– Exponential Distribution
– Other models (Weibull, LogNormal, … )
– Nonparametric methods (Bootstrap)
• Scoring Reliability
• Qualitative Assessment
– Identifying drivers of reliability
• Summary
1/10/2018-103
When reporting on system reliability, focus on whether the
system is sufficiently reliable to successfully conduct its
mission
1/10/2018-104
Failure rates are the standard way to report reliability, but it's
important to keep in mind the assumptions that underlie MTBF
Average of all times between failure = Mean Time Between Failures (MTBF)
– Easy to calculate
– Requirements often given in terms of MTBF
– Implies assumption of constant failure rates
Different assumptions
require different analyses
1/10/2018-105
Reporting point estimates alone can give readers a false
impression of certainty about the reported failure rates
Confidence Intervals:
– Provides range of plausible values
– Shows how sure we are about system reliability
– Helps us evaluate risk that system meets requirement
Increment 1:
– 723 hours
– 5 failures observed
– 144.6 MFHBSA (Mean Flight Hours Between System Aborts)
– 80% CI: (77.9, 297.2)
Increment 2:
– 7,052 hours
– 49 failures observed
– 143.9 MFHBSA
– 80% CI: (119.0, 175.1)
Confidence intervals for exponential failure times (time-truncated test):
Lower bound = 2T / χ²(1 − α/2; 2F + 2),  Upper bound = 2T / χ²(α/2; 2F)
where T is the total test time, χ²(p; d) is the critical value (p-th quantile) of a chi-squared distribution with d degrees of freedom, F is the observed number of failures, and α = 1 − confidence level (for 80% confidence, α = 0.2).
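The two intervals above can be reproduced with a few lines of Python using the chi-squared formulas just given (illustrative function name; scipy's chi-squared quantile function supplies the critical values):

```python
from scipy import stats

def mtbf_confidence_interval(total_time, failures, confidence=0.80):
    """Two-sided CI on MTBF for exponentially distributed failure times in a
    time-truncated test, using the chi-squared formulas above."""
    alpha = 1.0 - confidence
    lower = 2 * total_time / stats.chi2.ppf(1 - alpha / 2, 2 * failures + 2)
    upper = 2 * total_time / stats.chi2.ppf(alpha / 2, 2 * failures)
    return lower, upper

print(mtbf_confidence_interval(723, 5))     # ~(77.9, 297.2) around 144.6 MFHBSA
print(mtbf_confidence_interval(7052, 49))   # ~(119.0, 175.1) around 143.9 MFHBSA
```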
1/10/2018-108
Report whether the point estimate was better or worse than the
requirement and whether or not the difference was statistically
significant
*When evaluating reliability prior to the IOT, demonstrated reliability should also be compared to the
reliability growth curve to determine if programs are on track to eventually meet their requirement.
1/10/2018-109
Provide interpretation of demonstrated system reliability in the
context of the mission
Reliability Requirement:
1/10/2018-111
The Exponential distribution is easy to use but requires
dubious assumptions
The MTBF estimate is simply the total operating time divided by the total number of failures: MTBF = Σ(operating time) / Σ(failures); the mean of the exponential distribution is the MTBF (1/λ).
[Chart: exponential probability density function with failure rate λ = 0.04, i.e., mean = 25.]
1/10/2018-112
Despite its flaws, the Exponential distribution is convenient to
use for operational testing
1/10/2018-113
It can be difficult to determine, based on OT data, whether or
not the Exponential distribution is a reasonable model
1/10/2018-114
The Weibull and Lognormal are common alternatives to the
Exponential for modeling reliability
Weibull distribution: f(t) = (β/η)(t/η)^(β−1) exp[−(t/η)^β], with mean η·Γ(1 + 1/β)
Lognormal distribution: f(t) = [1/(tσ√(2π))] exp[−(ln t − μ)²/(2σ²)], with mean exp(μ + σ²/2)
The Exponential distribution is the special case of the Weibull with shape parameter β = 1.
1/10/2018-115
Weibull and Lognormal models allow for greater flexibility and
are more consistent with reliability theory
1/10/2018-116
To ensure that the correct model is being used, it's important to
have the actual failure times for each system rather than just
the total hours and total number of failures
1/10/2018-117
Using the data, we can compare models and ensure we choose
the one that best fits our data
[Charts: probability plots comparing Exponential and Lognormal fits to the observed failure times.]
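One simple way to run this comparison, when the individual failure times are available, is to fit the candidate distributions by maximum likelihood and compare AIC values; the failure times below are purely hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical failure times (hours) from an operational test
times = np.array([3.1, 7.4, 8.2, 12.5, 15.8, 21.0, 33.9, 41.7, 58.2, 96.4])

candidates = {
    "exponential": stats.expon,
    "weibull": stats.weibull_min,
    "lognormal": stats.lognorm,
}
for name, dist in candidates.items():
    params = dist.fit(times, floc=0)                 # location fixed at zero
    loglike = np.sum(dist.logpdf(times, *params))
    n_free = len(params) - 1                         # free parameters (location is fixed)
    print(name, "AIC =", round(2 * n_free - 2 * loglike, 1))
# Prefer the model with the lowest AIC; with complete (uncensored) failure-time
# data this is a quick check on whether the exponential assumption is defensible.
```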
1/10/2018-119
Nonparametric methods that make no assumptions about the
distribution of failure times can be used if sufficient data are
available
1/10/2018-120
DOT&E’s evaluation of reliability is not constrained by the
scoring decisions of Operational Test Agencies (OTA) or
Program Managers (PM)
1/10/2018-121
DOT&E will make independent decisions regarding what
constitutes score-able test time
Passive/overwatch time
– OMS/MP may specify that electronic systems will operate for a
certain percentage of the time
» Anti-Tank Vehicle (ATV) turret is only credited with 37.5 hours of
operating time over a 48-hour mission in the OMS/MP
1/10/2018-123
An analytical basis, such as a statistical test, should be used
to justify combining data from different test phases or events
H0: the data from the different test phases or events share a common failure rate (combining is justified)
H1: the failure rates differ across test phases or events
• CAUTION:
– Best used when dealing with operational test data only
– No way to get partial credit
– Will only detect large deviations when the individual test durations are
small
– The test cannot prove that you can combine information
– The test can only prove that you cannot combine information
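One analytical basis of this kind, assuming exponential failure times, is a likelihood ratio test of a common failure rate across phases against phase-specific rates; the failure counts and operating hours below are illustrative only.

```python
import numpy as np
from scipy import stats

# (failures, operating hours) observed in each test phase -- illustrative numbers
phases = [(5, 723.0), (49, 7052.0), (12, 1600.0)]

def exp_loglik(n_fail, time):
    """Maximized exponential log-likelihood given only a failure count and total time."""
    rate = n_fail / time
    return n_fail * np.log(rate) - rate * time

separate = sum(exp_loglik(n, t) for n, t in phases)   # each phase gets its own rate
n_tot = sum(n for n, _ in phases)
t_tot = sum(t for _, t in phases)
pooled = exp_loglik(n_tot, t_tot)                     # one common rate for all phases

lr_stat = 2.0 * (separate - pooled)
p_value = stats.chi2.sf(lr_stat, df=len(phases) - 1)
print(lr_stat, p_value)   # a small p-value is evidence the rates differ: do not pool
```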
1/10/2018-124
Is it appropriate to combine data across these OT events when
estimating reliability?
Differences
– Surface components/configuration different aboard
ship and on ground
– Environment (altitude, humidity, etc.) different across test sites
[Figure: land-based GSC configuration]
1/10/2018-125
We can combine data from different test events to assess the
reliability of different system components
15.2 MFHBA (system) ≡ 51.8% probability of completing a 10-hour mission
MFHBA – Mean Flight Hours Between Aborts; MTBA – Mean Time Between Aborts
1/10/2018-126
Combining data using Bayesian approaches can improve our
estimates of reliability
[Diagram: Bayesian inference combines the model for the data – the likelihood, L(data | θ) – with a prior distribution, f(θ), to produce the posterior distribution, f(θ | data); classical statistical inference uses the data and likelihood alone. The inclusion of the prior distribution allows us to incorporate different types of information into the analysis.]
1/10/2018-127
By combining data across different variants of a system, we
can report reliability with a higher degree of certainty
Reliability Requirements:
“The Armored Vehicle will have a reliability of 1000 mean miles between critical failure (i.e.
system abort)”
*The NBC RV was excluded from the study because of its different acquisition timeline.
1/10/2018-128
For some variants, substantial data is available, while for other
variants, very little data is available
* A right-censored observation occurs when the testing of the vehicle was terminated before a failure (i.e., system abort) was observed.
1/10/2018-129
Using a Bayesian approach to combine data across the
different variants and incorporate DT data produces more
realistic estimates and narrower confidence bounds
[Chart: operational test Mean Miles Between System Abort (MMBSA) estimates with 95% confidence and credible intervals for each variant (ATGMV, CV, ESV, FSV, ICV, MCV, RV, MEV), comparing the traditional (frequentist) analysis with the Bayesian analysis.]
Traditional analysis: extremely wide confidence intervals!
1/10/2018-130
Bayesian techniques allow us to formally combine DT and OT
data to produce a better estimate of reliability
Reliability Requirements:
– 600 Mean Miles Between OMF
1/10/2018-131
The Bayesian approach allows us to incorporate the DT data
while accounting for differences in DT vice OT performance
[Table: Mean Miles Between Operational Mission Failure (MMBOMF) estimates with 80% confidence intervals by method and test phase; DT and OT data are combined while accounting for the substantial differences we know exist between DT and OT performance.]
1/10/2018-132
Combining data requires forethought and planning
1/10/2018-133
Formal statistical models can be used to assess reliability for
complex systems-of-systems
• Reliability requirements for ships are often broken down into threshold
for the critical or mission-essential subsystems.
Test data:
– Total Ship Computing Environment (full-time): 4,500 hours of total system operating time, 1 operational mission failure
– Sea Sensors and Controls (underway): 2,000 hours, 3 failures
– Communications (full-time): 4,500 hours, 0 failures
– Sea Engagement Weapons (on-demand): 11 missions, 2 failures
Results – MTBOMF and reliability over a 720-hour mission (classical vs. Bayesian):
– TSCE: classical MTBOMF 4,500 hrs (1,156-42,710 hrs), reliability at 720 hrs 0.85 (0.54-0.98); Bayesian MTBOMF 3,630 hrs (1,179-6,753 hrs), reliability 0.73 (0.54-0.90)
– SSC: classical 667 hrs (299-1,814 hrs), 0.33 (0.09-0.67); Bayesian 697 hrs (332-1,172 hrs), 0.31 (0.11-0.54)
– Comm: classical > 2,796 hrs*, > 0.77*; Bayesian 10,320 hrs (1,721-18,210 hrs), 0.83 (0.66-0.96)
– SEW: classical reliability 0.82 (0.58-0.95); Bayesian 0.77 (0.62-0.91)
– Core mission: classical not estimable; Bayesian 0.15 (0.05-0.27)
Comm – Communications; MTBOMF – Mean Time Between Operational Mission Failures; SEW – Sea Engagement Weapons; SSC – Sea Sensors and Controls; TSCE – Total Ship Computing Environment
* A conservative 80 percent lower confidence bound; the frequentist MTBF estimate does not exist (no failures were observed)
1/10/2018-136
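For reference, a small sketch of how subsystem point estimates such as those above roll up to a core-mission reliability, assuming a series ("weakest link") system, exponential failure times for the full-time subsystems, and a simple success fraction for the on-demand subsystem. It uses the classical point estimates (and the Comm lower bound), so it will not reproduce the Bayesian 0.15 exactly.

```python
import math

MISSION_HOURS = 720

# Classical point estimates from the table above (Comm uses the lower bound)
full_time_mtbf = {"TSCE": 4500.0, "SSC": 667.0, "Comm": 2796.0}   # hours
on_demand = {"SEW": (9, 11)}                                      # (successes, demands)

mission_reliability = 1.0
for name, mtbf in full_time_mtbf.items():
    mission_reliability *= math.exp(-MISSION_HOURS / mtbf)   # exponential subsystem reliability
for name, (successes, demands) in on_demand.items():
    mission_reliability *= successes / demands                # simple on-demand estimate

print(round(mission_reliability, 2))   # series roll-up of the subsystem estimates (~0.18)
```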
Qualitative assessments of reliability provide crucial context
for the operational impact of quantitative measures and the
basis for recommendations
1/10/2018-137
Primary Recommendations
• Reporting Reliability
– Was the system sufficiently reliable to successfully conduct its mission?
» What is the demonstrated reliability?
» Did the system meet its requirement? If not, what is the operational impact?
» How does the system’s reliability compare to the legacy system?
• Reliability Models
– To ensure estimates of reliability are accurate, choosing the correct
statistical model is crucial.
• Combining Information
– There are sound statistical approaches that can be used to capitalize on
all available data in assessing the reliability of a system.
1/10/2018-138
References
DOT&E references
• “DOT&E TEMP Guide,” 28 May 2013 (Version 3.0 Update in progress)
• “ Independent Operational Test and Evaluation (OT&E) Suitability Assessments,” Memo, 5 Oct 2012.
• “State of Reliability,” Memo from Dr. Gilmore to Principal Deputy Under Secretary of Defense (AT&L), 30 June 2010.
• “Next Steps to Improve Reliability,” Memo from Dr. Gilmore to Principal Deputy Under Secretary of Defense (AT&L), 18 Dec 2009.
• “Test and Evaluation (T&E) Initiatives,” Memo from Dr. Gilmore to DOT&E staff, 24 Nov 2009.
• “DOT&E Standard Operating Procedure for Assessment of Reliability Programs by DOT&E Action Officers,” Memo from Dr. McQueary, 29 May 2009.
• “DoD Guide for Achieving Reliability, Availability, and Maintainability,” DOT&E and USD(AT&L), 3 Aug 2005.
Other references
• “Reliability Growth: Enhancing Defense System Reliability,” National Academies Press, 2015.
• “Reliability Program Handbook,” HB-0009, 2012.
• “Department of Defense Handbook Reliability Growth Management,” MIL-HDBK-189C, 14 June 2011.
• “Improving the Reliability of U.S. Army Systems,” Memo from Assistant Secretary of the Army AT&L, 27 June 2011.
• “Reliability Analysis, Tracking, and Reporting,” Directive-Type Memo from Mr. Kendall, 21 March 2011.
• “Department of Defense Reliability, Availability, Maintainability, and Cost Rationale Report Manual,” 1 June 2009.
• “Implementation Guide for U.S. Army Reliability Policy,” AEC, June 2009.
• “Reliability Program Standard for Systems Design, Development, and Manufacturing,” GEIA-STD-009, Aug. 2008.
• “Reliability of U.S. Army Materiel Systems,” Bolton Memo from Assistant Secretary of the Army AT&L, 06 Dec 2007.
• “Empirical Relationships Between Reliability Investments And Life-cycle Support Costs,” LMI Consulting, June 2007.
• “Electronic Reliability Design Handbook,” MIL-HDBK-338B, 1 Oct. 1998.
• “DoD Test and Evaluation of System Reliability, Availability, and Maintainability: A primer,” March 1982.
Software
• AMSAA Reliability Growth Models, User Guides and Excel files can be obtained from AMSAA.
• RGA 7, Reliasoft.
• JMP, SAS Institute Inc.
1/10/2018-139