UJ Module Lecture 1 Rev3
20 July 2020
Lecture 1
Reliability Engineering Strategy and Main Concepts
Index
• Reliability Strategy and Management
• The Need for Reliability Management
• Reliability Program
• The Reliability Organisation
• Reliability Policy and Strategy
• Reliability Governance and Procedures
• Reliability Organisation Capability and Maturity
• Cost of Reliability Programs
• Contracting and External Services
• Reliability and Quality Management Programs
• Reliability Reporting – Social Engineering Aspects
• Introduction To Reliability and Main Concepts
• Reliability Engineering as a Discipline
• Main Reliability Concepts
• Importance for Good Life Data
• Failure Distribution Curves
• Reliability Requirements Analysis
• Reliability within the Requirements Management Process
• Reliability Modelling and Prediction
• Reliability Demonstration and Growth
• Reliability Testing
• Reliability Growth Monitoring
Reliability Strategy and Management
New Product Development
Three factors are of fundamental importance to those responsible for developing a new product and
to those who will be responsible for using or operating the product; they are:
• Performance
• Will the performance (including dependability) of the product meet the expectations and needs of the end user?
• Cost
• What will be the cost, not only of developing and producing the product, but also of operating and maintaining and
eventually disposing of it, i.e. what will be its life-cycle cost?
• Timescale
• Will the product be available when required and appear on the market at the appropriate time?
• The customer’s satisfaction with a product, and the reputation of the product and of its supplier, depends to a
considerable degree on how well these factors are managed and harmonised during the various phases of the
product life cycle.
• The term dependability embraces reliability, maintainability, availability and maintenance support. Reliability,
availability and maintainability are themselves fundamental product performance characteristics and are
frequently specified as key product requirements.
Risk Assessment
• It is important to understand how much change a project, a new product or a modification to an
existing product introduces compared to the old one.
• Typical questions for assessing the risk of a new product design or project are:
• Will this project/product introduce/use any new technology with an unproven reliability record?
• Is the design approach a revolutionary one as opposed to evolutionary?
• Will the product/design be significantly different from the old one (e.g. more than 30% of the content is new)?
• Will this product be used at a different geographic region or be exposed to more extreme environments?
• Does this product/design have any new requirements (e.g. 15 years life instead of 10 years)?
• Will the product/design have new applications?
• Are any new materials or manufacturing methods used in the design?
• Are there any changes in the supply chain?
• Will the product be made at a different manufacturing location?
• Will this product be supplied to a new customer?
• Are there any other changes, which can affect reliability?
Why do Engineering Products Fail?
• The design might be inherently incapable.
• The item might be overstressed in some way. If the stress applied exceeds the strength then failure will occur.
• Failures might be caused by variations in product load and strength.
• Failures can be caused by wear-out. This is any mechanism or process that causes an item that is sufficiently
strong at the start of its life to become weaker with age. Well-known examples of such processes are material
fatigue, corrosion and insulation deterioration.
• Failures can be caused by other time-dependent mechanisms. Battery run-down and progressive drift of
electronic component parameter values (settings) are examples of this.
• Failures can be caused by sneaks. A sneak is a condition in which the system does not work properly even
though every part does.
• Failures can be caused by errors, such as incorrect specifications, designs or software coding, by faulty
assembly or test, by inadequate or incorrect maintenance, or by incorrect use.
When reliability is not under control, complicated issues arise that directly affect business operations
and asset cost of ownership – issues like manpower shortages, spare part availability, logistic
delays, lack of repair facilities, extensive retro-fit and complex configuration management costs, etc.
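The overstress and load/strength variation causes listed above can be illustrated with a stress-strength interference calculation. The sketch below assumes, purely for illustration, that load and strength are independent and normally distributed; the function name and all numbers are hypothetical and do not come from the lecture.

```python
import math

def interference_failure_probability(mean_strength, sd_strength, mean_load, sd_load):
    """Probability that load exceeds strength for independent normal load and strength."""
    # Safety margin: distance between the means in units of the combined standard deviation.
    margin = (mean_strength - mean_load) / math.sqrt(sd_strength**2 + sd_load**2)
    # P(strength - load < 0) = Phi(-margin), written with the complementary error function.
    return 0.5 * math.erfc(margin / math.sqrt(2))

# Hypothetical values: identical mean strength and load, different spread.
tight = interference_failure_probability(500.0, 20.0, 350.0, 30.0)
loose = interference_failure_probability(500.0, 60.0, 350.0, 60.0)
print(f"tight spread: {tight:.2e}, wide spread: {loose:.2e}")
```

With the means unchanged, widening the spread of load and strength raises the failure probability by several orders of magnitude, which is why variation is listed as a failure cause in its own right.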
The Ecology of Failed Systems – Cause and Effect
Sources: Capability Measurement, October 2006 Presentation; Quality Handbook – Juran & Gryna. Also refer to Cost Of Quality Analysis in presentation.
Identification of possible Failure Sources
• Supply chain
• Human failures
• E-Security
Failure Recovery Strategies
Design and RAM
Reliability and maintainability are characteristics defined during the design stage. They are also affected by the
manufacturing process and quality control. They can be further improved during the product's productive life by using
failure information and field experience to implement modifications, although this is more difficult and less
cost-effective.
Maintainability is defined mainly during the design phase.
The following design considerations have a significant
impact on maintainability:
• Ease of access
• Standard tools
• Standard components
• Ease of calibration (or automated calibration)
• Ease of performing tests
• Interchangeability
• Modular designs
Design Life, Service Life, MTBF and Obsolescence
• Design Life: The design life of a component or product is the period of time during which the item is expected by its
designers to work within its specified parameters; in other words, the life expectancy of the item. It is the length of time
between placement into service of a single item and that item's onset of wear-out. Some products designed for heavy or
demanding use are so well-made that they are retained and used well beyond their design life – in such cases “life
extension” activities are undertaken. Entry-level products (the lowest end of the price range fulfilling a certain specification) will tend to
have shorter design lives than more expensive products fulfilling the same function, since there are savings to be made in
using designs that are cheaper to implement, or, conversely, costs to be passed onto the customer from engineering using
a larger safety margin to achieve increased working life.
• Service Life: The forecast life expectancy of a product, based on real-world results and linked to the manufacturer’s
operational experience in supporting the product. It is highly dependent on operational conditions.
• Mean time between failures (MTBF): A measure of the rate of occurrence of random failures in time, where
these failures are not due to a wear-out mechanism. For example, the MTBF of a device may be 100,000 hours while the design life
is 20,000 hours. In this example, across the population of products, one failure is estimated to occur every 100,000
population operating hours*.
• Obsolescence: Design life is related to, but distinct from, the concept of built-in obsolescence, where
products are designed to become obsolete (at least in the eyes of the user) before the end of their design life. An
example would be cameras, both older film cameras and digital cameras overtaken by later technology.
*100,000 units operating for 1 hour each = 100,000 population operating hours. None of these units will ever approach 100,000 operating hours, as
each one will fail due to wear-out and be replaced by a new unit.
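A short worked calculation may help separate MTBF from design life. It uses the figures from the example above (MTBF of 100,000 hours, design life of 20,000 hours); the fleet size of 1,000 units and the constant-failure-rate (exponential) model are illustrative assumptions, not part of the lecture.

```python
import math

MTBF_HOURS = 100_000        # from the example above
DESIGN_LIFE_HOURS = 20_000  # from the example above
FLEET_SIZE = 1_000          # hypothetical population size

def expected_random_failures(units, hours_per_unit, mtbf_hours):
    """Expected number of random failures for a population, assuming a constant failure rate."""
    population_hours = units * hours_per_unit
    return population_hours / mtbf_hours

def survival_probability(hours, mtbf_hours):
    """Probability that one unit sees no random failure in `hours` (exponential model)."""
    return math.exp(-hours / mtbf_hours)

# 100,000 units running for 1 hour each, as in the footnote.
print(expected_random_failures(100_000, 1, MTBF_HOURS))
# A hypothetical fleet run to the end of its design life.
print(expected_random_failures(FLEET_SIZE, DESIGN_LIFE_HOURS, MTBF_HOURS))
print(round(survival_probability(DESIGN_LIFE_HOURS, MTBF_HOURS), 3))
```

Under these assumptions a fleet of 1,000 units run to design life would be expected to see about 200 random failures, and roughly 82% of units would reach the end of their design life without one. MTBF therefore describes the failure rate during useful life, not how long a unit lasts.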
Typical Design interventions to achieve Required Reliability
• The required reliability performance is obtained by employing design techniques either to prevent
failures from occurring (fault avoidance) or to eliminate their effects (fault tolerance). Typical
interventions would be:
• application of fault-tolerance (redundancies, parallel programming, reconfiguration, restarts) and fail-safe techniques;
• application of design procedures (for example top-down design, structured programming, hardware component derating);
• elimination of critical single point fault modes;
• control of stresses applied to components and assemblies;
• control of execution load on software;
• reduction of effects on design performance from parameter variation (e.g. ageing effects);
• use of preferred and proven parts and technology (instead of unproven technologies);
• definition of methods to reduce sensitivity to manufacturing processes;
• adherence to safety standards;
• specific techniques for ensuring reasonably fault-free software by means of methods like source code inspection,
software code auditing and walk-through processes (FAT and UAT).
• Once identified, reliability critical items should be closely monitored to ensure that design changes do
not impact their specified reliability.
Trade-off Analysis
• There are instances where design interventions do not yield the required improvement. In such
cases, trade-off analysis is performed.
• Design trade-offs allow the level of dependability/reliability required during the various life-cycle
phases to be selected and adjusted to determine the impact of changing reliability on the life cycle
cost and overall reliability of the product/system.
• Trade-off studies in the concept and definition phase and in the early design and development
phase provide input to allocation of dependability/reliability requirements. Studies performed at
later stages permit a refinement of allocations and assist in choosing between alternative design
and support solutions.
• Specific trade-off analyses that should be made include:
• reliability performance versus maintainability performance;
• maintainability performance versus maintenance support performance;
• reliability performance versus product features;
• reliability performance of design alternatives as a function of life-cycle cost
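The last trade-off in the list, reliability of design alternatives as a function of life-cycle cost, can be sketched as a simple comparison. Everything below is hypothetical: the cost model (acquisition cost plus expected repair cost over the service period), the names and the numbers are assumptions chosen only to show the shape of the calculation.

```python
from dataclasses import dataclass

@dataclass
class DesignAlternative:
    name: str
    acquisition_cost: float   # development + production cost per unit
    mtbf_hours: float         # predicted mean time between failures
    cost_per_repair: float    # average cost of one corrective repair

def life_cycle_cost(alt, service_hours):
    """Acquisition cost plus expected repair cost over the service period."""
    expected_failures = service_hours / alt.mtbf_hours
    return alt.acquisition_cost + expected_failures * alt.cost_per_repair

# Hypothetical alternatives: the second costs more up front but fails less often.
alternatives = [
    DesignAlternative("baseline", 10_000.0, 5_000.0, 800.0),
    DesignAlternative("derated parts", 12_000.0, 12_000.0, 800.0),
]
SERVICE_HOURS = 40_000.0  # hypothetical service period

for alt in alternatives:
    print(alt.name, round(life_cycle_cost(alt, SERVICE_HOURS), 2))

best = min(alternatives, key=lambda a: life_cycle_cost(a, SERVICE_HOURS))
print("lowest life-cycle cost:", best.name)
```

A real study would also include disposal, downtime and support costs; the point here is only that the more reliable design can be cheaper over the life cycle despite a higher purchase price.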
Performance and Competitive Priorities
Generally focused on very specific aspects like:
• Cost
• Speed
• Dependability
• Quality
• Flexibility
Adapted from: Operations Management by Nigel Slack, Stuart Chambers, Robert Johnston, Andy Neely
Reliability Program
Formal Reliability Program Considerations
• A formal reliability programme is necessary whenever the risks or costs of failure are deemed
considerable.
• Risks of failure usually increase in proportion to the number of components in a system, so reliability
programmes are required for any product whose complexity leads to an appreciable risk.
• An effective reliability programme should be based on the conventional wisdom of responsibility and
authority being vested in one person - the reliability programme manager.
• The reliability programme must begin at the earliest, conceptual phase of the project. It is at this stage
that fundamental decisions are made, which can significantly affect reliability. These are decisions
related to the risks involved in the specification (performance, complexity, cost, producibility, etc.),
development time-scale, resources applied to evaluation and test, skills available, and other factors.
• Throughout the product life cycle, therefore, the reliability is assessed, first by initial predictions based
upon past experience in order to determine feasibility and to set objectives, then by refining the
predictions as detail design proceeds and subsequently by recording performance during the test,
production and in-use phases. This performance is fed back to generate corrective action, and to
provide data and guidelines for future products.
The elements of a reliability programme are outlined in documents such as US MIL-STD-785, UK Defence Standard 00–40 and British Standard 5760
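The point made above, that failure risk grows with the number of components, can be illustrated with a series-system model. The sketch assumes, for illustration only, that every component must work for the system to work and that components fail independently; the per-component reliability of 0.999 is a hypothetical figure.

```python
import math

def series_reliability(component_reliabilities):
    """Reliability of a system that works only if every independent component works."""
    return math.prod(component_reliabilities)

COMPONENT_RELIABILITY = 0.999  # hypothetical, identical for every component

for count in (10, 100, 1000):
    r_system = series_reliability([COMPONENT_RELIABILITY] * count)
    print(f"{count:>5} components: system reliability {r_system:.3f}")
```

For small failure probabilities the system failure probability is approximately the component count multiplied by the per-component failure probability, which is the sense in which risk rises roughly in proportion to complexity.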
Formal Reliability Program Elements
[Figure: concept map of reliability programme elements, numbered 1.0 to 5.0]
Source: SMRP Maintenance and Reliability Body of Knowledge Concept Map. F Echeverri. Manufacturing Technician II.
Program for Systems and Equipment
Provides both general reliability program requirements and the required specific tasks.
Major program elements discussed by Juran and Gryna (1993) are:
1. Setting reliability goals
2. Reliability modelling
3. Apportioning the reliability goals
4. Stress analysis
5. Reliability prediction
6. Failure mode and effects analysis
7. Identification of critical parts
8. Design review
9. Supplier selection
10. Control of reliability in manufacturing
11. Reliability testing
12. Failure reporting and corrective action
Sources: Quality Handbook – Juran & Gryna.
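Elements 1 to 3 (setting goals, modelling and apportioning) can be illustrated with the simplest textbook case. The sketch assumes a series system of independent subsystems and uses equal apportionment; this is one common simplification and not necessarily the method Juran and Gryna describe, and the goal and subsystem count are hypothetical.

```python
def equal_apportionment(system_goal, subsystem_count):
    """Reliability each subsystem needs so that a series system meets `system_goal`."""
    return system_goal ** (1.0 / subsystem_count)

SYSTEM_GOAL = 0.95  # hypothetical system reliability goal
SUBSYSTEMS = 5      # hypothetical number of series subsystems

target = equal_apportionment(SYSTEM_GOAL, SUBSYSTEMS)
print(f"each subsystem needs at least {target:.4f}")

# Check: multiplying the apportioned targets back together recovers the goal.
recombined = target ** SUBSYSTEMS
print(f"recombined system reliability: {recombined:.4f}")
```

Each subsystem target is necessarily higher than the system goal. In practice the allocation is usually weighted by complexity, criticality or achievable improvement rather than split equally.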
The Reliability Program
• A Reliability Program is a complex learning and knowledge-based system unique to the products and
processes of each business.
• It is supported by leadership, built on the skills developed within the business team, integrated into
business processes and executed by following proven standard work practices.
• A Reliability Program plan is used to document exactly what "best practices" (tasks, methods, tools,
analysis, and tests) are required for a particular (sub)system, as well as clarify customer requirements
put forward for reliability assessment/verification.
• A Reliability Program plan is essential for achieving high levels of reliability, testability, maintainability,
and the resulting system availability, and is developed early during system development and refined
over the system's life-cycle.
• The Reliability Program specifies not only what the reliability engineer does, but also the tasks
performed by other stakeholders.
• A Reliability Program plan is approved by top program Management, which is responsible for allocation
of sufficient resources for its implementation.
Reliability Program Elements (Design Phase)
Reliability Programme Flow (design/development)
Notes:
1. Items in italics are reliability specific aspects.
2. Figures in brackets indicate relevant chapters in the Reliability
Engineering Handbook.
3. Shaded boxes indicate processes that are usually iterative.
4. Dotted lines indicate data feedback (FRACAS).
Reliability Program Elements (O&M Phase)
Reliability Programme Flow (production, in-service).
Notes:
1. Items in italics directly influence reliability.
2. Figures in brackets indicate relevant chapters.
3. Shaded boxes indicate processes that are usually iterative.
4. Dotted lines indicate data feedback (ideally captured in a
FRACAS/Failure Log).
The goal for any plant is to increase overall production reliability, meaning the maximization of output with current resources by reducing waste in
equipment reliability and process reliability.
Integrated Reliability Programs
• The reliability effort should always be treated as an integral part of the product development and not as a
parallel activity unresponsive to the rest of the development programme. This is the major justification for
placing responsibility for reliability with the project manager.
• The responsibility for reliability achievement must not be taken away from the project manager, who is the
only person who can ensure that the right balance is struck in allocating resources and time between the
various competing aspects of product development.
• Since production quality will affect reliability, quality control is an integral part of the Reliability Program.
Though Quality Control cannot make up for design shortfalls, poor quality can negate much of the reliability
effort.
• Quality control can be made to contribute most effectively to the reliability effort if:
• Quality procedures, such as test and inspection criteria, are related to factors which can affect reliability, and not only to
form and function. Examples are tolerances, inspection for flaws which can cause weakening, and the need for adequate
screening when appropriate.
• Quality control test and inspection data are integrated with the other reliability data.
• Quality control personnel are trained to recognise the relevance of their work to reliability, and trained and motivated to
contribute.
Cost of a Quality Management Program
Categories of Quality Costs
Appraisal Costs.
Those costs that are incurred, by an enterprise, to determine the degree of conformance to the laid
down quality requirements.
• Incoming Inspection and Test. This is the cost incurred in inspecting a supplier’s product before it is allowed
to enter the manufacturing or assembly process. This can be done either by inspection at the source or
by inspection on receipt.
• In-Process Inspection and Test. The cost of in-process evaluation to ensure compliance with
the laid down quality standard or requirements.
• Final Inspection and Test. The cost incurred by inspecting the product at the end of a cycle before it is
delivered to the customer.
• Calibration. This is the cost of keeping ALL measuring, testing and inspection equipment in a known state of
calibration. This aspect is extremely important as an out of calibration situation can lead to extremely high
internal failure costs.
• The department name should not be used as a criterion for establishing appraisal costs as various
departments within an enterprise may be involved in or responsible for such activities.
The Reliability Excellence Model®
The five steps in achieving Reliability maturity are:
• Creating a foundation with vision and values.
• Defining the new culture in which work is accomplished.
• Implementing work processes that are defined,
disciplined and effective.
• Optimizing work processes to improve efficiency
and effectiveness.
• Measuring performance to ensure sustainability
and continuous improvement.
Source: https://www.lce.com/Reliability-Excellence-Model - The Model is developed and copyrighted by Lifecycle Engineering (LCE)
The Reliability Organisation
Reliability Organisation Considerations
• Because several different activities contribute to the reliability of a product it is difficult to be
categorical about the optimum reliability organisation that will ensure effective management of
reliability. Reliability is affected by design, development, production quality control, control of
suppliers and subcontractors, and maintenance – so stakeholders in all these business areas are
required to be actively involved in reliability management and optimisation activities.
• Since the knowledge, skills and techniques required for the reliability engineering tasks are
essentially the same as those required for safety analysis and for maintainability engineering, it is
logical and effective to combine these responsibilities in the same department or project team.
• Reliability management must be integrated with other project management functions, to ensure
that reliability is given the appropriate attention and resources in relation to all the other project
requirements and constraints.
• Two main forms of reliability organization have evolved over time:
• The QA Based Reliability Organisation
• The Engineering Based Reliability Organisation
Business Process Improvement Capability Indices
[Figure: business functions shown: Operations, Marketing, Finance, Engineering, Logistics, Planning]
• Project reliability management should be part of the responsibilities of the relevant Project/
Program Manager.
• Since Reliability Engineering should involve interfaces with several other functions, including such
non-engineering areas as marketing and finance, the position of RE CoE Manager should be seen
as one requiring very good engineering and business talents with good insight into the overall
business. This individual also needs to market and create awareness at all levels of the
importance of the RE Program.
• Whilst selection and training of reliability people is important, it is also necessary to train and
motivate all other members of the engineering or Project team.
Reliability Type Organisations
Quality Assurance based Reliability Organisation
[Organisation chart, boxes shown: Product Manager; Reliability Engineering; Quality Control; Test and Inspection; Audit; Data Analysis and Statistical Engineering; Design; Development; Production; Environmental Testing; Design Assurance; Parts Evaluation and Failure Analysis]
• Places responsibility for reliability with QA management, which then controls the ‘quality’ of design, maintenance, and so on, as well as of
production activities.
• This organisational form is based upon the definition of quality as the totality of features that bear on a product’s ability to satisfy the
requirement.
• This type of structure allows easier integration of some tasks that are common to design, development and production. The ability to operate a
common failure data (FRACAS) system is potentially also easier.
The Quality Assurance Department & Role
• The production department should have ultimate responsibility for the manufacturing quality of the product.
• The QA department is responsible for assessing the quality of production but not for the operations which
determine quality. QA thus has the same relationship to production as reliability engineering has to design and
development.
• The QA department should be responsible for:
• Setting production quality criteria: The quality manager should determine, or approve, the final inspection and test methods and
criteria to ensure conformance. He or she should also determine such details as quality levels of components, quality control of
suppliers and calibration requirements for test and measuring equipment.
• Monitoring production quality performance and costs: The QA department should prepare or approve quality performance and
cost reports, and should monitor and assist with problem-solving. The quality manager must be satisfied that the quality objectives
are being attained or that action is being taken to ensure this.
• QA training: The quality manager is responsible for all quality control training. All production people must understand and apply
basic quality concepts such as simple statistical process control (SPC) and data analysis.
• Specialist facilities and services: The quality department provides facilities such as calibration services and records, vendor
appraisal, component and material assessment, and data collection and analysis.
• Quality audit and registration: Quality audit is an independent appraisal of all of the operations, processes and management
activities that can affect the quality of a product or service. The objective is to ensure that procedures are effective, that they are
understood and that they are being followed. Quality audits, like financial audits, require both internal and external auditing.
Reliability Type Organisations (Cont.)
Engineering Based Reliability Organisation
[Organisation chart, boxes shown: Product Manager; Product/Solution Design; Development; Reliability Engineering; Quality Procedures and Audit; Quality Control; Production; Test and Inspection; Data Analysis; Environmental Testing; Design Assurance; Statistical Engineering; Parts Evaluation and Failure Analysis]
• In the engineering based organisation, reliability is made the responsibility of the engineering manager.
• The QA (or QC) manager is responsible only for controlling production quality and may report directly to the product manager or to the
production manager.
• For products or systems where a considerable amount of innovative design is required, more of the reliability effort will have to be directed
towards design assurance, such as stress analysis, design review and development testing.
Competencies of the Reliability Engineer
Typical competencies for a Reliability Engineer are:
• Sufficient technical knowledge to assist design engineers in preventing design/production problems, and to
ascertain causes of failure. Extensive technical experience and sufficient knowledge to understand the
specialists’ problems is important.
• The ability to demonstrate the value and relevance of the reliability methods applied.
• Experience and knowledge of the product, including manufacturing, operation and maintenance. This will
enable the reliability engineer to contribute effectively and with credibility in a product development team.
• Good communication skills and the ability to translate specialists’ information and requirements into the
elements and data needed for RE analysis.
• A background in areas such as test, product support, and user maintenance. Engineers in these fields
usually make good Reliability Engineers.
• Where data analysis and statistical engineering support has to be provided, specialist training in reliability
engineering techniques and methodologies will be crucial. Specialists in statistics are needed for design of
experiments (DOE) and analysis of reliability/statistical data, and not many engineers are suitably trained
and experienced in these skills.
Outsourcing RE Tasks
• For large organisations with intensive engineering and design activity, a dedicated reliability
engineering organisation is usually necessary.
• Depending on the risks involved in engineering activities and product development, an alternative
is to make use of external RE services.
• External reliability engineering services can fulfil the requirements of smaller companies by
providing the specialist support and facilities when needed.
• Reliability engineering consultants and specialist test establishments can often be useful to larger
companies also, in support of internal staff and facilities.
• Since they are engaged full time across a number of different types of project they should be
considered whenever new problems arise. The benefit in using their services is that they perform
RE full time and may also have had exposure to problems being experienced and can provide fast
and effective reliability advice and guidance.
• RE consultants and specialists should be selected carefully and integrated in the project team.
Outsourcing Requirements
When going the outsourcing route for RE services, it is important that the following aspects receive
adequate attention when issuing RFI/RFP to potential service providers:
• Detailed and clear specification of the reliability requirements.
• Specification of the RE standards and methods to be used.
• Specifying the documentation required and level of reporting requirements (Content, format and size).
• A Project Reliability Plan
• Design Analysis Reports and Updates (as per RE methods specified)
• Test and/or V&V Reports (of analysis findings)
• Setting up the financial and contractual framework that will ensure a win-win situation for both contractor and
the customer.
• Specify the RE reporting requirements and final outputs to be delivered to the customer.
• Methods and means with which contract performance will be monitored (e.g. use of reward, penalty-reward
or penalty mechanisms).
The Need for Reliability Training
• A central reliability department (Centre of Excellence) is necessary to provide general Reliability Engineering
standards, training and advice, but should not necessarily be relied upon to manage reliability programmes
across a range of projects.
• Since product failures are nearly always due to human shortcomings, in terms of lack of knowledge, skill or
effort, all staff and suppliers involved with the product must be trained so that the chances of such failures are
minimized.
• Reliability training effort should be focussed on the whole Engineering/Project team, and not just on the
reliability specialists.
• In the absence of Quality and Reliability Engineering in formal engineering curricula, it may be important for
large organisations to develop the required training material and formal training programs to ensure that
quality and reliability methods are understood and used correctly in the organisation.
• For complex reliability programs, it may also be advisable to develop work instructions at an appropriate level
to ensure that all staff execute reliability analysis and tasks consistently and correctly. Such work instructions
can also form a very useful baseline for contractual requirements regarding reliability deliverables outsourced to
external parties.
Reliability Policy and Strategy
Corporate Policy
• A really effective reliability function can exist only in an organization where the achievement of high reliability
is recognised as part of the corporate strategy and is given top management attention.
• Quality and reliability awareness and direction must start at the top and must permeate all functions and levels
where reliability can be affected.
• If these conditions are not fulfilled, and if it receives only lip service, reliability effort will be cut back whenever
cost or time pressures arise.
• The reliability policy should become more and more explicit as it is translated into actions from the top down
through the organisation. At the lowest leadership level (plant, area, department), it should be a specific plan
or a strategy for taking action that is consistent with the reliability policy.
• A reliability improvement policy statement should be explicit regarding:
• the compelling business reason for improving equipment and/or process reliability
• the acceptable maintenance and reliability work processes and standards
• what is to be improved
• how reliability growth and improvement will be measured
• the timeframes in which reliability improvement should be made/achieved
• the policy should provide the “what” but not the “how” to be effective
Best Practice Reliability Philosophy of Successful Companies
• Reliability must be designed into products and processes using the best available science-based
methods.
• Knowing how to calculate reliability is important, but knowing how to achieve reliability is equally, if
not more, important.
• Reliability practices must begin early in the design process and must be well integrated into the
overall product development cycle.
• Understanding and institutionalising when, what and where to use the wide variety of reliability
engineering tools available is necessary to achieve the reliability mission of an organisation.
Effective Reliability Strategies
• The two most important features of a Good Reliability Program are:
• The statement of the reliability aim in such a way that it is understood, feasible, mandatory and demonstrable.
• Dedicated, integrated management of the reliability programme with design, production/manufacturing and other asset
lifecycle phases.
• Achieving the organisation’s reliability goals requires strategic vision, proper planning and sufficient
organisational resource allocation.
• There are a variety of activities involved in an effective reliability program and in arriving at reliable
products. This requires careful planning to ensure the correct activities define the strategy.
• The integration and institutionalisation of reliability practices into development projects also needs
to happen from the earliest lifecycle phase.
• Although life data analysis is an important piece of the overall reliability program, performing just
this type of analysis is not enough to achieve reliable products.
Reliability Strategy Elements
• Values, Beliefs and Policy: Reliability beliefs set the target for the culture. An example of a set of beliefs
is:
• “Equipment failures are preventable”
• “Reliability is everyone’s business”
• “We report all equipment issues”
• “We investigate abnormal conditions and failures”
• “We have visible reliability metrics”
• “Management is committed to reliability”
• “Reliability issues are investigated and resolved”
• “Reliability is for the entire lifecycle”
• Competency and Baselines: Organisations need to have individuals with the right competencies in place to
carry out a corporate reliability program. This starts with staff having the right skill and knowledge sets.
• Program Structure Considerations:
• It is best to specify principles and high-level practice expectations and leave the rest to the sites (or individual projects) to
decide how to implement these practices.
• Ensure that appropriate metrics are implemented to monitor the Reliability Program.
Important tactics to improve reliability and performance
• Reliability
• Improving individual components
• Providing redundancy
• Maintenance
• Implementing or improving preventive maintenance
• Increasing repair capability or speed
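As a rough illustration of these tactics, the sketch below uses two standard textbook relationships: the reliability of n independent, identical units in active parallel, 1 - (1 - R)^n, and steady-state availability, MTBF / (MTBF + MTTR). The function names and all numeric values are assumptions chosen for illustration and do not come from the lecture.

```python
def parallel_reliability(unit_reliability: float, n_units: int) -> float:
    """Reliability of n independent, identical units in active parallel."""
    return 1.0 - (1.0 - unit_reliability) ** n_units


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures (MTBF)
    and mean time to repair (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Illustrative values only (assumptions, not lecture data).
    # Tactic: providing redundancy (one unit versus two in parallel).
    print("single unit:", parallel_reliability(0.90, 1))
    print("two in parallel:", parallel_reliability(0.90, 2))
    # Tactic: increasing repair speed (halving MTTR).
    print("MTTR 10 h:", steady_state_availability(1000.0, 10.0))
    print("MTTR 5 h:", steady_state_availability(1000.0, 5.0))
```

Improving an individual component corresponds to raising `unit_reliability`, and preventive maintenance would typically be modelled as raising `mtbf_hours`; both are simplifications.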
Reliability Strategy Elements
• The Business Case: The business case can typically focus on these issues:
• Uptime
• Operating cost
• Supply chain
• Demand
• Deferring capital
• Safety
• Product cost
• Breakdowns
• Leadership Structure: The effectiveness of corporate reliability programs varies depending on:
• Leadership – who drives the initiative both from a corporate and business position
• Strategy – targeting what the business needs to survive
• Structure – including the “right” elements
• Culture – collaborating, focusing on defect elimination, and being proactive
• Talent – supporting its implementation
• Level of buy-in from participants – accepting the initiative from the boardroom to the production floor
• Competency – obtaining knowledge of practices
Using Hoshin Planning to define the Reliability Strategy
Hoshin Kanri (also called Policy Deployment) is a method for ensuring that the strategic goals of a company
drive progress and action at every level within that company. This eliminates the waste that comes from
inconsistent direction and poor communication.
It achieves this by aligning the goals of the company (Strategy) with the plans of middle management (Tactics)
and the work performed by all employees (Operations). It focuses on 4 key steps:
1 – Create a Strategic Plan
The Process starts with a strategic plan (e.g. an annual plan) that is
developed by top management to further the long range goals of the
company. This plan should be carefully crafted to address a small number of
critical issues.
2 - Develop Tactics
At a departmental level, mid-level managers develop tactics that will best
achieve the goals as laid out by top management. This is best achieved
with KPIs that are meaningful and appropriate.
3 - Take Action
At the plant floor level, supervisors and team leaders work out the
operational details to implement the tactics as laid out by mid-level
managers. Managers should stay closely connected to activity at this level.
4 - Review and Adjust
Information about progress and results is the second flow that creates a
closed loop system – enabling control and adjustment of the entire process.
The Hoshin (X) Matrix – An Example
• Every project should create and work to a documented and approved Reliability Plan. The
Reliability Plan should include:
• A brief statement of the reliability requirement.
• The organisation put in place for reliability management.
• The reliability activities that will be performed (design analysis, test, reports).
• The timing of all major reliability activities, in relation to the project development milestones.
• Reliability management of suppliers.
• The standards, specifications and internal procedures (e.g. the Organisational Reliability Manual) which will
be used, as well as cross-references to other plans such as those used for test, safety, maintainability and
quality assurance.
Reliability Governance Structure – An Example
CP – Vice President
GM – General Manager
AM – Asset Manager
RM – Reliability Manager
ATL – Asset Technical Lead
P-TL – Process Technical Lead
SL – System Engineer/Lead
AFP – Area Focal Point
Reliability Procedures – Institutionalising the Practice
• An integrated reliability programme must be disciplined and well managed.
• The reliability (and quality) effort must be tightly controlled and supported by mandatory procedures.
The written procedures must state, in every case, who carries responsibility for action(s) to be taken
and who is responsible for providing the resources and capability. They must also state who provides
supporting services.
• The actions of design analysis, test, reporting, failure analysis and corrective action must be strictly
imposed by means of formal Design Process Procedures. Deviation from best practices can result in a
reduction of reliability, without any reduction in the cost of the programme. There will always be
pressure to relax the extent of design analyses required, or to classify a failure as non-relevant if doubt
exists, but this must be resisted.
• The most effective way to ensure this is to have the agreed reliability programme activities written down
as mandatory procedures, with defined responsibilities for completing and reporting all tasks, and to
check by audit and during programme reviews that they have been carried out as specified.
• In-house reliability procedures should not attempt to teach basic reliability management principles in
detail, but rather should refer to appropriate standards and literature.
Reliability Organisation Capability and Maturity
Reliability Capability and Maturity Assessments
• The evaluation methods for organizational reliability processes are reliability capability and
reliability maturity assessments. These methods use standardised measurement criteria for
assessing and quantifying the reliability capability of an organization.
• Reliability Capability: Assessments use IEEE 1624, which defines the inputs, the required activities and the
expected outputs for 8 key categories.
• Reliability Maturity: The Automotive Industry Action Group (AIAG) published the Reliability Methods
Guideline (AIAG, 2004) containing 45 key reliability tools, and this activity was later expanded to develop an
organizational capabilities maturity concept.
• Both reliability maturity and capability assessments provide important tools to evaluate the
organisational capability from a product reliability perspective.
• The assessments can also be used in the supplier selection process and can be conducted as a
self-assessment and/or a 2nd or 3rd party assessment.
• These activities also help to identify gaps and weaknesses in the reliability process and can be
instrumental in developing an efficient product and process improvement plan.
Reliability Capability Assessments
Reliability Capability: Reliability capability is a measure of the practices
within an organization that contribute to the reliability of the final product and the
effectiveness of these practices in meeting the reliability requirements of customers.
IEEE 1624 was developed to measure the following key reliability practices:
• Reliability requirements and planning.
• Training and development.
• Reliability analysis.
• Reliability testing.
• Supply chain management.
• Failure data tracking and analysis.
• Verification and validation.
• Reliability improvements.
• Each of the reliability practices is individually assessed (levels 1–5) with
reference to the specified set of activities required to obtain a specific capability
level. These five levels represent the metrics or measures of the organizational
reliability capability and reflect stages in the evolutionary transition of that practice.
https://ops.fhwa.dot.gov/publications
Capability Maturity Levels – The CMMI Approach
• CMMI is a world-class performance improvement model for competitive organizations that want to
achieve high-performance operations.
• Proven effective in organizations and governments globally over the last 25 years, CMMI consists of
collected best practices designed to promote the behaviours that lead to improved performance in any
organisation.
• High Maturity organizations are continuously evolving, adapting, and growing to meet the needs of
stakeholders and customers.
http://cmmiinstitute.com
Capability Maturity Model Integration (CMMI) is a process level improvement training and appraisal program. Administered by the CMMI Institute, a
subsidiary of ISACA, it was developed at Carnegie Mellon University (CMU). CMU claims CMMI can be used to guide process improvement across a
project, division, or an entire organisation.
Reliability Maturity Assessments
Reliability Maturity: This is assessed using a reliability maturity assessment (RMA) manual that covers nine reliability categories:
A Reliability planning.
B Design for reliability.
C Reliability prediction and modelling.
D Reliability of mechanical components and systems.
E Statistical concepts.
F Failure reporting and analysis.
G Analysing reliability data.
H Reliability testing.
I Reliability in manufacturing.
• For each reliability tool within the category the RMA suggests
scoring criteria. After assessment of the individual scores they
are combined by categories resulting in a rating based on the
percentage of the maximum available score for each category.
• The scores for each category can be combined using weighted averages to obtain the total score
for an organization. A score above 60% is classified as B-level and a score above 80% as A-level.
Scores below 60% are considered reliability deficiencies.
http://www.ausenco-rylson.com
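The scoring arithmetic described above can be sketched as follows. This is a minimal illustration, not the RMA manual's actual procedure: the maximum score per tool, the category weights and the example scores are all hypothetical, and the treatment of a score of exactly 60% is an assumption because the text does not define it.

```python
def category_percentage(tool_scores, max_per_tool):
    """Percentage of the maximum available score for one category."""
    return 100.0 * sum(tool_scores) / (max_per_tool * len(tool_scores))


def overall_score(category_pcts, weights):
    """Weighted average of the category percentages."""
    total_weight = sum(weights[c] for c in category_pcts)
    return sum(category_pcts[c] * weights[c] for c in category_pcts) / total_weight


def classify(score_pct):
    """Thresholds from the text: above 80% is A-level, above 60% is B-level.
    Assumption: exactly 60% is grouped with the deficient scores."""
    if score_pct > 80.0:
        return "A-level"
    if score_pct > 60.0:
        return "B-level"
    return "reliability deficiency"


if __name__ == "__main__":
    # Hypothetical tool scores for two of the nine categories, max 4 per tool.
    pcts = {
        "A Reliability planning": category_percentage([3, 4, 2], max_per_tool=4),
        "H Reliability testing": category_percentage([4, 4], max_per_tool=4),
    }
    weights = {"A Reliability planning": 2.0, "H Reliability testing": 1.0}  # assumed
    total = overall_score(pcts, weights)
    print(pcts, round(total, 1), classify(total))
```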
Reliability Maturity Assessments
GenesisSolutions® has developed a proprietary methodology for evaluating and implementing a comprehensive enterprise asset management (EAM) strategy
for organizations in a wide array of industries. It uses the DMAIC (Define, Measure, Analyze, Improve and Control) approach to EAM Master Planning.
http://www.genesissolutions.com/eam-assessment-methodology
Cost of Reliability Programs
“If you are not taking care of your customer, your competitor
will”.
Bob Hooey
Reliability and Costs
• Achieving high reliability is expensive, particularly when the product is complex or involves
relatively untried technology.
• It is nearly always less costly to correct causes of production defects than to live with the
consequences in terms of production costs and unreliability.
• Nearly every failure mode experienced in service is worth discovering and correcting during
development, owing to the very large cost disparity between corrective action taken during
development and similar action (or the cost of living with the failure mode) once the equipment
is in service.
• There are, however, usually practical limits to how much can be spent on reliability during a
development programme.
• A good Reliability Program (supporting the reliability effort with the right engineers, with adequate
management support and involvement, and the required test equipment and products for testing)
will justify the necessary expenditure on reliability that supports total lowest asset cost.
Reliability Economics – The Traditional View
• Costs expended on reliability (and production quality) activities are shown here.
• The figure shows a U-shaped total cost curve with the minimum cost occurring at a reliability level
somewhat lower than 100%.
• This would be the optimum reliability, from the total cost point of view.
[Figure: annual cost versus system reliability, with investment cost and damage cost curves]
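The traditional view can be sketched numerically. The two cost functions below are purely assumed shapes (investment cost rising steeply as reliability approaches 100%, damage cost falling linearly); the lecture does not give formulas, and the constants are invented for illustration. The sketch only demonstrates that the sum of a rising and a falling cost produces a minimum below 100% reliability.

```python
def investment_cost(reliability, a=1.0):
    """Assumed form: cost of reliability effort grows steeply near 100%."""
    return a / (1.0 - reliability)


def damage_cost(reliability, b=10_000.0):
    """Assumed form: cost of failures falls linearly as reliability improves."""
    return b * (1.0 - reliability)


def total_cost(reliability):
    """Total annual cost is the sum of investment and damage costs."""
    return investment_cost(reliability) + damage_cost(reliability)


def optimum_reliability():
    """Grid search for the reliability level with the lowest total cost."""
    candidates = [i / 1000 for i in range(500, 1000)]  # 0.500 to 0.999
    return min(candidates, key=total_cost)


if __name__ == "__main__":
    best = optimum_reliability()
    print("optimum reliability:", best, "total cost:", total_cost(best))
```

With different assumed cost shapes the minimum moves, which is why the choice of cost model matters so much in this view.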
Reliability Economics – The Deming Model
• W.E. Deming presented a different model in his teaching on manufacturing quality.
• He argued that, since less than perfect quality is the result of failures, all of which have causes, we
should not be tempted to assume that any level of quality is “optimum”, but should ask ‘what is the
cost of preventing or correcting the causes, on a case by case basis, compared with the cost of
doing nothing?’
• When each potential or actual cause is analysed in this way, it is usually found that it costs less to
correct the causes than to do nothing.
• Thus total costs continue to reduce as quality is improved.
[Figure: cost versus quality/reliability (up to 100%), with development/production costs, failure costs and total costs curves]
This simple picture was the prime determinant of the post-war quality revolution in Japan, and formed the basis for the philosophy of kaizen (continuous
improvement). 100% quality was rarely achieved, but the levels that were achieved exceeded those of most Western competitors, and production costs
were reduced.
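The case-by-case question can be illustrated with a small sketch that compares, for each failure cause, the cost of correcting it with the cost of doing nothing. The cause names and cost figures are hypothetical, and the data structure is only one possible way to record such an analysis.

```python
from dataclasses import dataclass


@dataclass
class FailureCause:
    name: str
    cost_to_correct: float        # cost of preventing or correcting the cause
    cost_of_doing_nothing: float  # expected cost of living with the failures


def worth_correcting(cause: FailureCause) -> bool:
    """True when correcting the cause costs less than doing nothing."""
    return cause.cost_to_correct < cause.cost_of_doing_nothing


def net_saving(causes) -> float:
    """Total saving from correcting every cause that is worth correcting."""
    return sum(
        c.cost_of_doing_nothing - c.cost_to_correct
        for c in causes
        if worth_correcting(c)
    )


if __name__ == "__main__":
    # Hypothetical causes and costs, for illustration only.
    causes = [
        FailureCause("connector corrosion", 2_000.0, 15_000.0),
        FailureCause("seal wear", 5_000.0, 8_000.0),
        FailureCause("cosmetic scratch", 4_000.0, 1_000.0),
    ]
    for c in causes:
        print(c.name, "-> correct" if worth_correcting(c) else "-> accept")
    print("net saving:", net_saving(causes))
```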
Calculating Maintenance Costs
• The traditional view attempted to balance preventive and breakdown maintenance costs
• Typically this approach failed to consider the true total cost of breakdowns
• Inventory
• Employee morale
• Schedule unreliability
• This view has been superseded by a “Full/Total Cost View” that considers all cost elements to determine the optimal point.
[Figure: Traditional View compared with Full/Total Cost View]
Reliability Economics – There is an optimal point
• All efforts to improve reliability by identifying and removing potential causes of failures in service
should result in cost savings later in the product life cycle, giving a net benefit in the longer term.
• Achieving levels of reliability close to 100% is often not realistic for complex products.
• Research on reliability cost modelling (Kleyner, 2010) showed that in practical applications the total
cost curve is highly skewed to the right due to the increasing cost and diminishing return on further
reliability improvements.
• Achieving reliable designs and products requires a totally integrated approach, including design, test,
production, as well as the reliability programme activities.
[Figure: total costs versus quality/reliability (up to 100%)]
Quality Costs
• Quality Management suggests considering costs under three headings, so that they can be
identified, measured and controlled. These quality costs are the costs of all activities specifically
directed at reliability and quality control, and the costs of failure. Quality costs are usually
considered in three categories:
• Prevention Costs: Those costs related to activities which prevent failures occurring. These include
reliability efforts, quality control of bought-in components and materials, training and management.
• Appraisal Costs: Costs related to test and measurement, process control and quality audit.
• Failure Costs: Actual costs of failure. Internal failure costs are those incurred during manufacture.
These cover scrap and rework costs. Failure costs also include external or post-delivery failure costs,
such as warranty costs; these are the costs of unreliability.
• Cost Analysis should be performed using a range of assumptions to determine the sensitivity of the
results to assumed effects, such as the yield at test stages and reliability in service.
• The costs of unreliability in service should be evaluated early in the development phase, so that
the effort on reliability can be justified and requirements can be set, related to expected costs.
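A sensitivity analysis of this kind might be sketched as below. The cost model is a deliberately simple assumption (internal failure cost driven by the yield at test, external failure cost driven by the fraction of units failing in service), and every parameter name and value is hypothetical.

```python
def total_quality_cost(prevention, appraisal, units, test_yield,
                       rework_cost, field_failure_fraction, warranty_cost):
    """Prevention + appraisal + internal failure + external failure costs.
    Assumed model: units failing test are reworked at rework_cost each;
    units failing in service cost warranty_cost each."""
    internal_failure = units * (1.0 - test_yield) * rework_cost
    external_failure = units * field_failure_fraction * warranty_cost
    return prevention + appraisal + internal_failure + external_failure


if __name__ == "__main__":
    # Vary the two uncertain assumptions and observe the effect on total cost.
    for test_yield in (0.85, 0.90, 0.95):
        for field_failure_fraction in (0.01, 0.02, 0.05):
            cost = total_quality_cost(
                prevention=10_000, appraisal=5_000, units=1_000,
                test_yield=test_yield, rework_cost=20,
                field_failure_fraction=field_failure_fraction,
                warranty_cost=300,
            )
            print(f"yield={test_yield:.2f} "
                  f"field failures={field_failure_fraction:.2f} "
                  f"total={cost:,.0f}")
```

In this invented example the result is far more sensitive to the in-service failure fraction than to the test yield, which is the kind of insight the analysis is meant to surface.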
Safety and Product Liability (PL) Costs
• Product liability legislation adds a new dimension to the importance of eliminating safety-related failure modes, as
well as to the total quality assurance approach in product development and manufacture.
• Before product liability (PL), the law relating to risks in using a product was based upon the principle of caveat
emptor (‘let the buyer beware’). PL introduced caveat venditor (‘let the supplier beware’).
• PL makes the manufacturer of a product liable for injury or death sustained as a result of failure of his product.
Since these risks may extend over ten years or even indefinitely, depending upon the law in the country
concerned, long-term reliability of safety-related features becomes a critical requirement. The size of the claims,
liability being unlimited in the United States, necessitates top management involvement in minimizing these risks,
by ensuring that the organization and resources are provided to manage and execute the quality and reliability
tasks which will ensure reasonable protection.
• A designer can now be held liable for a failure of his design, even if the product is old and the user did not operate
or maintain it correctly.
• Claims for death or injury in many product liability cases can only be defended successfully if the producer can
demonstrate that he has taken all practical steps towards identifying and eliminating the risk, and that the injury
was entirely unrelated to failure or to inadequate design or manufacture.
• PL insurance is a business area for the insurance companies, who naturally expect to see a suitable reliability and
safety programme being operated by the manufacturers they insure.
Abbot and Tyler (1997) provide an overview of this topic.
ROI of Reliability Programs and improved Asset Reliability
An effective reliability program focusing on the 5 key areas listed below, has many benefits:
• Increased plant process availability
• Improved asset and equipment reliability
• Improved safety and environmental risk mitigation
An effective essential asset monitoring program has been shown to save up to 13 percent of maintenance costs, reclaim 0.9 percent of lost production,
and save up to 2 percent of preheat energy costs every year.
Contracting and External Services
Contracting for Reliability
• Product warranty is a type of reliability contract.
• Contracts which stipulate specific incentives or penalties related to reliability achievement have been
developed by the military and other major equipment users such as airlines and public utilities.
• The most common form of reliability contract is one which ties an incentive or penalty to a reliability
demonstration. The demonstration may either be a formal test or may be based upon the user’s or
customer’s experience. In either case, careful definition of what constitutes a relevant failure is
necessary, and a procedure for failure classification must be agreed.
• When planning incentive contracts it is necessary to ensure that other performance aspects are
sufficiently well specified and, if appropriate, also covered by financial provisions such as incentives or
guarantees, so that the supplier is not motivated to aim for the reliability incentive at the expense of
other features.
• Incentive contracting requires careful planning so that the supplier’s motivation is aligned with the
customer’s requirements. The parameter values selected must provide a realistic challenge and the
fee must be high enough to make extra effort worthwhile.
Contractual Reliability considerations during Asset Lifecycle
• Concept and Definition Phase: During this phase the foundation is laid for the product’s dependability
and its life-cycle cost. Decisions made during this phase have the greatest impact on the product and its life-cycle
cost. In this phase the focus should be on establishing the correct requirements for the product and its future
support and for establishing the Reliability Management (RM) Plan that will form the basis for the control of
reliability during the subsequent asset lifecycle phases.
• Design and Development Phase*: During this phase it is critical to ensure that full consideration is given to
the reliability requirements captured in the RM Plan and Reliability specification for the product and system.
Sufficient detail should be included in contracts to ensure that the required analysis and prediction activities
are implemented and the V&V and Test procedures/criteria are defined and executed. This is especially
important in cases where there will be multiple designers and contractors involved in the end product design,
development and implementation.
• Manufacturing Phase: Contractual reliability requirements for this phase must be very clear about
performance levels that the product should achieve and that it is not degraded in any way during the
manufacturing process. Where required, additional product testing during manufacturing (e.g. factory
acceptance tests) and reliability stress screening may be introduced as contractual requirements.
Specifying Reliability when Outsourcing Design*
• In order to ensure that reliability is given appropriate attention and resources during design,
development and manufacture, the requirement must be specified.
• The reliability specification must contain:
• A definition of failure related to the product’s function. Care must be taken in defining failure to ensure that the failure
criteria are unambiguous. Failure should always be related to a measurable parameter or to a clear indication. The
definition should cover all failure modes relevant to the function.
• A full description of the environments in which the product will be stored, transported, operated and maintained. The
environmental specification must cover all aspects of the many loads and other effects that can influence the product’s
strength or probability of failure. The environments to be covered must include handling, transport, storage, normal use,
foreseeable misuse, maintenance and any special conditions. The type of test equipment likely to be used, the skill level
of users and test technicians, and the conditions under which testing might be performed should be stated if these factors
might affect the observed reliability.
• A statement of the reliability requirement, and/or a statement of failure modes and effects which are particularly critical
and which must therefore have a very low probability of occurrence. Levels of reliability can be stated as a success ratio,
or as a life. Specified life parameters must clearly state the life characteristic. The life must be related to the duty cycle.
The life parameter may be stated as some time-dependent function, or it may be stated as a time, with a stipulated
operating cycle.
Specifying Reliability when Outsourcing Design* (Cont.)
• Contract reviews should be conducted regularly to ensure compliance with Reliability Requirements. The
contract manager should ensure that all respective organisations honour the contractual reliability performance
levels agreed to. Typical dependability related contract requirements subject to review may include:
• scope and schedule of Reliability Plan activities;
• specific delivery targets and deliverable items;
• specific resources tailored as appropriate;
• specific documentation requirements;
• specific test or demonstration provisions;
• warranty, penalty and specific incentive details;
• the environmental conditions under which the product is to be used.
• Of particular importance are sub-contracted products (or parts for the overall product). It is crucial to ensure that
requirements for those parts adequately respond to the requirements set for the entire deliverable product; and
that the sub-contractor has an appropriate reliability program in place to ensure his product’s reliability.
• Similarly, where the customer provides parts for inclusion into the deliverable product, evidence should be
provided that they were designed and manufactured according to a dependability program. Information should also
be available regarding the V&V of the part’s reliability as well as any problems identified that can impact the
part’s operation in the new product.
Contractual Reliability considerations during Asset Lifecycle (Cont.)
• Installation Phase: The contractual requirements should ensure that the required procedures and
instructions for installation and commissioning are provided that will maintain reliability of the
product during the process. The performance of the product for acceptance and handover to the
client should also be very clearly specified. Increased acceptance testing may be required.
• Operation and Maintenance Phase: Where maintenance and/or operating services are
outsourced, it would be crucial to provide the performance requirements for the product or
system/plant. The criteria and operational/maintenance boundaries from a performance
perspective should be clear (e.g. when performance has deteriorated to a level where maintenance
must be performed). Training, spares/stock holding and other issues will also then be contractually
specified items to ensure that the reliability levels can be maintained.
• Disposal Phase: For this phase, environmental criteria may become important. Where lifecycle
costs were based on recycling options, it would be important that recovery of materials is
adequately covered. There may also be a requirement to test and analyse products removed from
service to improve future design or new product development, which would have to be specified.
Contract Types
• Warranty Improvement Contracts: In many industries manufacturers make efforts to motivate their
suppliers to improve their reliability and consequently reduce warranty costs. In recent years there have been
more and more efforts on the part of manufacturers to force their suppliers to pay their ‘fair share’ of the
warranty costs.
• Total Service Contracts: A total service supply contract is one in which the supplier is required to provide
the system, as well as all of the support.
• The purchaser does not specify a quantity of systems, but a level of availability. The supplier must determine and build
the appropriate number of systems (e.g. trains, for a rail operator), provide all maintenance and other logistic support,
including staffing and running the maintenance depots, spares provisioning, etc.
• A total supply contract places the responsibility and risk aspects of reliability firmly with the supplier, and can therefore be
highly motivating.
• However, there can be long-term disadvantages to the purchaser. The purchaser’s organization can lose the engineering
knowledge that might be important in optimizing trade-offs between engineering and operational aspects, and in planning
future purchases.
• Other risks are that conflicts of interests can arise, leading to sub-optimization and inadequate co-operation. Since the
support contract, once awarded, cannot be practically changed or transferred, the supplier can also be placed in a
monopoly situation.
Sub-Contracts and Lower-Level Suppliers
• Lower-level suppliers can have a major influence on the reliability of systems. Continuous globalisation and
outsourcing also affect the amount of work outsourced to lower-level suppliers. It is not uncommon these days to
have suppliers located all over the globe including regions with little knowledge depth regarding design and
manufacturing processes and with a less robust quality system in place.
• Some guidelines to address these risks are:
• Rely on the existing commercial laws that govern trading relationships to provide assurance. In all cases this provides for redress if
products or services fail to achieve the performance specified or implied.
• Do not rely solely on ISO 9000 or similar schemes to provide assurance – these approaches provide no direct assurance of product
or service reliability or quality.
• Engineers should manage the selection and purchase of engineering products. It is a common practice for companies to assign this
function to a specialized purchasing organisation. Engineers can quickly be taught sufficient purchasing knowledge, or can be
supported by purchasing experts, but purchasing people cannot be taught the engineering knowledge and experience necessary for
effective selection of engineering components and sub-systems.
• Do not select suppliers on the basis of the price of the item alone. Suppliers and their products must be selected on the basis of
total value to the system over its expected life, i.e. total cost of ownership (TCO). This includes performance, reliability, support, etc. as well as price.
• Create long-term partnerships with suppliers, rather than seeking suppliers on a project-by-project basis or changing suppliers for
short-term advantage. It then becomes practicable to share information, rewards and risks, to the long-term benefit of both sides. Suppliers in
such partnerships can teach application details to the system designers, and can respond more effectively to their requirements.
Reliability and Quality Management Programs
• The Management of Quality is concerned with getting things right (it thus concerns us all).
It is critically important to drive out all possibility of errors; otherwise it may become acceptable
to take delivery of a product or service that is ‘not good enough’.
Statement: In today's “real world” products and services do not “always” meet the required
quality standards because …… we can only produce / deliver according to the capability of
the system we designed and maintained. It therefore does not matter what we do - the
“system” rules.
Quality has an impact on any company
• Company’s reputation
• Product liability
• International implications
Quality Characteristics of Goods & Services
• Functionality - how well the product or service does the job for which it was intended.
• Appearance - aesthetic appeal, look, feel, sound and smell of the product or service.
• Recovery - the ease with which problems with the product or service can be rectified or
resolved.
QA Programs
QA and Reliability
QA disciplines are essential elements of any integrated reliability programme.
Standards for Reliability Quality and Safety
• ISO/IEC 60300 (Dependability): ISO/IEC 60300 is the international standard for ‘dependability’, which is defined as
covering reliability, maintainability and safety. It describes management and methods related to these aspects of product design
and development. The methods covered include reliability prediction, design analysis, maintenance and support, life cycle costing,
data collection, reliability demonstration tests, and mathematical/statistical techniques; most of these are described in separate
standards within the ISO/IEC 60000 series. Manufacturing quality aspects are not included in this standard.
• ISO 9000 (Quality Systems): The international standard for quality systems, ISO 9000, has been developed to provide a
framework for assessing the quality management system which an organization operates in relation to the goods or services
provided. ISO 9000 does not specifically address the quality of products and services, nor does it prescribe methods for achieving
quality, such as design analysis, test and quality control. It describes, in very general terms, the system that should be in place to
assure quality.
Managing Quality – The Challenges
Quality Capability Maturity Model & Continuous Improvement
• Represents continual improvement of process & customer satisfaction
• Involves all operations & work units
• Other names applied for continuous improvement:
• Kaizen (Japanese)
• Zero-defects
• Six sigma
Note: Reliability in Manufacturing concepts and methods are covered in more detail in Lecture 6.
Quality Function Deployment
• Quality Function Deployment (QFD) is a technique to identify all of the factors which might affect the ability of
a design or product to satisfy the customer, and the methods and responsibilities necessary to ensure control.
• QFD goes beyond reliability, as it covers aspects such as customer preferences for feel, appearance, and so
on, but it is a useful and systematic way to highlight design and process activities and controls necessary to
ensure reliability.
• QFD begins with a team consisting of the key marketing, design, production, reliability and quality staff working
their way through the project plan or specification, and identifying the features that will need to be controlled,
the control methods applicable, and the responsible people.
• Every aspect of design and production, including analysis, test, production process control, final inspection,
packaging, maintenance, and so on, is systematically evaluated and planned for, always in relation to the
most important product requirements.
• Requirements and features that are not important are shown up as such, and this can be a very important
contribution to cost reduction and reliability improvement.
• Constraints and risks are also identified, as well as resources necessary. At this stage no analysis or detailed
planning is performed, but the methods likely to be applied are identified.
The shape of the QFD chart has led to its being called the ‘House of Quality’.
Quality Function Deployment (QFD) – Electrical Motor Example
• QFD makes use of charts which enable the requirements to be listed, and controls, responsibilities,
constraints, and so on, to be tabulated, as they relate to design, analysis, test, production, etc.
• This shows requirements rated on an importance scale (1–5), and the
design features that can affect them.
• Each feature is in turn rated against its contribution to each requirement,
and a total rating of each feature is derived by multiplying each rating by
the importance value, and adding these values.
• Thus the bearing selection, housing construction, and mounting design
come out as the most critical design features.
• The ‘benchmark’ column is used to rate each requirement, as perceived
by potential customers, against those of competitive products.
• The correlation matrices indicate the extent to which requirements and
features interact: plus sign(s) indicate positive correlation, and minus
negative correlation. For example, magnet material and stator winding
design might interact strongly. The minus signs in the requirements matrix
indicate conflicting requirements.
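The rating arithmetic described above can be sketched in a few lines. This is a minimal illustration only: the chart itself is not reproduced here, so the requirement names, importance values and contribution ratings below are hypothetical and are not the values from the electrical motor example.

```python
# Hypothetical QFD data. The names and numbers are illustrative assumptions,
# not values read from the electrical motor chart.
importance = {"low noise": 3, "long life": 5, "low cost": 4}  # scale 1-5

# contribution[feature][requirement] = rating of the feature against the requirement
contribution = {
    "bearing selection": {"low noise": 5, "long life": 5, "low cost": 2},
    "housing construction": {"low noise": 3, "long life": 4, "low cost": 3},
    "magnet material": {"low noise": 1, "long life": 2, "low cost": 4},
}


def feature_totals(importance, contribution):
    """Total rating per feature: sum of (contribution rating x requirement importance)."""
    return {
        feature: sum(rating * importance[req] for req, rating in ratings.items())
        for feature, ratings in contribution.items()
    }


totals = feature_totals(importance, contribution)
# Features sorted from most to least critical according to the total rating
ranked = sorted(totals, key=totals.get, reverse=True)
print(totals)
print(ranked)
```

With these assumed inputs the bearing selection ranks first, which mirrors the kind of conclusion the chart is meant to support.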
91
Reliability Reporting – Social Engineering Aspects
“The engineer requires the imagination to visualize the needs of society and to appreciate what is possible
as well as the technological and broad social age understanding to bring his vision to reality”.
Sir Eric Ashby
Efficient Reporting of Reliability
• The importance of converting reliability data into business
information.
• Remember not everyone is a reliability expert! You will most
likely have to put a lot of effort in to convert reliability data into
meaningful business information that can be used for decision
making.
• The importance of training EVERYONE in the business to
understand at least the basic reliability concepts and how
reliability is measured for the products of the business.
• There may be brilliant ideas to improve reliability, but if you
cannot get them across, they won’t get anywhere.
93
Social Engineering for the Greater Good
• Understanding some basics about human preferences can assist in more targeted reporting.
Though these profiles are not THE definitive quantifiers, they can help when constructing reliability reports and engaging
executives.
• Kolbe Profiling
• Quick-Start – The “bottom-line executive”
• FactFinder – The “factual executive”
• FollowThru – The “procedural compliance executive”
• Implementer – The “hands-on executing executive”
• MBTI (Myers-Briggs) Types
• Introvert vs Extrovert – Individual vs Group Reporting
• Sensing vs Intuition – Basic factual information vs. interpreted information where meaning has to be added
• Thinking vs Feeling – Preference for logic in decision making vs. special circumstances/people impact
• Judging vs Perceiving – Structured based on information (what is the best option given facts) vs. open to new
information and multiple options (did you maybe consider this?)
94
An Example…..
• A pending equipment failure, but the urgency gets lost in the “technical data
and explanation”!
95
An Example…..
The Quickstart: Executive Summary
Motor Condition | Motor Candidates | Recommended Action
Fault Match | Mill 5C (05NM30), Mill 6E (06NM50) | Schedule motor replacement of both motors as soon as possible, based on their SFFoI amplitudes (very high, >160 microns) as well as failure progression curve behaviour.
Extensive Degradation | Mill 1B (01NM20) | Schedule motor replacement due to the high SFFoI amplitude as well as the speed of failure condition degradation (steep angle of logarithmic curve).
 | Mill 1A (01NM10) | Increased monitoring to trend failure progression.
Notable Degradation | Mill 2B (02NM20), Mill 5E (05NM50) | Increased monitoring to trend failure progression.
Early Degradation | Mill 2A (02NM10), Mill 6A (06NM10) | Monitor equipment as per current program. Increase monitoring if timeline curve angle changes.
The FactFinder: Detailed Analysis
Several broken rotor bars are suspected. Very notable at this stage is the significant vibration energy levels around the
rotor bar pass (RBP) fault frequency, as well as the significantly raised “vibration energy noise floor” at the 1st and 2nd
RBP frequencies. Multiples of 1 x rpm sidebands are visible around the RBP fault frequency, indicating the presence of a
severe defect. It will also be observed that some lower level frequency amplitudes are now also more pronounced and
rapidly increasing in amplitude.
96
Introduction To Reliability and Main Concepts
Reliability Engineering as a Discipline
Reliability Engineering
• Reliability engineering is an engineering field that deals with the study of reliability, i.e. the ability
of a system or component to perform its required functions under stated conditions for a specified
period of time. It is often reported as a probability.
• Reliability may also be defined in several ways:
• The idea that something is fit for a purpose with respect to time;
• The capacity of a device or system to perform as designed;
• The resistance to failure of a device or system;
• The ability of a device or system to perform a required function under stated conditions for a specified period of time;
• The ability of something to "fail well" (i.e. it will fail without catastrophic consequences)
Reliability
• Covers the time to system failure (time based function).
• Deals with reducing the frequency of failures and the impact on system safety.
• Related to the failure rate or the Mean Time to Failure (MTTF).
Availability
• Covers the complete system time (asset lifecycle).
• Deals with the total duration (proportion of time) a system is available for operation or production (therefore a probability).
99
The Aim of Reliability Engineering
100
Considerations regarding scope of Reliability Engineering
Reliability engineering for "complex systems" requires a different, more elaborate systems approach than for
non-complex systems. Reliability engineering may in that case involve:
• System availability and mission readiness analysis and related reliability and maintenance requirement allocation
• Functional system failure analysis and derived requirements specification
• Inherent (system) Design Reliability Analysis and derived requirements specification for both Hardware and Software design
• System Diagnostics design
• Fault tolerant systems (e.g. by redundancy)
• Predictive and preventive maintenance (e.g. reliability-centred maintenance)
• Human factors / Human interaction / Human errors
• Manufacturing- and Assembly-induced failures (effect on the detected "0-hour Quality" and reliability)
• Maintenance-induced failures
• Transport-induced failures
• Storage-induced failures
• Use (load) studies, component stress analysis, and derived requirements specification
• Software (systematic) failures
• Failure / reliability testing (and derived requirements)
• Field failure monitoring and corrective actions
• Spare parts stocking (availability control)
• Technical documentation, caution and warning analysis
• Data and information acquisition/organisation (creation of a general reliability development Hazard Log and FRACAS system)
101
Maintainability and Reliability
Reliability
Formally defined as the probability that a product, piece of equipment, or system performs
its intended function for a stated period of time under specified operating conditions
• Inherent reliability – predicted by product design
• Achieved reliability – observed during use
Maintainability
The probability that a system or product can be retained in, or one that has failed can be
restored to, operating condition in a specified amount of time.
• Functional failure – failure that occurs at the start of product life due to manufacturing or material defects
• Reliability failure – failure after some period of use
• To implement a Preventive Maintenance Strategy you need to know when a system requires service or is likely
to fail. This requires good reporting and record keeping (Maintenance History).
102
Requirements for Effective Reliability Engineering
Effective reliability engineering requires an understanding of the basics of failure mechanisms, for
which experience, broad engineering skills and good knowledge from many different special fields
of engineering are required, for example:
• Tribology
• Stress (mechanics)
• Fracture mechanics / Fatigue
• Thermal engineering
• Fluid mechanics / shock-loading engineering
• Electrical engineering
• Chemical engineering (e.g. corrosion)
• Material science
103
Main Reliability Concepts
The Core Elements of Reliability
105
Reliability Functions
106
Reliability Functions (Cont.)
Overall System Reliability
107
Providing Redundancy
• Back up components can be introduced into the design to increase reliability of the system
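A minimal sketch of how a back-up component raises system reliability, assuming independent component failures and active (parallel) redundancy. The reliability values and function names are hypothetical and chosen only for illustration.

```python
from math import prod


def series_reliability(reliabilities):
    """All components must work: R = product of the R_i (independent failures assumed)."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Active redundancy: the group fails only if every component in it fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


single = 0.90                                        # hypothetical component reliability
duplicated = parallel_reliability([single, single])  # same component with one back-up
system = series_reliability([duplicated, 0.95])      # redundant pair in series with another item

print(f"single: {single:.4f}, duplicated: {duplicated:.4f}, system: {system:.4f}")
```

Under these assumptions, duplicating a 0.90 component lifts that stage to 0.99. The gain would be smaller if the two units shared a common cause of failure, which this sketch does not model.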
108
Typically reported Reliability Measures – Failure Rate Function
FAILURE RATE FUNCTION
The failure rate function indicates how the number of failures per unit time of the product changes with time.
This provides a measure of how the instantaneous probability of product failure changes as usage time is
accumulated.
Courtesy: ReliaSoft Corporation. Blueprint for Comprehensive Reliability Program. 2003.
109
Typically reported Reliability Measures – Probability Plotting
PROBABILITY PLOT
Probability plotting was originally a method of graphically estimating distribution parameter values.
Probability plots have nonlinear scales that will essentially linearize the distribution function, and allow for
assessment of whether the data set is a good fit for that particular distribution based on how close the data
points come to following the straight line.
The y-axis usually shows the unreliability or probability of failure, while the x-axis shows the time or ages of the units.
Specific characteristics of the probability plot will change based on the type of distribution being considered.
Courtesy: ReliaSoft Corporation. Blueprint for Comprehensive Reliability Program. 2003.
110
Typically reported Reliability Measures – Probability Density Function
Courtesy: ReliaSoft Corporation. Blueprint for Comprehensive Reliability Program. 2003.
111
Typically reported Reliability Measures – Reliability Function
Courtesy: ReliaSoft Corporation. Blueprint for Comprehensive Reliability Program. 2003.
112
Typically reported Reliability Measures – Life vs. Stress
LIFE VS. STRESS
The Life vs. Stress plot is a product of accelerated life testing or reliability testing that is performed at
different stress levels.
This indicates how the life performance of the product changes at different stress levels.
The gray-shaded areas are actually pdf plots for the product at different stress levels.
Note that it is difficult to make a complete graphical comparison of the pdf plots due to the logarithmic scale of
the y-axis.
Courtesy: ReliaSoft Corporation. Blueprint for Comprehensive Reliability Program. 2003.
113
Typically reported Reliability Measures – Likelihood Function
Courtesy: ReliaSoft Corporation. Blueprint for Comprehensive Reliability Program. 2003.
114
Typically reported Reliability Measures – Reliability Importance
RELIABILITY IMPORTANCE
Reliability importance is a measure of the relative weight of components in a system, with respect to
the system's reliability value.
RELIABILITY GROWTH
Reliability growth is an important component of a
reliability engineering program.
It essentially models the change in a product's reliability
over time and allows for projections on the change in
reliability in the future based on past performance.
It is useful in tracking performance during development
and aids in the allocation of resources.
There are a number of different reliability growth
models available that are suitable to a variety of data
types. The chart shown is a graphical representation of
the logistic reliability growth model.
Courtesy: ReliaSoft Corporation. Blueprint for Comprehensive Reliability Program. 2003.
116
The Bathtub Curve
There are three basic ways in which the pattern of failures can change with time. The hazard rate may be
decreasing, increasing or constant.
• Decreasing hazard rates are observed in items which become less likely to fail as their survival time
increases. This is often observed in electronic equipment and parts.
• A constant hazard rate is characteristic of failures which are caused by the application of loads in
excess of the design strength, at a constant average rate. For example, overstress failures due
to accidental or transient circuit overload.
• Wear-out failure modes follow an increasing hazard rate.
• The combined effect generates the so-called “bathtub curve”. This shows an initial decreasing
hazard rate or infant mortality period, an intermediate useful life period and a final wear-out period.
[Figure: Total Hazard Rate “Bathtub”. Hazard/failure rate against time, with curves labelled “Failure of weak items” and
“Externally induced failures” and the periods Infant Mortality, Useful Life and Wear-out.]
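The three hazard-rate patterns can be illustrated numerically. The sketch below assumes each failure population follows a Weibull distribution (shape β below 1, equal to 1, and above 1 respectively) and that the populations compete independently, so their hazard rates add. The parameter values are hypothetical and only chosen to make the bathtub shape visible.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t):
    """Sum of three hypothetical failure populations (illustrative parameters only)."""
    weak_items = weibull_hazard(t, beta=0.5, eta=2000.0)   # decreasing hazard rate
    overstress = weibull_hazard(t, beta=1.0, eta=5000.0)   # constant hazard rate
    wear_out = weibull_hazard(t, beta=4.0, eta=10000.0)    # increasing hazard rate
    return weak_items + overstress + wear_out


times = [10.0, 100.0, 3000.0, 12000.0, 15000.0]  # hours
rates = [bathtub_hazard(t) for t in times]
for t, h in zip(times, rates):
    print(f"t = {t:>8.0f} h   total hazard rate = {h:.6f} per hour")
```

With these assumed parameters the total hazard rate falls over the early hours, flattens in mid-life and rises again once the wear-out term dominates.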
• Repairable Items: Reliability is the probability that failure will not occur in the period of interest, when more
than one failure can occur. It can also be expressed as the rate of occurrence of failures (ROCOF), which is
sometimes referred to as the failure rate (usually denoted as λ).
• Repairable system reliability can also be characterized by the mean time between failures (MTBF), but only under the particular condition of a
constant failure rate. It is often assumed that failures do occur at a constant rate, in which case the failure rate λ = 1/MTBF.
• We are also concerned with the availability of repairable items, since repair takes time. Availability is affected by the rate of occurrence of failures
(failure rate) and by maintenance time. Maintenance can be corrective (i.e. repair) or preventive (to reduce the likelihood of failure, e.g. lubrication).
• Availability and the cost of maintaining a system can also be influenced by the way in which the design is partitioned. ‘Modular’ design is used in
many complex products, to ensure that a failure can be corrected by a relatively easy replacement of the defective module, rather than by
replacement of the complete unit.
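The constant failure rate relationship above lends itself to a short worked example. The sketch assumes an exponential time-to-failure model, for which the probability of no failure over a period t is R(t) = exp(-λt). The MTBF value is hypothetical.

```python
import math


def failure_rate_from_mtbf(mtbf_hours):
    """Constant failure rate: lambda = 1 / MTBF."""
    return 1.0 / mtbf_hours


def reliability(t_hours, failure_rate):
    """R(t) = exp(-lambda * t): probability of no failure in t hours."""
    return math.exp(-failure_rate * t_hours)


mtbf = 2000.0                        # hypothetical MTBF in hours
lam = failure_rate_from_mtbf(mtbf)   # failures per hour
r_500 = reliability(500.0, lam)      # no failure in a 500 hour period
r_mtbf = reliability(mtbf, lam)      # no failure in a period equal to the MTBF

print(f"lambda = {lam:.6f} per hour")
print(f"R(500 h) = {r_500:.4f}, R(MTBF) = {r_mtbf:.4f}")
```

Note that the probability of surviving a period equal to the MTBF is only about 37 % under this model, which is a common point of confusion when MTBF figures are reported to non-specialists.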
118
Probability of Failure/Events
• If an event can occur in N equally likely ways, and if the event with attribute A can happen in n of these ways,
then the probability of A occurring is P(A) = n/N.
• If, in an experiment, an event with attribute A occurs n times out of N experiments, then as N becomes large,
the probability of event A approaches n/N, that is, P(A) = lim (N → ∞) n/N.
• The first definition covers cases of equally likely independent events such as rolling a die.
• The second definition covers typical cases in quality control and reliability. If testing 100 items and finding that
30 of them are defective, a hypothesis would be that the probability of finding a defective item in future
tests is 0.30, or 30 %. The probability of 0.30 of finding a defective item in our next test may be considered as
a degree of belief.
119
Availability Measures
• Inherent Availability: The steady state availability which considers only the corrective maintenance.
• Achieved Availability: Achieved availability is very similar to inherent availability with the exception that PM
downtimes are also included. Specifically, it is the steady state availability in an ideal support environment (i.e.
readily available tools, spares, personnel, etc.) The achieved availability is sometimes referred to as the
availability seen by the maintenance department (does not include logistic delays, supply delays or
administrative delays).
• Operational Availability: Operational availability is a measure of the ‘real’ average availability over a period
of time in an actual operational environment. It includes all experienced sources of downtime, such as
administrative downtime, logistic downtime, etc.
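The three measures can be compared with a small calculation. The formulas below are the commonly used textbook forms and are an assumption here, since the slide does not state them: inherent availability from MTBF and mean corrective repair time (MTTR), achieved availability from mean time between maintenance actions (MTBM) and mean active maintenance downtime, and operational availability from observed uptime and total downtime. All numbers are hypothetical.

```python
def inherent_availability(mtbf, mttr):
    """Corrective maintenance only: MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def achieved_availability(mtbm, mean_maintenance_downtime):
    """Corrective and preventive maintenance, ideal support: MTBM / (MTBM + mean downtime)."""
    return mtbm / (mtbm + mean_maintenance_downtime)


def operational_availability(uptime, total_downtime):
    """All experienced downtime, including logistic and administrative delays."""
    return uptime / (uptime + total_downtime)


# Hypothetical figures, all in hours
a_inherent = inherent_availability(mtbf=500.0, mttr=5.0)
a_achieved = achieved_availability(mtbm=250.0, mean_maintenance_downtime=4.0)
a_operational = operational_availability(uptime=8000.0, total_downtime=400.0)

print(f"inherent:    {a_inherent:.4f}")
print(f"achieved:    {a_achieved:.4f}")
print(f"operational: {a_operational:.4f}")
```

With these assumed inputs the values fall from inherent to achieved to operational, reflecting the additional sources of downtime each measure takes into account.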
• Operational Availability
• Inherent Availability
1. Well-trained personnel
2. Adequate resources
3. Ability to establish repair plan and priorities
4. Ability and authority to do material planning
5. Ability to identify the cause of breakdowns
6. Ability to design ways to extend MTBF
122
Importance for Good Life Data
Gathering and Evaluating Life Data
• The accuracy and credibility of any parameter estimations are highly dependent on the quality, accuracy and
completeness of the supplied data. Good Life Data for the product, along with the appropriate model choice,
usually results in good parameter estimations.
• Complete data means that the value of each sample unit is observed or known.
• The first and foremost assumption that must be satisfied is that the collected data, or the sample, are truly
representative of the population of interest.
• Select a lifetime distribution against which to test the data.
• When life data are analysed, all of the units in
the sample set may not have failed (i.e. the
event of interest was not observed) or the exact
times-to-failure of all the units are not known.
(Censored data)
• Generate plots and results that estimate the life
characteristics of the product, such as the
reliability, failure rate, mean life, or any other
appropriate metrics.
124
Censored Reliability Data
When life data are analysed, all of the units in the sample set may not have failed (i.e. the event of interest was not
observed) or the exact times-to-failure of all the units are not known. This type of data is commonly called censored
data. Data can fall in the following censoring schemes:
• Complete Data (Uncensored): Complete data means that the value of
each sample unit is observed or known. Complete data is the ideal state to
have and is much easier to work with than censored data, and basic statistical
analysis techniques assume that the analyst has complete data.
• Right Censored (Suspended): These are data for which we know only
the minimum value. In reliability testing, for example, not all of the tested units
will necessarily fail within the testing period. For those units, the failure
time exceeds the testing time.
• Interval Censored: These are data for which we know only that they lie
between a certain minimum and maximum. Interval censoring arises
commonly when measurements are assigned into categories, ranges or
intervals. In reliability testing, for example, the strategy may be to inspect the
units only every T hours, so that it can only be recorded that a unit failed between nT
and (n+1)T hours. This is sometimes called inspection data.
• Left Censored: These are data for which only the maximum value is
known. In scientific experiments, it may not be possible to measure some
quantity because it is below the threshold of detection (e.g. chemical
concentration).
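The effect of right-censored (suspended) units on an estimate can be shown with a short sketch. It assumes an exponential (constant failure rate) model, for which the maximum likelihood estimate of the failure rate is the number of failures divided by the total accumulated time of all units, failed or suspended. The data set and function name are hypothetical.

```python
# Each record is (time in hours, failed?). failed=False marks a right-censored
# (suspended) unit that was still running when observation stopped.
# Hypothetical data for illustration only.
records = [(120.0, True), (340.0, True), (500.0, False), (410.0, True), (500.0, False)]


def exponential_failure_rate(records):
    """MLE of a constant failure rate: failures / total time on test (all units)."""
    failures = sum(1 for _, failed in records if failed)
    total_time = sum(t for t, _ in records)
    return failures / total_time


lam_hat = exponential_failure_rate(records)
mttf_hat = 1.0 / lam_hat

# Ignoring the suspended units and averaging only the observed failure times
failure_times = [t for t, failed in records if failed]
naive_mttf = sum(failure_times) / len(failure_times)

print(f"MTTF using all units:       {mttf_hat:.1f} h")
print(f"MTTF using failures only:   {naive_mttf:.1f} h")
```

Discarding the suspended units understates the mean life in this example, which is why the censoring scheme needs to be recorded along with the times.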
125
Engineering Approach for Best Life Data Distribution
While analysing life data and finding the best statistical model for it, it is important to remember that we
are dealing with failures of a real engineering device and the knowledge and understanding of that
device will also be a factor in determining the likely best distribution. The engineering considerations
should include the following:
• Maturity of the system and its place on the bathtub curve: Maturity of the system will affect the trend for its failure
rate and in the case of the Weibull distribution can be easily characterized by the β-value as presented by the bathtub model
• Type of failures (failure modes and physics of failure): It is very important that the physical nature of failures be
investigated as part of any evaluation of failure data. If the intent is to understand the failure modes in order to make
improvements, all failure modes should be investigated and the distributions analysed separately, since two or more failure
modes may have markedly different parameters, yet generate overall failure data which are fitted by a distribution which is
different to that of any of the underlying ones. Also certain failure modes have known historical β-slopes, which can be
considered as guidelines.
• Sample size and size of the population it represents: Can play a critical part in defining the mathematical model.
Goodness-of-fit tests will nearly always indicate good correlation with any straight line drawn through such points. It is doubly
important for large data sets to carefully consider the engineering aspects of the failures. In addition to that, large data sets,
such as warranty databases, may contain secondary failures, which would require a totally different approach to probability
plotting. Small numbers of actual failures also present challenges with data analysis and plotting.
126
Probability Plotting using Computerised Data Analysis
The two most popular methods are:
• Rank Regression (RR): It requires that a line be mathematically fitted to a set
of data points such that the sum of the squares of the vertical or horizontal
deviations from the points to the line is minimized. One of the advantages of the
rank regression method is that it can provide a good measure for the fit of the line
to the data points. This measure is known as the correlation coefficient.
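A minimal sketch of rank regression for complete (failure-only) data, assuming a two-parameter Weibull distribution. Plotting positions use Benard's approximation for median ranks, and the line is fitted by minimising horizontal deviations (regression on X). The failure times and the function name are hypothetical, and commercial tools may differ in detail.

```python
import math


def weibull_rank_regression_x(failure_times):
    """Fit a 2-parameter Weibull to complete data by rank regression on X.

    Returns (beta, eta, rho), where rho is the correlation coefficient of the
    linearised points.
    """
    times = sorted(failure_times)
    n = len(times)
    xs = [math.log(t) for t in times]
    ys = []
    for i in range(1, n + 1):
        median_rank = (i - 0.3) / (n + 0.4)            # Benard's approximation
        ys.append(math.log(-math.log(1.0 - median_rank)))

    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Regress x on y: x = a + b*y, where b = 1/beta and a = ln(eta)
    b = sxy / syy
    a = mean_x - b * mean_y
    beta = 1.0 / b
    eta = math.exp(a)
    rho = sxy / math.sqrt(sxx * syy)
    return beta, eta, rho


times_to_failure = [16.0, 34.0, 53.0, 75.0, 93.0, 120.0]  # hypothetical, in hours
beta, eta, rho = weibull_rank_regression_x(times_to_failure)
print(f"beta = {beta:.3f}, eta = {eta:.1f} h, rho = {rho:.4f}")
```

A correlation coefficient close to 1 indicates that the points lie close to the fitted line, although with only a handful of failures this alone is weak evidence for the chosen distribution.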
127
Rank Regression (RR) vs. Maximum Likelihood Estimation (MLE)
• Rank regression methods often produce different distribution parameters than MLE, so it is logical
to ask which method should be applied to which type of data. Based on various studies, regression
generally works best for data sets with smaller (<30) sample sizes that contain only complete data
(as sample sizes get larger, 30 or more, these differences become less important).
• Failure-only data is best analysed with rank regression on X, as it is preferable to regress in the direction of
uncertainty.
• When heavy or uneven censoring is present and/or when a high proportion of interval data is present, the MLE
method usually provides better results.
• MLE can also provide estimates with one or no observed failures, which rank regression cannot do.
In the case where it is not clear which method would provide more accurate results, it is advisable to run both
methods and compare the results. The following scenarios are possible:
• The RR and MLE results do not differ much.
• The results differ and one method might provide unreasonable values of β (too high or too low).
• The results differ and one method provides values of β which do not fit the expected model: increasing failure rate (IFR) vs. decreasing failure rate (DFR) (See next slide)
133
Requirements Interpretation
• The interpretation of requirements should include analysis of those conditions and constraints that are
typical for the intended use of the product, and that may affect its reliability, including:
• operation and maintenance conditions, including, for example, application types and durations;
• identification of load and duty cycles imposed on the product during the intended use;
• determination of environmental and operational conditions experienced by each assembly and sub-assembly of
the product during each phase of use and during maintenance and support activities (including storage,
transportation, etc.);
• determination of the effects of manufacturing, testing, storage, packaging, transportation, handling and
maintenance.
• Constraints caused by issues like the maintenance policy, personnel skill level, etc. should be identified
and changes recommended, if appropriate.
• Any agreed interpretation of the provisions of the requirements specification should be formally
documented and attached to the reliability specification. This is particularly important where
requirements can be conflicting in nature (between different stakeholder expectations) or where there is
potential for misinterpretation. In such cases the requirements MUST be agreed on and signed off
before any work commences.
134
Functional Requirements
• Functional requirements define specific behaviour
or functions.
• A functional requirement defines a function of a
system or its component.
• A function is described as a set of inputs, the
behaviour, and outputs.
• Used extensively in system and software
development (documented as “Functional
Requirements Specifications”)
135
Non-Functional Requirements
• Non-functional requirements
cover all the remaining
requirements which are not
covered by the functional
requirements.
136
Reliability Requirements – Compliance Demonstration
The following are standard methods of test and analysis which are used to demonstrate compliance
with Repairable Equipment reliability requirements. The standards should be seen as
complementary to the statistical engineering methods and useful (or mandatory) for demonstrating
and monitoring reliability of products which are into or past the development phase.
• Probability Ratio Sequential Test (PRST) (US MIL-HDBK-781): PRST is based on the assumption of a
constant failure rate. The decision risks are based upon the risks that the estimated MTBF will not be
more than the upper test MTBF (for rejection), or not less than the lower test MTBF (for acceptance).
• MIL-HDBK-781 Test plans: The risk of a good equipment (or batch) being rejected is called the producer’s
risk and is denoted by α. The risk of a bad equipment (or batch) being accepted is called the consumer’s
risk, denoted by β. The test plans are based upon the assumption of a constant failure rate, so MTBF is used as
the reliability index.
• Operating Characteristic Curves and Expected Test Time Curves: An operating characteristic (OC)
curve can be derived for any sequential test plan to show the probability of acceptance (or rejection) for
different values of true MTBF.
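The accept / continue / reject logic of a sequential test can be sketched as follows. This is a generic Wald sequential probability ratio test under a constant failure rate, with Wald's approximate decision boundaries. It is not a reproduction of the MIL-HDBK-781 test plans, whose published boundaries and truncation rules should be taken from the handbook itself. The function name and all numbers are hypothetical.

```python
import math


def prst_decision(failures, total_test_time, upper_mtbf, lower_mtbf, alpha, beta):
    """Wald SPRT for MTBF under a constant failure rate.

    upper_mtbf: upper test MTBF (acceptable), lower_mtbf: lower test MTBF (unacceptable).
    alpha: producer's risk, beta: consumer's risk.
    Returns "accept", "reject" or "continue".
    """
    # Log likelihood ratio of "MTBF = lower_mtbf" against "MTBF = upper_mtbf"
    log_ratio = (failures * math.log(upper_mtbf / lower_mtbf)
                 - (1.0 / lower_mtbf - 1.0 / upper_mtbf) * total_test_time)
    reject_boundary = math.log((1.0 - beta) / alpha)   # Wald approximation
    accept_boundary = math.log(beta / (1.0 - alpha))   # Wald approximation
    if log_ratio >= reject_boundary:
        return "reject"
    if log_ratio <= accept_boundary:
        return "accept"
    return "continue"


# Hypothetical plan: upper test MTBF 200 h, lower test MTBF 100 h, 10 % risks
plan = dict(upper_mtbf=200.0, lower_mtbf=100.0, alpha=0.10, beta=0.10)
no_failures = prst_decision(failures=0, total_test_time=500.0, **plan)
many_failures = prst_decision(failures=5, total_test_time=200.0, **plan)
undecided = prst_decision(failures=2, total_test_time=300.0, **plan)
print(no_failures, many_failures, undecided)
```

Evaluating the decision after each failure or block of test time traces out the familiar staircase plot between the accept and reject lines.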
Best practice dictates that a configuration management system should be established that provides a systematic process for controlling, monitoring and
documenting the product design and subsequent changes to the product and its maintenance support. This facilitates compliance demonstration as well
as requirements traceability management.
137
Reliability within the Requirements Management Process
Reliability Requirements Management
By defining requirements, it becomes possible to define documentation that verifies that the input requirements
will support the corresponding qualification outputs.
In most cases, reliability techniques will be used to perform the qualification and to prove “reliability growth” of
the product or system being developed. Documents that define requirements would be the:
• Design Specification (DS). Defines the configuration of the
process with detailed drawings/databases.
• User Requirements Specification (URS). What the process
must do. (The “What”)
• Sequence of Events (SOE) Chart. When each assembly step
occurs (or process/control functionality). (The “When”)
• Trace Matrix. Where each design requirement is tested and
verified; this tool directly relates each requirement to the test that will
confirm the requirement and functionality was met. (The “Where”)
• Functional Requirements Specification (FRS). How the
process is to operate and how the functionality is to be tested. (The
“How”)
• Process Failure Modes and Effects Analysis
(pFMEA). Proving that the right product was built in the right way -
input document is the product design FMEA (dFMEA). (The “Why”)
139
Ongoing Reliability Requirements Management
140
Reliability Modelling and Prediction
Purpose of Reliability Prediction
• Irrespective of the method selected, reliability predictions can only be estimates since even the
most recent data are inevitably out of date.
142
Benefits of Reliability Prediction
• Depending upon the product and its market, advance knowledge of reliability would allow accurate forecasts
to be made of support costs, spares requirements, warranty costs, marketability, and so on.
• Reliability prediction can rarely be made with high accuracy or confidence. Nevertheless, even a tentative
estimate can provide a basis for forecasting of dependent factors such as life cycle costs.
• Reliability predictions can be used as an effective tool in a comparative analysis. When choosing between
different design alternatives, the inherent reliability of each design obtained through reliability prediction
analysis could be used as a critical decision making factor.
• Reliability prediction can also be valuable as part of the study and design processes, for comparing options
and for highlighting critical reliability features of designs.
• If a new engineered system is being planned, which will supersede an existing system, and the reliability of
the existing system is known, then its reliability could reasonably be used as a starting point for predicting the
likely reliability of the new system.
• The common approach to predicting reliability is to estimate the contributions of each part, and work
upwards to the overall product or system level.
143
Methods of Reliability Prediction
There are five main methods that can be used to predict the reliability of a product:
• System models: A range of models including computer models are commercially available for estimating
the reliability of large, complex systems. They include parametric cost and rapid prototyping models.
• Similar equipment method: In the similar equipment method, the proposed design is compared with similar
designs for which the achieved reliability is known. The estimated reliability for the new design is then
changed to take into account any differences between the designs and any areas of risk.
• Extrapolation of reliability data from tests and trials: When a product is undergoing development, it may
be possible to use data from testing or other activities to develop reliability growth models to predict the
reliability as described.
• Reliability modelling: RBDs, FTA and Markov techniques can be used to estimate reliability. These
methods address redundancy and the effects of multiple failures.
• Generic parts methods (parts count and parts stress): Parts count analysis is applicable in the early
design phase of a product when a parts list becomes available. This technique involves multiplying the base
failure rate for each component by the appropriate environmental, quality and complexity factors, and then
summing to give a total failure rate for the product.
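The parts count arithmetic described in the last bullet can be sketched as below. The part names, base failure rates and factor values are hypothetical placeholders, not figures from MIL-HDBK-217 or any other standard. In practice they would be looked up in the chosen handbook.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    quantity: int
    base_failure_rate: float        # failures per million hours (hypothetical values)
    pi_environment: float = 1.0     # environmental factor
    pi_quality: float = 1.0         # quality factor
    pi_complexity: float = 1.0      # complexity factor


def parts_count_failure_rate(parts):
    """Sum of quantity x base failure rate x adjustment factors over all part types."""
    return sum(
        p.quantity * p.base_failure_rate
        * p.pi_environment * p.pi_quality * p.pi_complexity
        for p in parts
    )


parts = [
    Part("resistor", 40, 0.002, pi_environment=2.0, pi_quality=3.0),
    Part("ceramic capacitor", 25, 0.004, pi_environment=2.0, pi_quality=3.0),
    Part("microcontroller", 1, 0.05, pi_environment=2.0, pi_complexity=1.5),
]

total_fpmh = parts_count_failure_rate(parts)   # failures per million hours
mtbf_hours = 1e6 / total_fpmh                  # valid under a constant failure rate

print(f"total failure rate = {total_fpmh:.2f} failures per million hours")
print(f"predicted MTBF = {mtbf_hours:,.0f} hours")
```

Summing the rates in this way treats the parts as a series system, so any redundancy in the design would need to be handled by the reliability modelling methods mentioned above.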
144
Physics of Failure (PoF)
The objective of physics-of-failure (PoF) analysis is to predict when a specific end-of-life failure
mechanism will occur for an individual component or interconnect in a specific application.
• A physics-of-failure prediction looks at each individual
failure mechanism such as metal fatigue, electro-
migration, solder joint cracking, wire-bond adhesion,
etc., to estimate the probability of component failure
within the expected life of the product
• In contrast to empirical reliability prediction methods
based on historical failure data, this analysis requires
detailed knowledge of all material characteristics,
geometries and environmental conditions.
• The calculations involve understanding of the stresses
applied to the part, types of failure mechanisms they
would be causing and the appropriate model to calculate
the expected life to a failure caused by the particular
failure mechanism in question.
145
Physics of Failure (PoF) – Cont.
• PoF analysis provides much needed insights into the failure risks and mechanics that lead to them (especially
when actual test data is not available yet).
• PoF utilizes knowledge of life-cycle load profile, package architecture, material properties, relevant geometry,
processes, technologies, etc. to identify potential Key Process Input Variables (KPIVs) for failure
mechanisms.
• It can also be used to identify design margins and failure prevention actions as well as to focus reliability
testing.
Advantages
• The advantage of the physics-of-failure approach is that fairly accurate predictions using known failure mechanisms
can be performed to determine the wear-out point.
• PoF methods address the potential failure mechanism and the stresses on the product; therefore it is more specific to
the product design, its applications and is expected to be more accurate than other types of reliability prediction.
Disadvantages
• The disadvantage is that this method requires knowledge of the component manufacturer’s materials, processes,
design, and other data, not all of which may be available at the early design stage.
• In addition, the actual calculations and analysis are complicated and sometimes costly activities requiring a lot of
information and a high level of analytical expertise.
• Additional criticism includes the difficulty to address the entire system, since most of the analysis is done on a
component or sub-assembly level.
Standards Based Reliability Prediction
• Standards-based reliability prediction is a methodology based on failure rate estimates published in
globally recognized standards, both military and commercial.
• In some cases manufacturers are obliged by their customers or by contractual clauses to perform
reliability prediction based on published standards.
• A typical standards-based reliability prediction treats the system as a series arrangement of its components,
meaning that a single component failure causes failure of the whole system.
• The other key assumption is a constant failure rate, which is modelled by the exponential distribution.
• This generally represents the useful life of a component, where failures are considered random events
(i.e. no wear-out or early-failure problems).
• Reliability data can be useful in specific prediction applications, such as aircraft, petrochemical
plant, computers or automobiles, when the data are derived from the area of application.
However, such data should not be transferred from one application area to another without
careful assessment. Even within the application area they should be used with care, since even
then conditions can vary widely.
Standards-Based Predictions
The commonly used standards include MIL-HDBK-217, Bellcore/Telcordia (SR-332), NSWC-06/LE10, China 299B (GJB/Z 299B), IEC
62380 (RDF 2000/UTE C 80-810) and IEEE Standard 1413. The typical analysis methods used by these standards include parts count
and parts stress analysis methods. It is most applicable during very early design or proposal phases of a project.
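The two key assumptions (series system, constant failure rate) make a parts-count style calculation very simple: the system failure rate is the sum of quantity times failure rate over all part types, and reliability follows the exponential model R(t) = exp(−λt). The sketch below is a minimal illustration only; the bill of materials and the failure rates are hypothetical placeholders, not values from any of the standards listed, and quality or environment factors are left out.

```python
import math

# Hypothetical bill of materials: (part type, quantity, failures per 1e6 hours).
# The rates are illustrative placeholders, not handbook values.
PARTS = [
    ("resistor", 40, 0.002),
    ("ceramic capacitor", 25, 0.004),
    ("microcontroller", 1, 0.050),
    ("connector", 4, 0.010),
]


def system_failure_rate(parts):
    """Series assumption: sum of quantity * failure rate (per 1e6 hours)."""
    return sum(quantity * rate for _, quantity, rate in parts)


def reliability(failure_rate_per_1e6_h, hours):
    """Exponential model R(t) = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-failure_rate_per_1e6_h * 1e-6 * hours)


lam = system_failure_rate(PARTS)      # failures per 1e6 hours
mtbf_hours = 1e6 / lam                # MTBF = 1 / lambda
print(f"System failure rate: {lam:.3f} per 1e6 h")
print(f"MTBF: {mtbf_hours:,.0f} h")
print(f"R(1 year of continuous use): {reliability(lam, 8760):.4f}")
```

Because every part is in series, adding parts can only increase the predicted failure rate; redundancy and wear-out are outside what this simple model can represent.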
Other Reliability Prediction Methods
Advantages
• The definite advantage of this method is that results of the analysis are specific to the company’s products,
manufacturing processes and applications. Those predictions typically produce more accurate results than those
coming from the generic databases.
Disadvantages
• The drawback is the absence of common comparison criteria for a manufacturer to conduct supplier
benchmarking.
Other Reliability Prediction Methods
“Top Down” Approach:
• It is often necessary to predict the likely reliability of a new system. It is possible to make reasonably credible
reliability predictions, without using the kinds of models described above, for systems under certain
circumstances. These are:
1. The system is similar to systems developed, built and used previously, so that we can apply our experience of what happened before.
2. The new system does not involve significant technological risk (this follows from 1).
3. The system will be manufactured in large quantities, or is complex (i.e. contains many parts, or the parts are complex) or will be used
over a long time, or a combination of these conditions applies, that is there is an asymptotic property.
4. There is a strong commitment to the achievement of the reliability predicted.
• These conditions typically hold where no great changes from past practice are involved, technological risks are
low, the systems will be built in large quantities and are quite complex, and the system must compete with
established, reliable products.
• Such reliability predictions (in the sense of a reasonable expectation) could be made without recourse to
statistical or empirical mathematical models at the level of individual parts. Rather, they could be based upon
knowledge of past performance at the system level, the possible effects of changes, and on management
targets and priorities. This is a ‘top down’ prediction.
Fundamental Limitations of Reliability Prediction
• For a mathematical model to be accepted as a basis for scientific prediction, it must be based upon
a theory which explains the relationship. It is also necessary for the model to be based upon
unambiguous definitions of the parameters used.
• Also, failure, or the absence of failure, is heavily dependent upon human actions and perceptions.
This is never true of laws of nature. This represents a fundamental limitation of the concept of
reliability prediction using mathematical models.
• Reliability can vary by orders of magnitude with small changes in load and strength distributions,
and there is a large amount of uncertainty inherent in estimating reliability from the load-strength model.
• Another serious limitation arises from the fact that reliability models are usually based upon
statistical analysis of past data.
• A statistically-derived relationship can never by itself be proof of a causal connection or even
establish a theory. It must be supported by theory based upon an understanding of the cause-and-
effect relationship.
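The sensitivity to small changes in the load and strength distributions can be made concrete with a load-strength interference calculation. The sketch below assumes, purely for illustration, that load and strength are independent and normally distributed; the means and standard deviations are hypothetical numbers chosen to show the effect, not data from this lecture.

```python
import math
from statistics import NormalDist


def failure_probability(mean_strength, sd_strength, mean_load, sd_load):
    """P(load > strength) for independent, normally distributed load and strength.

    The safety margin is the difference in means divided by the combined
    standard deviation; the failure probability is the normal tail below it.
    """
    safety_margin = (mean_strength - mean_load) / math.hypot(sd_strength, sd_load)
    return NormalDist().cdf(-safety_margin)


# Hypothetical values in arbitrary stress units.
baseline = failure_probability(mean_strength=100, sd_strength=5, mean_load=70, sd_load=5)
# Same design, but the mean load is about 7% higher than assumed.
shifted = failure_probability(mean_strength=100, sd_strength=5, mean_load=75, sd_load=5)

print(f"Failure probability, baseline load: {baseline:.2e}")
print(f"Failure probability, shifted load:  {shifted:.2e}")
print(f"Ratio: {shifted / baseline:.1f}x")
```

A modest error in the assumed mean load changes the predicted failure probability by more than an order of magnitude here, and in practice the tails of the real distributions are rarely known well enough to justify the normal assumption.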
Reliability Predictions
Predictions of reliability can seldom be considered better than rough estimates, and achieved reliability can be considerably
different from the predicted value.
RAM Analysis Case Study
Reliability Growth Monitoring
Reliability Growth
• Reliability growth is maximized by deliberate and aggressive stress-testing, analysis and corrective
action.
• The objective should be to stimulate failures during development testing and not to devise tests and
failure-reporting methods whose sole objective is to maximize the chances of demonstrating that a
specification has been satisfied.
• Reliability growth programmes are known as test, analyse and fix (TAAF) initiatives. For effective
TAAF programmes:
• All failures should be analysed fully, and action taken in design or production to ensure that they do not recur.
• No failure should be dismissed as being ‘random’ or ‘non-relevant’ during this stage, unless it can be demonstrated
conclusively that such a failure cannot occur on production units in service.
• Corrective action must be taken as soon as possible on all units in the development programme. This might mean that
designs have to be altered more often, and can cause programme delays. However, if faults are not corrected, reliability
growth will be delayed, potential failure modes at the ‘next weakest link’ may not be highlighted, and the effectiveness of
the corrective action will not be adequately tested.
• Action on failures should be based on a disciplined FRACAS. Whenever failures occur, the investigation should refer back
to the reliability predictions, stress analyses and FMECAs to determine if the analyses were correct.
Business Improvement
Competitiveness
It is only by evaluating its business processes that an organisation stays competitive. Harrington (1991)
provides very good reasons why organisations should consider improving business processes in order to stay
competitive. These are:
Reliability Growth Monitoring Methods
The Duane Method:
Figure: Duane reliability growth.
• It is common for new products to be less reliable during early development than
later in the programme, when improvements have been incorporated as a result of
failures observed and corrected. Similarly, products in service often display
reliability growth. The Duane method is applicable to a population with a number of
failure modes which are progressively corrected, and in which a number of items
contribute different running times to the total time.
• The empirical Duane method provides a reasonable approach to monitoring and
planning MTBF growth for complex systems. The Duane method can also be used
in principle to assess the amount of test time required to attain a target MTBF.
• The growth rate α (the slope of the Duane plot of cumulative MTBF against cumulative time on log-log axes) must be
related to the expected effectiveness of the reliability improvement programme. The following may be used as a guide:
• α = 0.4–0.6. Programme dedicated to the elimination of failure modes as a top priority. Immediate analysis and effective corrective action for all failures.
• α = 0.3–0.4. Priority attention to reliability improvement. Well-managed analysis and corrective action for important failure modes.
• α = 0.2. Routine attention to reliability improvement. Corrective action taken for important failure modes.
• α = 0–0.2. No priority given to reliability improvement. Failure data not analysed. Corrective action taken for important failure modes, but with low priority.
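To show how the choice of α drives test planning, the sketch below uses the standard Duane relationships, which the slide text does not reproduce: cumulative MTBF θc = θ0 · (T / T0)^α and instantaneous MTBF θi = θc / (1 − α), where θ0 is the cumulative MTBF observed at the start of monitoring, T0. The starting point (50 h MTBF after 200 h of testing) and the 500 h target are hypothetical values for illustration.

```python
def cumulative_mtbf(total_time, theta0, t0, alpha):
    """Duane cumulative MTBF: theta0 * (T / T0) ** alpha."""
    return theta0 * (total_time / t0) ** alpha


def instantaneous_mtbf(total_time, theta0, t0, alpha):
    """Instantaneous (current) MTBF: cumulative MTBF / (1 - alpha)."""
    return cumulative_mtbf(total_time, theta0, t0, alpha) / (1.0 - alpha)


def required_test_time(target_mtbf, theta0, t0, alpha):
    """Total test time at which the instantaneous MTBF reaches the target.

    Obtained by solving target = theta0 * (T / T0) ** alpha / (1 - alpha) for T.
    """
    return t0 * (target_mtbf * (1.0 - alpha) / theta0) ** (1.0 / alpha)


# Hypothetical programme: 50 h cumulative MTBF after 200 h, target 500 h.
THETA0, T0, TARGET = 50.0, 200.0, 500.0
for alpha in (0.5, 0.4, 0.3):
    hours = required_test_time(TARGET, THETA0, T0, alpha)
    print(f"alpha = {alpha:.1f}: about {hours:,.0f} h of total test time")
```

The required test time rises steeply as α falls, which is why the guide values above tie α to how aggressively failures are analysed and corrected.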
Reliability Growth of Products in Service
• In-service reliability growth is more difficult to achieve than growth during the development phase, for the
following reasons:
• Failure data are often more difficult to obtain.
• It is much more difficult and much more expensive to modify delivered equipment or to make changes once
production has started.
• A product’s reputation is made by its early performance. Reliance on reliability growth in use can be very
expensive in terms of warranty costs, reputation and markets.
• Despite the challenges, it is still important that reliability data be collected and analysed, and
improvements designed and implemented.
• Another source of reliability data is that from production test and inspection. Whilst data from
production test and inspection are collected primarily to monitor production quality costs and
vendor performance, they can be a useful supplement to in-use reliability data.
Q&A