UNDERSTANDING RELIABILITY THROUGH ANALYSIS OF EVENT HISTORIES – A CASE STUDY FROM THE AEROSPACE INDUSTRY

Lesley Walls, Jeff Jones, Ian James, Jane Marshall
Abstract. Building a history of the events incurred during the life of equipment permits analysis of the effects of factors on performance. In turn, the findings of such analysis can aid understanding of reliability and can inform decision-making. We focus on two types of event history analysis: first, retrospective analysis of equipment performance, to support actions that mitigate any identified shortfalls in reliability or to provide a basis for further exploration of factors that have resulted in enhanced performance; second, prospective analysis, so that new designs can benefit from the lessons learnt from experience with existing equipment. The process and findings of in-service event history data analysis are explored through a case study of electronic aerospace equipment. We discuss how data requirements are set to answer the questions posed about reliability, the acquisition of relevant data, the database management process and the selection of analysis tools. The usefulness of new data analysis tools, including neural nets and bootstrapping techniques, is evaluated. The challenges of accessing, managing and analysing messy in-service data are explored.

Introduction. Changes in contractual agreements within the aerospace industry are driving the specification of reliability requirements. For example, the move to performance indicators such as ‘power by the hour’ in the commercial sector and ‘failure free operating periods’ in the military sector demands that suppliers improve the reliability and availability of their products. Further, the introduction of the reliability case (DEF STAN 00-42 Part 3) is changing the process by which the assessment of reliability performance is conducted and reported. Gone is the ‘tick the box’ syndrome, where reliability techniques were applied merely to satisfy customer requirements.
Instead, the culture is changing to one where suppliers need to identify and use the tools that are likely to deliver the greatest added value in terms of reliability performance. Thus it is no longer sufficient to state that certain techniques were applied: it is now necessary to give details, conclusions and recommendations about the impact of actions on reliability that can be justified through analysis and sound argument. This paper describes work carried out in a collaborative project funded by the UK DTI (CARAD programme) called Reliability Enhancement Methodology and Modelling (REMM). The partners include TRW Aeronautical Systems, BAE SYSTEMS, Smiths Industries, Warwick Manufacturing Group, University of Strathclyde, Loughborough University and the RAF. REMM aims to develop a methodology and modelling framework that will facilitate building a reliability case by designing-in reliability. Further details about the REMM project and the modelling framework can be found in Marshall et al (2000) and Walls and Quigley (2000). In-service data is important both in supporting the REMM model and in providing lessons learnt from previous experience that will contribute to the reliability case. In this paper we focus on the aspects of REMM that relate to the identification, collection, storage, processing, analysis and interpretation of such in-service data. Our aims are: to use in-service data from related products as the basis for modelling the reliability of new variant designs; to use in-service data to monitor the reliability of products through their life cycle, providing a basis for taking action with respect to a product if necessary; and to provide a feedback mechanism for validating the estimates and information produced by the REMM model. The paper briefly describes the database set up to capture data, including in-service data, to support REMM activities, and reviews the tools proposed for data analysis.
REMM database and in-service data used. REMM provides a database structure that any company can use to store its own data; the REMM database does not itself provide the data, and there is no reason why companies cannot pool their data. To date, the database specification has been agreed by the companies within the REMM consortium, although the pilot database has been populated with data from TRW. Full details about the REMM database can be found in Jones and Marshall (2000); here we provide a summary of the key features. Two types of information are required for REMM: population and event history data. The population data describe the way in which systems are structured in terms of, for example, line replaceable units, modules, boards and components. The main tables include: design drivers – characteristics of the system used to search for related products during concept design (e.g. system use, function, technology, integration, architecture, location); system identification – system part numbers, details of the system manufacturer, data supplier and a system description (it does not contain serial numbers, which are held in the event table); system structure – the same details for boards and other sub-systems as are stored for the system. Events can correspond to any action performed on the equipment, or any happening that may affect, or contain information about, the reliability of the equipment. Thus events include in-service failures, performance test failures and successful passes through manufacture. The first system event is where the system is defined in terms of its sub-systems or components; this is where a serial number tends to be allocated to a system. The last system event is where the system is removed from service. Every process, failure occurrence and repair in between is a system event. Events also occur at sub-system level, and further events can occur at component level. It is also important to collect data about those factors that might impact reliability.
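As an illustration only, the population and event tables described above might be held in a relational store along the following lines. This is a minimal sketch using SQLite; the table and column names are assumptions for illustration, not the actual REMM schema.

```python
# Illustrative sketch of the two REMM table groups (population data and
# event history data). Names are hypothetical, not the real REMM schema.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Population data: system identification. As noted in the text, serial
# numbers are NOT stored here.
cur.execute("""
    CREATE TABLE system_identification (
        part_number   TEXT PRIMARY KEY,
        manufacturer  TEXT,
        data_supplier TEXT,
        description   TEXT
    )""")

# Event history data: serial numbers live with events, since a serial
# number tends to be allocated at the first (build) event.
cur.execute("""
    CREATE TABLE event (
        event_id      INTEGER PRIMARY KEY,
        part_number   TEXT REFERENCES system_identification(part_number),
        serial_number TEXT,
        event_type    TEXT,  -- e.g. 'confirmed failure', 'fault not found'
        time_metric   REAL,  -- e.g. operating hours at the event
        reason        TEXT   -- e.g. failure mode or cause
    )""")

# Invented example rows.
cur.execute("INSERT INTO system_identification VALUES "
            "('LRU-001', 'TRW', 'TRW', 'example line replaceable unit')")
cur.execute("INSERT INTO event VALUES "
            "(1, 'LRU-001', 'SN-42', 'confirmed failure', 1520.0, "
            "'connector fatigue')")
conn.commit()
```

Keeping serial numbers in the event table rather than in system identification reflects the point made above: identity over time is itself a sequence of events, from build through to removal from service.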
Such factors include, for example, design rules, build standards, usage environment and other operational factors. In summary, the data stored in the event database include: a description of the event – e.g. module insertion, service removal, confirmed failure, fault not found, software upgrade, disposal; a description of the active item – e.g. the item that has failed or been replaced; a description of the passive item – e.g. an item that may be impacted by an intervention affecting a related active item; a time metric – e.g. operating time, powered time, calendar time, number of cycles, number of operations; the reason for action – e.g. failure modes and causes; and other data – e.g. the surety of the data. Most companies already hold most of the required data, although it is often distributed between different databases and even paper systems. Much of the structural data can be extracted from parts lists and diagrams, while event data is stored in customer support, sales and reliability databases. Amongst the techniques used in data analysis are: Kaplan-Meier nonparametric estimation of the reliability function (Wolstenholme 1999); tests of the equality or otherwise of reliability functions (Meeker and Escobar 1998); proportional hazards modelling of the impact of factors on the hazard rate (Meeker and Escobar 1998, Wolstenholme 1999); mean cumulative function analysis (Lawless and Nadeau 1995, Meeker and Escobar 1998); and failure intensity modelling, including M(t) analysis (Moltoft 1994). These techniques were implemented using a variety of software packages and specially developed macros, for example Excel, SPSS, Minitab and NeuroSolutions.

Selected analysis of in-service data. The analysis has been conducted using a mix of the aforementioned techniques to answer questions such as those listed below. How does reliability change during the life of the unit type – by population (fleet) and individual (unit)?
For both fleet and unit reliability there are clear patterns whereby reliability at entry into service is low but increases quite rapidly before stabilising at an acceptable level. What factors appear to have most impact (negative or positive) on reliability levels? Consistent with engineering judgement, manufacturing defects appear to have a large effect in reducing reliability. Do different operators of a unit type have the same reliability? Different operators show different reliability levels.

The primary purpose of this study has been to test the REMM process of data acquisition, processing and analysis with a view to identifying areas in need of refinement. The main data used relate to only selected operators and are typically more detailed than the norm for in-service data. In future it is intended that overall data quality and quantity should improve as users complete the specified fields within the REMM database. To date, data processing has required transfer of data between databases; in future such non-value-added tasks should be eliminated as data are input directly to the REMM database or to a system with which REMM interacts. The data analysis conducted has shown consistency in the major findings when different techniques have been applied to explore the same questions. This is promising, as interesting patterns were highlighted by both conventional and more novel approaches. The analysis revealed patterns consistent with engineering expectations and also gave insight into issues of concern that had not previously been investigated. Thus the data analysis proposed can contribute to retrospective analysis or routine monitoring of units. It has also provided base reliability functions that are being used in REMM modelling for a new product currently in design that is a variant of those discussed here, and so is contributing to prospective analysis.
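To make two of the techniques referred to above concrete, the following is a minimal sketch of Kaplan-Meier estimation of the reliability function, together with a percentile bootstrap of the reliability at a chosen operating time. The failure history is invented for illustration; this is a sketch of the general methods, not the analysis actually performed in the study.

```python
import random

def kaplan_meier(times, failed):
    """Kaplan-Meier estimate of the reliability (survivor) function.

    times  : event times (e.g. operating hours)
    failed : True for a confirmed failure, False for right-censoring
             (e.g. unit removed or still in service at that time).
    Returns a list of (time, estimated reliability) at each failure time.
    """
    data = sorted(zip(times, failed))
    n_at_risk = len(data)
    reliability = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [ev for tt, ev in data if tt == t]  # all events at time t
        deaths = sum(tied)
        if deaths:
            reliability *= 1 - deaths / n_at_risk
            curve.append((t, reliability))
        n_at_risk -= len(tied)
        i += len(tied)
    return curve

def bootstrap_reliability(times, failed, t_query, n_boot=1000, seed=0):
    """Percentile-bootstrap 95% interval for reliability at t_query."""
    rng = random.Random(seed)
    units = list(zip(times, failed))
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(units) for _ in units]  # resample units
        r = 1.0
        for t, rel in kaplan_meier(*zip(*sample)):
            if t <= t_query:
                r = rel
        estimates.append(r)
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]

# Invented in-service history: operating hours to failure or censoring.
hours  = [210, 350, 350, 480, 600, 720, 890, 1020]
failed = [True, True, False, True, False, True, True, False]
curve = kaplan_meier(hours, failed)           # step-down reliability curve
lo, hi = bootstrap_reliability(hours, failed, t_query=500)
```

Resampling whole units, rather than individual event records, is the natural choice here because the unit is the independent observational entity; interval width then reflects fleet size, which matters when, as above, data are available for only selected operators.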