introduction and summary
in today's competitive environment, the electronics industry is focusing on business process re-engineering or product improvements with emphasis on development and manufacturing intervals and costs, warranty costs, field reliability, and customer maintenance costs. some of the improvement methods used are cross-functional teams, concurrent engineering, simulation, six-sigma and robust design analysis, environmental stress testing (est), and many others. this paper describes the est method and its implementation in development and manufacturing processes.
est is an effective method for improving product reliability. electronic products typically have potential latent defects or weaknesses, which can cause failures during field operation. est is the application of stress tests at levels beyond design limits to turn potential latent defects into failures; failure mode analysis (fma) is then used to identify the root causes so that corrective action can be taken. est is used during development to identify potential design margin weaknesses. in manufacturing, est is used to identify potential process and component latent defects, or to screen for specific failure mechanisms. the primary objective of est is to identify and correct any potential latent defects quickly, while minimizing the overall product cost. development and manufacturing groups, suppliers, and other equipment manufacturers (oems) may use and benefit from est.
background - examples
in the 1960s, space program suppliers used environmental stress screening (ess). production volume was small; hence, there was no effort to perform fma and identify the root causes. stress tests were applied at levels that simulated field operation and did not exceed the design limits.
the objective of ess was to screen the bad from the good product. in the 1970s, the ess approach was adopted by the military defense industry. military suppliers were required to perform ess on 100% of production. hence, ess was expensive. for details about ess, see reference .
in the 1980s, major electronic companies improved ess and created the est method. this method is used for improvements during development and manufacturing. est incorporates the use of fma and corrective action. the objective is to produce a robust and reliable product with lower overall cost. commercial companies such as at&t, boeing, hewlett packard and ibm have been using est to improve their design and manufacturing processes, reduce warranty repairs, improve field reliability, and achieve significant cost savings. examples of est are presented in references [2-6].
an est program includes processes for development and manufacturing, suppliers, oems, and fma and corrective action. est processes are based on customer, market and internal requirements. an fma and corrective action process is needed for a timely identification of the failure mechanism(s) and correction. emphasis is placed on integrating the est processes in the development and manufacturing processes, and eliminating any duplication of tests.
the success of an est program implementation depends heavily on management's commitment and support. if that commitment exists, the manager will allocate the resources. a cross-functional team approach may be used. an est team is formed with representatives from development, manufacturing, component, qc/qa, and field repair groups. a team leader is responsible for leading and managing the est program. an fma coordinator oversees the fma activities. the team must be innovative, persistent, and willing to try novel methods. a change control (cc) system is recommended to document all the verified problems discovered due to est. a modification request (mr) should be entered in the cc system for each est failure. the use of such a system provides the visibility needed for the est failures and the timely corrective action.
certainly, every est team will discover problems. in each case, decisions must be made about stopping production, problem correction and customer shipments. the team should thoroughly analyze each situation separately. emphasis is placed on the magnitude of the defect, timely problem correction, warranty costs, and customer shipments. based on the results, the manager can make a decision that will not jeopardize customer relationships and product sales.
stress tests and conditions
typical stress tests are temperature, voltage and clock variation, temperature cycling, power cycling, humidity, thermal shock, random vibration, electro-static discharge (esd), emi susceptibility, and product specific tests. each stress test has the ability to stimulate different latent defects. for example, temperature is used to detect design margin flaws and component defects; temperature cycling is used to detect interconnection and packaging defects. stress tests should be combined and applied simultaneously, while the product is powered and monitored.
est may be applied on as few as 3-5 models. stress levels are increased in step increments and testing is continued until the product can operate at levels at least 20-30% higher than the required limits. for example, if the product must operate up to 50°c, it should be able to operate at least at 60°c. if the product fails, it should be repaired and testing continued until most models can operate at the desired stress levels. significant drawbacks are the availability of models, the ability to diagnose problems at high stress levels, and fma support by suppliers.
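the step-stress procedure above can be illustrated with a short python sketch. it is a minimal example under stated assumptions, not a prescribed procedure: the function names, the 2-degree step size, and the choice to start stepping at the required limit are hypothetical; only the 20-30% margin and the 50°c to 60°c example come from the text.

```python
def margin_target(required_limit, margin_percent=20):
    """Stress level the product should survive: required limit plus margin."""
    return required_limit * (100 + margin_percent) / 100


def step_stress_levels(required_limit, margin_percent=20, step=2.0):
    """List the stress levels to apply, stepping from the required limit
    up to and including the margin target."""
    if step <= 0:
        raise ValueError("step must be positive")
    target = margin_target(required_limit, margin_percent)
    levels = []
    level = float(required_limit)
    while level < target:
        levels.append(level)
        level += step  # assumed fixed step increment
    levels.append(target)
    return levels


def margin_demonstrated(highest_passing_level, required_limit, margin_percent=20):
    """True if the unit operated at or above the margin target."""
    return highest_passing_level >= margin_target(required_limit, margin_percent)


# Example from the text: required limit 50 degrees C, 20% margin.
levels = step_stress_levels(50, margin_percent=20, step=2.0)
print(levels)
print(margin_demonstrated(58.0, 50))
```

a unit that stops operating before the last level would be repaired and retested, as described above; the sketch only generates the schedule and checks the margin.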
when est is used on production units to screen a specific failure mechanism, the stress tests should be applied above the required customer limits, but below the operating limits. a proof of screen should be developed to specify the appropriate stress levels for production units.
est may be applied on design models. the objective is to identify and correct any potential design margin weaknesses prior to initial production. stress tests are selected based on customer requirements, component technologies used, and field reliability of similar products. field repair, fma and failure mechanism data provide excellent input to the selection of the stress tests. at a minimum, the est process should include the stress tests listed above.
drawbacks of using est on design models are incomplete hardware and software, unavailable system tests and diagnostics, tight development schedules with no allowance for a second iteration, and limited resources. the est team needs to work with development groups to implement as much of the est process as possible at the card, sub-system and system level. any incomplete stress tests should be included in manufacturing est.
a manufacturing est process complements the development est process. this process should be used on initial production models to identify and correct any potential manufacturing process and component defects. this process may also be used periodically on a few production units to thoroughly check for any shifts in the design margins, manufacturing processes, or component quality. such units should not be shipped, since these stress tests may be destructive. est and field failure mechanism data are used to select the stress tests.
initial production should be subjected to est. once sufficient data indicate that no potential failure mechanisms are present, est on a sample basis (sampling est) or no est may be initiated. sampling est requires a sampling scheme and a decision process to switch to a 100% production est or no est. the decision process is based on the factory and any available field data. the stress tests are selected based on the observed failure mechanisms and performed at non-destructive levels. it is essential that when est is used on production units, the stress levels are severe enough to precipitate the latent defects, but without causing damage to good units.
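the decision process for moving between 100% production est, sampling est, and no est can be sketched as a small state function. this is an illustration only: the rule, the names, and the threshold of 200 clean units are assumptions, since the text does not specify a sampling scheme or switching criteria.

```python
from enum import Enum


class EstMode(Enum):
    FULL = "100% production est"
    SAMPLING = "sampling est"
    NONE = "no est"


def next_est_mode(current, units_tested, est_failures, field_failures,
                  min_clean_units=200):
    """Pick the EST mode for the next production period.

    Assumed rule (hypothetical, not from the source):
    - any EST or field failure sends production back to 100% EST;
    - otherwise, once enough units have been tested without failures,
      relax one step: FULL -> SAMPLING -> NONE;
    - with too little data, keep the current mode.
    """
    if est_failures > 0 or field_failures > 0:
        return EstMode.FULL
    if units_tested < min_clean_units:
        return current
    if current is EstMode.FULL:
        return EstMode.SAMPLING
    return EstMode.NONE


mode = EstMode.FULL  # initial production is subjected to EST
mode = next_est_mode(mode, units_tested=300, est_failures=0, field_failures=0)
print(mode.value)
```

in practice the inputs would come from the factory and field data mentioned above, and the threshold would be set from the sampling scheme rather than fixed in code.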
manufacturing est requires a commitment for fma. if this effort is not supported, est becomes a 100% production screen which is usually expensive and unnecessary for mature products. if 100% production est is a customer requirement, the factory and field data should be used to demonstrate the benefits of sampling est and the potential cost savings to the customer.
suppliers and oems
an est program should address components, modules, cards, and sub-systems purchased from suppliers and oems. minimum requirements and design limits should be specified for every part purchased based on the product requirements. est should be incorporated in the suppliers' processes. at a minimum, suppliers should use est to demonstrate that their product meets or exceeds the specified limits. for example, if a card must pass an 85°c temperature test, all components and modules used on that card must meet this requirement. such specifications should be included in the supplier contract agreement.
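the 85°c example amounts to checking each purchased part's specified limit against the card-level requirement. the following python sketch shows one way to do that check; the part names and their limits are invented for illustration and do not come from the text.

```python
def parts_below_requirement(card_limit_c, part_limits_c):
    """Return the names of parts whose specified temperature limit
    is lower than the card-level requirement."""
    return sorted(
        name for name, limit in part_limits_c.items() if limit < card_limit_c
    )


# Hypothetical bill of materials: part name -> specified limit in degrees C.
part_limits = {
    "regulator": 105,
    "oscillator": 70,
    "connector": 85,
}

failing = parts_below_requirement(85, part_limits)
print(failing)
```

any part returned by the check would need a tighter specification in the supplier agreement, or a different part, before the card can be expected to pass.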
failure mode analysis
an integral part of an est program is a strong fma effort. every est failure should be subjected to fma down to the lowest actionable level, to identify any failure mechanisms. corrective action and any improvements should be implemented and verified. the impact of timely improvements is enormous for large volume products. attention should be paid to soft failures, i.e., units that fail at a certain stress level but recover at a lower stress level. such failures are difficult and time consuming to diagnose. in addition, it is a challenge to get suppliers to perform fma on units that have failed above their design limits. a failure reporting, analysis, and corrective action system (fracas) is recommended to keep track of and save all the data. similarly, fma should be performed on manufacturing defects and field indicted units to identify related failure mechanisms. such information is essential for the continuous improvement of est.
data collection and analysis
est data should be collected and analyzed. this includes failure, test, repair and fma data, failure mechanism(s), and any corrective action. regarding manufacturing data, emphasis should be placed on card and system test defects, components indicted and their associated failure mechanisms. for field data, emphasis should be placed on field failures, components indicted and their associated failure mechanisms.
failure mechanisms associated with est, manufacturing, and field defects should be compared and appropriate corrective action taken. for example, if field and est failure mechanisms are the same, the stress tests and conditions must be modified, so that these latent defects are caught during manufacturing est and not during field operation. the data may also be used to compare different est programs.
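the comparison of failure mechanisms can be expressed with set operations. the sketch below is illustrative: the category names and the sample mechanisms are hypothetical, and the reading of the "field only" group as mechanisms that no current test stimulates is an assumption rather than a statement from the text.

```python
def compare_failure_mechanisms(est, manufacturing, field):
    """Group failure mechanisms by where they were observed."""
    est, manufacturing, field = set(est), set(manufacturing), set(field)
    return {
        # Seen in EST and still seen in the field: the stress tests and
        # conditions should be modified so these are caught in the factory.
        "in_est_and_field": sorted(est & field),
        # Seen only in the field (assumed: no current test stimulates them).
        "field_only": sorted(field - est - manufacturing),
        # Seen in the factory but not in the field.
        "caught_before_shipment": sorted((est | manufacturing) - field),
    }


# Hypothetical mechanism names, for illustration only.
result = compare_failure_mechanisms(
    est={"solder joint crack", "timing margin"},
    manufacturing={"solder bridge"},
    field={"solder joint crack", "connector corrosion"},
)
for group, mechanisms in result.items():
    print(group, mechanisms)
```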
est costs should be evaluated and compared with potential cost savings. it is assumed that products with latent defects fail during est and are not shipped to customers. furthermore, if est is not performed, such products will fail during field operation. typically, est costs include cost of models, resources, facilities, floor space, repair of est defects, and fma support. cost savings are due to reduced manufacturing, handling and shipping, warranty and repair costs. in addition, the impact on customer maintenance costs should be considered.
a spreadsheet program may be used to perform what-if analyses and to evaluate the full impact of est prior to making any decisions. factors, such as production volume and product cost, plus warranty repair policy should be considered. for example, if low cost circuit cards are not repaired but replaced with a working unit, est will not generate any cost savings due to a reduction in warranty repairs.
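the same what-if analysis can be sketched in python instead of a spreadsheet. this is a simplified model under the assumption stated above (units with latent defects fail during est and would otherwise fail in the field); the parameter names and all figures are hypothetical, and cost items such as floor space or customer maintenance are folded into the fixed and per-failure terms.

```python
def est_net_savings(volume, latent_defect_rate, est_cost_per_unit,
                    fixed_est_cost, warranty_repair_cost, handling_cost):
    """Net savings from EST = avoided field-failure cost minus EST cost.

    Assumes every latent defect is caught by EST and would otherwise
    have produced exactly one field failure.
    """
    est_cost = fixed_est_cost + volume * est_cost_per_unit
    avoided_failures = volume * latent_defect_rate
    avoided_cost = avoided_failures * (warranty_repair_cost + handling_cost)
    return avoided_cost - est_cost


# Hypothetical scenarios: only the warranty repair policy differs.
base = dict(volume=10_000, latent_defect_rate=0.02, est_cost_per_unit=5,
            fixed_est_cost=20_000, handling_cost=50)
scenarios = {
    "cards repaired under warranty": dict(base, warranty_repair_cost=400),
    "cards replaced, not repaired": dict(base, warranty_repair_cost=0),
}

for name, params in scenarios.items():
    print(name, round(est_net_savings(**params)))
```

with these invented figures the second scenario comes out negative, which matches the point made above: when failed cards are simply replaced, est produces no savings from reduced warranty repairs.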
conclusions - next step
est is an excellent method to improve development and manufacturing processes and costs, field reliability, and customer maintenance costs. a cross-functional team is recommended to achieve implementation efficiencies. a strong fma effort is also required to achieve the full benefits of est. finally, the success of est rests solely on management's commitment and support. future work includes the integration of simulation, robust design or six sigma analysis and est.
harry i. saraidaridis, ph.d.
at&t bell laboratories
north andover, ma
 environmental stress screening guidelines for assemblies, institute of environmental sciences, march 1990.
 g.k. hobbs, halt and hass, course notes, october 1993.
 c. shinner, the board electronic strife test (best) program, american society of quality control reliability review, june 1988, pp. 3-6.
 r.w. deppe and e.o. minor, reliability enhancement testing, 1994 proceedings annual reliability and maintainability symposium, january 1994, pp. 91-98.
 t.p. parker, a study of failures identified during board level environmental stress testing, ieee transactions on components, hybrids, and manufacturing technology, vol. 15, no. 6, december 1992, pp. 1086-1092.
 h.a. chan, p.j. englert, m.a. oien and s. rajaram, environmental stress testing, at&t technical journal, march/april 1994, pp. 77-85.