convection air cooling is still the most commonly used method of cooling microelectronics. in order to deliver air cooled computer equipment with higher reliability, we need to focus on the life expectancy of the air moving devices (amds). however, this is not a trivial exercise because there are so many variables and there is no industry standard for amd life test procedures.
therefore, it may not be meaningful to compare life expectancy information from different amd vendors. we first have to understand each vendor's definition of fan failure, and then consider the life test procedure and the fan life expectancy calculations. the purpose of this article is to explain these topics and to encourage the standardization of fan life evaluation, as outlined in [1].
definition of fan failure
fans may fail in several ways, and failure may be defined differently depending on the application. fan failures typically include excessive vibration, noise, rubbing or hitting of the propeller, reduction in rotational speed, locked rotor, and failure to start. table 1 lists the failure criteria that different vendors use in their amd life tests.

| failure criterion | vendor a | vendor b | vendor c | vendor d | vendor e |
|---|---|---|---|---|---|
| rotational speed, N | < 0.9 N_{initial} | < 0.8 N_{nominal} | < 0.9 N_{initial} | 0 rpm | < 0.7 N_{initial} |
| running current, I | > I_{maximum} | > 1.2 I_{nominal} | n/a | n/a | n/a |
| acoustic noise | +3 dBA | n/a | +3 dBA | +5 dBA | n/a |

table 1. failure criteria.
it is worth noting that no amd stops moving air merely because of increased noise. increased noise is a symptom of bearing failure, and bearing failure is usually caused by loss of lubricant, which leads to wear in the bearing.
in addition, the capacitor may fail in ac amds, and the electronics may contribute to early failures in dc amds. failure criteria in life tests can also include a change in coast-down time or in the start time needed to reach full speed. coil insulation breakdown and similar failures can be classified as workmanship problems or as symptoms of an out-of-control manufacturing process.
reliability concepts
fan reliability can be measured in several ways. life test data can be plotted as a cumulative distribution, which shows the total fraction of fans that have failed up to any operating time. fig. 1 plots a sample cumulative distribution for a vendor's test that was stopped at 8,400 hours after 18 of 48 fans had failed [1].
figure 1: sample cumulative distribution function, weibull vs. empirical with 95% confidence bands.
a few amd vendors provide prospective customers with reliability information based on the exponential distribution, which assumes a constant failure rate. however, life test data such as those shown in fig. 1 do not support the use of the exponential distribution. past experimentation and model fitting have shown that the weibull distribution provides a good fit to fan life data because it accurately represents wearout phenomena. the exponential distribution is therefore misleading: it distorts the data and ignores the wearout of the amds.
for the weibull distribution, the cumulative distribution function (cdf), a function of age t, is given by [2]

$$F(t) = 1 - e^{-(t/\alpha)^{\beta}} \qquad (1)$$
where α is the characteristic life and β is the shape parameter. shape parameters for weibull models fit to fan life are generally greater than 1, which means that a fan's tendency to fail increases with age (wearout). the reliability function equals 1 - F(t), which at any age t represents the proportion of survivors from the original population. the weibull hazard rate (also known as the failure rate or hazard function) is given by

$$h(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta - 1} \qquad (2)$$
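as a rough illustration, equations (1) and (2) can be evaluated directly. the sketch below is a minimal python version; the function names are hypothetical, and α = 100,000 hours with β = 1.5 are simply the illustrative values used in the l_{2} example of this article, not fitted data.

```python
import math


def weibull_cdf(t: float, alpha: float, beta: float) -> float:
    """Fraction of the population failed by age t, equation (1)."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def weibull_reliability(t: float, alpha: float, beta: float) -> float:
    """Fraction of the original population still operating at age t, 1 - F(t)."""
    return math.exp(-((t / alpha) ** beta))


def weibull_hazard(t: float, alpha: float, beta: float) -> float:
    """Instantaneous failure rate at age t (failures per hour), equation (2).

    For beta < 1 the hazard is undefined at t = 0, so call it with t > 0.
    """
    return (beta / alpha) * (t / alpha) ** (beta - 1.0)


if __name__ == "__main__":
    alpha, beta = 100_000.0, 1.5  # illustrative values: hours, dimensionless
    for t in (10_000.0, 50_000.0, 100_000.0):
        print(
            f"t = {t:>9,.0f} h  F = {weibull_cdf(t, alpha, beta):.4f}  "
            f"R = {weibull_reliability(t, alpha, beta):.4f}  "
            f"h = {weibull_hazard(t, alpha, beta):.3e} /h"
        )
```

because β > 1 here, the printed hazard rate rises with age, which is the wearout behavior that a constant-rate exponential model cannot represent.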
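two metrics of fan reliability commonly quoted by vendors are the L_{2} life and the L_{10} life, the ages by which 2% and 10% of the population have failed, that is, the second and tenth percentiles of an assumed fan life distribution such as the weibull. setting F(t) = 0.10 at L_{10} and F(t) = 0.02 at L_{2} in equation (1) and solving for t gives

$$L_{10} = \alpha\,(0.10536)^{1/\beta}, \qquad L_{2} = \alpha\,(0.02020)^{1/\beta} \qquad (3)$$

where 0.10536 = -ln(0.90) and 0.02020 = -ln(0.98).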
for example, given α = 100 kpoh (100,000 power-on hours) and β = 1.5, equation (3) gives L_{2} = 7,418 hours, the age at which 98% of the population is expected to still be operating. the advantage of specifying an L_{2} life in place of an L_{10} life is that it constrains the early-life failure distribution more tightly.
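a short sketch that solves equation (1) for the age at a given failed fraction reproduces the L_{2} figure above. the function name is hypothetical; the inputs are the α and β of the example.

```python
import math


def weibull_percentile_life(p: float, alpha: float, beta: float) -> float:
    """Age at which a fraction p of the population has failed.

    Solves F(t) = p in equation (1): t = alpha * (-ln(1 - p)) ** (1 / beta).
    p = 0.10 gives the L10 life and p = 0.02 gives the L2 life.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return alpha * (-math.log(1.0 - p)) ** (1.0 / beta)


alpha, beta = 100_000.0, 1.5  # 100 kpoh and beta = 1.5, as in the example
l2 = weibull_percentile_life(0.02, alpha, beta)
l10 = weibull_percentile_life(0.10, alpha, beta)
print(f"L2  = {l2:,.0f} hours")   # about 7,418 hours, matching the text
print(f"L10 = {l10:,.0f} hours")
```

computing -ln(1 - p) inside the function, rather than hard-coding 0.10536 and 0.02020, lets the same sketch return any L_{p} life.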
sometimes vendors will also quote the mean time to failure (mttf). for the weibull distribution,

$$\mathrm{MTTF} = \alpha\,\Gamma\!\left(1 + \frac{1}{\beta}\right) \qquad (4)$$

where Γ denotes the gamma function.
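equation (4) can be checked numerically with the standard library's gamma function. the sketch below is illustrative only and reuses the hypothetical α = 100,000 hours and β = 1.5 from the L_{2} example.

```python
import math


def weibull_mttf(alpha: float, beta: float) -> float:
    """Mean time to failure of a Weibull population, equation (4)."""
    return alpha * math.gamma(1.0 + 1.0 / beta)


alpha, beta = 100_000.0, 1.5  # illustrative values from the L2 example
print(f"MTTF = {weibull_mttf(alpha, beta):,.0f} hours")
# With beta = 1 (the exponential case) the MTTF equals alpha exactly.
print(f"MTTF (beta = 1) = {weibull_mttf(alpha, 1.0):,.0f} hours")
```

for these parameters the mttf comes out near 90,000 hours while L_{2} is only about 7,400 hours, so a single mttf figure says little about early-life failures.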
it is worth noting that the mttf is often confused with the mean time between failures (mtbf). the mtbf should only be used in a repairable-systems setting. if a machine has ten fans, and any failed fan is promptly replaced, then the mtbf may be used to understand the system's maintenance needs and service cost. but since the underlying hazard rate of the fans is not constant, computing the mtbf of a multiple-fan system is quite difficult. instead, system reliability issues are often settled by entering a one-number hazard rate for the individual fans, in which case the average hazard rate may be appropriate [1].
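the text does not define the average hazard rate; one common definition, assumed here and possibly different from the one in [1], is the weibull cumulative hazard H(t) = (t/α)^β averaged over the service interval. the function name and the 5-year, 24/7 service life below are hypothetical.

```python
def weibull_average_hazard(t1: float, t2: float, alpha: float, beta: float) -> float:
    """Average hazard rate over [t1, t2] in failures per hour.

    Uses the Weibull cumulative hazard H(t) = (t / alpha) ** beta and
    returns (H(t2) - H(t1)) / (t2 - t1).
    """
    if not 0.0 <= t1 < t2:
        raise ValueError("need 0 <= t1 < t2")

    def cumulative_hazard(t: float) -> float:
        return (t / alpha) ** beta

    return (cumulative_hazard(t2) - cumulative_hazard(t1)) / (t2 - t1)


alpha, beta = 100_000.0, 1.5   # illustrative fan population
service_life = 5 * 8_760.0     # hypothetical 5-year, 24/7 service life, hours
rate = weibull_average_hazard(0.0, service_life, alpha, beta)
print(f"average hazard = {rate * 1e6:.2f} failures per million fan-hours")
```

with β = 1 this reduces to the constant rate 1/α; with β > 1 the result depends on the interval chosen, which is why a one-number rate must be tied to a stated service life.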
fan life estimation
the life of most fans is limited by the bearings; the electronics, even in dc fans, play a secondary role. bearing life is generally limited by grease life, which is primarily a function of temperature. grease life is also affected by the type of grease, the percentage of grease fill, the operating environment, the load, and the bearing design. the booser grease life equation is based on grease life tests on electric motor bearings, but it applies to any rolling-element bearing. the equation for the bearing grease life in the application is [3]
$$\log_{10} L_{10} = -2.60 + \frac{K_T}{T_{brg}} - 0.301\,S \qquad (5)$$

where

$$S = S_G + S_N + S_P, \qquad S_N = \frac{0.86\,dN}{(dN)_L}, \qquad S_P = \frac{0.61\,dN\,P}{C_r^{2}}$$
P: equivalent dynamic bearing load, lbf
N: speed, rpm
C_r: basic dynamic load capacity, lbf
d: bore diameter, mm
(dN)_L: speed limit, rpm·mm
S: half-life subtraction factor; for S = 1, the life drops 50%
S_G: grease half-life subtraction factor, typically 0 for many greases
S_N: speed half-life subtraction factor
S_P: load half-life subtraction factor
K_T: grease temperature factor = 2450 for an acceleration factor of 1.5 for each 10°C
T_brg: bearing temperature, K
this equation, however, does not account for the effect of grease quantity and may not cover all greases on the market, particularly modern greases which use synthetic oils. for these new greases and depending upon the operating conditions, the results from the booser equation may be conservative. therefore, unless adjustment factors are available for a certain fan type, it is best to use the booser equation to obtain a qualitative comparison of two fan designs rather than an absolute life estimate.
example 1: (the following information was obtained from a fan vendor.)
P = 960 g (2.116 lb), C_r = 57 kg (125.66 lb), d = 3 mm
N = 2200 rpm, (dN)_L = 270,000 rpm·mm, T_brg = 42°C (about 315 K) when T_amb = 25°C
the half-life subtraction factor is calculated as
S = S_G + S_N + S_P = 0 + 0.86(3)(2200)/270,000 + 0.61(3)(2200)(2.116)/125.66² = 0 + 0.021 + 0.540 = 0.561,
and substituting into equation (5) gives
log_{10} L_{10} = -2.60 + 2450/315 - 0.301(0.561) ≈ 5.009, so
L_{10} ≈ 102,000 hours.
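the booser calculation of example 1 can be scripted so that designs are compared on the same basis. this is a minimal sketch of equation (5) as written above; the function and parameter names are hypothetical. with a 273.15 K offset it gives about 101,000 hours; the 102,000 hours above corresponds to rounding the bearing temperature to 315 K, a difference well within the precision of the equation.

```python
def booser_l10_hours(
    p_lbf: float,         # equivalent dynamic bearing load P, lbf
    n_rpm: float,         # speed N, rpm
    cr_lbf: float,        # basic dynamic load capacity C_r, lbf
    d_mm: float,          # bore diameter d, mm
    dn_limit: float,      # speed limit (dN)_L, rpm*mm
    t_brg_c: float,       # bearing temperature, degrees C
    s_g: float = 0.0,     # grease half-life subtraction factor S_G
    k_t: float = 2450.0,  # grease temperature factor K_T
) -> float:
    """Grease-limited L10 life in hours from the Booser equation (5)."""
    dn = d_mm * n_rpm
    s_n = 0.86 * dn / dn_limit            # speed half-life subtraction factor
    s_p = 0.61 * dn * p_lbf / cr_lbf**2   # load half-life subtraction factor
    s = s_g + s_n + s_p
    t_brg_k = t_brg_c + 273.15            # equation (5) needs kelvin
    log_l10 = -2.60 + k_t / t_brg_k - 0.301 * s
    return 10.0 ** log_l10


# Example 1 inputs
life = booser_l10_hours(p_lbf=2.116, n_rpm=2200, cr_lbf=125.66, d_mm=3,
                        dn_limit=270_000, t_brg_c=42.0)
print(f"L10 = {life:,.0f} hours")
```

rerunning the sketch with t_brg_c=35.0 raises the estimate by roughly 50%, which illustrates how strongly grease life depends on bearing temperature. as noted above, such results are best used to compare designs rather than as absolute lives.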
in situations where fan reliability is critical, it is a good idea to limit the bearing temperature rise to 10°c. this rule of thumb should generally be applied when a single fan failure results in a system shutdown.
the booser life estimate can also be significantly affected by the bearing load and the bearing size. installing a fan with the shaft mounted vertically will result in a lower bearing load and a longer fan life. using a larger bearing will also yield a longer fan life.
in addition to grease life, bearing load rating life is a long established method of estimating bearing life based primarily on bearing loads and capacity. international standard iso 281 [4] and bearing catalogs describe the method. it typically yields life values longer than most fan vendors will support. many bearing vendors have devised adjustment factors for parameters other than bearing load. for information on a specific fan, consult with the fan vendor and bearing supplier.
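for comparison, the iso 281 basic rating life of a ball bearing is L_{10} = (C/P)^3 million revolutions, which converts to hours by dividing by 60N. the sketch below applies it to the example 1 data, assuming the bearing is a ball bearing and applying none of the adjustment factors mentioned above; the function name is hypothetical.

```python
def basic_rating_life_hours(c_lbf: float, p_lbf: float, n_rpm: float,
                            exponent: float = 3.0) -> float:
    """Basic rating life L10 in hours: (C/P) ** exponent million revolutions.

    exponent = 3 applies to ball bearings (10/3 to roller bearings).
    No life-modification (adjustment) factors are applied.
    """
    l10_million_revs = (c_lbf / p_lbf) ** exponent
    return l10_million_revs * 1e6 / (60.0 * n_rpm)


# Same bearing data as Example 1 (assumed to be a ball bearing)
hours = basic_rating_life_hours(c_lbf=125.66, p_lbf=2.116, n_rpm=2200)
print(f"load-rating L10 = {hours:,.0f} hours")
```

the result is on the order of 1.6 million hours, more than ten times the booser grease-life estimate for the same fan, which is consistent with the observation that load rating life usually exceeds what fan vendors will support.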
fan life experiments
because of economic and time constraints, we may rely on a zero-failure test strategy, accelerated testing techniques, or both. a zero-failure test strategy may be used to estimate the test time required to verify a life expectancy criterion such as a minimum L_{10} life. note that the precision of this approach depends on the accuracy of the assumed shape parameter [5].
example 2: how long should a sample of 30 fans be tested to demonstrate with 90% confidence that L_{10} is greater than or equal to 80,000 hours at 30°C? based on breyfogle [5, equation (12.7)] and assuming a weibull distribution, each of the n fans should be tested for t_{1} hours, with

$$t_1 = \alpha\left[\frac{\chi^{2}_{2;c}}{2n}\right]^{1/\beta} \qquad (6)$$

where χ^{2}_{2;c} is the c-th percentile of the chi-square distribution with two degrees of freedom, and c is the desired confidence level. from a chi-square table, χ^{2}_{2;0.90} = 4.605.
assuming β = 2, we solve equation (3) for α to obtain

$$\alpha = L_{10}\,(0.10536)^{-1/2} = \frac{80{,}000}{\sqrt{0.10536}} \approx 246{,}463 \text{ hours}$$

substituting into equation (6) with n = 30 gives t_{1} = 246,463 × (4.605/60)^{1/2} ≈ 68,280 hours of test time for each fan. if all 30 fans operate for t_{1} hours (at 30°C) without failure, then we can assert with 90% confidence that L_{10} is at least 80,000 hours.
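example 2 can be reproduced without a chi-square table, because the chi-square distribution with two degrees of freedom is exponential with mean 2, so its c-th percentile is -2 ln(1 - c). the function names below are hypothetical, and the loop over β is only an illustration of the shape-parameter sensitivity noted above.

```python
import math


def chi2_quantile_2dof(c: float) -> float:
    """c-th quantile of the chi-square distribution with 2 degrees of freedom.

    With 2 degrees of freedom the chi-square distribution is exponential
    with mean 2, so the quantile has the closed form -2 ln(1 - c).
    """
    return -2.0 * math.log(1.0 - c)


def zero_failure_test_hours(l10_target: float, n: int, confidence: float,
                            beta: float) -> float:
    """Test time per unit so that zero failures demonstrates L10 >= target."""
    alpha = l10_target * (-math.log(0.90)) ** (-1.0 / beta)  # invert equation (3)
    return alpha * (chi2_quantile_2dof(confidence) / (2.0 * n)) ** (1.0 / beta)  # equation (6)


t1 = zero_failure_test_hours(l10_target=80_000, n=30, confidence=0.90, beta=2.0)
print(f"t1 = {t1:,.0f} hours per fan")
for b in (1.5, 2.0, 3.0):  # sensitivity to the assumed shape parameter
    print(f"beta = {b}: {zero_failure_test_hours(80_000, 30, 0.90, b):,.0f} h")
```

for this sample size and target, a larger assumed β demands a longer test, so an optimistic shape parameter directly shortens the apparent test requirement.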
accelerated life testing
the previous example shows that life test durations are very long, even when a zero failure test strategy is used. therefore, accelerated testing techniques are essential to complete component evaluation within a reasonable time and cost.
care must be taken when selecting an accelerated test strategy. an acceleration model that does not closely represent the characteristics of an amd can result in an invalid conclusion.
the first acceleration factor shown in table 2 is on/off cycles. these cycles stress the amd by accelerating the bearing from zero speed to normal speed. an on/off cycle every 8 hours would be representative of a personal computer application. even if this degree of stress is not appropriate, some on/off cycles are required to detect amd problems such as failure to start, changes in rotational speed, coast down time or start time, and increased noise.
tables 1 and 2 together illustrate how fan failures can go undetected. table 2 shows that some vendors do not use on/off cycles, so start-related failures are never exercised, and table 1 shows that vendor e does not count reduced rotational speed as a failure until it has dropped by 30%, a very loose failure criterion. vendor e, using this combination of test conditions and failure definition, should be expected to report amd life values much higher than those of vendors that use stricter criteria.

| test parameter | vendor a | vendor b | vendor c | vendor d | vendor e |
|---|---|---|---|---|---|
| on/off cycles | every 500 hours | none | biweekly (336 hours) | none | none |
| air temperature during life tests | 80 ± 5°C | 75°C | 72°C | 70°C | 85°C |
| temperature acceleration factor | 2.0/10°C | 1.482/10°C | 2.0/10°C | 1.315/10°C | 2.0/15°C |

table 2. acceleration factors.
elevated temperature is generally the primary acceleration factor. table 2 shows the range of temperature acceleration factors typically used in amd reliability calculations. for amd failures caused by lubricant breakdown, it is reasonable to use the acceleration factor of 1.5 per 10°C implied by booser's equation. for example, to extrapolate the results of a life test run at 80°C down to 40°C, use an acceleration factor of 1.5^{(80-40)/10} = 1.5^{4} ≈ 5.1.
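the extrapolation rule above is a simple power law. the sketch below applies it first to the 80°C to 40°C example, then to each vendor's test temperature and factor from table 2. the 40°C use temperature and the treatment of test air temperature as the governing temperature are assumptions for illustration only; the function name is hypothetical.

```python
def acceleration_factor(t_test_c: float, t_use_c: float,
                        factor_per_step: float, step_c: float = 10.0) -> float:
    """Life multiplier when extrapolating from t_test_c down to t_use_c.

    Assumes life changes by factor_per_step for every step_c degrees C.
    """
    return factor_per_step ** ((t_test_c - t_use_c) / step_c)


print(f"{acceleration_factor(80.0, 40.0, 1.5):.2f}")  # 1.5 ** 4, prints 5.06

# Table 2 test temperatures and factors, extrapolated to a hypothetical 40 C use temperature
vendors = {
    "a": (80.0, 2.0, 10.0),
    "b": (75.0, 1.482, 10.0),
    "c": (72.0, 2.0, 10.0),
    "d": (70.0, 1.315, 10.0),
    "e": (85.0, 2.0, 15.0),
}
for name, (t_test, factor, step) in vendors.items():
    print(f"vendor {name}: x{acceleration_factor(t_test, 40.0, factor, step):.1f}")
```

the resulting multipliers differ widely between vendors, so the same test hours translate into very different claimed field lives.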
what is the upper temperature bound for accelerated amd life testing? at the accelerated life test temperature, there should not be a significant change in grease structure. the performance of the grease is degraded mainly due to evaporation loss and oxidation. based on the astm standard test methods for evaporation loss and oxidation characteristics of lubricating greases and oils, accelerated life testing should be conducted at temperatures below 90°c [1].
what is the minimum ambient temperature to which fan life test data may be extrapolated using the temperature acceleration factors given in table 2? booser's nominal temperature acceleration factor applies specifically at a bearing temperature of 100°c.
therefore, applying these acceleration factors down to a room temperature of 25°c is probably questionable, but that is often how they are used because a better model is not available.
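the temperature term of equation (5) shows why: life scales as 10^{K_T/T} with T in kelvin, so the implied factor per 10°C is not constant. the sketch below (hypothetical function name) evaluates it at several bearing temperatures. it only shows what the booser term itself implies; whether that term remains valid near room temperature is exactly the open question raised above.

```python
def booser_factor_per_10c(t_brg_c: float, k_t: float = 2450.0) -> float:
    """Life ratio implied by the Booser temperature term for a 10 C rise.

    From equation (5), life scales as 10 ** (k_t / T) with T in kelvin, so
    L(T) / L(T + 10) = 10 ** (k_t * (1/T - 1/(T + 10))).
    """
    t_k = t_brg_c + 273.15
    return 10.0 ** (k_t * (1.0 / t_k - 1.0 / (t_k + 10.0)))


for t in (25.0, 40.0, 60.0, 80.0, 100.0):
    print(f"{t:5.0f} C: {booser_factor_per_10c(t):.2f} per 10 C")
```

the factor is about 1.48 at 100°C, close to the nominal 1.5, but grows to roughly 1.85 near 25°C, so a fixed per-10°C factor is only an approximation away from 100°C.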
conclusion
because different companies use different approaches, it is generally not meaningful to compare life expectancy information from one vendor directly with that from another. thermal engineers and component evaluation engineers have to perform an independent comparative analysis when selecting an amd. the important parameters in fan life analysis include the failure criteria, the distribution function used for statistical analysis, and the life test acceleration factors.
once these parameters are selected, they should be used consistently in order to compare different amds. we hope the material presented here will help encourage standardization in fan life evaluation.
sung j. kim ibm storage systems division, tucson, arizona
alan claassen ibm storage systems division, san jose, california
references
1. kim s., vallarino c. and claassen a., 1996, "review of fan life evaluation procedures," international journal of reliability, quality, and safety engineering (in press).
2. tummala r.r. and rymaszewski e.j., 1989, microelectronics packaging handbook, van nostrand reinhold, new york, chapter 5.
3. booser e.r., 1974, "grease life forecast for ball bearings," lubrication engineering, pp. 536-541.
4. international standard iso 281, 1990, "rolling bearings - dynamic load ratings and rating life."
5. breyfogle iii f.w., 1992, statistical methods for testing, development, and manufacturing, john wiley and sons, new york.
