April 2006

System-Level Thermal Design and Testing of a 3G Wireless Network Gateway


Dr. Izuh Obinelo
Degree Controls Inc.
18 Meadowbrook Dr., Milford, NH 03055

Abstract

A major OEM of dedicated gateways for mobile networks faced a show-stopping problem: its flagship product was failing during initial field tests due to overheating. The question was how to salvage a multi-million dollar investment and thousands of hours of development effort. The design of an improved thermal architecture for the product is the subject of this paper.

Keywords
Thermal design, CFD, simulation, testing, thermal controller.

1. Introduction

Product reliability considerations have driven thermal design practices to the forefront in the development of electronic components and systems. As the power densities of electronic components and systems continue to increase, heat-related failure mechanisms have become a primary consideration in the packaging of electronics for high reliability. In 2000, the Uptime Institute predicted that the heat dissipation footprint of communications equipment frames would quadruple in 8 years [1]. Staggering as that figure was at the time, it has had to be revised upwards as actual power dissipations have outpaced even the prediction [2]. It is thus not surprising that thermal design has become an integral part of the product development process for electronic systems, particularly for complex communications equipment such as the subject of this article. In the past several years, computational fluid dynamics (CFD) has become a primary design tool for packaging engineers attempting to predict the thermal behavior of such systems. However, these analysis tools are limited in their ability to predict the thermal behavior of complex electronics assemblies accurately. The accuracy of CFD codes in the analysis of electronics systems has been treated widely by many authors (see for example [4-7]) and receives only implicit consideration here. As best practice, these tools should be used with a good knowledge of their inherent limitations, and preferably in conjunction with experimental testing. Owing to questions about their accuracy (often due to unknown or inaccurate input values), the best use of CFD tools for electronics packaging is in predicting trends, in weighing the relative advantages of various design options, and to some extent in probing design envelopes.
When experimental data are obtainable, however, it is possible to calibrate the CFD model to yield sufficiently accurate results for decisive qualification of the product's thermal behavior in the field. If, in addition, the CFD code is integrated into the mechanical CAD design platform used for product architecture, thermal analysis can become a powerful driver of the entire product design process [3]. Such is the approach taken in the present study to design future generations of highly complex 3G wireless network communications equipment.

The product currently dissipates 5 kW in a relatively small footprint. It is interesting to follow its evolution from the first generation, where thermal management was a secondary consideration, to the latest generation, where thermal management was the main driver in all aspects of the product packaging, from board layout to system packaging. Figure 1 shows the first generation of the shelf-level network equipment. The unit measured 0.432 m in width, 0.6096 m in overall depth, and 0.62 m (14U) in height. The front part of the card cage consisted of two groups of seven line cards astride two management cards located in the middle two slots of a total of 16 slots in the unit. The line cards and management cards plugged into a mid-plane, the other side of which was connected to two groups of 16 I/O cards (32 in total) located in the rear slots of the unit. In a minimal configuration, the unit was populated with one management card and one line card, each with two accompanying I/O boards. The line card and corresponding I/O cards could be located in any of the 14 slots designated for line cards. In a full configuration, the unit was populated with all 48 line, management, and I/O cards and dissipated a total of 3.5 kW.

The cooling system consisted of two fan assemblies arranged in a push-pull configuration, with five high-capacity axial fans at the bottom and three high-pressure impellers at the top of the unit.

2. Thermal design objectives

The thermal design was defined to accomplish four primary goals:

• Baseline the current cooling situation with the existing fan assemblies.
• Improve the total airflow and flow distribution in the current system.
• Reduce the acoustic noise of the fan trays through cooling efficiency gains.
• Improve the heat sink design and verify that the current 400 MHz processor can be upgraded to 600 MHz and ultimately 800 MHz performance levels without overheating.

 

 

 

Figure 1: Outline of the first generation of the chassis.

3. Thermal design parameters

The four main criteria the cooling system must satisfy are the following:

• The unit must run in external ambient temperatures up to 55°C, which is defined as normal operation.
• A single fan failure defines a fault condition. The unit must be able to run indefinitely under the worst-case single fan failure.
• Normal operation requires that the system run at less than 65 dBA acoustic noise.
• Under a fault condition, the normal operating range is defined as 0-40°C; at temperatures beyond 40°C no acoustic limitations apply.
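
The acoustic criteria above can be read as a small decision rule. The Python sketch below is one interpretation of those bullets, not the author's specification: the function name, the constant names, and the choice to return None when no limit applies are all assumptions made for illustration. It also assumes that the 65 dBA limit still applies under a fault condition up to 40°C, which the criteria imply but do not state outright.

```python
from typing import Optional

NORMAL_MAX_AMBIENT_C = 55.0   # maximum ambient for normal operation (from the criteria)
FAULT_ACOUSTIC_MAX_C = 40.0   # under a fault, the acoustic limit is assumed to apply up to here
ACOUSTIC_LIMIT_DBA = 65.0     # noise must stay below this value when a limit applies


def acoustic_limit_dba(ambient_c: float, fan_failed: bool) -> Optional[float]:
    """Return the applicable noise limit in dBA, or None if no limit applies."""
    if ambient_c > NORMAL_MAX_AMBIENT_C:
        raise ValueError("ambient is outside the 55 C design envelope")
    if fan_failed and ambient_c > FAULT_ACOUSTIC_MAX_C:
        # Fault condition above 40 C: cooling takes priority over noise.
        return None
    return ACOUSTIC_LIMIT_DBA
```

For example, `acoustic_limit_dba(45.0, fan_failed=True)` returns None under this reading, while the same ambient with all fans running is still held to 65 dBA.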

Acoustics and thermal controllers

The existing fan trays operated at only two speed settings: slow and fast. The slow setting was 80% fan speed, at which the measured acoustic noise was ~64 dBA. The fast setting was 100% fan speed and was used for fault mode only, as it exceeded the acoustic limit.
New thermal controllers were to be designed as part of the new thermal architecture, regardless of any changes made to the current fan assemblies. The new controllers would operate with linear speed control through the entire speed range of the fans to maintain a maximum air exhaust temperature of 70°C and a maximum noise level of 65 dBA under normal operation.
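
To make the idea of linear speed control concrete, the sketch below maps exhaust air temperature to a fan speed command along a straight ramp. It is only an illustration and not the controller that was built: the ramp start temperature and the minimum fan speed are hypothetical values, and only the 70°C exhaust ceiling comes from the text. A real closed-loop controller would also need feedback tuning and some protection against speed hunting.

```python
def fan_speed_percent(exhaust_c: float,
                      ramp_start_c: float = 50.0,   # assumed: temperature where the ramp begins
                      exhaust_max_c: float = 70.0,  # maximum exhaust temperature from the text
                      min_speed: float = 30.0,      # assumed minimum fan speed, percent
                      max_speed: float = 100.0) -> float:
    """Linearly map exhaust air temperature to a fan speed command in percent."""
    if exhaust_c <= ramp_start_c:
        return min_speed
    if exhaust_c >= exhaust_max_c:
        return max_speed
    # Fraction of the way along the ramp, between 0 and 1.
    fraction = (exhaust_c - ramp_start_c) / (exhaust_max_c - ramp_start_c)
    return min_speed + fraction * (max_speed - min_speed)
```

With these assumed settings, an exhaust temperature halfway up the ramp (60°C) commands a speed halfway between the minimum and maximum.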

Redundancy

One aspect of the system that needed to be investigated was the possibility of eliminating the top fan assembly altogether, or at least relegating it to a redundant role. If the system could be cooled using only the bottom fan assembly, then the top tray would be required for only three reasons:

• Removal of the lower fan tray for servicing
• Cooling of the rear I/O slots in the chassis
• Worst-case single fan failure

This was a major change which, if tested successfully, would improve the system's cost, reliability, airflow, and acoustic noise.

Future upgrade

At the time of the study, four SiByte processors were used on each line card, each dissipating 8.5 W. The product roadmap for this unit included a plan to migrate to two progressively faster SiByte processors dissipating 10.5 W and 12 W respectively, or to dual low-voltage Intel Nocona processors, which were expected to dissipate 55 W each. The re-design had to account for these higher wattages, since the next-generation boards would be designed to plug into the current chassis.

4. Baseline of the current system

Establishing the thermal performance of the current system involved both lab testing and detailed computational fluid dynamics simulation. The essence of this approach was to use the test results to validate the CFD model. Once the model was fully validated, further design changes would be made in the model to maximize the cooling capacity of the thermal management solution. A side benefit of this optimized cooling was expected to be reduced acoustic noise, achieved by running the air movers in the system at lower speeds.

In the following, the procedure and results of the lab testing are presented first, followed by the results of the benchmark CFD model and comparisons with test data.

4.1 Lab testing

Total airflow tests

The first step taken to baseline the cooling performance of the existing chassis was to measure the total airflow. To accomplish this, a fully populated chassis was attached to a wind tunnel to measure the total airflow rate through the system at default fan speeds, and then at full fan speeds.

To measure the gross airflow, the unit was first powered at the desired fan speeds, then the supply fan of the wind tunnel was used to zero the static pressure in the wind tunnel chamber to which the unit was attached. The flow through the unit and wind tunnel was then measured to obtain the total airflow delivered by the fan assemblies in the unit. Note that without the pressure compensation from the supply fan of the wind tunnel, the air movers in the unit would have been driving the flow resistance of the wind tunnel as well as that of the unit, which would have shown up as a negative (vacuum) static pressure at the inlet of the unit.

The total airflow measured in the unit was 169 cfm at normal (80% of maximum) fan speeds and 183 cfm at full fan speeds. This was very poor airflow when compared with the potential capacity of 1500 cfm that the bottom fan assembly could deliver.

Figure 2: The unit attached to a wind tunnel in order to measure the total airflow rate.

Board and component level tests

In component and board level testing, each of the four types of boards was instrumented with thermocouples to measure the surface temperature of critical heat-dissipating components. In addition, air velocity and temperature probes were placed on each board at positions selected to give good feedback on the airflow through each slot. These air probes were typically located at the center of the boards and near critical components. All the thermocouples used in the measurements were T-type, and all the airflow probes were CAFS model 220 probes, which have a range of 150-1000 fpm. The thermocouples were polled with a Keithley Model 2700 multimeter, and the air probes were polled with an ATM-24 airflow monitor. All instruments were interfaced to the Testmetrics test software, which also runs the wind tunnel. Testmetrics logs all the raw data directly to a spreadsheet, and automatically calculates and enters data summaries into preformatted reports. A high-speed thermal camera (Fig. 3) was used to scan each board, primarily to identify unsuspected trouble spots, which were subsequently instrumented with thermocouples.

A typical board-level instrumentation of one of the line cards is shown in Fig. 4. Figure 5 shows a sample thermal scan of the line card in slot #1. Much the same temperature pattern was repeated, with slight variations, in the packet accelerator slots that were scanned. Note that since these are surface temperatures, the thermal scans did not necessarily reflect the temperature of chips with attached heatsinks. The thermal scans were used primarily during set-up to locate areas of concern, which were then instrumented with thermocouples. Because they conveniently provide an overall temperature map, the thermal scans were used more extensively to validate the final solution.

Figure 3: A line card being scanned with a thermal camera.

Figure 4: One of each type of board was instrumented with several thermocouple probes to measure component surface temperatures, and several ATM probes to measure local air velocities and temperatures. An instrumented line card is shown here.

Figure 5: A sample temperature scan of the line card in slot #1.

4.2 Thermal analysis

A 3D CAD model of the unit was built in I-DEAS software for the purpose of carrying out the analysis. In I-DEAS ESC, simulation is done directly on the design geometry, so the CFD model can be made geometrically exact. This feature was especially helpful because the CFD model was the central focus of the re-design effort, and it made information exchange with the mechanical design and board layout groups easier. Detailed solid models of the various cards in the system were also built, with every major heat-dissipating component and every major flow blockage discretely modeled. The CFD CAD model is shown in Fig. 6.

Figure 6: The assembled CFD model. The chassis and internal assembly are shown in the first row. Rows 2 and 3 show the seven different types of board that were used to populate the unit.

5. Comparison of benchmark model results with experiment

5.1 Airflow rates and flow distribution

The total flow rate measured in the system at full fan speeds was 183 cfm, compared with a model-calculated value of 192 cfm. The discrepancy between the two values was less than 5 percent, which was deemed sufficient validation of the model.
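
The stated discrepancy can be checked with a one-line calculation. The helper below is a hypothetical convenience function, not part of any tool used in the study; it expresses the model value relative to the measurement.

```python
def percent_discrepancy(predicted: float, measured: float) -> float:
    """Relative difference of a prediction from a measurement, in percent."""
    return 100.0 * (predicted - measured) / measured


# Model-calculated 192 cfm against the measured 183 cfm at full fan speed.
print(f"{percent_discrepancy(192.0, 183.0):.1f}%")  # prints 4.9%
```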

Figure 7: Air temperature and velocity in slot #1. Note the large recirculation regions in the I/O slots. These recirculation zones must be eliminated.

Figure 7 shows the velocities and temperature in slot #1. As may be observed, in addition to the low velocity values overall, there were unacceptable dead spots, especially in the rear slots. The components in these dead spots could not be adequately cooled regardless of the total airflow through the unit. As will be shown later in the discussion, the flow distribution from one slot to the next was also very poor. The re-design effort concentrated primarily on improving the airflow situation in terms of three criteria: maximizing the total airflow, eliminating any dead spots, and improving the flow distribution among the slots.

5.2 Component and board temperatures

The component and board temperatures calculated by the benchmark model are shown in Figs. 8-10. Table 1 shows a more detailed chip-to-chip comparison. All component temperature results were taken from the worst-case slot for each type of board. For the packet accelerator cards, the worst-case slots were the end slots.

Note that the temperature correlation between model and experiment, while important, was less important than the flow correlation. This was because the calculated (and measured) temperatures depended strongly on the actual heat dissipations of the individual chips, a quantity that was only a best estimate and that also depended on the actual traffic processed by the system. While worst-case heat dissipation values were used in modeling, it was not clear what the actual heat dissipations were in the tests. To illustrate this point, a heat balance on the measured airflow showed that the actual heat dissipation in the system was typically only 40% of the total heat dissipation put into the model. This does not mean that the worst-case values were unrealistic, only that for a particular test condition certain chips may be stressed more than others by the simulated traffic used in the test.
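
A heat balance of this kind multiplies the air mass flow by its specific heat and its temperature rise through the chassis. The sketch below shows the arithmetic under stated assumptions: the air properties (1.2 kg/m^3 and 1005 J/kg-K) are typical room-temperature values rather than figures from the article, and the 13°C temperature rise is a hypothetical number chosen only to illustrate the calculation, since the measured rise is not reported.

```python
CFM_TO_M3_PER_S = 0.3048**3 / 60.0  # exact conversion from ft^3/min to m^3/s


def heat_carried_by_air_w(airflow_cfm: float,
                          air_temp_rise_c: float,
                          density_kg_m3: float = 1.2,   # assumed air density
                          cp_j_per_kg_k: float = 1005.0  # assumed specific heat of air
                          ) -> float:
    """Heat removed by an air stream, in watts, from flow rate and temperature rise."""
    volumetric_m3_per_s = airflow_cfm * CFM_TO_M3_PER_S
    return density_kg_m3 * cp_j_per_kg_k * volumetric_m3_per_s * air_temp_rise_c


# Measured 183 cfm at full fan speed with a hypothetical 13 C inlet-to-exhaust rise.
measured_heat_w = heat_carried_by_air_w(183.0, 13.0)
fraction_of_model_load = measured_heat_w / 3500.0  # assumes the model used the 3.5 kW full load
```

Under these assumptions the air stream carries a little under 1.4 kW, which is the order of magnitude implied by the 40% figure if the model load was the 3.5 kW full-configuration value.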

Figure 8: Temperature profile of the packet accelerator card in slot #1.

Figure 9: Temperature profile of the system management card in slot #9.

Figure 10: Temperature profile of the rear I/O boards in slot #1.


6. Improvements to the system

6.1 Design of a new thermal architecture

Overall, the simulation predictions were sufficiently close to the measurements carried out in the lab. The most important conclusions drawn from the benchmark model were the following:

• Flow calculation by the model was sufficiently accurate.
• The model predicted the correct trend in thermal performance of the system.
• The model used higher heat dissipation values than the actual measurements indicated. On this account, the model temperature predictions were taken as the upper limit for a fully loaded chassis operating at full design capacity.

Using the benchmark model, several chassis changes were investigated in software to determine how the thermal management of the current system could best be improved. These changes centered on optimizing the airflow through the chassis. It was reasoned that once an optimum airflow configuration was obtained, further improvements could be made in the selection of heatsinks for particular components if necessary, especially for the new processors planned for future generations of the system.


Optimizing the airflow through the system consisted of:

• Maximizing the total airflow.
• Improving the distribution so that the total airflow is uniformly distributed among all the slots.
• Biasing more of the total airflow to the front section of the unit, where the heat density was much greater.
• Eliminating regions of recirculation as much as possible.
• Sustaining adequate airflow in the case of a single fan failure.

The logical starting point was to determine how much airflow was required to cool the unit at full heat dissipation. Based on an estimate of 3500 W dissipation and a 15°C air temperature rise through the chassis (considering a 55°C ambient), the minimum required airflow was calculated to be 410 cfm. As field tests had confirmed, the system as then designed could not provide this amount of airflow and so could not survive elevated ambient temperatures. With the minimum airflow requirement as a datum, design changes made in simulation were discarded if they failed to provide reasonable headroom above it.
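
The 410 cfm figure can be reproduced from a simple energy balance on the air stream, in which the volumetric flow equals the heat load divided by the product of air density, specific heat, and temperature rise. The sketch below assumes standard air properties (1.2 kg/m^3 and 1005 J/kg-K); the article does not state which values were used, but these reproduce the quoted result.

```python
M3_PER_S_TO_CFM = 60.0 / 0.3048**3  # exact conversion from m^3/s to ft^3/min


def required_airflow_cfm(heat_w: float,
                         air_temp_rise_c: float,
                         density_kg_m3: float = 1.2,    # assumed air density
                         cp_j_per_kg_k: float = 1005.0  # assumed specific heat of air
                         ) -> float:
    """Minimum airflow needed to carry heat_w with the given air temperature rise."""
    volumetric_m3_per_s = heat_w / (density_kg_m3 * cp_j_per_kg_k * air_temp_rise_c)
    return volumetric_m3_per_s * M3_PER_S_TO_CFM


print(round(required_airflow_cfm(3500.0, 15.0)))  # prints 410
```

Air at 55°C is less dense than the assumed 1.2 kg/m^3, so repeating the calculation with a lower density gives a somewhat higher requirement. That is one more reason to insist on headroom above the minimum.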

Omitting the intervening details, the optimized chassis design is presented in Fig. 11. Among several modifications made to the chassis (summarized in Section 6.3), the major change was to replace the three-blower top fan assembly with an assembly consisting of six 25 mm axial fans; this was done primarily to improve the flow distribution among slots and especially to provide more airflow to the outlier slots. The total airflow in the re-designed chassis was calculated to be about 540 cfm, roughly three times the airflow of the original design. Although it is outside the scope of this article, subsequent tests carried out on the re-designed chassis confirmed all the improvements suggested by the analysis. The total airflow in the actual re-designed chassis was measured to be 560 cfm, within 4% of the calculated value.
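
Two simple ratios put the calculated 540 cfm in context: the gain over the original chassis and the headroom above the 410 cfm minimum. The sketch below uses only figures quoted in the article (183 cfm measured at full fan speed in the original chassis, 410 cfm minimum, 540 cfm calculated); the function names are hypothetical.

```python
def improvement_factor(new_cfm: float, baseline_cfm: float) -> float:
    """How many times larger the new airflow is than the baseline."""
    return new_cfm / baseline_cfm


def headroom_percent(airflow_cfm: float, minimum_cfm: float) -> float:
    """Margin above the minimum required airflow, as a percentage of the minimum."""
    return 100.0 * (airflow_cfm - minimum_cfm) / minimum_cfm


factor = improvement_factor(540.0, 183.0)   # a little under 3x the original airflow
margin = headroom_percent(540.0, 410.0)     # roughly 32% above the minimum
```

A screening rule of the kind described in the text would compare `margin` against a chosen threshold. The article does not say what threshold was used.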

Figure 11: The re-designed chassis showing the new fan assemblies in a push-pull configuration.

6.2 Summary of thermal performance of the proposed design

Figure 12 shows a breakdown of the improvements obtained with the new chassis. Not only was the total airflow improved, but the proportions of airflow to the front and back slots were balanced according to the heat dissipation. The airflow in the worst-case single fan failure is also shown; even in this condition the airflow to all parts of the chassis was adequate.


Figure 12: Total airflow obtained with the proposed system, compared with the current system.

Figure 13 shows not only a marked improvement in absolute airflow values compared with Fig. 7 but, even more importantly, that the dead spots in the slots have been eliminated.

Figure 13: Airflow distribution in slot #1 of the re-designed chassis.

The temperature profile of the line card in slot #1 is shown in Fig. 14. A comparison of this temperature map with Fig. 8 shows how dramatic an improvement was obtained with the new design. The other two boards used as examples are shown in Figs. 15 and 16. The improvements in the cooling of these boards are equally clear.

Figures 17-19 show the average air speeds in the various slots and compare the re-designed chassis with the current design. The figures confirm dramatic improvements in the air velocities everywhere in the chassis.

 

Figure 14: Temperature profile of the line card in slot #1 obtained with the re-designed chassis.

Figure 15: Temperature profile of the management card in slot #9.

Figure 16: Temperature profile of the rear I/O boards in slot #1.

Figure 17: Air speeds in the front slots of the proposed system, compared with their current values. The airflow in the outside slots has been particularly improved.

Figure 18: Air speeds in the bottom back slots of the proposed system, compared with their current values.

Figure 19: Air speeds in the top back slots of the proposed system, compared with their current values.

Table 1 shows the calculated temperatures of critical components in the new system, compared with the spec values, the measured values in the benchmark (current) system, and the benchmark model values. As expected, dramatic improvements in the heat transfer were achieved with the improved airflow, and the system can be adequately cooled with the new thermal management architecture.

Table 1: Temperatures of major heat-dissipating components.

6.3 Summary of changes to the current design

The major changes implemented in the new design are summarized below. Many minor changes were also made but are not listed here.

  1. The top 3-blower fan assembly was replaced with an assembly of six 6×1” axial fans.
  2. The fan placement in the bottom tray was adjusted for better overall flow distribution in the unit.
  3. Both bottom and top fan assemblies were redesigned for minimal sheet metal usage in order to eliminate obstructions to airflow while still preserving their structural integrity.
  4. The vents were completely redesigned, both in terms of area and percentage openings.
  5. Honeycomb structures were incorporated with the vents to provide EMI shielding. The honeycombs were 92% open to airflow.
  6. All card guides were modified to minimize material and increase open area.

7. Conclusions

The important conclusion drawn from Table 1 was that the new system, using the current heatsinks, was comfortably cooled at the NEBS maximum ambient specification of 55°C. Although outside the scope of this presentation, new and improved heatsinks were designed for the major heat-dissipating components; these heatsinks improved the cooling situation even further.

One other major aspect of the re-design not discussed here is the redesign of the thermal management controller to provide closed-loop thermal control of the unit using continuously variable speed control between the minimum and maximum speeds of the fans, as opposed to the old system, which had only two speed settings. Because of the much improved cooling margin, the new controller was able to regulate the fan speeds to meet the acoustic requirement with margin up to the maximum design ambient temperature. The noise output of a fully loaded chassis at the upper limit of 55°C was only 59 dBA.

References

1. "Heat-Density Trends in Data Processing, Computer Systems, and Telecommunications Equipment", Uptime Institute, The Uptime Institute White Paper, Version 1, 2000.
2. "Datacom Equipment Power Trends and Cooling Applications", ASHRAE TC9.9, Mission Critical Facilities, Technology Spaces, and Electronic Equipment, 2005.
3. "Electronics Cooling Reference Documentation", MAYA Heat Transfer Technologies, 1999.
4. Ellison, Gordon N., "Thermal Computations for Electronic Equipment", Robert E. Krieger Publishing Co., FL, 1989.
5. Lasance, C., "The Conceivable Accuracy of Experimental and Numerical Thermal Analyses of Electronic Systems", IEEE CPT 25, 2002, pp. 366-382.
6. Prakash, C., "Application of Computational Fluid Dynamics for Analyzing Practical Electronics Cooling Problems", Proc. 20th International Symposium on Heat Transfer in Electronic and Microelectronic Equipment, Dubrovnik, Yugoslavia, Aug. 1988.
7. Rodgers, P., Eveloy, V., Davies, M.R.D., "An Experimental Assessment of Numerical Predictive Accuracy for Electronic Component Heat Transfer in Forced Convection: Parts I and II", Transactions of the ASME, Journal of Electronic Packaging, Vol. 125, No. 1, 2003, pp. 67-83.
