Disaster Recovery Site Improves Thermal Security
By Wally Phelps
Data Center Product Manager

Introduction
Increasing thermal loads in data center operations, combined with demands for greater operating security, require operators to look for innovative methods of real-time thermal management. All corporate data operations require 24/7 uptime, and in the event of a major outage the disaster recovery site's performance can literally determine the fate of the company. A major multinational manufacturer recently added thermal security (the ability to monitor and adapt to changing thermal conditions) to its disaster recovery site to ensure uninterrupted operations. As a side benefit, the solution also provides more efficient cooling. The resulting electricity savings will pay for the entire project in less than a year and will continue to add to the bottom line in subsequent years.
Problem Statement
The Katrina disaster 2,000 miles away, a CRAC failure during Indian summer, the temporary unavailability of rental spot coolers, and persistent hotspots all spotlighted the need for better thermal management. In addition, the site faced sharply rising energy costs it had not budgeted for, and the higher electricity bills cut deeply into the IT department's budget. The technology manager knew he needed a solution to both the thermal and the cost problems, and he evaluated the available options.
Previous Options
The options considered were:
- Use a hot aisle/cold aisle layout – This is the standard solution, but here the location of all the CRACs on one side of the room, and the inability to shut down the disaster recovery function's IT equipment during a move, made this option unattractive.
- Add additional CRAC capacity – The room was almost full, and there was no capital budget for buying more CRACs. More CRACs would also drive utility bills even higher. The manager knew he already had more than enough cooling capacity with his 30-ton units; one of them constantly short-cycled, a common symptom of poorly utilized CRAC capacity.
- Make rack-level changes – The diverse mix of equipment in the room didn't lend itself to being reconfigured into a new rack system, and the downtime and budget requirements made this a non-starter as well.
- Try something new – The IT manager decided to look for a more systematic approach to his thermal security problem.
The AdaptivCool™ Solution
DegreeC, a thermal engineering firm with 10 years' experience, designs the air-cooling subsystems for many leading server manufacturers. That experience allowed it to foresee, several years ago, the crucial thermal issues data centers would face. DegreeC takes a systems approach to data center thermal management: it considers each component and subsystem's individual thermal and airflow performance and how they all interact in a complex system. DegreeC also listened to the voice of IT managers around the globe and designed its solution to meet the cooling, monitoring, expansion, and 24/7/365 needs of the modern data center.
The AdaptivCool solution has a number of features and benefits:
- Makes the most efficient use of the cooling that is already paid for before any new capacity is considered.
- Provides easy-to-use monitoring tools that show at a glance the thermal health of the data center and warn of impending problems.
- Requires no downtime or facility upgrades to implement.
- Adapts automatically to changing loads or cooling capacity.
- Uses hot-swappable components throughout, so operators can easily maintain the thermal health of their data center.
- Is fully expandable as new IT equipment, new CRACs, and new rooms are added.
Objectives of the Project
The IT manager at the disaster recovery site had six main objectives. He wanted to:
- Maintain a higher cooling margin during the summer months, when the environment immediately surrounding the data center exposes it to temperature extremes.
- Fix hotspots in several areas.
- Save money on electric utilities, with payback on any expense in less than a year.
- Allow remote monitoring of the data center's thermal health.
- Plan for the almost certain expansion of the center's thermal load.
- Establish a support and maintenance plan that would ensure continued thermal performance.
Implementation
Implementation at the disaster recovery site was straightforward. The main steps were:
- Site audit to inventory and characterize the IT equipment heat sources and the facility;
- Simulation using CFD (computational fluid dynamics) modeling to predict heat and airflow in the baseline data center;
- Verification of the CFD model against measurements taken during the audit;
- Iteration on the model to determine the optimum configuration of passive and active airflow elements;
- Installation of a sensor network to monitor changes during the remaining installation;
- Installation and reconfiguration of the passive and active airflow elements;
- Verification and recertification of room thermal performance.
The data center needed to be visited only during the initial audit and the final installation and recertification phases. No downtime was incurred during any phase of the project.
Summary of Results
Cooling Margin Results
The cooling margin of the room was improved by 7°F at the top of the racks.
Max Rack Intake | Before | After
Avg.            | 73.6°F | 70.6°F
Std. Dev.       | 3.6°F  | 2.3°F
Max             | 81°F   | 74°F
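The figures above are internally consistent and can be checked with a quick calculation; the numbers below are taken directly from the table.

```python
# Before/after rack-intake temperatures from the table (degrees F).
before = {"avg": 73.6, "std_dev": 3.6, "max": 81}
after = {"avg": 70.6, "std_dev": 2.3, "max": 74}

# The 7 F cooling-margin improvement is the drop in worst-case
# (top-of-rack) intake temperature.
max_drop = before["max"] - after["max"]
avg_drop = before["avg"] - after["avg"]

print(f"Worst-case intake dropped {max_drop} F")   # 81 - 74 = 7 F
print(f"Average intake dropped {avg_drop:.1f} F")  # 73.6 - 70.6 = 3.0 F
```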
More dramatically, virtually all server intake temperatures dropped, some by as much as 14°F.
Temperature Reduction by % of Servers

Temp Reduction | % of Servers
>= 10°F        | 7%
6 to 10°F      | 16%
2 to 4°F       | 52%
Utility Savings Results
Just as one would not heat and cool a large expanse of unused space, there is no longer any reason to flood-cool a data center when only 20-30% of the floor space is occupied by racks. Flood cooling is both inefficient and ineffective, and it is no longer necessary.
By directing cooled air to the intakes that need it, by minimizing recirculation and the mixing of hot and cold air streams, and by reducing the short-circuiting of cool air, rack intake and exhaust temperatures are lowered. That means CRAC setpoints can be raised, and that saves money. In this particular case the CRAC setpoints were raised by 5°F while still keeping critical rack and server intake conditions within the ASHRAE recommended limits of 77°F maximum and 40% RH minimum. Controlling airflow and raising setpoints improves energy efficiency in three ways.
First, from a thermodynamic standpoint, less heat rejection is required of the CRACs. Second, reducing the short-circuiting of cooling air back to the CRAC raises the ΔT of the cooling system, which improves CRAC operating efficiency. Third, dehumidification of the data center is reduced, improving the sensible cooling ratio.
Moderate and achievable changes in CRAC intake temperature and humidity can mean up to a 17% increase in sensible cooling capacity while still meeting the ASHRAE Class 1 temperature and humidity specifications for data centers.
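The sensible-capacity claim follows from basic HVAC arithmetic: for a constant-speed CRAC, sensible capacity is roughly proportional to airflow times the air-side temperature difference (Q ≈ 1.08 × CFM × ΔT in BTU/hr for standard air). The sketch below is illustrative only; the airflow and ΔT values are assumptions chosen to show the scaling, not measurements from this site.

```python
def sensible_capacity_btuh(cfm: float, delta_t_f: float) -> float:
    """Approximate sensible cooling capacity of a CRAC in BTU/hr.

    Uses the standard-air rule of thumb Q = 1.08 * CFM * dT,
    valid for air near sea level.
    """
    return 1.08 * cfm * delta_t_f

# Illustrative (assumed) values, not measurements from this site:
cfm = 16_000        # airflow of one large CRAC, cubic feet per minute
dt_before = 15.0    # return-minus-supply dT with heavy bypass air, F
dt_after = 17.5     # dT after airflow management cuts short-circuiting, F

q_before = sensible_capacity_btuh(cfm, dt_before)
q_after = sensible_capacity_btuh(cfm, dt_after)
gain = (q_after - q_before) / q_before

print(f"Sensible capacity gain: {gain:.0%}")  # → Sensible capacity gain: 17%
```

Because capacity scales linearly with ΔT, a modest 2.5°F rise in ΔT here yields the same order of gain as the up-to-17% figure cited in the text.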
It should be noted that since a cooling infrastructure can contain a significant number of constant-speed devices (fans and pumps), the actual energy savings per site can vary. This particular site cannot take full advantage of its potential energy savings today because it has constant-speed components. Facilities executives are considering adding low-cost variable frequency drives (VFDs) in the future to benefit fully.
Yearly savings in electricity for cooling a 2,000 sq. ft. disaster recovery site at $0.13 per kilowatt-hour

Infrastructure     | $ Savings | % Savings
Without VFD drives | $14,800   | 18%
  $ per sq. ft.    | $7        | -
With VFD drives    | $24,000   | 30%
  $ per sq. ft.    | $12       | -
ROI in either case is less than one year.
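The table's dollar figures can be unpacked into the implied energy savings; the arithmetic below uses only numbers given in the text ($0.13/kWh, 2,000 sq. ft., and the two annual dollar-savings values).

```python
rate = 0.13   # $ per kWh, from the text
area = 2000   # sq. ft., from the text

for label, dollars in [("without VFDs", 14_800), ("with VFDs", 24_000)]:
    kwh = dollars / rate        # implied kWh of cooling energy saved per year
    per_sqft = dollars / area   # savings density
    print(f"{label}: ~{kwh:,.0f} kWh/yr saved, ${per_sqft:.2f}/sq. ft.")
```

The per-square-foot results ($7.40 and $12.00) round to the $7 and $12 shown in the table, and the implied energy savings are roughly 114,000 and 185,000 kWh per year.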
Remote Monitoring Results
AdaptivCool provides automatic monitoring of the data center's thermal health. The application is securely accessible over the customer's intranet, so the data center manager can see at a glance the thermal profile and the status of the active airflow devices. This allows an immediate response to a real thermal issue, and peace of mind when the center is running properly. Future releases of the AdaptivCool software will e-mail or page key personnel in response to an excursion outside preset limits.
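The text does not describe the monitoring software's internals; purely as an illustration, an excursion check against the ASHRAE limits cited earlier (77°F maximum intake, 40% RH minimum) might look like the sketch below. The function and data shapes are hypothetical, not the AdaptivCool implementation.

```python
# Hypothetical excursion check; not the actual AdaptivCool software.
ASHRAE_MAX_INTAKE_F = 77.0  # recommended max rack intake, from the text
ASHRAE_MIN_RH_PCT = 40.0    # recommended min relative humidity, from the text

def find_excursions(readings):
    """Return (sensor, message) pairs for readings outside the envelope.

    `readings` is a list of dicts with keys: sensor, temp_f, rh_pct.
    """
    alerts = []
    for r in readings:
        if r["temp_f"] > ASHRAE_MAX_INTAKE_F:
            alerts.append((r["sensor"], f"intake {r['temp_f']}F above limit"))
        if r["rh_pct"] < ASHRAE_MIN_RH_PCT:
            alerts.append((r["sensor"], f"RH {r['rh_pct']}% below limit"))
    return alerts

sample = [
    {"sensor": "rack-07-top", "temp_f": 78.5, "rh_pct": 45.0},  # too hot
    {"sensor": "rack-12-mid", "temp_f": 72.0, "rh_pct": 38.0},  # too dry
]
for sensor, msg in find_excursions(sample):
    print(f"ALERT {sensor}: {msg}")  # a production system would e-mail or page
```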
A Tool for Expansion Planning
Data centers are constantly changing, and this site is no exception. One of the challenges facing this center is knowing where to place its new IBM BladeCenter systems and what cooling challenges they will pose. Since the CFD model had already been verified against actual measured conditions in the room, it was a straightforward task to simulate various configurations and placements for the new equipment.
Project Summary
Data center cooling technology has remained virtually unchanged for 30 years, while IT equipment refreshes every three years and heat densities rise with each new generation. Simply adding more raw cooling capacity is rarely the right answer. Targeting the available cooling where it is needed most and controlling airflow precisely are the sensible approaches, and they pay dividends in equipment uptime, energy costs, and real estate.
This case study demonstrated that by carefully analyzing all the heating and cooling factors affecting a data center, and how they interact, responsible data center professionals can increase the operating and thermal security of their centers and save money while doing so.