
strategic assessment and recommendations


Contents

1. As a result, cooling is not uniform and space is not used efficiently. To better utilize the space, it is recommended that the existing cabinets be replaced with cabinets of uniform height, which allow better power distribution and cable management. To maximize the use of the room, a second row of servers could be allocated facing the existing row. Allow four feet of space between cabinet faces (two full floor tiles) for equipment racking. For increased hot air segregation and more effective cooling, a hot aisle/cold aisle configuration should be established when the second row of cabinets is installed. Placing the two rows of cabinets facing each other, with the perforated tiles in between, will force the cool air to do work as it is pulled into the servers and vented out the rear, where it is picked up by the CRAC units, cooled, and redeployed. Blanking panels should be installed in all empty cabinet rack units to maintain air flow and heat segregation. Cabling Standards: Currently, cables are haphazardly labeled via zip ties and plastic placards. There are a significant number of dead cables evident. (Footnote 10: PDU, Power Distribution Unit; in this case, a device with many outlets fed by a larger circuit, with monitoring and alerting functions.) [Page footer: Confidential, 13 of 28, NewCo Data Center Assessment, Metagyre Inc.] Cable management is underutilized where it exists. Poor cable management raises the risk of accidental outages, complicates adds, moves
2. [Table of contents, continued] Rack Inventory and Power Information; Liebert Product Documentation; AC3; Liebert Deluxe System 3; Intellislot Web 485 Card With Adapter; General Information on UPS Batteries. Audience: This document is intended for NewCo staff only. Purpose: NewCo requested assistance in assessing the health and capacity of their data center. They contracted Metagyre to review their data center and ensure the UPS and other power components have the capacity for their current usage as well as planned growth. NewCo also asked for input on the HVAC system and for recommendations for improvements to their infrastructure, if needed. To complete this request, Metagyre went on site to analyze the current data center's power, cooling, and supporting infrastructure. While in the data center, Metagyre reviewed the facility for general efficiency. In order to provide NewCo with recommendations, Metagyre performed the fol
3. [Table of contents, continued] Floor Tiles; Raised Floor; Redundancy; Monitoring; Power; UPS Battery Age; Monitoring; Generator Start Battery; UPS; Circuit Panel Documentation and Clean Up; Phase Balancing; Cabinets; Layout and Positioning; Cabling Standards; Cabinet Power Monitoring; Recommended Improvements; Monitoring; Power and UPS; CRAC/HVAC; General Site Monitoring; [illegible]; Cabinets; Electrical; Cabling; Efficiency; Air Segregation; Appendix; Data center layouts; Current floor plan (Do Nothing Scenario); Relocated Perforated Floor Tiles (Procedural Changes and Site Clean Up Scenario); Fully Redundant Scenarios; Data center pictures
4. Liebert services to install. Alternatively, the units can be upgraded to an iCOM system. Impact: Installation of this device should not incur any outage. Benefit: Management and monitoring of the CRAC system. Used in conjunction with an SNMP-based network monitoring solution, this will provide usage data and alerts. Monitoring should be used to track unit duty cycles, ensuring proper capacity, and to prevent short cycling. Temperature should also be tracked over time to ensure heat loads are being handled and hot spots are being addressed. General Site Monitoring: 1. General SNMP monitoring of equipment in racks to verify system health and connectivity. Cost: Variable. There are a number of different packages available for different levels of event automation. Recommend looking at products similar to Solarwinds product offerings. Please note that any product purchased requires a substantial amount of configuration and optimization to see full benefit. Impact: Installation of this device should not incur any outage. There will be an elevated staff impact during the initial deployment and documentation as the products are better understood and implemented through the site. This impact on staff time should fade as the product matures in the environment. Benefit: A properly implemented monitoring system will provide detailed information for resource planning and speed time to recovery. SNMP traps are widely implemented as a default resource in prod
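The duty cycle and short cycling tracking recommended above can be illustrated with a small calculation. This is a minimal sketch only: the run intervals are hypothetical sample data, and the 5-minute minimum run time is an assumed threshold, not a figure from the assessment or from Liebert documentation.

```python
def duty_cycle(runs, window_minutes):
    """Fraction of the observation window during which the unit was running.

    runs is a list of (start, stop) times in minutes since the window began.
    """
    return sum(stop - start for start, stop in runs) / window_minutes


def short_cycles(runs, min_run_minutes=5.0):
    """Return the runs that ended sooner than the assumed minimum run time."""
    return [(start, stop) for start, stop in runs if stop - start < min_run_minutes]


# Hypothetical compressor run log for one CRAC unit over a 60-minute window.
sample_runs = [(0, 12), (20, 22), (30, 45)]

print(f"duty cycle: {duty_cycle(sample_runs, 60):.0%}")
print(f"short cycles: {short_cycles(sample_runs)}")
```

In practice the run intervals would come from whatever monitoring system is deployed; the point of the sketch is only that both measures fall out of the same start/stop log.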
5. causing instantaneous loss of power to the critical load. Some possible events that could lead to a partial site outage: failure of a CRAC unit, with the remaining unit unable to maintain set temperature, requiring the shutdown of less critical components to reduce heat generation; overloading an individual circuit, causing the instantaneous loss of power to a subset of equipment; mis-tracing a power cable or accidentally shaking loose additional power connections under the floor, causing the instantaneous loss of power to a subset of equipment. Risk Reduction Recommendation: The following recommendations will not address all the risks that have been highlighted but will, with minimal financial cost, reduce the threat or impact of the more likely events and should be considered for immediate implementation. At a minimum, the site should have: Operational monitoring of all critical components with tested alerting functionality. The power should be traced and properly identified on the panel schedule. Outlets should be clearly labeled with panel and circuit information. All dead network cable should be removed. Active network cable should be cleaned up and properly routed via cable management. Patch cables should have a matching unique identifier on each end to allow verification of the proper cable. All equipment should be properly mounted into the racks rather than partially mounted, stacked on other equipment, or leaning against the cabinets. Documenta
6. controlled outage (reboot) to switch over to the new power distribution. Properly installed dual-powered devices should not experience an outage. Benefit: Ease of management and monitoring. This will significantly decrease the likelihood of accidentally tripping or overloading a circuit. Additionally, if switching PDUs are utilized, remote sites can power equipment on and off as needed via a controlled and logged interface. 2. SNMP monitoring of the UPS to track load over time and alert on events. Cost: Approximately $500 per UPS for a Liebert Intellislot Web 485 Card w/ Adapter, plus Liebert services to install. Impact: Installation of this device should not result in any outage. Benefit: Management and monitoring of the UPS system. Used in conjunction with an SNMP-based network monitoring solution, this will provide usage data and alerts. Additionally, when used in conjunction with power monitoring software, the system can trigger a clean shutdown of servers and equipment in the case of an extreme failure, such as a generator failing to start or a battery problem. Monitoring the UPS load over time will also allow verification of energy cost saving efforts. (Footnote 11: See the file sl_29107.pdf in the Appendix directory for product information.) CRAC/HVAC: 1. SNMP monitoring to alert on events. Cost: Approximately $500 per unit for a Liebert Intellislot Web 485 Card w/ Adapter, plus
7. failure. Circuit Panel Documentation and Clean Up: The existing panel schedule reflects the original power layout from when the mainframe was in place with its supporting peripherals. Power distribution is via quad outlets and receptacles connected via flex duct under the raised floor. Distribution to the cabinets is haphazard, based on what outlets were near the cabinet; cabinets are connected via extension cords and under-desk-style power strips zip-tied to the cabinets. It is recommended that power be redeployed via labeled, fixed outlets above the cabinets and distributed in-cabinet by monitored PDUs. Deploying power in this fashion will facilitate the addition and removal of equipment and will prevent accidental outages due to pulling the wrong cord or knocking connections loose. Phase Balancing: Power is delivered through a three-phase configuration (A, B, and C). Currently, the B phase on the UPS is carrying twice the load of the remaining phases. This type of imbalance adds unnecessary stress and increases the load on the system. While this imbalance is not currently a critical issue, going forward it is advisable to add load to the A and C phases, making them equivalent with the B phase. This balancing can be accomplished through the recommended Circuit Panel Documentation and Clean Up project. Cabinets. Layout and Positioning: The existing cabinets are a collection of legacy cabinets that were put into production in an organic fashion
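The phase imbalance described above can be put into a single number. The following is a minimal sketch: the ampere readings are hypothetical values chosen only so that B carries twice the load of A and C, as the text states; they are not measurements from the site, and the "largest deviation from the mean" measure is one common convention, assumed here for illustration.

```python
def phase_imbalance(loads):
    """Largest deviation from the mean phase load, as a fraction of the mean.

    loads maps a phase name to its measured load (any consistent unit).
    """
    mean = sum(loads.values()) / len(loads)
    return max(abs(value - mean) for value in loads.values()) / mean


# Hypothetical readings in amperes: B carries twice the load of A and C.
readings = {"A": 20.0, "B": 40.0, "C": 20.0}
imbalance = phase_imbalance(readings)

print(f"imbalance: {imbalance:.0%}")
```

Moving load onto A and C until all three phases match, as recommended, drives this figure toward zero.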
8. meant to group tasks together that make sense technically and financially, with each scenario expected to build on the previously described. Do Nothing: The current site has developed organically over the last decade, and the lack of organized site management is evident. Move, add, and change order work is difficult to perform on the site, with legacy cabling knotted through blind paths and under floors. Electrical sources are randomly used, and many connection points are hidden under the floor via extension cables. While the site has had few major outages, remote sites now depend on it more and more for their disaster recovery needs. The lack of procedure and organization will wear on the site and increase the risk of outages. With the addition of the San Diego equipment, the site will be at or near maximum capacity based on power and cooling limitations. Doing nothing will require the least investment in the short run but significantly limits the site's agility and increases the complexity of resolving data center incidents in the future. Administration of the site will become increasingly difficult as more technology is added if the current practices remain. Procedural Changes and Site Clean Up: Significant risk reduction can be accomplished through procedural standards and site clean up. While these changes will not add redundancy or resiliency, they will provide alerts, aid in the recovery of failures, and prevent incidents caused by poor practic
9. receptacles. Cost: Approximately $2,000. Impact: Would require work in the area by skilled trades. While there would be no expectation of an outage, additional personnel in the data center always increases risk. Benefit: Currently, power is distributed via flex duct under the floor and connected to the racks by power strips or to devices directly. The panel schedule on the UPS is out of date and reflects the power distribution from when the mainframe was in place. Reworking the power distribution with the PDU project above would allow monitored PDUs to verify capacity available. Cabling: 1. Establish, document, and maintain color and labeling standards. Recommend using a single color for each network path. Patch cables should be labeled with a serial number and their length on each end. Cost: Approximately $3,500, depending on local trade rates. Impact: Any singly networked devices will experience a controlled outage to switch over to the new cable. Properly installed dually homed network devices should not experience an outage. Benefit: Properly labeled networking cables improve device manageability and reduce the likelihood of accidentally removing the wrong cable during a change procedure. New devices are easier to install in a controlled and maintained fashion. Efficiency. Air Segregation: 1. Utilize the space above the ceiling by ducting the overhead plenum into
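The labeling standard recommended above (a serial number plus the cable length on each end) can be sketched as follows. The label layout "PC00042-7FT" is a hypothetical format invented for this illustration; the assessment does not prescribe one.

```python
def make_label(serial, length_ft):
    """Build the text printed on both ends of one patch cable (hypothetical format)."""
    return f"PC{serial:05d}-{length_ft}FT"


def parse_label(label):
    """Split a label back into (serial, length_ft)."""
    serial_part, length_part = label.split("-")
    return int(serial_part[2:]), int(length_part[:-2])


def ends_match(end_a, end_b):
    """True when the labels read at the two ends describe the same cable."""
    return parse_label(end_a) == parse_label(end_b)


label = make_label(42, 7)
print(label, parse_label(label))
```

A technician reading one end of a cable can then confirm the far end by label alone, without tracing the cable through the cabinet.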
10. the CRAC units. This allows directed heat evacuation and better efficiency. Clear out all debris from under the raised floor to allow proper air flow. Cost: Approximately $2,000, depending on local trade rates. Impact: Would require work in the area by skilled trades. While there would be no expectation of an outage, additional personnel in the data center always increases risk. Benefit: Improved air flow will allow the CRAC units to operate more efficiently at a lower cost, reduce their work load, and extend the life of the equipment. Appendix. Data center layouts: Current and suggested layouts. Please reference the file Floorplan.vsd for the image source. All equipment is represented in the various layers, allowing different combinations to be explored. [Figure: Current floor plan (Do Nothing Scenario)] [Figure: Relocated Perforated Floor Tiles (Procedural Changes and Site Clean Up Scenario). The Increase Site Capacity Scenario is identical as well.] [Figure: Fully Redundant Scenarios] Data center pictures: [Caption] The EPO (Emergency Power Off) may not be [Caption] Cable ma
11. 1. Utilize a uniform cabinet style capable of better deploying PDUs, blanking panels, and cable management. Cost: Approximately $1,000 to $2,000 per cabinet plus shipping for cabinets similar to those deployed in the Dallas data center. Impact: Equipment would be migrated to the new cabinets in a scheduled, rolling fashion to minimize outages and allow process control. Benefit: The current set of cabinets has grown organically from previous endeavors. Differences in height, width, depth, and rail type require all installations to be ad hoc in nature. New cabinets would ease installation and removal of equipment and would reduce the time required for the clean up of the existing equipment. 2. Install a ladder system to bring cabling up out from under the floor, where it can be more easily maintained. Cost: Approximately $3,000, depending on local electrician rates. Impact: Would require work in the area by skilled trades. While there would be no expectation of an outage, additional personnel in the data center always increases risk. Benefit: Currently, power and network cabling is routed under the floor tiles, which both restricts air flow and makes cable tracking and clean up much harder. Routing inter-cabinet cabling via overhead ladders will clear up the under-floor plenum for better air flow and improve overall cable maintenance and management. Electrical: 1. Re-home power to the cabinets with dedicated breakers and properly labeled
12. NewCo Data Center Assessment, by Metagyre, Inc., September 27, 2010. Metagyre Account Manager Contact: Paul Thompson, 1249 NW Arcadia Ct., Suite 300, Poulsbo, WA 98370; voice 360-697-3386; fax 360-697-6676; pthompson@metagyre.com. Table of Contents: Audience; Purpose; Site Overview; Current Status; Resiliency; Redundancy; Existing Risks and Single Points of Failure; Risk Reduction Recommendation; Scenarios; Do Nothing; Procedural Changes and Site Clean Up; Add Critical System Redundancy; Increase Site Capacity; Add Critical System Redundancy And Increase Site Capacity; Technology Specific Findings; Computer Room Air Conditioning
13. and changes to the equipment, and can reduce cooling efficiency by reducing air flow. The recommended corrective action is to initially remove the dead cable, followed by either implementing a patch panel system for inter-cabinet cabling or developing a ladder system. In either case, additional cable management should be used and standardized cabling practices implemented. Having patch cables pre-labeled with a unique serial number on both ends, along with the cable length, ensures cables are verifiable from both ends. Cabinet Power Monitoring: Today, power distribution in the racks is handled by power strips on extension cords to quad outlets under the floor. In this configuration it is impossible to gauge power consumption on a cabinet-by-cabinet basis. It is recommended that power be redistributed off fresh breakers so the circuits can be tracked, labeled, and deployed on monitored PDUs in the cabinets to ensure that no overload conditions exist. Since Chicago is a DR site for remote locations, switched PDUs might be used to allow remote staff to power equipment on and off. Recommended Improvements. Monitoring. Power and UPS: 1. Utilize monitored PDUs to track circuit load and ensure proper redundancy is maintained. Cost: Approximately $450 per PDU, typically installed as a pair per rack. Additionally, an electrical contractor will be required to install branch circuits appropriately with the proper receptacles. Impact: Any singly powered devices will experience a
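The PDU hardware cost quoted above scales directly with the rack count. A minimal sketch of the arithmetic, assuming the five racks mentioned in the site overview, a pair of PDUs per rack, and the approximate unit cost given in the text (taken here to be US dollars); electrical contractor work is excluded because the assessment gives no figure for it.

```python
def pdu_hardware_cost(racks, pdus_per_rack=2, unit_cost=450):
    """Estimated PDU hardware cost, excluding electrical contractor work."""
    return racks * pdus_per_rack * unit_cost


# Assumption: the five existing racks each receive one pair of monitored PDUs.
estimate = pdu_hardware_cost(racks=5)
print(f"estimated PDU hardware cost: {estimate}")
```

If the cabinets are replaced or a second row is added, the rack count changes and the estimate should be recomputed.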
14. apter: See information included in the zip package. General Information on UPS Batteries: See battery maintenance and service information included in the zip package.
15. cials, support backup capabilities and intra-company networking for NewCo Financials. The Data Center is evolving to support the additional purpose of DR and backup functionality, along with additional processing of new branch office related transactions. DR and backup functionality will be expanded for the NewCo Financials locations in San Diego, California; Houston, Texas, for the NewCo Call Center; and Atlanta, Georgia. In addition, the facility will house servers networked with systems in the South NewCo Data Center in Atlanta, Georgia, for transaction load balancing and data networking functions. While these are the immediate plans, it is anticipated that the Data Center in Chicago, Illinois, will evolve to become a fully redundant facility to support the real-time fail-over needs of the entire organization. Resiliency: Resiliency is the ability of the site to maintain operation in the case of failure through alternative methods. The battery capacity of the UPS should allow adequate time at full load for the ATS (Automatic Transfer Switch) to spin up the generator and transfer critical load from the interrupted utility. No other critical systems on site were observed to have resilient features. Redundancy: Redundancy is the duplication of critical components in a system in order to maintain operations in the event of an equipment failure. There are two CRAC (Computer Room Air Conditioning) units providing cooling to the room. At the current l
16. d an additional CRAC unit will bring the site to N+1 capacity. With these components in place, the site would be able to sustain a complete failure of any individual major component and maintain full operation. The addition of a like-capacity CRAC unit should not be cost prohibitive and would provide good value, as CRAC units regularly need to be brought down for service and repair. The addition of a second UPS could be a much larger project if the ATS does not have the capacity to support a second unit. Adding a second UPS would give all dual-power-supplied equipment in the site a complete second path. In the case of a failure of the generator, the divided load on the UPS would provide a longer interval before the batteries are depleted, allowing additional time for a graceful server shutdown or critical repair. Completing the tasks in this section should alleviate the identified risks and single points of failure outlined above. Increase Site Capacity: If the existing level of risk is acceptable but a substantial addition in equipment above the San Diego deployment will be placed in Chicago, an alternative option is to increase the capacity of the site without increasing the redundancy or resiliency. Replacing the existing UPS and adding another CRAC unit of equal capacity could add up to 50% to the critical electrical load serviced by this site. Generator capacity will need review under full load to ensure it has sufficient capacity to ha
17. e to promote even air distribution in the room. Recommend moving the power and networking cables above the floor. [Figure: Recommended Floor Tile Layout] A best practice is to suspend power and cabling from ladders or in channels on cabinet tops. Clear any excess equipment from under the floor (old backup tapes, extra floor tiles, etc.). These items reduce air flow and should not be stored under the floor. Redundancy: The two Liebert Challenger 3000 CRAC units are matched in capacity to the 30 kVA of the installed UPS. Currently, the room is at, if not slightly over, the threshold of a single unit. It is recommended that an additional 5-ton CRAC unit be installed in the room with a variable speed fan and an intelligent management system to balance the load between the units. This configuration would provide N+1 capacity and allow a unit failure or maintenance without service disruption. Monitoring: Currently, the CRAC units are not monitored for fault or duty cycle. An earlier attempt to provide duty balancing was made but for some reason did not detect a head pressure fault that tripped the active unit and resulted in no running CRAC. The AC3 unit in place should be wired to the alarm wires on the CRAC units to allow it to detect when a fault occurs so it can activate a standby unit. The AC3 is capable of run
18. es. • Operational monitoring of all critical components with tested alerting functionality. • The power should be traced and properly identified on the panel schedule. Outlets should be clearly labeled with panel and circuit information. • All dead network cable should be removed. Active network cable should be cleaned up and properly routed via cable management. Patch cables should have a matching unique identifier on each end to allow verification of the proper cable. All equipment should be properly mounted into the racks rather than partially mounted, stacked on other equipment, or leaning against the cabinets. Documentation regarding procedures for equipment maintenance, emergency procedures, and relevant asset information should be developed and kept current. • The batteries in the existing UPS should be replaced on a proper schedule. • Verify the alarm wire functionality of the Liebert AC3 controlling the CRAC units. The current configuration is full-on rather than activating the standby unit in case of fault, resulting in the units fighting. • Check with the Fire Marshal regarding fire codes for EPOs. In many cases, the fire department does not actually use them. There have been a number of high-profile outages in large data centers due to accidental or malicious triggering of the EPO. Add Critical System Redundancy: The addition of a second UPS an
19. hlight details of individual improvements. Depending on the overall future goals of the data center, budget, and business risk appetite, individually these changes will have varying impacts on the bottom line. Computer Room Air Conditioning. Floor Tiles: The perforated tiles are not laid out to optimize air flow. The perforated tiles should be reallocated to better segregate heat and place the cool air where it will do the most work. A number of tiles are still located as they were when the mainframe was in place. Larger versions of the drawings below are included in the appendix. [Figure: Current Floor Tile Layout] The current perforated tile placement is shown to the right. The square grid lines represent floor tiles. Perforated tiles are denoted as dot-filled squares. The recommended tile placement is shown to the right. Initially it may only require a single row of tiles in front of the existing cabinets. With the addition of the San Diego equipment, part or all of a second row may be required. Raised Floor: The 8-inch raised floor is quite low by today's standards, and every effort should be made to keep the under-floor space as clear as possibl
20. lowing activities: • On-site walk-through of data center • Reviewed equipment maintenance logs • Discussed data center goals with client • Performed power audit (UPS, panel, PDUs) • Performed cooling audit (HVAC, heat segregation) • Evaluated the data center against industry best practices. (Footnote 1: UPS, Uninterruptible Power Supply: an electrical apparatus that provides emergency power to a load when the input power source, typically the utility mains, fails.) Site Overview. Current Status: The Chicago data center was designed to accommodate up to 24 kVA of critical load in a non-redundant fashion. The Computer Center was originally built to house a Unisys mainframe and related peripheral devices, purposed servers for mail, remote login access, and data archiving, as well as SONET and other data communication equipment. The heat segregation and power distribution infrastructure was built to match the data center's original mainframe purposes. Over time, the support skill set and working computational load migrated to Microsoft-based systems, and the mainframe was decommissioned. Since its removal, the mainframe has been replaced with five racks of various servers and storage equipment. The current focus of the site is to support the business needs of the Chicago-based branch offices, provide the infrastructure required by the Chicago-based employees of the parent company NewCo Finan
21. nagement and proper mounting of production equipment reduce the risk of accidental outages. [Caption, continued] required by local fire codes and may be an unnecessary risk for the facility. [Caption] Using under-desk-style power strips and extension cords increases the risk of accidental outages. [Caption] Improper cable management increases time to recovery and increases risk of related breakage during troubleshooting. [Caption] With limited plenum space under the floor, any unnecessary items will restrict airflow. [Caption] Non-uniform cabinets increase the complexity for moves, adds, and changes. Rack Inventory and Power Information: Please refer to Chicago.xls, included in the zip package. The first tab includes an accurate representation of the San Diego equipment expected and the loads it will incur on the site. Additional tabs include the current inventory of the site. Liebert Product Documentation. AC3: See information included in the zip package. Liebert Deluxe System 3: See information included in the zip package. Intellislot Web 485 Card With Ad
22. ndle the increased demand. Additionally, the CRAC units would likely need to be ducted into the ceiling plenum to better segregate the heat generated. Due to the limitations of the raised floor height and the limited space in the ceiling plenum, it is not expected that more than a 50% increase could be accomplished safely. While a feasible option, the effort and cost should be weighed against relocation to a new owned site or collocation. Add Critical System Redundancy And Increase Site Capacity: If the site will be adding business-critical services beyond the San Diego deployment, combining the two previous scenarios yields an N+1 site with greater capacity. By far the most complex and costly of the scenarios described, the effort and cost should be weighed against relocation to a new owned site or collocation. (Footnote 6: N+1 denotes that a single device can fail for a particular system without operational impact. For example, if two CRAC units can maintain the set temperature for the site and four CRAC units are installed, the site would be N+2. An N site has no redundancy.) (Footnote 7: ATS, Automatic Transfer Switch: in the event of utility power failure, the ATS will trigger the generator to spin up and then cut the UPS and CRAC units from the utility power to the generator to provide for the critical load of the site.) Technology Specific Findings: The following recommendations hig
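The N+x notation defined in footnote 6 above reduces to a subtraction: units installed minus units required to carry the load. A minimal sketch, with function and parameter names chosen for illustration only:

```python
def redundancy_level(installed, required):
    """Describe a system's redundancy in N+x notation.

    required is the number of units needed to carry the load (N);
    installed is the number of units actually in place.
    """
    if required <= 0:
        raise ValueError("required must be at least 1")
    if installed < required:
        raise ValueError("installed units cannot carry the load")
    spare = installed - required
    return "N" if spare == 0 else f"N+{spare}"


# The footnote's example: two units can hold set temperature, four are installed.
print(redundancy_level(installed=4, required=2))
```

Applied to the Chicago cooling system as described in the report, two installed CRAC units with both required after the San Diego addition would evaluate to "N", which is the loss of redundancy the assessment warns about.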
23. ning up to 3 CRAC units. For greater control and better energy efficiency, talk to the Liebert sales representative about their iCOM system. Once the CRAC units are networked to it, the iCOM system will manage the variable speed fans to most efficiently meet the target temperature set points in the room, reducing overall wear, tear, and cost. In addition, the system will detect a failure of a CRAC unit and ramp up the remaining units to account for the failure. The iCOM system should provide a savings in power cost for the site over time that will make it more attractive in the long run. If the iCOM system doesn't provide network monitoring for the current CRAC units, ask the Liebert representative about the Liebert IntelliSlot Web 485 Card w/ Adapter. The adapters are about $500 each; this card allows the CRAC units to be monitored by the corporate SNMP monitoring/trap system, providing a common alerting interface to support staff. Use of a system like this also prevents the units from fighting, that is, having one unit cooling while another is heating to try to maintain set temperature. This fighting occurs as each unit tries to maintain a temperature independently of the other units. During the on-site survey, the units in Chicago were observed to be fighting each other, dramatically raising their work load. Power. UPS Battery Age: With the exception of a few that have been replaced, the existing batteries are original to the system, making the battery st
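The "fighting" condition described above, one unit cooling while another heats, is simple to detect once each unit's operating mode is visible to a monitoring system. A minimal sketch, assuming the monitoring system reports one of the hypothetical mode strings "cooling", "heating", or "idle" per unit; real CRAC controllers expose their state differently.

```python
def units_fighting(modes):
    """True when at least one unit is cooling while another is heating.

    modes maps a unit name to its reported operating mode.
    """
    active = set(modes.values())
    return "cooling" in active and "heating" in active


# Hypothetical snapshot of the two CRAC units.
snapshot = {"CRAC-1": "cooling", "CRAC-2": "heating"}
print(units_fighting(snapshot))
```

A check like this could raise an alert through the same SNMP-based alerting path the assessment recommends for fault conditions.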
24. oad, there is redundancy in the cooling systems, with a single CRAC capable of supporting the site. However, with the anticipated addition of the new San Diego equipment, both units will be required to provide cooling for the work load, resulting in a loss of redundancy. No other redundant components were observed. Existing Risks and Single Points of Failure: Every site has some level of risk inherent in its design. Resiliency and redundancy offset risk but increase cost and complexity. As a backup/DR site where occasional interruption can be tolerated, a lower level of resiliency and redundancy may be an appropriate level of infrastructure, provided there are adequate monitoring and response procedures. (Footnote 2: DR, Disaster Recovery.) Some possible events that could lead to a full site outage: failure of the UPS inverter, which would cause an instantaneous loss of power to the critical load; failure of the UPS rectifier, which would necessitate cutting over the UPS to maintenance bypass before the batteries ran out; failure of the battery string in the UPS, which would cause a loss of power in the case of a utility failure; failure of the generator to start within the time allowed by the UPS, causing power loss to the critical load; accidental EPO (Emergency Power Off) switch activation, causing instantaneous loss of power to the critical load; overloading the UPS, tripping the main breaker
25. ring ten years old. Typically, UPS batteries are scheduled for replacement every 5 to 8 years depending on make, with smaller batteries tending to the shorter lifespan.

    Monitoring

    The Liebert Series 300 UPS can be upgraded with a Liebert IntelliSlot Web/485 Card with Adapter that costs approximately $500 plus the service call and allows the unit to be monitored by a standard corporate SNMP monitoring/trap system, providing a common alert interface.

    Generator Start Battery

    An incident occurred in the past where the generator did not start on an outage. The root cause identified was a dead starter battery. This type of outage is mitigated by proper maintenance on the generator, with regular system start-up, scheduled testing, and replacement of the starter battery. A regular maintenance cycle and testing procedures should be documented and followed. This maintenance and testing can be outsourced to a generator/UPS maintenance company if required.

    [8] AC3 User Manual is included in the appendix.
    [9] See appendix for additional information on iCOM system.

    UPS

    The current UPS system is sized to 30 kVA to support the current data center limit of 24 kVA. This does not provide any redundancy in the case of a catastrophic failure of the UPS. A second UPS would allow for dual power circuits to each server, supporting UPS testing and maintenance without placing the servers at risk of a power
26. tion regarding procedures for equipment maintenance, emergency procedures, and relevant asset information should be developed and kept current.

    • The batteries in the existing UPS should be replaced on a proper schedule.
    • Verify the alarm wire functionality of the Liebert AC3 controlling the CRAC units. The current configuration is "full on" rather than activating the standby unit in case of fault, resulting in the units fighting.
    • Check with the Fire Marshal about the fire codes covering EPOs. In many cases the fire department does not actually use them. There have been a number of high-profile outages in large data centers due to accidental or malicious triggering of the EPO.

    Details of each of these recommendations can be found in the Technology Specific Recommendations section below.

    [3] CRAC: Computer Room Air Conditioner.
    [4] CRAC units are said to be "fighting" when different units are heating and cooling simultaneously to maintain the set temperature.
    [5] EPO: Emergency Power Off button, typically located near the entrance to the data center. Activating the EPO will cut all power to the site instantaneously.

    Scenarios

    Management will need to dictate the ultimate role for this facility. Depending on the planned use of the space, a number of options are available. The scenarios below are
27. ucts and having an intelligent parser of these alerts that can forward to pagers, cell phones, or email with proper escalations allows a limited technical staff to notice and work issues before they impact productivity. SNMP traps are today's industry standard for monitoring nearly all equipment in a data center.

    2. Access control logs for incident tracking, change control and management

    Cost: Minimal. This can be as simple as a virtual machine dedicated to syslogging and a binder or spreadsheet.

    Impact: Process change; requires rigorous methodology to maintain. The ITIL framework was built out of the need for change control in the data center, and it provides a good foundation for developing the appropriate process controls.

    Benefit: A basic tenet of ITIL is change management and tracking. Most outages are caused by the last change to the environment. Having all systems on a common clock and logging to a single parsable format for event correlation speeds time to recovery in the event of an outage. Tracking access to the site and the purpose (maintenance, correction, addition, replacement) simplifies the reconstruction of events, enabling faster recovery from errors.

    [12] See the file sl 29107 pdf in the Appendix directory.
    [13] Information Technology Infrastructure Library; information can be found at http://www.best-management-practice.com

    Maintainability

    Cabinets
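    The event-correlation benefit described under item 2 (common clock, single parsable log format, "most outages are caused by the last change") can be illustrated with a short sketch. This is a hypothetical Python example, not a tool from the assessment: the (timestamp, kind, message) record layout, the function names, the 24-hour window, and the sample events are all assumptions made for illustration.

```python
import heapq
from datetime import datetime, timedelta


def merge_logs(*logs):
    """Merge per-system logs into one timeline ordered by timestamp.

    Each log is a list of (timestamp, kind, message) tuples that is
    already sorted by timestamp, as it would be with a common clock.
    """
    return list(heapq.merge(*logs, key=lambda entry: entry[0]))


def last_change_before(timeline, outage_time, window=timedelta(hours=24)):
    """Return the most recent 'change' entry in the window before the outage."""
    candidates = [
        entry for entry in timeline
        if entry[1] == "change" and outage_time - window <= entry[0] <= outage_time
    ]
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    # Hypothetical sample data
    access_log = [(datetime(2024, 1, 1, 9, 0), "access", "vendor entered room")]
    change_log = [(datetime(2024, 1, 1, 9, 30), "change", "moved PDU feed on rack 4")]
    alert_log = [(datetime(2024, 1, 1, 9, 45), "alert", "rack 4 lost power")]

    timeline = merge_logs(access_log, change_log, alert_log)
    outage = alert_log[0][0]
    print(last_change_before(timeline, outage))
```

    In practice the same idea applies whether the records live in a syslog server or a spreadsheet; what matters is that the timestamps are comparable across sources.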
