Table of Contents
4.0 Plan Objectives and Overview
5.0 Disaster Planning
5.1 Disaster Risks and Prevention
5.2 Disaster Preparation
5.3 Backup Procedures
6.0 Initiation of Emergency Procedures
6.1 Safety Issues
6.2 Disaster Notification List
6.3 Activation of Disaster Recovery Plan
6.4 Equipment Protection and Salvage
6.5 Damage Assessment
6.6 Emergency Procurement Procedures
7.0 Initiation of Recovery Procedures
7.1 Cold Site Preparation
7.2 Platform Recovery Procedures
7.3 Application Recovery Procedures
7.4 Critical Applications
8.0 Maintaining the Plan
Contingency and disaster recovery refers to the criteria and procedures used to guide management and technical staff in the recovery of computing and network facilities operated by the College of Public Health Office of Information Technology in the event that a disaster destroys all or part of the facilities.
In accordance with the “security standards” incorporated into the Health Insurance Portability and Accountability Act (HIPAA), a plan for disaster recovery must be an integral part of the College of Public Health Office of Information Technology procedures and guidelines. The contingency and disaster recovery plan is composed of a number of sections that document resources and procedures to be used in the event that a disaster occurs at the College of Public Health Office of Information Technology. Each supported computing platform has a section containing specific recovery procedures. There are also sections that document the personnel needed to perform the recovery tasks and an organizational structure for the recovery process.
The contingency and disaster recovery plan applies to all College of Public Health system administrators, department administrators, and supervisors responsible for managing critical facilities, including server hardware, software, and data. It also applies to the collegiate Office of Information Technology in its role as administrator of the core server infrastructure for the College of Public Health.
Over the years, mission-critical dependence on computers in day-to-day business activities has become standard for many organizations. The University of Iowa College of Public Health is certainly no exception to this trend. Today you can find very powerful computers in every department on campus. These machines are linked together by a sophisticated network that provides communications with other machines across campus and around the world. Vital functions of the College depend on the availability of this network of computers.
Consider for a moment the impact of a disaster that prevents the use of the system to process Student Registration, Payroll, Accounting, Healthcare, Clinical Research or any other vital application for weeks. Students and faculty rely upon our systems for instruction and research purposes, all of which are important to the well-being of the College, as well as the University. It is hard to estimate the damage to the College that such an event might cause. One tornado properly placed could easily cause enough damage to disrupt these and other vital functions of the College. Without adequate planning and preparation to deal with such an event, the collegiate computer systems could be unavailable for many weeks.
As important as having a disaster recovery plan is, taking measures to prevent a disaster or to mitigate its effects beforehand is even more important. This portion of the plan reviews the various threats that can lead to a disaster, where our vulnerabilities are, and steps we should take to minimize our risk. The threats covered here are both natural and human-created.
- Fire
- Flood
- Tornadoes and High Winds
- Earthquake
- Computer Crime
- Terrorist Actions and Sabotage
Fire
The threat of fire in any collegiate building, especially in the General Hospital server room area, is very real and poses the highest risk factor of all the causes of disaster mentioned here. All collegiate buildings are filled with electrical devices and connections that could overheat or short out and cause a fire.
The computers within the facility also present an easy target for arson by anyone wishing to disrupt collegiate operations. Wide-area fires, such as those common in recent years in California, are also a possibility during dry periods.
All collegiate buildings, including the General Hospital server room, are equipped with a fire alarm system, with ceiling-mounted smoke detectors scattered widely throughout the building.
Hand-held fire extinguishers are required in visible locations throughout the building. Staff are to be trained in the use of fire extinguishers.
General Hospital is built primarily of non-combustible materials. The risk of fire can be reduced by acquiring flame-resistant products when new construction is done or when office furnishings are purchased.
Training and Documentation
Detailed instructions for dealing with fire are present in Standard Operating Procedures documentation. Staff are required to undergo training on proper actions to take in the event of a fire. Staff are required to demonstrate proficiency in periodic, unscheduled fire drills.
Regular reviews of the procedures should be conducted to ensure that they are up to date. Unannounced drills should be conducted by an impartial administrator, and a written evaluation should be produced for the department heads housed in the building.
Regular inspections of the fire prevention equipment are also mandated. Fire extinguishers are periodically inspected as a standard policy.
Flood
Iowa City is split in half by the Iowa River. However, the majority of collegiate buildings are located on higher ground. In particular, General Hospital is located atop a hill on the west side of campus. The likelihood of a natural flood is low. However, a flood due to a water main break, sprinkler system malfunction, or roof leak is a strong concern. Flood waters penetrating the machine room can cause a lot of damage. Water can disrupt power, and flood waters can carry in mud and silt that destroy sensitive electrical connections. Of course, the presence of water in a room with high-voltage electrical equipment also poses a threat of electrical shock to personnel within the machine room.
The collegiate facilities coordinator is in continuing, direct contact with the Director of Information Technology regarding any changes to the water and sewer infrastructure within General Hospital.
Periodic inspections of the machine room must be conducted to detect water seepage, especially any time there is a heavy downpour. Operators should be trained in shutdown procedures and drills should be conducted on a regular basis. Facility Services contact phone numbers are to be posted in the collegiate Office of Information Technology. Also, staff in the machine room should be trained in responding to victims of electrical shock.
Tornadoes and High Winds
As the University of Iowa is situated in “tornado country,” damage due to high winds or an actual tornado is a very real possibility. A tornado has the potential for causing the most destructive disaster we face.
While a fire can be as destructive as a tornado, there are very few preventative measures that we can take for tornadoes. Building construction makes a big difference in the ability of a structure to withstand the forces of high winds. General Hospital is located within the University of Iowa Hospitals and Clinics. Strong winds are often accompanied by heavy rain, so a double threat of wind and water damage exists if the integrity of the roof is lost. Unfortunately, the server room is located directly below the roof of General Hospital. A new roof was installed in 2003.
All occupants of collegiate buildings on campus should know where the strong points of the building are and should be directed to seek shelter in threatening weather. Staff in the collegiate Office of Information Technology server room are often unaware of outside weather conditions, so the Office should be equipped with a radio or other warning device.
The collegiate Office of Information Technology should have large tarps or plastic sheeting available in the server room area, ready to cover sensitive electronic equipment in case the building is damaged. Protective covering should also be deployed over magnetic tape racks to prevent water and wind damage. Operators should be trained to cover the equipment properly.
Earthquake
The threat of an earthquake in the Iowa City area is low, but it should not be ignored. Scientists have predicted that a large earthquake along the New Madrid fault may happen at any time in the next 50 years and that its effects will be felt as far away as our area. Buildings in our area are not built to the earthquake-resistant standards used in quake-prone areas such as California, so we could expect light to moderate damage from the predicted quake.
An earthquake has the potential to be the most disruptive event for this disaster recovery plan. If the General Hospital server room is damaged, it is highly probable that the “cold site” on campus will be similarly affected. Restoration of computing and networking facilities following a severe earthquake could be very difficult and could require an extended period of time because of the need for wide-scale building repairs.
The preventative measures for an earthquake can be similar to those of a tornado. Building construction makes all the difference in whether the facility will survive or not. Even if the building survives, earthquakes can interrupt power and other utilities for an extended period of time. Standby power generators could be purchased or leased to provide power while commercial utilities are restored.
The collegiate Office of Information Technology should have large tarps or plastic sheeting available in the machine room area, ready to cover sensitive electronic equipment in case the building is damaged. Protective covering should also be deployed over magnetic tape racks to prevent water and wind damage. Operators should be trained to cover the equipment properly.
Computer Crime
Computer crime is becoming more of a threat as computer workstations become more highly distributed. With the new networking technologies, more potential for improper access is present than ever before.
Computer crime usually does not affect hardware in a destructive manner. It may be more insidious, and may often come from within. A disgruntled employee can build viruses or time bombs into applications and systems code. A well-intentioned employee can make coding errors that affect data integrity (not considered a crime, of course, unless the employee deliberately sabotaged programs and data).
All systems should have security products installed to protect against unauthorized entry. All systems should be protected by passwords, especially those permitting updates to data. All users should be required to change their passwords on a regular basis. All security systems should log invalid attempts to access data, and security administrators should review these logs on a regular basis.
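As an illustration of the log review described above, the following Python sketch counts failed login attempts per account and flags accounts that meet a review threshold. It is a minimal sketch under stated assumptions: the log line format, the `FAILED LOGIN` marker, the function names, and the threshold of 5 are all hypothetical and are not taken from any particular security product; real systems (Windows event logs, UNIX authentication logs) use their own formats.

```python
import re
from collections import Counter

# Hypothetical log line format (an assumption, not from any real product):
#   2024-03-01 02:14:07 FAILED LOGIN user=jdoe src=10.0.0.5
FAILED_LOGIN = re.compile(r"FAILED LOGIN user=(?P<user>\S+) src=(?P<src>\S+)")


def failed_attempts_by_user(lines):
    """Count failed login attempts per user name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("user")] += 1
    return counts


def accounts_to_review(lines, threshold=5):
    """Return, sorted by name, users whose failed attempts meet or exceed threshold."""
    counts = failed_attempts_by_user(lines)
    return sorted(user for user, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    sample = [
        "2024-03-01 02:14:07 FAILED LOGIN user=jdoe src=10.0.0.5",
        "2024-03-01 02:14:09 FAILED LOGIN user=jdoe src=10.0.0.5",
        "2024-03-01 02:14:11 FAILED LOGIN user=jdoe src=10.0.0.5",
        "2024-03-01 08:30:00 LOGIN OK user=asmith src=10.0.0.9",
        "2024-03-01 09:01:44 FAILED LOGIN user=asmith src=10.0.0.9",
    ]
    print(failed_attempts_by_user(sample))
    print(accounts_to_review(sample, threshold=3))
```

A report like this only narrows the search; security administrators still need to judge whether a burst of failures is a forgotten password or an intrusion attempt.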
All systems should be backed up on a periodic basis. Those backups should be stored in an area separate from the original data. Physical security of the data storage area for backups must be implemented. Standards should be established on the number of backup cycles to retain and the length of their retention.
Continue to improve security functions on all platforms. Strictly enforce policies and procedures when violations are detected. Regularly let users know the importance of keeping their passwords secret. Let users know how to choose strong passwords that are very difficult to guess.
Improve network security. Shared network infrastructure, such as Ethernet and wireless networking, is susceptible to sniffing, which unscrupulous users may use to capture passwords. Implement stronger security mechanisms over the network, such as one-time passwords, data encryption, and non-shared wired media.
Terrorist Actions and Sabotage
The University’s computer systems are always potential targets for terrorist actions. The threat of kidnapping of key personnel also exists.
Good physical security is extremely important. However, terrorist actions can often occur regardless of in-building security, and they can be very destructive. A bomb placed next to an exterior wall of the server room will likely breach the wall and cause damage within the room.
Given the freedom that we enjoy within the United States at this time, almost no one will accept the wide-scale planning, restrictions, and costs that would be necessary to protect General Hospital, as well as other collegiate buildings, from a bomb. Some commonsense measures can help, however.
The building should be adequately lit at night on all sides. All doors into the server room area should be strong and have good locks. Entrances into the server room should be locked at all times. Only those people with proper security clearances should be permitted into the server room area. Suspicious parties should be reported to the police (they may not be terrorists, but they may have theft of expensive computer equipment in mind).
Maintain good building physical security. Doors into the server room area should be locked at all times. All visitors to the server room should receive prior authorization. Keeping server and workstation operating systems secure, including applying the newest security patches, is important to maintaining a protected cyber environment.
In order to facilitate recovery from a disaster which destroys all or part of the server room in General Hospital, certain preparations have been made in advance. This document describes procedures for a quick and orderly restoration of facilities in the collegiate Office of Information Technology.
The following topics are covered under disaster preparation:
- Disaster Recovery Planning
- Recovery Facility
- Replacement Equipment
- Disaster Lock Boxes
Disaster Recovery Planning
To begin, a plan should be established. The overall plan should include responses to specific disasters, while maintaining flexibility and adaptability.
Every other business unit within the University should develop a plan for how it will conduct business in the event of a disaster either in its own building or at the collegiate Office of Information Technology. Those business units should develop procedures to function while the computers and networks are down, and they also need a plan to synchronize the data restored on the central computers with the current state of affairs. For example, if the Payroll Office is able to produce a payroll while the central computers are down, that payroll data will have to be re-entered into the central computers when they return to service. Having a means of tracking all expenditures, such as payroll, while the central computers are down is extremely important.
If a central facility operated by the collegiate Office of Information Technology is destroyed in a disaster, repair or rebuilding of that facility may take an extended period of time. In the interim it will be necessary to restore computer and network services at an alternate site.
The College has a number of options for alternate sites, each having a varying degree of up-front costs.
A hot site is probably the most expensive option for being prepared for a disaster and is typically most appropriate for very large organizations. A separate computer facility, possibly even located in a different city, can be built, complete with computers and other facilities, ready to cut in on a moment’s notice in the event the primary facility goes offline. The two facilities must be joined by high-speed communications lines so that users at the primary campus can continue to access the computers from their offices and classrooms.
Disaster Recovery Company
A number of companies provide disaster recovery services on a subscription basis. For an annual fee (usually quite steep), you have the right to a variety of computer and other recovery services on extremely short notice in the event of a disaster. These services may reside at a centralized hot site or sites that the company operates, in which case you must pack up your backup tapes and physically relocate personnel to restore operations at the company’s site. Some companies have mobile services that move the equipment to your site in specially prepared vans. These vans usually contain all of the necessary computer and networking gear already installed, with motor generators for power, ready to go into service almost immediately after arrival at your site. (Note: Most disaster recovery companies that provide these types of subscription services are contractually obligated to their customers not to provide the services to any organization that has not subscribed, so looking to one of these companies for assistance after a disaster strikes will likely be a waste of time.)
Some organizations will team up with others in a partnership with reciprocal agreements to aid each other in the event of a disaster. These agreements can cover anything from simple manpower sharing all the way up to full use of a computer facility. Often, however, since the assisting partner has to continue its day-to-day operations on its systems, the agreements are limited to providing access for a few key, critical applications that the disabled partner must run to stay afloat while its facilities are restored. The primary drawback to these kinds of partnerships is that both parties must continually communicate the inevitable changes that occur in their computer and network systems so that the necessary changes can be made in advance to keep the critical applications operational. Learning that you can’t run a payroll at your partner’s site, for instance, because they no longer use the computer hardware or operating system you need is a bitter pill that no one should have to swallow.
One of the most critical issues involved in the recovery process is the availability of qualified staff to oversee and carry out the tasks involved. This is often where disaster partnerships can have their greatest benefit. Through cooperative agreement, if one partner loses key personnel in the disaster, the other partner can provide skilled workers to carry out recovery and restoration tasks until the disabled partner can hire replacements for its staff. Of course, to be completely fair to all parties involved, the disabled partner should fully compensate the assisting partners for use of their workers unless there has been prior agreement not to do so.
The use of reciprocal disaster agreements of this nature may work well as a low-cost alternative to hiring a disaster recovery company or building a hot site. And they can be used in conjunction with other arrangements, such as the use of a cold recovery site described below. The primary drawback to these agreements is that they usually have no provision for providing computer and network access for anything other than predefined critical applications. So users will be without facilities for a period of time until systems can be returned to operation.
A cold recovery site is an area physically separate from the primary site where space has been identified for use as the temporary home for the computer and network systems while the primary site is being repaired. There are varying degrees of “coldness”, ranging from an unfinished basement all the way to space where the necessary raised flooring, electrical hookups, and cooling capacity have already been installed, just waiting for the computers to arrive.
The College of Public Health has chosen to use the cold site approach for this disaster recovery plan. The College of Public Health is distributed across sixteen different locations on and off campus. The necessary agreements are in place for the collegiate Office of Information Technology to use space in the US Bank building in downtown Iowa City as its Cold Site. The building has adequate space to house the hardware, with some office space available for operating and technical personnel. It has good connectivity to the campus fiber optic network, and preparations have been made for electrical and cooling capacity to support servers and network equipment. The fallback alternative to the US Bank location is the Institute for Rural and Environmental Health building on the Oakdale campus in Coralville.
This plan contains a complete inventory of the components of each of the computer and network systems and their software that must be restored after a disaster. The inevitable changes that occur in the systems over time require that the plan be periodically updated to reflect the most current configuration. Where possible, agreements have been made with vendors to supply replacements on an emergency basis. To avoid problems and delays in the recovery, every attempt should be made to replicate the current system configuration. However, there will likely be cases where components are not available or the delivery timeframe is unacceptably long. The collegiate Office of Information Technology will have the expertise and resources to work through these problems as they are recognized. Although some changes to the procedures documented in the plan may be required, using different models of equipment or equipment from a different vendor may help expedite the recovery process.
New hardware can be purchased. New buildings can be built. New employees can be hired. But the data that was stored on the old equipment cannot be bought at any price. It must be restored from a copy that was not affected by the disaster. There are a number of options available to help ensure that such a copy of our data survives a disaster at the primary facility.
Remote Dual Copy
This option calls for a disk subsystem located at a site away from the primary computer facility, with fiber optic cabling coupling the remote disk to the disk subsystem at the primary site. Data written to disk at the primary site are automatically transmitted to the remote site and written to disk there as well. This ensures that up-to-the-second updates to the databases at the primary site survive if the primary site is destroyed. You can simplify the recovery process by locating the remote disk subsystem at the disaster recovery site. This option is somewhat expensive, but not prohibitively so. It does not require that an entire computer system be built at a hot site, just the disk subsystem.
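The “up-to-the-second” property of remote dual copy comes from synchronous mirroring: a write is acknowledged only after both the primary and the remote disk hold it. The toy Python model below illustrates that idea; it is a sketch, not a description of any particular storage product, and the class names, site names, and block-number layout are hypothetical.

```python
class DiskSubsystem:
    """Toy stand-in for a disk subsystem: a map from block number to data."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True

    def write(self, block, data):
        if not self.online:
            raise IOError(f"{self.name} is offline")
        self.blocks[block] = data


class DualCopyVolume:
    """Synchronous (remote dual copy) mirroring of a primary disk to a remote disk."""

    def __init__(self, primary, remote):
        self.primary = primary
        self.remote = remote

    def write(self, block, data):
        """Write to both sites; return True only once both copies are on disk.

        If the remote write fails, the exception propagates and the write is
        not acknowledged, so the caller never believes data is protected
        when the remote copy is missing.
        """
        self.primary.write(block, data)
        self.remote.write(block, data)
        return True


if __name__ == "__main__":
    primary = DiskSubsystem("primary site")
    remote = DiskSubsystem("disaster recovery site")
    volume = DualCopyVolume(primary, remote)
    volume.write(0, b"registration record")
    volume.write(1, b"payroll record")
    # Simulate destruction of the primary site: every acknowledged write
    # is still present on the remote disk.
    primary.online = False
    primary.blocks.clear()
    print(remote.blocks)
```

The cost of this guarantee is that every write waits on the link to the remote site, which is one reason the fiber connection between the two disk subsystems matters.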
Automated Off-Site Tape Backup
This option calls for a robotic tape subsystem located at a site away from the primary computer facility, with fiber optic cabling (the campus backbone network would be suitable) coupling the subsystem to the primary computer facility. Copies of operating system data, application and user programs, and databases can be transmitted to the remote tape subsystem, where they are stored on magnetic tape (writable optical disk media can also be used, but may be more expensive).
While this option does not guarantee the up-to-the-second updates available with the remote dual copy disk option, it does provide a means for conveniently taking backups and storing them off-site at any time of the day or night. Another huge advantage is that backups can be made from mainframes, file servers, distributed (UNIX-based) systems, and personal computers. Although such a system is expensive, it is not prohibitively so.
Off-Site Tape Backup Storage
This option calls for the transportation of backup tapes made at the primary computer facility to an off-site location. Choice of the location is important. You want to ensure survivability of the backups in a disaster, but you also need quick availability of the backups.
This option has some drawbacks. First, there is a period of exposure from the time that a backup is made to the time it can be physically moved off-site. A disaster striking at the wrong time may result in the loss of all data changes that have occurred since the last off-site backup. There is also the time, expense, and energy of having to transport the tapes. And there is the risk that tapes can be physically damaged or lost while being transported.
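The exposure period described above can be made concrete: a backup protects the data only after it has left the building, so the work lost in a disaster is everything changed since the newest backup that had already reached off-site storage. The Python sketch below computes that window. The function name and the Friday-night/Monday-morning transport schedule in the example are hypothetical assumptions for illustration only.

```python
from datetime import datetime


def data_loss_window(backups, disaster_time):
    """Return how much recent work would be lost if a disaster struck at disaster_time.

    backups: iterable of (made_at, moved_offsite_at) datetime pairs.
    A backup protects the data only once it has left the building, so only
    backups whose moved_offsite_at is at or before disaster_time count.
    Returns None if no backup had reached off-site storage yet.
    """
    survived = [made for made, moved in backups if moved <= disaster_time]
    if not survived:
        return None
    # Everything changed after the newest surviving backup was made is lost.
    return disaster_time - max(survived)


if __name__ == "__main__":
    # Hypothetical schedule: Friday-night backups carried off-site Monday morning.
    backups = [
        (datetime(2024, 3, 1, 23, 0), datetime(2024, 3, 4, 9, 0)),
        (datetime(2024, 3, 8, 23, 0), datetime(2024, 3, 11, 9, 0)),
    ]
    # A disaster on Sunday: the March 8 tape is still on-site and is lost too,
    # so recovery falls back to the March 1 tape.
    print(data_loss_window(backups, datetime(2024, 3, 10, 12, 0)))
```

Shortening the gap between making a backup and moving it off-site shrinks this window directly, which is the main argument for the automated off-site option.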
Some organizations contract with disaster recovery companies to store their backup tapes in hardened storage facilities, such as old salt mines or caverns deep within a mountain. While this certainly provides more secure data storage, considerable expense is incurred for regular transportation of the data to the storage facility. Quick access to the data can also be an issue if the storage facility is a long distance from your recovery facility.
The collegiate Office of Information Technology has opted to take periodic backups of its primary database servers, file servers, web servers, research servers, mail and calendaring servers, and UNIX systems and to store those backups at an off-site location elsewhere on campus. The primary storage location is in the US Bank building, in coordination with the State Health Registry of Iowa.
Disaster Lock Boxes
To ensure that an up-to-date copy of this plan is available when a disaster occurs, procedures have been established to store a copy of the plan with other important recovery information at the Cold Site backup tape storage area. Two “lock boxes” or safes have been purchased to hold these materials. The contents of both lock boxes are identical. One resides at the US Bank building; the other resides in the tape vault just off the server room in the collegiate Office of Information Technology, located in General Hospital.
When changes to the contents of the lock boxes are necessary, the documents at the collegiate Office of Information Technology in General Hospital are updated first, then taken to the US Bank building and swapped with the documents stored there. This ensures that at least one copy of the plan is available at the recovery site.
The lock boxes are to remain locked at all times. Keys to the boxes are kept by several key people within the department, including:
- Director of Information Technology – Tim Shie
- State Health Registry of Iowa – Gary Hulett
- System Administrators – Michael Brady and Jeremy Stoltenberg
In a disaster situation when entry into a lock box is needed but a key is not available, the lock can be broken with bolt cutters or opened by a locksmith.
Every system that the collegiate Office of Information Technology operates is backed up regularly. The backup media for each of these systems is relocated to an off-site storage area where there is a high probability that the media will survive in the event a disaster strikes.
Regular backup procedures include:
- Server backups will be performed every business night, excluding holidays.
- Backups performed on Friday will be kept for a month before recycling.
- The last backup of every month will be considered the monthly backup and kept for a year before recycling.
- Monthly backup tapes will be stored in a fireproof safe.
- The last two monthly tapes will be stored off-site in a fireproof safe.
- Backups will be performed and monitored by a full-time IT staff member.
- Backups will be automated using Veritas Backup Exec, Arcserve or similar software product.
- Tapes will be inserted routinely every night before leaving work.
- Backup failures will be reported to the Director of Information Technology and action will be taken quickly to fix the problem.
- Backups will always be performed before upgrading or modifying a server.
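The rotation rules above can be expressed as a small classification function, which is handy for checking which tapes may be recycled on a given night. The Python sketch below is illustrative only. Its assumptions: “a month” is treated as 31 days and “a year” as 365 days; when a Friday backup is also the last backup of the month, the longer monthly rule wins; and because the plan does not say how long ordinary weeknight tapes are kept, their retention is returned as `None`. The function names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def is_business_night(day, holidays):
    """Backups run Monday through Friday nights, excluding holidays."""
    return day.weekday() < 5 and day not in holidays


def last_business_night(year, month, holidays):
    """Return the last business night (the monthly backup night) of a month."""
    day = date(year, month, calendar.monthrange(year, month)[1])
    while not is_business_night(day, holidays):
        day -= timedelta(days=1)
    return day


def retention_for(day, holidays=frozenset()):
    """Classify a nightly backup and return (tier, days_to_keep).

    Assumptions: 'a month' = 31 days and 'a year' = 365 days; the monthly
    rule takes precedence over the Friday rule; ordinary weeknight tapes
    have no retention stated in the plan, so days_to_keep is None.
    """
    if not is_business_night(day, holidays):
        raise ValueError(f"{day} is not a scheduled backup night")
    if day == last_business_night(day.year, day.month, holidays):
        return ("monthly", 365)
    if day.weekday() == 4:  # Friday
        return ("friday", 31)
    return ("daily", None)


if __name__ == "__main__":
    for d in (date(2024, 3, 20), date(2024, 3, 22), date(2024, 3, 29)):
        print(d, retention_for(d))
```

Passing the University holiday calendar as `holidays` matters at month end: if the last weekday of a month is a holiday, the monthly backup shifts to the preceding business night.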
In almost any disaster situation, hazards and dangers can abound. While survival of the disaster itself can be a harrowing experience, further injury or death following the disaster stemming from carelessness or negligence is senseless.
All personnel must exercise extreme caution to ensure that physical injury or death is avoided while working in and around the disaster site itself. No one is to perform any hazardous tasks without first taking appropriate safety measures.
This document contains safety warnings in several places that recovery personnel should heed. These warnings are all marked with a special warning symbol.
Any time this symbol is displayed in this document, take the time to read through the warning thoroughly to understand what the hazards are and how to prevent injury.
There are hazardous materials present in the majority of College of Public Health buildings. Three primary sources exist for these materials:
- Janitorial supplies – hazardous chemicals are present in the janitorial closets scattered throughout the building. The door to each closet contains a list of the chemicals present in the closet. If this information is not present at the scene of the disaster, contact Facility Services for a list of the chemicals located in the building.
- Battery acid – hazardous battery acid is present in Uninterruptible Power Supply units located in the server room. Battery acid can cause caustic skin burns, blindness, and pulmonary distress if inhaled. If you come in contact with battery acid, immediately find a source of water and wash the affected areas continuously until medical assistance arrives.
- Automotive fluids – hazardous substances related to the operation of a motor vehicle are present throughout the University of Iowa. These can include, but are not limited to, gasoline, motor oil, brake fluid, antifreeze, lubricants, and battery acid.
Approach any collection of a hazardous material with caution. Notify the nearest safety personnel in the event of a hazardous material spill. Unless you have had the necessary training to do so, do not attempt to clean up a hazardous material spill yourself. Allow the local HAZMAT team to evaluate, neutralize, and clean up any spills.
Recovery from a disaster will be a very stressful time for all personnel involved. Each manager should monitor the working hours of their staff to avoid the over-exertion and exhaustion that can occur under these conditions. A good approach is to divide team members into shifts and rotate them on a regular basis. This will keep team members fresh and also provide needed time with family.
PTSD – Post-traumatic Stress Disorder is a very real condition that can affect survivors and recovery workers in a disaster. All recovery managers and coordinators should be alert to symptoms in their employees that indicate PTSD and seek assistance from the necessary counseling services. Symptoms usually manifest themselves as:
Intrusions: The individual experiences flashbacks or nightmares where the traumatic event is re-experienced.
Avoidance: The individual tries to reduce exposure to people or things that might bring on their intrusive symptoms.
Hyperarousal: The individual exhibits physiological signs of increased arousal, such as hypervigilance or an increased startle response.
The disaster notification list for the collegiate Office of Information Technology is shown below. These people are to be notified as soon as possible when disaster threatens or occurs.
|Emergency Fire, Ambulance, Rescue, Police, and HAZMAT||911 or 9-911|
|University Facility Services Help Desk||
|Hospital Security Office||
Information Technology Primary Notification List
|Rob Higareda||Systems Administration||384-5472||n/a||594-6580|
|Brian Beninga||Systems Administration||384-5473||n/a||936-6402|
Other Information Technology Contacts
|Jane Drews, IT Security Officer||335-6332||n/a||n/a|
|Campus IT Help Desk||384-4357||n/a||n/a|
|HCIS Help Desk||335-6500||n/a||n/a|
Appointment of Recovery Manager
The first order of business is to appoint the Recovery Manager. The person most appropriate for the position is the current collegiate Director of Information Technology. If the Director is unavailable, the appointment should be made by the Collegiate Administrator or Assistant to the Dean. This person must have management experience and must have signature authority for the expenditures necessary during the recovery process.
Determine Personnel Status
One of the Recovery Manager’s important early duties is to determine the status of personnel working at the time of the disaster. Safety personnel on site after the disaster will carry out any rescues or first aid needed by people caught in the disaster. However, the Recovery Manager should produce a list of the able-bodied people who will be available to aid in the recovery process.
The Recovery Manager should also take responsibility for identifying anyone injured or killed in the disaster. The Recovery Manager will work with families and employees, ministering to their needs and obtaining counseling services as necessary.
Taking care of our people is a very important task and should receive the highest priority immediately following the disaster. While we will have a huge technical task of restoring computer and network operations ahead of us, we can’t lose sight of the human interests at stake.
Equipment/Media Protection and Salvage
A primary goal of the recovery process is to restore all computer operations without the loss of any data. It is important that the Recovery Manager appoint the Technical Coordinator quickly so that the Technical Coordinator can immediately begin protecting and salvaging any magnetic or optical media on which data may be stored. This includes any magnetic tapes, optical disks, CD-ROMs, and disk drives.
Establish the Recovery Control Center
The Recovery Control Center is the location from which the disaster recovery process is coordinated. The Recovery Manager should designate where the Recovery Control Center is to be established. If a location in the collegiate Office of Information Technology, located in General Hospital, is not suitable, the US Bank building has been designated as the off-site location of the center.
Activating the Disaster Recovery Plan
The Recovery Manager sets the plan into motion. Early steps to take are as follows:
- The Recovery Manager should retrieve the Disaster Recovery Lock Box located in either the General Hospital or the US Bank building and open it to obtain an up-to-date copy of the Disaster Recovery Plan. The plan is stored in the box in printed form as well as on computer media (diskette or CD-ROM). Copies of the plan should be made and handed out at the first meeting of the Recovery Management Team. The Recovery Manager is responsible for the remaining contents of the Lock Box, which should be relocked if possible.
- The Recovery Manager is to appoint the remaining members of the Recovery Management Team. This should be done in consultation with surviving members of the collegiate Office of Information Technology staff and Facility Services management, and with upper University and Collegiate Administration approval. The Recovery Manager’s decision about who sits on the Recovery Management Team is final, however.
- The Recovery Manager is to call a meeting of the Recovery Management Team at the Recovery Control Center or a designated alternate site. The Dean or Associate Dean of the College of Public Health is to be invited to this meeting. The following agenda is suggested for this meeting:
- Each member of the team is to review the status of their respective areas of responsibility.
- After this review, the Recovery Manager makes the final decision about where to do the recovery. If the US Bank building is to be used, the Recovery Manager is to declare emergency use of the facility and notify the Dean or Associate Dean of the College of Public Health immediately.
- The Recovery Manager briefly reviews the Disaster Recovery Plan with the team.
- Any adjustments to the Disaster Recovery Plan to accommodate special circumstances are to be discussed and decided upon.
- Each member of the team is charged with fulfilling his/her respective role in the recovery and with beginning work as scheduled in the Plan.
- Each member of the team is to review the makeup of their respective recovery teams. If individuals key to one of the recovery teams are unavailable, the Recovery Manager is to assist in locating others who have the necessary skills and experience, including outside help from other area computer centers or vendors.
- The next meeting of the Recovery Management Team is scheduled. It is suggested that the team meet at least once each day for the first week of the recovery process.
- The Recovery Management Team members are to immediately start the process of contacting the people who will sit on their respective recovery teams and call meetings to set in motion their part of the recovery.
- The Dean or Associate Dean of the College of Public Health is responsible for immediately clearing the Recovery Control Center room in the US Bank building for occupation by the Recovery Management Team. This includes the immediate relocation of any personnel occupying the room. The Dean or Associate Dean should assist the Administrative Coordinator in locating baseline facilities for the recovery room:
- Office desks and chairs
- Computer Workstation (including data service)
- Fax machine
- Mobile communications will be important during the early phases of the recovery process. This need can be satisfied through the use of cellular telephones and/or two-way radios. The University has an existing contract with US Cellular and Nextel for cellular service, and Facility Services has two-way radio units that may be available upon request.
This document contains information on procedures to be used immediately following an incident to preserve and protect resources in the damaged area.
It is extremely important that any equipment, magnetic media, paper stocks, and other items at the damaged primary site be protected from the elements to avoid further damage. Some of these items may be salvageable or repairable, saving time in restoring operations.
- Gather all magnetic tape cartridges into a central area and quickly cover with tarps or plastic sheeting to avoid water damage.
- Cover all computer equipment to avoid water damage.
- Cover all undamaged paper stock to avoid water damage.
- Ask the police to post security guards at the primary site to prevent looting or scavenging.
Salvage Magnetic and Optical Media
The magnetic and optical media on which our data is stored is priceless. Although we retain backups of our disk subsystems and primary application systems off-site, magnetic tapes stored in the tape vault and machine room area contain extremely valuable information. If the media has been destroyed, such as in a fire, nothing can be done. However, water and smoke damage can often be reversed, at least well enough to copy the data to undamaged media.
After protecting the media from further damage, recovery should begin almost immediately to avoid further loss. A number of companies exist with which the University can contract for large scale media recovery services.
The following are links to sites that provide additional information about salvage techniques for magnetic and other media.
As soon as practical, all salvageable equipment and supplies need to be moved to a secure location. If the equipment is undamaged, the Recovery Manager should arrange transportation to move it to the Cold Site, or to another protected area (such as a warehouse) until the Cold Site is ready.
Take great care when moving the equipment to avoid damage.
If the equipment has been damaged but can be repaired or refurbished, the Cold Site may not be the best location for it, especially if there is water or fire damage that needs to be repaired. Contractors may recommend an alternate location where equipment can be dried out, repainted, and repaired.
As soon as practical, a complete inventory of all salvageable equipment must be taken, along with estimates of when the equipment will be ready for use (where repairs or refurbishment are required). This inventory list should be delivered to the Technical Coordinator and Administrative Coordinator, who will use it to determine which items from the disaster recovery hardware and supplies lists must be procured to begin building the recovery systems.
This damage assessment is a preliminary one intended to establish the extent of damage to critical hardware and the facility that houses it. The primary goal is to determine where the recovery should take place and what hardware must be ordered immediately.
Team members should be liberal in their estimate of the time required to repair or replace a damaged resource. Take into consideration cases where one repair cannot begin until another step is completed. Estimates of repair time should include ordering, shipping, installation, and testing time.
In considering the hardware items, start with the equipment lists provided in the recovery sections for each platform. These lists were constructed primarily for recovery at the Cold Site, so they consist of the critical components necessary for recovery. Separate the items into two groups. The first group consists of items that are missing or destroyed. The second consists of items that are considered salvageable. These “salvageable” items will have to be evaluated by hardware engineers and repaired as necessary. Based on the results of this process, the Recovery Management Team can begin acquiring replacements.
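The two-group sort described above can be sketched in Python. This is a minimal illustration rather than part of the plan's procedures: the function name `classify_items`, the condition labels (`"salvageable"`, `"destroyed"`), and the sample items are hypothetical, and any component on a platform's equipment list that does not appear in the salvage inventory is assumed to be missing.

```python
def classify_items(equipment_list, salvage_inventory):
    """Split a platform's critical components into two groups.

    equipment_list: component names from the platform's recovery section.
    salvage_inventory: hypothetical mapping of component name -> condition,
        either "salvageable" or "destroyed".
    Returns (missing_or_destroyed, salvageable).
    """
    missing_or_destroyed, salvageable = [], []
    for item in equipment_list:
        if salvage_inventory.get(item) == "salvageable":
            # Hand to hardware engineers for evaluation and repair.
            salvageable.append(item)
        else:
            # Destroyed, or absent from the inventory (assumed missing):
            # a candidate for immediate replacement.
            missing_or_destroyed.append(item)
    return missing_or_destroyed, salvageable


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    equipment = ["server chassis", "RAID controller", "tape drive", "network switch"]
    inventory = {"server chassis": "salvageable", "RAID controller": "destroyed",
                 "network switch": "salvageable"}
    replace, evaluate = classify_items(equipment, inventory)
    print("Procure replacements:", replace)
    print("Evaluate and repair:", evaluate)
```

The first list feeds the procurement requests; the second goes to the hardware engineers, whose repair estimates may later move items into the first list.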
With respect to the facility, evaluation of damage to the structure, electrical system, air conditioning, and building network should be conducted. If estimates from this process indicate that recovery at the original site will require more than 14 days, migration to the cold site is recommended.
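The timing rule above can be sketched as a small critical-path calculation in Python. This is a hypothetical illustration: the task names, durations, and dictionary layout are assumptions, and only the 14-day threshold comes from the plan. Each repair's estimate is the sum of its ordering, shipping, installation, and testing phases, and a repair cannot start until every repair it depends on has finished, so the overall estimate is the longest chain of dependent repairs.

```python
# Threshold stated in the plan: if recovery at the original site will take
# more than 14 days, migration to the Cold Site is recommended.
COLD_SITE_THRESHOLD_DAYS = 14


def task_duration(phases):
    """Total days for one repair: ordering + shipping + installation + testing."""
    return sum(phases.values())


def earliest_finish(tasks):
    """Return the earliest finish day of every repair task.

    `tasks` maps a task name to (phases, prerequisites); a task cannot start
    until all of its prerequisites have finished (longest-path rule).
    """
    finish = {}

    def visit(name, stack=()):
        if name in finish:
            return finish[name]
        if name in stack:
            raise ValueError(f"circular dependency at {name!r}")
        phases, prereqs = tasks[name]
        start = max((visit(p, stack + (name,)) for p in prereqs), default=0)
        finish[name] = start + task_duration(phases)
        return finish[name]

    for name in tasks:
        visit(name)
    return finish


def recommend_site(tasks):
    """Return (estimated days at the original site, recommended location)."""
    total = max(earliest_finish(tasks).values(), default=0)
    site = "Cold Site" if total > COLD_SITE_THRESHOLD_DAYS else "original site"
    return total, site


if __name__ == "__main__":
    # Hypothetical estimates, for illustration only.
    tasks = {
        "electrical": ({"ordering": 2, "shipping": 3, "installation": 2, "testing": 1}, []),
        "air conditioning": ({"ordering": 3, "shipping": 4, "installation": 3, "testing": 1},
                             ["electrical"]),
        "building network": ({"ordering": 1, "shipping": 2, "installation": 2, "testing": 1}, []),
    }
    days, site = recommend_site(tasks)
    print(f"Estimated {days} days at original site; recommended: {site}")
```

With these sample estimates, the air-conditioning repair must wait for the electrical work (8 days) and then needs 11 days of its own, so the longest chain is 19 days and the sketch recommends the Cold Site.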
The success or failure of this plan’s ability to recover the collegiate computer and network facilities hinges on our ability to purchase goods and services in a timely manner.
The Recovery Manager must have a sound financial plan and procedures for aggressive recovery actions. A word of caution is in order here. Every emergency is followed by a day of reckoning, when actions taken under the stress of the moment are examined and evaluated in the light of normal conditions. You can significantly reduce your anxiety in the face of such financial scrutiny by following preset rules and directives, to the extent possible under the circumstances, and, most importantly, by keeping records and logs of all transactions.
The Administrative Support Coordinator is responsible for all emergency procurement for the collegiate Office of Information Technology. All Disaster Recovery Team members must submit their requests to the Coordinator. The Coordinator will follow the regulations established for emergency procurement and will work with the Buyer that has been appointed by the Purchasing Office to complete the acquisition.
The Administrative Support Coordinator is also responsible for tracking all acquisitions to ensure that financial records of the disaster recovery process are maintained and that all acquisition procedures will pass audit review.
The Administrative Support Coordinator must also be aware of the University’s insurance coverage to know what is and is not allowed under our policies. In the event an item to be purchased is disallowed by insurance coverage, or if expenses exceed the dollar limits of the insurance coverage, the Coordinator must consult with the Recovery Manager and other responsible University personnel (such as the University’s Business Manager).
This document focuses on the preparation of the designated Cold Site for the recovery of primary computing and network facilities after a disaster has occurred. By agreement with the State Health Registry of Iowa and in coordination with the College of Public Health, the collegiate Office of Information Technology has the option to use space in the US Bank building, located in downtown Iowa City. If the Recovery Management Team opts to use this site for recovery after the disaster, some work must be done to convert the space from its present use so that it can house the computer systems, network equipment, and disaster recovery team personnel.
In an extreme disaster where the US Bank building has also been rendered unusable, an alternate site must be chosen, such as the Institute for Rural and Environmental Health building located on the Oakdale campus. Those sites may require additional work to prepare for the special power and cooling requirements of the server equipment. Before considering off-campus sites, be sure to consider the need for proper telecommunications and networking connections to the building, including fiber optic cable to the campus network.
Cold Site Spaces
The Cold Site in the US Bank building is located in downtown Iowa City. The space consists of a secure server room and several offices. These areas must all be cleared or shared in order to make room for the computer and network equipment necessary to re-establish the facilities of the collegiate Office of Information Technology. The server room and offices are equipped with adequate electricity, ventilation, and network bandwidth. The US Bank building is connected to the campus fiber optic network. The Disaster Recovery Team should work closely with the Dean or Associate Dean of the College of Public Health for access or changes needed to these rooms during the recovery process.
Quick Review of Site Preparation Work
The Cold Site has had only minimal advance preparation, so much work must be done in the early stages of the recovery process to make the site ready. Here is a quick review of the facilities and the work that must be done.
- The computer equipment, file cabinets, and furniture in the allocated offices in the US Bank building must be relocated to another area within the building. All occupied offices in the US Bank building that are allocated to the Recovery Management Team must be cleared to make space for the temporary relocation of the collegiate Office of Information Technology.
- The collegiate Office of Information Technology will share the secure server room in the US Bank building with the State Health Registry of Iowa, the current owner of the space.
- Adequate power capacity is already available within the building, and some provisions have been made to provide power to the current space.
- The site has limited power conditioning equipment, such as an uninterruptible power supply (UPS). The Plan does not call for a UPS, which leaves the installed equipment at risk of power interruptions, a risk we must accept given the temporary nature of the Cold Site.
- The facility has merely adequate ventilation and air conditioning. The Recovery Management Team may need to consider adding or upgrading the current controlled air unit.
- Access to the Cold Site server room is restricted. Keys or access codes will be issued to all appropriate server administrators within the collegiate Office of Information Technology.
- Terminations of the fiber optic cabling for the campus backbone network are located within the Cold Site. Additional fiber optic cable may need to be installed to extend the backbone to other points within the site.
- There is access to a loading dock in the vicinity of the Cold Site where large trucks can deliver equipment and furniture.
- Telephone numbers will be re-routed to the temporary offices in the US Bank building.
This portion of the plan lists the collegiate platforms/servers to be restored at the recovery facility. All platforms/servers require server hardware, an operating system, application services, a security policy, network connectivity (100BaseT), controlled air conditioning, and power conditioning. The collegiate Office of Information Technology is required to recover only production servers; test servers take lower precedence.
List of Platforms/Servers to Recover
- Active Directory Services Domain Controllers
- Exchange 2003 Mail and Calendaring Server
- SQL 2000 Database Server
- Internet Information Service Web Server
- Internet Information Service FTP Server
- Windows 2003 File and Print Server
- IBDR Oracle Database Server
- RMEREP Database Server
- Linux Server
- Statistical Genetics Linux Cluster
- Video Streaming Server
- BCBS Database Server
- Terminal Services Server
- Symantec Anti-Virus Console Server
- Backup Server
- Quitline Linux Web Server
- CPHS1 Database Server
- Test Database Server
- Test Web Server
- Test Linux Server
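One way to express the restoration priority described above is a stable sort over the platform list. This Python sketch rests on assumptions: it places the Active Directory domain controllers first (reflecting the plan's designation of Active Directory Services as a critical application), then the remaining production platforms, then the test servers, which it identifies by the hypothetical convention that their names begin with "Test". The function names are illustrative only.

```python
# Platform list as documented in the plan.
PLATFORMS = [
    "Active Directory Services Domain Controllers",
    "Exchange 2003 Mail and Calendaring Server",
    "SQL 2000 Database Server",
    "Internet Information Service Web Server",
    "Internet Information Service FTP Server",
    "Windows 2003 File and Print Server",
    "IBDR Oracle Database Server",
    "RMEREP Database Server",
    "Linux Server",
    "Statistical Genetics Linux Cluster",
    "Video Streaming Server",
    "BCBS Database Server",
    "Terminal Services Server",
    "Symantec Anti-Virus Console Server",
    "Backup Server",
    "Quitline Linux Web Server",
    "CPHS1 Database Server",
    "Test Database Server",
    "Test Web Server",
    "Test Linux Server",
]

# Assumption: Active Directory is restored first because authentication depends on it.
CRITICAL = {"Active Directory Services Domain Controllers"}


def is_test_server(name):
    # Hypothetical convention: test platforms have names beginning with "Test ".
    return name.startswith("Test ")


def restore_order(platforms):
    """Tier 0: critical; tier 1: other production; tier 2: test.

    sorted() is stable, so platforms keep their documented order within a tier.
    """
    def tier(name):
        if name in CRITICAL:
            return 0
        return 2 if is_test_server(name) else 1

    return sorted(platforms, key=tier)


if __name__ == "__main__":
    for position, name in enumerate(restore_order(PLATFORMS), start=1):
        print(f"{position:2d}. {name}")
```

Relying on a stable sort keeps the ordering rule small: only the tier is computed, and the order already documented in the plan breaks ties.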
Once the platform system software and subsystems are operating correctly, the task of preparing the remaining end-user applications can begin. Each platform will have a unique recovery road to follow. In some cases, there may be very little to do except for general testing. In other cases, considerable analysis and data synchronization work will likely be required.
The Applications Recovery Team will be responsible for carrying out this phase of the recovery. Each application area will require a review. This review should be conducted by an analyst familiar with the application while working closely with an application user representative.
Items to be considered should include:
- Review of the user department Disaster Recovery Plan with special attention to any “interim” procedures that have been required in the time period since the disaster event occurred.
- Review of the application documentation concerning file and database recovery.
- Review the status of files and databases after the general platform recovery processing is complete.
- Identify any changes needed to bring the application to ready-for-production status.
- Identify any areas where the application must be synchronized with other applications and coordinate with those application areas.
- Identify and review application outputs to certify the application ready for production use.
The College has identified Active Directory Services as a critical application. Active Directory Services is the core authentication infrastructure for most enterprise IT services as well as for local collegiate services. Consequently, any delay in restoring this application could cause significant hardship for the faculty, staff, students, and others who depend on it.
Having a disaster recovery plan is critical. However, the plan will rapidly become obsolete if a workable procedure for maintaining the plan is not also developed and implemented. This document provides information about the document itself, standards used in its construction, and maintenance procedures necessary to keep it up to date.
Web Server Accessible
This disaster recovery plan has been designed to be accessible as a World Wide Web document retrievable from a web server or through a “fake-scape” browser (e.g., Netscape file browse mode). This makes it easy to access the plan for periodic review and provides a convenient means for structuring the plan in an online fashion. It is presently maintained on the University of Iowa, College of Public Health, web server, www.public-health.edu, as a set of HTML-formatted text files and images (GIF).
Certain standards have been implemented into the design of this document to provide consistency in format and use. All maintainers of the plan should use these standards when adding to or revising the plan.
The plan will be routinely evaluated once each year. All portions of the plan will be reviewed by the collegiate Office of Information Technology. In addition, the plan will be tested on a regular basis and any faults will be corrected. The Disaster Recovery Plan coordinator is responsible for overseeing the individual documents and files and ensuring that they meet the standards and are consistent with the rest of the plan.
It is inevitable in the changing environment of the computer industry that this disaster recovery plan will become outdated and unusable unless someone keeps it up to date. Changes that will likely affect the plan fall into several categories:
- Hardware changes
- Software changes
- Facility changes
- Procedural changes
- Personnel changes
As changes occur in any of the areas mentioned above, the collegiate Director of Information Technology will determine if changes to the plan are necessary. This decision will require that the managers be familiar with the plan in some detail. A document referencing common changes that will require plan maintenance will be made available and updated when required.
Changes that affect the platform recovery portions of the plan will be made by the staff in the affected area. After the changes have been made, the collegiate Director of Information Technology will be advised that the updated documents are available. The Director will then incorporate the changes into the body of the plan and distribute it as required.
Changes Requiring Plan Maintenance
The following lists some of the types of changes that may require revisions to the disaster recovery plan. Any change that can potentially affect whether the plan can be used to successfully restore the operations of the collegiate computer and network systems should be reflected in the plan.
- Additions, deletions, or upgrades to hardware platforms.
- Additions, deletions, or upgrades to system software.
- Changes to system configuration.
- Changes to applications software affected by the plan.
- Changes that affect the availability/usability of the Cold Site location (US Bank).
- Changes to the General Hospital server room that affect the choice of Cold Site, such as enlargement of the room or changes to its cooling or electrical requirements.
- Changes to personnel identified by name in the plan.
- Changes to organizational structure of the department.
- Changes to off-site backup procedures, locations, etc.
- Changes to application backups.
- Changes to vendor lists maintained for acquisition and support purposes.