Archive for the ‘resilience’ tag
Three months ago I blogged about the Conficker worm and its relevance for emergency managers. Since then, I’ve heard rumours that a number of health agencies were still having problems with their email systems. The reason I raise this again is that now, with a large national response to a potential pandemic taking place, one hopes that Conficker has been well and truly removed from all health systems (both Ministry and DHB).
If Conficker is still impacting on health agency IT systems during this period of increased activity, then honestly, heads need to roll at MOH.
In a decision that will probably frustrate some Aucklanders, it has been announced that Whenuapai Airport will remain in the hands of the NZ Defence Force. This is probably the best outcome, as it ensures the field remains available as an emergency alternative airstrip in case anything happens to Auckland International in Manukau. Whilst Auckland probably doesn’t need a second commercial airport, you never know when you might need an alternate airstrip during an emergency.
I’ve only recently started following the NZ Health WebEOC blog, but it is exciting to see this sort of information sharing taking place. Congratulations to Charles and the team for the work involved. Today I found in their feed an article about the Ministry of Health suffering from the recent Conficker worm outbreak. There is more info here from Computerworld.
First, what is Conficker? From Wikipedia:
Conficker disables a number of system services such as Windows Automatic Update, Windows Security Center, Windows Defender and Windows Error Reporting. It then connects to a server, where it receives further orders to propagate, gather personal information, and downloads and installs additional malware onto the victim’s computer. The worm also attaches itself to certain critical Windows processes such as svchost.exe, explorer.exe and services.exe.
What is interesting is that the security hole Conficker exploits to gain control of the Windows operating system was plugged in a security patch released on 23 October 2008. That means, in theory, that all the systems compromised in the past week had not had that late-October patch applied. The security patch protecting against Conficker-like attacks for Windows 2000, Windows XP and Windows Server 2003 was marked as critical and should have been installed in a timely manner.
What are some lessons from an emergency management and business continuity perspective?
1. If you’re running Microsoft operating systems, you must keep them patched, and do it in a timely manner. Windows represents the largest near-homogenous family of operating systems in the world. This makes it the primary target for the developers of botnets and malicious software. Whilst I recognise that it takes time to deploy patches in a large organisation such as the Ministry of Health, an organisation will always be at risk if it doesn’t install security updates promptly. All Microsoft ‘Critical’ patches should be installed within weeks of release.
2. Where possible, organisations should attempt to diversify their installed base of operating systems. If you solely run Microsoft operating systems, then a single worm has the potential to take down the entire organisation. If you run a heterogeneous computing environment with a variety of operating systems (e.g. Windows, Unix and OS X), then any outbreak of malicious software will only directly impact some of the systems. In our small business I support all three of these platforms: we have Windows and OS X clients, and servers running Linux, OS X Server and OS X Desktop. This is one of the main reasons I refused to deploy Windows alone for clients and servers when setting up our business. Reliance on a homogeneous computing environment decreases overall IT resiliency.
3. Emergency Management Information Systems (EMIS) should ideally be able to be segregated from production systems. Malicious software doesn’t have to infect a system to have an impact on it. Even if the malicious software just consumes 100% of the network bandwidth, that is enough to create a continuity issue by denying access to critical systems – such as servers. Therefore, EMIS should really be configured on a separate network so that even if internal network bandwidth has been fully consumed, and access to the Internet severely restricted to limit the spread, critical systems can still be provided to the wider world. Network segmentation can be used to limit the impact upon critical systems. Direct access to the emergency network segment could be provided from network jacks in the EOC; once again, these should be on an entirely independent network segment to ensure that emergency operations can continue during an outbreak of malicious software on the main LAN.
Finally, emergency managers should also make themselves aware of the Centre for Critical Infrastructure Protection (CCIP), and consider signing up for vulnerability alert emails. These are sent out for critical advisories associated with information security risks, and can be good prompts for getting in touch with IT and making sure that your systems are patched and up-to-date.
Update 2009-01-27: I see that the Manager of the CCIP went public yesterday, saying the CCIP advised MOH of the security patch in October. The real question is whether the Ministry has custom applications installed on all its systems (including clients), or if they are just talking about server applications. If most of the desktops are only running Office and a groupware application such as Outlook or Notes, then they should have been relatively easy to patch before December. It is well recognised that patching servers running legacy applications takes longer, as complications must be tested for before patches are deployed.
After hearing about one of our GPS Society members losing their data in a computer malfunction tonight, I’ve decided to sit down and flesh out some thoughts on developing a good backup strategy for your computer(s). This is one of those ‘round tuit’ posts that I’ve been meaning to write since seeing people caught out by hard drive failures on the Digital Photography School forums.
The topic of developing a good backup strategy for your computer surely makes most people’s eyes glaze over. It is decidedly unsexy until such time as you need it – and of course, by then it is too late. I’m hoping to combine some of my IT, risk and emergency management knowledge to provide some insight into developing a suitably robust backup strategy.
If the consequence is lost data, what are the risks?
When developing a backup strategy, it is important to have a good understanding of how data can be lost – the risks – so that we can create a simple yet comprehensive plan to backup our data that accommodates the many different ways data can disappear.
So, let’s pick a few. I’ve named them L1–L5, where ‘L’ is for loss.
- L1 Loss of computer (e.g. theft, smoke or water damage; electrical surge from computer power supply)
- L2 Filesystem accidents – formatting of filesystem, deletion of files, data corruption
- L3 Malicious software – formatting, deletion, or encryption of files with an unknown key (e.g. encrypt and extort)
- L4 Mechanical failure of the hard drive (the dreaded clunking sounds)
- L5 Loss of home containing computer (e.g. fire, earthquake, flood, landslide)
Whilst not comprehensive, this includes a good range of different issues we may face where a backup would be very handy and save us a lot of time, and potentially money. If we can come up with something that protects us from these losses, we should be doing pretty well.
What we need to do now, is look at various means available to backup data, and then create a quick matrix comparing each type of backup, and which losses it may/may not protect us from.
Firstly, let’s identify a number of backup solutions.
- S1 Backup to CD/DVD/HD and store on site
- S2 Backup to external HD on site
- S3 Backup to internal HD
- S4 Backup to other computer at home
- S5 Backup to Internet (service or web host)
- S6 Backup to CD/DVD and store off site
- S7 Backup to HD and store off site
Now, not all these solutions are equal. What we need to investigate is which types of loss a given backup solution can protect against. I’ve created a sample table below to give you some idea of how it all comes together.
We start with a grid comparing types of loss and solutions. A green box means the solution generally prevents that type of loss, and red box that it generally doesn’t protect, and an orange one means that it may provide some protection.
Next, we compare solutions with various costs and constraints – in this grid green means it isn’t really a cost/constraint, red means it is a cost/constraint, and again, orange means it might be a cost/constraint.
In this example matrix, two points stand out.
- Backing up to an internal hard drive does not provide much protection against data loss.
- Quite a few backup solutions do not protect against major losses, such as the loss of a home from fire.
Additionally, every backup solution has a number of costs and/or constraints on its operation. The next step is to add some cells that identify the more common costs and constraints associated with each solution.
What we can see is that there is no single perfect solution. We could extend this further and add a grid outlining some of the benefits of each backup solution – they all have some – and this would also further educate us in the development of our backup strategy.
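The kind of comparison grid described above can also be sketched in code. This is a minimal sketch only – the solution names match the S-numbers above, but the ratings here are illustrative guesses, not a definitive scoring:

```python
# A sketch of the loss/solution matrix described in the text.
# Ratings are illustrative: "yes" = generally protects against that loss,
# "partial" = may provide some protection, "no" = generally doesn't.
# Losses: L1 computer, L2 filesystem, L3 malware, L4 drive failure, L5 home.

protection = {
    "S2 external HD on site": {"L1": "yes", "L2": "yes", "L3": "partial",
                               "L4": "yes", "L5": "no"},
    "S3 internal HD":         {"L1": "no", "L2": "partial", "L3": "no",
                               "L4": "partial", "L5": "no"},
    "S5 Internet backup":     {"L1": "yes", "L2": "yes", "L3": "partial",
                               "L4": "yes", "L5": "yes"},
    "S6 DVD off site":        {"L1": "yes", "L2": "yes", "L3": "yes",
                               "L4": "yes", "L5": "yes"},
}

def gaps(solution):
    """Return the loss types a solution doesn't fully protect against."""
    return [loss for loss, rating in protection[solution].items()
            if rating != "yes"]

for name in protection:
    print(name, "->", gaps(name) or "covers all listed losses")
```

Scoring the grid like this makes complementary pairs easy to spot: pick solutions whose `gaps` lists don’t overlap.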
Now, we’ll use this grid to look at selecting a couple of complementary backup solutions that cover each other’s weaknesses.
Personally, I’m a fan of backing up my home computer using Time Machine on a Mac to an external USB hard drive (effectively S2). As you can see from the matrix, this protects me against most of the common losses, except the rather catastrophic loss-of-home. Clearly then, I can select an Internet or off-site solution as well that will provide me with more complete data protection than just backing up to an external hard drive.
Quite a few people will look at the Internet backup option (S5) and think it looks pretty good, but be warned – there are some issues you may face, including the speed of your internet connection when backing up files to remote servers, ongoing service fees, and potential privacy risks from storing your files on a business’s remote server.
I’d recommend selecting solutions so that you can meet the following four requirements.
- You should have at least three copies of your data (source + two backups).
- At least one backup must be reasonably current and disconnected from the computer most of the time (except when a backup is being made).
- At least one backup file must be offsite.
- At least one backup should be incremental.
Ian (in the GPS forums) made a good point about incorporating incremental backups into the process. Broadly speaking, there are two types of backup: full (where everything is copied at once) and incremental (where only the files that have changed since the last backup are copied). With incremental backups, the first backup is a full backup, and incremental backups take place from there on. Time Machine is a good example of incremental backup software – every hour it backs up any changed files.
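As a minimal sketch of the incremental idea (not how Time Machine actually works – it uses hard-linked snapshots and much more careful bookkeeping), a script could copy only the files that are new or have a newer modification time than the backup copy:

```python
import os
import shutil

def incremental_backup(source, dest):
    """Copy only files that are new or changed since the last backup.

    A file is copied when it doesn't exist in the backup yet, or its
    modification time is newer than the backed-up copy. Real backup
    tools also handle deletions, permissions and snapshot history.
    """
    copied = []
    for root, _dirs, files in os.walk(source):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source)
            dst = os.path.join(dest, rel)
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst)):
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 preserves the mtime
                copied.append(rel)
    return copied
```

Because `copy2` preserves modification times, running the script a second time with no changes copies nothing – which is exactly the incremental behaviour described above.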
As I’m not that keen personally on online backups, I’d recommend one of the following as the minimum. There is nothing wrong with making more copies on CD/DVD media to supplement the main backup solutions.
- external hard drive onsite + DVD media offsite (affordable setup)
- external hard drive onsite + external hard drive offsite (same sizes, switch them once a week or month, expensive setup)
- synchronise files between two home computers on network + external hard drive offsite (utilise existing hardware and provide backups of both computers)
There are three other tips to provide as well:
- If you use backup software, keep a copy of the install media (and licence key if appropriate) with the backups
- If you need quick access to data upon failure, make sure that at least one of your backups uses a widely accessible filesystem on an external hard drive (CDs/DVDs are good, as they generally use filesystems readable on any computer). This means you can literally plug them in and access key files without having to perform a software installation and full restore
- AND TEST THAT YOU CAN ACCESS BACKED-UP DATA and/or RESTORE FROM BACKUPS
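Testing can be partly automated. As a hedged sketch (the function names here are my own, not from any particular tool), a script could compare checksums of your source files against a mounted backup and report anything missing or different:

```python
import hashlib
import os

def file_digest(path):
    """SHA-256 of a file, read in chunks to handle large files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(source, backup):
    """Return relative paths whose backup copy is missing or differs."""
    problems = []
    for root, _dirs, files in os.walk(source):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source)
            dst = os.path.join(backup, rel)
            if not os.path.exists(dst) or file_digest(src) != file_digest(dst):
                problems.append(rel)
    return problems
```

An empty result means every source file has a byte-identical copy in the backup; anything listed is a file you’d discover was missing only when you needed it.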
Finally, as you should have a GetAway Kit for natural disasters and the like, in addition to your other important paper information such as identification, policies and photos, you should also include a backup of your data in the kit. If you haven’t got a GetAway Kit, then now is a good time to learn how to get ready!
As I write this, my home Mac has just about finished receiving a hard drive upgrade. I had approached the 250GB hard drive’s limit and needed something far bigger – mainly for photos and movies from my camera.

I had read a few blogs describing the upgrade process: make sure the Time Machine backup is up-to-date, then remove the old hard drive, install the new one, boot the OS X installer DVD, format the drive and restore from backup. It sounded like a great option, so I thought I’d give it a go and install a nice shiny 1TB hard drive.

One of those blog posts noted that the restore process was extremely slow. My restore was quick, with around 230GB restored in around 2.5 hours. The estimated time was a lot longer initially; I think this was because the system files were copied first, and these included a lot more small files, which have more overhead associated with reading and writing than, say, a digital photo or movie – transferring one 10MB file is faster than transferring 1000 1KB files.

It is just booting now, so I’ll know very soon whether it worked or not… stunning, it appears to have worked perfectly. Now I just have to wait for Spotlight to re-index everything. But other than that, it was a painless upgrade.

It also served the excellent dual purpose of actually testing that the restore process works. Very useful!

Now I need to look at setting Time Machine to back up to multiple drives. It sounds like it can be done.