An Incident Post-Mortem…

I thought it would be fun to post an abbreviated post-mortem we did on a compromised computer from a few years ago. I’ve tried to strike a balance on detail so that a variety of audiences could understand this, but I’m sure the more technical among you would like more details (ask me, I’ll share) and this abbreviated version might even be too detailed for others. My main reason for posting this is to talk about perspectives on compromised computers and anti-malware software, but we could also talk about forensic techniques, log analysis, timeline reconstruction and so on.  I think I’ll save the detailed forensic, log analysis, and timeline discussion for a later post.  

Let’s start with the results of the analysis, which you can find at Incident Analysis (shortest).  This is an abbreviated version of a timeline analysis of a compromised computer from a few years back – I’ve anonymized the data but kept the timestamps.   

In an investigation like this we typically construct a timeline composed of chronological data pulled from a variety of sources: file system metadata, log files from the affected system and other systems, network traffic logs and so on – anything that might help reconstruct the chronological history surrounding the event in question (this is what forensic specialists are calling a “super timeline” these days; we didn’t use to have a special name for it :-).  This original data might consist of millions of events (or more, sigh).  In this case I focused mostly on evidence from the local system and our IDS logs over a relatively short slice of time (a few hours).  When you make maple syrup you boil 40 gallons of sap to make a gallon of syrup.  In a similar fashion, when you do forensic timeline reconstruction you “boil” hundreds of thousands of events down to a few thousand events, and might even take it further – in this case this is the 3rd “boil” from my original set of events, and we’re down to what’s essentially hand-written commentary about what I observed in the logs from the 2nd boil, as it were.

The “big picture” you should be looking for in the analysis I linked to above is that a computer got infected through vulnerable software, and this led to a cascade of malware being installed on the system.  Some of this malware was caught and stopped by the anti-malware software but most was not, and most importantly, the main downloader was not.  This ultimately led to the computer sending out large quantities of spam.

One question that I wanted to address is how we should view things when our anti-malware software blocks something bad.  I think it’s fairly common for people to think “good, my anti-malware software just saved me from that malware” and in some cases, that might be correct.  But I’ll make the following observations…

Anti-malware software doesn’t catch everything.  Anti-malware software on this computer obviously detected some things in this example but it completely missed others.  The miscreants know this, and actively seek to avoid detection.  Fresh malware is typically not detected by many anti-malware products, and it often takes days or weeks for the anti-malware vendors to start detecting it.

Just because your anti-malware product detected and blocked/cleaned/deleted/quarantined something doesn’t mean that you’re safe.  I think it’s best to regard anti-malware as a detection mechanism rather than as a preventative measure.  Yes, with luck, sometimes (most of the time?  some of the time?  rarely?) it will prevent malware from taking root on your machine.  But if it detects and blocks something, how do you know whether it missed anything?  Did it catch the first-stage downloader or some later stage?  If it missed the first stage, what else did it miss?

I’m not arguing that anti-malware is useless.  Obviously some things were detected by anti-malware, and well before we noticed anything at the network level through network-based intrusion detection.  I’m a belt-and-suspenders sort of security guy; I think it’s prudent to use multiple layers of detection/prevention.  Defense in depth, etc.

Reinstallation is better than disinfecting.  Good luck cleaning up from something like this.  How would you know whether you got everything?  Granted, in some cases it’s possible, but (a) how do you know you’ve gotten everything when you’re likely using the compromised system to investigate itself and do the fixing (go read Thompson’s “Reflections on Trusting Trust” 🙂) and (b) for the sake of “plain old” disaster recovery (disk failure, theft, building caught fire, etc.) you probably should have some sort of capability for quickly restoring systems in the event of a disaster – and if you do, you can probably restore a system quite quickly and painlessly.  If you’re disinfecting rather than reinstalling because reinstalling is too hard, I’d suggest that you need to work on your disaster recovery procedures.  Apart from ensuring that you’ve removed all of the malware, you also have to worry about the configuration changes that the malware might have made to the system – did you find and fix all of those?

The missing piece: finding the root cause(s) and fixing the exploitable vulnerability.  In this case the system had a vulnerable version of Adobe Reader installed, and this was used to perform the initial infection via malicious PDF files.  But I don’t think it’s common for people to do much of a root cause analysis (RCA) on compromised computers – you frequently hear them talking about disinfecting or reinstalling them, but if no RCA is done and the initial point of entry is unknown, those computers are just targets waiting to be infected again.  Was it a missing patch?  An old version of Java left behind when you updated it?  Was the computer user browsing the web using an account with administrative privileges?  Did they fall for a tech support phone scam?  Find the exploitable vulnerability and fix it (and fix it on the other affected systems as well)!
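
For the more technical among you, here is a rough sketch in Python of the timeline “boiling” I described earlier: merge events from several sources into one chronological stream, then keep only a slice of time and the events that look interesting.  This is not the tooling used for the actual analysis; the Event type, the function names, the keywords, and the sample events are all made up for illustration, and a real timeline would be pulled from file system metadata and logs rather than typed in by hand.

```python
import heapq
from datetime import datetime, timedelta
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    """One timeline entry (hypothetical structure, for illustration only)."""
    when: datetime
    source: str
    detail: str


def merge_timeline(*sources: Iterable[Event]) -> Iterator[Event]:
    """Merge per-source event streams, each already sorted by time, into one."""
    return heapq.merge(*sources, key=lambda e: e.when)


def boil(events: Iterable[Event], start: datetime, end: datetime,
         keywords: Iterable[str]) -> Iterator[Event]:
    """Keep events inside [start, end] whose detail mentions any keyword."""
    wanted = [k.lower() for k in keywords]
    for e in events:
        if start <= e.when <= end and any(k in e.detail.lower() for k in wanted):
            yield e


# Invented sample data: timestamps, file names and messages are placeholders.
t0 = datetime(2000, 1, 1, 12, 0, 0)
filesystem = [
    Event(t0 + timedelta(seconds=5), "filesystem", "created temp file example.tmp"),
    Event(t0 + timedelta(seconds=90), "filesystem", "created example.exe"),
]
antimalware = [
    Event(t0 + timedelta(seconds=30), "antimalware", "Quarantined example.tmp"),
]
ids = [
    Event(t0 + timedelta(hours=5), "ids", "outbound SMTP burst"),
]

timeline = list(merge_timeline(filesystem, antimalware, ids))
boiled = list(boil(timeline, t0, t0 + timedelta(hours=1), ["quarantined", ".exe"]))

for e in boiled:
    print(e.when.isoformat(), e.source, e.detail)
```

The merge step assumes each source is already in time order, which is usually true of log files but not necessarily of file system metadata, so in practice you may need to sort each source first.  Keyword filtering is also only the crudest kind of “boil”; the later passes are mostly a human reading the events and deciding what matters.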