Thoughts about Malware Scanning


Posted by Ed on Nov 1, 2005 in Analysis

I recently reached out to Alex Wheeler, whom I met at a job interview a number of years ago and who impressed me with his energy, intelligence, and drive. I noticed that he was doing quite a bit of research in the AV space, which really got me thinking about the state of malware scanning in the industry today…

In general, most current AV technologies are based on pattern matching. Basically, they’re like ‘grep’, except with a fancy UI and signature update capability. Not that that’s a bad thing, mind you. There were products (e.g. Thunderbyte) back in the day that worked in other ways, but the market chose the ‘grep approach’ over other detection methods. The reason was that heuristic and behavioral scanners gave more false readings: false positives (warning about things that are not malware) and false negatives (not warning about things that are malware). The grep-based products had a higher rate of accuracy. That accuracy is one of the strong points of pattern matching, and therefore was historically a strength of signature-based AV scanning.
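To make the ‘grep’ comparison concrete, here is a minimal sketch of what signature matching boils down to. The signature names and byte patterns are made up for illustration, and real products use far more efficient multi-pattern matching than this naive loop, so treat it as a picture of the idea rather than how any particular scanner works.

```python
# Hypothetical signature database: name -> byte pattern.
# These entries are invented for illustration only.
SIGNATURES = {
    "Example.Virus.A": b"\xde\xad\xbe\xef",
    "Example.Trojan.B": b"EVIL_PAYLOAD",
}


def scan_bytes(data, signatures=SIGNATURES):
    """Return the sorted names of all signatures whose pattern occurs in data."""
    # One substring search per signature: the work grows with both the
    # size of the data and the number of patterns.
    return sorted(name for name, pattern in signatures.items() if pattern in data)


def scan_file(path, signatures=SIGNATURES):
    """Read a whole file into memory and scan it (fine for a sketch)."""
    with open(path, "rb") as handle:
        return scan_bytes(handle.read(), signatures)
```

Calling `scan_bytes(b"xxEVIL_PAYLOADyy")` reports the one matching signature, while clean data returns an empty list. There is no judgment call involved, which is where the accuracy comes from.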

In my opinion, there is an equation at work that doesn’t bode well for scanning-based AV in the long term. The equation is:

scantime = (datavolume * numberofpatterns) / (cpuspeed + iospeed)

Why this is a problem:

- “Data Volume”, the amount of data scanned (number and size of files), is increasing exponentially.
- “Number of Patterns”, the number of malware signatures that products must process per file, is increasing exponentially.
- CPU speed is increasing linearly (Moore’s law)
- IO speed is increasing linearly

This means that the time needed to conduct a scanning operation is increasing at an exponential rate. Since AV performance is tied directly to scanning time, all signature-based products will become exponentially slower over time. It’s already started; Norton AV is exponentially slower today than it was yesterday, and exponentially slower than it was the day before that. This isn’t a huge problem today, but it will be: we’ve seen only the “flat part” of the growth curve. And as anybody who’s used Norton recently can attest, the performance of these products is rapidly becoming a factor.
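A rough sketch of the equation in code shows the shape of the curve. The growth rates below are placeholders I picked for illustration, not measurements, and the model simply takes the assumptions listed above at face value (exponential data and pattern growth, linear CPU and IO growth). All quantities are in arbitrary units.

```python
def scan_time(data_volume, number_of_patterns, cpu_speed, io_speed):
    """The equation from the post, in arbitrary units."""
    return (data_volume * number_of_patterns) / (cpu_speed + io_speed)


def project(years, data_growth=1.5, pattern_growth=1.5, cpu_step=1.0, io_step=1.0):
    """Project relative scan time per year under assumed growth rates.

    data_growth and pattern_growth are yearly multipliers (exponential);
    cpu_step and io_step are yearly increments (linear). All four values
    are illustrative assumptions, not measured figures.
    """
    times = []
    for year in range(years + 1):
        data_volume = data_growth ** year
        number_of_patterns = pattern_growth ** year
        cpu_speed = 1.0 + cpu_step * year
        io_speed = 1.0 + io_step * year
        times.append(scan_time(data_volume, number_of_patterns, cpu_speed, io_speed))
    return times


if __name__ == "__main__":
    for year, value in enumerate(project(10)):
        print(f"year {year:2d}: relative scan time {value:8.2f}")
```

With these placeholder rates the first few years barely move, which is the “flat part”, and then the numerator runs away from the denominator. Different rates shift when the curve bends, not whether it does.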

There are some notable exceptions to the rule; CA, SpyBot S&D, and others use an intelligent spyware scanning methodology to “cheat” the equation by minimizing the data that needs to be scanned. Non-signature products like Savant and IBM’s AXE are making a resurgence.

I guess time will tell.
