Not long ago, we created a pull request for the official YARA repository on GitHub that introduces new functions in the `math` module to improve flexibility in cases where a sample is heavily scrambled or obfuscated. These cases require statistical evaluations that go beyond the currently available “entropy”, “mean” or “deviation” functions.
The example on the right shows a heavily obfuscated PHP web shell, as used by a Chinese actor.
You immediately notice the high number of “%” characters, but since each of them is preceded and followed by different characters, it’s difficult to find atoms that are long enough to keep the rule’s performance and stability acceptable.
If you could, you would formulate a rule like this: “Detect files smaller than 400 bytes that begin with ‘<?’ and consist of at least 25 percent ‘%’ characters”.
Well, the new module extension allows you to do exactly that.
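To make the intended condition concrete, here is a minimal Python sketch of the same check outside of YARA. The function name `matches_percent_heavy_php` and the sample bytes are made up for illustration. The thresholds are the ones from the sentence above, and this is not how YARA evaluates a rule internally.

```python
def matches_percent_heavy_php(data: bytes) -> bool:
    """Hypothetical helper mirroring the rule described in the text."""
    # Files smaller than 400 bytes (an empty file can never match)
    if not data or len(data) >= 400:
        return False
    # File begins with '<?'
    if not data.startswith(b"<?"):
        return False
    # At least 25 percent of all bytes are '%' (0x25)
    return data.count(b"%") / len(data) >= 0.25


# Made-up sample, not the web shell from the screenshot
sample = b"<?" + b"%a%b%c" * 20
print(matches_percent_heavy_php(sample))
print(matches_percent_heavy_php(b"<?php echo 'hello'; ?>"))
```

Note that none of the three checks depends on a fixed string longer than two bytes, which is exactly what makes this kind of sample hard to catch with ordinary string atoms.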
Read the documentation provided with the pull request for details on all three new functions:
- count(byte/string, offset, size)
- percentage(byte, offset, size)
- mode(offset, size)
While the first two functions are self-explanatory, the “mode” function isn’t. “Mode” is a term used in statistics for the most common value; here, that is the most frequent byte value in the given range.
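What the three functions compute can be illustrated with a few lines of Python. This is only a sketch of the statistics over a byte range: the names mirror the list above, but the explicit `data` argument, the handling of `offset` and `size`, the tie-breaking in `mode`, and `percentage` returning a fraction rather than a number out of 100 are assumptions made here. Check the documentation in the pull request for the actual behaviour.

```python
from collections import Counter


def count(byte: int, data: bytes, offset: int, size: int) -> int:
    """Number of occurrences of `byte` in data[offset:offset + size]."""
    return data[offset:offset + size].count(bytes([byte]))


def percentage(byte: int, data: bytes, offset: int, size: int) -> float:
    """Share of `byte` in the range, as a fraction between 0.0 and 1.0."""
    window = data[offset:offset + size]
    return window.count(bytes([byte])) / len(window) if window else 0.0


def mode(data: bytes, offset: int, size: int) -> int:
    """Most common byte value in the range (assumes the range is not empty)."""
    return Counter(data[offset:offset + size]).most_common(1)[0][0]


data = b"%a%b%c%d"
print(count(0x25, data, 0, len(data)))       # 4
print(percentage(0x25, data, 0, len(data)))  # 0.5
print(hex(mode(data, 0, len(data))))         # 0x25
```

In the made-up input, every second byte is “%” (0x25), so it is both half of the range and the mode.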