Not long ago, we’ve created a pull request for the official YARA repository on Github, that would introduce new functions in the `math` module to improve the flexibility in cases in which a sample is heavily scrambled or obfuscated. These cases require further statistical evaluations that go beyond the currently available “entropy”, “mean” or “deviation” functions.
The example on the right shows a heavily obfuscated PHP web shell, as used by a Chinese actor.
You immediately notice the high amount of “%” characters, but since each of them is preceded and followed by different characters, it’s difficult to find atoms that are long enough to maintain an acceptable performance / stability of that rule.
If you could, you would formulate a rule like this: “Detect files smaller 400 bytes, that begin with ‘<?’ and consist of at least 25 percent ‘%’ characters”.
Well, the new module extension allows you to do exactly that.
Read the documentation provided with the pull request for details on all three new functions:
- count(byte/string, offset, size)
- percentage(byte, offset, size)
- mode(offset, size)
While the first two functions are self-explanatory, the “mode” function isn’t. It is is a term used in statistics for the most common value.
For your convenience, we’ve already patched our versions of THOR TechPreview and THOR Lite to support these extensions of the “math” module. You need at least v10.6.6 to use the new function in your rules.
We wish you good hunting.