This is one of the YARA related blog posts showcasing a special use case. Last year I noticed that I wrote many rules for hex encoded strings found in OLE objects embedded in MS Office documents and RTF files.
I did most of the encoding and decoding work on the command line or with the help of CyberChef, an online tool provided by GCHQ. I also thought about a new YARA keyword that would allow us to write rules without encoding the strings.
Today, rules contain strings in a hex encoded form. I usually add the decoded string as a comment.
$s1 = "68007400740070003a002f002f00" /* http:// */
Rules with the new keyword would look like this:
$s1 = "http://" wide hex
Neat, isn’t it? I already forwarded that feature request to Wesley Shields (@wxs) but it seems to be no low hanging fruit. I’ll keep you informed about this feature via Twitter.
A tweet by Kevin Beaumont reminded me of the work that I’ve done and while looking at the tool by Rich Warren. I thought that I should create a illustrative example of a more generic YARA rule that explains why the “hex” keyword would be very useful.
The tool creates weaponized RTF files with hex encoded payloads.
I derived some strings for a new rule from the decoded object.
/* Hex encoded strings */ /* This program cannot be run in DOS mode */ $a1 = "546869732070726f6772616d2063616e6e6f742062652072756e20696e20444f53206d6f6465" ascii /* C:fakepath */ $a2 = "433a5c66616b65706174685c" ascii
To further improve the rule I went to my goodware directory and ran the following command to generate a list of the most frequent PE file headers in a hex encoded form.
neo$ find ./ -type f -name "*.exe" -exec xxd -ps -l 14 {} ; | sort | uniq -c | sort -k 1 | tail -10 4 4d5a87000300000020000000ffff 4 4d5aae010300000020000000ffff 4 4d5abf000300000020000000ffff 4 4d5add000300000020000000ffff 4 4d5aeb000300000020000000ffff 6 213c73796d6c696e6b3e2f757372 8 4d5a72010200000020001700ffff 88 4d5a40000100000006000000ffff 116 4d5a50000200000004000f00ffff 5852 4d5a90000300000004000000ffff
Then I used these hex encoded strings in a YARA rule that looks for these strings in the OLE objects of an RTF file.
rule MAL_RTF_Embedded_OLE_PE { meta: description = "Detects a suspicious string often used in PE files in a hex encoded object stream" author = "Florian Roth" reference = "https://github.com/rxwx/CVE-2018-0802/blob/master/packager_exec_CVE-2018-0802.py" date = "2018-01-22" strings: /* Hex encoded strings */ /* This program cannot be run in DOS mode */ $a1 = "546869732070726f6772616d2063616e6e6f742062652072756e20696e20444f53206d6f6465" ascii /* KERNEL32.dll */ $a2 = "4b45524e454c33322e646c6c" ascii /* C:fakepath */ $a3 = "433a5c66616b65706174685c" ascii /* DOS Magic Header */ $m3 = "4d5a40000100000006000000ffff" $m2 = "4d5a50000200000004000f00ffff" $m1 = "4d5a90000300000004000000ffff" condition: uint32be(0) == 0x7B5C7274 /* RTF */ and 1 of them }
The first analysis of the coverage looks pretty good. I see only clear matches in munin‘s output.
The few questionable matches look fishy enough to release my rule.
If you have further ideas to improve the rule, ping me via Twitter.