Write YARA Rules to Detect Embedded EXE Files in OLE Objects

This is the first blog post published on our new website. If you followed my blog on www.bsk-consulting.de you should consider subscribing to the RSS feed of this blog or the “Nextron Systems Newsletter”.

This is one of the YARA related blog posts showcasing a special use case. Last year I noticed that I wrote many rules for hex encoded strings found in OLE objects embedded in MS Office documents and RTF files.

I did most of the encoding and decoding work on the command line or with the help of CyberChef, an online tool provided by GCHQ. I also thought about a new YARA keyword that would allow us to write rules without encoding the strings.

Today, rules contain strings in a hex encoded form. I usually add the decoded string as a comment.

$s1 = "68007400740070003a002f002f00" /* http:// */

Rules with the new keyword would look like this:

$s1 = "http://" wide hex

Neat, isn’t it? I already forwarded that feature request to Wesley Shields (@wxs) but it seems to be no low hanging fruit. I’ll keep you informed about this feature via Twitter.

A tweet by Kevin Beaumont reminded me of the work that I’ve done and while looking at the tool by Rich Warren. I thought that I should create a illustrative example of a more generic YARA rule that explains why the “hex” keyword would be very useful.

The tool creates weaponized RTF files with hex encoded payloads.

I derived some strings for a new rule from the decoded object.

/* Hex encoded strings */
/* This program cannot be run in DOS mode */
$a1 = "546869732070726f6772616d2063616e6e6f742062652072756e20696e20444f53206d6f6465" ascii
/* C:fakepath */
$a2 = "433a5c66616b65706174685c" ascii

To further improve the rule I went to my goodware directory and ran the following command to generate a list of the most frequent PE file headers in a hex encoded form.

neo$ find ./ -type f -name "*.exe" -exec xxd -ps -l 14 {} ; | sort | uniq -c | sort -k 1 | tail -10
4 4d5a87000300000020000000ffff
4 4d5aae010300000020000000ffff
4 4d5abf000300000020000000ffff
4 4d5add000300000020000000ffff
4 4d5aeb000300000020000000ffff
6 213c73796d6c696e6b3e2f757372
8 4d5a72010200000020001700ffff
88 4d5a40000100000006000000ffff
116 4d5a50000200000004000f00ffff
5852 4d5a90000300000004000000ffff

Then I used these hex encoded strings in a YARA rule that looks for these strings in the OLE objects of an RTF file.

rule MAL_RTF_Embedded_OLE_PE {
   meta:
      description = "Detects a suspicious string often used in PE files in a hex encoded object stream"
      author = "Florian Roth"
      reference = "https://github.com/rxwx/CVE-2018-0802/blob/master/packager_exec_CVE-2018-0802.py"
      date = "2018-01-22"
   strings:
      /* Hex encoded strings */
      /* This program cannot be run in DOS mode */
      $a1 = "546869732070726f6772616d2063616e6e6f742062652072756e20696e20444f53206d6f6465" ascii
      /* KERNEL32.dll */
      $a2 = "4b45524e454c33322e646c6c" ascii
      /* C:fakepath */
      $a3 = "433a5c66616b65706174685c" ascii
      /* DOS Magic Header */
      $m3 = "4d5a40000100000006000000ffff"
      $m2 = "4d5a50000200000004000f00ffff"
      $m1 = "4d5a90000300000004000000ffff"
   condition:
      uint32be(0) == 0x7B5C7274 /* RTF */
      and 1 of them
}

The first analysis of the coverage looks pretty good. I see only clear matches in munin‘s output.

The few questionable matches look fishy enough to release my rule.

If you have further ideas to improve the rule, ping me via Twitter.