50 Shades of YARA

A long time ago I noticed that there is no single best YARA rule for a given sample. The best solution depends on the user’s requirements and use case. I often create two or three YARA rules for a single sample that I process, and each of them serves a different purpose.

In this blog post, I’d like to describe the three most common rule types.

In the following example I’ll use the malware sample with hash 7415ac9d4dac5cb5051bc0e0abff69fbca4967c7 (VirusBay, Hybrid-Analysis).

Looking at the strings that yarGen extracts from this sample, you’ll notice that the output contains a lot of interesting strings. In my past tutorials (1, 2, 3) I’ve always distinguished between “Highly Specific” and “Suspicious” strings (see Part 3 of the blog post series). Today I’d like to show you a more purpose-oriented approach.

The following screenshot shows the types of strings I see in that output:

The strings that are marked with yellow look very specific. I’d use them as “Highly Specific” strings ($x*) of which only a single one is required to trigger the rule: 1 of ($x*)

The strings marked green will be used in combination with other green strings. A reasonable set of these strings is required to trigger the rule: $u1 and 1 of ($f*)

The strings marked red could serve in a rule that tracks the C2 addresses used by this sample. The strings marked blue could be used for a generic detection of malicious samples that may be completely unrelated to this one.
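As a minimal sketch, the yellow and green string groups described above could be combined into one regular rule like this. All strings and the rule name are hypothetical placeholders, not strings taken from the sample, and the file size limit is an assumption.

```yara
rule MAL_Example_Family_Sketch {
   meta:
      description = "Sketch only: placeholder strings, not taken from the sample discussed above"
   strings:
      /* "Highly specific" strings (yellow): a single match is enough */
      $x1 = "ExampleBackdoor v1.2 by someauthor" ascii wide
      $x2 = "C:\\Projects\\ExampleBackdoor\\Release\\loader.pdb" ascii

      /* Strings that only count in combination (green) */
      $u1 = "Mozilla/4.0 (compatible; ExampleAgent)" ascii
      $f1 = "cmd.exe /c del %s" ascii
      $f2 = "%s\\svch0st.tmp" ascii
      $f3 = "upload failed: %d" ascii
   condition:
      uint16(0) == 0x5a4d and filesize < 2MB and (
         1 of ($x*) or
         ( $u1 and 1 of ($f*) )
      )
}
```

The two branches of the condition mirror the two expressions above: one very specific string triggers the rule on its own, while the weaker strings only trigger it as a set.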

The different rule categories are:

  • Regular Rules: Detect a certain malware or malware family 
  • Threat Intel Tracking Rules: Detect specific indicators that relate to a certain actor
  • Method Detection Rules: Detect methods or anomalies 

The following table describes these three different types of rules and gives some string examples. 

Regular Rules

In the case of the “Regular Rules” I distinguish between two different flavors: 

  • Threat Detection Rules
  • Threat Hunting Rules

The difference between these flavors lies in how strict the conditions are, not in which strings are selected. While a “threat detection” rule may require “6 of them”, a “threat hunting” rule may be satisfied with “3 of them”, accepting some false positives.
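The following sketch shows the idea with two rules that share the same strings and differ only in the threshold. The strings and rule names are hypothetical placeholders chosen for illustration.

```yara
rule MAL_Example_Family_Detection_Sketch {
   meta:
      description = "Sketch only: strict condition, suitable for automated responses"
   strings:
      $s1 = "ExampleAgent/1.0" ascii
      $s2 = "cmd.exe /c del %s" ascii
      $s3 = "%s\\svch0st.tmp" ascii
      $s4 = "upload failed: %d" ascii
      $s5 = "keylog.dat" ascii wide
      $s6 = "screenshot_%04d.jpg" ascii
      $s7 = "inject into %s ok" ascii
   condition:
      uint16(0) == 0x5a4d and 6 of them
}

rule HUNT_Example_Family_Sketch {
   meta:
      description = "Sketch only: relaxed condition, matches are reviewed by an analyst"
   strings:
      $s1 = "ExampleAgent/1.0" ascii
      $s2 = "cmd.exe /c del %s" ascii
      $s3 = "%s\\svch0st.tmp" ascii
      $s4 = "upload failed: %d" ascii
      $s5 = "keylog.dat" ascii wide
      $s6 = "screenshot_%04d.jpg" ascii
      $s7 = "inject into %s ok" ascii
   condition:
      uint16(0) == 0x5a4d and 3 of them
}
```

Every file that matches the detection rule also matches the hunting rule, so the hunting rule widens the net without changing the string selection.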

The reason for distinguishing between “threat detection” and “threat hunting” rules is that the response to a match can be very different. Antivirus solutions that respond to matches with “delete” or “disinfect” actions cannot accept false positives and avoid them by any means.

In “threat hunting”, use cases that involve a direct destructive reaction to a signature match are rare. Typically, analysts investigate such an event, classify it and react to it manually. In “threat hunting” scenarios, analysts try to avoid false negatives by all means.

(Source: Chris Gerritz, @gerritzc)

Threat Intel Tracking

In threat intel, we can use YARA rules to track the activity of certain actors in cases in which there are certain characteristics or keywords that persist over longer periods and campaigns. 

YARA match notification services, as provided by VirusTotal or ReversingLabs, offer a very convenient form of tracking that does not require access to the telemetry data of OS and AV vendors.
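A tracking rule of this kind could look like the following minimal sketch. The indicators are placeholders from reserved documentation ranges and the keyword is hypothetical; none of them relate to a real actor.

```yara
rule APT_Tracking_Example_Actor_Sketch {
   meta:
      description = "Sketch only: tracks placeholder indicators, not real actor infrastructure"
   strings:
      /* C2 addresses (placeholders from reserved example ranges) */
      $c1 = "update.example.com" ascii wide
      $c2 = "203.0.113.45" ascii wide

      /* Keyword that persists across campaigns (hypothetical) */
      $k1 = "\\exampleactor\\builds\\" ascii nocase
   condition:
      filesize < 5MB and 1 of them
}
```

Such a rule is deliberately loose, since every match is reviewed by an analyst rather than acted on automatically.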

Method Detection Rules

During the past year I focussed on the last rule type, “Method Detection”, whenever I had the opportunity, as it allows me to provide very generic rules that produce amazing results with a minimum of false positives.

However, those rule matches lack a reference like a malware name or an adversary group that used the detected method in their samples. Here is an example with one of the few public YARA rules published in the “signature-base” repository:

Sample: fc18bc1c2891b18bfe644e93c60a2822ad367a697bebc8c527bc9f14dad61db5 

The comment tab shows a match with the generic rule “SUSP_LNK_SuspiciousCommands”. No reference is given. The Antivirus detection ratio is low.

You can find more matches with this rule on VirusTotal using the search function. URL: https://www.virustotal.com/#/search/lnk
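To illustrate what a method detection rule can look like, here is a minimal sketch for Windows shortcut files. It is not the SUSP_LNK_SuspiciousCommands rule from signature-base; the keyword list and the file size limit are assumptions chosen for illustration.

```yara
rule SUSP_LNK_Command_Sketch {
   meta:
      description = "Sketch only: Windows shortcut that references script or download tools"
   strings:
      $c1 = "powershell" ascii wide nocase
      $c2 = "mshta" ascii wide nocase
      $c3 = "certutil" ascii wide nocase
      $c4 = "bitsadmin" ascii wide nocase
      $c5 = "regsvr32" ascii wide nocase
   condition:
      uint32(0) == 0x0000004c and   /* LNK header size field */
      uint32(4) == 0x00021401 and   /* first four bytes of the LinkCLSID */
      filesize < 20KB and
      1 of ($c*)
}
```

The rule says nothing about a malware family or actor. It only describes a method, which is why its matches come without a reference.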

Conclusion

These are the reasons why the analysis of a single sample often results in 2-3 different YARA rules.

Using this method, the coverage is exceptionally good, as the set of rules covers specific samples of the same family as well as different malware families that use the same methods.

YARA Rule Sets and Rule Feed

As previously announced, our YARA rule packs and feeds will be available in March/April 2019. We’ve put a lot of effort into an internal system named “Mjolnir” that parses, normalizes, filters, tags and automatically modifies our rule base, which contains more than 9000 YARA rules.

This system will now fill a database of tagged YARA rules – the basis of our new YARA services. 

The services will be divided into two categories:

  • YARA Rule Set
  • YARA Rule Feed

YARA Rule Set

The YARA rule set consists of more than 7000 YARA rules of different categories that are used in our scanners.

Some of our rules use extensions (external variables) that are only usable in our scanner products. These rules, as well as experimental, third-party and other classified rules, will not be part of the purchasable rule set.

YARA Rule Feed 

The YARA rule feed is a subscription to our rules. The feed always contains the rules of the last 90 days, which amounts to between 250 and 400 YARA rules.

Rule Samples

The quality of the rules in the rule set is comparable to that of the rules in our public “signature-base” repository.

Some good examples for the different rule categories are:

Quality and Focus

The rules are tested against a data set of more than 350 TB of goodware. The goodware file repository consists of Windows OS files, several full Linux distributions and a big collection of commercial and free software. 

However, false positives are always possible. We do not recommend any destructive action on a signature match, such as deleting or blocking.

The main focus areas of our rules are:

  • Threat Hunting
  • Classification
  • Anomaly Detection
  • Compromise Assessment 


Short Tutorial: How to Create a YARA Rule for a Compromised Certificate

Working in incident response or malware analysis, you may have come across compromised and sometimes revoked certificates used to sign malware of different types. Often threat groups use stolen certificates to sign their malware.

I’d like to show you an easy way to create a YARA rule for such a certificate. We will look at a sample that many Antivirus engines on VirusTotal have marked as malware and for which the “Details” tab shows a revoked certificate. That’s a good indicator for a compromised certificate that has been, and sometimes still is, used by threat groups to sign their binaries.

Sample: ee5340b2391fa7f8d6e22b32dcd48f8bfc1951c35491a1e2b4bb4ab2fcbd5cd4

Let’s look at the details. I recommend creating a YARA rule that uses the “pe” module and integrates both the serial number and the issuer of the certificate, so that the rule is unambiguous.

import "pe"

rule MAL_Compromised_Cert_Nov18_1 {
   meta:
      description = "Detects a compromised certificate of CORP 8 LIMITED - identified in November 2018"
      date = "2018-11-01"
      hash = "ee5340b2391fa7f8d6e22b32dcd48f8bfc1951c35491a1e2b4bb4ab2fcbd5cd4"
   condition:
      // PE file (MZ header)
      uint16(0) == 0x5a4d and
      // At least one signature must match both the issuer and the serial number
      for any i in (0 .. pe.number_of_signatures) : (
         pe.signatures[i].issuer contains "COMODO RSA Code Signing CA" and
         pe.signatures[i].serial == "4c:75:75:69:2c:2d:06:51:03:1a:77:ab:49:22:4c:cc"
      )
}

As you can see, you need to copy two strings from VirusTotal’s web page:

Copy the CA name and use it for the “.issuer” condition, then copy the serial number and use it for the “.serial” condition. Make sure that you change the serial number to lower case, as YARA does not expect or understand uppercase characters in the serial field.

VirusTotal Intelligence users can use the following hunting rule to detect newly uploaded malicious samples with revoked certificates:

rule Compromised_Certificate {
   condition:
      // New files, detected by more than 30 engines and revoked certificate
      new_file and positives > 30 and tags contains "revoked-cert"
}

YARA Rule Creation Crackme

I’ve collected some interesting samples for an internal YARA rule creation training session with our interns. With this blog post, I’ll also share 3 new premium feed YARA rules by pushing them to the Open Source signature-base repo.

What are the preliminary conditions for the rule creation?

  • We don’t want to spend more than 20 minutes on a single rule.
  • We use string extraction, hex editors and yarGen
  • We also use public resources like Google (yes) and malware.one

Requirements:

  • You need a Virusbay account to download the samples

So, get ready. We process the following 3 cases.

Turla Agent-BTZ

  • Great for yarGen string extraction
  • Especially check for variations of strings (in PE header) that are highly specific
  • Use Google to check strings

Sample

PLEAD Downloader

  • yarGen will not produce good results in this case
  • Try to compare the samples in order to find specific strings that appear in all of them

Sample 1

Sample 2

Sample 3

Sample 4

TYPEFRAME (Hidden Cobra)

  • Authors missed some specific strings

Sample

Solution

Don’t check the solution before you’ve created your own rules.

Agent.BTZ YARA rule

PLEAD YARA rule

TYPEFRAME YARA rule

Remember, there is no single correct solution to this task. Your rules may be better than mine. If that’s the case, please share them with me 😄.

Write YARA Rules to Detect Embedded EXE Files in OLE Objects

This is the first blog post published on our new website. If you followed my blog on www.bsk-consulting.de you should consider subscribing to the RSS feed of this blog or the “Nextron Systems Newsletter”.

This is one of the YARA-related blog posts showcasing a special use case. Last year I noticed that I wrote many rules for hex encoded strings found in OLE objects embedded in MS Office documents and RTF files.

I did most of the encoding and decoding work on the command line or with the help of CyberChef, an online tool provided by GCHQ. I also thought about a new YARA keyword that would allow us to write rules without encoding the strings.

Today, rules contain strings in a hex encoded form. I usually add the decoded string as a comment.

$s1 = "68007400740070003a002f002f00" /* http:// */

Rules with the new keyword would look like this:

$s1 = "http://" wide hex

Neat, isn’t it? I already forwarded that feature request to Wesley Shields (@wxs) but it seems to be no low hanging fruit. I’ll keep you informed about this feature via Twitter.
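Until such a keyword exists, each encoding has to be written out by hand. The following minimal sketch spells out the ASCII and the wide (UTF-16LE) variant of “http://” in hex encoded form. The rule name is hypothetical, and the “nocase” modifier rests on the assumption that the hex digits a-f may appear in upper or lower case inside the document.

```yara
rule SUSP_RTF_HexEncoded_URL_Sketch {
   meta:
      description = "Sketch only: hex encoded http:// prefix in an RTF file"
   strings:
      /* http:// as ASCII bytes, hex encoded */
      $s1 = "687474703a2f2f" ascii nocase
      /* http:// as UTF-16LE (wide) bytes, hex encoded */
      $s2 = "68007400740070003a002f002f00" ascii nocase
   condition:
      uint32be(0) == 0x7B5C7274 /* RTF */
      and 1 of them
}
```

This is an illustration of the encoding work, not a production rule. A string as short as this one would need further conditions to keep false positives down.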

A tweet by Kevin Beaumont reminded me of the work that I’d done, and while looking at the tool by Rich Warren I thought that I should create an illustrative example of a more generic YARA rule that explains why the “hex” keyword would be very useful.

The tool creates weaponized RTF files with hex encoded payloads.

I derived some strings for a new rule from the decoded object.

/* Hex encoded strings */
/* This program cannot be run in DOS mode */
$a1 = "546869732070726f6772616d2063616e6e6f742062652072756e20696e20444f53206d6f6465" ascii
/* C:\fakepath\ */
$a2 = "433a5c66616b65706174685c" ascii

To further improve the rule, I went to my goodware directory and ran the following command to generate a list of the most frequent PE file headers in hex encoded form.

neo$ find ./ -type f -name "*.exe" -exec xxd -ps -l 14 {} \; | sort | uniq -c | sort -k 1 | tail -10
4 4d5a87000300000020000000ffff
4 4d5aae010300000020000000ffff
4 4d5abf000300000020000000ffff
4 4d5add000300000020000000ffff
4 4d5aeb000300000020000000ffff
6 213c73796d6c696e6b3e2f757372
8 4d5a72010200000020001700ffff
88 4d5a40000100000006000000ffff
116 4d5a50000200000004000f00ffff
5852 4d5a90000300000004000000ffff

Then I used these hex encoded strings in a YARA rule that looks for these strings in the OLE objects of an RTF file.

rule MAL_RTF_Embedded_OLE_PE {
   meta:
      description = "Detects a suspicious string often used in PE files in a hex encoded object stream"
      author = "Florian Roth"
      reference = "https://github.com/rxwx/CVE-2018-0802/blob/master/packager_exec_CVE-2018-0802.py"
      date = "2018-01-22"
   strings:
      /* Hex encoded strings */
      /* This program cannot be run in DOS mode */
      $a1 = "546869732070726f6772616d2063616e6e6f742062652072756e20696e20444f53206d6f6465" ascii
      /* KERNEL32.dll */
      $a2 = "4b45524e454c33322e646c6c" ascii
      /* C:\fakepath\ */
      $a3 = "433a5c66616b65706174685c" ascii
      /* DOS Magic Header */
      $m3 = "4d5a40000100000006000000ffff"
      $m2 = "4d5a50000200000004000f00ffff"
      $m1 = "4d5a90000300000004000000ffff"
   condition:
      uint32be(0) == 0x7B5C7274 /* RTF */
      and 1 of them
}

The first analysis of the coverage looks pretty good. I see only clear matches in munin’s output.

The few questionable matches look fishy enough to release my rule.

If you have further ideas to improve the rule, ping me via Twitter.