How to Write Simple but Sound Yara Rules – Part 2

How to Write Simple but Sound Yara Rules – Part 2

Months ago I wrote a blog article on “How to write simple but sound Yara rules“. Since then the mentioned techniques and tools have improved. I’d like to give you a brief update on certain Yara features that I frequently use and tools that I use to generate and test my rules.

Handle Very Specific Strings Differently

In the past I was glad to see very specific strings in samples and sometimes used these strings as the only indicator for detection. E.g. whenever I’ve found a certain typo in the PE header fields like “Micorsoft Corportation” I cheered and thought that this would make a great signature. But – and I have to admit that now – this only makes a nice signature. Great signatures require not only to match on a certain sample in the most condensed way but aims to match on similar samples created by the same author or group.
Look at the following rule:

rule Enfal_Malware_Backdoor {
        description = "Generic Rule to detect the Enfal Malware"
        author = "Florian Roth"
        date = "2015/02/10"
        super_rule = 1
        hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
        hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
        hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
        score = 60
        $x1 = "Micorsoft Corportation" fullword wide
        $x2 = "IM Monnitor Service" fullword wide
        $a1 = "imemonsvc.dll" fullword wide
        $a2 = "iphlpsvc.tmp" fullword
        $a3 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword
        $s1 = "urlmon" fullword
        $s2 = "Registered trademarks and service marks are the property of their" wide
        $s3 = "XpsUnregisterServer" fullword
        $s4 = "XpsRegisterServer" fullword
        uint16(0) == 0x5A4D and
            ( 1 of ($x*) ) or
            ( 2 of ($a*) and all of ($s*) )

What I do when I review the 20 strings that are generated by yarGen is that I try to categorize the extracted strings in 3 different groups:

  • Very specific strings (one of them is sufficient for successful detection, e.g. IP addresses, payload URLs, PDB paths, user profile directories)
  • Specific strings (strings that look good but may appear in goodware as well, e.g. “wwwlib.dll”)
  • Other strings (even strings that appear in goodware; without random code from compressed or encrypted data; e.g. “ModuleStart”)

Then I create a condition that defines:

  • A Certain Magic Header (remove it in case of ASCII text like scripts or webshells)
  • 1 of the very specific strings OR
  • some of the specific strings combined with many (but not all) of the common strings

Here is another example that does only have very specific strings (x) and common strings (s):

rule Cobra_Trojan_Stage1 {
        description = "Cobra Trojan - Stage 1"
        author = "Florian Roth"
        reference = ""
        date = "2015/02/18"
        hash = "a28164de29e51f154be12d163ce5818fceb69233"
        $x1 = "KmSvc.DLL" fullword wide
        $x2 = "SVCHostServiceDll_W2K3.dll" fullword ascii
        $s1 = "Microsoft Corporation. All rights reserved." fullword wide
        $s2 = "srservice" fullword wide
        $s3 = "Key Management Service" fullword wide
        $s4 = "msimghlp.dll" fullword wide
        $s5 = "_ServiceCtrlHandler@16" fullword ascii
        $s6 = "ModuleStart" fullword ascii
        $s7 = "ModuleStop" fullword ascii
        $s8 = "5.2.3790.3959 (srv03.sp2.070216-1710)" fullword wide
        uint16(0) == 0x5A4D and filesize < 50000 and 1 of ($x*) and 6 of ($s*)

If you can’t create a rule that is sufficiently specific, I recommend the following methods to restrict the rule:

  • Magic Header (use it as first element in condition – see performance guidelines, e.g. “uint16(0) == 0x5A4D”)
  • File Size (malware that mimics valid system files, drivers or legitimate software often differs significantly in size; try to find the valid files online and set a size value in your rule, e.g. “filesize > 200KB and filesize < 600KB")
  • String Location (see the “Location is Everything” section)
  • Exclude strings that occur in false positives (e.g. $fp1 = “McAfeeSig”)

Location is Everything

One of the most underestimated features of Yara is the possibility to define a range in which strings occur in order to match. I used this technique to create a rule that detect metasploit meterpreter payloads quite reliably even if it’s encoded/cloaked. How that?
If you see malware code that is hidden in an overlay at the end of a valid executable (e.g. “ab.exe”) and you see only strings that are typical function exports or mimics a well-known executable ask the following questions:

  • Is it normal that these strings are located at this location in the file?
  • Is it normal that these strings occur more than once in that file?
  • Is the distance between two strings somehow specific?

Malware Strings

Malware Strings

In case of the unspecific malware code in the PE overlay, try to define a rule that looks for a certain file size (e.g. filesize > 800KB) and the malware strings relative to the end of the file (e.g. $s1 in (filesize-500..filesize)).
The following example shows a unspecified webshell that contains strings that may be modified by an attacker in future versions when applied in a victim’s network. Try always to extract strings that are less likely to be changed.
Webshell Code PHP

Webshell Code PHP

The variable name “$code” is more likely to change than the function combination “@eval(gzinflate(base64_decode(” at the end of the file. It is possible that valid php code contains “eval(gzinflate(base64_decode(” somewhere in the code but it is less likely that it occurs in the last 50 bytes of the file.
I therefore wrote the following rule:

rule Webshell_b374k_related_1 {
        description = "Detects b374k related webshell"
        author = "Florian Roth"
        reference = ""
        score = 65
        hash = "d5696b32d32177cf70eaaa5a28d1c5823526d87e20d3c62b747517c6d41656f7"
        date = "2015-10-17"
        $m1 = "<?php"
        $s1 = "@eval(gzinflate(base64_decode(" ascii
        $m1 at 0 and $s1 in (filesize-50..filesize) and filesize < 20KB

Performance Guidelines

I collected many ideas by Wesley Shields and Victor M. Alvarez and composed a gist called “Yara
Performance Guidelines”. This guide shows you how to write Yara rules that use less CPU cycles by avoiding CPU intensive checks or using new condition checking shortcuts introduced in Yara version 3.4.
Yara Performance Guidelines

PE Module

People sometimes ask why I don’t use the PE module. The reason is simple: I avoid using modules that are rather new and would like to see it thoroughly tested prior using it in my scanners running in productive environments. It is a great module and a lot of effort went into it. I would always recommend using the PE module in lab environments or sandboxes. In scanners that walk huge directory trees a minor memory leak in one of the modules could lead to severe memory shortages. I’ll give it another year to prove its stability and then start using it in my rules.


yarGen has an opcode feature since the last minor version. It is active by default but only useful in cases in which not enough strings could be extracted.
I currently use the following parameters to create my rules:

python --noop -z 0 -a "Florian Roth" -r "http://link-to-sample" /mal/malware

The problem with the opcode feature is that it requires about 2,5 GB more main memory during rule creation. I’ll change it to an optional parameter in the next version.


yarAnalyzer is a rather new tool that focuses on rule coverage. After creating a bigger rule set or a generic rule that should match on several samples you’d like to check the coverage of your rules in order to detect overlapping rules (which is often OK).
yarAnalyzer helps you to get an overview on:

  • rules that match on more than one sample
  • samples that show hits from more than one rule
  • rules without hits
  • samples without hits

Yara Rule Analyzer

yarAnalayzer Screenshot

yarAnalyzer Github Repository

String Extraction and Colorization

To review the strings in a sample I use a simple shell one-liner that a good friend sent me once.
“strings” version for Linux

(strings -a -td "$@" | sed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 A \2/' ; strings -a -td -el "$@" | sed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 W \2/') | sort -n

“gstrings” version for OS X (sudo port install binutils)

(gstrings -a -td "$@" | gsed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 A \2/' ; gstrings -a -td -el "$@" | gsed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 W \2/') | sort -n

It produces an output as shown in the above screenshot with green text and the description “Malware Strings” showing the offset, ascii (A) or wide (W) and the string at this offset.
For a colorization of the string check my new tool “prisma” that colorizes random type standard output.

Prisma STDOUT colorization

Prisma STDOUT colorization


Follow me on Twitter: @Cyb3rOps

APT Detection is About Metadata

People often ask me, why we changed the name of our scanner from “IOC” to “APT” scanner and if we did that only for marketing reasons. But don’t worry, this blog post is just as little a sales pitch as it is an attempt to create a new product class.
I’ll show you why APT detection is difficult – for the big players and spirited newcomers like us.

Metadata is the Key

Only recently I recognized and named the methods that we apply since we introduced the scoring system in our scanner product. Instead of looking at a file only by its content we collect numerous attributes and evaluate a score based on certain rules that indicate conspicuous features or anomalies.
What I recognized was that Metadata is the key to successful APT detection. Let me give you some examples.

The “Sticky Keys” Backdoor

During our investigations we found that the attackers used a simple backdoor that allowed them to avoid AV detection and use tools that were already available on the target systems. What they did was to copy a valid “cmd.exe” over the “sethc.exe” in the System32 folder in order to establish a backdoor that waits for the user pressing five times shift consecutively on a RDP logon screen and pops up a Windows command line running as LOCAL_SYSTEM. Another method sets the Windows command line as debugger for the stickykeys binary.

wmic /user:username /password:secret /node:system1 process call create
"C:\Windows\system32\reg.exe add "HKLM\SOFTWARE\Microsoft\Windows
NT\CurrentVersion\Image File Execution Options\sethc.exe"
/v "
Debugger" /t REG_SZ /d "cmd.exe" /f"

With the necessary rights it is easy to install and difficult to detect.

StickyKeys Backdoor APT

APT Detection StickyKeys Backdoor

As I already said, an Antivirus engine won’t detect this backdoor as the content of the file is a valid Windows executable with an intact signature. Windows 7+ users won’t stumble over it as they have Network Level Authentication (NLA) enabled by default, which prompts the user for username and password before fully establishing a Terminal Services connection. Attackers modify their local “default.rdp” file and add “enablecredSSPsupport:i:0:1” in order to disable this behaviour.
APT detection therefore means the following:

  • Check system files like the stickykey binary for modifications – not by comparing MD5 hashes from whitelist databases like the NSRL, but by comparing the expected content for a certain file name with the actual content of the file. I described this method in a blog article and Chad Tilbury from Crowdstrike described how to apply this method using their CrowdResponse tool.
  • Identify “default.rdp” files on server systems that have NLA disabled. (Administrators shouldn’t do that)
  • Check if the Windows command line (cmd.exe) is registered as a debugger for any program

PsExec’s Evil Clone

I often use the example of the well-known Sysinternals tool “PsExec”, which is likewise used by administrators and APT groups. It doesn’t make much sense adding it to the indicators of compromise (IOCs) of your triage sweep although it may have played a substantial role in lateral attacker movement.
The human eye is able to distinguish between a PsExec that has been used for administration and a PsExec that has been used by the attackers. The essential difference, which enables us to distinguish between both versions does not lie in the content of the file but the Metadata.
Look at the following table and tell me which of both files is valid and which has been placed on the system by the attackers. Remember, the file content is the same – a MD5 hash of both files is equal.

File 1 File 2
MD5 aeee996fd3484f28e5cd85fe26b6bdcd aeee996fd3484f28e5cd85fe26b6bdcd
Filename PsExec.exe p.exe
Path C:\SysInternals C:\TEMP
Owner Administrators LOCAL_SYSTEM
Modified Time Stamp 2013-02-10 09:22:04 1970-01-01 00:00:00

It is not that difficult, isn’t it?
APT detection therefore means the following:

  • Imitating the human point of view by pulling together all Metadata connected with an element, be it a file, a process or eventlog message and evaluate the legitimacy of the element based on all available Metadata attributes

The Simplest Webshell

We often encounter so-called webshells that were placed in web server directories to establish a simple backdoor. Webshells can be very specific and therefore easy to detect. The C99 webshell is a good example for a PHP webshell, JspSpy is a well-known JSP webshell. Both are easily detected, even by Antivirus engines (see: C99 on VT, JSPSpy on VT).
However, APT groups tend to use two different types of webshells:

  • Tiny webshells
  • Code snippets with certain functions copied from legitimate software

There are a lot of well-known tiny webshells. The following one is my favorite. Add a space or change the request parameter “abc” to something else and the detection ratio is alarmingly low (Example). It allows an attacker to evaluate (execute) an arbitrary command on the web server. There are numerous blog posts and other articles describing what can be done with a webshell like this. However, the protection level provided by AV engines, firewalls and NIDS is almost zero.

<%eval request("abc")%>

Another method we discovered was the use code snippets copied from blog entries or tutorial pages that allowed them to use only certain functions like “file upload” or “directory listing”.
They often use a weakness in web applications to upload and run their own scripts or even whole application containers (.war). By placing a known webshell like the JspSpy webshell into that web server folder, they would run the risk of being detected. What they really need is a distribution point for their toolset or a simple tool to execute code on the server (like a tiny webshell). We’ve seen simple upload scripts that provide nothing more than a upload function, which they use to store their toolsets for lateral movement. A google search for “upload jsp” revealed various scripts they used in their attacks. It’s obvious that AV engines won’t detect this type of threat. How could they? The attackers abuse benign pieces of code to establish malicious backdoors.
APT detection therefore means the following:

  • Use the Metadata like file size, creation timestamp, file extension in combination with generic content detection rules – e.g. check for the string “eval” in a file smaller 40 byte and a script extension like “.jsp”.
  • Check the content of upload directories for the expected file types (and don’t use the extension to determine the file type)
  • Check web server processes for executables running in the web server directories – e.g. curl.exe in D:\Inetpub\wwwroot
  • Generate and send frequent reports on modified files within the web server directories

The Heavy Burden of Definite Detection

One could be tempted to believe that I wrote this article in order to degrade Antivirus engines, but this isn’t the case. Antivirus solutions are still play a key role and carry the heavy burden of definite detection. Their scan result has to be “thumbs up” or “thumbs down” as there is no middle ground.
Years ago they introduced signatures to detect “Potentially Unwanted Applications” (PUA). Users or administrators decide on what to do if one of these “dual use” tools has been found on a system. Handling thousands of the events generated by the Antivirus agents is a difficult task, even with a central console or SIEM integrated log files. It is easy to understand why PUA events do not play an important role in view of dozens of Trojan detections per day.

APT detection is the art of suspicion. A missing “stickykeys” string in the “sethc.exe” indicates a manipulation, a replaced system file. It is not a definite detection but the certainty that something is wrong.


Considering the given examples an attentive reader may be inclined to believe that Antivirus and simple IOC scanning (Triage) is not enough to defend against Advanced Persistent Threats. After the experiences of the last 3 years I have to confirm that assumption.
Who would recognize and report the execution of a “sethc.exe” on a server system, the “PUA/PsExec” message generate by the Antivirus or another JSP file on the web server?
I even doubt that so-called “APT solutions” are able to detect

  • a “.war”-file upload to a Tomcat server by the use of “tomcat/tomcat” as credentials,
  • encrypted file uploads,
  • lateral movement using PsExec, Powershell or WMIC and
  • “StickyKey” backdoor access via RDP.

An extensive security monitoring in form of a SIEM system allows you to detect a needle in the haystack but only if you are able to distinguish between straw and needles.

The question is: How can I define such soft indicators to detect the described anomalies? The OpenIOC framework already contains options to combine certain characteristics like filename and filesize, but rather than using it as a tool to describe anomalies it is often used to tighten the detection to the level of a hash value. I prefer hash values over “Name:PsExec.exe” combined with “Filesize:381816” because it doesn’t make you believe that you’re looking at a clever rule.
I therefore recommend the following:

  1. Assume compromise and start from there
    Ask yourself: How would I detect a breach? What if attackers already took control of the Windows Domain and worked with domain admin accounts? What if they worked with tools that my Antivirus is unable to detect?
  2. Use all Metadata you can get to determine the legitimacy of an element
    This does no only apply to files or processes in APT and IOC scanning, but also to the discipline of security monitoring.E.g. Select interesting Antivirus events based on various characteristics and not only the status that indicates if the malware has be removed or not. Consider the location of the malware (Temporary Internet files or System32 folder), the user account (Restricted user, Administrator or LOCAL_SYSTEM), the malware type (JS/Redirector or PUA/PasswordDump), system type (server or client workstation), detection time (02:00am with noone working in the night shift), detected form (in a RAR archive or extracted). Develop similar schemes for other log types. The most interesting ones are Antivirus, Proxy and Windows logs.
    Useful Links:
    SysForensics published an article about process anomaly detection, which was adapted to other OS versions and included in our THOR APT Scanner and my LOKI IOC Scanner.
    Use SysInternals Sysmon to enrich you Windows log data.
  3. Don’t create detection rules that are too tight but concentrate on filtering the false positives
    If you regard filtering false positives as a pain in the neck you’re probably using the wrong SIEM system. (Warning – product placement: I prefer Splunk over all others, especially with the Enterprise Security App installed)
  4. Use IOCs from published APT reports to enrich your detection rules
    We use the APT reports to create new rules in our customer’s SIEM systems and as input for our APT scanner THOR.Useful Links:
    APT Notes is a IOC repository with hundreds of reports from the last years. You can download the github repository from here.
How to Write Simple but Sound Yara Rules

How to Write Simple but Sound Yara Rules

During the last 2 years I wrote approximately 2000 Yara rules based on samples found during our incident response investigations. A lot of security professionals noticed that Yara provides an easy and effective way to write custom rules based on strings or byte sequences found in their samples and allows them as end user to create their own detection tools.
However it makes me sad to see that there are mainly two types of rules published by the researchers:

  1. rules that generate many false positives and
  2. rules that match only the specific sample and are not much better than a hash value.

I therefore decided to write an article on how to build optimal Yara rules, which can be used to scan single samples uploaded to a sandbox and whole file systems with a minimal chance of false positives.
These rules are based on contained strings and easy to comprehend. You do not need to understand the reverse engineering of executables and I decided to avoid the new Yara modules like “pe” which I still consider as “testing” features that may lead to memory leaks or other errors when used in practice.

Automatic Rule Generation

First I believed that automatically generated rules can never be as good as manually created ones. During my work for out IOC scanners THOR and LOKI I had to create hundreds of Yara rules manually and it became clear that there is an obvious disadvantage. What I used to do was to extract UNICODE and ASCII strings from my samples by the following commands:

strings -el samples.exe
strings -a sample.exe

I prefer the UNICODE strings as they are often overlooked and less frequently changed within a certain malware/tool family. Make sure that you use UNICODE strings with the “wide” keyword and ASCII strings with the “ascii” keyword in your rules and use “fullword” if there is a word boundary before and after the string. The problem with this method is that you cannot decide if the string that is returned by the commands is unique for this malware or often used in goodware samples as well.
Look at the extracted strings in the following example:

NT LM 0.12
%s.exe %s
LoadLibrary( NTDLL.DLL ) Error:%d

Could you be sure that the string “NT LM 0.12” is a unique one, which is not used by legitimate software?
To accomplish this task for me I developed “yarGen“, a Yara rule generator that ships with a huge string database of common and benign software. I used the Windows system folder files of Windows 2003, Windows 7 and Windows 2008 R2 server, typical software like Microsoft Office, 7zip, Firefox, Chrome, Cygwin and various Antivirus solution program folders to generate the database. yarGen allows you to generate your own database or add folders with more goodware to the existing database.
yarGen extracts all ASCII and UNICODE strings from a sample and removes all strings that do also appear in the goodware string database. Then it evaluates and scores every string by using fuzzy regular expressions and the “Gibberish Detector” that allows yarGen to detect and prefer real language over character chains without meaning. The top 20 of the strings will be integrated in the resulting rule.
Let’s look at two examples from my work. A sample of the Enfal Trojan and a SMB Worm sample.
yarGen generates the following rule for the Enfal Trojan sample:

rule Enfal_Generic {
description = "Auto-generated rule - from 3 different files"
author = "YarGen Rule Generator"
reference = "not set"
date = "2015/02/15"
super_rule = 1
hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
$s0 = "urlmon" fullword
$s1 = "Registered trademarks and service marks are the property of their respec" wide
$s2 = "Micorsoft Corportation" fullword wide
$s3 = "IM Monnitor Service" fullword wide
$s4 = "imemonsvc.dll" fullword wide
$s5 = "iphlpsvc.tmp" fullword
$s6 = "XpsUnregisterServer" fullword
$s7 = "XpsRegisterServer" fullword
$s8 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword
$s9 = "tEHt;HuD" fullword
$s10 = "" fullword wide
$s11 = "#*8;-&gt;)" fullword
$s12 = "%/&gt;#?#*8" fullword
$s13 = "\\%04x%04x\" fullword
$s14 = "
3,8,18" fullword
$s15 = "
3,4,15" fullword
$s16 = "
3,7,12" fullword
$s17 = "
3,4,13" fullword
$s18 = "
3,8,12" fullword
$s19 = "
3,8,15" fullword
$s20 = "
3,6,12" fullword
all of them

The resulting string set contains many useful strings but also random ASCII characters ($s9, $s11, $s12) that do match on the given sample but are less likely to produce the same result on other samples of the family.
yarGen generates the following rule for the SMB Worm sample:

rule sig_smb {
description = "Auto-generated rule - file smb.exe"
author = "YarGen Rule Generator"
reference = "not set"
date = "2015/02/15"
hash = "db6cae5734e433b195d8fc3252cbe58469e42bf3"
$s0 = "LoadLibrary( NTDLL.DLL ) Error:%d" fullword ascii
$s1 = "SetServiceStatus failed, error code = %d" fullword ascii
$s2 = "%s\\Admin$\\%s.exe" fullword ascii
$s3 = "%s.exe %s" fullword ascii
$s4 = "iloveyou" fullword ascii
$s5 = "Microsoft@ Windows@ Operating System" fullword wide
$s6 = "\\svchost.exe" fullword ascii
$s7 = "secret" fullword ascii
$s8 = "SVCH0ST.EXE" fullword wide
$s9 = "msvcrt.bat" fullword ascii
$s10 = "Hello123" fullword ascii
$s11 = "princess" fullword ascii
$s12 = "Password123" fullword ascii
$s13 = "Password1" fullword ascii
$s14 = "config.dat" fullword ascii
$s15 = "sunshine" fullword ascii
$s16 = "password &lt;=14" fullword ascii
$s17 = "del /a %1" fullword ascii
$s18 = "del /a %0" fullword ascii
$s19 = "result.dat" fullword ascii
$s20 = "training" fullword ascii
all of them

The resulting rules are good enough to use them as they are, but they are far from an optimal solution. However it is good that so many strings have been found, which do not appear in the analyzed goodware samples.
If you don’t want to use or download yarGen, you could also use the online tool Yara Rule Generator provided by Joe Security, which was inspired by/based on yarGen.
It is not necessary to use a generator if your eye is trained and experienced. In this case just read the next section and select the strings to match the requirements of the (what I call) sufficiently generic Yara rules.

Sufficiently Generic Yara Rules

As I said in the introduction rules that generate false positives are pretty annoying. However the real tragedy is that most of the rules are far too specific to match on more than one sample and are therefore almost as useful as a file hash.
What I tend to do with the rules is to check all the strings and put them into at least 2 different categories:

  • Very specific strings = hard indicators for a malicious sample
  • Rare strings = likely that they do not appear in goodware samples, but possible
  • Strings that look common = (Optional) e.g. yarGen output strings that do not seem to be specific but didn’t appear in the goodware string database

Check out the modified rules in order to understand this splitting. Ignore the definition named $mz, I’ll explain it later and look at the string definitions below.
The definitions starting with $s contain the very specific strings, which I regard as so special that they would not appear in legitimate software. Note the typos in both strings: “Micorsoft Corportation” instead of “Microsoft Corporation” and “Monnitor” instead of “Monitor”.
The strings starting with $x seem to be special (I tend to google the strings) but I cannot say if they also appear in legitimate software. The definitions starting with $z seem to be ordinary but have not been part of the goodware string database so they have to be special in some way.

rule Enfal_Malware_Backdoor {
description = "Generic Rule to detect the Enfal Malware"
author = "Florian Roth"
date = "2015/02/10"
super_rule = 1
hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
$mz = { 4d 5a }
$s1 = "Micorsoft Corportation" fullword wide
$s2 = "IM Monnitor Service" fullword wide
$x1 = "imemonsvc.dll" fullword wide
$x2 = "iphlpsvc.tmp" fullword
$x3 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword
$z1 = "urlmon" fullword
$z2 = "Registered trademarks and service marks are the property of their" wide
$z3 = "XpsUnregisterServer" fullword
$z4 = "XpsRegisterServer" fullword
( $mz at 0 ) and
( 1 of ($s*) ) or
( 2 of ($x*) and all of ($z*) )
and filesize &lt; 40000

Now check the condition statement and notice that I combine the rules with a magic header of an executable defined by $mz and a file size to exclude typical false positives like Antivirus signature files, browser cache or dictionary files. Set an ample file size value to avoid false negatives. (e.g. samples between 100K and 200K => set file size < 300K)
You can see that I decided that a single occurrence of one of the very specific strings would trigger that rule. ( 1 of $s* )
Than I combine a bunch of less unique strings with most or all of the ordinary looking strings. ( 2 of $x* and all of $z* )
Let’s look at second example. (see below)
$s1 is a very special string with string formatting placeholders “%s” in combination with an Admin$ share. $s2 seems to be the typical “svchost.exe” but contains the number “0” instead of an “O”, which is very uncommon and a clear indicator for something malicious.
All the definitions starting with $a are special but I cannot say for sure if they won’t appear in legitimate software. The strings defined by $x seem ordinary but were produced by yarGen, which means that they did not appear in the goodware string database.
This special example contains a list of typical passwords which is defined by $z1..z8.

rule SMB_Worm_Tool_Generic {
description = "Generic SMB Worm/Malware Signature"
author = "Florian Roth"
reference = ""
date = "2015/02/08"
hash = "db6cae5734e433b195d8fc3252cbe58469e42bf3"
$mz = { 4d 5a }
$s1 = "%s\\Admin$\\%s.exe" fullword ascii
$s2 = "SVCH0ST.EXE" fullword wide
$a1 = "LoadLibrary( NTDLL.DLL ) Error:%d" fullword ascii
$a2 = "\\svchost.exe" fullword ascii
$a3 = "msvcrt.bat" fullword ascii
$a4 = "Microsoft@ Windows@ Operating System" fullword wide
$x1 = "%s.exe %s" fullword ascii
$x2 = "password &lt;=14" fullword ascii
$x3 = "del /a %1" fullword ascii
$x4 = "del /a %0" fullword ascii
$x5 = "SetServiceStatus failed, error code = %d" fullword ascii
$z1 = "secret" fullword ascii
$z2 = "Hello123" fullword ascii
$z3 = "princess" fullword ascii
$z4 = "Password123" fullword ascii
$z5 = "Password1" fullword ascii
$z6 = "sunshine" fullword ascii
$z7 = "training" fullword ascii
$z8 = "iloveyou" fullword ascii
$mz at 0 and
( 1 of ($s*) and 1 of ($x*) ) or
( all of ($a*) and 2 of ($x*) ) or
( 5 of ($z*) and 2 of ($x*) ) and
filesize &lt; 200000

You see that I combined the string definitions in a similar way as before. This method in combination with the magic header and the file size should be a good starting point for the final stage – testing.


Testing the rules is very important. It seems that most authors decide that the rules are good enough if they match on the given samples.
You should definitely do the following checks:

  1. Scan the malware samples
  2. Scan a big goodware archive

To carry out the tests download the Yara scanner and run it from the command line. The goodware directory should include system files from various Windows versions, typical software and possible false positive sources (e.g. typical CMS software if you wrote Yara rules that match on malicious web shells)

Yara Rule Testing

Yara Rule Testing on Samples and Goodware

If the rule matched on the malicious samples and did not generate a match on the goodware archive your rule is good enough to test the rule in practice.


Make sure to check Part 2 of “How to Write Simple and Sound YARA Rules”.

How to Scan for System File Manipulations with Yara (Part 2/2)

How to Scan for System File Manipulations with Yara (Part 2/2)

As a follow up on my first article about inverse matching yara rules I would like to add a tutorial on how to scan for system file manipulations using Yara and Powershell. The idea of inverse matching is that we do not scan for something malicious that we already know but for anomalies within the system files. Chad Tilbury from Crowdstrike related to this method in his article describing a way to scan for this type of anomaly using their incident collection tool CrowdResponse. In my first article I described how we utilize this method in our incident response tool and promised a free solution based on available system tools.
The yara rules used to apply this method require the name of the observed file. Yara allows the file name to be passed via an external variable like in the following listing.

yara32.exe -d filename=iexplore.exe inverse-matching.yar iexplore.exe

But we have to define and pass this “filename” variable for every file we analyse while walking the directory tree.
So – what do we do?
First – we need a powershell script that walks a directory tree and feeds each file with an “.exe” extension together with the rule set and the file name as external variable to a yara32.exe. You could copy the script and paste it directly to the command line but I would recommend the following:
Prepare a folder with the following content:

  1. The powershell script as listed below – name it “inverse-scan.ps1”
  2. The ruleset listed below as “inverse-matching.yar”
  3. A version of Yara for Windows
  4. A batch script that invokes the powershell script with some parameters named “runit.bat”

The final result looks like this:

Yara Scan on Anomalies

Inverse Yara Matching Script Set

You can copy that folder to the target system, take it with you on a USB drive or provide a network share with its contents.

Get-ChildItem -Recurse -filter *.exe C:\Windows 2> $null |
ForEach-Object { Write-Host -foregroundcolor "green" "Scanning"$_.FullName $_.Name; ./yara32.exe -d filename=$_.Name inverse-matching.yar $_.FullName 2> $null }


powershell -ExecutionPolicy ByPass -File ./inverse-scan.ps1


rule iexplore_ANOMALY {
        author = "Florian Roth"
        description = "Abnormal iexplore.exe - typical strings not found in file"
        date = "23/04/2014"
        score = 55
        $upd_magic = { 44 43 }
        $win2003_win7_u1 = "IEXPLORE.EXE" wide nocase
        $win2003_win7_u2 = "Internet Explorer" wide fullword
        $win2003_win7_u3 = "translation" wide fullword nocase
        $win2003_win7_u4 = "varfileinfo" wide fullword nocase
        not ( $upd_magic at 0 ) and not 1 of ($win*) and filename matches /iexplore\.exe/is
rule svchost_ANOMALY {
        author = "Florian Roth"
        description = "Abnormal svchost.exe - typical strings not found in file"
        date = "23/04/2014"
        score = 55
        $upd_magic = { 44 43 }
        $win2003_win7_u1 = "svchost.exe" wide nocase
        $win2003_win7_u3 = "coinitializesecurityparam" wide fullword nocase
        $win2003_win7_u4 = "servicedllunloadonstop" wide fullword nocase
        $win2000 = "Generic Host Process for Win32 Services" wide fullword
        $win2012 = "Host Process for Windows Services" wide fullword
        filename matches /svchost\.exe/is and not 1 of ($win*) and not ( $upd_magic at 0 )
rule explorer_ANOMALY {
        author = "Florian Roth"
        description = "Abnormal explorer.exe - typical strings not found in file"
        date = "27/05/2014"
        score = 55
        $upd_magic = { 44 43 }
        $s1 = "EXPLORER.EXE" wide fullword
        $s2 = "Windows Explorer" wide fullword
        filename matches /explorer\.exe/is and not 1 of ($s*) and not ( $upd_magic at 0 )
rule sethc_ANOMALY {
        description = "Sethc.exe has been replaced - Indicates Remote Access Hack RDP"
        author = "F. Roth"
        reference = ""
        date = "2014/01/23"
        score = 70
        $upd_magic = { 44 43 }
        $s1 = "stickykeys" fullword nocase
        $s2 = "stickykeys" wide nocase
        $s3 = "Control_RunDLL access.cpl" wide fullword
        $s4 = "SETHC.EXE" wide fullword
        filename matches /sethc\.exe/ and not 1 of ($s*) and not ( $upd_magic at 0 )
rule Utilman_ANOMALY {
        author = "Florian Roth"
        description = "Abnormal utilman.exe - typical strings not found in file"
        date = "01/06/2014"
        score = 55
        $upd_magic = { 44 43 }
        $win7 = "utilman.exe" wide fullword
        $win2000 = "Start with Utility Manager" fullword wide
        $win2012 = "utilman2.exe" fullword wide
        filename matches /utilman\.exe/is and not 1 of ($win*) and not ( $upd_magic at 0 )
rule osk_ANOMALY {
        author = "Florian Roth"
        description = "Abnormal osk.exe (On Screen Keyboard) - typical strings not found in file"
        date = "01/06/2014"
        score = 55
        $upd_magic = { 44 43 }
        $s1 = "Accessibility On-Screen Keyboard" wide fullword
        $s2 = "\\oskmenu" wide fullword
        $s3 = "&About On-Screen Keyboard..." wide fullword
        $s4 = "Software\\Microsoft\\Osk" wide
        filename matches /osk\.exe/is and not 1 of ($s*) and not ( $upd_magic at 0 )
rule magnify_ANOMALY {
        author = "Florian Roth"
        description = "Abnormal magnify.exe (Magnifier) - typical strings not found in file"
        date = "01/06/2014"
        score = 55
        $upd_magic = { 44 43 }
        $win7 = "Microsoft Screen Magnifier" wide fullword
        $win2000 = "Microsoft Magnifier" wide fullword
        $winxp = "Software\\Microsoft\\Magnify" wide
        filename matches /magnify\.exe/is and not 1 of ($win*) and not ( $upd_magic at 0 )
rule narrator_ANOMALY {
        author = "Florian Roth"
        description = "Abnormal narrator.exe - typical strings not found in file"
        date = "01/06/2014"
        score = 55
        $upd_magic = { 44 43 }
        $win7 = "Microsoft-Windows-Narrator" wide fullword
        $win2000 = "&About Narrator..." wide fullword
        $win2012 = "Screen Reader" wide fullword
        $winxp = "Software\\Microsoft\\Narrator"
        $winxp_en = "SOFTWARE\\Microsoft\\Speech\\Voices" wide
        filename matches /narrator\.exe/is and not 1 of ($win*) and not ( $upd_magic at 0 )
rule notepad_ANOMALY {
        author = "Florian Roth"
        description = "Abnormal notepad.exe - typical strings not found in file"
        date = "01/06/2014"
        score = 55
        $upd_magic = { 44 43 }
        $win7 = "HELP_ENTRY_ID_NOTEPAD_HELP" wide fullword
        $win2000 = "Do you want to create a new file?" wide fullword
        $win2003 = "Do you want to save the changes?" wide
        $winxp = "Software\\Microsoft\\Notepad" wide
        filename matches /notepad\.exe/is and not 1 of ($win*) and not ( $upd_magic at 0 )

Although the string descriptors list only some of the windows versions we’ve tested it against the following versions:
Windows 2000
Windows 2003 Server
Windows 7 (x64)
Windows 2008 R2
Windows 2012
What you get as result is a small anomaly scanner made completely with Windows tools and Yara. An administrator would just have to click the Batch file and run the script with admin rights. The following screenshot shows a scan on the Windows folder with a prepared malicious “iexplore.exe” in the subfolder “C:\Windows\AA_Testing”.

Yara Anomaly Scanner

Yara Inverse Matching Anomaly Scanner in Action

You could remove the section “Write-Host -foregroundcolor “green” “Scanning”$_.FullName $_.Name;” to show only the alerts or modify the script that it writes a log file.
We use all of these rules in our APT Scanner THOR and added further rules matching 3rd party tools attackers tend to replace or rename.
Inverse Yara Signature Matching (Part 1/2)

Inverse Yara Signature Matching (Part 1/2)

During our investigations we encountered situations in which attackers replaced valid system files with other system files to achieve persistence and establish a backdoor on the systems. The most frequently used method was the replacement of the “sethc.exe” with the valid command line “cmd.exe” to establish a backdoor right in logon screen by pressing shift several times, popping a command line running under LOCAL_SYSTEM rights.
It is obvious that none of our signatures matched on a valid Windows system file that just had a different name – cmd.exe renamed to sethc.exe. The signature was still intact, the MD5 and SHA1 hash of the file a well known one and listed in the “trusted-md5-hashesh” database. Therefore I introduced the so called “Inverse Yara Signature Matching” that checks if certain system files contain specific strings. THOR extends the Yara functionality by certain values that may be checked including the “filename” of the scanned file. This could also be accomplished by the Yara integrated functionality of external variables.
So – I combined the file name with strings that must exist in the system file and generated a rule that checks for the existence of 4 different strings that I extracted from all “sethc.exe” versions from the different operating system versions (Windows 2003, Windows XP, Windows 7, Windows 2008 and Windows 2012).

$s1 = "stickykeys" fullword nocase
$s2 = "stickykeys" wide nocase
$s3 = "Control_RunDLL access.cpl" wide
$s4 = "SETHC.EXE" wide
$filename = "filename: sethc.exe"
condition: $filename and not 1 of ($s*)

The rule matches if the analysed “sethc.exe” does NOT contain the typical string values and therefore indicates a manipulation. This method could be extended to other system files that are typically replaced or placed in uncommon directories like an “explorer.exe” in the “C:\Windows\System32” folder or a “svchost.exe” in the “C:\Windows” folder. We have already integrated checks that compares the file extension with the actual file type but this definitely bringst the anomaly detection to a new level.
Applying the rule listed above we detected over 15 server systems with this backdoor modification in place in a distributed THOR run. In case of the “sethc.exe” we had no false positives but they are possible if new operating system versions of the “sethc.exe” are checked.

Signature Matching sethc.exe

Yara Signature sethc.exe

With this method you are not able to determine the exact malware or malicious modification of the analysed file but you will at least be able to detect the modification.

Update 28.08.14

Thanks to Chad for the back reference to our blog. I even created more rules that match on valid Windows system files and described a way to scan a system with Windows PowerShell. You can find the second part of my article here.

Howto detect Ebury SSH Backdoor

Die folgende Yara Signatur kann für die Erkennung der Ebury SSH Backdoor verwendet werden.

rule Ebury_SSHD_Malware_Linux {
description = "Ebury Malware"
author = "Florian Roth"
hash = "4a332ea231df95ba813a5914660979a2"
$s0 = "keyctl_set_reqkey_keyring" fullword
$s1 = "recursive_session_key_scan" fullword
$s2 = "keyctl_session_to_parent" fullword
$s3 = "keyctl_assume_authority" fullword
$s4 = "keyctl_get_security_alloc" fullword
$s5 = "keyctl_instantiate_iov" fullword
$s6 = "keyutils_version_string" fullword
$s7 = "keyctl_join_session_keyring" fullword
$a1 = "%[^;];%d;%d;%x;"
all of them

Wer kein Yara verwenden möchte, kann auf diesen Workaround zurückgreifen.

find /lib -type f -size -50k -exec strings -f {} \; | grep '%\[^;\];%d;%d;%x;'

Weitere Informationen zur Erkennung von Ebury CERT Bund.

WordPress Cookie Plugin by Real Cookie Banner