Code, Strings and what’s in between - Intezer



Code, Strings and what’s in between

Or Fridman
13.08.18 | 10:46 am

Our technology is based on genetic analysis of files. So far, we’ve focused mainly on detection of code reuse, as part of the genetic malware analysis process.
Recently, we’ve added two new and exciting capabilities to our product:

1. String reuse
2. View shared code

While each feature brings its own value to the product, it is the combination of the two that really strengthens our ability to investigate suspicious files.

String Reuse

In addition to identifying code similarities between files, we can now also discover string similarities. Just like code, strings can serve as a fingerprint for malware authors.
In many cases, code reuse and string reuse are correlated. Relevant strings can shed more light on the malware you are facing and strengthen the conclusion you reach from the code reuse analysis.
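To make the idea concrete, here is a minimal sketch of string reuse detection. It is not our production implementation: it assumes strings are plain printable ASCII runs (similar to what the Unix `strings` utility extracts) and that a hypothetical `family_index` maps each known family to the strings previously seen in it. The sample bytes and index contents are made up for illustration.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> set[str]:
    """Return every run of printable ASCII at least min_len characters long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {m.group().decode("ascii") for m in pattern.finditer(data)}


def shared_strings(sample: bytes, family_index: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each family to the strings it shares with the sample (empty matches dropped)."""
    found = extract_strings(sample)
    return {
        family: found & known
        for family, known in family_index.items()
        if found & known
    }


# Hypothetical sample bytes and string index, for illustration only.
sample = b"\x00\x01Pay now, if you want to decrypt ALL your files!\x00ab\x00kernel32.dll\xff"
family_index = {
    "WannaCry": {"Pay now, if you want to decrypt ALL your files!"},
    "Mirage": {"some other string"},
}

result = shared_strings(sample, family_index)
print(result)
```

A set intersection is the simplest possible matching strategy. A real system would also need to handle wide (UTF-16) strings and very large indexes.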

In this example, you can see that in addition to strong code similarity to WannaCry, many of the strings were previously seen in WannaCry, which can indicate a ransomware attack.

For example:
1. “You did not pay or we did not confirmed your payment!”
2. “Pay now, if you want to decrypt ALL your files!”

Another example shows shared strings between the MirageFox APT and older Mirage attacks. We had already found a strong code connection between these attacks, and the new feature helps validate that finding by adding more context to the attack.

One of the most powerful things string reuse lets you do is focus on the unique strings. Just as we do with code, we filter out the common strings (those used in both malware and trusted software) and direct the analyst’s attention to the more interesting ones, including unique strings, which are strings that have never been seen before.
In this example, we found 728 strings, 559 of which were common. What we really want you to see here are some very interesting unique strings, which correlate with the code reuse finding of a new CoinMiner variant:
1. “Message: You must pay 500 BTC to 3HJ2QkGaSZSR6stdH4tQyZo2sPCDwjF7Ac then we will stop all hack activities on your system!”
2. “C:\Users\chung96vn\Documents\DT-SBV\backdoor\backdoor\Debug\backdoor.pdb”
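The filtering described above can be sketched as a simple three-way classification. In the example, dropping the 559 common strings from the 728 found leaves 169 for the analyst to review. The function below is an illustrative assumption, not our actual pipeline: `common` and `family_index` are hypothetical lookup sets, and the toy strings are placeholders.

```python
def classify_strings(strings, common, family_index):
    """Label each string as 'common', a known family name, or 'unique'."""
    labels = {}
    for s in strings:
        if s in common:
            # Seen in both malware and trusted software: low analyst value.
            labels[s] = "common"
            continue
        families = sorted(f for f, known in family_index.items() if s in known)
        # Never seen anywhere before: highlight for the analyst.
        labels[s] = ",".join(families) if families else "unique"
    return labels


# Hypothetical toy data, for illustration only.
strings = ["GetProcAddress", "kernel32.dll", "miner-config-string", "backdoor.pdb"]
common = {"GetProcAddress", "kernel32.dll"}
family_index = {"CoinMiner": {"miner-config-string"}}

labels = classify_strings(strings, common, family_index)
unique = [s for s, label in labels.items() if label == "unique"]
print(labels)
print(unique)
```

Checking the common set first mirrors the order described in the text: common strings are filtered out before anything is attributed to a family or flagged as unique.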

Correlation between code and strings can also be found in trusted software. In this example, you can see that most of the code and strings were seen in OpenSSL.

While string reuse can support detection, it also improves the ability to create very effective vaccines (YARA rules or other formats). We will discuss this topic further next month.
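As a rough illustration of how unique strings could feed a vaccine, the sketch below builds the text of a YARA rule from a list of strings. The rule name, the `min_matches` parameter, and the helper functions are assumptions made for this example. It handles only printable strings and escapes just backslashes and double quotes.

```python
def yara_escape(text: str) -> str:
    """Escape backslashes and double quotes for a YARA text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_yara_rule(name: str, strings: list[str], min_matches: int = 1) -> str:
    """Return YARA rule text that fires when min_matches of the strings appear."""
    lines = [f"rule {name}", "{", "    strings:"]
    for i, s in enumerate(strings):
        # 'ascii wide' matches both single-byte and UTF-16 encodings of the string.
        lines.append(f'        $s{i} = "{yara_escape(s)}" ascii wide')
    lines += ["    condition:", f"        {min_matches} of them", "}"]
    return "\n".join(lines)


# The two unique strings quoted in the example above.
unique_strings = [
    "Message: You must pay 500 BTC to 3HJ2QkGaSZSR6stdH4tQyZo2sPCDwjF7Ac "
    "then we will stop all hack activities on your system!",
    r"C:\Users\chung96vn\Documents\DT-SBV\backdoor\backdoor\Debug\backdoor.pdb",
]

rule = build_yara_rule("suspected_backdoor_variant", unique_strings, min_matches=1)
print(rule)
```

Unique strings make good candidates because they have not been seen in trusted software. Any generated rule should still be tested against clean files before it is deployed.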

View Shared Code

Until now, viewing the shared code was possible only through our plugin for IDA Pro. While the plugin remains the best choice for advanced investigation, we wanted to bring that experience to the web interface as well, so we’ve added the ability to view the Assembly code shared with a specific malware family or trusted software.
This capability is useful for initial research, since it focuses on the specific pieces of code that indicate the file is malicious. It also supports reverse engineering by automatically pointing analysts to the code they should focus on, which addresses one of the main challenges of reverse engineering.
To view the shared code in the web interface, click the relevant code family and then choose the “Shared Code” section.

The code blocks behind the genes are divided into clusters, each of which represents a collection of connected code blocks.
By browsing the code, you can see strings, API calls, and other interesting information that helps you better understand what is behind the malicious code.
For example, in the figure below, you can see part of the Assembly code behind WannaCry. The addition of specific strings assists in understanding the code.
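One way to picture the clustering of connected code blocks is as connected components in a graph. The sketch below assumes that "connected" means two blocks are linked by a control-flow edge, which is an assumption made for this example and not a description of how the product builds its clusters. The block addresses and edges are hypothetical.

```python
from collections import defaultdict


def cluster_blocks(blocks, edges):
    """Group code blocks into clusters of blocks reachable from one another."""
    graph = defaultdict(set)
    for a, b in edges:
        # Treat edges as undirected: direction does not matter for grouping.
        graph[a].add(b)
        graph[b].add(a)

    seen, clusters = set(), []
    for start in blocks:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Hypothetical block start addresses and control-flow edges.
blocks = [0x401000, 0x401020, 0x401050, 0x402000, 0x402030]
edges = [(0x401000, 0x401020), (0x401020, 0x401050), (0x402000, 0x402030)]

clusters = cluster_blocks(blocks, edges)
for cluster in clusters:
    print([hex(addr) for addr in cluster])
```

Grouping blocks this way lets an analyst read related code together instead of jumping between isolated fragments.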

While reverse engineering can take hours or even days, the ability to see the shared code behind a specific malware family lets you gain insight into the malware at a glance, before any deep investigation takes place.

In addition, the new feature lets you view the unique code in the file. As with the unique strings described above, unique code is code that has never been seen before. In malicious files, unique code can indicate that a new variant of the malware is in use, which makes investigating this code especially important.
Let’s return to the unique strings example. When browsing the unique code, you can find the code related to the strings we showed before!


We believe these two features will empower every IR team that uses our product. When a new incident is flagged, time is critical, and we want to shorten the investigation. Both the strings that correlate with malware and the actual Assembly code are key to understanding the threat better and faster.

By Or Fridman

Or has 10 years of experience in technology development and product management. He is responsible for developing and executing the product roadmap and technological integrations behind Intezer Analyze. Or began his career in cybersecurity through a programming course in the Israel Defense Forces (IDF) and later became a developer and product manager for the unit. Or previously served as a product manager at CyberArk for three years. Or earned his Bachelor of Science in computer science from the Open University of Israel.

