Malware Static Analysis

Hossein Yavari
Feb. 16, 2021
1

What is Static Analysis?
• Technique of analyzing a suspect file
without executing it.
• Extracts useful information from the
suspect binary to help decide how to
classify or analyze it and where to focus
subsequent analysis efforts.
2

What Do We Learn?
• Identifying the malware's target architecture
• Fingerprinting the malware
• Scanning the suspect binary with anti-virus
engines
• Extracting strings, functions, and metadata
associated with the file
• Identifying the obfuscation techniques used to
thwart analysis
• Classifying and comparing the malware samples
3

Determining the File Type
• These methods will help you identify the malware's target
operating system and the architecture.
• Windows, Linux, etc.?
• 32-bit or 64-bit?
• Example: {.exe, .dll, .sys, .drv, .com, .ocx } are Windows
executable files.
• Most Windows-based malware consists of executable files with
extensions such as .exe, .dll, and .sys.
• However, relying on file extensions alone is not recommended.
4

File Signature
• Attackers disguise a file by modifying its
extension and changing its appearance to
trick users into executing it.
• Instead of relying on the file extension, the
file signature can be used to determine the
file type.
• A file signature is a unique sequence of bytes
that is written to the file's header. For example,
Windows executables start with the bytes 4D 5A (MZ).
5

Identifying File Type Using Manual Method
• Using hex editors:
• A hex editor is a tool that allows an examiner to
inspect each byte of the file
• Example:
• HxD hex editor (https://mh-nexus.de/en/hxd/)
6

Identifying File Type Using Manual Method (Cont.)
7

8

#> xxd -g 1 <targetfile> | more
9

Identifying File Type Using Tools
#> file <targetfile>
10

Identifying File Type Using Tools (Cont.)
CFF Explorer : https://ntcore.com/?page_id=388
11

Identifying File Type Using Python
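The slide's original script is not reproduced here. As a minimal sketch of the idea, the following checks the MZ file signature and then reads the Machine field of the PE header to tell 32-bit from 64-bit. The function name `identify_pe` and the two-entry machine table are illustrative assumptions, and only two common machine values are listed.

```python
import struct

# Machine values from the PE/COFF header (only two common ones are listed here).
MACHINE_TYPES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def identify_pe(data: bytes) -> str:
    """Describe a file based on its header bytes (hypothetical helper)."""
    # Windows executables start with the two-byte signature "MZ" (4D 5A).
    if data[:2] != b"MZ":
        return "not a PE file"
    if len(data) < 0x40:
        return "MZ signature only (truncated file)"
    # Offset 0x3C holds the file offset of the "PE\0\0" signature.
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    if len(data) < pe_offset + 6 or data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return "MZ executable without a readable PE header"
    # The 2-byte Machine field immediately follows the PE signature.
    machine = struct.unpack_from("<H", data, pe_offset + 4)[0]
    return "PE file, " + MACHINE_TYPES.get(machine, f"machine 0x{machine:04x}")
```

Usage: read the first few kilobytes of the suspect file in binary mode (`open(path, "rb").read(4096)`) and pass them to `identify_pe`.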
12

Identifying File Type – Hidden Extension
13

Fingerprinting the Malware
• Generating the cryptographic hash values for the
suspect binary based on its file content.
• Hashing algorithms such as MD5, SHA1 or
SHA256 are considered the de facto standard for
generating file hashes for the malware
specimens.
• The same malware sample can use different
filenames, but the cryptographic hash calculated
from the file content will remain the same.
• The cryptographic hash of the suspect file serves
as a unique identifier.
14

Fingerprinting the Malware (Cont.)
• During dynamic analysis, when malware is executed, it can
copy itself to a different location or drop another piece of
malware. Having the hash of the sample helps in
identifying whether the newly dropped/copied sample is
the same as the original sample or a different one.
• The file hash is frequently used as an indicator to share with
other security researchers to help them identify the
sample.
• The file hash can be used to determine whether the sample
has been previously detected, by searching online or
searching the database of a multi-antivirus scanning
service.
15

Generating Cryptographic Hash Using Tools
16

Generating Cryptographic Hash Using Tools (Cont.)
17

Generating Cryptographic Hash Using Python
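The slide's original script is not reproduced here. A minimal sketch using only the standard `hashlib` module could look like the following. The function name `file_hashes` and the chunk size are illustrative choices. The file is read in chunks so that large samples do not have to fit in memory.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return the MD5, SHA1 and SHA256 hex digests of a file's content."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as f:
        # Feed every chunk to all three hash objects in a single pass.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

Because the digests depend only on the content, renaming the file does not change any of the three values.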
18

Multiple Anti-Virus Scanning
• Scanning the suspect binary with multiple anti-virus
scanners helps in determining whether
malicious code signatures exist for the suspect
file.
• By visiting the respective antivirus vendor websites
or searching for the signature name in search engines,
you can obtain further details about the suspect
file.
19

Scanning the Suspect Binary with VirusTotal
https://www.virustotal.com/gui/home/upload
20

Scanning the Suspect Binary with VirusTotal (Cont.)
21

Querying Hash Values Using VirusTotal Public API
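The slide's original script is not reproduced here. The sketch below assumes the VirusTotal v3 REST endpoint (`/api/v3/files/<hash>`) with the API key sent in the `x-apikey` header, and it assumes the report exposes per-verdict counts under `last_analysis_stats`. The function names are illustrative, and you need your own API key to run `query_hash`.

```python
import json
import urllib.request

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def build_vt_request(file_hash, api_key):
    """Build (but do not send) a lookup request for an MD5/SHA1/SHA256 hash."""
    return urllib.request.Request(
        VT_FILE_URL.format(file_hash), headers={"x-apikey": api_key}
    )


def detection_summary(report):
    """Summarize a parsed JSON report (assumed v3 layout) as 'N/M engines ...'."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    total = sum(stats.values())
    return f'{stats.get("malicious", 0)}/{total} engines flagged the file as malicious'


def query_hash(file_hash, api_key):
    """Send the request; an unknown hash surfaces as urllib.error.HTTPError."""
    with urllib.request.urlopen(build_vt_request(file_hash, api_key), timeout=30) as resp:
        return detection_summary(json.load(resp))
```

Querying by hash instead of uploading keeps the binary itself off the public service.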
22

Risk of Using Anti-Virus Scanning
• If a suspect binary does not get detected by the Anti-Virus
scanning engines, it does not necessarily mean that the
suspect binary is safe.
• These anti-virus engines rely on signatures and heuristics
to detect malicious files. The malware authors can easily
modify their code and use obfuscation techniques to
bypass these detections.
• When you upload a binary to a public site, the binary you
submit may be shared with third parties and vendors. The
suspect binary may contain sensitive, personal, or
proprietary information specific to your organization.
• Most web-based anti-virus scanning services allow you to
search their existing database of scanned files using
cryptographic hash values (MD5, SHA1, or SHA256), which
avoids uploading the binary itself.
23

Risk of Using Anti-Virus Scanning (Cont.)
• When you submit a binary to the online antivirus
scanning engines, the scan results are stored in their
database, and most of the scan data is publicly
available
• Attackers can use the search feature to query the
hash of their sample to check whether their binary
has been detected.
• Detection of their sample may cause the attackers to
change their tactics to avoid detection.
24

Extracting Strings
• Strings are sequences of printable ASCII and Unicode
characters embedded within a file.
• They can give clues about the program functionality and
indicators associated with a suspect binary.
• Strings extracted from the binary can contain references to
file names, URLs, domain names, IP addresses, attack
commands, registry keys, and so on.
• Although strings do not give a clear picture of the purpose
and capability of a file, they can give a hint about what
malware can do.
• For example, if malware creates a file, the filename is
stored as a string in the binary. Or, if malware resolves a
domain name controlled by the attacker, then the domain
name is stored as a string.
25

String Extraction Using Tools
#> strings -a <filename>
26

String Extraction Using Tools (Cont.)
#> strings -a -el <filename>
27

String Extraction Using Tools (Cont.)
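As a rough Python counterpart to the strings utility, the sketch below pulls out printable ASCII runs and 16-bit little-endian (wide) runs from raw bytes. The function name `extract_strings` and the minimum length of 4 are illustrative assumptions. It does not aim to match the tool's output exactly.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return printable ASCII strings, then UTF-16LE strings, found in data."""
    # Runs of printable ASCII bytes (0x20-0x7e) of at least min_len characters.
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters stored as 16-bit little-endian: each followed by 0x00.
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found
```

Usage: `extract_strings(open(path, "rb").read())` returns a list that can be searched for URLs, file names, registry keys and similar indicators.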
28

Decoding Obfuscated Strings Using FLOSS
• String obfuscation techniques are used to avoid
detection.
• FireEye Labs Obfuscated String Solver (FLOSS)
is a tool designed to identify and extract
obfuscated strings from malware
automatically.
• It can help you determine the strings that
malware authors want to hide from string
extraction tools.
• FLOSS can also be used just like the strings
utility to extract human-readable strings
(ASCII and Unicode).
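To see why plain string extraction misses obfuscated strings, consider a string encoded with a single-byte XOR key. The sketch below is only an illustration of the problem. It is not how FLOSS works internally, and the helper names and the brute-force approach are assumptions made for the example.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (encoding and decoding are the same)."""
    return bytes(b ^ key for b in data)


def brute_force_xor(blob: bytes, marker: bytes):
    """Try every non-zero single-byte key; keep results that contain marker."""
    hits = []
    for key in range(1, 256):
        decoded = xor_bytes(blob, key)
        if marker in decoded:
            hits.append((key, decoded))
    return hits
```

For instance, `xor_bytes(b"http://c2.example/gate", 0x5A)` no longer contains the text `http`, so a strings-style search would not reveal the URL, while `brute_force_xor(blob, b"http://")` recovers it together with the key.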
29

FLOSS
https://github.com/mandiant/flare-floss
30

Determining File Obfuscation
• Obfuscation is used to protect the inner
workings of the malware from security
researchers, malware analysts, and reverse
engineers.
• These techniques make it difficult to
detect/analyze the binary; extracting the
strings from such a binary yields very few
strings, and most of them are obscured.
• Malware authors often use programs such as
packers and cryptors to obfuscate their files,
to evade detection by security products such
as anti-virus and to thwart analysis.
31

Packers and Cryptors
• A Packer is a program that takes the
executable as input, and it uses compression
to obfuscate the executable's content.
• This obfuscated content is then stored within
the structure of a new executable file; the
result is a new executable file (packed
program) with obfuscated content on the
disk.
• Upon execution of the packed program, it
executes a decompression routine, which
extracts the original binary in memory during
runtime and triggers the execution.
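The effect of packing on the file content can be sketched with the standard `zlib` module. This toy example only shows the compress/decompress transformation. A real packer also wraps the compressed data in a new executable together with its decompression routine, which is not modeled here.

```python
import zlib


def pack(original: bytes) -> bytes:
    """Toy 'packer': compress the original content."""
    return zlib.compress(original, 9)


def unpack(packed: bytes) -> bytes:
    """Toy decompression routine: restore the original content in memory."""
    return zlib.decompress(packed)
```

After `pack`, readable strings from the original content generally no longer appear in the stored bytes, which is why string extraction on a packed sample yields so little. `unpack` returns the original content unchanged.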
32

Packers and Cryptors (Cont.)
• A Cryptor is like a Packer, but instead of
using compression, it uses encryption to
obfuscate the executable's content, and the
encrypted content is stored in the new
executable file.
• Upon execution of the encrypted program, it
runs a decryption routine to extract the
original binary in the memory and then
triggers the execution.
33

Detecting File Obfuscation Using Exeinfo PE
http://www.exeinfo.byethost18.com/?i=1
Loading the packed malware sample into Exeinfo PE shows that it is packed with
UPX, and it also gives a hint on which command to use to decompress the
obfuscated file; this can make your analysis much easier.
35

PE Header
• A PE (Portable Executable) file is a series of structures
and sub-components that contain the information
required by the operating system to load it into memory,
such as where the executable needs to be loaded, the
address where execution starts, the list of
libraries/functions on which the application relies,
and the resources used by the binary.
• Examining the PE header yields a wealth of information
about the binary and its functionality.
• You can get a clear understanding of the PE file format by
loading a suspect file into PE analysis tools that allow you
to examine and modify the PE structure and its
sub-components (CFF Explorer, PE Internals, PPEE (puppy),
PEBrowse Professional, …)
36

Inspecting File Dependencies and Imports
• Functions that an executable imports from other
files (mostly DLLs) are called imported functions; they
provide interaction with files, the registry, the network, and so on.
• For example, if a malware executable wants to create a file
on disk, on Windows, it can use the API CreateFile(), which
is exported by kernel32.dll. To call the API, it first has to
load kernel32.dll into its memory and then call the
CreateFile() function.
• Inspecting the DLLs that malware relies upon and the API
functions that it imports from those DLLs can give an idea
of its functionality and capability and of
what to anticipate during its execution.
• The file dependencies of a Windows executable are stored
in the import table of the PE file structure.
37

Inspecting File Dependencies and Imports (Cont.)
38

Inspecting File Dependencies and Imports (Cont.)
In addition to determining the malware functionality, imports can help you detect
whether a malware sample is obfuscated. If you come across a sample with very
few imports, it is a strong indication of a packed binary.
39

Using Python to Enumerate DLL Files and Imported Functions
https://github.com/erocarrera/pefile
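The slide's original script is not reproduced here. A minimal sketch with the third-party pefile module could look like the following. The helper names are illustrative. `summarize_imports` relies on pefile exposing the parsed import table as `DIRECTORY_ENTRY_IMPORT`, with a `dll` name per entry and a `name`/`ordinal` per imported function.

```python
def summarize_imports(pe):
    """Map each imported DLL name to the list of functions imported from it."""
    summary = {}
    # pefile only sets DIRECTORY_ENTRY_IMPORT when an import table was parsed.
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        dll = entry.dll.decode(errors="replace")
        names = summary.setdefault(dll, [])
        for imp in entry.imports:
            if imp.name:
                names.append(imp.name.decode(errors="replace"))
            else:
                # Functions imported by ordinal carry no name.
                names.append(f"ordinal {imp.ordinal}")
    return summary


def imports_of(path):
    """Parse a PE file from disk and summarize its imports."""
    import pefile  # third-party: pip install pefile

    return summarize_imports(pefile.PE(path))
```

A result with only a handful of functions (for example, little more than LoadLibrary and GetProcAddress style APIs) is worth treating as a hint that the sample may be packed.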
40

Inspecting Exports
• Typically, a DLL exports functions (exports)
that are imported by the executable. A DLL
cannot run on its own and depends on a
host process for executing its code.
• An attacker often creates a DLL that exports
functions containing malicious functionality.
To run the malicious functions within the
DLL, it is somehow made to be loaded by a
process that calls these malicious functions.
• DLLs can also import functions from other
libraries (DLLs) to perform system
operations.
41

Examining PE Section Table And Sections
• The actual content of the PE file is divided into sections.
• The sections immediately follow the PE header.
• These sections represent either code or data and they have in-memory attributes such as
read/write.
• For example, a section with name .text indicates code and has an attribute of read-execute;
a section with name .data indicates global data and has an attribute of read-write.
42

Examining PE Section Table And Sections (Cont.)
43

Examining PE Section Table And Sections (Cont.)
• The section names do not include the common sections added by the compiler (such as
.text, .data, and so on) but instead include the section names UPX0 and UPX1.
• Typically, the raw-size and the virtual-size should be almost equal; small differences
are normal due to section alignment.
• A significant difference between the two is a strong indication of a packed binary. The
reason for this discrepancy is that when a packed binary is executed, the decompression
routine of the packer will copy decompressed data or instructions into the memory during runtime.
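The raw-size versus virtual-size comparison can be sketched with the standard `struct` module by reading the section table directly. The function names are illustrative, and the check used here (a raw size of 0 with a non-zero virtual size, as UPX0 typically has) is only one simple heuristic. Legitimate sections that hold uninitialized data can trigger it too.

```python
import struct


def read_sections(data: bytes):
    """Return (name, virtual_size, raw_size) for each PE section table entry."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file")
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    # COFF header fields, relative to the PE signature:
    # NumberOfSections at +6, SizeOfOptionalHeader at +20.
    num_sections = struct.unpack_from("<H", data, pe_offset + 6)[0]
    opt_size = struct.unpack_from("<H", data, pe_offset + 20)[0]
    table = pe_offset + 24 + opt_size  # section table follows the optional header
    sections = []
    for i in range(num_sections):
        # Each 40-byte entry starts with Name, VirtualSize, VirtualAddress, SizeOfRawData.
        name, vsize, _vaddr, raw_size = struct.unpack_from("<8sIII", data, table + 40 * i)
        sections.append((name.rstrip(b"\x00").decode(errors="replace"), vsize, raw_size))
    return sections


def suspicious_sections(sections):
    """Names of sections that occupy memory but store nothing on disk."""
    return [name for name, vsize, raw_size in sections if raw_size == 0 and vsize > 0]
```

Usage: `suspicious_sections(read_sections(open(path, "rb").read()))`.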
44

Examining the Compilation Timestamp
• The PE header contains information that specifies when the binary was
compiled.
• Examining this field can give an idea of when the malware was first created.
• This information can be useful in building a timeline of the attack campaign.
• It is also possible that an attacker modifies the timestamp to prevent an
analyst from knowing the actual timestamp.
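The compilation timestamp is stored as a 32-bit value (seconds since 1970-01-01 UTC) in the TimeDateStamp field of the PE file header. A minimal sketch for reading it with the standard library is shown below. The function name `compile_time` is illustrative, and the value it returns is only as trustworthy as the header itself.

```python
import struct
from datetime import datetime, timezone


def compile_time(data: bytes) -> datetime:
    """Return the TimeDateStamp of a PE file as a UTC datetime."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file")
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    # TimeDateStamp sits 8 bytes after the start of the PE signature.
    stamp = struct.unpack_from("<I", data, pe_offset + 8)[0]
    return datetime.fromtimestamp(stamp, tz=timezone.utc)
```

A date far in the future or implausibly far in the past is a hint that the field was tampered with or set by the build tool rather than reflecting the real build time.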
45

Examining PE Resources
• The resources required by the executable file such
as icons, menu, dialog, and strings are stored in the
resource section (.rsrc) of an executable file.
• Often, attackers store information such as additional
binary, decoy documents, and configuration data in
the resource section, so examining the resource can
reveal valuable information about a binary.
• The resource section also contains version
information that can reveal information about the
origin, company name, program author details, and
copyright information.
46

Examining PE Resources (Cont.)
http://www.angusj.com/resourcehacker/
The malware uses the icon of Microsoft
Excel to give the appearance of an
excel sheet.
The executable also contains file
signature of D0 CF 11 E0 A1 B1 1A E1
which is the sequence of bytes for a
Microsoft Office document file.
The attackers stored a decoy excel
sheet in the resource section. Upon
execution, the malware is executed in
the background, and this decoy excel
sheet is displayed to the user as a
diversion.
47

Comparing And Classifying The Malware
• Comparing the suspect binary with
previously analyzed samples, or with samples
stored in a public or private repository, can
give an understanding of the malware family,
its characteristics, and its similarity
to previously analyzed samples.
• Methods:
  ◦ Fuzzy Hashing
  ◦ Import Hash
  ◦ YARA
48

Classifying Malware Using Fuzzy Hashing
• This technique is useful in comparing a suspect binary with the samples in a
repository to identify the samples that are similar.
• This can help in identifying the samples that belong to the same malware family
or the same actor group.
• Cryptographic hashes are not helpful in determining the relationship between
the samples, whereas the fuzzy hashing technique
identifies the similarity between the samples.
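The contrast between a cryptographic hash and a similarity score can be illustrated with the standard library alone. In the sketch below, `difflib.SequenceMatcher` stands in for a fuzzy-hash comparison. It is not the ssdeep algorithm, it is far too slow for real samples, and the function names are illustrative.

```python
import hashlib
from difflib import SequenceMatcher


def sha256_hex(data: bytes) -> str:
    """Cryptographic hash: changes completely if a single byte changes."""
    return hashlib.sha256(data).hexdigest()


def similarity_percent(a: bytes, b: bytes) -> int:
    """Stand-in similarity score (0-100) between two byte sequences."""
    # autojunk=False stops difflib from ignoring frequently occurring bytes.
    return round(100 * SequenceMatcher(None, a, b, autojunk=False).ratio())
```

Two samples that differ in a single byte produce unrelated SHA256 values, so the hashes say nothing about how close the samples are, while the similarity score stays near 100.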
49

Classifying Malware Using Fuzzy Hashing (Cont.)
• ssdeep is a useful tool to generate the fuzzy hash for
a sample, and it also helps in determining percentage
similarity between the samples.
• From the output, out of the three samples, two
samples have 99% similarity, suggesting that these
two samples probably belong to the same malware
family.
https://ssdeep-project.github.io/ssdeep/
50

• You might have a directory containing many
malware samples. In that case, it is possible
to run ssdeep on directories and
subdirectories containing malware samples
using the recursive mode (-r) as shown here:
• You can also match a suspect binary with a list of file hashes.
In the following example, the ssdeep hashes of all the
binaries were redirected to a text file (all_hashes.txt), and
then the suspect binary (blab.exe) is matched with all the
hashes in the file:
51

52

Classifying Malware Using Import Hash
• Import hash (or imphash) is a technique in which
hash values are calculated based on the
library/imported function (API) names and their
order within the executable.
• If the files were compiled from the same source and
in the same manner, those files would tend to have
the same imphash value.
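The idea behind the import hash can be sketched in a few lines. This is a simplified illustration that takes an ordered list of (DLL, function) name pairs and ignores functions imported by ordinal. In practice, the pefile module's `get_imphash()` method computes the value from a parsed PE file. The function name `simple_imphash` is hypothetical.

```python
import hashlib


def simple_imphash(imports):
    """MD5 over the ordered, lowercased 'library.function' import names."""
    parts = []
    for dll, func in imports:
        lib = dll.lower()
        # Drop common library extensions so "KERNEL32.dll" and "kernel32" hash alike.
        for ext in (".dll", ".sys", ".ocx"):
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
                break
        parts.append(f"{lib}.{func.lower()}")
    return hashlib.md5(",".join(parts).encode()).hexdigest()
```

Because the names are joined in import-table order, two files with the same imports in a different order get different values, which is why the hash tends to match only for files built from the same source in the same manner.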
53

Classifying Malware Using Import Hash (Cont.)
• In this output, the samples have different
cryptographic hash values (MD5), but the imphash values of
these samples are identical; this indicates that they
were probably compiled from the same source and in
the same manner.
get_imphash.py
Files having the same imphash do not necessarily come from the same threat group; you might
have to correlate information from various sources to classify your malware.
For example, it is possible that the malware samples were generated using a common builder kit.
54

Classifying Malware Using Section Hash
• Like import hashing, section hashing can also help in
identifying related samples.
When an executable is loaded in pestudio, it calculates the
MD5 of each section (.text, .data, .rdata, and so on).
55

Classifying Malware Using YARA
http://virustotal.github.io/yara/
• YARA (Yet Another Recursive/Ridiculous Acronym) is a
tool aimed at helping malware researchers identify
and classify malware samples.
• With YARA you can create descriptions of malware
families based on textual or binary patterns.
• These descriptions (YARA rules) consist of a set of strings and a
Boolean expression that determines the rule's logic.
• YARA is multi-platform, running on Windows, Linux and
Mac OS X, and can be used through its command-line
interface or from your own Python scripts with the
yara-python extension.
• A YARA rule consists of the following components:
  ◦ Rule identifier
  ◦ String definition
  ◦ Condition section
56

Applications of YARA
Example 1: Detects an executable file containing an
embedded Microsoft Office document in it
Example 2: Detecting Packers
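The rule shown on the slide for Example 1 is not reproduced here. As a sketch of what such a rule could look like, the following defines a rule with an identifier, two hex strings and a condition, and shows how it might be run through yara-python. The rule name, the 1024-byte offset threshold and the helper names are assumptions. `matches_in_plain_python` mirrors the rule's condition so the logic can be followed without YARA installed.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Microsoft Office document signature

RULE_SOURCE = r"""
rule embedded_office_document
{
    strings:
        $mz  = { 4D 5A }
        $ole = { D0 CF 11 E0 A1 B1 1A E1 }
    condition:
        $mz at 0 and $ole in (1024..filesize)
}
"""


def scan_with_yara(data: bytes):
    """Compile the rule and return the names of the rules that match data."""
    import yara  # third-party: pip install yara-python

    return [m.rule for m in yara.compile(source=RULE_SOURCE).match(data=data)]


def matches_in_plain_python(data: bytes) -> bool:
    """Same logic as the rule's condition, written without YARA."""
    return data[:2] == b"MZ" and data.find(OLE_MAGIC, 1024) != -1
```

The condition requires the MZ signature at offset 0 and the Office signature somewhere after the first 1024 bytes, so an ordinary Office document (which starts with the Office signature) does not match.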
57

Applications of YARA (Cont.)
• YARA can be used to detect patterns
in any file. This sample YARA rule
detects communication of different
variants of the Gh0stRAT malware:
• Running the preceding rule on a directory containing
network packet captures (pcaps) detected the Gh0stRAT
pattern:
58
