Hossein Yavari
Feb. 16, 2021
Malware Static Analysis
1
What is Static Analysis?
• Technique of analyzing the suspect file
without executing it.
• It extracts useful information from the
suspect binary.
• This information helps you decide how to classify or
analyze the file and where to focus subsequent analysis efforts.
2
What Do We Learn?
• Identifying the malware's target architecture
• Fingerprinting the malware
• Scanning the suspect binary with anti-virus
engines
• Extracting strings, functions, and metadata
associated with the file
• Identifying the obfuscation techniques used to
thwart analysis
• Classifying and comparing the malware samples
3
Determining the File Type
• These methods will help you identify the malware's target
operating system and the architecture.
• Windows, Linux, etc?
• 32-bit/64-bit ?
• Example: {.exe, .dll, .sys, .drv, .com, .ocx } are Windows
executable files.
• Most Windows-based malware consists of executable files ending with
extensions such as .exe, .dll, and .sys.
• However, relying on file extensions alone is not recommended.
4
File Signature
• Attackers use various tricks to disguise a file,
such as modifying the file extension and changing
the file's appearance, to trick users into executing
it.
• Instead of relying on the file extension, use the
file signature to determine the file
type.
• A file signature is a unique sequence of bytes
that is written to the file's header.
5
Identifying File Type Using Manual Method
• Using hex editors:
• A hex editor is a tool that allows an examiner to
inspect each byte of the file
• Example:
• HxD hex editor (https://mh-nexus.de/en/hxd/)
6
Identifying File Type Using Manual Method (Cont.)
7
Identifying File Type Using Manual Method (Cont.)
8
Identifying File Type Using Manual Method (Cont.)
#> xxd -g 1 <targetfile> | more
9
Identifying File Type Using Tools
#> file <targetfile>
10
Identifying File Type Using Tools (Cont.)
CFF Explorer : https://ntcore.com/?page_id=388
11
Identifying File Type Using Python
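A minimal sketch of signature-based identification in Python, using only the standard library. The signature table is a small illustrative subset, and the function names `identify` and `pe_architecture` are hypothetical. For MZ files it also reads the Machine field of the PE header to answer the 32-bit/64-bit question.

```python
import struct

# A few well-known file signatures (magic bytes) found at offset 0.
SIGNATURES = {
    b"MZ": "Windows executable (MZ/PE)",
    b"\x7fELF": "Linux/Unix executable (ELF)",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive (also DOCX/XLSX/JAR)",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "Microsoft Office OLE2 document",
}

# Two common values of the Machine field in the PE COFF header.
PE_MACHINES = {0x014C: "32-bit (x86)", 0x8664: "64-bit (x64)"}


def pe_architecture(data: bytes) -> str:
    """Read the Machine field that follows the 'PE\\0\\0' signature."""
    if len(data) < 0x40:
        return "truncated header"
    # e_lfanew (offset 0x3C) holds the file offset of the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        return "no PE signature"
    if len(data) < e_lfanew + 6:
        return "truncated header"
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
    return PE_MACHINES.get(machine, f"machine 0x{machine:04x}")


def identify(data: bytes) -> str:
    """Identify a file from its leading bytes, ignoring its extension."""
    for magic, description in SIGNATURES.items():
        if data.startswith(magic):
            if magic == b"MZ":
                return f"{description}, {pe_architecture(data)}"
            return description
    return "unknown"
```

To use it, read the first few kilobytes of the suspect file in binary mode and pass them to `identify`. A file named invoice.pdf whose content starts with MZ would be reported as a Windows executable.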
12
Identifying File Type – Hidden Extension
13
Fingerprinting the Malware
• Generating the cryptographic hash values for the
suspect binary based on its file content.
• Hashing algorithms such as MD5, SHA1 or
SHA256 are considered the de facto standard for
generating file hashes for the malware
specimens.
• The same malware sample can appear under different
filenames, but the cryptographic hash calculated
from the file content remains the same.
• The cryptographic hash of your suspect file therefore
serves as a unique identifier.
14
Fingerprinting the Malware (Cont.)
• During dynamic analysis, when malware is executed, it can
copy itself to a different location or drop another piece of
malware. Having the hash of the sample helps in
identifying whether the newly dropped/copied sample is
the same as the original sample or a different one.
• A file hash is frequently used as an indicator to share with
other security researchers to help them identify the
sample.
• A file hash can be used to determine whether the sample
has been previously detected, by searching online or
searching the database of a multi-anti-virus scanning
service.
15
Generating Cryptographic Hash Using Tools
16
Generating Cryptographic Hash Using Tools (Cont.)
17
Generating Cryptographic Hash Using Python
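A minimal sketch using Python's standard `hashlib` module. The function name `file_hashes` and the chunk size are assumptions made for illustration. The file is read in chunks so that large samples do not need to fit in memory.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return the MD5, SHA1, and SHA256 hex digests of a file's content."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        # Feed every chunk to all three hash objects in a single pass.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

Because the digests depend only on content, two copies of a sample saved under different filenames produce identical values.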
18
Multiple Anti-Virus Scanning
• Scanning the suspect binary with multiple anti-virus
scanners helps determine whether malicious code
signatures exist for the suspect file.
• Looking up the signature name on the respective
anti-virus vendor's website or in a search engine can
yield further details about the suspect file.
19
Scanning the Suspect Binary with VirusTotal
https://www.virustotal.com/gui/home/upload
20
Scanning the Suspect Binary with VirusTotal (Cont.)
21
Querying Hash Values Using VirusTotal Public API
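A minimal sketch of a hash lookup, assuming the VirusTotal API v3 file-report endpoint and an API key passed in the `x-apikey` header. The slides do not state which API version they used, and the function names here are hypothetical. Only the hash is sent; the sample itself is not uploaded.

```python
import json
import urllib.request

# Assumed endpoint: VirusTotal API v3 file report, addressed by file hash.
VT_FILE_REPORT_URL = "https://www.virustotal.com/api/v3/files/{file_hash}"


def build_report_request(file_hash, api_key):
    """Build (but do not send) the HTTP request for a file report."""
    return urllib.request.Request(
        VT_FILE_REPORT_URL.format(file_hash=file_hash),
        headers={"x-apikey": api_key, "Accept": "application/json"},
    )


def detection_summary(report):
    """Return (malicious_count, total_verdicts) from a v3 report dict."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    return stats["malicious"], sum(stats.values())


def query_hash(file_hash, api_key):
    """Send the request and return the decoded JSON report."""
    request = build_report_request(file_hash, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A hash that the service has never seen results in an HTTP error (raised as `urllib.error.HTTPError`) rather than an empty report, so callers should handle that case separately from a zero-detection result.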
22
Risk of Using Anti-Virus Scanning
• If a suspect binary does not get detected by the Anti-Virus
scanning engines, it does not necessarily mean that the
suspect binary is safe.
• These anti-virus engines rely on signatures and heuristics
to detect malicious files. The malware authors can easily
modify their code and use obfuscation techniques to
bypass these detections.
• When you upload a binary to a public site, the binary you
submit may be shared with third parties and vendors. The
suspect binary may contain sensitive, personal, or
proprietary information specific to your organization.
• Most web-based anti-virus scanning services allow you to
search their existing database of scanned files using
cryptographic hash values (MD5, SHA1, or SHA256).
23
Risk of Using Anti-Virus Scanning
• When you submit a binary to the online antivirus
scanning engines, the scan results are stored in their
database, and most of the scan data is publicly
available
• Attackers can use the search feature to query the
hash of their sample to check whether their binary
has been detected.
• Detection of their sample may cause the attackers to
change their tactics to avoid detection.
24
Extracting Strings
• Strings are ASCII and Unicode-printable sequences of
characters embedded within a file.
• They can give clues about the program's functionality and
reveal indicators associated with a suspect binary.
• Strings extracted from the binary can contain references to
file names, URLs, domain names, IP addresses, attack
commands, registry keys, and so on.
• Although strings do not give a clear picture of the purpose
and capability of a file, they can give a hint about what
malware can do.
• For example, if malware creates a file, the filename is
typically stored as a string in the binary. Likewise, if malware
resolves a domain name controlled by the attacker, the domain
name is typically stored as a string.
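A minimal sketch of string extraction in Python, roughly mirroring what the `strings` utility does for ASCII and for 16-bit little-endian (wide) strings. The function name `extract_strings` and the default minimum length of 4 are assumptions for illustration.

```python
import re


def extract_strings(data, min_length=4):
    """Return (ascii_strings, wide_strings) found in raw bytes."""
    # Runs of printable ASCII characters (space through tilde).
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    # The same characters stored as UTF-16LE: each followed by a null byte.
    wide_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)

    ascii_strings = [m.group().decode("ascii") for m in ascii_pattern.finditer(data)]
    wide_strings = [m.group().decode("utf-16-le") for m in wide_pattern.finditer(data)]
    return ascii_strings, wide_strings
```

Windows binaries often store text as wide strings, which is why both patterns are worth running against the same sample.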
25
String Extraction Using Tools
#> strings -a <filename>
26
String Extraction Using Tools (Cont.)
#> strings -a -el <filename>
27
String Extraction Using Tools (Cont.)
28
Decoding Obfuscated Strings Using FLOSS
• String obfuscation techniques are used to avoid
detection.
• FireEye Labs Obfuscated String Solver (FLOSS)
is a tool designed to identify and extract
obfuscated strings from malware
automatically.
• It can help you determine the strings that
malware authors want to hide from string
extraction tools.
• FLOSS can also be used just like the strings
utility to extract human-readable strings
(ASCII and Unicode).
29
FLOSS
https://github.com/mandiant/flare-floss
30
Determining File Obfuscation
• Obfuscation is used to protect the inner
workings of the malware from security
researchers, malware analysts, and reverse
engineers.
• These techniques make the binary difficult to
detect and analyze; extracting strings from such a
binary yields very few strings, and most of them
are obscured.
• Malware authors often use programs such as
packers and cryptors to obfuscate their files, both to
evade detection by security products such as
anti-virus and to thwart analysis.
31
Packers and Cryptors
• A Packer is a program that takes the
executable as input, and it uses compression
to obfuscate the executable's content.
• This obfuscated content is then stored within
the structure of a new executable file; the
result is a new executable file (packed
program) with obfuscated content on the
disk.
• Upon execution of the packed program, it
executes a decompression routine, which
extracts the original binary in memory during
runtime and triggers the execution.
32
Packers and Cryptors (Cont.)
• A Cryptor is like a Packer, but instead of
using compression, it uses encryption to
obfuscate the executable's content, and the
encrypted content is stored in the new
executable file.
• Upon execution of the encrypted program, it
runs a decryption routine to extract the
original binary in the memory and then
triggers the execution.
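A toy illustration of why packed or encrypted content hides strings. This is not a real packer or cryptor: it only compresses (zlib) or XOR-encrypts a byte buffer and omits the new executable and its unpacking stub. All function names are hypothetical.

```python
import re
import zlib


def visible_strings(data, min_length=6):
    """Return printable ASCII runs, similar to what `strings` would show."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_length, data)


def toy_pack(payload):
    """Packer-style transformation: compress the original content."""
    return zlib.compress(payload, 9)


def toy_unpack(packed):
    """What a decompression stub would do in memory at runtime."""
    return zlib.decompress(packed)


def toy_crypt(payload, key):
    """Cryptor-style transformation: XOR with a repeating key.

    XOR is symmetric, so the same call also decrypts.
    """
    return bytes(byte ^ key[i % len(key)] for i, byte in enumerate(payload))
```

Running `visible_strings` on a payload before and after `toy_pack` or `toy_crypt` shows the readable URLs and API names disappearing, while `toy_unpack` (or a second `toy_crypt` call with the same key) restores the original bytes.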
33
UPX
https://upx.github.io/
34
Detecting File Obfuscation Using Exeinfo PE
http://www.exeinfo.byethost18.com/?i=1
Loading the packed malware sample into Exeinfo PE shows that it is packed with
UPX, and it also gives a hint on which command to use to decompress the
obfuscated file; this can make your analysis much easier.
35
PE Header
• A PE (Portable Executable) file is a series of structures
and sub-components that contain the information
required by the operating system to load it into memory,
such as where the executable needs to be loaded into
memory, the address where execution starts, the list
of libraries/functions on which the application relies,
and the resources used by the binary.
• Examining the PE header yields a wealth of information
about the binary and its functionality.
• You can get a clear understanding of the PE file format by
loading a suspect file into PE analysis tools that allow you
to examine and modify the PE structure and its
sub-components (CFF Explorer, PE Internals, PPEE (puppy),
PEBrowse Professional, …).
36
Inspecting File Dependencies and Imports
• Functions that an executable imports from other
files (mostly DLLs) are called imported functions; they
provide interaction with files, the registry, the network, and so on.
• For example, if a malware executable wants to create a file
on disk on Windows, it can use the CreateFile() API, which
is exported by kernel32.dll. To call the API, kernel32.dll
must first be loaded into the process's memory; the malware
then calls the CreateFile() function.
• Inspecting the DLLs that malware relies upon and the API
functions that it imports from those DLLs can give an idea
of the functionality and capability of the malware and
what to anticipate during its execution.
• The file dependencies of a Windows executable are stored
in the import table of the PE file structure.
37
Inspecting File Dependencies and Imports (Cont.)
38
Inspecting File Dependencies and Imports (Cont.)
In addition to revealing malware functionality, imports can help you detect
whether a malware sample is obfuscated. A sample with very
few imports is a strong indication of a packed binary.
39
Using Python to Enumerate DLL Files and Imported Functions
https://github.com/erocarrera/pefile
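A minimal sketch of import enumeration built around the third-party pefile module. The helper takes an already parsed `pefile.PE` object and reads its `DIRECTORY_ENTRY_IMPORT` list. The function names and the threshold of 10 imports are assumptions for illustration, not values taken from the slides.

```python
def imports_by_dll(pe):
    """Map each imported DLL name to the API names imported from it.

    `pe` is expected to be a pefile.PE instance; the attribute is absent
    when the file has no import table, hence the getattr() default.
    """
    table = {}
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        dll_name = entry.dll.decode("ascii", errors="replace")
        functions = table.setdefault(dll_name, [])
        for symbol in entry.imports:
            if symbol.name is not None:
                functions.append(symbol.name.decode("ascii", errors="replace"))
            else:
                # Imported by ordinal only; no name is stored in the file.
                functions.append(f"ordinal_{symbol.ordinal}")
    return table


def looks_sparse(table, threshold=10):
    """Heuristic: very few imports may indicate a packed binary."""
    return sum(len(functions) for functions in table.values()) < threshold


# Typical use (requires `pip install pefile`):
#   import pefile
#   pe = pefile.PE("sample.exe")
#   for dll, functions in imports_by_dll(pe).items():
#       print(dll, functions)
```

A sparse import table is only a hint: small legitimate programs can also import few functions, so the result should be combined with the section and string checks described elsewhere in the deck.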
40
Inspecting Exports
• Typically, a DLL exports functions (exports)
that are imported by the executable. A DLL
cannot run on its own and depends on a
host process for executing its code.
• An attacker often creates a DLL that exports
functions containing malicious functionality.
To run the malicious functions within the
DLL, it is somehow made to be loaded by a
process that calls these malicious functions.
• DLLs can also import functions from other
libraries (DLLs) to perform system
operations.
41
Examining PE Section Table And Sections
• The actual content of the PE file is divided into sections.
• The sections immediately follow the PE header.
• These sections represent either code or data, and they have in-memory attributes such as
read/write.
• For example, a section named .text indicates code and has the attribute read-execute;
a section named .data indicates global data and has the attribute read-write.
42
Examining PE Section Table And Sections (Cont.)
43
Examining PE Section Table And Sections (Cont.)
• The section names do not include the common sections added by the compiler (such as
.text, .data, and so on); instead, the sections are named UPX0 and UPX1.
• Typically, the raw size and the virtual size of a section should be almost equal; small
differences are normal due to section alignment.
• A large discrepancy between the two sizes, as in this sample, is a strong indication of a
packed binary. The reason is that when a packed binary is executed, the decompression
routine of the packer copies decompressed data or instructions into memory at runtime.
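A minimal sketch that reads the section table with the standard `struct` module and flags sections whose virtual size is much larger than their raw size. The ratio of 4 is an arbitrary assumption, and the function names are hypothetical. Some compilers emit legitimate sections with a raw size of zero (uninitialized data), so a flag is a hint, not proof of packing.

```python
import struct


def read_sections(data):
    """Return name, virtual size, raw size, and flags for each PE section."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    # COFF header: Machine, NumberOfSections, TimeDateStamp, symbol table
    # pointer, symbol count, SizeOfOptionalHeader, Characteristics.
    _machine, count, _stamp, _symtab, _nsyms, optional_size, _chars = \
        struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    # The section table starts right after the optional header.
    table_offset = e_lfanew + 24 + optional_size
    sections = []
    for index in range(count):
        (name, virtual_size, _vaddr, raw_size, _raw_ptr,
         _relocs, _lines, _nrelocs, _nlines, flags) = struct.unpack_from(
            "<8sIIIIIIHHI", data, table_offset + index * 40)
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
            "virtual_size": virtual_size,
            "raw_size": raw_size,
            "flags": flags,
        })
    return sections


def suspicious_sections(sections, ratio=4):
    """Names of sections that occupy far more memory than disk space."""
    return [s["name"] for s in sections
            if s["virtual_size"] > ratio * s["raw_size"]]
```

For a UPX-packed sample, the UPX0 section commonly has a raw size of zero and a large virtual size, because it is the region that the decompression routine fills at runtime.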
44
Examining the Compilation Timestamp
• The PE header contains information that specifies when the binary was
compiled.
• Examining this field can give an idea of when the malware was first created.
• This information can be useful in building a timeline of the attack campaign.
• It is also possible that an attacker modifies the timestamp to prevent an
analyst from knowing the actual timestamp.
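A minimal sketch that reads the compilation timestamp directly from the header, assuming a well-formed PE file. The TimeDateStamp field is a 32-bit count of seconds since 1970-01-01 UTC, stored 8 bytes after the start of the PE signature. The function name `compile_timestamp` is hypothetical.

```python
import struct
from datetime import datetime, timezone


def compile_timestamp(data):
    """Return the PE TimeDateStamp as a timezone-aware UTC datetime."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    # Layout after the signature: Machine (2 bytes), NumberOfSections
    # (2 bytes), then TimeDateStamp (4 bytes).
    (stamp,) = struct.unpack_from("<I", data, e_lfanew + 8)
    return datetime.fromtimestamp(stamp, tz=timezone.utc)
```

Because the field is trivially editable, a value far in the past or in the future is itself worth noting as a sign of tampering.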
45
Examining PE Resources
• The resources required by the executable file such
as icons, menu, dialog, and strings are stored in the
resource section (.rsrc) of an executable file.
• Often, attackers store information such as additional
binary, decoy documents, and configuration data in
the resource section, so examining the resource can
reveal valuable information about a binary.
• The resource section also contains version
information that can reveal information about the
origin, company name, program author details, and
copyright information.
46
Examining PE Resources (Cont.)
http://www.angusj.com/resourcehacker/
The malware uses the icon of Microsoft
Excel to give the appearance of an
Excel sheet.
The executable also contains the file
signature D0 CF 11 E0 A1 B1 1A E1,
which is the sequence of bytes for a
Microsoft Office document file.
The attackers stored a decoy Excel
sheet in the resource section. Upon
execution, the malware runs in
the background, and this decoy Excel
sheet is displayed to the user as a
diversion.
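A minimal sketch of how this pattern could be checked programmatically: look for the Office document signature at a nonzero offset inside a file that starts with MZ. It scans the whole file rather than parsing the resource section, and the function name `find_embedded_ole` is hypothetical.

```python
# Signature of a Microsoft Office (OLE2 compound) document.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def find_embedded_ole(data):
    """Return offsets of the OLE2 signature inside an MZ executable."""
    if not data.startswith(b"MZ"):
        return []
    offsets = []
    # Start at offset 1: the file itself begins with MZ, so any hit
    # is an embedded document rather than the file's own header.
    position = data.find(OLE2_SIGNATURE, 1)
    while position != -1:
        offsets.append(position)
        position = data.find(OLE2_SIGNATURE, position + 1)
    return offsets
```

Each returned offset marks where a carved copy of the decoy document would begin; the bytes from that offset onward can be saved and examined separately.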
47
Comparing And Classifying The Malware
• Comparing the suspect binary with
previously analyzed samples, or with samples
stored in a public or private repository, can
give:
  - an understanding of the malware family,
    its characteristics, and its similarity
    to the previously analyzed samples.
• Methods:
  - Fuzzy Hashing
  - Import Hash
  - YARA
48
Classifying Malware Using Fuzzy Hashing
• Fuzzy hashing is useful for comparing a suspect binary with the samples in a
repository to identify the samples that are similar.
• This can help in identifying samples that belong to the same malware family
or the same actor group.
• Cryptographic hashes are not helpful in determining the relationship between
samples, because changing a single byte produces a completely different hash,
whereas fuzzy hashing identifies the similarity between samples.
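A small sketch of the contrast between exact and similarity-based comparison. It does not implement fuzzy hashing: `difflib.SequenceMatcher` from the standard library is used as a stand-in similarity measure, is far slower than a real fuzzy hash, and is only suitable for small inputs. The function names are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher


def sha256_hex(data):
    """Cryptographic hash: any change yields an unrelated digest."""
    return hashlib.sha256(data).hexdigest()


def similarity_percent(first, second):
    """Stand-in similarity score (0-100) between two byte strings.

    This is NOT a fuzzy hash; it compares the raw bytes directly.
    """
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    return round(matcher.ratio() * 100)
```

Two samples that differ in a single byte have different SHA256 digests yet score close to 100 on the similarity measure, which is the property that tools such as ssdeep exploit at scale by comparing compact fuzzy hashes instead of whole files.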
49
Classifying Malware Using Fuzzy Hashing (Cont.)
• ssdeep is a useful tool to generate the fuzzy hash for
a sample, and it also helps in determining percentage
similarity between the samples.
• From the output, out of the three samples, two
samples have 99% similarity, suggesting that these
two samples probably belong to the same malware
family.
https://ssdeep-project.github.io/ssdeep/
50
Classifying Malware Using Fuzzy Hashing (Cont.)
• You might have a directory containing many
malware samples. In that case, it is possible
to run ssdeep on directories and
subdirectories containing malware samples
using the recursive mode (-r) as shown here:
• You can also match a suspect binary with a list of file hashes.
In the following example, the ssdeep hashes of all the
binaries were redirected to a text file (all_hashes.txt), and
then the suspect binary (blab.exe) is matched with all the
hashes in the file:
51
Classifying Malware Using Fuzzy Hashing (Cont.)
52
Classifying Malware Using Import Hash
• Import hash (or imphash) is a technique in which
hash values are calculated based on the
library/imported function (API) names and their
order within the executable.
• If the files were compiled from the same source and
in the same manner, those files would tend to have
the same imphash value.
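A minimal sketch of the idea behind the import hash, written against a plain list of (DLL, function) pairs so that it needs only the standard library. It follows the commonly described scheme (lowercase names, drop the .dll/.ocx/.sys extension, join as `library.function` with commas in import order, then MD5) but omits ordinal-to-name resolution, so its output may differ from pefile's `get_imphash()` for files that import by ordinal.

```python
import hashlib


def imphash(imports):
    """Compute an import hash from (dll_name, function_name) pairs.

    The order of the pairs matters: it should match the order in which
    the imports appear in the executable's import table.
    """
    parts = []
    for dll, function in imports:
        library = dll.lower()
        stem, dot, extension = library.rpartition(".")
        if dot and extension in ("dll", "ocx", "sys"):
            library = stem
        parts.append(f"{library}.{function.lower()}")
    return hashlib.md5(",".join(parts).encode("ascii")).hexdigest()
```

Two builds that import the same functions in the same order produce the same value even if their code differs, while merely reordering the imports changes it.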
53
Classifying Malware Using Import Hash (Cont.)
• In this output, the samples have different
cryptographic hash values (MD5), but the imphash of
these samples is identical; this indicates that they
were probably compiled from the same source and in
the same manner.
get_imphash.py
Files having the same imphash are not necessarily from the same threat group; you might
have to correlate information from various sources to classify your malware.
For example, it is possible that the malware samples were generated using a common builder kit.
54
Classifying Malware Using Section Hash
• Like import hashing, section hashing can also help in
identifying related samples.
When an executable is loaded in pestudio, it calculates the
MD5 of each section (.text, .data, .rdata, and so on).
55
Classifying Malware Using YARA
• YARA (Yet Another Recursive/Ridiculous Acronym) is a
tool aimed at helping malware researchers identify
and classify malware samples.
• With YARA you can create descriptions of malware
families based on textual or binary patterns.
• Each description (rule) consists of a set of strings and a
Boolean expression that determines its logic.
• YARA is multi-platform, running on Windows, Linux, and
Mac OS X, and can be used through its command-line
interface or from your own Python scripts with the
yara-python extension.
• A YARA rule consists of the following components:
  - Rule identifier
  - String definition
  - Condition section
http://virustotal.github.io/yara/
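A minimal sketch showing the three rule components and how a rule could be run from Python with the third-party yara-python extension. The rule name `upx_section_names` and its strings are illustrative assumptions, not a rule taken from the slides, and the helper function is hypothetical.

```python
# Rule identifier, string definitions, and condition section.
UPX_RULE = r'''
rule upx_section_names
{
    strings:
        $mz = "MZ"
        $upx0 = "UPX0"
        $upx1 = "UPX1"
    condition:
        $mz at 0 and $upx0 and $upx1
}
'''


def matching_rules(data, rule_source=UPX_RULE):
    """Return the names of rules in rule_source that match the bytes."""
    # Imported here so the rule text can be inspected without the
    # extension installed (pip install yara-python).
    import yara

    rules = yara.compile(source=rule_source)
    return [match.rule for match in rules.match(data=data)]
```

The condition requires the MZ marker at offset 0 and both UPX section names anywhere in the file, so a document that merely mentions UPX0 and UPX1 does not match.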
56
Applications of YARA
Example 1: Detects an executable file containing an
embedded Microsoft Office document in it
Example 2: Detecting Packers
57
Applications of YARA (Cont.)
• YARA can be used to detect patterns
in any file. This sample YARA rule
detects the communication of different
variants of the Gh0stRAT malware:
• Running the preceding rule on a directory containing
network packet captures (pcaps) detects the Gh0stRAT
pattern:
58