Web Application Mapping
INFR 4662U – Winter 2020
Garrett Hayes
License: Creative Commons
Excerpts and concepts taken from the Web Application Hacker’s Handbook 2nd Edition,
Stuttard & Pinto, Wiley Press
Enumerating Content
Enumeration Basics
§ Enumeration refers to identifying the set of
  resources and functionality that’s part of a
  web application
    § This includes pages, JS files, application
      logs, external resources, etc.
§ Basic enumeration can be done by simply
  visiting the web application and exploring
  how it works
§ Other automated and systematic approaches
  exist to suss out functionality occurring behind
  the scenes
What is Spidering?
§ Spidering refers to the use of automated
  tools that identify and recursively follow links
  in a web application to collect information
  about its structure
    § Content without direct links can be
      found using brute-force techniques that
      look for common/predictable content
      and page names
    § Effective spidering utilities will also parse
      JS and forms to identify backend
      functionality like APIs, WebSockets, etc.
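
The recursive link-following idea above can be sketched in a few lines of Python. This is only an illustration: to stay self-contained it crawls a hypothetical in-memory site (the SITE dictionary) instead of making HTTP requests, and the page names and the target.example host are assumptions, not part of the course material.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "site" standing in for real HTTP responses.
SITE = {
    "/": '<a href="/about.html">About</a> <a href="/products/">Products</a>',
    "/about.html": '<a href="/">Home</a>',
    "/products/": '<a href="item1.html">Item</a> <a href="https://other.example/">Ext</a>',
    "/products/item1.html": "<p>No links here</p>",
}


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def spider(start="/", base="http://target.example"):
    """Follow same-host links breadth-first; return the visited paths, sorted."""
    seen, queue = set(), [start]
    while queue:
        path = queue.pop(0)
        if path in seen or path not in SITE:
            continue
        seen.add(path)
        parser = LinkParser()
        parser.feed(SITE[path])
        for href in parser.links:
            absolute = urljoin(base + path, href)  # resolve relative links
            parsed = urlparse(absolute)
            if parsed.netloc == urlparse(base).netloc:  # stay in scope
                queue.append(parsed.path)
    return sorted(seen)
```

A real spider would replace the SITE lookup with an HTTP request and would also need the JS and form parsing mentioned above, which this sketch does not attempt.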
Automated Spidering
§ Automated spidering tools can miss whole
  areas of an application due to:
    § JS being used to render links and drop-
      down menus not visible to the utility
    § Form submission endpoints not being
      seen due to failed automatic form filling
    § AJAX-rendered pages not showing until
      an action is completed by a user
      (e.g. logging in)
Automated Spidering
§ Automated spidering tools can miss whole
  areas of an application due to:
    § Random values in the URL (e.g. expiry
      times) may cause the tool to treat every
      URL as new and spider forever
    § Some content may only be accessible to
      authenticated users
    § Embedded objects like Java applets are
      difficult to spider and may contain links
      or consume other backend assets
Enumeration: robots.txt
§ In some cases, a webmaster may not want
  automated crawlers (like Googlebot)
  to crawl or index specific pages
    § To avoid this, administrators create a
      robots.txt file in the web root that
      lists the paths crawlers are asked
      not to visit
    § This file often contains sensitive
      endpoints and directories not intended
      to show up on Google, which are very
      interesting to an attacker
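
As a rough sketch of how the Disallow entries might be pulled out of a robots.txt file for review, the Python below parses the file text directly. The SAMPLE contents are made up for illustration; fetching the real file from a target is left out.

```python
def disallowed_paths(robots_txt):
    """Return the paths listed in Disallow directives, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt contents, not taken from a real site.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/   # nightly dumps
Disallow:
Allow: /public/
"""
```

Each returned path is a candidate starting point for manual browsing or brute-forcing.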
Manual/Directed Spidering
§ Since a variety of situations cause automated
  spidering tools to fail, some pentesters will
  manually explore a web application while
  using an intercepting proxy to automatically
  build a map of the site
    § For example, one might use BurpSuite to
      automatically index all pages and
      resources found while browsing
    § Two common intercepting proxies are
      BurpSuite and WebScarab
Manual/Directed Spidering
§ Manual spidering is often superior to
  automated spidering for many reasons,
  including:
    § More effective identification and
      following of navigation controls
    § Avoiding application actions that can
      break a site (for example, calling a
      backup script or backend functionality)
    § Identifying pages & resources only
      available to logged-in users
Spidering Tool: BurpSuite
Enumerating Hidden Content
§ Some web application functionality is not made
  visible to users through links or buttons
§ Examples:
    § A form submission triggers a backend call to
      another PHP file
    § A script called backup.php zips up the
      contents of a web application
    § An automation script called test.php adds a
      demo user to a web app
    § Some web app functionality may not be
      visible to all users
Enumerating Hidden Content
§ Common hidden content I’ve seen in many
  pentests includes:
    § Backup files or code files with extensions
      like index.php.bak
    § Old versions of files/code that can still be
      called (e.g. home2.php may imply
      home1.php exists)
    § Exposed configuration files
    § Hidden directories used for testing/backups
      that have directory indexing enabled
    § Exposed log files
Brute-Force Enumeration
§ To identify backend content not directly
  visible to users, automated brute-forcing
  utilities are essential
    § I recommend gobuster, but there is also
      a GUI alternative called DirBuster that ships
      with Kali
§ Brute-force utilities require three inputs:
    1. A good wordlist containing common
       directory and file names
    2. One or more known file extensions likely to
       be used by the web app (e.g. .php)
    3. A starting point ( / , for example )
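
To make the three inputs concrete, here is a minimal Python sketch of how a brute-force tool might combine them into candidate URLs. It is not how gobuster is implemented; the function name and the target.example host are assumptions for illustration, and no requests are sent.

```python
def candidates(base_url, words, extensions):
    """Yield one directory guess and one file guess per extension for each word."""
    if not base_url.endswith("/"):
        base_url += "/"
    for word in words:
        yield base_url + word + "/"      # directory guess, with trailing slash
        for ext in extensions:
            yield base_url + word + ext  # file guess, e.g. admin.php
```

For example, `list(candidates("http://target.example", ["admin"], [".php"]))` gives the directory guess followed by the `.php` guess; each URL would then be requested and filtered by status code.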
                      GoBuster Brute-Force Attempt
ubuntu@security:~$ ./go/bin/gobuster dir -w ~/Wordlists/common.txt -s 200 -u http://xxxxxxxx.com
===============================================================
Gobuster v3.0.1
by OJ Reeves (@TheColonial) & Christian Mehlmauer (@_FireFart_)
===============================================================
[+] Url:          http://xxxxxxxx.com
[+] Threads:      10
[+] Wordlist:     /home/ubuntu/Wordlists/common.txt
[+] Status codes: 200
[+] User Agent:   gobuster/3.0.1
[+] Add Slash:    true
[+] Timeout:      10s
===============================================================
2020/01/02 19:44:08 Starting gobuster
===============================================================
/backup/ (Status: 200)
/css/ (Status: 200)
/fonts/ (Status: 200)
/highslide/ (Status: 200)
/icons/ (Status: 200)
/images/ (Status: 200)
/js/ (Status: 200)
Brute-Force Results
Brute-Force Enumeration
§ When brute-forcing an application, each request will
  return a status code
§ Some common “gotchas” for status codes include:
     § 302 often means a resource exists but you must be
       logged in to access it
     § 401 & 403 mean the resource exists but you are not
       authorized to access it (401: authentication is
       required; 403: access is forbidden)
     § A 200 code for a page that would never exist (e.g.
       /dassdsdads.php) indicates the server answers every
       path with the same page, often via a rewrite or redirect
     § A 400 code may indicate you’re using an incorrect
       extension or incorrectly formatted RESTful URL
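
One way to apply these rules of thumb is to first request a path that cannot exist and treat its status code as a baseline. The Python sketch below assumes that approach; the function name, the baseline idea, and the hint strings are illustrative choices, not part of any tool.

```python
def triage(status, baseline_status=404):
    """Map a status code to a rough hint.

    baseline_status is the code the server returned for a random,
    nonexistent path (e.g. /dassdsdads.php).
    """
    if status == baseline_status:
        return "same as nonexistent-path baseline: ignore"
    if status == 200:
        return "found"
    if status in (301, 302, 303, 307, 308):
        return "redirect: may exist, check Location (login page?)"
    if status == 401:
        return "exists: authentication required"
    if status == 403:
        return "exists: access forbidden"
    if status == 400:
        return "bad request: check extension or URL format"
    if status == 404:
        return "not found"
    return "other: inspect manually"
```

If the server answers 200 for everything, passing `baseline_status=200` suppresses those hits; comparing response lengths would then be needed to tell real pages apart.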
Brute-Force Wordlists
§ Most web applications use common page
  names and endpoint URLs, allowing us to
  generate effective wordlists by crawling the
  web
    § SecLists on GitHub has a lot of great
      wordlists, including RobotsDisallowed-
      Top1000.txt and common.txt
§ Don’t forget that a lot can vary in a web app.
  You may need to:
    § Use a trailing slash when brute-forcing
      directories
    § Add a specific file extension to requests
    § Filter out non-2xx/3xx status codes
Inferring Web Content
§ Considering the structured nature of web apps, it’s
  common to see predictable page names or RESTful
  resource URLs when exploring or spidering
     § For example:
        https://example.com/users/user/1
        may imply that the following pages exist:
        https://example.com/users/user/2
        https://example.com/users/
        https://example.com/admins/
        https://example.com/admins/user/1
        https://example.com/admins/admin/1
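
Part of this inference can be automated. The Python sketch below derives neighbouring numeric IDs and parent collections from one known URL; guesses such as /admins/ still need human judgement. The function name and the span parameter are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def infer_urls(url, span=2):
    """Return numeric-ID neighbours of the last segment, then parent paths."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    guesses = []
    # Neighbouring IDs: /users/user/1 -> /users/user/0, /users/user/2, ...
    if segments and segments[-1].isdigit():
        current = int(segments[-1])
        for candidate in range(max(current - span, 0), current + span + 1):
            if candidate != current:
                guesses.append("/" + "/".join(segments[:-1] + [str(candidate)]))
    # Parent collections: /users/user/ and /users/
    for depth in range(len(segments) - 1, 0, -1):
        guesses.append("/" + "/".join(segments[:depth]) + "/")
    return [urlunsplit((parts.scheme, parts.netloc, g, "", "")) for g in guesses]
```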
Inferring File Extensions
§ Although a web app may consistently use a single
  file extension, like .php for example, it’s possible
  that other file extensions exist and are used for
  backups, alternative versions of files, or older
  versions of files
§ It makes sense to use a good wordlist and append
  the following extensions when brute-forcing files:
     § .old          § .tar          § ~1
     § .bak          § .tar.gz       § .tmp
     § .backup       § .zip          § .temp
     § .sql          § .src
     § .txt          § .php5
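
A short Python sketch of how such extensions might be applied to a file already known to exist. It tries both appending the suffix (index.php.bak) and replacing the original extension (index.bak); the function name and the subset of suffixes chosen are assumptions for illustration.

```python
# A subset of the extensions listed above, used as the default.
BACKUP_SUFFIXES = [".old", ".bak", ".backup", ".tmp", ".temp", ".zip", ".tar.gz"]


def backup_variants(filename, suffixes=BACKUP_SUFFIXES):
    """Return names formed by appending to, and by replacing, the extension."""
    stem, dot, _ext = filename.rpartition(".")
    variants = []
    for suffix in suffixes:
        variants.append(filename + suffix)  # e.g. index.php.bak
        if dot:                             # only if the name had an extension
            variants.append(stem + suffix)  # e.g. index.bak
    return variants
```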
Server Misconfigurations
§ Even if a web application is built securely, it is
  possible that the underlying webserver is
  misconfigured and leaking sensitive
  information
§ Webservers can leak resources like:
     § Whole directory contents if directory
       indexing is enabled
     § Users on a system, especially if user
       directories are enabled
Directory Indexing Misconfiguration
              User Directories Misconfiguration
Google Dork: inurl:"/~john" intext:"index of"
Note: when user directories are enabled in Apache, users on the system that have
a public_html directory in their home path will automatically have that
directory made public at the location /~username
What might our next steps be to identify additional users on the system?
Hidden Parameters
§ Webmasters may use custom or hidden parameters
  in GET or POST requests to toggle the visibility or
  functionality of a web app
     § For example, the following URLs may result in a
       response with different content and lengths:
        https://example.com/index.php
        https://example.com/index.php?debug=1
§ A brute-force tool can be used to find hidden
  parameters using:
    § Common parameter names like test, debug,
       bypass, source, etc.
    § Common parameter values like 0, 1, true, false,
       null
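
A minimal Python sketch of the parameter brute-force idea: generate one URL per name/value pair, then flag responses whose length differs from the plain request. The function names are hypothetical, and the response lengths are supplied by the caller rather than fetched, so no traffic is generated here.

```python
from itertools import product
from urllib.parse import urlencode

NAMES = ["test", "debug", "bypass", "source"]
VALUES = ["0", "1", "true", "false", "null"]


def param_urls(url, names=NAMES, values=VALUES):
    """Yield url with each single name=value pair appended as a query string."""
    for name, value in product(names, values):
        yield f"{url}?{urlencode({name: value})}"


def interesting(baseline_length, lengths, tolerance=0):
    """Return the URLs whose response length differs from the baseline.

    lengths maps each tested URL to the length of its response body.
    """
    return [u for u, n in lengths.items() if abs(n - baseline_length) > tolerance]
```

A nonzero tolerance helps when pages embed timestamps or tokens that change the length slightly on every request.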
Discovering User Input
Analyzing User Input
§ In preparation for future exploitation attempts, it’s
  crucial to identify all user input fields and actions
  that can be submitted to the web application
§ User input may be present in:
     § URLs using standard GET request parameters
     § RESTful URLs between slashes
     § Cookies
     § HTTP headers
     § Out-of-band channels
User Input: URLs
§ Standard URLs that include GET parameters
  take user input or input that directs the
  functionality of the web application
§ Typical URL parameters look like:
   /search.php?searchTerm=data&results=10
§ Some abnormal URL parameter styles do
  exist, such as:
   /process/search;searchTerm=data
   /process/search?searchTerm=data$results=10
   /process/searchTerm=data/search
   /process/search?searchTerm=data:data2
User Input: RESTful URLs
§ RESTful URLs do not use standard GET parameters;
  rather, data is provided inline in the URL between
  slashes
§ Typical RESTful URL parameters look like:
   /search/data
§ Other alternative forms exist, such as:
   /search/searchTerm/data
   /search/searchTerm/data/
   /search/data/10
   /search/data/data2/10.json
§ In the last case, output data is requested in JSON
  format – it may also be possible to ask for .txt or .xml
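
When mapping RESTful input, it can help to treat every path segment as a potential parameter. The Python sketch below splits a path into segments plus an optional format suffix, and lists alternative formats to try; both function names are hypothetical.

```python
import posixpath


def rest_segments(path):
    """Split a RESTful path into (segments, format suffix or None)."""
    stem, ext = posixpath.splitext(path.rstrip("/"))
    segments = [s for s in stem.split("/") if s]
    return segments, ext.lstrip(".") or None


def format_variants(path, formats=("json", "xml", "txt")):
    """Return the same resource path requested in each output format."""
    stem, _ext = posixpath.splitext(path.rstrip("/"))
    return [f"{stem}.{fmt}" for fmt in formats]
```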
User Input: Cookies
§ Cookies set by the web application may be used to
  identify a user or store data temporarily for a
  session
     § Cookie values may be looked up in a
       database or may be used to load specific
       resources
§ For example, a cookie can be used to rebuild a
  shopping cart:
   Cookie: cart=item676&cart=item888&discount=10
§ Or can be used to identify a user:
   Cookie: username=joe.blow&authenticated=1
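
Because cookies are sent by the client, each value is user input that can be changed before the request is made. A minimal Python sketch, assuming a simple `name=value; name=value` cookie string; the function names and the starting value of the authenticated flag are illustrative.

```python
def cookie_inputs(header_value):
    """Parse a Cookie header value into a {name: value} dictionary."""
    pairs = {}
    for chunk in header_value.split(";"):
        name, sep, value = chunk.strip().partition("=")
        if sep:
            pairs[name] = value
    return pairs


def tamper(header_value, name, new_value):
    """Return the cookie string with one value replaced (or added)."""
    pairs = cookie_inputs(header_value)
    pairs[name] = new_value
    return "; ".join(f"{k}={v}" for k, v in pairs.items())
```

For instance, `tamper("username=joe.blow; authenticated=0", "authenticated", "1")` produces the cookie a tester would replay to see whether the server trusts the flag.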
User Input: HTTP Headers
§ HTTP headers are automatically generated by client
  browsers, but may be used by a web application
  when directing functionality or enforcing access
  control mechanisms
     §   The Host header, for example, indicates to the
         webserver which site the request is destined for
     §   The User-Agent header indicates the kind of client
         accessing the site (e.g. Chrome vs. Googlebot)
     §   Access control headers may provide session strings or
         other client-identifying data that is passed to a backend
         database or system
     §   The X-Forwarded-For header used by load balancers
         can be manipulated to make requests look like they’re
         coming from another address, such as the webserver itself
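
Since nothing forces a client to send the headers a browser would, a tester can set them freely. The Python sketch below only builds the raw text of a request with caller-chosen headers; the function name and default header values are assumptions, and sending the request (e.g. through an intercepting proxy) is left out.

```python
def build_request(host, path="/", extra_headers=None):
    """Return a raw HTTP/1.1 GET request with caller-controlled headers."""
    headers = {"Host": host, "User-Agent": "Mozilla/5.0", "Connection": "close"}
    headers.update(extra_headers or {})  # overrides defaults, adds new headers
    lines = [f"GET {path} HTTP/1.1"] + [f"{k}: {v}" for k, v in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"
```

Passing `{"X-Forwarded-For": "127.0.0.1"}` or a Googlebot User-Agent string as `extra_headers` is a quick way to check whether the application changes behaviour based on those values.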
User Input: OOB
§ Out-of-band (OOB) functionality refers to any
  code, scripts, automation tools, or external
  services used to facilitate the operations of a
  web application
     § These include external resources such as web
       forms (Google Forms), SMTP services like
       Mailgun, fileservers, etc.
§ OOB resources can potentially be manipulated to
  modify input to a web application – especially if it’s
  an API
     § For example, web services may use a provider
       like Mailgun to automatically receive password
       reset requests via email
Server-Side Analysis
Technique: Banner Grabbing
§ Used to glean information about computer
  systems on a network and the services
  running on their open ports
§ Banner grabbing helps identify the version of
  software running on a remote host
§ Usually performed on: HTTP, FTP, and SMTP
§ Tools commonly used:
    § curl, telnet, Nmap, and Netcat
                  Banner Grabbing Example
Request:
curl -I https://ontariotechu.ca
Result:
HTTP/1.1 200 OK
Date: Mon, 13 Jan 2020 20:18:25 GMT
Server: Apache/2.4.18 (Ubuntu)
Strict-Transport-Security: max-age=2600000;
Vary: Host
Content-Type: text/html; charset=UTF-8
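
Once a banner has been captured, the product and version can be extracted for later vulnerability lookup. A minimal Python sketch that parses response text like the result above; the function names are hypothetical, and RESPONSE is an abbreviated copy of that result.

```python
import re


def parse_banner(raw_response):
    """Return (status_line, headers) from the text of an HTTP response head."""
    lines = raw_response.strip().splitlines()
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def server_product(server_header):
    """Split a Server value such as 'Apache/2.4.18 (Ubuntu)' into (name, version)."""
    match = re.match(r"([^/\s]+)/(\S+)", server_header)
    return match.groups() if match else None


RESPONSE = """HTTP/1.1 200 OK
Date: Mon, 13 Jan 2020 20:18:25 GMT
Server: Apache/2.4.18 (Ubuntu)
Content-Type: text/html; charset=UTF-8
"""
```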
Analyzing File Extensions
§ File extensions are the simplest way to identify
  the underlying technology being used to render
  pages
    § Keep in mind that file extensions are
        arbitrary and may be modified or removed
        to evade dissection
§ Common extensions include:
   § .php & .php5 for PHP applications
   § .jsp for Java server pages
   § .pl for Perl CGIs or pages
   § .py for Python CGIs or pages
   § .dll for compiled CGIs or pages (C, C++, etc.)
   § .d2w for WebSphere
Analyzing Error Messages
§ The simplest way to determine the underlying
  framework or webserver being used is to trigger a
  fault in the system that causes an error page to show
     § For example, browsing to /sadklhadlkas will
       likely cause a 404, which may show the
       webserver version
§ Manipulating GET parameters may cause SQL or
  other application errors, ultimately leaking additional
  information
§ Examples:
        https://example.com?search='
        https://example.com/users?id=-1000
Analyzing Directory Names
§ Predictable and standard directory naming
  conventions may indicate specific technologies
  are being used
    § For example, Java servlets are often served
      at web paths like /server/name
§ A few other modern and common cases
  include:
    § /rails/ for Ruby on Rails applications
    § /pls/ for Oracle applications and SQL
      gateways
Analyzing Session Tokens
§ Certain session token names (present in cookies)
  may indicate specific web technologies are being
  used by the application:
    § Java uses JSESSIONID
    § PHP uses PHPSESSID
    § The IIS webserver uses ASPSESSIONID
         § Whereas ASP.NET uses
           ASP.NET_SessionId
    § Django uses a more generic name, sessionid
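
These naming conventions lend themselves to a simple lookup. The Python sketch below matches observed cookie names against a small table; the table is an illustrative assumption built from the defaults named above, it uses prefix matching because classic ASP appends a random suffix to ASPSESSIONID, and a match is only a hint since cookie names can be changed.

```python
# Default session cookie names and the technology each one hints at.
SESSION_HINTS = {
    "JSESSIONID": "Java",
    "PHPSESSID": "PHP",
    "ASPSESSIONID": "Classic ASP on IIS",
    "ASP.NET_SessionId": "ASP.NET",
    "sessionid": "generic name (Django's default)",
}


def fingerprint(cookie_names):
    """Return {cookie name: technology hint} for recognised session cookies."""
    hits = {}
    for cookie in cookie_names:
        for prefix, tech in SESSION_HINTS.items():
            if cookie.lower().startswith(prefix.lower()):
                hits[cookie] = tech
    return hits
```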
                                                      Analysis Example #1
               https://wahh-app.com/calendar.jsp?name=new%20applicants
               &isExpired=0&startDate=22%2F09%2F2010
               &endDate=22%2F03%2F2011&OrderBy=name
Example taken from the Web Application Hacker’s Handbook 2nd Edition
Stuttard & Pinto, Wiley Press
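
A first step in analysing a URL like the one in Example #1 is to decode it into its entry points. The Python sketch below does this with the standard library; the function name is hypothetical, and interpreting what each parameter does is still manual work.

```python
from urllib.parse import parse_qsl, urlsplit

URL = ("https://wahh-app.com/calendar.jsp?name=new%20applicants"
       "&isExpired=0&startDate=22%2F09%2F2010"
       "&endDate=22%2F03%2F2011&OrderBy=name")


def entry_points(url):
    """Return the path and the URL-decoded query parameters."""
    parts = urlsplit(url)
    return parts.path, dict(parse_qsl(parts.query))
```

Decoding shows the .jsp extension (suggesting Java), date values in DD/MM/YYYY form, a client-supplied isExpired flag, and an OrderBy value, each of which is worth testing.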
                                                      Analysis Example #2
               https://wahh-app.com/workbench.aspx?template=NewBranch.tpl
               &loc=/default&ver=2.31&edit=false
                                                      Analysis Example #3
               POST /feedback.php HTTP/1.1
               Host: wahh-app.com
               Content-Length: 389
               from=user@wahh-mail.com&to=helpdesk@wahh-app.com&subject=
               Problem+logging+in&message=Please+help...
Let’s break!
See You Next Time