splunk fundamentals
chap1 - what is machine data
buttercup games - international company with tons of machine data from web servers, sales
servers, badge readers, security appliances, voicemail
-splunk takes a bunch of data and adds structure to unstructured data
-not just app issues: security, user behavior, sales, hardware monitoring
-translates a huge amount of info into something usable
-machine data makes up more than 90% of data accumulated by organizations
chap2 - what is splunk?
index data, search & investigate, add knowledge, monitor and alert, report &
analyze
-indexer: like a factory, takes raw materials (data), determines how to process it, labels
it with a sourcetype
events are stored in the index
-search: find values across multiple data sources, run statistics using splunk search language
-knowledge objects: classify, enrich, normalize
-monitor: can identify issues before impact, create alerts for specific conditions,
automatically respond
-reports: dashboards, visualizations
indexers, search heads, forwarders
indexers: store incoming data in indexes as events, organized so a search only needs to open
the relevant data, keeping searches efficient
search head: uses the splunk search language, handles search requests from users and
distributes them to the indexers, then consolidates/enriches the results
-also serves dashboards, reports, visualizations
forwarder: splunk enterprise instances that consume data and forward it to the indexers
-minimal resources, usually resides on the machine generating the data (such as a web server)
deploying/scaling > single instance or full distributed
-single: input/parsing/indexing/searching
good for proof of concept, personal use, learning, small environments
-multiple search heads and indexers, which can be clustered so data is always available and
searchable
search requests are processed by the indexers
search strings are sent from the search head
clustering is NOT part of a single instance deployment
chap3 - installing splunk single instance deployment
linux: get software from splunk.com > free splunk > free download > linux >
download deb/tgz/rpm
-can also use wget
-should not be done as root user
-sudo tar xvzf splunk-<version>.tgz -C /opt (untars it into the /opt directory)
-cd /opt/splunk/bin
./splunk start | stop | restart | help
./splunk start --accept-license
windows: gui or cmd > double click, accept license > customize|install
-change install location > local system|domain account
OSX: dev/testing usually > free splunk > Mac OS (tgz/dmg disc image)
-cd /Applications/Splunk/bin, then sudo ./splunk start
splunk cloud: hosted by splunk, removes infrastructure requirements, 5GB per day for
15 days
-30 day free trial, view my instance, accept license
splunk apps and roles > default login: admin / changeme
-app: preconfigured environment, built to solve specific use cases, defined by user
w/ admin role
-roles: determine what user sees/interacts with
-admin, power, user
admin: can install apps, create knowledge objects
power: can create and share knowledge objects, do searches
user: only sees their own KOs, and those shared with them
two apps ship by default: Home app (to launch/manage splunk apps; admins can add apps)
Search & Reporting: where searching and reporting is done; splunkbase contains hundreds more apps
launch and manage apps from home app: true
default username/pw for newly installed splunk: admin/changeme
Roles define what users can do
Home/Search & Reporting are what ship w/ splunk enterprise
chap 4 - getting data in, types of data input
admin: admin users are typically the ones who get data in; regular users usually don't, but
it's good to know how
-add data: upload (gets indexed once), monitor (files/directories, scripts,
windows-specific data such as event logs), forward (receive data from external forwarders
installed on remote machines, which send it on to the indexers)
-forwarded data is used as the main source of data input
upload: not useful in production, but useful for dev + testing
-customer survey data from a focus group? upload > next
-sourcetype used for categorizing data (sourcetype csv detected, can select
manually)
-save as sourcetype, or change name + add description, category, app context
(instrumentation/monitoring/search&reporting, which app is it applied to)
-system is a context that applies across all apps
-host field: machine from where it originates
-indexes: directories where data is stored
web data index, main index, security index
-breaking data into separate indexes returns only relevant events and allows limiting access
by user role (who sees what data)
-some data requires different retention policies, which can be set per index
monitor: files or ports on an indexer, similar to upload but choose source to
monitor
-files/directories/http events/tcp+udp/scripts
-apache log = files and directories
-continuously monitor or index once (see events as they happen or see snapshot)
-can whitelist/blacklist certain files in the directory
-choose hostname, system, app context
forwarder: not in scope, but minimal resources installed on many host machines
quiz chap 4:
splunk uses sourcetypes to categorize data
uploaded files get indexed ONCE
in production, FORWARDER data is the source of data input
chap 5 - basic searching, create knowledge objects, reports, dashboards, etc
-7 unique items; splunk bar: switch between apps, edit account, settings, monitor
search jobs, help
-app bar: navigate the app
-search bar: to run searches
-time range picker: specific events over period
-how to search panel: docs/tutorial
-what to search: summary of what is indexed (data summary, host/source/sourcetype)
-source: file or directory path, port, script
-host: hostname (ip or fqdn)
search history: view/re-run past searches
limiting by time is best practice; a search becomes a job, and the results page contains
Save As options, search results, and a timeline
-searches can be saved as knowledge objects
patterns tab: are there patterns in the events?
statistics/visualizations tabs: if the search doesn't generate statistics, use pivot, quick
reports, or search commands to populate them
-commands that generate statistics/visualizations are called TRANSFORMING commands
job controls: pause, stop, share, export, print; jobs remain active for 10 minutes
-shared search jobs last 7 days, readable by everyone they are shared with
-export: in Raw, csv, xml, json
3 search modes: fast/smart/verbose
-fast: cuts down on field info, field discovery is disabled, returns only fields required for
the search
-verbose: as much field/event as possible
-smart: toggles behavior based on search
timeline: click+drag = select a time range; zooming in reuses the original search job
-zooming out requires the job to be re-run
events are returned in reverse chronological order, time is normalized (timestamps show the
timezone of the user account)
-host/source/sourcetype are the default selected fields
-mousing over a field value lets you add to search, exclude from search, or start a new search
wildcards: fail* (matches failed, failure, failures, etc.)
booleans: AND OR NOT - failed NOT password, all events with failed but NOT password
-order of operation: NOT > OR > AND, parenthesis can be used
-failed NOT (success OR password)
-a backslash escape is used if you need to ACTUALLY search for quote characters
failed \"chrisv4\" (would find "chrisv4")
opening a search from job save does NOT re-execute it
chap 5 quiz:
search is sent to splunk, it becomes a 'search job'
NOT, then OR, then AND (AND is implied and therefore others take precedence)
events are returned in REVERSE chronological order (newest first)
shared search jobs remain active for SEVEN days by default (SHARED search jobs)
chap 6: using fields > fields sidebar, fields extracted at search time
-selected fields, interesting fields
-selected: host/source/sourcetype
-interesting: values in at LEAST 20% of events
a denotes a string field, # denotes a numeric field
-shows values/count/%, can add to selected fields; quick reports vary by value
-clicking a quick report runs a TRANSFORMING search that returns statistical data
-selected fields persist for subsequent searches
-can use 'all fields', 'more searches'
more effective searches with fields
sourcetype=linux_secure
host!=mail*
chap 6 quiz:
1301 events
if you re-run a search from search history, the default time range of 24 hours is applied,
not the time range of the original search
-nor is the search executed automatically
wilcards CAN be used with field searches
a dest 4 = dest is a string field ('a') containing 4 values
field values are NOT case sensitive
field names ARE case sensitive
so basically: the field values in the results are not case sensitive, but the field names
used in the search are
chap 8: SPL fundamentals
search language, built of 5 components
-search terms, commands (what to do with the results, e.g. create charts or stats), functions
(how to chart/evaluate the results), arguments (variables applied to the functions), clauses
(how results are grouped or defined)
stats list(product_name) as "Games Sold"
| the pipe passes results into the next component
booleans show in orange, commands are in blue, arguments are green, functions are
purple
-parentheses highlight their matching pair, which helps troubleshoot searches
-ctrl key + \ = moves each pipe to a new line
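a quick sketch tying the 5 components together (the base search terms here are assumed for
illustration; the stats portion is the example above):
sourcetype=access_combined status=200 | stats list(product_name) as "Games Sold"
-search terms: sourcetype=access_combined status=200 / command: stats / function: list() /
argument: product_name / clause: as "Games Sold"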
search results can be thought of as a 'spreadsheet' held in memory with a time limit; each
piped command makes that spreadsheet smaller
-once data is removed by a command, it is no longer available to subsequent commands
Fields command: include/exclude specific fields
sourcetype=access_combined | fields status clientip
sourcetype=access_combined | fields - status clientip (excludes these fields instead)
_raw and _time are always included by default
-adding - _raw to the fields command removes it as well
exclusion occurs AFTER field extraction, so it does not improve performance (inclusion happens
before extraction and can)
Table command: specified fields are contained in tabulated format
| table JSESSIONID, product_name, price = creates a spreadsheet in the order you
specified
Rename command: rename fields, JSESSIONID for example
- | rename JSESSIONID as "User Session", product_name as "Name of Product"
-new field names would be used further down the pipeline
for example, | fields - JSESSIONID would no longer function because it is now "User
Session"
-new names containing spaces must be wrapped in quotes
Dedup - remove duplicate events
sourcetype=history* Address_Description="San Francisco" | dedup First_Name Last_Name
| table Username
Sort - display in ascending or descending order
| sort Vendor product_name (sorts by vendor then product name)
sort + sale_price for ASCENDING
sort - sale_price Vendor in DESCENDING order (shows largest value first)
a space after the +/- makes it affect ALL listed fields; with no space, it only affects the
field it is attached to
sort -sale_price +Vendor
sort -sale_price limit=20 (only shows 20 events)
chap 8 quiz:
excluding fields does NOT benefit performance, because they must be extracted and then
excluded
for table User IP, quotation marks are missing -> must be table "User IP"
| fields - status is the way to remove the status field, not NOT status..
status is a field name, not a search term
chap 9 - transforming commands - order results into a data table for statistics and
visualizations
-top: most common values in a result field set (sourcetype=vendor_sales | top
Vendor) > returns count + % columns; limit=0 returns all values
--countfield/percentfield (change column name)
-sourcetype=vendor_sales | top Vendor limit=5 showperc=False countfield="Number of
Sales"
useother=true (groups results not included in the limit into an OTHER row)
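a quick hypothetical combining the top options above (same vendor_sales sourcetype used in
these notes):
sourcetype=vendor_sales | top Vendor limit=5 countfield="Number of Sales" useother=true
-returns the 5 most common vendors plus an OTHER row grouping everything outside the limit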
rare command, shows least common values
-sourcetype=vendor_sales | rare Vendor limit=3
stats command: count, distinct count, sum, average, list
-count: # of events
-dc (distinct count): unique values for a field
-sum: sum of numerical values
-avg: average of numerical values
-list: all values of a given field
sourcetype=vendor_sales | stats count as "Total Sells By Vendors" by product_name
| stats count(action) AS "Action Events"
sourcetype=vendor_sales | stats distinct_count(product_name) as "Number of games
for sale" by sale_price
sourcetype=vendor_sales | stats sum(price) as "Gross Sales" by product_name
| stats count as "Units Sold" sum(price) as "Gross Sales"
sourcetype=vendor_sales | stats avg(sale_price) as "Avg Selling Price" (missing or
invalid values will not be counted)
sourcetype=asset_list | stats list(Asset) as "company assets" by Employee
--this would group all Assets owned per employee
--the stats values function works like list, but returns only unique values
sourcetype=cisco_wsa_squid | stats values(s_hostname) by cs_username
-shows all unique sites each user has visited
chap 9 quiz:
sourcetype=vendor* | stats count AS "Units Sold" (the AS keyword renames the count column
to "Units Sold")
most common values = top
avg = average
Addtotals = NOT a stats function
top/rare have TEN results by default
chap 10: reports & dashboards - searches can be saved/shared as reports: save as > Report,
give it a title
-yes/no on time range picker
-report shows a 'fresh' set of results, can change range if yes
-reports tab of application menu > open in search
-edit menu: description/permission/schedule/clone/embed/delete
-e.g. the power role can be granted read or write access
-run as: owner or user (user = the report only shows data that user has access to)
-acceleration: pre-summarizes data so large reports run faster
-save as: visualization, text, or both
visualizations: statistical values can be viewed as a chart
-e.g. click an ip field > top values > view as a pie chart
-visualizations can be based on numbers, time, location
-map visualizations are interactive: drill-down, can save as report/dashboard panel
dashboards: collection of reports
count of products sold by product name
-select a visualization that fits data, customize w/ format/chart
-save as dashboard panel > new > column chart in visualization
vendor sales by product over 7 days, save as dashboard panel to the SALES dashboard
previously created
-keep making visualizations and adding them to the same dashboard
edit: can click and drag panels using edit bars
-can change visualization/format/drilldown
-add panels in edit mode, new, clone
add a time range picker input, then tie each panel to it; changing the picker will update
all panels tied to it
-existing dashboards can be accessed from the dashboards menu
chap 10 quiz:
admin/user/power can ALL create reports
dashboards: a collection of reports
time range picker can be included in a report: true
charts can be: numbers, time, location
if search returns statistical values, can be viewed as a chart
chap 11: pivots and datasets > pivot allows users to design reports w/out searches
-data models: KOs that drive pivots, created by admins/power users
-basically an easy way to modify reports
-settings > data models > pivot
-pivot starts with a count of all events over all time, with tools to filter and visualize
-can create filters based on field, can use IS/ISNOT/CONTAINS
-can use sidebar to visualize, can save to add to report/dashboard
-no data model? instant pivot
instant pivot: all fields, selected fields, or fields with a selected coverage%
datasets: give users access to small, focused sets of data
-field names are column headers; data can be summarized
-explore: visualize w/ pivot
with instant pivot, no existing data model is needed; one is generated from the search
-splunkbase > datasets addon for rapid building of data models
chap 11 quiz:
pivots can be saved as reports/dashboards
data models are KOs that provide data structure for a pivot
data models are made up of datasets
instant pivot is displayed when using a NON-TRANSFORMING search (basically helps
you get there)
chap 12: lookups, adding fields/values not included in the index
-csv/scripts/geospatial data
-product id with names for example, categorized as a dataset
-define a LOOKUP TABLE, then define lookup, can be configured automatically
sourcetype=access_combined status=xxx
csv with code,description
100,Continue
200,OK
300,etc
create a lookup table > settings > lookups > add new > choose a dest app (only
avail to that app) > find file > give name for file on server
-can move to another app, delete
-verify it is working using | inputlookup filename.csv
now define lookup > settings > lookups > add new > dest app > name > file uploaded
to server
-options: time-based lookup (if the lookup involves time), case-sensitive match
-batch index query improves performance for large lookup files
now that the lookup table is uploaded and the lookup is defined:
- | lookup http_status code AS status (matches the lookup's code field against the event
field status)
-OUTPUT can be used to choose which lookup fields get added:
OUTPUT code as "HTTP Code", description as "HTTP Description"
if the output field already exists in the events, it will be overwritten unless you use
OUTPUTNEW
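a hypothetical end-to-end version of the above (lookup name and fields follow the http_status
example in these notes):
sourcetype=access_combined NOT status=200
| lookup http_status code AS status OUTPUT description AS "HTTP Description"
-each non-200 event gets a human-readable "HTTP Description" value pulled from the csv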
automatic lookup: settings > lookups > automatic lookups > dest app > choose name >
choose lookup > sourcetype
-lookup input field: code = status
-lookup output fields: code = Code, description = Description
-now searches can automatically use those values rather than having to use OUTPUT
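once the automatic lookup is defined, the output fields behave like any other field (a small
sketch, assuming the Description output name configured above and the sample csv values):
sourcetype=access_combined Description="OK"
-no | lookup or OUTPUT is needed in the search itself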
additional lookups: populate lookup w/ search results, external script, geospatial
chap 12 quiz on lookups:
a lookup is characterized as a DATASET
first row in csv lookups represents FIELD names
inputlookup http_status.csv to check that a lookup was added
external data used by a lookup can come from: geospatial data, csv files, scripts
outputnew is used to create new fields rather than overwrite existing
chap 13: scheduled reports + alerts
-e.g. a report that runs on a weekly schedule, sending results via email
create a search > save as report > time range picker > schedule > enable >
frequency
-time range is relative to the schedule
-schedule priority: default > higher > highest
-schedule window: the report may be delayed as long as it still runs within the window; use
only if you're okay w/ a delay
-send email/run script/write to csv lookup
manage scheduled reports > settings > searches+reports+alerts
-description, search, and time range
-disable, clone, embed, delete
-edit permissions: who sees what results
search and reporting options: owner/app/all apps, can also set read/write for the
report
-run as: owner or user, access of user
-a report must be SCHEDULED before it can be embedded
splunk alerts: triggered once a scheduled search completes and its criteria are met
-actions: list in interface, log events, send emails, run scripts, webhooks, custom alert
actions
-status=5* > save as alert > web500error > private (only you)
-scheduled (how often search is run)/realtime (search runs continuously, lots of
overhead)
-trigger condition: per result, # of results, # of hosts, # of sources, or custom
conditions
-trigger # of results > 1 in 60 minutes
-once (once within scheduled time) or for each result (every time condition is met)
-throttle: can suppress further triggering for a set period
log event: sent to an indexer; run a script: shell or bash; send email: very powerful;
webhook (e.g. create a ticket via a POST to an API)
activity: triggered alerts > view results/edit search
-also alerts menu or settings: searches reports alerts
chap13 quiz:
alerts can run scripts
alerts CAN be shared to all apps
realtime alerts run continuously in background
alert actions are triggered by SAVED SEARCHES
final quiz:
machine data makes up more than 90% of data accumulated by organizations
forwarders are primary way data is supplied for indexing
search requests are processed by the indexers
search strings are sent from the SEARCH HEAD
3 roles: power/user/admin
true: you can launch and manage apps from the home app
sourcetype: determines where to break events, how to handle timestamps, field value pairs
-sourcetypes are used to categorize the type of data being indexed
events are not always returned in chronological order
AND/NOT/OR = booleans
shared search jobs remain active for 7 days
field values: NOT case sensitive
wildcards CAN be used with field searches
@ is used to round (snap) time down, e.g. -2h@h snaps to the hour (see the example after
this list)
separate indexes: faster searches, multiple retention policies, limit access
exclusion does NOT increase performance
dedup removes results with duplicate field values
Addtotals is NOT a STATS function
a time range picker CAN be included in a report
power/admin can create data models (used for datasets in pivots, they are basically
KOs)
pivots CAN be saved as dashboard panels
a lookup is categorized as a dataset
outputnew is used to not overwrite existing fields in a lookup
alerts can be shared to all apps
admin/changeme is default username/password for a newly installed splunk instance
default apps: home app/search & reporting
forwarder is used as source of data in production envs
time stamp you see in events is based on time zone in your USER account
zooming in does not run a new search on the event timeline
transforming = create statistics and visualizations
field NAMES are CASE SENSITIVE
top or rare = TEN results by default
users CAN create reports (like archer!)
charts: numbers/time/location
| inputlookup is used to view the contents of a csv lookup file
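a quick illustration of the @ snap mentioned above (earliest/latest are standard SPL time
modifiers; the sourcetype is just the sample web data from these notes):
sourcetype=access_combined earliest=-7d@d latest=@d
-searches the previous 7 complete days, with both boundaries rounded down (snapped) to midnight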