Network Monitoring
1. Introduction
Network monitoring is the use of logging and analysis tools to accurately
determine traffic flows, utilisation, and other performance indicators on a
network. Good monitoring tools give you both hard numbers and
graphical aggregate representations of the state of the network. This helps
you to visualise precisely what is happening, so you know where
adjustments may be needed.
These tools can help you answer critical questions about how your network is used and how well it is performing.
For this edition, we have left out any tool that we found was no longer
being actively maintained since the previous edition. The tools discussed in
this section were all under active development at the time of writing, but it
is left to the reader to determine whether a particular tool is suitable for
their situation.
With frequent complaints and very low computer usage, the Board is
questioning the need for so much network hardware.
The Board also wants evidence that the bandwidth they are paying for is
actually being used.
As the network administrator, you are on the receiving end of these
complaints.
2. Network monitoring example 3
How can you diagnose the sudden drop in network and computer
performance and also justify the network hardware and bandwidth costs?
Assume that all of the switches support the Simple Network Management
Protocol (SNMP). SNMP is an application-layer protocol designed to
facilitate the exchange of management information between network
devices.
By assigning an IP address to each switch, you are able to monitor all the
interfaces on that switch, observing the entire network from a single point.
This is much easier than enabling SNMP on all computers in a network.
Internet bandwidth costs are justified by showing actual usage, and whether
that usage agrees with your ISP's bandwidth charges.
Future capacity needs are estimated by watching usage trends and
predicting likely growth patterns. Intruders from the Internet are detected
and filtered before they can cause problems.
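One simple way to estimate future capacity needs is to fit a straight line to historical averages and see when the line crosses your link capacity. The sketch below is a minimal illustration under stated assumptions. It assumes you already have monthly average utilisation figures in kbps, exported from a tool such as MRTG or Cacti. The function names and the sample numbers are hypothetical, and real growth is often not linear, so treat the result as a rough planning figure only.

```python
from typing import Optional, Sequence


def fit_line(ys: Sequence[float]) -> tuple[float, float]:
    """Least-squares fit y = slope * x + intercept, with x = 0, 1, 2, ... (one point per month)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_full(monthly_avg_kbps: Sequence[float], capacity_kbps: float) -> Optional[float]:
    """Months after the last sample until the fitted trend reaches capacity (None if not growing)."""
    slope, intercept = fit_line(monthly_avg_kbps)
    if slope <= 0:
        return None
    last_x = len(monthly_avg_kbps) - 1
    x_full = (capacity_kbps - intercept) / slope
    return max(0.0, x_full - last_x)


# Hypothetical six months of average inbound usage on a 1024 kbps link.
usage = [400, 450, 500, 550, 600, 650]
print(f"{months_until_full(usage, 1024):.1f} months of headroom left")
```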
Monitoring this traffic is easily done using MRTG with an SNMP-enabled
device, such as a router. If your router does not support SNMP,
then you can add a switch between your router and your ISP connection,
and monitor the port traffic just as you would with an internal LAN.
This is also an excellent time to review your operational policy with the
Board, and discuss ways to bring actual usage in line with that policy.
Later in the week, you receive an emergency phone call in the evening.
Apparently, no one in the lab can browse the web or send email. You rush
to the lab and hastily reboot the proxy server, with no results. Browsing and
email are still broken. You then reboot the router, but there is still no
success. You continue eliminating the possible fault areas one by one until
you realise that the network switch is off: a loose power cable is to blame.
After applying power, the network comes to life again.
How can you troubleshoot an outage like this without such time-consuming
trial and error? Is it possible to be notified of outages as they occur, rather
than waiting for a user to complain?
With good monitoring tools in place, you will be able to justify the cost of
equipment and bandwidth by effectively demonstrating how it is being used
by the organisation.
You are notified automatically when problems arise, and you have historical
statistics of how the network devices are performing. You can check the
current performance against this history to find unusual behaviour, and
head off problems before they become critical. When problems do come up,
it is simple to determine the source and nature of the problem. Your job is
easier, the Board is satisfied, and your users are much happier.
The resources a monitoring server needs depend largely on exactly what you
want to monitor. If you are attempting to account for all services accessed
per MAC address, this will consume considerably more resources than
simply measuring network flows on a switch port. But for the majority of
installations, a single dedicated monitoring machine is usually enough.
Consolidating monitoring services on a single machine streamlines
administration and upgrades, and keeping that machine dedicated to
monitoring also ensures better ongoing monitoring. For example, if you install
monitoring services on a web server, and that web server develops problems,
then your network may not be monitored until the problem is resolved. To
a network administrator, the data collected about network performance is
nearly as important as the network itself. Your monitoring should be as
robust, and as well protected from service outages, as possible. Without network
statistics, you are effectively blind to problems with the network.
Figure NM 2: Polling the edge router can show you the overall network
utilisation, but you cannot break the data down further into machines,
services, and users.
For more detail, the dedicated monitoring server must have access to
everything that needs to be watched.
Typically, this means it must have access to the entire network.
To monitor a WAN connection, such as the Internet link to your ISP, the
monitoring server must be able to see the traffic passing through the edge
router. To monitor a LAN, the monitoring server is typically connected to
a monitor port on the switch. If multiple switches are used in an
installation, the monitoring server may need a connection to all of them.
That connection can either be a physical cable or, if your network
switches support it, a VLAN specifically configured for monitoring traffic.
Figure NM 3: Use the monitor port on your switch to observe traffic crossing
all of the network ports.
Figure NM 5: If your switch does not provide monitor port functionality, you
can insert a network hub between your Internet router and the LAN, and
connect the monitoring server to the hub.
Once your monitoring server is in place, you are ready to start collecting
data.
What to monitor
It is possible to plot just about any network event and watch its value on a
graph over time.
Since every network is slightly different, you will have to decide what
information is important in order to gauge the performance of your
network.
Here are some important indicators that many network administrators
will typically track.
12 16. NETWORK MONITORING
Wireless statistics
• Received signal and noise from all backbone nodes
• Number of associated stations
• Detected adjacent networks and channels
• Excessive retransmissions
• Radio data rate, if using automatic rate scaling
Switch statistics
• Bandwidth usage per switch port
• Bandwidth usage broken down by protocol
• Bandwidth usage broken down by MAC address
• Broadcasts as a percentage of total packets
• Packet loss and error rate
Internet statistics
• Internet bandwidth use by host and protocol
• Proxy server cache hits
• Top 100 sites accessed
• DNS requests
• Number of inbound emails / spam emails / email bounces
• Outbound email queue size
• Availability of critical services (web servers, email servers, etc.).
• Ping times and packet loss rates to your ISP
• Status of backups
System health statistics
• Memory usage
• Swap file usage
• Process count / zombie processes
• System load
• Uninterruptible Power Supply (UPS) voltage and load
• Temperature, fan speed, and system voltages
• Disk SMART status
• RAID array status
There are many freely available tools that will show you as much detail as
you like about what is happening on your network.
You should consider monitoring the availability of any resource where
unavailability would adversely affect your network users.
Don't forget to monitor the monitoring machine itself, for example its
CPU usage and disk space, in order to receive advance warning if it
becomes overloaded or faulty. A monitoring machine that is low on
resources can affect your ability to monitor the network effectively.
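A self-check of the monitoring machine can be very small. The following is a minimal sketch that assumes a Unix-like monitoring host, because os.getloadavg is not available on Windows. The thresholds (90% disk usage, and a load average above twice the CPU count) are arbitrary placeholders. In practice you would probably feed these values into the alerting tool you already use rather than print them.

```python
import os
import shutil


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_monitoring_host(path: str = "/", disk_limit: float = 90.0) -> list[str]:
    """Return human-readable problems; an empty list means all is well."""
    problems = []
    disk = disk_used_percent(path)
    if disk > disk_limit:
        problems.append(f"disk {path} is {disk:.1f}% full")
    # 1-minute load average (Unix only).
    load1, _load5, _load15 = os.getloadavg()
    cpus = os.cpu_count() or 1
    if load1 > 2 * cpus:
        problems.append(f"load average {load1:.2f} exceeds 2x CPU count ({cpus})")
    return problems


if __name__ == "__main__":
    for problem in check_monitoring_host():
        print("WARNING:", problem)
```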
2. Spot check tools are designed for troubleshooting and normally run
interactively for short periods of time. A program such as ping may be
considered an active spot check tool, since it generates traffic by polling
a particular machine.
3. Passive spot check tools include protocol analysers, which inspect every
packet on the network and provide complete detail about any network
conversation (including source and destination addresses, protocol
information, and even application data).
6. Network detection
The simplest wireless monitoring tools just provide a list of available
networks, along with basic information (such as signal strength and channel).
They let you quickly detect nearby networks and determine if they are in
range or are causing interference.
The built-in client
All modern operating systems provide built-in support for wireless
networking. This typically includes the ability to scan for available
networks, allowing the user to choose a network from a list. While
virtually all wireless devices include a simple scanning
utility, functionality varies widely between implementations. These tools
are typically only useful for configuring a computer in a home or office
setting.
They tend to provide little information apart from network names and the
signal strength of the access point currently in use.
Netstumbler
(http://www.wirelessdefence.org/Contents/NetstumblerMain.htm). This is
the most popular tool for detecting wireless networks using Microsoft
Windows. It supports a variety of wireless cards, and is very easy to use. It
will detect open and encrypted networks, but cannot detect “closed”
wireless networks. It features a signal/noise meter that plots radio
receiver data as a graph over time, and it integrates with a variety of GPS
devices for logging precise location and signal strength information.
This makes Netstumbler a handy tool to have for an informal site survey.
Macstumbler (http://www.macstumbler.com/). While not directly related to
Netstumbler, Macstumbler provides much of the same functionality
for the Mac OS X platform. It works with all Apple AirPort cards.
These tools will help you to determine just where a connection problem
exists.
ping
Just about every operating system (including Windows, Mac OS X, and of
course Linux and BSD) includes a version of the ping utility. It uses
ICMP packets to attempt to contact a specified host, and tells you how
long it takes to get a response.
$ ping yahoo.com
Try pinging an IP address on the Internet. If you can’t reach it, it’s a good
idea to see if you can ping your default router:
$ ping 69.90.235.230
If you can’t ping your default router, then chances are you won’t be able to
get to the Internet either. If you can’t even ping other IP addresses on your
local LAN, then it’s time to check your connection. If you’re using
Ethernet, is it plugged in? If you’re using wireless, are you connected to the
proper wireless network, and is it in range?
$ traceroute -n google.com
8  * * *
My traceroute [v0.69]
                       Packets             Pings
Host                 Loss%  Snt   Last   Avg  Best  Wrst  StDev
1. gremlin.rob.swn    0.0%    4    1.9   2.0   1.7   2.6    0.4
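The columns in an mtr report are simple statistics over the probes sent to each hop. The sketch below shows how one such row could be computed, assuming lost probes are recorded as None. The four sample RTTs are hypothetical, chosen so that the summary matches the row for hop 1 shown above. mtr's own rounding and standard-deviation formula may differ slightly from the population standard deviation used here.

```python
import statistics
from typing import Optional, Sequence


def hop_summary(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarise one hop: loss %, probes sent, last/avg/best/worst RTT and standard deviation."""
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    loss = 100.0 * (sent - len(replies)) / sent if sent else 0.0
    if not replies:
        return {"loss": loss, "snt": sent}
    return {
        "loss": loss,
        "snt": sent,
        "last": replies[-1],          # RTT of the most recent reply
        "avg": statistics.mean(replies),
        "best": min(replies),
        "wrst": max(replies),
        "stdev": statistics.pstdev(replies),  # population standard deviation
    }


row = hop_summary([1.8, 1.7, 2.6, 1.9])
print(f"{row['loss']:.1f}% {row['snt']} {row['last']:.1f} {row['avg']:.1f} "
      f"{row['best']:.1f} {row['wrst']:.1f} {row['stdev']:.1f}")
```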
8. Protocol analysers
Network protocol analysers provide a great deal of detail about
information flowing through a network, by allowing you to inspect
individual packets. For wired networks, you can inspect packets at the
data-link layer or above. For wireless networks, you can inspect
information all the way down to individual 802.11 frames. Here are
several popular (and free) network protocol analysers:
Kismet
http://www.kismetwireless.net/
Kismet is a powerful wireless protocol analyser for many platforms
including Linux, Mac OS X, and even the embedded OpenWRT Linux
distribution. It works with any wireless card that supports passive monitor
mode. In addition to basic network detection, Kismet will passively log all
802.11 frames to disk or to the network in standard PCAP format, for
later analysis with tools like Wireshark. Kismet also features associated
client information, AP hardware fingerprinting, Netstumbler detection,
and GPS integration. Since it is a passive network monitor, it can even
detect “closed” wireless networks by analysing traffic sent by wireless
clients. You can run Kismet on several machines at once, and have them
all report over the network back to a central user interface. This allows for
wireless monitoring over a large area, such as a university or corporate
campus. Since Kismet uses the radio card's passive monitor mode, it does
all of this without transmitting any data.
Kismet is an invaluable tool for diagnosing wireless network problems.
KisMAC
http://kismac-ng.org
Exclusively for the Mac OS X platform, KisMAC does much of what
Kismet can do, but with a slick Mac OS X graphical interface. It is a
passive scanner that will log data to disk in PCAP format compatible with
Wireshark. It supports passive scanning with AirPort Extreme cards as well
as a variety of USB wireless adapters.
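Kismet and KisMAC both write the standard PCAP format, and tcpdump (below) can too, so captures can also be processed with your own scripts. The following minimal sketch is not part of any of these tools. It reads the classic libpcap file layout: a 24-byte global header followed by packet records, each with a 16-byte header. It handles only the original microsecond format (magic number 0xa1b2c3d4, in either byte order). It does not handle the newer pcapng format that recent Wireshark versions save by default. The in-memory capture used to exercise it is synthetic.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap format, microsecond timestamps


def read_pcap(f: BinaryIO) -> Iterator[Tuple[float, int, bytes]]:
    """Yield (timestamp, original_length, captured_bytes) for each packet record."""
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("file too short for a pcap global header")
    # The magic number reveals the byte order used by the capturing machine.
    if struct.unpack("<I", header[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file (pcapng is not handled)")
    # Remaining global header fields: version, timezone, sigfigs, snaplen, link type.
    _major, _minor, _zone, _sigfigs, _snaplen, _linktype = struct.unpack(
        endian + "HHiIII", header[4:24])
    while True:
        rec = f.read(16)
        if len(rec) < 16:
            return  # end of file (a truncated trailing record is ignored)
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
        yield ts_sec + ts_usec / 1e6, orig_len, f.read(incl_len)


# Build a tiny synthetic capture in memory: a global header plus one 4-byte packet.
buf = io.BytesIO()
buf.write(struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1))  # link type 1 = Ethernet
buf.write(struct.pack("<IIII", 1700000000, 500000, 4, 60))  # 4 bytes kept of a 60-byte frame
buf.write(b"\xde\xad\xbe\xef")
buf.seek(0)
for ts, length, data in read_pcap(buf):
    print(ts, length, data.hex())
```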
tcpdump
http://www.tcpdump.org/
tcpdump is a command-line tool for monitoring network traffic.
It does not have all the bells and whistles of Wireshark, but it does use
fewer resources. tcpdump can capture and display all network protocol
information down to the link layer. It can show all of the packet headers
and data received, or just the packets that match particular criteria.
Packets captured with tcpdump can be loaded into Wireshark for visual
analysis and further diagnostics. This is very useful if you wish to monitor
an interface on a remote system and bring the file back to your local
machine for analysis. The tcpdump tool is available as a standard tool in
Unix derivatives (Linux, BSD, and Mac OS X). There is also a Windows
port called WinDump, available at http://www.winpcap.org/windump/
Wireshark
http://www.wireshark.org/
Formerly known as Ethereal, Wireshark is a free network protocol analyser
for Unix and Windows.
It can be daunting to use for first-time users or those who are not familiar
with the OSI layers.
It is typically used to isolate and analyse specific traffic to or from an IP
address, but it can be also used as a general purpose fault finding tool. For
example, a machine infected with a network worm or virus can be
identified by looking for the machine that is sending out the same sort of
TCP/IP packets to large groups of IP addresses.
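The worm-spotting technique described above amounts to counting how many distinct destinations each source contacts with the same kind of traffic. The following is a minimal sketch. It assumes you have already extracted (source IP, destination IP, destination port) tuples from a capture. Grouping by destination port as a stand-in for "the same sort of packets" is a simplification. The threshold of 100 destinations is an arbitrary placeholder that should be tuned against your own baseline. The sample flows are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def suspected_scanners(flows: Iterable[Tuple[str, str, int]],
                       threshold: int = 100) -> dict[tuple[str, int], int]:
    """Return {(source IP, destination port): distinct destinations} for heavy fan-out.

    A source that sends traffic to the same destination port on at least
    `threshold` different hosts is reported.
    """
    targets: dict[tuple[str, int], set[str]] = defaultdict(set)
    for src, dst, dport in flows:
        targets[(src, dport)].add(dst)
    return {key: len(dsts) for key, dsts in targets.items() if len(dsts) >= threshold}


# Hypothetical data: one host probing TCP port 445 across a /24, plus ordinary web traffic.
flows = [("10.0.0.66", f"192.168.1.{i}", 445) for i in range(1, 255)]
flows += [("10.0.0.5", "203.0.113.10", 80)] * 50
print(suspected_scanners(flows))
```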
9. Trending tools
Trending tools are used to see how your network is used over a long
period of time. They work by periodically monitoring your network
activity, and displaying a summary in a human-readable form (such as a
graph). Trending tools collect data as well as analyse and report on it.
Below are some examples of trending tools. Some of them need to be used
in conjunction with each other, as they are not stand-alone programs.
MRTG
http://oss.oetiker.ch/mrtg/
The Multi Router Traffic Grapher (MRTG) monitors the traffic load on
network links using SNMP. MRTG generates graphs that provide a visual
representation of inbound and outbound traffic. These are typically
displayed on a web page.
MRTG can be a little confusing to set up, especially if you are not familiar
with SNMP. But once it is installed, MRTG requires virtually no
maintenance, unless you change something on the system that is being
monitored (such as its IP address).
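The basic arithmetic behind a traffic grapher such as MRTG is straightforward. It periodically reads each interface's octet counters (for example ifInOctets and ifOutOctets from the standard IF-MIB) and divides the change by the polling interval, which is five minutes by default in MRTG. The sketch below shows only that calculation, not the SNMP polling itself. It assumes 32-bit counters, which wrap back to zero after 4294967295 and so need a correction on busy links. It also assumes at most one wrap between polls. The counter readings are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps to 0 after 4294967295


def octets_to_bps(prev: int, curr: int, interval_s: float,
                  modulus: int = COUNTER32_MODULUS) -> float:
    """Average bits per second between two readings of an octet counter.

    If the counter went backwards, assume it wrapped exactly once. A device
    reboot also resets counters, which this sketch cannot distinguish from a wrap.
    """
    if interval_s <= 0:
        raise ValueError("polling interval must be positive")
    delta = curr - prev
    if delta < 0:
        delta += modulus
    return delta * 8 / interval_s


# Two hypothetical ifInOctets readings taken 300 s (5 minutes) apart; the counter wrapped.
prev_reading, curr_reading = 4_294_000_000, 1_000_000
print(f"{octets_to_bps(prev_reading, curr_reading, 300) / 1000:.1f} kbit/s")
```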
RRDtool
RRD is short for Round Robin Database. RRD is a database that stores
information in a very compact way that does not expand over time.
RRDtool refers to a suite of tools that allow you to create and modify
RRD databases, as well as generate useful graphs to present the data.
It is used to keep track of time-series data (such as network bandwidth,
machine room temperature, or server load average) and can display that
data as an average over time.
Note that RRDtool itself does not contact network devices to retrieve
data. It is merely a database manipulation tool.
You can use a simple wrapper script (typically in shell or Perl) to do that
work for you. RRDtool is also used by many full featured front-ends that
present you with a friendly web interface for configuration and display.
RRD graphs give you more control over display options and the number
of items available on a graph as compared to MRTG.
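The toy round-robin archive below makes the "does not expand over time" idea concrete. It has a fixed number of slots that are overwritten in a circle. A consolidation step averages several primary samples into one stored value, loosely mirroring RRDtool's AVERAGE consolidation function. This is an in-memory illustration only. It does not read or write RRDtool's file format, and the class and method names are invented for the example.

```python
from collections import deque
from typing import Optional


class RoundRobinArchive:
    """Fixed-size store: at most `rows` values, each the average of `steps` samples."""

    def __init__(self, rows: int, steps: int) -> None:
        self._data: deque = deque(maxlen=rows)  # the oldest value is dropped automatically
        self._steps = steps
        self._pending: list[float] = []

    def update(self, sample: float) -> Optional[float]:
        """Add one primary sample; return the consolidated average once `steps` have arrived."""
        self._pending.append(sample)
        if len(self._pending) < self._steps:
            return None
        avg = sum(self._pending) / len(self._pending)
        self._pending.clear()
        self._data.append(avg)
        return avg

    def values(self) -> list[float]:
        """Stored averages, oldest first."""
        return list(self._data)


# Hypothetical: 5-minute samples consolidated into 30-minute averages (steps=6), 4 rows kept.
rra = RoundRobinArchive(rows=4, steps=6)
for sample in range(48):  # 48 samples -> 8 averages, but only the newest 4 are retained
    rra.update(float(sample))
print(rra.values())
```

However many samples are added, memory use stays bounded by the number of rows, which is the property the text describes.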
ntop
For historical traffic analysis and usage, you will certainly want to
investigate ntop.
This program builds a detailed real-time report of observed network
traffic, displayed in your web browser. It integrates with RRDtool, and
makes graphs and charts visually depicting how the network is being used.
On very busy networks, ntop can use a lot of CPU and disk space, but it
gives you extensive insight into how your network is being used. It runs on
Linux, BSD, Mac OS X, and Windows.
While it can be left running to collect historical data, ntop can be fairly
CPU intensive, depending on the amount of traffic observed.
If you are going to run it for long periods you should monitor the CPU
utilisation of the monitoring machine.
Cacti
http://www.cacti.net/
Cacti is a front-end for RRDtool. It stores all of the necessary information
to create graphs in a MySQL database. The front-end is written in PHP.
Cacti maintains graphs and data sources, and handles the
actual data gathering.
There is support for SNMP devices, and custom scripts can easily be
written to poll virtually any conceivable network event.
Figure NM 10: Cacti can manage the polling of your network devices, and
can build very complex and informative visualisations of network behaviour.
NetFlow
NetFlow is a protocol, invented by Cisco, for collecting IP traffic
information.
Flowc
http://netacad.kiev.ua/flowc/. Flowc is an open source NetFlow collector
(see NetFlow above). It is lightweight and easy to configure. Flowc uses a
MySQL database to store aggregated traffic information. Therefore, it is
possible to generate your own reports from the data using SQL, or use the
included report generators. The built-in report generators produce reports
in HTML, plain text or a graphical format.
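Storing aggregated flow records in an SQL database is what lets you write your own reports. The self-contained illustration below uses Python's built-in sqlite3 module in place of MySQL. The table layout and the records are hypothetical and are not Flowc's actual schema. The point is only the kind of query, here a "top talkers" report, that such storage makes easy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE flows (
           src_ip   TEXT,
           dst_ip   TEXT,
           proto    INTEGER,   -- IP protocol number, e.g. 6 = TCP, 17 = UDP
           dst_port INTEGER,
           octets   INTEGER    -- bytes transferred in this aggregated record
       )"""
)
# Hypothetical aggregated records, as a collector might store them.
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?, ?, ?)",
    [
        ("10.0.0.5", "198.51.100.7", 6, 80, 1_200_000),
        ("10.0.0.5", "198.51.100.9", 6, 443, 800_000),
        ("10.0.0.9", "203.0.113.4", 17, 53, 40_000),
        ("10.0.0.7", "198.51.100.7", 6, 80, 3_500_000),
    ],
)

# Report: top talkers by total bytes.
top_talkers = conn.execute(
    """SELECT src_ip, SUM(octets) AS total
       FROM flows
       GROUP BY src_ip
       ORDER BY total DESC
       LIMIT 10"""
).fetchall()
for src, total in top_talkers:
    print(f"{src:15} {total / 1e6:6.2f} MB")
```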
SmokePing
http://oss.oetiker.ch/smokeping/. SmokePing is a deluxe latency
measurement tool written in Perl.
It can measure, store and display latency, latency distribution and packet
loss all on a single graph.
SmokePing uses RRDtool for data storage, and can draw very
informative graphs that present up to the minute information on the state
of your network connection.
SmokePing can optionally send alerts when certain conditions are met,
such as when excessive packet loss is seen on a link for an extended period
of time. An example of SmokePing in action is shown in Figure NM 12.
Figure NM 12: SmokePing can simultaneously display packet loss and latency
spreads in a single graph.
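The kind of summary SmokePing draws can be sketched in a few lines. For each round of probes, it takes the median latency, the spread of the replies and the loss percentage. A simple alert rule then fires when loss stays high for several consecutive rounds. This is a minimal sketch under stated assumptions. The 20% threshold, the three-round window and the probe data are hypothetical. SmokePing itself defines alerts differently, in its own configuration file.

```python
import statistics
from typing import Optional, Sequence


def summarise_round(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Median RTT, (min, max) spread and loss % for one round of probes (None = lost)."""
    if not rtts_ms:
        raise ValueError("a round needs at least one probe")
    replies = sorted(r for r in rtts_ms if r is not None)
    loss = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    if not replies:
        return {"median": None, "spread": None, "loss": loss}
    return {"median": statistics.median(replies),
            "spread": (replies[0], replies[-1]),
            "loss": loss}


def sustained_loss(rounds: Sequence[dict], limit_pct: float = 20.0,
                   consecutive: int = 3) -> bool:
    """True if each of the last `consecutive` rounds lost more than `limit_pct` of probes."""
    recent = rounds[-consecutive:]
    return len(recent) == consecutive and all(r["loss"] > limit_pct for r in recent)


# Hypothetical rounds of five probes each; the link degrades in the last three.
history = [
    summarise_round([20.1, 21.0, 19.8, 22.5, 20.4]),
    summarise_round([20.3, None, None, 45.0, 21.2]),
    summarise_round([None, 30.2, None, 28.9, None]),
    summarise_round([None, None, 33.3, None, 31.0]),
]
if sustained_loss(history):
    print("ALERT: sustained packet loss, last round", history[-1])
```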
EtherApe
http://etherape.sourceforge.net/
EtherApe displays a graphical representation of network traffic. Hosts and
links change size depending on the amount of traffic sent and received.
The colours change to represent the protocol most used. As with Wireshark
and tcpdump, data can be captured "off the wire" from a live network
connection or read from a tcpdump capture file. EtherApe doesn't show
quite as much detail as ntop, but its resource requirements are much lighter.
iptraf
http://iptraf.seul.org/
IPTraf is a lightweight but powerful LAN monitor. It has an ncurses
interface and runs in a command shell. IPTraf takes a moment to measure
observed traffic, and then displays various network statistics including
TCP and UDP connections, ICMP and OSPF information, traffic flows,
IP checksum errors, and more. It is an easy-to-use program that uses
minimal system resources. While it does not keep historical data, it is very
useful for displaying an instantaneous usage report.
Argus
http://qosient.com/argus/.
Argus stands for Audit Record Generation and Utilisation System. Argus
is also the name of the hundred-eyed giant of Greek mythology.
NeTraMet
http://www.caida.org/tools/measurement/netramet/
NeTraMet is another popular flow analysis tool. Like Argus,
NeTraMet consists of two parts: a collector that gathers statistics via
SNMP, and a manager that specifies which fows should be watched.
Flows are specified using a simple programming language that defines the
addresses used on either end, and can include Ethernet, IP, protocol
information, or other identifiers. NeTraMet runs on DOS and most
UNIX systems, including Linux and BSD.
While there are web pages available that will perform a “speed test” in your
browser (such as http://www.dslreports.com/stest or http://speedtest.net/),
these tests are increasingly inaccurate as you get further from the testing
source. Even worse, they do not allow you to test the speed of a given link,
but only the speed of your link to a particular site on the Internet.
Here are a few tools that will allow you to perform throughput testing on
your own networks.
ttcp
Now a standard part of most Unix-like systems, ttcp is a simple network
performance testing tool.
One instance is run on either side of the link you want to test.
The first node runs in receive mode, and the other transmits:
node_a$ ttcp -r -s
node_b$ ttcp -t -s node_a
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp -> node_a
ttcp-t: socket
ttcp-t: connect
ttcp-t: 16777216 bytes in 249.14 real seconds = 65.76 KB/sec +++
After collecting data in one direction, you should reverse the transmit and
receive partners to test the link in the other direction. ttcp can test UDP as
well as TCP streams, and can alter various TCP parameters and buffer
lengths to give the network a good workout. It can even use a user-supplied
data stream instead of sending random data. Remember that the
speed readout is in kilobytes, not kilobits. Multiply the result by 8 to find
the speed in kilobits per second. The only real disadvantage to ttcp is that
it hasn’t been developed in years. Fortunately, the code has been released
into the public domain and is freely available. Like ping and traceroute, ttcp
is found as a standard tool on many systems.
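As a worked check of the sample ttcp output above, consider the transmitter's numbers. It sent 16777216 bytes (2048 buffers of 8192 bytes) in 249.14 seconds. The short calculation below reproduces the 65.76 KB/sec figure, which shows that ttcp counted 1 KB as 1024 bytes in that output. It then applies the "multiply by 8" rule. It also gives the figure in SI kilobits (1000 bits), the unit in which link speeds are normally quoted, since the two differ by about 2%.

```python
total_bytes = 8192 * 2048        # buflen x nbuf, from the ttcp-t line
seconds = 249.14                 # elapsed time reported by ttcp

bytes_per_sec = total_bytes / seconds
kb_per_sec = bytes_per_sec / 1024              # ttcp's "KB/sec" (1 KB = 1024 bytes)
kbit_per_sec = kb_per_sec * 8                  # the "multiply by 8" rule
si_kbit_per_sec = bytes_per_sec * 8 / 1000     # kilobits with 1 kbit = 1000 bits

print(f"{kb_per_sec:.2f} KB/sec = {kbit_per_sec:.1f} Kbit/sec (1024-based) "
      f"= {si_kbit_per_sec:.1f} kbit/s (SI)")
```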
iperf
http://iperf.sourceforge.net/. Much like ttcp, iperf is a command line tool
for estimating the throughput of a network connection. It supports many
of the same features as ttcp, but uses a “client” and “server” model instead
of a “receive” and “transmit” pair.
To run iperf, launch a server on one side and a client on the other:
node_a$ iperf -s
node_b$ iperf -c node_a
------------------------------------------------------------
Client connecting to node_a, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 5] local 10.15.6.1 port 1212 connected with 10.15.6.23 port 5001
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-11.3 sec 768 KBytes 558 Kbits/sec
The server side will continue to listen and accept client connections on
port 5001 until you hit control-C to kill it. This makes it handy when
running multiple test runs from a variety of locations.
The biggest difference between ttcp and iperf is that iperf is under active
development, and has many new features (including IPv6 support). This
makes it a good choice as a performance tool when building new
networks.
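The client/server idea behind iperf can be sketched simply. A server counts the bytes it receives, a client sends a fixed amount, and throughput is the byte count times 8 divided by the elapsed time. The following Python sketch is a hypothetical, one-shot stand-in for illustration only. Both ends run in one process over the loopback interface, and the operating system picks a free port rather than iperf's 5001. It lacks everything that makes iperf useful in practice, such as window tuning, UDP tests, parallel streams and interval reports.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call; an arbitrary choice


def run_server(listener: socket.socket, result: dict) -> None:
    """Accept one connection and count the bytes received until the client closes."""
    conn, _addr = listener.accept()
    total = 0
    start = time.perf_counter()
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # empty read: the client has closed its end
                break
            total += len(data)
    result["bytes"] = total
    result["seconds"] = time.perf_counter() - start


def run_client(host: str, port: int, total_bytes: int) -> None:
    """Connect and send `total_bytes` of zero bytes, then close."""
    payload = bytes(CHUNK)
    with socket.create_connection((host, port)) as sock:
        sent = 0
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n


def measure(total_bytes: int = 8 * 1024 * 1024, host: str = "127.0.0.1") -> dict:
    """Run server and client in one process and report throughput in kbit/s."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0: let the OS choose a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    result: dict = {}
    server = threading.Thread(target=run_server, args=(listener, result))
    server.start()
    run_client(host, port, total_bytes)
    server.join()
    listener.close()
    result["kbit_per_sec"] = result["bytes"] * 8 / 1000 / result["seconds"]
    return result


if __name__ == "__main__":
    print(measure())
```

Over loopback this mostly measures the host itself. Between two machines, the server and client halves would run on separate hosts, just as iperf -s and iperf -c do.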
bing
http://fgouget.free.fr/bing/indx-en.shtml
Rather than flooding a connection with data and seeing how long the transfer
takes to complete, bing attempts to estimate the available throughput of a
point-to-point connection by analysing round trip times for various sized
ICMP packets. While it is not always as accurate as a flood test, it can
provide a good estimate without transmitting a large number of bytes.
Since bing works using standard ICMP echo requests, it can estimate
available bandwidth without the need to run a special client on the other
end, and can even attempt to estimate the throughput of links outside
your network. Since it uses relatively little bandwidth, bing can give you a
rough idea of network performance without running up the charges that a
flood test would certainly incur.
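The principle behind estimating throughput from round trip times reduces to one subtraction. A larger packet takes longer to serialise onto a link than a smaller one, so the difference in RTT between two packet sizes reveals the link's bit rate. The following is a simplified sketch, not bing's actual method, which works between the two hosts at either end of a link and is more involved. It assumes a single link where both the echo request and the echo reply carry the extra bytes, so they cross the link twice. It uses the minimum RTT of several probes to filter out queueing delay. The RTT figures are hypothetical.

```python
from typing import Sequence


def estimate_bps(small_bytes: int, small_rtts_ms: Sequence[float],
                 large_bytes: int, large_rtts_ms: Sequence[float]) -> float:
    """Estimate link bit rate from minimum RTTs measured with two packet sizes."""
    extra_bits = (large_bytes - small_bytes) * 8
    # The minimum RTT approximates the no-queueing case for each packet size.
    delta_s = (min(large_rtts_ms) - min(small_rtts_ms)) / 1000.0
    if extra_bits <= 0 or delta_s <= 0:
        raise ValueError("need a larger packet that takes measurably longer")
    # The extra bytes cross the link twice: once in the request, once in the reply.
    return 2 * extra_bits / delta_s


# Hypothetical measurements with 100-byte and 1100-byte ICMP payloads.
bps = estimate_bps(100, [12.4, 12.1, 13.0], 1100, [28.6, 28.1, 30.2])
print(f"about {bps / 1000:.0f} kbit/s")
```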
The following are some open source tools that can help perform this task.
Snort
Snort (http://www.snort.org/) is a packet sniffer and logger which can be
used as a lightweight network intrusion detection system.
It features rule-based logging and can perform protocol analysis, content
searching, and packet matching.
Apache: mod_security
ModSecurity (http://www.modsecurity.org/) is an open source intrusion
detection and prevention engine for web applications.
This kind of security tool is also known as a web application firewall.
ModSecurity increases web application security by protecting web
applications from known and unknown attacks.
It can be used on its own, or as a module in the Apache web server
(http://www.apache.org/).
There are several sources for updated mod_security rules that help protect
against the latest security exploits.
One excellent resource is GotRoot, which maintains a huge and
frequently updated repository of rules:
http://www.atomicorp.com/wiki/index.php/Atomic_ModSecurity_Rules
Web application security is important in defending against attacks on your
web server, which could result in the theft of valuable or personal data, or
in the server being used to launch attacks or send spam to other Internet
users. As well as being damaging to the Internet as a whole, such
intrusions can seriously reduce your available bandwidth.
Nagios
Nagios (http://nagios.org/) is a program that monitors hosts and services
on your network, notifying you immediately when problems arise.
It can send notifications via email, SMS, or by running a script, and will
send notifications to the relevant person or group depending on the nature
of the problem. Nagios runs on Linux or BSD, and provides a web
interface to show up-to-the-minute system status. Nagios is extensible,
and can monitor the status of virtually any network event. It performs
checks by running small scripts at regular intervals, and checks the results
against an expected response.
This can yield much more sophisticated checks than a simple network
probe. For example, ping may tell you that a machine is up, and nmap
may report that a TCP port responds to requests, but Nagios can actually
retrieve a web page or make a database request, and verify that the
response is not an error.
Figure NM 15: Nagios keeps you informed the moment a network fault or
service outage occurs.
Nagios can even notify you when bandwidth usage, packet loss, machine
room temperature, or any other network health indicator crosses a particular
threshold.
This can give you advance warning of network problems, often allowing
you to respond to the problem before users have a chance to complain.
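Nagios plugins report their result through their exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. They also print one line of text for the web interface and notifications. The sketch below shows a minimal threshold check in that style, for example for machine room temperature. How the value is obtained is left out. The --value option stands in for a real sensor or SNMP query, and the option names, default thresholds and label are hypothetical.

```python
import argparse
from typing import List, Optional, Tuple

# Standard Nagios plugin exit statuses.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value: float, warn: float, crit: float, label: str = "value") -> Tuple[int, str]:
    """Compare a reading with upper thresholds; return (exit status, one-line message)."""
    if value >= crit:
        status = CRITICAL
    elif value >= warn:
        status = WARNING
    else:
        status = OK
    return status, f"{LABELS[status]} - {label} is {value:g}"


def main(argv: Optional[List[str]] = None) -> int:
    """Parse hypothetical options, print the status line and return the exit status."""
    parser = argparse.ArgumentParser(description="Hypothetical threshold check")
    parser.add_argument("--value", type=float,
                        help="measured value (stands in for a real sensor or SNMP query)")
    parser.add_argument("-w", "--warning", type=float, default=30.0)
    parser.add_argument("-c", "--critical", type=float, default=35.0)
    parser.add_argument("--label", default="temperature")
    args = parser.parse_args(argv)
    if args.value is None:
        print("UNKNOWN - no value supplied")
        return UNKNOWN
    status, message = evaluate(args.value, args.warning, args.critical, args.label)
    print(message)
    return status
```

A real plugin script would end by passing the return value of main() to sys.exit, so that Nagios sees the status code. Run with --value 36, this sketch would print a CRITICAL line and exit with status 2.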
Zabbix
Zabbix (http://www.zabbix.org/) is an open source realtime monitoring
tool that is something of a hybrid between Cacti and Nagios. It uses a
SQL database for data storage, has its own graph rendering package, and
performs all of the functions you would expect from a modern realtime
monitor (such as SNMP polling and instant notification of error
conditions). Zabbix is released under the GNU General Public License.
ngrep
Ngrep provides most of GNU grep's pattern matching features, but
applies them to network traffic. It currently recognises IPv4 and IPv6,
TCP, UDP, ICMP, IGMP, PPP, SLIP, FDDI, Token Ring, and much
more. Because it makes extensive use of regular expressions, it is best
suited to advanced users or those with a good knowledge of regular
expressions, but you don't need to be a regex expert to
make basic use of ngrep. For example, to view all packets that contain the
string GET (presumably HTTP requests), try this:
# ngrep -q GET
By using ngrep creatively, you can detect anything from virus activity to
spam email. You can download ngrep at http://ngrep.sourceforge.net/.
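Conceptually, ngrep applies a regular expression to each packet's payload and shows the packets that match. The toy version below applies that idea to payloads that have already been captured and extracted. It does no packet capture of its own, and the sample payloads are invented for the example.

```python
import re
from typing import Iterable, Iterator


def ngrep_like(pattern: bytes, payloads: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each payload containing a match for the regular expression `pattern`."""
    regex = re.compile(pattern)
    for payload in payloads:
        if regex.search(payload):
            yield payload


sample = [
    b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n",
    b"\x16\x03\x01\x02\x00",  # binary data: no match
    b"POST /login HTTP/1.1\r\nHost: example.org\r\n\r\n",
]
# Print the request line of every HTTP GET or POST.
for p in ngrep_like(rb"^(GET|POST) ", sample):
    print(p.split(b"\r\n", 1)[0].decode())
```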
nmap/Zenmap
nmap is a network diagnostic tool for showing the state and availability of
network ports on a network interface. A common use is to scan a
host on a TCP/IP network for open ports, thereby allowing you to
create a "map" of the network services that the machine provides. The nmap
tool does this by sending specially crafted packets to a target network host
and noting the response(s). For example, a host with nothing listening on
port 80 will respond differently to an nmap probe than one that is running
a web server (httpd) on that port.
Similarly, you will get a different response from a port that is simply closed
than from one that is open on the host but blocked by a firewall.
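nmap offers many probe types, and the simplest to reproduce is a TCP connect scan (what nmap calls -sT). It attempts a full connection to each port and classifies the result. A refused connection means the host answered but nothing is listening (closed). A timeout usually means a firewall silently dropped the probe (filtered). The following is a minimal sketch, to be used only against hosts you are authorised to test. The one-second timeout is an arbitrary choice, and ports are probed one at a time for simplicity.

```python
import errno
import socket
from typing import Iterable


def probe(host: str, port: int, timeout: float = 1.0) -> str:
    """Classify a TCP port as 'open', 'closed' or 'filtered' with a full connect()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            result = s.connect_ex((host, port))
        except socket.timeout:
            return "filtered"  # no answer at all: probably dropped by a firewall
    if result == 0:
        return "open"  # something accepted the connection
    if result == errno.ECONNREFUSED:
        return "closed"  # the host answered with a reset: nothing listening
    return "filtered"  # timed out or unreachable; treated like a silent drop


def scan(host: str, ports: Iterable[int]) -> dict[int, str]:
    """Probe each port in turn (slow but simple)."""
    return {port: probe(host, port) for port in ports}
```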
Zenmap
Zenmap is a cross-platform GUI for nmap which runs under Linux,
Windows, Mac OS X, BSD, etc. and can be downloaded from the
nmap.org site as well.
netcat
Somewhat between nmap and tcpdump, netcat is another diagnostic tool
for poking and prodding at ports and connections on a network. It takes
its name from the UNIX cat(1) utility, which simply reads out whatever
file you ask it to. Similarly, netcat reads and writes data across any
arbitrary TCP or UDP port. The netcat utility is not a packet analyser, but
works on the data (payload) contained in the packets.
For example, here is how to run a very simple 1-line, 1-time web server
with netcat:
This is not a definitive list, but should give you an idea of how a wide
range of factors can affect your bandwidth patterns.
With this in mind, let's look at the topic of baselines.
only that it was relatively high, which may not indicate a problem.
Baseline graphs and figures are also useful when analysing the effects of
changes made to the network. It is often very useful to experiment with
such changes by trying different possible values. Knowing what the
baseline looks like will show you whether your changes have improved
matters, or made them worse.
Figure NM 16: By collecting data over a long period of time, you can predict
the growth of your network and make changes before problems develop.
Figure NM 17: The traffic trend at Aidworld logged over a single day.
Figure NM 18: The same network logged over an entire week reveals a
problem with backups, which caused unexpected congestion for network users.
The next graph (Figure NM 20) shows the same data over a period of 16
hours. This indicates that the values in the graph above are close to the
normal level (baseline), but that there were significant increases in latency
at several times during the early morning, up to 30 times the baseline
value.
This indicates that additional monitoring should be performed during
these periods to establish the cause of the high latency, and so avoid future
problems such as backups failing to complete.
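Once a baseline exists, spotting periods like these can be automated. The following is a minimal sketch. It assumes latency samples are grouped by hour of day. The baseline for each hour is the median of previous days, and any new sample more than a chosen multiple of that baseline is flagged. The factor of 3 and the sample data are hypothetical. The right factor depends on how variable your own link is.

```python
import statistics
from collections import defaultdict
from typing import Iterable, Tuple


def hourly_baseline(history: Iterable[Tuple[int, float]]) -> dict[int, float]:
    """Median latency (ms) for each hour of day from (hour, latency) samples."""
    by_hour: dict[int, list[float]] = defaultdict(list)
    for hour, latency in history:
        by_hour[hour].append(latency)
    return {hour: statistics.median(vals) for hour, vals in by_hour.items()}


def anomalies(today: Iterable[Tuple[int, float]], baseline: dict[int, float],
              factor: float = 3.0) -> list[Tuple[int, float, float]]:
    """Return (hour, latency, ratio to baseline) for samples above factor x baseline."""
    flagged = []
    for hour, latency in today:
        base = baseline.get(hour)
        if base and latency > factor * base:
            flagged.append((hour, latency, latency / base))
    return flagged


# Hypothetical week of hourly latency samples, all around 40 ms.
history = [(h, 40.0 + (d % 3)) for d in range(7) for h in range(24)]
today = [(h, 41.0) for h in range(24)]
today[3] = (3, 1230.0)  # an early-morning spike
base = hourly_baseline(history)
for hour, latency, ratio in anomalies(today, base):
    print(f"{hour:02d}:00  {latency:.0f} ms  ({ratio:.0f}x baseline)")
```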
repetition of increased latency and packet loss in the early morning hours.
Figure NM 22: The classic network flow graph. The green area represents
inbound traffic, while the blue line represents outbound traffic. The repeating
arcs of outbound traffic show when the nightly backups have run.
Traffic patterns will vary with what you are monitoring.
A router will normally show more incoming traffic than outgoing traffic as
users download data from the Internet.
An excess of outbound traffic that is not transmitted by your network
servers may indicate a peer-to-peer client, unauthorised server, or even a
virus on one or more of your clients.
There are no set metrics that indicate what the ratio of outgoing to
incoming traffic should be.
It is up to you to establish a baseline to understand what normal network
traffic patterns look like on your network.
Figure NM 24: The horizontal line shows the 95th percentile amount.
MRTG and Cacti will calculate the 95th percentile for you.
This is a sample graph of a 960 kbps connection. The 95th percentile
came to 945 kbps after discarding the highest 5% of traffic.
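The 95th percentile figure is easy to compute yourself from stored samples. The minimal sketch below follows the usual method: sort the five-minute average rates for the period, discard the top 5%, and report the highest remaining value. Exact conventions vary between providers and tools. These include the sample interval, how the 5% cut-off is rounded, and whether inbound and outbound are combined. The sample data here is synthetic.

```python
from typing import Sequence


def percentile_95(samples_kbps: Sequence[float]) -> float:
    """Highest sample left after discarding the top 5% (rounded down to whole samples)."""
    if not samples_kbps:
        raise ValueError("no samples")
    ordered = sorted(samples_kbps)
    discard = len(ordered) * 5 // 100
    return ordered[len(ordered) - 1 - discard]


# Synthetic day: 288 five-minute samples, mostly 600-649 kbps,
# with 14 short bursts at the 960 kbps line rate.
day = [600.0 + (i % 50) for i in range(274)] + [960.0] * 14
print(percentile_95(day))
```

Here the 14 bursts fall entirely within the discarded 5% (14 of 288 samples), so they do not affect the reported figure.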
Figure NM 25: RRDtool can show arbitrary data, such as memory and CPU
usage, expressed as an average over time.
Mail servers require adequate space, as some people may prefer to leave
their email messages on the server for long periods of time.
The messages can accumulate and fill the hard disk, especially if quotas are
not in use.
If the disk or partition used for mail storage fills up, the mail server
cannot receive mail. If that disk is also used by the system, all kinds of
system problems may occur as the operating system runs out of swap
space and temporary storage.
File servers need to be monitored, even if they have large disks.
Users will find a way to fill any size disk more quickly than you might
think.
16. Summary
In this chapter, we have tried to give you an insight into how to
monitor your network and computing resources cost-effectively and
efficiently. We have introduced many of our favourite tools to assist you,
many of which have been widely tried and tested by network operators.
We hope you now understand the importance of monitoring, both to
justify necessary upgrades to those who fund them and to
minimise the impact of problems as they occur.
The end result is a healthy network and computing infrastructure, and
users who are happy with the service you are providing
them.