Networking Tutorial – TCP/IP over Ethernet
Welcome to a new networking tutorial, based upon the most common technology - TCP/IP over
Ethernet. Many of the principles will apply to other technologies, but for now I'm aiming for the simple,
rather than totally complete, approach. Just what you need to know to get the job done.
This tutorial is OS-neutral; it uses Linux, Windows and Unix in its examples, but to be honest, those are
small details. How TCP/IP works over Ethernet is the same regardless of the OS.
This tutorial skips a lot of the detail which could be written; it is aimed at the new student of TCP/IP who
wants to understand how data actually gets from one machine to another over the network. That is not
to say that it is a trivial, high-level tutorial. The goal is that the reader should learn the following
by the end of the tutorial:
• How IP networks are structured
• How data is labelled and sent around LANs
• How data is routed from one network to another, and
• How that routing is determined
I intend to keep adding to the tutorial over time; please let me know what you want. There is certainly a
lot more detail to be dealt with, but I do want to ensure that the text is kept clear. That is more important
than dealing with esoteric issues.
For example, I vow never to mention the OSI 7-layer model. Nobody uses it, yet every networking book
starts by explaining what it is.
In this tutorial, we will imagine a small configuration with a few servers (machines) on a few different
networks. We will start with a single network and by the end of the tutorial, will have built up to the
multiple networks shown here.
Don't be put off if the diagram looks small and therefore trivial; this small network diagram provides
plenty of detail to get our teeth into.
A Single Network
If A wants to talk to B, well, they're on the same network, so A can address the packet directly to B,
like this:
Source IP 192.168.1.1 (A)
Destination IP 192.168.1.2 (B)
Data Hello B! This is the Data
Unfortunately, it's not as simple as that. The IP address identifies the machines at a software (logical)
level, but the physical (MAC) layer isn't the same as the logical (IP) layer.
• The IP layer needs to be able to route from Alaska to Zebediela. It works at a relatively high level.
• The MAC layer only needs to talk to machines on the local network (LAN). It works at a low
level.
Source IP 192.168.1.1 (A)
Source MAC 01:C0:F2:69:31:21 (A)
Destination IP 192.168.1.2 (B)
Destination MAC 03:A0:B3:27:A2:2E (B)
Data Hello B! This is the Data
So how does A find out what B's MAC address is?
MAC (Media Access Control)
Unfortunately, it's not always as simple as the previous section implied. The IP address identifies the
machines at a software level, but on the wire, a different type of addressing is used, so some additional
information is also required:
Source IP 192.168.1.1 (A)
Source MAC 01:C0:F2:69:31:21 (A)
Destination IP 192.168.1.2 (B)
Destination MAC 03:A0:B3:27:A2:2E (B)
Data Hello B! This is the Data
What's that about MAC addresses? Those are the hardware addresses of the network cards installed in
those machines. Any device receiving the packet will only process it if the destination matches its own
hardware address (or the special broadcast address, which we'll deal with in a minute). This address is
assigned by the hardware manufacturer, from the address pool allocated to them by the IEEE. If you have
an Intel network card, you'll have an Intel MAC address (maybe 00:02:B3:xx:xx:xx). A 3Com network card
will have a 3Com MAC address (maybe 00:04:0B:xx:xx:xx). This is also called the Ethernet address, or the
Physical address. The Ethernet MAC address is a very large, 48-bit number, displayed in hexadecimal
(base 16) for clarity; B's address above converts to the number 3,988,735,369,774. With that much space,
every card in the world can be given a unique address.
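The conversion mentioned above can be checked with a few lines of Python. This is only a minimal sketch; the helper names mac_to_int and int_to_mac are made up for this illustration.

```python
def mac_to_int(mac: str) -> int:
    """Convert a colon-separated MAC address to the integer it represents."""
    return int(mac.replace(":", ""), 16)


def int_to_mac(value: int) -> str:
    """Format a 48-bit integer back into colon-separated hexadecimal."""
    raw = f"{value:012X}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


b_mac = "03:A0:B3:27:A2:2E"          # B's MAC address from the table above
print(f"{mac_to_int(b_mac):,}")      # 3,988,735,369,774
print(int_to_mac(mac_to_int(b_mac)))  # round-trips to the original string
print(f"{2 ** 48:,}")                # how many 48-bit addresses exist
```

Six bytes give 2 to the power 48 possible values, which is why the address space is big enough for every card to have its own.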
It is possible to change your MAC address, but there is rarely a need to do so.
Broadcast
Hang on, how does "A" know what "B"'s MAC address is? If we look at the ifconfig output (in the
Netmask section), we can see the "Bcast," or Broadcast address. This is configured to be the highest IP
address available on the network. As this network is 192.168.1.0 - 192.168.1.255, the broadcast address is
192.168.1.255. This is a special address. If A wants to know "B"'s MAC address, it can use ARP (Address
Resolution Protocol) to broadcast a request to every machine on the network, asking who has
192.168.1.2. "B" (or potentially another device on the network) will reply with "B"'s MAC address.
Strictly speaking, an ARP request is carried directly in an Ethernet frame sent to the broadcast MAC
address, rather than inside an IP packet, but as a simplified picture, a packet sent to the broadcast
address looks like this:
Source IP 192.168.1.1 (A)
Source MAC 01:C0:F2:69:31:21 (A)
Destination IP 192.168.1.255 (broadcast)
Destination MAC FF:FF:FF:FF:FF:FF (broadcast)
Data Who has 192.168.1.2 ? Tell 192.168.1.1
Any machine on the network which knows the answer (but usually "B" itself) will reply with a fully-
populated packet, including its own MAC address - any outgoing packet always includes the sender's IP
and MAC address. This way, "A" can learn "B"'s MAC address if it needs to send it a packet.
Source IP 192.168.1.2 (B)
Source MAC 03:A0:B3:27:A2:2E (B)
Destination IP 192.168.1.1 (A)
Destination MAC 01:C0:F2:69:31:21 (A)
Data 192.168.1.2 is 03:A0:B3:27:A2:2E
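For the curious, here is a rough sketch of what the "who has 192.168.1.2?" request looks like as raw bytes. The tables above are a simplified picture; on the wire, an ARP request travels directly inside an Ethernet frame, and the IP addresses sit in the ARP payload. The function names build_arp_request and mac_bytes are invented for this example, and the code only builds the bytes, it does not send anything.

```python
import socket
import struct

BROADCAST_MAC = "FF:FF:FF:FF:FF:FF"


def mac_bytes(mac: str) -> bytes:
    """Turn 'AA:BB:CC:DD:EE:FF' into its six raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_request(src_mac: str, src_ip: str, target_ip: str) -> bytes:
    """Return the raw bytes of an Ethernet frame carrying an ARP request."""
    # Ethernet header: destination MAC, source MAC, EtherType 0x0806 (ARP).
    ethernet = (
        mac_bytes(BROADCAST_MAC)
        + mac_bytes(src_mac)
        + struct.pack("!H", 0x0806)
    )
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware (MAC) address length
        4,                            # protocol (IP) address length
        1,                            # opcode 1 = request ("who has?")
        mac_bytes(src_mac),           # sender MAC ("tell ...")
        socket.inet_aton(src_ip),     # sender IP
        bytes(6),                     # target MAC: unknown, so all zeros
        socket.inet_aton(target_ip),  # target IP ("who has ...?")
    )
    return ethernet + arp


frame = build_arp_request("01:C0:F2:69:31:21", "192.168.1.1", "192.168.1.2")
print(len(frame))   # 14-byte Ethernet header + 28-byte ARP payload
print(frame.hex())
```

The reply has the same layout, with opcode 2 and the previously unknown MAC address filled in.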
The network card (NIC) listens to packets sent to itself, and also to packets sent to FF:FF:FF:FF:FF:FF. In
a similar way, the IP Stack will listen to packets addressed to the IP broadcast address - 192.168.1.255 in
this case (so long as the MAC address matches, otherwise the packet would already have been discarded
by the NIC).
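The two-stage filtering just described can be sketched as follows. This is an illustration only: the function names are invented here, and it ignores multicast and promiscuous mode, which real NICs also support.

```python
import ipaddress

BROADCAST_MAC = "FF:FF:FF:FF:FF:FF"


def nic_accepts(own_mac: str, dest_mac: str) -> bool:
    """The NIC keeps frames addressed to itself or to the broadcast MAC."""
    dest = dest_mac.upper()
    return dest == own_mac.upper() or dest == BROADCAST_MAC


def ip_stack_accepts(own_ip: str, netmask: str, dest_ip: str) -> bool:
    """The IP stack keeps packets for its own IP or the network's broadcast IP."""
    iface = ipaddress.ip_interface(f"{own_ip}/{netmask}")
    dest = ipaddress.ip_address(dest_ip)
    return dest == iface.ip or dest == iface.network.broadcast_address


# From B's point of view (192.168.1.2, MAC 03:A0:B3:27:A2:2E):
B_MAC, B_IP, MASK = "03:A0:B3:27:A2:2E", "192.168.1.2", "255.255.255.0"
print(nic_accepts(B_MAC, "FF:FF:FF:FF:FF:FF"))       # broadcast frame
print(nic_accepts(B_MAC, "01:C0:F2:69:31:21"))       # frame meant for A
print(ip_stack_accepts(B_IP, MASK, "192.168.1.255"))  # broadcast IP
```

The broadcast IP address is not hard-coded; it is derived from B's own address and netmask as the highest address on the network.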
Routing (Part 1: How to send via the router)
If a machine is sending a packet to the local network (e.g., "A" sending a packet to "B"), it will need the
destination's MAC address (via ARP, as discussed above). If it is sending to another network, it will not
need the destination's MAC address, just that of the router it is sending the packet to. So, for G to send a
packet to F, the packet sent by G would look like this:
Source IP 192.168.1.4 (G)
Source MAC 02:F1:A0:23:37:52 (G)
Destination IP 192.168.2.4 (F)
Destination MAC 01:33:4A:DC:7B:37 (firewall)
Data Hello F! My name is G
However, the packet received by F (from the firewall) would look like this:
Source IP 192.168.1.4 (G)
Source MAC 01:33:4A:DC:7B:37 (firewall)
Destination IP 192.168.2.4 (F)
Destination MAC 05:4C:5D:CA:83:23 (F)
Data Hello F! My name is G
So G and F never know each other's MAC address; they don't need to. The firewall knows both, because it
talks directly to both hosts.
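The hop from G to F via the firewall can be modelled in a few lines. This is a toy model, not how a real router is implemented: the Packet class and forward function are invented for this sketch, and, like the tables above, it uses a single MAC address for the firewall on both networks.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_mac: str
    dst_ip: str
    dst_mac: str
    data: str


def forward(packet: Packet, router_mac: str, next_hop_mac: str) -> Packet:
    """Pass a packet on: only the MAC addresses are rewritten."""
    return replace(packet, src_mac=router_mac, dst_mac=next_hop_mac)


FIREWALL_MAC = "01:33:4A:DC:7B:37"

sent_by_g = Packet(
    src_ip="192.168.1.4", src_mac="02:F1:A0:23:37:52",
    dst_ip="192.168.2.4", dst_mac=FIREWALL_MAC,
    data="Hello F! My name is G",
)
received_by_f = forward(sent_by_g, FIREWALL_MAC, "05:4C:5D:CA:83:23")
print(received_by_f)
```

The IP addresses and the data survive the hop untouched; the MAC addresses only ever describe the current leg of the journey.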
In the same way, when your PC talks to www.google.com, it does not need to know anything about
Google's physical address, only that of the first router on the way (typically your own router, or your
ISP's). At a higher level, you personally don't need to know Google's IP address (e.g., 64.233.183.147),
only the DNS name (www.google.com). This is how the layers fit together; IP deals with the "internet"
side of things, whilst TCP and the applications above it deal with the higher levels.
Netmask
The key to understanding IP routing is the netmask. The netmask tells us whether we can communicate
directly with another machine, or if we need to go via a router. If A wants to talk to B, well, they're on the
same network, so A addresses the packet directly to B. If A wants to talk to E, it will have to send the
packet to the (routing) firewall between those networks, as it cannot send directly to E.
But how does "A" know when to send a simple packet and when to do the harder work?
If we assume that box "A" is Linux, and box "B" is Windows, we will see the following:
root@A# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:E1:CC:62:34:53
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2025455 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1969320 errors:2 dropped:0 overruns:0 carrier:4
          collisions:0 txqueuelen:1000
          RX bytes:1863973735 (1.7 GiB)  TX bytes:1280459205 (1.1 GiB)
          Interrupt:185 Base address:0xb800
root@A#
And on the Windows box:
The Windows screenshot shows the purpose of the netmask most clearly, though a bit of binary (and
maybe some hexadecimal) understanding is useful for more complex examples. In practice, if A
wants to talk to B, it compares its own IP address and netmask with B's IP address:
        Decimal          Binary
A       192.168.1.1      11000000 10101000 00000001 00000001
Mask    255.255.255.0    11111111 11111111 11111111 00000000
B       192.168.1.2      11000000 10101000 00000001 00000010
Result                   Network  Network  Network  Host
We need to perform a logical AND of each IP address with the Netmask, and compare the results. We do
this by looking down the columns: a "1" in the Netmask means that both IP addresses must have the same
bit in that column to be on the same network; a "0" means that the bit can differ between hosts on the
same network. Therefore, the bits under the 1s are referred to as the network address, and the bits under
the 0s as the host address. In this case, 192.168.1.0 is the (common) network address, and .1 (for A) and
.2 (for B) are the host addresses.
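The column-by-column comparison is just a bitwise AND, which is easy to try out. A minimal sketch follows; the helper names ip_to_int, network_of and same_network are invented for this example.

```python
import socket
import struct


def ip_to_int(address: str) -> int:
    """Convert dotted-decimal IPv4 text to a 32-bit integer."""
    return struct.unpack("!I", socket.inet_aton(address))[0]


def network_of(address: str, netmask: str) -> int:
    """Keep only the bits that sit under the 1s of the netmask."""
    return ip_to_int(address) & ip_to_int(netmask)


def same_network(a: str, b: str, netmask: str) -> bool:
    """True if a and b agree on every bit the netmask marks as 'network'."""
    return network_of(a, netmask) == network_of(b, netmask)


MASK = "255.255.255.0"
for name, ip in (("A", "192.168.1.1"), ("B", "192.168.1.2")):
    print(name, f"{ip_to_int(ip):032b}", f"{network_of(ip, MASK):032b}")

print(same_network("192.168.1.1", "192.168.1.2", MASK))  # A and B
print(same_network("192.168.1.1", "192.168.2.3", MASK))  # A and E
```

A and B give the same result after the AND (192.168.1.0), so they are on the same network; A and E do not.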
Please see Bases for more information about Base 2 (Binary) and Base 16 (Hexadecimal). See /xx
notation for how this makes the /xx notation make sense, but in a nutshell, the example above has 24 "1"s
in a row, so it is a /24 network.
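Counting the "1"s can be done by hand or, as a quick check, in Python. The prefix_length helper is a made-up name for this sketch, and it assumes the netmask is a normal one (all the 1s in a row from the left).

```python
import ipaddress


def prefix_length(netmask: str) -> int:
    """Count the 1 bits in a netmask to get the /xx number."""
    return bin(int(ipaddress.IPv4Address(netmask))).count("1")


print(prefix_length("255.255.255.0"))     # the example network above
print(prefix_length("255.255.255.192"))   # a smaller network
# The standard library can also do the conversion for us:
print(ipaddress.ip_network("192.168.1.0/255.255.255.0").prefixlen)
```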
This means that for A to communicate with B, it can create a simple packet, like this:
Source IP 192.168.1.1 (A)
Destination IP 192.168.1.2 (B)
Data Hello B! This is the Data
Routing (Part 2: How to find a router)
Routing, then, works at the next level. What happens when A wants to talk to E? It could broadcast an
ARP request, but E would not see the request, so it would not reply. On this scale, that might seem to be a
limitation, but should everyone really keep asking www.google.com for a physical address? It makes
sense that the physical layer stays at the network level. Beyond that, IP (Internet Protocol) takes over, so
the physical layer is not necessary.
Instead, A finds the IP address for E, via whatever method it is configured to use - /etc/hosts, DNS, LDAP,
etc. It then compares E's address with its own address and netmask:
        Decimal          Binary
A       192.168.1.1      11000000 10101000 00000001 00000001
Mask    255.255.255.0    11111111 11111111 11111111 00000000
E       192.168.2.3      11000000 10101000 00000010 00000011
Result                   Network  Network  Network  Host
All that "A" knows is that E's address does not match its own in all of the bits which the netmask says
must match (those marked "Network", not "Host"): the third number is 1 for A, but 2 for E. So it will
have to find a router on the same network as itself in order to communicate with E. There is often only
one router, configured as a default router. In this case though, we have a few routers to choose from.
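The decision "send direct, or send via a router?" can be sketched for the simple case of a single default router. Note the assumptions: the next_hop function is invented for this example, and it assumes that A's default router is the firewall at 192.168.1.3.

```python
import ipaddress


def next_hop(own_ip: str, netmask: str, dest_ip: str, default_router: str) -> str:
    """Return the IP address whose MAC address we need for the next hop."""
    local_net = ipaddress.ip_interface(f"{own_ip}/{netmask}").network
    if ipaddress.ip_address(dest_ip) in local_net:
        return dest_ip         # same network: talk to the destination itself
    return default_router      # different network: hand it to the router


A_IP, MASK, ROUTER = "192.168.1.1", "255.255.255.0", "192.168.1.3"  # router is assumed
print(next_hop(A_IP, MASK, "192.168.1.2", ROUTER))  # B is local
print(next_hop(A_IP, MASK, "192.168.2.3", ROUTER))  # E is not
```

Whichever address comes back is the one A would ARP for; the destination IP in the packet stays as B's or E's either way.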
The netstat utility shows the routes on a *nix server (Solaris in this example) like this (in the example
diagram shown, this is for "G", because it covers more detail than an example for "A" would provide):
root@G# netstat -rn
Routing Table: IPv4
  Destination           Gateway           Flags  Ref   Use   Interface
-------------------- -------------------- ----- ----- ------ ---------
192.168.1.0          192.168.1.4          U         1    487 hme0
224.0.0.0            192.168.1.4          U         1      0 hme0
192.168.2.64         192.168.2.65         U         1    132 hme1
default              192.168.1.3          UG        1    523
127.0.0.1            127.0.0.1            UH        1     14 lo0
root@G#
This server is configured as 192.168.1.4 and 192.168.2.65, so it is on two different networks, via NICs
hme0 and hme1 respectively. The first line tells it that to get to the 192.168.1.0 network, it can go direct
via 192.168.1.4 (itself) on the hme0 interface. For this, it will need the MAC address of the server it wants
to talk to (A, B or the firewall); if it's not in the ARP table, it will have to ask for it as discussed above.
The second line is the multicast address. You can safely ignore that for now :-)
The third line tells it that to get to the 192.168.2.64 network, it can go via (its own) 192.168.2.65 interface
on hme1.
The fourth line tells it that the default router is at 192.168.1.3. If it needs to get to 192.168.2.0/26 (or any
other network not listed), it needs to go via that router. The packet may not get there, but it certainly
won't via any of the other routes. The default router is the "last resort"; the other, explicit, routes are for
specific networks. The default router is usually connected to lots of networks, either directly or
indirectly. The useful thing about this is that G does not need to be explicitly told about each remote
network; if it needs to communicate with one, it can simply send a packet to its default router. If you type
ping 192.168.3.29 then G will send a packet to the default router, just in case there is a device at
192.168.3.29. "G" doesn't need to know if there is, or what its netmask is. It just sends the packet to the
router, which deals with the request. In this case, a packet for 192.168.2.0/26 would get passed on, whilst
a packet for 192.168.3.29 would simply get no response. The router, if it can access 192.168.3.x, can sort
out the netmask issues on G's behalf.
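G's choice of route can be imitated with a small lookup that picks the most specific matching entry. This is a sketch, not what Solaris actually runs. The netmasks are assumptions, since netstat -rn does not print them: /24 for 192.168.1.0 and /26 for 192.168.2.64. The multicast and loopback entries are left out, and the ROUTES list and lookup function are invented for this example.

```python
import ipaddress

# (destination network, gateway); None means "directly attached".
ROUTES = [
    ("192.168.1.0/24", None),          # via hme0 (mask assumed)
    ("192.168.2.64/26", None),         # via hme1 (mask assumed)
    ("0.0.0.0/0", "192.168.1.3"),      # default route
]


def lookup(dest_ip: str, routes=ROUTES) -> str:
    """Return the next-hop IP for dest_ip, preferring the longest prefix."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(net), gateway)
        for net, gateway in routes
        if dest in ipaddress.ip_network(net)
    ]
    # 0.0.0.0/0 matches everything, so there is always at least one match.
    network, gateway = max(matches, key=lambda route: route[0].prefixlen)
    return gateway if gateway is not None else dest_ip


for target in ("192.168.1.2", "192.168.2.70", "192.168.2.4", "192.168.3.29"):
    print(target, "->", lookup(target))
```

Addresses on G's own two networks are reached directly; everything else, including F at 192.168.2.4 and the possibly non-existent 192.168.3.29, is handed to the default router.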
The final line deals with "localhost", a special address (127.0.0.1) which on any machine will point back
to itself. This is useful for debugging, as well as for non-networked machines which need a network stack.
A cruel joke is to tell a newbie to try hacking 127.0.0.1, or to tell them that 127.0.0.1 is an FTP site with a
copy of their hard disk, etc. In fact, the entire 127.0.0.0/8 (that is, 127.x.x.x) is reserved for
loopback. It's just very rare to need more than one loopback address, so the popular one is 127.0.0.1.
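Python's standard ipaddress module knows about the loopback range, which makes for a quick check of the claim above (a small illustration, nothing more):

```python
import ipaddress

loopback = ipaddress.ip_network("127.0.0.0/8")

print(ipaddress.ip_address("127.0.0.1") in loopback)    # the popular one
print(ipaddress.ip_address("127.45.6.7").is_loopback)   # any 127.x.x.x
print(ipaddress.ip_address("192.168.1.1").is_loopback)  # an ordinary address
print(f"{loopback.num_addresses:,}")                    # size of the /8
```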
As for the other fields reported by netstat, Flag "U" means the route is Up, "UG" means "Up and a route to
a Gateway (which may pass the packet on)"; "UH" means "Up and a route to a Host (which won't)".
CIDR (Classless Inter-Domain Routing)
From this information, the Operating System can determine the most useful router to choose for a
particular destination. On Solaris, the /etc/netmasks file tells the OS about particular netmasks for
given networks; otherwise, the old, pre-CIDR standard is followed, whereby the IP address itself suggests
its netmask:
From         To                Class / Comments   Netmask
0.0.0.0      9.255.255.255     Class A            255.0.0.0
10.0.0.0     10.255.255.255    Class A Private    255.0.0.0
11.0.0.0     126.255.255.255   Class A            255.0.0.0
127.0.0.0    127.255.255.255   Loopback           255.0.0.0
128.0.0.0    172.15.255.255    Class B            255.255.0.0
172.16.0.0   172.31.255.255    Class B Private    255.255.0.0
172.32.0.0   191.255.255.255   Class B            255.255.0.0
192.0.0.0    192.167.255.255   Class C            255.255.255.0
192.168.0.0  192.168.255.255   Class C Private    255.255.255.0
192.169.0.0  223.255.255.255   Class C            255.255.255.0
224.0.0.0    239.255.255.255   Multicast          ...
240.0.0.0    255.255.255.255   Reserved           ...
You can see that each class (A, B, C) has a "Private" segment in the middle, which is not routed on the
public internet. Other than that, their netmasks are 255.0.0.0, 255.255.0.0 and 255.255.255.0 (ff000000,
ffff0000, ffffff00 respectively, in Hex). That turned out to be a little too simplistic as internet usage grew,
so we now have Classless Inter-Domain Routing (CIDR), which forgets about classes, and just says that a
network can have any netmask. The closer you get to such a network, the more likely you are to need to
know about how it is configured (hence /etc/netmasks, and CIDR in DNS, etc).
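The old, pre-CIDR rule from the table above boils down to looking at the first number of the address. A minimal sketch, with classful_netmask being a made-up name for this illustration:

```python
def classful_netmask(address: str) -> str:
    """Guess the netmask from the address alone, as pre-CIDR systems did."""
    first = int(address.split(".")[0])
    if first < 128:       # 0-127: Class A (and loopback)
        return "255.0.0.0"
    if first < 192:       # 128-191: Class B
        return "255.255.0.0"
    if first < 224:       # 192-223: Class C
        return "255.255.255.0"
    raise ValueError(f"{address} is multicast or reserved; no class netmask")


for ip in ("10.1.2.3", "172.16.5.5", "192.168.1.1"):
    print(ip, classful_netmask(ip))
```

With CIDR, this guess is only a fallback: a network such as 192.168.2.64/26 has a netmask that the address alone cannot tell you.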