The TCP/IP Stack in the Linux Kernel
Divye Kapoor
Pracheer Agarwal
Swagat Konchada
• The Virtual File System (VFS) is the software layer in the kernel that provides a
uniform filesystem interface to userspace programs.
• It provides an abstraction within the kernel that allows userspace to work
transparently with a variety of filesystems.
• Thus it allows many different filesystem
implementations to coexist freely.
• Each socket is implemented as a "file" mounted on
the sockfs filesystem.
• file->private_data points to the socket information.
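The "socket is a file" idea can be observed from userspace. The following is a minimal sketch, assuming a Linux or other POSIX system; the function name socket_is_a_file() is made up for this example. It creates a TCP socket and asks fstat() what kind of file the descriptor refers to.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes S_ISSOCK() under strict -std= modes */
#endif
#include <assert.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a new TCP socket is reported as a
 * socket-type file by fstat(), 0 if it is not, and -1 on any error. */
static int socket_is_a_file(void)
{
    struct stat st;
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* ends up in sys_socket() */
    int rc;

    if (fd < 0)
        return -1;
    rc = fstat(fd, &st);                        /* generic file call on a socket */
    close(fd);                                  /* generic file call on a socket */
    if (rc < 0)
        return -1;
    return S_ISSOCK(st.st_mode) ? 1 : 0;
}
```

Because the descriptor is an ordinary file descriptor, calls such as read(), write(), and close() work on it without knowing that a socket sits behind it.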
• Inodes provide a method to access the actual
data blocks allocated to a file. For sockets, they
provide buffer space which can be used to hold
socket-specific data.
• struct inode
• Every file is represented in the kernel as an
object of the file structure. Each file object must be
associated with an inode.
• struct file
struct operations {
int (*read)(int, char *, int);
void (*destroy_inode)(struct inode *);
void (*dirty_inode)(struct inode *);
int (*write_inode)(struct inode *, int);
void (*drop_inode)(struct inode *);
void (*delete_inode)(struct inode *);
};
sizeof(struct operations) = 6 * sizeof(function pointer)
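A table of function pointers like the one above is how the kernel lets generic code call into a specific implementation. The sketch below is a userspace illustration only: struct demo_ops, the zero_* and null_* functions, and demo_read() are all invented names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operations table: one function pointer per operation. */
struct demo_ops {
    int (*read)(char *buf, int len);
    int (*write)(const char *buf, int len);
};

/* Two unrelated "implementations" that share the same interface. */
static int zero_read(char *buf, int len)        { memset(buf, 0, (size_t)len); return len; }
static int zero_write(const char *buf, int len) { (void)buf; return len; }
static int null_read(char *buf, int len)        { (void)buf; (void)len; return 0; }
static int null_write(const char *buf, int len) { (void)buf; return len; }

static const struct demo_ops zero_ops = { zero_read, zero_write };
static const struct demo_ops null_ops = { null_read, null_write };

/* Generic code sees only the table, never the concrete implementation. */
static int demo_read(const struct demo_ops *ops, char *buf, int len)
{
    return ops->read(buf, len);
}
```

Calling demo_read() with &zero_ops or &null_ops runs different code through the same call site, which is the property the VFS relies on.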
Divye Kapoor
User Space
socket, bind, listen, connect, send, recv, write, read, etc.
Socket Functions (Kernel)
sys_socket, sys_bind, sys_listen, sys_connect, etc. in net/socket.c
TCP/IP Layer Functions
inet_create, tcp_v4_connect, tcp_sendmsg, tcp_recvmsg
Ethernet Device Layer
dev_hard_start_xmit
sys_socket()
  • sock_create()
    Allocate a socket object (internally an inode associated with a file object).
    Locate the requested family and call the create function for that family.
    • inet_create()
      Lower-layer initialization.
  • sock_map_fd()
    • sock_alloc_fd()
      Allocate a file descriptor.
    • sock_attach_fd()
    • fd_install()
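The "locate the family and call its create function" step can be sketched as a table lookup followed by an indirect call. This is a simplified userspace model, not the kernel code: every demo_* name and the table size are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NPROTO  4   /* hypothetical table size */
#define DEMO_PF_INET 2   /* mirrors the usual value of PF_INET */

struct demo_socket { int family; int type; int initialised; };

struct demo_net_family {
    int family;
    int (*create)(struct demo_socket *sock, int protocol);
};

/* Stand-in for inet_create(): the family-specific lower-layer setup. */
static int demo_inet_create(struct demo_socket *sock, int protocol)
{
    (void)protocol;
    sock->initialised = 1;
    return 0;
}

static const struct demo_net_family demo_inet_family = { DEMO_PF_INET, demo_inet_create };

/* Registered families indexed by family number; NULL means "not registered". */
static const struct demo_net_family *demo_families[DEMO_NPROTO] = {
    [DEMO_PF_INET] = &demo_inet_family,
};

/* Stand-in for sock_create(): find the family, then call its create hook. */
static int demo_sock_create(int family, int type, int protocol, struct demo_socket *sock)
{
    if (family < 0 || family >= DEMO_NPROTO || demo_families[family] == NULL)
        return -1;   /* unknown family */
    sock->family = family;
    sock->type = type;
    sock->initialised = 0;
    return demo_families[family]->create(sock, protocol);
}
```

The socket layer itself stays protocol-agnostic; adding a new family only means registering another entry in the table.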
sys_connect()
  • sockfd_lookup_light()
    Returns the socket object associated with the given fd.
  • move_addr_to_kernel()
    Copies the userspace sockaddr * into kernel space.
  • sock->ops->connect()
    Lower-layer call.
    • tcp_v4_connect()
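The three steps of the connect path (look up the socket for the fd, copy the address, dispatch through the ops table) can be modelled in userspace as follows. This is a sketch under stated assumptions: the fd table, the address-length limit, and all demo_* names are invented, and memcpy() stands in for the kernel's copy from user memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_FD   8
#define DEMO_MAX_ADDR 128   /* hypothetical cap on the address length */

struct demo_sock;
struct demo_proto_ops {
    int (*connect)(struct demo_sock *sock, const void *addr, int addrlen);
};
struct demo_sock { const struct demo_proto_ops *ops; int connected; };

static struct demo_sock *demo_fd_table[DEMO_MAX_FD];

/* Step 1: map a file descriptor to its socket object. */
static struct demo_sock *demo_sockfd_lookup(int fd)
{
    if (fd < 0 || fd >= DEMO_MAX_FD)
        return NULL;
    return demo_fd_table[fd];
}

/* Step 2: validate the length and copy the caller's address. */
static int demo_move_addr(const void *uaddr, int ulen, void *kaddr)
{
    if (ulen < 0 || ulen > DEMO_MAX_ADDR)
        return -1;
    memcpy(kaddr, uaddr, (size_t)ulen);
    return 0;
}

/* Stand-in for the protocol-specific connect routine. */
static int demo_tcp_connect(struct demo_sock *sock, const void *addr, int addrlen)
{
    (void)addr; (void)addrlen;
    sock->connected = 1;
    return 0;
}

static const struct demo_proto_ops demo_tcp_ops = { demo_tcp_connect };

/* Step 3: dispatch to the lower layer through the ops table. */
static int demo_sys_connect(int fd, const void *uaddr, int addrlen)
{
    char kaddr[DEMO_MAX_ADDR];
    struct demo_sock *sock = demo_sockfd_lookup(fd);

    if (sock == NULL)
        return -1;
    if (demo_move_addr(uaddr, addrlen, kaddr) < 0)
        return -1;
    return sock->ops->connect(sock, kaddr, addrlen);
}
```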
Socket layer functions
are elided.
struct sk_buff is defined in <include/linux/skbuff.h>.
• It is used by every network layer (except the physical layer).
• The fields of the structure change as it is passed from one layer to another,
i.e., the fields are layer dependent.
struct sk_buff {
... ... ...
#ifdef CONFIG_NET_SCHED
__u32 tc_index;
#ifdef CONFIG_NET_CLS_ACT
__u32 tc_verd;
__u32 tc_classid;
#endif
#endif
};
sk_buff is peppered with C preprocessor #ifdef directives.
The CONFIG_NET_SCHED symbol must be defined at compile time for the
structure to contain the tc_index member.
Such symbols are enabled by an administrator with some version of make config.
• The kernel maintains all sk_buff structures in a doubly linked list.
struct sk_buff_head { /* only the head of the list */
/* These two members must be first. */
struct sk_buff *next;
struct sk_buff *prev;
__u32 qlen; /* number of buffers in the list */
spinlock_t lock; /* protects concurrent access to the sk_buff list */
};
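A circular doubly linked list with a dedicated head can be sketched in userspace as below. The demo_* names are hypothetical and the spinlock is left out, so this is only an illustration of the pointer handling, not of the kernel's locking. Because next and prev come first in both structures, the head can serve as the list's sentinel node; this sketch only compares against the cast head pointer and never dereferences it.

```c
#include <assert.h>
#include <stddef.h>

struct demo_skb {
    struct demo_skb *next, *prev;   /* must be first, as in the list head */
    int id;
};

struct demo_skb_head {
    struct demo_skb *next, *prev;   /* same layout as the start of demo_skb */
    unsigned int qlen;
};

/* An empty list points back at its own head. */
static void demo_queue_init(struct demo_skb_head *h)
{
    h->next = h->prev = (struct demo_skb *)h;
    h->qlen = 0;
}

/* Append a buffer at the tail of the list. */
static void demo_queue_tail(struct demo_skb_head *h, struct demo_skb *skb)
{
    struct demo_skb *sentinel = (struct demo_skb *)h;
    struct demo_skb *last = h->prev;

    skb->next = sentinel;
    skb->prev = last;
    if (last == sentinel)
        h->next = skb;      /* list was empty */
    else
        last->next = skb;
    h->prev = skb;
    h->qlen++;
}

/* Remove and return the first buffer, or NULL if the list is empty. */
static struct demo_skb *demo_dequeue(struct demo_skb_head *h)
{
    struct demo_skb *sentinel = (struct demo_skb *)h;
    struct demo_skb *first = h->next;

    if (first == sentinel)
        return NULL;
    if (first->next == sentinel) {
        h->next = h->prev = sentinel;   /* list becomes empty */
    } else {
        h->next = first->next;
        first->next->prev = sentinel;
    }
    first->next = first->prev = NULL;
    h->qlen--;
    return first;
}
```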
• Layout
• General
• Feature-specific
• Management functions
• struct sock *sk
  The sock data structure of the socket that owns this buffer.
• unsigned int len
  Includes both the data in the main buffer (i.e., the one pointed to by head)
  and the data in the fragments.
• unsigned int data_len
  Unlike len, data_len accounts only for the size of the data in the fragments.
• unsigned int truesize
  skb->truesize = size + sizeof(struct sk_buff);
• atomic_t users
  Reference count, or the number of entities using this sk_buff buffer.
  It is updated with atomic_inc() and atomic_dec().
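The relationship between len, data_len, and truesize is simple arithmetic. A small worked sketch follows; struct demo_skb_sizes and the demo_* helpers are hypothetical, with demo_headlen() mirroring what the kernel's skb_headlen() computes.

```c
#include <assert.h>

struct demo_skb_sizes {
    unsigned int len;       /* main-buffer data + fragment data */
    unsigned int data_len;  /* fragment data only */
};

/* Bytes held in the main buffer: the total minus the fragment part. */
static unsigned int demo_headlen(const struct demo_skb_sizes *s)
{
    return s->len - s->data_len;
}

/* A buffer is "nonlinear" when some of its data lives in fragments. */
static int demo_is_nonlinear(const struct demo_skb_sizes *s)
{
    return s->data_len != 0;
}

/* truesize = data area size + size of the descriptor itself. */
static unsigned int demo_truesize(unsigned int size, unsigned int sizeof_skb)
{
    return size + sizeof_skb;
}
```

For example, a buffer with len = 1500 and data_len = 1000 holds 500 bytes in the main buffer and 1000 bytes in fragments.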
• unsigned char *head
• sk_buff_data_t end
• unsigned char *data
• sk_buff_data_t tail
• struct net_device *dev
  Represents the interface on which the packet was received, or the device (or
  interface) through which it is to be transmitted.
  When several devices are grouped, it usually points to the net_device
  structure of the virtual device that represents the group.
• Pointers to protocol headers:
  • sk_buff_data_t transport_header;
  • sk_buff_data_t network_header;
  • sk_buff_data_t mac_header;
The data is updated through the *_header pointers.
• char cb[40]
  This is a "control buffer," or storage for private information, maintained
  by each layer for internal use.
struct tcp_skb_cb {
... ... ...
__u32 seq; /* Starting sequence number */
__u32 end_seq; /* SEQ + FIN + SYN + datalen */
__u32 when; /* used to compute RTTs */
__u8 flags; /* TCP header flags */
... ... ...
};
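The control buffer is just raw bytes that each layer interprets with its own structure. The sketch below shows the idea in portable userspace C: struct demo_skb, struct demo_tcp_cb, and the helpers are hypothetical. The kernel reaches its private data by casting cb through the TCP_SKB_CB() macro; this sketch copies with memcpy() instead so that it stays within standard C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CB_SIZE 40

struct demo_skb { char cb[DEMO_CB_SIZE]; };

/* Hypothetical per-layer private data, modelled on tcp_skb_cb. */
struct demo_tcp_cb {
    uint32_t seq;      /* starting sequence number */
    uint32_t end_seq;  /* seq + SYN + FIN + datalen */
    uint32_t when;
    uint8_t  flags;
};

/* The private structure must fit inside the control buffer. */
_Static_assert(sizeof(struct demo_tcp_cb) <= DEMO_CB_SIZE, "cb too small");

/* Fill in the TCP view of the control buffer. */
static void demo_cb_set(struct demo_skb *skb, uint32_t seq, int syn, int fin,
                        uint32_t datalen)
{
    struct demo_tcp_cb cb = {
        .seq = seq,
        /* SYN and FIN each consume one sequence number. */
        .end_seq = seq + (uint32_t)(syn != 0) + (uint32_t)(fin != 0) + datalen,
        .when = 0,
        .flags = 0,
    };
    memcpy(skb->cb, &cb, sizeof cb);
}

/* Read the TCP view back out of the control buffer. */
static struct demo_tcp_cb demo_cb_get(const struct demo_skb *skb)
{
    struct demo_tcp_cb cb;
    memcpy(&cb, skb->cb, sizeof cb);
    return cb;
}
```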
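Defined in <include/linux/skbuff.h> and <net/core/skbuff.c>
skb_put(struct sk_buff *skb, unsigned int len): append len bytes at the tail
skb_push(struct sk_buff *skb, unsigned int len): prepend len bytes at the head
skb_pull(struct sk_buff *skb, unsigned int len): remove len bytes from the head
skb_reserve(struct sk_buff *skb, int len): reserve len bytes of headroom
skb_put() returns a pointer to the start of the newly added area (the old tail);
skb_push() and skb_pull() return the updated data pointer; skb_reserve() returns nothing.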
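How these four functions move the head, data, tail, and end pointers can be sketched with a small userspace model. Everything named demo_* is hypothetical, the pointers are plain unsigned char * rather than sk_buff_data_t, and the bounds checks use assert() where the kernel would panic. In this model, as in the kernel, put returns the old tail, push and pull return the new data pointer, and reserve returns nothing.

```c
#include <assert.h>
#include <stddef.h>

struct demo_skb {
    unsigned char *head, *data, *tail, *end;
    unsigned int len;
};

/* Start with an empty buffer: data and tail both sit at head. */
static void demo_skb_init(struct demo_skb *skb, unsigned char *buf, size_t size)
{
    skb->head = skb->data = skb->tail = buf;
    skb->end = buf + size;
    skb->len = 0;
}

/* Reserve headroom on an empty buffer by shifting data and tail forward. */
static void demo_skb_reserve(struct demo_skb *skb, unsigned int len)
{
    assert((size_t)(skb->end - skb->tail) >= len);
    skb->data += len;
    skb->tail += len;
}

/* Append len bytes at the tail; returns the start of the new area. */
static unsigned char *demo_skb_put(struct demo_skb *skb, unsigned int len)
{
    unsigned char *old_tail = skb->tail;
    assert((size_t)(skb->end - skb->tail) >= len);
    skb->tail += len;
    skb->len += len;
    return old_tail;
}

/* Prepend len bytes (e.g. a header) into the headroom. */
static unsigned char *demo_skb_push(struct demo_skb *skb, unsigned int len)
{
    assert((size_t)(skb->data - skb->head) >= len);
    skb->data -= len;
    skb->len += len;
    return skb->data;
}

/* Strip len bytes (e.g. a parsed header) from the front. */
static unsigned char *demo_skb_pull(struct demo_skb *skb, unsigned int len)
{
    assert(len <= skb->len);
    skb->data += len;
    skb->len -= len;
    return skb->data;
}
```

A typical transmit sequence reserves headroom first, puts the payload, and then lets each lower layer push its own header in front of the data.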
defined in <net/core/skbuff.c>
struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
int fclone, int node)
…
size = SKB_DATA_ALIGN(size);
data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
…
struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
unsigned int length, gfp_t gfp_mask)
• This is the buffer allocation function meant for use by device drivers.
• It is executed in interrupt mode.
Freeing memory: kfree_skb() and dev_kfree_skb()
• They release the buffer back to the buffer pool.
• The buffer is released only when the skb->users counter is 1. Otherwise, the
counter is only decremented.
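The "free only when the last user lets go" rule can be sketched with C11 atomics. This is a userspace illustration with invented demo_* names; malloc() and free() stand in for the kernel's buffer pool.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct demo_skb {
    atomic_int users;        /* reference count */
    unsigned char *data;
};

/* Allocate a buffer with a single user: the caller. */
static struct demo_skb *demo_alloc_skb(size_t size)
{
    struct demo_skb *skb = malloc(sizeof *skb);
    if (skb == NULL)
        return NULL;
    skb->data = malloc(size);
    if (skb->data == NULL) {
        free(skb);
        return NULL;
    }
    atomic_init(&skb->users, 1);
    return skb;
}

/* Take an extra reference. */
static void demo_skb_get(struct demo_skb *skb)
{
    atomic_fetch_add(&skb->users, 1);
}

/* Drop a reference. Returns 1 if the buffer was really freed,
 * 0 if other users remain and only the counter was decremented. */
static int demo_kfree_skb(struct demo_skb *skb)
{
    if (atomic_fetch_sub(&skb->users, 1) != 1)
        return 0;            /* previous value was > 1: still in use */
    free(skb->data);
    free(skb);
    return 1;
}
```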
• struct net_device is defined in <include/linux/netdevice.h>.
• It stores all information specifically regarding a network device.
• There is one such structure for each device, both real ones (such as Ethernet
NICs) and virtual ones.
• Network devices can be classified into types such as Ethernet cards and
Token Ring cards.
• Each type may come in several models.
• Model-specific parameters are initialized by the device driver software.
• Parameters common to different models are initialized by the kernel.
struct net_device {
char name[IFNAMSIZ]; /* interface name, e.g. eth0 */
int ifindex;
/* device name hash chain */
struct hlist_node name_hlist;
unsigned long mem_end; /* shared mem end */
unsigned long mem_start; /* shared mem start */
unsigned long base_addr; /* device I/O address */
unsigned int irq; /* device IRQ number */
unsigned char if_port; /* Selectable AUI, TP, ... */
unsigned char dma; /* DMA channel */
unsigned short flags; /* interface flags (a la BSD) */
/* e.g. IFF_UP | IFF_RUNNING | IFF_MULTICAST */
…
};
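The interface flags are individual bits, so they are combined with bitwise OR (|) and tested with bitwise AND (&); a logical OR (||) of the constants would simply evaluate to 1. A minimal sketch, assuming a Linux system where <net/if.h> provides the IFF_* constants; demo_dev_is_usable() is a made-up helper.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc exposes IFF_* only with this feature macro */
#endif
#include <assert.h>
#include <sys/socket.h>
#include <net/if.h>

/* Hypothetical helper: a device is usable when it is both
 * administratively up (IFF_UP) and operational (IFF_RUNNING). */
static int demo_dev_is_usable(unsigned short flags)
{
    const unsigned short need = IFF_UP | IFF_RUNNING;
    return (flags & need) == need;
}

/* Test a single capability bit. */
static int demo_dev_is_multicast(unsigned short flags)
{
    return (flags & IFF_MULTICAST) != 0;
}
```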
struct net_device {
…
unsigned mtu; /* interface MTU value */
unsigned short type; /* interface hardware type */
unsigned short hard_header_len; /* hardware hdr length */
unsigned char dev_addr[MAX_ADDR_LEN]; /* hardware address */
unsigned char addr_len; /* hardware address length */
unsigned char broadcast[MAX_ADDR_LEN]; /* hardware broadcast address */
unsigned int promiscuity; /* promiscuous mode counter */
…
};
struct net_device {
…
struct net_device *next;
struct hlist_node name_hlist;
struct hlist_node index_hlist;
…
};
The packet is not processed in the interrupt handler itself.
netif_rx() raises the network receive softirq (NET_RX_SOFTIRQ).
net_rx_action() is then called and starts processing the packet.
• Processing of the packet starts with the protocol switching section.
netif_receive_skb() is called to process the packet and find out the next protocol layer.
The protocol family of the packet is extracted from the link layer header.
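Protocol switching amounts to reading the type field of the link layer header and calling the handler registered for it. The sketch below models this for Ethernet in userspace. The handler table and every demo_* name are assumptions made for the example; the EtherType values 0x0800 (IPv4) and 0x0806 (ARP) are the standard ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN  14       /* dst MAC (6) + src MAC (6) + type (2) */
#define DEMO_ETH_P_IP  0x0800
#define DEMO_ETH_P_ARP 0x0806

typedef int (*demo_rx_handler)(const uint8_t *payload, size_t len);

/* Stand-ins for the real handlers; each returns its own protocol number
 * so the caller can tell which one ran. */
static int demo_ip_rcv(const uint8_t *p, size_t len)  { (void)p; (void)len; return DEMO_ETH_P_IP; }
static int demo_arp_rcv(const uint8_t *p, size_t len) { (void)p; (void)len; return DEMO_ETH_P_ARP; }

struct demo_packet_type { uint16_t type; demo_rx_handler func; };

static const struct demo_packet_type demo_ptypes[] = {
    { DEMO_ETH_P_IP,  demo_ip_rcv  },
    { DEMO_ETH_P_ARP, demo_arp_rcv },
};

/* Read the EtherType (big-endian, bytes 12 and 13) and dispatch. */
static int demo_netif_receive(const uint8_t *frame, size_t len)
{
    uint16_t type;
    size_t i;

    if (len < DEMO_ETH_HLEN)
        return -1;                       /* truncated frame: drop */
    type = (uint16_t)((frame[12] << 8) | frame[13]);
    for (i = 0; i < sizeof demo_ptypes / sizeof demo_ptypes[0]; i++)
        if (demo_ptypes[i].type == type)
            return demo_ptypes[i].func(frame + DEMO_ETH_HLEN, len - DEMO_ETH_HLEN);
    return -1;                           /* no handler registered: drop */
}
```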
ip_rcv() is the entry point for IP packet processing.
It checks whether the packet is destined for some other host (packet type PACKET_OTHERHOST) and drops it if so.
It verifies the header checksum of the packet by calling ip_fast_csum().
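The IP header checksum is the 16-bit one's-complement sum described in RFC 1071. The sketch below is a plain, unoptimized userspace version with hypothetical demo_* names; it takes a length in bytes, whereas ip_fast_csum() takes the header length in 32-bit words. The sample header is a commonly used 20-byte example whose checksum field is 0xb861.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over `len` bytes (len is assumed to be even,
 * which always holds for an IP header). */
static uint16_t demo_ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);   /* 16-bit big-endian words */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);              /* fold the carries back in */
    return (uint16_t)~sum;
}

/* A received header is valid when the checksum computed over the whole
 * header, stored checksum field included, comes out as zero. */
static int demo_ip_header_ok(const uint8_t *hdr, size_t len)
{
    return demo_ip_checksum(hdr, len) == 0;
}

/* Sample IPv4 header: 192.168.0.1 -> 192.168.0.199, protocol 17 (UDP). */
static const uint8_t demo_hdr[20] = {
    0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00,
    0x40, 0x11, 0xb8, 0x61, 0xc0, 0xa8, 0x00, 0x01,
    0xc0, 0xa8, 0x00, 0xc7,
};
```

The sender computes the same sum with the checksum field set to zero and stores the result, which is why the receiver's sum over the complete header folds to zero.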
ip_route_input() is called; this routine looks the packet up in the kernel routing cache, rt_hash_table.
If the packet needs to be forwarded, the input routine is ip_forward();
otherwise it is ip_local_deliver().
ip_send() is called to check whether the packet needs to be fragmented.
If so, the packet is fragmented by calling ip_fragment().
Packet output path: ip_finish_output()
ip_local_deliver(): packets that need to be delivered locally
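The fragmentation decision is a comparison of the packet length with the MTU of the outgoing device, and the number of fragments follows from the fact that fragment offsets are expressed in units of 8 bytes. A worked sketch with a hypothetical helper, demo_fragments_needed(); it assumes every fragment carries a header of the same size.

```c
#include <assert.h>

/* Returns how many packets are needed to send an IP datagram of
 * `total_len` bytes (header included) over a link with the given MTU.
 * 1 means no fragmentation; 0 means the MTU cannot carry any payload. */
static unsigned int demo_fragments_needed(unsigned int total_len,
                                          unsigned int hdr_len,
                                          unsigned int mtu)
{
    unsigned int per_frag, payload;

    if (total_len <= mtu)
        return 1;                          /* fits: send as is */
    if (mtu <= hdr_len)
        return 0;
    per_frag = (mtu - hdr_len) & ~7u;      /* payload per fragment, multiple of 8 */
    if (per_frag == 0)
        return 0;
    payload = total_len - hdr_len;
    return (payload + per_frag - 1) / per_frag;   /* round up */
}
```

For a 4000-byte datagram with a 20-byte header and an MTU of 1500, each fragment carries up to 1480 bytes of payload, so the 3980 payload bytes need three fragments.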
ip_defrag() reassembles the packet if it arrived in fragments.
The protocol identifier is read from skb->nh.iph->protocol (the protocol field of the IP header).
For TCP, the receive handler is tcp_v4_rcv() (the entry point for the TCP layer).
__tcp_v4_lookup(): finds the socket to which the packet belongs.
Established sockets are maintained in the hash table tcp_ehash.
If no established socket is found, the packet may be a new connection request for a listening socket.
Search for a listening socket: tcp_v4_lookup_listener()
tcp_rcv_established() processes segments for a socket in the established state.
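The two-stage lookup (exact match among established sockets, then fall back to a listener on the destination port) can be sketched as follows. This is a simplified userspace model: the hash function, the table size, the flat listener list, and all demo_* names are assumptions, and addresses are plain 32-bit integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_EHASH_SIZE 16   /* hypothetical; must be a power of two */

struct demo_sock {
    uint32_t laddr, raddr;   /* local and remote address */
    uint16_t lport, rport;   /* local and remote port */
    struct demo_sock *next;  /* hash chain / listener list link */
};

static struct demo_sock *demo_ehash[DEMO_EHASH_SIZE];  /* established sockets */
static struct demo_sock *demo_listeners;               /* listening sockets */

/* Toy hash over the connection 4-tuple. */
static unsigned int demo_hashfn(uint32_t laddr, uint16_t lport,
                                uint32_t raddr, uint16_t rport)
{
    uint32_t h = laddr ^ raddr ^ (((uint32_t)lport << 16) | rport);
    h ^= h >> 16;
    h ^= h >> 8;
    return h & (DEMO_EHASH_SIZE - 1);
}

/* Insert an established socket at the head of its hash chain. */
static void demo_hash_established(struct demo_sock *sk)
{
    unsigned int b = demo_hashfn(sk->laddr, sk->lport, sk->raddr, sk->rport);
    sk->next = demo_ehash[b];
    demo_ehash[b] = sk;
}

/* Stage 1: exact 4-tuple match. Stage 2: listener on the local port. */
static struct demo_sock *demo_lookup(uint32_t laddr, uint16_t lport,
                                     uint32_t raddr, uint16_t rport)
{
    struct demo_sock *sk;

    for (sk = demo_ehash[demo_hashfn(laddr, lport, raddr, rport)]; sk; sk = sk->next)
        if (sk->laddr == laddr && sk->lport == lport &&
            sk->raddr == raddr && sk->rport == rport)
            return sk;
    for (sk = demo_listeners; sk; sk = sk->next)
        if (sk->lport == lport)
            return sk;                  /* new connection request */
    return NULL;                        /* no socket */
}
```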
The application reads the data from the receive queue when it issues recv().
The kernel routine that reads data from a TCP socket is tcp_recvmsg().

Editor's Notes

  • #50 to #53: Req_irq and free_irq