
NSX vSphere Troubleshooting Guide

The document provides instructions and commands for troubleshooting NSX networking issues from the vSphere Web Client UI and ESXi host command line. It describes checking the status of controllers and host preparation from the UI, and provides many NSX CLI commands for investigating issues on the controllers and compute nodes, including commands to check VIB installation, VXLAN interfaces, logical switches, logical routers, bridging tables and more.

Uploaded by

tdewanjee

NSX vSphere Web UI

Before jumping into the marvelous world of command lines, we'll start by checking the
state of the environment from the vSphere Web Client UI.
Log in to your Web Client and click on Network & Security > Installation >
Management.

You should see a green status for your three controllers.


Next click on Network & Security > Installation > Host Preparation and open up
each cluster.

All the nodes should also be green.


Now click on Network & Security > Installation > Logical Network
Preparation and open up each cluster.

Each compute node should have a Virtual Tunnel Endpoint (VTEP) vmkernel interface
(vmk3 here) with an IP address assigned to it.
Don't worry if you get any errors; this article is meant to help you troubleshoot the root
cause.

Transport Network
If VXLAN connectivity isn't operational, meaning a VM on a VXLAN cannot ping another
one on the same logical switch, the most common cause is a misconfiguration of the
transport network. To check that, SSH to a compute node and type:
ping ++netstack=vxlan -d -s 1572 -I vmk3 1.2.3.4
++netstack=vxlan instructs the ESXi host to use the VXLAN TCP/IP stack
-d sets the Don't Fragment bit on the IPv4 packet
-s 1572 sets the ICMP payload size to 1572 bytes (1600 bytes once the 8-byte ICMP and 20-byte IPv4 headers are added) to check that the MTU is correctly set to 1600
-I VXLAN vmkernel interface name
1.2.3.4 IP address of the destination ESXi host (its VTEP address on the transport network)
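The 1572 value follows from simple header arithmetic. The sketch below is only an illustration of that calculation; the function name is made up for this example, and it assumes an IPv4 header without options.

```python
# Header sizes for an ICMP echo request carried in IPv4 (no IP options assumed).
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def icmp_payload_for_mtu(mtu: int) -> int:
    """Return the largest ping payload (-s value) that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


print(icmp_payload_for_mtu(1600))  # payload to use when the transport MTU is 1600
print(icmp_payload_for_mtu(1500))  # payload for a standard 1500-byte MTU
```

With -d set, a payload larger than this value cannot leave the interface in a single packet, which is why the ping is a quick MTU check.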

If the ping fails, launch another one without the don't-fragment and size arguments:
ping ++netstack=vxlan -I vmk3 1.2.3.4

If this one succeeds, your MTU isn't correctly set to at least 1600 on your transport
network.
If both fail, it's a VLAN ID or uplink misconfiguration. Before going any further you have to
make sure that these pings work.
If both succeed but you still don't have connectivity on the virtual wire, I'll show you, in the
Compute Nodes > Controller connectivity section, how to investigate that using net-vdl2 -l.
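The decision logic of these two pings can be summarised as a small function. This is a sketch of the reasoning in the text only; the function name and the wording of the returned messages are illustrative.

```python
def diagnose_transport(df_ping_ok: bool, plain_ping_ok: bool) -> str:
    """Map the results of the two pings to the likely cause.

    df_ping_ok:    result of the ping with -d -s 1572
    plain_ping_ok: result of the ping without -d and -s
    """
    if df_ping_ok:
        # Large unfragmented packets get through: the transport network is fine.
        return "transport OK, check controller connectivity with net-vdl2 -l"
    if plain_ping_ok:
        # Small packets pass but 1600-byte ones do not.
        return "MTU below 1600 somewhere on the transport network"
    return "VLAN ID or uplink misconfiguration"


print(diagnose_transport(False, True))
```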
Note: If you don't know the name of your VXLAN vmkernel interface, you can easily check it by
looking at the configuration of your VDS.

But you've also seen that information in the Logical Network Preparation UI above.

Controller
You can get the IP address of your controller by clicking on the VM
named NSX_Controller_<ID> in the vSphere Web Client.

To investigate controller issues, SSH to one of your controller VMs to use the CLI (login:
admin, password: the one set at deployment time).
status
# show control-cluster status
Type                Status                                       Since
--------------------------------------------------------------------------------
Join status:        Join complete                                09/14 14:08:46
Majority status:    Connected to cluster majority                09/18 08:45:16
Restart status:     This controller can be safely restarted      09/18 08:45:06
Cluster ID:         b20ddc88-cd62-49ad-b120-572c23108520
Node UUID:          b20ddc88-cd62-49ad-b120-572c23108520

Role                Configured status   Active status
--------------------------------------------------------------------------------
api_provider        enabled             activated
persistence_server  enabled             activated
switch_manager      enabled             activated
logical_manager     enabled             activated
directory_server    enabled             activated
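If you collect this output from several controllers, a few string checks are enough to spot an unhealthy node. The following is a minimal sketch that assumes the status phrases appear exactly as in the output above; the function name and the sample text are illustrative.

```python
def cluster_health(status_output: str) -> dict:
    """Look for the healthy phrases printed by 'show control-cluster status'."""
    # Collapse line breaks and repeated spaces so wrapped output still matches.
    text = " ".join(status_output.split())
    return {
        "joined": "Join complete" in text,
        "majority": "Connected to cluster majority" in text,
        "safe_to_restart": "can be safely restarted" in text,
    }


SAMPLE = """Join status: Join complete 09/14 14:08:46
Majority status: Connected to cluster majority 09/18 08:45:16
Restart status: This controller can be safely restarted 09/18 08:45:06"""

print(cluster_health(SAMPLE))
```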

List all the nodes in the cluster.


# show control-cluster startup-nodes
192.168.110.201, 192.168.110.202, 192.168.110.203

List the roles implemented on your controller. The "Not configured" value for api_provider is
normal: it's the NSX Manager that publishes the NSX-v API.
# show control-cluster roles
                    Listen-IP       Master?  Last-Changed    Count
api_provider        Not configured  Yes      09/18 08:45:17  6
persistence_server  N/A             Yes      09/18 08:45:17  5
switch_manager      127.0.0.1       Yes      09/18 08:45:17  6
logical_manager     N/A             Yes      09/18 08:45:17  6
directory_server    N/A             Yes      09/18 08:45:17  6

List current connections to your controller.

# show control-cluster connections
role                port            listening  open conns
-----------------------------------------------------------
api_provider        api/443         Y          1
-----------------------------------------------------------
persistence_server  server/2878     Y          2
                    client/2888     Y          3
                    election/3888   Y          0
-----------------------------------------------------------
switch_manager      ovsmgmt/6632    Y          0
                    openflow/6633   Y          0
-----------------------------------------------------------
system              cluster/7777    Y          2

Get Controller Statistics


# show control-cluster core stats

Controller networking
# show network interface
Interface  Address/Netmask     MTU   Admin-Status  Link-Status
breth0     192.168.110.201/24  1500  UP            UP
eth0                           1500  UP            UP
# show network default-gateway


# show network dns-servers

NTP is mandatory, so make sure it's correctly configured:


# show network ntp-servers
# show network ntp-status

To troubleshoot controller networking you can also use:

# traceroute <ip_address or dns_name>
# ping <ip address>
# ping interface addr <alternate_src_ip> <ip_address>
# watch network interface breth0 traffic

L2 networking troubleshooting
First make sure you are connected to the master controller of the virtual network you want to
troubleshoot. The following command shows which controller that is:
# show control-cluster logical-switches vni 5001
VNI   Controller       BUM-Replication  ARP-Proxy  Connections  VTEPs
5001  192.168.110.201  Enabled          Enabled    0            0

As you can see above, you should now connect to 192.168.110.201 to troubleshoot VNI
5001.
Let's see what we can use on that controller node to get more information on this virtual wire
(VNI = 5001).
To check the management TCP connections between the controller and the ESXi hosts:
# show control-cluster logical-switches connection-table 5001
Host-IP         Port   ID
192.168.110.51  17528  2
192.168.110.52  46026  3
192.168.210.56  42257  4
192.168.210.51  30969  5
192.168.210.57  12127  6
192.168.210.52  30280  7

To see which ESXi hosts instantiate this Logical Switch (LS), list its VTEP table.
Note: the MAC addresses in the output are those of the VXLAN vmkernel interfaces, not
those of the physical uplinks.
# show control-cluster logical-switches vtep-table 5001
VNI   IP              Segment        MAC                Connection-ID
5001  192.168.250.54  192.168.250.0  00:50:56:6c:20:30  4
5001  192.168.250.53  192.168.250.0  00:50:56:60:24:53  6
5001  192.168.250.52  192.168.250.0  00:50:56:61:23:00  7
5001  192.168.250.51  192.168.250.0  00:50:56:6b:4b:a4  5
5001  192.168.150.51  192.168.150.0  00:50:56:60:6a:3a  2
5001  192.168.150.52  192.168.150.0  00:50:56:6e:5e:e3  3
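The connection table and the VTEP table can be joined to map each ESXi management IP to its VTEP IP. The sketch below copies the values from the two outputs above and assumes that the last column of the connection table (ID) is the same identifier as Connection-ID in the VTEP table; the function name is illustrative.

```python
# Connection ID -> ESXi management IP (from the connection-table output).
connection_table = {
    2: "192.168.110.51", 3: "192.168.110.52", 4: "192.168.210.56",
    5: "192.168.210.51", 6: "192.168.210.57", 7: "192.168.210.52",
}

# Connection ID -> VTEP IP (from the vtep-table output).
vtep_table = {
    4: "192.168.250.54", 6: "192.168.250.53", 7: "192.168.250.52",
    5: "192.168.250.51", 2: "192.168.150.51", 3: "192.168.150.52",
}


def hosts_to_vteps(connections: dict, vteps: dict) -> dict:
    """Join both tables on the connection ID; a missing VTEP shows up as None."""
    return {host: vteps.get(conn_id) for conn_id, host in sorted(connections.items())}


for host, vtep in hosts_to_vteps(connection_table, vtep_table).items():
    print(host, "->", vtep)
```

A host that appears in the connection table but maps to None has a control-plane connection without having reported a VTEP for this VNI.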

List Mac addresses on a Logical Switch.


# show control-cluster logical-switches mac-table 5001
VNI   MAC                VTEP-IP         Connection-ID
5001  00:50:56:ae:9b:be  192.168.250.51  5

Same for ARP table on the Logical Switch

# show control-cluster logical-switches arp-table 5001

List the VNIs on a specific ESXi


# show control-cluster logical-switches joined-vnis <ESXi_MGT_IP>
VNI   Controller       BUM-Replication  ARP-Proxy  Connections  VTEPs
5001  192.168.110.201  Enabled          Enabled    6            6

Show the VTEP IP/MAC addresses of an ESXi host for each VNI it has joined:

# show control-cluster logical-switches vtep-records <ESXi_MGT_IP>
VNI   IP              Segment        MAC                Connection-ID
5001  192.168.150.51  192.168.150.0  00:50:56:60:6a:3a  2

List the MAC addresses of all VMs on each VNI of a specific ESXi host, with their associated VTEP:
# show control-cluster logical-switches mac-records <ESXi_MGT_IP>

List the IP/MAC addresses of all VMs on each VNI of a specific ESXi host:


# show control-cluster logical-switches arp-records <ESXi_MGT_IP>

L3 networking troubleshooting
First you can list all of your logical routers
# show control-cluster logical-routers instance all
LR-Id       LR-Name         Hosts[]         Edge-Connection  Service-Controller
1460487509  default+edge-1  192.168.110.51                   192.168.110.201
                            192.168.110.52
                            192.168.210.52
                            192.168.210.51
                            192.168.210.57
                            192.168.210.56

You can then use the LR-Id above to get interface details on one instance
# show control-cluster logical-routers interface-summary 1460487509
Interface         Type   Id    IP[]
570d45550000000b  vlan   100
570d45550000000c  vxlan  5004  10.10.10.1/24
570d45550000000a  vxlan  5000

Use the interface name to get even more details, here on the VXLAN 5004 LIF for example:
# show control-cluster logical-routers interface 1460487509 570d45550000000c
Interface-Name:     570d45550000000c
Logical-Router-Id:  1460487509
Id:                 5004
Type:               vxlan
IP:                 10.10.10.1/24
DVS-UUID:           1cec0e50-029c-a921-b6d8-d0fc73e57969
                    ee660e50-e861-6d04-b4d8-1d462df952bc
Mac:                02:50:56:8e:21:35
Mtu:                1500
Multicast-IP:       0.0.0.1
Designated-IP:
Is-Sedimented:      false
Bridge-Id:
Bridge-Name:

To get the routing table of your logical router


# show control-cluster logical-routers routes 1460487509

Bridging
To get more information on all the bridge instances hosted on your logical router:
# show control-cluster logical-routers bridges <lr-id> all
LR-Id       Bridge-Id  Host            Active
1460487509  1          192.168.110.52  true

And now the MAC addresses learnt on them:

# show control-cluster logical-routers bridge-mac <lr-id> all
LR-Id       Bridge-Id  Mac                Vlan-Id  Vxlan-Id  Port-Id   Source
1460487509  1          00:50:56:ae:9b:be  0        5000      50331650  vxlan

Compute Nodes
In the introduction we've seen how to ping across the transport network. But from an SSH
connection to an ESXi node, we can use many more troubleshooting commands. This
section details most of them.
VIBs
When you prepare a compute host for NSX-v, vSphere Installation Bundles (VIBs) are
automatically installed by the NSX Manager via ESX Agent Manager (EAM). You can
check that they were correctly installed (output abridged):
# esxcli software vib get --vibname esx-vxlan
...
Summary: Vxlan and host tool
Description: This package loads module and configures firewall for vxlan
networking.
...
Provides: vxlan = 2.0.0.0-nsx, vdr = 1.0.0.0
Maintenance Mode Required: False
...
# esxcli software vib get --vibname esx-vsip

...
Summary: vsip module
Description: This package contains DFW and NetX data and control plane
components.
...
Provides: vsip = 1.0.0-0
Maintenance Mode Required: False
...
# esxcli software vib get --vibname esx-dvfilter-switch-security
...
Summary: dvfilter-switch-security module
Description: This package contains dvfilter-switch-security module.
...
Provides: switchSecurity = 0.1.0.0
Maintenance Mode Required: False
...

When you remove a host from an NSX-prepared cluster, the VIBs are removed
automatically, but you can also remove them from the command line:
# esxcli software vib remove -n esx-vsip
# esxcli software vib remove -n esx-vxlan
# esxcli software vib remove -n esx-dvfilter-switch-security

You'll then have to reboot your host.


Physical Nics
To list all the host's physical interfaces and check their driver and MTU:
# esxcli network nic list
Name    PCI Device     Driver  Link  Speed  Duplex  MAC Address        MTU   Description
------  -------------  ------  ----  -----  ------  -----------------  ----  ------------------------------------------
vmnic0  0000:000:14.0  igb     Up    1000   Full    00:25:90:f4:76:8e  1600  Intel Corporation Ethernet Connection I354
vmnic1  0000:000:14.1  igb     Up    1000   Full    00:25:90:f4:76:8f  1600  Intel Corporation Ethernet Connection I354
vmnic2  0000:000:14.2  igb     Up    1000   Full    00:25:90:f4:76:90  1600  Intel Corporation Ethernet Connection I354
vmnic3  0000:000:14.3  igb     Up    1000   Full    00:25:90:f4:76:91  1600  Intel Corporation Ethernet Connection I354

To get more details on a NIC:

# esxcli network nic get -n vmnic0
Advertised Auto Negotiation: true
Advertised Link Modes: 10baseT/Half, 10baseT/Full, 100baseT/Half, 100baseT/Full, 1000baseT/Full
Auto Negotiation: true
Cable Type: Twisted Pair
Current Message Level: 7
Driver Info:
Bus Info: 0000:00:14.0
Driver: igb
Firmware Version: 0.0.0
Version: 5.2.5
Link Detected: true
Link Status: Up by explicit linkSet
Name: vmnic0
PHYAddress: 0
Pause Autonegotiate: true
Pause RX: false
Pause TX: false
Supported Ports: TP
Supports Auto Negotiation: true
Supports Pause: true
Supports Wakeon: true
Transceiver: internal
Wakeon: MagicPacket(tm)

To check TCP segmentation offload (TSO) and checksum offload (CSO) settings for a Nic
# esxcli network nic tso get -n vmnic0
# esxcli network nic cso get -n vmnic0

If you see some strange behavior while using LACP, you can use the following commands
to leave only one interface up and verify whether LACP negotiation is responsible for your issues:
# esxcli network nic down -n vmnic1
# esxcli network nic down -n vmnic2
# esxcli network nic down -n vmnic3

To revert it:
# esxcli network nic up -n vmnic1
# esxcli network nic up -n vmnic2
# esxcli network nic up -n vmnic3

VMs
List the virtual switches on the compute node, with the clients (uplinks, vmkernel interfaces
and VM vNICs) attached to each DVPort:
# esxcfg-vswitch -l
Switch Name  Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch0     1536       1           128               1500

  PortGroup Name  VLAN ID  Used Ports  Uplinks
  VM Network

DVS Name       Num Ports  Used Ports  Configured Ports  MTU   Uplinks
Mgmt_Edge_VDS  1536       13          512               1600  vmnic1,vmnic0

  DVPort ID  In Use  Client
  897        1       vmnic0
  767        1       vmk2
  639        1       vmk1
  510        1       vmk0
  895        0
  905        1       vmk3
  907        1       vmnic1
  506        1       NSX_Controller_c6aea614-0dc7-40fd-b646-0230608d4709.eth0
  497        1       dr-4-bridging-0.eth0
  127        1       br-sv-01a.eth0

VMkernel Port
# esxcfg-vmknic -l
Interface  Port Group/DVPort  IP Family  IP Address      Netmask        Broadcast        MAC Address        MTU   TSO MSS  Enabled  Type
vmk0       510                IPv4       192.168.110.52  255.255.255.0  192.168.110.255  00:50:56:09:08:3c  1500  65535    true     STATIC
vmk1       639                IPv4       10.10.20.52     255.255.255.0  10.10.20.255     00:50:56:64:f0:9b  1500  65535    true     STATIC
vmk2       767                IPv4       10.10.30.52     255.255.255.0  10.10.30.255     00:50:56:65:67:8e  1500  65535    true     STATIC
vmk3       905                IPv4       192.168.150.52  255.255.255.0  192.168.150.255  00:50:56:6e:5e:e3  1600  65535    true     STATIC

ARP Table
# esxcli network ip neighbor list
Neighbor         Mac Address        Vmknic  Expiry    State  Type
---------------  -----------------  ------  --------  -----  -------
192.168.110.202  00:50:56:8e:52:25  vmk0     674 sec         Unknown
192.168.110.42   00:50:56:09:45:60  vmk0    1196 sec         Unknown
192.168.110.10   00:50:56:03:00:2a  vmk0    1199 sec         Unknown
192.168.110.203  00:50:56:8e:7a:a4  vmk0     457 sec         Unknown
192.168.110.201  00:50:56:8e:ea:bd  vmk0     792 sec         Unknown
192.168.110.22   00:50:56:09:11:07  vmk0    1146 sec         Unknown
10.10.20.60      00:50:56:27:49:6b  vmk1     506 sec         Unknown

VXLAN
Dump the VXLAN configuration:

# esxcli network vswitch dvs vmware vxlan list
VDS ID                                           VDS Name       MTU   Segment ID     Gateway IP     Gateway MAC        Network Count  Vmknic Count
-----------------------------------------------  -------------  ----  -------------  -------------  -----------------  -------------  ------------
1c ec 0e 50 02 9c a9 21-b6 d8 d0 fc 73 e5 79 69  Mgmt_Edge_VDS  1600  192.168.150.0  192.168.150.2  00:50:56:27:48:7d  2              1

List the VXLAN networks; this is a great way to see which controller is the master for each VXLAN.
# esxcli network vswitch dvs vmware vxlan network list --vds-name=<VDS_NAME>
VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count  MTEP Count
--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------  ----------
5000      N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.202 (up)   1           1                0                0
5004      N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.203 (up)   1           0                0                0

Get more details on a specific VXLAN wire:

# esxcli network vswitch dvs vmware vxlan network mac list --vds-name=Mgmt_Edge_VDS --vxlan-id=<VXLAN_ID>
Inner MAC          Outer MAC          Outer IP        Flags
-----------------  -----------------  --------------  --------
00:50:56:ae:9b:be  ff:ff:ff:ff:ff:ff  192.168.250.51  00001111

You can also get specific information on the VXLAN, such as its mac, arp, port, mtep and stats
tables.
For example, to list all the remote MAC addresses pushed by the controller for a specific
VNI:
# esxcli network vswitch dvs vmware vxlan network mac list --vds-name=<VDS_NAME> --vxlan-id=<VXLAN_ID>

You can also get the remote IP/MAC addresses that are still in the local ESXi
cache. They time out after 5 if no traffic.
# esxcli network vswitch dvs vmware vxlan network arp list --vds-name=<VDS_NAME> --vxlan-id=<VXLAN_ID>

To get the list of remote ESXi hosts known for a VNI. You'll also see which host is the MTEP on each
transport network subnet.
# esxcli network vswitch dvs vmware vxlan network vtep list --vds-name=<VDS_NAME> --vxlan-id=<VXLAN_ID>

Note: if you don't get the vtep argument but the mtep one instead, just
run /etc/init.d/hostd restart

# esxcli network vswitch dvs vmware vxlan network port list --vds-name=<VDS_NAME> --vxlan-id=<VXLAN_ID>
# esxcli network vswitch dvs vmware vxlan network stats list --vds-name=<VDS_NAME> --vxlan-id=<VXLAN_ID>

Controller connectivity
To check controller connectivity from ESXi (VDL2 = Virtual Distributed Layer 2):
# net-vdl2 -l
VXLAN Global States:
  Control plane Out-Of-Sync:  No
  UDP port:                   8472
VXLAN VDS:  Mgmt_Edge_VDS
  VDS ID:        1c ec 0e 50 02 9c a9 21-b6 d8 d0 fc 73 e5 79 69
  MTU:           1600
  Segment ID:    192.168.150.0
  Gateway IP:    192.168.150.2
  Gateway MAC:   00:50:56:27:48:7d
  Vmknic count:  1
    VXLAN vmknic:  vmk3
      VDS port ID:            905
      Switch port ID:         50331656
      Endpoint ID:            0
      VLAN ID:                0
      IP:                     192.168.150.52
      Netmask:                255.255.255.0
      Segment ID:             192.168.150.0
      IP acquire timeout:     0
      Multicast group count:  0
  Network count:  2
    VXLAN network:  5000
      Multicast IP:     N/A (headend replication)
      Control plane:    Enabled (multicast proxy,ARP proxy)
      Controller:       192.168.110.202 (up)
      MAC entry count:  1
      ARP entry count:  0
      Port count:       1
    VXLAN network:  5004
      Multicast IP:     N/A (headend replication)
      Control plane:    Enabled (multicast proxy,ARP proxy)
      Controller:       192.168.110.203 (up)
      MAC entry count:  0
      ARP entry count:  0
      Port count:       1

Or

# esxcli network vswitch dvs vmware vxlan network list --vds-name <vds name>
VXLAN ID  Multicast IP               Control Plane                        Controller Connection  Port Count  MAC Entry Count  ARP Entry Count  MTEP Count
--------  -------------------------  -----------------------------------  ---------------------  ----------  ---------------  ---------------  ----------
5000      N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.202 (up)   1           1                0                0
5004      N/A (headend replication)  Enabled (multicast proxy,ARP proxy)  192.168.110.203 (up)   1           0                0                0

If a controller is reported as down above, you can fix it by restarting netcpa like this:
# /etc/init.d/netcpad restart

Note: netcpa is a user world agent that communicates over SSL with the NSX Controller.
To check the ESXi host's connections to the controllers:
# esxcli network ip connection list | grep tcp | grep 1234
tcp  0  0  192.168.110.52:43925  192.168.110.203:1234  ESTABLISHED  44923  newreno  netcpa-worker
tcp  0  0  192.168.110.52:46026  192.168.110.202:1234  ESTABLISHED  46232  newreno  netcpa-worker
tcp  0  0  192.168.110.52:39244  192.168.110.201:1234  ESTABLISHED  44923  newreno  netcpa-worker

As you can see, your compute node is connected to all Controllers.
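On a host with many connections it is easy to overlook a missing controller. The sketch below parses output like the one above and compares it with the controllers you expect; the function name is illustrative, and the parsing assumes the "<ip>:1234 ESTABLISHED" layout shown in this article.

```python
import re


def connected_controllers(conn_output: str, port: int = 1234) -> set:
    """Return the remote IPs with an ESTABLISHED connection to the given port."""
    text = " ".join(conn_output.split())  # tolerate wrapped lines
    pattern = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):%d\s+ESTABLISHED" % port)
    return set(pattern.findall(text))


SAMPLE = """
tcp 0 0 192.168.110.52:43925 192.168.110.203:1234 ESTABLISHED 44923 newreno netcpa-worker
tcp 0 0 192.168.110.52:46026 192.168.110.202:1234 ESTABLISHED 46232 newreno netcpa-worker
tcp 0 0 192.168.110.52:39244 192.168.110.201:1234 ESTABLISHED 44923 newreno netcpa-worker
"""

expected = {"192.168.110.201", "192.168.110.202", "192.168.110.203"}
missing = expected - connected_controllers(SAMPLE)
print("missing controllers:", sorted(missing))
```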


Logical Router
First get a list of distributed router (VDR) instances:
# net-vdr --instance -l
VDR Instance Information :
---------------------------
VDR Instance:          default+edge-1:1460487509
Vdr Name:              default+edge-1
Vdr Id:                1460487509
Number of Lifs:        3
Number of Routes:      1
State:                 Enabled
Controller IP:         192.168.110.201
Control Plane Active:  Yes
Control Plane IP:      192.168.110.52
Edge Active:           Yes

Dump all the logical interfaces (LIFs) for a VDR instance:

# net-vdr --lif -l default+edge-1
VDR default+edge-1:1460487509 LIF Information :

Name:                 570d45550000000c
Mode:                 Routing, Distributed, Internal
Id:                   Vxlan:5004
Ip(Mask):             10.10.10.1(255.255.255.0)
Connected Dvs:        Mgmt_Edge_VDS
VXLAN Control Plane:  Enabled
VXLAN Multicast IP:   0.0.0.1
State:                Enabled
Flags:                0x2288

Name:                 570d45550000000b
Mode:                 Bridging, Sedimented, Internal
Id:                   Vlan:100
Bridge Id:            mybridge:1
Ip(Mask):             0.0.0.0(0.0.0.0)
Connected Dvs:        Mgmt_Edge_VDS
Designated Instance:  No
DI IP:                192.168.110.51
State:                Enabled
Flags:                0xd4

Name:                 570d45550000000a
Mode:                 Bridging, Sedimented, Internal
Id:                   Vxlan:5000
Bridge Id:            mybridge:1
Ip(Mask):             0.0.0.0(0.0.0.0)
Connected Dvs:        Mgmt_Edge_VDS
VXLAN Control Plane:  Enabled
VXLAN Multicast IP:   0.0.0.1
State:                Enabled
Flags:                0x23d4

Check routing status:

# net-vdr -R -l default+edge-1
VDR default+edge-1:1460487509 Route Table
Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
Legend: [H: Host], [F: Soft Flush] [!: Reject]
Destination  GenMask        Gateway  Flags  Ref  Origin  UpTime  Interface
-----------  -------        -------  -----  ---  ------  ------  ---------
10.10.10.0   255.255.255.0  0.0.0.0  UCI    1    MANUAL  410777  570d45550000000c
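The Flags column is a string of one-letter codes explained by the legend. As a small illustration, the sketch below expands such a string using the legend printed by net-vdr above; the function name is made up for this example.

```python
# Legend copied from the net-vdr route table output.
ROUTE_FLAGS = {
    "U": "Up", "G": "Gateway", "C": "Connected", "I": "Interface",
    "H": "Host", "F": "Soft Flush", "!": "Reject",
}


def decode_route_flags(flags: str) -> list:
    """Expand a flags string such as 'UCI' into readable names."""
    return [ROUTE_FLAGS.get(ch, "unknown(%s)" % ch) for ch in flags]


print(decode_route_flags("UCI"))
```

For the route above, UCI reads as an up, directly connected interface route.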

ARP information
# net-vdr --nbr -l default+edge-1
VDR default+edge-1:1460487509 ARP Information :
Legend: [S: Static], [V: Valid], [P: Proxy], [I: Interface]
Legend: [N: Nascent], [L: Local], [D: Deleted]
Network     Mac                Flags  Expiry     SrcPort  Interface         Refcnt
-------     ---                -----  ------     -------  ---------         ------
10.10.10.1  02:50:56:56:44:52  VI     permanent  0        570d45550000000c  1

Designated instance statistics:

# net-vdr --di stats
VDR Designated Instance Statistics:
RX Pkts:                   0
RX Bytes:                  0
TX Pkts:                   0
TX Bytes:                  0
ARP Requests:              0
ARP Response:              0
ARP Resolved:              0
Err RX:                    0
Err TX:                    0
Err Message too Small:     0
Err Message too Big:       0
Err Invalid Version:       0
Err Invalid Message Type:  0
Err Instance Not Found:    0
Err LIF not DI:            0
Err No memory:             0
Err ARP not found:         0
Err Proxy not found:       0
Err DI Remote:             0

Bridging
Dump bridge info
# net-vdr --bridge -l <vdrName>
VDR default+edge-1:1460487509 Bridge Information :

Bridge config:
  Name:id              mybridge:1
  Portset name:
  DVS name:            Mgmt_Edge_VDS
  Ref count:           2
  Number of networks:  2
  Number of uplinks:   0

  Network 'vlan-100-type-bridging' config:
    Ref count:            2
    Network type:         1
    VLAN ID:              100
    VXLAN ID:             0
    Ageing time:          300
    Fdb entry hold time:  1
    FRP filter enable:    1

    Network port '50331655' config:
      Ref count:           2
      Port ID:             0x3000007
      VLAN ID:             4095
      IOChains installed:  0

  Network 'vxlan-5000-type-bridging' config:
    Ref count:            2
    Network type:         1
    VLAN ID:              0
    VXLAN ID:             5000
    Ageing time:          300
    Fdb entry hold time:  1
    FRP filter enable:    1

    Network port '50331655' config:
      Ref count:           2
      Port ID:             0x3000007
      VLAN ID:             4095
      IOChains installed:  0

List the MAC table, learnt on both the VXLAN and VLAN sides:

# net-vdr -b --mac default+edge-1
VDR default+edge-1:1460487509 Bridge Information :

Network 'vlan-100-type-bridging' MAC address table:
MAC table on PortID:               0x0
MAC table paging mode:             0
Single MAC address enable:         0
Single MAC address:                00:00:00:00:00:00
MAC table last entry shown:        00:50:56:91:5e:93 VLAN-VXLAN: 100-0 Port: 50331661
total number of MAC addresses:     1
number of MAC addresses returned:  1
MAC addresses:
Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---
00:50:56:91:5e:93    Dynamic       100      0         50331661          0

Network 'vxlan-5000-type-bridging' MAC address table:
MAC table on PortID:               0x0
MAC table paging mode:             0
Single MAC address enable:         0
Single MAC address:                00:00:00:00:00:00
MAC table last entry shown:        00:50:56:ae:9b:be VLAN-VXLAN: 0-5000 Port: 50331650
total number of MAC addresses:     1
number of MAC addresses returned:  1
MAC addresses:
Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---
00:50:56:ae:9b:be    Dynamic       0        5000      50331650          0

Dump statistics (output not shown)


# net-vdr -b --stats default+edge-1

Distributed Firewall
To investigate the Distributed Firewall rules applied to a virtual machine vNIC, first get the
VM UUID (vcUuid) for it using (output abridged):
# summarize-dvfilter
..
world 230869 vmm0:br-sv-01a vcUuid:'50 11 97 6c fa 73 a7 5b-a7 0e 36 1f a2 f5 84 38'
..

Now find the filter name for that VM UUID:

# vsipioctl getfilters
Filter Name      : nic-230869-eth0-vmware-sfw.2
VM UUID          : 50 11 97 6c fa 73 a7 5b-a7 0e 36 1f a2 f5 84 38
VNIC Index       : 0
Service Profile  : --NOT SET--
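ESXi prints UUIDs as space-separated bytes, while other tools (and the controller's DVS-UUID field earlier in this article) use the usual 8-4-4-4-12 form. The helper below is a small sketch for converting between the two so that values can be compared; the function name is illustrative.

```python
import uuid


def normalize_esxi_uuid(raw: str) -> str:
    """Convert '50 11 97 6c ...-a7 0e ...' into the canonical 8-4-4-4-12 form."""
    hex_digits = raw.replace(" ", "").replace("-", "").replace("'", "")
    return str(uuid.UUID(hex=hex_digits))


# vcUuid reported by summarize-dvfilter
print(normalize_esxi_uuid("50 11 97 6c fa 73 a7 5b-a7 0e 36 1f a2 f5 84 38"))
# VDS ID reported by net-vdl2 -l
print(normalize_esxi_uuid("1c ec 0e 50 02 9c a9 21-b6 d8 d0 fc 73 e5 79 69"))
```

The second call yields the same value as the first DVS-UUID shown by the controller for the VXLAN 5004 LIF.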

Use the filter name above to list the associated Distributed Firewall Rules

# vsipioctl getrules -f nic-230869-eth0-vmware-sfw.2

To list the address sets used by those rules:


# vsipioctl getaddrsets -f nic-230869-eth0-vmware-sfw.2

Packet Capture
vSphere 5.5 introduced a new command, pktcap-uw, to capture packets at different levels of
the processing.

You can look at all the possibilities with:

pktcap-uw -h |more

As you can see on the diagram above, we can now capture traffic at the vmnic, vmknic and
vnic levels.
Let's see how it works from the outside world to the VM. I'm not going to include the output of
the commands here; I advise you to try them on your hosts instead. By the way, I also advise you to
save the output to a file in pcap format with -o ./save.pcap; you'll then be able to open it
in Wireshark.
uplink/vmnic
You can open up the DVUplinks section of your VDS to get the name of your uplink
interface. Here we'll be using vmnic0. To capture packets received on this uplink, use:
pktcap-uw --uplink vmnic0

By default it only captures received traffic (RX). To capture packets sent on the uplink to
the outside world, use the --capture argument like this:
pktcap-uw --uplink vmnic0 --capture UplinkSnd

We'll detail all the filtering options at the end of this section, but in the meantime you can,
for example, capture only the ICMP packets received for a specific destination by using --proto 0x01 and --dstip <IP>:

pktcap-uw --uplink vmnic0 --proto 0x01 --dstip <IP>

Or capture the ICMP packets that are sent on vmnic0 from the IP address 192.168.25.113:
pktcap-uw --uplink vmnic0 --capture UplinkSnd --proto 0x01 --srcip 192.168.25.113

You can also capture ARP packets


pktcap-uw --uplink vmnic0 --ethtype 0x0806

vmknic - Virtual Adapters


Capturing packets reaching a vmknic adapter is also possible, just use the --vmk argument.
pktcap-uw --vmk vmk0

Switchport
To capture on a specific switchport, you first have to get the ID of the port. Launch
# esxtop

Type n to get a list of all the ports with their corresponding attachments. Take note of the Port ID
of the port you're interested in and use the --switchport argument like this:
pktcap-uw --switchport <port-id> --proto 0x01

Traffic direction
For switchport, vmk, uplink and dvfilter captures, the direction of traffic is specified using --dir 0 for inbound and --dir 1 for outbound; inbound is assumed when the option is omitted.

0 - Rx (default)
1 - Tx

So don't be surprised: pktcap-uw doesn't work like tcpdump and by default only captures the
received (RX) traffic. Don't forget to change that if necessary by specifying --dir 1, which
switches the capture to the transmit (TX) direction.
Argument and Filtering Options
-o save.pcap to save the capture to a file in pcap format
-c 25 capture only 25 packets
--vxlan <segment id> to specify the VXLAN VNI of the flow
--vlan <VLANID> filter on VLAN ID
--ip <x.x.x.x> filter on source or destination IP
--srcmac <xx:xx:xx:xx:xx:xx> filter on source MAC address
--dstmac <xx:xx:xx:xx:xx:xx> filter on destination MAC address
--srcip <x.x.x.x[/<range>]> filter on source IP
--dstip <x.x.x.x[/<range>]> filter on destination IP
--dstport <DSTPORT> filter on TCP destination port
--srcport <SRCPORT> filter on TCP source port
--tcpport <PORT> filter on source or destination TCP port
--proto 0x<IPPROTYPE> filter on hexadecimal protocol ID: 0x01 for ICMP, 0x06 for TCP, 0x11 for UDP
--ethtype 0x<ETHTYPE> filter on Ethernet type, 0x0806 for ARP
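If you build these captures from a script, a small helper keeps the flags consistent. The following sketch only assembles the argument list from options described in this section (--uplink, --vmk, --switchport, --dir, --proto, --srcip, --dstip, -c, -o); the function and parameter names are illustrative, and nothing is executed.

```python
# Protocol IDs as listed for the --proto option.
PROTO_HEX = {"icmp": "0x01", "tcp": "0x06", "udp": "0x11"}
CAPTURE_TARGETS = ("uplink", "vmk", "switchport")


def pktcap_args(target_type, target, proto=None, srcip=None, dstip=None,
                outbound=False, count=None, outfile=None):
    """Build a pktcap-uw argument list for the given capture target and filters."""
    if target_type not in CAPTURE_TARGETS:
        raise ValueError("unsupported target type: %s" % target_type)
    args = ["pktcap-uw", "--" + target_type, str(target)]
    if outbound:
        args += ["--dir", "1"]      # default is --dir 0 (received traffic)
    if proto is not None:
        args += ["--proto", PROTO_HEX[proto]]
    if srcip:
        args += ["--srcip", srcip]
    if dstip:
        args += ["--dstip", dstip]
    if count is not None:
        args += ["-c", str(count)]
    if outfile:
        args += ["-o", outfile]
    return args


print(" ".join(pktcap_args("uplink", "vmnic0", proto="icmp",
                           srcip="192.168.25.113", outbound=True)))
```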

Decoding capture
We've shown you how to save the captured packets to a file. To get a quick overview of the
kind of traffic passing by, you can decode the pcap file using tcpdump-uw like this:
tcpdump-uw -r save.pcap

But using Wireshark will give you a better vision of the traffic, with all the details.
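Before copying a capture off the host, it can be worth checking that the file really is a pcap file. The sketch below reads the 24-byte global header of the classic pcap format; it assumes the capture was saved in that format (not pcapng), and the function name is illustrative.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4          # classic pcap, microsecond timestamps
PCAP_MAGIC_SWAPPED = 0xD4C3B2A1  # same magic written with the other byte order


def read_pcap_header(data: bytes) -> dict:
    """Decode the 24-byte global header of a classic pcap file."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes")
    magic = struct.unpack("<I", data[:4])[0]
    if magic == PCAP_MAGIC:
        endian = "<"
    elif magic == PCAP_MAGIC_SWAPPED:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    _, major, minor, _tz, _sigfigs, snaplen, linktype = struct.unpack(
        endian + "IHHiIII", data[:24])
    return {"version": (major, minor), "snaplen": snaplen, "linktype": linktype}


# Synthetic header for demonstration: version 2.4, snaplen 65535, linktype 1 (Ethernet).
demo = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
print(read_pcap_header(demo))
```

On a real file you would pass the first 24 bytes, for example the result of open("save.pcap", "rb").read(24).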
Tracing
If you are interested in seeing even more details on the processing of a packet through
the ESXi TCP/IP stack, just add the --trace argument to see the packet traversing the ESXi
network stack. Look for Drop messages, which indicate that something went wrong in the
processing.
Drops
When things don't work as you expect, one really useful command is
pktcap-uw --capture Drop

You should see here some errors like VLAN Mismatch or something else that will give you
a hint about why traffic isn't flowing as you would expect.
DVFilter
This command captures packets as seen by the dvfilter (before the filtering happens)
pktcap-uw --capture PreDVFilter --dvfilterName <filter name>

This command captures packets after being subject to the dvfilter.


pktcap-uw --capture PostDVFilter --dvfilterName <filter name>

Capture point
You can get a list of all possible capture points with -A
pktcap-uw -A
Supported capture points:
1: Dynamic -- The dynamic inserted runtime capture point.
2: UplinkRcv -- The function that receives packets from uplink dev
3: UplinkSnd -- Function to Tx packets on uplink
4: Vmxnet3Tx -- Function in vnic backend to Tx packets from guest
5: Vmxnet3Rx -- Function in vnic backend to Rx packets to guest
6: PortInput -- Port_Input function of any given port
7: IOChain -- The virtual switch port iochain capture point.
8: EtherswitchDispath -- Function that receives packets for switch
9: EtherswitchOutput -- Function that sends out packets, from switch
10: PortOutput -- Port_Output function of any given port
11: TcpipDispatch -- Tcpip Dispatch function
12: PreDVFilter -- The DVFIlter capture point
13: PostDVFilter -- The DVFilter capture point
14: Drop -- Dropped Packets capture point
15: VdrRxLeaf -- The Leaf Rx IOChain for VDR
16: VdrTxLeaf -- The Leaf Tx IOChain for VDR
17: VdrRxTerminal -- Terminal Rx IOChain for VDR
18: VdrTxTerminal -- Terminal Tx IOChain for VDR
19: PktFree -- Packets freeing point

Let me share a few more details about some of them.
PortOutput - Shows traffic delivered from the vSwitch to the guest when used with a switch
port, or to the physical adapter when used with a physical adapter
VdrRxLeaf - Capture packets at the receive leaf I/O chain of a dynamic router in VMware
NSX. Use this capture point together with the --lifID option
VdrRxTerminal - Capture packets at the receive terminal I/O chain of a dynamic router in
VMware NSX. Use this capture point together with the --lifID option
VdrTxLeaf - Capture packets at the transmit leaf I/O chain of a dynamic router in VMware
NSX. Use this capture point together with the --lifID option
VdrTxTerminal - Capture packets at the transmit terminal I/O chain of a dynamic router in
VMware NSX. Use this capture point together with the --lifID option
CTRL-D
Never press CTRL-D to interrupt a running packet capture or you'll be left with a background
process still running. If you've done it, you can kill it like this:
kill $(lsof |grep pktcap-uw |awk '{print $1}'| sort -u)

Then check it was killed


lsof |grep pktcap-uw |awk '{print $1}'| sort -u

NSX Edge CLI


NSX Edge offers lots of Layer 4 to Layer 7 services, to name a few :

VPN

SSL-VPN

LB

FW

DHCP Relay

Let's now detail the command line interface available from an SSH connection to an NSX
Edge:
show ip route
show ip ospf neighbor
show ip ospf database
show configuration {ospf|bgp|isis|static-routing}
show configuration {firewall|nat|dhcp|dns}
show configuration {loadbalancer|ipsec|sslvpn-plus}
show interface [IFNAME]
show firewall
show ip {route|ospf|bgp|forwarding}
show arp
show system {cpu|memory|network-stats|storage|uptime}
show service {dhcp|dns|highavailability|ipsec|loadbalancer|sslvpn-plus}
show log {follow|reverse}
show flowtable

But one of the most convenient ones is the following, which lets you dump all
ingress/egress traffic on a specific Edge interface:
debug packet display interface vNic_0

You'll then get a live display of what's flowing on that virtual network interface:
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vNic_0, link-type EN10MB (Ethernet), capture size 65535 bytes
22:43:44.190770 IP 172.16.16.3 > 224.0.0.5: OSPFv2, Hello, length 48
22:43:45.914868 IP 172.16.16.1.17773 > 192.168.2.20.8080: Flags [S], seq 1790616868, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 3], length 0
22:43:45.915713 IP 192.168.2.20.8080 > 172.16.16.1.17773: Flags [S.], seq 841084052, ack 1790616869, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 6], length 0
22:43:45.915750 IP 172.16.16.1.17773 > 192.168.2.20.8080: Flags [.], ack 1, win 1825, length 0
22:43:45.915990 IP 172.16.16.1.17773 > 192.168.2.20.8080: Flags [P.], seq 1:101, ack 1, win 1825, length 100
22:43:45.916497 IP 192.168.2.20.8080 > 172.16.16.1.17773: Flags [.], ack 101, win 457, length 0
22:43:45.922654 IP 192.168.2.20.8080 > 172.16.16.1.17773: Flags [P.], seq 4381:4434, ack 101, win 457, length 53
22:43:45.922674 IP 192.168.2.20.8080 > 172.16.16.1.17773: Flags [F.], seq 4434, ack 101, win 457, length 0
22:43:45.922842 IP 172.16.16.1.17773 > 192.168.2.20.8080: Flags [.], ack 1, win 1825, options [nop,nop,sack 1 {4381:4434}], length 0

You can use the same filtering syntax as the one used by tcpdump, for example :
debug packet display interface vNic_0 icmp

If your filter has multiple words, just put underscores between them:
debug packet display interface vNic_0 dst_port_443
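The underscore rule is easy to automate when you already have a tcpdump-style expression. The sketch below simply applies the rule described above; the function name is illustrative.

```python
def edge_capture_command(interface: str, tcpdump_filter: str = "") -> str:
    """Build the Edge CLI capture command, joining filter words with underscores."""
    base = "debug packet display interface " + interface
    words = tcpdump_filter.split()
    if not words:
        return base
    return base + " " + "_".join(words)


print(edge_capture_command("vNic_0", "dst port 443"))
print(edge_capture_command("vNic_0", "icmp"))
```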

Logs
Controller Logs
Check ESXi connectivity issues from the Controller
show log cloudnet/cloudnet_java-vnet-controller.<start-time-stamp>.log

NSX Manager
SSH (login: admin) to the NSX Manager and use the following command to access the logs:
nsxmgr-l-01a> show manager log follow

You can switch over to a Unix shell using:


nsxmgr-l-01a> enable
Password: <your NSX-Mgr pwd>
nsx_manager# st e
Password: <ASK NSX SUPPORT FOR PASSWORD>
[root@nsxmgr-l-01a ~]#

ESXi Logs
/var/log/esxupdate.log check this file if you have VIB installation issues
/var/log/vmkernel.log Distributed Firewall logs are sent to this file
/var/log/netcpa.log User World Agent logs

EAM Logs
EAM logs should be checked when the installation of the VIBs fails.
/storage/log/vmware/vpx/eam.log on Linux vCenter
ProgramData/VMware/VMware VirtualCenter/Logs/ on Windows vCenter

Advanced troubleshooting tips & tricks


If you want to troubleshoot your User World Agent, you can increase the netcpa log level
like this:
Start by stopping the daemon:
# /etc/init.d/netcpad stop

Enable write permissions on netcpa's config file:

# chmod +wt /etc/vmware/netcpa/netcpa.xml

Increase the log level:

# vi /etc/vmware/netcpa/netcpa.xml

Change the XML's /config/log/level value to verbose, save and restart netcpad:
# /etc/init.d/netcpad start
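If you would rather script the change than edit the file in vi, the same edit can be expressed with an XML library. The sketch below works on an in-memory sample; the sample document is an assumption that only mirrors the /config/log/level path mentioned above (the real netcpa.xml contains more elements), and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document with the /config/log/level path from the text.
SAMPLE = "<config><log><level>info</level></log></config>"


def set_log_level(xml_text: str, level: str = "verbose") -> str:
    """Return the XML with /config/log/level set to the requested value."""
    root = ET.fromstring(xml_text)
    if root.tag != "config":
        raise ValueError("unexpected root element: %s" % root.tag)
    node = root.find("./log/level")
    if node is None:
        raise ValueError("no log/level element found")
    node.text = level
    return ET.tostring(root, encoding="unicode")


print(set_log_level(SAMPLE))
```

Remember to set the level back once you are done, since verbose logging is meant for troubleshooting sessions.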
