EtherNet/IP and Logix troubleshooting
Application Note: EtherNet/IP System Troubleshooting (A59533568)
Revision history:
8 December, 2003
- Duplicate IP Address: Added a comment that work is in progress within ODVA EtherNet/IP to examine a standard mechanism to detect and defend against duplicate addresses.
- Network trace: Removed reference to vendor software products.
- UDP Statistics: Clarified enhancement request.
- Changed disposition of this document from Internal to Global access.
2 October, 2003
- Initial release
Purpose of Application Note
When troubleshooting any EtherNet/IP system, you must follow a logical order. The order for each troubleshooting case depends on the details of that case. The purpose of this document is to list troubleshooting steps in order of priority. This paper should make you aware of two things:
1. What/where should I look?
- slow PC or slow application running on the PC
- node configuration (IP address, etc.)
- congested network (lots of traffic such as broadcast)
- slow network (satellite or frame relay)
- misconfigured switch or router
- Logix controller resources
  - controller processing capability (5550, 5555, 5563)
  - timeslice for communications
  - cached message queue (32 max)
  - unconnected outgoing buffers (40 max)
  - etc.
- insufficient processing capability in an ENBT module
- duplicate IP addresses
- defective Ethernet network hardware (e.g. cable, switch port, or ENBT module)
- web server diagnostics or RSLinx diagnostics
2. When all logical troubleshooting in step 1 above is not helpful, consider noise.
Scope
This paper will result in one of the following:
- Problem identified and solution implemented
- Error conditions identified for report to Rockwell Tech Support
Additionally, there are many possible troubleshooting scenarios. In general, there are three types of problems:
- It does not work at all
  Example: An I/O node is not connected to a switch (missing cable).
  Example: Cannot ping a node.
  Example: All MSG instructions to a specific 1756-ENBT fail.
- It works, but too slow
  Example: A resource (PC, controller, 1756-ENBT) in the system is overloaded.
- It works, but fails intermittently
  Example: The CLGX outgoing unconnected message buffer is being exceeded.
  Example: Noise is causing an I/O connection to be lost.
The steps below will help solve any of these problems, but to keep this document short, they do not detail every individual troubleshooting possibility. The detailed steps below can be used as follows:
- It does not work at all
  Example: See Ping, Physical Layer
- It works, but too slow
  Examples: See Logix Controller System Overhead, Module Device Capacity, I/O or Produce/Consume Tags, Rockwell Ethernet NIC, Logix Controller outgoing unconnected message buffer, etc.
- It works, but fails intermittently
  Examples: See Switch configuration, I/O or Produce/Consume Tags, Logix Controller unconnected message buffer, etc.
Where should I look?
The list below gives an order to troubleshooting. Start with Ping and work your way down. Skip any steps that you know are not necessary.
Ping
If you cannot get a reply using Ping, the error message gives a first clue. Examples:
- "Request timed out" could mean a number of things, including that the target is powered down.
- "Unknown host" means the specified IP address is bad, e.g. 255.255.255.255.
- "Destination host is not reachable" could mean a number of things, including a bad cable.
Look for:
- AC power not applied
- a missing or defective cable (a clue would be that the Link light is off or intermittent)
- you forgot to configure the module
- you forgot to completely configure the target node, including subnet mask and gateway
  Example: Attempting to ping a module on a different subnet when the subnet mask or the gateway address is set incorrectly.
- on some switches (e.g. Cisco 3550), port mirroring disables pinging (on the mirror-to port)
If replies are intermittent, ping continuously and see how much the reply times deviate. If the jitter is more than 10 ms or you skip a reply, one of the following is likely:
- something is busy (network or NIC)
  However, a busy 1756-ENBT probably won't be the problem. From measurements, a 1756-ENBT running at 100% CPU utilization replies in the range of 10-16 ms. Note that if you find a heavily loaded interface, reduce the load to, let's say, not more than 90% to allow for some margin.
- the network is long (satellite or frame relay)
- noise is corrupting packets and they are being dropped
Example: ping -t 130.130.130.1
This will ping continuously.
If you can ping successfully, but the problem is not solved, continue on with the steps below. For help with the Ping command, just enter ping at a command prompt (cmd or DOS screen). You could also use RSWho to test connectivity. However, ping is simpler to use and is faster.
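The jitter check described above can be applied to a saved ping log. The following is a minimal sketch, not a Rockwell tool: it assumes the log is in the usual Windows ping text format, it treats "jitter" as the spread between the fastest and slowest reply (this note does not define the term more precisely), and the function names are hypothetical.

```python
import re

# Matches the reply time in a Windows ping line, e.g. "time=12ms" or "time<1ms".
# A "time<1ms" reply is recorded as 1 ms.
_REPLY_TIME = re.compile(r"time[=<](\d+)\s*ms", re.IGNORECASE)


def parse_ping_lines(lines):
    """Return one entry per ping attempt: reply time in ms, or None if lost."""
    results = []
    for line in lines:
        match = _REPLY_TIME.search(line)
        lowered = line.lower()
        if match:
            results.append(int(match.group(1)))
        elif ("timed out" in lowered or "unreachable" in lowered
              or "not reachable" in lowered):
            results.append(None)
    return results


def assess_ping(results, jitter_limit_ms=10):
    """Apply the rule of thumb: suspect if jitter > 10 ms or any reply is lost."""
    replies = [r for r in results if r is not None]
    lost = len(results) - len(replies)
    jitter = max(replies) - min(replies) if replies else None
    suspect = lost > 0 or (jitter is not None and jitter > jitter_limit_ms)
    return {"sent": len(results), "lost": lost,
            "jitter_ms": jitter, "suspect": suspect}


log = [
    "Reply from 130.130.130.1: bytes=32 time=2ms TTL=64",
    "Reply from 130.130.130.1: bytes=32 time<1ms TTL=64",
    "Request timed out.",
    "Reply from 130.130.130.1: bytes=32 time=15ms TTL=64",
]
print(assess_ping(parse_ping_lines(log)))
```

A "suspect" result only says that the rule of thumb was tripped. It does not say whether the cause is a busy node, a long network, or noise.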
Bad Hardware
If communications are consistently bad, replace suspect hardware to isolate the trouble area. Examples:
- Cable
- Rockwell Ethernet interface (e.g. 1756-ENBT)
- Switch port
The problem may be old firmware or old hardware. Record hardware and firmware versions and report to Technical Support for the appropriate vendor.
Switch configuration, Autonegotiation or hard-configuration
The autonegotiation specification (in the 802.3 standard) allows for interpretation by developers. The result is that every vendor's autonegotiation firmware works nearly the same, but not exactly. If one node is configured for half-duplex and the other for full-duplex, communications will be lost at random and possibly frequently. To see the Rockwell duplex/speed status, see Rockwell web server diagnostics, Class 1 Packet Statistics. Verify that the status reported there matches the switch configuration.
Example: If your switch is configured for autonegotiation, the Rockwell web server page should indicate Auto Negotiated speed and duplex.
If you are running out of troubleshooting ideas, hard configure the speed and duplex on the switch ports and also on all RA nodes. This will eliminate one more variable. As of RSLogix version 12, you can hard configure speed and duplex. As of RSLinx version 2.41 (build 10), this feature is not yet supported but has been requested.
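When many nodes are involved, it can help to write down both views of each link and compare them mechanically. The sketch below assumes you have transcribed, by hand, the switch port settings and the values shown on each module's web page into two dictionaries. The node names, keys and values are hypothetical, and nothing here reads a switch or module directly.

```python
def link_mismatches(switch_ports, module_reports):
    """Return {node: [settings that differ]} for links whose two ends disagree.

    Both arguments map a node name to a dict with the keys
    "method", "speed" and "duplex".
    """
    keys = ("method", "speed", "duplex")
    problems = {}
    for node, port in switch_ports.items():
        report = module_reports.get(node, {})
        diffs = [k for k in keys if port.get(k) != report.get(k)]
        if diffs:
            problems[node] = diffs
    return problems


# Hand-recorded example values (hypothetical).
switch_ports = {
    "enbt_chassis1": {"method": "auto", "speed": 100, "duplex": "full"},
    "aent_rack2": {"method": "forced", "speed": 100, "duplex": "full"},
}
module_reports = {
    "enbt_chassis1": {"method": "auto", "speed": 100, "duplex": "full"},
    "aent_rack2": {"method": "forced", "speed": 100, "duplex": "half"},
}
print(link_mismatches(switch_ports, module_reports))
```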
I/O or Produce/Consume Tags (class 1 messaging)
Look at Missed Frames in the web browser Diagnostics (see the detailed web server description below). This parameter applies only to I/O or produce tag messaging. Although some applications may run OK when losing some frames, you should strive for a system with zero (0) dropped frames. Furthermore, if you are dropping at least 4 consecutive frames, you might be dropping a CIP connection. To clarify: if you are dropping connections, this counter will definitely be incrementing. If you are not dropping connections, it may still be incrementing if your system is not as stable as possible. Viewing Missed Frames gives you something numerical to help quantify a problem. Note that yellow triangles in the RSLogix5000 I/O Configuration tree will not be seen if a connection is lost and recovered quickly enough. However, the Missed Frames counter sees everything, even one missed frame. This counter is excellent for diagnostics because of its high resolution.
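To quantify a problem with this counter, note its value at regular intervals and look at the increments rather than the running total. The sketch below shows that bookkeeping, plus a check of the "4 consecutive frames" rule of thumb against a record of which frames arrived. How the readings are collected is not shown; the function names and sample values are hypothetical.

```python
def counter_increments(samples):
    """Differences between successive readings of a cumulative counter."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def longest_missed_run(received):
    """Length of the longest run of consecutive missed frames.

    `received` holds one boolean per expected frame: True if it arrived.
    """
    longest = current = 0
    for ok in received:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


def connection_at_risk(received, consecutive_limit=4):
    """True if at least `consecutive_limit` frames in a row were missed."""
    return longest_missed_run(received) >= consecutive_limit


# Missed Frames read by hand once per minute (hypothetical values).
print(counter_increments([0, 0, 3, 3, 4]))
print(connection_at_risk([True, False, False, True, False]))
```

A steady stream of small increments points to an unstable system even when no connection is ever reported lost.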
EtherNet/IP Module Device Capacity
Use the web server to verify that CPU utilization on the Ethernet NIC is less than 100%. If utilization is at 100%, this may be the problem. To reduce the utilization:
- make I/O RPI values larger (slower)
- reduce the number of I/O connections
- make non-critical traffic less frequent (e.g. MSGs and HMI)
- add another EtherNet/IP module and divide the traffic load
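As a rough illustration of why larger RPI values reduce module load, the sketch below estimates class 1 packets per second from a list of RPIs. It assumes each class 1 connection carries one packet in each direction per RPI. That is an assumption, chosen because it reproduces the 200 packets per second this note quotes for a single produced tag at RPI = 10 ms. The function name and the connection counts are hypothetical.

```python
def class1_packets_per_second(rpis_ms):
    """Estimate class 1 load, assuming two packets per connection per RPI."""
    return sum(2 * 1000.0 / rpi for rpi in rpis_ms)


# One connection at 10 ms, matching the figure quoted in this note.
print(class1_packets_per_second([10]))

# Twenty connections: doubling the RPI from 20 ms to 40 ms halves the estimate.
print(class1_packets_per_second([20] * 20))
print(class1_packets_per_second([40] * 20))
```

Compare an estimate like this with the actual class 1 packets per second reported by the web server before and after changing RPIs.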
Logix Controller outgoing unconnected message buffer
ControlLogix has a limit of 10 outgoing unconnected buffers. As of version 8, this can be increased to 40. See KnowBase document for details. These buffers are required for all messaging, explicit and implicit (for establishing a connection). If the controller tries to exceed this limit, the excess messages will fail. Example: If you try to initiate 50 MSG instructions simultaneously, those in excess of the buffer size will fail. See KnowBase document G20181 for information on how to read the unconnected outgoing buffers:
- attribute 17 is reserve (unused)
- attribute 18 is the high-water mark
- attribute 19 is buffers currently in use
Use RSLogix5000 version 12 to read the above values reliably.
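The arithmetic behind the 50-MSG example, and one way to plan around it, can be sketched as follows. This is illustration only: it treats every simultaneously triggered MSG as needing one buffer and ignores any other users of the buffers, which is a simplifying assumption. The function names and MSG names are hypothetical, and the grouping would still have to be implemented in controller logic.

```python
def excess_messages(simultaneous, buffer_limit=10):
    """Number of simultaneously triggered MSGs beyond the available buffers."""
    return max(0, simultaneous - buffer_limit)


def stagger(msg_names, buffer_limit=10):
    """Split MSG names into groups small enough to trigger together."""
    return [msg_names[i:i + buffer_limit]
            for i in range(0, len(msg_names), buffer_limit)]


def high_water_warning(high_water_mark, buffer_limit):
    """True when the high-water mark (attribute 18) has reached the limit."""
    return high_water_mark >= buffer_limit


msgs = [f"MSG_{n}" for n in range(50)]
print(excess_messages(len(msgs), buffer_limit=10))   # default limit
print(excess_messages(len(msgs), buffer_limit=40))   # increased limit
print(len(stagger(msgs, buffer_limit=10)))           # groups to trigger in turn
```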
Logix Controller System Overhead
Add more time for communications by increasing the continuous task timeslice, or by running the higher priority tasks (e.g. periodic tasks) less frequently or at a lower priority. The default timeslice is 10%. Increase it to 30-50% to see the effect.
Slow PC Application
If you think the customer's application might be running slow, there are two possibilities:
- the PC is not powerful enough
- the application runs slow (or accesses controller data inefficiently)
For either case, look at the CPU utilization in the Task Manager to see how close to 100% it is. Another approach is to stop the application and use a simple application, the OPC test client that comes with RSLinx, to access all the data you need. Configure the topic poll rate for 1 ms to make it go as fast as the Rockwell controller(s) will go. If you can achieve sufficient throughput using this approach, you will hopefully have convinced the customer that the problem is the application (or that the PC is not powerful enough).
Duplicate IP Address
If two Rockwell nodes are configured with the same IP address, the last one to be configured will steal the address. Detection of this can be simple or difficult.
Simple: In the I/O tree, a 1794-AENT is configured and operating nicely. However, a 1756-ENBT is then accidentally configured for the same address. The result would be that the Logix controller declares the connection to the AENT lost.
Difficult: Messages (MSG instructions) from one CLGX to another CLGX are working. Then, after a third device is configured, the MSGs fail. If you ping the IP address, it will ping OK. If the third device is of the same type (e.g. 1756-ENBT) but does not have the desired tag, even RSWho will show good connectivity, but the MSG will fail.
However, work is in progress within ODVA EtherNet/IP to examine a standard mechanism to detect and defend against duplicate addresses.
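In the meantime, one informal way to spot a duplicate is to record which MAC address answers for each IP address over time and look for an IP address that has been seen with more than one MAC. The sketch below does only that bookkeeping. It is not the ODVA mechanism mentioned above. How the (IP, MAC) observations are collected, for example by reading the PC's ARP table repeatedly, is left out, and the addresses shown are made up.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: sorted MACs} for every IP seen with more than one MAC.

    `observations` is an iterable of (ip, mac) pairs gathered over time.
    MAC addresses are normalised to lower case with ':' separators.
    """
    seen = defaultdict(set)
    for ip, mac in observations:
        seen[ip].add(mac.lower().replace("-", ":"))
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


# Hypothetical observations: the same IP answers from two different MACs.
observations = [
    ("10.88.76.96", "00-00-BC-11-22-33"),
    ("10.88.76.97", "00-00-BC-44-55-66"),
    ("10.88.76.96", "00-00-BC-77-88-99"),
    ("10.88.76.97", "00:00:bc:44:55:66"),
]
print(find_duplicate_ips(observations))
```

Replacing a module legitimately changes the MAC address behind an IP address, so a hit is a clue to investigate, not proof of a duplicate.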
Network trace
If you have not solved the problem by now, we need to see what is happening on the network. Take a trace and forward it to Tech Support for analysis. Make sure that the trace has the problem in it. Without waiting for an analysis of the trace, start looking at the physical layer (see below).
Noise or Intermittent Defective Hardware
All of the above steps are logical. If they don't solve the problem, noise or bad hardware is the likely cause. Intermittent communications are most likely caused by one of the following:
- Ethernet cable placement
  Example: Visually inspect for cable placement next to 480 VAC.
- Noise/grounding
  Example: Physically detach an intermittent chassis from the enclosure and see how it operates.
- Intermittent hardware
  Focus on a communications problem between 2 nodes and try the following:
  - Replace a Rockwell Ethernet interface.
  - Move the Cat5 cable (from a Rockwell node) to a different switch port.
  - Replace an Ethernet cable.
Web Server Description
From the Rockwell web server home page, the following are parameters that have proven useful when troubleshooting a system on one of the following modules:
- 1756-ENBT
- 1788-ENBT
- 1794-AENT
- 1769-L35E
Other Rockwell EtherNet/IP products look different at this time. However, there is a migration plan for uniformity across all of our products.
In the Address field of Internet Explorer or Netscape, enter the IP address of an Ethernet interface module.
Example: 10.88.76.96
You will see something similar to the following:
[Figure: module web server home page]
Of all the Rockwell Ethernet modules that you may have (CLGX, Flex I/O, etc.), the Ethernet interface(s) within the controller chassis is where you want to start troubleshooting, since it is probably the busiest. Up to this time, most requests for troubleshooting have involved I/O and produce tags. The diagnostics most useful for I/O and produce tags are marked with an asterisk ( * ) below. Report all errors, timeouts, etc. to Rockwell Automation Technical Support.
How much is too much?
The answer to the question "How many errors of type X are bad?" is application dependent. For example, if you have a single bad UDP checksum (caused by electrical noise) every 100 packets, that packet will be discarded. One customer may say this is not a problem because his production line is running fine. However, another customer may say that this is unacceptable.
Link name: Module Information
This page is self-descriptive. Firmware revision and module uptime are important.
Link name: TCP/IP Configuration
This page is self-descriptive and useful.
Link name: Chassis Who
This page is self-descriptive and useful.
Link name: Diagnostic Information
Backplane Statistics
Identifies backplane errors. Report timeouts or errors to Rockwell Technical Support.
Connection Manager Statistics
Identify whether any Rejects or Timeouts are incrementing. Note: You can get the same information from RSLinx by right-clicking on the Ethernet module, selecting Module Statistics, and then selecting Connection Manager.
Link name: Ethernet Statistics
- Input errors
- Output errors
Link name: TCP Statistics
- Connection requests: These are outgoing from the controller through an ENBT.
- Connection accepts: These are incoming from the wire through an ENBT to a controller. These will increment while you are online with a web browser.
- Discards: These are bad packets that have been discarded.
Link name: UDP Statistics
At this time, this screen will increment only if other devices are sending non-CIP UDP packets to this module. At this time, no devices send non-CIP UDP packets to this module. From testing with a produced tag (RPI = 10 ms), the total UDP packets and input UDP packets do increment (on the company network), but at a rate of only 1-3 every 10-30 seconds. With an RPI of 10 ms, the produce tag rate is 200 packets per second. The conclusion is that there is no relationship between CIP packets and UDP statistics. Without connecting a sniffer to investigate, the assumption is that someone in the building is sending multicast to all stations, including the ENBT module under test. Also, the addition of CIP UDP checksum errors has formally been requested.
Link name: Encapsulation Statistics
Shows cumulative and active in/out TCP connections used for encapsulation (CIP) sessions. The TCP statistics shown here are for all TCP connections (CIP, HTTP, telnet, etc.).
Link name: Enet/IP (CIP) Statistics
Active Class 1 Transports provides the number of transports. In general, two (2) class 1 transports equate to one connection. Use this number to verify against your calculated class 1 total.
Class 3 transport information is supplied here, including detailed client (outgoing) and server (incoming) information. Unconnected message information is also provided here. The UCMM Worst Backlog (Client) can be used to see the unconnected message high-water mark for messages to legacy PLCs. If this is 10 and you have the Logix processor configured for a maximum of 10, this is a sign that you may be trying to exceed the controller's limit.
Link name: Class 1 (CIP) Packet Statistics
- *Link Status (including negotiation description)
- *Speed
- *Duplex
- *Method for selecting duplex and speed (e.g. Autonegotiation)
- *CPU Utilization Percentage (includes processing for everything on the module)
- Current TCP connections (these are for all connections, class 1 and class 3). Includes actual connections and ones being built but not yet complete.
- Current incoming TCP connections (these are for all connections, class 1 and class 3)
- Current outgoing TCP connections (these are for all connections, class 1 and class 3). Includes actual connections and ones being built but not yet complete.
- *Actual class 1 packets per second (for I/O and produce tag only). Compare your calculated value to this number. Reserve Class 1 capacity is how much is unused.
- *Total Missed Class 1 Packets (for I/O and produce tag only)
Link name: *Class 1 (CIP) Active Transports
You should see only the RPIs you configured.
Example: If all your configured RPIs are 50 ms, you should see only 50 ms API.
Link name: Class 3 (CIP) Active Transports
For explicit messaging, transports are the same as connections. Examine the remote addresses and verify that they are correct for your system. Examine the number of Class 3 transports. The number of transports expected depends on what you are doing.
Example: RSLogix5000 opens 1 CIP connection.
Example: A PvPlus can use 1 or more, depending on the volume of tags on scan. With 488 tags on scan (120 integers, 120 dints, 128 reals, 128 bools), a PvPlus (actually RSLinx Enterprise) opened three transports.
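The Class 1 (CIP) Active Transports check described above, that only configured RPIs should appear, can be done mechanically once the values are written down. The sketch below assumes you have copied the API values from the web page by hand, with API taken here to mean actual packet interval. The function name and the sample values are hypothetical.

```python
def unexpected_apis(configured_rpis_ms, observed_apis_ms):
    """Return observed API values that match no configured RPI, sorted."""
    return sorted(set(observed_apis_ms) - set(configured_rpis_ms))


# All RPIs configured at 50 ms, but one transport shows 20 ms.
print(unexpected_apis([50], [50, 50, 20, 50]))
```

Any value returned points to a transport that was not part of your configuration as you understand it and is worth tracing to its source.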
RSLinx Diagnostics
From RSLinx, in RSWho, you can right-click, select Module Statistics, and select the tabs/links listed below.
Link name: General
This tab is self-descriptive.
Link name: Port Diagnostics
Most of this information, and more, can also be found in the web server in 3 places:
- Diagnostics, Ethernet Statistics
- Diagnostics, TCP Statistics
- Diagnostics, IP Statistics
For the most part, the amount of information in the web server is greater, but it requires you to look in 3 different places to see everything. Additionally, RSLinx Port Diagnostics does show some values (e.g. alignment errors) that are not seen in the web server. The recommendation is that you look at RSLinx Port Diagnostics and note any errors.
Link name: Connection Manager
Same as Connection Manager in the web server.
Link name: Backplane
Same as Backplane Statistics in the web server.
References:
1. Noise
- EtherNet/IP Media Planning and Installation Manual, Publication ENET-IN001A-EN-P
- Industrial Automation Wiring and Grounding Guidelines, 1770-4.1
- GMC-RM001, www.ab.com/manuals/gmc/GMC-RM001A-EN-P-JUL01.pdf
2. System Planning and module capacities
- EtherNet/IP Performance and Application, Publication ENET-AP001C-EN-P