
Network problems with Intel PCI cards
Hi there.
I recently built a new whitebox running ESXi v4.1u1.
Hardware used:
Motherboard: Biostar A770E3 Ver. 6
RAM: Patriot Memory PGV38G1333ELK 8 GB (2 x 4 GB)
Array Controller: HighPoint RocketRAID 4320
NIC: 1 x Intel PRO/1000 MT Desktop Adapter PCI Card, 1 x Intel PRO/1000 MT Dual Port Server Adapter PCI-X Card
The internal SATA ports are not used (that's what the array controller is for), but FYI, ESXi recognized them right away. The onboard NIC is a Realtek, so I made a custom oem.tgz file containing the drivers for both the array controller and the Realtek onboard NIC. The Intel NICs use the drivers included with ESXi.
The array controller works perfectly and can be managed through its out-of-band network connection. The RAM modules are OK as well. ESXi runs normally and is installed on a USB drive. There is one VM running Windows Server 2008 R2 Standard Edition that hosts a large number of files. My desktop is a custom-built Windows 7 Ultimate PC with an Intel DX58SO motherboard and an i7 processor.
Here comes the problem.
Code:
Apr 7 22:49:43 vmkernel: 0:08:10:28.859 cpu0:4100)<3>e1000: vmnic3: e1000_clean_tx_irq: Detected Tx Unit Hang
Apr 7 22:49:43 vmkernel: Tx Queue <0>
Apr 7 22:49:43 vmkernel: TDH <d1>
Apr 7 22:49:43 vmkernel: TDT <f3>
Apr 7 22:49:43 vmkernel: next_to_use <f3>
Apr 7 22:49:43 vmkernel: next_to_clean <ce>
Apr 7 22:49:43 vmkernel: buffer_info[next_to_clean]
Apr 7 22:49:43 vmkernel: t
Apr 7 22:49:45 vmkernel: 0:08:10:30.861 cpu0:4100)<3>e1000: vmnic3: e1000_clean_tx_irq: Detected Tx Unit Hang
Apr 7 22:49:45 vmkernel: Tx Queue <0>
Apr 7 22:49:45 vmkernel: TDH <d1>
Apr 7 22:49:45 vmkernel: TDT <f3>
Apr 7 22:49:45 vmkernel: next_to_use <f3>
Apr 7 22:49:45 vmkernel: next_to_clean <ce>
Apr 7 22:49:45 vmkernel: buffer_info[next_to_clean]
Apr 7 22:49:45 vmkernel: t
Apr 7 22:49:47 vmkernel: 0:08:10:32.864 cpu0:4100)<3>e1000: vmnic3: e1000_clean_tx_irq: Detected Tx Unit Hang
Apr 7 22:49:47 vmkernel: Tx Queue <0>
Apr 7 22:49:47 vmkernel: TDH <d1>
Apr 7 22:49:47 vmkernel: TDT <f3>
Apr 7 22:49:47 vmkernel: next_to_use <f3>
Apr 7 22:49:47 vmkernel: next_to_clean <ce>
Apr 7 22:49:47 vmkernel: buffer_info[next_to_clean]
Apr 7 22:49:47 vmkernel: t
Apr 7 22:49:47 vmkernel: 0:08:10:33.373 cpu1:4374)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic3: transmit timed out
Apr 7 22:49:47 vobd: Apr 07 22:49:47.773: 29433373958us: [vob.net.uplink.watchdog.timeout] Watchdog timeout occurred for uplink vmnic3.
Apr 7 22:49:47 vobd: Apr 07 22:49:47.999: 29433599158us: [vob.net.pg.uplink.transition.down] Uplink: vmnic3 is down. Affected portgroup: VM Network. 0 uplinks up. Failed criteria: 130.
Apr 7 22:49:47 vobd: Apr 07 22:49:47.999: 29433599208us: [vob.net.vmnic.linkstate.down] vmnic vmnic3 linkstate down.
Apr 7 22:49:48 Hostd: [2011-04-07 22:49:48.048 37F03B90 info 'NetworkProvider'] Unable to enable WOL Success: Success
Apr 7 22:49:48 Hostd: [2011-04-07 22:49:48.100 37BC2B90 verbose 'DvsTracker'] FetchDVPortgroups: added 0 items
Apr 7 22:49:48 Hostd: [2011-04-07 22:49:48.112 37BC2B90 verbose 'DvsTracker'] FetchDVPortgroups: added 0 items
Apr 7 22:49:48 Hostd: [2011-04-07 22:49:48.125 37BC2B90 verbose 'DvsTracker'] FetchDVPortgroups: added 0 items
Apr 7 22:49:49 Hostd: [2011-04-07 22:49:49.113 37F4DB90 verbose 'DvsTracker'] FetchSwitches: added 0 items
Apr 7 22:49:49 Hostd: [2011-04-07 22:49:49.114 37F4DB90 verbose 'DvsTracker'] FetchDVPortgroups: added 0 items
Apr 7 22:49:50 vmkernel: 0:08:10:36.550 cpu0:4364)<6>e1000: vmnic3: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Apr 7 22:49:51 vobd: Apr 07 22:49:51.007: 29436607195us: [vob.net.pg.uplink.transition.up] Uplink:vmnic3 is up. Affected portgroup: VM Network. 1 uplinks up.
Apr 7 22:49:51 vobd: Apr 07 22:49:51.007: 29436607244us: [vob.net.vmnic.linkstate.up] vmnic vmnic3 linkstate up.
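A note on reading the dump above: TDH and TDT are the hardware head and tail indices of the transmit descriptor ring, printed in hex. In all three dumps (22:49:43, :45 and :47) TDH stays at d1 while TDT sits at f3, which suggests the card stopped consuming the descriptors queued for it. The minimal Python sketch below pulls those fields out of one dump and counts the descriptors waiting between head and tail. It assumes a 256-descriptor ring (the Linux e1000 driver's default); the ring size actually used on this host was not checked, and parse_hang_dump and pending_descriptors are illustrative helper names, not part of any VMware tool.

```python
import re

# Matches the hex ring indices printed in an e1000 "Detected Tx Unit Hang" dump,
# e.g. "TDH <d1>" or "next_to_clean <ce>".
FIELD_RE = re.compile(r"(TDH|TDT|next_to_use|next_to_clean) <([0-9a-f]+)>")

# Assumption: default e1000 transmit ring size; not verified on this host.
RING_SIZE = 256


def parse_hang_dump(lines):
    """Return a dict mapping field name -> integer index (parsed from hex)."""
    fields = {}
    for line in lines:
        match = FIELD_RE.search(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def pending_descriptors(head, tail, ring_size=RING_SIZE):
    """Descriptors queued between head and tail, wrapping around the ring."""
    return (tail - head) % ring_size


dump = [
    "Apr 7 22:49:43 vmkernel: TDH <d1>",
    "Apr 7 22:49:43 vmkernel: TDT <f3>",
    "Apr 7 22:49:43 vmkernel: next_to_use <f3>",
    "Apr 7 22:49:43 vmkernel: next_to_clean <ce>",
]
fields = parse_hang_dump(dump)
print(pending_descriptors(fields["TDH"], fields["TDT"]))  # 34
```

The modulo keeps the count correct when the tail has wrapped past the end of the ring and the head has not.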
vmnic3 is one of the Intel NICs. I have already tested the other Intel NIC and get exactly the same problem. It happens as soon as I start any kind of network communication (even a VNC connection to the VM, whether from my local network or over the Internet). If I stop all network communication (by turning off the VM or by disconnecting it from the vSwitch those vmnics are attached to), the error stops appearing.
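The vobd lines carry a microsecond uptime counter (for example 29433599208us), so the length of each link drop can be read straight from the log. Below is a small Python sketch, assuming the exact vobd message format shown in the first log excerpt, that pairs each "linkstate down" with the next "linkstate up" for the same vmnic. The function name outages is illustrative only. For the excerpt posted here it reports an outage of about 3 seconds.

```python
import re

# Matches vobd link-state events such as
# "...: 29433599208us: [vob.net.vmnic.linkstate.down] vmnic vmnic3 linkstate down."
EVENT_RE = re.compile(
    r"(\d+)us: \[vob\.net\.vmnic\.linkstate\.(down|up)\] vmnic (\S+) linkstate"
)


def outages(lines):
    """Yield (vmnic, seconds) for every down -> up pair found in the lines."""
    down_at = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue
        stamp_us, state, nic = int(match.group(1)), match.group(2), match.group(3)
        if state == "down":
            down_at[nic] = stamp_us
        elif nic in down_at:
            yield nic, (stamp_us - down_at.pop(nic)) / 1_000_000


log = [
    "Apr 7 22:49:47 vobd: Apr 07 22:49:47.999: 29433599208us: "
    "[vob.net.vmnic.linkstate.down] vmnic vmnic3 linkstate down.",
    "Apr 7 22:49:51 vobd: Apr 07 22:49:51.007: 29436607244us: "
    "[vob.net.vmnic.linkstate.up] vmnic vmnic3 linkstate up.",
]
for nic, seconds in outages(log):
    print(f"{nic} was down for {seconds:.3f} s")  # vmnic3 was down for 3.008 s
```

Running this over a longer stretch of /var/log/messages would show how often the watchdog resets the uplink while traffic is flowing.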
These cards worked fine in my previous whitebox, but that one ran the original ESXi v4.1 build. I haven't YET tried installing the older v4.1 on this new whitebox, since the changes were minimal: as far as I could tell from the documentation, the only changes were the inclusion of some other third-party drivers. I also tried creating a new vSwitch using the Realtek onboard NIC instead. The problem is similar, except that the Intel error above does not show up in /var/log/messages. Instead, the Realtek NIC constantly logs the following error:
Code:
Apr 7 22:55:20 vmkernel: 0:08:16:05.611 cpu0:4125)<3>RTNL: assertion failed at vmkdrivers/src26/drivers/net/r8168/r8168_n.c (1771)
Apr 7 22:55:40 vmkernel: 0:08:16:25.612 cpu0:4125)<3>RTNL: assertion failed at vmkdrivers/src26/drivers/net/r8168/r8168_n.c (1771)
Apr 7 22:56:00 vmkernel: 0:08:16:45.612 cpu0:4125)<3>RTNL: assertion failed at vmkdrivers/src26/drivers/net/r8168/r8168_n.c (1771)
Apr 7 22:56:09 Hostd: [2011-04-07 22:56:09.660 37EC2B90 verbose 'Proxysvc Req00175'] New proxy client TCP(local=127.0.0.1:57806, peer=127.0.0.1:80)
Apr 7 22:56:09 Hostd: [2011-04-07 22:56:09.664 37BC2B90 info 'Vmomi'] Activation [N5Vmomi10ActivationE:0x37a31f10] : Invoke done [waitForUpdates] on [vmodl.query.PropertyCollector:ha-property-collector]
Apr 7 22:56:09 Hostd: [2011-04-07 22:56:09.664 37BC2B90 verbose 'Vmomi'] Arg version:
Apr 7 22:56:09 Hostd: "47"
Apr 7 22:56:09 Hostd: [2011-04-07 22:56:09.664 37BC2B90 info 'Vmomi'] Throw vmodl.fault.RequestCanceled
Apr 7 22:56:09 Hostd: [2011-04-07 22:56:09.664 37BC2B90 info 'Vmomi'] Result:
Apr 7 22:56:09 Hostd: (vmodl.fault.RequestCanceled) {
Apr 7 22:56:09 Hostd: dynamicType = <unset>,
Apr 7 22:56:09 Hostd: faultCause = (vmodl.MethodFault) null,
Apr 7 22:56:09 Hostd: msg = "",
Apr 7 22:56:09 Hostd: }
Apr 7 22:56:09 Hostd: [2011-04-07 22:56:09.666 37EC2B90 error 'App'] Failed to read header on stream TCP(local=127.0.0.1:52167, peer=127.0.0.1:0): N7Vmacore15SystemExceptionE(Connection reset by peer)
Apr 7 22:56:12 Hostd: [2011-04-07 22:56:12.936 37B40B90 verbose 'Proxysvc Req00176'] New proxy client TCP(local=10.0.0.2:52405, peer=10.0.0.10:80)
Apr 7 22:56:20 vmkernel: 0:08:17:05.612 cpu0:4125)<3>RTNL: assertion failed at vmkdrivers/src26/drivers/net/r8168/r8168_n.c (1771)
I'm not sure whether the Vmomi errors have anything to do with this; I have no clue what they mean anyway. I'm about to order an Intel EXPI9301CTBLK PCI-Express x1 NIC to see what happens. Since it's a PCI-Express card, it should use the e1000e driver instead of the old e1000 driver.
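One detail in the Realtek log that may help: the vmkernel uptime stamps on the RTNL lines (0:08:16:05.611, 0:08:16:25.612, ...) are 20 seconds apart, which hints that the assertion comes from something periodic in the driver rather than from each packet sent. That is an inference from four lines, not something checked against the r8168 source. The Python sketch below, assuming the "D:HH:MM:SS.mmm" uptime format shown in the log, computes that spacing; uptime_seconds and intervals are illustrative names.

```python
import re

# Matches the vmkernel uptime stamp "D:HH:MM:SS.mmm" that precedes "cpuN:".
UPTIME_RE = re.compile(r"vmkernel: (\d+):(\d+):(\d+):(\d+\.\d+) cpu")


def uptime_seconds(line):
    """Convert the vmkernel uptime stamp on a log line to seconds, or None."""
    match = UPTIME_RE.search(line)
    if not match:
        return None
    days, hours, minutes = (int(g) for g in match.groups()[:3])
    seconds = float(match.group(4))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def intervals(lines, needle="RTNL: assertion failed"):
    """Seconds between consecutive log lines that contain `needle`."""
    stamps = [uptime_seconds(line) for line in lines if needle in line]
    stamps = [s for s in stamps if s is not None]
    return [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]


log = [
    "Apr 7 22:55:20 vmkernel: 0:08:16:05.611 cpu0:4125)<3>RTNL: assertion failed",
    "Apr 7 22:55:40 vmkernel: 0:08:16:25.612 cpu0:4125)<3>RTNL: assertion failed",
    "Apr 7 22:56:00 vmkernel: 0:08:16:45.612 cpu0:4125)<3>RTNL: assertion failed",
    "Apr 7 22:56:20 vmkernel: 0:08:17:05.612 cpu0:4125)<3>RTNL: assertion failed",
]
print(intervals(log))  # [20.001, 20.0, 20.0]
```

Using the vmkernel uptime rather than the syslog wall-clock time avoids the one-second resolution of the "Apr 7 22:55:20" prefix.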
I know this problem is a bit complex, but any light you can shed on it will be greatly appreciated. I've searched Google almost everywhere but have yet to find a definite answer to my specific problem. Thank you all in advance.