


 ESXi 4.0 host completely freezes copying 300+mb from guest 

Joined: Fri Jan 01, 2010 4:59 pm
Posts: 30
My ESXi box has developed a problem with hard freezing: the entire system locks up completely and does not respond to keystrokes, network activity, or anything else (even the Caps Lock light is unresponsive, indicating a complete system freeze). The only thing I can do is turn it off and restart.

The problem seems to occur when I’m copying a series of files from a guest XP machine: in normal use it’s fine, but if I copy a block of files from it (say, over 300 MB in total) I end up with a completely dead system.

I have two Western Digital Green drives in there: a 500 GB drive for the primary OS and a second 1 TB drive for main data. The 500 GB drive did show some errors, which I’ve rectified with the WD diagnostic tools (subsequent tests show no issues), but I’m still getting the freezing; the situation is no different from before.

I’m a bit stumped as to where to go with this, as there are no log entries because the system crashes before it can write a log. I’ve also upgraded the BIOS.


The configuration is:

Motherboard: GA-EG41MF-US2H
CPU: Core2Duo E7300
Ram: 2x2gig sticks
ESXi: 4.1 after all patching (all current ones applied)
NIC: 2 NICs: Intel Pro 1000GT (primary NIC, used for the management zone) and onboard Realtek RTL8111C (the problems were present before I got this one working, however)
Drives: Primary Western Digital 500gig Green SATA drive, Secondary Western Digital 1tb Green SATA drive


Has anyone seen this type of behavior at all? The system is generally relatively content most of the time but migrating files from that target machine seems to knock the entire thing flying….


Sun May 30, 2010 10:36 am
Site Admin

Joined: Mon Mar 16, 2009 10:13 pm
Posts: 3880
Might be worth running a thorough memory check. You can also configure syslog to see if that captures anything.

Since you can reproduce the error I would suggest pressing ALT+F12 at the console to see the vmkernel log file and then run the copy to see if anything is logged (it's the same output you'll see with the syslog file).
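
As a rough illustration of the remote syslog idea: because the host freezes before it can write anything locally, sending its log stream to another machine gives the last messages somewhere to land. The sketch below is a minimal UDP listener in Python for that second machine. It is not a VMware tool; the port (5514), the log file name and the function names are assumptions chosen for the example, and the host would need to be pointed at this machine and port in its syslog configuration.

```python
import socket


def decode_priority(packet: bytes):
    """Split a BSD-syslog style packet '<PRI>text' into (facility, severity, text).

    Returns (None, None, text) when the packet has no valid priority prefix.
    """
    text = packet.decode("utf-8", errors="replace")
    if text.startswith("<") and ">" in text[:6]:
        pri_str, rest = text[1:].split(">", 1)
        if pri_str.isdigit():
            pri = int(pri_str)
            return pri // 8, pri % 8, rest
    return None, None, text


def listen(host="0.0.0.0", port=5514, logfile="esxi-remote.log"):
    """Receive syslog datagrams forever and append them to a local file.

    Port 5514 and the file name are hypothetical; the standard syslog
    port is 514, which usually needs elevated privileges to bind.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    # Line buffering so each message reaches the file as soon as it arrives.
    with open(logfile, "a", buffering=1) as out:
        while True:
            data, (src, _port) = sock.recvfrom(8192)
            facility, severity, text = decode_priority(data)
            out.write(f"{src} fac={facility} sev={severity} {text.rstrip()}\n")
```

Calling `listen()` on the receiving machine and then starting the copy should leave whatever the host managed to send before the freeze in `esxi-remote.log`. UDP delivery is best effort, so the very last lines may still be lost.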

_________________
Dave Mishchenko
VMware vExpert 2009-2013
Now available - VMware ESXi: Planning, Implementation, and Security
Also available - vSphere Quick Start Guide


Mon May 31, 2010 12:32 pm

Joined: Fri Jan 01, 2010 4:59 pm
Posts: 30
Thanks Dave,

I did a single pass of memtest86 3.3 and found no problems in that cycle. I also ran the manufacturer diagnostics on the drive and they returned no issues either. The current configuration has been physically operational since around January (although I did swap out one of the drives for a PATA 300 GB drive, which hosts the OS and some of the VMs).

The defective client system was an XP SP3 build with two virtual drives, one for the OS and a second 700 GB data drive. I built this one a while back in the ESXi 3.5 days using one of the forum techniques to extend the drive size (I have been using that drive file for about 18 months).

I took down the offending XP VM, built a new temporary XP VM, mapped that drive to the new machine and copied files from it without issue. I then removed it from the original VM and re-added it, and I haven’t been able to recreate the problem so far. I’m keen to find out what it was because it was able to take the entire infrastructure down, which is obviously a pretty significant concern. I’m wondering if there’s an issue with how the drive was being utilised (the virtual SCSI controller, perhaps?). I noticed some strange sounds coming from the drive when the system had frozen: it wasn’t quite a click of death, more an unusual seeking sound, but the diagnostics have shown no issue. I moved entire virtual drives and machines on and off that device earlier without any significant issue, so right now I’m tending to think that it’s not a hard drive failure.

If I can replicate the issue again I’ll post what appears on the monitor screen; obviously it’s a rather significant concern right now.


Mon May 31, 2010 3:36 pm

Joined: Fri Jan 01, 2010 4:59 pm
Posts: 30
Righto, it's collapsed again - this time I had it on the diagnostics screen, the entire thing's frozen and can't move (not even caps-lock is responding).

The line "WARNING: VFAT: 154: File_loct" appears in white inverted text three times. There is a series of references to "VSCSI", but the VFAT line is the only thing I can see that looks warning-like, and it was logged several hours before the failure.

The last line is for CPU0, "VMXNET2: 4448: unicastAddr" followed by a mac address.

I can't get any more info, the system is completely frozen.


Wed Jun 02, 2010 10:46 pm
Site Admin

Joined: Mon Mar 16, 2009 10:13 pm
Posts: 3880
Which virtual NIC type are you using with the VM? Does the problem occur with another guest OS?

_________________
Dave Mishchenko


Sat Jun 05, 2010 11:15 am

Joined: Fri Jan 01, 2010 4:59 pm
Posts: 30
It's just the "VM Network" adapter that represents the Intel Pro 1000GT card; I'm not having the crash with any other machine though. It seems to be related to copying large files from the system aggressively; for example the offending system has a 4gig ISO file on it and if I copy that off to another machine (the other machine is the one initiating the copy) it would crash.

I’m using an application on the other machine called “TeraCopy”, which lets me vary the buffer size on the calling machine. If I set that buffer down to 64k, the copy is less aggressive in its demand for data (below the standard Windows copy call). This seems to reduce the incidence of failure; if I raise it to a higher level, it causes the collapse. Machines that don’t use TeraCopy as their copy routine (i.e. just the standard Windows file copy) also cause the crash.
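
To make the buffer-size effect easier to reproduce in a controlled way, something like the following could stand in for the copy tool. It is only a sketch in Python of a copy loop with an adjustable read size and an optional pause between reads; it is not how TeraCopy or the Windows copy routine actually work, and the function name and parameters are made up for this example.

```python
import time


def chunked_copy(src_path, dst_path, buffer_size=64 * 1024, pause=0.0):
    """Copy src_path to dst_path in reads of buffer_size bytes.

    buffer_size: bytes requested per read (64 KiB mirrors the setting
                 mentioned above; larger values pull data more aggressively).
    pause:       optional sleep in seconds after each chunk, to throttle
                 the copy further.
    Returns the total number of bytes copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(buffer_size)
            if not chunk:
                break  # end of file reached
            dst.write(chunk)
            copied += len(chunk)
            if pause:
                time.sleep(pause)
    return copied
```

Run from the machine that initiates the copy, with the source path on the guest's network share, this allows stepping the buffer size up gradually (64 KiB, 256 KiB, 1 MiB and so on) to see roughly where the host starts to fall over.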

I’m currently thinking that there’s something wrong with the virtual disk file; I’m going to put some large files onto another VM today and copy from there to see if I can make it fail on another machine to test this theory. I built another virtual system and pointed it to the offending virtual disk and it also crashed, so it’s not specific to the VM client’s build; the common factor seems to be this particular virtual disk or something involved with it.

I’m currently thinking of evacuating the 600 GB of data that’s on there, deleting the virtual disk and re-creating it. Would this be a worthwhile exercise? The virtual disk was created in ESXi 3.5 and was extended with a technique (found on one of the support sites) to make it larger than a particular size limit; I think this was 500 GB at the time. I’ve been using this file for a long time, though I’ve only been using it with 4.0 for about 4-5 months. The crashing behaviour has developed over a period of time but was particularly bad the other day (probably made worse by the primary drive with the ESXi boot OS on it having problems too).

Another thing I’ve noticed is that there’s a faint, repeating sound from the hard drive when the system fails, as though it’s attempting to seek or retry. I’ve run complete manufacturer diagnostics on the drive and there are no issues at all. I have a feeling that it’s stuck in some kind of disk operation when the system fails, but as the logging screen freezes I can’t use that for analysis.


Sat Jun 05, 2010 2:04 pm

Joined: Fri Jan 01, 2010 4:59 pm
Posts: 30
Since last posting I suspected it could have been the guest drive created on the 1tb drive (it was a 700gig guest drive). I migrated all data from this and deleted it and re-created it - it seemed to behave for a while but since applying the latest updates with the patcher I'm back to the system freezing again.

I can't get any diagnostics or logs because it freezes instantly - it happens when I'm copying files from the guest machine across the network. Not even the console responds, it just stays greyed out (as it does when it's inactive).

I have to physically power the entire machine off and start again, this is obviously a very dangerous situation for a shared host.

The re-created 700gig guest drive is mounted as IDE 0:0.

Where can I go from here with this issue, has anyone any suggestions?


Fri Jul 09, 2010 6:20 am

Joined: Mon Jun 28, 2010 2:59 pm
Posts: 8
If you mean 700GB as your VMDK, then make sure the datastore is formatted to use a 4MB block size; this will accommodate larger VMDKs. Here's the breakdown:

• 1MB block size – 256GB maximum file size
• 2MB block size – 512GB maximum file size
• 4MB block size – 1024GB maximum file size
• 8MB block size – 2048GB maximum file size

**NOTE: changing the block size requires the datastore to be reformatted, so back up your data first.**
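
The breakdown above can be turned into a quick lookup. The following is a small Python sketch that only encodes the four figures listed in this post; the names are invented for the example, and it treats the limits as round, inclusive values, which is an approximation, so a VMDK sitting right at a boundary should be checked against the VMware documentation.

```python
# Block size (MB) -> maximum file size (GB), as listed in the post above.
MAX_FILE_GB_BY_BLOCK_MB = {1: 256, 2: 512, 4: 1024, 8: 2048}


def min_block_size_mb(vmdk_size_gb):
    """Return the smallest listed block size (MB) that fits a VMDK of this size.

    Raises ValueError if the VMDK exceeds the largest listed limit.
    """
    for block_mb in sorted(MAX_FILE_GB_BY_BLOCK_MB):
        if vmdk_size_gb <= MAX_FILE_GB_BY_BLOCK_MB[block_mb]:
            return block_mb
    raise ValueError(f"{vmdk_size_gb} GB exceeds the largest listed limit")


print(min_block_size_mb(700))  # the 700GB VMDK discussed in this thread
```

For the 700GB disk in question this gives a 4MB block size, which matches the advice above.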


Fri Jul 09, 2010 3:52 pm

Joined: Fri Jan 01, 2010 4:59 pm
Posts: 30
Thanks boavista, it was formatted with a 4 MB block size under ESXi 3.5 when originally implemented and is still that way; from what I've read, you can't even create a file that large without it.


Fri Jul 09, 2010 4:35 pm

Joined: Fri Jan 01, 2010 4:59 pm
Posts: 30
Update: did some more diagnostic testing to rule out storage and drive issues - I transferred a series of files that was collapsing the system from one VM guest to another and it worked so I re-focused on networking.

I enabled the onboard RTL8111C NIC (which I hadn't bothered with since the rebuild) and configured it as the main NIC for this system, leaving the Intel Pro 1000GT in place but without a cable.

Re-performed the transfer that was consistently collapsing - and it worked!

I'm going to do some more load testing to see if I can replicate the failure again but so far it looks to be a problem with the way the system is using that Intel Pro 1000GT NIC. It’s working but if you put a load of data through it (using SMB via a guest machine) the entire host locks up irretrievably.

The problem doesn’t seem to occur when uploading and downloading to/from the datastore (it sometimes falls over but it’s never locked the entire system up), so I suspect there’s something more subtle or less obvious going on.

The NIC was running at 1000 Mbit with full duplex; it is connected via crossover cable to another card of the same type on my firewall (a Smoothwall 3.0 system).

Is there any way that I can do any diagnostics on the card for this type of thing or should I just bin it as a defective card?


Sat Jul 10, 2010 7:11 pm