Local Datastore becomes Inactive after a long time

Joined: Wed May 18, 2011 6:46 am
Posts: 3
Hi,

I'm running ESXi 4.1 (build 260247) on a Dell R410 with a Dell SAS 6/iR controller and 2x 500 GB SATA drives.

Everything was running fine, but for the past week my datastore has been going inactive after the host has been up for some time.

I can still connect with the vSphere Client, but the datastore is shown as Inactive. The VMs appear to be still running (none of them is marked as inactive), but I can't ping any of them or reach them via console or SSH.

The only fix I have found is to restart the host, which takes 30-40 minutes because it tries to save the state of the VMs and apparently can't, since the datastore is inactive. After those 30-40 minutes the server reboots and everything runs fine again.

The datastore always goes inactive during the night, when a job backs up my mail server. Maybe the high disk I/O from that job makes the datastore go inactive, although that seems odd, since the job ran for months without any problems.

I don't see anything in the logs, so it's a little hard to debug.

I tried rescanning for VMFS volumes, refreshing the datastore, and rescanning the controller, but nothing works except rebooting the host.


If anyone has an idea, it would be greatly appreciated. I haven't slept much these last weeks, because I have to check every night that my server is still running...



Thanks !


Wed May 18, 2011 6:51 am
Site Admin

Joined: Mon Mar 16, 2009 10:13 pm
Posts: 3880
Is the problem gone if you disable the backup job? I would also take a look at the logs. Start with /var/log/messages, which is the current vmkernel log file. When it reaches 1 MB in size it gets compressed to messages.1.gz. ESXi keeps 7 copies around, so you should be able to get something out of one of those files.

If you look at Configuration > Software > Advanced Settings and browse to Syslog.Local.DatastorePath, is it pointing to a HD partition or datastore? If so, the vmkernel log files will survive a reboot. The logs in /var/log are on a RAM disk, and they're gone when the host reboots.
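
To go through the current log and its rotated copies in one pass, something like the following could work. It is only a minimal sketch, assuming the files have been copied to a workstation with Python 3 and keep the names described above (messages, messages.1.gz, and so on). The function names and the search pattern are illustrative. The pattern simply reuses driver and subsystem names that show up in vmkernel storage errors and may need extending.

```python
import gzip
import re
from pathlib import Path

# Illustrative pattern: subsystem names that commonly appear in vmkernel storage errors.
STORAGE_PATTERN = re.compile(r"mptbase|ScsiDeviceIO|NMP:")


def log_files(log_dir, base="messages", copies=7):
    """Return the current log plus any rotated copies (base.1.gz ...) that exist."""
    directory = Path(log_dir)
    candidates = [directory / base]
    candidates += [directory / f"{base}.{i}.gz" for i in range(1, copies + 1)]
    return [path for path in candidates if path.exists()]


def storage_lines(path):
    """Yield the lines of a plain or gzip-compressed log that match STORAGE_PATTERN."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", errors="replace") as handle:
        for line in handle:
            if STORAGE_PATTERN.search(line):
                yield line.rstrip("\n")
```

Calling `log_files("/path/to/copied/logs")` and then looping over `storage_lines(path)` for each result prints nothing by itself; it just hands back the matching lines so they can be printed or counted per file.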

_________________
Dave Mishchenko
VMware vExpert 2009-2013
Now available - VMware ESXi: Planning, Implementation, and Security
Also available - vSphere Quick Start Guide


Wed May 18, 2011 9:23 am

Joined: Wed May 18, 2011 6:46 am
Posts: 3
Well, I'm going to try that, but the problem only occurs every 2-3 days, not every day, and I'm a bit worried about going without backups of my mail server ^^

But I'll look into it.

As for the logs, I send everything to a syslog server, and when the problem occurs I get nothing...

Sorry, I'm not at work anymore, so I'll post the last log entries here tomorrow.

I haven't updated to Update 1 yet, so I'll try that too. I saw this in the changelog, and it could be related to my problem:

"ESXi hosts might fail when using LSI SAS HBAs connected to SATA disks
Data loss might occur on ESXi hosts using LSI SAS HBAs connected to SATA disks. This issue occurs when the maximum I/O size is set to more than 64KB in mptsas driver and LSI SAS HBAs are connected to SATA disks. The issue is resolved in this release"

I don't know how to check the maximum I/O size in the mptsas driver so...


Wed May 18, 2011 9:37 am
Site Admin

Joined: Mon Mar 16, 2009 10:13 pm
Posts: 3880
nameless wrote:
I don't know how to check the maximum I/O size in the mptsas driver so...

Interesting that you mention the mptsas driver. I have a customer with IBM x3690s that has experienced host failures, and IBM has suggested an update to that driver. Note that they're at 4.1 Update 1, but there is a newer driver that IBM is recommending they move to. There might be a newer driver for the version you're running, which would let you hold off on going to 4.1 U1 until you're ready.



Wed May 18, 2011 10:02 am

Joined: Wed May 18, 2011 6:46 am
Posts: 3
Hello,

Thanks for your answer. I think I'll update my server to 4.1 U1 next week.

Yesterday my server didn't crash, but I spotted this in the messages log on my ESXi host:

Code:
May 18 22:16:19 vmkernel: 0:23:03:35.411 cpu9:4231)ScsiDeviceIO: 1672: Command 0x2a to device "naa.600508e000000000f7a03804e1b70305" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
May 18 22:16:22 vmkernel: 0:23:03:38.462 cpu7:4103)<6>mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
May 18 22:16:22 vmkernel: 0:23:03:38.464 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41027f9dcd40) to NMP device "naa.600508e000000000f7a03804e1b70305" failed on physical path "vmhba2:C1:T0:L0" H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
May 18 22:16:22 vmkernel: 0:23:03:38.464 cpu7:4103)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.600508e000000000f7a03804e1b70305" state in doubt; requested fast path state update...
May 18 22:16:22 vmkernel: 0:23:03:38.464 cpu7:4103)ScsiDeviceIO: 1672: Command 0x2a to device "naa.600508e000000000f7a03804e1b70305" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
May 18 22:16:22 vmkernel: 0:23:03:38.466 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41027eeea740) to NMP device "naa.600508e000000000f7a03804e1b70305" failed on physical path "vmhba2:C1:T0:L0" H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
May 18 22:16:22 vmkernel: 0:23:03:38.466 cpu7:4103)ScsiDeviceIO: 1672: Command 0x2a to device "naa.600508e000000000f7a03804e1b70305" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
May 18 22:16:22 vmkernel: 0:23:03:38.468 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41027f0f0940) to NMP device "naa.600508e000000000f7a03804e1b70305" failed on physical path "vmhba2:C1:T0:L0" H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
May 18 22:16:22 vmkernel: 0:23:03:38.468 cpu7:4103)ScsiDeviceIO: 1672: Command 0x2a to device "naa.600508e000000000f7a03804e1b70305" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
May 18 22:16:22 vmkernel: 0:23:03:38.470 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41027f9d4340) to NMP device "naa.600508e000000000f7a03804e1b70305" failed on physical path "vmhba2:C1:T0:L0" H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
May 18 22:16:22 vmkernel: 0:23:03:38.470 cpu7:4103)ScsiDeviceIO: 1672: Command 0x2a to device "naa.600508e000000000f7a03804e1b70305" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
May 18 22:16:22 vmkernel: 0:23:03:38.472 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41027f26aa40) to NMP device "naa.600508e000000000f7a03804e1b70305" failed on physical path "vmhba2:C1:T0:L0" H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
May 18 22:16:22 vmkernel: 0:23:03:38.472 cpu7:4103)ScsiDeviceIO: 1672: Command 0x2a to device "naa.600508e000000000f7a03804e1b70305" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
May 18 22:16:22 vmkernel: 0:23:03:38.473 cpu7:4103)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41027f24e040) to NMP device "naa.600508e000000000f7a03804e1b70305" failed on physical path "vmhba2:C1:T0:L0" H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.


It only appeared this time, and after that, nothing...
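
For anyone reading entries like the ones above, the `Command` and `H:` fields can be decoded with a few lines of Python. This is a rough sketch, not an official tool. The host status table follows the values VMware documents for vmkernel SCSI host-side errors as far as I know them, the opcode table covers only a handful of common SCSI commands, and the function and dictionary names are made up for this example.

```python
import re

# Host-side status codes (the H: field); assumed from VMware's published SCSI host status list.
HOST_STATUS = {
    0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY", 0x3: "TIMEOUT",
    0x4: "BAD_TARGET", 0x5: "ABORT", 0x6: "PARITY", 0x7: "ERROR", 0x8: "RESET",
}

# A few standard SCSI opcodes; 0x2a is the one that appears in the log above.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x88: "READ(16)", 0x8A: "WRITE(16)"}

FAILURE = re.compile(
    r'Command (0x[0-9a-fA-F]+).*?device "([^"]+)".*?'
    r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)"
)


def decode_failure(line):
    """Return the decoded fields of a ScsiDeviceIO/NMP failure line, or None if it does not match."""
    match = FAILURE.search(line)
    if match is None:
        return None
    opcode, device, host, dev, plugin = match.groups()
    opcode, host = int(opcode, 16), int(host, 16)
    return {
        "device": device,
        "command": OPCODES.get(opcode, hex(opcode)),
        "host_status": HOST_STATUS.get(host, hex(host)),
        "device_status": int(dev, 16),
        "plugin_status": int(plugin, 16),
    }
```

Under those assumptions, the entries above are 10-byte writes that were first aborted (H:0x5) and then failed with a host-side reset status (H:0x8), while the device and plugin status fields stayed at zero.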

When my server crashed, the last line in my logs was:

Code:
1:23:18:58.554 cpu7:4103)<6>mptbase: ioc0: LogInfo(0x31110700): Originator={PL}, Code={Reset}, SubCode(0x0700)


Don't know if it could be related or not...
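
The two LogInfo words in this post (0x31130000 and 0x31110700) can be split into their fields to compare them. The sketch below assumes the ESXi mptbase driver packs the word the same way as the Linux mptbase driver does for SAS: bus type in the top 4 bits, then originator, then an 8-bit code, then a 16-bit subcode. The Originator and SubCode values the driver printed are consistent with that layout, but treat it as an assumption. `decode_loginfo` is a hypothetical helper name.

```python
# Originator names as used by the Linux mptbase driver for SAS LogInfo (assumed to apply here).
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_loginfo(value):
    """Split a 32-bit mptbase SAS LogInfo word into its assumed bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,       # bits 31-28
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),  # bits 27-24
        "code": (value >> 16) & 0xFF,          # bits 23-16
        "subcode": value & 0xFFFF,             # bits 15-0
    }
```

With this layout both words have the same bus type and originator (PL) and differ in the code byte (0x13 for the earlier entry, 0x11 for the one before the crash) and in the subcode.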


Wed May 18, 2011 11:59 pm
Site Admin

Joined: Mon Mar 16, 2009 10:13 pm
Posts: 3880
Might be worth running a diagnostics app on the server (especially the RAID controller).



Thu May 19, 2011 12:15 am