s3420gplx saga - fakeraid for ESXi 4.1

Joined: Wed Mar 09, 2011 3:14 am
Posts: 5
Post s3420gplx saga - fakeraid for ESXi 4.1
Hi folks,

I am a newcomer to ESXi with no experience in the area ;)

I am currently looking to rebuild my home server with ESXi to allow more flexibility in supporting guest OS environments. Hardware spec as follows:
    Intel Server Board S3420GPLX
    Intel Xeon X3440 CPU
    16GB RAM - KVR1066D3Q8R7SK2/16GI
    Intended ESXi OS drives: 2 x Hitachi 500GB Travelstar 7K500 drives
    HBA card 1: LSI SAS 9201-16i
    HBA card 2: IBM ServeRAID M1015
    Non-OS disks: 10 x Hitachi 2TB Deskstar 5K3000 drives

I have a friend with identical hardware, so the plan is to do this twice over.

A bit of research was done to try to ensure "everything is on the HCL". My first experiment installing ESXi 4.1 revealed it's not that simple. Firstly, one of the onboard Intel NICs (the 82578DM) is not supported :evil: The second challenge was the Intel ESRT2 fakeraid; with more research I would have realized that ESXi does not support any fakeraid drivers.

For the first challenge, following guidance from the forum here, I found a modified e1000 driver that was suggested to support this chipset, identified the PCI IDs, and built my pci.ids and simple.map files. Wanting to understand how this all works, I used ESXi to spin up an Ubuntu environment, built oem.tgz there, and deconstructed the ESXi 4.1 ISO. I don't plan to run ESXi from a USB stick, which is what most of the 'guides' seemed to be aimed at. To rebuild a new ISO with the NIC driver, I did the following:
    mount the ISO
    edit isolinux.cfg at the root of the ISO to include oem.tgz after the "append vmkboot" text
    copy oem.tgz into the root of the ISO
    extract imagedd (using bunzip2)
    mount the imagedd image
    copy oem.tgz into the root of the mounted imagedd image
    unmount the imagedd image
    re-compress imagedd with bzip2
    re-create the ISO using mkisofs
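Step 2 is the only text edit in that list. Below is a minimal C sketch of it, assuming (check your own isolinux.cfg) that the installer's append line is an mboot.c32-style module list separated by " --- " and that oem.tgz simply goes at the end of that line. add_boot_module is a hypothetical helper, not part of any ESXi tooling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append " --- <module>" to an isolinux.cfg "append"
 * line. Assumes an mboot.c32-style module list separated by " --- ".
 * Returns 0 on success, -1 if the line is not an append line or the
 * output buffer is too small.
 */
static int add_boot_module(const char *line, const char *module,
                           char *out, size_t out_len)
{
    assert(line != NULL && module != NULL && out != NULL);

    /* Skip indentation only for the check; the output keeps it. */
    const char *p = line;
    while (*p == ' ' || *p == '\t')
        p++;
    if (strncmp(p, "append ", 7) != 0)
        return -1;

    /* Stop before any line ending so the module stays on the same line. */
    size_t len = strcspn(line, "\r\n");
    int n = snprintf(out, out_len, "%.*s --- %s", (int)len, line, module);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}
```

Fed the line "append vmkboot.gz --- vmkernel.gz", it produces "append vmkboot.gz --- vmkernel.gz --- oem.tgz"; lines that are not append lines return -1 and would be copied through unchanged.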

I then burnt a new CD with this ISO and now have a re-usable installer with the missing NIC driver - and it works :D
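For the simple.map file mentioned earlier, here is a hedged C sketch of producing one line. The layout assumed (vendor:device subvendor:subdevice class driver, with 0000:0000 matching any subsystem) should be checked against the stock simple.map before relying on it. 8086:10ef is the ID the Linux e1000e driver lists for the 82578DM, but confirm it on your own board; format_simple_map_line is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one simple.map line.
 * Assumed layout: "vvvv:dddd ssss:ssss class driver", with 0000:0000
 * as the any-subsystem wildcard. Returns 0 on success, -1 on error.
 */
static int format_simple_map_line(unsigned vendor, unsigned device,
                                  unsigned subvendor, unsigned subdevice,
                                  const char *dev_class, const char *driver,
                                  char *out, size_t out_len)
{
    assert(dev_class != NULL && driver != NULL && out != NULL);

    /* Fields are space-separated, so names must not contain whitespace. */
    if (strpbrk(dev_class, " \t\r\n") != NULL || strpbrk(driver, " \t\r\n") != NULL)
        return -1;

    /* PCI IDs are 16-bit values, printed as four lowercase hex digits. */
    int n = snprintf(out, out_len, "%04x:%04x %04x:%04x %s %s\n",
                     vendor & 0xffffu, device & 0xffffu,
                     subvendor & 0xffffu, subdevice & 0xffffu,
                     dev_class, driver);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```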

The next challenge: fakeraid. For my purposes, I am only looking to run RAID 1 (mirror) with no write-back cache in a home environment. I consider this safer than running on a single disk with no mirror. I don't have any PCI slots readily available for more cards, so a dedicated RAID controller is a challenge. I have looked at the 'cheat' 2-SATA-in, 1-SATA-out RAID solutions but don't see these as ideal either.

So... part 2: drivers for fakeraid. First, I looked to see whether anyone else had succeeded here; from what I can see, nobody has. I then looked at two options: 1) compile the generic Linux dmraid driver, or 2) compile the LSI / Intel drivers.

A lot of mystery seems to surround compiling drivers for ESXi. Most guides are for earlier versions of ESXi (a fair bit has changed) or assume more knowledge than I can claim. Credit where it is due, though: the guide from kernelcrash was probably the most helpful for getting ideas on where to start. http://www.kernelcrash.com/blog/using-a-marvell-lan-card-with-esxi-4/2009/08/22/

The first step was to spin up a CentOS 5.5 environment in another VM. My limited understanding is that this is considered the "better" environment for ESXi 4.1 driver compilation. The ISO download and install felt a bit slow after Ubuntu; I only went with the 'standard' options...

Next we need the ESXi OSS (open source) bundle; for 4.1 I found this hosted in 3 parts. These were downloaded, moved onto the CentOS environment, and extracted. Inside these packages there is a folder vmkdrivers-4.1 containing vmkdrivers-gpl.tgz, which is where all the 'stock' ESXi driver code resides. This too was extracted into a new folder.

Inside this package is a kernel source RPM that we apparently need. I found it at the same level as the drivers, named kernel-sourcecode-410.2.6.18-164.0.0.253625.x86_64.rpm

Not really knowing what else was required, and wishing there were a "yum install every possible thing I could need", I installed pretty much every package mentioned in every guide I could find, mostly as a result of repeated failures to compile the stock drivers (and the new drivers). Packages I installed include: qt-devel, gtk2-devel, gcc, kernel-devel, readline-devel, ncurses-devel, libevent-devel and the 'Development Tools' group.

The good news is that I was then able to compile the standard drivers without any errors, using the build script build-vmkdrivers.sh in the drivers folder.

I had a quick attempt at the dmraid drivers but found a few conflicts: the "old" dmraid drivers won't compile against the newer 2.6 kernel (at least as far as I can tell). I grabbed the latest dmraid driver and found that, while it was happy with the kernel, it requires a library that appears to be available only in a later revision of the LVM2 package than the one CentOS ships. I threw in the towel on dmraid at this point :)

Now the fun part: compiling "new" code. I moved on to the Intel drivers from Intel's site (which are really LSI drivers). It's a smaller package than dmraid, which I hoped would make it a bit easier. These were downloaded and extracted into drivers/vmkdrivers/src_current/drivers/scsi/<your folder>. I then made a copy of the build script, picked a driver I hoped would be close to mine (another LSI RAID driver), and cut out all the lines not relating to that driver. Once done, I cloned the per-.c-file records to match the number of .c files I had (3) and updated the driver number, name, paths and filenames. For the link command I also added the precompiled archive file for Red Hat 5 64-bit (the closest match I could guess); unfortunately Intel and LSI don't provide all of the source code.

Running the cut-down script, two things were apparent: first, a header, ioctl32.h, was missing; second, one of the .c files had major compile errors.

ioctl32.h is not included in ESXi. Looking through the code, it actually referenced two different versions, a linux one and an asm one, and based on the #if checks the asm one was the one we wanted. This looked like an old reference, since ioctl32.h no longer exists in CentOS. I found that the asm header effectively pointed to a compatibility ioctl.h, so I put that into the include path for compilation. This resolved the references to this header with no further issues.

As for the major compile error, it looks like the driver did not anticipate an environment with a Linux 2.6 kernel and a VM Module define set at the same time. The code suggested that the 2.6 path would give us what was needed, so I made a small edit to the build script to remove the VM Module define.
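The driver source isn't quoted here, so the following is only a hypothetical C model of the conflict, written as a runtime function so it can be compiled and tested. If the VM Module test comes first in the #if chain, it wins even on the 2.6.18 ESXi kernel source; clearing it lets the 2.6 branch be chosen. KERNEL_VERSION uses the same encoding as Linux's <linux/version.h>; select_path, the enum names and the order of the checks are assumptions.

```c
#include <assert.h>

/* Same encoding as KERNEL_VERSION in Linux's <linux/version.h>. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical names for the driver's three compile-time branches. */
enum code_path { PATH_VMKERNEL_MODULE, PATH_LINUX_26, PATH_LINUX_24 };

/*
 * Hypothetical model of an #if / #elif chain where the VM Module test
 * comes first: with both conditions true, the VM branch always wins.
 */
static enum code_path select_path(int linux_version_code, int vm_module_defined)
{
    assert(linux_version_code >= 0);
    if (vm_module_defined)
        return PATH_VMKERNEL_MODULE;
    if (linux_version_code >= KERNEL_VERSION(2, 6, 0))
        return PATH_LINUX_26;
    return PATH_LINUX_24;
}
```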

Running the build again, there is now only a single warning, and we get a 4.6 MB megasr.o file; from my experimenting, most of the size comes from the precompiled archive.

To test this out, I followed advice on the forum here: I simply dropped the .o file into /usr/lib/vmware/vmkmod/ and ran vmkload_mod megasr.o debug=10.

Now the issue is that I am getting "Unresolved symbols". I have looked through the message log and every single one relates to a subset of the custom functions from the driver.

At this point Google is finally failing me, so I am hoping someone can offer some guidance on how to work through this. If nothing else, it has been a great learning experience; I'm hoping to find a solution.

Thoughts / suggestions most welcome!

Cheers,

Gmk2


Wed Mar 09, 2011 4:12 am

Joined: Wed Mar 09, 2011 3:14 am
Posts: 5
Post Re: s3420gplx saga - fakeraid for ESXi 4.1
The errors in the message log contain multiple instances of the following:

Code:
WARNING: Elf: 1742: Relocation of symbol <megasr_printk> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <oss_memcpy> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <oss_memcmp> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <osl_alloc_mem_for_fw_download> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <oss_rescan> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <oss_memmove> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <osl_get_adapter> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <oss_stall_execution> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <lsraid_async_queue> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <oss_read_register_ulong> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <oss_stall_execution_msleep> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <megasr_raid1_double_buffer> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <megasr_adapter_class> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <oss_create_timer> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <oss_claim_resource> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <oss_get_os_time> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <osl_free_mem_from_fw_download> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <oss_set_device_queue_depth> failed: Unresolved symbol
WARNING: Elf: 1742: Relocation of symbol <oss_destroy_timer> failed: Unresolved symbol
WARNING: Elf: 3064: Kernel based module load of megasr failed: Unresolved symbol <ElfRelocateFile failed>
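
Collecting the symbol names from those log lines gives a to-do list to work through. A minimal C sketch, assuming every relevant line follows the "Relocation of symbol <name> failed" pattern above; extract_unresolved_symbol is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy the name between "Relocation of symbol <" and
 * the next '>' into out. Returns 1 if a name was found, 0 otherwise
 * (including lines that follow a different pattern).
 */
static int extract_unresolved_symbol(const char *line, char *out, size_t out_len)
{
    static const char marker[] = "Relocation of symbol <";
    assert(line != NULL && out != NULL && out_len > 0);

    const char *start = strstr(line, marker);
    if (start == NULL)
        return 0;
    start += sizeof marker - 1;   /* first character after '<' */

    const char *end = strchr(start, '>');
    if (end == NULL || end == start)
        return 0;

    size_t len = (size_t)(end - start);
    if (len >= out_len)
        return 0;                 /* name would not fit */
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

The final ElfRelocateFile line does not match the pattern and returns 0, so it drops out of the list automatically.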


A bit more research suggests that the issue may be that the functions in the precompiled (no source available) archive are not being matched to the equivalent functions in the newly compiled code, possibly because of:
    the precompiled library using an explicit kernel version compatibility option
    the precompiled library using an explicit license option
    something else?

I would hope there is no explicit kernel version compatibility check in the precompiled code; if there is, I don't know how you would resolve it.

I have not touched the license declarations; in fact, only one of the three source files in the Intel / LSI library declares a license. I would not expect this to be an issue.

Since the compile and link completed cleanly, maybe there is another linker option I can use to identify the cause of the problem?


Thu Mar 10, 2011 1:55 pm

Joined: Tue Nov 17, 2009 6:10 am
Posts: 83
Post Re: s3420gplx saga - fakeraid for ESXi 4.1
Hi,

Software RAID-1 in ESXi is a joke!

The only options to be safe are:

1) Hardware RAID-1 HBA --> BEST!
2) Hardware RAID-1 over SATA interposer (like external 2xHD enclosure with one eSATA port) --> FAILS WHEN INTERPOSER CONFIG LOST!
3) Single USB pendrive with secondary backup --> RELIABLE!
4) Hardware RAID-1 SD card reader --> EXPENSIVE!

In my opinion, if you can't use the first one, option 3 is the best: you can take a backup (a dd of the full pendrive) while the ESXi server is up and write it to a second pendrive (using another computer to write the image). The only drawback is that you need to swap the pendrive manually if the boot fails (the backup itself can be scheduled automatically).
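The backup in option 3 is a raw image copy, normally done with something like dd if=<pendrive device> of=backup.img bs=64k. As a minimal C sketch of what that copy loop does (copy_image is a hypothetical name; the streams would be opened on the device and the image file):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical dd-style copy: read fixed-size blocks from one stream and
 * write them to another until EOF. Returns the number of bytes copied,
 * or -1 on a read or write error.
 */
static long long copy_image(FILE *in, FILE *out, size_t block_size)
{
    char buf[1 << 16];
    assert(in != NULL && out != NULL);
    if (block_size == 0 || block_size > sizeof buf)
        block_size = sizeof buf;   /* clamp to the buffer, like a capped bs= */

    long long total = 0;
    size_t got;
    while ((got = fread(buf, 1, block_size, in)) > 0) {
        if (fwrite(buf, 1, got, out) != got)
            return -1;
        total += (long long)got;
    }
    return ferror(in) ? -1 : total;
}
```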

Regards!


Fri Mar 11, 2011 1:38 am

Joined: Wed Mar 09, 2011 3:14 am
Posts: 5
Post Re: s3420gplx saga - fakeraid for ESXi 4.1
Appreciate the reply.

There are certainly limits to the applicability of the software / fakeraid approach.

I have worked through a number of other options, and they all have their pros and cons. For my specific home use, a software / fakeraid 1 with write-back cache disabled looks to be as safe as, or safer than, the other equally available options (provided a working driver can be built and tested!).

If nothing else it is proving an interesting puzzle to solve.

-- EDIT --

I found the linker was not including the source correctly. Fixing this removed the unresolved symbol errors I was getting and created some new ones. The new symbols don't appear to be custom functions (e.g. memmove). Maybe something in the compile environment is stopping these from being resolved; memmove is declared in vmk's string.h, so I would have thought it knew what to do with it.


Fri Mar 11, 2011 5:21 pm
Site Admin

Joined: Mon Mar 16, 2009 10:13 pm
Posts: 3875
Post Re: s3420gplx saga - fakeraid for ESXi 4.1
I'm not a driver compiling guy so I don't have much to offer on this. I appreciate all the detail you've posted and I hope you get it worked out.

_________________
Dave Mishchenko
VMware vExpert 2009-2012
Now available - VMware ESXi: Planning, Implementation, and Security
Also available - vSphere Quick Start Guide


Fri Mar 11, 2011 11:49 pm

Joined: Wed Mar 09, 2011 3:14 am
Posts: 5
Post Re: s3420gplx saga - fakeraid for ESXi 4.1
Having resolved the original compile issue, the unresolved symbols all seemed to relate to "standard" linux functions (like memmove) or vm functions common with other modules.

From what I can guess, the ESXi kernel does not provide common functions to the extent a typical Linux kernel does, so dynamic linking doesn't work as smoothly. From this perspective, it seems I need to get all the functions resolved before I can even see whether the driver works.

I tried linking in strings.o to get memmove to resolve, to no avail. I then made a cut-down custom file containing only the strings.c includes, the memmove function, and its EXPORT_SYMBOL line. After compiling and linking this object, I now see memmove as a defined function in the driver.
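A minimal sketch of such a cut-down file follows. It is an illustrative overlap-safe memmove, not the code from vmklinux's strings.c, and it is renamed sketch_memmove (hypothetical) so it can be compiled against an ordinary C library; the EXPORT_SYMBOL fallback is a stand-in for the real macro from the vmklinux headers. In a real module build the function would be named memmove and compiled with GCC's -ffreestanding or -fno-builtin, so the compiler does not turn the loop back into a call to memmove.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Outside the vmklinux headers EXPORT_SYMBOL does not exist, so use a
 * no-op stand-in that still accepts the trailing semicolon (it expands
 * to an incomplete struct declaration).
 */
#ifndef EXPORT_SYMBOL
#define EXPORT_SYMBOL(sym) struct export_symbol_placeholder_##sym
#endif

/* Overlap-safe byte copy; named sketch_memmove so it cannot clash with libc. */
void *sketch_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    assert(n == 0 || (dest != NULL && src != NULL));

    if (n == 0 || d == s)
        return dest;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination starts before source: a forward copy never overwrites unread bytes. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Destination starts after source: copy backwards for the same reason. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dest;
}
EXPORT_SYMBOL(sketch_memmove);
```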

So the challenge is to understand why linking the compiled strings.o did not produce the same outcome. If I cannot figure that out, I am down to pulling out the missing functions one at a time and adding them to my custom package to get this to build.

Ideas welcome - has anyone had a go at this? Would be great to get some input.


Wed Mar 16, 2011 2:01 am

Joined: Wed Mar 09, 2011 3:14 am
Posts: 5
Post Re: s3420gplx saga - fakeraid for ESXi 4.1
Ok, bit more crawling through the code...

Digging through the source (VMware's, not the driver's), I found a reference to a change from ESXi 4.0 to ESXi 4.x whereby they stopped exporting all symbols. The file vmkdrivers/src_v4/vmklinux26/vmware/linux_exports.c makes the following statement:
Code:
/*
 * In the 4.0 release, the loader exports all the "extern" symbols that
 * vmklinux has. In 4.x, the loader only exports symbols that are
 * explicitly tagged via the use of the macro EXPORT_SYMBOL(). In order
 * to maintain binary compatibility, those "extern" * symbols that are
 * not tagged are tagged in this file.
 *
 * This file only contains the EXPORT_SYMBOL() calls.
 */

Symbols tagged with EXPORT_SYMBOL are then handled by vmkdrivers/src_v4/include/linux/module.h:
Code:
#define EXPORT_SYMBOL(sym)              VMK_MODULE_EXPORT_SYMBOL(sym)

This eventually leads to BLD/build/HEADERS/vmkapi-current/vmkernel64/release/base/vmkapi_module_int.h:
Code:
#define __VMK_MODULE_EXPORT_SYMBOL_ALIASED(__symname, __alias)          \
   static char __vmk_symbol_str_##__symname##__alias[] = #__symname;    \
   static char __vmk_symbol_str_##__alias[] = #__alias;                 \
   static struct vmk_ExportSymbolAlias const                            \
   __vmk_symbol_##__symname##__alias                                    \
   __VMK_EXPORT_ALIAS_ATTRS                                             \
   = {                                                                  \
      .name = __vmk_symbol_str_##__symname##__alias,                    \
      .alias = __vmk_symbol_str_##__alias,                              \
   };
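
To see what the quoted macro actually produces, here is a cut-down, compilable copy. It keeps the same token pasting but drops __VMK_EXPORT_ALIAS_ATTRS (whose definition isn't quoted) and uses toy_ names; the alias vmk_printk and the toy_alias_for helper are hypothetical. In this form an export is a small static record of strings rather than a global symbol called printk, which is consistent with the suspicion that follows, though the toy cannot show how the ESXi loader actually consumes these records.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for struct vmk_ExportSymbolAlias: same two fields. */
struct toy_ExportSymbolAlias {
    const char *name;
    const char *alias;
};

/*
 * Same token pasting as __VMK_MODULE_EXPORT_SYMBOL_ALIASED, minus
 * __VMK_EXPORT_ALIAS_ATTRS. Parameter names must differ from the field
 * names, or ".alias" would be substituted too.
 */
#define TOY_EXPORT_SYMBOL_ALIASED(symname, aliasname)                  \
   static char toy_symbol_str_##symname##aliasname[] = #symname;       \
   static char toy_symbol_str_##aliasname[] = #aliasname;              \
   static struct toy_ExportSymbolAlias const                           \
   toy_symbol_##symname##aliasname = {                                 \
      .name = toy_symbol_str_##symname##aliasname,                     \
      .alias = toy_symbol_str_##aliasname,                             \
   };

/* Expands to three statics; the record is toy_symbol_printkvmk_printk. */
TOY_EXPORT_SYMBOL_ALIASED(printk, vmk_printk)

/* Return the alias stored in rec if it records `name`, else NULL. */
static const char *toy_alias_for(const struct toy_ExportSymbolAlias *rec,
                                 const char *name)
{
    assert(rec != NULL && name != NULL);
    return strcmp(rec->name, name) == 0 ? rec->alias : NULL;
}
```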

As a result, my suspicion now is that things are not linking because my driver is looking for:
Code:
  7302: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND printk

while the VMware environment provides:
Code:
  743: 00000000000016b0    16 OBJECT  LOCAL  DEFAULT    9 __vmk_symbol_printk

The catch is that some of these driver references are in a precompiled library from LSI, for which no source is available. Most of these unresolved symbols look like standard C library functions, and including the C library makes no difference.

I guess I need to find a way to alias the VMware functions, or am I still missing a header?
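One way to test that suspicion is to pair each undefined GLOBAL symbol in the driver with a __vmk_symbol_<name> record on the VMware side. A minimal C sketch that classifies single lines of readelf -Ws output (-W stops readelf truncating long names); the function and enum names are hypothetical, and the column layout assumed is the one in the two dumps above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum sym_role { SYM_OTHER, SYM_NEEDED, SYM_EXPORT_RECORD };

/*
 * Hypothetical helper: classify one line of `readelf -Ws` output.
 *   SYM_NEEDED        : GLOBAL symbol whose Ndx is UND (the module wants it)
 *   SYM_EXPORT_RECORD : an object named __vmk_symbol_<name>
 * On success the bare symbol name is copied into out.
 */
static enum sym_role classify_readelf_line(const char *line, char *out, size_t out_len)
{
    static const char prefix[] = "__vmk_symbol_";
    char type[16], bind[16], vis[16], ndx[16], name[128];
    assert(line != NULL && out != NULL && out_len > 0);

    /* Columns: Num: Value Size Type Bind Vis Ndx Name */
    if (sscanf(line, " %*d: %*s %*s %15s %15s %15s %15s %127s",
               type, bind, vis, ndx, name) != 5)
        return SYM_OTHER;

    const char *base = NULL;
    enum sym_role role = SYM_OTHER;
    if (strcmp(bind, "GLOBAL") == 0 && strcmp(ndx, "UND") == 0) {
        role = SYM_NEEDED;
        base = name;
    } else if (strncmp(name, prefix, sizeof prefix - 1) == 0) {
        role = SYM_EXPORT_RECORD;
        base = name + sizeof prefix - 1;   /* strip the prefix */
    }
    if (role == SYM_OTHER || strlen(base) >= out_len)
        return SYM_OTHER;
    strcpy(out, base);   /* length checked above */
    return role;
}
```

Running it over the driver's dump collects the needed names, and over vmklinux's dump collects the record names; any needed name without a matching record is still unresolved. Both sample lines above reduce to printk.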


Mon Mar 28, 2011 2:49 am
Profile
Site Admin

Joined: Mon Mar 16, 2009 10:13 pm
Posts: 3875
Post Re: s3420gplx saga - fakeraid for ESXi 4.1
How have you made out with this?



Thu Apr 07, 2011 11:13 pm