ESXi Issues caused by hp-ams module

by Florian Grehl
December 15, 2014 | December 26, 2014

I recently had strange issues with Hewlett-Packard servers. ESXi hosts randomly showed several different symptoms:

ESXi host unmanageable
ESXi host grayed out in vCenter
Starting host services fails with an error message:
  Call "HostServiceSystem.Restart" for object "serviceSystem-[*]" on vCenter Server * failed.
Cannot perform vMotion to or from the host
Starting a virtual machine fails with an error message:
  Power On virtual machine *
  A general system error occurred: The virtual machine could not start
  VMK_NO_MEMORY
Restarting services in the DCUI fails:
  A general system error occurred: Command /bin/sh failed
SSH connection to the host is possible, but there is no response after the login request
Local console displays an error message:
  /bin/sh cannot fork
Error messages received at the syslog server:
  sfcb-HTTPS-Daemon[*]: handleHttpRequest fork failed: Cannot allocate memory
  crond[*]: can't vfork
  cpu*:*)WARNING: Heap: *: Heap_Align(globalCartel-1, 136/136 bytes, 8 align) failed.
  cpu*:*)WARNING: Heap: *: Heap globalCartel-1 already at its maximum size. Cannot expand.
DCUI message log (ALT+F12) displays an error message:
  WARNING: Heap: *: Heap globalCartel-1 already at its maximum size. Cannot expand.
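
Of the symptoms above, the heap warning is the easiest one to search for in collected logs. The following is a minimal sketch, not a tested monitoring solution: the function name `count_heap_exhaustion` is hypothetical, and the log file path is whatever your syslog server or host uses.

```shell
# Count log lines reporting that the globalCartel-1 heap hit its maximum size.
# Usage: count_heap_exhaustion LOGFILE   (LOGFILE is an assumption; pass your own path)
count_heap_exhaustion() {
    # -F: match the text literally, -c: print the number of matching lines
    grep -cF 'Heap globalCartel-1 already at its maximum size' "$1"
}
```

A count greater than zero suggests the host has hit the issue. Running this against the central syslog server is likely more practical than running it on an affected host, since an affected host may no longer be able to fork new processes.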

The problem was caused by the hp-ams module (HP Agentless Management Service) which has a known problem in these versions:

hp-ams 9.5
hp-ams 9.6
hp-ams 10.0

You can verify the version with the following command:

# esxcli software vib list | grep hp-ams
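
To turn that check into a yes/no answer, the version can be compared against the affected releases listed above. This is only a sketch under stated assumptions: the function names are hypothetical, and it assumes the VIB version column has the shape `<ESXi prefix>.<hp-ams release>-<build>` (for example `550.10.0.0-18.1198610`). Verify that shape against your own output before relying on it. Treating point releases such as 9.6.x as affected is also an assumption.

```shell
# Print the version column of the hp-ams line; expects the output of
# "esxcli software vib list" on stdin.
hp_ams_vib_version() {
    awk '$1 == "hp-ams" { print $2; exit }'
}

# ASSUMPTION: version looks like "<esxi-prefix>.<release>-<build>".
# Strips the build suffix and the first dotted component: 550.10.0.0-18.x -> 10.0.0
hp_ams_release() {
    v=${1%%-*}
    printf '%s\n' "${v#*.}"
}

# Return 0 if the release string is one of the affected versions (9.5, 9.6, 10.0).
hp_ams_affected() {
    case "$1" in
        9.5|9.5.*|9.6|9.6.*|10.0|10.0.0|10.0.0.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On a host, this could be used as `hp_ams_affected "$(hp_ams_release "$(esxcli software vib list | hp_ams_vib_version)")" && echo "affected hp-ams version"`.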

The issue has been resolved in hp-ams 10.0.1, which can be downloaded from the HP website.

If you cannot upgrade the server immediately due to change management processes, you can also mitigate the issue by stopping the hp-ams service and removing the package:

# /etc/init.d/hp-ams.sh stop 
# esxcli software vib remove -n hp-ams
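
If the mitigation has to be rolled out to several hosts, it can help to wrap the two commands so they can be previewed first. This is a hedged sketch: the helper names `run` and `mitigate_hp_ams` and the `DRY_RUN` variable are hypothetical, and the wrapper only prints the commands unless `DRY_RUN=0` is set.

```shell
# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop the hp-ams service, then remove the package.
# The removal is skipped if stopping the service fails.
mitigate_hp_ams() {
    run /etc/init.d/hp-ams.sh stop || return 1
    run esxcli software vib remove -n hp-ams
}
```

Calling `mitigate_hp_ams` prints both commands; `DRY_RUN=0 mitigate_hp_ams` executes them.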

Tags: HP, Issue

10 thoughts on “ESXi Issues caused by hp-ams module”

Pavel December 15, 2014 at 10:42 pm
Yeah! I got this problem on my HP DL360 servers with ESXi 5.5. Thanks for the solution.
Angel December 15, 2014 at 11:49 pm
Interesting.. last Friday I experienced exactly the same problem.
I am still puzzled about what could be causing this bug. I had ESX hosts with hp-ams 9.6 running for half a year without issues.
  fgrehl December 16, 2014 at 7:10 am
  Me too... Some die after 4-5 weeks, some are still alive after months. Maybe something or someone triggers the hp-ams service in a way that causes the issue.
Dean December 16, 2014 at 3:32 pm
It's caused by a memory leak that fills the RAM allocation on the ESXi host, so the host cannot respond to requests. Everything else is just a false alarm, to be honest.
If you run esxcli on the host, you will find it gives the "cannot fork" message.
I wrote a post on this a while ago.
I would recommend adding the HP repository to your Update Manager download sources to capture the latest HP customized drivers.
http://www.educationalcentre.co.uk/automatically-download-hp-drivers-to-vmware-update-manager/
Mark January 16, 2015 at 3:03 pm
Thanks for this post. I'm not sure the RAM usage has anything to do with it. For example, I've got a host with 213GB out of 255GB used (MemoryUsageGB from Get-VMHost in PowerCLI) and starting SSH works, whereas other hosts at a lower memory usage level (180GB, for example) cannot start SSH. Are you looking at another metric when you mention a RAM leak? Thanks
  fgrehl January 16, 2015 at 6:10 pm
  The physical memory usage has nothing to do with this issue. The problem can happen with 180GB free physical memory.
  There is a limit for the vmkernel. I haven't figured out what the limit is or how to measure it.
    Mark January 20, 2015 at 8:29 pm
    Many thanks. Until we refine it with something more specific, this seems to at least tell us when a host has hit the issue (splunk search - "esx5_syslog" "globalCartel-1 "). Update here if I find a way to measure.
      fgrehl January 20, 2015 at 8:44 pm
      Don't forget that we have a fix for that issue ;-)
        Mark January 21, 2015 at 2:35 pm
        There is always a fix. The question is exactly when to implement it (for example, wait until a new release that includes the fix). I get you though. FYI, the first response from sneddo here - https://communities.vmware.com/message/2468620#2468620 works well as a report. The script from HP is good for the console, but we needed report-type info.
Peter January 30, 2015 at 11:05 am
Hi!
Does this problem occur on all G6/G7 or Gen8 servers?
