VMware: Reconnecting “Disconnected” VMware ESX Server to Virtual Infrastructure (vpx,vmware-hostd)

A lot of backups failed overnight, and this was my first clue to a problem in the virtual environment. I checked the monitoring server: there were no down VMs and no alerts coming from the hosts. After logging into the management system with the VMware Infrastructure Client, I saw that all the virtual machines were running, but about 30 of them showed as “disconnected”. These VMs were greyed out and could not be managed. There was also a connection-state alarm on our second VMware ESX server.

Image of the Infrastructure Client showing VMs in the disconnected state. VM names have been replaced with underlines.

View when the malfunctioning ESX server is selected

Below is a snippet of the log showing the error from vpxa:

[2008-10-27 16:41:39.482 'App' 3076446112 verbose] [SchedulePolling] Last stats polling used [0] ms
[2008-10-27 16:41:39.482 'App' 3076446112 verbose] [VpxaHalCnxHostagent] Creating temporary connect spec: localhost:443
[2008-10-27 16:41:39.483 'App' 3076446112 error] [VpxaHalCnxHostagent] Failed to discover namespace: Connection refused
[2008-10-27 16:41:39.483 'App' 3076446112 warning] [VpxaHalCnxHostagent] Could not resolve namespace for authenticating to host agent
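
vpxa was trying to reach the host agent on localhost:443 and getting "Connection refused", which usually means nothing is listening on that port. A minimal sketch for confirming that from the service console, assuming the net-tools netstat is installed (common on Red Hat-style consoles, but check your ESX build). is_listening is a hypothetical helper, not an ESX tool: it reads `netstat -tln` output on stdin and succeeds only if a socket in the LISTEN state has a local address ending in the given port.

```shell
#!/bin/sh
# is_listening: hypothetical helper, not part of ESX.
# Usage: netstat -tln | is_listening PORT
# Returns 0 if a TCP socket in the LISTEN state has a local address
# ending in ":PORT", otherwise returns 1.
is_listening() {
    # In "netstat -tln" output, column 4 is the local address
    # (e.g. 0.0.0.0:443 or :::443) and column 6 is the state.
    awk -v port="$1" '$6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
                      END { exit (found ? 0 : 1) }'
}

# Usage on the service console:
#   if netstat -tln | is_listening 443; then
#       echo "something is listening on port 443"
#   else
#       echo "nothing is listening on port 443"
#   fi
```

The port is anchored at the end of the address, so a listener on 4430 is not mistaken for one on 443.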
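
Comparing the vmware-* process listing with that of a working ESX server showed the difference: on the working server vmware-hostd was running, but on the server having problems it was not. That fits the log above, where vpxa was refused when it tried to reach the host agent on localhost:443. Here is the listing from the problem server: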

# ps auxc | grep vmware
root      2179  0.0  0.0  4256  228 ?        S    Sep29   0:00 vmware-watchdog
root      2823  0.0  0.1  4268  444 ?        S    Sep29   0:00 vmware-watchdog
root      2907  0.0  0.2  4256  568 ?        S    Sep29   0:00 vmware-watchdog
root      5801  0.0  0.4  4204 1144 ?        S    16:41   0:00 vmware-watchdog
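
Rather than comparing listings by eye, the check can be scripted. This is a minimal sketch, not a VMware tool: hostd_pid is a hypothetical helper that reads `ps auxc` output on stdin (the listing above, written without the leading dash that newer procps versions warn about) and prints the PID of vmware-hostd if one is listed. It assumes the BSD-style column layout shown above, where the PID is the second column and, because of the `c` flag, the last column is the bare command name.

```shell
#!/bin/sh
# hostd_pid: hypothetical helper, not part of ESX.
# Reads "ps auxc" output on stdin. Prints the PID of every
# vmware-hostd line and returns 0 if at least one was found;
# otherwise prints nothing and returns 1.
hostd_pid() {
    # Column 2 is the PID; with the "c" flag the last column ($NF)
    # is the command name without its arguments.
    awk '$NF == "vmware-hostd" { print $2; found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Usage on the service console:
#   if pid=$(ps auxc | hostd_pid); then
#       echo "vmware-hostd is running (pid $pid)"
#   else
#       echo "vmware-hostd is not running"
#   fi
```

Matching the whole command-name column avoids false hits on vmware-watchdog, which is always present in these listings.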

At first I could not determine which init.d script starts vmware-hostd. Update: vmware-hostd is started by the /etc/init.d/mgmt-vmware script.

Example

# /etc/init.d/mgmt-vmware status
vmware-hostd (pid 6047) is running

Since the host had 30 virtual machines running fine but unmanageable, we did not want to risk taking the whole system down. We used nohup to start vmware-hostd as a background process so that it would stay alive after the SSH session was disconnected.

Restart vmware-hostd

# /etc/init.d/mgmt-vmware restart

or

# nohup /usr/sbin/vmware-hostd &
[1] 6047
# nohup: appending output to `nohup.out'
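
The nohup form above drops its output into nohup.out in whatever directory the shell happens to be in. If you would rather choose the log location yourself, here is a minimal sketch; start_detached and the log path in the usage comment are hypothetical, not part of ESX. It starts a command under nohup in the background, takes stdin from /dev/null, appends stdout and stderr to a log file, and prints the PID of the new process.

```shell
#!/bin/sh
# start_detached: hypothetical helper, not part of ESX.
# Usage: start_detached LOGFILE COMMAND [ARGS...]
# Runs COMMAND in the background under nohup so it keeps running after
# the SSH session closes, appends its stdout and stderr to LOGFILE, and
# prints its PID.
start_detached() {
    logfile=$1
    shift
    # No terminal on stdin/stdout/stderr, so nohup has nothing to
    # redirect itself and does not create nohup.out.
    nohup "$@" </dev/null >>"$logfile" 2>&1 &
    # nohup execs COMMAND, so $! is COMMAND's own PID.
    echo "$!"
}

# Usage on the service console (log path is only an example):
#   pid=$(start_detached /var/log/hostd-nohup.log /usr/sbin/vmware-hostd)
#   echo "started vmware-hostd as pid $pid"
```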

Re-check the vmware processes:
# ps auxc | grep vmware
root      2179  0.0  0.0  4256  228 ?        S    Sep29   0:00 vmware-watchdog
root      2823  0.0  0.1  4268  444 ?        S    Sep29   0:00 vmware-watchdog
root      2907  0.0  0.2  4256  568 ?        S    Sep29   0:00 vmware-watchdog
root      5801  0.0  0.4  4204 1144 ?        S    16:41   0:00 vmware-watchdog
root      6047 11.0 19.8 70396 53304 pts/2   R    16:54   0:01 vmware-hostd

Connection restored to Virtual Infrastructure server!

I then migrated the virtual machines to the other ESX servers and put server 2 into maintenance mode. The server had generated core dumps, and this is an intermittent problem we have been having with it. I will reinstall VMware ESX on it to see whether that fixes the issue; if not, it is time to troubleshoot the hardware and OS.

Note: this process resolved the issue without taking down the VMs running on the box.


~ by Kevin Goodman on October 27, 2008.

6 Responses to “VMware: Reconnecting “Disconnected” VMware ESX Server to Virtual Infrastructure (vpx,vmware-hostd)”

  1. This saved me hours of hunting, thanks.

  2. Very good description. This information helped immediately and
    saved hours :-)

  3. Thank you ! :-)

  4. Thanks for the description. Definitely helped me.

    One thing to note: the mgmt-vmware restart didn’t work for me, so I had to kill the vmware-hostd process first (ps -ef | grep vmware-hostd, then kill the PID). After that the restart worked beautifully and the connection was restored.

    Thanks once again.

  5. a big thanks

  6. This worked great! Thanks for posting this!

    I had two hosts that would disconnect themselves a couple of minutes after I reconnected them through vSphere. Now they stay connected.

    [root@ESX01 ~]# ps -auxc | grep vmware
    Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ
    root 2741 0.0 0.3 65940 1280 ? S May01 0:00 vmware-watchdog
    root 2865 0.0 0.3 65936 1280 ? S May01 0:00 vmware-watchdog
    root 2904 0.0 0.3 65936 1276 ? S May01 0:00 vmware-watchdog
    root 2924 0.0 0.3 65936 1284 ? S May01 0:00 vmware-watchdog
    root 4693 0.0 0.3 65940 1284 ? S May01 0:00 vmware-watchdog
    root 4700 3.9 19.7 122444 78688 ? Ssl May01 147:04 vmware-hostd
    root 4887 0.0 0.3 65936 1288 ? Ss May01 0:00 vmware-watchdog
    [root@ESX01 ~]# /etc/init.d/mgmt-vmware status
    vmware-hostd (pid 4700) is running…
    [root@ESX01 ~]# /etc/init.d/mgmt-vmware restart
    Stopping VMware ESX Management services:
    VMware ESX Host Agent Watchdog [ OK ]
    VMware ESX Host Agent [ OK ]
    Starting VMware ESX Management services:
    VMware ESX Host Agent (background) [ OK ]
    Availability report startup (background) [ OK ]
    [root@ESX01 ~]# ps -auxc | grep vmware
    Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ
    root 2741 0.0 0.3 65940 1280 ? S May01 0:00 vmware-watchdog
    root 2865 0.0 0.3 65936 1280 ? S May01 0:00 vmware-watchdog
    root 2904 0.0 0.3 65936 1276 ? S May01 0:00 vmware-watchdog
    root 2924 0.0 0.3 65936 1284 ? S May01 0:00 vmware-watchdog
    root 4887 0.0 0.3 65936 1308 ? Ss May01 0:00 vmware-watchdog
    root 15250 0.0 0.2 63844 1192 pts/0 S 12:44 0:00 vmware-watchdog
    root 15256 25.8 9.4 84680 37628 ? Ssl 12:44 0:01 vmware-hostd
