Cisco Nexus 1000v 4.2 Upgrade Problems And Fix


Update


I’m glad Jeff (see comments) caught where I went wrong. After his comment I went back through the n1000v_upgrade_software PDF from Cisco. For version 4.0(4)SV1(3a), the VEMs should have been upgraded first. It is on page 9 of the document.

Upgrading from Release 4.0(4)SV1(3, 3a, or 3b) to Release 4.2(1)SV1(4)

  • Step 1 Upgrading the VEMs: Release 4.0(4)SV1(2, 3, 3a, 3b) to Release 4.2(1)SV1(4), page 20
  • Step 2 Upgrading the VSMs to Release 4.2(1)SV1(4) Using the Upgrade Application, page 33
I was following a procedure lower down in the document that showed upgrading the VSM first and then the VEMs. Looking closer, that procedure is for upgrades that require an intermediate release between the currently running firmware and the newer 4.2(1)SV1(4). Thanks for the catch!


    Original post:

    So last weekend was our Cisco Nexus 1000v upgrade. We were upgrading from 4.0(4)SV1(3a) to 4.2(1)SV1(4). What should have been an easy upgrade really turned into a huge headache. Below is a walk-through of the upgrade process with notes on where things went wrong.

    To start, let's verify the current running state of the VSM (1000v switch) and the VEMs (ESX host modules). Below you will see that they are all on version 4.0(4)SV1(3a). Also note that the standby VSM is module 2; it will be the first one reloaded.

    1000vSW# sh module
    Mod  Ports  Module-Type                      Model              Status
    ---  -----  -------------------------------- ------------------ ------------
    1    0      Virtual Supervisor Module        Nexus1000V         active *
    2    0      Virtual Supervisor Module        Nexus1000V         ha-standby
    3    248    Virtual Ethernet Module          NA                 ok
    4    248    Virtual Ethernet Module          NA                 ok
    5    248    Virtual Ethernet Module          NA                 ok
    6    248    Virtual Ethernet Module          NA                 ok
    7    248    Virtual Ethernet Module          NA                 ok
    8    248    Virtual Ethernet Module          NA                 ok
    9    248    Virtual Ethernet Module          NA                 ok
    
    Mod  Sw                Hw
    ---  ----------------  ------------------------------------------------
    1    4.0(4)SV1(3a)      0.0
    2    4.0(4)SV1(3a)      0.0
    3    4.0(4)SV1(3a)      VMware ESXi 4.1.0 Releasebuild-348481 (2.0)
    4    4.0(4)SV1(3a)      VMware ESXi 4.1.0 Releasebuild-348481 (2.0)
    5    4.0(4)SV1(3a)      VMware ESXi 4.1.0 Releasebuild-348481 (2.0)
    6    4.0(4)SV1(3a)      VMware ESXi 4.1.0 Releasebuild-348481 (2.0)
    7    4.0(4)SV1(3a)      VMware ESXi 4.1.0 Releasebuild-348481 (2.0)
    8    4.0(4)SV1(3a)      VMware ESXi 4.1.0 Releasebuild-348481 (2.0)
    9    4.0(4)SV1(3a)      VMware ESXi 4.1.0 Releasebuild-348481 (2.0)

    The show boot command displays the image location/firmware used at bootup. This will be changed by the upgrade.

    1000vSW# sh boot
    sup-1
    kickstart variable = bootflash:/nexus-1000v-kickstart-mz.4.0.4.SV1.3a.bin
    system variable = bootflash:/nexus-1000v-mz.4.0.4.SV1.3a.bin
    sup-2
    kickstart variable = bootflash:/nexus-1000v-kickstart-mz.4.0.4.SV1.3a.bin
    system variable = bootflash:/nexus-1000v-mz.4.0.4.SV1.3a.bin
    No module boot variable set

    Just like on any other device, make sure there is enough room on the bootflash to store the new firmware.

    1000vSW# dir
    
    Usage for bootflash://
      481083392 bytes used
     1907187712 bytes free
     2388271104 bytes total

    After adding up the sizes of both files, there is enough room on the bootflash to hold the new firmware. I already had the files on a host running SCP, but they can be loaded from any host running one of the following protocols:

  • TFTP
  • FTP
  • SCP
  • SFTP
    The syntax for copying over SCP is as follows:

    copy scp://user01@127.0.0.15/IOS/1000v/nexus-1000v-kickstart-mz.4.2.1.SV1.4.bin bootflash://
    copy scp://user01@127.0.0.15/IOS/1000v/nexus-1000v-mz.4.2.1.SV1.4.bin bootflash://
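    Before installing, it is worth checking the copied images against the MD5 checksums published on Cisco's download page. A minimal sketch, assuming your NX-OS release supports the md5sum option of show file (recent releases do):

    1000vSW# show file bootflash:nexus-1000v-kickstart-mz.4.2.1.SV1.4.bin md5sum
    1000vSW# show file bootflash:nexus-1000v-mz.4.2.1.SV1.4.bin md5sum

    If a hash does not match, re-copy the file before going any further.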

    From the active VSM (module 1 in this case), the new firmware is installed and the boot variables are updated via the install all command.

    1000vSW# install all system bootflash:nexus-1000v-mz.4.2.1.SV1.4.bin kickstart bootflash:nexus-1000v-kickstart-mz.4.2.1.SV1.4.bin
    System image sync to standby is in progress...
    System image is synced to standby.
    Kickstart image sync to Standby is in progress...
    Kickstart image is synced to standby.
    Boot variables are updated to running configuration.

    The boot variables were updated per the output above, but it is still worth verifying. Below shows that the running configuration was updated to load the new firmware.

    1000vSW# sh running-config | inc boot
    boot kickstart bootflash:/nexus-1000v-kickstart-mz.4.2.1.SV1.4.bin sup-1
    boot system bootflash:/nexus-1000v-mz.4.2.1.SV1.4.bin sup-1
    boot kickstart bootflash:/nexus-1000v-kickstart-mz.4.2.1.SV1.4.bin sup-2
    boot system bootflash:/nexus-1000v-mz.4.2.1.SV1.4.bin sup-2

    I almost skipped this command the first time. Make sure to save the configuration to the startup-config, or the reload of the standby module will use the old startup-config boot variables and bring it back up on the same 4.0(4)SV1(3a) version.

    1000vSW# copy running-config startup-config
    [########################################] 100%
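    Since the standby reload reads its boot variables from the startup-config, it does not hurt to confirm the save took. A small sketch (standard NX-OS filtering, same idea as the running-config check above):

    1000vSW# sh startup-config | inc boot

    Both the sup-1 and sup-2 entries should now point at the 4.2(1)SV1(4) images.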

    We saw the module status earlier in this post, but I will reiterate it here. Module 2 shows as the standby VSM, so it will be reloaded first.

    1000vSW# sh module
    Mod  Ports  Module-Type                      Model              Status
    ---  -----  -------------------------------- ------------------ ------------
    1    0      Virtual Supervisor Module        Nexus1000V         active *
    2    0      Virtual Supervisor Module        Nexus1000V         ha-standby

    Time to reboot the standby module. This is done as follows.

    1000vSW# reload module 2
    This command will reboot standby supervisor module. (y/n)?  [n] y
    about to reset standby sup
    1000vSW# 2011 Feb 25 19:39:33 1000vSW %PLATFORM-2-PFM_MODULE_RESET: Manual restart of Module 2 from Command Line Interface

    Module 2 is still restarting; notice it is missing under “Mod”.

    1000vSW# sh module
    Mod  Ports  Module-Type                      Model              Status
    ---  -----  -------------------------------- ------------------ ------------
    1    0      Virtual Supervisor Module        Nexus1000V         active *
    3    248    Virtual Ethernet Module          NA                 ok
    4    248    Virtual Ethernet Module          NA                 ok

    * You can bring up the console of the standby VSM in vCenter to check its progress.
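    While waiting for module 2 to come back, you can also watch the HA state from the active VSM; once the peer reports ha-standby again it has rejoined. A minimal sketch (a standard NX-OS command, not output captured from this upgrade):

    1000vSW# show system redundancy status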

    Ok, this is where everything went wrong for us. A typical upgrade would have the module connect back in showing the new version of code running on it. For us, this did not happen. Instead, modules 1 and 2 were not able to communicate with each other, and both decided to become active.

    This became a big problem. All running VMs were still able to pass traffic across the network, but none of the ESX hosts showed up as modules in the VSM, thus isolating them. The first symptom I saw was connectivity loss via SSH to the VSM IP; it seemed to happen about every two minutes. Luckily, I noticed the firmware version changing each time I did a show module after reconnecting. The two VSMs were in an IP conflict, stealing the address from one another every few minutes.
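    For reference, a quick way to tell which VSM you have landed on after each reconnect is to check which system image it booted, and reload it if it is still on the old code. A sketch using standard NX-OS commands (again, not output captured during our outage):

    1000vSW# show version | inc image
    1000vSW# reload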

    I waited for the module running version 4.0(4)SV1(3a) to grab the IP again and issued a reload. This was module 1. Once it restarted and came up on the new firmware, it started talking with module 2.

    1000vSW# sh module
    Mod  Ports  Module-Type                       Model               Status
    ---  -----  --------------------------------  ------------------  ------------
    1    0      Virtual Supervisor Module         Nexus1000V          ha-standby
    2    0      Virtual Supervisor Module         Nexus1000V          active *
    
    Mod  Sw                Hw
    ---  ----------------  ------------------------------------------------
    1    4.2(1)SV1(4)      0.0
    2    4.2(1)SV1(4)      0.0

    Now that both VSMs were on the same firmware, it was time to make module 1 primary/active once more. This was non-intrusive.

    1000vSW# system switchover

    Once more to verify that module 1 has become active:

    1000vSW# sh module
    Mod  Ports  Module-Type                       Model               Status
    ---  -----  --------------------------------  ------------------  ------------
    1    0      Virtual Supervisor Module         Nexus1000V          active *
    2    0      Virtual Supervisor Module         Nexus1000V          ha-standby
    
    Mod  Sw                Hw
    ---  ----------------  ------------------------------------------------
    1    4.2(1)SV1(4)      0.0
    2    4.2(1)SV1(4)      0.0

    Also, to be safe, have the VSM reconnect with vCenter.

    1000vSW# configure t
    Enter configuration commands, one per line.  End with CNTL/Z.
    1000vSW(config)# svs connection VirtualCenter
    1000vSW(config-svs-conn)# connect
    1000vSW(config-svs-conn)# end

    Verify the connection as well.

    1000vSW#  show svs connections
    
    connection vcenter:
        ip address: 127.0.0.99
        remote port: 80
        protocol: vmware-vim https
        certificate: default
        datacenter name: Test
        DVS uuid: 55 55 55 55 55 55 55 55
        config status: Enabled
        operational status: Connected
        sync status: Complete
        version: VMware vCenter Server

    At this point we should be good. At least that is what I thought. Going back to what I said before:
    “but none of the ESX hosts showed up as modules in the VSM.”
    Yeah, I didn’t notice that until after the fact. Normally I vMotion a single VM over to a different host to test connectivity, but this time I didn’t.

    I selected our first vSphere (ESX) host and put it into maintenance mode, which vMotioned all of its VMs off to other hosts. Immediately they fell off the network; all pings were lost. These were critical production servers. Of course we were in a maintenance window, but this was still a worst-case scenario. Normally each host's VEM would show up in the show module output, but after the upgrade they did not.

    1000vSW# sh module
    Mod  Ports  Module-Type                       Model               Status
    ---  -----  --------------------------------  ------------------  ------------
    1    0      Virtual Supervisor Module         Nexus1000V          active *
    2    0      Virtual Supervisor Module         Nexus1000V          ha-standby
    
    Mod  Sw                Hw
    ---  ----------------  ------------------------------------------------
    1    4.2(1)SV1(4)      0.0
    2    4.2(1)SV1(4)      0.0
    
    Mod  MAC-Address(es)                         Serial-Num
    ---  --------------------------------------  ----------
    1    00-00-00-00-00-00 to 00-00-00-00-00-00  NA
    2    00-00-00-00-00-00 to 00-00-00-00-00-00  NA
    
    Mod  Server-IP        Server-UUID                           Server-Name
    ---  ---------------  ------------------------------------  --------------------
    1    127.0.0.55     NA                                    NA
    2    127.0.0.55     NA                                    NA

    The above resembles what I was seeing: just the VSM modules. The MAC addresses were correct; I just changed them to all zeros here. We tried shutting down a VM and moving it to different hosts in the cluster, but still no pings. I also tried stopping and starting the VEM via the command line on the ESX host.
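    For anyone repeating those host-side steps, the VEM ships with a small CLI on the host. A hedged sketch of the commands I mean (names come from the Cisco VEM troubleshooting docs; run from the ESXi Tech Support Mode shell, and expect minor differences between releases):

    ~ # vem status
    ~ # vem version
    ~ # vemcmd show card
    ~ # vem stop
    ~ # vem start

    In our case restarting the VEM did not bring the modules back; only upgrading the VEM did.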

    The only thing that fixed this was pushing the VEM upgrades. This was done using VMware Update Manager (a manual command-line alternative is sketched after the list below). Very easy, but we had to put the hosts into maintenance mode to do so, which wound up being very time-consuming. Once server 1's VEM was updated, we were able to vMotion VMs to it without losing network connectivity to them. It went like this:

  • Update ESX host 1 VEM to 4.2
  • vMotion ESX host 2 VMs to host 1
  • Update ESX host 2 VEM to 4.2
  • vMotion ESX host 3 VMs to host 2
  • Update ESX host 3 VEM to 4.2
  • vMotion ESX host 4 VMs to host 3
  • Update ESX host 4 VEM to 4.2
  • Repeat until done!
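    As an alternative to Update Manager, the VEM bundle can also be pushed to each host from the command line. A hedged sketch for ESXi 4.1 using the remote vSphere CLI (the host name matches the ones shown further down, but the bundle filename is a placeholder; check the VEM/host compatibility matrix for the exact bundle, and note that flags can differ between vCLI versions):

    vihostupdate --server vm_host01 --install --bundle /tmp/VEM410-201101-placeholder.zip

    After the install, vem version and vem status on the host should report the 4.2(1)SV1(4) VEM, and the host should register as a module on the VSM again.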
    It would have been a lot easier if the hosts could have been placed into maintenance mode and the VMs automatically migrated to other members of the cluster. Once completed, everything looked fine in the VSM. Even the hosts showed up!

    Mod  Server-IP        Server-UUID                           Server-Name
    ---  ---------------  ------------------------------------  --------------------
    1    127.0.0.55       NA                                    NA
    2    127.0.0.55       NA                                    NA
    3    127.0.0.24       22222222-2222-2222-2222-22222222221g  vm_host01 
    4    127.0.0.26       22222222-2222-2222-2222-22222222222g  vm_host03 
    5    127.0.0.27       22222222-2222-2222-2222-22222222223g  vm_host04 
    6    127.0.0.28       22222222-2222-2222-2222-22222222224g  vm_host05 
    7    127.0.0.25       22222222-2222-2222-2222-22222222225g  vm_host02 
    8    127.0.0.29       22222222-2222-2222-2222-22222222226g  vm_host06
    9    127.0.0.120      22222222-2222-2222-2222-22222222228w  vm_host07

    A new rule for us to follow from now on: always put one host into maintenance mode before upgrading!
    This would have allowed us to upgrade without downtime in this scenario.


    ~ by Kevin Goodman on March 1, 2011.

    2 Responses to “Cisco Nexus 1000v 4.2 Upgrade Problems And Fix”

    1. I could be mistaken, but when I read the extensive 1(4) upgrade documentation (and watched the videos), I thought the process was to update the VEMs to 1(4) first, then update the VSMs to 1(4)?

    2. You are right. I went back through the n1000v_upgrade_software PDF from Cisco. For version 4.0(4)SV1(3a), the VEMs should have been upgraded first. It is on page 9 of the document.

      Upgrading from Release 4.0(4)SV1(3, 3a, or 3b) to Release 4.2(1)SV1(4)
      Step 1 Upgrading the VEMs: Release 4.0(4)SV1(2, 3, 3a, 3b) to Release 4.2(1)SV1(4), page 20
      Step 2 Upgrading the VSMs to Release 4.2(1)SV1(4) Using the Upgrade Application, page 33

      I was following lower down in the document that showed an upgrade of the VSM first then the VEM. Looking closer that was for one that required an intermediate upgrade between the current running and the newer 4.2(1)SV1(4) firmware. Thanks for the catch! Will update the post shortly.
