Following a networking change, there was a warm start on our IBM V7000 storage node canisters that caused an outage in the VMware environment: locks on certain LUNs caused a mini-APD (All Paths Down). This issue occurs when an ESXi/ESX host cannot reserve a LUN. The LUN may be locked by another host (an ESXi/ESX host, or any other server with access to the LUN). Typically, there is nothing queued for the LUN; the reservation is held at the SCSI level.
Caution: The reserve, release, and reset commands can interrupt the operations of other servers on a storage area network (SAN). Use these commands with caution.
Note: LUN resets are used to remove all SCSI-2 reservations on a specific device. A LUN reset does not affect any virtual machines that are running on the LUN.
- SSH into the host and type esxcfg-scsidevs -c to verify that the LUN is detected by the ESX host at boot time. If the LUN is not listed then rescan the storage
- Next, open the log with less /var/log/vmkernel.log
- Press Shift+G to jump to the end of the file
- You will see messages in the log such as the following (a single entry, wrapped here for readability):
- 2015-01-23T18:59:57.061Z cpu63:32832)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 3:(0):3271: FCP cmd x16 failed <0/4> sid x0b2700, did x0b1800, oxid xffff SCSI Reservation Conflict
- You will need to find the naa ID or the vml ID of the LUNs you need to reset.
- You can do this by running the command esxcfg-info | egrep -B5 "s Reserved|Pending"
- The host that has Pending Reserves with a value that is larger than 0 is holding the lock.
- We then had to run the below command to reset the LUNs
- vmkfstools -L lunreset /vmfs/devices/disks/naa.60050768028080befc00000000000116
- Then run vmkfstools -V to rescan
- Occasionally you may need to restart the management services on particular hosts by running /sbin/services.sh restart in a PuTTY session and then restarting the vCenter Server service, but this depends on your individual situation
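The reset procedure above can be summarised as the following ESXi console session. These commands only run on an ESXi host, so treat this as a sketch rather than something to paste blindly; the naa ID is the example from this environment, so substitute your own:

```shell
# Confirm the LUN is visible to the host; if it is not listed, rescan storage
esxcfg-scsidevs -c

# Check the tail of the VMkernel log for SCSI reservation conflicts
tail -n 100 /var/log/vmkernel.log | grep -i "reservation conflict"

# Identify the host holding the lock (look for Pending Reserves > 0)
esxcfg-info | egrep -B5 "s Reserved|Pending"

# Reset the lock on the affected LUN (example naa ID)
vmkfstools -L lunreset /vmfs/devices/disks/naa.60050768028080befc00000000000116

# Rescan/refresh VMFS volumes
vmkfstools -V
```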
What is LUN Masking?
LUN (Logical Unit Number) Masking is an authorization process that makes a LUN available to some hosts and unavailable to others. LUN Masking is implemented primarily at the HBA (Host Bus Adapter) level; masking at this level is vulnerable to any attack that compromises the HBA. Some storage controllers also support LUN Masking.
LUN Masking is important because Windows-based servers attempt to write volume labels to all available LUNs. This can render the LUNs unusable by other operating systems and can result in data loss.
How to MASK on a VMware ESXi Host
- Step 1: Identifying the volume in question and obtaining the naa ID
- Step 2: Run the esxcli command to map this naa ID to its vmhba identifiers
- Step 3: Masking the volume when you want to preserve data from the VMFS volumes for later use or if the volume is already deleted
- Step 4: Loading the Claim Rules
- Step 5: Verify that the claimrule has loaded:
- Step 6: Unclaim the volume in question
- Step 7: Check Messages
- Step 8: Unpresent the LUN
- Step 9: Rescan all hosts
- Step 10: Restore normal claim rules
- Step 11: Rescan Datastores
- Check in both places as listed in the table above that you have the correct ID
- Note: Check every LUN, as VMware sometimes reports the same Datastore under different LUN numbers, and this will affect your commands later
- Make a note of the naa ID
- Once you have the naa ID from the above step, run the following command
- Note: we drop the colon (and anything after it) from the ID
- The -L parameter will show a compact list of paths
- We can see there are 2 paths to the LUN called C0:T0:L40 and C0:T1:L40
- C=Channel, T=Target, L=LUN
- Next we need to check and see what claim rules exist in order to not use an existing claim rule number
- esxcli storage core claimrule list
- Note: I had to revert to the vSphere 4 CLI command, as I am screen-printing from vSphere 5, not 4!
- At this point you should be absolutely clear what LUN number you are using!
- Next, you can use any rule number for the new claim rule that isn't already in the list above; generally anything from 101 upwards is free
- In theory I have several paths, so I should repeat this exercise for each of them
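The claim rule add command itself does not appear in the text above (it was presumably in a screenshot). A hedged reconstruction for the two paths shown earlier, using rule numbers 101 and 102 and the vSphere 5 syntax, would look like this (the vmhba adapter name is an example; check your own path listing):

```shell
# Mask both paths to LUN 40 via the MASK_PATH plugin
# (adapter/channel/target values taken from the C0:T0:L40 and C0:T1:L40 example paths)
esxcli storage core claimrule add -r 101 -t location -A vmhba1 -C 0 -T 0 -L 40 -P MASK_PATH
esxcli storage core claimrule add -r 102 -t location -A vmhba1 -C 0 -T 1 -L 40 -P MASK_PATH
```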
- The Class for those rules will show as file, which means the rule is present in /etc/vmware/esx.conf but not yet loaded into runtime.
- Run the following command to see those rules displayed twice, once as the file Class and once as the runtime Class
- Before these paths can be associated with the new plugin (MASK_PATH), they need to be disassociated from the plugin they are currently using. In this case those paths are claimed by the NMP plugin (rule 65535). This next command will unclaim all paths for that device and then reclaim them based on the claimrules in runtime.
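On vSphere 5 that load-and-reclaim step looks roughly like this (vSphere 4 used `esxcli corestorage` in place of `esxcli storage core`); the naa ID is the example used below:

```shell
# Load the new MASK_PATH rules from /etc/vmware/esx.conf into runtime
esxcli storage core claimrule load

# Unclaim the device from NMP and re-run the claim rules so MASK_PATH takes over
esxcli storage core claiming reclaim -d naa.60050768028080befc00000000000050
esxcli storage core claimrule run
```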
- Refresh the Datastore and you should see it vanish from the host view
- Run the following command to check it now shows no paths
- esxcfg-mpath -L | grep naa.60050768028080befc00000000000050 again will now show no paths
- Now get your Storage Team to remove the LUN from the SAN
- Rescan all hosts and make sure the Datastore has gone
- To restore normal claimrules, perform these steps for every host that had visibility to the LUN, or from all hosts on which you created rules earlier:
- Run esxcli corestorage claimrule load
- Run esxcli corestorage claimrule list
- Note that you should no longer see the rules that you created earlier.
- Perform a rescan on all ESX hosts that had visibility to the LUN. If all of the hosts are in a cluster, right-click the cluster and click Rescan for Datastores. Previously masked LUNs should now be accessible to the ESX hosts
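Restoring normal claim rules (Steps 10 and 11) can be sketched as follows in the vSphere 5 syntax, again assuming the example rule numbers 101/102 and path values used earlier:

```shell
# Remove the MASK_PATH rules and reload the rule set
esxcli storage core claimrule remove -r 101
esxcli storage core claimrule remove -r 102
esxcli storage core claimrule load

# Unclaim the previously masked paths so NMP can reclaim them
esxcli storage core claiming unclaim -t location -A vmhba1 -C 0 -T 0 -L 40
esxcli storage core claiming unclaim -t location -A vmhba1 -C 0 -T 1 -L 40
esxcli storage core claimrule run
```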
- Next you may have to follow the following KB Article if you find you have these messages in the logs or you cannot add new LUNs
- Run the following commands on all HBA Adapters
Useful Video of LUN Masking
Useful VMware Docs (ESXi4)
Useful VMware Doc (ESXi5)
This week we upgraded our hosts to VMware ESXi 4.1.0, 582267. Our storage guy gave us 2 x 2TB LUNs, but I was unable to add them, as per the screen-print below. He has created 2TB LUNs previously and these have been fine.
Unable to read partition information from disk
It seems Update 2 enforces the maximum LUN size for vSphere 4.x, which is 2 TB minus 512 bytes. Depending on the storage system, 2 TB could mean either 2,000 GB (marketing/decimal size) or 2,048 GB (technical/binary size). The maximum relates to the technical size, so with your storage system you may need to configure at most 2,047 GB.
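The arithmetic behind that limit is easy to check. A quick sketch, using binary (technical) units throughout:

```python
GB = 2 ** 30          # one technical (binary) gigabyte in bytes
TB = 2 ** 40          # one technical (binary) terabyte in bytes

vsphere4_max = 2 * TB - 512   # vSphere 4.x maximum LUN size: 2 TB minus 512 bytes

# A 2,048 GB (technical) LUN is exactly 2 TB, so it overshoots the limit by 512 bytes
print(2048 * GB - vsphere4_max)   # 512

# A 2,047 GB LUN fits under the limit, which is why it works where 2,048 GB fails
print(2047 * GB <= vsphere4_max)  # True
```

This is why a LUN carved as "2 TB" on the array (2,048 technical GB) is rejected, while one configured at 2,047 GB attaches without complaint.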