DLPAR or Dynamic Logical Partitioning problems

DLPAR provides users the ability to dynamically add, remove or modify LPAR resources such as memory, CPU, or I/O devices. The most common problem with DLPAR operations is related to RMC (Resource Monitoring and Control).

Because DLPAR relies on an RMC connection between the HMC and the LPARs, make sure that the public network interface of your HMC is properly configured and that the HMC can reach your LPARs over the network (the HMC connection to the FSP (Flexible Service Processor) of the managed systems is not enough). If there are any firewalls between the HMC and the LPARs, check that port 657 (UDP and TCP) is open in both directions. Also make sure that RMC connections are allowed on the public interface of the HMC:

HMC Management --> Change Network Settings --> LAN adapters (choose the public one) --> Firewall settings

To start troubleshooting any problem with Dynamic Logical Partitioning, go to HMC restricted shell and check the output of the command:

# lspartition -dlpar
The output will show you any logical partitions which are ready for DLPAR operations. If there is no output at all, either no LPAR can communicate with the HMC over the network, or there is a problem with the HMC itself. If you suspect the HMC, try rebooting it. Experience shows that many strange HMC problems can be solved by a simple reboot.
When RMC communication works, the output of lspartition -dlpar for the LPAR in question should look similar to this:
<#1> Partition:<1, hostname, 10.30.23.15>
Active:<1>, OS:, DCaps:<0x4ebf>, CmdCaps:<0x1b, 0x1b>, PinnedMem:<384>
Check that the DCaps value is higher than 0x0 and that the Active value is higher than 0. If not, perform the following steps on the LPAR on which you are trying to run DLPAR operations.
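The DCaps and Active check can also be scripted. The following is only a sketch: dlpar_report is a hypothetical helper name, and it assumes the two-line-per-partition layout of lspartition -dlpar shown above. It reads the output on standard input and marks each partition as READY or NOT READY.

```shell
# Sketch: classify partitions from `lspartition -dlpar` output read on stdin.
# dlpar_report is a hypothetical name, not an HMC or AIX command.
dlpar_report() {
    awk '
        # Remember the partition line; the capability line follows it.
        /Partition:</ { part = $0; next }
        /Active:</ {
            active = $0; sub(/.*Active:</, "", active); sub(/>.*/, "", active)
            dcaps = $0;  sub(/.*DCaps:</, "", dcaps);   sub(/>.*/, "", dcaps)
            # DCaps is hexadecimal: it is zero if only zeros remain after "0x".
            hex = dcaps; sub(/^0[xX]/, "", hex); gsub(/0/, "", hex)
            state = (active + 0 > 0 && hex != "") ? "READY" : "NOT READY"
            printf "%s Active=%s DCaps=%s %s\n", state, active, dcaps, part
        }
    '
}
```

Because the HMC shell is restricted, one way to use it is from a workstation, for example: ssh hscroot@hmc lspartition -dlpar | dlpar_report (the user and host names here are placeholders).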
Check the RMC connection to HMC using the following command:
# lsrsrc IBM.ManagementServer
If you are using AIX 7.1, type the following command instead:
# lsrsrc IBM.MCP
You should see something like this:
Resource Persistent Attributes for IBM.MCP
resource 1:
        MNName            = "10.30.23.15"
        NodeID            = 18194515442147552355
        KeyToken          = "hmc.localdomain"
        IPAddresses       = {"10.30.23.15"}
        ConnectivityNames = {"10.30.23.15"}
        HMCName           = "7042CR4*XXXXXXX"
        HMCIPAddr         = "10.30.23.10"
        HMCAddIPs         = "192.168.128.1"
        HMCAddIPv6s       = ""
        ActivePeerDomain  = ""
        NodeNameList      = {"Test"}
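If you want to check from a script whether the LPAR knows its HMC, the HMCIPAddr attribute can be pulled out of this output. This is a minimal sketch that assumes the IBM.MCP layout shown above; hmc_addrs is a hypothetical helper name.

```shell
# Sketch: print every HMCIPAddr value found in `lsrsrc IBM.MCP` output (stdin).
# Returns non-zero when no HMC entry is present. hmc_addrs is a hypothetical name.
hmc_addrs() {
    awk -F'"' '
        /^[ \t]*HMCIPAddr[ \t]*=/ { print $2; found = 1 }
        END { exit !found }
    '
}
```

For example, lsrsrc IBM.MCP | hmc_addrs | while read -r ip; do ping -c 2 "$ip"; done would ping each HMC address the LPAR has registered.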
If you can see information about the HMC, it’s a good sign; if not, check the status of the main daemon IBM.DRM used for dynamic logical partitioning:
# lssrc -g rsct_rm
Subsystem         Group            PID          Status
 IBM.ServiceRM    rsct_rm                       inoperative
 IBM.DRM          rsct_rm                       inoperative
 IBM.ERRM         rsct_rm                       inoperative
 IBM.AuditRM      rsct_rm                       inoperative
 IBM.MgmtDomainRM rsct_rm                       inoperative
 IBM.HostRM       rsct_rm                       inoperative
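If it is inoperative, you can restart RMC with the following commands: rmcctrl -z stops the RMC subsystem and its resource managers, -A adds and starts it again, and -p enables remote client connections. Note that sometimes, especially on AIX 7, this daemon is not active all the time but only when needed.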
# /usr/sbin/rsct/bin/rmcctrl -z
# /usr/sbin/rsct/bin/rmcctrl -A
# /usr/sbin/rsct/bin/rmcctrl -p
Check its status again:
# lssrc -a | grep rsct
 ctrmc            rsct             5374166      active
 IBM.HostRM       rsct_rm          14156014     active
 IBM.ServiceRM    rsct_rm          5439716      active
 IBM.MgmtDomainRM rsct_rm          9240692      active
 IBM.DRM          rsct_rm          17301604     active
 ctcas            rsct                          inoperative
 IBM.ERRM         rsct_rm                       inoperative
 IBM.AuditRM      rsct_rm                       inoperative
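The status check can be reduced to a single word per subsystem. The helper below is a sketch under the assumption that lssrc prints the status in the last column, as in the listings above; subsys_status is a hypothetical name.

```shell
# Sketch: report the status of one subsystem from `lssrc` output read on stdin.
# $1 is the subsystem name, e.g. IBM.DRM. subsys_status is a hypothetical name.
subsys_status() {
    awk -v name="$1" '
        $1 == name { print $NF; found = 1 }      # last column holds the status
        END { if (!found) print "missing" }      # subsystem not listed at all
    '
}
```

With the listing above, lssrc -a | subsys_status IBM.DRM prints active; with the earlier inoperative listing it prints inoperative.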
If the above does not change the output of lspartition -dlpar, you can try to reconfigure RMC with the recfgct command. Basically, this command recreates the RMC connection.
Before using recfgct, make sure that your server is not part of a CSM or GPFS cluster, because there it could cause you more trouble than a non-working DLPAR.
The full path of the recfgct command is:
# /usr/sbin/rsct/install/bin/recfgct
Wait 5 to 10 minutes and check the RMC daemon again:
# lssrc -g rsct_rm
....
# lsrsrc IBM.ManagementServer
or
# lsrsrc IBM.MCP
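Instead of waiting and rechecking by hand, the check can be polled. This is only a sketch: wait_until and rmc_has_hmc are hypothetical helper names, and rmc_has_hmc assumes the AIX 7.1 IBM.MCP resource class with the HMCIPAddr attribute shown earlier.

```shell
# Sketch: retry a command until it succeeds or the attempts run out.
# Usage: wait_until <tries> <delay-seconds> <command> [args...]
wait_until() {
    tries=$1; delay=$2; shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        # Sleep only if another attempt will follow.
        if [ "$tries" -gt 0 ]; then sleep "$delay"; fi
    done
    return 1
}

# Assumption: on AIX 7.1 a registered HMC shows up as HMCIPAddr in IBM.MCP.
rmc_has_hmc() {
    lsrsrc IBM.MCP 2>/dev/null | grep -q 'HMCIPAddr'
}
```

For example, wait_until 20 30 rmc_has_hmc checks every 30 seconds, up to 20 times, which roughly covers the 10-minute window mentioned above.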
Advice:
When you change system resources dynamically, do not forget to update your LPAR profile accordingly: at the next boot, resources are assigned according to the profile, which DLPAR operations do not modify.
