DLPAR (Dynamic Logical Partitioning) lets users dynamically add, remove, or modify LPAR resources such as memory, CPU, or I/O devices. The most common problems with DLPAR operations are related to RMC (Resource Monitoring and Control).
Since DLPAR relies on an RMC connection between the HMC and the LPARs, you should ensure that the public network interface of your HMC is properly configured and that the HMC can reach your LPARs over the network (the HMC connection to the FSP (Flexible Service Processor) of the managed systems is not enough). If there are any firewalls between the HMC and the LPARs, check that port 657 (UDP and TCP) is open in both directions. Ensure that RMC connections are allowed on the public interface of the HMC as well:
HMC Management --> Change Network Settings --> LAN adapters (choose the public one) --> Firewall settings
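As a quick check on the LPAR side, the sketch below filters `netstat -an` output for sockets on port 657. The function name `rmc_port_lines` is hypothetical, and the pattern assumes the usual `address.port` (AIX) or `address:port` (Linux) column layout. It only shows local listeners and sessions; it cannot tell you whether a firewall in between passes the traffic.

```shell
# Hypothetical helper: keep only netstat lines that mention port 657 (RMC).
# Reads "netstat -an" output on standard input.
rmc_port_lines() {
    # Match ".657" or ":657" followed by whitespace or end of line,
    # so that ports such as 6570 are not matched.
    grep -E '[.:]657([[:space:]]|$)'
}

# Usage (on the LPAR):
#   netstat -an | rmc_port_lines
```

If nothing is printed, the RMC daemon is probably not listening at all, which points at the LPAR rather than at the network.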
To start troubleshooting any DLPAR problem, go to the HMC restricted shell and check the output of the command:
# lspartition -dlpar
The output shows the logical partitions that are ready for DLPAR operations. If there is no output at all, either no LPAR can communicate with the HMC over the network, or there is a problem with the HMC itself. If you suspect the HMC, try rebooting it. Experience shows that many strange HMC problems can be solved by a reboot.
For an LPAR with working RMC communication, the output of lspartition -dlpar should look similar to this:
<#1> Partition:<1, hostname, 10.30.23.15>
       Active:<1>, OS:<...>, DCaps:<0x4ebf>, CmdCaps:<0x1b, 0x1b>, PinnedMem:<384>
Check that the DCaps value is higher than 0x0 and that the Active value is higher than 0. If not, perform the next steps on the LPAR on which you are trying to run DLPAR operations.
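The Active and DCaps check can be scripted. The sketch below assumes the two-line record layout of lspartition -dlpar (a `Partition:<...>` line followed by an `Active:<...>, ... DCaps:<...>` line) and prints the partitions that fail the check. `dlpar_not_ready` is a hypothetical name, not an HMC command, and it is meant to run on a workstation against captured output rather than inside the restricted shell.

```shell
# Hypothetical helper: read "lspartition -dlpar" output on standard input
# and print the Partition field of every record whose Active value is 0
# or whose DCaps value is 0x0.
dlpar_not_ready() {
    awk '
        /Partition:</ {
            # Remember the text between "Partition:<" and the next ">".
            part = $0
            sub(/^.*Partition:</, "", part)
            sub(/>.*$/, "", part)
        }
        /Active:</ {
            active = $0
            sub(/^.*Active:</, "", active)
            sub(/>.*$/, "", active)

            dcaps = $0
            sub(/^.*DCaps:</, "", dcaps)
            sub(/>.*$/, "", dcaps)
            # Strip "0x" and leading zeros; nothing left means the value is zero.
            sub(/^0x0*/, "", dcaps)

            if (active + 0 == 0 || dcaps == "") print part
        }
    '
}

# Usage (user and host names are placeholders):
#   ssh user@hmc lspartition -dlpar | dlpar_not_ready
```

An empty result means every listed partition passed both checks. It does not cover partitions that are missing from the listing altogether.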
Check the RMC connection to the HMC using the following command:
# lsrsrc IBM.ManagementServer
If you are using AIX 7.1, type the following command instead:
# lsrsrc IBM.MCP
You should see something like this:
Resource Persistent Attributes for IBM.MCP
resource 1:
        MNName            = "10.30.23.15"
        NodeID            = 18194515442147552355
        KeyToken          = "hmc.localdomain"
        IPAddresses       = {"10.30.23.15"}
        ConnectivityNames = {"10.30.23.15"}
        HMCName           = "7042CR4*XXXXXXX"
        HMCIPAddr         = "10.30.23.10"
        HMCAddIPs         = "192.168.128.1"
        HMCAddIPv6s       = ""
        ActivePeerDomain  = ""
        NodeNameList      = {"Test"}
If you can see information about the HMC, it’s a good sign; if not, check the status of the main daemon IBM.DRM used for dynamic logical partitioning:
# lssrc -g rsct_rm
Subsystem         Group            PID          Status
 IBM.ServiceRM    rsct_rm                       inoperative
 IBM.DRM          rsct_rm                       inoperative
 IBM.ERRM         rsct_rm                       inoperative
 IBM.AuditRM      rsct_rm                       inoperative
 IBM.MgmtDomainRM rsct_rm                       inoperative
 IBM.HostRM       rsct_rm                       inoperative
If it is in the inoperative state, you can restart it with the following commands (note that sometimes, especially in AIX 7, this daemon is not active all the time but only when needed):
# /usr/sbin/rsct/bin/rmcctrl -z
# /usr/sbin/rsct/bin/rmcctrl -A
# /usr/sbin/rsct/bin/rmcctrl -p
Check its status again:
# lssrc -a | grep rsct
 ctrmc            rsct             5374166      active
 IBM.HostRM       rsct_rm          14156014     active
 IBM.ServiceRM    rsct_rm          5439716      active
 IBM.MgmtDomainRM rsct_rm          9240692      active
 IBM.DRM          rsct_rm          17301604     active
 ctcas            rsct                          inoperative
 IBM.ERRM         rsct_rm                       inoperative
 IBM.AuditRM      rsct_rm                       inoperative
If the above does not change the output of lspartition -dlpar, you can try to reconfigure RMC with the recfgct command. Basically, this command recreates the RMC connection.
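To decide whether the restart worked before resorting to recfgct, the subsystem listings can be checked in a script. This is a minimal sketch that assumes the lssrc column layout shown in this post; `subsys_status` is a hypothetical helper name.

```shell
# Hypothetical helper: print the Status column for one subsystem.
# Reads lssrc output on standard input; $1 is the subsystem name.
# The PID column is empty for inoperative subsystems, so the status
# is taken as the last field on the line rather than the fourth.
subsys_status() {
    awk -v name="$1" '$1 == name { print $NF }'
}

# Usage (on the LPAR):
#   lssrc -g rsct_rm | subsys_status IBM.DRM
```

Keep in mind the note above: IBM.DRM may legitimately be inoperative when it is not needed, so treat this status as a hint and rely on lspartition -dlpar on the HMC for the final verdict.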
Before using the recfgct command, make sure that your server is not part of a CSM or GPFS cluster, because recfgct could bring you more trouble than a non-working DLPAR.
The full path of the recfgct command is:
# /usr/sbin/rsct/install/bin/recfgct
Wait 5 to 10 minutes and check the RMC daemon again:
# lssrc -g rsct_rm
....
or
# lsrsrc IBM.ManagementServer
# lsrsrc IBM.MCP
Advice:
When you change system resources dynamically, do not forget to modify your LPAR profile accordingly. At the next boot, resources are assigned according to the profile, which DLPAR operations do not change.
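The wait-and-recheck step after recfgct can also be automated instead of watching the clock. The following is only a sketch: `wait_until` and `hmc_known` are hypothetical helpers, and the check assumes that lsrsrc IBM.MCP prints an HMCIPAddr attribute once the HMC is known again, as in the sample output earlier (on systems that use IBM.ManagementServer the attribute names differ, so the check would need adjusting).

```shell
# Hypothetical helper: run a command repeatedly until it succeeds.
#   $1 = maximum number of attempts
#   $2 = seconds to sleep between attempts
#   remaining arguments = the command to run
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Do not sleep after the final failed attempt.
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Hypothetical check: succeeds when lsrsrc IBM.MCP reports an HMC address.
hmc_known() {
    lsrsrc IBM.MCP 2>/dev/null | grep -q 'HMCIPAddr'
}

# Usage (on the LPAR, after recfgct): poll once a minute for 10 minutes.
#   wait_until 10 60 hmc_known && echo "RMC connection is back"
```

A fixed attempt limit keeps the loop from running forever if the connection never comes back, which is the point at which the network and firewall checks are worth repeating.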