How to recover from /var/etc/pam.conf error on QFX5100 during commit

The purpose of this article is to explain the circumstances under which the /var/etc/pam.conf error is seen during commit on a QFX5100, and the steps to recover from this commit error. The same error prevents a new QFX5100 node from being added successfully to a QFabric setup.

If you are adding a new QFX5100 node to an existing QFabric setup, you may first have to perform a software upgrade on the standalone QFX5100 to match the QFabric release. The software upgrade also ensures that the device mode of the QFX5100 is automatically converted to node-device. Once the upgrade is done and the standalone QFX5100 is in device-mode = node-device, the physical cabling is completed and the device is detected and provisioned in the QFabric setup.

However, when a standalone QFX5100 is manually upgraded through the CLI from any 13.2X51-Dxx image to a QFabric 13.2X52-Dxx image, the upgrade itself completes correctly, but after the upgrade the QFX5100 reports an error on /var/etc/pam.conf during commit.

This error indicates that the /var/etc/pam.conf file cannot be modified, so no configuration changes can be made on the device until the commit error is resolved.

The QFX5100 is still detected in the QFabric setup and the physical connectivity works fine, but because of this error the Fabric Manager running on the Director device cannot push and commit the node-specific configuration to it. As a result, the fabric inventory shows the newly installed node device as “Failed (could not connect)” under the configuration column.

For the same reason, any commit done on the QFabric system throws an error indicating that it is unable to commit to the device that is exhibiting the pam.conf error.

In this case the issue is observed after performing a software upgrade through the CLI from a Junos 13.2X51-Dxx release to a recent QFabric/QFX release.
Note that on Junos 13.2X51-Dxx images, security flags such as schg and sunlnk are enabled by default for the pam.conf file.

To illustrate the issue, we upgrade a QFX5100 from the 13.2X51-D38 QFX release to the 13.2X52-D20.6 QFabric release.

After the CLI upgrade to 13.2X52-D20.6, the same security flags are still enabled. The schg and sunlnk flags prevent any modification to the file at commit time.

Because these flags are set, the commit fails on the device and its status is “configuration disconnected” in the fabric inventory.

Note:
1. This issue is encountered if you perform a CLI software upgrade from Junos 13.2X51-Dxx to any recent release.
2. This issue is encountered if you perform a CLI downgrade from Junos OS Release 14.1X53 to any lower release.
3. If you perform the software upgrade through USB recovery install media, you will not encounter this issue.

Note: The following steps will help you resolve this issue. To recover, you need the root password for the QFX5100 on which the pam.conf error is seen.

Step 1: Once you hit the pam.conf error after the software upgrade on the QFX5100, first verify whether the schg and sunlnk flags are enabled for the /var/etc/pam.conf file.
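
The check in this step can be sketched as follows. This is a minimal sketch, assuming you are in the Junos shell as root (for example via `start shell user root` from the CLI) and that `ls -lo` prints the file flags in the fifth column, as it does on FreeBSD. The helper names `show_flags` and `is_locked` are hypothetical and are not part of Junos.

```shell
# show_flags FILE: print only the file-flags column for FILE.
# Assumes FreeBSD-style `ls -lo` output, where the flags follow the group name
# and "-" means that no flags are set.
show_flags() {
    ls -lo "$1" | awk '{ print $5 }'
}

# is_locked FLAGS: succeed if the comma-separated flag list contains
# schg (system immutable) or sunlnk (system undeletable).
is_locked() {
    case ",$1," in
        *,schg,*|*,sunlnk,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Intended use on the affected switch:
#   is_locked "$(show_flags /var/etc/pam.conf)" && echo "flags are set"
```

The same check can be repeated after the flags have been cleared; at that point the flags column is expected to show neither schg nor sunlnk.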

Step 2: Unset the schg and sunlnk flags on the file.
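
One way to clear both flags is the FreeBSD `chflags` utility, where a `no` prefix removes a flag. The sketch below assumes a root shell on the affected QFX5100 and a kernel securelevel that permits clearing system flags. The wrapper name `unlock_file` is hypothetical.

```shell
# unlock_file FILE: clear the schg and sunlnk flags on FILE.
# chflags accepts a comma-separated list; "noschg" and "nosunlnk"
# remove the respective flags. Must be run as root.
unlock_file() {
    chflags noschg,nosunlnk "$1"
}

# Intended use on the affected switch:
#   unlock_file /var/etc/pam.conf
```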

Step 3: Verify that the problematic flags were unset successfully.

Step 4: Change the file permissions to read, write, and execute for all users before rebooting the node.
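
Read, write, and execute for all users corresponds to mode 777. A minimal sketch, again assuming a root shell on the affected switch; the wrapper name `open_permissions` is hypothetical.

```shell
# open_permissions FILE: grant read, write, and execute to owner, group, and others.
open_permissions() {
    chmod 777 "$1"
}

# Intended use on the affected switch:
#   open_permissions /var/etc/pam.conf
#   ls -l /var/etc/pam.conf    # mode column should read -rwxrwxrwx
```

According to the last step of this procedure, the permissions return to their default value after the reboot, so the wide-open mode is only temporary.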

Step 5: Manually reboot the QFX5100 in question.

Step 7: After the reboot, the devices show as connected and configured in the fabric inventory, and the file permissions for /var/etc/pam.conf are set back to the default value. Commits now work fine.

 

About the author

James Palmer
