This article explains the circumstances in which the /var/etc/pam.conf error is seen during commit on a QFX5100, and the steps to recover from this commit error. The same error prevents a new QFX5100 node from being configured when it is added to a QFabric setup.
If you are adding a new QFX5100 node to an existing QFabric setup, you may first have to perform a software upgrade on the standalone QFX5100 to match the QFabric release. The software upgrade also ensures that the device mode of the QFX5100 is automatically converted to node-device. Once the software upgrade is done and the standalone QFX5100 has been converted to device-mode = node-device, the physical cabling is done and the device is detected and provisioned in the QFabric setup.
However, when a standalone QFX5100 is manually upgraded through the CLI from any 13.2X51-Dxx image to a QFabric 13.2X52-Dxx image, the upgrade completes correctly, but after the upgrade the QFX5100 shows the following error during commit.
root@QFX5100# commit          <----------------------------- Commit error seen on the QFX5100 Node.
error: rename failed for /var/etc/pam.conf
The above error indicates that the /var/etc/pam.conf file cannot be modified, so no configuration changes can be made on this device until the commit error is resolved.
The QFX5100 device is detected in the QFabric setup and the physical connectivity works fine, but because of the above error the Fabric Manager running on the Director device cannot push and commit the node-specific configuration to this device. As a result, the fabric inventory shows "Failed (could not connect)" under the Configuration column for the newly installed node device.
root@qfabric# run show fabric administration inventory
Item              Identifier      Connection    Configuration
Node group
  qf1-rsng3                       Connected     Failed (could not connect)   <------------------
    qf1-rsng3-1   TA3715260125    Connected
    qf1-rsng3-2   TA3715270063    Connected
For the same reason, any commit done on the QFabric reports that it is unable to commit to the device that is exhibiting the pam.conf error.
root@qfabric# commit
warning: from qf1-rsng3: stale or missing configuration on device; will retry in background; check output of 'run show fabric administration inventory node-group qf1-rsng3 detail;
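The inventory check described above can be scripted when many node groups are involved. The following is only a minimal sketch: it assumes the output of 'run show fabric administration inventory' has been saved to a text file (the name inventory.txt is hypothetical) and that affected rows contain the word "Failed" in the Configuration column, as in the output shown in this article. The function name failed_items is also an illustration, not a Junos command.

```shell
#!/bin/sh
# Sketch: list inventory items whose Configuration column reports a failure.
# Assumption: the inventory output is supplied as plain text on stdin.

failed_items() {
    # Print the first column (the item name) of every row containing "Failed".
    awk '/Failed/ { print $1 }'
}

# Example use on a saved copy of the output (hypothetical file name):
#   failed_items < inventory.txt
```

Run against the inventory output shown earlier, this would print the node group name from the row marked Failed. It only reads text and does not connect to the fabric.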
In this case, the issue is observed after performing a software upgrade through the CLI from a Junos 13.2X51-Dxx release to a recent QFabric/QFX release.
Note that on Junos 13.2X51-Dxx images, security flags such as schg and sunlnk are set by default on the pam.conf file.
To illustrate the issue, we upgrade a QFX5100 from the 13.2X51-D38 QFX release to the 13.2X52-D20.6 QFabric release.
Hostname: QFX5100-48S-6Q-Infra-1
Model: qfx5100-48s-6q
JUNOS Base OS Software Suite [13.2X51-D38]
root@QFX5100-48S-6Q-Infra-1:RE:0% ls -lo /var/etc/pam.conf
-rw-r----- 1 root wheel schg,sunlnk 417 Feb 15 21:25 /var/etc/pam.conf   <-----------------------------------
After the CLI upgrade to 13.2X52-D20.6, the same security flags are still set. The schg flag marks the file as system-immutable and the sunlnk flag prevents it from being removed or renamed, so the two flags block any modification to the file at commit time.
root@TA3715270063> show version
Hostname: TA3715270063
Model: qfx5100-48s-6qf
JUNOS Base OS Software Suite [13.2X52-D20.6]
root@TA3715260125:RE:0% ls -lo /var/etc/pam.conf
-rw-r----- 1 root wheel schg,sunlnk 417 Feb 15 21:25 /var/etc/pam.conf   <--------------------------------
As a result of these flags, the commit fails on the device and its configuration status in the fabric inventory is "Failed (could not connect)".
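The flag check shown in the outputs above can be expressed as a small shell test. This is a sketch under stated assumptions: it takes one line of 'ls -lo' output as an argument and assumes the flags appear in the fifth column, as they do in the outputs in this article. The function name needs_flag_reset is hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether a line of `ls -lo` output shows schg or sunlnk set.
# Assumption: the file flags are the fifth whitespace-separated field.

needs_flag_reset() {
    # $1: a single line of `ls -lo` output
    flags=$(printf '%s\n' "$1" | awk '{ print $5 }')
    # Wrap in commas so each flag name is matched as a whole word.
    case ",$flags," in
        *,schg,*|*,sunlnk,*) return 0 ;;   # at least one blocking flag is set
        *)                   return 1 ;;   # no blocking flag, e.g. "-"
    esac
}

# Example use on the device shell (run under /bin/sh):
#   needs_flag_reset "$(ls -lo /var/etc/pam.conf)" && echo "flags are set"
```

The function returns success when either flag is present, so it can gate the recovery commands without parsing the output by eye.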
Note:
1. This issue is encountered if you perform a CLI software upgrade from Junos 13.2X51-Dxx to any recent release.
2. This issue is encountered if you perform a CLI downgrade from Junos OS Release 14.1X53 to any lower release.
3. However, if you perform the software upgrade through USB recovery install media, you will not encounter this issue.
Note: The following six-step procedure resolves this issue. To recover, you need the root password for the QFX5100 on which the pam.conf error is seen.
Step 1: Once you hit the pam.conf error after the software upgrade on the QFX5100, first verify whether the schg and sunlnk flags are set on the /var/etc/pam.conf file.
----------------------------------------------------------------
root@qf1-rsng3:RE:1% ls -l /var/etc/pam.conf
-rw-r----- 1 root wheel 417 Feb 16 06:03 /var/etc/pam.conf
root@qf1-rsng3:RE:1% ls -lo /var/etc/pam.conf
-rw-r----- 1 root wheel schg,sunlnk 417 Feb 16 06:03 /var/etc/pam.conf
Step 2: Unset the schg and sunlnk flags using the following command.
----------------------------------------------------------------
root@qf1-rsng3:RE:1% chflags noschg,nosunlnk /var/etc/pam.conf
Step 3: Verify that the problematic flags were unset successfully.
----------------------------------------------------------------
root@qf1-rsng3:RE:1% ls -lo /var/etc/pam.conf
-rw-r----- 1 root wheel - 417 Feb 16 06:03 /var/etc/pam.conf
Step 4: Change the file permissions to read, write, and execute for all users.
----------------------------------------------------------------
root@qf1-rsng3:RE:1% chmod 777 /var/etc/pam.conf
Step 5: Manually reboot the QFX5100 in question.
----------------------------------------------------------------
root@qf1-rsng3> request system reboot all-members
Reboot the system ? [yes,no] (no) yes
Step 6: After the reboot, the devices show as connected and configured in the fabric inventory, and the file permissions for /var/etc/pam.conf are set back to the default value. Commits work fine.
----------------------------------------------------------------
root@qfabric# run show fabric administration inventory
Item              Identifier      Connection    Configuration
Node group
  qf1-rsng3                       Connected     Configured
    qf1-rsng3-1   TA3715260125    Connected
    qf1-rsng3-2   TA3715270063    Connected
root@qf1-rsng1:RE:3% ls -l /var/etc/pam.conf
-rw-r----- 1 root wheel 417 Feb 16 07:44 /var/etc/pam.conf
root@qf1-rsng1:RE:3% ls -lo /var/etc/pam.conf
-rw-r----- 1 root wheel - 417 Feb 16 07:44 /var/etc/pam.conf
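For reference, the shell part of the procedure (Steps 2 to 4) can be collected into one small script. This is a sketch, not a supported tool: the names run, recover_pam_conf, PAM_FILE, and APPLY are hypothetical, and it assumes /bin/sh is available on the device and that the script is run as root. By default it only prints the commands so they can be reviewed first. The reboot in Step 5 is left out on purpose and should still be issued manually from the CLI.

```shell
#!/bin/sh
# Sketch of Steps 2-4 for /var/etc/pam.conf.
# Default is a dry run that prints each command; set APPLY=1 to execute them.
PAM_FILE="${PAM_FILE:-/var/etc/pam.conf}"

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"          # execute the command as given
    else
        echo "$*"     # dry run: show what would be executed
    fi
}

recover_pam_conf() {
    # Step 2: clear the schg and sunlnk flags
    run chflags noschg,nosunlnk "$PAM_FILE" || return 1
    # Step 3: list the file with flags so the result can be verified
    run ls -lo "$PAM_FILE" || return 1
    # Step 4: open the permissions as described in the procedure
    run chmod 777 "$PAM_FILE" || return 1
}
```

Calling recover_pam_conf with no environment set prints the three commands in order. Calling it with APPLY=1 in the environment runs them and stops at the first command that fails.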