Recently I logged into my NetGear ReadyNAS and it told me there was a firmware update available (version 6.7.2 to 6.7.3). I had some time to kill, so I said “What the heck” and let it do its thing. After it restarted, it lost all network connectivity. When I went down to the rack where we keep the unit, all the drive IO lights were on, but there was a red light next to a caution symbol. Uh oh. I also noticed that the link lights on the network card were good and blinking, yet I could not find the dang thing on the network. After waiting a couple of hours (hopeful thinking that it might resolve itself), I gave the unit the good ol’ reboot, to no avail.

I put the unit in Tech Support Mode and used NetGear’s RAIDar application (found here) to find the joker… Both NICs were assigned the IP address 192.168.168.168. Yeah, that wasn’t my network. I found this strange considering DHCP was enabled on the network it was connected to, so even if the NICs lost their IP config they should have just grabbed an address from the pool. So I disconnected one of the NICs so that there was no duplicate IP and rebooted the unit. I then added a second IP address to my machine on the 192.168.168.0/24 network so I could talk to the thing. I still could not hit the web GUI, and I’ll be honest, I’m no Linux expert with these custom kernels, so I wasn’t going to start fiddling with that. You know what time it is? Time to call NetGear support…
After what can only be described as a terrible experience making it through tier 1, 2, and 3 support, I finally got in touch with an engineer after 4 hours of struggling. Long story short, this guy SSH’d into the unit and discovered that the boot partition was completely full, so the upgrade could not complete. Makes sense, but you would think they would code some logic in there to check for that before beginning the upgrade. Regardless, the engineer cleaned up a bunch of old image files on the boot partition so that it was only around 40% full, and voilà, after a reboot we had an operational ReadyNAS again.
Naturally, I picked this engineer’s brain on how to keep this from happening again, and luckily he gave me more information than I even asked for! For starters, you can check the usage on any of your partitions without having to put the thing in Tech Support Mode. To do this, you have to enable SSH on the unit via the web GUI. Disclaimer: if you make any changes via SSH, you void all support until the unit is Factory Reset (which will wipe all of your data). Once SSH is enabled, log in as root using your admin password and run the following command:
btrfs fi sh
This will output all of your volumes and their usage statistics. Basically, it’s a quick way to check whether you have space in your boot volume for an upgrade. He also shared the following wisdom on how your volumes are affected as they fill up:
80% Full – Degraded performance | 90% Full – Stability issues begin to occur | 95% Full – The volume becomes read-only (RO)
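To make those thresholds easier to apply, here is a minimal Python sketch that maps a "percent full" figure to the effect the engineer described. The function names `classify_usage` and `percent_full` are my own, not anything from NetGear, and the sketch assumes a threshold kicks in once usage reaches that percentage. It also uses Python's `shutil.disk_usage`, which may not line up exactly with what the btrfs tools report, so treat the result as a rough check only.

```python
import shutil

# Thresholds as relayed by the engineer: (percent full, expected effect).
# Ordered from most to least severe so the first match wins.
THRESHOLDS = [
    (95, "read-only"),
    (90, "stability issues"),
    (80, "degraded performance"),
]


def classify_usage(percent_full: float) -> str:
    """Return the expected effect for a volume that is percent_full% used."""
    for limit, label in THRESHOLDS:
        if percent_full >= limit:
            return label
    return "ok"


def percent_full(path: str) -> float:
    """Percentage of space used on the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # "/" is only an example path; point it at whichever mount you care about.
    pct = percent_full("/")
    print(f"/ is {pct:.1f}% full: {classify_usage(pct)}")
```

If you already have a percentage from somewhere else, you can call `classify_usage` on it directly and skip `percent_full`.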
Another thing he shared is that the Logs section of the web UI only shows part of one log. If you select the download option, you get them all. In this download is a disk_info.log file that has a whole bunch of info on your installed disks. He told me the thresholds that are normal for disks, and said that if I see values outside of them, the disk is more than likely beginning to fail. I will list these thresholds below:
ATA Error Count: 1-2 | Current Pending Sector Count and Uncorrectable Sector Count: 50, but if either suddenly begins rising rapidly, that drive is on its way out
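Here is a rough Python sketch of how you might check a disk's counters against those numbers. I'm reading the figures as upper bounds for "normal", which is my interpretation of what the engineer said. The dictionary keys (`ata_errors`, `pending_sectors`, `uncorrectable_sectors`) are names I made up for illustration; the sketch does not parse disk_info.log, so you would copy the values in by hand. The `RAPID_RISE` value of 10 is purely an assumption on my part, since he never put a number on "rising rapidly".

```python
from typing import Dict, List, Optional

ATA_ERROR_LIMIT = 2        # engineer's "1-2", read as an upper bound
SECTOR_COUNT_LIMIT = 50    # pending and uncorrectable sector counts
RAPID_RISE = 10            # assumption: the engineer gave no number for "rapid"

SECTOR_KEYS = ("pending_sectors", "uncorrectable_sectors")


def check_disk(current: Dict[str, int],
               previous: Optional[Dict[str, int]] = None) -> List[str]:
    """Return warnings for one disk; an empty list means nothing stands out.

    current and previous are hand-entered counter readings (hypothetical
    key names); previous is an older reading used to spot sudden rises.
    """
    warnings = []
    ata_errors = current.get("ata_errors", 0)
    if ata_errors > ATA_ERROR_LIMIT:
        warnings.append(f"ata_errors is {ata_errors} (above {ATA_ERROR_LIMIT})")
    for key in SECTOR_KEYS:
        value = current.get(key, 0)
        if value > SECTOR_COUNT_LIMIT:
            warnings.append(f"{key} is {value} (above {SECTOR_COUNT_LIMIT})")
        if previous is not None:
            rise = value - previous.get(key, 0)
            if rise >= RAPID_RISE:
                warnings.append(f"{key} rose by {rise} since the last reading")
    return warnings


if __name__ == "__main__":
    last_week = {"ata_errors": 0, "pending_sectors": 5, "uncorrectable_sectors": 0}
    today = {"ata_errors": 1, "pending_sectors": 30, "uncorrectable_sectors": 0}
    for warning in check_disk(today, last_week):
        print(warning)
```

Comparing against an older reading is what catches the "suddenly rising" case, even when the counts are still under 50.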
All in all, the engineer got me up and running and gave me some good info on how to avoid issues with my unit. Moving forward, I’ll be checking my volume capacity before upgrading!!