
Repost: Power loss times two (from Reddit.com)
This power-related post reminded me of a certain server at work a few years ago.

I was working as a developer/sysadmin, with my job de facto being internal tech support and making sure development, deployment and running of code went smoothly (though I hesitate to call it CI/CD in the currently implied meaning of the word).

The server site relevant to this story was undergoing construction work at the time, and we frequently had to cut power to different server rooms. It was hardly ever an issue: the UPSes would kick in, and for longer outages we brought in external power. During one scheduled outage late one evening, I got a call saying the UPS in the server room had failed to deliver any power, and all equipment in the room had some unexpected downtime.* It actually brought the bulk of the company down with it because of the network layout.

Despite this, the on-site technician and I got most of the functionality back up and running within 15 minutes, and everything except an old server running development and test VMs was running, error-checked and functionality-tested within 45 minutes. Somewhat of a feat considering the technical debt that site had.

I figured the test server could wait until the next day and didn't bother to check it after booting it. No company-crucial service was running on it. No one except me was going to work on it that week. Some of the services on it should have started together with the machine anyway, and for the rest I would just need to log on and start them manually. Besides, it had been through another UPS failure some six months prior without any issues getting it back up and running.

Next day I log onto the test server. No VM on it is running. Very strange; at least some of them should have started automatically. I check the logs: they have a handful of messages saying the processor model isn't supported and the VMs failed to start, yet the very next logged row says the processor is supported.

Hours of troubleshooting later, I had read all the relevant documentation I could find, read all the relevant forum/bug report threads Google turned up (there were like two), verified that no surprise updates had been installed, gone through all available backups for meaningful changes, and checked for hardware failures. All the hardware worked, and I couldn't find a single setting or configuration on the server that had changed from before the crash.

This wasn't good. Sure, the server wasn't required to keep the business running, but it was part of some CI/CD flows and hosted test servers we ran demos from. If I couldn't fix it within the week, we risked having to postpone code deployment and customer demos. My stress level approached 11 as my plan for the week derailed.

Out of the blue, my boss gave me a number for some friend of his. The friend was initially just as bamboozled as I was, but later said that one of the warnings could be a side effect of a BIOS setting on some server motherboards. Then it clicked for us: the motherboard battery had drained and the BIOS had reset to factory defaults.

A battery change and an obscure BIOS checkbox later, the VMs are up and running as if nothing had ever happened.

*Never found out why the UPS never kicked in, but I did later get the warning saying it had started delivering power when the planned outage began.
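If you ever see a similar "processor not supported" complaint from a hypervisor after a BIOS reset, one quick sanity check is whether the hardware virtualization extensions are still exposed to the OS at all. Below is a minimal sketch of that check; it assumes a Linux host reading /proc/cpuinfo, which is purely illustrative since the original post doesn't say which OS or hypervisor was involved.

```python
# Minimal sketch (assumption: a Linux host; the incident above doesn't name
# the OS or hypervisor). Looks for the CPU flags that indicate hardware
# virtualization support: "vmx" for Intel VT-x, "svm" for AMD-V. A BIOS
# that has fallen back to factory defaults often ships with these disabled.
from pathlib import Path

def virtualization_flags() -> set[str]:
    """Return the virtualization-related CPU flags visible to the OS."""
    flags: set[str] = set()
    for line in Path("/proc/cpuinfo").read_text().splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return flags & {"vmx", "svm"}

if __name__ == "__main__":
    found = virtualization_flags()
    if found:
        print("Hardware virtualization exposed to the OS:", ", ".join(sorted(found)))
    else:
        print("No vmx/svm flag visible; check whether virtualization is enabled in the BIOS/UEFI.")
```

If the flag is missing even though the CPU supports it, the fix is the same obscure checkbox as in the story: re-enable the virtualization setting in the BIOS/UEFI (and replace the CMOS battery so the setting survives the next power loss).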