I felt a need to dump this morning out on paper… It’s been quite a morning, one that was made much better by backups that we don’t always think to make.
It started with my phone waking me up by ringing off the hook at 7:15 AM. I answered the first call, and it was from a teacher at one of the schools I work for. She was calling to let me know the internet had gone down at school #1. This happens sometimes. 90% of the time it’s the internet provider(s). 9% of the time it’s a box that needs to be rebooted or something caused by a power outage. At that point, it was looking like just a rather rough start to the morning.
Upon hanging up, my phone rang again, this time from a number I didn’t have saved. I answered it, and it was a parent from another school, letting me know their child was sick today. This isn’t normally a call that gets to me. In fact, it only comes my way if the school’s phone system is entirely offline and the first fallback call (to the school cell phone) has failed. I quickly jotted down the note and then darted downstairs to my computer. Two shells flew open and I began testing connections at both schools. While I was doing this, my phone kept ringing with calls from teachers and/or parents.
School #2 thankfully was easy. They had a brief outage, it all came back online, and things were back to normal for them by 7:35. School #1, though, was looking like it wasn’t the internet provider and that a box was going to need to be rebooted. I quickly sent a message off to my employees who cover that school before the school day starts, saying it looked like “server a” needed a reboot… They would get to that when they got in, so I took a shower and prepared to head up to the school as quickly as I could.
When I got out of the shower, I had a note on my screen that simply said, “It won’t boot”. We were no longer in the 99% of cases; we were in the 1%.
This particular server is actually the linchpin, if you will, of the entire network setup. It acts as the router for 2 networks and handles the primary network’s DHCP. When it’s offline, nothing is able to communicate with anything else… so having it offline is bad. Having it not boot is a setup for a really bad day.
My daughter is currently home ill, so my amazing, awesome mother was already scheduled to come to our house this morning. That allowed me to leave 45 minutes earlier than I normally would have. I got up to the school and walked in to see the machine giving the dreaded “no system disk” error. Why we don’t have RAID in this machine, I’m not quite sure… I blame cheap hardware… Regardless…
My guys and I grabbed an SSD off the parts shelf (we have a bunch), threw it in the machine, and downloaded the latest version of pfSense (the OS that runs on this particular box). pfSense on the whole is a very quick install. In fact, it took longer to download, since I had to do some magic to get online, than it did to install. Once it’s installed, though, there is quite a bit of configuration that needs to be done for our setup to work right. Static IPs, routing tables, filters, Snort, VLANs, DHCP, DHCP reservations, DNS, etc. all need to be configured. Typically, when I do it by hand, it can be a multi-hour job to get right. This particular school, though, would probably take closer to a full day due to the complexity.
Luckily, I had a backup of the config (an export the OS provides) saved in the server room from just last week. This is something that I try to do pretty much every time I make a change to this device. Occasionally I forget if it’s a small change, but in this particular case, I had made one just last Thursday after making changes to the network config. That backup meant the internet was up and running by the time teachers had to take attendance, as opposed to a full day later.
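For anyone who wants to make that habit harder to forget, here is a rough sketch of how keeping dated copies of an exported config could be scripted. It is not what I actually run; the function names, the `config-*.xml` naming scheme, and the folder layout are all made up for illustration, and it assumes you have already exported the config file from the device by whatever means you normally use.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def archive_config(exported: Path, archive_dir: Path,
                   now: Optional[datetime] = None) -> Optional[Path]:
    """Copy an exported config into archive_dir under a timestamped name.

    Hypothetical helper: returns the path of the new copy, or None when the
    newest archived copy already has identical contents.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Timestamped names sort chronologically, so the last one is the newest.
    existing = sorted(archive_dir.glob("config-*.xml"))
    if existing and file_digest(existing[-1]) == file_digest(exported):
        return None  # nothing changed since the last archived copy

    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = archive_dir / f"config-{stamp}.xml"
    shutil.copy2(exported, target)
    return target
```

Calling something like `archive_config(Path("export.xml"), Path("backups"))` after each change would leave a history of dated copies and skip the copy when nothing changed. A copy kept somewhere other than the box itself is the part that saved the morning here.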
Moral of the story: backups are king. The only way today could have gone smoother is if I had a hot spare of this particular server… Sadly, that costs money :-).