Friday 29 June 2007

Web Server Down!

We were changing switches on the network this past week because our old switches were reporting a status of fatal. No wonder people kept getting kicked off the network and the AS400 kept shutting down overnight. At first we thought it was down to the server room overheating and the poor air conditioning, but no, we needed to buy new switches to replace or help out the old ones.

So the boss ordered us some new "managed gigabit" switches (boy, that would be sweet for file sharing) to be installed into the network. When the switches arrived, my co-workers and I started to rewire the network so that the servers would all go through a dedicated "server switch". What I didn't know was that the DMZ connections for a number of the servers weren't meant to go into the new server switch and should have stayed as they were already configured.

DMZ stands for de-militarised zone. On our network it is used for people connecting via VPNs, allowing them much more access to such things as the file server and mail server. This area still goes through the firewall like any other external traffic but isn't subjected to as many rules and checks, allowing registered users access to servers whilst keeping the network secure.

But anyway, I corrected my mistake and carried on putting all the servers through the new switch, checking as I went along that they were showing as connected on each screen. When my boss arrived at work, the web server started giving out warning beeps and displaying a message that the RAID 5 was in error on one of the drives or something.

So he decided to pull out the problematic drive, and then all hell broke loose. Another 2 of the drives went into error and he couldn't seem to recover them, so he decided to reformat the server and in doing so lost the whole web site for the company. By this time we had people ringing in telling us they couldn't connect to the website. So a holding page was hastily set up to tell users that we were having problems and to check back soon.
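For anyone wondering why a couple more drives going into error is so much worse than one: RAID 5 keeps a parity block alongside the data in each stripe, and that parity can only stand in for a single missing drive. Below is a rough Python sketch of the idea. The four-drive layout and the block contents are made up for illustration, and a real controller rotates parity across the drives and works on much bigger blocks, so treat it as a toy rather than a picture of our actual server.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-drive array: three data blocks plus parity.
data = [b"WEB1", b"WEB2", b"WEB3"]
parity = xor_blocks(data)


def rebuild_one_missing(surviving_blocks):
    """With exactly one block missing, XOR of everything left gives it back."""
    return xor_blocks(surviving_blocks)


# Drive 2 dies: the other two data blocks plus parity are enough to rebuild it.
rebuilt = rebuild_one_missing([data[0], data[2], parity])
print(rebuilt == data[1])

# Drives 1 and 2 both die: all that is left is "block1 XOR block2",
# and there is no way to pull the two original blocks apart again.
leftover = xor_blocks([data[2], parity])
print(leftover == xor_blocks([data[0], data[1]]))
```

Both checks print True. The first shows a single lost block being rebuilt, and the second shows that with two blocks gone the survivors only give you the two missing blocks mashed together, which is the point where you need a backup.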

It turned out that the web server hadn't been backed up properly and we had lost all of the data for the site. So, in hope of help, we contacted our web host to see if they had a backup of the site from when they last did work on it. Luckily they did, and they would get the site up and running again. After much stress and worry the site was returned to its original state, and our problem with the server was fixed by replacing the faulty drive.

Moral of the story: remember to back everything up properly, and if a disc is in error, don't just wipe the discs.
