Thanks, everyone. I am happy to know folks care about our little Community and enjoy reading our Forums...
Before I explain what happened, let me take a moment to encourage all of you to post: questions, comments, thoughts, jokes, whatever. We love our folks and want them to be as active posting as we are reading. I will be out of town most of this week, and soon I will be out of the country for two weeks, so I need everyone to make posts and keep things fun and interesting!!
OK, here is a brief summary of what happened.
First we noticed that the box was not responding properly early Wednesday morning. Sirex and I texted back and forth, and I decided to reboot the 102 server box.... I rebooted it, and it apparently never came back up. I got many texts and emails about the Forums, but by the time folks noticed something was wrong, Sirex and I had already been scrambling for several hours.
I contacted Logicsouth and asked them to hard boot the server. That didn't make much difference.
Later in the day, after trying to reach the server all day, I suddenly got through, then lost it, then reached it, then lost it again. I could get in for about 15-45 seconds, then I'd lose the connection. Sirex was concerned the HD was failing, so we decided to ask the Logicsouth folks to shut the box down completely. Even though I had a lot of work, I decided to drive to the server, which is about an hour and 15 minutes round trip. I brought the ailing box home and discovered it would boot fine and get into Windows, then immediately reboot, over and over.
SCSI drives have some nice diagnostic tools that ATA drives don't, so I was able to thoroughly scan the drives for errors, and I found none.
I suspected a bad RAM chip, since I have seen this before. I pulled all the chips but one, and it would not boot, so I labeled that one "bad" and put in a different single chip. The server booted and ran perfectly...
After defragging and running various tests most of Wednesday night, I felt I had found the problem. Thursday morning I was doing various updates and the server suddenly rebooted again. Now I was stumped.
I then ran memtest, and it made it through one complete pass clean. During the day Thursday I had various oddball glitches: sometimes it would not boot, sometimes it would not recognize the keyboard, sometimes it claimed Windows was missing key DLL files. Now I was afraid the motherboard was going bad, and Sirex and I started to work on Plan B, migrating everything to the 101 box.
Then late Thursday evening/early Friday morning I began to suspect the power supply, so I removed everything that was non-essential, and the server worked perfectly. I could not make it crash or reboot.... After work I drove to Logicsouth to copy some files off the old box onto 101 so Sirex could move forward with Plan B. Then I went by my storage locker, retrieved an old box that also has an EPS12V power supply, and put that power supply into the 102 box Friday night.... Sirex was almost finished with Plan B, but I was beginning to think we would not need it, because the new power supply was working perfectly.
Last night I ran memtest through the night, and this morning saw we had memory errors... Crap!! When you have a remote box, it must be 100% reliable and reboot cleanly every single time... Also, occasionally the server would not boot through to Windows.
Plan B was back in play... I drove back to the server and was down there about 5 hours today, cleaning the boxes, reconfiguring the NICs, and doing some other things to allow Plan B to move forward. Then I got in touch with Sirex, and he finished implementing Plan B.
And we are back up... We learned some things but overall the failure and loss of a server box was not as bad as it could have been...
FYI, below is the info on the 102 box. I still hope to salvage it. I spent almost $4,000 on this box, built it myself, and it has served us well.
Dual Intel® Xeon™ Processors 3.06GHz 512K 533 FSB, 604 pin PPGA FC-PGA2 (BX80532KE3066D SL6RR)
iWill Dual Xeon™ Motherboard, Intel® E7505 chipset (DPI533 Rev. 1.0, 07/04/03 BIOS)
2 X 1GByte TwinX Matched Memory Pair DDR RAM - Corsair, XMS 2700 Platinum Heat Spreaders - (TwinX1024-2700LLPT)
Maxtor 36.4GB 68pin U320-SCSI 10,000RPM Hard Drive (Atlas(TM) 10K IV) (8MB Cache)
Adaptec 29160 U160 LVD SCSI Controller Card
Intel® PRO/1000 MT Server Adapter (PWLA8490MT)
Antec Server Tower Case (SX-830)
Fortron 550W EPS 12 Power Supply (FSP55060PLN)