Good Job!


    #16
    Thanks everyone, I am happy to know folks care about our little Community and enjoy reading our Forums...

    Before I explain what happened and what went on, let me take a moment to encourage all of you to post things, post questions, comments, thoughts, jokes, whatever -- We love our folks and want them to be as active posting as we are reading posts. I will be out of town most of this week, and soon I will be out of the country for two weeks so I need everyone to make posts and keep things fun and interesting !!

    OK, here is a brief summary of what happened.

    First we noticed that the box was not responding properly early Wednesday morning. Sirex and I texted back and forth and I decided to reboot the 102 server box.... I rebooted it, and it apparently never came back up. I got many texts and emails about the Forums, but by the time folks noticed something was wrong Sirex and I had already been scrambling for several hours.

    I contacted Logicsouth and asked them to hard boot the server. Not much difference.

    Later in the day, after trying to contact the server all day, I suddenly was able to reach it, then lost it, then reached it, then lost it. I could get in for about 15-45 seconds, then I'd lose the connection. Sirex was concerned the HD was failing, so we decided to ask the Logicsouth folks to shut the box down completely. Even though I had a lot of work, I decided to drive to the server, which takes me about 1 hr 15 round trip. I brought the ailing box home and discovered it was booting fine and getting into Windows, then it would immediately reboot and repeat over and over.

    SCSI drives have some nice tools ATA drives don't, so I was able to thoroughly scan the drives for errors and found none.

    I suspected a bad RAM chip, since I have seen this before, so I pulled all chips but one. It would not boot, so I labeled that one "bad" and put in another single chip, and the server booted and ran perfectly...

    After defragging and running various tests most of Wed night I felt I had found the problem. Thursday morning I was doing various updates and the server suddenly rebooted again. Now I was stumped.

    I then ran memtest and it made it through one complete pass clean. Then during the day Thursday I had various oddball glitches: sometimes it would not boot, sometimes it would not recognize the keyboard, sometimes it claimed Windows was missing key DLL files. Now I was afraid the motherboard was going bad, and Sirex and I started to work on Plan B, to migrate everything to the 101 box.

    Then late Thursday evening/early Friday morning I began to suspect the power supply, so I removed everything that was non-essential, and the server worked perfectly. I could not make it crash or reboot.... After work I drove to Logicsouth to copy some files off the old box onto 101 so Sirex could move forward with Plan B, then I went by my storage locker and retrieved an old box that also has an EPS12V power supply, and put that power supply into the 102 box Friday night.... Sirex was almost finished with Plan B, but I was beginning to think we would not need it, the new power supply was working perfectly.

    Last night I ran memtest through the night, and this morning saw we had memory errors... Crap!! A remote box must be 100% reliable and reboot cleanly every single time... Also, occasionally the server would not boot through to Windows.

    Plan B was back in play... I drove back to the server and was down there about 5 hours today, cleaning the boxes, reconfiguring the NIC cards, and some other things to allow Plan B to move forward. Got in touch with Sirex and he finished the implementation of Plan B.

    And we are back up... We learned some things but overall the failure and loss of a server box was not as bad as it could have been...


    FYI, the info on the 102 box is below. I still hope to salvage it. I spent almost $4,000 on this box and built it myself, and it has served us well.

    Dual Intel® Xeon™ Processors 3.06GHz 512K 533 FSB, 604 pin PPGA FC-PGA2 (BX80532KE3066D SL6RR)
    iWill Dual Xeon™ Motherboard, Intel® E7505 chipset (DPI533 Rev. 1.0, 07/04/03 BIOS)
    2 X 1GByte TwinX Matched Memory Pair DDR RAM - Corsair, XMS 2700 Platinum Heat Spreaders - (TwinX1024-2700LLPT)
    Maxtor 36.4GB 68pin U320-SCSI 10,000RPM Hard Drive (Atlas(TM) 10K IV) (8 meg Cache)
    Adaptec 29160 U160 LVD SCSI Controller Card
    Intel® PRO/1000 MT Server Adapter (PWLA8490MT)
    Antec Server Tower Case (SX-830)
    Fortron 550W EPS 12 Power Supply (FSP55060PLN)

    Comment


      #17
      Thanks for the work Cain.
      So it was a memory and PSU problem?

      Comment


        #18
        Never realized how much time I spent here until I couldn't. Glad it's back.
        Motivate

        Comment


          #19
          Good job Cain ! I have my fix
          sigpic

          The sentence below is true.
          The sentence above is false.

          Comment


            #20
            All that work and a sore back from sitting on the floor.
            Thank you,
            Apache

            Where do you put the Bayonet?
            Chesty Puller (upon seeing a flamethrower for the first time)
            I am all in favor of keeping dangerous weapons out of the hands of fools. Lets start with typewriters.
            Frank Lloyd Wright

            Comment


              #21
              Originally posted by goldenfooler
              Thanks for the work Cain.
              So it was a memory and PSU problem?
              Yes, a combination of the two apparently...

              Comment


                #22
                WTG!

                Comment


                  #23
                  Originally posted by Cain
                  Yes, a combination of the two apparently...
                  Possibility that the PSU blew and took the RAM with it?

                  Comment


                    #24
                    I suspect the constant rebooting for almost 30 hours damaged other things...

                    Comment


                      #25
                      Since I replied to Duke's original post that was deleted, let me repost here:

                      Thanks!



                      -Rand
                      [img]https://farm5.staticflickr.com/4333/35734799273_0013dbe418_z.jpg[/img]

                      Killing CLRs since 2004. BOOSH!
                      Support Cainslair. Donate here! [url]http://www.cainslair.org/billspaypal.php?[/url]

                      Comment


                        #26

                        Comment


                          #27
                          The Forums are smooth and quick today.
                          Apache

                          Where do you put the Bayonet?
                          Chesty Puller (upon seeing a flamethrower for the first time)
                          I am all in favor of keeping dangerous weapons out of the hands of fools. Lets start with typewriters.
                          Frank Lloyd Wright

                          Comment


                            #28
                            They are now running on a much more powerful machine.... This is the machine that was capable of running a smooth 32 player UT2004 ONS game, when only a few servers in the world could pull that feat off...

                            Comment


                              #29
                              Thank you Cain and Sirex!

                              It is amazing how much of a pain in the butt PSU's can be when they go out.

                              Back when I used to repair computers in college, my favorite (and frequent) PSU problem was always fixed by simply plugging the computer back in for the user, lol.
                              Nauticas

                              Comment


                                #30
                                Lmao, even to this day I still get calls to fix a PC that won't turn on because of that problem.




                                I'm not insane. I'm just overwhelming!

                                ·····••••• Support Cainslair. Donate here!•••••·····
                                ·····••••• and get extra options! •••••·····

                                Comment
