Repeat system failures - how to debug?

TimeScience
Posts: 27
Joined: Tue Dec 08, 2009 4:47 am

Post by TimeScience »

Hi,

I have a fit-PC2 (XP, headless, with auto-on) that runs a gigapixel timelapse camera system we have developed (http://www.gigavision.org). I've been having an ongoing problem where it crashes catastrophically and then reboots with the "windows recovered from a serious error" message. I had thought this was an actual hardware problem, since I gather crashes like that often stem from issues with RAM, etc., but the computer is on a somewhat hard-to-access rooftop so I couldn't easily run any RAM tests on site. I waited until I got a new diskless fit-PC2 and swapped the drive from the crashing system into the new PC. The new PC is now exhibiting the same crashes, so I'm assuming the first one didn't have a hardware problem.

The system isn't doing anything particularly taxing - our software runs the CPU at ~60% for about an hour every 3-4 hours. It uses about 140 MB of the 1 GB total RAM. It does have Norton AntiVirus installed, which I see from a recent post could cause resource issues, but other than that I can't think of anything particularly unique about the Windows install.

The system with the crashing problems is an always-on machine. We have another system running a similar camera system that doesn't have the auto-on feature. It is also unreliable and shuts off at random times, usually daily, and then someone has to go turn it back on (XP IS configured on that one to restart on system errors), so I'm wondering if this is the same issue just manifesting differently.

Any suggestions on how I'd begin to debug this? The computer is totally unstable; it restarts as many as 2-3 times a day. I need to build a number of camera systems with fit-PCs in them over the next month or so, and I'm concerned that stability issues will show up in other builds, so I need to fix this ASAP.

Any suggestions would be greatly appreciated.

irads

Re: Repeat system failures - how to debug?

Post by irads »

In a normal desktop setup, fit-PC2 is known to run XP very stably. I believe you can take that as a reasonable assumption when trying to find the cause of the crashes.

It may be related to the interface to the camera system - how is it connected? By USB? If so, make sure to disable C-states in the BIOS - set to "GV3 only".

If the system draws power through USB, this might also be a source of instability. In that case consider a powered hub.

Another potential source of instability may be the display driver. The GMA500 driver is not designed to work headless. You may try uninstalling it and using the headless IEGD driver or just the default VESA driver.

SW issues:
The crash may be related to the camera application. If it generates a log file, you may be able to find a repeated pattern.
I would not rule out Norton AntiVirus as the source of the crashes - you may uninstall it and run for a few days. If the system is then stable, there are other antivirus products.
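
As a rough illustration of looking for a repeated pattern in a log file, here is a minimal Python sketch. It assumes a log with one entry per line, each starting with a timestamp in the form shown, and a separately noted list of crash times. The log format, the sample lines, and the function name are all assumptions made for illustration; the camera application's real log format is not described in this thread.

```python
from collections import Counter
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"  # assumed timestamp format, not the real app's

def last_entries_before(log_lines, crash_times):
    """For each crash time, return the last log message written before it."""
    entries = []
    for line in log_lines:
        if not line.strip():
            continue  # skip blank lines
        stamp, _, message = line.strip().partition(" ")
        entries.append((datetime.strptime(stamp, STAMP_FORMAT), message))
    entries.sort(key=lambda entry: entry[0])

    results = []
    for crash in crash_times:
        earlier = [message for when, message in entries if when < crash]
        results.append(earlier[-1] if earlier else None)
    return results

# Made-up sample data, for illustration only.
sample_log = [
    "2010-07-05T01:00:00 capture started",
    "2010-07-05T01:40:00 copying images to backup",
    "2010-07-05T09:00:00 capture started",
    "2010-07-05T09:35:00 copying images to backup",
]
sample_crashes = [
    datetime(2010, 7, 5, 1, 52),
    datetime(2010, 7, 5, 9, 41),
]

# Count which message most often precedes a crash.
pattern = Counter(last_entries_before(sample_log, sample_crashes))
print(pattern.most_common(1))
```

If the same message keeps showing up as the last entry before a crash, that activity is a good candidate to reproduce in the lab.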

All this testing is better carried out in the lab rather than the field. The important first step is reproducing the problem in the lab.

Good luck! (cool application BTW)

TimeScience

Re: Repeat system failures - how to debug?

Post by TimeScience »

Thanks for the tips.

I have noticed it often seems to crash when I'm performing long disk operations like copying or backing up 20 GB of image files. To reduce potential issues with system resources, I removed Norton AntiVirus and replaced it with Avast, which seems to be a lot more low-key and less intrusive than Norton.

Can you clarify the USB issue? One of the big selling points for us in choosing a fit-PC was that it had 6 USB ports, so we could avoid having a hub in the system.

A related USB issue - on both fit-PC2 systems I'm actively using, only 1 of the front USB ports works. Is this a known issue? Are there any fixes you can suggest?

Thanks,

Tim

LG1
Posts: 39
Joined: Mon Jul 05, 2010 4:16 pm

Re: Repeat system failures - how to debug?

Post by LG1 »

Maybe it's heat related?
Did you monitor your HD temperature? And the CPU?
Maybe the HD is overheating the US15W (max 70C),
or the HD itself is just getting too hot,
because you're talking about it happening when copying big files.

TimeScience

Re: Repeat system failures - how to debug?

Post by TimeScience »

Good suggestion. I don't think it is heat related, because I've been having these problems from the start, even when the system was just sitting on a desk doing nothing. It just hasn't been stable. It seemed to happen more when I was doing big copy operations, but that's definitely not the sole cause. Also, in the last few weeks the fit-PC has been in a very hot housing at around 45-50C (~113-122F) and it hasn't been any less stable due to the heat. Wouldn't I see a more predictable pattern if it was heat related?

This is kind of my main question... the system reboots and says there was a serious error, but I don't know where to look next to see what caused it. The default Application and System logs don't seem that helpful.

Any suggestions for better ways to track system status, events, variables, etc., to pinpoint what is actually going on when it crashes? For example, is there some event I'd see in the system prior to a crash that would indicate it was heat (or a USB issue) causing the problem?
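
One low-tech way to narrow down when a crash happens is a heartbeat log: a small background script that appends a timestamped line every minute and forces it to disk, so the last line in the file shows roughly when the machine died and what was recorded just before. The Python sketch below is only an illustration under assumptions: the file path, the interval, and the `describe` hook (a hypothetical place to plug in a temperature or disk-activity reading) are not taken from any existing tool.

```python
import os
import time
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"

def write_heartbeat(path, note=""):
    """Append one timestamped line and force it to disk."""
    line = "{} {}\n".format(datetime.now().strftime(STAMP_FORMAT), note)
    with open(path, "a") as log:
        log.write(line)
        log.flush()
        os.fsync(log.fileno())  # ask the OS to write the line out before a crash can lose it

def last_heartbeat(path):
    """Return (timestamp, note) from the final line, or None if the log is empty."""
    with open(path) as log:
        lines = [line.strip() for line in log if line.strip()]
    if not lines:
        return None
    stamp, _, note = lines[-1].partition(" ")
    return datetime.strptime(stamp, STAMP_FORMAT), note

def run(path, interval_seconds=60, describe=lambda: ""):
    """Write a heartbeat every interval until the process (or the machine) stops."""
    while True:
        write_heartbeat(path, describe())  # describe() is a hypothetical status hook
        time.sleep(interval_seconds)
```

After a reboot, comparing `last_heartbeat(path)` with the time of the restart gives a window for the crash, and the note on the final lines shows whatever status was being recorded at that point.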

LG1

Re: Repeat system failures - how to debug?

Post by LG1 »

Funny thing is that I have had a fit-PC2 for a few days now and I also get shutdowns and restarts of Windows XP.
Until today, only at night (I'm running it as a server 24/7).
I have not had any trouble during the day yet....

And during the night it runs a full system virus scan....
So a lot of HD activity...

I will see what happens tomorrow :P

irads

Re: Repeat system failures - how to debug?

Post by irads »

If you suspect a HW problem you may consider replacing the unit - please email rma@fit-pc.com

TimeScience

Re: Repeat system failures - how to debug?

Post by TimeScience »

As I mentioned previously, I originally thought it was a hardware issue, but I am having the same issues on a 2nd fit-PC2. So unless it is a problem with the hard drive I swapped between the two machines, it seems like it is a software issue.

Can you speak to the issue of powered usb causing hardware problems?

Thanks

irads

Re: Repeat system failures - how to debug?

Post by irads »

I'm not aware of USB hubs causing system instability with fit-PC2.

TimeScience

Re: Repeat system failures - how to debug?

Post by TimeScience »

Sorry to be unclear, I wasn't referring to issues with a hub but to your previous comment:

"If the system draws power through USB this might also be a source of instability. In that case consider a powered hub."

I was wondering if you could expand on that. Does this mean that if I am using all USB ports at the same time I might run into issues if some or all of the USB devices pull power from USB?
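
To make the question concrete: USB 2.0 allows a bus-powered device to draw up to 500 mA from a port, so the worry is both whether any single device exceeds that and whether the combined draw is more than the PC can supply. The Python sketch below just does that arithmetic. The device names, the current figures, and the total budget are made-up placeholders, not measured or documented values for the fit-PC2.

```python
USB2_PORT_LIMIT_MA = 500  # USB 2.0 limit for one bus-powered device on one port

def check_usb_budget(device_draw_ma, total_budget_ma):
    """Return (total draw in mA, devices over the per-port limit, whether the total fits)."""
    total = sum(device_draw_ma.values())
    over_port_limit = sorted(
        name for name, ma in device_draw_ma.items() if ma > USB2_PORT_LIMIT_MA
    )
    return total, over_port_limit, total <= total_budget_ma

# Placeholder numbers for illustration; real draws should come from
# the device datasheets or a measurement.
devices = {"device_a": 450, "device_b": 300, "device_c": 700, "device_d": 400}

# total_budget_ma is a hypothetical figure, not a fit-PC2 specification.
total, over, fits = check_usb_budget(devices, total_budget_ma=1500)
print(total, over, fits)
```

With these placeholder numbers the total is 1850 mA, `device_c` exceeds the per-port limit, and the total does not fit the assumed budget, which is the kind of situation where a powered hub would take the load off the PC.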

thanks for your help

Tim
