
windows 7 - How do I troubleshoot when I have no clue where to start?


I am looking for hints, tips and answers on how to get started on troubleshooting when:



  1. The problem is intermittent

  2. The problem could lie literally anywhere - operating system; free source software; my own software developments; purchased software; crumbs on the keyboard; the specific combination of software I am currently running; Maxwell's demon; the little blue men actually running the machine have gone on strike; etc.

  3. I have expertise only in a few of the areas that are potential candidates for the cause of the problem.




The specific problem I am having is detailed below as an example. I am not seeking answers to my current problem, but rather advice on where and how to start tackling such problems.


I am currently encountering a problem with my new machine. On a few occasions the machine has just frozen, not accepting keystrokes, mouse clicks, or anything except the power on/off switch. Invariably I have been merely browsing the web, with a few other applications (<= 6) running. None of these applications are major; they are a mix of commercial programs and open source programs, typically migrated from Unix of some variety.


My machine is a quad-core i7 laptop running Windows 7.


EDIT:


Although I stated that the actual problem description was only an example, some of the comments are concentrating on solving this problem. Unfortunately, as it was only an example, the information given is correct but not complete. To avoid having people wasting their time on trying, remotely, to aid with the actual problem, I am giving some other information about my setup. As I originally said, I am not seeking answers to this specific problem.


My machine is a high-powered laptop and my main machine. It is used for development and technical writing; for communications (email, web, FTP, etc.); and for photo editing and indexing. A rigorous and extensive suite of hardware test programs, including CPU tests, multiple memory tests, and tests on all other components, is run on it at least monthly. Also run at least monthly are a full virus scan, a full spyware scan, a disk cleanup, and a disk defragmentation.


The disk contains approximately 3*10^6 files; disk usage is 300 GB, leaving 150 GB free. Memory is 8 GB. While the machine can get slightly warm when I am running a full complement of major development tools, I have encountered the problem only when using the machine very lightly: web browsing plus Textpad plus Graphviz plus a Firebird database plus a lightweight database browser (FlameRobin). In these circumstances even the fan is not slightly warm. I have made no changes to software, operating system or hardware over the period in which I have encountered the problem. A number of automatic updates have occurred, mostly but not exclusively from Microsoft, Adobe, and Lenovo.


This background puts into context (I hope) my reasons for asking this question the way I did. I am now going to start investigating the various logs mentioned in the answers as a first step in trying to narrow the field of investigation. I am also going to try to exercise one of the characteristics suggested in the answers I have received so far, patience, in my investigation.



Answer



Get a better idea.


You ain't going to win a battle without sufficient field information.




  1. Describe your problem in detail so that you have a good idea of it; who knows, it might happen only once.




  2. Track back in time what happened before and at the same time as the problem, both what you did and what your computer did.




  3. Think of all the possible causes, because sometimes the cause is something that's not obvious.




  4. Get more information whenever you have no idea what's happening. This could range from the event logs, to the Sysinternals tools, to performance analysis, to debugging, to any other tool within your expertise.




  5. Test your assumptions, so that your preconceptions don't filter out the real cause.
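Steps 1 and 2 are easier to act on if every occurrence is written down in the same form. As a rough sketch (not a tool from this answer), the Python below assumes you keep a set of notes per freeze listing what was running or happening at the time. The function names and the sample notes are hypothetical, made up for illustration.

```python
from collections import Counter


def common_factors(incidents):
    """Return the factors present in every recorded incident, sorted."""
    if not incidents:
        return []
    shared = set(incidents[0])
    for factors in incidents[1:]:
        shared &= set(factors)
    return sorted(shared)


def factor_frequency(incidents):
    """Count in how many incidents each factor was observed."""
    counts = Counter()
    for factors in incidents:
        # set() ensures a factor is counted once per incident
        counts.update(set(factors))
    return counts


# Hypothetical notes taken after three freezes (illustrative data only)
incidents = [
    {"web browser", "Textpad", "Graphviz", "on battery"},
    {"web browser", "Firebird", "FlameRobin"},
    {"web browser", "Textpad", "on battery"},
]

print(common_factors(incidents))
print(factor_frequency(incidents).most_common(3))
```

A factor that shows up in every incident is only a candidate, not a proven cause; it tells you which assumption to test first in step 5.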




Divide and conquer.


Because that's how an army defeats its opponent, even when outnumbered.


Eliminate the possible causes one by one, or you'll have trouble keeping track of the problem. This way you get closer and closer to the root cause, which makes the problem a lot easier to solve.


For example, with hardware, disconnect and remove anything that you don't need in order to reproduce the problem. This way, you might disconnect the component that causes it. If the problem disappears, reconnect half of the components, check whether it recurs, and keep splitting until you have found the bad component.


Testing a suspect component on another computer, if one is available, also helps to narrow down the problem.


For example, with software, rebooting into safe mode or disabling start-up entries also helps. The same applies to enabling or disabling settings, trying the default configuration, and so on.
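The halving procedure described above for hardware components and start-up entries can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: exactly one item is at fault, and a test with that item enabled reliably reproduces the problem. The name `find_culprit`, the component list and the `problem_occurs` check are all hypothetical; in practice the check is you running the machine with only those items enabled.

```python
def find_culprit(components, problem_occurs):
    """Locate a single faulty item by repeatedly halving the candidates.

    components: list of candidate names (hardware parts, start-up entries, ...).
    problem_occurs: callable that takes the list of enabled items and returns
        True if the problem reproduces with only those items enabled.
    Assumes exactly one culprit and a reliably reproducible problem.
    """
    candidates = list(components)
    if not candidates:
        raise ValueError("no candidates to test")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if problem_occurs(half):
            # The culprit is in the half we just tested
            candidates = half
        else:
            # The culprit must be in the untested remainder
            candidates = candidates[len(half):]
    return candidates[0]


# Hypothetical example: pretend the docking station is the bad part
components = ["USB hub", "external disk", "webcam", "docking station", "printer"]
faulty = "docking station"


def problem_occurs(enabled):
    # Stand-in for a manual test run with only `enabled` connected
    return faulty in enabled


print(find_culprit(components, problem_occurs))
```

With an intermittent problem, each test run has to last long enough that "no freeze" really means the culprit is absent, which is why halving (about log2(n) runs) beats testing the components one at a time.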


Let's put it to the test.



I am currently encountering a problem with my new machine. On a few occasions the machine has just frozen, not accepting keystrokes, mouse clicks, or anything except the power on/off switch. Invariably I have been merely browsing the web, with a few other applications (<= 6) running. None of these applications are major; they are a mix of commercial programs and open source programs, typically migrated from Unix of some variety.





  1. That's a proper description by itself, and the problem doesn't happen just once either.




  2. You know what happened at the same time as the problem,
    but you haven't considered what you or your computer did before the problem.


    I can't tell you that, but you, your event log and your recently modified files/folders can.




  3. The cause is most likely CPU related, because that's the component that processes things.


    More specifically, this could be a process, a driver or failing hardware (perhaps temperature problems?).




  4. I know it's the CPU, but I don't know what exactly. The event log doesn't show this, and Process Explorer would hang on DPC.


    So, as the next step, I let a trace analysis run, which I stop after the hang has occurred.


    I look into the trace, and I see that driver X is causing the problem!




  5. No real assumptions are made. The CPU assumption is handled by our Divide & Conquer approach...




So, this is where I start dividing to conquer the problem, stopping as soon as it is solved:




  1. Problem with current version of the driver?
    Update the driver to the latest version.




  2. Problem with the newest version of the driver?
    Get a new trace. Roll the driver back to an older version, different from the initial one.




  3. Problem with the device? Configuration problem in the registry?
    Get a new trace. Reinstall and/or disable the device if possible.




  4. Problem is random, is it the processor heating up?
    Check the processor temperature, replace fan if needed.




  5. Problem is not the processor, are there other hardware and software influences?
    Remove hardware and disable software from running, to nail down third-party influence.




  6. Problem is in a part that can't be removed? Then the machine itself should be replaced.
    In the worst case, if all else fails, you need to go for a replacement.




Getting new traces and removing hardware gives us more information, so we know where to look next.

