Marginal Hardware, Intermittent Problems and Best Practices

Earlier this spring I replaced what seemed to be a perfectly good desktop PC with a new one to stop my remote-session SmartSDR programs from grinding to a halt after a couple of hours of running.

The old PC really didn’t meet the recommendations anymore, as its available resources were increasingly consumed: first by successive Windows updates, next by updates and bloat in the productivity software I needed to run, and last by the more modest increases in resource needs of SmartSDR and its supporting packages.

It is pretty obvious that the major feature increases in SmartSDR v3 come with some overhead. At the time I wasn’t able to discuss those needs in the FRS Community Forum, as I was experiencing them as part of the Alpha Test team trying out MultiFlex in the test environment before its public release.

I have my theories about what was happening, but truthfully, how much time was I going to waste on hardware of marginal specification? The desktop hadn’t been specified to run SmartSDR in the first place; it was built to a simple workplace-tool spec.

My confession is that I am loath to change hardware or software that is working well. Further, when allocating a computer upgrade budget, I’ll divert funding to other team members if I cannot justify the spend on my own PC. I did exactly that in the last team upgrade, skipping my work desktop and keeping an older PC so the money could improve the rest of the team’s machines. They also do a lot more heavy graphics lifting than I do, as my role is more administrative. More importantly, for my usage at the time, my desktop worked fine.

Then the productivity software we all use underwent significant revisions, and we added several more major packages, including collaboration software that constantly consumes some resources. So the old PC started getting bogged down.

Add the SmartSDR feature increases, and the occasional creaking became a predictable program breakdown after a few hours of running. Some of the issues were Windows itself, some were the limited system resources, and some were how SmartSDR was coded.

It really didn’t matter which: the combination wasn’t working right, and it broke down over fairly short periods of operation.

What had happened was that more things were in motion than I first imagined, and I had overrun the effective capabilities of this particular PC through the cumulative effect of the various upgrades.

I also think this particular machine was a bit “spec optimistic,” with marketing overselling its actual performance. Available memory wasn’t “exactly” what the spec said, for various background reasons.

The worst part was that the failure didn’t always happen and was unpredictable. From a casual user’s perspective it wasn’t really possible to identify what triggered the breakdown.

We have had this same sort of phenomenon with power equipment: engines that shut down seemingly at random, and generators that test well but then won’t hold up under a full load in the field, again at random.

Best Practice is always to break down what you do know so you can sort each aspect into categories. Typically those categories would include:

  • Proven Good/Okay
  • Believed Good/Okay
  • Not Evaluated
  • Not Possible to Evaluate
  • Believed Faulty
  • Proven Faulty

The next Analysis Matrix centers on Dependencies/Interrelationships:

  • Proven Dependencies/Interrelationships
  • Suspected Dependencies/Interrelationships
  • Not Evaluated
  • Not Possible to Evaluate
  • Believed Independent
  • Proven Independent

Obviously most of us take a more casual approach, but we often do some level of categorization to direct our focus. The approach is less about absolutes or creating actual lists than about organizing how we approach the problem.
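For readers who like to keep even a casual matrix in a script, here is a minimal Python sketch of the first Analysis Matrix. The component names and their statuses are hypothetical examples, not findings from my case. The ordering that puts the most suspect items first is also an assumption about how you might direct your focus, not a fixed rule.

```python
from enum import Enum


class Status(Enum):
    """Evaluation categories from the first Analysis Matrix."""
    PROVEN_GOOD = "Proven Good/Okay"
    BELIEVED_GOOD = "Believed Good/Okay"
    NOT_EVALUATED = "Not Evaluated"
    NOT_POSSIBLE = "Not Possible to Evaluate"
    BELIEVED_FAULTY = "Believed Faulty"
    PROVEN_FAULTY = "Proven Faulty"


# Hypothetical inventory for illustration only.
components = {
    "Network link to radio": Status.PROVEN_GOOD,
    "Windows updates": Status.BELIEVED_GOOD,
    "Collaboration software": Status.NOT_EVALUATED,
    "Available memory": Status.BELIEVED_FAULTY,
    "SmartSDR alpha build": Status.NOT_EVALUATED,
}

# Assumed order of attention: suspects first, proven-good items last.
FOCUS_ORDER = [
    Status.PROVEN_FAULTY,
    Status.BELIEVED_FAULTY,
    Status.NOT_EVALUATED,
    Status.NOT_POSSIBLE,
    Status.BELIEVED_GOOD,
    Status.PROVEN_GOOD,
]


def focus_list(inventory):
    """Return component names sorted so the most suspect come first.

    sorted() is stable, so items sharing a category keep their listed order.
    """
    rank = {status: i for i, status in enumerate(FOCUS_ORDER)}
    return sorted(inventory, key=lambda name: rank[inventory[name]])


for name in focus_list(components):
    print(f"{components[name].value:<26} {name}")
```

The same pattern would work for the Dependencies/Interrelationships matrix: swap in those six categories and tag pairs of components instead of single items.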

Additional Best Practices are the use of Isolation, Elimination and Substitution to further segment the problem field in a way that leads to an actual solution.

Isolation, in the case of my desktop problem, would be to run SmartSDR on its own and see whether it failed over an extended period: drop any add-on radio programs and stop running any productivity software.

Elimination would be to remove programs, rerunning the practical test after each removal.

Substitution might entail changing the version of SmartSDR from an Alpha Test version to one of the known stable General Release versions.
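Isolation and Elimination can be illustrated with a minimal Python sketch. It assumes a stand-in test in which the PC "fails" whenever the combined memory load exceeds a limit. The program footprints and the limit are made-up numbers for illustration, not measurements. A real practical test would be a soak run of hours, and an intermittent fault may need several runs before a pass can be trusted.

```python
# Hypothetical memory footprints (MB) and usable-memory limit, illustration only.
FOOTPRINT_MB = {
    "add-on logger": 300,
    "office suite": 1200,
    "collaboration client": 900,
    "web browser": 800,
    "SmartSDR": 1500,
}
USABLE_MB = 3000


def still_fails(running):
    """Stand-in for the practical test: fails when combined load exceeds the limit."""
    return sum(FOOTPRINT_MB[p] for p in running) > USABLE_MB


def eliminate(programs, test):
    """Drop programs one at a time, keeping a removal only if the failure persists.

    What remains is a smaller set that still reproduces the failure.
    """
    needed = list(programs)
    for prog in programs:
        trial = [p for p in needed if p != prog]
        if test(trial):
            needed = trial  # failure persists without prog, so leave it out
    return needed


everything = list(FOOTPRINT_MB)
print("Isolation, SmartSDR alone fails:", still_fails(["SmartSDR"]))
print("Full load fails:", still_fails(everything))
print("Still fails with only:", eliminate(everything, still_fails))
```

In this toy model no single program is "the" culprit. The failure needs a combination, which mirrors how cumulative upgrades can overrun a marginal machine.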

A further Best Practice is timelining changes: note which software updated when and what hardware or network changes also happened, and get all of this onto a timeline. It is helpful if you can also put on your timeline when the problem appeared and whether its severity or frequency changed along the way.
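A timeline can be as simple as a sorted list. Here is a minimal Python sketch. Every date and event below is hypothetical. The 60-day look-back window is an arbitrary assumption you would tune to your own situation.

```python
from datetime import date

# Hypothetical log entries: (date, kind, description). Illustrative only.
events = [
    (date(2019, 3, 2), "problem", "First SmartSDR stall after a few hours"),
    (date(2019, 1, 15), "software", "Office suite major revision"),
    (date(2019, 2, 20), "software", "SmartSDR alpha build installed"),
    (date(2019, 2, 5), "software", "Collaboration client added"),
    (date(2018, 11, 10), "hardware", "Network switch replaced"),
    (date(2019, 4, 1), "problem", "Stalls now daily"),
]


def build_timeline(entries):
    """Return entries in chronological order."""
    return sorted(entries, key=lambda e: e[0])


def changes_before_onset(entries, window_days=60):
    """List changes made within window_days before the first recorded problem."""
    timeline = build_timeline(entries)
    onset = next(d for d, kind, _ in timeline if kind == "problem")
    return [
        desc
        for d, kind, desc in timeline
        if kind != "problem" and 0 <= (onset - d).days <= window_days
    ]


for d, kind, desc in build_timeline(events):
    print(d.isoformat(), f"{kind:<9}", desc)
print("Changes shortly before onset:", changes_before_onset(events))
```

Later "problem" entries, such as the stalls becoming daily, stay on the timeline, so you can also see whether severity or frequency shifted after a given change.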

I think these Best Practice ideas will help anyone who needs to break down a case of Marginal Hardware or Intermittent Failures into a matrix that leads to a solution.

BTW, in my case the replacement PC, which any analysis said was needed, has worked like a charm. The problems disappeared.

73

Steve
K9ZW
