Embedded System Design Library: Defensive Programming

(C) 2009 Hank Wallace

This series of articles concerns embedded systems design and programming, and how to do it with excellence, whether you are new to the discipline or a veteran. This article is about defensive programming.

Many embedded programs written these days do not run on naked micros, but run under some purchased operating system. That’s good in that you don’t have to write basic support services, but it can be bad when the OS is of inferior quality.

The use of less reliable operating systems such as Windows CE requires very defensive programming practices, so that your program runs without crashing and without killing the operating system.

From the programmer’s chair, Windows CE looks like Windows NT with 50% of the API calls deleted, seemingly at random. The operating system has memory leaks and security holes which are documented by Microsoft and other parties, and these things take years to fix, if they are ever addressed. You have to live with this dumpster OS because this stuff is sold as quality merchandise at Best Buy.

Programming on a poor OS is like driving through a crime-ridden inner city. I had an associate on a business trip in Miami. He was driving through on the freeway at night when he was directed to an exit by a detour sign. After a few blocks the detour signs petered out, and he was driving through the slums of Miami. He pulled over in a parking lot to ask a policeman for directions. The officer said to him, “You do NOT want to be here,” and he repeated this several times, finally saying, “Follow me.” The cop led him out of the area and back to the freeway.

Unfortunately, with some OSes, you don’t want to be there and you cannot get away, so you have to arm yourself properly.

Programming defensively involves not doing anything risky. Using API calls that have only been released in the latest version of the OS is usually a disaster. On a Java project I would search for ways to accomplish a task and find several options, one of them usually being a recently added package. Every time I tried the new stuff, it did not work or would not compile. Sometimes I found that writing the code myself was the most reliable solution.

Making an API call without some kind of hang prevention is risky. At the very least, log what is happening so you can identify the hang after it occurs. Do you know that the timeouts on Microsoft FTP function calls are broken? I searched for a solution to this and found complaints going back ten years. I had to wrap each FTP function call in a thread so I could kill it if the FTP site went down in mid-transfer.
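The logging half of that advice can be sketched in a few lines of C. This is only an illustration under stated assumptions: the record type `breadcrumb_t` and the functions `crumb_enter`, `crumb_leave`, and `crumb_hung_call` are hypothetical names, not part of any OS API, and the sketch tracks a single call at a time (a multithreaded program would want one record per thread). The idea is to note which call is in flight before making it and to mark it complete when it returns, so that a watchdog or the next boot can report the call that never came back.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical breadcrumb record. On a real target this would live in
   battery-backed RAM, a no-init section, or a log file so that it
   survives the reset that follows a hang. */
typedef struct {
    char          call_name[32];  /* API call most recently started   */
    unsigned long enter_count;    /* number of calls started          */
    unsigned long leave_count;    /* number of calls that returned    */
} breadcrumb_t;

static breadcrumb_t g_crumb;

/* Record the call name just before making a risky API call. */
void crumb_enter(const char *name)
{
    if (name == NULL) {
        name = "(unnamed)";
    }
    strncpy(g_crumb.call_name, name, sizeof g_crumb.call_name - 1);
    g_crumb.call_name[sizeof g_crumb.call_name - 1] = '\0';
    g_crumb.enter_count++;
}

/* Record that the call returned. */
void crumb_leave(void)
{
    g_crumb.leave_count++;
}

/* Returns the name of the call that has not returned, or NULL if
   every call that was started has completed. */
const char *crumb_hung_call(void)
{
    return (g_crumb.enter_count != g_crumb.leave_count)
               ? g_crumb.call_name
               : NULL;
}
```

A caller would bracket each risky call with `crumb_enter("ftp_get_file")` and `crumb_leave()`, where the call name is a placeholder. This does not prevent the hang; it only makes the hang identifiable afterwards.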

In fact, it’s best to use as few specialized API calls as possible. I realize you might be limiting features, but if a feature is not specifically needed, leave it out. In an OS such as Windows CE where the OEM configures available options, it’s highly likely that some fancy feature you are depending on will not be present on one or more platforms. This results in (surprise!) your programs being non-portable at best, and the cause of system crashes at worst.

You need to understand that Microsoft has security and quality problems because of poor design and testing of OS components. Reduce your interaction with those components and you reduce your risk of failure.

But even if you program defensively, you will still have problems. For example, I had a customer experiencing data file corruption on a portable device. At one meeting, I joked with another guy about how the whole OS, down deep, was probably just making interrupt 21H calls (it’s a DOS joke). We all laughed and then went on with the meeting.

Some time later, it was revealed that a vendor of the drivers for the media had not written them to be thread-safe with long filenames! I just about hyperventilated laughing about the previous DOS joke, DOS being single threaded. Some idiot at the vendor copied some DOS code and did not even test it for thread safety! Their recommendation to us was to either shorten the filenames, or run ALL file accesses through one thread on the hardware!

Do you know that many OS bugs are reported by users in newsgroups, but are never fixed by the vendor? Just searching for the function name can reveal issues. If you are considering using a specialized API call instead of your own code, search for the name of the function along with words like “bug”, “crash”, “not working”, and the ever useful “sucks”. You’ll be surprised what you find. In many cases, the issue is not the function itself, but the lame documentation or nonfunctional examples, and other victims (ahem, users) can save you a load of time and nailbiting.

Some other defensive activities are:

  • Check all parameters passed to functions: ranges for scalars, validity of pointers, and bounds of array indices.
  • Pass buffer lengths to functions and test them with every call.
  • Check all messages sent between computers over a network. It’s shocking to see code that reads UDP or TCP data streams with no length or error check fields, and no validation of the data format.
  • Maintain check codes on RAM based data structures to detect corruption.
  • Maintain software watchdog timers on threads and state machines, to detect hangs and reset the system (after logging the details).
  • Have routines that check the general operation of the program, logging problems.
  • Maintain an event log containing the date, time, priority code, source of the event and event text.
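
Several of these checks can be combined in one routine. The following is a minimal sketch in C, assuming a hypothetical frame layout (one length byte, the payload, then one additive checksum byte) that is not taken from any real protocol. The names `frame_parse`, `FRAME_MAX_PAYLOAD`, and the status codes are illustrative. A production link would more likely use a CRC, which catches more error patterns than a simple sum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_MAX_PAYLOAD 64u

typedef enum {
    FRAME_OK = 0,
    FRAME_ERR_NULL,      /* a required pointer was NULL                */
    FRAME_ERR_SHORT,     /* fewer bytes received than the frame needs  */
    FRAME_ERR_LENGTH,    /* length field out of range                  */
    FRAME_ERR_NO_ROOM,   /* caller's output buffer is too small        */
    FRAME_ERR_CHECKSUM   /* checksum mismatch                          */
} frame_status_t;

/* Hypothetical frame: [len][payload, len bytes][checksum]
   checksum = low 8 bits of (len + sum of payload bytes).
   Nothing is copied to 'out' unless every check passes. */
frame_status_t frame_parse(const uint8_t *rx, size_t rx_len,
                           uint8_t *out, size_t out_size,
                           size_t *out_len)
{
    uint8_t sum;
    size_t  len, i;

    /* Pointer checks come first, before anything is dereferenced. */
    if (rx == NULL || out == NULL || out_len == NULL) {
        return FRAME_ERR_NULL;
    }
    *out_len = 0;

    /* Smallest legal frame is a length byte plus a checksum byte. */
    if (rx_len < 2u) {
        return FRAME_ERR_SHORT;
    }

    /* Never trust a length field that arrived over the wire. */
    len = rx[0];
    if (len > FRAME_MAX_PAYLOAD) {
        return FRAME_ERR_LENGTH;
    }
    if (rx_len < len + 2u) {
        return FRAME_ERR_SHORT;
    }
    if (len > out_size) {
        return FRAME_ERR_NO_ROOM;
    }

    /* Verify the error check field before using the payload. */
    sum = rx[0];
    for (i = 0; i < len; i++) {
        sum = (uint8_t)(sum + rx[1 + i]);
    }
    if (sum != rx[1 + len]) {
        return FRAME_ERR_CHECKSUM;
    }

    memcpy(out, &rx[1], len);
    *out_len = len;
    return FRAME_OK;
}
```

Returning a distinct status for each failure is a deliberate choice here: the caller can write the specific reason to the event log instead of a generic "bad message".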

You need to program defensively against not just sloppy OS programmers, but also those on your own team. If you are writing a subsection of code for your product, you need to hammer that baby until it’s pure steel. There’s nothing worse than getting a code module from another team member and finding that it’s junk. Make sure that your work is only the best.

If you are designing one end of a communicating system, be sure that you exercise your end of the link mercilessly before connecting it to the other party’s code. You do this by creating a hardware interface and writing a test program that exercises your program as it will be used in the final system. Yes, it’s a lot of work, but you will end up doing it eventually, and it’s easier to do with your test program than by staring at your counterpart’s sloppy code loaded with GOTOs and unreadable comment banners. Get it working and leave him in the lab over the weekend.

And PLEASE understand that the documentation of APIs is only the start. You have to code to how the APIs actually work as well as what the documentation says, always being aware that what the doc says and how it works may be two entirely different things. This is especially true with sockets programming (including *nix), and of course anything Windows.
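
A familiar *nix sockets case: it is tempting to assume that `read()` hands back the full number of bytes requested, but on a stream socket or pipe it may return fewer, and it can fail with `EINTR` when a signal arrives. The following is a minimal sketch of coding to that actual behavior, assuming a POSIX system and a blocking descriptor. `read_fully` is a hypothetical helper name, not a library function.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly 'want' bytes from fd unless end-of-stream or an error
   occurs first. Returns the number of bytes actually read (less than
   'want' only at end-of-stream), or -1 on error with errno set. */
ssize_t read_fully(int fd, void *buf, size_t want)
{
    unsigned char *p = buf;
    size_t got = 0;

    if (buf == NULL && want > 0) {
        errno = EINVAL;
        return -1;
    }
    while (got < want) {
        ssize_t n = read(fd, p + got, want - got);
        if (n > 0) {
            got += (size_t)n;   /* partial read: keep going          */
        } else if (n == 0) {
            break;              /* peer closed the stream            */
        } else if (errno != EINTR) {
            return -1;          /* real error, reported to caller    */
        }
        /* n < 0 with EINTR: interrupted by a signal, so retry */
    }
    return (ssize_t)got;
}
```

The caller still has to compare the return value with the count it asked for. A short count here means the other end closed the connection in mid-message, which is worth an entry in the event log.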

Understand that defensive programming is secure programming. Defensive programming is reliable programming. Program this way and your code will be better and the benefits will trickle down to future systems as you hack that code into other functions that your customer requires.

Author Biography

Hank Wallace is the owner of Atlantic Quality Design, Inc., a consulting firm located in Fincastle, Virginia. He has experience in many areas of embedded software and hardware development, and system design. See www.aqdi.com for more information.