(C) 2009 Hank Wallace
This series of articles concerns embedded systems design and programming, and how to do it with excellence, whether you are new to the discipline or a veteran. This article is about some common issues that trip us up in embedded programs.
Most of the programs we write implement the logical solution to the problem we are attacking. The nasty bits are the details of making the solution run on the hardware.
For example, I have found that Windows programs (Win32 or WinCE) have a hideous problem with large auto arrays. Declaring “char cBuffer[1024]” within a function is a recipe for disaster. I don’t know what they are doing in the compiler to make this dangerous, but such large arrays almost always cause strange behavior, even if the stack space is ten times larger and there is only one thread running. Again, it’s probably some issue caused by graduate student compiler writers and insufficiently tested code that goes ignored decade after decade.
To avoid the issue, declare the array as static, or allocate it on the heap.
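As a minimal sketch of those two alternatives, here is the same 1024-byte buffer declared as static and then allocated on the heap. The function names and the formatting task are hypothetical, chosen only to give the buffer something to do.

```c
#include <stdio.h>
#include <stdlib.h>

#define BUFFER_SIZE 1024u

/* Option 1: static storage. The buffer lives in the static data area
   instead of on the stack. The function is no longer reentrant. */
int format_static(int value)
{
    static char cBuffer[BUFFER_SIZE];
    return snprintf(cBuffer, sizeof cBuffer, "value=%d", value);
}

/* Option 2: heap storage. Returns -1 if the allocation fails. */
int format_heap(int value)
{
    char *cBuffer = malloc(BUFFER_SIZE);
    int len;

    if (cBuffer == NULL) {
        return -1;
    }
    len = snprintf(cBuffer, BUFFER_SIZE, "value=%d", value);
    free(cBuffer);              /* release the buffer before returning */
    return len;
}
```

The static version costs RAM for the life of the program and cannot be shared safely between threads or interrupt levels. The heap version needs a failure path, so pick whichever trade-off suits the system.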
Beyond that, it’s good to verify that the program has sufficient stack space. With Windows, this is nearly impossible, but on a real embedded system where you have control of memory allocation it can be done. In many programs I have declared a variable at the top of the static variable space in RAM and monitored the stack pointer to be sure it never crosses that boundary. This is best done periodically, in an interrupt routine. In fact, you can easily compute the excess stack space and log or print it during development. None of us needs the stack crashing into static variables in RAM.
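The bookkeeping for such a stack check might look something like the sketch below. It assumes a stack that grows downward toward the static variables. Reading the stack pointer itself is processor and compiler specific, so the sketch simply takes the value as a parameter. The type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uintptr_t limit;       /* address of the marker atop the static variables */
    uintptr_t min_margin;  /* smallest free stack space seen so far, in bytes */
    int overflowed;        /* set once the stack pointer has crossed limit */
} stack_monitor_t;

void stack_monitor_init(stack_monitor_t *m, uintptr_t limit)
{
    m->limit = limit;
    m->min_margin = UINTPTR_MAX;   /* no samples taken yet */
    m->overflowed = 0;
}

/* Call periodically, for example from a timer interrupt, passing the
   current stack pointer value. Assumes a downward-growing stack. */
void stack_monitor_sample(stack_monitor_t *m, uintptr_t sp)
{
    uintptr_t margin;

    if (sp < m->limit) {           /* stack has run into the statics */
        m->overflowed = 1;
        m->min_margin = 0;
        return;
    }
    margin = sp - m->limit;
    if (margin < m->min_margin) {
        m->min_margin = margin;    /* remember the worst case */
    }
}
```

During development, min_margin can be logged or printed to show how much headroom remains under load.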
The same thing can be done to detect heap overflows.
When using interrupt routines, the notion of atomic access is important. Every processor has the ability to disable interrupts, permitting access by the background code to variables that are modified by interrupt routines. Sometimes this works well, but at other times disabling interrupts is not possible due to the performance of the system.
That’s when it helps to know your CPU. Every processor performs some set of data access instructions atomically. For example, on 8-bit micros, a load or store of a single byte value is atomic and cannot be interrupted. Many 8-bitters also have some 16-bit atomic load and store instructions. More complex operations are implemented by the compiler as multiple instructions and are not atomic; the sequence can be interrupted partway through, with detrimental results. Understanding how the CPU works in this respect can save you some grief and make for efficient code, though the code will be less portable.
For example, say that there is a one-byte flag passed between the background task and an interrupt routine. On every processor I have used, it is possible to read and write such a flag without disabling interrupts, because the operation is atomic. On a 16-bit micro, a read or write of an aligned 16-bit quantity is generally atomic as well.
If you have a complex operation to perform, perhaps a computation, you can read the source quantity into a temp variable, perform the computation, then write the temp back into the volatile variable. As long as the interrupt routine does not modify the variable, this works.
If the interrupt routine modifies such a variable, then the temporary variable trick still works, as long as the background task does not modify the variable. If both code threads modify the variable, then it is necessary to disable interrupts or use some other access arbitration.
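Here is a rough sketch of the temporary variable trick in both directions. It assumes single-byte loads and stores are atomic on the target. The variable names, the interrupt routines (written here as ordinary functions so the sketch can run anywhere), and the arithmetic are all hypothetical.

```c
#include <stdint.h>

/* Case 1: written only by the interrupt routine, read by the background. */
static volatile uint8_t adc_sample;

void adc_isr(uint8_t new_sample)        /* stand-in for a real ISR */
{
    adc_sample = new_sample;            /* single-byte store */
}

uint16_t scaled_reading(void)
{
    uint8_t temp = adc_sample;          /* one single-byte load: a snapshot */
    /* Both uses of temp see the same value, even if the ISR fires here. */
    return (uint16_t)(temp * 4u + temp / 2u);
}

/* Case 2: written only by the background, read by the interrupt routine. */
static volatile uint8_t pwm_duty;

void set_duty_percent(uint8_t percent)
{
    uint16_t temp;

    if (percent > 100u) {
        percent = 100u;                 /* clamp out-of-range requests */
    }
    temp = (uint16_t)((uint16_t)percent * 255u / 100u); /* multi-step math */
    pwm_duty = (uint8_t)temp;           /* publish with one byte store */
}

uint8_t pwm_isr_read(void)              /* what the ISR would read */
{
    return pwm_duty;
}
```

In both cases the shared variable is touched exactly once by the background code, so the interrupt routine never sees a half-finished result. If both sides wrote the same variable, this would no longer be enough.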
Another common issue I like to screen for is the cycle hog, a routine that takes more CPU time than expected. This is easily done by setting a hardware trace point and scoping the signal. For example, setting a bit on entry to an interrupt routine and resetting it on exit allows you to get a visual feel for the execution time. With the system under full load, you will be able to easily determine the maximum execution time. That figure is helpful in knowing whether critically timed activities will interfere with one another. Say you have two interrupts handling unsynchronized peripherals. If their peak loads happen to coincide at some point in time, you need to be sure there is enough CPU bandwidth to handle the situation.
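A trace point of that kind could be sketched as follows. TRACE_PORT here is an ordinary variable standing in for a memory-mapped output port. On real hardware it would be the device's port register, and the bit chosen is an assumption, as is the timer interrupt itself.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped output port (hypothetical name and bit). */
static volatile uint8_t TRACE_PORT;
#define TRACE_BIT (1u << 3)

static volatile uint16_t tick_count;

void timer_isr(void)                     /* stand-in for a real ISR */
{
    TRACE_PORT |= TRACE_BIT;             /* pin high on entry */

    tick_count++;                        /* the routine's actual work */

    TRACE_PORT &= (uint8_t)~TRACE_BIT;   /* pin low on exit */
}
```

With a scope on that pin, the pulse width is the execution time of the routine, and the other bits of the port are left undisturbed.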
One important bookkeeping task involves examining the memory map of the compiled program. Most programmers care nary a bit about such things, and if the program runs they are off to the next project. This is a mistake because some issues are caused by dynamically changing memory spaces. If your program uses the heap, it is especially imperative that you manually examine the memory map to determine that there will be no collisions under any condition.
Checking out the memory map also lets you know how much RAM and code space remains for future upgrades. It’s always helpful to know this information when entering a meeting with the marketing geeks who are just slobbering to double the feature set of your product.
The watchdog timer also bears some checking. Many programmers sprinkle watchdog resets around their program to such an extent that the program could hang in any loop and not reset. This is not good! I try to pick a single point within the program that runs periodically, but not an interrupt routine. If the program hangs just about anywhere, the CPU resets. Use the watchdog as a watchdog, not as a family pet!
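In outline, a single watchdog hit point might look like this sketch. The task functions are hypothetical placeholders, and watchdog_kick() merely counts calls where real code would write the watchdog reset register.

```c
#include <stdint.h>

static unsigned watchdog_kicks;          /* stand-in for the hardware reset */
static void watchdog_kick(void)
{
    watchdog_kicks++;                    /* real code: write the WDT register */
}

static unsigned tasks_run;
static void poll_inputs(void)    { tasks_run++; }   /* placeholder task */
static void update_outputs(void) { tasks_run++; }   /* placeholder task */

/* One pass of the background loop. If any task hangs, this function
   never reaches the kick and the watchdog resets the CPU. */
void main_loop_pass(void)
{
    poll_inputs();
    update_outputs();
    watchdog_kick();                     /* the only kick in the program */
}
```

Because the kick sits at the end of the pass and nowhere else, a hang in any task starves the watchdog.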
It’s also good to measure the worst-case time between watchdog resets, and compare that against the variation in timeout interval given in the CPU data sheet, since many watchdog timers run on RC oscillators on the chip.
If you grep your program for the word ‘while’ you will find dozens if not hundreds of uses. Every while loop in your program should have a well-defined termination condition or timeout. The same goes for ‘for’ loops.
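For instance, a loop that waits on a peripheral can be given a poll limit, as in this sketch. STATUS_REG and READY_BIT are hypothetical stand-ins for a real status register. The timeout is counted in polls, though a hardware timer would serve as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a peripheral status register (hypothetical name and bit). */
static volatile uint8_t STATUS_REG;
#define READY_BIT 0x01u

/* Wait for the ready bit, giving up after max_polls extra checks.
   Returns true if the bit was seen, false on timeout. */
bool wait_for_ready(uint16_t max_polls)
{
    while ((STATUS_REG & READY_BIT) == 0u) {
        if (max_polls == 0u) {
            return false;                /* timed out: let the caller decide */
        }
        max_polls--;
    }
    return true;
}
```

The caller then has to handle the false return, which is exactly the point: a dead peripheral becomes a reportable error instead of a silent hang.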
Here’s a list of items that I check during development and before release of any program:
- Check all port I/O directions during operation
- Check all port pullups
- Check all port I/O directions and states in sleep mode
- Measure all port I/O voltages in sleep, check for floating inputs
- Measure power supply current in sleep
- Verify prototype board circuit against production schematic
- Check watchdog timer hit point(s) and timing margin
- Check memory space limits and usage
- Check free stack space
- Check array indices
- Ensure all constants are stored in the proper space
- Ensure all variables are initialized
- Check all while loops for exit conditions and timeouts
- Test power off memory clear and recovery, parameter retention
- Examine lint output
There’s a huge list of hardware and software checklist items in my Electronics Design Checklist.
I’ve done quite a few products with OTP or mask programmed micros. With those, there is no fixing problems later, in the field, and generally a problem is not found until there are thousands of finished products sitting on someone’s dock. It’s important to get it right the first time.
I hope these hints will help you design and build more reliable products.
Author Biography
Hank Wallace is the owner of Atlantic Quality Design, Inc., a consulting firm located in Fincastle, Virginia. He has experience in many areas of embedded software and hardware development, and system design. See www.aqdi.com for more information.