Tips for embedded real-time design

June 21, 2001
Real-time constraints change the way designers architect control tasks.

The PC/104 standard defines a modular system architecture that uses 3.5-in.-square boards that snap together. The result is a stack-through bus that uses ISA technology. The related PC/104+ standard uses PCI-format rather than ISA boards. A recent offering in the latter category is the Panther single-board computer from VersaLogic Corp., Eugene, Oreg. It can carry Socket 7 CPUs (i.e., Pentiums and Pentium clones) as fast as 400 MHz, including the K6, K6-2, K6-III, and low-power K6-2E. One of its claims to fame is 4 Mbytes of display memory and the ability to generate dual independent displays, allowing, for example, the display of two different images, each with independent timing and refresh rates.

At a recently held conference on embedded software, a speaker fielded the following question from the audience: "When we build our systems, we limit the execution load of the processor to 80% so there's enough leeway for meeting timing requirements. How much execution load leeway do you recommend?"

The speaker's answer: The system should be built for an execution load of 100%. Limiting the load to 80% wastes 20% of the resources.

The speaker was David Stewart of Embedded Research Solutions LLC, a software consulting and contracting firm in Columbia, Md. "That question highlights a common practice — writing software with tolerances of 20% or more," he laments. "Imagine building the cylinders for a car engine with 20% tolerance in each dimension!"

Stewart knows a thing or two about developing code for small computers. Before joining ERS he headed up the real-time laboratory at the University of Maryland and designed robotics software at Carnegie Mellon University's well-known Robotics Institute. This work has given him expertise in fielding small systems. It also has given him insights into devising reliable real-time controllers built around small computers.

In this regard, precision is one of his pet peeves. "Industry doesn't design software precisely, so developers need high tolerances to guarantee the system meets timing specifications," he says. "But more precise design methods let you use 100% of the resources. Suppose that instead of executing with 20% tolerance, the exact same software is designed to use 100% of a processor that is 20% less powerful. That's analogous to the difference between getting by with a 1.3-GHz Pentium IV instead of a new 1.7-GHz chip, which costs $150 more. For an embedded processor, there might only be a $1 difference in the alternatives. But if you are talking about making 1 million units, that's a $1 million savings for the manufacturer."

Moreover, it takes no longer to develop precise software than to do things the traditional way, claims Stewart. In fact, he says, such efforts can be faster because they need dramatically shorter testing and debugging periods.

However, for developers accustomed to working with large, full-blown computer systems, there has to be a change in mindset when it comes to fielding small embedded machines. Take as an example the area of component-based software. Software components, self-contained sections of code, are today viewed as a way of ensuring that programs get developed in understandable and manageable hunks. There are numerous tools available to help coders generate software components for relatively large systems.

Trouble is, a lot of these tools go out the window for systems smaller than 32 bits. "Techniques such as code generation and object request brokers, commonly used on bigger systems, just aren't suitable for 8- and 16-bit processors," explains Stewart. "Code generation tools might produce code occupying 120 kbytes of memory when small embedded systems more typically carry on the order of 16 kbytes."

The approach Stewart preaches for work on real-time embedded platforms is to first set up what is called an application-independent framework. A framework is a piece of software that, among other things, may schedule software tasks to run, facilitate the transfer of data from one software task to another, and manage housekeeping associated with starting and stopping software routines that perform specific tasks.

Once they've defined a framework, developers then devise software modules that plug into the framework by means of specific interface specifications. The modules perform tasks specific to the application at hand, such as reading an encoder, generating a motor speed command, or executing a PID control algorithm. Defining programs this way lets software developers write program modules flexible enough to be used in more than one project.
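As a rough illustration of the idea, the following C sketch shows one way a module interface and a tiny framework might fit together. It is not Stewart's framework; the names module_t, framework_register, framework_step, and the speed-controller module are hypothetical and chosen only for this example.

```c
#include <stddef.h>

#define MAX_MODULES 8

/* Hypothetical interface that every plug-in module implements. */
typedef struct {
    const char *name;
    void (*init)(void *state);   /* called once when the module is registered */
    void (*cycle)(void *state);  /* called each time the framework runs the module */
    void *state;                 /* module-private data */
} module_t;

static module_t *registry[MAX_MODULES];
static size_t module_count = 0;

/* Add a module to the framework. Returns 0 on success, -1 on failure. */
int framework_register(module_t *m)
{
    if (m == NULL || module_count >= MAX_MODULES)
        return -1;
    registry[module_count++] = m;
    if (m->init)
        m->init(m->state);
    return 0;
}

/* Run one pass over all registered modules, in registration order. */
void framework_step(void)
{
    for (size_t i = 0; i < module_count; i++)
        if (registry[i]->cycle)
            registry[i]->cycle(registry[i]->state);
}

/* Example application module: a proportional speed controller. */
typedef struct {
    int setpoint;
    int measured;
    int command;
} speed_ctrl_t;

void speed_init(void *s)
{
    ((speed_ctrl_t *)s)->command = 0;
}

void speed_cycle(void *s)
{
    speed_ctrl_t *c = s;
    c->command = 2 * (c->setpoint - c->measured);  /* gain of 2 is arbitrary */
}
```

Because the framework sees only the module_t interface, the same speed-controller module could in principle be reused in another project, or swapped for a different implementation, without changes to the framework itself.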

The beauty of the approach is that it will work on both large and small embedded systems. In 32-bit computer systems that manage a lot of resources, the framework may take the form of middleware that sits on top of a conventional real-time operating system. But frameworks are particularly helpful in small eight- and 16-bit embedded systems with just a few kilobytes of memory. Here the framework may essentially function as a real-time executive.

Small computers
No question, the use of software components and other timesaving techniques is important. Estimates are that software development can account for as much as half the total development time on embedded system projects. Moreover, software techniques can also dramatically impact the progression of hardware development and system integration. Together, these phases often account for as much as 80% of an embedded development effort.

One of the techniques that can help shave time off development is to tweak code for better handling of real-time tasks. This is true even for large, 32-bit systems that involve the use of numerous resources and which are almost exclusively run by commercial real-time operating systems.

Any processor has peculiarities in handling specific operations that developers should account for, says Stewart. Typically, some actions take place either more slowly or more quickly than might be expected.

Floating-point instructions in systems employing the Motorola 68030 with a 68882 floating-point coprocessor serve as an example. "Floating-point addition takes the same amount of time as integer addition, which may be counterintuitive, because the chip contains floating-point hardware," says Stewart. Mixed numerical formats can also be a source of surprises. "Adding an integer to a floating-point number takes three times longer than just adding two floating-point numbers," he points out. "Most people think this operation should happen in less time than just adding two floating-point numbers. But the integer must first be converted to floating-point format before the addition can take place. There's another conversion if the result must be an integer. I have seen systems where just a few operations took place in floating-point format, and this behavior resulted in something slower than if the whole thing had been done in floating point."
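The hidden conversions Stewart describes are easy to overlook in C, where the compiler inserts them silently. The sketch below is only an illustration with hypothetical function names: both routines compute the same sum, but the first converts an integer to floating point on every pass through the loop, while the second stays in integer arithmetic and converts once. Whether the second is actually faster depends on the processor and compiler, so timing measurements on the target are still needed.

```c
/* Mixed formats: each iteration converts counts[i] from int to double
   before the floating-point addition can take place. */
double sum_mixed(const int *counts, int n, double offset)
{
    double acc = offset;
    for (int i = 0; i < n; i++)
        acc += counts[i];           /* implicit int-to-double conversion */
    return acc;
}

/* Single format inside the loop: accumulate in integers, convert once. */
double sum_convert_once(const int *counts, int n, double offset)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += counts[i];         /* pure integer addition */
    return offset + (double)total;  /* one conversion at the end */
}
```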

Allowances for such idiosyncrasies should be made in the initial phases of design. So too should coarse-grain optimizations: noting specific features of the processor hardware and deciding how to take advantage of them in the ensuing software design. Of course, sections of code optimized for a specific processor won't work as well on another machine.

Component-based software techniques are particularly helpful in such cases. The idea, says Stewart, is to create both generic and optimized modules that provide a specific function, then use the one that is most appropriate for the hardware at hand.
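One way to picture the generic-plus-optimized approach is a pair of interchangeable C functions behind a single name. In this sketch, which is an assumption for illustration rather than Stewart's code, scale_generic uses floating point, scale_int_only avoids it for processors without floating-point hardware, and the hypothetical build flag TARGET_INTEGER_ONLY picks between them.

```c
#include <stdint.h>

/* Generic module: straightforward, relies on floating-point support. */
int32_t scale_generic(int32_t x)
{
    return (int32_t)(x * 0.75);
}

/* Optimized module for integer-only processors: same result using
   integer arithmetic, valid as long as x * 3 does not overflow. */
int32_t scale_int_only(int32_t x)
{
    return (x * 3) / 4;
}

typedef int32_t (*scale_fn)(int32_t);

/* TARGET_INTEGER_ONLY is a hypothetical flag set by the build system. */
#ifdef TARGET_INTEGER_ONLY
static const scale_fn scale = scale_int_only;
#else
static const scale_fn scale = scale_generic;
#endif
```

Callers use scale() and never need to know which version was linked in, so the rest of the application stays the same when the hardware changes.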

Considerations such as the need for optimization make it important that software developers be included in the initial phases of any design. Otherwise, computer resources can be easily overdesigned. "In one case, adding a little additional complexity to the hardware reduced the processor execution time by 25%," relates Stewart.

He illustrates the typical trade-off discussion this way: Suppose the processor costs $10. Say one software operation uses 25% of it. That means the hardware cost of that operation is $2.50. Now consider using $1.50 worth of intelligent I/O instead. There's a net savings of $1 in hardware resources. There is also 25% more processor time available for something else. Alternatively, there is an opportunity to run the processor 25% slower to save power, or to use a slower, less-expensive processor to cut costs.
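The arithmetic behind that example can be captured in a couple of lines of C. The function names here are hypothetical, and amounts are in cents to keep the calculation in integers.

```c
/* Share of the processor's price attributable to one operation. */
int operation_cost_cents(int cpu_cost_cents, int load_percent)
{
    return cpu_cost_cents * load_percent / 100;
}

/* Net hardware savings from moving the operation to dedicated I/O.
   A negative result means the extra hardware costs more than it frees. */
int offload_savings_cents(int cpu_cost_cents, int load_percent,
                          int io_cost_cents)
{
    return operation_cost_cents(cpu_cost_cents, load_percent) - io_cost_cents;
}
```

With Stewart's figures, a $10 processor at 25% load gives an operation cost of 250 cents, and replacing it with $1.50 of intelligent I/O yields a savings of 100 cents.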

Typical hardware/software trade-offs arise in peripheral circuits such as analog/digital converters. Simple a/d circuitry that requires software for multiplexing may initially appear to be less costly than more intelligent devices that may only require software that handles a simple data exchange. But the question can only be answered by looking at where the flexibility lies in the system, and deciding what resources are the most important.

To interrupt or not
Stewart's own work has spanned 8-bit 6805 to 32-bit 68k-based Motorola microcontrollers, as well as Z180 devices from Zilog and digital signal processors from Texas Instruments. Nevertheless, the techniques he advocates are useful on small systems with limited memory from any vendor.

Examples of systems in this category are those from Z World Inc., Davis, Calif. The firm has devised a Z80-like microprocessor called Rabbit 2000. It is an 8-bit device with both internal and external 8-bit data buses. It uses an extended Z80-style instruction set which, according to Z World, provides performance as good as many 16-bit processors.

The device is typically deployed on a small board along with a megabyte or less of static RAM or Flash memory. Programming is via a variation of the C language called Dynamic C. It is said to have enhancements and variations designed to facilitate real-time programming, with constructs for cooperative and preemptive multitasking, protecting writes to variables during power failures, and writing interrupt routines.

The Rabbit processor's handling of interrupts, through four levels of priority, points up a technique often used in systems handling numerous time-sensitive tasks. Developers of real-time software often take an event-driven (or interrupt-driven) organizational approach. The idea in such cases is to write the program essentially as a collection of routines, each of which executes on receipt of its respective interrupt signal. These signals are typically generated by external devices such as position encoders, Hall-effect speed sensors, and so forth. The routines synchronize their execution through mechanisms such as semaphores, registers containing values that indicate the status of the various interrupt routines running on a system.

Interestingly, however, Embedded Research's Stewart points out that interrupts often do more harm than good in real-time systems. Most interrupt-driven systems, in fact, can be converted to ones that follow a cyclic regime, where timers trigger routines that check I/O points systematically to see if they need service.

A point to note about interrupt-based schemes is that they generally signal the need for interrupt service, and the completion of service, by putting a value in a register. But procedures in a predefined list can often check the appropriate registers periodically to accomplish the same thing, points out Stewart.
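A polled version of a device handler might look something like the following C sketch. The register names and the STATUS_DATA_READY bit are hypothetical; on real hardware the two variables would be memory-mapped device registers, but here they are ordinary variables so the example can be compiled and tried anywhere.

```c
#include <stdint.h>

#define STATUS_DATA_READY 0x01u   /* hypothetical "needs service" bit */

/* Stand-ins for memory-mapped device registers. */
volatile uint8_t device_status;
volatile uint8_t device_data;

/* Called periodically from a timer-driven list of procedures.
   Returns 1 and stores the sample if the device needed service,
   0 if there was nothing to do. */
int poll_device(uint8_t *out)
{
    if ((device_status & STATUS_DATA_READY) == 0)
        return 0;                                   /* no request pending */
    *out = device_data;                             /* service the device */
    device_status &= (uint8_t)~STATUS_DATA_READY;   /* mark service complete */
    return 1;
}
```

The polling period has to be short enough that no request waits longer than the device can tolerate, which is the price paid for the more predictable timing.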

Moreover, the presence of interrupts and other nonpreemptable high-priority events can be a cause of priority inversion, the kind of problem that caused repeated system resets on the Mars Pathfinder. "Interrupt-driven systems are difficult if not impossible to verify analytically," he points out. "The more interrupts you have, the more difficult it becomes to guarantee that the processor will meet its deadlines."

To prove analytically that an interrupt-driven system will work, developers must use real-time systems theory to estimate how often interrupt routines will run and how long each will be active. Measurements of execution time must then verify that the analysis is correct.
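A first-pass version of such an analysis is a utilization check: for each routine, divide its worst-case execution time by the shortest interval between activations, then add up the results. The C sketch below uses hypothetical names and treats every routine as a periodic task. A total above 1.0 means the processor cannot keep up under any scheduler. A total at or below 1.0 is only a necessary condition in general; it is sufficient for independent, preemptable tasks scheduled earliest-deadline-first with deadlines equal to periods, and nonpreemptable interrupt routines require a stricter analysis.

```c
#include <stddef.h>

typedef struct {
    double period_ms;  /* shortest time between activations */
    double wcet_ms;    /* measured worst-case execution time */
} rt_task_t;

/* Fraction of the processor consumed by all tasks together. */
double total_utilization(const rt_task_t *tasks, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += tasks[i].wcet_ms / tasks[i].period_ms;
    return u;
}

/* Returns 1 if the load does not exceed 100% of the processor. */
int load_is_feasible(const rt_task_t *tasks, size_t n)
{
    return total_utilization(tasks, n) <= 1.0;
}
```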

It is a relatively straightforward process to devise scheduling algorithms for small systems. Stewart estimates that a rudimentary real-time cyclic executive can consist of about 30 lines of code, yet still support a component-based software approach. The do-it-yourself approach to writing a multirate real-time executive works best for systems that lack graphic display monitors, hard disks, and other peripherals that transfer huge amounts of data. For systems that do have these devices, he recommends use of a commercial real-time operating system that provides all necessary drivers. Developers should then create a middleware layer to provide a component-based software interface, instead of relying on the process-oriented interfaces provided by nearly every real-time operating system.
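To give a feel for how small such an executive can be, here is a minimal multirate sketch in C. It is not Stewart's executive; exec_task_t, executive_tick, and the two counter tasks are hypothetical, and the sketch assumes a hardware timer calls executive_tick once per tick with tasks listed from shortest period to longest.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    void (*run)(void);       /* the module's periodic routine */
    uint32_t period_ticks;   /* how often it should run */
    uint32_t next_release;   /* tick at which it is next due */
} exec_task_t;

/* Call once per timer tick. Runs every task whose release time has
   arrived, in table order. The signed difference keeps the comparison
   correct when the tick counter wraps around. */
void executive_tick(exec_task_t *tasks, size_t n, uint32_t now)
{
    for (size_t i = 0; i < n; i++) {
        if ((int32_t)(now - tasks[i].next_release) >= 0) {
            tasks[i].next_release += tasks[i].period_ticks;
            tasks[i].run();
        }
    }
}

/* Two placeholder tasks that simply count their activations. */
unsigned fast_runs, slow_runs;
void fast_task(void) { fast_runs++; }
void slow_task(void) { slow_runs++; }
```

With a table holding fast_task at a period of one tick and slow_task at four ticks, eight ticks produce eight runs of the fast task and two of the slow one. A real executive would also need to detect a task that overruns its tick.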

Some off-the-shelf embedded systems also provide special interfaces targeting such uses. An example is the StrongARM platform from ARM Ltd. in the U.K. The 206-MHz 32-bit reduced instruction set (Risc) CPU core is used on a single-board computer recently developed by Applied Data Systems Inc., Columbia, Md. The processor delivers 235 Mips and dissipates 400 mW, letting it support two LCDs simultaneously. Compatible displays range from quarter VGA (320 × 240 pixels) up to XGA (1,024 × 768 pixels). Applications include point-of-sale terminals and gasoline pumps.

A number of the ARM-based systems that Applied Data provides go into medical applications. Some of these are classified as Class 1, those that are invasive and where a malfunction can be potentially life threatening. Developers of such equipment frequently go with a commercial real-time operating system called VxWorks because of its fast interrupt response. Along with the operating system, Applied Data supplies low-level software such as drivers. Recently, it made available networking routines that permit one embedded system to access files on another over a network.

Embedded smarts for mobile robots

The Pioneer family of mobile robots from ActivMedia Robotics LLC employs single-board computers to perform tasks such as collaborating to move an object that's too heavy for an individual robot to handle alone. Robots from ActivMedia in Menlo Park, Calif., use an architecture originally developed at SRI's Artificial Intelligence Center and Stanford University. A recently released tall robot called the PeopleBot includes options such as a speech system, image processing, tabletop gripper, and a navigation system.

Powering the device, which starts at $5,995, is a VSBC-6 single-board computer from VersaLogic Corp., Eugene, Ore. Use of the PC/104 standard allows the 5.75 × 8-in. board to accept add-on cards for functions such as frame grabbing and sound. Processors available on these boards include the 266-MHz or 400-MHz AMD K6-2, a 266-MHz AMD K6-2E-AMZ low-power fanless version, and a 233-MHz Intel Pentium. The boards support up to 256 Mbytes of low-power system RAM, but those in the robots carry 32k of DRAM and another 32k of Flash-ROM.

ActivMedia configures robots carrying the on-board computers to run under Linux and to act as X-Windows terminals over Ethernet, working with a remote PC. Ethernet connections are either via tether or wireless link.

Rules of the road for developing real-time software
Develop a framework: The framework handles basic scheduling and polling and interacts with software modules in well-defined ways. Modules are then defined to handle application-specific tasks.

Use a high-level language, not assembler: There is little reason to write much code in assembly language, even for eight-bit processors. Some vendors now provide C compilers that optimize code for eight-bit systems. C++ and Java are likely to be the languages of choice for 32-bit processors. There may be something gained by coding small, computationally intensive routines in assembler, but remember that the effort needed to maintain assembler code can be significant.

Commercial real-time OSs have a place: Real-time execs are readily available with libraries of drivers. Systems that incorporate widely used peripherals like hard disks and graphic displays, or that use networking protocols, are good candidates for an off-the-shelf OS. In contrast, a configurable real-time executive is most effective for custom embedded systems with only custom peripherals.

No single hardware platform excels at real time: Processor selection generally depends on the I/O options. Most developers end up choosing a platform on which they have the most programming experience.

Think twice about using interrupt-driven scheduling: Though it may not seem so initially, many systems can handle real-time tasks deterministically without resorting to interrupts. There has been a great deal of research into scheduling algorithms that can help analytically verify determinism. It is nearly impossible to analytically verify determinism in interrupt-driven schemes.

Note hardware idiosyncrasies: Some processor instructions are likely to execute either more slowly or more quickly than you might expect. Software strategies should allow for such quirks. Floating-point operations are particularly prone to nonintuitive behavior.

Optimize up front and near the finish line: The place for coarse-grain optimization, planning for specific algorithms that will maximize throughput, is in the initial stages of software design where it can have an impact on other facets of system design. Fine-grain optimization, tweaking individual lines of code, can wait until programming is nearly complete.

Prevent overdesign: There is a potential for overdesigning the system if software developers are not involved at the front end of the development effort. The reason: It may be feasible to reduce the programming effort drastically by making small but strategic additional investments in smart hardware such as intelligent I/O ports. This may reduce requirements for memory and cut development time.

StrongARM tactics for embedded control

Typical applications for StrongARM-based boards from Applied Data Systems are data acquisition and display, as in mobile crop-management systems.

Though perhaps best known for their use in handheld devices such as PDAs and palmtops, Risc-based ARM chips are now deployed in embedded applications of all kinds. They were originally developed by Advanced Risc Machines, Los Gatos, Calif., and are now licensed to various semiconductor manufacturers.

The StrongARM processor is a high-speed version also known for its small die size and low power. Among the embedded system suppliers fielding StrongARM-based boards is Applied Data Systems Inc., Columbia, Md. Its 3 × 5-in. Bitsy board generally goes into point-of-sale terminals, gasoline pumps, and similar applications incorporating some sort of display.

The board carries a StrongARM SA-1110 running at 206 MHz and dissipating less than 450 mW. It incorporates an LCD video interface handling up to XGA resolution with 8-bit color.

In a typical operating environment, says ADS, the board might run under the VxWorks real-time operating system, though it also works with Win CE, Linux, and OS-9. A recently added option is the ability to share files over a network, thanks to new software that implements sharing and control capabilities.

I/O is key in small embedded systems

The decision to use a specific embedded computer often hinges on the kind of input/output facilities available. Such was the case for Sperry Product Innovation Inc. in Boston. Working with Sealed Air Corp., the product development company deployed a BL1700 single-board computer in a new product for void-fill packaging. Sperry created a custom, piggyback board that uses multiplexed DIP switches and connector headers. This configuration permits easy field upgrades. The system uses three of the board's ten 12-bit analog inputs and most of the 32 digital I/Os available. It controls equipment functions and reports system alerts in real time.

The BL1700 board running the system carries a Z180 processor running at about 18 MHz. The 4.2 × 6.25-in. board carries a port that accepts expansion boards to handle specialized functions such as connections to keypads and displays. Analog inputs get sampled at rates of up to 3.5 ksamples/sec. A PWM output can be synthesized by use of between four and seven digital output channels.

In another case, IC Cubed in Petaluma, Calif., designed a system around a single-board computer to regulate the temperature in police cars. Its Chilly Dog device targets K-9 units where the dog handler has left the car. If it senses that the car interior is too hot, it starts the engine to switch on the AC for a while. It does this as often as necessary. If the car won't start, it beams a warning signal to the handler.

Powering the electronics is a Z World Jackrabbit BL1800 board. It contains the Z80-like Rabbit 2000 microprocessor running at 29.5 MHz. The 2.5 × 3.5-in. board provides 24 CMOS-compatible I/O lines, three analog channels, and four high-power outputs. Three of these outputs can sink up to 1 A and can directly drive inductive loads. Also on board are five eight-bit timers and one 10-bit timer with two match registers. Four of the eight-bit timers can be cascaded, and a watchdog supervisor is standard.

Online resources for embedded real-time design
Embedded Research Solutions LLC site. Includes white papers and tutorial material on software design for real-time systems.
Z World Corp. site, with application notes and links to Rabbit Semiconductor site.
VersaLogic Corp. site for 32-bit SBC, PC/104, embedded PCI, and STD 32 systems.
Applied Data Systems site for ARM, StrongARM 32-bit Risc embedded systems.
Embedded Systems Conference site. Contains archives of papers presented at previous Embedded System Conferences.
