Hear me, see me, filter me

Robert Oshana
   Engineering Manager
   Texas Instruments Inc.
   Dallas, Tex.

PDAs and MP3 players are just some of the applications for DSPs.

A flow path outlines the basics of DSP design. Sophisticated software tools help designers build DSP applications.

A code development model for efficient DSP programming.

Required DSP execution speed for typical applications.

Digital signal processors have the distinction of seeing the bulk of their applications in embedded systems. DSP-based apparatus are rarely called computers. More likely they are called cell phones, sonar, radar, process controls, communication systems, or audio synthesizers.

But not all DSPs are created equal. Some are optimized for specific signal-processing tasks such as fast-Fourier transforms. Others, in recent years, have sported general-purpose instructions that help put control and signal-processing capabilities on a single chip. One result: Developers now face some real choices when deciding which programming language will best get them to the goal line.

Development tools also have become more robust and able to expose more complicated problems. All this makes a difference when planning systems based around DSPs. It pays to know a few things about development options and the kinds of problems they are best at discerning.

First, a few basics. DSPs still excel at crunching complex algorithms that general-purpose processors can't easily handle. Unlike their general-purpose counterparts, DSPs can execute, with single-cycle instructions, the multiply-accumulate operations at the heart of functions such as filtering and fast-Fourier transforms (FFTs). DSPs are also scalable. They can, for example, share real-time signal-processing jobs over multiple processors to speed execution.
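
As a rough illustration of the filtering workload, the sketch below computes one output sample of a finite-impulse-response (FIR) filter in portable C. The function name fir_sample and its argument layout are assumptions made for this example, not any vendor's library. Each pass through the loop is one multiply-accumulate, the kind of operation a DSP is built to issue in a single cycle.

```c
#include <assert.h>
#include <stddef.h>

/* Compute one output sample of an FIR filter (illustrative sketch).
 * coeffs:  filter coefficients h[0..num_taps-1]
 * history: most recent input samples, history[0] being the newest
 * Each loop iteration is one multiply-accumulate (MAC). */
float fir_sample(const float *coeffs, const float *history, size_t num_taps)
{
    assert(coeffs != NULL && history != NULL && num_taps > 0);

    float acc = 0.0f;
    for (size_t k = 0; k < num_taps; ++k) {
        acc += coeffs[k] * history[k];   /* one MAC per tap */
    }
    return acc;
}
```

A four-tap moving average, for instance, would pass four coefficients of 0.25 and the four newest samples.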

There are two basic DSP architectures: Von Neumann and Harvard. Traditional Von Neumann architectures use one interface for both data and program space. (General-purpose computers are also Von Neumann machines to a large degree.) In contrast, the Harvard architecture uses two buses for simultaneous access to both data and program space each cycle. Internal memory stores intermediate variables for quicker access than external memory allows. DSPs designed to support high data throughputs may have one or more communication ports. An associated direct-memory-access (DMA) controller typically controls these I/O ports, allowing data to stream in and out of a processor while the CPU crunches data.
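
One common way to exploit a DMA controller is double ("ping-pong") buffering: the CPU works on one buffer while the next block arrives in the other. The sketch below only imitates that pattern on a host machine. The names dma_fill, process_block, and pingpong_sum are hypothetical, and dma_fill is an ordinary blocking copy standing in for a transfer that real hardware would run in the background.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_LEN 4   /* samples per block; arbitrary for this sketch */

/* Hypothetical stand-in for a DMA transfer. On real hardware this copy
 * would proceed in the background while the CPU processes the other buffer. */
void dma_fill(int *dst, const int *src, size_t n)
{
    memcpy(dst, src, n * sizeof *dst);
}

/* Example CPU workload: sum the samples in one block. */
long process_block(const int *buf, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += buf[i];
    return sum;
}

/* Process num_blocks blocks of input using two alternating buffers. */
long pingpong_sum(const int *input, size_t num_blocks)
{
    int buffers[2][BLOCK_LEN];
    long total = 0;

    if (num_blocks == 0)
        return 0;
    assert(input != NULL);

    dma_fill(buffers[0], input, BLOCK_LEN);        /* prime the first buffer */
    for (size_t b = 0; b < num_blocks; ++b) {
        size_t cur = b % 2;
        size_t next = (b + 1) % 2;
        if (b + 1 < num_blocks) {
            /* Start filling the idle buffer with the next block. */
            dma_fill(buffers[next], input + (b + 1) * BLOCK_LEN, BLOCK_LEN);
        }
        total += process_block(buffers[cur], BLOCK_LEN);
    }
    return total;
}
```

On an actual DSP the fill would be started asynchronously and the CPU would wait on a completion flag or interrupt before switching buffers; those details are device specific and are left out here.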

Real-time applications such as sonar and radar rely on multiprocessing and high-bandwidth I/O. One relatively recent development is the emergence of off-the-shelf real-time operating systems specifically for DSPs. Some would argue that operating systems have evolved such that developing code for multiprocessor DSP jobs is a trivial extension to programming a single processor. DSPs come with a standard set of software tools as do many general-purpose processors. Third-party enhanced tool suites often wrap standard tools in an interactive GUI. Simulators and emulators are two common but quite different tools.

Simulators and emulators

So-called in-circuit emulators were once the only option for assessing system performance. To use them a design had to be committed to silicon. Modern simulators, in contrast, can simulate entire systems, including processors, on-chip peripherals, system-level peripherals, peripheral hardware devices, the operating system, and application software. Software simulators let engineers develop and integrate software without first having to purchase a DSP and associated hardware.

Typical simulators consist of a high-level language debugger and a DSP simulation engine. The simulation engine is a software model of the physical DSP device. Typical instruction-level simulators are good for early development work, prototyping, and proof of concept. However, these all-software tools are relatively slow so they aren't appropriate for simulating large applications or for performance analysis.

Better estimates of timing and system behavior require cycle-accurate simulators or VHDL simulators. These tools model delays in memory access, pipeline stalls, and all other hardware-related functions that instruction-level simulators ignore.

Simulation can take place at various levels of abstraction. Abstraction means extracting the essential properties while omitting inessential details. Higher levels of abstraction focus on fewer, larger, more important pieces of information. Conversely, lower abstraction levels reveal more detail and raise the volume of information. There is an obvious trade-off between accuracy and performance when modeling at different levels of abstraction. And regardless of abstraction level, simulators are no substitute for actual hardware. Software developers often claim there's a hardware problem because their software, which ran fine on a simulator, won't work on an emulator. In many cases the software is to blame. Emulators catch many timing-related problems with software that instruction-level simulators miss.

Emulators provide access to DSPs and peripherals for debugging and integrating software and hardware. Emulators read and write to hardware registers and memory. Also supported are common functions including breakpoints, single stepping, and benchmarking. Most emulators are both spatially and temporally nonintrusive. Spatially nonintrusive means an emulator needs no additional hardware or software in the target environment. Temporally nonintrusive emulators let a processor or system execute at full speed.

But shrinking die size in DSPs and other ICs has most chipmakers omitting emulator logic from the chips themselves. A chip-emulator interconnect standard developed by the Joint Test Action Group (JTAG) permits board-level testing but requires some on-chip logic to implement. Emulation tools can also support parallel-processing applications. Here, the scan interconnection is daisy-chained between multiple processors, each controlled by its own emulator. A multitasking operating system controls each DSP in a separate window.

Some programming issues

Assembly language used to be the de facto programming standard, though that's changing. A general lack of portability and the difficulty of writing and maintaining assembly language code have some industries, in an attempt to lower life-cycle costs, limiting its use to only those portions of code that have to run extremely fast.

Parallel devices and VLIW (very long instruction word) architectures make the job of efficient assembly programming even harder. VLIW is an architecture in which the compiler packs a number of simple, noninterdependent operations into the same instruction word. With so many operations in flight at once, it is nearly impossible to manually pipeline and optimize an algorithm without a cycle-accurate simulator and plenty of trial and error.
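
To give a feel for what packing noninterdependent operations involves, here is a deliberately simplified sketch. It walks a list of three-register operations in order and starts a new instruction word whenever the current word is full or the next operation uses or overwrites a result produced earlier in the same word. The Op structure, the four-slot word width, and the function names are assumptions for illustration. Real tools also reorder operations and account for functional units and latencies, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS_PER_WORD 4   /* assumed issue width, not a real device's */

typedef struct {
    int dst;    /* register written */
    int src1;   /* registers read */
    int src2;
} Op;

/* Nonzero if op reads or overwrites the register that earlier writes. */
int depends_on(const Op *op, const Op *earlier)
{
    return op->src1 == earlier->dst ||
           op->src2 == earlier->dst ||
           op->dst  == earlier->dst;
}

/* Greedy in-order packing. word_of[i] receives the index of the
 * instruction word assigned to ops[i]. Returns the number of words used. */
size_t pack_vliw(const Op *ops, size_t n, size_t *word_of)
{
    assert(n == 0 || (ops != NULL && word_of != NULL));

    size_t word = 0;
    size_t word_start = 0;   /* index of first op in the current word */

    for (size_t i = 0; i < n; ++i) {
        int conflict = (i - word_start) >= SLOTS_PER_WORD;   /* word full? */
        for (size_t j = word_start; j < i && !conflict; ++j)
            conflict = depends_on(&ops[i], &ops[j]);
        if (conflict) {          /* close the current word, open a new one */
            ++word;
            word_start = i;
        }
        word_of[i] = word;
    }
    return n ? word + 1 : 0;
}
```

Even this toy version shows why hand scheduling is tedious: changing one register assignment can shift every later operation into a different word.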

DSP makers are beginning to recognize the limitations of assembly language and now include increasingly sophisticated tools and more portable languages. C/C++ compilers and Java are becoming commonplace. For example, an assembly language optimizer developed for the TMS320C6xx VLIW device lets programmers write serial assembly language implementations and then parallelizes them to run efficiently on the device. The use of a virtual register set frees programmers from allocating specific registers.

Still, such tools are no substitute for manual programming. Even when using a high-level language such as C, efficient algorithm implementations may be extremely difficult to make work properly. This is especially true for algorithm-intensive DSP applications. Here, the preferred approach is: Make it run correctly, then make it run fast.
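
One way to put "correct first, fast second" into practice is to keep a plain reference routine and check every optimized variant against it. The sketch below does this for a 16-bit dot product. The function names are made up for the example, and the four-way unrolling is only one possible optimization; whether it actually runs faster depends on the compiler and the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step 1: a plain reference version that is easy to verify by inspection. */
int64_t dot_reference(const int16_t *a, const int16_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Step 2: unrolled by four with independent accumulators, giving a
 * parallel machine more operations it can overlap. */
int64_t dot_unrolled(const int16_t *a, const int16_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    int64_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += (int32_t)a[i]     * b[i];
        acc1 += (int32_t)a[i + 1] * b[i + 1];
        acc2 += (int32_t)a[i + 2] * b[i + 2];
        acc3 += (int32_t)a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)                 /* leftover elements */
        acc0 += (int32_t)a[i] * b[i];

    return acc0 + acc1 + acc2 + acc3;
}

/* Returns 1 when the optimized version matches the reference. */
int dot_versions_agree(const int16_t *a, const int16_t *b, size_t n)
{
    return dot_reference(a, b, n) == dot_unrolled(a, b, n);
}
```

Integer arithmetic makes the comparison exact. With floating-point data, reordering the additions can change rounding, so a tolerance would be needed instead of strict equality.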

Comparing hardware

System designers have a variety of ways to implement digital signal-processing algorithms. ASICs can act as DSP coprocessors but lack flexibility. RISC processors have extremely fast clock speeds and perform well in certain DSP applications, though scalability and other real-time (predictability) issues remain.

FPGAs are also fast and can do certain DSP algorithms but tend to be more difficult to develop code for than DSPs. Host signal processing is an emerging method for executing DSP algorithms on a host PC. Many lower-end, multimedia applications are done this way, though more demanding jobs are better served by full-blown DSPs.

Defining DSP performance

Processor makers typically spec device speed in operations or instructions per second. Such quotes often reflect ideal cases. Actual performance may be much lower. A more important metric is how fast an application and associated algorithms run on a device within a system. In that regard, rely on advertised benchmarks only when they resemble the actual algorithms to be run, especially for higher-performance DSPs with optimizing compilers. Subtle differences in algorithm structure can determine whether a compiler optimizes a particular piece of code, throwing off performance measurements by orders of magnitude.
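
A quick back-of-the-envelope check can show how far an advertised peak rate is from what an application needs. The sketch below estimates the multiply-accumulate rate an FIR filter demands (one MAC per tap per input sample) and the fraction of a processor it would consume. The function names are illustrative, and the efficiency factor that derates peak to sustained throughput is an assumption the designer must supply, ideally from measurements of the real code on the real device.

```c
#include <assert.h>

/* MACs per second an FIR filter needs: one MAC per tap per input sample. */
double fir_macs_per_second(unsigned num_taps, double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return (double)num_taps * sample_rate_hz;
}

/* Fraction of the processor consumed, given the vendor's peak MAC rate
 * and an assumed efficiency in (0, 1] that derates peak to sustained. */
double processor_load(double required_macs, double peak_macs, double efficiency)
{
    assert(peak_macs > 0.0 && efficiency > 0.0 && efficiency <= 1.0);
    return required_macs / (peak_macs * efficiency);
}
```

For example, a 64-tap filter at a 48-kHz sample rate needs about 3.07 million MACs per second. On a hypothetical part quoted at 100 million MACs per second and assumed to sustain half of that, the filter would take roughly 6% of the processor.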