Robert Oshana
Engineering Manager
Texas Instruments Inc.
Dallas, Tex.
Digital signal processors have the distinction of seeing the bulk of their applications in embedded systems. DSP-based devices are rarely called computers; they are more likely to be called cell phones, sonar, radar, process controls, communication systems, or audio synthesizers. But not all DSPs are created equal. Some are optimized for specific signal-processing tasks such as fast Fourier transforms (FFTs). Others, in recent years, have sported general-purpose instructions that let control and signal-processing functions be combined in a single chip. One result: developers now have real choices to make when deciding which programming language would best get them to the goal line. Development tools have also become more robust and better able to expose complicated problems. All this makes a difference when planning systems based around DSPs, so it pays to know a few things about the development options and the kinds of problems each is best at discerning.
First, a few basics. DSPs still excel at crunching complex algorithms that general-purpose processors can't easily handle. Unlike their general-purpose counterparts, DSPs can execute the multiply-accumulate operations at the heart of functions such as filtering and FFTs with single-cycle instructions. DSPs also scale well; a real-time signal-processing job can be shared over multiple processors to speed execution, for example.
There are two basic DSP architectures: Von Neumann and Harvard. Traditional Von Neumann architectures use one interface for both data and program space. (General-purpose computers are also Von Neumann machines to a large degree.) In contrast, the Harvard architecture uses separate buses so that data space and program space can both be accessed in the same cycle. Internal memory stores intermediate variables for quicker access than external memory provides. DSPs designed to support high data throughputs may have one or more communication ports.
A DSP's communication ports are typically managed by an associated direct-memory-access (DMA) controller, which lets data stream in and out of the processor while the CPU crunches data. Real-time applications such as sonar and radar rely on multiprocessing and high-bandwidth I/O. One relatively recent development is the emergence of off-the-shelf real-time operating systems written specifically for DSPs. Some would argue that operating systems have evolved to the point where developing code for multiprocessor DSP jobs is a trivial extension of programming a single processor. Like many general-purpose processors, DSPs come with a standard set of software tools, and third-party enhanced tool suites often wrap the standard tools in an interactive GUI. Simulators and emulators are two common but quite different tools.
Simulators and emulators
So-called in-circuit emulators were once the only option for assessing system performance, and to use them, a design had to be committed to silicon. Modern simulators, in contrast, can simulate entire systems, including processors, on-chip peripherals, system-level peripherals, peripheral hardware devices, the operating system, and application software. Software simulators let engineers develop and integrate software without first having to purchase a DSP and associated hardware. A typical simulator consists of a high-level-language debugger and a DSP simulation engine, which is a software model of the physical DSP device. Typical instruction-level simulators are good for early development work, prototyping, and proof of concept. However, these all-software tools are relatively slow, which makes them a poor fit for simulating large applications, and they ignore hardware timing, which rules them out for performance analysis. Better estimates of timing and system behavior require cycle-accurate simulators or VHDL simulators. These tools model memory-access delays, pipeline stalls, and the other hardware-level timing effects that instruction-level simulators ignore.
Simulation can take place at various levels of abstraction. To abstract means to extract the essential properties while omitting inessential details. Higher levels of abstraction focus on the larger, more important pieces of information and present progressively less of it; lower levels reveal more detail and raise the volume of information. Consequently, there is a trade-off between accuracy and performance when modeling at different levels of abstraction. And regardless of abstraction level, simulators are no substitute for actual hardware. Software developers often claim there's a hardware problem because their software, which ran fine on a simulator, won't work on an emulator. In many cases the software is to blame, because emulators catch many timing-related software problems that instruction-level simulators miss.
Emulators provide access to DSPs and peripherals for debugging and integrating software and hardware. They read and write hardware registers and memory, and they support common functions including breakpoints, single stepping, and benchmarking. Most emulators are both spatially and temporally nonintrusive. Spatially nonintrusive means an emulator needs no additional hardware or software in the target environment; temporally nonintrusive means it lets the processor or system execute at full speed. But shrinking die sizes in DSPs and other ICs have led most chipmakers to omit emulator logic from the chips themselves. A chip-emulator interconnect standard developed by the Joint Test Action Group (JTAG), and formalized as IEEE 1149.1, permits board-level testing but requires some on-chip logic to implement. Emulation tools can also support parallel-processing applications. Here, the scan interconnection is daisy-chained between multiple processors, each controlled by its own emulator, and a multitasking operating system on the host controls each DSP from a separate window.
Some programming issues
Assembly language used to be the de facto programming standard for DSPs, though that's changing.
A general lack of portability, and the difficulty of writing and maintaining assembly language code, have led some industries to limit its use to the portions of code that must run extremely fast, in an attempt to lower life-cycle costs. Parallel devices and VLIW (very long instruction word) architectures make efficient assembly programming even harder. In a VLIW architecture, the compiler packs a number of simple, independent operations into the same instruction word. This makes it nearly impossible to manually pipeline and optimize an algorithm without a cycle-accurate simulator and plenty of trial and error. DSP makers are beginning to recognize the limitations of assembly language and now provide increasingly sophisticated tools and more portable languages. C/C++ compilers and Java are becoming commonplace. For example, an assembly language optimizer developed for the TMS320C6xx VLIW devices lets programmers write serial assembly code, which the tool then parallelizes to run efficiently on the device. Its use of a virtual register set frees programmers from allocating specific registers. Still, such tools are no substitute for manual programming. Even when using a high-level language such as C, efficient algorithm implementations may be extremely difficult to make work properly. This is especially true for algorithm-intensive DSP applications. Here, the preferred approach is: make it run correctly, then make it run fast.