As part of its optimization package, Dalsoft provides ports of the GNU compilers to the Digital Signal Processor or CPU of your choice (referred to as the target). Please click here for some non-technical details, and/or contact us if you would like to discuss retargeting the GNU compilers to a processor of your choice.

Introduction

The GNU Compiler Collection (gcc) offers several compiler front ends (C, C++, Objective C, Fortran, Java) and the means for retargeting the compiler to a new architecture. The advantages of using gcc are too numerous to list here; however, the execution speed of the code it generates is not one of them. Numerous studies (for example, this one) have shown that code generated by gcc does not match the speed achieved by other compilers for the same DSP/CPU target. We believe this is due to the nature of the technology available for retargeting (e.g. machine descriptions, target description macros), which we found inherently restrictive: it does not allow the tight interface between compiler and hardware architecture that is necessary to achieve high utilization of the available resources. To address this problem, we at Dalsoft took another approach: we use the superb front ends supported by gcc and fully utilize the vast number of machine-independent (middle-end) optimizations it provides, but leave the final code optimization to our optimizer.
Our study shows that this approach produces code on par with, or better than, code generated by compilers specially crafted for a particular target. See this for the results of another study comparing code generated by the gcc compiler, code generated by the icc compiler (Intel's compiler specifically designed for x86), and gcc-generated code optimized by dco.

General Overview

From the user's point of view, our port looks exactly like most other ports of gcc. We offer the legendary gcc compiler driver and enable all of its options. The compiler is invoked and, if no compilation errors are detected, produces assembly code for the compiled program. This assembly code may be processed by gcc to generate an object file. However, to take full advantage of the options and features provided by the target processor, the next stage, which is optional, is to process the compiler-generated assembly code with our optimizer (dco). Note that dco should be used only if extra optimizations are desired.

The gcc – dco package is designed and implemented so as to:
  • take full advantage of the functionality provided by gcc
    We use the front ends supported by gcc and provide a machine description and target-specific macros that enable full and efficient use of that functionality (e.g. the vast number of machine-independent (middle-end) optimizations, register allocation, etc.).
  • provide tight integration between gcc and dco
    Compiler code generation is done in such a way as to enable dco to perform its work in the most efficient and extensive manner.

Retargets

At this time, we have successfully retargeted the gcc package to the StarCore DSP from Freescale.

Retargeting for StarCore

Currently we offer only the StarCore version of the C compiler - click here to get a copy.
StarCore is a powerful and complicated Digital Signal Processor. It provides a great number of options and superb functionality, but it also imposes severe restrictions on what makes a code sequence valid and/or optimal. We feel that generating correct and optimal StarCore code using only gcc is as difficult as recreating the smile of the Mona Lisa using spray paint. Integration with our StarCore optimizer (sco) solved the problem.

We evaluated our work using the POWERSTONE benchmark suite.

Assembly code was generated by gcc run with maximum optimizations (-O3). The optimizer was used with its default set of options; no special optimizations (e.g. vectorization, loop unrolling) were attempted. The results were verified, and performance was compared to that achieved by the Metrowerks compiler (Motorola's compiler specifically designed for StarCore) run with maximum optimizations (-O3).

The following table lists the execution time (first number) and the total object size (second number) observed during our study. For example, 13019 / 19424 means that the code ran for 13019 clocks and the total size of the object file (text+data) was 19424 bytes. Using sco with special optimization options may produce even better results.

Powerstone benchmarks (clocks / bytes)

Benchmark    gcc+sco           Metrowerks
compress     11704 / 18962     13019 / 19424
engine       78876 / 1877      86671 / 3184
eval2        437 / 2357        916 / 2960
jpeg         283167 / 13980    334757 / 13846
ucbqsort     339717 / 2540     363775 / 3096
v42bis       22005 / 41469     88295 / 36858
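
To make the comparison easier to read, the figures in the table can be turned into ratios. The following is a minimal sketch, not part of the original study: the names RESULTS and compare are chosen here for illustration, and the only data used are the numbers copied from the table. A ratio above 1 means the gcc+sco result is faster (clocks) or smaller (bytes) than the Metrowerks one.

```python
# Figures copied from the Powerstone table above: (clocks, bytes) per toolchain.
# The dictionary layout and the function name are illustrative assumptions.
RESULTS = {
    "compress": {"gcc+sco": (11704, 18962), "metrowerks": (13019, 19424)},
    "engine":   {"gcc+sco": (78876, 1877),  "metrowerks": (86671, 3184)},
    "eval2":    {"gcc+sco": (437, 2357),    "metrowerks": (916, 2960)},
    "jpeg":     {"gcc+sco": (283167, 13980), "metrowerks": (334757, 13846)},
    "ucbqsort": {"gcc+sco": (339717, 2540), "metrowerks": (363775, 3096)},
    "v42bis":   {"gcc+sco": (22005, 41469), "metrowerks": (88295, 36858)},
}


def compare(results):
    """Return {benchmark: (clock_ratio, size_ratio)}, Metrowerks divided by gcc+sco."""
    ratios = {}
    for name, row in results.items():
        sco_clocks, sco_bytes = row["gcc+sco"]
        mw_clocks, mw_bytes = row["metrowerks"]
        ratios[name] = (mw_clocks / sco_clocks, mw_bytes / sco_bytes)
    return ratios


if __name__ == "__main__":
    for name, (clock_ratio, size_ratio) in compare(RESULTS).items():
        print(f"{name:10s} clocks x{clock_ratio:.2f}  size x{size_ratio:.2f}")
```

On these figures the clock ratio is above 1 for all six benchmarks, while the size ratio falls below 1 for jpeg and v42bis, where the gcc+sco object file is larger.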