Some compilers allow a predefined set of StarCore assembly patterns (intrinsics) to be used inside a C program. For example, the following kernel:

    s = 0;
    for( i = 0; i < N; i++ )
     {
      s += v[i]*v[i];
     }

may be rewritten as:

    s = 0;
    for( i = 0; i < N; i++ )
     {
      s = L_mac( s, v[i], v[i] );
     }
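
For readers without the StarCore toolchain at hand, the behaviour of `L_mac` can be modelled in portable C. This is only a minimal sketch: it assumes the intrinsic follows the ETSI/ITU-T basic-operator definition (a fractional multiply, i.e. the 16x16 product doubled, followed by a saturating 32-bit add). The names `ref_L_add`, `ref_L_mult`, `ref_L_mac` and `ref_energy` are hypothetical and are not part of any compiler header.

```c
#include <stdint.h>

/* Hypothetical reference model of L_mac; NOT the compiler's own header.
   Assumes ETSI/ITU-T basic-operator semantics. */

/* Saturating 32-bit addition. */
static int32_t ref_L_add(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Fractional multiply: 2*a*b. Only (-32768)*(-32768) can overflow,
   in which case the result saturates to INT32_MAX. */
static int32_t ref_L_mult(int16_t a, int16_t b)
{
    int64_t prod = 2 * (int64_t)a * (int64_t)b;
    if (prod > INT32_MAX) return INT32_MAX;
    return (int32_t)prod;
}

/* Multiply-accumulate: acc + 2*a*b, with saturation. */
static int32_t ref_L_mac(int32_t acc, int16_t a, int16_t b)
{
    return ref_L_add(acc, ref_L_mult(a, b));
}

/* The kernel from the text, written against the reference model. */
static int32_t ref_energy(const int16_t *v, int n)
{
    int32_t s = 0;
    for (int i = 0; i < n; i++)
        s = ref_L_mac(s, v[i], v[i]);
    return s;
}
```

Under these assumed semantics the intrinsic version is not bit-identical to the plain `s += v[i]*v[i]` loop: the product is doubled and the accumulation saturates instead of wrapping.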

which the compiler translates into:

    loopstart3
L14
 [
    mac      d2,d2,d1
    aslw     d0,d2
    move.l   (r0)+,d0
 ]
    loopend3

Although this code is quite efficient (it runs at 1 clock/point), sco optimizes it into:

       loopstart3
__dco_lu_10:
       [
       move.l (r3)+n1,d6
       mac d2,d2,d4
       aslw d0,d0
       move.l (r0)+n1,d2
       ]
       [
       aslw d6,d7
       aslw d2,d6
       move.l (r0)+n1,d2
       mac d0,d0,d5
       move.l (r3)+n1,d0
       ]
       [
       mac d6,d6,d3
       mac d7,d7,d1
       aslw d2,d2
       ]
       loopend3

which processes four points in 3 clocks (0.75 clock/point), a 25% improvement.
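
The four `mac` instructions in the optimized loop write to four different registers (d4, d5, d3, d1), which suggests that the accumulation has been split into independent partial sums. The same idea can be sketched at the C level. This is an illustration under stated assumptions, not the optimizer's actual transformation: the helper name `energy_unrolled4` is hypothetical, `n` is assumed to be a multiple of four, and plain 64-bit arithmetic is used instead of the saturating intrinsic, because splitting a saturating accumulation can change the result whenever an intermediate sum saturates.

```c
#include <stdint.h>

/* Hypothetical helper: sum of squares with four partial sums.
   Assumes n is a multiple of 4. Uses non-saturating 64-bit
   arithmetic, so it is NOT a drop-in replacement for L_mac. */
static int64_t energy_unrolled4(const int16_t *v, int n)
{
    int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;

    for (int i = 0; i < n; i += 4) {
        /* Four independent multiply-accumulates per iteration;
           none depends on another, so they can be scheduled in parallel. */
        s0 += (int64_t)v[i]     * v[i];
        s1 += (int64_t)v[i + 1] * v[i + 1];
        s2 += (int64_t)v[i + 2] * v[i + 2];
        s3 += (int64_t)v[i + 3] * v[i + 3];
    }

    /* Combine the partial sums once, after the loop. */
    return (s0 + s1) + (s2 + s3);
}
```

Removing the dependency between successive accumulations is what allows several multiply-accumulates to be packed into the same execution set.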