Some compilers allow a predefined set of StarCore assembly patterns to be used inside a C program. For example, the following kernel:

    s = 0;
    for( i = 0; i < N; i++ )
     {
      s += v[i]*v[i];
     }

may be rewritten as:

    s = 0;
    for( i = 0; i < N; i++ )
     {
      s = L_mac( s, v[i], v[i] );
     }

which the compiler translates into:

    loopstart3
L14
    [
    mac      d2,d2,d1
    aslw     d0,d2
    move.l   (r0)+,d0
    ]
    loopend3

Although this code is quite efficient (it runs at 1 clock/point), sco optimizes it into:

       loopstart3
__dco_lu_10:
       [
       move.l (r3)+n1,d6
       mac d2,d2,d4
       aslw d0,d0
       move.l (r0)+n1,d2
       ]
       [
       aslw d6,d7
       aslw d2,d6
       move.l (r0)+n1,d2
       mac d0,d0,d5
       move.l (r3)+n1,d0
       ]
       [
       mac d6,d6,d3
       mac d7,d7,d1
       aslw d2,d2
       ]
       loopend3

which processes four points in 3 clocks, a 25% improvement.
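
To try the L_mac form of the kernel on a host machine without a StarCore toolchain, a portable stand-in for the intrinsic can help. The sketch below assumes L_mac follows the ETSI-style basic-operator semantics (a fractional multiply that doubles the 16x16 product, then a saturating 32-bit accumulate); check the compiler's own documentation before relying on that. The names sat32, ref_L_mac and sum_of_squares are hypothetical and are not part of any StarCore header.

```c
#include <stdint.h>

/* Saturate a 64-bit intermediate result to the signed 32-bit range. */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Hypothetical reference model of L_mac, ASSUMING ETSI-style semantics:
   acc + saturate(2 * a * b), with the final sum saturated as well. */
static int32_t ref_L_mac(int32_t acc, int16_t a, int16_t b)
{
    int32_t prod = sat32(2 * (int64_t)a * (int64_t)b);
    return sat32((int64_t)acc + prod);
}

/* The kernel from the text, written against the reference model. */
static int32_t sum_of_squares(const int16_t *v, int n)
{
    int32_t s = 0;
    for (int i = 0; i < n; i++) {
        s = ref_L_mac(s, v[i], v[i]);
    }
    return s;
}
```

Under this assumption the intrinsic version is not bit-identical to the plain `s += v[i]*v[i]` loop: each product is doubled and the accumulator saturates instead of wrapping. For v = { 1, 2, 3, 4 }, sum_of_squares returns 60 rather than 30.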