Some compilers allow a predefined set of StarCore assembly patterns (intrinsic functions) to be used inside a C program. For example, the following kernel:
s = 0;
for( i = 0; i < N; i++ )
{
    s += v[i]*v[i];
}
may be rewritten as:
s = 0;
for( i = 0; i < N; i++ )
{
    s = L_mac( s, v[i], v[i] );
}
which the compiler translates into:
loopstart3
L14
[
    mac     d2,d2,d1
    aslw    d0,d2
    move.l  (r0)+,d0
]
loopend3
Unrolled by four, the same loop becomes:
loopstart3
__dco_lu_10:
[
    move.l  (r3)+n1,d6
    mac     d2,d2,d4
    aslw    d0,d0
    move.l  (r0)+n1,d2
]
[
    aslw    d6,d7
    aslw    d2,d6
    move.l  (r0)+n1,d2
    mac     d0,d0,d5
    move.l  (r3)+n1,d0
]
[
    mac     d6,d6,d3
    mac     d7,d7,d1
    aslw    d2,d2
]
loopend3
which processes four points in 3 clocks, against 4 clocks (one point per clock) for the single-execution-set loop above: 25% fewer clocks.