Recently I've been teaching myself PowerPC assembly by porting JONESFORTH to PowerPC on Mac OS X. It struck me to run the same little Fibonacci-sequence microbenchmark that I ran lo these many years past. The results were interesting:
| Language | Implementation detail | Time per (fib 29) call (ms) | Ops/s | Ratio (opt. C) | Ratio (unopt. C) |
|---|---|---|---|---|---|
| FORTH | JONESFORTH ported to PPC | 277 | 81096000 | 4.95 | 2.37 |
The hand-coded assembly beats all the other entrants (perhaps unsurprisingly). The naive indirect-threaded FORTH is the fastest interpreted language, merely 5 times slower than fully optimised C.
Here's the JONESFORTH code:
: FIB DUP 2 >= IF 1- DUP RECURSE SWAP 1- RECURSE + ELSE DROP 1 THEN ;
and here's the PPC assembly (arg and result in
_SFIB:
        cmpwi   r3,2
        bge     1f
        li      r3,1
        blr
1:
        mflr    r0
        stw     r0,-4(r1)
        addi    r3,r3,-1
        stwu    r3,-8(r1)
        bl      _SFIB
        lwz     r4,0(r1)
        stw     r3,0(r1)
        addi    r3,r4,-1
        bl      _SFIB
        lwz     r4,0(r1)
        add     r3,r3,r4
        lwz     r0,4(r1)
        addi    r1,r1,8
        mtlr    r0
        blr