...but the CPU was still pegged.

If you're letting it run at full speed and sending the output to /dev/null as a speed test, the CPU will always be pegged. But if they get 30% CPU usage during real-time playback, the relationship is (decode time as a percentage of real-time playback) x (average CPU usage during decode, roughly 100% when pegged) = 30%. So a full-speed decode should be finishing in about 0:50. Even so, with your -O9 finishing in 2:09, that's 78% (a pretty good start).
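
Here is a rough sketch of that arithmetic in Python, in case anyone wants to plug in their own timings. The track length is not stated in this post; the 2:45 below is an assumption I back-calculated from the 0:50 and 78% figures, and the function names are just made up for illustration.

```python
def mmss_to_seconds(text):
    """Convert a 'M:SS' string such as '2:09' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def realtime_cpu_percent(decode_seconds, track_seconds, cpu_during_decode=100.0):
    """Estimate CPU usage at real-time playback from a full-speed decode time.

    Assumes the CPU was pegged (about 100%) for the whole full-speed decode.
    """
    return decode_seconds / track_seconds * cpu_during_decode


# ASSUMPTION: track length inferred from the numbers in the post, not stated.
TRACK_LENGTH = "2:45"

track = mmss_to_seconds(TRACK_LENGTH)
fast = realtime_cpu_percent(mmss_to_seconds("0:50"), track)  # the 30% target
slow = realtime_cpu_percent(mmss_to_seconds("2:09"), track)  # the -O9 timing

print(f"0:50 decode -> about {fast:.0f}% CPU at real time")
print(f"2:09 decode -> about {slow:.0f}% CPU at real time")
```

Swap in your own track length and decode time to see how far you are from the 30% mark.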
_________________________
--The Amigo