1 min read · Oct 14, 2018
Thanks, Andy!
Precalculating (1 << i) is indeed a good approach, although once the loop is unrolled the shifts become constants anyway (the compiler should evaluate each expression at compile time and replace it with the result). But profiling is king here: the best way to know is to try it and measure the execution time. Sometimes things like CPU branch prediction can really surprise you.
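Here is a rough sketch of how such a measurement might look. It is not the code from the article: the function names, the 8-bit loop and the test value 0xA5 are all made up for illustration. It uses the standard `timeit` module from desktop Python; on a board you would probably time the loop with `time.ticks_us()` instead, and the numbers there could look quite different.

```python
import timeit

# Hypothetical example: test each bit of a byte, LSB first,
# the way a bit-banging loop might.
MASKS = tuple(1 << i for i in range(8))  # precalculated once, up front


def bits_shift(value):
    # Computes (1 << i) again on every iteration.
    return [1 if value & (1 << i) else 0 for i in range(8)]


def bits_masks(value):
    # Looks up the precalculated masks instead of shifting.
    return [1 if value & m else 0 for m in MASKS]


def measure(func, repeat=5, number=20000):
    # Take the best of several runs to reduce timing noise.
    return min(timeit.repeat(lambda: func(0xA5), repeat=repeat, number=number))


if __name__ == "__main__":
    # Both variants must agree before the timings mean anything.
    assert bits_shift(0xA5) == bits_masks(0xA5)
    print("shift:", measure(bits_shift))
    print("masks:", measure(bits_masks))
```

Which variant wins depends on the interpreter and the hardware, so treat the printed numbers as a starting point rather than a verdict.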
P.S. I just found out that a developer from the CP community did even more profiling this weekend, comparing the various optimization decorators (native, viper and asm_thumb).