Single Instruction, Multiple Data

One technology that is quite interesting for video games is SIMD, short for Single Instruction, Multiple Data: a single CPU instruction processes several values at once. This is best illustrated by an example. A typical piece of code that every game needs looks roughly like this:

positionX += time * velocityX;
positionY += time * velocityY;
positionZ += time * velocityZ;

The += operator adds the value on the right to the variable on the left. So we have six calculations, three multiplications and three additions, and as a rough approximation we can say they take six units of time. With SIMD, the pseudocode instead looks like this:

position += toVector(time) * velocity

Here, position and velocity are both vectors with several components (typically four: a 128-bit SIMD register holds four 32-bit floats, so for a 3D position the fourth component is just padding). Not counting toVector, which copies the scalar time into every component and can often be avoided, we are down to only two operations. The result is the same, but the three multiplications and the three additions now each happen in parallel.
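A minimal sketch of the two versions side by side, assuming GCC or Clang, whose vector extensions (the vector_size attribute) are compiler-specific and not standard C. The names vec4, integrate_scalar and integrate_simd are hypothetical, chosen only for illustration.

```c
/* GCC/Clang vector extension (not standard C): one value holds four floats
   in a 16-byte vector, and arithmetic on it is applied to all four lanes. */
typedef float vec4 __attribute__((vector_size(16)));

/* Scalar version: three multiplications and three additions. */
void integrate_scalar(float pos[3], const float vel[3], float time) {
    pos[0] += time * vel[0];
    pos[1] += time * vel[1];
    pos[2] += time * vel[2];
}

/* SIMD version: one four-lane multiply and one four-lane add.
   The fourth lane is padding; with a zero velocity in that lane it stays unchanged. */
vec4 integrate_simd(vec4 position, vec4 velocity, float time) {
    vec4 t = {time, time, time, time}; /* the "toVector" step: broadcast time to all lanes */
    return position + t * velocity;
}
```

On x86-64 the compiler should turn the vector arithmetic into SSE instructions, and when building for ARMv7 with NEON enabled the same source can use NEON; on a target without SIMD support the compiler lowers it to ordinary scalar code.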

SIMD implementations go by different names: AltiVec on PowerPC CPUs, the various SSE versions on Intel CPUs and, interestingly, NEON on ARM CPUs. ARM CPUs are less well known than Intel's, but they are used in just about everything, including iPhones and iPod touches. The first two generations of each (up to the iPhone 3G) have a so-called ARMv6 CPU without NEON, but the iPhone 3GS and the third-generation iPod touch both have an ARMv7 “Cortex” CPU, which does include this SIMD extension.

Apple doesn’t acknowledge this anywhere, neither to users nor to developers, but if you follow the instructions on …, it is possible to write NEON code that runs on an iPhone. If you also have a non-NEON version of the code (as I already did for my rail simulator), you can build for both plain ARMv6 and NEON-capable ARMv7 and get, as in the days of the PowerPC-to-Intel transition, a universal binary that runs on both. I tested this with my railroad simulator, and it works.
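A sketch of how the NEON and non-NEON code paths might share one source file, assuming each architecture slice of the universal binary is compiled separately and that the compiler defines __ARM_NEON__ (older GCC) or __ARM_NEON (newer compilers) only when NEON is enabled. The function integrate_positions and the four-floats-per-position layout are hypothetical; vld1q_f32, vmlaq_n_f32 and vst1q_f32 are standard arm_neon.h intrinsics.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: advances n positions stored as x,y,z,w quadruples
   (4 * n floats) by time * velocity. */
void integrate_positions(float *pos, const float *vel, size_t n, float time) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON path (ARMv7 slice): load, multiply-accumulate, store per position. */
    for (size_t i = 0; i < n; i++) {
        float32x4_t p = vld1q_f32(pos + 4 * i);
        float32x4_t v = vld1q_f32(vel + 4 * i);
        p = vmlaq_n_f32(p, v, time); /* p + v * time in each of the four lanes */
        vst1q_f32(pos + 4 * i, p);
    }
#else
    /* Plain C path (ARMv6 slice, or any target without NEON). */
    for (size_t i = 0; i < 4 * n; i++) {
        pos[i] += time * vel[i];
    }
#endif
}
```

Padding each position to four floats lets every load and store fill exactly one 128-bit register, at the cost of one unused float per position.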

Whether I’ll actually use this is a different question, though. It makes the application slightly larger (about 80 kB at the moment, and the difference will grow as I write more code), and I have to test on both kinds of device. Since I don’t plan to make a separate, more demanding version of the game for the iPhone 3GS at the moment, I gain nothing: the 3GS is already much faster than the older devices even without such tricks. Finally, it doesn’t really matter anyway: most of the performance-critical work my game does is drawing, which is of course done by the completely separate GPU. The parts I did optimize with NEON were never important for performance. In the end, it is a nice research project for me, but not truly useful.

Written on October 18th, 2009 at 07:17 pm
