[r-t] Hardware support for permutation operations

Mark Davies mark at snowtiger.net
Wed Feb 17 08:54:47 UTC 2010

Just been having a look at the finer details of recent processor 
architectures. One big change is that chip manufacturers (Intel, AMD and 
IBM with the PowerPC) have been beefing up the vector processing 
capabilities of their chips. These are generally based on 128-bit 
registers, which can be processed in parallel as a series of bytes or 
larger words. Basically, they are turning them into old-fashioned 
supercomputers (new supercomputers simply being lots of ordinary 
microprocessor CPUs clustered together...).

Now, in Intel's SSSE3 revision of SSE3, there is a PSHUFB instruction, 
which appears to permute the 16 bytes in one 128-bit register by the 16 
index values in another register. That looks like a straight, 
one-machine-cycle permutation operation to me. In SSE5 there is a 
"PPERM" instruction which is also described as a permutation function; 
how it differs, I don't know.
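
To make the PSHUFB behaviour concrete, here is a minimal scalar sketch in Python of what the 128-bit form computes: each output byte is the source byte selected by the low four bits of the matching index byte, or zero if the top bit of the index byte is set. The function name pshufb and the example vectors are purely illustrative, and this models the result only, not the timing.

```python
def pshufb(src: bytes, idx: bytes) -> bytes:
    """Scalar model of a 128-bit PSHUFB.

    out[i] = 0                   if bit 7 of idx[i] is set
    out[i] = src[idx[i] & 0x0F]  otherwise
    """
    if len(src) != 16 or len(idx) != 16:
        raise ValueError("both operands must be exactly 16 bytes")
    return bytes(0 if i & 0x80 else src[i & 0x0F] for i in idx)


# Rounds on 16 bells, one bell number (1..16) per byte lane.
rounds = bytes(range(1, 17))

# Illustrative index vector: swap every adjacent pair of lanes,
# i.e. a cross change on 16 bells.
cross = bytes(i ^ 1 for i in range(16))

row = pshufb(rounds, cross)
```

Applying the same index vector twice brings the row back to rounds, as you would expect of a cross change.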

In the real world, I'm not sure this makes much difference to us, but in 
theory hardware support for permutations means all that tedious 
table-building is dispensed with... on 16 bells or fewer at least!
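
As a rough sketch of how this might look for fewer than 16 bells, the Python below pads a change with identity lanes so it fills all 16 byte positions, then applies it with a single shuffle per row. The helper names (to_index_vector, shuffle) are my own assumptions for illustration, and the shuffle is a plain-Python stand-in for what one hardware byte shuffle would compute.

```python
LANES = 16  # byte lanes in one 128-bit register


def to_index_vector(change):
    """Pad a change on n <= 16 bells (0-based places) with identity lanes."""
    n = len(change)
    if n > LANES or sorted(change) != list(range(n)):
        raise ValueError("need a permutation of 0..n-1 with n <= 16")
    return list(change) + list(range(n, LANES))


def shuffle(src, idx):
    """What one byte shuffle computes: out[i] = src[idx[i]]."""
    return [src[i] for i in idx]


# Plain hunt on 4 bells: alternate the changes x (both pairs swap)
# and 14 (places made in 1 and 4).
x = to_index_vector([1, 0, 3, 2])
p14 = to_index_vector([0, 2, 1, 3])

row = list(range(1, LANES + 1))  # rounds; lanes 5..16 are unused padding
rows = []
for step in range(8):
    row = shuffle(row, x if step % 2 == 0 else p14)
    rows.append(row[:4])
```

The unused lanes map to themselves, so they simply ride along untouched while the first four lanes run through the plain course.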

(I think IBM may have got there first with their "AltiVec" instruction 
set, which has been around on the PowerPC since the G4 and also contains 
permutation support, but with the departure of Apple to Intel, that's 
not much use for most of us. SSSE3 is however widely available on recent 
Intel chips; SSE5 is AMD's proposal and, as far as I know, has not yet 
shipped in hardware.)

