[r-t] Hardware support for permutation operations
Mark Davies
mark at snowtiger.net
Wed Feb 17 08:54:47 UTC 2010
Just been having a look at the finer details of recent processor
architectures. One big change is that chip manufacturers (Intel, AMD and
IBM with the PowerPC) have been beefing up the vector processing
capabilities of their chips. These are generally based on 128-bit
registers, which can be processed in parallel as a series of bytes or
larger words. Basically, they are turning them into old-fashioned
supercomputers (new supercomputers simply being lots of ordinary
microprocessor CPUs clustered together...).
Now, in Intel's SSSE3 revision of SSE3, there is a PSHUFB instruction,
which appears to permute the 16 bytes in one 128-bit register by the 16
index values in another register. That looks like a straight,
one-machine-cycle permutation operation to me. In AMD's SSE5 there is a
"PPERM" instruction which is also described as a permutation function;
how it differs, I don't know.
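
To make the PSHUFB behaviour concrete, here is a rough scalar model of it
in Python. This is only a sketch of the documented semantics (each index
byte selects a source byte by its low 4 bits, and a set high bit zeroes
the lane), not real vector code; in C the actual instruction is reached
through the _mm_shuffle_epi8 intrinsic. The names pshufb, BELLS and
cross_all are made up for illustration, as is the choice of bell symbols
and the idea of leaving lanes above the stage fixed.

```python
def pshufb(src: bytes, idx: bytes) -> bytes:
    """Scalar model of PSHUFB on one 128-bit register (16 byte lanes)."""
    assert len(src) == 16 and len(idx) == 16
    out = bytearray(16)
    for lane in range(16):
        if idx[lane] & 0x80:
            out[lane] = 0                      # high bit set: lane is zeroed
        else:
            out[lane] = src[idx[lane] & 0x0F]  # low 4 bits pick a source byte
    return bytes(out)


# Assumed symbols for 16 bells, one byte per bell (illustrative only).
BELLS = b"1234567890ETABCD"


def cross_all(stage: int) -> bytes:
    """Index vector that swaps adjacent pairs below `stage`, fixing the rest."""
    idx = list(range(16))
    for i in range(0, stage - 1, 2):
        idx[i], idx[i + 1] = idx[i + 1], idx[i]
    return bytes(idx)


# Apply a cross change to rounds on 8 bells: one shuffle, no lookup tables.
row = pshufb(BELLS, cross_all(8))
print(row[:8].decode())  # 21436587
```

Applying the same index vector again takes the row back to rounds, and
feeding one index vector through pshufb with another composes the two
permutations, which is the table-free product being hinted at above.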
In the real world, I'm not sure this makes much difference to us, but in
theory hardware support for permutations means all that tedious
table-building is dispensed with... on 16 bells or fewer, at least!
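(I think IBM and friends may have got there first: the PowerPC "AltiVec"
instruction set has had a permute instruction, vperm, since the G4 days,
well before the Power6. But with the departure of Apple to Intel, that's
not much use for most of us. SSSE3, however, is widely available on
recent Intel chips; SSE5 is AMD's proposal, and as far as I know has not
yet shipped in any processor.)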
MBD