[r-t] Hardware support for permutation operations

Quentin Armitage Quentin at Armitage.org.uk
Thu Feb 18 18:16:54 UTC 2010

On Wed, 2010-02-17 at 08:54 +0000, Mark Davies wrote:

> Now, in Intel's SSSE3 revision of SSE3, there is a PSHUFB instruction, 
> which appears to permute the 16 bytes in one 128-bit register by the 16 
> index values in another register. That looks like a straight, 
> one-machine-cycle permutation operation to me. In SSE5 there is a 
> "PPERM" instruction which is also described as a permutation function; 
> how it differs, I don't know.
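
To make the PSHUFB behaviour concrete, here is a rough scalar model in
Python. It is only a sketch of the semantics, not real SIMD code: the
function name pshufb and the list-of-ints representation are my own
choices for illustration, and the example change (swapping every
adjacent pair on 16 bells) is hypothetical. Each index byte selects one
source byte using its low 4 bits, and an index byte with its top bit set
produces zero.

```python
def pshufb(data, index):
    """Scalar model of PSHUFB on two 16-byte values (lists of ints)."""
    assert len(data) == 16 and len(index) == 16
    out = []
    for sel in index:
        if sel & 0x80:
            # Top bit of the index byte set: the result byte is zeroed.
            out.append(0)
        else:
            # Otherwise the low 4 bits pick one of the 16 source bytes.
            out.append(data[sel & 0x0F])
    return out


# Rounds on 16 bells, with bells numbered 0..15 for convenience.
rounds = list(range(16))

# Hypothetical change: swap every adjacent pair of bells.
cross = [i ^ 1 for i in range(16)]

row = pshufb(rounds, cross)
```

Applying the same change to row a second time gets back to rounds, as
you would expect for a set of disjoint swaps.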

AMD seem to have dropped SSE5, and are replacing it with XOP, with PPERM
being replaced by VPPERM; the operation of the instruction seems to be
the same, but there is a slight restriction on the addressing modes of
the inputs. VPPERM takes 32 bytes (two 128-bit registers, or one register
and one 128-bit memory reference) as input, selects a subset of them
according to the 16 index values (and opcodes) in another 128-bit register
or memory location, and produces 16 bytes (a 128-bit register) as output. Two VPPERM
instructions with the same input would therefore support up to 32 bells.
This technique would not work with PSHUFB since it overwrites the input
register with the result. The forthcoming AVX extensions from Intel will
allow separate input and output, but do not extend it to 256-bit
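
The two-instruction idea for 32 bells can be sketched with a scalar
model in Python. This is an illustration under assumptions rather than
real XOP code: the names permute32 and permute_row32 are made up, the
model only covers the plain "copy the selected byte" case (it ignores
the opcode bits in each selector byte), and the order in which the two
source registers are concatenated is assumed, not taken from the AMD
documentation. The point it shows is that both steps read the same
unmodified 32 input bytes, which is why an instruction that overwrites
its input is no use here.

```python
def permute32(lo, hi, selectors):
    """Model one VPPERM-style step: 16 output bytes chosen from 32 inputs.

    lo and hi are the two 16-byte sources; each selector's low 5 bits
    index into their concatenation (ordering assumed for illustration).
    """
    assert len(lo) == 16 and len(hi) == 16 and len(selectors) == 16
    source = lo + hi
    return [source[s & 0x1F] for s in selectors]


def permute_row32(row, perm):
    """Permute a 32-bell row using two steps over the same inputs."""
    lo, hi = row[:16], row[16:]
    # Both steps read the original lo and hi; neither is overwritten.
    first_half = permute32(lo, hi, perm[:16])
    second_half = permute32(lo, hi, perm[16:])
    return first_half + second_half


# Rounds on 32 bells, numbered 0..31, and a hypothetical permutation
# that reverses the whole row.
rounds32 = list(range(32))
reverse = [31 - i for i in range(32)]

reversed_row = permute_row32(rounds32, reverse)
```

Reversing twice returns to rounds, which is a quick sanity check on the
model.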

> (I think IBM may have got there first with the Power6 and their 
> "AltiVec" instruction set, which also contains permutation support, but 
> with the departure of Apple to Intel, that's not much use for most of 
> us. SSSE3 and SSE5 are however widely available.)

Have any SSE5 processors been produced? From what I can see the first
XOP processors are due to start manufacture in 2011.

Quentin Armitage