Instruction Inputs Output Description
Swizzle: arbitrarily rearrange the order of input vector components (may cost a clock or two)
ADD a, b, c.yxwz;
Negate: flip the sign of an input (free)
ADD a, b, -c;
Saturate: clamp output values to lie between 0 and 1 (free)
ADD_SAT a, b, c;
Writemask: only change certain components of the output vector (free)
ADD a.xz, b, c;
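To make the four modifiers concrete, here is a minimal CPU-side sketch in Python that mimics what each one does to a 4-component vector. The helper names (`swizzle`, `negate`, `saturate`, `write_masked`, `add`) and the sample values are hypothetical and exist only for illustration; the GPU applies these modifiers in hardware as part of the instruction itself.

```python
# Component names map to positions in a 4-vector.
INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(v, pattern):
    """Rearrange input components; "yxwz" swaps x/y and z/w."""
    return [v[INDEX[ch]] for ch in pattern]

def negate(v):
    """Flip the sign of every component of an input."""
    return [-c for c in v]

def saturate(v):
    """Clamp every component of a result to the range [0, 1]."""
    return [min(1.0, max(0.0, c)) for c in v]

def write_masked(dest, result, mask):
    """Copy only the components named in mask from result into dest."""
    out = list(dest)
    for ch in mask:
        out[INDEX[ch]] = result[INDEX[ch]]
    return out

def add(b, c):
    """Componentwise ADD of two 4-vectors."""
    return [bi + ci for bi, ci in zip(b, c)]

# Sample values (hypothetical), chosen to be exact in binary floating point.
a = [9.0, 9.0, 9.0, 9.0]
b = [0.25, 0.5, 0.75, 1.0]
c = [1.0, 2.0, -3.0, 0.5]

add_swizzled = add(b, swizzle(c, "yxwz"))        # ADD a, b, c.yxwz;
add_negated = add(b, negate(c))                  # ADD a, b, -c;
add_saturated = saturate(add(b, c))              # ADD_SAT a, b, c;
add_masked = write_masked(a, add(b, c), "xz")    # ADD a.xz, b, c;

print(add_swizzled)   # [2.25, 1.5, 1.25, -2.0]
print(add_negated)    # [-0.75, -1.5, 3.75, 0.5]
print(add_saturated)  # [1.0, 1.0, 0.0, 1.0]
print(add_masked)     # [1.25, 9.0, -2.25, 9.0]
```

Note that the writemask leaves the y and w components of `a` at their old values, while saturate is applied to the result and swizzle and negate are applied to an input.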
These numbers were collected using November 2005 drivers on an ATI Radeon 9550, a fairly low-end card.

| Opcode | Time/pixel | Maximum Count | Slow options (Swizzle, saturate, negate, writemask) |
|---|---|---|---|
| ABS | -0.00 ns | 128 times | 1.00 ns Swizzle, 0.46 ns Saturate |
| ADD | 0.47 ns | 64 times | 2.05 ns Swizzle |
| CMP | 0.46 ns | 64 times | 2.21 ns Swizzle |
| COS | 5.34 ns | 6 times | |
| DP3 | 0.46 ns | 64 times | 2.05 ns Swizzle |
| DP4 | 0.46 ns | 64 times | 2.64 ns Swizzle |
| DPH | 0.46 ns | 64 times | 2.67 ns Swizzle |
| DST | 0.49 ns | 62 times | 1.46 ns Saturate, 0.43 ns Writemask |
| EX2 | 0.47 ns | 64 times | |
| FLR | 0.97 ns | 32 times | 2.43 ns Swizzle |
| FRC | 0.46 ns | 64 times | 1.94 ns Swizzle |
| LG2 | 0.46 ns | 64 times | |
| LIT | -0.18 ns | 128 times | 4.37 ns Swizzle, 0.00 ns Negate |
| LRP | 0.46 ns | 64 times | 2.25 ns Swizzle, 0.97 ns Negate |
| MAD | 0.46 ns | 64 times | 2.21 ns Swizzle |
| MAX | 0.46 ns | 64 times | 2.05 ns Swizzle |
| MIN | 0.46 ns | 64 times | 2.05 ns Swizzle |
| MOV | -0.00 ns | 128 times | 0.46 ns Saturate |
| MUL | 0.47 ns | 64 times | 2.05 ns Swizzle |
| POW | 1.46 ns | 21 times | |
| RCP | 0.46 ns | 64 times | |
| RSQ | 0.46 ns | 64 times | |
| SCS | 3.40 ns | 9 times | 4.30 ns Saturate |
| SGE | 0.97 ns | 32 times | 2.72 ns Swizzle |
| SLT | 0.97 ns | 32 times | 2.72 ns Swizzle |
| SIN | 4.37 ns | 7 times | 4.86 ns Saturate |
| SUB | 0.46 ns | 64 times | 2.05 ns Swizzle |
| SWZ | 1.00 ns | 29 times | 1.94 ns Saturate, 0.05 ns Writemask |
| TEX | 0.48 ns | 3 times | 0.97 ns Swizzle |
| TXP | 0.49 ns | 3 times | 0.97 ns Swizzle |
| TXB | 0.49 ns | 3 times | 0.97 ns Swizzle |
| XPD | 0.97 ns | 32 times | 3.88 ns Swizzle |
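As a rough illustration of how the ATI figures above might be used, the Python sketch below adds up per-pixel times for a short instruction sequence. It rests on two assumptions that the measurements do not confirm: that a "slow option" figure is the total time for the instruction when that option is used (not an extra penalty on top of the base time), and that instruction costs simply add. The `ATI_TIMES` dictionary copies only a handful of rows, and `estimate_ns_per_pixel` is a hypothetical helper.

```python
# (base time, {slow option: time with that option}) in ns per pixel,
# copied from a few rows of the Radeon 9550 measurements.
ATI_TIMES = {
    "DP3": (0.46, {"swizzle": 2.05}),
    "RSQ": (0.46, {}),
    "MUL": (0.47, {"swizzle": 2.05}),
    "MOV": (0.00, {"saturate": 0.46}),  # listed as -0.00 ns
}

def estimate_ns_per_pixel(program, times=ATI_TIMES):
    """Sum per-instruction times for a list of (opcode, option) pairs.

    option is None or a modifier name. A modifier that is not listed
    as slow for that opcode is assumed to cost the base time.
    """
    total = 0.0
    for opcode, option in program:
        base, slow = times[opcode]
        total += slow.get(option, base)
    return total

# A normalize-style sequence: dot product, reciprocal square root, scale.
plain = [("DP3", None), ("RSQ", None), ("MUL", None)]
swizzled = [("DP3", None), ("RSQ", None), ("MUL", "swizzle")]

print(f"plain:    {estimate_ns_per_pixel(plain):.2f} ns/pixel")
print(f"swizzled: {estimate_ns_per_pixel(swizzled):.2f} ns/pixel")
```

Under those assumptions, a single swizzled MUL roughly doubles the estimated cost of the three-instruction sequence on this card, which is why the swizzle column is worth checking before rearranging components inside an inner loop.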
These numbers were collected using November 2005 drivers on an nVidia GeForce 6800 (stock), which is a much more expensive card than the ATI card tested above, so don't infer anything from the relative performance.

| Opcode | Time/pixel | Maximum Count | Slow options (Swizzle, saturate, negate, writemask) |
|---|---|---|---|
| ABS | -0.00 ns | 1024 times | 0.07 ns Saturate |
| ADD | 0.00 ns | 1024 times | 0.13 ns Saturate |
| CMP | -0.01 ns | 1024 times | |
| COS | 0.13 ns | 1024 times | |
| DP3 | 0.07 ns | 1024 times | |
| DP4 | 0.07 ns | 1024 times | 0.13 ns Writemask |
| DPH | 0.14 ns | 1024 times | |
| DST | 0.13 ns | 1024 times | 0.02 ns Swizzle |
| EX2 | 0.13 ns | 1024 times | |
| FLR | 0.13 ns | 1024 times | |
| FRC | 0.13 ns | 1024 times | |
| LG2 | 0.13 ns | 1024 times | |
| LIT | 0.39 ns | 1024 times | 0.26 ns Writemask |
| LRP | 0.14 ns | 1024 times | |
| MAD | 0.13 ns | 1024 times | |
| MAX | 0.01 ns | 1024 times | 0.13 ns Swizzle, 0.13 ns Negate |
| MIN | 0.01 ns | 1024 times | 0.13 ns Swizzle, 0.13 ns Negate |
| MOV | 0.00 ns | 1024 times | |
| MUL | 0.00 ns | 1024 times | 0.07 ns Saturate |
| POW | 0.26 ns | 1024 times | |
| RCP | -0.01 ns | 1024 times | 0.13 ns Negate, 0.13 ns Saturate |
| RSQ | 0.25 ns | 1024 times | |
| SCS | 0.13 ns | 1024 times | |
| SGE | 0.13 ns | 1024 times | |
| SLT | 0.13 ns | 1024 times | |
| SIN | 0.13 ns | 1024 times | |
| SUB | -0.00 ns | 1024 times | 0.13 ns Saturate |
| SWZ | -0.00 ns | 1024 times | 0.07 ns Saturate |
| TEX | 0.13 ns | 1024 times | |
| TXP | 0.13 ns | 1024 times | |
| TXB | 0.13 ns | 1024 times | |
| XPD | 0.14 ns | 1024 times | |
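To put a per-pixel time in perspective, the Python sketch below scales it up to a whole frame. The 1024x768 resolution, the 60 frames-per-second budget, and the premise that every screen pixel is shaded exactly once are all assumptions made for illustration; `frame_cost_ms` and `instructions_per_frame` are hypothetical helpers, and the two inputs are the MAD and LIT times from the GeForce 6800 measurements.

```python
import math

def frame_cost_ms(ns_per_pixel, width=1024, height=768):
    """Time in milliseconds to run one instruction on every pixel once."""
    return ns_per_pixel * width * height / 1e6

def instructions_per_frame(ns_per_pixel, fps=60, width=1024, height=768):
    """How many such instructions fit in one frame at the given rate."""
    budget_ms = 1000.0 / fps
    return math.floor(budget_ms / frame_cost_ms(ns_per_pixel, width, height))

mad_ms = frame_cost_ms(0.13)   # MAD on the GeForce 6800
lit_ms = frame_cost_ms(0.39)   # LIT on the GeForce 6800

print(f"MAD: {mad_ms:.3f} ms per frame")
print(f"LIT: {lit_ms:.3f} ms per frame")
print(f"MADs per 60 Hz frame: {instructions_per_frame(0.13)}")
```

With these assumed settings a 0.13 ns instruction costs about a tenth of a millisecond per full-screen pass, so a fragment program of well over a hundred such instructions would still fit within a 60 Hz frame, ignoring overdraw, texture bandwidth, and everything else the GPU has to do.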