The SSE3 instruction set adds about a dozen instructions (intrinsics are in the pmmintrin.h header). The main operation these instructions provide is the ability to do “horizontal” adds and subtracts (ARM-NEON refers to these as ‘pairwise’ operations) for float4 and double2 data.

 Result = _mm_hadd_ps(V1,V2);
->
 Result[0] = V1[0] + V1[1];
 Result[1] = V1[2] + V1[3];
 Result[2] = V2[0] + V2[1];
 Result[3] = V2[2] + V2[3];

There are variants that use different signs for the two values (for example, _mm_hsub_ps subtracts the second value of each pair from the first), but otherwise they are basically the same.
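To make the lane pairing concrete, here is a small scalar sketch of the horizontal add alongside two of the sign variants. The `Float4` alias and the `*Model` function names are made up for illustration; they only mirror what the `_mm_hadd_ps`, `_mm_hsub_ps`, and `_mm_addsub_ps` intrinsics compute and are not part of any library. Note that `_mm_addsub_ps` is a related SSE3 instruction that mixes signs lane by lane rather than working on pairs.

```cpp
#include <array>

using Float4 = std::array<float, 4>;  // illustrative stand-in for a 4-float vector

// Scalar model of _mm_hadd_ps: sums of adjacent pairs, a's pairs first.
Float4 HAddModel(const Float4& a, const Float4& b)
{
    return Float4{{ a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] }};
}

// Scalar model of _mm_hsub_ps: same pairing, even lane minus odd lane.
Float4 HSubModel(const Float4& a, const Float4& b)
{
    return Float4{{ a[0] - a[1], a[2] - a[3], b[0] - b[1], b[2] - b[3] }};
}

// Scalar model of _mm_addsub_ps: lane-wise, subtract in even lanes, add in odd lanes.
Float4 AddSubModel(const Float4& a, const Float4& b)
{
    return Float4{{ a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3] }};
}
```

A scalar model like this can also serve as a reference when unit-testing code that uses the real intrinsics.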

The majority of the DirectXMath library is designed around avoiding the need for these operations, but they are useful for dot-product operations (VMX128 on the Xbox 360 had a specific instruction for doing dot-products across a vector, but not a general pairwise add).

The existing SSE/SSE2 dot-product for float4:

 inline XMVECTOR XMVector4Dot(FXMVECTOR V1, FXMVECTOR V2)
 {
     XMVECTOR vTemp2 = V2;
     XMVECTOR vTemp = _mm_mul_ps(V1,vTemp2);                      // products [p0,p1,p2,p3]
     vTemp2 = _mm_shuffle_ps(vTemp2,vTemp,_MM_SHUFFLE(1,0,0,0));  // upper two lanes: [p0,p1]
     vTemp2 = _mm_add_ps(vTemp2,vTemp);                           // upper two lanes: [p0+p2,p1+p3]
     vTemp = _mm_shuffle_ps(vTemp,vTemp2,_MM_SHUFFLE(0,3,0,0));   // lane 2: p1+p3
     vTemp = _mm_add_ps(vTemp,vTemp2);                            // lane 2: p0+p1+p2+p3
     return XM_PERMUTE_PS(vTemp,_MM_SHUFFLE(2,2,2,2));            // splat lane 2 to all lanes
 }

can be rewritten using SSE3 as:

 inline XMVECTOR XMVector4Dot(FXMVECTOR V1, FXMVECTOR V2)
 {
     XMVECTOR vTemp = _mm_mul_ps(V1,V2);      // products [p0,p1,p2,p3]
     vTemp = _mm_hadd_ps( vTemp, vTemp );     // [p0+p1,p2+p3,p0+p1,p2+p3]
     return _mm_hadd_ps( vTemp, vTemp );      // full sum in every lane
 }

This version has the same number of multiply/add operations, but there are three fewer shuffles required. As we’ll see in a future installment, there are actually some better options than this in SSE 4.1.
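For experimenting with the two-step horizontal add outside of DirectXMath, a standalone sketch along the following lines should work. The function name `Dot4Sse3` and the `SKETCH_SSE3` macro are hypothetical; the macro is only there because GCC and Clang require SSE3 to be enabled for the calling function (or globally with -msse3), while MSVC accepts the intrinsics without a switch.

```cpp
#include <pmmintrin.h>  // SSE3 intrinsics (also pulls in the SSE/SSE2 headers)

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_SSE3 __attribute__((target("sse3")))
#else
#define SKETCH_SSE3
#endif

// Hypothetical helper: 4-component dot product of two unaligned float[4] arrays.
SKETCH_SSE3 float Dot4Sse3(const float* a, const float* b)
{
    __m128 p = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)); // [p0, p1, p2, p3]
    p = _mm_hadd_ps(p, p);                                   // [p0+p1, p2+p3, p0+p1, p2+p3]
    p = _mm_hadd_ps(p, p);                                   // every lane: p0+p1+p2+p3
    return _mm_cvtss_f32(p);                                 // read back lane 0
}
```

Returning a scalar is a choice made for this sketch only; the library version keeps the result replicated across the vector so it can feed further vector math without leaving the SIMD registers.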

There are also two new instructions which can be used as a special-case substitute for the XMVectorSwizzle<> template. We’ll make use of these in a future installment.

`XMVectorSwizzle<0,0,2,2>(V)`	`_mm_moveldup_ps(V)`
`XMVectorSwizzle<1,1,3,3>(V)`	`_mm_movehdup_ps(V)`
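As a quick way to see what these two instructions produce, the sketch below applies them to a plain float array. The helper name `DupEvenOdd` and the `SKETCH_SSE3` macro are assumptions for this example (the macro enables SSE3 per function on GCC/Clang; MSVC needs nothing extra).

```cpp
#include <pmmintrin.h>  // SSE3 intrinsics

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_SSE3 __attribute__((target("sse3")))
#else
#define SKETCH_SSE3
#endif

// Hypothetical helper: writes the even-lane and odd-lane duplicates of v[0..3].
SKETCH_SSE3 void DupEvenOdd(const float* v, float* evenOut, float* oddOut)
{
    const __m128 V = _mm_loadu_ps(v);
    _mm_storeu_ps(evenOut, _mm_moveldup_ps(V)); // [v0, v0, v2, v2]
    _mm_storeu_ps(oddOut,  _mm_movehdup_ps(V)); // [v1, v1, v3, v3]
}
```

The same results can be obtained on plain SSE with `_mm_shuffle_ps(V, V, _MM_SHUFFLE(2,2,0,0))` and `_mm_shuffle_ps(V, V, _MM_SHUFFLE(3,3,1,1))`, which is why these are a special-case substitute rather than new functionality.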

The Supplemental SSE3 (SSSE3) instruction set adds the equivalent “horizontal” adds and subtracts for various integer vectors, so they are not particularly useful for DirectXMath. These intrinsics are located in the tmmintrin.h header. There are also some other useful integer operations that make life simpler for implementing algorithms like Fast Block Compress, codecs, or other image processing on integer data which are a bit out of scope for DirectXMath.

There is one SSSE3 intrinsic of interest for DirectXMath: _mm_shuffle_epi8. The purpose of this instruction is to be able to rearrange the bytes in a vector, which makes it an excellent function for doing vector-based Big-Endian/Little-Endian swaps without having to ‘spill’ the vector to memory and reload it.

 inline XMVECTOR XMVectorEndian( FXMVECTOR V )
 {
     // Each 32-bit constant lists source byte indices; stored little-endian, they reverse the bytes of each lane.
     static const XMVECTORU32 idx = { 0x00010203, 0x04050607, 0x08090A0B, 0x0C0D0E0F };
     __m128i Result = _mm_shuffle_epi8( _mm_castps_si128(V), idx );
     return _mm_castsi128_ps( Result );
 }

There’s not enough use for this kind of operation to make this function part of the library (Windows x86, Windows x64, and Windows RT are all Little-Endian platforms), but it can be useful for some cross-platform tools processing (Xbox 360 is Big-Endian).
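For tools code that does not use DirectXMath types, the same byte shuffle can be sketched directly on an array of 32-bit values. The name `ByteSwap4x32` and the `SKETCH_SSSE3` macro are hypothetical; the control mask is written byte by byte with `_mm_setr_epi8`, which spells out the same index pattern as the `idx` constant in XMVectorEndian.

```cpp
#include <cstdint>
#include <tmmintrin.h>  // SSSE3 intrinsics

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_SSSE3 __attribute__((target("ssse3")))
#else
#define SKETCH_SSSE3
#endif

// Hypothetical helper: reverses the byte order of each of four 32-bit values in place.
SKETCH_SSSE3 void ByteSwap4x32(std::uint32_t* values)
{
    // Control byte i names the source byte that lands in destination byte i,
    // so each group of four indices is simply reversed.
    const __m128i idx = _mm_setr_epi8(3, 2, 1, 0,  7, 6, 5, 4,
                                      11, 10, 9, 8,  15, 14, 13, 12);
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(values));
    v = _mm_shuffle_epi8(v, idx);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(values), v);
}
```

Applying the function twice restores the original data, which makes for an easy sanity check.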

Processor Support

SSE3 is supported by Intel Pentium 4 processors (“Prescott”), AMD Athlon 64 (“revision E”), AMD Phenom, and later processors. This means most, but not quite all, x64 capable CPUs should support SSE3.

Supplemental SSE3 (SSSE3) is supported by Intel Core 2 Duo, Intel Core i7/i5/i3, Intel Atom, AMD Bulldozer, and later processors.

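 // __cpuid is declared in <intrin.h> (Visual C++)
 int CPUInfo[4] = {-1};
 __cpuid( CPUInfo, 0 );                        // leaf 0: EAX holds the highest supported leaf
 bool bSSE3 = false;
 bool bSSSE3 = false;
 if ( CPUInfo[0] > 0 )
 {
     __cpuid(CPUInfo, 1 );                     // leaf 1: feature flags
     bSSE3 = (CPUInfo[2] & 0x1) != 0;          // ECX bit 0
     bSSSE3 = (CPUInfo[2] & 0x200) != 0;       // ECX bit 9
 }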

You can also use the IsProcessorFeaturePresent Win32 API with PF_SSE3_INSTRUCTIONS_AVAILABLE on Windows Vista or later to detect SSE3 support. This API does not report support for SSSE3.
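Since the Win32 API does not cover SSSE3, the CPUID route is the more general one. As a rough sketch, the bit tests can be pulled into a small decode function and paired with a wrapper that also builds with GCC or Clang. The struct and function names here are made up for this example; `__get_cpuid` comes from the `<cpuid.h>` header those compilers ship.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>   // __cpuid
#define SKETCH_CPUID_MSVC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>    // __get_cpuid (GCC/Clang)
#define SKETCH_CPUID_GNU 1
#endif

struct Sse3Support
{
    bool sse3;
    bool ssse3;
};

// Decodes the ECX value returned by CPUID leaf 1.
inline Sse3Support DecodeLeaf1Ecx(std::uint32_t ecx)
{
    Sse3Support s;
    s.sse3  = (ecx & (1u << 0)) != 0;  // bit 0, i.e. the 0x1 mask
    s.ssse3 = (ecx & (1u << 9)) != 0;  // bit 9, i.e. the 0x200 mask
    return s;
}

// Hypothetical wrapper: reports no support where CPUID leaf 1 is unavailable.
inline Sse3Support QuerySse3Support()
{
#if defined(SKETCH_CPUID_MSVC)
    int info[4] = { 0, 0, 0, 0 };
    __cpuid(info, 0);
    if (info[0] < 1) return DecodeLeaf1Ecx(0);
    __cpuid(info, 1);
    return DecodeLeaf1Ecx(static_cast<std::uint32_t>(info[2]));
#elif defined(SKETCH_CPUID_GNU)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return DecodeLeaf1Ecx(0);
    return DecodeLeaf1Ecx(ecx);
#else
    return DecodeLeaf1Ecx(0);  // not an x86/x64 build
#endif
}
```

Keeping the decode step separate from the query makes the bit logic testable on any machine, regardless of what the host CPU actually supports.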

Utility Code

The source code attached to this blog post is bound to the Microsoft Public License (MS-PL).
