Windows 8 includes an updated “DirectX 11.1 Runtime” that supports Direct3D 11.1, updated versions of Direct2D and DirectWrite, DXGI 1.2, and a revision of the Windows Imaging Component (WIC).

Portions of the “DirectX 11.1 Runtime” are being made available on Windows 7 Service Pack 1 via the Platform Update for Windows 7 Service Pack 1 and Windows Server 2008 R2 Service Pack 1 (KB 2670838) included with Internet Explorer 10 for Windows 7. This includes the updated components above, but is limited to WDDM 1.1 drivers on Windows 7.

Full technical details on what is and is not included in the update are available on MSDN. For information about IE10 compatibility, see this article.

Note: KB 2670838 does not include XINPUT 1.4 or XAudio 2.8 on Windows 7. These remain Windows 8 exclusive. See XINPUT and Windows 8 and XAudio2 and Windows 8 for guidance on handling this difference in Win32 desktop applications.

Update: KB 2670838 and Internet Explorer 10 for Windows 7 are now available. Users with the prerelease version installed should update their systems.

Notes for users of the DirectX SDK

The updated headers and link libraries needed to target the new components on Windows 8 and Windows 7 are in the Windows 8.0 SDK as indicated in previous posts (see Where is the DirectX SDK?). See MSDN for details on ‘mixing’ the Windows 8.0 SDK and legacy DirectX SDK if needed.

It is also important to note that the updated “Debug Runtime” components in the Windows 8.0 SDK are required on Windows 7 once KB 2670838 is installed. The legacy DirectX SDK (June 2010) “Debug Runtime” for Direct3D 10.x and Direct3D 11.x is not compatible with Windows 8 or Windows 7 once this update is applied. You can install the Windows 8.0 SDK standalone, VS 2012 which includes the Windows 8.0 SDK, or the VS 2012 Remote Debugging Tools (x86 or x64) to get the updated SDK Debug Layers files.

The legacy PIX for Windows tool in the DirectX SDK (June 2010) release does not support Direct3D 10.x or Direct3D 11.x applications on Windows 8, and after this update is applied it will no longer support these applications on Windows 7. Direct3D 9 application debugging continues to function.

Notes for users of VS 2012

Visual Studio 2012’s Graphics Debugger supports Direct3D 11.0 applications on Windows 7 and DirectX 11.x applications on Windows 8. Improved support for KB 2670838 is in VS 2012 Update 2.

When using VS 2012 Update 1's new "v110_xp" Platform Toolset, the DirectX 11.1, WIC2, and related headers are not available.

↧

Windows Imaging Component and Windows 8

November 19, 2012, 3:07 pm

There are a number of new features and some bugs fixed in the Windows Imaging Component for Windows 8. With the installation of KB 2670838 this new version of WIC is also available on Windows 7 Service Pack 1.

The Windows 8.0 SDK contains the latest version of the headers needed to build with the new version of WIC. The behavior of wincodec.h changes depending on your build settings. If you build with _WIN32_WINNT set to 0x602 or later, then WINCODEC_SDK_VERSION, CLSID_WICImagingFactory, and CLSID_WICPngDecoder are set to use "WIC2" by default. Otherwise, it uses the old "WIC1" version. This means Windows Store apps and Win32 desktop applications built for Windows 8 only are already using the new version of WIC. No muss. No fuss. Win32 desktop applications built for older versions of Windows continue to use "WIC1" and the old behaviors are maintained.

If, however, you want to use "WIC2" when it is available but successfully fall back to "WIC1" on Windows Vista or Windows 7 without the KB 2670838 update, then things get a little tricky. The _WIN7_PLATFORM_UPDATE define 'opts-in' to the Windows 8 header behavior without requiring you set your _WIN32_WINNT define in a way that doesn't support older versions of Windows. You will want to avoid using WINCODEC_SDK_VERSION, CLSID_WICImagingFactory, and CLSID_WICPngDecoder and instead use the explicit 'version' ones. You also need to be careful when using the four new WIC pixel format GUIDs (GUID_WICPixelFormat32bppRGB, GUID_WICPixelFormat64bppRGB, GUID_WICPixelFormat96bppRGBFloat, and GUID_WICPixelFormat64bppPRGBAHalf) as they are not valid for use with "WIC1" APIs.

For example, here is how you should be creating the WIC factory for Win32 desktop applications that support older versions of Windows:

 #define _WIN7_PLATFORM_UPDATE
 #include <wincodec.h>

 // CoInitializeEx must be called at some point before this

 IWICImagingFactory* wicFactory = nullptr;
 HRESULT hr = CoCreateInstance(CLSID_WICImagingFactory2, nullptr, CLSCTX_INPROC_SERVER, __uuidof(IWICImagingFactory2), reinterpret_cast<LPVOID*>( &wicFactory ) );
 if ( SUCCEEDED(hr) )
 {
     // WIC2 is available on Windows 8 and Windows 7 SP1 with KB 2670838 installed
     // Note you only need to QI IWICImagingFactory2 if you need to call CreateImageEncoder
 }
 else
 {
     hr = CoCreateInstance(CLSID_WICImagingFactory1, nullptr, CLSCTX_INPROC_SERVER, __uuidof(IWICImagingFactory), reinterpret_cast<LPVOID*>( &wicFactory ) );
 }

This results in the application using the new "WIC2" behaviors when available, and falling back to the older version of WIC when it is not.

DirectXTK and DirectXTex were both recently updated to support "WIC2" when it is available. This includes use of new WIC pixel formats ( GUID_WICPixelFormat96bppRGBFloat is the most useful since it matches DXGI_FORMAT_R32G32B32_FLOAT ), opts into the new Windows BMP BITMAPV5HEADER support (which encodes 32-bit with alpha channels for GUID_WICPixelFormat32bppBGRA and reads such BMP files as well), and can make use of the fix to the TIFF decoder for 96bpp floating-point images (which load as GUID_WICPixelFormat96bppRGBFloat).

Windows phone: Note that the Windows phone 8 platform does not support the WIC API.

VS 2012 Update 1: When building with the "v110_xp" Platform Toolset, the WIC2 header content is not available so avoid the use of _WIN7_PLATFORM_UPDATE for these configurations.

↧

Visual Studio 2012 Update 1

November 26, 2012, 2:44 pm

An update to Visual Studio 2012 is now available for download. For full details, see the following blog posts: Visual Studio team blog, Somasegar's blog, and Visual Studio ALM + Team Foundation Server blog.

This update includes support for targeting Windows XP with the Visual C++ 2012 toolset and CRT. This provides C++11 Language and Standard Library support for Win32 desktop applications compatible with the legacy Windows XP platform. This is accomplished through using the Platform Toolset "v110_xp". Details on this were announced for the CTP.

See KB 2797915

Compiler and CRT

VS 2012 Update 1 includes a new version of the compiler (17.00.51106.1) and C/C++ Runtime (11.0.51106.1).

MSDN downloads has the updated retail redistribution packages which are now compatible with Windows XP Service Pack 3.

Windows SDK and Windows XP

When building applications that support the legacy Windows XP platform, you are using a platform header and library set similar to those that shipped in the Windows 7.1 SDK rather than the Windows 8.0 SDK with the integrated DirectX SDK content (see Where is the DirectX SDK?). Many of the "DirectX" headers and libraries are included with these Windows XP compatible platform headers (see DirectX SDKs of a certain age) such as Direct3D 9, DirectSound, and DirectInput. You will, however, need to continue to use the legacy DirectX SDK for Windows XP compatible versions of the D3DCompile API (#43), legacy D3DX, XAUDIO2, XINPUT, and PIX for Windows tool (the Visual Studio 2012 Graphics Debugger does not support Direct3D 9 applications). See MSDN for details on 'mixing' the legacy DirectX SDK header and library paths into your VS 2012 project.

Windows SDK 7.1A is installed as part of VS 2012 Update 1 for use with the "v110_xp" Platform Toolset, which contains the headers, libraries, and a subset of the tools that originally shipped in the Windows SDK 7.1. There are older Direct3D 10 and Direct3D 11 headers as part of this 7.1 era toolset which are outdated compared to the Windows 8.0 SDK versions using the standard "v110" Platform Toolset, particularly the SDK Debug Layers installed by the Windows 8.0 SDK on Windows 7 and Windows 8. The Platform Toolset "v110_xp" is therefore not recommended for developing DirectX 11 applications, but it can technically be done with some caution. Windows SDK 7.1A does not contain a dxguid.lib, so you must either locally define the required GUIDs in your project by using #define INITGUID in one of your .cpp files, or use the legacy DirectX SDK version.

Note: The Direct3D 9 Developer Runtime is only available on Windows 7 or older versions of the OS using the legacy DirectX SDK. Direct3D 9 debugging on Windows 8 is only supported using 'checked' builds of the OS.

DirectXMath and Windows XP

DirectXMath in the Windows 8.0 SDK is compatible with Windows XP (it can be used to target Windows XP applications using Visual C++ 2010 and the Windows 8.0 SDK), but the "v110_xp" Platform Toolset include paths will not find it. You may want to make use of XNAMath 2.05 for your Windows XP configurations, or you can make a local copy of the DirectXMath headers in the project for Windows XP configurations.

Code Analysis

This update includes Code Analysis /analyze support for Windows phone 8 applications (i.e. when using the Platform Toolset "v110_wp80").

Note that the Platform Toolset "v110_xp" does not support Code Analysis /analyze which is disabled due to incompatibilities with the Windows SDK 7.1A headers.

Related: Visual Studio 2012 Update 2

↧

Direct3D SDK Debug Layer Tricks

November 30, 2012, 12:25 pm

When programming graphics applications, one of the more frustrating aspects of development is that you can end up writing thousands of lines of code and when you run it, all you get is a blank screen. Or maybe a blue screen. Or a crash. But often, not actually a useful image. Errors in state setting, transformation math, and other coding problems can mean your application is completely valid, just not useful. Other kinds of coding problems are due to misuse or abuse of the Direct3D API itself. In those cases, enabling the 'debug' device can quickly help identify the problems.

With Direct3D 9 and older versions of the API, there was a "Developer Runtime" installed by the (now legacy) DirectX SDK and the DirectX Control Panel was used to switch the debug support on and off. This worked reasonably well in a world where only the running application was using Direct3D, but starting with Windows Vista this 'global' option was no longer feasible since the OS itself was using Direct3D. Even on the older systems, people would install the DirectX SDK, enable the debugging, and then forget about it and find all their games would go really slowly.

Starting with Direct3D 10.0, a new mechanism was created for the "Developer Runtime" via an API layering mechanism implemented in D3D10SDKLAYERS.DLL. Developers could opt in their specific application to the debug validation either by creating the device with D3Dxx_CREATE_DEVICE_DEBUG, or by adding the executable to a list in the DirectX Control Panel. This layering mechanism is also used as a way to improve performance, with the non-debug version of the API doing fairly minimal parameter validation, leaving the really detailed diagnostics and validation to the debug layer and only doing it when a developer wants the additional debug checking done.

This same system is still in place for Direct3D 11.0 (D3D11SDKLAYERS.DLL) and 11.1 (D3D11_1SDKLAYERS.DLL). It is highly recommended that developers make use of the debug layer to validate their Direct3D 11.x-based applications, paying particular attention to CORRUPTION and ERROR level messages. These are often indicators of severe problems lurking in your code, and could help avoid support issues once your application is deployed. The WARNING and INFO level messages can also be very useful. That said, sometimes there are warnings that are not as useful. Just like with compiler warnings, learning to ignore them as 'noise' can result in missing more actionable messages. Therefore, you should consider having your application suppress 'known' messages and the like whenever the debug device is active.

A common warning comes from making use of the Object Naming feature to improve the debugging of your resources with the debug device, VS 2012 graphics debugging, and the legacy PIX for Windows tool.

 D3D11 WARNING: ID3D11Texture2D::SetPrivateData: Existing private data of same name with different size found!
 [ STATE_SETTING WARNING #55: SETPRIVATEDATA_CHANGINGPARAMS]

This could be an important message in some cases, but likely not for cases where you are using WKPDID_D3DDebugObjectName. There may be other warnings your code generates which are also harmless. This can be solved with some code just after you create your device. This code should probably be excluded from your 'production' build for final release, but for testing you may want it in Release builds in cases where the tester turns on debugging manually (such as via the DirectX Control Panel) which is why I'm not showing it in an #ifdef _DEBUG block. Also, for debug builds it is useful for the application to trigger a break-point for the severe cases of CORRUPTION and ERRORs, which is the code I'm showing as #ifdef _DEBUG here. Finally, remember that on 'end-user' systems and standard OS installs, the debug layer creation fails without the appropriate SDK Layers DLL.

For Win32 desktop applications, this code looks like:

 ...
 #ifdef _DEBUG
 deviceCreationFlags |= D3D11_CREATE_DEVICE_DEBUG;
 #endif
 ...
 ID3D11Debug *d3dDebug = nullptr;
 if( SUCCEEDED( d3dDevice->QueryInterface( __uuidof(ID3D11Debug), (void**)&d3dDebug ) ) )
 {
     ID3D11InfoQueue *d3dInfoQueue = nullptr;
     if( SUCCEEDED( d3dDebug->QueryInterface( __uuidof(ID3D11InfoQueue), (void**)&d3dInfoQueue ) ) )
     {
 #ifdef _DEBUG
         d3dInfoQueue->SetBreakOnSeverity( D3D11_MESSAGE_SEVERITY_CORRUPTION, true );
         d3dInfoQueue->SetBreakOnSeverity( D3D11_MESSAGE_SEVERITY_ERROR, true );
 #endif

         D3D11_MESSAGE_ID hide [] =
         {
             D3D11_MESSAGE_ID_SETPRIVATEDATA_CHANGINGPARAMS,
             // Add more message IDs here as needed
         };

         D3D11_INFO_QUEUE_FILTER filter;
         memset( &filter, 0, sizeof(filter) );
         filter.DenyList.NumIDs = _countof(hide);
         filter.DenyList.pIDList = hide;
         d3dInfoQueue->AddStorageFilterEntries( &filter );
         d3dInfoQueue->Release();
     }
     d3dDebug->Release();
 }

For Windows Store and Windows phone 8 applications, this code looks like:

 ...
 #ifdef _DEBUG
 deviceCreationFlags |= D3D11_CREATE_DEVICE_DEBUG;
 #endif
 ...
 ComPtr<ID3D11Debug> d3dDebug;
 if ( SUCCEEDED( d3dDevice.As(&d3dDebug) ) )
 {
     ComPtr<ID3D11InfoQueue> d3dInfoQueue;
     if ( SUCCEEDED( d3dDebug.As(&d3dInfoQueue) ) )
     {
 #ifdef _DEBUG
         d3dInfoQueue->SetBreakOnSeverity( D3D11_MESSAGE_SEVERITY_CORRUPTION, true );
         d3dInfoQueue->SetBreakOnSeverity( D3D11_MESSAGE_SEVERITY_ERROR, true );
 #endif
         D3D11_MESSAGE_ID hide[] =
         {
             D3D11_MESSAGE_ID_SETPRIVATEDATA_CHANGINGPARAMS,
             // Add more message IDs here as needed
         };
         D3D11_INFO_QUEUE_FILTER filter;
         memset( &filter, 0, sizeof(filter) );
         filter.DenyList.NumIDs = _countof(hide);
         filter.DenyList.pIDList = hide;
         d3dInfoQueue->AddStorageFilterEntries( &filter );
     }
 }

In these cases I throw away the reference to the debug objects ID3D11Debug and ID3D11InfoQueue, but there are some nifty features you might consider using so it can be useful to hold on to them instead.

Update: This of course assumes you want to turn off these particular messages 'globally' for your application. You can also suppress them around only a specific part of your code via PushStorageFilter / PopStorageFilter.
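
One way to scope a suppression with PushStorageFilter / PopStorageFilter is a small RAII guard, so the pop cannot be forgotten on an early return. The following is only a minimal sketch: the class name ScopedStorageFilter is hypothetical, and the queue and filter types are template parameters purely so the snippet compiles without the Direct3D headers. In a real application they would be ID3D11InfoQueue and D3D11_INFO_QUEUE_FILTER.

```cpp
// Hypothetical helper: pushes a storage filter on construction and pops it
// on destruction. InfoQueue is expected to be ID3D11InfoQueue and Filter to
// be D3D11_INFO_QUEUE_FILTER; they are template parameters only so that this
// sketch builds without d3d11sdklayers.h.
template <typename InfoQueue, typename Filter>
class ScopedStorageFilter
{
public:
    ScopedStorageFilter(InfoQueue* queue, Filter* filter)
        : m_queue(queue), m_pushed(false)
    {
        // PushStorageFilter returns an HRESULT; a non-negative value means success.
        if (m_queue && m_queue->PushStorageFilter(filter) >= 0)
            m_pushed = true;
    }

    ~ScopedStorageFilter()
    {
        // Only pop what was actually pushed, so the filter stack stays balanced.
        if (m_pushed)
            m_queue->PopStorageFilter();
    }

    ScopedStorageFilter(const ScopedStorageFilter&) = delete;
    ScopedStorageFilter& operator=(const ScopedStorageFilter&) = delete;

    bool Active() const { return m_pushed; }

private:
    InfoQueue* m_queue;
    bool       m_pushed;
};

// Intended use on Windows, with a deny-list filter built as in the listings
// in this post:
//
//   {
//       ScopedStorageFilter<ID3D11InfoQueue, D3D11_INFO_QUEUE_FILTER>
//           scope( d3dInfoQueue, &filter );
//       // ... calls that are known to generate the harmless messages ...
//   }   // filter is popped here
```

Holding on to the ID3D11InfoQueue pointer for the lifetime of the device makes this kind of scoped use practical; the guard itself adds no state beyond the pointer and a flag.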

Developer Runtime: For details on obtaining the proper "Developer Runtime" for your OS, see DirectX 11.1 and Windows 7, Visual Studio 2012 and Windows 8.0 SDK, and Where is the DirectX SDK?. Remember that this includes installing the "reference" device as well as Direct2D debugging facilities.

Windows RT: Windows RT Store app Debugging

↧

Game Rating Systems and Windows 7

January 20, 2013, 11:35 pm

Windows 8 includes a number of changes to the parental control ratings systems supported by Windows Family Safety (aka Windows Parental Controls) for Win32 desktop games. These changes are now available on Windows 7 via KB2773072. It is recommended that all game publishers populate their GDFs using the latest Game Definition File Editor (GDFMaker.EXE) in the Windows 8.0 SDK rather than using the legacy DirectX SDK version which does not support these ratings changes.

For full details on these changes to the ratings systems and GDF-related tools, see the Windows 8 and GDFs blog post.

Note: The latest version of the Game Definition File Validator (GDFTrace.EXE) on MSDN Code Gallery has been updated for all the latest ratings system changes.

↧

DirectXTK Update

January 27, 2013, 1:32 pm

The DirectX Toolkit (aka DirectXTK) for Direct3D 11 introduced last year and made an official CodePlex project has continued to improve. The DirectXTK project provides 'runtime' utility shared-source C++ code to replace the deprecated D3DX library for Win32 desktop applications using Direct3D 11 (Windows 8, Windows 7, Windows Vista SP2+KB971644, and the Server equivalents) as well as Windows Store apps on Windows 8. This complements DirectXTex which provides shared-source C++ code for 'build time' texture processing that used to ship in D3DX, DirectXMath and Spherical Harmonics math which replace D3DXMath, and the D3DCompile API which replaced the once integrated HLSL compiler. For Win32 desktop applications, there's also an Effects 11 to replace the FX library from D3DX9.

The original toolkit consisted of some key functionality based on the XNA Game Studio design: SpriteBatch, Effects, GeometricPrimitive along with some helper code (CommonStates, VertexTypes) and runtime texture loaders (DDSTextureLoader and WICTextureLoader). Shawn then added SpriteFont and support for Windows phone 8, and I added the ScreenGrab module to support the runtime creation of 'screenshots' from render targets. A fellow Microsoftie also contributed code for a geodesic sphere GeometricPrimitive as an alternative to the traditional UV-sphere.

The most recent additions to the library include:

PrimitiveBatch - simple and efficient way to draw dynamic geometry as a replacement for the Direct3D 9 "DrawUP" and "DrawIndexedUP" functions, ideal for debug geometry or on-the-fly procedural geometry generation.
Improved GeometricPrimitive with support for proper winding order for both right-handed coordinates (the assumed default based on the XNA Game Studio conventions) and left-handed coordinates (typically used by DirectX C++ applications), as well as support for drawing them with custom effects.
Model - This draws simple meshes loaded from Visual Studio 2012's built-in Autodesk FBX exporter .CMO files or the legacy DirectX SDK Samples Content Exporter .SDKMESH files. This is currently limited to rigid models but we'll be working to add skinning support.

The CodePlex site hosts the latest version of the library, documentation, discussion forums, bug reports, and feature requests. Developers can use Visual Studio 2012 or use Visual Studio 2010 with the standalone Windows 8.0 SDK.

http://directxtk.codeplex.com/

Some samples using DirectXTK are also becoming available. A topic page on the CodePlex site will be kept updated with a list of the latest known samples.

SimpleSample - A Win32 desktop sample that demonstrates DirectXTK.
SimpleSample - A Windows Store app sample that demonstrates DirectXTK.
SimpleSample - A Windows phone 8 sample that demonstrates DirectXTK.
The Windows phone 8 version of MarbleMaze also makes use of DirectXTK.

↧

DirectX 11.1 and Windows 7 Update

February 26, 2013, 11:11 am

As of today, IE 10 for Windows 7 has been officially released. IE10 for Windows 7 includes portions of the DirectX 11.1 runtime for Windows 7 Service Pack 1 and Windows Server 2008 R2 Service Pack 1 via KB 2670838.

Full technical details of what's included in KB 2670838 are covered on MSDN. The primary difference between the prerelease and the final version is that WARP supports Feature Level 11.0 with the updated runtime.

See DirectX 11.1 and Windows 7 for some additional notes about KB 2670838 as it impacts PIX for Windows, the debug runtime, and VS 2012 Graphics Diagnostics. The key issue is that the legacy DirectX SDK (June 2010) release version of the Debug Runtime is not compatible with KB 2670838. You can resolve this by installing the Windows 8.0 SDK standalone, VS 2012 which includes the Windows 8.0 SDK, or the VS 2012 Remote Debugging Tools (x86 or x64).

Note: If you have the prerelease of either IE10 or KB 2670838 installed, you should update your system. Windows Update will be offering an update soon, but you can manually install it as well.

XINPUT and XAUDIO2: KB 2670838 does not include XINPUT 1.4 or XAudio 2.8 on Windows 7. These remain Windows 8 exclusive. See XINPUT and Windows 8 and XAudio2 and Windows 8 for guidance on handling this difference in Win32 desktop applications.

WIC: KB 2670838 includes WIC2 for Windows 7. See Windows Imaging Component and Windows 8 for details.

DirectX 11 vs. 11.1: For Windows 7 and Windows Vista, you can continue to use the same DirectX 11.0 APIs as always even with this update installed. The only thing you have to do is to install the updated SDK Debug Layers to restore D3D11_CREATE_DEVICE_DEBUG functionality. If you want to take advantage of some of the new DirectX 11.1 APIs now available on Windows 7 as well, you need to use the Windows 8.0 SDK with VS 2010 or VS 2012 rather than continuing to use the legacy DirectX SDK. See Where is the DirectX SDK? and DirectX SDKs of a certain age for details.

VS 2012: There is improved support for using VS 2012 Graphics Diagnostics on Windows 7 with KB 2670838 installed in the VS 2012 Update 2.

↧

Known Issues: DirectXMath 3.03

March 5, 2013, 4:36 pm

The Windows 8.0 SDK includes DirectXMath version 3.03 for use with Windows Store apps and Win32 desktop applications on Windows 8, Windows RT, Windows 7, and Windows Vista. DirectXMath 3.03 is also part of the Windows phone 8.0 SDK for use on Windows phone 8. There are a number of minor bugs in the library that have been reported by customers since it was released, which will be addressed in future SDK releases. In the meantime, since the code is all inline in the headers, you can make the fix directly to a local copy as needed or work around the issue in your own code.

XMVector3Cross

The ARM-NEON implementation leaves the .w component undefined instead of setting it to zero as the other versions do. The fix is to change DirectXMathVector.inl line 7678.

// Original code
 return veorq_u32( vResult, g_XMFlipY );

 // Corrected code
 vResult = veorq_u32( vResult, g_XMFlipY );
 return vandq_u32( vResult, g_XMMask3 );

XMVectorFloor and XMVectorCeiling

These functions use a naïve implementation that fails when given an odd whole number (such as 105.0) which causes the answer to jump to 104.0 due to round-to-nearest (even) behavior. The solution is to replace these functions with a different implementation in DirectXMathVector.inl starting on line 2426.

 inline XMVECTOR XMVectorFloor
 (
     FXMVECTOR V
 )
 {
 #if defined(_XM_NO_INTRINSICS_)

     XMVECTOR vResult = {
         floorf(V.vector4_f32[0]),
         floorf(V.vector4_f32[1]),
         floorf(V.vector4_f32[2]),
         floorf(V.vector4_f32[3])
     };
     return vResult;

 #elif defined(_XM_ARM_NEON_INTRINSICS_)
     float32x4_t vTest = vabsq_f32( V );
     vTest = vcltq_f32( vTest, g_XMNoFraction );
     // Truncate
     int32x4_t vInt = vcvtq_s32_f32( V );
     XMVECTOR vResult = vcvtq_f32_s32( vInt );
     XMVECTOR vLarger = vcgtq_f32( vResult, V );
     // 0 -> 0, 0xffffffff -> -1.0f
     vLarger = vcvtq_f32_s32( vLarger );
     vResult = vaddq_f32( vResult, vLarger );
     // All numbers less than 8388608 will use the round to int
     // All others, use the ORIGINAL value
     return vbslq_f32( vTest, vResult, V );
 #elif defined(_XM_SSE_INTRINSICS_)
     // To handle NAN, INF and numbers greater than 8388608, use masking
     __m128i vTest = _mm_and_si128(_mm_castps_si128(V),g_XMAbsMask);
     vTest = _mm_cmplt_epi32(vTest,g_XMNoFraction);
     // Truncate
     __m128i vInt = _mm_cvttps_epi32(V);
     XMVECTOR vResult = _mm_cvtepi32_ps(vInt);
     __m128 vLarger = _mm_cmpgt_ps( vResult, V );
     // 0 -> 0, 0xffffffff -> -1.0f
     vLarger = _mm_cvtepi32_ps( _mm_castps_si128( vLarger ) );
     vResult = _mm_add_ps( vResult, vLarger );
     // All numbers less than 8388608 will use the round to int
     vResult = _mm_and_ps(vResult,_mm_castsi128_ps(vTest));
     // All others, use the ORIGINAL value
     vTest = _mm_andnot_si128(vTest,_mm_castps_si128(V));
     vResult = _mm_or_ps(vResult,_mm_castsi128_ps(vTest));
     return vResult;
 #else // _XM_VMX128_INTRINSICS_
 #endif // _XM_VMX128_INTRINSICS_
 }

and DirectXMathVector.inl starting on line 2467

 inline XMVECTOR XMVectorCeiling
 (
     FXMVECTOR V
 )
 {
 #if defined(_XM_NO_INTRINSICS_)
     XMVECTOR vResult = {
         ceilf(V.vector4_f32[0]),
         ceilf(V.vector4_f32[1]),
         ceilf(V.vector4_f32[2]),
         ceilf(V.vector4_f32[3])
     };
     return vResult;

 #elif defined(_XM_ARM_NEON_INTRINSICS_)
     float32x4_t vTest = vabsq_f32( V );
     vTest = vcltq_f32( vTest, g_XMNoFraction );
     // Truncate
     int32x4_t vInt = vcvtq_s32_f32( V );
     XMVECTOR vResult = vcvtq_f32_s32( vInt );
     XMVECTOR vSmaller = vcltq_f32( vResult, V );
     // 0 -> 0, 0xffffffff -> -1.0f
     vSmaller = vcvtq_f32_s32( vSmaller );
     vResult = vsubq_f32( vResult, vSmaller );
     // All numbers less than 8388608 will use the round to int
     // All others, use the ORIGINAL value
     return vbslq_f32( vTest, vResult, V );
 #elif defined(_XM_SSE_INTRINSICS_)
     // To handle NAN, INF and numbers greater than 8388608, use masking
     __m128i vTest = _mm_and_si128(_mm_castps_si128(V),g_XMAbsMask);
     vTest = _mm_cmplt_epi32(vTest,g_XMNoFraction);
     // Truncate
     __m128i vInt = _mm_cvttps_epi32(V);
     XMVECTOR vResult = _mm_cvtepi32_ps(vInt);
     __m128 vSmaller = _mm_cmplt_ps( vResult, V );
     // 0 -> 0, 0xffffffff -> -1.0f
     vSmaller = _mm_cvtepi32_ps( _mm_castps_si128( vSmaller ) );
     vResult = _mm_sub_ps( vResult, vSmaller );
     // All numbers less than 8388608 will use the round to int
     vResult = _mm_and_ps(vResult,_mm_castsi128_ps(vTest));
     // All others, use the ORIGINAL value
     vTest = _mm_andnot_si128(vTest,_mm_castps_si128(V));
     vResult = _mm_or_ps(vResult,_mm_castsi128_ps(vTest));
     return vResult;
 #else // _XM_VMX128_INTRINSICS_
 #endif // _XM_VMX128_INTRINSICS_
 }

This problem does not apply to the SSE 4.1 versions of these functions.
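
To see why round-to-nearest (even) trips up a naive floor, here is a scalar sketch in portable C++. NaiveFloor is a hypothetical formulation (bias by one half, then round) chosen only to reproduce the symptom; it is an assumption for illustration and not necessarily the exact code that shipped. TruncateFloor and TruncateCeiling are scalar mirrors of the truncate-then-adjust approach used by the SIMD replacements for these functions; the function names are mine.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical naive floor: bias by one half, then round to nearest.
// std::nearbyint honors the current rounding mode, which defaults to
// round-to-nearest-even, so an exact .5 tie goes to the even neighbor.
inline float NaiveFloor(float v)
{
    return std::nearbyint(v - 0.5f);
}

// Truncate toward zero, then step down by one if truncation moved the value
// up (this happens for negative inputs with a fractional part).
inline float TruncateFloor(float v)
{
    // NaN, INF, and magnitudes >= 8388608 (2^23) have no fraction: pass through.
    if (!(std::fabs(v) < 8388608.0f))
        return v;
    float result = static_cast<float>(static_cast<int32_t>(v));
    if (result > v)
        result -= 1.0f;
    return result;
}

// Truncate toward zero, then step up by one if truncation moved the value
// down (this happens for positive inputs with a fractional part).
inline float TruncateCeiling(float v)
{
    if (!(std::fabs(v) < 8388608.0f))
        return v;
    float result = static_cast<float>(static_cast<int32_t>(v));
    if (result < v)
        result += 1.0f;
    return result;
}
```

With the default rounding mode, NaiveFloor(105.0f) computes 104.5 and rounds the tie to the even value 104.0, while TruncateFloor(105.0f) returns 105.0. Even whole numbers such as 104.0 happen to survive the naive version because 103.5 also rounds to 104.0, which is why the bug only shows up on odd inputs.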

XMConvertHalfToFloat and XMConvertFloatToHalf

These functions convert to the Xbox 360 variant of float16 rather than the IEEE 754 standard version of float16. This means values greater than +/- 65504.0 map to QNAN rather than +/- INF as would be expected. The implementation makes sense for XNAMath (aka xboxmath 2.x), but doesn't make any sense in DirectXMath since it does not support the Xbox 360 platform. The solution is to change DirectXPackedVector.inl starting on line 34.

 // Original code
 uint32_t Exponent;
 if ((Value & 0x7C00) != 0) // The value is normalized
 {
     Exponent = (uint32_t)((Value >> 10) & 0x1F);
 }

 // Corrected code
 uint32_t Exponent = (Value & 0x7C00);
 if ( Exponent == 0x7C00 ) // INF/NAN
 {
     Exponent = (uint32_t)143;
 }
 else if (Exponent != 0) // The value is normalized
 {
     Exponent = (uint32_t)((Value >> 10) & 0x1F);
 }

and in DirectXPackedVector.inl starting on line 111.

 // Original code
 if (IValue > 0x47FFEFFFU)
 {
     // The number is too large to be represented as a half. Saturate to infinity.
     Result = 0x7FFFU;
 }
 else

 // Corrected code
 if (IValue > 0x477FE000U)
 {
     // The number is too large to be represented as a half. Saturate to infinity.
     if (((IValue & 0x7F800000) == 0x7F800000) && ((IValue & 0x7FFFFF ) != 0))
     {
         Result = 0x7FFF; // NAN
     }
     else
     {
         Result = 0x7C00U; // INF
     }
 }
 else

This problem does not apply to the F16C / CVT16 versions of these functions.
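
The constants in these fixes are easier to follow in a standalone form. The sketch below is a portable scalar decode of an IEEE 754 half, plus the overflow and specials check from the float-to-half direction. The function names HalfToFloatIEEE and HalfOverflowBits are hypothetical, and the denormal and zero handling is my own assumption about how a complete decoder rounds out the excerpted logic. Note that 143 is chosen so that adding the 112 exponent re-bias yields 255, the float INF/NaN exponent, and that 0x477FE000 is the bit pattern of 65504.0f, the largest finite half.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Decode an IEEE 754 binary16 bit pattern into a float (scalar sketch).
inline float HalfToFloatIEEE(uint16_t value)
{
    uint32_t mantissa = value & 0x03FFu;
    uint32_t exponent = value & 0x7C00u;

    if (exponent == 0x7C00u)
    {
        exponent = 143u;                      // 143 + 112 = 255 (INF/NaN)
    }
    else if (exponent != 0)
    {
        exponent = (value >> 10) & 0x1Fu;     // normalized value
    }
    else if (mantissa != 0)
    {
        // Denormalized half: shift until the implicit bit appears.
        // The unsigned wrap-around is intentional and undone by the +112.
        exponent = 1u;
        do
        {
            exponent--;
            mantissa <<= 1;
        } while ((mantissa & 0x0400u) == 0);
        mantissa &= 0x03FFu;
    }
    else
    {
        exponent = static_cast<uint32_t>(-112); // signed zero
    }

    const uint32_t bits = (static_cast<uint32_t>(value & 0x8000u) << 16)
                        | ((exponent + 112u) << 23)
                        | (mantissa << 13);
    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}

// Returns the half bit pattern for inputs beyond the finite half range
// (0x7C00 for INF/overflow, 0x7FFF for NaN, each with the sign bit), or -1
// when the magnitude is representable and a normal conversion should run.
inline int HalfOverflowBits(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    const uint32_t sign = (bits >> 16) & 0x8000u;
    const uint32_t magnitude = bits & 0x7FFFFFFFu;

    if (magnitude > 0x477FE000u)              // larger than 65504.0f
    {
        const bool isNaN = ((magnitude & 0x7F800000u) == 0x7F800000u)
                        && ((magnitude & 0x007FFFFFu) != 0);
        return static_cast<int>(sign | (isNaN ? 0x7FFFu : 0x7C00u));
    }
    return -1;
}
```

As in the corrected code, anything above 65504.0 saturates straight to INF rather than being rounded; a converter that wants strict IEEE round-to-nearest behavior at the boundary would need a slightly higher threshold.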

BoundingOrientedBox::Transform and BoundingFrustum::Transform

The matrix form of these functions does not properly handle scaling transformations. The same change is applied in DirectXCollision.inl on line 1952 and again on line 2824.

 // Original code
 XMVECTOR Rotation = XMQuaternionRotationMatrix( M );

 // Corrected code
 XMMATRIX nM;
 nM.r[0] = XMVector3Normalize( M.r[0] );
 nM.r[1] = XMVector3Normalize( M.r[1] );
 nM.r[2] = XMVector3Normalize( M.r[2] );
 nM.r[3] = g_XMIdentityR3;
 XMVECTOR Rotation = XMQuaternionRotationMatrix( nM );
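
The idea behind the correction is that a rotation can only be read out of the upper 3x3 of a transform once each basis row has unit length. The following is a minimal scalar sketch of that row normalization using hypothetical Vec3 and Basis3 types rather than the DirectXMath ones. It assumes non-zero rows and a transform made of rotation and scale only; shear is not handled.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Rows 0..2 of a row-major transform: rotation multiplied by scale.
struct Basis3 { Vec3 r[3]; };

inline float Length(const Vec3& v)
{
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Returns v scaled to unit length (v is assumed to be non-zero).
inline Vec3 Normalize(const Vec3& v)
{
    const float len = Length(v);
    return Vec3{ v.x / len, v.y / len, v.z / len };
}

// Strips the scale from each row so only the rotation part remains.
inline Basis3 RemoveScale(const Basis3& m)
{
    Basis3 out;
    for (int i = 0; i < 3; ++i)
        out.r[i] = Normalize(m.r[i]);
    return out;
}
```

For a 90 degree rotation about Z scaled uniformly by 2, the rows (0,2,0), (-2,0,0), (0,0,2) come back as (0,1,0), (-1,0,0), (0,0,1), which is a pure rotation suitable for conversion to a quaternion.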

XMStoreFloat3PK and XMStoreFloat3SE

These functions have some minor typos in the exact bits that are used in specials-generation. This doesn't really impact the functionality in any obvious way, but it's also an easy fix. In DirectXPackedVector.inl on line 1709.

// Original code
 Result[j] = 0x7c0 | (((I>>17)|(I>11)|(I>>6)|(I))&0x3f);

 // Corrected code
 Result[j] = 0x7c0 | (((I>>17)|(I>>11)|(I>>6)|(I))&0x3f);

and DirectXPackedVector.inl line 1756.

// Original code
 Result[2] = 0x3e0 | (((I>>18)|(I>13)|(I>>3)|(I))&0x1f);

 // Corrected code
 Result[2] = 0x3e0 | (((I>>18)|(I>>13)|(I>>3)|(I))&0x1f);

and DirectXPackedVector.inl line 1826.

// Original code
 Frac[j] = ((I>>14)|(I>5)|(I))&0x1ff;

 // Corrected code
 Frac[j] = ((I>>14)|(I>>5)|(I))&0x1ff;

Note: Attached are the relevant files with these fixes applied. They require that you use the rest of the library in the Windows 8.0 SDK or the Windows phone 8.0 SDK, and the code is subject to the respective SDK's license agreement. The attachment was refreshed on March 7, 2013.

↧

Visual Studio 2012 Update 2

April 8, 2013, 11:32 am

The second update to Visual Studio 2012 is now available for download. For full details, see the following blog posts: Visual Studio team blog, Somasegar's blog, and Visual Studio ALM + Team Foundation Server blog.

See KB 2797912

Compiler and CRT

VS 2012 Update 2 includes a new version of the compiler (17.00.60315.1).

The C/C++ Runtime was not updated, so you should continue to use the VS 2012 Update 1 version (11.0.51106.1).

Graphics Diagnostics

VS 2012 Update 2 includes stability and performance improvements for the Graphics Diagnostics feature. This includes improved support for KB 2670838.

Related: Visual Studio 2012 Update 1

Windows XP developers: There have been some reported issues with using the "v110_xp" Platform Toolset with Update 2.

↧

Game Developer Conference 2013

June 6, 2013, 11:34 am

The Microsoft presentations at GDC 2013 are freely available from the GDC Vault.

Windows
Conquering the Galaxy with Phones and Tablets: Galactic Reign Postmortem
Secrets of Success for Publishing Games in the Windows Store
Optimizing for Power Efficient GPUs in DirectX/C++ Windows Store Games
Core Technologies for Windows 8 Games
Developing a Windows Store Game with DirectX and C++
Designing Games for Windows 8 Tablets and PCs

Windows phone
Intro to DirectX C++ Game Development on Windows Phone 8
Building DirectX Games for Windows and Windows Phone
Building Connected Game Experiences on Windows Phone
Using In-Application Purchase for Windows Phone Games
Middleware Offerings for Windows and Windows Phone
From Our Partners: What Makes a Great Windows Phone Game

Xbox
Developing a Second Screen Experience with Xbox SmartGlass

Note: There were a number of DirectX 11 related presentations from AMD, NVIDIA, and Intel as well.

↧

DirectX SDKs of a certain age

August 21, 2012, 4:56 pm

Recently many older releases of the DirectX SDK and REDIST packages expired and were removed from the Microsoft Downloads Center site. The DirectX SDK and REDIST packages for all 2008, 2009, and 2010 releases are currently available, but all 2007 and prior releases are no longer hosted by Microsoft.

Here is a quick summary of DirectX technologies and recommended solutions. Be sure to read Where is the DirectX SDK? and Where is the DirectX SDK (2013 Edition)? as well.

Technology	Resolution
Direct3D 11.x	The Windows SDK 7.0 includes Direct3D 11.0, Direct2D, DirectWrite, WARP, and DXGI 1.1. The Windows SDK 8.0 also includes Direct3D 11.1, updated Direct2D/DirectWrite, updated WARP, and DXGI 1.2. The Windows SDK 8.1 includes Direct3D 11.2, Direct2D 1.2, and DXGI 1.3.
Direct3D 10.x	The Windows SDK 6.0 or later includes Direct3D 10.x and DXGI 1.0. Windows SDK 7.0 or later includes updates for expanded Direct3D 10.1 feature levels (aka "10level9")
Direct3D 9	Windows SDK 6.0 or later includes Direct3D 9 and Direct3D9Ex.
Direct3D 8 and prior	The August 2007 DirectX SDK was the last version to include Direct3D 8 (`d3d8.h d3d8caps.h d3d8types.h`) and Direct3D 7 and prior (`d3d.h d3dcaps.h d3dvec.inl d3dtypes.h`). Direct3D 9 or later should be used for all applications, and we'd recommend using Direct3D 11.
`D3DX`	The June 2010 DirectX SDK contains the last release of D3DX9, D3DX10, and D3DX11. DirectXMath, DirectXTex, and D3DCompile replace the majority of the functionality in these utility libraries. DirectXTK provides further alternatives for Direct3D 11 applications. For Direct3D 9 applications, the DDSWithoutD3DX sample provides a way to create textures from `.DDS` files. The Effects 11 library is available as shared-source online.
`DXERR9.LIB`	The August 2007 DirectX SDK was the last version to include the `dxerr9.lib`. It has been replaced by `dxerr.lib` in June 2010 DirectX SDK which supports all the same error codes plus some new ones. Changing references to `dxerr.lib` or making a copy as `dxerr9.lib` should resolve link issues for this library. Note for Windows 8.0 SDK users, see this post for a replacement solution.
XACT	The June 2010 DirectX SDK contains the last release of the Xbox Audio Cross Platform Tool (XACT) for Windows. For games, we recommend using XAudio2 or a 3rd party middleware solution instead.
DirectDraw	While February 2010 DirectX SDK was the last to contain `ddraw.h` and `ddraw.lib`, `ddraw.h` is still available in the Windows SDK 6.0 or later. `ddraw.lib` isn't needed. See Wither DirectDraw for details.
DirectInput	The August 2007 DirectX SDK was the last version to include DirectInput7 and prior (`dinput.lib`). DirectInput8 (`dinput.h dinput8.lib`) is available in the Windows SDK 7.0 or later and is supported for both x86 and x64 native Win32 desktop applications. For gamepads, we'd recommend supporting XINPUT. XINPUT 9.1.0 headers and libraries are available in the Windows SDK 6.0 or later. See this post for additional information. For mouse and keyboard input, you should use standard Windows messages rather than DirectInput as well.
DirectSound	DirectSound8 (`dsconf.h dsound.h dsound.lib`) is available in the Windows SDK 7.0 or later and is supported for both x86 and x64 native Win32 desktop applications. For games, we recommend using XAudio2 instead. Audio engines with their own mixing engine and source rate conversion (SRC) support should use WASAPI on Windows Vista or later.
DirectMusic	The August 2007 DirectX SDK was the last version to include DirectMusic, and the DirectMusic Producer tool download is no longer hosted by Microsoft. Use of DirectMusic for games is not recommended. "Core" DirectMusic headers (`dls1.h dls2.h dmdls.h dmerror.h dmksctrl.h dmusbuff.h dmusicc.h dmusics.h`) for use by professional audio developers are available in the Windows SDK 7.1 or later, and supported for both x86 and x64 native Win32 desktop applications by Windows 7 and Windows 8.
DirectShow	The February 2005 DirectX SDK was the last to include DirectShow headers, but these are available in the Windows SDK. Media Foundation available on Windows Vista and later versions of Windows is recommended over DirectShow for video playback. Be sure to read this post for some additional considerations.
DirectPlay	The August 2007 DirectX SDK was the last version to include DirectPlay (`dpaddr.h dplay.h dplobby.h dplay8.h dplobby8.h dpnathlp.h dplayx.lib`). The DirectPlay NAT helper is not supported on Windows Vista or newer versions of Windows. For games we recommend using TCP/IP via the WinSock API for network communication. To replace the 'lobby' functionality, you can utilize any number of the many game services available today from Microsoft and other vendors.
DirectAnimation	The August 2007 DirectX SDK was the last version to include `dxtrans.h` and `dxtrans.lib`. This technology was used at one point by Internet Explorer, but this is no longer in use.
Managed DirectX 1.1	The August 2006 DirectX SDK was the last version to include the samples and documentation for Managed DirectX 1.1. See DirectX and .NET for more information.
Direct3D Retained Mode	The August 2007 DirectX SDK was the last version to include `d3drm.h d3drmdef.h d3drmobj.h d3drmwin.h`. This component is not supported on Windows Vista or newer versions of Windows (see KB 969150).
DirectPlay Voice	The August 2007 DirectX SDK was the last version to include `dvoice.h`. This component is not supported on Windows Vista or newer versions of Windows (see KB 970978).
DirectX 7/8 Visual Basic 6.0	The August 2007 DirectX SDK was the last version to include `dx7todx8.h`. This component is not supported on Windows Vista or newer versions of Windows (see KB 971028).

DirectSetup: For the REDIST package, the current web redist and standalone package will install all older and current versions of the various optional side-by-side DLLs on Windows XP Service Pack 2 or later. See Not So Direct Setup for more details and notes about older releases.

Dark GDK: For users of the Dark GDK that was promoted for Visual Studio 2008 Express, the retirement of the DirectX SDK (August 2007) poses some challenges. There are some community work-arounds for disabling the use of DirectPlay and resolving the link problems with DXERR9.LIB (such as this post), and these are preferable to continuing to use a copy of the DirectX SDK (August 2007).

DirectX SDK (August 2007): If you obtain a copy of this package from an unofficial mirror, be very careful, as installing executables that require administrator privileges from untrusted websites carries a potential risk of adding your machine to a botnet and getting infected by other malware. Check that the EXE is signed with a valid Microsoft Digital Signature before running it. These unofficial mirrors are not supported or sponsored by Microsoft.

VS 2010: Visual Studio 2010 comes with the Windows 7.0 SDK included. You can make use of the Windows 7.1 SDK with VS 2010 by using a Platform Toolset setting. You can use the Windows 8.0 or 8.1 SDK by creating a property sheet for your project.

VS 2012: Visual Studio 2012 comes with the Windows 8.0 SDK included. Mixing the Windows 8.0 SDK with the legacy DirectX SDK requires some specific build settings.

VS 2012 Update 1: Support for Windows XP via the "v110_xp" Platform Toolset makes use of the Windows SDK 7.1A, which is basically the same as 7.1.

VS 2013: Visual Studio 2013 comes with the Windows 8.1 SDK included.

↧

DirectXMath: SSE, SSE2, and ARM-NEON

September 11, 2012, 11:43 am

The DirectXMath library provides high-performance linear algebra math support for the typical kinds of operations found in a 3D graphics application. The library achieves this by making use of specialized SIMD (Single-Instruction-Multiple-Data) instruction sets to work on 4 single-precision float values at a time. The design of the library is itself heavily influenced by these instructions to provide data in a way most friendly to efficient computation.

The original xboxmath library shipped only for the Xbox 360 and the API was designed to expose the majority of the VMX128 instruction set (a custom extension of the PowerPC AltiVec SIMD instruction set). These instructions were focused on the kind of SIMD most useful for games: 4-way single-precision float vectors (similar to HLSL’s float4) with a few limited integer-based instructions plus some specialized instructions for coping with common Direct3D packed formats.

The XNAMath library (aka xboxmath version 2) kept the same API but added optimized support for Windows as well as Xbox 360. The SSE instruction set (intrinsics are in the xmmintrin.h header) provides the basic functionality, namely instructions for working with float4-style vectors (__m128). The SSE2 instruction set (intrinsics are in the emmintrin.h header) provides support for integer operations on int4/uint4 vectors (__m128i). While SSE2 also provides support for other kinds of vectors (double2, byte16, short8, etc.) as well, these were not needed by the XNAMath/xboxmath API.

Historically, making use of SIMD instructions has been complicated by the fact that on a CPU without support for them, these instructions crash the application generating an invalid instruction hardware exception. For the Xbox 360, this was not a concern because as a fixed platform every system had the same VMX128 instruction set. D3DXMath and other math libraries on Windows have made use of dynamic codepaths, typically through function pointers, and included numerous implementations that use a different mix of instructions. One of these used traditional scalar floating point math so that it could be used as the fall back on a system without support for the SIMD instructions.

In theory this is a very sensible solution, but it has a number of problems. The first is that calling conventions (also called ABI – Application Binary Interface) were not designed for efficient passing of SIMD data, so the use of a function pointer already introduced some overhead copying SIMD registers to and from the stack on either side of the call. The indirect jump itself caused some additional overhead. This cumulative cost implies that the actual computation itself must be expensive enough to make this worthwhile. Doing a simple dot-product or cross-product was typically not enough work to cover this overhead cost, while more complex operations such as a matrix-matrix multiply were a net win but still lost some efficiency. This also made it less efficient to compose these functions together in an application. Xboxmath avoided all these issues by making all the functions inline and having no dynamic code paths at all. For XNAMath, the ‘baseline’ instruction set had to have similar “universal” support to keep the ‘all inline’ model.
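To make the trade-off concrete, here is a minimal sketch of the function-pointer style of dynamic codepath next to the all-inline style. All of the names (`DotFn`, `Dot4Scalar`, `Dot4Simd`, `SelectDot4`, `Dot4Inline`) are hypothetical, and plain scalar code stands in for the SIMD variants; this is not how D3DXMath or xboxmath are actually implemented.

```cpp
// Hypothetical sketch: dynamic dispatch through a function pointer
// versus an inline helper. Scalar math stands in for SIMD code.

// Signature shared by every implementation of the operation.
using DotFn = float (*)(const float* a, const float* b);

// Fallback path: traditional scalar floating-point math.
float Dot4Scalar(const float* a, const float* b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Stand-in for an optimized path (a real one would use intrinsics).
float Dot4Simd(const float* a, const float* b)
{
    const float xy = a[0] * b[0] + a[1] * b[1];
    const float zw = a[2] * b[2] + a[3] * b[3];
    return xy + zw;
}

// Chosen once at startup from a CPU feature check. Every later call
// goes through the pointer and pays for the indirect jump.
DotFn SelectDot4(bool hasSimd)
{
    return hasSimd ? &Dot4Simd : &Dot4Scalar;
}

// All-inline style: no indirection, so the compiler can fold the
// math directly into the caller and keep values in registers.
inline float Dot4Inline(const float* a, const float* b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```

With the pointer form, a dot-product is a handful of arithmetic instructions wrapped in a call the optimizer cannot see through, which is the overhead problem described above. The inline form only works if every target CPU supports the instructions used, hence the need for a universal baseline.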

SSE and SSE2 fit these requirements rather well. All x64 capable CPUs must support both SSE and SSE2 because the x64 native code standard requires they be used for all float and double computations (the 32-bit x87 instruction set was deprecated for x64 native). In fact, this support is so ubiquitous that the Windows 8 operating system itself requires SSE and SSE2 instruction support even for 32-bit (x86) versions and won’t install on a system without them.

DirectXMath (aka XNAMath version 3) started from XNAMath’s support and was evolved to be more efficient on Windows by removing the requirement of supporting Xbox 360, as well as adding support for the Windows RT (Windows on ARM) platform. All Windows RT systems are required to support ARM-NEON which makes it an excellent baseline instruction set for the ARM platform.

As a side note, there are some older instruction sets available on many CPUs including Intel MMX™ and AMD 3DNow!®. These are not used by DirectXMath because they have been deprecated for x64 native code. These instruction sets actually alias the standard x87 floating-point registers, which themselves have been deprecated. This scheme introduced some state management and some interference with standard float/double operations, and was more challenging to optimize than the XMM register file used by SSE. By avoiding any use of intrinsics that operate on the __m64 type, the code in DirectXMath compiles for both x86 and x64 native.

Processor Support

Intel Pentium 4, AMD Athlon 64, all x64-capable CPUs, and later processors support both SSE and SSE2.

Intel Pentium 3, AMD K7, and some lesser known older x86 clones (VIA C3, Transmeta Crusoe) only support SSE.

For x64 native applications, you can assume that the SSE and SSE2 instruction sets are always supported. For Windows RT (Windows on ARM) applications, you can assume that ARM-NEON is always supported. For x86 Windows Store apps or Win32 desktop apps that require Windows 8 (i.e. built with _WIN32_WINNT=0x0602), you can assume SSE and SSE2 are always supported.

DirectXMath provides the XMVerifyCPUSupport method to validate the baseline instruction set support and can be called at application startup as a safety measure. On modern CPUs this will always succeed, but this can be a useful way to ensure unsupported legacy CPUs are detected immediately.

You can also use the IsProcessorFeaturePresent Win32 API with PF_XMMI_INSTRUCTIONS_AVAILABLE to detect SSE support, PF_XMMI64_INSTRUCTIONS_AVAILABLE to detect SSE2 support, or PF_ARM_NEON_INSTRUCTIONS_AVAILABLE to detect ARM-NEON support.

For x86/x64 apps, you can use the following code as well:

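#include <intrin.h> // __cpuid (Visual C++)

int CPUInfo[4] = {-1};
__cpuid( CPUInfo, 0 );  // leaf 0: CPUInfo[0] = highest standard leaf
bool bSSE = false;
bool bSSE2 = false;
if ( CPUInfo[0] > 0 )
{
    __cpuid( CPUInfo, 1 );  // leaf 1: feature flags, CPUInfo[3] = EDX
    bSSE = (CPUInfo[3] & 0x2000000) != 0;   // EDX bit 25
    bSSE2 = (CPUInfo[3] & 0x4000000) != 0;  // EDX bit 26
}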

Additional Topics

In this series of posts, I explore how applications using DirectXMath can take advantage of instruction sets beyond the SSE/SSE2 baseline. Using these advanced instructions requires that the application make use of dynamic codepaths, ship multiple versions of the EXE built for different instruction sets, or simply mandate a higher minimum system requirement to run. By leaving this choice to the application, the right trade-offs can be made when looking at using these additional instructions.

↧

DirectXMath: SSE3 and SSSE3

September 11, 2012, 12:01 pm

The SSE3 instruction set adds about a dozen instructions (intrinsics are in the pmmintrin.h header). The main operation these instructions provide is the ability to do “horizontal” adds and subtracts (ARM-NEON refers to these as ‘pairwise’ operations) for float4 and double2 data.

 Result = _mm_hadd_ps(V1,V2);
->
 Result[0] = V1[0] + V1[1];
 Result[1] = V1[2] + V1[3];
 Result[2] = V2[0] + V2[1];
 Result[3] = V2[2] + V2[3];

There are variants that use different signs for the two values, but otherwise they are basically the same.

The majority of the DirectXMath library is designed around avoiding the need for these operations, but they are useful for dot-product operations (VMX128 on the Xbox 360 had a specific instruction for doing dot-products across a vector, but not a general pairwise add).

The existing SSE/SSE2 dot-product for float4:

inline XMVECTOR XMVector4Dot(FXMVECTOR V1, FXMVECTOR V2)
{
    XMVECTOR vTemp2 = V2;
    XMVECTOR vTemp = _mm_mul_ps(V1,vTemp2);
    vTemp2 = _mm_shuffle_ps(vTemp2,vTemp,_MM_SHUFFLE(1,0,0,0));
    vTemp2 = _mm_add_ps(vTemp2,vTemp);
    vTemp = _mm_shuffle_ps(vTemp,vTemp2,_MM_SHUFFLE(0,3,0,0));
    vTemp = _mm_add_ps(vTemp,vTemp2);
    return XM_PERMUTE_PS(vTemp,_MM_SHUFFLE(2,2,2,2));
}

can be rewritten using SSE3 as:

inline XMVECTOR XMVector4Dot(FXMVECTOR V1, FXMVECTOR V2)
{
    XMVECTOR vTemp = _mm_mul_ps(V1,V2);
    vTemp = _mm_hadd_ps( vTemp, vTemp );
    return _mm_hadd_ps( vTemp, vTemp );
}

This version has the same number of multiply/add operations, but there are three fewer shuffles required. As we’ll see in a future installment, there are actually some better options than this in SSE 4.1.
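To see why two horizontal adds are enough, here is a small scalar model of the SSE3 version. It is only a sketch for reasoning about the data flow: `Float4`, `HAddModel`, and `Dot4Model` are hypothetical names, and the model uses ordinary float math rather than the intrinsics.

```cpp
#include <array>

using Float4 = std::array<float, 4>;

// Scalar model of _mm_hadd_ps: adds adjacent pairs from each input.
Float4 HAddModel(const Float4& v1, const Float4& v2)
{
    return { v1[0] + v1[1], v1[2] + v1[3], v2[0] + v2[1], v2[2] + v2[3] };
}

// Scalar model of the SSE3 dot-product: one multiply, two horizontal adds.
Float4 Dot4Model(const Float4& v1, const Float4& v2)
{
    Float4 t = { v1[0] * v2[0], v1[1] * v2[1], v1[2] * v2[2], v1[3] * v2[3] };
    t = HAddModel(t, t);     // { x+y, z+w, x+y, z+w }
    return HAddModel(t, t);  // x+y+z+w replicated in every lane
}
```

Because both inputs to each horizontal add are the same vector, the result ends up replicated across all four lanes without a separate splat step.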

There are also two new instructions which can be used as a special-case substitute for the XMVectorSwizzle<> template. We’ll make use of these in a future installment.

`XMVectorSwizzle<0,0,2,2>(V)`	`_mm_moveldup_ps(V)`
`XMVectorSwizzle<1,1,3,3>(V)`	`_mm_movehdup_ps(V)`

The Supplemental SSE3 (SSSE3) instruction set adds the equivalent “horizontal” adds and subtracts for various integer vectors, so they are not particularly useful for DirectXMath. These intrinsics are located in the tmmintrin.h header. There are also some other useful integer operations that make life simpler for implementing algorithms like Fast Block Compress, codecs, or other image processing on integer data which are a bit out of scope for DirectXMath.

There is one SSSE3 intrinsic of interest for DirectXMath: _mm_shuffle_epi8. The purpose of this instruction is to be able to rearrange the bytes in a vector, which makes it an excellent function for doing vector-based Big-Endian/Little-Endian swaps without having to ‘spill’ the vector to memory and reload it.

inline XMVECTOR XMVectorEndian( FXMVECTOR V )
{
    static const XMVECTORU32 idx = { 0x00010203, 0x04050607, 0x08090A0B, 0x0C0D0E0F };
    __m128i Result = _mm_shuffle_epi8( _mm_castps_si128(V), idx );
    return _mm_castsi128_ps( Result );
}

There’s not enough use for this kind of operation to make this function part of the library (Windows x86, Windows x64, and Windows RT are all Little-Endian platforms), but it can be useful for some cross-platform tools processing (Xbox 360 is Big-Endian).
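The control constants in XMVectorEndian can look backwards at first glance. The following scalar model is a sketch of how I read them: it lays the four 32-bit constants out in little-endian byte order and then applies the byte-shuffle rule. `Bytes16`, `ControlBytes`, and `ShuffleBytesModel` are hypothetical helper names, not part of DirectXMath.

```cpp
#include <array>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Lays out four 32-bit constants as they sit in little-endian memory
// (least significant byte first), which is how the register sees them.
Bytes16 ControlBytes(const std::array<uint32_t, 4>& idx)
{
    Bytes16 out{};
    for (int lane = 0; lane < 4; ++lane)
        for (int b = 0; b < 4; ++b)
            out[lane * 4 + b] = static_cast<uint8_t>(idx[lane] >> (8 * b));
    return out;
}

// Scalar model of _mm_shuffle_epi8: result byte i is source byte
// (control[i] & 0x0F), or zero when the control byte has its high bit set.
Bytes16 ShuffleBytesModel(const Bytes16& src, const Bytes16& control)
{
    Bytes16 out{};
    for (int i = 0; i < 16; ++i)
        out[i] = (control[i] & 0x80) ? uint8_t{0} : src[control[i] & 0x0F];
    return out;
}
```

For the first constant, 0x00010203, the control bytes come out as 3, 2, 1, 0, so each 32-bit element has its four bytes reversed in place, which is exactly an endian swap per element.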

Processor Support

SSE3 is supported by Intel Pentium 4 processors (“Prescott”), AMD Athlon 64 (“revision E”), AMD Phenom, and later processors. This means most, but not quite all, x64 capable CPUs should support SSE3.

Supplemental SSE3 (SSSE3) is supported by Intel Core 2 Duo, Intel Core i7/i5/i3, Intel Atom, AMD Bulldozer, and later processors.

int CPUInfo[4] = {-1};
__cpuid( CPUInfo, 0 );
bool bSSE3 = false;
bool bSSSE3 = false;
if ( CPUInfo[0] > 0 )
{
    __cpuid( CPUInfo, 1 );
    bSSE3 = (CPUInfo[2] & 0x1) != 0;
    bSSSE3 = (CPUInfo[2] & 0x200) != 0;
}

You can also use the IsProcessorFeaturePresent Win32 API with PF_SSE3_INSTRUCTIONS_AVAILABLE on Windows Vista or later to detect SSE3 support. This API does not report support for SSSE3.

Utility Code

The source code attached to this blog post is bound to the Microsoft Public License (MS-PL).

↧

DirectXMath: SSE4.1 and SSE4.2

September 11, 2012, 12:18 pm

The SSE4 instruction set consists of two parts, referred to as SSE4.1 and SSE4.2. The SSE4.1 intrinsics are located in the smmintrin.h header, and the SSE4.2 intrinsics in nmmintrin.h. The SSE4.1 instruction set is the most interesting for DirectXMath, while SSE4.2 adds some more specialized instructions for CRC checks and string handling. The key new features are a flexible dot-product instruction, float4 vector rounding, a 2-vector ‘mux’ blend, and some specialized extract/insert operations.

A number of DirectXMath functions can be replaced with a single intrinsic when using SSE4.1.

`XMVector2Dot(V1,V2)`	`_mm_dp_ps( V1, V2, 0x3f )`
`XMVector3Dot(V1,V2)`	`_mm_dp_ps( V1, V2, 0x7f )`
`XMVector4Dot(V1,V2)`	`_mm_dp_ps( V1, V2, 0xff )`
`XMVectorRound(V)`	`_mm_round_ps( V, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC )`
`XMVectorTruncate(V)`	`_mm_round_ps( V, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC )`
`XMVectorFloor(V)`	`_mm_floor_ps( V )`
`XMVectorCeiling(V)`	`_mm_ceil_ps( V )`
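The immediate values 0x3f, 0x7f, and 0xff in the dot-product rows follow from how the mask is split into two nibbles. Here is a scalar model of that behavior as I understand it; `Float4` and `DpModel` are hypothetical names, and the hardware may sum the products in a different order, so rounding can differ slightly from this sketch.

```cpp
#include <array>

using Float4 = std::array<float, 4>;

// Scalar model of _mm_dp_ps(a, b, mask):
//  - bits 4..7 of mask select which lanes are multiplied and summed
//  - bits 0..3 of mask select which result lanes receive the sum
//    (unselected result lanes are set to zero)
Float4 DpModel(const Float4& a, const Float4& b, unsigned mask)
{
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i)
        if (mask & (0x10u << i))
            sum += a[i] * b[i];

    Float4 out{};
    for (int i = 0; i < 4; ++i)
        out[i] = (mask & (1u << i)) ? sum : 0.0f;
    return out;
}
```

So 0x3f multiplies only X and Y, 0x7f adds Z, and 0xff uses all four lanes, while the low nibble of 0xf replicates the result into every lane to match the DirectXMath convention for dot-product return values.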

The bit insert/extract instructions provide some specific optimization cases for vector accessors and setters. Here are the “Y” element versions, which can be extrapolated to the “Z” and “W” element versions very easily. Note that the standard scalar SSE/SSE2 mov already provides efficient support for the “X” element.

inline void XMVectorGetYPtr(float *y, FXMVECTOR V)
{
    *((int*)y) = _mm_extract_ps( V, 1 );
}

inline uint32_t XMVectorGetIntY(FXMVECTOR V)
{
    __m128i V1 = _mm_castps_si128( V );
    return static_cast<uint32_t>( _mm_extract_epi32( V1, 1 ) );
}

inline void XMVectorGetIntYPtr(uint32_t *y, FXMVECTOR V)
{
    __m128i V1 = _mm_castps_si128( V );
    *y = static_cast<uint32_t>( _mm_extract_epi32( V1, 1 ) );
}

inline XMVECTOR XMVectorSetY(FXMVECTOR V, float y)
{
    XMVECTOR vResult = _mm_set_ss(y);
    vResult = _mm_insert_ps( V, vResult, 0x10 );
    return vResult;
}

inline XMVECTOR XMVectorSetIntY(FXMVECTOR V, uint32_t y)
{
    __m128i vResult = _mm_castps_si128( V );
    vResult =
        _mm_insert_epi32( vResult, static_cast<int>(y), 1 );
    return _mm_castsi128_ps( vResult );
}
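Extrapolating XMVectorSetY to the other elements means changing the 0x10 immediate passed to _mm_insert_ps. A small helper makes the encoding explicit; `InsertPsImm` is a hypothetical name used only for this sketch.

```cpp
// Builds the 8-bit immediate for _mm_insert_ps:
//   bits 6..7 = source lane to read from the second operand
//   bits 4..5 = destination lane to overwrite in the first operand
//   bits 0..3 = lanes to force to zero in the result
constexpr int InsertPsImm(int srcLane, int dstLane, int zeroMask = 0)
{
    return ((srcLane & 3) << 6) | ((dstLane & 3) << 4) | (zeroMask & 0xF);
}

static_assert(InsertPsImm(0, 1) == 0x10, "Y setter immediate");
static_assert(InsertPsImm(0, 2) == 0x20, "Z setter immediate");
static_assert(InsertPsImm(0, 3) == 0x30, "W setter immediate");
```

The extract intrinsics and _mm_insert_epi32 are simpler: they take the lane index directly, so the Z and W versions use 2 and 3 in place of 1.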

The _mm_blend_ps instruction can be used as a special-case substitute for the XMVectorPermute<> template. We’ll make more use of it in a future installment.

`XMVectorPermute<4,1,2,3>(V1,V2)`	`_mm_blend_ps(V1,V2,0x1)`
`XMVectorPermute<0,5,2,3>(V1,V2)`	`_mm_blend_ps(V1,V2,0x2)`
`XMVectorPermute<4,5,2,3>(V1,V2)`	`_mm_blend_ps(V1,V2,0x3)`
`XMVectorPermute<0,1,6,3>(V1,V2)`	`_mm_blend_ps(V1,V2,0x4)`
`XMVectorPermute<4,1,6,3>(V1,V2)`	`_mm_blend_ps(V1,V2,0x5)`
`XMVectorPermute<0,5,6,3>(V1,V2)`	`_mm_blend_ps(V1,V2,0x6)`
`XMVectorPermute<4,5,6,3>(V1,V2)`	`_mm_blend_ps(V1,V2,0x7)`
`XMVectorPermute<0,1,2,7>(V1,V2)`	`_mm_blend_ps(V1,V2,0x8)`
`XMVectorPermute<4,1,2,7>(V1,V2)`	`_mm_blend_ps(V1,V2,0x9)`
`XMVectorPermute<0,5,2,7>(V1,V2)`	`_mm_blend_ps(V1,V2,0xA)`
`XMVectorPermute<4,5,2,7>(V1,V2)`	`_mm_blend_ps(V1,V2,0xB)`
`XMVectorPermute<0,1,6,7>(V1,V2)`	`_mm_blend_ps(V1,V2,0xC)`
`XMVectorPermute<4,1,6,7>(V1,V2)`	`_mm_blend_ps(V1,V2,0xD)`
`XMVectorPermute<0,5,6,7>(V1,V2)`	`_mm_blend_ps(V1,V2,0xE)`
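The table follows a simple pattern: a permute can be expressed as a blend whenever every index stays in its own lane, and bit i of the immediate is set when lane i comes from V2. The helper below is a sketch of that rule; `BlendMaskForPermute` is a hypothetical name and is not part of DirectXMath.

```cpp
// Returns the _mm_blend_ps immediate for XMVectorPermute<X,Y,Z,W> when every
// index keeps its lane (i selects V1, i + 4 selects V2); returns -1 when the
// permute moves data between lanes and cannot be expressed as a blend.
constexpr int BlendMaskForPermute(unsigned x, unsigned y, unsigned z, unsigned w)
{
    return (x < 8 && y < 8 && z < 8 && w < 8
            && (x & 3) == 0 && (y & 3) == 1 && (z & 3) == 2 && (w & 3) == 3)
        ? static_cast<int>((x >> 2) | ((y >> 2) << 1) | ((z >> 2) << 2) | ((w >> 2) << 3))
        : -1;
}

static_assert(BlendMaskForPermute(4, 1, 2, 3) == 0x1, "first table row");
static_assert(BlendMaskForPermute(0, 5, 6, 7) == 0xE, "last table row");
```

The two remaining in-lane cases, <0,1,2,3> and <4,5,6,7>, would map to 0x0 and 0xF, but those simply return V1 or V2 unchanged, which is presumably why the table leaves them out.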

Processor Support

SSE4.1 is supported on Intel Core 2 (“Penryn”), Intel Core i7 (“Nehalem”), AMD Bulldozer, and later processors.

SSE 4.1 and SSE4.2 are supported on Intel Core i7 (“Nehalem”), AMD Bulldozer, and later processors.

int CPUInfo[4] = {-1};
__cpuid( CPUInfo, 0 );
bool bSSE4_1 = false;
bool bSSE4_2 = false;
if ( CPUInfo[0] > 0 )
{
    __cpuid( CPUInfo, 1 );
    bSSE4_1 = (CPUInfo[2] & 0x80000) != 0;
    bSSE4_2 = (CPUInfo[2] & 0x100000) != 0;
}

Compiler Support

Support for SSE4.1 and SSE4.2 intrinsics was added to Visual Studio 2010.

Utility Code

The source code attached to this blog post is bound to the Microsoft Public License (MS-PL).

↧

DirectXMath: AVX

September 11, 2012, 12:36 pm

The Advanced Vector Extensions (AVX) instruction set goes beyond just adding more instructions like we’ve seen in previous installments. AVX also introduces an extended register file and a new x86 instruction encoding prefix.

The AVX instruction set expands the existing XMM register file of 128-bit registers used by SSE instructions. XMM now refers to the lower 128-bits of the expanded YMM register file of 256-bit registers (analogous to the AX 16-bit vs. EAX 32-bit x86 registers). A full 256-bit YMM register can contain a float8 (__m256), a double4 (__m256d), or a mixture of various integer types including int/uint8 (__m256i). A new set of intrinsics beginning with the prefix _mm256 (such as _mm256_mul_ps) operate on these YMM registers, while existing _mm intrinsics (such as _mm_mul_ps) operate on the lower 128-bits of the same registers. The AVX intrinsics and types are in the immintrin.h header.

Because of the extended register file, the OS must be updated to save the full 256-bit registers rather than just the lower 128-bits when doing context switches. If the OS does not implement this, none of the AVX instructions (even those that operate on __m128 values) are valid and will generate an invalid instruction hardware exception if executed. This “OSXSAVE” feature is implemented in Windows 7 Service Pack 1 / Windows Server 2008 R2 Service Pack 1 and later versions of Windows.

Another aspect of AVX is the use of a new “VEX” instruction prefix. This can be applied to existing SSE instructions as well as the new AVX instructions, with the mnemonic adding a ‘v’ letter prefix. The key change is that the VEX prefix encodes “non-destructive destination” instructions. The x86 instruction set uses a “destructive destination” model where one of the inputs is also the destination.

mulps xmm0, xmm1
; xmm0 = xmm0 * xmm1

Becomes with the VEX prefix

vmulps xmm2, xmm0, xmm1
; xmm2 = xmm0 * xmm1

This new VEX prefix encoding can help reduce register scheduling pressure by eliminating the need for making copies of registers before operating on them to preserve their current contents.

For DirectXMath, making direct use of YMM registers would require additional types to be introduced to the library, or perhaps can be leveraged in some specific ‘stream’ scenarios. As such, the immediate applicability of the AVX instruction for DirectXMath is using intrinsics that operate on __m128 data (i.e. XMVECTOR).

There are two simple substitutions that AVX opens up. The first is a replacement for XMVectorReplicatePtr. The DirectXMath library generally avoids this operation elsewhere, so this is the only place this intrinsic can be applied.

inline XMVECTOR XMVectorReplicatePtr( const float *pValue )
{
    return _mm_broadcast_ss( pValue );
}

The second is an alternative intrinsic for doing a ‘shuffle’ of a single vector.

 _mm_shuffle_ps( V, V, imm )
 ->
 _mm_permute_ps( V, imm )

This pattern is used in a lot of places in the library, but the most common use is in XMVectorSplat*, XMVectorPermute<>, and XMVectorSwizzle<>.

There is also a new _mm_permutevar_ps intrinsic for generalized permutes using a control vector rather than immediate literals. For DirectXMath, the template forms of XMVectorPermute<> and XMVectorSwizzle<> are already much more efficient than the function form as they can be compiled down to one or two shuffle operations (_mm_shuffle_ps requires the control indices be literal values), but there are times when the function form is more convenient to use. For SSE/SSE2, these functions have to ‘spill’ the vector to memory, rearrange them with scalar memory swaps, and then reload the vector. For AVX, we can now implement these function forms more efficiently (or at least avoid the need to spill to memory).

inline XMVECTOR XMVectorSwizzle( FXMVECTOR V,
    uint32_t E0, uint32_t E1, uint32_t E2, uint32_t E3 )
{
    unsigned int elem[4] = { E0, E1, E2, E3 };
    __m128i vControl =
        _mm_loadu_si128(
            reinterpret_cast<const __m128i *>(&elem[0]) );
    return _mm_permutevar_ps( V, vControl );
}

inline XMVECTOR XMVectorPermute( FXMVECTOR V1, FXMVECTOR V2,
    uint32_t PermuteX, uint32_t PermuteY,
    uint32_t PermuteZ, uint32_t PermuteW )
{
    static const XMVECTORU32 three = { 3, 3, 3, 3 };
    __declspec(align(16)) unsigned int elem[4] =
        { PermuteX, PermuteY, PermuteZ, PermuteW };
    __m128i vControl = _mm_load_si128(
        reinterpret_cast<const __m128i *>(&elem[0]) );
    // Lanes with an index above 3 come from V2
    __m128i vSelect = _mm_cmpgt_epi32( vControl, three );
    // Reduce every index to a lane number in the range 0..3
    vControl = _mm_castps_si128(
        _mm_and_ps( _mm_castsi128_ps( vControl ), three ) );
    __m128 shuffled1 = _mm_permutevar_ps( V1, vControl );
    __m128 shuffled2 = _mm_permutevar_ps( V2, vControl );
    __m128 masked1 = _mm_andnot_ps( _mm_castsi128_ps( vSelect ),
        shuffled1 );
    __m128 masked2 = _mm_and_ps( _mm_castsi128_ps( vSelect ),
        shuffled2 );
    return _mm_or_ps( masked1, masked2 );
}
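The two-vector permute above packs several steps into a few intrinsics, so here is a scalar model of the same logic for checking results by hand. It is only a sketch: `Float4` and `PermuteModel` are hypothetical names, and indices are assumed to be in the range 0 to 7.

```cpp
#include <array>
#include <cstdint>

using Float4 = std::array<float, 4>;

// Scalar model of the AVX-based two-vector permute:
// indices 0..3 pick a lane from v1, indices 4..7 pick a lane from v2.
Float4 PermuteModel(const Float4& v1, const Float4& v2,
                    const std::array<uint32_t, 4>& idx)
{
    Float4 out{};
    for (int i = 0; i < 4; ++i)
    {
        const bool fromV2 = idx[i] > 3;    // _mm_cmpgt_epi32( vControl, three )
        const uint32_t lane = idx[i] & 3;  // _mm_and_ps( vControl, three )
        const float s1 = v1[lane];         // _mm_permutevar_ps( V1, vControl )
        const float s2 = v2[lane];         // _mm_permutevar_ps( V2, vControl )
        out[i] = fromV2 ? s2 : s1;         // andnot / and / or select
    }
    return out;
}
```

Note that both inputs are shuffled by the same lane numbers and the choice between them is made afterwards, which is why the intrinsic version needs two permutes and a mask-based select rather than a single instruction.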

Visual C++ and AVX

With the introduction of the VEX prefix, it is possible to generate all SSE/SSE2 code using it. With Visual Studio 2010 SP1/Visual Studio 2012, there is a new /arch:AVX switch which does exactly this. It causes all explicit _mm intrinsics to use the VEX prefix, and makes all compiler-generated SSE/SSE2 instructions use VEX as well. This impacts all x64 native floating-point math operations. For x86 it is similar to specifying /arch:SSE2, making the compiler prefer the use of SSE/SSE2 to x87 for floating-point math, with the additional use of the VEX prefix.

Because the resulting EXE makes extensive use of AVX instructions, the resulting program can only be run on a system with an AVX capable CPU and running an “OSXSAVE” enabled version of Windows.

Processor Support

AVX is supported by Intel “Sandy Bridge”, AMD Bulldozer, and later processors.

In addition to the hardware supporting the new instruction set, the OS must support saving the new YMM register file or the AVX instructions will remain invalid. This support is included in Windows 7 Service Pack 1, Windows Server 2008 R2 Service Pack 1, Windows 8, and Windows Server 2012. This support is indicated by the OSXSAVE bit in CPUID being set along with the AVX support bit.

int CPUInfo[4] = {-1};
__cpuid( CPUInfo, 0 );
bool bAVX = false;
if ( CPUInfo[0] > 0 )
{
    __cpuid( CPUInfo, 1 );
    bool bOSXSAVE = (CPUInfo[2] & 0x8000000) != 0;
    bAVX = bOSXSAVE && (CPUInfo[2] & 0x10000000) != 0;
}
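The CPUID test above covers the two bits described in this post. Intel's documented detection sequence adds one more step that is not part of the snippet: once OSXSAVE is set, read XCR0 with the XGETBV instruction and confirm the OS has enabled both XMM and YMM state (bits 1 and 2). The sketch below keeps that decision as a pure function. `AvxUsable` is a hypothetical name, and obtaining `xcr0` through `_xgetbv(0)` is an assumption about your compiler (Visual C++ exposes that intrinsic through immintrin.h).

```cpp
#include <cstdint>

// Hypothetical helper: decides AVX usability from values already read.
//   ecx  = CPUID leaf 1 ECX (CPUInfo[2] after __cpuid( CPUInfo, 1 ))
//   xcr0 = extended control register 0, for example from _xgetbv(0);
//          only meaningful (and only safe to read) when OSXSAVE is set
bool AvxUsable(uint32_t ecx, uint64_t xcr0)
{
    const bool osxsave  = (ecx & 0x08000000u) != 0;  // bit 27: OS uses XSAVE/XRSTOR
    const bool avx      = (ecx & 0x10000000u) != 0;  // bit 28: CPU implements AVX
    const bool ymmSaved = (xcr0 & 0x6u) == 0x6u;     // bits 1 and 2: XMM and YMM state
    return osxsave && avx && ymmSaved;
}
```

On the Windows versions listed above, a CPU that reports both CPUID bits should also pass the XCR0 check, so this is a belt-and-braces measure rather than a correction to the simpler test.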

Compiler Support

Support for AVX intrinsics was added to Visual Studio 2010 via Service Pack 1. The /arch:AVX switch is supported by VS 2010 SP1, although IDE support wasn't added until VS 2012.

Utility Code

The source code attached to this blog post is bound to the Microsoft Public License (MS-PL).

↧

DirectXMath: F16C and FMA

September 11, 2012, 12:50 pm

In this last installment of the series, we cover a few additional instructions that extend the AVX instruction set. These instructions make use of the VEX prefix and require that the OS implement “OSXSAVE”. Without this support, these instructions are all invalid and will generate an invalid instruction hardware exception.

Half-precision Floating-point Conversion

The F16C instruction set (also called CVT16 by AMD) provides support for doing half-precision<-> single-precision floating-point conversions. These intrinsics are in the immintrin.h header.

inline float XMConvertHalfToFloat( HALF Value )
{
    __m128i V1 = _mm_cvtsi32_si128( static_cast<uint32_t>(Value) );
    __m128 V2 = _mm_cvtph_ps( V1 );
    return _mm_cvtss_f32( V2 );
}

inline HALF XMConvertFloatToHalf( float Value )
{
    __m128 V1 = _mm_set_ss( Value );
    __m128i V2 = _mm_cvtps_ph( V1, 0 );
    return static_cast<HALF>( _mm_cvtsi128_si32(V2) );
}

This instruction actually converts 4 HALF <-> float values at a time, so this can be used to improve the performance of both XMConvertHalfToFloatStream and XMConvertFloatToHalfStream.
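For systems without F16C, or for checking results, a portable scalar conversion is handy. The following is a minimal reference sketch of the half-to-float direction based on the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits). `HalfToFloatRef` is a hypothetical name and this is not the DirectXMath implementation.

```cpp
#include <cstdint>
#include <cstring>

// Portable scalar reference for the half -> float direction.
float HalfToFloatRef(uint16_t h)
{
    const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exponent = (h >> 10) & 0x1Fu;
    uint32_t mantissa = h & 0x3FFu;
    uint32_t bits = 0;

    if (exponent == 0x1Fu)       // infinity or NaN
        bits = sign | 0x7F800000u | (mantissa << 13);
    else if (exponent != 0)      // normal: rebias exponent from 15 to 127
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    else if (mantissa == 0)      // signed zero
        bits = sign;
    else                         // subnormal half: renormalize for float
    {
        exponent = 113u;
        while ((mantissa & 0x400u) == 0)
        {
            mantissa <<= 1;
            --exponent;
        }
        bits = sign | (exponent << 23) | ((mantissa & 0x3FFu) << 13);
    }

    float result;
    std::memcpy(&result, &bits, sizeof(result));  // reinterpret the bit pattern
    return result;
}
```

Every half value is exactly representable as a float, so this direction needs no rounding. The float-to-half direction does need a rounding policy, which is what the second argument of _mm_cvtps_ph selects.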

Fused Multiply-Add

Computations often contain steps where two values are multiplied and then the result is accumulated with previous results. This can be done in single instruction using a ‘fused’ multiply-add operation:

V = V1 * V2 + V3

DirectXMath provides this functionality with the XMVectorMultiplyAdd function. The challenge in making use of FMA is that Intel and AMD took a while to agree on the exact details—thankfully ARM-NEON has a fused multiply-add instruction.
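One practical consequence of fusing is that the intermediate product is not rounded, so results can differ in the last bits from a separate multiply and add. Here is a small portable sketch using the standard library's `std::fma` as a stand-in for the hardware instruction; `MulAddUnfused` and `MulAddFused` are hypothetical names.

```cpp
#include <cmath>

// Unfused: the product is rounded to float before the add.
float MulAddUnfused(float a, float b, float c)
{
    volatile float product = a * b;  // volatile forces the intermediate rounding
    return product + c;
}

// Fused: product and sum are computed with a single rounding at the end.
float MulAddFused(float a, float b, float c)
{
    return std::fma(a, b, c);
}
```

As an example, take a = 1 + 2^-23, b = 1 - 2^-23, and c = -1. The exact product is 1 - 2^-46, which rounds to 1.0f, so the unfused form yields 0 while the fused form keeps the -2^-46 term. Code that compares results across FMA and non-FMA codepaths should allow for this kind of difference.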

AMD Bulldozer implements FMA4, which uses a non-destructive destination form using 4 registers. These intrinsics are located in the ammintrin.h header.

inline XMVECTOR XMVectorMultiplyAdd(FXMVECTOR V1, FXMVECTOR V2, FXMVECTOR V3)
{
    return _mm_macc_ps( V1, V2, V3 );
}

Intel “Haswell” is expected to implement FMA3, which uses a destructive destination form using only 3 registers. The intrinsics are located in the immintrin.h header.

inline XMVECTOR XMVectorMultiplyAdd(FXMVECTOR V1, FXMVECTOR V2, FXMVECTOR V3)
{
    return _mm_fmadd_ps( V1, V2, V3 );
}

AMD has announced it is planning to implement FMA3 with “Piledriver”. It is also fairly easy to use the same source code to generate both versions by just substituting one intrinsic for the other.

Processor Support

F16C/CVT16 is supported by AMD “Piledriver”, Intel “Ivy Bridge”, and later processors.

FMA4 is supported by AMD Bulldozer.

FMA3 will be supported by Intel “Haswell” and AMD “Piledriver” processors.

As extensions of the AVX instruction set, these instructions all require OSXSAVE support. This support is included in Windows 7 Service Pack 1, Windows Server 2008 R2 Service Pack 1, Windows 8, and Windows Server 2012.

 #include <intrin.h> // __cpuid

 int CPUInfo[4] = {-1};
 __cpuid( CPUInfo, 0 );
 bool bOSXSAVE = false;
 bool bAVX = false;
 bool bF16C = false;
 bool bFMA3 = false;
 bool bFMA4 = false;
 if ( CPUInfo[0] > 0 )
 {
     // Standard feature flags are reported in ECX (CPUInfo[2]) of leaf 1
     __cpuid( CPUInfo, 1 );
     bOSXSAVE = (CPUInfo[2] & 0x8000000) != 0;
     bF16C = bOSXSAVE && (CPUInfo[2] & 0x20000000) != 0;
     bAVX = bOSXSAVE && (CPUInfo[2] & 0x10000000) != 0;
     bFMA3 = bOSXSAVE && (CPUInfo[2] & 0x1000) != 0;
 }
 __cpuid( CPUInfo, 0x80000000 );
 if ( CPUInfo[0] > 0x80000000 )
 {
     // FMA4 is reported in ECX of extended leaf 0x80000001
     __cpuid( CPUInfo, 0x80000001 );
     bFMA4 = bOSXSAVE && (CPUInfo[2] & 0x10000) != 0;
 }

Compiler Support

FMA4 intrinsics were added to Visual Studio 2010 via Service Pack 1.

FMA3 and F16C/CVT16 intrinsic support requires Visual Studio 2012.

Utility Code

The source code attached to this blog post is bound to the Microsoft Public License (MS-PL).

Dual-use Coding Techniques for Games, part 1

September 17, 2012, 3:10 pm

Writing shared code for Windows Store and Win32 desktop apps

Introduction

Apps written for the Windows Store make use of the Windows Runtime (WinRT) and a restricted subset of Win32 APIs located in the core API family (indicated by WINAPI_FAMILY set to WINAPI_FAMILY_APP). Traditional Win32 desktop apps have access to a larger desktop API family (indicated by WINAPI_FAMILY set to WINAPI_FAMILY_DESKTOP_APP), but each function in that family carries its own minimum OS version requirement. Taken together, these two constraints can make it challenging to write shared code libraries and helper functions that compile for both Windows Store apps and Win32 desktop applications supporting Windows Vista, Windows 7, and Windows 8.

In general, applications should be written to target either the Windows Store or the Win32 desktop. Windows Store apps make use of a distinct UI, input, system-integration, and presentation model which is not supported for Win32 desktop applications even on the Windows 8 Desktop. Targeting the Windows RT (a.k.a. Windows on ARM) platform requires writing a Windows Store app, while targeting down-level platforms such as Windows Vista and Windows 7 requires writing Win32 desktop apps. Addressing both with the same EXE is not possible, and each will have significant platform-specific code.

The purpose of this series of posts is to talk about the overlap, and how developers creating shared libraries and game middleware can write C++ code that will successfully compile for both platforms.

Note: The majority of this article also applies to Windows phone 8 using the Windows phone SDK 8.0. Windows phone 8 development makes use of a WINAPI_FAMILY of WINAPI_FAMILY_PHONE_APP.

Compiler Toolsets and SDK Selection

To author Windows Store apps, developers must use Visual Studio 2012 which includes the Windows 8.0 SDK. This same toolset can be used to target Win32 desktop apps for Windows 8 (Desktop), Windows 7, and Windows Vista. For this article, the focus is on using this compiler toolset.

Note that with careful coding, it is possible to also support Visual Studio 2010 with the Windows 8.0 SDK for building Win32 desktop apps. In some specific cases some extra functionality is needed that is otherwise handled by Visual Studio 2012’s C++11 Standard Library, and this means restricting language feature use to VS 2010’s C++0x support and avoiding the use of C++/CX language extensions.

C++11 Language Feature	VS 2010	VS 2012
`nullptr`	✓	✓
`static_assert`	✓	✓
`override / final` *	✓	✓
Lambda expressions	✓	✓
Rvalue references	✓	✓
`decltype`	✓	✓
`auto`	✓	✓
Strongly typed enumerations		✓
Forward declared enumerations		✓
Range-based `for` loops		✓
Initializer lists	✗	✗
Variadic templates	✗	✗

* = In VS 2010, final was implemented as sealed
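To make the restriction concrete, here is a small sketch that stays within the features the table marks as supported by both compilers (`static_assert`, `nullptr`, `auto`, and lambda expressions) and avoids range-based `for` loops and initializer lists. The function name `CountPositive` is hypothetical.

```cpp
#include <algorithm>
#include <vector>

static_assert( sizeof(int) >= 2, "int is unexpectedly small" );

// Counts the positive entries; a null pointer is treated as an empty list.
inline int CountPositive( const std::vector<int>* values )
{
    if ( values == nullptr )
        return 0;

    auto isPositive = []( int v ) { return v > 0; };

    // Iterator-based algorithm instead of a range-based for loop.
    return static_cast<int>(
        std::count_if( values->begin(), values->end(), isPositive ) );
}
```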

Note: A future update to Visual C++ will include support for additional C++11 features including initializer lists, variadic templates, uniform initialization, function template default arguments, delegating constructors, explicit conversion operators and raw strings.

Use of the older standalone DirectX SDK is not recommended or supported for Windows Store apps. It includes many legacy technologies that are not supported for this platform, and thus their use complicates the goal of ‘dual-use’ coding. See the blog posts “Where is the DirectX SDK?” and “DirectX SDKs of a certain age” for more information.

C++11 Standard Library

The majority of the C++11 Standard Library is supported for both Windows Store apps and Win32 desktop apps. This provides a large breadth of functionality that is common and safe to use for ‘dual-use’ scenarios.

C++11 header	VS 2010	VS 2012
`<array>, <memory>, <random>, <regex>, <tuple>, <type_traits>, <unordered_map>, <unordered_set>`	✓	✓
`<stdint.h>, <cstdint>`	✓	✓
`unique_ptr<T>`	✓	✓
`cbegin(), cend(), crbegin(), crend()`	✓	✓
`<forward_list>`	✓	✓
`<algorithm>` and `<exception>` updates	✓	✓
`<allocators>`	✓	✓
`<codecvt>`	✓	✓
`<system_error>`	✓	✓
`emplace(), emplace_front(), emplace_back(),` etc.		✓
`<chrono>`		✓
`<ratio>`		✓
`<scoped_allocator>`		✓
`<atomic>, <condition_variable>, <future>, <mutex>, <thread>`		✓
`<initializer_list>`	✗	✗
`<cuchar>, <cfenv>, <ctgmath>, <cstdalign>, <cstdbool>`	✗	✗

The majority of Visual C++ functions in the C Runtime are available for Windows Store apps, but there are some specific headers which are not fully available.

Visual C++ header	Notes
`agents.h concrt.h`	The majority of the Concurrency Runtime (ConcRT) is available. There is, however, no support for the advanced scheduler (i.e. schedule groups, contexts)
`concrtrm.h`	The Concurrency Runtime (ConcRT) resource manager is not available to Windows Store apps.
`conio.h`	No functions in this header are available
`ctype.h, cctype`	`isleadbyte` and `_isleadbyte_l` are not available
`direct.h`	Only `_mkdir, _rmdir, _wmkdir,` and `_wrmdir` are available
`io.h`	`_pipe` is not available
`locale.h, clocale`	Obsolete locale functions are not available
`malloc.h`	`_resetstkoflw` is not available.
`mbctype.h, mbstring.h`	All multi-byte (`_ismb, _mb*`) functions are not available.
`process.h`	Most process and DLL related functions are not available. `exit` and `abort` are the only functions available for Windows Store apps.
`stdio.h, cstdio`	`_pclose, _popen, _wpopen` functions are not available.
`stdlib.h, cstdlib`	POSIX/DOS-style environment variables and related functions & types are not supported for Windows Store apps. There is also no equivalent for `_seterrormode, _beep,` or `_sleep`.
`tchar.h`	The `_MBCS` mode is not supported for Windows Store apps. You can only use `_UNICODE`.
`time.h, ctime`	System-time functions (`_getsystime, _setsystime`) are not available. Note you can use Win32 APIs for `GetSystemTime` and `GetLocalTime`, but not set the time in a Windows Store app.
`wchar.h, cwchar`	`isleadbyte, _isleadbyte_l, _wgetcwd,` and `_wgetdcwd` are not supported.
`wctype.h, cwctype`	Obsolete `is_wctype` is not supported.

Machine Architectures

Windows Store apps should compile for Windows x86 (32-bit), Windows x64 (64-bit) native, and Windows RT (ARM). Win32 desktop apps should compile for Windows x86 and Windows x64 native. Most C/C++ code should work fine across all platforms if using platform-neutral types.

Use portable types. Use size_t, ptrdiff_t, and the various <stdint.h> (<cstdint>) types (i.e. int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, intptr_t, and uintptr_t).
Group pointers in structures and classes. Most data types do not change size when moving to x64 native, but pointers become 8 bytes (known as the “LLP64” data model). The default pack setting for x64 is 16 rather than 8 to ensure structures are padded to a natural alignment including pointers. Mixing pointers with other data types in structures results in more padding than would happen if the pointers were grouped together.
Prefer C++ style casts. Use of const_cast<>, static_cast<>, and reinterpret_cast<> rather than C-style casts can help highlight potential pointer-truncation issues more easily.
Use maximum warnings (/Wall). A number of warnings that tend to highlight 64-bit portability issues, including C4302 and C4826, are off by default. As noisy warnings are identified, you can disable them individually with #pragma warning or /wd.
Use /analyze. Static code analysis will highlight a number of issues, particularly using the incorrect printf format specifications.
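The effect of grouping pointers can be seen with a pair of hypothetical structures that hold the same members in a different order. With default packing on x64, the interleaved version is typically 32 bytes and the grouped version 24 bytes; on x86 both are usually 16 bytes. The exact sizes are compiler-dependent, so the sketch only asserts the relationship.

```cpp
#include <cstddef>
#include <cstdint>

// Pointers interleaved with 32-bit fields: on x64 each pointer needs 8-byte
// alignment, so padding is typically inserted after countA and countB.
struct MixedLayout
{
    uint32_t countA;
    void*    dataA;
    uint32_t countB;
    void*    dataB;
};

// Same members with the pointers grouped first.
struct GroupedLayout
{
    void*    dataA;
    void*    dataB;
    uint32_t countA;
    uint32_t countB;
};

static_assert( sizeof(GroupedLayout) <= sizeof(MixedLayout),
               "grouping pointers should not make the structure larger" );
```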

Inline assembly is not supported for x64 native or ARM compilation, so it should be avoided generally. You can make use of intrinsics instead. Avoid using MMX™ intrinsics (i.e. those from the mmintrin.h header or that operate with the __m64 type) to ensure the same code works for both x86 and x64 native. For ARM, there is a full set of intrinsics available in armintr.h and arm_neon.h.

The ability to write standalone assembly for all machine architectures is not currently supported for Windows Store apps, and is therefore not recommended for ‘dual-use’ or Windows Store app scenarios.

When writing architecture-specific code, make use of the _M_IX86 (32-bit), _M_X64 (64-bit), and _M_ARM machine architecture defines for conditional compilation. All three are “Little Endian” platforms (Windows RT included).
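A minimal sketch of that conditional compilation follows. The function names are hypothetical; the fall-through branch exists only so the sketch also builds with compilers that do not define the Visual C++ macros.

```cpp
#include <cstdint>
#include <cstring>

// Returns a short name for the architecture this translation unit targets.
inline const char* TargetArchitectureName()
{
#if defined(_M_IX86)
    return "x86";
#elif defined(_M_X64)
    return "x64";
#elif defined(_M_ARM)
    return "ARM";
#else
    return "unknown"; // not a Visual C++ x86, x64, or ARM build
#endif
}

// Runtime byte-order check: true when the least significant byte is stored first.
inline bool IsLittleEndian()
{
    const uint32_t value = 1;
    unsigned char first = 0;
    std::memcpy( &first, &value, 1 );
    return first == 1;
}
```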

Note: The VS 2012 toolset fully supports x86, x64, and ARM. VS 2010 has no support for ARM targets.

Exception-Safe Coding

Windows Store apps make use of C++ exception handling and are compiled with /EHsc. Many Win32 desktop applications use HRESULTs and do not enable exception handling of any kind, although some do use it. Dual-use shared code can use HRESULTs or other error codes and leave the decision to use exception handling to the client code. (See DirectXTex for an example of this approach.) Alternatively, dual-use shared code can throw either C++ standard exceptions or Windows Store app Platform exceptions through specific compiler techniques. (See DirectXTK for an example of this approach.)
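The first approach might look like the sketch below: the shared routine reports failure only through its return code, and a client that prefers exceptions wraps the call. `ComputeAverage` and `ThrowIfFailed` are hypothetical names used for illustration (this is not the DirectXTex or DirectXTK code), and the non-Windows branch defines stand-ins for `HRESULT` so the sketch builds without the Windows headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins so the sketch builds without the Windows headers (assumption).
typedef int32_t HRESULT;
const HRESULT S_OK = 0;
const HRESULT E_INVALIDARG = static_cast<HRESULT>( 0x80070057u );
inline bool FAILED( HRESULT hr ) { return hr < 0; }
inline bool SUCCEEDED( HRESULT hr ) { return hr >= 0; }
#endif

// Dual-use library routine: never throws, reports errors via HRESULT.
inline HRESULT ComputeAverage( const float* values, size_t count, float* result )
{
    if ( !values || !result || count == 0 )
        return E_INVALIDARG;

    float sum = 0.f;
    for ( size_t i = 0; i < count; ++i )
        sum += values[i];

    *result = sum / static_cast<float>( count );
    return S_OK;
}

// Optional client-side helper for code that prefers exceptions.
inline void ThrowIfFailed( HRESULT hr )
{
    if ( FAILED( hr ) )
        throw std::runtime_error( "HRESULT failure" );
}
```

The library itself stays agnostic: a client built without exception handling simply checks the return value and never calls the helper.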

Since dual-use code can be used in the context of exception handling, it is strongly recommended that you make use of ‘exception-safe’ coding practices. C++ exception handling takes advantage of the language and ensures that objects are properly destructed when leaving scope normally or when processing an exception. When using the C++11 Standard Library, those containers are already written to be ‘exception-safe’.

The main area where this impacts ‘dual-use’ shared code and C++ code in general is when allocating resources. The guidance here is to never rely on calling delete, delete [], CloseHandle, Release, etc. directly but have the destructor of a class instance handle it automatically. This technique is known as Resource Acquisition Is Initialization (RAII). This ensures that the code will behave well both in normal operation and in the cases where exception handling is used. The C++11 Standard Library provides a number of classes that make implementing this pattern fairly straight-forward.

Traditional C++	Exception-safe C++
`MyObject *obj = new MyObject;`	`std::unique_ptr<MyObject> obj(new MyObject);` -or- `std::shared_ptr<MyObject> obj( std::make_shared<MyObject>() );`
`BYTE* buffer = new BYTE[ 2048 ];`	`std::array<uint8_t, 2048> buffer;` -or- `std::unique_ptr<uint8_t[]> buffer( new uint8_t[2048] );`
`float* buffer = static_cast<float*>( _aligned_malloc( 2048, 16 ) );`	`struct aligned_deleter { void operator()(void* p) { _aligned_free(p); } }; std::unique_ptr<float, aligned_deleter> buffer( static_cast<float*>( _aligned_malloc(2048,16) ) );`
`HANDLE h = CreateFile(…); if ( h == INVALID_HANDLE_VALUE ) // error`	`struct handle_closer { void operator()(HANDLE h) { assert(h != INVALID_HANDLE_VALUE); if (h) CloseHandle(h); } }; inline HANDLE safe_handle( HANDLE h ) { return (h==INVALID_HANDLE_VALUE) ? 0:h; } std::unique_ptr<void, handle_closer> hFile( safe_handle( CreateFile(…) ) ); if ( !hFile ) // error`
`CRITICAL_SECTION cs; InitializeCriticalSection(&cs); EnterCriticalSection(&cs); … LeaveCriticalSection(&cs);`	`std::mutex m; { std::lock_guard<std::mutex> lock(m); /* lock on m held until end of scope */ }`
`ID3D11InputLayout* inputLayout = NULL; device->CreateInputLayout( …, &inputLayout ); SAFE_RELEASE(inputLayout);`	`#include <wrl.h> Microsoft::WRL::ComPtr<ID3D11InputLayout> inputLayout; device->CreateInputLayout(…, &inputLayout );` -or- `device->CreateInputLayout(…, inputLayout.ReleaseAndGetAddressOf() );`
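The rows above can be exercised with a small portable sketch. `Resource`, `g_liveResources`, and `DoWork` are hypothetical names; the counter exists only to make the cleanup observable. Both the normal path and the exception path leave the function without any explicit `delete`.

```cpp
#include <memory>
#include <stdexcept>

static int g_liveResources = 0; // hypothetical bookkeeping, for illustration only

struct Resource
{
    Resource()  { ++g_liveResources; }
    ~Resource() { --g_liveResources; }
};

inline void DoWork( bool fail )
{
    std::unique_ptr<Resource> res( new Resource );   // released by ~unique_ptr
    std::unique_ptr<int[]> scratch( new int[16] );   // delete [] handled automatically
    scratch[0] = 42;

    if ( fail )
        throw std::runtime_error( "something went wrong" );

    // No explicit cleanup on either path.
}
```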

Note: When building with VS 2012 or VS 2010 with the Windows 8.0 SDK for both Win32 desktop applications and Windows Store apps you can use Windows Runtime Library’s ComPtr. This is similar to ATL’s CComPtr.

Note: When passing these objects to other functions, you can pass raw pointers and use .get() on the memory control object on each call, or pass the smart pointer object. When using smart pointer objects as parameters, pass them by constant reference, similar to other STL containers, in order to avoid additional temporary copies and to avoid excessive reference count increment and decrement cycles.
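The difference between these parameter styles shows up in the reference count. In this sketch (all names hypothetical), the raw-pointer and const-reference forms leave the count untouched, while passing by value copies the smart pointer and bumps the count for the duration of the call.

```cpp
#include <memory>

struct Texture { int width; int height; };

// Raw pointer: the callee uses the object but takes no part in ownership.
inline int PixelCount( const Texture* tex )
{
    return tex ? tex->width * tex->height : 0;
}

// Const reference: no copy is made, so the reference count does not change.
inline long UseCountSeenByReference( const std::shared_ptr<Texture>& tex )
{
    return tex.use_count();
}

// By value: the copy increments the count on entry and decrements it on exit.
inline long UseCountSeenByCopy( std::shared_ptr<Texture> tex )
{
    return tex.use_count();
}
```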

The use of this ‘exception-safe’ pattern has the added benefit of ensuring you do not need to make use of explicit try / catch blocks in your code to handle resource cleanup. This contributes to keeping ‘dual-use’ code agnostic to the use of Exception Handling while still being ‘exception-safe’ when it is used.

(continued in part 2)

Dual-use Coding Techniques for Games, part 2

September 17, 2012, 5:25 pm

Writing shared code for Windows Store and Win32 desktop apps

(continued from part 1)

Win32 APIs

The majority of the “core” API family are new Windows Runtime (WinRT) style APIs which are not available for down-level Win32 desktop applications. Therefore the overlap is in Win32 APIs that are available to both kinds of applications. In many cases, the Windows Store apps ‘core’ API family contains a Win32 API that is very recent. Therefore, a key technique for writing dual-use code properly is learning to leverage the _WIN32_WINNT control define for Windows Headers.

For Windows 8 only support, _WIN32_WINNT is 0x0602 which is the default with the Windows 8.0 SDK and is the value you expect to use for Windows Store apps as well.
For Windows 7 and Windows 8 Win32 desktop support, _WIN32_WINNT should be 0x0601.
For Windows Vista, Windows 7, and Windows 8 Win32 desktop support, _WIN32_WINNT should be 0x0600.

Note: For Win32 APIs, you should prefer the standard _WIN32_WINNT control define for conditionally compiling code for Windows Store apps rather than trying to make use of the WINAPI_FAMILY macros directly.

For example, for Windows Store apps CreateFile2 must be used which is a Windows 8 only API. For down-level support, you will want to use CreateFile.

 #if (_WIN32_WINNT >= 0x0602)

     ScopedHandle hFile( safe_handle(
         CreateFile2( szFile, GENERIC_READ, FILE_SHARE_READ,
                      OPEN_EXISTING, nullptr ) ) );

 #else

     ScopedHandle hFile( safe_handle(
         CreateFile( szFile, GENERIC_READ, FILE_SHARE_READ, nullptr,
                     OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr ) ) );

 #endif

     if ( !hFile )
     {
         return HRESULT_FROM_WIN32( GetLastError() );
     }

Note: There's one more detail in this particular case. For Windows Store apps, the typical default dwShareMode of 0 is likely to cause problems when you are trying to open a file for read-only access because ‘exclusive’ mode is only granted for files to which the app has write privileges. If using a dwDesiredAccess of GENERIC_READ, you want to use dwShareMode of FILE_SHARE_READ as shown here.

Dual-use shared code should always make use of Unicode, which is required for Windows Store apps (UNICODE and _UNICODE are defined) and is fully supported for down-level Win32 desktop apps. Universal use of Unicode and wchar_t is recommended, although TCHAR is still available if legacy ASCII/multi-byte support is required for some Win32 desktop scenario.

Here are a number of Win32 APIs that are available to Windows Store apps when using the latest version.

Older API	“Core” Win32 API	`_WIN32_WINNT` Required
`GetDiskFreeSpace`	`GetDiskFreeSpaceEx`
`GetFileAttributes`	`GetFileAttributesEx`
`FindFirstFile`	`FindFirstFileEx`
`LockFile`	`LockFileEx`
`MoveFile`	`MoveFileEx`
`SetFilePointer`	`SetFilePointerEx`
`UnlockFile`	`UnlockFileEx`
`WaitForMultipleObjects`	`WaitForMultipleObjectsEx`
`WaitForSingleObject`	`WaitForSingleObjectEx`
`CreateEvent`	`CreateEventEx`	0x0600 (Windows Vista)
`CreateMutex`	`CreateMutexEx`	0x0600 (Windows Vista)
`CreateSemaphore`	`CreateSemaphoreEx`	0x0600 (Windows Vista)
`GetFileSize, GetFileSizeEx`	`GetFileInformationByHandleEx`	0x0600 (Windows Vista)
`GetTickCount, timeGetTime`	`GetTickCount64`	0x0600 (Windows Vista)
`InitializeCriticalSection, InitializeCriticalSectionAndSpinCount`	`InitializeCriticalSectionEx`	0x0600 (Windows Vista)
`CopyFile, CopyFileEx`	`CopyFile2`	0x0601 (Windows 7)
`CreateFile`	`CreateFile2`	0x0602 (Windows 8)
`CreateFileMapping`	`CreateFileMappingFromApp`	0x0602 (Windows 8)
`GetOverlappedResult`	`GetOverlappedResultEx`	0x0602 (Windows 8)
`MapViewOfFile, MapViewOfFileEx`	`MapViewOfFileFromApp`	0x0602 (Windows 8)
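Taking one row of the table as an illustration, a helper can select `GetTickCount64` when the build targets Windows Vista or later and fall back to the older API otherwise. `TickCountMs` is a hypothetical name, and the `std::chrono` branch is only there so the sketch builds without the Windows headers.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
#include <chrono> // fallback so the sketch builds without the Windows headers (assumption)
#endif

// Hypothetical helper: milliseconds from a monotonic tick source.
inline uint64_t TickCountMs()
{
#if defined(_WIN32) && ( _WIN32_WINNT >= 0x0600 )
    return GetTickCount64();   // "core" API, requires Windows Vista or later
#elif defined(_WIN32)
    return GetTickCount();     // older API; the 32-bit value wraps after about 49.7 days
#else
    using namespace std::chrono;
    return static_cast<uint64_t>(
        duration_cast<milliseconds>( steady_clock::now().time_since_epoch() ).count() );
#endif
}
```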

Here are some additional notes on Win32 APIs commonly used by games.

Win32 API	Notes
`CreateThread SetThreadAffinityMask Sleep(Ex) SetPriorityClass TlsAlloc`	Windows Store apps do not support POSIX-style threading APIs. These applications must use the Windows Runtime (WinRT) ThreadPool API in `Windows.System.Threading`. For dual-use coding, you can make use of the C++11 Standard Library threading support and/or the Concurrency Runtime (ConcRT) which is supported for both Windows Store apps and Win32 desktop apps. Note: There is a sample available that emulates a small subset of the Win32 threading APIs using the WinRT threadpool which may be useful for porting existing threading codebases. See CreateThread for Windows 8.
`CreateProcess GetCommandLine GetEnvironmentStrings`	Windows Store apps do not support POSIX-style process, command-line or environment variable APIs. These applications must use the Launcher class in the `Windows.System` namespace.
`FindResource(Ex) LockResource`	Windows Store apps do not use Win32 style resource files. The recommendation is to include the required data as part of the AppX package and use standard file I/O to access them.
`GlobalMemoryStatus(Ex)`	There is no Windows Store app function available that will return physical or virtual memory information.
`HeapAlloc`	This standard memory allocation routine family is available for use by Windows Store apps. `LocalAlloc, GlobalAlloc,` and `VirtualAlloc` are not available for Windows Store apps.
`LoadLibrary(Ex)`	Windows Store apps must use `LoadPackagedLibrary` and the target DLL must be present in the application package or listed in the AppX manifest. The target DLL must therefore pass the Windows App Certification Kit (WACK) tool validation. You have to use implicit linking with system DLLs, although you can use `/DELAYLOAD` if desired.
`LoadString`	Windows Store apps do not use Win32 style resource files for localization. These applications use `ResourceLoader` in the `Windows.ApplicationModel.Resources` namespace. Dual-use shared code should avoid directly displaying strings and should leave localization to the client application.
OpenGL	OpenGL is not supported for Windows Store apps.
`QueryPerformanceCounter QueryPerformanceFrequency`	These functions are supported for Windows Store apps as well as Win32 desktop apps as the basis for high-resolution timers.
`timeBeginPeriod timeEndPeriod`	Windows Store apps cannot change the global system timer resolution as this can negatively impact power-saving modes.
WinSock	The Windows Sockets 2 API is not available for Windows Store apps and they must use the Windows Runtime (WinRT) API `Windows.Networking.Sockets` instead. TCP and UDP layer network communications are therefore not a good candidate for dual-use shared code, although an abstraction could be written with two different implementations. DirectPlay is not supported for Windows Store apps and its use is not recommended for Win32 desktop apps. Note: The Windows phone 8 platform does support WinSock.
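For the threading row above, a dual-use alternative to `CreateThread` could be sketched with the C++11 Standard Library. `ParallelSum` is a hypothetical example; it relies on `<future>`, which the earlier header table lists for VS 2012 only.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sums the lower half on a worker task while the caller sums the upper half.
inline long long ParallelSum( const std::vector<int>& values )
{
    const size_t half = values.size() / 2;
    auto first = values.begin();
    auto middle = first + static_cast<std::ptrdiff_t>( half );

    std::future<long long> lower = std::async( std::launch::async,
        [first, middle]() { return std::accumulate( first, middle, 0LL ); } );

    const long long upper = std::accumulate( middle, values.end(), 0LL );

    // get() blocks until the worker finishes and rethrows any exception it raised.
    return lower.get() + upper;
}
```

No thread handle is created or closed by hand, so the same source builds for both kinds of application.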

DirectX and Media Technologies

One of the reasons that dual-use shared code is possible for game technology is because many of the traditional DirectX Win32 APIs are available for Windows Store apps as well as down level for Win32 desktop apps.

DirectX Technology	Notes
Direct3D 11.0, DXGI 1.1, Direct2D, and DirectWrite	These technologies are available for Windows Store apps and Win32 desktop apps for Windows 8, Windows 7, and Windows Vista SP2+KB971644. Direct3D 9 and Direct3D 10.x are not supported for Windows Store apps or Windows phone 8. Note: Windows phone 8 does not support Direct2D or DirectWrite.
Direct3D 11.1, DXGI 1.2, improved Direct2D and DirectWrite	These technologies are available for Windows Store apps and Win32 desktop apps on Windows 8. Partial support for these APIs is available for Win32 desktop applications on Windows 7 Service Pack 1 via KB 2670838. Windows Store apps can rely on these technologies always being present, while Win32 desktop applications need to provide suitable fallbacks for older versions of Windows. Note: Windows phone 8 supports Direct3D 11.1 and DXGI 1.2.
D3DX	All versions of the D3DX utility library (D3DX9, D3DX10, and D3DX11) are not supported for Windows Store apps or Windows phone 8. DirectXTK and DirectXTex both support Windows Store apps as well as Win32 desktop applications on Windows 8, Windows 7, and Windows Vista. DirectXTK also supports Windows phone 8. These provide replacements for much of the functionality in D3DX for Direct3D 11. See "Direct3D 11 Textures and Block Compression" for more information. The D3DCSX Compute Shader helper utility is available for Win32 desktop applications, but not for Windows Store apps or Windows phone 8.
HLSL Compiler / D3DCompile	The HLSL compiler (`FXC.EXE`) and the D3DCompile (`D3DCompiler_*.DLL`) APIs are supported for both Windows Store apps and Win32 desktop apps. Note that for Windows Store apps and Windows phone 8, the HLSL compiler / D3DCompile APIs are only supported for development and not for deployment. See “HLSL, FXC, and D3DCompile” for more information.
Effects 11 (FX11)	The Effects 11 technology relies on runtime shader reflection via `D3DReflect` in the D3DCompiler. Due to the limitations above, this makes Effects 11 library unsuited to use in Windows Store apps, Windows phone 8, or dual-use code.
DirectXMath	The DirectXMath library is supported for Windows Store and Win32 desktop apps. This library provides SSE/SSE2 optimizations for Windows x86 and x64 native, as well as ARM-NEON optimizations for Windows RT and Windows phone 8. See “Introducing DirectXMath” for more information.
XAudio2	Windows 8 includes XAudio 2.8 which is supported for Windows Store apps and Windows phone 8. See “XAudio2 and Windows 8” for more details. Windows Core Audio (WASAPI) is available for Windows Store apps for use by low-level audio libraries. DirectSound is not supported for Windows Store apps.
XINPUT	Windows 8 includes XInput 1.4 which is supported for Windows Store apps and Win32 desktop apps. Windows Vista, Windows 7, and Windows 8 also include XInput 9.1.0 which is supported for Win32 desktop applications. See “XInput and Windows 8” for more details. DirectInput is not supported for Windows Store apps. Note: XINPUT is not supported by Windows phone 8.
Windows Imaging Component (WIC)	This technology is available for Windows Store apps and Win32 desktop apps for Windows 8, Windows 7, and Windows Vista. Be sure to set the `_WIN32_WINNT` definition properly to ensure use of the correct version of the WIC factory. See "Windows Imaging Component and Windows 8" for more information. Note: Windows phone 8 does not support WIC.
Windows Media Foundation (MF)	The Windows Media Foundation is available for Windows Store apps and Win32 desktop applications on Windows 8, Windows 7, and Windows Vista. Be sure to read this post for some additional guidance. DirectShow is not supported for Windows Store apps. Note: Windows phone 8 has partial support for the Media Foundation API, specifically IMFMediaEngine.

(continued in part 3)

Dual-use Coding Techniques for Games, part 3

September 17, 2012, 5:37 pm

Writing shared code for Windows Store and Win32 desktop apps

(continued from part 1 and part 2)

Windows Runtime (WinRT) APIs

There are a number of areas of the system where you must use WinRT APIs to access the required functionality for Windows Store apps, and there is no Win32 equivalent included in the Windows Store apps API family. This code is not a good candidate for dual-use scenarios, but there are times when it makes sense to house both the WinRT and Win32 implementation in the same module. Generally you should prefer to have the client application handle this platform-specific functionality and provide the information needed to your dual-use shared code as parameters, but this is not always convenient or practical.

This scenario is one where you have to make use of the WINAPI_FAMILY macro to determine if you are building for Windows Store apps or Win32 desktop apps. There are a number of ways to do this, and most of them are subtly incorrect. The system headers make extensive use of the WINAPI_FAMILY_PARTITION macro available in <winapifamily.h>, however, as the exact make-up of partitions is subject to change with the introduction of new families over time, the recommendation is to only take dependencies on the FAMILY macros.

 #if !defined(WINAPI_FAMILY) || (WINAPI_FAMILY == WINAPI_FAMILY_DESKTOP_APP)
 // This code is for Win32 desktop apps
 #else
 // This code is for Windows Store or Windows phone apps
 #endif

In some cases when writing code for Windows phone apps, you may need to handle a difference from Windows Store apps. In this case, you can use this guard.

 #if defined(WINAPI_FAMILY) && (WINAPI_FAMILY == WINAPI_FAMILY_PHONE_APP)
 // This code is for Windows phone apps only
 #endif

Alternatively, you may want to support contexts without the Windows 8.0 SDK such as using the Windows 7.1 SDK for Windows XP support. In this case requiring an explicit build configuration (such as /DBUILDING_FOR_DESKTOP in the project settings for Win32 desktop usage) is the easiest and cleanest solution.

 #ifdef BUILDING_FOR_DESKTOP
 // This code is for Win32 desktop apps
 #else
 // This code is for Windows Store apps
 #endif

The __cplusplus_winrt control define can be a useful way to isolate C++/CX language extensions as well, and this define is active whenever building with /ZW (the default for Windows Store app projects). It is, however, possible to be building for a Windows Store app without the /ZW switch (such as in a static library), so the #ifndef __cplusplus_winrt case can still be for a Windows Store app. Thus it is not a substitute for the logic above with the WINAPI_FAMILY control define for determining when building for the Windows Store vs. Win32 desktop.

 #if !defined(WINAPI_FAMILY) || (WINAPI_FAMILY == WINAPI_FAMILY_DESKTOP_APP)
 // This code is for Win32 desktop apps
 #elif !defined (__cplusplus_winrt)
 #error This module requires WinRT C++/CX language support (/ZW)
 #else
 // This code is for WinRT Windows Store apps
 #endif

For example, here is some utility code for getting access to the proper path for a temporary file folder. This code builds for Windows Store apps using /ZW and for Win32 desktop apps.

 void GetTemporaryDirectory( wchar_t* dir, size_t maxsize )
 {
     if ( !maxsize ) return;
     *dir = 0;
 #if !defined(WINAPI_FAMILY) || (WINAPI_FAMILY == WINAPI_FAMILY_DESKTOP_APP)
     DWORD nChars = GetTempPath( static_cast<DWORD>( maxsize ), dir );
     if ( nChars > 0 && nChars < maxsize )
         dir[nChars-1] = L'\0'; // Trim trailing '\'
     else
         *dir = 0; // Call failed, or the buffer was too small
 #else // Windows Store WinRT app
     auto folder = Windows::Storage::ApplicationData::Current
         ->TemporaryFolder;
     wcscpy_s( dir, maxsize, folder->Path->Data() );
 #endif // WINAPI_FAMILY
 }

Here is a similar function that gets the application local data folder for the Windows Store app using /ZW or for Win32 desktop apps using the Windows Vista IKnownFolder API.

 void GetApplicationDataDirectory( wchar_t* dir, size_t maxsize )
 {
     if ( !maxsize ) return;
     *dir = 0;
 #if !defined(WINAPI_FAMILY) || (WINAPI_FAMILY == WINAPI_FAMILY_DESKTOP_APP)
     ScopedObject<IKnownFolderManager> mgr;
     HRESULT hr = CoCreateInstance( CLSID_KnownFolderManager,
         nullptr, CLSCTX_INPROC_SERVER, IID_IKnownFolderManager, (LPVOID*)&mgr );
     if ( SUCCEEDED(hr) )
     {
         ScopedObject<IKnownFolder> folder;
         hr = mgr->GetFolder( FOLDERID_LocalAppData, &folder );
         if ( SUCCEEDED(hr) )
         {
             LPWSTR szPath = 0;
             hr = folder->GetPath( 0, &szPath );
             if ( SUCCEEDED(hr) )
             {
                 wcscpy_s( dir, maxsize, szPath );
                 wcscat_s( dir, maxsize, L"\\MyUniqueApplicationName" );
                 CreateDirectory( dir, nullptr );
                 CoTaskMemFree( szPath );
             }
         }
     }
 #else // Windows Store WinRT app
     auto folder = Windows::Storage::ApplicationData::Current
         ->LocalFolder;
     wcscpy_s( dir, maxsize, folder->Path->Data() );
 #endif
 }

Note: This code assumes that CoInitialize(Ex) was already called by the client application.

Remember that Windows Store apps have a very restricted set of security privileges and access to the hard disk is tightly controlled. You should assume you only have read access to the files included in the AppX package for the Windows Store or the install location in “Program Files” for Win32 desktop apps. You should assume you only have read/write access to a temporary folder, the application local data folder, and the application roaming data folder; other folders are accessible only in special permission scenarios (and may be read-only rather than read-write).

`Windows::Storage::ApplicationData` property	`IKnownFolder` equivalent	`SHGetFolderPath` equivalent
LocalFolder	FOLDERID_LocalAppData + unique folder name	CSIDL_LOCAL_APPDATA + unique folder name
RoamingFolder	FOLDERID_RoamingAppData + unique folder name	CSIDL_APPDATA + unique folder name

Note: There’s no direct equivalent to LocalSettings or RoamingSettings for Win32 desktop apps.

See File access and permissions in Windows Store apps

Resources

Scott Meyers. More Effective C++. Addison-Wesley, 1996. Print.

C++: New Standard Concurrency Features in Visual C++ 11, MSDN Magazine (March 2012)

X64 Primer: Everything You Need To Know To Start Programming 64-Bit Windows Systems, MSDN Magazine (May 2006)
