Part A - Foundations

COM Objects and DirectX

Explain the requirements of COM technology
Introduce globally unique identifiers
Introduce the DirectX APIs


The instructional framework communicates with the hardware through the DirectX APIs.  The Component Object Model (COM) technology, which supports these APIs, exhibits binary encapsulation and backward compatibility.  APIs developed using this technology ship in the form of COM objects.  Each COM object connects to client applications at the binary level and provides access to the most recent version as well as all previous versions. 

This chapter describes the requirements that COM technology imposes on client applications, describes the conventions that COM technology uses to identify its objects, and introduces the DirectX set of APIs. 

COM Technology

A COM object differs from a C++ object.  The concept of a COM object is more abstract.  A C++ object is an instance of a C++ class and occupies a region of memory.  A COM object is software that creates and manages instances of a coclass.  A coclass is the counterpart to a C++ class.  Instances of a coclass are counterparts of C++ objects.  An instance of a COM coclass is NOT the COM object itself.  In COM technology, instance and object have different meanings.  Instances of a COM coclass hold client-application data in their own dedicated regions of memory. 

A COM object maintains its own internal state and may be implemented in any language.  Its implementation is transparent to all of its client applications.  This transparency is achieved through a set of interfaces. 

Figure: a COM object

A COM object provides each client application with a complete set of interfaces to instances of its coclass.  Each interface exposes methods that became available with the corresponding version of the COM object.  An interface does not expose any method that became available with later versions of the COM object.  In this way, COM technology ensures backward compatibility. 

A COM object loads itself into system memory when the first client application running on a system accesses that object.  A COM object removes itself from system memory when the last application that has accessed it has relinquished access.  Each COM object effectively runs on its own and is shared by all of the client applications that access it. 

A COM object uses a system of reference counting to manage its lifetime.  An internal counter keeps track of the number of interfaces currently being used by client applications.  The COM object removes itself from memory once this reference count drops to zero.  For reference counting to work properly, each client application must follow the COM object's reference counting convention.  The application does so by calling methods that the COM object provides. 

COM Functions

Each COM object has at least one global Create...() function.  This function creates an instance of the object's coclass and returns the address of an interface to that instance.  An argument in the call to this Create...() function identifies the version of the COM object to be exposed.  The client application accesses the object's methods through the returned interface. 

The COM standard requires that all COM objects share certain features.  Every COM object exposes three methods to manage its lifetime:

  • AddRef()
  • Release()
  • QueryInterface()

The COM standard also expects each client application to call

  • AddRef() whenever the application duplicates a pointer to a COM interface
  • Release() whenever the application destroys a pointer to a COM interface
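
The convention can be sketched in portable C++.  This is a minimal illustration only: Counted and liveInstances are hypothetical names that stand in for a real COM object and are not part of any Windows header.

```cpp
// Portable sketch of COM-style reference counting.
// Counted and liveInstances are hypothetical names used for illustration.
#include <cassert>

typedef unsigned long ULONG;

class Counted {
    ULONG refs;                      // outstanding interface pointers
    ~Counted() { --liveInstances; }  // private: only Release() may destroy
  public:
    static int liveInstances;        // lets us observe destruction
    Counted() : refs(1) { ++liveInstances; }  // creator holds the first reference
    ULONG AddRef() { return ++refs; }
    ULONG Release() {
        ULONG remaining = --refs;
        if (remaining == 0) delete this;      // last pointer has gone
        return remaining;
    }
};
int Counted::liveInstances = 0;

// Duplicates a pointer, then destroys both copies.
// Returns the number of instances still alive afterwards.
int duplicateAndRelease() {
    Counted* p = new Counted;   // count is 1
    Counted* q = p;             // the pointer has been duplicated ...
    q->AddRef();                // ... so the count becomes 2
    ULONG afterFirst = p->Release();   // count is 1, instance still alive
    assert(afterFirst == 1 && Counted::liveInstances == 1);
    ULONG afterSecond = q->Release();  // count is 0, instance deletes itself
    assert(afterSecond == 0);
    (void)afterFirst; (void)afterSecond;
    return Counted::liveInstances;
}
```

Calling duplicateAndRelease() leaves no instance alive because each pointer copy is matched by exactly one Release().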

The QueryInterface() method casts a pointer to an interface to a pointer to a different interface on the same instance.  An argument in the call to the QueryInterface() method identifies the new interface to be retrieved.  A separate argument receives the requested interface's address.

The COM helper functions and methods return a wide variety of success and failure codes containing detailed information regarding the level of success or failure of the function call.  The data type of these codes is either an HRESULT or a ULONG.  An HRESULT is a 32-bit signed integer.  A ULONG is a 32-bit unsigned integer.  On Windows, where a long occupies 32 bits, these two return types differ only in their range:

 typedef long HRESULT;
 typedef unsigned long ULONG;

The Windows API defines two macros for extracting the boolean value from an HRESULT data type:

 #define SUCCEEDED(hr) (long(hr) >= 0)
 #define FAILED(hr) (long(hr) < 0)

Note that a zero-valued HRESULT reports success (not failure).  Wrapping calls to COM functions in one of these macros yields the boolean values of the original return codes, which is usually all that client applications require. 
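
The following portable sketch shows how the sign of an HRESULT separates success from failure.  It models HRESULT as a 32-bit signed integer so that it compiles without the Windows headers; the three code values are the standard ones from the Windows SDK, while describe() and checkCodes() are hypothetical helpers.

```cpp
// Portable sketch: HRESULT modelled without the Windows headers.
#include <cassert>
#include <cstdint>

typedef std::int32_t HRESULT;  // long is 32 bits on Windows; int32_t keeps that width elsewhere

#define SUCCEEDED(hr) ((HRESULT)(hr) >= 0)
#define FAILED(hr)    ((HRESULT)(hr) < 0)

const HRESULT S_OK    = 0;                     // success
const HRESULT S_FALSE = 1;                     // success, with a "false" answer
const HRESULT E_FAIL  = (HRESULT)0x80004005u;  // unspecified failure: the sign bit is set

// Hypothetical helper: reduces a detailed code to a one-word description.
const char* describe(HRESULT hr) {
    return SUCCEEDED(hr) ? "succeeded" : "failed";
}

// Hypothetical helper: checks the properties discussed in the text.
void checkCodes() {
    assert(SUCCEEDED(S_OK));      // zero reports success
    assert(SUCCEEDED(S_FALSE));   // any non-negative value reports success
    assert(FAILED(E_FAIL));       // any negative value reports failure
}
```

Note that S_FALSE is a success code, which is why client code tests with SUCCEEDED() or FAILED() rather than comparing against zero.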


To demonstrate how COM technology is used, consider the example presented in the preceding chapter.  The upgraded code for the client application was:

 // application.cpp - C++ version

 #include <iostream>
 using namespace std;
 #include "iComponent.h"

 int main() {
     char s[] = "Hello", c = 'o';
     int rc;
     iString* str;

     rc = CreateString(s, "iString", (void**)&str);
     if (rc) {
         cout << "String is " << str->getstr() << endl;
         cout << "Length is " << str->length() << endl;
         iStringEx* strEx;
         rc = str->DynamicCast("iStringEx", (void**)&strEx);
         if (rc) {
             cout << "Index  is " << strEx->find(c) << endl;
             int rcc = strEx->Delete();
             if (!rcc)
                 cerr << "Error during deletion" << endl;
         }
     }
 }

The supporting component was a C++ class named String, with methods accessible through an interface named iString.

If the supporting component were a COM object instead, and if IID_STRING and IID_STRINGEX identified two interfaces available on that object, the source code for the client application would look like:

 // application.cpp - COM Object version

 #include <iostream>
 using namespace std;
 #include "iComponent.h"

 int main() {
     char s[] = "Hello", c = 'o';
     HRESULT hr;   // return code from a COM function
     iString* str; // will point to an iString interface

     hr = CreateString(s, IID_STRING, (void**)&str);
     if (SUCCEEDED(hr)) {
         cout << "String is " << str->getstr() << endl;
         cout << "Length is " << str->length() << endl;
         iStringEx* strEx;
         hr = str->QueryInterface(IID_STRINGEX, (void**)&strEx);
         if (SUCCEEDED(hr)) {
             cout << "Index  is " << strEx->find(c) << endl;
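             strEx->Release();       // one Release() for the IID_STRINGEX interface
         }
         ULONG rcc = str->Release(); // returns the remaining reference count
         if (rcc != 0)
             cerr << "Instance still referenced" << endl;
     }
 }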

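The COM-specific parts of this listing are the HRESULT and ULONG types, the SUCCEEDED() macro, the IID_STRING and IID_STRINGEX identifiers, and the QueryInterface() and Release() calls.  Note that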

  • the QueryInterface() and CreateString() functions accept the interface's identifier through a parameter
  • the QueryInterface() and CreateString() functions return the interface's address through a pointer to a pointer parameter
  • the client application needs the IID_STRINGEX interface (instead of IID_STRING) to access the find() method
  • one Release() call exists for every interface that has been successfully retrieved.  The client application does not assume any specific relationship between the interfaces that it has retrieved.

Comparing these two examples, we notice the following replacements:

  • QueryInterface() replaces DynamicCast()
  • Release() replaces Delete()
  • HRESULT or ULONG data types replace the int data types


Every COM object derives its three standard methods - AddRef(), Release(), and QueryInterface() - from the same abstract base class called IUnknown.  IUnknown declares these methods as pure virtual methods. 

All interfaces to the releases of a COM object derive from IUnknown.  The interface for a specific release of the COM object only declares those methods that have been added in that particular release.  (See the interface hierarchy in the preceding chapter.)
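
A simplified, portable sketch of such a hierarchy follows.  Unknown plays the role of IUnknown, and the integer identifiers replace real IIDs; all of the names here (Unknown, InterfaceId, String, findThroughNewerInterface) are assumptions made for illustration, not the Windows declarations.

```cpp
// Portable sketch of an interface hierarchy rooted at an IUnknown-like base.
#include <cassert>
#include <string>

enum InterfaceId { ID_UNKNOWN, ID_STRING, ID_STRINGEX };
const int OK = 0, NO_INTERFACE = -1;   // negative means failure, as with HRESULT

struct Unknown {                       // plays the role of IUnknown
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual int QueryInterface(InterfaceId id, void** out) = 0;
  protected:
    virtual ~Unknown() {}              // destruction happens only through Release()
};
struct iString : Unknown {             // first release declares length()
    virtual int length() const = 0;
};
struct iStringEx : iString {           // second release adds only find()
    virtual int find(char c) const = 0;
};

class String : public iStringEx {      // the coclass behind both interfaces
    std::string text;
    unsigned long refs;
    ~String() {}
  public:
    explicit String(const char* s) : text(s), refs(1) {}
    unsigned long AddRef() { return ++refs; }
    unsigned long Release() {
        unsigned long left = --refs;
        if (left == 0) delete this;
        return left;
    }
    int QueryInterface(InterfaceId id, void** out) {
        switch (id) {
          case ID_UNKNOWN:  *out = static_cast<Unknown*>(this);   break;
          case ID_STRING:   *out = static_cast<iString*>(this);   break;
          case ID_STRINGEX: *out = static_cast<iStringEx*>(this); break;
          default:          *out = 0; return NO_INTERFACE;
        }
        AddRef();                      // each retrieved interface owns one reference
        return OK;
    }
    int length() const { return (int)text.size(); }
    int find(char c) const {
        std::string::size_type pos = text.find(c);
        return pos == std::string::npos ? -1 : (int)pos;
    }
};

// Retrieves the newer interface from the older one, uses it, and releases both.
int findThroughNewerInterface(const char* s, char c) {
    iString* str = new String(s);      // stands in for a Create...() call
    iStringEx* strEx = 0;
    int index = -1;
    if (str->QueryInterface(ID_STRINGEX, (void**)&strEx) == OK) {
        assert(strEx != 0);
        index = strEx->find(c);        // find() is visible only through iStringEx
        strEx->Release();
    }
    str->Release();
    return index;
}
```

Because iStringEx derives from iString, a client holding the newer interface can still call length(), while a client compiled against iString alone never sees find().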

Reference Counting

A COM object knows when to delete itself through its system of reference counting.  When a client application retrieves an interface to the COM object, the object increments an internal reference counter.  When an application creates an additional pointer to the object, the COM standard expects it to direct the COM object to increment its counter.  Whenever an application pointer goes out of scope, the COM standard expects the application to direct the COM object to decrement its counter.  The COM object deletes itself once this counter has reached zero; that is, once all interfaces to it have disengaged. 

The three methods that manage reference counting on the client application side are those listed above:

  • AddRef() - informs the COM object that the application has duplicated a pointer to an interface
  • QueryInterface() - provides the application with a new interface to the COM object
  • Release() - informs the COM object that the application has disengaged a pointer to one of its interfaces

When an application terminates execution, it does not destroy any of the COM objects that it has used.  Other applications may still be using those same COM objects.  If an application neglects to notify a COM object when it discards one of that object's interfaces, the object survives until the operating system is next rebooted.  Such memory leaks are extremely difficult to locate and can grind an operating system to a halt. 


To avoid such memory leaks, it is important to apply the rules of COM technology properly within each client application.
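
One way to apply the rules consistently is to let a small wrapper class make the calls for us.  The sketch below is an assumption about how such a helper might look, not a prescribed technique: InterfacePtr and Tracked are hypothetical names, and Tracked only counts calls instead of deleting itself so that the effect is observable.

```cpp
// Sketch of a wrapper that pairs every pointer copy with AddRef() and
// every pointer destruction with Release().
#include <cassert>

// Hypothetical stand-in for a COM interface.
struct Tracked {
    unsigned long refs;
    Tracked() : refs(1) {}                        // creator holds the first reference
    unsigned long AddRef()  { return ++refs; }
    unsigned long Release() { return --refs; }    // a real object would delete itself at 0
};

// Hypothetical helper: owns one reference for as long as it is in scope.
template <typename T>
class InterfacePtr {
    T* p;
  public:
    explicit InterfacePtr(T* raw = 0) : p(raw) {} // adopts an existing reference
    InterfacePtr(const InterfacePtr& other) : p(other.p) {
        if (p) p->AddRef();                       // duplication: AddRef()
    }
    InterfacePtr& operator=(InterfacePtr other) { // copy-and-swap
        T* t = p; p = other.p; other.p = t;
        return *this;
    }
    ~InterfacePtr() { if (p) p->Release(); }      // destruction: Release()
    T* operator->() const { return p; }
};

// Returns the reference count once every wrapper has gone out of scope.
unsigned long countAfterScopes() {
    Tracked object;                               // refs == 1
    {
        InterfacePtr<Tracked> first(&object);     // adopts that reference
        {
            InterfacePtr<Tracked> second(first);  // AddRef(): refs == 2
            assert(object.refs == 2);
        }                                         // Release(): refs == 1
        assert(object.refs == 1);
    }                                             // Release(): refs == 0
    return object.refs;
}
```

With this arrangement a forgotten Release() becomes impossible for pointers held in the wrapper, since the destructor runs whenever the wrapper leaves scope.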

Versioning Through Interfaces

A COM object manages all instances of its coclass and exposes interfaces to those instances as requested.  Each interface provides access to either the complete set or some subset of the methods that the coclass supports.  Each new version of a COM object adds a new interface, which exposes the new methods along with all previously exposed methods.  That is, the interface for any particular version of a COM object provides access to all of the methods exposed by the COM object up to and including that particular release. 


Note that client applications do not have direct access to the methods of a COM object's coclass.  The object exposes those methods through a pointer to a virtual table.  The object is encapsulated at the binary level. 
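
The idea of reaching methods only through a pointer to a table of function pointers can be sketched without any compiler-generated virtual table.  The names below (StringObject, StringVtbl, clientLength, clientFind, makeString) are hypothetical and chosen only for this illustration.

```cpp
// Sketch of binary-level encapsulation through an explicit table of
// function pointers.
#include <cassert>
#include <cstring>

struct StringObject;                        // clients treat this as opaque

struct StringVtbl {                         // the table of function pointers
    int (*length)(const StringObject* self);
    int (*find)(const StringObject* self, char c);
};

struct StringObject {
    const StringVtbl* vtbl;                 // first member: pointer to the table
    const char* text;                       // internal state, never read by clients
};

// Implementation side: these functions are not visible to clients by name.
static int lengthImpl(const StringObject* s) { return (int)std::strlen(s->text); }
static int findImpl(const StringObject* s, char c) {
    const char* p = std::strchr(s->text, c);
    return p ? (int)(p - s->text) : -1;
}
static const StringVtbl table = { lengthImpl, findImpl };

StringObject makeString(const char* s) {
    StringObject o = { &table, s };
    return o;
}

// Client side: relies only on the layout of the table.
int clientLength(const StringObject& obj) {
    assert(obj.vtbl != 0);
    return obj.vtbl->length(&obj);
}
int clientFind(const StringObject& obj, char c) {
    assert(obj.vtbl != 0);
    return obj.vtbl->find(&obj, c);
}
```

The client functions never name lengthImpl() or findImpl(); they depend only on the order of the slots in the table, which is what allows the implementation to be written in any language.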


A client application should call the AddRef() method on a COM interface whenever it duplicates a pointer to that interface.  The application should call the Release() method on a COM interface just before disengaging a pointer to that interface.  The application should call the QueryInterface() method on a COM interface to obtain a different interface to the same instance. 


GUIDs

Independence of COM objects from their client applications gives rise to a need to identify COM objects in some standard way across any system.  The identifiers must be unique across all of the systems that communicate with one another.  COM technology uses the Universally Unique Identifier (UUID) standard of the Open Software Foundation's Distributed Computing Environment (DCE) for this purpose. 

The UUID standard was originally designed to distinguish entities within a distributed computing environment without central coordination but with reasonable confidence that entity identification would not be duplicated.  Microsoft, Oracle, and Novell adopted a version of the UUID standard called Globally Unique Identifiers (GUIDs, pronounced like squids). 

A GUID is a 128-bit (16-byte) value with a very low probability of duplication.  In canonical form, a GUID is written as 32 hexadecimal digits in five hyphen-separated groups of 8, 4, 4, 4, and 12 digits.  In struct form, a GUID is

 typedef struct _GUID {
     unsigned long  Data1;
     unsigned short Data2;
     unsigned short Data3;
     unsigned char  Data4[ 8 ];
 } GUID;

The Windows data type for a GUID is GUID and the data type for a pointer to a GUID is LPGUID.  (The GUID struct is defined in guiddef.h, which is included in many header files.)

The NULL value for a GUID type is defined as:

 const GUID GUID_NULL = { 0, 0, 0, { 0, 0, 0, 0, 0, 0, 0, 0 } };

Note that this value is different from a simple 0.  If we initialize a GUID to an empty value, we should initialize it to GUID_NULL (not to 0).
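
As a sketch of what comparing against the null value involves, the following portable code mirrors the layout of the GUID struct using fixed-width types.  Guid, GuidNull, sameGuid(), isNull(), and toCanonical() are hypothetical names; the example value is the well-known identifier of the IUnknown interface.

```cpp
// Portable sketch of GUID comparison and canonical formatting.
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

struct Guid {                     // same field layout as the Windows GUID struct
    std::uint32_t Data1;          // unsigned long is 32 bits on Windows
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint8_t  Data4[8];
};

const Guid GuidNull = { 0, 0, 0, { 0, 0, 0, 0, 0, 0, 0, 0 } };

// The identifier of IUnknown: 00000000-0000-0000-C000-000000000046
const Guid ExampleGuid = { 0, 0, 0, { 0xC0, 0, 0, 0, 0, 0, 0, 0x46 } };

// A GUID is a struct, so equality means comparing every field.
bool sameGuid(const Guid& a, const Guid& b) {
    return a.Data1 == b.Data1 && a.Data2 == b.Data2 && a.Data3 == b.Data3 &&
           std::memcmp(a.Data4, b.Data4, sizeof a.Data4) == 0;
}

bool isNull(const Guid& g) { return sameGuid(g, GuidNull); }

// Formats a GUID as 8-4-4-4-12 hexadecimal digits.
std::string toCanonical(const Guid& g) {
    char buf[37];                 // 36 characters plus the terminating null
    std::snprintf(buf, sizeof buf,
        "%08lX-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X",
        (unsigned long)g.Data1, (unsigned)g.Data2, (unsigned)g.Data3,
        (unsigned)g.Data4[0], (unsigned)g.Data4[1], (unsigned)g.Data4[2],
        (unsigned)g.Data4[3], (unsigned)g.Data4[4], (unsigned)g.Data4[5],
        (unsigned)g.Data4[6], (unsigned)g.Data4[7]);
    return buf;
}
```

Here isNull(ExampleGuid) is false even though its first three fields are zero, which illustrates why a GUID cannot be tested against a simple 0.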

Windows uses GUIDs to identify most of the installed hardware and software.  There are two GUID subtypes for COM objects:

  • CLSIDs identify the COM objects themselves
  • IIDs identify interfaces to the COM objects

When we install an SDK (such as Windows or DirectX), the installer registers a CLSID for each COM object and an IID for each interface available on that object.  The developer of the COM object assigns these GUIDs, so they are identical on every system.  Windows stores the GUIDs of registered COM objects in the system registry. 

Enumerating GUIDs

To find the GUID for a particular device, we use a technique called enumeration.  Enumeration involves two functions: one to set up the enumeration itself for the specified type of device and one to process every occurrence.  The setup function receives as one of its parameters the address of the processing function.  We call the latter function the callback function for the enumeration.

Consider enumerating the controllers attached to a host computer.  The call to set up the enumeration looks like

 // ...
 di->EnumDevices(
      DI8DEVCLASS_GAMECTRL,                // search for all game controllers
      (LPDIENUMDEVICESCALLBACK)myCallback, // address of the callback function
      (void*)ctrlrSet,                     // address passed to callback function
      DIEDFL_ATTACHEDONLY                  // enumerate attached devices only
 );

di points to an interface on the COM object.  The API defines the signature of the callback function.  In this example, the callback function retrieves the GUID from the struct that describes the enumerated controller:

 BOOL CALLBACK myCallback(
  const DIDEVICEINSTANCE* controller, // points to enumerated controller
  void* ctrlrSet                      // address passed in enumeration call
  ) {
      Controller* c = (Controller*)ctrlrSet;
      int i = c->current++;
      c->guid[i] = controller->guidInstance; // stores guid for controller
      return DIENUM_CONTINUE;                // continue the enumeration
 }

The first parameter receives the address of the instance of the struct that describes the enumerated controller.  The guidInstance member holds the GUID of that controller.  The second parameter receives the address passed to the enumerating function.  Here, this address points to the instance that holds the information for all enumerated controllers.  Our callback function populates that instance with the retrieved GUID. 
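
The same pattern can be sketched in portable C++ without DirectInput.  In this illustration DeviceInfo stands in for DIDEVICEINSTANCE and ControllerSet for the instance that collects the results; these names, along with enumerateDevices(), storeDevice(), and countStored(), are assumptions.  The sketch also adds a bounds check that the listing above omits.

```cpp
// Portable sketch of enumeration through a callback function.
#include <cassert>

const int MAX_CONTROLLERS = 4;

struct DeviceInfo { int id; };          // stands in for DIDEVICEINSTANCE
struct ControllerSet {                  // collects what the callback retrieves
    int current;
    int id[MAX_CONTROLLERS];
};

typedef bool (*EnumCallback)(const DeviceInfo* device, void* context);

// Setup function: calls the callback once per device until it returns false.
// Returns the number of devices visited.
int enumerateDevices(const DeviceInfo* devices, int count,
                     EnumCallback callback, void* context) {
    int visited = 0;
    while (visited < count) {
        bool keepGoing = callback(&devices[visited], context);
        ++visited;
        if (!keepGoing) break;          // the callback asked us to stop
    }
    return visited;
}

// Callback function: stores the identifier of the enumerated device.
bool storeDevice(const DeviceInfo* device, void* context) {
    ControllerSet* set = static_cast<ControllerSet*>(context);
    if (set->current == MAX_CONTROLLERS) return false;  // no room left: stop
    set->id[set->current++] = device->id;
    return true;                                        // continue the enumeration
}

// Enumerates 'attached' hypothetical devices and reports how many were stored.
int countStored(int attached) {
    DeviceInfo devices[8];
    assert(attached >= 0 && attached <= 8);
    for (int i = 0; i < attached; ++i) devices[i].id = 100 + i;
    ControllerSet set = { 0, { 0 } };
    enumerateDevices(devices, attached, storeDevice, &set);
    return set.current;
}
```

The void* parameter is what lets a single setup function serve any callback: the setup function passes the address through untouched, and only the callback knows its real type.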

Retrieving an Interface

To retrieve an interface to a COM object, we allocate memory for a pointer to that interface and call the appropriate Create...() function.  The function returns the address, which we store in the pointer.  The call to the Create...() function identifies the desired interface through the IID argument, the function name itself, or some combination of both.  For example,

    HRESULT hr; // holds success/failure code

    // retrieve an interface to the DirectInput8 COM object
    LPDIRECTINPUT8 di;       // points to the interface
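    hr = DirectInput8Create(
     GetModuleHandle(NULL),  // handle to the application instance
     DIRECTINPUT_VERSION,    // version of DirectInput requested
     IID_IDirectInput8,      // identifies the interface
     (void**)&di,            // where to store the interface address
     NULL);                  // no aggregation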

    // retrieve an interface to the Direct3D COM object
    LPDIRECT3D9 d3d; // points to the interface
    d3d =            // holds the interface address
     Direct3DCreate9(D3D_SDK_VERSION); // identifies the interface 

Note the lack of a uniform syntax for retrieving a COM interface.  The form of the Create...() function varies from COM object to COM object. 


DirectX

DirectX is a comprehensive set of low-level APIs that provide direct access to the underlying hardware.  DirectX is used for gaming and other high-performance applications on Windows platforms.  Its APIs are built on COM technology.  The X in DirectX is a placeholder for the name of a specific API (for instance, Direct3D, DirectInput). 

Microsoft has grouped the APIs under general categories that describe the nature of their contribution to multi-media programming.  DirectX currently consists of:

  • Direct Graphics
    • Direct2D - 2D graphics - new in Windows Vista/7
    • DirectWrite - Fonts - new in Windows Vista/7
    • Direct3D - 3D graphics
      • Direct3D9
      • Direct3D9Ex - new in Windows Vista
      • Direct3D10 - new in Windows Vista
      • Direct3D11 - new in Windows 7
  • Direct Audio
    • DirectSound - deprecated
    • XAUDIO2
    • X3DAUDIO
    • XACT
  • Direct Input
    • DirectInput8
    • XInput
  • Direct Setup
  • Direct Compute - for computations - new in Windows Vista/7

The highlighted APIs are those that we cover in this course.  The names starting with X refer to APIs shared with the Xbox 360 platform.  Microsoft continues to develop DirectX actively for the PC platform, but has combined much of that development work with that for the Xbox 360 platform. 

DirectX GUIDs are stored in dxguid.lib, which comes with the DirectX SDK. 


In 1995, Craig Eisler, Alex St. John, and Eric Engstrom raised concerns that the success of the forthcoming launch of Windows 95 might be muted, since game programmers had found the DOS platform friendlier.  That platform allowed direct access to video and sound hardware while the forthcoming Windows 95 platform with its protected memory model restricted such access.  Intel had developed the Display Control Interface (DCI) standard for transferring video processing from a PC's CPU to the video adapter.  The DCI driver would let an application send information directly to the video adapter if the CPU was busy.  Eisler, St. John, and Engstrom developed a set of APIs based upon this standard, which enabled direct access to the video hardware.  Microsoft called the first release its Windows Games SDK.  The name did not stick and Microsoft eventually changed it to DirectX. 

Direct Graphics was Microsoft's alternative to the graphics capabilities of its own Windows API.  The Windows API displayed images through the graphics device interface (GDI).  The GDI interacted with the device drivers to provide a device-independent way of managing graphics and writing to a video buffer in system memory, while the CPU transferred data from this buffer to video memory.  Although this technology was sufficiently fast for applications such as word processors and spreadsheets, it was much too slow for gaming and high-performance applications.  Direct Graphics bypassed both system memory and the CPU and wrote directly to the graphics hardware. 

Microsoft released DirectX 7 in 1999, DirectX 8 in 2000, DirectX 9 in 2002, DirectX 10 in 2006, and DirectX 11 in 2009.  DirectX 10 requires the Windows Display Driver Model, which became available with Windows Vista.  DirectX 11 requires either Windows Vista or Windows 7. 


Direct3D focuses on three-dimensional graphics.  Direct3D achieves device-independence through a Hardware Abstraction Layer (HAL).  The HAL hides hardware differences from the operating system kernel, avoiding any need to change the kernel to run on different hardware.  That is, we do not need to change the operating system kernel when we change the graphics card. 

The HAL is a software layer between the physical hardware and the software that drives that hardware.  The HAL interacts directly with the graphics hardware and is typically implemented by the manufacturer of the hardware. 

Figure: hardware abstraction layer

Direct3D provides client applications with interfaces that access the HAL directly.  As a result, applications bypass calls to the operating system altogether.  That is, Direct3D provides not only direct access to the graphics hardware, but also access to any specialized features that that particular hardware offers through its HAL. 

If an application requests a Direct3D feature that has not been implemented in the installed hardware, the HAL does not report it as a capability and Direct3D emulates that particular feature in a reference device or reference rasterizer.  Software emulation supports every Direct3D feature available, is accurate but extremely slow, can be used for debugging, and is only available with the DirectX SDK installed.  In short, Direct3D provides a consistent interface to its client applications for working directly with any installed hardware. 
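
The fallback decision described above can be sketched as a comparison of capability bits.  This is a simplified illustration only: DeviceType, the CAP_ flags, chooseDevice(), and exampleChoice() are hypothetical names, and Direct3D reports capabilities through its own structures rather than a single bit mask.

```cpp
// Sketch of choosing between hardware and software emulation.
#include <cassert>

enum DeviceType { HARDWARE_DEVICE, REFERENCE_DEVICE };

// Hypothetical capability flags, one bit per feature.
const unsigned CAP_TEXTURING = 1u << 0;
const unsigned CAP_FOG       = 1u << 1;
const unsigned CAP_STENCIL   = 1u << 2;

// Selects the hardware only if it reports every capability required.
DeviceType chooseDevice(unsigned required, unsigned reported) {
    unsigned missing = required & ~reported;   // bits required but not reported
    return missing == 0 ? HARDWARE_DEVICE : REFERENCE_DEVICE;
}

// Example: hardware that reports texturing and fog but no stencil support.
DeviceType exampleChoice(unsigned required) {
    const unsigned reported = CAP_TEXTURING | CAP_FOG;
    assert((reported & CAP_STENCIL) == 0);
    return chooseDevice(required, reported);
}
```

Requesting only fog keeps the work on the hardware, while requesting fog together with stencil support falls back to the slower software emulation.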



  Designed by Chris Szalwinski