Workshop 3

Device Management

In this workshop, you are to determine the device(s) installed on a host computer and certain device properties. 

Learning Outcomes

Upon successful completion of this workshop, you will have demonstrated the abilities

  • to verify a CUDA installation and determine the properties of the installed devices
  • to write code to interrogate a CUDA-enabled device
  • to interpret errors generated by calls to the runtime API
  • to summarize what you have learned in completing this workshop


This workshop consists of three parts:

  1. verifying a CUDA installation and device configuration
  2. querying the installed device(s) and selecting one with a user-specified compute capability
  3. capturing any errors generated by the runtime API



To verify a CUDA installation, first open a command prompt window and change the current directory to the installation's bin sub-directory:

 cd %CUDA_PATH%\bin 

The environment variable CUDA_PATH should contain the absolute path to the root directory of the default installation.

Once in this directory, enter the following command, which invokes the compiler driver and displays its version:

 nvcc -V

Your results should look something like:

 nvcc: NVIDIA (R) Cuda compiler driver
 Copyright (c) 2005-2016 NVIDIA Corporation
 Built on Sat_Sep__3_19:05:48_CDT_2016
 Cuda compilation tools, release 8.0, V8.0.44
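
If you want to record the toolkit release in a script or a test log, the release number can be read from the last line of this output. The following is a minimal sketch in standard C++. The helper name parseNvccRelease is hypothetical, and the sketch assumes that the line contains the text "release <major>.<minor>" as in the sample output above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of CUDA): extracts the toolkit release,
// e.g. 8 and 0, from a line of `nvcc -V` output such as
// "Cuda compilation tools, release 8.0, V8.0.44".
// Returns true on success; leaves major and minor untouched otherwise.
bool parseNvccRelease(const std::string& line, int& major, int& minor) {
    const std::string key = "release ";
    std::string::size_type pos = line.find(key);
    if (pos == std::string::npos) return false;

    int maj = 0, mnr = 0;
    // Read the two integers that follow the key, separated by a dot.
    if (std::sscanf(line.c_str() + pos + key.size(), "%d.%d", &maj, &mnr) != 2)
        return false;

    major = maj;
    minor = mnr;
    return true;
}
```

Lines that do not contain the key, such as the copyright line, are reported as a failure rather than as release 0.0.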

To access the CUDA samples' executables, change the current directory to the samples' bin\win64\Release sub-directory:

 cd %NVCUDASAMPLES_ROOT%\bin\win64\Release

If the executables' sub-directory is empty, you can build them using one of the Visual Studio solutions in the samples' root directory.

Using File Explorer, navigate to the samples' root directory, which you should find hidden at C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0.  Open the Visual Studio solution file compatible with your VS installation.  Build the solution in Release mode for x64.  This may take about 20 minutes.

Once the executables for your CUDA samples have been built successfully, you can run them directly from the command line.  To obtain the properties of the installed device(s), enter:

 deviceQuery

The output should look something like:

 deviceQuery Starting...

  CUDA Device Query (Runtime API) version (CUDART static linking)

 Detected 1 CUDA Capable device(s)

 Device 0: "Quadro M1000M"
   CUDA Driver Version / Runtime Version          8.0 / 8.0
   CUDA Capability Major/Minor version number:    5.0
   Total amount of global memory:                 2048 MBytes (2147483648 bytes)
   ( 4) Multiprocessors, (128) CUDA Cores/MP:     512 CUDA Cores
   GPU Max Clock rate:                            1072 MHz (1.07 GHz)
   Memory Clock rate:                             2505 Mhz
   Memory Bus Width:                              128-bit
   L2 Cache Size:                                 2097152 bytes
   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
   Total amount of constant memory:               65536 bytes
   Total amount of shared memory per block:       49152 bytes
   Total number of registers available per block: 65536
   Warp size:                                     32
   Maximum number of threads per multiprocessor:  2048
   Maximum number of threads per block:           1024
   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
   Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
   Maximum memory pitch:                          2147483647 bytes
   Texture alignment:                             512 bytes
   Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
   Run time limit on kernels:                     Yes
   Integrated GPU sharing Host Memory:            No
   Support host page-locked memory mapping:       Yes
   Alignment requirement for Surfaces:            Yes
   Device has ECC support:                        Disabled
   CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
   Device supports Unified Addressing (UVA):      Yes
   Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
   Compute Mode:
      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

 deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = Quadro M1000M
 Result = PASS
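
Two of the figures in this sample output are derived from others on the same line: 4 multiprocessors with 128 cores each give 512 CUDA cores, and 2147483648 bytes divided by 2^20 give 2048 MBytes. The arithmetic can be sketched as follows. The helper names totalCores and toMBytes are hypothetical and are not CUDA API calls.

```cpp
// Hypothetical helpers for reading deviceQuery output; not CUDA API calls.

// Total CUDA cores = multiprocessors x cores per multiprocessor.
int totalCores(int multiprocessors, int coresPerMP) {
    return multiprocessors * coresPerMP;
}

// Bytes to MBytes, taking 1 MByte = 2^20 bytes, which matches the
// pairing of 2048 MBytes with 2147483648 bytes in the sample output.
unsigned long long toMBytes(unsigned long long bytes) {
    return bytes / (1ULL << 20);
}
```

For the Quadro M1000M above, totalCores(4, 128) gives 512 and toMBytes(2147483648ULL) gives 2048.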

To measure the data transfer rates between the host and the device, enter the following command:

 bandwidthTest

The output should look something like:

 [CUDA Bandwidth Test] - Starting...
 Running on...

  Device 0: Quadro M1000M
  Quick Mode

  Host to Device Bandwidth, 1 Device(s)
  PINNED Memory Transfers
    Transfer Size (Bytes)        Bandwidth(MB/s) 
    33554432                     6112.6

  Device to Host Bandwidth, 1 Device(s)
  PINNED Memory Transfers
    Transfer Size (Bytes)        Bandwidth(MB/s)
    33554432                     6148.5

  Device to Device Bandwidth, 1 Device(s)
  PINNED Memory Transfers
    Transfer Size (Bytes)        Bandwidth(MB/s)
    33554432                     67623.1

 Result = PASS

Device Properties

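Survey the Device Management Module of the CUDA Toolkit Documentation, specifically the API functions that report the number of installed devices (cudaGetDeviceCount), retrieve the properties of a device (cudaGetDeviceProperties), and select the device that best matches a set of properties (cudaChooseDevice).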

Complete the code listed below by adding appropriate calls to these API functions.

 // Device Query and Selection - Workshop 3
 // w3.cpp

 #include <iostream>
 #include <cstdlib>
 #include <cstring>

 // CUDA run-time header file

 int main(int argc, char** argv) {
     bool selectADevice  = argc == 3;
     bool listAllDevices = argc == 1;
     int rc = 0;

     if (selectADevice) {
         int device;
         int major = std::atoi(argv[1]); // major version - compute capability 
         int minor = std::atoi(argv[2]); // minor version - compute capability

         // choose a device close to compute capability maj.min
         // - fill the properties struct with the user-requested capability
         // - retrieve the device that is the closest match
         // - retrieve the properties of the selected device

         std::cout << "Device with compute capability " << major << '.' <<
          minor << " found (index " << device << ')' << std::endl;
     } else if (listAllDevices) {
         int noDevices;

         // retrieve the number of installed devices

         for (int device = 0; device < noDevices; ++device) {

             // retrieve the properties of the device with index device

             std::cout << "Name:                " <<
                                    << std::endl;
             std::cout << "Compute Capability:  " <<
                                    << '.' <<
                                    << std::endl; 
             std::cout << "Total Global Memory: " <<
                                    << std::endl;
         }
         if (noDevices == 0) {
             std::cout << "No Device found " << std::endl;
         }
     } else {
         std::cout << "***Incorrect number of arguments***\n";
         rc = 1;
     }

     return rc;
 }

This program executes differently depending on the number of command line arguments (2 or 0):

  • 2 arguments - selects the device closest to the user-specified compute capability
  • 0 arguments - lists all of the CUDA-enabled devices installed on the host computer
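
The tests on argc in the listing compare against 3 and 1 rather than 2 and 0 because argc also counts the program name. A minimal sketch of the same decision, using the hypothetical names Mode and selectMode, which are not part of the workshop code:

```cpp
// Hypothetical names (Mode, selectMode) used only for this sketch.
enum class Mode { ListAll, Select, Invalid };

// argc counts the program name itself, so 0 user arguments give
// argc == 1 and 2 user arguments give argc == 3.
Mode selectMode(int argc) {
    if (argc == 1) return Mode::ListAll;  // e.g. w3
    if (argc == 3) return Mode::Select;   // e.g. w3 6 0
    return Mode::Invalid;                 // any other count is an error
}
```

Any other number of arguments corresponds to the final else branch of the listing, which reports the error and sets the return code to 1.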

Visual Studio Test Runs

Start a Visual Studio project named w3.  Use Visual Studio 2015 for a CUDA 8.0 installation.

  • New Project -> Visual C++ -> Empty Project
  • name: w3 | OK
  • select Project -> Add New Item -> w3.cpp file
  • paste in the incomplete source code listed above
  • complete the code by including the CUDA runtime header file and calling the required CUDA API functions
  • select Project -> w3 Properties
  • C/C++ -> General -> Additional Include Directories -> $(CUDA_PATH)\include
  • Linker -> General -> Additional Library Directories -> $(CUDA_PATH)\lib\x64
  • Linker -> Input -> Additional Dependencies -> cudart.lib
  • select Build -> Build Solution
  • select Debug -> Start without Debugging
  • select Project -> Properties -> Debugging -> Command Arguments -> 6 0
  • select Debug -> Start without Debugging

Error Checking

The CUDA API function calls in this workshop return error codes of type cudaError_t.  The value cudaSuccess identifies successful execution. 

Code your own function named check(cudaError_t) that checks the error code received and, if the code is not cudaSuccess, inserts a user-friendly message into std::cerr before returning control to its caller.

Survey the Error Handling Module of the CUDA Toolkit Documentation, specifically the API function that returns the description string for an error code (cudaGetErrorString).

Wrap each one of the API calls in your main program in a call to this reporting function to send any possible error to the standard error stream. 
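
One possible shape for such a reporting function is sketched below. It is an illustration, not the required solution. Returning bool is a design choice of this sketch, since the workshop leaves the return type open. The #else branch contains stand-in definitions, assumed here only so that the sketch compiles on a machine without the toolkit; in your project the real definitions come from cuda_runtime.h.

```cpp
#include <iostream>

// Use the real runtime header when it is available.
#if defined(__has_include)
#  if __has_include(<cuda_runtime.h>)
#    include <cuda_runtime.h>
#    define W3_HAVE_CUDA 1
#  endif
#endif

#ifndef W3_HAVE_CUDA
// Stand-ins (assumptions of this sketch, NOT the real definitions) so
// that the code compiles without the CUDA toolkit installed.
enum cudaError_t { cudaSuccess = 0, cudaErrorInvalidValue = 1 };
inline const char* cudaGetErrorString(cudaError_t error) {
    return error == cudaSuccess ? "no error" : "stand-in error string";
}
#endif

// Reports a failed runtime API call on std::cerr.  Returns true only
// when the call succeeded, so that the caller can decide how to go on.
bool check(cudaError_t error) {
    if (error != cudaSuccess) {
        std::cerr << "***CUDA error: " << cudaGetErrorString(error)
                  << "***" << std::endl;
        return false;
    }
    return true;
}
```

With this in place, a call such as cudaGetDeviceCount(&noDevices) in the main program would be written as check(cudaGetDeviceCount(&noDevices)).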


Copy your source code and the results of your test runs to a file named w3.txt.

Your submission file should include:

  1. a listing of your completed source code
  2. results for each of your test cases

Upload your typescript to Blackboard: 

  • Log in to Blackboard
  • Select your course code
  • Select Workshop 3 under Assignments
  • Upload w3.txt
  • Under "Add Comments" describe to your instructor in detail what you have learned in completing this workshop. 
  • When ready to submit, press "Submit"

  Designed by Chris Szalwinski
Creative Commons License