Simple DirectCompute program does not work

simplyme

Junior Member
Mar 6, 2009
24
0
0
I'm trying to make a simple command-line program to compare CPU and GPU times for matrix multiplication. The CPU part of the program was done quickly enough, but I'm having serious trouble getting even the first part of the GPU section to build (Visual C++ 2010 Express):

//Setting up GPU
HRESULT hr;
ID3D11Device *g_pD3DDevice;
ID3D11DeviceContext *g_pD3DContext;
D3D_FEATURE_LEVEL *g_D3DFeatureLevel;

hr = D3D11CreateDevice( NULL, D3D_DRIVER_TYPE_HARDWARE, NULL,
                        D3D11_CREATE_DEVICE_SINGLETHREADED | D3D11_CREATE_DEVICE_DEBUG,
                        NULL, 0, D3D11_SDK_VERSION,
                        &g_pD3DDevice, g_D3DFeatureLevel, &g_pD3DContext);

This does not even build, and I'm getting three errors (no line number is given; the error list only points at "cpugpu.obj"):

Error 2 error LNK2028: unresolved token (0A000024) "extern "C" long __stdcall D3D11CreateDevice(struct IDXGIAdapter *,enum D3D_DRIVER_TYPE,struct HINSTANCE__ *,unsigned int,enum D3D_FEATURE_LEVEL const *,unsigned int,unsigned int,struct ID3D11Device * *,enum D3D_FEATURE_LEVEL *,struct ID3D11DeviceContext * *)" (?D3D11CreateDevice@@$$J240YGJPAUIDXGIAdapter@@W4D3D_DRIVER_TYPE@@PAUHINSTANCE__@@IPBW4D3D_FEATURE_LEVEL@@IIPAPAUID3D11Device@@PAW44@PAPAUID3D11DeviceContext@@@Z) referenced in function "int __clrcall main(cli::array<class System::String ^ >^)" (?main@@$$HYMHP$01AP$AAVString@System@@@Z) C:\Users\simplyme\documents\visual studio 2010\Projects\cpugpu\cpugpu\cpugpu.obj


The next error is:
Error 3 error LNK2019: unresolved external symbol "extern "C" long __stdcall D3D11CreateDevice(struct IDXGIAdapter *,enum D3D_DRIVER_TYPE,struct HINSTANCE__ *,unsigned int,enum D3D_FEATURE_LEVEL const *,unsigned int,unsigned int,struct ID3D11Device * *,enum D3D_FEATURE_LEVEL *,struct ID3D11DeviceContext * *)" (?D3D11CreateDevice@@$$J240YGJPAUIDXGIAdapter@@W4D3D_DRIVER_TYPE@@PAUHINSTANCE__@@IPBW4D3D_FEATURE_LEVEL@@IIPAPAUID3D11Device@@PAW44@PAPAUID3D11DeviceContext@@@Z) referenced in function "int __clrcall main(cli::array<class System::String ^ >^)" (?main@@$$HYMHP$01AP$AAVString@System@@@Z) C:\Users\simplyme\documents\visual studio 2010\Projects\cpugpu\cpugpu\cpugpu.obj


And the last one is:

Error 4 error LNK1120: 2 unresolved externals C:\Users\simplyme\documents\visual studio 2010\Projects\cpugpu\Debug\cpugpu.exe

I installed the DirectX SDK (June 2010) and I have added its include and library directories to the project. I have an Nvidia 8600M GT with the latest official driver from the Nvidia website, and I'm using Vista Home Basic.

I have also downloaded examples from other websites, such as http://www.codeproject.com/KB/directx/DirectX11ComputeShaders.aspx, and even these fail to build with syntax or undefined-symbol errors.

Can someone tell me what I'm missing here? Or is there somewhere I can download a working example of matrix multiplication on the GPU and study it? I'm preparing a presentation on this, and I thought a demonstration of the possible performance gain would be useful.
 

ObscureCaucasian

Diamond Member
Jul 23, 2006
3,934
0
0
Looks like you need to link a library. Microsoft tells me that the D3D11CreateDevice function is in "D3D11.lib".
 

Cogman

Lifer
Sep 19, 2000
10,277
125
106
Looks like you need to link a library. Microsoft tells me that the D3D11CreateDevice function is in "D3D11.lib".

Yep. Your linker can't find where those functions are defined. You need to point it in the right direction by adding the library to your project's linker inputs.
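In VC++ 2010 that's Project Properties -> Linker -> Input -> Additional Dependencies, where you add d3d11.lib. You can also pull it in with a pragma right in the source. Here's a rough sketch of your snippet with that added (I also made the feature level a plain variable and pass its address, since that parameter is an output; everything else is assumed to be as you posted it):

Code:
#include <stdio.h>
#include <d3d11.h>
#pragma comment(lib, "d3d11.lib")	// tell the MSVC linker to pull in D3D11.lib

//Setting up GPU
HRESULT hr;
ID3D11Device *g_pD3DDevice = NULL;
ID3D11DeviceContext *g_pD3DContext = NULL;
D3D_FEATURE_LEVEL g_D3DFeatureLevel;	// value, not pointer: the call writes the created feature level here

hr = D3D11CreateDevice( NULL,			// default adapter
                        D3D_DRIVER_TYPE_HARDWARE,
                        NULL,			// no software rasterizer module
                        D3D11_CREATE_DEVICE_SINGLETHREADED | D3D11_CREATE_DEVICE_DEBUG,
                        NULL, 0,		// default feature level list
                        D3D11_SDK_VERSION,
                        &g_pD3DDevice,
                        &g_D3DFeatureLevel,
                        &g_pD3DContext);

if( FAILED(hr) )
	printf("D3D11CreateDevice failed: 0x%08X\n", (unsigned)hr);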
 

simplyme

Junior Member
Mar 6, 2009
24
0
0
First and foremost, thank you all for the help. However, I was in a real hurry at the time to get a working example of GPGPU programming, so I switched over to CUDA and modified one of the official examples (from the Nvidia website) to perform a simple CPU vs GPU array addition (matrix multiplication was too difficult to understand and explain for my seminar, and simplifying it caused other problems).

The program simply adds two arrays of numbers and stores the result in a third array. It also reports the time taken by the CPU and by the GPU to do the same work. On my PC, with a small array (< 3000 elements) the CPU is faster than the GPU; beyond that the GPU is faster, even at large array sizes like 32000. Another thing to note is that the GPU takes about the same time no matter what the array size, which is expected considering the parallelism involved.

So here's the source code (in CUDA) for anyone who needs to do a quick CPU vs GPU comparison. It's a console program and it works on Vista for me. Remember to name the file with a ".cu" extension:

Code:
//Size of the arrays
#define M 3200

//Header files
#include <stdlib.h>
#include <stdio.h>
#include <malloc.h>

// For GPGPU programming (CUDA)
#include <cutil_inline.h>

//Kernel --> Code executed in the GPU (For each thread)
__global__ void testKernel( int *a, int *b, int *c) 
{  
  const unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;  
 
  if( x < M )
  {
	c[x] = a[x]+b[x];
  }
}

//Main Program --> Executed in the CPU
int main() 
{
	//Input and output arrays
	int *d = (int*)malloc(M*sizeof(int));
	int *e = (int*)malloc(M*sizeof(int));
	int *f = (int*)malloc(M*sizeof(int));
	int *g = (int*)malloc(M*sizeof(int));
	
	if( d == NULL || e == NULL || f == NULL || g==NULL)
	{
		printf("\nError, could not setup 1D array");
		getchar();
		return 0;
	}
	//Other variables
	long i, m;
	
	//To time the execution
	unsigned int timer = 0;
	cutCreateTimer( &timer);

	//Storing the size of the array
	m = M;
    printf("\nSimple array addition with arrays of size %d :", m);	


	//Assigning random variables to the array elements
	printf("\n\nAssigning Values to array a");	
	srand(12);
	for( i =0; i < m; i++ )
	{
		d[i] = rand() % 1000;
		e[i] = rand() % 1000;
	}
			
	printf("\nValues have been assigned. Press enter to start...");
	getchar();

	// CPU ADDITION
	printf("\n\nArray addition using CPU: ");
	cutStartTimer(timer);
	for( i = 0; i < m; i++ )
		f[i] = d[i] + e[i];				
	cutStopTimer( timer);
	printf( "Processing time: %f (ms)\n", cutGetTimerValue( timer));
	cutDeleteTimer( timer);
	printf("\n\nPress enter to start GPU addition...");
	getchar();
	
	// GPU PREPARATION
	//Select the GPU to use
	cudaSetDevice( cutGetMaxGflopsDeviceId() );
	
	//Allocate device(GPU) memory
    int *input1, *input2;
    int *output;    
        
    size_t size = M * sizeof(int);
    
    cudaMalloc( (void**) &input1, size);
    cudaMalloc( (void**) &input2, size);
    cudaMalloc( (void**) &output, size);
    
    // Copy host memory to device    
    cudaMemcpy( input1, d, size, cudaMemcpyHostToDevice);
    cudaMemcpy( input2, e, size, cudaMemcpyHostToDevice);        
    cudaMemcpy( output, g, size, cudaMemcpyHostToDevice);	// not strictly needed; the kernel overwrites output anyway
    
    // Setup execution parameters: 32 threads per block, plus one extra block
    // to cover any remainder (the bounds check in the kernel skips the padding threads)
    int temp = m/32;
    dim3  grid( temp + 1, 1, 1);
    dim3  threads( 32, 1, 1);

    	
	//GPU CALCULATION		
	cutCreateTimer( &timer);
	cutStartTimer( timer);
	testKernel<<< grid, threads>>>(input1, input2, output);		//---->Executing code on the GPU
	cutStopTimer( timer);
	printf( "Processing time: %f (ms)\n", cutGetTimerValue( timer));
	cutilCheckError( cutDeleteTimer( timer));
	printf("\nPress Enter to quit...");
	char ch = getchar();
	
	//Get the output from the GPU	
	cudaMemcpy( g, output, size, cudaMemcpyDeviceToHost);


	//Show the results of the operations on the CPU and the GPU
	//(the list is skipped if only a bare Enter was pressed at the prompt above)
	i = 0;
	while( ch != '\n' )
	{
		if( i == M )
			break;
		printf("\nSum at %ld/%ld : %4d(GPU), %4d(CPU)", i, m-1, g[i], f[i]);
		i++;
	}
	getchar();
	getchar();

	//Free device and host memory
	cudaFree(input1); cudaFree(input2); cudaFree(output);
	free(d); free(e); free(f); free(g);

	return 0;
}
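If you want to build it from the command line instead of a Visual Studio project, something along these lines should work with nvcc. The include and library paths below are just placeholders; point them at wherever your CUDA SDK keeps common\inc and the cutil library, since cutil_inline.h and the timer functions come from the SDK samples rather than from CUDA itself:

Code:
nvcc cpugpu.cu -o cpugpu.exe -I"C:\CUDA_SDK\common\inc" -L"C:\CUDA_SDK\common\lib" -lcutil32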

I have also included some files for others to use. Here's a link to the source code, an Excel sheet comparing the times, and some executables that compare array addition at different array sizes.

http://www.megaupload.com/?d=KO678ITF

Hope this comes in handy for someone.