Playing with OpenCL: Gaussian Blurring

Introducing the use of OpenCL with a tutorial on how to setup the software development environment for it depending on the vendor of your graphics card. Also presenting a full-fledged example of its use with the implementation of a parallel processing algorithm for applying a gaussian blur filter on an image.

Gaussian Blur

EDIT(2 years and something later):
The code in this tutorial is ugly. Checked it 2 years and something after writting it and can definitely say it’s bad. I don’t have enough time to “fix” it but at least I changed the github repo, to provide a simple Makefile and removed nonsensical dependency to an immature version of a library of mine.

Start of original post:
I haven’t written a blog post in ages, since I was pretty busy with my university class. Lately I have been playing around with OpenCL and completed a small demo that can perform Gaussian Blurring on a bitmap image either with the CPU or the GPU. In this post I will try to explain how to do it step by step so that people with no OpenCL knowledge may find a good starting point to get comfortable with OpenCL. This post is accompanied by source code located at Github.

First of all what is OpenCL and how do I get it? Well, OpenCL is an API interface to your graphics card that allows for GPGPU(General Purpose GPU) computing. It is an open standard that is made and maintained by the same guys that work on OpenGL, the Khronos Group. The first version of this tutorial will showcase how to work in Windows with OpenCL but in the future a Linux and maybe even Mac OS X version may be added. (Theoretically this code will work in Linux too but it has not yet been tested. Subject to change.)

Getting OpenCL

As I mentioned above OpenCL is an API for your graphics card so it stands to reason that the drivers for it and any SDK would be available from your GPU’s vendor. So in order to get the OpenCL SDK you have to visit your graphic’s card vendor website. Below you can see the links at the time of writing of this post:

Now that you have the SDK let’s see how to use it from inside a project. Just like any other library you have to link to the library object itself and expose the functions via header files. At the moment of this writing I will show how to do this under Windows for the nVidia OpenCL sdk and the gcc compiler with the Codeblocks IDE. For the Microsoft Visual studio the process is similar. If I can find more time in the future I will show how to accomplish this with other IDEs and other OSes too.

Linking to OpenCL

So assuming that your IDE is Codeblocks, fire it up and make a new project. In Codeblocks all linking options are in the menu “Project”->”Build Options”. First we should link to the OpenCL library itself, so navigate to the Linker Settings and add the directory of OpenCL.lib there as can be seen in the picture below. If you have a nVidia card and are on Windows the path should be similar to the one below.

C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\OpenCL\common\lib\Win32\OpenCL.lib
Codeblocks Linker Settings
Linking to OpenCL in Codeblocks

Afterwards we have to tell the compiler where the OpenCL headers are located. For the nVidia and Windows 7 case they are located in:

C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\OpenCL\common\inc

In order to make gcc understand that and while still being inside the project’s build options we navigate to search directories tab and the compiler subtab and insert the location of the headers as shown in the image below.

OpenCL Headers and Codeblocks
Showing Codeblocks the location of the OpenCL headers

Finally the last thing that we need to do is tell the compiler where the location of the library file is. That’s pretty easy since in our case as we saw above it is

C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.2\OpenCL\common\lib\Win32

To show that to gcc just navigate to the linker subtab and enter the path to the library there like shown in the picture below.

Codeblocks OpenCL lib Linking
Showing Codeblocks the location of OpenCL lib

With the OpenCL SDK and drivers downloaded and installed and the project settings setup correctly now we are finally ready to run OpenCL code.

Gaussian Blurring project

As mentioned in the beginning, all the code for this tutorial can be found in Github. Just visit the link and download/clone the repo. Just create a new project in the IDE of your choice and link against the OpenCL library as shown above and also the refu.dll or refu.so (depending on your OS) and the refu headers included in the repository. Remember that at the moment of writing this code has only been tested in Windows but should theoretically work for Linux too with minor to no changes. When I find the time I will test it on Linux too and adjust the repository so that it’s guaranteed to work for both OSes.

The code that you downloaded creates a program that accepts some arguments and according to them can blurs bitmap images with both the CPU and the GPU. These arguments are given below:

  • -h   Displays a help message
  • -i   The name of the input image for blurring
  • -g   The size of the gaussian kernel. Default is 3
  • -s   The sigma parameter of the gaussian. Default is 0.8

Bitmaps files

The tutorial’s code contains some functions to load and save .bmp bitmap images. It is code that you should use only for this project and is definitely not well written enough to be used generally. It only works with 24 bits per pixel uncompressed bitmap images. The test images included inside the repository are all compatible so I recommend you use them for the purpose of this tutorial.

The functions can be seen below

//! Initializes a bmp image object from a file
//!
//! @param bmp The bmp object to initialize
//! @param filename The name of the file from which to initialize the image
//! @return true for succesfull initialization and false otherwise
char meImageBMP_Init(ME_ImageBMP* bmp,char* filename);
 
//! @brief A bitmap saving to disk function.
//!
//! Saves the image only as 24 BPP uncompressed bitmap
//! @param bmp The bitmap image
//! @param filename The name of the file to save
int meImageBMP_Save(ME_ImageBMP* bmp,char* filename);

The program uses the first function to read the data of the bitmap, then performs gaussian blurring and in the end saves the blurred image back to the disk with the second function.

Blurring on the CPU

Performing Gaussian blurring on the CPU is a linear process. First a Gaussian kernel(also known as mask or filter) is created using the following equation.

The 2D Gaussian Function
The 2D Gaussian Function

The result is a square matrix whose greatest value is located at the center and as you gradually progress outwards the values get smaller. For example for sigma=1 and the size of the square gaussian being 5 the following kernel is generated. You may notice that the values are normalized. We perform normalization of the gaussian because if we did not then the final image may contain excessive light or dark areas.

0.002969 0.013306 0.021938 0.013306 0.002969
0.013306 0.059634 0.098320 0.059634 0.013306
0.021938 0.098320 0.162103 0.098320 0.021938
0.013306 0.059634 0.098320 0.059634 0.013306
0.002969 0.013306 0.021938 0.013306 0.002969

Performing gaussian blur is as simple as sliding this kernel along the image and convolving the pixels of the image with the kernel’s values. It is more computationally effective to convolve two 1-dimensional gaussian kernels with the image, one in the horizontal direction and one in the vertical direction but since here article we are just dealing with OpenCL this is out of the scope of the tutorial.

Below we can see the actual function that performs the gaussian blurring on the CPU.

//Blurs the given image using the CPU algorithm
char pna_blur_cpu(char* imgname,uint32_t size,float sigma)
{
    uint32_t i,x,y,imgLineSize;
    int32_t center,yOff,xOff;
    float* matrix,value;
    matrix = createGaussianKernel(size,sigma);
    //read the bitmap
    ME_ImageBMP bmp;
    if(meImageBMP_Init(&bmp,imgname)==false)
    {
        printf("Image \"%s\" could not be read as a .BMP file\n",imgname);
        return false;
    }
    //find the size of one line of the image in bytes and the center of the gaussian kernel
    imgLineSize = bmp.imgWidth*3;
    center = size/2;

At the beginning of the function the gaussian kernel is created with the arguments that the user of the program provides at execution. Then the bitmap image is read and the size of a line in bytes and the center of the gaussian kernel is found.

    //convolve all valid pixels with the gaussian kernel
    for(i = imgLineSize*(size-center)+center*3; i < bmp.imgHeight*bmp.imgWidth*3-imgLineSize*(size-center)-center*3;i++)
    {
        value = 0;
        for(y=0;y<size;y++)
        {
            yOff = imgLineSize*(y-center);
            for(x=0;x<size;x++)
            {
                xOff = 3*(x - center);
                value += matrix[y*size+x]*bmp.imgData[i+xOff+yOff];
            }
        }
        bmp.imgData[i] = value;
    }

Afterwards as we can see above, for all the pixels on which the gaussian kernel can be centered on we start a loop. In this loop the pixel’s new value is the convolution of itself and its surrounding pixels with the gaussian kernel.

    //free memory and save the image
    free(matrix);
    meImageBMP_Save(&bmp,"cpu_blur.bmp");
    return true;

Finally in the end, memory is freed and the blurred image is saved on disk.

Gaussian Blur
Gaussian blur result with size=5 and sigma=0.8

Blurring on the GPU

Blurring on the GPU is a little more complicated but is also the interesting part of the article. This is where we will utilize the OpenCL API. The beginning of the function is of similar fashion to the CPU one.

//Blurs the given image using the GPU
char pna_blur_gpu(char* imgname,uint32_t size,float sigma)
{
    uint32_t imgSize;
    float* matrix;
    cl_int ret;//the openCL error code/s
    //get the image
    ME_ImageBMP bmp;
    meImageBMP_Init(&bmp,imgname);
    imgSize = bmp.imgWidth*bmp.imgHeight*3;
    //create the gaussian kernel
    matrix = createGaussianKernel(size,sigma);
    //create the pointer that will hold the new (blurred) image data
    unsigned char* newData;
    RF_MALLOC(newData,imgSize);

The gaussian kernel is created and the bitmap image is read.

This is where things start to diverse. Let’s see how OpenCL treats your graphics card. OpenCL considers your GPU as an OpenCL-enabled device. In order to do non-graphics calculations in this device you need to write a program for it. This program is written in a language almost identical to C. It is sent to the Graphics card, compiled there and waiting to run whenever we need it to. It is also called a kernel. In the tutorial’s repository the kernel’s code is located inside the “kernel.cl” file, whose contents we will soon examine. For now let’s just see how to use it from the side of our machine.

    // Read in the kernel code into a c string
    FILE* f;
    char* kernelSource;
    size_t kernelSrcSize;
    if( (f = fopen("kernel.cl", "r")) == NULL)
    {
        fprintf(stderr, "Failed to load OpenCL kernel code.\n");
        return false;
    }
    RF_MALLOC(kernelSource,MAX_SOURCE_SIZE)
    kernelSrcSize = fread( kernelSource, 1, MAX_SOURCE_SIZE, f);
    fclose(f);

The code above just opens the file that holds the kernel code and reads it into a buffer, the kernelSource.

Afterwards we are starting to call OpenCL related code as we can see below.

    //Get platform and device information
    cl_platform_id platformID;//will hold the ID of the openCL available platform
    cl_uint platformsN;//will hold the number of openCL available platforms on the machine
    cl_device_id deviceID;//will hold the ID of the openCL device
    cl_uint devicesN; //will hold the number of OpenCL devices in the system
    if(clGetPlatformIDs(1, &platformID, &platformsN) != CL_SUCCESS)
    {
        printf("Could not get the OpenCL Platform IDs\n");
        return false;
    }
    if(clGetDeviceIDs(platformID, CL_DEVICE_TYPE_DEFAULT, 1,&deviceID, &devicesN) != CL_SUCCESS)
    {
        printf("Could not get the system's OpenCL device\n");
        return false;
    }

At first we call the clGetPlatformIDs() function to get the ID of the OpenCL platform and then the clGetDeviceIDs() function to get the ID of the OpenCL device if any is available in the machine. Even though the arguments should be self-explanatory feel free to visit the documentation of the API functions to learn more about each one of them.

// Create an OpenCL context
    cl_context context = clCreateContext( NULL, 1, &deviceID, NULL, NULL, &ret);
    if(ret != CL_SUCCESS)
    {
        printf("Could not create a valid OpenCL context\n");
        return false;
    }
    // Create a command queue
    cl_command_queue cmdQueue = clCreateCommandQueue(context, deviceID, 0, &ret);
    if(ret != CL_SUCCESS)
    {
        printf("Could not create an OpenCL Command Queue\n");
        return false;
    }

Following we have to create an OpenCL context, just like we would have to with OpenGL and by using the clCreateContext() function. As you can observe the context is created for the device whose ID we had obtained before. Finally a command queue is created using the clCreateCommandQueue() function, inside which OpenCL functions shall be entered and executed.

The bitmap image and the gaussian kernels are basically buffers of bytes allocated with malloc() on the side of our machine. In order to send them to the GPU we have to create buffers in the Graphics memory. They are called (just like in OpenGL) buffer objects.

 /// Create memory buffers on the device for the two images
    cl_mem gpuImg = clCreateBuffer(context,CL_MEM_READ_ONLY,imgSize,NULL,&ret);
    if(ret != CL_SUCCESS)
    {
        printf("Unable to create the GPU image buffer object\n");
        return false;
    }
    cl_mem gpuGaussian = clCreateBuffer(context,CL_MEM_READ_ONLY,size*size*sizeof(float),NULL,&ret);
    if(ret != CL_SUCCESS)
    {
        printf("Unable to create the GPU image buffer object\n");
        return false;
    }
    cl_mem gpuNewImg = clCreateBuffer(context,CL_MEM_WRITE_ONLY,imgSize,NULL,&ret);
    if(ret != CL_SUCCESS)
    {
        printf("Unable to create the GPU image buffer object\n");
        return false;
    }

The buffer objects are created using the clCreateBuffer() function. The image data and the gaussian kernel buffers are created with the CL_MEM_READ_ONLY flag since the GPU kernel program will only read from these buffers and the blurred image to be returned from the GPU is created with the CL_MEM_WRITE_ONLY flag since it will only be populated by the GPU kernel program.

After the buffers have been created in the graphics memory we enqueue the data we want to send them to the command queue that we created before.

    //Copy the image data and the gaussian kernel to the memory buffer
    if(clEnqueueWriteBuffer(cmdQueue, gpuImg, CL_TRUE, 0,imgSize,bmp.imgData, 0, NULL, NULL) != CL_SUCCESS)
    {
        printf("Error during sending the image data to the OpenCL buffer\n");
        return false;
    }
    if(clEnqueueWriteBuffer(cmdQueue, gpuGaussian, CL_TRUE, 0,size*size*sizeof(float),matrix, 0, NULL, NULL) != CL_SUCCESS)
    {
        printf("Error during sending the gaussian kernel to the OpenCL buffer\n");
        return false;
    }

This is accomplished using the clEnqueueWriteBuffer() function as we can observe from the above code.

At this point we should talk about the kernel, the code that runs on the GPU. As we said in our example code this is contained inside “kernel.cl”. Let’s start taking a look at its code below.

//The Gaussian blur function that runs on the gpu
__kernel void gaussian_blur(__global const unsigned char *image, __global const float* G, const int W,const int H,const int size, __global unsigned char* newImg) 
{ 
	unsigned int x,y,imgLineSize;
	float value;
	int i,xOff,yOff,center;
    //Get the index of the current element being processed
    i = get_global_id(0);

As you can immediately notice the kernel code is indeed C, with a few distinct differences. The function is preceded by the __kernel keyword. In an OpenCL program only one function can be so preceded and that function acts as the main function of the program. This is what runs in each GPU core. As far as the function’s arguments are concerned, image is the bitmap image buffer, G is the gaussian kernel, W and H are the image’s width and height respectively, size is the size of the gaussian kernel and newImg is the buffer where the blurred image will be returned. Arguments preceded by the __global keyword refer to memory objects allocated from the global memory pool.A final noticeable part is the call to get_global_id() which basically returns the index of the work-item that is currently being processed in this particular call.

Further below in the GPU kernel program the code becomes quite similar to the CPU one.

	//Calculate some needed variables
	imgLineSize = W*3;
	center = size/2;
	//pass the pixel through the kernel if it can be centered inside it
	if(i >= imgLineSize*(size-center)+center*3 &&
	   i < W*H*3-imgLineSize*(size-center)-center*3)
	{
		value=0;
	  for(y=0;y<size;y++)
          {
            yOff = imgLineSize*(y-center);
            for(x=0;x<size;x++)
            {
                xOff = 3*(x-center);
                value += G[y*size+x]*image[i+xOff+yOff];
            }
          }
          newImg[i] = value;
	}
	else//if it's in the edge keep the same value
	{
		newImg[i] = image[i];
	}

The main difference is that there is no outer for() loop since the index is known from get_global_id() and that for the edge pixels the original value has to be preserved.

Returning back to the sourcecode of our example that will be running on the machine we will atempt to build the “kernel.cl” program from the source that we read from the file into the kernelSource buffer in the beginning of the program.

    //Create a program object and associate it with the kernel's source code.
    cl_program program = clCreateProgramWithSource(context, 1,(const char **)&kernelSource, (const size_t *)&kernelSrcSize, &ret);
    free(kernelSource);
    if(ret != CL_SUCCESS)
    {
        printf("Error in creating an OpenCL program object\n");
        return false;
    }
    //Build the created OpenCL program
    if((ret = clBuildProgram(program, 1, &deviceID, NULL, NULL, NULL))!= CL_SUCCESS)
    {
        printf("Failed to build the OpenCL program\n");
        //create the log string and show it to the user. Then quit
        char* buildLog;
        RF_MALLOC(buildLog,MAX_LOG_SIZE);
        if(clGetProgramBuildInfo(program,deviceID,CL_PROGRAM_BUILD_LOG,MAX_LOG_SIZE,buildLog,NULL) != CL_SUCCESS)
        {
            printf("Could not get any Build info from OpenCL\n");
            free(buildLog);
            return false;
        }
        printf("**BUILD LOG**\n%s",buildLog);
        free(buildLog);
        return false;
    }

The program creation from source happens with the call of the clCreateProgramWithSource() function. Right after that we are are attempting to build the newly created program with the clBuildProgram() function.

If there is a problem at building the program, then by calling the clGetProgramBuildInfo() function with the CL_PROGRAM_BUILD_LOG argument for the program that failed to build, we can obtain a build log which will contain any possible build errors/warning which we subsequently can print and show the user what the problem appears to be.

If the program is built succesfully we have to create the kernel of the program and set its arguments.

// Create the OpenCL kernel. This is basically one function of the program declared with the __kernel qualifier
    cl_kernel kernel = clCreateKernel(program, "gaussian_blur", &ret);
    if(ret != CL_SUCCESS)
    {
        printf("Failed to create the OpenCL Kernel from the built program\n");
        return false;
    }
    ///Set the arguments of the kernel
    if(clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&gpuImg) != CL_SUCCESS)
    {
        printf("Could not set the kernel's \"gpuImg\" argument\n");
        return false;
    }
    if(clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&gpuGaussian) != CL_SUCCESS)
    {
        printf("Could not set the kernel's \"gpuGaussian\" argument\n");
        return false;
    }
    if(clSetKernelArg(kernel, 2, sizeof(int), (void *)&bmp.imgWidth) != CL_SUCCESS)
    {
        printf("Could not set the kernel's \"imageWidth\" argument\n");
        return false;
    }
    if(clSetKernelArg(kernel, 3, sizeof(int), (void *)&bmp.imgHeight) != CL_SUCCESS)
    {
        printf("Could not set the kernel's \"imgHeight\" argument\n");
        return false;
    }
    if(clSetKernelArg(kernel,4,sizeof(int),(void*)&size) != CL_SUCCESS)
    {
        printf("Could not set the kernel's \"gaussian size\" argument\n");
        return false;
    }
    if(clSetKernelArg(kernel, 5, sizeof(cl_mem), (void *)&gpuNewImg) != CL_SUCCESS)
    {
        printf("Could not set the kernel's \"gpuNewImg\" argument\n");
        return false;
    }

That is accomplished as we can see above by calling the clCreateKernel() function on the succesfully built program. The second argument is the name of the kernel function, the function of the program sent to the GPU that is preceded by __kernel.

After creating the kernel we also have to set its arguments, one by one. This is accomplished by calling clSetKernelArg() for each argument of the kernel. The arguments of this function are easy to deduce, especially if you visit the link to its documentation. Something that requires attention here is that the buffers (image data, gaussian kernel and returned image) are not set to point at the machine’s buffer but at the memory buffer objects that have been created on the graphics memory side for them.

Finally we come to the point where we actually call on the GPU to act on the compiled kernel program.

    ///enqueue the kernel into the OpenCL device for execution
    size_t globalWorkItemSize = imgSize;//the total size of 1 dimension of the work items. Basically the whole image buffer size
    size_t workGroupSize = 64; //The size of one work group
    ret = clEnqueueNDRangeKernel(cmdQueue, kernel, 1, NULL, &globalWorkItemSize, &workGroupSize,0, NULL, NULL);

This is accomplished by enquing the kernel into the OpenCL device with a call to clEnqueueNDRangeKernel(). This function also chooses the number of work items for this operation, which in our case is the size of the image(imgSize) and the number of items in a workgroup. For a more detailed explanation of the arguments visit the function’s documentation link.

After the kernel is executed we need to read the results of the blurring.

///Read the memory buffer of the new image on the device to the new Data local variable
    ret = clEnqueueReadBuffer(cmdQueue, gpuNewImg, CL_TRUE, 0,imgSize, newData, 0, NULL, NULL);

This is accomplished by a call to the function clEnqueueReadBuffer() with arguments similar to that of clEnqueueWriteBuffer() but this time we are writing to the newData buffer allocated at the beginning of the function.

Finally at the end, all OpenCL related structures and buffers have to be cleaned up and the resulting blurred image saved to the disk.

///Clean up everything
    free(matrix);
    clFlush(cmdQueue);
    clFinish(cmdQueue);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseMemObject(gpuImg);
    clReleaseMemObject(gpuGaussian);
    clReleaseMemObject(gpuNewImg);
    clReleaseCommandQueue(cmdQueue);
    clReleaseContext(context);
    ///save the new image and return success
    bmp.imgData = newData;
    meImageBMP_Save(&bmp,"gpu_blur.bmp");
    return true;

Analyzing the results

Well, since we went into all the trouble of making this small OpenCL demo why don’t we measure its performance and compare the CPU and GPU on different sample images? This is what we will see in this section. The images used are all included inside the repository. These tests are ran on a Windows 7 machine equipped with a nVidia Geforce 320M graphics card. The gaussian kernel used on all of them is the same. A square gaussian kernel with a size of 5 and a sigma value of 0.8.

(edit: 2 years later, the timer code is removed from the repository. Feel free to replace with your own timing functions”)

In order to perform these tests we measure the performance of the cpu and the gpu blurring functions using timers.

 //start a timer to time the CPU performance
    RF_Timer timer;
    rfTimer_Init(&timer,RF_TIMER_MICROSECONDS);
    //perform CPU blurring
    if(pna_blur_cpu(imgname,gaussianSize,gaussianSigma)==false)//time it
        return -2;
    //query the timer for the CPU
    units  = rfTimer_Query(&timer,RF_TIMER_MICROSECONDS);
    printf("CPU Gaussian blurring took %f microseconds\n",units);
    //reset the timer for the GPU
    rfTimer_Query(&timer,RF_TIMER_MICROSECONDS);
    //perform GPU blurring and then read the timer
    if(pna_blur_gpu(imgname,gaussianSize,gaussianSigma)==false)
        return -3;
    units  = rfTimer_Query(&timer,RF_TIMER_MICROSECONDS);//time it
    printf("GPU Gaussian blurring took %f microseconds\n",units);

As can be seen above a new RF_Timer is initialized and starts counting right before the blurring functions with a call to the rfTimer_Init() function. After each blurring call finishes we query the timer to learn how much time elapsed in microseconds with calls to rfTimer_Query().

Below we can see the results of running this experiment for all 4 sample images.

CPU vs GPU
Comparing CPU and GPU performance on blurring

What is immediately noticeable is that as the size of the image increases so does the benefit of using OpenCL and the GPU to parallelize the blurring process. On the contrary as the image gets smaller and smaller the benefits are not so clear. And we can even see for a 0.1 MB sized bitmap that it’s actually faster to use Gaussian blurring on the CPU rather than on the GPU.

Why do smaller image sizes mean less parallelization benefits? Well there are many reasons but the main one of them is the communication overhead that’s added. When using the GPU to perform parallel operations we lose lots of precious time while compiling the kernel program, creating the buffers in the graphics memory, sending the contents of the buffer to the graphics memory and reading back the results.

Total performance vs communication overhead
Comparing GPU total time with communication overhead

In the graph above we can see the GPU total time taken to blur an image (blue bar) compared to how much of it is used for communicating to/from the GPU(red bar). What we can see is that naturally as the size of the data gets bigger the communication overhead takes up a bigger part of the computation because of the size of the created buffers.

A surprising result is that even though for the smaller image sizes the communication overhead gets reduced to a very small percentage of the total GPU calculation time, as we can deduce from the first graph it along with the differences between the CPU and GPU implementation are enough to make blurring faster in the CPU for those sizes.

And thus concludes the post on playing with OpenCL and Gaussian Blurring. It turned out to be much bigger than I originally planned. I will be awaiting your comments, feedback and constructive criticism either in the comments section below or by email at: lefteris *at* refu *dot* co

Customized kids superhero capes
superhero capes|Kids can play comfortably with our double sided satin capes
superhero capes| If you are not absolutely thrilled with the reaction your young ones will get when they see this superhero dress-up set
cartoon capes| This awesome cap and mask set will have you child fighting crime in no time.
cosplay masks|
Cartoon hero capes & masks
batman capes|

Justice league boys batman DC superhero childs halloween costume.
spiderman capeswe provide Spiderman cape & mask, Children just love the feel of this fabric – perfect for superhero parties and birthdays.

17 thoughts on “Playing with OpenCL: Gaussian Blurring”

  1. Your tutorial is quite easy to understand. But, I have a small problem, while compiling the code on CPU(on Microsoft Visual Studio 2010), i get an error, which says:
    1>e:\opencl_gaussian_blur-master\refu\rf_setup.h(162): fatal error C1189: #error : Unsupported Compiler
    1> gaussian.c.

    and similarly, for the other 2 files, bitmap.c and args.c

    I have included the paths for opencl files, and refu files, and attached the path for opencl.dll, and refu lib files..

  2. Hello and thanks for your feedback. At the moment of the writing of the tutorial refu did not support the MSVC compiler used by Visual studio. I am working on providing support for MSVC compiler for refu but it is not yet complete.

    Try compiling the whole things with the gcc compiler and it will definitely work. If you are feeling adventurous you can always delete the code inside refu/rf_setup.h that generates this error and pray that it compiles with the MSVC compiler too.

    To do that go to line 162 of rf_setup.h and edit out the #error Unsuported Compiler. Replace it with
    #define RF_COMPILE_ERROR(i_ERROR_STR)
    Doing that you effectively define an empty macro.

    Please note that since the time of writing of this blog post the refu library has changed A LOT and it is still growing so that’s why I have not updated this tutorial. Will get to it sometime in the future.

    If you have any other problems do not hesitate to write another comment.

  3. i tried to compile ur programs after following all instructions it is giving stray error in line 1 of rf_options.h

  4. What do you mean stray error? Could it be something with the encoding of the file in your computer? Do you mean stray unicodes?

  5. i have mailed u an image screenshot of the error.i also tried to change encoding frm ANSI to UTF-8 ,but it does not work….

  6. I am afraid I can’t help you with Windows XP. I don’t even know if they support OpenCL for that OS.

  7. Few advices for Windows XP users if you wanna save your problems. Download AMD Stream here, green button “trusted download” on the left http://ati-stream-sdk-developer.software.informer.com/2.3/
    it contains OpenCL. I also met problems that the compiler did not understand long file names, so if you hit same problem, then edit your path/file names into 8.3 (eightDotThree) format. If you hit error that -Lrefu.dll not found or similar, carefully check your_project_file.cbp whether all paths are correct, absolute and in 8.3 format. For ATI users use these paths: C:\Program Files\AMD APP\lib\x86 and C:\Program Files\AMD APP\include, which could be C:\PROGRA~1\AMDAPP~1\lib\x86\ and C:\PROGRA~1\AMDAPP~1\include but check this with command.com (create link to command.com in your directory and skip working path, then run the link and get the working path from MS-DOS command prompt).

  8. I try to compile your program but I got errors “file not found” for
    #include
    in arg.h, when I correct to
    #include “refu/rf_setup.h”
    so I can go on, but the code stucks on refu/rf_utils.h on line 25: 1#include
    fatal error no such file or directory.
    As a result I have to use quotes everywhere and relative path as in the rf_timer:
    #include “../rf_options.h”
    #ifdef RF_MODULE_TIME_TIMER //timer module check

    #include “../rf_setup.h”

    last error which I have is this:
    p:\programs\C++\codeblocks_32bit\mingw\bin\..\lib\gcc\mingw32\4.7.1\..\..\..\libmingw32.a(main.o):main.c:(.text.startup+0xa7)||undefined reference to `WinMain@16’|
    collect2.exe: error: ld returned 1 exit status
    What is this?

  9. Valentine,

    Hello. I wrote this code a while back. I don’t remember a lot about it but when I went to check it out, I was amazed at how damn ugly it is. I updated the github repository to at least provide a makefile and remove the dependency on an immature version of Refu library. I don’t have time to do any more right now but if you need more help please tell me.

    As for the particular error you were getting: You are compiling in windows. Did you include all the proper compiler flags for your platform? I don’t know much about windows I am afraid.

  10. Thanks for reply. I found the problem. Some how I forget to include main.c so I had to add it. I fount these problems: 1) unknown type bool -> I had to add to #include which I included from file which I called types.h and this I included in every file. 2) I had error here:
    rfInit(“errorLog”,”infoLog”);
    because here you use *char type, I had to add:
    char errorLog[] = “errorLog”;
    char infoLog[] = “infoLog”;
    rfInit(errorLog,infoLog);
    in the main.c

    Now, I have errors:
    main.cpp|24|undefined reference to `readArguments(int, char**, char**, unsigned int*, float*)’
    undefined reference to `pna_blur_cpu(char*, unsigned int, float)’
    undefined reference to `pna_blur_gpu(char*, unsigned int, float)’

    I don’t know why is this.

    4) I have included refu.dll as you did, but this is useless I think because you don’t use the library refu.dll but you use refu headers to open image in bmp format. I would remake this later to use some image library. I found nice library LodePNG, but this one is only for PNG, so I probably will learn to use jpeglib or tifflib for a challenge.

    You did quite a lot of work to write the article and whole the code looks complete, but I did not yet get to the point to make it working šŸ™‚

  11. I updated the code. You can get the new code from:
    https://github.com/LefterisJP/OpenCL_Gaussian_Blur

    It should fix all these dependencies with Refu library. You don’t need to bother with it at all. I made it quite simpler.

    But the original guide should also work. Including a library for development purposes ALWAYS needs the headers of the library. It seems from your comments that you don’t know how libraries work. Try to read up on linking with shared libraries.

    As for the new code, just use the makefile and it should all work (under linux). Under windows just mimic the compiler commands.

  12. On Windows there are errors. You deleted the rf_setup which declared uint32_t type: typedef unsigned __int32 uint32_t;

  13. Uknown header #include and Unknown function htonl() in file bitmap.c

  14. htonl is a function that converts from host to network byte order. Used it to determine if the system is little or big endian. In Windows, it seems this is in a different place so I will edit the code to make it work.

    But unfortunately you will have to link with 1 extra library. Check for details here:
    http://www.cplusplus.com/forum/windows/21655/

    As for the uint32_t stuff, I thought that in windows they were also in . Please typedef whatever is missing. I don’t have a windows machine to test anything.

    Edit: I just pushed a commit which should take care of the windows specific stuff, as long as you link with the correct library as pointed out in my comment.

  15. I could compile the project successfully.
    But in running time, it always shows:
    Image image3.BMP could not be read as a .BMP file
    ( P.S. I use your BMP files )

    So I trace code:
    if(meImageBMP_Init(&bmp,imgname)==false)
    {
    printf(“Image \”%s\” could not be read as a .BMP file\n”,imgname);
    return false;
    }

    This is always false…
    What’s wrong???

  16. I wrote this tutorial a very long time ago. A lot of things may have changed since then, regarding compiler versions and tools. You have to try and figure it out on your own.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.