OpenCL – Issues with Image Convolution Kernel and Output Buffer


I’m having an issue while writing a convolution filter kernel for OpenCL. I pass an input buffer to the kernel, perform the convolution, and store the results in an output buffer, which is then read back and written to a PPM file. However, the resulting image comes out wrong. I pass the input buffer to the kernel as follows:

//Execute - Write pixel data into device.
clEnqueueWriteBuffer(queue, in_buffer, CL_FALSE, 0, imageElements, 
input, 0, NULL, NULL);

Here imageElements is ROWS * COLUMNS * 3, i.e. the size in bytes of the RGB pixel data of a PPM image. I then perform the convolution using the following single-work-item, sliding-window kernel (I am targeting an Intel FPGA, and Intel recommends single-work-item kernels). I used an identity filter to make the issue I’m having more obvious:

#include "../host/inc/imageDims.h"

__kernel
void identity(global unsigned int * restrict frame_in,
              global unsigned int * restrict frame_out,
              const int iterations)
{
    // Filter coefficients (identity: only the centre tap is 1)
    int Gx[3][3] = {{0,0,0},{0,1,0},{0,0,0}};

    // Pixel buffer of 2 rows and 3 extra pixels
    int rows[2 * COLS + 3];

    // The initial iterations are used to initialize the pixel buffer.
    int count = -(2 * COLS + 3);
    while (count != iterations) {
        // Each cycle, shift a new pixel into the buffer.
        // Unrolling this loop allows the compiler to infer a shift register.
        #pragma unroll
        for (int i = COLS * 2 + 2; i > 0; --i) {
            rows[i] = rows[i - 1];
        }
        rows[0] = count >= 0 ? frame_in[count] : 0;

        int r_temp = 0;
        int g_temp = 0;
        int b_temp = 0;

        // With these loops unrolled, one convolution can be computed every
        // cycle.
        #pragma unroll
        for (int i = 0; i < 3; ++i) {
            #pragma unroll
            for (int j = 0; j < 3; ++j) {
                // Each element is one pixel packed as 0x00RRGGBB.
                unsigned int pixel = rows[i * COLS + j];
                unsigned int b = pixel & 0xff;
                unsigned int g = (pixel >> 8) & 0xff;
                unsigned int r = (pixel >> 16) & 0xff;

                r_temp += (int)r * Gx[i][j];
                g_temp += (int)g * Gx[i][j];
                b_temp += (int)b * Gx[i][j];
            }
        }

        // Limits for each channel (R,G,B), applied to the finished sums
        if (r_temp > 255) r_temp = 255;
        else if (r_temp < 0) r_temp = 0;
        if (g_temp > 255) g_temp = 255;
        else if (g_temp < 0) g_temp = 0;
        if (b_temp > 255) b_temp = 255;
        else if (b_temp < 0) b_temp = 0;

        // Note: the window centre rows[COLS + 1] holds the pixel read COLS + 1
        // cycles ago, so frame_out[count] receives the result for
        // frame_in[count - COLS - 1].
        if (count >= 0) {
            frame_out[count] = ((unsigned int)r_temp << 16) +
                               ((unsigned int)g_temp << 8) +
                               (unsigned int)b_temp;
        }
        count++;
    }
}

I then read the data back from the device:

//Execute - Read pixel data from device.
clEnqueueReadBuffer(queue, out_buffer, CL_FALSE, 0, imageElements, 
output, 0, NULL, NULL);

And call a function to write to a PPM file:

saveImage(COLS, ROWS, (unsigned char*)output);

Now, I don’t think the problem lies anywhere other than the kernel: if I pass the input buffer straight to the saveImage function, the image saves fine. However, when I run it through the kernel with the identity filter, I get the following result:


Clearly, something is wrong here. The original Intel example uses a library to display the image on screen, but for my purposes I need to save it to a file. I can only imagine the issue has to do with either the kernel or the output buffer. If anyone could help with this, it would be greatly appreciated! For reference, here is the input image:


Thanks! J