C-OpenMP/ Tried to parallelize recursive code with tasks but works slower

问题内容:

Greetings from Mexico!

I was confronted with the task of parallelizing a couple of codes involving recursion, and after doing some research I’ve realised that the best aproach would be to make use of omp’s tasks. However, it seems that its runtime is considerably bigger than its serial counterpart.

I know this question has been asked more than a couple of times in this forum but none of the proposed solutions seem to fit my case.

Below I add my code. infijaParallelSearch() receives the head of a BST and a value to find, and if it is in the tree it returns the node. convertToBST() receives a sorted list and turns it into a Binary Search Tree returning the head.

typedef struct node
{
    int key;
    struct node *left;
    struct node *right;
} Node;


Node* convertToBST(int* array, int size) {
    if( size == 1)
        return NULL;
    Node* root= (Node*)malloc(sizeof(Node));
    if(root == NULL)
    {
        printf("Error creating a new node.\n");
        exit(0);
    }
    root->key=array[size/2-1];
    root->left=convertToBST(array,size/2);
    root->right=convertToBST(array+size/2,size-size/2);
    return root;
}

Node* infijaParallelSearch(Node* root, int x){
    if( root == NULL)
        return NULL;
    if( root->key == x )
        return root;
    #pragma omp task firstprivate(root)
        infijaParallelSearch(root->left,x);

    infijaParallelSearch(root->right,x);
}

When I run it with list of increasing length to compare its runtime with its serial counterpart, I get this: Runtime results

I am measuring time with omp_get_wtime() and I have tried almost everything, from having single traversals to multiple traversals and nothing seems to make a difference. I’ve also considered having the code be made in serial if its length is lower than a value at which the parallel code performs better, but after running the code multiple times that value doesn’t seems to exist.

Thanks again for your time. This is my first time asking and I hope I was not vague in any manner. I am a beginner and I am just developing my passion for coding, so any help or advice would be greatly appreciated.

PS: I don’t know if it will make a difference but below I add the code with which I compare the runtime for both codes.

    int main(){
    int i,*ptrA,n=10;
    double startTime, normalTime, parallelTime;
    FILE *fPtr;

    fPtr=fopen("statistics.txt","w");
        if(fPtr == NULL){
            printf("File cannot be created");
            exit(0);
        }
    while(n<=100000){
        printf("\nTamaño: %d\t",n);

        ptrA = (int*)calloc(n,sizeof(int));
        if (ptrA == NULL)
        {
            printf("Error 401. Memory not available");
            exit(0);
        }


        for (i=0;i<n;i++){
            ptrA[i]=i+1;
        }
        Node *root=convertToBTS(ptrA,n);

        startTime=omp_get_wtime();
        infijaParallelSearch(root,-1);
        normalTime=omp_get_wtime()-startTime;
        printf("Parallel final Time: %f\t",normalTime);

        for (i=0;i<n;i++){
            ptrA[i]=i+1;
        }

        root=convertToBTS(ptrA,n);

        startTime=omp_get_wtime();
        infijaSearch(root,-1);
        parallelTime=omp_get_wtime()-startTime;
        printf("Normal final Time: %f\n",parallelTime);
        free(ptrA);

        fprintf(fPtr,"%d::%f::%f\n",n,normalTime,parallelTime);

        n+=100;
    }
    fclose(fPtr);
    return 0;
}

问题评论:

    
You should bound the creation of tasks. If the tree is balanced, then you can limit the amount of recursion you make with tasks to fit the amount of parallelism you want to achieve (use final clause for a specific range of recursion depth. If the task body is very small, the amount of time spent on task creation and management may be higher, so you want to make sure that the amount of work is bigger, in order to get a speed up.

答案:

答案1:

Firstly, the assumption you make that parallelizing the search will have a performance increase independently of data size is false.
According to your run-time results, we can conclude that even though total run-time is faster for the array sizes you provided in the serialized version, it definitely gets worse as size increases compared to the parallel version.

For example, while increasing the array size almost 10 times over (from 100 to ~10000) you got a 50 times slower performance in the serialized version. But the parrallel version is only ~40 times slower (which means performance gain was better in the parrallel version).

This leads me to assume that with a large enough dataset, the parrallel version would out-perform the serialized version.

Why is this happening?
Regardless of how you chose to implement concurrency here, spawning new tasks (threads) comes with an overhead, in your case it just happens to be large compared to your per-work-item job to be done.

答案评论:

原文地址:

https://stackoverflow.com/questions/47747579/c-openmp-tried-to-parallelize-recursive-code-with-tasks-but-works-slower

Tags:,

添加评论