GPU621/DPS921 T-eaSyPeasy

From CDOT Wiki
Revision as of 13:17, 14 April 2016 by Bernard Wouter de Vries Robles (talk | contribs) (Source)
Jump to: navigation, search


GPU621/DPS921 | Participants | Groups and Projects | Resources | Glossary

T-eaSyPeasy

Team Members

  1. Berwout de Vries Robles

Progress

Introduction

For my project instead of picking a new method to learn to parallelise with, I decided to try to paralellise an algorithm that is fairly hard to write, to see if after following this course I could actually parallelise an algorithm. The thing is, in the workshops we have only been parallelising algorithms that have been specifically written to be parallelised. I wanted to change that and see if I could still do it. The algorithm I am parallelising is a Branch & Bound Algorithm by Little from 1966. A good explanation of the algorithm can be found here: https://www.youtube.com/watch?v=nN4K8xA8ShM, the explanation of the branch and bound I wrote starts at 34:30.

Description

First off because I am not a C / C++ programmer I had a lot of trouble learning C++ syntax and what I could and could not do in my program. I think as a result the program is slower than it should be.

Some difficulties and how I solved them:

  • After removing a row or column, the matrix would be skewed, we would have to keep track of the exact columns and rows we removed and calculate our current value based on that entire history along the way.

The way I remedied that is that instead of removing the rows and columns, I put a sentinel value in them (UINT16_MAX), the rest of the algorithm just ignores that value.

  • Not all of the column minima need to be found, since if you have a row minimum in a column that automatically turns into a column minimum, due to the nature of the algorithm.

When searching for row minima I kept a bool array of which columns I had already found a value for, so that I would not have to search those *columns in the findColMinima function.

  • Subtour elimination

Store the result as partial tours, If an assignment is the start of a tour and the end of a tour, place it in between and concatenate the tour. If an assignment is the start of a tour, place it before the tour. If an assignment is the end of a tour, place it behind the tour. If an assignment is not part of a tour, create a new tour with it. Find out which tour the assignment got added to and eliminate the subtour that goes from the end of that tour to the beginning.

I wanted to parallelise certain sections of the algorithm, most notable : finding the row and column minimum subtracting the row and column minimum

With a cilk_for reducer and a vectorized cilk_for this should have been possible, the thing is, I wrote the algorithm in such a way that instead of searching for a minimum, it searches for the location of the minimum. This turned out to be a problem when trying to parallelise it. In the end I got around the problem by reverting to the simpler solution.

Here is what the parallelisation looks like:

    int16_t findRowMinimum(int row) {
        cilk::reducer< cilk::op_min<uint16_t> > best;
        cilk_for (int col = 0; col < matrixSize; col++) {
            best->calc_min(workingCopy[row*matrixSize + col]);
        }
        return best.get_value();
    }
 int16_t findColMinimum(int col) {
        cilk::reducer< cilk::op_min<uint16_t> > best;
        cilk_for (int row= 0; row< matrixSize; row++) {
            best->calc_min(workingCopy[row*matrixSize + col]);
        }
        return best.get_value();
    }
    void subtractRowMinimum(uint16_t val, int row) {
        cilk_for(int col = 0; col < matrixSize; col++) {
            workingCopy[row*matrixSize + col] -= val;
        }
    }
    void subtractColMinimum(uint16_t val, int col) {
        cilk_for(int row = 0; row < matrixSize; row++) {
            workingCopy[row*matrixSize + col] -= val;
        }
    }

The strange thing is that after these parallelisations The algorithm performed worse. I think this has to do with the granularity of the parallelism, because we aren't solving 1000 by 1000 matrices but a very large amount of 20 by 20 matrices, the parallelism isn't splitting into all that much and some of that may be faster off serial. Where our serial solution could solve up to 20 cities, this parallelised version can't solve more than 16 without taking minutes.

Another approach would have been to parallelise the whole nodes instead of the operations in them, but this is a little bit difficult because for each node to exist, the parent needs to have been evaluated already.

Learn from me and think long and hard about what you want to parallelise in regards of granularity, because in this case parallelising the individual operations in the Node made my algorithm significantly worse. It took a lot of programmer effort and gave me no reward in terms of speed. I did learn a lot about C++ writing this algorithm, so all was not lost.

Source

#include <stdint.h>
#include <vector>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <cmath>
#include <time.h>
#include <cilk/cilk.h>
#include <cilk/reducer_min.h>
#include <cilk/cilk_stub.h>


struct Selection {
    uint16_t row;
    uint16_t col;
    uint16_t val;
    Selection(uint16_t x, uint16_t y, uint16_t v);

    bool operator==(const Selection& rhs)
    {
        return (col == rhs.col && row == rhs.row);
    }
    bool operator!=(const Selection& rhs)
    {
        return !(col == rhs.col && row == rhs.row);
    }
    Selection() : row(UINT16_MAX), col(UINT16_MAX), val(UINT16_MAX) {}
};

Selection::Selection(uint16_t r, uint16_t c, uint16_t v) {
    row = r;
    col = c;
    val = v;
}

struct Zero {
    uint16_t row;
    uint16_t col;
    uint16_t horPen;
    uint16_t vertPen;
    Zero(uint16_t row, uint16_t col);
    Zero() {};
};

Zero::Zero(uint16_t r, uint16_t c) {
    row = r;
    col = c;
};

struct Node {
    int lowerBound;
    int matrixSize;
    bool solutionFound = false;
    bool invalidSolution = false;
    uint16_t* originalMatrix;
    Node(int amountOfCities, uint16_t* matrix, std::vector<std::vector<Selection>> solution, uint16_t lb);
    Node(int amountOfCities, uint16_t* matrix, std::vector<std::vector<Selection>> solution);
    std::vector<std::vector<Selection>> solution;
    Node* left;
    Node* right;

    void evaluate(){
        std::vector<uint16_t> rowMins(matrixSize);
        std::vector <uint16_t> colMins(matrixSize);
        std::vector<Zero> zeroes;
        std::vector<bool> zeroInColumn(matrixSize, false);
        lowerBound = sumCurrentSolution();

        for (int row = 0; row < matrixSize; row++) {
            int16_t minimum = findRowMinimum(row);
            rowMins[row] = minimum;
            if(minimum != UINT16_MAX)
            lowerBound += minimum;
        }
        for (int row = 0; row < matrixSize; row++) {
            if (rowMins[row] != UINT16_MAX && rowMins[row] != 0) {
                subtractRowMinimum(rowMins[row], row);
            }
        }

        for (int col = 0; col < matrixSize; col++) {
            int16_t minimum = findColMinimum(col);
            colMins[col] = minimum;
            if (minimum != UINT16_MAX)
            lowerBound += minimum;
        }
        for (int col = 0; col < matrixSize; col++) {
            if (colMins[col] != UINT16_MAX && colMins[col] != 0) {
                subtractRowMinimum(colMins[col], col);
            }
        }
        gatherZeroes(zeroes);
        if (zeroes.empty()) {
            invalidSolution = true;
        }
        else {
            findPenalties(zeroes);
            Zero assignment = assignLargestPenalty(zeroes);
            createLeftNode(solution, assignment);
            int subtourIndex = addToResult(solution, assignment);
            createRightNode(solution, assignment, subtourIndex);
            checkSolutionFound();
        }
        free(originalMatrix);
    }

    int16_t findColMinimum(int col) {
        cilk::reducer< cilk::op_min<uint16_t> > best;
        cilk_for(int row = 0; row< matrixSize; row++) {
            best->calc_min(originalMatrix[row*matrixSize + col]);
        }
        return best.get_value();
    }

    int16_t findRowMinimum(int row) {
        cilk::reducer< cilk::op_min<uint16_t> > best;
        cilk_for (int col = 0; col < matrixSize; col++) {
            best->calc_min(originalMatrix[row*matrixSize + col]);
        }
        return best.get_value();
    }

    void subtractRowMinimum(uint16_t val, int row) {
        if (val != 0) {
            cilk_for(int col = 0; col < matrixSize; col++) {
                if (originalMatrix[row*matrixSize + col] != UINT16_MAX)
                    originalMatrix[row*matrixSize + col] -= val;
            }
        }
    }

    void subtractColMinimum(uint16_t val, int col) {
        if (val != 0) {
            cilk_for(int row = 0; row < matrixSize; row++) {
                if (originalMatrix[row*matrixSize + col] != UINT16_MAX)
                    originalMatrix[row*matrixSize + col] -= val;
            }
        }
    }

    int sumCurrentSolution() {
        int result = 0;
        std::for_each(solution.begin(), solution.end(), [&](std::vector<Selection> v) {
            std::for_each(v.begin(), v.end(), [&](Selection s) {
                result += s.val;
            });
        });
        return result;
    }

    void checkSolutionFound() {
        solutionFound = (solution.size() == 1 && solution[0].size() == matrixSize - 1);
    }

    bool isSolutionFound() {
        return solutionFound;
    }

    std::vector<std::vector<Selection>> getSolution() {
        return solution;
    }

    //void createWorkingCopy() {
    //    originalMatrix = (uint16_t*)malloc(sizeof(uint16_t) * matrixSize * matrixSize);
    //    memcpy(originalMatrix, originalMatrix, (sizeof(uint16_t) * matrixSize * matrixSize));
    //}

    void printMatrix() {
        std::cout << "\n Matrix: ";
        for (int i = 0; i < matrixSize; i++) {
            std::cout << "\n";
            for (int j = 0; j < matrixSize; j++) {
                std::cout << originalMatrix[i*matrixSize + j] << " ";
            }
        }
        std::cout << "\n";
    }

    void printOriginalMatrix() {
        std::cout << "\nOriginal Matrix: ";
        for (int i = 0; i < matrixSize; i++) {
            std::cout << "\n";
            for (int j = 0; j < matrixSize; j++) {
                std::cout << originalMatrix[i*matrixSize + j] << " ";
            }
        }
        std::cout << "\n";
    }

    void printSolution() {
        std::cout << "\n intermediate :\n";
        std::for_each(solution.begin(), solution.end(), [&](std::vector<Selection> v) {
            std::for_each(v.begin(), v.end(), [&](Selection s) {
                std::cout << "(" << s.row << ", " << s.col << ") value = " << s.val << " ";
            });
            std::cout << "\n";
        });
        std::cout << "\n";
    }
    void gatherZeroes(std::vector<Zero>& zeroes) {
        for (int i = 0; i < matrixSize; i++) {
            for (int j = 0; j < matrixSize; j++) {
                if (originalMatrix[i*matrixSize + j] == 0) {
                    zeroes.push_back(Zero(i, j));
                }
            }
        }
        //std::for_each(zeroes.begin(), zeroes.end(), [&](Zero z) {
        //    std::cout << " zero found : row : " << z.row << " col : " << z.col << "\n";
        //});
    }

    void findPenalties(std::vector<Zero>& zeroes) {
        std::for_each(zeroes.begin(), zeroes.end(), [&](Zero& z) {
            uint16_t horizontalMin = UINT16_MAX;
            for (int col = 0; col < matrixSize; col++) {
                if (col != z.col && originalMatrix[z.row*matrixSize + col] < horizontalMin) {
                    horizontalMin = originalMatrix[z.row*matrixSize + col];
                }
            }
            z.horPen = horizontalMin;

            uint16_t verticalMin = UINT16_MAX;
            for (int row = 0; row < matrixSize; row++) {
                if (row != z.row && originalMatrix[row * matrixSize + z.col] < verticalMin) {
                    verticalMin = originalMatrix[row * matrixSize + z.col];
                }
            }
            z.vertPen = verticalMin;
        });

        //std::for_each(zeroes.begin(), zeroes.end(), [&](Zero z) {
        //    std::cout << " (" << z.row << ", " << z.col << ") Horizontal penalty: " << z.horPen << " Vertical penalty: " << z.vertPen << "\n";
        //});
    }

    Zero assignLargestPenalty(std::vector<Zero>& zeroes) {
        Zero assignment = zeroes[0];
        std::for_each(zeroes.begin(), zeroes.end(), [&](Zero z) {
            if ((z.vertPen + z.horPen) > (assignment.vertPen + assignment.horPen)) {
                assignment = z;
            }
        });
        return assignment;
    }

    int addToResult(std::vector<std::vector<Selection>>& solution, Zero assignment) {
        int startOfSubtourIndex = -1;
        int endOfSubtourIndex = -1;
        int index = 0;
        int result = 0;

        //dealing with subtours....
        std::for_each(solution.begin(), solution.end(), [&](std::vector<Selection>& subtour) {
            if (subtour[0].row == assignment.col) {
                startOfSubtourIndex = index;
            }
            else if (subtour.back().col == assignment.row) {
                endOfSubtourIndex = index;
            }
            index++;
        });
        if (startOfSubtourIndex > -1 && endOfSubtourIndex > -1) {
            //this case is true if our solution connects 2 subtours, by being at the start of one and at the end of the other.
            //we combine them by concatenating the subtour that it is the start of after the subtour that it is the end of.
            solution[endOfSubtourIndex].push_back(Selection(assignment.row, assignment.col, originalMatrix[assignment.row* matrixSize + assignment.col]));
            solution[endOfSubtourIndex].insert(solution[endOfSubtourIndex].end(), solution[startOfSubtourIndex].begin(), solution[startOfSubtourIndex].end());
            solution.erase(solution.begin() + startOfSubtourIndex);
            result = endOfSubtourIndex;
            if (startOfSubtourIndex < endOfSubtourIndex) {
                result--;
            }

        }
        else if (startOfSubtourIndex > -1) {
            solution[startOfSubtourIndex].insert(solution[startOfSubtourIndex].begin(), Selection(assignment.row, assignment.col, originalMatrix[assignment.row * matrixSize + assignment.col]));
            //subtour elimination, we can no longer go FROM(column) subtour.back().row TO(row) assignment.col.
            result = startOfSubtourIndex;
        }
        else if (endOfSubtourIndex > -1) {
            solution[endOfSubtourIndex].push_back(Selection(assignment.row, assignment.col, originalMatrix[assignment.row* matrixSize + assignment.col]));
            //subtour elimination, we can no longer go FROM(column) assignment.row TO(row) subtour[0].col.
            result = endOfSubtourIndex;
        }
        else {
            solution.push_back(std::vector<Selection>(1, Selection(assignment.row, assignment.col, originalMatrix[assignment.row* matrixSize + assignment.col])));
            result = solution.size() - 1;
        }
        return result;
    }

    void createLeftNode(std::vector<std::vector<Selection>>& solution, Zero assignment) {
        uint16_t* leftSideMatrix = (uint16_t*)malloc(sizeof(uint16_t)*matrixSize*matrixSize);
        for (int row = 0; row < matrixSize; row++) {
            for (int col = 0; col < matrixSize; col++) {
                if (row == assignment.row && col == assignment.col) {
                    leftSideMatrix[row*matrixSize + col] = UINT16_MAX;
                }
                else {
                    leftSideMatrix[row*matrixSize + col] = originalMatrix[row*matrixSize + col];
                }
            }
        }
        left = new Node(matrixSize, leftSideMatrix, solution, lowerBound + assignment.horPen + assignment.vertPen);
    }

    void createRightNode(std::vector<std::vector<Selection>>& solution, Zero assignment, int subtourIndex) {
        uint16_t* rightSideMatrix = (uint16_t*)malloc(sizeof(uint16_t)*matrixSize*matrixSize);
        for (int row = 0; row < matrixSize; row++) {
            for (int col = 0; col < matrixSize; col++) {
                if (assignment.row == row || assignment.col == col) {
                    rightSideMatrix[row*matrixSize + col] = UINT16_MAX;
                }
                else {
                    rightSideMatrix[row*matrixSize + col] = originalMatrix[row*matrixSize + col];
                }
            }
        }
        rightSideMatrix[solution[subtourIndex].back().col * matrixSize + solution[subtourIndex][0].row] = UINT16_MAX;
        right = new Node(matrixSize, rightSideMatrix, solution);
    }

    Node getRight() {
        return *right;
    }

    Node getLeft() {
        return *left;
    }

};

Node::Node(int amountOfCities, uint16_t* matrix, std::vector<std::vector<Selection>> solution) : solution(solution) {
    matrixSize = amountOfCities;
    originalMatrix = matrix;
}

Node::Node(int amountOfCities,uint16_t* matrix, std::vector<std::vector<Selection>> solution, uint16_t lb) : solution(solution){
    matrixSize = amountOfCities;
    originalMatrix = matrix;
    lowerBound = lb;
}



class Tree {
    Node *start;
    int lowerBound;
    int upperBound;
    std::vector<Node> nodes;
};

void getMatrixFromFile(std::string filename, int matrixSize, uint16_t* originalMatrix) {
    std::vector<int> xValues(matrixSize);
    std::vector<int> yValues(matrixSize);
    int n, m, i = 0;
    std::ifstream read(filename);
    while (i < matrixSize && read >> n >> m) {
        xValues[i] = n;
        yValues[i] = m;
        i++;
    }
   
    for (int i = 0; i < matrixSize; i++) {
        for (int j = 0; j < matrixSize; j++) {
            if (i == j) {
                originalMatrix[i*matrixSize + j] = UINT16_MAX;
            }
            else {
                originalMatrix[i*matrixSize + j] = (uint16_t)sqrt(pow(xValues[i]-xValues[j], 2) + pow(yValues[i] - yValues[j],2));
            }
        }
    }
}


void pruneNodeList(std::vector<Node>& nodeList, int upperBound) {
    int i = 0;
    while (i < nodeList.size()) {
        if (nodeList[i].lowerBound >= upperBound) {
            nodeList.erase(nodeList.begin() + i);
        }
        else {
            ++i;
        }
    }
}

int calculateSolution(uint16_t* originalMatrix, std::vector<std::vector<Selection>>& solution, int matrixSize) {
    int sum = 0;
    std::for_each(solution.begin(), solution.end(), [&](std::vector<Selection> v) {
        std::for_each(v.begin(), v.end(), [&](Selection s) {
            sum += s.val;
        });
    });
    sum += originalMatrix[solution[0].back().col * matrixSize + solution[0][0].row];
    return sum;
}

int findNextEvaluation(std::vector<Node>& nodeList) {
    Node eval = nodeList[0];
    int index = 0;
    int currentIndex = 0;
    std::for_each(nodeList.begin(), nodeList.end(), [&](Node n) {
        if (n.lowerBound < eval.lowerBound) {
            currentIndex = index;
        }
        index++;
    });
    return currentIndex;
}

int main() {
    int amountOfCities = 500;
    uint16_t* originalMatrix = (uint16_t*)malloc(sizeof(uint16_t)*amountOfCities*amountOfCities);
    getMatrixFromFile("cities.txt", amountOfCities, originalMatrix);
    std::vector<std::vector<Selection>> solution;
    Node node = Node(amountOfCities, originalMatrix, solution);
    std::vector<Node> nodeList;

    std::vector<std::vector<Selection>> currentSolution;
    std::vector<std::vector<Selection>> tempSolution;
    int upperBound = UINT16_MAX;
    int count = 0;
    clock_t t1, t2;
    t1 = clock();
    while (!node.isSolutionFound()) {
        node.evaluate();
        count++;
        if (!node.isSolutionFound()) {
            nodeList.push_back(node.getLeft());
            node = node.getRight();
        }
    }
    currentSolution = node.getSolution();
    upperBound = calculateSolution(originalMatrix, currentSolution, amountOfCities);
    pruneNodeList(nodeList, upperBound);
    while (!nodeList.empty()) {
        int nextNode = findNextEvaluation(nodeList);
        node = nodeList[nextNode];
        nodeList.erase(nodeList.begin() + nextNode);
        while (!node.isSolutionFound() && !node.invalidSolution) {
            node.evaluate();
            count++;
            if (!node.isSolutionFound() && !node.invalidSolution && node.lowerBound < upperBound) {
                if (node.getLeft().lowerBound < upperBound) {
                    nodeList.push_back(node.getLeft());
                }
                node = node.getRight();
            }
            else if (node.lowerBound >= upperBound) {
                break;
            } else if(!node.invalidSolution){
                tempSolution = node.getSolution();
                int value = calculateSolution(originalMatrix, tempSolution, amountOfCities);
                if (value < upperBound) {
                    currentSolution = tempSolution;
                    upperBound = value;
                    pruneNodeList(nodeList, upperBound);
                }
            }
        }

    }
    t2 = clock();
    float diff((float)t2 - (float)t1);
    std::cout << "time spent: " << diff << "\n";
    std::cout << "count = " << count << "\n";
    std::cout << "\n Solution :\n";
    std::for_each(currentSolution.begin(), currentSolution.end(), [&](std::vector<Selection> v) {
        std::for_each(v.begin(), v.end(), [&](Selection s) {
            std::cout << "(" << s.row << ", " << s.col << ") ";
        });
        std::cout << "\n";
    });
    std::cout << "\nResult = " << upperBound;
}

useful links

  1. https://www.youtube.com/watch?v=-cLsEHP0qt0 - Branch and Bound explanation from minute 34:30.
  2. https://www.cilkplus.org/tutorial-reducer-min - using the min reducer in cilk_for