DPS921/Franky

8,381 bytes added, 11:21, 26 November 2018
# [mailto:ysim2@myseneca.ca?subject=DPS921 Yoosuk Sim]
# [mailto:rdittrich@myseneca.ca?subject=DPS921 Robert Dittrich]
# [mailto:achisholm1@myseneca.ca?subject=GPU921 Alex Chisholm]
# [mailto:ysim2@myseneca.ca;rdittrich@myseneca.ca;?subject=DPS921 eMail All]
==Case-study==
The test was performed over 99999999 data points with a learning rate of .001 and 100 gradient descent steps on each data point (i.e. <code style="background-color:#f8f9fa;border: 1px solid #eaecf0">$<cmd> 99999999 .001 100</code>). After each case, we also performed an analysis using vTune Amplifier to understand the performance issues and compare them. An example of the algorithm at work can be seen here. [[File:lr-demo.gif]]
===Single-thread===
A single-threaded approach to finding the line of best fit.
====Code====
<source>
#include <cmath>
#include <iostream>
#include <random>
return 0;
}
 
</source>
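Since the listing above is abbreviated, the following is a minimal sketch of how gradient descent can fit a line y = m*x + b. It is not the code used for the measurements: the names <code>Line</code> and <code>fit_line</code> are hypothetical, and it assumes a plain batch update on the mean squared error.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the fitted line y = m * x + b.
struct Line {
    double m;
    double b;
};

// Batch gradient descent on the mean squared error.
// All names are illustrative and not taken from the original listing.
Line fit_line(const std::vector<double>& x, const std::vector<double>& y,
              double learn_rate, std::size_t steps) {
    assert(!x.empty() && x.size() == y.size());
    const double n = static_cast<double>(x.size());
    Line line{0.0, 0.0};
    for (std::size_t s = 0; s < steps; ++s) {
        double grad_m = 0.0;
        double grad_b = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            // Prediction error for point i with the current parameters.
            const double err = line.m * x[i] + line.b - y[i];
            grad_m += 2.0 * err * x[i];
            grad_b += 2.0 * err;
        }
        // Step against the averaged gradient.
        line.m -= learn_rate * grad_m / n;
        line.b -= learn_rate * grad_b / n;
    }
    return line;
}
```

The nested loops make the cost proportional to steps * N, which is consistent with the gradient descent function dominating the run time.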
====Performance====
As expected, the gradient descent function, which runs epochs*N times, takes the longest to execute. It is interesting to see that '''std::log''' has the second-longest execution time. This is because the data is generated by calling a normal-distribution random number function, whose implementation appears to rely heavily on logarithms.
 
[[File:lr-cpp1.png]]
 
[[File:lr-cpp2.png]]
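To illustrate why a logarithm can show up when generating normally distributed numbers, here is a minimal sketch of the Box-Muller transform, one classic way to turn uniform samples into normal ones. Which method a given standard library uses is implementation-defined, so this is an assumption for illustration, not the library's code. The function name <code>box_muller</code> is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Box-Muller transform: maps two uniform samples, u1 in (0, 1] and u2 in [0, 1),
// to one standard normal sample. Note the std::log call for every generated value.
double box_muller(double u1, double u2) {
    assert(u1 > 0.0 && u1 <= 1.0);
    const double two_pi = 6.283185307179586;
    return std::sqrt(-2.0 * std::log(u1)) * std::cos(two_pi * u2);
}
```

With one logarithm per sample and several samples per data point, the calls add up quickly when N is close to 100 million.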
===Optimized with DAAL===
Intel has a library called the Data Analytics Acceleration Library (DAAL). It is used to solve big data problems, and it contains optimized algorithmic building blocks for building efficient solutions.
The library includes algorithms to solve all sorts of machine learning problems, including linear regression.
 
Two sets of data were generated from the serial version of the regression algorithm. The serial version was run twice, and the x[N] and y[N] arrays from the random normal number generator were written to two different csv files, test.csv and train.csv. The x[N] and y[N] values in these two files follow a normal distribution as defined in the serial algorithm code, with N = 99,999,999.
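A minimal sketch of how the two arrays could be written out, assuming one "x,y" row per data point so that the feature column comes first and the dependent variable second. The helper name <code>write_csv</code> is hypothetical and is not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <limits>
#include <ostream>
#include <sstream>  // std::ostringstream, handy for trying the helper in memory
#include <vector>

// Writes one "x,y" row per data point to any output stream.
// write_csv is a hypothetical helper name used for illustration only.
void write_csv(std::ostream& out, const std::vector<double>& x,
               const std::vector<double>& y) {
    assert(x.size() == y.size());
    // Use enough digits that every double can be read back without loss,
    // and restore the previous precision afterwards.
    const std::streamsize old_precision =
        out.precision(std::numeric_limits<double>::max_digits10);
    for (std::size_t i = 0; i < x.size(); ++i) {
        out << x[i] << ',' << y[i] << '\n';
    }
    out.precision(old_precision);
}
```

Passing a <code>std::ofstream</code> opened on train.csv or test.csv would produce the files; a <code>std::ostringstream</code> works for a quick check with a few points.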
 
The example program "lin_reg_norm_eq_dense_batch.cpp" from the DAAL library was modified to test the linear regression model. First, the function "trainModel()" is called. This function reads the "train.csv" data and then merges the columns based on the number of independent and dependent variables; in this case it is simple regression with 1 dependent and 1 independent variable. An optimized algorithm is then initialized, the training data and dependent values are passed in, and the model is trained on the data from the csv file. The training result is a line-of-best-fit model for the data. The "testModel()" function is then called, which initializes a prediction algorithm. It works by passing the independent variable values from "test.csv" into the trained model, which then predicts the dependent values.
 
The model predicted y = 0.5x + 1, which matches the random data stored in the train.csv and test.csv files nearly perfectly.
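For one independent variable, the normal equations method has a simple closed form. The sketch below shows that closed form directly; it is an illustration of the math and not DAAL's implementation. The names <code>Coefficients</code> and <code>solve_normal_equations</code> are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: y = intercept + slope * x.
struct Coefficients {
    double intercept;
    double slope;
};

// Closed-form least squares for simple linear regression.
// Solves the 2x2 normal equations built from the sums over the data.
Coefficients solve_normal_equations(const std::vector<double>& x,
                                    const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sum_x = 0.0, sum_y = 0.0, sum_xx = 0.0, sum_xy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum_x += x[i];
        sum_y += y[i];
        sum_xx += x[i] * x[i];
        sum_xy += x[i] * y[i];
    }
    const double denom = n * sum_xx - sum_x * sum_x;
    assert(denom != 0.0);  // all x values identical: no unique line exists
    const double slope = (n * sum_xy - sum_x * sum_y) / denom;
    const double intercept = (sum_y - slope * sum_x) / n;
    return {intercept, slope};
}
```

Unlike gradient descent, this needs only a single pass over the data and no learning rate.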
 
 
====Code====
<source>
/* file: lin_reg_norm_eq_dense_batch.cpp */
/*******************************************************************************
* Copyright 2014-2018 Intel Corporation.
*
* This software and the related documents are Intel copyrighted materials, and
* your use of them is governed by the express license under which they were
* provided to you (License). Unless the License provides otherwise, you may not
* use, modify, copy, publish, distribute, disclose or transmit this software or
* the related documents without Intel's prior written permission.
*
* This software and the related documents are provided as is, with no express
* or implied warranties, other than those that are expressly stated in the
* License.
*******************************************************************************/
 
/*
! Content:
! C++ example of multiple linear regression in the batch processing mode.
!
! The program trains the multiple linear regression model on a training
! datasetFileName with the normal equations method and computes regression
! for the test data.
!******************************************************************************/
 
/**
* <a name="DAAL-EXAMPLE-CPP-LINEAR_REGRESSION_NORM_EQ_BATCH"></a>
* \example lin_reg_norm_eq_dense_batch.cpp
*/
 
#include "daal.h"
#include "service.h"
 
using namespace std;
using namespace daal;
using namespace daal::algorithms::linear_regression;
 
/* Input data set parameters */
string trainDatasetFileName = "train.csv";
string testDatasetFileName = "test.csv";
 
const size_t nFeatures = 1; /* Number of features in training and testing data sets */
const size_t nDependentVariables = 1; /* Number of dependent variables that correspond to each observation */
 
void trainModel();
void testModel();
 
training::ResultPtr trainingResult;
prediction::ResultPtr predictionResult;
 
 
int main(int argc, char *argv[])
{
//checkArguments(argc, argv, 2, &trainDatasetFileName, &testDatasetFileName);
trainModel();
testModel();
system("pause");
return 0;
}
 
void trainModel()
{
/* Initialize FileDataSource<CSVFeatureManager> to retrieve the input data from a .csv file */
FileDataSource<CSVFeatureManager> trainDataSource(trainDatasetFileName,
DataSource::notAllocateNumericTable,
DataSource::doDictionaryFromContext);
 
/* Create Numeric Tables for training data and dependent variables */
NumericTablePtr trainData(new HomogenNumericTable<>(nFeatures, 0, NumericTable::doNotAllocate));
NumericTablePtr trainDependentVariables(new HomogenNumericTable<>(nDependentVariables, 0, NumericTable::doNotAllocate));
NumericTablePtr mergedData(new MergedNumericTable(trainData, trainDependentVariables));
 
/* Retrieve the data from input file */
trainDataSource.loadDataBlock(mergedData.get());
 
/* Create an algorithm object to train the multiple linear regression model with the normal equations method */
training::Batch<> algorithm;
 
/* Pass a training data set and dependent values to the algorithm */
algorithm.input.set(training::data, trainData);
algorithm.input.set(training::dependentVariables, trainDependentVariables);
 
/* Build the multiple linear regression model */
algorithm.compute();
 
/* Retrieve the algorithm results */
trainingResult = algorithm.getResult();
printNumericTable(trainingResult->get(training::model)->getBeta(), "Linear Regression coefficients:");
}
 
void testModel()
{
 
/* Initialize FileDataSource<CSVFeatureManager> to retrieve the test data from a .csv file */
FileDataSource<CSVFeatureManager> testDataSource(testDatasetFileName,
DataSource::doAllocateNumericTable,
DataSource::doDictionaryFromContext);
 
 
/* Create Numeric Tables for testing data and ground truth values */
NumericTablePtr testData(new HomogenNumericTable<>(nFeatures, 0, NumericTable::doNotAllocate));
NumericTablePtr testGroundTruth(new HomogenNumericTable<>(nDependentVariables, 0, NumericTable::doNotAllocate));
NumericTablePtr mergedData(new MergedNumericTable(testData, testGroundTruth));
 
/* Load the data from the data file */
testDataSource.loadDataBlock(mergedData.get());
/* Create an algorithm object to predict values of multiple linear regression */
prediction::Batch<> algorithm;
 
/* Pass a testing data set and the trained model to the algorithm */
algorithm.input.set(prediction::data, testData);
algorithm.input.set(prediction::model, trainingResult->get(training::model));
 
/* Predict values of multiple linear regression */
algorithm.compute();
 
/* Retrieve the algorithm results */
predictionResult = algorithm.getResult();
 
 
printNumericTable(predictionResult->get(prediction::prediction),
"Linear Regression prediction results: (first 10 rows):", 10);
printNumericTable(testGroundTruth, "Ground truth (first 10 rows):", 10);
}
</source>
 
====Performance====
[[File:DAALvtune.jpg]]
 
===Multi-thread===
This code takes the single-threaded version above and applies TBB to leverage the power of threading to increase performance.
 
The single-threaded logic can be restructured as a parallel reduction: each thread processes part of the data, and the partial results are then combined.
====Code====
<source>
std::normal_distribution<double> b_dist(1.0,0.2);
std::normal_distribution<double> x_dist(0.0,1);
#pragma omp parallel for
for(std::size_t i = 0; i < N; i++) {
m_real[i] = m_dist(generator);
return 0;
}
</source>
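Since the listing above is abbreviated, here is a minimal sketch of the reduction idea. It uses an OpenMP reduction clause rather than TBB so that it compiles without extra libraries; the same pattern could be expressed with a TBB parallel reduce. The names <code>Gradient</code> and <code>mse_gradient</code> are hypothetical and this is not the code used for the measurements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: averaged gradient of the mean squared error.
struct Gradient {
    double dm;
    double db;
};

// Sums the per-point gradient contributions for y = m * x + b.
// With OpenMP enabled, each thread accumulates private partial sums that are
// combined at the end of the loop. Without OpenMP the pragma is ignored and
// the loop runs serially with the same result.
Gradient mse_gradient(const std::vector<double>& x, const std::vector<double>& y,
                      double m, double b) {
    assert(!x.empty() && x.size() == y.size());
    double dm = 0.0;
    double db = 0.0;
    const long long n = static_cast<long long>(x.size());
    #pragma omp parallel for reduction(+ : dm, db)
    for (long long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        const double err = m * x[k] + b - y[k];
        dm += 2.0 * err * x[k];
        db += 2.0 * err;
    }
    return {dm / static_cast<double>(n), db / static_cast<double>(n)};
}
```

The reduction works because addition does not depend on which thread handled which points, apart from small floating-point rounding differences.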
====Performance====
While much of the process now runs multithreaded, a good chunk of the code still runs in a serial region. Let's try to fix that.
 
[[File:lrmp2.png]]
 
====Fix====
Generating the random numbers for our starting data points takes up some time in the serial region. The generation happens in a for-loop, so let's apply OpenMP worksharing to make it more efficient.
 
<source>
#pragma omp parallel for
for(std::size_t i = 0; i < N; i++) {
m_real[i] = m_dist(generator);
b_real[i] = b_dist(generator);
c[i].s_x = x_dist(generator);
c[i].s_y = m_real[i] * c[i].s_x + b_real[i];
}
</source>
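One caveat with the loop above: standard random engines are not safe to call from several threads at once, so a single shared <code>generator</code> inside a parallel loop is a data race. A minimal sketch of one way around this is to give each thread its own engine and distribution. The function name <code>generate_normal</code>, the use of <code>std::mt19937</code>, and the seed-plus-thread-id scheme are all assumptions for illustration, not the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Fills a vector with normal samples using one engine per thread.
// Without OpenMP the pragmas are ignored and a single engine is used.
std::vector<double> generate_normal(std::size_t count, double mean,
                                    double stddev, unsigned seed) {
    assert(stddev > 0.0);
    std::vector<double> out(count);
    const long long n = static_cast<long long>(count);
    #pragma omp parallel
    {
        int thread_id = 0;
#ifdef _OPENMP
        thread_id = omp_get_thread_num();
#endif
        // Private engine and distribution: no generator state is shared.
        // Seeding with seed + thread id is a simple illustrative scheme.
        std::mt19937 engine(seed + static_cast<unsigned>(thread_id));
        std::normal_distribution<double> dist(mean, stddev);
        #pragma omp for
        for (long long i = 0; i < n; ++i) {
            out[static_cast<std::size_t>(i)] = dist(engine);
        }
    }
    return out;
}
```

Note that the generated values then depend on the number of threads, so runs with different thread counts are not bit-for-bit reproducible.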
====Performance====
vTune Amplifier now shows a larger share of the run time utilizing multiple cores.
 
[[File:lrmp.png]]
==Source==
# [https://software.intel.com/en-us/vtune Intel vTune Amplifier Homepage]
# [https://software.intel.com/en-us/node/506154 Parallel_reduce()]
# [https://helloacm.com/cc-linear-regression-tutorial-using-gradient-descent/ Linear Regression In C++]
# [https://github.com/intel/daal/tree/daal_2019/algorithms/kernel/linear_regression Linear Regression Algorithms from Intel DAAL Library]
# [https://github.com/y2s82/franky Github repository of the code used]