From CDOT Wiki
Revision as of 16:44, 11 April 2016 by Jsiu3 (talk | contribs) (How DAAL Works)
Jump to: navigation, search

GPU621/DPS921 | Participants | Groups and Projects | Resources | Glossary

Intel Data Analytics Acceleration Library (DAAL)

Team Member

  1. Luong Chuong
  2. Jacky Siu
  3. Woodson Delhia

Intro OLD

Local DAAL Examples Location: C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016\windows\daal\examples


New Data:

Parser code:

Low Order Moments:

Our goal is to parse & process this crime data and to add more meaning to said data. Using various parallel techniques taught in the course and comparing them via the DAAL library.


What is Big Data

Big Data is data that is so big that traditional methods of data processing fail to keep up. A lot of the time this type of data is related to human behavior measurable by computers. This maybe trends, web analytics etc. With so many people using technology, data gathered this way is becoming massive. This is one of the reasons for the rise of Big Data.

What is DAAL

DAAL is a C++ & Java / Scala library for data analytics. It's similar to MKL(Math Kernel Library) with some differences:

  • MKL focuses on computation. DAAL focuses on the entire data flow (aquisition, transformation, processing).
  • Optimized for all kinds of Intel based devices (from data center to home computers)

DAAL supports 3 processing modes

  • Offline Processing (Batch) - Data can fit in memory, data can be processed all at once.
  • Online Processing (Streaming) - Data is too big for memory, DAAL processes the data in chunks and combine the partial results for the final result.
  • Distributed processing - Distributes data processing. DAAL has not bound the communication method and leaves it to the developer (Hadoop, Spark, MPI etc).

DAAL Data Flow.

How DAAL Works


Code Examples

Batch Sorting

CSV Data:

-55.558252,63.051427,-27.793776, -75.622534,61.212279,-16.283311, -86.747071,-28.132241,-17.824316, -34.172101,-51.404172,14.670925, -61.506308,48.248030,-99.235341, 9.746765,-89.879258,55.561778, 48.896723,-32.648097,48.313603, -15.346015,9.769776,-33.483281, 56.726081,-87.272631,8.724224, -1.926802,54.960580,-78.723429, 45.237223,-79.764218,-47.271926, 84.138339,11.547818,-92.962952, 46.711824,-42.623510,-34.664673, 55.813112,19.803475,4.807766, -55.474098,-72.163755,89.425736, -7.566596,-77.829218,58.630172, -76.081937,-12.089445,-44.065054, -25.888944,46.425499,-37.515164, -30.201387,-16.237217,-50.716591, -88.085869,60.136249,54.812866


/* file: sorting_batch.cpp
* Copyright 2014-2016 Intel Corporation All Rights Reserved.*/

#include "daal.h"
#include "service.h"

using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;
using namespace std;

/* Input data set parameters */
string datasetFileName = "../data/batch/sorting.csv";

int main(int argc, char *argv[])
    checkArguments(argc, argv, 1, &datasetFileName);

    /* Initialize FileDataSource<CSVFeatureManager> to retrieve the input data from a .csv file */
    FileDataSource<CSVFeatureManager> dataSource(datasetFileName, DataSource::doAllocateNumericTable, DataSource::doDictionaryFromContext);

    /* Retrieve the data from the input file */

    /* Create algorithm objects to sort data using the default (radix) method */
    sorting::Batch<> algorithm;

    /* Print the input observations matrix */
    printNumericTable(dataSource.getNumericTable(), "Initial matrix of observations:");

    /* Set input objects for the algorithm */
    algorithm.input.set(sorting::data, dataSource.getNumericTable());

    /* Sort data observations */

    /* Get the sorting result */
    services::SharedPtr<sorting::Result> res = algorithm.getResult();

    printNumericTable(res->get(sorting::sortedData), "Sorted matrix of observations:");

    return 0;


DAAL Sort Batch.

The data is sorted from smallest to largest per column.

Data Blocks:

DAAL Sort Block. DAAL Sort Block.

"Blocks" of data are being loaded 5 rows at a time. This allows us to easily section off data to process. This is also one way of distributing data to MPI etc.

Useful Link