From CDOT Wiki
Revision as of 16:05, 11 April 2016 by Jsiu3 (talk | contribs) (Batch Sorting)

Intel Data Analytics Acceleration Library (DAAL)

Team Members

  1. Luong Chuong
  2. Jacky Siu
  3. Woodson Delhia

Intro OLD

Local DAAL Examples Location: C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016\windows\daal\examples


New Data:

Parser code:

Low Order Moments:

Our goal is to parse and process this crime data and add more meaning to it, using various parallel techniques taught in the course and comparing them with the DAAL library.


DAAL is a C++ and Java/Scala library for data analytics. It is similar to MKL (Math Kernel Library), with some differences:

  • MKL focuses on computation. DAAL focuses on the entire data flow (acquisition, transformation, processing).
  • DAAL is optimized for all kinds of Intel-based devices, from data center servers to home computers.

DAAL supports 3 processing modes:

  • Offline processing (batch) - The data fits in memory and is processed all at once.
  • Online processing (streaming) - The data is too big for memory, so DAAL processes it in chunks and combines the partial results into the final result.
  • Distributed processing - The data processing is distributed across nodes. DAAL does not prescribe the communication method and leaves that choice to the developer (Hadoop, Spark, MPI, etc.).
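
The difference between batch and online processing can be sketched in plain C++. This is not DAAL code: PartialMean, processChunk, merge, finalize, batchMean and onlineMean are hypothetical names chosen for illustration, and a simple mean stands in for whatever statistic is being computed. The only point is that each chunk yields a small partial result, and the partial results are combined at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical partial result for one chunk: enough state to rebuild the mean.
struct PartialMean {
    double sum = 0.0;
    std::size_t count = 0;
};

// Process one chunk that fits in memory.
PartialMean processChunk(const std::vector<double>& chunk) {
    PartialMean p;
    for (double v : chunk) {
        p.sum += v;
    }
    p.count = chunk.size();
    return p;
}

// Combine two partial results into one.
PartialMean merge(const PartialMean& a, const PartialMean& b) {
    PartialMean r;
    r.sum = a.sum + b.sum;
    r.count = a.count + b.count;
    return r;
}

// Turn the accumulated state into the final result.
double finalize(const PartialMean& p) {
    assert(p.count > 0);  // the mean of zero observations is undefined
    return p.sum / static_cast<double>(p.count);
}

// Batch style: all of the data is available at once.
double batchMean(const std::vector<double>& data) {
    return finalize(processChunk(data));
}

// Online style: chunks arrive one at a time and only partial results are kept.
double onlineMean(const std::vector<std::vector<double>>& chunks) {
    PartialMean total;
    for (const auto& chunk : chunks) {
        total = merge(total, processChunk(chunk));
    }
    return finalize(total);
}
```

In a distributed setting the same merge step would run on partial results received from other nodes; how they are transported (Hadoop, Spark, MPI, etc.) is left open, as noted above.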

DAAL Data Flow.


Code Examples

Batch Sorting

CSV Data:

-55.558252,63.051427,-27.793776
-75.622534,61.212279,-16.283311
-86.747071,-28.132241,-17.824316
-34.172101,-51.404172,14.670925
-61.506308,48.248030,-99.235341
9.746765,-89.879258,55.561778
48.896723,-32.648097,48.313603
-15.346015,9.769776,-33.483281
56.726081,-87.272631,8.724224
-1.926802,54.960580,-78.723429
45.237223,-79.764218,-47.271926
84.138339,11.547818,-92.962952
46.711824,-42.623510,-34.664673
55.813112,19.803475,4.807766
-55.474098,-72.163755,89.425736
-7.566596,-77.829218,58.630172
-76.081937,-12.089445,-44.065054
-25.888944,46.425499,-37.515164
-30.201387,-16.237217,-50.716591
-88.085869,60.136249,54.812866


/* file: sorting_batch.cpp
* Copyright 2014-2016 Intel Corporation All Rights Reserved.*/

#include "daal.h"
#include "service.h"

using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;
using namespace std;

/* Input data set parameters */
string datasetFileName = "../data/batch/sorting.csv";

int main(int argc, char *argv[])
{
    checkArguments(argc, argv, 1, &datasetFileName);

    /* Initialize FileDataSource<CSVFeatureManager> to retrieve the input data from a .csv file */
    FileDataSource<CSVFeatureManager> dataSource(datasetFileName, DataSource::doAllocateNumericTable, DataSource::doDictionaryFromContext);

    /* Retrieve the data from the input file */
    dataSource.loadDataFile();

    /* Create algorithm objects to sort data using the default (radix) method */
    sorting::Batch<> algorithm;

    /* Print the input observations matrix */
    printNumericTable(dataSource.getNumericTable(), "Initial matrix of observations:");

    /* Set input objects for the algorithm */
    algorithm.input.set(sorting::data, dataSource.getNumericTable());

    /* Sort data observations */
    algorithm.compute();

    /* Get the sorting result */
    services::SharedPtr<sorting::Result> res = algorithm.getResult();

    printNumericTable(res->get(sorting::sortedData), "Sorted matrix of observations:");

    return 0;
}


DAAL Sort Batch.

Each column is sorted independently, from smallest to largest, so the values in a row of the output no longer belong to the same original observation.
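
To make the per-column behaviour concrete, here is a minimal sketch in plain C++ rather than DAAL. It assumes a rectangular table stored as rows of doubles; the Table alias and the sortColumns function are hypothetical names, and std::sort is used in place of the radix method that the DAAL sample selects by default.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Rows of observations; every row is assumed to have the same number of columns.
using Table = std::vector<std::vector<double>>;

// Sort each column independently in ascending order.
// Values in one output row generally come from different input rows.
Table sortColumns(const Table& input) {
    Table result = input;
    if (result.empty()) {
        return result;
    }
    const std::size_t rows = input.size();
    const std::size_t cols = input[0].size();
    std::vector<double> column(rows);
    for (std::size_t c = 0; c < cols; ++c) {
        // Copy one column out, sort it, and write it back.
        for (std::size_t r = 0; r < rows; ++r) {
            column[r] = input[r][c];
        }
        std::sort(column.begin(), column.end());
        for (std::size_t r = 0; r < rows; ++r) {
            result[r][c] = column[r];
        }
    }
    return result;
}
```

Applied to the first three rows of the CSV data above, the first output row would hold the smallest value of each column: -86.747071, -28.132241, -27.793776.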

Data Blocks:

DAAL Sort Block. DAAL Sort Block.

"Blocks" of data are loaded 5 rows at a time. This makes it easy to section off the data for processing, and it is also one way of distributing data across workers (for example, over MPI).

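The 5-row blocking described above can be sketched without DAAL as a simple row-range calculation. makeBlocks is a hypothetical helper, not a DAAL function: it returns half-open [begin, end) row ranges that cover a table, with the last block shortened when the row count is not a multiple of the block size.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split nRows rows into consecutive blocks of at most
// blockSize rows. Each pair is a half-open range [begin, end) of row indices.
std::vector<std::pair<std::size_t, std::size_t>> makeBlocks(std::size_t nRows,
                                                            std::size_t blockSize) {
    std::vector<std::pair<std::size_t, std::size_t>> blocks;
    if (blockSize == 0) {
        return blocks;  // nothing sensible to do with empty blocks
    }
    for (std::size_t begin = 0; begin < nRows; begin += blockSize) {
        const std::size_t end = std::min(begin + blockSize, nRows);
        blocks.emplace_back(begin, end);
    }
    return blocks;
}
```

For the 20-row data set above and a block size of 5, this gives four ranges: rows 0-4, 5-9, 10-14 and 15-19. Each range could then be processed on its own or handed to a separate worker.
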
Useful Link