Changes


Ghost Cells

16,358 bytes added, 05:03, 17 February 2019
Analysis
==== Tony ====
Subject: Jacobi's method for Poisson's equation
===== Source Code =====
{| class="wikitable mw-collapsible mw-collapsed"
! poisson.h
|-
|
<source>
#ifndef POISSON_H
#define POISSON_H
#include <cstddef>
#include <fstream>
 
namespace DPS{
class Poisson {
size_t nRowsTotal;
size_t nColumns;
float* data;
int bufferSide;
 
void update (size_t startRow, size_t endRow, const float wx, const float wy);
void bufferSwitch(){ bufferSide = 1 - bufferSide; };
 
public:
Poisson(std::ifstream& ifs);
Poisson(const size_t r, const size_t c, float* d);
~Poisson(){ delete[] data; };
float* operator()(const size_t iteration, const float wx, const float wy);
float* operator()(const size_t iteration){
return operator()(iteration,0.1,0.1);
}
void show(std::ostream& ofs) const;
};
}
#endif
 
</source>
|}
{| class="wikitable mw-collapsible mw-collapsed"
! poisson.cpp
|-
|
<source>
#include <cstring>
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <stdexcept>
#include <string>
#include "poisson.h"
 
namespace DPS{
Poisson::Poisson(std::ifstream& ifs){
std::string line;
bufferSide = 0;
/* the counters are incremented below, so they must start at zero */
nColumns = 0;
nRowsTotal = 0;
 
/* find number of columns */
std::getline(ifs,line);
for (size_t i = 0 ; i < line.size() ; i++){
if(line[i]==' ') nColumns++;
}
nColumns++;
 
/* find number of rows */
nRowsTotal++; /* already fetched one */
while(std::getline(ifs,line))
nRowsTotal++;
ifs.clear();
 
try{
data = new float[nColumns * nRowsTotal * 2];
}
catch (...){
throw std::runtime_error("Failed to Allocate Memory");
}
 
/* read in data */
ifs.seekg(0,ifs.beg);
for (size_t i = 0 ; i < nRowsTotal * nColumns ; i++) {
ifs >> data[i];
}
 
/* start both buffers from the same state: update() never writes the boundary cells, so they must already match on both sides */
std::memcpy(data+nRowsTotal*nColumns,data,nRowsTotal*nColumns*sizeof(float));
}
 
Poisson::Poisson(const size_t r, const size_t c, float* d){
bufferSide = 0;
nRowsTotal = r;
nColumns = c;
try{
data = new float[r*c*2];
}
catch (...){
throw std::runtime_error("Failed to Allocate Memory");
}
std::memcpy(data,d,r*c*sizeof(float));
/* the second buffer starts as a copy so the boundary cells match on both sides */
std::memcpy(data+r*c,d,r*c*sizeof(float));
}
 
void Poisson::update (size_t startRow, size_t endRow, const float wx, const float wy){
float* x_new = data + (1-bufferSide)*nRowsTotal*nColumns;
float* x_old = data + bufferSide*nRowsTotal*nColumns;
for (size_t i = startRow; i <= endRow; i++)
for (size_t j = 1; j < nColumns - 1; j++)
x_new[i * nColumns + j] = x_old[i * nColumns + j]
+ wx * (x_old[(i + 1) * nColumns + j] + x_old[(i - 1) * nColumns + j]
- 2.0f * x_old[i * nColumns + j])
+ wy * (x_old[i * nColumns + j + 1] + x_old[i * nColumns + j - 1]
- 2.0f * x_old[i * nColumns + j]);
}
 
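float* Poisson::operator()(const size_t nIterations, const float wx, const float wy){
/* fewer than 3 rows or columns leaves no interior cell to update */
if (nRowsTotal < 3 || nColumns < 3)
return data + bufferSide*nRowsTotal*nColumns;
for (size_t i = 0; i < nIterations; i++) {
/* sweep interior rows only: update() reads rows i-1 and i+1 */
update(1, nRowsTotal-2, wx, wy);
bufferSwitch();
}
/* return the buffer that holds the latest iteration */
return data + bufferSide*nRowsTotal*nColumns;
}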
 
void Poisson::show(std::ostream& ofs) const{
ofs << std::fixed << std::setprecision(1);
for (size_t j = 0; j < nColumns ; j++) {
for (size_t i = 0 ; i < nRowsTotal ; i++)
ofs << std::setw(8) << data[ bufferSide*nColumns*nRowsTotal + i * nColumns + j];
ofs << std::endl;
}
}
}
 
</source>
|}
{| class="wikitable mw-collapsible mw-collapsed"
! main.cpp
|-
|
<source>
// based on code from LLNL tutorial mpi_heat2d.c
// Master-Worker Programming Model
// Chris Szalwinski - 2018/11/13
// Adopted by Tony Sim - 2019/02/16
#include <iostream>
#include <fstream>
#include <iomanip>
#include <cstdlib>
#include <stdexcept>
 
#include "poisson.h"
 
// solution constants
const size_t NONE = 0;
const size_t MINPARTITIONS = 1;
const size_t MAXPARTITIONS = 7;
// weights
const float wx = 0.1f;
const float wy = 0.1f;
 
int main(int argc, char** argv) {
if (argc != 4) {
std::cerr << "*** Incorrect number of arguments ***\n";
std::cerr << "Usage: " << argv[0]
<< " input_file output_file no_of_iterations\n";
return 1;
}
 
std::ifstream input(argv[1]);
std::ofstream output(argv[2]);
std::ofstream temp("init.csv");
 
if(!input.is_open()){
std::cerr << "Invalid Input File" << std::endl;
return 2;
}
if(!output.is_open()){
std::cerr << "Invalid Output File" << std::endl;
return 2;
}
 
DPS::Poisson* p = nullptr;
try{
p = new DPS::Poisson(input);
}
catch(std::exception& e){
std::cerr << "Error: " << e.what() << std::endl;
return 3; // p is still nullptr here, so do not continue
}
 
p->show(temp);
 
size_t nIterations = std::atoi(argv[3]);
 
(*p)(nIterations);
 
// write results to file
p->show(output);
 
delete p;
 
}
 
</source>
|}
===== Introduction =====
The presented code simulates a heat map using Jacobi's method for Poisson's equation. The map is represented as a 2D array, and each element updates its value based on its adjacent elements at a given moment. Each iteration represents one instant in time. By repeating the calculation over the entire array through multiple iterations, we can estimate the state of the heat transfer after a given time interval.
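
As a minimal sketch of the update described above, the following standalone function performs one sweep on a row-major grid. The name <code>jacobiSweep</code> and the use of <code>std::vector</code> are assumptions made for illustration and are not part of the class above; the weights follow the same convention as <code>Poisson::update</code>, with <code>wx</code> applied to the neighbours in adjacent rows and <code>wy</code> to the neighbours in adjacent columns.

<source lang="cpp">
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the Poisson class): one Jacobi sweep.
// Every interior cell is computed from the OLD grid only, so the order in
// which cells are visited does not matter.
std::vector<float> jacobiSweep(const std::vector<float>& oldGrid,
                               std::size_t rows, std::size_t cols,
                               float wx, float wy) {
    // Copying the old grid carries the boundary cells over unchanged.
    std::vector<float> newGrid(oldGrid);
    for (std::size_t i = 1; i + 1 < rows; ++i) {
        for (std::size_t j = 1; j + 1 < cols; ++j) {
            const float centre = oldGrid[i * cols + j];
            newGrid[i * cols + j] = centre
                + wx * (oldGrid[(i + 1) * cols + j] + oldGrid[(i - 1) * cols + j]
                        - 2.0f * centre)
                + wy * (oldGrid[i * cols + j + 1] + oldGrid[i * cols + j - 1]
                        - 2.0f * centre);
        }
    }
    return newGrid;
}
</source>

Calling the function repeatedly, feeding each result back in as the next <code>oldGrid</code>, plays the same role as the buffer switch in the class.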
 
===== Profiling =====
The profiling was conducted using a data set of 79 rows and 205 columns over 150000 iterations.
{| class="wikitable mw-collapsible mw-collapsed"
! Flat profile
|-
|
 
<source>
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 98.57      2.75     2.75   150000    18.33    18.33  DPS::Poisson::update(unsigned long, unsigned long, float, float)
  0.00      2.75     0.00        1     0.00     0.00  _GLOBAL__sub_I__ZN3DPS7PoissonC2ERSt14basic_ifstreamIcSt11char_traitsIcEE
  0.00      2.75     0.00        1     0.00     0.00  _GLOBAL__sub_I_main
</source>
 
 
|}
{| class="wikitable mw-collapsible mw-collapsed"
! Call graph
|-
|
<source>
Call graph

granularity: each sample hit covers 2 byte(s) for 0.36% of 2.75 seconds

index % time    self  children    called            name
                2.75    0.00  150000/150000         DPS::Poisson::operator()(unsigned long, float, float) [2]
[1]    100.0    2.75    0.00  150000            DPS::Poisson::update(unsigned long, unsigned long, float, float) [1]
-----------------------------------------------
                                                    <spontaneous>
[2]    100.0    0.00    2.75                    DPS::Poisson::operator()(unsigned long, float, float) [2]
                2.75    0.00  150000/150000         DPS::Poisson::update(unsigned long, unsigned long, float, float) [1]
-----------------------------------------------
                0.00    0.00       1/1              __libc_csu_init [21]
[10]     0.0    0.00    0.00       1            _GLOBAL__sub_I__ZN3DPS7PoissonC2ERSt14basic_ifstreamIcSt11char_traitsIcEE [10]
-----------------------------------------------
                0.00    0.00       1/1              __libc_csu_init [21]
[11]     0.0    0.00    0.00       1            _GLOBAL__sub_I_main [11]
-----------------------------------------------

Index by function name

  [10] _GLOBAL__sub_I__ZN3DPS7PoissonC2ERSt14basic_ifstreamIcSt11char_traitsIcEE (poisson.cpp)
  [11] _GLOBAL__sub_I_main (main.cpp)
   [1] DPS::Poisson::update(unsigned long, unsigned long, float, float)
</source>
 
 
|}
 
=====Analysis=====
Given that 98.57 percent of the run time is spent in the update() function, it is considered the hotspot.
The total run time was 2.75 seconds.
 
If we consider a GPU environment with 1000 cores, Amdahl's law gives the following estimated speedup:
S1000 = 1/(1 - 0.9857 + 0.9857/1000) = 65.42
At that speedup, the run time would decrease from 2.75 seconds to about 0.0420 seconds.
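
The estimate above is an application of Amdahl's law, which can be checked with a few lines of code. The function name <code>amdahlSpeedup</code> is an assumption for illustration; <code>p</code> is the fraction of the run time that can be parallelized (here the share spent in update()) and <code>n</code> is the number of cores.

<source lang="cpp">
// Hypothetical helper: Amdahl's law.
// p: fraction of the serial run time that can be parallelized (0..1)
// n: number of processors the parallel part is spread across
double amdahlSpeedup(double p, double n) {
    return 1.0 / ((1.0 - p) + p / n);
}

// Estimated run time after parallelization, given the serial run time.
double estimatedTime(double serialSeconds, double p, double n) {
    return serialSeconds / amdahlSpeedup(p, n);
}
</source>

For the profile above, the calls would be <code>amdahlSpeedup(0.9857, 1000.0)</code> and <code>estimatedTime(2.75, 0.9857, 1000.0)</code>. The model ignores memory transfers and synchronization, so it is an upper bound rather than a prediction.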
 
As each iteration depends on the result of the previous iteration, every sweep must finish before the next one starts, and this synchronization might hamper the parallel process.
Consideration must also be given to exchanging ghost cells between partitions that run on different SMX units, using the device global memory as the transfer pipeline.
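
The ghost cell exchange mentioned above can be sketched on the CPU in one dimension. This is only an illustration under stated assumptions: the names <code>sweep</code> and <code>partitionedRun</code> are hypothetical, the array is split into just two partitions, and plain copies stand in for the transfer through device global memory.

<source lang="cpp">
#include <cstddef>
#include <vector>

// One relaxation sweep over the interior of a 1D array; the first and last
// elements are treated as fixed boundaries and carried over unchanged.
std::vector<float> sweep(const std::vector<float>& oldv, float w) {
    std::vector<float> newv(oldv);
    for (std::size_t i = 1; i + 1 < oldv.size(); ++i)
        newv[i] = oldv[i] + w * (oldv[i + 1] + oldv[i - 1] - 2.0f * oldv[i]);
    return newv;
}

// Splits the array at index mid (1 <= mid <= init.size() - 1) into two
// partitions. Each partition stores one extra ghost cell that mirrors the
// neighbouring partition's edge cell.
std::vector<float> partitionedRun(const std::vector<float>& init,
                                  std::size_t mid, float w, int iterations) {
    // left  = cells [0, mid)  + ghost copy of cell mid     (last element)
    // right = ghost copy of cell mid-1 (first element) + cells [mid, n)
    std::vector<float> left(init.begin(), init.begin() + mid + 1);
    std::vector<float> right(init.begin() + mid - 1, init.end());
    for (int it = 0; it < iterations; ++it) {
        // The two sweeps are independent and could run concurrently.
        left = sweep(left, w);
        right = sweep(right, w);
        // Ghost exchange: refresh each ghost from the neighbour's new edge.
        left.back() = right[1];
        right.front() = left[left.size() - 2];
    }
    // Stitch the owned cells back together, dropping both ghost cells.
    std::vector<float> out(left.begin(), left.end() - 1);
    out.insert(out.end(), right.begin() + 1, right.end());
    return out;
}
</source>

Running <code>partitionedRun</code> for some number of iterations should give the same values as calling <code>sweep</code> on the whole array the same number of times, which is the property the ghost cells exist to preserve. The exchange after every sweep is the synchronization cost noted above.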
 
==== Robert ====
===== Subject: Multi Sampling Anti Aliasing =====
====== Source Files ======
{| class="wikitable mw-collapsible mw-collapsed"
! main.cpp
|-
|
<source>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <algorithm>
#include <utility>

#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"

#include "vec3.h"

struct Point {
    int x;
    int y;
};

// Averages every pixel over its (samples * 2 + 1) ^ 2 neighbourhood.
// input and output must not point to the same buffer.
uint8_t* msaa(const uint8_t* input, uint8_t* output, int width, int height, int channels, int samples) {

    // directions is (samples * 2 + 1) ^ 2
    int totalPoints = (samples * 2 + 1) * (samples * 2 + 1);
    Point* directions = new Point[totalPoints];
    size_t idx = 0;
    for (int i = -samples; i <= samples; i++) {
        for (int j = -samples; j <= samples; j++) {
            directions[idx].x = i;
            directions[idx].y = j;
            idx++;
        }
    }

    int x, y, cx, cy;
    Vec3<int> average;
    for (int i = 0; i < width * height; i++) {
        // x and y are pre-scaled by channels, so width * y + x is a byte offset
        x = i % width * channels;
        y = i / width * channels;
        for (int j = 0; j < totalPoints; j++) {
            cx = x + directions[j].x * channels;
            cy = y + directions[j].y * channels;
            // clamp to the last valid pixel, not one past it
            cx = std::clamp(cx, 0, (width - 1) * channels);
            cy = std::clamp(cy, 0, (height - 1) * channels);
            average.add(input[width * cy + cx], input[width * cy + cx + 1], input[width * cy + cx + 2]);
        }
        average.set(average.getX() / totalPoints, average.getY() / totalPoints, average.getZ() / totalPoints);
        output[(width * y + x)] = average.getX();
        output[(width * y + x) + 1] = average.getY();
        output[(width * y + x) + 2] = average.getZ();
        average.set(0, 0, 0);
    }
    delete[] directions;
    return output;
}

int main(int argc, char* argv[]) {
    if (argc != 5) {
        std::cerr << argv[0] << ": invalid number of arguments\n";
        std::cerr << "Usage: " << argv[0] << " input output sample_size passes \n";
        system("pause");
        return 1;
    }
    int width, height, channels;
    uint8_t* rgb_read = stbi_load(argv[1], &width, &height, &channels, STBI_rgb);
    if (rgb_read == nullptr) {
        std::cout << "Failed to load image" << std::endl;
        system("pause");
        return 2;
    }
    if (channels != 3) {
        std::cout << "Incorrect channels" << std::endl;
        stbi_image_free(rgb_read);
        system("pause");
        return 2;
    }
    int samples = std::atoi(argv[3]);
    int passes = std::atoi(argv[4]);

    uint8_t* rgb_write = new uint8_t[width*height*channels];
    uint8_t* scratch = new uint8_t[width*height*channels];

    msaa(rgb_read, rgb_write, width, height, channels, samples);
    for (int i = 1; i < passes; i++) {
        // ping-pong between two buffers so a pass never reads pixels it has already overwritten
        msaa(rgb_write, scratch, width, height, channels, samples);
        std::swap(rgb_write, scratch);
    }
    stbi_write_png(argv[2], width, height, channels, rgb_write, width*channels);
    stbi_image_free(rgb_read);
    delete[] rgb_write;
    delete[] scratch;
    std::cout << "AA Done using " << samples << " sample size" << " over " << passes << " passes" << std::endl;
    system("pause");
    return 0;
}
</source>
|}
{| class="wikitable mw-collapsible mw-collapsed"
! vec3.h
|-
|
<source>
#ifndef VEC3_H
#define VEC3_H
#include <iostream>
template <class T>
class Vec3 {
private:
    T x;
    T y;
    T z;
public:
    Vec3() {
        x = 0;
        y = 0;
        z = 0;
    };
    Vec3(T x_, T y_, T z_) {
        x = x_;
        y = y_;
        z = z_;
    }
    void set(const T &x_, const T &y_, const T &z_) {
        x = x_;
        y = y_;
        z = z_;
    }
    void add(const T &x_, const T &y_, const T &z_) {
        x += x_;
        y += y_;
        z += z_;
    }
    T getX() const { return x; }
    T getY() const { return y; }
    T getZ() const { return z; }

    void setX(const T &x_) { x = x_; }
    void setY(const T &y_) { y = y_; }
    void setZ(const T &z_) { z = z_; }

    static T dot(const Vec3& vec1, const Vec3& vec2) {
        return vec1.x * vec2.x + vec1.y * vec2.y + vec1.z * vec2.z;
    }
    T dot(const Vec3 &vec) const {
        return x * vec.x + y * vec.y + z * vec.z;
    }
    void display(std::ostream& os) {
        os << "x: " << x << ", y: " << y << ", z: " << z << "\n";
    }
};
#endif // !VEC3_H
</source>
|}
[https://github.com/nothings/stb/blob/master/stb_image_write.h stb_image_write.h] [https://github.com/nothings/stb/blob/master/stb_image.h stb_image.h]

====== Introduction ======
For my selection I chose anti-aliasing, since I see it a lot in video games but never really knew how it worked. There are other anti-aliasing methods, such as FXAA (fast approximate anti-aliasing), but it seemed a lot more complicated than MSAA. The way I approached this problem is by averaging the colors of the pixels around each pixel. You can specify the distance it will search using the application flags. In my implementation you specify an input file, an output file, the radius of pixels to sample, and how many passes to make over the image. In my tests I used an image I made in Paint, with a sample size of 4 and 4 passes.
{| class="wikitable mw-collapsible mw-collapsed"
! Before
|-
|
[[File:MSAABefore.png]]
|}
{| class="wikitable mw-collapsible mw-collapsed"
! After
|-
|
[[File:MSAAAfter.png]]
|}
====== Profiling ======
{| class="wikitable mw-collapsible mw-collapsed"
! Profiling
|-
|
<source>
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 85.72      0.18     0.18                             msaa(unsigned char const*, unsigned char*, int, int, int, int)
 14.29      0.21     0.03        1    30.00    30.00  stbi_zlib_compress
  0.00      0.21     0.00   127820     0.00     0.00  stbiw__zlib_flushf(unsigned char*, unsigned int*, int*)
  0.00      0.21     0.00    96904     0.00     0.00  stbiw__zhash(unsigned char*)
  0.00      0.21     0.00     5189     0.00     0.00  stbi__fill_bits(stbi__zbuf*)
  0.00      0.21     0.00     2100     0.00     0.00  stbiw__encode_png_line(unsigned char*, int, int, int, int, int, int, signed char*)
  0.00      0.21     0.00     2014     0.00     0.00  stbiw__sbgrowf(void**, int, int) [clone .constprop.58]
  0.00      0.21     0.00       38     0.00     0.00  stbi__get16be(stbi__context*)
  0.00      0.21     0.00       19     0.00     0.00  stbi__get32be(stbi__context*)
  0.00      0.21     0.00        3     0.00     0.00  stbi__skip(stbi__context*, int)
  0.00      0.21     0.00        3     0.00     0.00  stbiw__wpcrc(unsigned char**, int)
  0.00      0.21     0.00        3     0.00     0.00  stbi__stdio_read(void*, char*, int)
  0.00      0.21     0.00        3     0.00     0.00  stbi__zbuild_huffman(stbi__zhuffman*, unsigned char const*, int)
  0.00      0.21     0.00        2     0.00     0.00  stbi__mad3sizes_valid(int, int, int, int)
  0.00      0.21     0.00        1     0.00     0.00  _GLOBAL__sub_I_stbi_failure_reason
  0.00      0.21     0.00        1     0.00     0.00  stbi__getn(stbi__context*, unsigned char*, int)
  0.00      0.21     0.00        1     0.00     0.00  stbi__readval(stbi__context*, int, unsigned char*)
  0.00      0.21     0.00        1     0.00     0.00  stbi__load_main(stbi__context*, int*, int*, int*, int, stbi__result_info*, int)
  0.00      0.21     0.00        1     0.00     0.00  stbi__parse_zlib(stbi__zbuf*, int)
  0.00      0.21     0.00        1     0.00     0.00  stbi__malloc_mad3(int, int, int, int)
  0.00      0.21     0.00        1     0.00     0.00  stbi__parse_png_file(stbi__png*, int, int)
  0.00      0.21     0.00        1     0.00     0.00  stbi__start_callbacks(stbi__context*, stbi_io_callbacks*, void*)
  0.00      0.21     0.00        1     0.00     0.00  stbi__decode_jpeg_header(stbi__jpeg*, int)
  0.00      0.21     0.00        1     0.00     0.00  stbi__compute_huffman_codes(stbi__zbuf*)
  0.00      0.21     0.00        1     0.00     0.00  stbi__load_and_postprocess_8bit(stbi__context*, int*, int*, int*, int)
  0.00      0.21     0.00        1     0.00     0.00  stbi_load_from_file
  0.00      0.21     0.00        1     0.00    30.00  stbi_write_png_to_mem
  0.00      0.21     0.00        1     0.00     0.00  stbi_zlib_decode_malloc_guesssize_headerflag
</source>
|}

====== Conclusion ======
Since the <code>msaa</code> function I wrote is the hotspot of the program, taking 85.72 percent of the run time, I would suggest offloading part of it to a GPU, more specifically the part that finds the average of the colors of the nearby pixels. Within a single pass, each output pixel depends only on the input image and not on any other output pixel, so that part is a prime candidate for parallelization.

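
To illustrate why the averaging step suits a GPU, here is a minimal sketch that isolates the per-pixel work as a function reading only the input image. The names <code>Rgb</code>, <code>averageAt</code> and <code>blurPass</code> are hypothetical, and the sketch assumes a tightly packed 3-channel image; it is not the listing above, only the same idea in a form where each call to <code>averageAt</code> could become one GPU thread.

<source lang="cpp">
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Hypothetical helper: average colour of the (2 * radius + 1)^2 neighbourhood
// around pixel (px, py). Coordinates outside the image are clamped to the
// nearest edge pixel. Reads the input only, so calls are independent.
Rgb averageAt(const std::vector<std::uint8_t>& image, int width, int height,
              int px, int py, int radius) {
    long sum[3] = {0, 0, 0};
    int count = 0;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            const int x = std::min(std::max(px + dx, 0), width - 1);
            const int y = std::min(std::max(py + dy, 0), height - 1);
            const std::size_t base = (static_cast<std::size_t>(y) * width + x) * 3;
            sum[0] += image[base];
            sum[1] += image[base + 1];
            sum[2] += image[base + 2];
            ++count;
        }
    }
    return Rgb{static_cast<std::uint8_t>(sum[0] / count),
               static_cast<std::uint8_t>(sum[1] / count),
               static_cast<std::uint8_t>(sum[2] / count)};
}

// One pass over the whole image, writing into a separate output buffer.
std::vector<std::uint8_t> blurPass(const std::vector<std::uint8_t>& input,
                                   int width, int height, int radius) {
    std::vector<std::uint8_t> output(input.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const Rgb c = averageAt(input, width, height, x, y, radius);
            const std::size_t base = (static_cast<std::size_t>(y) * width + x) * 3;
            output[base] = c.r;
            output[base + 1] = c.g;
            output[base + 2] = c.b;
        }
    }
    return output;
}
</source>

Because <code>blurPass</code> never writes to its input, the two loops over <code>x</code> and <code>y</code> can be replaced by a grid of threads without changing the result. Multiple passes would still have to run one after another.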
==== Inna ====
Flat profile for compression:
 
[[File:BookComp.jpg|1300px]]
Flat profile for compression:
 
[[File:textCompression.jpg|1300px]]
Flat profile for compression:
 
[[File:FireComp.jpg|1300px]]