Open main menu

CDOT Wiki β

Changes

WhySoSerial?

927 bytes added, 21:08, 11 April 2017
no edit summary
== WordTranslator - Parallel Approach ==
 
[https://github.com/Pooch11/DPS915/tree/Parallel Parallel Solution Branch of GitHub ]
Issues arose when attempting to change data within a kernel on device memory
1. Kernels do not accept complex objects from the host ( Maps, vectors, strings)
 
2. Kernels load and execute on sequential memory. Device Pointers
 
3. Replacing the data using a character pointer (char*) proved exceedingly difficult.
 
Thus the solution had to be modified in order to accommodate these issues. A couple of options were available to overcome this.
1. We will match a pattern found by the kernel
 
2. We will record where the result was found and the position that we found this match. This would allow another more sophisticated device function to make these translations. This function on a CPU would be at most O(n^2)
 
3. Instead, introduce a structure to manage our complex data. (See Below)
 
[[File:Struct.PNG]]
Instead of using a result array __ballot(PREDICATE) could be considered (More research on this to be done).
 MPI - More knowledge of Message Passing Interface might be needed for a full solution to this problem.See [https://en.wikipedia.org/wiki/Message_Passing_Interface#Dynamic_process_management MPI Wikipedia]
=== Assignment 3 ===
 
==Optimizations==
Optimizations were different in nature than typical Optimizations for Kernel Launches.
'''Launch Configurations'''
First the configuration was optimized. At first threads were launched based on the length of the target text. Later, it was found more useful to launch as many threads as the device could hold for a single block.
[[File:Configurations.PNG]]
'''Pattern Matching'''
The algorithm used in this approach could be revised considerably since there is lots of overlap between the characters checked by threads.
 
'''''Knuth Morris-Pratt'''''
[https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm Link to Wiki explanation]
'''''Horspool'''''
[https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm Link to Wiki explanation]
 
'''''Shift Or'''''
 
[https://en.wikipedia.org/wiki/Bitap_algorithm Link to Wiki explanation]
 
'''Runtime'''
[[File:Runtime_CPUvsGPU.png]]
'''''Shift Or'''''
[[File:CPUvsGPU_Timing_Matching.PNG]]
'''Streaming Kernel Launch'''
'''Streaming'''
Instead of making changes to increase the efficiency of the Kernel, changes were made to incorporate the spirit of the original solution. Taking in multiple words from a lexicon and making changes to a large text.
Thus , streaming the launch of this kernel with alternate can be introduce to split the target data into manageable partitions. In addition, if the patterns are the same size, we can assist the program in finding multiple words in less timeuse streaming to look for more than one word concurrently.  [[File:Streaming_Kernel.PNG]]
76
edits