CDOT Wiki β


GPU621/Threadless Horsemen

970 bytes added, 13:14, 26 November 2018
OpenMP vs Julia Results
* -O: Set the optimization level (the default level is 2 if the flag is unspecified, or 3 if -O is used without a level)
== Vectorization ==
* We want to briefly touch on vectorization. The table compares an axpy loop marked with @simd against the scalar loop it is equivalent to.
{| class="wikitable"
|-
! Using Vectorization
! Expanded axpy function
|-
|
<source>
function axpy(a,x,y)
    # @simd lets the compiler vectorize the loop
    @simd for i=1:length(x)
        # @inbounds skips the bounds check on x[i] and y[i]
        @inbounds y[i] += a*x[i]
    end
end

n = 1003
x = rand(Float32,n)
y = rand(Float32,n)
axpy(1.414f0, x, y)
</source>
|
<source>
function axpy(a::Float32, x::Array{Float32,1}, y::Array{Float32,1})
    n=length(x)
    i = 1
    @inbounds while i<=n
        t1 = x[i]
        t2 = y[i]
        t3 = a*t1      # t1 is already a scalar, so it is not indexed again
        t4 = t2+t3
        y[i] = t4
        i += 1
    end
end
</source>
|}
* The @simd macro gives the compiler license to vectorize without checking whether doing so will change the program's visible behavior.
* The vectorized code will behave as if it had been written to operate on chunks of the arrays.
* @inbounds turns off subscript checking that might otherwise throw an exception.
* Make sure your subscripts are in bounds before using @inbounds, or you might corrupt your Julia session.
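One way to follow the advice about @inbounds is to validate the array sizes once, before the loop, so that the unchecked loop body can never index out of range. The following is a minimal sketch under that assumption; the name axpy_checked! is hypothetical and is not part of the original page or of Julia's standard library.

<source>
# Hypothetical helper: check the sizes once, then run the unchecked loop.
function axpy_checked!(a, x::AbstractVector, y::AbstractVector)
    # Refuse mismatched inputs up front; this is the only check performed.
    length(x) == length(y) ||
        throw(DimensionMismatch("x and y must have the same length"))
    # eachindex(x, y) yields only indices that are valid for both arrays.
    @inbounds @simd for i in eachindex(x, y)
        y[i] += a * x[i]
    end
    return y
end

n = 1003
x = rand(Float32, n)
y = rand(Float32, n)
expected = y .+ 1.414f0 .* x   # reference result computed without @simd
axpy_checked!(1.414f0, x, y)
</source>

After the call, y should agree with expected up to floating-point rounding, and passing vectors of different lengths raises a DimensionMismatch instead of reading or writing out of bounds.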
== Conclusion ==