GPU610/Team AGC

As you can see, most of the time is spent in the third and fourth blocks, so that is where I will begin optimizing.
Since npoints is 800 in total, and that work is divided among separate CPU threads, we will never reach the maximum of 1024 threads per block.
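The arithmetic behind that claim can be sketched on the host side. This is only an illustration, assuming one GPU thread per point and a one-dimensional launch; the names `LaunchConfig`, `chooseLaunchConfig`, and `kMaxThreadsPerBlock` are hypothetical and do not come from the project code.

```cpp
#include <cassert>

// Per-block thread limit quoted in the text above.
constexpr int kMaxThreadsPerBlock = 1024;

// Hypothetical container for a 1D launch configuration.
struct LaunchConfig {
    int blocks;           // number of blocks in the grid
    int threadsPerBlock;  // number of threads in each block
};

// Hypothetical helper: pick a 1D configuration for n points,
// assuming one thread handles one point.
LaunchConfig chooseLaunchConfig(int n) {
    assert(n > 0);
    // Use n threads if they fit in one block, otherwise cap at the limit.
    int threads = n < kMaxThreadsPerBlock ? n : kMaxThreadsPerBlock;
    // Ceiling division: enough blocks to cover all n points.
    int blocks = (n + threads - 1) / threads;
    return {blocks, threads};
}
```

Under these assumptions, `chooseLaunchConfig(800)` gives a single block of 800 threads, so the 1024 limit never comes into play; any smaller per-CPU-thread share of the points fits in one block as well. In practice the thread count is often rounded up to a multiple of the warp size (32), with a bounds check inside the kernel.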

