Performance Comparisons: Half, Half2 and Float
A performance evaluation was conducted on an Nvidia L40, comparing the access times over 100 iterations for device vectors of type half, half2, and float. Each vector was initialized with 1024*1024 elements; for the half2 type, two half values were packed into a single vector entry. Two random-access patterns were therefore tested for half2: random access per half2 entry and random access per individual half.

Access Type   Data Type   Vector Size   Allocated Memory   Time (ms)
Random        half        1M          2MB                4....
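
To make the two half2 access patterns concrete, here is a minimal host-side C++ sketch of the index arithmetic involved. It is an illustration only, not the benchmark code: the names allocated_bytes, Half2Slot, and locate_half are hypothetical, and the element sizes assume the usual CUDA layout (2 bytes for half, 4 bytes for half2, 4 bytes for float). It also assumes that the two packed values sit at consecutive half indices within one entry.

```cpp
#include <cstddef>

// Assumed element sizes in bytes (CUDA __half, __half2, float).
constexpr std::size_t kHalfBytes  = 2;
constexpr std::size_t kHalf2Bytes = 4;
constexpr std::size_t kFloatBytes = 4;

// Bytes needed for a vector with `entries` entries of the given size.
constexpr std::size_t allocated_bytes(std::size_t entries,
                                      std::size_t bytes_per_entry) {
    return entries * bytes_per_entry;
}

// Position of one half value inside a half2-packed vector (hypothetical type).
struct Half2Slot {
    std::size_t entry;  // index of the half2 entry in the vector
    unsigned    lane;   // 0 = first packed value, 1 = second packed value
};

// "Random access per half": a random half index is split into the half2
// entry that holds it and the lane within that entry.
// "Random access per half2" would use the entry index directly and read
// both lanes at once.
constexpr Half2Slot locate_half(std::size_t half_index) {
    return { half_index / 2, static_cast<unsigned>(half_index % 2) };
}
```

Under these assumptions, allocated_bytes(1024 * 1024, kHalfBytes) gives 2 MB, which agrees with the first table row, and locate_half(5) resolves to entry 2, lane 1. Per-half access therefore touches the same 4-byte entry as per-half2 access but uses only half of it.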