I recently upgraded to a “GeForce GT 640” graphics card, which has an Nvidia chipset. As explained in the two previous posts, I reinstalled version 5.5 of the CUDA package from Nvidia.
I then compiled the sample programs that come with CUDA 5.5 and tested matrix product performance on the GPU (the graphics card). I get 279 GFlops, about 30 times faster than my previous GPU, which managed about 9 GFlops, so CUDA lives up to its promise …
$ /usr/local/cuda-5.5/samples/bin/x86_64/linux/release/matrixMulCUBLAS [ENTER]
[Matrix Multiply CUBLAS] – Starting…
GPU Device 0: “GeForce GT 640” with compute capability 3.0
MatrixA(320,640), MatrixB(320,640), MatrixC(320,640)
Computing result using CUBLAS…done.
Performance= 279.37 GFlop/s, Time= 0.469 msec, Size= 131072000 Ops
Computing result using host CPU…done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS
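
As a quick sanity check on the numbers above, here is a small Python sketch that recomputes the GFlop/s figure from the reported operation count and time. The helper names (`matmul_flops`, `gflops`) are my own, and I am assuming the "Size" value is the usual 2·M·N·K operation count for a matrix product with factors 320, 640 and 320; that assumption does reproduce the 131072000 Ops printed by the sample.

```python
def matmul_flops(m, n, k):
    """Floating point operations for an (m x k) by (k x n) product:
    one multiply and one add per inner-product term."""
    return 2 * m * n * k


def gflops(ops, time_ms):
    """Convert an operation count and a time in milliseconds to GFlop/s."""
    seconds = time_ms * 1e-3
    return ops / seconds / 1e9


# Assumed factors; they reproduce the "Size" value from the sample output.
ops = matmul_flops(320, 640, 320)

# Time as printed by the sample (rounded to three decimals).
rate = gflops(ops, 0.469)

# Ratio of the reported 279.37 GFlop/s to the roughly 9 GFlop/s of the old card.
speedup = 279.37 / 9.0

print("ops     =", ops)
print("GFlop/s = %.2f" % rate)
print("speedup = %.1f" % speedup)
```

The recomputed rate comes out slightly above the 279.37 GFlop/s printed by the sample, presumably because the sample prints the time rounded to 0.469 msec but computes the rate from the unrounded value.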