MATLAB Answers

Why does my GTX Titan Black GPU underperform in double precision calculations in MATLAB R2015a?

I experience unexpectedly slow performance of the GPU in double precision benchmarks.

I have a fast PC: Intel i7-4790 3.6 GHz, 16 GB of 1600 MHz memory, Windows 7 64-bit, and an NVIDIA GeForce GTX Titan Black GPU card in a PCIe 3.0 x16 slot, with an 850 W power supply. I have downloaded the video drivers and the CUDA toolkit and installed the MATLAB Parallel Computing Toolbox:

>> gpuDevice

ans =

  CUDADevice with properties:

                      Name: 'GeForce GTX TITAN Black'
                     Index: 1
         ComputeCapability: '3.5'
            SupportsDouble: 1
             DriverVersion: 7
            ToolkitVersion: 6.5000
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 6.4425e+09
           AvailableMemory: 6.2105e+09
       MultiprocessorCount: 15
              ClockRateKHz: 980000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

I then downloaded the GPU benchmarking tool by the MathWorks Parallel Computing Toolbox Team (the version updated 05 Jan 2015) from http://www.mathworks.com/matlabcentral/fileexchange/34080-gpubench

and ran “gpuBench”.

The results show that my GPU performs similarly to the Quadro K6000 in the single precision benchmarks (with deviations of up to 40%, as expected: both cards have the same number of CUDA cores, but the memory bandwidth is higher on my Titan Black and the amount of memory is higher on the K6000).

However, the GeForce GTX Titan Black is up to 4 times (!) slower than the Quadro K6000 in the double precision benchmarks! This is unexpected for several reasons.

A) Both cards are fairly similar:

Specification       K6000      Titan Black
CUDA cores          2880       2880
Clock               902 MHz    889 MHz
Memory clock        6 Gbps     7 Gbps
Memory bandwidth    288 GB/s   336 GB/s
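To put numbers on "fairly similar", here is a rough sketch of the theoretical single precision peak implied by the table. It assumes each CUDA core retires one fused multiply-add (2 floating-point operations) per clock cycle; that factor is an assumption of this estimate and is not stated in the post. Variable names are illustrative.

```matlab
% Values for [K6000, Titan Black], copied from the specification table above.
cores    = [2880 2880];
clockMHz = [902 889];

% Assumption: 2 floating-point operations (one fused multiply-add)
% per CUDA core per clock cycle.
flopsPerCorePerCycle = 2;

% Theoretical single precision peak in GFLOPS.
peakGflops = cores .* clockMHz * 1e6 * flopsPerCorePerCycle / 1e9;

fprintf('K6000:       %.0f GFLOPS\n', peakGflops(1));
fprintf('Titan Black: %.0f GFLOPS\n', peakGflops(2));
```

Under this assumption the two cards land within about 2% of each other (roughly 5.2 and 5.1 TFLOPS), so clock and core count alone would not explain a 4x gap.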

B) The attached file “Older benchmarks for GPUs” shows benchmarking tests done by the MathWorks Parallel Computing Toolbox Team. In those results, a GPU very similar to mine, the GeForce GTX Titan (an older GPU with 2688 CUDA cores, an 837 MHz clock, a 6 Gbps memory clock and 288 GB/s memory bandwidth), shows benchmarks very similar to the Quadro K6000:

                  DOUBLE (GFLOPS)               SINGLE (GFLOPS)
Card         MTimes  Backslash  FFT       MTimes  Backslash  FFT
K6000          1092        421  160         3017        831  334
GTX Titan      1106        352  150         2933        582  298
My GPU          252        163  110         4221        994  409

These results indicate that my GPU card (GeForce GTX Titan Black) should be similar to or faster than the Quadro K6000. However, its double precision performance is terrible (about 4x slower in MTimes).
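The ratios behind that comparison can be checked directly from the posted figures. The sketch below only redoes the arithmetic on the table values; the variable names are illustrative.

```matlab
% Benchmark figures copied from the table above.
% Columns: MTimes, Backslash, FFT.
k6000Double = [1092 421 160];
myGpuDouble = [ 252 163 110];
k6000Single = [3017 831 334];
myGpuSingle = [4221 994 409];

% How many times faster the K6000 is in double precision (>1 = K6000 faster).
k6000OverMineDouble = k6000Double ./ myGpuDouble;

% How many times faster the Titan Black is in single precision.
mineOverK6000Single = myGpuSingle ./ k6000Single;

% Single-to-double ratio on each card.
singleToDoubleMine  = myGpuSingle ./ myGpuDouble;
singleToDoubleK6000 = k6000Single ./ k6000Double;

fprintf('%-28s %6s %10s %6s\n', '', 'MTimes', 'Backslash', 'FFT');
fprintf('%-28s %6.2f %10.2f %6.2f\n', 'K6000 / my GPU (double)', k6000OverMineDouble);
fprintf('%-28s %6.2f %10.2f %6.2f\n', 'My GPU / K6000 (single)', mineOverK6000Single);
fprintf('%-28s %6.2f %10.2f %6.2f\n', 'Single / double (my GPU)', singleToDoubleMine);
fprintf('%-28s %6.2f %10.2f %6.2f\n', 'Single / double (K6000)', singleToDoubleK6000);
```

The double precision gap is largest in MTimes (about 4.3x) and smaller in Backslash and FFT, while the single-to-double ratio for MTimes is far larger on the Titan Black than on the K6000.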

1 Answer

Answer by MathWorks Support Team on 22 Apr 2015
 Accepted answer

In this particular case, double precision computing needs to be enabled, which can be done using the NVIDIA Control Panel. The external article below shows how this may be done.

http://forums.evga.com/When-to-Use-Double-Precision-under-NVIDIA-Control-Panel-Manage-3D-Settings-m2252867.aspx

In general, double precision can be much slower than single precision on many GPUs, because some of them are designed and optimized for single precision computation rather than for scientific calculations involving double precision numbers.
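A quick way to see the single versus double gap on a given card, without running the full benchmark suite, is to time one matrix multiply in each precision. This is a minimal sketch, assuming Parallel Computing Toolbox with a supported GPU selected; the matrix size and the 2*N^3 operation count are illustrative choices, and the figures it prints will not match gpuBench exactly.

```matlab
% Compare single and double precision matrix multiply throughput on the GPU.
N = 4096;                        % matrix size, chosen arbitrarily for illustration
types = {'single', 'double'};

% A dense N-by-N multiply costs roughly 2*N^3 floating-point operations.
gflopsOf = @(n, t) 2 * n^3 / t / 1e9;

for k = 1:numel(types)
    % Create the test data directly on the GPU in the requested precision.
    A = rand(N, N, types{k}, 'gpuArray');
    B = rand(N, N, types{k}, 'gpuArray');

    % gputimeit runs the function several times and synchronizes the GPU,
    % so the returned time covers the actual computation.
    t = gputimeit(@() A * B);

    fprintf('%-6s mtimes: %8.1f GFLOPS\n', types{k}, gflopsOf(N, t));
end
```

Running it before and after changing the driver setting should show whether the double precision figure moves while the single precision figure stays roughly the same.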

As we are unable to provide recommendations for GPU hardware, please contact NVIDIA directly for further information on this disparity in performance.

 
