Message boards : Graphics cards (GPUs) : Tesla
Author | Message |
---|---|
Just in case somebody is interested in how some exotic GPUs perform, I did a run on a Tesla K20c: | |
ID: 40419 | |
Thanks Mumak! Really interesting :) | |
ID: 40420 | |
Oh, I had ECC enabled on the Tesla. Switching it off and giving it another run... | |
ID: 40421 | |
Excellent power/performance ratio for 13 SMX (2496 CUDA cores) while computing FP32. Your GK110's 120 W operating point works out to about 0.048 W per core, or 9.23 W per 192-core SMX. 120 W is the domain of the 1024-core GTX 960. A 12-SMX GTX 780 can sip 145 W at 836 MHz or lower (reference base clock). | |
ID: 40422 | |
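For anyone who wants to redo the per-core and per-SMX arithmetic from the post above, here is a minimal sketch. The 13 SMX, 192 cores per SMX and 120 W figures are taken from that post; the function name `power_breakdown` is just a made-up helper for illustration.

```python
# Figures quoted in the post above: 13 SMX, 192 CUDA cores per SMX, ~120 W.
SMX_COUNT = 13
CORES_PER_SMX = 192
BOARD_POWER_W = 120.0


def power_breakdown(board_power_w, smx_count, cores_per_smx):
    """Split a board power figure evenly over cores and over SMX units."""
    total_cores = smx_count * cores_per_smx
    return {
        "total_cores": total_cores,
        "watts_per_core": board_power_w / total_cores,
        "watts_per_smx": board_power_w / smx_count,
    }


result = power_breakdown(BOARD_POWER_W, SMX_COUNT, CORES_PER_SMX)
print(f"{result['total_cores']} cores, "
      f"{result['watts_per_core']:.3f} W per core, "
      f"{result['watts_per_smx']:.2f} W per SMX")
```

This is only an even split of board power; it ignores memory, fans and voltage regulators, so treat the per-core number as a rough comparison figure rather than a measurement.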
You can use nvidia-smi to increase the application clocks. This will extract another 10% or so. | |
ID: 40423 | |
Thanks for the hint. Current/default clock is 705 MHz, max should be 758. | |
ID: 40424 | |
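A rough sketch of the `nvidia-smi` calls discussed above, written as a small Python helper that only builds the command lines and does not run them. The flags `-q -d SUPPORTED_CLOCKS`, `-ac <memory,graphics>` and `-e 0` are standard `nvidia-smi` options; the 2600 MHz memory clock and GPU index 0 are assumptions for a K20c and should be checked against the supported-clocks output first. The helper names are hypothetical.

```python
def supported_clocks_cmd(gpu_index=0):
    """Command that lists the memory/graphics clock pairs the GPU accepts."""
    return ["nvidia-smi", "-i", str(gpu_index), "-q", "-d", "SUPPORTED_CLOCKS"]


def set_app_clocks_cmd(mem_mhz, graphics_mhz, gpu_index=0):
    """Command that sets application clocks as a <memory,graphics> pair in MHz."""
    return ["nvidia-smi", "-i", str(gpu_index), "-ac", f"{mem_mhz},{graphics_mhz}"]


def disable_ecc_cmd(gpu_index=0):
    """Command that turns ECC off (usually needs admin rights and a reboot)."""
    return ["nvidia-smi", "-i", str(gpu_index), "-e", "0"]


def clock_gain_percent(default_mhz, max_mhz):
    """Upper bound on the speed-up if run time scaled perfectly with core clock."""
    return (max_mhz / default_mhz - 1.0) * 100.0


# 705 and 758 MHz come from the thread; 2600 MHz memory clock is an assumption.
for cmd in (supported_clocks_cmd(), set_app_clocks_cmd(2600, 758), disable_ecc_cmd()):
    print(" ".join(cmd))
print(f"clock gain: {clock_gain_percent(705, 758):.1f} %")
```

Going from 705 to 758 MHz is about 7.5 % more core clock, so that is roughly the most the clock change alone can buy if the work unit scales with core clock.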
I decided to try the max GPU clock (758 MHz); the WU has not finished yet. | |
ID: 40425 | |
NOELIA_ot, 758 MHz, ECC off: 27k seconds | |
ID: 40434 | |
"I expected better performance from a GPU of this quality..." There's no reason to. It's the same GK110 chip as in the GTX 780/Ti and Titan/Z. It gains energy efficiency by being run at very low clock speeds and voltages. To some extent this could easily be done on other cards as well, although most people will prefer a high-performance state anyway. A comparison to a stock GTX 960 sounds impressive, but that GPU is driven quite hard, up to maximum voltages around 1.20 V, and has a lot of room to run more efficiently for a minor performance loss (down to 1.10 - 1.00 V). "Maybe a FP64 ACEMD app will be created for those specific high performance DPFP GPUs?" Why? Even the super expensive Titan loses 2/3 of its maximum throughput in DP mode. If the app can get by with 32 bit, it's always better to use only 32 bit. That's why "mixed precision" with 16 bit FP enhancements will become a topic for nVidia with Pascal. A valid reason would be to use new physical models which might not be possible in FP32. But I don't think precision is the limit; more often it's probably the flow control which makes these tasks better suited to CPUs. MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 40589 | |
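Two back-of-the-envelope numbers behind the post above, as a hedged sketch. The first assumes the common rule of thumb that dynamic power scales with voltage squared times frequency, which is a simplification and not something measured in this thread. The second uses the usual theoretical peak of 2 FLOP per core per clock and the 1/3 FP64 rate implied by "loses 2/3 of its maximum throughput"; the 2496 cores and 705 MHz are the K20c figures quoted earlier. Function names are made up for illustration.

```python
def relative_dynamic_power(v_old, v_new, f_old=1.0, f_new=1.0):
    """Power ratio new/old, assuming dynamic power ~ V^2 * f (rule of thumb)."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


def peak_gflops(cores, clock_mhz, flops_per_core_per_clock=2):
    """Theoretical peak in GFLOPS, counting one fused multiply-add as 2 FLOP."""
    return cores * flops_per_core_per_clock * clock_mhz / 1000.0


# Voltage range mentioned in the post: 1.20 V down to 1.10 V and 1.00 V.
for v in (1.10, 1.00):
    ratio = relative_dynamic_power(1.20, v)
    print(f"1.20 V -> {v:.2f} V at the same clock: {ratio:.2f}x dynamic power")

# K20c figures from the thread; the FP64 rate of 1/3 is taken from the post.
fp32 = peak_gflops(2496, 705)
fp64 = fp32 / 3.0
print(f"FP32 peak: {fp32:.0f} GFLOPS, FP64 peak at 1/3 rate: {fp64:.0f} GFLOPS")
```

Real cards also have leakage and usually need a lower clock at lower voltage, so the power ratios are an optimistic estimate, and real applications rarely reach theoretical peak throughput.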