Dave
Hi,
has anyone noticed that memory clock speeds are lower while running GPUGRID on Maxwell cards?
I found this topic in the Einstein@Home community:
http://einstein.phys.uwm.edu/forum_thread.php?id=11044
I checked my memory clocks and found that my card was running at only 1502 MHz instead of 1750 MHz.
How does this affect GPUGrid performance?
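For anyone who wants to check their own card's reported memory clock without extra GUI tools, a minimal sketch using the NVML Python bindings (pynvml) is below; it assumes the NVIDIA driver and the bindings are installed, and note that different tools report the memory clock on different scales, so compare relative rather than absolute values.
```python
# Sketch: read the current and maximum memory clock via NVML (pynvml).
# Assumes the NVIDIA driver and the nvidia-ml-py package are installed.
# NVML reports the memory clock on a different scale than nVidia
# Inspector, so compare ratios rather than absolute numbers.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU in the system
    current = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_MEM)
    maximum = pynvml.nvmlDeviceGetMaxClockInfo(handle, pynvml.NVML_CLOCK_MEM)
    print(f"memory clock: {current} MHz (maximum supported: {maximum} MHz)")
finally:
    pynvml.nvmlShutdown()
```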
Edit:
That thread contains the original error description; the section "Increasing memory clock speed" should have the information you need to correct it.
It also covers how to overclock the memory, should you wish to do so. The gains are rather limited at GPU-Grid, typically about a 2% performance increase for a GTX970 going from 6 to 7 GHz memory clock, but it adds up nevertheless if many people do it. And I expect the gain to be larger for a GTX980.
To apply the settings from nVidia Inspector upon each boot:
- in nVidia Inspector right click on "Create Clocks Shortcut" and choose "Create Clock Startup Task"
- or click "Create Clocks Shortcut" and run the created link automatically via the Windows Task Scheduler
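For those who prefer to register the logon task by hand, a rough sketch of the second option is below, using Windows' schtasks; the shortcut path is only a placeholder for whatever "Create Clocks Shortcut" produced on your machine.
```python
# Sketch: register the shortcut created by nVidia Inspector's
# "Create Clocks Shortcut" as a task that runs at every logon.
# The shortcut path is a placeholder -- point it at your own file.
import subprocess

shortcut = r"C:\Users\you\Desktop\nvidiaInspector.lnk"  # placeholder path

# cmd /c start "" <shortcut> lets Task Scheduler launch a .lnk file.
subprocess.run(
    ["schtasks", "/Create",
     "/TN", "Set GPU clocks",                 # arbitrary task name
     "/TR", f'cmd /c start "" "{shortcut}"',  # what to run
     "/SC", "ONLOGON",                        # trigger: at logon
     "/RL", "HIGHEST",                        # elevated, as Inspector requires
     "/F"],                                   # overwrite an existing task
    check=True,
)
```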
MrS
____________
Scanning for our furry friends since Jan 2002
TJ
Perhaps this is something particular to these little Maxwells with this project. I see it on my 970 SC from EVGA and on my EVGA 980 SC. I had hoped the latter would do better with the memory clock; unfortunately it runs at 1502.3 MHz too, and it will not go higher. But a GTX780 costs almost 300 Euro more for the card's housing and the 384 extra CUDA cores.
____________
Greetings from TJ
Update:
It seems I was wrong about the memory OC causing my blue screens! Encouraged by successful runs at 3.5 GHz over at Einstein I tried again, and it has been running flawlessly for almost 2 days :)
So the message to get out to GM204 owners is simply: use nVidia Inspector to set 3.5 GHz in P2! It's less important at GPU-Grid than at Einstein or SETI, but it still helps a bit.
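To verify that the card really stays in P2 at the higher memory clock while crunching, a quick check via nvidia-smi's query interface works; the sketch below is illustrative only, and the clock it reports may be on a different scale than nVidia Inspector's.
```python
# Sketch: poll the performance state, memory clock and memory controller
# load every 10 seconds while a GPU task is running.
import subprocess
import time

QUERY = "pstate,clocks.mem,utilization.memory"

for _ in range(6):
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(out)  # e.g. "P2, 3505 MHz, 45 %"
    time.sleep(10)
```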
@TJ: no, this is not limited to this project. So far I'm counting four CUDA programs, one OpenCL program and one where I don't know the API being used. All of them are affected, and not a single counter-example has been found. So it's safe to assume it simply affects all GP-GPU programs, be they CUDA or OpenCL.
MrS
____________
Scanning for our furry friends since Jan 2002
TJ
Thanks for the explanation and confirmation, ETA.
I will try the nVidia Inspector settings.
____________
Greetings from TJ
Overclocking the memory:
Thanks to skgiven I can now overclock my memory! Here's how:
- the GPU must not be crunching BOINC (either pause your GPU project, or all GPUs, or suspend BOINC completely)
- in the nVidia Inspector OC tab set the overclock for P0 (because you can't go any higher than this in P2)
- now you can set the memory clock for P2 up to this same value as well
- apply & have fun
I cannot yet say how much performance this will bring, but given the relatively small gain going from 3.0 to 3.5 GHz we shouldn't expect wonders at GPU-Grid.
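As a sanity check after applying the overclock, NVML can report the current memory clock and, on some boards, the list of supported memory clocks; the sketch below uses pynvml and is purely illustrative.
```python
# Sketch: show the memory clock currently applied and, where the driver
# exposes it, the list of supported memory clocks.
# nvmlDeviceGetSupportedMemoryClocks is not available on every GeForce
# board/driver combination, hence the try/except.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    current = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_MEM)
    print("current memory clock (MHz):", current)
    try:
        supported = pynvml.nvmlDeviceGetSupportedMemoryClocks(handle)
        print("supported memory clocks (MHz):", sorted(supported, reverse=True))
    except pynvml.NVMLError:
        print("supported-clock list not exposed by this driver/board")
finally:
    pynvml.nvmlShutdown()
```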
MrS
____________
Scanning for our furry friends since Jan 2002
Dave
Memory controller load running Einstein is at 80%; GPUGrid is at around 50%. So the assumption seems right that GPUGrid doesn't benefit all that much from higher memory clocks in the end.
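The memory controller load both tools display can also be sampled directly; below is a minimal pynvml sketch that averages a few readings while a task is running.
```python
# Sketch: sample the memory controller load (and GPU load) a few times
# while a work unit is running, then print the averages.
import time
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    gpu_samples, mem_samples = [], []
    for _ in range(30):
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        gpu_samples.append(util.gpu)     # shader/core utilization in %
        mem_samples.append(util.memory)  # memory controller load in %
        time.sleep(1)
    print(f"GPU load: {sum(gpu_samples) / len(gpu_samples):.0f} %")
    print(f"memory controller load: {sum(mem_samples) / len(mem_samples):.0f} %")
finally:
    pynvml.nvmlShutdown()
```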
I don't have solid statistical data yet, but the benefit grows with the memory controller load, which depends on the WU type. For NOELIA_20MG, where the load went from ~50% at 3.0 GHz to ~40% at 3.75 GHz, I'm seeing approximately a 3.3% performance increase. It's not as much as in other projects, but a nice amount of "free" credits anyway. Power draw increased by only a few watts.
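Power draw can be logged the same way for anyone who wants to repeat the before/after comparison; a small pynvml sketch is below (NVML reports milliwatts).
```python
# Sketch: read the current board power draw and the enforced power limit.
# NVML reports both in milliwatts.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    draw = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
    limit = pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000.0
    print(f"power draw: {draw:.1f} W of {limit:.1f} W limit")
finally:
    pynvml.nvmlShutdown()
```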
MrS
____________
Scanning for our furry friends since Jan 2002
skgiven (Volunteer moderator, Volunteer tester)
One thing to watch out for is a decrease in GPU clocks due to a user-defined power cap (100% or 105%...); if you increase the GDDR speed slightly but the GPU is restricted by your power limit, the gains from the faster GDDR are lost. It would also prevent an accurate measurement of the performance increase when running at 7 GHz rather than 6 GHz, which may be why the reported results are the way they are. My guess is that there is something to be gained here, but probably more where the memory controller load is higher.
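Whether the power cap is actually holding the GPU clock back can be checked through NVML's throttle reasons; the sketch below uses pynvml and standard NVML reason flags, and is only illustrative.
```python
# Sketch: check whether the card is currently throttling because of the
# software power cap (or a hardware slowdown) while a task is running.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    reasons = pynvml.nvmlDeviceGetCurrentClocksThrottleReasons(handle)
    if reasons & pynvml.nvmlClocksThrottleReasonSwPowerCap:
        print("GPU clock is being limited by the power target")
    if reasons & pynvml.nvmlClocksThrottleReasonHwSlowdown:
        print("GPU clock is being limited by a hardware slowdown (thermal/power)")
    if reasons == 0:
        print("no throttling reported")
finally:
    pynvml.nvmlShutdown()
```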
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help
Agreed - that's why I raised my power limit by 1% (2 W) when I increased the memory clock, to keep my card working at a fairly efficient ~1.10 V without "underusing my new gem".
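The same adjustment can be made in watts with nvidia-smi for those not using Inspector; the sketch below reads the default limit and raises it by 1%. It needs administrator rights, and not every board allows raising the limit.
```python
# Sketch: read the card's default power limit and raise it by 1 %.
# Requires administrator/root rights; some boards refuse a higher limit.
import subprocess

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=power.default_limit",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout.strip()
default_w = float(out.splitlines()[0])

new_limit = round(default_w * 1.01)  # +1 %, as described above
subprocess.run(["nvidia-smi", "-pl", str(new_limit)], check=True)
print(f"power limit raised from {default_w:.0f} W to {new_limit} W")
```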
MrS
____________
Scanning for our furry friends since Jan 2002
Memory OC doesn't even give 0.1 GFlops more...
I tested my GTX580 at 855 MHz / 2100 MHz = 1750 GFlops and at 855 MHz / 1040 MHz = 1750 GFlops.
Overclocking the memory just makes the graphics card eat more watts...
Memory OC is only useful in games...
skgiven (Volunteer moderator, Volunteer tester)
The GTX580 is GF110, not GM204.
Performance is not the same thing as GFlops.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help
UnknownPL1337, it seems you still have to learn a lot.
Anything that measures performance in terms of GFlops is very likely a low-level benchmark. It runs some very simplistic calculations, designed to extract the maximum performance from the hardware. The theoretical GFlops of a GPU are just "clock speed * number of shaders * operations per shader per clock".
Real applications can't sustain this speed because something always gets in the way. For example, dependencies between instructions have to be resolved first, or data may have to be fetched from memory.
It's true that faster memory doesn't calculate anything, but it keeps your chip from being slowed down by memory operations.
Whatever you used to measure those 1750 GFlops was either just calculating them from your GPU clock speed and architecture, or was running some low-level test designed not to be held back by memory operations.
It's also true that on your GTX580 the memory speed does not matter much. It's simply got enough bandwidth to feed the shaders. GM204 has to feed 4 times as many shaders, at comparable clock speeds, with just ~10% more memory bandwidth. The other Maxwell chips (GM107, GM206, GM200) are balanced in a similar way and thus also benefit far more from memory overclocks than your card.
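A rough back-of-the-envelope comparison makes the imbalance visible. The sketch below uses approximate reference specs (not measurements from this thread), and the GTX 580 figure uses its hot (shader) clock, which runs at twice the core clock.
```python
# Sketch: theoretical single-precision GFLOPS vs. memory bandwidth.
# GFLOPS = shader clock (GHz) * shader count * 2 (an FMA counts as two FLOPs).
# The figures below are approximate reference specs, not measurements.
cards = {
    #           shaders, shader clock (MHz), memory bandwidth (GB/s)
    "GTX 580": (512,     1544,               192),  # Fermi hot clock = 2x core
    "GTX 970": (1664,    1178,               224),
    "GTX 980": (2048,    1216,               224),
}

for name, (shaders, clock_mhz, bandwidth) in cards.items():
    gflops = shaders * (clock_mhz / 1000.0) * 2
    print(f"{name}: {gflops:5.0f} GFLOPS, {bandwidth} GB/s "
          f"-> {gflops / bandwidth:.1f} FLOPs per byte of bandwidth")
```
The ratio of compute to bandwidth is roughly two to three times higher on GM204 than on GF110, which is consistent with the point above: the same memory clock deficit costs Maxwell far more than it costs a GTX580.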
MrS
____________
Scanning for our furry friends since Jan 2002
eXaPower
MrS wrote: "Anything that measures performance in terms of GFlops is very likely a low-level benchmark. It runs some very simplistic calculations, designed to extract the maximum performance from the hardware. The theoretical GFlops of a GPU are just 'clock speed * number of shaders * operations per shader per clock'."
Even so: GFlops is a basic metric with a standard, well-known meaning (which is why it is heard so often). NVidia quotes it in presentations, BOINC uses it, and the GFLOPS/W ratio decides the placement of the greenest TOP500 supercomputers. It is also applied in a number of performance ratios: GFlops per core, (M)GFlops per instructions per cycle, GFlops per bandwidth, and so on.
Yes, its meaning is known... and so are its limitations ;)
MrS
____________
Scanning for our furry friends since Jan 2002
skgiven (Volunteer moderator, Volunteer tester)
GFlops is a theoretical maximum, expressed against double precision or, in the case of this project, single precision (FP16 will be added with Volta).
GPUs have different but fixed architectures, so actual GPU performance depends on how the app and task expose architectural weaknesses (bottlenecks); what the GPU has been asked to do and how it has to do it. Different architectures are relatively better or worse at different things.
WRT NVidia, GFlops is a reasonably accurate way of comparing cards' performance within a series (or two series based on the same architecture), as it's based on what they are theoretically capable of (which can be calculated). However, there are other 'factors' which have to be considered when looking at performance running a specific app.
As MrS said, by calculating and applying these 'correction factors' against compute capabilities we were able to compare the performance of cards from different generations, up to and including Fermi. With Kepler we saw a greater variety of architectures within the two series, so there were additional factors to consider.
Differences in bandwidth, boost, cache size, memory rates and memory size came to the fore as important considerations when comparing Kepler cards or thinking about buying one to crunch here - these impacted actual performance.
Cache size variation seemed to be important, with larger caches being less restrictive.
Cards of the same type boosted to around the same speed, irrespective of the price tag or number of fans.
Some cards in a series were even from a different generation - GF rather than GK.
Bandwidth was lower for some Keplers and was a significant impediment, varying with WU type. This is still the case with Maxwells; some WUs require more bandwidth than others. For example, running one WU type (say a NOELIA_pnpx) on a GTX970 might incur a 49% memory controller load (MCL) and 56% MCL on a GTX980, whereas other WUs would only incur a 26% MCL on a GTX970 and 30% on a GTX980. In the latter case a GTX980 might, for the sake of argument, appear to be 19% faster than a GTX970, whereas with the NOELIA WUs the GTX970 would do slightly better relatively, with the GTX980 only 15% faster. Increasing the memory rate (to what it is supposed to be) alleviates the MCL, but the gain is less noticeable if the MCL is low to begin with.
The GDDR5 usage I'm seeing from a NOELIA_pnpx task is 1.144GB, so it's not going to perform well on a 1GB card!
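Anyone curious how much memory a given task takes on their own card can read it out per process; the pynvml sketch below is illustrative, and on some drivers the per-process figure is not exposed.
```python
# Sketch: show total GDDR5 usage and, where the driver exposes it,
# the memory used by each compute process on the card.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GDDR5 used: {mem.used / 1024**3:.3f} of {mem.total / 1024**3:.1f} GB")
    for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
        used = proc.usedGpuMemory
        used_str = f"{used / 1024**3:.3f} GB" if used else "n/a"
        print(f"  pid {proc.pid}: {used_str}")
finally:
    pynvml.nvmlShutdown()
```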
Comparing NVidia cards to AMD's ATI range based on GFlops is almost pointless without considering the app. Equally pointless is comparing apps that use different APIs: OpenCL vs. CUDA.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help
eXaPower
[skgiven's post above quoted in full]
Thorough points -- knowing the architectural differences (strengths and weaknesses) is key for peak instruction output, as are the underlying data transfers. To fully understand the effect different types of code have on any "compute" device, one has to be knowledgeable in many areas. There are many facets to master, always learning by failure and practice. It takes many years to learn the language(s) of computers: a never-ending journey. Deferring to people (the experts) who have worked with computers since the early days is a natural course for those willing to learn. The ones who started it all can teach younger heads much.