Hello everyone,
I'm creating this thread to document my GPUGrid GPU Task performance variances, while testing things such as:
- GPU task with no other tasks
- GPU task with full CPU load
- GPU task with overloaded CPU load
- Multiple GPU tasks on 1 video card
My system (as of right now) is:
Intel Core i7 965 Extreme (quad-core, hyper-threaded, Windows sees 8 processors)
Memory: 6GB
GPU device 0: eVGA GeForce GTX 660 Ti 3GB FTW (primary display)
GPU device 1: eVGA GeForce GTX 460 (not connected to any display)
OS: Windows 8 Pro x64 with Media Center
So far, I have some interesting results to share, and would like to "get the word out". If you'd like to share your results within this thread, feel free.
Regards,
Jacob |
|
|
|
I originally did some performance testing in another thread, but wanted the results consolidated into this "GPU Task Performance" thread.
That thread is titled "app_config.xml", and is located here:
http://www.gpugrid.net/forum_thread.php?id=3319
Note: The post within that thread, which contains the app_config values that I recommend using, can be found here:
http://www.gpugrid.net/forum_thread.php?id=3319#29216 |
|
|
|
Here are the first results (from running only on my GTX 660 Ti), copied from that thread:
========================================================================
Running with no other tasks (every other BOINC task and project was suspended, so the single GPUGrid task was free to use up the whole CPU core):
Task: 6669110
Name: I23R54-NATHAN_dhfr36_3-17-32-RND2572_0
URL: http://www.gpugrid.net/result.php?resultid=6669110
Run time (sec): 19,085.32
CPU time (sec): 19,043.17
========================================================================
Running at <cpu_usage>0.001</cpu_usage>, BOINC set at 100% processors, along with a full load of other GPU/CPU tasks:
Task: 6673077
Name: I11R21-NATHAN_dhfr36_3-18-32-RND5041_0
URL: http://www.gpugrid.net/result.php?resultid=6673077
Run time (sec): 19,488.65
CPU time (sec): 19,300.91
Task: 6674205
Name: I25R97-NATHAN_dhfr36_3-13-32-RND4438_0
URL: http://www.gpugrid.net/result.php?resultid=6674205
Run time (sec): 19,542.35
CPU time (sec): 19,419.97
Task: 6675877
Name: I25R12-NATHAN_dhfr36_3-19-32-RND6426_0
URL: http://www.gpugrid.net/result.php?resultid=6675877
Run time (sec): 19,798.77
CPU time (sec): 19,606.33
========================================================================
CONCLUSION:
So, as expected, there is some minor CPU contention whilst under full load, but not much (Task Run time is maybe ~3% slower). It's not affected much because the ACEMD process actually runs at a higher priority than other BOINC task processes, and therefore is never starved for CPU; at most it loses a little time to CPU process context switching. |
|
|
|
Here are some more results, where I focused on the "short" Nathan units:
========================================================================
Running with no other tasks (every other BOINC task and project was suspended, so the single GPUGrid task was free to use up the whole CPU core):
Task: 6678769
Name: I1R110-NATHAN_RPS1_respawn3-10-32-RND4196_2
URL: http://www.gpugrid.net/result.php?resultid=6678769
Run time (sec): 8,735.43
CPU time (sec): 8,710.61
Task: 6678818
Name: I1R42-NATHAN_RPS1_respawn3-12-32-RND1164_1
URL: http://www.gpugrid.net/result.php?resultid=6678818
Run time (sec): 8,714.75
CPU time (sec): 8,695.18
========================================================================
Running at <cpu_usage>0.001</cpu_usage>, BOINC set at 100% processors, along with a full load of other GPU/CPU tasks:
Task: 6678817
Name: I1R436-NATHAN_RPS1_respawn3-13-32-RND2640_1
URL: http://www.gpugrid.net/result.php?resultid=6678817
Run time (sec): 8,949.63
CPU time (sec): 8,897.27
Task: 6679874
Name: I1R414-NATHAN_RPS1_respawn3-7-32-RND6785_1
URL: http://www.gpugrid.net/result.php?resultid=6679874
Run time (sec): 8,828.17
CPU time (sec): 8,786.48
Task: 6679828
Name: I1R152-NATHAN_RPS1_respawn3-5-32-RND8187_0
URL: http://www.gpugrid.net/result.php?resultid=6679828
Run time (sec): 8,891.22
CPU time (sec): 8,827.11
========================================================================
CONCLUSION:
So, again, as expected, there is only slight contention while under full CPU load, because the ACEMD process runs at a higher priority than other BOINC task processes, and therefore is never starved for CPU; at most it loses a little time to CPU process context switching. |
|
|
|
So, previously, I was only running 1 GPU Task on that GPU (and the GPU Load would usually be around 87%-88%). But I wanted to find out what would happen when I run 2.
So, the following tests will use <gpu_usage>0.5</gpu_usage> ... in my app_config.xml.
Note: The GPU Load goes to ~97% when I do this, and I believe that's a good thing!
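For reference, here is a minimal sketch of the kind of app_config.xml entry involved (showing only the long-run app; the full file I use, with all four GPUGrid app names, is quoted later in this thread):
<app_config>
<app>
<name>acemdlong</name>
<gpu_versions>
<!-- 0.5 tells BOINC each task needs half a GPU, so it schedules 2 per card -->
<gpu_usage>0.5</gpu_usage>
<!-- 0.001 so BOINC doesn't budget a CPU core for the GPU task -->
<cpu_usage>0.001</cpu_usage>
</gpu_versions>
</app>
</app_config>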
========================================================================
Long-run Nathan tasks...
Running at <cpu_usage>0.001</cpu_usage>, <gpu_usage>0.5</gpu_usage>, BOINC set at 100% processors, along with a full load of other GPU/CPU tasks:
Name: I19R1-NATHAN_dhfr36_3-22-32-RND2354_0
URL: http://www.gpugrid.net/result.php?resultid=6684711
Run time (sec): 35,121.51
CPU time (sec): 34,953.33
Name: I6R6-NATHAN_dhfr36_3-18-32-RND0876_0
URL: http://www.gpugrid.net/result.php?resultid=6685136
Run time (sec): 39,932.98
CPU time (sec): 39,549.67
Name: I22R42-NATHAN_dhfr36_3-15-32-RND5482_0
URL: http://www.gpugrid.net/result.php?resultid=6685907
Run time (sec): 35,077.12
CPU time (sec): 34,889.61
Name: I31R89-NATHAN_dhfr36_3-21-32-RND1236_0
URL: http://www.gpugrid.net/result.php?resultid=6687190
Run time (sec): 35,070.94
CPU time (sec): 34,901.26
Name: I8R42-NATHAN_dhfr36_3-22-32-RND2877_1
URL: http://www.gpugrid.net/result.php?resultid=6688517
Run time (sec): 32,339.90
CPU time (sec): 32,082.15
========================================================================
Short-run Nathan tasks...
Running at <cpu_usage>0.001</cpu_usage>, <gpu_usage>0.5</gpu_usage>, BOINC set at 100% processors, along with a full load of other GPU/CPU tasks:
Name: I1R318-NATHAN_RPS1_respawn3-11-32-RND9241_0
URL: http://www.gpugrid.net/result.php?resultid=6684931
Run time (sec): 12,032.03
CPU time (sec): 11,959.47
Name: I1R303-NATHAN_RPS1_respawn3-14-32-RND0610_0
URL: http://www.gpugrid.net/result.php?resultid=6690144
Run time (sec): 14,621.04
CPU time (sec): 10,697.88
========================================================================
CONCLUSIONS:
Long-run Nathan units:
1-at-a-time + full CPU load: ~19,600 run time per task
2-at-a-time + full CPU load: ~35,100 run time per task
Speedup: 1 - (35,100 / (19,600 * 2)) = 10.5% improvement
Short-run Nathan units:
1-at-a-time + full CPU load: ~8,900 run time per task
2-at-a-time + full CPU load: ~13,300 run time per task
Speedup: 1 - (13,300 / (8,900 * 2)) = 25.3% improvement
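(In other words, both calculations use: improvement = 1 - (run time per task at 2-at-a-time) / (2 x run time per task at 1-at-a-time); anything above zero means the pair of tasks finished sooner together than they would have back-to-back.)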
So far, it looks like running multiple tasks at a time... GETS WORK DONE QUICKER!
Now, admittedly, I am estimating from very few results here, but... I'll continue using this "2-at-a-time" approach, and will reply here if I find anything different. |
|
|
|
This is very good info. However, I need to point out a couple of potential downside issues:
1) even with 2 tasks per GPU via app_config.xml, it does not increase the number of tasks you can download. For example, on my 4 GPU machine, it normally has 4 running, and 4 waiting to run. Running 8 at once means all 8 are running. So now there is a delay between the time a task completes, uploads, reports, a new task is downloaded (big file), and starts running. That *may* wipe out any utilization advantage.
2) The longer run-time with 2 tasks per GPU *may* cause them to miss the credit bonus for early returns.
YMMV
____________
Reno, NV
Team: SETI.USA
|
|
|
|
Point 1: ideally this would average out after some time, so that the different WUs per GPU finish at different times. Depending on your upload speed this might provide enough overlap to avoid running dry. Having more GPUs & WUs in flight should help with this issue.
Point 2: correct!
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
Beyond |
Point 1: ideally this would average out after some time, so that the different WUs per GPU finish at different times. Depending on your upload speed this might provide enough overlap to avoid running dry. Having more GPUs & WUs in flight should help with this issue.
To clarify, a simple example: a machine with 1 GPU would get 2 WUs and if these are not in sync, then while uploading/downloading 1 WU the other WU would run at 2x the speed. A real workaround would be to run the 2x WUs on a box with 1 NV and 1 ATI running a different project; then 4 WUs would be allocated for the machine. As an aside, I think running GPUGrid WUs 2x is a bad idea due to longer turnaround time and possible errors. A machine reboot or GPU error (or, as Jacob pointed out on the BOINC list, a BOINC restart) would be more likely to take out 2 of these long WUs instead of 1. |
|
|
|
After some setup difficulties, I now have two long run tasks running - one on each of my GTX 650 Ti GPUs. GPUGrid runs 24/7 on this AMD A10 based PC and there are always two tasks running with either one or two waiting to run. As each GTX 650 processes at a slightly different rate the number of tasks waiting to run varies. I believe this will maximize output from my PC enabling me to make the maximum contribution to the research. |
|
|
|
John,
My research indicates that you might be able to contribute more to the project, if you run 2 tasks on each of your GPUs, assuming the tasks don't result in computation errors.
You might try that, using the app_config.xml file, and see if your overall performance increases. I was able to see gains in GPU Load (seen via a program called GPU-Z), as well as increased throughput (seen by looking at task times, as noted within this thread).
Regards,
Jacob |
|
|
|
Hi, Jacob.
I am very inexperienced in writing .xml files and fear losing running tasks through syntax errors.
I would like to take it one step at a time for now and, maybe in a couple of weeks, try your suggestion. I will likely ask for help.....
Thanks for the suggestion.
John |
|
|
|
No problem. It's really not that hard, so don't be afraid, and... when you're ready, I encourage you to read this entire thread, which has details and examples:
"app_config.xml" located here:
http://www.gpugrid.net/forum_thread.php?id=3319
- Jacob |
|
|
|
Careful, guys. The GTX650Ti (John's GPUs) sounds like it's almost the same as a GTX660Ti (Jacob's GPUs), but it's actually about a factor of 2 slower. Currently 70k-credit long-runs take John 33k seconds, so running 2 of them might require ~60 ks. That's almost one day, so we're getting close to missing the deadline for the credit bonus here for even longer tasks (some give 150k credits, so should take over twice as long).
And this is not only about credits: the credit bonus is there to encourage people to return results early. The project needs this as much as it needs many WUs done in parallel. As long as we're still making the deadline for the credit bonus we can be sure to return results as quickly as they want us to return them.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
Sure, in order to get maximum bonus credits, you'll have to be careful to make sure you complete all your tasks within 24 hours. And, in general, they want results returned quickly.
But, in order to help the project the most, throughput (how fast can you do tasks) is the factor to measure, and the "deadline" is the task's deadline, which usually is a few days I think. If the administrators deem that a task must be done at a certain time, then I hope they are setting task deadline appropriately. |
|
|
|
Thanks, Gentlemen:
I will leave this alone for now.....
With falling prices for the GTX 660 Ti, I may add one to my other AMD A10 based PC in September around my birthday.
John |
|
|
skgiven (Volunteer moderator, Volunteer tester) |
You still have plenty of testing to do; all the possible same and mixed WU combinations would need to be looked at:
NATHAN_dhfr36 (Long) + NOELIA_148n (Long)
NATHAN_dhfr36 (Long) + NOELIA_TRYP (Short)
NATHAN_dhfr36 (Long) + NATHAN_stpwt1 (Short)
NOELIA_148n (Long) + NOELIA_TRYP (Short)
NOELIA_148n (Long) + NATHAN_stpwt1 (Short)
NOELIA_TRYP (Short) + NATHAN_stpwt1 (Short)
... plus any I've missed and whatever else turns up...
Basically, how do the various Long and Short tasks perform running together, how do mixed WU types perform and as there are several apps in use (6.16 app, 6.18, 6.49, 6.52) - how do they get on together?
You might want to start 'freeing up' a CPU thread/core when running two WUs; the ~3% loss could well compound (more like 9%). Note also that some apps might ask for a full CPU core, while others won't (I think this is also GPU specific; needed for Keplers but not Fermis).
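(For anyone who does want to reserve a thread per GPUGrid task without lowering the global "% of processors" preference, a hedged sketch of one way to do it -- raising <cpu_usage> in app_config.xml instead of using 0.001, here using the acemdlong app name from the file quoted later in this thread:)
<app>
<name>acemdlong</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<!-- 1.0 makes BOINC budget one full CPU thread per running GPU task,
so one fewer CPU task runs alongside each GPUGrid task -->
<cpu_usage>1.0</cpu_usage>
</gpu_versions>
</app>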
When you do all that, then you will be in a position to look at the error rates and thus determine overall gain, or loss :))
You have to remember that all this depends on the operating system. It's a well discussed fact that Linux/WinXP/2003 are faster for crunching at GPUGrid (11%+). Your numbers probably won't hold up on these operating systems, but should be true for Vista and W7. The Win 2008 servers are somewhere in between in terms of performance loss.
This would all have to be tested for Fermi's and Titan's (which might offer more).
I wouldn't be keen on running two long WU's but two short tasks looks interesting.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help
|
|
|
|
Yes, I still have testing to do. You can/should test too!
It's not easy to cherry-pick certain task type combinations -- I usually just let any task types run together. Maybe once I find even more time to test, I'll attempt doing the specific-combination testing, using custom suspending, and more vigilant monitoring.
As far as "freeing up a core", my research indicates that, at this point, doing so is COMPLETELY UNNECESSARY, at least for me. If you look at the acemd processes in Process Explorer, you'll see that process priority is 6, and the CPU-intensive-thread priority is either 6 or 7. This ensures that the thread and process do not get swapped out of the processor, even when I'm running a full load of other CPU tasks, since those CPU tasks are usually priority 1 or 4. Watching how the CPU time gets divvied up (in Process Explorer, or in Task Manager), also proves it -- you'll see the other processes getting less-than-a-core, but you'll see the acemd process "suffer" much. Plus, as you said, sometimes the GPUGrid tasks don't require much CPU at all (like when a NATHAN Long-run is on my GTX 460), so, reserving a core is sheer waste at that point, at least for my goals. So I won't do it.
I'm not trying to speculate here, and I'm certainly not trying to find reasons not to run multiple tasks on the same GPU. I think it's worth it.
What I'm trying to do is show the results that I have achieved, using my goals (maximize throughput for GPUGrid, without sacrificing any throughput for my other projects), and I encourage others to do the same.
Thanks,
Jacob |
|
|
skgiven (Volunteer moderator, Volunteer tester) |
I don't have much time to test, but OK, I'll do a little bit...
System:
GTX660Ti @1202MHz, i7-3770K CPU @4.2GHz, 8GB DDR3 2133, SATAIII drive, W7x64, 310.90 drivers, Boinc 7.0.60.
I've started using your suggested app_config.xml file:
<app_config>
<app>
<name>acemdbeta</name>
<max_concurrent>9999</max_concurrent>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.001</cpu_usage>
</gpu_versions>
</app>
<app>
<name>acemdlong</name>
<max_concurrent>9999</max_concurrent>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.001</cpu_usage>
</gpu_versions>
</app>
<app>
<name>acemd2</name>
<max_concurrent>9999</max_concurrent>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.001</cpu_usage>
</gpu_versions>
</app>
<app>
<name>acemdshort</name>
<max_concurrent>9999</max_concurrent>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.001</cpu_usage>
</gpu_versions>
</app>
</app_config>
I was running one Long NATHAN_dhfr36 task. It had reached ~33% when I added the app_config file. GPU Utilization was around 87% (as you observed), power was about 87% and the temp ~60°C. CPU was set to only use 75% (free 2 threads), also running POGS. Note that I was using swan_sync=0.
I increased the Boinc cache and downloaded a Short NATHAN_stpwt1 task.
When I restarted Boinc, I had 4 POGS CPU tasks running (50% of the CPU). The two GPUGrid tasks used 25% of the CPU; a full CPU thread each (not due to swan_sync). GPU utilization rose to 98%, power to 97%, GPU temp to 65°C and the system Wattage went up by around 15 or 20W.
On my system these NATHAN_dhfr36 tasks (6.18 app) have varied in runtime from between ~18,400s and ~19,000s and the only two previous NATHAN_stpwt1 tasks (6.16 app) took 5,166 and 5,210s.
I expect that by just running another task you force the Kepler GPU's to run at higher clocks; they try to self-adjust their frequency!
- The Short NATHAN_stpwt1 task completed in 8,112s, so it took 56% longer, but not twice as long...
- Didn't automatically get another GPUGrid WU (Boinc cache set too low??), but did when I updated; a NATHAN_RPS1_respawn (6.52 app)
Both GPUGrid tasks each still using a full CPU thread (swan on). Will try to run a few with swan on and then off, for comparison.
The NATHAN_RPS1_respawn took 12,021sec. On average they take 8876sec, but have varied from 8,748 to 9,215sec. That's 35% longer than normal but a good bit less than twice as long.
The third task to run along with the Long WU is NATHAN_RPS1_respawn3-25-32-RND4658_0.
The Long task took 39,842 sec, over twice as long as normal (2.13 times as long). Given that the first 33% ran by itself, the final 66% took over 3 times as long as normal to complete the WU. That's a big loss when running Long and Short tasks together. Even allowing for the Short tasks running at more than half their normal speed, in this case it looks less efficient overall.
Warning! Running two NATHAN_RPS1_respawn3 tasks together caused dangerously bad lag. GPU utilization fell to 33% and GPU temp dropped to 41°C. After 55min the second Short task had only reached 1.7% complete. Just one of these tasks runs at 94% GPU utilization on my system, so there is no way running two would be beneficial. I've since retested this, and found the same results. I was also able to run 4 POEM tasks as well as one respawn3, but they were very slow. Alone these 4 POEM tasks used 88% of the GPU and with the respawn3 WU that went up to 99%. For now I have disabled app_config, as I'm just getting these respawn3 WU's.
I ran a single NOELIA_Klebe_Equ task and then two at the same time.
While running the single task GPU utilization was 87% and while two tasks were running it rose to 97%.
Basically it's not any faster running two tasks:
041px21x3-NOELIA_Klebe_Equ-0-1-RND6607_0 4338582 7 Apr 2013 | 6:07:54 UTC 7 Apr 2013 | 10:08:02 UTC Completed and validated 13,243.66 5,032.64 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
041px2x1-NOELIA_Klebe_Equ-0-1-RND3215_0 4338518 7 Apr 2013 | 6:43:58 UTC 7 Apr 2013 | 10:27:29 UTC Completed and validated 13,320.82 4,158.10 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
005px46x3-NOELIA_Klebe_Equ-0-1-RND6629_0 4338501 7 Apr 2013 | 2:28:54 UTC 7 Apr 2013 | 4:50:37 UTC Completed and validated 6,288.30 2,656.45 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
- Running another two with swan off. The first of the two NOELIA_Klebe_Equ tasks started using 934MB and the second used an additional 808MB GDDR5. That 1742MB dropped to 1630MB before they reached 10% complete. With two tasks running the clock stabilized at 1189MHz.
Note that I'll just edit this post with any further results.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help
|
|
|
|
Sounds good, thanks for testing.
Note: When running 2-at-a-time, I expect tasks to take slightly-less-than-double what they normally take, which would mean they are being processed faster over-all. |
|
|
|
Ah, you bring up a good point, I forgot to mention my clocking experiences with my Kepler-architecture eVGA GTX 660 Ti 3GB FTW card...
- Its base clock is 1045 MHz, which I think is the lowest clock it can run at while running a 3D application or GPU task.
- When GPU Load is not great (~60-75%), I think it usually upclocks a little (maybe up to 1160 MHz), but because it sees the application as "not demanding a lot", it doesn't try hard to upclock.
- When GPU Load is decent-ish (86%), it auto-upclocks a bit (usually to around 1215 MHz or 1228 MHz I think), with Power Consumption around 96-98% TDP.
- When GPU Load is better-saturated (97%-99%), it usually tries to upclock higher, but reaches a thermal limit. It usually ends up clocked at around 1180-1215 MHz, with a temperature of 84°C-89°C, at a Power Consumption around 96%.
- TIP: At that saturation, if you want, you can usually allow it to auto-upclock just a tad more, by using whatever overclock tools you have (I have eVGA Precision X) and adjusting the "Power Target". By default, I think the driver sets a Power Target of 100%, but what I usually do is adjust it to 140%. This lets it auto-clock higher, until it starts really hitting those thermal limits. My end result: my card usually runs at 1215 MHz, 86°C-90°C, with Power Consumption around 106% TDP.
So, running at higher GPU Load keeps it clocked high, as high as the thermal limits can let it... which is a good thing, if you care more about GPUGrid throughput than the lifespan of your GPU. :)
Regards,
Jacob |
|
|
Beyond |
Careful, guys. The GTX650Ti (John's GPUs) sounds like it's almost the same as a GTX660Ti (Jacob's GPUs), but it's actually about a factor of 2 slower. Currently 70k-credit long-runs take John 33k seconds, so running 2 of them might require ~60 ks. That's almost one day, so we're getting close to missing the deadline for the credit bonus here for even longer tasks (some give 150k credits, so should take over twice as long).
Another big fly in the ointment regarding the 2x suggestion: Jacob's 660 Ti has 3GB of memory and the 650 Ti in question has only 1GB. I doubt if it can even run 2x. I have a 650 TI and ordered another for this project. Good bang for the buck and for power usage. I would not however suggest running 2x WUs on a 650 Ti. Personally I don't think running multiple concurrent WUs on GPUGrid is a good idea in general as Jacob's seemingly invalid (but validated) WUs would indicate.
|
|
|
|
You are correct that there is a memory concern, thank you for reminding me about this.
My testing indicates that going beyond your GPU's memory limit will result in an immediate task failure for the task being added.
To test/prove this, I wanted to overload my secondary GPU, the GTX 460 1 GB (which is not connected to any monitor).
Below is my setup and my testing results:
I suspended all tasks, suspended network activity, closed BOINC, made a copy of my data directory (so later I could undo all this testing without losing active work), changed the cc_config to exclude GPUGrid on device 0 (the 660 Ti), changed the GPUGrid app_config <gpu_usage> value to 0.25 (yup, this is a stress test), and I started resuming GPUGrid tasks 1 at a time, while monitoring Memory Usage (Dynamic) in GPU-Z.
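For reference, a sketch of the cc_config.xml exclusion described above (exclude_gpu is a standard BOINC client option; device 0 here is the 660 Ti):
<cc_config>
<options>
<exclude_gpu>
<url>http://www.gpugrid.net/</url>
<device_num>0</device_num>
</exclude_gpu>
</options>
</cc_config>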
Here are the scenarios that I saw (had to reset the data directory for nearly each scenario):
Scenario 1:
- Added a Long-run Nathan dhfr, GPU Mem Usage became 394 MB
- Added a Short-run Nathan RPS1, GPU Mem Usage increased to 840 MB, no problems
- Added a Long-run Nathan dhfr, GPU Mem Usage spiked up to 1004 MB, and the task immediately failed with Computation Error
Scenario 2:
- Added a Short-run Nathan RPS1, GPU Mem Usage became 898 MB
- Added a Long-run Nathan dhfr, GPU Mem Usage spiked up to 998 MB, and the task immediately failed with Computation Error
Scenario 3:
- Added a Long-run Nathan dhfr, GPU Mem Usage became 394 MB
- Added a Long-run Nathan dhfr, GPU Mem Usage increased to 789 MB, no problems
- Added a Short-run Nathan RPS1, GPU Mem Usage spiked, and the task immediately failed with Computation Error
Scenario 4:
- Added a Long-run Nathan dhfr, GPU Mem Usage became 394 MB
- Added a Long-run Nathan dhfr, GPU Mem Usage increased to 789 MB, no problems
- Added a Long-run Nathan dhfr, GPU Mem Usage increased to 998 MB, and surprisingly, all 3 tasks appear to be crunching OK
CONCLUSIONS:
- If adding a task will put the memory usage beyond your GPU's memory limit, the task being added will immediately fail with Computation Error.
- If you look carefully at those Short-run's being added, it looks like they "detect" the available GPU Ram, and run in a certain mode made to fit within that limit. For instance, in Scenario 1, I think the task detected 610 MB free, thought it was a 512 MB card, and limited itself to (840-394=) 446 MB. Then, in Scenario 2, it saw 1004 MB, thought it was a 1GB card, and limited itself to 898 MB. Then, in Scenario 3, there was only 215 MB free, and it couldn't load at all.
- If you look at the Long-run's, they follow a similar pattern. In Scenario 1, it saw a 1GB card, but maybe the tasks were built to only ever need 512 MB cards, so it limits to 394 MB. In Scenario 2, it couldn't load within 106 MB. Scenario 4 is interesting, because for the third task, it saw 215 MB free, and was able to run using (998-789=) 209 MB. So it can scale to 256 MB cards perhaps. Looking back at scenario 1, we see that the third task only had 164 MB free, which wasn't enough.
- So... I'm going to have to revisit my settings. I'd like to tell BOINC to run 2 tasks at a time on device 0, but only 1 task at a time on device 1... but I don't think that's an option. Another thing I might do is disable certain GPUGrid applications on device 1, such that I can be sure to only double-up ones that I know would work (see the sketch after this list). But, because GPUGrid mixes "types" within a single "app" (like NOELIA and NATHAN, both within the same application), I don't think that's a valid play either. I may just devote device 1 to another project, whilst device 0 is set for 2 tasks. Still deciding, and am open to any suggestions you may have.
- Note, this issue hasn't bitten me yet, because device 1 is currently focused on World Community Grid's (WCG) Help Conquer Cancer (HCC) tasks; so GPUGrid tasks haven't doubled up much on my device 1 at all yet, but they will soon, when WCG HCC runs out of work in < 1 month, and I'll need a plan by then.
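For reference, the per-device, per-application exclusion mentioned above would look something like this in cc_config.xml (a sketch; the app name is taken from the app_config file quoted earlier, and, as noted, mixed task types within one app limit how useful this is):
<cc_config>
<options>
<exclude_gpu>
<url>http://www.gpugrid.net/</url>
<device_num>1</device_num>
<!-- only the named application is kept off device 1 -->
<app>acemdlong</app>
</exclude_gpu>
</options>
</cc_config>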
Hope you find this helpful -- it took 40 minutes to test and write up, and I encourage you to do your own testing too!
Regards,
Jacob
|
|
|
|
Nice testing Jacob, thanks!
I've read somewhere that ACEMD supports 2 different modes, a faster one using more RAM and a slower one using less memory. Which one is employed is automatically decided based on available memory. Not sure how flexible the memory consumption is, but it's certainly a point to keep in mind. We don't want to run 2 WUs for higher throughput and thereby force the app into a slower algorithm, which would very probably lead to a net performance loss.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
Beyond |
Another thing to keep in mind concerning the 2x strategy. These current Nathans are the shortest running long type WUs we've seen in quite a while (BTW, not complaining at all). The Tonis are longer, and the Noelia and Gianni WUs are a lot longer. |
|
|
skgiven (Volunteer moderator, Volunteer tester) |
My impression so far is that if you have a sub-optimal GPU setup, you are more likely to see improvements from running two GPU tasks than if you have already optimized as much as possible for the GPUGrid project.
Clearly, with some WU types there is something to be gained on some systems and setups, but running other WU's will not gain anything, and in some cases will result in massive losses; with my setup, running two short NATHAN_RPS1_respawn WU's (6.52 app) was hopeless.
I never really thought doing this on anything but the top GPUs could yield much, and with the possible exception of Titans (which don't yet work) I doubt that running more than 2 tasks would improve throughput for any GPUGrid work, but I think the memory crash test has led to better understanding by crunchers (though the researchers might feel differently).
It's clear that different WU types require different amounts of GDDR memory and different amounts of CPU attention. When I was running with swan on I noticed that as well as the runtimes going up, the CPU times also went up:
Single task:
063ppx48x2-NOELIA_Klebe_Equ-0-1-RND3600_0 4339021 7 Apr 2013 | 13:11:15 UTC 7 Apr 2013 | 15:21:46 UTC Completed and validated 6,133.10 2,529.98 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
Two tasks together:
041px2x1-NOELIA_Klebe_Equ-0-1-RND3215_0 4338518 7 Apr 2013 | 6:43:58 UTC 7 Apr 2013 | 10:27:29 UTC Completed and validated 13,320.82 4,158.10 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
041px21x3-NOELIA_Klebe_Equ-0-1-RND6607_0 4338582 7 Apr 2013 | 6:07:54 UTC 7 Apr 2013 | 10:08:02 UTC Completed and validated 13,243.66 5,032.64 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
Obviously this is undesirable, especially seeing as the tasks were not significantly faster.
On the 2GB GTX660Ti card versions, two memory controllers have to interleave. I know most tasks are only ~300MB so two would not reach 1.5GB, but I don't know if all the controllers are used simultaneously or what way that works. Anyway, could this memory layout hinder the 2GB GTX660Ti cards when running two tasks but not the 3GB GTX660Ti cards?
- Started running two more NOELIA_Klebe_Equ WU's. The first used 934MB and when the second ran the total went up to 1742MB. That means the second is only using 808MB, but between them they are entering that 1.5 to 2.0GB limited memory bandwidth zone for 2GB GTX660Ti's. Perhaps this is having some influence on my results compared to those on a 3GB card?
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Well, the speed-up for your 1st task would be really cool, if it also applied to the 2nd one. 2 possible reasons why it didn't: the task uses less memory and may have just switched to the slower algorithm, or is just generally slower with less memory even if the faster algorithm is still used. Or it's really the memory controller. As far as I have heard all 3 controllers are used for the first 1.5 GB and upon using the last 512 MB only 2 controllers are used. GPU-Grid has never been that bandwidth-hungry, however.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
skgiven (Volunteer moderator, Volunteer tester) |
I have not yet seen any overall improvement running any two tasks with my system/setup but Jacob has. Fairly different rigs though (LGA1366 vs 1155, 3GB vs 2GB, 100% CPU usage vs 75% to 87%). Too many variables for my liking, and I still think some CPU tasks could influence performance (my SATA rattles away running POGS but not malaria). My system is fairly well optimized for GPU crunching, so I might have less headroom and the 3GB cards could well have an advantage, only seen with lots of memory. Perhaps the app uses the faster equations when it sees X amount of RAM available (1GB or 1.5GB perhaps). That parameter might be app or even task specific.
In the past when I tested this it was only really beneficial when the GPU utilization was quite low, under 80%, and there was an abundance of one task type - but new WU's, apps and GPU's have arrived since then.
According to GPUZ the memory controller load on the GTX660Ti is 40% when both tasks are running and 38% when one task is running. There is not that much difference but then GPU utilization could only go up by ~10%. 40% is relatively high; a GTX470 running @95% GPU utilization has a controller load of 23% and IIRC a GTX260 was only around 15%, but I don't know how significant it is.
When I suspended one of the running NOELIA_Klebe_Equ tasks, to see what the controller load was for 1 task, I got an app crash. I quickly closed Boinc, waited a minute and then closed the stupid windows error and started Boinc again. The suspended task started OK when I resumed it, but this highlights the fact that the tasks are not designed to run this way.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
Just a heads up - Some NOELIA tasks are currently crashing when suspended or closed, even when running just 1. Those crashes are not the fault of running 2-at-once, so far as I know. |
|
|
skgiven (Volunteer moderator, Volunteer tester) |
Thanks. I suggest people don't try to run them unless they have selected the recommended "Use GPU while computer is in use" setting.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
skgiven (Volunteer moderator, Volunteer tester) |
Well, I now have a set of results that is positive for 2 tasks, albeit only by 6%.
Note this is with swan off, and relative to a task where swan was on.
Two tasks running at the same time:
148nx13x1-NOELIA_Klebe_Equ-0-1-RND7041_0 4339235 7 Apr 2013 | 17:38:21 UTC 7 Apr 2013 | 20:54:29 UTC Completed and validated 11,493.31 3,749.67 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
109nx42x1-NOELIA_Klebe_Equ-0-1-RND5274_0 4339165 7 Apr 2013 | 17:37:44 UTC 7 Apr 2013 | 20:52:54 UTC Completed and validated 11,534.36 3,623.14 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
One task by itself (ref):
063ppx48x2-NOELIA_Klebe_Equ-0-1-RND3600_0 4339021 7 Apr 2013 | 13:11:15 UTC 7 Apr 2013 | 15:21:46 UTC Completed and validated 6,133.10 2,529.98 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
What I'm interested in is why the CPU time is still higher than the single task with swan on. Obviously this isn't a good thing because you are gaining some GPU performance increase but losing some CPU performance.
These were my result with swan on:
041px21x3-NOELIA_Klebe_Equ-0-1-RND6607_0 4338582 7 Apr 2013 | 6:07:54 UTC 7 Apr 2013 | 10:08:02 UTC Completed and validated 13,243.66 5,032.64 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
041px2x1-NOELIA_Klebe_Equ-0-1-RND3215_0 4338518 7 Apr 2013 | 6:43:58 UTC 7 Apr 2013 | 10:27:29 UTC Completed and validated 13,320.82 4,158.10 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
First, I believe the correct time number to use when comparing is Run Time (which determines overall throughput -- how quickly you can complete tasks -- and can be considered overall performance, I believe), not CPU Time (which may vary depending on several factors).
Second, for whatever reason, I thought that the "Swan" setting wasn't used anymore. Do you believe it's still used? If so, is it still used via a system variable called "SWAN_SYNC" with a value of 0 or 1? Is it maybe only used on certain types of GPUs (like non-Kepler)? Is there a task type I could easily test turning Swan on and off, to immediately and easily see the resulting CPU usage variation in Task Manager?
Regards,
Jacob |
|
|
|
I understood that the SWAN_SYNC environment variable is not being used any more. For Keplers they went the brute-force way and always request ~1 core per task and GPU. SWAN_SYNC doesn't work well anymore if cards become too fast (relative to the CPU scheduler time-slice length); that's why we started turning it off with fast Fermis already (SWAN_SYNC=0, default was 1). Not sure if it still works on Fermis.. but that's not really of current interest, I think.
Actually.. taking a quick look at my results I see "GPU time = CPU time" for long-runs, whereas short runs use less CPU (~40% of the GPU time).
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
Yep, I'm seeing similar times when comparing Run Time to CPU Time.
And Task manager agrees with those times...
For me, for the Nathans tasks, Task Manager shows:
- the long-run tasks use a full CPU core all the time while processing
- the short-run tasks only use a portion of the CPU
So, because the CPU usage can vary from task to task (think different task types within the same app, even), that is why I have set <cpu_usage> to 0.001 for each app, allowing my CPU tasks to pick up any remaining actual CPU slack. |
|
|
skgiven (Volunteer moderator, Volunteer tester) |
I've been Run Time centric from the outset. The SWAN_SYNC variable now appears to be inert/inactive, though it might be built into the Long apps? I'm seeing ~31% CPU usage per GPUGrid app for the Short NOELIA_Klebe_Equ tasks. The Long tasks were using close to 100% of a CPU thread.
Presently running a I1R132-NATHAN_RPS1_respawn3-14-32-RND2200_1 and a 306px29x2-NOELIA_Klebe_Equ-0-1-RND4174_0 WU (SWAN disabled/off). 98% GPU Utilization 1.7GB DDR5.
The Short NATHAN_RPS1_respawn3 WU's used 1 full CPU thread, which might explain why 2 didn't run well together (on my setup):
I1R137-NATHAN_RPS1_respawn3-14-32-RND8262_0 4338051 7 Apr 2013 | 0:05:04 UTC 7 Apr 2013 | 3:05:55 UTC Completed and validated 8,745.14 8,731.06 16,200.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
I1R414-NATHAN_RPS1_respawn3-24-32-RND6785_1 4337248 6 Apr 2013 | 19:28:08 UTC 7 Apr 2013 | 0:39:58 UTC Completed and validated 8,756.17 8,740.61 16,200.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
I1R439-NATHAN_RPS1_respawn3-20-32-RND6989_0 4337286 6 Apr 2013 | 17:11:24 UTC 6 Apr 2013 | 22:13:55 UTC Completed and validated 12,182.96 12,106.08 16,200.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
I1R81-NATHAN_RPS1_respawn3-25-32-RND4658_0 4336752 6 Apr 2013 | 14:24:43 UTC 6 Apr 2013 | 18:35:50 UTC Completed and validated 14,656.44 14,508.81 16,200.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
Note that Nathan has more than one task type. It's important to be task type specific.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
FYI:
Per Toni (a project administrator), because of application and validator problems, I was told to disable running 2-at-a-time. So I have suspended my testing/research, and have put <gpu_usage> to 1, in the app_config.xml file.
http://www.gpugrid.net/forum_thread.php?id=3332#29425
Regards,
Jacob Klein |
|
|
tomba |
For the past two years I've run an OCed ASUS GTX 460 1GB; core 850, shader 1700, memory 2000. It's stable (perhaps I should try pushing it further...?). The fan is silent, and temp is around 66C. Current long Nathans take about nine hours.
If I run two WUs concurrently, what % increase in throughput might I see?
Thanks, Tom
____________
|
|
|
skgiven (Volunteer moderator, Volunteer tester) |
tomba, running two tasks on a GTX 460 1GB is a bad idea; many tasks use too much GDDR (~700MB) so they would eat each other. You wouldn't see any benefit. You would probably see a slow down and possibly failures or system crashes. Overall benefits are likely to only be seen on cards with 3GB or more GDDR5.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
tomba |
tomba, running two tasks on a GTX 460 1GB is a bad idea
Thanks for the heads-up.
____________
|
|
|
Beyond |
running two tasks on a GTX 460 1GB is a bad idea;
No, it's a BAD idea :-) |
|
|
|
FYI:
Per Toni (a project administrator), because of application and validator problems, I was told to disable running 2-at-a-time. So I have suspended my testing/research, and have put <gpu_usage> to 1, in the app_config.xml file.
http://www.gpugrid.net/forum_thread.php?id=3332#29425
Regards,
Jacob Klein
I have solved the problem I was having where Nathan tasks were not processing correctly on my machine.
The problem was completely unrelated to the app_config.xml file.
Details here: http://www.gpugrid.net/forum_thread.php?id=3332&nowrap=true#29894
So, we can resume app_config testing (including 2-tasks-on-1-GPU).
I still recommend using the app_config.xml file that is in this post:
http://www.gpugrid.net/forum_thread.php?id=3319&nowrap=true#29216
... and I only recommend trying 2-tasks-on-1-GPU if you are running GPUGrid on GPUs that have 2GB or more RAM; if that's the case, you might try using <gpu_usage> value of 0.5 in your app_config.xml file, to see if GPU Load increases (by using GPU-Z) and throughput increases (by looking at task Run Times).
Thanks,
Jacob |
|
|
tomba |
Yesterday I retired my trusty ASUS GTX 460, replacing it with an ASUS GTX 660.
I activated app_config.xml, restarted BOINC, then restarted the 25%-complete in-process NOELIA. Immediately another NOELIA appeared and both crunched together.
The first WU finished this morning (18 hours run time, full bonus), but it was not until it had uploaded that another WU came down to replace it. So I lost more than an hour of crunching.
Previously I had run with minimum work buffer set to 0.01, which gave me a new WU download about 10 minutes before the active WU finished. I just set the buffer to 2.00 days. No reaction, even after a few clicks on BOINC Update.
Any thoughts? Thanks.
It's early days but what I reckon is that the doom & gloom about lost bonuses when doing 2X is unfounded, at least for me. Both my current WUs will complete well within 24 hours.
____________
|
|
|
|
Hmm...
Do you have any GPU exclusions in your cc_config.xml file?
Also, what version of BOINC are you using? Hopefully v7.0.64.
Behavior I've noticed previously is that... if BOINC has a task that's running, it doesn't consider the GPU "idle", and so it only fetches work when the work left on *that task* is below the "Minimum work buffer". Assuming you want to keep the GPU busy with 2-tasks-at-1-time, I think you'd have to increase the "Minimum work buffer" to be larger than the time necessary to complete a task running 2-at-1-time (note: This is larger than the time necessary to complete a task running 1-at-a-time). But I noticed you said you increased it already, to 2 days! I would have thought 2 days would have been enough, but maybe it needs to be even higher?
I kept my GTX 460 1GB in this machine, alongside my GTX 660 Ti, and so.... because the GTX 460 only has 1GB, I can't do 2-at-a-time unless I exclude that GPU. So I only do 1-at-a-time. |
|
|
tomba |
Hmm... Do you have any GPU exclusions in your cc_config.xml file?
Also, what version of BOINC are you using?
BOINC 7.0.64 (x64), and here's my cc_config.xml file:
<cc_config>
<options>
<report_results_immediately>1</report_results_immediately>
</options>
</cc_config>
____________
|
|
|
|
You should take that report_results_immediately flag out of there -- BOINC has a feature where the project can request that now, and GPUGrid does request it, so... by you using it, you're only increasing server load on other projects that may not need results immediately.
As far as your problem goes... Don't click update. BOINC evaluates whether it needs to fetch work every minute. So, I recommend increasing the Minimum buffer by a little (by 0.1 or 0.2), then waiting 2 minutes, to see if it got work. If it didn't, keep increasing it a little at a time, until it does what you want.
Actually, you should be able to get it to fetch work if you increase the Minimum buffer to be a bit larger than the current "Remaining (estimated)" time of the current task, but again, you really should set it to be a bit higher than the largest "Remaining (estimated)" time possible for a given task, to ensure you always have 2 tasks to run.
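(For reference, the Manager's "Minimum work buffer" corresponds to work_buf_min_days if you prefer to set it in global_prefs_override.xml -- a sketch, assuming a 2-day buffer:)
<global_preferences>
<work_buf_min_days>2.0</work_buf_min_days>
<work_buf_additional_days>0.0</work_buf_additional_days>
</global_preferences>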
What other projects are you attached to? Are any of them GPU projects that get work? Are you sure you have network activity allowed? (BOINC -> Activity -> Network)? Are you sure you're setting the "Minimum work buffer"? |
|
|
Beyond |
<cc_config>
<options>
<report_results_immediately>1</report_results_immediately>
</options>
</cc_config>
Personally I'd leave the report line in and try adding this one:
<fetch_on_update>1</fetch_on_update>
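(For clarity, both flags sit inside <options>; the combined cc_config.xml would look like this:)
<cc_config>
<options>
<report_results_immediately>1</report_results_immediately>
<fetch_on_update>1</fetch_on_update>
</options>
</cc_config>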
See if it makes a difference. |
|
|
tomba |
Personally I'd leave the report line in and try adding this one:
<fetch_on_update>1</fetch_on_update>
See if it makes a difference.
Did that, suspended GPUGRID, exited BOINC, restarted BOINC, restarted GPUGRID, issued the "Update" command a few times. Nada...
In answer to Jacob's questions:
I upped the minimum buffer several times. Nothing. In a fit of pique I set it to 10 days! Nothing.
I am attached to POEM but it is set to get no work.
BOINC -> Activity -> Network = activity always available.
I confirm I'm setting the "Minimum work buffer", in Tools / Computing Preferences.
____________
|
|
|
Beyond |
What does it say in the BOINC log when you manually update GPUGrid? |
|
|
tomba |
What does it say in the BOINC log when you manually update GPUGrid?
____________
|
|
|
|
Alright, let's try this:
Set your cc_config to show work_fetch_debug, using something like this:
---------------------------------------------------------------
<cc_config>
<log_flags>
<work_fetch_debug>1</work_fetch_debug>
</log_flags>
<options>
</options>
</cc_config>
---------------------------------------------------------------
Then restart BOINC, and watch the Event Log to see work fetch run. You'll see it say "[work_fetch] work fetch start" at the beginning of a run.
In the Event Log, select all the lines from "Starting BOINC" all the way until the 2nd instance of "[work_fetch] work fetch start", so that we include all the messages from the entire first run.
Then hit "Copy selected", and then paste here in a reply. I'm generally very good at reading these work fetch logs. |
|
|
|
Oh, well there's your answer. If it says you've reached a limit on tasks in progress... BOINC did ask for work, but GPUGrid only lets you have so many tasks. I see (by clicking your name and looking at your computers) that you currently have 2 in progress; perhaps they only allow 2. Do you need more than 2? |
|
|
tomba |
Oh, well there's your answer. If it says you've reached a limit on tasks in progress... BOINC did ask for work, but GPUGrid only lets you have so many tasks. I see (by clicking your name and looking at your computers) that you currently have 2 in progress; perhaps they only allow 2. Do you need more than 2?
No! All I want is what I got in the 1X scenario; a new WU a few minutes before the active WU finishes [in the X2 scenario, before one of the two active WUs finishes]. |
|
|
|
So, you DO need more than 2. You need a 3rd one to be downloaded, a couple minutes before either of the 2 running ones are done... right? :)
I wonder if GPUGrid has things setup to only allow 2-tasks-per-GPU. (For reference, I've had 5 or 6 tasks in progress before, but my system has 3 GPUs).
I'm not sure if I'll be able to help you test/confirm. To do that, I think I'd have to pull some GPUs out. But it sounds like your setup will probably be asking GPUGrid for work a lot, and not getting any, which is not ideal. |
|
|
tomba |
Alright, let's try this:
Set your cc_config to show work_fetch_debug, using something like this: etc...
25/05/2013 15:35:29 | | Starting BOINC client version 7.0.64 for windows_x86_64
25/05/2013 15:35:29 | | log flags: file_xfer, sched_ops, task, work_fetch_debug
25/05/2013 15:35:29 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
25/05/2013 15:35:29 | | Data directory: C:\ProgramData\BOINC
25/05/2013 15:35:29 | | Running under account TOM
25/05/2013 15:35:29 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz [Family 6 Model 26 Stepping 5]
25/05/2013 15:35:29 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe
25/05/2013 15:35:29 | | OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)
25/05/2013 15:35:29 | | Memory: 5.99 GB physical, 11.98 GB virtual
25/05/2013 15:35:29 | | Disk: 465.76 GB total, 170.61 GB free
25/05/2013 15:35:29 | | Local time is UTC +2 hours
25/05/2013 15:35:29 | | VirtualBox version: 4.1.10
25/05/2013 15:35:29 | | CUDA: NVIDIA GPU 0: GeForce GTX 660 (driver version 320.18, CUDA version 5.50, compute capability 3.0, 2048MB, 1903MB available, 1982 GFLOPS peak)
25/05/2013 15:35:29 | | OpenCL: NVIDIA GPU 0: GeForce GTX 660 (driver version 320.18, device version OpenCL 1.1 CUDA, 2048MB, 1903MB available, 1982 GFLOPS peak)
25/05/2013 15:35:29 | GPUGRID | Found app_config.xml
25/05/2013 15:35:29 | Poem@Home | URL http://boinc.fzk.de/poem/; Computer ID 149338; resource share 75
25/05/2013 15:35:29 | GPUGRID | URL http://www.gpugrid.net/; Computer ID 151159; resource share 100
25/05/2013 15:35:29 | GPUGRID | General prefs: from GPUGRID (last modified 27-Apr-2013 11:10:03)
25/05/2013 15:35:29 | GPUGRID | Computer location: home
25/05/2013 15:35:29 | GPUGRID | General prefs: no separate prefs for home; using your defaults
25/05/2013 15:35:29 | | Reading preferences override file
25/05/2013 15:35:29 | | Preferences:
25/05/2013 15:35:29 | | max memory usage when active: 3067.49MB
25/05/2013 15:35:29 | | max memory usage when idle: 5521.48MB
25/05/2013 15:35:29 | | max disk usage: 10.00GB
25/05/2013 15:35:29 | | max CPUs used: 4
25/05/2013 15:35:29 | | (to change preferences, visit a project web site or select Preferences in the Manager)
25/05/2013 15:35:29 | | [work_fetch] Request work fetch: Prefs update
25/05/2013 15:35:29 | | [work_fetch] Request work fetch: Startup
25/05/2013 15:35:29 | | Not using a proxy
25/05/2013 15:35:29 | | [work_fetch] work fetch start
25/05/2013 15:35:29 | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 1
25/05/2013 15:35:29 | | [work_fetch] no eligible project for NVIDIA
25/05/2013 15:35:29 | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
25/05/2013 15:35:29 | | [work_fetch] no eligible project for CPU
25/05/2013 15:35:29 | | [work_fetch] ------- start work fetch state -------
25/05/2013 15:35:29 | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
25/05/2013 15:35:29 | | [work_fetch] --- project states ---
25/05/2013 15:35:29 | Poem@Home | [work_fetch] REC 244.210 prio 0.000000 can't req work: suspended via Manager
25/05/2013 15:35:29 | GPUGRID | [work_fetch] REC 39539.005 prio -0.203740 can't req work: suspended via Manager
25/05/2013 15:35:29 | | [work_fetch] --- state for CPU ---
25/05/2013 15:35:29 | | [work_fetch] shortfall 345600.00 nidle 4.00 saturated 0.00 busy 0.00
25/05/2013 15:35:29 | Poem@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
25/05/2013 15:35:29 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
25/05/2013 15:35:29 | | [work_fetch] --- state for NVIDIA ---
25/05/2013 15:35:29 | | [work_fetch] shortfall 86400.00 nidle 1.00 saturated 0.00 busy 0.00
25/05/2013 15:35:29 | Poem@Home | [work_fetch] fetch share 0.000
25/05/2013 15:35:29 | GPUGRID | [work_fetch] fetch share 0.000
25/05/2013 15:35:29 | | [work_fetch] ------- end work fetch state -------
25/05/2013 15:35:29 | | [work_fetch] No project chosen for work fetch
25/05/2013 15:35:47 | GPUGRID | project resumed by user
25/05/2013 15:35:47 | | [work_fetch] Request work fetch: project resumed
25/05/2013 15:35:48 | GPUGRID | Restarting task 306px11x1-NOELIA_klebe_run2-2-3-RND3462_0 using acemdlong version 618 (cuda42) in slot 1
25/05/2013 15:35:48 | GPUGRID | Restarting task I2HDQ_6R14-SDOERR_2HDQd-1-4-RND2918_0 using acemdlong version 618 (cuda42) in slot 0
|
|
|
|
If you read that work fetch log, you'll see that the reason it didn't fetch work there, was because "can't req work: suspended via Manager"
Anyway, for your scenario, you do want a 3rd task. However, I have just confirmed (by pulling 2 of my 3 GPUs out of the system) that GPUGrid does only allow 2 in-progress tasks on a 1-GPU-system.
That being said... when the "completed" task gets reported, if you have a sufficiently high minimum buffer at that point, then it might get a new task.
So... try keeping your "Minimum work buffer" high enough (maybe set to 2 days), but then let it run long enough to complete a task. Don't do anything manual (don't click Update, don't suspend it, just let it go), and watch what it does. With a 2-day-buffer, does it automatically request and get a new task at the same time as reporting the completed one?
Please report your findings. I also am doing this test, since I'm curious. |
|
|
tomba |
If you read that work fetch log, you'll see that the reason it didn't fetch work there, was because "can't req work: suspended via Manager"
Anyway, for your scenario, you do want a 3rd task. However, I have just confirmed (by pulling 2 of my 3 GPUs out of the system) that GPUGrid does only allow 2 in-progress tasks on a 1-GPU-system.
Surely it's not just my scenario? Anyone running 2X wants a third WU before the WU that's about to complete actually completes. Yes?
That being said... when the "completed" task gets reported, if you have a sufficiently high minimum buffer at that point, then it might get a new task.
As I said this morning, I got a new WU when the completed task had been uploaded - reported - missing an hour+ of crunching. That was with a buffer of 0.01. My buffer is now at two days. If BOINC reports at WU-finished time, not just WU uploaded time, there's a chance I'll now see a new WU downloaded as soon as one finishes. The proof of the pudding comes in about 100 minutes, when my most-advanced WU completes.
If, with a two day buffer, I still have to wait for a new WU until the upload is done, then I have to consider going back to 1X. |
|
|
|
If you read that work fetch log, you'll see that the reason it didn't fetch work there, was because "can't req work: suspended via Manager"
Anyway, for your scenario, you do want a 3rd task. However, I have just confirmed (by pulling 2 of my 3 GPUs out of the system) that GPUGrid does only allow 2 in-progress tasks on a 1-GPU-system.
Surely it's not just my scenario? Anyone running 2X wants a third WU before the WU that's about to complete actually completes. Yes?
Correct. This scenario probably fits anyone trying to do 2-tasks-on-1-GPU.
That being said... when the "completed" task gets reported, if you have a sufficiently high minimum buffer at that point, then it might get a new task.
As I said this morning, I got a new WU when the completed task had been uploaded - reported - missing an hour+ of crunching. That was with a buffer of 0.01. My buffer is now at two days. If BOINC reports at WU-finished time, not just WU uploaded time, there's a chance I'll now see a new WU downloaded as soon as one finishes. The proof of the pudding comes in about 100 minutes, when my most-advanced WU completes.
If, with a two day buffer, I still have to wait for a new WU until the upload is done, then I have to consider going back to 1X.
I hope it works out for you (and for me too!). But, even if it doesn't work, why would you have to go back to 1-at-a-time? Wouldn't you prefer "usually-2-at-a-time" instead of "always-1-at-a-time"? Just trying to understand your logic.
Regards,
Jacob |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
I hope it works out for you (and for me too!). But, even if it doesn't work, why would you have to go back to 1-at-a-time? Wouldn't you prefer "usually-2-at-a-time" instead of "always-1-at-a-time"? Just trying to understand your logic.
Regards,
Jacob
I've followed 2x on these fora this past month. Nice idea! But I'm no techie and I have not seen anything that nails down the performance improvements vs. 1X. I get the impression they are marginal.
If the performance improvement, minus the crunch time lost waiting for a completed WU to upload, is significantly positive, fine. If not? .......
That's my logic.
Cheers, Tom
|
|
|
|
If it is even only marginally better...
I'd still think that "usually 2-at-a-time but always at least 1-at-a-time" would be always better than "always just 1-at-a-time".
ie: Not having that 3rd task "ready" shouldn't make it any worse than running 1-at-a-time.
:) |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
Not having that 3rd task "ready" shouldn't make it any worse than running 1-at-a-time.:)
I guess even you have yet to come to grips with measurable performance improvement..
|
|
|
|
Not having that 3rd task "ready" shouldn't make it any worse than running 1-at-a-time.:)
I guess even you have yet to come to grips with measurable performance improvement..
I have no idea what you meant by that. My testing has always shown increased GPU Load, which is the primary indicator of increased performance.
Also, I have confirmed performance improvement (by looking at task run times), on World Community Grid, and POEM@Home, by running multiple tasks on a single GPU, for those projects.
The only reason I don't have conclusive evidence for GPUGrid performance improvement is because I had to pause my testing (per Toni's request) for an unrelated problem.... and also because I had been running the GTX 460.
For now, I'm testing running GPUGrid only on the GTX 660 Ti, for the purposes of testing your "3rd task" condition, and also to try to get that conclusive evidence.
I'm hoping your results are positive. Since you too have increased GPU Load, I believe you will see increased task throughput.
Regards,
Jacob |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
I guess even you have yet to come to grips with measurable performance improvement..
I have no idea what you meant by that.
Note "measurable". What are the numbers? I do understand the problems of doing that, and I do commend you for your persistence, but until I see the numbers I won't be convinced.
I'm hoping your results are positive.
Not looking good. The completed WU is uploading but no new WU to take its place. In fact I did not get a new WU till the completed one uploaded, like before. But...
It occurs to me --- when a 2X WU completes and there is no third WU to take its place until the upload is done, does the remaining, active WU grab full control of the GPU even though it's been given only 50% to work with?
I took this screen shot while the "third" WU was downloading and just one WU was performing:
Looks like the GPU is very busy!
If it is the case that the remaining WU grabs full control of the GPU, there's no contest - X2 probably wins! |
|
|
|
Right. So, when you say <gpu_usage>0.5</gpu_usage> ... all that is doing is telling BOINC how much to "consider allocated" for each task, for purposes of deciding how many tasks to start, therefore allowing you to do 2-at-a-time.
That app_config.xml setting does NOT LIMIT the GPU Usage or Load, in any way.
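For reference, a minimal sketch of an app_config.xml along those lines (it goes in the project's folder under the BOINC data directory; the app name is the long-run app as it appears in the event log, and the other GPUGrid apps would get the same block; the cpu_usage value is illustrative):
<app_config>
    <app>
        <name>acemdlong</name>
        <gpu_versions>
            <!-- "consider" each task to occupy half a GPU, so BOINC starts 2 per card -->
            <gpu_usage>0.5</gpu_usage>
            <!-- budgeting only; this does not limit actual CPU use either -->
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
    </app>
</app_config>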
In fact, what you say is true; when one of the 2-at-a-time tasks gets done, the remaining task now utilizes the GPU as if it were the only one ever running on it (to an extent; I believe I have proof that the task gets started in a certain "mode" based on GPU RAM available, which may have an effect on GPU Load, but hopefully doesn't.)
You can test this by suspending certain tasks while watching GPU Load (though, I caution you, suspending a NOELIA task can crash the drivers and make GPU tasks error out, even tasks on other GPUs or tasks doing work for other projects.)
So... as an example:
If you had 3 GPUGrid tasks:
- Task A: normally gets 63% GPU Load
- Task B: normally gets 84% GPU Load
- Task C: normally gets 79% GPU Load; not downloaded yet
then...
The walkthrough of the scenario is:
You are running Tasks A and B on the same GPU, getting 93% GPU Load... Task A gets done, so uploading Task A. During that upload, only Task B is running, at 84% GPU Load. Then, once you get the Task C downloaded, you can run both B and C together, at 98% GPU Load.
Hope that makes sense. That's the behavior I'm used to seeing on other projects, and would expect to see here too.
Also, I see you consider 91% GPU Load "very busy". I would consider 98% to be "very busy" :) You might be interested in running eVGA Precision X; although it was designed for overclocking, I use it as a very handy tool for showing monitoring history over time, and putting GPU Temps as icons in my system tray. |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
At 3:05am BOINC reported a failed NOELIA. No WU downloaded to replace it.
It was not until 5:07am, when BOINC reported the completion of the other WU, that two NOELIAs were downloaded. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
What's your cache at?
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
What's your cache at?
Tell me how I find that info... Ta. |
|
|
|
I have concluded my test (where I had only 1 active GPU, which was processing 2-tasks-at-once, with a 1.5 day min buffer, and wanted to see when the 3rd new task gets started). Note, I'm using BOINC v7.1.1 alpha, which includes major work-fetch tweaks compared to the v7.0.64 public release.
Assuming I'm reading the logs below correctly, here is what I see:
26-May-2013 03:20:29: Computation for Task 2 finished
26-May-2013 03:20:29: At this time, GPUGrid was in "resource backoff", meaning we couldn't ask it for NVIDIA work, because the last time we asked it for NVIDIA work we didn't get any (because of the maximum-2-in-progress-per-GPU server-side rule), and BOINC automatically creates an incrementing backoff timer when that happens. The resource backoff was still effective for 3314.63 secs (~55 minutes)
26-May-2013 03:20:43: Upload of Task 2 results started
26-May-2013 03:32:59: GPUGrid was still in "resource backoff" for 2564.79 secs (~43 minutes)
26-May-2013 03:33:42: Upload of Task 2 results finished
26-May-2013 03:33:44: BOINC reported the GPUGrid task, and piggybacked a request for NVIDIA work (since there were no other contactable projects that supported NVIDIA work, and we still needed some). Note: This piggyback request should also happen on v7.0.64, but I cannot guarantee that.
26-May-2013 03:33:46: RPC completed; BOINC did get 1 new task from GPUGrid
26-May-2013 03:33:48: Download of Task 3 started
26-May-2013 03:34:09: Download of Task 3 finished; Task 3 started processing
So, according to this...
There was a ~14 minute "layover" where BOINC was only allowed to run 1 task on the GPU, due to GPUGrid's server-side limitation. But it did gracefully handle the scenario, did eventually get the 3rd task, and started it promptly. It worked as I expected it to work, given the server-side limitation, but it's not optimal, because we should be allowed to keep the GPU continuously fully loaded with 2 tasks. :(
I wonder if we can convince GPUGrid to relax the limit, to max-3-in-progress-per-GPU, instead of max-2-in-progress-per-GPU? In theory, that should close this gap, as it would allow the 3rd task to be downloaded/ready whenever the min-buffer says the client needed it (earlier than Task 2 completion).
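Assuming GPUGrid uses the stock BOINC scheduler limits, the knob in question should just be a project-side config.xml option; a sketch from memory (the admins would know the exact setting name and whether they use a custom mechanism instead):
<config>
    <!-- allow 3 in-progress GPU tasks per GPU instead of 2 -->
    <max_wus_in_progress_gpu>3</max_wus_in_progress_gpu>
</config>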
Full log snippet:
26-May-2013 03:20:29 [---] [work_fetch] Request work fetch: application exited
26-May-2013 03:20:29 [GPUGRID] Computation for task I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1 finished
26-May-2013 03:20:29 [---] [work_fetch] work fetch start
26-May-2013 03:20:29 [---] [work_fetch] ------- start work fetch state -------
26-May-2013 03:20:29 [---] [work_fetch] target work buffer: 129600.00 + 8640.00 sec
26-May-2013 03:20:29 [---] [work_fetch] --- project states ---
26-May-2013 03:20:29 [GPUGRID] [work_fetch] REC 261498.750 prio -13.919910 can req work
26-May-2013 03:20:29 [---] [work_fetch] --- state for CPU ---
26-May-2013 03:20:29 [---] [work_fetch] shortfall 0.00 nidle 0.00 saturated 149576.75 busy 0.00
26-May-2013 03:20:29 [GPUGRID] [work_fetch] fetch share 0.000 (no apps)
26-May-2013 03:20:29 [---] [work_fetch] --- state for NVIDIA ---
26-May-2013 03:20:29 [---] [work_fetch] shortfall 122118.58 nidle 0.50 saturated 0.00 busy 0.00
26-May-2013 03:20:29 [GPUGRID] [work_fetch] fetch share 0.000 (resource backoff: 3314.63, inc 19200.00)
26-May-2013 03:20:29 [---] [work_fetch] ------- end work fetch state -------
26-May-2013 03:20:29 [---] [work_fetch] No project chosen for work fetch
26-May-2013 03:20:43 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_0
26-May-2013 03:20:43 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_1
26-May-2013 03:20:43 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_2
26-May-2013 03:20:43 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_3
26-May-2013 03:21:06 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_0
26-May-2013 03:21:06 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_7
26-May-2013 03:21:07 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_7
26-May-2013 03:21:07 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_9
26-May-2013 03:21:12 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_3
26-May-2013 03:21:12 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_10
26-May-2013 03:21:13 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_10
26-May-2013 03:21:38 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_1
26-May-2013 03:21:38 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_2
26-May-2013 03:32:59 [---] [work_fetch] work fetch start
26-May-2013 03:32:59 [---] [work_fetch] ------- start work fetch state -------
26-May-2013 03:32:59 [---] [work_fetch] target work buffer: 129600.00 + 8640.00 sec
26-May-2013 03:32:59 [---] [work_fetch] --- project states ---
26-May-2013 03:32:59 [GPUGRID] [work_fetch] REC 261379.396 prio -13.917504 can req work
26-May-2013 03:32:59 [---] [work_fetch] --- state for CPU ---
26-May-2013 03:32:59 [---] [work_fetch] shortfall 0.00 nidle 0.00 saturated 148218.82 busy 0.00
26-May-2013 03:32:59 [GPUGRID] [work_fetch] fetch share 0.000 (no apps)
26-May-2013 03:32:59 [---] [work_fetch] --- state for NVIDIA ---
26-May-2013 03:32:59 [---] [work_fetch] shortfall 123046.19 nidle 0.50 saturated 0.00 busy 0.00
26-May-2013 03:32:59 [GPUGRID] [work_fetch] fetch share 0.000 (resource backoff: 2564.79, inc 19200.00)
26-May-2013 03:32:59 [---] [work_fetch] ------- end work fetch state -------
26-May-2013 03:32:59 [---] [work_fetch] No project chosen for work fetch
26-May-2013 03:33:42 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_9
26-May-2013 03:33:42 [---] [work_fetch] Request work fetch: project finished uploading
26-May-2013 03:33:44 [---] [work_fetch] ------- start work fetch state -------
26-May-2013 03:33:44 [---] [work_fetch] target work buffer: 129600.00 + 8640.00 sec
26-May-2013 03:33:44 [---] [work_fetch] --- project states ---
26-May-2013 03:33:44 [GPUGRID] [work_fetch] REC 261379.396 prio -13.917369 can req work
26-May-2013 03:33:44 [---] [work_fetch] --- state for CPU ---
26-May-2013 03:33:44 [---] [work_fetch] shortfall 0.00 nidle 0.00 saturated 148162.47 busy 0.00
26-May-2013 03:33:44 [GPUGRID] [work_fetch] fetch share 0.000 (no apps)
26-May-2013 03:33:44 [---] [work_fetch] --- state for NVIDIA ---
26-May-2013 03:33:44 [---] [work_fetch] shortfall 123101.89 nidle 0.50 saturated 0.00 busy 0.00
26-May-2013 03:33:44 [GPUGRID] [work_fetch] fetch share 1.000
26-May-2013 03:33:44 [---] [work_fetch] ------- end work fetch state -------
26-May-2013 03:33:44 [GPUGRID] [work_fetch] set_request() for NVIDIA: ninst 1 nused_total 0.500000 nidle_now 0.500000 fetch share 1.000000 req_inst 0.500000 req_secs 123101.894609
26-May-2013 03:33:44 [GPUGRID] [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (123101.89 sec, 0.50 inst)
26-May-2013 03:33:44 [GPUGRID] Sending scheduler request: To report completed tasks.
26-May-2013 03:33:44 [GPUGRID] Reporting 1 completed tasks
26-May-2013 03:33:44 [GPUGRID] Requesting new tasks for NVIDIA
26-May-2013 03:33:46 [GPUGRID] Scheduler request completed: got 1 new tasks
26-May-2013 03:33:46 [---] [work_fetch] Request work fetch: RPC complete
26-May-2013 03:33:48 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-LICENSE
26-May-2013 03:33:48 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-COPYRIGHT
26-May-2013 03:33:48 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_1
26-May-2013 03:33:48 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_2
26-May-2013 03:33:49 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-LICENSE
26-May-2013 03:33:49 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-COPYRIGHT
26-May-2013 03:33:49 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_3
26-May-2013 03:33:49 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-pdb_file
26-May-2013 03:33:52 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_1
26-May-2013 03:33:52 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_3
26-May-2013 03:33:52 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-psf_file
26-May-2013 03:33:52 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-par_file
26-May-2013 03:33:55 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_2
26-May-2013 03:33:55 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-conf_file_enc
26-May-2013 03:33:56 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-par_file
26-May-2013 03:33:56 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-conf_file_enc
26-May-2013 03:33:56 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-metainp_file
26-May-2013 03:33:56 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_7
26-May-2013 03:33:57 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-metainp_file
26-May-2013 03:33:57 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_7
26-May-2013 03:33:57 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_10
26-May-2013 03:33:58 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_10
26-May-2013 03:34:06 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-pdb_file
26-May-2013 03:34:09 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-psf_file
26-May-2013 03:34:09 [GPUGRID] Starting task I61R18-NATHAN_dhfr36_5-19-32-RND7448_0 using acemdlong version 618 (cuda42) in slot 11 |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
What's your cache at?
Tell me how I find that info... Ta.
Boinc Manager (Advanced View), Tools, Computing Preferences, network usage tab, minimum work buffer + maximum additional work buffer.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
At 3:05am BOINC reported a failed NOELIA. No WU downloaded to replace it.
It was not until 5:07am, when BOINC reported the completion of the other WU, that two NOELIAs were downloaded.
Did you have the work_fetch_debug option on at that time? If so, can you provide the log for the work-fetch sequence where the failed task was reported? It will hopefully be able to tell us why a request for work was not also piggybacked. |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
Boinc Manager (Advanced View), Tools, Computing Preferences, network usage tab, minimum work buffer + maximum additional work buffer.
Ah. Didn't know the buffer was also called the cache. 2 days.
|
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
Did you have the work_fetch_debug option on at that time? If so, can you provide the log for the work-fetch sequence where the failed task was reported? It will hopefully be able to tell us why a request for work was not also piggybacked.
I have the debug log. The failed task was reported at 03:05:40. I've endlessly scrolled the log around that time, looking for the failure. It doesn't help that I don't know what I'm looking for!!
|
|
|
|
Did you have the work_fetch_debug option on at that time? If so, can you provide the log for the work-fetch sequence where the failed task was reported? It will hopefully be able to tell us why a request for work was not also piggybacked.
I have the debug log. The failed task was reported at 03:05:40. I've endlessly scrolled the log around that time, looking for the failure. It doesn't help that I don't know what I'm looking for!!
So, a "work fetch sequence" starts at the text "work fetch start", and ends a few lines after the text "end work fetch state". I say a few lines after, because BOINC tells us the result of the sequence after that text.
For reference, the conclusion of my test (posted a couple of posts up, where I got that 3rd task) contains a few such "work fetch sequences".
You can either use the Event Log to find the relevant lines, or (if you have closed BOINC) you can find a copy of the logs stored to file in your Data directory (location is shown as a log entry at BOINC startup; I think the default location is C:\ProgramData\BOINC). There are actually 2 files: stdoutdae.txt has the most recent log events, and stdoutdae.old has older log events from the prior BOINC session.
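Side note, in case anyone reading along wants the same logging: work_fetch_debug is enabled via a cc_config.xml in that same data directory, roughly like the sketch below, then re-read config files from the Advanced menu (or restart BOINC):
<cc_config>
    <log_flags>
        <!-- log the work-fetch decision sequences discussed above -->
        <work_fetch_debug>1</work_fetch_debug>
    </log_flags>
</cc_config>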
What I'm interested in is the 2 "work fetch sequences" around time 03:05:40... the sequence right before that task was reported, and the sequence that occurred at the same time that task was reported.
Make sense? |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
The lazy way around this is to have another GPU in the system, an ATI, or maybe use cc_config to exclude a second NVidia for this project. While that should keep the work flowing, it's not a proper fix.
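For reference, that kind of exclusion lives in cc_config.xml in the BOINC data directory via an <exclude_gpu> block; a rough sketch, with the device number purely illustrative (check the BOINC client configuration docs and your startup log for the actual device numbering):
<cc_config>
    <options>
        <exclude_gpu>
            <url>http://www.gpugrid.net/</url>
            <!-- exclude the second NVidia card from this project -->
            <device_num>1</device_num>
            <type>NVIDIA</type>
        </exclude_gpu>
    </options>
</cc_config>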
The project setting of no more than 2 WU's per GPU won't be changed.
It's not the problem anyway. The problem is that the "resource backoff" remains after a resource becomes free. It needs to be reset/zeroed.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
I wonder if we can convince GPUGrid to relax the limit, to max-3-in-progress-per-GPU, instead of max-2-in-progress-per-GPU? In theory, that should close this gap, as it would allow the 3rd task to be downloaded/ready whenever the min-buffer says the client needed it (earlier than Task 2 completion).
I suppose extending the limit straight to 3 tasks per GPU would be detrimental overall. In this case anyone with a large work buffer setting would get 3 tasks. This increases WU turn-around time (bad for the project) and makes people miss the credit bonus (bad for crunchers). The limit is there in the first place to ensure quick turn-around of WUs.
You can argue that "I know my system can handle them in time" and "I'm running 2 WUs in parallel, so I want a 3rd task".. which leads to the problem that the server can't differentiate between regular users and those running 2 WUs in parallel. A possible solution would be to introduce the number of WUs per GPU as a "possibly dangerous" parameter in the profile, like they did at Einstein, so that the server could allow up to 3 WUs only for such hosts.
However, the project team seems rather busy right now. And this might introduce support issues, as every time some new error pops up we'd have to ask people to go back to running single WUs and replicate the issue. Could be done, but it makes things more complicated for little gain (I'm not saying "negligible" on purpose here).
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
The lazy way around this is to have another GPU in the system, an ATI, or maybe use cc_config to exclude a second NVidia for this project. While that should keep the work flowing, it's not a proper fix.
The project setting of no more than 2 WU's per GPU won't be changed.
It's not the problem anyway. The problem is that the "resource backoff" remains after a resource becomes free. It needs to be reset/zeroed.
I believe you're wrong. The problem is that GPUGrid won't give a 3rd task, until 1 of the 2 other tasks is reported.
I have privately emailed some of the GPUGrid admins, requesting the change to max-3-in-progress-per-GPU. |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
I'm just thinking about the gap between when a WU is reported and a new one is downloaded - your ~14 minute "layover".
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
BTW: running 2 WUs in parallel should be quite good for regular short queue tasks on higher end GPUs. Here GPU utilization was generally quite low, there's no problem with the bonus credit deadline, and throughput could be increased significantly. Although these GPUs should be running long queue tasks anyway.
Well, the app_config could be set up this way. In fact, I might just change mine like this, just in case the long queue runs dry.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
I suppose extending the limit straight to 3 tasks per GPU would be detrimental overall. In this case anyone with a large work buffer setting would get 3 tasks. This increases WU turn-around time (bad for the project) and makes people miss the credit bonus (bad for crunchers). The limit is there in the first place to ensure quick turn-around of WUs.
The BOINC server-side-scheduler has mechanisms to allocate tasks appropriately. Won't it only possibly increase WU turn-around time if a given application is out of tasks to allocate to computers requesting work? I'm not sure how often that happens, but even then, the server-side-scheduler can be set up to handle it gracefully, I believe (possibly sending tasks to additional hosts in case the new host completes it first).
I don't see this limit-increase-request as detrimental.
I see it as logical and beneficial.
BTW: running 2 WUs in parallel should be quite good for regular short queue tasks on higher end GPUs. Here GPU utilization was generally quite low, there's no problem with the bonus credit deadline, and throughput could be increased significantly. Although these GPUs should be running long queue tasks anyway.
Well, the app_config could be set up this way. In fact, I might just change mine like this, just in case the long queue runs dry.
MrS
My research indicates that long-run tasks usually get around 84-90% GPU Load on their own, but they get 98% GPU Load when run combined.
Short-run tasks generally get much less GPU Load, as you stated. I think I saw 65% once. I don't currently have data on combining those yet, but I believe it would be very beneficial to combine them.
I don't think of this in terms of "getting bonus credit". I think of it in terms of "getting science done". I would hope that anyone using 0.5 gpu_usage would also think the same way, but if they were also concerned about bonus credits, obviously they'd have to do some research to see how quickly they can get tasks done.
I have set my app_config to 0.5 gpu_usage for all of the GPUGrid applications. My new goal is to keep the GPU Load of the 660 Ti as high as possible (running 2-at-a-time), even if it means I have to exclude work on the GTX 460 (which is now running SETI/Einstein, but not GPUGrid, since it only has 1GB, and cannot do 2-at-a-time). I wish I could specify "only do 2 at a time on THIS GPU", so that I could continue to do GPUGrid work on the GTX 460... and I will be asking the BOINC devs about that feature, when they redesign BOINC to treat each GPU as its own resource, instead of just "NVIDIA" as a resource. That is on their to-do list, believe it or not, and the goal is to get rid of all the necessity for GPU exclusions.
I may eventually switch it back to 1-at-a-time, so that the GTX 460 can again crunch GPUGrid (a project that I currently prefer, over SETI/Einstein).
Despite lots of people being against change, I suppose I'm an instigator in promoting change. I see a problem, I go after the fix. I see something untried and untested, I push it hard to see what happens. And now I'm rambling. :) |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
That would require more administration, put more strain on the server, and might require a server update. They are struggling to keep the work flowing at present, so fine-tuning to facilitate a handful of people who want to use app_config is very low priority.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
26/05/2013 03:05:08 | | [work_fetch] --- state for CPU ---
26/05/2013 03:05:08 | | [work_fetch] shortfall 691200.00 nidle 4.00 saturated 0.00 busy 0.00
26/05/2013 03:05:08 | Poem@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
26/05/2013 03:05:08 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
26/05/2013 03:05:08 | | [work_fetch] --- state for NVIDIA ---
26/05/2013 03:05:08 | | [work_fetch] shortfall 138786.29 nidle 0.00 saturated 15406.17 busy 0.00
26/05/2013 03:05:08 | Poem@Home | [work_fetch] fetch share 0.000
26/05/2013 03:05:08 | GPUGRID | [work_fetch] fetch share 0.000 (resource backoff: 14617.84, inc 19200.00)
26/05/2013 03:05:08 | | [work_fetch] ------- end work fetch state -------
26/05/2013 03:05:08 | | [work_fetch] No project chosen for work fetch
26/05/2013 03:06:08 | | [work_fetch] work fetch start
26/05/2013 03:06:08 | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
26/05/2013 03:06:08 | | [work_fetch] no eligible project for NVIDIA
26/05/2013 03:06:08 | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
26/05/2013 03:06:08 | | [work_fetch] no eligible project for CPU
26/05/2013 03:06:08 | | [work_fetch] ------- start work fetch state -------
26/05/2013 03:06:08 | | [work_fetch] target work buffer: 172800.00 + 0.00 sec
26/05/2013 03:06:08 | | [work_fetch] --- project states ---
26/05/2013 03:06:08 | Poem@Home | [work_fetch] REC 236.219 prio 0.000000 can't req work: suspended via Manager
26/05/2013 03:06:08 | GPUGRID | [work_fetch] REC 40724.365 prio -1.103737 can req work
26/05/2013 03:06:08 | | [work_fetch] --- state for CPU ---
26/05/2013 03:06:08 | | [work_fetch] shortfall 691200.00 nidle 4.00 saturated 0.00 busy 0.00
26/05/2013 03:06:08 | Poem@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
26/05/2013 03:06:08 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
26/05/2013 03:06:08 | | [work_fetch] --- state for NVIDIA ---
26/05/2013 03:06:08 | | [work_fetch] shortfall 138844.32 nidle 0.00 saturated 15346.68 busy 0.00
26/05/2013 03:06:08 | Poem@Home | [work_fetch] fetch share 0.000
26/05/2013 03:06:08 | GPUGRID | [work_fetch] fetch share 0.000 (resource backoff: 14557.83, inc 19200.00)
26/05/2013 03:06:08 | | [work_fetch] ------- end work fetch state -------
26/05/2013 03:06:08 | | [work_fetch] No project chosen for work fetch
26/05/2013 03:07:08 | | [work_fetch] work fetch start
26/05/2013 03:07:08 | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
26/05/2013 03:07:08 | | [work_fetch] no eligible project for NVIDIA
26/05/2013 03:07:08 | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
26/05/2013 03:07:08 | | [work_fetch] no eligible project for CPU
|
|
|
|
That would require more administration,
For all you know, it could be as simple as updating an integer column in a database.
put more strain on the server,
Actually, wouldn't it put less strain? Right now, even 1-at-a-time crunchers are requesting work from GPUGrid, and being denied, which uses network resources. BOINC has a resource backoff, sure, but increasing the limit of max-per-GPU would actually help this scenario. Perhaps you were referring to the task scheduler, which may already be set up to resend tasks to additional hosts if needed. I suppose it's possible that increasing the limit may add strain there, but only if we were completely running out of tasks frequently, I believe.
and might require a server update.
If updating a limit has become that hard to implement, then they have implemented the limit incorrectly. I doubt that's the case.
They are struggling to keep the work flowing at present, so fine-tuning to facilitate a handful of people who want to use app_config is very low priority.
This, too, is fine. I'm used to getting the cold shoulder from the GPUGrid admins, by now. I expect the possibility that my request will go completely ignored. But I believe it's a valid request, and so, I privately asked them anyway.
- Jacob
|
|
|
|
tomba:
I don't see the portion where the task was reported.
You said you knew the time the task was reported, but... is that correct?
Maybe you were off by a few hours, ie: maybe it used UTC time, but you're in a different timezone?
In the logs, we should see something that says:
"[GPUGRID] Sending scheduler request: To report completed tasks."
and
"[GPUGRID] Reporting 1 completed tasks"
Along with it should be a work fetch sequence.
That's the sequence we need to see.
Can you find it? |
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Running two GPUGrid tasks at a time is NOT beneficial to the project or most crunchers - overall, it is detrimental to both.
Presently there is only 1 known circumstance where it's faster for some WU types, and that doesn't factor in the observed increase in error rates:
On GPU's with larger amounts of memory (3GB or more) running on WDDM systems that are poorly optimized for GPU crunching (high CPU usage by CPU projects) - Jacob's 3GB GTX660Ti
I would advise against running the present Short WU's two at a time, because of their runtime and the fact that there is more credit to be had from running Long WU's. On a GTX660Ti (W7) it takes ~5.5h to complete a short WU. Even if it only took 10h to complete two WU's, you would be much better off running Long WU's because of the obvious credit increase. Also, and generally speaking, the contribution of running Long WU's over any extended period of time is more important to the project than running short WU's. That's why the credit system exists in its present form.
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help
|
|
|
|
Getting back on topic... it occurs to me that an investigation into CPU Time does need to happen.
On POEM tasks, as well as GPUGrid tasks, although overall task time is improved when running 2-at-a-time (from my testing), the overall CPU time is higher when running 2-at-a-time.
It appears that things might work this way, for a GPU task that also utilizes a full core:
- When a task is being processed on its own, the CPU core is fully utilized, and the GPU Load is a certain % (say 85%).
- When the task is being processed in tandem with another task on the GPU, aka 2-per-GPU, the CPU core is still fully utilized, but the GPU Load for the task is something akin to (98% / 2 = 49%). So, it will take less-than-double the time to complete, but during that time, the CPU is being fully used.
- I'm not sure if actual computational work is being done in the CPU process, or if it's just a "feeder" to feed/poll the GPU, to keep the GPU busy.
- The results indicate that, though the tasks are completing faster overall at 2-per-GPU, more CPU is being used to get those results.
This is a concern for any cruncher that also crunches CPU projects; that CPU time may be wasted.
So, the "benefits" of running 2-at-a-time may actually be dependent upon the user's preference of sacrificing some CPU time (up to a full core per additional GPU task) to achieve the increased task throughput.
Note: I'm not talking about changing the BOINC preference for "use at most x% of the processors". I'm talking about per-task CPU time for the GPU tasks that are completed 2-per-GPU.
This has caused me to re-evaluate my current 5-per-GPU strategy for POEM (whose GPU tasks always use a full core), and re-evaluate my current 2-per-GPU strategy for GPUGrid (whose GPU tasks only sometimes use a full core). In this re-evaluation, I believe I am going to have to come up with a personal tolerance level of how much CPU I'm willing to sacrifice. ie: I don't think there's an objective way to approach this, to where it won't depend on user preference... is there?
Hope this makes sense to somebody. :)
Logical input would be appreciated. |
|
|
|
Moderation notice: flames went a bit high in this thread, so it was hidden while things burned down. By now 2 posts are missing, but all the relevant information should still be there.
Back to topic:
Jacob wrote: Won't it only possibly increase WU turn around time if a given application is out of tasks to allocate to computers requesting work? I'm not sure how often that happens, but even then, the server-side-scheduler can be setup to handle it gracefully I believe (possibly sending tasks to additional hosts in case the new host completes it first).
I don't see this limit-increase-request as detrimental.
I see it as logical and beneficial.
You're right that the change in WU turn-around time does not matter (or do harm) as long as the number of parallel searches in progress is larger than or equal to the number of attached cards times 2 (now) or times 3 (increased limit). I don't know how close we are to this limit, but in recent times there seems to have been plenty of work available, so we might be safe. The project staff would have to monitor and decide on this issue, possibly even adjusting things back if it doesn't work out (any more).
Well, actually it wouldn't be necessary to go straight to "3 WUs per GPU"; I think "2*nGPU + 1" would suffice in all but extreme cases (multiple GPUs, very slow upload).
That seems like a reasonable change to me, but it would have to be communicated appropriately (at least in the news, maybe also pushing it via the BOINC message system). The point I'd be afraid of is this: people running GPU-Grid probably set their cache sizes according to their CPU projects, as long as they still get the credit bonus. At such "typical" settings BOINC would go for a straight 2-WU cache, which might make them miss the bonus credits. We'd need to be careful to avoid this, otherwise there'll be far more harm done by annoying crunchers than throughput gained by having the 3rd WU around.
We might want to run some further tests before pushing for the 3-WU-cache. To begin, quantifying the throughput increase for some long-runs would be nice. GPU utilization sure goes up, so there must be some increase. Ah, if only those SMX's could work on entirely different tasks! But that's not available below Titan.
Some numbers from me:
By now I've changed my app_config to 0.5 for the long-runs as well. Let's see how well it goes. Although I've got a special case since I actually want to run as much POEM as I can, so occasionally I'm running those WUs. Meaning in the last few days I was missing the deadline for credit bonus anyway (since BOINC caches a 2nd GPU-Grid WU far too early for me) and now I'll have widely varying configurations running, depending on how many POEMs I get:
- 5 or more POEMs: up to 8 POEMs run
- 1 to 4 POEMs: these POEMs and 1 GPU-Grid task
- 0 POEMs: 2 GPU-Grid tasks
By now I observed the following: 1 KIDc22_SOD would use ~85% GPU, while 2 POEMs (which themselves would lead to pretty low utilization) running alongside it result in 95 - 97% utilization. This must be better, although it will be hard for me to quantify anything.
Alright, now together with a regular KIDc22: GPU usage 98% (wow, haven't seen anything like this before over here!), overall GPU memory used 760 MB (fine - but why so low? Shouldn't it roughly double?) and GPU power consumption is slightly up (~62% -> 68%, again the highest I have seen here).
@SK: I can't remember any report of an increased error rate due to running 2 WUs in parallel. The bug discovered by Jacob some time ago was totally unrelated to this.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
... it occurs to me that an investigation into CPU Time does need to happen.
On POEM tasks, as well as GPUGrid tasks, although overall task time is improved when running 2-at-a-time (from my testing), the overall CPU time is higher when running 2-at-a-time.
...
This is a concern for any cruncher that also crunches CPU projects; that CPU may be being wasted.
Due to the much larger CPU times involved (and possibly wasted) with running 2-at-a-time for GPUGrid... I think 2-at-a-time should be used only if you are okay giving GPUGrid preference to use the CPU (up to a full core, per task), over any attached CPU projects.
For me, since GPUGrid already gets a lot of my machine's resources (usually 2 GPUs) and credits (approximately 75%), I am not okay with this. I want CPU resources to be available for my other 12 CPU projects.
So, I have personally decided to stick with 1-at-a-time, for GPUGrid. This will also allow me to use both my GTX 660 Ti, and my GTX 460, to do GPUGrid work (which I would prefer), rather than forcing me to allocate the GTX 460 to some other GPU project (which I would not prefer).
Also, as a side note, because of the larger CPU times involved with running x-at-a-time for POEM@Home... I have also adjusted it: from doing 5-at-a-time, to doing 3-at-a-time. I had been keeping track of POEM task times, including per-task run time and per-task CPU time, as run on my GTX 660 Ti, and I think I finally have enough data to justify my decision to move from 5 to 3. Details are below.
POEM@Home x-at-a-time task times:
1 POEM task on 660 Ti, alongside other tasks at full load:
- Attempt 1:
- Task Run time (sec): 929.8
- Task CPU time (sec): 902.3
- Task complete every: 929.8
- Attempt 2: 5/27/2013 (it only ran at 1045 Mhz)
- Task Run time (sec): 1,127.50
- Task CPU time (sec): 960.76
- Task complete every: 1,127.50
- Attempt 3: 5/27/2013 (it only ran at 1045 Mhz)
- Task Run time (sec): 1,082.05
- Task CPU time (sec): 955.13
- Task complete every: 1,082.05
2 POEM tasks on 660 Ti, alongside other tasks at full load:
- Attempt 1:
- Task Run time (sec): 1,062.62
- Task CPU time (sec): 1,021.05
- Task complete every: 531.31
- Attempt 2: 5/26/2013
- Task Run time (sec): 1,234.60
- Task CPU time (sec): 1,056.06
- Task complete every: 617.3
- Attempt 3: 5/27/2013
- Task Run time (sec): 1,201.19
- Task CPU time (sec): 1,036.07
- Task complete every: 600.595
- Attempt 4: 5/27/2013 1241 Mhz
- Task Run time (sec): 1,190.03
- Task CPU time (sec): 1,027.76
- Task complete every: 595.015
3 POEM tasks on 660 Ti, alongside other tasks at full load:
- Attempt 1:
- Task Run time (sec): 1,405.38
- Task CPU time (sec): 1,337.66
- Task complete every: 468.46
- Attempt 2:
- Task Run time (sec): 1,295.70
- Task CPU time (sec): 1,205.33
- Task complete every: 431.9
- Attempt 3:
- Task Run time (sec): 1,233.04
- Task CPU time (sec): 1,197.50
- Task complete every: 411.01
- Attempt 4:
- Task Run time (sec): 1,345.84
- Task CPU time (sec): 1,207.16
- Task complete every: 448.61
- Attempt 5:
- Task Run time (sec): 1,584.40
- Task CPU time (sec): 1,383.26
- Task complete every: 528.13
- Attempt 6: 5/26/2013
- Task Run time (sec): 1,412.456667
- Task CPU time (sec): 1,190.23
- Task complete every: 470.818889
- Attempt 7: 5/26/2013
- Task Run time (sec): 1,348.02
- Task CPU time (sec): 1,142.396667
- Task complete every: 449.34
- Attempt 8: 5/27/2013
- Task Run time (sec): 1,417.43
- Task CPU time (sec): 1,194.49
- Task complete every: 472.48
- Attempt 9: 5/27/2013
- Task Run time (sec): 1,361.78
- Task CPU time (sec): 1,162.97
- Task complete every: 453.93
4 POEM tasks on 660 Ti, alongside other tasks at full load:
- Attempt 1:
- Task Run time (sec): 1,464.20
- Task CPU time (sec): 1,364.09
- Task complete every: 366.05
- Attempt 2:
- Task Run time (sec): 1,596.06
- Task CPU time (sec): 1,378.56
- Task complete every: 399.015
- Attempt 3:
- Task Run time (sec): 1,542.45
- Task CPU time (sec): 1,308.56
- Task complete every: 385.6125
- Attempt 4: 5/27/2013 1241Mhz
- Task Run time (sec): 1,670.58
- Task CPU time (sec): 1,340.23
- Task complete every: 417.645
5 POEM tasks on 660 Ti, alongside other tasks at full load:
- Attempt 1:
- Task Run time (sec): 1,801.34
- Task CPU time (sec): 1,580.75
- Task complete every: 360.268
- Attempt 2:
- Task Run time (sec): 1,752.97
- Task CPU time (sec): 1,535.52
- Task complete every: 350.594
- Attempt 3:
- Task Run time (sec): 1,822.53
- Task CPU time (sec): 1,574.04
- Task complete every: 364.506
6 POEM tasks on 660 Ti, alongside other tasks at full load:
- Attempt 1:
- Task Run time (sec): 2,200.69
- Task CPU time (sec): 1,988.87
- Task complete every: 366.78
- Attempt 2:
- Task Run time (sec): 2,138.92
- Task CPU time (sec): 1,817.86
- Task complete every: 356.49
|
|
|
|
Right, I forgot to comment on the CPU usage. For GPU-Grid on Keplers this is mostly polling the GPU, I think, since the CPU usage on older cards is so much lower. POEM does some actual number crunching on the CPU as well (which would be far slower on the GPU).
And you're right in that this is a general tradeoff everyone has to decide for themselves: how much CPU time am I willing to give up in order to feed my GPU better. One criterion would be overall RAC, where "feeding the GPU" should normally win. On the other hand one could argue that CPU credits are worth more than GPU credits. I couldn't see any better rule here than "do what you want".
Regarding POEM: back when there was still enough work I made the tests on my system and found maximum throughput at 867.9k RAC with 8 concurrent tasks. I didn't write the other numbers down, but progression was rather flat at the top, maybe a few 1000's per day more for an additional thread. Hence I view any CPU work being done on this host as bonus (since with full POEM supply it wouldn't do any at all), so I'm fine sacrificing another core if it helps overall throughput (my team needs it, badly ;)
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
Well, actually it wouldn't be neccessary to go to straight "3 WUs per GPU", I think "2*nGPU + 1" would suffice in all but extreme cases (multiple GPUs, very slow upload).
Well.. I am positive some people do in fact have really slow upload speeds (where it could take an hour to upload a task's result). And, if the user was running 2-at-a-time, the worst case would be if both tasks complete at the same time. Ideally, it would actually be nice to have 2 tasks on-hand to start up, while the 2 results are being uploaded (so, server max-in-progress-of-4-per-GPU)... but if only 1 task was available (max-in-progress-of-3-per-GPU), then the GPU could still be worked 1-at-a-time until a 2nd task became available.
In regards to implementation, it COULD be implemented as a project web preference, but then, what if a user has tons of various computers with various combinations of GPUs, and they only want to increase their limit for a specific computer? This is why I don't like the idea of having this as an option within web preferences, or even location-specific (work/school/etc) web preferences.
The point I'd be afraid of is this: people running GPU-Grid probably set their cache sizes according to their CPU projects, as long as they still get the credit bonus. At such "typical" settings BOINC would go for a straight 2-WU cache, which might make them miss the bonus credits. We'd need to be careful to avoid this, otherwise there'll be far more harm done by annoying crunchers than throughput gained by having the 3rd WU around.
I agree that it should be implemented in a way that doesn't cause harm. It's unfortunate that people probably DO rely on the max-x-in-progress to yield them bonus credits, while also keeping large cache settings. So, I'm not sure what the answer is yet.
We might want to run some further tests before pushing for the 3-WU-cache. To begin, quantifying the throughput increase for some long-runs would be nice. GPU utilization sure goes up, so there must be some increase. Ah, if only those SMX's could work on entirely different tasks! But that's not available below Titan.
I did document, towards the beginning of this thread, some results where I showed increased task completion throughput. Not much faster, but within the ballpark of 3%-20% faster.
Regarding your [POEM + GPUGrid] settings, I too run a similar config. I noticed that, while POEM tasks ran alongside a GPUGrid task all on the same GPU, the POEM tasks did not seem to complete in a timely manner at all. And because those POEM tasks take a full CPU core to run, there was a lot of CPU time spent to complete the POEM tasks. Unfortunately, there's no good way to say "run x-at-a-time for Project A, run x-at-a-time for Project B, but don't let them run together on the same GPU" unless you specify a hard limit using BOINC GPU exclusions, which we do not want to do. So, my move to doing 1-at-a-time for GPUGrid solves this POEM-CPU-usage problem for me. I'm definitely interested in your findings.
My current recommendations, for anyone that wants to try 2-at-a-time for GPUGrid, are:
- only do it if all GPUs involved have 2GB GPU RAM or more (due to failures adding a task when not enough GPU RAM is available)
- only do it if you have reasonably fast upload speeds (I'd say capable of uploading a result within 15 minutes. Note: the max-2-in-progress-per-GPU server limit does mean that there is a window where a GPU could be entirely non-utilized by GPUGrid, and faster upload speeds help to close that window)
- only do it if you don't mind GPUGrid tasks spending more CPU time than they normally would
- only do it if you are okay with the possibility of BOINC running multiple GPU projects on a single GPU, which could slow down throughput for them |
|
|
|
skgiven:
I have recently also made an additional change to my system, which I think you might find interesting.
Previously, you recommended freeing a core through the BOINC preference "Use at most x% of the processors", but I did not want to do that since GPUGrid tasks sometimes use less than a full core, and are okay to be overloaded since they run at a higher Windows priority as compared to the CPU tasks. So I still have that preference set at "Use at most 100% of the processors" and I believe that's the correct setting for me, since I want full CPU utilization always.
BUT... On my system, running 1-task-per-GPU, where GPUGrid can potentially be running on both my Kepler GTX 660 Ti as well as my Fermi GTX 460... because GPUGrid tasks generally use a full core on Keplers, I "for sure" knew a CPU was being fully utilized whenever 2 GPUGrid tasks were running.
So, I found a way to take advantage of that logic, to better accommodate my system, to prevent overloading, while still ensuring full-CPU-load. I changed GPUGrid's app_config.xml file to use <cpu_usage> of 0.5 for all the applications. That way, if only 1 GPUGrid task is running, BOINC will not "allocate a CPU", but if 2 GPUGrid tasks are running, 0.5 + 0.5 = 1.0, and BOINC will "allocate a CPU", which I want because it means I know a task is running on the Kepler, and I'd be unnecessarily overloaded if I didn't allocate it.
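A sketch of what that looks like in GPUGrid's app_config.xml (long-run app shown, taken from the event log; the other GPUGrid apps would get the same values; everything else is illustrative):
<app_config>
    <app>
        <name>acemdlong</name>
        <gpu_versions>
            <!-- 1 task per GPU -->
            <gpu_usage>1.0</gpu_usage>
            <!-- 0.5 each: one running GPUGrid task reserves no core, two running tasks reserve one -->
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>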
I think it's helping, too, based on my initial results. You were right, unnecessary overloading is detrimental to task throughput, thanks for keeping me thinking about that.
Have a good day,
Jacob |
|
|
|
My current recommendations, for anyone that wants to try 2-at-a-time for GPUGrid, are:
- only do it if all GPUs involved have 2GB GPU RAM or more (due to failures adding a task when not enough GPU RAM is available)
- only do it if you have reasonably fast upload speeds (I'd say capable of uploading a result within 15 minutes. Note: the max-2-in-progress-per-GPU server limit does mean that there is a window where a GPU could be entirely non-utilized by GPUGrid, and faster upload speeds help to close that window)
- only do it if you don't mind GPUGrid tasks spending more CPU time than they normally would
- only do it if you are okay with the possibility of BOINC running multiple GPU projects on a single GPU, which could slow down throughput for them
I'll add to this list: only do so if you've got an otherwise stable system.
I got 2 computation errors on GPU-Grid WUs running the config mentioned above while I was away for some sports. This is unusual for my system, but I've been changing my config too much recently, so I can't really blame running 2 GPU-Grids concurrently.
For now I'll be sticking with this, until I'm sure I've got the rest sorted out: GPU-Grid long runs at 0.51 GPUs, POEM at 0.12 GPUs. This way up to 4 POEMs run alongside GPU-Grid, but I avoid 2 GPU-Grids for now. The POEMs do take longer this way, but I crunch all they give me and average GPU utilization is higher, so overall throughput must be higher. This will be next to impossible to quantify, though, as the amount of POEMs I run is restricted by supply. And the WU times will depend on how many POEMs I get.
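For anyone wanting to copy that setup, it amounts to two app_config.xml files, one per project folder; a sketch under those assumptions (the POEM app name below is a placeholder - take the real one from the event log or client_state.xml, and the cpu_usage values are illustrative):
In the GPUGrid project folder:
<app_config>
    <app>
        <name>acemdlong</name>
        <gpu_versions>
            <gpu_usage>0.51</gpu_usage>
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
In the POEM@Home project folder:
<app_config>
    <app>
        <name>poemcl</name> <!-- placeholder; use the actual POEM app name -->
        <gpu_versions>
            <gpu_usage>0.12</gpu_usage>
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
The arithmetic is the point: 0.51 + 0.51 > 1, so BOINC never starts a second GPU-Grid task, while 0.51 + 4 x 0.12 = 0.99, so up to four POEMs can run alongside one GPU-Grid task.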
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
Beyond Send message
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level
Scientific publications
|
Wow, so much angst over such a small thing. Guess that's how WW1 started too. My 1 cent: keep the limit at 2... |
|
|
|
I have concluded my test (where I had only 1 active GPU, which was processing 2-tasks-at-once, with a 1.5 day min buffer, and wanted to see when the 3rd new task gets started). Note, I'm using BOINC v7.1.1 alpha, which includes major work-fetch tweaks compared to the v7.0.64 public release.
...
There was a ~14 minute "layover" where BOINC was only allowed to run 1 task on the GPU, due to GPUGrid's server-side limitation. But it did gracefully handle the scenario, did eventually get the 3rd task, and started it promptly. It worked as I expected it to work, given the server-side limitation, but it's not optimal, because we should be allowed to keep the GPU continuously fully loaded with 2 tasks. :(
I wonder if we can convince GPUGrid to relax the limit, to max-3-in-progress-per-GPU, instead of max-2-in-progress-per-GPU? In theory, that should close this gap, as it would allow the 3rd task to be downloaded/ready whenever the min-buffer says the client needed it (earlier than Task 2 completion).
I do appreciate your efforts to squeeze all of the computing power your system has by tweaking your existing software environment, and I think you've done a great job on your system!
But...
1. If you want to have a dedicated cruncher computer for GPUGrid (meaning every part of it - hardware and software - is chosen to be optimal for GPUGrid crunching and built with crunching purposes in mind), it can have a different operating system (Linux, or WinXP), which helps its hardware perform like your (over)tweaked Win8, without the (over)tweaking.
2. It's quite possible that future GPUGrid (long) tasks will use more than 1GB GPU memory (maybe as much as the GPU has), and this will make your 2-tasks-at-a-time tweak obsolete.
3. I think that to eliminate this 14 minutes of suboptimal crunching in every 10-16 hours of optimal crunching by server-side changes (affecting every cruncher, and the whole GPUGrid workflow) is not worth the effort from the GPUGrid (or any project) staff's point of view. (Taking into consideration items 1 and 2, and the effort needed to eliminate the unexpected detrimental side-effects of such changes) |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
I think that to eliminate this 14 minutes of suboptimal crunching...
Out in the sticks, with a pedestrian upload max, trying 2X, I lost 90 minutes of crunch time before a new WU arrived, not 14. Below is a graph of my experience, FWIW. The black line is the period I was running 2X, the blue line is 1X.
|
|
|
|
Retvari,
Thanks for your response.
I do appreciate your efforts to squeeze all of the computing power your system has by tweaking your existing software environment, and I think you've done a great job on your system!
Thanks. I'm doing it for the community as well, not just me. If we can get more science done, as a community, then let's do it! Plus, I like to test and I like to push performance limits :)
But...
1. If you want to have a dedicated cruncher computer for GPUGrid (meaning every part of it - hardware and software - is chosen to be optimal for GPUGrid crunching and built with crunching purposes in mind), it can have a different operating system (Linux, or WinXP), which helps its hardware perform like your (over)tweaked Win8, without the (over)tweaking.
I do understand that getting the absolute best performance would involve choosing a specific hardware and OS combination. For me, though, and likely for others, we're just running BOINC on PCs that we use for work or for play. So... I'm just trying to make the absolute best out of it, using the hardware and OS that I would normally otherwise use without BOINC.
2. It's quite possible that future GPUGrid (long) tasks will use more than 1GB GPU memory (maybe as much as the GPU has), and this will make your 2-tasks-at-a-time tweak obsolete.
If the tasks' execution was changed to use more memory by default, then it would trigger a change in minimum specifications for 2-at-a-time, for sure. Perhaps 3GB would become the new minimum-recommended GPU RAM requirement for 2-at-a-time processing. But I wouldn't call the plan obsolete. It would just have to change, along with the new task requirements.... unless they totally change the task structure to use all of the GPU's RAM, in which case, they would make 2-at-a-time infeasible. You're right about that case.
3. I think that to eliminate this 14 minutes of suboptimal crunching in every 10-16 hours of optimal crunching by server-side changes (affecting every cruncher, and the whole GPUGrid workflow) is not worth the effort from the GPUGrid (or any project) staff's point of view. (Taking into consideration items 1 and 2, and the effort needed to eliminate the unexpected detrimental side-effects of such changes)
It depends on what their priorities are, for sure. I hope they consider making it an option, since their current policy is too restrictive for some. At any rate, this server-side-limitation of max-2-in-progress-per-GPU... is the only "variable" in this performance testing that a user has no control over. And if they decide not to accommodate, then that's the way it would have to be, and I'd then recommend against 2-at-a-time (unless you happen to have another GPU in the system that isn't doing GPUGrid work, such that you could work around the problem by getting GPUGrid to give you additional tasks for the GPU that is doing GPUGrid work).
It is what it is.
We'll see if they change their policy on this.
I'm not holding my breath.
Regards,
Jacob |
|
|
|
I can say now for certain that the problem I was seeing (2 task failures) cannot be attributed to running 2-at-once. However, I cannot state anything more useful right now. I had upgraded my memory to DDR3-2400 some time ago and thought I'd settled on good settings, but this may not have been the case. I got some serious instability over the last few days.. I wonder why it took so long to surface, but I'm positive I'll find the issue soon.
After that I'll tweak Collatz on my HD4000 before I'll continue testing here.. I won't forget about it, even if it will take some time!
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
I don't know where you're located, but if you're in the northern hemisphere and not close to the pole, maybe it is the summer and rising temperatures that made these errors appear?
____________
|
|
|
|
Thanks, but temperatures are barely climbing over 20°C here in Germany. I stepped back from BCLK 104.5 MHz to 104.0 MHz (plus a cold start in between) and this seems to have done the trick.. for now. I want to be more careful now with further changes.. but am already OC'ing the HD4000 for Collatz.. :D
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
I do have some results by now which are worth sharing. I switched GPU-Grid to 0.51 GPUs, so that only one of them runs at a time (still being careful..) and there's some room left for another project. In my case it's POEM. And I actually got enough work from them to test a few cases. On to the results, always using a Nathan KIDc22_SODcharge, with measurements averaged over a few minutes each:
#GPU-Grids | #POEMs | GPU load in % | GPU power in % | memory controller load in %
1 | 0 | 85.4 | 63.0 | 35.2
1 | 1 | 94.0 | 63.3 | 33.0
1 | 2 | 96.0 | 62.3 | 31.2
1 | 3 | 96.9 | 63.6 | 29.6
1 | 4 | 97.9 | 61.6 | 27.4
0 | 8 | 94.8 | 51.2 | 5
Obviously the WUs take longer this way, but I can't really quantify it since the supply of POEMs is scarce. However, what's clear is the higher average GPU load running in this configuration. GPU power consumption and memory controller load drop with an increasing number of POEMs, because the fractional runtime of POEM on the GPU increases. And POEM itself stresses the GPU significantly less than GPU-Grid.
I have not had any failures running this configuration (although the actual run time with POEMs wasn't that long) and will continue to use it at least in this way.
Edit: GPU clock was a constant 1.23 GHz in all cases (maximum boost clock), GPU temperature was always below 70°C.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
I've got a test result: running 2 long-run NATHAN_KIDc22_full WUs for 133950 credits on a GTX660Ti. A single WU needed 36.46 ks, whereas 2 concurrent ones needed 80.88 ks - that's 40.44 ks per WU. While I didn't run this myself, it seems pretty clear that an 11% performance loss is not what we're looking for, despite the increased GPU utilization.
Edit: it's a quite fresh install of Win 7 or 8, current driver.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|