Message boards : Graphics cards (GPUs) : Desktop freezes

CTAPbIi
Joined: 29 Aug 09
Posts: 175
Credit: 259,509,919
RAC: 0
Level
Asn
Message 12815 - Posted: 27 Sep 2009 | 0:53:41 UTC

While I'm crunching for GPUGRID, my rig freezes. I mean the whole system, not just the BOINC client or some other app. It makes me a "bit" pissed off, because I can't work or even browse the internet.

These freezes happen periodically and last from a couple of seconds to 20-30 seconds. Even if I'm typing on the forums when a freeze happens, no letters appear; it looks like the keyboard hangs too. I'm sure this happens only when I'm crunching GPUGRID, because if I turn it off the freezes never happen.

On the other hand, I don't think it's GPUGRID's "fault", because in my understanding BOINC should suspend GPUGRID immediately when I move the mouse or type a single letter.

Of course I tried to turn the GPU off using "use GPU while computer is in use", both in BOINC and in the web profile. Nothing helps.

Another point that may help to understand the reason: I'm using "desktop effects" and awn (avant-window-navigator, a MacOS-style launcher that also switches between apps), and in the majority of cases (but not all) freezes happen when I move the mouse there to switch to something else. Pop-up windows (even Mickey-Mouse-sized ones) freeze the rig with an almost 100% guarantee. So, in general, ANY activity that requires even a small effort from the GPU (yes, even a letter appearing when I'm typing) causes a freeze.

I tried to figure out which app causes this by opening GNOME System Monitor and then trying to trigger a freeze. It looks like it's Xorg...

Here's my hardware and software:
- E6300 overclocked from 2800 MHz to 4000 MHz (100% stable)
- eVGA GTX275 overclocked to 702/1584/1260 (100% stable)
- Ubuntu 9.04 (kernel 2.6.28.15-generic)
- drivers 190.18 and 190.32 with CUDA 2.3 (freezes on both)
- BOINC 6.4.7, 6.6.36, 6.10.4 and 6.10.6 (freezes on all of them)

If you need any additional info or need me to do something, let me know. Of course, I'm NOT a Linux guru, but not a noob either :-)
____________

JackOfAll
Joined: 7 Jun 09
Posts: 40
Credit: 24,377,383
RAC: 0
Level
Pro
Message 12820 - Posted: 27 Sep 2009 | 5:23:50 UTC - in response to Message 12815.
Last modified: 27 Sep 2009 | 5:24:47 UTC

OK, forgetting about BOINC and CUDA for a minute...

Are you aware that 190.xx driver series is still considered to be a BETA release on Linux?
Are you aware that other people are reporting 'freezing' problems with 190.xx? There are several threads over on the NV Linux forum.

190.18 intermittent freezes
UseEvents and KDE 4.3 direct rendering broken with 190.16

Let's start here. Do you have 'Option "UseEvents" "true"' in the 'Device' section of your xorg.conf?
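
For anyone who would rather check this without reading the file by hand, here is a minimal C++ sketch (not part of any nVidia or Xorg tool) that scans the text of an xorg.conf for an active 'Option "UseEvents"' line inside a 'Device' section. The function name `use_events_enabled` is made up for illustration, and the parsing is deliberately simplified: it assumes one directive per line, treats '#' as the start of a comment, and only counts the quoted values "true", "1", "on" and "yes" as enabling the option.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Lower-case copy of a string, so keywords and option names can be
// compared without caring about case.
static std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper: returns true if any "Device" section in the given
// xorg.conf text has an uncommented Option "UseEvents" set to a true value.
bool use_events_enabled(const std::string& xorg_conf_text) {
    std::istringstream in(xorg_conf_text);
    std::string line;
    bool in_device = false;
    while (std::getline(in, line)) {
        // Drop everything after '#', which starts a comment.
        line = line.substr(0, line.find('#'));
        std::istringstream words(lower(line));
        std::string keyword, name, value;
        words >> keyword;
        if (keyword == "section") {
            words >> name;
            in_device = (name == "\"device\"");
        } else if (keyword == "endsection") {
            in_device = false;
        } else if (in_device && keyword == "option") {
            words >> name >> value;
            if (name == "\"useevents\"" &&
                (value == "\"true\"" || value == "\"1\"" ||
                 value == "\"on\"" || value == "\"yes\"")) {
                return true;
            }
        }
    }
    // Option absent (or set to false): treated as disabled.
    return false;
}
```

Read the file into a string and pass it in; a result of `true` means the option is present and switched on.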
____________
Crunching on Linux: Fedora 11 x86_64 / nVidia 185.18.36 driver / CUDA 2.2

CTAPbIi
Message 12822 - Posted: 27 Sep 2009 | 6:32:31 UTC - in response to Message 12820.

Sure I know that 190.xx is beta, but I never heard that it's buggy.

I inserted this string into xorg.conf, but it became even worse: freezes all the time (or better to say, one huge freeze). Sometimes the mouse starts to move, but it doesn't click at all, so I had to reset the rig with the button on the case.

There are two things that make me think BOINC is the reason. First, if I suspend the GPUGRID WUs (while Rosetta continues to work), I never get a single freeze whatever I'm doing (even watching a movie). Second, even if I deselect "use GPU...", the elapsed time continues to run and the status says "running" rather than "waiting to run", so it looks like BOINC is not intercepting this event and not suspending GPUGRID...

So there are two options: either wait for a good driver (A) or install 185.xx with CUDA 2.2 (B). Am I right in understanding what you are trying to say? :-) Or should I continue trying to be patient and, when I'm completely pissed off, just suspend the GPUGRID WUs?
____________

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 12825 - Posted: 27 Sep 2009 | 6:50:58 UTC

There is also a bug with the "use GPU while computer is in use" options that may mean that BOINC does not get out of the way correctly. You will not see a fix for that until post 6.10.9 ...

Not sure if it is applicable or not ... but there is a related bug in BOINC ...

JackOfAll
Message 12826 - Posted: 27 Sep 2009 | 8:01:21 UTC - in response to Message 12822.
Last modified: 27 Sep 2009 | 8:11:56 UTC

Sure I know that 190.xx is beta, but I never heard that it's buggy.


All Linux nVidia releases are buggy to various degrees! (I don't say that to be argumentative or derogatory, that's just the way it is.) Being an early adopter when nVidia releases new drivers is asking for trouble. Aside from some vdpau fixes or unless you need CUDA 2.3 (which you don't for GPUGRID) and especially as you are wanting to use the machine interactively, if you don't need 190.xx, go back to 185.18.36 and CUDA 2.2.

I inserted this string into xorg.conf, but it became even worse: freezes all the time (or better to say, one huge freeze). Sometimes the mouse starts to move, but it doesn't click at all, so I had to reset the rig with the button on the case.


I didn't actually want you to insert that string into the xorg.conf, just wanted to know if it is there. ;)
If it is there it will exacerbate or actually cause freezes!

There are two things that make me think BOINC is the reason. First, if I suspend the GPUGRID WUs (while Rosetta continues to work), I never get a single freeze whatever I'm doing (even watching a movie). Second, even if I deselect "use GPU...", the elapsed time continues to run and the status says "running" rather than "waiting to run", so it looks like BOINC is not intercepting this event and not suspending GPUGRID...


Right, as Paul points out (in the post above) there is a bug related to 'in use' config options in BOINC, so that's why I said let's leave BOINC and CUDA out of this for the moment.

So there are two options: either wait for a good driver (A) or install 185.xx with CUDA 2.2 (B). Am I right in understanding what you are trying to say? :-) Or should I continue trying to be patient and, when I'm completely pissed off, just suspend the GPUGRID WUs?


Firstly, I wouldn't recommend running the 190.xx series. If it wasn't for the fact that GPUGRID now requires CUDA 2.2, I'd still be running 180.60 on all my Linux machines. This has been by far the most stable nVidia driver release on Linux for years! (My experience with the first 190.xx driver release was a hard lock within 5 mins. That tells me all I need to know about it. I'll come back again and try again when it is out of BETA and is promoted to the current release.)

Secondly, the video card can only do so much. I do not run desktop effects on any of my crunchers. With a single video card you are asking for and expecting too much. The fancy GUI effects put load on the video card at the same time as it is loaded by the CUDA app. And you expect the desktop to be responsive? My advice, don't do it. I can make my desktop "stutter" with a CUDA app running, desktop effects enabled, and spinning the mouse wheel to scroll in Firefox. Compositing, hardware acceleration, OpenGL, etc. etc. at the same time as CUDA is asking too much.

Thirdly, as Paul pointed out, there is a fix for the 'in use' option not working correctly. Can you install the latest 6.10.9 BOINC release?

Summary: downgrade to nVidia driver release 185.18.36. Turn off desktop effects. Update to BOINC 6.10.9.
____________
Crunching on Linux: Fedora 11 x86_64 / nVidia 185.18.36 driver / CUDA 2.2

CTAPbIi
Message 12841 - Posted: 28 Sep 2009 | 2:21:40 UTC - in response to Message 12825.

There is also a bug with the "use GPU while computer is in use" options that may mean that BOINC does not get out of the way correctly. You will not see a fix for that until post 6.10.9 ...

Not sure if it is applicable or not ... but there is a related bug in BOINC ...

So it's not my stupidity, and there is a bug in BOINC? Hope one day it will be fixed...

Paul, where can I get 6.10.9? I tried here:
http://boinc.berkeley.edu/dl/

but there are versions for Windows and MacOS, not for Linux...

Anyway, I'm checking this link every day; as soon as it is available, I'll post the result here immediately.
____________

CTAPbIi
Message 12842 - Posted: 28 Sep 2009 | 2:39:22 UTC - in response to Message 12826.


All Linux nVidia releases are buggy to various degrees! (I don't say that to be argumentative or derogatory, that's just the way it is.) Being an early adopter when nVidia releases new drivers is asking for trouble. Aside from some vdpau fixes or unless you need CUDA 2.3 (which you don't for GPUGRID) and especially as you are wanting to use the machine interactively, if you don't need 190.xx, go back to 185.18.36 and CUDA 2.2.


It's a pity that all nVidia drivers are buggy, but at least they don't make the "normal life" of Linux users as terrible as the ATI ones do.

Should I remove CUDA completely from the computer and install CUDA 2.2 on top of the 185.xx driver, or do I not need CUDA at all? Stupid question, I'm really new to CUDA and nVidia...

I didn't actually want you to insert that string into the xorg.conf, just wanted to know if it is there. ;)
If it is there it will exacerbate or actually cause freezes!


Oops... At least I figured out how to replace xorg.conf with the previous version from the console :)

Right, as Paul points out (in the post above) there is a bug related to 'in use' config options in BOINC, so that's why I said let's leave BOINC and CUDA out of this for the moment.


Hope 6.10.9 version will be available soon and I can check if the bug is still there or not.

Firstly, I wouldn't recommend running the 190.xx series. If it wasn't for the fact that GPUGRID now requires CUDA 2.2, I'd still be running 180.60 on all my Linux machines. This has been by far the most stable nVidia driver release on Linux for years! (My experience with the first 190.xx driver release was a hard lock within 5 mins. That tells me all I need to know about it. I'll come back again and try again when it is out of BETA and is promoted to the current release.)

Secondly, the video card can only do so much. I do not run desktop effects on any of my crunchers. With a single video card you are asking for and expecting too much. The fancy GUI effects put load on the video card at the same time as it is loaded by the CUDA app. And you expect the desktop to be responsive? My advice, don't do it. I can make my desktop "stutter" with a CUDA app running, desktop effects enabled, and spinning the mouse wheel to scroll in Firefox. Compositing, hardware acceleration, OpenGL, etc. etc. at the same time as CUDA is asking too much.

Thirdly, as Paul pointed out, there is a fix for the 'in use' option not working correctly. Can you install the latest 6.10.9 BOINC release?

Summary: downgrade to nVidia driver release 185.18.36. Turn off desktop effects. Update to BOINC 6.10.9.

1. I'll install the 185.xx version tomorrow (it's a bit late now :-) )

2. This is the very last thing I'd like to do, even though I understand that the effects cause these freezes, at least for now. I've become patient (ha-ha), and if I need to do a lot on my rig I just suspend the GPUGRID WUs. I know it's not the right way to do it, but... this is life.

If for some reason this option doesn't work, I'll turn the effects off. In fact, I'm almost ready for it.

3. When 6.10.9 appears somewhere, trust me, I'll be the very first guy in line to get it :-)

And one more thing: thanks a lot for your help to a noob in GPUGRID :-)

____________

zioriga
Joined: 30 Oct 08
Posts: 46
Credit: 494,132,425
RAC: 3,861,070
Level
Gln
Message 12845 - Posted: 28 Sep 2009 | 5:28:52 UTC

@CTAPbIi, here is the address:

https://boinc.berkeley.edu/dl/?C=M;O=D

Paul D. Buck
Message 12849 - Posted: 28 Sep 2009 | 6:37:52 UTC

The fix for "in use" bug is NOT in 6.10.9 ... 6.10.9 has minor tweaks for ATI usage only (and these are not usable unless server side changes are made by the project).

6.10.9 is an interim build which is why you don't see a Linux version. For our purposes here there is no difference between 6.10.7 and 6.10.9 (.8 is a bad build).

That is why I said post 6.10.9 ... :)

CTAPbIi
Message 12855 - Posted: 28 Sep 2009 | 10:01:34 UTC - in response to Message 12849.

OK, Paul I can wait :-)
____________

JackOfAll
Message 12856 - Posted: 28 Sep 2009 | 10:06:42 UTC - in response to Message 12849.
Last modified: 28 Sep 2009 | 10:22:09 UTC

The fix for "in use" bug is NOT in 6.10.9 ...


Paul,

I'm confused. Before I open my mouth again, exactly what are you referring to, so I'm on the same page.

The original change to check for a running graphics app....


+// check whether each GPU is running a graphics app (assume yes)
+// return true if there's been a change since last time
+//
+bool COPROC_CUDA::check_running_graphics_app() {
+    int retval, j;
+    bool change = false;
+    for (j=0; j<count; j++) {
+        bool new_val = true;
+        int device, kernel_timeout;
+        retval = (*__cuDeviceGet)(&device, j);
+        if (!retval) {
+            // a zero timeout attribute means no kernel run time limit on this device
+            retval = (*__cuDeviceGetAttribute)(&kernel_timeout, CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT, device);
+            if (!retval && !kernel_timeout) {
+                new_val = false;
+            }
+        }
+        if (new_val != running_graphics_app[j]) {
+            change = true;
+        }
+        running_graphics_app[j] = new_val;
+    }
+    // report whether any device's state changed, as the header comment promises
+    return change;
+}


Or the "I: BM 6.10.4 - Cuda task doesn't suspend - The same as in BM 6.10.5" message on boinc_alpha, which talks of backing out the check for a running graphics app and reverting to the previous behaviour if it doesn't work as expected (which hasn't been done yet)?

PS. I had to chuckle this morning. It appears your posts to boinc mailing list are listened to. ;)


+    if (display_driver_version) {
+        sprintf(vers, "%d", display_driver_version);
+    } else {
+        strcpy(vers, "unknown");
+    }

____________
Crunching on Linux: Fedora 11 x86_64 / nVidia 185.18.36 driver / CUDA 2.2

JackOfAll
Message 12857 - Posted: 28 Sep 2009 | 10:15:57 UTC - in response to Message 12842.
Last modified: 28 Sep 2009 | 10:17:05 UTC

Should I remove CUDA completely from the computer and install CUDA 2.2 on top of the 185.xx driver, or do I not need CUDA at all? Stupid question, I'm really new to CUDA and nVidia...


If you use the 185.xx driver you need the CUDA 2.2 toolkit.

Oops... At least I figured out how to replace xorg.conf with the previous version from the console :)


Did your editor create a backup file, xorg.conf~, that you could move back? Alternatively, just edit it again and remove the "UseEvents" config option. It defaults to "false" if it's not there at all.

Hope 6.10.9 version will be available soon and I can check if the bug is still there or not.


I just asked Paul for clarification in the post above.

1. I'll install the 185.xx version tomorrow (it's a bit late now :-) )


OK.

2. This is the very last thing I'd like to do, even though I understand that the effects cause these freezes, at least for now. I've become patient (ha-ha), and if I need to do a lot on my rig I just suspend the GPUGRID WUs. I know it's not the right way to do it, but... this is life.

If for some reason this option doesn't work, I'll turn the effects off. In fact, I'm almost ready for it.


I understand why you'd rather not turn off desktop effects. But the thing is, they tend to expose driver bugs like nothing else can, as well as consuming GPU resources.

3. When 6.10.9 appears somewhere, trust me, I'll be the very first guy in line to get it :-)


I wasn't aware that a generic 6.10.9 hadn't been built for Linux. In any case, hold off from changing BOINC software versions at the moment. I'd like to understand exactly what Paul is talking about.
____________
Crunching on Linux: Fedora 11 x86_64 / nVidia 185.18.36 driver / CUDA 2.2

CTAPbIi
Message 12860 - Posted: 28 Sep 2009 | 11:38:37 UTC - in response to Message 12857.

If you use the 185.xx driver you need the CUDA 2.2 toolkit.


OK. I'll do it tonight and post the results.

Oops... At least I figured out how to replace xorg.conf with the previous version from the console :)


Did your editor create a backup file, xorg.conf~, that you could move back? Alternatively, just edit it again and remove the "UseEvents" config option. It defaults to "false" if it's not there at all.


It's pretty easy: I deleted xorg.conf and renamed xorg.conf~ to xorg.conf. Nothing special, really :-)

I understand why you'd rather not turn off desktop effects. But the thing is, they tend to expose driver bugs like nothing else can, as well as consuming GPU resources.


So, effects are really helpful for finding different bugs :-) If the driver change doesn't help, I'll do it. I'm pretty pissed off.

Anyway, I consider turning these effects off a temporary measure, because in my understanding they should not interfere with BOINC while

I wasn't aware that a generic 6.10.9 hadn't been built for Linux. In any case, hold off from changing BOINC software versions at the moment. I'd like to understand exactly what Paul is talking about.

OK.
____________

JackOfAll
Message 12862 - Posted: 28 Sep 2009 | 12:02:58 UTC - in response to Message 12860.

So, effects are really helpful for finding different bugs :-) If the driver change doesn't help, I'll do it. I'm pretty pissed off.

Anyway, I consider turning these effects off a temporary measure, because in my understanding they should not interfere with BOINC while


Desktop effects use hardware acceleration. The same hardware that's running the CUDA code. You're asking the same GPU to compute spinning the desktop cube at the same time as it is crunching numbers with CUDA.

It's great having OpenGL hardware acceleration for desktop effects, offloading the decoding to hardware with vdpau when playing a movie, and being able to crunch with CUDA. But ask a GPU to do all 3 at the same time and it won't do any of them particularly well! I'm not saying don't do it. What I'm trying to say is that if you want to crunch, then dedicate the resource (as much as you can) to crunching. Being able to use a consumer grade graphics card for CUDA crunching as well as displaying your desktop is good to have, but there is a reason that nVidia sell dedicated CUDA hardware solutions that do not have graphics output.

____________
Crunching on Linux: Fedora 11 x86_64 / nVidia 185.18.36 driver / CUDA 2.2

Paul D. Buck
Message 12873 - Posted: 28 Sep 2009 | 19:31:28 UTC
Last modified: 28 Sep 2009 | 19:32:09 UTC

First, 6.10.10 for windows is out.

The following two changes are the most relevant and specifically the second one is the one I thought might be addressing the issue raised. Where BOINC does not properly suspend and resume based on preferences. There is another one I saw but even going through the list again manually I seem to be missing it ... sigh ... {edit spelling}

Changeset 19199:

client/scheduler/web: add per-project preferences for whether
to accept CPU, NVIDIA and ATI jobs.
These prefs are shown only where relevant:
e.g., only for processor types for which the project has app versions,
and if it has versions for only one type, no pref is shown.

These prefs affect both client and scheduler.
The client won't ask for work for a device blocked by prefs,
and the scheduler won't send it.

This replaces earlier optional project-specific prefs for
"no CPU jobs" and "no GPU jobs".
(However, these prefs continue to be honored on the server side).

- client: if NVIDIA driver is unknown, say that rather than 0


Changeset 19198:
client: fix bug in CPU prefs enforcement:
enforce "suspend if no recent input" and "exclusive apps"
only if overall mode is RUN_MODE_AUTO (run according to prefs)
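
As a rough illustration of what Changeset 19198 describes, here is a minimal C++ sketch of the decision. It is not the BOINC source: only the name RUN_MODE_AUTO comes from the changeset text, while the other two run modes, the `Prefs` and `HostState` structs and the function `should_suspend` are assumptions made up for this example.

```cpp
// Overall run mode chosen by the user. RUN_MODE_AUTO is named in the
// changeset; the other two values are assumed here for illustration.
enum RunMode { RUN_MODE_ALWAYS, RUN_MODE_AUTO, RUN_MODE_NEVER };

// Hypothetical subset of the user's preferences.
struct Prefs {
    bool suspend_if_no_recent_input;  // suspend when the machine looks unattended
};

// Hypothetical snapshot of what the client observes on the host.
struct HostState {
    bool recent_user_input;      // mouse or keyboard activity was seen recently
    bool exclusive_app_running;  // an app from the "exclusive apps" list is running
};

// Returns true if computation should be suspended.
bool should_suspend(RunMode mode, const Prefs& prefs, const HostState& host) {
    if (mode == RUN_MODE_NEVER) return true;    // user said: never run
    if (mode == RUN_MODE_ALWAYS) return false;  // user said: always run, ignore prefs

    // RUN_MODE_AUTO: run according to prefs, so only here are the
    // "no recent input" and "exclusive apps" rules enforced.
    if (prefs.suspend_if_no_recent_input && !host.recent_user_input) return true;
    if (host.exclusive_app_running) return true;
    return false;
}
```

The point of the fix, as far as the changeset note goes, is the early return for the non-automatic modes: the two preference checks are never reached unless the overall mode is RUN_MODE_AUTO.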


Jack,

As to that code segment and the backing out of the "fix" that is not something I have been referring to anywhere. There *IS* a change in that neck of the woods but it is a change in the logic and involves two flags and is Changeset 19137

Or I have completely lost the whole thread ...

JackOfAll
Message 12874 - Posted: 28 Sep 2009 | 21:05:36 UTC - in response to Message 12873.


Thanks for the reply.

As to that code segment and the backing out of the "fix" that is not something I have been referring to anywhere.


Well, from the limited testing I've done - check_running_graphics_app(), which is being used for 'Use GPU while computer in use' is not having the desired effect. I need to do some more testing before reporting this.
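
To make it easier to reason about that function away from a real driver, here is a minimal, self-contained C++ sketch of the same change-detection loop. It is an illustration only: `update_running_graphics` and the `TimeoutQuery` callback are hypothetical names standing in for the BOINC member function and the driver calls, so the logic can be exercised with a fake query.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the driver query used in the patch. It returns 0
// on success and writes a nonzero value to kernel_timeout when the device
// has a kernel run time limit.
using TimeoutQuery = std::function<int(int device_index, int& kernel_timeout)>;

// Updates the per-device "running a graphics app" flags and returns true if
// any flag changed since the previous call.
bool update_running_graphics(std::vector<bool>& running_graphics_app,
                             const TimeoutQuery& query) {
    bool change = false;
    for (std::size_t j = 0; j < running_graphics_app.size(); ++j) {
        // Assume a graphics app is running unless the query proves otherwise.
        bool new_val = true;
        int kernel_timeout = 0;
        if (query(static_cast<int>(j), kernel_timeout) == 0 && kernel_timeout == 0) {
            new_val = false;
        }
        if (new_val != running_graphics_app[j]) {
            change = true;
        }
        running_graphics_app[j] = new_val;
    }
    return change;
}
```

One thing the sketch makes visible: the flag only follows the timeout attribute, which says whether a run time limit applies to kernels on that device. If that value stays the same while the user is typing or moving the mouse, the function reports no change, which might be part of why the option does not seem to react to activity. That is a guess, not a tested conclusion.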

Or I have completely lost the whole thread ...


No, I don't think so. Too many changes in a short time period.
____________
Crunching on Linux: Fedora 11 x86_64 / nVidia 185.18.36 driver / CUDA 2.2

CTAPbIi
Message 12875 - Posted: 28 Sep 2009 | 21:41:38 UTC - in response to Message 12862.

I installed 185.36.128 and CUDA Toolkit 2.2, but the freezes are still there.

Then I turned off the effects. As of now there are no more freezes, but I still think it's like hiding your head in the sand.
"The deeper your head is in the sand, the more unprotected your ass"
(c) Ozzi's proverb :-)

Hope Paul will manage to fix this bug one day.

BTW Paul, there is a 6.10.10 version for Windows only. Hopefully one day a Linux one will also be available.
____________

CTAPbIi
Message 12882 - Posted: 29 Sep 2009 | 2:35:05 UTC - in response to Message 12875.

and for MacOS also, but not for linux. I can wait
____________

Paul D. Buck
Message 12891 - Posted: 29 Sep 2009 | 8:46:51 UTC - in response to Message 12875.

I installed 185.36.128 and CUDA Toolkit 2.2, but the freezes are still there.

Then I turned off the effects. As of now there are no more freezes, but I still think it's like hiding your head in the sand.

Yeah, get a whiz-bang system and then you have to shut off everything to make BOINC work... BOINC is supposed to be working in the background as idle and not interfere with anything ... then again, what do I know ... :)

"The deeper your head is in the sand, the more unprotected your ass"
(c) Ozzi's proverb :-)

I had never heard that one ... or at least I cannot recall hearing it ...

Hope Paul will manage to fix this bug one day.

Sadly guys I can fix nothing... and for the most part my reports are like whistling on the wind ... the good news and bad news is that it looks like my disability is on the upswing again so there will be much rejoicing as I will not be able to do as much ... of course I get the benefit of feeling like I am drunk all the time free of charge ...

BTW Paul, there is a 6.10.10 version for Windows only. Hopefully one day a Linux one will also be available.

I mentioned that some place here ... in addition to as I noted above, my attention span is also shot so that is why I am not sure I was tracking this thread well ...

Anyway, based on the notes, aside from the UI changes (that may or may not be working) there is nothing in .10 over .7 for us here ... the main development issues in that release are on ATI cards and how they interact with the server. If you want to follow there are notes on Collatz as to what is going on (though they may have referred out to the BOINC board).

In that I am not all that pressed and my set-up is working (with the exception to the FIFO bug, still unacknowledged as far as I know) 6.10.7 would be what I would recommend. The FIFO bug will only get you if you run TWO GPU projects at the same time ... and then the most annoying thing is that it kills the resource share allocations... run time will be biased towards (as near as I can tell) the project that will download the most work for your queue size ...

In my case I am running with 0.1 extra work and it seems to respect shares, sort of ... if MW goes off the air I get several hours of Collatz work instead of only one or two tasks ... I run those off and then I get, if they are back MW work till it goes off the air again ...

JackOfAll
Message 12894 - Posted: 29 Sep 2009 | 9:45:37 UTC - in response to Message 12891.
Last modified: 29 Sep 2009 | 9:49:01 UTC

I installed 185.36.128 and CUDA Toolkit 2.2, but the freezes are still there.

Then I turned off the effects. As of now there are no more freezes, but I still think it's like hiding your head in the sand.

Yeah, get a whiz-bang system and then you have to shut off everything to make BOINC work... BOINC is supposed to be working in the background as idle and not interfere with anything ... then again, what do I know ... :)


Not a lot about CUDA apps? ;)

The CPU component of the GPU task may be idling but the GPU is not idling when a CUDA kernel is executing on it. Desktop effects (compositing) also take resource. (Far more than they should for the sake of eye candy.) Hit the GPU with 'texture from pixmap' while executing CUDA kernels and expect the desktop to stutter.

As an aside, if you read the CUDA release notes, they tell you that individual CUDA kernel launches are limited to a 5 sec run time restriction when a display is attached to the GPU. For this reason it is recommended that CUDA is run on a GPU that is NOT attached to an X display. If you choose to ignore the recommendation, I'd suggest doing everything possible not to add extra load to a GPU while it's running CUDA and connected to a display, like turning off desktop effects.
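
To put rough numbers on that limit, here is a minimal C++ sketch of the arithmetic, assuming (hypothetically) that an application knows roughly how long one unit of GPU work takes and is free to choose how many units go into each kernel launch. Neither function comes from CUDA, BOINC or GPUGRID; the names and the time budget are made up for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Largest number of work units per launch such that
// units * ms_per_unit <= budget_ms, but never fewer than one unit.
std::int64_t units_per_launch(double ms_per_unit, double budget_ms) {
    if (ms_per_unit <= 0.0) return 1;  // no usable estimate: be conservative
    const auto units = static_cast<std::int64_t>(budget_ms / ms_per_unit);
    return std::max<std::int64_t>(units, 1);
}

// Number of launches needed to cover total_units with the given chunk size
// (integer division rounded up).
std::int64_t launches_needed(std::int64_t total_units, std::int64_t chunk) {
    return (total_units + chunk - 1) / chunk;
}
```

For example, with an assumed 2 ms per unit and a 50 ms budget per launch, 25 units fit into one launch and 1000 units take 40 launches. The 5 sec figure from the release notes is only the hard ceiling; a desktop that is supposed to feel responsive probably needs a far smaller budget than that, since anything the display wants from the GPU generally has to wait while a kernel is running.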
____________
Crunching on Linux: Fedora 11 x86_64 / nVidia 185.18.36 driver / CUDA 2.2

Paul D. Buck
Message 12916 - Posted: 29 Sep 2009 | 18:21:01 UTC - in response to Message 12894.

I installed 185.36.128 and CUDA Toolkit 2.2, but the freezes are still there.

Then I turned off the effects. As of now there are no more freezes, but I still think it's like hiding your head in the sand.

Yeah, get a whiz-bang system and then you have to shut off everything to make BOINC work... BOINC is supposed to be working in the background as idle and not interfere with anything ... then again, what do I know ... :)


Not a lot about CUDA apps? ;)

Yep, never said I did... but I do know a lot about the conceptual idea of how BOINC should operate... has operated, and does operate...

The CPU component of the GPU task may be idling but the GPU is not idling when a CUDA kernel is executing on it. Desktop effects (compositing) also take resource. (Far more than they should for the sake of eye candy.) Hit the GPU with 'texture from pixmap' while executing CUDA kernels and expect the desktop to stutter.

This I know

As an aside, if you read the CUDA release notes, they tell you that individual CUDA kernel launches are limited to a 5 sec run time restriction when a display is attached to the GPU. For this reason it is recommended that CUDA is run on a GPU that is NOT attached to an X display. If you choose to ignore the recommendation, I'd suggest doing everything possible not to add extra load to a GPU while it's running CUDA and connected to a display, like turning off desktop effects.

On the other hand, though some will say it is not BOINC's fault but the project's ... there is a wide variance with the way BOINC is operating with the various projects in that for most I have no issues at all and see significant effects with one, maybe two ... my point being that as usual the UCB team is abdicating the responsibility to help the projects with the notion that this kind of thing is a project responsibility ...

Maybe so, but that only means that we now have 50 teams that have to figure this stuff out on their own instead of one ...

JackOfAll
Message 12921 - Posted: 29 Sep 2009 | 19:23:28 UTC - in response to Message 12916.

As an aside, if you read the CUDA release notes, they tell you that individual CUDA kernel launches are limited to a 5 sec run time restriction when a display is attached to the GPU. For this reason it is recommended that CUDA is run on a GPU that is NOT attached to an X display. If you choose to ignore the recommendation, I'd suggest doing everything possible not to add extra load to a GPU while it's running CUDA and connected to a display, like turning off desktop effects.


On the other hand, though some will say it is not BOINC's fault but the project's ... there is a wide variance with the way BOINC is operating with the various projects in that for most I have no issues at all and see significant effects with one, maybe two ... my point being that as usual the UCB team is abdicating the responsibility to help the projects with the notion that this kind of thing is a project responsibility ...

Maybe so, but that only means that we now have 50 teams that have to figure this stuff out on their own instead of one ...


I understand what you are saying, but at the end of the day the boinc core is a glorified launcher, (the middleware, if you like), and the projects are responsible for the diverse clients. The UCB team can't really be responsible for the projects, and their client software.

The one size fits all approach does not work so well, (eg. FIFO GPU scheduling), and to be frank individual projects are going to want to see optimizations that suit their own purposes rather than generalizations. Sounds like it is a no win situation to me.

CUDA (and GPGPU in general) is such a young technology; how many people really know it inside out and backwards? I mean, at least the boinc core can set the process priorities for the CPU task clients. That's pretty tricky to do with the GPU side of the GPU tasks. ;) They're either on or off. Unlike the multitude of options that are built into e.g. the Linux kernel CPU scheduler, you just don't have that functionality available on the GPU. (And the jury is still out on whether the 'GPU in use' config property and underlying code is actually doing what the developer expects that it is doing. I'm not 100% convinced it is but was too busy today to spend more time testing this.)

Anyway, I hope I made the point I was trying to make. If people would change their expectations, and think of running CUDA on a GPU that's doing something else, (like driving a display via X) as a less than optimal way of doing things, that would go some way towards it. (If you want optimal on a consumer grade card, forget about whether desktop effects are switched on or off and don't use it to drive a display at all!) Giving it a chance, by not using other hardware acceleration functionality (desktop effects) at the same time as using CUDA computing capability, seems obvious to me. Lobbing bricks in the general direction of nVidia drivers and BOINC because the desktop stutters when they have no understanding of how their hardware actually works, or what is a reasonable expectation, just shows a lack of education. (I expect to get bashed for that last sentence, and I'm not trying to be insulting, but it does seem that some peoples expectations are set way beyond what their hardware is actually capable of. YMMV.)

____________
Crunching on Linux: Fedora 11 x86_64 / nVidia 185.18.36 driver / CUDA 2.2

zpm
Joined: 2 Mar 09
Posts: 159
Credit: 13,639,818
RAC: 0
Message 12922 - Posted: 29 Sep 2009 | 19:52:15 UTC - in response to Message 12921.
Last modified: 29 Sep 2009 | 19:53:00 UTC

On my system I have noticed that when running 1 or 2 apps at high priority in Windows Task Manager, I get a freeze-up for 1-2 secs every so often, but I live with it... crunch time is decreased by 5-10% depending on the app and WU.

CTAPbIi
Joined: 29 Aug 09
Posts: 175
Credit: 259,509,919
RAC: 0
Message 12927 - Posted: 29 Sep 2009 | 23:45:28 UTC - in response to Message 12921.

OK, so it's clear that there will be no quick fix for "use GPU..." in the near future, right?

In fact, this is the 2nd day I'm surviving w/o desktop effects and you know what? I'm still alive :-) Sure, it's less functional, but there are NO freezes, which pissed me off so much. So, thx JackOfAll and Paul for your help.

May I ask a couple of questions while such people are around? In Q4 this year nvidia will present the GT300 cards. So, here are my questions:

- have you heard anything about the architecture of the future cards? I mean, does it make sense to upgrade from a GTX275 to the new ones?

- will the BOINC app for GPUGRID work with SLI? I mean, I'd like to get 2 cards and I'm not sure whether I should use the SLI bridge or not (as with Folding, where I must NOT connect the cards with the bridge).

- I'd like to build a new rig based on socket 1366 in the next couple of weeks. Now I've got an E6300 (Wolfdale-2M, 2800 stock, OC'ed to 4000). Will this upgrade gain me anything in terms of GPUGRID? I mean, does GPUGRID suffer now from a lack of CPU? Right now I'm also crunching for Rosetta, but the nice value for GPUGRID is 10 and for Rosetta it is 19.
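
For reference, a back-of-envelope Python sketch of what those nice values mean if the two apps ever compete for the same core. It assumes the Linux CFS scheduler and uses the common approximation that each nice step changes a task's weight by about 1.25x; the kernel's real weight table is rounded a bit differently, and the function names are just for illustration.

```python
from typing import Sequence


def cfs_weight(nice: int) -> float:
    """Approximate CFS load weight: 1024 at nice 0, about 1.25x less per nice step."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be between -20 and 19")
    return 1024.0 / (1.25 ** nice)


def cpu_share(nice: int, competitors: Sequence[int]) -> float:
    """Fraction of one core a task at `nice` gets against always-runnable competitors."""
    mine = cfs_weight(nice)
    return mine / (mine + sum(cfs_weight(n) for n in competitors))


# Nice values taken from the post: GPUGRID at 10, Rosetta at 19.
share = cpu_share(10, [19])
print(f"nice 10 vs nice 19 on one core: {share:.0%}")
```

Under those assumptions the nice 10 process gets roughly 88% of a contested core.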

BTW, Paul, regarding this proverb: I've got a friend from Perth and it is one of his favorite proverbs. I'm not sure if it's widely used all over Australia, but that's the story behind it :-)
____________

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 12930 - Posted: 30 Sep 2009 | 1:27:59 UTC

@Jack, I sure wish I had enough left to answer you ... I understand your point, but, as middleware BOINC has more responsibilities when more than one project is affected or needs a feature. Then, that is exactly where middleware is supposed to step up to avoid reinventing the wheel ...

@CTA

Doing anything with the CPU does nothing really for GPUGRID, but would help Rosetta, which is not all bad. Getting faster GPUs or more of them is the key to doing more here. The strategy with new generations is the question of when to adopt. So, the GTX300 comes out ... that drives down the cost of GTX2xy cards, which means that I can buy a couple more for the same cost as one 300 card. Since this adds to my total capacity, I can do more.

6+ months to a year from now, if and when a card dies, the 300 has been supplemented with a 325 or whatever ... so it is lower in cost (in theory) and I can get one for less and then get that significant boost. If you do have the cash to invest, is it better to buy at the top or twice as many in the middle? ... arguments both ways ... and I have done both... :)

If the 300 is twice as fast for the same power footprint that argues for the extra up-front expense as the power cost to run it is less... it is all very dependent on exact details...

As to SLI, with the later drivers you are supposed to be able, with BOINC, to use SLI ... I am not interested in the extra gaming power so I have not bothered to try it, not broken so I am not trying to fix it ... :)

JackOfAll
Joined: 7 Jun 09
Posts: 40
Credit: 24,377,383
RAC: 0
Message 12935 - Posted: 30 Sep 2009 | 11:36:28 UTC - in response to Message 12927.

OK, so it's clear that there will be no quick fix for "use GPU..." in the near future, right?


I'm kind of limited in what I can and cannot say, having signed a rather draconian NDA. We use CUDA commercially in a software product. We actually outsource our software development now, so I've asked someone who I consider to be a CUDA expert to take a look at what that code currently does and whether there is a better way of achieving the objective. When I get a response I'll pass it to UCB.

In fact, the 2nd day I'm surviving w/o desktop effects and you know what? I'm still alive :-) sure it's less functional, but there are NO freezes which made me pissed off so much. So, thx JackOfAll and Paul for your help.


Glad you can live without the 'bling' for the moment.

May I asked couple of questions while such people are around? in Q4 this year nvidia will present GT300 cards. so, here are actually two questions:

- have you heard smth about architecture of future cards? I mean - if there is a sense to upgrade from GTX275 to the new ones


Details are still a little thin on the ground and, depending on who you believe, we might not even see the new architecture cards until next year. Right now, IMHO, is a bad time to be buying new nVidia cards. I'd advocate holding off for a couple of months. (Especially with the high end cards, > GTX275.)

- will BOINC app for GPUGRID work on SLI? I mean I'd like to get 2 cards and I'm not sure if I should SLI bridge or not (like in Folding and i must NOT connect cards with the bridge);


Paul answered this above. The 190.xx driver series and CUDA 2.3 allow you to access individual GPUs (for CUDA purposes) whilst the cards are in SLI mode. Not tried it personally, but I know it does work.


____________
Crunching on Linux: Fedora 11 x86_64 / nVidia 185.18.36 driver / CUDA 2.2

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 755,434,080
RAC: 210,052
Message 13035 - Posted: 6 Oct 2009 | 4:42:58 UTC
Last modified: 6 Oct 2009 | 5:21:27 UTC

I've seen something similar to the display freeze problem under the 64-bit Windows versions of BOINC (at least 6.10.3), but the freeze is permanent enough that I've been unable to check whether all the other software freezes as well. It seems to occur only when running both a GPUGRID workunit and a CPU workunit from some other project, and only if the CPU workunit has graphics that are big enough to fill the screen. I'm not familiar with the Linux terms for turning off the screensaver that comes with recent BOINC versions, but I'd suggest trying that if you know how.

I've also seen something similar to the FIFO problem, on the same machine, but only when it downloads two CPU workunits from the same other project and the combined estimated time for the two is greater than that project's delay to the deadline, so they can't run on the same CPU core and still finish on time. 6.10.3 seems to think that workunits that have gone into high priority mode do not need to obey the limits on how much memory BOINC is allowed to use.
Also, simply installing 6.10.3 seems to increase the estimated times for running many types of workunits, especially those from The Lattice Project. There's little sign that those workunits often even need as much time as the initial estimate, so 6.10.3 may eventually adjust its database to give more accurate estimates.
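
The two-workunit deadline case above comes down to simple arithmetic, sketched here in Python. The hours are invented and the function is only an illustration of the check, not BOINC's actual scheduler code.

```python
from typing import Sequence


def fits_on_one_core(estimates_h: Sequence[float], hours_to_deadline: float) -> bool:
    """True if the tasks, run back to back on one core, all finish by a shared deadline."""
    return sum(estimates_h) <= hours_to_deadline


# Hypothetical numbers: two 30 h workunits with 48 h left must use separate cores,
# while two 20 h workunits could share one core and still finish in time.
print(fits_on_one_core([30.0, 30.0], 48.0))
print(fits_on_one_core([20.0, 20.0], 48.0))
```

If the initial estimates are inflated, the first case triggers more often than it needs to, which would match what I'm seeing.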

Another of my machines, slower and without a GPU capable of running GPUGRID workunits, and still running BOINC 6.6.36, had two similar The Lattice Project workunits arrive with an initial runtime low enough to allow running them both on the same CPU core by one day before the deadline, and therefore 64-bit Windows BOINC 6.6.36 saw no reason to require running them on separate CPU cores. However, it hasn't found any reason to download any CPU workunits from any of the BOINC projects it's connected to since then, even though one of its CPU cores is now idle.

So far, both of these problems appear to apply when using a 190.* driver, with no clear evidence on whether they also apply when using a 185.* driver, so I'd suggest looking for posts on whether there's a 185.* driver that works with recent GPUGRID workunits under Linux.

Also, both problems appear to apply when using a 9800 GPU card but likely not when using a GTX 260 or higher GPU card, so I'd suggest mentioning which GPU cards are involved in any further discussion of these problems. I'd also suggest mentioning whether your machine has enough GPUs to make SLI worthwhile. None of mine have more than one, and therefore they don't seem to be allowed to even use SLI.

In case it makes a difference, none of my machines have a second graphics card to move its monitor to.
