
Message boards : Graphics cards (GPUs) : Mysterious effects with 6.3.14

Kokomiko
Message 2933 - Posted: 9 Oct 2008 | 22:25:28 UTC

I had some mysterious effects under 6.3.14 with tasks running in parallel. I'm not sure whether this is a problem with 6.3.14 itself.

LHC-WUs can't run with PS3Grid, PS3Grid stops if LHC is working.

PS3Grid can't run with Magnetism, compute error for the PS3Grid-WU.

Has anybody else observed similar side effects?

naja002
Message 2936 - Posted: 9 Oct 2008 | 23:01:26 UTC - in response to Message 2933.

Here's mine:

Need 6.3.14 Multi-GPU Help.....

Short version is everything seems to be running fine on my single GPU rigs, but the multi-gpu rig has issues....


HTH





Venturini Dario[VENETO]
Message 3014 - Posted: 13 Oct 2008 | 8:21:11 UTC - in response to Message 2933.

I don't know if these things are connected, but I had some problems with LHC after I started running PS3Grid with version 6.3.14.

All of my LHC WUs completed successfully, but ALL were marked invalid. I'll investigate further...

DoctorNow
Message 3015 - Posted: 13 Oct 2008 | 8:34:24 UTC - in response to Message 2933.
Last modified: 13 Oct 2008 | 8:38:44 UTC

Kokomiko wrote: "LHC-WUs can't run with PS3Grid, PS3Grid stops if LHC is working."

I discovered that this depends on how many LHC WUs are running.
On my X2, the PS3Grid task only stopped when both cores were crunching LHC WUs. If only one core was busy with LHC, PS3Grid kept running.
(I encountered this with 6.3.10; maybe 6.3.14 handles it differently.)


Venturini Dario[VENETO] wrote: "All of my LHC-WUs completed successfully but ALL resulted invalid. I'll investigate further..."

The LHC ones I finished in combination with PS3Grid all validated correctly. Maybe it also depends on the client. Are you using 6.3.14?
____________
Member of BOINC@Heidelberg and ATA!

Stefan Ledwina
Message 3016 - Posted: 13 Oct 2008 | 9:04:02 UTC

I don't think the failing LHC WUs have anything to do with PS3GRID/GPUGRID, but rather with the 6.x.x clients.

From the LHC News -

05.09.2008 10:40 BST -
.... Also the SixTrack application and the newest version of the BOINC client (6.2.X) don't seem to like each other. The developers are looking into this problem and it will be fixed as soon as possible.


____________

pixelicious.at - my little photoblog

Krunchin-Keith [USA]
Message 3026 - Posted: 13 Oct 2008 | 15:59:48 UTC

I have found many small glitches in the scheduler and have been reporting these to the developer. You need to turn on debug to see some of what is going on. It may not affect everyone running here. I'm testing some on the GPU alpha project, where we use less than 1 CPU. It's not a big problem, but the client does things like start 2 CPU tasks + 1 CPU/CUDA task on a 2-CPU host. Other times, when a CUDA task ends, it will not start another until maybe 10 minutes later. It does, however, reserve the GPU; it just doesn't start any work. I spent 5-6 hours debugging yesterday to find these problems.

We are close to a fully, properly behaving client, but not quite there.

So please, everyone, be patient. It will happen soon. We will get a better client, and you will not need to use ncpus+1 to use all CPUs and CUDA processing.

Another thing I found.

I cannot run MalariaControl's optimizer app at the same time as a CUDA task. It stopped the CUDA task dead when I had 2 Malaria tasks running along with a CPU/CUDA task. That app runs as a wrapper app running some Java code, and this seems to be the problem. Once I stopped running it, my GPU ms/step improved. For example, test tasks that are supposed to run 8 minutes took 8 CPU minutes, spread out over 1, 3, or 4-plus wall-clock hours. The rest of the time they were held up, sitting idle. BOINC would show them as running even though the CPU time did not increment. If only one Malaria task was running, CUDA would run some, but not quite at full speed. I'm not sure if it affects tasks running here now. I have to run some more, now with Malaria turned off, to see whether it does or not.

You can turn the Malaria optimizer app off in your preferences at MalariaControl so you don't get work for it. If anyone else is running it alongside CUDA: take a sample of your last 5 to 10 CUDA tasks, most importantly the ms/step recorded in the task detail. Then stop the optimizer and, when none is left on your computer, let more CUDA tasks run. You need to run at least 3 to get a sample. See if your ms/step improves.
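The before/after comparison described above could be done by hand or with a few lines of Python. This is only a minimal sketch: the function name `compare_ms_per_step` and the sample numbers are hypothetical placeholders, not measurements from this thread, and the ms/step values are assumed to be copied manually from each task's detail page.

```python
from statistics import mean


def compare_ms_per_step(before, after, min_samples=3):
    """Compare mean ms/step of CUDA tasks before and after a change.

    `before` and `after` are lists of ms/step values copied by hand from
    the task details. Returns (mean_before, mean_after, change_percent);
    a negative change means the tasks got faster.
    """
    if len(before) < min_samples or len(after) < min_samples:
        raise ValueError(f"need at least {min_samples} samples in each group")
    mean_before = mean(before)
    mean_after = mean(after)
    change_percent = (mean_after - mean_before) / mean_before * 100.0
    return mean_before, mean_after, change_percent


if __name__ == "__main__":
    # Placeholder values for illustration only, not real results.
    with_optimizer = [60.0, 62.0, 58.0]
    without_optimizer = [45.0, 44.0, 46.0]
    b, a, pct = compare_ms_per_step(with_optimizer, without_optimizer)
    print(f"before: {b:.1f} ms/step, after: {a:.1f} ms/step, change: {pct:+.1f}%")
```

With only 3 to 10 samples per group this gives a rough indication, not a statistically solid result, so a small difference should not be over-interpreted.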

ayQue
Message 3027 - Posted: 13 Oct 2008 | 16:35:31 UTC - in response to Message 3026.

Krunchin-Keith [USA] wrote:
"I have found many small glitches in the scheduler and have been reporting these to the developer. You need to turn on debug to see some of what is going on. It may not affect everyone running here. I'm testing some on the GPU alpha project, where we use less than 1 CPU. It's not a big problem, but the client does things like start 2 CPU tasks + 1 CPU/CUDA task on a 2-CPU host. Other times, when a CUDA task ends, it will not start another until maybe 10 minutes later. It does, however, reserve the GPU; it just doesn't start any work. I spent 5-6 hours debugging yesterday to find these problems.

"We are close to a fully, properly behaving client, but not quite there.

"So please, everyone, be patient. It will happen soon. We will get a better client, and you will not need to use ncpus+1 to use all CPUs and CUDA processing."


..that sounds great!! :) Thank you in advance for losing nerves or something like that while debugging for us... :)

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 3030 - Posted: 13 Oct 2008 | 18:50:14 UTC

Concerning the Malaria - GPU-Grid interaction:

Generally, there are some problems with priorities and task scheduling in Windows. This is not to say Linux is necessarily better; I just have more experience with Windows. An example: if you run Matlab at "lowest priority" on all cores, there are plenty of programs that cannot get CPU time any more, even if they run at "normal". It could be that the Malaria app is considerably more aggressive than its task priority suggests.

MrS
____________
Scanning for our furry friends since Jan 2002

Krunchin-Keith [USA]
Message 3035 - Posted: 13 Oct 2008 | 20:26:15 UTC

I have found out that it will be possible for the client to run 1 more task than there are CPUs, even without ncpus+1 (which should not be used anyway).

This case will occur when there are two CPU tasks in deadline trouble: the client will run those in addition to a CPU/CUDA task in order to keep the GPU in full use too.

Everyone should read this new wiki note, which explains the new behavior of 6.3 clients.

The new behavior will be to make maximum use of CPUs and GPUs, even if it sometimes means using more CPUs than you physically have or have allocated.

So don't panic when your dual core appears to become a tri core.
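To make the "dual core becomes a tri core" case concrete, here is a toy Python model of the rule as described in this post. It is not the BOINC client's actual scheduling code: the function name `tasks_to_run` and its parameters are made up for illustration, and it assumes there is always enough CPU work queued to fill every slot and that the CPU/CUDA task normally occupies one CPU slot.

```python
def tasks_to_run(ncpus, urgent_cpu_tasks, gpu_task_ready):
    """Toy model: how many CPU-only and CPU/CUDA tasks run at once.

    ncpus            -- number of CPUs the client may use
    urgent_cpu_tasks -- CPU-only tasks currently in deadline trouble
    gpu_task_ready   -- True if a CPU/CUDA task is available to run
    """
    if not gpu_task_ready:
        # No GPU work: simply fill every CPU with a CPU-only task.
        return {"cpu_only": ncpus, "cpu_cuda": 0}
    # Assumption: normally the CPU/CUDA task takes one of the CPU slots.
    normal_cpu_only = max(ncpus - 1, 0)
    # Tasks in deadline trouble are all kept running (at most one per CPU),
    # and the CPU/CUDA task still runs so the GPU stays busy.
    cpu_only = min(ncpus, max(normal_cpu_only, urgent_cpu_tasks))
    return {"cpu_only": cpu_only, "cpu_cuda": 1}


def total_running(plan):
    """Total number of tasks shown as running for a given plan."""
    return plan["cpu_only"] + plan["cpu_cuda"]


if __name__ == "__main__":
    # Dual core, nothing urgent: 1 CPU-only task + 1 CPU/CUDA task.
    print(total_running(tasks_to_run(2, 0, True)))
    # Dual core, two CPU tasks in deadline trouble: both run plus the
    # CPU/CUDA task, so the host appears to run one task more than it has CPUs.
    print(total_running(tasks_to_run(2, 2, True)))
```

In this model the extra task only shows up when every CPU is held by a task in deadline trouble, which matches the case described above.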
