Author |
Message |
K1atOdessa
Joined: 25 Feb 08 Posts: 249 Credit: 392,702,681 RAC: 1,417,376
|
Is there something particularly different about the IBUCH_TRYP WU's? I have had a 100% error rate on these WU's over the past several days, while I have had very few issues with other types of WU's.
Seems odd that it seems to affect the one type of WU more than others. I do generally have one error every now and then historically, but the past week has been terrible. I removed all overclocking, but that hasn't helped the IBUCH_TRYP WU's. Any ideas? I'd prefer not to have to just abort all these as they do not appear to work on my system (2x 8800GT, 1x 9500GT).
Checking the WU's, it appears others have had issues with a lot of those same WU's as well. Some have been successfully finished, but there are errors on 8800/9800 GT's and GTX2xx cards (though the GTX2xx seem to fare a little better on these WU's). |
|
|
|
Now you come to mention it, four of the seven errors currently visible across my three cards are iBUCH_TRYP, though all are different sub-types (kickout, kickout1, kickin and reverse). reverse was particularly irritating - it ran for 52,500 seconds before erroring: most of the others have been relatively quick and painless.
On the other hand, I can see three successful runs too (repro, kickin and 169).
All my cards are 9800GT variants from the same manufacturer: two have both successes and failures, so no obvious explanation there. |
|
|
K1atOdessa
Joined: 25 Feb 08 Posts: 249 Credit: 392,702,681 RAC: 1,417,376
|
Earlier this afternoon I had all WU's error out (at least those running on the 9500GT). No change in my system, no new drivers, no new BOINC, no OC'ing. Temps lower than normal given we're getting into fall. Not sure what happened, but I rebooted. Now I have 3 WU's in process and no errors after about an hour (all the errors today were after a few seconds). Two IBUCH_TRYP WU's in the queue, so we'll see if it was just some weird thing with my system that gave much higher than normal error rates. Hopefully that is the case, though I don't have an explanation. |
|
|
|
Looking at a fuller log, six out of my last 13 errors have been IBUCH_TRYP, but OTTO_HERG are almost as bad: 4 out of 13. I've had errors with OTTO_HERG4, 5, 7 and 8. |
|
|
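As a rough illustration of the tally above, the sketch below turns the quoted counts (6 IBUCH_TRYP and 4 OTTO_HERG out of 13 errors) into percentage shares. The "other" bucket of 3 is simply the remainder, and the function name `error_shares` is made up for this example.

```python
def error_shares(error_counts):
    """Return each work-unit type's share of all errors, in percent."""
    total = sum(error_counts.values())
    return {
        wu_type: round(100 * count / total, 1)
        for wu_type, count in error_counts.items()
    }


# Counts taken from the post; "other" is the remainder (13 - 6 - 4).
errors = {"IBUCH_TRYP": 6, "OTTO_HERG": 4, "other": 3}

for wu_type, share in error_shares(errors).items():
    print(f"{wu_type}: {share}% of errors")
```

Note that a share of errors is not a failure rate: to say how often a given type fails, one would also need the number of tasks of that type that were attempted, which the post does not give.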
|
It's particularly annoying when it happens almost 80% of the way through a long task: compare the last two tasks for host 43404. |
|
|
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
The IBUCH_TRYP WUs are failing on lesser cards and succeeding on the GTX 260 and above.
Here's 5 that failed on multiple "below 260" cards and then succeeded on GTX 260 or higher:
http://www.gpugrid.net/workunit.php?wuid=947203
http://www.gpugrid.net/workunit.php?wuid=939689
http://www.gpugrid.net/workunit.php?wuid=936708
http://www.gpugrid.net/workunit.php?wuid=939220
http://www.gpugrid.net/workunit.php?wuid=941652
It also happens on some other WU types but not as often:
http://www.gpugrid.net/workunit.php?wuid=934230
http://www.gpugrid.net/workunit.php?wuid=939398
In general there seems to be more and more a trend toward WUs not running on sub GTX 260 GPUs. A very bad trend IMO.
|
|
|
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
Another WU that's failed on 4 machines, including a GTX 295:
http://www.gpugrid.net/workunit.php?wuid=956237
This one too was finally successfully completed by a GTX 280.
|
|
|
Andrew
Joined: 9 Dec 08 Posts: 29 Credit: 18,754,468 RAC: 0
|
Cheers for this thread guys.
I just looked at my previous 5 failures, which failed mysteriously with no OC - they failed either on HERG or TRYP tasks. Each of these tasks had another 8800GT/9800GT fail on them, before being successfully completed by a GTX 260.
Is there a way for the server to detect the graphics card model and send different types of task? I don't want to be wasting resources on tasks which are going to fail. |
|
|
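To make the question above concrete, here is a minimal sketch of what such a server-side filter could look like. It is purely hypothetical: it is not the BOINC scheduler's API, and the per-batch minimums in `MIN_CAPABILITY` are invented from the anecdotes in this thread, not from any project policy. The compute capability values per card are the usual NVIDIA figures for these models.

```python
# CUDA compute capability (major, minor) for cards mentioned in this thread.
COMPUTE_CAPABILITY = {
    "GeForce 9500 GT": (1, 1),
    "GeForce 8800 GT": (1, 1),
    "GeForce 9800 GT": (1, 1),
    "GeForce GTS 250": (1, 1),
    "GeForce GTX 260": (1, 3),
    "GeForce GTX 280": (1, 3),
    "GeForce GTX 295": (1, 3),
}

# Hypothetical minimum capability per batch type (assumption, not project policy).
MIN_CAPABILITY = {
    "IBUCH_TRYP": (1, 3),
    "TONI_HERG": (1, 3),
}
DEFAULT_MIN = (1, 1)


def may_send(batch, card):
    """Return True if a task from `batch` should be sent to `card`."""
    capability = COMPUTE_CAPABILITY.get(card)
    if capability is None:
        return False  # unknown card: be conservative and send nothing
    # Tuples compare element by element, so (1, 1) < (1, 3).
    return capability >= MIN_CAPABILITY.get(batch, DEFAULT_MIN)


print(may_send("IBUCH_TRYP", "GeForce 8800 GT"))
print(may_send("IBUCH_TRYP", "GeForce GTX 260"))
```

A table keyed on compute capability rather than on card names keeps the rule short, since many differently named cards share the same chip generation.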
K1atOdessa
Joined: 25 Feb 08 Posts: 249 Credit: 392,702,681 RAC: 1,417,376
|
Is there a way for the server to detect the graphics card model and send different types of task? I don't want to be wasting resources on tasks which are going to fail.
That would be nice. |
|
|
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
Is there a way for the server to detect the graphics card model and send different types of task? I don't want to be wasting resources on tasks which are going to fail.
That would be nice.
Now the new TONI_HERG WUs are failing on the sub GTX 260 cards :-(
|
|
|
MarkJ Volunteer moderator Volunteer tester
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0
|
Is there a way for the server to detect the graphics card model and send different types of task? I don't want to be wasting resources on tasks which are going to fail.
That would be nice.
Now the new TONI_HERG WUs are failing on the sub GTX 260 cards :-(
They are suggesting a G200-based card as the minimum these days for here, so it's quite likely that none of them will run properly unless you have a GTX2xx card.
____________
BOINC blog |
|
|
skgiven Volunteer moderator Volunteer tester
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
My GTX260 is doing fine. The other cards are not faring so well. Both Compute Capability 1.1 cards are struggling. One is a GTS250 and the other an 8800 512MB. The problems seem to have accelerated around 28th Nov for some reason. Prior to that I was getting about 25% failure rates with most failing early on. Now they are failing at any stage, often after about 10h of work!
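Task 1564214 (workunit 979457): sent 27 Nov 2009 19:22:11 UTC, reported 28 Nov 2009 14:48:35 UTC, Error while computing, run time 34,574.43 s, CPU time 1,953.66 s, claimed credit 4,503.74
Task 1573342 (workunit 985406): sent 30 Nov 2009 2:15:52 UTC, reported 1 Dec 2009 0:47:42 UTC, Error while computing, run time 39,950.28 s, CPU time 2,110.41 s, claimed credit 4,503.74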
Name D370-TONI_HERGdof5-1-40-RND4152_0
Workunit 979457
Created 27 Nov 2009 18:35:19 UTC
Sent 27 Nov 2009 19:22:11 UTC
Received 28 Nov 2009 14:48:35 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 51279
Report deadline 2 Dec 2009 19:22:11 UTC
Run time 34574.428879
CPU time 1953.663
stderr out
<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTS 250"
# Clock rate: 1.85 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [pme_fill_charges_accumulate] failed in file 'fillcharges.cu' in line 73 : unknown error.
</stderr_txt>
]]>
Validate state Invalid
Claimed credit 4503.73958333333
Granted credit 0
application version Full-atom molecular dynamics v6.71 (cuda) |
|
|
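When comparing many failed tasks like the one above, it can help to pull the card name and the failing kernel out of the stderr text automatically. The following is a minimal sketch that assumes the log lines look exactly like the ones quoted in this post; the function name `summarize_stderr` and the regular expressions are illustrative, not part of any BOINC or GPUGrid tool.

```python
import re

# stderr text copied from the failed task quoted above.
SAMPLE_STDERR = """\
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTS 250"
# Clock rate: 1.85 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [pme_fill_charges_accumulate] failed in file 'fillcharges.cu' in line 73 : unknown error.
"""

# Patterns assume the line layout shown in the sample; other versions may differ.
DEVICE_RE = re.compile(r'^# Device \d+: "(?P<name>[^"]+)"', re.MULTILINE)
KERNEL_RE = re.compile(
    r"^Cuda error: Kernel \[(?P<kernel>[^\]]+)\] failed in file "
    r"'(?P<file>[^']+)' in line (?P<line>\d+)",
    re.MULTILINE,
)


def summarize_stderr(text):
    """Extract the first device name and the first CUDA kernel error, if any."""
    device = DEVICE_RE.search(text)
    error = KERNEL_RE.search(text)
    return {
        "device": device.group("name") if device else None,
        "kernel": error.group("kernel") if error else None,
        "file": error.group("file") if error else None,
        "line": int(error.group("line")) if error else None,
    }


print(summarize_stderr(SAMPLE_STDERR))
```

Collecting these summaries across several failed tasks would show whether the same kernel fails on the same class of card, which is the pattern people in this thread are trying to spot by hand.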
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
Is there a way for the server to detect the graphics card model and send different types of task? I don't want to be wasting resources on tasks which are going to fail.
That would be nice.
Now the new TONI_HERG WUs are failing on the sub GTX 260 cards :-(
They are suggesting a G200-based card as the minimum these days for here, so it's quite likely that none of them will run properly unless you have a GTX2xx card.
Actually on the front page they say this:
Graphics card:
* (one or more)Recommended: Geforce GTX 250-275-280-285-295, Tesla10
What the heck is a GTX 250? Are they now not supporting the GTX 260?
Any information would sure be useful...
|
|
|
skgiven Volunteer moderator Volunteer tester
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
The Home page has a mistake, and they have been told about it.
The same mistake has been made many times by many people, including me.
There is a GTX260 and a GTS250, but no GTX250!
The GTX260 cards can have either 192 or 216 shaders. Usually a card with 216 shaders will have 216sp on the box.
The GTS250 does not use a G200 core. |
|
|
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
The Home page has a mistake, and they have been told about it.
The same mistake has been made many times by many people, including me.
There is a GTX260 and a GTS250, but no GTX250!
The GTX260 cards can have either 192 or 216 shaders. Usually a card with 216 shaders will have 216sp on the box.
The GTS250 does not use a G200 core.
Exactly, so why don't they bother to correct the main page? That was my point :-)
So is the GTX 260 on the approved list or not? Mine runs all WUs fine.
|
|
|
skgiven Volunteer moderator Volunteer tester
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
The GTX260 216sp is on My List!
My one works Perfectly - 100% success during last 2 weeks running 24/7.
It's a Palit GTX260 216sp, with 2 fans, and it sits at about 62 degrees C.
They have GT200 55nm cores.
People should note that the GTS 250 uses a G92 core which is 65nm.
The one I have is struggling at the minute, with a 25% failure rate. But I will keep it attached for now.
I had to stop trying to support the project with my 8800 GTS 512, which was not far off the performance of the GTS 250 - until recently that is, when most tasks started to fail.
It might go towards the deposit for a G300 card, if they ever turn up! |
|
|
GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0
|
We have corrected the main page for the 250.
gdf |
|
|
skgiven Volunteer moderator Volunteer tester
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
Thanks, |
|
|
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
The GTX260 216sp is on My List!
My one works Perfectly - 100% success during last 2 weeks running 24/7.
It's a Palit GTX260 216sp, with 2 fans, and it sits at about 62 degrees C.
They have GT200 55nm cores.
People should note that the GTS 250 uses a G92 core which is 65nm.
The one I have is struggling at the minute, with a 25% failure rate. But I will keep it attached for now.
I had to stop trying to support the project with my 8800 GTS 512, which was not far off the performance of the GTS 250 - until recently that is, when most tasks started to fail.
It might go towards the deposit for a G300 card, if they ever turn up!
My GTX 260 works perfectly too but it's not on THEIR list. My G92 based cards work for most but not all types of WUs. When the 5 NVidia cards I have stop working here I'm gone. There doesn't seem to be much of an effort to keep the project running with hardware that up until the last month ran fine. The ATI initiative is going in the wrong direction too IMO. They're only supporting a VERY few top end cards. If OpenCL is so limited why not use CAL, which is used effectively by much smaller projects? So 2 codebases would have to be supported, big deal. Smaller projects support many more. I've been transitioning toward ATI cards (the 40nm based HD 4770 for the high energy efficiency and double precision support), but they probably won't work here so I'll go/stay somewhere that they will, like MilkyWay and Collatz. Simple as that, no hard feelings...
|
|
|
skgiven Volunteer moderator Volunteer tester
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
My GTX260 will be here for a while and my GTS250 is hanging in there for now.
I would have liked to be able to add my ATI 4850 to the project, but I can't see that happening. For now it works away on Folding@home tasks.
The 8800 GTS 512 is now also on Folding@home, as it did not seem to like too many of the recent GPUGrid tasks and got to the point it was sitting idle.
When I looked into it, TONI_HERG was the main culprit for my GTS250 and the 8800, but the 8800 was also failing other tasks, IBUCH and some GIANNI, that the GTS250 was getting through. Fortunately there are other tasks now that are keeping my GTS productive. |
|
|
canardo
Joined: 11 Feb 09 Posts: 4 Credit: 8,675,472 RAC: 0
|
Just finished an IBUCH .... erev
http://www.gpugrid.net/result.php?resultid=1707999
on a 250
http://www.gpugrid.net/show_host_detail.php?hostid=26091 .
looks like you found a way around the bug. Congrats
Now Tony_Herg
Ciao
|
|
|