Message boards : Graphics cards (GPUs) : Cuda Error

The Gas Giant
Message 3058 - Posted: 15 Oct 2008 | 17:22:11 UTC

My last returned WU was invalid. The stderr output file reports

"Cuda error: Kernel [kick_drift_kernel] failed in file 'step.cu' in line 46".

The messages in BOINC were

16/10/2008 12:09:10 AM|PS3GRID|Computation for task Tq15782-GPUTEST3-5-10-acemd_1 finished
16/10/2008 12:09:10 AM|PS3GRID|Output file Tq15782-GPUTEST3-5-10-acemd_1_1 for task Tq15782-GPUTEST3-5-10-acemd_1 absent
16/10/2008 12:09:10 AM|PS3GRID|Output file Tq15782-GPUTEST3-5-10-acemd_1_2 for task Tq15782-GPUTEST3-5-10-acemd_1 absent
16/10/2008 12:09:10 AM|PS3GRID|Output file Tq15782-GPUTEST3-5-10-acemd_1_3 for task Tq15782-GPUTEST3-5-10-acemd_1 absent

Any ideas?

Paul.
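
For anyone wanting to check their own logs for the same pattern, here is a minimal Python sketch that collects the "Output file ... absent" messages per task. It assumes each BOINC message line has the `timestamp|project|text` layout shown in the post above; the function name `absent_outputs` and the regular expression are illustrative assumptions, not part of BOINC.

```python
import re
from collections import defaultdict

# Assumed wording of the message text, copied from the lines quoted above.
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")


def absent_outputs(log_lines):
    """Map each task name to the output files reported as absent."""
    missing = defaultdict(list)
    for line in log_lines:
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not a timestamp|project|text line
        match = ABSENT_RE.search(parts[2])
        if match:
            output_file, task = match.groups()
            missing[task].append(output_file)
    return dict(missing)


log = [
    "16/10/2008 12:09:10 AM|PS3GRID|Computation for task Tq15782-GPUTEST3-5-10-acemd_1 finished",
    "16/10/2008 12:09:10 AM|PS3GRID|Output file Tq15782-GPUTEST3-5-10-acemd_1_1 for task Tq15782-GPUTEST3-5-10-acemd_1 absent",
    "16/10/2008 12:09:10 AM|PS3GRID|Output file Tq15782-GPUTEST3-5-10-acemd_1_2 for task Tq15782-GPUTEST3-5-10-acemd_1 absent",
    "16/10/2008 12:09:10 AM|PS3GRID|Output file Tq15782-GPUTEST3-5-10-acemd_1_3 for task Tq15782-GPUTEST3-5-10-acemd_1 absent",
]

for task, files in absent_outputs(log).items():
    print(task, "is missing", len(files), "output files")
```

A task that finishes with all of its output files absent, as here, produced no usable result, which fits the invalid WU described above.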

GDF (Project administrator, Project developer)
Message 3063 - Posted: 15 Oct 2008 | 19:36:04 UTC - in response to Message 3058.

This is usually given by an overclocked unstable system.

gdf

The Gas Giant
Message 3064 - Posted: 15 Oct 2008 | 20:03:28 UTC

Thanks. I had increased the OC a little more last night....damn. Now set back to where it was.

The Gas Giant
Message 3453 - Posted: 28 Oct 2008 | 20:42:49 UTC

Another error on this wu. It happened during the cool part of the day, so temperatures were low. Why does this happen only at the end of a wu? The last 7 days ran fine.

The Gas Giant
Message 3493 - Posted: 29 Oct 2008 | 20:29:07 UTC

Another error. Again at the end of the wu.

30/10/2008 2:43:00 AM|PS3GRID|Computation for task NG16509-GPUTEST4-3-10-acemd_0 finished
30/10/2008 2:43:00 AM|PS3GRID|Output file NG16509-GPUTEST4-3-10-acemd_0_1 for task NG16509-GPUTEST4-3-10-acemd_0 absent
30/10/2008 2:43:00 AM|PS3GRID|Output file NG16509-GPUTEST4-3-10-acemd_0_2 for task NG16509-GPUTEST4-3-10-acemd_0 absent
30/10/2008 2:43:00 AM|PS3GRID|Output file NG16509-GPUTEST4-3-10-acemd_0_3 for task NG16509-GPUTEST4-3-10-acemd_0 absent

Hmmm.....

ExtraTerrestrial Apes (Volunteer moderator)
Message 3498 - Posted: 30 Oct 2008 | 8:55:57 UTC

Seems like you're getting them rather frequently now. Lower your OC?

MrS
____________
Scanning for our furry friends since Jan 2002

The Gas Giant
Message 3518 - Posted: 30 Oct 2008 | 19:38:08 UTC

If it was excessive OC then I have a couple of questions:

1. Why does the wu run to the end before it has a problem?

2. Why did it run fine for 7 days and the 5 days before that at the same OC settings?

3. My wu's are currently ending in the early hours of the morning, so temps are quite cool. (ok a statement not a question) Dust filters are also clean. Yesterday was very warm and that wu completed without error. Hmmm.

Live long and BOINC!


ExtraTerrestrial Apes (Volunteer moderator)
Message 3521 - Posted: 30 Oct 2008 | 21:21:14 UTC

Errors due to OC can be highly random, especially when you are sitting right at the edge of stability. You'd expect it to be a bit more systematic than what you're seeing, but GDF's "This is usually given by an overclocked unstable system." should really ring your alarm bells.

I'd say switch the machine off, physically disconnect the power cord for >15 min and try again. If you still get errors, lower the OC by at least 54 MHz shader and 27 MHz core (both correspond to one clock speed step) and see what you get.

MrS
____________
Scanning for our furry friends since Jan 2002
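
The advice above to back off in whole clock steps can be written down as a small Python sketch. The step sizes (27 MHz core, 54 MHz shader) come from the post; the function name `lower_clocks` and the starting clocks in the example are hypothetical and not taken from the thread.

```python
SHADER_STEP_MHZ = 54  # one shader clock step, per the post above
CORE_STEP_MHZ = 27    # one core clock step, per the post above


def lower_clocks(core_mhz, shader_mhz, steps=1):
    """Return (core, shader) clocks lowered by a whole number of steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return (
        core_mhz - steps * CORE_STEP_MHZ,
        shader_mhz - steps * SHADER_STEP_MHZ,
    )


# Hypothetical starting clocks, chosen only for illustration.
core, shader = lower_clocks(675, 1674)
print("core:", core, "MHz, shader:", shader, "MHz")
```

Dropping both clocks by the same number of steps keeps the change easy to undo and makes it clear which setting was in use when an error shows up.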
