Message boards : Graphics cards (GPUs) : GTS 250 65nm G92 Rev A2 - Successes and Failures

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 13836 - Posted: 8 Dec 2009 | 18:08:31 UTC

Below is a list of tasks, from 1st Dec 09 to 7th Dec 09, that my GTS250 managed to complete successfully:

151-KASHIF_HIVPR_sub_so_ba2-73-100-RND7713_0
39-GIANNI_BIND_2-32-100-RND1522
38-IBUCH_2_reverse_TRYP_0911-11-40-RND1649
D160-TONI_HERGdof5-5-40-RND0496
467-GIANNI_BIND_166_119-32-100-RND4596
92-KASHIF_HIVPR_twomons_ba2-72-100-RND5413
p1515000-IBUCH_3_pYEEI_2011-13-20-RND1885
98-KASHIF_HIVPR_n1_for_1hhp_open_ba5-81-100-RND2003
70-GIANNI_BIND_166_119-43-100-RND0394
315-GIANNI_BIND_166_119-33-100-RND5680

Last week the same GTS 250 failed four tasks, all of them TONI-HERG tasks.
TONI-HERG is bad for this GTS 250 card, so any I get for the card will be aborted by user.

They failed after the following run times:
38565, 15028, 19544 and 3083 seconds.
That is a total of about 21 h of lost crunching, or 12.5% lost time, last week.
The previous week it was twice that, 25% lost time, when I had more of these bad work units.
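
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the lost-time calculation. The four run times are the ones quoted above; the 7-day denominator is an assumption (it treats the card as available around the clock for the whole week), and the variable names are only illustrative.

```python
# Run times (seconds) of the four failed tasks, as quoted in the post.
failed_run_times = [38565, 15028, 19544, 3083]

# Assumption: the GPU was available for crunching the whole week.
SECONDS_PER_WEEK = 7 * 24 * 3600

lost_seconds = sum(failed_run_times)
lost_hours = lost_seconds / 3600
lost_fraction = lost_seconds / SECONDS_PER_WEEK

print(f"lost: {lost_seconds} s = {lost_hours:.1f} h = {lost_fraction:.1%} of the week")
```

Under that assumption the result comes out at roughly 21 h and a little over 12.5%, in line with the rounded figures above.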

So things are looking up again for the old 65nm G92 Rev A2 card.

Using BOINC 6.10.18, driver 19539, CUDA 3000.

Profile skgiven
Message 13979 - Posted: 18 Dec 2009 | 8:45:49 UTC - in response to Message 13836.
Last modified: 18 Dec 2009 | 8:46:21 UTC

Between the 10th Dec and 17th Dec my GTS250 had 5 errors and 9 successes.
19h 15min were lost (69206s), or 11.5% of the time. That is slightly better than the previous week (12.5%) and still much better than the week before (25%).

Since the 13th there has been only one failure, although it did fail after 9h30min!
I suspect that failure was a result of the task running while I was using the system, so I have made sure GPUGrid does not run when I am using it (which is not too often)!

All error messages contain the following line:
MDIO ERROR: cannot open file "restart.coor"

List of tasks undertaken:
Task ID | Workunit ID | Sent | Received | Status | Run time (s) | CPU time (s) | Claimed credit | Granted credit | Application
1633773 | 1024720 | 15 Dec 2009 23:57:30 UTC | 16 Dec 2009 13:41:29 UTC | Completed and validated | 47,063.52 | 4,500.40 | 3,977.21 | 5,369.23 | Full-atom molecular dynamics v6.71 (cuda23)
1632027 | 1023349 | 15 Dec 2009 6:17:34 UTC | 15 Dec 2009 23:57:30 UTC | Error while computing | 34,324.59 | 1,311.34 | 4,428.01 | --- | Full-atom molecular dynamics v6.71 (cuda23)
1629991 | 1022109 | 14 Dec 2009 15:42:01 UTC | 15 Dec 2009 11:17:44 UTC | Completed and validated | 52,633.66 | 3,102.44 | 4,503.74 | 6,080.05 | Full-atom molecular dynamics v6.71 (cuda23)
1627586 | 1020487 | 14 Dec 2009 0:18:19 UTC | 14 Dec 2009 20:42:11 UTC | Completed and validated | 55,474.79 | 3,033.25 | 4,531.91 | 6,118.08 | Full-atom molecular dynamics v6.71 (cuda23)
1625604 | 1007426 | 13 Dec 2009 11:55:40 UTC | 14 Dec 2009 6:21:57 UTC | Completed and validated | 52,461.03 | 2,915.99 | 4,503.74 | 6,080.05 | Full-atom molecular dynamics v6.71 (cuda23)
1624544 | 1018750 | 12 Dec 2009 11:21:41 UTC | 13 Dec 2009 11:53:58 UTC | Completed and validated | 55,578.08 | 3,299.06 | 4,531.91 | 6,118.08 | Full-atom molecular dynamics v6.71 (cuda23)
1624517 | 1018739 | 12 Dec 2009 10:56:14 UTC | 12 Dec 2009 11:14:49 UTC | Error while computing | 1,015.11 | 54.30 | 4,022.81 | --- | Full-atom molecular dynamics v6.71 (cuda23)
1624470 | 1018708 | 12 Dec 2009 10:28:30 UTC | 12 Dec 2009 10:34:45 UTC | Error while computing | 265.39 | 18.05 | 4,428.01 | --- | Full-atom molecular dynamics v6.71 (cuda23)
1622530 | 1013606 | 11 Dec 2009 20:43:48 UTC | 12 Dec 2009 10:28:30 UTC | Error while computing | 32,402.90 | 1,903.20 | 4,531.91 | --- | Full-atom molecular dynamics v6.71 (cuda23)
1620740 | 1016195 | 11 Dec 2009 7:20:21 UTC | 12 Dec 2009 5:01:01 UTC | Completed and validated | 50,106.54 | 2,207.99 | 4,022.81 | 5,430.80 | Full-atom molecular dynamics v6.71 (cuda23)
1620010 | 1015882 | 12 Dec 2009 10:34:45 UTC | 12 Dec 2009 10:56:14 UTC | Error while computing | 1,199.04 | 117.81 | 3,977.21 | --- | Full-atom molecular dynamics v6.71 (cuda23)
1617985 | 1014436 | 10 Dec 2009 10:43:53 UTC | 11 Dec 2009 16:01:44 UTC | Completed and validated | 45,358.59 | 5,275.05 | 3,539.96 | 4,778.94 | Full-atom molecular dynamics v6.71 (cuda23)
1616001 | 1013057 | 9 Dec 2009 23:10:00 UTC | 10 Dec 2009 18:54:39 UTC | Completed and validated | 56,684.09 | 3,287.19 | 4,531.91 | 6,118.08 | Full-atom molecular dynamics v6.71 (cuda23)
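
As a rough cross-check of the lost-time figure, the run times of the errored tasks in the list above can be summed. This is only a sketch: the `tasks` list is transcribed by hand from the list (status and run time only), the short status labels are my own, and the 7-day denominator is an assumption about how the percentage was worked out.

```python
# (status, run_time_seconds) for the 13 listed tasks, in the order shown.
# "ok" = Completed and validated, "error" = Error while computing.
tasks = [
    ("ok", 47063.52),
    ("error", 34324.59),
    ("ok", 52633.66),
    ("ok", 55474.79),
    ("ok", 52461.03),
    ("ok", 55578.08),
    ("error", 1015.11),
    ("error", 265.39),
    ("error", 32402.90),
    ("ok", 50106.54),
    ("error", 1199.04),
    ("ok", 45358.59),
    ("ok", 56684.09),
]

# Assumption: a 7-day window with the GPU available the whole time.
SECONDS_PER_WEEK = 7 * 24 * 3600

# Total run time spent on tasks that ended in a compute error.
lost = sum(run_time for status, run_time in tasks if status == "error")
errors = sum(1 for status, _ in tasks if status == "error")

hours, remainder = divmod(int(lost), 3600)
print(f"{errors} errors, {lost:.0f} s lost ({hours} h {remainder // 60} min), "
      f"{lost / SECONDS_PER_WEEK:.1%} of the week")
```

The sum of the five errored run times agrees with the 69206 s quoted above to within rounding.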

Failure 1:

________________________________________

Name p270000-IBUCH_2_pYEEI_2011-5-20-RND2486_0
Workunit 1015882

Created 11 Dec 2009 1:24:50 UTC
Sent 12 Dec 2009 10:34:45 UTC
Received 12 Dec 2009 10:56:14 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 51279

Report deadline 17 Dec 2009 10:34:45 UTC
Run time 1199.038244
CPU time 117.812
stderr out <core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTS 250"
# Clock rate: 1.85 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [pme_fill_charges_overflow] failed in file 'fillcharges.cu' in line 97 : unknown error.

</stderr_txt>
]]>
Validate state Invalid
Claimed credit 3977.21064814815
Granted credit 0
application version Full-atom molecular dynamics v6.71 (cuda23)

Failure 2:

________________________________________

Name 471-GIANNI_BIND_166_119-30-100-RND4009_1
Workunit 1013606

Created 11 Dec 2009 20:08:53 UTC
Sent 11 Dec 2009 20:43:48 UTC
Received 12 Dec 2009 10:28:30 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 51279

Report deadline 16 Dec 2009 20:43:48 UTC
Run time 32402.901728
CPU time 1903.197
stderr out <core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTS 250"
# Clock rate: 1.85 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.

</stderr_txt>
]]>
Validate state Invalid
Claimed credit 4531.90972222222
Granted credit 0
application version Full-atom molecular dynamics v6.71 (cuda23)

Failure 3:

________________________________________

Name 88-KASHIF_HIVPR_n1_for_1hhp_open_ba4-78-100-RND1283_0
Workunit 1018708

Created 12 Dec 2009 9:52:07 UTC
Sent 12 Dec 2009 10:28:30 UTC
Received 12 Dec 2009 10:34:45 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 51279

Report deadline 17 Dec 2009 10:28:30 UTC
Run time 265.390701
CPU time 18.04932
stderr out <core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTS 250"
# Clock rate: 1.85 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.

</stderr_txt>
]]>
Validate state Invalid
Claimed credit 4428.01157407407
Granted credit 0
application version Full-atom molecular dynamics v6.71 (cuda23)

Failure 4:
Name 34-KASHIF_HIVPR_sub_so_ba1-72-100-RND1262_0
Workunit 1018739
Created 12 Dec 2009 10:17:10 UTC
Sent 12 Dec 2009 10:56:14 UTC
Received 12 Dec 2009 11:14:49 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 51279
Report deadline 17 Dec 2009 10:56:14 UTC
Run time 1015.111446
CPU time 54.30395
stderr out
<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTS 250"
# Clock rate: 1.85 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.

</stderr_txt>
]]>
Validate state Invalid
Claimed credit 4022.81481481481
Granted credit 0
application version Full-atom molecular dynamics v6.71 (cuda23)

Failure 5:
Name 89-KASHIF_HIVPR_n1_for_1hhp_open_ba4-78-100-RND7252_1
Workunit 1023349
Created 15 Dec 2009 5:43:05 UTC
Sent 15 Dec 2009 6:17:34 UTC
Received 15 Dec 2009 23:57:30 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 51279
Report deadline 20 Dec 2009 6:17:34 UTC
Run time 34324.593376
CPU time 1311.344
stderr out
<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTS 250"
# Clock rate: 1.85 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTS 250"
# Clock rate: 1.85 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 16
# Number of cores: 128
Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.

</stderr_txt>
]]>
Validate state Invalid
Claimed credit 4428.01157407407
Granted credit 0
application version Full-atom molecular dynamics v6.71 (cuda23)
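
To see at a glance which kernels the five failures died in, the "Cuda error" lines from the reports above can be tallied. A minimal Python sketch follows; the regular expression is inferred only from the log lines in this thread (it is not a documented GPUGrid or BOINC format), and `parse_cuda_error` is a hypothetical helper name.

```python
import re
from collections import Counter

# The "Cuda error" line from each of the five failure reports, in order.
cuda_error_lines = [
    "Cuda error: Kernel [pme_fill_charges_overflow] failed in file 'fillcharges.cu' in line 97 : unknown error.",
    "Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.",
    "Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.",
    "Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.",
    "Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.",
]

# Pattern guessed from the lines above; other app versions may log differently.
PATTERN = re.compile(
    r"Kernel \[(?P<kernel>\w+)\] failed in file '(?P<file>[^']+)' in line (?P<line>\d+)"
)

def parse_cuda_error(text):
    """Return (kernel, source file, line number), or None if the line does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match["kernel"], match["file"], int(match["line"])

parsed = [parse_cuda_error(text) for text in cuda_error_lines]
counts = Counter(item for item in parsed if item is not None)

for (kernel, source_file, line_no), n in counts.most_common():
    print(f"{n} x {kernel} ({source_file}:{line_no})")
```

Four of the five failures stop in the same kernel at the same source line, with the fifth in a different one.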

Profile skgiven
Message 14065 - Posted: 27 Dec 2009 | 19:42:19 UTC - in response to Message 13979.
Last modified: 27 Dec 2009 | 19:43:25 UTC

From the 18th Dec 2009 to the 24th my GTS250 successfully completed 7 tasks in a row and averaged over 7000 points per day, with tasks taking between 46,000 and 60,000 seconds to complete.
On the 24th there was a failure after 2 seconds, 143-IBUCH_reverse1fix_pYEEI_2312-0-40-RND2977, from a known bad batch of tasks, and then a TONI_HERG task failed after 14,135 seconds. Surprisingly, that task succeeded on a GeForce 9600 GT despite failing on 2 additional systems.

No failures from the 24th to the 27th.
So, since the 18th Dec (9 days ago) my GTS250 has only lost 14138 seconds.
(777600 s – 14138 s) / 777600 s = 98% successful GPU processing time!

A huge improvement.


Techs and scientists, thank you.
