acemdshort application 8.15 - discussion

MJH
Project administrator
Project developer
Project scientist
Message 32034 - Posted: 18 Aug 2013 | 13:02:34 UTC

Dear all,

I'm working on putting out updated Windows and Linux applications that will have full support for 780s and Titans. If you have one of these card types, please sub to "acemdbeta" and try some WUs. The beta app version is 7.02 for both architectures.

Please post experiences here.

MJH

Zarck
Message 32035 - Posted: 18 Aug 2013 | 13:40:26 UTC - in response to Message 32034.
Last modified: 18 Aug 2013 | 13:48:46 UTC

As Thomas Edison said:
"I have not failed. I simply found 10,000 solutions that do not work."

Still getting errors:
http://www.gpugrid.net/result.php?resultid=7161475

Good luck.

GeForce 326.58
https://developer.nvidia.com/opengl-driver
Boinc 7.2.11 (x64)
http://boinc.berkeley.edu/dl/?C=M;O=D
Win 8 Pro 64

@+
*_*
____________

HA-SOFT, s.r.o.
Message 32036 - Posted: 18 Aug 2013 | 13:53:54 UTC - in response to Message 32035.

Got one 7.01 app for Linux and it failed.

zombie67 [MM]
Message 32054 - Posted: 18 Aug 2013 | 19:33:08 UTC - in response to Message 32035.
Last modified: 18 Aug 2013 | 19:40:56 UTC

As Thomas Edison said:
"I have not failed. I simply found 10,000 solutions that do not work."

Still getting errors:
http://www.gpugrid.net/result.php?resultid=7161475

Good luck.

GeForce 326.58
https://developer.nvidia.com/opengl-driver
Boinc 7.2.11 (x64)
http://boinc.berkeley.edu/dl/?C=M;O=D
Win 8 Pro 64

@+
*_*


That task used "Long runs (8-12 hours on fastest card) v6.18 (cuda42)", not "ACEMD beta version v7.02 (cuda42)".
____________
Reno, NV
Team: SETI.USA

zombie67 [MM]
Message 32055 - Posted: 18 Aug 2013 | 19:40:15 UTC
Last modified: 18 Aug 2013 | 19:45:47 UTC

2 of 2 failed:

http://www.gpugrid.net/result.php?resultid=7161475
http://www.gpugrid.net/result.php?resultid=7162501

Is there a particular version of driver equipped? FWIW, this win7 machine is on 326.41 BETA.
____________
Reno, NV
Team: SETI.USA

5pot
Message 32057 - Posted: 18 Aug 2013 | 22:06:26 UTC

Out of 6 that ran, one completed and validated.

zombie67 [MM]
Message 32059 - Posted: 18 Aug 2013 | 22:18:08 UTC - in response to Message 32055.

Is there a particular version of driver equipped?


Stupid autocorrect. That should be, "Is there a particular version of the driver required?"
____________
Reno, NV
Team: SETI.USA

MJH
Project administrator
Project developer
Project scientist
Message 32068 - Posted: 19 Aug 2013 | 17:55:54 UTC - in response to Message 32059.
Last modified: 19 Aug 2013 | 18:15:45 UTC

703 is now built with CUDA 5.5, so driver 319.17 is the minimum required to run it.

For Titan and 780 cards, the driver MUST be 326.41 or later. Use any earlier version and you'll get frequent crashes.
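A minimal sketch of the two driver floors stated above, written as a POSIX shell check. The function names and the `titan780` card-class argument are hypothetical; the thresholds (319.17 as the CUDA 5.5 minimum for 7.03, 326.41 for Titan and 780 cards) are the ones MJH gives, and the numeric comparison assumes NVIDIA's usual NNN.NN version format.

```shell
#!/bin/sh
# Return success if driver version $1 >= $2. awk compares the values as
# numbers, which works for NVIDIA's fixed NNN.NN format (e.g. 326.41).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 >= b + 0) }'
}

# check_703_driver DRIVER CARD_CLASS   (hypothetical helper)
#   CARD_CLASS: "titan780" for GTX Titan / GTX 780, anything else otherwise.
# Prints "unsupported", "crash-prone" or "ok".
check_703_driver() {
  if ! version_ge "$1" 319.17; then
    echo unsupported    # below the CUDA 5.5 minimum quoted for 7.03
  elif [ "$2" = titan780 ] && ! version_ge "$1" 326.41; then
    echo crash-prone    # runs, but frequent crashes are expected on these cards
  else
    echo ok
  fi
}
```

For example, `check_703_driver 320.49 titan780` prints `crash-prone`, while `check_703_driver 326.41 titan780` prints `ok`.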

MJH

zombie67 [MM]
Message 32070 - Posted: 19 Aug 2013 | 19:27:42 UTC

Looks like you get crashes even with 326.41. So maybe it is a problem with the app?
____________
Reno, NV
Team: SETI.USA

Husu*
Message 32072 - Posted: 19 Aug 2013 | 21:15:32 UTC

I've also posted this in the proper thread; sorry for the double post:

ACEMD beta version v7.03 (cuda42) seems to be working OK. I've gotten 3 work units so far, all successful:

http://www.gpugrid.net/result.php?resultid=7169488
http://www.gpugrid.net/result.php?resultid=7170042
http://www.gpugrid.net/result.php?resultid=7170118

This is with Windows 7 x64 SP1, GTX Titan in Double Precision mode and Nvidia driver: 326.41

petebe
Message 32074 - Posted: 19 Aug 2013 | 22:25:30 UTC

Several 7.03 MJHARVEY tasks ran successfully:

http://www.gpugrid.net/results.php?hostid=139142

Titan (DP off)
326.41
W7 x64
BOINC 7.2.5

Looks like it downloaded a couple of DLLs to my GPUGRID data folder.

Zarck
Message 32075 - Posted: 19 Aug 2013 | 22:34:24 UTC - in response to Message 32074.

OK for me:

http://www.gpugrid.net/results.php?userid=5128

@+
*_*
____________

5pot
Message 32076 - Posted: 19 Aug 2013 | 23:23:34 UTC

Ditto:

Stderr output
<core_client_version>7.0.64</core_client_version>
<![CDATA[
<stderr_txt>
# Time per step (avg over 25000 steps): 1.684 ms
# Approximate elapsed time for entire WU: 42.088 s
18:22:16 (3516): called boinc_finish

</stderr_txt>
]]>
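The two timing lines in that stderr agree with each other: 25,000 steps × 1.684 ms/step ≈ 42.1 s, against the reported 42.088 s (the per-step figure is rounded). A sketch of pulling those numbers out of a saved stderr with sed and awk, assuming the line format is exactly as in the excerpt; the function name is hypothetical.

```shell
#!/bin/sh
# step_summary (hypothetical helper): reads acemd stderr text on stdin and,
# for a line like
#   # Time per step (avg over 25000 steps): 1.684 ms
# prints "STEPS MS_PER_STEP ESTIMATED_SECONDS".
step_summary() {
  # Strip CRs first so files saved on Windows still match the sed pattern.
  tr -d '\r' |
    sed -n 's/^# Time per step (avg over \([0-9]*\) steps): \([0-9.]*\) ms$/\1 \2/p' |
    awk '{ printf "%d %s %.1f\n", $1, $2, $1 * $2 / 1000 }'
}
```

Run against a saved copy of the stderr above (e.g. `step_summary < stderr.txt`, file name hypothetical), it prints `25000 1.684 42.1`.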


Way to go, GPUGRID team. Congrats.

Look forward to bringing my 780s over here as soon as they're switched to production tasks.

Cheers

zombie67 [MM]
Message 32078 - Posted: 20 Aug 2013 | 2:45:15 UTC

I see that there is a new app for windows just posted today (7.03), so I decided to give it another shot. This time success!

http://www.gpugrid.net/result.php?resultid=7173196
http://www.gpugrid.net/result.php?resultid=7173177

Run times for each were less than a minute.
____________
Reno, NV
Team: SETI.USA

MJH
Project administrator
Project developer
Project scientist
Message 32080 - Posted: 20 Aug 2013 | 10:06:44 UTC

Good, 703 worked ok. There's a new revision 704; please try that out.

(retired account)
Message 32081 - Posted: 20 Aug 2013 | 11:51:46 UTC - in response to Message 32080.
Last modified: 20 Aug 2013 | 11:52:45 UTC

Good, 703 worked ok.


Indeed, my Titan did 440 (!) beta v7.03 workunits while I slept and not a single error.

There's a new revision 704; please try that out.


Some validate ok, some error out after a few seconds.

http://www.gpugrid.net/result.php?resultid=7176884 failed
http://www.gpugrid.net/result.php?resultid=7176874 ok
http://www.gpugrid.net/result.php?resultid=7176867 failed
http://www.gpugrid.net/result.php?resultid=7176799 ok
http://www.gpugrid.net/result.php?resultid=7176798 failed
http://www.gpugrid.net/result.php?resultid=7176782 failed

I switched to DP mode for the last three workunits (which also limits the clock of the Titan as you know), but no difference apparently (one ok, two failed).

http://www.gpugrid.net/result.php?resultid=7176773 ok
http://www.gpugrid.net/result.php?resultid=7176914 failed
http://www.gpugrid.net/result.php?resultid=7176734 failed

This is with nVidia driver 326.41, BOINC 7.2.11 and on Windows 7 SP1 64bit.

MJH
Project administrator
Project developer
Project scientist
Message 32082 - Posted: 20 Aug 2013 | 12:05:43 UTC

Is anyone seeing failure of 704 on a card that IS NOT a Titan or 780?

zombie67 [MM]
Message 32083 - Posted: 20 Aug 2013 | 12:29:52 UTC - in response to Message 32080.

Good, 703 worked ok. There's a new revision 704; please try that out.


Overnight, my TITAN ran 50x 7.03 tasks. All validated.

It also ran 3x 7.04 tasks. All failed. Here is a sample:

http://www.gpugrid.net/result.php?resultid=7177032
____________
Reno, NV
Team: SETI.USA

bundaboy
Message 32084 - Posted: 20 Aug 2013 | 12:30:59 UTC - in response to Message 32082.

Is anyone seeing failure of 704 on a card that IS NOT a Titan or 780?

Just got 7 of them on my GTX460SE (314.22, WinXP), all ended and validated, no error.
____________


Retvari Zoltan
Message 32085 - Posted: 20 Aug 2013 | 12:50:00 UTC - in response to Message 32082.
Last modified: 20 Aug 2013 | 12:50:47 UTC

Is anyone seeing failure of 704 on a card that IS NOT a Titan or 780?

It's working fine on my GTX 670.
Is it a CUDA 5.5 app? (According to its name in the BOINC Manager it's a CUDA 4.2 app, but the BOINC Manager downloaded a couple of CUDA 5.5 DLLs with this beta app.)

zombie67 [MM]
Message 32086 - Posted: 20 Aug 2013 | 12:54:09 UTC

I've run 10+ tasks on my machine with a 590 (dual GPU) and a 580. No problems at all. 320.49 driver. Should I try the 326.41 BETA driver too?


____________
Reno, NV
Team: SETI.USA

Zarck
Message 32087 - Posted: 20 Aug 2013 | 13:53:12 UTC - in response to Message 32086.
Last modified: 20 Aug 2013 | 13:57:23 UTC

7.04: all failed on my Titan.

http://www.gpugrid.net/results.php?userid=5128&offset=0&show_names=0&state=0&appid=

@+
*_*
____________

5pot
Message 32088 - Posted: 20 Aug 2013 | 14:16:39 UTC

3 failed and 1 completed on a 780 with the latest drivers.

(retired account)
Message 32089 - Posted: 20 Aug 2013 | 14:41:28 UTC - in response to Message 32082.

Is anyone seeing failure of 704 on a card that IS NOT a Titan or 780?


No failures on a GT650M (mobile, Kepler, 384 CUs), BOINC 7.0.64, driver 320.49.

MJH
Project administrator
Project developer
Project scientist
Message 32090 - Posted: 20 Aug 2013 | 14:44:22 UTC

Ok, thanks. Looks like we'll need two apps, CUDA 5.5 (cc 3.5 or driver > 325) and CUDA 4.2 for everything else.

5pot
Message 32091 - Posted: 20 Aug 2013 | 15:12:03 UTC

Do you guys think there will be any performance improvements with 5.5?

nenym
Message 32092 - Posted: 20 Aug 2013 | 15:18:02 UTC
Last modified: 20 Aug 2013 | 15:20:36 UTC

Is anyone seeing failure of 704 on a card that IS NOT a Titan or 780?
GTX 560Ti W7 64bit driver 320.19: no failures
GTX 260 rev A.2 (65nm) W7 64bit driver 326.80: no failures! That is great.

MJH
Project administrator
Project developer
Project scientist
Message 32093 - Posted: 20 Aug 2013 | 15:52:12 UTC - in response to Message 32091.


Do you guys think there will be any performance improvements with 5.5?


Only for cc 3.5, in the sense that those cards will now work!

MJH
Project administrator
Project developer
Project scientist
Message 32094 - Posted: 20 Aug 2013 | 16:30:29 UTC
Last modified: 20 Aug 2013 | 16:44:40 UTC

acemdbeta version 705 now has two variants -

* 705-55 for hosts with cc >= 1.3 and driver >=315.25

* 705-42 for hosts with cc >=1.3 and <3.5 and driver between 295.41 and 315.25.

We've had to do this because:

1) The Windows CUDA 4.2 build doesn't work reliably with cc 3.5 cards, and CUDA 5.5 requires a driver too new for general deployment.

2) The cc 3.5 cards (Titan, 7xx, GT640) need the latest driver[1] to fix a stability problem.

If you have a cc 3.5 card but a driver before 315.25, you will receive NO WORK![2].

Unfortunately, Windows users with a cc 3.5 card and a driver >315.25 and <326.41 will get work but should expect frequent crashes. If you're in this situation, please upgrade the driver.

MJH
 
[1] Currently this means only 315.25 for Linux and 326.41 for Windows
[2] Assuming the BOINC scheduler does what it's told...
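The variant rules above can be written as a small decision function. This is a hypothetical sketch that encodes only what this post states; it is not the project's scheduler code, and per footnote [2] the real scheduler may behave differently. It prints `705-55`, `705-55 warn` (the Windows cc 3.5 window that gets work but is crash-prone), `705-42`, or `none`.

```shell
#!/bin/sh
# Numeric comparisons for NNN.NN driver versions and N.N compute capabilities.
ge() { awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 >= b + 0) }'; }
gt() { awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'; }

# pick_705_variant CC DRIVER OS   (hypothetical; OS is "windows" or anything else)
pick_705_variant() {
  cc=$1 drv=$2 os=$3
  if ! ge "$cc" 1.3; then
    echo none                       # both variants need cc >= 1.3
  elif ge "$drv" 315.25; then
    # cc 3.5 on Windows with 315.25 < driver < 326.41: gets work, crash-prone
    if [ "$os" = windows ] && ge "$cc" 3.5 && gt "$drv" 315.25 && ! ge "$drv" 326.41; then
      echo "705-55 warn"
    else
      echo 705-55
    fi
  elif ! ge "$cc" 3.5 && ge "$drv" 295.41; then
    echo 705-42                     # cc 1.3 to <3.5, driver 295.41 to <315.25
  else
    echo none                       # e.g. cc 3.5 with a driver older than 315.25
  fi
}
```

For example, `pick_705_variant 3.0 310.90 windows` prints `705-42`, and `pick_705_variant 3.5 310.90 windows` prints `none`, matching the "NO WORK" case above.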

Zarck
Message 32095 - Posted: 20 Aug 2013 | 16:52:47 UTC - in response to Message 32094.
Last modified: 20 Aug 2013 | 17:00:21 UTC

7.05 OK for my TITAN.

http://www.gpugrid.net/results.php?userid=5128

Can I now re-select all the GPUGRID application options in my account preferences to receive every type of work unit, or must I stick with the beta units for now?

Great to finally see the end of the tunnel. Congratulations.

GeForce 326.58
https://developer.nvidia.com/opengl-driver
Boinc 7.2.11 (x64)
http://boinc.berkeley.edu/dl/?C=M;O=D
Win 8 Pro 64
Double precision = Off


@+
*_*
____________

5pot
Message 32097 - Posted: 20 Aug 2013 | 17:05:49 UTC

Heading to work so I can't test it, but NVIDIA released driver 326.80 today.

petebe
Message 32098 - Posted: 20 Aug 2013 | 17:15:48 UTC

Two 7.05(cuda55) downloaded and ran ok.
http://www.gpugrid.net/results.php?hostid=139142

Titan (DP off)
326.41
W7 x64
BOINC 7.2.5

(retired account)
Message 32099 - Posted: 20 Aug 2013 | 17:17:35 UTC - in response to Message 32094.
Last modified: 20 Aug 2013 | 17:23:44 UTC

Unfortunately, Windows users with a cc 3.5 card and a driver >315.25 and <326.41 will get work but should expect frequent crashes.


MJH,

understood and IMHO good enough for beta. But can't you take the OS into the equation and send 705-55 only to Windows hosts with cc >= 1.3 and driver >=326.41?

(And 705-42 on Windows of course only to hosts with cc >=1.3 and <3.5 and driver between 295.41 and 326.41?)
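The suggested OS-aware split amounts to raising the CUDA 5.5 driver floor on Windows from 315.25 to 326.41, so that cc 3.5 cards below that floor get no work instead of crash-prone work. A hypothetical sketch of that proposal (not how the scheduler is actually configured), with the non-Windows floor left at 315.25 as in MJH's post:

```shell
#!/bin/sh
ge() { awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 >= b + 0) }'; }

# Hypothetical: CUDA 5.5 driver floor per OS, as proposed above.
cuda55_floor() {
  if [ "$1" = windows ]; then echo 326.41; else echo 315.25; fi
}

# proposed_variant CC DRIVER OS   (hypothetical helper)
proposed_variant() {
  floor=$(cuda55_floor "$3")
  if ! ge "$1" 1.3; then
    echo none
  elif ge "$2" "$floor"; then
    echo 705-55
  elif ! ge "$1" 3.5 && ge "$2" 295.41; then
    echo 705-42          # on Windows this now covers drivers up to 326.41
  else
    echo none            # cc 3.5 below the floor: no work, rather than crashes
  fi
}
```

Under this proposal a Windows cc 3.0 host on 320.49 would get `705-42`, while a Windows cc 3.5 host on the same driver would get `none`.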

Can I now re-select all the GPUGRID application options in my account preferences to receive every type of work unit?


Zarck,

don't think so, at least not for Titan or 780 until a Cuda 5.5 application is released for short and long runs (non-beta):

http://www.gpugrid.net/apps.php

eMPee584
Message 32101 - Posted: 20 Aug 2013 | 17:37:38 UTC

Unfortunately, the 7.05 beta still fails silently here (http://www.gpugrid.net/forum_thread.php?id=3437). Also, my attempts to attach tracing to the short-lived process as soon as it becomes available

while true; do pid=$(pgrep -f acemd) && pstree -alcpsU $(pidof boinc) && strace64 -vfp $pid; done

are also failing:
strace64: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted

Same with
while true; do pid=$(pgrep -f acemd) && gdb -p $pid; done

GNU gdb (GDB) 7.6 (Debian 7.6-5)
This GDB was configured as "i486-linux-gnu".
Attaching to process 8561
warning: process 8561 is a zombie - the process has already terminated
ptrace: Operation not permitted.
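Two separate things show up in that output. gdb's zombie warning means the task had already exited by the time the loop attached, so polling for the PID races a process that lives only a few seconds. "Operation not permitted" from ptrace usually means either that the target runs as a different user (Debian's boinc-client package typically runs the client as its own `boinc` user) or that the kernel's Yama module restricts ptrace. A small hypothetical helper that reports the Yama setting, with the value meanings taken from the kernel's Yama documentation:

```shell
#!/bin/sh
# Hypothetical helper: explain /proc/sys/kernel/yama/ptrace_scope (Linux Yama LSM).
ptrace_scope_meaning() {
  case "$1" in
    0) echo "classic: may attach to any process with the same uid" ;;
    1) echo "restricted: may only attach to own descendants (unless CAP_SYS_PTRACE)" ;;
    2) echo "admin-only: CAP_SYS_PTRACE required" ;;
    3) echo "no attach: ptrace attach disabled until reboot" ;;
    *) echo unknown ;;
  esac
}

f=/proc/sys/kernel/yama/ptrace_scope
if [ -r "$f" ]; then
  ptrace_scope_meaning "$(cat "$f")"
else
  echo "Yama not present: normal uid-based ptrace checks apply"
fi
```

If the process belongs to another user, attaching earlier is usually easier than racing it: as root, `strace64 -f -o /tmp/boinc.trace -p "$(pidof boinc)"` traces the client and every child it starts after the attach, including the short-lived acemd process.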

I understand that you cannot put effort into supporting every conceivable system, but it must be a small problem, and if I understood the cause better it would be easy to fix, IMHO.
Is it possible to put a tiny bit more debugging output into the beta versions?

HA-SOFT, s.r.o.
Message 32102 - Posted: 20 Aug 2013 | 17:59:16 UTC - in response to Message 32094.

Ubuntu 13.04 64-bit, Titan, 325 driver: got 7.05 (cuda55).

All is ok.

MJH
Project administrator
Project developer
Project scientist
Message 32103 - Posted: 20 Aug 2013 | 18:03:34 UTC - in response to Message 32101.

Dear eMPee584,

As I said in reply to your other post - you are most likely missing some important component of a normal 64-bit userspace.

If you wander off into the long grass, don't be surprised when you find snakes.

MJH

MJH
Project administrator
Project developer
Project scientist
Message 32105 - Posted: 20 Aug 2013 | 18:35:38 UTC - in response to Message 32095.

Zarck,

We'll roll this build out to the other apps once I'm satisfied that it is working correctly, after a week or so of testing.

MJH

Husu*
Message 32106 - Posted: 20 Aug 2013 | 18:45:25 UTC

So far all the ACEMD beta version v7.05 (cuda55) WUs I've gotten have been OK: http://www.gpugrid.net/results.php?hostid=157253

I put in a 2nd Titan to see if it has any effect, but so far no errors.

I'll leave the queues as they are, so the computer will keep doing beta units.

Carlos Augusto Engel
Message 32107 - Posted: 20 Aug 2013 | 20:01:19 UTC

70 ACEMD beta version 7.05(cuda55) downloaded and ran ok. No errors.


7181433 4697366 20 Aug 2013 | 19:06:04 UTC 20 Aug 2013 | 19:13:31 UTC Completed and validated 66.94 61.18 150.00 ACEMD beta version v7.05 (cuda42)
7181426 4697359 20 Aug 2013 | 19:07:00 UTC 20 Aug 2013 | 19:14:29 UTC Completed and validated 64.86 61.53 150.00 ACEMD beta version v7.05 (cuda42)


GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
NVIDIA GeForce GTX 680 (2048MB) driver: 320.49
Microsoft Windows 7 Home Premium x86 Edition

____________

(retired account)
Message 32108 - Posted: 20 Aug 2013 | 20:13:17 UTC

120+ beta v7.05 (cuda55) and no errors.

GTX Titan, nVidia driver 326.41, BOINC 7.2.11, Win 7 SP1 64bit

zombie67 [MM]
Message 32109 - Posted: 20 Aug 2013 | 21:00:59 UTC

7.05 works fine on both my TITAN and on my machine with the 590 and 580.

I wonder what will happen when I eventually move them all into a single machine?
____________
Reno, NV
Team: SETI.USA

MJH
Project administrator
Project developer
Project scientist
Message 32110 - Posted: 20 Aug 2013 | 21:25:25 UTC - in response to Message 32109.

I wonder what will happen when I eventually move them all into a single machine?


I wouldn't expect any problems with the app.

MJH

Retvari Zoltan
Message 32111 - Posted: 20 Aug 2013 | 21:43:10 UTC

The acemd 7.05-55 app is running fine on my GTX 670 (WinXPx64, v326.80).
The MJHARVEY_TEST7, 8 and 9 tasks took 1 minute to complete, but now this host has received two MJHARVEY_TEST10 work units, which will take 1 hour and 30 minutes to complete (my estimate).

MJH
Project administrator
Project developer
Project scientist
Message 32112 - Posted: 20 Aug 2013 | 21:59:02 UTC - in response to Message 32111.

Yes, TEST10 are 100x longer.

MJH

TJ
Message 32113 - Posted: 20 Aug 2013 | 22:46:25 UTC - in response to Message 32094.

I suppose I get the 7.05 app when BOINC requests new beta work?

My GTX770 has driver 320.49 and has done 47 tasks with only one error, so it seems stable? It still gets work.
Am I missing something, or am I reading this wrong?
____________
Greetings from TJ

MJH
Project administrator
Project developer
Project scientist
Message 32114 - Posted: 20 Aug 2013 | 23:10:16 UTC - in response to Message 32113.

You need to be requesting beta work and subscribed to application "acemdbeta"

MJH

zombie67 [MM]
Message 32115 - Posted: 20 Aug 2013 | 23:22:02 UTC - in response to Message 32110.

I wonder what will happen when I eventually move them all into a single machine?


I wouldn't expect any problems with the app.

MJH


Yes, I believe the apps work.

What I am questioning is BOINC. Is it smart enough to request both kinds of work, and keep straight the two virtual queues of downloaded tasks? I know it works for ATI vs. nVidia vs. CPU tasks. But what about two different apps, based on slightly different versions of CUDA requirements? Will it always assign the right task/app to the right GPU?
____________
Reno, NV
Team: SETI.USA

MJH
Project administrator
Project developer
Project scientist
Message 32116 - Posted: 20 Aug 2013 | 23:27:35 UTC - in response to Message 32115.

Will it always assign the right task/app to the right GPU?


You'll get the cuda-55 app for all of them, assuming the driver is >=326.41

Matt

petebe
Message 32117 - Posted: 20 Aug 2013 | 23:28:43 UTC

My Titan has run 4 TEST10's - all failed in around 61 seconds.

http://www.gpugrid.net/result.php?resultid=7184238
http://www.gpugrid.net/result.php?resultid=7184558
http://www.gpugrid.net/result.php?resultid=7184588
http://www.gpugrid.net/result.php?resultid=7184668

Titan (DP off)
326.41
W7 x64
BOINC 7.2.5

zombie67 [MM]
Message 32118 - Posted: 20 Aug 2013 | 23:54:54 UTC - in response to Message 32116.

Will it always assign the right task/app to the right GPU?


You'll get the cuda-55 app for all of them, assuming the driver is >=326.41

Matt


Ah! I was thinking that the 590/580 were only 4.2. But I see they get 5.5 like the TITAN. Got it.
____________
Reno, NV
Team: SETI.USA

nanoprobe
Message 32119 - Posted: 21 Aug 2013 | 1:04:36 UTC
Last modified: 21 Aug 2013 | 1:07:16 UTC

660 Ti, driver 310.90, XP Pro SP1 32-bit. CUDA 4.2 version 7.05 betas are now completing without errors. 9 in a row and counting. Great work, techs.

Just received 1 that had failed on 2 other crunchers after about 1 minute. I'm 6 minutes in with only 5% completed but it is still progressing. All the ones before this one finished in about 1 minute. At the present rate this one will take close to 2 hours if it completes. Should be interesting.

zombie67 [MM]
Message 32120 - Posted: 21 Aug 2013 | 3:25:35 UTC

Okay, I have all three cards in a single machine, and everything seems to be working fine. Just FYI, running into a 31 tasks/day limit.
____________
Reno, NV
Team: SETI.USA

TomaszPawel
Message 32121 - Posted: 21 Aug 2013 | 6:49:43 UTC
Last modified: 21 Aug 2013 | 6:50:08 UTC

http://www.gpugrid.net/workunit.php?wuid=4700082

http://www.gpugrid.net/workunit.php?wuid=4699964

2 Errors

GTX670 320.49
____________
POLISH NATIONAL TEAM - Join! Crunch! Win!

TJ
Message 32122 - Posted: 21 Aug 2013 | 7:39:59 UTC

Overnight: 89 tasks with only 2 errors, both on TEST10, but the wingmen (with various cards) also had errors on those work units.

One GTX770 with driver 320.49, BOINC 7.0.64 and ACEMD beta version 7.05 windows_intelx86 (cuda42), and:
One GTX660 with driver 326.41, BOINC 7.0.64 and ACEMD beta version 7.05 windows_intelx86 (cuda55), which also ran one WU with ACEMD beta version 7.05 windows_intelx86 (cuda42).

Well done, Harvey. I would like to see these apps running the Santi WUs, which are very troublesome on my GTX660.

____________
Greetings from TJ

TJ
Message 32123 - Posted: 21 Aug 2013 | 9:27:58 UTC

I noticed a few things in the last hour.
On my AMD rig with the GTX770, BOINC and Task Manager had quit and all tasks had stopped. I noticed because the system temperature dropped to its minimum.
Restarting BOINC and Task Manager did not cause any problems (for now), and no WU (GPUGRID, or Rosetta on the CPU) has failed! Strange; I have never seen this before.

Secondly, the same system did not get any CUDA 5.5 tasks or the app, even though it is a 7xx card. The driver is 320.49. This is the card info from BOINC:
8/21/2013 11:07:10 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 770 (driver version 320.49, CUDA version 5.50, compute capability 3.0, 2048MB, 1829MB available, 3411 GFLOPS peak)
8/21/2013 11:07:10 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 770 (driver version 320.49, device version OpenCL 1.1 CUDA, 2048MB, 1829MB available, 3411 GFLOPS peak)

But it does run TEST10 with ACEMD beta version 7.05 (cuda42). However, no TEST10 has succeeded yet on my 2 rigs. MJH, does this mean I need to update to the 326.41 driver?
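To see which side of MJH's thresholds a host falls on, the driver version and compute capability can be read straight from the BOINC startup line quoted above. A sketch using sed, assuming the line format shown; the function name is hypothetical.

```shell
#!/bin/sh
# gpu_driver_and_cc (hypothetical helper): for each BOINC line such as
#   ... CUDA: NVIDIA GPU 0: GeForce GTX 770 (driver version 320.49, CUDA version 5.50,
#   compute capability 3.0, 2048MB, ...)
# print "DRIVER CC". OpenCL lines do not contain "CUDA: NVIDIA GPU" and are skipped.
gpu_driver_and_cc() {
  sed -n 's/.*CUDA: NVIDIA GPU [0-9]*: .*(driver version \([0-9.]*\),.*compute capability \([0-9.]*\),.*/\1 \2/p'
}
```

For the GTX 770 line above it prints `320.49 3.0`. The same messages are normally also written to the BOINC client's log (stdoutdae.txt in the BOINC data directory), so `gpu_driver_and_cc < stdoutdae.txt` should list one line per NVIDIA GPU.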
____________
Greetings from TJ

Zarck
Message 32124 - Posted: 21 Aug 2013 | 9:40:53 UTC - in response to Message 32123.
Last modified: 21 Aug 2013 | 9:41:04 UTC

Driver nVidia 326.80 Beta,

http://www.nvidia.fr/Download/Find.aspx?lang=us

@+
*_*
____________

TJ
Message 32126 - Posted: 21 Aug 2013 | 11:17:29 UTC
Last modified: 21 Aug 2013 | 11:17:39 UTC

I have now two TEST10 that resulted okay. One on the 770 with driver 320.49 (cuda42) and one on the 660 with driver 326.41 (cuda55).
____________
Greetings from TJ

nanoprobe
Message 32127 - Posted: 21 Aug 2013 | 13:12:24 UTC

Just received 1 that had failed on 2 other crunchers after about 1 minute. I'm 6 minutes in with only 5% completed but it is still progressing. All the ones before this one finished in about 1 minute. At the present rate this one will take close to 2 hours if it completes. Should be interesting.

Task errored out at 50 minutes. Seems like these beta tasks that run beyond a minute or so end in an error.

TJ
Message 32128 - Posted: 21 Aug 2013 | 13:47:01 UTC - in response to Message 32127.
Last modified: 21 Aug 2013 | 13:49:39 UTC

Just received 1 that had failed on 2 other crunchers after about 1 minute. I'm 6 minutes in with only 5% completed but it is still progressing. All the ones before this one finished in about 1 minute. At the present rate this one will take close to 2 hours if it completes. Should be interesting.

Task errored out at 50 minutes. Seems like these beta tasks that run beyond a minute or so end in an error.

No, not all tasks; the TEST10 ones are longer. MJH said in this thread that TEST10 is 100 times longer. Indeed, they take around 2 hours on my rigs. Some errored out, some finished fine.
Your 310.90 driver seems to be the issue for the long ones. Read MJH's post further back in this thread for more information.
____________
Greetings from TJ

Carlos Augusto Engel
Message 32129 - Posted: 21 Aug 2013 | 17:14:45 UTC

After I updated to driver 326.41:


Task | Work unit | Sent | Reported | Status | Run time (s) | CPU time (s) | Credit | Application
7190119 | 4705219 | 21 Aug 2013 15:45:06 UTC | 21 Aug 2013 16:55:00 UTC | Error while computing | 3,486.92 | 2,580.57 | --- | ACEMD beta version v7.05 (cuda55)
7190113 | 4705213 | 21 Aug 2013 15:45:06 UTC | 21 Aug 2013 15:51:46 UTC | Completed and validated | 61.87 | 59.28 | 150.00 | ACEMD beta version v7.05 (cuda55)
7190104 | 4705205 | 21 Aug 2013 15:45:06 UTC | 21 Aug 2013 15:50:44 UTC | Completed and validated | 64.33 | 59.37 | 150.00 | ACEMD beta version v7.05 (cuda55)


Task with error : http://www.gpugrid.net/result.php?resultid=7190119

Carlos Augusto Engel
Joined: 5 Jun 09
Posts: 38
Credit: 2,880,758,878
RAC: 0
Level
Phe
Message 32138 - Posted: 22 Aug 2013 | 19:42:31 UTC

Another task (cuda55) with the same error: http://www.gpugrid.net/result.php?resultid=7194889

Stderr output

<core_client_version>7.2.5</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x77783219

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32142 - Posted: 22 Aug 2013 | 22:35:36 UTC - in response to Message 32138.


Maximum elapsed time exceeded


Well, that shouldn't be happening. Could you PM me your app_info.xml, please?
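
For context on that error: roughly speaking, a BOINC client aborts a task with "Maximum elapsed time exceeded" once its run time passes a limit derived from the workunit's `rsc_fpops_bound` divided by the client's FLOPS estimate for the app version (with an anonymous-platform setup, app_info.xml can supply that estimate via a `<flops>` entry). The Python sketch below is a simplification of that rule, not the client's actual code, and every number in it is made up; it only shows how an inflated speed estimate shrinks the limit:

```python
def max_elapsed_seconds(rsc_fpops_bound: float, est_flops: float) -> float:
    """Simplified BOINC-style run-time limit: FLOP bound / estimated FLOPS."""
    if est_flops <= 0:
        raise ValueError("est_flops must be positive")
    return rsc_fpops_bound / est_flops


# Hypothetical values, for illustration only
bound = 1.0e16      # workunit's rsc_fpops_bound (FLOPs)
realistic = 5.0e11  # speed the GPU actually sustains (FLOPS)
inflated = 5.0e12   # a 10x over-estimate of that speed

print(max_elapsed_seconds(bound, realistic))  # limit with a sane estimate
print(max_elapsed_seconds(bound, inflated))   # 10x shorter limit
```

With the inflated estimate, a task that genuinely needs the realistic run time would be killed at one tenth of it.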

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32143 - Posted: 23 Aug 2013 | 0:00:02 UTC

Hello MJH,
Your beta TEST WUs do great on my GTX 660 and 770, except the longer TEST10 ones; these have an error rate of around 50%.
Of these, 9 finished okay and 8 errored out. Most of the failed ones (6) had at least one wing(wo)man with an error too.

Carlos Augusto Engel
Joined: 5 Jun 09
Posts: 38
Credit: 2,880,758,878
RAC: 0
Level
Phe
Message 32145 - Posted: 23 Aug 2013 | 0:59:12 UTC - in response to Message 32142.


Maximum elapsed time exceeded


Well, that shouldn't be happening. Could you PM me your app_info.xml, please?


Interesting. I do not have one.

My system in this case is:
Intel Core i7, Windows 7 64-bit
Boinc 7.2.5
NVIDIA GeForce GTX 680 (2048MB) driver: 326.41

I also run TThrottle to protect my CPUs from overheating.



Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32147 - Posted: 23 Aug 2013 | 6:40:12 UTC - in response to Message 32142.


Maximum elapsed time exceeded


OK, now I see why it's committing hara kiri. 706 coming soon..

MJH

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32151 - Posted: 23 Aug 2013 | 12:23:23 UTC

I can find 3 types of error message; this one seems to be the most common:
<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x773C3219


Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32152 - Posted: 23 Aug 2013 | 13:04:09 UTC

707 is now live. Should be an end to the "max time elapsed" problem.

MJH

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 32153 - Posted: 23 Aug 2013 | 13:09:54 UTC

You broke something:

Name 18-GIANNI_TEST5-1-50-RND3064_0
Workunit 4711690
Created 23 Aug 2013 | 13:05:46 UTC
Sent 23 Aug 2013 | 13:05:51 UTC
Received 23 Aug 2013 | 13:08:25 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -1073741819 (0xffffffffc0000005) Unknown error number
Computer ID 154816
Report deadline 28 Aug 2013 | 13:05:51 UTC
Run time 0.00
CPU time 0.00
Validate state Invalid
Credit 0.00
Application version ACEMD beta version v7.07 (cuda55)

All tasks are like this
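
The "Unknown error number" above is a Windows NTSTATUS code shown as a signed integer: masking -1073741819 (0xffffffffc0000005) to 32 bits gives 0xC0000005, STATUS_ACCESS_VIOLATION, i.e. an invalid memory access. The 0x80000003 in the earlier stderr output is STATUS_BREAKPOINT. A small Python helper for that conversion; the helper name is hypothetical and its lookup table covers only the two codes seen in this thread:

```python
# Only the two codes that appear in this thread; extend as needed.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0x80000003: "STATUS_BREAKPOINT",
}


def decode_exit_status(status: int) -> str:
    """Render a (possibly negative or sign-extended) exit status as NTSTATUS."""
    code = status & 0xFFFFFFFF  # reinterpret as an unsigned 32-bit value
    name = NTSTATUS_NAMES.get(code, "unknown")
    return f"0x{code:08X} ({name})"


print(decode_exit_status(-1073741819))
```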

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32154 - Posted: 23 Aug 2013 | 13:45:32 UTC - in response to Message 32153.

Indeed I did. 708.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32155 - Posted: 23 Aug 2013 | 13:56:27 UTC - in response to Message 32154.

Indeed I did. 708.

I had no luck with 7.08.....

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32157 - Posted: 23 Aug 2013 | 14:28:23 UTC - in response to Message 32155.
Last modified: 23 Aug 2013 | 14:28:51 UTC

Indeed I did. 708.

I had no luck with 7.08.....

Three more in a row is more than bad luck....

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 32158 - Posted: 23 Aug 2013 | 14:37:38 UTC - in response to Message 32157.

All 7.08 WUs are failing:

Task | Work unit | Sent | Reported | Status | Run time (s) | CPU time (s) | Credit | Application
7198133 | 4711851 | 23 Aug 2013 14:30:51 UTC | 23 Aug 2013 14:33:50 UTC | Error while computing | 0.00 | 0.00 | --- | ACEMD beta version v7.08 (cuda55)
7198126 | 4711845 | 23 Aug 2013 14:27:23 UTC | 23 Aug 2013 14:30:51 UTC | Error while computing | 0.00 | 0.00 | --- | ACEMD beta version v7.08 (cuda55)
7198125 | 4711839 | 23 Aug 2013 14:27:23 UTC | 23 Aug 2013 14:30:51 UTC | Error while computing | 0.00 | 0.00 | --- | ACEMD beta version v7.08 (cuda55)
7197995 | 4711758 | 23 Aug 2013 13:49:02 UTC | 23 Aug 2013 14:01:55 UTC | Error while computing | 0.00 | 0.00 | --- | ACEMD beta version v7.08 (cuda55)
7197994 | 4711756 | 23 Aug 2013 13:49:02 UTC | 23 Aug 2013 14:01:55 UTC | Error while computing | 0.00 | 0.00 | --- | ACEMD beta version v7.08 (cuda55)
7197979 | 4711674 | 23 Aug 2013 13:49:02 UTC | 23 Aug 2013 14:01:55 UTC | Error while computing | 0.00 | 0.00 | --- | ACEMD beta version v7.08 (cuda55)
7197936 | 4711725 | 23 Aug 2013 13:49:02 UTC | 23 Aug 2013 14:01:55 UTC | Error while computing | 0.00 | 0.00 | --- | ACEMD beta version v7.08 (cuda55)
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Carlos Augusto Engel
Joined: 5 Jun 09
Posts: 38
Credit: 2,880,758,878
RAC: 0
Level
Phe
Message 32159 - Posted: 23 Aug 2013 | 14:54:43 UTC

Didn't work for me either:
http://www.gpugrid.net/result.php?resultid=7198181

Profile Zarck
Joined: 16 Aug 08
Posts: 145
Credit: 328,473,995
RAC: 0
Level
Asp
Message 32160 - Posted: 23 Aug 2013 | 16:25:09 UTC - in response to Message 32159.

All my 7.07 and 7.08 tasks errored:

http://www.gpugrid.net/results.php?userid=5128

@+
*_*

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32161 - Posted: 23 Aug 2013 | 17:57:44 UTC

709.

MJH

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32162 - Posted: 23 Aug 2013 | 18:01:38 UTC - in response to Message 32161.

709.

MJH

Still runs to an error right at the start.

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32163 - Posted: 23 Aug 2013 | 18:22:08 UTC - in response to Message 32162.
Last modified: 23 Aug 2013 | 18:24:27 UTC

710 for cuda55, with extra debug output.

MJH

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32164 - Posted: 23 Aug 2013 | 18:33:39 UTC - in response to Message 32163.
Last modified: 23 Aug 2013 | 18:37:27 UTC

I decided to grab a couple of these 7.10 debug versions.
Also, I made sure that I ran it on each of my 2 GPUs (GTX 660 Ti, GTX 460) that do GPUGrid work in my system.

They both errored, after MARK 15.

GTX 660 Ti:
http://www.gpugrid.net/result.php?resultid=7198769
GTX 460:
http://www.gpugrid.net/result.php?resultid=7198775

Note:
It would be AWESOME if you could keep the "GPU Name" listed in the stderr.txt for the tasks. Maybe consider printing it each time the task is started/restarted (since it could restart on a different GPU!) Currently, I don't think I have any way of knowing which GPU(s) worked on the task.

Thanks,
Jacob

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32165 - Posted: 23 Aug 2013 | 18:44:51 UTC - in response to Message 32164.
Last modified: 23 Aug 2013 | 18:45:05 UTC

It would be AWESOME if you could keep the "GPU Name" listed in the stderr.txt for the tasks. Maybe consider printing it each time the task is started/restarted (since it could restart on a different GPU!) Currently, I don't think I have any way of knowing which GPU(s) worked on the task.

This is a very good idea. I've requested that in the "Wish List" topic before.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32166 - Posted: 23 Aug 2013 | 18:48:17 UTC - in response to Message 32164.

I decided to grab a couple of these 7.10 debug versions.
Also, I made sure that I ran it on each of my 2 GPUs (GTX 660 Ti, GTX 460) that do GPUGrid work in my system.

They both errored, after MARK 15.

The same happened on my host.

petebe
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Message 32167 - Posted: 23 Aug 2013 | 18:57:02 UTC

Same on my Titan - stopped at mark 15.

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32168 - Posted: 23 Aug 2013 | 19:05:25 UTC
Last modified: 23 Aug 2013 | 19:07:33 UTC

Thanks guys. 711 has even more debug.

This is an awful way to tackle the problem, but I can't reproduce it in-house, and I'm damned if I can make Visual Studio emit a binary that does something useful on segfault like drop a core file.

And yes, the GPU+version debug will be staying. If you want something else there too, now is the time to ask for it!

MJH
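
The "MARK 15", "di MARK 13" and "CPi MRK 8" lines reported below look like progress breadcrumbs: the app prints a marker at each stage, and the last marker that reaches stderr shows roughly where it died. A Python sketch of that technique, with hypothetical `mark` and `last_mark` helpers (the regex only recognises the "MARK" spelling). Flushing after every marker matters, because output still sitting in a buffer can be lost when the process crashes:

```python
import re
import sys
from typing import Optional, TextIO


def mark(label: str, out: TextIO = sys.stderr) -> None:
    """Write a breadcrumb and flush it at once so it survives a crash."""
    out.write(f"MARK {label}\n")
    out.flush()


def last_mark(log_text: str) -> Optional[str]:
    """Return the label of the last breadcrumb in a captured stderr log."""
    labels = re.findall(r"MARK (\S+)", log_text)
    return labels[-1] if labels else None


if __name__ == "__main__":
    import io

    log = io.StringIO()
    mark("15", log)
    mark("15.1", log)
    print(last_mark(log.getvalue()))  # label of the last breadcrumb written
```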

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32169 - Posted: 23 Aug 2013 | 19:16:58 UTC - in response to Message 32168.

Sweet news that the GPU printout is staying - hopefully you print it each time the task is restarted too, as I have 3 total GPUs in my system. Things that might prove useful:
- The RAM that the GPU has (seems very useful to have)
- The full array of GPUs
- CPU information

For 7.11, I again got 2 tasks (1 for each of my GPUGrid GPUs).
They both:
- did print "MARK 15"
- did print "MARK 15.1"
- got as far as printing "di MARK 13" before erroring.

http://www.gpugrid.net/result.php?resultid=7198878
http://www.gpugrid.net/result.php?resultid=7198892

Thanks,
Jacob

petebe
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Message 32170 - Posted: 23 Aug 2013 | 19:29:33 UTC

Same on the Titan - di mark 13:

http://www.gpugrid.net/result.php?resultid=7198904
http://www.gpugrid.net/result.php?resultid=7198901
http://www.gpugrid.net/result.php?resultid=7198899

All have the same result, but the third one is a GIANNI test, not an MJHARVEY one?

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32171 - Posted: 23 Aug 2013 | 19:31:11 UTC - in response to Message 32169.

712. Almost at the bottom turtle now.

M

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32172 - Posted: 23 Aug 2013 | 19:37:12 UTC - in response to Message 32171.

For v7.12, for me, they got as far as:
CPi MRK 8

http://www.gpugrid.net/result.php?resultid=7198972
http://www.gpugrid.net/result.php?resultid=7198969

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32173 - Posted: 23 Aug 2013 | 19:41:47 UTC - in response to Message 32168.

And yes, the GPU+version debug will be staying. If you want something else there too, now is the time to ask for it!

It would be nice to have:
- the manufacturer (vendor) ID
- the type of the GPU
- the number of the GPU
- the memory size of the GPU
- the clock rate of the GPU
in the stderr output file.

Something like in this post.
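
A minimal Python sketch of what these requests amount to: each time the task starts or restarts, append a block of device properties to stderr, tagged with a start counter, so a task that resumes on another card records both. The "# Key : Value" layout, the function names, and the example values are assumptions for illustration; a real CUDA app would fill the properties from its own device query.

```python
import sys
from typing import Mapping, TextIO


def format_device_header(start_no: int, props: Mapping[str, str]) -> str:
    """Render one '# Key : Value' block, tagged with the start/restart count."""
    lines = [f"# Start {start_no}"]
    lines += [f"# {key} : {value}" for key, value in props.items()]
    return "\n".join(lines) + "\n"


def write_device_header(start_no: int, props: Mapping[str, str],
                        out: TextIO = sys.stderr) -> None:
    """Append the block to stderr; meant to run on every (re)start of a task."""
    out.write(format_device_header(start_no, props))
    out.flush()


# Example values only; a real app would take these from its own GPU query.
write_device_header(1, {"Name": "GeForce GTX 460", "Vendor ID": "0x10DE"})
```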

petebe
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Message 32174 - Posted: 23 Aug 2013 | 19:45:06 UTC

yup - CPI MRK 8

http://www.gpugrid.net/result.php?resultid=7198979
http://www.gpugrid.net/result.php?resultid=7198992

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32175 - Posted: 23 Aug 2013 | 19:53:56 UTC - in response to Message 32171.

713 now

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32176 - Posted: 23 Aug 2013 | 19:58:47 UTC - in response to Message 32175.
Last modified: 23 Aug 2013 | 19:59:48 UTC

7.13 appears to start (and make progress) on each of my 2 GPUs (GTX 660 Ti, GTX 460). It does not crash immediately.
So... I guess I'll let it run, and try to remember to report back when it's done.
Note: GPU Usage is around 15%-25% while processing the task, quite low.

PS: I'd also love to see the DRIVER VERSION printed in the stderr.txt. Could prove quite useful in tracking down specific versions causing problems (which has happened in the past!)

Thanks,
Jacob

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32177 - Posted: 23 Aug 2013 | 20:09:57 UTC - in response to Message 32176.

Better kill that - it will make way too much debug.
714 now.

MJH

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32178 - Posted: 23 Aug 2013 | 20:13:31 UTC - in response to Message 32177.
Last modified: 23 Aug 2013 | 20:13:41 UTC

You weren't kidding - 5 minutes created 140MB debug :)
Hope everyone that got 7.13 knows to abort them!

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32179 - Posted: 23 Aug 2013 | 20:17:09 UTC

For me, 7.14 is running/progressing at an appropriate GPU Usage, without any immediate crash.

petebe
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Message 32180 - Posted: 23 Aug 2013 | 20:18:06 UTC

One 7.13 finished ok in about 7 minutes on my Titan.
http://www.gpugrid.net/result.php?resultid=7199039

The other 7.13 crashed at sRKA 7:

http://www.gpugrid.net/result.php?resultid=7199046

Carlos Augusto Engel
Joined: 5 Jun 09
Posts: 38
Credit: 2,880,758,878
RAC: 0
Level
Phe
Message 32181 - Posted: 23 Aug 2013 | 20:28:46 UTC

One 7.13 finished ok in about 12 minutes on my GTX 680
http://www.gpugrid.net/result.php?resultid=7199074

petebe
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Message 32182 - Posted: 23 Aug 2013 | 20:28:49 UTC

Three 7.14's completed ok.
http://www.gpugrid.net/result.php?resultid=7199117
http://www.gpugrid.net/result.php?resultid=7199119
http://www.gpugrid.net/result.php?resultid=7199122

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32183 - Posted: 23 Aug 2013 | 20:33:38 UTC

My 7.14 task finished successfully too, but I don't know what GPU was used - that debug info has now disappeared :)

http://www.gpugrid.net/result.php?resultid=7199103

I would like to see, at a minimum, each time the task is started/restarted:
- GPU Name, Vendor, and Memory size
- Driver Version
... hopefully you can also include other identifying information too, as others have suggested, and as that other thread seems to show that you once actually did print/include!

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32184 - Posted: 23 Aug 2013 | 20:34:29 UTC - in response to Message 32183.

The webpage only shows the last few lines of the debug output.

Carlos Augusto Engel
Joined: 5 Jun 09
Posts: 38
Credit: 2,880,758,878
RAC: 0
Level
Phe
Message 32185 - Posted: 23 Aug 2013 | 21:06:02 UTC

One 7.14 finished ok in about 1 minute. GPU utilization ~ 72%
http://www.gpugrid.net/result.php?resultid=7199239

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32186 - Posted: 23 Aug 2013 | 21:39:15 UTC

One short 7.14 finished fine.
One long 7.14 and one 7.15 are running...

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32187 - Posted: 23 Aug 2013 | 21:47:22 UTC - in response to Message 32186.

One short 7.14 finished fine.
One long 7.14 and one 7.15 are running...

The 7.14 has 10% lower GPU utilization than the 7.15, I'm going to abort the 7.14....

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32188 - Posted: 23 Aug 2013 | 21:51:16 UTC

I've received a short 7.16 :)

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32189 - Posted: 23 Aug 2013 | 21:55:52 UTC - in response to Message 32188.

I've received a short 7.16 :)

It's finished ok, but the GPU usage is still 10% lower than the 7.15.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32190 - Posted: 23 Aug 2013 | 22:06:29 UTC - in response to Message 32189.

I've received a short 7.16 :)

It's finished ok, but the GPU usage is still 10% lower than the 7.15.

7.17 is available :) I've aborted the 7.16....

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32191 - Posted: 23 Aug 2013 | 22:10:02 UTC - in response to Message 32190.

I've received a short 7.16 :)

It's finished ok, but the GPU usage is still 10% lower than the 7.15.

7.17 is available :) I've aborted the 7.16....

The GPU utilization of the 7.17 seems to be normal (95%)

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32192 - Posted: 23 Aug 2013 | 22:31:46 UTC - in response to Message 32191.
Last modified: 23 Aug 2013 | 22:34:32 UTC

On my eVGA GTX 660 Ti FTW (with the CPU fully loaded by CPU tasks), the 7.17 MJHARVEY_TEST10 task doesn't even reach 80% GPU utilization. Also, the drivers don't think the GPU is under enough load to ramp up the clock via GPU Boost. So, at 79% GPU utilization and 74% power utilization, the GPU stays at my normal 3D clock of 1045 MHz, instead of the 3D boost clock of 1241 MHz that I expect (and get from other GPUGrid tasks).

It just seems that the task is not making the GPU work hard enough to achieve that GPU Boost clock.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32193 - Posted: 23 Aug 2013 | 22:38:20 UTC - in response to Message 32192.

On my eVGA GTX 660 Ti FTW (with the CPU fully loaded by CPU tasks), the 7.17 MJHARVEY_TEST10 task doesn't even reach 80% GPU utilization. Also, the drivers don't think the GPU is under enough load to ramp up the clock via GPU Boost. So, at 79% GPU utilization and 74% power utilization, the GPU stays at my normal 3D clock of 1045 MHz, instead of the 3D boost clock of 1241 MHz that I expect (and get from other GPUGrid tasks).

It just seems that the task is not making the GPU work hard enough to achieve that GPU Boost clock.

You should leave one CPU thread free per GPU (so on your i7-965X with 3 GPUs you should set the multiprocessor setting to 63% CPU).
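
The 63% figure follows from the i7-965X having 8 hardware threads: keeping one thread free per GPU leaves 8 - 3 = 5 threads, and 5/8 = 62.5%, rounded up. A Python sketch of that arithmetic; the function name is hypothetical, and it assumes the client turns the percentage into a thread count by truncating threads * pct / 100, which is why the result must be rounded up rather than down:

```python
def cpu_percent_setting(threads: int, gpus: int) -> int:
    """Smallest whole-number 'use at most N% of the CPUs' value that still
    leaves threads - gpus threads for CPU work (one free thread per GPU).

    Assumes usable threads = floor(threads * pct / 100).
    """
    target = threads - gpus
    if target < 1:
        raise ValueError("need at least one thread left for CPU tasks")
    # Integer ceiling of target * 100 / threads
    return -(-target * 100 // threads)


pct = cpu_percent_setting(threads=8, gpus=3)
print(pct, (8 * pct) // 100)  # the setting, and the threads left for CPU tasks
```

At 62% the truncated count would drop to 4 threads, one more than needed.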

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32194 - Posted: 23 Aug 2013 | 22:52:43 UTC - in response to Message 32193.

Nope. That's not how I roll.

Anyway, as I said, it doesn't trigger GPU Boost speeds, whereas all of the normal GPUGrid tasks do.

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 32195 - Posted: 23 Aug 2013 | 23:14:30 UTC

No problem with GPU boost functioning correctly for me.

At any rate, my 780s never get past the 50 sec mark for the longer beta tasks. They always end abruptly.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32196 - Posted: 23 Aug 2013 | 23:37:50 UTC - in response to Message 32195.

No problem with GPU boost functioning correctly for me.

At any rate, my 780s never get past the 50 sec mark for the longer beta tasks. They always end abruptly.

Give the 7.19 a try.

One of my long 7.17 tasks has finished; 3 more are nearing completion.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32197 - Posted: 24 Aug 2013 | 0:03:09 UTC - in response to Message 32196.

No problem with GPU boost functioning correctly for me.

At any rate, my 780s never get past the 50 sec mark for the longer beta tasks. They always end abruptly.

Give the 7.19 a try.

One of my long 7.17 tasks has finished; 3 more are nearing completion.

There goes the 7.20 :)
It seems to be running fine on my hosts.
But I'm going to sleep in 10 minutes, and I'm going to resume the non-beta tasks I've suspended to give the beta units priority.

petebe
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Message 32198 - Posted: 24 Aug 2013 | 0:15:47 UTC

A 7.20 has been running ok so far on my Titan - 25 minutes in. Looks like it wants to run about 90 minutes.

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 32199 - Posted: 24 Aug 2013 | 0:15:52 UTC

MJH, you must be getting sleepy:

For a 50s task I got 15,000.00 points lol.

Good news: so far I have a TEST10 that has passed both the 50 s and 120 s marks, which were common crash points.

Will keep an eye on it.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32200 - Posted: 24 Aug 2013 | 0:27:18 UTC - in response to Message 32198.

A 7.20 has been running ok so far on my Titan - 25 minutes in. Looks like it wants to run about 90 minutes.

That's good news indeed!
How much is the GPU usage? (on Titan and on GTX780)

petebe
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Message 32201 - Posted: 24 Aug 2013 | 0:37:01 UTC

Titan's GPU Load is 72%.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32202 - Posted: 24 Aug 2013 | 0:47:09 UTC - in response to Message 32201.

Titan's GPU Load is 72%.

This should be at least 20% higher with real workunits.

petebe
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Message 32203 - Posted: 24 Aug 2013 | 0:49:07 UTC

Agreed, unless it turns out we can run multiple instances?

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 32204 - Posted: 24 Aug 2013 | 1:49:46 UTC

80% on Windows 7.

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 32205 - Posted: 24 Aug 2013 | 2:07:02 UTC

Completed and validated my first MJH TEST10

http://www.gpugrid.net/result.php?resultid=7199958

# Time per step (avg over 2500000 steps): 1.724 ms
# Approximate elapsed time for entire WU: 4309.969 s
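
The two timing lines agree with each other: 2,500,000 steps × 1.724 ms per step ≈ 4,310 s, against the reported 4,309.969 s. The small gap is expected, because the per-step time is printed to only three decimals. A Python sketch (hypothetical helper, regexes written against the line formats shown in this thread) that parses both lines from a stderr log and checks them against each other within that rounding error:

```python
import re

STEP_RE = re.compile(r"Time per step \(avg over (\d+) steps\): ([\d.]+) ms")
TOTAL_RE = re.compile(r"Approximate elapsed time for entire WU: ([\d.]+) s")


def check_timing(stderr_text: str) -> tuple[float, float, bool]:
    """Return (steps * ms_per_step in seconds, reported seconds, consistent?)."""
    step_match = STEP_RE.search(stderr_text)
    total_match = TOTAL_RE.search(stderr_text)
    if step_match is None or total_match is None:
        raise ValueError("timing lines not found")
    steps = int(step_match.group(1))
    ms_per_step = float(step_match.group(2))
    reported = float(total_match.group(1))
    estimate = steps * ms_per_step / 1000.0
    # ms_per_step is printed to 3 decimals, so it may be off by up to 0.0005 ms
    tolerance = steps * 0.0005 / 1000.0
    return estimate, reported, abs(estimate - reported) <= tolerance


log = """# Time per step (avg over 2500000 steps): 1.724 ms
# Approximate elapsed time for entire WU: 4309.969 s"""
print(check_timing(log))
```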

Profile bundaboy
Joined: 20 Nov 10
Posts: 6
Credit: 1,046,334,951
RAC: 0
Level
Met
Message 32206 - Posted: 24 Aug 2013 | 10:10:40 UTC

GTX460SE, WinXP, 314.11, Boinc 6.10.58
BETA 7.18 cuda42 *TEST11* - GPU Utilization 50-60% :(
But last 4 minutes 99% :)


Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32207 - Posted: 24 Aug 2013 | 10:35:07 UTC - in response to Message 32206.

Are you saying it failed?
When your computers are hidden, even admins can't see them..

MJH

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 32208 - Posted: 24 Aug 2013 | 10:59:00 UTC - in response to Message 32207.

I’m seeing 85% GPU usage on a GTX660 and 80% usage on a GTX660Ti. ACEMD beta version 7.20 (cuda55).

The output file is very useful. Thanks for this, it will make basic troubleshooting much easier.

    Stderr output

    <core_client_version>7.0.64</core_client_version>
    <![CDATA[
    <stderr_txt>
    # GPU [GeForce GTX 660] Platform [Windows] Rev [3170M] VERSION [55]
    # SWAN Device 1 :
    # Name : GeForce GTX 660
    # ECC : Disabled
    # Global mem : 2048MB
    # Capability : 3.0
    # PCI ID : 0000:02:00.0
    # Device clock : 1032MHz
    # Memory clock : 3004MHz
    # Memory width : 192bit
    # Driver version : r325_00
    # Time per step (avg over 250000 steps): 3.487 ms
    # Approximate elapsed time for entire WU: 871.837 s
    called boinc_finish

    </stderr_txt>
    ]]>
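
Because the new header uses a regular "# Key : Value" layout, it is easy to pull out programmatically, for example to tabulate which card and driver each task ran on. A minimal Python sketch; the helper name is hypothetical and the regex is an assumption based only on the sample above. The free-form "# GPU [...] Platform [...]" summary line is deliberately skipped, and "PCI ID" keeps its colons because only the first colon splits key from value:

```python
import re
from typing import Dict

# Keys may not contain ':' or '[', so the "# GPU [...]" summary line is skipped.
HEADER_LINE = re.compile(r"^#\s*([^:\[]+?)\s*:\s*(.*?)\s*$")


def parse_device_header(stderr_text: str) -> Dict[str, str]:
    """Collect '# Key : Value' pairs from an ACEMD stderr log (sketch)."""
    fields: Dict[str, str] = {}
    for line in stderr_text.splitlines():
        match = HEADER_LINE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


sample = """\
# GPU [GeForce GTX 660] Platform [Windows] Rev [3170M] VERSION [55]
# SWAN Device 1 :
# Name : GeForce GTX 660
# Global mem : 2048MB
# PCI ID : 0000:02:00.0
# Driver version : r325_00
"""

info = parse_device_header(sample)
print(info["Name"], info["Global mem"], info["Driver version"])
```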


There is little point in running beta work and hiding your systems!

Profile bundaboy
Joined: 20 Nov 10
Posts: 6
Credit: 1,046,334,951
RAC: 0
Level
Met
Message 32209 - Posted: 24 Aug 2013 | 11:00:49 UTC - in response to Message 32207.

Are you saying it failed?

Nope, completed and validated - just the GPU utilization was low.


Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32210 - Posted: 24 Aug 2013 | 11:13:01 UTC

Morning all.

Version 800 is the final beta, hopefully.

I'll put some real work units on it now.

MJH

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32211 - Posted: 24 Aug 2013 | 11:36:13 UTC - in response to Message 32210.

The test WUs I care about are now: NATHAN_s1p_test_titans1

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32216 - Posted: 24 Aug 2013 | 13:20:45 UTC - in response to Message 32211.

The test WUs I care about are now: NATHAN_s1p_test_titans1

I've received a couple of NATHAN_s1p_test_titans2 workunits.
They have by far the highest GPU usage on GTX670 and GTX680 (97-99%).

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 32219 - Posted: 24 Aug 2013 | 14:34:34 UTC

My Titan WUs are almost complete, currently at 89% GPU usage, which is where they should be on Windows 7. Total time to complete will be about 1:10.

Profile nate
Joined: 6 Jun 11
Posts: 124
Credit: 2,928,865
RAC: 0
Level
Ala
Message 32220 - Posted: 24 Aug 2013 | 16:00:04 UTC - in response to Message 32211.

The test WUs I care about are now: NATHAN_s1p_test_titans1


As some of you have noted, after the server/database error, they are now NATHAN_s1p_test_titans2

Profile Zarck
Joined: 16 Aug 08
Posts: 145
Credit: 328,473,995
RAC: 0
Level
Asp
Message 32221 - Posted: 24 Aug 2013 | 16:03:44 UTC - in response to Message 32220.
Last modified: 24 Aug 2013 | 16:04:13 UTC

The test WUs I care about are now: NATHAN_s1p_test_titans1


As some of you have noted, after the server/database error, they are now NATHAN_s1p_test_titans2


Yes

http://www.gpugrid.net/result.php?resultid=7205032

@+
*_*

Credit 60,900.00

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32222 - Posted: 24 Aug 2013 | 16:12:31 UTC - in response to Message 32221.

I have one finished too, in 2h43m. A new one is running at 94% GPU load and BOINC estimates 46h20m? It is 5% done after 8 minutes, so BOINC needs arithmetic lessons :)
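If progress is roughly linear, the total run time can be extrapolated from the fraction done. That is an assumption: some WUs may speed up or slow down partway through. Here is a small Python sketch with a hypothetical helper name:

```python
def linear_eta(elapsed_min: float, fraction_done: float) -> tuple:
    """Return (total, remaining) minutes, assuming a constant progress rate."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed_min / fraction_done
    return total, total - elapsed_min


total, remaining = linear_eta(8, 0.05)
print(f"total ~{total:.0f} min, remaining ~{remaining:.0f} min")
```

With 5% done after 8 minutes, this works out to about 160 minutes (roughly 2h40m) in total. That is in line with the 2h43m WU that already finished, not the 46h20m BOINC shows.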
____________
Greetings from TJ

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32224 - Posted: 24 Aug 2013 | 16:34:16 UTC - in response to Message 32221.

The test WUs I care about are now: NATHAN_s1p_test_titans1


As some of you have noted, after the server/database error, they are now NATHAN_s1p_test_titans2


Yes

http://www.gpugrid.net/result.php?resultid=7205032

@+
*_*

Credit 60,900.00

You should fine tune your system (for example: leave a core free for GPUGrid), because 5pot's GTX780 finished a similar workunit in 4200 seconds, while it took 5000 seconds on your Titan.

Profile (retired account)
Joined: 22 Dec 11
Posts: 38
Credit: 28,606,255
RAC: 0
Level
Val
Message 32225 - Posted: 24 Aug 2013 | 16:37:56 UTC

Finished two OK in about 1 hr 24 min:

I5R14-NATHAN_s1p_test_titans2-1-50-RND4698_1
I5R4-NATHAN_s1p_test_titans2-0-50-RND1283_0

GTX Titan, nVidia driver 326.41, BOINC 7.2.11, Win 7 SP1 64bit

Profile Zarck
Joined: 16 Aug 08
Posts: 145
Credit: 328,473,995
RAC: 0
Level
Asp
Message 32226 - Posted: 24 Aug 2013 | 16:54:09 UTC - in response to Message 32224.

You should fine tune your system (for example: leave a core free for GPUGrid), because 5pot's GTX780 finished a similar workunit in 4200 seconds, while it took 5000 seconds on your Titan.


I changed the speed of my Titan by enabling and then disabling the double precision mode while the WU was being calculated; this probably explains the slowdown.

@+
*_*

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 32238 - Posted: 24 Aug 2013 | 21:39:25 UTC - in response to Message 32226.

I changed the speed of my Titan by enabling and disabling the double precision mode

Didn't read the entire thread, so just in case it hasn't been mentioned before: enabling double precision mode disables turbo completely and hence locks the clock speed at the base level, regardless of whether double precision is actually being used. This is a really crude solution from nVidia; they should have just let turbo cap the power as usual (from my point of view... but I don't design these cards).

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32243 - Posted: 24 Aug 2013 | 21:57:55 UTC - in response to Message 32211.

The test WUs I care about are now: NATHAN_s1p_test_titans1

Should those who don't have a Titan or GTX780 opt out of beta units?

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32244 - Posted: 24 Aug 2013 | 22:07:46 UTC - in response to Message 32243.
Last modified: 24 Aug 2013 | 22:15:34 UTC

I have completed 3 of the I4R9-NATHAN_s1p_test_titans2 tasks successfully.
They ran at decent GPU utilization, and did provide some good information in stderr.txt.
However....

1) I noticed that my GTX 460's clock seems to be erroneously doubled. Take a look at the GTX 460 result below. Device clock should have been 763 MHz, not 1526 MHz. Both GPU-Z and Precision-X show 763 MHz. Is your detection algorithm bugged somehow?

2) Also, for showing the driver version, although knowing the "branch" (r325_00) is handy, it'd be much better to show the actual driver version (326.80 in my case). Could you please make a change to include that?

GTX 660 Ti result (restarted 2 times):
http://www.gpugrid.net/result.php?resultid=7202451
GTX 660 Ti result (not restarted):
http://www.gpugrid.net/result.php?resultid=7205677
GTX 460 result (restarted 2 times):
http://www.gpugrid.net/result.php?resultid=7202415



The test WUs I care about are now: NATHAN_s1p_test_titans1

Should those who don't have a Titan or GTX780 opt out of beta units?

I think we can stay in the beta. The test, as far as I know, is if the application will work for any of the supported GPUs, including the Titan and GTX780.

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32245 - Posted: 24 Aug 2013 | 22:24:03 UTC - in response to Message 32244.


1) I noticed that it seems that my GTX 460's clock is erroneously doubled --


No, it's correct. It is reporting the SM clock, which on pre-Kepler cards is double the main clock. cf http://www.nvidia.es/object/product-geforce-gtx-460-es.html


2) Also, for showing the driver version, although knowing the "branch" (r325_00) is handy, it'd be much better to show the actual driver version (326.80 in my case). Could you please make a change to include that?


Yeah, that didn't look right. I'll see if it's possible.

MJH

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32246 - Posted: 24 Aug 2013 | 22:26:14 UTC - in response to Message 32244.
Last modified: 24 Aug 2013 | 22:27:54 UTC

1) I noticed that my GTX 460's clock seems to be erroneously doubled. Take a look at the GTX 460 result below. Device clock should have been 763 MHz, not 1526 MHz. Both GPU-Z and Precision-X show 763 MHz. Is your detection algorithm bugged somehow?

There are two clock rates in Fermi-based cards (besides the memory clock rate): the core clock and the shader clock. You are talking about the core clock, while the debug info shows the shader clock. In Fermi-based cards the shader (CUDA core) frequency is fixed by hardware at double the core clock frequency.
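To compare the stderr "Device clock" with the core clock that GPU-Z or Precision-X shows, the 2:1 ratio has to be undone on Fermi. Below is a minimal Python sketch. It assumes that compute capability 2.x means Fermi (2:1 shader clock) and 3.x or later means a single clock domain. Older cc 1.x cards have a separate shader clock without a fixed 2:1 ratio, so the hypothetical helper returns None for them.

```python
from typing import Optional


def core_clock_mhz(reported_mhz: float, cc_major: int) -> Optional[float]:
    """Convert the reported SM/shader clock to the core clock GPU-Z shows."""
    if cc_major == 2:      # Fermi: shaders run at twice the core clock
        return reported_mhz / 2
    if cc_major >= 3:      # Kepler and later: one clock for core and shaders
        return reported_mhz
    return None            # cc 1.x: shader/core ratio is not fixed


print(core_clock_mhz(1526, 2))  # GTX 460 value from the result above
print(core_clock_mhz(1032, 3))  # GTX 660
```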

2) Also, for showing the driver version, although knowing the "branch" (r325_00) is handy, it'd be much better to show the actual driver version (326.80 in my case). Could you please make a change to include that?

+1

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32247 - Posted: 24 Aug 2013 | 22:36:13 UTC - in response to Message 32243.


Should those who don't have a Titan or GTX780 opt out of beta units?


No, please stay in. It's important that it is tested as widely as possible before I push it out to the production queues.

MJH

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32248 - Posted: 24 Aug 2013 | 22:44:41 UTC - in response to Message 32245.
Last modified: 24 Aug 2013 | 22:47:30 UTC


1) I noticed that it seems that my GTX 460's clock is erroneously doubled --


No, it's correct. It is reporting the SM clock, which on pre-Kepler cards is double the main clock. cf http://www.nvidia.es/object/product-geforce-gtx-460-es.html

Then could you please consider having it say something better than "Device clock"? I currently think "GPU Core clock" when you say "Device clock".
Maybe "Processor Clock (MHz)", per the English specs found here:
http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-460/specifications



2) Also, for showing the driver version, although knowing the "branch" (r325_00) is handy, it'd be much better to show the actual driver version (326.80 in my case). Could you please make a change to include that?


Yeah, that didn't look right. I'll see if it's possible.

MJH

Thank you -- Knowing the real driver version would be tremendously useful I think!

Profile Zarck
Joined: 16 Aug 08
Posts: 145
Credit: 328,473,995
RAC: 0
Level
Asp
Message 32249 - Posted: 24 Aug 2013 | 23:15:51 UTC - in response to Message 32248.

"You should fine tune your system (for example: leave a core free for GPUGrid), because 5pot's GTX780 finished a similar workunit in 4200 seconds, while it took 5000 seconds on your Titan."

His GeForce 780 is overclocked. I tested overclocking on my GeForce 570, which is now dead, and I did not want my Titan to suffer the same fate.

@+
*_*

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 32250 - Posted: 24 Aug 2013 | 23:35:14 UTC

That's its default boost clock. Different cards have different boost clocks depending on the quality of the chip.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32251 - Posted: 25 Aug 2013 | 0:03:36 UTC - in response to Message 32249.
Last modified: 25 Aug 2013 | 0:05:49 UTC

"You should fine tune your system (for example: leave a core free for GPUGrid), because 5pot's GTX780 finished a similar workunit in 4200 seconds, while it took 5000 seconds on your Titan."


His GeForce 780 is overclocked. I tested overclocking on my GeForce 570, which is now dead, and I did not want my Titan to suffer the same fate.

Thanks to the detailed debug info, I've noticed the overclocking of the GTX780 too; still, your running times should be shorter (even shorter than that overclocked GTX780's).
I'll explain why:
The GTX 780's clock rate is higher than your Titan's by 16.457% (1019/875), but your Titan has 16.666% (2688/2304) more shaders (CUDA cores) than a GTX780.
In other words:
If we multiply the number of shaders by their frequency, the result is a theoretical performance index:
Your standard GTX Titan: 875 MHz * 2688 = 2352000
The overclocked GTX780: 1019 MHz * 2304 = 2347776
As you can see, in theory your GTX Titan should be a little (by 0.18%) faster than the overclocked GTX780.

Possibly the GPUGrid application can't feed that many CUDA cores. The question is why the application can't do that.
Maybe your CPU is too busy to do that,
or this is software-related.
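The performance index above is easy to reproduce. Here is a small Python sketch; the helper name is hypothetical. The relative gap between the two indices comes out at about 0.18%, so on paper the two cards are practically tied. The proxy ignores memory bandwidth, boost behaviour and CPU/PCIe overheads, which is exactly what the rest of this discussion is about.

```python
def perf_index(clock_mhz: float, cuda_cores: int) -> float:
    """Crude throughput proxy: shader count times clock (ignores memory, PCIe, CPU)."""
    return clock_mhz * cuda_cores


titan = perf_index(875, 2688)
gtx780_oc = perf_index(1019, 2304)
gap_pct = (titan / gtx780_oc - 1) * 100

print(titan, gtx780_oc, f"{gap_pct:.2f}%")
```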

To 5pot:
Please share your secret with us:
How could your overclocked GTX780 be faster than a factory clocked GTX Titan?
What is the exact type of your GTX780? Are they water cooled?

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 32253 - Posted: 25 Aug 2013 | 0:10:10 UTC

No secret. I do a lot of research before I spend this type of $$$. The best GPU I've found (and apparently others agree) is the EVGA 780 ACX.

http://www.newegg.com/Product/Product.aspx?Item=N82E16814130918

Stock boost on this card is 1020MHz; however, I apparently managed to get some pretty nicely binned ones. My default boost is 1100MHz on one, while the other is 1123MHz.

When running these tasks, at 92% GPU usage, I get a steady 70C with 75% fan speed, which is beautiful. These cards also have ball bearing fans, which should increase their lifetime. This was, believe it or not, a factor in purchasing these cards. Quiet too.

zombie67 [MM]
Joined: 16 Jul 07
Posts: 209
Credit: 4,095,161,456
RAC: 18,324,915
Level
Arg
Message 32255 - Posted: 25 Aug 2013 | 2:38:25 UTC

FWIW, my TITAN does them in 4300-4600 seconds with no threads reserved. I will try reserving a thread to see if there is any difference. This is not in the DP-enhanced mode, so it will automatically OC up as much as temps allow.
____________
Reno, NV
Team: SETI.USA

Profile Zarck
Joined: 16 Aug 08
Posts: 145
Credit: 328,473,995
RAC: 0
Level
Asp
Message 32257 - Posted: 25 Aug 2013 | 8:06:52 UTC - in response to Message 32255.
Last modified: 25 Aug 2013 | 8:14:36 UTC

I checked my Titan: the fan was full of dust and it was running at 836 MHz. Once cleaned, the card now runs at 928 MHz, so the performance should be better.

@+
*_*



5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 32264 - Posted: 25 Aug 2013 | 13:31:38 UTC

Should make quite a large difference. Please post a new update with latest times.

Lazydude
Joined: 25 Sep 08
Posts: 12
Credit: 161,238,437
RAC: 0
Level
Ile
Message 32265 - Posted: 25 Aug 2013 | 13:55:00 UTC

Gainward GTX 780 Phantom (GLH)
Core clock 1189 MHz (overclocked on top of the factory OC)

Similar times to 5pot

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32267 - Posted: 25 Aug 2013 | 15:00:24 UTC

In the last 24 hours my GTX660 got 10 ACEMD v8.00 WUs and they all finished fine.
Then it got a Nathan LR with v6.18 (cuda42) and that errored out again. Here is the output:
Stderr output

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1 (0xffffffff)
</message>
<stderr_txt>
MDIO: cannot open file "output.restart.coor"

</stderr_txt>
]]>


5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 32269 - Posted: 25 Aug 2013 | 15:44:17 UTC

That's not a beta app

zombie67 [MM]
Joined: 16 Jul 07
Posts: 209
Credit: 4,095,161,456
RAC: 18,324,915
Level
Arg
Message 32271 - Posted: 25 Aug 2013 | 16:41:14 UTC - in response to Message 32255.

FWIW, my TITAN does them in 4300-4600 seconds with no threads reserved. I will try reserving a thread to see if there is any difference. This is not in the DP-enhanced mode, so it will automatically OC up as much as temps allow.

After more testing, I see that my TITAN takes 4520 seconds with or without a reserved thread. No difference at all.

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32272 - Posted: 25 Aug 2013 | 18:31:38 UTC - in response to Message 32271.

That is good news. I have done the same test with my 660 and saw the GPU load fluctuating from 88% down to 1-3%, with a longer run time.
But the 660 is in no way a Titan.

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32273 - Posted: 25 Aug 2013 | 18:35:15 UTC - in response to Message 32269.

Correct!
But I hope that MJH's 8.00 version will soon be used for the other WUs too, so that my error rate on the 660 drops significantly.
That's why I posted it here, and I hope MJH will read it so he sees the errors from the other WUs (not his test ones).

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32274 - Posted: 25 Aug 2013 | 18:40:55 UTC - in response to Message 32271.
Last modified: 25 Aug 2013 | 19:23:20 UTC

FWIW, my TITAN does them in 4300-4600 seconds with no threads reserved. I will try reserving a thread to see if there is any difference. This is not in the DP-enhanced mode, so it will automatically OC up as much as temps allow.

After more testing, I see that my TITAN takes 4520 seconds with or without a reserved thread. No difference at all.

That's good news.
Zarck's Titan still needs 4700 secs. Then maybe the AMD architecture is to blame for that. The AMD FX CPU doesn't have an integrated PCIe controller; it uses a HyperTransport link to the North Bridge. The AMD 990FX NB has "only" 2x PCIe 2.0 x16 support, while the Intel i7-3770 and 4770 have (only one) integrated PCIe 3.0 x16. PCIe 2.0 x16 is quite enough for the GK104 (up to the GTX 680 and 770); however, it could be hindering the performance of the GK110-based cards (GTX780 and Titan), because they have 50% and 75% (respectively) more CUDA cores than a GK104-based card.
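For scale, the raw one-direction bandwidth of the two link types can be worked out from the transfer rate and line encoding. PCIe 2.0 runs at 5 GT/s with 8b/10b encoding; PCIe 3.0 runs at 8 GT/s with 128b/130b. Below is a minimal Python sketch that also checks the 50%/75% core counts. It ignores packet and protocol overhead, and it does not show whether GPUGRID is actually bandwidth-limited; that would need measuring.

```python
def pcie_gbytes_per_s(gt_per_s: float, payload_bits: int, line_bits: int, lanes: int = 16) -> float:
    """One-direction link bandwidth in GB/s after line encoding, before protocol overhead."""
    return gt_per_s * payload_bits / line_bits * lanes / 8


pcie2_x16 = pcie_gbytes_per_s(5.0, 8, 10)      # 8b/10b encoding
pcie3_x16 = pcie_gbytes_per_s(8.0, 128, 130)   # 128b/130b encoding

GK104_CORES = 1536  # GTX 680 / 770
for name, cores in (("GTX 780", 2304), ("GTX Titan", 2688)):
    print(f"{name}: {cores / GK104_CORES - 1:.0%} more CUDA cores than GK104")
print(f"PCIe 2.0 x16 ~{pcie2_x16:.2f} GB/s, PCIe 3.0 x16 ~{pcie3_x16:.2f} GB/s")
```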

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 32276 - Posted: 25 Aug 2013 | 21:47:39 UTC - in response to Message 32274.

Can we continue the discussion about CPU Integrated controller vs AMD chipset/CPU in the CPU Comparisons - general open discussion thread?
I think it's worth exploring in its own right and might detract from this beta application thread. Thanks,

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 32278 - Posted: 25 Aug 2013 | 22:33:22 UTC - in response to Message 32276.

I've had 4 NATHAN_s1p_test_titans2 errors on a GTX660Ti and 2 successes. Errors are of the form,

Name I2R11-NATHAN_s1p_test_titans2-10-50-RND9096_0
Workunit 4720185
Created 25 Aug 2013 | 10:30:21 UTC
Sent 25 Aug 2013 | 10:33:41 UTC
Received 25 Aug 2013 | 12:01:26 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -97 (0xffffffffffffff9f) Unknown error number
Computer ID 139265
Report deadline 30 Aug 2013 | 10:33:41 UTC
Run time 4,874.18
CPU time 4,809.04
Validate state Invalid
Credit 0.00
Application version ACEMD beta version v8.00 (cuda55)
Stderr output

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -97 (0xffffff9f)
</message>
<stderr_txt>
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3170M] VERSION [55]
# SWAN Device 0 :
# Name : GeForce GTX 660 Ti
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:05:00.0
# Device clock : 1110MHz
# Memory clock : 3004MHz
# Memory width : 192bit
# Driver version : r325_00
# Simulation has crashed.

</stderr_txt>
]]>


Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32290 - Posted: 26 Aug 2013 | 13:55:35 UTC

The Beta app is now live as ACEMD-Short. Version 800

MJH

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32291 - Posted: 26 Aug 2013 | 13:56:59 UTC - in response to Message 32290.
Last modified: 26 Aug 2013 | 13:57:15 UTC

Sweet! Did you get the "Driver version" thing figured out so that it'll show 326.80?

HA-SOFT, s.r.o.
Joined: 3 Oct 11
Posts: 100
Credit: 5,879,292,399
RAC: 0
Level
Tyr
Message 32292 - Posted: 26 Aug 2013 | 14:01:53 UTC - in response to Message 32290.

The Beta app is now live as ACEMD-Short. Version 800

MJH


Super. My Titan is crunching its first Nathan WUs.

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32293 - Posted: 26 Aug 2013 | 14:12:24 UTC - in response to Message 32291.


Did you get the "Driver version" thing figured out so that it'll show 326.80?


Not yet. That'll follow later.

MJH

Profile dskagcommunity
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Message 32294 - Posted: 26 Aug 2013 | 15:48:07 UTC

Wow, even on Fermi the new app is ~120 secs faster ^^

So it seems good on this front.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 32295 - Posted: 26 Aug 2013 | 16:50:21 UTC

Shouldn't there be a tick box for CUDA 5.5 under short runs, or am I missing something?

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32301 - Posted: 26 Aug 2013 | 18:21:13 UTC - in response to Message 32295.

It's selected automatically, based on client driver version.
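Conceptually, this is a threshold check on the driver version the client reports. The sketch below is hypothetical. The cutoff value is a placeholder, not the project's actual setting, and the real selection presumably happens in the BOINC server's scheduler rather than in code like this. It mainly shows why versions should be compared numerically rather than as strings.

```python
# Placeholder cutoff for illustration only; NOT the project's real setting.
HYPOTHETICAL_MIN_DRIVER_CUDA55 = (319, 0)


def parse_driver(version: str) -> tuple:
    """'326.80' -> (326, 80), so versions compare numerically, not as strings."""
    major, _, minor = version.partition(".")
    return int(major), int(minor or 0)


def pick_build(driver_version: str, cutoff: tuple = HYPOTHETICAL_MIN_DRIVER_CUDA55) -> str:
    """Choose the CUDA 5.5 build if the driver meets the (hypothetical) cutoff."""
    return "cuda55" if parse_driver(driver_version) >= cutoff else "cuda42"


print(pick_build("326.80"))
print(pick_build("314.22"))
```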

MJH

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 32303 - Posted: 26 Aug 2013 | 18:55:57 UTC

Looking like 90 min for the short runs.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32305 - Posted: 26 Aug 2013 | 19:31:17 UTC

I think that renaming threads is not nice.
Anyway, it's good news that you put the new app to the short queue.
When do you plan to put it in the long queue too?

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32306 - Posted: 26 Aug 2013 | 19:55:46 UTC - in response to Message 32305.


When do you plan to put it in the long queue too?


In a week or so.

MJH

Profile dskagcommunity
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Message 32310 - Posted: 26 Aug 2013 | 23:30:43 UTC

OK, it's good for cc1.3 cards too. I tried my GTX 285, which I had normally retired from GPUGrid, but as a test it's good enough ^^ 26,241.59 secs = 7.3 h



Profile Zarck
Joined: 16 Aug 08
Posts: 145
Credit: 328,473,995
RAC: 0
Level
Asp
Message 32466 - Posted: 29 Aug 2013 | 15:07:42 UTC - in response to Message 32310.
Last modified: 29 Aug 2013 | 15:08:05 UTC

The load on the graphics card stayed at 0% and progress stayed at 0%, so I aborted the unit after 45 minutes.

http://www.gpugrid.net/result.php?resultid=7221739

@+
*_*

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32469 - Posted: 29 Aug 2013 | 15:17:36 UTC - in response to Message 32466.

You need app 8.01, and then the NOELIA WUs run as smoothly as ever.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32483 - Posted: 29 Aug 2013 | 17:40:03 UTC - in response to Message 32469.

8.02 makes NOELIA tasks run even smootherer

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,217,465,968
RAC: 1,257,790
Level
Trp
Message 32559 - Posted: 31 Aug 2013 | 0:00:54 UTC

There is an 8.04 app in the Beta queue.
I've received it alongside some TEST14 workunits.

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32681 - Posted: 4 Sep 2013 | 10:39:58 UTC

There is a new acemdshort application, version 8.11 (Windows only). Since this is (hopefully!) the last app revision, now's a good time to summarise the changes in the 800 series over the older app:

* SM3.5 support for Titan, Geforce 780, etc.
* CUDA 4.2 and CUDA 5.5 builds, automatically assigned based on client driver version. This represents the first step in deprecating CUDA 4.2 and moving exclusively to CUDA 5.5.
* Improved stability: fixed several bugs that caused significant rates of compute errors.
* Reduced driver crashes: reduced incidence of driver hangs on suspend. The problem is not yet totally eliminated.
* Improved reporting: GPU stats and temperatures are now reported in the stderr. Error codes cleaned up to give better data on failure modes.
* Application bug-fixes: many fixes and enhancements for the science.

MJH

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32700 - Posted: 4 Sep 2013 | 15:29:41 UTC - in response to Message 32681.
Last modified: 5 Sep 2013 | 16:18:31 UTC

Here is a list of compute errors codes for the 8xx series applications and their meanings. If you encounter a new one, or have a question or observation about the circumstances of an error, please PM me.

* 255 See -1

* 247 See -9

* 212 See -44

* 197 The WU took much longer to complete than the client was expecting, so it was terminated. Indicates a WU misconfiguration. If recurrent, try re-attaching to the project.

* 194 Unknown. ("finish file present too long")

* 193 Unknown. (Segfault on Linux)

* 159 See -97

* 98 See -9

* -1 Unknown

* -9 The GPU compute capability is not supported by the application; for example a pre cc 1.3 G80.

* -44 The computer's date is wrong.

* -80 Failed to recover after an access violation (Win32)

* -97 "Simulation has become unstable". This indicates that the scientific simulation that the application performs has gone wrong. If this happens as soon as the WU starts, there may be a problem with the WU. If it happens frequently or after the WU has made some progress, a hardware problem is strongly indicated. Check GPU temperatures (now reported in stderr) and for the presence of other GPU-using programs (eg games).

* -185 Application doesn't start, "Access denied" error in the stderr. Check that the client has downloaded the application correctly - if unsure re-attach to the project. Could also be caused by antivirus preventing BOINC starting new processes.

* -226 "Too many exits". The app repeatedly exited without indicating to BOINC that the WU was complete and was restarted, until a limit was reached. Cause unclear.

* -1073741515 (Windows only) The application failed to initialize properly. Indicates missing DLLs. Re-attach to the project to force the application and its support DLLs to be re-downloaded. Ensure that the VS2008 redistributables are installed: http://www.microsoft.com/en-us/download/details.aspx?id=29

* -1073741819 (Windows only) Access violation (Segmentation fault). The application made an illegal memory access. These seem mostly to come from inside the Nvidia driver but root cause(s) unknown. If occurring repeatedly, reboot machine.
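Several of the positive codes above match the low byte of a negative one: 256 - 97 = 159, 256 - 9 = 247, 256 - 44 = 212 and 256 - 1 = 255. 98 is the exception. Likewise, the 0xffffffffffffff9f shown next to -97 on result pages is just its 64-bit two's-complement form. The two large Windows codes are the NTSTATUS values 0xC0000135 (STATUS_DLL_NOT_FOUND) and 0xC0000005 (STATUS_ACCESS_VIOLATION).

Below is a minimal Python lookup sketch built from the list above. The helper names and the normalisation step are hypothetical and not part of the app.

```python
# Meanings condensed from the list of 8xx-series compute error codes above.
MEANINGS = {
    -1: "Unknown",
    -9: "GPU compute capability not supported by the application",
    -44: "The computer's date is wrong",
    -80: "Failed to recover after an access violation (Win32)",
    -97: "Simulation has become unstable",
    -185: "Application doesn't start, 'Access denied'",
    -226: "Too many exits",
    -1073741515: "Failed to initialize properly (missing DLLs)",
    -1073741819: "Access violation (segmentation fault)",
    197: "WU took much longer than the client expected",
    194: "Unknown ('finish file present too long')",
    193: "Unknown (segfault on Linux)",
}
# Positive codes that the list says to look up under a negative one.
ALIASES = {255: -1, 247: -9, 212: -44, 159: -97, 98: -9}


def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits as a signed int (handles 0xffffffffffffff9f)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def describe(code: int) -> str:
    code = to_signed32(code)
    code = ALIASES.get(code, code)
    return MEANINGS.get(code, "Not in the list; please PM MJH")


print(describe(0xFFFFFFFFFFFFFF9F))   # exit status as shown on a result page
print(describe(159))
print(hex(-1073741515 & 0xFFFFFFFF))  # NTSTATUS form of the missing-DLL error
```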

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32719 - Posted: 5 Sep 2013 | 9:49:00 UTC

The betas are gone and the SANTI WUs have started to error again on my GTX660.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 32720 - Posted: 5 Sep 2013 | 10:17:26 UTC - in response to Message 32719.
Last modified: 5 Sep 2013 | 10:27:08 UTC

Short runs (2-3 hours on fastest card) v8.11 (cuda55)

Outcome Computation error
Client state Compute error
Exit status -97 (0xffffffffffffff9f) Unknown error number

* -97 "Simulation has become unstable". This indicates that the scientific simulation that the application performs has gone wrong. If this happens as soon as the WU starts, there may be a problem with the WU. If it happens frequently or after the WU has made some progress, a hardware problem is strongly indicated. Check GPU temperatures (now reported in stderr) and for the presence of other GPU-using programs (eg games) - MJH

Your temps look reasonable (mostly around 66°C). I suggest you restart the system, and if errors continue to occur look into what else might be causing this problem (games, video programs, antivirus scans, updates...). You might want to note the failure time and check your logs to see what was happening at that time or just before.

Both times you had the error, the stderr log ends with:
# GPU 0 Current Temp: 64 C
# The simulation has become unstable. Terminating to avoid lock-up (1)

The slight GPU temperature drop from 66°C to 64°C might indicate resource consumption by something else on your system just before the WU was ended, or the GPU temperature might just have dropped as the WU was ended?
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
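To look for the kind of temperature drop discussed above without scrolling through the whole log, the `# GPU 0 Current Temp: NN C` lines can be pulled out of stderr with a short script. A minimal Python sketch; the sample log is illustrative (only its last two lines are taken from the log quoted above, the earlier readings are made up to match "mostly around 66°C"):

```python
import re

# Matches lines such as "# GPU 0 Current Temp: 64 C"
TEMP_RE = re.compile(r"^# GPU (\d+) Current Temp: (\d+) C$")

def gpu_temps(stderr_text):
    """Return [(gpu_index, temp_c), ...] in the order they appear in the log."""
    temps = []
    for line in stderr_text.splitlines():
        m = TEMP_RE.match(line.strip())
        if m:
            temps.append((int(m.group(1)), int(m.group(2))))
    return temps

def drop_from_peak(temps):
    """Degrees between the hottest reading and the final reading (0 if none)."""
    values = [t for _, t in temps]
    return max(values) - values[-1] if values else 0

sample = """# GPU 0 Current Temp: 66 C
# GPU 0 Current Temp: 66 C
# GPU 0 Current Temp: 64 C
# The simulation has become unstable. Terminating to avoid lock-up (1)"""

temps = gpu_temps(sample)
print(temps, drop_from_peak(temps))
```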

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32721 - Posted: 5 Sep 2013 | 10:56:05 UTC - in response to Message 32720.


* -97 "Simulation has become unstable".


The new app does a much better job at determining when a WU has gone bad and aborting. Previously this might have manifested itself as non-specific crash/driver reset.

In some circumstances it may be possible to attempt recovery from this failure. Expect a new beta trying an idea out later today.

MJH
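The thread doesn't say how the app decides that a simulation "has become unstable". A common, simple guard in MD codes is to stop as soon as any coordinate becomes non-finite or runs away. A minimal Python sketch of that idea only; the -97 code is from the post above, while the `max_abs` bound is a hypothetical value, not ACEMD's:

```python
import math

UNSTABLE_EXIT = -97  # exit status reported above for "Simulation has become unstable"

def check_stability(positions, max_abs=1.0e4):
    """Return None if all coordinates look sane, else UNSTABLE_EXIT.

    positions: iterable of (x, y, z) tuples.
    max_abs: hypothetical bound on any coordinate; a real code would
    derive it from the simulation box size.
    """
    for xyz in positions:
        for c in xyz:
            if not math.isfinite(c) or abs(c) > max_abs:
                return UNSTABLE_EXIT
    return None

print(check_stability([(0.0, 1.5, -2.0)]))          # None
print(check_stability([(float("nan"), 0.0, 0.0)]))  # -97
```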

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32723 - Posted: 5 Sep 2013 | 11:21:32 UTC - in response to Message 32720.

Thanks for the information skgiven (and moving my post).

No games or anything like that on this machine, only 6 Rosetta WUs, with 2 cores left free for GPUGRID. It has two Xeon processors, so these are real cores; no HT on these oldies.
On the occasions when I was at the system and saw a WU fail, I did notice a drop in GPU temperature. This is normal of course, as it is no longer working hard. It does not drop much, as a new WU starts right away.

The AV is F-Secure, and this is no longer a problem for BOINC. I was in very close contact with someone from their main office for a few months to get everything working after their update. I did a lot of testing for them, and they even ran Rosetta themselves for a month to get it working. So that is not an issue. I get it free with my ISP subscription and have been using it for a little over 2 years now, and for the last 8 months it has not been an issue on any PC for any project.

The system was restarted last night, as a WU had downclocked my GPU.

I have set this system to accept betas so I can test the new HARVEY WUs Matt will bring out shortly.
____________
Greetings from TJ

Profile Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Level
Thr
Message 32735 - Posted: 5 Sep 2013 | 15:39:28 UTC
Last modified: 5 Sep 2013 | 15:40:43 UTC

Hello: I have successfully finished an 8.11 task, but I want to point out some weird things.

Task: 9x2-SANTI_MAR4222-4-25-RND9976_0 - Runtime 8317.47 seconds, on my GTX 770 with an FX8350 CPU at 4.4 GHz.

stderr output
<core_client_version>7.2.11</core_client_version>
<![CDATA[
<stderr_txt>
mp: 59 C
# GPU 0 Current Temp: 60 C
# GPU 0 Current Temp: 59 C
# GPU 0 Current Temp: 59 C
# GPU 0 Current Temp: 59 ...
(... and many more GPU temperature records, ending with:)

# GPU 0 Current Temp: 61 C
# Time per step (avg over 3000000 steps): 2.771 ms
# Approximate elapsed time for Entire WU: 8313.984 s
called boinc_finish

</stderr_txt>
]]>

The execution times look too long to me compared with other short tasks. I think something is not working very well. Greetings.
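The two timing lines at the end of that stderr agree with each other: 3,000,000 steps × 2.771 ms ≈ 8,313 s, or about 2.3 hours. A small Python sketch that reads both lines and cross-checks them; the regexes assume the exact wording shown in the log above:

```python
import re

STEP_RE = re.compile(r"# Time per step \(avg over (\d+) steps\): ([\d.]+) ms")
ELAPSED_RE = re.compile(r"# Approximate elapsed time for Entire WU: ([\d.]+) s")

def timing_check(stderr_text):
    """Return (steps * ms_per_step converted to seconds, reported elapsed seconds)."""
    steps, ms_per_step = STEP_RE.search(stderr_text).groups()
    elapsed = float(ELAPSED_RE.search(stderr_text).group(1))
    estimate = int(steps) * float(ms_per_step) / 1000.0
    return estimate, elapsed

log = """# GPU 0 Current Temp: 61 C
# Time per step (avg over 3000000 steps): 2.771 ms
# Approximate elapsed time for Entire WU: 8313.984 s"""

estimate, elapsed = timing_check(log)
print(f"{estimate:.0f} s from steps vs {elapsed:.0f} s reported ({estimate / 3600:.2f} h)")
```

The roughly 1 s gap comes from the per-step average being rounded to 2.771 ms: a rounding of up to 0.0005 ms per step adds up to at most 1.5 s over 3 million steps.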

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32737 - Posted: 5 Sep 2013 | 17:27:37 UTC - in response to Message 32735.

The execution times look too long to me compared with other short tasks. I think something is not working very well. Greetings.


That is correct, Carlesa25, there are problems with them. There is a lengthy thread about this; it's this one: http://www.gpugrid.net/forum_thread.php?id=3450
If you start reading at the first post, you will quickly understand what is wrong.
____________
Greetings from TJ

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32743 - Posted: 5 Sep 2013 | 21:09:25 UTC - in response to Message 32700.
Last modified: 5 Sep 2013 | 21:14:38 UTC

I just had an 8.13 task result in:
-1073741819 (0xffffffffc0000005)
Full details below.

I think the cause was that an 8.11 NOELIA_KLEBE beta (which I was suspending to test) TDR'd the drivers, and this new task segfaulted trying to get started.

We still have some work to do on resolving the suspension of tasks. I've noticed that they continue running for up to 15 seconds even after being suspended. I seem to remember Einstein@Home having a problem that might be related; see
http://einstein.phys.uwm.edu/forum_thread.php?id=10141
... I'm trying to dig up the BOINC API fix (from early June 2013) that solved it.

EDIT:
I found it. Check out this fix.

Would it be applicable towards helping us (GPUGrid project and users) suspend more correctly?
http://boinc.berkeley.edu/trac/changeset/b98bc309cceccf95b9fac578c47cbea06a8b8150/boinc-v2
If so, can you implement it?



http://www.gpugrid.net/result.php?resultid=7250888

Name 178-MJHARVEY_CRASH1-0-25-RND2676_0
Workunit 4754442
Created 5 Sep 2013 | 14:05:52 UTC
Sent 5 Sep 2013 | 16:39:18 UTC
Received 5 Sep 2013 | 21:02:06 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -1073741819 (0xffffffffc0000005) Unknown error number
Computer ID 153764
Report deadline 10 Sep 2013 | 16:39:18 UTC
Run time 8.18
CPU time 0.00
Validate state Invalid
Credit 0.00
Application version ACEMD beta version v8.13 (cuda55)
Stderr output

<core_client_version>7.2.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741819 (0xc0000005)
</message>
]]>

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32745 - Posted: 5 Sep 2013 | 21:41:37 UTC - in response to Message 32743.


We still have some work to do on resolving the suspension of tasks. I've noticed that they continue running for up to 15 seconds even after being suspended.


This is a side effect of having more code enclosed in critical sections. As the article you found indicates, prompt termination on suspend requires the monitoring thread to wake up while the app thread is outside a critical region.

The only way this is going to get fixed is to change the dumb way the BOINC client library bludgeons the app process to death, and to give the app an opportunity to close down gracefully.
This will take a bit of work, but it's high on the to-do list.

MJH
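A toy model of why suspension can lag: if the monitoring thread only acts on a suspend request at a timer tick that falls outside a critical section, one long critical section delays the suspension until it ends. This Python sketch assumes a 0.1 s tick and a single 14.95 s critical section; both numbers are illustrative, not measured from ACEMD or taken from the BOINC source:

```python
import math

def effective_suspend_time(request_t, critical, tick=0.1):
    """First monitor tick at or after request_t that lies outside every
    critical interval [start, end). All times are in seconds."""
    k = math.ceil(request_t / tick - 1e-9)  # index of the first tick >= request_t
    while True:
        t = k * tick
        if not any(start <= t < end for start, end in critical):
            return round(t, 6)
        k += 1

print(effective_suspend_time(0.05, []))              # 0.1
print(effective_suspend_time(0.05, [(0.0, 14.95)]))  # 15.0
```

In this model, shortening each critical section, or letting the app shut down cleanly on request as Matt describes, shrinks the worst-case delay.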

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32746 - Posted: 5 Sep 2013 | 21:43:54 UTC - in response to Message 32745.
Last modified: 5 Sep 2013 | 21:44:56 UTC

Thanks.
Regarding
http://boinc.berkeley.edu/trac/changeset/b98bc309cceccf95b9fac578c47cbea06a8b8150/boinc-v2

... It looked like a simple-to-moderate code change that just changes the way suspension works with the critical sections. It looks very applicable toward making our suspend requests run more smoothly, and I hope it isn't hard to implement. (I don't know much about where the API code comes into play, but if it's just "a piece that's included when building apps", then maybe it'll be pretty easy for you to "hook it in".)

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32747 - Posted: 5 Sep 2013 | 21:50:25 UTC - in response to Message 32746.

Jacob,

That fix would already have been included in 8.12 when I updated to the latest boinc library revision.

Matt

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32749 - Posted: 5 Sep 2013 | 21:52:29 UTC - in response to Message 32747.
Last modified: 5 Sep 2013 | 21:55:31 UTC

Well, in my situation there, it was an 8.11 that caused the problem.
:) I'll keep testing, and hopefully it works even better in the already-released 8.12 and 8.13.

Thanks for making progress - I really do appreciate it!!

Edit: 8.13 is suspending/resuming VERY nicely. I can't wait to have 8.13 running on a NOELIA_KLEBE task (to test it!)

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32755 - Posted: 5 Sep 2013 | 22:13:06 UTC - in response to Message 32749.

I tested it by suspending a NOELIA_INS task in order to get a beta. Suspending and resuming worked. (I'm not getting a beta WU, though, as the client knew that a task was suspended :( )
____________
Greetings from TJ

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,866,392,698
RAC: 20,073,067
Level
Tyr
Message 32757 - Posted: 5 Sep 2013 | 23:23:36 UTC - in response to Message 32745.

The only way this is going to get fixed is to change the dumb way the BOINC client library bludgeons the app process to death, and to give the app an opportunity to close down gracefully.
This will take a bit of work, but it's high on the to-do list.

MJH

You have allies in the BOINC community. Eric Korpela of SETI@home wrote (on 13 Nov 2008 - unfortunately in a private forum I can't link):

Yes, the terminate with no mercy policy sucks and we should find if there is a way around it, or at least a way to allow I/O to finish.

About time we got round to fixing that...

nanoprobe
Joined: 26 Feb 12
Posts: 184
Credit: 222,376,233
RAC: 0
Level
Leu
Message 32760 - Posted: 5 Sep 2013 | 23:44:31 UTC

First MJHarvey_Crash beta just finished.
http://www.gpugrid.net/result.php?resultid=7251789

Only betas I can't get to finish are the Noelia_Klebe
http://www.gpugrid.net/result.php?resultid=7248385

Could the Noelia_Klebes be troublesome because my card is a PE and ramps up to 1200 MHz on the core when crunching? I'm just wondering, because it runs at 1200 MHz for all the other tasks too. I've also noticed a huge discrepancy between GPU and CPU time on these failed tasks, whereas all the others show GPU and CPU times to be very close.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 32766 - Posted: 6 Sep 2013 | 7:31:07 UTC - in response to Message 32760.

Your card mostly ran at 58°C, so it wasn't overly taxed by the Noelia_Klebe WU.
My GTX660Ti also clocks up to ~1200MHz. That said I also get the odd error from it and other similar cards.

The Noelia_Klebe WU's don't use a full CPU core/thread in the same way most other WU's do. This has been the case since they were first released.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
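A quick way to quantify the GPU/CPU time discrepancy nanoprobe mentions is to divide CPU time by run time for each task. A minimal Python sketch; the 0.9 threshold is an arbitrary assumption, the first pair of figures is the run/CPU time from the 194-MJHARVEY_CRASH1 task row quoted elsewhere in this thread, and the second pair is hypothetical:

```python
def to_seconds(text):
    """Parse BOINC-style times such as '22,493.99' into float seconds."""
    return float(text.replace(",", ""))

def cpu_fraction(run_time, cpu_time):
    """CPU time as a fraction of run (wall-clock) time."""
    return to_seconds(cpu_time) / to_seconds(run_time)

def uses_full_core(run_time, cpu_time, threshold=0.9):
    """True if CPU time is at least `threshold` of run time (threshold is arbitrary)."""
    return cpu_fraction(run_time, cpu_time) >= threshold

print(round(cpu_fraction("22,493.99", "22,461.82"), 3))
print(uses_full_core("3,600.00", "600.00"))  # hypothetical low-CPU task
```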

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32815 - Posted: 6 Sep 2013 | 20:51:21 UTC

acemdshort is now updated to 8.14. This version has improved stability during suspend/resume.

MJH

nanoprobe
Joined: 26 Feb 12
Posts: 184
Credit: 222,376,233
RAC: 0
Level
Leu
Message 32817 - Posted: 7 Sep 2013 | 1:06:31 UTC

My GTX660Ti also clocks up to ~1200MHz. That said I also get the odd error from it and other similar cards.

I wish it were just the odd error now and then. I haven't had a single NOELIA_KLEBE beta complete and validate yet. They all end with the time-exceeded error after running for an hour or so.

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32826 - Posted: 8 Sep 2013 | 14:02:19 UTC

I had one CRASH test overnight that took 22,493.99 seconds to complete. Checking the system showed that it had been running at half the GPU's core clock, so that is the explanation. However, the stderr report gives no reason; the core clock was reported there as it should be, 1058 MHz. I rebooted the system and all is normal again.
I have seen reduced clock speeds before, but that was after an error or an ACEMD crash; this is the first time it has happened without any errors.
____________
Greetings from TJ

Profile Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Level
Thr
Message 32846 - Posted: 8 Sep 2013 | 17:55:34 UTC

Hello: 8.14 tasks are running with a low GPU load (<60%) on my GTX 770, and the load is also very unstable, varying by more than ±10%.

The CPU side runs smoothly, but the result is that tasks take twice as long as they should: a short task takes about four hours ...??

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 32848 - Posted: 8 Sep 2013 | 18:10:52 UTC - in response to Message 32826.

TJ, I guess you are referring to this WU,

194-MJHARVEY_CRASH1-1-25-RND6694_0 4759387 7 Sep 2013 | 14:08:27 UTC 8 Sep 2013 | 0:59:20 UTC Completed and validated 22,493.99 22,461.82 18,750.00 ACEMD beta version v8.14 (cuda55)

When a WU doesn't use the GPU enough, it can cause the GPU to downclock. The temps were only 52°C, while your other runs on similar WU's had temps rising to 67°C. A 15°C drop sounds about right for a downclock.

Perhaps a mechanism to report changes in core clock, as well as temp, would be useful (if it's not too late)!
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
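One shape such a report could take, in the same style as the existing temperature lines: log the core clock only when it moves by more than some fraction of the last reported value. A minimal Python sketch; the `# GPU 0 Current Core Clock` wording and the 10% threshold are hypothetical, and reading the clock from the driver is not shown. The sample values mirror TJ's case of a 1058 MHz card dropping to half clock:

```python
def clock_change_lines(samples_mhz, gpu=0, rel_threshold=0.10):
    """Return hypothetical log lines, one for the first sample and one for
    every later sample that differs from the last reported value by more
    than rel_threshold (as a fraction of that value)."""
    lines = []
    last = None
    for mhz in samples_mhz:
        if last is None or abs(mhz - last) > rel_threshold * last:
            lines.append(f"# GPU {gpu} Current Core Clock: {mhz} MHz")
            last = mhz
    return lines

samples = [1058, 1058, 1045, 529, 529, 1058]  # illustrative readings
for line in clock_change_lines(samples):
    print(line)
```

Logging only on significant changes keeps stderr short while still showing exactly when a downclock started.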

Profile Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Level
Thr
Message 32849 - Posted: 8 Sep 2013 | 18:54:34 UTC - in response to Message 32848.

Hello: Yes, that task completed, but this one is now running: SANTI_MARwt2-4-25-uan RND4912_0, with a GPU load of <65%.

The GTX770 is running at a 1254 MHz GPU clock without problems.
Temperature 55 °C, FB usage 20%, bus usage 7% (both fluctuating by about ±2%)
Memory Usage: 519 MB in GPU.

Profile Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Level
Thr
Message 32850 - Posted: 8 Sep 2013 | 19:31:40 UTC - in response to Message 32849.
Last modified: 8 Sep 2013 | 19:40:37 UTC

Hello: Yes, that task completed, but this one is now running: SANTI_MARwt2-4-25-uan RND4912_0, with a GPU load of <65%.


Hello: Regarding the low GPU usage: if this is simply how these tasks work, the solution would be to run two tasks on the GPU at once to reach maximum load.

It would be useful if the project team could confirm this, so that we can decide how to handle these tasks.

NOTE: I tried running two 8.14 tasks on the same GPU (GTX770), and the total load went from 55% to 70% (±5%), with 777 MB of memory used, FB 22%, bus 8%, and the GPU at 1254 MHz.
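For anyone who wants to try two tasks per GPU as described above, BOINC 7.x clients read an `app_config.xml` file placed in the project directory (for example C:\ProgramData\BOINC\projects\www.gpugrid.net\ on Windows). A minimal Python sketch that generates one; it assumes the short-queue app's internal name is `acemdshort` (check the app names in client_state.xml before relying on it). A `gpu_usage` of 0.5 means each task claims half a GPU, so two run at once, and `cpu_usage` 1.0 reserves a full CPU thread per task:

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name="acemdshort", gpu_usage=0.5, cpu_usage=1.0):
    """Return app_config.xml text that runs 1/gpu_usage tasks per GPU.
    app_name is an assumption; verify it against client_state.xml."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")

xml_text = build_app_config()
print(xml_text)
# Save xml_text as app_config.xml in the project directory, then restart the
# BOINC client (or have it re-read its config files) for it to take effect.
```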

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32851 - Posted: 8 Sep 2013 | 19:48:48 UTC - in response to Message 32848.

TJ, I guess you are referring to this WU,

194-MJHARVEY_CRASH1-1-25-RND6694_0 4759387 7 Sep 2013 | 14:08:27 UTC 8 Sep 2013 | 0:59:20 UTC Completed and validated 22,493.99 22,461.82 18,750.00 ACEMD beta version v8.14 (cuda55)

Yes, skgiven, that is the one.
Later this morning I had one error, but that did not downclock the core clock.
But as these CRASH tests are Santi short runs, and I had a lot of errors with those, my error rate has dropped significantly.
____________
Greetings from TJ

Profile Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Level
Thr
Message 32860 - Posted: 9 Sep 2013 | 10:30:32 UTC - in response to Message 32850.

Hello: Yes, that task completed, but this one is now running: SANTI_MARwt2-4-25-uan RND4912_0, with a GPU load of <65%.


Hello: Regarding the low GPU usage: if this is simply how these tasks work, the solution would be to run two tasks on the GPU at once to reach maximum load.

It would be useful if the project team could confirm this, so that we can decide how to handle these tasks.

NOTE: I tried running two 8.14 tasks on the same GPU (GTX770), and the total load went from 55% to 70% (±5%), with 777 MB of memory used, FB 22%, bus 8%, and the GPU at 1254 MHz.


Hello: Sorry... my problems with low GPU load on 8.14 were caused by a corrupted driver. After reinstalling it, the issue is solved and the GPU load is normal, around 85%.

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32861 - Posted: 9 Sep 2013 | 11:35:31 UTC - in response to Message 32860.
Last modified: 9 Sep 2013 | 11:36:48 UTC

My Asus 770 runs at a steady 91-92% GPU load with a core clock of 1097 MHz, even though I have it set to 1060 MHz. This is on a NATHAN WU, and obviously not with the 8.14 app yet.
____________
Greetings from TJ

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 33143 - Posted: 22 Sep 2013 | 11:43:27 UTC

The server should now once again be dishing out Short tasks to Linux clients.

MJH

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 34154 - Posted: 8 Dec 2013 | 15:34:24 UTC

I've promoted the 8.15 beta application to the short queue.
This version has a workaround to catch tasks that repeatedly fail, necessitating a machine reset.

Matt
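Matt doesn't describe how the 8.15 workaround works. One common pattern for catching a task that keeps taking the machine down is a persistent start counter: increment it every time the task starts (before any risky GPU work), clear it once the task makes progress, and bail out if it climbs too high. A minimal Python sketch of that general pattern only; the file name and the limit of 3 are hypothetical:

```python
import os
import tempfile

MAX_STARTS_WITHOUT_PROGRESS = 3  # hypothetical limit

def start_attempt(counter_path, limit=MAX_STARTS_WITHOUT_PROGRESS):
    """Call when the task starts. Increments a persistent counter and returns
    True if the task has now started more than `limit` times without the
    counter being cleared, i.e. it should give up instead of crashing again."""
    count = 0
    if os.path.exists(counter_path):
        with open(counter_path) as f:
            count = int(f.read().strip() or 0)
    count += 1
    with open(counter_path, "w") as f:
        f.write(str(count))
    return count > limit

def record_progress(counter_path):
    """Call after a checkpoint is written: the task is not stuck in a crash loop."""
    if os.path.exists(counter_path):
        os.remove(counter_path)

# Demo: four starts with no progress in between.
demo_path = os.path.join(tempfile.mkdtemp(), "start_count.txt")
print([start_attempt(demo_path) for _ in range(4)])  # [False, False, False, True]
```

Writing the counter before the GPU work begins is the key design choice: a driver crash or reboot then leaves the count incremented, so repeated failures are still noticed on the next start.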

Profile (retired account)
Joined: 22 Dec 11
Posts: 38
Credit: 28,606,255
RAC: 0
Level
Val
Message 34157 - Posted: 8 Dec 2013 | 17:57:13 UTC - in response to Message 34154.
Last modified: 8 Dec 2013 | 17:57:31 UTC

Unfortunately it does not remedy the sudden reboots I am currently experiencing; please see the report here.
____________
Mark my words and remember me. - 11th Hour, Lamb of God

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 34158 - Posted: 8 Dec 2013 | 19:59:31 UTC - in response to Message 34157.

The same type of WU that was restarting my system (SANTI_baxbimSPW2) now fails without restarting the system:

I992-SANTI_baxbimSPW2-11-62-RND9421_0 4978461 8 Dec 2013 | 16:06:07 UTC 8 Dec 2013 | 19:53:07 UTC Error while computing 1.90 0.00 --- Short runs (2-3 hours on fastest card) v8.15 (cuda55)
I414-SANTI_baxbimSPW2-8-62-RND7653_0 4978454 8 Dec 2013 | 16:06:07 UTC 8 Dec 2013 | 19:53:07 UTC Error while computing 1.43 0.00 --- Short runs (2-3 hours on fastest card) v8.15 (cuda55)

I457-SANTI_baxbimSPW2-10-62-RND3310_0 4978087 8 Dec 2013 | 13:02:28 UTC 8 Dec 2013 | 19:53:07 UTC Aborted by user 878.21 845.17 --- Short runs (2-3 hours on fastest card) v8.14 (cuda55)
I33-SANTI_baxbimSPW2-11-62-RND0268_0 4977237 8 Dec 2013 | 13:02:28 UTC 8 Dec 2013 | 19:53:07 UTC Aborted by user 8,172.70 8,082.04 --- Short runs (2-3 hours on fastest card) v8.14 (cuda55)
potx1x127-NOELIA_INS1P-6-13-RND4570_0 4974468 6 Dec 2013 | 23:40:06 UTC 7 Dec 2013 | 10:17:03 UTC Aborted by user 6,738.16 6,649.12 --- Long runs (8-12 hours on fastest card) v8.14 (cuda55)

The Aborted 8.14 WU's were stopped following system restarts. They were causing driver restarts and the tasks were not progressing.

SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1963.

____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Grandpa
Joined: 19 Aug 12
Posts: 3
Credit: 201,022,393
RAC: 0
Level
Leu
Message 34160 - Posted: 8 Dec 2013 | 20:51:26 UTC

C:\ProgramData\BOINC\projects\www.gpugrid.net\acemd.815-55.exe
Avast antivirus is flagging this as a possible virus and blocking it (Win32:Evo-gen [Susp]). I know it is just a false positive, but I figured I should report it.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 34162 - Posted: 9 Dec 2013 | 0:40:22 UTC - in response to Message 34160.
Last modified: 9 Dec 2013 | 0:43:50 UTC

Heard the fan on my GTX770 roar, quickly opened MSI Afterburner and saw the GPU power usage at over 4000%!! A few seconds later and the system restarted, blue screen... I guess the app has found a driver bug.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile (retired account)
Joined: 22 Dec 11
Posts: 38
Credit: 28,606,255
RAC: 0
Level
Val
Message 34165 - Posted: 9 Dec 2013 | 2:21:41 UTC - in response to Message 34162.

That's interesting. As mentioned in the other thread, I tested a v8.14 short run with my GT 650M. The workunit did finish OK, but while it was crunching, the subnotebook shut down gracefully once, very likely to prevent overheating, since according to stderr.txt the GPU temperature was 85 °C at that time, which is unusual, especially at current room temperatures. I switched on the notebook cooler after that...

I735-SANTI_baxbimSPW2-5-62-RND3690_1

Maybe related, maybe not.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 34171 - Posted: 9 Dec 2013 | 15:49:02 UTC - in response to Message 34165.

I upgraded the drivers to the latest WHQL release but still get system restarts and GPUGrid errors. Some fail with error messages, others just say aborted:

Stderr output

<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
The file exists.
(0x50) - exit code 80 (0x50)
</message>
<stderr_txt>

</stderr_txt>

It's probably not helpful to just get the 'aborted by user' message:
Stderr output

<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
aborted by user
</message>
]]>

Today I also had a 'drivers not found' problem after a blue screen and restart.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
