Error units - Noelia

Message boards : Graphics cards (GPUs) : Error units - Noelia

Author	Message
Ebonydogx Joined: 19 Jun 12 Posts: 11 Credit: 51,704,550 RAC: 0	Message 26457 - Posted: 26 Jul 2012 | 2:14:23 UTC Last modified: 26 Jul 2012 | 2:33:31 UTC
	Noelia, welcome aboard. Been looking for your wus, finally got some. Unfortunately, I just had 11 Noelia wu's crash after about 13 secs each with a computational error. Entries from my event log for one wu are below:
	7/25/2012 9:03:49 PM | GPUGRID | Starting task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 using acemdlong version 616 (cuda42) in slot 6
	7/25/2012 9:04:04 PM | GPUGRID | Computation for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 finished
	7/25/2012 9:04:04 PM | GPUGRID | Output file run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2_1 for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 absent
	7/25/2012 9:04:04 PM | GPUGRID | Output file run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2_2 for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 absent
	7/25/2012 9:04:04 PM | GPUGRID | Output file run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2_3 for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 absent
	Wu's are of this variety: run9_replica7-NOELIA_sh2fragment_run-0-4-RND8072_2
	Workunit 3598139
	Stderr output
	<core_client_version>7.0.28</core_client_version>
	<![CDATA[
	<message>
	- exit code 98 (0x62)
	</message>
	<stderr_txt>
	MDIO: cannot open file "restart.coor"
	ERROR: file deven.cpp line 1106: # Energies have become nan
	called boinc_finish
	</stderr_txt>
	]]>
	Thought you'd want to know - let me know if I should have forwarded other details.
	Edit: Win7 64, 2x560ti, AMD FX 6100 6 core @ 3.3 GHz, 850 watt PSU
	ID: 26457 | Rating: 0
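	For anyone who wants to spot this failure pattern without reading the event log by hand, here is a rough Python sketch. It assumes the log lines look exactly like the excerpt in the post above (timestamp | project | message). The function names summarize_event_log and quick_failures, and the 60-second threshold, are made up for illustration and are not part of BOINC or GPUGRID.

```python
import re
from datetime import datetime

# Line layout copied from the event log excerpt in the post:
#   <timestamp> | <project> | <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)"
    r" \| (?P<project>[^|]+?) \| (?P<msg>.*)$"
)
START_RE = re.compile(r"^Starting task (\S+)")
DONE_RE = re.compile(r"^Computation for task (\S+) finished")
ABSENT_RE = re.compile(r"^Output file \S+ for task (\S+) absent")

SAMPLE_LOG = [
    "7/25/2012 9:03:49 PM | GPUGRID | Starting task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 using acemdlong version 616 (cuda42) in slot 6",
    "7/25/2012 9:04:04 PM | GPUGRID | Computation for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 finished",
    "7/25/2012 9:04:04 PM | GPUGRID | Output file run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2_1 for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 absent",
    "7/25/2012 9:04:04 PM | GPUGRID | Output file run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2_2 for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 absent",
    "7/25/2012 9:04:04 PM | GPUGRID | Output file run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2_3 for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 absent",
]


def summarize_event_log(lines):
    """Return {task_name: {"runtime_seconds": float or None, "absent_outputs": int}}."""
    tasks = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not in the assumed "timestamp | project | message" layout
        when = datetime.strptime(m.group("ts"), "%m/%d/%Y %I:%M:%S %p")
        msg = m.group("msg")
        for kind, pattern in (("start", START_RE), ("done", DONE_RE), ("absent", ABSENT_RE)):
            hit = pattern.match(msg)
            if hit is None:
                continue
            info = tasks.setdefault(hit.group(1), {"start": None, "done": None, "absent": 0})
            if kind == "absent":
                info["absent"] += 1
            else:
                info[kind] = when
    summary = {}
    for name, info in tasks.items():
        runtime = None
        if info["start"] is not None and info["done"] is not None:
            runtime = (info["done"] - info["start"]).total_seconds()
        summary[name] = {"runtime_seconds": runtime, "absent_outputs": info["absent"]}
    return summary


def quick_failures(summary, max_seconds=60):
    """Tasks that ended within max_seconds (arbitrary threshold) and have absent output files."""
    return [
        name
        for name, info in summary.items()
        if info["runtime_seconds"] is not None
        and info["runtime_seconds"] <= max_seconds
        and info["absent_outputs"] > 0
    ]
```

	Calling quick_failures(summarize_event_log(SAMPLE_LOG)) flags the one task from the excerpt. To use it on a real log, pass the lines of a saved event log instead of SAMPLE_LOG; the patterns may need adjusting if your client formats messages differently.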

neilp62 Joined: 23 Nov 10 Posts: 14 Credit: 7,899,095,437 RAC: 1,975,376	Message 26458 - Posted: 26 Jul 2012 | 2:49:30 UTC
	Hmm, I've experienced the same error with two back-to-back NOELIA WUs. My PC finished a PAOLA WU just before the NOELIA WUs with no error. For now, I'll suspend the GPUGRID project until something is posted about this...
	ID: 26458 | Rating: 0

[PUGLIA] kidkidkid3 Joined: 23 Feb 11 Posts: 98 Credit: 1,281,189,317 RAC: 1,983,672	Message 26462 - Posted: 26 Jul 2012 | 5:40:05 UTC - in response to Message 26458.
	Hi Noelia, same error (twice) also for me in
	http://www.gpugrid.net/result.php?resultid=5664840
	http://www.gpugrid.net/result.php?resultid=5664472
	<core_client_version>7.0.28</core_client_version>
	<![CDATA[
	<message>
	- exit code 98 (0x62)
	</message>
	<stderr_txt>
	MDIO: cannot open file "restart.coor"
	ERROR: file deven.cpp line 1106: # Energies have become nan
	called boinc_finish
	</stderr_txt>
	]]>
	I'll stop or cancel your WU until something is posted about this error.
	k.
	____________
	Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing. (Martin Luther King)
	ID: 26462 | Rating: 0

5pot Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0	Message 26463 - Posted: 26 Jul 2012 | 6:10:54 UTC
	Ditto. All run2 WUs, if I'm not mistaken, and all of them are failing on every other host they were sent to. Definitely a problem with these.
	ID: 26463 | Rating: 0

werdwerdus Joined: 15 Apr 10 Posts: 123 Credit: 1,004,473,861 RAC: 0	Message 26464 - Posted: 26 Jul 2012 | 7:40:27 UTC
	Yep, some errors. Also, GPU utilization is pretty low, currently at 79% on my GTX 470:
	task rundig8_run5-NOELIA_smd2-1-5-RND4856_0 using acemdlong version 616 (cuda42)
	ID: 26464 | Rating: 0

noelia Joined: 5 Jul 12 Posts: 35 Credit: 393,375 RAC: 0	Message 26469 - Posted: 26 Jul 2012 | 15:34:01 UTC - in response to Message 26464.
	Hi guys, I apologize for this inconvenience. This is the first time I have run the system after doing the equilibration phase in acemdbeta (the first step mentioned in this other thread: http://www.gpugrid.org/forum_thread.php?id=3088 ), and it works quite differently from when we run it locally, which is why all the simulations were crashing within a few seconds. The procedure is now automated, so this should not be a problem in the future when running this way. Thank you for your time :)
	ID: 26469 | Rating: 0

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,217,465,968 RAC: 1,257,790	Message 26472 - Posted: 26 Jul 2012 | 17:17:15 UTC - in response to Message 26469.
	Shouldn't these workunits be processed by the 6.47 beta client?
	ID: 26472 | Rating: 0

5pot Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0	Message 26483 - Posted: 27 Jul 2012 | 5:12:26 UTC
	Just had another one recently. It was sent about 5 or so hours ago. I hope these are out of the system now, because I've produced 53 errors on these tasks. Just seems like a lot of wasted bandwidth on my end and on yours. I understand things happen, but please take these out of the hopper if you have not done so already. Cheers
	ID: 26483 | Rating: 0

Ebonydogx Joined: 19 Jun 12 Posts: 11 Credit: 51,704,550 RAC: 0	Message 26489 - Posted: 27 Jul 2012 | 18:32:04 UTC - in response to Message 26483.
	Started processing one replacement Noelia wu: run10_replica37-NOELIA_sh2fragment_fixed-0-4-RND1582_0
	http://www.gpugrid.net/workunit.php?wuid=3601656
	It is only 10% complete after about 90 mins. At start-up the wu projected 9:36 to complete, but it is now on track for a bit over 14 hours, and prolly significantly more.
	This wu will never qualify for max bonus bc there just isn't enough time to process and return within 24 hours. Same problem that Nathan wus had back in Feb/March, if I remember correctly.
	I'll let this wu run another couple of hours, see how it is tracking, then update this post.
	In the meanwhile, may I suggest you visit with Nathan on proc time, as he has lived through this before & was able to adjust the wu's so proc returned to "8-12 hours on fastest cards." On my 560 ti's his wu's typically take about 8 hours to crunch and another 10-12 mins to upload. Thank you!
	ID: 26489 | Rating: 0
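	The runtime estimate in the post can be roughly reproduced with a straight-line extrapolation. The Python sketch below assumes progress is linear over the whole run, which real tasks may not follow. The function names are hypothetical, and queue_wait_hours is an illustrative parameter for time a task might sit waiting before it starts; the thread does not say how the 24-hour bonus window is actually measured.

```python
def projected_total_hours(elapsed_minutes, percent_done):
    """Linear extrapolation of total crunch time from progress so far."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_minutes * 100 / percent_done / 60


def fits_return_window(crunch_hours, upload_minutes, queue_wait_hours=0.0, window_hours=24.0):
    """True if wait + crunch + upload fits in the window (24 h taken from the post)."""
    total_hours = queue_wait_hours + crunch_hours + upload_minutes / 60
    return total_hours <= window_hours


# Figures from the post: about 10% done after about 90 minutes, 10-12 minute upload.
estimate = projected_total_hours(90, 10)
print(estimate)                                               # 15.0
print(fits_return_window(estimate, 12))                       # True
print(fits_return_window(estimate, 12, queue_wait_hours=10))  # False
```

	With the rounded numbers from the post this gives 15 hours, in the same range as the "bit over 14 hours" reported. On its own that fits inside 24 hours; it only misses the window if the task also spends several hours waiting, which is the assumed scenario in the last line.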

Ebonydogx Joined: 19 Jun 12 Posts: 11 Credit: 51,704,550 RAC: 0	Message 26490 - Posted: 27 Jul 2012 | 21:12:25 UTC - in response to Message 26489. Last modified: 27 Jul 2012 | 21:13:19 UTC
	I'll let this wu run another couple of hours, see how it is tracking, then update this post. Edit: after 4.5 hours, still on track to finish in a bit over 14 hours
	ID: 26490 | Rating: 0

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,217,465,968 RAC: 1,257,790	Message 26491 - Posted: 28 Jul 2012 | 0:30:20 UTC - in response to Message 26490.
	"I'll let this wu run another couple of hours, see how it is tracking, then update this post. Edit: after 4.5 hours, still on track to finish in a bit over 14 hours"
	After the release of the GTX 6xx series, I wouldn't consider a GTX 560 Ti one of "the fastest cards". Unlike the GTX 560 Ti 448, it has only 256 shaders usable by the GPUGrid client, because it is a CC2.1 card (the Ti 448 'limited edition' is a CC2.0 card, so all of its shaders can be used by the GPUGrid client).
	At the moment the fastest cards are: GTX 690, 680, 670, 590, 580, 570, 480, 470, 560 Ti 448, 465
	ID: 26491 | Rating: 0
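	The 256 figure above can be reproduced with a little arithmetic. The Python sketch below relies on two hardware facts (CC 2.0 parts have 32 CUDA cores per multiprocessor, CC 2.1 parts have 48) plus one assumption chosen only so the result matches the post: that the client keeps at most 32 cores per multiprocessor busy. That limit, and the function name usable_shaders, are illustrative and not taken from GPUGrid documentation.

```python
# CUDA cores per streaming multiprocessor (SM) for the Fermi generation.
CORES_PER_SM = {"2.0": 32, "2.1": 48}

# Assumption (not from GPUGrid docs): at most 32 cores per SM are usable,
# picked so the result matches the 256 figure quoted in the post.
ASSUMED_USABLE_PER_SM = 32


def usable_shaders(total_cores, compute_capability):
    """Estimate how many shaders the client can use under the assumption above."""
    per_sm = CORES_PER_SM[compute_capability]
    sm_count = total_cores // per_sm
    return sm_count * min(per_sm, ASSUMED_USABLE_PER_SM)


print(usable_shaders(384, "2.1"))  # GTX 560 Ti: 8 SMs x 32 = 256
print(usable_shaders(448, "2.0"))  # GTX 560 Ti 448: 14 SMs x 32 = 448
```

	Under that assumption the regular GTX 560 Ti ends up with 256 of its 384 shaders in use, while the 448-core card keeps all of them.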

Ebonydogx Joined: 19 Jun 12 Posts: 11 Credit: 51,704,550 RAC: 0	Message 26493 - Posted: 28 Jul 2012 | 3:56:45 UTC - in response to Message 26491.
	No argument on which cards are currently fastest out there. Nevertheless, the 560ti can comfortably finish all existing tasks in 8-12 hours with the exception of this latest group from Noelia. That's all I meant.
	ID: 26493 | Rating: 0


