
Message boards : News : New systems in Long queue

noelia
Joined: 5 Jul 12
Posts: 35
Credit: 393,375
RAC: 0
Message 27733 - Posted: 19 Dec 2012 | 16:41:25 UTC

Hi all!

A good number of new WUs will be available in the long queue over the next few weeks. The systems are called hfXA_long and will provide around 90,000 credits each.

Thanks and Merry Christmas to all!!
Noelia

TheFiend
Joined: 26 Aug 11
Posts: 99
Credit: 2,500,112,138
RAC: 0
Message 27737 - Posted: 19 Dec 2012 | 19:14:34 UTC

And a Merry Xmas to you and all the team at GPUGRID

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,190,821,966
RAC: 10,571,630
Message 27742 - Posted: 19 Dec 2012 | 22:43:06 UTC - in response to Message 27733.
Last modified: 19 Dec 2012 | 22:50:45 UTC

My computers just downloaded a couple of them. They seem to be very long (it looks like about 14 to 16 hours before they finish on my computers). GPU usage is 90%+ on both Windows 7 and XP. I hope the upload files aren't too big, so I don't get an error for that.

What exactly are we crunching with these WUs, in layman's terms please?

Merry Christmas!

Profile dskagcommunity
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Message 27743 - Posted: 19 Dec 2012 | 23:13:03 UTC - in response to Message 27742.
Last modified: 19 Dec 2012 | 23:42:43 UTC

. I hope the upload files aren't too big, so I don't get an error for that.

Merry Christmas!


I hope so too. Just before Christmas I have a speed issue with the 3G connection of my main BOINC cluster: 2-6 kb/sec of shared upload overall isn't much :/

But yes, 99% GPU load is great :)

Merry Christmas to you too!
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,190,821,966
RAC: 10,571,630
Message 27744 - Posted: 20 Dec 2012 | 2:03:29 UTC

I hope I am wrong on this, but it looks like the output files for these work units are going to be larger than 128 MB. When one of the units was 25% done, the 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1_4 file was about 35 MB in size; at 33% done it was about 47 MB. If this projection holds, the file will be about 140 MB when the unit is 100% done, too large to upload unless you raise the size limit for upload files. I hate to see an otherwise successfully completed unit error out like this.
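The projection above can be reproduced with a simple linear extrapolation. This is only a sketch: it assumes the output file grows in proportion to progress, and the function name is made up for illustration.

```python
def project_final_size(size_mb: float, fraction_done: float) -> float:
    """Linearly extrapolate the final file size from one partial measurement."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return size_mb / fraction_done


LIMIT_MB = 128.0  # the 128000000-byte limit reported in the event log

# The two measurements quoted in the post: (size in MB, fraction done).
for size_mb, fraction in [(35.0, 0.25), (47.0, 0.33)]:
    projected = project_final_size(size_mb, fraction)
    print(f"{size_mb:.0f} MB at {fraction:.0%} done -> about {projected:.0f} MB "
          f"(over limit: {projected > LIMIT_MB})")
```

Both measurements point to a final size a little above 140 MB, which is over the 128 MB limit.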

This error occurred before:

5/8/2012 3:26:52 AM | GPUGRID | Computation for task 1H46_11_8-PAOLA_RNP-0-5-RND3163_0 finished
5/8/2012 3:26:52 AM | GPUGRID | Output file 1H46_11_8-PAOLA_RNP-0-5-RND3163_0_4 for task 1H46_11_8-PAOLA_RNP-0-5-RND3163_0 exceeds size limit.
5/8/2012 3:26:52 AM | GPUGRID | File size: 131283476.000000 bytes. Limit: 128000000.000000 byte

http://www.gpugrid.net/forum_thread.php?id=2970#24795

But you still have an opportunity to correct this.

flashawk
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Message 27745 - Posted: 20 Dec 2012 | 3:32:51 UTC - in response to Message 27744.

I'm uploading one right now; it's 109.95 MB and it took 8 hours 15 minutes to complete.

werdwerdus
Joined: 15 Apr 10
Posts: 123
Credit: 1,004,473,861
RAC: 0
Message 27747 - Posted: 20 Dec 2012 | 4:46:51 UTC
Last modified: 20 Dec 2012 | 4:48:03 UTC

Yep, they get compressed before being uploaded, right? With gzip or something. I have 2 uploading right now. They took 9:22 and 9:09 on two GTX 660 Ti GPUs, 109.95 MB each, with 97-98% GPU utilization on WinXP.
____________
XtremeSystems.org - #1 Team in GPUGrid

TheFiend
Joined: 26 Aug 11
Posts: 99
Credit: 2,500,112,138
RAC: 0
Message 27748 - Posted: 20 Dec 2012 | 8:12:47 UTC

I've had one error out on one of my GTX670's :(

11 hours.. :(



Name 1x22_18-NOELIA_hfXA_long-0-2-RND2038_0
Workunit 3964872
Created 19 Dec 2012 | 15:06:36 UTC
Sent 19 Dec 2012 | 18:43:06 UTC
Received 20 Dec 2012 | 7:46:04 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x0)
Computer ID 109019
Report deadline 24 Dec 2012 | 18:43:06 UTC
Run time 39,198.89
CPU time 38,715.33
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v6.16 (cuda42)
Stderr output

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<stderr_txt>
MDIO: cannot open file "restart.coor"
# Time per step (avg over 6250000 steps): 6.274 ms
# Approximate elapsed time for entire WU: 39213.953 s
called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>1x22_18-NOELIA_hfXA_long-0-2-RND2038_0_4</file_name>
<error_code>-131</error_code>
</file_xfer_error>

</message>
]]>
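As a sanity check on the stderr figures above, the elapsed time follows from the step count and the average time per step. A small sketch using only the numbers printed in the log (the function name is illustrative):

```python
def estimated_runtime_s(steps: int, ms_per_step: float) -> float:
    """Total runtime in seconds from step count and average milliseconds per step."""
    return steps * ms_per_step / 1000.0


runtime = estimated_runtime_s(6_250_000, 6.274)
print(f"{runtime:.1f} s = {runtime / 3600:.2f} h")
```

This gives roughly 39,212 s, or about 10.9 hours, in line with the logged 39213.953 s; the small difference comes from the rounded per-step average.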


Had another complete in 9 hours.

Profile Gattorantolo [Ticino]
Joined: 29 Dec 11
Posts: 44
Credit: 251,211,525
RAC: 0
Message 27749 - Posted: 20 Dec 2012 | 9:07:23 UTC - in response to Message 27748.
Last modified: 20 Dec 2012 | 9:08:27 UTC

Me too, after 10 hours... the task had already finished :-((( GTX 680
What's the problem?
____________
Member of Boinc Italy.

Profile Gattorantolo [Ticino]
Joined: 29 Dec 11
Posts: 44
Credit: 251,211,525
RAC: 0
Message 27751 - Posted: 20 Dec 2012 | 10:48:18 UTC - in response to Message 27749.
Last modified: 20 Dec 2012 | 10:49:10 UTC

Me too, after 10 hours... the task had already finished :-((( GTX 680
What's the problem?

A second error after 11 hours... I'm stopping the "long runs"... I'm crunching ACEMD standard now!!!!!
____________
Member of Boinc Italy.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 27752 - Posted: 20 Dec 2012 | 11:08:57 UTC - in response to Message 27751.

I have increased the upload size limit to 200 MB now, but this will take effect only for new results. I guess the file size is right at the borderline of 128 MB (the previous limit), so it depends on the compression.

It's our mistake as the internal test did not complain, but now it's a bit of a problem to cancel them.

gdf

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,190,821,966
RAC: 10,571,630
Message 27753 - Posted: 20 Dec 2012 | 11:22:12 UTC - in response to Message 27744.

I hope I am wrong on this, but it looks like the output files for these work units are going to be larger than 128 MB. When one of the units was 25% done, the 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1_4 file was about 35 MB in size; at 33% done it was about 47 MB. If this projection holds, the file will be about 140 MB when the unit is 100% done, too large to upload unless you raise the size limit for upload files. I hate to see an otherwise successfully completed unit error out like this.

This error occurred before:

5/8/2012 3:26:52 AM | GPUGRID | Computation for task 1H46_11_8-PAOLA_RNP-0-5-RND3163_0 finished
5/8/2012 3:26:52 AM | GPUGRID | Output file 1H46_11_8-PAOLA_RNP-0-5-RND3163_0_4 for task 1H46_11_8-PAOLA_RNP-0-5-RND3163_0 exceeds size limit.
5/8/2012 3:26:52 AM | GPUGRID | File size: 131283476.000000 bytes. Limit: 128000000.000000 byte

http://www.gpugrid.net/forum_thread.php?id=2970#24795

But you still have an opportunity to correct this.



Here is the reason why they failed, from my event log:

12/20/2012 6:17:55 AM | GPUGRID | Computation for task 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1 finished
12/20/2012 6:17:55 AM | GPUGRID | Output file 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1_4 for task 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1 exceeds size limit.
12/20/2012 6:17:55 AM | GPUGRID | File size: 144107276.000000 bytes. Limit: 128000000.000000 bytes
12/20/2012 6:18:08 AM | GPUGRID | Started upload of 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1_0
12/20/2012 6:18:08 AM | GPUGRID | Started upload of 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1_1
12/20/2012 6:18:16 AM | GPUGRID | Finished upload of 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1_0
12/20/2012 6:18:16 AM | GPUGRID | Started upload of 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1_2
12/20/2012 6:18:45 AM | GPUGRID | Finished upload of 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1_1
12/20/2012 6:18:45 AM | GPUGRID | Started upload of 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1_3
12/20/2012 6:19:03 AM | GPUGRID | Finished upload of 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1_2
12/20/2012 6:19:03 AM | GPUGRID | Started upload of 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1_7
12/20/2012 6:19:04 AM | GPUGRID | Finished upload of 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1_3
12/20/2012 6:19:05 AM | GPUGRID | Finished upload of 1x1_10-NOELIA_hfXA_long-0-2-RND6540_1_7
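For anyone who wants to scan their own event log for this condition, here is a minimal sketch that pulls the file size and the limit out of a "File size ... Limit ..." line like the ones above. The regular expression is an assumption based only on the lines quoted in this thread, not on any documented BOINC log format.

```python
import re

# One of the event-log lines quoted above.
LINE = ("12/20/2012 6:17:55 AM | GPUGRID | File size: 144107276.000000 bytes. "
        "Limit: 128000000.000000 bytes")

# "bytes?" also accepts the truncated "byte" seen in the older log excerpt.
PATTERN = re.compile(r"File size: (\d+(?:\.\d+)?) bytes?\. Limit: (\d+(?:\.\d+)?) bytes?")


def parse_size_limit(line: str):
    """Return (file_size, limit) in bytes, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


size, limit = parse_size_limit(LINE)
print(f"over the limit by {size - limit:.0f} bytes ({size / limit - 1:.1%})")
```

For the line shown, the file is 16,107,276 bytes over the limit, i.e. about 12.6% too large.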

Guys, you're supposed to learn from your mistakes, not repeat them.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 27754 - Posted: 20 Dec 2012 | 11:29:36 UTC - in response to Message 27753.
Last modified: 20 Dec 2012 | 11:30:44 UTC

Ok.
We have checked in the DB. There are 87 failures like this and 1800 successes for this batch.
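To put those database counts in perspective, the batch-wide failure rate at that point can be worked out directly. This uses only the two numbers given above:

```python
failures, successes = 87, 1800

# Share of all returned results in this batch that failed on upload.
rate = failures / (failures + successes)
print(f"{failures} of {failures + successes} results failed: {rate:.1%}")
```

That is a little under 5% of the results returned so far.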

As I said, the problem is that the submission script did not pick it up.

All new results will have a limit of 256MB.

gdf

Starting with the new application in January, we expect to upload much smaller files, about 1/3 of the current size.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,190,821,966
RAC: 10,571,630
Message 27755 - Posted: 20 Dec 2012 | 12:18:22 UTC - in response to Message 27754.

Well, it happened again.

12/20/2012 6:58:25 AM | GPUGRID | Output file 1x17_3-NOELIA_hfXA_long-0-2-RND7641_0_4 for task 1x17_3-NOELIA_hfXA_long-0-2-RND7641_0 exceeds size limit.
12/20/2012 6:58:25 AM | GPUGRID | File size: 144107276.000000 bytes. Limit: 128000000.000000 bytes
12/20/2012 6:58:36 AM | GPUGRID | Starting task 10x12_4-NOELIA_hfXA_long-0-2-RND0606_1 using acemdlong version 616 (cuda42) in slot 2


http://www.gpugrid.net/result.php?resultid=6222298

I have three more of these units crunching right now. I hope this doesn't happen again.

Profile Gattorantolo [Ticino]
Joined: 29 Dec 11
Posts: 44
Credit: 251,211,525
RAC: 0
Message 27756 - Posted: 20 Dec 2012 | 13:00:18 UTC - in response to Message 27755.

What do we have to do? 23 hours of GPU work for nothing... :-(
____________
Member of Boinc Italy.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 27757 - Posted: 20 Dec 2012 | 13:53:06 UTC - in response to Message 27756.

Can you manually increase the limit or at least see how much it is?

Is your version of the BOINC client compressing the files? I don't understand why it works for some and not for a few others.

gdf

TheFiend
Joined: 26 Aug 11
Posts: 99
Credit: 2,500,112,138
RAC: 0
Message 27761 - Posted: 20 Dec 2012 | 15:31:28 UTC

I've just had another one fail at upload!!! :(

[AF>Belgique] bill1170
Joined: 4 Jan 09
Posts: 13
Credit: 835,602,199
RAC: 646,671
Message 27762 - Posted: 20 Dec 2012 | 15:34:55 UTC - in response to Message 27757.

Same problem
"20/12/2012 12:40:09 | GPUGRID | Output file 1x18_5-NOELIA_hfXA_long-0-2-RND1979_0_4 for task 1x18_5-NOELIA_hfXA_long-0-2-RND1979_0 exceeds size limit.
20/12/2012 12:40:09 | GPUGRID | File size: 144107276.000000 bytes. Limit: 128000000.000000 bytes"

with this one :
http://www.gpugrid.net/workunit.php?wuid=3964747

on GTX660Ti XP32 Boinc 7.0.28

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 27763 - Posted: 20 Dec 2012 | 15:36:58 UTC

This is very very bad.
I have only one successful NOELIA_hfXA_long task (it was the first one) and 9 failures; 8 of the failures occurred because the upload limit was exceeded.
Maybe the BOINC manager (on Windows) has this 128 MB upload limit, so it can't be fixed on the server side: I've gotten this error again since the increase.

Profile dskagcommunity
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Message 27764 - Posted: 20 Dec 2012 | 16:29:22 UTC

Oh man... it's hard knowing that all my fresh WUs will error out too in 14-18 hours, and I can't abort them all... "50M by the end of the year", bye-bye -_-
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Rantanplan
Joined: 22 Jul 11
Posts: 166
Credit: 138,629,987
RAC: 0
Message 27766 - Posted: 20 Dec 2012 | 16:44:38 UTC - in response to Message 27763.
Last modified: 20 Dec 2012 | 17:16:22 UTC

OK, I raised the transfer limit to 1500,00 MB per day!? Will that affect anything?

OK, I found this, but will that help? I don't even understand it.

http://boinc.berkeley.edu/trac/wiki/JobSubmission

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,196,461,293
RAC: 1,617,116
Message 27767 - Posted: 20 Dec 2012 | 17:08:20 UTC

Two here as well!
20/12/2012 12:30:29 a.m. | GPUGRID | Output file 1x13_10-NOELIA_hfXA_long-0-2-RND5009_0_4 for task 1x13_10-NOELIA_hfXA_long-0-2-RND5009_0 exceeds size limit.
20/12/2012 12:30:29 a.m. | GPUGRID | File size: 144107276.000000 bytes. Limit: 128000000.000000 bytes
20/12/2012 11:49:34 a.m. | GPUGRID | Output file 1x31_12-NOELIA_hfXA_long-0-2-RND4384_1_4 for task 1x31_12-NOELIA_hfXA_long-0-2-RND4384_1 exceeds size limit.
20/12/2012 11:49:34 a.m. | GPUGRID | File size: 144107276.000000 bytes. Limit: 128000000.000000 bytes
And one surely failed on my other computer as well.

Is there a way to identify the WUs GDF has corrected? The failed ones took me over 11 hours to crunch.

valterc
Joined: 21 Jun 10
Posts: 21
Credit: 6,161,484,672
RAC: 4,196,238
Message 27768 - Posted: 20 Dec 2012 | 18:09:57 UTC
Last modified: 20 Dec 2012 | 18:44:03 UTC

ERR_FILE_TOO_BIG -131

One of the output files is bigger than the maximum set by the project for upload.
BOINC will not try to upload this file.

Solution: Go to the project's forums and report this behavior.

Nothing can be done client-side. I don't know if any modifications done server-side (I guess that a restart is needed) will actually reflect on already distributed workunits...

[EDIT]

I have had a WU crunching for ~7 hours and I hope I will not lose it... Inside "client_state.xml" I found some lines about "name_of_the_wu_4", which is the larger of the output files made while crunching: <max_nbytes>128000000.000000</max_nbytes>. I will change the value to a much larger one, restart BOINC and see what happens...

TheFiend
Joined: 26 Aug 11
Posts: 99
Credit: 2,500,112,138
RAC: 0
Message 27769 - Posted: 20 Dec 2012 | 18:23:26 UTC - in response to Message 27761.

I've just had another one fail at upload!!! :(


And a 3rd one :(

Rantanplan
Joined: 22 Jul 11
Posts: 166
Credit: 138,629,987
RAC: 0
Message 27770 - Posted: 20 Dec 2012 | 18:28:36 UTC - in response to Message 27769.
Last modified: 20 Dec 2012 | 18:29:35 UTC

I went wrong.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 27771 - Posted: 20 Dec 2012 | 18:32:39 UTC - in response to Message 27770.
Last modified: 20 Dec 2012 | 18:35:31 UTC

So the situation is this.

All the new WUs SENT after we made the change this morning are OK.

The ones that were already queued in your client before this morning in Barcelona have a 50% probability of failing the upload, so cancel them.

gdf

Check this value in client_state.xml:
<max_nbytes>128000000.000000</max_nbytes>

valterc
Joined: 21 Jun 10
Posts: 21
Credit: 6,161,484,672
RAC: 4,196,238
Message 27772 - Posted: 20 Dec 2012 | 18:34:55 UTC - in response to Message 27771.

So the situation is this one.

All the new WUs SENT after we have made the change this morning are ok.

The ones that were already queued in your client before this morning in Barcelona have a 50% probability of failing the upload, so cancel them.

gdf
or try what I suggested a few posts ago....

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 27773 - Posted: 20 Dec 2012 | 18:35:56 UTC - in response to Message 27772.

Yes do that.

gdf

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 27774 - Posted: 20 Dec 2012 | 19:03:05 UTC - in response to Message 27768.
Last modified: 20 Dec 2012 | 19:12:16 UTC

I have had a WU crunching for ~7 hours and I hope I will not lose it... Inside "client_state.xml" I found some lines about "name_of_the_wu_4", which is the larger of the output files made while crunching: <max_nbytes>128000000.000000</max_nbytes>. I will change the value to a much larger one, restart BOINC and see what happens...

I did it, and it's working!

1. Exit BOINC Manager, choosing to stop the running scientific applications as well
2. Locate the client_state.xml file and open it with a text editor
-- On Windows XP: notepad.exe "C:\Documents and Settings\All Users\Application Data\BOINC\client_state.xml"
-- On Windows 7: notepad.exe "C:\ProgramData\BOINC\client_state.xml"
3. Search for <max_nbytes>128000000.000000</max_nbytes> and replace it with <max_nbytes>198000000.000000</max_nbytes>
4. Save and exit
5. Restart BOINC Manager.
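The search-and-replace in step 3 could also be scripted. The following is only a sketch under the assumptions stated in this thread: the old limit appears literally as a <max_nbytes> element, BOINC is fully stopped before the file is touched, and you have made a backup copy first. The function name is made up, and reading and writing the actual file is deliberately left out.

```python
import re

OLD_LIMIT = 128_000_000.0
NEW_LIMIT = 198_000_000.0  # the replacement value used in the steps above

MAX_NBYTES = re.compile(r"<max_nbytes>(\d+(?:\.\d+)?)</max_nbytes>")


def raise_upload_limit(xml_text: str, old: float = OLD_LIMIT, new: float = NEW_LIMIT) -> str:
    """Replace every <max_nbytes> equal to `old` with `new`; leave other values alone."""
    def swap(match: re.Match) -> str:
        if float(match.group(1)) == old:
            return f"<max_nbytes>{new:.6f}</max_nbytes>"
        return match.group(0)

    return MAX_NBYTES.sub(swap, xml_text)


# Illustrative fragment only; the file name here is a placeholder.
sample = ("<file>\n<name>example_task_4</name>\n"
          "<max_nbytes>128000000.000000</max_nbytes>\n</file>\n")
print(raise_upload_limit(sample))
```

Entries that already carry a different limit (for example the 256000000 seen on newer tasks) are left unchanged, so running it twice does no harm.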

Rantanplan
Joined: 22 Jul 11
Posts: 166
Credit: 138,629,987
RAC: 0
Message 27775 - Posted: 20 Dec 2012 | 19:05:12 UTC - in response to Message 27773.
Last modified: 20 Dec 2012 | 19:10:59 UTC

Now it must be right.


<file>
<name>1x29_7-NOELIA_hfXA_long-0-2-RND2127_2_4</name>
<nbytes>0.000000</nbytes>
<max_nbytes>256000000.000000</max_nbytes>
<status>0</status>
<upload_url>http://www.gpugrid.org/PS3GRID_cgi/file_upload_handler</upload_url>
</file>


Crunching this task; it will surely work again.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 27779 - Posted: 20 Dec 2012 | 22:57:28 UTC - in response to Message 27775.

Why are you not using <max_nbytes>0.000000</max_nbytes> ?

____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 27780 - Posted: 20 Dec 2012 | 23:07:42 UTC - in response to Message 27779.

I did not even know about it.

gdf

GPUGRID
Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Message 27781 - Posted: 20 Dec 2012 | 23:36:44 UTC

OK... I'm new to this beautiful project, and I'm not confident about editing things... But do I understand correctly that I can go back to long runs, because the ones that are being split now will work without any change?
I had a lot of 10-11 hour runs wasted and don't want to lose it all again...

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,190,821,966
RAC: 10,571,630
Message 27783 - Posted: 21 Dec 2012 | 2:21:41 UTC

I finally got one to upload successfully, after 4 failures, and with 3 more crunching!!!



http://www.gpugrid.net/result.php?resultid=6241005



Profile dskagcommunity
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Message 27785 - Posted: 21 Dec 2012 | 9:17:08 UTC

Yes they are back running, you can crunch on.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



GPUGRID
Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Message 27786 - Posted: 21 Dec 2012 | 10:42:22 UTC - in response to Message 27785.

Thank you mate, they are reporting fine now!

Profile ritterm
Joined: 31 Jul 09
Posts: 88
Credit: 244,413,897
RAC: 0
Message 27787 - Posted: 21 Dec 2012 | 12:13:09 UTC - in response to Message 27771.
Last modified: 21 Dec 2012 | 12:31:25 UTC

On 20 Dec 12, GDF wrote...

All the new WUs SENT after we have made the change this morning are ok.

So, just to make sure, I should be okay having downloaded WU 6224269 on 21 Dec, even though it was created on 19 Dec?

Thanks,

MarkR
____________

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 27788 - Posted: 21 Dec 2012 | 13:24:16 UTC - in response to Message 27787.

@ritterm - It should be ok. It's the "sent time" that counts for this issue.

You can check the <max_nbytes> parameter anyway, as above, to be sure.

Profile ritterm
Joined: 31 Jul 09
Posts: 88
Credit: 244,413,897
RAC: 0
Message 27789 - Posted: 21 Dec 2012 | 14:29:15 UTC - in response to Message 27788.

@ritterm - It should be ok. It's the "sent time" that counts for this issue.

You can check the <max_nbytes> parameter anyway, as above, to be sure.


Thanks, Toni. I checked the client_state file and there appear to be numerous files with different <max_nbytes> settings for the WU. Again, just so I'm sure I've got this right, the output file is the one ending in "_4"? If so, I think I'm okay:

<file>
<name>10x6_4-NOELIA_hfXA_long-0-2-RND8944_0_4</name>
<nbytes>0.000000</nbytes>
<max_nbytes>256000000.000000</max_nbytes>
<status>0</status>
<upload_url>http://www.gpugrid.org/PS3GRID_cgi/file_upload_handler</upload_url>
</file>
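A check like this can be done without reading through the file by hand. The sketch below lists the <max_nbytes> value for every file entry whose name ends in "_4". It assumes the <file> layout shown in the snippet above and treats the text with regular expressions rather than an XML parser; the function name is illustrative.

```python
import re


def output_file_limits(xml_text: str, suffix: str = "_4") -> dict:
    """Map each file name ending in `suffix` to its <max_nbytes> value in bytes."""
    limits = {}
    # Look at one <file>...</file> block at a time so values never mix between files.
    for block in re.findall(r"<file>(.*?)</file>", xml_text, re.DOTALL):
        name = re.search(r"<name>([^<]+)</name>", block)
        limit = re.search(r"<max_nbytes>([^<]+)</max_nbytes>", block)
        if name and limit and name.group(1).endswith(suffix):
            limits[name.group(1)] = float(limit.group(1))
    return limits


# The entry quoted above.
sample = """<file>
<name>10x6_4-NOELIA_hfXA_long-0-2-RND8944_0_4</name>
<nbytes>0.000000</nbytes>
<max_nbytes>256000000.000000</max_nbytes>
<status>0</status>
<upload_url>http://www.gpugrid.org/PS3GRID_cgi/file_upload_handler</upload_url>
</file>"""

for name, limit in output_file_limits(sample).items():
    print(f"{name}: {limit / 1e6:.0f} MB")
```

Any "_4" file still listed at 128 MB would be one of the tasks sent before the limit was raised.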
____________

pvh
Joined: 17 Mar 10
Posts: 23
Credit: 1,173,824,416
RAC: 0
Message 27791 - Posted: 21 Dec 2012 | 17:44:20 UTC

I had one of these failures too today:

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<stderr_txt>
MDIO: cannot open file "restart.coor"
# Time per step (avg over 6250000 steps): 15.773 ms
# Approximate elapsed time for entire WU: 98580.515 s
03:24:41 (23694): called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>1x3_1-NOELIA_hfXA_long-0-2-RND0218_1_4</file_name>
<error_code>-131</error_code>
</file_xfer_error>

</message>
]]>


100,000 seconds of work down the toilet... :( Please fix this!

Vinnidikt
Joined: 6 Mar 12
Posts: 1
Credit: 61,673,501
RAC: 0
Message 27792 - Posted: 21 Dec 2012 | 22:38:49 UTC

http://www.gpugrid.net/results.php?userid=86706
37 hours wasted...

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 27793 - Posted: 22 Dec 2012 | 2:50:04 UTC

All 4 of my machines failed these after between 24 and 25 hours each, so about 99 hours wasted. Since no TONI WUs seem to be available, had to switch projects for a while :-(

Profile ritterm
Joined: 31 Jul 09
Posts: 88
Credit: 244,413,897
RAC: 0
Message 27794 - Posted: 22 Dec 2012 | 3:51:39 UTC

The task I referred to in my previous message finished okay.
____________

Profile Gattorantolo [Ticino]
Joined: 29 Dec 11
Posts: 44
Credit: 251,211,525
RAC: 0
Message 27795 - Posted: 22 Dec 2012 | 6:07:22 UTC - in response to Message 27794.
Last modified: 22 Dec 2012 | 6:07:52 UTC

Now it's working :-) 3x Noelia WU, 135,000 credits each :-)
____________
Member of Boinc Italy.

Jari Pyyluoma
Joined: 2 Aug 08
Posts: 12
Credit: 1,165,835,704
RAC: 0
Message 27801 - Posted: 22 Dec 2012 | 11:51:18 UTC - in response to Message 27795.

Why do you allow 7 errors? That seems wasteful.

name 1x40_14-NOELIA_hfXA_long-0-2-RND7079
application Long runs (8-12 hours on fastest card)
created 19 Dec 2012 | 15:18:38 UTC
minimum quorum 1
initial replication 1
max # of error/total/success tasks 7, 10, 6
Task  Computer  Sent  Time reported or deadline  Status  Run time (sec)  CPU time (sec)  Credit  Application
6222894 107753 19 Dec 2012 | 23:14:47 UTC 20 Dec 2012 | 23:43:07 UTC Error while computing 47,292.92 47,269.47 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
6245526 141259 21 Dec 2012 | 1:58:49 UTC 21 Dec 2012 | 4:16:55 UTC Error while computing 2,685.14 100.26 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
6246153 141700 21 Dec 2012 | 7:20:11 UTC 21 Dec 2012 | 9:59:23 UTC Error while computing 4.46 2.36 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
6246927 140345 21 Dec 2012 | 17:45:40 UTC 26 Dec 2012 | 17:45:40 UTC In progress --- --- --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)

Rantanplan
Joined: 22 Jul 11
Posts: 166
Credit: 138,629,987
RAC: 0
Message 27803 - Posted: 22 Dec 2012 | 13:00:08 UTC - in response to Message 27801.

http://www.gpugrid.net/result.php?resultid=6224928

Absolutely clueless. What happened? 38,000 seconds down the drain.

cciechad
Joined: 28 Dec 10
Posts: 13
Credit: 37,543,525
RAC: 0
Message 27809 - Posted: 22 Dec 2012 | 21:11:58 UTC - in response to Message 27803.

That's a different error.

ERROR: file deven.cpp line 1106: # Energies have become nan

It means the simulation went into a state that was not physically possible and was aborted.
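To illustrate the kind of check that produces such a message: a simulation can test its energy terms for NaN ("not a number") after each step and stop when one appears. The sketch below is purely hypothetical and is not ACEMD's code; the function and the energy term names are invented for the example.

```python
import math


def check_energies(energies: dict) -> None:
    """Raise if any energy term is NaN (hypothetical check, not ACEMD's code)."""
    bad = [name for name, value in energies.items() if math.isnan(value)]
    if bad:
        raise FloatingPointError(f"Energies have become nan: {', '.join(bad)}")


check_energies({"bond": -12.5, "coulomb": 3.1})  # finite values pass silently

try:
    check_energies({"bond": float("nan"), "coulomb": 3.1})
except FloatingPointError as err:
    print(err)
```

Once a NaN appears it propagates through every later calculation, so aborting is the only sensible response.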

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Message 27822 - Posted: 24 Dec 2012 | 12:53:55 UTC - in response to Message 27795.

Now it's working :-) 3x Noelia WU, 135,000 credits each :-)

Just had "Error while computing" 16 hours in...
____________

Profile Chilean
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Message 27823 - Posted: 24 Dec 2012 | 16:53:19 UTC

I haven't had a problem with these new WUs, but good lord, these WUs are HUGE. It takes my 660M about 30 hours to finish one (and it is heavily overclocked).

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 27824 - Posted: 24 Dec 2012 | 19:25:03 UTC - in response to Message 27823.

Careful with the OC's; these tasks might consume slightly more power.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,190,821,966
RAC: 10,571,630
Message 27825 - Posted: 24 Dec 2012 | 20:56:24 UTC

Now that we have these units uploading successfully, I noticed that the server status lists 22,000+ work units as unsent. I have never seen that number so high, and for the last few days I have been getting nothing but these Noelia units. Putting 2 and 2 together, I ask the question again: what's the big thing that we are crunching here?

TheFiend
Joined: 26 Aug 11
Posts: 99
Credit: 2,500,112,138
RAC: 0
Message 27827 - Posted: 24 Dec 2012 | 23:05:40 UTC - in response to Message 27825.

Now that we have these units uploading successfully, I noticed that the server status lists 22,000+ work units as unsent. I have never seen that number so high, and for the last few days I have been getting nothing but these Noelia units. Putting 2 and 2 together, I ask the question again: what's the big thing that we are crunching here?


Maybe they're just keeping stocked up for Xmas!

Profile dskagcommunity
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Message 27829 - Posted: 25 Dec 2012 | 10:20:25 UTC

I think so too. Scientists want to have Xmas too ;)
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Profile dskagcommunity
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Message 27830 - Posted: 25 Dec 2012 | 10:49:37 UTC
Last modified: 25 Dec 2012 | 10:50:59 UTC

I'm hoping even more that they soon switch to cuda42 only, with its smaller upload files. One of my cards needs only 13 hours to compute, but the 3G connection has upload speed problems and I need over 14 hours to upload. That blocks new WUs from downloading, and I miss the 24h bonus too :( Meanwhile the card which needs over 18 hours gets the full bonus thanks to high-speed internet. What a shame ^^



cciechad
Joined: 28 Dec 10
Posts: 13
Credit: 37,543,525
RAC: 0
Level
Val
Message 27831 - Posted: 25 Dec 2012 | 16:30:45 UTC

The Cuda31 versions of these tasks are very slow: 24h+ on a slightly factory-overclocked 560 Ti. The 4.2 tasks all work much better; at most they take 16 hours (always at least 25% faster, sometimes up to 40% faster). Unfortunately the server is handing out about 25% Cuda31 tasks to 4.2-capable machines, and I always catch them too late to make aborting worthwhile.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Message 27839 - Posted: 26 Dec 2012 | 0:45:01 UTC - in response to Message 27824.

Careful with the OC's; these tasks might consume slightly more power.

It's too bad that the credit/time ratio of these workunits doesn't reflect this.

Profile Chilean
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Level
Asp
Message 27840 - Posted: 26 Dec 2012 | 5:26:19 UTC - in response to Message 27839.

The 30+ hours per WU is ruining my RAC :P

I'm crunching longer and getting less credit haha, oh well. At least these WUs don't lag my graphics.

Profile Lazarus-uk
Joined: 16 Nov 08
Posts: 29
Credit: 122,821,515
RAC: 0
Level
Cys
Message 27841 - Posted: 26 Dec 2012 | 8:59:29 UTC


Maybe they would consider extending the 24hr bonus time to 36hrs or even 48hrs for these longer WUs. A lot of people seem to be struggling to finish them in 24hrs.


Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 27843 - Posted: 26 Dec 2012 | 10:59:49 UTC - in response to Message 27841.

Sounds like a reasonable request given that the tasks are so long (~17h on a GTX470) and the Work Units are only two steps deep. However, all 20,000 WUs should be completed by early January, so it's a short experiment, and it would be a lot of work to redo the credit system for a single batch of tasks - something the researchers would not want to be doing for every new batch!
Anyway, this 2-task-deep research model is a test, and might not be adopted.

See Gianni's recent New Year experiment post.

Profile Chilean
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Level
Asp
Message 27846 - Posted: 26 Dec 2012 | 23:38:11 UTC - in response to Message 27843.

What do they mean by "2-step deep" ?

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Message 27847 - Posted: 27 Dec 2012 | 1:42:01 UTC - in response to Message 27846.
Last modified: 27 Dec 2012 | 1:51:38 UTC

What do they mean by "2-step deep" ?

In this project every workunit we process is a piece of a "longer" Molecular Dynamics simulation. When a workunit is finished, uploaded and validated, its result is sent to another host, which continues the MD simulation from where the previous one finished, until the whole given timeframe of the MD simulation is completed. For example, if the whole timeframe for the MD simulation is 1 µs, it will be divided into 100 workunits of 10 ns each (aka "steps"). In the case of NOELIA_hfXA_long, however, there are only 2 such pieces.
On the server status page there are 22,000 unsent long workunits right now, and in this case each of them will be processed twice; in other cases there are only 2,200 unsent long workunits, but those are usually processed 100 times.
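
To put rough numbers on this, here is a small Python sketch of the arithmetic. The function names are made up for illustration, and the figures are just the ones quoted above (1 µs split into 10 ns steps, 22,000 unsent workunits at 2 steps versus 2,200 at 100 steps). It is not a description of how the project server actually schedules work.

```python
# Illustrative arithmetic only; all figures come from the post above.

def steps_per_simulation(total_ns: int, step_ns: int) -> int:
    """Number of consecutive workunits ("steps") needed to cover one simulation."""
    return total_ns // step_ns


def total_tasks(unsent_workunits: int, steps: int) -> int:
    """Tasks eventually processed if every unsent workunit runs `steps` deep."""
    return unsent_workunits * steps


example_steps = steps_per_simulation(total_ns=1000, step_ns=10)  # 1 µs in 10 ns pieces
noelia_tasks = total_tasks(unsent_workunits=22_000, steps=2)
typical_tasks = total_tasks(unsent_workunits=2_200, steps=example_steps)

print(example_steps, noelia_tasks, typical_tasks)
```

So a large "unsent" count with only 2 steps can still mean less total work than a smaller count that runs 100 steps deep.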

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 27849 - Posted: 27 Dec 2012 | 16:15:22 UTC - in response to Message 27841.

... A lot of people seem to be struggling to finish them in 24hrs.

How about this for a down-to-the-wire full bonus WU:



And this after a 2hr:40min upload!

It does seem a bit daft that upload time should be added to processing time. Doesn't a done WU package include processing-end time??

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 27850 - Posted: 27 Dec 2012 | 17:24:32 UTC - in response to Message 27849.

Credit is applied when a task is reported; it's not based on runtime. While you obviously have a point, report time covers the time it takes to download, run, upload and report a task.
Tasks aren't of any research use until they are uploaded. Reporting is different; it's a separate tiny upload, mainly for credits, and goes to a different database.

Good to see you managed to get that task reported in time. I suggest people lower their cache (where possible) if they are close to missing out on a time bonus.
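
A rough Python sketch of why a fast card on a slow link can miss the bonus while a slower card on a fast link gets it. It assumes the bonus depends only on total turnaround (download + run + upload) against the 24-hour window. The function names, the half-hour upload on the fast link, and treating exactly 24 h as still in time are assumptions for illustration, not the server's actual rule.

```python
BONUS_WINDOW_H = 24.0  # bonus window discussed in this thread


def turnaround_hours(download_h: float, run_h: float, upload_h: float) -> float:
    """Time from receiving a task to having it uploaded and ready to report."""
    return download_h + run_h + upload_h


def within_bonus_window(download_h: float, run_h: float, upload_h: float,
                        window_h: float = BONUS_WINDOW_H) -> bool:
    """True if the whole turnaround fits the window (boundary handling is assumed)."""
    return turnaround_hours(download_h, run_h, upload_h) <= window_h


# About 13 h of crunching plus a 14 h upload over a slow 3G link (figures from the thread)
slow_link = within_bonus_window(download_h=0.0, run_h=13.0, upload_h=14.0)
# About 18 h of crunching with an assumed half-hour upload on a fast link
fast_link = within_bonus_window(download_h=0.0, run_h=18.0, upload_h=0.5)

print(slow_link, fast_link)
```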

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 27851 - Posted: 27 Dec 2012 | 17:44:29 UTC - in response to Message 27825.
Last modified: 27 Dec 2012 | 17:48:41 UTC

Now that we have these units uploading successfully, I noticed that the server status has 22,000 + work units listed as unsent. I have never seen that number that high, and since in the last few days, I have been getting nothing but these Noelia units, putting 2 and 2 together, I ask that question again, what's the big thing that we are crunching here?

I've been able to get nothing but the Noelia WUs either. They take too long to run to meet the 24hr deadline so have moved the NVidias to other projects. The Toni WUs are the only ones that work OK on my 4 cards as the Nathans take too much memory to run efficiently. Will check back from time to time as I would really like to be running GPUGrid...

Profile Chilean
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Level
Asp
Message 27852 - Posted: 27 Dec 2012 | 18:13:28 UTC - in response to Message 27847.

What do they mean by "2-step deep" ?

In this project every workunit we process is a piece of a "longer" Molecular Dynamics simulation. When a workunit is finished, uploaded and validated, its result is sent to another host, which continues the MD simulation from where the previous one finished, until the whole given timeframe of the MD simulation is completed. For example, if the whole timeframe for the MD simulation is 1 µs, it will be divided into 100 workunits of 10 ns each (aka "steps"). In the case of NOELIA_hfXA_long, however, there are only 2 such pieces.
On the server status page there are 22,000 unsent long workunits right now, and in this case each of them will be processed twice; in other cases there are only 2,200 unsent long workunits, but those are usually processed 100 times.


Would this explain the size of these WU? (2 step deep means longer timeframe of simulation per WU, right?)

werdwerdus
Joined: 15 Apr 10
Posts: 123
Credit: 1,004,473,861
RAC: 0
Level
Met
Message 27853 - Posted: 27 Dec 2012 | 18:22:40 UTC - in response to Message 27851.
Last modified: 27 Dec 2012 | 18:23:10 UTC

Now that we have these units uploading successfully, I noticed that the server status has 22,000 + work units listed as unsent. I have never seen that number that high, and since in the last few days, I have been getting nothing but these Noelia units, putting 2 and 2 together, I ask that question again, what's the big thing that we are crunching here?

I've been able to get nothing but the Noelia WUs either. They take too long to run to meet the 24hr deadline so have moved the NVidias to other projects. The Toni WUs are the only ones that work OK on my 4 cards as the Nathans take too much memory to run efficiently. Will check back from time to time as I would really like to be running GPUGrid...


Why don't you switch to the short task queue? AFAIK they still get a 24 hour bonus and are at least 1/2 the length or less of the long tasks.
____________
XtremeSystems.org - #1 Team in GPUGrid

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 27854 - Posted: 27 Dec 2012 | 18:33:05 UTC - in response to Message 27853.

Why don't you switch to the short task queue?

Pretty please, tell me how to do that. Thanks!


Profile ritterm
Joined: 31 Jul 09
Posts: 88
Credit: 244,413,897
RAC: 0
Level
Leu
Message 27855 - Posted: 27 Dec 2012 | 20:06:52 UTC - in response to Message 27854.

Why don't you switch to the short task queue?

Pretty please, tell me how to do that. Thanks!

1) Go to your account area
2) Under "Preferences", select "GPUGRID Preferences"
3) Select "Edit GPUGRID preferences" for the setting appropriate for your host (default, home, school, work)
4) Under "Run only the selected applications", check "ACEMD standard" and be sure the others are not checked.

Hope that helps!

MarkR

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 27856 - Posted: 27 Dec 2012 | 20:34:49 UTC - in response to Message 27855.

1) Go to your account area

Hi Mark! Thanks for answering.

In BOINC I clicked "Your Account". Came here:



Me no see "Preferences"!!


Dylan
Joined: 16 Jul 12
Posts: 98
Credit: 386,043,752
RAC: 0
Level
Asp
Message 27857 - Posted: 27 Dec 2012 | 20:40:12 UTC
Last modified: 27 Dec 2012 | 20:43:57 UTC

Tomba, you are viewing your account information that everyone can see.

To view your preferences however, click on your name on the top right of the GPUGRID web site. It should be next to server status, and is above the Volunteers and the Science links. This will take you to the correct account area where you can change your preferences.

Look at the top of the picture you posted, and click on tomba.

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 27858 - Posted: 27 Dec 2012 | 20:51:03 UTC - in response to Message 27857.

Tomba, you are viewing your account information that everyone can see.

To view your preferences however, click on your name on the top right of the GPUGRID web site. It should be next to server status, and is above the Volunteers and the Science links. This will take you to the correct account area where you can change your preferences.

Look at the top of the picture you posted, and click on tomba.


Got it! Thank you!!

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 27859 - Posted: 28 Dec 2012 | 1:38:34 UTC - in response to Message 27854.

Why don't you switch to the short task queue?

Pretty please, tell me how to do that. Thanks!

It's in your project preferences. Be advised that the credit earned is much lower though.

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 27869 - Posted: 28 Dec 2012 | 19:08:15 UTC

Just had another Noelia "Error while computing", after eight hours of crunching. That's the sixth out of a total of 15.

Can I please have more Nathan benHPs...?

Dylan
Joined: 16 Jul 12
Posts: 98
Credit: 386,043,752
RAC: 0
Level
Asp
Message 27870 - Posted: 28 Dec 2012 | 19:33:27 UTC

Are you overclocking your card? Is your computer crashing? Things like that can cause these errors.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Message 27872 - Posted: 28 Dec 2012 | 21:07:59 UTC - in response to Message 27869.

Just had another Noelia "Error while computing", after eight hours of crunching. That's the sixth out of a total of 15.

Can I please have more Nathan benHPs...?

Obviously you can't have more of them.
Your error is an "Energies have become nan", so see the appropriate thread.

cciechad
Joined: 28 Dec 10
Posts: 13
Credit: 37,543,525
RAC: 0
Level
Val
Message 27873 - Posted: 29 Dec 2012 | 0:37:19 UTC

Is there any way to stop the old 31 WUs from hitting my machine other than checking and aborting manually? These are insanely slow on my old 560 Ti. The server reports 1000s of WUs ready to send, and when I abort a 31 WU it almost always sends a 42 task, so I'm thinking the server isn't out of 42 WUs. I saw this rarely in the past with the long queue, but it seems much worse with these tasks.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Message 27874 - Posted: 29 Dec 2012 | 11:01:57 UTC - in response to Message 27873.

You can try my workaround on Linux:
1. turn off file size checking in cc_config.xml
2. overwrite the cuda31 binary with the cuda42 binary

Profile Stoneageman
Joined: 25 May 09
Posts: 224
Credit: 34,057,224,498
RAC: 231
Level
Trp
Message 27875 - Posted: 29 Dec 2012 | 17:49:28 UTC

We can remove already 3.1 apps on the long queue, but we will not upgrade the application until new year.

gdf


GDF says yes, server says no!

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 27876 - Posted: 29 Dec 2012 | 18:44:49 UTC - in response to Message 27872.

Your error is an "Energies have become nan", so see the appropriate thread.

Thank you for responding. I went there. All a bit confusing BUT I think I got the message that I need to under-clock my GPU. In just over two months I've had 16 "errors" out of 94 "good" WUs (yes - I log all my WUs...)

I have an ASUS GTX 460 1 gig. Stock settings are Core: 675, Shader: 1350, Mem: 1800.

For the 18 months I've had the 460, I've been GPUGRID-ing at: Core: 850, Shader: 1700, Mem 2000. It's only recently the errors have reared their ugly heads...

I have 'EVGA Precision', that lets me do changes, though I have nothing that lets me change the voltage...

Can I please ask you to recommend which of the three parameters I should change to give me the best chance of avoiding errors.

Many thanks, Tom



Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Message 27877 - Posted: 29 Dec 2012 | 20:55:15 UTC - in response to Message 27876.

I have an ASUS GTX 460 1 gig. Stock settings are Core: 675, Shader: 1350, Mem: 1800.

ASUS have their own GPU overclocking tool, but I don't recommend it.

For the 18 months I've had the 460, I've been GPUGRID-ing at: Core: 850, Shader: 1700, Mem 2000. It's only recently the errors have reared their ugly heads...

As the GPU is getting older, it can tolerate less overclocking.
Also, the CUDA4.2 tasks tolerate less overclocking than the old CUDA3.1 tasks.

I have 'EVGA Precision', that lets me do changes, though I have nothing that lets me change the voltage...

I use and recommend MSI Afterburner for that purpose.

Can I please ask you to recommend which of the three parameters I should change to give me the best chance of avoiding errors.

1. Cleaning the cooling fins of the GPU and the CPU with high pressure air duster (using a vacuum-cleaner at the same time is recommended)
- (also the PSU, as much as you can do it through the rear grille, or the fan)
2. If there is any overclock applied to the PCIe bus, cancel it, and set the PCIe bus frequency to its default of 100MHz
3. GPU memory clock is irrelevant, raising it is just asking for trouble. (set it to factory default)
4. You can raise the GPU voltage by 0.025V increments as long as your GPU temp is below 80°C (and gets stable).
5. Lower your GPU core clock by 10MHz decrements, until it gets stable. (note that not every frequency can be set, so the actual GPU frequency can be ~3MHz lower than the value beside the slider)
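
Step 5 can be sketched in a few lines of Python that list the core clocks to try in order. The 850 MHz starting point and the 675 MHz stock value are the ones quoted from the question; the helper name is made up, and stopping at the stock clock is an assumption.

```python
from typing import List


def core_clock_candidates(current_mhz: int, stock_mhz: int, step_mhz: int = 10) -> List[int]:
    """Core clocks to test in order, stepping down from the current overclock."""
    candidates = list(range(current_mhz - step_mhz, stock_mhz, -step_mhz))
    candidates.append(stock_mhz)  # final fallback: factory default (assumed stopping point)
    return candidates


candidates = core_clock_candidates(current_mhz=850, stock_mhz=675)
print(candidates[:3], candidates[-1], len(candidates))
```

Each candidate would be run for a few workunits before moving on to the next one; as noted above, the tool may round the value actually applied.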

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 27878 - Posted: 29 Dec 2012 | 21:01:10 UTC - in response to Message 27876.

Firstly make sure the GPU Fan speed is keeping the GPU below 70°C. Sometimes adding a case fan or opening the side panel helps.
Start downclocking by reducing the GDDR5 frequency by 10%.
If that fails, try 20%.
If need be, reduce the GPU clock by 5 or 10%.
If these measures fail, get software to increase the voltage, but only by the smallest amount (typically ~0.025V).
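
For the clocks quoted in the question (memory 2000, core 850), the percentage steps work out as in this minimal Python sketch. It assumes the percentages are taken off the current overclocked setting rather than off stock, and the helper name is hypothetical.

```python
def reduced_clock(clock: int, percent: int) -> float:
    """Clock value after lowering it by `percent` percent of its current setting."""
    return clock * (100 - percent) / 100


memory_steps = [reduced_clock(2000, p) for p in (10, 20)]  # first try, then second try
core_steps = [reduced_clock(850, p) for p in (5, 10)]

print(memory_steps, core_steps)
```

Under that assumption, the first memory step already lands on the card's stock memory setting of 1800.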

Note that Rosetta tasks were causing issues a while back, and might still be; I think some tasks used up too much RAM, which causes memory caching to disk and numerous related errors.

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 27880 - Posted: 30 Dec 2012 | 16:45:34 UTC - in response to Message 27877.

Again, thank you for responding!

I use and recommend MSI Afterburner for that purpose.

I installed that...

1. Cleaning the cooling fins of the GPU and the CPU with high pressure air duster (using a vacuum-cleaner at the same time is recommended)
- (also the PSU, as much as you can do it through the rear grille, or the fan)

I do that religiously every three months. Just did it again. Fans are running quieter.

2. If there is any overclock applied to the PCIe bus, cancel it, and set the PCIe bus frequency to its default of 100MHz

I have no idea how to do that so I assume it's at default.

3. GPU memory clock is irrelevant, raising it is just asking for trouble. (set it to factory default)

Did that using Afterburner.

4. You can raise the GPU voltage by 0.025V increments as long as your GPU temp is below 80°C (and gets stable).
5. Lower your GPU core clock by 10MHz decrements, until it gets stable. (note that not every frequency can be set, so the actual GPU frequency can be ~3MHz lower than the value beside the slider)

4. and 5. I shall save for later. I now have a dust-free PC and a default GPU memory clock. One step at a time...

Thanks again!


tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 27881 - Posted: 30 Dec 2012 | 17:27:56 UTC - in response to Message 27878.

Firstly make sure the GPU Fan speed is keeping the GPU below 70°C. Sometimes adding a case fan or opening the side panel helps.

My GPU temp is a solid 65°C

Start downclocking by reducing the GDDR5 frequency by 10%

I get REALLY confused by the variations in terminology. What is GDDR5 frequency??? Is that core, shader, memory???

Dylan
Joined: 16 Jul 12
Posts: 98
Credit: 386,043,752
RAC: 0
Level
Asp
Message 27882 - Posted: 30 Dec 2012 | 18:15:29 UTC

Tomba, GDDR5 is the memory used in the GPU. See this Wikipedia article:

http://en.wikipedia.org/wiki/GDDR5

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 27941 - Posted: 5 Jan 2013 | 18:00:23 UTC

All well with Noelias this past week. Just getting in under the 24-hour bar.

The latest one forecasts 38 hours to complete!!!
____________

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 27945 - Posted: 6 Jan 2013 | 8:56:21 UTC - in response to Message 27941.

The latest one forecasts 38 hours to complete!!!

Call off the dogs! After 15 hours running, completion is just 5h:40m away.


Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Message 28028 - Posted: 13 Jan 2013 | 23:07:15 UTC - in response to Message 27852.

Would this explain the size of these WU? (2 step deep means longer timeframe of simulation per WU, right?)

I guess so, but I'm not the person who has comprehensive knowledge about these workunits.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 28033 - Posted: 14 Jan 2013 | 12:03:56 UTC - in response to Message 28028.

I think it means task generations; so 2 steps would mean completing a batch of tasks and then a second batch, auto-generated from the first batch's results.
