Message boards : News : New project in long queue

noelia
Joined: 5 Jul 12 | Posts: 35 | Credit: 393,375 | RAC: 0
Message 28895 - Posted: 1 Mar 2013 | 10:49:14 UTC

Hello all,

After testing the new application, it is time to send a new project. I'm sending around 6,000 WUs to the long queue at the moment. Credits will be around 100,000. Let me know if you have any issues, since this is the first big batch we've submitted to the recently updated long queue and these WUs include new features.

Noelia

flashawk
Joined: 18 Jun 12 | Posts: 297 | Credit: 3,572,627,986 | RAC: 0
Message 28896 - Posted: 1 Mar 2013 | 11:16:58 UTC

I can't download any; I keep trying, but there have been no long runs in the last hour.

Bedrich Hajek
Joined: 28 Mar 09 | Posts: 485 | Credit: 11,108,783,435 | RAC: 15,545,660
Message 28899 - Posted: 1 Mar 2013 | 12:20:38 UTC - in response to Message 28896.

These units appear to be very long; finishing time could be close to 20 hours on my computers, assuming there are no errors!

flashawk
Joined: 18 Jun 12 | Posts: 297 | Credit: 3,572,627,986 | RAC: 0
Message 28911 - Posted: 2 Mar 2013 | 7:02:10 UTC

I've had to abort 3 NOELIAs in the past 2 hours; GPU usage was at 100% and the memory controller was at 0%. I had to reboot the computer to get the GPUs working again. Windows popped up an error message complaining that "acemd.2865P.exe had to be terminated unexpectedly". As soon as I suspended the NOELIA work unit, the error message went away.

This was on a GTX 560, a GTX 670 and a GTX 680. Windows XP 64-bit, BOINC v7.0.28.

dskagcommunity
Joined: 28 Apr 11 | Posts: 456 | Credit: 817,865,789 | RAC: 0
Message 28930 - Posted: 3 Mar 2013 | 9:20:30 UTC
Last modified: 3 Mar 2013 | 9:21:51 UTC

So far I got one error after 14,000 secs :( and a second one which was successful after 15 hours (560 Ti 448-core edition, 157k credits). Now it's calculating a third one... let's see.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Beyond
Joined: 23 Nov 08 | Posts: 1112 | Credit: 6,162,416,256 | RAC: 0
Message 28933 - Posted: 3 Mar 2013 | 13:17:06 UTC - in response to Message 28895.

The first one of these I received was:

005px1x2-NOELIA_005p-0-2-RND6570_0

After running for over 24 hours this happened:

<core_client_version>7.0.52</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified.
(0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>


I have another one that's at 62.5% after 14 hours. Looking at some of the NOELIA WUs, they seem to be failing all over the place, some of them repeatedly. They're also too long for my machines to process and return in 24 hours. After the one that's running either errors out or completes, I will be aborting the NOELIA WUs. Wasting 24+ hours of GPU time per failure is not my favorite way to use electricity. Sorry. BTW, the TONI WUs run fine.

microchip
Joined: 4 Sep 11 | Posts: 110 | Credit: 326,102,587 | RAC: 0
Message 28934 - Posted: 3 Mar 2013 | 14:10:34 UTC

I've found NOELIA WUs to be highly unreliable, even on the short queue. I don't like getting one, as I've no idea if it'll complete without errors. I had to abort a short NOELIA one yesterday as it kept crunching in circles, meaning it crunched for some minutes and then returned to the beginning to do the same all over again.
____________

Team Belgium

Retvari Zoltan
Joined: 20 Jan 09 | Posts: 2343 | Credit: 16,217,465,968 | RAC: 1,257,790
Message 28935 - Posted: 3 Mar 2013 | 14:17:03 UTC
Last modified: 3 Mar 2013 | 14:18:20 UTC

These new NOELIA tasks don't use a full CPU thread (core, if you like) to feed a Kepler-type GPU, as other workunits (like TONI_AGG) used to. Is this behavior intentional or not? Maybe that's why it takes so long to process them. It takes 40,400 secs for my overclocked GTX 580 to finish these tasks, while it takes 38,800 for a (slightly overclocked) GTX 670, so there is a demonstrable loss (~5%) in their performance.

GPUGRID
Joined: 12 Dec 11 | Posts: 91 | Credit: 2,730,095,033 | RAC: 0
Message 28936 - Posted: 3 Mar 2013 | 15:19:13 UTC

Some of the new NOELIA units are bugged somehow, I think. Some run fine, some don't.

Jim Daniels (JD)
Joined: 20 Jan 13 | Posts: 9 | Credit: 206,731,892 | RAC: 0
Message 28937 - Posted: 3 Mar 2013 | 17:17:41 UTC

I posted this in the "long application updated to the latest version" thread, but Firehawk implied these issues should be reported in this thread. I don't know if this is a 6.18 issue or a NOELIA WU issue, but I guess time will tell. So I apologize in advance for the double posting, if that is a bigger faux pas than not knowing which thread is the appropriate one to post to. ;-)

--------------------

While running my first 6.18 long run task my laptop locked up and I had to do a hard reboot. After the system was back up this WU had terminated with an error. The details are below. However, I have run two 6.18 WUs successfully since then. It appears one other host also terminated with an error on this WU.

The NOELIA WUs seem to be averaging about 80% utilization on my GTX 680M, and the run times are over 18 hours. I don't know how much effect having to share CPU time is having on these numbers.

Error Details:

i7-3740QM 16GB - Win7 Pro x64 - GTX 680m (Alienware 9.18.13.717)
GPU: dedicated to GPUGRID - CPU: SETI, Poem, Milkyway, WUProp, FreeHAL

------------------------------------------------------------------------

Work Unit: 4209987 (041px21x2-NOELIA_041p-0-2-RND9096_0)

Stderr output

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>

flashawk
Joined: 18 Jun 12 | Posts: 297 | Credit: 3,572,627,986 | RAC: 0
Message 28938 - Posted: 3 Mar 2013 | 17:40:37 UTC

You're getting us hawks mixed up; I've been using this name since '95, and I think that's the first time that's happened.

Jim Daniels (JD)
Joined: 20 Jan 13 | Posts: 9 | Credit: 206,731,892 | RAC: 0
Message 28940 - Posted: 3 Mar 2013 | 19:09:22 UTC - in response to Message 28938.

Mea Culpa.

Stoneageman
Joined: 25 May 09 | Posts: 224 | Credit: 34,057,374,498 | RAC: 186
Message 28941 - Posted: 3 Mar 2013 | 19:28:05 UTC
Last modified: 3 Mar 2013 | 19:31:29 UTC

I'm also getting several NOELIA tasks making very slow progress, the same problem as flagged in the beta test. The size of the upload is causing issues for me as well.

Bikermatt
Joined: 8 Apr 10 | Posts: 37 | Credit: 3,839,902,185 | RAC: 0
Message 28942 - Posted: 3 Mar 2013 | 20:00:13 UTC

The NOELIA workunits refuse to run on my 660 Ti Linux system; they lock up or make no progress. I have finished one on each of two different Linux systems with 670s without problems.

Jim1348
Joined: 28 Jul 12 | Posts: 819 | Credit: 1,591,285,971 | RAC: 0
Message 28943 - Posted: 3 Mar 2013 | 20:33:00 UTC

The NOELIA longs either fail in the first 3 to 4 minutes on my GTX 560 and GTX 650 Ti (the only four failures I have had on GPUGRID), or else they complete successfully.

I can't tell if they take any longer though; the last one took 23 hours, instead of the more usual 18 hours, but I have seen that on the 6.17 work units also, so it may just be the size of the work unit. I have not had any hangs thus far (Win7 64-bit; BOINC 7.0.52 x64).

All in all, it is not that bad for me.

flashawk
Joined: 18 Jun 12 | Posts: 297 | Credit: 3,572,627,986 | RAC: 0
Message 28944 - Posted: 3 Mar 2013 | 21:12:11 UTC

Well Jim, now you can understand how most of us got 90% of our errors. If you had looked closer you would have noticed that almost all of them came from a first run of NOELIAs in early February. Instead, you thought you would display your distributed computing prowess and give us your expert advice, and proceeded to tell us about our substandard components, our inability to overclock correctly, and the overheating issues we must be having.

I'm referencing this thread.

http://www.gpugrid.net/forum_thread.php?id=3299

Jim1348
Joined: 28 Jul 12 | Posts: 819 | Credit: 1,591,285,971 | RAC: 0
Message 28945 - Posted: 3 Mar 2013 | 22:12:07 UTC - in response to Message 28944.

flashawk,

Thank you for your insight. But I just started on Feb. 14, and I think it was well past your first group of errors. At any rate, they ran fine on my cards even though not on some others, where they often failed after an hour or more. Maybe you can give better advice?

flashawk
Joined: 18 Jun 12 | Posts: 297 | Credit: 3,572,627,986 | RAC: 0
Message 28946 - Posted: 3 Mar 2013 | 22:44:05 UTC

I guess it's understandable; the best advice I could ever give in my 51 years is "wait and see". I don't walk into another's clubhouse and start rearranging the furniture. There's been many a time when I've jumped to quick conclusions in my own mind, only to find out later that I was wrong.

Anyway, I didn't mean to be too harsh. Let me be the first to say "Welcome to GPUGRID", and I'm sure you have a lot to contribute.

Richard Haselgrove
Joined: 11 Jul 09 | Posts: 1620 | Credit: 8,866,381,738 | RAC: 20,081,961
Message 28947 - Posted: 4 Mar 2013 | 0:19:33 UTC
Last modified: 4 Mar 2013 | 0:44:59 UTC

I managed to get http://www.gpugrid.net/result.php?resultid=6567675 to run to completion, by making sure it wasn't interrupted during computation. But 12 hours on a GTX 670 is a long time to run without task switching, when you're trying to support more than one BOINC project.

Edit: on the other hand, task http://www.gpugrid.net/result.php?resultid=6563457, following behind on the same card with the same configuration, failed three times with

SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

The TONI task following, again on the same card, seems to have started and to be running normally.

flashawk
Joined: 18 Jun 12 | Posts: 297 | Credit: 3,572,627,986 | RAC: 0
Message 28948 - Posted: 4 Mar 2013 | 1:07:04 UTC - in response to Message 28947.
Last modified: 4 Mar 2013 | 1:27:51 UTC

Richard Haselgrove wrote:

SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.


I just had the same thing happen to me, Richard: right after the computation error, a TONI WU started on the same GPU card and it sat idle with 0% GPU load and 0% memory controller usage. I had to suspend BOINC and reboot to get the GPU crunching again. As far as times go on my GTX 670s, the NOELIA WUs have ranged from 112 MB to 172 MB so far; the smaller one took 7.5 hours and the larger one took 11.75 hours.

So I think the size of the output file directly affects the run time (as usual). They may have to pull the plug on this batch and rework them; we'll have to wait and see what they decide.

Edit: Check out this one, I just downloaded it a couple of minutes ago. I noticed it ended in a 6, which means I'm the 7th person to get it. This is off the hook, man!

http://www.gpugrid.net/workunit.php?wuid=4210634

Richard Haselgrove
Joined: 11 Jul 09 | Posts: 1620 | Credit: 8,866,381,738 | RAC: 20,081,961
Message 28950 - Posted: 4 Mar 2013 | 9:41:40 UTC - in response to Message 28948.

So I think the size of the output file directly affects the run time (as usual). They may have to pull the plug on this batch and rework them; we'll have to wait and see what they decide.

Far more likely that the tasks which run, by design, for a long time generate a large output file.

After the last NOELIA failure (which triggered a driver restart), I ran a couple of small BOINC tasks from another project. The first one errored, the second ran correctly. After that, I ran a long TONI - successful completion, no computer restart needed. I'm running the 314.07 driver.

Retvari Zoltan
Joined: 20 Jan 09 | Posts: 2343 | Credit: 16,217,465,968 | RAC: 1,257,790
Message 28951 - Posted: 4 Mar 2013 | 10:49:23 UTC

My systems haven't been changed since the application upgrade.
I'd had no problems with these new NOELIA tasks until now. (I've received a couple of tasks with names ending in _4 and _5.)
They do all the strange things a workunit can do:
- 95-100% GPU usage with no increase in the progress indicator (even after hours of processing)
- the same as above, but with 0% GPU usage
- causing the following workunit (a TONI, for example) to do the same strange things (a system restart can fix this)
- a significant change in GPU usage (from 75-80% to 95-100%) after a couple of minutes, but no progress
- the progress indicator staying at 0% when I abort a stuck task

GPUGRID
Joined: 12 Dec 11 | Posts: 91 | Credit: 2,730,095,033 | RAC: 0
Message 28952 - Posted: 4 Mar 2013 | 11:18:33 UTC

I'm having some new weird issue, but only on my AMD 3x690 rig: three times now, BSODs and system restarts. It only goes away if all the workunits (and the cache!!!) are aborted. I don't have a clue why this happens, but this AMD rig is rock solid in normal crunching and it's doing more than 2M per day on its own.

Bedrich Hajek
Joined: 28 Mar 09 | Posts: 485 | Credit: 11,108,783,435 | RAC: 15,545,660
Message 28953 - Posted: 4 Mar 2013 | 11:41:00 UTC - in response to Message 28951.

My systems haven't been changed since the application upgrade.
I'd had no problems with these new NOELIA tasks until now. (I've received a couple of tasks with names ending in _4 and _5.)
They do all the strange things a workunit can do:
- 95-100% GPU usage with no increase in the progress indicator (even after hours of processing)
- the same as above, but with 0% GPU usage
- causing the following workunit (a TONI, for example) to do the same strange things (a system restart can fix this)
- a significant change in GPU usage (from 75-80% to 95-100%) after a couple of minutes, but no progress
- the progress indicator staying at 0% when I abort a stuck task


I have had the same issues, and on top of that I got an error message saying that acemd.2865.exe had crashed, and the video card ends up running at a slower speed.

I have had more errors with this application than the last time I did beta testing.


Hans Sveen
Joined: 29 Oct 08 | Posts: 3 | Credit: 439,280,899 | RAC: 5,234
Message 28954 - Posted: 4 Mar 2013 | 11:55:32 UTC

Hello!
I just want to add my experience with the latest batch:
Until late yesterday / early this morning, my capable PCs ran just fine!

The two Windows PCs (ID: 67760 and ID: 145297) started to crash after running for about 4 minutes. Looking at the BOINC messages, they told me that output files were missing, and during the short run before crashing no checkpointing was done. I also took a look at my wingmen; most errors were "The system cannot find the path specified. (0x3) - exit code 3 (0x3)". Sometimes exit codes -1 and -9 also occurred.

To eliminate a Windows driver or other Windows error, I loaded some WUs onto this host (ID: 132991) running Ubuntu. Oh yes, after running for about 5 minutes they crashed, telling me (via the BOINC Messages tab, of course) "exited with zero status but no 'finished' file"; it did this several times before failing with "Output file absent".

Looking at the outcome after upload, this is what I got:
Stderr output
<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
process exited with code 255 (0xff, -1)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
MDIO: cannot open file "restart.coor"

</stderr_txt>
]]>

Hope this helps with debugging the batch!

PS:
All three PCs are now running TONI WUs without needing a restart!

With regards,

Hans Sveen
Oslo, Norway

Richard Haselgrove
Joined: 11 Jul 09 | Posts: 1620 | Credit: 8,866,381,738 | RAC: 20,081,961
Message 28955 - Posted: 4 Mar 2013 | 12:32:29 UTC - in response to Message 28953.

I have had more errors with this application than the last time I did beta testing.

I think we need to try and distinguish between 'application' problems and 'task' problems - or 'project' (as in research project), as Noelia called it in starting this thread.

Bedrich Hajek
Joined: 28 Mar 09 | Posts: 485 | Credit: 11,108,783,435 | RAC: 15,545,660
Message 28957 - Posted: 4 Mar 2013 | 13:18:10 UTC - in response to Message 28955.

I have had more errors with this application than the last time I did beta testing.

I think we need to try and distinguish between 'application' problems and 'task' problems - or 'project' (as in research project), as Noelia called it in starting this thread.


So do you think that the fact that we are getting these errors after we changed to application version 6.18 is mere coincidence?

Maybe you are right. There is a way to prove this: run the failed units under application 6.17. If they fail, it's the units; if they don't, it's the new application.



Richard Haselgrove
Joined: 11 Jul 09 | Posts: 1620 | Credit: 8,866,381,738 | RAC: 20,081,961
Message 28958 - Posted: 4 Mar 2013 | 13:51:39 UTC - in response to Message 28957.

I have had more errors with this application than the last time I did beta testing.

I think we need to try and distinguish between 'application' problems and 'task' problems - or 'project' (as in research project), as Noelia called it in starting this thread.

So do you think that the fact that we are getting these errors after we changed to application version 6.18 is mere coincidence?

Maybe you are right. There is a way to prove this: run the failed units under application 6.17. If they fail, it's the units; if they don't, it's the new application.

In my personal experience, all TONI tasks, and 50% of NOELIA tasks, have run correctly under application version 6.18.

Jozef J
Joined: 7 Jun 12 | Posts: 112 | Credit: 1,140,895,172 | RAC: 315,774
Message 28959 - Posted: 4 Mar 2013 | 14:31:08 UTC

041px48x2-NOELIA_041p-1-2-RND9263: after 15 hours, when the work on this task ended, the NVIDIA driver crashed and the work was marked as faulty. Another one (nn016_r2-TONI_AGGd8-38-100-RND3157_0) was marked correct. But these problems have been going on for more than a week; it's insane. The NVIDIA driver even crashes on a proper shutdown of BOINC Manager, for example.
Now these tasks are coming in: Ann166_r2-TONI_AGGd8-11-100-RND7649_0, nn137_r2-TONI_AGGd8-20-100-RND8105_0 and Ann027_r2-TONI_AGGd8-19-100-RND9134_3.
But I'm skeptical, and I don't think they will end well.
That's two weeks of crunching without anything to show for it, like many volunteers now.

Beyond
Joined: 23 Nov 08 | Posts: 1112 | Credit: 6,162,416,256 | RAC: 0
Message 28961 - Posted: 4 Mar 2013 | 14:51:54 UTC - in response to Message 28958.

I have had more errors with this application than the last time I did beta testing.

I think we need to try and distinguish between 'application' problems and 'task' problems - or 'project' (as in research project), as Noelia called it in starting this thread.

So do you think that the fact that we are getting these errors after we changed to application version 6.18 is mere coincidence?
Maybe you are right. There is a way to prove this: run the failed units under application 6.17. If they fail, it's the units; if they don't, it's the new application.

In my personal experience, all TONI tasks, and 50% of NOELIA tasks, have run correctly under application version 6.18.

Richard, this is my experience exactly: all TONIs run fine and 50% of NOELIAs crash. TONI should maybe give a clinic to the others. I don't think it has much to do with 6.18 either; it's just that the new NOELIAs were released at the same time as 6.18.

microchip
Joined: 4 Sep 11 | Posts: 110 | Credit: 326,102,587 | RAC: 0
Message 28962 - Posted: 4 Mar 2013 | 14:59:41 UTC

Aborted a NOELIA one after it began crunching in circles...
____________

Team Belgium

Ken_g6
Joined: 6 Aug 11 | Posts: 8 | Credit: 74,546,994 | RAC: 157,187
Message 28969 - Posted: 4 Mar 2013 | 17:25:34 UTC
Last modified: 4 Mar 2013 | 17:31:53 UTC

The first Noelia
(the angels did say...)
took over 48 hours (on a GTX 460 768 MB that's completed most work in 25 or less, but it finally...)
completed today.

The second one I got, which many have apparently had a different problem with, kept restarting on my machine with:

SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.

That seems to be an out-of-GPU-memory error. So maybe someone should set stricter minimum memory limits on these Noelia tasks?

Edit: Technically, that wasn't my first Noelia; just the first one of this batch. I got at least one, probably more, in February, and they took 25 hours but were otherwise fine.
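
A sketch of what that suggested check could look like, using the CUDA driver API (illustration only, not the actual ACEMD code; the 700 MB threshold is invented for the example): the application queries free GPU memory up front and bails out immediately, instead of failing hours into the run.

#include <cuda.h>
#include <cstdio>

int main() {
    // Create a context on device 0 so the memory query reflects this process.
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    size_t free_b = 0, total_b = 0;
    cuMemGetInfo(&free_b, &total_b);
    std::printf("GPU memory: %zu MB free of %zu MB\n", free_b >> 20, total_b >> 20);

    const size_t required = 700ull << 20;  // hypothetical requirement, not a GPUGRID figure
    if (free_b < required) {
        std::fprintf(stderr, "Not enough free GPU memory; refusing to start\n");
        cuCtxDestroy(ctx);
        return 1;  // fail fast instead of erroring out hours into the run
    }
    // ... allocate buffers and run the simulation ...
    cuCtxDestroy(ctx);
    return 0;
}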

Richard Haselgrove
Joined: 11 Jul 09 | Posts: 1620 | Credit: 8,866,381,738 | RAC: 20,081,961
Message 28970 - Posted: 4 Mar 2013 | 17:43:12 UTC - in response to Message 28969.

The first Noelia
The second one I got...

I see that both WUs are marked

errors WU cancelled

Something may be happening behind the scenes.

MJH (project administrator, developer and scientist)
Joined: 12 Nov 07 | Posts: 696 | Credit: 27,266,655 | RAC: 0
Message 28971 - Posted: 4 Mar 2013 | 18:02:38 UTC
Last modified: 4 Mar 2013 | 18:25:11 UTC

These NOELIA WUs have been cancelled. Their successors will have a slightly different configuration that will hopefully be more stable.

Note that with this app, GPUs of compute capability 1.0, 1.1 and 1.2 are no longer supported. This means that only GeForce GTX 260s and higher will get Long WUs.

MJH

nate
Joined: 6 Jun 11 | Posts: 124 | Credit: 2,928,865 | RAC: 0
Message 28972 - Posted: 4 Mar 2013 | 18:03:17 UTC

We're looking at the issue. The problematic WUs have been cancelled for now. The problem was clearly on our end, but it seems that there were multiple reasons they were having issues, and mostly not Noelia's fault. She'll resend new simulations that avoid the problems in the next day or so. The large upload sizes will also be fixed.

As always, thanks for making your concerns known and alerting us to the issue.

Nate

MJH (project administrator, developer and scientist)
Joined: 12 Nov 07 | Posts: 696 | Credit: 27,266,655 | RAC: 0
Message 28974 - Posted: 4 Mar 2013 | 18:09:50 UTC - in response to Message 28971.

Be aware also that these and subsequent WUs will fail if you have overridden the application version and are not running the latest.

MJH

Beyond
Joined: 23 Nov 08 | Posts: 1112 | Credit: 6,162,416,256 | RAC: 0
Message 28978 - Posted: 4 Mar 2013 | 19:46:04 UTC - in response to Message 28972.

We're looking at the issue. The problematic WUs have been cancelled for now.

Were the TONI WUs cancelled too? They ran fine..

Richard Haselgrove
Joined: 11 Jul 09 | Posts: 1620 | Credit: 8,866,381,738 | RAC: 20,081,961
Message 28980 - Posted: 4 Mar 2013 | 20:01:21 UTC - in response to Message 28978.

We're looking at the issue. The problematic WUs have been cancelled for now.

Were the TONI WUs cancelled too? They ran fine..

And the two I have in progress are still fine, and shown as viable on the website.

Beyond
Joined: 23 Nov 08 | Posts: 1112 | Credit: 6,162,416,256 | RAC: 0
Message 28981 - Posted: 4 Mar 2013 | 20:32:27 UTC - in response to Message 28980.

We're looking at the issue. The problematic WUs have been cancelled for now.

Were the TONI WUs cancelled too? They ran fine..

And the two I have in progress are still fine, and shown as viable on the website.

Just got a couple of new ones. It seems the queue coincidentally ran dry for a while:

GPUGRID 03-04-13 13:45 Requesting new tasks for NVIDIA
GPUGRID 03-04-13 13:45 Scheduler request completed: got 0 new tasks
GPUGRID 03-04-13 13:45 No tasks sent
GPUGRID 03-04-13 13:45 No tasks are available for Long runs (8-12 hours on fastest card)

GPUGRID
Joined: 12 Dec 11 | Posts: 91 | Credit: 2,730,095,033 | RAC: 0
Message 28984 - Posted: 4 Mar 2013 | 20:44:46 UTC - in response to Message 28972.

We're looking at the issue. The problematic WUs have been cancelled for now. The problem was clearly on our end, but it seems that there were multiple reasons they were having issues, and mostly not Noelia's fault. She'll resend new simulations that avoid the problems in the next day or so. The large upload sizes will also be fixed.

As always, thanks for making your concerns known and alerting us to the issue.

Nate

Thank you, guys. Another thing that I really appreciate about this project is your awesome and fast support, which didn't happen on the project I ran for the past 13 years... sadly.

Bedrich Hajek
Joined: 28 Mar 09 | Posts: 485 | Credit: 11,108,783,435 | RAC: 15,545,660
Message 28989 - Posted: 5 Mar 2013 | 0:49:34 UTC - in response to Message 28972.

We're looking at the issue. The problematic WUs have been cancelled for now. The problem was clearly on our end, but it seems that there were multiple reasons they were having issues, and mostly not Noelia's fault.

Nate


Were the issues related to the new application, the WUs, or both?


flashawk
Joined: 18 Jun 12 | Posts: 297 | Credit: 3,572,627,986 | RAC: 0
Message 28998 - Posted: 6 Mar 2013 | 7:26:23 UTC

How big are the uploads for these reworked NOELIAs supposed to be? The 3 I've finished were barely over 4 MB after 11.5 hours of "crunching". Is this about right?

FrRie
Joined: 21 Dec 11 | Posts: 2 | Credit: 21,062,866 | RAC: 0
Message 29000 - Posted: 6 Mar 2013 | 11:45:09 UTC

I got messages like "aborted by user", but I didn't abort any.
I observed the remaining time incrementing in one case.
Windows 7 Ultimate x64
BOINC 7.0.28
GTX 580

Thanks for any reactions

FrRie
____________

idimitro
Joined: 25 Jun 12 | Posts: 3 | Credit: 47,912,263 | RAC: 0
Message 29001 - Posted: 6 Mar 2013 | 12:58:28 UTC

I also have the same problem with the tasks killing the acemd process. When I checked the thread I got even more pissed off: one week after the "bomb" was thrown, no reaction from Noelia, no official response, no retraction of the packages, nothing.
This is a waste of our time, and as somebody mentioned, I'd prefer to put my electricity toward something useful.
Can somebody PM me how to block these NOELIA packages?

microchip
Joined: 4 Sep 11 | Posts: 110 | Credit: 326,102,587 | RAC: 0
Message 29002 - Posted: 6 Mar 2013 | 13:08:20 UTC - in response to Message 29001.
Last modified: 6 Mar 2013 | 13:08:38 UTC

I also have the same problem with the tasks killing the acemd process. When I checked the thread I got even more pissed off: one week after the "bomb" was thrown, no reaction from Noelia, no official response, no retraction of the packages, nothing.
This is a waste of our time, and as somebody mentioned, I'd prefer to put my electricity toward something useful.
Can somebody PM me how to block these NOELIA packages?


It's not possible to block specific tasks. At least that's what I learned from my own thread: http://www.gpugrid.net/forum_thread.php?id=3315
____________

Team Belgium

Bedrich Hajek
Joined: 28 Mar 09 | Posts: 485 | Credit: 11,108,783,435 | RAC: 15,545,660
Message 29003 - Posted: 6 Mar 2013 | 13:08:30 UTC - in response to Message 29001.

I also have the same problem with the tasks killing the acemd process. When I checked the thread I got even more pissed off: one week after the "bomb" was thrown, no reaction from Noelia, no official response, no retraction of the packages, nothing.
This is a waste of our time, and as somebody mentioned, I'd prefer to put my electricity toward something useful.
Can somebody PM me how to block these NOELIA packages?


It happened to me too. I had 2 NOELIA units marked "aborted by user", which I didn't abort. They were otherwise running fine. So, what is happening?

nate
Joined: 6 Jun 11 | Posts: 124 | Credit: 2,928,865 | RAC: 0
Message 29004 - Posted: 6 Mar 2013 | 13:25:16 UTC - in response to Message 29001.

I also have the same problem with the tasks killing the acemd process. When I checked the thread I got even more pissed off: one week after the "bomb" was thrown, no reaction from Noelia, no official response, no retraction of the packages, nothing.
This is a waste of our time, and as somebody mentioned, I'd prefer to put my electricity toward something useful.
Can somebody PM me how to block these NOELIA packages?


http://www.gpugrid.net/forum_thread.php?id=3311&nowrap=true#28972

nate
Joined: 6 Jun 11 | Posts: 124 | Credit: 2,928,865 | RAC: 0
Message 29006 - Posted: 6 Mar 2013 | 14:21:34 UTC
Last modified: 6 Mar 2013 | 14:22:13 UTC

Were the issues related to the new application, the WUs, or both?


The WUs were not set to upload in the smaller file-size format we are now trying to move to. They were set to use the old format, which could result in very large uploads, as some people complained about.

The problem with the application was an obscure one. It wasn't an issue with the application per se, but rather with how the application interacts with BOINC and this specific type of configuration file for the simulations. In short, at the start of every WU the application was doing something it was only supposed to do in the first WU of a chain. This caused all but the first WU in a chain to fail. This isn't a problem locally for us, but with how BOINC handles the files, it became a problem. We are working on a long-term fix, but for now we have simply found a way around it.
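To make that concrete, here is a minimal sketch of the branch involved (not the actual ACEMD source; the helper names are hypothetical): per-chain setup should run only when no restart file exists, and the bug amounted to re-running that setup for every WU.

#include <cstdio>

// Hypothetical helpers, stubbed for illustration.
static void build_initial_state()      { /* first WU of a chain: set up from input files */ }
static void load_restart_coordinates() { /* continuation WU: resume the previous step */ }

int main() {
    // A continuation WU carries the previous step's coordinates in
    // "restart.coor" (the file named in the stderr excerpts above).
    if (FILE* f = std::fopen("restart.coor", "r")) {
        std::fclose(f);
        load_restart_coordinates();
    } else {
        // Only the first WU in a chain should land here. The bug described
        // above effectively ran this first-WU setup for every WU; combined
        // with how BOINC stages the files, that broke every chain step but
        // the first.
        build_initial_state();
    }
    return 0;
}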

... I got messages like "aborted by user", but I didn't abort any ...

It happened to me too. I had 2 NOELIA units marked "aborted by user", which I didn't abort. They were otherwise running fine. So, what is happening?


I am not sure what is happening. Even if we cancel a group of WUs, they should complete on your computer (if they are good simulations). "Aborted by user" can only come from the user/client, typically when you deliberately cancel a WU with the "Abort" button. Hopefully there is nothing else going on...

Richard Haselgrove
Joined: 11 Jul 09 | Posts: 1620 | Credit: 8,866,381,738 | RAC: 20,081,961
Message 29008 - Posted: 6 Mar 2013 | 14:46:03 UTC - in response to Message 29006.

... I got messages like "aborted by user", but I didn't abort any ...

It happened to me too. I had 2 NOELIA units marked "aborted by user", which I didn't abort. They were otherwise running fine. So, what is happening?


I am not sure what is happening. Even if we cancel a group of WUs, they should complete on your computer (if they are good simulations). "Aborted by user" can only come from the user/client, typically when you deliberately cancel a WU with the "Abort" button. Hopefully there is nothing else going on...

Ah. That's one I can help you with.

I got an 'aborted by user', too - task 6581613. If you look at the task details, it has "exit status 202".

At some stage in the development of recent BOINC clients, David updated and expanded the range of error and exit status codes returned by the client. Unfortunately he didn't, at first and until prodded, update the decode tables used on project websites.

You need to update html/inc/result.inc on your web server to something later than
http://boinc.berkeley.edu/trac/changeset/1f7ddbfe3a27498e7fd2b4f50f3bf9269b7dae25/boinc/html/inc/result.inc
to get a proper website display using

case 202: return "EXIT_ABORTED_BY_PROJECT";

Full story in http://boinc.berkeley.edu/dev/forum_thread.php?id=7704
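
For reference, a sketch of the decode mapping those newer clients need (written in C++ for compactness; the real table is the PHP shown above). Code 202 is confirmed by the task pages quoted in this thread; treat the other values as recollections to verify against the linked changeset before relying on them.

#include <cstdio>

const char* exit_status_string(int code) {
    switch (code) {
        case 201: return "EXIT_MISSING_COPROC";
        case 202: return "EXIT_ABORTED_BY_PROJECT";  // server-side cancel; stale tables show it as "Aborted by user"
        case 203: return "EXIT_ABORTED_VIA_GUI";     // a genuine user abort
        default:  return "UNKNOWN";
    }
}

int main() {
    std::printf("exit status 202 -> %s\n", exit_status_string(202));
    return 0;
}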

Operator
Joined: 15 May 11 | Posts: 108 | Credit: 297,176,099 | RAC: 0
Message 29009 - Posted: 6 Mar 2013 | 15:15:01 UTC - in response to Message 29008.

I was surprised to see some "Aborted By User" tasks this morning, especially since they happened while I was sleeping!

As an example: 290px20xbis-NOELIA_290p-0-2-RND4773_1

Created 5 Mar 2013 | 19:22:08 UTC
Sent 5 Mar 2013 | 21:01:03 UTC
Received 6 Mar 2013 | 9:58:01 UTC
Server state Over
Outcome Computation error
Client state Aborted by user
Exit status 202 (0xca)

But after viewing the details of the workunit itself, it said "WU cancelled" in red.

http://www.gpugrid.net/workunit.php?wuid=4227683

name 290px20xbis-NOELIA_290p-0-2-RND4773
application Long runs (8-12 hours on fastest card)
created 5 Mar 2013 | 16:59:14 UTC
minimum quorum 1
initial replication 1
max # of error/total/success tasks 7, 10, 6
errors WU cancelled

So they got cut off mid-crunch, and on the surface it looks like we aborted them.

Operator
____________

flashawk
Joined: 18 Jun 12 | Posts: 297 | Credit: 3,572,627,986 | RAC: 0
Message 29013 - Posted: 6 Mar 2013 | 17:37:14 UTC

Out of all 4 of my machines, I had 7 "Aborted by user" errors last night. My computers will be on probation by tomorrow and I won’t be able to download work units.

Ken_g6
Joined: 6 Aug 11 | Posts: 8 | Credit: 74,546,994 | RAC: 157,187
Message 29014 - Posted: 6 Mar 2013 | 18:04:23 UTC

I haven't had the server abort any Noelias lately. I've just had them all segfault within an hour or two. :(

GPUGRID
Joined: 12 Dec 11 | Posts: 91 | Credit: 2,730,095,033 | RAC: 0
Message 29028 - Posted: 7 Mar 2013 | 5:00:41 UTC

Still having the same problems with NOELIAs. That will put my biggest machine down, because it is BSODing and ruining the whole cache, which is very hard to build at the moment.

flashawk
Joined: 18 Jun 12 | Posts: 297 | Credit: 3,572,627,986 | RAC: 0
Message 29029 - Posted: 7 Mar 2013 | 6:22:10 UTC

It seems to me that these NOELIAs are suffering from memory leaks: when my card finishes one and starts the next, the GPU pegs at 99-100% and the memory controller stays at 0%. If I reboot, all is well and it works fine. The previous WU won't release the memory on the GPU, hence the reboot. This is Windows XP Pro 64-bit; different operating systems seem to be dealing with it differently. Windows 7 and 8 are getting BSODs or driver crashes, and I also get the "acemd.2865P.exe had to be terminated unexpectedly" error. Oh well, I don't even know if this stuff we post helps or gets read.

wdiz
Joined: 4 Nov 08 | Posts: 20 | Credit: 871,871,594 | RAC: 0
Message 29030 - Posted: 7 Mar 2013 | 8:10:30 UTC
Last modified: 7 Mar 2013 | 8:10:49 UTC

Same here. It seems the new NOELIA WUs don't work well; they freeze the computer. I had to reset the project.
I'm running Arch Linux (kernel 3.7.10-1-ARCH) with a GTX 660 Ti and a GTX 580.
BOINC 7.0.53

Retvari Zoltan
Joined: 20 Jan 09 | Posts: 2343 | Credit: 16,217,465,968 | RAC: 1,257,790
Message 29032 - Posted: 7 Mar 2013 | 10:57:04 UTC
Last modified: 7 Mar 2013 | 11:18:18 UTC

It looks to me that this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine.

GPUGRID
Joined: 12 Dec 11 | Posts: 91 | Credit: 2,730,095,033 | RAC: 0
Message 29033 - Posted: 7 Mar 2013 | 11:21:03 UTC

Just NOELIAs incoming. Impossible to run the project at the moment. Too bad; it's a big farm.

Beyond
Joined: 23 Nov 08 | Posts: 1112 | Credit: 6,162,416,256 | RAC: 0
Message 29036 - Posted: 7 Mar 2013 | 12:07:34 UTC - in response to Message 29033.

Just NOELIAs incoming. Impossible to run the project at the moment. Too bad; it's a big farm.

I just moved to a different project too. Too bad, I liked helping out here but they don't seem to test anything before release.

nate
Joined: 6 Jun 11 | Posts: 124 | Credit: 2,928,865 | RAC: 0
Message 29038 - Posted: 7 Mar 2013 | 12:10:31 UTC - in response to Message 29032.
Last modified: 7 Mar 2013 | 12:18:54 UTC

It looks to me that this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine.


Thanks, we're looking at it. Obviously this is pretty serious. I will submit some additional WUs to the long queue that I know for sure are good simulations, so that we can get a handle on this.

Edit: I have now submitted to the long queue some simulations we know are good. If it is an issue with the app, we will find out. They have the name NATHAN_dhfr36_3.

Richard Haselgrove
Joined: 11 Jul 09 | Posts: 1620 | Credit: 8,866,381,738 | RAC: 20,081,961
Message 29039 - Posted: 7 Mar 2013 | 12:19:38 UTC - in response to Message 29032.

It looks to me that this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine.

I've just reported (in Number Crunching) a failure with a long queue task under Windows 7/64, which didn't freeze the computer or poison the GPU, while short queue tasks under XP/32 are (mostly) running.

GPUGRID
Joined: 12 Dec 11 | Posts: 91 | Credit: 2,730,095,033 | RAC: 0
Message 29040 - Posted: 7 Mar 2013 | 12:28:04 UTC - in response to Message 29038.

It looks to me that this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine.


Thanks, we're looking at it. Obviously this is pretty serious. I will submit some additional WUs to the long queue that I know for sure are good simulations, so that we can get a handle on this.

Edit: I have now submitted to the long queue some simulations we know are good. If it is an issue with the app, we will find out. They have the name NATHAN_dhfr36_3.



I noticed the NATHAN units; they are coming through really well. All machines are back... will report results ASAP :D

Richard Haselgrove
Joined: 11 Jul 09 | Posts: 1620 | Credit: 8,866,381,738 | RAC: 20,081,961
Message 29041 - Posted: 7 Mar 2013 | 12:47:12 UTC

A NATHAN has started running OK here too, even with no reboot after the NOELIA failure (technique as described in NC).

Bedrich Hajek
Joined: 28 Mar 09 | Posts: 485 | Credit: 11,108,783,435 | RAC: 15,545,660
Message 29042 - Posted: 7 Mar 2013 | 13:18:55 UTC - in response to Message 29038.

It looks to me that this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine.


Thanks, we're looking at it. Obviously this is pretty serious. I will submit some additional WUs to the long queue that I know for sure are good simulations, so that we can get a handle on this.

Edit: I have now submitted to the long queue some simulations we know are good. If it is an issue with the app, we will find out. They have the name NATHAN_dhfr36_3.


In my case, both the 32-bit Windows XP and the 64-bit Windows 7 machines are having errors this morning. The units I downloaded yesterday seem to be okay.

I did get a crash on my Windows 7 computer, on a unit that had been running fine, when I rebooted, though another unit running on the other card didn't crash. The settings (GPU speed, memory and fan) on the video card the unit crashed on were reset. I had to do another reboot, with the units suspended, to get the video card settings right.

I also noticed that on the Windows 7 machine the units take 18-plus hours to finish, while on the Windows XP machine they take about 13 hours. This difference seems excessive.



[AF>Belgique] bill1170
Joined: 4 Jan 09 | Posts: 13 | Credit: 1,292,573,895 | RAC: 3,498,181
Message 29043 - Posted: 7 Mar 2013 | 13:54:13 UTC - in response to Message 29042.

It's not limited to XP64; my XP32 got the error in acemd.2865P.exe as well.

cciechad
Joined: 28 Dec 10 | Posts: 13 | Credit: 37,543,525 | RAC: 0
Message 29044 - Posted: 7 Mar 2013 | 13:54:55 UTC - in response to Message 29042.

5-6 WUs failed for me this morning. I'm on driver 313.26. As far as I can tell these WUs have also failed for everyone else they were distributed to. I'm seeing these in dmesg (I thought my card might be failing, but I think there is a problem with the new WUs):

[649702.679741] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[649705.790669] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[649715.295948] NVRM: Xid (0000:01:00): 8, Channel 00000001
[649730.302031] NVRM: Xid (0000:01:00): 8, Channel 00000001
[649745.308105] NVRM: Xid (0000:01:00): 8, Channel 00000001
[649760.317500] NVRM: Xid (0000:01:00): 8, Channel 00000001
[649776.323990] NVRM: Xid (0000:01:00): 8, Channel 00000001
[649791.831930] NVRM: Xid (0000:01:00): 8, Channel 00000001
[649806.838000] NVRM: Xid (0000:01:00): 8, Channel 00000001

nate
Joined: 6 Jun 11 | Posts: 124 | Credit: 2,928,865 | RAC: 0
Message 29047 - Posted: 7 Mar 2013 | 14:24:13 UTC
Last modified: 7 Mar 2013 | 14:24:40 UTC

New news here: http://www.gpugrid.net/forum_thread.php?id=3318

It looks like it might be an extension of the issue I discussed before, but we're not sure. We're going to run tests on the beta queue to try and figure it out.

Retvari Zoltan
Joined: 20 Jan 09 | Posts: 2343 | Credit: 16,217,465,968 | RAC: 1,257,790
Message 29060 - Posted: 7 Mar 2013 | 19:52:12 UTC - in response to Message 29032.
Last modified: 7 Mar 2013 | 19:55:03 UTC

It looks to me that this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine.

After my post (above), my 32-bit hosts had some failures and stuck workunits too, so their previous relatively successful behavior may have been just chance.
