Message boards : Graphics cards (GPUs) : help : SWAN: FATAL : swanMalloc failed

Paolo Biagini
Joined: 4 Jul 09
Posts: 8
Credit: 1,849,825
RAC: 0
Message 15705 - Posted: 12 Mar 2010 | 8:18:14 UTC

I have:
Boinc 6.10.17
AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ [Family 15 Model 35 Stepping 2] (2 processors) NVIDIA GeForce 8800 GT (255MB) Linux 2.6.27-17-generic

All tasks end with an error:
http://www.gpugrid.net/results.php?hostid=42512

Any idea?
Paolo

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Message 15709 - Posted: 12 Mar 2010 | 11:59:16 UTC - in response to Message 15705.

That usually signifies a memory error.
Have you rebooted your machine and tried another WU?
____________
Thanks - Steve

Paolo Biagini
Joined: 4 Jul 09
Posts: 8
Credit: 1,849,825
RAC: 0
Message 15824 - Posted: 19 Mar 2010 | 8:43:25 UTC - in response to Message 15709.

Yes, I have tried rebooting...
All the problems started on 6 March,
but I haven't changed anything!!!
My card has only 256 MB; maybe ACEMD - GPU molecular dynamics v6.04 (cuda)
needs more RAM?

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 15828 - Posted: 19 Mar 2010 | 9:35:41 UTC - in response to Message 15824.

We have fixed that in the upcoming release, considering special cases where the GPU card does not have enough memory.

gdf

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 15829 - Posted: 19 Mar 2010 | 9:58:00 UTC - in response to Message 15824.
Last modified: 19 Mar 2010 | 10:11:02 UTC

My GT240s use between 290 MB and 320 MB (according to GPU-Z), so if you really only have 256 MB, that might be the issue, especially as your card has 112 shaders while my GT240 has only 96. It would be interesting to know how much RAM GPU-Z says your card has, and how much it is using!

Hopefully the fix will allow plenty of redundant cards to work again.

GPUZ: http://www.techpowerup.com/downloads/1761/mirrors.php

Paolo Biagini
Joined: 4 Jul 09
Posts: 8
Credit: 1,849,825
RAC: 0
Message 15833 - Posted: 19 Mar 2010 | 14:11:19 UTC - in response to Message 15829.
Last modified: 19 Mar 2010 | 14:12:11 UTC

I'm on Linux, so I can't run GPU-Z...
but here is the output of:
nvclock -i

-- General info --
Card: nVidia Geforce 8800GT
Architecture: G92 A2
PCI id: 0x611
GPU clock: 601.712 MHz
Bustype: PCI-Express

-- Shader info --
Clock: 1512.000 MHz
Stream units: 112 (11111011b)
ROP units: 16 (1111b)
-- Memory info --
Amount: 256 MB
Type: 256 bit DDR3
Clock: 702.000 MHz

-- PCI-Express info --
Current Rate: 16X
Maximum rate: 16X

-- Sensor info --
Sensor: Analog Devices ADT7473
Board temperature: 40C
GPU temperature: 55C
Fanspeed: 82 RPM
Fanspeed mode: manual
PWM duty cycle: 29.8%

-- VideoBios information --
Version: 62.92.16.00.a1
Signon message: GeForce 8800 GT VGA BIOS
Performance level 0: gpu 600MHz/shader 1500MHz/memory 700MHz/0.00V/100%
VID mask: 3
Voltage level 0: 0.95V, VID: 0
Voltage level 1: 1.00V, VID: 1
Voltage level 2: 1.05V, VID: 2
Voltage level 3: 1.10V, VID: 3

01:00.0 VGA compatible controller: nVidia Corporation GeForce 8800 GT (rev a2)
Subsystem: XFX Pine Group Inc. Device 2334
Flags: bus master, fast devsel, latency 0, IRQ 18
Memory at c2000000 (32-bit, non-prefetchable) [size=16M]
Memory at b0000000 (64-bit, prefetchable) [size=256M]
Memory at c0000000 (64-bit, non-prefetchable) [size=32M]
I/O ports at 9000 [size=128]
[virtual] Expansion ROM at c3000000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nvidia
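A rough way to check the memory question against this output: the Python sketch below reads the "Amount:" line from `nvclock -i` output and compares it with the 290-320 MB that skgiven observed on his GT240s. That range is one user's observation under Windows, not an official ACEMD requirement, and the function names are hypothetical.

```python
import re

# GPU memory use observed on a GT240 earlier in this thread (MB); not an official requirement.
OBSERVED_USAGE_MB = (290, 320)


def card_memory_mb(nvclock_output: str) -> int:
    """Return N from the 'Amount: N MB' line of `nvclock -i` output."""
    match = re.search(r"^Amount:\s*(\d+)\s*MB", nvclock_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Amount:' line found in nvclock output")
    return int(match.group(1))


def below_observed_usage(memory_mb: int, observed=OBSERVED_USAGE_MB) -> bool:
    """True if the card has less memory than the lowest usage seen on another card."""
    return memory_mb < observed[0]


sample = """-- Memory info --
Amount: 256 MB
Type: 256 bit DDR3
Clock: 702.000 MHz"""

mb = card_memory_mb(sample)
print(mb, below_observed_usage(mb))  # 256 True
```

On a card that also drives the desktop, part of that 256 MB is already in use, so the real margin is likely smaller still.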

Paolo Biagini
Joined: 4 Jul 09
Posts: 8
Credit: 1,849,825
RAC: 0
Message 15874 - Posted: 21 Mar 2010 | 12:28:53 UTC

I think only Full-atom molecular dynamics v6.70 (cuda)
works!!!
But ACEMD - GPU molecular dynamics v6.04 (cuda) does NOT!

Task ID | Work unit ID | Sent | Time reported or deadline | Status | Run time (s) | CPU time (s) | Claimed credit | Granted credit | Application
2031809 | 1280039 | 21 Mar 2010 10:52:11 UTC | 26 Mar 2010 10:52:11 UTC | In progress | --- | --- | --- | --- | Full-atom molecular dynamics v6.70 (cuda)
2031228 | 1279667 | 21 Mar 2010 7:43:30 UTC | 21 Mar 2010 10:52:11 UTC | Error while computing | 5.05 | 4.74 | 0.01 | --- | ACEMD - GPU molecular dynamics v6.04 (cuda)
2024218 | 1275454 | 19 Mar 2010 23:08:15 UTC | 21 Mar 2010 12:25:18 UTC | Completed and validated | 111,720.16 | 54,169.80 | 7,645.29 | 9,556.61 | Full-atom molecular dynamics v6.70 (cuda)
2017917 | 1270836 | 18 Mar 2010 23:10:45 UTC | 18 Mar 2010 23:12:48 UTC | Error while computing | 8.17 | 7.98 | 0.01 | --- | ACEMD - GPU molecular dynamics v6.04 (cuda)
2017890 | 1270813 | 18 Mar 2010 23:12:48 UTC | 18 Mar 2010 23:14:40 UTC | Error while computing | 9.19 | 8.80 | 0.02 | --- | ACEMD - GPU molecular dynamics v6.04 (cuda)
2017480 | 1270533 | 18 Mar 2010 21:15:41 UTC | 18 Mar 2010 21:19:22 UTC | Error while computing | 10.69 | 8.22 | 0.01 | --- | ACEMD - GPU molecular dynamics v6.04 (cuda)
2012755 | 1267945 | 17 Mar 2010 23:07:39 UTC | 17 Mar 2010 23:09:46 UTC | Error while computing | 9.19 | 8.51 | 0.01 | --- | ACEMD - GPU molecular dynamics v6.04 (cuda)
2010804 | 1266573 | 17 Mar 2010 23:09:46 UTC | 18 Mar 2010 21:15:41 UTC | Completed and validated | 58,608.77 | 57,965.07 | 5,830.52 | 8,745.78 | ACEMD - GPU molecular dynamics v6.04 (cuda)
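To make the pattern in this list explicit, the rows can be tallied per application. This Python sketch uses only the task IDs, statuses and application names copied from the list above; on this host it counts 5 errors and 1 validated result for v6.04, against 1 validated and 1 in progress for v6.70.

```python
from collections import Counter

# (task ID, status, application) copied from the task list above.
TASKS = [
    (2031809, "In progress", "Full-atom molecular dynamics v6.70"),
    (2031228, "Error while computing", "ACEMD - GPU molecular dynamics v6.04"),
    (2024218, "Completed and validated", "Full-atom molecular dynamics v6.70"),
    (2017917, "Error while computing", "ACEMD - GPU molecular dynamics v6.04"),
    (2017890, "Error while computing", "ACEMD - GPU molecular dynamics v6.04"),
    (2017480, "Error while computing", "ACEMD - GPU molecular dynamics v6.04"),
    (2012755, "Error while computing", "ACEMD - GPU molecular dynamics v6.04"),
    (2010804, "Completed and validated", "ACEMD - GPU molecular dynamics v6.04"),
]


def outcomes_by_app(tasks):
    """Count how often each status occurs, grouped by application."""
    counts = {}
    for _task_id, status, app in tasks:
        counts.setdefault(app, Counter())[status] += 1
    return counts


for app, statuses in outcomes_by_app(TASKS).items():
    print(app, dict(statuses))
```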

imcola
Joined: 26 Oct 09
Posts: 7
Credit: 8,428,912
RAC: 0
Message 16038 - Posted: 28 Mar 2010 | 23:20:10 UTC - in response to Message 15874.
Last modified: 28 Mar 2010 | 23:50:00 UTC

Same problem here. Looking at my more recent validated work, it's all 6.70 and 6.03; however, it's 50/50 whether a 6.03 task passes or fails. All my errored work is 6.04 and 6.03, and I'm still blowing through at least 10 WUs erroring out for every one that runs and validates on a given day/machine, across a variety of cards, CPUs and 64-bit OSes. Another common error I'm seeing is 'SWAN: FATAL : Unable to create context'. I'm not sure what the deal is with that, but the bottom line is ~0-3 sec of CPU time and out for these WUs. Reading the moderator's comment, I'm leaning toward a WU issue rather than a config issue: if absolutely no WUs were crunching, I would be hunting for a tweak here or there. But I'm still looking for a solution, so I'm open to suggestions :)

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 16044 - Posted: 29 Mar 2010 | 10:46:49 UTC - in response to Message 16038.
Last modified: 29 Mar 2010 | 10:50:01 UTC

imcola,
Looks like your GeForce 9600GT on Linux, your 9800GT on Linux, and your GTS250 on W7hpx64 can only run ACEMD tasks.

Your 9800 GT on W7hpx64 can run both ACEMD and ACEMD ver 2 tasks.

I would suggest you select to only run ACEMD tasks.

Alternatively, you could create another user account, add your 9800 GT on W7hpx64 to that new account and set it to pick up only ACEMD ver 2 tasks (faster).
If you do, you could create a team and have 2 accounts for the team.

The perfect solution would be for the techs to allow people to configure individual cards/systems online. Let's face it, new apps are needed, so this may be the way forward.


Paolo Biagini,
Do the same, select to only run ACEMD tasks.


To change account settings:
Go to Your Account > GPUGRID preferences > Edit GPUGRID preferences.

Alter the Application versions as follows:

Run only the selected applications
ACEMD: YES
ACEMD ver 2.0: no
ACEMD beta: no

liveonc
Joined: 1 Jan 10
Posts: 292
Credit: 41,567,650
RAC: 0
Message 16047 - Posted: 29 Mar 2010 | 11:39:43 UTC - in response to Message 16044.

Why does he need to make two accounts? Why not just use the Default/Home/School/Work GPUGRID preferences? Just use one GPUGRID preference for the 9600GT & another GPUGRID preference for the 9800GT.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 16063 - Posted: 29 Mar 2010 | 18:11:36 UTC - in response to Message 16047.
Last modified: 29 Mar 2010 | 18:12:51 UTC

liveonc, you are correct: imcola should just set up 2 different profiles; that's the sane way to do it!
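The per-venue setup suggested here amounts to two lookups: each host is assigned one venue (Default, Home, School or Work), and each venue carries its own application selection. The Python sketch below models that with plain dictionaries, using the card/OS split skgiven described for imcola; the host labels are hypothetical, and the real settings are made on the GPUGRID preferences page, not in code.

```python
# Application selection per venue, mirroring the checkboxes on the preferences page.
VENUE_APPS = {
    "home": {"ACEMD"},                     # cards that should only get ACEMD tasks
    "school": {"ACEMD", "ACEMD ver 2.0"},  # card that can also run ver 2 tasks
}

# Hypothetical host labels, each assigned to exactly one venue.
HOST_VENUE = {
    "linux-9600gt": "home",
    "linux-9800gt": "home",
    "w7-gts250": "home",
    "w7-9800gt": "school",
}


def allowed_apps(host: str) -> set:
    """Applications the server may send to a host, given the venue it is assigned to."""
    return VENUE_APPS[HOST_VENUE[host]]


for host in sorted(HOST_VENUE):
    print(host, sorted(allowed_apps(host)))
```

The point of the design is that one account can serve mixed hardware: changing a host's venue changes what it receives without touching the other hosts.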

imcola
Joined: 26 Oct 09
Posts: 7
Credit: 8,428,912
RAC: 0
Message 16067 - Posted: 29 Mar 2010 | 22:20:45 UTC - in response to Message 16063.

All of you, thanks for the input! I still have another day until I get back home. I will take a look at my machine profiles as you suggest and post back once I see how things go. Happy crunching!!

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 16075 - Posted: 29 Mar 2010 | 22:53:09 UTC - in response to Message 16067.

Tomorrow we will upload a new beta, which should fix the problem.

gdf

imcola
Joined: 26 Oct 09
Posts: 7
Credit: 8,428,912
RAC: 0
Message 16110 - Posted: 31 Mar 2010 | 23:06:26 UTC - in response to Message 16063.

Thanks guys! Back in business. I did end up creating a couple of different profiles as suggested, but once I got home and detached, bounced BOINC and re-attached on the various machines, everybody went back to work using 'home' with just original ACEMD WUs, no ver 2 or beta work. The exception was the GTS250, which kept saying no WUs were available, so I set 'school' to get both original ACEMD and ACEMD ver 2 WUs, pointed the 250 to 'school', detached/bounced BOINC and re-attached, and it took right off after the download(s) completed. Cool!!

I guess I still can't differentiate between the versions beyond v6.70, v6.71, v6.04 and v6.03, but as long as BOINC/GPUGRID and the profiles/cards can, I'll be happy to just play along. THANKS again for the insight!! Many happy crunched returns!!

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 16111 - Posted: 31 Mar 2010 | 23:17:11 UTC - in response to Message 16110.

6.03 is for Windows and 6.04 is for Linux; both are ACEMD ver 2.
6.70 and 6.71 are ACEMD.
You don't really need to differentiate; just select ACEMD or ACEMD ver 2 for each profile.
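skgiven's version key can be written down as a small lookup table. The Python sketch below encodes only the four versions named in this post; the platform for 6.70/6.71 is left as None because the thread does not state it, and `classify` is a hypothetical helper.

```python
# Version key from the post above: version -> (application, platform).
APP_VERSIONS = {
    "6.03": ("ACEMD ver 2", "Windows"),
    "6.04": ("ACEMD ver 2", "Linux"),
    "6.70": ("ACEMD", None),  # platform not stated in the thread
    "6.71": ("ACEMD", None),  # platform not stated in the thread
}


def classify(version: str):
    """Map a version string such as 'v6.04' or '6.04' to (application, platform)."""
    return APP_VERSIONS.get(version.strip().lstrip("v"), ("unknown", None))


for v in ("v6.03", "v6.04", "v6.70", "v6.71"):
    print(v, classify(v))
```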

Paolo Biagini
Joined: 4 Jul 09
Posts: 8
Credit: 1,849,825
RAC: 0
Message 16115 - Posted: 1 Apr 2010 | 14:42:59 UTC - in response to Message 16044.


OK, I'll try it.
Bye

imcola
Joined: 26 Oct 09
Posts: 7
Credit: 8,428,912
RAC: 0
Message 16116 - Posted: 1 Apr 2010 | 19:35:24 UTC - in response to Message 16111.

6.04 WUs on my Linux boxes mostly seem to crash and burn: ~0-3 sec of CPU and out. 6.70 seems to run OK, but there don't seem to be many of those WUs. Is that the case?

GDF, you mentioned a new beta release that would correct the errors mentioned in this thread; Linux with 6.04 definitely seems buggy on my 2 boxes. I have set up a 'school' profile to pick up ACEMD, ver 2 and beta, but I don't seem to get anything but 6.70 (runs) and 6.04 (won't). ('Home' should just grab ACEMD WUs, while 'work' should take either ACEMD or ver 2.) This GPUGRID project seems to need some extra help, whereas with BOINC CPU crunching I can pretty much let it run by itself. If you have some guidance or can use a cruncher's feedback, I will play guinea pig for a little while if it will help debug the Linux side of the project. You might throw me in the deep end from the code perspective, but hardware-wise I'm pretty solid; offer extended, FWIW. Ubuntu 9.10 on both machines.

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 16159 - Posted: 4 Apr 2010 | 8:51:24 UTC - in response to Message 16116.

BOINC is not able to report your driver version.
Could it be that it's not properly installed?

gdf


imcola
Joined: 26 Oct 09
Posts: 7
Credit: 8,428,912
RAC: 0
Message 16179 - Posted: 5 Apr 2010 | 20:25:53 UTC - in response to Message 16159.

Good question. I've just updated from the Ubuntu-recommended v185 and installed the 195.36.15 NVIDIA drivers on both boxes. Ubuntu sees the new version and displays the NVIDIA X Server Settings, but after restarting BOINC, the message is that it sees the card, but the driver version is unknown and the CUDA version is 3000. So I would say yes, they are installed, and correctly, but BOINC still seems clueless about them.
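On Linux, one way to confirm which NVIDIA driver is actually loaded, whatever BOINC reports, is the version file the proprietary kernel module exposes at /proc/driver/nvidia/version. The Python sketch below pulls the version number out of its "NVRM version:" line; the sample line is an assumed illustration of the format, not output from imcola's machines.

```python
import re
from pathlib import Path

# Version file exposed by the proprietary NVIDIA kernel module on Linux.
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")


def nvidia_driver_version(text: str):
    """Return the version number from an 'NVRM version:' line, or None if absent."""
    match = re.search(r"NVRM version:.*?\b(\d+\.\d+(?:\.\d+)?)\b", text)
    return match.group(1) if match else None


def installed_driver_version(path: Path = NVIDIA_VERSION_FILE):
    """Read the loaded driver's version, or None if the file does not exist."""
    if not path.exists():
        return None
    return nvidia_driver_version(path.read_text())


# Assumed example of the file's first line (format only, not real output from this thread).
sample = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  195.36.15  Fri Mar 12 00:29:13 PST 2010"
print(nvidia_driver_version(sample))  # 195.36.15
print(installed_driver_version())     # None on machines without the NVIDIA module
```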

Paul Sands
Joined: 14 Feb 09
Posts: 3
Credit: 218,189,331
RAC: 1,040,019
Message 16182 - Posted: 5 Apr 2010 | 23:34:44 UTC - in response to Message 16179.

All of my Linux hosts have always reported the driver as unknown (I think the BOINC devs are aware of this). 6.04 runs OK on my hosts but uses 100% CPU, so I have elected not to run it. The ACEMD tasks seem to be running out, as I haven't got any for a few days now.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 16183 - Posted: 5 Apr 2010 | 23:53:01 UTC - in response to Message 16182.
Last modified: 5 Apr 2010 | 23:57:03 UTC

Paul Sands wrote: "6.04 runs ok on my hosts but uses 100% cpu so I have elected not to run them."

Are you completely mad? Run them; they are not there to look at. They run 30% faster than the other tasks (which have run out).

Let me put this another way.
My highly overclocked i7 does about 1/5th the work of a GT240 (under Windows). A GTX260 does about twice the work of a GT240, and a GTX295 does twice the work of a GTX260. So a GTX295 does 20 times the work of my i7.
Your GTX260s would do 13 times the work of an i7 under Linux!
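The arithmetic above can be checked step by step. The Python sketch below expresses each card in units of one overclocked i7, using only the ratios stated in this post. The 13x Linux figure matches applying the roughly 30% speed advantage of the 6.04 tasks (mentioned earlier in the post) to the GTX260's 10x; that link is our reading, not something the post spells out.

```python
# Relative throughput, in units of "one highly overclocked i7", from the ratios in the post.
I7 = 1.0
GT240 = 5 * I7            # the i7 does about 1/5 of a GT240's work (Windows)
GTX260 = 2 * GT240        # a GTX260 does about twice a GT240
GTX295 = 2 * GTX260       # a GTX295 does about twice a GTX260
LINUX_6_04_SPEEDUP = 1.3  # 6.04 tasks said to run about 30% faster

print(GTX295 / I7)                                 # 20.0
print(round(GTX260 * LINUX_6_04_SPEEDUP / I7, 1))  # 13.0
```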

- Give up one CPU core: you have quads and GTX260s!

I even sacrifice a CPU core under Windows just to facilitate GPUGrid, even though it does not need one full core there. It's worth it for me on Windows, never mind Linux!

BTW, this is way OT.

imcola
Joined: 26 Oct 09
Posts: 7
Credit: 8,428,912
RAC: 0
Message 16186 - Posted: 6 Apr 2010 | 1:24:27 UTC - in response to Message 16182.

Well, I ended up throttling CPU utilization back to 80% on one machine. The other I didn't touch, but I pointed them both to 'work' to pick up some ver 2 WUs (maybe). I now have two 6.04 WUs running and well past the 2-second CPU mark, so I guess I will kick back and monitor for a few days and see what happens with no twiddling. Back in a few, I guess.
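Dropping BOINC's processor-usage preference from 100% to 80% leaves a core free for the CPU side of the GPU task. Assuming the client converts the percentage into whole cores by truncating and never goes below one core (our understanding of the BOINC client, not something stated in this thread), the effect on a dual-core and a quad-core machine can be sketched in Python:

```python
def cores_for_cpu_tasks(n_cpus: int, max_ncpus_pct: float) -> int:
    """Cores used for CPU work, assuming truncation to whole cores and a minimum of one."""
    return max(1, int(n_cpus * max_ncpus_pct / 100))


for n_cpus in (2, 4):
    used = cores_for_cpu_tasks(n_cpus, 80)
    print(f"{n_cpus} cores at 80%: {used} for CPU tasks, {n_cpus - used} free for the GPU task")
```

Under that assumption, 80% frees exactly one core on both a dual-core and a quad-core machine.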

Paolo Biagini
Joined: 4 Jul 09
Posts: 8
Credit: 1,849,825
RAC: 0
Message 16201 - Posted: 8 Apr 2010 | 14:29:36 UTC - in response to Message 16115.


IT DOESN'T WORK IN ANY WAY :-(


imcola
Joined: 26 Oct 09
Posts: 7
Credit: 8,428,912
RAC: 0
Message 16207 - Posted: 8 Apr 2010 | 20:00:39 UTC - in response to Message 16201.

Hi Paolo, it took me a few days to get my Linux boxes dialed in to run 6.04 WUs. I now have 6 validated vs. 1 errored out, and 7 more running/queued. Let me backtrack; maybe you will spot a problem on yours. I had to throttle back CPU utilization on one of mine in the BOINC preferences, from 100% to 80%. Your own earlier post, showing your single successful 6.04 run, confirms my experience: these 6.04 Linux GPU runs are real CPU hogs compared with a Windows GPU WU. I have no clue why a Linux GPU WU should need nearly one second of CPU for every second of run time, while a Windows one seems to need 10% or less CPU for its GPU run to complete. End result: try freeing up CPU cycles from other projects and see if that helps.
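The "nearly a second of CPU per second of run time" observation can be checked against the two validated tasks in Paolo's earlier list. The Python sketch below divides CPU time by run time, assuming the first and second numeric columns in that list are run time and CPU time in seconds: on his Linux host the v6.04 task kept a core about 99% busy, the v6.70 task about half.

```python
# (application, run time in s, CPU time in s) for the validated tasks in Paolo's list.
VALIDATED = [
    ("ACEMD - GPU molecular dynamics v6.04", 58608.77, 57965.07),
    ("Full-atom molecular dynamics v6.70", 111720.16, 54169.80),
]


def cpu_fraction(run_s: float, cpu_s: float) -> float:
    """Fraction of wall-clock time during which the task kept one CPU core busy."""
    return cpu_s / run_s


for app, run_s, cpu_s in VALIDATED:
    print(f"{app}: {cpu_fraction(run_s, cpu_s):.0%} of a core")
```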
If you don't have the latest/greatest (I assume) NVIDIA drivers installed, that was another upgrade I needed before the 6.04 WUs would run clean. You will want to be able to open NVIDIA X Server Settings from the OS to confirm a good install; don't worry if BOINC still can't detect the driver. The WUs can tell the difference, and they were not running on the older drivers. HTH

Everybody who responded to this thread: I really had to put all of your observations/suggestions together to get this project back on track and functional. Thanks! Paolo, check back if you're still stuck.
