
Message boards : News : Update acemd3 app

Toni
Message 57041 - Posted: 1 Jul 2021 | 18:29:57 UTC

I deployed the new app, which now requires CUDA 11.2 and hopefully supports all the latest cards. Touching the CUDA versions is always a nightmare in the BOINC scheduler, so expect problems.

Ian&Steve C.
Message 57042 - Posted: 1 Jul 2021 | 18:36:19 UTC - in response to Message 57041.

YES! Thank you so much!
____________

Ian&Steve C.
Message 57043 - Posted: 1 Jul 2021 | 18:58:49 UTC - in response to Message 57042.
Last modified: 1 Jul 2021 | 19:07:46 UTC

I noticed the plan class is listed as "cuda1121" on the Applications page. is this a typo? will it cause any issues with getting work or running the application?

Also, you might need to put a cap (maybe on compute capability or something) on the project server side to prevent the CUDA 10.0 app from being sent to Ampere hosts. We have already seen many errors because the CUDA 10.0 app was still being sent to Ampere hosts; there should be a way to make sure Ampere hosts only get the 11.2 app and never try to use the CUDA 10 app.
____________

ExtraTerrestrial Apes
Message 57045 - Posted: 1 Jul 2021 | 19:30:59 UTC
Last modified: 1 Jul 2021 | 19:31:09 UTC

Great news! So far it's only Linux, right?

MrS
____________
Scanning for our furry friends since Jan 2002

Keith Myers
Message 57046 - Posted: 1 Jul 2021 | 19:53:00 UTC - in response to Message 57045.

So far.

Ian&Steve C.
Message 57047 - Posted: 1 Jul 2021 | 20:02:17 UTC

Just so people are aware, CUDA 11.2 (I assume the "1121" means CUDA 11.2.1 "update 1") means you need at least driver 460.32 on Linux.
____________
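For anyone unsure whether their host meets that requirement, a quick way to check the driver the system is actually running (assuming the NVIDIA driver and its nvidia-smi tool are installed) is:

# print the installed driver version and the GPU name
nvidia-smi --query-gpu=driver_version,name --format=csv,noheader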

Toni
Message 57048 - Posted: 1 Jul 2021 | 20:22:33 UTC - in response to Message 57047.

Can someone confirm whether the Linux cuda100 app is still being sent out (and likely fails)?

T

Ian&Steve C.
Message 57049 - Posted: 1 Jul 2021 | 20:31:25 UTC - in response to Message 57048.

Can someone confirm whether the Linux cuda100 app is still being sent out (and likely fails)?

T


Is this the reason the Linux tasks have been failing recently? Do they need this new app? Did you remove the Linux cuda100 app?
____________

Ian&Steve C.
Message 57050 - Posted: 1 Jul 2021 | 20:56:08 UTC - in response to Message 57048.
Last modified: 1 Jul 2021 | 21:04:46 UTC

I just got a couple of tasks on my RTX 3080 Ti host, and it got the new app. They failed in 2 seconds. It looks like you're missing a file, or you forgot to statically link Boost into the app:

16:50:34 (15968): wrapper (7.7.26016): starting
16:50:34 (15968): wrapper (7.7.26016): starting
16:50:34 (15968): wrapper: running acemd3 (--boinc input --device 0)
acemd3: error while loading shared libraries: libboost_filesystem.so.1.74.0: cannot open shared object file: No such file or directory
16:50:35 (15968): acemd3 exited; CPU time 0.000360
16:50:35 (15968): app exit status: 0x7f
16:50:35 (15968): called boinc_finish(195)


https://www.gpugrid.net/result.php?resultid=32631384

But it's promising that I didn't get the "invalid architecture" error.
____________
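One way to confirm which shared libraries the downloaded acemd3 binary expects, and which of them are missing, is to run ldd against the executable; the path below assumes the default Debian/Ubuntu BOINC data directory, so adjust it to your own install:

# list any unresolved shared-library dependencies of the acemd3 executable
cd /var/lib/boinc-client/projects/www.gpugrid.net
ldd ./acemd3* | grep "not found"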

Keith Myers
Message 57051 - Posted: 1 Jul 2021 | 21:11:25 UTC
Last modified: 1 Jul 2021 | 21:41:43 UTC

Looks like Ubuntu 20.04.2 LTS has libboost-all-dev 1.71 installed.

I remember that Gridcoin now also needs libboost-all-dev 1.74 installed when building.

That version is in 21.04.

[Edit]
Theoretically yes, though AFAIK I don't know anything about how the wrapper containers work.

I just wonder whether the tasks would stop failing if you installed the latest 1.74 libboost-all-dev environment.

https://www.boost.org/users/history/version_1_74_0.html
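To see which Boost runtime packages a given host actually has installed (as opposed to what the distribution ships by default), something like this works on Debian/Ubuntu systems:

# list installed Boost library packages and their versions
dpkg -l | grep -i libboost | awk '{print $2, $3}'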

Ian&Steve C.
Message 57052 - Posted: 1 Jul 2021 | 21:31:01 UTC - in response to Message 57051.

I think these are sandboxed in the wrapper, so packages on the system in theory shouldn't matter, right?
____________

Keith Myers
Message 57053 - Posted: 1 Jul 2021 | 21:45:12 UTC

Just failed a couple more acemd3 tasks. What a waste . . . . as hard as they are to snag.

Ian&Steve C.
Message 57054 - Posted: 1 Jul 2021 | 22:24:20 UTC - in response to Message 57053.

Just failed a couple more acemd3 tasks. What a waste . . . . as hard as they are to snag.


did you get the new app? do you have that newer version of boost installed?
____________

Keith Myers
Message 57055 - Posted: 1 Jul 2021 | 23:28:47 UTC - in response to Message 57054.

No I just have the normal CUDA 10.0 app installed. I am just investigating what would be needed to install the missing libraries.

Pop Piasa
Message 57056 - Posted: 1 Jul 2021 | 23:52:38 UTC

Great to see this progress, as GPU prices are beginning to fall and Ampere GPUs currently dominate market availability. I hope China's ban on mining becomes a budgetary boon for crunchers and gamers worldwide.

The Amperes should eventually expedite this project considerably.
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Ian&Steve C.
Message 57057 - Posted: 2 Jul 2021 | 0:05:05 UTC - in response to Message 57055.

No I just have the normal CUDA 10.0 app installed. I am just investigating what would be needed to install the missing libraries.


looks like you're actually getting the new app now: http://www.gpugrid.net/result.php?resultid=32631755
____________

Keith Myers
Message 57058 - Posted: 2 Jul 2021 | 0:16:36 UTC - in response to Message 57057.

No I just have the normal CUDA 10.0 app installed. I am just investigating what would be needed to install the missing libraries.


looks like you're actually getting the new app now: http://www.gpugrid.net/result.php?resultid=32631755

Huh, hadn't noticed.

So maybe the New version of ACEMD v2.12 (cuda1121) is going to be the default app even for the older cards.

Ian&Steve C.
Message 57059 - Posted: 2 Jul 2021 | 0:41:12 UTC - in response to Message 57058.
Last modified: 2 Jul 2021 | 0:49:30 UTC

I think you'll only get the 11.2 app if you have a compatible driver, i.e. greater than 460.32. Just my guess. I'll need to see if systems with an older driver will still get the cuda100 app.

Edit: answering my own question. I guess the driver being reported doesn't factor into the app selection anymore. My systems reporting an older driver still received the new app. So it won't prevent the app from being sent to someone without a new enough driver.
____________

Toni
Message 57060 - Posted: 2 Jul 2021 | 8:50:26 UTC - in response to Message 57059.
Last modified: 2 Jul 2021 | 8:57:08 UTC

I'm still trying to figure out the best way to distribute the app. The current way has hard-coded minimum-maximum driver versions for each CUDA version and it's too cumbersome to maintain.

Suggestions are welcome. The server knows the client's CUDA version and driver version, as well as the app's CUDA plan class.

Bedrich Hajek
Message 57061 - Posted: 2 Jul 2021 | 12:07:52 UTC - in response to Message 57060.

I'm still trying to figure out the best way to distribute the app. The current way has hard-coded minimum-maximum driver versions for each CUDA version and it's too cumbersome to maintain.

Suggestions are welcome. The server knows the client's CUDA version and driver version, as well as the app's CUDA plan class.


Here is an idea:

How about distribution by card type? That would exclude the really slow cards, like 740M.

BTW: What driver version do we need for this?

Ian&Steve C.
Message 57062 - Posted: 2 Jul 2021 | 12:54:25 UTC - in response to Message 57060.
Last modified: 2 Jul 2021 | 13:09:11 UTC

Toni, I think the first thing that needs to be fixed is the Boost 1.74 library not being included in the app distribution. The app is failing right away because it's not there. You either need to distribute the .so file or statically link it into the acemd3 app so it's not needed separately.

Manually installing it seems to be a workaround, but that's a tall order to ask every Linux user to perform.
____________
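For reference, a rough sketch of the second option (this is not GPUGRID's actual build, and the object file name is made up; it just illustrates the idea of linking the Boost archive statically at build time so the run-time .so dependency disappears):

# hypothetical link step with Boost linked statically (GNU toolchain)
g++ acemd3_main.o -o acemd3 -Wl,-Bstatic -lboost_filesystem -Wl,-Bdynamic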

Ian&Steve C.
Message 57063 - Posted: 2 Jul 2021 | 14:49:02 UTC

After manually installing the required Boost version to get past that error, I now get this error on my 3080 Ti system:

09:55:10 (4806): wrapper (7.7.26016): starting
09:55:10 (4806): wrapper (7.7.26016): starting
09:55:10 (4806): wrapper: running acemd3 (--boinc input --device 0)
ACEMD failed:
Error launching CUDA compiler: 32512
sh: 1: : Permission denied


09:55:11 (4806): acemd3 exited; CPU time 0.479062
09:55:11 (4806): app exit status: 0x1
09:55:11 (4806): called boinc_finish(195)


Task: https://www.gpugrid.net/result.php?resultid=32632410

I tried purging and reinstalling the nvidia drivers, but no change.

it looks like this same error popped up when you first released acemd3 2 years ago: http://www.gpugrid.net/forum_thread.php?id=4935#51970

biodoc wrote:
Multiple failures of this task on both windows and linux

http://www.gpugrid.net/workunit.php?wuid=16517304

<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
15:19:27 (30109): wrapper (7.7.26016): starting
15:19:27 (30109): wrapper (7.7.26016): starting
15:19:27 (30109): wrapper: running acemd3 (--boinc input --device 0)
# Engine failed: Error launching CUDA compiler: 32512
sh: 1: : Permission denied

15:19:28 (30109): acemd3 exited; CPU time 0.186092
15:19:28 (30109): app exit status: 0x1
15:19:28 (30109): called boinc_finish(195)

</stderr_txt>


Why is the app launching CUDA compiler?


You then updated the app, which fixed the problem at that time, but you didn't post exactly what was changed: http://www.gpugrid.net/forum_thread.php?id=4935&nowrap=true#52022

Toni wrote:
It was a cryptic bug in the order of loading shared libraries, or something like that. Otherwise unexplainably system-dependent.

I see VERY few failures now. The new app will be a huge step forward on several aspects, not least maintainability. We'll be transitioning gradually.


So whatever change you made between v2.02 and v2.03 seems to be what needs fixing again.
____________

ServicEnginIC
Message 57064 - Posted: 2 Jul 2021 | 15:26:32 UTC

I deployed the new app, which now requires CUDA 11.2 and hopefully supports all the latest cards. Touching the CUDA versions is always a nightmare in the BOINC scheduler, so expect problems.

Thank you so much.
Those efforts are for noble reasons.

Regarding persistent errors:
I also manually installed boost as a try at one of my Ubuntu 20.04 hosts, by means of the following commands:

sudo add-apt-repository ppa:mhier/libboost-latest
sudo apt-get update
sudo apt-get install boost1.74
reboot

But a new task downloaded after that still failed:
e3s644_e1s419p0f770-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND9285_4
Then, I've reset GPUGrid project, and it seems that it did the trick.
A new task is currently running on this host, instead of failing after a few seconds past:
e4s126_e3s248p0f238-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-0-2-RND6347_7
49 minutes, 1.919% progress by now.

Ian&Steve C.
Message 57065 - Posted: 2 Jul 2021 | 15:33:05 UTC - in response to Message 57064.

I deployed the new app, which now requires CUDA 11.2 and hopefully supports all the latest cards. Touching the CUDA versions is always a nightmare in the BOINC scheduler, so expect problems.

Thank you so much.
Those efforts are for noble reasons.

Regarding persistent errors:
I also manually installed boost as a try at one of my Ubuntu 20.04 hosts, by means of the following commands:

sudo add-apt-repository ppa:mhier/libboost-latest
sudo apt-get update
sudo apt-get install boost1.74
reboot

But a new task downloaded after that still failed:
e3s644_e1s419p0f770-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND9285_4
Then, I've reset GPUGrid project, and it seems that it did the trick.
A new task is currently running on this host, instead of failing after a few seconds past:
e4s126_e3s248p0f238-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-0-2-RND6347_7
49 minutes, 1.919% progress by now.


Thanks, I'll try a project reset, though I had already done one after the new app was announced. I guess it can't hurt.

____________

Ian&Steve C.
Message 57066 - Posted: 2 Jul 2021 | 15:45:20 UTC - in response to Message 57065.

nope, even after the project reset, still the same error

process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
11:42:55 (5665): wrapper (7.7.26016): starting
11:42:55 (5665): wrapper (7.7.26016): starting
11:42:55 (5665): wrapper: running acemd3 (--boinc input --device 0)
ACEMD failed:
Error launching CUDA compiler: 32512
sh: 1: : Permission denied

11:42:56 (5665): acemd3 exited; CPU time 0.429069
11:42:56 (5665): app exit status: 0x1
11:42:56 (5665): called boinc_finish(195)


https://www.gpugrid.net/result.php?resultid=32632487
____________

Ian&Steve C.
Message 57067 - Posted: 2 Jul 2021 | 16:00:16 UTC - in response to Message 57064.

sudo add-apt-repository ppa:mhier/libboost-latest
sudo apt-get update
sudo apt-get install libboost1.74
reboot


small correction here. it's "libboost1.74", not just "boost1.74"
____________

ServicEnginIC
Message 57068 - Posted: 2 Jul 2021 | 16:27:57 UTC - in response to Message 57066.

Maybe your problem is an Ampere-specific one (?).

I've caught a new task on another of my hosts after applying the same remedy, and it is also running as expected.
e3s263_e1s419p0f938-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND6959_2
25 minutes, 0.560% progress by now for this second task.
Turing GPUs and Nvidia drivers 465.31 on both hosts.
Installing libboost1.74 didn't work for me by itself.
Resetting the project didn't work for me by itself.
Installing libboost1.74 and resetting the project afterwards did work for both my hosts.
I've double-checked, and the commands I employed were the ones previously published in message #57064.
Checking in Synaptic, this shows that libboost1.74 and libboost1.74-dev were correctly installed.
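Putting the two findings together (the corrected package name plus a project reset), the sequence that worked on these hosts looks roughly like this; the last step assumes a standard Linux BOINC client, uses the project URL exactly as your client lists it, and can just as well be done from the BOINC Manager:

sudo add-apt-repository ppa:mhier/libboost-latest
sudo apt update
sudo apt install libboost1.74
# then reset the GPUGrid project so the client re-downloads the app files
boinccmd --project https://www.gpugrid.net/ reset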

Ian&Steve C.
Message 57069 - Posted: 2 Jul 2021 | 16:37:10 UTC - in response to Message 57068.
Last modified: 2 Jul 2021 | 16:41:30 UTC

Maybe your problem is an Ampere-specific one (?).

I've caught a new task on another of my hosts after applying the same remedy, and it is also running as expected.
e3s263_e1s419p0f938-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND6959_2
25 minutes, 0.560% progress by now for this second task.
Turing GPUs and Nvidia drivers 465.31 on both hosts.
Installing libboost1.74 didn't work for me by itself.
Resetting the project didn't work for me by itself.
Installing libboost1.74 and resetting the project afterwards did work for both my hosts.
I've double-checked, and the commands I employed were the ones previously published in message #57064.
Checking in Synaptic, this shows that libboost1.74 and libboost1.74-dev were correctly installed.


I had this thought. I put my old 2080 Ti into the problem host, and will see if it starts processing, or if it's really a problem with the host-specific configuration. This isn't the first time this has happened, though, and Toni previously fixed it with an app update. So it looks like that will be needed again even if it's Ampere-specific.

I think the difference in install commands comes down to the use of apt vs. apt-get. Although apt-get still works, transitioning to just apt will be better in the long term. Difference between apt and apt-get
____________

Ian&Steve C.
Message 57070 - Posted: 2 Jul 2021 | 17:15:35 UTC - in response to Message 57069.
Last modified: 2 Jul 2021 | 17:22:47 UTC

Maybe your problem is an Ampere-specific one (?).

I've caught a new task on another of my hosts after applying the same remedy, and it is also running as expected.
e3s263_e1s419p0f938-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND6959_2
25 minutes, 0.560% progress by now for this second task.
Turing GPUs and Nvidia drivers 465.31 on both hosts.
Installing libboost1.74 didn't work for me by itself.
Resetting the project didn't work for me by itself.
Installing libboost1.74 and resetting the project afterwards did work for both my hosts.
I've double-checked, and the commands I employed were the ones previously published in message #57064.
Checking in Synaptic, this shows that libboost1.74 and libboost1.74-dev were correctly installed.


I had this thought. I put my old 2080 Ti into the problem host, and will see if it starts processing, or if it's really a problem with the host-specific configuration. This isn't the first time this has happened, though, and Toni previously fixed it with an app update. So it looks like that will be needed again even if it's Ampere-specific.


well, it seems it's not Ampere specific. it failed in the same way on my 2080ti here: https://www.gpugrid.net/result.php?resultid=32632521

still the CUDA compiler error

unfortunately I can't easily move the 3080ti to another system since it's a watercooled model that requires a custom water loop.
____________

Keith Myers
Message 57071 - Posted: 2 Jul 2021 | 18:08:24 UTC

I just used the ppa method on my other two hosts. But I did not reboot.
Picked up another task and it is running.
Waiting still on the luck of the draw for the other host without work.

Ian&Steve C.
Message 57072 - Posted: 2 Jul 2021 | 18:19:53 UTC - in response to Message 57070.


well, it seems it's not Ampere specific. it failed in the same way on my 2080ti here: https://www.gpugrid.net/result.php?resultid=32632521

still the CUDA compiler error

unfortunately I can't easily move the 3080ti to another system since it's a watercooled model that requires a custom water loop.


I think I finally solved the issue! it's running on the 3080ti finally!

First I removed the manual installation of Boost and installed the PPA version. I don't think this was the issue, though.

While poking around in my OS install, I discovered that I had the CUDA 11.1 toolkit installed (likely from my previous attempts at building some apps to run on Ampere). I removed this old toolkit, cleaned up any leftover files, rebooted, reset the project, and waited for a task to show up.

so now it's running finally. now to see how long it'll take a 3080ti ;). it has over 10,000 CUDA cores so I'm hoping for a fast time. 2080ti runs about 12hrs, so it'll be interesting to see how fast I can knock it out. using about 310 watts right now. but with the caveat that ever since I've had this card, I've noticed some weird power limiting behavior. I'm waiting on an RMA now for a new card, and I'm hoping it can really stretch its legs, plan to still power limit it to about 320W though.

____________
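Since a leftover CUDA toolkit turned out to be the culprit here, a quick way to check a host for stray toolkit installs before blaming the app (assuming a Debian/Ubuntu layout) is:

# look for CUDA toolkit packages and locally installed toolkit trees
dpkg -l | grep -i cuda-toolkit
ls -d /usr/local/cuda* 2>/dev/null
# an nvcc on the PATH is another sign a toolkit is still present
which nvcc || echo "no nvcc on PATH"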

ServicEnginIC
Message 57073 - Posted: 2 Jul 2021 | 18:35:05 UTC - in response to Message 57072.
Last modified: 2 Jul 2021 | 18:36:13 UTC

Congratulations!
Good news...Anxious to see the performance on a 3080 Ti

Vismed
Message 57074 - Posted: 2 Jul 2021 | 18:54:38 UTC - in response to Message 57041.

Well, it will be your problem, not mine. Even with decent hardware and software, I am pretty astonished how folks at Cosmology and the like seemingly do not understand how VMs and the like work. I am pretty pissed as an amateur, though...

Richard Haselgrove
Message 57075 - Posted: 2 Jul 2021 | 18:59:40 UTC

Just seen my first failures with libboost errors on Linux Mint 20.1, driver 460.80, GTX 1660 super.

Applied the PPA and reset the project - waiting on the next tasks now.

Ian&Steve C.
Message 57076 - Posted: 2 Jul 2021 | 19:02:55 UTC - in response to Message 57074.

Well, it will be your problem, not mine. Even with decent hardware and software, I am pretty astonished how folks at Cosmology and the like seemingly do not understand how VMs and the like work. I am pretty pissed as an amateur, though...


what problem are you having specifically?

this project has nothing to do with cosmology, and this project does not use VMs.
____________

Retvari Zoltan
Message 57077 - Posted: 2 Jul 2021 | 22:47:18 UTC - in response to Message 57072.

I think I finally solved the issue! it's running on the 3080ti finally!

now to see how long it'll take a 3080ti ;). it has over 10,000 CUDA cores so I'm hoping for a fast time. 2080ti runs about 12hrs, so it'll be interesting to see how fast I can knock it out. using about 310 watts right now.

This is the moment of truth we're all waiting for.
My bet is 9h 15m.

Ian&Steve C.
Message 57078 - Posted: 3 Jul 2021 | 1:22:16 UTC - in response to Message 57077.

I think I finally solved the issue! it's running on the 3080ti finally!

now to see how long it'll take a 3080ti ;). it has over 10,000 CUDA cores so I'm hoping for a fast time. 2080ti runs about 12hrs, so it'll be interesting to see how fast I can knock it out. using about 310 watts right now.

This is the moment of truth we're all waiting for.
My bet is 9h 15m.


I’m not sure it’ll be so simple.

When I checked earlier, it was tracking a 12.5hr completion time. But the 2080ti was tracking a 14.5hr completion time.

Either the new run of tasks are longer, or the CUDA 11.2 app is slower? We’ll have to see.
____________

Keith Myers
Message 57079 - Posted: 3 Jul 2021 | 2:11:48 UTC - in response to Message 57078.

I'm curious how you have a real estimated time remaining calculated for a brand new application.

AFAIK, you JUST got the application working, and I don't believe you have validated ten tasks yet to get an accurate APR, which is what produces accurate estimated-time-remaining numbers.

All my tasks are in EDF mode with multiple-day estimates, simply because I have returned exactly one valid task so far: a shorty Cryptic-Scout task.

Ian&Steve C.
Message 57080 - Posted: 3 Jul 2021 | 3:18:39 UTC - in response to Message 57079.

I didn’t use the time remaining estimate from BOINC. I estimated it myself based on % complete and elapsed time, assuming a linear completion rate.
____________
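For anyone who wants to reproduce that kind of estimate, a one-liner like this does the same linear extrapolation (the elapsed seconds and percentage are placeholder values, substitute your own):

# estimate total runtime from elapsed time and fraction done, assuming linear progress
awk -v elapsed=12600 -v pct=28.4 'BEGIN { t = elapsed/(pct/100); printf "estimated total: %.0f s (%.1f h)\n", t, t/3600 }'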

Retvari Zoltan
Message 57081 - Posted: 3 Jul 2021 | 8:08:18 UTC - in response to Message 57078.
Last modified: 3 Jul 2021 | 8:15:46 UTC

When I checked earlier, it was tracking a 12.5hr completion time. But the 2080ti was tracking a 14.5hr completion time.

Either the new run of tasks are longer, or the CUDA 11.2 app is slower? We’ll have to see.
If the new tasks are longer, the awarded credit should be higher. The present ADRIA_New_KIXcMyb_HIP_AdaptiveBandit workunits are "worth" 675,000 credits, while the previous ADRIA_D3RBandit_batch_nmax5000 were "worth" 523,125 credits, so the present ones are longer.
My estimation was 12h/1.3 = 9h 15m (based on my optimistic expectation of a 30% performance improvement).
Nevertheless, we can use the completion times to estimate the actual performance improvement (3080 Ti vs 2080 Ti): the 3080 Ti completed the task in 44,368 s (12h 19m 28s), while the 2080 Ti completed the task in 52,642 s (14h 37m 22s), so the 3080 Ti is "only" 18.65% faster. So the number of usable CUDA cores in the 30xx series is half of the advertised number (just as I expected): 10240/2 = 5120, and 5120/4352 = 1.1765 (so the 3080 Ti has 17.65% more usable CUDA cores than the 2080 Ti), while the CUDA cores of the 3080 Ti are about 1.4% faster than those of the 2080 Ti.
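The speed-up figure above can be checked directly from the two completion times quoted in the post:

# 2080 Ti vs 3080 Ti completion times, in seconds
awk 'BEGIN { t2080=52642; t3080=44368; printf "3080 Ti is %.2f%% faster\n", (t2080/t3080 - 1)*100 }'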

Richard Haselgrove
Message 57082 - Posted: 3 Jul 2021 | 10:09:52 UTC

The PPA-reset trick worked - I have a new task running now. Another satisfied customer.

The completion estimate at 10% was 43.5 hours, both by extrapolation and by setting <fraction_done_exact/> in app_config.xml

It's an ADRIA_New_KIXcMyb_HIP_AdaptiveBandit task. I ran a couple of these about 10 days ago, under the old app: they took about 33 hours - previous 'D3RBandit_batch*' tasks had been 28 hours on average. Cards are GTX 1660 super.

So there's a possibility that the new app is slower, at least on 'before cutting edge' cards.

Ian&Steve C.
Message 57083 - Posted: 3 Jul 2021 | 10:39:34 UTC - in response to Message 57081.

Some of the tasks were even longer. I have two more 2080 Ti reports that were 55,583 s and 56,560 s respectively, on identical GPUs running the same clocks. There seems to be some variability. If you use the slower one, it puts the gap closer to 30%. This exposes the flaw of using a single sample to form a conclusion; more data is required.

Also note that I've been experiencing performance issues with this specific card. I believe it's underperforming due to some incorrect power-limiting behavior (I've done a lot of load testing and cross-referencing of benchmark results with others online). I have a replacement on the way to test.

These ADRIA tasks have a hard-coded reward; it isn't necessarily based on run time. They increased the reward from the D3RBandit to these KIXcMyb tasks, but since they stopped distributing the CUDA 10 app, we can't know for sure whether the tasks are just longer or whether there's some inefficiency in the new 11.2 app that's slowing it down. If the tasks aren't longer, then the new app is almost 30% slower than the old CUDA app.
____________

Richard Haselgrove
Message 57084 - Posted: 3 Jul 2021 | 11:16:21 UTC

I looked back into the 'job_log_www.gpugrid.net.txt' in the BOINC data folder to get my comparison times. I haven't run many AdaptiveBandits yet, but I think the 'D3RBandit_batch*' time was a robust average over the many sub-types.
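For reference, the per-task elapsed time sits in the field after the "et" tag in that job log, so an average for one task family can be pulled out with a one-liner like this (field layout as in recent BOINC clients; adjust the name pattern to taste):

# average elapsed time of all logged tasks whose name matches a pattern
awk '/D3RBandit/ { for (i=1; i<=NF; i++) if ($i=="et") { sum += $(i+1); n++ } }
     END { if (n) printf "%d tasks, average %.0f s\n", n, sum/n }' job_log_www.gpugrid.net.txt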

Ian&Steve C.
Message 57085 - Posted: 3 Jul 2021 | 11:35:35 UTC - in response to Message 57083.
Last modified: 3 Jul 2021 | 11:36:42 UTC

After cross-referencing runtimes for various Windows hosts, I think the new app is just slower. Windows hosts haven't experienced an app change (yet) and haven't shown any sudden or recent change in run time with the KIX AdaptiveBandit jobs. This suggests that the tasks themselves haven't really changed, leaving the only other explanation for the longer run times to be a slower 11.2 app.

I also noticed that the package distribution is different between the CUDA 10 and 11.2 apps. 10 included some library files that are not included with 11.2 (like the cudart and cufft libraries), so the app may have been compiled in a different way.

I hope Toni can bring the app back to par. It really shouldn’t be that much slower.
____________

Greger
Message 57086 - Posted: 3 Jul 2021 | 12:09:49 UTC
Last modified: 3 Jul 2021 | 12:14:32 UTC

GTX 1080
# Speed: average 75.81 ns/day, current 75.71 ns/day

RTX 2070S
# Speed: average 134.99 ns/day, current 132.17 ns/day

RTX 3070
# Speed: average 159.15 ns/day, current 155.75 ns/day
https://www.gpugrid.net/result.php?resultid=32632515
https://www.gpugrid.net/result.php?resultid=32632513

The only completed task yet is with the 3070, and it ended after 18-19 hours.
The 3000 series looks slow with 11.2, but it works. The progress bar and estimate look close to the expected time, and the 2070 could probably finish after around 21 hours.

It would be great if Toni could make the application print the info from progress.log.

I had to add the PPA for the needed libboost, and I tried updating one host to 21.04 to get the latest Boost, but that did not work.

Ian&Steve C.
Message 57087 - Posted: 3 Jul 2021 | 12:24:39 UTC - in response to Message 57086.

Where did you get the ns/day numbers from?

But it’s not just 3000-series being slow. All cards seem to be proportionally slower with 11.2 vs 10.0, by about 30%
____________

Greger
Message 57088 - Posted: 3 Jul 2021 | 12:50:10 UTC - in response to Message 57087.
Last modified: 3 Jul 2021 | 12:51:12 UTC

Go to the slot folder and cat progress.log.

Yes, it looks like all cards are affected on the new application. I compared with the 1000 series as well, but I do not have ns/day numbers for them.

Where did you get the 469 driver? I can't see it on the Nvidia site or in the PPA.
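In other words, while a task is running you can peek at the reported simulation speed straight from the BOINC slot directories (default Debian/Ubuntu path shown; adjust to your install):

# show the most recent speed lines from any running GPUGRID task
grep -H "Speed:" /var/lib/boinc-client/slots/*/progress.log | tail -n 3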

Ian&Steve C.
Message 57089 - Posted: 3 Jul 2021 | 14:43:35 UTC - in response to Message 57088.

It’s not real. I’ve manipulated the coproc_info file to report what I want.

Actual driver in use is 460.84
____________

Aurum
Message 57090 - Posted: 3 Jul 2021 | 15:48:58 UTC - in response to Message 57086.

RTX 3070
# Speed: average 159.15 ns/day, current 155.75 ns/day
https://www.gpugrid.net/result.php?resultid=32632515
https://www.gpugrid.net/result.php?resultid=32632513

only task yet is with 3070 and ended after 18-19 hours
3000-series looks slow with 11.2 but they works. Progressbar and estimate looks to be close expected time and 2070 could probably end after around 21 hours.
The WU you linked had one wingman run it as cuda 10.1 and the other as 11.21 with 155,037 seconds versus 68,000. Isn't that faster?
https://www.gpugrid.net/workunit.php?wuid=27075862
What does ns mean?

ServicEnginIC
Message 57091 - Posted: 3 Jul 2021 | 16:16:33 UTC - in response to Message 57080.

I didn’t use the time remaining estimate from BOINC. I estimated it myself based on % complete and elapsed time, assuming a linear completion rate.

I usually employ the same method, since Progress % shown by BOINC Manager is quite linear.
On my low-end GPUs, I'm still waiting for the first task to complete :-)
Evaluating the small sample of tasks that I've received, tasks for this new version are taking longer to complete than previous ones (let's say "for the moment").
Estimated completion times for the 5 GPUs that I'm monitoring are as follows:



The last three GPUs are Turing GTX 1650 ones, but different graphics card models and clock frequencies.
An editable version of the spreadsheet used can be downloaded from this link

Greger
Message 57092 - Posted: 3 Jul 2021 | 16:21:35 UTC - in response to Message 57090.

RTX 3070
# Speed: average 159.15 ns/day, current 155.75 ns/day
https://www.gpugrid.net/result.php?resultid=32632515
https://www.gpugrid.net/result.php?resultid=32632513

only task yet is with 3070 and ended after 18-19 hours
3000-series looks slow with 11.2 but they works. Progressbar and estimate looks to be close expected time and 2070 could probably end after around 21 hours.
The WU you linked had one wingman run it as cuda 10.1 and the other as 11.21 with 155,037 seconds versus 68,000. Isn't that faster?
https://www.gpugrid.net/workunit.php?wuid=27075862
What does ns mean?


nanosecond
https://en.wikipedia.org/wiki/Nanosecond#:~:text=A%20nanosecond%20(ns)%20is%20an,or%201%E2%81%841000%20microsecond.


Yes, there is a big gap in runtime compared to the other host, but that one was also using an NVIDIA GeForce GTX 1070.

Greger
Message 57093 - Posted: 3 Jul 2021 | 16:30:06 UTC - in response to Message 57089.

It’s not real. I’ve manipulated the coproc_info file to report what I want.

Actual driver in use is 460.84


OK, the reason I ask is that the device name shows as unknown for my 3080 Ti, and I had some hope that the driver you used would fix that.
So I could go into the coproc file and edit it instead.

Ian&Steve C.
Message 57094 - Posted: 3 Jul 2021 | 16:53:19 UTC - in response to Message 57093.
Last modified: 3 Jul 2021 | 16:54:38 UTC

It’s not real. I’ve manipulated the coproc_info file to report what I want.

Actual driver in use is 460.84


OK, the reason I ask is that the device name shows as unknown for my 3080 Ti, and I had some hope that the driver you used would fix that.
So I could go into the coproc file and edit it instead.


What driver are you using? The 3080ti won’t be detected until driver 460.84. Anything older will not know what GPU that is.
____________

Aurum
Message 57095 - Posted: 3 Jul 2021 | 17:05:39 UTC

Greger, I just can't get my head around what it means. So out of the 8.64E13 ns in a day you only calculate for 159 ns??? I'm not familiar with that figure of merit.

BTW, my 3080 is running 465.31. Still waiting to catch a WU after the PPA, reboot & reset.

Richard Haselgrove
Message 57096 - Posted: 3 Jul 2021 | 17:51:03 UTC - in response to Message 57095.

The nanoseconds will be the biochemical reaction time that we're modelling - very, very, slowly - in a digital simulation.

Ian&Steve C.
Message 57097 - Posted: 3 Jul 2021 | 18:00:23 UTC - in response to Message 57095.

Greger, I just can't get my head around what it means. So out of the 8.64E13 ns in a day you only calculate for 159 ns??? I'm not familiar with that figure of merit.

BTW, my 3080 is running 465.31. Still waiting to catch a WU after the PPA, reboot & reset.


Aren’t you big into folding? ns/day is a very common metric for measuring computation speed in molecular modeling.

____________

Greger
Message 57098 - Posted: 3 Jul 2021 | 18:47:55 UTC - in response to Message 57094.
Last modified: 3 Jul 2021 | 18:51:42 UTC

It’s not real. I’ve manipulated the coproc_info file to report what I want.

Actual driver in use is 460.84


OK, the reason I ask is that the device name shows as unknown for my 3080 Ti, and I had some hope that the driver you used would fix that.
So I could go into the coproc file and edit it instead.


What driver are you using? The 3080ti won’t be detected until driver 460.84. Anything older will not know what GPU that is.


NVIDIA-SMI 465.27 Driver Version: 465.27 CUDA Version: 11.3

Could not use 460 for the 3080 Ti, so I had to move to the latest Ubuntu-provided driver, and it was this version.
boinc-client detects the name as
Coprocessors NVIDIA NVIDIA Graphics Device (4095MB) driver: 465.27

I edited coproc_info.xml, but it does not change when I update the project, and if I restart boinc-client it will wipe the file even if I change the driver version inside it.

Maybe I could lock the file to root only to take away BOINC's write permission, but I'd better not.

Ian&Steve C.
Message 57099 - Posted: 3 Jul 2021 | 18:53:41 UTC - in response to Message 57098.

You need driver 460.84 for 3080ti. You can use that one.

You can also use 465.31, but that driver is about a month older, 460.84 will be better unless you absolutely need some feature from the 465 branch.
____________

Greger
Message 57100 - Posted: 3 Jul 2021 | 18:56:34 UTC - in response to Message 57095.
Last modified: 3 Jul 2021 | 18:59:50 UTC

Greger, I just can't get my head around what it means. So out of the 8.64E13 ns in a day you only calculate for 159 ns??? I'm not familiar with that figure of merit.

BTW, my 3080 is running 465.31. Still waiting to catch a WU after the PPA, reboot & reset.


As mentioned before, it is the amount of simulated time the device can generate per day for that model, but you need to take into account the complexity of the simulation: the number of atoms has a big effect on it, as do other parameters of the modelled event.

Think of it as a box with x, y, z dimensions; it builds up the protein from atoms and then folds it. The total result is a very, very short event in simulated time.

There was a free tool before, possibly still available today, that you could use to open the resulting data directly after a run was done. Users have done this at Folding@home and posted results in forums.

Not sure if that is freely available for acemd.

Greger
Message 57101 - Posted: 3 Jul 2021 | 19:22:23 UTC - in response to Message 57099.

You need driver 460.84 for 3080ti. You can use that one.

You can also use 465.31, but that driver is about a month older, 460.84 will be better unless you absolutely need some feature from the 465 branch.


ok thanks

Aurum
Message 57102 - Posted: 3 Jul 2021 | 19:29:54 UTC

Yea, snagged a WU and it's running. My guesstimate is 19:44:13 on my 3080 dialed down to 230 Watts. Record breaking long heat wave here and summer peak Time-of-Use electric rates (8.5x higher) have started. Summer is not BOINC season in The Great Basin.

Rxn time, now that makes sense. Thx.

Linux Mint repository offers 465.31 and 460.84. Is it actually worth reverting to 460.84??? I wouldn't do it until after this WU completes anyway.

Ian&Steve C.
Message 57103 - Posted: 3 Jul 2021 | 19:56:15 UTC - in response to Message 57102.

Linux Mint repository offers 465.31 and 460.84. Is it actually worth reverting to 460.84??? I wouldn't do it until after this WU completes anyway.


Probably won't matter if the driver you have is working; I don't expect any performance difference between the two. I was just saying that I would use a more recent non-beta driver if I were updating, unless you need some feature from the 465 branch specifically.
____________

Ian&Steve C.
Message 57104 - Posted: 3 Jul 2021 | 20:00:01 UTC

second 3080ti task completed in 11hrs

http://gpugrid.net/result.php?resultid=32632580
____________

Greger
Message 57105 - Posted: 3 Jul 2021 | 20:59:16 UTC

Peak of 28.9°C here today, so I suspend during the daytime after 2 tasks are done.
I run in the evenings and nights these days if the temperature is high. Ambient temp was above 35°C inside, and the fan went up to 80% on the GPU I checked.

So I managed to get to 460.84 after a few rounds of remove and --purge nvidia*. Apparently there was a libnvidia-compute package left over holding it back.

Got the name correct, but it detects the VRAM wrong (4095MB). Let's see if it will work.
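For anyone else stuck on an older driver because of leftover packages, the sequence described above is roughly the following on Ubuntu/Mint (check what the purge is about to remove before confirming; the exact point release installed depends on your repositories):

# remove all existing NVIDIA packages, then install the 460-series driver
sudo apt purge 'nvidia-*' 'libnvidia-*'
sudo apt autoremove
sudo apt install nvidia-driver-460
sudo reboot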

ServicEnginIC
Message 57106 - Posted: 3 Jul 2021 | 21:32:29 UTC

Just keep in mind that any change of Nvidia driver version while a GPUGrid task is in progress will cause it to fail when computing is restarted.
Commented on in message #56909.

Ian&Steve C.
Message 57107 - Posted: 3 Jul 2021 | 21:58:34 UTC - in response to Message 57105.

Peak of 28.9°C here today, so I suspend during the daytime after 2 tasks are done.
I run in the evenings and nights these days if the temperature is high. Ambient temp was above 35°C inside, and the fan went up to 80% on the GPU I checked.

So I managed to get to 460.84 after a few rounds of remove and --purge nvidia*. Apparently there was a libnvidia-compute package left over holding it back.

Got the name correct, but it detects the VRAM wrong (4095MB). Let's see if it will work.


The VRAM being reported wrong is not because of the driver; it's a problem with BOINC. BOINC uses a detection technique that is only 32-bit, so it caps out at 4 GB. This can only be fixed by fixing the code in BOINC.
____________

Greger
Message 57108 - Posted: 3 Jul 2021 | 23:18:13 UTC - in response to Message 57107.
Last modified: 3 Jul 2021 | 23:50:23 UTC

I went back to my host and the driver had crashed: nvidia-smi was unable to open, and a task failed on another project. I restarted it and it's back on track. A few minutes later it fetched a new task from GPUGrid. Let's hope it does not crash again.

https://www.gpugrid.net/result.php?resultid=32634065

# Speed: average 225.91 ns/day, current 226.09 ns/day
That is more like it. This is much better than my 3070 and 3060 Ti got.

Ian&Steve C.
Message 57109 - Posted: 3 Jul 2021 | 23:32:42 UTC - in response to Message 57108.

GPU detection is handled by BOINC, not any individual projects.

Driver updates always require a reboot to take effect.
____________

ServicEnginIC
Message 57110 - Posted: 4 Jul 2021 | 9:09:53 UTC

Finally, my first result for a new version 2.12 task came in on my fastest card:
e4s126_e3s248p0f238-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-0-2-RND6347_7
It took 141948 seconds of total processing time. That is: 1 day 15 hours 25 minutes and 48 seconds.
The predicted time in the table shown in message #57091 was 142074 seconds, estimated when the task was 61.439% done.
There is a slight difference of 126 seconds between estimated and true execution time: a 0.09% deviation.
For me, that is accurate enough, and it validates Ian&Steve C.'s observation that progress for these tasks is quite linear along their execution.

Greger
Message 57111 - Posted: 4 Jul 2021 | 10:08:18 UTC

Comparing the old and new app on a 2070S:

old
52,930.87 New version of ACEMD v2.11 (cuda100)
WU 27069210 e130s1888_e70s25p0f44-ADRIA_D3RBandit_batch_nmax5000-0-1-RND2852_1

new
80,484.11 New version of ACEMD v2.12 (cuda1121)
WU: 27077230 e5s177_e4s56p0f117-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND4081_4

Not sure if the size of the units has grown so much that they can't really be compared.

Aurum
Message 57112 - Posted: 4 Jul 2021 | 13:40:41 UTC - in response to Message 57102.
Last modified: 4 Jul 2021 | 13:46:54 UTC

My guesstimate is 19:44:13 on my 3080 dialed down to 230 Watts.

16:06:54
https://www.gpugrid.net/workunit.php?wuid=27077289

ServicEnginIC
Message 57113 - Posted: 4 Jul 2021 | 14:23:15 UTC

At this moment, every one of my 7 currently working GPUs has a new version 2.12 task in process.
Two tasks received today completed the quota.
Task e4s120_e3s763p0f798-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND9850_3, from WU #27076712
Task e5s90_e4s138p0f962-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND6130_4, from WU #27077322
Something to note: these two tasks are repeated resends of previously failed tasks showing the following known problem:

acemd3: error while loading shared libraries: libboost_filesystem.so.1.74.0: cannot open shared object file: No such file or directory

A good opportunity to remember that there is a remedy for this problem, described in message #57064 in this same thread.

One last update for estimated times to completion on my GPUs:



An editable version of the spreadsheet used can be downloaded from this link
Changes since previous version:
- Lines for two more GPUs are added.
- A new cell is added for seconds to D:H:M:S conversion

Aurum
Message 57114 - Posted: 4 Jul 2021 | 14:50:02 UTC - in response to Message 57081.
Last modified: 4 Jul 2021 | 14:52:34 UTC

So the number of usable CUDA cores in the 30xx series is half of the advertised number (just as I expected): 10240/2 = 5120, and 5120/4352 = 1.1765 (so the 3080 Ti has 17.65% more usable CUDA cores than the 2080 Ti), while the CUDA cores of the 3080 Ti are about 1.4% faster than those of the 2080 Ti.


Does using half of CUDA cores have implications for BOINCing?
GG+OPNG at <cpu_usage>1.0</cpu_usage> & <gpu_usage>0.5</gpu_usage> works fine.
GG+DaggerHashimoto crashes GG instantly.
I hope to try 2xGG today.

Retvari Zoltan
Message 57115 - Posted: 4 Jul 2021 | 15:06:42 UTC - in response to Message 57114.

Does using half of CUDA cores have implications for BOINCing?
GG+OPNG at <cpu_usage>1.0</cpu_usage> & <gpu_usage>0.5</gpu_usage> works fine.
GG+DaggerHashimoto crashes GG instantly.
I hope to try 2xGG today.
You can't utilize the "extra" CUDA cores by running a second task (regardless of the project).
The 30xx series improved the gaming experience much more than the crunching performance.

Ian&Steve C.
Message 57116 - Posted: 4 Jul 2021 | 15:10:22 UTC - in response to Message 57114.

So the number of usable CUDA cores in the 30xx series is half of the advertised number (just as I expected): 10240/2 = 5120, and 5120/4352 = 1.1765 (so the 3080 Ti has 17.65% more usable CUDA cores than the 2080 Ti), while the CUDA cores of the 3080 Ti are about 1.4% faster than those of the 2080 Ti.


Does using half of CUDA cores have implications for BOINCing?
GG+OPNG at <cpu_usage>1.0</cpu_usage> & <gpu_usage>0.5</gpu_usage> works fine.
GG+DaggerHashimoto crashes GG instantly.
I hope to try 2xGG today.


I think you misunderstand what's happening.

Running 2x GPUGRID tasks concurrently won't make it "use more"; it'll just slow both down, probably to slower than half speed due to the constant resource fighting.

If GPUGRID isn't seeing the effective 2x benefit of Ampere vs Turing, that tells me one of two things (or maybe some combination of both):
1. the app isn't as FP32-heavy as some have implied, and maybe has a decent amount of INT32 instructions; the INT32 setup of Ampere is the same as Turing's
2. there is some additional optimization that needs to be applied to the ACEMD3 app to better take advantage of the extra FP32 cores on Ampere.
____________

WMD
Message 57118 - Posted: 4 Jul 2021 | 15:46:13 UTC - in response to Message 57116.

If GPUGRID isn't seeing the effective 2x benefit of Ampere vs Turing, that tells me one of two things (or maybe some combination of both):
1. the app isn't as FP32-heavy as some have implied, and maybe has a decent amount of INT32 instructions; the INT32 setup of Ampere is the same as Turing's
2. there is some additional optimization that needs to be applied to the ACEMD3 app to better take advantage of the extra FP32 cores on Ampere.

The way Ampere works is that half the cores are FP32, and the other half are either FP32 or INT32 depending on need. On Turing (and older), the INT32 half was always INT32. So you're probably right - either GPUGRID has some INT32 load that is using the cores instead, or some kind of application change is required to get it to use the other half.

Ian&Steve C.
Message 57119 - Posted: 4 Jul 2021 | 16:30:28 UTC - in response to Message 57118.
Last modified: 4 Jul 2021 | 16:32:20 UTC

I'm not convinced that the extra cores "aren't being used" at all, i.e. that the cores are sitting idle 100% of the time as a direct result of the architecture or something like that. I think both the application and the hardware are fully aware of the available cores/SMs; it's just that the application is coded in such a way that it can't take advantage of the extra resources, either because of optimization or because of the number of INT instructions required.

Nvidia's press notes do seem to show a 1.5x improvement in a molecular modeling load for A100 vs V100, so maybe the number of INT calls is inherent to this kind of load anyway (granted, the A100 is based on the GA100 die, which is a different architecture without the shared FP/INT cores that double the FP cores on GA102).

But in the case of GPUGRID, I think it's just their application. On Folding, Ampere performs much closer to the claims, with a 3070 being only a bit slower than a 2080 Ti, which is what I would expect.
____________

Aurum
Message 57120 - Posted: 4 Jul 2021 | 16:33:22 UTC - in response to Message 57115.
Last modified: 4 Jul 2021 | 16:57:55 UTC

The 30xx series improved the gaming experience much more than the crunching performance.

I'm thoroughly unimpressed by my 3080. Its performance does not scale with its price, making it much more expensive for doing calculations. I'll probably test it for a few more days and then sell it.

I like to use some metric that's proportional to calculations and optimize calcs/watt. In the past my experience has been that reducing max power improves performance. But since Nvidia eliminated the nvidia-settings options -a [gpu:0]/GPUGraphicsClockOffset & -a [gpu:0]/GPUMemoryTransferRateOffset that I used, I haven't found a good way to do it using Linux. nvidia-settings -q all

It seems Nvidia chooses a performance level but I can't see how to force it to a desired level:
sudo DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -q '[gpu:0]/GPUPerfModes'
3080: 0, 1, 2, 3 & 4
Attribute 'GPUPerfModes' (Rig-05:0[gpu:0]):
perf=0, nvclock=210, nvclockmin=210, nvclockmax=420, nvclockeditable=1, memclock=405, memclockmin=405, memclockmax=405, memclockeditable=1, memTransferRate=810, memTransferRatemin=810, memTransferRatemax=810, memTransferRateeditable=1 ;
perf=1, nvclock=210, nvclockmin=210, nvclockmax=2100, nvclockeditable=1, memclock=810, memclockmin=810, memclockmax=810, memclockeditable=1, memTransferRate=1620, memTransferRatemin=1620, memTransferRatemax=1620, memTransferRateeditable=1 ;
perf=2, nvclock=240, nvclockmin=240, nvclockmax=2130, nvclockeditable=1, memclock=5001, memclockmin=5001, memclockmax=5001, memclockeditable=1, memTransferRate=10002, memTransferRatemin=10002, memTransferRatemax=10002, memTransferRateeditable=1 ;
perf=3, nvclock=240, nvclockmin=240, nvclockmax=2130, nvclockeditable=1, memclock=9251, memclockmin=9251, memclockmax=9251, memclockeditable=1, memTransferRate=18502, memTransferRatemin=18502, memTransferRatemax=18502, memTransferRateeditable=1 ;
perf=4, nvclock=240, nvclockmin=240, nvclockmax=2130, nvclockeditable=1, memclock=9501, memclockmin=9501, memclockmax=9501, memclockeditable=1, memTransferRate=19002, memTransferRatemin=19002, memTransferRatemax=19002, memTransferRateeditable=1

Nvidia has said, "The -a and -g arguments are now deprecated in favor of -q and -i, respectively. However, the old arguments still work for this release." Sounds like they're planning to reduce or eliminate customers' ability to control the products they buy.

Nvidia also eliminated GPULogoBrightness so the baby-blinkie lights never turn off.

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,037,436,882
RAC: 1,314,112
Level
Trp
Scientific publications
watwatwat
Message 57121 - Posted: 4 Jul 2021 | 16:49:39 UTC - in response to Message 57116.

running 2x GPUGRID tasks concurrently won't make it "use more". it'll just slow both down, probably slower than half speed due to the constant resource fighting.

if GPUGRID isn't seeing the effective 2x benefit of Ampere vs Turing, that tells me one of two things (or maybe some combination of both):
1. that app isn't as FP32 heavy as some have implied, and maybe has a decent amount of INT32 instructions. the INT32 setup of Ampere is the same as Turing
2. there is some additional optimization that needs to be applied to the ACEMD3 app to better take advantage of the extra FP32 cores on Ampere.

At less than 5% complete with two WUs running simultaneously and having started within minutes of each other:
WU1: 4840 sec at 4.7% implies 102978 sec total
WU2: 5409 sec at 4.6% implies 117587 sec total
From yesterday's singleton: 2 x 58014 sec = 116028 sec total if independent.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1035
Credit: 37,003,757,483
RAC: 40,270,675
Level
Trp
Scientific publications
wat
Message 57122 - Posted: 4 Jul 2021 | 17:54:31 UTC - in response to Message 57121.

running 2x GPUGRID tasks concurrently won't make it "use more". it'll just slow both down, probably slower than half speed due to the constant resource fighting.

if GPUGRID isn't seeing the effective 2x benefit of Ampere vs Turing, that tells me one of two things (or maybe some combination of both):
1. that app isn't as FP32 heavy as some have implied, and maybe has a decent amount of INT32 instructions. the INT32 setup of Ampere is the same as Turing
2. there is some additional optimization that needs to be applied to the ACEMD3 app to better take advantage of the extra FP32 cores on Ampere.

At less than 5% complete with two WUs running simultaneously and having started within minutes of each other:
WU1: 4840 sec at 4.7% implies 102978 sec total
WU2: 5409 sec at 4.6% implies 117587 sec total
From yesterday's singleton: 2 x 58014 sec = 116028 sec total if independent.


my point exactly. showing roughly half speed, with no real benefit to running multiples. pushing your completion time to 32 hours will only reduce your credit reward, since you'll be bumped out of the +50% bonus for returning within 24 hrs.
____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1035
Credit: 37,003,757,483
RAC: 40,270,675
Level
Trp
Scientific publications
wat
Message 57123 - Posted: 4 Jul 2021 | 18:16:56 UTC - in response to Message 57120.

Aurum wrote:
But since Nvidia eliminated the nvidia-settings options -a [gpu:0]/GPUGraphicsClockOffset & -a [gpu:0]/GPUMemoryTransferRateOffset that I used I haven't found a good way to do it using Linux.


these options still work. I use them for my 3080Ti. not sure what you mean?

this is exactly what I use for my 3080Ti (same on my Turing hosts)


/usr/bin/nvidia-smi -pm 1                    # enable persistence mode
/usr/bin/nvidia-smi -acp UNRESTRICTED        # allow non-root processes to change application clocks

/usr/bin/nvidia-smi -i 0 -pl 320             # cap GPU 0 at a 320 W power limit

/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"    # PowerMizer "prefer maximum performance"

/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=500" -a "[gpu:0]/GPUGraphicsClockOffset[4]=100"    # +500 memory transfer rate and +100 core clock offsets at performance level 4


it works as desired.

Aurum wrote:
It seems Nvidia chooses a performance level but I can't see how to force it to a desired level:


what do you mean by "performance level"? if you mean forcing a certain P-state, no you can't do that. and these cards will not allow getting into P0 state unless you're running a 3D application. any compute application will get at best a P2 state. this has been the case ever since Maxwell. workarounds to force the P0 state stopped working with Pascal, so this isn't new.

if you mean the PowerMizer preferred mode (which is analogous to the power settings in Windows), you can select that easily in Linux too. I always run mine at "prefer max performance". do this with the following command:

/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"


I'm unsure if this really makes much difference though except increasing idle power consumption (forcing higher clocks). the GPU seems to detect loads properly and clock up even when left on the default "Auto" selection.

Aurum wrote:
Nvidia also eliminated GPULogoBrightness so the baby-blinkie lights never turn off.

I'm not sure this was intentional, probably something that fell through the cracks that not enough people have complained about for them to dedicate resources to fixing. there's no gain for nvidia in disabling this function. but again, this stopped working with Turing, so it's been this way for like 3 years, not something new. I have mostly EVGA cards, so when I want to mess with the lighting, I just throw the card on my test bench, boot into Windows, change the LED settings there, and then put it back in the crunching rig. the settings are preserved internally to the card (for my cards), so it stays at whatever I left it as. you can probably do the same.

____________

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,037,436,882
RAC: 1,314,112
Level
Trp
Scientific publications
watwatwat
Message 57124 - Posted: 4 Jul 2021 | 18:26:11 UTC

It sure does not look like running multiple GG WUs on the same GPU has any benefit.
My 3080 is stuck in P2. I'd like to try it in P3 and P4 but I can't make it change. I tried:
nvidia-smi -lmc 9251
Memory clocks set to "(memClkMin 9501, memClkMax 9501)" for GPU 00000000:65:00.0
All done.
nvidia-smi -lgc 240,2130
GPU clocks set to "(gpuClkMin 240, gpuClkMax 2130)" for GPU 00000000:65:00.0
All done.

But it's still in P2.
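
For what it's worth, here is a quick way to see what the card is actually doing, and to undo the locked clocks set above (a sketch; the reset flags need a reasonably recent driver and root):

nvidia-smi --query-gpu=pstate,clocks.sm,clocks.mem,power.draw --format=csv    # current P-state, clocks and power draw
sudo nvidia-smi -rgc    # release the clock range locked with -lgc
sudo nvidia-smi -rmc    # release the memory clock locked with -lmc

If compute loads really do top out at P2 on these cards, as said above, the locked ranges won't move the P-state, but the query at least shows the clocks you are actually getting.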

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,037,436,882
RAC: 1,314,112
Level
Trp
Scientific publications
watwatwat
Message 57125 - Posted: 4 Jul 2021 | 18:34:34 UTC - in response to Message 57123.

Aurum wrote:
But since Nvidia eliminated the nvidia-settings options -a [gpu:0]/GPUGraphicsClockOffset & -a [gpu:0]/GPUMemoryTransferRateOffset that I used I haven't found a good way to do it using Linux.
these options still work. I use them for my 3080Ti. not sure what you mean?

this is exactly what I use for my 3080Ti (same on my Turing hosts)
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=500" -a "[gpu:0]/GPUGraphicsClockOffset[4]=100"
it works as desired.

How do you prove to yourself they work? They don't even exist any more. Run
nvidia-settings -q all | grep -C 10 -i GPUMemoryTransferRateOffset
and you will not find either of them.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1035
Credit: 37,003,757,483
RAC: 40,270,675
Level
Trp
Scientific publications
wat
Message 57126 - Posted: 4 Jul 2021 | 18:44:18 UTC

but all the slightly off-topic stuff aside.

It was a great first step getting the app working for Ampere. it's been long awaited, the new app is much appreciated, and now many more cards can help contribute to the project, especially with these newer long-running tasks lately. we need powerful cards to handle these tasks.

I think the two priorities now should be:

1. remedy the dependency on boost. either include the necessary library in the package distribution to clients, or recompile the app with boost statically linked. otherwise only those users who recognize the problem and know how to manually install the proper boost package will be able to contribute. (a quick way to check what the app actually links against is sketched below, after this list.)

2. investigate the cause and provide a remedy for the ~30% slowdown in application performance from the older cuda100 app. this isn't just affecting Ampere, but affecting all GPUs equally it seems. maybe some optimization flag was omitted or some change to the code was made that was undesirable or unintended. just changing from cuda100 to cuda1121 should not in itself have caused this if there were no other code changes. sometimes you can see slight performance changes like 1-2%, but a 30% reduction is a sign that something is clearly wrong.
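
As an aside, here is a minimal sketch of how anyone can check the boost dependency on their own host (assuming a stock Linux package install; the BOINC data directory and slot layout differ per distro, so adjust the path):

BIN=$(find /var/lib/boinc-client -name acemd3 -type f 2>/dev/null | head -n1)    # locate the unpacked acemd3 binary
ldd "$BIN" | grep -i boost    # list the boost libraries it links against

Any entry marked "not found" is the missing library; in the cases reported in this thread it is libboost_filesystem 1.74.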
____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1035
Credit: 37,003,757,483
RAC: 40,270,675
Level
Trp
Scientific publications
wat
Message 57127 - Posted: 4 Jul 2021 | 18:54:32 UTC - in response to Message 57125.
Last modified: 4 Jul 2021 | 18:56:04 UTC

Aurum wrote:
But since Nvidia eliminated the nvidia-settings options -a [gpu:0]/GPUGraphicsClockOffset & -a [gpu:0]/GPUMemoryTransferRateOffset that I used I haven't found a good way to do it using Linux.
these options still work. I use them for my 3080Ti. not sure what you mean?

this is exactly what I use for my 3080Ti (same on my Turing hosts)
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=500" -a "[gpu:0]/GPUGraphicsClockOffset[4]=100"
it works as desired.

How do you prove to yourself they work? They don't even exist any more. Run
nvidia-settings -q all | grep -C 10 -i GPUMemoryTransferRateOffset
and you will not find either of them.


I prove they work by opening Nvidia X Server Settings and observing that the clock speed offsets have been changed in accordance with the commands, and that the commands don't give any error when run. and they have. the commands work 100%. I see you're referencing some other command. I have no idea what the command you're trying to use does. but my commands work.

see for yourself:
https://i.imgur.com/UFHbhNt.png
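
You can also query the very same attributes from the command line, using the same gpu index and performance level as the assign commands above (adjust both for your own card):

nvidia-settings -q "[gpu:0]/GPUGraphicsClockOffset[4]" -q "[gpu:0]/GPUMemoryTransferRateOffset[4]"

If the assignments took effect, these report the +100 and +500 offsets instead of 0.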
____________

888
Send message
Joined: 28 Jan 21
Posts: 6
Credit: 106,022,917
RAC: 0
Level
Cys
Scientific publications
wat
Message 57139 - Posted: 5 Jul 2021 | 12:12:35 UTC

I'm still getting the CUDA compiler permission denied error. I've added the PPA and installed libboost1.74 as above, and reset the project multiple times. But every downloaded task fails after 2 seconds.

http://www.gpugrid.net/result.php?resultid=32636087

I'm running Mint 20.1, with rtx2070 and rtx3070 cards running 465.31 drivers.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1035
Credit: 37,003,757,483
RAC: 40,270,675
Level
Trp
Scientific publications
wat
Message 57140 - Posted: 5 Jul 2021 | 12:31:15 UTC - in response to Message 57139.

I'm still getting the CUDA compiler permission denied error. I've added the PPA and installed libboost1.74 as above, and reset the project multiple times. But every downloaded task fails after 2 seconds.

http://www.gpugrid.net/result.php?resultid=32636087

I'm running Mint 20.1, with rtx2070 and rtx3070 cards running 465.31 drivers.


How did you install the drivers? Have you ever installed the CUDA toolkit? This was my problem. If you have a CUDA toolkit installed, remove it. To be safe, I would also totally purge your nvidia drivers and re-install fresh.
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1091
Credit: 6,632,031,926
RAC: 4,945,760
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 57142 - Posted: 5 Jul 2021 | 13:01:27 UTC - in response to Message 57126.

Ian&Steve C wrote:

It was a great first step to getting the app working for Ampere. it's been long awaited and the new app is much appreciated and now many more cards can help contribute to the project, especially with these newer long running tasks lately. we need powerful cards to handle these tasks.

I think the two priorities now should be:

1. remedy the dependency on boost. either include the necessary library in the package distribution to clients, or recompile the app with boost statically linked. otherwise only those hosts who recognize the problem and know how to manually install the proper boost package will be able to contribute.

2. investigate the cause and provide a remedy for the ~30% slowdown in application performance from the older cuda100 app. ...

and last, but not least: an app for Windows would be nice :-)

888
Send message
Joined: 28 Jan 21
Posts: 6
Credit: 106,022,917
RAC: 0
Level
Cys
Scientific publications
wat
Message 57143 - Posted: 5 Jul 2021 | 13:31:53 UTC - in response to Message 57140.

I'm still getting the CUDA compiler permission denied error. I've added the PPA and installed libboost1.74 as above, and reset the project multiple times. But every downloaded task fails after 2 seconds.

http://www.gpugrid.net/result.php?resultid=32636087

I'm running Mint 20.1, with rtx2070 and rtx3070 cards running 465.31 drivers.


How did you install the drivers? Have you ever installed the CUDA toolkit? This was my problem. If you have a CUDA toolkit installed, remove it. To be safe, I would also totally purge your nvidia drivers and re-install fresh.



Thanks for the quick reply. I had the CUDA toolkit ver 10 installed, but after seeing your previous post about your problem, I had already removed it. I'll try purging and reinstalling my nvidia drivers, thanks.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1035
Credit: 37,003,757,483
RAC: 40,270,675
Level
Trp
Scientific publications
wat
Message 57145 - Posted: 5 Jul 2021 | 13:45:03 UTC - in response to Message 57143.

did you use the included removal script to remove the toolkit? or did you manually delete some files? definitely try the removal script if you haven't already. good luck!
____________

Profile trigggl
Send message
Joined: 6 Mar 09
Posts: 25
Credit: 102,324,681
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwat
Message 57147 - Posted: 5 Jul 2021 | 14:56:33 UTC - in response to Message 57126.

...
1. remedy the dependency on boost. either include the necessary library in the package distribution to clients, or recompile the app with boost statically linked. otherwise only those hosts who recognize the problem and know how to manually install the proper boost package will be able to contribute.
...

For those of us who are using the python app, the correct version is installed in the miniconda folder.
locate libboost_filesystem
/usr/lib64/libboost_filesystem-mt.so
/usr/lib64/libboost_filesystem.so
/usr/lib64/libboost_filesystem.so.1.76.0
/usr/lib64/cmake/boost_filesystem-1.76.0/libboost_filesystem-variant-shared.cmake
/var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/libboost_filesystem.so
/var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/libboost_filesystem.so.1.74.0
/var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/cmake/boost_filesystem-1.74.0/libboost_filesystem-variant-shared.cmake
/var/lib/boinc/projects/www.gpugrid.net/miniconda/pkgs/boost-cpp-1.74.0-h312852a_4/lib/libboost_filesystem.so
/var/lib/boinc/projects/www.gpugrid.net/miniconda/pkgs/boost-cpp-1.74.0-h312852a_4/lib/libboost_filesystem.so.1.74.0
/var/lib/boinc/projects/www.gpugrid.net/miniconda/pkgs/boost-cpp-1.74.0-h312852a_4/lib/cmake/boost_filesystem-1.74.0/libboost_filesystem-variant-shared.cmake

I definitely don't want to downgrade my system version to run a project. Perhaps gpugrid could include the libboost that they already supply for a different app.

Could the miniconda folder be somehow included in the app?
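
An untested idea in the meantime, sketched only: point the loader at the copy the project already ships instead of downgrading the system boost. The paths are taken from the locate output above; your BOINC data directory may be /var/lib/boinc-client instead, and the link target must be a directory your ldconfig actually searches:

sudo ln -s /var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/libboost_filesystem.so.1.74.0 /usr/local/lib/
sudo ldconfig

There is no guarantee the conda-built library is compatible with what acemd3 expects, so treat this as a sketch rather than a fix.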

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 6,132,477,024
RAC: 9,758,775
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57192 - Posted: 10 Jul 2021 | 8:02:28 UTC

Richard Haselgrove said in Message #57177:

Look at that timeout: host 528201. Oh, Mr. Kevvy, where art thou? 156 libboost errors? You can fix that...

Finally, Mr. Kevvy's host #537616 successfully processed these two tasks today:
e4s113_e1s796p0f577-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND7908_0
e5s9_e3s99p0f334-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-0-2-RND8007_4
If it was due to your fix, congratulations Mr. Kevvy, you've found the right way.

Or perhaps it was some fix to the tasks on the server side?
Hard to know till there are plenty of new tasks ready to send.
Currently, 7:51:20 UTC, there are 0 tasks left ready to send, 28 tasks left in progress, as Server status page shows.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,794,611,851
RAC: 9,297,069
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57193 - Posted: 10 Jul 2021 | 8:11:36 UTC - in response to Message 57192.

I got a note back from Mr. K - he saw the errors, and was going to check his machines. I imagine he's applied Ian's workaround.

Curing the world's diseases, one computer at a time. It would be better if that bug could be fixed at source, for a universal cure.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 6,132,477,024
RAC: 9,758,775
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57222 - Posted: 22 Jul 2021 | 21:39:41 UTC

On July 3rd 2021, Ian&Steve C. wrote in Message #57087:

But it’s not just 3000-series being slow. All cards seem to be proportionally slower with 11.2 vs 10.0, by about 30%

While organizing screenshots on one of my hosts, I happened to find comparative images for tasks of old Linux APP V2.11 (CUDA 10.0) and new APP V2.12 (CUDA 11.2)

* ACEMD V2.11 tasks on 14/06/2021:


* ACEMD V2.12 task on 20/07/2021:


Pay attention to device 0, the only comparable one.
- ACEMD V2.11 task: 08:10:18 = 29418 seconds elapsed to process 15.04%. Extrapolating, this leads to 195598 seconds of total processing time (2d 06:19:58)
- ACEMD V2.12 task: 3d 02:51:01 = 269461 seconds elapsed to process 96.48%. Extrapolating, this leads to 279292 seconds of total processing time (3d 05:34:52)
That is, about 42.8% excess processing time for this particular host and device 0 (GTX 1650 GPU)

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,794,611,851
RAC: 9,297,069
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57223 - Posted: 23 Jul 2021 | 10:04:15 UTC - in response to Message 57222.

Also bear in mind that your first screenshot shows a D3RBandit task, and your second shows an AdaptiveBandit task.

They are different, and not directly comparable. How much of the observed slowdown is down to the data/algorithm, and how much is down to the new application, will need further examples to unravel.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 6,132,477,024
RAC: 9,758,775
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57225 - Posted: 23 Jul 2021 | 13:12:06 UTC - in response to Message 57223.
Last modified: 23 Jul 2021 | 13:13:05 UTC

Also bear in mind that your first screenshot shows a D3RBandit task, and your second shows an AdaptiveBandit task.

Sharp eye, and a fair point, as always.
I agree the tasks probably aren't fully comparable, but they are the most comparable ones I found: same host, same device, same ADRIA WU family, same base credit amount granted: 450000...
Now I'm waiting for the next move, and wondering what it will consist of: an amended V2.12 app? a new V2.13 app? a "superstition-proof" new V2.14 app? ... ;-)

RJ The Bike Guy
Send message
Joined: 2 Apr 20
Posts: 20
Credit: 35,363,533
RAC: 0
Level
Val
Scientific publications
wat
Message 57230 - Posted: 4 Aug 2021 | 2:35:51 UTC

Is GPUGRID still doing anything? I haven't gotten any work in like a month or more. And before that it was just sporadic. I used to always have work units. Now, nothing.

Profile Bill F
Avatar
Send message
Joined: 21 Nov 16
Posts: 32
Credit: 86,638,150
RAC: 17,737
Level
Thr
Scientific publications
wat
Message 57231 - Posted: 4 Aug 2021 | 7:53:39 UTC

I am not receiving Windows tasks anymore. My configuration is
Boinc 7.16.11 GenuineIntel Intel(R) Xeon(R) CPU E5620 @ 2.40GHz [Family 6 Model 44 Stepping 2](4 processors)

NVIDIA GeForce GTX 1060 6GB (4095MB) driver: 461.40

Microsoft Windows 10 Professional x64 Edition, (10.00.19043.00)

Am I still within spec to get Windows acemd3 work?

Thanks
Bill F
____________
In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.


Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1035
Credit: 37,003,757,483
RAC: 40,270,675
Level
Trp
Scientific publications
wat
Message 57232 - Posted: 4 Aug 2021 | 13:43:59 UTC - in response to Message 57231.

there hasn't been an appreciable amount of work available for over a month.
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1091
Credit: 6,632,031,926
RAC: 4,945,760
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 57233 - Posted: 5 Aug 2021 | 12:32:19 UTC - in response to Message 57232.

there hasn't been an appreciable amount of work available for over a month.

:-( :-( :-(

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 6,132,477,024
RAC: 9,758,775
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57234 - Posted: 5 Aug 2021 | 15:08:38 UTC - in response to Message 57233.

there hasn't been an appreciable amount of work available for over a month.

:-( :-( :-(

Currently it's as if the GPUGRID project were hibernating.
From time to time, when the tasks in progress reach zero, some automated process (?) launches 20 more CRYPTICSCOUT_pocket_discovery WUs. But lately only for Linux systems, and with these known problems unsolved.

Waiting for everything to wake up again soon...

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1288
Credit: 5,125,531,959
RAC: 9,537,868
Level
Tyr
Scientific publications
watwatwatwatwat
Message 57235 - Posted: 5 Aug 2021 | 20:22:36 UTC

Yes, that is all I've been getting lately. I had 4 CRYPTICSCOUT_pocket_discovery tasks 4 days ago and I got 2 more today.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1288
Credit: 5,125,531,959
RAC: 9,537,868
Level
Tyr
Scientific publications
watwatwatwatwat
Message 57236 - Posted: 6 Aug 2021 | 15:48:40 UTC

Another two more today.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1091
Credit: 6,632,031,926
RAC: 4,945,760
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 57237 - Posted: 6 Aug 2021 | 16:24:25 UTC

what I don't understand is that there is no word whatsoever from the project team about even a tentative schedule :-(
Will there be new tasks available in say 1 week, 1 month, 3 months, ... ?
Will there be a new app which covers Ampere cards also (for both Linux and Windows) ... ?

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1035
Credit: 37,003,757,483
RAC: 40,270,675
Level
Trp
Scientific publications
wat
Message 57238 - Posted: 6 Aug 2021 | 16:33:58 UTC

still no tasks on the two hosts that have been having some issue getting work since the new app was released. I've set NNT on all other hosts in order to try to funnel any available work to these two hosts.

it's like they've been shadow-banned or something. even when work is available, they get the message that no work is available. after an entire month, these hosts should have picked up at least one task.
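
For anyone debugging the same thing, one place to look is the last scheduler reply cached on disk, which sometimes carries a more specific reason than the event log shows (path assumes a stock Linux package install; adjust for your BOINC data directory):

grep -i -A2 "<message" /var/lib/boinc-client/sched_reply_www.gpugrid.net.xml

If the reply contains something more specific than "no tasks available", that would be a useful clue for the admins.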
____________

Profile Bill F
Avatar
Send message
Joined: 21 Nov 16
Posts: 32
Credit: 86,638,150
RAC: 17,737
Level
Thr
Scientific publications
wat
Message 57239 - Posted: 7 Aug 2021 | 1:08:56 UTC

Well, I stepped out on a limb and emailed the Principal Investigator listed for the project, and the University, directly regarding the lack of any communications.

If you see a puff of smoke in the Dallas TX area and the user count goes down by one, you will know that I was hit by a lightning bolt or a small thermonuclear device.

Bill F
Dallas
____________
In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.


Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57243 - Posted: 7 Aug 2021 | 15:04:28 UTC - in response to Message 57239.

It worked! And Texas is still there, the last time I checked.
http://www.gpugrid.net/forum_thread.php?id=5246

Profile trigggl
Send message
Joined: 6 Mar 09
Posts: 25
Credit: 102,324,681
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwat
Message 57247 - Posted: 8 Aug 2021 | 19:49:07 UTC - in response to Message 57192.
Last modified: 8 Aug 2021 | 19:59:54 UTC

Moved to libboost thread.

Roland Glaubitz
Send message
Joined: 1 Feb 09
Posts: 3
Credit: 169,942,119
RAC: 0
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57427 - Posted: 3 Oct 2021 | 12:25:28 UTC

Hello;
The new acemd3 app saves no intermediate results, unlike "PrimeGrid" or "Einstein". I have an RTX 3070 and let it run at ~120 W. With boinccmd and a systemd timer, I run the GPU calculation for 15 min and then stop it for 5 min. At every restart, the WU starts at 0%. Please look at this behaviour.
Thanks.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,794,611,851
RAC: 9,297,069
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57430 - Posted: 3 Oct 2021 | 13:41:40 UTC - in response to Message 57427.

Hello;
The new acemd3 app saves no intermediate results, unlike "PrimeGrid" or "Einstein". I have an RTX 3070 and let it run at ~120 W. With boinccmd and a systemd timer, I run the GPU calculation for 15 min and then stop it for 5 min. At every restart, the WU starts at 0%. Please look at this behaviour.
Thanks.

Try once more, but let it run for five minutes after the restart. You may find the progress display jumps back up to what it was showing before.

I had a total electricity blackout a week ago, while everything was running. I was seeing 0% as the machine started up, but it jumped back up to 50% or whatever was appropriate, and completed normally.
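
If you want to confirm the task really is checkpointing rather than restarting from scratch, the checkpoint time is also visible from the command line (a sketch; run it on the host itself, with the GUI RPC password if one is set):

boinccmd --get_tasks | grep -E "name:|fraction done|checkpoint CPU time"

If "checkpoint CPU time" keeps advancing while the task runs, a restart will resume from roughly that point, even though the progress display briefly shows 0% first.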

jiipee
Send message
Joined: 4 Jun 15
Posts: 19
Credit: 6,150,522,446
RAC: 4,437,291
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 57759 - Posted: 4 Nov 2021 | 14:03:19 UTC

Why are so many Acemd3 tasks failing on Windows host(s)? Mine has not succeeded on any task lately. Same errors can be seen on many other hosts too, like WU 27085364.


Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,794,611,851
RAC: 9,297,069
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57780 - Posted: 10 Nov 2021 | 11:56:09 UTC

There's another Acemd3 test ongoing today:

TEST_24_41-RAIMIS_TEST-0-1-RND0310_0

Seems to have worked for me, at least.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1035
Credit: 37,003,757,483
RAC: 40,270,675
Level
Trp
Scientific publications
wat
Message 57784 - Posted: 10 Nov 2021 | 14:26:12 UTC - in response to Message 57780.

all of mine failed, because they sent only the CUDA101 app to my Ampere host.
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1091
Credit: 6,632,031,926
RAC: 4,945,760
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 57787 - Posted: 10 Nov 2021 | 15:11:16 UTC - in response to Message 57784.

all of mine failed, because they sent only the CUDA101 app to my Ampere host.

I am surprised that this problem has not yet been solved :-(
There should have been enough time in the meantime.

Profile Bill F
Avatar
Send message
Joined: 21 Nov 16
Posts: 32
Credit: 86,638,150
RAC: 17,737
Level
Thr
Scientific publications
wat
Message 58012 - Posted: 2 Dec 2021 | 1:44:14 UTC

I am running my 1st "New version of ACEMD" 2.19 (cuda1121) task. It has reached 35.833% and shows a Running status, but the Elapsed time and Time remaining are not incrementing?

Task properties

Application
New version of ACEMD 2.19 (cuda1121)
Name
e7s30_e1s279p0f463-ADRIA_BanditGPCR_APJ_b0-0-1-RND6681
State
Running
Received
12/1/2021 1:25:59 AM
Report deadline
12/6/2021 1:25:59 AM
Resources
0.985 CPUs + 1 NVIDIA GPU
Estimated computation size
5,000,000 GFLOPs
CPU time
09:42:23
CPU time since checkpoint
---
Elapsed time 10:11:24
Estimated time remaining 1d 10:10:29
Fraction done 35.833%
Virtual memory size 0 bytes
Working set size 0 bytes
Directory slots/0
Process ID 10056
Progress rate 3.600% per hour
Executable wrapper_6.1_windows_x86_64.exe

Environment Win 10 NVIDIA GeForce GTX 1060 6GB (4095MB) driver: 472.47

Has anyone else seen this ?

Thanks
Bill F

____________
In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.


mmonnin
Send message
Joined: 2 Jul 16
Posts: 332
Credit: 4,047,021,065
RAC: 13,221,293
Level
Arg
Scientific publications
watwatwatwatwat
Message 58014 - Posted: 2 Dec 2021 | 3:09:08 UTC - in response to Message 58012.

I am running my 1st "New version of ACEMD" 2.19 (cuda1121) task. It has reached 35.833% and shows a Running status, but the Elapsed time and Time remaining are not incrementing?

Task properties

Application
New version of ACEMD 2.19 (cuda1121)
Name
e7s30_e1s279p0f463-ADRIA_BanditGPCR_APJ_b0-0-1-RND6681
State
Running
Received
12/1/2021 1:25:59 AM
Report deadline
12/6/2021 1:25:59 AM
Resources
0.985 CPUs + 1 NVIDIA GPU
Estimated computation size
5,000,000 GFLOPs
CPU time
09:42:23
CPU time since checkpoint
---
Elapsed time 10:11:24
Estimated time remaining 1d 10:10:29
Fraction done 35.833%
Virtual memory size 0 bytes
Working set size 0 bytes
Directory slots/0
Process ID 10056
Progress rate 3.600% per hour
Executable wrapper_6.1_windows_x86_64.exe

Environment Win 10 NVIDIA GeForce GTX 1060 6GB (4095MB) driver: 472.47

Has anyone else seen this ?

Thanks
Bill F


https://www.gpugrid.net/forum_thread.php?id=5297&nowrap=true#58008

Ya need the MS Visual C++ 2015 (or later) runtime.

Profile Bill F
Avatar
Send message
Joined: 21 Nov 16
Posts: 32
Credit: 86,638,150
RAC: 17,737
Level
Thr
Scientific publications
wat
Message 58020 - Posted: 3 Dec 2021 | 4:20:45 UTC - in response to Message 58014.

Yes, I updated my MS Visual C++ to the 2015-or-newer level and my old NVIDIA 1060 took right off !!!

Thanks
Bill F

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1288
Credit: 5,125,531,959
RAC: 9,537,868
Level
Tyr
Scientific publications
watwatwatwatwat
Message 58385 - Posted: 27 Feb 2022 | 18:08:00 UTC

Anybody get any new acemd3 tasks and notice if the application name changed?

Went looking for acemd3 tasks on the server and saw that they changed the name to

Advanced molecular dynamics simulations for GPUs

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,794,611,851
RAC: 9,297,069
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58386 - Posted: 27 Feb 2022 | 18:19:31 UTC - in response to Message 58385.

Yes, I've had one running on a Linux machine for the last 10 hours or so. Still says v2.19 and cuda 1121. It's approaching 40%, so much the same speed as usual for a GTX 1660 Ti

Host 508381. Note that machine has an old errored task from 2020 in the list (v2.10, cuda 100), and that's got the new name too.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 851
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58388 - Posted: 27 Feb 2022 | 23:51:08 UTC - in response to Message 58385.

That's interesting.
You can track the change on the "applications" page.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,794,611,851
RAC: 9,297,069
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58390 - Posted: 28 Feb 2022 | 11:00:43 UTC

The task I mentioned has now completed and reported - visible on the link I posted last night. The actual binary executable is still acemd3, dated 28 September 2021 - you can see the name in stderr.txt

So all that has changed is the 'friendly name' stored in the project's database for that application ID.

The only other thing I noticed was that the big upload started incredibly slowly - averaging around 30 kilobyte/sec. But it must have speeded up to something closer to my raw upload link speed of 16 megabit/sec - the whole thing was finished in about half an hour. I can only ascribe it to roadworks somewhere on the route from UK to Spain. Probably more due to the current geopolitical situation, and not under the project's control.

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,037,436,882
RAC: 1,314,112
Level
Trp
Scientific publications
watwatwat
Message 58391 - Posted: 28 Feb 2022 | 14:22:44 UTC

That's what I see from the client_state file:

<app>
<name>acemd3</name>
<user_friendly_name>Advanced molecular dynamics simulations for GPUs</user_friendly_name>
<non_cpu_intensive>0</non_cpu_intensive>
</app>

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,037,436,882
RAC: 1,314,112
Level
Trp
Scientific publications
watwatwat
Message 58392 - Posted: 28 Feb 2022 | 15:57:29 UTC

Then the next 2 WUs show up as 1.0 and:

<app>
<name>acemd4</name>
<user_friendly_name>Advanced molecular dynamics simulations for GPUs</user_friendly_name>
<non_cpu_intensive>0</non_cpu_intensive>
</app>

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,037,436,882
RAC: 1,314,112
Level
Trp
Scientific publications
watwatwat
Message 58393 - Posted: 28 Feb 2022 | 16:28:00 UTC - in response to Message 58388.
Last modified: 28 Feb 2022 | 16:33:03 UTC

That's interesting.
You can track the change on the "applications" page.

You'd think they'd have a link to the Apps page, but no.

The first 5 of those new acemd4 WUs failed within a few minutes.
Stderr output
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
07:45:46 (99083): wrapper (7.7.26016): starting
07:45:46 (99083): wrapper (7.7.26016): starting
07:45:46 (99083): wrapper: running /bin/tar (xf x86_64-pc-linux-gnu__cuda1121.tar.bz2)
07:52:57 (99083): /bin/tar exited; CPU time 424.196280
07:52:57 (99083): wrapper: running bin/python (pre_run.py)
File "/var/lib/boinc-client/slots/36/pre_run.py", line 1
<soft_link>../../projects/www.gpugrid.net/T1_3-RAIMIS_TEST-0-pre_run</soft_link>
^
SyntaxError: invalid syntax
07:52:58 (99083): bin/python exited; CPU time 0.137151
07:52:58 (99083): app exit status: 0x1
07:52:58 (99083): called boinc_finish(195)

</stderr_txt>
]]>

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,794,611,851
RAC: 9,297,069
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58394 - Posted: 28 Feb 2022 | 19:06:25 UTC - in response to Message 58392.

Then the next 2 WUs show up as 1.0 and:
<app>
<name>acemd4</name>
<user_friendly_name>Advanced molecular dynamics simulations for GPUs</user_friendly_name>
<non_cpu_intensive>0</non_cpu_intensive>
</app>

Just had one of those run through to completion:

T5_5-RAIMIS_TEST-1-3-RND1908_0
I think that's the first I've seen from RAIMIS which both:
* Was explicitly designated as a GPU task (cuda 1121)
* Ran right through to validation

Congratulations!

It was a very quick test run - under 7 minutes - but all the moving parts seem to have been assembled into the right order. The actual binary (as listed in stderr_txt) is 'acemd', not acemd4: that might be worth tidying up in the future.

WR-HW95
Send message
Joined: 16 Dec 08
Posts: 7
Credit: 1,415,563,813
RAC: 228,145
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58610 - Posted: 7 Apr 2022 | 20:50:26 UTC

Ok.
After some research and googling, I got this app to work on my 3rd machine.
The reason the tasks failed seemed to be the vcruntime DLLs.
Why that was, I have no idea, since my other two Win10 Pro machines haven't had this problem.

At least the task I got now has been running for 8 minutes OK instead of the usual 15-38 s failure.

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,388,072,716
RAC: 9,847,676
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58611 - Posted: 7 Apr 2022 | 22:30:36 UTC

I had 2 WUs running today. They both made it up to 66.666% complete, then they stayed there for a few hours doing nothing; there was no CPU or GPU usage. So I aborted both of them. How long was I supposed to keep them "running" like that?


https://www.gpugrid.net/result.php?resultid=32880450

https://www.gpugrid.net/result.php?resultid=32880506


That's enough beta testing for a day or so.


Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1288
Credit: 5,125,531,959
RAC: 9,537,868
Level
Tyr
Scientific publications
watwatwatwatwat
Message 58612 - Posted: 7 Apr 2022 | 23:22:31 UTC - in response to Message 58611.

You should have either exited and restarted BOINC, or suspended/resumed the tasks, to get them moving again.

The Python tasks do checkpoint and can be resumed across different cards with no ill effect.

mrchips
Send message
Joined: 9 May 21
Posts: 9
Credit: 859,893,000
RAC: 432,288
Level
Glu
Scientific publications
wat
Message 58630 - Posted: 12 Apr 2022 | 19:34:23 UTC

ALL my tasks finish with

195 (0xc3) EXIT_CHILD_FAILED

WHY?
____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1035
Credit: 37,003,757,483
RAC: 40,270,675
Level
Trp
Scientific publications
wat
Message 58631 - Posted: 12 Apr 2022 | 20:30:06 UTC - in response to Message 58630.

ALL my tasks finish with

195 (0xc3) EXIT_CHILD_FAILED

WHY?


you have several more specific errors.

"ACEMD failed:
Error invoking kernel: CUDA_ERROR_ILLEGAL_ADDRESS (700)"

"ACEMD failed:
Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719)"

"ACEMD failed:
Particle coordinate is nan"


it's possibly a driver issue for the CUDA errors. use a program called DDU (Display Driver Uninstaller) to totally wipe out the drivers, then re-install fresh from the nvidia package. In my opinion, the 470-series drivers are most stable for crunching. the newer drivers will get you the slightly different CUDA 11.2.1 app also.

"particle coordinate is nan" (nan = not a number) is usually an overclocking issue, or a bad WU.
____________

Profile [PUGLIA] kidkidkid3
Avatar
Send message
Joined: 23 Feb 11
Posts: 81
Credit: 954,353,044
RAC: 266,645
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58875 - Posted: 28 May 2022 | 12:41:29 UTC - in response to Message 58631.

Good afternoon everyone,
WUs have not been available in the acemd3 environment for a long time.
While waiting to migrate to acemd4, can you tell me if there will be work for the Windows environment soon? Thanks in advance
____________
Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing.
(Martin Luther King)

Erich56
Send message
Joined: 1 Jan 15
Posts: 1091
Credit: 6,632,031,926
RAC: 4,945,760
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 58877 - Posted: 29 May 2022 | 5:43:47 UTC - in response to Message 58875.

Good afternoon everyone,
WUs have not been available in the acemd3 environment for a long time.
While waiting to migrate to acemd4, can you tell me if there will be work for the Windows environment soon? Thanks in advance

I would be surprised if you receive a reply :-(

Erich56
Send message
Joined: 1 Jan 15
Posts: 1091
Credit: 6,632,031,926
RAC: 4,945,760
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 59080 - Posted: 8 Aug 2022 | 7:20:30 UTC - in response to Message 58877.

Good afternoon everyone,
WUs have not been available in the acemd3 environment for a long time.
While waiting to migrate to acemd4, can you tell me if there will be work for the Windows environment soon? Thanks in advance

I would be surprised if you receive a reply :-(

so, did I promise too much?

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1288
Credit: 5,125,531,959
RAC: 9,537,868
Level
Tyr
Scientific publications
watwatwatwatwat
Message 59081 - Posted: 8 Aug 2022 | 17:11:21 UTC - in response to Message 59080.
Last modified: 8 Aug 2022 | 17:12:07 UTC

The preponderance of work lately has been 99:1 for the Python tasks.

I was very surprised to get an acemd3 task a couple of days ago.

I haven't seen any acemd4 tasks since their initial beta run.

I have been doing nothing but Python tasks almost exclusively since Abouh opened the taps for them.

My gpus are constantly busy with Python tasks and haven't had a break in months.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,794,611,851
RAC: 9,297,069
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 59083 - Posted: 8 Aug 2022 | 17:44:15 UTC - in response to Message 59081.

I've had eight ACEMD 3 tasks since Friday - six 'ADRIA' (the long-running ones), and two 'CRYPTICSCOUT' (significantly shorter). One oddity is that the credit for 'ADRIA' tasks has been substantially reduced, but the credit for 'CRYPTICSCOUT' hasn't. Was that deliberate, I wonder?

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1288
Credit: 5,125,531,959
RAC: 9,537,868
Level
Tyr
Scientific publications
watwatwatwatwat
Message 59088 - Posted: 9 Aug 2022 | 15:25:05 UTC

Richard, maybe you can answer this puzzle.

I have reduced the resource share for GPUGrid on all my hosts.

I have observed no change in the frequency of Python tasks running. They run non-stop: one finishes and reports, then the next one downloads and runs with no interruption.

However the acemd3 task sat for two days before it finally started running.

I know the REC balancing mechanism came into play when I reduced the resource share among projects.

Does the REC mechanism somehow take account of the different APR ratings for separate applications?

I would have thought its lowest granularity would be at the simple project level.

But it seems to have been applied at the application level. The APR for the acemd3 tasks has been developed over many years along with the tally of credits for that application.

The python tasks however are relatively new and haven't produced as much credit so far compared to the total project credit for acemd3.

Was this the case for the REC mechanism? That the python tasks are still in need of balancing against the acemd3 credit history? And that is why the acemd3 task was not in need of immediate running compared to the python tasks?

Both types have the same 5-day deadlines. But a Python task is still serviced immediately and has not given my other projects' gpu applications a chance to run yet.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,794,611,851
RAC: 9,297,069
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 59091 - Posted: 9 Aug 2022 | 19:26:19 UTC - in response to Message 59088.

The difference in behaviour will be down to the client scheduler (not a separate program - an integral part of the client code).

On the Linux machines, where I run Python, the deadline would be missed by so much that the client actively throws other tasks off the card so that the Python tasks start immediately.

On my Windows machines, where I run ACEMD 3, I think deadline pressure is much lower, so the client waits for a convenient moment to switch over (*). The trouble is: I'm running 2 x Einstein tasks when there's no ACEMD, and they don't necessarily finish at the same moment. So the client scheduler only sees 0.5 GPUs free, and (on my settings) an ACEMD task won't fit in half a card. So the client starts another Einstein half-task instead, and the cycle starts again. And can continue until the deadline pressure gets really serious.

The client knows about deadlines, and will honour them as best it can: but it doesn't know about 50% and 25% bonuses, so it takes no notice of them.

* unverified: I'll take a proper look tomorrow. I've been out all day.
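
For anyone wondering where the 'half a card' comes from: it is just the gpu_usage value in an app_config.xml in the other project's folder, the standard BOINC mechanism. A minimal illustration on a Linux host (the Einstein app name is only an example; the same file works on Windows in the corresponding project directory):

cat > /var/lib/boinc-client/projects/einstein.phys.uwm.edu/app_config.xml <<'EOF'
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
EOF
# restart the client, or use the Manager's Options -> 'Read config files', to pick it up

With gpu_usage at 0.5 the client books half a GPU per Einstein task, which is exactly why a full-GPU ACEMD task can't be scheduled until both halves happen to be free at the same time.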

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1288
Credit: 5,125,531,959
RAC: 9,537,868
Level
Tyr
Scientific publications
watwatwatwatwat
Message 59092 - Posted: 9 Aug 2022 | 22:29:56 UTC - in response to Message 59091.

OK, thanks for the comment. As usual, I overthought the problem.

It is simply a matter of the estimated time to completion in the rr_simulation and cpu scheduling code pushing the python tasks to the forefront because of their outlandish estimated runtimes.

The acemd3 task, having actual realistic values, was not in any hurry to be started, allowing my other gpu tasks a chance to run normally. I did lose out on any bonus points, which was the only downside to the late running. It got less credit than the Python tasks. But as I mentioned, acemd3 tasks have lately been a rarity here.

I have restricted the GPUGrid tasks to run only on the slower gpus to allow the most powerful gpus to run the Einstein and MW tasks where their production is most noticeable. If I need to I can NNT the GPUGrid work to get more production out of my hosts allowing all the gpus to be used for all my gpu projects.

Boca Raton Community HS
Send message
Joined: 27 Aug 21
Posts: 36
Credit: 2,942,356,809
RAC: 17,168,673
Level
Phe
Scientific publications
wat
Message 59094 - Posted: 11 Aug 2022 | 12:11:35 UTC

I am not sure if I am wording this question properly, but does acemd3 use single- or double-precision floating point? This is more just out of curiosity than anything else. Also, is there a way to tell what a program/app is using by looking within the OS that is running it?

Even though this is an acemd3 thread, what about the python tasks?

Thanks for any insights.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1288
Credit: 5,125,531,959
RAC: 9,537,868
Level
Tyr
Scientific publications
watwatwatwatwat
Message 59095 - Posted: 11 Aug 2022 | 17:22:49 UTC - in response to Message 59094.

Yes, the running tasks can be identified by their science applications in the running processes on a host.

For acemd3, logically it is acemd3 along with the BOINC wrapper application.

For Python on Gpu tasks, it is 32 python processes along with the BOINC wrapper application.

Depending on OS you can see the running processes in something called Task Manager or System Monitor or Process Explorer.

Boca Raton Community HS
Send message
Joined: 27 Aug 21
Posts: 36
Credit: 2,942,356,809
RAC: 17,168,673
Level
Phe
Scientific publications
wat
Message 59097 - Posted: 11 Aug 2022 | 19:51:28 UTC - in response to Message 59095.

Is there a way to tell if it is single or double precision by looking at/inspecting the process in the task manager (Windows)?

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1288
Credit: 5,125,531,959
RAC: 9,537,868
Level
Tyr
Scientific publications
watwatwatwatwat
Message 59098 - Posted: 11 Aug 2022 | 20:10:13 UTC

You would have to ask Toni whether the acemd3 application uses single or double precision.

All I've seen mentioned in this thread is that they use FP32 registers.

Without a specific answer by the developer or a look at the source code of the app we are just guessing.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1035
Credit: 37,003,757,483
RAC: 40,270,675
Level
Trp
Scientific publications
wat
Message 59100 - Posted: 11 Aug 2022 | 22:42:54 UTC

I'm going to guess that the vast majority is FP32 and INT32. I have not observed any correlation with FP64 across devices on GPUGRID tasks, so if any FP64 operations are being done, the percentage of compute time should be so small as to be only marginal.
____________

kotenok2000
Send message
Joined: 18 Jul 13
Posts: 78
Credit: 12,875,793
RAC: 0
Level
Pro
Scientific publications
wat
Message 59448 - Posted: 13 Oct 2022 | 0:13:25 UTC

I have received 5 acemd tasks and they all failed.
And not only on my computer.
https://www.gpugrid.net/workunit.php?wuid=27319802
When I ran the executable manually, it said that the licence had expired.

gemini8
Send message
Joined: 3 Jul 16
Posts: 31
Credit: 1,274,650,176
RAC: 2,732,392
Level
Met
Scientific publications
watwat
Message 59629 - Posted: 21 Dec 2022 | 14:03:19 UTC

I have several of those on two machines:

Stderr Ausgabe

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
14:52:08 (347837): wrapper (7.7.26016): starting
14:52:29 (347837): wrapper (7.7.26016): starting
14:52:29 (347837): wrapper: running bin/acemd3 (--boinc --device 0)
14:52:30 (347837): bin/acemd3 exited; CPU time 0.003638
14:52:30 (347837): app exit status: 0x1
14:52:30 (347837): called boinc_finish(195)

</stderr_txt>
]]>

Nice to have ACEMD back, but I'd consider it even nicer if the ACEMDs didn't crash. ;-)
____________
Greetings, Jens

Boca Raton Community HS
Send message
Joined: 27 Aug 21
Posts: 36
Credit: 2,942,356,809
RAC: 17,168,673
Level
Phe
Scientific publications
wat
Message 59630 - Posted: 21 Dec 2022 | 14:21:12 UTC

Looks like I received 37 work units overnight, all failed after about 10-30 seconds.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1035
Credit: 37,003,757,483
RAC: 40,270,675
Level
Trp
Scientific publications
wat
Message 59631 - Posted: 21 Dec 2022 | 15:39:37 UTC

same. all the acemd3 tasks failed without any informative error message. on Linux
____________

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 59632 - Posted: 21 Dec 2022 | 16:00:34 UTC
Last modified: 21 Dec 2022 | 16:03:48 UTC

3skh-ADRIA_KDeepMD_100ns_2489-0-1-RND9110_5

Stderr output
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
09:43:36 (18020): wrapper (7.9.26016): starting
09:43:36 (18020): wrapper: running bin/acemd3.exe (--boinc --device 0)
09:43:37 (18020): bin/acemd3.exe exited; CPU time 0.000000
09:43:37 (18020): app exit status: 0x1
09:43:37 (18020): called boinc_finish(195)

I have over 40 of these so far.

Host OS (win10) and drivers are up-to-date.

Are these ACEMD apps also failing under Linux?
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Boca Raton Community HS
Send message
Joined: 27 Aug 21
Posts: 36
Credit: 2,942,356,809
RAC: 17,168,673
Level
Phe
Scientific publications
wat
Message 59633 - Posted: 21 Dec 2022 | 16:05:57 UTC - in response to Message 59632.



Are these ACEMD apps also failing under Linux?



Yes, per Ian&Steve C.

bjstateson
Send message
Joined: 9 Sep 20
Posts: 1
Credit: 7,196,472
RAC: 0
Level
Ser
Scientific publications
wat
Message 59634 - Posted: 21 Dec 2022 | 16:09:19 UTC - in response to Message 57041.
Last modified: 21 Dec 2022 | 16:10:49 UTC

I just had 13 of the apps crash with computation error (195)
Running CUDA 12


5 12/20/2022 2:03:07 PM CUDA: NVIDIA GPU 0: NVIDIA GeForce GTX 1660 Ti (driver version 526.86, CUDA version 12.0, compute capability 7.5, 6144MB, 6144MB available, 5530 GFLOPS peak)
6 12/20/2022 2:03:07 PM OpenCL: NVIDIA GPU 0: NVIDIA GeForce GTX 1660 Ti (driver version 526.86, device version OpenCL 3.0 CUDA, 6144MB, 6144MB available, 5530 GFLOPS peak)

Profile JStateson
Avatar
Send message
Joined: 31 Oct 08
Posts: 186
Credit: 3,331,546,800
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 59636 - Posted: 21 Dec 2022 | 18:07:34 UTC

Linux: Took almost 30 minutes to download but only seconds to error out


GPUGRID x86_64-pc-linux-gnu__cuda1121.zip.b4692e2ec3b7e128830af5c05a9f0037 98.225 1013587.50 K 00:28:10 587.44 KBps Downloading dual-linux
GPUGRID 2.19 ACEMD 3: molecular dynamics simulations for GPUs (cuda1121) 3sni-ADRIA_KDeepMD_100ns_3150-0-1-RND1427_7 00:00:24 (-) 0.00 100.000 - 12/26/2022 11:33:32 AM 0.993C + 1NV Computation error d
12/21/2022 9:59:43 AM CUDA: NVIDIA GPU 0: NVIDIA P102-100 (driver version 470.99, CUDA version 11.4, compute capability 6.1, 5060MB, 5060MB available, 10771 GFLOPS peak)
12/21/2022 9:59:43 AM OS: Linux Ubuntu: Ubuntu 20.04.5 LTS [5.4.0-135-generic|libc 2.31]

____________
try my performance program, the BoincTasks History Reader.
Find and read about it here: https://forum.efmer.com/index.php?topic=1355.0

Erich56
Send message
Joined: 1 Jan 15
Posts: 1091
Credit: 6,632,031,926
RAC: 4,945,760
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 59645 - Posted: 23 Dec 2022 | 6:07:29 UTC

will there be more acemd3 tasks in the near future?

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1288
Credit: 5,125,531,959
RAC: 9,537,868
Level
Tyr
Scientific publications
watwatwatwatwat
Message 59647 - Posted: 23 Dec 2022 | 17:55:43 UTC - in response to Message 59645.

will there be more acemd3 tasks in the near future?

I would hope so. Would be nice to return to quick running acemd3 tasks that run only on the gpu.

Let's hope the developer can rework the parameters for these new tasks so that they don't fail instantly on everyone's hosts.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1091
Credit: 6,632,031,926
RAC: 4,945,760
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 59649 - Posted: 24 Dec 2022 | 12:26:00 UTC - in response to Message 59647.

Let's hope the developer can rework the parameters for these new tasks so that they don't fail instantly on everyone's hosts.

+ 1

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 59650 - Posted: 24 Dec 2022 | 22:34:02 UTC

Here's an alternate perspective: looking at the time it took to reach the error is probably a valid way to compare the speed of the various hosts that ran the same scripts. Assuming that the error is in the script, of course.
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1288
Credit: 5,125,531,959
RAC: 9,537,868
Level
Tyr
Scientific publications
watwatwatwatwat
Message 59651 - Posted: 25 Dec 2022 | 3:18:34 UTC

Well, since the wrapper hasn't changed and the app hasn't changed, the issue with the tasks must be that the configuration of the task parameters is doing something rude.

Or unlikely but possible, the task parameters have uncovered a "edge case flaw" of the acemd3 application that hasn't been exposed up to this point.

I would put my money on the bet that simply the task generation configuration script is generating some values that are "out of bounds" in memory access.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1091
Credit: 6,632,031,926
RAC: 4,945,760
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 59654 - Posted: 26 Dec 2022 | 7:02:11 UTC

I am surprised that the Server Status page still shows some 140 tasks "in process".
How come?

gemini8
Send message
Joined: 3 Jul 16
Posts: 31
Credit: 1,274,650,176
RAC: 2,732,392
Level
Met
Scientific publications
watwat
Message 59660 - Posted: 27 Dec 2022 | 9:06:44 UTC - in response to Message 59651.

I would put my money on the bet that simply the task generation configuration script is generating some values that are "out of bounds" in memory access.

This might well be the case, but I have a different shot at an explanation:
IIRC the certificates usually had to be renewed sometime in summer or early autumn, and I think it may be possible there's no working certificate laid down for ACEMD at all.
____________
Greetings, Jens

Erich56
Send message
Joined: 1 Jan 15
Posts: 1091
Credit: 6,632,031,926
RAC: 4,945,760
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 59668 - Posted: 29 Dec 2022 | 7:41:33 UTC - in response to Message 59660.

... and I think it may be possible there's no working certificate laid down for ACEMD at all.

this has happened numerous times in the past :-(
So it would be no surprise if this is also the reason for the problem now.

Profile Bill F
Avatar
Send message
Joined: 21 Nov 16
Posts: 32
Credit: 86,638,150
RAC: 17,737
Level
Thr
Scientific publications
wat
Message 59717 - Posted: 12 Jan 2023 | 18:33:09 UTC

I managed to get one of the ACEMD3 tasks that came out and it ran fine on my old GTX 1060.
____________
In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.


Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1035
Credit: 37,003,757,483
RAC: 40,270,675
Level
Trp
Scientific publications
wat
Message 59718 - Posted: 12 Jan 2023 | 19:38:38 UTC

sad to see that they didn't update the CUDA version to support Ada
____________

Asghan
Send message
Joined: 30 Oct 19
Posts: 6
Credit: 405,900
RAC: 0
Level

Scientific publications
wat
Message 59729 - Posted: 17 Jan 2023 | 20:23:19 UTC

Yea :/ My 4090 is also waiting for work.

kotenok2000
Send message
Joined: 18 Jul 13
Posts: 78
Credit: 12,875,793
RAC: 0
Level
Pro
Scientific publications
wat
Message 59756 - Posted: 18 Jan 2023 | 21:52:28 UTC

Either the new acemd tasks are smaller or the new version is faster.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1091
Credit: 6,632,031,926
RAC: 4,945,760
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 59779 - Posted: 20 Jan 2023 | 7:37:06 UTC - in response to Message 59756.

Either the new acemd tasks are smaller or the new version is faster.

my experience from the past few days is that they differ in size.
Also the credit points earned differ accordingly.

Post to thread

Message boards : News : Update acemd3 app
