Message boards : News : New D3RBanditTest workunits
Toni (volunteer moderator, project administrator, developer, tester, scientist)
Message 56504 - Posted: 15 Feb 2021 | 9:09:13 UTC

Dears,

As you may have noticed, we have sent a new batch of WUs for a new experiment. This time the WUs are rather large and require relatively new cards. For reference, one should take ~18 h on a GTX 1080 Ti.

Thanks!

T

johnnymc
Message 56505 - Posted: 15 Feb 2021 | 9:43:45 UTC
Last modified: 15 Feb 2021 | 9:44:28 UTC

For (quick/rough) reference...

My 2080:
- Task 32522050 (WU 27029714, host 574925): sent 13 Feb 2021 18:06:08 UTC, reported 15 Feb 2021 09:28:47 UTC, completed and validated; run time 49,391.24 s, CPU time 48,091.83 s, credit 435,937.50 (New version of ACEMD v2.11, cuda101)
- Task 32522013 (WU 27029677, host 574925): sent 13 Feb 2021 18:06:08 UTC, reported 14 Feb 2021 14:15:47 UTC, completed and validated; run time 50,192.08 s, CPU time 48,998.63 s, credit 523,125.00 (New version of ACEMD v2.11, cuda101)

My 1080:
- Task 32521736 (WU 27029492, host 233256): sent 13 Feb 2021 13:53:06 UTC, reported 14 Feb 2021 13:35:25 UTC, completed and validated; run time 84,623.21 s, CPU time 84,006.91 s, credit 523,125.00 (New version of ACEMD v2.11, cuda101)

____________
Life's short; make fun of it!

Ohad
Message 56507 - Posted: 15 Feb 2021 | 10:46:28 UTC - in response to Message 56504.

It would be nice if you at least gave a longer deadline.
I only have a GTX 1070, and many others have even weaker GPUs.

TIA

nicomanso
Message 56508 - Posted: 15 Feb 2021 | 11:46:34 UTC

Yup, my old GT 730 is out; these take too much time. Unless the deadline gets a few more days, I'll stop accepting WUs.

Good luck for the rest and happy crunching :)

baderman
Message 56509 - Posted: 15 Feb 2021 | 12:17:29 UTC - in response to Message 56504.

Hi All!

This WU has already taken 7 hours, and the total time is estimated at 38 hours on a mobile RTX 2070. Newest drivers, 9th-gen i7. Is that a proper estimate? If so, I won't make the deadline :(.

Regards!

Retvari Zoltan
Message 56510 - Posted: 15 Feb 2021 | 12:40:23 UTC - in response to Message 56509.
Last modified: 15 Feb 2021 | 12:42:10 UTC

This WU has already taken 7 hours, and the total time is estimated at 38 hours on a mobile RTX 2070. Newest drivers, 9th-gen i7.
Is that a proper estimate?
Yes. These workunits are very long.

If so, I won't make the deadline
Set a low cache (0+0 days, or 0.01+0 days). You should abort them (at least the ones that haven't started yet).
If you want to meet the 5-day deadline, you should let your laptop crunch 24/7.
If you don't want that, then set GPUGrid to "no new tasks" for a while (until this batch is over).
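
For reference, a minimal command-line sketch of the same "no new tasks" setting, assuming BOINC's stock boinccmd tool is available on the host (paths and RPC passwords vary by install):

# stop fetching new GPUGrid work until this batch is over
boinccmd --project http://www.gpugrid.net/ nomorework
# resume fetching later
boinccmd --project http://www.gpugrid.net/ allowmorework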

bormolino
Message 56512 - Posted: 15 Feb 2021 | 13:49:12 UTC

My GTX 780 did it in ~50 h.

No chance for my GT 710, which runs BOINC 24/7 in my server.

Eternum
Message 56513 - Posted: 15 Feb 2021 | 14:55:21 UTC

I received a new workunit on Feb 10; the deadline is today. Stats show 32% progress after 22.5 hours of GPU use, so about 48 more hours of GPU time would be needed to finish. Not practical and not doable for me. With such a unit-size/deadline combination I can certainly find better use for my GPU resources elsewhere.

STARBASEn
Message 56514 - Posted: 15 Feb 2021 | 15:25:02 UTC

I completed two, running concurrently on the same machine on two GTX 1060s, in about 36.4-36.5 hours. Two more are running, about 30% complete after roughly 10 hours so far. Not too bad for lower-end cards.

jimvt
Message 56515 - Posted: 15 Feb 2021 | 15:53:18 UTC - in response to Message 56504.

Sadly, the deadline on these tasks is shorter than the time it will take to complete them.

tullio
Message 56517 - Posted: 15 Feb 2021 | 15:56:23 UTC

It should take about 50 hours on my GTX 1650.
Tullio

Wilgard
Message 56518 - Posted: 15 Feb 2021 | 15:58:24 UTC
Last modified: 15 Feb 2021 | 16:20:13 UTC

Hello,

I have the same problem.
How could we be on time ?

https://ibb.co/82f8Sn0

Please don't get into the game of planned obsolescence by forcing users to change their hardware.
You will inevitably gain so much more throughput if more of us can compute your work units.

Please JUST give us more time to complete WUs.

Wilgard

tullio
Message 56519 - Posted: 15 Feb 2021 | 16:35:37 UTC

Funnily, it seems to be much faster on this PC with the GTX 1060 than on the other PC with the GTX 1650. They have different CPUs: this one an AMD Ryzen 5 1400, the other an Intel i5 9400F. Does this make the difference?
Tullio

tullio
Message 56520 - Posted: 15 Feb 2021 | 17:16:24 UTC

Sorry, after a fast start to 10% the task went back to 0.66%. It seemed too good to be true.
Tullio

ServicEnginIC
Message 56521 - Posted: 15 Feb 2021 | 17:50:45 UTC - in response to Message 56520.

after a fast start to 10% the task went back to 0.66%.

That seems to be the normal behavior for this current batch of ADRIA WUs.
In BOINC Manager, once the task has settled at its steady (low) progress rate, click on the task and then on "Properties"; the GTX 1650 should show a progress rate of around 2.160% per hour.
You can do the same with the GTX 1060 to compare values.
The higher the progress rate, the lower the completion time.
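
As a quick sanity check, the expected completion time is just 100% divided by that rate; e.g. in a shell (the 2.160%/hour figure is the example value from above):

awk 'BEGIN { rate = 2.160; printf "%.1f hours to completion\n", 100 / rate }'
# prints: 46.3 hours to completion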

Pop Piasa
Message 56522 - Posted: 15 Feb 2021 | 18:26:40 UTC - in response to Message 56514.
Last modified: 15 Feb 2021 | 18:27:14 UTC

I completed two, running concurrently on the same machine on two GTX 1060s, in about 36.4-36.5 hours. Two more are running, about 30% complete after roughly 10 hours so far. Not too bad for lower-end cards.


My GTX 1060 3GB cards take ~40 hrs.
My GTX 1650s need about 45 hrs.

My GTX 750ti should have finished in 100 hrs, but I found it stalled (not drawing any power, even though it showed 99% usage), and after I paused and restarted it, it never got back up to 1%/hr. It needs 1 hour more than the 120-hr window provides. I'll try again with it gently overclocked.

Please may we have a 144-hr expiration window so everybody gets a chance to crunch.

For now there is work for older cards and AMD GPUs at FAH/COVID moonshot.
https://foldingathome.org/

tullio
Message 56523 - Posted: 15 Feb 2021 | 19:59:29 UTC

The progress rate on the GTX 1650 is 2.160%/hour; on the GTX 1060 it is also 2.160%/hour. This seems to agree with my observations. The 1060 has only 3 GB of video memory, which was not sufficient for some Einstein@home gravitational-wave tasks; the 1650 has 4 GB. These tasks use only 667 MB of memory.
Tullio

Wilgard
Message 56524 - Posted: 15 Feb 2021 | 20:52:58 UTC - in response to Message 56522.

For now there is work for older cards and AMD GPUs at FAH/COVID moonshot.
https://foldingathome.org/


But it is not BOINC; that is the main issue in my case.

hawks
Message 56526 - Posted: 15 Feb 2021 | 22:16:22 UTC - in response to Message 56504.

What does "~18h on a 1080 Ti" mean?

Keith Myers
Message 56528 - Posted: 15 Feb 2021 | 22:21:37 UTC - in response to Message 56526.

Typical D3RBanditTest workunits will take 18 hours, or about 65,000 seconds, to compute and return a result on an Nvidia GTX 1080 Ti video card.

Wailing Angus Beef
Message 56529 - Posted: 15 Feb 2021 | 22:23:48 UTC

Is there a minimum driver version or CUDA version required?

Ian&Steve C.
Message 56530 - Posted: 15 Feb 2021 | 22:31:49 UTC - in response to Message 56529.
Last modified: 15 Feb 2021 | 22:32:30 UTC

Is there a minimum driver version or CUDA version required?


Yes, all the new ACEMD tasks here are CUDA 10.0 on Linux and CUDA 10.1 on Windows, so you need the appropriate drivers for that CUDA version:

Linux, CUDA 10.0 - driver >= 410.48
Windows, CUDA 10.1 - driver >= 418.96
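
A quick way to check the driver you are actually running (standard nvidia-smi query flags, same on Linux and Windows):

nvidia-smi --query-gpu=name,driver_version --format=csv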

Pop Piasa
Message 56531 - Posted: 15 Feb 2021 | 22:40:28 UTC
Last modified: 15 Feb 2021 | 23:28:25 UTC

My GTX 750ti appears to have been disqualified by the server. Despite over 1K WUs waiting to be sent, my log says "Scheduler request completed: Got 0 new tasks" and "No tasks sent".

The previous WU was completed in 408,214 seconds, with just 23,786 seconds to spare. Is that why?

Edit:
I see the server gave it a grace period of ~35 min to finish. That might explain it.

Retvari Zoltan
Message 56532 - Posted: 15 Feb 2021 | 23:32:50 UTC - in response to Message 56531.
Last modified: 15 Feb 2021 | 23:57:59 UTC

My GTX 750ti appears to have been disqualified by the server. Despite over 1K WUs waiting to be sent, my log says "Scheduler request completed: Got 0 new tasks" and "No tasks sent".

The previous WU was completed in 408,214 seconds, with just 23,786 seconds to spare. Is that why?
It's not just your GTX 750Ti.
My GTX 1080Ti/Linux didn't receive work:
2021. febr. 16., Tuesday, 00:19:34 CET | GPUGRID | checking NVIDIA GPU
2021. febr. 16., Tuesday, 00:19:34 CET | GPUGRID | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 1.00 fetch share 1.00 req_inst 1.00 req_secs 25920.00
2021. febr. 16., Tuesday, 00:19:34 CET | GPUGRID | NVIDIA GPU set_request: 25920.000000
2021. febr. 16., Tuesday, 00:19:34 CET | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (25920.00 sec, 1.00 inst)
2021. febr. 16., Tuesday, 00:19:34 CET | GPUGRID | Sending scheduler request: To fetch work.
2021. febr. 16., Tuesday, 00:19:34 CET | GPUGRID | Requesting new tasks for NVIDIA GPU
2021. febr. 16., Tuesday, 00:19:35 CET | GPUGRID | work fetch suspended by user
2021. febr. 16., Tuesday, 00:19:36 CET | GPUGRID | Scheduler request completed: got 0 new tasks
2021. febr. 16., Tuesday, 00:19:36 CET | GPUGRID | No tasks sent
2021. febr. 16., Tuesday, 00:19:36 CET | GPUGRID | No tasks are available for New version of ACEMD
2021. febr. 16., Tuesday, 00:19:36 CET | GPUGRID | Project requested delay of 31 seconds

also my RTX 2080Ti/Windows didn't receive work:
2021. 02. 16. 0:23:06 | GPUGRID | checking NVIDIA GPU
2021. 02. 16. 0:23:06 | GPUGRID | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 1.00 nidle_now 0.00 fetch share 1.00 req_inst 0.00 req_secs 23728.26
2021. 02. 16. 0:23:06 | GPUGRID | NVIDIA GPU set_request: 23728.255416
2021. 02. 16. 0:23:06 | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (23728.26 sec, 0.00 inst)
2021. 02. 16. 0:23:06 | GPUGRID | Sending scheduler request: To fetch work.
2021. 02. 16. 0:23:06 | GPUGRID | Requesting new tasks for NVIDIA GPU
2021. 02. 16. 0:23:08 | GPUGRID | Scheduler request completed: got 0 new tasks
2021. 02. 16. 0:23:08 | GPUGRID | No tasks sent
2021. 02. 16. 0:23:08 | GPUGRID | No tasks are available for New version of ACEMD
2021. 02. 16. 0:23:08 | GPUGRID | Project requested delay of 31 seconds

Something broke in the scheduler, as the number of tasks in progress has decreased by about 1,400.

Retvari Zoltan
Message 56533 - Posted: 16 Feb 2021 | 0:06:05 UTC

I've managed to get a new task on my other host by updating manually a couple of times, but the others still didn't get one.
It looks like the scheduler thinks the majority of the unsent tasks aren't for the "New version of ACEMD" app, despite their being shown next to that label.

Pop Piasa
Message 56534 - Posted: 16 Feb 2021 | 0:48:01 UTC - in response to Message 56533.

Thanks Zoltan, I gave up and switched that GPU to FAH for now.
It seems to be doing more FLOPS/hr running the FAHcore CUDA app than ACEMD, but that may just be a difference in scoring procedures.

peter braun
Message 56535 - Posted: 16 Feb 2021 | 1:47:11 UTC

I have a 1660 Super that takes about 34-38 hours on these new units; still seeing temps in the 60s, though Task Manager shows only 15% GPU usage.

Ian&Steve C.
Message 56536 - Posted: 16 Feb 2021 | 3:07:06 UTC - in response to Message 56533.

I've managed to get a new task on my other host by updating manually a couple of times, but the others still didn't get one.
It looks like the scheduler thinks the majority of the unsent tasks aren't for the "New version of ACEMD" app, despite their being shown next to that label.


I haven't seen my systems have any issue with getting new tasks.

But I do wonder what's going on with the massive shift of tasks from out in the field to waiting to be sent. Are they erroring en masse somehow? Today is about 5 days since these new tasks started showing up, so perhaps that's why: thousands of tasks hitting their deadlines from systems not fast enough to process them, or systems that are fast enough if they run 24/7 but aren't processing 24/7, or systems that downloaded work but were shut off for the past 5 days. Or some combination of all 3.

mac
Message 56537 - Posted: 16 Feb 2021 | 5:22:48 UTC

I've found that I only receive one task, and once it's finished I don't get any more. I have a GTX 1650. Is this normal?

d_a_dempsey
Message 56539 - Posted: 16 Feb 2021 | 5:56:57 UTC

I have a dual-GPU system, GTX 980 and GTX 1080 Ti, and all of these work units have failed. Drivers are current as of December; I had to roll back the January update as it didn't play nice with Milkyway@Home while you folks were on holiday. Suddenly I can't complete a work unit without error.

Keith Myers
Message 56540 - Posted: 16 Feb 2021 | 6:43:45 UTC

I'm not seeing many resends, mostly _0 and _1 original tasks.

No issues getting work or returning valid results.

lukeu
Message 56544 - Posted: 16 Feb 2021 | 7:55:09 UTC

Does the scheduler know the correct size estimate for these WUs?

A WU on my GTX 1060 6GB should take ~34 hours (~4 calendar days), yet it keeps sending me 2. My queue is set to 0.4 days of work, and I believe it should know that my computer is only 30% active, so I would expect it to send only 1 WU.

ServicEnginIC
Message 56545 - Posted: 16 Feb 2021 | 8:13:58 UTC - in response to Message 56536.

But I do wonder what's going on with the massive shift of tasks from out in the field to waiting to be sent. Are they erroring en masse somehow? Today is about 5 days since these new tasks started showing up, so perhaps that's why: thousands of tasks hitting their deadlines from systems not fast enough to process them, or systems that are fast enough if they run 24/7 but aren't processing 24/7, or systems that downloaded work but were shut off for the past 5 days. Or some combination of all 3.

In fact, allowing a certain time offset before overdue tasks are resent would effectively extend the deadline by that offset, letting slower GPUs report them.
One example: WU #27023500 was reported by a GTX 750 Ti on one of my hosts after 446,845.54 seconds, more than 4 hours past the deadline.
The task was awarded 348,750.00 credits, and it hasn't been resent to any other host; the resend is marked with the "Didn't need" legend.
Maybe the project managers are quietly granting, in this way, the request many GPUGrid users have made in this regard(?)

Ryan Munro
Message 56546 - Posted: 16 Feb 2021 | 10:29:49 UTC

Do these work with Ampere?

Retvari Zoltan
Message 56547 - Posted: 16 Feb 2021 | 10:45:35 UTC - in response to Message 56546.

Do these work with Ampere?
No.

Ryan Munro
Message 56548 - Posted: 16 Feb 2021 | 10:48:49 UTC

Ah damn, I'll keep waiting then :)

Philip C Swift [Gridcoin]
Message 56549 - Posted: 16 Feb 2021 | 11:04:37 UTC - in response to Message 56504.

Re: New WUs and tuning GPUs
Interested in overclocking to reduce WU duration and hit the target due dates/times?
Even if your GPU is locked down, you can improve it by using a curve with the frequency and voltage of your GPU auto-managed.
Message me if you have questions or need help.

I'm crunching the new WUs on an RTX 2080 mobile with MSI Afterburner and MSI Kombustor linked (which auto-overclocks the GPU with a good curve).
I am getting 2 days and 2 hours remaining for e20s2_e11s14p0f75-ADRIA_D3RBandit_batch1-0-1-RND2090. 1.08% per hour = 92.59 hours, or 3.58 days total.


0.998 CPUs + 1 NVIDIA GPU
Estimated computation size: 5,000,000 GFLOPs
CPU time: 00:49:23
CPU time since checkpoint: 00:08:14
Elapsed time: 00:49:40
Estimated time remaining: 2d 01:07:38
Fraction done: 1.333%
Virtual memory size: 668.21 MB
Working set size: 343.72 MB
Progress rate: 1.080% per hour
Executable: wrapper_6.1_windows_x86_64.exe


Philip C Swift [Gridcoin]
Message 56551 - Posted: 16 Feb 2021 | 11:09:24 UTC - in response to Message 56504.

Dears,

as you may have noticed, we sent a new batch of WUs for a new experiment. This time the WUs are rather large and require relatively new cards. For reference, should be ~18h on a 1080 Ti.

Thanks!

T


If you get despondent or cannot meet deadlines but still want GPU points, check out Moo Wrapper (28 minutes per WU on an RTX 2080 mobile).

Retvari Zoltan
Message 56553 - Posted: 16 Feb 2021 | 12:18:53 UTC - in response to Message 56551.

Re: New WUs and tuning GPUs
Interested in overclocking to reduce WU duration and hit the target due dates/times?
Even if your GPU is locked down, you can improve it by using a curve with the frequency and voltage of your GPU auto-managed.
If your GPU is crunching a single workunit for days, it's not worth the risk of a computing error caused by overclocking.
You'll receive 0 credits for a failed workunit (after many hours, even days, of crunching it's very frustrating).
Therefore I do not recommend overclocking, and especially not overvolting, a GPU; especially not a mobile GPU.
GPUGrid workunits are very power hungry compared to games or other projects (except for FAH).
The cooling of an average GPU is made for general use, not for crunching 24/7.
Laptops with mobile GPUs can't have coolers as big as discrete GPUs have in desktop PCs.
If you have a GPU with decent cooling, then it's usually overclocked by the factory already. In this case you don't have to overclock it more.

Power dissipation is the product of two key factors:
· It is directly proportional to the GPU frequency.
· It is directly proportional to the square of the GPU voltage.
Say you raise the frequency and the voltage by 10% each (a bit of an exaggeration, as you can't raise the GPU voltage by 10%).
In this case the power dissipation of your GPU rises by 33.1% (1.1x from the frequency, and 1.1*1.1=1.21x from the voltage: 1.1*1.21=1.331).
Luckily you can't raise your GPU's power consumption above the limits set by the factory.
You can check these limits from an administrative command prompt with
nvidia-smi -q -d POWER
Raising the GPU's power dissipation raises its temperature if its cooling stays the same, whereas the cooling should actually be improved to keep the same temperatures (and life expectancy). Usually you can improve the cooling of your GPU only by raising the RPM of its fans, which can be very annoying (especially on a laptop), and it also reduces the lifespan of the fans.
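
A one-liner to play with the relation above (power scales with frequency times voltage squared; the 10% offsets are the example values, substitute your own):

awk 'BEGIN { f = 1.10; v = 1.10; printf "relative power: %.3f (+%.1f%%)\n", f*v*v, (f*v*v - 1)*100 }'
# prints: relative power: 1.331 (+33.1%)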

Ian&Steve C.
Message 56554 - Posted: 16 Feb 2021 | 13:21:02 UTC - in response to Message 56540.

I'm not seeing many resends, mostly _0 and _1 original tasks.

No issues getting work or returning valid results.


There's no quorum required here, so _1 is a resend.

YamFan
Message 56557 - Posted: 16 Feb 2021 | 14:24:00 UTC

WU: New version of ACEMD 2.11 (cuda101)
Name: e16s23_e1s182p0f136-ADRIA_D3RBandit_batch0-0-1-RND8763

Currently at 4 days 6 hours, with approx. 6 hours left (just in time for the deadline, I hope). GPU: 750 Ti, running continuously, although I have been using the computer for other things too at times.

So, yes, the current work units are long. Much longer than "normal".
This post is just for reference.

YamFan
Message 56558 - Posted: 16 Feb 2021 | 14:26:02 UTC - in response to Message 56553.

I agree. Overclocking a GPU makes WU failure much more likely.

tullio
Message 56560 - Posted: 16 Feb 2021 | 15:15:04 UTC
Last modified: 16 Feb 2021 | 15:15:35 UTC

Task 27021821 was canceled by the server. Why? It was waiting to run.
Tullio

Ian&Steve C.
Message 56561 - Posted: 16 Feb 2021 | 15:18:30 UTC - in response to Message 56560.

Task 27021821 was canceled by the server. Why? It was waiting to run.
Tullio


http://www.gpugrid.net/workunit.php?wuid=27021821

Because it was no longer needed: the original host returned the task after the deadline, but before you had processed it. Since a quorum is not required, they only need one result; letting you process something which already has a result would just be a waste of time.

Bill
Message 56562 - Posted: 16 Feb 2021 | 15:31:46 UTC - in response to Message 56535.
Last modified: 16 Feb 2021 | 15:32:12 UTC

I have a 1660 Super that takes about 34-38 hours on these new units; still seeing temps in the 60s, though Task Manager shows only 15% GPU usage.

Interesting. My 1660 Ti is completing these tasks in about 25 hours. I would not have thought there would be that much of a gap in crunching time.

You have a 2600 as well; I'm running a 2200G, so my theory of a slower CPU doesn't seem to hold here.

EDIT: Oh hey, this is my first post here. Hi everyone!

Ian&Steve C.
Message 56564 - Posted: 16 Feb 2021 | 16:03:37 UTC - in response to Message 56562.

The 1660 Ti is still faster than a 1660 Super. You're both on Windows, so your times should be a little more comparable to my 1660 Super Linux times (27 hrs).

His first task actually completed in about 29 hrs, which seems right. The second task took 40 hrs, but I can only speculate why; maybe he was doing other things on the system that slowed down processing.

Aurum
Message 56566 - Posted: 16 Feb 2021 | 16:11:20 UTC

Is it overclocking if you're underpowering???
This script runs 2080 Ti WUs in about 15 hours at 36% less power. First run this command:

sudo nvidia-xconfig --enable-all-gpus --cool-bits=28 --allow-empty-initial-configuration

Then execute this script:

#!/bin/bash
/usr/bin/nvidia-smi -pm 1             # enable persistence mode
/usr/bin/nvidia-smi -acp UNRESTRICTED # allow applications-clocks changes without root
/usr/bin/nvidia-smi -i 0 -pl 160      # cap GPU 0 at a 160 W power limit
/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1" # 0=Adaptive, 1=Prefer Maximum Performance, 2=Auto
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=400" -a "[gpu:0]/GPUGraphicsClockOffset[3]=100" # +400 memory transfer rate / +100 graphics clock offsets

Ian&Steve C.
Message 56568 - Posted: 16 Feb 2021 | 16:26:54 UTC - in response to Message 56566.

160W seems too low for a 2080ti, IMO; you're really restricting performance at that point.

For reference:

my RTX 2070s run in about 17 hr @ 150 W (not far from the performance of your 2080ti, but a 2070 is much less expensive);

my RTX 2080tis run in about 10 hr @ 225 W: 70% faster for 50% more power.



Pop Piasa
Message 56569 - Posted: 16 Feb 2021 | 16:37:57 UTC - in response to Message 56545.

One example: WU #27023500 was reported by a GTX 750 Ti on one of my hosts after 446,845.54 seconds, more than 4 hours past the deadline.
The task was awarded 348,750.00 credits, and it hasn't been resent to any other host; the resend is marked with the "Didn't need" legend.
Maybe the project managers are quietly granting, in this way, the request many GPUGrid users have made in this regard(?)


This is an interesting opportunity to study how the server handles tardy but completed WUs. My GTX 750ti was also allowed to run past the deadline and received equal credit to yours. https://www.gpugrid.net/workunit.php?wuid=27023657

Could it be that the server can detect a WU being actively crunched and hold off cancellation? Or is this a normal delay function we're seeing?

It had already created another task but had not sent it yet, and marked it "Didn't need" when my host reported.
Run time was 408,214.47 sec; CPU time was 406,250.30 sec.
GPU clock was 1366 MHz, memory clock 2833 MHz.
With my homespun fan-intake mod it ran at 55C with a 22C ambient room temp.

Just can't kill it.

Aurum
Message 56570 - Posted: 16 Feb 2021 | 16:39:16 UTC - in response to Message 56568.

160W seems too low for a 2080ti
My circuit breakers are perfectly optimized, not to mention heat management. I'm going to have to sit this race out.

Ian&Steve C.
Message 56571 - Posted: 16 Feb 2021 | 16:51:35 UTC - in response to Message 56569.

One example: WU #27023500 was reported by a GTX 750 Ti on one of my hosts after 446,845.54 seconds, more than 4 hours past the deadline.
The task was awarded 348,750.00 credits, and it hasn't been resent to any other host; the resend is marked with the "Didn't need" legend.
Maybe the project managers are quietly granting, in this way, the request many GPUGrid users have made in this regard(?)


This is an interesting opportunity to study how the server handles tardy but completed WUs. My GTX 750ti was also allowed to run past the deadline and received equal credit to yours. https://www.gpugrid.net/workunit.php?wuid=27023657

Could it be that the server can detect a WU being actively crunched and hold off cancellation? Or is this a normal delay function we're seeing?

Just can't kill it.


I think it's probably just first come, first served: whoever returns it first "wins" and the other WU gets the axe. Another user reported that his task was cancelled off his host (but he hadn't started crunching it yet). Unsure what would happen to a task axed while crunching had already started.


Ian&Steve C.
Message 56572 - Posted: 16 Feb 2021 | 16:54:16 UTC - in response to Message 56570.
Last modified: 16 Feb 2021 | 17:01:15 UTC

160W seems too low for a 2080ti
My circuit breakers are perfectly optimized, not to mention heat management. I'm going to have to sit this race out.


I understand. I actually have my 6x 2080ti host split between 2 breakers to avoid overloading a single 20A one (there are other systems on one of the circuits). I'm just saying that if you're going to restrict it that far, you might be better off with a cheaper card to begin with and save some money.

Pop Piasa
Message 56575 - Posted: 16 Feb 2021 | 17:26:46 UTC - in response to Message 56534.
Last modified: 16 Feb 2021 | 17:38:52 UTC

Thanks Zoltan, I gave up and switched that GPU to FAH for now.
It seems to be doing more FLOPS/hr running the FAHcore CUDA app than ACEMD, but that may just be a difference in scoring procedures.


I have received another WU for the GTX 750ti; the send glitch has been remediated.

I'm the 5th user to receive this WU iteration.
https://www.gpugrid.net/workunit.php?wuid=27024256

It is a batch_1 (the 2nd batch), and the 0-1 refers to it being number 0 of 1, hence it's the only iteration of this WU that will exist.

[edit]
It just dawned on me that these WUs might be an experiment in having the same host perform all the generations of the model simulation consecutively, instead of spreading them across different hosts. Or am I way off?

Keith Myers
Message 56576 - Posted: 16 Feb 2021 | 17:31:28 UTC - in response to Message 56554.

I'm not seeing many resends, mostly _0 and _1 original tasks.

No issues getting work or returning valid results.


There's no quorum required here, so _1 is a resend.

Uhh, duh... forgot where I'm at.

ServicEnginIC
Message 56579 - Posted: 16 Feb 2021 | 17:57:03 UTC - in response to Message 56571.

Another user reported that his task was cancelled off his host (but he hadn't started crunching it yet). Unsure what would happen to a task axed while crunching had already started.

When a resent task has started processing and the overdue host then reports the original, the resent task will be left to run to the end, and three scenarios can occur:
- 1) The deadline for the resent task is reached on the second host. Then, even if it is completed afterwards, it won't receive any credit, because there is already a previous valid result; the server labels it "Completed, too late to validate".
- 2) What I call a "credit paradox" happens when the resent task finishes in time for the full or mid bonus on the second host: it will receive the standard credit amount anyway, without any bonus, to match the credit already granted to the overdue task.
- 3) When the resent task finishes past 48 hours but before its deadline, it also receives the standard (no-bonus) credit amount.

Aurum
Message 56581 - Posted: 16 Feb 2021 | 18:37:28 UTC - in response to Message 56572.

better off with a cheaper card to begin with and save some money.

I guess you haven't been shopping for a GPU lately; prices are really high. It's the perfect time to sell cards, not buy them.

The best thing I did was switch my computers over to 240 V 20 A circuits. The most frequent problem I used to have was the A phase getting out of balance with the B phase and tripping one leg of the main breaker. With 240 V both phases are exactly balanced, and PSUs run 6-8% more efficiently.

Now if I could just find a diagnostic tool to tell me when a PSU is on its last legs. E.g., this Inline PSU Tester for $590 is pricey but looks like the most comprehensive I've found: https://www.passmark.com/products/inline-psu-tester/index.php
If anyone knows of other brands, please share.

Ian&Steve C.
Message 56583 - Posted: 16 Feb 2021 | 19:14:55 UTC - in response to Message 56581.
Last modified: 16 Feb 2021 | 19:16:11 UTC

better off with a cheaper card to begin with and save some money.

I guess you haven't been shopping for a GPU lately; prices are really high. It's the perfect time to sell cards, not buy them.

The best thing I did was switch my computers over to 240 V 20 A circuits. The most frequent problem I used to have was the A phase getting out of balance with the B phase and tripping one leg of the main breaker. With 240 V both phases are exactly balanced, and PSUs run 6-8% more efficiently.

Now if I could just find a diagnostic tool to tell me when a PSU is on its last legs. E.g., this Inline PSU Tester for $590 is pricey but looks like the most comprehensive I've found: https://www.passmark.com/products/inline-psu-tester/index.php
If anyone knows of other brands, please share.


I'm aware of the situation. But if you sell a 2080ti and re-buy a 2070, you're still left with more money, no? Even at higher prices, everything just shifts up with the market. You're restricting the 2080ti so much that it performs like a 2070S, so why have the 2080ti? That was my only point.

I agree about 240V, and I normally run my systems at a remote location on 240V, but due to renovations I have one system temporarily at my house. But if you're on 240V, why restrict it so much?

I use the voltage telemetry (via IPMI) to identify when a PSU might be failing.

bozz4science
Message 56587 - Posted: 16 Feb 2021 | 23:07:56 UTC
Last modified: 16 Feb 2021 | 23:10:51 UTC

Are there prolonged CPU-heavy periods where GPU utilization drops to nearly zero? Otherwise I probably have an issue with my card. Just ~2 hrs into a new WU, it stalled: BOINC Manager reports steadily increasing processor time since the last checkpoint, but GPU utilization has been at 0% for nearly 30 min. Is that normal?

Weird: I just suspended/resumed it, it jumped back to the latest checkpoint, and immediately the GPU utilization spiked back to normal levels. I'll see if the same issue comes up again between now and the next checkpoint.

Ian&Steve C.
Message 56589 - Posted: 17 Feb 2021 | 0:17:31 UTC - in response to Message 56587.

Are there prolonged CPU-heavy periods where GPU utilization drops to nearly zero? Otherwise I probably have an issue with my card. Just ~2 hrs into a new WU, it stalled: BOINC Manager reports steadily increasing processor time since the last checkpoint, but GPU utilization has been at 0% for nearly 30 min. Is that normal?

Weird: I just suspended/resumed it, it jumped back to the latest checkpoint, and immediately the GPU utilization spiked back to normal levels. I'll see if the same issue comes up again between now and the next checkpoint.

Not normal in my experience; mine stay pegged at 98% GPU utilization for the entire run.

Aurum
Message 56591 - Posted: 17 Feb 2021 | 0:33:18 UTC - in response to Message 56587.

Are there prolonged CPU-heavy periods where GPU utilization drops to nearly zero? Otherwise I probably have an issue with my card. Just ~2 hrs into a new WU, it stalled: BOINC Manager reports steadily increasing processor time since the last checkpoint, but GPU utilization has been at 0% for nearly 30 min. Is that normal?

Weird: I just suspended/resumed it, it jumped back to the latest checkpoint, and immediately the GPU utilization spiked back to normal levels. I'll see if the same issue comes up again between now and the next checkpoint.

No, that's not normal; I haven't seen that behavior myself. Are you using an app_config.xml, say to run 2 WUs on the same GPU or to max out the CPUs?
This is mine:

<app_config>
   <app>
      <name>acemd3</name>
      <gpu_versions>
         <cpu_usage>1.00</cpu_usage>
         <gpu_usage>1.00</gpu_usage>
      </gpu_versions>
   </app>
   <project_max_concurrent>4</project_max_concurrent>
</app_config>

It might be something else you're running.
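
For reference, app_config.xml changes take effect once BOINC re-reads its config files, e.g. via Options > Read config files in BOINC Manager, or from the command line with the stock boinccmd tool:

boinccmd --read_cc_config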

Aurum
Message 56593 - Posted: 17 Feb 2021 | 1:11:59 UTC - in response to Message 56583.

I'm aware of the situation. But if you sell a 2080ti and re-buy a 2070, you're still left with more money, no? Even at higher prices, everything just shifts up with the market. You're restricting the 2080ti so much that it performs like a 2070S, so why have the 2080ti? That was my only point.
I'm selling off my second string (1070s & 1080s) but keeping my 2080s. Waiting for RTC 3080s with the new design rules, but that'll probably be April.
I agree about 240V, and I normally run my systems at a remote location on 240V, but due to renovations I have one system temporarily at my house. But if you're on 240V, why restrict it so much?
This is funny, since you gave me the script in one of the GG threads, and I've been grateful ever since it cut my electric bill by a third. My 100 A load center is maxed out; no wiggle room left.
I use the voltage telemetry (via IPMI) to identify when a PSU might be failing.
Sounds good, but Dr. Google threw so much stuff at me; voltage telemetry can be so many different things. The IPMI article seemed to indicate that AMT might be better for me: https://en.wikipedia.org/wiki/Intel_Active_Management_Technology
Most of my PSUs have been running for about 7 years now and are starting to die of old age. Hard failures are nice, since they're easy to diagnose; it's the flaky ones with intermittent problems that hurt. I swap parts between a good computer and a bad actor, and sometimes I get lucky and can convince myself that a PSU is over the hill. I have one now that, after a couple of hours, randomly stops communicating while remaining powered up. I need to put a head on it and play swapsies.
Does the voltage telemetry you observe give you an unambiguous indication of the demise of a PSU???

Keith Myers
Message 56594 - Posted: 17 Feb 2021 | 2:29:47 UTC

IPMI stands for Intelligent Platform Management Interface.
https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface
It is always found on server motherboards with a BMC (Baseboard Management Controller).
https://searchnetworking.techtarget.com/definition/baseboard-management-controller

You can look at the voltages coming out of the power supply while the system is under load and spot a power supply that is flaking out or on the edge of stability.

You can also set warnings on voltage levels and such. Very handy, and all done remotely.

Ian&Steve C.
Message 56595 - Posted: 17 Feb 2021 | 4:55:54 UTC - in response to Message 56593.

As Keith said, IPMI is the remote management interface built into the ASRock Rack and Supermicro motherboards that I use. They have voltage monitoring built in for the most part: the BMC measures the voltages it sees at the 24-pin connector and reports them over the IPMI interface. I access the telemetry via the dedicated web GUI served at a configurable IP address on the LAN.

Tell-tale signs of failure are usually sagging voltages, and not always where you expect. I have a PSU that I think is on the way out. It's a 1200 W PSU, only loaded to maybe 600 W now, but it's 10 years old and was previously run pretty hard at its limit, in hot temps, for a long time. I've noticed random system restarts and lots of warnings in the IPMI about the 3.3 V rail dropping below the 3.04 V threshold. I'll need to replace it soon.
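
For anyone without the vendor's web GUI, a minimal sketch of reading the same rail voltages from a BMC with the stock ipmitool CLI (the IP address and credentials below are placeholders for your own BMC):

# list all voltage sensors, with readings and thresholds, from a remote BMC
ipmitool -I lanplus -H 192.168.1.100 -U admin -P secret sdr type Voltage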

Ian&Steve C.
Message 56596 - Posted: 17 Feb 2021 | 5:00:07 UTC - in response to Message 56593.

“RTC 3080s with new design rules” ? Huh?

It's unfortunate that even existing 30-series cards still don't work for GPUGRID; the app is holding them back. Right now the only options for 30-series are unoptimized OpenCL projects, FAH, or PrimeGrid (if you think finding big prime numbers is useful?).

Jeffwy
Message 56599 - Posted: 17 Feb 2021 | 9:49:39 UTC

Seems I did not get one yet, and I have a GeForce RTX 2060.

Retvari Zoltan
Message 56600 - Posted: 17 Feb 2021 | 12:14:12 UTC - in response to Message 56599.
Last modified: 17 Feb 2021 | 12:14:32 UTC

Seems I did not get one yet, and I have a GeForce RTX 2060.
According to the status page of your host, you have two, and you had 7 before.

Jeffwy
Message 56602 - Posted: 17 Feb 2021 | 12:39:02 UTC - in response to Message 56600.

OK, maybe I was looking for a different name than what is there; so what are the names of the two new ones, then?

tullio
Message 56603 - Posted: 17 Feb 2021 | 13:16:19 UTC
Last modified: 17 Feb 2021 | 14:11:02 UTC

GTX 1060: completed in 157,027.38 s
GTX 1650: completed in 175,814.54 s
Both on Windows 10 computers, the first with a Ryzen 5 1400 CPU, the second with an Intel i5 9400F.
Tullio
Sorry, I had mixed up the computers. The GTX 1060 has 3 GB of video RAM and was excluded from running Einstein@home gravitational-wave tasks, which needed more than 3 GB; the GTX 1650 has 4 GB of video RAM.

Retvari Zoltan
Message 56605 - Posted: 17 Feb 2021 | 15:14:24 UTC - in response to Message 56602.

Seems I did not get one yet, and I have a GeForce RTX 2060.
According to the status page of your host, you have two, and you had 7 before.
OK, maybe I was looking for a different name than what is there; so what are the names of the two new ones, then?
Click on the link in my reply and you'll see.

Aurum
Message 56606 - Posted: 17 Feb 2021 | 17:49:12 UTC - in response to Message 56596.

“RTC 3080s with new design rules” ? Huh?

They're switching some or all production from Samsung's 8 nm design rules to TSMC's 7 nm design rules. Also, an RTX 3080 Ti with more memory is expected.
https://hexus.net/tech/news/graphics/146170-nvidia-shift-rtx-30-gpus-tsmc-7nm-2021-says-report/

Ian&Steve C.
Message 56607 - Posted: 17 Feb 2021 | 17:56:08 UTC - in response to Message 56606.
Last modified: 17 Feb 2021 | 17:58:43 UTC

“RTC 3080s with new design rules” ? Huh?

They're switching some or all production from Samsung's 8 nm design rules to TSMC's 7 nm design rules. Also, an RTX 3080 Ti with more memory is expected.
https://hexus.net/tech/news/graphics/146170-nvidia-shift-rtx-30-gpus-tsmc-7nm-2021-says-report/


Oh, it was a typo on the RTX/RTC.

They've had those rumors since pretty much launch (note the date on that article, lol). Personally I wouldn't hold my breath; if they release anything, expect more paper launches with a few thousand cards available day one, then basically nothing for months again.

But it's all moot for GPUGRID anyway until they get around to making the app compatible with Ampere. There are 5 different Ampere models that I've seen attempted here (A100, 3090, 3080, 3070, 3060ti), and the 3060 is set for launch in late February. More models are meaningless for us if we can't use them :(

Aurum
Message 56608 - Posted: 17 Feb 2021 | 22:35:54 UTC - in response to Message 56607.

it's all moot for GPUGRID anyway until they get around to making the app compatible with Ampere.

I bet they don't have an Ampere GPU to do development and testing on.

Ian&Steve C.
Message 56609 - Posted: 18 Feb 2021 | 2:28:06 UTC - in response to Message 56608.

The thing is, they don't even need to. They can just download the new CUDA toolkit, edit a few arguments in the config file or makefile, and re-compile the app as-is. It's not a whole lot of work, and it'll work; they basically just need to unlock the new architecture. The app right now doesn't even try to run: it fails at the architecture check.

robertmiles
Message 56611 - Posted: 18 Feb 2021 | 5:22:10 UTC - in response to Message 56539.

I have a dual-GPU system, GTX 980 and GTX 1080 Ti, and all of these work units have failed. Drivers are current as of December; I had to roll back the January update as it didn't play nice with Milkyway@Home while you folks were on holiday. Suddenly I can't complete a work unit without error.

Running for how many hours a day? The GTX 1080 Ti should be adequate if you run it 24 hours a day, but I'm not sure the GTX 980 will be.

Jeffwy
Message 56612 - Posted: 18 Feb 2021 | 10:42:44 UTC - in response to Message 56605.

I'm talking about the new test units that were being talked about by the admin, not the WUs from ACEMD, and if they have similar names, then I wouldn't know. But they are most certainly not taking 18 hours to crunch; they are taking the same amount of time as the WUs I have been getting for six months, so I think they are the same, not the new ones listed at the top of this thread.

Retvari Zoltan
Message 56613 - Posted: 18 Feb 2021 | 12:53:58 UTC - in response to Message 56612.
Last modified: 18 Feb 2021 | 12:56:45 UTC

I'm talking about the new test units that were being talked about by the admin,
These workunits are the same as the "test" batch.
not the WUs from ACEMD,
ACEMD is the app that processes all of the GPUGrid workunits, regardless of their size.
and if they have similar names, then I wouldn't know.
They are named like: e22s16_e2s343p0f9-ADRIA_D3RBandit_batch2-0-1-RND4443_1
But they are most certainly not taking 18 hours to crunch,
They take about 72,800~74,000 seconds on your host with an RTX 2060. That is 20h 13m ~ 20h 33m.
they are taking the same amount of time as the WUs I have been getting for six months, so I think they are the same, not the new ones listed at the top of this thread.
They are definitely not the same, as those took less than 2 hours on a similar GPU.

Ian&Steve C.
Message 56614 - Posted: 18 Feb 2021 | 13:31:47 UTC - in response to Message 56539.

I have a dual-GPU system, GTX 980 and GTX 1080 Ti, and all of these work units have failed. Drivers are current as of December; I had to roll back the January update as it didn't play nice with Milkyway@Home while you folks were on holiday. Suddenly I can't complete a work unit without error.


They have not all failed; you actually have a few that were submitted fine.

see your tasks for that system here: http://www.gpugrid.net/results.php?hostid=514949

of your errors:
00:31:16 (11412): wrapper: running acemd3.exe (--boinc input --device 0)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!

00:28:21 (11224): wrapper: running acemd3.exe (--boinc input --device 0)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!

00:28:21 (3488): wrapper: running acemd3.exe (--boinc input --device 1)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!

23:35:51 (3068): wrapper: running acemd3.exe (--boinc input --device 0)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!

23:39:12 (1400): wrapper: running acemd3.exe (--boinc input --device 1)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!

23:39:12 (4088): wrapper: running acemd3.exe (--boinc input --device 0)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!



So it's clear what's causing your issue: you're either routinely starting and stopping BOINC computation, or you have task switching with other projects going on due to the long run of these tasks, which sometimes results in the process restarting on a different card. It's fairly well known that this will result in an error for GPUGRID tasks. You should increase the task-switch interval to longer than the run time of these tasks (24 hrs?) and avoid interrupting the computation. Do not turn off the computer or let it go to sleep. I would probably even avoid the "suspend GPU while computer is in use" option in BOINC; anything to avoid interrupting these very long tasks.
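
One way to set that interval outside the GUI is a global_prefs_override.xml in the BOINC data directory (a minimal sketch; 1440 minutes is the 24-hour suggestion above, and BOINC must re-read the file afterwards, e.g. with boinccmd --read_global_prefs_override):

<global_preferences>
   <cpu_scheduling_period_minutes>1440</cpu_scheduling_period_minutes>
</global_preferences>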


Aurum
Message 56615 - Posted: 18 Feb 2021 | 16:38:45 UTC - in response to Message 56614.

You should increase the task-switch interval to longer than the run time of these tasks (24 hrs?) and avoid interrupting the computation.

Try setting the resource share to zero on the other project you wish to time-slice with; then it should only send you its WUs when you have no GG WUs left.

Ian&Steve C.
Message 56616 - Posted: 18 Feb 2021 | 16:46:39 UTC - in response to Message 56615.

True, I do this.

But some people like to concurrently crunch multiple projects, giving some love to them all, not just a primary/backup split.

Retvari Zoltan
Message 56617 - Posted: 18 Feb 2021 | 17:53:39 UTC - in response to Message 56535.

I have a 1660 super that takes about 34-38 hours on these new units, still seeing temps in the 60s, only uses 15% of gpu in task manager tho
Click on the GPU graph on the left pane, then on the right pane select "Cuda" instead of "3D" on the top left subgraph.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56620 - Posted: 18 Feb 2021 | 19:34:04 UTC - in response to Message 56575.
Last modified: 18 Feb 2021 | 19:38:51 UTC

It just dawned on me that these WUs might be an experiment in having the same host perform all the generations of the model simulation consecutively instead of on different hosts. Or am I way off?
You are right: this batch consists of threads having only one single, but very long, generation, making a single workunit the whole thread of the batch.
It's not an experiment; this is not the first time such a batch has been issued in GPUGrid's history. This way the progress of a batch is much faster, as the project doesn't have to wait up to many*5 days for a thread to finish. Since the workunits get assigned to hosts randomly, some generations of a multi-generation simulation are bound to be assigned to slow or unreliable hosts, and this adds significant latency to completing a single thread of a simulation.
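To put rough numbers on that latency argument, here is a back-of-the-envelope Python sketch; the generation count and the typical per-generation turnaround are illustrative assumptions, not project figures:

# All figures are illustrative assumptions, except the 5-day deadline
# stated in this thread.
DEADLINE_DAYS = 5     # per-workunit deadline
GENERATIONS = 10      # hypothetical generations per thread

# Chained batch: each generation waits for whichever host it lands on,
# and a slow or unreliable host can burn the whole deadline (or more,
# once resends are counted).
worst_case_chained = GENERATIONS * DEADLINE_DAYS   # 50 days
typical_chained = GENERATIONS * 1.5                # assume ~1.5 days/gen

# Single long workunit: one host, one deadline window.
worst_case_single = DEADLINE_DAYS                  # 5 days

print(f"chained, worst case : {worst_case_chained} days")
print(f"chained, typical    : {typical_chained} days")
print(f"single long WU, max : {worst_case_single} days")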

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,024,100,382
RAC: 766,238
Level
Trp
Scientific publications
watwatwat
Message 56621 - Posted: 18 Feb 2021 | 20:14:59 UTC

Sorry, I took the last cookie... ;-)

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56622 - Posted: 18 Feb 2021 | 20:20:15 UTC - in response to Message 56621.
Last modified: 18 Feb 2021 | 20:23:28 UTC

Sorry, I took the last cookie... ;-)

don't worry, in 4-5 days there will be plenty more resends available from all the hit-n-runs and systems that are too slow.
____________

RockLr
Send message
Joined: 14 Mar 20
Posts: 7
Credit: 11,208,845
RAC: 4
Level
Pro
Scientific publications
wat
Message 56626 - Posted: 19 Feb 2021 | 4:15:42 UTC

These tasks are too big for my 1050ti
:(
30 times bigger than the previous MDAD tasks!
I think I have to go to Folding@home until GG has smaller WUs.
F@h has small WUs related to COVID-19: about 40 min on my 1050 Ti.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56628 - Posted: 19 Feb 2021 | 12:42:28 UTC - in response to Message 56622.

in 4-5 days there will be plenty more resends available from all the hit-n-runs and systems that are too slow.

but this cannot be the purpose of the exercise, or can it?

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56634 - Posted: 19 Feb 2021 | 19:12:10 UTC
Last modified: 19 Feb 2021 | 19:22:54 UTC

looks like some new tasks are being loaded up. I've received several _0s

they're labelled batch0, so maybe these are tasks that had too many errors and needed to be resent manually as new tasks?
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56635 - Posted: 19 Feb 2021 | 20:02:03 UTC

I've got some of these new batch0 tasks too.

kotenok2000
Send message
Joined: 18 Jul 13
Posts: 78
Credit: 12,875,793
RAC: 0
Level
Pro
Scientific publications
wat
Message 56636 - Posted: 20 Feb 2021 | 10:48:26 UTC
Last modified: 20 Feb 2021 | 10:49:29 UTC

I have CUDA: NVIDIA GPU 0: GeForce GTX 1650 (driver version 461.40, CUDA version 11.2, compute capability 7.5, 4096MB, 3327MB available, 2849 GFLOPS peak)


and https://www.gpugrid.net/result.php?resultid=32538103 estimates two days to finish

kotenok2000
Send message
Joined: 18 Jul 13
Posts: 78
Credit: 12,875,793
RAC: 0
Level
Pro
Scientific publications
wat
Message 56637 - Posted: 20 Feb 2021 | 10:54:38 UTC - in response to Message 56526.

Hours?

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,196,461,293
RAC: 1,617,116
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56638 - Posted: 20 Feb 2021 | 13:55:09 UTC - in response to Message 56637.

Hours?

about 172,897 seconds (~48 hours)
http://www.gpugrid.net/results.php?hostid=526190

erikolsson
Send message
Joined: 20 Feb 21
Posts: 1
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 56639 - Posted: 20 Feb 2021 | 14:14:35 UTC - in response to Message 56504.

RTX3090 just repeats Computation error.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56640 - Posted: 20 Feb 2021 | 14:59:25 UTC - in response to Message 56639.
Last modified: 20 Feb 2021 | 15:00:06 UTC

RTX3090 just repeats Computation error.
As yet, the RTX 3xxx series is not supported by GPUGrid.
If you own such a card, please set "No new tasks" for GPUGrid until further notice.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56641 - Posted: 20 Feb 2021 | 15:31:20 UTC - in response to Message 56636.
Last modified: 20 Feb 2021 | 15:31:48 UTC

I have CUDA: NVIDIA GPU 0: GeForce GTX 1650 (driver version 461.40, CUDA version 11.2, compute capability 7.5, 4096MB, 3327MB available, 2849 GFLOPS peak)


and https://www.gpugrid.net/result.php?resultid=32538103 estimates two days to finish


This is normal for that GPU. These tasks are very long and that GPU is pretty weak.
____________

kksplace
Send message
Joined: 4 Mar 18
Posts: 53
Credit: 1,401,151,749
RAC: 3,563,300
Level
Met
Scientific publications
wat
Message 56642 - Posted: 20 Feb 2021 | 16:10:23 UTC
Last modified: 20 Feb 2021 | 16:12:04 UTC

Seeking some help if possible. I have had only one of the new work units download to my computer, on 15 Feb. Unfortunately, it was interrupted by a power loss in my area, and when power came back on I discovered the Nvidia driver had somehow been corrupted. After the fix, the WU ended up with a compute error. The problem is that since then I have not received any more WUs, either automatically or during multiple Updates in BOINC. Can someone look at my computer and see what I may have set wrong (driver etc.) that may be preventing me from getting work units? I have checked what my limited knowledge allows. Thank you for any help.

EDIT: Computer with the Nvidia 1080.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56643 - Posted: 20 Feb 2021 | 16:20:24 UTC - in response to Message 56642.

It appears that the drivers are not installed correctly. No driver is reported by BOINC.

Having the driver version reported via BOINC is important here at GPUGRID because the project needs to know that you are using a compatible driver before it sends you work units.

So I would totally remove your drivers and completely reinstall them fresh.
____________

kksplace
Send message
Joined: 4 Mar 18
Posts: 53
Credit: 1,401,151,749
RAC: 3,563,300
Level
Met
Scientific publications
wat
Message 56644 - Posted: 20 Feb 2021 | 21:53:54 UTC

Thank you! It worked. Odd, since it was working on Einstein. Just for my ongoing learning, where did you see that no driver was being reported to GPUGRID? Again, thank you for your help.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56645 - Posted: 20 Feb 2021 | 22:31:06 UTC - in response to Message 56644.
Last modified: 20 Feb 2021 | 22:33:20 UTC

If you look at your host details here: http://www.gpugrid.net/hosts_user.php?userid=524258

You can see in the section showing your GPU, below it lists the driver in use. When I looked at it before, no driver was listed. It was just blank.

Einstein’s apps are OpenCL and really only need the OpenCL portion of the drivers, which you must have had installed; just some problem with the proprietary drivers, I guess. You did say you had some driver corruption, so it’s hard to say exactly what happened.
____________

Short Final
Send message
Joined: 26 May 20
Posts: 4
Credit: 135,351,932
RAC: 191,331
Level
Cys
Scientific publications
wat
Message 56646 - Posted: 21 Feb 2021 | 1:07:19 UTC

My ole Nvidia 1080 is turning the new WUs around in about 25 hours.

On another host my brand-new Gigabyte RTX 3070 is patiently waiting to do some real work on GPUGrid. I daresay it would be crunching these WUs a lot quicker and more efficiently.

So I'm stuck with F@H for the new sports car. Hopefully a short-term problem.

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 56647 - Posted: 21 Feb 2021 | 1:44:47 UTC - in response to Message 56646.
Last modified: 21 Feb 2021 | 2:26:52 UTC

So I'm stuck with F@H for the new sports car.


You might not consider yourself as having been 'stuck' if your host tests the candidate that turns out to be the magic bullet for curing COVID-19.

You can also brag that you are a part of the world's largest supercomputer when you crunch for Greg. 2.8 million "donors" and rising.

If you're a "points person" you'll be gratified by the very generous credit they award there, even though it doesn't count toward BOINC credit.

Too bad Bowman Lab and FAH left the Berkeley format IMO (I dislike all things proprietary), but the COVID Moonshot is too important not to participate in for a cruncher focused on promoting urgently relevant science. I just remind myself that this is not all about me.

Together we crunch
To check out a hunch,
But wish that our credit
Would just buy our lunch.
(traditional)

tullio
Send message
Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Level
Cys
Scientific publications
wat
Message 56648 - Posted: 21 Feb 2021 | 7:03:33 UTC

WorldCommunityGrid has started releasing some GPU tasks for the OpenPandemics beta, but I haven't been lucky enough to get any.
Tullio
____________

RockLr
Send message
Joined: 14 Mar 20
Posts: 7
Credit: 11,208,845
RAC: 4
Level
Pro
Scientific publications
wat
Message 56656 - Posted: 22 Feb 2021 | 7:14:54 UTC - in response to Message 56648.

WorldCommunityGrid has started releasing some GPU tasks for the OpenPandemics beta, but I haven't been lucky enough to get any.

Luckily I got some of them. It looks like WCG is searching for a suitable WU size.
Excitingly, they seem to be much faster than the CPU version.

bozz4science
Send message
Joined: 22 May 20
Posts: 109
Credit: 68,936,176
RAC: 0
Level
Thr
Scientific publications
wat
Message 56658 - Posted: 22 Feb 2021 | 15:29:30 UTC

Hey Toni!

Thanks for the new supply of work, first of all. I was wondering if you'd consider adjusting the time limit for bonus points slightly upwards, to allow volunteers with less powerful cards (like the 1660 series) not to be penalized for running a mere 5% over the defined limit. I reckon the limit was put in place to ensure timely computation of the WUs, but while 24 hrs might have been fine for a 2-4 hr WU, the situation has shifted drastically with average runtimes of ~10 h on the fastest cards.

I understand that increasing the deadline of a WU is something you don't want to do, in order to ensure the timeliness of WUs, but just extending the originally set 24 hr time limit to, say, 25/26 hrs surely wouldn't hurt this performance goal a lot.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56659 - Posted: 22 Feb 2021 | 16:03:09 UTC - in response to Message 56658.

I don't think you should think of it as a "penalty": you're not getting penalized, just not getting the quick-return bonus since you didn't make the cutoff.

Personally I don't think this needs to be changed. It's a bonus for exceptional work, not an entitlement. If you return within 2 days you still get some bonus. I think if you still want the bonus, you should invest in your systems to make them faster.

And yes, I am myself subject to missing out on the bonus on one of my systems, since the GTX 1660 Super (device 1 in the RTX 3070 host) is unable to meet the 24 hr cutoff (routinely ~27 hrs).
____________

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,196,461,293
RAC: 1,617,116
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56664 - Posted: 22 Feb 2021 | 21:10:04 UTC

I have quite a high ERROR rate on this host: http://www.gpugrid.net/results.php?hostid=523675
Quite a few WUs crash with:
"Error invoking kernel: CUDA_ERROR_ILLEGAL_INSTRUCTION (715)" (5 units)
"Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719)" (1 unit)
"Particle coordinate is nan" (1 unit)
"process exited with code 195 (0xc3, -61)</message>" (1 unit)
Any idea what the cause might be? The other Linux computer works flawlessly, and the other two Windows 10 computers do not produce errors either.

DJStarfox
Send message
Joined: 14 Aug 08
Posts: 18
Credit: 16,944
RAC: 0
Level

Scientific publications
wat
Message 56665 - Posted: 22 Feb 2021 | 21:25:17 UTC

I did not see a GTX960 mentioned here. BOINC thinks the new WU will take 3 hours to complete. Is that accurate?

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,606,061,851
RAC: 8,672,972
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56666 - Posted: 22 Feb 2021 | 21:51:33 UTC - in response to Message 56665.

I did not see a GTX960 mentioned here. BOINC thinks the new WU will take 3 hours to complete. Is that accurate?

Not yet. After the first one has completed, subsequent estimates will be more realistic.
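Roughly speaking, the client learns a correction factor from actual runtimes and scales future estimates by it. A toy Python model of that idea follows; it mimics the spirit of BOINC's duration correction factor (jump up fast after an underestimate, drift down slowly), not the client's exact formula, and the 40-hour runtime is an assumption for illustration:

# Toy model only -- not BOINC's actual duration correction formula.
def update_dcf(dcf: float, estimated_h: float, actual_h: float) -> float:
    ratio = actual_h / estimated_h
    if ratio > dcf:
        return ratio                      # underestimate: correct at once
    return dcf + 0.1 * (ratio - dcf)      # overestimate: ease down slowly

dcf = 1.0
estimate_h = 3.0    # the initial guess for the GTX 960 quoted above
actual_h = 40.0     # assumed real runtime for a task of this batch

dcf = update_dcf(dcf, estimate_h, actual_h)
print(f"next estimate: {estimate_h * dcf:.0f} h")   # ~40 h after one task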

With the current tasks, I'd guess something in the range 1.5 days - 2 days. I ditched my 970s last year, because I could see the writing on the wall - after a good few years of faithful service, they were no longer fit to match the current beasts. I went for 1660 (super or Ti) instead.

This project tends to run different sub-projects, working with data and parameters from different researchers. And they don't reset the task estimates when they change the jobs. Your card would have been very comfortable with the previous run, but not so happy with this one. Only time will tell what the next one will be - we tend not to find out until after it's started.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56668 - Posted: 22 Feb 2021 | 22:08:11 UTC - in response to Message 56664.

I have quite a high ERROR rate on this host: http://www.gpugrid.net/results.php?hostid=523675
Quite a few WUs crash with:
"Error invoking kernel: CUDA_ERROR_ILLEGAL_INSTRUCTION (715)" (5 units)
"Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719)" (1 unit)
"Particle coordinate is nan" (1 unit)
"process exited with code 195 (0xc3, -61)</message>" (1 unit)
Any idea what the cause might be? The other Linux computer works flawlessly, and the other two Windows 10 computers do not produce errors either.

“Particle coordinate is nan” is usually too much overclocking. Or card too hot causing instability. Remove any overclock and ensure the card has good airflow for reasonable temps.

The other errors might be driver related. Try to remove the old drivers with DDU from safe mode, be sure to select the option in the settings to prevent Windows from automatically installing drivers. Then go to nvidia’s website and download the latest drivers for your system, selecting custom install and clean install during the install process.

That would be my next steps.
____________

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,606,061,851
RAC: 8,672,972
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56669 - Posted: 22 Feb 2021 | 22:22:03 UTC - in response to Message 56668.

... download the latest drivers for your system ...

I'm always a bit cautious about going for 'the latest' of anything. Driver release is usually driven by gaming first, and sometimes they break other things - like computing for science - along the way.

I usually go for the final, bugfix, sub-version of the previous major release.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56670 - Posted: 22 Feb 2021 | 23:05:05 UTC - in response to Message 56669.

... download the latest drivers for your system ...

I'm always a bit cautious about going for 'the latest' of anything. Driver release is usually driven by gaming first, and sometimes they break other things - like computing for science - along the way.

I usually go for the final, bugfix, sub-version of the previous major release.


whatever suits your fancy. The important bit is to totally wipe the old drivers, and do not allow Microsoft to auto-install their own, and do a clean install of the package provided by Nvidia.

(I prefer to avoid Geforce Experience as well, but up to you I guess).
____________

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,196,461,293
RAC: 1,617,116
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56671 - Posted: 23 Feb 2021 | 0:41:12 UTC - in response to Message 56668.
Last modified: 23 Feb 2021 | 1:01:43 UTC

The other errors might be driver related. Try to remove the old drivers with DDU from safe mode, be sure to select the option in the settings to prevent Windows from automatically installing drivers. Then go to nvidia’s website and download the latest drivers for your system, selecting custom install and clean install during the install process.

It is a Linux Box...
I already switched back to the Nouveau driver. Restarted: no image… Luckily I was able to restart with GRUB (recovery mode). After that I was able to install the latest Nvidia driver again.
Hope this will solve the problem!

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56674 - Posted: 23 Feb 2021 | 1:44:50 UTC - in response to Message 56671.

The other errors might be driver related. Try to remove the old drivers with DDU from safe mode, be sure to select the option in the settings to prevent Windows from automatically installing drivers. Then go to nvidia’s website and download the latest drivers for your system, selecting custom install and clean install during the install process.

It is a Linux Box...
I already switched back to the Nouveau driver. Restarted: no image… Luckily I was able to restart with GRUB (recovery mode). After that I was able to install the latest Nvidia driver again.
Hope this will solve the problem!


Apologies, I must have read your previous post too quickly and thought you said it was a Windows system. I'd still try to reinstall the drivers, and do a full uninstall/purge/reinstall.

Also, you should make sure the system isn't going to sleep or hibernating or anything like that. In my experience, things run best if you can make sure the computation isn't interrupted.
____________

d_a_dempsey
Send message
Joined: 18 Dec 09
Posts: 6
Credit: 969,687,328
RAC: 81,899
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56676 - Posted: 23 Feb 2021 | 15:13:19 UTC - in response to Message 56614.

As of the date of the post, all work units had failed. I didn't complete one successfully until the 17th.

To date, I have:

    3 In progress
    9 Error while computing
    3 Completed and validated
    1 Cancelled by server



3 out of 14 is not a good ratio.

I do not crunch 24/7. I actually use my computer for real life stuff, and some of that includes me using my GPUs for something other than this project. Has not been a problem before these work units. Before Feb. 10, my last failed work units were in 2018.

jimvt
Send message
Joined: 26 Apr 20
Posts: 3
Credit: 1,219,253
RAC: 0
Level
Ala
Scientific publications
wat
Message 56677 - Posted: 23 Feb 2021 | 15:14:26 UTC - in response to Message 56504.

Is there no way to make the work units smaller so those of us that have older systems can still participate in the project?

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56678 - Posted: 23 Feb 2021 | 15:37:35 UTC - in response to Message 56676.
Last modified: 23 Feb 2021 | 16:27:54 UTC

As of the date of the post, all work units had failed. I didn't complete one successfully until the 17th.

To date, I have:
    3 In progress
    9 Error while computing
    3 Completed and validated
    1 Cancelled by server



3 out of 14 is not a good ratio.

I do not crunch 24/7. I actually use my computer for real life stuff, and some of that includes me using my GPUs for something other than this project. Has not been a problem before these work units. Before Feb. 10, my last failed work units were in 2018.



From what I can infer from other posts, this project has never had, nor promised, a homogeneous supply of tasks. It seems like the relatively small MDADs that we had for several months were the exception.

As to your errors: on your single GTX 660 host, you were given two tasks, but that GPU is too slow to complete even a single task within the 5-day limit, let alone two. It looks like you started one, and the other sat waiting until it hit the deadline, at which point it was cancelled for not having been started yet. This is standard BOINC behavior. Your other task appears to still be in progress on your system even though it's past the deadline. You may as well just cancel that unit, since it was already sent out to another system, which returned a valid result 4 days ago. Even if you continue crunching it and submit it, it's unlikely that you will receive any credit for it. I would just cancel it and set NNT for GPUGRID on that system until suitable WUs are available here again.

http://www.gpugrid.net/workunit.php?wuid=27025213

As to your other system with 2 GPUs: almost all of the errors are for the same reason:
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!


This is a known problem with the app here. If a task is interrupted and tries to restart on a different device, you are likely to get this error. The only real solution is to not interrupt the task, which means not stopping computation and not turning the system off.

I understand that not everyone wants to operate this way, but there are also several other projects to choose from that will let you operate as you prefer. Folding@home seems to be a popular choice for folks around here with older Nvidia cards, or for people who wish to contribute less often or fewer resources.
____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56679 - Posted: 23 Feb 2021 | 15:38:21 UTC - in response to Message 56677.

Is there no way to make the work units smaller so those of us that have older systems can still participate in the project?


No way to make the WUs smaller; we get what the project gives us. You cannot manipulate those tasks client-side at all.
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56682 - Posted: 24 Feb 2021 | 5:49:51 UTC - in response to Message 56678.

It seems like the relatively small MDADs that we had for several months were the exception.

yes and no.

Until some time ago, there were so-called "short runs" and "long runs" (you can still see this in the lower left section of the server status page), and the user could choose between them in his/her settings.
The small MDADs we recently got would definitely have fallen under "short runs".
But never before were there runs as long as the current series, not even under "long runs".
So, as I said before, for users with older cards it would help if the 5-day deadline were extended by 1 or 2 days. No idea why this is not being done :-(

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56683 - Posted: 24 Feb 2021 | 13:19:50 UTC - in response to Message 56682.

Short runs seem to be defined as 2-3 hrs on the fastest card. The MDADs were way shorter than that, running only about 15-20 mins on a 2080 Ti. I'd say that's outside the norm for the project's history.

Even long runs are defined as 8-12 hrs on the fastest card, and I'd say these bandit tasks certainly fall into that category, with a 2080 Ti usually taking about 10 hrs.
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56684 - Posted: 24 Feb 2021 | 14:23:40 UTC - in response to Message 56683.

Short runs seem to be defined as 2-3 hrs on the fastest card. ...
Even long runs are defined as 8-12 hrs on the fastest card.

these definitions on the server status page:

Short runs (2-3 hours on fastest card)
Long runs (8-12 hours on fastest card)


have been like this over many years. It's never been changed, as far as I can remember.


Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,606,061,851
RAC: 8,672,972
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56685 - Posted: 24 Feb 2021 | 15:10:01 UTC - in response to Message 56684.

It's never been changed, as far as I can remember.

But "the fastest card", being a relative term, has certainly changed its meaning over the years.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56690 - Posted: 24 Feb 2021 | 20:09:45 UTC - in response to Message 56685.
Last modified: 24 Feb 2021 | 20:11:05 UTC

It's never been changed, as far as I can remember.

But "the fastest card", being a relative term, has certainly changed its meaning over the years.

This relative term is used intentionally, for practical reasons: the staff don't have to change the definition at the release of every new GPU generation; instead they can release longer workunits.

Magiceye04
Send message
Joined: 1 Apr 09
Posts: 24
Credit: 67,905,687
RAC: 0
Level
Thr
Scientific publications
watwatwatwatwat
Message 56695 - Posted: 24 Feb 2021 | 23:08:34 UTC

Will there be any new work or is the batch over now?

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56697 - Posted: 25 Feb 2021 | 1:07:50 UTC - in response to Message 56695.

Will there be any new work or is the batch over now?


maybe just resends at the moment.

____________

Kevin
Send message
Joined: 6 Dec 20
Posts: 2
Credit: 44,437,695
RAC: 0
Level
Val
Scientific publications
wat
Message 56714 - Posted: 27 Feb 2021 | 14:30:13 UTC - in response to Message 56656.

"OpenPandemics for GPU

The process of porting the research application code to GPU is well underway as we work to ensure that work units function well and securely before they’re sent to volunteers. The security review of the code is completed."

https://www.worldcommunitygrid.org/about_us/viewNewsArticle.do?articleId=680&mynews=Y

tullio
Send message
Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Level
Cys
Scientific publications
wat
Message 56715 - Posted: 27 Feb 2021 | 15:03:32 UTC

There are beta tasks already released, but they are few and I haven't received any. They last only a few minutes.
Tullio
____________

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,606,061,851
RAC: 8,672,972
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56716 - Posted: 27 Feb 2021 | 15:29:12 UTC - in response to Message 56714.

Greetings everyone,

I'm sorry to say, but I am having a build issue at the moment and I have not been able to create a production ready version of the application. At the moment, the next round of beta is on hold while I'm still working out the issue.

Thanks,
-Uplinger
[Feb 26, 2021 11:10:50 PM]

They're getting closer!

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56717 - Posted: 27 Feb 2021 | 21:36:55 UTC - in response to Message 56716.

They must be having a hard time getting them long enough.
They should come here for advice.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56718 - Posted: 27 Feb 2021 | 23:45:26 UTC - in response to Message 56717.

They must be having a hard time getting them long enough.
They should come here for advice.

Really, length isn’t important as long as the project can compile the results. SETI had work units that ran for less than a minute on fast GPUs.

But longer tasks would ideally be more time-efficient, with less wasted time between tasks.
____________

bozz4science
Send message
Joined: 22 May 20
Posts: 109
Credit: 68,936,176
RAC: 0
Level
Thr
Scientific publications
wat
Message 56719 - Posted: 28 Feb 2021 | 1:40:57 UTC
Last modified: 28 Feb 2021 | 1:51:41 UTC

I guess it would be a perfect place to shift the slower cards to if they don't finish tasks in time here. My 750 Ti and 970 will do work over there as soon as they ship the production-ready version of the app.

I've read on the forums that a 1650 Super, which is definitely not the most powerful card there, reached runtimes of only a few minutes. The powerful cards must kick ass over there, finishing WUs in seconds; even if it'd be just for a few hours, they'd probably produce a week's worth of CPU crunching :)

Stacie
Send message
Joined: 29 Mar 20
Posts: 18
Credit: 600,197,371
RAC: 91,436
Level
Lys
Scientific publications
wat
Message 56720 - Posted: 28 Feb 2021 | 2:28:09 UTC - in response to Message 56504.

I have an odd question regarding these work units. Was the credit awarded dependent upon how quickly they were completed? They all had the same application identifier as far as I could tell, and they all took about the same amount of processor time to complete. They seemed to award 3 different amounts of credit: approximately 348,000 points, 435,000, or 520,000. I completed about a dozen, and the longer I took, the less credit given. The ones I returned in less than 2 days all awarded 435,000 points; those that took 3 days or longer all awarded 348,000. Was this an actual factor or just a wild coincidence?
____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56721 - Posted: 28 Feb 2021 | 4:28:33 UTC - in response to Message 56720.
Last modified: 28 Feb 2021 | 4:29:08 UTC

Yes. Tasks returned in under 24hrs get a 50% bonus. Under 48hrs get a 25% bonus. And tasks between 2-5 days get normal base credit.
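Those tiers reproduce the three amounts you saw exactly, taking the 348,750-credit base award visible in task lists earlier in this thread. A quick Python sketch:

# Bonus tiers as stated above; base credit taken from results posted
# earlier in this thread.
BASE_CREDIT = 348_750.0

def awarded_credit(turnaround_hours: float) -> float:
    if turnaround_hours <= 24:
        return BASE_CREDIT * 1.50   # quick-return bonus
    if turnaround_hours <= 48:
        return BASE_CREDIT * 1.25   # second-tier bonus
    return BASE_CREDIT              # 2-5 days: base credit only

for hours in (20, 40, 90):
    print(f"{hours:3d} h -> {awarded_credit(hours):,.2f} credits")
# 20 h -> 523,125.00   40 h -> 435,937.50   90 h -> 348,750.00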
____________

Remanco
Send message
Joined: 4 Mar 13
Posts: 3
Credit: 30,169,077
RAC: 0
Level
Val
Scientific publications
watwatwat
Message 56722 - Posted: 28 Feb 2021 | 4:44:20 UTC

Can we know what exactly we are crunching?

Thanks!

Sylvain.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56725 - Posted: 28 Feb 2021 | 19:28:53 UTC - in response to Message 56722.

Normally we don't know any specifics until a paper is generated and cites the workunits that were used for the investigation.

The paper details what the investigation was all about.

If you have citation badges under your account you can click on the badge you were awarded and it will take you to the paper synopsis.

Stacie
Send message
Joined: 29 Mar 20
Posts: 18
Credit: 600,197,371
RAC: 91,436
Level
Lys
Scientific publications
wat
Message 56727 - Posted: 1 Mar 2021 | 1:00:48 UTC - in response to Message 56721.

Ah, thank you. I wish I had known! It took my 1070 GPUs about 27 hours to complete them, but I kept most of my resources on other projects and only made sure to return them before they timed out. I could have picked up a few hundred thousand more points and got my 100-million molecule. Oh well...
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56730 - Posted: 2 Mar 2021 | 6:04:22 UTC

As of this morning, on one of my machines I still had 2 tasks: one running, the other waiting. An hour later I noticed that the latter had been "aborted by server".
How nice :-(

Trotador
Send message
Joined: 25 Mar 12
Posts: 103
Credit: 9,769,314,893
RAC: 32,536
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56731 - Posted: 2 Mar 2021 | 6:22:45 UTC - in response to Message 56730.

As of this morning, on one of my machines I still had 2 tasks: one running, the other waiting. An hour later I noticed that the latter had been "aborted by server".
How nice :-(


That task was a resend; the server cancelled it because the host it was originally assigned to was finally able to finish it and deliver it back. The server first checked that it was not being crunched on your host.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56732 - Posted: 2 Mar 2021 | 7:37:59 UTC - in response to Message 56731.

As of this morning, on one of my machines I still had 2 tasks: one running, the other waiting. An hour later I noticed that the latter had been "aborted by server".
How nice :-(

That task was a resend; the server cancelled it because the host it was originally assigned to was finally able to finish it and deliver it back. The server first checked that it was not being crunched on your host.

So if the task was still being crunched on the other host (and finally got finished there), why was it ever sent to me, too?

Trotador
Send message
Joined: 25 Mar 12
Posts: 103
Credit: 9,769,314,893
RAC: 32,536
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56733 - Posted: 2 Mar 2021 | 7:59:20 UTC - in response to Message 56732.

As of this morning, on one of my machines I still had 2 tasks: one running, the other waiting. An hour later I noticed that the latter had been "aborted by server".
How nice :-(

That task was a resend; the server cancelled it because the host it was originally assigned to was finally able to finish it and deliver it back. The server first checked that it was not being crunched on your host.

So if the task was still being crunched on the other host (and finally got finished there), why was it ever sent to me, too?


Because it was already over 5 days since that host downloaded the unit, it was beyond the deadline, so a new instance of the WU could be sent out. It is the standard BOINC way of working (each project sets its own deadlines).

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56734 - Posted: 2 Mar 2021 | 8:16:26 UTC - in response to Message 56733.

Because it was already over 5 days since that host downloaded the unit, it was beyond the deadline, so a new instance of the WU could be sent out.

oh, okay, I was not aware of that.

The interesting thing is: I received it 2 days ago. So if the original host finished it this morning, the task must have been 7 days "old" then (and obviously got credit).

Recently, one of my slower hosts finished a task after 5 days plus a few hours, and it was not accepted any more. No credits: "too late".

How does this fit together?

Trotador
Send message
Joined: 25 Mar 12
Posts: 103
Credit: 9,769,314,893
RAC: 32,536
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56735 - Posted: 2 Mar 2021 | 12:54:15 UTC - in response to Message 56734.
Last modified: 2 Mar 2021 | 13:18:24 UTC

Because it was already over 5 days since that host downloaded the unit, it was beyond the deadline, so a new instance of the WU could be sent out.

oh, okay, I was not aware of that.

The interesting thing is: I received it 2 days ago. So if the original host finished it this morning, the task must have been 7 days "old" then (and obviously got credit).

Recently, one of my slower hosts finished a task after 5 days plus a few hours, and it was not accepted any more. No credits: "too late".

How does this fit together?



What I think could have occurred is that the server issued a new instance once yours was over 5 days old, and the new host crunched and delivered the result before you finished yours. In that situation you would normally receive no credit.

I can't find that wu in your hosts, if you can point to it I will have a look.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56736 - Posted: 2 Mar 2021 | 13:55:06 UTC - in response to Message 56734.
Last modified: 2 Mar 2021 | 13:56:02 UTC

Because it was already over 5 days since that host downloaded the unit, it was beyond the deadline, so a new instance of the WU could be sent out.

oh, okay, I was not aware of that.

The interesting thing is: I received it 2 days ago. So if the original host finished it this morning, the task must have been 7 days "old" then (and obviously got credit).

Recently, one of my slower hosts finished a task after 5 days plus a few hours, and it was not accepted any more. No credits: "too late".

How does this fit together?



I told you before what happened in that case. There’s some grace period where if you return a result that has already been received by another host, you’ll still get credit. I’m guessing it’s about 1 day, maybe less. In that case you returned it 4 days after the first result, so you missed the validation period.

If the person returned it in 7 days, but they were the first to return it, they get credit. Doesn’t matter if it’s late, if you’re first you will get credit.

It’s a good thing that the project cancelled that WU from your host to prevent unnecessary and wasted computation. You would have spent another 5 days crunching something that they already have a result for, then you would have not received credit, and been upset about that.
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56741 - Posted: 2 Mar 2021 | 20:18:07 UTC - in response to Message 56735.

I can't find that wu in your hosts, if you can point to it I will have a look.

here it is:
https://www.gpugrid.net/result.php?resultid=32550373

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56742 - Posted: 2 Mar 2021 | 20:28:06 UTC - in response to Message 56736.

... There’s some grace period where if you return a result that has already been received by another host, you’ll still get credit. I’m guessing it’s about 1 day. Maybe less.
...
If the person returned it in 7 days, but they were the first to return it, they get credit. Doesn’t matter if it’s late, if you’re first you will get credit.

so how long is the grace period?
about 1 day? or less? Or any time longer?
Or are there different types of grace periods?
This system is somewhat obscure, anyway.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,606,061,851
RAC: 8,672,972
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56743 - Posted: 2 Mar 2021 | 20:43:42 UTC - in response to Message 56742.

A true, formal grace period would result in the deadline shown on the website being a day or a few later than the deadline shown on your computer at home. The BOINC client will try to finish the job by the deadline shown locally, but provided it's returned by the website deadline, nothing is lost. But we don't use that here.

More colloquially, an informal grace period occurs because you've got "until your replacement wingmate, after you've failed to return it in time, returns their copy". So,

However long it takes them to download the data, plus
However long the task hangs about before their computer starts working on it, plus
However long it takes them to compute it.

Don't rely on the first or second lasting longer than a few seconds. I think the shortest time reported so far for the third stage is about 10 hours with the current work.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56745 - Posted: 2 Mar 2021 | 21:37:05 UTC - in response to Message 56743.
Last modified: 2 Mar 2021 | 21:48:01 UTC

It's certainly informal. I don't know how long the grace period is; I'm just using my own experience to make an educated guess of about 1 day. But it's certainly shorter than the 4 days from Erich's previous situation, since he got a validate error when he returned it.

I know I've returned a result that was 12+ hrs past the return of the previous person (who blew their 5-day deadline, but returned it a few hours after it was sent to me).

I still got credit for it, but only the base credit based on the original host's 5+ day crunch; no bonus for me even though I was well within 1 day of crunch time from when it hit my system.

the instance that we are referencing has already been purged though, so I can't link it unfortunately

edit: i found one in my list.

https://www.gpugrid.net/workunit.php?wuid=27035408

32544714 483418 21 Feb 2021 | 21:59:03 UTC 22 Feb 2021 | 23:19:24 UTC Error while computing 64,567.96 64,163.00 --- New version of ACEMD v2.11 (cuda101)
32547573 564623 23 Feb 2021 | 1:58:35 UTC 28 Feb 2021 | 3:37:59 UTC Completed and validated 170,762.44 108,992.30 348,750.00 New version of ACEMD v2.11 (cuda101)
32550287 543446 28 Feb 2021 | 1:58:40 UTC 28 Feb 2021 | 18:52:42 UTC Completed and validated 60,467.52 60,460.20 348,750.00 New version of ACEMD v2.11 (cuda100)


host before me blew their deadline
it was sent to me for crunching (i started it nearly right away due to small cache on this host)
host before me returned their result 2hrs after deadline, got base credit
i crunched it for 17hrs, returned it 15hrs after previous host, also got base credit.
____________

Eos Yu
Send message
Joined: 27 Jan 21
Posts: 1
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 56747 - Posted: 2 Mar 2021 | 21:47:48 UTC

I can't get any WUs! Why?

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56748 - Posted: 2 Mar 2021 | 21:54:12 UTC - in response to Message 56747.

I can't get any WUs! Why?

none available right now.
____________

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,943,927,024
RAC: 10,733,819
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56751 - Posted: 3 Mar 2021 | 6:29:24 UTC - in response to Message 56745.

I know I've returned a result that was 12+ hrs past the return of the previous person (who blew their 5-day deadline, but returned it a few hours after it was sent to me).

I still got credit for it, but only the base credit based on the original host's 5+ day crunch; no bonus for me even though I was well within 1 day of crunch time from when it hit my system.

the instance that we are referencing has already been purged though, so I can't link it unfortunately

edit: i found one in my list.

https://www.gpugrid.net/workunit.php?wuid=27035408

32544714 483418 21 Feb 2021 | 21:59:03 UTC 22 Feb 2021 | 23:19:24 UTC Error while computing 64,567.96 64,163.00 --- New version of ACEMD v2.11 (cuda101)
32547573 564623 23 Feb 2021 | 1:58:35 UTC 28 Feb 2021 | 3:37:59 UTC Completed and validated 170,762.44 108,992.30 348,750.00 New version of ACEMD v2.11 (cuda101)
32550287 543446 28 Feb 2021 | 1:58:40 UTC 28 Feb 2021 | 18:52:42 UTC Completed and validated 60,467.52 60,460.20 348,750.00 New version of ACEMD v2.11 (cuda100)


host before me blew their deadline
it was sent to me for crunching (i started it nearly right away due to small cache on this host)
host before me returned their result 2hrs after deadline, got base credit
i crunched it for 17hrs, returned it 15hrs after previous host, also got base credit.

This agrees with my own experience.
Your case hits scenario number 2 in this previous post.

It's certainly informal. I don't know how long the grace period is; I'm just using my own experience to make an educated guess of about 1 day. But it's certainly shorter than the 4 days from Erich's previous situation, since he got a validate error when he returned it.

I think there isn't a fixed grace period; the only criterion is getting a valid result for each workunit.
And the chance of these credit inconsistencies increases when (like now) the only work units available are resends.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56752 - Posted: 3 Mar 2021 | 8:09:17 UTC - in response to Message 56751.

And the chance of these credit inconsistencies increases when (like now) the only work units available are resends.

this statement seems perfectly correct :-)

Trotador
Send message
Joined: 25 Mar 12
Posts: 103
Credit: 9,769,314,893
RAC: 32,536
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56753 - Posted: 3 Mar 2021 | 13:09:23 UTC

New Gerard tasks?

https://www.gpugrid.net/workunit.php?wuid=27038831

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56754 - Posted: 3 Mar 2021 | 14:08:19 UTC - in response to Message 56753.

New Gerard tasks?

https://www.gpugrid.net/workunit.php?wuid=27038831

these pop up from time to time. always only a handful of them. they're a rare gem. not enough to feed the masses though.
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56755 - Posted: 3 Mar 2021 | 17:23:06 UTC - in response to Message 56754.

New Gerard tasks?

https://www.gpugrid.net/workunit.php?wuid=27038831

these pop up from time to time. always only a handful of them. they're a rare gem. not enough to feed the masses though.


I got 3 of them this morning (1_3-GERARD_pocket_discovery_...), and after they had waited a few hours in the queue, the server aborted them: "202 (0xca) EXIT_ABORTED_BY_PROJECT"
:-)

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56756 - Posted: 3 Mar 2021 | 17:25:12 UTC - in response to Message 56755.
Last modified: 3 Mar 2021 | 17:35:14 UTC

I got two of them. They started processing right away. I have GPUGRID set to resource share of 100 and my other GPU project (Einstein) set to 0. So when I get GPUGRID tasks, they take priority over any backup project work already in the queue and begin right away.

One finished in about 2.5hrs (2080ti) and the other is in progress and will take probably 6hrs (1660Super)

Looks like it’s following the same rules outlined above.

If you haven’t even started processing yet by the time someone else completes and returns a result, then it cancels the unstarted task. This is a good idea in my opinion and reduces wasteful computation. There’s no need to have you even start the task if they already have the result. If you had started the tasks, they would have been allowed to complete. But since they were not started, they got cancelled.

The difference here is that it looks like these Gerard tasks were sent out in pairs from the beginning. Maybe they are trying to weed out the hosts that hit and run (download tasks and never return them), so each goes to two hosts at once to increase the chances of getting a valid result in the first 5-day window.
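The server-side rule, as observed in this thread, can be stated compactly. A Python sketch of the observed behaviour, not actual GPUGRID server code:

# Observed behaviour, not actual GPUGRID server code: once any copy of
# a workunit is returned valid, unstarted replicas are cancelled, while
# replicas already running are left to finish.
def resolve_replica(started: bool, other_result_returned: bool) -> str:
    if other_result_returned and not started:
        return "aborted by server (202 EXIT_ABORTED_BY_PROJECT)"
    if other_result_returned and started:
        return "allowed to finish (a late return may still validate)"
    return "crunch normally"

print(resolve_replica(started=False, other_result_returned=True))
print(resolve_replica(started=True,  other_result_returned=True))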
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56757 - Posted: 4 Mar 2021 | 6:08:44 UTC - in response to Message 56756.

I got two of them. They started processing right away.

I had a GPUGRID task running, so the downloaded tasks were in waiting position.
Had I known that they would disappear that soon, I would have interrupted the running task for a short time, in order to get at least one of the three newly downloaded tasks started (thus preventing it from being aborted by the server).
Well, next time I know :-)

RJ The Bike Guy
Send message
Joined: 2 Apr 20
Posts: 20
Credit: 35,363,533
RAC: 0
Level
Val
Scientific publications
wat
Message 56758 - Posted: 6 Mar 2021 | 15:18:55 UTC - in response to Message 56756.
Last modified: 6 Mar 2021 | 15:19:13 UTC

I got two of them. They started processing right away. I have GPUGRID set to resource share of 100 and my other GPU project (Einstein) set to 0. So when I get GPUGRID tasks, they take priority over any backup project work already in the queue and begin right away.


Thanks! Didn't know I could do that. I had suspended Einstein so it would pick up the GPUGRID work. Now I have Einstein set to 0% and GPUGRID to 100%.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,606,061,851
RAC: 8,672,972
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56759 - Posted: 6 Mar 2021 | 16:55:21 UTC - in response to Message 56758.

... and begin right away.

Don't expect them to run instantly. But 'next in queue' when an Einstein task completes is usually good enough.

RJ The Bike Guy
Send message
Joined: 2 Apr 20
Posts: 20
Credit: 35,363,533
RAC: 0
Level
Val
Scientific publications
wat
Message 56760 - Posted: 6 Mar 2021 | 20:43:06 UTC - in response to Message 56759.

... and begin right away.

Don't expect them to run instantly. But 'next in queue' when an Einstein task completes is usually good enough.


No problem. The Einstein jobs are taking less than 20 minutes currently.

Philip C Swift [Gridcoin]
Send message
Joined: 23 Dec 18
Posts: 12
Credit: 50,868,500
RAC: 0
Level
Thr
Scientific publications
wat
Message 56764 - Posted: 9 Mar 2021 | 22:04:17 UTC

Correct me if I am wrong, but the [deadline] is the date and time the task has to be started by, NOT completed by. It is a deadline for starting the task, not for finishing it.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56765 - Posted: 9 Mar 2021 | 22:45:50 UTC - in response to Message 56764.

Correct me if I am wrong, but the [deadline] is the date and time the task has to be started by, NOT completed by. It is a deadline for starting the task, not for finishing it.


If the task isn’t completed and returned by the deadline, it gets sent to another host. You can still submit it late, but the project really wants the result before the deadline.
____________

zharkov70
Send message
Joined: 10 Mar 21
Posts: 1
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 56766 - Posted: 10 Mar 2021 | 19:17:35 UTC - in response to Message 56504.

хорошо (good)

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,943,927,024
RAC: 10,733,819
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56767 - Posted: 10 Mar 2021 | 19:58:34 UTC - in response to Message 56766.
Last modified: 10 Mar 2021 | 20:00:18 UTC

добро пожаловать в Gpugrid

Welcome to Gpugrid

Jeffwy
Send message
Joined: 15 Jan 12
Posts: 5
Credit: 59,000,574
RAC: 0
Level
Thr
Scientific publications
watwat
Message 56776 - Posted: 18 Mar 2021 | 8:15:02 UTC

No WUs again? Was there a problem with the longer ones? I have not seen any WUs from GPUgrid for at least 1-2 weeks now.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,943,927,024
RAC: 10,733,819
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56777 - Posted: 18 Mar 2021 | 12:14:06 UTC - in response to Message 56776.

Work units are currently so scarce that the chance of getting one is like winning the lottery... ;-)
Waiting for better times to come

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,943,927,024
RAC: 10,733,819
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56778 - Posted: 19 Mar 2021 | 18:16:50 UTC

Maybe somebody's played the rain dance, and some WUs are flowing just now...
https://www.gpugrid.net/server_status.php

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56779 - Posted: 19 Mar 2021 | 18:25:26 UTC - in response to Message 56778.

WOOOOO. got a full tank now. glad to see these long run tasks back again :)
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56780 - Posted: 19 Mar 2021 | 18:41:47 UTC

I got only two each on my hosts. Still glad to have some back if only briefly.

Alain Maes
Send message
Joined: 8 Sep 08
Posts: 63
Credit: 1,437,484,959
RAC: 69,868
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56781 - Posted: 19 Mar 2021 | 19:33:22 UTC

Work cache is a BOINC feature that was undoubtedly useful in the days when, for many, internet access depended on dial-in.
Nowadays I would assume that most if not all users have basically 24/7 high-speed internet access, allowing completed work to be reported immediately and new tasks to be downloaded only a few minutes before completing the current one.
Caching loads of tasks on only a few systems basically defeats the purpose of the tests, which is to test tasks ASAP and on as many different systems as possible.
With large work caches, tasks are distributed to only a limited number of systems, and completing/reporting takes way longer than necessary.

As a result, my new RTX 2060 has been idling for weeks now...

So why not limit the max tasks sent at a time to 1, as e.g. for the WCG ARP project?
____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56782 - Posted: 19 Mar 2021 | 19:46:43 UTC - in response to Message 56781.

Work cache is a BOINC feature that was undoubtedly useful in the days when, for many, internet access depended on dial-in.
Nowadays I would assume that most if not all users have basically 24/7 high-speed internet access, allowing completed work to be reported immediately and new tasks to be downloaded only a few minutes before completing the current one.
Caching loads of tasks on only a few systems basically defeats the purpose of the tests, which is to test tasks ASAP and on as many different systems as possible.
With large work caches, tasks are distributed to only a limited number of systems, and completing/reporting takes way longer than necessary.

As a result, my new RTX 2060 has been idling for weeks now...

So why not limit the max tasks sent at a time to 1, as e.g. for the WCG ARP project?


this project already limits to 2 per GPU, no matter what size cache limits you have set in BOINC.

but with these long running tasks, folks with very old GPUs (think GTX750ti, GT1030, etc) can still struggle to submit even 1 task before the deadline. in these cases it doesn't make sense to cache more than 1 task.

personally I only cache 1 task per GPU on these D3RBandit tasks, even on relatively fast GPUs like RTX2070 and RTX2080. I only cache 2 per GPU on my hosts with RTX 2080tis as those can complete 2 tasks within 24hrs, while the others can't quite make the cut (~17hrs on a 2070, ~13hrs on a 2080)
____________

Stacie
Send message
Joined: 29 Mar 20
Posts: 18
Credit: 600,197,371
RAC: 91,436
Level
Lys
Scientific publications
wat
Message 56783 - Posted: 20 Mar 2021 | 2:50:30 UTC - in response to Message 56782.

Hi, do you know if these work units have the bonus credit for prompt turnaround like the previous batch? I am ready to hammer these babies out as quick as possible...I want that next molecule, dammit!
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56784 - Posted: 20 Mar 2021 | 5:03:21 UTC

I've seen no policy changes posted. So they should still have the early-reporting credit bonus, as with all previous tasks.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56785 - Posted: 20 Mar 2021 | 19:14:56 UTC - in response to Message 56781.

Alain Maes wrote:


As a result my new RTX 2060 has been idling for weeks now ....

I thought the Ampere cards (RTX ...) do not function (yet) with GPUGRID ???

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56786 - Posted: 20 Mar 2021 | 19:32:29 UTC - in response to Message 56785.

Alain Maes wrote:

As a result my new RTX 2060 has been idling for weeks now ....

I thought the Ampere cards (RTX ...) do not function (yet) with GPUGRID ???

Ampere is RTX3000 series
RTX 2060 is fine with GPUGrid. Just finished all my WU's on my RTX2000 Turing cards with no issues.

Stacie
Send message
Joined: 29 Mar 20
Posts: 18
Credit: 600,197,371
RAC: 91,436
Level
Lys
Scientific publications
wat
Message 56787 - Posted: 20 Mar 2021 | 22:14:28 UTC - in response to Message 56784.

Finally 100 million. Yay!!
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56788 - Posted: 20 Mar 2021 | 23:30:34 UTC - in response to Message 56787.

Congratz! Big first step.

kotenok2000
Send message
Joined: 18 Jul 13
Posts: 78
Credit: 12,875,793
RAC: 0
Level
Pro
Scientific publications
wat
Message 56792 - Posted: 22 Mar 2021 | 12:34:58 UTC - in response to Message 56641.

Compared to my old gpu it is not very weak.
AMD AMD Radeon HD 6570/6670/7570/7670 series (Turks) (2048MB) driver: 1.4.1848 OpenCL: 1.2

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56793 - Posted: 22 Mar 2021 | 12:35:12 UTC - in response to Message 56786.

Alain Maes wrote:

As a result my new RTX 2060 has been idling for weeks now ....

I thought the Ampere cards (RTX ...) do not function (yet) with GPUGRID ???

Ampere is RTX3000 series
RTX 2060 is fine with GPUGrid. Just finished all my WU's on my RTX2000 Turing cards with no issues.

anyone have any guess as to when GPUGRID will make their code fit for Ampere cards?

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56794 - Posted: 22 Mar 2021 | 12:43:21 UTC - in response to Message 56792.

Compared to my old gpu it is not very weak.
AMD AMD Radeon HD 6570/6670/7570/7670 series (Turks) (2048MB) driver: 1.4.1848 OpenCL: 1.2


but weak in comparison to other GPUs available. a 1080ti will take 18hrs. and 2080ti takes 10+hrs. so it makes sense that something like a 1650 would take several days. these new tasks are really huge.
____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56795 - Posted: 22 Mar 2021 | 18:33:41 UTC - in response to Message 56793.

Alain Maes wrote:

As a result my new RTX 2060 has been idling for weeks now ....

I thought the Ampere cards (RTX ...) do not function (yet) with GPUGRID ???

Ampere is RTX3000 series
RTX 2060 is fine with GPUGrid. Just finished all my WU's on my RTX2000 Turing cards with no issues.

anyone have any guess as to when GPUGRID will make their code fit for Ampere cards?


No ETA, and the project admins/devs seem very tight lipped about it. Many have asked about it several times and no response.
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56796 - Posted: 22 Mar 2021 | 19:40:38 UTC - in response to Message 56795.

... and no response.

which is, unfortunately, nothing unusual here :-(

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,943,927,024
RAC: 10,733,819
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56799 - Posted: 22 Mar 2021 | 21:54:53 UTC - in response to Message 56778.

Written on March 19th 2021, 18:16 UTC:

Maybe somebody's played the rain dance, and some WUs are flowing just now...

...Which implies that on Wednesday the 24th (five days later) an aftershock wave of overdue-task resends might be coming. Have your fishing lines ready!

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56800 - Posted: 24 Mar 2021 | 16:42:50 UTC - in response to Message 56799.

Looks like the timeout resends are already starting to flow.
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56801 - Posted: 24 Mar 2021 | 16:49:10 UTC - in response to Message 56800.

Yes, I picked up work for all my hosts.

Paul
Send message
Joined: 25 Apr 13
Posts: 26
Credit: 176,808,053
RAC: 157,332
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 56807 - Posted: 28 Mar 2021 | 20:28:10 UTC - in response to Message 56504.

I have an RTX 2080 Super with an i9-9900K @ 3.60GHz

Why no tasks?

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56808 - Posted: 28 Mar 2021 | 20:44:26 UTC

You need to regularly ask for work to get any.

They are not building any task caches, but releasing work piecemeal, like for instance tasks named ADRIA_D3RBandit_singlebatch.

As the name implies, this is not a big series of tasks released in batches of hundreds or thousands, but more like one-offs. They are long 12-14 hour tasks.

They are also releasing short 4-5 hour tasks named ADRIA_HomeoFolded100ns

But to get them you have to ask for them. Set up a batch or script file to update the client every 10 minutes or so and you will get work.
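
If it helps, here is a minimal sketch of such a script in Python. It assumes boinccmd is on your PATH, that the local client accepts GUI RPCs from this machine, and that the project URL matches the one your client is attached with; the 10-minute interval is just the suggestion above.

import subprocess
import time

PROJECT_URL = "https://www.gpugrid.net/"  # must match the URL your client is attached with
INTERVAL = 600                            # seconds; roughly the 10 minutes suggested above

while True:
    # Ask the local BOINC client to contact the GPUGrid scheduler now,
    # reporting finished tasks and requesting new work if any is available.
    result = subprocess.run(
        ["boinccmd", "--project", PROJECT_URL, "update"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        print("boinccmd failed:", result.stderr.strip())
    time.sleep(INTERVAL)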

bozz4science
Send message
Joined: 22 May 20
Posts: 109
Credit: 68,936,176
RAC: 0
Level
Thr
Scientific publications
wat
Message 56809 - Posted: 29 Mar 2021 | 11:05:43 UTC
Last modified: 29 Mar 2021 | 11:06:32 UTC

A couple of these ones just popped up on my desktop machine. Haven't seen those before.

homeodomain_lowRMSD_100ns_18-ADRIA_HomeoLowRMSD100ns

From preliminary runtimes and progress reported, they seem to be in line with the aforementioned Adria tasks.

Greg _BE
Send message
Joined: 30 Jun 14
Posts: 126
Credit: 107,156,939
RAC: 166,633
Level
Cys
Scientific publications
watwatwatwatwatwat
Message 56810 - Posted: 29 Mar 2021 | 16:24:30 UTC - in response to Message 56808.

You need to regularly ask for work to get any.

They are not building any task caches, but releasing work piecemeal, like for instance tasks named ADRIA_D3RBandit_singlebatch.

As the name implies, this is not a big series of tasks released in batches of hundreds or thousands, but more like one-offs. They are long 12-14 hour tasks.

They are also releasing short 4-5 hour tasks named ADRIA_HomeoFolded100ns

But to get them you have to ask for them. Set up a batch or script file to update the client every 10 minutes or so and you will get work.



So in other words GPU Grid has gone the way of another GPU project in TX that used Boinc to farm out small stuff and ran the massive projects on their super computer. That was also hit and miss.

Well, nice knowing you GPU Grid, time to look for something else now.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56811 - Posted: 29 Mar 2021 | 16:50:37 UTC - in response to Message 56810.
Last modified: 29 Mar 2021 | 16:55:15 UTC



So in other words GPU Grid has gone the way of another GPU project in TX that used Boinc to farm out small stuff and ran the massive projects on their super computer. That was also hit and miss.

Well, nice knowing you GPU Grid, time to look for something else now.


what other project are you referring to? but, not sure how you've come to that conclusion, or what you consider "large" or "small" projects, is this based on individual task run times? or the duration of WU availability? don't think GPUGRID has a supercomputer. we ARE their supercomputer.

it takes time and money to form the data sets necessary to crunch. and it takes time to petition to get the money to enable their research. some projects have huge backers and lots of resources and can seemingly run continuously indefinitely, while others run on a shoestring budget and/or only have data intermittently. This project has always been hit or miss. sometimes with long running tasks, sometimes with very short tasks (like the MDAD series some months ago), sometimes with constant work supply for months, sometimes with short bursts of work with extended dry spells.

If you're looking for something with similar research, using GPUs, and an "infinite" supply of work, look into Folding@home.
____________

Speedy
Send message
Joined: 19 Aug 07
Posts: 42
Credit: 28,391,082
RAC: 0
Level
Val
Scientific publications
watwatwatwatwatwatwat
Message 56817 - Posted: 2 Apr 2021 | 4:02:29 UTC

I am curious to hear how long it would take to run 1 of these big tasks on Windows 10 with a RTX 2070? TIA

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,943,927,024
RAC: 10,733,819
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56818 - Posted: 2 Apr 2021 | 8:58:06 UTC - in response to Message 56817.

I am curious to hear how long it would take to run 1 of these big tasks on Windows 10 with a RTX 2070? TIA

You have available this rod4x4 comprehensive runtime summary, in which RTX 2070 is included.

ADRIA D3RBandit task runtime summary

Mads Nissen
Send message
Joined: 23 Jun 11
Posts: 2
Credit: 281,376,224
RAC: 190,962
Level
Asn
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 56819 - Posted: 3 Apr 2021 | 20:01:02 UTC

I just wish that my Nvidia RTX 2080 graphics card, and my Nvidia GTX 1080 in another PC, would get something to do. It seems everything has stalled and I receive no tasks ..

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56820 - Posted: 3 Apr 2021 | 20:23:50 UTC - in response to Message 56819.

I just wish that my Nvidia RTX 2080 graphics card, and my Nvidia GTX 1080 in another PC, would get something to do. It seems everything has stalled and I receive no tasks ..


the tasks are running thin the past few days. the project seems to not be distributing new work right now. only expect a random resend every now and then until the project starts creating tasks again.
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56821 - Posted: 4 Apr 2021 | 5:22:32 UTC - in response to Message 56820.

... the tasks are running thin the past few days.

let's face it: NOT the past few days, rather the past few weeks :-(

And it's too bad that we are not given even any tentative information as to when we can expect new tasks.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56822 - Posted: 4 Apr 2021 | 16:10:46 UTC - in response to Message 56821.

i had a rather consistent run of work until only a few days ago.
____________

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,943,927,024
RAC: 10,733,819
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56823 - Posted: 4 Apr 2021 | 18:51:34 UTC

This wasn't a working week at Spanish universities.
Also, tomorrow (Monday) is a holiday in Barcelona: Easter Monday.
In the meantime, the project seems to have been coasting along, slowly running out of tasks.

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 56828 - Posted: 9 Apr 2021 | 18:01:33 UTC

What's this?
1_4-CRYPTICSCOUT_pocket_discovery_06717f9d_f915_4b08_b353_b636b8abc488-0-2-RND6700...
Looks like direct virus research to me from the label.

It only takes around 13 hours to run on a GTX 1060 3GB.

Cheers and best success for this project. We're ready to crunch.
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56829 - Posted: 9 Apr 2021 | 18:17:42 UTC - in response to Message 56828.

13hrs? wow! it doesn't look like you've submitted that task yet though. your host details say it's still in progress.

I got one too, ran for about 45mins on my 2080ti. I wonder if you got an unusually long one or something. a 2080ti is certainly faster than a 1060 3GB, but not 15x faster. mine paid about 22,000 creds.
____________

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 56830 - Posted: 9 Apr 2021 | 20:09:51 UTC - in response to Message 56829.

Now that you mention it Ian, I found it running in tandem with a FAHcore task when it was already 50% finished. I paused the F@H task and the rest went much faster. I see the CPU time was around half an hour less, showing that this machine was a bit overloaded by me running Winamp with MilkDrop visualization in the desktop mode. Meanwhile I was web-browsing, mail, etc.

My GTX 1650 host erred on a Bandit WU - it might be running too fast, but this is the first task to fail after quite a few Bandit tasks. "particle is not a number" errors usually appear as the program is starting when overclocking is the cause, IIRC. The same error appeared on computer 362147, the first to run it. Keith Myers successfully completed it.
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56831 - Posted: 9 Apr 2021 | 22:46:30 UTC

I wish they would leak out a few more of these one-offs. My RAC is plummeting.

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,196,461,293
RAC: 1,617,116
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56832 - Posted: 10 Apr 2021 | 1:05:34 UTC - in response to Message 56831.

I wish they would leak out a few more of these one-offs. My RAC is plummeting.
+1

Philip C Swift [Gridcoin]
Send message
Joined: 23 Dec 18
Posts: 12
Credit: 50,868,500
RAC: 0
Level
Thr
Scientific publications
wat
Message 56834 - Posted: 11 Apr 2021 | 9:52:45 UTC
Last modified: 11 Apr 2021 | 9:55:02 UTC

Re: D3RBanditTest

Philip C Swift [Gridcoin]
Send message
Joined: 23 Dec 18
Posts: 12
Credit: 50,868,500
RAC: 0
Level
Thr
Scientific publications
wat
Message 56835 - Posted: 11 Apr 2021 | 9:53:57 UTC

Any news on WU's coming up?
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56836 - Posted: 12 Apr 2021 | 15:58:22 UTC

Got a couple of the crypticscout_pocket_discovery work units last night.

Ran for a couple of hours on my 2080's

One-offs apparently.

Speedy
Send message
Joined: 19 Aug 07
Posts: 42
Credit: 28,391,082
RAC: 0
Level
Val
Scientific publications
watwatwatwatwatwatwat
Message 56838 - Posted: 12 Apr 2021 | 20:52:03 UTC - in response to Message 56836.


One-offs apparently.

Keith, by chance was it a resend? The reason I ask is that there are currently only 16 in progress

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56839 - Posted: 12 Apr 2021 | 23:44:18 UTC - in response to Message 56838.
Last modified: 12 Apr 2021 | 23:47:36 UTC


One-offs apparently.

Keith, by chance was it a resend? The reason I ask is that there are currently only 16 in progress

Nope, initial two task replications.
https://www.gpugrid.net/workunit.php?wuid=27048098
https://www.gpugrid.net/workunit.php?wuid=27048097

tsain
Send message
Joined: 11 Jan 21
Posts: 1
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 56840 - Posted: 17 Apr 2021 | 15:56:55 UTC

Hello: I have a question about GPUGRID tasks. I have tried 2 times and the same problem occurs every time: I can log in normally, but I can't get any tasks! How can I troubleshoot this problem?

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56841 - Posted: 17 Apr 2021 | 17:10:01 UTC - in response to Message 56840.

There is no work being produced, or very, very little actually.

So the chance of snagging any of the sporadic tasks is very low.

I would try some other project for work that has a better supply.

World Community Grid is producing gpu work on a limited basis now.

Might try that project which is doing similar work to GPUGrid.

But it too is also not producing as much work as the demand.

But much more than GPUGrid.

Profile JStateson
Avatar
Send message
Joined: 31 Oct 08
Posts: 186
Credit: 3,331,546,800
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56842 - Posted: 17 Apr 2021 | 18:29:37 UTC - in response to Message 56841.

There is no work being produced, or very, very little actually.

So the chance of snagging any of the sporadic tasks is very low.

I would try some other project for work that has a better supply.

World Community Grid is producing gpu work on a limited basis now.

Might try that project which is doing similar work to GPUGrid.

But it too is also not producing as much work as the demand.

But much more than GPUGrid.


This is all quite unfortunate. I was a long-time believer in GPUGrid going back to when they could run on PlayStations. I recall they "owned" the ps2grid name at one time. WCG quit developing for GPUs some time ago, but I switched to them as soon as I found out they were developing again. Their GPU tasks are all beta, so one has to sign up for the beta program. Some statistics I put together there:
https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,43367

Greger
Send message
Joined: 6 Jan 15
Posts: 74
Credit: 14,345,001,749
RAC: 31,236,125
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 56843 - Posted: 17 Apr 2021 | 18:42:49 UTC - in response to Message 56842.

They have left beta and are now in production, but with a low amount of work units (1,700 work units on average every 30 minutes).

I hope GPUGrid would step up in this game.

GPUGrid hasn't entirely left the PlayStation days behind: they still hold the domain http://www.ps3grid.net

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56844 - Posted: 18 Apr 2021 | 6:57:28 UTC - in response to Message 56843.


I hope GPUGrid would step up in this game.

unfortunately, no indication at this time :-(
Also, so far no progress as far as Ampere cards are concerned :-(
While Ampere works on WCG and F@H

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 56845 - Posted: 20 Apr 2021 | 1:09:15 UTC - in response to Message 56842.

I have caught a few World Community Grid GPU tasks. IIRC, they were OpenCL. They ran concurrently with FahCore CUDA tasks without too much slowdown. I also noted that some WCG tasks ran on my Intel GPUs. Intel GPUs are being used also by Einstein@home in OpenCL.

..and now for something completely different...

Machine Learning Comprehension is getting close to completing the crunching stage of their project.
https://www.mlcathome.org/
Please join me and help finish up the project. There are only 2,308 users currently.
GPU tasks are available as CUDA only.
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Speedy
Send message
Joined: 19 Aug 07
Posts: 42
Credit: 28,391,082
RAC: 0
Level
Val
Scientific publications
watwatwatwatwatwatwat
Message 56846 - Posted: 20 Apr 2021 | 21:20:58 UTC - in response to Message 56845.


..and now for something completely different...

Machine Learning Comprehension is getting close to completing the crunching stage of their project.
https://www.mlcathome.org/
Please join me and help finish up the project. there are only 2308 users currently.
GPU tasks are available as CUDA only.

I don't see anything suggesting this project is nearing the end of "completing the crunching stage of the project"; however, I do believe they are entering the home "stretch" on "DS3".

To bring this thread back on topic: I have not received any D3RBanditTest tasks (I am aware they are hard to come by)

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,606,061,851
RAC: 8,672,972
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56849 - Posted: 22 Apr 2021 | 16:03:44 UTC

Wow - I caught a Bandit! task 32565770: I happened to ask 1 second after it was created.

micropro
Send message
Joined: 4 Feb 20
Posts: 8
Credit: 674,423
RAC: 0
Level
Gly
Scientific publications
wat
Message 56850 - Posted: 22 Apr 2021 | 20:23:40 UTC - in response to Message 56794.

but weak in comparison to other GPUs available. a 1080ti will take 18hrs. and 2080ti takes 10+hrs. so it makes sense that something like a 1650 would take several days. these new tasks are really huge.


Several days with a GTX 1650 at stock speed? (I always run my GPU at stock speed when it's doing BOINC work.)

My RTX 2060 died, and a new one was the best I could get so far (and actually it runs pretty cool on tasks where my old (and probably defective) RTX was at least 15°C higher).
Also, I know some projects are not Ampere-ready yet, so I'm sticking with Turing for now.

So the question now is... do these long new tasks on GPUGRID have checkpoints, i.e. is the work already done saved when the task is stopped?

I'd like to at least try (and complete) one of these tasks. But if it means running a GPU task for 10 hours and losing the work at the end of the day, I think we'll agree it's not worth it under those conditions.

Meanwhile... yeah, WCG has started putting out GPU tasks, which work on the Ampere architecture as well as previous architectures (I asked on their forum). But they are very few and very fast tasks.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56852 - Posted: 23 Apr 2021 | 16:27:39 UTC - in response to Message 56850.

The tasks checkpoint and can be stopped and restarted . . . . as long as the task is restarted on the same type of device.

If the task starts on a 2080 and then restarts on a 2070, the task will instantly error out with a message stating "unable to restart on a different device"

micropro
Send message
Joined: 4 Feb 20
Posts: 8
Credit: 674,423
RAC: 0
Level
Gly
Scientific publications
wat
Message 56853 - Posted: 23 Apr 2021 | 17:13:38 UTC - in response to Message 56852.

The tasks checkpoint and can be stopped and restarted . . . . as long as the task is restarted on the same type of device.

If the task starts on a 2080 and then restarts on a 2070, the task will instantly error out with a message stating "unable to restart on a different device"


Thank you for the answers concerning the checkpoints.

Good to know as well that you cannot change the device a task is "linked" to once it has begun working on that specific task.

Waiting for tasks now, but I'm not in a hurry; the GPU's working on another task for now (more as a test of my hardware, but I still have at least 10 hours to run).

tullio
Send message
Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Level
Cys
Scientific publications
wat
Message 56854 - Posted: 27 Apr 2021 | 3:53:34 UTC

I am getting World Community Grid OpenPandemics-COVID-19 GPU tasks which take about three to four minutes on my GTX 1060 board.
Tullio
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 56855 - Posted: 27 Apr 2021 | 20:46:34 UTC

The new beta "stress" WCG OPNG tasks are much larger and taking up to 15 minutes to compute on a RTX 2080.

Seen up to 165 jobs in a single task. Credit = ~ 1400-1600

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56857 - Posted: 28 Apr 2021 | 17:37:30 UTC - in response to Message 56855.

The OPNG tasks vary in length, and are not marked beta anymore, for whatever that is worth. So you have to average the times.

Under Ubuntu 20.04.2, they are averaging 7:37 on a GTX 1060, and 20:12 on a GTX 750 Ti. That is very nice, but they could go longer. I think the scientists are having a problem creating enough of them.

tullio
Send message
Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Level
Cys
Scientific publications
wat
Message 56858 - Posted: 29 Apr 2021 | 5:16:14 UTC
Last modified: 29 Apr 2021 | 5:17:01 UTC

Sorry, double post.
Tullio
____________

tullio
Send message
Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Level
Cys
Scientific publications
wat
Message 56859 - Posted: 29 Apr 2021 | 5:16:14 UTC

The latest OPNG tasks took about 20 minutes on my GTX 1060. In the first 7 minutes the GPU was not engaged.
Tullio
____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56885 - Posted: 21 May 2021 | 15:32:48 UTC

Glad to see a large batch of these new units coming out. should keep us well fed for another few weeks.

my fast GPUs really like these long units.

~10hrs on a 2080ti (225W)
~13hrs on a 2080 (185W)
~17hrs on a 2070 (150W)
~27hrs on a 1660S (100W)
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56886 - Posted: 21 May 2021 | 18:11:39 UTC - in response to Message 56885.

Glad to see a large batch of these new units coming out.

I suspect they still won't run on Ampere cards ?

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56887 - Posted: 21 May 2021 | 18:29:38 UTC - in response to Message 56886.

nope. still CUDA 10.0/10.1 = no Ampere support. still waiting for that CUDA 11.1+ app.

note, the compatibility issue is with the application, not the tasks. there are many types of tasks here (MDAD, Pocket Discovery, D3RBandit, etc) but they all use the same acemd3 app.

keep tabs on the applications the project has available here: https://www.gpugrid.net/apps.php

unless you see "cuda111" or "cuda112" listed, don't count on Ampere support
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56888 - Posted: 21 May 2021 | 19:18:56 UTC - in response to Message 56887.

unless you see "cuda111" or "cuda112" listed, don't count on Ampere support

thanks for the information; what a pity :-(

bozz4science
Send message
Joined: 22 May 20
Posts: 109
Credit: 68,936,176
RAC: 0
Level
Thr
Scientific publications
wat
Message 56892 - Posted: 26 May 2021 | 15:17:09 UTC

Does anyone know, by any chance, what the current batch of tasks (D3RBandit) is all about? What are we computing?

And any pointer as to what the nmax parameter indicates? I have seen nmax 1000/2000 and 5000 WUs, but all are taking pretty much the same time to compute.

Thx

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 56893 - Posted: 26 May 2021 | 15:43:29 UTC - in response to Message 56887.

Lately I've caught lots of WUs that have bounced off one or more hosts running Ampere GPUs. There is much untapped resource, even with the new anti-mining feature.
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56894 - Posted: 26 May 2021 | 15:51:09 UTC - in response to Message 56893.

Lately I've caught lots of WUs that have bounced off one or more hosts running Ampere GPUs. There is much untapped resource, even with the new anti-mining feature.


I gave up waiting for Ampere support here. It was clear the project devs have it at lowest priority (every time I asked about it, I was ignored, even when they were responsive about any other topic).

I traded my 3070 for a 2080ti and moved on.
____________

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 56899 - Posted: 26 May 2021 | 22:33:04 UTC - in response to Message 56894.

Ian (and Bozz4science), something else we're apparently not allowed to ask is what it is we are crunching.

I was so bold as to put it on the wish list; no reply. I don't mean that anyone should reveal proprietary info, just a general categorization of the type of research it is, as Toni did on the previous methods project.
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

bozz4science
Send message
Joined: 22 May 20
Posts: 109
Credit: 68,936,176
RAC: 0
Level
Thr
Scientific publications
wat
Message 56903 - Posted: 27 May 2021 | 14:42:37 UTC
Last modified: 27 May 2021 | 14:43:32 UTC

Yeah, sadly that is very disappointing to say the least. Information policy here is annoying sometimes due to its non-existence. I am just a small fish with my little machine, but it would certainly drive me nuts not getting a single statement from the project team with respect to planned Ampere support.

Otherwise, IMO GPUGrid is certainly doing many things right in terms of website curation and the research publications list, but I hate not knowing what I am computing for atm, ahead of any prospective paper months or years down the line. What is so hard about telling us the top-level category of research our GPUs are computing? Is it about cancer? CoV-2? Methods? The brain? Is that too much to ask for? No one expects much more than that. In that regard I think F@H is much ahead: they offer more comprehensive and easy-to-access information about any WU/project a volunteer is computing for.

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56905 - Posted: 27 May 2021 | 17:59:59 UTC - in response to Message 56903.

Yeah, sadly that is very disappointing to say the least. Information policy here is annoying sometimes due to its non-existence. I am just a small fish with my little machine, but it would certainly drive me nuts not getting a single statement from the project team with respect to planned Ampere support.

+ 1

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,943,927,024
RAC: 10,733,819
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56907 - Posted: 28 May 2021 | 10:59:35 UTC

Yeah, sadly that is very disappointing to say the least. Information policy here is annoying sometimes due to its non-existence. I am just a small fish with my little machine, but it would certainly drive me nuts not getting a single statement from the project team with respect to planned Ampere support.

Otherwise, IMO GPUGrid is certainly doing many things right in terms of website curation and the research publications list, but I hate not knowing what I am computing for atm, ahead of any prospective paper months or years down the line. What is so hard about telling us the top-level category of research our GPUs are computing? Is it about cancer? CoV-2? Methods? The brain? Is that too much to ask for? No one expects much more than that. In that regard I think F@H is much ahead: they offer more comprehensive and easy-to-access information about any WU/project a volunteer is computing for.

+1 (adding "Please") <--|
->-----------------------------|

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56908 - Posted: 28 May 2021 | 14:27:18 UTC

over 4000 tasks ready to send.

looks like we'll have work available for some time to come.
____________

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,943,927,024
RAC: 10,733,819
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56909 - Posted: 28 May 2021 | 16:12:01 UTC

The error on the D3RBandit task below wasn't due to the known "restart on a different device" problem.
This failed task always ran on the same device 0.
It was actually a reboot after an Nvidia driver update from version 460.80 to version 465.27.
I had suspended BOINC activity during the transition, but it wasn't enough to keep the task from failing...
I take note of this.
Next time, I'll schedule such a driver version upgrade for a moment when no GPUGrid tasks are running.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56910 - Posted: 28 May 2021 | 17:29:31 UTC - in response to Message 56909.

I've seen this happen on occasion too. Definitely have to be more careful with these long running D3RBandit tasks, or risk throwing away a lot of computation time.
____________

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 399
Credit: 13,024,100,382
RAC: 766,238
Level
Trp
Scientific publications
watwatwat
Message 56914 - Posted: 29 May 2021 | 11:32:15 UTC

These current WUs perform worse than anything I've ever seen from GG. Far more failures than even WUs that run.
____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56915 - Posted: 29 May 2021 | 12:23:00 UTC - in response to Message 56914.
Last modified: 29 May 2021 | 12:53:10 UTC

These current WUs perform worse than anything I've ever seen from GG. Far more failures than even WUs that run.

sounds like something wrong on your end. I've had very few failures.

I have only 2 legitimate computation errors with the latest D3RBandit series, from any of my systems, of the hundreds of tasks that I've processed in the past few weeks. that's not including things like me aborting them for whatever reason, or the server cancelling a resend, or a download error.

one of the failures was a bad WU (all hosts failed)
the other looks like it was some random problem on my host, as it was processed by another host eventually

if you un-hide your hosts, I might be able to see what the problem is.
____________

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,943,927,024
RAC: 10,733,819
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56918 - Posted: 29 May 2021 | 12:52:18 UTC - in response to Message 56914.

These current WUs perform worse than anything I've ever seen from GG. Far more failures than even WUs that run.

That would have an explanation if you had upgraded your hosts to Ampere GPUs.
That series of graphics cards is not supported by the current GPUGrid applications, and every task will fail immediately.

bozz4science
Send message
Joined: 22 May 20
Posts: 109
Credit: 68,936,176
RAC: 0
Level
Thr
Scientific publications
wat
Message 56919 - Posted: 29 May 2021 | 12:56:20 UTC

Those Anaconda Python 3 Environment tasks admittedly had a very high failure rate, but they were part of a test batch without the intent to compute on actual data. D3RBandit tasks are finishing just fine except for the known suspend/resume issue.

Hope you can quickly figure out what is causing this issue for you

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56920 - Posted: 29 May 2021 | 13:01:02 UTC - in response to Message 56919.

Those Anaconda Python 3 Environment tasks admittedly had a very high failure rate, but they were part of a test batch without the intent to compute on actual data. D3RBandit tasks are finishing just fine except for the known suspend/resume issue.

Hope you can quickly figure out what is causing this issue for you


agreed that the Python tasks had a lot of failures, but this is the D3RBandit thread so I have to assume he's referring to those.

____________

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56928 - Posted: 3 Jun 2021 | 8:15:04 UTC

There are 200 workunits left.
That supply will last for about 8 hours.

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 56929 - Posted: 4 Jun 2021 | 2:38:55 UTC

I have found a small clue to what it is we have been crunching, at least the cryptic scouts. Please see this video.
https://www.youtube.com/watch?v=O3biz-q8VCI&t=95s

If the rest of Adria's research is consistent with the cryptic scouts, then it is most likely research on druggable cryptic pockets of protein structures. That could well relate to anti-viral drugs, if you recall Toni's announcement that Covid-related research was coming.

So, bozz4science (et al), there appears to be a possibility that we are crunching research very similar to the Covid Moonshot project (e.g. F@H and World Community Grid), that is, finding a low-cost and easily dispensed drug against the Covid virus.

Only my speculation, though.

____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Scientific publications
watwat
Message 56930 - Posted: 4 Jun 2021 | 2:53:16 UTC
Last modified: 4 Jun 2021 | 3:12:29 UTC

Here's a bit of promise for owners of Ampere GPUs: https://www.acellera.com/index.php/2021/04/06/release-of-acemd-3-4/

This new version brings support to the latest NVIDIA GPUs, including the Ampere architecture, as well as performance improvements. The simulation speed has been benchmarked against several systems at typical production conditions on different GPU devices (including GTX 1080, GTX1080 Ti, RTX 2080 Ti and RTX 3090). For the DHFR benchmark, on RTX 3090, ACEMD achieves the speed of ~1.3 µs/day.


This appears to explain why this project has not upgraded so far. Acellera is the owner and developer of the software, so we had to wait on them, not on the GPUGRID team. Hopefully the license here will not need to be upgraded ($$$).
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 56931 - Posted: 4 Jun 2021 | 18:28:04 UTC - in response to Message 56929.
Last modified: 4 Jun 2021 | 19:03:22 UTC

I have found a small clue to what it is we have been crunching, at least the cryptic scouts. Please see this video.
https://www.youtube.com/watch?v=O3biz-q8VCI&t=95s

If the rest of Adria's research is consistent with the cryptic scouts, then it is most likely research on druggable cryptic pockets of protein structures. That could well relate to anti-viral drugs, if you recall Toni's announcement that Covid-related research was coming.

So, bozz4science (et al), there appears to be a possibility that we are crunching research very similar to the Covid Moonshot project (e.g. F@H and World Community Grid), that is, finding a low-cost and easily dispensed drug against the Covid virus.

Only my speculation, though.


i don't believe they are the same research.

and i've noticed some significant differences in GPU behavior between running these two task types.

The CRYPTICSCOUT tasks seem to be more computationally intense: clocks on my 2080ti are ~100MHz lower than when running D3RBandit, at the same power limit of 225W. Also, the PCIe utilization is about 2x for CRYPTICSCOUT (41% @ PCIe 3.0 x16) vs D3RBandit (23% @ PCIe 3.0 x16). Similar observations came from a 2070 on a completely separate system: CRYPTICSCOUT (60% @ PCIe 3.0 x8) vs D3RBandit (39% @ PCIe 3.0 x8), with a 90-100MHz clock reduction at the same power limit (150W). Also noted a bit higher VRAM use.

all of this leads me to believe that this is most likely something completely different.
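
if anyone wants to collect the same kind of numbers, here's a minimal logging sketch (assuming nvidia-smi is installed; the query fields used are standard --query-gpu options, though the PCIe busy percentages above came from monitoring tools like GPU-Z or nvidia-smi dmon rather than this query):

import subprocess
import time

# Poll nvidia-smi once a minute and print SM clock, power draw, GPU
# utilization and VRAM use, so two task types can be compared side by side.
FIELDS = "timestamp,name,clocks.sm,power.draw,utilization.gpu,memory.used"

while True:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + FIELDS, "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    print(out.stdout.strip())
    time.sleep(60)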
____________

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56933 - Posted: 5 Jun 2021 | 21:23:43 UTC - in response to Message 56930.

Here's a bit of promise for owners of Ampere GPUs: https://www.acellera.com/index.php/2021/04/06/release-of-acemd-3-4/

The news from Acellera is 2 months old by now. Any update on using the new version here?

MrS
____________
Scanning for our furry friends since Jan 2002

STARBASEn
Avatar
Send message
Joined: 17 Feb 09
Posts: 91
Credit: 1,603,303,394
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 57002 - Posted: 22 Jun 2021 | 17:22:33 UTC
Last modified: 22 Jun 2021 | 17:23:21 UTC

Well these latest runs do not seem to take kindly to system restarts. https://www.gpugrid.net/result.php?resultid=32625822 and https://www.gpugrid.net/result.php?resultid=32625926. Two different machines each with a GTX1060.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 57003 - Posted: 22 Jun 2021 | 18:13:43 UTC - in response to Message 57002.

Well these latest runs do not seem to take kindly to system restarts. https://www.gpugrid.net/result.php?resultid=32625822 and https://www.gpugrid.net/result.php?resultid=32625926. Two different machines each with a GTX1060.

this is a long-standing "problem"/idiosyncrasy with the acemd3 app. not really specific to these new tasks, though it was less of a setback with the shorter tasks to lose maybe 10-20mins of work vs the several hours you could lose with these long-running tasks. even the fastest GPUs that can contribute at the moment (2080ti) will take 12hrs to complete a task. system restarts should be avoided/delayed until the WU processing is complete.

yours seems to have exited without saying why though, which is a little strange. usually system restarts could give an error stating that you cant restart on a different device (even when it restarted on the same device) but that doesn't look to have happened for you.

still. I'd recommend not restarting the system until computation has finished.
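
if you script your reboots, a minimal pre-reboot check could look like the sketch below (assuming boinccmd is on PATH and that its --get_tasks output labels each task with its "project URL" and an "active_task_state: EXECUTING" line while it runs; exact labels may vary with client version):

import subprocess
import sys

# Refuse to proceed with a scripted reboot while a GPUGRID task is executing.
out = subprocess.run(["boinccmd", "--get_tasks"],
                     capture_output=True, text=True).stdout

busy = False
project = ""
for line in out.splitlines():
    line = line.strip()
    if line.startswith("project URL:"):
        project = line.lower()
    elif line.startswith("active_task_state:") and "EXECUTING" in line:
        if "gpugrid" in project:
            busy = True

if busy:
    print("GPUGRID task still running - postponing reboot.")
    sys.exit(1)
print("No GPUGRID task running - safe to reboot.")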
____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 57004 - Posted: 22 Jun 2021 | 22:23:00 UTC - in response to Message 57002.

Well these latest runs do not seem to take kindly to system restarts. https://www.gpugrid.net/result.php?resultid=32625822 and https://www.gpugrid.net/result.php?resultid=32625926. Two different machines each with a GTX1060.


I actually picked up a resend of this one. failed on my system too. but it failed immediately. not sure why yours ran for several hours before failing. but it looks like all the resends are failing too. must be a problem with the WU.
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1284
Credit: 4,927,231,959
RAC: 6,464,152
Level
Arg
Scientific publications
watwatwatwatwat
Message 57005 - Posted: 22 Jun 2021 | 23:59:49 UTC

I had 3 insta fails on one of my hosts this morning also.

goldfinch
Send message
Joined: 5 May 19
Posts: 31
Credit: 395,274,685
RAC: 609,056
Level
Asp
Scientific publications
wat
Message 57022 - Posted: 28 Jun 2021 | 11:45:49 UTC

Can these new tasks be processed on a laptop? My GTX 1060 shows between 0.36% and 0.72% per hour, while in the past it used to show 1.08% per hour. At such rates I am missing deadlines... And, according to GPU-Z, while working at as low as 600-700 MHz, it has a temperature of 91-92 C. I already asked this question... Is something wrong with my Gigabyte Aero 15, or should I expect better performance from a laptop? Or is it even impossible due to poor cooling compared to desktops/rigs?

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 57025 - Posted: 28 Jun 2021 | 13:44:09 UTC - in response to Message 57022.

Can these new tasks be processed on a laptop? My GTX 1060 shows between 0.36% and 0.72% per hour, while in the past it used to show 1.08% per hour. At such rates I am missing deadlines... And, according to GPU-Z, while working at as low as 600-700 MHz, it has a temperature of 91-92 C. I already asked this question... Is something wrong with my Gigabyte Aero 15, or should I expect better performance from a laptop? Or is it even impossible due to poor cooling compared to desktops/rigs?


what are your ambient temps?

laptops will run hot no matter what. and with BOINC loads, and especially a project like GPUGRID which pushes the GPU harder than other projects, you should expect to see very high temps and thermal throttling. you might be able to help the situation a little by blowing out the fans if they are clogged up with dust or something. even with clear airflow, the laptop will likely still thermal throttle, but you should get better GPU clocks and process faster. you might not make the 5-day deadline still. these tasks are very long running.
____________

goldfinch
Send message
Joined: 5 May 19
Posts: 31
Credit: 395,274,685
RAC: 609,056
Level
Asp
Scientific publications
wat
Message 57032 - Posted: 29 Jun 2021 | 1:31:29 UTC - in response to Message 57025.

Thanks for confirming what I was suspecting. Yes, I blow out the intake grid every now and then, and my laptop is in the coldest room in the house, with the temperature dropping to 10-15 C overnight (it's winter in Australia now). But it's still throttling due to high temperature, and I don't want to increase the threshold beyond the current ninety-something degrees...

Surprisingly, the last task, despite taking more than 6 days and missing the deadline, was rewarded. But I wish it could be processed faster to make results timely...

Kind regards,
Vlad.

jjch
Send message
Joined: 10 Nov 13
Posts: 98
Credit: 15,288,150,388
RAC: 1,732,962
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57033 - Posted: 29 Jun 2021 | 2:53:56 UTC - in response to Message 57032.

Laptops generally are not the best for CPU/GPU computing. I don't know if you already have one, but get a decent cooling pad to set your laptop on. The best ones have adjustable fans that can be moved to the correct locations to obtain the best configuration. Also, if you are up to it, I would suggest cleaning the internal workings of the laptop and replacing the thermal compound on the CPU and GPU.

STARBASEn
Avatar
Send message
Joined: 17 Feb 09
Posts: 91
Credit: 1,603,303,394
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 57034 - Posted: 29 Jun 2021 | 16:16:25 UTC

Surprisingly, the last task, despite taking more than 6 days and missing the deadline, was rewarded. But I wish it could be processed faster to make results timely...

It has been my experience that if you start a work unit before the deadline but complete it past the deadline, you will receive credit as long as a resend does not return first. With these new long WU's, it would be tough for a resend to complete before one already in progress completes.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1031
Credit: 35,645,807,483
RAC: 75,297,575
Level
Trp
Scientific publications
wat
Message 57035 - Posted: 29 Jun 2021 | 16:34:50 UTC - in response to Message 57034.

Surprisingly, the last task, despite taking more than 6 days and missing the deadline, was rewarded. But I wish it could be processed faster to make results timely...

It has been my experience that if you start a work unit before the deadline but complete it past the deadline, you will receive credit as long as a resend does not return first. With these new long WU's, it would be tough for a resend to complete before one already in progress completes.


really depends on the relative performance of the systems in question. my 2080tis can knock these tasks out in 12hrs.

if the original task hits a system that will take 6 days to complete, and the resend makes it to me, I will complete it in 12hrs from the 5 day deadline and still beat the original system.
____________

ChrisA
Send message
Joined: 6 Oct 21
Posts: 6
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 57551 - Posted: 9 Oct 2021 | 9:30:49 UTC - in response to Message 57035.

I joined GPUGrid only a few days ago but have yet to be sent any work. For much of this time, there hasn't been any available but, even when work is available, I haven't been sent any.

I am running on a Windows 10 PC with a 1050ti card. Is it simply the case that I am underpowered for the work that has come up (and so it is not sent to me), or could it be something else? PrimeGrid runs without any problems.

Any thoughts welcome.

Apologies in advance if this is not an appropriate thread to raise this - but as I don't have any credit, I couldn't start my own thread.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,943,927,024
RAC: 10,733,819
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57552 - Posted: 9 Oct 2021 | 10:27:39 UTC - in response to Message 57551.
Last modified: 9 Oct 2021 | 10:41:02 UTC

I joined GPUGrid only a few days ago but have yet to be sent any work. For much of this time, there hasn't been any available but, even when work is available, I haven't been sent any.

Please, consult this FAQ - Acemd3 application general guidelines thread.
Especially the paragraphs:
What should I do to receive acemd3-based workunits?
What driver/card/OS combinations are supported?
Your first step would be to upgrade your current unsupported Nvidia v384.76 drivers to a newer supported version.

Edit:
Also bear in mind that the current extremely long tasks will take about 3 to 4 days to process for a GTX 1050 Ti card working 24/7.

ChrisA
Send message
Joined: 6 Oct 21
Posts: 6
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 57553 - Posted: 9 Oct 2021 | 11:19:37 UTC - in response to Message 57552.

Many thanks.

ChrisA
Send message
Joined: 6 Oct 21
Posts: 6
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 57570 - Posted: 10 Oct 2021 | 20:30:50 UTC - in response to Message 57552.

I updated my drivers as suggested and work has been successfully downloaded.

Even though my PC is on 24/7 it has managed only 10% in the first 24 hours. This means that I will exceed the deadline (which is only four days away) by a very considerable margin.

Should I just abort the task or is there still some merit in letting it finish (in another nine days or so)?
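
(For what it's worth, a simple linear extrapolation from my own numbers makes the margin concrete: estimated total runtime = elapsed time / fraction done = 24 h / 0.10 = 240 h, i.e. about 10 days against the 5-day deadline.)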

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,943,927,024
RAC: 10,733,819
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57571 - Posted: 10 Oct 2021 | 20:59:27 UTC - in response to Message 57570.

Even though my PC is on 24/7 it has managed only 10% in the first 24 hours

I've recently awakened my GTX 1050 Ti host, and it is currently processing task e7s79_e5s156p1f761-ADRIA_AdB_KIXCMYB_HIP-0-2-RND3813_1.
Its progress is about 33% after 27 hours...
I'm estimating it to finish in a total processing time of about 3 days and 9 hours, well inside its deadline for getting 450,000 credits.
I guess that there is something at your setup slowing down your GTX 1050 Ti performance.
My trick consists of setting BOINC Manager preferences to "Use at most 50% of the CPUs" on my 4-thread CPU system.
This way, I ensure that CPU resources are not overcommitted, and the GPU is fed at its maximum performance.
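
For reference, the same cap can be written into a global_prefs_override.xml file in the BOINC data directory (a sketch using BOINC's documented override-preferences elements; the client re-reads it after boinccmd --read_global_prefs_override or a restart):

<global_preferences>
   <!-- Use at most 50% of the CPU threads, leaving headroom to feed the GPU -->
   <max_ncpus_pct>50.0</max_ncpus_pct>
</global_preferences>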

Erich56
Send message
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 18,783,925
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwat
Message 57573 - Posted: 11 Oct 2021 | 2:57:14 UTC - in response to Message 57570.
Last modified: 11 Oct 2021 | 2:57:45 UTC

I updated my drivers as suggested and work has been successfully downloaded.

Even though my PC is on 24/7 it has managed only 10% in the first 24 hours. This means that I will exceed the deadline (which is only four days away) by a very considerable margin.

Should I just abort the task or is there still some merit in letting it finish (in another nine days or so)?

go ahead and abort the task. There is no merit in having it finished after the deadline. You will not get any credits.

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,196,461,293
RAC: 1,617,116
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57574 - Posted: 11 Oct 2021 | 3:37:00 UTC - in response to Message 57573.

I updated my drivers as suggested and work has been successfully downloaded.

Even though my PC is on 24/7 it has managed only 10% in the first 24 hours. This means that I will exceed the deadline (which is only four days away) by a very considerable margin.

Should I just abort the task or is there still some merit in letting it finish (in another nine days or so)?

go ahead and abort the task. There is no merit in having it finished after the deadline. You will not get any credits.

I disagree! This is your first WU ever on this system, so BOINC will not give an adequate time estimate! If ServicEnginIC estimates that his GTX 1050ti will finish this WU in about 3 days and 9 hours, there is a high chance that your GPU will finish in this time frame as well! BOINC needs about 10 completed WUs before it estimates times correctly!
Do not abort!

ChrisA
Send message
Joined: 6 Oct 21
Posts: 6
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 57578 - Posted: 11 Oct 2021 | 8:58:32 UTC - in response to Message 57571.


My trick consists of setting Boinc Manager preferences to Use at most 50% of the CPUs on my 4 threads-CPU system.
This way, I ensure that CPU resources are not overcommitted for feeding the GPU at its maximum performance.


I adjusted the settings as suggested and, for good measure, suspended all my other Boinc tasks. Twelve hours later, it hasn't quite reached 15% - so it is still averaging a little under 10% per 24-hour period.

Whilst this PC isn't dedicated exclusively to running Boinc - it is a normal "home computer" that is used for everyday things and usually runs backups etc. overnight - I am surprised it is not able to achieve something at least a bit closer to the results you are getting with your 1050ti.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57579 - Posted: 11 Oct 2021 | 10:17:32 UTC - in response to Message 57578.

Whilst this PC isn't dedicated exclusively to running Boinc - it is a normal "home computer" that is used for everyday things and usually runs backups etc overnight - I am surprised it is not able to achieve something at least a bit closer to the results you are getting with your 1050ti.
The CPU of this system is 12 years old, probably paired with DDR2 memory and PCIe 2.0. If your GPU is not connected to an x16-capable PCIe slot, that reduces the GPU's bandwidth even further. I suggest disabling all other CPU crunching on this host, or putting it on another project. If the task exceeds the deadline, there is little chance that you will receive any credit for it.
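If you want to check what link the card has actually negotiated, nvidia-smi can report it. A quick sketch, assuming a reasonably recent NVIDIA driver:

# report the current PCIe generation and lane width per GPU
nvidia-smi --query-gpu=name,pcie.link.gen.current,pcie.link.width.current --format=csv

On a 2009-era board, anything below gen 2 x16 will starve the GPU even more on these long workunits.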

ChrisA
Send message
Joined: 6 Oct 21
Posts: 6
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 57636 - Posted: 19 Oct 2021 | 21:12:37 UTC - in response to Message 57579.

First off, I would like to thank all those who offered views on the issues I had been experiencing with (very) slow-running GPUGrid.

At the time of my last post, nothing I tried seemed to speed things up, and I seemed destined never to complete any work units within the allocated time.

I did, however, have a breakthrough (of sorts) and I am sharing that now in case it proves to be of help to any other users.

I was on the point of aborting the task when I noticed that "elapsed time" had diverged significantly from the passage of "real" time. A look through Boinc's log showed the reason: tasks were continually being suspended, resumed, then suspended again, over and over; for long periods they barely ran at all - they were just stuck in a cycle of starting and stopping.

I looked again at Boinc's "Computing Preferences" and raised the "Suspend when non-Boinc CPU usage is above" figure from 25% (which I believe is the default setting) to 40%. Immediately, the incessant cycle of suspending and restarting ended, and the temperature of the GPU rose significantly, which suggested it was working harder than hitherto.
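For anyone who prefers to make the same change by hand, the threshold is also exposed in the global_prefs_override.xml file mentioned earlier in this thread. A sketch (the value is the non-Boinc CPU usage percentage above which computing is suspended):

<global_preferences>
   <!-- suspend computing when non-BOINC CPU usage exceeds 40% -->
   <suspend_cpu_usage>40.0</suspend_cpu_usage>
</global_preferences>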

After a period of monitoring, I found that progress was close to what ServicEnginIC reported he was able to get from his 1050ti. I gradually added back my other Boinc projects until, eventually, all four cores were running work units from other projects - none of which had an adverse effect on the speed of the GPUGrid work unit.

Unfortunately, the story does not end there. To date every single work unit that has been processed (apart from one which I aborted) has failed with a "computation error". This can happen after the work unit has been running for a few seconds, minutes, hours or, in one case, days. The Boinc log doesn't offer any clues - it just refers to the task ending and being uploaded.

The reports on my "Results" page are way beyond me, so unless anyone who does understand them is able to spot something that gives a hint of the problem, it rather looks as if I will have to end my efforts with GPUGrid - at least for now!

Thanks again for the assistance offered.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 566
Credit: 5,943,927,024
RAC: 10,733,819
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57637 - Posted: 20 Oct 2021 | 5:21:09 UTC - in response to Message 57636.
Last modified: 20 Oct 2021 | 5:43:42 UTC

The reports on my "Results" page are way beyond me, so unless anyone who does understand them is able to spot something that gives a hint of the problem, it rather looks as if I will have to end my efforts with GPUGrid - at least for now!

Maybe you'd like to give it one last try after reading the Managing non-high-end hosts thread.
I was thinking of problems like the ones you are experiencing when I wrote it...

Edit.
boost::filesystem::rename: The process cannot access the file because it is being used by another process: ".restart.chk", "restart.chk"

Also, this warning in your failed tasks could mean that some other process is interfering with the GPUGrid environment, perhaps an antivirus application.
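If an on-access scanner does turn out to be the culprit, excluding the BOINC data directory from real-time scanning usually stops it. A sketch for Windows Defender from an elevated PowerShell prompt (the path assumes a default BOINC installation; other antivirus products have equivalent exclusion lists):

# exclude the BOINC data directory from Defender's real-time scanning
Add-MpPreference -ExclusionPath "C:\ProgramData\BOINC"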

ChrisA
Send message
Joined: 6 Oct 21
Posts: 6
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 57654 - Posted: 26 Oct 2021 | 10:30:38 UTC - in response to Message 57637.
Last modified: 26 Oct 2021 | 10:32:46 UTC

Maybe you'd like to give it one last try after reading the Managing non-high-end hosts thread.
I was thinking of problems like the ones you are experiencing when I wrote it...

Edit.
boost::filesystem::rename: The process cannot access the file because it is being used by another process: ".restart.chk", "restart.chk"

Also, this warning in your failed tasks could mean that some other process is interfering with the GPUGrid environment, perhaps an antivirus application.

In the light of your further encouragement, I gave this one more go, having first read the thread to which you referred and then adopted the same settings that you use. Unfortunately, after over 12 hours, the process fell over again and ended. Clearly there is something in my system that is causing the problem. I had a look for the "restart.chk" file (while the GPU was not running) but couldn't find anything on my system; also, I note that the reference to the "restart.chk" file tends to appear at the very start of a work unit's run (rather than immediately before it gives up) - but maybe, in those first few seconds, it is sowing the seeds of failure?

Either way, I think it is time to call it a day with GPUGrid for now. I built my system in 2009 and, apart from new drives and a new(er) GPU, it is still "original", so building a new one is long overdue! Perhaps I will have better success with something more up to date.

Thanks again to all who contributed in an effort to resolve this.
