Message boards : Number crunching : Peer certificate cannot be authenticated

Erich56
Joined: 1 Jan 15
Posts: 1142
Credit: 10,906,230,840
RAC: 22,258,642
Message 57437 - Posted: 4 Oct 2021 | 5:19:54 UTC
Last modified: 4 Oct 2021 | 5:24:24 UTC

After a long break, I wanted to resume GPUGRID crunching, so I tried to download GPU tasks on my machine with a GTX 980 Ti.
However, the BOINC event log shows the following:

04.10.2021 07:14:02 | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU
04.10.2021 07:14:04 | GPUGRID | Scheduler request failed: Peer certificate cannot be authenticated with given CA certificates
04.10.2021 07:14:05 | | Project communication failed: attempting access to reference site
04.10.2021 07:14:06 | | Internet access OK - project servers may be temporarily down.
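
For anyone who wants to scan a longer event log for this failure, here is a minimal sketch. It assumes the log is saved as plain text in the "timestamp | project | message" layout shown above; the helper names `parse_line` and `certificate_failures` are made up for illustration and are not part of BOINC.

```python
from typing import List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    project: str   # empty string for client-wide messages
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one 'timestamp | project | message' line; None if it does not fit."""
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None
    return LogEntry(*parts)


def certificate_failures(lines: List[str]) -> List[LogEntry]:
    """Return the entries whose message reports a CA verification failure."""
    needle = "Peer certificate cannot be authenticated"
    entries = (parse_line(raw) for raw in lines)
    return [e for e in entries if e is not None and needle in e.message]


log = [
    "04.10.2021 07:14:02 | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU",
    "04.10.2021 07:14:04 | GPUGRID | Scheduler request failed: Peer certificate cannot be authenticated with given CA certificates",
    "04.10.2021 07:14:05 | | Project communication failed: attempting access to reference site",
    "04.10.2021 07:14:06 | | Internet access OK - project servers may be temporarily down.",
]
hits = certificate_failures(log)
for hit in hits:
    print(hit.timestamp, hit.project)
```

The last two lines have an empty project field, which is why the project is kept as a possibly empty string rather than treated as malformed.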

I don't think the server is down: on the server status page I can see the number of available tasks dropping continually.

So, what's the problem with the "Peer certificate"?

Edit: just tried it with another PC - same problem :-(
FYI: both systems are Windows 10

Erich56
Joined: 1 Jan 15
Posts: 1142
Credit: 10,906,230,840
RAC: 22,258,642
Message 57439 - Posted: 4 Oct 2021 | 5:57:14 UTC

What I also found out now:

On a new PC with two RTX 3070s (OS: Windows 10) I tried to add the GPUGRID project, but it did not work. BOINC gave me the message "adding of project failed".

However, I don't think that the GPUGRID servers are down. On the server status page I can see a constantly changing figure for "unsent" tasks, and I am able to access everything on the GPUGRID website.

So why can I neither 1) download any tasks nor 2) add GPUGRID as a project on a new PC?

Keith Myers
Joined: 13 Dec 17
Posts: 1358
Credit: 7,894,103,302
RAC: 7,266,669
Message 57440 - Posted: 4 Oct 2021 | 6:17:22 UTC

Please apply the fix for the peer certificate problem on Windows BOINC hosts that is described on the BOINC forums:
https://boinc.berkeley.edu/forum_forum.php?id=10
https://boinc.berkeley.edu/forum_thread.php?id=14413
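
A rough way to see whether a CA bundle contains expired entries is sketched below. This assumes the failure comes from an expired certificate in the CA bundle that ships with the client, which the thread itself only hints at; the sample entries and their dates are invented for illustration. Only standard-library `ssl` calls are used.

```python
import ssl
from datetime import datetime, timezone


def load_ca_entries(bundle_path):
    """Load a PEM CA bundle and return its certificates as dicts (incl. 'notAfter')."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=bundle_path)
    return ctx.get_ca_certs()


def expired_entries(entries, now_ts):
    """Keep the entries whose 'notAfter' date lies before now_ts (epoch seconds)."""
    return [e for e in entries if ssl.cert_time_to_seconds(e["notAfter"]) < now_ts]


# Hypothetical entries in the shape returned by get_ca_certs(); dates are made up.
sample = [
    {"subject": "old root", "notAfter": "Sep 30 12:00:00 2021 GMT"},
    {"subject": "current root", "notAfter": "Jun  4 12:00:00 2035 GMT"},
]
checked_at = datetime(2021, 10, 4, 5, 19, 54, tzinfo=timezone.utc).timestamp()
stale = expired_entries(sample, checked_at)
print([e["subject"] for e in stale])
```

On a real host one would call `expired_entries(load_ca_entries(path), time.time())`, where `path` points at the CA bundle file in the BOINC program folder; the exact file name and location depend on the installation.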

Erich56
Joined: 1 Jan 15
Posts: 1142
Credit: 10,906,230,840
RAC: 22,258,642
Message 57441 - Posted: 4 Oct 2021 | 6:38:34 UTC

Thank you, Keith, for the hint. I downloaded the certificate and replaced it in the BOINC program folder.
After that, downloading tasks worked fine.

However, both tasks failed after about 11 seconds with
195 (0xc3) EXIT_CHILD_FAILED

see here:
https://www.gpugrid.net/result.php?resultid=32649276
and here:
https://www.gpugrid.net/result.php?resultid=32649255

Does anyone have any idea what the problem is?

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,266,666
RAC: 3,493,740
Message 57442 - Posted: 4 Oct 2021 | 6:53:12 UTC - in response to Message 57441.
Last modified: 4 Oct 2021 | 6:53:51 UTC

You should update the Microsoft Visual C++ runtime library as well:
https://aka.ms/vs/16/release/vc_redist.x86.exe
https://aka.ms/vs/16/release/vc_redist.x64.exe
Then restart Windows.

Erich56
Joined: 1 Jan 15
Posts: 1142
Credit: 10,906,230,840
RAC: 22,258,642
Message 57443 - Posted: 4 Oct 2021 | 7:05:26 UTC - in response to Message 57442.

Zoltan, thanks for the hint.
I have now tried to crunch tasks on the new PC with the two RTX 3070s.
On this machine, I installed the Microsoft Visual C++ runtime libraries several weeks ago, after there was a problem with Folding@home.

However, the tasks are failing there, too. One interesting detail: of the three tasks that were downloaded, one had CUDA version 101 and the other two had 1121.

https://www.gpugrid.net/result.php?resultid=32649296 (101)
https://www.gpugrid.net/result.php?resultid=32649291 (1121)
https://www.gpugrid.net/result.php?resultid=32649279 (1121)

So I suspect the tasks don't run on Ampere cards yet :-(

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1626
Credit: 9,376,466,723
RAC: 19,051,824
Message 57444 - Posted: 4 Oct 2021 | 7:15:40 UTC - in response to Message 57443.
Last modified: 4 Oct 2021 | 7:22:11 UTC

All three tasks failed with

app exit status: 0xc0000135

That's the characteristic signal of the missing VC runtime package. Are you sure you installed the correct version? There should be a file 'vcruntime140_1.dll' in your C:\Windows\System32 directory.

With that fixed, Ampere cards should run the 1121 version of the app, but the 101 version will still fail.
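
As a quick reference for the two codes mentioned so far, here is a small lookup sketch. The name EXIT_CHILD_FAILED is taken from the task pages quoted in this thread; STATUS_DLL_NOT_FOUND is the Windows NTSTATUS name for 0xC0000135 and does not appear in the thread itself. The `explain` helper is hypothetical.

```python
KNOWN_CODES = {
    195: "EXIT_CHILD_FAILED (BOINC exit code shown on the task page)",
    0xC0000135: "STATUS_DLL_NOT_FOUND (Windows could not load a required DLL)",
}


def explain(code: int) -> str:
    """Describe an exit code; negative values are read as unsigned 32-bit."""
    unsigned = code & 0xFFFFFFFF
    text = KNOWN_CODES.get(unsigned, "unknown code")
    return f"0x{unsigned:x}: {text}"


print(explain(195))           # exit code quoted on the task pages
print(explain(0xC0000135))    # app exit status from the stderr output
print(explain(-1073741515))   # the same NTSTATUS value as a signed 32-bit integer
```

The masking step is there because some tools print the Windows status as a negative decimal number instead of hex.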

Erich56
Joined: 1 Jan 15
Posts: 1142
Credit: 10,906,230,840
RAC: 22,258,642
Message 57445 - Posted: 4 Oct 2021 | 7:43:58 UTC - in response to Message 57444.

Richard,

'vcruntime140_1.dll' is indeed missing, although I am sure I installed the two Visual C++ files several weeks ago.
No idea what happened.

I will try to reinstall the two files and see what happens.

Erich56
Joined: 1 Jan 15
Posts: 1142
Credit: 10,906,230,840
RAC: 22,258,642
Message 57446 - Posted: 4 Oct 2021 | 10:18:13 UTC

The VC runtime package is re-installed, and a double check showed that 'vcruntime140_1.dll' is present in the C:\Windows\System32 directory.
I restarted the system and downloaded new GPUGRID tasks.
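
The same double check can be scripted. This is only a sketch: the function names are invented here, and the fallback to C:\Windows when %SystemRoot% is not set is an assumption.

```python
import os
from pathlib import Path


def runtime_dll_present(system_dir, dll_name="vcruntime140_1.dll"):
    """Return True if dll_name exists as a file directly inside system_dir."""
    return (Path(system_dir) / dll_name).is_file()


def default_system32():
    """Build the System32 path from %SystemRoot% (assumed default if unset)."""
    return Path(os.environ.get("SystemRoot", r"C:\Windows")) / "System32"


if __name__ == "__main__":
    print(runtime_dll_present(default_system32()))
```

Run on the affected Windows host, it prints True once the redistributable has placed the file in System32.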

However, they also failed after a few seconds:

https://www.gpugrid.net/result.php?resultid=32649623
https://www.gpugrid.net/result.php?resultid=32649596

The reason obviously is: ACEMD v2.18 (cuda101)

Why did I not receive the correct version, cuda1121? What's going wrong?

Erich56
Joined: 1 Jan 15
Posts: 1142
Credit: 10,906,230,840
RAC: 22,258,642
Message 57447 - Posted: 4 Oct 2021 | 10:35:38 UTC - in response to Message 57446.

The reason obviously is: ACEMD v2.18 (cuda101)
Why did I not receive the correct version, cuda1121? What's going wrong?

I retried downloading more tasks, and this time I evidently received ones with the correct CUDA version 1121, because crunching works well.
What I notice is that these tasks load the GPU quite heavily; the GPUs become markedly hotter than with, e.g., Folding@home or WCG GPU tasks. So I had to reduce the power input accordingly in order not to overheat my two RTX 3070s; I usually run them at 60/61°C, not higher.

On another machine with a GTX 970, the task was downloaded and is running, after I installed the new peer certificate there as well.

However, a rough calculation of the total runtime of this task (e2s130_e1s109p0f1198-ADRIA_AdB_KIXCMYB_HIP-0-2-RND3205_2) yields about 70 hours :-(((
This shows that these new tasks are obviously not well suited to older cards.
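
The post does not say how the 70 hours were calculated. One common rough method, assumed here, is linear extrapolation from the elapsed time and the progress fraction shown in the BOINC manager. The input numbers below are hypothetical and chosen only to land on about 70 hours.

```python
def estimate_total_hours(elapsed_seconds: float, fraction_done: float) -> float:
    """Linearly extrapolate the total runtime in hours from the progress so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_seconds / fraction_done / 3600.0


# Hypothetical reading: 2.5 % done after 1 h 45 min (6300 s).
total = estimate_total_hours(elapsed_seconds=6300, fraction_done=0.025)
print(f"estimated total runtime: {total:.0f} h")
```

This ignores any start-up phase and assumes constant speed, so it is only as good as the progress figure it is fed.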

However, for the Ampere cards, the CUDA version issue needs to be resolved quickly. Otherwise, it's like a lottery :-(

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,266,666
RAC: 3,493,740
Message 57455 - Posted: 4 Oct 2021 | 14:58:13 UTC - in response to Message 57447.

The reason obviously is: ACEMD v2.18 (cuda101)
Why did I not receive the correct version, cuda1121? What's going wrong?

I retried downloading more tasks, and this time I evidently received ones with the correct CUDA version 1121 ...
However, for the Ampere cards, the CUDA version issue needs to be resolved quickly. Otherwise, it's like a lottery :-(

I think you should install the latest NVIDIA driver (472.12) to fix this.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1078
Credit: 40,231,533,983
RAC: 27
Message 57456 - Posted: 4 Oct 2021 | 15:14:27 UTC - in response to Message 57455.

The reason obviously is: ACEMD v2.18 (cuda101)
Why did I not receive the correct version, cuda1121? What's going wrong?

I retried downloading more tasks, and this time I evidently received ones with the correct CUDA version 1121 ...
However, for the Ampere cards, the CUDA version issue needs to be resolved quickly. Otherwise, it's like a lottery :-(

I think you should install the latest NVIDIA driver (472.12) to fix this.


His drivers on the 3070 host are already adequate for the cuda1121 app (which is why he received some). The problem is that the project scheduler is sending an incompatible app to the Ampere cards. A cuda101 app will never work on Ampere with the gpu-architecture checks in place in the app. A cuda101 app has no knowledge of the Ampere architecture, and that knowledge can't be added in. This is why a CUDA 11.2+ app was required for Ampere cards. (Technically the admins could make their app architecture-agnostic if they built PTX versions of the kernel to include with the app, but it's clear they either don't want to do this or it would require too much work on their end.)

The project admins are aware of the issue of the cuda101 app being sent to Ampere hosts, and have commented that they are working on a fix to the scheduler.
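
To illustrate the kind of check the scheduler would need, here is a sketch that filters app versions by the card's compute capability. It is not the project's actual scheduler code. The table values are assumptions for illustration: that the cuda101 build tops out at compute capability 7.5 (Turing) and that the cuda1121 build covers 8.6 (Ampere, e.g. RTX 3070).

```python
from typing import List, Tuple

# Assumed highest compute capability each app build can target (illustrative only).
MAX_CC = {
    "cuda101": (7, 5),    # assumption: CUDA 10.1 build, nothing newer than Turing
    "cuda1121": (8, 6),   # assumption: CUDA 11.2.1 build, includes Ampere
}


def compatible_apps(cc: Tuple[int, int]) -> List[str]:
    """Return the plan classes whose assumed maximum capability covers cc."""
    return sorted(name for name, max_cc in MAX_CC.items() if cc <= max_cc)


print(compatible_apps((8, 6)))  # RTX 3070 (Ampere): ['cuda1121']
print(compatible_apps((5, 2)))  # GTX 970 / GTX 980 Ti (Maxwell): ['cuda101', 'cuda1121']
```

A real scheduler would also have to consider the minimum supported architecture and the installed driver version, which this sketch leaves out.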

PDW
Joined: 7 Mar 14
Posts: 15
Credit: 5,480,724,525
RAC: 10,196,778
Message 57457 - Posted: 4 Oct 2021 | 15:39:31 UTC - in response to Message 57456.

I have successfully completed 2 tasks today labelled as cuda101 on separate Ampere hosts. The same hosts also had other cuda101 tasks that failed today. Drivers are 470.xx.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1078
Credit: 40,231,533,983
RAC: 27
Message 57459 - Posted: 4 Oct 2021 | 17:27:16 UTC - in response to Message 57457.

I have successfully completed 2 tasks today labelled as cuda101 on separate Ampere hosts


Since your hosts are hidden, can you link to these Ampere hosts that processed a cuda101 task?

When you say "Ampere host", do you mean that they ONLY have Ampere GPU(s) installed? Or perhaps some secondary GPU was older and actually performed the computation?

Was it one of the Beta cuda101 apps?

PDW
Joined: 7 Mar 14
Posts: 15
Credit: 5,480,724,525
RAC: 10,196,778
Message 57465 - Posted: 4 Oct 2021 | 19:15:14 UTC - in response to Message 57459.

Since your hosts are hidden, can you link to these Ampere hosts that processed a cuda101 task?

When you say "Ampere host", do you mean that they ONLY have Ampere GPU(s) installed? Or perhaps some secondary GPU was older and actually performed the computation?

Was it one of the Beta cuda101 apps?

I only run single-GPU systems, unless you count the iGPUs (which I don't use for BOINC) in some Windows machines. At the moment I'm only running some of my Linux machines, which have no iGPUs, on GPUGrid, so no secondary GPUs are involved.

No, it wasn't a Beta cuda101 task or app.

I'll show you the more interesting task as it first failed on a 3070 with a newer driver than mine. It is labelled as "New version of ACEMD v2.18 (cuda101)" and it showed as cuda101 whilst running on the client. http://gpugrid.net/workunit.php?wuid=27080654

I also have a third cuda101 task running, nearly 70% completed, on a different machine with only an Ampere card in it.

Keith Myers
Joined: 13 Dec 17
Posts: 1358
Credit: 7,894,103,302
RAC: 7,266,669
Message 57467 - Posted: 4 Oct 2021 | 21:34:09 UTC - in response to Message 57441.

Did you reboot the host after fixing the file?

It looks like BOINC still threw a fault, possibly in the zipping feature, which may still depend on the expired SSL certificate.

You may have to wait for a new BOINC release, which has been hinted to be in development for release as soon as possible.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1078
Credit: 40,231,533,983
RAC: 27
Message 57469 - Posted: 4 Oct 2021 | 22:18:16 UTC - in response to Message 57465.

Since your hosts are hidden, can you link to these Ampere hosts that processed a cuda101 task?

When you say "Ampere host", do you mean that they ONLY have Ampere GPU(s) installed? Or perhaps some secondary GPU was older and actually performed the computation?

Was it one of the Beta cuda101 apps?

I only run single-GPU systems, unless you count the iGPUs (which I don't use for BOINC) in some Windows machines. At the moment I'm only running some of my Linux machines, which have no iGPUs, on GPUGrid, so no secondary GPUs are involved.

No, it wasn't a Beta cuda101 task or app.

I'll show you the more interesting task as it first failed on a 3070 with a newer driver than mine. It is labelled as "New version of ACEMD v2.18 (cuda101)" and it showed as cuda101 whilst running on the client. http://gpugrid.net/workunit.php?wuid=27080654

I also have a third cuda101 task running, nearly 70% completed, on a different machine with only an Ampere card in it.


My best guess is it’s some bug/idiosyncrasy on your system where it’s confusing the cuda101 and cuda1121 apps. Maybe it “thinks” it’s running the 101 executable but somehow actually called up the 1121 executable. Just a guess based on the vast majority of 101 failures on that system for the wrong architecture, even after the successful run.


PDW
Joined: 7 Mar 14
Posts: 15
Credit: 5,480,724,525
RAC: 10,196,778
Message 57471 - Posted: 4 Oct 2021 | 22:51:05 UTC - in response to Message 57469.

My best guess is it’s some bug/idiosyncrasy on your system where it’s confusing the cuda101 and cuda1121 apps. Maybe it “thinks” it’s running the 101 executable but somehow actually called up the 1121 executable. Just a guess based on the vast majority of 101 failures on that system for the wrong architecture, even after the successful run.

This "bug/idiosyncrasy" is also present on a different host with a different card and different OS !

When a cuda101 task normally fails it has something like this in the stderr file:

08:31:00 (56461): wrapper: running bin/acemd3 (--boinc --device 0)
ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

08:31:01 (56461): bin/acemd3 exited; CPU time 1.217885

So the program that is run is acemd3 (with the parameters), it then fails and exits.
The cuda101 task that is currently running on a 3080 on a different machine shows acemd3 in 'top' with the same parameters. Surely that is the same program?

The progress.log file in the slot folder shows a 3080 running so it knows what hardware it has to work with.
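
For sorting through many failed results, the stderr text can be checked for this specific message. A minimal sketch, assuming the stderr output has the shape quoted above; the helper names are invented for illustration.

```python
import re
from typing import Optional, Tuple

ARCH_ERROR = "invalid value for --gpu-architecture"


def failed_on_architecture(stderr_text: str) -> bool:
    """True if the stderr text contains the nvrtc architecture error."""
    return ARCH_ERROR in stderr_text


def wrapper_command(stderr_text: str) -> Optional[Tuple[str, str]]:
    """Return (program, arguments) from the 'wrapper: running ...' line, if any."""
    match = re.search(r"wrapper: running (\S+) \((.*?)\)", stderr_text)
    return (match.group(1), match.group(2)) if match else None


stderr_sample = """\
08:31:00 (56461): wrapper: running bin/acemd3 (--boinc --device 0)
ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

08:31:01 (56461): bin/acemd3 exited; CPU time 1.217885
"""
print(failed_on_architecture(stderr_sample), wrapper_command(stderr_sample))
```

Note that the wrapper line only names bin/acemd3, so it cannot tell the two builds apart on its own.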

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1078
Credit: 40,231,533,983
RAC: 27
Message 57473 - Posted: 5 Oct 2021 | 1:10:41 UTC - in response to Message 57471.

It could be a bug in the app too, I guess. I just haven’t seen that anywhere else. If you look in the gpugrid.net project folder you’ll find two separate executables for acemd3. The only difference is that one is compiled with cuda101 and the other with cuda1121; according to top, both are referred to as just “acemd3”. I don’t know the exact mechanism that could cause one app to be used in place of another, but since both are present I can imagine it happening. BOINC isn’t the most robust software; it has lots of bugs, especially in older versions. But to me this makes more sense than some cuda101 tasks randomly working on Ampere when cuda101 has no knowledge of Ampere.
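
One way to test this guess would be to compare the binary a task is actually using against the executables in the project folder by content rather than by name. The sketch below assumes the slot directory holds a copy of (or link to) one of those executables; all paths passed in are up to the user, and the function names are hypothetical.

```python
import hashlib
from typing import Iterable, List


def sha256_of(path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def match_binary(running, candidates: Iterable) -> List[str]:
    """Return the candidate paths whose content is identical to the running binary."""
    target = sha256_of(running)
    return [str(c) for c in candidates if sha256_of(c) == target]
```

Calling `match_binary(slot_binary, project_binaries)` with the acemd3 file from the slot folder and the two files from the project folder would show which build is really being executed, regardless of the label shown in the client.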
