Message boards : Number crunching : Peer certificate cannot be authenticated
Author | Message |
---|---|
After a long time away, I wanted to resume GPUGRID crunching, so I tried to download GPU tasks on my machine with a GTX 980 Ti inside. | |
ID: 57437 | Rating: 0 | rate: / Reply Quote | |
What I also found out now: | |
ID: 57439 | Rating: 0 | rate: / Reply Quote | |
Please use the fix for the peer certificate problem with Windows BOINC hosts at the BOINC forums. | |
ID: 57440 | Rating: 0 | rate: / Reply Quote | |
Thank you, Keith, for the hint. I downloaded the certificate and replaced it in the BOINC program folder. | |
ID: 57441 | Rating: 0 | rate: / Reply Quote | |
You should update the Microsoft Visual C++ runtime library as well: | |
ID: 57442 | Rating: 0 | rate: / Reply Quote | |
Zoltan, thanks for the hint. | |
ID: 57443 | Rating: 0 | rate: / Reply Quote | |
All three tasks failed with app exit status: 0xc0000135. That's the characteristic signal of a missing VC runtime package. Are you sure you installed the correct version? There should be a file 'vcruntime140_1.dll' in your C:\Windows\System32 directory. With that fixed, Ampere cards should run the 1121 version of the app, but the 101 version will still fail. | |
ID: 57444 | Rating: 0 | rate: / Reply Quote | |
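The check described above can be scripted. This is only a minimal sketch: the function name `has_vc_runtime` is made up for illustration, and the default directory simply mirrors the path quoted in the post. Exit status 0xc0000135 is the Windows status code for a DLL that could not be found, which is why the presence of this file is worth checking first.

```python
import os
from pathlib import Path


def has_vc_runtime(system_dir=None, dll_name="vcruntime140_1.dll"):
    """Return True if dll_name exists as a file in system_dir.

    has_vc_runtime is a hypothetical helper, not part of BOINC or GPUGRID.
    """
    if system_dir is None:
        # Assumption: SystemRoot points at the Windows directory,
        # which matches the System32 path mentioned in the post.
        system_dir = Path(os.environ.get("SystemRoot", r"C:\Windows")) / "System32"
    return (Path(system_dir) / dll_name).is_file()


if __name__ == "__main__":
    print("vcruntime140_1.dll present:", has_vc_runtime())
```

Finding the file only shows that it is installed; it does not prove that the installed version is the right one.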
Richard, | |
ID: 57445 | Rating: 0 | rate: / Reply Quote | |
I re-installed the VC runtime package, and a double check showed that 'vcruntime140_1.dll' is present in the C:\Windows\System32 directory. | |
ID: 57446 | Rating: 0 | rate: / Reply Quote | |
The reason obviously is the app version: ACEMD v2.18 (cuda101). I retried downloading more tasks, and this time I evidently received ones with the correct CUDA version (1121), because crunching works well. What I notice is that these tasks load the GPU quite heavily; the GPUs become markedly hotter than with, e.g., Folding@home or WCG GPU tasks. So I had to reduce the power input accordingly in order not to overheat my two RTX 3070s; I usually run them at 60/61°C, not higher. On another machine with a GTX 970 inside, the task downloaded and started running once I had installed the new peer certificate there as well. However, a rough calculation of the total runtime of this task (e2s130_e1s109p0f1198-ADRIA_AdB_KIXCMYB_HIP-0-2-RND3205_2) yields about 70 hours :-((( This shows that these new tasks are obviously not really suited to older cards. For the Ampere cards, however, the CUDA version issue needs to be resolved quickly. Otherwise, it's like a lottery :-( | |
ID: 57447 | Rating: 0 | rate: / Reply Quote | |
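A rough runtime estimate like the one above is usually a linear extrapolation from the progress so far. The sketch below assumes that method; the post does not say how the 70 hours were actually calculated, and the input numbers are hypothetical, chosen only so that the result lands near 70 hours.

```python
def estimate_total_hours(elapsed_hours, fraction_done):
    """Extrapolate total runtime, assuming progress is linear in time."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


if __name__ == "__main__":
    # Hypothetical values: 7 hours elapsed at 10% progress.
    print(f"estimated total: {estimate_total_hours(7.0, 0.10):.0f} hours")
```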
"The reason obviously is: ACEMD v2.18 (cuda101)" I think you should install the latest NVIDIA driver (472.12) to fix this. | |
ID: 57455 | Rating: 0 | rate: / Reply Quote | |
"The reason obviously is: ACEMD v2.18 (cuda101)" His drivers on the 3070 host are already adequate for the cuda1121 app (which is why he received some). The problem is the project scheduler sending an incompatible app to the Ampere cards. A cuda101 app will never work on Ampere with the gpu-architecture checks in place in the app. A cuda101 app has no knowledge of the Ampere architecture, and that support can't be added in. This is why a CUDA 11.2+ app was required for Ampere cards. (Technically, the admins could make their app architecture-agnostic if they built PTX versions of the kernel to include with the app, but it's clear they don't want to do this, or it would require too much work on their end.) The project admins are aware of the issue of the cuda101 app being sent to Ampere hosts, and have commented that they are working on a fix to the scheduler. | |
ID: 57456 | Rating: 0 | rate: / Reply Quote | |
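The compatibility argument above can be written as a small lookup. This is a sketch, not the project's scheduler logic: the function `app_can_target` and the mapping from plan class names to toolkit versions are assumptions made for illustration (cuda1121 is taken to mean CUDA 11.2, in line with "CUDA 11.2+" in the post). The compute capabilities are NVIDIA's published values: CUDA 10.1 targets up to 7.5 (Turing), while Ampere consumer cards are 8.6.

```python
# Highest compute capability each CUDA toolkit can compile for.
MAX_CC_BY_TOOLKIT = {
    (10, 1): (7, 5),  # CUDA 10.1: up to Turing
    (11, 2): (8, 6),  # CUDA 11.2: includes Ampere
}

# Compute capabilities of the cards mentioned in this thread.
GPU_CC = {
    "GTX 970": (5, 2),
    "GTX 980 Ti": (5, 2),
    "RTX 3070": (8, 6),
    "RTX 3080": (8, 6),
}

# Assumed mapping from plan class name to toolkit version.
PLAN_CLASS_TOOLKIT = {"cuda101": (10, 1), "cuda1121": (11, 2)}


def app_can_target(plan_class, gpu):
    """Return True if the toolkit behind plan_class knows the GPU's architecture."""
    toolkit = PLAN_CLASS_TOOLKIT[plan_class]
    return GPU_CC[gpu] <= MAX_CC_BY_TOOLKIT[toolkit]


if __name__ == "__main__":
    for gpu in ("GTX 970", "RTX 3070"):
        for plan_class in ("cuda101", "cuda1121"):
            print(gpu, plan_class, app_can_target(plan_class, gpu))
```

A check of this kind on the scheduler side would stop cuda101 tasks from being sent to Ampere hosts, though the post does not say how the admins plan to implement their fix.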
I have successfully completed 2 tasks today labelled as cuda101 on separate Ampere hosts; the same hosts also had other cuda101 tasks that failed today. Drivers are 470.xx | |
ID: 57457 | Rating: 0 | rate: / Reply Quote | |
"I have successfully completed 2 tasks today labelled as cuda101 on separate Ampere hosts" Since your hosts are hidden, can you link to these Ampere hosts that processed a cuda101 task? When you say "Ampere host", do you mean that they ONLY have Ampere GPU(s) installed? Or perhaps some secondary GPU was older and actually performed the computation? Was it one of the Beta cuda101 apps? | |
ID: 57459 | Rating: 0 | rate: / Reply Quote | |
"since your hosts are hidden, can you link to these Ampere hosts that processed a cuda101 task?" I only run single-GPU systems, unless you count the iGPUs (which I don't use for BOINC) in some Windows machines. I'm only running some of my Linux machines with no iGPUs on GPUGrid at the moment, so no secondary GPUs are involved. No, it wasn't a Beta cuda101 task or app. I'll show you the more interesting task, as it first failed on a 3070 with a newer driver than mine. It is labelled as "New version of ACEMD v2.18 (cuda101)" and it showed as cuda101 whilst running on the client. http://gpugrid.net/workunit.php?wuid=27080654 I also have a third cuda101 task running, nearly 70% completed, on a different machine with only an Ampere card in it. | |
ID: 57465 | Rating: 0 | rate: / Reply Quote | |
Did you reboot the host after fixing the file? | |
ID: 57467 | Rating: 0 | rate: / Reply Quote | |
"since your hosts are hidden, can you link to these Ampere hosts that processed a cuda101 task?" My best guess is it’s some bug/idiosyncrasy on your system where it’s confusing the cuda101 and cuda1121 apps. Maybe it “thinks” it’s running the 101 executable but somehow actually called up the 1121 executable. Just a guess based on the vast majority of 101 failures on that system for the wrong architecture, even after the successful run. | |
ID: 57469 | Rating: 0 | rate: / Reply Quote | |
"My best guess is it’s some bug/idiosyncrasy on your system where it’s confusing the cuda101 and cuda1121 apps. Maybe it “thinks” it’s running the 101 executable but somehow actually called up the 1121 executable. Just a guess based on the vast majority of 101 failures on that system for the wrong architecture, even after the successful run." This "bug/idiosyncrasy" is also present on a different host with a different card and a different OS! When a cuda101 task fails in the normal way, it has something like this in the stderr file: 08:31:00 (56461): wrapper: running bin/acemd3 (--boinc --device 0) ACEMD failed: Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch) 08:31:01 (56461): bin/acemd3 exited; CPU time 1.217885. So the program that is run is acemd3 (with the parameters); it then fails and exits. The cuda101 task that is currently running on a 3080 on a different machine is showing acemd3 in 'top' with the same parameters. Surely that is the same program? The progress.log file in the slot folder shows a 3080 running, so it knows what hardware it has to work with. | |
ID: 57471 | Rating: 0 | rate: / Reply Quote | |
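To tell this characteristic failure apart from other errors, a task's stderr text can be searched for the nvrtc message quoted above. A minimal sketch: the helper name `failed_on_gpu_architecture` is hypothetical, and the line breaks in the sample are an assumption, since the log in the post arrived as a single run of text.

```python
# Substring taken from the nvrtc error quoted in the post.
ARCH_ERROR = "invalid value for --gpu-architecture"


def failed_on_gpu_architecture(stderr_text):
    """Return True if stderr contains the nvrtc architecture error."""
    return ARCH_ERROR in stderr_text


# Sample built from the log lines quoted in the post (line breaks assumed).
SAMPLE_STDERR = """08:31:00 (56461): wrapper: running bin/acemd3 (--boinc --device 0)
ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)
08:31:01 (56461): bin/acemd3 exited; CPU time 1.217885"""

if __name__ == "__main__":
    print("architecture failure:", failed_on_gpu_architecture(SAMPLE_STDERR))
```

A cuda101 task that finishes on an Ampere card without this line in its stderr would be the interesting case to compare against.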
It could be a bug in the app too, I guess. I just haven’t seen that anywhere else. If you look in the gpugrid.net project folder, you’ll find two separate executables for acemd3. The only difference is that one is compiled with cuda101 and the other with cuda1121; according to top, both are referred to as just “acemd3”. I don’t know the exact mechanism that could cause one app to be used in place of another, but since both are present, I can imagine it happening. BOINC isn’t the most robust software; it has lots of bugs, especially in older versions. But to me this makes more sense than some cuda101 tasks randomly working on Ampere when cuda101 has no knowledge of Ampere. | |
ID: 57473 | Rating: 0 | rate: / Reply Quote | |