Message boards : Graphics cards (GPUs) : Unable to determine the device handle for GPU 0000:02:00.0: Unknown Error

Jari Kosonen
Message 58805 - Posted: 12 May 2022 | 0:43:49 UTC

After rebooting the system and restarting BOINC with GPUGRID,
it runs normally at first, but then this error appears:
$ nvidia-smi
Unable to determine the device handle for GPU 0000:02:00.0: Unknown Error

Erich56
Message 58807 - Posted: 12 May 2022 | 5:21:13 UTC - in response to Message 58805.

which of your two machines are you talking about?
The one running Linux or the other one running Windows?

Ian&Steve C.
Message 58809 - Posted: 12 May 2022 | 12:06:30 UTC - in response to Message 58805.
Last modified: 12 May 2022 | 12:13:45 UTC

After rebooting the system and restarting BOINC with GPUGRID,
it runs normally at first, but then this error appears:
$ nvidia-smi
Unable to determine the device handle for GPU 0000:02:00.0: Unknown Error


this happens when the driver crashes or the GPU has some kind of problem and drops off. only a reboot can bring it back.

check your power and PCIe connections to make sure they are good. I mostly encountered this issue with dodgy power cables.

edit- scratch that, I see that these are laptops now. so not much you can do really for checking power connections. it could be that the cards are overheating when trying to run GPUGRID tasks. make sure the laptops have adequate airflow and are maintaining reasonable temps. maybe reduce overclocks if any. that might be all you can do without getting into the weeds and taking it apart to replace thermal paste, etc.
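
a rough sketch of how both causes could be checked on Linux: the helpers below filter the kernel log for NVIDIA driver messages (lines tagged NVRM, which include Xid error reports) and poll the GPU temperature through nvidia-smi's query interface. the function names and the 5-second default interval are illustrative assumptions, not anything provided by GPUGRID or BOINC.

```shell
#!/bin/sh
# Print only the kernel log lines that come from the NVIDIA driver.
# Reads log text on stdin, so it works with dmesg or a saved log file.
nvrm_lines() {
    grep -E 'NVRM|Xid'
}

# Print the GPU temperature (degrees C) every INTERVAL seconds (default 5).
# The loop ends when nvidia-smi exits with an error; stop it earlier with Ctrl-C.
watch_gpu_temp() {
    interval="${1:-5}"
    while nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader; do
        sleep "$interval"
    done
}

# Usage:
#   sudo dmesg | nvrm_lines
#   watch_gpu_temp 10
```

if the temperature climbs steadily right before the card drops off, overheating is the more likely culprit; NVRM/Xid lines without high temps point more toward the driver or the hardware link.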

Ian&Steve C.
Message 58810 - Posted: 12 May 2022 | 12:14:15 UTC - in response to Message 58807.

which of your two machines are you talking about?
The one running Linux or the other one running Windows?


it's the linux one.

Jari Kosonen
Message 58812 - Posted: 12 May 2022 | 17:12:32 UTC - in response to Message 58807.
Last modified: 12 May 2022 | 17:14:48 UTC

It's the Linux machine, and I think it looks like a driver crash, as explained here.

On the Windows machine, I think I don't have the right NVIDIA OpenCL library/driver,
or there is some issue like that, and GPUGRID didn't start at all on the Windows 11 machine.
Could you advise where to download the required driver for Windows 11?
Or could something else be causing this?

Jari Kosonen
Message 58822 - Posted: 16 May 2022 | 12:28:28 UTC - in response to Message 58809.

I will try changing the thermal paste as soon as I receive it by post.
Currently I have Kryonaut Extreme (14.2 W/mK) and I will try
another brand that is rated at 14.6 W/mK.

Keith Myers
Message 58823 - Posted: 16 May 2022 | 18:34:31 UTC - in response to Message 58812.

Always best to grab Nvidia drivers straight from Nvidia. Get the Studio drivers.
https://www.nvidia.com/Download/index.aspx?lang=en-us#

Jari Kosonen
Message 58843 - Posted: 23 May 2022 | 2:15:29 UTC - in response to Message 58823.
Last modified: 23 May 2022 | 2:16:56 UTC

OK.
I got the drivers, but installing them could be difficult.
Currently on MX Linux the driver version is 510.47.03,
and the downloaded version is newer: 510.73.05.
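
To double-check which of two driver versions is newer, a small shell helper can lean on `sort -V` (version sort, available in GNU coreutils). This is only a sketch: the helper name `version_lt` is made up for illustration, and the two version strings are the ones quoted above.

```shell
#!/bin/sh
# Succeed (exit status 0) if version $1 is strictly older than version $2.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

installed="510.47.03"    # currently installed driver
downloaded="510.73.05"   # version of the downloaded installer

if version_lt "$installed" "$downloaded"; then
    echo "Downloaded driver $downloaded is newer than installed $installed"
else
    echo "Installed driver $installed is already up to date"
fi
```

On a machine where the driver is working, the installed version could instead be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.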

Keith Myers
Message 58854 - Posted: 23 May 2022 | 21:33:07 UTC

Do a sudo apt purge '*nvidia*' to get rid of the existing drivers, then reboot.

That will put you on the stock Nouveau drivers.

Then install the new Studio 510.73.05 drivers.

You should get the OpenCL component necessary for other projects and the current CUDA 11.6 libraries bundled into the Desktop driver.
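
The steps above could be sketched roughly as follows. The script is dry-run by default, so it only prints the commands. The installer file name is an assumption based on NVIDIA's usual naming for the 64-bit Linux .run package, and the helper names are made up for illustration.

```shell
#!/bin/sh
# Sketch of the purge-then-install sequence described above.
# Dry-run by default: each command is printed, not executed.
# Set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumed file name; adjust to match the file actually downloaded.
INSTALLER="./NVIDIA-Linux-x86_64-510.73.05.run"

# Step 1: remove the distribution-packaged driver, then reboot.
# The pattern is quoted so that apt, not the shell, interprets it.
purge_packaged_driver() {
    run sudo apt purge '*nvidia*'
    run sudo reboot
}

# Step 2 (after the reboot): run the downloaded installer, then check the result.
install_new_driver() {
    run sudo sh "$INSTALLER"
    run nvidia-smi
}
```

Call purge_packaged_driver first, and install_new_driver once the machine is back up.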

Jari Kosonen
Message 58870 - Posted: 27 May 2022 | 10:29:59 UTC - in response to Message 58854.
Last modified: 27 May 2022 | 10:44:08 UTC

I removed the drivers with:
sudo ddm-mx -p nvidia

But the NVIDIA-installer still says that the drivers are there and refuses to install the later driver version:

nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Fri May 27 18:38:09 2022
installer version: 510.73.05

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

nvidia-installer command line:
./nvidia-installer

Using: nvidia-installer ncurses v6 user interface
-> Detected 8 CPUs online; setting concurrency level to 8.
-> Installing NVIDIA driver version 510.73.05.
-> The NVIDIA driver appears to have been installed previously using a different installer. To prevent potential conflicts, it is recommended either to update the existing installation using the same mechanism by which it was originally installed, or to uninstall the existing installation before installing this driver.

Please review the message provided by the maintainer of this alternate installation method and decide how to proceed:

Please use the Debian packages instead of the .run file.


(Answer: Continue installation)
-> Running distribution scripts
executing: '/usr/lib/nvidia/pre-install'...




If you want to use the nvidia-installer please uninstall the Debian packages
first. The two methods of installation cannot be used at the same time.

Terminating nvidia-installer in 1 seconds.
Killing nvidia-installer
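
The log suggests that some Debian-packaged NVIDIA components are still present. One way to see which ones, assuming a dpkg-based system (MX Linux is Debian-based), is to filter the `dpkg -l` listing for installed packages with "nvidia" in the name. The helper name below is only illustrative.

```shell
#!/bin/sh
# Read `dpkg -l` output on stdin and print the names of packages that are
# fully installed (state "ii") and have "nvidia" in their name.
list_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /nvidia/ { print $2 }'
}

# Usage:
#   dpkg -l | list_nvidia_packages
```

Packages in other states (for example "rc", removed but with config files left behind) are skipped on purpose; drop the `$1 == "ii"` condition to see those too.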

Keith Myers
Message 58872 - Posted: 28 May 2022 | 4:26:10 UTC - in response to Message 58870.

Can't help you here. I know nothing about MX-Linux.

Commands are entirely foreign to me.

I know Ubuntu and Debian. And use the graphics-drivers ppa.

I get rid of older drivers with a purge.

It sounds like you are running the Nvidia .run installer, perchance.

I believe it has its own uninstaller: run the Nvidia driver .run file again with the --uninstall option.
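
A quick way to tell the two install methods apart is to look for the nvidia-uninstall program, which a .run-based install normally leaves behind. This is just a sketch; the helper name has_cmd is made up for illustration.

```shell
#!/bin/sh
# Succeed if the named command can be found on PATH.
has_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if has_cmd nvidia-uninstall; then
    echo "Found nvidia-uninstall: the driver was likely installed from a .run file."
    echo "To remove it, run: sudo nvidia-uninstall"
    echo "(or run the .run file again with --uninstall)"
else
    echo "No nvidia-uninstall found: the driver probably came from distribution packages."
fi
```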

Jari Kosonen
Message 58873 - Posted: 28 May 2022 | 4:40:38 UTC - in response to Message 58872.
Last modified: 28 May 2022 | 4:41:17 UTC

The command sequence that removes the MX Linux NVIDIA driver appears to be:

apt purge 'nvidia*' -y
apt-get purge $FORCE $(apt-cache pkgnames | grep nvidia | grep -v detect | grep -v cleanup | cut -d':' -f1) bumblebee* primus* primus*:i386 2>&1
apt autoremove

After that, the new driver version 510.73.05 installed and the system
stopped crashing:
Sat May 28 12:40:23 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.05    Driver Version: 510.73.05    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| N/A   72C    P0    N/A /  N/A |   2291MiB /  4096MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1418      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      3344      C   bin/python                       2285MiB |
+-----------------------------------------------------------------------------+

ServicEnginIC
Message 58874 - Posted: 28 May 2022 | 10:48:47 UTC - in response to Message 58873.

Congrats, well done!
