Message boards : Graphics cards (GPUs) : All WUs on GTX660 failing
Author | Message |
---|---|
Have just built a new system with a GTX660 with the intention of running GPUGRID (encouraged by another system with a GTX660 that is working well). So far all 4 WUs that I have received start processing but fail after 2 to 10 minutes. I think they have all had the "Driver has recovered after stopped working" message in Windows, which I guess is an indication that the GPU hardware has failed. Both the old and new cards are factory overclocked (old one 1006 MHz (1072 boost), new one 1033 MHz (1098 boost)). Memory on both is 1502.3 MHz (which I think is not overclocked). GPU core clocks reported by GPU-Z when running are 1162.7 MHz (new) and 1123.5 MHz (old) respectively. The new 660 is fine running PrimeGrid (two concurrent tasks, pegged at 99% busy according to GPU-Z). Environmentals on both systems seem fine (temp about 60 degrees). | |
ID: 31427 | |
I'd suggest drivers. I notice the failing system has an ATI as well as an Nvidia card. From memory, you had to install the ATI driver first, followed by Nvidia. Also get drivers from Nvidia.com or GeForce.com and do a clean install; don't rely on Windows to install drivers. | |
ID: 31429 | |
Thanks Mark, | |
ID: 31433 | |
Eight of my systems have both ATI/AMD and NVidia cards. It makes no difference which driver is installed first. What does make a difference on at least some AMD chipset MBs is that the ATI/AMD GPU should be in PCIe slot 0 (master) and the NVidia GPU(s) in other slots. | |
ID: 31439 | |
The system is Intel chipset. | |
ID: 31453 | |
Driver resets are a known issue in Windows that few people know about. There is a simple registry edit that should cure the driver timeouts. I have used it on all of my ATI cards and it has worked perfectly. I don't see any reason why it shouldn't work on Nvidia cards too. I posted it once before but I don't know if anyone here tried it. I can post it again if you'd like to try it. The registry edit will not do anything else to your system if it doesn't work. | |
ID: 31454 | |
Well, it's looking like hardware. I swapped the two cards and the original (that works fine in the old system) is working fine in the new system, while the new card (that was failing in the new system) is also failing in the old system (it errored the GPUGRID WU that was in progress within a minute or two). | |
ID: 31457 | |
Sure, post the registry edit, and thanks in advance. | |
ID: 31463 | |
I had a similar problem with a factory overclocked GTX660Ti - it was failing on both my systems. I managed to get Amazon to swap it for a standard clocked card and the failures stopped. | |
ID: 31465 | |
Thanks - I'll certainly try to negotiate something like that (this card is even failing when I set it back to reference clocks!). The reason I got the card was not because of its overclock, but because it has good cooling | |
ID: 31494 | |
98% GPU usage on W7 sounds high and 60% TDP sounds low. | |
ID: 31499 | |
Try running a fairly recent 3DMark - if that is not stable, you've certainly got a problem, and one that is easy for the RMA guys to reproduce. | |
ID: 31509 | |
Is this a machine you are dedicating to GPUGRID? You may have a hardware issue, but seeing as Windows 7 is the least efficient OS for the project and you are having potential issues with the software, I would encourage you to try Linux. This would help narrow down if it is a software or hardware issue, and if it succeeds, you will have a very efficient dedicated machine for GPUGRID. | |
ID: 31521 | |
Yes the memory load was 1% (also on the WU running now on the "good" 660 - it's been running for almost 24 hours and is saying it is 59% complete). Looks like you have seen or heard of this before?! <core_client_version>7.0.64</core_client_version> <![CDATA[ <message> The system cannot find the path specified. (0x3) - exit code 3 (0x3) </message> <stderr_txt> MDIO: cannot open file "output.restart.coor" SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574. Assertion failed: a, file swanlibnv2.cpp, line 59 This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. </stderr_txt> ]]> | |
ID: 31531 | |
"The WU eventually failed after more than 1000,000 seconds" That sucks, what a waste of time and power. You ran that work unit for 11.5 days? Or maybe you misplaced a decimal point. I think it's 111,587.05 seconds, which is still 31 hours; that really sucks. | |
ID: 31532 | |
Whoops, my mistake - should have been 100,000 secs. Sorry. | |
ID: 31533 | |
Hi MrS, | |
ID: 31534 | |
Just read the thread "Current Noelia WUs" and that explains some of what I am seeing (long-running Noelia_MGs), but of course my 660s are 2 GB. It does explain the fact that the Noelia_MGs took twice as long to complete on my (1 GB) 560 Ti. | |
ID: 31538 | |
I find the 314 driver to be more reliable than the 310 driver on a GTX660 (W7), and I'm not keen on the 320.x drivers and nor would you be if you read the release notes. | |
ID: 31541 | |
Been away for a few days and I'm at work now. I'll post the registry edit when I get home this evening. | |
ID: 31542 | |
I'm trying another tack here - I wonder if there is some bad memory on the card? After the short WU failed, I underclocked the memory a bit. The next short WU worked OK! OK, there were a couple after that which did not, but they have failed on other machines also. I've lowered the memory clock a bit more and will see how it goes. I've found a few memory testing tools which I will try also - if there's a memory problem, hopefully I can find a way to reliably show it that will allow the seller to verify the problem. | |
ID: 31552 | |
Ran some memory tests (memtestcl and memtestG80) with no errors (but probably need to run for a day or two to do a thorough check). Then set the card to reference GPU core clock (980/1045), and memory underclocked from 6008 to 5494. It has processed a NOELIA klebe successfully, and is now nearly 2 hours into a NOELIA 7MG (that is working properly with 38% memory controller load). | |
ID: 31569 | |
Hm, that's normal here. If WUs begin to fail, overvolting by 0.025 V and underclocking the memory by 100 MHz was the recommended solution here in the forum. It has worked for me for quite a while now. | |
ID: 31587 | |
Sorry for the delay. Copy and paste the entire code below (including the Windows Registry Editor Version 5.00 part) into Notepad. Save it as fix.reg, or any other name you like as long as it ends with the .reg extension. Then right-click it and open it with Registry Editor. You'll get warnings about editing the registry. Just click Yes and the code will be added to your registry. Reboot and you should be good to go. This should stop the "driver has stopped responding" messages and the WU errors when the driver restarts. Windows Registry Editor Version 5.00 [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog] "DisableBugCheck"="1" [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display] "EaRecovery"="0" | |
ID: 31602 | |
I run 320.18 and 320.49 with the GTX660s, and the errors are Noelias or a short run from Santi. Almost none of the Nathans error out. | |
ID: 31606 | |
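
The registry edit in message 31602 lost its line breaks when the thread was flattened, so here is a rough sketch of how the file could be laid out. The key names and values are copied from that post; the line layout (header, blank line, one `[key]` per section) and the helper names `build_reg_text` and `write_reg_file` are my assumptions, not something the poster supplied. Whether these values have any effect on a particular Windows version or driver is the poster's claim and is not verified here.

```python
# Values copied from message 31602. The line layout is an assumption:
# the header line, then each [key] on its own line followed by its
# "name"="value" entries, with blank lines between sections.
REG_HEADER = "Windows Registry Editor Version 5.00"
WATCHDOG_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog"

SECTIONS = [
    (WATCHDOG_KEY, {"DisableBugCheck": "1"}),
    (WATCHDOG_KEY + r"\Display", {"EaRecovery": "0"}),
]


def build_reg_text(sections=SECTIONS):
    """Return the .reg file text with Windows (CRLF) line endings."""
    lines = [REG_HEADER, ""]
    for key, values in sections:
        lines.append(f"[{key}]")
        for name, value in values.items():
            # String values, exactly as written in the post.
            lines.append(f'"{name}"="{value}"')
        lines.append("")
    return "\r\n".join(lines)


def write_reg_file(path="fix.reg"):
    """Write the text to disk. This only creates the file; it does not
    import anything into the registry."""
    # newline="" stops Python from translating the CRLF endings again.
    with open(path, "w", encoding="ascii", newline="") as handle:
        handle.write(build_reg_text())
    return path


if __name__ == "__main__":
    print(build_reg_text())
```

Running the script only prints the text. You would still have to call `write_reg_file()` and import the file yourself, and it is worth exporting the existing Watchdog key first so the change can be undone.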
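
For anyone checking the run-time figures traded in messages 31532 and 31533, this is a quick sketch of the conversion. The three second counts come from the posts; the function names are made up for illustration.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def seconds_to_hours(seconds):
    """Convert a BOINC run time in seconds to hours."""
    return seconds / SECONDS_PER_HOUR


def seconds_to_days(seconds):
    """Convert a BOINC run time in seconds to days."""
    return seconds / SECONDS_PER_DAY


# The mistyped figure: about 11.6 days.
print(round(seconds_to_days(1_000_000), 1))
# The figure read from the task page: about 31.0 hours.
print(round(seconds_to_hours(111_587.05), 1))
# The corrected round number: about 27.8 hours.
print(round(seconds_to_hours(100_000), 1))
```

So the "11.5 days" guess matches the mistyped 1,000,000 seconds, and the actual run was a bit over a day.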