Message boards : News : Windows GPU Applications broken
Author | Message |
---|---|
Currently we have the windows applications broken. We are looking into it. | |
ID: 49925 | Rating: 0 | rate: / Reply Quote | |
Thanks, but all the GPU work has been cancelled for Linux also. Maybe you could add some back? | |
ID: 49926 | Rating: 0 | rate: / Reply Quote | |
I have now deprecated the Windows apps, so we can put some more work for Linux | |
ID: 49929 | Rating: 0 | rate: / Reply Quote | |
We are trying to create a new app for Windows, but it might take a few days. | |
ID: 49930 | Rating: 0 | rate: / Reply Quote | |
PS: can you post here some of the WUs that failed for Windows so that I can easily find the error message? The error message is always the same: <core_client_version>7.10.2</core_client_version> <![CDATA[ <message> (unknown error) - exit code -44 (0xffffffd4)</message> ]]> Take some of my recent ones: https://www.gpugrid.net/results.php?userid=146761 ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday. | |
ID: 49932 | Rating: 0 | rate: / Reply Quote | |
Thanks for the update, GDF. Noticed this yesterday on my 2 Win boxes. Here's a couple of the errors: | |
ID: 49933 | Rating: 0 | rate: / Reply Quote | |
We had the same error number (I searched for 0xffffffd4) and the same symptoms - every task failing on Windows machines, for one of the Windows apps, at the same time - on 14/15 April 2017. | |
ID: 49934 | Rating: 0 | rate: / Reply Quote | |
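For readers who want to search logs for this failure, here is a minimal sketch of how the two forms of the exit code quoted above relate. It assumes only that the hexadecimal value is the 32-bit two's-complement view of the signed code; the helper names are hypothetical and are not part of BOINC.

```python
def to_unsigned32(code: int) -> int:
    """Return the 32-bit two's-complement bit pattern of a signed exit code."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a signed integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the pattern represents a negative number.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def format_exit_code(code: int) -> str:
    """Render an exit code in the 'decimal (0xhex)' style seen in the task logs."""
    return f"{to_signed32(code)} (0x{to_unsigned32(code):x})"


if __name__ == "__main__":
    print(format_exit_code(-44))         # the code quoted in this thread
    print(to_signed32(0xFFFFFFD4))       # back from the hex form to decimal
```

Searching for either form should find the same failures, since both describe the same 32-bit value.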
I think if all UTs generate the same error, it should be pretty simple. | |
ID: 49936 | Rating: 0 | rate: / Reply Quote | |
Thanks, but all the GPU work has been cancelled for Linux also. Maybe you could add some back? This is still the case. Now everyone is doing nothing. | |
ID: 49937 | Rating: 0 | rate: / Reply Quote | |
I just got 243000 credits for a GPU task on my main Linux box. It and the Linux laptop are running QC tasks. | |
ID: 49938 | Rating: 0 | rate: / Reply Quote | |
Could someone please explain this? (Windows hosts still receive and fail GPU workunits, while the Windows app is deprecated) | |
ID: 49973 | Rating: 0 | rate: / Reply Quote | |
Just caught myself a couple of those on my mine canary - the one which will tell me when good work starts to flow again. 21/07/2018 00:16:01 | | [unparsed_xml] FILE_REF::parse(): unrecognized: 'rboinc/' in the log - one for every downloaded task file. They look like <file_ref> <rboinc/> is meaningless in that context and shouldn't be there. | |
ID: 49974 | Rating: 0 | rate: / Reply Quote | |
A volunteer effort. doubtless "Measure twice cut once" at this time. | |
ID: 49976 | Rating: 0 | rate: / Reply Quote | |
... (Windows hosts still receive and fail GPU workunits, while the Windows app is deprecated) this seems to explain why on the Server Status Page the tasks show error rates of 81% and higher. | |
ID: 49978 | Rating: 0 | rate: / Reply Quote | |
I had been running the Windows App on my fastest system successfully after the 14th July license problems (which occur annually but seem to take you by surprise every time) by turning my system time back. However, now you seem to have deprecated it and I get nothing. | |
ID: 49985 | Rating: 0 | rate: / Reply Quote | |
I had been running the Windows App on my fastest system successfully after the 14th July license problems (which occur annually but seem to take you by surprise every time) by turning my system time back. However, now you seem to have deprecated it and I get nothing. How come you didn't tell us to set the time back before? We could've kept crunching | |
ID: 49986 | Rating: 0 | rate: / Reply Quote | |
Yes, sorry that I didn't. However, the problem is a recurring one and I thought everyone knew from the last time it happened. | |
ID: 49987 | Rating: 0 | rate: / Reply Quote | |
We just have to be patient for the scientists to fix the problems. In the meantime, some other GPU projects I'm attached to (with 0 resource share, as backup projects), are getting some extra work done for them :) | |
ID: 49988 | Rating: 0 | rate: / Reply Quote | |
If you're running projects other than GPUGrid, I would strongly recommend that you don't set your clock back; you could get validation errors or WUs canceled because of deadline overrun. | |
ID: 49989 | Rating: 0 | rate: / Reply Quote | |
... the problem is a recurring one ... indeed it is. What I am wondering is that obviously no one keeps track of the expiration dates of the various licenses :-( | |
ID: 49990 | Rating: 0 | rate: / Reply Quote | |
I am surprised it is taking this long to fix the issue... | |
ID: 49994 | Rating: 0 | rate: / Reply Quote | |
I am surprised it is taking this long to fix the issue... Holidays, weekends, too few people. | |
ID: 49995 | Rating: 0 | rate: / Reply Quote | |
I am surprised it is taking this long to fix the issue... Oh, c'mon. We know that there are too few people on workdays too. | |
ID: 49996 | Rating: 0 | rate: / Reply Quote | |
Maybe just too few people interested. | |
ID: 49997 | Rating: 0 | rate: / Reply Quote | |
As far as I know a new app was uploaded but still not working. Licensing is not related to BOINC. | |
ID: 49998 | Rating: 0 | rate: / Reply Quote | |
Licensing is not related to BOINC. I didn't say it was but it is related to your project and App. ____________ Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline | |
ID: 49999 | Rating: 0 | rate: / Reply Quote | |
@betting slip | |
ID: 50000 | Rating: 0 | rate: / Reply Quote | |
I turned the system clock back to before the time the license expired which allowed the Windows App to work but then someone deprecated the App thus I couldn't do that any longer. | |
ID: 50001 | Rating: 0 | rate: / Reply Quote | |
I turned the system clock back to before the time the license expired ... hm, how did you know in advance ? | |
ID: 50002 | Rating: 0 | rate: / Reply Quote | |
The problem with BOINC is that it runs the apps in an almost opaque environment, so when things go wrong there is no useful indication on the direction to take to a speedy fix. Did you ever consider coming up with your own application, independent of BOINC (similar to what FAH is doing)? | |
ID: 50003 | Rating: 0 | rate: / Reply Quote | |
Yes but it's a huge development effort. Admittedly once done you have full control but we don't have the resources to do this. | |
ID: 50004 | Rating: 0 | rate: / Reply Quote | |
This may be incredibly ignorant of me but why not release the old windows app with a different version number and renewed license. | |
ID: 50005 | Rating: 0 | rate: / Reply Quote | |
I had been running the Windows App on my fastest system successfully after the 14th July license problems (which occur annually but seem to take you by surprise every time) by turning my system time back. If you're running projects other than GPUGrid, I would strongly recommend that you don't set your clock back; you could get validation errors or WUs canceled because of deadline overrun. I would like to add that setting the system clock back in time will break Windows Update and the automatic updates of many antivirus products, so this is strongly discouraged. It can serve only as a temporary measure, and you do it at your own risk. | |
ID: 50008 | Rating: 0 | rate: / Reply Quote | |
I will be switching all of my crunching systems to Linux as I don't see this will be fixed anytime soon. | |
ID: 50009 | Rating: 0 | rate: / Reply Quote | |
I will be a patient donator, and my systems will gladly do work when the admins have had enough time to fix the problems. Some of you guys get upset too easily, and your knee-jerk reactions are a bit rude. | |
ID: 50010 | Rating: 0 | rate: / Reply Quote | |
I will be a patient donator, and my systems will gladly do work when the admins have had enough time to fix the problems. Some of you guys get upset too easily, and your knee-jerk reactions are a bit rude. Well their data still needs to be processed at the end of the day, and if we wait around and don't adapt to the situation the simulations will still be sitting there waiting to be processed and we will be that much further away from a cure. | |
ID: 50011 | Rating: 0 | rate: / Reply Quote | |
I will be a patient donator, and my systems will gladly do work when the admins have had enough time to fix the problems +1 ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday. | |
ID: 50012 | Rating: 0 | rate: / Reply Quote | |
I will be a patient donator, and my systems will gladly do work when the admins have had enough time to fix the problems. Some of you guys get upset too easily, and your knee-jerk reactions are a bit rude. I will be switching all of my crunching systems to Linux as I don't see this will be fixed anytime soon. +1 I've swapped my GTX 1080 Ti from my main rig and installed Linux on my 3 online hosts last weekend. I wanted to do it anyway to get rid of WDDM. I'd like to have SWAN_SYNC in the Linux app too. | |
ID: 50014 | Rating: 0 | rate: / Reply Quote | |
...I wanted to do it anyway to get rid of WDDM. good point. Just out of interest, my question is: what slows up GPUGRID crunching more: the Windows WDDM or the missing Swan_sync with Linux? | |
ID: 50020 | Rating: 0 | rate: / Reply Quote | |
BOINC provides the capabilities/procedures to run a BOINC application in a test environment outside of the normal download/upload process. The licensing problem has nothing to do with BOINC, otherwise other projects would be failing right and left... | |
ID: 50021 | Rating: 0 | rate: / Reply Quote | |
Just out of interest, my question is: what slows up GPUGRID crunching more: the Windows WDDM or the missing Swan_sync with Linux? I will offer my 2 cents. There is a much bigger gain getting rid of WDDM and going to Linux. I have used Swan_sync with Windows only, but I would be surprised if you see much gain using Swan_sync with Linux, even if you can figure out how to do it. | |
ID: 50022 | Rating: 0 | rate: / Reply Quote | |
I will be a patient donator, and my systems will gladly do work when the admins have had enough time to fix the problems. Some of you guys get upset too easily, and your knee-jerk reactions are a bit rude. SWAN_SYNC is not needed. I have never had to adjust priority of CPU apps or reserve a CPU thread for GPU projects in Linux. GPU apps just take what is needed and the CPUs apps get what is left. GPU utilization is just higher in Linux w/o any settings and CPU util is around 15-20% for GPUGrid. I am currently running FAH on a GPU and 4 BOINC CPU tasks on one PC. Another PC is running GPUGrid and 16 CPU tasks. | |
ID: 50023 | Rating: 0 | rate: / Reply Quote | |
Well, the stats pages do not support your argument. I've swapped my GTX 1080 Ti from my main rig and installed Linux to my 3 online hosts on the last weekend. 1. Before the Windows app broke down, I was the #1 on the "Performance" tab in the "Top average performers (last week Long Runs)" with my three Windows 10 + SWAN_SYNC ON + GTX 1080 Ti hosts (my GPUs are factory overclocked, but I don't use fancy water cooling). 2. Check the following batches on the Performance page under "Top performers per batch": PABLO_IDP_P01106_2_ASNP21P_ID PABLO_IDP_P01106_2_ASNP3P_ID PABLO_IDP_P01106_4_LEUP14P_ID You'll find that my GTX 980 Ti beats, or gets very close to GTX 1080 Tis, and GTX TITAN X (Pascal) GPUs running under Linux. That's because it was running under Windows XP (without WDDM) and with SWAN_SYNC ON. GPU utilization is just higher in Linux w/o any settings and CPU util is around 15-20% for GPUGrid. True, but it could be even higher with SWAN_SYNC ON. I am currently running FAH on a GPU and 4 BOINC CPU tasks on one PC. Another PC is running GPUGrid and 16 CPU tasks. Well, that's irrelevant for me, as I don't run CPU tasks at all, as I want to optimize my PC for GPUGrid. All in all: I'd like to have the option under Linux to assign a full CPU thread / core to my GPUGrid tasks with SWAN_SYNC on, as it will make tasks crunch faster on Linux too. | |
ID: 50024 | Rating: 0 | rate: / Reply Quote | |
Well, the stats pages do not support your argument. I've swapped my GTX 1080 Ti from my main rig and installed Linux to my 3 online hosts on the last weekend. I never once mentioned overall performance and was not referencing anything about performance but GPU utilization. You failed to see that even with many things running in different situations that the GPU is fully utilized in Linux w/o wasting a CPU thread. | |
ID: 50026 | Rating: 0 | rate: / Reply Quote | |
You failed to see that even with many things running in different situations that the GPU is fully utilized in Linux w/o wasting a CPU thread. You failed to see that a GTX 1080 Ti can't be fully utilized under Linux if a fully utilized GTX 980 Ti (previous generation) can achieve 98.66% of its performance. I see that you and I use our computers in a different manner: I do not consider feeding a GPU with a full CPU thread as waste, because I know that otherwise I'm wasting 5-15% performance of my GPUs. The lack of SWAN_SYNC in the Linux client forces me to waste that much GPU performance. I want to have this choice, while you don't. Therefore you don't need SWAN_SYNC, while I (and many others) do. So there's no point for us to go on with this argument. Also, this argument is off topic here. This is my last post in this thread about this topic. | |
ID: 50028 | Rating: 0 | rate: / Reply Quote | |
The GPU is clocked up to its maximum clock frequency when computing, let's say 1999 MHz for Pascal. It takes upwards of 1.0620 volts to maintain this frequency. If you aren't feeding the GPU with data at an acceptable rate at this frequency, you are technically wasting power because most of the cycles are going to waste. The GPU only draws slightly more current when it is fully loaded at the same voltage, which makes the whole process more efficient. | |
ID: 50029 | Rating: 0 | rate: / Reply Quote | |
Is our crunching work directly used for a cure / medicine, or is it just published as theoretically simulated / calculated results? | |
ID: 50030 | Rating: 0 | rate: / Reply Quote | |
Toni said (23 Jul 2018 | 8:40:32 UTC): As far as I know a new app was uploaded but still not working. Licensing is not related to BOINC. James C. Owens said (23 Jul 2018 | 19:29:45 UTC): BOINC provides the capabilities/procedures to run a BOINC application in a test environment outside of the normal download/upload process. The licensing problem has nothing to do with BOINC, otherwise other projects would be failing right and left... Exactly! I've been told that BOINC provides tons of tools for figuring where and why failures happen. And that link seems very useful. https://boinc.berkeley.edu/trac/wiki/AppDebug Also, if the admins are looking for help in ways to solve problems or improve BOINC, they might post to the boinc_projects email list: https://boinc.berkeley.edu/trac/wiki/EmailLists Regards, Jacob | |
ID: 50031 | Rating: 0 | rate: / Reply Quote | |
Is our crunching work directly used for a cure / medicine, or is it just published as theoretically simulated / calculated results? Keep in mind, both are useful. Other researchers can use simulated protein folding to calculate what to do for their drug. | |
ID: 50032 | Rating: 0 | rate: / Reply Quote | |
Is our crunching work directly used for a cure / medicine, or is it just published as theoretically simulated / calculated results? The short answer is no to the first part and yes to the second. One of its purposes is as a teaching tool for PhD students. If they discovered a method or an insight that was commercially valuable and helped the biomedical industry they would patent it and sell or license it. In the meantime they produce scientific papers with methods or insights that get the student their PhD or not. The best you can hope for as far as a cure is concerned is that the simulations may point the way for someone else to explore or that one of their successful PhD students goes on in later years to make a difference such as finding a real cure for cancer or other major disease. But really, they are never going to run anything seriously groundbreaking on your computer ____________ Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline | |
ID: 50033 | Rating: 0 | rate: / Reply Quote | |
The short answer is no to the first part and yes to the second. Well, my answer to the first part would be a little more optimistic, say "not...yet", as it is all about computing power. Imagine, modern high end GPUs are now as powerful as super-computers back in the year 2000. Still too slow to handle big proteins, but there is some progress. The upcoming Turing generation seems to be again 20-40% faster than its predecessor Pascal, and this will continue until tunnel effects obstruct further shrinks. Having said this, there are some new technologies in development to reduce those effects. And of course quantum computers will be in the ascendant in a couple of years, as big companies like IBM, Microsoft or Google put a lot of capital into them (but for reasons other than drug science). Let's keep on crunching and see where this road goes. One thing is for sure: computer science and medicine will be entirely different 10 years from now. ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday. | |
ID: 50034 | Rating: 0 | rate: / Reply Quote | |
As I wrote, there is some progress... | |
ID: 50036 | Rating: 0 | rate: / Reply Quote | |
It appears to use a 4-qubit quantum computer, probably provided by Google since one of the authors is a Google person. | |
ID: 50037 | Rating: 0 | rate: / Reply Quote | |
Eh, I would not go as far as to say it's mostly a tool for PhD students as BettingSlip mentioned (although I'm sure he didn't mean it in a negative way). The theoretical research being published is used to progress science and the specific field, it's not like this work ends up as fluff for a PhD thesis. | |
ID: 50038 | Rating: 0 | rate: / Reply Quote | |
Eh, I would not go as far as to say it's mostly a tool for PhD students as BettingSlip mentioned (although I'm sure he didn't mean it in a negative way). The theoretical research being published is used to progress science and the specific field, it's not like this work ends up as fluff for a PhD thesis. Good clarification - then it's just the way I already assumed it to be. | |
ID: 50040 | Rating: 0 | rate: / Reply Quote | |
In the realm of non-profit research, it is entirely collaborative. Even if what someone is researching doesn't seem like it would make a difference, what they discovered could be the holy grail for another researcher team. You see this time and time again throughout our scientific history. | |
ID: 50045 | Rating: 0 | rate: / Reply Quote | |
Question related to the thread-title: | |
ID: 50047 | Rating: 0 | rate: / Reply Quote | |
Question related to the thread-title: I know you're not directing your question at me but as far as XP is concerned read this post https://gpugrid.net/forum_thread.php?id=4552&nowrap=true#46982 ____________ Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline | |
ID: 50048 | Rating: 0 | rate: / Reply Quote | |
In the realm of non-profit research, it is entirely collaborative. Even if what someone is researching doesn't seem like it would make a difference, what they discovered could be the holy grail for another researcher team. You see this time and time again throughout our scientific history. I am an electrical engineer and software developer myself, so I'm aware of how development processes take place in general. I also know that our work is / can be helpful, otherwise I would not be here, of course - but this did not answer my question / is nothing new to me. But then finally Stefan answered my question perfectly. I would like to see direct drug design in the future :) | |
ID: 50049 | Rating: 0 | rate: / Reply Quote | |
I would like to see direct drug design in the future :) As exciting as that prospect might sound to all contributors to public distributed computing projects such as this one, it will never happen. The reasons for this are many and you may like to read this https://sciencenode.org/feature/isgtw-opinion-volunteer-computing-grid-or-not-grid.php There is always the inconvenient fact that most (if not all) scientists have little confidence in public distributed computing. ____________ Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline | |
ID: 50050 | Rating: 0 | rate: / Reply Quote | |
Question related to the thread-title: I know this thread and its content. Still I don't stop hoping that they may have changed their mind and provide once more an app for XP. So, we'll see ... | |
ID: 50051 | Rating: 0 | rate: / Reply Quote | |
Question related to the thread-title: The only reply I can give to someone who asks a question that they already have read the official answer to is "dream on" ____________ Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline | |
ID: 50052 | Rating: 0 | rate: / Reply Quote | |
There is always the inconvenient fact that most (if not all) scientists have little confidence in public distributed computing. But confidence in what? It may not be used for developing commercial drugs, but for basic science the results could be quite useful. That really depends on how relevant the questions are that the researchers are asking. The real limitation is that in the academic world, they may not know what issues to investigate that are most relevant. A tie-in between the university and industry (as in an advisory board) might help that. | |
ID: 50053 | Rating: 0 | rate: / Reply Quote | |
Well guys... It doesn't work that way. Science as a "whole thing" is so complex and unpredictable that we can't even assume what is going to be important and what is not. | |
ID: 50056 | Rating: 0 | rate: / Reply Quote | |
Well guys... It doesn't work that way. Science as a "whole thing" is so complex and unpredictable that we can't even assume what is going to be important and what is not. +1 | |
ID: 50057 | Rating: 0 | rate: / Reply Quote | |
We also had nuclear bombs, nuclear fission reactors and, hopefully, fusion reactors. Science can make both good and bad fruits. | |
ID: 50058 | Rating: 0 | rate: / Reply Quote | |
We also had nuclear bombs, nuclear fission reactors and, hopefully, fusion reactors. Science can make both good and bad fruits. Yea, but relax ... we are the good guys. Claims to the contrary are FAKE NEWS! ;) ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday. | |
ID: 50059 | Rating: 0 | rate: / Reply Quote | |
Are we? I read the Bulletin of the Atomic Scientists. USA have 4850 nuclear warheads, 160 of which are stored in Italy, 20 at an air base 200 km from my home. Russia has the same amount, and I don't mention UK,France,China, India,Pakistan and Israel. | |
ID: 50060 | Rating: 0 | rate: / Reply Quote | |
Are we? I read the Bulletin of the Atomic Scientists. USA have 4850 nuclear warheads, 160 of which are stored in Italy, 20 at an air base 200 km from my home. Russia has the same amount, and I don't mention UK,France,China, India,Pakistan and Israel. We can withdraw the U.S. ones from Europe. Then the Russians will be free to move in theirs. | |
ID: 50061 | Rating: 0 | rate: / Reply Quote | |
I don't think UK and France would allow it. They have nuclear warheads too. | |
ID: 50062 | Rating: 0 | rate: / Reply Quote | |
Let's keep the explosions on-topic -- Let's talk about exploding GPUGrid apps and tasks :) | |
ID: 50063 | Rating: 0 | rate: / Reply Quote | |
Let's keep the explosions on-topic -- Let's talk about exploding GPUGrid apps and tasks :) Why not keep it totally on topic -- shutup and wait for an announcement that the damn thing is fixed!! | |
ID: 50064 | Rating: 0 | rate: / Reply Quote | |
hi Gianni | |
ID: 50065 | Rating: 0 | rate: / Reply Quote | |
Let's keep the explosions on-topic -- Let's talk about exploding GPUGrid apps and tasks :) That was a bit unnecessarily rude. I also hope that they can fix it. | |
ID: 50066 | Rating: 0 | rate: / Reply Quote | |
@ GDF, Toni, Stefan ... | |
ID: 50068 | Rating: 0 | rate: / Reply Quote | |
We are working round the clock to restore it... sorry for the delay. It's not easy. By the way the cuda65 app should be ok, although there are no WUs. | |
ID: 50069 | Rating: 0 | rate: / Reply Quote | |
We know you're doing your best, be strong and good luck! | |
ID: 50070 | Rating: 0 | rate: / Reply Quote | |
We are working round the clock to restore it... sorry for the delay. It's not easy. By the way the cuda65 app should be ok, although there are no WUs. It works on windows xp. http://www.gpugrid.net/result.php?resultid=18260692 | |
ID: 50072 | Rating: 0 | rate: / Reply Quote | |
It should work as before. I sent some test WUs. Work to come. | |
ID: 50073 | Rating: 0 | rate: / Reply Quote | |
Toni, it seems to work well - both for Windows 10 and Windows XP :-))) | |
ID: 50074 | Rating: 0 | rate: / Reply Quote | |
...and this thread is back on topic ;) | |
ID: 50076 | Rating: 0 | rate: / Reply Quote | |
Got one on a Windows box: 7/27/2018 6:35:48 AM | GPUGRID | Aborting task e37s19_e36s5p0f20-ADRIA_FOLDT1019_v2_predicted_pred_ss_contacts_50_T1019s1_4-0-1-RND2194_2: exceeded elapsed time limit 7288.00 (250000000.00G/34302.98G) | |
ID: 50078 | Rating: 0 | rate: / Reply Quote | |
What did you guys have to do to fix the application? | |
ID: 50080 | Rating: 0 | rate: / Reply Quote | |
Toni, it seems to work well - both for Windows 10 and Windows XP :-))) I am afraid I was too early with my above statement :-( The task on the Windows 10 machine broke off after 8,963 seconds with: 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED exceeded elapsed time limit 8947.10 (250000000.00G/27942.01G) For more details, see here: http://gpugrid.net/result.php?resultid=18262430 | |
ID: 50081 | Rating: 0 | rate: / Reply Quote | |
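The two numbers in parentheses in that message can be checked against the reported limit. The sketch below assumes, based only on the log lines quoted in this thread, that the message has the shape "limit (bound G / flops G)" and that the limit is simply the quotient of the two; the regular expression and function name are hypothetical, not taken from the BOINC source.

```python
import re

# Pattern for the message text as it appears in the posts above.
LIMIT_RE = re.compile(
    r"exceeded elapsed time limit (?P<limit>[\d.]+) "
    r"\((?P<bound>[\d.]+)G/(?P<flops>[\d.]+)G\)"
)


def check_limit_line(line: str) -> tuple[float, float]:
    """Return (reported limit in seconds, bound / flops) for one log line."""
    match = LIMIT_RE.search(line)
    if match is None:
        raise ValueError("no elapsed-time-limit message found")
    limit = float(match["limit"])
    bound_g = float(match["bound"])   # operation bound, in units of 1e9
    flops_g = float(match["flops"])   # estimated app speed, in units of 1e9 per second
    return limit, bound_g / flops_g


if __name__ == "__main__":
    line = "exceeded elapsed time limit 8947.10 (250000000.00G/27942.01G)"
    reported, recomputed = check_limit_line(line)
    print(f"reported {reported:.2f} s, bound/flops = {recomputed:.2f} s")
```

If the quotient matches the reported limit, the abort time is determined by the workunit's operation bound and the speed the client assumes for the app, not by the task's actual progress.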
The same here. | |
ID: 50082 | Rating: 0 | rate: / Reply Quote | |
The same here http://www.gpugrid.net/result.php?resultid=18262508 <core_client_version>7.6.33</core_client_version> other task http://www.gpugrid.net/result.php?resultid=18262784 | |
ID: 50084 | Rating: 0 | rate: / Reply Quote | |
I think something (either failures, or likely recent short tasks) made some machine over-optimistic about its own fp-ops. As a consequence, BOINC estimated that tasks could be run in a few hours, which is untrue. | |
ID: 50086 | Rating: 0 | rate: / Reply Quote | |
I think something (either failures, or likely recent short tasks) made some machine over-optimistic about its own fp-ops. As a consequence, BOINC estimated that tasks could be run in a few hours, which is untrue. Can't be true, my machines have only run GpuGrid long WUs and have the same problem. Anyway GpuGrid was my last BOINC project and I have decided to hang up my BOINC boots. The satisfaction of contributing has just left me. Good Luck to all. ____________ Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline | |
ID: 50087 | Rating: 0 | rate: / Reply Quote | |
Try re-running the benchmarks. https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4273 I guess I gathered what you mean, but this machine has not run any BOINC tasks in the meantime. So there should not be any (too short) runtime values somewhere deep in BOINC. Anyway, I followed this advice (shown in your link) with a newly downloaded GPUGRID task: You can help yourself out of this situation by increasing <rsc_fpops_bound> of Sixtrack tasks 1000 times larger or possibly even more. However, I increased the value by a factor of 10; I guess this should be sufficient. So, I'll see what happens | |
ID: 50088 | Rating: 0 | rate: / Reply Quote | |
Yeah! Two WUs downloaded. I love the sound of my GPU fans spinning up. | |
ID: 50089 | Rating: 0 | rate: / Reply Quote | |
Anyway GpuGrid was my last BOINC project and I have decided to hang up my BOINC boots. The satisfaction of contributing has just left me. It's a pity... | |
ID: 50090 | Rating: 0 | rate: / Reply Quote | |
Have you also tried selecting the "re-run benchmarks" (or something) menu option? | |
ID: 50091 | Rating: 0 | rate: / Reply Quote | |
You are right. Don't know what to say except that it's frustrating on this side too. There is an excess of hidden state and undocumented checks. My hope is that it will resolve by itself at some point (maybe resetting the project). | |
ID: 50092 | Rating: 0 | rate: / Reply Quote | |
Do you require help from some of the BOINC devs? They're pretty responsive on the BOINC Projects email group, and if there is some sort of transparency problem, they'd want to hear about it. | |
ID: 50094 | Rating: 0 | rate: / Reply Quote | |
Do you require help from some of the BOINC devs? They're pretty responsive on the BOINC Projects email group, and if there is some sort of transparency problem, they'd want to hear about it. I've been sitting in the same conference room as about 25 BOINC developers for the last three days. If someone had called, we could have answered... But today was the group walk in the Oxfordshire countryside, and we meet in an hour for our final group meal before they get their 5 am flights home. I have a simple 200 mile drive home before I'm reunited with my GPUs - I'll look at it Sunday, report Monday. Running benchmarks won't solve it, because they measure the CPU speed, and this is a GPU app. But you're on the right lines - the initial speed estimate will be low, and the quickest workaround will be to increase the <rsc_fpops_bound> for the new v9.22 app by a factor of at least 10 and perhaps 100. You may have to generate new workunits with the uprated bound. Runtime estimates will almost certainly appear to users as vastly inflated in the initial stages, but hang in there - they will become 'accurate' (-ish) after the first 11 completed tasks. More when I can eyeball it. | |
ID: 50095 | Rating: 0 | rate: / Reply Quote | |
Thinking about it in the shower, that's the wrong way round - apps faster than expected shouldn't cause a problem. | |
ID: 50098 | Rating: 0 | rate: / Reply Quote | |
No, I just started a 4 day vacation, sorry. | |
ID: 50099 | Rating: 0 | rate: / Reply Quote | |
The interesting thing is that this problem does NOT come up in the cuda65 app (for Windows XP), but only in the cuda80 app (for Windows10). | |
ID: 50100 | Rating: 0 | rate: / Reply Quote | |
I've downloaded 2 Long run tasks on my Windows 10 PC and one is running. It seems to run OK but, according to the Task Manager, it uses both the CPU and the GPU (GTX 1050 Ti) very lightly compared to SETI@home GPU tasks on the same host. | |
ID: 50101 | Rating: 0 | rate: / Reply Quote | |
What I noticed so far is that with the Cuda_80 app the GPU load now is about 75%, whereas for the same type of task, crunched on WinXP with the Cuda_65 app, the GPU load is between 96% and 98% (like it was with the former Cuda_80 app, too). | |
ID: 50102 | Rating: 0 | rate: / Reply Quote | |
Just had 2 cuda 8 GPU work units error out due to the time limit being exceeded on my 1080 Tis on Win 7. | |
ID: 50103 | Rating: 0 | rate: / Reply Quote | |
happens to me also | |
ID: 50104 | Rating: 0 | rate: / Reply Quote | |
Thinking about it in the shower, that's the wrong way round - apps faster than expected shouldn't cause a problem. Do you need this?: <app_version>
<app_name>acemdlong</app_name>
<version_num>922</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>1.000000</avg_ncpus>
<flops>43004890022276.586000</flops>
<plan_class>cuda80</plan_class>
<api_version>6.7.0</api_version>
...
<coproc>
<type>NVIDIA</type>
<count>1.000000</count>
</coproc>
<gpu_ram>512.000000</gpu_ram>
<dont_throttle/>
</app_version>
<workunit>
<name>e38s4_e29s9p0f212-ADRIA_FOLDT1015_v2_predicted_pred_ss_contacts_50_T1015s1_3-0-1-RND4166</name>
<app_name>acemdlong</app_name>
<version_num>922</version_num>
<rsc_fpops_est>5000000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>250000000000000000000.000000</rsc_fpops_bound>
<rsc_memory_bound>300000000.000000</rsc_memory_bound>
<rsc_disk_bound>4000000000.000000</rsc_disk_bound>
...
</workunit>
I've got lost in that many zeroes, so I've cut 12 of them:
App flops: 43e12
rsc_fpops_est: 5 000e12
rsc_fpops_bound: 250 000 000e12
The outcome will be here. I think it will succeed. Judging by the previous error message:
exceeded elapsed time limit 5659.12 (250000000.00G/43782.46G)
the rsc_fpops_bound was only 250 000e12 before. | |
ID: 50105 | Rating: 0 | rate: / Reply Quote | |
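The two figures in the parentheses of that error message suggest the time limit is simply rsc_fpops_bound divided by the app version's flops. Here is a minimal sketch of that arithmetic, assuming that relationship holds and using the values from the XML in the post above. The function name is made up for illustration, and the quoted figures do not divide out to exactly 5659.12, so treat the results as approximate.

```python
# Assumption: elapsed time limit (seconds) = rsc_fpops_bound / app flops,
# inferred from the layout of the quoted error message, not from BOINC source.

APP_FLOPS = 43004890022276.586   # <flops> of the cuda80 v9.22 app_version
FPOPS_EST = 5e15                 # <rsc_fpops_est> of the workunit
OLD_BOUND = 2.5e17               # bound on the failing workunits (250 000e12)
NEW_BOUND = 2.5e20               # bound after the fix (250 000 000e12)


def seconds_at(fpops: float, flops: float) -> float:
    """Time in seconds needed for `fpops` operations at `flops` ops/second."""
    return fpops / flops


if __name__ == "__main__":
    est = seconds_at(FPOPS_EST, APP_FLOPS)
    old_limit = seconds_at(OLD_BOUND, APP_FLOPS)
    new_limit = seconds_at(NEW_BOUND, APP_FLOPS)
    print(f"estimated runtime : {est:12.0f} s")
    print(f"old time limit    : {old_limit:12.0f} s ({old_limit / 3600:.1f} h)")
    print(f"new time limit    : {new_limit:12.0f} s ({new_limit / 86400:.1f} days)")
```

With these inputs the old limit comes out at a bit over an hour and a half, which is in line with the tasks being killed after roughly 5,670 seconds, and the new limit is a thousand times longer.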
That's a good start - thanks. | |
ID: 50106 | Rating: 0 | rate: / Reply Quote | |
The errors on that machine earlier today were after ~5,670 seconds - maybe they took my advice while I was out, and upped it by three orders of magnitude? That is my conclusion too (see the end of my previous post). | |
ID: 50107 | Rating: 0 | rate: / Reply Quote | |
The errors on that machine earlier today were after ~5,670 seconds - maybe they took my advice while I was out, and upped it by three orders of magnitude? That is my conclusion too (see the end of my previous post). How much longer will they need to let tasks run before they get enough information to fix the problem? It looks like one more order of magnitude for run time should at least give them more information. Also, users might help by mentioning whether their tasks were able to write a checkpoint, and then continue after this. | |
ID: 50108 | Rating: 0 | rate: / Reply Quote | |
No help here; | |
ID: 50109 | Rating: 0 | rate: / Reply Quote | |
It's currently running on my Windows 10 w/ 1080ti. 86.4 % complete in 14:31 (m:s). It's an ADRIA job. So the jobs are running much faster than they did before. I leave that up to your interpretation. That job took 16:30 to complete, about the same as the job that ran before it. Now starting on job 3. This one's a PABLO and took 2:01 to reach 1%. 2% done and estimate is 2:07:20 (and falling) to completion. That's a good start - thanks. | |
ID: 50110 | Rating: 0 | rate: / Reply Quote | |
No help here; The workunits generated with the improper <rsc_fpops_bound> will be around until they error out. I have two such workunits on my host, so I've manually edited the client_state.xml file to have the right <rsc_fpops_bound> value. The method of this fix:
1. exit BOINC manager
2. windows key + r
3. type or copy and paste: notepad c:\ProgramData\BOINC\client_state.xml
4. press <ENTER>
5. CTRL + H
6. search field: <rsc_fpops_bound>250000000000000000.000000</rsc_fpops_bound>
7. replace field: <rsc_fpops_bound>250000000000000000000.000000</rsc_fpops_bound>
8. it should replace as many times as the number of GPUGrid tasks on the given host
9. save and exit notepad
10. restart BOINC manager | |
ID: 50111 | Rating: 0 | rate: / Reply Quote | |
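For anyone who would rather script the Notepad search-and-replace described above, here is a minimal sketch. It assumes the default data directory path given in the steps, that BOINC has been shut down first, and that the old and new values are exactly the two strings quoted there. The function names are made up for illustration, and the script keeps a .bak copy before writing anything.

```python
from pathlib import Path

# The exact search and replace strings from steps 6 and 7 of the manual fix.
OLD = b"<rsc_fpops_bound>250000000000000000.000000</rsc_fpops_bound>"
NEW = b"<rsc_fpops_bound>250000000000000000000.000000</rsc_fpops_bound>"


def patch_bound(data: bytes, old: bytes = OLD, new: bytes = NEW) -> tuple[bytes, int]:
    """Return the patched content and the number of replacements made."""
    count = data.count(old)
    return data.replace(old, new), count


def patch_file(path: Path) -> int:
    """Patch the file in place, writing a .bak copy first if anything changes."""
    original = path.read_bytes()
    patched, count = patch_bound(original)
    if count:
        path.with_name(path.name + ".bak").write_bytes(original)
        path.write_bytes(patched)
    return count


if __name__ == "__main__":
    # Path taken from step 3; adjust it if your data directory is elsewhere.
    state = Path(r"c:\ProgramData\BOINC\client_state.xml")
    if state.exists():
        print(f"replaced {patch_file(state)} occurrence(s)")
    else:
        print(f"{state} not found, nothing changed")
```

Working on raw bytes leaves line endings and encoding untouched. Running it a second time replaces nothing, because the new value does not contain the old one.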
Thanks | |
ID: 50112 | Rating: 0 | rate: / Reply Quote | |
No go | |
ID: 50113 | Rating: 0 | rate: / Reply Quote | |
No go Exit code -80 is a driver issue (OpenCL missing); it can also be a C++ runtimes issue, maybe even the runtimes missing altogether. You need both the x86 (32-bit) and the x64 (64-bit) versions. An unstable GPU and/or CPU can cause it as well. This is not the same issue as "Exit status 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED" ____________ Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community. | |
ID: 50114 | Rating: 0 | rate: / Reply Quote | |
SETI@home tasks complete using opencl_nvidia_SoG. Temperature using Thundermaster is 66 C. | |
ID: 50115 | Rating: 0 | rate: / Reply Quote | |
Please start a new thread for the Simulation Unstable issue, if you must. It typically means your GPU is overclocked too much, and this project pushes it harder than other projects. If you want help determining a max stable overclock, PM me and be patient. | |
ID: 50116 | Rating: 0 | rate: / Reply Quote | |
MY GPU is not overclocked. I never overclock. | |
ID: 50117 | Rating: 0 | rate: / Reply Quote | |
How much longer will they need to let tasks run before they get enough information to fix the problem? Do we have to do this from now on, on each GPU Task? From what I saw yesterday, somehow the system got itself into a state where it thought our machines were much faster than they really are. 'machine speed' comes from one of two places: either the aggregate returns across the whole project, or the actual behaviour of each individual computer. The speed of the individual computer takes over in the end - after 11 tasks have made it all the way through and been validated. So "11 times per computer" should be the maximum number of manual interventions required. But since they seem to have put in a workround for the faulty kill-switch, you may not have to do it that many times, or even at all. Because work is now being completed properly, the system-wide speed assessment will be correcting itself at the same time, so that machines which have been inactive while waiting for the new app may never even see the problem. But it's hard to predict when that will kick in: I may find out when I get home. As Retvari has pointed out, there will be faulty workunits circulating around the system for a while yet, and they are a problem because they waste resources for a significant length of time. Those are the ones it is most helpful to patch via the file edit: once they have been completed and validated, they won't come back to haunt us again. | |
ID: 50119 | Rating: 0 | rate: / Reply Quote | |
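The rule described in the post above (project-wide speed first, the host's own speed once 11 tasks have been validated) can be pictured with a short sketch. This is only an illustration of the description in this thread, not BOINC's actual scheduler code; every name in it is hypothetical.

```python
from typing import Optional

# Threshold quoted in the post above; the real server-side value may differ.
VALIDATED_TASKS_NEEDED = 11


def speed_estimate(project_flops: float,
                   host_flops: Optional[float],
                   host_validated_tasks: int) -> float:
    """Pick which speed figure is used to estimate a task's runtime.

    Until the host has enough validated results of its own, fall back to the
    aggregate figure for the whole project.
    """
    if host_flops is not None and host_validated_tasks >= VALIDATED_TASKS_NEEDED:
        return host_flops
    return project_flops


if __name__ == "__main__":
    # A host with only 3 validated tasks still gets the project-wide figure.
    print(speed_estimate(43e12, 5e12, 3))
    # After 11 validated tasks its own measured speed takes over.
    print(speed_estimate(43e12, 5e12, 11))
```

This is why an inflated project-wide figure hits every host at first, and why each host stops being affected once it has enough validated results of its own.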
To summarize: the problem AFAIK were the test WUs, sent without changing the ops estimate. I now cancelled them all, and temporarily raised the OPS bound by 10^3. | |
ID: 50123 | Rating: 0 | rate: / Reply Quote | |
To summarize: the problem AFAIK were the test WUs, sent without changing the ops estimate. I now cancelled them all, and temporarily raised the OPS bound by 10^3. That sounds good. I agree with you about the cause, and the workround will let the system clean itself out with no further intervention. Just one final task: buy a 2019 calendar, and put a big red circle round the next licence expiry date! (or perhaps a month before...) I think you once said that the rsc_fpops_est was fixed by the workunit generation script: it might be a good idea to start thinking about making it easier to vary that. But not this weekend - take some time off! | |
ID: 50124 | Rating: 0 | rate: / Reply Quote | |
To summarize: the problem AFAIK were the test WUs, sent without changing the ops estimate. I now cancelled them all, and temporarily raised the OPS bound by 10^3. I still received a task which has the lower rsc_fpops_bound value. So we should watch these workunits carefully (and fix those which have the lower rsc_fpops_bound) until they've cleared out from the scheduler. | |
ID: 50125 | Rating: 0 | rate: / Reply Quote | |
Unfortunately, rsc ops values can't be changed once the task is created. I'm waiting for the newly created tasks to bring the flops estimate back to normal, and then the old tasks should work as well. | |
ID: 50126 | Rating: 0 | rate: / Reply Quote | |
Thanks, Retvari Zoltan, for your fix; for the most part it worked for me. | |
ID: 50128 | Rating: 0 | rate: / Reply Quote | |
I have noted that work units have been coming through since the 27th; however, they all seem to be failing. Refer to a sample system here: https://www.gpugrid.net/results.php?hostid=176801 | |
ID: 50129 | Rating: 0 | rate: / Reply Quote | |
Could the cause of this (197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED) have been the new version of the Client Software 7.12.1? That's a firm NO. I am currently closely involved in the preparation, testing, and releasing of new client versions. The new client was released well before this problem arose, and (in this respect) the new client works exactly the same as previous ones, going back several releases. We've now got a pretty clear handle on the release of GPUGrid application 9.22 as the culprit, though I will still test my own machines as I start each of them back up (which will happen after the next transfusion of coffee - only just got back home). | |
ID: 50130 | Rating: 0 | rate: / Reply Quote | |
OK, I've started the first. | |
ID: 50131 | Rating: 0 | rate: / Reply Quote | |
All my machines have now completed tasks without error and without manual intervention. I think we're out of the woods. | |
ID: 50136 | Rating: 0 | rate: / Reply Quote | |
Seems that we need to remove the settings now or Reset the Project. | |
ID: 50209 | Rating: 0 | rate: / Reply Quote | |
All my GPU WUs are failing : -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS | |
ID: 50211 | Rating: 0 | rate: / Reply Quote | |
If I try to run directly from a slot, I get this : D:\BOINC\data\slots\9>acemd-922-80.exe Any ideas ? | |
ID: 50214 | Rating: 0 | rate: / Reply Quote | |
I am glad not to be the only one. This is what I was getting. | |
ID: 50215 | Rating: 0 | rate: / Reply Quote | |
# GPU 0 : 78C This message is usually the sign of too high GPU clocks and / or too high GPU temperature (Yes, 78°C could be high). You should use some 3rd party GPU monitoring software (like MSI Afterburner) to: 1. increase the GPU fan speed, 2. reduce the power target of your GPU, 3. reduce GPU clock frequency. This error message has nothing to do with the new Windows app. | |
ID: 50216 | Rating: 0 | rate: / Reply Quote | |
All my GPU WUs are failing : -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS Have you installed BOINC manager in "protected application execution" mode? (as a system service?) If you did so, you should uninstall it, and reinstall without this setting. | |
ID: 50217 | Rating: 0 | rate: / Reply Quote | |
The same app on my SUSE Linux box with a GTX 750 Ti board runs at 62 C. Tullio | |
ID: 50218 | Rating: 0 | rate: / Reply Quote | |
All my GPU WUs are failing : -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS No ... other projects are working fine. See my post after the one you quoted with the real error ... | |
ID: 50219 | Rating: 0 | rate: / Reply Quote | |
The same GPU board runs SETI@home GPU tasks at 71 C, fan speed 50%, clock 1695 MHz and no error. Tullio | |
ID: 50221 | Rating: 0 | rate: / Reply Quote | |
The same GPU board runs SETI@home GPU tasks at 71 C, fan speed 50%, clock 1695 MHz and no error. That's irrelevant. The GPUGrid app is much harder on GPUs than other apps, partly because it's based on CUDA8.0, while the other apps are based on earlier CUDA versions. | |
ID: 50222 | Rating: 0 | rate: / Reply Quote | |
Not necessarily true. The Seti Linux CUDA9 app runs gpus a lot harder than the stock OpenCL application. I don't see more than 62° C. on my air cooled cards. | |
ID: 50223 | Rating: 0 | rate: / Reply Quote | |
The SETI@home GPU tasks run on opencl_nvidia_SoG | |
ID: 50224 | Rating: 0 | rate: / Reply Quote | |
Open_Cl | |
ID: 50225 | Rating: 0 | rate: / Reply Quote | |
You don't have to run the stock SoG Linux apps at Seti. Most Linux users run the CUDA8 or CUDA9 gpu apps which are about 10 times faster. | |
ID: 50226 | Rating: 0 | rate: / Reply Quote | |
You don't have to run the stock SoG Linux apps at Seti. Most Linux users run the CUDA8 or CUDA9 gpu apps which are about 10 times faster. I run what does not fail. Times are not important. I am running SETI@home on a ulefone smart watch, on a Linux box and a Windows 10 PC. Tullio | |
ID: 50227 | Rating: 0 | rate: / Reply Quote | |
Of course ulefone is a smart phone, not a smart watch as I wrote. It runs Android 7.1.1 and also has a GPU which SETI sees but Einstein does not. Or maybe their BOINC servers. It has eight processors and 4 GB of RAM. | |
ID: 50229 | Rating: 0 | rate: / Reply Quote | |
You don't have to run the stock SoG Linux apps at Seti. Most Linux users run the CUDA8 or CUDA9 gpu apps which are about 10 times faster. I don't see any CUDA8 or CUDA9 apps on the list of SETI@home applications. The highest CUDA version used for Linux is 6.0. | |
ID: 50231 | Rating: 0 | rate: / Reply Quote | |
You don't have to run the stock SoG Linux apps at Seti. Most Linux users run the CUDA8 or CUDA9 gpu apps which are about 10 times faster. SETI has a long history of encouraging volunteer developers to improve their stock applications. The best of the resulting applications (with high reliability and high validation rates) are accepted as new stock applications - the opencl_nvidia_SoG application mentioned earlier is one such. The cuda8 and cuda9 apps are candidates, but haven't yet reached a sufficient level of acceptance to be deployed as stock. | |
ID: 50232 | Rating: 0 | rate: / Reply Quote | |
I run what does not fail. Times are not important. Then it would fit the above ideas if you would lower the power target and/or the clock frequency of your GTX 1050 Ti to make it stable with the GPUGrid app, right? | |
ID: 50234 | Rating: 0 | rate: / Reply Quote | |
I am not a GPU expert and use default values both on the 1050 Ti on the Windows 10 PC and the 750 Ti on the Linux box. The latter runs GPUGRID GPU tasks with no problem, so I leave the 1050 Ti to run SETI@home tasks. | |
ID: 50235 | Rating: 0 | rate: / Reply Quote | |
Hello to all. The super NVIDIA GeForce GTX 1180 is coming in a while: https://www.techpowerup.com/gpudb/3224/geforce-gtx-1180 I can't wait. What do you think of this new graphics card? | |
ID: 50236 | Rating: 0 | rate: / Reply Quote | |
This thread is supposed to be about a license expiring, and how that broke Windows GPU applications. | |
ID: 50237 | Rating: 0 | rate: / Reply Quote | |
You don't have to run the stock SoG Linux apps at Seti. Most Linux users run the CUDA8 or CUDA9 gpu apps which are about 10 times faster. I'm curious as to where the threshold is "for a sufficient level of acceptance" for the CUDA special apps. What is the target? I have less than a 2.5% ratio of Inconclusives to Valid tasks. I think the stated goal for the science apps is less than a 5% Inconclusive ratio. On my systems, I believe I have reached a "sufficient level of acceptance". I see no reason not to have the zi3v special app qualify for stock. | |
ID: 50238 | Rating: 0 | rate: / Reply Quote | |
You don't have to run the stock SoG Linux apps at Seti. Most Linux users run the CUDA8 or CUDA9 gpu apps which are about 10 times faster. It's not the performance on any one machine - yours, or anybody else's. You would have to convince Eric Korpela (and nobody else) that the overall validation rate, across all computers that might be eligible - under the rules of eligibility that you will have to supply him with - to download the app, will be acceptable within the project's standards. Which I don't know, but Eric does. My personal validation rate at this moment is 17 inconclusive from 1038 valid, with the SoG app on NVidia under Windows. Previous experience tells me that the inconclusives are usually against wingmates running 'the usual suspects' - yup, there's a v8.00 (opencl_intel_gpu_sah) x86_64-apple-darwin in there. That's why offline bench testing against known good reference results is so important - it eliminates the variability of unverified wingmates. | |
ID: 50239 | Rating: 0 | rate: / Reply Quote | |
Hello, | |
ID: 50320 | Rating: 0 | rate: / Reply Quote | |
Hello, See this thread. https://www.gpugrid.net/forum_thread.php?id=4822 | |
ID: 50328 | Rating: 0 | rate: / Reply Quote | |