Message boards : Graphics cards (GPUs) : Development BOINC 6.10.7 released
Another new one. Currently just for Windows. Below is the official change log.

Change Log:

____________
BOINC blog
ID: 12703
Fedora 11 x86_64 RPMS
ID: 12708
Well, they seem to have splatted the CUDA preempting bug in this version.
ID: 12711
> Well, they seem to have splatted the CUDA preempting bug in this version.

There have been problems with the amount of work fetched, both here and at AQUA, because of inaccurate project estimates and DCF interactions. I read Paul's report, and it was unclear whether this possibility had been ruled out in the case he quoted.
ID: 12712
> Well, they seem to have splatted the CUDA preempting bug in this version.

I think this morning's information rules that out ... There are at least two remaining problems.

The first, which may or may not be minor: I see inconsistent updates to the debt numbers. This is for ATI GPUs only, in that I have not tried this version on my CUDA systems with multiple GPU projects (all my CUDA work to this point has been aimed at GPU Grid). This is the main issue I have tried to document since yesterday and this morning, in part because I think it more likely that they will address it ...

The second, and to my mind larger, problem is that the aim is now to run GPU tasks in FIFO order. I am not clear as to why this was done, as I could not follow the arguments, but it seems to me that some of it was because of the two bugs Richard mentioned. Both now thankfully dead.

This second situation arises with MW and Collatz because of the disparity in run times (00:52 for MW, 17:09 for Collatz) and because MW restricts the downloads to 24 (on my system) ... the net effect is that Collatz will download up to 90 tasks (25 hours of run time) against MW's 24 minutes' worth ... run that in strict order and my 800 to 25 resource share is inverted ... to say the least ...

You can only see this if:
- you watch the execution patterns: UCB doesn't; or,
- you wade through 32M logs (with debugs turned on you get lots of stuff): UCB doesn't; or,
- you trust those that report: UCB doesn't; or,
- you think about the descriptions of execution patterns: UCB hasn't, at least not yet ...

I suspect that this is going to be an issue with almost any selection of GPU projects if you pay attention. I suspect that it will be worse with projects that have task limits (MW and GPU Grid), though execution-time disparity is more likely to be the driving issue.

When I get my GTX 280 card back today and running again, I think I am going to turn one of my systems from dedicated GPU Grid into a split system and then see where it goes. In this case, I may share it among MW, Collatz, and GPU Grid and see if it runs balanced or not ... I suspect not ...

Last point: up to this recent point in history almost no one has been running more than one GPU project at a time on a system. We are only now able to attach to multiple GPU-capable projects, so this is virgin territory. For CUDA the only real choice (IMHO) has been GPU Grid, and others likely opted for only SaH ... but now one has GPU Grid, MW and Collatz as CUDA choices and MW and Collatz as ATI choices ... and now we are seeing the issues ...

{edit - add} Oh, and if you have been avoiding 6.10.4 through .6, I am moving my recommendation to "suggested" over 6.10.3 ... to play safe, stay with 6.10.3 till I have another couple of days ... but 6.10.7 looks like the next stable version that is usable ...
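To make that inversion concrete, here is a minimal sketch (my own toy code, not anything from the BOINC source; the task counts and run times are just the figures above) of what strict FIFO hands each project on one GPU:

    #include <cstdio>

    // Sketch only (not BOINC source): one GPU, strict FIFO.
    // Figures from above: MW caps the cache at 24 tasks of ~52 s each;
    // Collatz fetches ~90 tasks of ~17 m 09 s each.
    int main() {
        const double mw_share = 800.0, collatz_share = 25.0;  // requested shares
        const int    mw_tasks = 24,    collatz_tasks = 90;    // cached tasks
        const double mw_secs  = 52.0,  collatz_secs  = 17 * 60 + 9;

        // Strict FIFO burns through whatever was fetched, in arrival order,
        // so device time tracks fetch size, not resource share.
        double mw_hours  = mw_tasks * mw_secs / 3600.0;
        double col_hours = collatz_tasks * collatz_secs / 3600.0;

        printf("share requested  MW:Collatz = %.0f : %.0f\n", mw_share, collatz_share);
        printf("GPU time granted MW:Collatz = %.2f h : %.2f h\n", mw_hours, col_hours);
        return 0;
    }

The 800:25 share asks for MW to get thirty-odd times the device time; strict FIFO delivers roughly the reverse.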
ID: 12720
There remains the work-fetch issue regarding GPU and CPU mixes: 6.10.x (including 6.10.7) STILL goes looking for CPU tasks from GPU projects (including GPUGRID and Collatz) and still goes looking for GPU tasks from CPU-only projects (Spinhenge, POEM, etc.). That strikes me as something which could and should be fixed. It seems that few have observed this, or that those who have don't seem bothered by it -- at least on the developer side.
ID: 12725
> There remains the work-fetch issue regarding GPU and CPU mixes: 6.10.x (including 6.10.7) STILL goes looking for CPU tasks from GPU projects and GPU tasks from CPU-only projects. That strikes me as something which could and should be fixed.

Well, how do you test whether a project has added support for new hardware if you don't ask about it occasionally? Currently the only way the client knows is to send a scheduler request for work. And since projects can now give up to 4 weeks' deferral for a hardware resource they don't currently have any application for, there isn't really any big reason to change the client further.

WCG is now using this new functionality, and the last GPU request included this in the scheduler reply:

    <cuda_backoff>604800</cuda_backoff>
    <ati_backoff>604800</ati_backoff>

v6.10.7 immediately detected this, and the next GPU request to WCG is deferred for 7 days, as told by WCG's scheduling server. Hmm, I am not sure, but it looks like the deferrals will be included in all scheduler replies except when the server is down. If so, then as long as the client occasionally connects to ask for supported work, report results, or send trickle-ups, it will never ask for the unsupported work, except if you manually hit "update" ...

So it's just for the various projects to add the necessary functionality to their servers and choose up to a 4-week deferral ...
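For illustration, here is a minimal sketch of how a client could honour those per-resource deferrals; the element names are the ones from the reply above, but the code itself is my assumption about the mechanism, not the actual client source:

    #include <cstdio>
    #include <cstdlib>
    #include <ctime>
    #include <string>

    // Sketch only: pull a per-resource backoff, in seconds, out of a
    // scheduler-reply body and compute when the next request is allowed.
    long parse_backoff(const std::string& reply, const std::string& tag) {
        const std::string open = "<" + tag + ">";
        size_t pos = reply.find(open);
        if (pos == std::string::npos) return 0;   // no backoff in this reply
        return atol(reply.c_str() + pos + open.size());
    }

    int main() {
        const std::string reply =
            "<cuda_backoff>604800</cuda_backoff>"
            "<ati_backoff>604800</ati_backoff>";
        long cuda_backoff = parse_backoff(reply, "cuda_backoff");
        // 604800 s = 7 days: no CUDA work request to this project until then.
        time_t next_ok = time(nullptr) + cuda_backoff;
        printf("CUDA requests deferred %ld s, until %s", cuda_backoff, ctime(&next_ok));
        return 0;
    }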
ID: 12727
> Well, how do you test whether a project has added support for new hardware if you don't ask about it occasionally? Currently the only way the client knows is to send a scheduler request for work.

You use the publish model. Instead of having millions of requests constantly asking about something that may never happen, you publish the new capability to the client when you need to make it aware of one ...

But the bottom line is, I really don't want a project deciding what it is going to run where ... and it is senseless for BOINC to be asking this "question" over and over. Even this fix is lame ... they made a flawed decision and, instead of acknowledging that, came up with this lame "well, the project can increase the back-off" ... the client should not be asking in the first place. When a project comes up with a new capability, they publish it in the news, as they should; then I will make the adjustments in the settings on the site, and THEN have my client start asking for work for the new resource.

Today I decided to try the CUDA app at MW, and when I turned on my machine to get some work it took, I think, 6 to 8 CPU work requests before the client would ask for GPU work ... and I had done the update to tell the client that I would only be running GPU work on this machine ... as BarryAZ notes, this is something that does not make sense ...

If you have done any systems work you know that the primary rule is that you do nothing that you do not have to ... you run no module, no test, no code that you do not absolutely need to run ... except in BOINC ...

I am having a debate with JM VII on the mailing list about just this subject ... he is appalled that I would want to test his system for quality, to make sure that it is returning valid results, and in turn find out how fast it is besides ... and he considers that a complete waste ... But this death-of-a-thousand-cuts pinging on the servers, and the running of RR SIM and other code in the client as often as 6 or more times a minute, he is fine with ... even though there is no real need to do so ... and it makes some of the resource scheduling bugs much more severe than they might otherwise be ...

Anyway ... it is a bad design decision ... but now that Dr. Anderson has had the idea it is going to linger like the smell of my spouse's dead fish ...
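To sketch what I mean by the publish model (purely hypothetical; nothing like this exists in BOINC, and every name below is invented): the project announces its supported resources once, the client caches the list, and no polling ever happens:

    #include <cstdio>
    #include <set>
    #include <string>

    // Hypothetical publish model -- not anything BOINC implements;
    // all names are invented for illustration.
    enum class Resource { CPU, CUDA, ATI };

    struct ProjectCapabilities {
        std::string name;
        int version;                    // bumped when the project adds an app
        std::set<Resource> published;   // pushed by the project, cached locally
    };

    // The client consults its cache instead of polling the scheduler.
    bool worth_asking(const ProjectCapabilities& p, Resource r) {
        return p.published.count(r) > 0;
    }

    int main() {
        // POEM publishes CPU only, so no client ever asks it for GPU work.
        ProjectCapabilities poem{"POEM", 1, {Resource::CPU}};
        printf("ask POEM for CUDA work? %s\n",
               worth_asking(poem, Resource::CUDA) ? "yes" : "no");
        return 0;
    }

When a project adds, say, a CUDA app, it bumps the version and the client picks up the new list on its next ordinary contact; nothing asks in between.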
ID: 12730
I posted this over on the BOINC client message board in response to Richard's comment of "the only thing you gain by degrading to v6.6.33 or earlier is ignorance (lack of information in the logs). And the only thing you gain by degrading to v6.4.5 is the inability to run both CPU and GPU tasks for the same project/application."
ID: 12733
> I posted this over on the BOINC client message board in response to Richard's comment of "the only thing you gain by degrading to v6.6.33 or earlier is ignorance (lack of information in the logs). And the only thing you gain by degrading to v6.4.5 is the inability to run both CPU and GPU tasks for the same project/application."

I would note that even with the 6.10.x versions people are still having a hard time getting this to work properly, if I am reading the messages correctly. Or, to put it another way ... it still is not working as advertised.

I'd note that he ignores that the other thing I gain by moving back to, say, 6.4.5 is that not only do I actually GET work from projects like POEM or Spinhenge, since they only support CPU work and will fail a 'too quick' work fetch (which you generate when you ask for non-existent GPU work), but also I don't extraneously ping the servers (as Paul noted). He's also ignoring that the current work-fetch routine FAILS way too often.

I noted this in a post this AM to the Alpha list where, as a consequence of design choices, the support for multiple projects is slowly being compromised for reasons that are not entirely clear to me (well, maybe I am stupid, but I still don't see it). And options that would allow the participant more control have been rejected (like being able to say don't try to get more than one CPDN task at a time, though overlap would be acceptable when in the last few days) ...

But as Paul noted, this no doubt is a losing effort regarding the client, and since the multi-project concept of BOINC (hello, BOINC developers) means it makes sense to have multiple GPU and multiple CPU projects attached, I find myself compelled to use the later, troublesome, noisy client unless I am either CPU-only (5.10.45 is lovely there) or running GPUGrid as a single CUDA project (6.4.5 is OK here UNLESS GPUGrid forces CUDA 2.3).

I will note, on the other hand, that judging by some of the complaints (historically) the way BOINC handles single projects is not that effective either ... though I have not been to the SaH boards in months ... :)

That might be what you want to do with 'default users' -- those who install, attach, and don't do anything else. There used to be a LOT of those folks back in the old Seti@home days. To the extent that many of them are running BOINC today, I suspect they are using legacy clients and haven't changed much since they set up BOINC on their computers.

I talk to a guy who knows a bunch of people that were heavy crunchers in SETI@Home Classic, and they never made the transition to BOINC. My hazy memory says that almost half of the raw processing power of the project never made the transition. They opted out, so to speak. When you strip away all the noise, the fundamental issue was that UCB/Dr. Anderson did not listen to them and their concerns. Note that almost all of those people were of the large-farm class. They had lots of machines and lots of power, and we lost all of it ...

Years later and it is the same ... Dr. Anderson and the cohorts at UCB may be smarter than me, they may be smarter than you ... but they are not smarter than all of us put together ...
ID: 12747
A follow-up and my response -- from Richard (note his passing comment about the great DA at the end of his post)
ID: 12749
> OK -- and sorry about the testiness of my replies as well -- it seems there is that classic 80% of agreement.

Um, did I give you the impression I thought you were being testy? My fault, as that was not intended. I was just trying to amplify and clarify some points. So I am sorry you think I only agree with 80% of what you are saying ... heck, read my latest on the mailing list ...

As to that last point ... Richard is dreaming ... it was a bad design choice, but because UCB never makes mistakes and they slapped in the change to the back-off, it is unlikely that this "feature" is going to change ... ever ...
ID: 12755
Paul, I copied that message over from another message board -- that 'testy' comment was to Richard -- his post riled me up, and he apologized for its tone, so I completed the loop and apologised for my tone -- over there.
ID: 12758
Paul -- no, you and I agree on this issue almost completely -- that 80% message was to Richard.
ID: 12759
Paul, here is an example of just how brain-dead the current fetch routine is. Note, this particular workstation has a 9800GT, and Collatz is configured for GPU-only at the project level:
ID: 12760
It didn't last very long. Superseded by 6.10.9. See separate message thread for details.
ID: 12763
> It didn't last very long. Superseded by 6.10.9. See separate message thread for details.

Yeah, I noticed -- but a .8 then a .9 bump within 24 hours doesn't give rise to confidence. The thing is, the condition causing the problem I posted here isn't considered a bug that should be fixed. It appears that DA LIKES that sort of situation.
ID: 12766
> It didn't last very long. Superseded by 6.10.9. See separate message thread for details.

The .9 release ONLY had minor adjustments for ATI configurations, and even then there are considerable additional changes that ALSO need to be made to the server and the science application(s) before it will be effective ... at least that is what Rom said on the Collatz board (I think I saw you there ...).

As to the rest ... OK, I got confused ... just so you know ... two lonely voices in the wilderness ... but I do think that Richard is one of the good guys ...

Rom essentially says that this asking for CPU work or GPU work in the wrong places is not considered a bug at this time ... I pointed out that that *IS* the problem in a nutshell ... UCB considers lots of bugs to not be bugs ...

As I pointed out in one of my list posts in the last day or so, one of the problems is that neither Rom nor Dr. Anderson are heavy users of BOINC ... it is obvious from their comments about issues that this is still true, and I know that at least as of a month or so ago (I forget exactly when I had the conversation with someone who is in a position to know) there is no UCB "lab" where they actually run various versions of BOINC to see what it does ... and does not do ... I mean, how am I supposed to take seriously a software development effort that does not appear to use the product it is developing, except casually?
ID: 12770
Further -- and I have seen this in other, non-BOINC technical situations, so I know the IT <-> user interaction can be 'suboptimal' -- I get the sense often enough that any problems active users have with the ever-changing client iterations are blamed on the users not using the client the 'right' way (the 'right' way being the small mini-lab environment the developers apparently are using).
ID: 12800
> Further, and I have seen this in other non-BOINC technical situations ... I get the sense often enough that any problems active users have with the ever-changing client iterations are because the users are not using the client the 'right' way (the 'right' way being the small mini-lab environment the developers apparently are using).

Which, if they had a lab, one could even support that mode.

If you look at the user stats on Willy's site you will see that the vast majority of participants, >50% (IIRC; it has been a while and it is not something I memorized), run only one project. When you look at small suites of fewer than 5 projects, you cover nearly 90% of all participants. Logic would suggest that the primary focus of BOINC development should be to make the single-project (or single project plus safety project(s)) type of user most happy; from there, to make sure that BOINC works well with small suites of projects; and lastly, to make sure that the 3,000 or so of us that run 50+ projects have an adequate tool.

And if the goal is to get more people to run more projects, then this should be incentivized. Like a credit bonus based on the amount of cross-project credit earned each month ... I need to work on that idea! :)

> Like I've said before, my increased 'noise level' on this has been because in the past I was insulated from much of the 'improvements' foisted on folks by newer client development policy. I didn't have GPU-supported workstations -- so I went with the 5.10.45 client -- which works fine for Win2K and XP environments; work fetch is as expected there.

The roots of the troubles go all the way back to when the first 4-CPU systems became available. I know: I found a scheduling anomaly, and JM VII came up with a fix that was not allowed ... we have gone downhill from there, as more and more changes are piled on and more and more of the original concepts of how BOINC should work are tossed under the bus on account of "because" ...

> When I added a couple of Vista workstations, I first went to the 6.18/6.19 client, since they incorporated a change to allow the client to start at boot and not fight the Vista 'protection scheme'. When I started adding some GPU support (9400GT, 9600GT, 9800GT, 250GS) and GPUGrid, I went to the 6.4.5 client -- it had some work-fetch quirks, but generally handled things well enough. I tried the 6.6.36 client and got out of that quickly, finding its work-fetch schema deeply flawed, notwithstanding the gospel of Dave.

I like that ... :) "Gospel" ...

> I would have stayed 'dumb and happy' (the way developers often like to keep users) except for the changes over in Collatz -- with them marrying up support for low-end video processors including ATI, along with a requirement for CUDA 2.3, I've been compelled to diddle with the 6.10.x series on a number of workstations in mixed project environments (I typically have 6 to 9 projects on a workstation, with 4 to 6 of them configured as CPU-only -- or CPU-only at the project level -- and 2 to 3 configured as GPU-only, or, like GPUGrid, GPU at the project level).

This clearly is an environment with which the developers have very scarce experience or awareness. Early to mid year, before I hit a two-month low spell, I demonstrated how some of the internals are being run as often as 6 times a minute on my systems, and they are not the fastest or the "widest", though they are faster and wider than most.

Add in projects with short run times and you have internal chaos, where the running of the Resource Scheduler on all its triggers means that trivial reasons cause constant reordering of the work schedule. The saddest point is that with a 1-day queue and no project with a deadline less than three or four days in the future there is zero schedule pressure ... yet BOINC would go into panic after panic after panic ...

The most pathetic thing is that JM VII keeps bringing up a project, now defunct, that had a 6-minute deadline as justification for this lunacy. And I say pathetic because, with TSI being 60 minutes, tasks from this mythical project would cost, on average, 30 minutes of processing on running tasks because of preemptions.

> So, being forced to work with (or against) the 6.10.x client and its 'from above, developer-directed force', I've taken to 'railing against the machine' and joining you in a number of venues.

And I thank you for your support ... :) But history says that it will not matter in the slightest. Sadly, the only thing that I think will save BOINC is when and if Dr. Anderson leaves ... I agree he had the one, or two, great idea(s), but that, to my mind, does not excuse the 10,000 blunders that followed ...
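For what it is worth, the schedule-pressure test being appealed to here is trivial arithmetic; a sketch in my own formulation (not the client's actual RR simulation):

    #include <cstdio>

    // Sketch only: a task creates schedule pressure when its remaining
    // work cannot finish before its deadline at the machine's normal pace.
    bool deadline_pressure(double remaining_secs, double secs_to_deadline,
                           double on_fraction /* share of wall time crunching */) {
        return remaining_secs > secs_to_deadline * on_fraction;
    }

    int main() {
        // The case above: a 1-day queue, nearest deadline 3 days out,
        // machine crunching around the clock -- zero pressure, no panic.
        double queued   = 1.0 * 24 * 3600;
        double deadline = 3.0 * 24 * 3600;
        printf("panic warranted? %s\n",
               deadline_pressure(queued, deadline, 1.0) ? "yes" : "no");
        return 0;
    }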
ID: 12805
> I really don't understand why an installed client can't read the user-controlled settings on the local system regarding GPU or CPU for each project as part of the fetch. The user can change those settings at the project site and can control for differing workstation settings by using different groups for different workstation configurations.

Barry, check out the BOINC message board again. We have progress: it'll need server updates as well as a new client build, and I can already see some fine-tuning needed during testing, but the direction of movement is positive. Berkeley is not deaf, merely hard of hearing!
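What is being asked for amounts to a simple gate before each fetch; a hypothetical sketch (all names invented, not the client's actual code):

    #include <cstdio>
    #include <map>
    #include <string>

    // Hypothetical sketch: the client caches the per-project resource
    // preferences the user set on the project site and checks them
    // before issuing any work request for that resource.
    struct ResourcePrefs {
        bool allow_cpu;
        bool allow_gpu;
    };

    bool may_fetch_gpu(const std::map<std::string, ResourcePrefs>& prefs,
                       const std::string& project) {
        auto it = prefs.find(project);
        return it != prefs.end() && it->second.allow_gpu;
    }

    int main() {
        // This workstation's venue: Collatz GPU-only, Spinhenge CPU-only.
        std::map<std::string, ResourcePrefs> prefs = {
            {"Collatz",   {false, true}},
            {"Spinhenge", {true,  false}},
        };
        printf("ask Spinhenge for GPU work? %s\n",
               may_fetch_gpu(prefs, "Spinhenge") ? "yes" : "no");
        return 0;
    }

The venue/group mechanism mentioned above would simply select which preference table gets cached on a given workstation.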
ID: 12858
> I really don't understand why an installed client can't read the user-controlled settings on the local system regarding GPU or CPU for each project as part of the fetch. The user can change those settings at the project site and can control for differing workstation settings by using different groups for different workstation configurations.

And it begs the question what made them change their minds ... in the famous pigeon experiments they rewarded randomly, and the pigeons developed elaborate "dances" to get the food pellet ... because that is what works ... just as one of my dogs knows that if she scratches on the automatic door, that is what causes it to open ...

Now if they will start to address the myriad of other issues that are killing us ... like the strict GPU FIFO rule that negates resource share unless you run with no queue, or a very short one (0.1 days has been working; I have not increased it yet, maybe later this week) ...

What is saddest is that, as best I can tell, the FIFO rule was added because of the execution-order issues caused by bugs that have since been addressed in 6.10.7 ... sigh ...
ID: 12872