Message boards : Number crunching : Strange really big wrong ETAs on workunits
Author | Message |
---|---|
Recently my hosts have been showing huge ETAs on workunits, e.g. 46 days, so a host won't download more work until the current task reaches 100% (the systems actually finish the WUs in 1-12 hours, depending on their complexity). As a result, my backup project always kicks in during the upload/download window after a task completes. What can be done to correct this wrong estimate for the GPUs on GPUGrid only? | |
ID: 62050 |
They set it the way they did to fix something. | |
ID: 62051 |
Thx i will try that out! | |
ID: 62052 |
Definitely use the fraction exact in an app_info. | |
ID: 62053 |
"Definitely use the fraction exact in an app_info." Some time ago, I applied the above-mentioned change in client_state.xml. However, after a couple of days the value fell back to what it was before :-( So I gave up. | |
ID: 62054 |
Did it reset with benchmarking turned off? KeithM sent me the link for app_config, which states that you can't even benchmark manually when it's off. | |
ID: 62055 |
Benchmarking does not change the DCF value. 0.01 is the minimum value acceptable for the DCF in BOINC. If Erich tried setting it lower than that, that's why it didn't stick. | |
ID: 62056 |
no, I did NOT set it lower | |
ID: 62057 |
I just ran a CPU benchmark even though I have it turned off, and it ran anyway. My ETA for a future task went from 36d 8h to 89d 19h after the benchmark. Before the benchmark my DCF was 0.686320; now it's 1.696240. I wasn't expecting it to run, but thankfully it didn't trash my 3.5 h running task. | |
ID: 62058 |
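The jump reported above is self-consistent: assuming the displayed ETA is simply the raw runtime estimate multiplied by the project-wide DCF, the new ETA follows directly from the ratio of the two DCF values. A quick check, using the numbers from the post above:

```python
# Sanity check (assumed relationship, not taken from BOINC source):
# displayed ETA = raw estimate x DCF, so ETAs scale with the DCF ratio.
old_eta_h = 36 * 24 + 8                 # 36d 8h, expressed in hours
old_dcf, new_dcf = 0.686320, 1.696240   # values reported in the post
new_eta_h = old_eta_h * new_dcf / old_dcf
days, hours = divmod(new_eta_h, 24)
print(f"{days:.0f}d {hours:.0f}h")      # prints 89d 19h, matching the post
```

The close match suggests the benchmark changed only the DCF, not any real property of the task.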
Quick follow-up: the above experiment was on a Win11 core laptop. That one task did finish thankfully. | |
ID: 62059 |
"Quick follow-up: the above experiment was on a Win11 core laptop. That one task did finish thankfully." Why would you expect otherwise? Only the task itself, the application, and the host's compute performance and load determine how long a task takes to finish. All the DCF does is affect how BOINC estimates each task's computation time.

BUT, the DCF applies across the ENTIRE project, meaning ONE DCF value applies to ALL task sub-types. So every time you change sub-types, the client/scheduler combination has to recompute the DCF value. Run a long-running task type, and the next time you run a short-running task type the estimated times are wildly skewed. Follow a run of short-running tasks with the next long-running type, and the DCF is again wildly skewed in the other direction.

If there were a DCF value for EACH task sub-type, the estimated times would stabilize and be pretty much spot on. But the BOINC server code on GPUGrid does not allow that. So on projects that use the DCF mechanism in their server code and run many different task sub-types, we just have to accept that the DCF values will ping-pong back and forth and estimated completion times will never be correct.

The most a user can do is set the DCF to the lowest value the BOINC code allows, which is 0.01. Or get the project admins to run a different server code base: the current BOINC code removed the DCF mechanism and fixed the DCF at a static value of 1.0, so projects running that code do not see these gyrations in the estimated times. But it is up to each project which BOINC server code they run and how much they modify it to suit their needs.

Benchmarking itself does not change the DCF value. It's the variation in running times among all the varied sub-types that changes it. | |
ID: 62060 |
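The ping-pong behaviour described above can be illustrated with a toy model. This is a deliberately simplified sketch, NOT BOINC's exact update rule: it assumes the DCF jumps up immediately when a task overruns its estimate and decays only gradually on underruns, with 0.01 as the floor mentioned earlier.

```python
# Toy model of a single project-wide duration correction factor (DCF).
# Illustrative only; the real BOINC client uses a more involved update rule.

DCF_MIN = 0.01  # minimum DCF the BOINC client accepts

def update_dcf(dcf, raw_estimate_h, actual_h):
    """Nudge the DCF toward the observed actual/estimated runtime ratio."""
    ratio = actual_h / raw_estimate_h
    if ratio > dcf:
        dcf = ratio                   # overruns raise the DCF at once
    else:
        dcf += 0.1 * (ratio - dcf)    # underruns lower it only slowly
    return max(dcf, DCF_MIN)

def eta(dcf, raw_estimate_h):
    """The client scales every raw estimate by the one project-wide DCF."""
    return dcf * raw_estimate_h

dcf = 1.0
# Run a long sub-type whose raw estimate is far too low (2 h est, 12 h real)...
for _ in range(3):
    dcf = update_dcf(dcf, raw_estimate_h=2.0, actual_h=12.0)
# ...and a short sub-type fetched next now shows a 6x inflated ETA.
print(f"DCF = {dcf:.2f}, ETA for a 1 h task = {eta(dcf, 1.0):.1f} h")
```

With a per-sub-type DCF, each line of work would converge to its own ratio instead of dragging the shared value back and forth.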
That's very helpful. Makes sense. | |
ID: 62061 |
There probably is some interaction between running the benchmarks and computing the dcf. | |
ID: 62062 |
My observations on the estimated runtimes are the following: | |
ID: 62065 |
Strange that I have never had a single instance of "tasks won't finish in time" message on my two hosts with a 2080 Ti running every type of task that GPUGrid offers in all the time that I've run this project. | |
ID: 62066 |
The maximum queue length you can set in BOINC Manager is 10+10 days. When a task's runtime estimate says it will run for 30+ days, the queue (cache) size can't be what causes the "Tasks won't finish in time" message. | |
ID: 62067 |
I'd like to add that there are two distinct batches in ACEMD3 (ADRIA and ANTONIOM), and when an ADRIA task lands between ANTONIOM tasks, it breaks the duration correction factor of the latter, so I have to adjust it manually in the client_state.xml file on a daily basis if I want to keep my queue filled (4 tasks). | |
ID: 62068 |
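For reference, the manual adjustment described above means stopping the BOINC client and editing the `<duration_correction_factor>` element inside the relevant `<project>` block of client_state.xml. A sketch of what that edit looks like (0.01 is the minimum the client accepts; the surrounding elements shown here are illustrative, not a complete project block):

```xml
<project>
    <master_url>https://www.gpugrid.net/</master_url>
    <!-- One DCF per project; the client multiplies every raw
         runtime estimate by this value. Values below 0.01 will
         be clamped back up by the client. -->
    <duration_correction_factor>0.010000</duration_correction_factor>
</project>
```

Edit only while the client is fully shut down, otherwise the running client will overwrite the file with its in-memory state.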