Advanced search

Message boards : Number crunching : Incorrectly reported run time

Author Message
catavalon21
Send message
Joined: 1 Feb 09
Posts: 4
Credit: 409,584,890
RAC: 1,352,584
Level
Gln
Scientific publications
watwatwatwatwat
Message 60015 - Posted: 5 Mar 2023 | 12:27:32 UTC

I notice that the run times reported for Python tasks on Win10 are much larger than the actual elapsed clock time; rather it is mirroring the value of CPU time. Example, a recently completed WU took just shy of 14 hours to complete (50739 seconds exactly based on first and last time reported in the log) while the Completed Tasks list shows the task taking 208,842.7 seconds to complete. Curious, is this a new anomaly?

Task is here: https://www.gpugrid.net/result.php?resultid=33337456

KAMasud
Send message
Joined: 27 Jul 11
Posts: 138
Credit: 527,147,131
RAC: 183,879
Level
Lys
Scientific publications
watwat
Message 60016 - Posted: 5 Mar 2023 | 17:42:03 UTC - in response to Message 60015.

I notice that the run times reported for Python tasks on Win10 are much larger than the actual elapsed clock time; rather it is mirroring the value of CPU time. Example, a recently completed WU took just shy of 14 hours to complete (50739 seconds exactly based on first and last time reported in the log) while the Completed Tasks list shows the task taking 208,842.7 seconds to complete. Curious, is this a new anomaly?

Task is here: https://www.gpugrid.net/result.php?resultid=33337456

________

Not a new issue. Go to the news section and then the Python concerning thread. Every issue is covered in that thread.

catavalon21
Send message
Joined: 1 Feb 09
Posts: 4
Credit: 409,584,890
RAC: 1,352,584
Level
Gln
Scientific publications
watwatwatwatwat
Message 60017 - Posted: 5 Mar 2023 | 19:58:23 UTC - in response to Message 60016.

I notice that the run times reported for Python tasks on Win10 are much larger than the actual elapsed clock time; rather it is mirroring the value of CPU time. Example, a recently completed WU took just shy of 14 hours to complete (50739 seconds exactly based on first and last time reported in the log) while the Completed Tasks list shows the task taking 208,842.7 seconds to complete. Curious, is this a new anomaly?

Task is here: https://www.gpugrid.net/result.php?resultid=33337456

________

Not a new issue. Go to the news section and then the Python concerning thread. Every issue is covered in that thread.


Thanks. Searched a bit, but obviously not in the correct location.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1362
Credit: 7,922,257,621
RAC: 4,607,857
Level
Tyr
Scientific publications
watwatwatwatwat
Message 60018 - Posted: 5 Mar 2023 | 22:06:47 UTC - in response to Message 60017.

The posts about this ARE in there.

The gist is that since these tasks are of dual cpu-gpu nature and primarily compute on the cpu, BOINC has no mechanism for handling or understanding these tasks.

It only knows that a task is either run on the cpu or run on the gpu. It does not compute correctly the cpu_time or run_time for any task that is run on both components.

Ignore the cpu_times and run_times, they are meaningless values. Just pay attention to the delta between time sent and time returned to determine the period of time the task takes to crunch.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1627
Credit: 9,472,145,543
RAC: 13,725,970
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 60019 - Posted: 5 Mar 2023 | 22:30:47 UTC - in response to Message 60018.

Not quite.

Runtime and CPU time are recorded locally by the BOINC client, and reported accurately to the project server.

At some point, the server throws its hands up, and says "the runtime can't be a low as that, if the CPU time is so high", and 'corrects' the runtime to what it regards is the true value - even if that's impossible given the clock times when the task was issued and returned.

I've never summoned up the energy to investigate further, and find out whether that's

1) an error by GPUGrid
2) a deliberate decision by BOINC
3) because GPUGrid is running an outdated server
4) something else

But it could be worked out.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1362
Credit: 7,922,257,621
RAC: 4,607,857
Level
Tyr
Scientific publications
watwatwatwatwat
Message 60021 - Posted: 6 Mar 2023 | 1:26:03 UTC - in response to Message 60019.
Last modified: 6 Mar 2023 | 1:45:26 UTC

But neither the run_time or cpu_time is correctly reported on ANY of my hosts.

I return a PythonGPU task in 8.5 - 9 hours consistently on my fastest host. Times reported to the scheduler are ~80000 seconds or equivalent to 22.2 hours.

So off by a factor of 3X there consistently.

My slowest hosts report using 135K seconds or 37.8 hours of run_time. Tasks actually run and report in 12 - 13 hours. So also off by ~3X factor.

So I ignore the reported values as nonsense.

[Edit] OK, I see what you are talking about. You are referring the time recorded in the job_log for the task aren't you?

But even that is not correct.

This is the job_log output for this task.

1678063389 ue 3024664.119761 ct 80540.580000 et 32929.492896
ABOU_rnd_ppod_expand_demos29_2_exp3-0-1-RND8739_0

ue = Estimated runtime = 3024664 seconds = !wrong!
ct CPU time = 80540 seconds = !correct! = that is correctly reported value to the project from the job_log.
et = elapsed time = 32929 seconds = ! actual correct runtime not reported to the project.


But does not reflect any true relation to the actual time the calculation needs to run and report.

And here is the task at the project.
https://www.gpugrid.net/result.php?resultid=33336333

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1627
Credit: 9,472,145,543
RAC: 13,725,970
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 60023 - Posted: 6 Mar 2023 | 12:50:40 UTC - in response to Message 60021.

[Edit] OK, I see what you are talking about. You are referring the time recorded in the job_log for the task aren't you?

Actually, no - I'm not relying on historical logs. I'm looking at the real-time data interchange between BOINC client and BOINC server.

I've just reported task e00006a07567-ABOU_rnd_ppod_expand_demos29_2_exp3-0-1-RND8382_1

The event log messages relating to that task are:

Mon 06 Mar 2023 12:34:06 GMT | GPUGRID | Computation for task e00006a07567-ABOU_rnd_ppod_expand_demos29_2_exp3-0-1-RND8382_1 finished
Mon 06 Mar 2023 12:34:26 GMT | GPUGRID | Sending scheduler request: To report completed tasks.
Mon 06 Mar 2023 12:34:26 GMT | GPUGRID | Reporting 1 completed tasks
Mon 06 Mar 2023 12:34:27 GMT | GPUGRID | [sched_op] handle_scheduler_reply(): got ack for task e00006a07567-ABOU_rnd_ppod_expand_demos29_2_exp3-0-1-RND8382_1

That report was accomplished by sending a file called sched_request_www.gpugrid.net.xml

The significant content of that file was:

<result>
<name>e00006a07567-ABOU_rnd_ppod_expand_demos29_2_exp3-0-1-RND8382_1</name>
<final_cpu_time>79615.790000</final_cpu_time>
<final_elapsed_time>30189.014614</final_elapsed_time>
...

So, the CPU and elapsed times were different when the report left my machine.

But they are the same on the website:

Run time 79,615.79
CPU time 79,615.79

Neglecting the possibility that it was intercepted by a trojan router between northern England and eastern Spain, I'm asserting that the change happened within the GPUGrid server.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1362
Credit: 7,922,257,621
RAC: 4,607,857
Level
Tyr
Scientific publications
watwatwatwatwat
Message 60024 - Posted: 6 Mar 2023 | 20:30:52 UTC

OK, poTaytoe - poTahtoe

Same values in the job log culled from the sched_reply.

But my statement about what the website displaying is nonsense values is correct.

Post to thread

Message boards : Number crunching : Incorrectly reported run time

//