Message boards : Number crunching : Question re rebooting during an active Work Unit in Progress

STARBASEn
Joined: 17 Feb 09
Posts: 91
Credit: 1,603,303,394
RAC: 0
Level: His
Message 47160 - Posted: 3 May 2017 | 4:38:29 UTC

It has been some time since I last participated in GPUGRID, so I don't remember the proper procedure for rebooting a machine after a kernel update without losing an in-progress GPU work unit. This happened a few days ago: I lost 5 hours of crunching time when the WU errored out after the reboot.

I checked the CPU slot using ACMED and did not see an obvious checkpoint file, so the point of this post is to ask whether it is possible to reboot during an in-progress work unit and, if so, what the procedure is. I would hate to think that I have to stop fetching new work and wait until the current work unit completes before every reboot into a new kernel.

I use Fedora, and BOINC runs as a daemon on all my systems. Any advice is greatly appreciated.
____________

Crunching since Feb 2003 (United Devices, Find-a-Drug)

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level: His
Message 47223 - Posted: 15 May 2017 | 19:02:32 UTC - in response to Message 47160.

Select no new work from the project. Allow existing work to complete, then update+restart before accepting work from the project again.
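On a Linux host where BOINC runs as a daemon, the drain-then-reboot procedure above could be scripted. This is a minimal sketch under stated assumptions, not something posted in this thread: it assumes `boinccmd` is on the PATH and can reach the local client, and that `boinccmd --get_tasks` prints one `project URL:` line per task. The helper names `project_cmd`, `count_tasks` and `drain_project` are hypothetical. The script sets `nomorework` for the project, then polls until none of that project's tasks remain in the client's task list.

```python
import subprocess
import time

# Assumptions (not from the thread): `boinccmd` is on PATH and can reach the
# local BOINC daemon; some setups also need --passwd or the BOINC data dir.
# All helper names below are hypothetical.


def project_cmd(url: str, op: str) -> list[str]:
    """Build a `boinccmd --project URL OP` argument list (e.g. OP='nomorework')."""
    return ["boinccmd", "--project", url, op]


def count_tasks(get_tasks_output: str, url: str) -> int:
    """Count tasks in `boinccmd --get_tasks` text whose 'project URL' matches url.

    Assumes each task block contains a line like '   project URL: http://...'.
    Trailing slashes are ignored, so 'http://x/' and 'http://x' compare equal.
    """
    target = url.rstrip("/")
    count = 0
    for line in get_tasks_output.splitlines():
        # Split at the first colon only; the URL's own colons stay in `value`.
        key, sep, value = line.strip().partition(":")
        if sep and key == "project URL" and value.strip().rstrip("/") == target:
            count += 1
    return count


def drain_project(url: str, poll_seconds: int = 300) -> None:
    """Stop new work for url, then block until its tasks leave the task list."""
    subprocess.run(project_cmd(url, "nomorework"), check=True)
    while True:
        out = subprocess.run(
            ["boinccmd", "--get_tasks"], check=True, capture_output=True, text=True
        ).stdout
        remaining = count_tasks(out, url)
        if remaining == 0:
            return
        print(f"{remaining} task(s) still listed for {url}; checking again later")
        time.sleep(poll_seconds)
```

After `drain_project(...)` returns, install the kernel update and reboot, then run `boinccmd --project <URL> allowmorework` to resume fetching work. A finished task stays in the `--get_tasks` list until it has been reported, so the loop also waits for reporting; a manual `boinccmd --project <URL> update` can speed that up.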

Erich56
Joined: 1 Jan 15
Posts: 1142
Credit: 10,906,230,840
RAC: 22,258,642
Level: Trp
Message 47260 - Posted: 17 May 2017 | 18:32:37 UTC - in response to Message 47160.

STARBASEn wrote: "... This happened a few days ago and I ended up losing 5 hours of crunching time causing the WU to error out following the reboot. ..."

Hm, whenever I close the BOINC Manager with the "Exit" button, GPUGRID and other projects continue their work exactly where they left off once the PC has rebooted.

STARBASEn
Joined: 17 Feb 09
Posts: 91
Credit: 1,603,303,394
RAC: 0
Level: His
Message 47349 - Posted: 31 May 2017 | 23:46:47 UTC

skgiven wrote: "Select no new work from the project. Allow existing work to complete, then update+restart before accepting work from the project again."

I was hoping to avoid having to do that, but you are right: it appears to be the only safe way to avoid losing time.

Erich56 wrote: "hm, whenever I close the BOINC Manager by pushing the "exit" button, after a reboot of the PC GPUGRID and other projects continue their work exactly where they had interrupted it before."

I have had a couple of power outages since I first posted about this, and those WUs did continue where they left off on restart. That leads me to believe the issue may be specific to that particular WU, or just an unfortunate coincidence.
____________

Crunching since Feb 2003 (United Devices, Find-a-Drug)
