Bedrich Hajek
During a routine update request, the following happened:
5/30/2011 10:41:58 AM | GPUGRID | update requested by user
5/30/2011 10:42:01 AM | GPUGRID | Sending scheduler request: Requested by user.
5/30/2011 10:42:01 AM | GPUGRID | Not reporting or requesting tasks
5/30/2011 10:42:03 AM | GPUGRID | Scheduler request completed
5/30/2011 10:42:03 AM | GPUGRID | Result A587-TONI_AGGsoup1-7-100-RND6153_1 is no longer usable
5/30/2011 10:42:03 AM | GPUGRID | Result p17-IBUCH_7_wtEGFR_110419-18-20-RND5765_1 is no longer usable
5/30/2011 10:42:03 AM | GPUGRID | Result A163-TONI_AGGdense1-3-100-RND8813_2 is no longer usable
5/30/2011 10:42:04 AM | GPUGRID | Computation for task A587-TONI_AGGsoup1-7-100-RND6153_1 finished
5/30/2011 10:42:04 AM | GPUGRID | Computation for task p17-IBUCH_7_wtEGFR_110419-18-20-RND5765_1 finished
5/30/2011 10:42:37 AM | GPUGRID | Sending scheduler request: To report completed tasks.
5/30/2011 10:42:37 AM | GPUGRID | Reporting 3 completed tasks, requesting new tasks for NVIDIA GPU
5/30/2011 10:42:41 AM | GPUGRID | Scheduler request completed: got 2 new tasks
I don't see why this happened. I updated to the latest driver and BOINC versions a while back. Two of the units were partially complete, with no errors. The two new units are running fine so far.
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
Your tasks are listed as "Client detached".
Are you using BAM?
Bedrich Hajek
I don't even know what BAM is, so I would say no.
Bedrich Hajek
It happened to me again:
6/5/2011 9:09:13 PM | GPUGRID | Sending scheduler request: To fetch work.
6/5/2011 9:09:13 PM | GPUGRID | Requesting new tasks for NVIDIA GPU
6/5/2011 9:09:15 PM | GPUGRID | Scheduler request completed: got 1 new tasks
6/5/2011 9:09:15 PM | GPUGRID | Result A570-TONI_AGGsoup1-13-100-RND6941_0 is no longer usable
6/5/2011 9:09:15 PM | GPUGRID | Result A229-TONI_AGGsoup1-11-100-RND0938_1 is no longer usable
6/5/2011 9:09:16 PM | GPUGRID | Computation for task A570-TONI_AGGsoup1-13-100-RND6941_0 finished
6/5/2011 9:09:16 PM | GPUGRID | Computation for task A229-TONI_AGGsoup1-11-100-RND0938_1 finished
6/5/2011 9:09:39 PM | GPUGRID | Started download of A435-TONI_AGG1-24-LICENSE
6/5/2011 9:09:39 PM | GPUGRID | Started download of A435-TONI_AGG1-24-COPYRIGHT
6/5/2011 9:09:40 PM | GPUGRID | Finished download of A435-TONI_AGG1-24-LICENSE
6/5/2011 9:09:40 PM | GPUGRID | Finished download of A435-TONI_AGG1-24-COPYRIGHT
6/5/2011 9:09:40 PM | GPUGRID | Started download of A435-TONI_AGG1-24-A435-TONI_AGG1-23-100-RND2826_1
6/5/2011 9:09:40 PM | GPUGRID | Started download of A435-TONI_AGG1-24-A435-TONI_AGG1-23-100-RND2826_2
6/5/2011 9:09:48 PM | GPUGRID | Finished download of A435-TONI_AGG1-24-A435-TONI_AGG1-23-100-RND2826_2
6/5/2011 9:09:48 PM | GPUGRID | Started download of A435-TONI_AGG1-24-A435-TONI_AGG1-23-100-RND2826_3
6/5/2011 9:09:49 PM | GPUGRID | Finished download of A435-TONI_AGG1-24-A435-TONI_AGG1-23-100-RND2826_1
6/5/2011 9:09:49 PM | GPUGRID | Started download of A435-TONI_AGG1-24-pdb_file
6/5/2011 9:09:49 PM | GPUGRID | Sending scheduler request: To report completed tasks.
6/5/2011 9:09:49 PM | GPUGRID | Reporting 2 completed tasks, requesting new tasks for NVIDIA GPU
6/5/2011 9:09:52 PM | GPUGRID | Finished download of A435-TONI_AGG1-24-A435-TONI_AGG1-23-100-RND2826_3
This started after I upgraded to BOINC 6.12.26 and NVIDIA driver 270.61, and then to 275.33. It is only happening on my Windows 7 machine. Everything was going fine until the scheduler request. Is the problem in my machine or on the server?
Dagorath
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
6/5/2011 9:09:15 PM | GPUGRID | Result A570-TONI_AGGsoup1-13-100-RND6941_0 is no longer usable
6/5/2011 9:09:15 PM | GPUGRID | Result A229-TONI_AGGsoup1-11-100-RND0938_1 is no longer usable
The server decided that the 2 results in question were no longer usable and sent the 2 messages quoted above to your BOINC client. Your computer responded to the above messages by ending computation on the 2 tasks, as indicated in the 2 messages quoted below.
6/5/2011 9:09:16 PM | GPUGRID | Computation for task A570-TONI_AGGsoup1-13-100-RND6941_0 finished
6/5/2011 9:09:16 PM | GPUGRID | Computation for task A229-TONI_AGGsoup1-11-100-RND0938_1 finished
What's puzzling is that when I look at the times the tasks were sent to you and returned, one task was returned about 5 hours after it was sent and the other about 7 hours after it was sent. Tasks should not get the "result is no longer usable" message until after the deadline has passed. The deadline is 5 days, but your 2 tasks were canceled less than 12 hours after they were sent to you. That's weird.
Another weird thing is that on your list of tasks for that computer, both of those tasks show a status of "Client detached". If they were canceled by the server, they should show status "Canceled", not "Client detached". That makes me wonder: did you detach the client from GPUGRID after the tasks were canceled and before they were reported?
So we have 2 mysteries here. It's hard to give you any advice about the problem until those 2 mysteries are solved.
Bedrich Hajek
No, I did not detach the client. This was in fact a scheduler request made automatically by the computer.
Dagorath
Hmmm. If you didn't detach from the project and the project server canceled your 2 tasks less than 12 hours after you received them, as the messages you posted indicate, then you've exposed some bugs on the server, perhaps in the client too. The problem is that nobody else seems to be afflicted by those bugs, so I doubt they exist.
The only other scenario I can think of is not pretty. In that scenario you did in fact detach from the project and you fabricated the messages to make it look like the server canceled the 2 tasks prematurely. Maybe you're just screwing with our heads? Sorry if that offends and I hope I'm wrong. I sincerely hope someone else can think of a third scenario to explain what's going on.
Bedrich Hajek
No, I am not making this up. I know you had to mention this to rule out all possibilities, but please don't mention this again. Thank you!
Steve
Is it possible that the WUs were resends because the original task, sent to someone else, had not been returned within 48 hours, and then, before you crunched them, the original was finally returned?
____________
Thanks - Steve
Dagorath
No, that did not happen. Follow the audit trail for yourself if you wish.
Click on Bedrich's name to the left of one of his messages. Then click on the "Computers: View" link to bring up his list of attached computers. Now recall from one of his posts that he said it happened on his Windows 7 machine. That would be computer number 74707. Click on the "Tasks" link for 74707 then find the 2 tasks with status "Client detached". Click the "work unit ID" link for each task. Note that for each task the work unit name at the top of that page matches one of the 2 canceled tasks in the messages he quoted in the first post in this thread.
Now note that for work unit 2516266 the first replication was reported 5 Jun 2011 13:50:57 UTC, but Bedrich's computer 74707 didn't receive the second replication until 5 Jun 2011 18:10:34 UTC, about 4 hours after the first replication was returned. Thus Bedrich's computer was not crunching the work unit simultaneously with another host.
For work unit 2516397 we can see that Bedrich's computer 74707 reported the task at 5 Jun 2011 23:22:49 UTC, and the next computer in line didn't receive the second replication until 6 Jun 2011 4:40:53 UTC. Again, the work unit was not being crunched simultaneously on 2 computers.
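The two gaps in this audit trail can be checked with a few lines of Python. This is only a sketch: the timestamps are the UTC values quoted above, while the helper `gap` and the variable names are illustrative and not part of BOINC or the GPUGRID website.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

def gap(earlier: datetime, later: datetime) -> timedelta:
    """Time elapsed between two events in the audit trail (illustrative helper)."""
    return later - earlier

# WU 2516266: first replication reported, then second replication sent to host 74707
wu_2516266 = gap(datetime(2011, 6, 5, 13, 50, 57, tzinfo=UTC),
                 datetime(2011, 6, 5, 18, 10, 34, tzinfo=UTC))

# WU 2516397: host 74707 reported, then the next host received the second replication
wu_2516397 = gap(datetime(2011, 6, 5, 23, 22, 49, tzinfo=UTC),
                 datetime(2011, 6, 6, 4, 40, 53, tzinfo=UTC))

for name, delta in [("2516266", wu_2516266), ("2516397", wu_2516397)]:
    # A positive gap means the later replication was sent only after the earlier one was reported.
    print(name, delta, "no overlap" if delta > timedelta(0) else "overlap")
```

Both gaps come out positive (roughly 4 hours 20 minutes and 5 hours 18 minutes), which is consistent with the point that neither work unit was being crunched on two hosts at the same time.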
skgiven (Volunteer moderator, Volunteer tester)
I'm sure Bedrich Hajek is not making it up; such errors have been seen and reported before (here and in other project threads).
If it's a server-side problem (task issue or other), one of the researchers might be able to help. If it's the result of corrupt user data, a project reset might help. It could also be the result of an antivirus or firewall blocking something, or something else I can't think of.
Have you recently changed your user details, and do they match up well with other projects?
Did your system do a system restore, or recover from a problem?
Bedrich Hajek
Have you recently changed your user details, and do they match up well with other projects?
No.
Did your system do a system restore, or recover from a problem?
No.
The only unusual things I did were updating BOINC and NVIDIA, as I mentioned earlier; for NVIDIA, I only updated the drivers. Today I reinstalled the NVIDIA drivers with a clean installation, and afterwards I cleaned out the registry. Maybe this will do the trick. If it doesn't, I can go back to the earlier versions.