Message boards : Number crunching : Task status Running

Bill F
Joined: 21 Nov 16
Posts: 32
Credit: 144,838,375
RAC: 105,485
Level
Cys
Message 57452 - Posted: 4 Oct 2021 | 13:48:51 UTC

The following new task has a Running status, but the elapsed and remaining times are not updating (neither increasing nor decreasing), and the percentage complete is stuck at 35.520%.

Task 27080203 ACEMD 2.18 (cuda1121)

Application: New version of ACEMD 2.18 (cuda1121)
Name: e2s144_e1s731p0f642-ADRIA_AdB_KIXCMYB_HIP-1-2-RND5774
State: Running
Received: 9/30/2021 11:07:28 PM
Report deadline: 10/5/2021 11:07:30 PM
Resources: 0.986 CPUs + 1 NVIDIA GPU
Estimated computation size: 5,000,000 GFLOPs
CPU time: 16:36:51
CPU time since checkpoint: ---
Elapsed time: 19:07:15
Estimated time remaining: 1d 07:08:43
Fraction done: 35.520%
Virtual memory size: 0 bytes
Working set size: 0 bytes
Directory: slots/6
Process ID: 28096
Progress rate: 2.160% per hour
Executable: wrapper_6.1_windows_x86_64.exe

Do you have any suggestions?

Thank you
Bill F

____________
In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.


Ian&Steve C.
Joined: 21 Feb 20
Posts: 1078
Credit: 40,231,533,983
RAC: 24
Level
Trp
Message 57453 - Posted: 4 Oct 2021 | 14:02:01 UTC

Is it “stuck” or is it just running very slow?

These tasks are pretty long running. And a 1060 isn’t very powerful. I can believe it would take a couple days to complete. Check back in a few hours to see if the percentage increased a bit.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1626
Credit: 9,381,866,723
RAC: 19,056,016
Level
Tyr
Message 57454 - Posted: 4 Oct 2021 | 14:36:39 UTC - in response to Message 57453.

I had one like that this morning: WU 27081359

The key factor was that it was using zero CPU resources - zero CPU time for the run as a whole, zero CPU time since last checkpoint. I paused and restarted it, but no better. So I aborted it.
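
The check described here (a task that reports Running while its CPU time and progress do not move) can be sketched as a comparison of two snapshots of the task properties taken some time apart. This is only an illustration and not part of BOINC: the `TaskSnapshot` type and the `looks_stalled` helper are hypothetical names, and the values are copied by hand from the task properties posted earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSnapshot:
    """Values copied by hand from the BOINC Manager task properties (hypothetical type)."""
    state: str            # e.g. "Running"
    cpu_time_s: int       # "CPU time" converted to seconds
    fraction_done: float  # "Fraction done" as a percentage


def hms_to_seconds(text: str) -> int:
    """Convert an HH:MM:SS string such as '16:36:51' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def looks_stalled(earlier: TaskSnapshot, later: TaskSnapshot) -> bool:
    """True if the task claims to be running but neither CPU time nor progress moved."""
    return (
        later.state == "Running"
        and later.cpu_time_s == earlier.cpu_time_s
        and later.fraction_done == earlier.fraction_done
    )


# Two readings taken hours apart, both showing the same figures.
first = TaskSnapshot("Running", hms_to_seconds("16:36:51"), 35.520)
second = TaskSnapshot("Running", hms_to_seconds("16:36:51"), 35.520)
print(looks_stalled(first, second))
```

A slow but healthy task would show a higher CPU time in the second reading, so it would not be flagged.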

Bill F
Message 57472 - Posted: 5 Oct 2021 | 0:21:59 UTC - in response to Message 57453.

Eight hours later, the percentage complete, elapsed time, and time remaining are all unchanged.

The GPU is pretty old and slow but it has been very successful on GPUGRID ACEMD tasks in the past.

Suspending and resuming did not affect it.

Unless someone has a better suggestion, I will abort it later tonight.

Thanks
Bill F




Ian&Steve C.
Message 57474 - Posted: 5 Oct 2021 | 1:17:17 UTC

Is the GPU actually working? Can you see the utilization percentage? What about the CPU support percentage?

Bill F
Message 57475 - Posted: 5 Oct 2021 | 2:28:10 UTC - in response to Message 57474.

Here is a copy of the task properties with the task in Running status.

It looks like there is no GPU activity or CPU activity.

Application: New version of ACEMD 2.18 (cuda1121)
Name: e2s144_e1s731p0f642-ADRIA_AdB_KIXCMYB_HIP-1-2-RND5774
State: Running
Received: 9/30/2021 11:07:28 PM
Report deadline: 10/5/2021 11:07:30 PM
Resources: 0.986 CPUs + 1 NVIDIA GPU
Estimated computation size: 5,000,000 GFLOPs
CPU time: 16:36:51
CPU time since checkpoint: ---
Elapsed time: 19:07:15
Estimated time remaining: 1d 07:08:43
Fraction done: 35.520%
Virtual memory size: 0 bytes
Working set size: 0 bytes
Directory: slots/6
Process ID: 28096
Progress rate: 2.160% per hour
Executable: wrapper_6.1_windows_x86_64.exe

Ian&Steve C.
Message 57477 - Posted: 5 Oct 2021 | 2:49:55 UTC - in response to Message 57475.

I mean use a program like GPU-Z or HWiNFO to report the utilization percentage. Checking the running temperature is another way to tell whether it's doing anything.
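
On a machine with the NVIDIA driver installed, the same two readings (utilization and temperature) can usually also be taken from the command line with `nvidia-smi`, which ships with the driver. A minimal Python sketch, assuming `nvidia-smi` is on the PATH and reports both fields as plain numbers; the helper names, the idle threshold, and the sample line are illustrative only and are not output from the machine in this thread.

```python
import subprocess

# Ask nvidia-smi for GPU utilization (%) and temperature (C) as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line: str) -> tuple[int, int]:
    """Parse one 'utilization, temperature' CSV line into two integers."""
    utilization, temperature = (int(field.strip()) for field in line.split(","))
    return utilization, temperature


def read_gpus() -> list[tuple[int, int]]:
    """Run nvidia-smi and return (utilization %, temperature C) for each GPU."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]


def seems_idle(utilization: int, threshold: int = 5) -> bool:
    """Treat a GPU below an arbitrary threshold as doing no real work."""
    return utilization < threshold


# Made-up sample line for illustration: 0% utilization at 41 C.
print(parse_gpu_line("0, 41"))
```

Calling `read_gpus()` a few times over a minute and passing each utilization value to `seems_idle` gives a rough answer to whether the task is actually using the card.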

Bill F
Message 57495 - Posted: 6 Oct 2021 | 3:49:08 UTC - in response to Message 57477.

I can see nothing that indicates that any work is happening, even though the task says Running.

Bill F




Bill F
Message 57500 - Posted: 6 Oct 2021 | 13:48:29 UTC

UPDATE and resolution for the moment: I found that if I suspend all BOINC activity, wait about 45 seconds to allow any late process writes to complete, and then exit BOINC, the GPUGRID task resumes computation once I restart the BOINC client. Due to the lost days it will not make the deadline, but I will allow it to complete to validate my GPU and other hardware. Total run time is expected to be 2 days 2 hours on an NVIDIA 1060.
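
The suspend, wait, and exit sequence could also be scripted with `boinccmd`, the command-line tool that ships with BOINC. This is a sketch under assumptions, not a tested recipe: it assumes `boinccmd` is on the PATH and can reach the local client, the function names are hypothetical, and the 45-second pause is simply the figure from the post. Restarting the client is left as a manual step because how it is started depends on the installation.

```python
import subprocess
import time

WAIT_SECONDS = 45  # pause from the post, to let late writes finish


def shutdown_commands() -> list[list[str]]:
    """Commands that suspend CPU and GPU work, then ask the client to exit."""
    return [
        ["boinccmd", "--set_run_mode", "never"],
        ["boinccmd", "--set_gpu_mode", "never"],
        ["boinccmd", "--quit"],
    ]


def resume_commands() -> list[list[str]]:
    """Commands to run after the client is started again, to lift the suspension."""
    return [
        ["boinccmd", "--set_run_mode", "auto"],
        ["boinccmd", "--set_gpu_mode", "auto"],
    ]


def suspend_wait_quit(wait_seconds: int = WAIT_SECONDS) -> None:
    """Suspend, wait, then quit. Start the BOINC client by hand afterwards."""
    *suspend, quit_command = shutdown_commands()
    for command in suspend:
        subprocess.run(command, check=True)
    time.sleep(wait_seconds)
    subprocess.run(quit_command, check=True)


# Show the planned commands without running anything.
for command in shutdown_commands() + resume_commands():
    print(" ".join(command))
```

Because a run mode set without a duration stays in effect, the `resume_commands()` step matters: without it the client would likely come back up with computation still suspended.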

Thanks for your help and assistance.
Bill F


