GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07; Posts: 1957; Credit: 629,356; RAC: 0
Same as 6.44 for Linux, but it also reports the approximate total elapsed time.
gdf
---
Nice improvements! :)
CPU usage is 1% (when it uses the CPU at all). The workunit is 50% done and shows 8 seconds of CPU time; with the old app that would have been about 7 hours 20 minutes.
____________
Down with the Kredit Kops!!!
---
Is it still in beta testing or in production (and I guess a project reset is necessary)?
And to get such low CPU usage, are you using a pre-release of BOINC 6.3.11, or 6.3.10?
Thanks!
---
GDF
We are still in beta testing. You should not need a project reset if you were running the previous application.
A project reset is necessary for people that have download errors.
gdf
---
Thanks, I just downloaded a WU to be crunched with 6.45.
---
I tested 6.45; the display is slowed down a lot :( It is almost unusable on Vista 64 with a 9600 GT.
---
The app started fine and uses almost no CPU. If system responsiveness is reduced at all, it is by very little; I didn't notice a change. Running XP32 SP2, 9800GTX+, BOINC 6.3.10 and driver 177.92.
GDF, may I ask how you implemented this great new feature?
MrS
____________
Scanning for our furry friends since Jan 2002
---
I'm on my first 6.45 windows task.
CPU usage in Task Manager shows 2-6% instead of 50% (for one HT CPU).
After 30 minutes of run time, BOINC shows 1 m 34 s of CPU time.
This also means my CPU runs cooler: power draw drops by 34 W, going from one BOINC task occupying a CPU core to one GPU task using only a little CPU. Every little bit helps.
This is a very nice improvement :)
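The thread never says how 6.45 cut its CPU use, so the sketch below is not the app's code. It is a generic illustration, written in Python rather than CUDA and built on an assumption: a host thread that polls in a loop while waiting for the GPU burns a whole core, while one that blocks until it is signalled uses almost none. `time.process_time()` reports CPU time rather than wall time, which separates the two cases; the event is a hypothetical stand-in for a "GPU step finished" notification.

```python
import threading
import time


def busy_wait(seconds: float) -> float:
    """Poll the clock in a tight loop for `seconds`; return the CPU time burned."""
    cpu_start = time.process_time()
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass  # spinning keeps a core at 100% although no useful work is done
    return time.process_time() - cpu_start


def blocking_wait(seconds: float) -> float:
    """Block on an event for up to `seconds`; return the CPU time used."""
    step_done = threading.Event()  # hypothetical stand-in for "GPU step finished"
    cpu_start = time.process_time()
    step_done.wait(timeout=seconds)  # the OS parks the thread until timeout/signal
    return time.process_time() - cpu_start


if __name__ == "__main__":
    print(f"busy-wait:     {busy_wait(0.5):.3f} s CPU for 0.5 s of waiting")
    print(f"blocking wait: {blocking_wait(0.5):.3f} s CPU for 0.5 s of waiting")
```

On an otherwise idle machine the first figure should be close to the full 0.5 s and the second close to zero, which is the same kind of gap as the 50% versus 2-6% Task Manager readings above.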
---
One thing to note: when the first 6.45 started, Windows XP SP3 popped up a security window, and I had to unblock the application. I never had to do that before with any app. SP3 was recently installed and this is the first app version change since then, so it may have changed some setting. Anyone running this on an unattended machine should watch out for this.
---
Is it still in beta testing or in production (and I guess a project reset is necessary)?
And to get such low CPU usage, are you using a pre-release of BOINC 6.3.11, or 6.3.10?
Thanks!
See the front-page section "Join with your Nvidia graphics card (beta)".
The client links in that section indicate which client to use (hover over a link to see the download URL; the version number is part of the URL).
---
OK, the new app is running well and uses almost no CPU. So when will we be able to run WUs from 3 distributed computing projects: 1 on the GPU and 2 on a dual-core CPU?
---
My 1st v6.45 workunit finished ok.
It shows a total CPU time of 16.78 seconds, and the task details say approx. 53,211 seconds.
---
MJH (Project administrator, Project developer, Project scientist)
Joined: 12 Nov 07; Posts: 696; Credit: 27,266,655; RAC: 0
One thing to note: when the first 6.45 started, Windows XP SP3 popped up a security window, and I had to unblock the application.
Keith, do you recall what the warning said?
Matt
---
Keith, do you recall what the warning said?
Probably the standard execution-prevention dialog, which tells you that MS doesn't know where this .exe comes from and asks whether you really want to execute it.
@Rabinovitch: when BOINC 6.3.11 is released. Currently there's no definite ETA, but it shouldn't be too long (as long as you're not too impatient ;)
MrS
---
Oh, and judging by NaRyan's result, there currently seem to be 850,000 steps in each WU?
MrS
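The 850,000-step figure can be sanity-checked from the "Time per step" and "Approximate elapsed time" lines that 6.45 prints; three such pairs are posted later in this thread. Assuming elapsed time is simply steps × time per step, dividing one by the other recovers the step count. `implied_steps` is a hypothetical helper name.

```python
def implied_steps(elapsed_s: float, ms_per_step: float) -> float:
    """Steps implied by total elapsed time and average time per step."""
    return elapsed_s / (ms_per_step / 1000.0)


# ("Time per step" in ms, "Approximate elapsed time" in s) pairs printed by 6.45.
reports = [(32.684, 27781.558), (44.760, 38046.248), (33.262, 28272.461)]
for ms, elapsed in reports:
    print(f"{ms:.3f} ms/step, {elapsed:.0f} s -> {implied_steps(elapsed, ms):,.0f} steps")
```

Each pair lands within a few steps of 850,000; the small residue comes from the ms value being printed to only three decimals.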
---
Probably the standard execution-prevention dialog, which tells you that MS doesn't know where this .exe comes from and asks whether you really want to execute it.
MrS
Not really. On Vista 64 I had the same thing, and it was about letting the app communicate with the Internet.
---
One thing to note: when the first 6.45 started, Windows XP SP3 popped up a security window, and I had to unblock the application.
Keith, do you recall what the warning said?
Matt
Sorry, no; who reads those things anymore? It was just the standard Windows Firewall security block. I clicked the Unblock button since I trust you guys.
I see now that acemd_6.45... is on the exceptions list. Like I said, it never did this before, so something in Windows changed. As a matter of fact, it's the only application of its type on the list; I do have BOINC and the client in there. Unless the firewall had been turned off before and the update turned it back on?
---
OK, the new app is running well and uses almost no CPU. So when will we be able to run WUs from 3 distributed computing projects: 1 on the GPU and 2 on a dual-core CPU?
When the developer at Berkeley who does the builds gets back from vacation. That will be 6.3.11. Until he returns and decides it is time for another release, we just have to wait. I do not know when he will be back: another week or two, maybe more, maybe less.
---
My 1st v6.45 workunit finished ok.
It shows a total CPU time of 16.78 seconds, and the task details say approx. 53,211 seconds.
Is this time correct, 16 seconds?
My first 6.45 task is now 38% done and has used 24 minutes 42 seconds of CPU time (1482 seconds). Task Manager shows the same minutes and seconds. That projects to approx. 3900 seconds in total when done.
His total time will be only about 2000 seconds less than my usual 55,000 seconds, so the GPU times are not that different.
Why the big difference in CPU times???
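The "approx 3900 seconds" projection is a linear extrapolation: CPU time so far divided by the fraction done. A minimal sketch, assuming CPU time accrues evenly over the task (later posts suggest it does not, since it rises with interactive use); `projected_total` is a hypothetical name.

```python
def projected_total(cpu_so_far_s: float, fraction_done: float) -> float:
    """Linearly extrapolate total CPU time from the CPU time used so far.

    Assumes CPU time accrues evenly over the task, which is only an
    approximation if interactive use changes the CPU load part-way through.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return cpu_so_far_s / fraction_done


cpu_so_far = 24 * 60 + 42  # 24 min 42 s of CPU time = 1482 s
print(f"{projected_total(cpu_so_far, 0.38):.0f} s")  # 1482 s / 0.38 = 3900 s
```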
---
I'm at 4:37 min of CPU time at 58.5% done on a Q6600 @ 3 GHz. I noticed that initially (the first 1-1.5 h) the CPU usage was very low, on the order of a few seconds, but when I started to use the machine it seemed to go up.
MrS
---
With the difference in CPU times, are you using an Intel CPU?
I noticed that CPU time on my Q6600 is higher than on my AMD dual-core.
AMD = 16 Seconds
Intel = 47 Minutes 32 Seconds
And the 2nd workunit on the AMD is heading the same way.
The 2nd workunit on the Intel is at 3% yet it shows the same cpu time (4 seconds) as the AMD system that is at 42%.
Perhaps the new app works better on AMD?
---
Mine is only an Intel P4(HT) at 3.8GHz on Windows XP (32bit). I've been using it lightly off and on all day.
I'm up to 35:52 at 54% done.
I can understand some difference between P4s and quad cores, or Linux and Windows,
but this is still a big difference.
Why would an AMD show so much less usage?
Even if it were 50% less I might understand, but this is about 99.6% less.
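The "99.6% less" can be reproduced from the numbers in this thread: NaRyan's finished AMD result used 16.78 s of CPU, while 35:52 at 54% done on the P4 extrapolates (linearly, an assumption) to roughly 3985 s. `percent_less` is a hypothetical helper.

```python
def percent_less(small: float, large: float) -> float:
    """How much smaller `small` is than `large`, as a percentage of `large`."""
    return 100.0 * (1.0 - small / large)


amd_cpu_s = 16.78                          # finished AMD result, CPU seconds
p4_projected_s = (35 * 60 + 52) / 0.54     # 35:52 at 54% done, extrapolated linearly
print(f"P4 projected: {p4_projected_s:.0f} s")
print(f"AMD uses {percent_less(amd_cpu_s, p4_projected_s):.1f}% less CPU time")
```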
---
Well, if the 1st one puzzles you, then the 2nd workunit at 9.53 seconds is going to add even more confusion ;)
---
JKuehl2
Joined: 18 Jul 08; Posts: 33; Credit: 3,233,174; RAC: 0
Application 6.45 under Windows Vista 64 is much slower (about 56,000 seconds; the old one took about 30,000 seconds) and uses much less of the GPU.
WU: http://www.ps3grid.net/workunit.php?wuid=44433
Only 74 degrees with a fan speed of 700 RPM; the old client (which used one CPU core) ran at 83 degrees and 1000 RPM.
Is a new version coming that addresses this issue?
---
This is the version that fixes the CPU usage.
Since all the work is done on the GPU, the CPU is free to do other things.
Have a look at the CPU usage on Windows now: it uses about 1% of the CPU.
And since all the work is done on the GPU, GPU time is what matters.
Open one of your tasks (by task ID) and you'll see that info.
---
JKuehl2
I am aware that the CPU core is freed now, but I just wanted to point out that the GPU is not fully utilized by the new client.
Otherwise the calculation would generate more heat and load on the GPU core, as with the old client, even if the simulation parameters had changed and caused the increase in computation time.
As stated in my previous post, the WU took 56,002 seconds of GPU time. The old client took about 30,000 seconds (GPU time AND CPU time: since one core was used at 100%, we can assume the GPU time used was proportional to the CPU time used).
Judging by the calculation time of its first 10%, the currently running WU will also finish after around 56k seconds. GPU utilization remains low.
---
GDF
In our tests, the application's speed was not affected by the better utilization of the CPU core. Let's run another WU and see. Also, you have far too many compute errors on your machine; you should expect zero errors.
gdf
---
JKuehl2
Those compute errors came from a) using another CUDA DLL (trying to speed up the computation) and b) a faulty power supply in my computer. As you can see, the WUs from a week ago all completed successfully (old app, working power supply ;-)
Don't get me wrong: 5000 credits a day on top of CPU-crunched WUs is still a lot, but I don't think the client makes full use of the GPU's capabilities anymore.
Well, I'll give it another chance for a few more days and hope it's only an issue with the current workunits.
Edit:
I just checked with a fellow from my team. Same OS (Vista x64), but his setup with a 9800 GTX has a) MUCH lower CPU usage and b) MUCH faster computation on the GPU (about 46-47 ms per step).
His results:
http://www.ps3grid.net/result.php?resultid=58207
http://www.ps3grid.net/result.php?resultid=57930
What could be the reason? Nothing changed on my system except the new client.
---
JKuehl2
I FOUND THE PROBLEM
After rebooting the machine, Windows Firewall popped up "acemd_6.45_windows_intelx86__cuda.exe blocked...allow?"
I allowed the process, and voilà: GPU utilization went up, temperature went up, and the (estimated) computation time went down to 30,000-31,000 seconds.
CPU utilization also dropped from about 3/4 of an hour per WU to almost nothing.
VERY strange!
Does anyone know about this issue?
---
Overall I'm impressed with this near-zero CPU utilization release! Too many obvious good points to list.
However, I have one concern.
My scheduler got messed up!
To clarify:
Before this release, I gave ps3grid 25% of my resource share (more than other projects), another 25% went to burp.boinc.dk, and the rest of my tens of projects each got less than 3%.
This made ps3grid run nonstop, 24/7.
Now the CPU time has dropped from 7 hours to 12 minutes, and each time a ps3grid task finishes, the scheduler decides the remaining ps3grid tasks in the queue can wait, because it thinks they take 12 minutes to complete rather than 7 hours.
Even if I increase ps3grid's resource share further, the scheduler still lowers its priority because of this 12-minute CPU completion time!
Any workarounds?
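The post does not show how BOINC 6.3.x computes its estimates or priorities, so the following is only an assumption for illustration: if a task's expected length were projected from CPU seconds and fraction done, the 6.45 app's ~12 minutes of CPU would make a ps3grid task look 35 times shorter than the old app's ~7 hours, even though the GPU still needs many hours. `cpu_based_total` is a hypothetical estimator, not BOINC code.

```python
def cpu_based_total(cpu_s_so_far: float, fraction_done: float) -> float:
    """Hypothetical estimator: projected task length from CPU seconds alone."""
    return cpu_s_so_far / fraction_done


# Half-way through a task: old app ~3.5 h of CPU, 6.45 ~6 min of CPU.
old_total = cpu_based_total(3.5 * 3600, 0.5)  # ~7 h, as reported for the old app
new_total = cpu_based_total(6 * 60, 0.5)      # ~12 min, as reported for 6.45
print(f"old: {old_total / 3600:.1f} h, new: {new_total / 60:.1f} min, "
      f"ratio {old_total / new_total:.0f}x")
```

Any estimator of this shape inherits the same bias, which is why the problem goes away only if the estimate is based on elapsed (wall-clock) time instead.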
---
It seems I'm much slower with the new app. Previously I had ~44,000 s (about 12 h). My first 6.45 WU came in at 60,000 s / 70 ms per step, but I was gaming for 1 or 2 (probably 2 :p) hours, which may have skewed the time. The 2nd one was 53,100 s / 62 ms per step, and the machine was almost undisturbed. I checked the wall-clock time between the start and finish of the WU: it's 14:46 h, which agrees very well with the time reported by the app. So this speed decrease is real!
And my room temperature dropped ~1 °C but the GPU's ~5 °C, so I think it's safe to say the GPU is used less now.
My config: XP32 SP2, firewall deactivated, BOINC 6.3.10, driver 177.92, 9800GTX+ without OC. I rebooted last weekend but should probably do it again, just to be sure. Oh, and CPU utilization was very low for the 2nd result, just 14.9 s.
MrS
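The wall-clock cross-check above is easy to reproduce: 53,100 s is 14 h 45 min, within a minute of the 14:46 h measured, and about 21% longer than the old ~44,000 s. Dividing by the 850,000 steps per WU estimated earlier in the thread (an assumption here, and not assumed for the old app) gives about 62.5 ms per step, matching the reported 62 ms. The helper names are hypothetical.

```python
STEPS_PER_WU = 850_000  # assumption: per-WU step estimate from earlier in the thread


def as_hms(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS."""
    hours, rem = divmod(round(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def ms_per_step(elapsed_s: float, steps: int = STEPS_PER_WU) -> float:
    """Average milliseconds per step, if the WU really has `steps` steps."""
    return 1000.0 * elapsed_s / steps


old_s, new_s = 44_000, 53_100
print(f"old app: {as_hms(old_s)}, 6.45: {as_hms(new_s)} ({new_s / old_s - 1:.0%} longer)")
print(f"6.45: {ms_per_step(new_s):.1f} ms/step")
```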
---
JKuehl2
Computation time increased once again; it seems to be a problem with the Windows scheduler.
Even though one processor core (of the 4 in my quad) is unused, as soon as I start another task that takes one core (for example a single-threaded MP3 encoder), the PS3Grid application's performance drops again, with the same significant drop in GPU temperature seen earlier. After I exit the encoder, GPU utilization goes up again.
Manually raising the process priority, even to "highest" or "real time", doesn't help.
Here's an image: the temperature increases again when the MP3 encoder is closed.
---
I also tried to increase the priority of the CPU part. However, I couldn't: "access denied".
MrS
---
GDF
Tonight we will upload a new Windows application that removes the firewall issue, in the hope that this is the cause of the slower application (I doubt it).
gdf
---
It appears to me 6.45 is slower.
Home #0 (took 55,000 s before):
Result 58254: CPU 3723 s, GPU 58289 s, 68 ms/step
Result 58428: CPU 3436 s, GPU 59270 s, 69 ms/step
Work #1 (took 59,000-60,000 s before):
Result 58017: CPU 285 s, GPU 68463 s, 80 ms/step
Result 58579: CPU 203 s, GPU: error (see below)
Work #2 (took the same as #1):
Result 58264: CPU 3456 s, GPU 61631 s, 72 ms/step
Work #1 and #2 have the same GPU and are only a little slower than Home #0. Apart from the GPU slowdown, their times fall in an understandable range. The same goes for CPU time, except for 285 s (4 min) vs 3456 s (57 min)? These two hosts are nearly identical in hardware and software; one may have a few extra applications.
This is the same question I asked before when another user reported 16 s vs my estimated time; the actual times above are 62 min and 57 min (about an hour).
Why this wild difference in CPU times?
I would expect work #1 and work #2 to be nearly identical to each other, and only a little longer than home #0, since they are a little slower.
---
Also, having to drill down into each work unit to get the GPU time is a nuisance. It takes a lot of extra effort to see what is going on, especially with multiple hosts during testing. It would be nice to have the GPU time, and maybe the ms per step, in the results table next to the CPU time. I know this will take a custom hack since it's not part of BOINC's default web layout, but give it some thought.
My other thought was to have the app version number in that table, especially during all this testing, so you can tell which results come from older apps and easily compare run times across app versions.
I'll add these suggestions on the wish list.
---
Error:
<core_client_version>6.3.10</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 8800 GT"
# Clock rate: 1512000 kilohertz
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [frc_sum_kernel_dihed] failed in file 'force.cu' in line 568 : unspecified launch failure.
</stderr_txt>
]]>
No credit and no GPU time are listed, so I do not know how far along the task was when the error occurred.
---
MJH
Keith, was the Work #2 machine running anything else, or being used interactively whilst it was crunching?
Matt
---
About the CPU time difference:
The machine with the 16-second time (9 seconds at the lowest) does nothing but crunch.
So there is no activity on the screen for it to update.
Looking at Task Manager on my main PC, I have noticed that the PS3GRID app sits at 0% CPU load, but when something happens on screen the CPU load increases.
That would explain why my AMD has such a low CPU time: since I never use that computer for anything (apart from BOINC), the screen hardly ever changes, resulting in 9 seconds of CPU time. My Intel is the PC I use the most, so its screen is updated quite a lot (web pages, MSN, video, etc.), and it ends up with the higher CPU time of 44 minutes.
And yes, 6.45 is slower.
Looking at my BoincView log files, there used to be just under a 12-hour gap between workunits; now it has gone up to 14.5 hours :(
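To put the 12 h to 14.5 h change in throughput terms: each WU takes about 21% longer, which means about 17% fewer WUs per day from the same GPU. Treating "just under 12 hours" as 12 h is a rounding assumption; `wus_per_day` is a hypothetical helper.

```python
def wus_per_day(hours_per_wu: float) -> float:
    """Workunits completed per day at a given gap between workunits."""
    return 24.0 / hours_per_wu


before = wus_per_day(12.0)  # "just under a 12 hour gap" with the old app
after = wus_per_day(14.5)   # with 6.45
print(f"{before:.2f} -> {after:.2f} WUs/day ({1 - after / before:.1%} fewer)")
```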
---
JKuehl2
Especially to GDF:
The increase in computation time did not result from the firewall; that was a false positive on my part, sorry. Here is the true reason (at least, many others could reproduce it on their computers). Maybe it's an issue with the Windows scheduler (maybe only on Vista; I don't know yet whether it also happens on XP, I will look into it).
http://www.ps3grid.net/forum_thread.php?id=371&nowrap=true#2387
---
To add my 2 cents: I also see a slowdown with 6.45 on a 9800 GT under XP32 (P4 HT 3.00 GHz):
Previously, with 6.43: around 57k seconds (example1)
Now, with 6.45: around 66k seconds:
- example2
- example3
In the meantime I slightly increased the overclock, so the real loss at a fixed set of frequencies is somewhat higher.
Hope this helps.
McRoger
Edit: tests were done on a machine 100% dedicated to GPUGRID.
---
Keith, was the Work #2 machine running anything else, or being used interactively whilst it was crunching?
Matt
Yes, I use both work #1 and work #2 at the same time during an 8-hour day; work #2 is sometimes used slightly less and sometimes slightly more. Basically both get about the same use.
There is something running at home and on work #2 that is not on work #1, but I do not yet know what it is. Those two show the much higher CPU times.
Except that at home the first task was processed with only 1 CPU running BOINC, not both, so at least 41% of the system was idle.
It's hard to figure out what the cause may be.
---
Kokomiko
Joined: 18 Jul 08; Posts: 190; Credit: 24,093,690; RAC: 0
The new Windows application has shown the same percentage for hours, and the time to completion keeps rising. 2 hours ago it showed 97.294% with 4:08 to completion; now it shows 97.294% with 4:25 to completion. The displayed CPU time is 2:30:38, and Task Manager shows 3% for the core. It looks like there is no progress. The task has been running since 8:44 UTC and it's now 17:34 UTC; that's far too long for my GTX280.
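A stalled task like this one can be spotted by sampling the fraction done over time and flagging it if it has not advanced over a long enough window. The thread does not show BOINC offering such a check, so this is a hypothetical watchdog; the class and function names, the one-hour window, and the sample times (8:44 to 17:34 UTC is about 8.8 h, the earlier reading 2 h before that) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProgressSample:
    hours_running: float  # wall-clock hours since the task started
    fraction_done: float  # 0.0 .. 1.0, as shown in the BOINC Manager


def looks_stalled(samples: List[ProgressSample], min_window_h: float = 1.0) -> bool:
    """Flag a task whose progress has not advanced over at least `min_window_h` hours."""
    if len(samples) < 2:
        return False
    first, last = samples[0], samples[-1]
    long_enough = last.hours_running - first.hours_running >= min_window_h
    return long_enough and last.fraction_done <= first.fraction_done


# Kokomiko's task: still at 97.294% two hours apart.
kokomiko = [ProgressSample(6.83, 0.97294), ProgressSample(8.83, 0.97294)]
print(looks_stalled(kokomiko))
```

Comparing only the first and last samples keeps the sketch short; a real watchdog would probably look at the maximum progress within the window.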
---
Kokomiko: it looks like your task hangs for some strange reason. Did you restart BOINC (not just the Manager)?
@CPU usage: since my machine shows both (a few seconds or several minutes), I can clearly correlate the times with interactive usage.
@lower performance: I can clearly say it's not the firewall, since I have it turned off and have no other firewalls running, except my hardware router.
NaRyan wrote: Looking at my BoincView log files, there used to be just under a 12-hour gap between workunits; now it has gone up to 14.5 hours :(
That's the same magnitude of slow-down which I am seeing on XP32.
MrS
---
Same problem here: my last 3 work units are slower than normal.
Time per step: 32.684 ms
Approximate elapsed time for entire WU: 27781.558 s
Time per step: 44.760 ms
Approximate elapsed time for entire WU: 38046.248 s
Time per step: 33.262 ms
Approximate elapsed time for entire WU: 28272.461 s
My average used to be 26,000s plus or minus 300.
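Relative to the old 26,000 s average, the three runs above are about 7%, 46% and 9% slower, and even the fastest is outside the stated ±300 s spread. `slowdown_pct` is a hypothetical helper.

```python
def slowdown_pct(elapsed_s: float, baseline_s: float) -> float:
    """Percent longer than the baseline run time (positive means slower)."""
    return 100.0 * (elapsed_s / baseline_s - 1.0)


BASELINE_S = 26_000.0  # previous average, "plus or minus 300"
recent = [27781.558, 38046.248, 28272.461]
for t in recent:
    print(f"{t:>10.0f} s  {slowdown_pct(t, BASELINE_S):+5.1f} %")
```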
---
Kokomiko
Kokomiko: it looks like your task hangs for some strange reason. Did you restart BOINC (not just the Manager)?
Yes, I restarted. Now the running time is 2:37:29, the percentage is still 97.294%, and the time to completion is now 4:36. I'll wait one more hour. Should I reset the project? With this behavior I'd get more credits on MilkyWay with this idle core :D
---
I'd either
1. abort the current WU and see how the next one does
or
2. switch the project to "no new work", abort the current WUs, update the project and then reset it
I'd prefer the 1st option, but if it's OK with your daily quota the 2nd is the more secure one.
MrS
---
Kokomiko
I'm not the only cruncher with problems on this WU; I don't know why.
http://www.gpugrid.net/workunit.php?wuid=42214
I aborted the task, and the next one is running fine. I'll keep an eye on it and hope it runs better.
12 hours for the birds ... :(
---
12 hours for the birds ... :(
Never mind, it's beta after all! The devs will surely notice this WU and may find out something useful.
MrS
---
Kokomiko
... and the next WU is broken. Looks like the GTX280 doesn't like 6.45 ... :(
WU broken again
... and again I'm not alone with an error on this WU ...
---
sigma-7
Joined: 27 Aug 08; Posts: 3; Credit: 1,608,041; RAC: 0
Application 6.45 is way too slow on a GTX280.
It went from 7 hours per WU to 9 hours :(
---
... and the next WU is broken. Looks like the GTX280 doesn't like 6.45 ... :(
WU broken again
... and again I'm not alone with an error on this WU ...
Since last Sunday most of my WUs have been broken. Only 1 WU finished successfully. The situation is awful; I'm becoming desperate.
Here are my tasks:
http://www.ps3grid.net/results.php?userid=5402
I went back with the driver from 177.98 to 177.84, and I detached PS3GRID in BOINC. What else can I do?
I have a Gainward 9800GTX+ without OC.
Some of the errors:
<core_client_version>6.3.10</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9800 GTX/9800 GTX+"
# Clock rate: 1836000 kilohertz
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [frc_sum_kernel_angle] failed in file 'force.cu' in line 539 : unknown error.
</stderr_txt>
]]>
Or this one:
<core_client_version>6.3.10</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9800 GTX/9800 GTX+"
# Clock rate: 1836000 kilohertz
MDIO ERROR: cannot open file "restart.coor"
</stderr_txt>
]]>
Or another error:
<core_client_version>6.3.10</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9800 GTX/9800 GTX+"
# Clock rate: 1836000 kilohertz
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [frc_sum_kernel_bond] failed in file 'force.cu' in line 553 : unknown error.
</stderr_txt>
]]>
---
Kokomiko
... next broken WU on a GTX280:
Log file (10:00 CEST = 8:00 UTC):
17.09.2008 10:01:09|PS3GRID|Computation for task pY11683-GPUTEST3-0-10-acemd_0 finished
17.09.2008 10:01:09|PS3GRID|Output file pY11683-GPUTEST3-0-10-acemd_0_1 for task pY11683-GPUTEST3-0-10-acemd_0 absent
17.09.2008 10:01:09|PS3GRID|Output file pY11683-GPUTEST3-0-10-acemd_0_2 for task pY11683-GPUTEST3-0-10-acemd_0 absent
17.09.2008 10:01:09|PS3GRID|Output file pY11683-GPUTEST3-0-10-acemd_0_3 for task pY11683-GPUTEST3-0-10-acemd_0 absent
17.09.2008 10:01:11|PS3GRID|Started upload of pY11683-GPUTEST3-0-10-acemd_0_0
17.09.2008 10:01:15|PS3GRID|Finished upload of pY11683-GPUTEST3-0-10-acemd_0_0
Link to WU
<core_client_version>6.3.10</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 1296000 kilohertz
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [frc_sum_kernel_impr] failed in file 'force.cu' in line 583 : unknown error.
</stderr_txt>
]]>
Same problem: File restart.coor is missing.
Note: since 6.45 started running, the PC (Vista 64-bit) has rebooted 3 times without any hint in the log file.
BTW: the other machine (also Vista 64-bit) with the 8800GT runs without problems.
---
All my broken WUs give the reason "Unzulässige Funktion", or in English "unsupported function", and I think it is a problem in the 6.45 code. Could you locate the error in your code?
---
These results from 6.45 still confuse me:
Work #1: driver 177.84, 8800GT 600 MHz 512 MB, P4-HT 3.0 GHz, 2 GB, Windows XP SP3
Task ID   CPU time   GPU time   ms/step
58017     285 s      68463 s    80.545
58579     203 s      error
59231     623 s      68343 s    80.404

Work #2: driver 177.84, 8800GT 600 MHz 512 MB, P4-HT 3.0 GHz, 2 GB, Windows XP SP3
Task ID   CPU time   GPU time   ms/step
58264     3456 s     61631 s    72.508
59024     3540 s     61432 s    72.274
59183     3513 s     61384 s    72.217

Home #1: driver 177.92, 8800GT 640 MHz 512 MB, Intel P4-HT 3.8 GHz, 2 GB, Windows XP SP3 Media Center
Task ID   CPU time   GPU time   ms/step
58254     3723 s     58289 s    68.576
58428     3436 s     59270 s    69.730
58997     3545 s     59464 s    69.958
59533     3703 s     60286 s    70.925
Home #1 is faster on the GPU because it has a slightly higher GPU clock (a factory-overclocked model). All three cards are the same brand. This matches previous apps, where it was faster by about 2000 s, which it still is compared to work #2.
Home #1 still has CPU times similar to work #2. I cannot find any running software that is common to home #1 and work #2 only; most software is common to all three, or to work #1 and work #2.
About the only common element between home #1 and work #2 is MS SQL Server Service Manager; I believe this is what they call SQL Express. Home runs Windows Media Center Edition, which came with it installed; I think it would only be in use when running Media Center, which I don't do on this system. On work #2 it was installed by one piece of software that is not in use. I do not know what else really uses it, so basically it sits dormant on both. I do not really see how it would interfere.
Now, looking at the times: why is CPU time lower and GPU time higher on work #1 than on work #2? These two computers have identical hardware, CPU speed, GPUs and daily usage. Basically I use them 8 hours a day, then they are untouched for 16 hours. At home this computer gets maybe 5 hours of use at most. So if using the GPU for video slows down the CPU and the app, why isn't this seen across all hosts? Certainly one task would be running while I'm using a host, but another would run undisturbed while I'm not using it. So if, as some have suggested, GPU use by other programs causes the slowdown, each host should show both fast and slow CPU times.
Work #1 and #2 get about the same usage: they sit unused with screens on standby for 16 hours, then both are awake for about 8 hours. I use work #1 slightly more, as it is my primary machine: I do email, document scanning and some of our daily operations and invoicing on it, and it has the printers attached. On work #2 I primarily do invoicing for shipments, while I read email on #1 and process the shipment in UPS software on #3. Work #2 is also used for some Internet browsing while I enter the information viewed there on #1. Yet work #1 shows lower CPU times than work #2.
---
Do you have a screensaver activated, or simply a "blank screen"?
Are the cards "really" identical? For GPUgrid, the SP (shader) frequency matters more than the GPU core frequency.
You can check that with GPU-Z.
edit: spelling
---
...
Same problem: File restart.coor is missing.
...
That's not a problem, restart.coor is the checkpoint file. It only means there was no checkpoint file because it ran from the beginning until it errored out.
But please don't ask me what causes the other errors... ;)
____________
pixelicious.at - my little photoblog |
|
|
|
Do you have a screensaver activated, or simply a "blank screen"?
Are the cards "really" identical? The SP (shader) frequency matters more than the GPU core frequency for GPUgrid.
You can check that with GPU-Z.
edit: spelling
No screensaver (none); Windows powers down the monitor after 15 minutes of non-use.
Yes, work #1 and work #2 are identical: bought at the same time, same brand, same 600MHz core frequency, 512MB at 900MHz, stock settings, no tinkering or overclocking, and the same driver installed from the same download. Yes, they have the same number of shaders, 112 at 1500MHz. Same everything. The CPUs are the same, with the same frequency and memory in each host, and the same hard drives. Only some small software differences.
Yes, I already have GPU-Z. I just ran it again, and all the info reported on both is identical, down to the BIOS version. I do not have a way at the moment to upload pictures (screenshots); I'll work on that and post those later. |
|
|
|
In all my broken WUs I have the reason "Unzulässige Funktion", or in English "unsupported function"
Wolfram, did you reboot your machine? Maybe also try a project reset, in case some file on your HDD became corrupted.
And btw, "Unzulässige Funktion" should be more like "invalid function" ;)
MrS
|
|
|
@Keith: your situation puzzles me as well.
Your data may suggest a correlation: high cpu usage - lower computation time. I checked with my WUs: 3 of them have 62.5 ms/step and CPU usages of 14, 14 and 291s. One has CPU 851 and "only" 60.1 ms/step. That's not really a clear-cut picture either.
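For a rough number on that impression, here is a minimal Python sketch computing a Pearson correlation coefficient over the four WUs just quoted (the `pearson` helper is written out by hand for illustration, not taken from any project tool):

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# CPU time (s) and ms/step of the four WUs quoted above
cpu_seconds = [14, 14, 291, 851]
ms_per_step = [62.5, 62.5, 62.5, 60.1]

r = pearson(cpu_seconds, ms_per_step)
print(f"r = {r:.2f}")  # negative: more CPU time goes with fewer ms/step
```

With only four points, three of which share the same 62.5 ms/step, the coefficient is driven almost entirely by the single 851 s WU, so a strongly negative value here says very little on its own.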
MrS
|
|
Kokomiko
Joined: 18 Jul 08 Posts: 190 Credit: 24,093,690 RAC: 0 Level
|
Looks like I have solved the problem for my machine; I don't know if it is relevant for others.
I use a Gigabyte MA790FX-DS5 board, a Phenom 9950 BE with 4 x 2 GB OCZ RAM 1066 and a GTX280 from XFX. My RAM was running in DUAL and UNGANGED mode. Since I switched to GANGED mode, the PC has kept running without crashing the PS3Grid WU with 6.45. Maybe there is a problem with simultaneous 2 x 64-bit RAM access by different cores in UNGANGED mode, so I now use the 1 x 128-bit access of GANGED mode.
|
|
|
|
In all my broken WUs I have the reason "Unzulässige Funktion", or in English "unsupported function"
Wolfram, did you reboot your machine? Maybe also try a project reset, in case some file on your HDD became corrupted.
And btw, "Unzulässige Funktion" should be more like "invalid function" ;)
MrS
Thank you for your help. Yes, I rebooted and reset the project. I had one option in the cc_config.xml file; I have now deleted this file, rebooted, and started a new WU nearly 3 hours ago, which is now 23.7% finished. I hope I have a good night.
Thx again for your help. |
|
|
|
Looks like I have solved the problem for my machine... Since I switched to GANGED mode, the PC is still running without crashing the PS3Grid-WU with the 6.45
Although it's possible, it seems very random. I would wait a few more WUs to see if it's stable, then switch back to unganged. I wouldn't be surprised if it was stable as well ;)
MrS
|
|
|
@Keith: your situation puzzles me as well.
Your data may suggest a correlation: high cpu usage - lower computation time. I checked with my WUs: 3 of them have 62.5 ms/step and CPU usages of 14, 14 and 291s. One has CPU 851 and "only" 60.1 ms/step. That's not really a clear-cut picture either.
MrS
Previous tests did not have this large a margin.
6.44 and earlier all had similar results. Work 1 and Work 2 ran pretty evenly, or close to it. There would be some difference, but always within a small range of the average. My usage has not changed much between versions; what I did then I still do now, and I do not remember changing any other software between versions. So, all things being equal, work 1 and work 2 should still have similar run times. I can accept that 6.45 is a little slower, but the margin between the two makes no sense when there was none before.
Average CPU seconds for all completed work with 6.44 and earlier versions:
Home = 55,900s
Work 1 = 59,254s
Work 2 = 59,124s
There was a difference of 2 minutes 10.42 seconds for work 1 vs work 2
There was a difference of 53-55 minutes for home to work.
Now with 6.45
Home = 59,327s
Work 1 = 68,403s
Work 2 = 61,482s
OK, so Home went up by 3,426s (57m).
I would expect work 1 and 2 to go up by about the same margin, allowing a little for the speed difference. In past versions, when the version changed, that is what I observed: an equal increase or decrease.
Now with 6.45
Work 1 went up 9148s (152m)
Work 2 went up 2350s (39m)
152m and 39m are not the same or even close. There is now a gap of about 115m between these two averages (68,403s vs 61,482s), vs the previous 2m difference.
Neither increase is close to the 57m increase on my home system; one is much more and one is less.
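The averages above can be re-checked with a few lines of Python. This is only a sketch using the posted averages, so any rounding in them carries through:

```python
# Average CPU seconds per WU, as posted above
BEFORE = {"Home": 55_900, "Work 1": 59_254, "Work 2": 59_124}  # 6.44 and earlier
AFTER = {"Home": 59_327, "Work 1": 68_403, "Work 2": 61_482}   # 6.45

# Increase per host, in minutes
increase_min = {host: (AFTER[host] - BEFORE[host]) / 60 for host in BEFORE}

# Gap between the two identical work hosts, in minutes
gap_before = (BEFORE["Work 1"] - BEFORE["Work 2"]) / 60
gap_after = (AFTER["Work 1"] - AFTER["Work 2"]) / 60

for host, minutes in increase_min.items():
    print(f"{host}: +{minutes:.0f} min")
print(f"Work 1 vs Work 2: {gap_before:.0f} min before, {gap_after:.0f} min with 6.45")
```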
[SCRATCHES HEAD]
[EDIT]
Another thing: if I add up all the run times for the first 36 tasks (a 30-day period) on work 1 and work 2, the totals, 2,133,162s and 2,128,182s, differ by only 4,979s (82m) in total.
Now the gap between the two per task is larger than that 30 day gap. |
|
|
GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level
|
keith,
On an 8800GT you should get approximately 70 ms/step.
This is consistent with the Linux application, and it is what we had before when the application was using 100% of the CPU.
I don't quite understand why your other 8800GT takes 80 ms/step despite having the same shader frequency:
# Clock rate: 1512000 kilohertz
# Time per step: 74.176 ms
#
# Device 0: "GeForce 8800 GT"
# Clock rate: 1512000 kilohertz
# Time per step: 80.404 ms
#
Note that CPU time from the older application is not a reliable measure of elapsed time, as the CPU was just polling the GPU and so was probably using less than 100%.
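To put the two 8800 GT figures side by side, a short Python sketch. The step count per WU is an assumption: about 850,000 steps, inferred from results posted in this thread (for example 65470.047 s elapsed at 77.024 ms/step), not a number stated by the project.

```python
# Time per step reported by the two 8800 GT cards above (same 1512000 kHz clock)
fast_ms, slow_ms = 74.176, 80.404

# How much slower the second card is, in percent
slowdown_pct = (slow_ms / fast_ms - 1) * 100

# Assumption: about 850,000 steps per WU, inferred from figures posted in this
# thread (e.g. 65470.047 s elapsed at 77.024 ms/step); not an official number.
STEPS_PER_WU = 850_000
extra_hours = (slow_ms - fast_ms) / 1000 * STEPS_PER_WU / 3600

print(f"{slowdown_pct:.1f}% slower, about {extra_hours:.2f} h more per WU")
```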
gdf |
|
|
|
keith,
On an 8800GT you should get approximately 70 ms/step.
This is consistent with the Linux application, and it is what we had before when the application was using 100% of the CPU.
I don't quite understand why your other 8800GT takes 80 ms/step despite having the same shader frequency:
# Clock rate: 1512000 kilohertz
# Time per step: 74.176 ms
#
# Device 0: "GeForce 8800 GT"
# Clock rate: 1512000 kilohertz
# Time per step: 80.404 ms
#
Note that CPU time from the older application is not a reliable measure of elapsed time, as the CPU was just polling the GPU and so was probably using less than 100%.
gdf
I may be on to something now. Yesterday I switched off BOINCview both at home and on work #2. I did not have time to check results at home this morning.
But this morning, after I rebooted both work computers, I checked the times: work #1 is about 60% done with 3:46 CPU time, about the same as its first few tasks. Work #2 is now 70% done with 00:55 CPU time; work #2 has never had one that quick.
My thinking was memory usage or network traffic, specifically communication with BOINC. BOINCview accumulates a lot of messages in memory, just as BOINC does, except my BOINCview is monitoring 16 computers. I'm going to leave it off for the next couple of days and over the weekend to see what happens. The only other thing this reduces is network traffic and communication with the BOINC client, although that has nothing to do with GPU usage.
It is still too early to tell if that is the cause. I'm still PUZZLED. |
|
|
|
Here are the last 5 results from work #2, all with 6.45.
Something I did, I guess turning off BOINCview, has made a difference, but in the wrong direction: GPU time is now higher, although CPU time is lower. CPU time is now about the same as on work #1, and GPU time has moved about halfway toward work #1's. Turning off BOINCview was the only change I made yesterday. You can see the drop in CPU time and the increase in GPU time on the last two tasks (the first two on the list). I do not understand these results.
work #2
CPU time (s)  Claimed  Granted  GPU time (s)  ms/step
      478.53  3232.06  3232.06      64824.85    76.27
     2951.58  3232.06  3232.06      63049.30    74.18
     3513.09  3232.06  3232.06      61384.59    72.22
     3540.48  3232.06  3232.06      61432.72    72.27
     3456.95  3232.06  3232.06      61631.57    72.51
work #1
CPU time (s)  Claimed  Granted  GPU time (s)  ms/step
      522.89  3232.06  3232.06      69912.97    82.25
      623.58  3232.06  3232.06      68343.72    80.40
      285.72  3232.06  3232.06      68463.05    80.55
Home has shown no real change since turning off BOINCview; CPU time and GPU time are both about the same as before.
home
CPU time (s)  Claimed  Granted  App   GPU time (s)  ms/step
     3439.88  3232.06  3232.06  6.45      58286.17    68.57
     3567.89  3232.06  3232.06  6.45      58595.14    68.94
     3703.58  3232.06  3232.06  6.45      60286.58    70.93
     3545.36  3232.06  3232.06  6.45      59464.68    69.96
     3436.73  3232.06  3232.06  6.45      59270.56    69.73
     3723.67  3232.06  3232.06  6.45      58289.86    68.58
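The tables can be summarized per host with a small Python sketch that uses only the GPU time and ms/step columns. Dividing GPU time by the time per step gives the implied number of steps; in these results it comes out at roughly 850,000 for every WU on every host, which suggests the hosts differ in speed per step rather than in the amount of work per WU.

```python
from statistics import mean

# (GPU time in s, ms/step) for each 6.45 result listed above
RESULTS = {
    "work #2": [(64824.85, 76.27), (63049.30, 74.18), (61384.59, 72.22),
                (61432.72, 72.27), (61631.57, 72.51)],
    "work #1": [(69912.97, 82.25), (68343.72, 80.40), (68463.05, 80.55)],
    "home": [(58286.17, 68.57), (58595.14, 68.94), (60286.58, 70.93),
             (59464.68, 69.96), (59270.56, 69.73), (58289.86, 68.58)],
}


def summarize(rows):
    """Return (mean ms/step, mean implied steps per WU) for one host."""
    avg_ms = mean(ms for _, ms in rows)
    # GPU time divided by seconds per step gives the number of steps the WU ran
    avg_steps = mean(gpu_s / (ms / 1000) for gpu_s, ms in rows)
    return avg_ms, avg_steps


for host, rows in RESULTS.items():
    avg_ms, steps = summarize(rows)
    print(f"{host}: {avg_ms:.2f} ms/step, ~{steps:,.0f} steps per WU")
```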
I'll keep looking, but I want to let things run a few days, especially over the weekend where I will not be at work so they will have two days undisturbed. |
|
|
|
Computation error.
19.09.2008 1:31:47|PS3GRID|Restarting task RN18493-GPUTEST3-3-10-acemd_0 using acemd version 645
19.09.2008 1:31:50|PS3GRID|Computation for task RN18493-GPUTEST3-3-10-acemd_0 finished
19.09.2008 1:31:50|PS3GRID|Output file RN18493-GPUTEST3-3-10-acemd_0_1 for task RN18493-GPUTEST3-3-10-acemd_0 absent
19.09.2008 1:31:50|PS3GRID|Output file RN18493-GPUTEST3-3-10-acemd_0_2 for task RN18493-GPUTEST3-3-10-acemd_0 absent
19.09.2008 1:31:50|PS3GRID|Output file RN18493-GPUTEST3-3-10-acemd_0_3 for task RN18493-GPUTEST3-3-10-acemd_0 absent
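When a restarted task finishes within seconds and its output files are reported absent, as above, it can help to pull those lines out of a longer BOINC message log. A minimal Python sketch, assuming the `date time|project|message` layout shown here; `missing_outputs` is a hypothetical helper, not part of BOINC:

```python
import re
from collections import defaultdict

# Assumed line layout, as in the log above: "DD.MM.YYYY HH:MM:SS|PROJECT|message"
LINE = re.compile(r"^(\S+ \S+)\|([^|]*)\|(.*)$")
ABSENT = re.compile(r"Output file (\S+) for task (\S+) absent")


def missing_outputs(lines):
    """Map task name -> output files the client reported as absent."""
    missing = defaultdict(list)
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # skip lines that don't follow the layout
        hit = ABSENT.search(m.group(3))
        if hit:
            missing[hit.group(2)].append(hit.group(1))
    return dict(missing)


sample = [
    "19.09.2008 1:31:50|PS3GRID|Computation for task RN18493-GPUTEST3-3-10-acemd_0 finished",
    "19.09.2008 1:31:50|PS3GRID|Output file RN18493-GPUTEST3-3-10-acemd_0_1 for task RN18493-GPUTEST3-3-10-acemd_0 absent",
]
print(missing_outputs(sample))
```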
|
|
|
|
I still have varying times on one of my hosts at work.
Mostly this seems to be due to BOINCview running on the host, but I do not know why it interferes. I turned it off last weekend, and over the weekend all tasks finished with less CPU time. On Monday I restarted BOINCview, and CPU times went back up and have remained higher all week long.
If I try the same test at home (shutting it down), it has no effect; times have remained the same.
Times are averages or approximate.
With BOINCview running, the host has a CPU time of about 57m and a GPU time of 60,841s = 1014m = 16.90h.
Without BOINCview, the host CPU time drops to 4m, but the GPU time goes up to 66,478s = 1108m = 18.47h, an increase of 1.57 hours.
It seems to me, though, that it would be wiser to give up about one hour of CPU time to save about 1.6 hours of GPU time. Each task would then finish about 1.6 wall-clock hours sooner.
From a credit point of view:
On this host, the GPU earns about 2.5-2.75cs per minute, while the CPU earns only 0.23cs per minute (from another project). This is just an estimate. So I would be giving up about 14cs from the CPU to gain roughly 235-260cs from the GPU, as each task would finish about 1.6 hours quicker, giving that time back to the GPU for more work.
OK, these are not exact numbers, but the difference is big enough to justify the CPU usage. I think you can see the point.
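The trade-off can be worked through with the figures from this post. A small Python sketch; all rates are the rough estimates given above, not measured values:

```python
# Per-task figures from the post above (rough estimates, not measurements)
CPU_MIN_WITH, CPU_MIN_WITHOUT = 57, 4        # CPU minutes, with/without BOINCview
GPU_S_WITH, GPU_S_WITHOUT = 60_841, 66_478   # GPU (elapsed) seconds per task
CPU_CS_PER_MIN = 0.23                        # credit/min from another CPU project
GPU_CS_PER_MIN = (2.5, 2.75)                 # credit/min range on the GPU

# Credit the CPU project loses because the GPU app keeps the CPU busier
cpu_cost = (CPU_MIN_WITH - CPU_MIN_WITHOUT) * CPU_CS_PER_MIN

# GPU minutes freed per task, and the credit that time could earn
gpu_min_saved = (GPU_S_WITHOUT - GPU_S_WITH) / 60
gpu_gain = tuple(rate * gpu_min_saved for rate in GPU_CS_PER_MIN)

print(f"CPU cost ~{cpu_cost:.0f}cs, GPU gain ~{gpu_gain[0]:.0f}-{gpu_gain[1]:.0f}cs per task")
```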
The only question is how to make my other work computer, which has always had a low CPU time, use more CPU time. I guess I can install BOINCview on that one too. |
|
|
|
Strange problem:
04.10.2008 6:54:33|PS3GRID|Started upload of yr20704-GPUTEST3-7-10-acemd_3_3
04.10.2008 6:54:35|PS3GRID|[error] Error reported by file upload server: Server is out of disk space
04.10.2008 6:54:35|PS3GRID|Temporarily failed upload of yr20704-GPUTEST3-7-10-acemd_3_3: transient upload error
04.10.2008 6:54:35|PS3GRID|Backing off 5 min 30 sec on upload of yr20704-GPUTEST3-7-10-acemd_3_3
What's up with disk space?! |
|
|
GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level
|
fixed.
gdf |
|
|
|
My GPU performance for this box has been on a decline...
http://www.ps3grid.net/result.php?resultid=83851
This is the last wu finished. The previous wu was not as bad. And before this, my result times for this GPU were:
# Time per step: 77.024 ms
# Approximate elapsed time for entire WU: 65470.047 s
I re-booted prior to starting the "83851" wu.
app 6.45
WinXP64 sp2
177.84 driver
6.3.14
8800GT
____________
Consciousness: That annoying time between naps......
Experience is a wonderful thing: it enables you to recognize a mistake every time you repeat it.
|
|
|
|
I think I fixed it. It might have been another application (not related to BOINC) that was started by mistake and hogged the CPU cycles the GPU needed.
I'll know in a day or so...
|
|
|