WU: OPM995 simulations


Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 43600 - Posted: 27 May 2016 \| 8:54:11 UTC
	Here we go again :) This time with 33% more credits plus corrected runtimes, which means an additional 2x credit for WUs that take more than 18 hours on a 780. The batch only contains WUs that take at most 24 hours on a 780. I hope I don't seriously overshoot on credits this time, but it's really a bit hit and miss.

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 0 Level Scientific publications	Message 43602 - Posted: 27 May 2016 \| 9:19:12 UTC - in response to Message 43600.
	Thanks Stefan! As there are plenty of workunits queued (7920 at the moment), and some of these are very long, I suggest everyone reduce their work cache to 0.03 days to maximize throughput and the credits earned.

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43604 - Posted: 27 May 2016 \| 13:51:39 UTC - in response to Message 43602. Last modified: 27 May 2016 \| 13:52:17 UTC
	Good suggestion. Given the length of these tasks (extra-long, or at least some of them are), and with so many available, there is no point in people hoarding tasks - they will just miss bonus deadlines and get less credit.

WPrion Send message Joined: 30 Apr 13 Posts: 96 Credit: 1,958,984,111 RAC: 19,320,070 Level Scientific publications	Message 43628 - Posted: 29 May 2016 \| 2:05:23 UTC - in response to Message 43602.
	Are you referring to the setting "Maintain enough work for an additional"? I set mine to 0.03 several hours ago and updated my client. Yet it downloaded another WU shortly after one finished, just as the running WU had barely started. Is there something else to tweak? Thanks, Win

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43632 - Posted: 29 May 2016 \| 13:14:27 UTC - in response to Message 43628. Last modified: 29 May 2016 \| 13:19:19 UTC
	Yes. In Boinc Manager (advanced view), under Options, Computing preferences, on the Computing tab, you need to set two values:
	Store at least [0.02] days of work
	Store up to an additional [0.01] days of work
	If the combined values add up to anything less than 0.10 then the settings should work reasonably well. It's likely that the second value was something like 0.25 or 0.5, and that caused you to download additional work (a second task).
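For a sense of scale, here is a minimal Python sketch of what the two values above add up to, and of how the same pair might be written to a `global_prefs_override.xml` file instead of being set through the Manager. The tag names `work_buf_min_days` and `work_buf_additional_days` are given as I recall them from the BOINC client documentation and should be checked against your client version; the helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def buffer_minutes(min_days: float, extra_days: float) -> float:
    """Total work buffer, in minutes, implied by the two cache settings."""
    return (min_days + extra_days) * SECONDS_PER_DAY / 60


def override_xml(min_days: float, extra_days: float) -> str:
    """Render the two settings as a global_prefs_override.xml fragment.

    Assumption: tag names as recalled from the BOINC client docs.
    """
    return (
        "<global_preferences>\n"
        f"  <work_buf_min_days>{min_days}</work_buf_min_days>\n"
        f"  <work_buf_additional_days>{extra_days}</work_buf_additional_days>\n"
        "</global_preferences>\n"
    )


if __name__ == "__main__":
    # The 0.02 + 0.01 day values suggested in the post above.
    print(f"{buffer_minutes(0.02, 0.01):.1f} minutes of buffered work")
    print(override_xml(0.02, 0.01))
```

With 0.02 + 0.01 days the client keeps well under an hour of work in reserve, so a second long task is only requested when the running one is close to finishing.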

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 43633 - Posted: 29 May 2016 \| 13:30:23 UTC Last modified: 29 May 2016 \| 13:32:05 UTC
	Please note that really low buffer settings cause increased stress on project scheduler servers, for all projects you are attached to. I personally leave my buffers at something like "store at least 1 day, store up to 0.5 days more", since I don't care about the GPUGrid credit bonus, and short buffers don't really help GPUGrid throughput unless very few work units are available, and I don't want to add increased stress to my attached projects' scheduler servers.

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1617 Credit: 8,224,794,351 RAC: 16,571,330 Level Scientific publications	Message 43634 - Posted: 29 May 2016 \| 14:20:12 UTC - in response to Message 43633.
	"... short buffers don't really help GPUGrid throughput ..."
	Not necessarily true. I'm not speaking specifically about the OPM simulations here, but I think most GPUGrid work is run as a sort of relay race - you hold the baton for a short while, complete your lap of the track, and then hand it back in for somebody else to take over. If you sit at the side of the track for a day and a half before you even start running, that particular baton - a series of linked tasks, each generated from the result of the previous lap - is permanently delayed, and the final results aren't available for the scientists to study until that much later.
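The relay-race argument can be put into numbers with a small Python sketch. It assumes a strictly sequential chain in which each task can only be generated once the previous one has been returned; the chain length (10 steps) and the run time per step (12 hours) are hypothetical values chosen for illustration, and only the day and a half of waiting comes from the post.

```python
def chain_finish_hours(steps: int, run_hours: float, queue_hours: float) -> float:
    """Hours until the last task of a sequential chain is returned.

    Every step waits queue_hours in a host's cache before it starts and then
    runs for run_hours; the next step cannot be created earlier, so the
    waiting time is paid once per step.
    """
    return steps * (queue_hours + run_hours)


if __name__ == "__main__":
    prompt = chain_finish_hours(steps=10, run_hours=12, queue_hours=0)
    delayed = chain_finish_hours(steps=10, run_hours=12, queue_hours=36)
    print(f"no queueing:       {prompt:.0f} h")
    print(f"36 h in the cache: {delayed:.0f} h")
    print(f"extra delay:       {delayed - prompt:.0f} h")
```

Under these assumptions the cache delay is never recovered: it is added at every hand-over, which is the sense in which the baton is "permanently delayed".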

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 43635 - Posted: 29 May 2016 \| 14:26:29 UTC - in response to Message 43634. Last modified: 29 May 2016 \| 14:28:33 UTC
	That had slipped my mind. But, if GPUGrid was having a problem getting the batons back for the next runners, and they wanted to ensure that the race kept running smoothly, they could tighten the deadlines on the relay chunks if need be. So, I'm just going to stick with the deadlines they give me, and not micro-manage BOINC, and not add stress to my attached projects' servers. I actually have GPUGrid set to 99999 resource share, and GPUs crunching 2-at-a-time, so ... :) When I get tasks from this project, they are usually firing on all cylinders, top priority.

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43639 - Posted: 29 May 2016 \| 19:25:46 UTC - in response to Message 43635. Last modified: 29 May 2016 \| 19:28:57 UTC
	Until the scheduler is rewritten at a per-device/device-specific level there will be issues with attaching to multiple projects (when using multiple devices). However, these have been addressed as far as reasonably feasible with the existing manager. I would add that many CPU projects have long tasks; some Einstein and WCG tasks, for example, take ~20h to complete, and ClimatePrediction tasks take several days to weeks. If you have a low cache and are running GPUGrid task(s) on your GPU(s) and WCG tasks on your CPU, then you won't badger the server for new work until you are almost out of work, which probably won't be very often (a few times per day, which isn't an issue). Granted, there are/were some projects with very short run-times, but that does not mean it's better to have a long queue/big cache of tasks. There are substantial issues with having hundreds/thousands of tasks in your queue too. For example, if you crunch for BU and your Internet goes down, all queued tasks will fail - not exactly great news for their server. My opinion for here: a low cache is good for the project and for user/team credits; a higher (but reasonably low) cache is not as good for either but still good, not bad, and it's your choice. A high cache (3+ days) is bad news. The bonus system is designed to reflect this project's need for a quick return. It can't take into account what else you crunch.

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 43640 - Posted: 29 May 2016 \| 19:53:28 UTC - in response to Message 43639. Last modified: 29 May 2016 \| 20:00:50 UTC
	"It can't take into account what else you crunch."
	That's exactly the reason that you shouldn't make blanket suggestions on cache settings that benefit GPUGrid most, without also specifying some of the drawbacks :) I digress. For my particular scenario, I have modified my cache settings a bit, in order to try to keep all my GPUs sustained at 2-GPUGrid-tasks-per-GPU without taking on additional work from other attached GPU projects. I'm using 0.9d+0.9d on the PC that has GTX970+GTX660Ti+GTX660Ti, and 0.5d+0.5d on the PC that has GTX980Ti+GTX980Ti. To each their own.

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43641 - Posted: 29 May 2016 \| 21:02:47 UTC
	For years many have asked for per project work buffer settings or at LEAST separate settings for GPUs and CPUs. All to no avail, while a lot of effort has been spent on less important (IMO) issues.

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43642 - Posted: 29 May 2016 \| 22:04:57 UTC - in response to Message 43640.
	My suggestions are predominantly for GPUGrid only and are typically optimisations for GPUGrid throughput and user/team credit. I don't make suggestions at GPUGrid to facilitate every conceivable combination of Boinc-wide project admix, nor could I - it can't be done. You have different views, values, opinions and objectives, which you are quite entitled to express and implement for yourself and to your own ends. My advice is mostly aimed at new, novice or just GPUGrid-new crunchers, or people with a specific problem here. Usually they need a setup to facilitate crunching here, and often changes just to make it work. Occasionally I digress too, to advise on an experience crunching elsewhere, or to pass on some observations or knowledge, but there is no catch-all super setup for Boinc. I enjoy the fact that people crunch for a diversity of reasons, with different setups and takes on crunching. Highlighting different circumstances and experiences adds to my knowledge and to crunchers' knowledge as a whole, but one shoe doesn't fit all, and this is a GPUGrid forum, not the Boinc central forum where generic advice might better be propagated.

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43643 - Posted: 29 May 2016 \| 22:10:02 UTC - in response to Message 43641.
	I don't bother any more. IMO it is what it is and that's just about all it will ever be.

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43644 - Posted: 29 May 2016 \| 23:04:59 UTC - in response to Message 43643.
	Gave up too. However it is supremely important to devise more ways for people to burn up their phones while doing nothing useful.

klepel Send message Joined: 23 Dec 09 Posts: 189 Credit: 4,636,206,793 RAC: 2,304,655 Level Scientific publications	Message 43654 - Posted: 30 May 2016 \| 14:21:55 UTC
	Stefan, two of my computers have received SDOERR_opm995 tasks which are being processed by another computer at the same time. They were sent at more or less the same time.
	https://www.gpugrid.net/workunit.php?wuid=11614785
	https://www.gpugrid.net/workunit.php?wuid=11614829
	Is this intentional, as these SDOERR WUs had been so error prone, or is it a fault of the scheduler? Please advise as fast as possible so I can kill them as soon as possible. I do not like to do double work if it is not required.

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43655 - Posted: 30 May 2016 \| 15:17:12 UTC - in response to Message 43654.
	"initial replication 2" (https://www.gpugrid.net/workunit.php?wuid=11614785)
	That means two tasks are sent out, by design. One of the OPM995's I'm running also has an initial replication of 2: https://www.gpugrid.net/workunit.php?wuid=11614838

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 43656 - Posted: 30 May 2016 \| 15:25:56 UTC - in response to Message 43655.
	Perhaps the question is: Why was it set up with initial replication set to 2?

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43659 - Posted: 30 May 2016 \| 20:48:32 UTC - in response to Message 43656. Last modified: 30 May 2016 \| 22:10:00 UTC
	Probably validation; any proof of concept experiment to demonstrate ability needs to contain appropriate verification for it to be accepted as a model/framework for performing experiments.

WPrion Send message Joined: 30 Apr 13 Posts: 96 Credit: 1,958,984,111 RAC: 19,320,070 Level Scientific publications	Message 43661 - Posted: 31 May 2016 \| 0:56:38 UTC - in response to Message 43632.
	Thanks!

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 43662 - Posted: 31 May 2016 \| 1:14:43 UTC - in response to Message 43659.
	Hmm... validation deals with quorum though, and also, I thought the way these GPUGrid tasks worked was that the results couldn't really be validated against each other. I might be mistaken though.

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43663 - Posted: 31 May 2016 \| 7:11:54 UTC - in response to Message 43662.
	Wasn't thinking about task validation in the Boinc sense but rather validation of the experimental procedure - does it hold any weight? If we consider an experiment as a batch of work, validation of the experiment (and procedures) in scientific terms usually requires that the whole experiment be replicated, and perhaps many times before the results/methods are accepted. Of course Stefan might be doing this for different reasons.

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 43666 - Posted: 31 May 2016 \| 12:52:59 UTC - in response to Message 43663.
	I see what you mean now. I hope he has another reason.

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43671 - Posted: 31 May 2016 \| 16:40:16 UTC Last modified: 31 May 2016 \| 16:40:32 UTC
	GTX970 on W10: 24h and 41min, with a bit of upload time too (118MB). http://www.gpugrid.net/result.php?resultid=15125538
	Run time: 88,881.18
	CPU time: 88,253.09
	Validate state: Valid
	Credit: 788,690.00
	I expect that if the system were set up a bit better it could complete within 24h, but I've a second GPU, the room's been 24C to 28C, I'm using the CPU quite a bit and my system is set to drop the clocks to keep the temperature down. This GPU was clocked at ~1300MHz, the second has dropped down to 1088. GDDR5 is @7GHz. Haven't managed to get an OPM on my Linux system yet. The point of installing Ubuntu 16.04 was to see if I could set up a GTX970 system to return these long OPM's inside 24h!

Bedrich Hajek Send message Joined: 28 Mar 09 Posts: 485 Credit: 10,549,623,466 RAC: 15,250,943 Level Scientific publications	Message 43678 - Posted: 1 Jun 2016 \| 1:53:30 UTC
	I was fortunate enough to get and successfully complete 2 of these units:
	5f1c-SDOERR_opm995-0-1-RND8074_2 (WU 11614800)
	Sent 30 May 2016 \| 13:52:39 UTC, reported 31 May 2016 \| 6:23:14 UTC, completed and validated
	Run time 56,458.02, CPU time 56,161.20, credit 940,443.00
	Long runs (8-12 hours on fastest card) v8.48 (cuda65)
	# Time per step (avg over 5000000 steps): 11.257 ms
	# Approximate elapsed time for entire WU: 56284.859 s
	# PERFORMANCE: 157144 Natoms 11.257 ns/day 0.000 ms/step 0.000 us/step/atom
	02:17:56 (7792): called boinc_finish
	http://www.gpugrid.net/result.php?resultid=15124495
	3jw8R0-SDOERR_opm995-0-1-RND9612_2 (WU 11614181)
	Sent 30 May 2016 \| 8:49:32 UTC, reported 31 May 2016 \| 0:50:29 UTC, completed and validated
	Run time 55,859.07, CPU time 55,499.59, credit 956,403.00
	Long runs (8-12 hours on fastest card) v8.48 (cuda65)
	# Time per step (avg over 10000000 steps): 5.578 ms
	# Approximate elapsed time for entire WU: 55780.416 s
	# PERFORMANCE: 79913 Natoms 5.578 ns/day 0.000 ms/step 0.000 us/step/atom
	20:45:10 (7740): called boinc_finish
	http://www.gpugrid.net/result.php?resultid=15124201
	With the 5f1c-SDOERR_opm995-0-1-RND8074_2, my Windows 10 computer was able to achieve 87% maximum GPU usage, while using 1950 MB of memory. The 3jw8R0-SDOERR_opm995-0-1-RND9612_2, on the same computer, achieved 80% maximum GPU usage, while using 1100 MB of memory. I can't wait to get a few more of these!
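The "Approximate elapsed time" lines in the two logs above are consistent with simply multiplying the average time per step by the number of steps. A small Python sketch of that arithmetic follows; the function names are hypothetical, and the relation is inferred from the quoted figures rather than taken from the application's source.

```python
def estimated_runtime_s(ms_per_step: float, steps: int) -> float:
    """Estimated wall-clock seconds for a WU: average step time times step count."""
    return ms_per_step * steps / 1000.0


def as_hms(seconds: float) -> str:
    """Format a duration in seconds as 'Hh MMm SSs'."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    # Figures quoted in the two task logs above.
    for name, ms, steps in [("5f1c", 11.257, 5_000_000), ("3jw8R0", 5.578, 10_000_000)]:
        est = estimated_runtime_s(ms, steps)
        print(f"{name}: {est:.0f} s = {as_hms(est)}")
```

Both WUs come out at roughly fifteen and a half hours even though one has about twice the atoms and half the steps of the other.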

_Ryle_ Send message Joined: 7 Jun 09 Posts: 24 Credit: 1,149,643,416 RAC: 363,138 Level Scientific publications	Message 43681 - Posted: 1 Jun 2016 \| 14:38:34 UTC
	When the new students arrive, would you consider creating more short tasks? I think it is a pity that you mostly cater to the very high-end cards here. I'd like to continue supporting this project, but as it is I just can't afford to buy the faster cards. I do own a 970, and it is still a fast card. I would just hate to see it go over that 24h limit in the near future. I understand it is eventually inevitable, but it's barely a year old. Sadly, the high-end cards also crunch the short units when the long unit pool is dry, so they quickly eat up the short pool too. A WU tier would be nice, however. I think it's been suggested before, somewhere else in these forums, that you could make a short, a medium and a long unit pool. That would be cool: the small cards would have the short pool, the somewhat faster cards the medium pool, and the high-end cards the top-tier long pool. Still, in the past the short units also gave fewer points per day overall, even for the same time on the same card, but I don't know the reason for that. (Maybe the bonus isn't added to those?) Well, just my 2 cents' worth of opinion :)

John C MacAlister Send message Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0 Level Scientific publications	Message 43683 - Posted: 2 Jun 2016 \| 10:40:59 UTC
	Agreed: pity there are so few shorts..... My 650 Tis are too slow and the 660Tis are looking pretty slow compared to many others. I can't afford newer cards and now, with electricity costing me 18 cents (Canadian) per kWh, my contribution to GPUGrid will be very low. :(

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43685 - Posted: 2 Jun 2016 \| 16:05:42 UTC Last modified: 2 Jun 2016 \| 16:19:22 UTC
	Before I received 2m59_SDOERR_opm994 (short WU), three prior hosts (GT640 / GTX950 / GTX970, r361 & r364 drivers) produced outcome -55, exit code (0xffffffffffffffc9) Unknown error, with zero runtimes.
	GTX970 (2m59 WU): 6.45hr estimated runtime (15.480% per hour).
	2m59 WU status: 11-14% CPU usage (3.2GHz) / 54% GPU usage (1511MHz) / 24% MCU (7200MHz) / 25% BUS (PCIe3.0 x4) / GPU temp 39C / 33% GPU power (108W) / 550MB memory usage (no display connected)
	Topology reports 27558 atoms, 4344 waters in system.
	Thank you Zoltan for sharing the helpful tip (in the previous OPM thread) on where to find the file with a WU's atom count.

Bedrich Hajek Send message Joined: 28 Mar 09 Posts: 485 Credit: 10,549,623,466 RAC: 15,250,943 Level Scientific publications	Message 43695 - Posted: 3 Jun 2016 \| 5:47:27 UTC Last modified: 3 Jun 2016 \| 5:48:04 UTC
	I had one of these WUs fail with this error message:
	upload failure: <file_xfer_error>
	<file_name>4mt6-SDOERR_opm994-0-1-RND0442_0_11</file_name>
	<error_code>-131 (file size too big)</error_code>
	</file_xfer_error>
	http://www.gpugrid.net/result.php?resultid=15127701
	Has this happened to anyone else with these WUs? I remember this happened in the past, and there is a fix for it posted somewhere in the threads, but I can't remember where. I think this WU would otherwise have been good.

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 0 Level Scientific publications	Message 43696 - Posted: 3 Jun 2016 \| 7:35:49 UTC - in response to Message 43695.
	See the WARNING/CHALLENGE: VERY LONG WU (VERYLONG_CXCL12_confAna) thread. It's embarrassing that we've run into this again.

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1617 Credit: 8,224,794,351 RAC: 16,571,330 Level Scientific publications	Message 43697 - Posted: 3 Jun 2016 \| 8:02:15 UTC
	I've got 2d57-SDOERR_opm994-0-1-RND4399_1 running. The file description in client_state.xml is
	<file>
	<name>2d57-SDOERR_opm994-0-1-RND4399_1_11</name>
	<nbytes>0.000000</nbytes>
	<max_nbytes>5000000.000000</max_nbytes>
	<status>0</status>
	<upload_url>http://www.gpugrid.org/PS3GRID_cgi/file_upload_handler</upload_url>
	</file>
	- so the maximum size allowed is 5,000,000 bytes. So far, it's reached 852 KB at about 80% progress - which sounds like plenty of headroom, and perhaps not a widespread problem. But I'll keep an eye on it as it approaches completion.
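The headroom estimate above can be reproduced with a short Python sketch. It assumes the output file grows linearly with task progress and that "KB" means 1024 bytes; both are simplifications of mine, not something the post states, and the function names are hypothetical.

```python
MAX_NBYTES = 5_000_000  # <max_nbytes> from the client_state.xml entry above


def projected_final_bytes(current_bytes: float, progress: float) -> float:
    """Linear extrapolation of the final file size from the size at `progress` (0..1]."""
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in the interval (0, 1]")
    return current_bytes / progress


def fits_upload_limit(current_bytes: float, progress: float,
                      limit: int = MAX_NBYTES) -> bool:
    """True if the extrapolated final size stays within the upload limit."""
    return projected_final_bytes(current_bytes, progress) <= limit


if __name__ == "__main__":
    size_now = 852 * 1024  # 852 KB at about 80% progress, as reported above
    projected = projected_final_bytes(size_now, 0.80)
    print(f"projected final size: {projected:,.0f} bytes")
    print("within limit" if fits_upload_limit(size_now, 0.80) else "over limit")
```

On those assumptions the file would finish at a little over 1 MB, roughly a fifth of the limit.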

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 43698 - Posted: 3 Jun 2016 \| 8:09:50 UTC
	I apologize for not answering in a while; I have been a bit busy writing my thesis. Job replication 2 was my desperate attempt to get my results back faster while also competing with the mass of simulations sent out by Gerard, and to reduce my failure rates a bit. I hope you don't mind too much, since there were only around 300 WUs. If they arrive on the same host, of course, it's quite pointless. On the subject of short runs, I am unfortunately unable to help you because the equilibration runs cannot be split into smaller chunks. But as Gianni mentioned, we are getting new students soon, so it is possible that they will have something for the short queue.

John C MacAlister Send message Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0 Level Scientific publications	Message 43699 - Posted: 3 Jun 2016 \| 10:08:34 UTC
	Hi, Stefan: Thank you for this:
	"On the subject of short runs, I am unfortunately unable to help you because the equilibration runs cannot be split into smaller chunks. But as Gianni mentioned we are getting new students soon so it is possible that they have something for short."

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1617 Credit: 8,224,794,351 RAC: 16,571,330 Level Scientific publications	Message 43700 - Posted: 3 Jun 2016 \| 10:46:15 UTC - in response to Message 43697.
	2d57-SDOERR_opm994-0-1-RND4399_1 uploaded cleanly, so it's not a universal problem. 4azpR0-SDOERR_opm995-0-1-RND6483_1 might get closer to the limit - I'll keep an eye on it.

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43703 - Posted: 3 Jun 2016 \| 17:18:48 UTC - in response to Message 43685.
	WUid=11616186 (1a0r OPM994) crashed my system multiple times. This WU had 100% GPU usage / 1% MCU / 20% power (65W) before the first driver reset(s) I've encountered in three years of computing ACEMD. The first couple of driver recoveries happened while OCed; once I noticed them I went back to reference stock clocks, and the (1a0r) WU ended with a -97 (0xffffffffffffff9f) Unknown error number after 102sec. (FATAL : Cuda driver error 719 in file 'swanlibnv2.cpp' in line 1965) A few other stable, high-RAC wingman systems (980ti / (2) 970's; 6 total) also have error(s) (<100sec) with the (1a0r) WU.
	As of now (2) OPM995 are without issue on my 970's at very high OC's:
	(WUid=11614432) 4a6fRO (50479 atoms with 9411 waters in system): 20.25hr estimated runtime at 12-15% CPU usage (3.2GHz) / 63% GPU usage (1511MHz) / 31% MCU (7200MHz) / 27% BUS (PCIe3.0 x4) / 34% power (110W) / 42C core / 820MB memory usage
	(WUid=11614365) 4u15RO (51270 atoms with 8255 waters in system): 20.5hr estimated runtime at 12-15% CPU usage (3.2GHz) / 65% GPU usage (1511MHz) / 34% MCU (7010MHz) / 22% BUS (PCIe3.0 x8) / 60% power (120W) / 45C core / 843MB memory usage

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43704 - Posted: 3 Jun 2016 \| 20:41:09 UTC
	1s4wR0-SDOERR_opm995-0-1-RND5214_0 (WU 11614436)
	Sent 3 Jun 2016 \| 6:47:02 UTC, reported 3 Jun 2016 \| 20:01:33 UTC, completed and validated
	Run time 45,293.51, CPU time 20,015.48, credit 147,829.50
	Finally got an OPM on my Ubuntu 16.04 rig. Alas it didn't turn out to be an extra-long run and completed in 12h 35min at stock. Based on the run time of other long WU's the credit is about half what it should be. Was hoping to get an extra-long task and to finish inside 24h - c'est la vie...
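To compare credit across tasks of different lengths it helps to normalise by run time. The Python sketch below does that for this task and, purely as an illustration, for the GTX970 Windows 10 result reported in Message 43671; the two runs were on different systems and different WUs, so the ratio is indicative only. The function name is hypothetical.

```python
def credits_per_hour(credit: float, run_time_s: float) -> float:
    """Granted credit divided by run time expressed in hours."""
    if run_time_s <= 0:
        raise ValueError("run_time_s must be positive")
    return credit / (run_time_s / 3600.0)


if __name__ == "__main__":
    linux_task = credits_per_hour(147_829.50, 45_293.51)    # this task
    windows_task = credits_per_hour(788_690.00, 88_881.18)  # Message 43671
    print(f"1s4wR0 (Ubuntu):        {linux_task:,.0f} credits/h")
    print(f"Message 43671 (W10):    {windows_task:,.0f} credits/h")
    print(f"ratio:                  {linux_task / windows_task:.2f}")
```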

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1617 Credit: 8,224,794,351 RAC: 16,571,330 Level Scientific publications	Message 43705 - Posted: 3 Jun 2016 \| 21:56:01 UTC - in response to Message 43700.
	4azpR0-SDOERR_opm995-0-1-RND6483_1 looks safe as well - 1,283 KB at 61%.
	# Topology reports 50432 atoms

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43706 - Posted: 3 Jun 2016 \| 22:20:36 UTC
	Too many errors (may have bug) 1a0r-SDOERR_opm994-0-1-RND9594 https://www.gpugrid.net/workunit.php?wuid=11616186

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43713 - Posted: 4 Jun 2016 \| 14:51:49 UTC - in response to Message 43703.
	Two new OPM995 WUs that should make the maximum allowed file_xfer size of 5,000,000 bytes:
	3nce WU#11614771 (126091 atoms with 25796 waters): 20hr estimated runtime at 12-16% CPU usage (3.2GHz) / 76% GPU usage (1511MHz) / 40% MCU (7200MHz) / 33% BUS (PCIe3.0 x4) / 40% power (130W) / 44C temp / 1559MB memory usage
	2b6p WU#11614758 (129818 atoms with 23308 waters): 21hr estimated runtime at 12-16% CPU usage (3.2GHz) / 75% GPU usage (1511MHz) / 45% MCU (7010MHz) / 24% BUS (PCIe3.0 x8) / 70% power (140W) / 47C temp / 1662MB memory usage

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43724 - Posted: 5 Jun 2016 \| 14:38:34 UTC
	Any TX/980ti/980/970 (present batch) SDOERR_opm99 grant 1,000,000 credit? My -+ (runtime) credit: 23,912.30 GPU / 11,332.23 CPU / 41,296.50 credits (27588 atoms) / 5mil step; 74,154.80 / 16,389.80 / 377,254.50 credits (126091 atoms) / 5mil step. An odd short-run 5mil step (~27k atoms) WU cropped up. 0 unsent / 271 in progress / 1155 success / 47.62% error rate.
	ID: 43724 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 0 Level Scientific publications	Message 43725 - Posted: 5 Jun 2016 \| 15:54:02 UTC - in response to Message 43724. Last modified: 5 Jun 2016 \| 15:57:25 UTC
	Any TX/980ti/980/970 (Present batch) SDOERR_opm99 grant 1,000,000 credit? 4by0-SDOERR_opm994-0-1-RND5591_1: 58.472s (16h 14m 26s), 1.023.036 credits, 170941 atoms, 11.696 ns/day, 5M steps. This workunit is very interesting: the initial replication was 2, and the other host which received this workunit also received the +50% bonus, even though it returned it after 1d 14h.
	ID: 43725 \| Rating: 0 \| rate: / Reply Quote
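The runtime quoted above can be cross-checked against the step count. This is only a sketch, and it rests on one assumption: that the value printed as "11.696 ns/day" is really the average time per step in milliseconds, which is how the ACEMD task logs quoted in this thread appear to pair the two numbers.

```python
def elapsed_seconds(steps: int, ms_per_step: float) -> float:
    """Wall-clock estimate: number of steps times average time per step."""
    return steps * ms_per_step / 1000.0


def as_hms(seconds: float) -> str:
    """Format a duration in seconds as 'Hh MMm SSs'."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Figures from the post: 5M steps, 11.696 (assumed to be ms per step).
estimate = elapsed_seconds(5_000_000, 11.696)
print(estimate, as_hms(estimate))
```

The estimate comes out within a few seconds of the 58.472s (16h 14m 26s) run time reported for the task, which is what one would expect if the assumption holds.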

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43727 - Posted: 5 Jun 2016 \| 20:11:34 UTC - in response to Message 43725.
	This workunit is very interesting, as the initial replication was 2, the other host which received this workunit also received the +50% bonus, while it has returned it after 1d 14h. AFAIK that's the way it's always worked here. The first reported WU sets the credit for everyone.
	ID: 43727 \| Rating: 0 \| rate: / Reply Quote

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43728 - Posted: 5 Jun 2016 \| 20:15:08 UTC - in response to Message 43704.
	Finally got an OPM on my Ubuntu 16.04 rig. Alas it didn't turn out to be an extra-long run and completed in 12h 35min at stock. Based on the run time of other long WU's the credit is about half what it should be. Had 4 OPMs finish today. The credit on all of them is 1/2 or less per hour compared to any other long WUs. Guess the credit wasn't fixed after all.
	ID: 43728 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43729 - Posted: 5 Jun 2016 \| 20:30:58 UTC - in response to Message 43728. Last modified: 5 Jun 2016 \| 20:36:10 UTC
	Got 2 real extra-long tasks on my Win10 system and one 'fake' extra-long task on my Linux system. The real extra-long tasks got 900K Boinc credits whereas the normal-long task only received 147K credits (or thereabouts). ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43729 \| Rating: 0 \| rate: / Reply Quote

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43730 - Posted: 5 Jun 2016 \| 20:33:19 UTC - in response to Message 43729. Last modified: 5 Jun 2016 \| 20:34:08 UTC
	I got 2 really long tasks on my Win10 system and one fake long task on my Linux system. The real long tasks got \|900K Boinc credits whereas the not-really-long task (normal-ling) only received 147K credits (or there about). Remedial math is a good post graduate course... ;-)
	ID: 43730 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43731 - Posted: 5 Jun 2016 \| 20:37:01 UTC - in response to Message 43730. Last modified: 5 Jun 2016 \| 20:46:09 UTC
	Just after correcting my remedial English :) PS. Looks like it's backup-project time again 🕒 ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43731 \| Rating: 0 \| rate: / Reply Quote

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43733 - Posted: 5 Jun 2016 \| 20:59:45 UTC - in response to Message 43731.
	Just after correcting my remedial English :) PS. Looks like it's backup-project time again 🕒 I don't think it's you that needs the remedial math, and yep it's that time again.
	ID: 43733 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 43745 - Posted: 7 Jun 2016 \| 10:04:03 UTC
	Really, I am out of ideas on how to fix the credits any further. I did everything I could imagine being wrong. I could blindly multiply the credits by whatever factor you guys tell me, but right now I have to base it off our usual credit calculation script.
	ID: 43745 \| Rating: 0 \| rate: / Reply Quote

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43749 - Posted: 8 Jun 2016 \| 18:36:19 UTC - in response to Message 43745.
	Really, I am out of ideas on how to fix the credits any further. I did everything I could imagine being wrong. I could blindly multiply the credits by whatever factor you guys tell me, but right now I have to base it off our usual credit calculation script. Recent comparisons, OPM vs. CXCL12VOLK. Example from one of my machines: 1gzmR0-SDOERR_opm995-0-1-RND1802_0 (WU 11614349), sent 3 Jun 2016 \| 6:36:43 UTC, reported 5 Jun 2016 \| 9:02:02 UTC, completed and validated: run time 162,200.44, CPU time 47,231.66, credit 237,804.00. e6s24_e1s9p0f524-GERARD_CXCL12VOLK_15782120_2-0-1-RND1978_0 (WU 11613059), sent 28 May 2016 \| 21:23:31 UTC, reported 30 May 2016 \| 4:49:33 UTC, completed and validated: run time 96,473.03, CPU time 31,352.66, credit 233,875.00. Here's another one of my computers. This WU had 131548 Natoms: 2w61-SDOERR_opm994-0-1-RND7728_0 (WU 11616211), sent 2 Jun 2016 \| 17:30:20 UTC, reported 5 Jun 2016 \| 18:09:42 UTC, completed and validated: run time 243,192.27, CPU time 35,757.08, credit 262,409.00. e4s9_e1s18p0f473-GERARD_CXCL12VOLK_15782120_2-0-1-RND7513_1 (WU 11609049), sent 1 Jun 2016 \| 14:32:18 UTC, reported 2 Jun 2016 \| 22:47:34 UTC, completed and validated: run time 98,520.08, CPU time 29,575.68, credit 233,875.00. From the OPM WUs I've been running lately it seems that the credit is about 45% - 60% per hour compared to other/previous long WUs. On top of that there is a greater chance of failure with these long WUs. I would suggest erring on the high side rather than the low side when estimating credit as it costs you nothing and it's one of the few tokens of appreciation that we receive for our small contribution to the great science that you guys are doing. Whining aside, keep up the excellent work. For a lot of us this is a small way that we can contribute to science.
	ID: 43749 \| Rating: 0 \| rate: / Reply Quote
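The "45% - 60% per hour" estimate in the post above can be reproduced from the four task rows it lists. The sketch below assumes the three trailing numbers on each row are run time (seconds), CPU time (seconds) and credit, in the usual BOINC task-table order; the `Task` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    run_time_s: float  # assumed: first numeric column, run time in seconds
    credit: float      # assumed: last numeric column, granted credit

    @property
    def credit_per_hour(self) -> float:
        return self.credit / (self.run_time_s / 3600.0)


# Each pair is (OPM task, CXCL12VOLK task) from the same machine.
pairs = [
    (Task("1gzmR0-SDOERR_opm995-0-1-RND1802_0", 162_200.44, 237_804.00),
     Task("e6s24_e1s9p0f524-GERARD_CXCL12VOLK_15782120_2-0-1-RND1978_0",
          96_473.03, 233_875.00)),
    (Task("2w61-SDOERR_opm994-0-1-RND7728_0", 243_192.27, 262_409.00),
     Task("e4s9_e1s18p0f473-GERARD_CXCL12VOLK_15782120_2-0-1-RND7513_1",
          98_520.08, 233_875.00)),
]

# Ratio of OPM credit/hour to CXCL12VOLK credit/hour, per machine.
ratios = [opm.credit_per_hour / ref.credit_per_hour for opm, ref in pairs]

for (opm, ref), ratio in zip(pairs, ratios):
    print(f"{opm.name}: {opm.credit_per_hour:,.0f} credits/h vs "
          f"{ref.credit_per_hour:,.0f} credits/h ({ratio:.0%})")
```

Under those assumptions the two machines come out at roughly 60% and 45%, matching the range quoted in the post. The ratios are only meaningful within a pair, since the two machines may have different GPUs.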

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 43777 - Posted: 14 Jun 2016 \| 12:29:45 UTC Last modified: 14 Jun 2016 \| 13:59:24 UTC
	I thought you guys might appreciate seeing what can go wrong in a simulation ;) I always love these mistakes. Still, only 1 out of 600+ systems managed to break like this so I'm quite impressed. http://imgur.com/qcvaMyq Essentially because of some water between the protein and the lower membrane layer (whose upper side is hydrophobic, hence hates water), the membrane starts bending and when it bends it suddenly interacts with the periodic image* of the protein and decides that it likes it more than staying with the other membrane layer. And then it goes pop :D * MD simulations are typically done using periodic interactions
	ID: 43777 \| Rating: 0 \| rate: / Reply Quote
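For readers unfamiliar with the periodic images mentioned in the footnote above: in a periodic box every atom also "sees" shifted copies of the system, so two objects near opposite faces of the box are effectively close neighbours. A minimal sketch of the usual minimum-image distance, assuming a simple orthorhombic box (the coordinates and box size are made up for illustration and have nothing to do with the actual OPM systems):

```python
import math


def minimum_image_delta(a: float, b: float, box: float) -> float:
    """Shortest signed separation from a to b along one periodic axis."""
    d = (b - a) % box      # wrap the separation into [0, box)
    if d > box / 2:
        d -= box           # the nearer image lies in the negative direction
    return d


def periodic_distance(p, q, box) -> float:
    """Distance between p and the nearest periodic image of q."""
    return math.sqrt(sum(minimum_image_delta(a, b, length) ** 2
                         for a, b, length in zip(p, q, box)))


# Hypothetical positions: one point near the top of the box, one near the bottom.
box = (10.0, 10.0, 10.0)
protein = (5.0, 5.0, 9.0)
membrane = (5.0, 5.0, 1.0)

direct = math.dist(protein, membrane)                # ignores periodicity
wrapped = periodic_distance(protein, membrane, box)  # uses the nearest image
print(direct, wrapped)
```

In this toy case the direct distance is 8 but the distance to the nearest image is only 2, which is the kind of across-the-boundary contact that lets a bending membrane start interacting with the protein's periodic image.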

Vagelis Giannadakis Send message Joined: 5 May 13 Posts: 187 Credit: 349,254,454 RAC: 0 Level Scientific publications	Message 43778 - Posted: 14 Jun 2016 \| 14:25:28 UTC - in response to Message 43777.
	Cool animation, Stefan! Thanks for sharing! :) ____________
	ID: 43778 \| Rating: 0 \| rate: / Reply Quote

Author	Message
Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 43600 - Posted: 27 May 2016 \| 8:54:11 UTC
	Here we go again :) This time with 33% more credits + corrected runtimes which means an additional 2x credit for WUs which take more than 18 hours on a 780 and only WUs which take up to a max of 24 hours on a 780. I hope I don't seriously overshoot on credits this time but it's really a bit hit & miss.
	ID: 43600 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 0 Level Scientific publications	Message 43602 - Posted: 27 May 2016 \| 9:19:12 UTC - in response to Message 43600.
	Thanks Stefan! As there is plenty of workunits queued (7920 atm), and some of these are very long I suggest everyone to reduce their work cache to 0.03 days to maximize throughput & the credits earned.
	ID: 43602 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43604 - Posted: 27 May 2016 \| 13:51:39 UTC - in response to Message 43602. Last modified: 27 May 2016 \| 13:52:17 UTC
	Thanks Stefan! As there is plenty of workunits queued (7920 atm), and some of these are very long I suggest everyone to reduce their work cache to 0.03 days to maximize throughput & the credits earned. Good suggestion. Given the length of these tasks (extra-long or at least some of them), and so many being available, there is no point in people hoarding tasks - they will just miss bonus deadlines and get less credit. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43604 \| Rating: 0 \| rate: / Reply Quote

WPrion Send message Joined: 30 Apr 13 Posts: 96 Credit: 1,958,984,111 RAC: 19,320,070 Level Scientific publications	Message 43628 - Posted: 29 May 2016 \| 2:05:23 UTC - in response to Message 43602.
	Thanks Stefan! As there is plenty of workunits queued (7920 atm), and some of these are very long I suggest everyone to reduce their work cache to 0.03 days to maximize throughput & the credits earned. Are you referring to the setting: "Maintain enough work for an additional" I set mine to 0.03 several hours ago and updated my client. Yet it downloaded another WU shortly after one was finished just as the the running WU barely started. Is there something else to tweak? Thanks, Win ____________
	ID: 43628 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43632 - Posted: 29 May 2016 \| 13:14:27 UTC - in response to Message 43628. Last modified: 29 May 2016 \| 13:19:19 UTC
	Yes, in Boinc Manager (advanced view) under Options, Computing preference and the Computing tab you need to set two values: Store at least [0.02] days of work Store up to an additional [0.01] days of work If the combined values add up to anything less than 0.10 then the settings should work reasonably well. It's likely that the second value was something like 0.25 or 0.5 and that caused you to download additional work (a second task). ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43632 \| Rating: 0 \| rate: / Reply Quote

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 43633 - Posted: 29 May 2016 \| 13:30:23 UTC Last modified: 29 May 2016 \| 13:32:05 UTC
	Please note that really low buffer settings cause increased stress on project scheduler servers, for all projects you are attached to. I personally leave my buffers at something like "store at least 1 day, store up to 0.5 days more", since I don't care about the GPUGrid credit bonus, and short buffers don't really help GPUGrid throughput unless very few work units are available, and I don't want to add increased stress to my attached projects' scheduler servers.
	ID: 43633 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1617 Credit: 8,224,794,351 RAC: 16,571,330 Level Scientific publications	Message 43634 - Posted: 29 May 2016 \| 14:20:12 UTC - in response to Message 43633.
	... short buffers don't really help GPUGrid throughput ... Not necessarily true. I'm not speaking specifically about the OPM simulations here, but I think most GPUGrid work is run as a sort of relay race - you hold the baton for a short while, complete your lap of the track, and then hand it back in for somebody else to take over. If you sit at the side of the track for a day and a half before you even start running, that particular baton - series of linked tasks, each generated from the result of the previous lap - is permanently delayed, and the final results aren't available for the scientists to study until that much later.
	ID: 43634 \| Rating: 0 \| rate: / Reply Quote

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 43635 - Posted: 29 May 2016 \| 14:26:29 UTC - in response to Message 43634. Last modified: 29 May 2016 \| 14:28:33 UTC
	That had slipped my mind. But, if GPUGrid was having a problem getting the batons back for the next runners, and they wanted to ensure that the race kept running smoothly, they could tighten the deadlines on the relay chunks if need be. So, I'm just going to stick with the deadlines they give me, and not micro-manage BOINC, and not add stress to my attached projects' servers. I actually have GPUGrid set to 99999 resource share, and GPUs crunching 2-at-a-time, so ... :) When I get tasks from this project, they are usually firing on all cylinders, top priority.
	ID: 43635 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43639 - Posted: 29 May 2016 \| 19:25:46 UTC - in response to Message 43635. Last modified: 29 May 2016 \| 19:28:57 UTC
	Until the scheduler is re-written at a per device/device-specific level there will be issues with attaching to multiple projects (when using multiple devices). However, these have been addressed as far as reasonably feasible with the existing manager. Would add that many CPU projects have long tasks; some Einstein and WCG tasks for example take ~20h to complete, ClimatePrediction several days to weeks. If you have a low cache and are running a GPUGrid task(s) on your GPU(s) and WCG tasks on your CPU then you won't badger the server for new work until you are almost out of work which probably won't be very often (a few times per day, which isn't an issue). Granted there are/where some projects with very short run-times, but that does not mean it's better to have long a long queue/big cache of tasks. There are substantial issues with having hundreds/thousands of tasks in your queue too. For example, if you crunch for BU and your Internet goes down, all queued tasks will fail - not exactly great news for their server. My opinion for here - low cache good for the project and user/team credits, higher (but reasonably low) cache not as good for either but still good, Not Bad, and it's your choice. High cache (3+ days) bad news. The bonus system is designed to reflect this projects need for a quick return. It can't take into account what else you crunch. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43639 \| Rating: 0 \| rate: / Reply Quote

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 43640 - Posted: 29 May 2016 \| 19:53:28 UTC - in response to Message 43639. Last modified: 29 May 2016 \| 20:00:50 UTC
	It can't take into account what else you crunch. That's exactly the reason that you shouldn't make blanket suggestions on suggested cache settings that benefit GPUGrid most, without also specifying some of the drawbacks :) I digress. For my particular scenario, I have modified my cache settings a bit, in order to try to keep all my GPUs sustained at 2-GPUGrid-tasks-per-GPU without taking on additional work from other attached GPU projects. I'm using 0.9d+0.9d on the PC that has GTX970+GTX660Ti+GTX660Ti, and 0.5d+0.5d on the PC that has GTX980Ti+GTX980Ti. To each their own.
	ID: 43640 \| Rating: 0 \| rate: / Reply Quote

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43641 - Posted: 29 May 2016 \| 21:02:47 UTC
	For years many have asked for per project work buffer settings or at LEAST separate settings for GPUs and CPUs. All to no avail, while a lot of effort has been spent on less important (IMO) issues.
	ID: 43641 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43642 - Posted: 29 May 2016 \| 22:04:57 UTC - in response to Message 43640.
	It can't take into account what else you crunch. That's exactly the reason that you shouldn't make blanket suggestions on suggested cache settings that benefit GPUGrid most, without also specifying some of the drawbacks :) I digress. For my particular scenario, I have modified my cache settings a bit, in order to try to keep all my GPUs sustained at 2-GPUGrid-tasks-per-GPU without taking on additional work from other attached GPU projects. I'm using 0.9d+0.9d on the PC that has GTX970+GTX660Ti+GTX660Ti, and 0.5d+0.5d on the PC that has GTX980Ti+GTX980Ti. To each their own. My suggestions are predominantly for GPUGrid only and are typically optimisations for GPUGrid throughput and user/team credit. I don't make suggestions at GPUGrid to facilitate every conceivable combination of Boinc-wide project admix, nor could I - it can't be done. You have different views, values, opinions and objectives which you are quite entitled to express and implement for yourself and to your own ends. My advice is mostly aimed at new, novice or just GPUGrid-new crunchers or people with a specific problem to here. Usually they need a setup to facilitate crunching here and often changes just to make it work. Occasionally I digress too, to advise on an experience crunching elsewhere, or to pass on some observations or knowledge, but there is no catch all super setup for Boinc. I enjoy the fact that people crunch for a diversity of reasons with different setups and takes on crunching. Highlighting different circumstances and experiences adds to my knowledge and crunchers knowledge as a whole, but one shoe doesn't fit all and this is a GPUGrid forum not the Boinc central forum where generic advice might better be propagated. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43642 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43643 - Posted: 29 May 2016 \| 22:10:02 UTC - in response to Message 43641.
	For years many have asked for per project work buffer settings or at LEAST separate settings for GPUs and CPUs. All to no avail, while a lot of effort has been spent on less important (IMO) issues. I don't bother any more. IMO it is what it is and that's just about all it will ever be. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43643 \| Rating: 0 \| rate: / Reply Quote

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43644 - Posted: 29 May 2016 \| 23:04:59 UTC - in response to Message 43643.
	For years many have asked for per project work buffer settings or at LEAST separate settings for GPUs and CPUs. All to no avail, while a lot of effort has been spent on less important (IMO) issues. I don't bother any more. IMO it is what it is and that's just about all it will ever be. Gave up too. However it is supremely important to devise more ways for people to burn up their phones while doing nothing useful.
	ID: 43644 \| Rating: 0 \| rate: / Reply Quote

klepel Send message Joined: 23 Dec 09 Posts: 189 Credit: 4,636,206,793 RAC: 2,304,655 Level Scientific publications	Message 43654 - Posted: 30 May 2016 \| 14:21:55 UTC
	Stefan, Two of my computers have received SDOERR_opm995 tasks which are processed by an other computer at the same time. They have been send more or less at the same time. https://www.gpugrid.net/workunit.php?wuid=11614785 https://www.gpugrid.net/workunit.php?wuid=11614829 Is this by your intention as these SDOERR WUs had been so error prone or is it a fault of the scheduler? Please advise as fast as possible so I might kill them as soon as possible. I do not like to make double work if it is not required.
	ID: 43654 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43655 - Posted: 30 May 2016 \| 15:17:12 UTC - in response to Message 43654.
	initial replication 2 https://www.gpugrid.net/workunit.php?wuid=11614785 That means two tasks are sent out, by design. One of the OPM995's I'm running also has an initial replication of 2: https://www.gpugrid.net/workunit.php?wuid=11614838 ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43655 \| Rating: 0 \| rate: / Reply Quote

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 43656 - Posted: 30 May 2016 \| 15:25:56 UTC - in response to Message 43655.
	Perhaps the question is: Why was it set up with initial replication set to 2?
	ID: 43656 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43659 - Posted: 30 May 2016 \| 20:48:32 UTC - in response to Message 43656. Last modified: 30 May 2016 \| 22:10:00 UTC
	Probably validation; any proof of concept experiment to demonstrate ability needs to contain appropriate verification for it to be accepted as a model/framework for performing experiments. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43659 \| Rating: 0 \| rate: / Reply Quote

WPrion Send message Joined: 30 Apr 13 Posts: 96 Credit: 1,958,984,111 RAC: 19,320,070 Level Scientific publications	Message 43661 - Posted: 31 May 2016 \| 0:56:38 UTC - in response to Message 43632.
	Thanks!
	ID: 43661 \| Rating: 0 \| rate: / Reply Quote

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 43662 - Posted: 31 May 2016 \| 1:14:43 UTC - in response to Message 43659.
	Hmm... validation deals with quorum though, and also, I thought the way these GPUGrid tasks worked was that the results couldn't really be validated against each other. I might be mistaken though.
	ID: 43662 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43663 - Posted: 31 May 2016 \| 7:11:54 UTC - in response to Message 43662.
	Wasn't thinking about task validation in the Boinc sense but rather validation of the experimental procedure - does it hold any weight? If we consider an experiment as a batch of work, validation of the experiment (and procedures) in scientific terms usually requires that the whole experiment be replicated, and perhaps many times before the results/methods are accepted. Of course Stefan might be doing this for different reasons. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43663 \| Rating: 0 \| rate: / Reply Quote

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 43666 - Posted: 31 May 2016 \| 12:52:59 UTC - in response to Message 43663.
	I see what you mean now. I hope he has another reason.
	ID: 43666 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43671 - Posted: 31 May 2016 \| 16:40:16 UTC Last modified: 31 May 2016 \| 16:40:32 UTC
	GTX970 on W10 24h and 41min with a bit of upload time too (118MB). http://www.gpugrid.net/result.php?resultid=15125538 Run time 88,881.18 CPU time 88,253.09 Validate state Valid Credit 788,690.00 I expect if a system was setup a bit better it could complete within 24h but I've a second GPU, the room's been 24C to 28C, I'm using the CPU quite a bit and my system is set to drop the clocks to keep the temperature down. This GPU was clocked at ~1300MHz, the second has dropped down to 1088. GDDR5 is @7GHz. Haven't managed to get an OPM on my Linux system yet. The point of installing Ubuntu 16.04 was to see if I could setup a GTX970 system to return these long OPM's inside 24h! ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43671 \| Rating: 0 \| rate: / Reply Quote

Bedrich Hajek Send message Joined: 28 Mar 09 Posts: 485 Credit: 10,549,623,466 RAC: 15,250,943 Level Scientific publications	Message 43678 - Posted: 1 Jun 2016 \| 1:53:30 UTC
	I was fortunate enough to get and complete successfully 2 of these units: 5f1c-SDOERR_opm995-0-1-RND8074_2 11614800 30 May 2016 \| 13:52:39 UTC 31 May 2016 \| 6:23:14 UTC Completed and validated 56,458.02 56,161.20 940,443.00 Long runs (8-12 hours on fastest card) v8.48 (cuda65) # Time per step (avg over 5000000 steps): 11.257 ms # Approximate elapsed time for entire WU: 56284.859 s # PERFORMANCE: 157144 Natoms 11.257 ns/day 0.000 ms/step 0.000 us/step/atom 02:17:56 (7792): called boinc_finish http://www.gpugrid.net/result.php?resultid=15124495 3jw8R0-SDOERR_opm995-0-1-RND9612_2 11614181 30 May 2016 \| 8:49:32 UTC 31 May 2016 \| 0:50:29 UTC Completed and validated 55,859.07 55,499.59 956,403.00 Long runs (8-12 hours on fastest card) v8.48 (cuda65) # Time per step (avg over 10000000 steps): 5.578 ms # Approximate elapsed time for entire WU: 55780.416 s # PERFORMANCE: 79913 Natoms 5.578 ns/day 0.000 ms/step 0.000 us/step/atom 20:45:10 (7740): called boinc_finish http://www.gpugrid.net/result.php?resultid=15124201 With the 5f1c-SDOERR_opm995-0-1-RND8074_2, my windows 10 computer was able to achieve a 87% maximum GPU usage, while using 1950 MB of memory. While the 3jw8R0-SDOERR_opm995-0-1-RND9612_2, on the same computer, achieved 80% maximum GPU usage, while using 1100 MB of memory. I can't wait to get a few more of these!
	ID: 43678 \| Rating: 0 \| rate: / Reply Quote

_Ryle_ Send message Joined: 7 Jun 09 Posts: 24 Credit: 1,149,643,416 RAC: 363,138 Level Scientific publications	Message 43681 - Posted: 1 Jun 2016 \| 14:38:34 UTC
	Is it so, that when the new students arrive, that you would consider creating more short tasks? I think it is a pity, that you mostly cater to the very highend cards here. I'd like to continue supporting this project, but as it is I just can't afford to buy the faster cards. I do own a 970, and it is still a fast card. I would just hate to see it go over that 24H limit in the near future. I understand it is eventually inevitable, but it's barely a year old. Sadly, the highend cards also crunch the short units, when the long unit pool is dry, so they quickly eat up the short pool too. A WU tier would be nice however. I think it's been suggested somewhere else before, in these forums, that you could make a short, medium and long unit pool. That would be cool, so the small cards have the short pool, the cards a bit faster have the medium pool, and finally the highend can get into the top tier, long pool. Still, it was so in the past, that the short units also gave less points per day overall, even if same time is used on same card, but I don't know what the reason is for that. (Maybe the bonus isn't added to those?). Well, just my 2 cents worth of opinion :)
	ID: 43681 \| Rating: 0 \| rate: / Reply Quote

John C MacAlister Send message Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0 Level Scientific publications	Message 43683 - Posted: 2 Jun 2016 \| 10:40:59 UTC
	Agreed: pity there are so few shorts..... My 650 Tis are too slow and the 660Tis looking pretty slow compared to many others. I can't afford newer cards and now with electricity costing me 18 cents (Canadian) per kWh, my contribution to GPUGrid will be very low. :(
	ID: 43683 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43685 - Posted: 2 Jun 2016 \| 16:05:42 UTC Last modified: 2 Jun 2016 \| 16:19:22 UTC
	Before I received 2m59_SDOERR_opm994 (short WU) - Three prior hosts (GT640 / GTX950 / GTX970 r361&r364 driver) produced outcome -55 exit code (0xffffffffffffffc9) Unknown error zero runtime's. GTX970 (2m59 WU) compute 6.45hr estimated runtime (15.480% per 1hr). 2m59 WU status: 11-14% CPU usage (3.2GHz) / 54% GPU usage (1511MHz) / 24% MCU (7200MHz) / 25% BUS (PCIe3.0 x4) / GPU temp 39C / 33% GPU power (108W) / 550MB memory usage (no display connected) Topology reports 27558 atoms 4344 waters in system Thank you Zoltan for sharing helpful tip (in previous OPM thread) on where to locate a WU's atom amount file.
	ID: 43685 \| Rating: 0 \| rate: / Reply Quote

Bedrich Hajek Send message Joined: 28 Mar 09 Posts: 485 Credit: 10,549,623,466 RAC: 15,250,943 Level Scientific publications	Message 43695 - Posted: 3 Jun 2016 \| 5:47:27 UTC Last modified: 3 Jun 2016 \| 5:48:04 UTC
	I had one of these WUs fail with this error message: upload failure: <file_xfer_error> <file_name>4mt6-SDOERR_opm994-0-1-RND0442_0_11</file_name> <error_code>-131 (file size too big)</error_code> </file_xfer_error> http://www.gpugrid.net/result.php?resultid=15127701 Has this happened to anyone else with these WUs? I remember this happened in the past, and there is a fix to this posted, in the threads somewhere, but I can't remember where. I think this WU would have been otherwise good.
	ID: 43695 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 0 Level Scientific publications	Message 43696 - Posted: 3 Jun 2016 \| 7:35:49 UTC - in response to Message 43695.
	I had one of these WUs fail with this error message: upload failure: <file_xfer_error> <file_name>4mt6-SDOERR_opm994-0-1-RND0442_0_11</file_name> <error_code>-131 (file size too big)</error_code> </file_xfer_error> http://www.gpugrid.net/result.php?resultid=15127701 Has this happened to anyone else with these WUs? I remember this happened in the past, and there is a fix to this posted, in the threads somewhere, but I can't remember where. I think this WU would have been otherwise good. See the WARNING/CHALLENGE: VERY LONG WU (VERYLONG_CXCL12_confAna) thread. It's embarrassing that we've run into this again.
	ID: 43696 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1617 Credit: 8,224,794,351 RAC: 16,571,330 Level Scientific publications	Message 43697 - Posted: 3 Jun 2016 \| 8:02:15 UTC
	I've got 2d57-SDOERR_opm994-0-1-RND4399_1 running. The file description in client_state.xml is <file> <name>2d57-SDOERR_opm994-0-1-RND4399_1_11</name> <nbytes>0.000000</nbytes> <max_nbytes>5000000.000000</max_nbytes> <status>0</status> <upload_url>http://www.gpugrid.org/PS3GRID_cgi/file_upload_handler</upload_url> </file> - so the maximum size allowed is 5,000,000 bytes. So far, it's reached 852 KB at about 80% progress - which sounds like plenty of headroom, and perhaps not a widespread problem. But I'll keep an eye on it as it approaches completion.
	ID: 43697 \| Rating: 0 \| rate: / Reply Quote
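
The headroom estimate in the post above can be sketched in a few lines. This is only an illustration: it assumes the output file grows roughly in proportion to task progress (the thread does not confirm this), and it treats 1 KB as 1024 bytes. The function names are made up for the example; only the 5,000,000-byte limit and the 852 KB at 80% figures come from the post.

```python
# Limit taken from the <max_nbytes> value quoted in the post above.
MAX_NBYTES = 5_000_000


def projected_final_size(current_bytes: float, progress: float) -> float:
    """Linearly extrapolate the final file size from the current size.

    Assumption: the file grows in proportion to progress (0 < progress <= 1).
    """
    if not 0 < progress <= 1:
        raise ValueError("progress must be in (0, 1]")
    return current_bytes / progress


def headroom(current_bytes: float, progress: float, limit: float = MAX_NBYTES) -> float:
    """Bytes left under the upload limit at the projected final size."""
    return limit - projected_final_size(current_bytes, progress)


if __name__ == "__main__":
    # 852 KB at about 80% progress, as reported in the post.
    size = projected_final_size(852 * 1024, 0.80)
    print(f"projected final size: {size:,.0f} bytes")
    print(f"fits under limit: {size < MAX_NBYTES}")
```

If the growth is not linear (for example, if a large block is written only at the end), this estimate would be too optimistic, which is why watching the file near completion is still worthwhile.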

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 43698 - Posted: 3 Jun 2016 \| 8:09:50 UTC
	I apologize for not answering in a while; I have been a bit busy writing my thesis. Job replication 2 was my desperate attempt to get my results back faster while also competing with the mass of simulations sent out by Gerard and reducing my failure rates a bit. I hope you don't mind too much, since there were only around 300 WUs. If they arrive on the same host, of course, it's quite pointless. On the subject of short runs, I am unfortunately unable to help you because the equilibration runs cannot be split into smaller chunks. But as Gianni mentioned we are getting new students soon so it is possible that they have something for short.
	ID: 43698 \| Rating: 0 \| rate: / Reply Quote

John C MacAlister Send message Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0 Level Scientific publications	Message 43699 - Posted: 3 Jun 2016 \| 10:08:34 UTC
	Hi, Stefan: Thank you for this- On the subject of short runs, I am unfortunately unable to help you because the equilibration runs cannot be split into smaller chunks. But as Gianni mentioned we are getting new students soon so it is possible that they have something for short.
	ID: 43699 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1617 Credit: 8,224,794,351 RAC: 16,571,330 Level Scientific publications	Message 43700 - Posted: 3 Jun 2016 \| 10:46:15 UTC - in response to Message 43697.
	2d57-SDOERR_opm994-0-1-RND4399_1 uploaded cleanly, so it's not a universal problem. 4azpR0-SDOERR_opm995-0-1-RND6483_1 might get closer to the limit - I'll keep an eye on it.
	ID: 43700 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43703 - Posted: 3 Jun 2016 \| 17:18:48 UTC - in response to Message 43685.
	Before I received 2m59_SDOERR_opm994 (short WU) - Three prior hosts (GT640 / GTX950 / GTX970 r361&r364 driver) produced outcome -55 exit code (0xffffffffffffffc9) Unknown error zero runtime's. GTX970 (2m59 WU) compute 6.45hr estimated runtime (15.480% per 1hr). 2m59 WU status: 11-14% CPU usage (3.2GHz) / 54% GPU usage (1511MHz) / 24% MCU (7200MHz) / 25% BUS (PCIe3.0 x4) / GPU temp 39C / 33% GPU power (108W) / 550MB memory usage (no display connected) Topology reports 27558 atoms 4344 waters in system Thank you Zoltan for sharing helpful tip (in previous OPM thread) on where to locate a WU's atom amount file. WUid=11616186 (1a0r OPM994) crashed my system multiple times - this WU had 100% GPU usage / 1% MCU / 20% power (65W) before the (first ever driver reset(s) I've encountered computing ACEMD in three years.) The (1a0r) WU ended with a -97 (0xffffffffffffff9f) Unknown error number after 102sec at reference stock clock once I noticed the first couple of driver recoveries OCed. (FATAL : Cuda driver error 719 in file 'swanlibnv2.cpp' in line 1965) A few other stable wingman (980ti / (2) 970's) high-end RAC systems (6 total) have error(s) (<100sec) with (1a0r) WU. As of now (2) OPM995 are without issue on my 970's at very high OC's: (WUid=11614432) 4a6fRO (50479 atoms with 9411 waters in system) 20.25hr estimated runtime at 12-15% CPU usage (3.2GHz) / 63% GPU usage (1511MHz) / 31% MCU (7200MHz) / 27% BUS (PCIe3.0 x4) / 34% power (110W) / 42C core / 820MB memory usage (WUid=116143650 4u15RO (51270 atoms with 8255 waters in system) 20.5hr estimated runtime at 12-15% CPU usage (3.2GHz) / 65% GPU usage (1511MHz) / 34% MCU (7010MHz) / 22% BUS (PCIe3.0 x8) / 60% power (120W) / 45C core / 843MB memory usage
	ID: 43703 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43704 - Posted: 3 Jun 2016 \| 20:41:09 UTC
	1s4wR0-SDOERR_opm995-0-1-RND5214_0 11614436 3 Jun 2016 \| 6:47:02 UTC 3 Jun 2016 \| 20:01:33 UTC Completed and validated 45,293.51 20,015.48 147,829.50 Finally got an OPM on my Ubuntu 16.04 rig. Alas it didn't turn out to be an extra-long run and completed in 12h 35min at stock. Based on the run time of other long WU's the credit is about half what it should be. Was hoping to get an extra-long task and to finish inside 24h - c'est la vie... ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43704 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1617 Credit: 8,224,794,351 RAC: 16,571,330 Level Scientific publications	Message 43705 - Posted: 3 Jun 2016 \| 21:56:01 UTC - in response to Message 43700.
	4azpR0-SDOERR_opm995-0-1-RND6483_1 looks safe as well - 1,283 KB at 61%. # Topology reports 50432 atoms
	ID: 43705 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43706 - Posted: 3 Jun 2016 \| 22:20:36 UTC
	Too many errors (may have bug) 1a0r-SDOERR_opm994-0-1-RND9594 https://www.gpugrid.net/workunit.php?wuid=11616186
	ID: 43706 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43713 - Posted: 4 Jun 2016 \| 14:51:49 UTC - in response to Message 43703.
	(2) new OPM995 that should make the maximum size file_xfer allowed 5,000,000 bytes: 3nce WU#11614771 (126091 atoms with 25796 waters) status: 20hr estimated runtime at 12-16% CPU usage (3.2GHz) / 76% GPU usage (1511MHz) / 40% MCU (7200MHz) / 33% BUS (PCIe3.0 x4) / 40% power (130W) / 44C temp / 1559MB memory usage 2b6p WU#11614758 (129818 atoms with 23308 waters) status: 21hr estimated runtime at 12-16% CPU usage (3.2GHZ) / 75% GPU usage (1511MHz) / 45% MCU (7010MHz) / 24% BUS (PCIe3.0 x8) / 70% power (140W) / 47C temp / 1662MB memory usage Before I received 2m59_SDOERR_opm994 (short WU) - Three prior hosts (GT640 / GTX950 / GTX970 r361&r364 driver) produced outcome -55 exit code (0xffffffffffffffc9) Unknown error zero runtime's. GTX970 (2m59 WU) compute 6.45hr estimated runtime (15.480% per 1hr). 2m59 WU status: 11-14% CPU usage (3.2GHz) / 54% GPU usage (1511MHz) / 24% MCU (7200MHz) / 25% BUS (PCIe3.0 x4) / GPU temp 39C / 33% GPU power (108W) / 550MB memory usage (no display connected) Topology reports 27558 atoms 4344 waters in system Thank you Zoltan for sharing helpful tip (in previous OPM thread) on where to locate a WU's atom amount file. WUid=11616186 (1a0r OPM994) crashed my system multiple times - this WU had 100% GPU usage / 1% MCU / 20% power (65W) before the (first ever driver reset(s) I've encountered computing ACEMD in three years.) The (1a0r) WU ended with a -97 (0xffffffffffffff9f) Unknown error number after 102sec at reference stock clock once I noticed the first couple of driver recoveries OCed. (FATAL : Cuda driver error 719 in file 'swanlibnv2.cpp' in line 1965) A few other stable wingman (980ti / (2) 970's) high-end RAC systems (6 total) have error(s) (<100sec) with (1a0r) WU. 
	Too many errors (may have bug) 1a0r-SDOERR_opm994-0-1-RND9594 As of now (2) OPM995 are without issue on my 970's at very high OC's: (WUid=11614432) 4a6fRO (50479 atoms with 9411 waters in system) 20.25hr estimated runtime at 12-15% CPU usage (3.2GHz) / 63% GPU usage (1511MHz) / 31% MCU (7200MHz) / 27% BUS (PCIe3.0 x4) / 34% power (110W) / 42C core / 820MB memory usage (WUid=116143650 4u15RO (51270 atoms with 8255 waters in system) 20.5hr estimated runtime at 12-15% CPU usage (3.2GHz) / 65% GPU usage (1511MHz) / 34% MCU (7010MHz) / 22% BUS (PCIe3.0 x8) / 60% power (120W) / 45C core / 843MB memory usage
	ID: 43713 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43724 - Posted: 5 Jun 2016 \| 14:38:34 UTC
	Any TX/980ti/980/970 (Present batch) SDOERR_opm99 grant 1,000,000 credit? My -+ (runtime) Credit: 23,912.30 GPU / 11,332.23 CPU / 41,296.50 credits (27588 atoms) / 5mil step 74,154.80 / 16,389.80 / 377,254.50 credits (126091 atoms) / 5mil step An odd short run 5mil step (~27k atoms) WU cropped up. 0 unsent 271 in progress 1155 success 47.62% error rate
	ID: 43724 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 0 Level Scientific publications	Message 43725 - Posted: 5 Jun 2016 \| 15:54:02 UTC - in response to Message 43724. Last modified: 5 Jun 2016 \| 15:57:25 UTC
	Any TX/980ti/980/970 (Present batch) SDOERR_opm99 grant 1,000,000 credit? 4by0-SDOERR_opm994-0-1-RND5591_1 58,472 s (16h 14m 26s) 1,023,036 credits 170941 atoms 11.696 ns/day 5M steps This workunit is very interesting, as the initial replication was 2, the other host which received this workunit also received the +50% bonus, while it has returned it after 1d 14h.
	ID: 43725 \| Rating: 0 \| rate: / Reply Quote

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43727 - Posted: 5 Jun 2016 \| 20:11:34 UTC - in response to Message 43725.
	This workunit is very interesting, as the initial replication was 2, the other host which received this workunit also received the +50% bonus, while it has returned it after 1d 14h. AFAIK that's the way it's always worked here. The first reported WU sets the credit for everyone.
	ID: 43727 \| Rating: 0 \| rate: / Reply Quote

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43728 - Posted: 5 Jun 2016 \| 20:15:08 UTC - in response to Message 43704.
	Finally got an OPM on my Ubuntu 16.04 rig. Alas it didn't turn out to be an extra-long run and completed in 12h 35min at stock. Based on the run time of other long WU's the credit is about half what it should be. Had 4 OPMs finish today. The credit on all of them is 1/2 or less per hour compared to any other long WUs. Guess the credit wasn't fixed after all.
	ID: 43728 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43729 - Posted: 5 Jun 2016 \| 20:30:58 UTC - in response to Message 43728. Last modified: 5 Jun 2016 \| 20:36:10 UTC
	Got 2 real extra-long tasks on my Win10 system and one 'fake' extra-long task on my Linux system. The real extra-long tasks got 900K Boinc credits whereas the normal-long task only received 147K credits (or there about). ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43729 \| Rating: 0 \| rate: / Reply Quote

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43730 - Posted: 5 Jun 2016 \| 20:33:19 UTC - in response to Message 43729. Last modified: 5 Jun 2016 \| 20:34:08 UTC
	I got 2 really long tasks on my Win10 system and one fake long task on my Linux system. The real long tasks got \|900K Boinc credits whereas the not-really-long task (normal-ling) only received 147K credits (or there about). Remedial math is a good post graduate course... ;-)
	ID: 43730 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43731 - Posted: 5 Jun 2016 \| 20:37:01 UTC - in response to Message 43730. Last modified: 5 Jun 2016 \| 20:46:09 UTC
	Just after correcting my remedial English :) PS. Looks like it's backup-project time again 🕒 ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43731 \| Rating: 0 \| rate: / Reply Quote

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43733 - Posted: 5 Jun 2016 \| 20:59:45 UTC - in response to Message 43731.
	Just after correcting my remedial English :) PS. Looks like it's backup-project time again 🕒 I don't think it's you that needs the remedial math, and yep it's that time again.
	ID: 43733 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 43745 - Posted: 7 Jun 2016 \| 10:04:03 UTC
	Really, I am out of ideas on how to fix the credits any further. I did everything I could imagine being wrong. I could blindly multiply the credits by whatever factor you guys tell me, but right now I have to base it off our usual credit calculation script.
	ID: 43745 \| Rating: 0 \| rate: / Reply Quote

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 43749 - Posted: 8 Jun 2016 \| 18:36:19 UTC - in response to Message 43745.
	Really, I am out of ideas on how to fix the credits any further. I did everything I could imagine being wrong. I could blindly multiply the credits by whatever factor you guys tell me, but right now I have to base it off our usual credit calculation script. Recent comparisons, OPM vs. CXCL12VOLK. Example from one of my machines: 1gzmR0-SDOERR_opm995-0-1-RND1802_0 11614349 3 Jun 2016 \| 6:36:43 UTC 5 Jun 2016 \| 9:02:02 UTC Completed and validated 162,200.44 47,231.66 237,804.00 e6s24_e1s9p0f524-GERARD_CXCL12VOLK_15782120_2-0-1-RND1978_0 11613059 28 May 2016 \| 21:23:31 UTC 30 May 2016 \| 4:49:33 UTC Completed and validated 96,473.03 31,352.66 233,875.00 Here's another one of my computers. This WU had 131548 Natoms: 2w61-SDOERR_opm994-0-1-RND7728_0 11616211 2 Jun 2016 \| 17:30:20 UTC 5 Jun 2016 \| 18:09:42 UTC Completed and validated 243,192.27 35,757.08 262,409.00 e4s9_e1s18p0f473-GERARD_CXCL12VOLK_15782120_2-0-1-RND7513_1 11609049 1 Jun 2016 \| 14:32:18 UTC 2 Jun 2016 \| 22:47:34 UTC Completed and validated 98,520.08 29,575.68 233,875.00 From the OPM WUs I've been running lately it seems that the credit is about 45% - 60% per hour compared to other/previous long WUs. On top of that there is a greater chance of failure with these long WUs. I would suggest erring on the high side rather than the low side when estimating credit as it costs you nothing and it's one of the few tokens of appreciation that we receive for our small contribution to the great science that you guys are doing. Whining aside, keep up the excellent work. For a lot of us this is a small way that we can contribute to science.
	ID: 43749 \| Rating: 0 \| rate: / Reply Quote
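
The "45% - 60% per hour" figure in the post above can be reproduced from the quoted task rows. The sketch below assumes the three trailing numbers in each row are run time in seconds, CPU time in seconds, and granted credit, in that order (the usual column order on the tasks page, but not stated in the post). The function name is illustrative only.

```python
def credit_per_hour(run_time_s: float, credit: float) -> float:
    """Granted credit divided by wall-clock run time in hours."""
    return credit / (run_time_s / 3600.0)


# (OPM run time s, OPM credit, CXCL12VOLK run time s, CXCL12VOLK credit),
# copied from the two machines quoted in the post.
pairs = [
    (162_200.44, 237_804.00, 96_473.03, 233_875.00),
    (243_192.27, 262_409.00, 98_520.08, 233_875.00),
]

ratios = []
for opm_time, opm_credit, ref_time, ref_credit in pairs:
    opm_rate = credit_per_hour(opm_time, opm_credit)
    ref_rate = credit_per_hour(ref_time, ref_credit)
    ratios.append(opm_rate / ref_rate)
    print(f"OPM {opm_rate:,.0f}/h vs CXCL12VOLK {ref_rate:,.0f}/h "
          f"-> {opm_rate / ref_rate:.0%}")
```

On these two hosts the OPM tasks come out at roughly 60% and 45% of the CXCL12VOLK credit rate, which matches the range given in the post. Note that this compares run time on the same host only; it says nothing about how the project's own credit script weighs the work.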

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 43777 - Posted: 14 Jun 2016 \| 12:29:45 UTC Last modified: 14 Jun 2016 \| 13:59:24 UTC
	I thought you guys might appreciate seeing what can go wrong in a simulation ;) I always love these mistakes. Still, only 1 out of 600+ systems managed to break like this so I'm quite impressed. http://imgur.com/qcvaMyq Essentially because of some water between the protein and the lower membrane layer (whose upper side is hydrophobic, hence hates water), the membrane starts bending and when it bends it suddenly interacts with the periodic image* of the protein and decides that it likes it more than staying with the other membrane layer. And then it goes pop :D * MD simulations are typically done using periodic interactions
	ID: 43777 \| Rating: 0 \| rate: / Reply Quote
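
For readers unfamiliar with the "periodic image" mentioned in the footnote above: with periodic boundaries, each atom effectively interacts with the nearest copy of every other atom, including copies in neighbouring boxes. A minimal sketch of that minimum-image idea follows, assuming a simple rectangular box. It is a generic illustration, not ACEMD's actual implementation, and the function names and box size are invented for the example.

```python
import math


def minimum_image(delta: float, box_length: float) -> float:
    """Wrap one coordinate difference to the nearest periodic image."""
    return delta - box_length * round(delta / box_length)


def periodic_distance(a, b, box):
    """Distance between points a and b in a rectangular periodic box."""
    return math.sqrt(sum(
        minimum_image(bi - ai, length) ** 2
        for ai, bi, length in zip(a, b, box)
    ))


if __name__ == "__main__":
    box = (10.0, 10.0, 10.0)  # hypothetical box edge lengths
    # Two points near opposite faces of the box are close neighbours
    # once the periodic image is taken into account.
    print(periodic_distance((1.0, 1.0, 1.0), (9.0, 1.0, 1.0), box))
```

This is why a membrane that bends far enough can "see" the copy of the protein from the neighbouring box: across the boundary, that copy may be closer than anything in its own box.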

Vagelis Giannadakis Send message Joined: 5 May 13 Posts: 187 Credit: 349,254,454 RAC: 0 Level Scientific publications	Message 43778 - Posted: 14 Jun 2016 \| 14:25:28 UTC - in response to Message 43777.
	Cool animation, Stefan! Thanks for sharing! :) ____________
	ID: 43778 \| Rating: 0 \| rate: / Reply Quote