Message boards : News : More CPU jobs
...with a new and improved application (Linux only). The current version should eliminate dependencies on gcc and devel libraries. | |
ID: 49769 | Rating: 0 | rate: / Reply Quote | |
By the way, the new app downloads updated libraries. Feel free to reset the project to free up disk space taken by the old ones. | |
ID: 49770 | Rating: 0 | rate: / Reply Quote | |
why are these CPU jobs for Linux only, and not for Windows, too? | |
ID: 49771 | Rating: 0 | rate: / Reply Quote | |
Because they can make the app work under Linux but are not successful yet in creating a Windows app that works. | |
ID: 49773 | Rating: 0 | rate: / Reply Quote | |
Because they can make the app work under Linux but are not successful yet in creating a Windows app that works. Hm, this makes me wonder why it is so much more difficult to create an app for Windows than for Linux... Further, an easy way to solve this would be to have the Linux app run in a virtual machine (like, for example, LHC is doing for some of its sub-projects).
ID: 49774 | Rating: 0 | rate: / Reply Quote | |
Making boinc apps is like building a ship in a bottle, in the sense that your tools are very limited and you don't control the environment. In the case of windows the bottle is dark. ;) | |
ID: 49776 | Rating: 0 | rate: / Reply Quote | |
Erich56 said: further, an easy way to solve this would be to have the Linux app run in a Virtual Machine Erich56, if you want to run the Linux app in a Virtual Machine, you can create your own virtual machine, install Linux and BOINC, then run the QC tasks from there. That is what I have done on my Windows machines and it works fine. | |
ID: 49780 | Rating: 0 | rate: / Reply Quote | |
I have been running CERN LHC@home Virtual Machines for more than ten years, and I have been rewarded with a CERN Polo Shirt. But yes, they do present some problems. Now your CPU tasks seem to run fine on my old SUN Workstation with SuSE Leap 42.3 Linux. | |
ID: 49781 | Rating: 0 | rate: / Reply Quote | |
That is what I have done on my Windows machines and it works fine. +1 VirtualBox on my Win10. But I think it's not the best solution for performance...
ID: 49783 | Rating: 0 | rate: / Reply Quote | |
As far as I know virtualization is almost native speed these days, especially for computing. | |
ID: 49784 | Rating: 0 | rate: / Reply Quote | |
The recent batch of CPU WUs seems to be done. Will there be more soon? | |
ID: 49785 | Rating: 0 | rate: / Reply Quote | |
Yes, I am making some now. I'll try to submit new ones today | |
ID: 49786 | Rating: 0 | rate: / Reply Quote | |
Sorry, sorry, sorry I messed up due to a small mistake. Had to nuke the WUs. Redoing them now. | |
ID: 49787 | Rating: 0 | rate: / Reply Quote | |
No issue at all. I'm glad the team communicates openly. | |
ID: 49788 | Rating: 0 | rate: / Reply Quote | |
All of my Stefan CPU WUs are stuck at 10% and I aborted them after about 4 hours. This is the machine (16.04 LTS) that has never had any issues with pretty much any of the WUs. | |
ID: 49789 | Rating: 0 | rate: / Reply Quote | |
Holy cow the website is SOO SLOW. I had to use a proxy in Sweden to just get anything to load. I can't even get tasks even though the site says there are plenty. | |
ID: 49790 | Rating: 0 | rate: / Reply Quote | |
GPUGRID is taking 3.34 GB of disk space on my main Linux host, 3.90 GB on a Linux laptop. On the same laptop LHC@home is taking 5.75 GB.
ID: 49791 | Rating: 0 | rate: / Reply Quote | |
Yes, this is the stuff I re-sent to the beta queue, I guess. They are much larger molecules, so they were crashing on the QM queue because they ran out of scratch space. I have seen them use up to 18 GB of scratch space, so at the moment I don't know yet how to run these on GPUGRID, as it seems to be an issue for many users.
ID: 49792 | Rating: 0 | rate: / Reply Quote | |
http://gpugrid.net/results.php?hostid=470907 | |
ID: 49793 | Rating: 0 | rate: / Reply Quote | |
Must we update conda? | |
ID: 49794 | Rating: 0 | rate: / Reply Quote | |
They are much larger molecules so they were crashing on the QM queue cause they ran out of scratch space. I have seen them use up to 18GB scratch space so at the moment I don't know yet how to run these on GPUGRID as it seems to be an issue with many users. The most recent Betas have worked OK for me. But I have 32 GB memory, which may help. http://www.gpugrid.net/results.php?hostid=334241&offset=0&show_names=0&state=0&appid=35 You could set up a special sub-project for the large molecules if you want to. | |
ID: 49795 | Rating: 0 | rate: / Reply Quote | |
We are using QC beta to test large molecules and how much disk space they take. I think they can (temporarily of course) go up to 20 GB of space (!). I am not sure about RAM - they should be < 4 GB.
ID: 49796 | Rating: 0 | rate: / Reply Quote | |
We are using QC beta to test large molecules and how much disk space they take. I think they can (temporarily of course) go up to 20 GB of space (!). I am not sure about RAM - they should be < 4 GB. Well... So my Threadripper could need up to 160 GB of space?! It has just 32 GB...
ID: 49797 | Rating: 0 | rate: / Reply Quote | |
We are talking about DISK space. Only a few WUs will be that big - unless we make a "big" queue. T
ID: 49798 | Rating: 0 | rate: / Reply Quote | |
As far as I know virtualization is almost native speed these days, especially for computing. Yes, if you are using "hard" virtualization like ESX or Hyper-V. "Soft" virtualization like VirtualBox or VMware Player may suffer bottlenecks.
ID: 49799 | Rating: 0 | rate: / Reply Quote | |
We are using QC beta to test large molecules and how much disk space they take. Toni, if I may ask you, what molecule size are we (roughly) talking about? As you know, because of my son I have a personal interest in HCF1 research, and I would like to get a feel for how far science still is from handling such large molecules. Thanks in advance, and my apologies for coming up with my personal issues once in a while. ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
ID: 49800 | Rating: 0 | rate: / Reply Quote | |
I have a degree in Theoretical physics obtained in 1967, but that was related to elementary particle physics. Then in the Nineties, while at Trieste Area Science Park as manager of a UNIX BULL Laboratory I attended a few lectures in the UN Center for Genetic Engineering and Biotechnology on the Density Functional Theory. Since retirement, I have run a few BOINC projects including one on Monte Carlo Method applied to Quantum Chemistry but it no longer exists. This is the first time I am running a project which uses Neural Networks. | |
ID: 49801 | Rating: 0 | rate: / Reply Quote | |
I'm also talking about disk space. I'm using a 32 GB Optane module as my boot drive. It's time to change it for something bigger.
ID: 49802 | Rating: 0 | rate: / Reply Quote | |
@kain: the disk space is used in the directory BOINC is running from. Usually (if you use the distribution installers) it is indeed on the disk holding the root of the file system.
ID: 49804 | Rating: 0 | rate: / Reply Quote | |
I had to enlarge the root partition to accommodate the QC beta 3.31, since BOINC from the Fedora distro by default installs in /var/lib/boinc and runs as a daemon under systemctl. After that, the WUs seemed to run fine, but they sure ate up a lot of RAM. Both my 8-core machines have 16 GB RAM and I was running them 2 concurrent with 4 cores each. I think only two errored out and the rest completed and validated. Guess I'll have to max out my 8-core machines with 32 GB RAM to run the bigger molecules.
ID: 49805 | Rating: 0 | rate: / Reply Quote | |
Are you able to set the disk limit in the boinc preferences to prevent too many WUs from running? | |
ID: 49806 | Rating: 0 | rate: / Reply Quote | |
Easier and more precise to simply set max_concurrent in app_config.xml, for example:
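For anyone who hasn't used one before, a minimal sketch of such a file (the short app name "QC" here is only a placeholder; check the real name in client_state.xml or on the project's Applications page):

```xml
<!-- app_config.xml, placed in the GPUGRID project folder under the BOINC data directory -->
<!-- "QC" is a placeholder app name; use the real short name from client_state.xml -->
<app_config>
  <app>
    <name>QC</name>
    <!-- run at most two of these tasks at the same time -->
    <max_concurrent>2</max_concurrent>
  </app>
</app_config>
```

After saving it, tell the client to re-read the config files (Options -> Read config files in BOINC Manager) or restart the client.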
ID: 49807 | Rating: 0 | rate: / Reply Quote | |
Great to know, thanks. Actually I was also wondering if BOINC respects the disk limits.
ID: 49808 | Rating: 0 | rate: / Reply Quote | |
Worth it to perform the experiment, certainly. Possibly depends whether it respects the declared space needed (<rsc_disk_bound>), or the actual space used. If the latter, there might be a problem if the actual usage increases gradually during the run - BOINC might only check it when deciding whether to start a(nother) new task. Lots of fun to be had with those possibilities... | |
ID: 49809 | Rating: 0 | rate: / Reply Quote | |
@Toni, yes, BOINC does appear to respect the client disk settings, as it lets one know in the event log if disk space is too low to run certain projects. I usually set a high arbitrary GB size, but the client appears to react to the real amount available in the execution partition and uses the percentage limits to notify the user when disk space is too low. I had to readjust the percent limits higher (in the client settings) a few days ago to run the 3.30 app on one of my machines, probably due to the project directory getting too full. I hate to reset the project and lose WUs, but I suppose I will have to eventually.
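For reference, the disk limits being discussed live in the computing preferences; a sketch of the corresponding global_prefs_override.xml entries, with purely illustrative values:

```xml
<!-- global_prefs_override.xml in the BOINC data directory; values are illustrative -->
<global_preferences>
  <!-- absolute cap on the space BOINC may use -->
  <disk_max_used_gb>100</disk_max_used_gb>
  <!-- always leave at least this much free on the volume -->
  <disk_min_free_gb>5</disk_min_free_gb>
  <!-- use at most this percentage of the volume -->
  <disk_max_used_pct>75</disk_max_used_pct>
</global_preferences>
```

The client honours the most restrictive of the three, which would explain why raising the percentage limit alone was enough in the case above.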
ID: 49810 | Rating: 0 | rate: / Reply Quote | |
We are using QC beta to test large molcules and how much disk space they take. I get it. Thank you so much for your help. ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday. | |
ID: 49811 | Rating: 0 | rate: / Reply Quote | |
The molecules for QM are max 50 atoms or so. The size is, however, not very indicative. This is a specific "chemistry-oriented" type of calculation.
ID: 49812 | Rating: 0 | rate: / Reply Quote | |
Thank you VERY much for that line, I really appreciate it. I was already afraid of being a constant bother. Of course I understand that we are still years or even decades away from handling huge proteins like HCF1, and I don't want to be obtrusive. Having said that, I would like to keep sight of those long-term targets. Thanks again... if I may, I will get back to you with this question in a couple of years. But I am glad that GPUGRID and its team are more than just "exclusively academic". There actually is a vision of the future we can believe in. ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
ID: 49813 | Rating: 0 | rate: / Reply Quote | |
On an EDX online course on quantum computers which I followed recently there was a professor at Dartmouth University who uses a quantum computer to do quantum chemistry calculations. | |
ID: 49814 | Rating: 0 | rate: / Reply Quote | |
Hey JoergF, while we are not doing proteins with QM yet (some other groups are trying to do that with networks), what we are calculating is directly related to drug design, so I think it is very relevant.
ID: 49820 | Rating: 0 | rate: / Reply Quote | |
Thank you very much. Which kind of contribution will help you most in order to make progress on proteins (in the long run of course)? Because I am just considering whether to buy an additional GPU or CPU this autumn. | |
ID: 49822 | Rating: 0 | rate: / Reply Quote | |
We as a group are not really focusing on applying QM to proteins. The problem is twofold:
ID: 49823 | Rating: 0 | rate: / Reply Quote | |
Thank you... no problem. So we just keep on crunching on all sides and see where the road leads us to. :) | |
ID: 49824 | Rating: 0 | rate: / Reply Quote | |
I'm trying out the AMD EPYC trial from Packet, it runs 48 QM WUs at a time... all valid. I'm thinking of leaving it crunching these instead of Rosetta (the rest of my PCs all run Windows...). Hope it helps! | |
ID: 49825 | Rating: 0 | rate: / Reply Quote | |
I'm trying out the AMD EPYC trial from Packet, it runs 48 QM WUs at a time... all valid. I'm thinking of leaving it crunching these instead of Rosetta (the rest of my PCs all run Windows...). Hope it helps! Wow! That's a lot of compute! | |
ID: 49830 | Rating: 0 | rate: / Reply Quote | |
I'm trying out the AMD EPYC trial from Packet, it runs 48 QM WUs at a time... all valid. I'm thinking of leaving it crunching these instead of Rosetta (the rest of my PCs all run Windows...). Hope it helps! Epyc with 48 threads ... I go green with envy :-)) ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday. | |
ID: 49831 | Rating: 0 | rate: / Reply Quote | |
48 QM WUs are 192 (CPU) threads. I need 4 computers to reach that. | |
ID: 49834 | Rating: 0 | rate: / Reply Quote | |
48 QM WUs are 192 (CPU) threads. I need 4 computers to reach that. I just realized from your comment that it actually crunches 12 WUs at a time (I just saw all 48 threads running @ 100% and immediately thought it was running 48 WUs, just like Rosetta). I am not a smart man. ____________
ID: 49839 | Rating: 0 | rate: / Reply Quote | |
Well, I run QC on a 2-core PC just for fun :-)
ID: 49840 | Rating: 0 | rate: / Reply Quote | |
I'm trying out the AMD EPYC trial from Packet, it runs 48 QM WUs at a time... all valid. To everybody using hyper-threaded CPUs for crunching: you should test how well the given app scales with HT on or off on your system. The other approach is to leave HT on, but lower the percentage of usable CPUs in BOINC manager (down to 50%). Too many simultaneous memory-intensive apps would cause too many cache misses, resulting in degraded combined performance. With HT off (or with the usable CPUs set to 50%) the calculation time should be halved, because two threads share one FPU. If it's more than half, then the number of usable CPUs could be increased, as long as the RAC rises accordingly (in direct ratio). I can't test it myself until the Windows app has been released, but I'm interested. A simultaneous GPU task could also degrade the performance of the CPU tasks, and vice versa.
ID: 49842 | Rating: 0 | rate: / Reply Quote | |
Most tasks benefit from HT but I only recall one doing better overall with HT off on my 2670v1s. | |
ID: 49843 | Rating: 0 | rate: / Reply Quote | |
I have a related question I cannot answer myself. | |
ID: 49856 | Rating: 0 | rate: / Reply Quote | |
2) Can I limit the number of cores used by QC? Use an "app_config.xml" file to limit the number of cores per work unit, and also the number of QC work units running if you wish. http://www.gpugrid.net/forum_thread.php?id=4748&nowrap=true#49369 I have found that QC is tough on resources too. Even though I reserved a CPU core to support a GTX 1070 on Folding, running QC still caused a drop in Folding points, showing that the GPU was being starved for CPU support. To fix that, I now run only six cores of my i7-4770 on CPU work, and leave two cores to support the GPU. But even that was not enough, so I run 4 cores on QC (two work units running two cores each) with the other two on LHC/native ATLAS. That frees up enough CPU resources so that I see only a minimal drop in Folding points. | |
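To make the linked post concrete, a hedged sketch of the per-work-unit part of app_config.xml (the app_name and plan_class are placeholders; copy the real values from an existing QC entry in client_state.xml, and combine this with the max_concurrent block shown earlier in the thread if you also want to cap how many run at once):

```xml
<app_config>
  <app_version>
    <!-- placeholders: use the app_name and plan_class from client_state.xml -->
    <app_name>QC</app_name>
    <plan_class>mt</plan_class>
    <!-- budget two CPU cores per work unit -->
    <avg_ncpus>2</avg_ncpus>
  </app_version>
</app_config>
```

As reported further down the thread, already-downloaded tasks may still display the old core count, but they run with the new budget once the config files are re-read.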
ID: 49857 | Rating: 0 | rate: / Reply Quote | |
Use an "app_config.xml" file to limit the number of cores per work unit, and also the number of QC work units running if you wish. Thanks Jim1348, I just tried, but without improvement. Seems to be connected to the algorithm. | |
ID: 49858 | Rating: 0 | rate: / Reply Quote | |
Use an "app_config.xml" file to limit the number of cores per work unit, and also the number of QC work units running if you wish. You must tell BOINC to reread configs to pick up the changes. Tasks already downloaded will still say 4c even. Only new ones will say 1c or 2c but all will run at your new setting. Its a BOINC thing to sometimes squeeze in more tasks than cores. I've seen it happen on my 3570k when a single threaded task completes and a 4c tasks starts it will show more running for a but but it eventually corrects. | |
ID: 49859 | Rating: 0 | rate: / Reply Quote | |
You must tell BOINC to reread configs to pick up the changes. Tasks already downloaded will still say 4c even. Only new ones will say 1c or 2c but all will run at your new setting. I did, but it still required a reboot. Tasks that were 4 core previously appeared as x core after a reboot and were crunched as such as well. Credit might take a hit, but I didn't mind for this test. Its a BOINC thing to sometimes squeeze in more tasks than cores. I've seen it happen on my 3570k when a single threaded task completes and a 4c tasks starts it will show more running for a but but it eventually corrects. I thought that was the issue, but it wasn't. I even suspended one/several/all QC task, but if/when BOINC could start another task it always did, despite CPU% >400%. CPU% stayed >700% (i.e. 7 tasks running) for an hour plus. It is working now. | |
ID: 49860 | Rating: 0 | rate: / Reply Quote | |
Looks like the CPU WU Queue is almost running dry | |
ID: 49968 | Rating: 0 | rate: / Reply Quote | |
Holidays...mumble...something...something...holidays :D hahah. I restocked them now. From Monday I'll be back working so I'll take more care of my WUs | |
ID: 49969 | Rating: 0 | rate: / Reply Quote | |
I downloaded 4 QC tasks on my Windows 10 PC and of course they failed. But why does the server send me QC tasks on a Windows PC?
ID: 50017 | Rating: 0 | rate: / Reply Quote | |
I downloaded 4 QC tasks on my Windows 10 PC and of course they failed. But why does the server send me QC tasks on a Windows PC? The same is true for GPU tasks - one can download them on a Windows OS, and they fail after a few seconds.
ID: 50019 | Rating: 0 | rate: / Reply Quote | |
Just a notice to Stefan, only a few days left of CPU WUs in the queue. | |
ID: 50273 | Rating: 0 | rate: / Reply Quote | |
Thanks, I noticed :) I'm in the process of creating new WUs, but the issue is that they are more demanding than the last ones, so we are trying to figure out ways to make them use less disk at the cost of more computation time, because the largest one used 50 GB of scratch space to calculate.
ID: 50276 | Rating: 0 | rate: / Reply Quote | |
My HP Linux laptop, running SuSE Leap 15.0 after Leap 42.3 (any relationship to SLES 15.0?), has 752.37 GB available to BOINC. By contrast, my older SUN WS running SuSE Leap 42.3 has at most 30 GB of a 1 TB disk available to BOINC 7.8.3.
ID: 50277 | Rating: 0 | rate: / Reply Quote | |
So the ones I am sending out now should use maximum around 6GB scratch space on /tmp/. If you hit any problems feel free to report here. | |
ID: 50278 | Rating: 0 | rate: / Reply Quote | |
Minor note: barring changes I am unaware of, the scratch space used during the run is in the slot directory. (/tmp is limited on many systems) | |
ID: 50280 | Rating: 0 | rate: / Reply Quote | |
I have a QC task running on my Linux laptop. It is at 73% after 9:07:27 hours. But its slot is empty.
ID: 50281 | Rating: 0 | rate: / Reply Quote | |
Two QC tasks failed on my main Linux box which has a 30 GB limit to BOINC 7.8.3 with the same message DISK USAGE LIMIT EXCEEDED. GPU task running fine on its GTX 750 Ti at 61 C. | |
ID: 50282 | Rating: 0 | rate: / Reply Quote | |
OK I have a feeling we hit a file-size limit of BOINC and not of the drives. I'll chat it up with Toni and see what we can do. | |
ID: 50283 | Rating: 0 | rate: / Reply Quote | |
OK I have a feeling we hit a file-size limit of BOINC and not of the drives. I'll chat it up with Toni and see what we can do. Every workunit sent out by a BOINC server has an associated value <rsc_disk_bound> (in bytes). That value - set by the project - has to be large enough to accommodate all anticipated disk usage. If you use more than you've declared in advance, 'DISK USAGE LIMIT EXCEEDED' is exactly the error message you'd expect.
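For readers unfamiliar with the server side, a rough sketch of where that value sits in a work unit's input template (standard BOINC element names; the numbers are only illustrative):

```xml
<!-- fragment of a BOINC workunit input template; values in bytes, illustrative only -->
<workunit>
  <rsc_memory_bound>4000000000</rsc_memory_bound>   <!-- roughly 4 GB of RAM -->
  <rsc_disk_bound>30000000000</rsc_disk_bound>      <!-- roughly 30 GB of disk, scratch included -->
</workunit>
```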
ID: 50285 | Rating: 0 | rate: / Reply Quote | |
Completed and validated two QC tasks on my main Linux host. A GPU task is running on it at 1202 MHz clock, 5400 MHz memory transfer, temperature 61 C on its GTX 750 Ti, driver 384.111.
ID: 50292 | Rating: 0 | rate: / Reply Quote | |
Completed and validated two QC tasks on my main Linux host. It's really too bad that QC is not available for Windows :-(
ID: 50293 | Rating: 0 | rate: / Reply Quote | |
just get Linux installed. | |
ID: 50294 | Rating: 0 | rate: / Reply Quote | |
Two more QC tasks completed, two ready to start. Thanks. | |
ID: 50296 | Rating: 0 | rate: / Reply Quote | |
CPU usage reaches 197% on my old Opteron 1210 with 2 cores, 145% when a GPU task is also running. RAM is 8 GB. | |
ID: 50301 | Rating: 0 | rate: / Reply Quote | |
Sorry, but these are still the same old QC jobs. Thanks for the reports, but it should not have changed much. I had to put some more of the old ones in the queue while we fix the app space configuration, so that I can send the new "SELE*" workunits.
ID: 50302 | Rating: 0 | rate: / Reply Quote | |
I canceled the remaining QMML50_2 jobs because I found out that some of them might be duplicates of already calculated WUs, since there was a minor issue when retrieving them which left some behind. I am redoing the calculation of the missing WUs now to make sure the ones I send out are correct. It might take me a day, so please be patient.
ID: 50306 | Rating: 0 | rate: / Reply Quote | |
SELE2 WUs are being sent out now. Toni increased the allowed space of the app to 30GB. From my tests the WUs should not use more than 6GB space each (the largest molecule). If you run many in parallel you might hit the limit though? I'm not certain about that. | |
ID: 50307 | Rating: 0 | rate: / Reply Quote | |
CPU usage reaches 197% on my old Opteron 1210 with 2 cores, 145% when a GPU task is also running. RAM is 8 GB. Wouldn't it be more energy efficient to run a newer CPU? It's 100W for 2 Cores @ 1Ghz. Unless your electricity is free :D ____________ | |
ID: 50308 | Rating: 0 | rate: / Reply Quote | |
new WUs don't seem to work: they consume a lot of memory, throw computation errors or just rest at 10% progress forever. | |
ID: 50309 | Rating: 0 | rate: / Reply Quote | |
I see 89 successes and 17 errors. Seems ok for a start. I'll look into the errors but they don't seem to be broken as a whole. | |
ID: 50310 | Rating: 0 | rate: / Reply Quote | |
Actually 14 out of the total 17 failures are on your machines, Thomas, so it might be specific to your case. Generally they seem OK.
ID: 50311 | Rating: 0 | rate: / Reply Quote | |
It's running at 1.8 GHz and I have a 1220 Opteron in my drawers at 2.8 GHz. It's been running since January 2008. My electricity costs me 0.21 euro/kWh and I have 3 computers running 24/7: this Opteron, an AMD E-450 and an A10-6700, which should have 4 cores but Windows Task Manager says 2 cores and 4 logical processors. My total electricity expenditure is about 60 euro/month.
ID: 50312 | Rating: 0 | rate: / Reply Quote | |
I have an Intel 8-core (16-thread) Xeon server that has a 146 GB disk drive (it had 2 of them but one died). It also has 24 GB RAM.
ID: 50314 | Rating: 0 | rate: / Reply Quote | |
@Conan: the QM calculations need to store lots of data in memory for best performance. Since we cannot ask for 20GB of RAM the software instead writes any amount of calculation data that exceeds the RAM limit (4GB) to the hard drive. | |
ID: 50316 | Rating: 0 | rate: / Reply Quote | |
The current "disk limit" for CPU jobs is set at 20 GB. This is a ballpark estimate to accommodate both the software and libraries (largish by themselves) and the temporary (scratch) data. | |
ID: 50317 | Rating: 0 | rate: / Reply Quote | |
new WUs don't seem to work: they consume a lot of memory, throw computation errors or just rest at 10% progress forever. On your failures I see "connection errors". Could be firewall filtering, or the like. | |
ID: 50318 | Rating: 0 | rate: / Reply Quote | |
First SELE task done by my Old Faithful Opteron 1210 running SuSE Linux Leap 42.3. | |
ID: 50323 | Rating: 0 | rate: / Reply Quote | |
I have a funny SELE task on my Linux laptop. It is stuck at 10% after 14 hours 38 min, but the remaining estimated time is rising to more than 5 days. All seems normal by the "top" command and it has lots of disk space. | |
ID: 50335 | Rating: 0 | rate: / Reply Quote | |
new WUs don't seem to work: they consume a lot of memory, throw computation errors or just rest at 10% progress forever. No firewall here. And same problem. | |
ID: 50336 | Rating: 0 | rate: / Reply Quote | |
As said, those WUs do not work properly. I am moving to another project and will come back if they are fixed.
ID: 50337 | Rating: 0 | rate: / Reply Quote | |
OK, thanks Toni and Stefan for the information, that explains a lot. | |
ID: 50343 | Rating: 0 | rate: / Reply Quote | |
In the slot of a running task there is an output directory which leads to a report of what the program is doing in physical terms. Maybe some explanation by the admins would be welcome. | |
ID: 50347 | Rating: 0 | rate: / Reply Quote | |
We investigated another algorithm which doesn't use scratch disk space. Unfortunately, in my test it was 13x slower than the one that uses disk (25 minutes became 5 hours 30 minutes).
ID: 50353 | Rating: 0 | rate: / Reply Quote | |
I have plenty of disk space on my two Linux boxes because the slots directory is in my /home/user partition, which has more than 700 GB, on my SuSE Linux Leap 42.3 and Leap 15.0 OS. What amazes me is that QC tasks are always stuck at 10% progress while GPU tasks show progress as increasing.
ID: 50354 | Rating: 0 | rate: / Reply Quote | |
I got plenty of disk space on my two Linux boxes because the slots directory is in my /home/user partition, which has more than 700 GB, on my SuSE Linux Leap 42.3 and Leap 15.0 OS. What amazes me is that QC tasks are always stuck at 10% progress while GPU tasks show progress as increasing. The 10% progress is explained as follows: updating (if necessary) the app accounts for 10%, and usually happens immediately. The remaining 90% advances as molecules are calculated (e.g. 5 molecules = 90%/5 increments). However, very big WUs have only one molecule, so there is no apparent progress until the end. (We have no finer-grained progress reporting.)
ID: 50355 | Rating: 0 | rate: / Reply Quote | |
I got plenty of disk space on my two Linux boxes because the slots directory is in my /home/user partition, which has more than 700 GB, on my SuSE Linux Leap 42.3 and Leap 15.0 OS. What amazes me is that QC tasks are always stuck at 10% progress while GPU tasks show progress as increasing. So how much space do these WUs need? I'm running 12 at a time with 64 GB of RAM, but no swap space. I see that not all 48 threads are at 100%; I'm thinking it's the lack of swap. ____________
ID: 50389 | Rating: 0 | rate: / Reply Quote | |
In the old UNIX days a rule of thumb was that you needed swap space twice the RAM, which was usually small. Now RAM is plentiful. I have 22 GB RAM on the Windows 10 PC, and 8 GB RAM on each Linux box. GPUGRID CPU tasks use some swap but most of it is not used.
ID: 50390 | Rating: 0 | rate: / Reply Quote | |
I amped the swap to 300GB, but it only seems to be using RAM. Is this "scratch space" used in swap space or does the WU use the file directory for storage? I'm thinking it is the latter since the BOINC space usage goes up and down. | |
ID: 50391 | Rating: 0 | rate: / Reply Quote | |
I see temporary files in the slots/0 directory. They are named psi.25019.number
ID: 50392 | Rating: 0 | rate: / Reply Quote | |
Yes afaik it doesn't use swap space, so increasing that will not help. It's probably where Tullio mentioned. The files are called `psi.XXXXX.XX`. Usually there are two and the second can grow significantly. | |
ID: 50393 | Rating: 0 | rate: / Reply Quote | |
Stefan, I see 4 plus one which says psi.30091.clean | |
ID: 50394 | Rating: 0 | rate: / Reply Quote | |
I'm fixing an issue with SELE2, so I cancelled them and will send out SELE3 in a bit.
ID: 50397 | Rating: 0 | rate: / Reply Quote | |
I'm fixing an issue with SELE2, so I cancelled them and will send out SELE3 in a bit. Is this related to the 'upload failure - file size too big' problem reported for SELE2 last week? Either way, please double-check the <max_nbytes> value for the new batch.
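For context, <max_nbytes> is the per-file size cap declared in the result (output) template; if an upload exceeds it, the client reports the kind of 'file size too big' failure mentioned above. A rough sketch with an illustrative value:

```xml
<!-- fragment of a BOINC output (result) template; value is illustrative -->
<file_info>
  <name><OUTFILE_0/></name>
  <generated_locally/>
  <upload_when_present/>
  <!-- uploads larger than this are rejected -->
  <max_nbytes>500000000</max_nbytes>
  <url><UPLOAD_URL/></url>
</file_info>
```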
ID: 50398 | Rating: 0 | rate: / Reply Quote | |
I had to add WCG to this 48-thread beast because it isn't using all of the threads @ 100% when running GPUGRID only. I'd wager it's because of a scratch-space bottleneck (it's running an SSD though, 200 MB/s according to hdparm)... ?
ID: 50399 | Rating: 0 | rate: / Reply Quote | |
No, the issue was with an old version of psi4 giving wrong results on large molecules when using the scratch space. This is fixed in the latest version now. | |
ID: 50400 | Rating: 0 | rate: / Reply Quote | |
No, the issue was with an old version of psi4 giving wrong results on large molecules when using the scratch space. This is fixed in the latest version now. I'll set WCG to don't allow new work and I'll report back! ____________ | |
ID: 50405 | Rating: 0 | rate: / Reply Quote | |
I am running 3.31 SELE6. | |
ID: 50406 | Rating: 0 | rate: / Reply Quote | |
Something is wrong. The BOINC Manager says it is running but python does not appear in the "top" console. | |
ID: 50408 | Rating: 0 | rate: / Reply Quote | |
I just had about 20 of these fly through before they corrected and started to run correctly. <core_client_version>7.8.3</core_client_version> | |
ID: 50409 | Rating: 0 | rate: / Reply Quote | |
Yes we had to do some testing with SELE3-5. SELE6 ought to work fine though. 1741/88 success/fail ratio | |
ID: 50413 | Rating: 0 | rate: / Reply Quote | |
Things seem rather stable for SELE6. For further discussion let's please go to the multicore forum. | |
ID: 50416 | Rating: 0 | rate: / Reply Quote | |
I'm trying out the AMD EPYC trial from Packet, it runs 48 QM WUs at a time... all valid. To everybody using hyper-threaded CPUs for crunching: Zoltan, I think you have a great point here. I am noticing much higher CPU utilization and half the RAM usage since I switched to 50% CPU in BOINC on these new QC WUs. I think it's mostly due to the much lower hard drive bandwidth required, and perhaps also the CPU cache being more efficiently allocated.
ID: 50428 | Rating: 0 | rate: / Reply Quote | |
I'm trying out the AMD EPYC trial from Packet, it runs 48 QM WUs at a time... all valid. To everybody using hyper-threaded CPUs for crunching: Yup, I added Rosetta and WCG to the mix and the few GPUGRID WUs run constantly @ 400% ____________
ID: 50429 | Rating: 0 | rate: / Reply Quote | |
Do you have any tips for getting higher utilization out of these new large-molecule QC WUs? I am already running 4 WUs on a 16-core system, which is 50% usage in BOINC, but the utilization is all over the place. It's using up to 23 GB of RAM (I have 32 GB) with only 4 WUs, and I have plenty of space on the SSD.
ID: 50435 | Rating: 0 | rate: / Reply Quote | |
CPU tasks - unsent: 44,723; in progress: 848; users in last 24hrs: 76
Quantum Chemistry - unsent: 13,191; in progress: 866
Looks like we are cutting that number down to size quickly... ____________
ID: 50488 | Rating: 0 | rate: / Reply Quote | |
QC WUs are almost out, less than 600 to send out. | |
ID: 50518 | Rating: 0 | rate: / Reply Quote | |
They may be waiting until the 3.31 jobs finish before introducing the new 3.32 version. I expect they have plenty more. | |
ID: 50519 | Rating: 0 | rate: / Reply Quote | |
Hopefully. We are officially out of cpu work. | |
ID: 50521 | Rating: 0 | rate: / Reply Quote | |
I am running two resends. One of them failed with "file too big" error. The other is running. | |
ID: 50528 | Rating: 0 | rate: / Reply Quote | |
I submitted some WUs but I am warning you :P This batch will use lots of scratch space. | |
ID: 50529 | Rating: 0 | rate: / Reply Quote | |
This batch will use lots of scratch space. I am set up to run four work units at a time. How much will that need? I can change it as necessary; it is "only" a 120 GB SSD, with maybe 80 GB free at the moment. | |
ID: 50530 | Rating: 0 | rate: / Reply Quote | |
I think the largest one took 50 GB of scratch space. But they should scale linearly (they are not all that big), so it's practically up to chance whether you will be able to run them all in parallel, depending on whether you get some of the smaller ones or the larger ones.
ID: 50531 | Rating: 0 | rate: / Reply Quote | |
OK, the 250 GB SSDs are a good buy at the moment in the U.S. | |
ID: 50532 | Rating: 0 | rate: / Reply Quote | |
I submitted some WUs but I am warning you :P This batch will use lots of scratch space. Most errors I get from these work units have this </stderr_txt> ____________ | |
ID: 50538 | Rating: 0 | rate: / Reply Quote | |
Two have completed on my Linux box. | |
ID: 50544 | Rating: 0 | rate: / Reply Quote | |
One more task gained 1848.00 credits. | |
ID: 50548 | Rating: 0 | rate: / Reply Quote | |
One more task gained 1848.00 credits. 9 1/2 hours? That's longer than most Long Run GPU tasks ;)
ID: 50549 | Rating: 0 | rate: / Reply Quote | |
Again, 1526.20 credits.
ID: 50553 | Rating: 0 | rate: / Reply Quote | |
Again, 1526.20 credits. Starting to see some of the longer-run work units:
Run time - CPU time - Credit
3,812.94 - 11,088.78 - 2,307.16
4,054.73 - 11,758.62 - 2,637.67
____________
ID: 50565 | Rating: 0 | rate: / Reply Quote | |
Longer runs don't seem to be affected by the DISK_LIMIT_EXCEEDED error which happens in some shorter runs. My latest long run gave me 1151 credits.
ID: 50567 | Rating: 0 | rate: / Reply Quote | |
I am submitting now some more of the faster QMML50_3 workunits. These should be quite quick and have a higher priority than the SELE6 so you might be getting these for a while now. | |
ID: 50569 | Rating: 0 | rate: / Reply Quote | |
Run time 3,241.13 | |
ID: 50570 | Rating: 0 | rate: / Reply Quote | |
Run time 3,241.13 Hm, a marked drop in the credit, compared to what Zalster got (see a few postings above).
ID: 50571 | Rating: 0 | rate: / Reply Quote | |
Yes, but his CPU time is much higher, probably because of the number of cores he has. I have only two.
ID: 50572 | Rating: 0 | rate: / Reply Quote | |
Yes, but his CPU time is much higher, probably because of the number of cores he has. I have only two. A couple of things I've noticed. The computer that is getting the higher time/credit only has 12 threads. My 10-core/20-thread is still getting the shorter, quicker work units. Not sure why. Also, most of the long runs are resends that erred out on other computers. Maybe disk space was the issue, I don't know. Just thought I would point that out as well.
ID: 50573 | Rating: 0 | rate: / Reply Quote | |
My Linux HP laptop cannot get GPUGRID tasks because it has only 24.90 GB available and the server says it needs 32 GB. So I am running SETI@home on it, which does not require that much space.
ID: 50574 | Rating: 0 | rate: / Reply Quote | |
I have no clue how the BOINC scheduler works, but if it works as I hope, you should be getting only the QMML50 workunits for a while now. Maybe some SELE6 were still scheduled from before.
ID: 50575 | Rating: 0 | rate: / Reply Quote | |
Yes the new QMML50s are flowing. Running between 2-4 minutes currently. Will keep an eye on them. | |
ID: 50578 | Rating: 0 | rate: / Reply Quote | |
Help! I have no work coming in and it has been a number of days. I see there is plenty of work on the server??? Please advise soonest, thanks. Gary
ID: 52318 | Rating: 0 | rate: / Reply Quote | |
There is GPU work available. And you will need an Nvidia GPU. | |
ID: 52319 | Rating: 0 | rate: / Reply Quote | |