Message boards : Graphics cards (GPUs) : PCI-e 16x PCI-e 4x
Hi,
ID: 47472 | Rating: 0
> Hi, I have 2 GTX 1050 Ti 3 GB on a Gigabyte AB350M-HD3 motherboard with an AMD Ryzen 5 1400 CPU on Windows 10.

I think you refer to host 430643. This host now has only one GTX 1050 Ti, but I've checked your previous tasks, and it had two GPUs before: one GTX 1050 Ti and one GTX 1050 (without the "Ti"; see tasks 16353457 and 16355260). These are two different GPUs; the main difference is the number of CUDA cores, 768 vs 640, which explains the different computing times you've experienced. But there's more. (There's no such 3 GB GTX 1050 / GTX 1050 Ti as far as I know.)

> I have noticed that when processing a WU (one core per WU, the core working on the GPU ranges 70-80% core usage)...

GPU usage is misleading; if you want a better estimate, you should look at the GPU power measurement. Different GPU models have different power consumption (and TDP).

> The first GPU card gets hot and the GPU processor is at 90% most of the time.

If this PC has a tower case, then the hot air from the lower card heats the upper card. Cards which blow the hot air out directly through the rear grille are better in this regard (they don't blow the hot air into the case).

> This GPU is connected to the first PCIe slot, which works at PCIe 3.0 x16.

They would not heat up the same even if they were the same GPU.

> For info, the second card takes a few additional hours to complete a WU than the first.

That could be caused by the narrower PCIe bandwidth, but in your case it's simply that the lesser GPU takes longer.

> Also, since it is evident that there is no bottleneck on the bus interface, does this mean the cards perform the same for this type of WU even though the bandwidth between GPU and CPU is narrower?

Yes. There are two factors to it: 1. the present workunits don't need much interaction between the GPU and the CPU; 2. your GPUs are not fast enough to make the PCIe bandwidth bottleneck noticeable.

> Now, the big question: since both cards have the same processor usage and there is no bottleneck, shouldn't the WUs finish at the same time?

They should, but your past workunits show different GPUs, so you might have missed the "Ti" on one of them.

> I might not be looking at the right information from GPU-Z.

GPU-Z should show the different models (one with "Ti", and one without).
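The CUDA-core counts quoted above already predict most of the runtime gap between the two cards. A minimal sketch of that estimate (it assumes equal clocks and per-core efficiency, which is a simplification; real runtimes also depend on memory bandwidth and boost behavior):

```python
# Rough throughput comparison of the two cards discussed above.
# Assumption: runtime scales inversely with CUDA core count.
cards = {
    "GTX 1050 Ti": 768,  # CUDA cores
    "GTX 1050":    640,
}
ratio = cards["GTX 1050 Ti"] / cards["GTX 1050"]
print(f"Expected runtime ratio (GTX 1050 vs GTX 1050 Ti): {ratio:.2f}x")  # 1.20x
```

So a WU taking ~20% longer on the plain GTX 1050 is consistent with the core counts alone, before any PCIe or thermal effects.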
ID: 47476 | Rating: 0
I run three dedicated machines. One of them has 5 GPUs. Only one of them is in a full-length (16x) slot. One of them is in an 8x slot. The rest are in 1x slots with riser ribbons.
ID: 47483 | Rating: 0
Thanks for all the info.
ID: 47486 | Rating: 0
> I run three dedicated machines. One of them has 5 GPUs. Only one of them is in a full-length (16x) slot. One of them is in an 8x slot. The rest are in 1x slots with riser ribbons.

Every BOINC app is different. The GPUGrid app is *very* different, and different workunit batches in GPUGrid can need *very* different PCIe bandwidth. Even WDDM alone can shave off 10-15% of performance; that's why I used Windows XP x64 exclusively for crunching, until the Pascals came.

> The card does all the work...

That's not true for the GPUGrid app: the double-precision arithmetic is done on the CPU, and moreover there are batches which apply extra forces to the atoms, which is also done on the CPU.

> ...and only needs 1x for communicating. The other 15 lanes are for processing video during videogame play, which BOINC does not use.

There's no such dedicated lane in PCIe. If there are more lanes, every app (regardless of whether it's a game, CAD, Folding@home, or some BOINC-related app) will utilize them. That's one of the key features of PCIe. The performance gain can range from negligible to a direct ratio of the available lanes, but that depends on the GPU application, not on the PCIe architecture.

> I run a GeForce GT 710, two 620s, a 520 and a 420. The 710 has the fastest chip (954 MHz) but only 1 GB VRAM / 64-bit memory interface. It also has no processor clock, just a graphics clock.

Fast does not equal high clock frequency where GPUs are concerned. A GTX 1080 Ti can be 30 times faster than a GT 710, while it has only 1.5 times the clock frequency of a GT 710. The faster the GPU, the more easily the app running on it hits the PCIe bandwidth limit.

> The 420 has 2 GB VRAM / 128-bit memory interface. I use this card to run the display because it has the most memory / memory bandwidth / speed. It also has an 80 mm fan. It also uses the most power. The other cards fall somewhere in between, having 1-2 GB VRAM but only a 64-bit memory interface.

The GPUGrid app and project is a power-hungry kind.
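The range "from negligible to a direct ratio of the available lanes" can be illustrated with a simple Amdahl-style model. The transfer fractions below are hypothetical placeholders, not measured GPUGrid numbers:

```python
def speedup(transfer_fraction: float, lane_ratio: float) -> float:
    """Amdahl-style estimate of the gain from widening a PCIe link.

    transfer_fraction: share of wall time spent on PCIe transfers at the
                       narrow link width (an assumed value, not measured).
    lane_ratio: wide lanes / narrow lanes, e.g. 16/1 = 16.
    """
    return 1.0 / ((1.0 - transfer_fraction) + transfer_fraction / lane_ratio)

# A transfer-light app barely benefits from going x1 -> x16 ...
print(f"{speedup(0.02, 16):.2f}x")  # ~1.02x
# ... while a transfer-bound app approaches the lane ratio.
print(f"{speedup(0.95, 16):.2f}x")  # ~9.14x
```

This is why the same riser setup can be fine for one BOINC app and crippling for another: the gain tracks how much of the app's time is spent on the bus, not the bus itself.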
ID: 47487 | Rating: 0
> But when the 1050 is installed alone it produces more heat than on the second slot...

A PCIe 3.0 x16 connection is roughly 8 times faster than a PCIe 2.0 x4, which could cause the difference in processing speed and heat dissipation. The other factor is the number of other (CPU) tasks running simultaneously. In my experience, running more than 1 CPU task isn't worth it for the host's overall RAC: it reduces the host's GPU throughput (RAC) more than it increases the host's CPU throughput (RAC), at least if the host has a high-end GPU. Of course there can be other reasons to crunch CPU and GPU tasks on the same host, but performance-wise it's not worth it. If GPU performance is the main reason for building a rig, it should have only one high-end GPU; then it doesn't need a high-end CPU and/or motherboard (with more PCIe x16 connectors), and it's better to build as many PCs as your space / budget allows.
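The "roughly 8 times" figure follows from the per-lane rates in the PCIe spec (PCIe 2.0: 5 GT/s with 8b/10b encoding; PCIe 3.0: 8 GT/s with 128b/130b encoding):

```python
# Usable per-lane bandwidth in MB/s (line rate x encoding efficiency / 8 bits):
pcie2_lane = 5e9 * (8 / 10) / 8 / 1e6     # 500.0 MB/s per lane
pcie3_lane = 8e9 * (128 / 130) / 8 / 1e6  # ~984.6 MB/s per lane

ratio = (16 * pcie3_lane) / (4 * pcie2_lane)
print(f"PCIe 3.0 x16 vs PCIe 2.0 x4: {ratio:.1f}x")  # ~7.9x
```

These are raw link rates; protocol overhead reduces real-world throughput somewhat on both generations, so the ratio is what matters rather than the absolute numbers.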
ID: 47489 | Rating: 0
I am in the process of building another rig using 6 PCIe 1x riser cables on 1060 GPUs. I'll report back on another thread; I myself could not locate any 'reference' info on people crunching over PCIe 1x riser cables. This project uses extreme PCIe bus memory bandwidth and requires at least an 8x PCIe 2.0 connection for anywhere near full speed. If you have PCIe 3.0, I would still try to get an 8x or even a 16x riser. This project, unlike GPU compute mining for cryptocurrency, is scientific, which means it has aspects that need double-precision compute, which is done on the CPU. I would run 0 CPU tasks, keep at least 1 CPU thread per GPU dedicated, and lock the CPU at the highest frequency you can get away with. For the 6 x 1060 installation: if your motherboard doesn't support at least six 4x PCIe slots, I would stick to 4 GPUs and perhaps put the last two in another machine. 1x is fine for cryptocurrency mining, but bandwidth matters far more here, and you will probably get only about 30% GPU utilization on 1x.
ID: 47490 | Rating: 0
Here is the X server output of the GPU that's currently running GPUGrid:
ID: 47493 | Rating: 0
> The most I've EVER seen it at for ANY project was 15%, regardless of 16x or 1x.

With my hosts running a GTX 750 Ti and a GTX 970, both on motherboards with PCIe 2.0 x16, GPU-Z shows bus-interface values between 54% and 57%. On the host with the two GTX 980 Ti (motherboard with PCIe 3.0 x16), GPU-Z unfortunately shows "0" for the bus interface (which means the tool does not recognize the bus speed).
ID: 47494 | Rating: 0
> Anyway, at 95% GPU use the PCIe bandwidth is only at 1%. The most I've EVER seen it at for ANY project was 15%, regardless of 16x or 1x.

GTX 1080 @ 2000 MHz / 4763 MHz, Windows 10 v1703, NVIDIA v382.05, PCIe 3.0 x16, bus usage: 30-31%
GTX 1080 Ti @ 1974 MHz / 5454 MHz, Windows 10 v1703, NVIDIA v382.05, PCIe 3.0 x16, bus usage: 33-34%
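Those bus-usage percentages translate into substantial absolute transfer rates, assuming the reported load is relative to the link's peak usable bandwidth (that interpretation of the sensor is an assumption here):

```python
# Peak usable bandwidth of a PCIe 3.0 x16 link:
# 16 lanes * 8 GT/s * 128b/130b encoding, converted to GB/s.
PCIE3_X16_GBPS = 16 * 8e9 * (128 / 130) / 8 / 1e9  # ~15.75 GB/s

# Bus-usage readings reported above, as fractions.
for card, load in [("GTX 1080", 0.31), ("GTX 1080 Ti", 0.34)]:
    print(f"{card}: ~{PCIE3_X16_GBPS * load:.1f} GB/s over PCIe")
```

So a 30-34% reading on a fast Pascal card already implies roughly 5 GB/s of traffic, which would saturate a PCIe 2.0 x4 link (~2 GB/s) several times over.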
ID: 47495 | Rating: 0
Where's the proof?
ID: 47496 | Rating: 0