nVidia Pascal X80


Zarck Send message Joined: 16 Aug 08 Posts: 145 Credit: 328,473,995 RAC: 0 Level Scientific publications	Message 43046 - Posted: 20 Mar 2016 \| 0:09:05 UTC
	http://wccftech.com/nvidia-pascal-specs/ See you later.

[CSF] Thomas H.V. DUPONT Send message Joined: 20 Jul 14 Posts: 732 Credit: 126,845,366 RAC: 156,524 Level Scientific publications	Message 43047 - Posted: 20 Mar 2016 \| 9:52:21 UTC - in response to Message 43046.
	Déjà vu on PrimeGrid, Zarck is everywhere ;) Thanks Zarck! ____________ [CSF] Thomas H.V. Dupont Founder of the team CRUNCHERS SANS FRONTIERES 2.0 www.crunchersansfrontieres

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43050 - Posted: 20 Mar 2016 \| 13:38:12 UTC - in response to Message 43047. Last modified: 20 Mar 2016 \| 20:25:28 UTC
	It will be interesting to see which versions of Generation Pascal (GP) ship first. Last time round the Generation Maxwell (GM) GTX750 and GTX750Ti turned up early (Feb 2014) compared to the main high-end GM cards, the GTX970 and GTX980 (Sept 2014), and a Titan Z (GK) turned up after the GM107 releases. In Mar 2014 mobile variants of GM also turned up, GM108 and GM107 in the GeForce 800M series. This all made good business sense, so I'm expecting something similar this time round. Also worth noting that the GeForce 700 line-up (including Titans) totalled 19 standard + OEM variants, but the GeForce 900 series only has 6, so far... Probably at least another 2 or 3 months before anything GP ships to POS though. About a month ago I read speculation of a GTX950 SE or LE (Second Edition or Light Edition), but nothing since. It could have been an Engineering Sample (ES) based on a GTX950 GM card but intended for GP, suggesting something like a GTX950Ti might turn up based on GP, say GP107, just ahead of a couple of similar mobile GP models. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help

Bedrich Hajek Send message Joined: 28 Mar 09 Posts: 485 Credit: 11,130,472,824 RAC: 15,320,537 Level Scientific publications	Message 43051 - Posted: 20 Mar 2016 \| 19:48:33 UTC - in response to Message 43046. Last modified: 20 Mar 2016 \| 19:51:46 UTC
	Zarck wrote: http://wccftech.com/nvidia-pascal-specs/
On the surface it looks like these cards are going to be twice as fast as their predecessors while using a little less power. But here is the catch: how much bigger is the WDDM lag going to be? Also, on highly CPU-dependent WUs like the GERARD A2ARs, how much bigger is that lag going to be? I guess you have to take the good with the bad.

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43052 - Posted: 20 Mar 2016 \| 21:25:52 UTC - in response to Message 43051. Last modified: 20 Mar 2016 \| 21:30:27 UTC
	Bedrich Hajek wrote: It looks like on the surface these cards are going to be twice as fast as their predecessors, while using a little less power.
Based on those specs I would guess a performance boost of around 45%, but in terms of performance/Watt it could be around 2.2 times better.
Bedrich Hajek wrote: But here is the catch, how much bigger is the WDDM lag going to be?
There might be a lot of catches and things we don't know about yet. For example, the acemd app's performance might not scale well on these high density cards. That's already the case with the GTX Titan X and (to a lesser extent) the GTX980Ti. The memory clock increase from 7010 to 8000MHz would reduce the Memory Controller Unit burden somewhat, but probably not sufficiently to prevent a bottleneck in itself. The architecture might sort that out, or not... The memory bandwidth probably wouldn't be an issue with HBM2, but that seems to be limited to the supposed X80Titan, which might be a bit different in other ways too (double precision), and much pricier.
Bedrich Hajek wrote: Also, on high CPU dependent WUs like the GERARD A2ARs, how much bigger is that lag going to be?
Who knows what architectural advancements and cuda magic might reduce such CPU dependency?
Bedrich Hajek wrote: I guess, you have to take the good with the bad.
I'm interested in what the NV-Link will bring to real world crunchers. The possibility of adding up to 8 GPUs via an NV-Link sounds like smallish devices might be great connected up, assuming the reliance on the CPU isn't as big. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help

Zarck Send message Joined: 16 Aug 08 Posts: 145 Credit: 328,473,995 RAC: 0 Level Scientific publications	Message 43053 - Posted: 21 Mar 2016 \| 0:06:57 UTC - in response to Message 43047. Last modified: 21 Mar 2016 \| 0:07:09 UTC
	[CSF] Thomas H.V. DUPONT wrote: Déjà vu on PrimeGrid, Zarck is everywhere ;) Thanks Zarck!
Often found here -> http://forum.boinc-af.org See you later.

[CSF] Thomas H.V. DUPONT Send message Joined: 20 Jul 14 Posts: 732 Credit: 126,845,366 RAC: 156,524 Level Scientific publications	Message 43054 - Posted: 21 Mar 2016 \| 8:11:14 UTC - in response to Message 43053.
	Sincere greetings from the whole CRUNCHERS SANS FRONTIERES team to L'Alliance Francophone! :) ____________ [CSF] Thomas H.V. Dupont Founder of the team CRUNCHERS SANS FRONTIERES 2.0 www.crunchersansfrontieres

Skyler Baker Send message Joined: 19 Feb 16 Posts: 19 Credit: 140,656,383 RAC: 0 Level Scientific publications	Message 43076 - Posted: 24 Mar 2016 \| 22:59:37 UTC
	Wonder what they will cost also.

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43077 - Posted: 25 Mar 2016 \| 2:36:07 UTC - in response to Message 43052.
	http://www.gputechconf.com/ Keep an eye out during the week of April 3rd for real Pascal information. At GTC 2015 the (GM200) Titan X launched. Maybe the GPU-Z database will reveal a real Pascal soon. The SiSoftware benchmarking database is also a good place to find new GPUs.
skgiven wrote: Who knows what architectural advancements and cuda magic might reduce such CPU dependency?
A year and a half ago in the "Maxwell now" thread I asked: Has dynamic parallelism (C.C 3.5/5.0/5.2) been introduced to ACEMD? Or Unified Memory from CUDA 6.0? Unified Memory is a C.C 3.0+ feature. Quoted from the newest CUDA programming guide: "new managed memory space in which all processors see a single coherent memory image with a common address space. A processor refers to any independent execution unit with a dedicated MMU. This includes both CPUs and GPUs of any type and architecture." In your opinion: how can GPUGRID's SM/SMX/SMM occupancy be further enhanced and refined for generational (CUDA C.C) differences? Compatibility is important, as is finding the most efficient code path from CUDA programming. How can we further advance ACEMD? CUDA 5.0/PTX 3.1 through 6.5/4.1 provides new commands/instructions.
GPUGRID developer Matt (MJH) replied: We have cc-specific optimisations for each of the most performance sensitive kernels. Generally don't use any of the features introduced post CUDA 4.2 though, nothing there we particularly need. I expect the GM204 performance will be markedly improved once I have my hands on one.
One difference is Maxwell's L1 cache size of 80kB per SMM (SM 5.0/5.2), while Kepler's L1 is 16kB per SMX (SM 3.0/3.5), possibly configurable up to 48kB per SMX. Maxwell's L2 cache is also larger than Kepler's. (See the "Maxwell now" thread and the CUDA toolkit for more info.) Pascal likely has a larger L1/L2 cache and more efficient instruction throughput, as well as more registers per SMP and per thread. If the DP ratio is 1/3, then the SMP might go back to Kepler's 192c SMX size.
For Pascal with a 1/3 DP to SP ratio, (6) 32c blocks with one warp scheduler per 32c block, or maybe (3) 64c blocks per SMP with 2 warp schedulers per block, could work. Maxwell's current (4) 32c blocks per SMM with one warp scheduler per 32c block would only work with a 1/2, 1/4 or 1/8 DP to SP ratio. Some rumors point to a Pascal design with a DP ratio of 1/3, cut to 1/24 on consumer GeForce GPUs, although it's also possible the Tesla or Titan is 1/2 or 1/4. Pascal could be (8) 32c blocks with 8 warp schedulers per 256 core SMP, with either 64 or 128 DP cores per SMP at 8 or 16 DP per 32c block. It's possible a block can grow to 64c with 16 or 32 DP in each block: a 256c SMP with (4) 64c blocks.
Possible Pascal "SMP" configurations with 16 bit cores (without TMU/ROP counts):
--- 256c SMP, (4) 64c blocks - 8 warp schedulers - 2 warps per 64c block - 8/16/32 DP per block at 1/8, 1/4 or 1/2 DP ratio. 32 bit cores (2x16bit).
--- 256c SMP, (8) 32c blocks - 8 warp schedulers - 1 warp per 32c block - 8 or 16 DP cores per (32c) block at 1/4 or 1/2 DP to SP ratio.
--- 192c SMP, (6) 32c blocks - 16 DP cores per 32c block for Tesla/Titans (96 DP in a 192c SMP at 1/2 ratio) - 1 warp scheduler per 32c block.
--- 192c SMP, (3) 64c blocks - 32 DP per block - 96 DP cores per SMP - 1/3 DP to SP ratio. Consumer GPUs knocked down to 8 DP cores per 64c block. 2 warps per 64c block.
--- 192c SMP with a similar design to Kepler: 64 DP cores per SMP with a 32c block per warp.
--- 192c SMP with (4) 48c blocks (superscalar) - 16bit cores (128 or 256bit SIMD lane) and (32) 32bit cores per block - 12 DP cores per block - 1/3 DP ratio.
--- 128c SMP, (2) 64c blocks and 2 warp schedulers per block - 16 or 32 DP cores per block - 1/4 or 1/2 DP ratio. Consumer GPUs 8 DP cores per block.
--- 128c SMP, (4) 32c blocks and 1 warp scheduler per block - 8 or 16 DP cores per block - 1/4 or 1/2 DP ratio. Consumer GPUs 4 DP per block.
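The DP to SP ratio of any such layout follows directly from the block counts. Here is a minimal Python sketch for a few of the layouts listed above whose per-block numbers are stated explicitly; the function name `dp_ratio` and the `layouts` table are illustrative only, and the layouts themselves are speculation from this post, not confirmed Pascal designs.

```python
from fractions import Fraction


def dp_ratio(blocks: int, cores_per_block: int, dp_per_block: int) -> Fraction:
    """DP:SP core ratio of one SM built from identical blocks."""
    sp_cores = blocks * cores_per_block
    dp_cores = blocks * dp_per_block
    return Fraction(dp_cores, sp_cores)


# (label, blocks, SP cores per block, DP cores per block)
# Hypothetical layouts taken from the speculation above.
layouts = [
    ("256c SMP, 4 x 64c, 32 DP/block", 4, 64, 32),
    ("256c SMP, 8 x 32c, 8 DP/block", 8, 32, 8),
    ("192c SMP, 6 x 32c, 16 DP/block", 6, 32, 16),
    ("128c SMP, 4 x 32c, 8 DP/block", 4, 32, 8),
]

for label, blocks, cores, dp in layouts:
    print(f"{label}: DP ratio {dp_ratio(blocks, cores, dp)}")
```

Using `Fraction` keeps the result in the same "1/2, 1/4" form used in the list.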
Maxwell's design of (4) 32bit 32c blocks at 1 warp scheduler per block (4 warp schedulers per 128c SMM) is much more efficient than Kepler's 192c SMX with 4 warp schedulers. Kepler is superscalar, with 4 (32c) vec32 sets and 2 (32c) vec16 sets per SMX. Maxwell is vec32 only, so scalar. The 8 64bit cores per SMX were trimmed to 4 per SMM in Maxwell.
The golden 24/7 OC mark for Kepler GK104/110 is 1.250/1.3GHz, for GM107 1.4GHz, and for GM204/GM206 1.5GHz. 16nm GP104 will probably be able to equal GM107 clocks. I'm going to enjoy learning how 16nm transistors react when overclocked with water cooling and air, while comparing 16nm temperature/power/OC profiles to 28nm Maxwell.
Pascal compute capability or "SM" version?: CUDA_ARCH=6.0 CUDA_ARCH=6.1 CUDA_ARCH=6.2 CUDA_ARCH=6.3
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
http://docs.nvidia.com/cuda/parallel-thread-execution/index.html
http://docs.nvidia.com/cuda/gpudirect-rdma/index.html
CUDA 7.5 has numerous updates. CUDA 8.0 even more so.
skgiven wrote: I'm interested in what the NV-Link will bring to real world crunchers. The possibility of adding up to 8 GPU's via an NV-Link sounds like smallish devices might be great connected up, assuming the reliance on the CPU isn't as big.
https://devblogs.nvidia.com/parallelforall/how-nvlink-will-enable-faster-easier-multi-gpu-computing/ An NVLink whitepaper is available. NVLink (GPU <-> GPU) and (GPU <-> host) offers 5 to 12x more bandwidth than PCIe 3.0. For IBM POWER8 and possibly ARM CPUs there will be CPU <-> GPU <-> GPU NVLink, while (PCIe 4.0?) Skylake Xeons may get GPU <-> GPU NVLink only. And maybe Kaby Lake i7/i5 will also receive GPU <-> GPU.
Programs that require PCIe bandwidth can be an issue with Maxwell (Kepler not so much). A tip: run CUDA-Z without and with ACEMD processes and it will show the types of memory performance being affected. Until consumer NVLink is mainstream, the Skylake Z170 and upcoming Kaby Lake 200-series chipset platforms offer the highest consumer WDDM OS ACEMD performance.
Linux and XP also see benefits with Skylake compared to Haswell or Ivy Bridge. Skylake CPUs were upgraded to a DMI3 link, while Haswell/Ivy Bridge have a DMI2 link (doubling the bandwidth rate). PCIe/DMI clocks are also decoupled in Skylake, with the new (fclk) running at 1 GHz or above. GPU runtimes with Skylake chipsets are mostly faster, with lower CPU times, than with Haswell.
On a powered riser (15-pin SATA, with three 12V pins, to 4-pin Molex), Maxwell GTX750 GPU runtimes lose around 50% performance running on x1 PCIe 2.0 compared to x4 PCIe 3.0. (Some risers have an area where a DIY PCIe 6-pin can be soldered.) My secondary crunching GPU, a GT630 (384 cores) with 64bit (800MHz) 1.6GHz DDR3 memory on a PCIe 2.0 x1 slot > USB3.0 riser, is around 30% slower than my 384 core GT650m with a 128bit GDDR5 (1GHz/2GHz/4GHz) memory interface connected to PCIe 2.0 x8. (My 970's show a 10% runtime difference between PCIe 3.0 x4 and x8 for certain WUs.)
A note about power consumption: Primegrid's Genefer OpenCL (4.3 PTX model) OCL3, and the about to be released OCL4, put an OCed 970 at 230W running n=21/22 WUs. An n=20 WU will push it to 210W with clocks at 1.4GHz or slightly above. Genefer uses the funnel shift instruction, which is a C.C 3.5+ feature.
Having more power phases on a GPU and motherboard helps keep electrical power and heat lower. I find this important if one's computer runs 24/7 for years at a time crunching. Both my GTX970's ran ACEMD at a 1.5GHz core clock (5 teraFLOPS) at 145W up to 165W. The 13 phase Zotac 970 consumes 8-15W less power on all ACEMD WUs than the 8 phase EVGA ACX 2.0+ 3973 model when both GPUs are on PCIe 3.0 x8 or x4. WDDM isn't fully utilizing the ACEMD OC scaling. My current MB is an all-digital 20 phase board (a refurb Z87 MSI MPower Max I picked up for $79 last July) that uses less CPU power than my former 16 phase Z97 MPower (also bought as a refurb, for $80) by about 3-7W, and around 10-14W less than a 12 phase all-digital ASUS board I tried out.
If you're building a Multi GPU board - a top notch phase count will help with long-term power consumption as does a custom PCB GPU compared to reference design. A 1200W (platinum) PSU with 102 AMP single 12V rail also helps.
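For the PCIe link comparisons earlier in this post, the theoretical per-direction bandwidth of each link can be estimated from the published per-lane transfer rate and line encoding of each PCIe generation. This is a rough sketch: the helper name `link_bandwidth_mb_s` is made up for illustration, and the results are ceilings that ignore protocol overhead, so they do not predict ACEMD runtimes.

```python
# (transfer rate in GT/s per lane, encoding efficiency) per PCIe generation
PCIE_GEN = {
    2: (5.0, 8 / 10),     # PCIe 2.0 uses 8b/10b encoding
    3: (8.0, 128 / 130),  # PCIe 3.0 uses 128b/130b encoding
}


def link_bandwidth_mb_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe link in MB/s."""
    gt_per_s, efficiency = PCIE_GEN[gen]
    # GT/s * efficiency = usable Gbit/s per lane; /8 gives GB/s; *1000 gives MB/s
    return gt_per_s * efficiency / 8 * 1000 * lanes


for name, gen, lanes in [
    ("PCIe 2.0 x1", 2, 1),
    ("PCIe 2.0 x8", 2, 8),
    ("PCIe 3.0 x4", 3, 4),
    ("PCIe 3.0 x8", 3, 8),
]:
    print(f"{name}: {link_bandwidth_mb_s(gen, lanes):.0f} MB/s")
```

Comparing these ceilings with CUDA-Z measurements taken with and without ACEMD running shows how much of each link the app actually uses.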

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 43129 - Posted: 1 Apr 2016 \| 20:48:13 UTC
	@NV-Link: it's meant for servers, so don't expect any benefit for crunchers anytime soon.
@Price: the first cards of the 28 nm generation provided lower power consumption and modern features, but were priced about as high as the similarly performing cards of the prior generation. I.e. there was no improvement in the price/performance ratio until they had been available for about half to three quarters of a year. The reason was limited fab capacity for the new process and initially higher costs (due to inferior yields), as well as recouping some R&D. Since 14/16 nm is going to be more expensive per area even at comparable yield, I expect this situation to repeat, maybe even a bit worse (for us).
@WCCFTech: they're wrong so often that I call them "WTF Tech" :p Sometimes they're right, but those specs don't make sense on too many points:
- Just 1000 MHz for the X80? That's significantly below the GTX980, despite a major fabrication technology improvement. Unless the 16 nm process is extremely clock-unfriendly (and the mobile chips suggest otherwise) this looks wrong.
- The bigger chips clock higher? It's always been the other way around for technical reasons.
- X80Ti and X80 Titan based on the same flagship chip GP100, yet using different memory interfaces? And the smaller one comes with 1/6 of all units deactivated? If you're building a massive GPU (and 6000 shaders is massive!) every additional mm² hurts. The last thing you want on such a chip is a 2nd memory interface going unused. A 512 bit GDDR5 interface at extremely high clock speeds (8 GHz) eats a significant amount of die space. I suspect they'd rather build 2 different GPUs for that and equip one with strong FP64 horsepower for additional differentiation.
MrS ____________ Scanning for our furry friends since Jan 2002

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43131 - Posted: 2 Apr 2016 \| 9:53:31 UTC - in response to Message 43129. Last modified: 2 Apr 2016 \| 12:13:33 UTC
	Based on shipping manifests there will be a big card and a range of smaller cards. Logs show a range of GPUs with 'Insurance' values of between 30,000INR and 56,000INR, and another GPU costing around 200,000INR (£2K). This assumes 699 represents the Pascal architecture, which might be wrong as it's also listed for mobile Quadros.
Date	HS Code	Description	Origin Country	Port of Discharge	Unit	Quantity	Value (INR)	Per Unit (INR)
26-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-1G413-0000-000 / NOT FOR SALE	United States	Banglore Air Cargo	NOS	2	61,380	30,690
26-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-1G610-0000-000	China	Banglore Air Cargo	NOS	5	291,803	58,361
26-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-1G411-0000-100	United States	Banglore Air Cargo	NOS	5	238,085	47,617
22-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-5G418-0503-000	United States	Banglore Air Cargo	NOS	2	483,089	241,544
22-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-5G418-0503-000	United States	Banglore Air Cargo	NOS	6	1,221,994	203,666
22-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-1G411-0000-100	United States	Banglore Air Cargo	NOS	15	657,083	43,806
19-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-1G411-0000-100 / NOT FOR SALE	United States	Banglore Air Cargo	NOS	2	112,473	56,237
18-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-1G411-0000-100	China	Banglore Air Cargo	NOS	10	381,192	38,119
16-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-12914-0000-100	China	Banglore Air Cargo	NOS	1	90,020	90,020
GM first appeared in the form of a GTX750Ti and GTX750. That was way back in Feb 2014. It's now over 2 years since the last generation of NV GPUs hit the streets. That was a successful launch within the entry level gaming GPU market and was followed up with similar mobile variants. 7 months later the high end cards (980 and 970) arrived - too long IMO.
This time I'm expecting a similar but not identical approach; entry level to mid range gaming cards to appear first, as that's the biggest market area and NV will want to flood it before AMD release their 14nm cards. Similar mobile cards to follow, and some time later the bigger cards. I'm not really expecting a replacement for the GTX980 or GTX980Ti yet, as the big card could be a GP Quadro, GTX1080Ti, GTX 10-Titan, or even a Tesla. That said, and going by lots of GPU pairs shipping, a dual mid-range card based on Pascal could well outperform high end GM GPUs such as the GTX980 and offer many an early upgrade route. I'm expecting enough enticement for GTX700 card holders to buy sooner, rather than later. Just releasing entry level cards wouldn't do that, but mid range cards should. Good chance Jen-Hsun Huang, NV's CEO, will announce something GP when he speaks at the GPU Technology Conference on 5th April. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,249,865,968 RAC: 4,089,892 Level Scientific publications	Message 43133 - Posted: 2 Apr 2016 \| 14:11:59 UTC - in response to Message 43131.
	BTW was there any dual GTX750Ti variant ever released?

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43134 - Posted: 2 Apr 2016 \| 14:20:14 UTC - in response to Message 43133.
	Not that I'm aware of. The GTX750Ti isn't SLI capable, so I doubt that such a card would be built on GM107. Not being SLI capable probably improved the card's performance/Watt. Perhaps a dual GTX960 would be viable, but with GP so close I doubt anyone would want to build it. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43154 - Posted: 5 Apr 2016 \| 18:19:16 UTC Last modified: 5 Apr 2016 \| 19:14:55 UTC
	https://devblogs.nvidia.com/parallelforall/inside-pascal/
Pascal Tesla P100 revealed at GTC (available 2017Q1 for OEM):
- 21.2 teraFLOPS FP16 / 10.6 teraFLOPS FP32 / 5.3 teraFLOPS FP64
- 3840 CUDA cores GPU
- 15.3 billion transistors / 610mm²
- 4MB L2 cache / 14MB shared register file (6MB Maxwell GM200)
- NVLink / HBM2
- Unified Memory
- Compute Preemption
- Compute Capability 6.0
- core clock 1328MHz / boost 1480MHz
- 64 cores per SM / (2) 32 core blocks (a warp scheduler for each block, dispatching two warp instructions per clock)
- 32 (double precision) cores / 16 load/store / 16 SFU per SM
- eight 512-bit memory controllers (4096 bits total)
- 16nm FinFET
- 6 Graphics Processing Clusters (10 SMs per GPC)
- 300W TDP
https://devblogs.nvidia.com/parallelforall/wp-content/uploads/2016/04/gp100_SM_diagram-624x452.png
https://devblogs.nvidia.com/parallelforall/wp-content/uploads/2016/04/gp100_block_diagram-1-624x368.png
https://devblogs.nvidia.com/parallelforall/cuda-8-features-revealed/ CUDA 8 will be available in August 2016 and there will be a release candidate available around June.
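The teraFLOPS figures in this post can be roughly reproduced from the core counts and the boost clock, counting one fused multiply-add as 2 FLOPs per core per clock. A sketch under one assumption that is not stated in the post: that the shipping Tesla P100 has 56 of the full GPU's 60 SMs enabled (the 3840 cores quoted above correspond to all 60 SMs). The names below are illustrative only.

```python
def peak_tflops(cores: int, clock_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Peak throughput in teraFLOPS (a fused multiply-add counts as 2 FLOPs)."""
    return cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12


SM_TOTAL = 6 * 10   # 6 GPCs x 10 SMs per GPC, from the post
SM_ENABLED = 56     # ASSUMPTION: SMs enabled on Tesla P100, not stated in the post
FP32_PER_SM = 64    # from the post
FP64_PER_SM = 32    # from the post
BOOST_MHZ = 1480    # from the post

fp32 = peak_tflops(SM_ENABLED * FP32_PER_SM, BOOST_MHZ)
fp64 = peak_tflops(SM_ENABLED * FP64_PER_SM, BOOST_MHZ)
fp16 = 2 * fp32     # FP16 runs at twice the FP32 rate in the quoted figures

print(f"full GPU cores: {SM_TOTAL * FP32_PER_SM}")
print(f"FP16 {fp16:.1f} / FP32 {fp32:.1f} / FP64 {fp64:.1f} teraFLOPS")
```

With the full 60 SMs the same formula gives a higher FP32 figure than the quoted 10.6, which is why the 56-SM assumption is used here.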

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,249,865,968 RAC: 4,089,892 Level Scientific publications	Message 43155 - Posted: 5 Apr 2016 \| 19:29:49 UTC - in response to Message 43154.
	Well, this is not the product we are waiting for.

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43160 - Posted: 9 Apr 2016 \| 12:58:40 UTC - in response to Message 43155.
	That GP100 is the Tesla (or similar manifestation), which doesn't concern most crunchers. They will probably be subject to pre-contracts and ALL be going to data centres first (as with the Titan 3 years ago), then to OEMs (maybe in the form of Quadros) and then possibly/eventually to everyone else in the form of a GTX1080Ti, if it makes sense in 2017 to do that. It would have been nice to see a midrange card announced, a 1050Ti or 1060, but that didn't happen. For now, the GPUs we are interested in will likely be GP104 and down; from the GTX1080 (or whatever name it will be) down to perhaps a GTX1040. So while there probably won't be a direct replacement for the GTX980Ti any time soon, there will likely be smaller models (GTX1080) that might offer similar or competitive performance to the GTX980Ti, and if not it is highly likely that two lesser GP cards would outperform a GTX980Ti. Even today, two GTX970's (£360) match a GTX980Ti (£520). The 980Ti didn't even ship until after the Titan X. While the GTX980Ti is a flagship gaming card and is only 10 months old, it was almost an afterthought in the GM line-up and never really made financial sense. The latest rumour suggests a GTX1080 and 1070 will turn up in June (with an announcement between May 31st and June 4th at Computex, or not...), but mass shipment probably wouldn't happen until July 2016. The same rumours also suggest 7400M to 7900M transistors (not far off a GTX980Ti's 8000M and well over a GTX980's 5000M). Even at 7400M transistors (92.5% of a GTX980Ti's) it would only take a slight architectural improvement, or an 8% shader clock boost, to match a GTX980Ti. That said, there could also be bottlenecks and architectural constraints that would inhibit performance, and it probably won't work here straight out of the box.
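The transistor comparison above comes down to two ratios. A rough sketch, assuming (as the post does) that throughput scales with transistor count times shader clock, which is only a crude proxy; the variable names are illustrative.

```python
GP104_RUMOURED_M = 7400  # low end of the rumoured range, in millions of transistors
GTX980TI_M = 8000        # GTX980Ti figure quoted in the post, in millions

# Share of a GTX980Ti's transistor budget
share = GP104_RUMOURED_M / GTX980TI_M

# Clock uplift needed to make up the shortfall if everything else were equal
clock_uplift = GTX980TI_M / GP104_RUMOURED_M - 1

print(f"transistor share: {share:.1%}")
print(f"clock uplift needed: {clock_uplift:.1%}")
```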
I guess if you really want another card now, your electricity is cheap, and you are not bothered about it losing value or no longer being one of the fastest cards around in 6 months, then a GTX980Ti might still be an option, but it's not something I would go for. The best time to buy a GTX10xx to crunch here would be once we know they work, and then when AMD release their 14nm range, which will probably be some time after NV's launch (price drop). ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help

Betting Slip Send message Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0 Level Scientific publications	Message 43161 - Posted: 9 Apr 2016 \| 13:53:39 UTC - in response to Message 43160. Last modified: 9 Apr 2016 \| 13:55:48 UTC
	I picked up a new Asus GTX 980 Ti on eBay for £450 with free P&P. I figured that it would be a while before a competitor comes out and then for GPUGrid to make it work. At £450 I should be able to resell at not too much of a loss. By the way, graphics cards go for crazy prices on eBay; I had to wait ages to get a price like £450. Completely off track: "The Druids Nephew" EW in The Grand National.

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43162 - Posted: 9 Apr 2016 \| 14:54:25 UTC - in response to Message 43161.
	If your money's backing GP to be a late runner with poor odds of an early app appearance, £450 was a decent punt for an each way bet and a head's length better than sticking one on the nose at £520, so to speak. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help

Betting Slip Send message Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0 Level Scientific publications	Message 43163 - Posted: 9 Apr 2016 \| 16:11:45 UTC - in response to Message 43162.
	HaHa, well done SK :-)

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,249,865,968 RAC: 4,089,892 Level Scientific publications	Message 43165 - Posted: 9 Apr 2016 \| 17:49:14 UTC Last modified: 9 Apr 2016 \| 17:59:58 UTC
	This GP100 is intended for professional use. I don't think it will ever be released in the form of a GeForce card. It's a "replacement" for the GK110 (GTX Titan, Titan Black, Titan Z), as there was no such chip in the Maxwell product line. The lesser Pascal chips we are waiting for probably won't have that many DP units, NVLink or HBM2. The leaked specifications are surely inaccurate regarding the clocks, as they could be as high as the GP100's. By using 16nm technology it is theoretically possible to fit 3 times as many components in the same area as with 28nm (28/16=1.75; 1.75^2=3.0625), but I don't think NVidia wants to produce chips that large for the gaming market (smaller chips achieve higher yields), so I expect physically smaller chips than the GP100 in the high-end segment of the gaming cards. They could still be twice as fast as the GTX980Ti, which is plenty (depending on what AMD will have). The effect of the WDDM overhead could be even more detrimental than on the present high-end cards. Also, maybe there won't be Windows XP drivers for the Pascal series at all, in which case I will have to switch to Linux on some of my hosts.
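The area scaling estimate in this post can be checked in a couple of lines. This is a sketch of ideal scaling only: it assumes the node names behave like true linear feature sizes, which real processes do not guarantee, and the function name is made up for illustration.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal gain in components per unit area when linear feature size shrinks."""
    linear_shrink = old_nm / new_nm
    # Area scales with the square of the linear dimension
    return linear_shrink ** 2


gain = ideal_density_gain(28, 16)
print(f"28nm -> 16nm ideal density gain: {gain}x")
```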

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 43166 - Posted: 9 Apr 2016 \| 19:55:28 UTC - in response to Message 43165.
	There was a comment a while ago on the POEM forum that it was not clear whether the HBM2 stacked memory had the fine-grained address ability (if that is the term) required for optimum performance for their work. Whether that applies to GPUGrid I don't know, but I would not jump into the lake without checking for rocks first.

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43167 - Posted: 9 Apr 2016 \| 23:58:23 UTC - in response to Message 43166.
	Almost a cert it will be GDDR5@8GHz on the gaming cards. Agree that GP104/6... will not be that similar to GP100:- DP just isn't needed on mid-range to high end gaming cards, never mind entry level cards. Would be cautious about performance though - who knows what bottlenecks there will be. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help

Jozef J Send message Joined: 7 Jun 12 Posts: 112 Credit: 1,140,895,172 RAC: 315,774 Level Scientific publications	Message 43169 - Posted: 10 Apr 2016 \| 10:57:48 UTC
	http://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw=980Ti&_sop=15 It has begun, and there will be even more in future: lots of NVIDIA 980Ti cards for sale at a good price. But import charges to Europe ruin it all. There are also a lot of good servers for sale on eBay in the USA for CPU-only crunching, but import charges to Europe are crazy. http://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_dcat=11211&Number%2520of%2520Processors=8&_nkw=server&_sop=16

nanoprobe Send message Joined: 26 Feb 12 Posts: 184 Credit: 222,376,233 RAC: 0 Level Scientific publications	Message 43181 - Posted: 12 Apr 2016 \| 16:01:23 UTC - in response to Message 43076.
	Skyler Baker wrote: Wonder what they will cost also.
Your first born. 🙀

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43235 - Posted: 22 Apr 2016 \| 13:07:19 UTC
	For those interested: GP100 Pascal architecture Whitepaper is now available. Whitepaper access and download from Nvidia's website requires registration (including already registered developer accounts.)

Jozef J Send message Joined: 7 Jun 12 Posts: 112 Credit: 1,140,895,172 RAC: 315,774 Level Scientific publications	Message 43236 - Posted: 22 Apr 2016 \| 18:33:38 UTC
	http://videocardz.com/59266/nvidia-pascal-gp104-gpu-pictured-up-close

Wrend Send message Joined: 9 Nov 12 Posts: 51 Credit: 522,101,722 RAC: 0 Level Scientific publications	Message 43237 - Posted: 22 Apr 2016 \| 20:38:28 UTC
	Nice. Looks like my Titan Blacks may have finally found a worthy replacement... XD Looks like we're maybe getting some decent double precision (FP64) capabilities again too? I just started doing some tests on DP using MilkyWay@Home to see if it can make use of it well enough on my Titan Black cards. (Sorry, bit of a break from GPUGrid.) ____________ My BOINC Cruncher, Minecraft Multiserver, Mobile Device Mainframe, and Home Entertainment System/Workstation: http://www.overclock.net/lists/display/view/id/4678036#

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43320 - Posted: 7 May 2016 \| 14:02:41 UTC
	GEFORCE GTX 1080 ($599 MSRP), available May 27:
- 7.2 billion transistors
- 294mm² die
- 2560 NVIDIA CUDA cores
- 1607 MHz base clock
- 1733 MHz boost clock
- GPU Boost 3.0
- 8.5 (FP32) teraFLOPS
- 180W reference TDP; 100% power limit (real TDP) likely 225W+
- (1) 8 pin
Memory specs:
- 10 Gbps memory speed
- 8 GB GDDR5X
- 256-bit interface width
- 320 GB/sec bandwidth
GeForce GTX 1070: $379 MSRP, available June 10.
Last night's Pascal 1080 unveil demo was running at 2.1GHz on air, similar to Maxwell's LN2 cooled clocks. A 2.1GHz+ boost will be to Pascal what 1.5GHz is to Maxwell (for 24/7 OC). Early adopters can help GPUGRID beta test a new ACEMD app - count me in. Once general public availability (June~July) is secured, when will the project announce a new CUDA phase? Will Pascal be a repeat of GK110 initial ACEMD production difficulties? Titan X / GTX 980Ti is now EOL/DOA.
Maxwell's real TDPs (power limit):
- GTX980Ti = 325~350W
- GTX980 = 250~275W
- GTX970 = 220~240W
- GTX960 = 160~180W
- GTX750 = 60~75W
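The 320 GB/sec figure in the memory specs follows from the per-pin data rate and the bus width. A minimal sketch; the function name is illustrative, and the second call uses the 8 Gbps GDDR5 speed mentioned elsewhere in this thread purely as a comparison point on the same 256-bit bus.

```python
def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin rate (Gbit/s) x bus width / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


gddr5x = memory_bandwidth_gb_s(10, 256)  # GTX 1080 figures from the spec list
gddr5 = memory_bandwidth_gb_s(8, 256)    # hypothetical comparison: 8 Gbps GDDR5, same bus

print(f"10 Gbps x 256-bit: {gddr5x:.0f} GB/s")
print(f" 8 Gbps x 256-bit: {gddr5:.0f} GB/s")
```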

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,249,865,968 RAC: 4,089,892 Level Scientific publications	Message 43321 - Posted: 7 May 2016 \| 15:08:19 UTC - in response to Message 43320. Last modified: 7 May 2016 \| 15:13:09 UTC
	Now that's the product we are waiting for. If its newer architecture isn't any faster per core, it would need a 1540MHz GPU clock to match the performance of an overclocked GTX980Ti (@1400MHz). So at its standard boost clock it should be ~12.5% faster; but if the 2.1GHz is true, then it should be ~36.4% faster than a GTX980Ti@1400MHz while consuming only ~2/3 of the electricity. I thought that this chip would have 3072 CUDA cores, so it has only 5/6 of what I expected, but it will be enough to top the GTX980Ti as it has higher clocks.
	Will Pascal be a repeat of GK110 initial ACEMD production difficulties?
	As happened at the release of every previous GPU generation, I expect that there will be some difficulties. (The present app won't work with the new cards.)
	ID: 43321 \| Rating: 0 \| rate: / Reply Quote
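The 1540MHz, ~12.5% and ~36.4% figures in the post above follow from scaling clock speed by core count. A minimal sketch of that arithmetic, assuming performance scales linearly with cores × clock (the post's own premise) and taking the GTX980Ti's 2816 CUDA cores from NVIDIA's public spec, since the thread does not state that number; the helper names are illustrative only.

```python
GTX980TI_CORES = 2816  # assumption: public GTX 980 Ti spec, not quoted in the thread
GTX1080_CORES = 2560   # from the GTX 1080 spec list earlier in the thread


def equivalent_clock_mhz(ref_cores: int, ref_clock_mhz: float, new_cores: int) -> float:
    """Clock the new chip needs so that cores * clock matches the reference card."""
    return ref_cores * ref_clock_mhz / new_cores


def relative_gain_percent(clock_mhz: float, break_even_mhz: float) -> float:
    """Percentage speed-up over the reference at a given clock."""
    return (clock_mhz / break_even_mhz - 1) * 100


if __name__ == "__main__":
    break_even = equivalent_clock_mhz(GTX980TI_CORES, 1400, GTX1080_CORES)
    print(f"Break-even clock: {break_even:.0f} MHz")
    for clock in (1733, 2100):
        print(f"At {clock} MHz: {relative_gain_percent(clock, break_even):+.1f}%")
```

This ignores any per-core architectural gains, memory bandwidth and CPU dependency, so it is only a rough first estimate.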

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43326 - Posted: 8 May 2016 \| 14:46:24 UTC - in response to Message 43321. Last modified: 8 May 2016 \| 14:57:50 UTC
	If it's not faster by its newer architecture, it should have 1540MHz GPU clock to achieve the performance of an overclocked GTX980Ti (@1400MHz). So by its standard boost clock it should be ~12.5% faster; but if the 2.1GHz is true, then it should be ~36.4% faster than a GTX980Ti@1400MHz while consuming only ~2/3 of the electricity.
	2.1GHz is true - check out Nvidia's blog. The GP104 mid-tier 16nm die is going to destroy the 3rd gen 28nm big die (GM200) on performance/watt. 28nm big die generations: GK110a > GK110b (GK210) > GM200. There's no doubt that the GTX980ti is a strong 32-bit chip, but with all that heat it's tough to cool a dense setup when OCed. IMHO: water cooling GM200 was really the only way to go - WC still has a 350W power limit running a monster program.
	I thought that this chip would have 3072 CUDA cores, so it has only the 5/6 of what I've expected, but it will be enough to top the GTX980Ti as it has higher clocks.
	3072 CUDA cores are possible as a 2nd or 3rd gen mid-tier 16nm part (GP204 or GV104 Volta) with a slightly bigger die - similar to 4th (or is it 5th?) gen 28nm GM204 (398mm²) vs. 1st/2nd gen 28nm GK104 (294mm²). (3rd gen mid/low tier 28nm is GM107.) Maybe the big die GP100 becomes a GeForce with >3000 CUDA cores - or Volta might be the first (GeForce) big die? Either way, the GPU performance/watt advancements are impressive compared to the last couple of CPU generations.
	A non-Pascal question: does your GM200 hit 1.5GHz stable on any projects, and what's the highest (stable) OC for ACEMD? I've always thought GM200 at 1.5GHz on ACEMD was possible. I've run GM204 @1.5GHz since having them. (A lot of Maxwell boards push the 1.5GHz boundary.)
	FYI: To find any Maxwell (and Pascal) PCB (BIOS) power limit, run PrimeGrid's OCL4 n=20/21/22 Genefer (created by developer Yves Gallot) or the SiSoftware CUDA scientific benchmark while OCed.
	ID: 43326 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43346 - Posted: 10 May 2016 \| 7:55:36 UTC - in response to Message 43326. Last modified: 10 May 2016 \| 7:58:07 UTC
	2560 NVIDIA CUDA Cores would be a nice step up from 2048 (GTX980), which is primarily what we should be comparing a 1080 with. 8GB GDDR5X also doubles the 980's 4GB GDDR5. The 1733 MHz Boost Clock might actually boost to ~1850MHz without any tweaking. If it outperforms the GTX980Ti for throughput then even better, but even if it doesn't it's highly likely to be better in terms of performance/Watt. For the app to work here it will probably need to be recompiled with the latest CUDA Tool Kit. ACEMD is a complex app, unlike some others which might work straight out of the box or with only a little bit of work. If the cards don't initially work here they might work elsewhere. Worst case scenario is that GPUGrid has to wait on a new CUDA tool kit and then doesn't have the time over the summer holidays to redevelop and test the app. Best case scenario is that a fully functional CUDA tool kit is available on launch day and Gianni and Matt are available, up to speed with any CUDA advances/changes, and get a Pascal to test with. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43346 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43651 - Posted: 30 May 2016 \| 11:33:14 UTC Last modified: 30 May 2016 \| 11:34:42 UTC
	https://xdevs.com/guide/pascal_oc/
	Many GeForce GTX 1080 (gamer) reviews recently confirmed an average performance increase of ~30% versus a stock GTX980ti. (1) OCed GTX1080 typical performance = (2.3) GTX970. (As of now, no reviewers have published a folding@home or SiSoftware scientific CUDA benchmark.)
	GTX1080 fastest SiSoftware single precision scientific CUDA result as of today: http://ranker.sisoftware.net/show_device.php?
	Total benchmark score 877.78GFLOPS
	GEMM (General Matrix Multiply) 4259.25GFLOPS
	FFT (Fast Fourier Transform) 180.90GFLOPS
	N-Body (N Body Simulation) 3962.22GFLOPS
	My GTX970 at 1602MHz for reference (fastest GTX970 in the SiSoftware database):
	Total benchmark score 666.70GFLOPS
	GEMM (General Matrix Multiply) 2844.11GFLOPS
	FFT (Fast Fourier Transform) 156.28GFLOPS
	N-Body (N Body Simulation) 2684.82GFLOPS
	GTX1080 is a 314mm² die with 640 CUDA cores per GPC (4 GPCs) / 2560 CUDA cores / 20 SMs * 128 CUDA cores per SM / (80) 32-core blocks / 160 dispatch units / 80 warp schedulers / 640 ld/store units / 640 SFU / 160 TMU / 64 ROP / 2MB 2nd level cache / 48-112KB 1st level cache with 96KB shared memory for each SM / 65536 32-bit registers per SM at 16384 per 32-core block.
	The GP104 SM core structure is the same as the Maxwell SMM (4 32c blocks, 128 CUDA cores per SM) while GP100 has 64 CUDA / 32 DP cores per SM. The GP104 256-bit memory interface is split into 8 32-bit controllers rather than 4 64-bit controllers. 8 ROPs are tied to each of Pascal's 32-bit partitions. GM200/204 have 16 ROPs for every 64-bit controller. Each GP104 GPC has 5 SMs (640 CUDA cores). A GP100 GPC amounts to the same 640 (10 SMs of 64 CUDA cores). The GP104 DP:SP ratio is Maxwell's 1/32: (1) DP core per 32 SP block (4 DP per 128c SMP). GP104 FP16 will not have double the FP32 output as GP100 does. GP104 has the same 1:1 FP32/FP16 ratio as Maxwell. The new 4.0 polymorph engine / SIP video encode / decode / display / memory compression cache(s) / etc. were redesigned. Pascal also has other enhancements (see reviews).
	GTX1070 specs were also confirmed - it's cut down more than Maxwell's 970 was from the 980 (1664 vs. 2048: [3] SMM, 384 CUDA cores). The GTX1070 has 1920 CUDA cores vs. the GTX1080's 2560. That's a whole (640 core) GPC worth of cut - 5 (128c) SMPs. An OCed GTX1070 won't be able to match stock GTX1080 FLOPs even if the GTX1070 is OCed beyond a stable 2100MHz. Current performance on a GTX970 at 1.5GHz equals the stock GTX980 running WUs with no lag or choke points.
	From the looks of most reviews, Pascal's 24/7 compute OC scaling sweet spot is ~2100MHz. Once above 2100MHz, Pascal cores lose some steam. (Maxwell's is 1500 ~ 1550 ~ 1600MHz.)
	http://videocardz.com/60547/comparison-of-custom-geforce-gtx-1080-pcbs
	Zotac's 16VRM + 3Mem (I think it's really 20VRM + 2AUX + 2Mem = 24 total phases) has over 3 times the power delivery of a reference (5+1) model. If any crunchers find this card available, please link it here. Most reports indicate the Zotac PGF (with OC+ microcontroller) is specific to Asian markets. It's possible that the AMP Extreme GTX1080 will get the PGF PCB in the USA, though that's not confirmed (no official word as of yet). The PGF is a great card to cool passively (uni-block and no radiator fans) if one has a (hard-line) water cooled system with 240mm+ radiator(s). IMO numerous fans are not required for radiators.
	A Zotac Extreme GTX 970 13 phase with OC+ module (70.7% GPU-Z ASIC rating) and an EVGA 8 phase (81.1% ASIC rating), both at >1.5GHz, eat most GTX 980s for breakfast over at POEM and PrimeGrid Genefer. Prime numbers are helpful in bio-medical research.
	Phases matter: at stock voltages and similar core temps, the EVGA's power consumption is 240W (1451MHz) computing PrimeGrid n=21/22 Genefer WUs. The Zotac's is 210W (1481MHz). Genefer is certainly the most powerful (Maxwell) app available on the BOINC platform, with the lowest stable overclock, by a -100MHz twin-turbo gap. My 24/7 electric usage and bill went up compared to CUDA6.5 ACEMD. These (2) 970's have a 10-15W difference computing ACEMD WUs at 1.5GHz.
	ID: 43651 \| Rating: 0 \| rate: / Reply Quote
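The unit counts and benchmark comparison in the post above can be reproduced from a handful of constants. A minimal sketch using only numbers quoted in the post; the function and variable names are made up for illustration, and modelling the GTX1070 as "one GPC's worth of SMs disabled" follows the post's description rather than any official block diagram.

```python
# GP104 layout as described in the post.
GPCS = 4
SMS_PER_GPC = 5
CORES_PER_SM = 128
CORES_PER_BLOCK = 32


def gp104_counts(active_sms: int) -> dict:
    """Derive CUDA core and 32-core block counts from the number of active SMs."""
    cores = active_sms * CORES_PER_SM
    return {"sms": active_sms, "cores": cores, "blocks": cores // CORES_PER_BLOCK}


gtx1080 = gp104_counts(GPCS * SMS_PER_GPC)                # full chip
gtx1070 = gp104_counts(GPCS * SMS_PER_GPC - SMS_PER_GPC)  # one GPC's worth of SMs cut

# SiSoftware results quoted in the post, in GFLOPS: (GTX1080, GTX970 at 1602MHz).
sisoft = {
    "Total": (877.78, 666.70),
    "GEMM": (4259.25, 2844.11),
    "FFT": (180.90, 156.28),
    "N-Body": (3962.22, 2684.82),
}
speedup = {name: new / old for name, (new, old) in sisoft.items()}

if __name__ == "__main__":
    print("GTX1080:", gtx1080)
    print("GTX1070:", gtx1070)
    for name, ratio in speedup.items():
        print(f"{name}: GTX1080 is {ratio:.2f}x the GTX970 result")
```

The per-test ratios differ noticeably (FFT gains far less than GEMM or N-Body), which is a reminder that a single "total score" hides how much the speed-up depends on the workload.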

davebodger Send message Joined: 30 Jul 11 Posts: 2 Credit: 7,052,262 RAC: 0 Level Scientific publications	Message 43767 - Posted: 11 Jun 2016 \| 15:34:54 UTC
	Just tried my nice shiny new Gigabyte GTX1080 on GPUGRID and I just got Computation Errors on the two WUs I downloaded. :-( Asteroids was the same but Collatz works OK, so I know it's not the card. I've turned off the project now (Allow no new tasks) until you say it's OK for me to try again. I don't want to waste WUs or time, as I presume you need to adapt for Pascal?
	ID: 43767 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,249,865,968 RAC: 4,089,892 Level Scientific publications	Message 43770 - Posted: 11 Jun 2016 \| 18:45:16 UTC - in response to Message 43767. Last modified: 11 Jun 2016 \| 18:46:44 UTC
	Just tried my nice shiny new Gigabyte GTX1080 on GPUGRID and I just got Computation Errors on the two WUs I downloaded. :-( Asteroids was the same but Collatz works OK, so I know it's not the card. I've turned off the project now (Allow no new tasks) until you say it's OK for me to try again. I don't want to waste WUs or time, as I presume you need to adapt for Pascal?
	Exactly. See this post:
	GDF wrote: Hi, we expect great performance from the GTX1080 but at the moment we don't have any. As soon as we have them, we need to recompile the code for them and check it. At the moment, the app will crash on any new Pascal GPU. gdf
	ID: 43770 \| Rating: 0 \| rate: / Reply Quote

peeticek_LubosKrutek Send message Joined: 30 Nov 08 Posts: 7 Credit: 62,377,145 RAC: 0 Level Scientific publications	Message 43772 - Posted: 12 Jun 2016 \| 5:55:34 UTC - in response to Message 43767.
	What times are you getting for Collatz tasks? RAC? Did you try any other projects besides GPUGRID and Asteroids? If yes, what results? Thanks
	ID: 43772 \| Rating: 0 \| rate: / Reply Quote

wiyosaya Send message Joined: 22 Nov 09 Posts: 114 Credit: 589,114,683 RAC: 0 Level Scientific publications	Message 44022 - Posted: 20 Jul 2016 \| 18:34:22 UTC
	For anyone interested, AnandTech published Compute benchmark results for the consumer founders edition cards. http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/28 ____________
	ID: 44022 \| Rating: 0 \| rate: / Reply Quote

Bedrich Hajek Send message Joined: 28 Mar 09 Posts: 485 Credit: 11,130,472,824 RAC: 15,320,537 Level Scientific publications	Message 43051 - Posted: 20 Mar 2016 \| 19:48:33 UTC - in response to Message 43046. Last modified: 20 Mar 2016 \| 19:51:46 UTC
	http://wccftech.com/nvidia-pascal-specs/ @+ _ It looks like on the surface these cards are going to be twice as fast as their predecessors, while using a little less power. But here is the catch, how much bigger is the WDDM lag going to be? Also, on high CPU dependent WUs like the GERARD A2ARs, how much bigger is that lag going to be? I guess, you have to take the good with the bad.
	ID: 43051 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43052 - Posted: 20 Mar 2016 \| 21:25:52 UTC - in response to Message 43051. Last modified: 20 Mar 2016 \| 21:30:27 UTC
	It looks like on the surface these cards are going to be twice as fast as their predecessors, while using a little less power.
	Based on those specs I would guess a performance boost of around 45%, but in terms of performance/Watt it could be around 2.2 times better.
	But here is the catch, how much bigger is the WDDM lag going to be?
	There might be a lot of catches and things we don't know about yet. For example, the ACEMD app's performance might not scale well on these high density cards. That's already the case with the GTX Titan X and GTX980Ti (to a lesser extent). The 7010 to 8000MHz DDR would reduce the Memory Controller Unit burden somewhat but probably not sufficiently to prevent a bottleneck in itself. The architecture might sort that out, or not... The memory bandwidth probably wouldn't be an issue with HBM2, but that seems to be limited to the supposed X80Titan, which might be a bit different in other ways too (double precision), and >>pricier.
	Also, on high CPU dependent WUs like the GERARD A2ARs, how much bigger is that lag going to be?
	Who knows what architectural advancements and CUDA magic might reduce such CPU dependency?
	I guess, you have to take the good with the bad.
	I'm interested in what the NV-Link will bring to real world crunchers. The possibility of adding up to 8 GPUs via an NV-Link sounds like smallish devices might be great connected up, assuming the reliance on the CPU isn't as big. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43052 \| Rating: 0 \| rate: / Reply Quote
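Taken at face value, the two guesses in the post above (about 45% more performance and about 2.2 times the performance/Watt) imply a power draw for the new card relative to its predecessor. A minimal sketch of that one-line relationship; the function name is made up for illustration and the inputs are the poster's estimates, not measurements.

```python
def implied_power_ratio(perf_ratio: float, perf_per_watt_ratio: float) -> float:
    """Power ratio implied by a performance ratio and a performance-per-Watt ratio.

    perf_per_watt = perf / power, so power_new / power_old = perf_ratio / perf_per_watt_ratio.
    """
    return perf_ratio / perf_per_watt_ratio


if __name__ == "__main__":
    ratio = implied_power_ratio(1.45, 2.2)
    print(f"Implied power draw: {ratio:.0%} of the predecessor's")
```

That works out to roughly two thirds of the older card's power, which is in line with the "~2/3 of the electricity" estimate given later in the thread for the GTX1080 against a GTX980Ti.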

Zarck Send message Joined: 16 Aug 08 Posts: 145 Credit: 328,473,995 RAC: 0 Level Scientific publications	Message 43053 - Posted: 21 Mar 2016 \| 0:06:57 UTC - in response to Message 43047. Last modified: 21 Mar 2016 \| 0:07:09 UTC
	Déjà vu on PrimeGrid, Zarck is everywhere ;) Thanks Zarck!
	I'm often around here -> http://forum.boinc-af.org @+ _ ____________
	ID: 43053 \| Rating: 0 \| rate: / Reply Quote

[CSF] Thomas H.V. DUPONT Send message Joined: 20 Jul 14 Posts: 732 Credit: 126,845,366 RAC: 156,524 Level Scientific publications	Message 43054 - Posted: 21 Mar 2016 \| 8:11:14 UTC - in response to Message 43053.
	Sincere greetings from the whole CRUNCHERS SANS FRONTIERES team to L'Alliance Francophone! :) ____________ [CSF] Thomas H.V. Dupont Founder of the team CRUNCHERS SANS FRONTIERES 2.0 www.crunchersansfrontieres
	ID: 43054 \| Rating: 0 \| rate: / Reply Quote

Skyler Baker Send message Joined: 19 Feb 16 Posts: 19 Credit: 140,656,383 RAC: 0 Level Scientific publications	Message 43076 - Posted: 24 Mar 2016 \| 22:59:37 UTC
	Wonder what they will cost also.
	ID: 43076 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43077 - Posted: 25 Mar 2016 \| 2:36:07 UTC - in response to Message 43052.
	http://www.gputechconf.com/
	Keep an eye out during the week of April 3rd for real Pascal information. At GTC 2015 the (GM200) Titan X launched. Maybe the GPU-Z database will reveal a real Pascal soon. The SiSoftware benchmarking database is a good place to find new GPUs.
	skgiven wrote: Who knows what architectural advancements and cuda magic might reduce such CPU dependency?
	A year and a half ago in the "Maxwell now" thread I asked: Has dynamic parallelism (C.C 3.5/5.0/5.2) been introduced to ACEMD? Or Unified Memory from CUDA 6.0? Unified memory is a C.C 3.0+ feature. Quoted from the newest CUDA programming guide: "new managed memory space in which all processors see a single coherent memory image with a common address space. A processor refers to any independent execution unit with a dedicated MMU. This includes both CPUs and GPUs of any type and architecture." In your opinion: how can GPUGRID's occupied SM/SMX/SMM be further enhanced and refined for generational (CUDA C.C) differences? Compatibility is important, as is finding the most efficient code path from CUDA programming. How can we further advance ACEMD? CUDA 5.0/PTX3.1~~~>6.5/4.1 provides new commands/instructions.
	GPUGRID developer Matt (MJH) replied: We have cc-specific optimisations for each of the most performance sensitive kernels. Generally don't use any of the features introduced post CUDA 4.2 though, nothing there we particularly need. I expect the GM204 performance will be markedly improved once I have my hands on one.
	One difference is Maxwell's L1 cache size of 80kB per SMM (SM 5.0/5.2), while Kepler's L1 is 16kB per SMX (SM 3.0/3.5), possibly configured up to 48kB per SMX. Maxwell's L2 cache is also larger than Kepler's. (See the Maxwell now thread for more info, and the CUDA toolkit.) Pascal likely has a larger L1/L2 cache and more efficient instruction throughput, as well as more registers per SMP and per thread. If the DP ratio is 1/3 then upping an SMP might take it back to Kepler's 192c SMX size.
	For Pascal with a 1/3 DP to SP ratio, (6) 32c blocks with (1) warp scheduler for each 32c block, or maybe (3) 64c blocks per SMP with 2 warp schedulers per block, could work. Maxwell's current (4) 32c blocks per SMM with one warp scheduler for each 32c block would only work with a 1/2, 1/4 or 1/8 DP to SP ratio. Some rumors are pointing to a 1/3 ---> 1/24 (consumer GeForce GPU) DP ratio Pascal design, although it's also possible the Tesla or Titan is 1/2 or 1/4. Pascal could be (8) 32c blocks with 8 warp schedulers per 256 core SMP - either 64 or 128 DP cores per SMP at 8 or 16 DP per 32c block. It's possible a block can grow to 64c with 16 or 32 DP in each block - a 256c SMP with (4) 64c blocks.
	Pascal "SMP" possible configurations with 16 bit cores (without TMU/ROP counts):
	--- 256c SMP, (4) 64c blocks - 8 warp schedulers - 2 warps per 64c block - 8/16/32 DP per block, for a 1/8 or 1/4 or 1/2 DP ratio. 32 bit cores (2x16bit).
	--- 256c SMP, (8) 32c blocks - 8 warp schedulers - 1 warp per 32c block - 8 or 16 DP cores per (32c) block at a 1/4 or 1/2 DP to SP ratio.
	--- 192 core SMP, (6) 32c blocks - 16 DP cores per 32c block for Tesla/Titans (96 DP in a 192c SMP at 1/2 ratio), 1 warp scheduler per 32c block.
	--- 192c SMP, (3) 64c blocks - 32 DP per block - 96 DP cores per SMP - 1/3 DP to SP ratio. Consumer GPUs knocked down to 8 DP cores per 64c block. 2 warps per 64c block.
	--- 192c SMP with a similar design to Kepler: 64 DP cores per SMP with a 32c block per warp.
	--- 192c SMP with (4) 48c blocks (superscalar), 16 bit cores (128 or 256 bit SIMD lane) and (32) 32 bit cores per block - 12 DP cores per block - 1/3 DP ratio.
	--- 128c SMP, (2) 64c blocks and 2 warp schedulers per block - 16 or 32 DP cores per block - 1/4 or 1/2 DP ratio. Consumer GPUs 8 DP cores per block.
	--- 128c SMP, (4) 32c blocks and 1 warp scheduler per block - 8 or 16 DP cores per block - 1/4 or 1/2 DP ratio. Consumer GPUs 4 DP per block.
	Maxwell's design of (4) 32-bit 32c blocks at 1 warp scheduler per block (4 warp schedulers per 128c SMM) is much more efficient than Kepler's 192c SMX with 4 warp schedulers. Kepler is a superscalar design with 4 (32c) vec32 sets and 2 (32c) vec16 sets per SMX. Maxwell is vec32 only, so scalar. The 8 64-bit cores per SMX were trimmed to 4 per SMM in Maxwell.
	Kepler GK104/110's golden 24/7 OC mark is 1.250/1.3GHz - GM107 1.4GHz - GM204/GM206 1.5GHz. 16nm GP104 is probably able to equal GM107 clocks. I'm going to enjoy learning how 16nm transistors react when overclocked with water cooling and air, while comparing temperature/power/OC 16nm profiles to 28nm Maxwell.
	??Pascal compute capability or "SM" version??: CUDA_ARCH=6.0 CUDA_ARCH=6.1 CUDA_ARCH=6.2 CUDA_ARCH=6.3
	http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
	http://docs.nvidia.com/cuda/parallel-thread-execution/index.html
	http://docs.nvidia.com/cuda/gpudirect-rdma/index.html
	CUDA 7.5 has numerous updates. CUDA 8.0 even more so.
	I'm interested in what the NV-Link will bring to real world crunchers. The possibility of adding up to 8 GPU's via an NV-Link sounds like smallish devices might be great connected up, assuming the reliance on the CPU isn't as big.
	https://devblogs.nvidia.com/parallelforall/how-nvlink-will-enable-faster-easier-multi-gpu-computing/
	An NVLink whitepaper is available. NVLink (GPU >><< GPU) and (GPU >><< host) has 5 to 12x more bandwidth than PCIe3.0. For IBM POWER8 and possibly ARM CPUs there'll be CPU <<>> GPU <<>> GPU NVLink, while maybe (PCIe4.0?) Skylake Xeons get GPU <<>> GPU NVLink only. And maybe Kabylake i7/i5 also receive GPU <<>> GPU.
	Programs that require PCIe bandwidth can be an issue with Maxwell. (Kepler not so much.) A tip: run CUDA-Z without and with ACEMD processes and it will show the types of memory performance being affected. Until consumer NVLink is mainstream, the Skylake z170 and upcoming Kabylake z200 series chipset platforms offer the highest consumer WDDM OS ACEMD performance.
	Linux and XP also see benefits with Skylake compared to Haswell or Ivy Bridge. The Skylake CPU is upgraded to DMI3 while Haswell/Ivy have a DMI2 link (doubling the bandwidth rate). PCIe/DMI clocks are also decoupled in Skylake, with the new (fclk) running at 1 GHz or above. GPU runtimes with Skylake chipsets are mostly faster, with lower CPU times, than Haswell.
	On a powered riser (15-pin SATA with (3) 12V pins to 4-pin Molex), Maxwell GTX750 GPU runtimes lose around 50% performance running on x1 PCIe 2.0 compared to x4 PCIe 3.0. (Some risers have an area where a DIY PCIe 6-pin can be soldered.) My (800MHz) 1.6GHz DDR3 64-bit memory GT630 (384 cores) secondary crunching GPU on the PCIe2.0 x1 slot > USB3.0 riser is around 30% slower than my 384 core GT650m with a GDDR5 128-bit (1GHz/2GHz/4GHz) memory interface that's connected to PCIe2.0 x8. (My 970's show a 10% runtime difference with PCIe3.0 x4 vs. x8 for certain WUs.)
	A note about power consumption: PrimeGrid's Genefer OpenCL (4.3 PTX model) OCL3 and the about to be released OCL4 put an OCed 970 at 230W running n=21/22 WUs. An n=20 WU will push 210W with clocks at 1.4GHz or slightly above. Genefer uses the funnel shift instruction, which is a C.C 3.5+ feature. Having more power phases on a GPU and motherboard helps keep electrical power and heat lower. I find this important if one's computer runs 24/7 for years at a time crunching. Both my GTX970's ran ACEMD at (5 teraFLOPS) 1.5GHz core clock (145W up to 165W) - the Zotac 13 phase 970 consumes 8-15W less power on all ACEMD WUs than an EVGA ACX2.0+ 3973 model with 8 phases (when both GPUs are on PCIe3.0 x8 or x4). WDDM isn't fully utilizing the ACEMD OC scaling.
	My current MB is an all digital 20 phase (a refurb z87 MSI Mpower Max I picked up for $79 last July) that uses less power on the CPU than my former 16 phase z97 Mpower (also bought as a refurb, for $80) by about 3-7W, and around 10-14W less than a 12 phase all digital ASUS board I tried out.
If you're building a Multi GPU board - a top notch phase count will help with long-term power consumption as does a custom PCB GPU compared to reference design. A 1200W (platinum) PSU with 102 AMP single 12V rail also helps.
	ID: 43077 \| Rating: 0 \| rate: / Reply Quote
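To put the riser comparison in the post above into numbers, the theoretical per-direction bandwidth of a PCIe link can be worked out from the per-lane transfer rate and the line encoding. A minimal sketch; the rates and encodings (5 GT/s with 8b/10b for PCIe 2.0, 8 GT/s with 128b/130b for PCIe 3.0) come from the PCIe specifications rather than from the thread, the function name is made up for illustration, and real-world throughput is lower because of protocol overhead.

```python
# Per-lane raw transfer rate (GT/s) and encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_gbs(generation: str, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a link of the given width."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency / 8 * lanes  # 8 bits per byte


if __name__ == "__main__":
    for generation, lanes in (("2.0", 1), ("2.0", 8), ("3.0", 4), ("3.0", 8)):
        print(f"PCIe {generation} x{lanes}: {pcie_bandwidth_gbs(generation, lanes):.2f} GB/s")
```

On these figures a PCIe 3.0 x4 link has nearly eight times the bandwidth of PCIe 2.0 x1, while the runtime penalty reported in the post is around 50%, so the slowdown is clearly not proportional to link bandwidth alone.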

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 43129 - Posted: 1 Apr 2016 \| 20:48:13 UTC
	@NV-Link: it's meant for servers, so don't expect any benefit for crunchers anytime soon.
	@Price: the 1st cards from the 28 nm generation provided lower power consumption and modern features, but were priced about as high as the similarly performing cards of the prior generation. I.e. there was no improvement of the price/performance ratio until they had been available for about 1/2 to 3/4 of a year. The reason was limited fab capacity for the new process and initially higher costs (due to inferior yields), as well as recouping some R&D. Since 14/16 nm is going to be more expensive per area even at comparable yield, I expect this situation to repeat, maybe even a bit worse (for us).
	@WCCFTech: they're wrong so often that I call them "WTF Tech" :p Sometimes they're right, but those specs don't make sense in too many points:
	- just 1000 MHz for the X80? That's significantly below the GTX980, despite a major fabrication technology improvement. Unless the 16 nm process is extremely clock-unfriendly (and the mobile chips suggest otherwise) this looks wrong.
	- the bigger chips clock higher? It's always been the other way around for technical reasons.
	- X80Ti and X80 Titan based on the same flagship chip GP100, yet use different memory interfaces? And the smaller one comes with 1/6 of all units deactivated? If you're building a massive GPU (and 6000 shaders is massive!) every additional mm² hurts. The last thing you want on such a chip is a 2nd memory interface going unused. A 512 bit GDDR5 interface at extremely high clock speeds (8 GHz) eats a significant amount of die space. I suspect they'd rather build 2 different GPUs for that and equip one with strong FP64 horsepower for additional differentiation.
	MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 43129 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43131 - Posted: 2 Apr 2016 \| 9:53:31 UTC - in response to Message 43129. Last modified: 2 Apr 2016 \| 12:13:33 UTC
	Based on shipping manifests there will be a Big card and a range of smaller cards. Logs show a range of GPU's with 'Insurance' values of between 30,000INR and 56,000INR and another GPU costing around 200,000INR (£2K). Assumes 699 represents the Pascal architecture, which might be wrong as it's also listed in mobile Quadros.
	Date	HS Code	Description	Origin Country	Port of Discharge	Unit	Quantity	Value (INR)	Per Unit (INR)
	26-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-1G413-0000-000 / NOT FOR SALE	United States	Banglore Air Cargo	NOS	2	61,380	30,690
	26-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-1G610-0000-000	China	Banglore Air Cargo	NOS	5	291,803	58,361
	26-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-1G411-0000-100	United States	Banglore Air Cargo	NOS	5	238,085	47,617
	22-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-5G418-0503-000	United States	Banglore Air Cargo	NOS	2	483,089	241,544
	22-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-5G418-0503-000	United States	Banglore Air Cargo	NOS	6	1,221,994	203,666
	22-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-1G411-0000-100	United States	Banglore Air Cargo	NOS	15	657,083	43,806
	19-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-1G411-0000-100 / NOT FOR SALE	United States	Banglore Air Cargo	NOS	2	112,473	56,237
	18-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-1G411-0000-100	China	Banglore Air Cargo	NOS	10	381,192	38,119
	16-Mar-2016	84733092	COMPUTER GRAPHICS CARDS, 699-12914-0000-100	China	Banglore Air Cargo	NOS	1	90,020	90,020
	GM first appeared in the form of a GTX750Ti and GTX750. That was way back in Feb 2014. It's now over 2 years since the last Generation of NV GPU hit the streets. That was a successful launch within the entry level gaming GPU market and was followed up with similar mobile variants. 7 months later the high end cards (980 and 970) arrived - too long IMO.
	This time I'm expecting a similar but not identical approach: entry level to mid range gaming cards appearing first, as that's the biggest market area and NV will want to flood it before AMD release their 14nm cards, similar mobile cards to follow, and some time later the bigger cards. I'm not really expecting a replacement for the GTX980 or GTX980Ti yet, as the big card could be a GP Quadro, GTX1080Ti, GTX 10-Titan, or even a Tesla. That said, and going by lots of GPU pairs shipping, a dual mid-range card based on Pascal could well outperform high end GM GPUs such as the GTX980 and offer many an early upgrade route. I'm expecting enough enticement for GTX700 card holders to buy sooner, rather than later. Just releasing entry level cards wouldn't do that, but mid range cards should. Good chance Jen-Hsun Huang, NV's CEO, will announce something GP when he speaks at the GPU Technology Conference on 5th April. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43131 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,249,865,968 RAC: 4,089,892 Level Scientific publications	Message 43133 - Posted: 2 Apr 2016 \| 14:11:59 UTC - in response to Message 43131.
	BTW was there any dual GTX750Ti variant ever released?
	ID: 43133 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43134 - Posted: 2 Apr 2016 \| 14:20:14 UTC - in response to Message 43133.
	Not that I'm aware of. The GTX750Ti isn't SLI capable, so I doubt that a dual card would be built on GM107. Not being SLI capable probably improved the card's performance/Watt. Perhaps a dual GTX960 would be viable, but with GP so close I doubt anyone would want to build it. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43134 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43154 - Posted: 5 Apr 2016 \| 18:19:16 UTC Last modified: 5 Apr 2016 \| 19:14:55 UTC
	https://devblogs.nvidia.com/parallelforall/inside-pascal/
	Pascal Tesla P100 revealed at GTC (available 2017Q1 for OEM):
	21.2 teraFLOPS FP16
	10.6 teraFLOPS FP32
	5.3 teraFLOPS FP64
	3840 CUDA cores GPU
	15.3 billion transistors / 610mm²
	4MB L2 cache / 14MB shared register file (6MB Maxwell GM200)
	NVlink / HBM2
	Unified Memory
	Compute Preemption
	Compute Capability 6.0
	core clock 1328MHz / boost 1480MHz
	64 cores per SM / (2) 32 core blocks (a warp scheduler for each block, dispatching two warp instructions per clock)
	32 (double precision) cores / 16 load/store / 16 SFU per SM
	eight 512-bit memory controllers (4096 bits total)
	16nm FinFET
	6 Graphics Processing Clusters (10 SM's per GPC)
	300W TDP
	https://devblogs.nvidia.com/parallelforall/wp-content/uploads/2016/04/gp100_SM_diagram-624x452.png
	https://devblogs.nvidia.com/parallelforall/wp-content/uploads/2016/04/gp100_block_diagram-1-624x368.png
	https://devblogs.nvidia.com/parallelforall/cuda-8-features-revealed/
	CUDA 8 will be available in August 2016 and there will be a release candidate available around June.
	ID: 43154 \| Rating: 0 \| rate: / Reply Quote
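
The throughput figures in the post above follow from the usual peak estimate of cores × 2 FLOPs (one fused multiply-add) × clock. The sketch below is only a rough check, not anything taken from NVIDIA's material: it assumes the 56 enabled SMs that NVIDIA lists for the Tesla P100 product (the post quotes the full 60-SM die), and the function name `peak_tflops` is purely illustrative.

```python
def peak_tflops(cores, clock_mhz, flops_per_core_per_clock=2):
    """Theoretical peak throughput in TFLOPS.

    Counts one fused multiply-add (2 FLOPs) per core per clock by default.
    """
    return cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12


BOOST_MHZ = 1480          # boost clock quoted in the post
FP32_CORES_PER_SM = 64    # from the post
FP64_CORES_PER_SM = 32    # from the post
FULL_GP100_SMS = 60       # 6 GPCs x 10 SMs, as in the post
ENABLED_SMS = 56          # assumption: SM count NVIDIA lists for the Tesla P100

fp32 = peak_tflops(ENABLED_SMS * FP32_CORES_PER_SM, BOOST_MHZ)
fp64 = peak_tflops(ENABLED_SMS * FP64_CORES_PER_SM, BOOST_MHZ)
fp16 = 2 * fp32           # GP100 runs FP16 at twice the FP32 rate
full_die_fp32 = peak_tflops(FULL_GP100_SMS * FP32_CORES_PER_SM, BOOST_MHZ)

print(f"FP16 {fp16:.1f} / FP32 {fp32:.1f} / FP64 {fp64:.1f} TFLOPS")
print(f"Full 3840-core die at the same clock: {full_die_fp32:.1f} TFLOPS FP32")
```

With 56 SMs the result rounds to the 21.2 / 10.6 / 5.3 teraFLOPS quoted above; the full 3840-core die at the same clock would come out somewhat higher.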

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,249,865,968 RAC: 4,089,892 Level Scientific publications	Message 43155 - Posted: 5 Apr 2016 \| 19:29:49 UTC - in response to Message 43154.
	Well, this is not the product we are waiting for.
	ID: 43155 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43160 - Posted: 9 Apr 2016 \| 12:58:40 UTC - in response to Message 43155.
	That GP100 is the Tesla (or a similar manifestation), which doesn't concern most crunchers. They will probably be subject to pre-contracts and ALL be going to data centres first (as with Titan 3 years ago), then to OEMs (maybe in the form of Quadros) and then possibly/eventually in the form of a GTX1080i to everyone else, if it makes sense in 2017 to do that. It would have been nice to see a midrange card announced (a 1050Ti or 1060), but that didn't happen.
	For now, the GPUs we are interested in will likely be GP104 and down; from the GTX1080 (or whatever name it will be) down to perhaps a GTX1040. So while there probably won't be a direct replacement for the GTX980Ti any time soon, there will likely be smaller models (GTX1080) that might offer similar or competitive performance to the GTX980Ti, and if not it would be highly likely that two lesser GP cards would outperform a GTX980Ti. Even today, two GTX970s (£360) match a GTX980Ti (£520). The 980Ti didn't even ship until after the Titan X. While the GTX980Ti is a flagship gaming card and is only 10 months old, it was almost an afterthought in the GM line-up and never really made financial sense.
	The latest rumour suggests a GTX1080 and 1070 will turn up in June (with an announcement between May 31st and June 4th at Computex, or not...), but mass shipment probably wouldn't happen until July 2016. The same rumours also suggest 7400M to 7900M transistors (not far off a GTX980Ti's 8000M and well over a GTX980's 5000M). Even at 7400M transistors (92.5% of a GTX980Ti's) it would only take a slight architectural improvement to match a GTX980Ti, or an 8% shader clock boost. That said, there could also be bottlenecks and architectural constraints that would inhibit performance, and it probably won't work here straight out of the box.
	I guess if you really want another card now, your electricity is cheap, and you are not bothered about it losing value or no longer being one of the fastest cards around in 6 months, then a GTX980Ti might still be an option, but it's not something I would go for. The best time to buy a GTX10xx to crunch here would be when we know they work, and then when AMD release their 14nm range, which will probably be some time after NV's launch (price drop). ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43160 \| Rating: 0 \| rate: / Reply Quote
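
The transistor and price comparison in the post above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes throughput scales linearly with transistor count × clock, which is a crude simplification, and the helper name `clock_boost_to_match` is hypothetical.

```python
GTX980TI_TRANSISTORS_M = 8000           # millions, as quoted in the post
RUMOURED_TRANSISTORS_M = (7400, 7900)   # rumoured range quoted in the post


def clock_boost_to_match(transistors_m, reference_m=GTX980TI_TRANSISTORS_M):
    """Fractional clock increase needed if throughput ~ transistors x clock."""
    return reference_m / transistors_m - 1


for count in RUMOURED_TRANSISTORS_M:
    share = count / GTX980TI_TRANSISTORS_M
    boost = clock_boost_to_match(count)
    print(f"{count}M: {share:.1%} of a GTX980Ti, needs +{boost:.1%} clock")

# Price comparison quoted in the post (GBP).
TWO_GTX970 = 360
ONE_GTX980TI = 520
saving = 1 - TWO_GTX970 / ONE_GTX980TI
print(f"Two GTX970s cost {saving:.0%} less than one GTX980Ti")
```

Under that assumption the low end of the rumoured range needs roughly the 8% clock boost mentioned in the post.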

Betting Slip Send message Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0 Level Scientific publications	Message 43161 - Posted: 9 Apr 2016 \| 13:53:39 UTC - in response to Message 43160. Last modified: 9 Apr 2016 \| 13:55:48 UTC
	I picked up a new Asus GTX 980 Ti on eBay for £450 with free P&P. I figured that it would be a while before a competitor comes out, and then longer for GPUGrid to make it work. At £450 I should be able to resell at not too much of a loss. By the way, graphics cards go for crazy prices on eBay; I had to wait ages to get a price like £450. Completely off track: "The Druids Nephew" EW in The Grand National.
	ID: 43161 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43162 - Posted: 9 Apr 2016 \| 14:54:25 UTC - in response to Message 43161.
	If your money's backing GP to be a late runner with poor odds of an early app appearance, £450 was a decent punt for an each way bet and a head's length better than sticking one on the nose at £520, so to speak. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43162 \| Rating: 0 \| rate: / Reply Quote

Betting Slip Send message Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0 Level Scientific publications	Message 43163 - Posted: 9 Apr 2016 \| 16:11:45 UTC - in response to Message 43162.
	HaHa, well done SK :-)
	ID: 43163 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,249,865,968 RAC: 4,089,892 Level Scientific publications	Message 43165 - Posted: 9 Apr 2016 \| 17:49:14 UTC Last modified: 9 Apr 2016 \| 17:59:58 UTC
	This GP100 is intended for professional use. I don't think it will ever be released in the form of a GeForce card. It's a "replacement" for the GK110 (GTX Titan, Titan Black, Titan Z), as there was no such chip in the Maxwell product line. The lesser Pascal chips we are waiting for probably won't have that many DP units, NVLink or HBM2. The leaked specifications are surely inaccurate regarding the clocks, as they could be as high as the GP100's. By using 16nm technology it is theoretically possible to fit 3 times as many components into the same area as with 28nm (28/16=1.75; 1.75^2=3.0625), but I don't think NVidia wants to produce chips that large for the gaming market (smaller chips achieve higher yields), so I expect physically smaller chips than the GP100 in the high-end segment of the gaming cards. They could still be twice as fast as the GTX980Ti, which is plenty (depending on what AMD will have). The WDDM overhead could hurt performance even more than on the present high-end cards; also, there may be no Windows XP drivers for the Pascal series at all, in which case I will have to switch to Linux on some of my hosts.
	ID: 43165 \| Rating: 0 \| rate: / Reply Quote
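
The 28nm to 16nm estimate in the post above is a simple area-scaling argument. A minimal sketch of it follows, assuming every feature shrinks in proportion to the node name; real processes do not scale this cleanly, since node names are largely marketing labels, so the result is an upper-bound style estimate. The function name is illustrative only.

```python
def ideal_density_gain(old_node_nm, new_node_nm):
    """Area-based density gain if every feature shrank with the node name."""
    linear_shrink = old_node_nm / new_node_nm
    return linear_shrink ** 2


gain = ideal_density_gain(28, 16)
print(f"28nm -> 16nm ideal density gain: {gain:.4f}x")

# Die area needed for the same component count under that ideal scaling.
GP100_AREA_MM2 = 610   # GP100 die size quoted earlier in the thread
equivalent_28nm_area = GP100_AREA_MM2 * gain
print(f"A 610mm² 16nm die would need ~{equivalent_28nm_area:.0f}mm² at 28nm")
```

The second figure illustrates why a chip with GP100's component count would not have been practical at 28nm.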

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 43166 - Posted: 9 Apr 2016 \| 19:55:28 UTC - in response to Message 43165.
	There was a comment a while ago on the POEM forum that it was not clear whether the HBM2 stacked memory had the fine-grained address ability (if that is the term) required for optimum performance for their work. Whether that applies to GPUGrid I don't know, but I would not jump into the lake without checking for rocks first.
	ID: 43166 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43167 - Posted: 9 Apr 2016 \| 23:58:23 UTC - in response to Message 43166.
	Almost a cert it will be GDDR5@8GHz on the gaming cards. Agree that GP104/6... will not be that similar to GP100:- DP just isn't needed on mid-range to high end gaming cards, never mind entry level cards. Would be cautious about performance though - who knows what bottlenecks there will be. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43167 \| Rating: 0 \| rate: / Reply Quote

Jozef J Send message Joined: 7 Jun 12 Posts: 112 Credit: 1,140,895,172 RAC: 315,774 Level Scientific publications	Message 43169 - Posted: 10 Apr 2016 \| 10:57:48 UTC
	http://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_nkw=980Ti&_sop=15 It has begun, and from now on there will be a lot of NVIDIA 980Ti cards for sale at good prices, but import charges to Europe ruin it all. There are also a lot of good servers for CPU-only crunching for sale on eBay in the USA, but import charges to Europe are crazy. http://www.ebay.com/sch/i.html?_from=R40&_sacat=0&_dcat=11211&Number%2520of%2520Processors=8&_nkw=server&_sop=16
	ID: 43169 \| Rating: 0 \| rate: / Reply Quote

nanoprobe Send message Joined: 26 Feb 12 Posts: 184 Credit: 222,376,233 RAC: 0 Level Scientific publications	Message 43181 - Posted: 12 Apr 2016 \| 16:01:23 UTC - in response to Message 43076.
	Wonder what they will cost also. Your first born. 🙀
	ID: 43181 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43235 - Posted: 22 Apr 2016 \| 13:07:19 UTC
	For those interested: GP100 Pascal architecture Whitepaper is now available. Whitepaper access and download from Nvidia's website requires registration (including already registered developer accounts.)
	ID: 43235 \| Rating: 0 \| rate: / Reply Quote

Jozef J Send message Joined: 7 Jun 12 Posts: 112 Credit: 1,140,895,172 RAC: 315,774 Level Scientific publications	Message 43236 - Posted: 22 Apr 2016 \| 18:33:38 UTC
	http://videocardz.com/59266/nvidia-pascal-gp104-gpu-pictured-up-close
	ID: 43236 \| Rating: 0 \| rate: / Reply Quote

Wrend Send message Joined: 9 Nov 12 Posts: 51 Credit: 522,101,722 RAC: 0 Level Scientific publications	Message 43237 - Posted: 22 Apr 2016 \| 20:38:28 UTC
	Nice. Looks like my Titan Blacks may have finally found a worthy replacement... XD Looks like we're maybe getting some decent double precision (FP64) capabilities again too? Just started doing some tests on DP using MilkyWay@Home to see if it can make use of it well enough on my Titan Black cards. (Sorry, bit of a break from GPUGrid.) ____________ My BOINC Cruncher, Minecraft Multiserver, Mobile Device Mainframe, and Home Entertainment System/Workstation: http://www.overclock.net/lists/display/view/id/4678036#
	ID: 43237 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43320 - Posted: 7 May 2016 \| 14:02:41 UTC
	GEFORCE GTX 1080 ($599 MSRP), available May 27:
	7.2 billion transistors
	294mm² die
	2560 NVIDIA CUDA Cores
	1607 (MHz) Base Clock
	1733 (MHz) Boost Clock
	GPU Boost 3.0
	8.5 (FP32) TeraFLOPs
	180W reference TDP
	100% power limit (real TDP) likely 225W+
	(1) 8 pin
	Memory Specs:
	10 Gbps Memory Speed
	8 GB GDDR5X
	256-bit Interface Width
	320 (GB/sec) Bandwidth
	GeForce GTX 1070 ($379 MSRP), available June 10
	Last night's Pascal 1080 unveil demo was running at 2.1GHz on air, similar to Maxwell's LN2-cooled clocks. Pascal's 2.1GHz+ boost will be what 1.5GHz is for Maxwell (for a 24/7 OC).
	Early adopters can help GPUGRID beta test a new ACEMD app. Count me in. Once general public availability (June~July) is secured, when does the Project announce a new CUDA phase? Will Pascal be a repeat of GK110 initial ACEMD production difficulties?
	Titan X / GTX 980ti is now EOL/DOA.
	Maxwell's real TDPs (power limit):
	GTX980ti = 325~350W
	GTX980 = 250~275W
	GTX970 = 220~240W
	GTX960 = 160~180W
	GTX750 = 60~75W
	ID: 43320 \| Rating: 0 \| rate: / Reply Quote
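
The memory bandwidth in the spec list above follows directly from the per-pin data rate and the bus width, and the FP32 figure can be bracketed from the core count and clocks. A small sketch of both calculations, assuming one fused multiply-add (2 FLOPs) per core per clock; the function names are illustrative and not from any NVIDIA tool.

```python
def memory_bandwidth_gb_s(data_rate_gbps, bus_width_bits):
    """Peak bandwidth: per-pin data rate times bus width, converted to bytes."""
    return data_rate_gbps * bus_width_bits / 8


def peak_tflops(cores, clock_mhz):
    """Peak FP32 TFLOPS, counting one fused multiply-add (2 FLOPs) per clock."""
    return cores * 2 * clock_mhz * 1e6 / 1e12


bandwidth = memory_bandwidth_gb_s(10, 256)   # 10 Gbps GDDR5X, 256-bit bus
at_base = peak_tflops(2560, 1607)            # base clock from the list above
at_boost = peak_tflops(2560, 1733)           # boost clock from the list above

print(f"Bandwidth: {bandwidth:.0f} GB/s")
print(f"FP32 peak: {at_base:.2f} TFLOPS at base, {at_boost:.2f} TFLOPS at boost")
```

The bandwidth matches the listed 320 GB/sec, and the 8.5 TeraFLOPs quoted in the post falls between the base-clock and boost-clock results.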

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,249,865,968 RAC: 4,089,892 Level Scientific publications	Message 43321 - Posted: 7 May 2016 \| 15:08:19 UTC - in response to Message 43320. Last modified: 7 May 2016 \| 15:13:09 UTC
	Now that's the product we are waiting for. Unless its newer architecture makes it faster, it would need a 1540MHz GPU clock to match the performance of an overclocked GTX980Ti (@1400MHz). So at its standard boost clock it should be ~12.5% faster; but if the 2.1GHz is true, then it should be ~36.4% faster than a GTX980Ti@1400MHz while consuming only ~2/3 of the electricity. I thought that this chip would have 3072 CUDA cores, so it has only 5/6 of what I expected, but that will be enough to top the GTX980Ti as it has higher clocks.
	"Will Pascal be a repeat of GK110 initial ACEMD production difficulties?"
	As happened at the release of every previous GPU generation, I expect that there will be some difficulties. (The present app won't work with the new cards.)
	ID: 43321 \| Rating: 0 \| rate: / Reply Quote
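
The percentages in the post above can be reproduced by assuming throughput is proportional to cores × clock, with no architectural gain. The sketch below does exactly that; the GTX980Ti core count of 2816 is not stated in the thread and is taken from its public specification, and the helper name is hypothetical.

```python
GTX980TI_CORES = 2816   # assumption: public GTX980Ti spec, not stated in the thread
GTX1080_CORES = 2560    # from the announcement quoted earlier in the thread
EXPECTED_CORES = 3072   # what the poster had expected


def matching_clock_mhz(ref_cores, ref_clock_mhz, cores):
    """Clock at which `cores` equal ref_cores x ref_clock_mhz in raw throughput."""
    return ref_cores * ref_clock_mhz / cores


break_even = matching_clock_mhz(GTX980TI_CORES, 1400, GTX1080_CORES)
gain_at_boost = 1733 / break_even - 1
gain_at_2100 = 2100 / break_even - 1
core_fraction = GTX1080_CORES / EXPECTED_CORES

print(f"Break-even clock: {break_even:.0f} MHz")
print(f"Gain at 1733 MHz boost: {gain_at_boost:.1%}")
print(f"Gain at 2100 MHz: {gain_at_2100:.1%}")
print(f"Cores relative to the expected 3072: {core_fraction:.3f}")
```

This gives the 1540MHz break-even clock and the ~12.5% and ~36.4% figures, and 2560/3072 is the 5/6 mentioned in the post.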

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43326 - Posted: 8 May 2016 \| 14:46:24 UTC - in response to Message 43321. Last modified: 8 May 2016 \| 14:57:50 UTC
	"If it's not faster by its newer architecture, it should have 1540MHz GPU clock to achieve the performance of an overclocked GTX980Ti (@1400MHz). So by its standard boost clock it should be ~12.5% faster; but if the 2.1GHz is true, then it should be ~36.4% faster than a GTX980Ti@1400MHz while consuming only ~2/3 of the electricity."
	2.1GHz is true - check out Nvidia's blog. The GP104 mid-tier 16nm die is going to destroy the 3rd gen 28nm big die (GM200) in performance/watt. 28nm big die generations: GK110a > GK110b (GK210) > GM200. There's no doubting that the GTX980ti is a strong 32bit chip, but with all that heat it's tough cooling a dense setup OCed. IMHO: water cooling GM200 was really the only way to go, though WC still has a 350W power limit when running a monster program.
	"I thought that this chip would have 3072 CUDA cores, so it has only the 5/6 of what I've expected, but it will be enough to top the GTX980Ti as it has higher clocks."
	3072 CUDA is possible as a 2nd or 3rd gen mid-tier 16nm (GP204 or GV104 Volta) part with a slightly bigger die, similar to 4th (or is it 5th) gen 28nm GM204 (398mm²) vs. 1st/2nd gen 28nm GK104 (294mm²). (3rd gen mid/low tier 28nm is GM107.) Maybe the big die GP100 becomes a GeForce with >3000CUDA, or Volta might be the first (GeForce) big die? Either way the GPU performance/watt advancements are impressive compared to the last couple of CPU generations.
	A non-Pascal question: does your GM200 hit 1.5GHz stable on any projects, and what's the highest (stable) OC for ACEMD? I've always thought GM200 at 1.5GHz on ACEMD was possible. I've run GM204 @1.5GHz since having them. (A lot of Maxwell boards push the 1.5GHz boundary.)
	FYI: to find any Maxwell (and Pascal) PCB (BIOS) power limit, run PrimeGrid's OCL4 n=20/21/22 Genefer (created by developer Yves Gallot) or the SiSoftware CUDA scientific benchmark while OCed.
	ID: 43326 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 43346 - Posted: 10 May 2016 \| 7:55:36 UTC - in response to Message 43326. Last modified: 10 May 2016 \| 7:58:07 UTC
	2560 NVIDIA CUDA Cores would be a nice step up from 2048 (GTX980), which is primarily what we should be comparing a 1080 with. 8GB GDDR5X also doubles the 980's 4GB GDDR5. The 1733 (MHz) Boost Clock might actually boost to ~1850MHz without any tweaking. If it outperforms the GTX980Ti for throughput then even better, but even if it doesn't it's highly likely to be better in terms of performance/Watt. For the app to work here it will probably need to be recompiled with the latest CUDA toolkit. ACEMD is a complex app, unlike some others which might work straight out of the box or with only a little bit of work. If the cards don't initially work here they might work elsewhere. Worst case scenario is that GPUGrid has to wait on a new CUDA toolkit and then doesn't have the time over the summer holidays to redevelop and test the app. Best case scenario is that a fully functional CUDA toolkit is available on launch day and Gianni and Matt are available and up to speed with any CUDA advances/changes and get a Pascal to test with. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 43346 \| Rating: 0 \| rate: / Reply Quote

eXaPower Send message Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level Scientific publications	Message 43651 - Posted: 30 May 2016 \| 11:33:14 UTC Last modified: 30 May 2016 \| 11:34:42 UTC
	https://xdevs.com/guide/pascal_oc/
	Many GeForce GTX 1080 (gamer) reviews recently confirmed an average performance increase of ~30% versus a stock GTX980ti. (1) OCed GTX1080 typical performance = (2.3) GTX970. (As of now, no reviewers have published a Folding@home or SiSoftware scientific CUDA benchmark.)
	GTX1080 fastest SiSoftware single precision scientific CUDA result as of today: http://ranker.sisoftware.net/show_device.php?
	Total benchmark score 877.78GFLOPS
	GEMM (General Matrix Multiply) 4259.25GFLOPS
	FFT (Fast Fourier Transform) 180.90GFLOPS
	N-Body (N Body Simulation) 3962.22GFLOPS
	My GTX970 at 1602MHz for reference (fastest GTX970 in the SiSoftware database):
	Total benchmark score 666.70GFLOPS
	GEMM (General Matrix Multiply) 2844.11GFLOPS
	FFT (Fast Fourier Transform) 156.28GFLOPS
	N-Body (N Body Simulation) 2684.82GFLOPS
	GTX1080 is a 314mm² die with 640CUDA per GPC (4 GPCs) / 2560CUDA / 20SM * 128CUDA per SM / (80) 32CUDA blocks / 160 dispatch units / 80 warp schedulers / 640 ld/store units / 640 SFU / 160TMU / 64ROP / 2MB 2nd level cache / 48-112KB 1st level cache with 96KB shared memory for each SM / 65536 32bit registers per SM at 16384 per 32CUDA block.
	The GP104 (SM) core structure is the same as the Maxwell SMM (4 32c blocks, 128CUDA per SM), while GP100 has 64CUDA / 32DP per SM. The GP104 256bit memory interface is split into 8 32bit rather than 4 64bit controllers. 8 ROP are tied to each of Pascal's 32bit partitions; GM200/204 have 16 ROP for every 64bit controller. Each GP104 GPC has 5 SMs (640CUDA). A GP100 GPC amounts to the same 640 (10 SMs of 64CUDA).
	The GP104 DP:SP ratio is (Maxwell's) 1/32: (1) DP core per 32SP block (4DP per 128c SMP). GP104 (FP16) will not have double the FP32 output as GP100 does; GP104 has the same 1:1 FP32/FP16 ratio as Maxwell.
	The new 4.0 polymorph engine / SIP video encode / decode / display / memory compression / cache(s) / etc. were redesigned. Pascal also has other enhancements (see reviews).
	GTX1070 specs were also confirmed. It's cut down more than Maxwell's (970/980) 1664 vs. 2048, which was [3] SMM or 384CUDA. GTX1070 has 1920CUDA vs. the GTX1080's 2560CUDA. That's a (640 core) GPC's worth of cut: 5 (128C) SMPs. An OCed GTX1070 won't be able to match stock GTX1080 FLOPs even if the GTX1070 is OCed beyond a stable 2100MHz. Currently a GTX970 at 1.5GHz equals the stock GTX980 running WUs with no lag or choke points.
	From the looks of most reviews, the Pascal 24/7 compute OC scaling sweet spot is ~2100MHz. Once above 2100MHz Pascal cores lose some steam. (Maxwell's is 1500 ~ 1550 ~ 1600MHz.)
	http://videocardz.com/60547/comparison-of-custom-geforce-gtx-1080-pcbs
	Zotac's 16VRM + 3Mem (I think it's really 20VRM + 2AUX + 2Mem = 24 total phases) has over 3 times the power delivery of a (reference 5+1) model. If (any) crunchers find this card available, please link it here. Most reports indicate the Zotac PGF (with OC+ microcontroller) is specific to Asian markets only. It's possible that the AMP Extreme GTX1080 will get the PGF PCB in the USA, though this is not confirmed (no official word as of yet). (The PGF is a great card to cool passively (uni-block and no radiator fans) if one has a (hard-line) water cooled system with 240mm+ radiator(s). IMO numerous fans are not required for radiators.)
	A Zotac Extreme GTX 970 13 phase with OC+ module (70.7% GPU-Z ASIC rating) and an EVGA 8 phase (81.1% ASIC rating), both at >1.5GHz, eat most GTX 980s for breakfast over at POEM and PrimeGrid Genefer. Prime numbers are helpful in bio-medical research.
	Phases matter: at stock voltages and similar core temps, the EVGA's power consumption is 240W (1451MHz) computing PrimeGrid n=21/22 Genefer WUs; the Zotac's is 210W (1481MHz). Genefer is certainly the most powerful (Maxwell) app (with the lowest stable overclock) available on the BOINC platform, by a -100MHz twin-turbo gap. My 24/7 electric usage and bill went up compared to CUDA6.5 ACEMD. These (2) 970s have a 10-15W difference computing ACEMD WUs at 1.5GHz.
	ID: 43651 \| Rating: 0 \| rate: / Reply Quote
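
The GP104 unit counts and the GTX1070 versus GTX1080 claim in the post above can be cross-checked from the per-GPC and per-SM figures it gives. The sketch below is a rough check under two assumptions: peak FP32 throughput is cores × 2 FLOPs × clock, and the stock GTX1080 is taken at its 1733MHz reference boost clock. Variable and function names are illustrative.

```python
GPCS = 4                 # GP104 graphics processing clusters, from the post
SMS_PER_GPC = 5          # from the post
CORES_PER_SM = 128       # from the post
MEM_CONTROLLERS = 8      # 32-bit controllers, from the post
ROPS_PER_CONTROLLER = 8  # from the post


def peak_tflops(cores, clock_mhz):
    """Peak FP32 TFLOPS, counting one fused multiply-add (2 FLOPs) per clock."""
    return cores * 2 * clock_mhz * 1e6 / 1e12


gtx1080_cores = GPCS * SMS_PER_GPC * CORES_PER_SM
gtx1070_cores = gtx1080_cores - SMS_PER_GPC * CORES_PER_SM  # one GPC's worth cut
bus_width_bits = MEM_CONTROLLERS * 32
rops = MEM_CONTROLLERS * ROPS_PER_CONTROLLER

gtx1080_stock = peak_tflops(gtx1080_cores, 1733)   # assumption: reference boost
gtx1070_oc = peak_tflops(gtx1070_cores, 2100)      # OC sweet spot from the post
clock_to_match = gtx1080_cores * 1733 / gtx1070_cores

print(f"GTX1080: {gtx1080_cores} cores, GTX1070: {gtx1070_cores} cores")
print(f"Bus width: {bus_width_bits} bit, ROPs: {rops}")
print(f"GTX1070 @ 2100 MHz: {gtx1070_oc:.2f} TFLOPS")
print(f"GTX1080 @ 1733 MHz: {gtx1080_stock:.2f} TFLOPS")
print(f"GTX1070 clock needed to match: {clock_to_match:.0f} MHz")
```

On these assumptions a GTX1070 would need a clock well above 2100MHz to reach the stock GTX1080's peak, which is consistent with the post's claim.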

davebodger Send message Joined: 30 Jul 11 Posts: 2 Credit: 7,052,262 RAC: 0 Level Scientific publications	Message 43767 - Posted: 11 Jun 2016 \| 15:34:54 UTC
	Just tried my nice shiny new Gigabyte GTX1080 on GPUGRID and I just got Computation Errors on the two WUs I downloaded. :-( Asteroids was the same but Collatz works OK, so I know it's not the card. I've turned off the project now (Allow no new tasks) until you say it's OK for me to try again. I don't want to waste WUs or time, as I presume you need to adapt for Pascal?
	ID: 43767 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,249,865,968 RAC: 4,089,892 Level Scientific publications	Message 43770 - Posted: 11 Jun 2016 \| 18:45:16 UTC - in response to Message 43767. Last modified: 11 Jun 2016 \| 18:46:44 UTC
	"Just tried my nice shiny new Gigabyte GTX1080 on GPUGRID and I just got Computation Errors on the two WUs I downloaded. :-( Asteroids was the same but Collatz works OK, so I know it's not the card. I've turned off the project now (Allow no new tasks) until you say it's OK for me to try again. I don't want to waste WUs or time, as I presume you need to adapt for Pascal?"
	Exactly. See this post:
	GDF wrote: Hi, we expect great performance from the GTX1080 but at the moment we don't have any. As soon as we have them, we need to recompile the code for them and check it. At the moment, the app will crash on any new Pascal GPU. gdf
	ID: 43770 \| Rating: 0 \| rate: / Reply Quote

peeticek_LubosKrutek Send message Joined: 30 Nov 08 Posts: 7 Credit: 62,377,145 RAC: 0 Level Scientific publications	Message 43772 - Posted: 12 Jun 2016 \| 5:55:34 UTC - in response to Message 43767.
	What times are you getting for Collatz tasks? RAC? Did you try any other projects besides GPUGrid and Asteroids? If yes, what results? Thanks
	ID: 43772 \| Rating: 0 \| rate: / Reply Quote

wiyosaya Send message Joined: 22 Nov 09 Posts: 114 Credit: 589,114,683 RAC: 0 Level Scientific publications	Message 44022 - Posted: 20 Jul 2016 \| 18:34:22 UTC
	For anyone interested, AnandTech published Compute benchmark results for the consumer Founders Edition cards. http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/28
	ID: 44022 \| Rating: 0 \| rate: / Reply Quote