GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0
|
After some tests, the only thing we are sure of is that downloads of large files stall when the server's operating system is CentOS 7. With CentOS 6 they work. Can anybody reproduce it?
The way we test it on gpugrid is:
wget --no-cache --delete-after http://www.gpugrid.net/download/libcufft.so.8.0
We don't know whether it is the combination of CentOS 7 and our network or a problem with CentOS 7 itself.
There could also be other problems, but this is one.
gdf |
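For anyone who prefers to reproduce the test without wget, here is a minimal Python sketch of the same idea: stream the file, throw the data away, and stop as soon as the transfer crawls. The names `RateWindow` and `download_check`, the 30-second window and the 10,000 bytes/s threshold are assumptions chosen for illustration; they are not part of the project's own test.

```python
import time
import urllib.request

URL = "http://www.gpugrid.net/download/libcufft.so.8.0"  # file from the wget test


class RateWindow:
    """Tumbling window: reports average bytes/s once every `window` seconds.

    Hypothetical helper, not part of any library.
    """

    def __init__(self, window, start_time=0.0):
        self.window = window
        self.t0 = start_time  # start time of the current window
        self.b0 = 0           # cumulative bytes at the start of the window

    def update(self, now, total_bytes):
        """Return the rate of the window that just ended, or None if it is still open."""
        elapsed = now - self.t0
        if elapsed < self.window:
            return None
        rate = (total_bytes - self.b0) / elapsed
        self.t0, self.b0 = now, total_bytes
        return rate


def download_check(url=URL, window=30.0, min_rate=10_000):
    """Stream `url` without saving it; return (bytes_received, status).

    status is "done" (stream ended), "slow" (a window averaged below
    `min_rate` bytes/s) or "error" (timeout, reset or other network failure).
    """
    total = 0
    try:
        with urllib.request.urlopen(url, timeout=window) as resp:
            meter = RateWindow(window, time.monotonic())
            while True:
                data = resp.read1(64 * 1024)  # whatever has arrived, up to 64 KiB
                if not data:
                    return total, "done"
                total += len(data)
                rate = meter.update(time.monotonic(), total)
                if rate is not None and rate < min_rate:
                    return total, "slow"
    except OSError:
        # Includes socket timeouts, i.e. no data at all for `window` seconds.
        return total, "error"
```

The socket timeout catches a connection that goes completely silent, while the rate window catches one that keeps trickling a few bytes per second. A "done" status only means the stream ended, so the byte count should still be compared with the expected file size.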
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
Ubuntu 16.04 stalled
wget --no-cache --delete-after http://www.gpugrid.net/download/libcufft.so.8.0
--2016-12-23 23:16:39-- http://www.gpugrid.net/download/libcufft.so.8.0
Resolving www.gpugrid.net (www.gpugrid.net)... 84.89.134.145
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 146745600 (140M) [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 0%[ ] 692.94K --.-KB/s eta 7h 35m
Ditto for Windows 10 (from cmd):
bitsadmin /transfer file1 /download /priority normal http://www.gpugrid.net/download/libcufft.so.8.0 c:\users\YOURNAME\Downloads\file1
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
When did you start using CentOS7 (on your server, I presume)? IIRC, the download problems started 3 or 4 months ago (others will correct me if needed) - any correlation?
If it's the download *client* you're referring to, I can confirm there are problems with a Windows-based client too.
When we hit similar problems at SETI@Home some years ago, RFC 1323 proved most helpful: it's enabled by default in Linux, and available (but optional - requires enabling) in Windows. But it isn't enough to solve the current problems here, unfortunately.
Looking it up again tonight for this post, I see that 1323 has been obsoleted by RFC 7323 - I haven't read it yet.
Of all the suggestions I've read on this board since this problem started, the most plausible (and consistent with all the symptoms I've observed so far) seems to be that the packets arrive massively out of order, and overflow the typical number of receive buffers allocated in the network driver by consumer OS installations. I haven't tested the suggested tools for increasing the receive buffer allocation, but I might try that over the holidays. |
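To put a number on why window scaling (RFC 1323, now RFC 7323) matters for transfers like this: without scaling, the TCP receive window is capped at 65,535 bytes, and a single connection cannot move more than one window per round trip. Below is a rough Python sketch of that arithmetic. The 50 ms round-trip time and the 100 Mbit/s link are made-up example values, not measurements from GPUGrid, and the function names are only illustrative.

```python
MAX_UNSCALED_WINDOW = 65_535  # bytes; largest TCP window without the window scale option


def max_throughput(window_bytes, rtt_seconds):
    """Upper bound in bytes/s for one TCP connection: one window per round trip."""
    return window_bytes / rtt_seconds


def window_needed(link_bits_per_second, rtt_seconds):
    """Bandwidth-delay product in bytes: the window needed to keep the link full."""
    return link_bits_per_second / 8 * rtt_seconds


rtt = 0.050  # assumed 50 ms round trip (example value only)
ceiling = max_throughput(MAX_UNSCALED_WINDOW, rtt)
needed = window_needed(100e6, rtt)  # assumed 100 Mbit/s link (example value only)
print(f"unscaled ceiling: {ceiling / 1e6:.2f} MB/s")
print(f"window to fill the link: {needed / 1024:.0f} KiB")
```

Under these assumed numbers the unscaled ceiling is still on the order of a megabyte per second, far above the few KB/s reported in this thread, which fits the remark above that window scaling alone does not explain the stalls.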
|
|
GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0
|
We changed to CentOS 7 with the latest server update, and this introduced a further problem with the server.
The other problem started a few months ago and was due to the switch of network provider (we are now in the university domain). I think they are trying to fix this in January.
gdf |
|
|
|
Try asking the service provider for help. |
|
|
koschi
Joined: 14 Aug 08 Posts: 124 Credit: 789,679,198 RAC: 163,607
|
I ran wget --no-cache --delete-after http://www.gpugrid.net/download/libcufft.so.8.0 from an EC2 vhost in Dublin, a vhost in Bavaria and my home computer in Hannover, Germany. All run Ubuntu 16.04, and all stalled.
Interestingly, another computer downloaded the ~40 MB libcufft 6.5 in one go today via a 3G mobile link (carrier 1und1.de, D network). I'll retry there tomorrow with libcufft.so.8.0. |
|
|
|
We changed to CentOS 7 with the latest server update, and this introduced a further problem with the server.
The other problem started a few months ago and was due to the switch of network provider (we are now in the university domain). I think they are trying to fix this in January.
gdf
Sounds like you violated the most cardinal rule:
"If it ain't broke, don't fix it."
Unless you were coerced by the university into joining their domain network.
You can always revert to CentOS 6. Everything with computers is trial and error, or so it seems!
|
|
|
|
Perhaps the number of transmit/receive buffers should be increased on the GPUGrid server(s), if that is possible.
I assume the servers are virtualized, so I guess it should be increased on the host OS as well. |
|
|
|
Here's a log from a laptop running Lubuntu. It took quite a while but did finally complete.
leonidas@lubui3:~$ wget --no-cache --delete-after http://www.gpugrid.net/download/libcufft.so.8.0
--2016-12-23 19:33:45-- http://www.gpugrid.net/download/libcufft.so.8.0
Resolving www.gpugrid.net (www.gpugrid.net)... 84.89.134.145
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 146745600 (140M) [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 8%[> ] 11.42M 841KB/s in 39m 19s
2016-12-23 20:13:05 (4.96 KB/s) - Connection closed at byte 11969900. Retrying.
--2016-12-23 20:13:06-- (try: 2) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 134775700 (129M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 12%[+> ] 17.99M 143KB/s in 11m 1s
2016-12-23 20:24:07 (10.2 KB/s) - Connection closed at byte 18868516. Retrying.
--2016-12-23 20:24:09-- (try: 3) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 127877084 (122M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 12%[++ ] 18.08M --.-KB/s in 15m 1s
2016-12-23 20:39:11 (106 B/s) - Read error at byte 18963972/146745600 (Connection timed out). Retrying.
--2016-12-23 20:39:14-- (try: 4) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 127781628 (122M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 19%[++=> ] 27.66M 410KB/s in 46m 51s
2016-12-23 21:26:05 (3.49 KB/s) - Connection closed at byte 29008738. Retrying.
--2016-12-23 21:26:09-- (try: 5) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 117736862 (112M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 38%[++++===> ] 53.90M 268KB/s in 2h 5m
2016-12-23 23:31:48 (3.56 KB/s) - Connection closed at byte 56515648. Retrying.
--2016-12-23 23:31:53-- (try: 6) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 90229952 (86M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 45%[++++++++=> ] 64.12M 393KB/s in 33m 4s
2016-12-24 00:04:57 (5.28 KB/s) - Connection closed at byte 67240731. Retrying.
--2016-12-24 00:05:03-- (try: 7) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 79504869 (76M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 47%[++++++++++ ] 66.00M --.-KB/s in 15m 7s
2016-12-24 00:20:11 (2.11 KB/s) - Read error at byte 69205974/146745600 (Connection timed out). Retrying.
--2016-12-24 00:20:18-- (try: 8) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 77539626 (74M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 48%[++++++++++ ] 68.49M --.-KB/s in 15m 9s
2016-12-24 00:35:27 (2.80 KB/s) - Read error at byte 71813367/146745600 (Connection timed out). Retrying.
--2016-12-24 00:35:35-- (try: 9) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 74932233 (71M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 66%[++++++++++===> ] 93.17M 726KB/s in 64m 24s
2016-12-24 01:40:00 (6.54 KB/s) - Connection closed at byte 97697830. Retrying.
--2016-12-24 01:40:09-- (try:10) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 49047770 (47M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 67%[++++++++++++++ ] 94.48M --.-KB/s in 15m 2s
2016-12-24 01:55:12 (1.49 KB/s) - Read error at byte 99071683/146745600 (Connection timed out). Retrying.
--2016-12-24 01:55:22-- (try:11) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 47673917 (45M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 67%[++++++++++++++ ] 94.89M --.-KB/s in 15m 1s
2016-12-24 02:10:23 (474 B/s) - Read error at byte 99498576/146745600 (Connection timed out). Retrying.
--2016-12-24 02:10:33-- (try:12) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 47247024 (45M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 68%[++++++++++++++ ] 95.19M --.-KB/s in 15m 3s
2016-12-24 02:25:37 (351 B/s) - Read error at byte 99815821/146745600 (Connection timed out). Retrying.
--2016-12-24 02:25:47-- (try:13) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 46929779 (45M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 73%[++++++++++++++=> ] 102.25M 544KB/s in 29m 46s
2016-12-24 02:55:34 (4.05 KB/s) - Connection closed at byte 107221560. Retrying.
--2016-12-24 02:55:44-- (try:14) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 39524040 (38M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 86%[++++++++++++++++==> ] 121.72M --.-KB/s in 15m 15s
2016-12-24 03:10:59 (21.8 KB/s) - Read error at byte 127630554/146745600 (Connection timed out). Retrying.
--2016-12-24 03:11:09-- (try:15) http://www.gpugrid.net/download/libcufft.so.8.0
Connecting to www.gpugrid.net (www.gpugrid.net)|84.89.134.145|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 146745600 (140M), 19115046 (18M) remaining [application/x-troff-man]
Saving to: ‘libcufft.so.8.0’
libcufft.so.8.0 100%[+++++++++++++++++++==>] 139.95M 3.90MB/s in 12s
2016-12-24 03:11:21 (1.48 MB/s) - ‘libcufft.so.8.0’ saved [146745600/146745600]
Removing libcufft.so.8.0.
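A log this long is easier to judge once it is summarized. The following is a small Python sketch that pulls the failed attempts and the overall average rate out of wget output shaped like the log above. The regular expressions are written against these particular lines only, and `summarize` is a hypothetical helper name, so other wget versions or locales may need adjustments.

```python
import re
from datetime import datetime

STAMP = r"(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)"
FMT = "%Y-%m-%d %H:%M:%S"

# "--2016-12-23 19:33:45-- http://..." marks the start of an attempt.
START = re.compile(r"^--" + STAMP + r"--")
# "... (4.96 KB/s) - Connection closed at byte 11969900. Retrying."
# "... (106 B/s) - Read error at byte 18963972/146745600 (...). Retrying."
FAILED = re.compile(
    r"^" + STAMP + r" \([^)]*\) - (Connection closed|Read error) at byte (\d+)"
)
# "... (1.48 MB/s) - 'file' saved [146745600/146745600]"
SAVED = re.compile(r"^" + STAMP + r" \([^)]*\) - .* saved \[(\d+)/(\d+)\]")


def summarize(log_text):
    """Return failed attempts, final size, wall-clock seconds and average bytes/s."""
    started = finished = size = None
    failures = []  # (reason, byte offset reached when the attempt failed)
    for line in log_text.splitlines():
        line = line.strip()
        m = START.match(line)
        if m:
            if started is None:  # only the first attempt sets the start time
                started = datetime.strptime(m.group(1), FMT)
            continue
        m = FAILED.match(line)
        if m:
            failures.append((m.group(2), int(m.group(3))))
            continue
        m = SAVED.match(line)
        if m:
            finished = datetime.strptime(m.group(1), FMT)
            size = int(m.group(2))
    result = {"failures": failures, "size": size,
              "seconds": None, "avg_bytes_per_s": None}
    if started and finished and size:
        seconds = (finished - started).total_seconds()
        result["seconds"] = seconds
        if seconds > 0:
            result["avg_bytes_per_s"] = size / seconds
    return result
```

For the log above, the first attempt starts at 19:33:45 and the file is saved at 03:11:21 the next day. That is 7 h 37 m 36 s for 146,745,600 bytes, or roughly 5.2 KiB/s on average, with 14 failed attempts before the one that finished.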
____________
Team USA forum | Team USA page
Join us and #crunchforcures. We are now also folding: join team ID 236370! |
|
|
|
Today I reinstalled my GTX 1080 host with Windows 10 x64 (as my Ubuntu install had not received work for a while).
After a fresh install (and Windows and driver updates) I installed BOINC v7.6.33, and it has download problems (not just with the cufft64_80.dll file, but also with the larger files of the tasks). It has a Realtek Gigabit LAN adapter (using the latest driver), just like my other i3 hosts with Windows XP x64 on the same LAN, but those hosts have not had any stalled downloads recently.
I've tried changing some settings such as jumbo frames and energy efficiency (the numbers of transmit and receive buffers are already at their maximum by default), but these changes did not help at all. |
|
|
MJH (Project administrator, Project developer, Project scientist)
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0
|
Hopefully the download stall problem is fixed now. Please post here if you still experience it. |
|
|
|
I saw some stalling a few hours ago, but the wget mentioned above worked flawlessly just now. The file downloaded in just under four minutes without any stalling.
____________
Team USA forum | Team USA page
Join us and #crunchforcures. We are now also folding: join team ID 236370! |
|
|
|
Hopefully the download stall problem is fixed now. Please post here if you still experience it.
Excellent news! |
|
|
|
Hopefully the download stall problem is fixed now. Please post here if you still experience it.
Any details on what actions you took that might have resolved it? |
|
|
|
I wonder whether the transmit/receive buffer change, or any other user-side change that could help, could be made by the project when a user adds the project or even when a new task is fetched. That is probably too much to ask, since it would need permission to change settings on users' computers. But if there is a way to make downloads (or anything else) better, it should go out to the clients as a News item, or in a thread that explains how to change those settings, use SWAN_SYNC, or do whatever else improves network and project speed and quality.
BTW, what was done so that we now have quicker downloads? And is it a temporary fix or a permanent solution?
____________
1 Corinthians 9:16 "For though I preach the gospel, I have nothing to glory of: for necessity is laid upon me; yea, woe is unto me, if I preach not the gospel!"
Ephesians 6:18-20, please ;-)
http://tbc-pa.org |
|
|
Trotador
Joined: 25 Mar 12 Posts: 103 Credit: 13,391,027,393 RAC: 71,232,833
|
Today I reinstalled my GTX 1080 host with Windows 10 x64 (as my Ubuntu install had not received work for a while).
After a fresh install (and Windows and driver updates) I installed BOINC v7.6.33, and it has download problems (not just with the cufft64_80.dll file, but also with the larger files of the tasks). It has a Realtek Gigabit LAN adapter (using the latest driver), just like my other i3 hosts with Windows XP x64 on the same LAN, but those hosts have not had any stalled downloads recently.
I've tried changing some settings such as jumbo frames and energy efficiency (the numbers of transmit and receive buffers are already at their maximum by default), but these changes did not help at all.
I've upgraded the NVIDIA driver in Ubuntu to 370.28 and now I'm able to download WUs. It seems that 367.57 no longer works, although it did when the Pascal application was released. The downside is that the card no longer boosts beyond 2 GHz as it did before.
|
|
|
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
After some tests, the only thing we are sure of is that downloads of large files stall when the server's operating system is CentOS 7. With CentOS 6 they work. Can anybody reproduce it?
The way we test it on gpugrid is:
wget --no-cache --delete-after http://www.gpugrid.net/download/libcufft.so.8.0
We don't know whether it is the combination of CentOS 7 and our network or a problem with CentOS 7 itself.
There could also be other problems, but this is one.
gdf
We changed to CentOS 7 with the latest server update, and this introduced a further problem with the server.
This download problem had been going on for many months before the server upgrade, but it got a lot worse at that time. It also got worse and worse as the files became larger. |
|
|
|
Downloads are still stalling. I'm just about to quit GPUGrid if this continues. All suggestions welcome.
Computer ID: 214484
Name: Panzer-01
Location: home
Avg. credit: 1,046.01
Total credit: 81,849,947
BOINC version: 7.6.33
CPU: AuthenticAMD AMD FX(tm)-8350 Eight-Core Processor [Family 21 Model 2 Stepping 0] (8 processors)
GPU: [2] NVIDIA GeForce GTX 660 Ti (2048MB) driver: 340.62
Operating System: Microsoft Windows 7 Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)
Last contact: 12 Mar 2017 | 11:57:21 UTC
____________
John |
|
|