OpenSSH

Re: Nagle & delayed ACK strike again

To: Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: Nagle & delayed ACK strike again
From: Damien Miller <djm@mindrot.org>
Date: Fri, 22 Dec 2006 14:20:12 +1100 (EST)
Cc: rick.jones2@hp.com, openssh-unix-dev@mindrot.org
In-reply-to: <E1GxX7Y-0001Zq-00@dorka.pomaz.szeredi.hu>
sorry to interrupt your argument:

revision 1.120
date: 2006/03/15 01:05:22;  author: djm;  state: Exp;  lines: +2 -3
   - dtucker@cvs.openbsd.org 2006/03/13 08:33:00
     [packet.c]
     Set TCP_NODELAY for all connections not just "interactive" ones.  Fixes
     poor performance and protocol stalls under some network conditions (mindrot
     bugs #556 and #981). Patch originally from markus@, ok djm@

is in OpenSSH from 4.4 onwards
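The change above amounts to setting one socket option on every connection. Here is a minimal sketch of what that means at the sockets API level, in Python rather than OpenSSH's C and not a copy of packet.c: TCP_NODELAY is set with setsockopt() at level IPPROTO_TCP. The helper name set_nodelay() is hypothetical.

```python
import socket


def set_nodelay(sock: socket.socket, on: bool = True) -> bool:
    """Set or clear TCP_NODELAY on a TCP socket (hypothetical helper).

    on=True disables Nagle's algorithm; on=False re-enables it.
    Returns whether the kernel now reports TCP_NODELAY as set.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if on else 0)
    # A stack may report any non-zero value when the option is set,
    # so compare against zero rather than against 1.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    # The option can be set before connect(); no peer is needed for this demo.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("after enabling:", set_nodelay(s, True))
        print("after disabling:", set_nodelay(s, False))
```

In a real client or server the call would be made on the connected socket, which is where the packet.c change applies it.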

On Fri, 22 Dec 2006, Miklos Szeredi wrote:

> > > To me it still looks like the use of Nagle is the exception; it has
> > > already been turned off in the server for
> > > 
> > >   - interactive sessions
> > 
> > For at least some interactive sessions.  In the telnet space at least, 
> > there is a constant back-and-forth between wanting keystrokes to be 
> > nice and uniform and not wanting to overwhelm slow terminal devices 
> > (e.g. barcode scanners) when applications on the server dump a 
> > bunch of stuff down stdio.
> 
> For ssh this is unconditional.  I've suggested adding NoDelay/
> NoNoDelay options, but somebody on this list vetoed that.
> 
> > >   - X11 forwarding
> > > 
> > > and it will need to be turned off for
> > > 
> > >   - SFTP transport
> > > 
> > >   - IP tunnelling
> > > 
> > >   - ???
> > > 
> > > Is there any transported protocol where Nagle does make sense?
> > 
> > Regular FTP is one, anything unidirectional.
> 
> Nagle doesn't help FTP or HTTP, does it?  Anything that just pushes a
> big chunk of data will automatically end up with big packets.
> 
> So other than the disputed interactive session, Nagle doesn't seem to
> have any positive effects.
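Miklos's point about bulk transfers follows from the shape of Nagle's rule (RFC 896). The function below is a deliberately simplified model, assuming a single send queue and ignoring the receive window, congestion window and timers that real stacks also consult. Its name and parameters are hypothetical.

```python
def nagle_may_send(pending: int, mss: int, unacked: int, nodelay: bool) -> bool:
    """Simplified model of the Nagle decision for the head of the send queue.

    pending: bytes queued by the application but not yet sent
    mss:     maximum segment size
    unacked: bytes already sent but not yet acknowledged by the peer
    nodelay: True if TCP_NODELAY is set on the socket
    """
    if pending == 0:
        return False  # nothing to send
    if nodelay or pending >= mss:
        # Full-sized segments, and everything under TCP_NODELAY, go out at once.
        return True
    # A small segment may only go out when nothing is in flight.
    return unacked == 0
```

A bulk sender almost always has at least an MSS worth of data queued, so the rule rarely delays it. Only a trailing partial segment has to wait for an ACK, and if the receiver is delaying that ACK, the wait lasts until the receiver's delayed-ACK timer fires.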
> 
> > It also depends on what one is trying to optimize.  If one is only 
> > interested in optimizing time, Nagle may not be the right tool.  However, 
> > Nagle can optimize the ratio of data to data+headers, and it can optimize 
> > the quantity of CPU consumed per unit of data transferred.
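The following small calculation puts a number on the data to data+headers ratio. It assumes plain IPv4 and TCP headers of 20 bytes each with no options and ignores link-layer framing. The constant and function names are illustrative.

```python
IPV4_TCP_HEADERS = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def payload_efficiency(payload: int, headers: int = IPV4_TCP_HEADERS) -> float:
    """Fraction of the IP-level bytes on the wire that is application payload."""
    return payload / (payload + headers)


if __name__ == "__main__":
    for size in (1, 64, 512, 1460):
        print(f"{size:5d} bytes of payload -> {payload_efficiency(size):.1%}")
```

For 1, 64, 512 and 1460 bytes of payload this prints about 2.4%, 61.5%, 92.8% and 97.3%. That is why coalescing tiny writes into fewer, larger segments saves wire bytes and, presumably, some per-packet processing as well.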
> 
> For a filesystem protocol obviously latency (and hence throughput) is
> the most important factor.
> 
> > Here is some netperf data for the unidirectional case, between a system in Palo 
> > Alto and one in Cupertino, with sending-side CPU utilization included (similar 
> > things can happen to receive-side CPU):
> > 
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to tardy.cup.hp.com (15.244.56.217) port 0 AF_INET
> > Recv   Send   Send                       Utilization      Service Demand
> > Socket Socket Message Elapsed            Send     Recv    Send    Recv
> > Size   Size   Size    Time    Throughput local    remote  local   remote
> > bytes  bytes  bytes   secs.   10^6bits/s % S      % U     us/KB   us/KB
> > 
> > 131072 219136   512    10.10       74.59   8.78   -1.00   9.648   -1.000
> > 
> > raj@tardy:~/netperf2_work$ src/netperf -H tardy.cup.hp.com -c -- -m 512 -s 128K -S 128K -D
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to tardy.cup.hp.com (15.244.56.217) port 0 AF_INET : nodelay
> > Recv   Send   Send                       Utilization      Service Demand
> > Socket Socket Message Elapsed            Send     Recv    Send    Recv
> > Size   Size   Size    Time    Throughput local    remote  local   remote
> > bytes  bytes  bytes   secs.   10^6bits/s % S      % U     us/KB   us/KB
> > 
> > 131072 219136   512    10.02       69.21  20.56   -1.00   24.335  -1.000
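The two runs differ only in -D (TCP_NODELAY). The figures above can be compared directly as relative changes. pct_change() is just an illustrative helper, and the numbers are copied from the two tables.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# TCP_STREAM with 512-byte sends: (Nagle on, TCP_NODELAY set), from the runs above.
STREAM = {
    "throughput (10^6 bits/s)": (74.59, 69.21),
    "send service demand (us/KB)": (9.648, 24.335),
}

if __name__ == "__main__":
    for metric, (nagle, nodelay) in STREAM.items():
        print(f"{metric}: {pct_change(nagle, nodelay):+.1f}%")
```

This prints -7.2% for throughput and +152.2% for send-side service demand. In other words, with TCP_NODELAY set the sender spent roughly 2.5 times as much CPU per KB while moving slightly less data.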
> > 
> > The multiple concurrent request/response case is more nuanced, and the 
> > case is more difficult to make.  Basically, it is a race between how many small 
> > requests (or responses) will be made at one time, the RTT between the 
> > systems, the standalone ACK timer on the receiver, and the service time 
> > on the receiver.
> > 
> > Here is some data with netperf TCP_RR between those two systems:
> > 
> > raj@tardy:~/netperf2_work$ src/netperf -H tardy.cup.hp.com -c -t TCP_RR -- -r 128,2048 -b 3
> > TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to tardy.cup.hp.com (15.244.56.217) port 0 AF_INET : first burst 3
> > Local /Remote
> > Socket Size  Request Resp. Elapsed Trans.  CPU    CPU    S.dem   S.dem
> > Send   Recv  Size    Size  Time    Rate    local  remote local   remote
> > bytes  bytes bytes   bytes secs.   per sec % S    % U    us/Tr   us/Tr
> > 
> > 16384  87380 128     2048  10.00   1106.42 4.74   -1.00  42.852  -1.000
> > 32768  32768
> > raj@tardy:~/netperf2_work$ src/netperf -H tardy.cup.hp.com -c -t TCP_RR -- -r 128,2048 -b 3 -D
> > TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to tardy.cup.hp.com (15.244.56.217) port 0 AF_INET : nodelay : first burst 3
> > Local /Remote
> > Socket Size  Request Resp. Elapsed Trans.   CPU    CPU    S.dem   S.dem
> > Send   Recv  Size    Size  Time    Rate     local  remote local   remote
> > bytes  bytes bytes   bytes secs.   per sec  % S    % U    us/Tr   us/Tr
> > 
> > 16384  87380 128     2048  10.01   2145.98  10.49  -1.00  48.875  -1.000
> > 32768  32768
> > 
> > 
> > Now, setting TCP_NODELAY did indeed produce a big jump in transactions 
> > per second.  Notice though how it also resulted in a 14% increase in CPU 
> > utilization per transaction.  Clearly the lunch was not free.
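The same arithmetic applied to the two TCP_RR runs reproduces the 14% figure. The helper is repeated so the snippet stands alone, and the numbers are copied from the tables above.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# TCP_RR, 128-byte requests, 2048-byte responses, burst 3: (Nagle on, TCP_NODELAY set).
RR = {
    "transactions/s": (1106.42, 2145.98),
    "local service demand (us/Tr)": (42.852, 48.875),
}

if __name__ == "__main__":
    for metric, (nagle, nodelay) in RR.items():
        print(f"{metric}: {pct_change(nagle, nodelay):+.1f}%")
```

This prints +94.0% for the transaction rate and +14.1% for local service demand, the CPU cost per transaction.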
> > 
> > The percentage difference in transactions per second shrinks as the 
> > number of outstanding transactions grows.  Using the settings from 
> > above, the first column below is the netperf burst size, the second 
> > the transaction rate without TCP_NODELAY set, and the third with it:
> > 
> > raj@tardy:~/netperf2_work$ for i in 3 6 9 12 15 18 21 24 27; do echo $i `src/netperf -H tardy.cup.hp.com -t TCP_RR -l 4 -P 0 -v 0 -- -r 128,2048 -b $i; src/netperf -H tardy.cup.hp.com -t TCP_RR -l 4 -P 0 -v 0 -- -r 128,2048 -b $i -D`; done
> > 3 1186.40 2218.63
> > 6 1952.53 3695.64
> > 9 2574.49 4833.47
> > 12 3194.71 4856.63
> > 15 3388.54 4784.26
> > 18 4215.70 5099.52
> > 21 4645.97 5170.89
> > 24 4918.16 5336.79
> > 27 4927.71 5448.78
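To see the convergence explicitly, divide the TCP_NODELAY rate by the Nagle rate for each burst size. The rows are copied from the output above, and the names are illustrative.

```python
# (burst size, trans/s without TCP_NODELAY, trans/s with TCP_NODELAY), 128/2048 bytes.
RESULTS_128_2048 = [
    (3, 1186.40, 2218.63),
    (6, 1952.53, 3695.64),
    (9, 2574.49, 4833.47),
    (12, 3194.71, 4856.63),
    (15, 3388.54, 4784.26),
    (18, 4215.70, 5099.52),
    (21, 4645.97, 5170.89),
    (24, 4918.16, 5336.79),
    (27, 4927.71, 5448.78),
]


def nodelay_speedup(rows):
    """Map burst size -> ratio of the TCP_NODELAY rate to the Nagle rate."""
    return {burst: nodelay / nagle for burst, nagle, nodelay in rows}


if __name__ == "__main__":
    for burst, ratio in nodelay_speedup(RESULTS_128_2048).items():
        print(f"burst {burst:2d}: {ratio:.2f}x")
```

The advantage is not strictly monotonic from row to row, but it falls from about 1.87x at a burst of 3 to about 1.11x at a burst of 27.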
> > 
> > If we increase the request size to 256 bytes and the response to 8192 
> > (in all honesty I don't know what sizes sftp might use, so I'm making 
> > wild guesses), we can see the convergence happen much sooner - it takes 
> > fewer of the 8192 byte responses to take the TCP connection to the 
> > bandwidth delay product of the link:
> > 
> > raj@tardy:~/netperf2_work$ for i in 3 6 9 12 15 18 21 24 27; do echo $i `src/netperf -H tardy.cup.hp.com -t TCP_RR -l 4 -P 0 -v 0 -- -r 256,8192 -b $i -s 128K -S 128K; src/netperf -H tardy.cup.hp.com -t TCP_RR -l 4 -P 0 -v 0 -- -r 256,8192 -s 128K -S 128K -b $i -D`; done
> > 3 895.18 1279.38
> > 6 1309.11 1405.38
> > 9 1395.30 1325.44
> > 12 1256.75 1422.01
> > 15 1412.39 1413.64
> > 18 1400.04 1419.76
> > 21 1415.62 1422.79
> > 24 1419.56 1420.10
> > 27 1422.43 1379.72
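One way to see why the 256/8192 case flattens out so early is to convert a plateau transaction rate into response payload bits per second. This is back-of-the-envelope arithmetic that ignores request bytes (they flow the other way) and all header overhead. The link speed itself is not stated in the thread, and the function name is illustrative.

```python
def response_payload_mbps(trans_per_sec: float, response_bytes: int) -> float:
    """Response-direction payload rate implied by a TCP_RR result, in 10^6 bits/s.

    Ignores request bytes and all TCP/IP header overhead.
    """
    return trans_per_sec * response_bytes * 8 / 1e6


if __name__ == "__main__":
    print(f"{response_payload_mbps(1422.79, 8192):.1f}")  # 256/8192 plateau
    print(f"{response_payload_mbps(5448.78, 2048):.1f}")  # 128/2048, burst 27, nodelay
```

Both figures come out at roughly 90 Mbit/s of response payload (93.2 and 89.3). That suggests, though it does not prove, that both tests are approaching the capacity of the path.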
> 
> In SFTP the WRITE request/reply sizes are more like 64kB/32B, and the
> outstanding transactions are as many as the socket buffers will bear.
> 
> The slowdown is clearly due to 50ms outages from delayed ACK, which is
> totally broken, the network is just sitting there idle for no good
> reason whatsoever.
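The following toy model shows how much a fixed 50 ms pause can cost. It assumes the link sits completely idle during each stall and that exactly one stall occurs per 64 KiB (65536-byte) write. The 100 Mbit/s line rate is a hypothetical figure, not one taken from the thread, and the function name is illustrative.

```python
def stalled_throughput_bps(chunk_bytes: int, stall_s: float, line_rate_bps: float) -> float:
    """Average rate when each chunk is sent at line rate and then followed by an idle stall."""
    bits = chunk_bytes * 8
    busy_s = bits / line_rate_bps  # time spent actually transmitting the chunk
    return bits / (busy_s + stall_s)


if __name__ == "__main__":
    # Hypothetical 100 Mbit/s path, 65536-byte SFTP writes, 50 ms delayed-ACK stall.
    print(f"{stalled_throughput_bps(65536, 0.050, 100e6) / 1e6:.2f} Mbit/s")
    print(f"{stalled_throughput_bps(65536, 0.0, 100e6) / 1e6:.2f} Mbit/s")
```

Under these assumptions, the stall drags a 100 Mbit/s path down to about 9.5 Mbit/s. The real loss depends on how often stalls actually occur, which is what the traces would show.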
> 
> I can make new traces, but I guess they would be very similar to the
> ones I sent last time for the SFTP download case.
> 
> Miklos
> _______________________________________________
> openssh-unix-dev mailing list
> openssh-unix-dev@mindrot.org
> http://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
> 