SO_REUSEPORT and accept(2) performance
Hi all,
The brief summary (as of 9a1272865c3cb6e079e1554031dcc712a881598b):
nonblocking accept(2)+kqueue(2) w/ SO_REUSEPORT: 335Kconns/s
nonblocking accept(2)+kqueue(2) w/o SO_REUSEPORT: 100Kconns/s
On DragonFly, in addition to better load balancing, SO_REUSEPORT also
gives us a ~200% performance boost!
Since the testing is done on a 1000Mbps network, the nonblocking accept(2) w/
SO_REUSEPORT has maxed out the network device's input path:
~1.35Mpps from the netstat -I output, which matches the theoretical value for a
1000Mbps network (the input consists of 78B SYN, 66B ACK, 66B FIN
and 66B ACK packets for each connection).
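For reference, here is the back-of-envelope calculation behind that
theoretical value. It assumes ~24B of additional per-frame wire overhead
(preamble/SFD, IFG and FCS) on top of the packet sizes above; that
overhead figure is my assumption, not something measured in the test:

/*
 * Rough estimate of how many of the above per-connection input
 * packets a 1000Mbps link can carry.  The 24B of preamble/SFD,
 * IFG and FCS per frame is an assumed figure, on top of the
 * 78B/66B packet sizes quoted above.
 */
#include <stdio.h>

int
main(void)
{
    const double link_bps = 1000e6;              /* 1000Mbps */
    const int frame[] = { 78, 66, 66, 66 };      /* SYN, ACK, FIN, ACK */
    const int wire_ovhd = 24;                    /* preamble+SFD+IFG+FCS */
    const int nframe = sizeof(frame) / sizeof(frame[0]);
    double bits_per_conn = 0.0;
    int i;

    for (i = 0; i < nframe; ++i)
        bits_per_conn += (frame[i] + wire_ovhd) * 8;

    printf("input path max: %.2f Mpps, %.0f Kconns/s\n",
        link_bps / (bits_per_conn / nframe) / 1e6,
        link_bps / bits_per_conn / 1e3);
    return 0;
}

This comes out to ~1.34Mpps and ~336Kconns/s, which lines up with the
netstat -I and accept(2) numbers above.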
The testing server hardware:
CPU: Intel i7-2600 w/ hyperthreading enabled (8 HT)
NIC: Broadcom 5719 (4 RX queues and 4 TX queues, using MSI-X)
The testing server software is:
tools/tools/netrate/accept_connect/kq_accept_server
kq_accept_server -p 5000 -i 8 [-r]
(8 user space processes accept connections, -r turns on SO_REUSEPORT)
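For those who have not looked at the tool, the pattern each of those
processes exercises is roughly the following (a minimal sketch with
error handling omitted, not the actual kq_accept_server source):

/*
 * Minimal sketch of the nonblocking accept(2)+kqueue(2) pattern run
 * by each worker process; error handling omitted.
 */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/event.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static void
accept_loop(int port, int reuseport)
{
    struct sockaddr_in in;
    struct kevent kev;
    int s, kq, on = 1;

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (reuseport) {
        /* -r: every worker binds its own listen socket to the port */
        setsockopt(s, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on));
    }

    memset(&in, 0, sizeof(in));
    in.sin_len = sizeof(in);
    in.sin_family = AF_INET;
    in.sin_port = htons(port);
    in.sin_addr.s_addr = INADDR_ANY;
    bind(s, (struct sockaddr *)&in, sizeof(in));
    listen(s, 1024);
    fcntl(s, F_SETFL, fcntl(s, F_GETFL, 0) | O_NONBLOCK);

    kq = kqueue();
    EV_SET(&kev, s, EVFILT_READ, EV_ADD, 0, 0, NULL);
    kevent(kq, &kev, 1, NULL, 0, NULL);

    for (;;) {
        if (kevent(kq, NULL, 0, &kev, 1, NULL) <= 0)
            continue;
        for (;;) {
            int fd = accept(s, NULL, NULL);

            if (fd < 0)        /* EAGAIN: completion queue drained */
                break;
            close(fd);         /* only accept(2) is being measured */
        }
    }
}

Without -r the workers cannot all bind the same port, so they
presumably share a single listen socket, and with it a single
completion queue; that is where the contention numbers below come
from.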
The testing client software is:
tools/tools/netrate/accept_connect/connect_client
connect_client -p 5000 -4 10.0.0.49 -i 64
(64 user space processes do the connect)
route change -net 10.0.0.0/24 -msl 10
sysctl net.inet.ip.portrange.last=40000
(these two settings make sure that the clients won't run out of local ports)
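The client side is essentially just a connect(2)/close(2) loop per
worker process, along these lines (again only a sketch with error
handling omitted, not the actual connect_client source):

/*
 * Minimal sketch of one connect_client worker process; error
 * handling omitted.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

static void
connect_loop(const char *ip, int port)
{
    struct sockaddr_in in;

    memset(&in, 0, sizeof(in));
    in.sin_len = sizeof(in);
    in.sin_family = AF_INET;
    in.sin_port = htons(port);
    in.sin_addr.s_addr = inet_addr(ip);

    for (;;) {
        int s = socket(AF_INET, SOCK_STREAM, 0);

        /*
         * Full handshake, then an active close: the actively
         * closing side holds the local port in TIME_WAIT for
         * 2*MSL afterwards.
         */
        connect(s, (struct sockaddr *)&in, sizeof(in));
        close(s);
    }
}

Since every actively closed connection parks its local port in
TIME_WAIT for 2*MSL, shortening the MSL on the test route and raising
net.inet.ip.portrange.last to widen the ephemeral port range is what
keeps the workers supplied with local ports at these connection
rates.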
The network configuration:
                             +---------+
                   +--- emx  | client1 |
                   |         +---------+
                   |
                   |         +---------+
                   +--- emx  | client2 |
+--------+         |         +---------+
| server | bnx ----+
+--------+         |         +---------+
                   +--- emx  | client3 |
                   |         +---------+
                   |
                   |         +---------+
                   +--- bce  | client4 |
                             +---------+
"client1"~"client4" run the testing client software simultaneously as
shown above, mainly to generate enough traffic. "server" runs the
testing server software.
Statistics:
w/ SO_REUSEPORT
nonblocking accept(2) rate: 335Kconns/s
NIC interrupt rate: 6000/s on the first 4 HT
CPU idle time on HTs processing interrupt: ~15%
CPU idle time on HTs not processing interrupt: ~20%
Token contention rate: ~500/s (mostly TCP listen completion queue pool
token and TCP porthash token)
This shows w/ SO_REUSEPORT:
- We still have CPU time to process more connections.
- There is only minor TCP listen completion queue contention, and it
could probably be reduced further by binding the processes to
specific CPUs.
w/o SO_REUSEPORT
nonblocking accept(2) rate: 100Kconns/s
NIC interrupt rate: 6000/s on the first 4 HT
CPU idle time on HTs processing interrupt: ~5% - 70%
CPU idle time on HTs not processing interrupt: ~10% - 80%
Token contention rate: ~20K/s - 600K/s (mostly TCP listen completion
queue pool token)
This shows w/o SO_REUSEPORT:
- TCP listen completion queue contention is obviously too high, i.e.
we are facing a scaling problem with this TCP listen socket usage model.
Best Regards,
sephe
--
Tomorrow Will Never Die