DragonFly users List (threaded) for 2012-12
Re: IP forwarding performance (git.2aa7f7f, normal 4.20Mpps, fast 5.07Mpps)
Hi all,
Before I move on to the next big ticket (multiple TX queue support),
here are the performance numbers I currently get as of git 2aa7f7f.
Quick summary: the IFQ packet staging mechanism gives me:
+80Kpps for 2 sets of bidirectional normal IP forwarding (now 4.20Mpps)
+30Kpps for 2 sets of bidirectional fast forwarding (now 5.07Mpps)
For detailed information, please read the inline comments below.
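For those curious what the staging mechanism does: conceptually, instead of kicking the TX path for every single packet, packets are first staged per CPU and pushed out in batches of up to net.link.ifq_stage_cntmax. A toy Python sketch of that idea (the class and names are made up for illustration, not the actual kernel code):

```python
# Toy model of per-CPU packet staging in front of the interface queue
# (IFQ).  Illustrative only: names and structure are invented for this
# sketch, not taken from the DragonFly kernel sources.

class StagedIfq:
    def __init__(self, cntmax=8):       # mirrors net.link.ifq_stage_cntmax="8"
        self.cntmax = cntmax
        self.staged = []                # packets staged on this CPU
        self.dispatches = 0             # how many times we kicked the TX path

    def enqueue(self, pkt):
        self.staged.append(pkt)
        if len(self.staged) >= self.cntmax:
            self.flush()

    def flush(self):
        if self.staged:
            self.dispatches += 1        # one TX kick moves the whole batch
            self.staged.clear()

ifq = StagedIfq(cntmax=8)
for n in range(64):
    ifq.enqueue(n)
ifq.flush()                             # flush any remainder
print(ifq.dispatches)                   # -> 8 TX kicks for 64 packets
```

Batching like this trades a tiny bit of latency for far fewer TX path invocations (and the IPIs they can cause), which is where the extra Kpps come from.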
On Thu, Dec 20, 2012 at 3:03 PM, Sepherosa Ziehau <sepherosa@gmail.com> wrote:
> On Fri, Dec 14, 2012 at 5:47 PM, Sepherosa Ziehau <sepherosa@gmail.com> wrote:
>> Hi all,
>>
>> This email serves as the base performance measurement for further
>> network stack optimization (as of git 107282b).
>
> Since bidirectional fast IP forwarding already maxes out the GigE
> limit, I have increased the measurement strength a bit. The new
> measurement is against git 7e1fbcf.
>
>>
>>
>> The hardware:
>> mobo ASUS P867H-M
>> 4x4G DDR3 memory
>> CPU i7-2600 (w/ HT and Turbo Boost enabled, 4C/8T)
>> Forwarding NIC Intel 82576EB dual copper
>
> The forwarding NIC is now changed to 82580EB quad copper.
>
>> Packet generator NICs Intel 82571EB dual copper
>>
>>
>> A emx1 <---> igb0 forwarder igb1 <---> emx1 B
>
> The testing topology has been changed to the following configuration:
> +---+ +-----------+ +---+
> | | emx1 <---> igb0 | | igb1 <---> emx1 | |
> | A | | forwarder | | B |
> | | emx2 <---> igb2 | | igb3 <---> emx2 | |
> +---+ +-----------+ +---+
>
> Streams:
> A.emx1 <---> B.emx1 (bidirectional)
> A.emx2 <---> B.emx2 (bidirectional)
>
>>
>> A and "forwarder", B and "forwarder" are directly connected using CAT6 cables.
>> polling(4) is enabled on igb0 and igb1 on "forwarder". The following
>> tunables are set in /boot/loader.conf:
>> kern.ipc.nmbclusters="524288"
>> net.ifpoll.user_frac="10"
>> net.ifpoll.status_frac="1000"
net.link.ifq_stage_cntmax="8"
>> The following sysctl is changed before putting igb1 into polling mode:
>> sysctl hw.igb1.npoll_txoff=4
>
> sysctl hw.igb1.npoll_txoff=1
> sysctl hw.igb2.npoll_txoff=2
> sysctl hw.igb3.npoll_txoff=3
sysctl hw.igb0.tx_wreg_nsegs=16
sysctl hw.igb1.tx_wreg_nsegs=16
sysctl hw.igb2.tx_wreg_nsegs=16
sysctl hw.igb3.tx_wreg_nsegs=16
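The tx_wreg_nsegs settings apply the same batching idea inside the driver: rather than writing the TX tail register for every packet, the write is deferred until about 16 segments have been queued. A rough illustrative model (hypothetical code, not the actual igb(4) driver logic):

```python
# Rough model of deferring the TX tail register write (the "doorbell")
# until tx_wreg_nsegs segments are queued.  Hypothetical illustration,
# not the actual igb(4) driver code.

class TxRing:
    def __init__(self, wreg_nsegs=16):  # mirrors hw.igbN.tx_wreg_nsegs=16
        self.wreg_nsegs = wreg_nsegs
        self.pending = 0    # segments queued since the last register write
        self.reg_writes = 0 # (expensive) register writes issued

    def queue_segment(self):
        self.pending += 1
        if self.pending >= self.wreg_nsegs:
            self.write_tail()

    def write_tail(self):   # notify the NIC of all pending segments at once
        if self.pending:
            self.reg_writes += 1
            self.pending = 0

ring = TxRing()
for _ in range(160):
    ring.queue_segment()
ring.write_tail()           # flush any remainder
print(ring.reg_writes)      # -> 10 writes instead of 160
```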
>
>>
>>
>> First, for users who are only interested in the bulk forwarding
>> performance: 32 netperf TCP_STREAMs running on A could do
>> 941Mbps.
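For reference, 941Mbps is essentially the GigE TCP goodput limit with a 1500 byte MTU. A back-of-the-envelope check (assuming standard Ethernet framing overhead and TCP timestamps):

```python
# GigE TCP goodput with a 1500 byte MTU, back of the envelope.
# Per-frame wire overhead: 14 Ethernet header + 4 FCS + 8 preamble
# + 12 inter-frame gap = 38 bytes.
# Per-segment payload: 1500 - 20 (IP) - 20 (TCP) - 12 (timestamps) = 1448.
wire_bytes = 1500 + 38
payload_bytes = 1448
goodput_mbps = 1e9 * payload_bytes / wire_bytes / 1e6
print(round(goodput_mbps))  # -> 941
```

So the TCP_STREAM result means the forwarder is already at wire speed for bulk traffic.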
>>
>>
>> Now the tiny packets forwarding performance:
>>
>> A and B generate 18 byte UDP datagrams using
>> tools/tools/netrate/pktgen. The destination addresses of the UDP
>> datagrams are selected so that the generated UDP datagrams are evenly
>> distributed across the 8 RX queues, which should be common in
>> production environments.
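On how packets spread across RX queues: NICs in this class use Toeplitz-based RSS hashing over the addresses/ports and pick a queue from the low bits of the hash. A sketch of the Toeplitz hash follows; the key is the well-known example key from the Microsoft RSS documentation, and selecting a queue by taking the hash modulo 8 is an assumption for illustration, not something verified against the igb driver:

```python
# Sketch of Toeplitz RSS hashing, which RSS-capable NICs use to spread
# flows across RX queues.  RSS_KEY is the example key from the Microsoft
# RSS documentation; mapping hash -> queue via "% nqueues" is an
# illustrative assumption, not the driver's actual indirection table.

RSS_KEY = bytes([
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
])

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """For every set bit of the input, XOR in the 32-bit window of the
    key that starts at that bit position."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result

def rx_queue(src_ip: bytes, dst_ip: bytes, nqueues: int = 8) -> int:
    # IPv4 2-tuple input: source address, then destination address.
    return toeplitz_hash(RSS_KEY, src_ip + dst_ip) % nqueues

# Varying the destination address moves flows between queues:
for last in range(4):
    print(last, rx_queue(bytes([10, 0, 0, 1]), bytes([10, 0, 1, last])))
```

This is why pktgen can steer load evenly: picking destination addresses whose hashes land on different queues spreads the RX work over all 8 polling CPUs.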
>>
>> Bidirectional normal IP forwarding:
>> 1.42Mpps in each direction, so total 2.84Mpps are forwarded.
>> CPU usage:
>> On CPUs that are doing TX in addition to RX: 85% ~ 90% (max allowed by
>> polling's user_frac)
>> On CPUs that are only doing RX: 40% ~ 50%
>
> Two sets of bidirectional normal IP forwarding:
> 1.03Mpps in each direction, so total 4.12Mpps are forwarded.
1.05+Mpps in each direction, so total 4.20Mpps are forwarded.
> CPU usage:
> On CPUs that are doing TX in addition to RX: 90% (max allowed by
> polling's user_frac)
> On CPUs that are only doing RX: 70% ~ 80%
Not much improvement on CPU usage.
> IPI rate on CPUs that are doing TX in addition to RX: ~10K/s
IPI rate on CPUs that are doing TX in addition to RX: ~4.5K/s
>
>>
>> Bidirectional fast IP forwarding: (net.inet.ip.fastforwarding=1)
>> 1.48Mpps in each direction, so total 2.96Mpps are forwarded.
>> CPU usage:
>> On CPUs that are doing TX in addition to RX: 65% ~ 70%
>> On CPUs that are doing RX: 30% ~ 40%
>
> Two sets of bidirectional fast IP forwarding: (net.inet.ip.fastforwarding=1)
> 1.26Mpps in each direction, so total 5.04Mpps are forwarded.
~1.27Mpps in each direction, so total 5.07Mpps are forwarded.
> CPU usage:
> On CPUs that are doing TX in addition to RX: 90% (max allowed by
> polling's user_frac)
> On CPUs that are only doing RX: 60% ~ 70%
Not much improvement on CPU usage.
> IPI rate on CPUs that are doing TX in addition to RX: ~10K/s
IPI rate on CPUs that are doing TX in addition to RX: ~5K/s
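For context on how close this is to line rate: an 18 byte UDP payload here yields exactly a minimum-sized 64 byte Ethernet frame, and minimum-sized frames cap a GigE link at about 1.488Mpps per direction. The arithmetic:

```python
# GigE line rate for minimum-sized (64 byte) Ethernet frames.
# 18 byte UDP payload + 8 UDP + 20 IP + 14 Ethernet + 4 FCS = 64 bytes,
# i.e. exactly the minimum frame size.  On the wire each frame also
# costs 8 bytes preamble + 12 bytes inter-frame gap: 64 + 20 = 84 bytes.
line_rate_pps = 1_000_000_000 // (84 * 8)
print(line_rate_pps)                    # -> 1488095
# ~1.27Mpps per direction is therefore roughly 85% of line rate:
print(round(1.27e6 / line_rate_pps, 2))
```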
Best Regards,
sephe
--
Tomorrow Will Never Die