Strange system behaviour during haproxy run (3 replies)

Hi all,

This is not related to haproxy itself, but I am having a performance issue with
the number of packets being processed. I am running haproxy on a 48-core system
(we have 64 such servers at present, and the count will increase for production
testing), where cpus 0,2,4,6,...,46 belong to NUMA node 1 and cpus
1,3,5,7,...,47 belong to NUMA node 2. The systems run Debian 7 with kernel
3.16.0-23 (the kernel has both CONFIG_XPS and CONFIG_RPS enabled). nbproc is
set to 12, and each haproxy process is bound to cpus 0,2,4,...,22 so that they
all stay on the same socket, as seen here:

# ps -efF | egrep "hap|PID" | cut -c1-80
UID        PID  PPID  C    SZ    RSS PSR STIME TTY          TIME CMD
haproxy   3099     1 17 89697 324024   0 18:37 ?        00:11:19 haproxy -f hap
haproxy   3100     1 18 87171 314324   2 18:37 ?        00:12:00 haproxy -f hap
haproxy   3101     1 18 87214 305328   4 18:37 ?        00:12:00 haproxy -f hap
haproxy   3102     1 19 89215 322676   6 18:37 ?        00:12:02 haproxy -f hap
haproxy   3103     1 18 86788 310976   8 18:37 ?        00:11:59 haproxy -f hap
haproxy   3104     1 18 87197 314888  10 18:37 ?        00:12:00 haproxy -f hap
haproxy   3105     1 18 91311 319784  12 18:37 ?        00:11:59 haproxy -f hap
haproxy   3106     1 18 88785 305576  14 18:37 ?        00:12:00 haproxy -f hap
haproxy   3107     1 19 90366 326428  16 18:37 ?        00:12:09 haproxy -f hap
haproxy   3108     1 19 89758 320780  18 18:37 ?        00:12:09 haproxy -f hap
haproxy   3109     1 19 87670 314752  20 18:37 ?        00:12:07 haproxy -f hap
haproxy   3110     1 19 87763 316672  22 18:37 ?        00:12:10 haproxy -f hap

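As a sanity check, the per-process binding can also be confirmed with taskset
(a minimal sketch; it assumes taskset from util-linux is installed and that the
pgrep pattern below matches the workers):

# Sketch only: print the cpu affinity list of every haproxy worker.
for pid in $(pgrep -f 'haproxy -f'); do
    taskset -cp "$pid"    # e.g. "pid 3099's current affinity list: 0"
done
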
set_irq_affinity.sh was run on the ixgbe card, and /proc/irq/*/smp_affinity
shows that each irq is bound to cpus 0-47 correctly. However, I see that
packets are also being processed on the cpus of the 2nd socket, even though
user/system usage is zero there since haproxy does not run on those cores. The
following shows the difference in the number of packets processed over 10
seconds on the various rx/tx queues:

# ./rx_tx /tmp/ethtool_start /tmp/ethtool_end
"Significant" difference in #packets processed after 10 seconds on the various rx/tx queues:
Queue# TX RX
0 2623165 2826065
1 2564573 2749859
2 2901998 2801043
3 2636856 2794000
4 2892465 2742228
5 3087442 2795762
6 2936588 2760732
7 2934087 2767705
8 2260933 2767707
9 2165087 2759038
10 2144893 2814390
11 2302304 2835790
12 3037722 2748335
13 2940284 2727689
14 2348277 2830378
15 2117679 2838013
16 2679899 487703
17 2447832 438733
18 2505330 429834
19 2611643 447960
20 2595708 449729
21 2534836 447217
22 2616150 466920
23 2522947 450145

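One direction that might help is rerunning the irq binding restricted to the
node-1 cpus rather than spreading it over 0-47. A minimal sketch, assuming
bash, that the interface is named eth0, and that the queue irqs appear as
"eth0-TxRx-<n>" in /proc/interrupts (both of these are assumptions and will
differ per system):

# Sketch: pin each ixgbe queue irq to one even-numbered (node 1) cpu.
# smp_affinity_list is available on 3.16 kernels.
cpus=(0 2 4 6 8 10 12 14 16 18 20 22)
i=0
for irq in $(awk -F: '/eth0-TxRx/ { gsub(/ /, "", $1); print $1 }' /proc/interrupts); do
    echo "${cpus[i % 12]}" > "/proc/irq/$irq/smp_affinity_list"
    i=$((i + 1))
done
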
mpstat shows that the even-numbered cpus 0-22 are heavily used, while the
odd-numbered ones only do softirq processing:

Average: CPU %usr %sys %soft %idle
Average: 0 15.47 60.0 24.47 0.00
Average: 1 0.00 0.00 12.86 87.14
Average: 2 20.32 58.49 21.19 0.00
Average: 3 0.10 0.00 2.59 97.30
Average: 4 18.20 60.87 20.93 0.00
Average: 5 0.10 0.00 4.15 95.75
Average: 6 18.75 59.37 21.88 0.00
Average: 7 0.00 0.00 3.03 96.97
Average: 8 22.75 57.71 19.55 0.00
Average: 9 0.00 0.00 2.78 97.22
Average: 10 21.87 57.67 20.47 0.00
Average: 11 0.00 0.00 2.80 97.20
Average: 12 19.48 59.84 20.68 0.00
Average: 13 0.00 0.00 1.76 98.24
Average: 14 22.58 57.16 20.25 0.00
Average: 15 0.00 0.00 1.57 98.43
Average: 16 27.00 67.00 6.00 0.00
Average: 17 0.00 0.07 0.59 99.27
Average: 18 26.17 67.84 5.93 0.07
Average: 19 0.00 0.00 0.15 99.78
Average: 20 26.52 67.36 6.13 0.00
Average: 21 0.00 0.00 0.30 99.63
Average: 22 27.69 66.71 5.60 0.00
Average: 23 0.00 0.00 0.07 99.93
Average: 24 0.00 0.00 0.00 100.00
(remaining are 100% idle)
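
To confirm that it really is NET_RX softirq work landing on the odd cpus, the
per-cpu softirq counters can be watched as well (a small sketch; watch comes
from procps):

# Show per-cpu NET_RX softirq counts every 2s and highlight changes;
# steadily climbing counts in the odd-cpu columns would confirm the steering.
watch -d 'grep NET_RX /proc/softirqs'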

Is there a way to make sure that tx/rx happens only on the cpus that haproxy
runs on? The reason I think this is affecting performance is locking and IPIs:
cpu#0 receives skbs and is in the softirq handler. netif_receive_skb() calls
get_rps_cpu() and uses the flow information to find that this skb is for cpu#1.
Next, cpu#0 calls enqueue_to_backlog() with the cpu#1 index as parameter, which
takes cpu#1's input_pkt_queue lock, contending across nodes for a lock that
should normally be used only by cpu#1, and then enqueues the skb. Finally,
cpu#0 sends an IPI to cpu#1 to process its backlog since we added skbs to it.
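
If RPS/XPS is indeed what steers the packets, one minimal sketch for limiting
the steering to the haproxy cpus (assuming the interface is named eth0; the hex
mask 555555 sets bits 0,2,4,...,22, derived from the topology above) would be:

# Sketch: restrict RPS (rx) and XPS (tx) steering to the even cpus 0-22.
# 0x555555 = bits 0,2,4,...,22; eth0 is a placeholder interface name.
for q in /sys/class/net/eth0/queues/rx-*; do
    echo 555555 > "$q/rps_cpus"
done
for q in /sys/class/net/eth0/queues/tx-*; do
    echo 555555 > "$q/xps_cpus"
done

Writing 0 to rps_cpus instead disables RPS on that queue entirely, which might
also be worth comparing against an irq-affinity-only setup.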

Thanks,
- Krishna Kumar
