Hi,
We have a haproxy setup consisting of a pair of nodes running keepalived, which use the proxy protocol to pass requests (roundrobin) to a second pair of haproxy nodes. The first pair mainly terminates SSL and serves as a highly available entry point, while the second pair does all the logic around routing to the correct application.
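To make the topology concrete, the traffic flow is roughly:

    clients / Akamai edge
            |
            v
    keepalived VIP -> haproxy pair 1 (active/passive, SSL termination)
            |
            |  PROXY protocol, roundrobin
            v
    haproxy pair 2 (routing to applications)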
Yesterday at 21:30:35 CET, the active node of the first pair suddenly accumulated thousands of orphaned sockets; historically this machine has had around 300 orphans at any given time. A few seconds later, CPU usage shot up from about 30% to 100%, and at that point most requests started timing out. Just before the incident we were handling 400-450 req/s. The second pair saw no increase in load during these problems.
As far as we can tell so far (investigations are ongoing), there was no change anywhere in the environment for several hours preceding this sudden activity. We are lucky in that a large part of the incoming traffic is diagnostic data, which we can live without for a while; when we shut down that application, the situation returned to normal. Before doing that, we tried failing over to the other node, restarting haproxy, and doing a full reboot of the node. None of these helped; the problem returned in less than a minute.
We had a lot of "kernel: [ 1342.944691] TCP: too many orphaned sockets" in our logs, as well as a few of these:
kernel: [2352025.865855] TCP: request_sock_TCP: Possible SYN flooding on port 443. Sending cookies. Check SNMP counters.
kernel: [2352068.014861] TCP: request_sock_TCP: Possible SYN flooding on port 80. Sending cookies. Check SNMP counters.
However, according to the Arbor system our hosting provider runs, there was no SYN attack. The servers are fronted by Akamai, which by default retries twice on connection failures or timeouts, so this may have amplified our problems; we will be turning that feature off.
So, some questions:
1. Does it seem reasonable that the orphaned sockets could cause this behaviour, or are they just a symptom?
2. What causes the orphaned sockets? Could haproxy start misbehaving when it is starved for resources?
3. We were speculating that it could somehow be related to keepalives not being terminated properly; is there any merit to that thought? (See the sketch right below these questions for what we were considering.)
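To frame question 3: one change we have been considering (not applied yet, so please treat it as a sketch with made-up values) is tightening the client-side keep-alive and half-close timeouts in the defaults section, something like:

defaults
    # wait at most this long for the next request on an idle keep-alive connection
    timeout http-keep-alive 10s
    # drop half-closed client connections after this long instead of holding them
    timeout client-fin 30s

Would something along those lines plausibly keep the orphan count down, or is it beside the point?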
The nodes are virtual machines running CentOS 7 and haproxy 1.6.7, single core with 2 GB of memory. As I mentioned, they have been handling this load without any hiccups for several months, but we are still considering increasing the specs. Would a few more cores have helped, or would it just have taken a few more seconds to chew through them as well?
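In case we do add cores, this is roughly what we had in mind for spreading the work over multiple processes (untested; the numbers are just an illustration for a 2-core VM):

global
    # one haproxy process per core, each pinned to its own CPU
    nbproc 2
    cpu-map 1 0
    cpu-map 2 1

Does that look like the right direction, or would it just have doubled the time until we hit the same wall?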
Below is the haproxy configuration.
Thankful for any insights! Best regards,
Carl Pettersson
global
    log 127.0.0.1 local0
    maxconn 4000
    log-tag haproxy-gateway
    server-state-file /var/lib/haproxy/gateway.state
    stats socket /var/run/haproxy-gateway.sock
    stats timeout 2m
    user haproxy
    group haproxy
    tune.ssl.default-dh-param 2048
    ssl-default-bind-options no-sslv3
    ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK
    tune.ssl.cachesize 100000
    tune.ssl.lifetime 600
    tune.ssl.maxrecord 1460

defaults
    log global
    mode http
    option httplog
    option dontlognull
    load-server-state-from-file global
    option http-server-close
    option forwardfor
    # Redispatch if backend is down
    option redispatch
    retries 3
    # timeouts
    timeout http-request 5s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    # Set X-Request-Id on all incoming requests. Magic format taken from docs
    unique-id-format %{+X}o\ %ci:%cp_%fi:%fp_%Ts_%rt:%pid
    unique-id-header X-Request-ID
    # Set log format to the same as default, adding the request id on the end
    log-format %ci:%cp\ [%t]\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ST\ %B\ %CC\ %CS\ %tsc\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %hr\ %hs\ %{+Q}r\ %ID

listen http
    bind 10.0.1.21:80
    default_backend backend_pair
    capture request header Host len 64
    capture request header X-Forwarded-For len 64
    capture request header User-Agent len 200
    capture request header True-Client-IP len 32

listen https
    bind 10.0.1.21:443 ssl crt /etc/haproxy/ssl/gateway/
    http-request set-header X-Forwarded-Proto https
    default_backend backend_pair
    capture request header Host len 64
    capture request header X-Forwarded-For len 64
    capture request header User-Agent len 200
    capture request header True-Client-IP len 32

backend backend_pair
    option tcp-check
    server 10_0_1_24_8082 10.0.1.24:8082 check send-proxy
    server 10_0_1_27_8082 10.0.1.27:8082 check send-proxy