Channel: Serverphorums.com - HAProxy

Back pressure strategy for decentralized HAProxy setup (no replies)

Hi,

My question is more about HTTP/REST in general, but I can't think of a
better place to find experts than this mailing list.

Can you share your approaches to providing back pressure with HTTP? I'm
talking about handling cases when an upstream service is overloaded, either
due to an increased request rate or due to some problem that reduces its
normal capacity, e.g. hardware issues or a database backup.

To give you more context - we are running a set of RESTful microservices
that are very latency sensitive. We use HAProxy in a fully decentralized
fashion to route the requests (aka SmartStack
http://nerds.airbnb.com/smartstack-service-discovery-cloud/). Suppose we
have N instances of service A that need to speak to M instances of service
B - every instance of A runs local HAProxy, which is automatically
reconfigured whenever we scale service B up or down.

This model has worked really well for us, and it's also becoming very common
in microservice environments (e.g. container schedulers such as Marathon
and Kubernetes), but somehow no one ever mentions how to implement back
pressure in it.

The decentralized nature prevents us from using the only back pressure
mechanism HAProxy has - maxconn/maxqueue/timeout queue. Even if we assume
that request distribution is uniform and set A's HAProxy maxconn to
`Capacity(B) / N`, this model breaks as soon as another service C also
makes requests to B - and we do have a complex service topology.
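To make the arithmetic concrete, here is a sketch of what the per-instance split looks like in a local HAProxy config. All names, addresses and numbers are made up: assuming Capacity(B) = 200 concurrent requests, N = 10 instances of A and M = 2 instances of B, each local proxy gets a budget of 200 / 10 = 20 connections, i.e. 10 per server of B. The budget silently becomes wrong the moment service C starts talking to B:

```
# Hypothetical local HAProxy on one instance of A (illustrative values).
# Per-instance budget: Capacity(B) / N = 200 / 10 = 20, split over M = 2
# servers of B, so maxconn 10 each. Service C's traffic is invisible here.
listen service_b
    bind 127.0.0.1:9001
    mode http
    balance roundrobin
    option redispatch
    timeout queue 50ms
    server b1 10.0.0.1:8080 maxconn 10 maxqueue 5
    server b2 10.0.0.2:8080 maxconn 10 maxqueue 5
```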

The way we are currently solving this:
- we assume that the upstream service (B) is the only place where we know
the actual capacity and current load
- so it's the upstream service that makes the decision whether to accept a
request for processing or to decline it
- if a request is declined, we want HAProxy to try the next server in the
list (`option redispatch`)
- the only way to achieve this (due to use of `splice`) is to prevent TCP
connection from ever being established
- so we use IPTables to set a hard limit on the number of active TCP
connection per service port
- when all `retries` instances of the upstream service are busy, we fail the
request very fast, allowing the caller to perform any load shedding strategy
(asking the client to try again, returning a cached result, etc.)
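The kernel-level limit from the bullets above looks roughly like this (a sketch only; the port and the connection limit are illustrative, not our actual values). Rejecting the SYN with a TCP reset is what makes the caller's HAProxy fail the connection immediately and redispatch to the next server:

```
# Hypothetical hard cap on one instance of B: once 100 TCP connections to
# port 8080 are established, reject new SYNs with a reset so the client-side
# HAProxy redispatches. --connlimit-mask 0 counts all source IPs together.
iptables -A INPUT -p tcp --syn --dport 8080 \
    -m connlimit --connlimit-above 100 --connlimit-mask 0 \
    -j REJECT --reject-with tcp-reset
```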

This solution worked well for us, but has a number of downsides:
- relies on iptables and conntrack, and a lot of kernel variable tweaking
- does not play well with Keep-Alive connections
- hard to use in containers, especially with network isolation

So we are looking at replacing it with something that works at the
application protocol level. One idea would be:
- have yet another HAProxy, but now on the upstream service side
- its only purpose would be to limit `maxconn` and maintain a queue of
connections
- ideally implement an Active Queue Management (AQM) strategy such as CoDel
http://queue.acm.org/detail.cfm?id=2209336 to allow this queue to absorb
short-term request bursts, but at the same time prevent bufferbloat-like
"standing queues"
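To sketch what CoDel-style AQM means for a connection queue, here is a deliberately simplified Python model (all names and parameters are illustrative). It keeps CoDel's core idea - shed load only when the *minimum* queueing delay has stayed above a target for a full interval - but omits the control law that ramps up the drop rate after the first drop:

```python
import time
from collections import deque

TARGET = 0.005     # acceptable standing-queue delay (5 ms in the CoDel paper)
INTERVAL = 0.100   # window the delay must exceed TARGET for before shedding

class CoDelQueue:
    """Simplified CoDel-style queue: deliver requests while sojourn time is
    low; once delay has exceeded TARGET for a full INTERVAL, shed requests.
    Short bursts pass through; persistent "standing queues" get trimmed."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.queue = deque()          # entries are (enqueue_timestamp, item)
        self.first_above_time = 0.0   # deadline after delay first crossed TARGET

    def enqueue(self, item):
        self.queue.append((self.clock(), item))

    def dequeue(self):
        """Return the next item, or None if the queue is empty or the
        request was shed by the AQM logic."""
        if not self.queue:
            self.first_above_time = 0.0
            return None
        enqueued_at, item = self.queue.popleft()
        sojourn = self.clock() - enqueued_at
        if sojourn < TARGET:
            # Queue is draining fast enough; reset the drop state.
            self.first_above_time = 0.0
            return item
        now = self.clock()
        if self.first_above_time == 0.0:
            # Delay just crossed TARGET; allow an INTERVAL-long burst.
            self.first_above_time = now + INTERVAL
            return item
        if now >= self.first_above_time:
            # Standing queue persisted for a full INTERVAL: shed this one.
            return None
        return item
```

In HAProxy itself the equivalent of "shed" would be declining the connection so the client-side `option redispatch` kicks in, rather than returning None.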

To summarize:
- Can anyone recommend a better solution for back pressure with
decentralized HAProxy setup?
- Do you think AQM for connection queues would be a good addition to HAProxy?


Thanks,
Sergii
