8

Apologies if this has been asked before, but I can't seem to find much on it.

We're going to be using HAProxy to load balance our MariaDB Galera Cluster. All the articles/tutorials I have seen on this use Keepalived (or something similar) for an active/passive HAProxy setup.

Is there any good reason why you shouldn't have an active/active setup?

Each HAProxy node can have a fixed IP, and each also holds a floating IP. Under normal conditions requests are shared between the two HAProxy nodes; if one goes down, the other takes over its floating IP and handles requests on both IPs. When the failed node comes back up, it takes its floating IP, and its share of the load, back again.
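To illustrate what I mean, an arrangement like this could be sketched with two VRRP instances in keepalived, each node MASTER for one VIP and BACKUP for the other (the addresses, interface name and router ids below are made up):

```
# /etc/keepalived/keepalived.conf on haproxy1
# (haproxy2 mirrors this with the priorities swapped)

vrrp_instance VIP1 {
    state MASTER          # haproxy1 normally owns .201
    interface eth0        # assumed interface name
    virtual_router_id 51
    priority 101          # haproxy2 would use 100 here
    virtual_ipaddress {
        192.168.1.201
    }
}

vrrp_instance VIP2 {
    state BACKUP          # haproxy2 normally owns .202
    interface eth0
    virtual_router_id 52
    priority 100          # haproxy2 would use 101 here
    virtual_ipaddress {
        192.168.1.202
    }
}
```

With DNS (or the clients) pointing at both VIPs, traffic is split while both nodes are up, and either node picks up the other's VIP on failure.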

I'd appreciate your opinions on this.

Luke

3 Answers

4

The important considerations against an active/active setup with two virtual IP addresses for the same resource are:

  • how you distribute requests over the two virtual IPs
  • how you deal with sticky sessions, affinity, persistence and the like, i.e. what happens when subsequent requests start off going to virtual IP1 and then go to virtual IP2, and whether those need to reach the same back-end server
  • what happens when one virtual IP address fails over to the other host
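On the persistence point, both HAProxy nodes would have to make the same stickiness decision no matter which VIP received the request. One common approach is source-IP stickiness with a stick-table replicated between peers; a sketch (hostnames and addresses are illustrative, not from the question):

```
# Illustrative haproxy.cfg fragment -- names/IPs are made up
peers lb_peers
    peer haproxy1 192.168.1.11:1024
    peer haproxy2 192.168.1.12:1024

backend galera
    mode tcp
    balance leastconn
    # replicate the stickiness table to the other HAProxy node
    stick-table type ip size 200k expire 30m peers lb_peers
    stick on src
    server db1 192.168.1.21:3306 check
    server db2 192.168.1.22:3306 check
    server db3 192.168.1.23:3306 check
```

Without something like the peers section, each node keeps its own table, and a client that lands on the other VIP may be sent to a different Galera node mid-session.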
HBruijn
  • 84,206
0

Once upon a time we learned that a CDN doesn't work with active sessions that require mTLS (mutual Transport Layer Security). This is the case for protocols like gRPC, where you need an active session with the TLS terminator: in this case the load balancer, or the collectors sitting behind it (when the load balancer only handles distribution of the sessions). Yes, this can all be done in the cloud too, but there will always be use cases for IoT apps where you need to have the collectors closer to the IoT services. Anyhow, I like your statement! It's pretty accurate to today's solutions.

Raf
  • 1
-1

Update for 2020: keepalived has been obsolete for a while because it doesn't work in virtual clouds (e.g. AWS).

A bit of history

Once upon a time, there was a (Cisco) internet router in the office. The router provided internet access to all the machines, and it was good.

... then the router died and internet was broken for everyone and it sucked.

Turns out, it takes two of anything to have redundancy. So Cisco started offering pairs of routers that work in tandem.

This is done with a first-hop redundancy protocol: HSRP, VRRP or CARP. HSRP is the original Cisco-made protocol to solve the problem. It was later standardized as VRRP (first published in 1998; see https://www.rfc-editor.org/rfc/rfc3768), which got implemented by most network devices and vendors. The BSD folks reinvented their own protocol, CARP, to do the same thing; they couldn't adopt VRRP due to concerns around licensing and patents.

Keepalived (and uCARP) is software that implements VRRP (and CARP). It can be set up on two regular Linux servers to provide failover between them.
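As a sketch, a minimal active/passive failover pair might look like this (interface name, router id and addresses are assumptions for the example):

```
# Node A: /etc/keepalived/keepalived.conf
vrrp_instance VI_1 {
    state MASTER
    interface eth0            # assumed interface name
    virtual_router_id 51
    priority 150              # Node B uses a lower priority, e.g. 100
    advert_int 1              # VRRP advertisement interval in seconds
    virtual_ipaddress {
        192.168.1.254/24      # the floating IP
    }
}
```

Node B runs the same block with state BACKUP and the lower priority; when advertisements from A stop arriving, B claims 192.168.1.254.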

The rise of AWS and the end of VRRP

How does VRRP operate? For starters it needs a floating IP, let's say 192.168.1.254; only one router owns that IP at any point in time. Devices on the network simply send traffic to that (floating) IP and reach the active router; they don't know it's floating and don't care. The two routers talk to one another constantly, and if either dies, the other takes over the IP and starts processing traffic.

One needs to be familiar with OSI network layers 2 and 3 at this point (MAC and IP). Network devices communicate using MAC and IP addresses, and addresses are resolved with ARP.

The takeover of a floating IP involves a number of shenanigans in the network stack (all the acronyms above, plus gratuitous ARP to update neighbours); it's not exactly designed-in or expected behaviour.

On a physical network, with multiple computers physically plugged into one Ethernet switch, it usually works.

On a virtual machine, it usually doesn't work. The virtual network has to handle the traffic itself (MAC and IP layers), and it typically blocks the magic packets or isolates the virtual host, preventing VRRP from operating.

On the major virtual clouds (AWS, Google and co.), it definitely doesn't work, and that's on purpose. Imagine if an AWS instance could take over the IP, and with it all the traffic, of another Linux instance, maybe belonging to another customer. What the hell?!

Cloud and CDN solutions

Cloud providers offer load balancer solutions; see AWS ELB and the Google Cloud load balancers. They come with built-in redundancy for this problem, so you don't have to think about it. keepalived is simply obsolete there.

The next aspect is CDNs (CloudFlare, Akamai). Most public websites run behind a CDN nowadays, which provides caching, filtering and DDoS protection. A CDN can also load balance between multiple upstream servers: simply configure all the individual servers and the traffic is split.

Last but not least: keepalived only allows a single active server out of many, which is wasting resources, to put it lightly. This is actually a catastrophic issue in the real world, because things need to scale, and keepalived can't scale by design. The failover solutions in use today, as found in clouds and CDNs, are meant to distribute traffic across multiple destinations, all active. That is a lot more complicated to achieve and is done cumulatively at different layers (see DNS, Anycast, OSPF, BGP). keepalived is not part of the big picture anymore.

user5994461
  • 3,128