Load Balancing and Routing in Docker Swarm
Learn how to set up load balancing, reverse proxying, and SSL termination in Docker Swarm. Covers two approaches: built-in routing mesh and Traefik with automatic HTTPS, plus monitoring with Prometheus and Grafana.
I am Philip—an engineer working at Distr, which helps software and AI companies distribute their applications to self-managed environments.
Our Open Source Software Distribution platform is available on GitHub (github.com/distr-sh/distr) and supports orchestrating Docker Swarm clusters alongside Docker Compose deployments.
Docker Swarm comes with a built-in load balancer, but production workloads quickly outgrow it. You need SSL certificates, host-based routing, path-based routing, rate limiting—none of which the built-in routing mesh provides. In this post, I’ll walk through two approaches to routing and load balancing in Docker Swarm: the minimal built-in setup and a production-ready Traefik configuration with automatic HTTPS, monitoring, and more.
How Docker Swarm Networking Works
Before diving into the approaches, it’s worth understanding how Docker Swarm handles networking out of the box.
Overlay Networks and Service Discovery
Docker Swarm uses overlay networks to connect containers across multiple nodes. When you create an overlay network, Docker establishes a VXLAN tunnel between all participating nodes, allowing containers on different hosts to communicate as if they were on the same LAN.
Every service deployed on a user-defined overlay network gets a DNS entry matching its service name.
Other services on the same network can resolve it by name—http://myservice:8080 just works.
No configuration files, no service registries.
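Here is a minimal stack sketch of what that looks like. The images `frontend:latest` and `backend:latest` and the `BACKEND_URL` variable are hypothetical. The sketch assumes only that the backend listens on port 8080 and that the frontend reads the backend's address from its environment.

```yaml
services:
  frontend:
    image: frontend:latest   # hypothetical image
    environment:
      # "backend" is resolved by Swarm's embedded DNS on the shared overlay network
      BACKEND_URL: http://backend:8080
    networks:
      - app-network

  backend:
    image: backend:latest    # hypothetical image, assumed to listen on 8080
    networks:
      - app-network

networks:
  app-network:
    driver: overlay
```

Neither service publishes a port, so `backend` is reachable only from containers attached to `app-network`.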
The Ingress Routing Mesh
Docker Swarm’s most distinctive networking feature is the ingress routing mesh. When a service publishes a port, every node in the swarm listens on that port—even nodes not running any replicas of that service. Incoming requests hit the kernel’s IPVS (IP Virtual Server) module, which round-robin load balances them across all healthy replicas.
This means you can point a DNS record at any swarm node and traffic will reach your service. It’s a powerful zero-configuration load balancer.
VIP vs. DNSRR Endpoint Modes
Docker Swarm supports two endpoint modes for services:
- VIP (default): Assigns a single virtual IP to the service. DNS resolves the service name to this VIP, and IPVS handles the load balancing transparently. Best for most use cases.
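- DNSRR (DNS Round-Robin): Assigns no VIP. DNS returns a list of all task IP addresses, and clients connect directly to individual containers, so they must spread the load themselves. This mode is useful when you bring your own load balancer that resolves backends via DNS. It cannot be combined with ports published in ingress mode. Traefik does not need it: its Swarm provider reads task IPs from the Docker API and balances across them directly, which is also what makes features like sticky sessions possible.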
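The endpoint mode is a per-service setting under `deploy`. The sketch below uses a hypothetical `api` image. Any client on the `backend` network that resolves `api` gets all three task IPs back and is responsible for spreading requests across them.

```yaml
services:
  api:
    image: api:latest          # hypothetical image
    networks:
      - backend
    deploy:
      replicas: 3
      # No VIP: a DNS lookup of "api" returns the IPs of all three tasks
      endpoint_mode: dnsrr
    # dnsrr cannot be combined with ingress-mode published ports;
    # if this service must be reachable from outside, publish with mode: host

networks:
  backend:
    driver: overlay
```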
Limitations of the Built-in Routing Mesh
The routing mesh is convenient but has significant limitations for production workloads:
- Layer 4 only. The routing mesh operates at TCP/UDP level. No HTTP host-based routing, path routing, header manipulation, or SSL termination.
- No client IP preservation. The routing mesh uses SNAT (Source Network Address Translation), rewriting the source IP to an internal ingress network address. Your application sees all requests coming from the same private IP range, making IP-based rate limiting, geolocation, and audit logging impossible.
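- Round-robin only. There is no weighted distribution or least-connections balancing. Apart from skipping tasks whose health check is failing, traffic is spread evenly regardless of how busy each replica is.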
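- No sticky sessions. The built-in load balancer cannot maintain session affinity across connections. This breaks applications that keep session state in memory on a single replica, as well as WebSocket setups (such as long-polling fallbacks) that expect successive requests to reach the same replica.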
- No SSL termination. You need to handle TLS at the application level or add a reverse proxy.
These limitations are exactly why a reverse proxy like Traefik, Caddy, or Nginx is almost always needed in production.
Approach 1: Built-in Routing Mesh
The simplest approach uses Docker Swarm’s native ingress routing mesh with no additional components. Every published port is automatically load balanced across all replicas.
Setting Up the Built-in Routing Mesh
```yaml
services:
  webapp:
    image: myapp:latest
    networks:
      - app-network
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
    ports:
      - target: 8080
        published: 80
        protocol: tcp
        mode: ingress

networks:
  app-network:
    driver: overlay
```

Deploy this stack with:

```shell
docker stack deploy -c docker-compose.yaml myapp
```

The `mode: ingress` setting (which is the default) tells Docker to use the routing mesh.
Every node in the swarm will accept connections on port 80, forwarding them to available replicas.
Basic Docker Swarm Routing Mesh Architecture
In a three-node swarm, every node participates in the routing mesh. External traffic hitting any node on port 80 is automatically routed to one of the webapp replicas via IPVS round-robin.
When to Use the Built-in Routing Mesh
This approach works well for:
- Internal services that don’t need SSL or host-based routing
- Development and staging environments where simplicity matters more than features
- Single-service deployments where you only expose one port
However, for anything facing the public internet, you’ll quickly need SSL certificates, domain-based routing, and the ability to run multiple services on ports 80/443. That’s where Approach 2 comes in.
Approach 2: Integrate Traefik with Docker Swarm
Traefik is the de facto reverse proxy for Docker Swarm.
Starting with version 3, Traefik has a dedicated Swarm provider (separate from the Docker provider) that reads routing configuration from service labels in the deploy section.
This approach gives you everything the routing mesh lacks: SSL termination with automatic Let’s Encrypt certificates, host-based and path-based routing, HTTP-to-HTTPS redirects, and proper load balancing—all configured via labels on your services.
Setting Up Traefik as a Docker Swarm Reverse Proxy
```yaml
networks:
  traefik-public:
    # Created once with "docker network create" before deploying, so the stack
    # uses it as-is instead of creating a stack-prefixed copy (myapp_traefik-public)
    external: true

volumes:
  letsencrypt:

services:
  traefik:
    image: traefik:v3.6
    command:
      - '--providers.swarm=true'
      - '--providers.swarm.exposedbydefault=false'
      - '--providers.swarm.network=traefik-public'
      - '--entrypoints.web.address=:80'
      - '--entrypoints.websecure.address=:443'
      - '--entrypoints.web.http.redirections.entrypoint.to=websecure'
      - '--entrypoints.web.http.redirections.entrypoint.scheme=https'
      - '--certificatesresolvers.letsencrypt.acme.email=admin@example.com'
      - '--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json'
      - '--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web'
      - '--accesslog=true'
      - '--log.level=INFO'
    ports:
      - target: 80
        published: 80
        mode: host
      - target: 443
        published: 443
        mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt:/letsencrypt
    networks:
      - traefik-public
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager

  webapp:
    image: myapp:latest
    networks:
      - traefik-public
    deploy:
      replicas: 3
      labels:
        - 'traefik.enable=true'
        - 'traefik.http.routers.webapp.rule=Host(`app.example.com`)'
        - 'traefik.http.routers.webapp.entrypoints=websecure'
        - 'traefik.http.routers.webapp.tls.certresolver=letsencrypt'
        - 'traefik.http.services.webapp.loadbalancer.server.port=8080'
```

Deploy with:

```shell
docker network create -d overlay --attachable traefik-public
docker stack deploy -c docker-compose.yaml myapp
```

How Traefik and Docker Swarm Service Discovery Works
Let’s break down the key configuration decisions:
Service discovery via labels. Traefik polls the Docker API (every 15 seconds by default) for services labeled `traefik.enable=true` and dynamically generates routing configuration. No config files to maintain: add a service, add labels, and Traefik picks it up.
Host-based routing. The `rule` label defines how traffic is routed. Host(`app.example.com`) matches requests by hostname. You can also use PathPrefix(`/api`) for path-based routing or combine rules with the `&&` and `||` operators.
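As an illustration of path-based routing, this sketch deploys a second, hypothetical `api` service (image `api:latest`, assumed to listen on port 3000) as its own stack. It shares the hostname with `webapp` but only receives requests under `/api`. It assumes the externally created `traefik-public` network and the `letsencrypt` resolver from the Traefik stack above.

```yaml
services:
  api:
    image: api:latest          # hypothetical image, assumed to listen on 3000
    networks:
      - traefik-public
    deploy:
      labels:
        - 'traefik.enable=true'
        # Same host as webapp, but only paths starting with /api
        - 'traefik.http.routers.api.rule=Host(`app.example.com`) && PathPrefix(`/api`)'
        - 'traefik.http.routers.api.entrypoints=websecure'
        - 'traefik.http.routers.api.tls.certresolver=letsencrypt'
        - 'traefik.http.services.api.loadbalancer.server.port=3000'

networks:
  traefik-public:
    external: true
```

Both routers match `app.example.com`. Traefik's default router priority is the length of the rule, so the longer `Host && PathPrefix` rule wins for `/api` paths and everything else falls through to `webapp`.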
Automatic HTTPS. The certificatesresolvers.letsencrypt configuration tells Traefik to obtain certificates from Let’s Encrypt via the ACME protocol. The HTTP challenge validates domain ownership by responding to a challenge on port 80, then serves the certificate on port 443. Certificates are automatically renewed 30 days before expiration.
Host mode ports. Traefik publishes ports with mode: host instead of mode: ingress. This bypasses the routing mesh, preserving client IP addresses. The trade-off is that only the node running Traefik accepts connections—which is fine since Traefik is your single entry point.
Manager node constraint. Traefik reads service labels through the Swarm API, which only manager nodes serve, so it must run on a manager node. The Docker socket is mounted read-only (`:ro`). This protects the socket file itself but does not limit which API calls Traefik can make.
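Traefik can also fill the routing mesh's session-affinity gap. By default, its Swarm provider balances across individual task IPs rather than the service VIP, so it can pin a client to one replica with a cookie. Below is a sketch of labels to add to the `webapp` service above. The cookie name `webapp_affinity` is an arbitrary choice.

```yaml
services:
  webapp:
    deploy:
      labels:
        # ...keep the existing webapp labels, then add:
        - 'traefik.http.services.webapp.loadbalancer.sticky.cookie=true'
        - 'traefik.http.services.webapp.loadbalancer.sticky.cookie.name=webapp_affinity'
        - 'traefik.http.services.webapp.loadbalancer.sticky.cookie.secure=true'
        - 'traefik.http.services.webapp.loadbalancer.sticky.cookie.httponly=true'
```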
Traefik with Docker Swarm Routing Architecture
Traefik terminates SSL, applies routing rules based on hostname, and forwards requests to the webapp replicas over the overlay network.
Traefik Configuration Best Practices
- Always set `exposedbydefault=false` and explicitly opt in services with `traefik.enable=true`. This prevents accidentally exposing internal services.
- Mount the Docker socket read-only, but know its limits. `/var/run/docker.sock:/var/run/docker.sock:ro` prevents the socket file from being modified. It does not restrict which Docker API calls Traefik can make, so treat Traefik as having full API access (a filtering socket proxy in front of the API is a common way to narrow this).
- Use host mode ports to preserve client IPs. This is critical for logging, security, and analytics.
- Store certificates on a persistent volume. The `letsencrypt` volume ensures certificates survive container restarts. A named volume is local to one node, so for multi-node setups where Traefik may be rescheduled, consider NFS or a distributed storage solution.
- Place labels in the `deploy` section. In Swarm mode, Traefik reads labels from the service definition, not from individual containers.
SSL Certificate Challenge Types for Traefik ACME
Let’s Encrypt supports three challenge types for domain validation:
| Challenge | Port Required | Supports Wildcards | Best For |
|---|---|---|---|
| HTTP-01 | 80 | No | Most setups—simple and works out of the box |
| TLS-ALPN-01 | 443 | No | Environments where port 80 is blocked |
| DNS-01 | None | Yes | Wildcard certificates, private networks |
The HTTP challenge (used above) is the simplest. If you need wildcard certificates (e.g., *.example.com), you’ll need the DNS-01 challenge. Traefik supports dozens of DNS providers through the Lego library.
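Here is a sketch of switching the resolver to DNS-01 and requesting a wildcard certificate. Cloudflare is only an assumed example provider, and the token is a placeholder. Other Lego providers use different environment variable names.

```yaml
services:
  traefik:
    # ...keep the rest of the Traefik service from the stack above
    command:
      # ...keep the existing flags, but replace the httpchallenge line with:
      - '--certificatesresolvers.letsencrypt.acme.dnschallenge.provider=cloudflare'
    environment:
      # Assumed provider: Cloudflare. Use a token scoped to DNS edits only.
      CF_DNS_API_TOKEN: 'replace-with-a-scoped-api-token'

  webapp:
    deploy:
      labels:
        # ...keep the existing labels, then request one wildcard certificate:
        - 'traefik.http.routers.webapp.tls.domains[0].main=example.com'
        - 'traefik.http.routers.webapp.tls.domains[0].sans=*.example.com'
```

In production, the `_FILE` variant of the variable (`CF_DNS_API_TOKEN_FILE`) can point at a Docker secret so the token never appears in the stack file.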
Adding Prometheus and Grafana Monitoring to Traefik
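Traefik can expose a `/metrics` endpoint for Prometheus once Prometheus metrics are enabled (`--metrics.prometheus=true`). By default, the endpoint is served on Traefik's internal `traefik` entrypoint on port 8080. This gives you request counts, latency histograms, and status-code breakdowns per entrypoint and service (and per router, if router labels are enabled). Grafana provides dashboards for visualization.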
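Here is a sketch of the pieces involved, meant to be merged into the Traefik stack file. It assumes metrics stay on the default `traefik` entrypoint (port 8080, reachable only over the overlay network). It also assumes the Prometheus scrape configuration from this section is saved next to the stack file as `prometheus.yml`.

```yaml
services:
  traefik:
    command:
      # ...keep the existing flags from the Traefik stack, then add:
      - '--metrics.prometheus=true'
      - '--metrics.prometheus.addRoutersLabels=true'

  prometheus:
    image: prom/prometheus:latest
    configs:
      # Mounted at the image's default config path
      - source: prometheus_config
        target: /etc/prometheus/prometheus.yml
    networks:
      - traefik-public   # must share a network with Traefik to resolve tasks.traefik

configs:
  prometheus_config:
    file: ./prometheus.yml
```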
The Prometheus configuration for scraping Traefik metrics is straightforward:
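```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'traefik'
    # DNS-based discovery: one scrape target per A record returned for tasks.traefik
    dns_sd_configs:
      - names:
          - 'tasks.traefik'
        type: 'A'
        port: 8080
```

The `tasks.traefik` DNS name resolves to all Traefik task IPs in the swarm, and `dns_sd_configs` turns each IP into a separate target, so Prometheus scrapes every instance. A plain `static_configs` target would connect to just one of the resolved IPs per scrape. Prometheus must be attached to the same overlay network as Traefik for the name to resolve.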
For a complete monitoring stack example including pre-built Grafana dashboards, see vegasbrianc/docker-traefik-prometheus.
Alternatives to Traefik
While Traefik is the most mature reverse proxy for Docker Swarm, there are alternatives worth considering:
Caddy as a Docker Swarm Reverse Proxy
Caddy with the caddy-docker-proxy plugin provides automatic HTTPS out of the box with no extra configuration. Certificates are issued from Let's Encrypt and ZeroSSL automatically. The label syntax mirrors the Caddyfile format:

```yaml
deploy:
  labels:
    caddy: app.example.com
    caddy.reverse_proxy: '{{upstreams 8080}}'
```

Caddy's strengths are its simplicity and built-in HTTP/3 support. For high availability, it offers a controller/server architecture where a single controller instance monitors Docker and pushes configuration to multiple server instances. However, its Swarm integration is less mature than Traefik's, and there are known issues with service discovery on worker nodes.
HAProxy with Docker Swarm DNS Service Discovery
HAProxy uses Docker Swarm’s internal DNS for service discovery via server-template directives and DNS resolvers.
It offers the best raw performance and the most advanced load balancing algorithms (least-connections, source-hash, URI-hash), but it lacks label-based service discovery. `server-template` picks up new replicas as a service scales, but adding or changing a service still means editing the configuration file.
There’s also no built-in Let’s Encrypt integration; you’ll need to pair it with certbot or acme.sh.
See the HAProxy on Docker Swarm guide for details.
Nginx Proxy Manager in Docker Swarm
Nginx Proxy Manager provides a web-based GUI for managing proxy hosts and SSL certificates. However, it was designed for single-host Docker setups and has significant limitations in Swarm mode: it doesn’t support multiple replicas, has no native Swarm service discovery, and can return 502 errors when proxying to service names on overlay networks.
Docker Swarm Reverse Proxy Feature Comparison
| Feature | Traefik | Caddy | HAProxy | Nginx PM |
|---|---|---|---|---|
| Swarm service discovery | Native provider | Plugin (labels) | DNS resolvers | Manual |
| Automatic HTTPS | ACME built-in | Built-in (zero-config) | External tool | Built-in UI |
| Rate limiting | Built-in middleware | No | ACL-based | No |
| Circuit breaking | Built-in middleware | No | No | No |
| Prometheus metrics | Built-in | Built-in | Built-in | No |
| Dashboard | Built-in | No | Stats page | Full Web UI |
| Multi-replica HA | Global mode + KV store | Controller/Server | Supported | No |
| Maturity for Swarm | Most mature | Growing | Mature (manual config) | Not suited |
Comparison Matrix: Built-in Routing Mesh vs. Traefik
| Feature | Built-in Routing Mesh | Traefik Reverse Proxy |
|---|---|---|
| SSL Termination | None | Automatic (Let’s Encrypt) |
| Routing | Port-based only (L4) | Host + Path (L7) |
| Client IP Preservation | No (SNAT) | Yes (host mode) |
| Rate Limiting | None | Available via middleware |
| Circuit Breaking | None | Available via middleware |
| Monitoring | None | Prometheus + Grafana |
| Additional Components | None | Traefik (+ Prometheus for metrics) |
| Complexity | Minimal | Moderate |
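The rate limiting and circuit breaking rows refer to Traefik middlewares declared through labels. Here is a sketch for the `webapp` service from the Traefik stack. The middleware names and thresholds are illustrative values, not recommendations.

```yaml
services:
  webapp:
    deploy:
      labels:
        # ...keep the existing webapp labels, then add:
        # Average of 100 requests/second per source, bursts of up to 50
        - 'traefik.http.middlewares.webapp-ratelimit.ratelimit.average=100'
        - 'traefik.http.middlewares.webapp-ratelimit.ratelimit.burst=50'
        # Open the circuit when more than 30% of requests hit network errors
        - 'traefik.http.middlewares.webapp-cb.circuitbreaker.expression=NetworkErrorRatio() > 0.30'
        # Attach both middlewares to the router, applied in the listed order
        - 'traefik.http.routers.webapp.middlewares=webapp-ratelimit,webapp-cb'
```

With no source criterion configured, the rate limiter groups requests by the client's remote address. That grouping is only meaningful because Traefik's host-mode ports preserve real client IPs.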
Deploying with Distr
If you’re distributing Docker Swarm-based applications to customer environments, managing stack files, secrets, and updates across multiple deployment targets can become complex. Distr can orchestrate Docker Swarm clusters alongside Docker Compose deployments, handling the distribution of stack files to self-managed customer environments. This means your customers get a consistent deployment experience regardless of whether they’re running a single-node setup or a multi-node Swarm cluster.
Conclusion
Docker Swarm’s built-in routing mesh is a great starting point—it requires zero configuration and provides basic load balancing out of the box. But as soon as you need SSL, domain-based routing, or any Layer 7 feature, you need a reverse proxy.
Choose the built-in routing mesh for internal services, development environments, or applications that handle their own SSL termination. It’s the simplest setup with no additional components.
Choose Traefik for most production deployments. Automatic HTTPS via Let’s Encrypt, host and path-based routing, label-based configuration, and optional monitoring with Prometheus and Grafana make it the sweet spot between simplicity and functionality. This is what I use for the majority of Swarm deployments.
Regardless of which approach you choose, Docker Swarm provides a solid foundation for production workloads. Its networking model—overlay networks, DNS-based service discovery, and the routing mesh—handles the hard parts of distributed networking, letting you focus on your application logic.
Resources
Official Documentation
- Docker Swarm Mode Routing Mesh
- Docker Swarm Networking
- Docker Secrets Management
- Traefik Docker Swarm Setup Guide
- Traefik Swarm Provider Reference
- Traefik ACME Certificate Resolvers
- Traefik v2 to v3 Migration Guide
GitHub Repositories
- traefik/traefik: Traefik reverse proxy
- lucaslorentz/caddy-docker-proxy: Caddy with Docker label support
- newsnowlabs/docker-ingress-routing-daemon: Client IP preservation in routing mesh
- vegasbrianc/docker-traefik-prometheus: Monitoring stack for Traefik
- heyvaldemar/traefik-letsencrypt-docker-swarm: Traefik + Let's Encrypt example
- BretFisher/dogvscat: Advanced Swarm stack examples
Community Guides
- Docker Swarm Rocks — Production-tested Traefik + Swarm patterns
- Funky Penguin’s Geek Cookbook — Comprehensive Swarm deployment guides
- HAProxy on Docker Swarm — HAProxy DNS-based service discovery