PrimeStack.
Engineering · Oct 25, 2025

Kubernetes vs. Nomad: Choosing the Right Orchestrator

Is Kubernetes overkill for your startup? We compare the operational complexity and features of K8s and HashiCorp Nomad.


Kubernetes vs. Nomad is a question that most infrastructure teams face exactly once — when they outgrow running containers directly on EC2 instances or bare metal. The answer isn't obvious, and the wrong choice has significant operational consequences. Kubernetes dominates the ecosystem in adoption and tooling, but Nomad is a serious contender for organizations that prioritize operational simplicity over ecosystem breadth.

This comparison is opinionated and grounded in what these systems actually cost to run, not just their feature checklists.

What Container Orchestration Solves

Before comparing tools, it's worth being precise about the problem. Container orchestration addresses a cluster of related operational concerns:

Scheduling — placing workloads on nodes that have sufficient CPU, memory, and other resources to run them.

Health management — detecting when a container has crashed or become unhealthy, and restarting or replacing it automatically.

Service discovery — allowing services to find each other by name rather than by IP address, which changes as containers are scheduled and rescheduled.

Load balancing — distributing traffic across multiple instances of a service.

Rolling deployments — updating running services without downtime by incrementally replacing old containers with new ones.

Secrets management — injecting credentials and configuration into containers without hardcoding them in images.

Both Kubernetes and Nomad solve all of these problems. They differ in how they solve them, how much complexity they introduce in the process, and what additional capabilities they offer.


Kubernetes Architecture

Kubernetes uses a control plane / data plane model. The control plane manages cluster state; the data plane (worker nodes) runs workloads.

The control plane consists of:

  • kube-apiserver — the central API server. Every interaction with the cluster goes through this component. It validates requests, persists state to etcd, and serves the REST API that kubectl and controllers communicate with.
  • etcd — a distributed key-value store that holds all cluster state. It's the source of truth for everything: node status, pod specs, service definitions, config maps. Running etcd reliably is non-trivial; it requires an odd number of nodes (typically 3 or 5), careful disk performance tuning, and regular backups.
  • kube-scheduler — watches for unscheduled pods and assigns them to nodes based on resource requests, affinity rules, taints, tolerations, and custom scheduling policies.
  • kube-controller-manager — a collection of controllers that reconcile actual cluster state with desired state. The ReplicaSet controller ensures the right number of pod replicas are running; the Node controller handles node failure detection.
  • cloud-controller-manager — optional component that integrates with cloud provider APIs for load balancers, persistent volumes, and node management.

Worker nodes run:

  • kubelet — the agent that receives pod specs from the apiserver and ensures containers are running via the container runtime.
  • kube-proxy — maintains network rules on each node to implement Services (ClusterIP, NodePort, LoadBalancer). In many modern deployments, kube-proxy is replaced by eBPF-based implementations like Cilium.
  • Container runtime — containerd or CRI-O (built-in Docker support via the dockershim was removed in Kubernetes 1.24).
  • CNI plugin — implements pod networking. This is a separate ecosystem: Calico, Flannel, Cilium, Weave, and others each have different performance characteristics, feature sets, and operational requirements.

The total component count for a production Kubernetes cluster — including the control plane, CNI, ingress controller, cert-manager, metrics-server, and logging agent — typically exceeds 20 distinct system components, each with its own configuration surface, upgrade path, and failure modes.


Nomad Architecture

Nomad uses a simpler server / client model with a single binary.

Nomad servers form a Raft consensus cluster (typically 3 or 5) that stores cluster state, evaluates job submissions, and makes scheduling decisions. There is no separate state store — Nomad uses an embedded Raft log. This eliminates the etcd operational burden.

Nomad clients are the worker nodes. They run the Nomad agent (in client mode), receive allocations from the server, and execute workloads using task drivers.

Job specifications are Nomad's core abstraction. A job spec is an HCL or JSON file that describes what to run, how many instances, resource requirements, and networking configuration:

job "api-server" {
  datacenters = ["dc1"]
  type        = "service"

  group "web" {
    count = 3

    network {
      port "http" { to = 8080 }
    }

    service {
      name = "api-server"
      port = "http"
      provider = "consul"

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "server" {
      driver = "docker"

      config {
        image = "myapp:v1.2.3"
        ports = ["http"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

This is substantially more readable than an equivalent Kubernetes Deployment + Service + Ingress combination. The entire definition is self-contained in one file.
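For comparison, here is a sketch of just the Deployment and Service portion of the Kubernetes equivalent (Ingress omitted, names illustrative):

```yaml
# Deployment: 3 replicas with resource requests and a health probe.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: server
          image: myapp:v1.2.3
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 500m
              memory: 256Mi
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            periodSeconds: 10
            timeoutSeconds: 2
---
# Service: a stable name that load-balances across the replicas.
apiVersion: v1
kind: Service
metadata:
  name: api-server
spec:
  selector:
    app: api-server
  ports:
    - port: 80
      targetPort: 8080
```

Even trimmed of ingress and security context, it is two resources linked by label selectors rather than one self-contained definition.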

Nomad also supports non-container workloads natively: raw exec tasks (running binaries directly), Java tasks, QEMU VMs, and more. This heterogeneous workload support is a genuine differentiator for organizations that run a mix of containerized and legacy workloads.
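As a sketch of the non-container case, a task can run a binary directly with the raw_exec driver (the binary path here is hypothetical, and raw_exec must be explicitly enabled in the client configuration):

```hcl
task "batch-report" {
  driver = "raw_exec"   # runs the command directly on the host, no container

  config {
    command = "/usr/local/bin/generate-report"   # hypothetical binary, pre-installed on the client
    args    = ["--output", "/var/reports"]
  }

  resources {
    cpu    = 200
    memory = 128
  }
}
```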


Operational Complexity Comparison

This is where the most significant difference between the two systems lies.

Kubernetes complexity is high and compounds over time. A self-managed Kubernetes cluster requires expertise in: cluster bootstrapping (kubeadm, kops, or Talos), etcd operations and backup/restore procedures, Kubernetes API versioning and deprecation cycles, CNI selection and troubleshooting, RBAC policy design, admission webhooks, and the upgrade process (which requires coordination across control plane, etcd, worker nodes, and add-ons). The Kubernetes release cycle moves fast — approximately 3 minor versions per year — and older versions fall out of support quickly.

Managed Kubernetes (EKS, GKE, AKS) absorbs some of this complexity. The control plane becomes a managed service; etcd backup is handled for you. But managed Kubernetes still requires expertise in pod security, node group management, cluster autoscaler configuration, and the interaction between Kubernetes abstractions and cloud-provider networking. EKS in particular has a notoriously complex networking model that trips up even experienced engineers.

Nomad complexity is substantially lower. The architecture is a single binary. Server clustering uses embedded Raft — no separate etcd to operate. The upgrade process is a rolling restart of servers and then clients. There is no CNI to select and debug. Nomad networking is more straightforward, typically using host networking with Consul for service discovery and CNI only when required for fine-grained network policies.

A competent engineer can stand up a functional production Nomad cluster in a day. Standing up a production-ready Kubernetes cluster that you'd trust with real traffic takes considerably longer and requires specialized knowledge.
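As an illustration of that simplicity, a minimal server configuration is a single short HCL file (values are examples, not a hardened production setup):

```hcl
# /etc/nomad.d/server.hcl — one of three servers forming the Raft cluster
datacenter = "dc1"
data_dir   = "/opt/nomad/data"

server {
  enabled          = true
  bootstrap_expect = 3   # wait for 3 servers before electing a leader
}
```

The agent starts with `nomad agent -config /etc/nomad.d`; client nodes use a similar file with a `client { enabled = true }` block instead of the server stanza.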


Feature Comparison

Service Mesh

Kubernetes does not include a service mesh natively. Istio, Linkerd, and Cilium Service Mesh are add-ons with their own significant operational overhead. Istio in particular is notorious for complexity that teams underestimate before installation and regret afterward.

Nomad integrates natively with Consul Service Mesh. If you're already using Consul for service discovery (common in HashiCorp stacks), enabling sidecar proxies for mTLS and traffic shaping is a configuration change rather than a separate system installation.
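In a Nomad job, opting a service into the mesh is roughly a matter of adding a connect stanza to the service block (sketch, assuming Consul is already configured and the group uses bridge networking):

```hcl
service {
  name = "api-server"
  port = "http"

  connect {
    sidecar_service {}   # Nomad launches an Envoy sidecar proxy for mTLS
  }
}
```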

Auto-Scaling

Kubernetes has the Horizontal Pod Autoscaler (CPU/memory based) and the newer KEDA project (event-based scaling). Cluster autoscaling adds or removes nodes via the Cluster Autoscaler or Karpenter (AWS-native). Vertical Pod Autoscaler adjusts resource requests on running pods.

Nomad supports autoscaling through the Nomad Autoscaler, which handles both horizontal job scaling and cluster node scaling. It supports multiple scaling policies and external metrics sources. The capability is comparable but the Kubernetes ecosystem has more mature tooling.
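A group-level scaling policy, read by the Nomad Autoscaler, looks roughly like this (the query and target values are illustrative):

```hcl
group "web" {
  count = 3

  scaling {
    enabled = true
    min     = 2
    max     = 10

    policy {
      cooldown = "2m"

      check "avg_cpu" {
        source = "nomad-apm"   # built-in Nomad metrics source
        query  = "avg_cpu"     # average CPU across the group's allocations

        strategy "target-value" {
          target = 70          # scale to keep average CPU near 70%
        }
      }
    }
  }
}
```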

Stateful Workloads

Kubernetes StatefulSets provide stable pod identity and persistent volume management for stateful workloads like databases. The storage ecosystem (CSI drivers, storage classes) is mature and well-supported across cloud providers.

Nomad's support for stateful workloads is functional but less mature. Host volumes and CSI plugins are supported, but the tooling and documentation around stateful Nomad workloads is thinner than Kubernetes. For running production databases or other stateful workloads, Kubernetes has the edge.
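On the Kubernetes side, the StatefulSet pattern pairs stable pod identity with per-replica persistent volumes (abbreviated sketch; storage class and sizing vary by provider):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres   # headless Service providing stable per-pod DNS
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:   # one PersistentVolumeClaim created per replica
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```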

Multi-Datacenter

Nomad has native multi-datacenter support. A single Nomad region can span multiple datacenters, and job specifications can express datacenter affinity and spread constraints. Multi-region deployments use Nomad's built-in server federation, which forwards requests between regions.
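Datacenter placement is expressed directly in the job spec; a sketch of spreading a group's allocations across two datacenters:

```hcl
job "api-server" {
  datacenters = ["dc1", "dc2"]   # datacenters eligible for placement

  # Balance allocations evenly across the eligible datacenters.
  spread {
    attribute = "${node.datacenter}"
    weight    = 100
  }

  group "web" {
    count = 4
    # ... task definition as in the earlier example
  }
}
```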

Kubernetes multi-cluster federation has historically been a weak point. KubeFed was deprecated; the current recommended approaches (Argo CD for GitOps across clusters, Admiralty for workload federation) are add-ons rather than native capabilities.


Ecosystem Comparison

The Kubernetes ecosystem is vast and is one of the strongest arguments for choosing it. Helm, the package manager for Kubernetes, has thousands of charts covering virtually every infrastructure component. Operators (custom controllers) exist for Postgres, Redis, Kafka, Prometheus, and hundreds of other systems. The CNCF landscape is dominated by Kubernetes-native projects.

This breadth has a cost: the Kubernetes ecosystem is fragmented and inconsistent. Helm v2, v3, and the ongoing shift toward Helm-alternative tools like Timoni represent years of tooling churn. Operator quality varies enormously. Version compatibility matrices between Kubernetes releases and ecosystem tools are a constant maintenance concern.

Nomad's ecosystem is smaller but more coherent. The HashiCorp suite — Nomad, Consul, Vault, and Terraform — integrates tightly and with consistent design patterns. Vault integration for secrets injection in Nomad is first-class; Kubernetes Vault integration requires the Vault Agent Injector or the Secrets Store CSI Driver, each with their own complexity. For teams already invested in HashiCorp tooling, Nomad's ecosystem coherence is a meaningful advantage.
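To illustrate that first-class Vault integration, a Nomad task can request a Vault token and render secrets into its environment with a template stanza (sketch; the policy name and secret path are hypothetical):

```hcl
task "server" {
  driver = "docker"

  vault {
    policies = ["api-server-read"]   # hypothetical Vault policy granting read access
  }

  template {
    # Render the secret into an env file whose keys are exported to the task.
    data        = <<EOT
DB_PASSWORD={{ with secret "secret/data/api-server" }}{{ .Data.data.password }}{{ end }}
EOT
    destination = "secrets/db.env"
    env         = true
  }
}
```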


Performance and Resource Overhead

Kubernetes control plane components have non-trivial resource requirements. A production control plane (three control-plane nodes) with etcd, apiserver, scheduler, and controller-manager consumes several GB of RAM and meaningful CPU just for cluster management overhead, separate from workloads.

Nomad's resource overhead is significantly lower. Three Nomad servers with embedded Raft require roughly 256–512 MB of RAM each for clusters of moderate size. The reduced component count translates directly to lower overhead.

At the data plane level, Kubernetes kubelet and kube-proxy add overhead per node. A Nomad client node runs a single agent process. The difference in per-node overhead is meaningful at scale and particularly noticeable on smaller instance types.


Cost of Operations

The real cost of Kubernetes is in human time. A self-managed Kubernetes cluster running critical production workloads requires someone (or a team) who understands it deeply. This means a dedicated platform engineering function at many organizations.

A rough benchmark: a small Kubernetes cluster (10–20 nodes, single environment) requires at minimum one person with strong Kubernetes expertise spending a significant portion of their time on it. A mature multi-environment Kubernetes platform with CI/CD integration, GitOps, autoscaling, and comprehensive observability typically requires a dedicated platform team of 2–4 engineers.

Nomad's operational burden is lower. One engineer with solid infrastructure experience can maintain a Nomad cluster alongside other responsibilities. This isn't a slight against Kubernetes — it's a genuine operational difference that directly affects headcount decisions at smaller organizations.

Managed Kubernetes partially addresses this by offloading control plane operations, but the worker node management, add-on management, and Kubernetes-specific expertise requirement remain.


Migration Considerations

From containers to Kubernetes: The typical migration involves containerizing applications (if not already done), writing Deployment and Service manifests, setting up ingress, and integrating CI/CD with kubectl apply or a GitOps tool. Most teams underestimate the time required to set up a production-grade Kubernetes environment correctly — not just getting workloads running, but getting monitoring, alerting, autoscaling, and security policies configured.

From containers to Nomad: The migration is simpler in most cases. Nomad job specs are more concise than their Kubernetes equivalents. The networking model is easier to reason about initially. Teams already using Consul for service discovery in non-container workloads find the migration particularly smooth.

From Nomad to Kubernetes: This migration is driven by hitting Nomad's limitations — usually stateful workloads at scale, operator ecosystem gaps, or organizational alignment with Kubernetes tooling. The migration itself is a rewrite of job specs to Kubernetes manifests and a change in operational tooling.

From Kubernetes to Nomad: Uncommon but not unheard of. Typically driven by reducing operational overhead at organizations that find Kubernetes complexity exceeding its value for their use case.


Real-World Scenarios

When Kubernetes Is the Right Choice

A 200-person SaaS company with 30 microservices, a dedicated platform team of 4, and heavy use of the Kubernetes ecosystem (Helm, Prometheus Operator, cert-manager, Argo CD). The investment in Kubernetes expertise is justified by the team size and the value of the ecosystem. The platform team's Kubernetes specialization serves the entire engineering organization.

A startup that needs to run Kafka, Postgres, Redis, and Elasticsearch alongside their application services will benefit from the mature Kubernetes operators for each of these systems. The operator pattern handles complex stateful lifecycle management that would require significant custom tooling in Nomad.

When Nomad Is the Right Choice

A 40-person company running 10–15 services across two environments (staging and production) with a two-person infrastructure team that also owns CI/CD, networking, and cloud cost optimization. Kubernetes would require more dedicated attention than the team can give it. Nomad runs reliably with lower maintenance overhead, integrates with their existing Consul and Vault setup, and supports the mix of containerized and legacy services they still run.

An organization with heterogeneous workloads — some containerized microservices, some Java applications that need to run as JVM processes (not in containers), some batch jobs — benefits from Nomad's flexible task driver model. Running non-container workloads in Kubernetes requires cumbersome workarounds.


Recommendation Framework

Choose Kubernetes if:

  • You have or plan to hire a dedicated platform engineering team
  • You need mature stateful workload support (databases running in-cluster)
  • Your team is already invested in the Kubernetes ecosystem (Helm charts, CNCF tooling)
  • You're targeting enterprise customers who expect Kubernetes as the deployment standard
  • Your organization is large enough that ecosystem breadth matters more than operational simplicity

Choose Nomad if:

  • You have a small infrastructure team (1–3 people) with broad responsibilities
  • You're already using other HashiCorp tools (Consul, Vault, Terraform)
  • You have mixed workloads (containers + non-container processes)
  • You want a simpler operational model and are willing to accept a smaller ecosystem
  • Your organization values operational simplicity over access to the full CNCF landscape

Consider managed Kubernetes (EKS, GKE, AKS) as a middle path if:

  • You want Kubernetes ecosystem access but can't staff full control plane operations
  • Cloud-native integration is a priority (IAM, load balancers, storage)
  • You're willing to pay the managed service premium to reduce operational burden

The honest assessment: Kubernetes is the safe, widely-supported choice that most organizations will eventually adopt as they grow. Nomad is the right choice for organizations that want to run reliable container infrastructure without staffing a platform team, particularly those already in the HashiCorp ecosystem. Neither choice is permanent — migration is feasible in either direction when organizational needs change.