Architecture Reference

OpenStack Architecture

An OpenStack private cloud is a collection of loosely coupled services that communicate over internal APIs. Each service handles one concern — compute, networking, storage, identity — and can be scaled, replaced, or upgraded independently.

Overview

How the Pieces Fit Together

An OpenStack deployment has three logical layers: control plane, data plane, and storage.

Control Plane

API servers, schedulers, and databases that manage the cloud. Runs on a dedicated set of nodes (typically 3 for HA) and handles all orchestration decisions. No tenant workloads run here.

Data Plane (Compute)

The hypervisor nodes where virtual machines actually run. Each compute node runs nova-compute and a networking agent (ovn-controller when Neutron uses OVN). This is where your capacity lives: add nodes to grow capacity roughly linearly.

Storage Layer

Persistent block, object, and file storage — typically provided by Ceph. Runs on dedicated storage nodes with high-density drives. Decoupled from compute so storage and compute scale independently.

Core Services

The Eight Essential Components

Most production OpenStack deployments include these services.

Nova — Compute

Nova is the compute engine. It manages the lifecycle of virtual machines: scheduling, launching, live migration, resizing, and termination. Nova talks to the KVM hypervisor on each compute node via libvirt. It does not handle networking or storage — it delegates those to Neutron and Cinder.

Components: nova-api, nova-scheduler, nova-conductor, nova-compute
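Nova's scheduler chooses a host in two phases: it filters out hosts that cannot satisfy the request, then weighs the survivors and picks the best one. The toy sketch below shows that filter-then-weigh pattern with a capacity filter and a "most free RAM wins" weigher. It is a conceptual model under stated assumptions, not Nova code: the `Host` class and `schedule` function are hypothetical, and real Nova pre-filters candidates through the Placement service before its own filters and weighers run.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical view of one compute node's free resources."""
    name: str
    free_ram_mb: int
    free_vcpus: int
    enabled: bool = True


def schedule(hosts: list, ram_mb: int, vcpus: int) -> Host:
    """Pick a host with a filter-then-weigh pass (toy model of Nova's scheduler)."""
    # Filter phase: drop hosts that are disabled or lack capacity.
    candidates = [
        h for h in hosts
        if h.enabled and h.free_ram_mb >= ram_mb and h.free_vcpus >= vcpus
    ]
    if not candidates:
        raise RuntimeError("No valid host was found")
    # Weigh phase: prefer the host with the most free RAM, which spreads load.
    return max(candidates, key=lambda h: h.free_ram_mb)


hosts = [
    Host("compute-01", free_ram_mb=4096, free_vcpus=8),
    Host("compute-02", free_ram_mb=65536, free_vcpus=2),
    Host("compute-03", free_ram_mb=32768, free_vcpus=16),
]
print(schedule(hosts, ram_mb=8192, vcpus=4).name)  # compute-03
```

Here compute-01 fails the RAM filter and compute-02 fails the vCPU filter, so compute-03 wins even though compute-02 has more free memory.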

Neutron — Networking

Neutron provides software-defined networking for the cloud. It creates virtual networks, subnets, routers, and floating IPs. Newer production deployments commonly use OVN (Open Virtual Network) as the backend, which provides distributed routing, security groups, and DHCP without the separate L3 and DHCP agents of the older ML2/OVS design.

Components: neutron-server, ovn-northd, ovn-controller, ovs-vswitchd
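When a subnet is created without explicit settings, Neutron reserves the first usable address as the gateway and allocates fixed IPs to ports from the rest of the range. The sketch below models that default with Python's `ipaddress` module. It is a simplification under stated assumptions: the `Subnet` class is hypothetical, and real Neutron IPAM also honors custom allocation pools, IPv6 modes, reserved service ports, and concurrent API workers.

```python
import ipaddress


class Subnet:
    """Hypothetical toy IPAM for one small IPv4 subnet, mimicking Neutron's defaults."""

    def __init__(self, cidr: str):
        self.network = ipaddress.ip_network(cidr)
        hosts = list(self.network.hosts())  # excludes network and broadcast addresses
        self.gateway_ip = hosts[0]           # default gateway: first usable address
        self._free = hosts[1:]               # default allocation pool: everything else
        self.allocated = {}                  # port id -> fixed IP

    def allocate(self, port_id: str):
        """Assign the next free fixed IP to a port (idempotent per port)."""
        if port_id in self.allocated:
            return self.allocated[port_id]
        if not self._free:
            raise RuntimeError(f"subnet {self.network} is exhausted")
        ip = self._free.pop(0)
        self.allocated[port_id] = ip
        return ip

    def release(self, port_id: str) -> None:
        """Return a port's IP to the pool when the port is deleted."""
        self._free.append(self.allocated.pop(port_id))


subnet = Subnet("10.0.0.0/24")
print(subnet.gateway_ip)          # 10.0.0.1
print(subnet.allocate("port-a"))  # 10.0.0.2
```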

Cinder — Block Storage

Cinder provides persistent block volumes that attach to VMs — the equivalent of AWS EBS. Volumes persist independently of the VM lifecycle. Cinder supports multiple storage backends, but Ceph RBD is the most common choice for production deployments due to its distributed, self-healing architecture.

Components: cinder-api, cinder-scheduler, cinder-volume

Keystone — Identity

Keystone handles authentication and authorization for every OpenStack service. It issues tokens, manages projects and roles, and provides a service catalog so that each component can discover the others. Keystone integrates with LDAP, Active Directory, and SAML/OIDC identity providers.

Components: keystone (WSGI), Fernet token backend
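Every client interaction starts by requesting a token. With the Identity v3 API this is a `POST /v3/auth/tokens` request; Keystone returns the token in the `X-Subject-Token` response header and includes the service catalog in the body. The sketch below only builds a password-method, project-scoped request using the standard library and does not send it. The endpoint URL, user, project, and password are placeholders.

```python
import json
import urllib.request

KEYSTONE_URL = "https://keystone.example.com:5000/v3"  # placeholder endpoint


def build_token_request(user: str, password: str, project: str,
                        domain: str = "Default") -> urllib.request.Request:
    """Build (but do not send) a project-scoped password authentication request."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"name": domain},
                        "password": password,
                    }
                },
            },
            # Scoping the token to a project is what grants the user's project roles.
            "scope": {"project": {"name": project, "domain": {"name": domain}}},
        }
    }
    return urllib.request.Request(
        f"{KEYSTONE_URL}/auth/tokens",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_token_request("demo", "secret", "demo-project")
# To send it: urllib.request.urlopen(req). The token arrives in the
# X-Subject-Token header; later API calls pass it back as X-Auth-Token.
```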

Glance — Image Service

Glance stores and serves virtual machine images, the base operating system templates from which VMs are launched. It supports multiple image formats (qcow2, raw, vmdk) and multiple storage backends. With Ceph, Glance images can be cloned instantly using copy-on-write, but only when they are stored in raw format; Ceph cannot boot VMs directly from qcow2 images.

Components: glance-api, image store (Ceph RBD or filesystem)
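Because copy-on-write cloning needs raw images, it helps to check an image file before uploading it. Every qcow2 file starts with the four magic bytes `QFI\xfb`, while a raw image has no format header of its own. The helpers below are a hypothetical pre-upload check, not part of Glance; files they flag can be converted with `qemu-img convert -f qcow2 -O raw`.

```python
QCOW2_MAGIC = b"QFI\xfb"  # every qcow2 file begins with these 4 bytes


def is_qcow2(path: str) -> bool:
    """Return True if the file starts with the qcow2 magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == QCOW2_MAGIC


def needs_conversion(paths: list) -> list:
    """Return the images that should be converted to raw before upload to a Ceph-backed Glance."""
    return [p for p in paths if is_qcow2(p)]


# Example with hypothetical file names:
# for path in needs_conversion(["ubuntu.img", "centos.img"]):
#     print(f"qemu-img convert -f qcow2 -O raw {path} {path}.raw")
```

The check only distinguishes qcow2 from everything else. A VMDK or ISO file would also pass, so treat it as a guard against the most common mistake rather than a full format detector.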

Horizon — Dashboard

Horizon is the web-based management UI. It provides a graphical interface for launching instances, managing networks, viewing quotas, and basic administration. Power users typically work through the CLI or API directly, but Horizon remains valuable for visibility and less-frequent operations.

Components: Django-based web application behind Apache/Nginx

Heat — Orchestration

Heat is the infrastructure-as-code engine for OpenStack. It reads declarative templates (HOT format) that describe collections of resources — instances, networks, volumes, load balancers — and creates them in the correct order with dependency resolution. Similar in concept to AWS CloudFormation.

Components: heat-api, heat-engine
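Heat derives creation order from the references between resources: anything that uses `get_resource` (or `depends_on`) to point at another resource must wait for it. The sketch below mimics that using a minimal HOT-style template written as a Python dict and the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is illustrative only. Real Heat parses YAML, understands many more intrinsic functions, and creates independent resources in parallel.

```python
from graphlib import TopologicalSorter

# Minimal HOT-style template as a Python dict; resource names are hypothetical.
TEMPLATE = {
    "heat_template_version": "2018-08-31",
    "resources": {
        "server": {
            "type": "OS::Nova::Server",
            "properties": {"networks": [{"port": {"get_resource": "port"}}]},
        },
        "port": {
            "type": "OS::Neutron::Port",
            "properties": {"network": {"get_resource": "net"}},
        },
        "net": {"type": "OS::Neutron::Net"},
        "volume": {"type": "OS::Cinder::Volume", "properties": {"size": 10}},
        "attachment": {
            "type": "OS::Cinder::VolumeAttachment",
            "properties": {
                "instance_uuid": {"get_resource": "server"},
                "volume_id": {"get_resource": "volume"},
            },
        },
    },
}


def find_refs(node) -> set:
    """Recursively collect resource names referenced with get_resource."""
    if isinstance(node, dict):
        refs = set()
        for key, value in node.items():
            if key == "get_resource":
                refs.add(value)
            else:
                refs |= find_refs(value)
        return refs
    if isinstance(node, list):
        return set().union(*(find_refs(item) for item in node))
    return set()


def creation_order(template: dict) -> list:
    """Order resources so every dependency is created before its dependents."""
    graph = {}
    for name, resource in template["resources"].items():
        deps = find_refs(resource.get("properties", {}))
        explicit = resource.get("depends_on", [])  # HOT allows a string or a list
        deps |= {explicit} if isinstance(explicit, str) else set(explicit)
        graph[name] = deps
    # TopologicalSorter raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


print(creation_order(TEMPLATE))
```

Any valid order places `net` before `port`, `port` before `server`, and both `server` and `volume` before `attachment`. Independent resources such as `net` and `volume` may appear in either relative order.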

Octavia — Load Balancing

Octavia provides load-balancer-as-a-service. It creates and manages amphorae, which are service VMs running HAProxy that distribute traffic across backend pools. It supports HTTP, HTTPS, TCP, and UDP load balancing with health checks, connection limits, and TLS termination.

Components: octavia-api, octavia-worker, octavia-health-manager, octavia-housekeeping, amphora
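Inside an amphora, HAProxy spreads requests across the pool members that are passing health checks. The toy sketch below models Octavia's `ROUND_ROBIN` pool algorithm, with a health flag on each member. The `Member` and `Pool` classes are hypothetical stand-ins for HAProxy's internal behavior, not Octavia's API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Member:
    """One backend server in a pool (hypothetical model)."""
    address: str
    port: int
    healthy: bool = True  # a real load balancer updates this from health checks


@dataclass
class Pool:
    members: list
    _ticks: count = field(default_factory=count)

    def next_member(self) -> Member:
        """Return the next healthy member in round-robin order."""
        n = len(self.members)
        for _ in range(n):
            member = self.members[next(self._ticks) % n]
            if member.healthy:
                return member
        raise RuntimeError("no healthy members in pool")


pool = Pool([Member("10.0.0.11", 80), Member("10.0.0.12", 80), Member("10.0.0.13", 80)])
pool.members[1].healthy = False  # health monitor marks 10.0.0.12 as down
print([pool.next_member().address for _ in range(4)])
# ['10.0.0.11', '10.0.0.13', '10.0.0.11', '10.0.0.13']
```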

Service Communication

How Services Talk to Each Other

Message Queue

Within a service, components communicate asynchronously over RabbitMQ using RPC messages; traffic between different services goes through the REST APIs instead. When a user asks Nova to launch a VM, nova-api places a message on the queue and returns immediately. nova-conductor picks it up, asks nova-scheduler which host to use, and then sends another message to nova-compute on the chosen node. This decoupling is what allows the control plane to remain responsive under load.
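The launch flow can be modeled in a few lines, with in-process queues standing in for RabbitMQ. This is a toy, single-threaded simulation and not oslo.messaging. Each "topic" is a plain `collections.deque`, and casts append to it. The conductor's request to the scheduler is modeled as a direct function call because Nova makes that step a blocking RPC call rather than a fire-and-forget cast. Method names loosely follow Nova's RPC API; host names and the inventory are hypothetical.

```python
from collections import deque

# Toy in-process "broker": one FIFO per topic, standing in for RabbitMQ queues.
broker = {}
log = []


def cast(topic: str, method: str, **kwargs) -> None:
    """Fire-and-forget message (an RPC "cast"): the sender does not wait."""
    broker.setdefault(topic, deque()).append((method, kwargs))


def scheduler_select_destinations(ram_mb: int) -> str:
    """Target of a synchronous RPC "call": returns a host to the caller."""
    free_ram_mb = {"compute-01": 2048, "compute-02": 16384}  # hypothetical inventory
    fits = [host for host, free in free_ram_mb.items() if free >= ram_mb]
    return max(fits, key=free_ram_mb.get)


def api_create_server(name: str, ram_mb: int) -> str:
    """nova-api: enqueue the work and return at once."""
    cast("conductor", "schedule_and_build_instances", name=name, ram_mb=ram_mb)
    return "202 Accepted"


def conductor_schedule_and_build(name: str, ram_mb: int) -> None:
    """nova-conductor: ask the scheduler for a host, then hand off to that host."""
    host = scheduler_select_destinations(ram_mb)
    cast(f"compute.{host}", "build_and_run_instance", name=name)


def compute_handler(host: str):
    """nova-compute on one node: 'boot' the instance by logging it."""
    return lambda name: log.append(f"{host} booted {name}")


HANDLERS = {
    "conductor": {"schedule_and_build_instances": conductor_schedule_and_build},
    "compute.compute-01": {"build_and_run_instance": compute_handler("compute-01")},
    "compute.compute-02": {"build_and_run_instance": compute_handler("compute-02")},
}


def run_until_idle() -> None:
    """Deliver queued messages to their consumers until every queue is empty."""
    while any(broker.values()):
        for topic, pending in list(broker.items()):
            while pending:
                method, kwargs = pending.popleft()
                HANDLERS[topic][method](**kwargs)


status = api_create_server("web-1", ram_mb=4096)
run_until_idle()
print(status, log)  # 202 Accepted ['compute-02 booted web-1']
```

The API call completes before any scheduling happens, which is the property that keeps the API tier responsive.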

Database

Each service maintains its own database (typically MariaDB or MySQL with Galera replication for HA). Nova has a database, Neutron has a database, Keystone has a database. They never share tables. Cross-service communication always goes through the API layer, never through direct database access.

API Layer

Every OpenStack service exposes a RESTful HTTP API. Keystone provides a service catalog that maps service names to API endpoints. When Cinder needs to verify a user's identity, it calls Keystone's API. When Nova needs to attach a network, it calls Neutron's API. All inter-service calls are authenticated with Keystone tokens.
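Clients and services find each other by reading the catalog that Keystone returns with a token. In the v3 API, each catalog entry has a `type` (such as `compute` or `network`) and a list of endpoints, each tagged with an `interface` (`public`, `internal`, or `admin`) and a `region`. The function below is a minimal sketch of endpoint selection over that structure; the URLs and region name are placeholders.

```python
# Trimmed example of the "catalog" list from a Keystone v3 token response.
CATALOG = [
    {
        "type": "compute",
        "name": "nova",
        "endpoints": [
            {"interface": "public", "region": "RegionOne",
             "url": "https://cloud.example.com:8774/v2.1"},
            {"interface": "internal", "region": "RegionOne",
             "url": "http://10.0.0.10:8774/v2.1"},
        ],
    },
    {
        "type": "network",
        "name": "neutron",
        "endpoints": [
            {"interface": "internal", "region": "RegionOne",
             "url": "http://10.0.0.10:9696"},
        ],
    },
]


def find_endpoint(catalog, service_type, interface="public", region=None) -> str:
    """Return the URL for a service type and interface, optionally in one region."""
    for service in catalog:
        if service["type"] != service_type:
            continue
        for ep in service["endpoints"]:
            if ep["interface"] == interface and region in (None, ep["region"]):
                return ep["url"]
    raise LookupError(f"no {interface} endpoint for {service_type!r}")


print(find_endpoint(CATALOG, "network", interface="internal"))  # http://10.0.0.10:9696
```

Service-to-service calls typically use the `internal` interface so that traffic stays on the management network, while end users reach the `public` endpoints.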

High Availability

Control plane HA is achieved by running three copies of each service behind HAProxy or a similar load balancer. Galera provides synchronous database replication across all three nodes. RabbitMQ runs as a three-node cluster with replicated queues (quorum queues on current releases; classic mirrored queues are deprecated). If any single control plane node fails, the cloud keeps operating, although requests in flight on that node may need to be retried.
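The choice of three nodes follows from majority quorum. Galera keeps accepting writes only in a partition that holds more than half of the cluster, and RabbitMQ quorum queues (which use Raft) follow the same rule. The short calculation below shows why three is the smallest cluster that survives a failure.

```python
def quorum(n: int) -> int:
    """Smallest majority of an n-member cluster."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Members that can fail while a majority still survives."""
    return n - quorum(n)


for n in range(1, 6):
    print(f"{n} nodes: quorum {quorum(n)}, survives {tolerated_failures(n)} failure(s)")
```

Two nodes tolerate no failures, and four tolerate only one, the same as three. Five nodes survive two. The same arithmetic explains the three-monitor minimum for Ceph.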

Storage Backend

Ceph Integration

Ceph is the most widely used storage backend for production OpenStack deployments. It provides unified block, object, and file storage from a single cluster, eliminates single points of failure, and scales by adding commodity hardware.

Ceph integrates with three OpenStack services simultaneously: Cinder uses Ceph RBD for block volumes, Glance uses Ceph RBD for image storage, and Nova can use Ceph RBD for ephemeral disks. Because all three share the same Ceph cluster, operations like booting a VM from a raw-format image become instant copy-on-write clones instead of full data copies.

Why Ceph for OpenStack

  • No single point of failure — data is replicated across nodes and failure domains
  • Instant VM boot — Glance images clone to Cinder volumes via copy-on-write
  • Linear scaling — add OSDs to increase capacity and throughput
  • Self-healing — Ceph automatically rebalances when disks or nodes fail
  • S3-compatible object storage via RADOS Gateway
  • No proprietary storage appliance needed — runs on commodity hardware
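The first point depends on placing each replica in a different failure domain. Ceph does this with CRUSH (Controlled Replication Under Scalable Hashing), which computes placement from the cluster map instead of looking it up in a table. The sketch below is *not* CRUSH. It substitutes simple rendezvous (highest-random-weight) hashing to illustrate two of the same properties: any client can compute placement independently, and no two replicas land on the same host. The OSD and host names are hypothetical.

```python
import hashlib

# Hypothetical cluster: OSD id -> host (the failure domain in this sketch).
OSDS = {
    "osd.0": "stor-1", "osd.1": "stor-1",
    "osd.2": "stor-2", "osd.3": "stor-2",
    "osd.4": "stor-3", "osd.5": "stor-3",
    "osd.6": "stor-4", "osd.7": "stor-4",
}


def score(obj: str, osd: str) -> int:
    """Deterministic pseudo-random weight for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{obj}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, replicas: int = 3) -> list:
    """Pick `replicas` OSDs for an object, never two on the same host."""
    chosen, used_hosts = [], set()
    for osd in sorted(OSDS, key=lambda o: score(obj, o), reverse=True):
        if len(chosen) == replicas:
            break
        host = OSDS[osd]
        if host not in used_hosts:
            chosen.append(osd)
            used_hosts.add(host)
    if len(chosen) < replicas:
        raise RuntimeError("not enough failure domains for the requested replicas")
    return chosen


print(place("rbd_data.example"))
```

Because placement is a pure function of the object name and the map, removing a host only remaps objects that had a replica on it. CRUSH provides the same benefit with added support for device weights and hierarchical rules such as rack or row failure domains.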

Ceph Architecture

Monitors (MON)

Maintain the cluster map and consensus. Minimum 3 for quorum. Lightweight — can co-locate with OpenStack control plane nodes.

Object Storage Daemons (OSD)

One OSD per physical disk. Handle data storage, replication, and recovery. This is where your storage capacity lives.

RADOS Gateway (RGW)

S3- and Swift-compatible object storage API. Optional: deploy it if you need object storage or Swift compatibility.

Metadata Server (MDS)

Required only for CephFS (shared filesystem). Optional if you only use block and object storage.

Sizing rule of thumb: Start with a minimum of 3 storage nodes, each with 4-12 OSDs. Use NVMe for performance-sensitive workloads (databases, boot volumes) and HDD for capacity-oriented workloads (backups, object storage). A 3-replica pool with 10TB raw yields roughly 3.3TB usable, and less in practice, because the cluster needs free headroom to re-replicate data after a failure.
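The usable figure is raw capacity divided by the replica count. A more conservative planning number also stays below Ceph's default `nearfull` warning ratio of 0.85, so that the cluster has room to recover after losing a disk or node. The function below is a planning sketch. The fill limit is a parameter to adjust for your own failure-domain sizes, and the 3-node, 10 × 8 TB cluster is a hypothetical example.

```python
def usable_tb(raw_tb: float, replicas: int = 3, fill_limit: float = 1.0) -> float:
    """Usable capacity of a replicated pool, optionally capped at a fill ratio."""
    if replicas < 1 or not 0 < fill_limit <= 1:
        raise ValueError("replicas must be >= 1 and 0 < fill_limit <= 1")
    return raw_tb / replicas * fill_limit


print(round(usable_tb(10), 2))                   # 3.33
print(round(usable_tb(10, fill_limit=0.85), 2))  # 2.83
# Hypothetical cluster: 3 nodes x 10 drives x 8 TB = 240 TB raw.
print(round(usable_tb(3 * 10 * 8, fill_limit=0.85), 1))  # 68.0
```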

Networking

Network Architecture

Production OpenStack deployments use multiple physically or logically separated networks.

Management Network

Carries API traffic, database replication, and RabbitMQ messaging between control plane nodes. Isolated from tenant traffic. Typically a dedicated VLAN or physical interface.

Tenant Network

Overlay networks (VXLAN or Geneve) that carry VM-to-VM traffic. OVN provides distributed virtual routing, security groups, and DHCP. Each project gets isolated L2 segments.

Provider / External

The network that connects VMs to the outside world via floating IPs. Mapped to a physical VLAN or flat network. Neutron manages IP allocation and NAT through the gateway nodes.

Storage Network

Dedicated high-bandwidth network for Ceph replication and client I/O. Separating storage traffic from management and tenant traffic prevents I/O storms from affecting API responsiveness. Minimum 10GbE; 25GbE recommended for NVMe-backed clusters.

IPMI / Out-of-Band

Baseboard management controller network for hardware-level access: power cycling, BIOS configuration, remote console. Completely isolated from all other networks. Essential for automated bare-metal provisioning with Ironic.
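Keeping these networks separate starts with an address plan in which no two ranges overlap. The sketch below checks a plan using Python's `ipaddress` module. Every CIDR and network name in it is a hypothetical example, and VLAN IDs and physical cabling are outside its scope.

```python
import ipaddress
from itertools import combinations

# Hypothetical address plan for the five networks described above.
PLAN = {
    "management": "10.10.0.0/24",
    "tenant-overlay": "10.20.0.0/24",    # tunnel endpoint (VTEP) addresses for Geneve/VXLAN
    "provider-external": "203.0.113.0/24",
    "storage": "10.30.0.0/24",
    "ipmi": "10.99.0.0/24",
}


def overlaps(plan: dict) -> list:
    """Return every pair of networks whose CIDRs overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return [
        (a, b) for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


print(overlaps(PLAN))  # []
bad = dict(PLAN, storage="10.10.0.128/25")  # accidentally inside management
print(overlaps(bad))   # [('management', 'storage')]
```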

Reference Design

A Production-Ready Deployment

A typical mid-size OpenStack deployment for 200-500 VMs.

Control Plane Nodes (3)

Each runs: Keystone, Nova API, Neutron server, Cinder API, Glance, Horizon, Heat, Octavia, MariaDB/Galera, RabbitMQ, HAProxy. 32-64 GB RAM, 500GB NVMe, dual 10GbE.

Compute Nodes (10+)

Each runs: nova-compute, libvirtd, ovn-controller, ovs-vswitchd, and the Neutron OVN metadata agent. Sized to your workload: 256-512 GB RAM for general purpose, 1-2TB for memory-intensive. Dual-socket, 25GbE.

Storage Nodes (Ceph, 3+)

Each runs: ceph-osd (multiple per node), ceph-mon (co-located on 3 nodes). 8-12 drives per node. NVMe for WAL/DB acceleration. Dual 25GbE for cluster and public networks.
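One way to sanity-check the 200-500 VM target against the compute tier is to divide each node's allocatable RAM by the flavor size. The sketch below does that arithmetic. The host reservation, the 8 GB flavor, the RAM allocation ratio, and the one spare node kept empty for evacuations are all assumptions, loosely mirroring Nova's `reserved_host_memory_mb` and `ram_allocation_ratio` options. A real plan would also check vCPU and disk headroom.

```python
def vms_per_cluster(nodes: int, node_ram_gb: int, flavor_ram_gb: int,
                    reserved_gb: int = 16, ram_ratio: float = 1.0,
                    spare_nodes: int = 1) -> int:
    """Estimate how many VMs of one flavor fit, keeping spare nodes empty (assumed N+1)."""
    usable_nodes = max(nodes - spare_nodes, 0)
    per_node = int((node_ram_gb - reserved_gb) * ram_ratio // flavor_ram_gb)
    return usable_nodes * per_node


print(vms_per_cluster(10, 256, 8))  # 270
print(vms_per_cluster(10, 512, 8))  # 558
```

With 10 general-purpose nodes, the estimate brackets the 200-500 VM range, and raising `ram_ratio` above 1.0 trades memory overcommit risk for density.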
