When one machine or one instance can’t handle your agent workload, you scale out: run agents on multiple machines so you get more capacity, better latency, and resilience to single-node failure. Scaling agents across machines involves load balancing, state and context handling, and making sure document and PDF workflows still behave consistently no matter which node serves the request. This guide covers practical patterns for US teams running OpenClaw or similar agent systems at scale.
Summary
Scale agents by adding more machines (horizontal scaling), use a load balancer and shared state (e.g., Redis or your backend) so any node can serve any user, and keep document processing in one pipeline. iReadPDF gives you consistent PDF summarization and extraction so every node gets the same document format without duplicating file handling.
Why Scale Across Machines
Single-machine limits show up as:
- Throughput: Too many concurrent users or requests; the one instance is saturated.
- Latency: Long queues or slow responses because one brain does everything.
- Resilience: If that machine or process dies, the whole service is down.
- Geography: Users in different regions need low latency; one region can’t serve everyone fast.
Scaling across machines addresses these by adding more nodes. More nodes mean more capacity and the ability to survive the loss of one (or more) nodes. For US teams with growing usage or strict uptime needs, multi-machine scaling is often the next step after outgrowing a single instance.
Core Concepts for Multi-Machine Scaling
Before you add machines, you need:
| Concept | Purpose |
|---------|---------|
| Stateless agents | Each request can be handled by any node. No "this user’s session lives only on node 3." State (memory, conversation history) lives in a shared store (DB, Redis, etc.). |
| Load balancing | Incoming requests are distributed across nodes (round-robin, least connections, or by user/session hash so one user sticks to one node if you need sticky sessions). |
| Shared context store | User memory, conversation history, and any cross-request state are read from and written to a central store so any node can continue a conversation. |
| Document pipeline as a service | PDF handling isn’t duplicated per node. All nodes call the same pipeline (iReadPDF or your backend that uses it) so summaries and extractions are consistent and you don’t have N copies of document logic. |
Once state is shared and document access is centralized, adding machines is mostly adding capacity.
Pattern 1: Stateless Agents with Shared State
In this pattern, every agent process is stateless. It doesn’t keep user context in local memory; it loads context from a shared store at the start of each request and writes updates back at the end.
How it works:
- Request arrives at a load balancer and is sent to any healthy node.
- Node loads context for the user/session from the shared store (e.g., Redis or your DB). That includes conversation history and any persistent memory (e.g., OpenClaw memory).
- Agent runs using that context. If the request needs a document summary, the node calls your document pipeline (e.g., iReadPDF) or a backend that returns pre-computed summaries. The node does not store PDFs locally.
- Node writes back updated context (new messages, updated memory) to the shared store and returns the response.
- Next request for the same user may hit a different node; that node loads the same updated context and continues seamlessly.
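The flow above can be sketched as a minimal request handler. This is a sketch under stated assumptions, not a definitive implementation: `ContextStore` is an in-memory stand-in for Redis or a database, and `summarize_document` is a hypothetical call into your PDF pipeline.

```python
# Minimal sketch of a stateless agent node (Pattern 1).
# ContextStore stands in for a shared store such as Redis;
# summarize_document is a hypothetical PDF-pipeline call.

class ContextStore:
    """Shared store: in production, Redis or a database."""
    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return self._data.get(session_id, {"history": []})

    def save(self, session_id, context):
        self._data[session_id] = context


def summarize_document(doc_id):
    # Placeholder for the single document pipeline every node calls.
    return f"summary-of-{doc_id}"


def handle_request(store, session_id, message, doc_id=None):
    # 1. Load context from the shared store (no local state on the node).
    context = store.load(session_id)
    context["history"].append({"role": "user", "content": message})

    # 2. Optionally fetch a document summary from the central pipeline.
    summary = summarize_document(doc_id) if doc_id else None

    # 3. Run the agent (stubbed here) and record the reply.
    reply = f"Handled: {message}" + (f" using {summary}" if summary else "")
    context["history"].append({"role": "assistant", "content": reply})

    # 4. Write context back so any other node can continue the session.
    store.save(session_id, context)
    return reply
```

Because context is loaded at the start and saved at the end of every request, a follow-up for the same session can land on any node and pick up where the last one left off.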
Pros: Simple horizontal scaling; add nodes to add capacity. Cons: Every request pays the cost of reading/writing shared state; you need to design the store for low latency and high availability.
Document handling: No node should hold raw PDFs or run its own OCR. All document needs are satisfied by calling a single pipeline. iReadPDF runs in the browser or via your backend; agents on any node request "summary for document X" and get the same format. That keeps scaling and compliance simple for US teams.
Pattern 2: Dedicated Nodes per Agent Type
Here you scale by agent type: e.g., a pool of "research" nodes, a pool of "comms" nodes, and a pool of "document" nodes. The orchestrator or gateway routes each subtask to the right pool.
How it works:
- User request hits the gateway or orchestrator. It decomposes the request (e.g., "prep board meeting" → get notes, get metrics, summarize board pack).
- Orchestrator routes each subtask to the right pool. "Summarize board pack" goes to the document-agent pool; "get metrics" goes to the metrics pool.
- Pool load balancing: Within each pool, nodes are load-balanced. So you might have 3 document nodes and 5 research nodes. Scaling means adding nodes to the pool that’s bottlenecked.
- Results are aggregated by the orchestrator and returned to the user. Document nodes only talk to your PDF pipeline (iReadPDF); they don’t need to know about research or comms.
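The routing described above can be sketched as follows. The pool names, subtask tuples, and round-robin dispatch are illustrative assumptions, not a specific product API.

```python
# Sketch of pool-based routing (Pattern 2): an orchestrator sends each
# subtask to a named pool, and each pool load-balances across its nodes.
import itertools


class Pool:
    """A named pool of nodes with simple round-robin dispatch."""
    def __init__(self, name, nodes):
        self.name = name
        self._cycle = itertools.cycle(nodes)

    def dispatch(self, subtask):
        node = next(self._cycle)
        return f"{node} handled '{subtask}'"


class Orchestrator:
    def __init__(self, pools):
        # e.g., {"document": Pool(...), "research": Pool(...)}
        self.pools = pools

    def route(self, subtasks):
        # Each subtask is (pool_name, description); results are aggregated
        # in order so the caller can assemble the final response.
        return [self.pools[pool].dispatch(task) for pool, task in subtasks]
```

Scaling the bottlenecked pool then just means constructing it with more nodes; the orchestrator code doesn’t change.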
Pros: You scale only the part that’s slow (e.g., document summarization). Isolated failure: if document nodes are down, research and comms can still run for non-document tasks. Cons: More moving parts; you need an orchestrator and per-pool configuration.
Document handling: Document nodes are the only ones that need to call the PDF pipeline. They return summaries to the orchestrator or the next agent. Using iReadPDF as the single source for extraction and summarization keeps document output consistent across all document nodes and simplifies security and audit for US organizations.
Implementing Scaling Step by Step
Step 1: Make Agents Stateless
Ensure each agent process doesn’t rely on in-memory state between requests. All user and conversation state goes to a shared store (DB or Redis). Identify what you store: conversation history, OpenClaw-style memory, and any per-user settings.
Step 2: Introduce a Shared Context Store
Choose a store (e.g., Redis for speed, Postgres for durability) and define the schema: user ID, session ID, conversation turns, memory blobs. Implement read-on-request and write-after-response so any node can serve any user.
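A minimal sketch of such a schema, assuming Redis-style string keys with JSON values (the key layout and field names here are assumptions, not a fixed format):

```python
# Context-store schema sketch (Step 2): one key per user/session,
# with conversation turns and memory serialized as JSON.
import json


def context_key(user_id, session_id):
    # One key per session; long-lived memory could live under its own key.
    return f"ctx:{user_id}:{session_id}"


def serialize_context(turns, memory):
    return json.dumps({"turns": turns, "memory": memory})


def deserialize_context(raw):
    # A missing key (None) maps to an empty context, so a brand-new
    # session needs no special casing in the agent code.
    if raw is None:
        return {"turns": [], "memory": {}}
    return json.loads(raw)
```

With this in place, "read-on-request" is a single GET plus `deserialize_context`, and "write-after-response" is `serialize_context` plus a single SET.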
Step 3: Add a Load Balancer
Put a load balancer (or API gateway) in front of your agent nodes. Configure health checks so unhealthy nodes are removed from the pool. Prefer stateless routing (round-robin or least connections) unless you need sticky sessions for a specific reason.
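In practice the load balancer (nginx, an ALB, etc.) handles this for you, but the core logic of round-robin over healthy nodes looks roughly like this sketch:

```python
# Sketch of round-robin routing with health checks (Step 3).
# Node identifiers are illustrative; a real balancer also probes
# health endpoints rather than being told explicitly.

class LoadBalancer:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(nodes)
        self._i = 0

    def mark_unhealthy(self, node):
        # A failed health check removes the node from rotation.
        self.healthy.discard(node)

    def mark_healthy(self, node):
        self.healthy.add(node)

    def pick(self):
        # Round-robin over healthy nodes only.
        candidates = [n for n in self.nodes if n in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy nodes")
        node = candidates[self._i % len(candidates)]
        self._i += 1
        return node
```

Because routing is stateless, a node dropping out of (or rejoining) the pool changes only which nodes receive traffic, not how any request is handled.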
Step 4: Centralize Document Access
Ensure no node reads PDFs from local disk or runs its own OCR. Route all document summarization and extraction through one pipeline. iReadPDF gives you one format and one place to audit; agents on any machine request summaries from that pipeline (or from your backend that uses it).
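One way to enforce this is a thin shared client that every node uses. The class and parameter names below are assumptions; the actual pipeline call (e.g., an HTTP request to your backend) is injected so the sketch stays self-contained, and the log line gives you the single audit point.

```python
# Sketch of the single document-access path (Step 4): all nodes go
# through this client instead of parsing PDFs locally.
import logging

logger = logging.getLogger("doc-pipeline")


class DocumentPipelineClient:
    def __init__(self, fetch, node_id):
        self._fetch = fetch    # injected pipeline call (e.g., HTTP)
        self.node_id = node_id

    def get_summary(self, doc_id):
        # One place to log, authorize, and audit every document access.
        logger.info("node=%s requested doc=%s", self.node_id, doc_id)
        return self._fetch(doc_id)
```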
Step 5: Add Nodes and Monitor
Add more nodes and watch latency, throughput, and error rates. Scale the pool that’s bottlenecked. If document workload grows, add more document nodes (in Pattern 2) or more general nodes (in Pattern 1); in both cases they all use the same document pipeline.
Documents and PDFs When Scaling
Document handling must stay consistent when you scale:
- One pipeline, many nodes: All agent nodes that need PDF content get it from the same pipeline. iReadPDF runs in the browser or powers your backend; agents call an API or read from a cache of pre-computed summaries. No node should run its own PDF parsing.
- Caching (optional): For frequently accessed documents (e.g., standard contracts, board pack), you can cache summaries in your shared store so nodes don’t hit the pipeline on every request. The pipeline remains the source of truth; cache is a performance optimization.
- Audit: Log which node (or which agent) requested which document and when. With one pipeline, you have one place to enforce access control and retention for US compliance.
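The caching point above is the classic cache-aside pattern; a minimal sketch, assuming a dict-like shared cache and an injected pipeline call:

```python
# Cache-aside sketch for document summaries: check the shared cache
# first, fall back to the pipeline, then populate the cache. The
# pipeline stays the source of truth; the cache only saves round trips.

def get_summary_cached(cache, pipeline_fetch, doc_id):
    cached = cache.get(doc_id)
    if cached is not None:
        return cached
    summary = pipeline_fetch(doc_id)
    cache[doc_id] = summary
    return summary
```

In production the cache would be the same shared store as your context (e.g., Redis), ideally with a TTL so stale summaries expire if a document is re-uploaded.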
Conclusion
Scaling agents across machines increases capacity and resilience: stateless agents with shared state, load balancing, and optional dedicated pools per agent type. Keep agents stateless, use a shared context store, and centralize document handling so every node uses the same PDF pipeline. iReadPDF gives you consistent summarization and extraction so scaling doesn’t scatter or duplicate document logic—critical for US teams that need reliability and clear audit trails.
Ready to scale your agents without scattering document handling? Use iReadPDF for OCR, summarization, and extraction—one pipeline for every node.