A delivery runbook is an operational document that captures procedure, rationale, and recovery paths for the operations your agency runs repeatedly. This guide covers the five highest-frequency WordPress site operations every agency runbook should address, how to write entries that survive staff turnover, and how to connect each entry to the sites it governs. The result is a document that sharpens with every execution, turning incidents into improvements rather than recurring failures.
A runbook is not a checklist; it is the operational document that captures what to do, why each step exists, and what to do when a step fails. A checklist is a memory aid for someone who already knows the procedure. A runbook is the procedure itself, documented for anyone on the team, including the person who joined last week.
The distinction matters at agency scale. When you manage multiple WordPress sites across a rotating team, a checklist relies on tacit knowledge that leaves with each departing employee. A runbook externalizes that knowledge into a persistent document the team can run against, audit, and improve over time.
Consider a site launch. A checklist might say: “Enable caching.” A runbook entry says: “Enable caching on the production server using the agency standard configuration (see: Site Variables, Caching Layer), because uncached WordPress under traffic load on launch day caused the Q3 client incident. If the caching layer fails to activate, revert to the static pre-launch state and open an incident.” That final sentence, the recovery path, is what a checklist never carries. It is what turns a generic instruction into a durable operational asset.
A checklist is disposable. A runbook compounds. Every time a team member executes a procedure and updates the entry afterward, the document becomes more reliable. After two years of consistent use and post-incident updates, the runbook carries institutional knowledge no individual on the team could reconstruct from memory alone.
Five operations recur in every WordPress agency regardless of team size: site launch, core and extension updates, client onboarding, scheduled WordPress maintenance, and incident response. These are not the only operations an agency runs, but they are the ones where undocumented procedure causes the most expensive failures.
Site launch covers the full sequence from staging sign-off to DNS propagation confirmation. Every agency has learned at least one painful launch lesson; the runbook is where those lessons live permanently, not in the memory of a senior developer who may leave next quarter.
Core and extension updates covers the cadence, the staging verification step, the production deployment sequence, and the rollback trigger condition. This is the operation most agencies run informally, and the one that causes the most unplanned downtime. A WordPress maintenance plan that scales across multiple client sites depends on this runbook entry being precise and consistently followed across the fleet.
Client onboarding covers the steps to provision a site in the agency’s operating environment: access grants, branding kit configuration, initial site status, and the first Playbook entries. Documenting this operation reduces new client setup from a three-person coordination effort to a one-person procedure.
Scheduled maintenance covers the client communication sequence (when to notify, through which channel), the maintenance-mode activation procedure, the scope of work to be performed, and the verification sequence before the site returns to live status.
Incident response is the operation most agencies never document until after a damaging incident. The runbook entry does not need to anticipate every failure mode; it needs to establish the response sequence: detect, contain, communicate, resolve, and record. A team following a documented sequence under pressure makes fewer compounding errors than one improvising from memory.
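The response sequence can be made explicit enough that skipping a stage is caught immediately. The following is a minimal sketch, assuming stages are tracked as plain strings; the `next_stage` helper is hypothetical and not part of any WordPress tooling.

```python
from typing import List, Optional

# The fixed response sequence named in the incident response entry.
RESPONSE_SEQUENCE = ("detect", "contain", "communicate", "resolve", "record")


def next_stage(completed: List[str]) -> Optional[str]:
    """Return the next stage to run, or None once every stage is done.

    Raises ValueError when the completed stages are out of order, which is
    the signal that the team has started improvising.
    """
    expected = list(RESPONSE_SEQUENCE[: len(completed)])
    if completed != expected:
        raise ValueError(f"stages out of order: expected {expected}, got {completed}")
    if len(completed) == len(RESPONSE_SEQUENCE):
        return None
    return RESPONSE_SEQUENCE[len(completed)]


print(next_stage(["detect", "contain"]))  # communicate
```

Keeping "record" as a stage in the sequence, rather than an afterthought, means an incident is not considered closed until the runbook has been given the chance to learn from it.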
Each runbook entry needs four components: the trigger condition, the ordered procedure, the verification step, and the rollback path. Without all four, the entry is a partial document that forces the operator to improvise at exactly the wrong moment.
The trigger condition answers: what causes this operation to run? For a site launch, it is client sign-off on staging plus a confirmed DNS cutover window. For a core update, it is a new WordPress release with a security classification. Documenting the trigger removes ambiguity about when to start and prevents premature execution on a site that is not ready.
The ordered procedure is the numbered sequence of steps. Each step should be atomic enough to verify independently: not “configure the server” but “set PHP memory limit to 256MB in wp-config.php and confirm with a phpinfo() check.” When an operation uses WordPress automation (a deployment script, a backup command, a pre-launch preflight), the runbook step names the script and its expected output, not just the intention behind it.
The verification step answers: how do you know the operation succeeded? For a site launch, this is a structured list of URLs to test, a load threshold to confirm, and a client confirmation to collect. For a maintenance window, it is a specific set of site functions to confirm are operating correctly before the maintenance notice comes down. Verification that cannot be checked is not verification.
The rollback path is the most important component and the most commonly omitted one. For every operation, document the condition that triggers a rollback and the exact steps to reverse the procedure. A runbook entry without a rollback path is incomplete for any operation that touches a live site.
A runbook entry carrying all four components is self-sufficient: a new team member can execute a covered operation without asking a senior colleague. That is the practical test for whether an entry is complete enough to ship.
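One way to enforce the four-component rule is to model an entry as a small data structure and check it before it ships. This is a sketch only: the `RunbookEntry` class and its field names are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunbookEntry:
    """One operation entry; field names are illustrative assumptions."""
    name: str
    trigger: str = ""                                       # what causes the operation to run
    procedure: List[str] = field(default_factory=list)      # ordered, atomic steps
    verification: List[str] = field(default_factory=list)   # checks that prove success
    rollback: List[str] = field(default_factory=list)       # steps that reverse the operation

    def missing_components(self) -> List[str]:
        """Return the names of any of the four components left empty."""
        parts = {
            "trigger": self.trigger.strip(),
            "procedure": self.procedure,
            "verification": self.verification,
            "rollback": self.rollback,
        }
        return [label for label, value in parts.items() if not value]


draft = RunbookEntry(
    name="Core update",
    trigger="New WordPress release with a security classification",
    procedure=["Back up the site", "Apply the update on staging", "Deploy to production"],
    verification=["Confirm the key site functions on production"],
)
print(draft.missing_components())  # ['rollback']
```

The draft above would be flagged before anyone runs it on a live site, because its rollback path is still empty.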
A runbook that lives in a shared document no one maintains is almost as costly as a runbook that does not exist; survival requires ownership, version history, and rationale embedded in every step. The institutional memory leak is the most expensive failure mode for a growing WordPress agency, and a decaying runbook is still a leak.
Assign an owner to each operation entry. The owner is the person responsible for keeping the entry current, not necessarily the person who executes the procedure. When an entry becomes outdated, there is a named person to update it. When a team member leaves, the handoff explicitly includes their runbook entries, with the incoming owner reviewing and approving the current state before the departure is complete.
Store the runbook in a system that records change history. When a procedure changes, capture the reason alongside the change. Three months after an update, “changed caching activation step because new server infrastructure requires a different sequence” is worth far more than the updated step alone. Rationale in the version history prevents the team from reverting a deliberate change because they no longer remember why it was made.
Embed rationale inside the entries themselves, not only in the version history. The rationale does not need to be long: a single sentence per step that would surprise a new operator is enough. Steps that “everyone knows” are the first ones to cause incidents when the people who knew them leave.
Use the onboarding test as the most reliable quality check: give a new team member a runbook entry for an operation they have not performed before and ask them to execute it without assistance. Every place they stop and ask a question is a gap in the runbook, not a gap in their knowledge. Close those gaps before the next execution, not after.
A runbook disconnected from site-specific context delivers generic instructions that fail on site-specific details; every operation entry should reference the variables that change from one client site to the next. The procedure for a core update is the same across your fleet in structure, but the staging URL, the backup location, the client communication contact, and the rollback threshold differ per site.
Structure runbook entries to distinguish the constant procedure (steps that apply to every site) from the site variables (the values that change). The constant procedure lives in the runbook. The site variables live in the site record. When an operator runs the update procedure for a specific client, they follow the constant procedure and substitute the site-specific values from that client’s record. This structure scales to a fleet of any size without duplicating procedures.
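The split between constant procedure and site variables can be sketched with Python's standard `string.Template`. The step wording, variable names, and example values below are all made up for illustration.

```python
from string import Template

# Constant procedure: lives in the runbook, identical for every site.
CORE_UPDATE_STEPS = [
    Template("Confirm a fresh backup exists at $backup_location"),
    Template("Apply the update on $staging_url and run the verification list"),
    Template("Notify $client_contact before deploying to $production_url"),
]

# Site variables: live in the site record, one set per client (values are invented).
example_site = {
    "staging_url": "https://staging.example.com",
    "production_url": "https://www.example.com",
    "backup_location": "backups/example-client/",
    "client_contact": "ops@example.com",
}


def render_procedure(steps, site):
    """Fill the constant steps with one site's values.

    Template.substitute raises KeyError when a variable is missing, so a gap
    in the site record surfaces before execution rather than mid-procedure.
    """
    return [step.substitute(site) for step in steps]


for number, line in enumerate(render_procedure(CORE_UPDATE_STEPS, example_site), start=1):
    print(f"{number}. {line}")
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: an unrecorded site variable stops the render instead of leaving a silent placeholder in the operator's instructions.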
This is what it means to run WordPress as an operating layer across your agency fleet rather than treating each site as a standalone engagement. The same runbook entry governs all client sites; only the site variables differ. Agencies that manage multiple WordPress sites at scale know that per-client procedural duplication is where consistency breaks down first.
Site variables worth recording for each client include: staging and production URLs, backup schedule and storage location, client communication contacts with their notification preferences, performance baselines, and any site-specific constraints that override the standard procedure. A client who requires 72 hours’ advance notice before any maintenance window is a site variable, not a note buried in a chat thread that no one will find at 11 p.m. on a Tuesday.
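A constraint such as an advance-notice period becomes checkable once it is stored as a field on the site record. The sketch below assumes a hypothetical `SiteRecord` with a `notice_hours` field; the 24-hour default is an invented agency default, not something the procedure prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SiteRecord:
    """Hypothetical site record holding a few of the variables listed above."""
    client: str
    staging_url: str
    production_url: str
    notice_hours: int = 24  # assumed default; a client constraint overrides it


def earliest_window_start(site: SiteRecord, notified_at: datetime) -> datetime:
    """Earliest moment maintenance may begin once the client has been notified."""
    return notified_at + timedelta(hours=site.notice_hours)


def window_is_allowed(site: SiteRecord, notified_at: datetime, window_start: datetime) -> bool:
    """True when the planned window respects the client's advance-notice constraint."""
    return window_start >= earliest_window_start(site, notified_at)


site = SiteRecord(
    client="Example Client",
    staging_url="https://staging.example.com",
    production_url="https://www.example.com",
    notice_hours=72,
)
notified = datetime(2024, 3, 4, 9, 0)
print(window_is_allowed(site, notified, datetime(2024, 3, 5, 23, 0)))  # False: only 38 hours of notice
print(window_is_allowed(site, notified, datetime(2024, 3, 7, 9, 0)))   # True: exactly 72 hours of notice
```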
Connect the runbook to each site’s Playbook entries so that recorded decisions surface during the operations they govern. A client preference recorded in the Decisions log should be visible when the maintenance runbook entry is opened for that site. Without that connection, the decision exists but does not govern the operation, which is the structural gap that produces client-facing errors that feel surprising and are entirely avoidable.
Every incident is a runbook audit: if a step failed or was improvised under pressure, the gap belongs in the document, not only in the post-mortem. The instinct after an incident is to fix the immediate problem and move on; the discipline is to spend thirty minutes updating the runbook before closing the incident record.
Run a structured post-incident review after any incident that required improvisation or caused client impact. The review asks four questions: what was the trigger, what step failed or was missing, what did the team do instead, and what would the correct runbook entry have said? The answer to the last question is the update. Write it before the memory of the improvisation fades, because that improvisation is the most valuable data the incident produced.
Not every incident reveals a runbook gap; some reveal a gap in a site variable record. If a team member improvised because a staging URL was not recorded, the update is to the site record, not the procedure. Distinguishing between procedure gaps and information gaps ensures updates go to the right place and the runbook does not accumulate site-specific data that belongs elsewhere.
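The four review questions and the procedure-versus-information distinction can be captured in one small record, so every review ends with an explicit destination for the update. This is a sketch under assumed names; `IncidentReview`, `GapType`, and `update_target` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class GapType(Enum):
    PROCEDURE = "procedure"      # a step was wrong or missing
    INFORMATION = "information"  # a site variable was missing or stale


@dataclass
class IncidentReview:
    """Answers to the four review questions, plus the kind of gap found."""
    trigger: str
    failed_or_missing_step: str
    improvisation: str
    correct_entry_text: str
    gap_type: GapType


def update_target(review: IncidentReview) -> str:
    """Name the document that should receive the update."""
    if review.gap_type is GapType.INFORMATION:
        return "site record"
    return "runbook entry"


review = IncidentReview(
    trigger="Core update on a client site",
    failed_or_missing_step="Staging URL was not recorded",
    improvisation="Operator asked in chat for the staging URL",
    correct_entry_text="Staging URL recorded in the site variables",
    gap_type=GapType.INFORMATION,
)
print(update_target(review))  # site record
```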
Track the update frequency as a signal. A runbook entry updated five times after incidents in six months indicates an unstable operation, one that warrants investigation into whether the underlying procedure is sound. A high-frequency operation entry untouched for two years is a candidate for a scheduled review: either it is genuinely stable, or the team has stopped recording incidents against it.
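Both signals can be computed from the entry's change history. The sketch below takes the thresholds from the text (five incident-driven updates in six months, two years untouched) and approximates them in days; the function name and return labels are assumptions, and it does not itself track how often an operation is executed.

```python
from datetime import date, timedelta
from typing import List

UNSTABLE_UPDATES = 5                   # incident-driven updates...
UNSTABLE_WINDOW = timedelta(days=182)  # ...within roughly six months
STALE_AFTER = timedelta(days=730)      # untouched for roughly two years


def review_signal(incident_updates: List[date], last_touched: date, today: date) -> str:
    """Classify an entry as 'investigate', 'schedule review', or 'ok'."""
    recent = [d for d in incident_updates if timedelta(0) <= today - d <= UNSTABLE_WINDOW]
    if len(recent) >= UNSTABLE_UPDATES:
        return "investigate"        # the underlying procedure may be unsound
    if today - last_touched >= STALE_AFTER:
        return "schedule review"    # stable, or incidents have stopped being recorded
    return "ok"


today = date(2024, 6, 30)
busy = [date(2024, 1, 15), date(2024, 2, 10), date(2024, 3, 3), date(2024, 4, 20), date(2024, 6, 1)]
print(review_signal(busy, last_touched=date(2024, 6, 1), today=today))  # investigate
print(review_signal([], last_touched=date(2022, 5, 1), today=today))    # schedule review
```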
The compounding value of a living runbook is that each update makes the next execution of that operation safer. An agency that has run site launches for five years and updated its launch entry after every gap now holds five years of collective learning in a single document. That document belongs to the agency, not to any individual on the team. It is what separates an agency that operates a fleet from one that manages a collection of sites one at a time.
A runbook is a type of standard operating procedure, but written specifically for execution under time pressure. Runbooks include explicit trigger conditions, rollback paths, and failure modes that a general SOP often omits. The defining test: a runbook should be self-sufficient for anyone on the team, including someone new to the operation, without requiring additional guidance from a senior colleague.
A runbook entry should be long enough to be self-sufficient and short enough to be read under pressure. A well-structured entry for a site launch typically covers one to three pages: trigger condition, a numbered procedure of 10 to 20 steps, a verification checklist, and a rollback path. Entries that run longer often carry information that belongs in site records or separate reference documents. If an entry requires deep background reading before execution, it needs to be split.
Update runbook entries after every incident that required improvisation, after every significant change to the agency’s operating environment (new server infrastructure, a major WordPress release, a change in standard procedure), and at a scheduled quarterly review for entries that have not been touched. The incident-driven updates matter most: do not wait for the quarterly review to record what an incident has already exposed.
One runbook covers all clients. The runbook holds the constant procedures; the site-specific variables (URLs, contacts, constraints, baselines) are recorded in each client’s site record. Running the site launch procedure means following the runbook and substituting the variables from that client’s record. This structure scales to a fleet of any size without duplicating procedures, which is what makes consistent delivery across many clients operationally possible.
The most common runbook mistake is documenting only the steps that work. Most runbook entries cover the correct sequence and omit the rollback path and failure conditions. When something goes wrong, the team improvises because the runbook only addresses success. The rollback path is the most important section of any runbook entry: it is the part the team will reach for under the most pressure, and it should be written before the first production execution, not after the first incident.