The premise sounds obvious: test before you update. The problem is that most WordPress staging setups test the wrong thing. A local Docker container will tell you, in most configurations, whether a plugin activates without a fatal error — but it won’t tell you whether your LiteSpeed cache behaves correctly, whether Plesk’s PHP-FPM config survives the change, or whether Cloudflare is quietly routing traffic somewhere unexpected. I learned that last one the hard way, through a debugging session where the apparent WordPress problem turned out to have nothing to do with WordPress.
What started as a sysops hygiene project — a repeatable weekly update workflow for a handful of production WordPress sites — turned into a deeper look at where staging can actively mislead you.
Why local Docker isn’t enough for this class of work
When I’m testing whether a WordPress core update, a WooCommerce bump, or a plugin combination will survive in production, I need the test environment to share the same constraints as production: the same web server (LiteSpeed, in this case), the same PHP-FPM configuration managed by Plesk, the same caching layer, and ideally the same volume of Action Scheduler jobs, webhook endpoints, and third-party integrations running in the background. A Docker container gives you WordPress and a database — not any of that surrounding context, and the context is where failures tend to hide.
The approach I settled on was a throwaway subdomain on the actual Plesk server: a real vhost, provisioned via Plesk WP Toolkit, populated by syncing production down to it. The local machine stays the control plane — it runs orchestration scripts over SSH and runs Playwright smoke tests externally against the staging URL. The staging environment itself lives on the same infrastructure as production, which keeps staging genuinely close to real conditions without requiring a separate server.
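A minimal sketch of the control-plane idea, assuming key-based SSH access and WP-CLI on the server. The SSH target and docroot path are hypothetical placeholders, and the helper defaults to printing commands rather than running them:

```shell
#!/bin/sh
# Sketch only: the SSH target and staging docroot are hypothetical placeholders.
set -eu

STAGING_SSH="${STAGING_SSH:-deploy@plesk.example.com}"
STAGING_PATH="${STAGING_PATH:-/var/www/vhosts/example.com/staging.example.com}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, 0 = actually run it

# Run one command line on the server, inside the staging docroot.
remote() {
  cmd="cd '$STAGING_PATH' && $*"
  if [ "$DRY_RUN" = "1" ]; then
    printf '[dry-run] ssh %s %s\n' "$STAGING_SSH" "$cmd"
  else
    ssh -o BatchMode=yes "$STAGING_SSH" "$cmd"
  fi
}

# Example: ask WP-CLI on the server which core version staging is running.
remote wp core version
```

Keeping every remote action behind one helper like this makes it easy to review a whole run in dry-run mode before letting it touch the server.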
The neutralization problem after a production sync
A production WordPress site for an e-commerce or marketing operation typically has a lot of outbound activity: CRM webhooks, Action Scheduler queues processing email sends and order events, pixel and analytics scripts firing on every page load, shipping integrations resolving rates, SEO plugins pinging for index status. A naive copy of that environment will do all of those things from a staging URL, and some of them will cause real damage.
I ended up with what I think of as a post-copy neutralization pass: a set of operations that run automatically after each production-to-staging sync, before any smoke testing begins. The specific items depend on your plugin stack, but the pattern splits into two categories.
Things that cause active damage if left on:
- Outbound webhook endpoints — blank or reroute them
- Action Scheduler queues — clear or pause so background jobs don’t run
- Transactional email — redirect through a trap (I use a staging-specific SMTP route that discards or captures rather than delivers)
- Pixel and tracking scripts — disable, or replace real account IDs with dead placeholder values
Things that clutter but don’t break anything:
- wp_options values like site URL, admin email, and debug settings: re-apply staging-safe versions
- Read-only integrations (an SEO plugin reading its own database, for instance): I leave these active where I can confirm they’re harmless
The goal isn’t to disable everything; it’s to know exactly what’s live and make a deliberate call on each integration rather than leaving it to chance. This pass is not exciting work, but the duplicate contact record was a good enough argument for doing it every time.
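As an illustration only, part of such a pass could be scripted with WP-CLI. This sketch assumes WP-CLI is installed and the script runs inside the staging docroot. The URLs, the admin address, the option name example_crm_webhook_url, and the plugin slug example-tracking-pixel are made-up placeholders, not real plugin settings. It prints the commands by default instead of running them:

```shell
#!/bin/sh
# Sketch of a post-copy neutralization pass using WP-CLI.
# Assumptions: WP-CLI is available and this runs inside the staging docroot.
# Every URL, address, option name, and plugin slug below is a placeholder.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, 0 = run them
PROD_URL="${PROD_URL:-https://example.com}"
STAGING_URL="${STAGING_URL:-https://staging.example.com}"

# Print or execute a single WP-CLI command.
wp_run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "[dry-run] wp $*"
  else
    wp "$@"
  fi
}

neutralize() {
  # Re-apply staging-safe values for URL, admin email, indexing, and debug.
  wp_run search-replace "$PROD_URL" "$STAGING_URL" --skip-columns=guid
  wp_run option update admin_email "staging-admin@example.com"
  wp_run option update blog_public 0
  wp_run config set WP_DEBUG true --raw

  # Stop WP-Cron from being spawned by page loads. This is only a partial
  # pause: background jobs may still be triggered by other means.
  wp_run config set DISABLE_WP_CRON true --raw

  # Hypothetical examples: reroute a CRM webhook to a host that never
  # resolves, and switch off a tracking plugin.
  wp_run option update example_crm_webhook_url "https://webhooks.invalid/"
  wp_run plugin deactivate example-tracking-pixel
}

neutralize
```

Reading the dry-run output before each real run doubles as the "know exactly what's live" checklist: anything not in that list is still pointing at production services.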
The weekly gate: only promote if staging passes
- Sync production database and files to staging
- Create a staging restore point via Plesk WP Toolkit
- Run the neutralization pass
- Apply WordPress core and plugin updates on staging
- Run Playwright smoke tests against the staging URL from the local machine
- If tests pass: apply the same updates to production
- If tests fail: restore staging from the restore point, investigate, notify
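The control flow of those steps can be sketched as a small shell function. Each step_* function here is a hypothetical stub that only prints what it stands for. In a real setup each would call the corresponding sync, Plesk WP Toolkit, update, or Playwright script:

```shell
#!/bin/sh
# Sketch of the weekly gate's control flow.
# Every step_* function is a hypothetical stub standing in for a real script.
set -eu

step_sync_to_staging()      { echo "sync production -> staging"; }
step_create_restore_point() { echo "create staging restore point"; }
step_neutralize()           { echo "run neutralization pass"; }
step_update_staging()       { echo "apply updates on staging"; }
step_smoke_tests()          { echo "run Playwright smoke tests"; }
step_update_production()    { echo "apply updates on production"; }
step_restore_staging()      { echo "restore staging from restore point"; }
step_notify()               { echo "notify: $*"; }

weekly_gate() {
  # If preparation fails, stop before any update is attempted.
  step_sync_to_staging && step_create_restore_point && step_neutralize || {
    step_notify "staging preparation failed"
    return 1
  }

  # Production is only touched when staging updates AND smoke tests succeed.
  if step_update_staging && step_smoke_tests; then
    step_update_production
  else
    step_restore_staging
    step_notify "staging gate failed, production left untouched"
    return 1
  fi
}

weekly_gate
```

The point of writing it this way is that the production update sits on exactly one code path, behind both the staging update and the smoke tests.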
The Playwright tests aren’t exhaustive — they cover the homepage, a product or content page, checkout initiation, and login. Enough to catch a white screen of death, a broken layout, or a missing critical element. The goal is a fast gate, not a full regression suite.
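This is not a Playwright test, but a cheap status-code pre-check in the same spirit can run before the browser tests start. The sketch below assumes curl is available locally. The staging URL and the paths in the usage note are placeholders:

```shell
#!/bin/sh
# Sketch: HTTP status pre-check for a few key pages.
# Not a Playwright test; the staging URL is a placeholder assumption.
set -eu

STAGING_URL="${STAGING_URL:-https://staging.example.com}"

# Print the HTTP status code for a URL ("000" if the request fails outright).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$1" || true
}

# Check each given path; return non-zero if any does not answer 200.
precheck() {
  failed=0
  for path in "$@"; do
    code="$(http_status "$STAGING_URL$path")"
    if [ "$code" = "200" ]; then
      echo "ok   $path"
    else
      echo "FAIL $path (HTTP $code)"
      failed=1
    fi
  done
  return "$failed"
}
```

A call such as `precheck / /wp-login.php` would catch a white screen that returns a 500 or a missing login page in a second or two. It says nothing about layout or missing elements, which is what the browser tests are for.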
Having the restore point before applying updates matters more than it sounds. When something breaks on staging you want to reset cleanly and try again without re-syncing production, because re-syncing is the slow step.
The Cloudflare detour that looked like a WordPress problem
The symptom: visiting the domain returned only the site header, no content, and wp-login.php returned a 404. The Plesk error log showed:
File not found: /var/www/vhosts/default/htdocs/wp-login.php
That path — /var/www/vhosts/default/htdocs/ — is Plesk’s fallback docroot, not the actual vhost docroot. Something was resolving the request against the wrong directory.
I ran plesk repair web on the domain:
Checking web server configuration
Repair web server configuration for the domain? [Y/n] y
Repairing web server configuration for the domain .......... [OK]
Checking php-fpm configuration
Repair php-fpm configuration for the domain? [Y/n] y
Repairing php-fpm configuration for the domain .......... [OK]
Error messages: 0; Warnings: 0; Errors resolved: 0
Clean. No errors resolved because there was nothing to resolve: Plesk’s own configuration was fine. I reread that output twice before I accepted what it was telling me and tried something different.
I tested directly against the server’s internal IP, bypassing public DNS (and with it the Cloudflare proxy) entirely:
curl -sk --resolve example.com:443:<internal-ip> https://example.com/ | sed -n '1,60p'
Full WordPress homepage. wp-login.php returned 200. The vhost, LiteSpeed, PHP-FPM, and the WordPress files were all working correctly.
So the public URL was serving a static index.html from /var/www/vhosts/default/htdocs/ — which matched exactly what I was seeing in the browser. Cloudflare was routing traffic somewhere else, and whatever it was routing to was serving that fallback file.
Switching the Cloudflare DNS records for the apex and www from Proxied to DNS only fixed it immediately. Full site came back, login worked, no further changes needed.
I still don’t know what changed in Cloudflare to cause this — nothing in the configuration had been touched recently. My best guess is a Cloudflare-side routing anomaly; I can’t say whether it would have resolved on its own, because switching the DNS record to DNS-only fixed it before I had time to find out. What I can say is that Plesk, LiteSpeed, the vhost config, and WordPress were never the problem.
The lesson I’ll carry from this: when a WordPress problem only reproduces through a public URL and not through a direct-to-origin request, stop debugging WordPress — a broken proxy can look exactly like a broken CMS, and without the habit of testing directly against origin, I would have spent hours in the wrong layer.
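That habit can be turned into a small check. The sketch below compares the status code for the same path through the public route and directly against origin, reusing the --resolve approach from the curl command above. The host and IP in the usage note are placeholders, and the messages are only a rough first triage:

```shell
#!/bin/sh
# Sketch: compare one request via the public route and direct to origin.
# Host and origin IP are supplied by the caller; nothing here is site-specific.
set -eu

# Status code as seen through public DNS (and any proxy in front).
status_public() {   # $1 = host, $2 = path
  curl -sk -o /dev/null -w '%{http_code}' --max-time 15 "https://$1$2" || true
}

# Status code when the hostname is pinned to the origin IP.
status_origin() {   # $1 = host, $2 = origin IP, $3 = path
  curl -sk -o /dev/null -w '%{http_code}' --max-time 15 \
    --resolve "$1:443:$2" "https://$1$3" || true
}

compare_layers() {  # $1 = host, $2 = origin IP, $3 = path
  pub="$(status_public "$1" "$3")"
  ori="$(status_origin "$1" "$2" "$3")"
  echo "public=$pub origin=$ori path=$3"
  if [ "$pub" = "$ori" ]; then
    echo "same status on both routes: compare response bodies next"
  elif [ "$ori" = "200" ]; then
    echo "origin is healthy: look at DNS, CDN, or proxy before WordPress"
  else
    echo "origin is failing too: look at the vhost, PHP, or WordPress"
  fi
}
```

Something like `compare_layers example.com 203.0.113.10 /wp-login.php` would have flagged the situation described here, since the login page returned 404 publicly and 200 at origin. A status comparison alone would not catch a wrong page served with a 200, so the body check still matters.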
What changed after all of this
The staging workflow now runs weekly and has surfaced two plugin conflicts before the production gate ran — one between a shipping rate plugin and a WooCommerce update, one involving a cache plugin that silently broke the checkout session after a WordPress core bump. Neither would have been obvious from a local Docker environment.
I still have gaps. The neutralization pass is manual in places where I’d prefer automation, and the Playwright tests are thinner than I’d like on mobile flows. But the gate is real now, and that’s the thing that was missing before.