For about three months, our WordPress site threw a 500 error roughly once every ten days. Refreshing the page brought it back immediately, so it felt like a blip — easy to dismiss, hard to explain. The pattern only became obvious when I noticed flushing the object cache made the problem disappear completely, until the next time.
That was the only real clue I had for a while: flush the cache, buy ten more days.
What the big-keys scan actually showed
Sampled 77726 keys in the keyspace!
Total key length in bytes is 8083517 (avg len 104.00)
Biggest string found '[hash]:options:notoptions' has 3971911 bytes
Biggest zset found '[hash]:redis-cache:metrics' has 2471 members
One key — options:notoptions — was sitting at nearly 4 MB. Everything else was noise by comparison: some site-transient:feed_* blobs around 0.5 MB, various post_meta entries in the 100–220 KB range. That one string dwarfed the rest.
The name sounds harmless, even a little bureaucratic. It is not.
options:notoptions is a WordPress Options API cache entry that tracks option names that do not exist. Every time plugin or theme code calls get_option('some_missing_option'), WordPress checks the database, finds nothing, and adds that name to this array so it does not have to check again. Stored in Redis without a TTL, the array just keeps growing. The most likely failure point is a request that pulls the serialized blob into PHP and has to unserialize() a 4 MB string: the resulting memory spike tips that one request over the limit. That would explain both the intermittent 500 and the recovery on refresh. Flushing the cache only restarted the clock.
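The miss path described above can be modeled in a few lines. This is a simplified sketch, not WordPress core code: sketch_get_option(), the $db and $cache arrays, and the hypothetical_plugin_state_* option names are all stand-ins invented for illustration. It only shows why an array of remembered misses with no expiry can grow without bound when callers keep probing new names.

```php
<?php
// Simplified model of the get_option() miss path. NOT WordPress core code.
// $db stands in for the wp_options table, $cache for the persistent object cache.

function sketch_get_option(string $name, array $db, array &$cache, $default = false)
{
    $notoptions = $cache['options:notoptions'] ?? [];

    // Name already recorded as missing: skip the database lookup entirely.
    if (isset($notoptions[$name])) {
        return $default;
    }

    // Option exists: return it and leave notoptions untouched.
    if (array_key_exists($name, $db)) {
        return $db[$name];
    }

    // Miss: remember the name. Nothing in this path removes or expires it.
    $notoptions[$name] = true;
    $cache['options:notoptions'] = $notoptions;

    return $default;
}

$db = ['siteurl' => 'https://example.test'];
$cache = [];

// Hypothetical plugin that probes option names containing a changing ID.
for ($i = 0; $i < 50000; $i++) {
    sketch_get_option("hypothetical_plugin_state_{$i}", $db, $cache);
}

$blob = serialize($cache['options:notoptions']);
printf(
    "%d missing names recorded, %d bytes serialized\n",
    count($cache['options:notoptions']),
    strlen($blob)
);
```

Repeating a lookup for a name that is already recorded adds nothing, so growth comes only from distinct missing names. That is why code that builds option names from IDs, hashes, or timestamps is the kind of caller worth looking for.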
I did not figure this out immediately. I spent time suspecting a plugin conflict — plausible, because the 500s correlated loosely with activity from a few plugins that called get_option() heavily — and then another stretch suspecting a database query timeout, which the slow-query logs neither confirmed nor ruled out cleanly. The big-keys scan was what actually pointed me somewhere useful, and it took about two minutes to run.
The config changes
wp-config.php:
// Exclude groups most likely to cause unbounded growth
define('WP_REDIS_IGNORED_GROUPS', [
    'site-transient', // remote feed blobs can be large (~0.5 MB per scan); Redis overhead without proportional read benefit
    'users', 'userlogins', 'usermeta', 'user_meta', // rarely beneficial in Redis for most sites
]);
// Cap key lifetime — bad entries cannot survive indefinitely
define('WP_REDIS_MAXTTL', 86400); // 24-hour ceiling
// Enforce consistent serialization
define('WP_REDIS_SERIALIZER', 'php');
The TTL cap does not fix notoptions directly, since WordPress sets that key without a TTL, but it reduces the blast radius of any other unbounded entry that slips through.
On serialization: php serialization is slower and produces larger strings, but it is consistent. If igbinary is not cleanly available across your entire stack, a deserialization mismatch produces its own 500, which makes igbinary the worse trade. Our setup has more than one PHP handler touching the same Redis instance, so consistency was the right call. If your stack is uniform, igbinary is worth it.
Raising the PHP memory limit to 512M was a backstop, not a fix. A 4 MB unserialize() hitting a 256M limit is a real collision, but raising the ceiling just moves the threshold for the next one. The three changes above are what actually matter.
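To get a feel for what a blob of this size costs a single request, a rough measurement like the one below can help. It is a sketch under assumptions: the array is synthetic, the hypothetical_missing_option_name_* keys are invented, and 80000 entries was picked only because it serializes to roughly the size seen in the scan. The numbers it prints depend on the PHP version and build, so treat them as indicative rather than as a reproduction of the production failure.

```php
<?php
// Build a synthetic stand-in for a bloated notoptions array.
// Key names and entry count are assumptions chosen to land near 4 MB serialized.
$names = [];
for ($i = 0; $i < 80000; $i++) {
    $names["hypothetical_missing_option_name_{$i}"] = true;
}
$blob = serialize($names);
unset($names); // keep only the serialized string, as a cache read would

// Measure how much memory the rebuilt array takes on top of the string itself.
$before = memory_get_usage();
$restored = unserialize($blob);
$after = memory_get_usage();

printf("serialized blob:           %.2f MB\n", strlen($blob) / 1e6);
printf("array after unserialize(): %.2f MB\n", ($after - $before) / 1e6);
printf("peak usage so far:         %.2f MB\n", memory_get_peak_usage() / 1e6);
printf("memory_limit:              %s\n", ini_get('memory_limit'));
```

The rebuilt array typically occupies more memory than the serialized string, and during the call both are held at once. On its own that is far below 256M, which fits the reading that the blob only tips over requests that are already running close to the limit.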
The Elementor piece
I cannot say how much of the notoptions growth Elementor was responsible for versus plugins calling get_option() speculatively, because I did not isolate that variable before making the other changes. That is an honest gap in my diagnosis — and the reason I moved it here rather than treating it as part of the root cause.
What changed after
notoptions has not grown back anywhere close to 4 MB.
What I keep coming back to is the three months I spent treating this as a flaky server issue before running a two-minute scan.