A mailflow rule meant to catch spoofed mail started blocking legitimate transactional email from a third-party shipping provider. The regex logic looked sound in isolation. The gap it exposed was not.
I wrote the rule. I tested it against a small sample. I deployed it. Then I found out a client had been waiting three days for an invoice tied to a delayed shipment — sitting in quarantine the whole time.
What the rule actually did
The rule targeted messages that failed authentication checks and also carried a List-Id header. The idea was to catch bulk spoofed mail that was slipping through by riding mailing list headers.
Here is roughly what I wrote in the admin console:
Apply this rule if:
    'Authentication-Results' header matches:
        '(?i)\barc\s*=\s*fail\b'
        OR
        '(?i)\b(dmarc|dkim)\s*=\s*fail\b'
Do the following:
    Set spam confidence level (SCL) to 9
    Stop processing more rules
On paper, combining three failure signals before escalating to the highest spam confidence level felt conservative. In practice, the conjunction was a trap for a specific class of legitimate mail I had not thought about.
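To see how the two patterns behave, here is a minimal Python sketch that runs them against header values. The header strings and the domains `mx.example.net` and `shipper.example` are hypothetical, not copied from real traffic, and Python's `re` engine is standing in for whatever regex engine the admin console uses.

```python
import re

# The two patterns from the rule, joined by OR as in the rule text.
# Python's `re` is an assumption here; the console's engine may differ.
ARC_FAIL = re.compile(r"(?i)\barc\s*=\s*fail\b")
DMARC_DKIM_FAIL = re.compile(r"(?i)\b(dmarc|dkim)\s*=\s*fail\b")


def rule_matches(auth_results: str) -> bool:
    """Return True if either pattern matches the header value."""
    return bool(
        ARC_FAIL.search(auth_results) or DMARC_DKIM_FAIL.search(auth_results)
    )


# Hypothetical Authentication-Results values for illustration only.
gateway_relayed = (
    "mx.example.net; spf=fail smtp.mailfrom=shipper.example; "
    "dkim=fail header.d=shipper.example; "
    "dmarc=fail header.from=shipper.example"
)
clean = (
    "mx.example.net; spf=pass smtp.mailfrom=shipper.example; "
    "dkim=pass header.d=shipper.example; "
    "dmarc=pass header.from=shipper.example"
)

print(rule_matches(gateway_relayed))  # True
print(rule_matches(clean))            # False
```

Note that `\barc` does not match inside `dmarc`, because there is no word boundary between `m` and `a`. The first pattern only fires on a standalone `arc=fail` result.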
Why legitimate mail failed all three checks
SPF failed at our mail provider because the connecting IP was the security gateway — an IP that is not listed in the sending domain’s SPF record. The gateway is authorized by our organization to relay mail, but the sending domain has no knowledge of it. SPF alignment broke at the handoff.
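The SPF break can be sketched in a few lines of Python. This is a deliberately partial evaluator: it only looks at `ip4:` and `ip6:` mechanisms and ignores `include`, `a`, `mx`, `redirect`, and qualifiers other than the default. The SPF record and both IP addresses are hypothetical values from documentation ranges.

```python
import ipaddress


def ip_allowed_by_spf(connecting_ip: str, spf_record: str) -> bool:
    """Check a connecting IP against ip4:/ip6: mechanisms only.

    A real SPF evaluator also resolves include, a, mx, and redirect,
    and honors the -, ~, and ? qualifiers. This sketch skips all of that.
    """
    ip = ipaddress.ip_address(connecting_ip)
    for term in spf_record.split():
        term = term.lstrip("+")  # "+" is the default (pass) qualifier
        if term.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip.version == network.version and ip in network:
                return True
    return False


# Hypothetical values: the sender's published record and two client IPs.
sender_spf = "v=spf1 ip4:192.0.2.0/24 -all"
sender_ip = "192.0.2.25"      # what the gateway sees
gateway_ip = "198.51.100.7"   # what our provider sees

print(ip_allowed_by_spf(sender_ip, sender_spf))   # True
print(ip_allowed_by_spf(gateway_ip, sender_spf))  # False
```

The message is the same in both calls. Only the vantage point changes, and that alone flips the result.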
DKIM passed at the gateway but failed by the time our provider re-evaluated the message. The signature used c=relaxed/simple: relaxed header canonicalization, simple (strict) body canonicalization. The gateway had appended a legal disclaimer and rewritten some URLs. Those body changes invalidated the signature. Simple body canonicalization tolerates almost no change in transit, and content edits like these alter the body hash in any case. I knew that in the abstract. I did not account for it when designing the rule.
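The effect on the DKIM body hash can be reproduced with a small Python sketch. The two canonicalization functions follow my reading of RFC 6376 section 3.4 and are simplified for illustration. The message bodies and the disclaimer text are made up, and no signature is actually verified here, only the `bh=` input.

```python
import base64
import hashlib
import re


def canon_body_simple(body: bytes) -> bytes:
    """'simple': drop trailing empty lines, keep the rest byte-for-byte."""
    return re.sub(rb"(\r\n)+\Z", b"", body) + b"\r\n"


def canon_body_relaxed(body: bytes) -> bytes:
    """'relaxed': also collapse whitespace runs and strip line-end spaces."""
    lines = [
        re.sub(rb"[ \t]+", b" ", line).rstrip(b" ")
        for line in body.split(b"\r\n")
    ]
    while lines and lines[-1] == b"":
        lines.pop()
    return b"".join(line + b"\r\n" for line in lines)


def body_hash(body: bytes, canon) -> str:
    """Base64 SHA-256 of the canonicalized body, as carried in bh=."""
    return base64.b64encode(hashlib.sha256(canon(body)).digest()).decode()


# Hypothetical bodies: as signed, with a gateway disclaimer appended,
# and with only whitespace differences.
original = b"Your invoice is attached.\r\nTrack: https://shipper.example/t/123\r\n"
with_disclaimer = original + b"\r\n--\r\nScanned by the gateway.\r\n"
whitespace_only = (
    b"Your invoice is attached.  \r\nTrack: https://shipper.example/t/123\r\n\r\n"
)

for name, canon in (("simple", canon_body_simple), ("relaxed", canon_body_relaxed)):
    print(name, "survives disclaimer:",
          body_hash(original, canon) == body_hash(with_disclaimer, canon))
    print(name, "survives whitespace:",
          body_hash(original, canon) == body_hash(whitespace_only, canon))
```

In this sketch the disclaimer changes the hash under both modes. Relaxed body canonicalization only forgives the whitespace variant, so choosing it would not have rescued a message whose content the gateway rewrote.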
DMARC failed as a downstream consequence. With SPF alignment broken and DKIM broken, DMARC had nothing to pass on.
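DMARC's dependence on the other two checks is easy to express as code. The sketch below is a simplification: it reduces each check to a pass flag and an alignment flag and ignores policy, subdomains, and multiple signatures. The `AuthState` type and both example states are hypothetical, and the assumption that SPF passed at the gateway is mine.

```python
from dataclasses import dataclass


@dataclass
class AuthState:
    """Authentication results as seen from one vantage point."""
    spf_pass: bool
    spf_aligned: bool
    dkim_pass: bool
    dkim_aligned: bool


def dmarc_passes(state: AuthState) -> bool:
    """DMARC needs at least one mechanism that both passes and aligns."""
    return (state.spf_pass and state.spf_aligned) or (
        state.dkim_pass and state.dkim_aligned
    )


# Assumed view at the gateway: original IP, unmodified body.
at_gateway = AuthState(spf_pass=True, spf_aligned=True,
                       dkim_pass=True, dkim_aligned=True)
# View at our provider: gateway IP, modified body.
at_provider = AuthState(spf_pass=False, spf_aligned=False,
                        dkim_pass=False, dkim_aligned=True)

print(dmarc_passes(at_gateway))   # True
print(dmarc_passes(at_provider))  # False
```

Either mechanism surviving would have been enough. The gateway broke both, so the DMARC failure carried no independent information.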
ARC was absent. ARC (Authenticated Received Chain) is the mechanism designed for exactly this scenario: a trusted intermediary records the authentication state it observed before the message was modified, and signs that chain so the next hop can verify it. If the gateway had been adding ARC seals, our provider could have seen that DKIM passed upstream. But the gateway was not generating ARC headers. So from our provider’s perspective, this was just a message with three authentication failures and a List-Id header.
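Whether a message carries any ARC sets can be checked directly from its headers. The Python sketch below uses the standard library `email` parser to list the instance numbers (`i=`) found on each of the three ARC header fields. It only detects presence and does not validate any seal. Both sample messages are hypothetical, and the `b=...` values are placeholders, not real signatures.

```python
import re
from email import message_from_string

ARC_HEADERS = ("ARC-Seal", "ARC-Message-Signature", "ARC-Authentication-Results")
INSTANCE = re.compile(r"\bi\s*=\s*(\d+)")


def arc_instances(raw_message: str) -> dict:
    """Map each ARC header name to the sorted instance numbers present."""
    msg = message_from_string(raw_message)
    found = {}
    for name in ARC_HEADERS:
        instances = []
        for value in msg.get_all(name, []):
            match = INSTANCE.search(str(value))
            if match:
                instances.append(int(match.group(1)))
        found[name] = sorted(instances)
    return found


# Hypothetical message as our provider received it: no ARC headers at all.
raw_unsealed = (
    "From: billing@shipper.example\n"
    "List-Id: <updates.shipper.example>\n"
    "Subject: Invoice\n"
    "\n"
    "Your invoice is attached.\n"
)
# Hypothetical message if the gateway had sealed it (placeholder b= values).
raw_sealed = (
    "ARC-Seal: i=1; a=rsa-sha256; cv=none; d=gateway.example; s=arc; b=...\n"
    "ARC-Message-Signature: i=1; a=rsa-sha256; d=gateway.example; s=arc; b=...\n"
    "ARC-Authentication-Results: i=1; gateway.example; "
    "dkim=pass header.d=shipper.example\n"
) + raw_unsealed

print(any(arc_instances(raw_unsealed).values()))  # False
print(arc_instances(raw_sealed)["ARC-Seal"])      # [1]
```

The `ARC-Authentication-Results` field in the sealed example is where the upstream `dkim=pass` would have been recorded for the next hop to read.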
My rule had no awareness of that asymmetry. It saw three failures and applied SCL 9.
The actual mistake was not the regex
I assumed that if legitimate mail passed through the gateway cleanly, authentication signals would be preserved end-to-end. I was modeling the gateway as invisible infrastructure. It is not. It is an active participant in the mail path that rewrites bodies, strips or adds headers, and changes the connecting IP at the delivery boundary. Every one of those actions can — and in this case did — alter the authentication state of a message.
Because I was not thinking about the gateway as an actor, I also was not thinking about what happens when a sending domain has no ARC support and relies on DKIM body integrity to survive transit through an intermediary that modifies messages. Those two gaps together — gateway opacity and absent ARC — created the condition my rule was not equipped to handle.
My testing made it worse. I tested only against a small sample of recent quarantine candidates, and none of them were transactional mail routed through an external gateway. My test set confirmed the rule worked against the threat I was thinking about and told me nothing about the mail I was not thinking about.
What I actually changed
The decision I had to make was whether to fix the rule or fix the gateway. Fixing the rule meant loosening the logic enough to exclude this class of legitimate mail — which defeats most of the purpose. Fixing the gateway meant enabling ARC sealing so that downstream providers can see what the gateway observed before it modified the message.
The right answer is the gateway. ARC exists to solve the exact problem of trusted intermediaries that break authentication signals. If the gateway adds ARC seals, my original rule becomes viable again because legitimate mail routed through the gateway would carry an ARC pass, and the rule’s arc=fail condition would correctly exclude it.
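Here is a rough Python sketch of what an ARC-aware version of the decision could look like. It is not the rule as deployed. It assumes the provider reports an `arc=` result in Authentication-Results, treats `arc=pass` as sufficient reason not to escalate, and takes only the first result per method. The header values are hypothetical. In practice an ARC pass is only meaningful when the sealer is one you trust.

```python
import re

# Capture method=result pairs for the four methods the rule cares about.
RESULT = re.compile(r"\b(spf|dkim|dmarc|arc)\s*=\s*([a-z]+)", re.IGNORECASE)


def parse_results(auth_results: str) -> dict:
    """Return the first reported result for each method, lowercased."""
    results = {}
    for method, result in RESULT.findall(auth_results):
        results.setdefault(method.lower(), result.lower())
    return results


def should_escalate(auth_results: str) -> bool:
    """Escalate only when DKIM and DMARC fail and no ARC chain vouches."""
    results = parse_results(auth_results)
    if results.get("arc") == "pass":
        return False  # assumption: the sealer is a trusted intermediary
    return results.get("dkim") == "fail" and results.get("dmarc") == "fail"


# Hypothetical header values for the same message, sealed and unsealed.
failures = (
    "mx.example.net; spf=fail smtp.mailfrom=shipper.example; "
    "dkim=fail header.d=shipper.example; "
    "dmarc=fail header.from=shipper.example"
)
sealed = failures + "; arc=pass"
unsealed = failures + "; arc=none"

print(should_escalate(sealed))    # False
print(should_escalate(unsealed))  # True
```

The design choice is that ARC acts as an exemption checked first, so the failure conditions never need loosening for mail the gateway can vouch for.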
I have not fully resolved this yet. Enabling ARC on the gateway is in progress but involves a vendor configuration that I do not fully control. The allow-list stays in place until that is confirmed working.
The governance gap underneath this
The deeper issue is that I had no documented map of what the email path actually looks like for inbound transactional mail from third-party senders. I knew the gateway existed. I did not know which sending domains route through it, how it handles body modification, or whether it was generating ARC headers.
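Even a very small machine-readable map of the mail path would have surfaced the problem. The Python sketch below is one hypothetical way to record it: each hop notes whether it changes the connecting IP, modifies the body, or adds an ARC seal, and a helper lists the authentication risks that follow. The `Hop` fields, hop names, and risk wording are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One relay in the inbound mail path and what it does to a message."""
    name: str
    changes_connecting_ip: bool
    modifies_body: bool
    adds_arc_seal: bool


def auth_risks(path: list) -> list:
    """List authentication signals that each hop can invalidate."""
    risks = []
    for hop in path:
        if hop.changes_connecting_ip:
            risks.append(f"{hop.name}: downstream SPF sees this hop's IP")
        if hop.modifies_body and not hop.adds_arc_seal:
            risks.append(
                f"{hop.name}: DKIM body hash breaks with no ARC record upstream"
            )
    return risks


# Hypothetical path for the shipping provider's transactional mail.
path = [
    Hop("shipping provider MTA", changes_connecting_ip=False,
        modifies_body=False, adds_arc_seal=False),
    Hop("security gateway", changes_connecting_ip=True,
        modifies_body=True, adds_arc_seal=False),
]

for risk in auth_risks(path):
    print(risk)
```

Running a check like this before writing a rule that depends on SPF or DKIM results would flag the gateway on both counts.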
That information gap is infrastructure governance, not email configuration. And it is the kind of gap that only becomes visible when a rule with reasonable logic hits an edge case that falls outside what you modeled when you wrote it.
For any rule that operates on authentication signals — SPF, DKIM, DMARC, ARC — the rule is only as good as your understanding of the full mail path. If you have an intermediary in that path, you need to know what it does to message integrity before you write logic that depends on message integrity being preserved.
I did not have that picture. I should have.