For several of my projects I have implemented Mailgun for email delivery. The service works wonderfully and handles the dirty work of email deliverability. Recently, however, I ran into a problem where none of the messages addressed to gmail.com accounts were being sent. Looking at the logs I found:
Not delivering to previously bounced address
Further back in the logs I found:
5.1.1 The email account that you tried to reach does not exist. Please try 5.1.1 double-checking the recipient’s email address for typos.
However, I knew that at least some of these email addresses did exist. A quick Google search turned up plenty of articles on the topic. It appears that Google had a series of outages on December 14 and 15, 2020. What was very unusual about this outage was the SMTP response returned when people tried to deliver email. Instead of one of the temporary-failure codes, or simply not responding at all, Gmail returned “5.1.1”, which basically says the account doesn’t exist.
If the mailbox were full, or there were some other temporary or technical issue, that would be considered a temporary failure, and the sending server would keep retrying at specific intervals before giving up. However, SMTP 5.1.1 is a permanent failure, which means there is zero reason to try sending the email again. Looking at the List of SMTP Server Return Codes, you can see that any 5xx code is permanent, whereas a 4xx code is temporary, which is what gmail.com should have returned.
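To make the 4xx versus 5xx distinction concrete, here is a minimal sketch of how a sender might decide whether to retry. It assumes the reply line begins with the three-digit SMTP reply code (for example `550`), followed by the enhanced status code such as `5.1.1`. The function names and the sample reply lines are my own illustration, not anything taken from Mailgun or Gmail.

```python
def classify_smtp_reply(reply_line: str) -> str:
    """Classify an SMTP reply by the first digit of its three-digit code."""
    code_text = reply_line.strip()[:3]
    if len(code_text) != 3 or not (code_text.isascii() and code_text.isdigit()):
        raise ValueError(f"no SMTP reply code found in {reply_line!r}")

    classes = {
        "2": "success",
        "3": "intermediate",
        "4": "temporary_failure",  # sender should queue the message and retry later
        "5": "permanent_failure",  # sender should give up on this address
    }
    try:
        return classes[code_text[0]]
    except KeyError:
        raise ValueError(f"unexpected SMTP reply code {code_text}") from None


def should_retry(reply_line: str) -> bool:
    """Only temporary (4xx) failures are worth another delivery attempt."""
    return classify_smtp_reply(reply_line) == "temporary_failure"


# Hypothetical reply lines for illustration:
print(should_retry("452 4.2.2 Mailbox full"))  # True
print(should_retry("550 5.1.1 The email account that you tried to reach does not exist."))  # False
```

Under this logic a well-behaved sender never retries the 5.1.1 reply, which is exactly why a wrong permanent code is so much more damaging than a wrong temporary one.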
If you threw together a basic email delivery service, it would send the email, receive the error message, and simply ignore it. Future delivery attempts would still be sent. However, a service like Mailgun, or a robust SMTP API you wrote yourself, actually listens to these responses. Because Mailgun wants to follow the rules as closely as possible, it recognizes the response code as a permanent failure and adds that email address to a suppression list. That is, even if you try emailing that account again, it will suppress the actual sending of the message. What your application then does (e.g. showing the end user a warning) depends on how well it implements that API.
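The suppression behavior can be sketched in a few lines. This is not Mailgun's implementation, just a toy model under my own assumptions: `SuppressingSender`, its method names, and the `transport` callable (which takes an address and a message and returns the SMTP reply code as an integer) are all hypothetical.

```python
from typing import Callable, Dict


class SuppressingSender:
    """Toy sender that remembers permanent failures and refuses to retry them."""

    def __init__(self, transport: Callable[[str, str], int]) -> None:
        self.transport = transport           # hypothetical: returns an SMTP reply code
        self.suppressed: Dict[str, int] = {}  # address -> reply code that caused it

    def send(self, address: str, message: str) -> str:
        key = address.strip().lower()
        if key in self.suppressed:
            # "Not delivering to previously bounced address"
            return "suppressed"

        code = self.transport(address, message)
        if 500 <= code <= 599:
            # Permanent failure: record it so future sends are skipped.
            self.suppressed[key] = code
            return "bounced"
        if 400 <= code <= 499:
            # Temporary failure: a real service would queue a retry here.
            return "deferred"
        return "delivered"

    def unsuppress(self, address: str) -> bool:
        """Remove an address from the suppression list; True if it was listed."""
        return self.suppressed.pop(address.strip().lower(), None) is not None
```

Once an address lands in `suppressed`, the transport is never called for it again until someone removes the entry by hand. That is why one mistaken 5xx reply keeps blocking mail long after the outage itself is over.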
Many electronic newsletter services like MailChimp will stop sending emails to that address in the future. If you are a business sending out newsletters, you might find that your bounce rate increased significantly during that timeframe. If you receive these newsletters and find that certain ones have stopped arriving, you might need to re-subscribe to them. As a side note, this likely has zero impact on the spam you receive: spammers don’t follow the rules to begin with, so the SMTP 5.1.1 error was likely ignored.
A few things moving forward:
- If you send email to Gmail, or to someone using Gmail (like my wife’s school district), and a message was returned as undeliverable on December 14 or 15, simply try sending it again.
- This should be a major wakeup call for Google: it didn’t fail in a trivial way, but in a very damaging way. Likewise, as an application developer or manager, you should take a moment to consider: when your system fails, how exactly does it fail?
- If you use Mailgun, SendGrid, or some other mail delivery service, you should check your suppression list; you may need to remove some addresses from it.
- If you use MailChimp, VertalContact, or some other newsletter service, you should check your bounce rates, and address them accordingly.
- It is not the fault of your sending server, IT department, newsletter provider, or email delivery service for properly handling the 5.1.1 error sent by Google. They treated a 5xx reply as a permanent failure, exactly as it has been defined since 1982; yes, for 38 years. That they handled the message “properly” by industry-accepted definitions (RFC 821) is not their error but Google’s, and Google is definitely big enough to know better.
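For the suppression-list check mentioned in the list above, here is a rough sketch of how you might pick out the addresses worth a second look. It assumes you have already exported your bounce records as dictionaries with `address`, `error`, and `created_at` keys, where `created_at` is an RFC 2822 style date string. Those key names, the UTC outage window, and the `suspect_bounces` helper are my assumptions for illustration, so adjust them to whatever your provider actually returns.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Iterable, List, Sequence

# Assumed outage window in UTC: December 14 and 15, 2020 (end is exclusive).
OUTAGE_START = datetime(2020, 12, 14, tzinfo=timezone.utc)
OUTAGE_END = datetime(2020, 12, 16, tzinfo=timezone.utc)


def suspect_bounces(
    records: Iterable[Dict[str, str]],
    domains: Sequence[str] = ("gmail.com",),
) -> List[str]:
    """Return addresses that bounced with 5.1.1 during the outage window."""
    suspects = []
    for record in records:
        address = record["address"].strip().lower()
        domain = address.rpartition("@")[2]

        bounced_at = parsedate_to_datetime(record["created_at"])
        if bounced_at.tzinfo is None:
            # Treat timestamps without an offset as UTC (an assumption).
            bounced_at = bounced_at.replace(tzinfo=timezone.utc)

        if (
            domain in domains
            and "5.1.1" in record["error"]
            and OUTAGE_START <= bounced_at < OUTAGE_END
        ):
            suspects.append(address)
    return suspects
```

Keep in mind that organizations using Gmail behind their own domain will not match `gmail.com`, so you would need to pass those domains in as well. I would also review the resulting list by hand before removing anything, since some of those addresses may genuinely not exist.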