We want to provide you with a full report on the connectivity issues that have affected some of our Mailgun customers over the past few days.
Rackspace, our parent company and hosting provider, informed us on Friday, September 26, that it would be rebooting cloud instances in the Chicago data center (ORD) between Sunday, September 28, 11:00 a.m. UTC, and Monday, September 29, 11:00 a.m. UTC, and that the reboots could affect Mailgun’s infrastructure. You can read Rackspace’s explanation of the reboots here: http://www.rackspace.com/blog/an-apology/
Mailgun operates several environments in multiple data centers using the Rackspace hybrid cloud setup. This setup employs F5 load balancers as the primary gateway for all incoming and outgoing connections in our data centers. In the event of an outage or maintenance in one of these environments, Mailgun gracefully reroutes traffic away from the affected environment using F5 virtual IP pools, without customer impact or DNS changes. This gives us more control over the traffic, allowing us to gradually increase or decrease the load balancing ratios on the virtual IPs and distribute traffic across environments.
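To illustrate the idea of gradually shifting load balancing ratios, here is a minimal Python sketch. It is not F5 configuration and not Mailgun’s actual tooling: the environment names, the ratio values, and the functions `pick_environment` and `drain_steps` are all hypothetical and exist only to show how weighted ratios let traffic drain from one environment in steps.

```python
import random


def pick_environment(ratios, rng):
    """Pick one environment with probability proportional to its ratio.

    `ratios` maps a (hypothetical) environment name to a non-negative weight.
    """
    names = list(ratios)
    weights = [ratios[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def drain_steps(ratios, target, steps):
    """Yield ratio tables that lower `target` to zero in equal steps.

    The ratios of all other environments are left unchanged, so their
    share of traffic grows as the target's share shrinks.
    """
    start = ratios[target]
    for i in range(1, steps + 1):
        updated = dict(ratios)
        updated[target] = start * (steps - i) / steps
        yield updated
```

Applying each table from `drain_steps({"ord": 4, "dfw": 4}, "ord", 4)` in turn would move traffic away from the hypothetical `ord` environment a quarter at a time, and once its ratio reaches zero `pick_environment` no longer selects it.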
To prevent customer impact from the ORD cloud instance reboots, we prepared to switch traffic away from this environment and made all the changes necessary to do so. On Saturday, September 27, at 6:50 p.m. UTC, we noticed that all database replicas, as well as internal traffic flow between environments in all regions, were broken and had started reporting timeouts.
We were unable to determine the root cause of the timeouts before the cloud instance reboots in ORD began and consequently could not switch the traffic in time to avoid customer impact.
Cloud instance reboots in ORD began at 1:10 a.m. UTC on Monday, September 29, and resulted in the following problems between 1:10 a.m. UTC and 11:29 a.m. UTC:
Intermittent connection failures
Lost events between 2:30 a.m. UTC and 7:00 a.m. UTC
Approximately 100,000 duplicate emails sent
All messages reported as accepted during this outage have been delivered.
A portion of Mailgun customers continued to experience intermittent connection loss and failure rates through the afternoon on Tuesday, September 30.
We worked with the Rackspace enterprise networking team on the investigation and were finally able to identify the source of the problem.
The cloud maintenance triggered an error condition on Mailgun’s F5 load balancers: they started reporting a path MTU of 296 for the IPs on Mailgun networks, and that value was cached by the F5s of all Rackspace dedicated customers using Mailgun.
Those customers’ F5s cached the MTU of 296, but the servers behind them could not send packets with an MTU below 512. Because Mailgun’s F5s enforced the smaller value, the larger packets were lost.
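The mismatch can be sketched in a few lines of Python. This is a deliberately simplified model under stated assumptions, not a description of how F5 devices work internally: it ignores headers and fragmentation, and the function names `sender_packet_size` and `passes` are hypothetical. Only the two numbers, 296 and 512, come from the report above.

```python
CACHED_PATH_MTU = 296  # path MTU value reported and cached, per the report
SERVER_MIN_MTU = 512   # smallest MTU the customer servers could use, per the report


def sender_packet_size(data_len, cached_pmtu, server_min_mtu):
    """Return the packet size a sender emits for `data_len` bytes of pending data.

    Assumption: the sender tries to honor the cached path MTU but cannot
    go below its own minimum, so its real limit is the larger of the two.
    """
    limit = max(cached_pmtu, server_min_mtu)
    return min(data_len, limit)


def passes(packet_size, enforced_mtu):
    """Return True if a packet fits within the MTU enforced along the path."""
    return packet_size <= enforced_mtu


# A small payload fits under 296 bytes and gets through.
small = sender_packet_size(200, CACHED_PATH_MTU, SERVER_MIN_MTU)
small_ok = passes(small, CACHED_PATH_MTU)

# A larger payload can only be cut down to 512 bytes, which exceeds 296.
large = sender_packet_size(1400, CACHED_PATH_MTU, SERVER_MIN_MTU)
large_ok = passes(large, CACHED_PATH_MTU)
```

Under this model, small exchanges succeed while larger ones are dropped, which is consistent with the intermittent failures described above, although the real traffic patterns were likely more complex.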
The Rackspace networking team cleared the caches on Mailgun’s F5s and on some Rackspace dedicated customers’ F5s, and set up a new virtual IP for Mailgun services. Mailgun then changed its DNS settings, which forced the remote F5s to clear their caches. This workaround solved the remaining issues for the Rackspace dedicated customers using Mailgun.
We continue to work with Rackspace to investigate the root cause of the problem. At this point, we can say that it is related to an unusually low path MTU value that was originally cached and enforced by Mailgun’s F5 load balancers.
We take uptime seriously and apologize to all Mailgun customers who were affected by this issue. Once we have completed our root cause analysis of the F5 issue, we will take steps to ensure that maintenance on our cloud infrastructure has no impact on Mailgun customers.
Last updated on August 27, 2019