Some Bitbucket customers hosted in the EU-WEST-1 AWS region experienced slow connections when cloning repositories. The Bitbucket Support team received reports of these slow connections and opened an incident so that our SRE team could investigate. The engineers who gathered to look at the issue correlated the time the reports started coming in with a networking change that had updated the Bitbucket production traffic route between Dublin and where Bitbucket is hosted. Once this correlation was discovered, the change was rolled back, which addressed the immediate problem.
In order to put the networking route change back in place, our network engineers started investigating the root cause of the problem. After a couple of days of troubleshooting and data analysis, the root cause was traced to physical cross-connect issues that were causing CRC errors on the provider side. These errors caused slow download speeds in one direction between where Bitbucket is hosted and Dublin. The physical cross-connect issues have been fixed, which addresses the root cause. The networking route change is now back in place with no incidents. Additionally, the redundant link between these locations, which had been planned all along but was waiting to be installed, is also now in place.
Our Network Engineering team conducts various ping tests (large packet sizes, short timeouts, etc.) when commissioning new circuits; however, none of these tests caught this particular issue. The team is adding new steps to perform thorough performance testing before bringing circuits into production.
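As an illustration only, a pre-production acceptance check along these lines might look like the sketch below. It is not the Network Engineering team's actual procedure. The `DirectionResult` fields, the `check_circuit` helper, and both thresholds are hypothetical, and the measurements are assumed to come from a bulk-transfer test run separately in each direction plus the interface CRC counters read before and after it. The point is that each direction is judged on its own, since the fault here degraded only one of them.

```python
from dataclasses import dataclass


@dataclass
class DirectionResult:
    """Measurements for one direction of a circuit (hypothetical structure)."""
    name: str                # label for the direction, e.g. "hosting_to_dublin"
    throughput_mbps: float   # sustained throughput seen in a bulk-transfer test
    crc_errors: int          # increase in the CRC error counter during the test
    frames_received: int     # frames received during the test


def check_circuit(results, expected_mbps, min_ratio=0.9, max_crc_rate=1e-9):
    """Return a list of failure messages; an empty list means the circuit passed.

    min_ratio and max_crc_rate are illustrative thresholds, not recommendations.
    """
    failures = []
    for r in results:
        # Throughput is checked per direction, so an asymmetric fault is visible.
        if r.throughput_mbps < expected_mbps * min_ratio:
            failures.append(
                f"{r.name}: throughput {r.throughput_mbps:.0f} Mbps is below "
                f"{min_ratio:.0%} of the expected {expected_mbps:.0f} Mbps"
            )
        # CRC errors as a fraction of received frames; zero frames is treated as 0.
        crc_rate = r.crc_errors / r.frames_received if r.frames_received else 0.0
        if crc_rate > max_crc_rate:
            failures.append(
                f"{r.name}: CRC error rate {crc_rate:.2e} exceeds {max_crc_rate:.2e}"
            )
    return failures


if __name__ == "__main__":
    # Made-up numbers: one healthy direction, one degraded direction.
    sample = [
        DirectionResult("dublin_to_hosting", 940.0, 0, 5_000_000),
        DirectionResult("hosting_to_dublin", 120.0, 2_500, 5_000_000),
    ]
    for line in check_circuit(sample, expected_mbps=1000.0):
        print(line)
```

With the made-up sample above, only the `hosting_to_dublin` direction is reported, once for throughput and once for CRC errors.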