Yesterday evening the deployment service (SSH/Git) was down for around 4 hours across the whole EU region, in connection with a scheduled maintenance carried out that day. Luckily, Sunday evening/night is not prime time for deployments, and only deployment was affected, not web delivery. Still, that was more downtime than we are comfortable with.
The technical issue was quite complex, as usual in such cases. A simple explanation goes like this: there was a complication when restarting the service after the upgrade, which left it in an unstable state. We immediately cloned the service, but that took a little time, and once it was finished, numerous other problems appeared.
We have, of course, learned a lot, and we are going to take further steps to harden the platform through technical improvements and better procedures.
While preparing and testing this update in our staging environment, we probably focused too much on the web delivery part, as that is the most critical part of the fortrabbit infra.