On Thursday, 1 October 2020, from 14:15:00 UTC until 23:00:00 UTC, our customers were affected by downtime on our platform. The event was triggered by a system-wide update of a central component. This update interrupted web delivery and code deployments (SSH & SFTP).
Rest assured, the responsible party is painfully aware of the distress and inconvenience this type of event causes our clients. We will not fire this person just yet, and we hope that, as a result of internal discussions, we will improve our practices and avoid scenarios like this in the future.
More than 28% of all Universal Apps were potentially affected, and we received around 45 to 65 support cases during the incident. The real impact was somewhere between 2% and 15% of all Apps. For individual Apps, the outage lasted between 25 minutes and six hours. Fewer than a handful of Pro Apps were affected.
We booted new hosts and transferred affected Universal Apps to the new, unaffected hosts. We believe that at the core of the issue was an unresponsive filesystem, which caused other parts of the system to fail during the system-wide update.
A greater emphasis on gradual deployment of system-wide updates is the main takeaway from this incident. We could easily have avoided downtime for the majority of the Apps if the deployment had been done in incremental stages. We may introduce more rigorous maintenance of the filesystem used for Universal Apps. The kernel was in fact reporting this issue before and during the incident (dmesg). We may incorporate monitoring of these specific messages into our routine monitoring.
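To illustrate the two ideas above, here is a minimal sketch of a staged rollout that stops as soon as a stage reports an unhealthy host, together with a simple scan of kernel log lines. It is not our actual tooling. The function names, the stage fractions, and the log patterns are all assumptions made for illustration, and the patterns in particular are not the specific messages we saw during the incident.

```python
import re
from typing import Callable, Iterable, List, Sequence

# Hypothetical patterns for filesystem trouble in kernel log output.
# The messages actually seen during the incident are not listed in this report.
FS_WARNING = re.compile(
    r"(blocked for more than \d+ seconds|I/O error|not responding)",
    re.IGNORECASE,
)


def filesystem_warnings(log_lines: Iterable[str]) -> List[str]:
    """Return the kernel log lines that match one of the assumed patterns."""
    return [line for line in log_lines if FS_WARNING.search(line)]


def plan_stages(
    hosts: Sequence[str],
    fractions: Sequence[float] = (0.05, 0.25, 1.0),
) -> List[List[str]]:
    """Split hosts into growing stages; each fraction is a cumulative share."""
    stages: List[List[str]] = []
    done = 0
    for fraction in fractions:
        # Always advance by at least one host, never past the end of the list.
        target = min(len(hosts), max(done + 1, round(len(hosts) * fraction)))
        if target > done:
            stages.append(list(hosts[done:target]))
            done = target
    return stages


def rollout(
    hosts: Sequence[str],
    update: Callable[[str], None],
    healthy: Callable[[str], bool],
    fractions: Sequence[float] = (0.05, 0.25, 1.0),
) -> List[str]:
    """Update hosts stage by stage and stop after the first unhealthy stage."""
    updated: List[str] = []
    for stage in plan_stages(hosts, fractions):
        for host in stage:
            update(host)
            updated.append(host)
        if not all(healthy(host) for host in stage):
            break  # Halt here so the remaining hosts keep the old version.
    return updated
```

In a sketch like this, the `healthy` callback could be backed by `filesystem_warnings` run over recent `dmesg` output from each host, so that a stage with kernel-level filesystem warnings halts the rollout before it reaches the remaining hosts.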
We've now resolved the incident. Thanks for your patience.
Most Apps are back by now, but some are still hanging. We are still monitoring the situation.
Several Apps are still down due to unexpected issues created by routine maintenance. We are working as fast as we can on restoring service for all Apps. The majority should already be back online. We will post more updates as we have them.
We are seeing partial problems on all services in the EU and US, including deployment and web delivery.