On Sunday, 4 November 2018, we carried out scheduled maintenance to upgrade some minor PHP versions. The update started at 19:00 UTC and finished at 23:00 UTC.

During this window all Apps were updated successfully (Professional & Universal, EU & US). However, some leftover processes on the deploy Node in EU kept running in the background. Unfortunately, they did not complete overnight.

Business as usual started Monday morning. Clients started to deploy. The combination of normal operations and the still-running background processes caused a high load on that Node. We noticed the load, but decided to keep the Node running. At 12:00 UTC the service failed.

The file system containing all Git repositories had become read-only: it had filled up during the PHP updates, and we were unable to resize it. It took us some time to identify the cause, a bug in the underlying file system encryption.

We realised we couldn't fix the file system issue in time, so we started preparing an alternate fix. At 15:40 UTC we created a new file system and started re-initialising the Git repos there, which took some time but was finally finished by 20:16 UTC.

YOUR MINOR ACTION might be REQUIRED for Git deployment

As a consequence of the alternate fix, all Git repo contents were reset: the repo is there, but completely empty. To continue working with Git you need to reconnect your local Git repo with the remote one and push your local contents up. The first deployment will take a little longer, as a fresh Composer install will be triggered. When working in a team: make sure to push only the latest commit up.
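To illustrate what "there, but completely empty" means in practice, here is a minimal sketch that can be run safely in a throwaway directory. It does not touch any real App: the bare repository is only a hypothetical stand-in for the reset remote, and the remote name "fortrabbit", the demo identity and the sandbox paths are assumptions made for the example.

```shell
set -eu
sandbox="$(mktemp -d)"

# Stand-in for the reset repo on the platform: it exists but holds no commits.
git init --quiet --bare "$sandbox/remote.git"

# Stand-in for your local working copy, which still has its history.
git init --quiet "$sandbox/local"
cd "$sandbox/local"
git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "latest local commit"
git branch -M master

# Reconnect: register the remote under the usual name.
git remote add fortrabbit "$sandbox/remote.git"

# An empty remote advertises no refs, so this captures an empty string.
refs_before="$(git ls-remote fortrabbit)"

# Push the local contents up, then list the remote refs again.
git push --quiet fortrabbit master
refs_after="$(git ls-remote fortrabbit)"
echo "$refs_after"
```

In an existing project the remote is most likely still configured, so `git remote -v` should already list it and the `git remote add` step can be skipped.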

OUR LEARNINGS

We have now narrowed down the causes further and are working on platform updates to prevent such issues in the future. We learned a LOT from this, in a painful way.

It was one of the biggest issues we have ever faced here. Luckily it affected only deployment and only one region, but it lasted a long time. We strive to provide maximum uptime. I know you hate downtime; please be assured we do too.

Thank you for your understanding and support.

The deployment issues are finally fully resolved for all Apps. All Apps in EU can deploy with Git, SSH and SFTP again.

For all Git deployments: You need to set the upstream and remote branch accordingly: git push {{remote-name}} master -u. Replace remote-name with whatever you have called the fortrabbit remote, usually that's "fortrabbit" or "origin".
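The effect of the -u flag can be checked locally. The sketch below is only an illustration under stated assumptions: it uses a temporary bare repository as a hypothetical stand-in for the remote and assumes the remote is called "fortrabbit". It runs the push command from above with the placeholder filled in, then asks Git which remote branch the local master tracks.

```shell
set -eu
sandbox="$(mktemp -d)"

# Hypothetical empty remote plus a local repo with a single commit.
git init --quiet --bare "$sandbox/remote.git"
git init --quiet "$sandbox/local"
cd "$sandbox/local"
git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "initial"
git branch -M master
git remote add fortrabbit "$sandbox/remote.git"

# Same command as in the update above, with remote-name replaced.
git push --quiet fortrabbit master -u

# Resolve the upstream of the local master branch.
upstream="$(git rev-parse --abbrev-ref 'master@{upstream}')"
echo "$upstream"
```

If the upstream was set, the last line printed is fortrabbit/master, and a plain `git push` from that branch will go to the same place afterwards.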

Universal Apps: If you are just using SSH or SFTP, it will just work now. If you are actively using Git, the next deployment will take a little longer, as Composer dependencies will be re-downloaded. If you have used Git in the past but neglected it since, take care: the contents of the old Git repo may have been re-deployed.

For Professional Apps: The next Git deploy will take a little longer, as Composer will be re-initialized. Please deploy via Git before using SSH remote execution, like so: git push {{remote-name}} master -u.

A full post mortem on the incident will follow soon here. Sorry one more time for the inconvenience.

A fix has been implemented and we are monitoring the results.

We are continuing to work on a fix for this issue.

Degraded from Major to Partial. Half of the Apps should already be working now.

We are confident that we have identified the real issue and fixed it for now. SSH and SFTP recovery is in progress. Deployment for Apps is coming back, one by one, but it can take a while longer.

YEP! We are still on it.

We believe the current incident is related to an issue yesterday night / earlier this morning after a scheduled maintenance.

We are currently investigating ANOTHER issue related to deployment in EU. It currently affects all Apps. Git, SSH and SFTP are not working.

Began at: