It has been a network outage in the AWS network between two nodes of one of our storage clusters of about 60 minutes today — 2014-11-25. Around 15% of our client nodes had been affected.
The outage affected parts of the network which should never be down (i.e. 169.254/16 network). This lead to a failure in our automated fail-over: both storage nodes assumed to be master. This in turn lead fooled our monitoring, since both storage nodes were still available - and master. We've now implemented a patch to compensate for this particular failure and will roll it out asap.
Now we are sure everything is really resolved. we will write a post mortem soon.
everything seems to be UP again — we are still checking some things.
we see some on/off behaviour. the issue is not yet fully identified.
we are investigating current issues, probably related to the storage layer. we'll keep you updated here.
We’ll find your subscription and send you a link to login to manage your preferences.
We’ve found your existing subscription and have emailed you a secure link to manage your preferences.
We’ll use your email to save your preferences so you can update them later.
Subscribe to other services using the bell icon on the subscribe button on the status page.
You’ll no long receive any status updates from fortrabbit, are you sure?
{{ error }}
We’ll no longer send you any status updates about fortrabbit.