Storage issues

Today, 2014-11-25, there was a network outage of about 60 minutes in the AWS network between the two nodes of one of our storage clusters. Around 15% of our client nodes were affected.

The outage affected parts of the network that should never be down (in particular the 169.254/16 network). This led to a failure in our automated fail-over: both storage nodes assumed they were master. This in turn fooled our monitoring, since both storage nodes were still available, and both reported themselves as master. We have now implemented a patch that compensates for this particular failure and will roll it out as soon as possible.

22:13 UTC - 25 November 2014

We have now confirmed that everything is resolved. We will write a post-mortem soon.

13:55 UTC - 25 November 2014

Everything seems to be up again; we are still checking a few things.

13:50 UTC - 25 November 2014

We are seeing intermittent on/off behaviour. The cause has not yet been fully identified.

13:27 UTC - 25 November 2014

We are investigating current issues, probably related to the storage layer. We'll keep you updated here.

13:23 UTC - 25 November 2014
