After 38 minutes

Only a small number of Apps were affected. They were not completely down, because the second Node (third, fourth …) was still responding. Still, this is not acceptable: one failing Node should not cause downtime for Professional Apps on a production plan, since the load balancer should not route traffic to a Node in an error state. The problem was that the frontend health check reported OK when the Node was in fact not healthy. We think an internal DNS lookup is the cause and are looking into it.
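
To illustrate the kind of failure described above, here is a minimal sketch of a health check that only reports OK when the names a frontend depends on can actually be resolved. It is not our actual health check; the function names `can_resolve` and `health_status` and the hostnames in the usage note are hypothetical, chosen only for illustration.

```python
import socket
from typing import Callable, Iterable


def can_resolve(hostname: str) -> bool:
    """Return True if the system resolver can resolve the hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:
        # socket.gaierror (resolution failure) is a subclass of OSError.
        return False


def health_status(
    hostnames: Iterable[str],
    resolver: Callable[[str], bool] = can_resolve,
) -> int:
    """Return 200 only if every dependency resolves, otherwise 503.

    A load balancer polling this status would take the Node out of
    rotation as soon as an internal lookup starts failing.
    """
    return 200 if all(resolver(name) for name in hostnames) else 503
```

A frontend might call something like `health_status(["db.internal.example", "cache.internal.example"])` from its health endpoint (both hostnames are placeholders). Passing the resolver as a parameter keeps the check testable without network access.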

After 18 minutes

Everything seems to be returning 200 OK again.

After 7 minutes

We think it is resolved now but are still monitoring it.


We are currently seeing 5xx errors on some, but not all, Pro Apps in the EU. Ongoing since 14:10 CET.

Began at: