Apps and dashboard not responding

Resolved
Updated

Post Mortem 2022-07-04

On June 4, 2022, a disruption to outgoing traffic affected all Pro Apps in the EU region between 10:00 and 10:30 UTC. Some clients reported having to restart their Apps to recover fully.

The affected server node, which routes outgoing traffic from Pro Apps, is scheduled for further maintenance, as AWS has reported to us. We will announce a separate maintenance window for this, outside of peak hours, within the next two weeks. We are also looking into a faster fail-over in case the outgoing-traffic node ever fails like this again.
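To illustrate the fail-over idea, here is a minimal sketch and not our actual implementation. It assumes a list of candidate outgoing-traffic nodes and a probe that returns the measured latency in seconds, or None on a timeout. The node names, the `pick_egress_node` function and the 0.5 second threshold are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_egress_node(
    nodes: Sequence[str],
    probe: Callable[[str], Optional[float]],
    max_latency_s: float = 0.5,  # hypothetical threshold
) -> Optional[str]:
    """Return the first node whose probe answers within max_latency_s."""
    for node in nodes:
        latency = probe(node)  # None means the probe timed out
        if latency is not None and latency <= max_latency_s:
            return node
    return None  # no healthy node available


# Simulated probe results: the primary is slow, the standby is healthy.
simulated = {"egress-primary": 2.4, "egress-standby": 0.03}
print(pick_egress_node(list(simulated), simulated.get))  # prints "egress-standby"
```

Treating high latency the same as a timeout means a node that is slowing down, as seen in this incident, would be skipped before it stops answering entirely.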

Resolved

The issue lasted from 11:50 to 12:30 CEST (UTC+2) and affected only the EU region. More than 85 apps belonging to several clients were affected, but the issue was not system-wide. Unfortunately, we do not know exactly what happened. The symptoms were increased latency in HTTP responses, followed by timeouts.
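For anyone matching this window against logs kept in UTC, the conversion can be sketched as follows. The calendar date is an assumption taken from the post mortem text; with a fixed UTC+2 offset it does not change the result.

```python
from datetime import datetime, timedelta, timezone

# CEST as a fixed UTC+2 offset, as stated in the update above.
CEST = timezone(timedelta(hours=2), "CEST")


def to_utc(day: str, hhmm: str) -> datetime:
    """Interpret a CEST wall-clock time and return it in UTC."""
    local = datetime.strptime(f"{day} {hhmm}", "%Y-%m-%d %H:%M").replace(tzinfo=CEST)
    return local.astimezone(timezone.utc)


start = to_utc("2022-06-04", "11:50")  # assumed date
end = to_utc("2022-06-04", "12:30")
print(start.strftime("%H:%M"), "-", end.strftime("%H:%M"), "UTC")  # 09:50 - 10:30 UTC
```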

Updated

It looks like a few server nodes serving both Pro and Universal apps were affected, spanning two different availability zones in AWS Ireland. AWS has not reported anything so far, and we think the issue was "above us" in the network, meaning upstream of our own infrastructure.

It's unclear what caused several apps to stop responding. None of the affected infrastructure was flagged as down or unreachable, either internally or by AWS. The fortrabbit dashboard was affected because it is hosted like any other app on our platform.

Investigating

A temporary network error caused several apps, as well as our dashboard, to go down for a few minutes around 12:30 CEST. We are looking into what happened.

Affected components
  • EU
    • Pro Apps
      • Memcache
      • Object Storage
      • Web delivery
      • Worker
    • Universal Apps
    • MySQL
    • Deployment
    • Backups
  • Dashboard