All systems are go

  • US

    United States data center location

    • Pro Apps

      US

      • Web delivery

        US Pro

      • Object Storage

        US Pro

      • Memcache

        US Pro

      • Worker

        US Pro

    • Universal Apps

      US web delivery

    • MySQL

      US Universal and Pro Apps

    • Deployment

      US Git, SSH, SFTP for Pro and Universal

  • EU

    European data center location

    • Pro Apps

      EU

      • Memcache

        EU Pro

      • Object Storage

        EU Pro

      • Web delivery

        EU Pro

      • Worker

        EU Pro

    • Universal Apps

      EU web delivery

    • MySQL

      EU Universal and Pro Apps

    • Deployment

      Git, SSH, SFTP

  • Dashboard

Previous Incidents

[Complete] Hardware replacement to affect some Universal Apps in EU

  • Universal Apps

We need to replace a Universal Apps node because its underlying hardware is being retired. We have planned a two-hour maintenance window. We expect a short downtime of a couple of minutes for around 100 affected Apps; some Apps might be down for longer.

The scheduled maintenance is now underway. We'll keep you updated on our progress.

The maintenance is now complete. Thanks for your patience.

[Resolved] MySQL permission issues

  • MySQL

Multiple Apps are unable to access their databases because of permission errors. We are investigating; updates will follow shortly.

We have identified the likely cause of the issues and are preparing a hotfix.

We have deployed the hotfix and the issues should now be resolved. Please let us know if you have any further problems.

We've now resolved the incident. Thanks for your patience.

[Resolved] Web delivery problems for some Apps

  • Universal Apps

We are currently investigating problems with websites failing to load.

We've now resolved the incident. Thanks for your patience.

[Resolved] General service issues

  • US
  • EU
  • Pro Apps
  • Web delivery
  • Object Storage
  • Memcache
  • Worker
  • Universal Apps
  • MySQL
  • Deployment

We are seeing partial outages across all services in both the EU and the US, including deployment and web delivery.

Several Apps are still down due to unexpected issues caused by routine maintenance. We are working as fast as we can to restore service for all Apps. Most should already be back online. We will post more updates as we have them.

Most Apps are back online, but some are still unresponsive. We are continuing to monitor the situation.

We've now resolved the incident. Thanks for your patience.

## Summary

On Thursday, 1 October 2020, from 14:15 UTC until 23:00 UTC, our customers were affected by downtime on our platform. The event was triggered by a system-wide update of a central component, which interrupted web delivery and code deployments (SSH and SFTP).

Rest assured, the responsible party is painfully aware of the distress and inconvenience this type of event causes our clients. We will not fire this person just yet; instead, we hope that our internal discussions will improve our practices and help us avoid scenarios like this in the future.

## Impact

More than 28% of all Universal Apps were *potentially* affected, and we received around 45 to 65 support requests during the incident. The actual impact was somewhere between 2% and 15% of all Apps. For individual Apps, outages lasted between 25 minutes and six hours. Fewer than a handful of Pro Apps were affected.

## Mitigation

We booted new hosts and moved the affected Universal Apps onto these new, unaffected hosts. We believe the root cause was an unresponsive filesystem, which caused other parts of the system to fail during the system-wide update.

## Follow-up

The main takeaway from this incident is to put greater emphasis on rolling out system-wide updates gradually. We could easily have avoided downtime for most Apps had the update been deployed in incremental stages. We may also introduce more rigorous maintenance of the filesystem used for Universal Apps. The kernel was in fact reporting this issue (in dmesg) before and during the incident, so we may add monitoring of these specific messages to our routine checks.
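To illustrate what deploying "in incremental stages" could look like, here is a minimal Python sketch of a wave-based rollout that stops as soon as a wave contains an unhealthy host. The host names, the wave fractions, and the `apply_update` / `is_healthy` callbacks are hypothetical placeholders chosen for illustration, not our actual tooling.

```python
import math
from typing import Callable, List, Sequence


class RolloutHalted(Exception):
    """Raised when a wave contains hosts that fail their health check."""

    def __init__(self, unhealthy: List[str], updated: List[str]) -> None:
        super().__init__(f"rollout halted, unhealthy hosts: {unhealthy}")
        self.unhealthy = unhealthy  # hosts that failed in the last wave
        self.updated = updated      # every host touched so far


def plan_waves(hosts: Sequence[str], fractions: Sequence[float]) -> List[List[str]]:
    """Split hosts into consecutive waves. Each fraction is the cumulative share
    of hosts that should be updated once that wave is done (e.g. 1%, 10%, 50%, 100%)."""
    n = len(hosts)
    waves: List[List[str]] = []
    done = 0
    for fraction in fractions:
        if done >= n:
            break
        # Every wave contains at least one host, and never more than remain.
        end = min(n, max(done + 1, math.ceil(n * fraction)))
        waves.append(list(hosts[done:end]))
        done = end
    if done < n:
        # Fractions that stop short of 1.0: put the remainder in a final wave.
        waves.append(list(hosts[done:]))
    return waves


def staged_rollout(
    hosts: Sequence[str],
    fractions: Sequence[float],
    apply_update: Callable[[str], None],  # hypothetical: updates one host
    is_healthy: Callable[[str], bool],    # hypothetical: post-update health check
) -> List[str]:
    """Update hosts wave by wave; stop before the next wave if any host is unhealthy."""
    updated: List[str] = []
    for wave in plan_waves(hosts, fractions):
        for host in wave:
            apply_update(host)
            updated.append(host)
        unhealthy = [host for host in wave if not is_healthy(host)]
        if unhealthy:
            raise RolloutHalted(unhealthy, updated)
    return updated


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(100)]
    try:
        # Simulate a bad update that breaks host-42, which sits in the third wave.
        staged_rollout(fleet, [0.01, 0.10, 0.50, 1.0],
                       apply_update=lambda host: None,
                       is_healthy=lambda host: host != "host-42")
    except RolloutHalted as err:
        print(f"halted after {len(err.updated)} of {len(fleet)} hosts: {err.unhealthy}")
```

With 100 hosts and the fractions above, the waves contain 1, 9, 40 and 50 hosts, so the simulated failure on host-42 stops the rollout after 50 hosts instead of all 100. Under these assumptions, an update that breaks hosts early would be caught after the first small wave rather than after reaching the whole fleet.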

No further notices from the past 30 days.