We had some problems with the Professional Stack in the US over the weekend and on Monday. They started on Saturday, the 23rd of Feb, at noon and lasted until Monday, the 25th of Feb, in the morning (both UTC). There were two related issues; both affected a small number of Pro Apps (not all) in the US.
The issues were related to single Nodes that had no space left on their ephemeral storage. Apps that tried to write to disk may have seen 5xx errors for requests handled on an affected Node.
The underlying problem is that we can't enforce storage quotas at the App level, so the limits are only soft limits. By spec, Apps are allowed 2 GB of ephemeral storage, but some Apps exceeded that limit, which affected other Apps on the same Node.
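To illustrate what a soft limit means in practice, here is a minimal sketch of how usage above the 2 GB allowance could be detected after the fact rather than blocked up front. It is not our actual tooling: the function names, the mapping of App names to directories, and the idea that each App's ephemeral data lives under a single directory are all assumptions made for the example.

```python
import os

# 2 GB per App, the allowance stated in the spec (interpreted here as GiB).
SOFT_LIMIT_BYTES = 2 * 1024**3


def directory_size(path):
    """Return the total size in bytes of all files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                # lstat so that symlinks are counted as links, not targets.
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
    return total


def apps_over_limit(app_dirs, limit=SOFT_LIMIT_BYTES):
    """Return {app_name: size} for Apps whose directory exceeds `limit`.

    `app_dirs` is a hypothetical mapping of App name to the directory
    that holds its ephemeral data.
    """
    offenders = {}
    for app, path in app_dirs.items():
        size = directory_size(path)
        if size > limit:
            offenders[app] = size
    return offenders
```

A check like this only reports Apps that have already gone over the limit; it cannot stop a write, which is why one App can still fill a Node's disk.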
Unfortunately, we did not spot the issue right away over the weekend, so identifying it took us a little longer.
We are going to improve our internal alerting so that in the future we can act more quickly, before Apps are affected.
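As a rough idea of the kind of check such alerting could be built on, the sketch below flags a Node when the free space on a given path drops below a threshold. The 10% threshold, the function names, and the checked path are assumptions for illustration, not a description of our monitoring setup.

```python
import shutil


def free_fraction(path):
    """Return the fraction of free space (0.0 to 1.0) on the disk holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def should_alert(path, min_free_fraction=0.10):
    """Return True when free space on `path` is below `min_free_fraction`.

    The default of 10% is an illustrative value, not a recommendation.
    """
    return free_fraction(path) < min_free_fraction
```

Run periodically on each Node, a check like `should_alert("/")` would raise a warning while there is still some space left, instead of after writes have started to fail.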
Posted Feb 23, 2019 - 10:00 UTC