Every day, Slack's server infrastructure scales up to handle North American traffic by adding servers to the pool that serves requests. During this scale-up, some of these servers did not successfully register with our load balancing infrastructure, and over time this degraded the health of the server pool.
We resolved the customer-facing impact of this problem by provisioning and registering a new fleet of servers with our load balancer. Once this was done, we refreshed the list of servers registered with the load balancer to ensure that unhealthy servers were no longer receiving traffic.
We are conducting our standard postmortem process on this incident to ensure that we fully understand what went wrong and what steps we can take to prevent a recurrence. The output of this process is a Root Cause Analysis (RCA) report. Customers can contact us at email@example.com to request a copy of this RCA, which we will send once the postmortem process is complete.
8:31 AM PST
We have fully restored service and everyone should be able to connect to Slack now. If you have any further trouble, please let us know at firstname.lastname@example.org.
We're very sorry for the disruption. We appreciate your patience as we worked to get everyone back online.
5:33 PM PST
Users are unable to connect to Slack. We are investigating and will provide an update shortly.
5:02 PM PST
Users have reported general performance issues, such as message sending failures and timeouts. We’re working to get things back to normal as quickly as possible and will provide an update shortly.
4:53 PM PST