Emergency Metrics Volumes Resize - Algorithm Scaling degraded
Incident Report for Algorithmia
Resolved
Metrics subsystem is healthy, algorithm scaling is operating in normal ranges. Follow ups in progress to both improve early warning and problem detection in the metrics subsystem and to improve fall-back resiliency to metrics outages.
Posted Nov 06, 2020 - 14:13 PST
Monitoring
Metrics services restored internally. We are monitoring system behavior and investigating any secondary impacted service components.
Posted Nov 06, 2020 - 12:26 PST
Identified
The issue has been identified and a fix is being implemented.
Posted Nov 06, 2020 - 12:19 PST
Investigating
Algorithmia are working to recover from a failure in our internal metrics service which supports dynamic scaling of algorithm workers. This failure may prevent new algorithm workers from starting and high traffic algorithms may have insufficient capacity to service requests reliably. Customers may see timeouts or other errors call algorithm endpoints.
Posted Nov 06, 2020 - 12:15 PST
This incident affected: Algorithm API and Data API.