Today, 8 March 2023, at approximately 3:30pm AEST, we started seeing an elevated rate of API errors across our public-facing services.
All public services were partially impacted, with the Quote API hit the hardest. Clients would have received a high number of error responses from this time. Order creation was also impacted: it returned an elevated rate of error responses, and a small number of orders were only partially created.
Our investigation identified an issue with a common downstream dependency, a shared distributed cache (Redis). A large and sustained spike in memory requirements degraded the cache's performance, which had knock-on effects on multiple services and led to the incident.
Once the root cause was found, the team immediately remedied the situation, and all services returned to normal at approximately 4:06pm.
The system is being closely monitored, and the partially created orders have been rectified. Further investigation into preventative measures is under way.
Posted Mar 08, 2023 - 19:15 AEDT
This incident affected: Web (Store Platform), Mobile Apps (Courier Apps), and Yello API (Public Orders API, Public ETA API, Public Quote API, Webhook Events, Public Store API).