JSON POST RPC endpoint failure (error 502)
Incident Report for XRPL Labs, XUMM
Postmortem

After a physical hardware migration of some XRPLCluster.com backend servers, the resulting IP address changes were committed to the WebSocket configuration but not to the JSON POST RPC configuration.

The mistake wasn’t noticed because the cluster is configured to be redundant on several levels: all JSON POST RPC traffic that could not reach the migrated backends was redirected to other, still-available backends.

When the infrastructure serving all of those fallback requests went down for maintenance, the uncommitted IP change left zero failover capacity.
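
The failure sequence described above can be illustrated with a toy model. This is only a sketch: the backend names, the `Backend` type and the `route` function are hypothetical and do not reflect the actual XRPLCluster.com routing setup. It shows how a single working fallback can hide stale upstream entries until that fallback is taken away.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str         # hypothetical label, not a real host
    reachable: bool   # False models a config entry still pointing at an old IP


def route(pool: list[Backend]) -> str:
    """Return the first reachable backend's name, or "502" if none are."""
    for backend in pool:
        if backend.reachable:
            return backend.name
    return "502"


# The migrated backends are still listed under their old IPs, so they never answer.
stale_a = Backend("migrated-a (old IP)", reachable=False)
stale_b = Backend("migrated-b (old IP)", reachable=False)
fallback = Backend("fallback", reachable=True)
pool = [stale_a, stale_b, fallback]

before_maintenance = route(pool)  # the fallback silently absorbs the traffic
fallback.reachable = False        # the fallback goes down for maintenance
during_maintenance = route(pool)  # nothing is left to fail over to
```

In this model every request succeeds before the maintenance window, so nothing looks wrong from the client side, even though two of the three entries are already dead.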

JSON POST RPC upstream health has now been added to monitoring. WebSocket upstream health was already monitored closely, covering both connectivity and data sanity.
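
As a rough illustration of what a JSON POST RPC upstream health check could look like, the sketch below posts a `server_info` request to a single upstream and treats anything other than an HTTP 200 with a `"success"` status as unhealthy. This is not the monitoring actually deployed, which this report does not describe. The function names, the timeout and the choice to probe each upstream directly are assumptions.

```python
import json
import urllib.error
import urllib.request


def is_healthy(http_status: int, body: bytes) -> bool:
    """True only for HTTP 200 with a JSON body whose result reports "success"."""
    if http_status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    if not isinstance(payload, dict):
        return False
    result = payload.get("result")
    return isinstance(result, dict) and result.get("status") == "success"


def probe(url: str, timeout: float = 5.0) -> bool:
    """POST a server_info request to one upstream (url is an assumption)."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"method": "server_info", "params": [{}]}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_healthy(response.status, response.read())
    except urllib.error.HTTPError as err:  # e.g. a 502 returned by a proxy
        return is_healthy(err.code, err.read())
    except OSError:  # connection refused, DNS failure, timeout
        return False
```

Probing each upstream individually, rather than only the public endpoint, matters here: a check that goes through the load balancer would have kept passing for as long as the fallback backends were up.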

Posted Nov 30, 2023 - 11:51 CET

Resolved
NOTE! This problem DID NOT affect Xumm.

Clients consuming XRPLCluster.com resources through JSON POST RPC calls, instead of the (preferred) WebSocket method, could not obtain data from XRPLCluster.com and received an Error 502 instead.

WebSocket traffic was unaffected.

The number of third-party clients affected by this outage is limited, given that most clients use WebSockets for communication with the XRP Ledger through XRPLCluster.com.

The only known larger vendor affected is Ledger; as a result, users were unable to use their Ledger Nano (etc.) effectively with the XRP Ledger.
Posted Nov 29, 2023 - 04:00 CET