Xaman (Xumm) backend degraded performance
Incident Report for XRPL Labs, XUMM
Postmortem

Extreme load on our platform slowed down several backend services, resulting in cascading failures.

The cause was an extremely high number of unique visitors using a project/platform that incorrectly opened multiple connections to our backend per client. This number of clients is not the kind of traffic that would ever hit our backends organically.

Normally our rate limiting would have kicked in, but the project owners had created a large number of API credentials, used them to create many payloads, and distributed those payloads to many users, each of whom then connected multiple times. Public payload websocket connections were not rate limited.
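
To illustrate why per-credential limits alone did not contain this, here is a back-of-the-envelope sketch in Python. The function name is hypothetical and every figure is a made-up placeholder, not a measurement from this incident. It only shows how the number of open websocket connections multiplies across credentials, payloads, users and connections per user.

```python
def open_websockets(credentials: int, payloads_per_credential: int,
                    users_per_payload: int, connections_per_user: int) -> int:
    """Worst-case number of concurrent payload websocket connections."""
    return (credentials * payloads_per_credential
            * users_per_payload * connections_per_user)


# Placeholder numbers only, chosen for illustration.
single_credential = open_websockets(1, 100, 50, 3)
many_credentials = open_websockets(20, 100, 50, 3)

# Each extra credential gets its own rate-limit budget, so the total
# scales linearly with the number of credentials created.
print(single_credential, many_credentials)
```

Because each factor multiplies the others, a limit on only one of them (API calls per credential) leaves the total unbounded.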

The following measures have been taken (and would have prevented this outage):

  1. Optimized a Redis query to prevent exponential load on the caching backend
  2. Strictly enforced API rate limits, not only on calls but also on payload creation
  3. Limited payload creation (exponential curve based on the dev account creation date) unless whitelisted
  4. Limited dev account App creation to 3 per week
  5. Limited payload websocket subscribers to 4 simultaneously
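
Measures 3 and 5 above could look roughly like the Python sketch below. This is not our production code. The base allowance, doubling period and ceiling are assumed values for illustration, the names are hypothetical, and the sketch assumes the subscriber limit applies per payload.

```python
from collections import defaultdict
from datetime import datetime, timezone

MAX_SUBSCRIBERS_PER_PAYLOAD = 4  # the limit named in measure 5

# Assumed curve parameters, not the real values.
BASE_PAYLOADS_PER_DAY = 100
DOUBLING_PERIOD_DAYS = 30
MAX_PAYLOADS_PER_DAY = 100_000


def daily_payload_allowance(created_at, now, whitelisted=False):
    """Payload creation allowance that grows exponentially with account age.

    Returns None (no limit) for whitelisted accounts.
    """
    if whitelisted:
        return None
    age_days = max((now - created_at).days, 0)
    # Cap the exponent so very old accounts cannot overflow the float.
    exponent = min(age_days / DOUBLING_PERIOD_DAYS, 64)
    allowance = BASE_PAYLOADS_PER_DAY * 2 ** exponent
    return min(int(allowance), MAX_PAYLOADS_PER_DAY)


class PayloadSubscriptions:
    """Tracks websocket subscribers per payload and enforces a cap."""

    def __init__(self, limit=MAX_SUBSCRIBERS_PER_PAYLOAD):
        self.limit = limit
        self._subscribers = defaultdict(set)

    def subscribe(self, payload_id, connection_id):
        """Return True if the connection may subscribe, False if rejected."""
        subscribers = self._subscribers[payload_id]
        if connection_id in subscribers:
            return True  # already subscribed, nothing to do
        if len(subscribers) >= self.limit:
            return False  # cap reached, reject the new connection
        subscribers.add(connection_id)
        return True

    def unsubscribe(self, payload_id, connection_id):
        """Remove a connection and free its slot."""
        subscribers = self._subscribers.get(payload_id)
        if subscribers is None:
            return
        subscribers.discard(connection_id)
        if not subscribers:
            del self._subscribers[payload_id]


now = datetime(2023, 12, 26, tzinfo=timezone.utc)
new_account = datetime(2023, 12, 25, tzinfo=timezone.utc)
print(daily_payload_allowance(new_account, now))
```

With a cap like this, a fifth connection to the same payload is rejected until one of the existing four disconnects.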

After implementing these limits, we restarted the backend infrastructure to clear any pending connections and queries.

Project owners who create a relatively high number of legitimate payloads may be rate limited on payload creation. If this is the case, please reach out to us so we can apply suitable payload creation limits.

Posted Dec 26, 2023 - 03:26 CET

Resolved
This incident has been resolved: the faulty backend component has been replaced. Everything should be fully functional again. We'll monitor closely.
Posted Dec 25, 2023 - 17:18 CET
Identified
Due to a failure in a component of the Xaman (Xumm) backend, loading rates, preflight information & payloads may fail.

As a result, users may be unable to move to the next step when trying to transact with Xumm.

Public XRP Ledger infrastructure and nodes are fine.

This problem is being resolved.
Posted Dec 25, 2023 - 17:07 CET
This incident affected: XUMM API / SDK (XUMM Developer API/SDK).