All global XRP Ledger nodes have a full Transaction Queue
Incident Report for XRPL Labs, XUMM
Resolved
While the fee escalation condition on the XRP Ledger *has not been* resolved, the problems are no longer noticeable for XUMM end users: XUMM now signs transactions with a slightly elevated network fee, which allows them to go through.

The normal transaction fee used to be 12 drops (0.000012 XRP); it is currently forced to 150 drops (still only 0.00015 XRP).
Posted Dec 02, 2021 - 20:54 CET
Update
XRPL Fee Escalation Analysis, by Richard Holland, XRPL-Labs, Dec 2, 2021

Executive Summary

The fee escalation code is working as intended, but:
1. Its design is broken, and
2. Deadlocks in `rippled` are probably causing misactivation of emergency fee measures in validators.


The Simple Test

`rippled` chooses an "open ledger fee" (the minimum fee to place a transaction into the open ledger) based on a simple test: https://github.com/ripple/rippled/blob/fbedfb25aefc609aff2c6090b19c419c224a8ab2/src/ripple/app/misc/impl/TxQ.cpp#L179-L184
- If the current ledger's transaction count is less than the expected transaction count then the base fee is used (`10` drops).
- Otherwise: the Fee Multiplier is used.
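
The test above can be sketched in a few lines of Python. This is a simplified illustration of the description in this report, not a transcription of the linked C++. The function and parameter names are hypothetical, and `escalated_fee_drops` simply stands in for whatever fee the Fee Multiplier yields.

```python
BASE_FEE_DROPS = 10  # base fee quoted in this report


def open_ledger_fee(txns_in_ledger: int, txns_expected: int,
                    escalated_fee_drops: int) -> int:
    """Return the minimum fee (in drops) needed to enter the open ledger.

    escalated_fee_drops is a placeholder (assumption) for the fee derived
    from the Fee Multiplier; computing it is outside this sketch.
    """
    if txns_in_ledger < txns_expected:
        # Ledger is below its expected size: the base fee is enough.
        return BASE_FEE_DROPS
    # Ledger is at or above its expected size: the escalated fee applies.
    return escalated_fee_drops
```

With an expected count of 5 and an escalated fee of 5000 drops, `open_ledger_fee(3, 5, 5000)` returns the base fee, while `open_ledger_fee(5, 5, 5000)` returns the escalated fee.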

Expected Transaction Count

When updating the expected transaction count for the next ledger, `rippled` uses a circuit breaker called a timeLeap detector: https://github.com/ripple/rippled/blob/fbedfb25aefc609aff2c6090b19c419c224a8ab2/src/ripple/app/misc/impl/TxQ.cpp#L108-L120

The timeLeap detector is hard coded: https://github.com/ripple/rippled/blob/fbedfb25aefc609aff2c6090b19c419c224a8ab2/src/ripple/app/consensus/RCLConsensus.cpp#L779 to activate whenever _roundTime_ exceeds 5 seconds.

The "roundTime": https://github.com/ripple/rippled/blob/fbedfb25aefc609aff2c6090b19c419c224a8ab2/src/ripple/consensus/ConsensusTypes.h#L232-L233 is computed by measuring: https://github.com/ripple/rippled/blob/f0c237e0018b5e4637dc66c3bb3fc705b33d12f8/src/ripple/consensus/Consensus.h#L1266 the time spent in the establish phase. This is a per-node calculation, independent of consensus.

When this breaker triggers on a node, its expected transaction count immediately halves: https://github.com/ripple/rippled/blob/fbedfb25aefc609aff2c6090b19c419c224a8ab2/src/ripple/app/misc/impl/TxQ.cpp#L115-L116. It does this in every ledger where the timeLeap breaker activates, until it hits a minimum of 5 expected transactions per ledger.
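
To show how quickly repeated breaker activations shrink the expected count, here is a minimal Python sketch of the halving rule as described in this report. The names are hypothetical, the starting value of 256 is purely illustrative, and growth of the count on healthy ledgers is deliberately not modelled.

```python
MIN_EXPECTED_TXNS = 5  # floor quoted in this report


def next_expected_txns(expected: int, time_leap: bool) -> int:
    """Halve the expected count when the breaker fired, never below the floor."""
    if not time_leap:
        return expected  # assumption: recovery on healthy ledgers is ignored here
    return max(expected // 2, MIN_EXPECTED_TXNS)


def halving_sequence(start: int, ledgers: int) -> list:
    """Expected counts over consecutive ledgers in which the breaker fires."""
    values = []
    expected = start
    for _ in range(ledgers):
        expected = next_expected_txns(expected, time_leap=True)
        values.append(expected)
    return values
```

Starting from 256, `halving_sequence(256, 7)` gives `[128, 64, 32, 16, 8, 5, 5]`: six consecutive activations are enough to reach the floor.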

Fee Multiplier

The fee escalation multiplier: https://github.com/ripple/rippled/blob/fbedfb25aefc609aff2c6090b19c419c224a8ab2/src/ripple/app/misc/impl/TxQ.cpp#L146-L161 is computed by taking the median fee paid by all fee-claiming transactions in the last closed ledger.

The way it is computed is relatively unimportant for this analysis. The minimum value is `128000`, which corresponds to `5000` drops (fee levels are scaled so that `256` equals the `10` drop base fee, so 128000 / 256 × 10 = 5000), and, at least for now, the value hovers around that minimum.

Unintended Breaker Flipping
This analysis proposes that a majority of `dUNL` validators are triggering the timeLeap breaker on their nodes, repeatedly and unintentionally, due to a known deadlocking bug. `rippled` (especially version 1.7.3) contains a series of deadlock conditions, the spontaneous occurrence of which has significantly increased over the past 3 months.

The average roundTime is around 3000 ms on a normally operating node. However, a deadlock may stall a node for 2 or more seconds at any time, breaching the 5s threshold, triggering the breaker, and halving the number of expected transactions. This in turn immediately causes the simple test: https://github.com/ripple/rippled/blob/fbedfb25aefc609aff2c6090b19c419c224a8ab2/src/ripple/app/misc/impl/TxQ.cpp#L179-L184 to take the fee escalation multiplier pathway.

The saturated transaction queue (and job queue) may also contribute to slowdowns that push nodes across the 5 second threshold.
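
The chain of events in the paragraph above can be sketched as follows. This is an illustrative Python model under stated assumptions: the names are hypothetical, the comparison is strict (following "exceeds 5 seconds"), and the example ledger sizes are invented, so whether a stall of exactly 2000 ms trips the real breaker depends on the linked code.

```python
TIME_LEAP_THRESHOLD_MS = 5000  # hard-coded threshold cited in this report
TYPICAL_ROUND_TIME_MS = 3000   # average roundTime on a healthy node, per this report
MIN_EXPECTED_TXNS = 5          # floor for the expected transaction count


def breaker_trips(stall_ms: int,
                  normal_round_ms: int = TYPICAL_ROUND_TIME_MS) -> bool:
    """True when a stall pushes this round's roundTime past the threshold."""
    return normal_round_ms + stall_ms > TIME_LEAP_THRESHOLD_MS


def takes_escalation_path(txns_in_ledger: int, txns_expected: int,
                          stall_ms: int) -> bool:
    """True when the simple test ends up on the fee escalation pathway."""
    if breaker_trips(stall_ms):
        # The breaker halves the expected count, down to the floor.
        txns_expected = max(txns_expected // 2, MIN_EXPECTED_TXNS)
    # Base fee applies only while the ledger is below its expected size.
    return txns_in_ledger >= txns_expected
```

For a ledger holding 30 transactions with 50 expected, `takes_escalation_path(30, 50, 0)` is `False`. A 2500 ms stall halves the expected count to 25, so `takes_escalation_path(30, 50, 2500)` is `True` even though the transaction load itself has not changed.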

Interim recommendations

- `rippled` 1.8.1 has improved and parallelised garbage collection. This may improve the deadlocking situation and should be installed on all validators.
- Validators should add the following to their config:

[transaction_queue]
ledgers_in_queue = 2048
minimum_txn_in_ledger = 64
target_txn_in_ledger = 256

- Validators which are unmanned or poorly manned should be removed from the `dUNL`.
- The hardcoded 5s threshold should be increased and changed to a configurable value.
- Fee escalation code should be scrapped and rewritten to a simple "highest fee gets the spot" model.
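
As a rough illustration of what a "highest fee gets the spot" model could look like, here is a minimal Python sketch. It is not a proposal for actual `rippled` code: the `Tx` type, the function name and the fixed per-ledger capacity are all hypothetical, and a real design would also have to respect per-account transaction ordering.

```python
import heapq
from typing import List, NamedTuple


class Tx(NamedTuple):
    tx_id: str       # hypothetical identifier, for illustration only
    fee_drops: int   # fee offered by the transaction, in drops


def select_for_ledger(candidates: List[Tx], capacity: int) -> List[Tx]:
    """Pick the `capacity` highest-fee transactions; the rest wait."""
    return heapq.nlargest(capacity, candidates, key=lambda tx: tx.fee_drops)
```

Given candidates paying 12, 150, 5000 and 20 drops and a capacity of 2, the sketch selects the 5000 and 150 drop transactions, in that order.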
Posted Dec 02, 2021 - 15:51 CET
Update
We just released a hotfix that causes xrplcluster.com to override the advertised "base fee" (for transactions) of the XRP Ledger. As a result, XRP Ledger clients (wallets, platforms) that respect the "base fee" believe the network base fee has increased.

XUMM is designed to pick up on this. This causes XUMM to now sign transactions with a higher fee (still only as low as ~0.00015 XRP, but higher than the normal 0.000012 XRP).

Transactions with a higher fee bypass the transaction queue, which is currently full on all XRP Ledger nodes.

We have just confirmed that most transactions in XUMM are getting through again. While this does not address the underlying issue on the XRP Ledger, we hope this recovers usability for XUMM users.
Posted Dec 02, 2021 - 00:50 CET
Monitoring
XRP Ledger Validator operators are currently joining forces to update their validator configuration to allow validators to have a higher target number of transactions per closed ledger. This should clear the transaction queue faster, as the network transaction queue is currently growing at the same rate it is being cleared.

Also, a problem has been identified in the source code of the software powering the XRP Ledger (nodes, validators). Weird emergent behavior in the modules responsible for ledger size & fee escalation is keeping this situation from resolving itself, due to a feedback loop between the two mechanisms.

Finally, XUMM and other clients limit the Fee Escalation they will work with, to prevent users from overpaying transaction fees. We will add a Fee Selection option in the upcoming 2.2.5 release of XUMM, for those who prefer to manually override and pay a higher fee anyway to get a transaction through even if transaction queues are full.

Addressing the three items above will result in:
1. Validators emptying the queue faster, which will hopefully clear the queue and allow the ledger to resume normal operation
2. An update to the source code of `rippled`, the software powering XRP Ledger nodes, to see whether fee escalation & transaction queue processing / cleaning can be handled smarter
3. Fee selection in XUMM, so that even if this happens again, users can opt in to paying a higher network fee to get their transaction through
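
The combination of a fee cap and a manual override can be sketched as follows. This is a hypothetical Python illustration, not XUMM's actual logic: the function name, the parameters and the cap of 1000 drops used in the example are all assumptions.

```python
from typing import Optional


def choose_fee_drops(network_fee_drops: int, max_auto_fee_drops: int,
                     manual_override_drops: Optional[int] = None) -> int:
    """Pick the fee (in drops) a client signs a transaction with.

    By default the client follows the network fee but never exceeds its
    own cap, protecting users from overpaying. A user who explicitly
    selects a fee bypasses the cap.
    """
    if manual_override_drops is not None:
        return manual_override_drops
    return min(network_fee_drops, max_auto_fee_drops)
```

With a cap of 1000 drops, `choose_fee_drops(5000, 1000)` returns 1000, so the transaction would be queued while the network demands 5000 drops. `choose_fee_drops(5000, 1000, manual_override_drops=5000)` returns 5000 and would get through.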
Posted Dec 01, 2021 - 20:33 CET
Update
Please note: these problems aren't XRPLCluster.com / XUMM specific: these problems are XRP Ledger wide, and all XRP Ledger wallets, apps and nodes will encounter the same problems.
Posted Dec 01, 2021 - 17:34 CET
Update
The problems seem to be global and XRP Ledger wide: the transaction queue is being reported by all nodes as unusually full (maxed out at the maximum value configured on each node). This seems to be caused by elevated network fees on the XRP Ledger, causing all transactions with the default 12 drop fee to be queued.

The reason for the elevated fees is still being investigated.
Posted Dec 01, 2021 - 17:08 CET
Identified
Due to a sudden spike in load on the XRP Ledger, all XRP Ledger nodes are currently having trouble staying in sync. Most nodes intermittently "deadlock". This affects all nodes: Full History, non-Full History & submission nodes. Nodes outside of the cluster are also affected, as this is caused by overall network load.
Posted Dec 01, 2021 - 15:45 CET
This incident affected: XRP Ledger - Public nodes (xrplcluster.com (XRPL Mainnet)).