In this blog post, Artem Kovalchuk, Vincent Alessi, and Vyacheslav Trushkov show how to turn common blockchain events into anomaly detection findings through time series analysis. Both detection bots introduced here were developed by Forta community detection bot developers Artem and Vyacheslav during the recent Forta contest. The intent of this post is to expand the bot developer community’s toolbox so that developers can create more valuable Forta detection bots that help secure Web3.
Forta’s mission is to monitor all transactions and protect all assets in Web3. It is a community-driven protocol in which a distributed set of scan nodes pass blocks and transactions to community-developed detection bots. These bots trigger findings that highlight operationally relevant and security-relevant events on the blockchain. There are over 600 Forta detection bot developers, and anyone can author a detection bot; get started here!
Detection bots can be heuristic-based, and this often works just fine. For instance, to monitor whether an account is funded by a privacy protocol such as Tornado Cash and then interacts with your protocol, you monitor funding transactions, maintain the state of those funded accounts, and emit a finding when such an account interacts with the protocol (Tornado Cash Funded Account Interaction).
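The heuristic described above can be sketched in a few lines of Python. This is only an illustration of the stateful pattern, not the code of the actual bot: it does not use the Forta SDK, and the `MIXER` and `PROTOCOL` addresses as well as the `FundedAccountTracker` class are hypothetical names introduced here.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder addresses, for illustration only.
MIXER = "0xmixer"
PROTOCOL = "0xprotocol"


@dataclass
class FundedAccountTracker:
    """Remembers accounts funded by the mixer and flags their later protocol calls."""

    funded: set = field(default_factory=set)

    def handle_transaction(self, sender: str, recipient: str) -> list:
        findings = []
        # Step 1: remember every account that receives funds from the mixer.
        if sender == MIXER:
            self.funded.add(recipient)
        # Step 2: flag a remembered account when it interacts with the protocol.
        if sender in self.funded and recipient == PROTOCOL:
            findings.append(f"mixer-funded account {sender} interacted with {PROTOCOL}")
        return findings
```

A fixed rule like this needs no threshold at all, which is why it works well for rare, clearly defined events.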
This approach, however, does not work well when the event itself is frequent or constantly present, such as a transaction’s gas price. What is a good threshold to choose? Is a priority fee of 2 a good value? 5? 50? Further, what if the appropriate value changes over time and depends on the underlying protocol’s usage pattern? Frequent high fees may simply be the nature of the protocol: some competitive aspect, like a hot NFT project where people are eager to see their transactions mined quickly, makes them willing to pay higher gas to get picked up by the miners. In those instances, we see frequent spikes in gas prices that could even be seasonal (e.g. NFTs get minted every Monday), so we would expect spikes on Mondays as part of the normal course of business. If, however, we have a protocol where such seasonal patterns do not exist, gas price spikes would be rare, and a future spike may indicate something truly unusual, such as an attack.
The recent Ronin attack illustrates this: we do see many spikes in priority fees over time, but the attacker’s transaction clearly sticks out. A hard threshold approach will likely raise a lot of unnecessary findings. This is where time series anomaly detection comes in. A time series buckets numeric values (e.g. counts or absolute values) over time.
A time series detection approach learns the historical variability as well as any seasonality present in the data (e.g. gas priority fees may be higher on weekdays than on weekends). It learns a band, or range, of expected values and alerts when a value breaks out of that range.
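To make the idea of a learned band concrete, here is a deliberately simple Python sketch. It is not the method used by either bot below: it just computes the mean plus or minus `k` standard deviations for each position in a seasonal cycle. The function names, `period`, and `k` are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def seasonal_band(history, period, k=3.0):
    """Return one (lower, upper) band per position in the seasonal cycle."""
    bands = []
    for slot in range(period):
        # All historical values that fall on this position of the cycle.
        values = history[slot::period]
        centre, spread = mean(values), pstdev(values)
        bands.append((centre - k * spread, centre + k * spread))
    return bands


def is_anomaly(value, index, bands):
    """True when the value at time step `index` breaks out of its band."""
    lower, upper = bands[index % len(bands)]
    return value < lower or value > upper
```

With a weekly cycle (`period=7`), a fee of 50 on a day that always sees spikes stays inside its band, while the same fee on a normally quiet day is flagged.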
In this post we take a detailed look at two bots that apply a time series analysis approach: the High Gas Detection Bot, developed in JavaScript, and the Price Anomaly Detection Bot, developed in Python.
Artem developed a high gas detection bot in JavaScript that uses time series analysis to identify the Ronin attack transaction with as little noise as possible.
First, let’s review which gas value the bot analyzes. After Ethereum’s London Hard Fork, EIP-1559 introduced a base gas fee that is set by the network, taking into account network congestion. A user can pay an additional priority fee on top of the base fee to get their transactions prioritized. This priority fee is the value used by the bot.
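As a small worked example, the priority fee actually paid per unit of gas can be derived from the transaction fields and the block’s base fee. Under EIP-1559 it is the smaller of the tip cap and whatever is left under the fee cap after the base fee; for a legacy transaction it is the gas price minus the base fee. The Python function below is a sketch with hypothetical parameter names; whether the bot computes the value exactly this way is an assumption.

```python
GWEI = 10**9


def effective_priority_fee(base_fee, max_fee=None, max_priority_fee=None, gas_price=None):
    """Priority fee per gas (in wei) actually paid on top of the base fee."""
    if max_fee is not None and max_priority_fee is not None:
        # EIP-1559 transaction: the tip is capped by both limits.
        return min(max_priority_fee, max_fee - base_fee)
    if gas_price is not None:
        # Legacy transaction: everything above the base fee is the tip.
        return gas_price - base_fee
    raise ValueError("provide either max_fee and max_priority_fee, or gas_price")
```

For instance, with a base fee of 30 gwei, a fee cap of 31 gwei, and a tip cap of 2 gwei, only 1 gwei per gas is paid as priority fee.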
Since this bot was written in JavaScript, which lacks Python’s range of machine learning and data science packages, a statistical time series analysis approach was chosen. A moving average could be used, but moving averages have a difficult time incorporating cyclical patterns that may exist in the data, so anomalies may not be properly identified.
To solve this problem, the time series is decomposed into several components that capture trend, seasonality, and the residual.
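The decomposition can be illustrated with a classical additive scheme, in which the series is treated as trend plus seasonal plus residual. The Python sketch below is a textbook version, not the bot’s code: the trend is a centered moving average over one full cycle, the seasonal component is the average detrended value per cycle position, and the function name and the restriction to an odd `period` are assumptions made to keep it short.

```python
from statistics import mean


def decompose(series, period):
    """Additive decomposition: series[i] = trend[i] + seasonal[i] + residual[i]."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles an odd period")
    half, n = period // 2, len(series)
    # Trend: centered moving average over one full cycle (undefined at the edges).
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(series[i - half:i + half + 1])
    # Seasonal: average detrended value for each position in the cycle.
    slots = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            slots[i % period].append(series[i] - trend[i])
    raw = [mean(values) for values in slots]
    offset = mean(raw)  # centre the seasonal pattern around zero
    seasonal = [raw[i % period] - offset for i in range(n)]
    # Residual: whatever trend and seasonality do not explain.
    residual = [None if trend[i] is None else series[i] - trend[i] - seasonal[i]
                for i in range(n)]
    return trend, seasonal, residual
```

An anomaly then shows up as an unusually large residual, even when the raw value is no higher than a regular seasonal peak.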
The JavaScript library zodiac-ts implements this approach. The library supports the Holt-Winters method, which is based on a model consisting of three aspects: a typical value (average), a slope (trend) over time, and a cyclical repeating pattern (seasonality).
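For readers unfamiliar with the method, here is a minimal Python sketch of additive Holt-Winters (triple exponential smoothing). It follows the textbook update equations and is not the zodiac-ts implementation; the function name, the initialization from the first two seasons, and the parameter names `alpha`, `beta`, and `gamma` are assumptions made for illustration.

```python
from statistics import mean


def holt_winters_forecast(series, period, alpha, beta, gamma, horizon=1):
    """Additive Holt-Winters: returns forecasts for the next `horizon` steps."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    first, second = series[:period], series[period:2 * period]
    level = mean(first)                            # typical value
    trend = (mean(second) - mean(first)) / period  # slope per time step
    seasonal = [y - level for y in first]          # repeating pattern
    for t in range(period, len(series)):
        y, s = series[t], seasonal[t % period]
        previous_level = level
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (y - level) + (1 - gamma) * s
    n = len(series)
    return [level + h * trend + seasonal[(n - 1 + h) % period]
            for h in range(1, horizon + 1)]
```

The three smoothing parameters control how quickly the level, the trend, and the seasonal pattern adapt to new observations.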
A notable feature of this library is the automatic selection of optimal model parameters, namely those at which the prediction deviates least from the training data. This allows the model to adapt to the data dynamically while the bot is running. Each protocol will have its own parameters that best fit its time series.
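One simple way to select parameters automatically is a grid search that keeps the candidate with the smallest one-step-ahead squared error on the training data. The Python sketch below demonstrates this with single exponential smoothing, which has only one parameter, to keep the example short; the same idea extends to the three Holt-Winters parameters. How zodiac-ts searches internally is not described here, so the grid search and the function names are assumptions.

```python
def sse_one_step(series, alpha):
    """Sum of squared one-step-ahead errors of simple exponential smoothing."""
    level = series[0]
    sse = 0.0
    for y in series[1:]:
        sse += (y - level) ** 2                    # error of predicting y with the current level
        level = alpha * y + (1 - alpha) * level    # update after seeing y
    return sse


def best_alpha(series, candidates=None):
    """Pick the smoothing parameter with the smallest training error."""
    if candidates is None:
        candidates = [i / 10 for i in range(1, 10)]
    return min(candidates, key=lambda a: sse_one_step(series, a))
```

A steadily rising series favors a high `alpha` (follow the data closely), while a noisy but flat series favors a lower one (smooth more), which is why each protocol ends up with its own parameters.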
To normalize the data, a bucket size of one hour was chosen, taking the maximum value per bucket over a 7-day period. However, transactions do not arrive at regular intervals, so some buckets may be empty. For example, less popular smart contracts may see a transaction only once every few hours, or even less often. In such cases, the missing values can be interpolated, that is, estimated from the neighboring buckets. For this purpose, the simple library range-interpolator was used. To reduce the noise in the time series, a Kalman filter was applied.
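The three preprocessing steps can be sketched in Python as follows. This is an illustration under assumptions, not the bot’s code: the gaps are filled by linear interpolation (the method used by range-interpolator is not stated here), the Kalman filter is a basic one-dimensional version with a constant-state model, and the noise variances are arbitrary example values.

```python
def hourly_max_buckets(samples, start, hours):
    """samples: (unix_timestamp, value) pairs. Returns the max per hour, None if empty."""
    buckets = [None] * hours
    for ts, value in samples:
        idx = (ts - start) // 3600
        if 0 <= idx < hours and (buckets[idx] is None or value > buckets[idx]):
            buckets[idx] = value
    return buckets


def interpolate_gaps(buckets):
    """Fill empty buckets linearly between known neighbours; edges copy the nearest value."""
    known = [i for i, v in enumerate(buckets) if v is not None]
    if not known:
        raise ValueError("no data to interpolate from")
    filled = list(buckets)
    for i, v in enumerate(buckets):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = buckets[right]
        elif right is None:
            filled[i] = buckets[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = buckets[left] + weight * (buckets[right] - buckets[left])
    return filled


def kalman_smooth(values, process_var=1e-2, measurement_var=1.0):
    """One-dimensional Kalman filter that treats the true value as slowly drifting."""
    estimate, error = values[0], 1.0
    smoothed = [estimate]
    for z in values[1:]:
        error += process_var                      # predict: uncertainty grows
        gain = error / (error + measurement_var)  # how much to trust the new value
        estimate += gain * (z - estimate)         # update towards the measurement
        error *= 1 - gain
        smoothed.append(estimate)
    return smoothed
```

For a 7-day window, `hours` would be 168; the smoothed series is then what the forecasting model is trained on.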
Once the time series model is created, it allows us to predict the range of expected values for the next time step. If the observed value deviates from that range by more than a given threshold, an anomaly has been identified and an alert is raised.
Testing the bot on the Ronin attack (block 14342885 to 14442835) yields the appropriate alert.
Vyacheslav developed a detection bot that detects unusual price changes, such as those observed in the Saddle Finance and Inverse Finance attacks. This bot was developed in Python, which has a rich set of libraries available for time series analysis.
The Price Anomaly detection bot was created as a universal tool for detecting anomalies in price changes. The goal was to cover as many price feeds as possible in as many networks as possible.
As with the high gas detection bot, hard thresholds are not suitable here because prices behave quite differently depending on the asset: some tokens with little liquidity are more volatile, others have a clear downward or upward trend, and stablecoins should maintain a 1:1 ratio to the underlying. Therefore, an adaptive threshold needs to be maintained for each separate price pair. Time series analysis is a suitable approach for identifying statistically significant price movements that may be indicative of an attack.
Modeling several price feeds as time series presents memory and computational challenges, since a whole set of time series models must be kept up to date. To solve this problem, the bot stores its data in a database that it accesses asynchronously through the SQLAlchemy library, an open-source SQL toolkit and ORM, together with its asyncio extension, which is fast and stable.
Price manipulation, as seen in the Saddle Finance and Inverse Finance attacks, happens on chain by manipulating certain liquidity pools that protocols may rely on as an oracle to obtain their price data. The bot therefore watches the Swap events emitted by such pools:
{ "anonymous": false, "inputs": [ { "indexed": true, "internalType": "address", "name": "sender", "type": "address" }, { "indexed": true, "internalType": "address", "name": "recipient", "type": "address" }, { "indexed": false, "internalType": "int256", "name": "amount0", "type": "int256" }, { "indexed": false, "internalType": "int256", "name": "amount1", "type": "int256" }, { "indexed": false, "internalType": "uint160", "name": "sqrtPriceX96", "type": "uint160" }, { "indexed": false, "internalType": "uint128", "name": "liquidity", "type": "uint128" }, { "indexed": false, "internalType": "int24", "name": "tick", "type": "int24" } ], "name": "Swap", "type": "event" }
The amount0 and amount1 values are the amounts of token0 and token1 exchanged in the swap. Dividing one by the other therefore yields a price, which the bot stores in the aforementioned database as events are presented to it.
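The price calculation can be sketched in Python as follows. In the Swap event, amount0 and amount1 are signed integers in each token’s smallest unit, so the sketch takes absolute values and scales by the token decimals before dividing. The function name, the decimals handling, and the choice to quote token0 in units of token1 are assumptions for illustration, not necessarily what the bot does.

```python
from fractions import Fraction


def swap_price(amount0, amount1, decimals0=18, decimals1=18):
    """Price of one token0 in units of token1, or None for a zero-sized swap."""
    if amount0 == 0:
        return None
    # The amounts are signed (one token flows in, the other out), so use magnitudes.
    tokens0 = Fraction(abs(amount0), 10**decimals0)
    tokens1 = Fraction(abs(amount1), 10**decimals1)
    return float(tokens1 / tokens0)
```

Using exact fractions until the final conversion avoids precision loss on the very large integers that appear in event data.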
We again turn to time series analysis to understand whether an anomalous movement has occurred. Prophet, developed by Meta, is the library used in this bot. It handles components such as seasonality and trend well. Running the bot on blocks 14502359 to 14507132 (the period in which the Inverse Finance attack happened) shows the following time series for the price data of the WETH-INV pool:
The black dots on this chart are the actual swap prices (grouped by minute with the mean() function), the blue line shows the predictions, and the light blue area shows the expected range (its width is configurable through the configuration values passed to the Prophet library). Towards the right, the range continues without any dots; this is Prophet’s prediction for upcoming values, and the bot raises an alert if the price moves outside of it.
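The workflow described above can be sketched in Python. The two small helpers are plain Python; `forecast_band` shows how Prophet’s `interval_width` setting and its `yhat_lower` and `yhat_upper` output columns could be used to obtain the expected range. All function names, the number of forecast periods, and the interval width are assumptions for illustration and are not taken from the bot’s source. Prophet and pandas are imported lazily, so the helpers work even where those packages are not installed.

```python
from collections import defaultdict
from statistics import mean


def mean_price_per_minute(swaps):
    """swaps: (unix_timestamp, price) pairs. Returns sorted (minute_start, mean_price) pairs."""
    grouped = defaultdict(list)
    for ts, price in swaps:
        grouped[ts - ts % 60].append(price)
    return [(minute, mean(prices)) for minute, prices in sorted(grouped.items())]


def forecast_band(points, periods=10, interval_width=0.95):
    """Fit Prophet on per-minute prices and return the expected range for the next minutes."""
    import pandas as pd          # imported lazily: only needed for this function
    from prophet import Prophet

    frame = pd.DataFrame({
        "ds": pd.to_datetime([ts for ts, _ in points], unit="s"),
        "y": [price for _, price in points],
    })
    model = Prophet(interval_width=interval_width)  # controls the width of the band
    model.fit(frame)
    future = model.make_future_dataframe(periods=periods, freq="min", include_history=False)
    forecast = model.predict(future)
    return forecast[["ds", "yhat_lower", "yhat_upper"]]


def outside_band(price, lower, upper):
    """True when an observed price leaves the predicted range."""
    return price < lower or price > upper
```

Each new per-minute price would then be compared against the matching row of the forecast, and an alert raised when `outside_band` returns true.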
Again, the bot has been validated on a range of attacks. It is a great addition to the hundreds of Forta bots that help protect Web3 today, and it helps to identify price manipulation and oracle attacks.
Detection bots allow for a broad range of developer freedom. In this blog post, we illustrated how a challenging data problem can be solved with time series anomaly detection in order to highlight security-relevant events through Forta alerts. If you are developing detection bots on Forta and are dealing with time series data, consider implementing one of these approaches. It will help to better separate signal from noise and to raise more useful alerts. Get started here and join the Forta Discord to discuss detection bot development and ideas for new detection bots.