Testing and Debugging Detection Bots

Detection bots and scan nodes are core to the functionality of the Forta Network. Scan nodes forward transactions, blocks, and alerts to community-developed detection bots, which contain logic to identify malicious on-chain behavior in real time.

Proper functioning of detection bots is crucial. The Forta Network often represents the last line of defense, and an attack missed because of a bug in one's bot could lead to millions of dollars being lost.

This blog post describes best practices and techniques to test and debug detection bots. It first covers local testing strategies (functional and performance), then explains how to prepare your bot, through logging and error handling, so that it gives you the information you need to confirm it is functioning properly.

Functional Testing

Testing not only allows one to assess proper functioning of a detection bot, but also to explore edge cases that may be rare on-chain. Testing is helpful when developing the bot initially, and the investment also pays dividends during the later stages of the bot lifecycle. Inevitably, one will adjust a bot in response to false positives (alerts on transactions or blocks the bot is not designed to detect, also known as noise), false negatives (missing alerts on behavior the bot is designed to detect), and additional features that may become available.

Unit testing stands at the heart of the testing approach. Unit testing lends itself to a test-first development approach in which one authors the test first, watches it fail, and then proceeds to implement the bot. Once the tests pass, the work is done. The test-first approach usually results in elegant and efficient code, which can be aggressively refactored when new functionality is added or performance optimizations are pursued.

A key aspect of unit tests is the use of mock objects, so that test failures are caused by the bot code being tested rather than by external dependencies. Mocks also provide a lot of control to create edge conditions. For example, when mocking out the eth object, one can test what happens when a function call results in an exception. To see a great example of unit/mock testing, check out these Python and JavaScript files. Another example can be seen in the large transfer bot, where the mock object returns a fixed balance for various addresses.
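As a minimal sketch of the mocking idea, the following uses Python's standard unittest.mock rather than any Forta SDK helper. The function is_large_balance, the threshold, and the address "0xabc" are hypothetical names chosen for illustration; the only assumption about the eth object is that it exposes get_balance(address), as web3.py's w3.eth does.

```python
import unittest
from unittest.mock import Mock

THRESHOLD_WEI = 100 * 10**18  # hypothetical threshold: 100 ETH, in wei


def is_large_balance(eth, address):
    """Return True if the address balance meets the threshold.

    `eth` is assumed to expose get_balance(address), like web3.py's w3.eth.
    Errors from the call are deliberately not caught here.
    """
    return eth.get_balance(address) >= THRESHOLD_WEI


class LargeBalanceTest(unittest.TestCase):
    def test_fixed_balance_above_threshold(self):
        # The mock returns a fixed balance, so no network access is needed.
        eth = Mock()
        eth.get_balance.return_value = 150 * 10**18
        self.assertTrue(is_large_balance(eth, "0xabc"))
        eth.get_balance.assert_called_once_with("0xabc")

    def test_rpc_failure_propagates(self):
        # Edge condition: the external call raises an exception.
        eth = Mock()
        eth.get_balance.side_effect = ConnectionError("RPC unavailable")
        with self.assertRaises(ConnectionError):
            is_large_balance(eth, "0xabc")
```

Saved in a test module, these cases can be run with python -m unittest. The second test shows the kind of edge condition that is hard to reproduce against a live node but trivial to force with a mock.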

Time-based tests should avoid datetime.now() and use block timestamps instead. This keeps test results consistent no matter when the test is run.
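One way to follow this advice is to pass the block timestamp into the logic as a parameter, so the test fully controls "now". The sketch below assumes a hypothetical dormancy rule (30 days) and a hypothetical function name; timestamps are Unix seconds, as block timestamps are.

```python
DORMANCY_SECONDS = 30 * 24 * 60 * 60  # hypothetical rule: 30 days of inactivity


def is_dormant(last_active_timestamp, block_timestamp):
    """Return True if the account has been inactive for at least 30 days.

    Both arguments are Unix timestamps in seconds. The current time comes
    from the block being processed, never from the wall clock.
    """
    return block_timestamp - last_active_timestamp >= DORMANCY_SECONDS


# A fixed block timestamp makes the outcome identical on every run.
block_timestamp = 1_700_000_000
assert is_dormant(block_timestamp - DORMANCY_SECONDS, block_timestamp)
assert not is_dormant(block_timestamp - 60, block_timestamp)
```

Had the function called datetime.now() internally, the first assertion would depend on the day the test happened to run.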

Certain functionality may differ between local execution and network execution. For instance, in a local test, bot state may persist to a file, whereas on the network, a persistence service may be used. Another example is a bot implemented with the long-running task pattern. Here, a test against a block range may take a long time, so for the local test one may want the bot to run in synchronous mode. A best practice is to use the environment property NODE_ENV to distinguish the execution environment (production vs. local node/dev environment) and modify the bot behavior slightly. An example of such a toggle can be found in the attack simulation bot.
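A toggle of this kind might look like the following sketch. It assumes that NODE_ENV is set to "production" when the bot runs on the network; the function names and the backend labels are hypothetical and only stand in for real persistence code.

```python
import os


def is_production(environ=None):
    """Return True when NODE_ENV marks a production (network) run.

    Assumption: the network image sets NODE_ENV=production. `environ` can be
    passed explicitly so tests do not depend on the real environment.
    """
    env = os.environ if environ is None else environ
    return env.get("NODE_ENV") == "production"


def state_backend(environ=None):
    """Pick where bot state is stored (labels are hypothetical)."""
    return "persistence-service" if is_production(environ) else "local-file"


def run_mode(environ=None):
    """Long-running tasks run asynchronously only in production."""
    return "async" if is_production(environ) else "sync"
```

Keeping the environment check in one small function means the rest of the bot never reads NODE_ENV directly, which makes both branches easy to exercise in unit tests.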

It is also recommended to test against historical data/actual mined transactions. This allows one to assess whether the bot triggers on the transaction it should and doesn’t trigger on the transaction it shouldn’t. This can happen in two ways:

1. Via unit tests. This ensures that the bot will always function correctly against known true positives and true negatives. For instance, the large transfer bot unit test checks whether a finding is emitted for the Gera coin attack.

2. Via the npm run tx tx_hash command, which runs the bot against a specific mined transaction. This also helps to ensure that the mock objects are consistent with the real objects (e.g. it exposes a mock object that returns an int where the real object returns a boolean value).

When testing against complex transactions and more comprehensive blockchain state, a unit test would require mocking out a large number of objects. In those cases, one can emulate the entire blockchain state with Ganache. Ganache allows you to create a local copy of the blockchain, deploy your smart contracts to it, and perform any transactions there. One can execute transactions that interact with existing smart contracts without worrying about transaction costs. This can also be very useful if you need to brute-force some parameters when calling functions. The attack simulation bot provides an example of this functionality.

The last step of local testing is to run the bot against the live chain. It is best to redirect the output to a file, using npm run start > output.txt, to facilitate later analysis. This gives the bot developer an initial sense of the bot's precision (how noisy it is) and recall (whether it identifies the behavior it is designed to detect). If the bot triggers many times during this test, it may be too noisy.

Performance Testing

Bots are required to process transactions quickly in order to keep up with the block time of the chain. On Ethereum, a block is mined every 12 seconds and contains an average of ~150 transactions. A bot that needs more than 80ms per transaction (12,000ms / 150 tx) would therefore fall behind on Ethereum. A queue absorbs temporary spikes, but if the queue is permanently backed up, transactions are eventually dropped and the bot loses the chance to alert on them. On faster chains, the time budget is even tighter (e.g. on BSC, it is about 40ms).
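The budget calculation above can be written as a small helper. The function name is hypothetical; the Ethereum inputs are the figures quoted in this post, and other chains would need their own block time and average transaction count.

```python
def per_tx_budget_ms(block_time_s, avg_tx_per_block):
    """Return the average time, in milliseconds, a bot may spend per
    transaction without falling behind the chain."""
    return block_time_s * 1000 / avg_tx_per_block


# Ethereum figures from the text: 12 s blocks, ~150 transactions per block.
ethereum_budget_ms = per_tx_budget_ms(12, 150)
print(ethereum_budget_ms)  # 80.0
```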

While the hardware of scan nodes can differ widely, a local performance test can help identify and optimize some initial bottlenecks.

As such, it is recommended to add a unit test that ensures the processing time for a set of average transactions stays below the calculated threshold. This should be done without mock objects, as one wants to take into account the latency introduced by any external calls. Because bots usually don't process each transaction in depth, but rather return early on some quickly assessed condition (e.g. the to_address isn't in a set), it is recommended to send a mix of transactions to the bot, measure the average processing time, and fail the test if the average exceeds the threshold. An example can be found in the large transfer bot.
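The shape of such a test is sketched below. Everything here is illustrative: the handler, the monitored address set, and the transaction dicts are hypothetical stand-ins. In a real test, the handler would be the bot's own and the transactions would be real mined ones, so that the latency of external calls is included in the measurement.

```python
import time

BUDGET_MS = 80  # Ethereum per-transaction budget from the calculation above
MONITORED = {"0xaaa"}  # hypothetical set of addresses the bot cares about


def handle_transaction(tx):
    """Hypothetical handler: return early for uninteresting transactions."""
    if tx["to"] not in MONITORED:
        return []
    return [{"alert": "monitored-address", "tx": tx["hash"]}]


def average_processing_ms(handler, transactions):
    """Run the handler over all transactions and return the mean time in ms."""
    start = time.perf_counter()
    for tx in transactions:
        handler(tx)
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000 / len(transactions)


# A mix of transactions: one in ten hits the slower, in-depth path.
transactions = [
    {"hash": f"0x{i:x}", "to": "0xaaa" if i % 10 == 0 else "0xbbb"}
    for i in range(1000)
]

avg_ms = average_processing_ms(handle_transaction, transactions)
assert avg_ms < BUDGET_MS, f"average {avg_ms:.2f}ms exceeds {BUDGET_MS}ms"
```

Measuring the average over a mix, rather than a single transaction, reflects how the bot behaves on a typical block where most transactions take the early-return path.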

Logging

Prior to deploying the bot onto the network, it is recommended to enable logging and instrument your code with relevant logging statements at various levels of granularity. It is also recommended to emit INFO alerts on production bots, so those can be reviewed and assessed at a later point in time.

Logging is enabled in the Dockerfile with the following property:

LABEL "network.forta.settings.agent-logs.enable"="true"

Further, enabling the debug option in forta.config.json adds debug information to the logs. It is enabled through the property:

"debug": true

Logs can be accessed using the npm run log command or through the Forta app’s bot stats page. Make it a habit to review logs after bot deployment to ensure the bot is running.

Logging isn’t just helpful in a production environment, but also when testing locally. Often, one may take an alert raised in production and run the underlying transaction against the bot in the local environment. Depending on the logic of one's code, it may be helpful to include particular identifiers in each log statement, so the log can be grepped for that value to understand exactly what is happening. This is particularly useful for multi-threaded bots. For instance, if a bot processes multiple addresses at the same time, it may want to output the address with every single log statement.
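With Python's standard logging module, one way to attach an identifier to every statement is a LoggerAdapter. This is a sketch, not the approach of any particular Forta bot; the logger name, the prefix format, and analyze_address are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("bot")


class AddressLogger(logging.LoggerAdapter):
    """Prefix every message with the address currently being processed."""

    def process(self, msg, kwargs):
        return f"[address={self.extra['address']}] {msg}", kwargs


def analyze_address(address):
    """Hypothetical unit of work that logs its progress."""
    log = AddressLogger(logger, {"address": address})
    log.info("starting analysis")
    # ... the actual analysis would happen here ...
    log.info("analysis complete")


analyze_address("0xabc")
```

Because each line carries the same [address=...] prefix, the interleaved output of concurrent workers can be filtered with a single grep for that address.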

Error Handling and Bot Health Page

The bot health page provides additional insights into the health of the bot on the network. Latency, for instance, measures how quickly a bot processes a transaction or block. This gives one real-world latency metrics, which may differ from the latency measured in local tests.

Further, the error count shows when a bot has encountered an unrecoverable error. To expose such an error condition, the handlers for transactions, blocks, and alerts should not catch and merely log the error, but rather throw it, so it can be counted on that page. For local testing, one may want to treat errors differently through the NODE_ENV toggle described above.
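The following sketch shows one way to combine both ideas: log the error everywhere, but let it propagate only in production so that it is counted. It assumes NODE_ENV is "production" on the network; analyze and the handler signature with an explicit environ argument are hypothetical and exist only to make the example testable.

```python
import logging
import os

logger = logging.getLogger("bot")


def analyze(tx):
    """Hypothetical analysis step that fails on malformed input."""
    if "to" not in tx:
        raise ValueError("transaction has no 'to' field")
    return []


def handle_transaction(tx, environ=None):
    """Return findings; surface unrecoverable errors in production."""
    env = os.environ if environ is None else environ
    try:
        return analyze(tx)
    except Exception:
        logger.exception("failed to process tx %s", tx.get("hash"))
        if env.get("NODE_ENV") == "production":
            # Re-raise so the error propagates out of the handler and
            # can be counted, instead of being silently swallowed.
            raise
        # Local runs keep going so a long block-range test is not aborted.
        return []
```

The key point is the bare raise in the production branch: logging alone would hide the failure from the error count.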

Hopefully, this blog post has outlined a few recommendations that will help you develop reliable detection bots that secure web3 and prevent the next hack. For any feedback on this post and additional suggestions, please join the Forta Discord.
