Diversity of Data as a Strength

Article by Forta Network Oct. 12, 2023

Forta is the largest network of security intel in Web3. The decentralized Forta Network leverages machine learning to detect exploits, scams and other threats. 


Diversity in security isn’t just a lofty ideal; it’s a tested and proven method for robust threat detection. A recent data-driven exploration of the Forta Network bears this out. Combining the output of Forta’s diverse detection bots with a Snorkel-based weak labeling model achieved an F1 score of 0.57, ahead of the centralized threat detection approaches it was compared against.

Vitalik Buterin has long championed diversity in the context of node software. Ethereum’s multi-client support, spurred by his advocacy for diversity, has made it more resilient against specific types of attacks. Drawing a parallel to threat detection, the principle is clear: a room filled with experts from diverse backgrounds stands a far better chance of identifying potential threats than a room of experts who all think alike.

Enter Forta, a protocol that embodies this very principle. Unlike traditional centralized detection systems, Forta’s decentralized and permissionless structure invites anyone to launch detection bots. Each bot offers its own lens on threats, which gives the network many detection layers and, ultimately, stronger defenses.

Consider phishing, a prevalent scam. Within Forta, seven distinct bots scan on-chain activity in real time, each tailored to detect this threat. These bots, built by everyone from reputable security firms to solo community contributors, use varying techniques. Some harness cutting-edge machine learning, while others employ heuristic models; certain bots analyze all on-chain transactions, while others specialize in detecting phishing that targets specific tokens.

Yet the overarching question looms: does a decentralized threat detection system outperform its centralized counterpart? To test this hypothesis, the analysis turns to empirical data, examining precision, recall and F1 scores.

The benchmark juxtaposes Forta’s alert data with a dataset curated by Etherscan. The individual phishing bots from Forta represent the centralized paradigm. For the decentralized comparison, three collective methodologies are considered:

A simple UNION condition.
A voting mechanism.
A weak labeling model, deriving insights from bot correlations (Note: this operates entirely unsupervised).
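
To make the first two methodologies concrete, here is a minimal sketch in Python. The bot names, addresses and the two-vote threshold are hypothetical and chosen only for illustration; they are not taken from the Forta analysis. The weak labeling model is not shown, since it requires fitting a model over the bots' correlated outputs.

```python
from collections import Counter

# Hypothetical alert data: bot name -> set of addresses it flagged as phishing.
alerts_by_bot = {
    "bot_a": {"0x01", "0x02", "0x03"},
    "bot_b": {"0x02", "0x03", "0x04"},
    "bot_c": {"0x03", "0x05"},
}

def union_flags(alerts):
    """UNION condition: flag an address if at least one bot flagged it."""
    flagged = set()
    for addresses in alerts.values():
        flagged |= addresses
    return flagged

def vote_flags(alerts, min_votes=2):
    """Voting: flag an address if at least `min_votes` bots flagged it.

    The threshold of 2 is an assumption made for this sketch.
    """
    votes = Counter(addr for addresses in alerts.values() for addr in addresses)
    return {addr for addr, count in votes.items() if count >= min_votes}

union_result = union_flags(alerts_by_bot)
vote_result = vote_flags(alerts_by_bot, min_votes=2)
print(sorted(union_result))
print(sorted(vote_result))
```

The UNION condition accepts every alert from every bot, while voting only keeps addresses that several bots agree on, which is why the two are expected to sit at different points on the precision/recall trade-off.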

The data, illustrated in a heatmap below (each cell reflecting the log-scale count of addresses from intersecting bot/alertIDs), showcases the blend of overlapping and unique detections, underscoring the network’s diverse detection strategies.
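
The values behind such a heatmap could be computed roughly as follows. This is a sketch under stated assumptions: the bot names and addresses are hypothetical, and the base-10 logarithm with a +1 offset (so that empty intersections map to zero) is one plausible reading of "log-scale count", not necessarily the transformation used in the original analysis.

```python
import math
from itertools import product

# Hypothetical alert data: bot name -> set of addresses it flagged.
alerts_by_bot = {
    "bot_a": {"0x01", "0x02", "0x03"},
    "bot_b": {"0x02", "0x03", "0x04"},
    "bot_c": {"0x03", "0x05"},
}

def overlap_matrix(alerts):
    """Return {(bot_x, bot_y): log10(1 + number of shared addresses)}.

    The +1 offset and the base-10 logarithm are assumptions of this sketch.
    """
    bots = sorted(alerts)
    return {
        (x, y): math.log10(1 + len(alerts[x] & alerts[y]))
        for x, y in product(bots, repeat=2)
    }

matrix = overlap_matrix(alerts_by_bot)
for (x, y), value in sorted(matrix.items()):
    print(f"{x} / {y}: {value:.2f}")
```

Diagonal cells reflect how many addresses a bot flagged in total, and off-diagonal cells show how much two bots overlap. Low off-diagonal values next to high diagonal values are the signature of bots that each catch something the others miss.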

The precision/recall graph further corroborates this hypothesis. Decentralized methods (depicted in red) gravitate towards the coveted top-right quadrant, outshining the centralized tactics denoted by individual bots (in blue). Though the UNION strategy enhanced recall, it compromised precision. The Snorkel-based weak labeling model, however, took the lead with the highest F1 score of 0.57, surpassing both the UNION method (0.56) and Blocksec’s standalone bot (0.55).
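
For readers who want to reproduce this kind of comparison, the metrics can be computed as sketched below. The F1 score is the harmonic mean of precision and recall, F1 = 2 * precision * recall / (precision + recall). The predicted and labeled address sets here are hypothetical toy data, not the Forta or Etherscan datasets.

```python
def precision_recall_f1(predicted, actual):
    """Compute precision, recall and F1 for a set of flagged addresses.

    `predicted` is the set of addresses a method flagged;
    `actual` is the set of addresses labeled as phishing in the benchmark.
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical benchmark labels and method outputs.
labeled_phishing = {"0x02", "0x03", "0x05"}
union_predicted = {"0x01", "0x02", "0x03", "0x04", "0x05"}
vote_predicted = {"0x02", "0x03"}

union_scores = precision_recall_f1(union_predicted, labeled_phishing)
vote_scores = precision_recall_f1(vote_predicted, labeled_phishing)
print("union:", union_scores)
print("vote: ", vote_scores)
```

In this toy data the union catches every labeled address at the cost of two false positives, while voting raises precision but misses one address, mirroring the recall-versus-precision trade-off described above.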

The presented analysis affirms Forta’s decentralized prowess. A myriad of bots, each with its distinctive lens, collaboratively weave a robust defense that challenges traditional centralized models.

In light of these findings, join the Forta community and start developing on Forta. Even if you believe that your bot might be similar to existing ones, remember: It’s the diverse nuances that make all the difference. By contributing to Forta, you are not just deploying a bot; you’re fortifying a revolution in threat detection.

Come build, and help make Forta’s threat detection even stronger and Web3 safer.