Predicting Phishing Scams: A Kaggle Competition

Article by Forta Network Sep. 8, 2022

The Web3 ecosystem is plagued by scams and exploits. In 2021 alone, billions of dollars were lost, further eroding trust in Web3. As an industry, it is time to do better. Forta stands at the center of securing Web3 through real-time monitoring of on-chain transactions for security threats. To help combat one of the worst offenders, ice phishing attacks, Forta is launching a competition with Kaggle.

Over the last year, Forta’s amazing community has deployed more than 1,000 detection bots. Because the community is large and decentralized, it has developed diverse approaches to identifying attack behavior. Today, this community of security researchers and developers applies its expertise to building detection bots that monitor the blockchain. While this group is essential to detecting attack behavior, it is also important to invite the greater machine learning and data science community to help secure Web3.

Today, machine learning is already an essential part of securing Web2, with its reach spanning from malware to email phishing protection all the way to endpoint threat detection. However, machine learning is not a replacement for expert knowledge but rather a complement to it. That said, it can offer higher recall and precision, less bias, and the ability to utilize more information than any individual to make fine-grained decisions.

Due to the proprietary nature of the underlying data, machine learning security models in Web2 are mostly built in-house by the large security companies. This can stifle innovation because it restricts the data science community from contributing, limiting open-source research to toy datasets and heavily anonymized or sampled data. Fortunately, this is not the case in Web3, where all important data is publicly available, opening the door for amazing research opportunities.

Today, Forta is inviting the broader data science community to participate in a community Kaggle competition and apply innovative data science and machine learning techniques to identify the scams and phishing attacks that plague Web3. The dataset used for this competition is derived from public blockchain data but curated to expose the most threat-relevant information, allowing data scientists to focus on innovating as opposed to spending time on data sourcing and curation.

This competition asks the Kaggle community to build a machine learning model that identifies phishing attacks based on on-chain transaction data. In a blockchain ice phishing attack, the attacker tricks users into disclosing their private key or into signing a special transaction that allows the attacker to move the user’s digital assets. The dataset consists of a set of attacker accounts known to have engaged in phishing attacks along with a sample of benign accounts. The task is to predict whether an unknown account is engaged in a phishing attack.
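To make the task concrete, here is a minimal sketch of how raw transactions might be aggregated into one feature row per account before training a binary classifier. The field names ("sender", "receiver", "value"), the addresses, and the chosen features are hypothetical illustrations, not the competition's actual schema or a recommended feature set.

```python
from collections import defaultdict


def account_features(transactions):
    """Aggregate raw transactions into one feature row per account.

    Each transaction is a dict with the hypothetical keys
    "sender", "receiver", and "value".
    """
    stats = defaultdict(
        lambda: {"tx_out": 0, "tx_in": 0, "value_in": 0.0, "counterparties": set()}
    )
    for tx in transactions:
        sender, receiver = tx["sender"], tx["receiver"]
        # Outgoing side of the transfer.
        stats[sender]["tx_out"] += 1
        stats[sender]["counterparties"].add(receiver)
        # Incoming side of the transfer.
        stats[receiver]["tx_in"] += 1
        stats[receiver]["value_in"] += tx["value"]
        stats[receiver]["counterparties"].add(sender)
    # Replace the counterparty set with its size so each row is purely numeric.
    return {
        account: {
            "tx_out": s["tx_out"],
            "tx_in": s["tx_in"],
            "value_in": s["value_in"],
            "unique_counterparties": len(s["counterparties"]),
        }
        for account, s in stats.items()
    }


# Toy data, invented for illustration only.
transactions = [
    {"sender": "0xvictim1", "receiver": "0xattacker", "value": 5.0},
    {"sender": "0xvictim2", "receiver": "0xattacker", "value": 3.0},
    {"sender": "0xattacker", "receiver": "0xother", "value": 8.0},
]
features = account_features(transactions)
print(features["0xattacker"])
```

Rows like these, paired with the known attacker or benign label for each account, could then be passed to any standard classifier.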

The provided dataset is extremely rich. It is time-sliced into train and test sets with no overlap, and it contains:

all the transactions from the accounts
all the events, traces, and logs that were emitted by those transactions
first-degree neighbors of all the accounts
additional statistics for the account’s first-degree neighbors
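The exact procedure used to time-slice the competition data is not described here. As a rough sketch of the idea, the following assumes each account carries a hypothetical "first_seen" timestamp and assigns accounts seen before a cutoff to the training set and all later accounts to the test set, so that no account appears in both.

```python
def time_split(accounts, cutoff):
    """Split accounts by time: before the cutoff goes to train, the rest to test.

    Each account is a dict with the hypothetical keys
    "address", "first_seen", and "label".
    """
    train = [a for a in accounts if a["first_seen"] < cutoff]
    test = [a for a in accounts if a["first_seen"] >= cutoff]
    return train, test


# Toy data, invented for illustration only (label 1 = attacker, 0 = benign).
accounts = [
    {"address": "0xa", "first_seen": 100, "label": 1},
    {"address": "0xb", "first_seen": 200, "label": 0},
    {"address": "0xc", "first_seen": 300, "label": 1},
    {"address": "0xd", "first_seen": 400, "label": 0},
]
train, test = time_split(accounts, cutoff=300)

train_addresses = {a["address"] for a in train}
test_addresses = {a["address"] for a in test}

# Every account lands on exactly one side of the cutoff.
print(train_addresses.isdisjoint(test_addresses))
```

Splitting by time rather than at random mimics the real setting, where a model trained on past attacks has to flag accounts it has never seen.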

The competition is hosted on Kaggle and will be open until November 30, 2022. Winners will receive a unique Forta POAP and will be featured in an upcoming Forta blog post. To discuss the competition, please join the discussion board or the machine learning channel on Forta’s Discord.