Meet DarkBERT, the dark web-trained AI tool that can combat cybersecurity threats


Large Language Models (LLMs) have gained massive popularity over the past few months, especially since the emergence of AI chatbots like ChatGPT. These AI-powered models learn patterns from an existing body of data and use them to generate new and unique content such as text, images, and audio. While most such tools are used for generative AI, researchers have now developed a first-of-its-kind LLM for assessing and combating cybersecurity threats. Notably, this model was trained only on information found on the dark web.

What is DarkBERT?

DarkBERT is an encoder model based on the RoBERTa architecture, which relies on transformers. Instead of training it on the surface web, researchers trained this LLM on a vast dataset of dark web pages, drawing on sources such as hacker forums, scam websites, and other criminal corners of the internet. In a paper titled ‘DarkBERT: A Language Model for the Dark Side of the Internet’, published on arXiv and not yet peer-reviewed, its creators say that DarkBERT can revolutionize the fight against cybercrime by finding and analyzing the elusive parts of the internet that remain hidden from search engines.
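Like other RoBERTa-style encoders, DarkBERT is pretrained with a masked-language-modelling objective: a fraction of tokens is hidden and the model learns to reconstruct them from context. The masking step can be sketched in pure Python as follows; the 15% masking rate and the 80/10/10 split follow the standard BERT/RoBERTa recipe, while the example sentence and tiny vocabulary are invented for illustration:

```python
import random

MASK, RATE = "<mask>", 0.15

def mask_tokens(tokens, vocab, rng):
    """Select ~15% of positions for prediction; of those, replace 80%
    with <mask>, 10% with a random vocabulary token, and leave 10%
    unchanged. Unselected positions get a label of None (not scored)."""
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < RATE:
            labels.append(tok)          # the model must predict the original
            r = rng.random()
            if r < 0.8:
                masked.append(MASK)
            elif r < 0.9:
                masked.append(rng.choice(vocab))
            else:
                masked.append(tok)
        else:
            labels.append(None)         # position is not scored
            masked.append(tok)
    return masked, labels

rng = random.Random(0)
tokens = "credentials dumped on the forum last night".split()
vocab = ["forum", "leak", "market", "night", "data"]
masked, labels = mask_tokens(tokens, vocab, rng)
```

During pretraining, the encoder's loss is computed only at the labelled positions, which is what lets it learn domain-specific vocabulary from unlabelled dark web text.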

While the dark web is usually concealed and inaccessible to the general public, the researchers used the Tor network to access and collect data from its pages. That data then underwent several processing steps, including deduplication, category balancing, and pre-processing, to create a refined dark web corpus, which was then used to further train RoBERTa, producing DarkBERT over a period of 15 days.
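The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: the hash-based exact-duplicate check, the page categories, and the per-category cap are all assumptions made for the example.

```python
import hashlib
import random

def deduplicate(pages):
    """Drop pages whose text is an exact duplicate, using SHA-256 digests."""
    seen, unique = set(), []
    for page in pages:
        digest = hashlib.sha256(page["text"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique

def balance(pages, per_category, rng):
    """Cap each category at `per_category` pages so no category dominates."""
    by_cat = {}
    for page in pages:
        by_cat.setdefault(page["category"], []).append(page)
    balanced = []
    for items in by_cat.values():
        rng.shuffle(items)
        balanced.extend(items[:per_category])
    return balanced

corpus = [
    {"text": "ransomware builder for sale", "category": "market"},
    {"text": "ransomware builder for sale", "category": "market"},  # duplicate
    {"text": "fresh credential dump", "category": "leak"},
    {"text": "forum rules and FAQ", "category": "forum"},
]
clean = balance(deduplicate(corpus), per_category=1, rng=random.Random(0))
```

Balancing matters because a handful of page types (for example, marketplaces) can dominate a raw crawl and skew what the model learns.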

Cybersecurity applications

Since it is trained on a dataset of dark web pages, DarkBERT has the potential for a wide range of cybersecurity applications. It can help monitor illicit activities and bolster cybersecurity measures. According to the research paper, it can also “combat the extreme lexical and structural diversity of the Dark Web that may be detrimental to building a proper representation of the domain.”

It can automate the monitoring of dark web forums where unlawful information is commonly shared, and it can detect websites involved in leaking sensitive or confidential data or selling ransomware.

Lastly, it uses the fill-mask function of BERT-family language models to detect and filter out phrases linked to criminal activities, which can help identify and tackle new cyber threats.
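In a fill-mask setup, the model proposes likely tokens for a hidden slot in a sentence, and those predictions can then be compared against known threat terms. A minimal sketch of that filtering step is below; the prediction list mimics the `[{'token_str': ..., 'score': ...}]` output shape of Hugging Face's fill-mask pipeline, but the scores, sentence, and watchlist are invented for illustration:

```python
def flag_threat_tokens(predictions, watchlist, min_score=0.05):
    """Return predicted tokens that exceed a confidence threshold
    and appear on a watchlist of known threat-related terms."""
    return [
        p["token_str"]
        for p in predictions
        if p["score"] >= min_score and p["token_str"].lower() in watchlist
    ]

# Hypothetical model output for "new <mask> dump posted on the forum":
predictions = [
    {"token_str": "data", "score": 0.31},
    {"token_str": "credential", "score": 0.22},
    {"token_str": "file", "score": 0.12},
    {"token_str": "ransomware", "score": 0.03},  # below the threshold
]
watchlist = {"credential", "ransomware", "exploit", "database"}
flagged = flag_threat_tokens(predictions, watchlist)  # -> ["credential"]
```

Because the predictions come from a model trained on dark web text, the idea is that it surfaces jargon and coded terms a general-purpose model would miss.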
