Techno Blender
Digitally Yours.

Stack Overflow joins Reddit and Twitter in charging AI companies for training data

The bill for companies specializing in artificial intelligence continues to grow: Stack Overflow joins Reddit and Twitter as another platform planning to start charging AI companies that want to use its data for training. 

AI models like those behind ChatGPT, Google Bard, and Bing Chat all require massive datasets for training. The companies that build them, such as OpenAI and Google, gather data from across the internet to train their large language models (LLMs), adjusting the models' parameters until they perform well at natural language processing (NLP).

This training data spans subjects from world history to software development, which give a model its "intelligence," along with the grammar, speech nuances, and conversational styles it needs to generate human-like responses.

According to reporting from Wired, Stack Overflow could begin charging AI companies this summer for access to its forum's more than 50 million questions and answers for use in training AI models.

Stack Overflow is a programming forum that offers a collaborative environment to its users, who are mostly developers. It's a popular place for programmers to ask about coding problems and programming languages, and it serves as a learning resource for its more than 20 million users.

In a recent post on the company's site, Stack Overflow CEO Prashanth Chandrasekar explained that “allowing AI models to train on the data developers have created over the years, but not sharing the data and learnings from those models with the public in return, would lead to a tragedy of the commons.”

The forum made headlines last fall for banning posts created with ChatGPT-generated text, deeming the practice “harmful” to the site and its users. “Unless we all continue contributing knowledge back to a shared, public platform, we risk a world in which knowledge is centralized inside the black box of AI models that require users to pay in order to access their services,” Chandrasekar added in a separate post.
