Techno Blender
Digitally Yours.

Google’s AI is Not a Pro in Data Labeling! But the Company Fails to Admit It


One of the biggest problems facing the AI industry: rubbish, exploitative data-labeling practices

A study published by Surge AI highlights one of the biggest problems facing the AI industry: rubbish, exploitative data-labeling practices. Google built a dataset called “GoEmotions”, the largest manually annotated dataset of its kind: 58k English Reddit comments, each labeled with one or more of 27 emotion categories or Neutral. The GoEmotions dataset consists of comments from Reddit users annotated with labels describing their emotional coloring. According to Surge AI, a whopping 30% of the dataset is severely mislabeled.

According to Google: “In ‘GoEmotions: A Dataset of Fine-Grained Emotions’, we describe GoEmotions, a human-annotated dataset of 58k Reddit comments extracted from popular English-language subreddits and labeled with 27 emotion categories. As the largest fully annotated English-language fine-grained emotion dataset to date, we designed the GoEmotions taxonomy with both psychology and data applicability in mind.” The dataset is designed to train neural networks to perform deep analysis of the tonality of texts.
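For readers who want to see what “labeled for 27 emotion categories or Neutral” looks like in practice, here is a minimal sketch of the dataset’s multi-label row format. The three sample rows are invented for illustration; the 28-entry label list reflects the published GoEmotions taxonomy, but verify it against the official release before relying on it.

```python
from collections import Counter

# The 27 GoEmotions emotion categories plus "neutral" (28 labels total).
GOEMOTIONS_LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]

# Hypothetical rows in the dataset's format: each Reddit comment carries
# one or more numeric label ids pointing into the taxonomy above.
sample_rows = [
    {"text": "That looks amazing, congrats!", "labels": [0]},      # admiration
    {"text": "lol this sub never disappoints", "labels": [1]},     # amusement
    {"text": "I guess that's one way to do it.", "labels": [27]},  # neutral
]

def label_names(row):
    """Map a row's numeric label ids back to readable label names."""
    return [GOEMOTIONS_LABELS[i] for i in row["labels"]]

counts = Counter(name for row in sample_rows for name in label_names(row))
print(counts)
```

Because each comment can carry several labels at once, any audit of the dataset has to judge every assigned label, not just one per comment.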

Google’s data-labeling practices:

Surge AI took a look at a sample of 1,000 labeled comments from the GoEmotions dataset and found that a significant portion of them were mislabeled. This kind of data can’t be properly labeled. It’s not that the particular labelers didn’t do a good job, it’s that they were given an impossible task. This particular kind of AI development is a grift. It’s a scam. And it’s one of the oldest in the book.
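The 30% figure comes from exactly this kind of audit: re-label a random sample by hand and extrapolate to the full dataset. A back-of-the-envelope sketch of that extrapolation, with an illustrative mislabel count rather than Surge AI’s raw numbers:

```python
import math

def mislabel_estimate(sample_size, mislabeled, z=1.96):
    """Estimate a dataset-wide mislabel rate from an audited random sample,
    with a normal-approximation 95% confidence interval."""
    p = mislabeled / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return p, (p - half_width, p + half_width)

# Illustrative numbers: 308 of 1,000 audited comments judged mislabeled.
rate, (low, high) = mislabel_estimate(1000, 308)
print(f"{rate:.1%} mislabeled, 95% CI ({low:.1%}, {high:.1%})")
```

Even with a sample of only 1,000 out of 58k comments, the interval stays within a few percentage points of the point estimate, which is why a sample audit can credibly indict the whole dataset.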

Google used data labelers unfamiliar with US English and US culture, despite Reddit being a US-centric site with particularly specialized memes and jargon. In Surge AI’s words: “When we relabeled the dataset, our technical infrastructure and human-AI algorithms allowed us to leverage our labeling marketplace to build a team of Surgers who aren’t only native US English speakers, but also heavy Reddit and social media users who understand all of Reddit’s in-jokes and the nuances of US politics.”

The researchers took an impossible problem, how to determine human emotion in text at massive scale without context, and used the magic of bullshit to turn it into a relatively simple one that any AI can tackle: how to match keywords to labels. The reason it’s a grift is that you don’t need AI to match keywords to labels.
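That point can be made concrete in a few lines: keyword-to-label matching is plain dictionary lookup, no neural network required. The keyword lists below are invented for illustration, not taken from any real emotion lexicon:

```python
import re

# A trivial keyword-to-label matcher: the kind of mapping the article argues
# a context-free, mislabeled dataset reduces emotion "AI" to.
KEYWORD_LABELS = {
    "gratitude": {"thanks", "thank", "grateful"},
    "anger": {"furious", "hate", "angry"},
    "joy": {"happy", "glad", "yay"},
}

def match_labels(text):
    """Return every label whose keyword set intersects the text's words."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(label for label, kws in KEYWORD_LABELS.items() if words & kws)

print(match_labels("Thanks, I'm so happy right now"))  # ['gratitude', 'joy']
```

This matcher is exactly as blind to sarcasm, in-jokes, and cultural context as the article says keyword matching must be, which is the problem: “Thanks a lot” from an annoyed Redditor still matches “gratitude”.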

If the AI’s output can be used to affect human outcomes, like surfacing every resume in a stack that registers “positive sentiment”, we have to assume that some of the documents it didn’t surface were unjustly suppressed. It is our position here at Neural that it is altogether unethical to train an AI on human-made content without the expressed individual consent of the people who made it. Furthermore, it is likewise our position that it is unethical to deploy AI models trained on such data.

Google’s scientists know that a conventional “keyword search and match” algorithm can’t turn an AI model into a human-level expert in psychology, sociology, pop culture, and semantics, especially when they feed it a dataset full of haphazardly mislabeled Reddit posts. No amount of talent and technology can turn a bag full of baloney into a useful AI model when human outcomes are at stake.






Denial of responsibility! Techno Blender is an automatic aggregator of the all world’s media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials, please contact us by email – [email protected]. The content will be deleted within 24 hours.
