
Google’s Gemini 1.5 Pro LLM dives deep into oceans of video and audio

Just last week, Google unveiled its new AI chatbot lineup, featuring Gemini Advanced—its best bot, based on its most powerful large language model, Gemini 1.0 Ultra. But Gemini 1.0 Ultra’s reign as the company’s flagship LLM could turn out to be brief.

Today the company is announcing Gemini 1.5 Pro, an update to its middle-tier LLM. It says the improvements result in an LLM in the same zip code, power-wise, as Gemini 1.0 Ultra. And in a briefing for reporters on Wednesday, Google DeepMind principal scientist Oriol Vinyals showed off videos of Gemini 1.5 Pro performing some pretty spectacular feats of AI.

According to Google, Gemini 1.5 Pro punches above its weight in part because it’s engineered for efficiency, both when it’s being trained and when it’s generating content. It can also handle more tokens—the data points an LLM divides a piece of content into to process it. Gemini 1.0 could deal with 32,000 tokens at a time. By default, Gemini 1.5 has a capacity of 128,000 tokens, the same as OpenAI’s GPT-4 Turbo model. But Google will let some customers try a version with a capacity of 1 million tokens, and says it’s tested the LLM with 10 million tokens.
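
For developers who want a feel for what those limits mean, token counts are something you can check before sending a prompt at all. Below is a minimal sketch, assuming access to Google's google-generativeai Python SDK; the API key, file name, and the gemini-1.5-pro-latest model identifier are placeholders and assumptions, and the larger context windows were only rolling out to select customers.

```python
# Minimal sketch (not from Google's briefing): count how many tokens a document
# would consume, since that total is what has to fit in the model's context
# window (128,000 tokens by default, 1 million in the limited preview).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro-latest")  # assumed model identifier

with open("long_document.txt", encoding="utf-8") as f:  # hypothetical input file
    text = f.read()

# count_tokens reports the token total the model would charge against its window.
print(model.count_tokens(text).total_tokens)
```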

Those of us who aren’t AI scientists may have trouble getting our heads around those numbers. For the 1 million-token version of Gemini 1.5 Pro, they translate into an hour of video, 11 hours of audio, more than 700,000 words of text, or 30,000 lines of programming code—all of which help Gemini 1.5 deal with inputs that are way more complex than a typical typed-in prompt or photo of your cat.

[Image: Google DeepMind]

During its press briefing, for instance, Google showed a video in which it fed more than 400 pages of transcribed air-to-ground audio from the Apollo 11 moon landing to Gemini 1.5 Pro, which divvied it up into 326,678 tokens. That allowed the LLM to ace the request “Find 3 funny moments. Make a list with only the quotes.” When Google gave the LLM a scrawled drawing of an astronaut’s boot taking a step, Gemini 1.5 Pro figured out that it referenced Neil Armstrong’s iconic declaration.

In another demo, Gemini 1.5 Pro turned Buster Keaton’s 45-minute silent comedy Sherlock Jr. into 696,161 tokens. It was then able to summarize the film’s plot, answer a question about the writing on a slip of paper that appears partway through it, and pinpoint the moment represented by another hasty sketch. In a third demo, the LLM ingested a grammar guide for Kalamang—a language spoken by fewer than 200 people—and was then able to translate between it and English with human-like proficiency, according to Google.
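
Those demos follow a pattern that developers with preview access could approximate themselves: upload a long piece of media, then ask questions about it in the same request. The sketch below is an illustration under assumptions, not Google's demo code; the file name, prompt, and model identifier are hypothetical, and it relies on the File API in the google-generativeai Python SDK.

```python
# Hedged approximation of the long-context demo pattern: upload a large media
# file, then prompt Gemini 1.5 Pro about its contents. All names are illustrative.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Upload the media (e.g., a local copy of a silent film) via the File API.
movie = genai.upload_file(path="sherlock_jr.mp4")  # hypothetical local file
while movie.state.name == "PROCESSING":  # video uploads are processed asynchronously
    time.sleep(10)
    movie = genai.get_file(movie.name)

model = genai.GenerativeModel("gemini-1.5-pro-latest")  # assumed model identifier
response = model.generate_content(
    [movie, "Summarize the plot of this film in a few sentences."]
)
print(response.text)
```

Because video uploads are processed asynchronously, the sketch polls the file’s state before issuing the prompt.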

Why did the company focus on readying Gemini 1.5 Pro for deployment rather than immediately applying its new advances to its top-of-the-line Ultra version, which would theoretically result in an even more, well, Ultra LLM? The bigger an LLM, the trickier it is to train it to perform satisfactorily, which gave the midrange Pro version an advantage as a test bed for Google’s latest work.

“Very naturally, the first set of models that we trained to completion is the Pro series, which is on the smaller side compared to Ultra,” Vinyals told me during the briefing. “That’s the reason why, in general, this might become available earlier.”

For now, the Gemini 1.5 Pro LLM is in private testing with a select group of customers of Google’s Vertex AI cloud service and AI Studio software development platform. Google isn’t saying when its power might be available to more developers or—via its Gemini chatbots—mere mortals. Nor did Vinyals share anything about what Gemini 1.5 Ultra might be able to accomplish or when it could appear.

But with Google’s AI rivals also making progress at a furious clip—on Tuesday, The Information’s Aaron Holmes reported that OpenAI is developing a search engine—the company has every incentive to make its best LLM available far and wide as soon as it can.



