
Twelve Labs Launches Marengo 2.7, Introducing New Multi-Vector Approach to Video Understanding

Latest innovation yields greater than 15% improvement over previous foundation model

Twelve Labs, the video understanding company, announced Marengo 2.7, a new state-of-the-art multimodal embedding model that achieves a greater than 15% improvement over its predecessor, Marengo 2.6. Building on the success of the previous video foundation model, Marengo 2.7 represents a significant advance in multimodal video understanding: it adopts a multi-vector approach that enables more precise and comprehensive video content analysis. It is the first model of its kind to do so, and early results are striking, including 90.6% average recall in object search (a 32.6% improvement over the previous version) and 93.2% recall in speech search (2.8% higher than specialized speech-to-text systems).

Video understanding has been a notoriously difficult problem to solve. A single video clip simultaneously contains visual elements (objects, scenes, actions), temporal dynamics (motion, transitions), audio components (speech, ambient sounds, music), and often textual information (overlays, subtitles). Traditional single-vector approaches struggle to compress all of these diverse aspects into one representation without losing critical information. Marengo 2.7 upends that single-vector design.



A Novel Approach

With Marengo 2.7, Twelve Labs deploys a multi-vector representation for the first time to address the complexities inherent in video. Unlike Marengo 2.6, which compresses all information into a single embedding, Marengo 2.7 decomposes the raw inputs into multiple specialized vectors. Each vector independently captures a distinct aspect of the video content – from visual appearance and motion dynamics to OCR text and speech patterns.

For example, one vector might capture what things look like (e.g., “a man in a black shirt”), another tracks movement (e.g., “waving his hand”), and another remembers what was said (e.g., “video foundation model is fun”). This approach helps the model better understand videos that contain many different types of information, leading to more accurate video analysis across all aspects – visual, motion, and audio.
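The idea above can be sketched in a few lines of Python. This is an illustrative toy, not Twelve Labs' implementation: the aspect names, the 3-dimensional vectors, and the max-over-aspects scoring rule are all assumptions chosen to show how keeping separate per-aspect embeddings lets a query match the aspect it resembles most, rather than a single fused vector.

```python
# Toy sketch of a multi-vector video representation: one clip keeps
# separate embeddings per aspect instead of a single fused vector.
# Vector values and the scoring rule are illustrative only.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# One clip, decomposed into specialized vectors (toy 3-d embeddings).
clip = {
    "visual": [0.9, 0.1, 0.0],   # e.g., "a man in a black shirt"
    "motion": [0.0, 0.8, 0.2],   # e.g., "waving his hand"
    "speech": [0.1, 0.0, 0.9],   # e.g., "video foundation model is fun"
}

def search(query_vec, clip_vectors):
    """Score a query embedding against every aspect; return the best match."""
    scores = {aspect: cosine(query_vec, vec)
              for aspect, vec in clip_vectors.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# A query embedding that resembles the motion vector matches that aspect.
aspect, score = search([0.0, 0.9, 0.1], clip)
print(aspect)  # motion
```

With a single fused vector, the same query would be compared against an average of all three aspects, diluting the motion signal the query actually carries.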


Marengo 2.7 demonstrates particular strength in detecting small objects while maintaining exceptional performance in general text-based search tasks. This level of granular representation enables more nuanced multimodal search capabilities. Now, with Marengo 2.7, users can search complex visual scenes, find specific brand appearances, locate exact audio moments, match images to video segments, and more.

“Twelve Labs continues to push video understanding forward in unprecedented ways, turning the concept of a multi-vector approach into reality for the very first time,” said Jae Lee, CEO of Twelve Labs. “Our R&D team is laser-focused on solving what was previously considered unsolvable. Their groundbreaking work has been rigorously tested, and the model’s performance is vastly superior to anything on the market. We look forward to seeing how our customers will use this powerful technology.”


The post Twelve Labs Launches Marengo 2.7, Introducing New Multi-Vector Approach to Video Understanding first appeared on PressReleaseCC.


