
The Future of AI Indexing Technology is Multimodal

By Philippe Petitpont, co-founder and CEO
April 13, 2022

Here’s why Multimodal Artificial Intelligence is key to unlocking the full potential of your organization’s media asset library.

Pandemic workplace disruption accelerated many organizations’ migrations to cloud storage for audiovisual content. But making these assets accessible from anywhere is only step one. To truly enhance production workflows and manage the endless accumulation of content, that content also needs to be easily and accurately searchable.

The problem with ‘traditional’ AI

Transcription is the traditional AI indexing solution that TV channels, media agencies and sports rights holders consider. It’s undoubtedly more efficient than relying on a team to manually tag ingested content. However, this standard, unimodal approach to AI has limitations that can work against the large flows of content that broadcast organizations must manage. Unoptimized tagging leaves the media asset index more prone to bias and false positives in search results.

By analyzing speech alone, the AI can miss important visual context and put production and archivist teams on the back foot.

Consider searching for footage spanning several years to clarify a politician’s position on climate change, or compiling a highlight reel of Liverpool star Mohamed Salah’s chart-topping goals in the Premier League. Indexing by transcript alone can overlook key information about the location of significant events and the involvement of other key players.

This is where applying Moments Lab’s Multimodal AI rules to media sorting can make a big impact.

Enhancing the value of media assets

The technology is designed to mimic the human approach to understanding surroundings. Rather than relying on a single source for indexing, Multimodal AI crawls hundreds of hours of AV content and indexes it by:

  • Object,
  • Context,
  • Geo-location,
  • Text,
  • Facial recognition,
  • Wiki data,
  • Brand logos and other visual patterns,
  • Transcription and translation in 100+ languages.

By tapping into collective memories, personal learnings, hearing, and the notion of space and time, metadata applied through Multimodal AI indexing leads users to the exact moment and gives them the precise context they need.
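To make the idea concrete, here is a minimal sketch of how per-modality detections might be merged into a single time-coded label set for one video segment. This is an illustration, not Moments Lab’s actual pipeline; the detector names and tags are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class Moment:
    """A time-coded video segment carrying labels from several modalities."""
    start_s: float
    end_s: float
    labels: dict = field(default_factory=dict)  # modality name -> list of tags

def index_segment(start_s: float, end_s: float, detectors: dict) -> Moment:
    """Run every modality detector over one segment and merge the results."""
    moment = Moment(start_s, end_s)
    for modality, detect in detectors.items():
        moment.labels[modality] = detect(start_s, end_s)
    return moment

# Stand-in detectors; a real pipeline would call vision, OCR, geolocation
# and speech-to-text models here.
detectors = {
    "objects": lambda s, e: ["football", "goalpost"],
    "faces": lambda s, e: ["Mohamed Salah"],
    "text_on_screen": lambda s, e: ["LIV 2 - 0"],
    "transcript": lambda s, e: ["what a finish from Salah"],
}

moment = index_segment(12.0, 18.5, detectors)
print(moment.labels["faces"])  # -> ['Mohamed Salah']
```

Because every modality writes into the same segment record, a search can match on any combination of what was seen, heard or shown on screen.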

Once a set of indexing rules is chosen, future matching sequences are automatically stored together, creating valuable media asset collections. The indexing technology is continuously improved through feedback learning, with a dedicated team of researchers working on features such as speaker diarization and multi-language transcription.
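As a rough illustration of how such a rule could group matching sequences into a collection (the rule format and tags below are hypothetical, not the product’s actual syntax):

```python
# Two indexed moments, represented as {modality: [tags]} dictionaries.
archive = [
    {"faces": ["Mohamed Salah"], "transcript": ["great finish", "goal"]},
    {"faces": ["referee"], "transcript": ["half-time whistle"]},
]

def matches_rule(labels: dict, rule: dict) -> bool:
    """A moment matches when, for every modality named in the rule,
    at least one of the rule's tags appears in the moment's labels."""
    return all(
        any(tag in labels.get(modality, []) for tag in tags)
        for modality, tags in rule.items()
    )

# Hypothetical rule: collect every sequence where Salah is on screen
# and the transcript mentions a goal.
salah_goals = {"faces": ["Mohamed Salah"], "transcript": ["goal", "scores"]}

collection = [m for m in archive if matches_rule(m, salah_goals)]
print(len(collection))  # -> 1
```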

Customized AI training for even greater accuracy

What media teams find particularly exciting is how Moments Lab’s Multimodal AI can be further enhanced and trained with a ‘thesaurus’ of people, objects or actions linked to Wikidata.

For example, when migrating over 2,000 hours of video and 60TB of archive media, French soccer club Lille Olympique Sporting Club (LOSC) imported key information about its players and stakeholders to allow the AI to identify the club’s VIPs. The 24/7 Arabic-language multi-platform news service Asharq News trains its AI to detect and recognize specific people, objects, actions and scenes – in both English and Arabic – as it indexes 1,500 hours of video per month.
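A simplified sketch of what such a thesaurus lookup might look like; the entity IDs and aliases below are placeholders rather than real Wikidata records:

```python
# Placeholder thesaurus entries; the IDs are illustrative,
# not the actual Wikidata identifiers.
thesaurus = {
    "Q_EXAMPLE_1": {"label": "Lille OSC", "aliases": ["LOSC", "Lille Olympique Sporting Club"]},
    "Q_EXAMPLE_2": {"label": "Mohamed Salah", "aliases": ["Mo Salah", "Salah"]},
}

def resolve(detected_name: str):
    """Map a raw detection (a face label or on-screen name) to a canonical entity ID."""
    for entity_id, entry in thesaurus.items():
        if detected_name == entry["label"] or detected_name in entry["aliases"]:
            return entity_id
    return None

print(resolve("Mo Salah"))  # -> 'Q_EXAMPLE_2'
```

Resolving detections to a canonical entity is what lets a search for one spelling or nickname surface every matching moment in the archive.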

News and production teams must keep pace with an ever-increasing volume of new and archived media, and organizations want to be able to store it forever. But storing content without optimal indexing is akin to hitting the delete button. With a success rate of over 95% and the flexibility to integrate with third-party MAM platforms, Moments Lab’s multimodal and generative AI, MXT-1, provides one of the most advanced detection and indexing technologies on the market.

Moments Lab for your organization

Get in touch for a full demo and a 7-day free trial.
