Artificial intelligence

Nvidia co-leads Series A round for AI startup Twelve Labs

Eun-Yi Ko

Jun 05, 2024 (GMT+09:00)

(File photo, courtesy of Twelve Labs)

Global chip giant Nvidia Corp. has co-led a $50 million Series A funding round for Twelve Labs, a South Korean artificial intelligence company specializing in video analysis, according to the tech startup on Wednesday. 

Nvidia's venture capital arm NVentures and New Enterprise Associates, a new investor in Twelve Labs, jointly led the Series A round. Existing investors also joined, including global firms Index Ventures, Radical Ventures and WndrCo, the venture firm led by DreamWorks co-founder Jeffrey Katzenberg, as well as Seoul-based Korea Investment Partners.

The existing investors had participated in a pre-Series A round of about $10 million last October, which marked Nvidia's first investment in a Korean generative AI startup.

Twelve Labs has attracted about $77 million since its inception in 2021, including the latest Series A round. The company said it will use the funds for research and development of its AI-based video understanding and search technologies and for the recruitment of more than 50 employees by the end of the year.

"The world-class team at Twelve Labs is leveraging Nvidia accelerated computing together with their incredible capacity for video understanding, leading to new ways for enterprise customers to take advantage of generative AI," said Mohamed Siddeek, corporate vice president and head of NVentures. 

"The large language model (LLM) market is dominated by a handful of Big Tech corporations such as OpenAI, but we believe that Twelve Labs can become a global leader in the multimodal AI industry for video understanding," said John MJ Kim, a principal at Korea Investment Partners.

Multimodal AI refers to machine learning models that combine multiple data types, including images, text, speech and numerical data, with processing algorithms to produce more accurate and sophisticated outputs.

Based on its multimodal model, the startup analyzes the images and sounds in a video and matches them to human language. The model can also generate text from video content, edit short-form videos and categorize videos by a given criterion.
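To make that concrete, here is a minimal sketch of one common way such video-text matching works: a text query and each video clip are encoded into a shared embedding space, and clips are ranked by their cosine similarity to the query. The embed_text and embed_video_clip functions below are hypothetical placeholders standing in for a jointly trained multimodal encoder; they are not Twelve Labs' actual models or API.

# Illustrative sketch of embedding-based video-text retrieval.
# The embed_* functions are hypothetical stand-ins for a multimodal
# encoder trained to map text and video into one space; they are
# NOT Twelve Labs' actual models or API.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def embed_text(query: str) -> np.ndarray:
    # Placeholder: a real system would run the query through a text
    # encoder trained jointly with the video encoder.
    rng = np.random.default_rng(abs(hash(query)) % (2**32))
    return rng.standard_normal(512)

def embed_video_clip(clip_id: str) -> np.ndarray:
    # Placeholder: a real system would encode the clip's frames and
    # audio into the same 512-dimensional space as the text.
    rng = np.random.default_rng(abs(hash(clip_id)) % (2**32))
    return rng.standard_normal(512)

def search(query: str, clip_ids: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank video clips by similarity to a natural-language query."""
    q = embed_text(query)
    scored = [(cid, cosine_similarity(q, embed_video_clip(cid))) for cid in clip_ids]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]

if __name__ == "__main__":
    clips = ["clip_001", "clip_002", "clip_003", "clip_004"]
    for clip_id, score in search("a person walking a dog in the rain", clips):
        print(f"{clip_id}: {score:.3f}")

In a real system, the clip embeddings would be computed once at indexing time and stored, so each search only needs to encode the query.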

The technology boosts efficiency in creating YouTube Shorts, devising advertising strategies for videos and even finding missing persons by analyzing closed-circuit television (CCTV) footage.

Twelve Labs has integrated some of Nvidia's hardware and services into its platform, including the NVIDIA H100 Tensor Core graphics processing unit (GPU) and the NVIDIA L40S GPU, to improve its video understanding technology.

In March, Twelve Labs released Marengo-2.6, a multimodal model that enables various video, text, image and audio search tasks, and also launched a beta version of Pegasus-1, a model specifically designed to understand and articulate video content.
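By way of illustration, capabilities like Pegasus-1's video-to-text generation are typically exposed to developers as a web API. The sketch below shows what such a call might look like; the endpoint URL, request fields and response shape are hypothetical assumptions, not Twelve Labs' actual API.

# Hypothetical sketch of calling a video-understanding service of
# this kind over HTTP. The endpoint, fields and response shape are
# illustrative assumptions only, NOT Twelve Labs' actual API.
import requests

API_BASE = "https://api.example.com/v1"  # placeholder URL
API_KEY = "YOUR_API_KEY"                 # placeholder credential

def describe_video(video_id: str) -> str:
    """Ask a Pegasus-style model to generate text describing a video."""
    resp = requests.post(
        f"{API_BASE}/generate",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"video_id": video_id, "task": "summary"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["summary"]

if __name__ == "__main__":
    print(describe_video("vid_12345"))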

Write to Eun-Yi Ko at koko@hankyung.com


Jihyun Kim edited this article.
