Twelve Labs

Software Development

San Francisco, California 6,523 followers

Help developers build programs that can see, listen, and understand the world as we do.

About us

Helping developers build programs that can see, hear, and understand the world as we do by giving them the world's most powerful video-understanding infrastructure.

Website
http://www.twelvelabs.io
Industry
Software Development
Company size
11-50 employees
Headquarters
San Francisco, California
Type
Privately held
Founded
2021

Locations

Employees at Twelve Labs

Updates

  • Twelve Labs

    In the 57th session of #MultimodalWeekly, we have three exciting presentations - two on video captioning and one on training data for foundation models.
    ✅ Lucas Ventura will discuss CoVR, which generates triplets from video-caption pairs while also expanding the scope of the task to composed video retrieval. (with Antoine Y., Gül Varol, and Cordelia Schmid)
    ✅ Shayne Longpre will discuss his new work Consent in Crisis: The Rapid Decline of the AI Data Commons and its multimodal implications. This work has been covered by The New York Times, 404 Media, Vox, and Yahoo Finance.
    ✅ Nina Shvetsova and Anna Kukleva will discuss HowToCaption, which leverages recent advances in LLMs to generate high-quality video captions at scale without any human supervision. (with Hilde Kuehne)
    Register for the webinar here: https://lnkd.in/gJGtscSH ⬅
    Join our Discord to connect with the speakers: https://lnkd.in/gRt4GdDx 🤝

  • Twelve Labs

    ~ New Webinar ~ The recording of #MultimodalWeekly 53 with Xiang Yue, Orr Zohar, and Mingqi Jiang is up! Watch here: https://lnkd.in/gSKtFGTb 📺
    They discussed:
    - Evaluating multimodal models on massive multi-discipline tasks
    - Self-training for video language models via video instruction tuning
    - Explanation methods for ConvNets and Transformers
    Enjoy!

  • Twelve Labs reposted this

    Jae Lee

    Multimodal neural nets @Twelve Labs - We are hiring!

    All the new capabilities emerging from recent video foundation models are exciting. However, to make a real impact in the wild, it's crucial to first master the fundamentals of video comprehension: motion, appearance, and spatiotemporal understanding. TWLV-I is our response to this challenge.
    At Twelve Labs, we’re proud to introduce TWLV-I, our latest video foundation model, along with a new evaluation framework designed to assess these core capabilities. Unlike language or image models, video models face unique challenges that complicate fair comparisons. Our framework specifically measures two fundamental aspects of video comprehension: appearance and motion understanding.
    Our research reveals that existing models, like UMT, InternVideo2, and V-JEPA, fall short in at least one of these areas. TWLV-I, trained only on publicly available datasets, excels in both, demonstrating robust performance across a variety of tasks, from action recognition to spatiotemporal action localization. Next: scaling up with our proprietary data.
    Congratulations Aiden L. and team!
    📄 Read our report on arXiv: https://lnkd.in/grbrxh7X
    👍 Upvote on Hugging Face: https://lnkd.in/gDvWrrYp
    🧠 Explore the embeddings: https://lnkd.in/gcANeufq

    Twelve Labs

    Building video foundation models has been our core focus since day 1. Unlike language or image foundation models, many video foundation models are evaluated with differing parameters (such as sampling rate, number of frames, pretraining steps, etc.), making fair and robust comparisons challenging. Therefore, we present a carefully designed evaluation framework for measuring two core capabilities of video comprehension: appearance and motion understanding. ⚖
    Our findings reveal that existing video foundation models, whether text-supervised like UMT or InternVideo2, or self-supervised like V-JEPA, exhibit limitations in at least one of these capabilities. As an alternative, we introduce TWLV-I, a new video foundation model that constructs robust visual representations for both motion- and appearance-based videos. 🆕
    Trained exclusively on publicly available datasets, TWLV-I demonstrates notable performance across both appearance- and motion-centric action recognition benchmark datasets. 📊
    TWLV-I's capabilities extend beyond action recognition. It achieves competitive performance on various video-centric tasks, including temporal action localization, spatiotemporal action localization, and temporal action segmentation. This multifaceted proficiency highlights TWLV-I's spatial and temporal understanding capabilities. 🎡
    ▶ Read the technical report on arXiv: https://lnkd.in/grbrxh7X
    ▶ Upvote it on Hugging Face: https://lnkd.in/gDvWrrYp
    ▶ Play with embedding vectors obtained by TWLV-I via the evaluation source code: https://lnkd.in/gcANeufq

  • Twelve Labs

    Have you ever wanted to pinpoint specific color shades in a video, perhaps to find a product or a particular moment that features your favorite hues? 🌈
    Shade Finder is an app designed to pinpoint moments in beauty and fashion videos where specific shades appear: https://lnkd.in/gVrdmHAv 💄
    The app excels at finding videos featuring objects, colors, and shapes that closely match the images you provide. Ideal for beauty enthusiasts and fashion aficionados, Shade Finder ensures you never miss a moment of your favorite shades in action. 🤩
    Meeran K. wrote this in-depth tutorial on how she built this app using the new Twelve Labs Image-to-Video Search API: https://lnkd.in/g3rdGfBc 👩💻
    ☑ Watch the tutorial: https://lnkd.in/gatwadmT
    ☑ Check out the code: https://lnkd.in/g-e7yfVq
    ☑ Play with it via Replit: https://lnkd.in/gx-Ha7S4

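For readers curious what an image-based query like Shade Finder's might look like in code, here is a minimal sketch using the Twelve Labs Python SDK. It is not the app's actual implementation: the index ID and image URL are placeholders, and the image-query parameter names (`query_media_type`, `query_media_url`) are assumptions based on the post's description, so treat the linked tutorial as the authoritative reference.

```python
# Hypothetical sketch of an image-to-video search, based on the Shade Finder
# post above. The image-query parameter names are assumptions; consult the
# linked tutorial for the exact SDK usage.
import os

from twelvelabs import TwelveLabs  # pip install twelvelabs

client = TwelveLabs(api_key=os.environ["TWELVE_LABS_API_KEY"])

# Search an existing video index with an image of the shade or product
# you want to find (index ID and image URL are placeholders).
results = client.search.query(
    index_id="YOUR_INDEX_ID",
    options=["visual"],                               # match on visual content
    query_media_type="image",                         # assumed parameter name
    query_media_url="https://example.com/shade.jpg",  # assumed parameter name
)

# Each hit points at a clip: the source video and the time range where the
# queried shade appears (field names follow the SDK's search results).
for clip in results.data:
    print(clip.video_id, clip.start, clip.end, clip.score)
```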
  • Twelve Labs

    In the 56th session of #MultimodalWeekly, we have three exciting presentations across different video understanding tasks: action recognition, video description, and video summarization.
    ✅ Jacob Chalk and Jaesung Huh will discuss the Time Interval Machine (TIM), which addresses the interplay between the two modalities in long videos by explicitly modeling the temporal extents of audio and visual events: https://lnkd.in/gThpCzsx
    ✅ Haran Raajesh and Naveen Reddy D will discuss Movie-Identity Captioner (MICap), a new single-stage approach that can seamlessly switch between id-aware caption generation and fill-in-the-blanks when given a caption with blanks: https://lnkd.in/g6NW_Rp3
    ✅ Aditya Singh, Dhruv Srivastava, and Assistant Professor Makarand Tapaswi will discuss their work "Previously on ..." From Recaps to Story Summarization, which tackles multimodal story summarization by leveraging TV episode recaps — short video sequences interweaving key story moments from previous episodes to bring viewers up to speed: https://lnkd.in/gD8Kr3uy
    Register for the webinar here: https://lnkd.in/gJGtscSH 👈
    Join our Discord community: https://lnkd.in/gRt4GdDx 🤝

  • Twelve Labs

    ~ New Webinar ~ The recording of #MultimodalWeekly 52 with Saelyne Yang, Bo Li/Yuanhan Zhang, and 肖俊斌 is up! Watch here: https://lnkd.in/grmE-Pye 📺
    They discussed:
    - Learning Procedural Tasks via How-To Videos
    - Feeling & Building Multimodal Intelligence
    - Visually-Grounded VideoQA
    Enjoy!

  • Twelve Labs

    🚀 New Tutorial for AI Engineers! 🚀
    We've published a comprehensive tutorial on integrating Twelve Labs' Embed API with LanceDB to build advanced video understanding applications. This guide is designed for those working on semantic video search engines, content-based recommendation systems, or anomaly detection in video streams.
    🔍 Key Highlights:
    - Twelve Labs Embed API: Generate detailed, multimodal embeddings that capture the essence of video content.
    - LanceDB: Efficiently store, index, and query high-dimensional vectors for accurate retrieval.
    - Step-by-Step Guide: This covers setting up your environment and generating, storing, and querying video embeddings.
    - Practical Applications: Create semantic search engines and integrate them with a Retrieval-Augmented Generation (RAG) workflow.
    💡 Why This Matters:
    - Improve your AI projects with precise video content analysis.
    - Utilize the strengths of both embedding generation and vector storage.
    - Follow a detailed guide to get started quickly.
    👉 Check out the tutorial and see how this integration can enhance your AI capabilities: https://lnkd.in/e5RK4P9j
    Explore the possibilities of advanced video understanding with our step-by-step guide.

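As a rough illustration of the workflow the tutorial above walks through, here is a minimal sketch that stores per-segment video embeddings in LanceDB and runs a nearest-neighbor query. The `embed_video_segments` helper is a hypothetical placeholder for the Twelve Labs Embed API call covered in the tutorial, and the table schema is an assumption; follow the linked guide for the exact SDK usage.

```python
# Minimal sketch of the store-and-query half of the tutorial above.
# embed_video_segments() is a hypothetical placeholder for the Twelve Labs
# Embed API call described in the guide; the table schema is an assumption.
import lancedb


def embed_video_segments(video_url: str) -> list[dict]:
    """Placeholder: call the Twelve Labs Embed API and return one record per
    segment, e.g. {"vector": [...], "video_url": ..., "start": ..., "end": ...}."""
    raise NotImplementedError("See the linked tutorial for the Embed API call.")


db = lancedb.connect("./video-embeddings")  # local LanceDB directory

# Index a video: one row per segment, each carrying its embedding vector.
records = embed_video_segments("https://example.com/keynote.mp4")
table = db.create_table("video_segments", data=records)

# Semantic search: embed the query the same way, then find the closest segments.
query_vector = embed_video_segments("https://example.com/query-clip.mp4")[0]["vector"]
hits = table.search(query_vector).limit(5).to_list()
for hit in hits:
    print(hit["video_url"], hit["start"], hit["end"])
```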
  • Twelve Labs

    In the 55th session of #MultimodalWeekly, we have three Ph.D. candidates from Stony Brook University working on long-form video understanding under Michael Ryoo.
    ✅ Jongwoo Park will introduce LVNet, a video question-answering framework with optimal strategies for key-frame selection and sequence-aware captioning: https://lnkd.in/gEf45TfJ
    ✅ Kumara Kahatapitiya will bring up LangRepo, a Language Repository for LLMs that maintains concise and structured information as an interpretable (i.e., all-textual) representation: https://lnkd.in/gVSgqppb
    ✅ Kanchana Ranasinghe will discuss MVU, an LLM-based framework for solving long video question-answering benchmarks that uncovers multiple surprising results: https://lnkd.in/grTdS4Mc
    Register for the webinar here: https://lnkd.in/gJGtscSH 👈

  • Twelve Labs

    ~ New Webinar ~ The recording of #MultimodalWeekly 51 with Jay Chia, Saptarshi Sinha, and Yunhua Zhang is up! Watch here: https://lnkd.in/gq_Z7SdD 📺
    They discussed:
    - Multimodal data lake
    - Exemplar-based video repetition counting
    - Low-resource vision challenges for foundation models
    Enjoy!

Similar pages

Browse jobs

Funding