In the 51st session of #MultimodalWeekly, we have three exciting presentations from a startup founder and researchers working in Multimodal AI. ✅ Jay Chia, the co-founder of Eventual, will show how to build a DIY multimodal data lake with Daft DataFrames. -> Check out Daft: https://www.getdaft.io/ ✅ Saptarshi Sinha, a Ph.D. researcher at the University of Bristol, will present his work "Every Shot Counts: Using Exemplars for Repetition Counting in Videos." -> Read the paper: https://lnkd.in/gUdY2NCh ✅ Yunhua Zhang, a Ph.D. candidate at UvA, will present her work "Low-Resource Vision Challenges for Foundation Models." -> Read the paper: https://lnkd.in/gXSe_ed5 Register for the webinar here: https://lnkd.in/gJGtscSH 👈 Join our Discord to connect with the speakers: https://lnkd.in/gRt4GdDx 🤝
Twelve Labs
Software Development
San Francisco, California 6,029 followers
Help developers build programs that can see, listen, and understand the world as we do.
About us
Helping developers build programs that can see, hear, and understand the world as we do by giving them the world's most powerful video-understanding infrastructure.
- Website
- http://www.twelvelabs.io
- Industry
- Software Development
- Company size
- 11-50 employees
- Headquarters
- San Francisco, California
- Type
- Privately Held
- Founded
- 2021
Locations
- Primary
555 Mission St
San Francisco, California 94105, US
Updates
-
~ New Webinar ~ The recording of #MultimodalWeekly 49 with Jiwoo Hong from KAIST AI, and Associate Professor Lei Huang and Baichuan Zhou from Beihang University is up! 📺 Watch here: https://lnkd.in/gjtn4ZQX They discussed: - Motivation for ORPO: RLHF with PPO, DPO, and SFT in alignment - Experimental results of ORPO in single-turn and multi-turn instruction following - Efficiency and scalability of ORPO - The opportunity for small-scale LMMs - How to merge the vision modality into small LMMs? - TinyLLaVA: From the model, data, and training perspectives Join our Discord community: discord.gg/Sh6BRfakJa 🤝
Single-Step Language Model Alignment & Smaller-Scale Large Multimodal Models | Multimodal Weekly 49
-
~ New Webinar ~ The recording of #MultimodalWeekly 48 with Letian (Max) Fu from the University of California, Berkeley, and Bo Zhao from the Beijing Academy of Artificial Intelligence (BAAI) is up! 📺 Watch here: https://lnkd.in/gX_iSzuD They discussed: - Touch as a sensing modality is missing in multimodal models - Touch-vision-language dataset - TVL-Tactile Encoder & TVL-LLaMA - SVIT: Scaling Up Visual Instruction Tuning - Bunny: A concise open-source lightweight multimodal LLM - M3D: Advancing 3D medical image analysis with multimodal LLMs - MLVU: A comprehensive benchmark for multi-task long video understanding Join our Discord community: discord.gg/Sh6BRfakJa 🤝
Modality Alignment for Multimodal Perception & Open-Source Lightweight MLLM | Multimodal Weekly 48
-
In the 50th session of #MultimodalWeekly, we have two exciting presentations from startup founders building real-world products for Multimodal AI applications. ✅ Jesse N. Clark, the Co-Founder and CTO of Marqo AI, will discuss generalized contrastive learning for multimodal retrieval and ranking. They generalize the popular CLIP training method to accommodate any number of texts and images when representing documents and to encode relevance (or rank), providing better first-stage retrieval. 📄 ✅ Alexandre Berkovic, the Co-Founder and CEO of Adorno AI, will dive into how video and audio understanding technologies from Twelve Labs and Adorno AI are transforming video production. 📻 Register for the webinar here: https://lnkd.in/gJGtscSH 👈 Join our Discord community: https://lnkd.in/gRt4GdDx
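For context, Marqo's approach generalizes the standard CLIP objective, which scores one image against one text per example with a symmetric contrastive loss. The snippet below is a minimal PyTorch sketch of that baseline CLIP-style loss only (the function name, shapes, and temperature are illustrative assumptions), not Marqo's generalized contrastive learning itself:

```python
# Minimal sketch of the standard CLIP-style contrastive loss that GCL generalizes.
# Illustrative only; not Marqo's implementation.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # image_emb, text_emb: (batch, dim) outputs of the image and text encoders
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(logits.size(0))           # matching pairs sit on the diagonal
    # Symmetric cross-entropy over image-to-text and text-to-image directions
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Example with random embeddings standing in for encoder outputs
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```

As the post describes, GCL extends this setup to documents represented by any number of text and image fields and to graded relevance (or rank) rather than strict one-to-one pairs.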
-
Twelve Labs will be attending AWS Summit NY on July 10! Connect with our team to learn how you can streamline all your video-related workflows with our multimodal AI models and discuss the latest in tech. Don’t hesitate to say hello when you spot any of our team members: Jae Lee, Soyoung Lee, Maninder Saini, and Andy Vaughan. We can’t wait to see everyone there! #AWSSummit #AWSNY
-
We have an exciting new collaboration with the Phyllo team to transform video insights on social media 😉 🌟 Why This Matters 🌟 With social media shifting to video, extracting insights is crucial. Video posts get up to 10 times more engagement, and 74% of users take action after viewing a brand's video. 🔍 The Phyllo and Twelve Labs Advantage 🔍 Phyllo: - Customizable searches across 15+ social media platforms. - Cost-effective social data access. Twelve Labs: - Foundation models that analyze videos through visual, audio, and text modalities. - Offers semantic video search, zero-shot classification, video-to-text generation, and multimodal video embeddings. 🌐 Innovative Use Cases 🌐 1 - Insights for Videos: Get detailed answers, summaries, and sentiment analysis. 2 - Product Development: Analyze product usage in social videos. 3 - Byte-Sized Segments: Break long videos into short clips for Instagram and TikTok. 4 - Influencer Insights: Identify influencers using specific products and their impact. Read more about our collaboration here: https://lnkd.in/gC9Zjmgp 👀
-
~ New Webinar ~ The video recording of #MultimodalWeekly 47 with Benjamin Muller, Tu Anh NGUYEN, and Bokai Yu from AI at Meta is up! 📺 Watch here: https://lnkd.in/guZ5C_mU 👀 They discussed: - Challenges of expressive speech generation - SpiRit-LM combines TextLM and SpeechLM - SpiRit-LM training recipe and generation samples - Evaluation: zero-shot, few-shot, and text-speech sentiment-preservation benchmark - Can we observe the speech-text alignment? Join our Discord community: discord.gg/Sh6BRfakJa 🤝
SpiRit-LM, an Interleaved Spoken and Written Language Model | Multimodal Weekly 47
-
🏇 We are excited to announce the launch of Jockey: A Conversational Video Agent powered by Twelve Labs APIs and LangGraph from LangChain! Here's why developers should dive into Jockey: 👇 1 - Advanced Video Understanding: Jockey utilizes Twelve Labs' state-of-the-art video foundation models to extract rich insights from video content, offering capabilities like video search, classification, summarization, and more. 📽 2 - Flexible and Scalable Framework: Built on LangGraph, Jockey provides unparalleled control over the flow of code, prompts, and LLM calls, facilitating robust human-agent collaboration and ensuring reliable performance. ⛓ 3 - Efficient and Precise Architecture: Jockey's architecture includes key components such as the Supervisor, the Planner, and specialized Workers that handle tasks like video search, text generation, and editing, ensuring optimal token usage and accurate node responses. 🏛 4 - Customizable and Extensible: Jockey's modular design allows for easy customization and extension. Developers can modify prompts, extend state management, or add new workers to tailor Jockey to specific needs, making it a versatile foundation for advanced video AI applications. 🤟 Full blog post here: https://lnkd.in/gbudqhKM 😎
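For a rough sense of how a Supervisor/Planner/Worker layout is wired in LangGraph, here is a minimal, hypothetical sketch with stubbed node logic. The state fields, node names, and routing below are illustrative assumptions for this post, not Jockey's actual code; the real implementation (LLM prompts, Twelve Labs API calls, editing workers) lives in the blog post and repository linked above.

```python
# Hypothetical Supervisor/Planner/Worker sketch in LangGraph; node bodies are stubs.
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, END


class JockeyState(TypedDict):
    request: str   # user's natural-language request
    plan: str      # plan produced by the Planner
    results: list  # outputs collected from Workers


def supervisor(state: JockeyState) -> JockeyState:
    # The Supervisor only inspects state here; routing happens in route_next below.
    return state


def planner(state: JockeyState) -> dict:
    # Stub: a real Planner would call an LLM to break the request into steps.
    return {"plan": f"search the index for: {state['request']}"}


def video_search_worker(state: JockeyState) -> dict:
    # Stub: a real Worker would call the Twelve Labs search API here.
    return {"results": state["results"] + ["<matching video clip>"]}


def route_next(state: JockeyState) -> str:
    # Plan first, then run the worker, then finish.
    if not state["plan"]:
        return "planner"
    if not state["results"]:
        return "video-search"
    return "done"


graph = StateGraph(JockeyState)
graph.add_node("supervisor", supervisor)
graph.add_node("planner", planner)
graph.add_node("video-search", video_search_worker)
graph.set_entry_point("supervisor")
graph.add_conditional_edges(
    "supervisor", route_next,
    {"planner": "planner", "video-search": "video-search", "done": END},
)
graph.add_edge("planner", "supervisor")
graph.add_edge("video-search", "supervisor")

app = graph.compile()
print(app.invoke({"request": "find clips of the winning goal", "plan": "", "results": []}))
```

The conditional edge out of the Supervisor is what gives this pattern its control: every Worker hands results back to the Supervisor, which decides whether more work is needed or the answer is ready.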
-
~ New Webinar ~ The video recording of #MultimodalWeekly 46 with Anoop Thomas from EMAM, Inc. is up! 📺 Watch here: https://lnkd.in/gZnWiYNS 👀 He discussed: - How eMAM provides an end-to-end media workflow - eMAM's technology partners - eMAM's architecture and deployment options - A live demo of Twelve Labs model capabilities in the eMAM product Join our Discord community: discord.gg/Sh6BRfakJa 🤝
Enhancing Video Production & Media Search with eMAM and Twelve Labs | Multimodal Weekly 46
-
Exciting times at Twelve Labs! Our team just returned from #CVPR2024 in Seattle last week, and what an incredible experience it was! 🌌 CVPR did not disappoint this year. We immersed ourselves in the latest advancements in video understanding and multimodal AI - areas at the core of our mission at Twelve Labs. Some highlights: 🌟 • Engaging discussions on cutting-edge research in multimodal foundation models • Insights into the latest trends in video embedding and retrieval • Connecting with brilliant minds pushing the boundaries of video-language modeling 🔬 Calling all ML Researchers! 🔬 Are you passionate about advancing the field of video understanding and multimodal AI? We're expanding our ML Research team and looking for talented individuals to join us on this exciting journey. Open Roles: • ML Research Scientist • Research Internships If you're ready to tackle challenging problems in video foundation models and multimodal LLMs, we want to hear from you! Learn more and apply: https://lnkd.in/ggc-mYa8 ◀ Aiden L., Hyojun Go, Ryan Scott, Kate Chen, Sunny Hien Nguyen, James Le, Jenny Jayoung Ahn, Minjoon Seo