Learn more about Multimodal RAG from our very own Vasudev Lal in the latest DeepLearning.AI course. Andrew Ng explains more about what to expect in the video below. #DeepLearning #LVLM #RAG
We're excited to introduce Multimodal RAG: Chat with Videos, a course created in collaboration with Intel Corporation and taught by Vasudev Lal, Principal AI Research Scientist at Intel Labs.

Build a system that gives grounded answers from video content! You'll create an interactive chat system using the BridgeTower model, a multimodal transformer developed by Intel and Microsoft Research.

In detail:
🔄 Generate joint embeddings from video content and store them in a vector database.
🧩 Build a Retrieval-Augmented Generation (RAG) pipeline to fetch relevant video data.
💬 Use Large Vision-Language Models (LVLMs) to answer questions using both text and image inputs.

Throughout the course, you will make API calls to access multimodal models hosted by Prediction Guard on Intel’s cloud. By the end, you'll have the expertise to build systems that can intelligently interact with video content.

Enroll for free: https://hubs.la/Q02PvM8Z0
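
For anyone curious how the embed, store, and retrieve steps fit together, here is a minimal Python sketch. It is not the course's code: the hand-written 3-number vectors stand in for the joint embeddings a model like BridgeTower would produce, and `InMemoryVectorStore` and `build_prompt` are hypothetical helpers made up for illustration, where a real pipeline would use a vector database and send the prompt plus the retrieved frame to an LVLM.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (embedding, metadata) pairs

    def add(self, embedding, metadata):
        self.items.append((embedding, metadata))

    def search(self, query_embedding, k=1):
        # Rank stored segments by similarity to the query, best first.
        ranked = sorted(
            self.items,
            key=lambda item: cosine(query_embedding, item[0]),
            reverse=True,
        )
        return [metadata for _, metadata in ranked[:k]]


def build_prompt(question, retrieved):
    """Assemble the text part of the request that would go to an LVLM."""
    context = "\n".join(
        f"[{m['timestamp']}] {m['transcript']}" for m in retrieved
    )
    return f"Context from video:\n{context}\n\nQuestion: {question}"


# Toy vectors (assumption): real embeddings would come from a multimodal model.
store = InMemoryVectorStore()
store.add([0.9, 0.1, 0.0], {"timestamp": "00:12",
                            "transcript": "An astronaut floats outside the station."})
store.add([0.0, 0.2, 0.9], {"timestamp": "01:05",
                            "transcript": "The crew eats dinner together."})

query_embedding = [0.8, 0.2, 0.1]  # pretend embedding of the user's question
top = store.search(query_embedding, k=1)
prompt = build_prompt("What is the astronaut doing?", top)
print(prompt)
```

The retrieved segment's transcript and timestamp ground the answer, so the model responds from the video content instead of from its general knowledge.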