Deepgram

Software Development

San Francisco, California 15,434 followers

Build with one flexible Voice AI platform – speech-to-text, text-to-speech, and audio intelligence APIs for developers

About us

Deepgram is a foundational AI company on a mission to transform human-machine interaction using natural language. We give any developer access to the fastest, most powerful voice AI models, including speech-to-text, text-to-speech, and spoken language understanding, with just an API call. From transcription to sentiment analysis to voice synthesis, Deepgram is the preferred partner for builders of voice AI applications. Beyond that, developers can:

🔊 Process live-streaming or pre-recorded audio
🗣️ Generate lightning-fast text-to-speech with a variety of unique, natural-sounding voices
🌎 Accurately transcribe audio in over 30 languages
⚙️ Train custom models for unique use cases
🔑 Access deep NLU with a unified API
💻 Build in any programming language with our SDKs
✅ Deploy on-prem or on DG’s managed cloud
📈 Get scalable GPU infrastructure for training and inference

Deepgram is a proud NVIDIA partner and Y Combinator company, and we recently completed a $72M Series B to define the future of AI Speech Understanding, making us the most-funded speech AI company at its stage.

Website
https://deepgram.com
Industry
Software Development
Company size
51-200 employees
Headquarters
San Francisco, California
Type
Privately held
Founded
2015
Specialties
Speech Search, Transcription, Speech Recognition, Audio Understanding, Speech Analytics, Voice Recognition, Artificial Intelligence, Deep Learning, Natural Language Processing, Text-to-speech, Voice Generation, and Conversational AI

Locations

Employees at Deepgram

Updates

  • Deepgram

    The wait is finally over, and we're thrilled to present Aura — the first text-to-speech model built for responsive, conversational AI agents and apps. 🗣️ Read the full announcement: https://lnkd.in/g4ra5vMq

    Over the last year, we've heard our customers' heartache about the current crop of text-to-speech products, citing roadblocks related to speed, cost, reliability, and conversational quality.

    “Deepgram and Groq share the belief that speed and efficiency are the missing ingredients in unlocking natural AI for daily use by everyone, as evidenced by the recent viral reception to ultra-fast LLMs when made available for the first time. Their voice AI models are prime examples of what can be achieved with the Groq API.” –Jonathan Ross, CEO & Founder of Groq

    That's where Aura comes in. Designed to handle real-time conversations at scale, it lets developers create realistic AI agents that support seamless interactions across various use cases, from voice ordering systems to customer support. Aura checks all of the boxes:

    ✅ Lightning-fast speed with less than 250 ms latency
    ✅ High-quality voices with natural-sounding tone, rhythm, and emotion
    ✅ Cost-efficient for high-throughput applications

    Check out our open-source interactive demo: https://lnkd.in/gMJTDWE8

    We're eager to see how Aura will fuel the next wave of AI innovation, and can't wait to see what you build!
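    For a feel of what calling a speak-style TTS endpoint looks like, here is a minimal sketch based on Deepgram's public `/v1/speak` REST API. The model name `aura-asteria-en` and the `DG_API_KEY` placeholder are assumptions for illustration, not a definitive integration:

```python
# Sketch: build a POST request to a /v1/speak-style TTS endpoint.
# DG_API_KEY is a placeholder; substitute a real API key before sending.
import json
import urllib.request

SPEAK_URL = "https://api.deepgram.com/v1/speak"

def build_speak_request(text: str, model: str = "aura-asteria-en",
                        api_key: str = "DG_API_KEY") -> urllib.request.Request:
    """Build an HTTP request asking the TTS API to synthesize `text`."""
    url = f"{SPEAK_URL}?model={model}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_speak_request("Hello! How can I help you today?")
    # Sending the request returns raw audio bytes you can stream to a player:
    # with urllib.request.urlopen(req) as resp:
    #     audio = resp.read()
```

    The request body carries only the text; voice selection happens via the `model` query parameter, which is what makes it easy to swap voices per call.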

  • Deepgram

    New AI Minds episode drop! 🧠 🎙 Curious about how AI is transforming customer interactions? Derek Wang, Co-founder of Taalk, breaks down how their AI is tackling tough communication problems, making customer interactions smoother and more compliant. Plus, don't miss the funny story about how he met his co-founder. 👇 Listen here: https://lnkd.in/gRE5-3-G

    AIMinds #035 | Derek Wang, Co-founder at Taalk | Deepgram


  • Deepgram

    Greetings from San Francisco! If you haven’t already, come say hello to us at The AI Conference. 🌉 We’re already showcasing our latest voice AI demos to Silicon Valley, and we’re absolutely thrilled to continue discussing the future of AI with you. 🚀 Don’t miss out on the action. Booth 217 awaits!

  • Deepgram

    Get ready to master the toughest parts of building voice AI agents in our virtual workshop with Groq. ⚒ Dive into the mechanics of building responsive voice agents, from interruption handling to low-latency function calling, and more. Plus, receive $1,000 in Deepgram credits to jump-start your agent development. Sign up with AIAGENT20 for 20% off your ticket! https://lnkd.in/g5YHA_zF 📅 Sept 20th @ 9AM PDT 📍 Virtual

    • From Code to Conversations: Building Responsive Voice AI Agents

  • Deepgram

    In the U.S., over $1 trillion is spent on healthcare admin annually, with clinicians spending up to 50% of their day on documentation instead of patient care. AI has the power to automate nearly half of these tasks, saving billions. 💡

    Innovators like Chartnote, Sonia (YC W24), and StackAI are taking the healthcare industry by storm with AI-driven medical scribes, QA tools, and healthcare agents, helping clinicians get back to what they do best—caring for patients.

    And we're fueling this innovation. Our Nova-2 Medical model is purpose-built for real-time healthcare transcription, capturing everything from symptoms to treatments with unmatched precision and speed. Learn why companies rely on Nova-2 Medical to power their healthcare applications. https://lnkd.in/gGqj4z7S #medtech #healthcareinnovation
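    As a rough illustration, sending a recorded clip to a medical transcription model can be a single HTTP POST. The `/v1/listen` endpoint and `nova-2-medical` model name follow Deepgram's public docs, but treat the query parameters and `DG_API_KEY` placeholder here as assumptions rather than a definitive integration:

```python
# Sketch: build a POST request that sends raw audio bytes to a
# /v1/listen-style transcription endpoint. DG_API_KEY is a placeholder.
import urllib.request

LISTEN_URL = "https://api.deepgram.com/v1/listen"

def build_transcribe_request(audio: bytes, model: str = "nova-2-medical",
                             api_key: str = "DG_API_KEY") -> urllib.request.Request:
    """POST raw WAV audio to the transcription endpoint."""
    url = f"{LISTEN_URL}?model={model}&smart_format=true"
    return urllib.request.Request(
        url,
        data=audio,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
        method="POST",
    )

if __name__ == "__main__":
    # Example usage (file name is hypothetical):
    # with open("visit_recording.wav", "rb") as f:
    #     req = build_transcribe_request(f.read())
    # In Deepgram's documented response shape, the transcript text sits under
    # results.channels[0].alternatives[0].transcript in the returned JSON.
    pass
```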

    Transforming Healthcare and Delivering ROI with the Nova-2 Medical Transcription API | Deepgram


  • Deepgram reposted this

    Lukas Wolf

    Co-founder at Sonia (YC W24)

    Don’t spend a **** of money on generic voice AI providers. Build it yourself. What are the basic components?

    At the heart of a voice AI stack is an LLM that engages in the conversation by generating responses. Whether your LLM is a custom RAG or some other fancy architecture is up to you. LLMs operate on text, so you need a second model that transcribes speech into text. There’s a tradeoff between quality and speed, but if you want to build a real-time voice application these days, I don’t see anything being used other than Deepgram.

    What’s next? You need to take your generated text and synthesize a voice — unless you want to read the responses aloud to your clients. This is NOT what Paul Graham means by saying, “Do things that don’t scale.” There are amazing voice APIs, e.g., ElevenLabs’ high-quality voices and the recently launched Cartesia with BLAZINGLY fast voice generation.

    But how do you move audio between the client and AI/server? WebRTC is the standard for sending real-time data online. Daily has an amazing WebRTC platform with client libraries that abstract playing audio for you (try that in Swift).

    So, what’s missing? We covered HOW to generate a response, but you must also decide WHEN to respond. Use a voice activity detection model like Silero that tells you whether your client is speaking. During silence, you can query a small, fast LLM to evaluate semantically whether the client has really finished speaking and your AI should respond.

    Turn-taking is very domain-specific. At Sonia (YC W24), we’re building an AI therapist, and pace and turn-taking fundamentally differ from customer support or sales agents. To make turn-taking work really well, you want to incorporate the full audio signal, as the spoken word contains important information beyond just the text. Depending on your domain, a custom-built turn-taking model might make sense.

    Care about latency. Perplexity CEO Aravind Srinivas cares A LOT about latency. Improve latency by hosting models yourself. Baseten has a simple interface for model hosting with a large library of open-source models.

    But will GPT-4o voice mode come and haunt us? Embrace the progress in the field. It will be hard to beat multimodal model latency as long as you use a similarly complex LLM in your custom stack. But I don’t think latency matters much, since you get 1-2 second latency building it yourself. However, depending on the application, there will be huge benefits to feeding raw (or preprocessed) audio into the model without text as the bottleneck modality. Many apps already go beyond voice with video avatars like Tavus. You can expect increased engagement for many applications (e.g., healthcare, education).

    The space is quite hot right now. Our YC friends at Arini (YC W24) handle phone calls for dentists, and Scritch (YC W24) recently launched AI assistants for vets. My calendar is open if you want to connect or get advice!
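    The components described above can be wired together as a simple pipeline. This is a minimal sketch with stub callables standing in for the real models (Silero VAD, Deepgram STT, your LLM, Cartesia/ElevenLabs TTS); the `VoiceAgent` class and its turn-taking logic are illustrative, not any vendor's API:

```python
# Sketch: a voice agent as four pluggable stages. VAD decides WHEN to
# respond; STT -> LLM -> TTS decide HOW. All components are stubs.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class VoiceAgent:
    is_speech: Callable[[bytes], bool]    # VAD, e.g. Silero
    transcribe: Callable[[bytes], str]    # STT, e.g. Deepgram
    respond: Callable[[str], str]         # LLM (custom RAG or otherwise)
    synthesize: Callable[[str], bytes]    # TTS, e.g. Cartesia / ElevenLabs

    def handle_chunk(self, audio_chunk: bytes) -> Optional[bytes]:
        """Process one audio chunk; return reply audio only on turn end."""
        if self.is_speech(audio_chunk):
            return None  # user is still talking: don't interrupt
        text = self.transcribe(audio_chunk)
        if not text:
            return None  # silence with no content, nothing to answer
        return self.synthesize(self.respond(text))

# Wiring it up with trivial stubs to show the control flow:
agent = VoiceAgent(
    is_speech=lambda a: a.endswith(b"..."),  # pretend trailing "..." = mid-turn
    transcribe=lambda a: a.decode(errors="ignore").strip(),
    respond=lambda t: f"You said: {t}",
    synthesize=lambda t: t.encode(),
)
```

    The real systems are streaming rather than chunk-at-a-time, but the shape is the same: each stage is swappable, which is exactly why a custom turn-taking model can slot in where `is_speech` sits.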

    Voice AI Chat - Lukas Wolf


  • Deepgram

    Super fast voice agent built by the Cerebras Systems team. 😎 Always inspiring to see the innovation from our community of builders!

    Shayne Parmelee

    Developer Advocate @ LiveKit

    This is the fastest realtime AI agent stack in the world: LiveKit / Cerebras Systems / Cartesia / Deepgram. Because every token is streamed between services and then straight to the end user, there's almost no latency.

    A lot of people thought we needed Audio -> Audio models to break the 500ms barrier. Cerebras' insanely fast inference speeds and TTFS, paired with LiveKit agents and streamed data from Cartesia and Deepgram, have shown that's not the case. Try it out here: https://lnkd.in/ey2xF8qw
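    A toy sketch of that token-streaming idea: instead of waiting for the full LLM reply, forward tokens to TTS as soon as a speakable phrase boundary appears, so synthesis starts while the rest of the reply is still being generated. The function below is purely illustrative and assumes nothing about the named vendors' APIs:

```python
# Sketch: group a stream of LLM tokens into phrases that a streaming TTS
# can start voicing immediately, rather than buffering the whole reply.
from typing import Iterable, Iterator

def stream_speakable_phrases(tokens: Iterable[str],
                             boundary: str = ".?!,") -> Iterator[str]:
    """Yield a phrase as soon as a token ends in a boundary character."""
    buf: list[str] = []
    for tok in tokens:
        buf.append(tok)
        if tok and tok[-1] in boundary:
            yield "".join(buf)   # ship this phrase downstream immediately
            buf = []
    if buf:
        yield "".join(buf)       # flush whatever remains at end of stream

# The first phrase is ready after 3 tokens, not after the whole reply:
phrases = list(stream_speakable_phrases(
    ["Hi", " there", ".", " How", " can", " I", " help", "?"]))
# phrases == ["Hi there.", " How can I help?"]
```

    Time-to-first-audio then depends on the first phrase only, which is the effect the post describes: each hop streams, so latency stops compounding across services.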
