Mixedbread

Software Development

Berlin, Berlin · 1,773 followers

AI for everyone on every device

About

Your fav. AI bakers

Website
https://www.mixedbread.ai
Industry
Software Development
Company size
2–10 employees
Headquarters
Berlin, Berlin
Type
Privately held
Founded
2023

Locations

Employees at Mixedbread

Updates

  • Mixedbread reposted this

    Julius Lipp

    baking @ mixedbread.ai

    Adding dynamic batching to our inference APIs at Mixedbread back in the day was a real pain... We decided to open-source our implementation so others don't have to go through that anymore! Dynamic batching automatically groups multiple incoming inference requests into a single batch for processing. It improves GPU utilization, leading to better energy efficiency and up to 10x higher throughput. Batched works with various model architectures and supports both sync and async execution. Check it out: https://lnkd.in/d2Vpw4Bv

    • dynamic batching
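The dynamic batching described above can be sketched in a few lines of asyncio. This is a minimal illustration of the idea, not the Batched library's actual API or implementation: requests are queued, and a worker flushes them as one batch once the batch fills up or a short wait expires.

```python
import asyncio

class DynamicBatcher:
    """Minimal dynamic-batching sketch (illustrative names, not Batched's API)."""

    def __init__(self, batch_fn, max_batch_size=8, max_wait_ms=5):
        self.batch_fn = batch_fn              # processes a list of inputs at once
        self.max_batch_size = max_batch_size
        self.max_wait = max_wait_ms / 1000
        self.queue = asyncio.Queue()

    async def submit(self, item):
        """Enqueue one request and wait for its individual result."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def worker(self):
        while True:
            batch = [await self.queue.get()]  # block until the first request
            deadline = asyncio.get_running_loop().time() + self.max_wait
            while len(batch) < self.max_batch_size:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            # One batched call instead of len(batch) separate model calls:
            outputs = self.batch_fn([item for item, _ in batch])
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)

async def main():
    batcher = DynamicBatcher(lambda xs: [x * 2 for x in xs])  # stand-in "model"
    worker = asyncio.create_task(batcher.worker())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(10)))
    worker.cancel()
    return results

print(asyncio.run(main()))  # [0, 2, 4, ..., 18]
```

Callers each await their own result while the GPU (here, a lambda) sees one call per batch, which is where the throughput gain comes from.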
  • Mixedbread

    BM𝒳: A Freshly Baked Take on BM25. BMX is an eXtension of BM25 that improves on the industry-standard algorithm by adding entropy-weighted similarity. BMX delivers more relevant results without the need for extensive training or computational resources, and it even holds its own against embedding models in real-world scenarios. You can try it out now through our open-source library Baguetter. 👇 Learn more in the comments 👇

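For reference, the industry-standard BM25 that BMX extends can be scored in plain Python. This is the standard formula with the usual k1/b defaults, not Baguetter's implementation, and it deliberately omits BMX's entropy-weighted similarity:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query with plain BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N                # average doc length
    df = Counter(t for d in docs for t in set(d))        # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [["fresh", "bread"], ["bread", "machine", "learning"], ["search", "engine"]]
print(bm25_scores(["bread", "search"], docs))
```

Rarer terms ("search" appears in one doc, "bread" in two) get higher IDF, and longer documents are penalized by the length normalization, which is exactly the behavior BMX builds on.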
  • Mixedbread

    Open Source Gets DE-licious: Mixedbread x deepset German/English Embeddings
    Together with deepset we trained a new open-source German/English embedding model. It outperforms domain-specific alternatives in real-world applications and offers 97%+ infrastructure cost savings through binary MRL.
    Blog: https://lnkd.in/eHPv39yY
    Model: https://lnkd.in/ehd46iTn
    Discord: https://lnkd.in/eUWxuE-G
    PS: Come work with us! https://lnkd.in/eymWDeHz

  • Mixedbread reposted this

    Mixedbread

    Follow-up on binary embeddings: 64 bytes per embedding, yee-haw 🤠
    We combine the advantages of compressing output dimensions with MRL and the size of each dimension with binary quantization. This allows us to reduce memory usage of our embedding model by more than 98% (64x) while retaining over 90% of model performance. The implications of our findings will be wide-ranging, as this makes performing retrieval over large numbers of embeddings much more economically feasible.
    Find out more in our blog post: https://lnkd.in/dQeHyMyy
    Link to model: https://lnkd.in/d9XwcxTR

    64 bytes per embedding, yee-haw 🤠

    mixedbread.ai
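The arithmetic behind the 64x figure, assuming a 1024-dimensional float32 base model (the exact MRL truncation to 512 dims is an assumption about how the 64x factor decomposes into 2x from MRL and 32x from binary quantization):

```python
# Back-of-the-envelope for "64 bytes per embedding".
base_bytes = 1024 * 4            # 1024 float32 dims = 4096 bytes per embedding
mrl_dims = 512                   # Matryoshka (MRL) truncation: keep first 512 dims
binary_bytes = mrl_dims // 8     # binary quantization: 1 bit per remaining dim
print(base_bytes, binary_bytes, base_bytes // binary_bytes)  # 4096 64 64
```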


  • Mixedbread reposted this

    Tom Aarsen

    🤗 Sentence Transformers, SetFit & NLTK maintainer, MLE @ Hugging Face

    🏇 Embedding Quantization is here! 25x speedup in retrieval, 32x reduction in memory usage, 4x reduction in disk space, 99.3% preservation of performance 🤯 The sky is the limit. Hugging Face and mixedbread.ai introduce Binary and Scalar Embedding Quantization: two embedding post-processing techniques that reduce the size of the embeddings by 32x and 4x, respectively, while preserving ~96.4% and 99.3% of retrieval performance 📈! This also means much lower memory usage and disk space, which in turn lets you use much(!) smaller and cheaper cloud instances for your retrieval. To prove this, we have embedded **all** of the English Wikipedia and let you search through it (that's 41,000,000 embeddings!).
    Search all of Wikipedia in our demo: https://lnkd.in/ebNSHA4y
    Learn more about Embedding Quantization in our blog post: https://lnkd.in/eEmC9b54
    The future of search is int8 & binary. #python #nlp #retrieval #rag #huggingface

    Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval

    huggingface.co
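The two quantization schemes can be illustrated in NumPy. This is a sketch of the idea, not Sentence Transformers' exact code, and calibrating the int8 ranges from the batch's own per-dimension min/max is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 1024)).astype(np.float32)   # 4 embeddings, 1024 dims

# Binary: keep only the sign of each dimension, packed 8 dims per byte (32x smaller).
binary = np.packbits(emb > 0, axis=1)                 # shape (4, 128), uint8

# int8 (scalar): linearly map each dimension's observed range onto [-127, 127]
# (4x smaller than float32).
lo, hi = emb.min(axis=0), emb.max(axis=0)
int8 = np.round((emb - lo) / (hi - lo) * 254 - 127).astype(np.int8)

print(emb.nbytes, binary.nbytes, int8.nbytes)  # 16384 512 4096
```

The byte counts show the 32x and 4x reductions directly; the retrieval-quality numbers (~96.4% and 99.3%) come from the blog post's benchmarks, not from this sketch.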

  • Mixedbread reposted this

    Philipp Schmid

    Technical Lead & LLMs at Hugging Face 🤗 | AWS ML HERO 🦸🏻♂️

    Introducing embedding quantization! 💥 A new technique to quantize embeddings that achieves up to 45x faster retrieval while keeping 96% accuracy on open embedding models. This will help scale RAG applications! 🚀
    𝗧𝗟;𝗗𝗥: 📝
    🔥 Binary quantization: 32x less storage & up to 45x faster retrieval, retaining ~96% performance with rescoring
    ✨ int8 quantization: 4x less memory & up to 4x faster retrieval, retaining 99.3% with rescoring
    🔎 Rescoring: use the float32 query against int8/binary documents to improve retrieval
    💰 For 250M embeddings, binary MxBai needs 29GB memory vs 953GB for float32
    ⚡ Mean speedup: 24.76x for binary, 3.66x for int8 quantization
    🤗 Available in Sentence Transformers
    🤗 Mixedbread models on Hugging Face 👉 https://lnkd.in/eciXb2bq
    Kudos to Tom Aarsen and mixedbread.ai for this incredible advancement! Embedding quantization can unlock exciting new possibilities for efficient and cost-effective retrieval at scale. 🚀

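The rescoring recipe above, shortlist with a cheap binary Hamming scan, then rescore only the shortlist at higher precision, can be sketched in NumPy. The sizes, doc IDs, and noise level here are made up for illustration, and this sketch rescores against the float32 vectors rather than int8 documents:

```python
import numpy as np

# The post's memory figures check out: 250M embeddings x 1024 dims is
# ~953 GB at float32 but ~29 GB at 1 bit per dimension.
rng = np.random.default_rng(1)
docs = rng.normal(size=(1000, 256)).astype(np.float32)  # full-precision corpus
query = docs[42] + 0.1 * rng.normal(size=256)           # query close to doc 42

docs_bin = np.packbits(docs > 0, axis=1)                # 32x smaller binary index
query_bin = np.packbits(query > 0)

# Stage 1: cheap Hamming-distance scan over the binary index.
hamming = np.unpackbits(docs_bin ^ query_bin, axis=1).sum(axis=1)
candidates = np.argsort(hamming)[:10]                   # binary top-10 shortlist

# Stage 2: rescore only the shortlist at full precision.
scores = docs[candidates] @ query.astype(np.float32)
best = int(candidates[np.argmax(scores)])
print(best)  # 42
```

The expensive dot products touch only 10 of the 1,000 documents; the binary scan does the bulk of the work on a structure 32x smaller than the original matrix, which is where the reported speedups come from.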
  • Mixedbread

    Hugging Face 🤝 mixedbread.ai
    Imagine you walk into the bakery and you get served 25x faster while paying 32x less. 🍞 Introducing embedding quantization with mixedbread's latest flagship embedding model.
    Why This Matters:
    🏎 Lightning-Fast Retrieval: Like slicing through warm butter, our approach speeds up the retrieval process by up to 25x.
    📉 Cost and Space Savings: With quantization, you can reduce the memory, disk space, and overall cost associated with traditional retrieval methods by up to 32x. It's like fitting an entire bakery into a bread box.
    🚀 Maintained Performance: ~96.45% for binary quantization and ~99.3% for int8.
    And the best thing: It's open-source. No vendor lock-in. 🤯
    Check out how in the blog post: https://lnkd.in/dTsCvFAU
    Try it out by searching Wikipedia: https://lnkd.in/dMMWuRS4
    Our model: https://lnkd.in/d9XwcxTR

    Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval

    huggingface.co

  • Mixedbread reposted this

    Tom Aarsen

    🤗 Sentence Transformers, SetFit & NLTK maintainer, MLE @ Hugging Face

    🎉 Today, the 5000th Sentence Transformer model was uploaded to Hugging Face! Embedding models are the Swiss army knife of NLP, so it's far from surprising to see that they're still actively being trained.
    Some resources to get started:
    - All Sentence Transformer models on the Hugging Face Hub: https://lnkd.in/eZ2H5EAC
    - Sentence Transformers documentation: https://sbert.net/
    - Sentence Transformer models via LlamaIndex: https://lnkd.in/eA_a2qkR
    - Sentence Transformer models via LangChain: https://lnkd.in/ekzPJ9HG
    - Sentence Transformer models via Haystack by deepset: https://lnkd.in/eaJjG258
    - Massive Text Embedding Benchmark (MTEB) leaderboard: https://lnkd.in/gdJ3svZF
    The embedding space is more active than ever, with exciting new releases every single month, so be sure to keep an eye on the trending Sentence Transformer models to stay up to date.
    On a personal note, I'm quite curious whether you've ever used Sentence Transformers via a third-party Python module, like a RAG framework or vector database. Do let me know, as I'm quite interested in further integrations 🤗!
    #python #nlp #embeddings #huggingface #sentencetransformers

  • Mixedbread reposted this

    Stefano Fiorucci

    Contributing to Haystack, the LLM Framework 🏗️ | NLP Engineer, Craftsman and Explorer 🧭

    🧭 𝐂𝐡𝐨𝐨𝐬𝐢𝐧𝐠 𝐚𝐧 𝐞𝐦𝐛𝐞𝐝𝐝𝐢𝐧𝐠 𝐢𝐧𝐟𝐞𝐫𝐞𝐧𝐜𝐞 𝐬𝐨𝐥𝐮𝐭𝐢𝐨𝐧 for open models
    These days we have many good open embedding models - think, for example, of the models released by mixedbread.ai a few days ago. There are also several libraries to use/serve them. Navigating this landscape can be complex, so let's explore together (thanks to Luca Santuari for the question). 👇
    ⭐ 𝐒𝐞𝐧𝐭𝐞𝐧𝐜𝐞 𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬 A cornerstone library for computing text embeddings. Most of the embedding models available on Hugging Face are compatible with it. Originally developed by the Ubiquitous Knowledge Processing (UKP) Lab and Nils Reimers, it was recently revamped by Hugging Face and is maintained by Tom Aarsen. 💙 Python library; it depends on PyTorch and may not be the most efficient or fast. Runs best on 𝐆𝐏𝐔.
    🚀 𝐇𝐅 𝐓𝐞𝐱𝐭 𝐄𝐦𝐛𝐞𝐝𝐝𝐢𝐧𝐠𝐬 𝐈𝐧𝐟𝐞𝐫𝐞𝐧𝐜𝐞 A toolkit for deploying and serving open-source text embedding models. Very fast and efficient: based on the Rust candle framework. Runs via Docker and supports both 𝐂𝐏𝐔 and 𝐆𝐏𝐔. Compatible with several Sentence Transformers model architectures. Not a fully liberal open-source license.
    🤗 𝐇𝐅 𝐎𝐩𝐭𝐢𝐦𝐮𝐦 An extension of Transformers that provides a set of performance optimization tools to train and run models with maximum efficiency. It supports specialized hardware options from various vendors (Nvidia, Intel...) and the cross-platform ONNX runtime. This toolkit can also be used to compute embeddings with different, efficient 𝐂𝐏𝐔 and 𝐆𝐏𝐔 options.
    ⚡️ 𝐅𝐚𝐬𝐭𝐄𝐦𝐛𝐞𝐝 Originally developed by Nirant Kasliwal and maintained by Qdrant, this library provides fast and efficient embedding generation. Easy to use. It is based on the ONNX runtime and runs on 𝐂𝐏𝐔. Supports a limited but growing selection of models.
    🦙 𝐎𝐥𝐥𝐚𝐦𝐚 A very popular library for LLM serving on standard machines, using the GGUF quantized format. Recently improved support for embedding models. It uses 𝐂𝐏𝐔 + 𝐆𝐏𝐔 if available. The embedding functionality is still immature compared to the previous solutions, but it might make sense if you already use Ollama for generative language models.
    You know what? 😉 All of these solutions are supported by the #haystack LLM framework 👉 https://lnkd.in/d8thsQ9a
    A special mention goes to ♾️ Infinity, by Michael Feil, which I have not tried yet but looks great!
    💬 Have I missed significant solutions? What is your experience with these libraries? Let me know in the comments!
    #nlp #naturallanguageprocessing #transformers #llm

    Embedders

    docs.haystack.deepset.ai

Similar pages

Funding

Mixedbread · 1 funding round in total

Last round

Pre-Seed

US$855,000.00

More information on Crunchbase