Explosion

Software Development

Developer tools and tailored solutions for AI and Natural Language Processing. Makers of spaCy and Prodigy.

About us

Explosion is a software company specializing in developer tools and tailored solutions for Artificial Intelligence and Natural Language Processing. We're the makers of spaCy, one of the leading open-source libraries for Natural Language Processing, and Prodigy, a modern annotation tool for creating training data for machine learning models.

Website: https://explosion.ai
Industry: Software Development
Company size: 11-50 employees
Headquarters: Berlin
Type: Privately Held
Founded: 2016
Specialties: Artificial Intelligence, Natural Language Processing, Machine Learning, Machine Teaching, and Consulting

Updates

  • Explosion reposted this

    Daniel Dominguez

    Managing Partner at SamXLabs - InfoQ Editor

    Check out my latest InfoQ w/ Ines Montani about the transformative power of open-source in AI! From democratizing tech to creating task-specific models, discover how open-source is shaping the future of AI and ensuring transparency, privacy, and innovation. #AI #OpenSource Read more here: https://lnkd.in/e477irds

    The AI Revolution Will Not Be Monopolized

    infoq.com

  • Explosion reposted this

    Ines Montani

    Founder at Explosion (spaCy, Prodigy)

    10 years ago today Matthew Honnibal pushed the first commit to spaCy 🎉 Since then, the library has evolved as the field moved forward, but also stayed true to its core mission: industrial-strength #NLP and bringing structure to unstructured text. It's not always been easy building an open-source company while staying as independent and self-sufficient as possible and without compromising on our vision. There was a lot of trial and error and exploring new paths. I'm incredibly grateful for the supportive open-source community, the amazing team that helped us work on the library over the years and developers across so many different industries putting their trust in our stack and building on top of it. Thank you all 💙

  • Explosion reposted this

    Allison Pike

    Co-founder at Infield

    This week for Once a Maintainer we spoke with NLP expert Sofie Van Landeghem, core maintainer of the popular open source NLP library spaCy. Sofie shared her thoughts on the evolution of NLP as applied to text mining for industry, how spaCy manages its roadmap, and why dependency management in python is so difficult. https://lnkd.in/eW55VZbe

    Once a Maintainer: Sofie Van Landeghem

    onceamaintainer.substack.com

  • Explosion reposted this

    Ines Montani

    Founder at Explosion (spaCy, Prodigy)

    ⚗️ New blog post: A practical guide to human-in-the-loop distillation

    How to use state-of-the-art #LLMs in real-world applications and distill their knowledge into smaller and faster components you can run and maintain in-house. Summary below 👇 https://lnkd.in/eKAASN9H

    1️⃣ Software in industry
    It’s important to keep in mind that AI development is still software development. We need modular, transparent, explainable, data-private, reliable & affordable solutions. This challenges black-box models and third-party APIs. But we can change that!

    2️⃣ Strategies for applied #NLP
    Many real-world problems need structured data & knowledge about language and the world. LLMs are very good at this. But in-context learning doesn't mean transfer learning is obsolete – it's very competitive! Even more so if we control the data.

    3️⃣ LLMs for predictive tasks
    With in-context learning, we start with a prompt template & corresponding parser to turn raw text into task-specific structured data. We can then use that output as training data with a human in the loop to correct errors and do better than the LLM.

    4️⃣ Close gap between prototype and production
    Many projects face the prototype plateau and never make it to production. This is often a workflow problem. We should standardize inputs and outputs, have a robust evaluation, assess utility (not just accuracy) and work iteratively.

    5️⃣ Structured data
    spacy-llm provides LLM-powered components for tasks like NER or relation extraction and the same structured output during prototyping and production – whether you ship the LLM pipeline or replace components with distilled models. https://lnkd.in/e4mKebgH

    6️⃣ Human-in-the-loop distillation
    The goal is to create gold-standard data by extracting only the information we’re interested in, until the accuracy of the task-specific model exceeds the zero-/few-shot LLM baseline. Prodigy can help streamline this! https://prodi.gy

    7️⃣ Case studies
    1. https://lnkd.in/e4F_p4id
    2. https://lnkd.in/ecBCC2Gh

    8️⃣ Think of it as a refactoring process
    Just like with software, we need to break down larger problems and balance trade-offs between complexity and quality. We should also reassess dependencies: Can we replace larger models? Can we move a dependency from runtime to development?

    9️⃣ Making problems easier
    You're allowed to make problems easier! This isn't university or a contest to solve the most difficult problems. Less operational complexity means less can go wrong. But getting there isn't trivial; it's a skill that develops through experience.

    🔄 Conclusion
    In many real-world applications, AI models are going to be one function in a larger system. Refactoring & iteration are important. The right tools can get you past the prototype plateau. And there's no need to compromise on development best practices & data privacy!
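Point 3️⃣ in the post above pairs a prompt template with a parser that turns raw LLM output into task-specific structured data. A minimal sketch of that idea in plain Python follows. It assumes a hypothetical line-based answer format ("LABEL: text span"); the template, the `Entity` class, and the `build_prompt` and `parse_response` helpers are illustrative names only and are not the spacy-llm API.

```python
import re
from dataclasses import dataclass

# Hypothetical prompt template; the answer format is an assumption for illustration.
PROMPT_TEMPLATE = (
    "Extract all entities of the types {labels} from the text below.\n"
    "Answer with one line per entity in the form LABEL: exact text span\n\n"
    "Text: {text}\n"
)


@dataclass(frozen=True)
class Entity:
    label: str
    start: int  # character offset into the original text
    end: int


def build_prompt(text, labels):
    """Fill the template with the label set and the raw input text."""
    return PROMPT_TEMPLATE.format(labels=", ".join(labels), text=text)


def parse_response(response, text, labels):
    """Turn a raw LLM answer into character-offset entities.

    Lines with unknown labels, or spans that do not occur in the text,
    are dropped so malformed output never reaches downstream code.
    """
    entities = []
    for line in response.splitlines():
        match = re.match(r"^\s*([A-Z_]+)\s*:\s*(.+?)\s*$", line)
        if not match:
            continue
        label, span = match.groups()
        if label not in labels:
            continue
        start = text.find(span)  # first occurrence only, for simplicity
        if start == -1:
            continue
        entities.append(Entity(label, start, start + len(span)))
    return entities
```

Because the parser returns the same structure regardless of which model produced the answer, its output could be corrected by a human and reused as training data for a smaller model, which is the workflow the post describes.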

  • Explosion reposted this

    Sofie Van Landeghem

    NLP and ML expert and freelancer at OxyKodit

    Even with today's impressive zero-shot LLM capabilities, the success of any NLP project can be predicted by the quality of the data it's built on. First and foremost, you need a representative evaluation data set to measure progress and performance throughout the development of your NLP pipeline. Further, once you get to production, you'll need to balance accuracy versus a whole range of other performance measures, such as reliability, reproducibility, interpretability, inference speed and compute power - to name just a few. You might just find that training a smaller, specialized supervised model is more cost efficient and robust than running a black-box LLM.

    Either way: data should be front and center throughout the development of your NLP pipeline. But as we all know: bias can creep in. I urge any data scientist to not just look at the performance numbers, but actually dig into the data itself, understand how it was selected for processing in the first place, and whether the data you've collected is actually representative of the use-cases you'll want to apply your model to. This may seem like a given, but in many cases there's actually a mismatch between the two, and assumptions about data and data processing pipelines are often proven false halfway through the project.

    Bias can also creep into your evaluation. You might be artificially boosting your numbers by accidentally leaking information from training to test -- this happens in more ways than you can imagine! Setting up an "extrinsic" evaluation will also help you better understand the final target of your NLP pipeline and how it should integrate with downstream requirements. This allows you to step back from the nitty-gritty details of training your ML/NLP model, and keep the bigger picture in mind while iterating over your data model, NLP algorithms and overall solution.

    At PyData London, I talked through some use-cases, inspired by various consulting projects from the last decade, and compiled a list of recommendations for running an ML project - focusing mostly on data and evaluation. Here's my (biased) list of recommendations, taken from 15+ years of experience in the field:

    - Avoid selection bias by formalizing the selection procedure
    - Create deterministic, document-level train/dev/test splits
    - Carefully design the data model / label scheme
    - Write up detailed data guidelines
    - Set up a meaningful extrinsic evaluation
    - Look at inter-annotator agreement stats and plot a learning curve
    - Apply a preliminary model back to the training data
    - Manually inspect gold annotations and incorrect predictions
    - Make sure you're climbing the right hill
    - Data quality should be front and center!

    More details and explanations are provided in the full presentation which you can watch over at https://lnkd.in/e42g3KXi. The slide deck is over at https://lnkd.in/e4yqfVCj. Happy data mining!
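One recommendation in the post above, deterministic document-level train/dev/test splits, can be sketched in a few lines of Python. This is only one possible approach and is not taken from the talk: it hashes a document ID into a bucket, and the function names, the `salt` parameter and the 80/10/10 proportions are assumptions made for illustration.

```python
import hashlib


def assign_split(doc_id, dev_pct=10, test_pct=10, salt="v1"):
    """Map a document ID to 'train', 'dev' or 'test' deterministically.

    The same ID (and salt) always lands in the same split, regardless of
    processing order or how many other documents exist.
    """
    digest = hashlib.sha256(f"{salt}:{doc_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + dev_pct:
        return "dev"
    return "train"


def split_examples(examples):
    """Group (doc_id, sentence) pairs by split.

    The split is decided per document, so all sentences from one document
    stay together and cannot straddle train and test.
    """
    splits = {"train": [], "dev": [], "test": []}
    for doc_id, sentence in examples:
        splits[assign_split(doc_id)].append((doc_id, sentence))
    return splits
```

Deciding the split from the document ID rather than from a random shuffle of sentences guards against one common form of the train-to-test leakage the post warns about, and adding new documents later does not move existing ones between splits.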

    • Picture of Sofie giving a presentation at PyData London. The slide is a summary slide containing all the recommendations for running an ML project - focusing on best practices for data and evaluation.
  • Explosion reposted this

    Ines Montani

    Founder at Explosion (spaCy, Prodigy)

    Here's my PyData London talk "Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation". Including:

    ⚗️ how to go from #LLMs to distilled task-specific models
    🌎 real-world case studies with very cool results
    📉 how to avoid the "prototype plateau" when building AI systems
    ✊ why you don't have to compromise on development best practices & data privacy

    Slides: https://lnkd.in/gEHqb_ft
    Video: https://lnkd.in/gQsnp2Mk

  • Explosion reposted this

    Patrick Arnecke

    Data Scientist AI + Machine Learning, Kanton Zürich. Senior Project Manager. Fachexperte für digitale Produktentwicklung und Technologie.

    Easily Simplify Complex Texts to #Plain #Language with #LLMs?

    We have just open-sourced a prototype app that simplifies complex administrative communication into drafts in Plain Language («Einfache Sprache» or «Leichte Sprache» in German). https://lnkd.in/eDJ9uKpb

    Several dozen employees from the cantonal administration in Zurich have successfully tested the app over the past few months with hundreds of text examples, yielding very promising results. To quantitatively assess these results, we developed an understandability index to measure the – you guessed it – understandability of the texts. The app is currently set up for German but can be easily adapted to other languages.

    Some technical notes

    ⚡️ The app is built with Streamlit, which I personally use a lot and highly recommend. Streamlit is super simple and lightweight, yet very powerful and flexible. It is very well documented and has an active community. Streamlit has helped us tremendously to iterate and try out numerous functions and UI layouts extremely quickly and efficiently.

    ⚡️ For all things NLP we rely on the excellent spaCy library from Explosion, given its distinctive reliability, enormous flexibility, and super comprehensive documentation.

    ⚡️ Additionally, the perhaps lesser-known #TextDescriptives deserves special mention. This package makes it super easy to extract dozens of useful text metrics, crucial for setting up our understandability index. The documentation is well done, and the creators are very responsive on GitHub: https://lnkd.in/epT2yPCJ

    GitHub - machinelearningZH/simply-simplify-language: Use machine learning to make your institutional communication more understandable and inclusive.

    github.com

  • Explosion

    15,277 followers

    📝 Out now: We’ve talked to Christopher Ewen at S&P Global about how their team shipped impressive #NLP pipelines for real-time commodities trading insights in a high-security environment, with #LLMs in the loop & a 10× speed-up of their data workflows! https://lnkd.in/ekGYG3FN

    ---

    Follow Explosion for updates: https://lnkd.in/e4MazbgB
    We offer NLP services, too! ✨ – https://lnkd.in/e7_s6vKK
    Subscribe to our newsletter and receive all our highlights in a compact format – http://eepurl.com/hZyFbf
    Prodigy: a modern scriptable annotation tool for machine learning – https://prodi.gy

  • Explosion reposted this

    Ines Montani

    Founder at Explosion (spaCy, Prodigy)

    Always a pleasure being back at PyData London! 💂🐍 Highlights include:

    ▪️ Rebecca Bilbro, PhD presenting a retrospective of 15 years in data science, mistakes made, good and bad ideas and why many projects never made it to production
    ▪️ giving my talk on taking #LLMs out of the black box and practical human-in-the-loop distillation, plus great discussions with the community – and so refreshing to hear from many developers who came to similar conclusions independently in their work
    ▪️ catching up with Christopher Ewen, whose team shipped impressively small and fast spaCy pipelines with human-in-the-loop distillation – stay tuned for the case study 👀
    ▪️ visiting the S&P Global office and seeing commodities trading and price reporting in real life
    ▪️ Tania Allard, PhD delivering a great keynote on building and sustaining successful open-source projects – pretty much every slide had me nodding along and the talk perfectly summed up our #OSS goals and what we strive for when we build tools
    ▪️ Sofie Van Landeghem talking about data, structural biases, evaluation and common pitfalls in applied #ML projects, and how to avoid them – I'm obviously biased, too, but SO many good points and real-world examples!
    ▪️ John Sandall getting his violin out to demonstrate ML-generated clusters of folk music 🎻
    ▪️ chatting to so many new people and old conference friends about #GenAI and applied NLP in the post-hype hangover world and planning some new exciting case studies with spaCy and Prodigy I'll hopefully get to write up soon

    Will share my slides and more resources shortly! Also thanks to Lynn Cherny for the great photo of my slides ✨

Funding

Explosion: 1 total round
Last round: Series A, US$ 6.0M
Investors: SignalFire