Stars
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format, and describe the UI elements present on the screen: their type, loca…
An accurate GUI element detection approach based on old-fashioned CV algorithms [Upgraded on 5/July/2021]
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
A UI-Focused Agent for Windows OS Interaction.
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through internet-like connectivity.
A programming framework for agentic AI 🤖
Visualizer for neural network, deep learning and machine learning models
The official Python library for the OpenAI API
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
A state-of-the-art open visual language model | multimodal pretrained model
Generative Agents: Interactive Simulacra of Human Behavior
[ACL 2024] AUTOACT: Automatic Agent Learning from Scratch for QA via Self-Planning
Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow
Universal LLM Deployment Engine with ML Compilation
A research Python package for detecting, categorizing, and assessing the severity of personally identifiable information (PII)
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web"
Source code for the paper "Empowering LLM to use Smartphone for Intelligent Task Automation"
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
A cross-platform GUI automation Python module for human beings. Used to programmatically control the mouse & keyboard.
Windows GUI Automation with Python (based on text properties)