Showing 1–16 of 16 results for author: Zhuge, M

  1. arXiv:2408.07317  [pdf, other]

    cs.HC

    Connecting Dreams with Visual Brainstorming Instruction

    Authors: Yasheng Sun, Bohan Li, Mingchen Zhuge, Deng-Ping Fan, Salman Khan, Fahad Shahbaz Khan, Hideki Koike

    Abstract: Recent breakthroughs in understanding the human brain have revealed its impressive ability to efficiently process and interpret human thoughts, opening up possibilities for intervening in brain signals. In this paper, we aim to develop a straightforward framework that uses other modalities, such as natural language, to translate the original dreamland. We present DreamConnect, employing a dual-str…

    Submitted 14 August, 2024; originally announced August 2024.

  2. arXiv:2407.16931  [pdf, other]

    cs.CL

    ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering

    Authors: Xiuying Chen, Tairan Wang, Taicheng Guo, Kehan Guo, Juexiao Zhou, Haoyang Li, Mingchen Zhuge, Jürgen Schmidhuber, Xin Gao, Xiangliang Zhang

    Abstract: Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth. While QA datasets are plentiful in areas like general domain and biomedicine, academic chemistry is less explored. Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into a readily understandable format. Addressing this gap, we introduce S…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 14 pages

  3. arXiv:2407.16741  [pdf, other]

    cs.SE cs.AI cs.CL

    OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

    Authors: Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig

    Abstract: Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and effect change in their surrounding environments. In this paper, we introduce OpenD…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/OpenDevin/OpenDevin

  4. arXiv:2407.12679  [pdf, other]

    cs.CV

    Goldfish: Vision-Language Understanding of Arbitrarily Long Videos

    Authors: Kirolos Ataallah, Xiaoqian Shen, Eslam Abdelrahman, Essam Sleiman, Mingchen Zhuge, Jian Ding, Deyao Zhu, Jürgen Schmidhuber, Mohamed Elhoseiny

    Abstract: Most current LLM-based models for video understanding can process videos within minutes. However, they struggle with lengthy videos due to challenges such as "noise and redundancy", as well as "memory and computation" constraints. In this paper, we present Goldfish, a methodology tailored for comprehending videos of arbitrary lengths. We also introduce the TVQA-long benchmark, specifically designe…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 25 pages, 11 figures, accepted by ECCV 2024

  5. arXiv:2402.18679  [pdf, other]

    cs.AI cs.LG

    Data Interpreter: An LLM Agent For Data Science

    Authors: Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Danyang Li, Jiaqi Chen, Jiayi Zhang, Jinlin Wang, Li Zhang, Lingyao Zhang, Min Yang, Mingchen Zhuge, Taicheng Guo, Tuo Zhou, Wei Tao, Wenyi Wang, Xiangru Tang, Xiangtao Lu, Xiawu Zheng, Xinbing Liang, Yaying Fei, Yuheng Cheng, Zongze Xu, Chenglin Wu

    Abstract: Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a solution de…

    Submitted 12 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  6. arXiv:2402.16823  [pdf, other]

    cs.AI cs.CL cs.LG cs.MA

    Language Agents as Optimizable Graphs

    Authors: Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber

    Abstract: Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs c…

    Submitted 22 August, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Project Website: https://gptswarm.org ; Github Repo: https://github.com/metauto-ai/gptswarm . In Forty-first International Conference on Machine Learning (2024)

  7. arXiv:2308.07795  [pdf, other]

    cs.CV cs.AI

    Learning to Identify Critical States for Reinforcement Learning from Videos

    Authors: Haozhe Liu, Mingchen Zhuge, Bing Li, Yuhui Wang, Francesco Faccio, Bernard Ghanem, Jürgen Schmidhuber

    Abstract: Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data which lack explicit information about executed actions. For example, videos of humans or robots may convey a lot of implicit information about rewarding action sequences, but a DRL machine that wants to profit from watching such videos must first lear…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: This paper was accepted to ICCV23

  8. arXiv:2308.00352  [pdf, other]

    cs.AI cs.MA

    MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

    Authors: Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, Jürgen Schmidhuber

    Abstract: Remarkable progress has been made on automated problem solving through societies of agents based on large language models (LLMs). Existing LLM-based multi-agent systems can already solve simple dialogue tasks. Solutions to more complex tasks, however, are complicated through logic inconsistencies due to cascading hallucinations caused by naively chaining LLMs. Here we introduce MetaGPT, an innovat…

    Submitted 6 November, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

  9. arXiv:2305.17066  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.MA

    Mindstorms in Natural Language-Based Societies of Mind

    Authors: Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, et al. (1 additional author not shown)

    Abstract: Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overco…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: 9 pages in main text + 7 pages of references + 38 pages of appendices, 14 figures in main text + 13 in appendices, 7 tables in appendices

    MSC Class: 68T07 ACM Class: I.2.6; I.2.11

  10. arXiv:2302.00952  [pdf, other]

    cs.CV cs.AI

    QR-CLIP: Introducing Explicit Open-World Knowledge for Location and Time Reasoning

    Authors: Weimin Shi, Mingchen Zhuge, Dehong Gao, Zhong Zhou, Ming-Ming Cheng, Deng-Ping Fan

    Abstract: Daily images may convey abstract meanings that require us to memorize and infer profound information from them. To encourage such human-like reasoning, in this work, we teach machines to predict where and when it was taken rather than performing basic tasks like traditional segmentation or classification. Inspired by Horn's QR theory, we designed a novel QR-CLIP model consisting of two components:…

    Submitted 28 June, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: Technical Report. Github: https://github.com/Shi-Wm/QR-CLIP

  11. Masked Vision-Language Transformer in Fashion

    Authors: Ge-Peng Ji, Mingcheng Zhuge, Dehong Gao, Deng-Ping Fan, Christos Sakaridis, Luc Van Gool

    Abstract: We present a masked vision-language transformer (MVLT) for fashion-specific multi-modal representation. Technically, we simply utilize vision transformer architecture for replacing the BERT in the pre-training model, making MVLT the first end-to-end framework for the fashion domain. Besides, we designed masked image reconstruction (MIR) for a fine-grained understanding of fashion. MVLT is an exten…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted by Machine Intelligence Research (2023)

    Journal ref: Machine Intelligence Research. 20, 421-434 (2023)

  12. arXiv:2203.03990  [pdf, other]

    cs.CV

    Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs

    Authors: Jingfei Xia, Mingchen Zhuge, Tiantian Geng, Shun Fan, Yuantai Wei, Zhenyu He, Feng Zheng

    Abstract: Figure skating scoring is challenging because it requires judging the technical moves of the players as well as their coordination with the background music. Most learning-based methods cannot solve it well for two reasons: 1) each move in figure skating changes quickly, hence simply applying traditional frame sampling will lose a lot of valuable information, especially in 3 to 5 minutes long vide…

    Submitted 17 December, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Our code is available at https://github.com/AndyFrancesco29/Audio-Visual-Figure-Skating

  13. Fast Camouflaged Object Detection via Edge-based Reversible Re-calibration Network

    Authors: Ge-Peng Ji, Lei Zhu, Mingchen Zhuge, Keren Fu

    Abstract: Camouflaged Object Detection (COD) aims to detect objects with similar patterns (e.g., texture, intensity, colour, etc.) to their surroundings, and recently has attracted growing research interest. As camouflaged objects often present very ambiguous boundaries, how to determine object locations as well as their weak boundaries is challenging and also the key to this task. Inspired by the biological…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: 35 pages, 7 figures, 5 tables (Accepted by Pattern Recognition 2022)

    Journal ref: Pattern Recognition 123 (2022): 108414

  14. arXiv:2103.16110  [pdf, other]

    cs.CV

    Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

    Authors: Mingchen Zhuge, Dehong Gao, Deng-Ping Fan, Linbo Jin, Ben Chen, Haoming Zhou, Minghui Qiu, Ling Shao

    Abstract: We present a new vision-language (VL) pre-training model dubbed Kaleido-BERT, which introduces a novel kaleido strategy for fashion cross-modality representations from transformers. In contrast to the random masking strategy of recent VL models, we design alignment guided masking to jointly focus more on image-text semantic relations. To this end, we carry out five novel tasks, i.e., rotation, jigsaw,…

    Submitted 15 April, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: CVPR2021 Accepted. Code: https://github.com/mczhuge/Kaleido-BERT

  15. arXiv:2101.07663  [pdf, other]

    cs.CV

    Salient Object Detection via Integrity Learning

    Authors: Mingchen Zhuge, Deng-Ping Fan, Nian Liu, Dingwen Zhang, Dong Xu, Ling Shao

    Abstract: Although current salient object detection (SOD) works have achieved significant progress, they are limited when it comes to the integrity of the predicted salient regions. We define the concept of integrity at both a micro and macro level. Specifically, at the micro level, the model should highlight all parts that belong to a certain salient object. Meanwhile, at the macro level, the model needs t…

    Submitted 13 June, 2022; v1 submitted 19 January, 2021; originally announced January 2021.

    Comments: TPAMI accepted

  16. arXiv:2101.05687  [pdf, other]

    cs.CV

    Towards Accurate Camouflaged Object Detection with Mixture Convolution and Interactive Fusion

    Authors: Geng Chen, Xinrui Chen, Bo Dong, Mingchen Zhuge, Yongxiong Wang, Hongbo Bi, Jian Chen, Peng Wang, Yanning Zhang

    Abstract: Camouflaged object detection (COD), which aims to identify the objects that conceal themselves into the surroundings, has recently drawn increasing research efforts in the field of computer vision. In practice, the success of deep learning based COD is mainly determined by two key factors, including (i) A significantly large receptive field, which provides rich context information, and (ii) An eff…

    Submitted 19 July, 2024; v1 submitted 14 January, 2021; originally announced January 2021.