Showing 1–34 of 34 results for author: Xin, H

  1. arXiv:2408.08152  [pdf, other]

    cs.CL cs.AI cs.LG cs.LO

    DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

    Authors: Huajian Xin, Z. Z. Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, Wenjun Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, Z. F. Wu, Fuli Luo, Chong Ruan

    Abstract: We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-…

    Submitted 15 August, 2024; originally announced August 2024.

  2. arXiv:2406.11131  [pdf, other]

    cs.CL cs.AI cs.DB

    Are Large Language Models a Good Replacement of Taxonomies?

    Authors: Yushi Sun, Hao Xin, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, Lei Chen

    Abstract: Large language models (LLMs) demonstrate an impressive ability to internalize knowledge and answer natural language questions. Although previous studies validate that LLMs perform well on general knowledge while presenting poor performance on long-tail nuanced knowledge, the community is still doubtful about whether the traditional knowledge graphs should be replaced by LLMs. In this paper, we ask…

    Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted by VLDB 2024

  3. arXiv:2406.09467  [pdf, other]

    cs.HC

    "I see it as a wellspring for my positive and upward journey in life.": Understanding Current Practices of Assistive Technology's Customized Modification in China

    Authors: Kexin Yang, Junyi Wu, Haokun Xin, Jiangtao Gong

    Abstract: Due to the significant differences in physical conditions and living environments of people with disabilities, standardized assistive technologies (ATs) often fail to meet their needs. Modified AT, especially DIY (Do It Yourself) ATs, are a popular solution in many high-income countries, but there is a lack of documentation for low- and middle-income areas, especially in China, where the culture o…

    Submitted 13 June, 2024; originally announced June 2024.

    MSC Class: H.5.2

    Journal ref: CSCW2024

  4. arXiv:2406.04744  [pdf, other]

    cs.CL

    CRAG -- Comprehensive RAG Benchmark

    Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, et al. (2 additional authors not shown)

    Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate the knowledge deficiencies of Large Language Models (LLMs). Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering bench…

    Submitted 7 June, 2024; originally announced June 2024.

  5. arXiv:2406.00318  [pdf, other]

    cs.LG cs.CL cs.IR

    KGLink: A column type annotation method that combines knowledge graph and pre-trained language model

    Authors: Yubo Wang, Hao Xin, Lei Chen

    Abstract: The semantic annotation of tabular data plays a crucial role in various downstream tasks. Previous research has proposed knowledge graph (KG)-based and deep learning-based methods, each with its inherent limitations. KG-based methods encounter difficulties annotating columns when there is no match for column cells in the KG. Moreover, KG-based methods can provide multiple predictions for one colum…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: To be published in ICDE 2024

  6. arXiv:2405.14414  [pdf, other]

    cs.AI

    Proving Theorems Recursively

    Authors: Haiming Wang, Huajian Xin, Zhengying Liu, Wenda Li, Yinya Huang, Jianqiao Lu, Zhicheng Yang, Jing Tang, Jian Yin, Zhenguo Li, Xiaodan Liang

    Abstract: Recent advances in automated theorem proving leverage language models to explore expanded search spaces by step-by-step proof generation. However, such approaches are usually based on short-sighted heuristics (e.g., log probability or value function scores) that potentially lead to suboptimal or even distracting subgoals, preventing us from finding longer proofs. To address this challenge, we pro…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 21 pages, 5 figures, 3 tables

  7. arXiv:2405.14333  [pdf, other]

    cs.AI

    DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

    Authors: Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, Xiaodan Liang

    Abstract: Proof assistants like Lean have revolutionized mathematical proof verification, ensuring high accuracy and reliability. Although large language models (LLMs) show promise in mathematical reasoning, their advancement in formal theorem proving is hindered by a lack of training data. To address this issue, we introduce an approach to generate extensive Lean 4 proof data derived from high-school and u…

    Submitted 23 May, 2024; originally announced May 2024.

  8. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  9. arXiv:2403.04299  [pdf, other]

    cs.RO cs.AI

    LitSim: A Conflict-aware Policy for Long-term Interactive Traffic Simulation

    Authors: Haojie Xin, Xiaodong Zhang, Renzhi Tang, Songyang Yan, Qianrui Zhao, Chunze Yang, Wen Cui, Zijiang Yang

    Abstract: Simulation is pivotal in evaluating the performance of autonomous driving systems due to the advantages of high efficiency and low cost compared to on-road testing. Bridging the gap between simulation and the real world requires realistic agent behaviors. However, the existing works have the following shortcomings in achieving this goal: (1) log replay offers realistic scenarios but often leads to…

    Submitted 1 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: 9 pages, 6 figures, under review

  10. arXiv:2402.08957  [pdf, other]

    cs.AI cs.CL cs.FL cs.LG cs.PL

    MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data

    Authors: Yinya Huang, Xiaohan Lin, Zhengying Liu, Qingxing Cao, Huajian Xin, Haiming Wang, Zhenguo Li, Linqi Song, Xiaodan Liang

    Abstract: Recent large language models (LLMs) have witnessed significant advancement in various tasks, including mathematical reasoning and theorem proving. As these two tasks require strict and formal multi-step inference, they are appealing domains for exploring the reasoning ability of LLMs but still face important challenges. Previous studies such as Chain-of-Thought (CoT) have revealed the effectivenes…

    Submitted 22 May, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Journal ref: ICLR 2024 spotlight

  11. arXiv:2312.09968  [pdf]

    cond-mat.mtrl-sci cs.CV

    Human Perception-Inspired Grain Segmentation Refinement Using Conditional Random Fields

    Authors: Doruk Aksoy, Huolin L. Xin, Timothy J. Rupert, William J. Bowman

    Abstract: Accurate segmentation of interconnected line networks, such as grain boundaries in polycrystalline material microstructures, poses a significant challenge due to the fragmented masks produced by conventional computer vision algorithms, including convolutional neural networks. These algorithms struggle with thin masks, often necessitating intricate post-processing for effective contour closure and…

    Submitted 15 December, 2023; originally announced December 2023.

  12. arXiv:2310.00656  [pdf, other]

    cs.AI

    LEGO-Prover: Neural Theorem Proving with Growing Libraries

    Authors: Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, Xiaodan Liang

    Abstract: Despite the success of large language models (LLMs), the task of theorem proving still remains one of the hardest reasoning tasks that is far from being fully solved. Prior methods using language models have demonstrated promising results, but they still struggle to prove even middle school level theorems. One common limitation of these methods is that they assume a fixed theorem library during th…

    Submitted 27 October, 2023; v1 submitted 1 October, 2023; originally announced October 2023.

  13. arXiv:2309.15806  [pdf, other]

    cs.CL cs.AI

    Lyra: Orchestrating Dual Correction in Automated Theorem Proving

    Authors: Chuanyang Zheng, Haiming Wang, Enze Xie, Zhengying Liu, Jiankai Sun, Huajian Xin, Jianhao Shen, Zhenguo Li, Yu Li

    Abstract: Large Language Models (LLMs) present an intriguing avenue for exploration in the field of formal theorem proving. Nevertheless, their full potential, particularly concerning the mitigation of hallucinations and refinement through prover error messages, remains an area that has yet to be thoroughly investigated. To enhance the effectiveness of LLMs in the field, we introduce Lyra, a new framewo…

    Submitted 24 August, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted to TMLR: https://openreview.net/forum?id=9Z0yB8rmQ2

  14. arXiv:2309.04295  [pdf, other]

    cs.AI

    FIMO: A Challenge Formal Dataset for Automated Theorem Proving

    Authors: Chengwu Liu, Jianhao Shen, Huajian Xin, Zhengying Liu, Ye Yuan, Haiming Wang, Wei Ju, Chuanyang Zheng, Yichun Yin, Lin Li, Ming Zhang, Qun Liu

    Abstract: We present FIMO, an innovative dataset comprising formal mathematical problem statements sourced from the International Mathematical Olympiad (IMO) Shortlisted Problems. Designed to facilitate advanced automated theorem proving at the IMO level, FIMO is currently tailored for the Lean formal language. It comprises 149 formal problem statements, accompanied by both informal problem descriptions and…

    Submitted 5 December, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: Added a hyperlink to the dataset made accessible on GitHub

  15. arXiv:2306.05029  [pdf, other]

    cs.CV cs.LG

    Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification

    Authors: Ruijie Zhang, Qiaozhe Zhang, Yingzhuang Liu, Hao Xin, Yan Liu, Xinggang Wang

    Abstract: Whole slide image (WSI) refers to a type of high-resolution scanned tissue image, which is extensively employed in computer-assisted diagnosis (CAD). The extremely high resolution and limited availability of region-level annotations make employing deep learning methods for WSI-based digital diagnosis challenging. Recently integrating multiple instance learning (MIL) and Transformer for WSI analysi…

    Submitted 5 September, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

  16. arXiv:2305.18891  [pdf, other]

    cs.CV cs.AI cs.MM

    EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation

    Authors: Xingqun Qi, Chen Liu, Lincheng Li, Jie Hou, Haoran Xin, Xin Yu

    Abstract: Generating vivid and diverse 3D co-speech gestures is crucial for various applications in animating virtual avatars. While most existing methods can generate gestures from audio directly, they usually overlook that emotion is one of the key factors of authentic co-speech gesture generation. In this work, we propose EmotionGesture, a novel framework for synthesizing vivid and diverse emotional co-s…

    Submitted 3 January, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Under review

  17. arXiv:2304.01184  [pdf, other]

    cs.CV

    WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation

    Authors: Lianghui Zhu, Yingyue Li, Jiemin Fang, Yan Liu, Hao Xin, Wenyu Liu, Xinggang Wang

    Abstract: This paper explores the properties of the plain Vision Transformer (ViT) for Weakly-supervised Semantic Segmentation (WSSS). The class activation map (CAM) is of critical importance for understanding a classification network and launching WSSS. We observe that different attention heads of ViT focus on different image areas. Thus a novel weight-based method is proposed to end-to-end estimate the im…

    Submitted 26 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: 20 pages, 11 figures

  18. arXiv:2303.17607  [pdf]

    cs.LG cs.AI quant-ph

    Machine learning for discovering laws of nature

    Authors: Lizhi Xin, Kevin Xin, Houwen Xin

    Abstract: Based on Darwin's natural selection, we developed "machine scientists" to discover the laws of nature by learning from raw data. "Machine scientists" construct physical theories by applying a logic tree (state Decision Tree) and a value tree (observation Function Tree); the logical tree determines the state of the entity, and the value tree determines the absolute value between the two observation…

    Submitted 8 July, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

  19. arXiv:2210.11657  [pdf]

    cond-mat.mtrl-sci cs.AI cs.LG physics.chem-ph

    MnEdgeNet -- Accurate Decomposition of Mixed Oxidation States for Mn XAS and EELS L2,3 Edges without Reference and Calibration

    Authors: Huolin L. Xin, Mike Hu

    Abstract: Accurate decomposition of the mixed Mn oxidation states is highly important for characterizing the electronic structures, charge transfer, and redox centers for electronic, electrocatalytic, and energy storage materials that contain Mn. Electron energy loss spectroscopy (EELS) and soft X-ray absorption spectroscopy (XAS) measurements of the Mn L2,3 edges are widely used for this purpose. To date,…

    Submitted 20 October, 2022; originally announced October 2022.

  20. arXiv:2210.09024  [pdf]

    eess.IV cond-mat.mtrl-sci cs.CV

    Periodic Artifact Reduction in Fourier transforms of Full Field Atomic Resolution Images

    Authors: Robert Hovden, Yi Jiang, Huolin L. Xin, Lena F. Kourkoutis

    Abstract: The discrete Fourier transform is among the most routine tools used in high-resolution scanning / transmission electron microscopy (S/TEM). However, when calculating a Fourier transform, periodic boundary conditions are imposed and sharp discontinuities between the edges of an image cause a cross patterned artifact along the reciprocal space axes. This artifact can interfere with the analysis of r…

    Submitted 14 October, 2022; originally announced October 2022.

    Journal ref: Microscopy and Microanalysis, 21(2), 436-441 (2015)

  21. arXiv:2209.13026  [pdf]

    cond-mat.mtrl-sci cs.LG eess.SP physics.chem-ph

    Electron energy loss spectroscopy database synthesis and automation of core-loss edge recognition by deep-learning neural networks

    Authors: Lingli Kong, Zhengran Ji, Huolin L. Xin

    Abstract: The ionization edges encoded in the electron energy loss spectroscopy (EELS) spectra enable advanced material analysis including composition analyses and elemental quantifications. The development of the parallel EELS instrument and fast, sensitive detectors has greatly improved the acquisition speed of EELS spectra. However, the traditional way of core-loss edge recognition is experience based a…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 23 pages, 6 figures

  22. Intelligent Electric Vehicle Charging Recommendation Based on Multi-Agent Reinforcement Learning

    Authors: Weijia Zhang, Hao Liu, Fan Wang, Tong Xu, Haoran Xin, Dejing Dou, Hui Xiong

    Abstract: Electric Vehicle (EV) has become a preferable choice in the modern transportation system due to its environmental and energy sustainability. However, in many large cities, EV drivers often fail to find the proper spots for charging, because of the limited charging infrastructures and the spatiotemporally unbalanced charging demands. Indeed, the recent emergence of deep reinforcement learning provi…

    Submitted 15 February, 2021; originally announced February 2021.

    Comments: 12 pages, 10 figures

  23. arXiv:2101.12555  [pdf, other]

    cs.IR cs.LG

    Out-of-Town Recommendation with Travel Intention Modeling

    Authors: Haoran Xin, Xinjiang Lu, Tong Xu, Hao Liu, Jingjing Gu, Dejing Dou, Hui Xiong

    Abstract: Out-of-town recommendation is designed for those users who leave their home-town areas and visit the areas they have never been to before. It is challenging to recommend Point-of-Interests (POIs) for out-of-town users since the out-of-town check-in behavior is determined by not only the user's home-town preference but also the user's travel intention. Besides, the user's travel intentions are comp…

    Submitted 6 February, 2021; v1 submitted 29 January, 2021; originally announced January 2021.

    Comments: Accepted by AAAI-2021

  24. arXiv:2012.09093  [pdf]

    cond-mat.mtrl-sci cs.AI cs.CV

    TEMImageNet Training Library and AtomSegNet Deep-Learning Models for High-Precision Atom Segmentation, Localization, Denoising, and Super-Resolution Processing of Atomic-Resolution Images

    Authors: Ruoqian Lin, Rui Zhang, Chunyang Wang, Xiao-Qing Yang, Huolin L. Xin

    Abstract: Atom segmentation and localization, noise reduction and deblurring of atomic-resolution scanning transmission electron microscopy (STEM) images with high precision and robustness is a challenging task. Although several conventional algorithms, such as thresholding, edge detection and clustering, can achieve reasonable performance in some predefined scenarios, they tend to fail when interferences…

    Submitted 20 February, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

  25. arXiv:2008.02076  [pdf, other]

    cs.LG cs.CR

    Attacking and Defending Machine Learning Applications of Public Cloud

    Authors: Dou Goodman, Hao Xin

    Abstract: Adversarial attack breaks the boundaries of traditional security defense. For adversarial attack and the characteristics of cloud services, we propose Security Development Lifecycle for Machine Learning applications, i.e., SDL for ML. The SDL for ML helps developers build more secure software by reducing the number and severity of vulnerabilities in ML-as-a-service, while reducing development cost…

    Submitted 27 July, 2020; originally announced August 2020.

    Comments: arXiv admin note: text overlap with arXiv:1704.05051 by other authors

  26. arXiv:2005.04088  [pdf, other]

    cs.LG stat.ML

    Automatic Cross-Domain Transfer Learning for Linear Regression

    Authors: Liu Xinshun, He Xin, Mao Hui, Liu Jing, Lai Weizhong, Ye Qingwen

    Abstract: Transfer learning research attempts to make model induction transferable across different domains. This method assumes that specific information regarding to which domain each instance belongs is known. This paper helps to extend the capability of transfer learning for linear regression problems to situations where the domain information is uncertain or unknown; in fact, the framework can be exten…

    Submitted 8 May, 2020; originally announced May 2020.

  27. arXiv:2001.05574  [pdf, other]

    cs.LG cs.CR stat.ML

    Advbox: a toolbox to generate adversarial examples that fool neural networks

    Authors: Dou Goodman, Hao Xin, Wang Yang, Wu Yuesheng, Xiong Junfeng, Zhang Huan

    Abstract: In recent years, neural networks have been extensively deployed for computer vision tasks, particularly visual classification problems, where new algorithms are reported to achieve or even surpass human performance. Recent studies have shown that they are all vulnerable to the attack of adversarial examples. Small and often imperceptible perturbations to the input images are sufficient to fool the…

    Submitted 26 August, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

  28. arXiv:1804.09848  [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci cs.ET

    Memristor Crossbars with 4.5 Terabits-per-Inch-Square Density and Two Nanometer Dimension

    Authors: Shuang Pi, Can Li, Hao Jiang, Weiwei Xia, Huolin Xin, J. Joshua Yang, Qiangfei Xia

    Abstract: The memristor is a promising building block for the next generation of nonvolatile random access memory and bio-inspired computing systems. Organizing memristors into high density crossbar arrays, although challenging, is critical to meet the ever-growing high capacity and low energy demands of these applications, especially in the big data era. Here, we construct memristor crossbars with a single-layer d…

    Submitted 27 May, 2018; v1 submitted 25 April, 2018; originally announced April 2018.

    Comments: updated version

    Journal ref: Nat. Nanotechnol. (2018)

  29. GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies

    Authors: Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, Onur Mutlu

    Abstract: Motivation: Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i.e., smaller segments) within each read, 2) extract reference sequences at each of the mappin…

    Submitted 2 November, 2017; originally announced November 2017.

    Comments: arXiv admin note: text overlap with arXiv:1708.04329

    Journal ref: BMC Genomics, 19 (Suppl 2):89, 2018

  30. arXiv:1705.05720  [pdf, other]

    cs.DB cs.AI cs.HC

    Subjective Knowledge Acquisition and Enrichment Powered By Crowdsourcing

    Authors: Rui Meng, Hao Xin, Lei Chen, Yangqiu Song

    Abstract: Knowledge bases (KBs) have attracted increasing attention due to their great success in various areas, such as Web and mobile search. Existing KBs are restricted to objective factual knowledge, such as city population or fruit shape, whereas subjective knowledge, such as big city, which is commonly mentioned in Web and mobile queries, has been neglected. Subjective knowledge differs from objective kn…

    Submitted 16 May, 2017; originally announced May 2017.

  31. arXiv:1704.04641  [pdf, other]

    cs.IT

    Capacity of the Gaussian Two-Pair Two-Way Relay Channel to Within 1/2 Bit

    Authors: Xiaojun Yuan, Haiyang Xin, Soung-Chang Liew, Yong Li

    Abstract: This paper studies the transceiver design of the Gaussian two-pair two-way relay channel (TWRC), where two pairs of users exchange information through a common relay in a pairwise manner. Our main contribution is to show that the capacity of the Gaussian two-pair TWRC is achievable to within 1/2 bit for arbitrary channel conditions. In the proof, we develop a hybrid coding scheme involving Gaussia…

    Submitted 31 May, 2018; v1 submitted 15 April, 2017; originally announced April 2017.

    Comments: 77 pages, 9 figures, journal

  32. arXiv:1604.01789  [pdf]

    q-bio.GN cs.AR cs.DS

    GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping

    Authors: Mohammed Alser, Hasan Hassan, Hongyi Xin, Oğuz Ergin, Onur Mutlu, Can Alkan

    Abstract: Motivation: High throughput DNA sequencing (HTS) technologies generate an excessive number of small DNA segments -- called short reads -- that cause significant computational burden. To analyze the entire genome, each of the billions of short reads must be mapped to a reference genome based on the similarity between a read and "candidate" locations in that reference genome. The similarity measurem…

    Submitted 26 September, 2020; v1 submitted 6 April, 2016; originally announced April 2016.

    Journal ref: Bioinformatics. Nov 1;33(21):3355-3363, 2017

  33. Optimal Seed Solver: Optimizing Seed Selection in Read Mapping

    Authors: Hongyi Xin, Richard Zhu, Sunny Nahar, John Emmons, Gennady Pekhimenko, Carl Kingsford, Can Alkan, Onur Mutlu

    Abstract: Motivation: Optimizing seed selection is an important problem in read mapping. The number of non-overlapping seeds a mapper selects determines the sensitivity of the mapper while the total frequency of all selected seeds determines the speed of the mapper. Modern seed-and-extend mappers usually select seeds with either an equal and fixed-length scheme or with an inflexible placement scheme, both o…

    Submitted 26 June, 2015; originally announced June 2015.

    Comments: 10 pages of main text. 6 pages of supplementary materials. Under review by Oxford Bioinformatics

    Journal ref: Bioinformatics, Jun 1;32(11):1632-42, 2016

  34. arXiv:1112.3059  [pdf]

    cond-mat.mtrl-sci cs.CV physics.data-an

    Data Processing For Atomic Resolution EELS

    Authors: Paul Cueva, Robert Hovden, Julia A. Mundy, Huolin L. Xin, David A. Muller

    Abstract: The high beam current and sub-angstrom resolution of aberration-corrected scanning transmission electron microscopes have enabled electron energy loss spectroscopic (EELS) mapping with atomic resolution. These spectral maps are often dose-limited and spatially oversampled, leading to low counts/channel and are thus highly sensitive to errors in background estimation. However, by taking advantage of…

    Submitted 13 December, 2011; originally announced December 2011.

    Journal ref: Microscopy and Microanalysis, Vol. 18 pp 667-675 (2012)