
A Primal-Dual Framework for Transformers and Neural Networks

Authors: Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher

Abstract: Self-attention is key to the remarkable success of transformers in sequence modeling tasks, including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often developed by heuristics and experience. To provide a principled framework for constructing attention layers in transformers, we show that self-attention corresponds to the support vector expansion derived from a support vector regression (SVR) problem, whose primal formulation has the form of a neural network layer. Using our framework, we derive popular attention layers used in practice and propose two new attentions: 1) the Batch Normalized Attention (Attention-BN), derived from the batch normalization layer, and 2) the Attention with Scaled Head (Attention-SH), derived from using less training data to fit the SVR model. We empirically demonstrate the advantages of Attention-BN and Attention-SH in reducing head redundancy, increasing the model's accuracy, and improving the model's efficiency in a variety of practical applications, including image and time-series classification.

Submitted 19 June, 2024; originally announced June 2024.

Comments: Accepted to ICLR 2023, 26 pages, 4 figures, 14 tables
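The abstract ties self-attention to a support vector expansion. As a point of reference only, the sketch below (assuming NumPy) writes standard scaled dot-product softmax attention as a normalized kernel-weighted sum over the keys, output_i = Σ_j exp(q_i·k_j/√D) v_j / Σ_j exp(q_i·k_j/√D). It does not reproduce the paper's SVR derivation or the Attention-BN and Attention-SH variants; the function name is hypothetical.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Scaled dot-product softmax attention as a kernel-weighted sum over keys.

    Q: (N, D) queries, K: (M, D) keys, V: (M, Dv) values.
    """
    D = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(D)                   # (N, M) scaled similarities
    scores -= scores.max(axis=1, keepdims=True)     # shift each row for numerical stability
    weights = np.exp(scores)                        # unnormalized kernel values
    weights /= weights.sum(axis=1, keepdims=True)   # normalize: each row sums to 1
    return weights @ V                              # (N, Dv) weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 3))
out = softmax_attention(Q, K, V)
print(out.shape)  # (4, 3)
```

Each output row is a convex combination of the value vectors; if all keys are identical, every query gets the plain average of the values.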

arXiv:2311.14740

AutoKG: Efficient Automated Knowledge Graph Generation for Language Models

Authors: Bohan Chen, Andrea L. Bertozzi

Abstract: Traditional methods of linking large language models (LLMs) to knowledge bases via semantic similarity search often fall short of capturing complex relational dynamics. To address these limitations, we introduce AutoKG, a lightweight and efficient approach for automated knowledge graph (KG) construction. For a given knowledge base consisting of text blocks, AutoKG first extracts keywords using an LLM and then evaluates the relationship weight between each pair of keywords using graph Laplace learning. We employ a hybrid search scheme combining vector similarity and graph-based associations to enrich LLM responses. Preliminary experiments demonstrate that AutoKG offers a more comprehensive and interconnected knowledge retrieval mechanism than semantic similarity search, thereby enhancing the capabilities of LLMs in generating more insightful and relevant outputs.

Submitted 22 November, 2023; originally announced November 2023.

Comments: 10 pages, accepted by IEEE BigData 2023 as a workshop paper in GTA3
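AutoKG scores keyword relationships with graph Laplace learning. A minimal sketch of generic Laplace learning (the harmonic extension: values are fixed on labeled nodes, and every unlabeled node takes the weighted average of its neighbors, i.e. L_UU u_U = -L_UL g) is below, assuming NumPy and a dense symmetric weight matrix. The toy path graph and the function name are hypothetical; how AutoKG builds its keyword graph and turns these values into pairwise weights is not shown.

```python
import numpy as np

def laplace_learning(W, labeled_idx, labeled_vals):
    """Harmonic extension: (L u)_i = 0 on unlabeled nodes, u = g on labeled nodes."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                   # unnormalized graph Laplacian
    labeled_idx = np.asarray(labeled_idx)
    g = np.asarray(labeled_vals, dtype=float)
    unlabeled = np.setdiff1d(np.arange(n), labeled_idx)
    u = np.zeros(n)
    u[labeled_idx] = g
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    L_ul = L[np.ix_(unlabeled, labeled_idx)]
    u[unlabeled] = np.linalg.solve(L_uu, -L_ul @ g)  # needs a labeled node in each component
    return u

# Path graph 0-1-2-3-4 with unit weights; node 0 fixed to 1, node 4 fixed to 0.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
u = laplace_learning(W, [0, 4], [1.0, 0.0])
print(np.round(u, 3))  # linear interpolation: 1.0, 0.75, 0.5, 0.25, 0.0
```

On a path graph the harmonic solution is linear between the two fixed ends, which makes the example easy to check by hand.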

arXiv:2311.11163

Hate speech and hate crimes: a data-driven study of evolving discourse around marginalized groups

Authors: Malvina Bozhidarova, Jonathn Chang, Aaishah Ale-rasool, Yuxiang Liu, Chongyao Ma, Andrea L. Bertozzi, P. Jeffrey Brantingham, Junyuan Lin, Sanjukta Krishnagopal

Abstract: This study explores the dynamic relationship between online discourse, as observed in tweets, and physical hate crimes, focusing on marginalized groups. Leveraging natural language processing techniques, including keyword extraction and topic modeling, we analyze the evolution of online discourse after events affecting these groups. Examining sentiment and polarizing tweets, we establish correlations with hate crimes in Black and LGBTQ+ communities. Using a knowledge graph, we connect tweets, users, topics, and hate crimes, enabling network analyses. Our findings reveal divergent patterns in the evolution of user communities for Black and LGBTQ+ groups, with notable differences in sentiment among influential users. This analysis sheds light on distinctive online discourse patterns and emphasizes the need to monitor hate speech to prevent hate crimes, especially following significant events impacting marginalized communities.

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2307.10495

doi 10.1117/12.2662393

Novel Batch Active Learning Approach and Its Application to Synthetic Aperture Radar Datasets

Authors: James Chapman, Bohan Chen, Zheng Tan, Jeff Calder, Kevin Miller, Andrea L. Bertozzi

Abstract: Active learning improves the performance of machine learning methods by judiciously selecting a limited number of unlabeled data points to query for labels, with the aim of maximally improving the underlying classifier's performance. Recent gains have been made using sequential active learning for synthetic aperture radar (SAR) data (arXiv:2204.00005). In each iteration, sequential active learning selects a query set of size one, while batch active learning selects a query set of multiple data points. While batch active learning methods exhibit greater efficiency, the challenge lies in maintaining model accuracy relative to sequential active learning methods. We developed a novel, two-part approach for batch active learning: Dijkstra's Annulus Core-Set (DAC) for core-set generation and LocalMax for batch sampling. The batch active learning process that combines DAC and LocalMax achieves nearly identical accuracy to sequential active learning but is more efficient, with an efficiency gain proportional to the batch size. As an application, a pipeline is built based on transfer learning feature embedding, graph learning, DAC, and LocalMax to classify the FUSAR-Ship and OpenSARShip datasets. Our pipeline outperforms the state-of-the-art CNN-based methods.

Submitted 19 July, 2023; originally announced July 2023.

Comments: 16 pages, 7 figures, Preprint

ACM Class: I.2.6; I.2.10; I.4.0; I.4.9

Journal ref: Proc. SPIE. Algorithms for Synthetic Aperture Radar Imagery XXX (Vol. 12520, pp. 96-111). 13 June 2023
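The abstract names LocalMax for batch sampling without defining it. Assuming, as the name suggests, that it keeps nodes whose acquisition value is a strict local maximum over their graph neighbors (so a batch does not pile up around one high-scoring region), a minimal NumPy sketch follows. The selection rule, the tie handling, and the function name are assumptions rather than the paper's specification, and DAC core-set generation is not shown.

```python
import numpy as np

def local_max_batch(W, acquisition, batch_size, exclude=()):
    """Hypothetical LocalMax-style rule: keep nodes whose acquisition value is
    strictly larger than every neighbor's, then take the best batch_size of them."""
    excluded = set(exclude)                    # e.g. already-labeled nodes
    candidates = []
    for i in range(len(acquisition)):
        if i in excluded:
            continue
        neighbors = np.nonzero(W[i])[0]
        neighbors = neighbors[neighbors != i]  # ignore self-loops
        if np.all(acquisition[i] > acquisition[neighbors]):
            candidates.append(i)
    candidates.sort(key=lambda i: acquisition[i], reverse=True)
    return candidates[:batch_size]

# Path graph 0-1-2-3-4-5 with a made-up acquisition function.
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0
acq = np.array([0.1, 0.9, 0.2, 0.3, 0.8, 0.4])

print(local_max_batch(W, acq, batch_size=3))  # [1, 4]
print(np.argsort(-acq)[:3])                   # plain top-k picks 1, 4 and 4's neighbor 5
```

The design choice illustrated is diversity: plain top-k spends a query on node 5, which sits next to the already chosen node 4, whereas the local-maximum rule returns a smaller but more spread-out batch.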

arXiv:2211.00119

doi 10.1109/ICASSP49357.2023.10096465

Active Learning of Non-semantic Speech Tasks with Pretrained Models

Authors: Harlin Lee, Aaqib Saeed, Andrea L. Bertozzi

Abstract: Pretraining neural networks with massive unlabeled datasets has become popular as it equips the deep models with a better prior to solve downstream tasks. However, this approach generally assumes that the downstream tasks have access to annotated data of sufficient size. In this work, we propose ALOE, a novel system for improving the data- and label-efficiency of non-semantic speech tasks with active learning. ALOE uses pretrained models in conjunction with active learning to label data incrementally and learn classifiers for downstream tasks, thereby mitigating the need to acquire labeled data beforehand. We demonstrate the effectiveness of ALOE on a wide range of tasks, uncertainty-based acquisition functions, and model architectures. Training a linear classifier on top of a frozen encoder with ALOE is shown to achieve performance similar to several baselines that utilize the entire labeled data.

Submitted 25 February, 2023; v1 submitted 31 October, 2022; originally announced November 2022.

Comments: Accepted at: ICASSP'23, Code: https://github.com/HarlinLee/ALOE

arXiv:2204.08621

Proximal Implicit ODE Solvers for Accelerating Learning Neural ODEs

Authors: Justin Baker, Hedi Xia, Yiwei Wang, Elena Cherkaev, Akil Narayan, Long Chen, Jack Xin, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang

Abstract: Learning neural ODEs often requires solving very stiff ODE systems, primarily using explicit adaptive step size ODE solvers. These solvers are computationally expensive, requiring the use of tiny step sizes for numerical stability and accuracy guarantees. This paper considers learning neural ODEs using implicit ODE solvers of different orders leveraging proximal operators. The proximal implicit solver consists of inner-outer iterations: the inner iterations approximate each implicit update step using a fast optimization algorithm, and the outer iterations solve the ODE system over time. The proximal implicit ODE solver guarantees superiority over explicit solvers in numerical stability and computational efficiency. We validate the advantages of proximal implicit solvers over existing popular neural ODE solvers on various challenging benchmark tasks, including learning continuous-depth graph neural networks and continuous normalizing flows.

Submitted 18 April, 2022; originally announced April 2022.

Comments: 20 pages, 7 figures

MSC Class: 68T07; 65L04 ACM Class: I.2
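For a gradient-flow ODE x' = -∇F(x), one backward Euler step x_{n+1} = x_n - h∇F(x_{n+1}) is exactly the proximal step x_{n+1} = argmin_x F(x) + ‖x - x_n‖²/(2h), which is one way an inner optimizer can stand in for an implicit solve. The sketch below assumes that gradient-flow form, first-order backward Euler, and plain gradient descent as the inner solver; the paper treats general neural ODEs, several orders, and a faster inner algorithm, so this only illustrates the inner-outer structure. Function and parameter names are hypothetical.

```python
import numpy as np

def proximal_backward_euler(grad_F, x0, h, n_steps, lipschitz, tol=1e-10, max_inner=10_000):
    """Integrate x' = -grad_F(x) with backward Euler; each implicit step is the
    proximal problem argmin_y F(y) + |y - x_n|^2 / (2h), solved by gradient descent."""
    x = np.asarray(x0, dtype=float)
    lr = 1.0 / (lipschitz + 1.0 / h)      # 1 / Lipschitz constant of the proximal objective's gradient
    trajectory = [x.copy()]
    for _ in range(n_steps):              # outer iterations: march forward in time
        x_prev = x
        y = x_prev.copy()                 # warm start at the previous state
        for _ in range(max_inner):        # inner iterations: approximate the implicit update
            g = grad_F(y) + (y - x_prev) / h
            if np.linalg.norm(g) < tol:
                break
            y = y - lr * g
        x = y
        trajectory.append(x.copy())
    return np.array(trajectory)

# Stiff linear test problem: F(x) = 0.5 * x^T A x, eigenvalues 1 and 1000.
A = np.diag([1.0, 1000.0])
h, n_steps = 0.1, 10
traj = proximal_backward_euler(lambda x: A @ x, [1.0, 1.0], h, n_steps, lipschitz=1000.0)

# For a linear problem, exact backward Euler is x_{n+1} = (I + hA)^{-1} x_n.
step = np.linalg.inv(np.eye(2) + h * A)
exact = np.linalg.matrix_power(step, n_steps) @ np.ones(2)
print(np.abs(traj[-1] - exact).max())     # small: limited by the inner tolerance
```

With h = 0.1, explicit Euler would multiply the stiff component by 1 - 0.1·1000 = -99 per step and diverge, while each implicit step damps it by 1/(1 + 0.1·1000) = 1/101.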

arXiv:2204.00005

Graph-based Active Learning for Semi-supervised Classification of SAR Data

Authors: Kevin Miller, John Mauro, Jason Setiadi, Xoaquin Baca, Zhan Shi, Jeff Calder, Andrea L. Bertozzi

Abstract: We present a novel method for classification of Synthetic Aperture Radar (SAR) data by combining ideas from graph-based learning and neural network methods within an active learning framework. Graph-based methods in machine learning are based on a similarity graph constructed from the data. When the data consist of raw images composed of scenes, extraneous information can make the classification task more difficult. In recent years, neural network methods have been shown to provide a promising framework for extracting patterns from SAR images. These methods, however, require ample training data to avoid overfitting. At the same time, such training data are often unavailable for applications of interest, such as automatic target recognition (ATR) and SAR data. We use a Convolutional Neural Network Variational Autoencoder (CNNVAE) to embed SAR data into a feature space, and then construct a similarity graph from the embedded data and apply graph-based semi-supervised learning techniques. The CNNVAE feature embedding and graph construction require no labeled data, which reduces overfitting and improves the generalization performance of graph learning at low label rates. Furthermore, the method easily incorporates a human-in-the-loop for active learning in the data-labeling process. We present promising results and compare them to other standard machine learning methods on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset for ATR with small amounts of labeled data.

Submitted 30 March, 2022; originally announced April 2022.

MSC Class: 68R10; 68T07; 68T05 ACM Class: I.2.6; I.2.10; I.4.0; I.4.9

arXiv:2112.15486

Efficient and Reliable Overlay Networks for Decentralized Federated Learning

Authors: Yifan Hua, Kevin Miller, Andrea L. Bertozzi, Chen Qian, Bao Wang

Abstract: We propose near-optimal overlay networks based on d-regular expander graphs to accelerate decentralized federated learning (DFL) and improve its generalization. In DFL, a massive number of clients are connected by an overlay network, and they solve machine learning problems collaboratively without sharing raw data. Our overlay network design integrates spectral graph theory and the theoretical convergence and generalization bounds for DFL. As such, our proposed overlay networks accelerate convergence, improve generalization, and enhance robustness to client failures in DFL with theoretical guarantees. We also present an efficient algorithm to convert a given graph to a practical overlay network and to maintain the network topology after potential client failures. We numerically verify the advantages of DFL with our proposed networks on various benchmark tasks, ranging from image classification to language modeling using hundreds of clients.

Submitted 12 December, 2021; originally announced December 2021.

Comments: 25 pages, 8 figures

MSC Class: 65B99; 68T01; 68T09; 68W15
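In DFL, each client typically mixes its parameters with its neighbors' through a symmetric doubly stochastic matrix W, so the distance to consensus shrinks at least as fast as λ₂(W)^t in the Euclidean norm; graphs with a larger spectral gap 1 - λ₂ therefore mix faster, which is why expanders are attractive. The NumPy sketch below compares a ring with a 4-regular circulant graph using the lazy mixing matrix W = (I + A/d)/2 on scalar "models". The circulant graph is only a convenient stand-in, not the paper's near-optimal construction, and the size and offsets are arbitrary.

```python
import numpy as np

def circulant_regular_graph(n, offsets):
    """Adjacency matrix of a circulant graph: node i links to i +/- s (mod n) for s in offsets."""
    A = np.zeros((n, n))
    for i in range(n):
        for s in offsets:
            A[i, (i + s) % n] = 1.0
            A[i, (i - s) % n] = 1.0
    return A

def lazy_mixing_matrix(A):
    """Symmetric doubly stochastic gossip matrix W = (I + A/d) / 2 for a d-regular graph."""
    d = A.sum(axis=1)[0]
    return 0.5 * (np.eye(len(A)) + A / d)

def spectral_gap(W):
    eig = np.sort(np.linalg.eigvalsh(W))[::-1]   # eig[0] == 1 for a connected graph
    return 1.0 - eig[1]

def gossip(W, x, rounds):
    for _ in range(rounds):
        x = W @ x                                # each client averages with its neighbors
    return x

n = 32
W_ring = lazy_mixing_matrix(circulant_regular_graph(n, [1]))     # 2-regular ring
W_four = lazy_mixing_matrix(circulant_regular_graph(n, [1, 5]))  # 4-regular circulant

rng = np.random.default_rng(0)
x0 = rng.normal(size=n)                          # each client's local scalar "model"
for name, W in [("ring", W_ring), ("4-regular", W_four)]:
    x = gossip(W, x0, 50)
    print(name, "gap=%.4f" % spectral_gap(W), "max deviation=%.2e" % np.abs(x - x0.mean()).max())
```

The lazy form keeps every eigenvalue of W in [0, 1], so the second-largest eigenvalue alone controls the consensus rate.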

arXiv:2110.07739

Model-Change Active Learning in Graph-Based Semi-Supervised Learning

Authors: Kevin Miller, Andrea L. Bertozzi

Abstract: Active learning in semi-supervised classification involves introducing additional labels for unlabelled data to improve the accuracy of the underlying classifier. A challenge is to identify which points to label to best improve performance while limiting the number of new labels. "Model-change" active learning quantifies the resulting change incurred in the classifier by introducing the additional label(s). We pair this idea with graph-based semi-supervised learning methods that use the spectrum of the graph Laplacian matrix, which can be truncated to avoid prohibitively large computational and storage costs. We consider a family of convex loss functions for which the acquisition function can be efficiently approximated using the Laplace approximation of the posterior distribution. We show a variety of multiclass examples that illustrate improved performance over prior state-of-the-art methods.

Submitted 14 October, 2021; originally announced October 2021.

Comments: Submitted to SIAM Journal on Mathematics of Data Science (SIMODS)

arXiv:2110.04932

An Analysis of COVID-19 Knowledge Graph Construction and Applications

Authors: Dominic Flocco, Bryce Palmer-Toy, Ruixiao Wang, Hongyu Zhu, Rishi Sonthalia, Junyuan Lin, Andrea L. Bertozzi, P. Jeffrey Brantingham

Abstract: The construction and application of knowledge graphs have seen a rapid increase across many disciplines in recent years. Additionally, the problem of uncovering relationships between developments in the COVID-19 pandemic and social media behavior is of great interest to researchers hoping to curb the spread of the disease. In this paper, we present a knowledge graph constructed from COVID-19-related tweets in the Los Angeles area, supplemented with federal and state policy announcements and disease spread statistics. By incorporating dates, topics, and events as entities, we construct a knowledge graph that describes the connections between these pieces of information. We use natural language processing and change point analysis to extract tweet-topic, tweet-date, and event-date relations. Further analysis of the constructed knowledge graph provides insight into how tweets reflect public sentiments towards COVID-19-related topics and how changes in these sentiments correlate with real-world events.

Submitted 10 October, 2021; originally announced October 2021.

arXiv:2110.04840

Heavy Ball Neural Ordinary Differential Equations

Authors: Hedi Xia, Vai Suliafu, Hangjie Ji, Tan M. Nguyen, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang

Abstract: We propose heavy ball neural ordinary differential equations (HBNODEs), leveraging the continuous limit of the classical momentum accelerated gradient descent, to improve neural ODE (NODE) training and inference. HBNODEs have two properties that imply practical advantages over NODEs: (i) The adjoint state of an HBNODE also satisfies an HBNODE, accelerating both forward and backward ODE solvers, thus significantly reducing the number of function evaluations (NFEs) and improving the utility of the trained models. (ii) The spectrum of HBNODEs is well structured, enabling effective learning of long-term dependencies from complex sequential data. We verify the advantages of HBNODEs over NODEs on benchmark tasks, including image classification, learning complex dynamics, and sequential modeling. Our method requires remarkably fewer forward and backward NFEs, is more accurate, and learns long-term dependencies more effectively than the other ODE-based neural network models. Code is available at https://github.com/hedixia/HeavyBallNODE.

Submitted 10 October, 2021; originally announced October 2021.

Comments: 23 pages, 9 figures, Accepted for publication at Advances in Neural Information Processing Systems (NeurIPS) 2021

MSC Class: 68T07 ACM Class: I.2
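Assuming the heavy-ball form h''(t) + γh'(t) = f(h(t)) (the continuous limit of momentum gradient descent that the abstract refers to), an HBNODE can be integrated as the first-order system h' = m, m' = -γm + f(h). The sketch below does this with a fixed-step fourth-order Runge-Kutta scheme in NumPy; f is a hand-written stand-in for the learned network, γ is fixed rather than learned, adaptive solvers and adjoint-based training are not shown, and the function names are hypothetical.

```python
import numpy as np

def hbnode_rhs(state, f, gamma):
    """Heavy-ball ODE h'' + gamma*h' = f(h) as a first-order system in (h, m), m = h'."""
    h, m = state
    return np.stack([m, -gamma * m + f(h)])

def rk4_integrate(f, h0, m0, gamma, t_end, dt):
    """Fixed-step RK4 on the (h, m) system; returns [h(t_end), m(t_end)]."""
    state = np.stack([np.asarray(h0, dtype=float), np.asarray(m0, dtype=float)])
    for _ in range(int(round(t_end / dt))):
        k1 = hbnode_rhs(state, f, gamma)
        k2 = hbnode_rhs(state + 0.5 * dt * k1, f, gamma)
        k3 = hbnode_rhs(state + 0.5 * dt * k2, f, gamma)
        k4 = hbnode_rhs(state + dt * k3, f, gamma)
        state = state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return state

# Stand-in "network" f(h) = -h with gamma = 1 gives the damped oscillator h'' + h' + h = 0.
t = 5.0
final = rk4_integrate(lambda h: -h, h0=1.0, m0=0.0, gamma=1.0, t_end=t, dt=0.01)
omega = np.sqrt(3.0) / 2.0
exact = np.exp(-t / 2) * (np.cos(omega * t) + np.sin(omega * t) / (2 * omega))
print(final[0], exact)
```

For this linear test case the exact solution from h(0) = 1, h'(0) = 0 is e^{-t/2}(cos ωt + sin(ωt)/(2ω)) with ω = √3/2, so the integrator can be checked directly.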

arXiv:2107.01713

A Multilayer Network Model of the Coevolution of the Spread of a Disease and Competing Opinions

Authors: Kaiyan Peng, Zheng Lu, Vanessa Lin, Michael R. Lindstrom, Christian Parkinson, Chuntian Wang, Andrea L. Bertozzi, Mason A. Porter

Abstract: During the COVID-19 pandemic, conflicting opinions on physical distancing swept across social media, affecting both human behavior and the spread of COVID-19. Inspired by such phenomena, we construct a two-layer multiplex network for the coupled spread of a disease and conflicting opinions. We model each process as a contagion. On one layer, we consider the concurrent evolution of two opinions (pro-physical-distancing and anti-physical-distancing) that compete with each other and have mutual immunity to each other. The disease evolves on the other layer, and individuals are less likely (respectively, more likely) to become infected when they adopt the pro-physical-distancing (respectively, anti-physical-distancing) opinion. We develop approximations of mean-field type by generalizing monolayer pair approximations to multilayer networks; these approximations agree well with Monte Carlo simulations for a broad range of parameters and several network structures. Through numerical simulations, we illustrate the influence of opinion dynamics on the spread of the disease from complex interactions both between the two conflicting opinions and between the opinions and the disease. We find that lengthening the duration that individuals hold an opinion may help suppress disease transmission, and we demonstrate that increasing the cross-layer correlations or intra-layer correlations of node degrees may lead to fewer individuals becoming infected with the disease.

Submitted 4 July, 2021; originally announced July 2021.

MSC Class: 91D30; 92D30; 37N25

arXiv:2105.10650

Post-Radiotherapy PET Image Outcome Prediction by Deep Learning under Biological Model Guidance: A Feasibility Study of Oropharyngeal Cancer Application

Authors: Hangjie Ji, Kyle Lafata, Yvonne Mowery, David Brizel, Andrea L. Bertozzi, Fang-Fang Yin, Chunhao Wang

Abstract: This paper develops a method of biologically guided deep learning for post-radiation FDG-PET image outcome prediction based on pre-radiation images and radiotherapy dose information. Based on the classic reaction-diffusion mechanism, a novel biological model was proposed using a partial differential equation that incorporates spatial radiation dose distribution as a patient-specific treatment information variable. A 7-layer encoder-decoder-based convolutional neural network (CNN) was designed and trained to learn the proposed biological model. As such, the model could generate post-radiation FDG-PET image outcome predictions with possible time-series transition from pre-radiotherapy image states to post-radiotherapy states. The proposed method was developed using 64 oropharyngeal cancer patients with paired FDG-PET studies before and after 20Gy delivery (2Gy/daily fraction) by IMRT. In a two-branch deep learning execution, the proposed CNN learns specific terms in the biological model from paired FDG-PET images and spatial dose distribution in one branch, and the biological model generates post-20Gy FDG-PET image prediction in the other branch. The proposed method successfully generated post-20Gy FDG-PET image outcome prediction with breakdown illustrations of biological model components. Time-series FDG-PET image predictions were generated to demonstrate the feasibility of disease response rendering. The developed biologically guided deep learning method achieved post-20Gy FDG-PET image outcome predictions in good agreement with ground-truth results. With breakdown biological modeling components, the outcome image predictions could be used in adaptive radiotherapy decision-making to optimize personalized plans for the best outcome in the future.

Submitted 22 May, 2021; originally announced May 2021.

Comments: 26 pages, 5 figures

arXiv:2007.12809

doi 10.1088/1361-6420/ac1e80

Posterior Consistency of Semi-Supervised Regression on Graphs

Authors: Andrea L. Bertozzi, Bamdad Hosseini, Hao Li, Kevin Miller, Andrew M. Stuart

Abstract: Graph-based semi-supervised regression (SSR) is the problem of estimating the value of a function on a weighted graph from its values (labels) on a small subset of the vertices. This paper is concerned with the consistency of SSR in the context of classification, in the setting where the labels have small noise and the underlying graph weighting is consistent with well-clustered nodes. We present a Bayesian formulation of SSR in which the weighted graph defines a Gaussian prior, using a graph Laplacian, and the labeled data defines a likelihood. We analyze the rate of contraction of the posterior measure around the ground truth in terms of parameters that quantify the small label error and inherent clustering in the graph. We obtain bounds on the rates of contraction and illustrate their sharpness through numerical experiments. The analysis also gives insight into the choice of hyperparameters that enter the definition of the prior.

Submitted 24 March, 2021; v1 submitted 24 July, 2020; originally announced July 2020.
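In the Bayesian formulation described above, a common choice (assumed here, together with the hyperparameters τ, α, γ) is the prior u ~ N(0, C) with C = (L + τ²I)^{-α} built from the graph Laplacian L, and labels y = Pu + η with η ~ N(0, γ²I), where P selects the labeled vertices. The posterior is then Gaussian with covariance (C^{-1} + PᵀP/γ²)^{-1} and mean equal to that covariance times Pᵀy/γ². The NumPy sketch computes both on a toy two-cluster graph; the contraction-rate analysis itself is not illustrated, and the function name is hypothetical.

```python
import numpy as np

def gaussian_ssr_posterior(W, labeled_idx, y, tau=0.1, alpha=1.0, gamma=0.1):
    """Posterior mean and covariance for u ~ N(0, (L + tau^2 I)^(-alpha)),
    y = P u + N(0, gamma^2 I), where P picks out the labeled vertices."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                             # graph Laplacian
    evals, evecs = np.linalg.eigh(L + tau**2 * np.eye(n))
    prior_precision = evecs @ np.diag(evals**alpha) @ evecs.T  # C^{-1}
    P = np.zeros((len(labeled_idx), n))
    P[np.arange(len(labeled_idx)), labeled_idx] = 1.0          # observation operator
    post_cov = np.linalg.inv(prior_precision + P.T @ P / gamma**2)
    post_mean = post_cov @ (P.T @ np.asarray(y, dtype=float)) / gamma**2
    return post_mean, post_cov

# Two triangles {0,1,2} and {3,4,5} joined by one weak edge (2, 3).
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.01

mean, cov = gaussian_ssr_posterior(W, labeled_idx=[0, 5], y=[1.0, -1.0])
print(np.sign(mean))          # expected signs: +, +, +, -, -, -
print(np.sqrt(np.diag(cov)))  # posterior standard deviations
```

Because the clusters are joined only by a weak edge, each label propagates within its own cluster, which is the "well-clustered" regime the abstract describes.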

arXiv:2007.11126

Efficient Graph-Based Active Learning with Probit Likelihood via Gaussian Approximations

Authors: Kevin Miller, Hao Li, Andrea L. Bertozzi

Abstract: We present a novel adaptation of active learning to graph-based semi-supervised learning (SSL) under non-Gaussian Bayesian models. We present an approximation of non-Gaussian distributions to adapt previously Gaussian-based acquisition functions to these more general cases. We develop an efficient rank-one update for applying "look-ahead" based methods as well as model retraining. We also introduce a novel "model change" acquisition function based on these approximations that further expands the available collection of active learning acquisition functions for such methods.

Submitted 21 July, 2020; originally announced July 2020.

Comments: Accepted in ICML Workshop on Real World Experiment Design and Active Learning 2020
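For a Gaussian posterior with mean m and covariance C, adding one label y_k at node k with noise variance γ² changes the covariance by a rank-one term, C' = C - C e_k e_kᵀ C / (γ² + C_kk), and the mean by m' = m + C e_k (y_k - m_k) / (γ² + C_kk), so a "look-ahead" over every candidate needs no re-inversion. The sketch assumes a purely Gaussian model; the paper's Gaussian approximation of the probit likelihood and its model-change acquisition function are not reproduced, and the variance-reduction score below is a standard stand-in. Function names are hypothetical.

```python
import numpy as np

def rank_one_update(mean, cov, k, y_k, gamma):
    """Gaussian posterior after observing y_k = u_k + noise, noise ~ N(0, gamma^2),
    via the Sherman-Morrison identity (no full re-inversion)."""
    c_k = cov[:, k]                              # column k of the covariance
    denom = gamma**2 + cov[k, k]
    new_mean = mean + c_k * (y_k - mean[k]) / denom
    new_cov = cov - np.outer(c_k, c_k) / denom
    return new_mean, new_cov

def variance_reduction_scores(cov, gamma):
    """Look-ahead score for every node k: total posterior variance removed by labeling k."""
    return (cov**2).sum(axis=0) / (gamma**2 + np.diag(cov))

rng = np.random.default_rng(1)
B = rng.normal(size=(6, 6))
precision = B @ B.T + np.eye(6)                  # stand-in for a graph-based prior precision
cov = np.linalg.inv(precision)
mean = np.zeros(6)
gamma = 0.5

k = int(np.argmax(variance_reduction_scores(cov, gamma)))  # greedy look-ahead choice
new_mean, new_cov = rank_one_update(mean, cov, k, y_k=1.0, gamma=gamma)
print(k, np.trace(cov), np.trace(new_cov))
```

The update costs O(n²) per candidate instead of the O(n³) of refactoring the precision matrix, which is what makes scoring every unlabeled node affordable.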

arXiv:2006.06919

MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Authors: Tan M. Nguyen, Richard G. Baraniuk, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang

Abstract: Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long short-term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance. The code is available at https://github.com/minhtannguyen/MomentumRNN.

Submitted 11 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: 21 pages, 11 figures, Accepted for publication at Advances in Neural Information Processing Systems (NeurIPS) 2020

MSC Class: 68T07 ACM Class: I.2

Journal ref: Advances in Neural Information Processing Systems (NeurIPS) 2020
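Assuming the momentum cell takes the form v_t = μ v_{t-1} + s U x_t, h_t = tanh(W h_{t-1} + v_t + b), which mirrors a heavy-ball step applied to the input term of a vanilla RNN, a NumPy sketch of the forward pass follows. The symbols μ (momentum) and s (step size), their default values, the tanh activation, and the bias placement are assumptions here; setting μ = 0 and s = 1 recovers the vanilla cell.

```python
import numpy as np

def momentum_rnn(xs, W, U, b, mu=0.6, s=0.6):
    """Forward pass of a momentum RNN cell over xs of shape (T, input_dim).
    mu and s defaults are illustrative values only."""
    h = np.zeros(W.shape[0])
    v = np.zeros(W.shape[0])
    hs = []
    for x in xs:
        v = mu * v + s * (U @ x)     # momentum: accumulate the input drive
        h = np.tanh(W @ h + v + b)   # v replaces the U @ x term of a vanilla RNN
        hs.append(h)
    return np.array(hs)

def vanilla_rnn(xs, W, U, b):
    h = np.zeros(W.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(W @ h + U @ x + b)
        hs.append(h)
    return np.array(hs)

rng = np.random.default_rng(0)
T, d_in, d_h = 10, 3, 5
xs = rng.normal(size=(T, d_in))
W = 0.5 * rng.normal(size=(d_h, d_h)) / np.sqrt(d_h)
U = rng.normal(size=(d_h, d_in))
b = np.zeros(d_h)
print(momentum_rnn(xs, W, U, b).shape)  # (10, 5)
```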

arXiv:2003.00631

Sparsity Meets Robustness: Channel Pruning for the Feynman-Kac Formalism Principled Robust Deep Neural Nets

Authors: Thu Dinh, Bao Wang, Andrea L. Bertozzi, Stanley J. Osher

Abstract: Compression of deep neural nets (DNNs) is crucial for deploying them on mobile devices. Though many successful algorithms exist to compress naturally trained DNNs, developing efficient and stable compression algorithms for robustly trained DNNs remains largely open. In this paper, we focus on a co-design of efficient DNN compression algorithms and sparse neural architectures for robust and accurate deep learning. Such a co-design enables us to advance the goal of accommodating both sparsity and robustness. With this objective in mind, we leverage relaxed augmented Lagrangian based algorithms to prune the weights of adversarially trained DNNs at both structured and unstructured levels. Using Feynman-Kac formalism principled robust and sparse DNNs, we can at least double the channel sparsity of the adversarially trained ResNet20 for CIFAR10 classification while improving the natural accuracy by 8.69% and the robust accuracy under the benchmark 20-iteration IFGSM attack by 5.42%. The code is available at https://github.com/BaoWangMath/rvsm-rgsm-admm.

Submitted 1 March, 2020; originally announced March 2020.

Comments: 16 pages, 7 figures

MSC Class: 68T01

arXiv:2002.10583

Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent

Authors: Bao Wang, Tan M. Nguyen, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher

Abstract: Stochastic gradient descent (SGD) with constant momentum and its variants such as Adam are the optimization algorithms of choice for training deep neural networks (DNNs). Since DNN training is incredibly computationally expensive, there is great interest in speeding up the convergence. Nesterov accelerated gradient (NAG) improves the convergence rate of gradient descent (GD) for convex optimization using a specially designed momentum; however, it accumulates error when an inexact gradient is used (such as in SGD), slowing convergence at best and diverging at worst. In this paper, we propose Scheduled Restart SGD (SRSGD), a new NAG-style scheme for training DNNs. SRSGD replaces the constant momentum in SGD by the increasing momentum in NAG but stabilizes the iterations by resetting the momentum to zero according to a schedule. Using a variety of models and benchmarks for image classification, we demonstrate that, in training DNNs, SRSGD significantly improves convergence and generalization; for instance in training ResNet200 for ImageNet classification, SRSGD achieves an error rate of 20.93% vs. the benchmark of 22.13%. These improvements become more significant as the network grows deeper. Furthermore, on both CIFAR and ImageNet, SRSGD reaches similar or even better error rates with significantly fewer training epochs compared to the SGD baseline.

Submitted 26 April, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

Comments: 35 pages, 16 figures, 18 tables
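The restart mechanism described above can be sketched in a few lines. This is a minimal deterministic illustration. It assumes the NAG-style momentum schedule k/(k + 3), with the counter k reset every `restart_every` steps. A real SRSGD run would use minibatch gradients and a tuned restart schedule. The quadratic test problem here is hypothetical.

```python
def srsgd(grad, w0, lr, restart_every, num_iters):
    """Scheduled-restart NAG-style iteration (sketch).

    grad: returns the gradient at w (exact here; stochastic in real training).
    Momentum mu = k / (k + 3) grows with k, which is reset to 0 every
    `restart_every` iterations (assumed schedule).
    """
    w = list(w0)
    v_prev = list(w0)
    k = 0
    for _ in range(num_iters):
        g = grad(w)
        # gradient step from the current extrapolated point
        v = [wi - lr * gi for wi, gi in zip(w, g)]
        mu = k / (k + 3)
        # Nesterov-style extrapolation with the increasing momentum mu
        w = [vi + mu * (vi - vpi) for vi, vpi in zip(v, v_prev)]
        v_prev = v
        k += 1
        if k >= restart_every:
            k = 0  # scheduled restart: momentum drops back to zero
    return w


def quad_loss(w):
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def quad_grad(w):
    return [w[0], 10.0 * w[1]]


w_final = srsgd(quad_grad, [1.0, 1.0], lr=0.05, restart_every=20, num_iters=200)
print(quad_loss(w_final))
```

Resetting k bounds the momentum below 1. This is the stabilizing effect the abstract attributes to the restart schedule when gradients are inexact.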

arXiv:1902.05113 [pdf, other]

A Study on Graph-Structured Recurrent Neural Networks and Sparsification with Application to Epidemic Forecasting

Authors: Zhijian Li, Xiyang Luo, Bao Wang, Andrea L. Bertozzi, Jack Xin

Abstract: We study epidemic forecasting on real-world health data by a graph-structured recurrent neural network (GSRNN). We achieve state-of-the-art forecasting accuracy on the benchmark CDC dataset. To improve model efficiency, we sparsify the network weights via transformed-$\ell_1$ penalty and maintain prediction accuracy at the same level with 70% of the network weights being zero.

Submitted 13 February, 2019; originally announced February 2019.
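For reference, the transformed-ℓ1 penalty is commonly defined as rho_a(x) = (a + 1)|x| / (a + |x|) with a > 0. It interpolates between the ℓ0 indicator (a → 0) and the ℓ1 norm (a → ∞). The sketch below assumes this standard definition; the abstract does not state the parameter used. It also shows how a "70% of the weights are zero" figure would be measured. The helper names and the weight vector are hypothetical.

```python
def transformed_l1(x, a):
    """Transformed-l1 penalty rho_a(x) = (a + 1)|x| / (a + |x|), for a > 0."""
    ax = abs(x)
    return (a + 1.0) * ax / (a + ax)


def tl1_penalty(weights, a):
    """Sum of the transformed-l1 penalty over a flat list of weights."""
    return sum(transformed_l1(w, a) for w in weights)


def zero_fraction(weights, tol=0.0):
    """Fraction of weights whose magnitude is at most tol (i.e. pruned)."""
    return sum(1 for w in weights if abs(w) <= tol) / len(weights)


weights = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.2, -0.3, 0.5]  # hypothetical
print(tl1_penalty(weights, a=1.0), zero_fraction(weights))
```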

arXiv:1811.06321 [pdf, other]

Multivariate Spatiotemporal Hawkes Processes and Network Reconstruction

Authors: Baichuan Yuan, Hao Li, Andrea L. Bertozzi, P. Jeffrey Brantingham, Mason A. Porter

Abstract: There is often latent network structure in spatial and temporal data, and the tools of network analysis can yield fascinating insights into such data. In this paper, we develop a nonparametric method for network reconstruction from spatiotemporal data sets using multivariate Hawkes processes. In contrast to prior work on network reconstruction with point-process models, which has often focused on exclusively temporal information, our approach uses both temporal and spatial information and does not assume a specific parametric form of network dynamics. This leads to an effective way of recovering an underlying network. We illustrate our approach using both synthetic networks and networks constructed from real-world data sets (a location-based social media network, a narrative of crime events, and violent gang crimes). Our results demonstrate that, in comparison to using only temporal data, our spatiotemporal approach yields improved network reconstruction, providing a basis for meaningful subsequent analysis (such as community structure and motif analysis) of the reconstructed networks.

Submitted 15 November, 2018; originally announced November 2018.
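The following is a minimal sketch of the conditional intensity that a multivariate spatiotemporal Hawkes model assigns to node u. The paper estimates the triggering kernels nonparametrically. Here, purely as an assumption for illustration, the temporal kernel is exponential and the spatial kernel is an isotropic Gaussian. The matrix `K` plays the role of the reconstructed network: a zero entry `K[u][v]` means events at node v do not excite node u.

```python
import math


def intensity(u, t, x, y, events, mu, K, omega, sigma):
    """Conditional intensity lambda_u(t, x, y) of a multivariate spatiotemporal Hawkes process.

    events : past events (t_i, x_i, y_i, v_i), where v_i is the node index
    mu[u]  : background rate of node u
    K[u][v]: expected number of events at u triggered by one event at v
    Assumed kernels: exponential in time (rate omega), isotropic Gaussian
    in space (standard deviation sigma).
    """
    rate = mu[u]
    for ti, xi, yi, vi in events:
        if ti >= t:
            continue  # only strictly earlier events can trigger
        g = omega * math.exp(-omega * (t - ti))
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        h = math.exp(-d2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
        rate += K[u][vi] * g * h
    return rate


mu = [0.2, 0.1]
K = [[0.0, 0.5],   # events at node 1 excite node 0 (hypothetical edge)
     [0.0, 0.0]]   # nothing excites node 1
events = [(0.0, 0.0, 0.0, 1)]
print(intensity(0, 1.0, 0.0, 0.0, events, mu, K, omega=1.0, sigma=1.0))
```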

arXiv:1809.08516 [pdf, other]

Adversarial Defense via Data Dependent Activation Function and Total Variation Minimization

Authors: Bao Wang, Alex T. Lin, Wei Zhu, Penghang Yin, Andrea L. Bertozzi, Stanley J. Osher

Abstract: We improve the robustness of deep neural nets (DNNs) to adversarial attacks by using an interpolating function as the output activation. This data-dependent activation remarkably improves both the generalization and robustness of DNNs. On the CIFAR10 benchmark, we raise the robust accuracy of the adversarially trained ResNet20 from $\sim 46\%$ to $\sim 69\%$ under the state-of-the-art Iterative Fast Gradient Sign Method (IFGSM) based adversarial attack. When we combine this data-dependent activation with total variation minimization on adversarial images and training data augmentation, we achieve an improvement in robust accuracy of 38.9% for ResNet56 under the strongest IFGSM attack. Furthermore, we provide an intuitive explanation of our defense by analyzing the geometry of the feature space.

Submitted 29 April, 2020; v1 submitted 22 September, 2018; originally announced September 2018.

Comments: 17 pages, 6 figures

MSC Class: 68Pxx

Journal ref: Inverse Problems and Imaging, 2020
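The IFGSM attack used as the benchmark above repeatedly takes a signed-gradient step and projects back onto an eps-ball around the clean input. Below is a minimal sketch on a hypothetical linear (logistic-regression) classifier, chosen only so the gradient can be written by hand. The step size, eps, and model are assumptions. The paper's defense itself (the interpolating output activation and TV minimization) is not shown.

```python
import math


def margin(w, b, x, y):
    """y * (w.x + b): positive means correctly classified (labels y in {-1, +1})."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_grad_x(w, b, x, y):
    """Gradient w.r.t. x of the logistic loss log(1 + exp(-margin))."""
    s = 1.0 / (1.0 + math.exp(margin(w, b, x, y)))  # sigmoid(-margin)
    return [-y * s * wi for wi in w]


def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def ifgsm(w, b, x0, y, eps, alpha, steps):
    """Iterative FGSM: signed-gradient ascent on the loss, projected to the eps-ball."""
    x = list(x0)
    for _ in range(steps):
        g = loss_grad_x(w, b, x, y)
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]
        # project onto the L-infinity ball around x0, then onto the pixel range [0, 1]
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
        x = [min(max(xi, 0.0), 1.0) for xi in x]
    return x


w, b = [1.0, -2.0, 0.5], 0.0  # hypothetical trained linear model
x_clean, y = [0.5, 0.5, 0.5], -1
x_adv = ifgsm(w, b, x_clean, y, eps=0.1, alpha=0.02, steps=10)
print(margin(w, b, x_clean, y), margin(w, b, x_adv, y))
```

In this toy case, a perturbation of at most 0.1 per coordinate flips a correctly classified input to the wrong class.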

arXiv:1806.02485 [pdf, other]

doi 10.1007/s00332-019-09541-8

Stochastic Block Models are a Discrete Surface Tension

Authors: Zachary M. Boyd, Mason A. Porter, Andrea L. Bertozzi

Abstract: Networks, which represent agents and interactions between them, arise in myriad applications throughout the sciences, engineering, and even the humanities. To understand large-scale structure in a network, a common task is to cluster a network's nodes into sets called "communities", such that there are dense connections within communities but sparse connections between them. A popular and statistically principled method to perform such clustering is to use a family of generative models known as stochastic block models (SBMs). In this paper, we show that maximum likelihood estimation in an SBM is a network analog of a well-known continuum surface-tension problem that arises from an application in metallurgy. To illustrate the utility of this relationship, we implement network analogs of three surface-tension algorithms, with which we successfully recover planted community structure in synthetic networks and which yield fascinating insights on empirical networks that we construct from hyperspectral videos.

Submitted 24 March, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

Comments: to appear in Journal of Nonlinear Science

MSC Class: 65K10; 49M20; 35Q56; 62H30; 91C20; 91D30; 94C15
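The Merriman-Bence-Osher (MBO) scheme is a classic thresholding algorithm for surface-tension (mean-curvature) motion. Its graph version alternates a short diffusion with a thresholding step. The sketch below is a minimal two-community graph MBO iteration. It assumes an explicit Euler diffusion step with the unnormalized graph Laplacian. Any SBM-specific terms in the paper's algorithms are not reproduced, and the toy graph is hypothetical.

```python
def laplacian_times(adj, u):
    """(L u)_i = sum_j w_ij (u_i - u_j) for the unnormalized graph Laplacian."""
    return [sum(w * (u[i] - u[j]) for j, w in adj[i].items()) for i in range(len(u))]


def graph_mbo(adj, labels, dt=0.1, n_sub=5, max_iter=20):
    """Two-class graph MBO: diffuse the 0/1 indicator, then threshold at 1/2."""
    labels = list(labels)
    for _ in range(max_iter):
        u = [float(lab) for lab in labels]
        for _ in range(n_sub):  # explicit Euler steps of du/dt = -L u
            lu = laplacian_times(adj, u)
            u = [ui - dt * li for ui, li in zip(u, lu)]
        new_labels = [1 if ui > 0.5 else 0 for ui in u]
        if new_labels == labels:  # fixed point reached
            break
        labels = new_labels
    return labels


def two_cliques(size=4):
    """Hypothetical test graph: two cliques joined by a single edge."""
    n = 2 * size
    adj = [dict() for _ in range(n)]
    for block in (range(size), range(size, n)):
        for a in block:
            for b in block:
                if a != b:
                    adj[a][b] = 1.0
    adj[size - 1][size] = adj[size][size - 1] = 1.0
    return adj


adj = two_cliques()
start = [0, 1, 1, 1, 0, 0, 0, 0]  # node 0 starts on the wrong side
print(graph_mbo(adj, start))
```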

arXiv:1804.00684 [pdf, other]

Graph-Based Deep Modeling and Real Time Forecasting of Sparse Spatio-Temporal Data

Authors: Bao Wang, Xiyang Luo, Fangbo Zhang, Baichuan Yuan, Andrea L. Bertozzi, P. Jeffrey Brantingham

Abstract: We present a generic framework for spatio-temporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time. Our multi-scaled framework is a seamless coupling of two major components: a self-exciting point process that models the macroscale statistical behaviors of the ST data and a graph-structured recurrent neural network (GSRNN) to discover the microscale patterns of the ST data on the inferred graph. This novel deep neural network (DNN) incorporates the real-time interactions of the graph nodes to enable more accurate real-time forecasting. The effectiveness of our method is demonstrated on both crime and traffic forecasting.

Submitted 2 April, 2018; originally announced April 2018.

Comments: 9 pages, 19 figures

MSC Class: 65-06

arXiv:1711.08833 [pdf, other]

Deep Learning for Real-Time Crime Forecasting and its Ternarization

Authors: Bao Wang, Penghang Yin, Andrea L. Bertozzi, P. Jeffrey Brantingham, Stanley J. Osher, Jack Xin

Abstract: Real-time crime forecasting is important. However, accurate prediction of when and where the next crime will happen is difficult. No known physical model provides a reasonable approximation to such a complex system. Historical crime data are sparse in both space and time, and the signal of interest is weak. In this work, we first present a proper representation of crime data. We then adapt the spatial temporal residual network to the well-represented data to predict the distribution of crime in Los Angeles at the scale of hours in neighborhood-sized parcels. These experiments, as well as comparisons with several existing approaches to prediction, demonstrate the superiority of the proposed model in terms of accuracy. Finally, we present a ternarization technique to address the resource consumption issue for its real-world deployment. This work is an extension of our short conference proceeding paper [Wang et al, Arxiv 1707.03340].

Submitted 23 November, 2017; originally announced November 2017.

Comments: 14 pages, 7 figures

MSC Class: 62-07
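The abstract does not spell out the ternarization rule. One widely used heuristic, from ternary weight networks, maps each weight to {-alpha, 0, +alpha}. It uses the threshold delta = 0.7 * mean|w|, and sets alpha to the mean magnitude of the weights above delta. The sketch below assumes that rule purely for illustration; the paper's scheme may differ.

```python
def ternarize(weights, delta_factor=0.7):
    """Map weights to {-alpha, 0, +alpha} (assumed ternary-weight-network heuristic).

    delta = delta_factor * mean(|w|); alpha = mean of |w| over |w| > delta.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_factor * mean_abs
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0
    q = [alpha if w > delta else (-alpha if w < -delta else 0.0) for w in weights]
    return q, alpha


w = [0.9, -1.1, 0.05, -0.02, 1.0, 0.0]  # hypothetical layer weights
q, alpha = ternarize(w)
print(alpha, q)
```

Each ternary weight needs only 2 bits plus one shared scale per layer. This is where the resource savings for deployment would come from.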

arXiv:1707.03340 [pdf, other]

Deep Learning for Real Time Crime Forecasting

Authors: Bao Wang, Duo Zhang, Duanhao Zhang, P. Jeffery Brantingham, Andrea L. Bertozzi

Abstract: Accurate real-time crime prediction is a fundamental issue for public safety, but remains a challenging problem for the scientific community. Crime occurrences depend on many complex factors. Compared to many predictable events, crime is sparse. At different spatio-temporal scales, crime distributions display dramatically different patterns. These distributions are of very low regularity in both space and time. In this work, we adapt the state-of-the-art deep learning spatio-temporal predictor, ST-ResNet [Zhang et al, AAAI, 2017], to collectively predict crime distribution over the Los Angeles area. Our models have two stages. First, we preprocess the raw crime data; this includes regularization in both space and time to enhance predictable signals. Second, we adapt hierarchical structures of residual convolutional units to train multi-factor crime prediction models. Experiments over a half-year period in Los Angeles reveal highly accurate predictive power of our models.

Submitted 9 July, 2017; originally announced July 2017.

Comments: 4 pages, 6 figures, NOLTA, 2017

MSC Class: 68T05

arXiv:1704.02955 [pdf, other]

Unsupervised record matching with noisy and incomplete data

Authors: Yves van Gennip, Blake Hunter, Anna Ma, Daniel Moyer, Ryan de Vera, Andrea L. Bertozzi

Abstract: We consider the problem of duplicate detection in noisy and incomplete data: given a large data set in which each record has multiple entries (attributes), detect which distinct records refer to the same real-world entity. This task is complicated by noise (such as misspellings) and missing data, which can lead to records being different despite referring to the same entity. Our method consists of three main steps: creating a similarity score between records, grouping records together into "unique entities", and refining the groups. We compare various methods for creating similarity scores between noisy records, considering different combinations of string matching, term frequency-inverse document frequency methods, and n-gram techniques. In particular, we introduce a vectorized soft term frequency-inverse document frequency method, with an optional refinement step. We also discuss two methods to deal with missing data in computing similarity scores. We test our method on the Los Angeles Police Department Field Interview Card data set, the Cora Citation Matching data set, and two sets of restaurant review data. The results show that the methods that use words as the basic units are preferable to those that use 3-grams. Moreover, in some (but certainly not all) parameter ranges soft term frequency-inverse document frequency methods can outperform the standard term frequency-inverse document frequency method. The results also confirm that our method for automatically determining the number of groups works well in many cases and allows for accurate results in the absence of a priori knowledge of the number of unique entities in the data set.

Submitted 30 April, 2018; v1 submitted 10 April, 2017; originally announced April 2017.

Comments: 24 pages, 17 figures; this second version has various significant updates compared to version 1 as a result of the peer review process prior to journal publication; we thank the reviewers for their comments
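To make the word-versus-3-gram comparison concrete, here is a minimal sketch of standard TF-IDF cosine similarity between records under both tokenizations. It does not implement the paper's vectorized soft TF-IDF, which additionally credits approximately matching tokens. The tokenizer, the idf = log(N / df) weighting, and the toy records are assumptions.

```python
import math
from collections import Counter


def tokenize(record, mode="word"):
    """Split a record (list of attribute strings) into words or character 3-grams."""
    text = " ".join(record).lower()
    if mode == "word":
        return text.split()
    padded = "_".join(text.split())  # mark word boundaries with '_'
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def tfidf_vectors(records, mode="word"):
    """Sparse TF-IDF vectors (dicts) with idf = log(N / df)."""
    docs = [Counter(tokenize(r, mode)) for r in records]
    n = len(docs)
    df = Counter(t for d in docs for t in d)  # each record counts a token once
    return [{t: c * math.log(n / df[t]) for t, c in d.items()} for d in docs]


def cosine(a, b):
    dot = sum(x * b.get(t, 0.0) for t, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


records = [["John", "Smith", "Main St"],
           ["Jon", "Smith", "Main St"],   # same person, misspelled first name
           ["Maria", "Lopez", "Oak Ave"]]
for mode in ("word", "3gram"):
    vecs = tfidf_vectors(records, mode)
    print(mode, cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```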

arXiv:1703.08816 [pdf, other]

Uncertainty quantification in graph-based classification of high dimensional data

Authors: Andrea L. Bertozzi, Xiyang Luo, Andrew M. Stuart, Konstantinos C. Zygalakis

Abstract: Classification of high-dimensional data finds wide-ranging applications. In many of these applications, equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of, a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based on the graph formulation of semi-supervised learning. We provide a unified framework which brings together a variety of methods which have been introduced in different communities within the mathematical sciences. We study probit classification in the graph-based setting, generalize the level-set method for Bayesian inverse problems to the classification setting, and generalize the Ginzburg-Landau optimization-based classifier to a Bayesian setting; we also show that the probit and level set approaches are natural relaxations of the harmonic function approach introduced in [Zhu et al 2003]. We introduce efficient numerical methods, suited to large data sets, for both MCMC-based sampling and gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semi-supervised learning algorithms.

Submitted 8 February, 2018; v1 submitted 26 March, 2017; originally announced March 2017.

Comments: 33 pages, 14 figures
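The abstract says the probit and level-set models relax the harmonic function approach of [Zhu et al 2003]. That approach assigns each unlabeled node the weighted average of its neighbors' values, while labeled nodes stay fixed at ±1. Below is a minimal Gauss-Seidel sketch on a hypothetical five-node path graph. The Bayesian models and their uncertainty estimates are not shown.

```python
def harmonic_labels(adj, labeled, n_iter=500):
    """Harmonic-function label propagation via Gauss-Seidel sweeps.

    adj: list of dicts {neighbor: weight}; labeled: {node: +1.0 or -1.0}.
    Unlabeled nodes converge to the weighted average of their neighbors.
    """
    n = len(adj)
    u = [labeled.get(i, 0.0) for i in range(n)]
    for _ in range(n_iter):
        for i in range(n):
            if i in labeled:
                continue  # labeled nodes are clamped
            total = sum(adj[i].values())
            u[i] = sum(w * u[j] for j, w in adj[i].items()) / total
    return u


path = [{1: 1.0}, {0: 1.0, 2: 1.0}, {1: 1.0, 3: 1.0}, {2: 1.0, 4: 1.0}, {3: 1.0}]
u = harmonic_labels(path, {0: -1.0, 4: 1.0})
print(u)
```

On a path, the harmonic solution interpolates linearly between the two labels. The middle node sits at 0, exactly where a point estimate alone says nothing about confidence and a posterior-based uncertainty measure becomes useful.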

arXiv:1701.01505 [pdf]

doi 10.1186/s40163-017-0074-0

Crime Topic Modeling

Authors: Da Kuang, P. Jeffrey Brantingham, Andrea L. Bertozzi

Abstract: The classification of crime into discrete categories entails a massive loss of information. Crimes emerge out of a complex mix of behaviors and situations, yet most of these details cannot be captured by singular crime type labels. This information loss impacts not only our ability to understand the causes of crime but also our ability to develop optimal crime prevention strategies. We apply machine learning methods to short narrative text descriptions accompanying crime records with the goal of discovering ecologically more meaningful latent crime classes. We term these latent classes "crime topics" in reference to text-based topic modeling methods that produce them. We use topic distributions to measure clustering among formally recognized crime types. Crime topics replicate broad distinctions between violent and property crime, but also reveal nuances linked to target characteristics, situational conditions and the tools and methods of attack. Formal crime types are not discrete in topic space. Rather, crime types are distributed across a range of crime topics. Similarly, individual crime topics are distributed across a range of formal crime types. Key ecological groups include identity theft, shoplifting, burglary and theft, car crimes and vandalism, criminal threats and confidence crimes, and violent crimes. Though not a replacement for formal legal crime classifications, crime topics provide a unique window into the heterogeneous causal processes underlying crime.

Submitted 6 August, 2018; v1 submitted 5 January, 2017; originally announced January 2017.

Comments: 47 pages, 4 tables, 7 figures

Journal ref: Kuang, D., Brantingham, P. J., & Bertozzi, A. L. (2017). Crime topic modeling. Crime Science, 6(1), 12
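The abstract does not name the topic model. Nonnegative matrix factorization (NMF) of a term-document matrix is one common choice, so the sketch below assumes it and uses the Lee-Seung multiplicative updates. The columns of `w` would be read as topics, and the columns of `h` as each narrative's topic weights. The toy matrix and all names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, k, n_iter=200, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates for V ~= W H with W, H >= 0."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return w, h


# rows = terms, columns = crime narratives (hypothetical counts)
V = [[3, 2, 0, 0],
     [2, 3, 0, 0],
     [0, 0, 4, 1],
     [0, 0, 1, 3],
     [1, 1, 0, 0]]
W, H = nmf(V, k=2)
print(frob_error(V, W, H))
```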

arXiv:1604.08182 [pdf, other]

doi 10.1109/TGRS.2017.2654486

Unsupervised Classification in Hyperspectral Imagery with Nonlocal Total Variation and Primal-Dual Hybrid Gradient Algorithm

Authors: Wei Zhu, Victoria Chayes, Alexandre Tiard, Stephanie Sanchez, Devin Dahlberg, Andrea L. Bertozzi, Stanley Osher, Dominique Zosso, Da Kuang

Abstract: In this paper, a graph-based nonlocal total variation method (NLTV) is proposed for unsupervised classification of hyperspectral images (HSI). The variational problem is solved by the primal-dual hybrid gradient (PDHG) algorithm. By squaring the labeling function and using a stable simplex clustering routine, an unsupervised clustering method with random initialization can be implemented. The effectiveness of this proposed algorithm is illustrated on both synthetic and real-world HSI, and numerical results show that the proposed algorithm outperforms other standard unsupervised clustering methods such as spherical K-means, nonnegative matrix factorization (NMF), and the graph-based Merriman-Bence-Osher (MBO) scheme.

Submitted 13 February, 2017; v1 submitted 27 April, 2016; originally announced April 2016.
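As a simplified illustration of the PDHG iteration, the sketch below solves 1D total-variation denoising, min_u 0.5*||u - f||^2 + lam * sum_i |u[i+1] - u[i]|, with the Chambolle-Pock primal-dual updates. This is a stand-in built on assumptions. The paper's NLTV replaces the local difference with a weighted graph gradient over nonlocal neighbors and adds the labeling constraints; neither is shown here.

```python
def tv_denoise_pdhg(f, lam, n_iter=500, tau=0.45, sigma=0.45):
    """Approximately solve min_u 0.5*||u - f||^2 + lam * sum_i |u[i+1] - u[i]|.

    Chambolle-Pock PDHG with D the forward-difference operator; the steps
    satisfy tau * sigma * ||D||^2 <= 1 because ||D||^2 <= 4 in 1D.
    """
    n = len(f)
    u = list(f)
    u_bar = list(f)
    p = [0.0] * (n - 1)  # dual variable, one entry per difference
    for _ in range(n_iter):
        # dual ascent on p, then projection onto the box |p_i| <= lam
        du = [u_bar[i + 1] - u_bar[i] for i in range(n - 1)]
        p = [min(max(pi + sigma * di, -lam), lam) for pi, di in zip(p, du)]
        # adjoint D^T p, treating p as zero beyond both ends
        dtp = [(p[i - 1] if i > 0 else 0.0) - (p[i] if i < n - 1 else 0.0) for i in range(n)]
        # proximal step for the quadratic data term 0.5*||u - f||^2
        u_old = u
        u = [(ui - tau * gi + tau * fi) / (1.0 + tau) for ui, gi, fi in zip(u, dtp, f)]
        # over-relaxation with theta = 1
        u_bar = [2.0 * ui - uo for ui, uo in zip(u, u_old)]
    return u


def tv_energy(u, f, lam):
    fid = 0.5 * sum((a - b) ** 2 for a, b in zip(u, f))
    return fid + lam * sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))


f = [0.1, -0.1, 0.05, 0.0, 1.1, 0.9, 1.05, 0.95]  # hypothetical noisy step
u = tv_denoise_pdhg(f, lam=0.2)
print([round(x, 3) for x in u])
```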

arXiv:1510.08106 [pdf, other]

doi 10.1103/PhysRevE.93.022308

Growth and Containment of a Hierarchical Criminal Network

Authors: Charles Z. Marshak, M. Puck Rombach, Andrea L. Bertozzi, Maria R. D'Orsogna

Abstract: We model the hierarchical evolution of an organized criminal network via antagonistic recruitment and pursuit processes. Within the recruitment phase, a criminal kingpin enlists new members into the network, who in turn seek out other affiliates. New recruits are linked to established criminals according to a probability distribution that depends on the current network structure. At the same time, law enforcement agents attempt to dismantle the growing organization using pursuit strategies that initiate on the lower level nodes and that unfold as self-avoiding random walks. The global details of the organization are unknown to law enforcement, who must explore the hierarchy node by node. We halt the pursuit when certain local criteria of the network are uncovered, encoding if and when an arrest is made; the criminal network is assumed to be eradicated if the kingpin is arrested. We first analyze recruitment and study the large scale properties of the growing network; later we add pursuit and use numerical simulations to study the eradication probability in the case of three pursuit strategies, the time to first eradication and related costs. Within the context of this model, we find that eradication becomes increasingly costly as the network increases in size and that the optimal way of arresting the kingpin is to intervene at the early stages of network formation. We discuss our results in the context of dark network disruption and their implications for possible law enforcement strategies.

Submitted 15 January, 2016; v1 submitted 27 October, 2015; originally announced October 2015.

Comments: 16 pages, 11 Figures; New title; Updated figures with color scheme better suited for colorblind readers and for gray scale printing

arXiv:1304.4679 [pdf, other]

A Method Based on Total Variation for Network Modularity Optimization using the MBO Scheme

Authors: Huiyi Hu, Thomas Laurent, Mason A. Porter, Andrea L. Bertozzi

Abstract: The study of network structure is pervasive in sociology, biology, computer science, and many other disciplines. One of the most important areas of network science is the algorithmic detection of cohesive groups of nodes called "communities". One popular approach to find communities is to maximize a quality function known as modularity to obtain an optimal clustering of nodes. In this paper, we interpret the modularity function from a novel perspective: we reformulate modularity optimization as a minimization problem of an energy functional that consists of a total variation term and an $\ell_2$ balance term. By employing numerical techniques from image processing and $\ell_1$ compressive sensing (such as convex splitting and the Merriman-Bence-Osher (MBO) scheme), we develop a variational algorithm for the minimization problem. We present our computational results using both synthetic benchmark networks and real data.

Submitted 17 April, 2013; originally announced April 2013.

Comments: 23 pages

MSC Class: 62H30; 91C20; 91D30; 94C15
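For reference, the (Newman-Girvan) modularity being optimized is Q = (1/2m) sum_ij [A_ij - k_i k_j / (2m)] delta(c_i, c_j). Here k_i is the degree of node i and m is the number of edges. The short sketch below evaluates Q for a given partition on a hypothetical graph. It does not implement the paper's TV/MBO minimization.

```python
def modularity(adj, communities):
    """Newman-Girvan modularity Q = (1/2m) sum_ij [A_ij - k_i k_j / 2m] delta(c_i, c_j).

    adj: list of dicts {neighbor: weight}; communities: community label per node.
    """
    n = len(adj)
    degree = [sum(adj[i].values()) for i in range(n)]
    two_m = sum(degree)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += adj[i].get(j, 0.0) - degree[i] * degree[j] / two_m
    return q / two_m


# hypothetical graph: two disjoint triangles
triangles = [{1: 1.0, 2: 1.0}, {0: 1.0, 2: 1.0}, {0: 1.0, 1: 1.0},
             {4: 1.0, 5: 1.0}, {3: 1.0, 5: 1.0}, {3: 1.0, 4: 1.0}]
print(modularity(triangles, [0, 0, 0, 1, 1, 1]), modularity(triangles, [0] * 6))
```

Putting every node in one community always gives Q = 0, which is why maximizing Q rewards partitions with more internal edges than a degree-matched random graph would produce.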

arXiv:1211.7180 [pdf, other]

doi 10.1109/ICDMW.2012.72

Multislice Modularity Optimization in Community Detection and Image Segmentation

Authors: Huiyi Hu, Yves van Gennip, Blake Hunter, Mason A. Porter, Andrea L. Bertozzi

Abstract: Because networks can be used to represent many complex systems, they have attracted considerable attention in physics, computer science, sociology, and many other disciplines. One of the most important areas of network science is the algorithmic detection of cohesive groups (i.e., "communities") of nodes. In this paper, we algorithmically detect communities in social networks and image data by optimizing multislice modularity. A key advantage of modularity optimization is that it does not require prior knowledge of the number or sizes of communities, and it is capable of finding network partitions that are composed of communities of different sizes. By optimizing multislice modularity and subsequently calculating diagnostics on the resulting network partitions, it is thereby possible to obtain information about network structure across multiple system scales. We illustrate this method on data from both social networks and images, and we find that optimization of multislice modularity performs well on these two tasks without the need for extensive problem-specific adaptation. However, improving the computational speed of this method remains a challenging open problem.

Submitted 30 November, 2012; originally announced November 2012.

Comments: 3 pages, 2 figures, to appear in IEEE International Conference on Data Mining PhD forum conference proceedings

arXiv:1206.4969 [pdf, other]

Community detection using spectral clustering on sparse geosocial data

Authors: Yves van Gennip, Blake Hunter, Raymond Ahn, Peter Elliott, Kyle Luh, Megan Halvorson, Shannon Reid, Matt Valasik, James Wo, George E. Tita, Andrea L. Bertozzi, P. Jeffrey Brantingham

Abstract: In this article we identify social communities among gang members in the Hollenbeck policing district in Los Angeles, based on sparse observations of a combination of social interactions and geographic locations of the individuals. This information, coming from LAPD Field Interview cards, is used to construct a similarity graph for the individuals. We use spectral clustering to identify clusters in the graph, corresponding to communities in Hollenbeck, and compare these with the LAPD's knowledge of the individuals' gang membership. We discuss different ways of encoding the geosocial information using a graph structure and the influence on the resulting clusterings. Finally, we analyze the robustness of this technique with respect to noisy and incomplete data, thereby providing suggestions about the relative importance of quantity versus quality of collected data.

Submitted 8 November, 2012; v1 submitted 21 June, 2012; originally announced June 2012.

Comments: 22 pages, 6 figures (with subfigures)

MSC Class: 62H30; 91C20; 91D30; 94C15
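The following is a minimal sketch of spectral clustering on a combined geosocial graph. It assumes one possible encoding, W = alpha*S + (1 - alpha)*G, where S holds observed social contacts and G is a Gaussian kernel on geographic distance. For brevity it splits the graph into two clusters using the sign of the Fiedler vector, computed by power iteration. Clustering into many communities would instead run k-means on several eigenvectors. The weights, alpha, sigma, and the toy data are hypothetical.

```python
import math


def combined_affinity(social, coords, alpha, sigma):
    """W = alpha * S + (1 - alpha) * G with G_ij = exp(-|p_i - p_j|^2 / sigma^2)."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = (coords[i][0] - coords[j][0]) ** 2 + (coords[i][1] - coords[j][1]) ** 2
                w[i][j] = alpha * social[i][j] + (1.0 - alpha) * math.exp(-d2 / sigma ** 2)
    return w


def fiedler_split(w, n_iter=500):
    """Two-way spectral split by the sign of the Fiedler vector of L = D - W."""
    n = len(w)
    deg = [sum(row) for row in w]
    c = 2.0 * max(deg)  # eigenvalues of L lie in [0, c], so c*I - L is PSD
    v = [float(i) - (n - 1) / 2.0 for i in range(n)]
    for _ in range(n_iter):
        # power iteration with c*I - L, restricted to vectors orthogonal to the constant vector
        lv = [deg[i] * v[i] - sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [c * v[i] - lv[i] for i in range(n)]
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant eigenvector of L
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [1 if x > 0 else 0 for x in v]


social = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (1, 2), (3, 4), (4, 5), (2, 3)]:  # one cross-group contact (2, 3)
    social[i][j] = social[j][i] = 1
coords = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0), (5.5, 5.0), (5.0, 5.5)]
labels = fiedler_split(combined_affinity(social, coords, alpha=0.5, sigma=1.0))
print(labels)
```

Varying alpha shifts the weight between social and geographic evidence. This is one concrete way to probe how the encoding choice affects the resulting clusters.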

Showing 1–33 of 33 results for author: Bertozzi, A L