-
Reconfiguration Algorithms for Cubic Modular Robots with Realistic Movement Constraints
Authors:
MIT--NASA Space Robots Team,
Josh Brunner,
Kenneth C. Cheung,
Erik D. Demaine,
Jenny Diomidova,
Christine Gregg,
Della H. Hendrickson,
Irina Kostitsyna
Abstract:
We introduce and analyze a model for self-reconfigurable robots made up of unit-cube modules. Compared to past models, our model aims to newly capture two important practical aspects of real-world robots. First, modules often do not occupy an exact unit cube, but rather have features like bumps extending outside the allotted space so that modules can interlock. Thus, for example, our model forbids modules from squeezing in between two other modules that are one unit distance apart. Second, our model captures the practical scenario of many passive modules assembled by a single robot, instead of requiring all modules to be able to move on their own.
We prove two universality results. First, with a supply of auxiliary modules, we show that any connected polycube structure can be constructed by a carefully aligned plane sweep. Second, without additional modules, we show how to construct any structure for which a natural notion of external feature size is at least a constant; this property largely consolidates forbidden-pattern properties used in previous works on reconfigurable modular robots.
Submitted 24 May, 2024;
originally announced May 2024.
-
ASP-Completeness of Hamiltonicity in Grid Graphs, with Applications to Loop Puzzles
Authors:
MIT Hardness Group,
Josh Brunner,
Della Hendrickson,
Lily Chung,
Erik D. Demaine,
Andy Tockman
Abstract:
We prove that Hamiltonicity in maximum-degree-3 grid graphs (directed or undirected) is ASP-complete, i.e., it has a parsimonious reduction from every NP search problem (including a polynomial-time bijection between solutions). As a consequence, given k Hamiltonian cycles, it is NP-complete to find another; and counting Hamiltonian cycles is #P-complete. If we require the grid graph's vertices to form a full $m \times n$ rectangle, then we show that Hamiltonicity remains ASP-complete if the edges are directed or if we allow removing some edges (whereas including all undirected edges is known to be easy). These results enable us to develop a stronger "T-metacell" framework for proving ASP-completeness of rectangular puzzles, which requires building just a single gadget representing a degree-3 grid-graph vertex. We apply this general theory to prove ASP-completeness of 38 pencil-and-paper puzzles where the goal is to draw a loop subject to given constraints: Slalom, Onsen-meguri, Mejilink, Detour, Tapa-Like Loop, Kouchoku, Icelom; Masyu, Yajilin, Nagareru, Castle Wall, Moon or Sun, Country Road, Geradeweg, Maxi Loop, Mid-loop, Balance Loop, Simple Loop, Haisu, Reflect Link, Linesweeper; Vertex/Touch Slitherlink, Dotchi-Loop, Ovotovata, Building Walk, Rail Pool, Disorderly Loop, Ant Mill, Koburin, Mukkonn Enn, Rassi Silai, (Crossing) Ichimaga, Tapa, Canal View, Aqre, and Paintarea. The last 14 of these puzzles were not even known to be NP-hard. Along the way, we prove ASP-completeness of some simple forms of Tree-Residue Vertex-Breaking (TRVB), including planar multigraphs with degree-6 breakable vertices, or with degree-4 breakable and degree-1 unbreakable vertices.
Submitted 14 May, 2024;
originally announced May 2024.
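To make the counting problem in this abstract concrete, here is a minimal brute-force sketch that counts undirected Hamiltonian cycles in a full rectangular grid graph with all edges present. It is an illustration only, not the paper's reduction or any algorithm from it; the function name `count_hamiltonian_cycles` and the row-major vertex numbering are assumptions made for this example. The search takes exponential time, so it is only practical for very small boards.

```python
def count_hamiltonian_cycles(rows, cols):
    """Count undirected Hamiltonian cycles in the full rows x cols grid graph.

    Hypothetical helper for illustration: vertices are numbered row-major,
    and edges join horizontally or vertically adjacent cells.
    """
    n = rows * cols
    if rows < 2 or cols < 2:
        return 0  # a path-shaped grid has no cycle at all

    def neighbors(v):
        r, c = divmod(v, cols)
        if r > 0:
            yield v - cols
        if r < rows - 1:
            yield v + cols
        if c > 0:
            yield v - 1
        if c < cols - 1:
            yield v + 1

    visited = [False] * n
    visited[0] = True

    def extend(v, count):
        # All vertices used: the path closes into a cycle only if v touches the start.
        if count == n:
            return 1 if 0 in neighbors(v) else 0
        total = 0
        for w in neighbors(v):
            if not visited[w]:
                visited[w] = True
                total += extend(w, count + 1)
                visited[w] = False
        return total

    # Every undirected cycle is traversed once in each direction from vertex 0.
    return extend(0, 1) // 2
```

For instance, `count_hamiltonian_cycles(3, 3)` is 0, because a bipartite graph with an odd number of vertices cannot contain a Hamiltonian cycle.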
-
Learning manipulation of steep granular slopes for fast Mini Rover turning
Authors:
Deniz Kerimoglu,
Daniel Soto,
Malone Lincoln Hemsley,
Joseph Brunner,
Sehoon Ha,
Tingnan Zhang,
Daniel I. Goldman
Abstract:
Future planetary exploration missions will require reaching challenging regions such as craters and steep slopes. Such regions are ubiquitous and present science-rich targets potentially containing information regarding the planet's internal structure. Steep slopes consisting of low-cohesion regolith are prone to flow downward under small disturbances, making it very challenging for autonomous rovers to traverse. Moreover, the navigation trajectories of rovers are heavily limited by the terrain topology and future systems will need to maneuver on flowable surfaces without getting trapped, allowing them to further expand their reach and increase mission efficiency.
In this work, we used a laboratory-scale rover robot and performed maneuvering experiments on a steep granular slope of poppy seeds to explore the rover's turning capabilities. The rover is capable of lifting, sweeping, and spinning its wheels, allowing it to execute leg-like gait patterns. The high-dimensional actuation capabilities of the rover facilitate effective manipulation of the underlying granular surface. We used Bayesian Optimization (BO) to gain insight into successful turning gaits in a high-dimensional search space and found strategies such as differential wheel spinning and pivoting around a single sweeping wheel. We then used these insights to further fine-tune the turning gait, enabling the rover to turn 90 degrees in just over 4 seconds with minimal slip. Combining gait optimization and human-tuning approaches, we found that fast turning is empowered by creating anisotropic torques with the sweeping wheel.
Submitted 2 October, 2023;
originally announced October 2023.
-
Complexity of Simple Folding of Mixed Orthogonal Crease Patterns
Authors:
Hugo Akitaya,
Josh Brunner,
Erik D. Demaine,
Dylan Hendrickson,
Victor Luo,
Andy Tockman
Abstract:
Continuing results from JCDCGGG 2016 and 2017, we solve several new cases of the simple foldability problem -- deciding which crease patterns can be folded flat by a sequence of (some model of) simple folds. We give new efficient algorithms for mixed crease patterns, where some creases are assigned mountain/valley while others are unassigned, for all 1D cases and for 2D rectangular paper with orthogonal one-layer simple folds. By contrast, we show strong NP-completeness for mixed orthogonal crease patterns on 2D rectangular paper with some-layers simple folds, complementing a previous result for all-layers simple folds. We also prove strong NP-completeness for finite simple folds (no matter the number of layers) of unassigned orthogonal crease patterns on arbitrary paper, complementing a previous result for assigned crease patterns, and contrasting with a previous positive result for infinite all-layers simple folds. In total, we obtain a characterization of polynomial vs. NP-hard for all cases -- finite/infinite one/some/all-layers simple folds of assigned/unassigned/mixed orthogonal crease patterns on 1D/rectangular/arbitrary paper -- except the unsolved case of infinite all-layers simple folds of assigned orthogonal crease patterns on arbitrary paper.
Submitted 1 June, 2023;
originally announced June 2023.
-
Complexity of Reconfiguration in Surface Chemical Reaction Networks
Authors:
Robert M. Alaniz,
Josh Brunner,
Michael Coulombe,
Erik D. Demaine,
Jenny Diomidova,
Ryan Knobel,
Timothy Gomez,
Elise Grizzell,
Jayson Lynch,
Andrew Rodriguez,
Robert Schweller,
Tim Wylie
Abstract:
We analyze the computational complexity of basic reconfiguration problems for the recently introduced surface Chemical Reaction Networks (sCRNs), where ordered pairs of adjacent species nondeterministically transform into a different ordered pair of species according to a predefined set of allowed transition rules (chemical reactions). In particular, two questions that are fundamental to the simulation of sCRNs are whether a given configuration of molecules can ever transform into another given configuration, and whether a given cell can ever contain a given species, given a set of transition rules. We show that these problems can be solved in polynomial time, are NP-complete, or are PSPACE-complete in a variety of different settings, including when adjacent species just swap instead of arbitrary transformation (swap sCRNs), and when cells can change species a limited number of times (k-burnout). Most problems turn out to be at least NP-hard except with very few distinct species (2 or 3).
Submitted 24 October, 2023; v1 submitted 27 March, 2023;
originally announced March 2023.
-
Complexity of Solo Chess with Unlimited Moves
Authors:
Josh Brunner,
Lily Chung,
Michael Coulombe,
Erik D. Demaine,
Timothy Gomez,
Jayson Lynch
Abstract:
We analyze Solo Chess puzzles, where the input is an $n \times n$ board containing some standard Chess pieces of the same color, and the goal is to make a sequence of capture moves to reduce down to a single piece. Prior work analyzes this puzzle for a single piece type when each piece is limited to make at most two capture moves (as in the Solo Chess puzzles on chess.com). By contrast, we study when each piece can make an unlimited number of capture moves. We show that any single piece type can be solved in polynomial time in a general model of piece types, while any two standard Chess piece types are NP-complete. We also analyze the restriction (as on chess.com) that one piece type is unique and must be the last surviving piece, showing that in this case some pairs of piece types become tractable while others remain hard.
Submitted 2 February, 2023;
originally announced February 2023.
-
The Legend of Zelda: The Complexity of Mechanics
Authors:
Jeffrey Bosboom,
Josh Brunner,
Michael Coulombe,
Erik D. Demaine,
Dylan H. Hendrickson,
Jayson Lynch,
Elle Najt
Abstract:
We analyze some of the many game mechanics available to Link in the classic Legend of Zelda series of video games. In each case, we prove that the generalized game with that mechanic is polynomial, NP-complete, NP-hard and in PSPACE, or PSPACE-complete. In the process we give an overview of many of the hardness proof techniques developed for video games over the past decade: the motion-planning-through-gadgets framework, the planar doors framework, the doors-and-buttons framework, the "Nintendo" platform game / SAT framework, and the collectible tokens and toll roads / Hamiltonicity framework.
Submitted 31 March, 2022;
originally announced March 2022.
-
Orthogonal Fold & Cut
Authors:
Hayashi Ani,
Josh Brunner,
Erik D. Demaine,
Martin L. Demaine,
Dylan Hendrickson,
Victor Luo,
Rachana Madhukara
Abstract:
We characterize the cut patterns that can be produced by "orthogonal fold & cut": folding an axis-aligned rectangular sheet of paper along horizontal and vertical creases, and then making a single straight cut (at any angle). Along the way, we solve a handful of related problems: orthogonal fold & punch, 1D fold & cut, signed 1D fold & cut, and 1D interval fold & cut.
Submitted 30 April, 2023; v1 submitted 2 February, 2022;
originally announced February 2022.
-
Arithmetic Expression Construction
Authors:
Leo Alcock,
Sualeh Asif,
Jeffrey Bosboom,
Josh Brunner,
Charlotte Chen,
Erik D. Demaine,
Rogers Epstein,
Adam Hesterberg,
Lior Hirschfeld,
William Hu,
Jayson Lynch,
Sarah Scheffler,
Lillian Zhang
Abstract:
When can $n$ given numbers be combined using arithmetic operators from a given subset of $\{+, -, \times, \div\}$ to obtain a given target number? We study three variations of this problem of Arithmetic Expression Construction: when the expression (1) is unconstrained; (2) has a specified pattern of parentheses and operators (and only the numbers need to be assigned to blanks); or (3) must match a specified ordering of the numbers (but the operators and parenthesization are free). For each of these variants, and many of the subsets of $\{+, -, \times, \div\}$, we prove the problem NP-complete, sometimes in the weak sense and sometimes in the strong sense. Most of these proofs make use of a "rational function framework" which proves equivalence of these problems for values in rational functions with values in positive integers.
Submitted 23 November, 2020;
originally announced November 2020.
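As an illustration of the unconstrained variant (1) described in this abstract, the following sketch does an exhaustive search: it repeatedly picks two remaining values, combines them with an allowed operator, and recurses until one value is left. It is not an algorithm from the paper. The function name `can_construct`, the requirement that every number is used exactly once, and the allowance for rational intermediate values are all assumptions of this example. The running time is exponential in $n$, in line with the NP-completeness results.

```python
from fractions import Fraction
from itertools import combinations


def can_construct(numbers, target, ops="+-*/"):
    """Return True if `numbers` (each used exactly once) can reach `target`.

    Hypothetical brute-force helper; `ops` is the allowed operator subset.
    Exact rational arithmetic avoids floating-point false negatives.
    """
    target = Fraction(target)

    def search(values):
        if len(values) == 1:
            return values[0] == target
        for i, j in combinations(range(len(values)), 2):
            a, b = values[i], values[j]
            rest = [values[k] for k in range(len(values)) if k not in (i, j)]
            candidates = set()
            if "+" in ops:
                candidates.add(a + b)
            if "*" in ops:
                candidates.add(a * b)
            if "-" in ops:
                candidates.update((a - b, b - a))
            if "/" in ops:
                if b != 0:
                    candidates.add(a / b)
                if a != 0:
                    candidates.add(b / a)
            # Replace the chosen pair by one combined value and recurse.
            if any(search(rest + [c]) for c in candidates):
                return True
        return False

    return search([Fraction(x) for x in numbers])
```

For example, `can_construct([3, 3, 8, 8], 24)` succeeds via $8 \div (3 - 8 \div 3)$, which shows why the sketch needs non-integer intermediate values.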
-
Complexity of Retrograde and Helpmate Chess Problems: Even Cooperative Chess is Hard
Authors:
Josh Brunner,
Erik D. Demaine,
Dylan Hendrickson,
Julian Wellman
Abstract:
We prove PSPACE-completeness of two classic types of Chess problems when generalized to n-by-n boards. A "retrograde" problem asks whether it is possible for a position to be reached from a natural starting position, i.e., whether the position is "valid" or "legal" or "reachable". Most real-world retrograde Chess problems ask for the last few moves of such a sequence; we analyze the decision question which gets at the existence of an exponentially long move sequence. A "helpmate" problem asks whether it is possible for a player to become checkmated by any sequence of moves from a given position. A helpmate problem is essentially a cooperative form of Chess, where both players work together to cause a particular player to win; it also arises in regular Chess games, where a player who runs out of time (flags) loses only if they could ever possibly be checkmated from the current position (i.e., the helpmate problem has a solution). Our PSPACE-hardness reductions are from a variant of a puzzle game called Subway Shuffle.
Submitted 19 October, 2020;
originally announced October 2020.
-
1 x 1 Rush Hour with Fixed Blocks is PSPACE-complete
Authors:
Josh Brunner,
Lily Chung,
Erik D. Demaine,
Dylan Hendrickson,
Adam Hesterberg,
Adam Suhl,
Avi Zeff
Abstract:
Consider $n^2-1$ unit-square blocks in an $n \times n$ square board, where each block is labeled as movable horizontally (only), movable vertically (only), or immovable -- a variation of Rush Hour with only $1 \times 1$ cars and fixed blocks. We prove that it is PSPACE-complete to decide whether a given block can reach the left edge of the board, by reduction from Nondeterministic Constraint Logic via 2-color oriented Subway Shuffle. By contrast, polynomial-time algorithms are known for deciding whether a given block can be moved by one space, or when each block either is immovable or can move both horizontally and vertically. Our result answers a 15-year-old open problem by Tromp and Cilibrasi, and strengthens previous PSPACE-completeness results for Rush Hour with vertical $1 \times 2$ and horizontal $2 \times 1$ movable blocks and 4-color Subway Shuffle.
Submitted 1 May, 2020; v1 submitted 22 March, 2020;
originally announced March 2020.
-
Unsupervised Star Galaxy Classification with Cascade Variational Auto-Encoder
Authors:
Hao Sun,
Jiadong Guo,
Edward J. Kim,
Robert J. Brunner
Abstract:
The increasing amount of data in astronomy poses great challenges for machine learning research. Previously, supervised learning methods achieved satisfactory recognition accuracy for the star-galaxy classification task, based on manually labeled data sets. In this work, we propose a novel unsupervised approach for the star-galaxy recognition task, namely Cascade Variational Auto-Encoder (CasVAE). Our empirical results show our method outperforms the baseline model in both accuracy and stability.
Submitted 30 October, 2019;
originally announced October 2019.
-
An Optimal Algorithm for Online Freeze-tag
Authors:
Josh Brunner,
Julian Wellman
Abstract:
In the freeze-tag problem, one active robot must wake up many frozen robots. The robots are considered as points in a metric space, where active robots move at a constant rate and activate other robots by visiting them. In the (time-dependent) online variant of the problem, frozen robots are not revealed until a specified time. Hammar, Nilsson, and Persson have shown that no online algorithm can achieve a competitive ratio better than $7/3$ for online freeze-tag, and asked whether there is any $O(1)$-competitive algorithm. In this paper, we provide a $(1+\sqrt{2})$-competitive algorithm for online time-dependent freeze-tag, and show that no algorithm can achieve a lower competitive ratio on every metric space.
Submitted 5 February, 2019;
originally announced February 2019.
-
Extended Isolation Forest
Authors:
Sahand Hariri,
Matias Carrasco Kind,
Robert J. Brunner
Abstract:
We present an extension to the model-free anomaly detection algorithm, Isolation Forest. This extension, named Extended Isolation Forest (EIF), resolves issues with assignment of anomaly score to given data points. We motivate the problem using heat maps for anomaly scores. These maps suffer from artifacts generated by the criteria for branching operation of the binary tree. We explain this problem in detail and demonstrate the mechanism by which it occurs visually. We then propose two different approaches for improving the situation. First, we propose transforming the data randomly before creation of each tree, which results in averaging out the bias. Second, and preferably, we propose allowing the slicing of the data to use hyperplanes with random slopes. This approach results in remedying the artifact seen in the anomaly score heat maps. We show that the robustness of the algorithm is much improved using this method by looking at the variance of scores of data points distributed along constant level sets. We report AUROC and AUPRC for our synthetic datasets, along with real-world benchmark datasets. We find no appreciable difference in the rate of convergence or in computation time between the standard Isolation Forest and EIF.
Submitted 8 July, 2020; v1 submitted 5 November, 2018;
originally announced November 2018.
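The idea of slicing with "hyperplanes with random slopes" can be sketched as a single branching step. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `random_hyperplane_split`, drawing the normal vector from independent standard Gaussians, and drawing the intercept point uniformly from the bounding box of the data are all choices made for this example.

```python
import random


def random_hyperplane_split(points, rng):
    """Split `points` by a hyperplane with a random slope (illustrative only).

    points: non-empty list of equal-length coordinate tuples.
    rng: a random.Random instance, passed in so results are reproducible.
    Returns (left, right, normal, intercept).
    """
    dim = len(points[0])
    # Assumed sampling scheme: Gaussian normal vector gives a random orientation.
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Assumed sampling scheme: intercept point uniform in the data's bounding box.
    intercept = [
        rng.uniform(min(p[d] for p in points), max(p[d] for p in points))
        for d in range(dim)
    ]
    left, right = [], []
    for p in points:
        # The sign of (p - intercept) . normal decides the branch.
        side = sum((p[d] - intercept[d]) * normal[d] for d in range(dim))
        (left if side <= 0 else right).append(p)
    return left, right, normal, intercept
```

A standard Isolation Forest split corresponds to restricting `normal` to a single coordinate axis, which is the axis-parallel branching that the abstract identifies as the source of the heat-map artifacts.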
-
Star-galaxy Classification Using Deep Convolutional Neural Networks
Authors:
Edward J. Kim,
Robert J. Brunner
Abstract:
Most existing star-galaxy classifiers use the reduced summary information from catalogs, requiring careful feature extraction and selection. The latest advances in machine learning that use deep convolutional neural networks allow a machine to automatically learn the features directly from data, minimizing the need for input from human experts. We present a star-galaxy classification framework that uses deep convolutional neural networks (ConvNets) directly on the reduced, calibrated pixel values. Using data from the Sloan Digital Sky Survey (SDSS) and the Canada-France-Hawaii Telescope Lensing Survey (CFHTLenS), we demonstrate that ConvNets are able to produce accurate and well-calibrated probabilistic classifications that are competitive with conventional machine learning techniques. Future advances in deep learning may bring more success with current and forthcoming photometric surveys, such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope (LSST), because deep neural networks require very little manual feature engineering.
Submitted 13 October, 2016; v1 submitted 15 August, 2016;
originally announced August 2016.
-
Innovation diffusion equations on correlated scale-free networks
Authors:
M. L. Bertotti,
J. Brunner,
G. Modanese
Abstract:
We introduce a heterogeneous network structure into the Bass diffusion model, in order to study the diffusion times of innovation or information in networks with a scale-free structure, typical of regions where diffusion is sensitive to geographic and logistic influences (such as Alpine regions). We consider the diffusion peak times of both the total population and the link classes. In the familiar trickle-down processes the adoption curve of the hubs is found to anticipate the total adoption in a predictable way. In a major departure from the standard model, we model a trickle-up process by introducing heterogeneous publicity coefficients (which can also be negative for the hubs, thus turning them into stiflers) and a stochastic term which represents the erratic generation of innovation at the periphery of the network. The results confirm the robustness of the Bass model and considerably expand its range of applicability.
Submitted 1 July, 2016;
originally announced July 2016.
-
The Bass diffusion model on networks with correlations and inhomogeneous advertising
Authors:
M. L. Bertotti,
J. Brunner,
G. Modanese
Abstract:
The Bass model, which is an effective forecasting tool for innovation diffusion based on large collections of empirical data, assumes a homogeneous diffusion process. We introduce a network structure into this model and we investigate numerically the dynamics in the case of networks with link density $P(k)=c/k^{\gamma}$, where $k=1, \ldots , N$. The resulting curve of the total adoptions in time is qualitatively similar to the homogeneous Bass curve corresponding to a case with the same average number of connections. The peak of the adoptions, however, tends to occur earlier, particularly when $\gamma$ and $N$ are large (i.e., when there are few hubs with a large maximum number of connections). Most interestingly, the adoption curve of the hubs anticipates the total adoption curve in a predictable way, with peak times which can be, for instance when $N=100$, between 10% and 60% of the total adoptions peak. This may make it possible to monitor the hubs for forecasting purposes. We also consider the case of networks with assortative and disassortative correlations and a case of inhomogeneous advertising where the publicity terms are "targeted" on the hubs while maintaining their total cost constant.
Submitted 19 May, 2016;
originally announced May 2016.
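For reference, the homogeneous baseline that this abstract compares against can be sketched numerically. The sketch assumes the textbook form of the Bass model, $dF/dt = (p + qF)(1 - F)$ with $F(0) = 0$, where $F$ is the adopted fraction; the symbols $p$ (publicity coefficient) and $q$ (imitation coefficient), the function names, and the forward-Euler integration are assumptions of this example, not details taken from the paper. The network version studied by the authors is not reproduced here.

```python
import math


def bass_peak_time(p, q, dt=0.001, t_max=50.0):
    """Locate the adoption-rate peak of the homogeneous Bass model numerically.

    Integrates dF/dt = (p + q*F) * (1 - F) from F(0) = 0 with forward Euler
    and returns the time at which the adoption rate dF/dt is largest.
    """
    adopted, t = 0.0, 0.0
    best_rate, best_t = -1.0, 0.0
    while t < t_max:
        rate = (p + q * adopted) * (1.0 - adopted)
        if rate > best_rate:
            best_rate, best_t = rate, t
        adopted += rate * dt
        t += dt
    return best_t


def analytic_peak_time(p, q):
    """Closed-form peak time ln(q/p) / (p + q), valid when q > p > 0."""
    return math.log(q / p) / (p + q)
```

Comparing `bass_peak_time(p, q)` with `analytic_peak_time(p, q)` for some $q > p > 0$ gives a quick sanity check of the integrator before any network structure is added.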
-
Teaching Data Science
Authors:
Robert J. Brunner,
Edward J. Kim
Abstract:
We describe an introductory data science course, entitled Introduction to Data Science, offered at the University of Illinois at Urbana-Champaign. The course introduced general programming concepts by using the Python programming language with an emphasis on data preparation, processing, and presentation. The course had no prerequisites, and students were not expected to have any programming experience. This introductory course was designed to cover a wide range of topics, from the nature of data, to storage, to visualization, to probability and statistical analysis, to cloud and high performance computing, without becoming overly focused on any one subject. We conclude this article with a discussion of lessons learned and our plans to develop new data science courses.
Submitted 25 April, 2016;
originally announced April 2016.
-
SOMz: photometric redshift PDFs with self organizing maps and random atlas
Authors:
M. Carrasco Kind,
R. J. Brunner
Abstract:
In this paper we explore the applicability of the unsupervised machine learning technique of Self Organizing Maps (SOM) to estimate galaxy photometric redshift probability density functions (PDFs). This technique takes a spectroscopic training set, and maps the photometric attributes, but not the redshifts, to a two dimensional surface by using a process of competitive learning where neurons compete to more closely resemble the training data multidimensional space. The key feature of a SOM is that it retains the topology of the input set, revealing correlations between the attributes that are not easily identified. We test three different 2D topological mappings: rectangular, hexagonal, and spherical, by using data from the DEEP2 survey. We also explore different implementations and boundary conditions on the map and introduce the idea of a random atlas where a large number of different maps are created and their individual predictions are aggregated to produce a more robust photometric redshift PDF. We also introduce a new metric, the $I$-score, which efficiently incorporates different metrics, making it easier to compare different results (from different parameters or different photometric redshift codes). We find that by using a spherical topology mapping we obtain a better representation of the underlying multidimensional topology, which provides more accurate results that are comparable to other, state-of-the-art machine learning algorithms. Our results illustrate that unsupervised approaches have great potential for many astronomical problems, and in particular for the computation of photometric redshifts.
Submitted 18 December, 2013;
originally announced December 2013.
-
Bring out your codes! Bring out your codes! (Increasing Software Visibility and Re-use)
Authors:
Alice Allen,
Bruce Berriman,
Robert Brunner,
Dan Burger,
Kimberly DuPrie,
Robert J. Hanisch,
Robert Mann,
Jessica Mink,
Christer Sandin,
Keith Shortridge,
Peter Teuben
Abstract:
Progress is being made in code discoverability and preservation, but as discussed at ADASS XXI, many codes still remain hidden from public view. With the Astrophysics Source Code Library (ASCL) now indexed by the SAO/NASA Astrophysics Data System (ADS), the introduction of a new journal, Astronomy & Computing, focused on astrophysics software, and the increasing success of education efforts such as Software Carpentry and SciCoder, the community has the opportunity to set a higher standard for its science by encouraging the release of software for examination and possible reuse. We assembled representatives of the community to present issues inhibiting code release and sought suggestions for tackling these factors.
The session began with brief statements by panelists; the floor was then opened for discussion and ideas. Comments covered a diverse range of related topics and points of view, with apparent support for the propositions that algorithms should be readily available, code used to produce published scientific results should be made available, and there should be discovery mechanisms to allow these to be found easily. With increased use of resources such as GitHub (for code availability), ASCL (for code discovery), and a stated strong preference from the new journal Astronomy & Computing for code release, we expect to see additional progress over the next few years.
Submitted 9 December, 2012;
originally announced December 2012.
-
Robust Machine Learning Applied to Terascale Astronomical Datasets
Authors:
Nicholas M. Ball,
Robert J. Brunner,
Adam D. Myers
Abstract:
We present recent results from the LCDM (Laboratory for Cosmological Data Mining; http://lcdm.astro.uiuc.edu) collaboration between UIUC Astronomy and NCSA to deploy supercomputing cluster resources and machine learning algorithms for the mining of terascale astronomical datasets. This is a novel application in the field of astronomy, because we are using such resources for data mining, and not just performing simulations. Via a modified implementation of the NCSA cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million stars and galaxies in the Sloan Digital Sky Survey, improved distance measures, and a full exploitation of the simple but powerful k-nearest neighbor algorithm. A driving principle of this work is that our methods should be extensible from current terascale datasets to upcoming petascale datasets and beyond. We discuss issues encountered to date, and further issues for the transition to petascale. In particular, disk I/O will become a major limiting factor unless the necessary infrastructure is implemented.
Submitted 21 April, 2008;
originally announced April 2008.