Showing 1–41 of 41 results for author: Fang, Y

Searching in archive stat.

  1. arXiv:2406.12262  [pdf, other]

    cs.LG stat.ML

    Investigating Data Usage for Inductive Conformal Predictors

    Authors: Yizirui Fang, Anthony Bellotti

    Abstract: Inductive conformal predictors (ICPs) are algorithms that are able to generate prediction sets, instead of point predictions, which are valid at a user-defined confidence level, only assuming exchangeability. These algorithms are useful for reliable machine learning and are increasing in popularity. The ICP development process involves dividing development data into three parts: training, calibrat…

    Submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2403.18994  [pdf, other]

    stat.ML cs.LG

    Causal-StoNet: Causal Inference for High-Dimensional Complex Data

    Authors: Yaxin Fang, Faming Liang

    Abstract: With the advancement of data science, the collection of increasingly complex datasets has become commonplace. In such datasets, the data dimension can be extremely high, and the underlying data generation process can be unknown and highly nonlinear. As a result, the task of making causal inference with high-dimensional complex data has become a fundamental problem in many disciplines, such as medi…

    Submitted 27 March, 2024; originally announced March 2024.

  3. arXiv:2402.03933  [pdf]

    cs.SE stat.AP

    Development of a Evaluation Tool for Age-Appropriate Software in Aging Environments: A Delphi Study

    Authors: Zhenggang Bai, Yougxiang Fang, Hongtu Chen, Xinru Chen, Ning An, Min Zhang, Guoxin Rui, Jing Jin

    Abstract: Objective: We aimed to develop a dependable, reliable tool for assessing software age-appropriateness. Methods: We conducted a systematic review to get the indicators of technology age-appropriateness from studies from January 2000 to April 2023. This study engaged 25 experts from the fields of anthropology, sociology, and social technology research. Three rounds of Delphi consultations were con…

    Submitted 4 February, 2024; originally announced February 2024.

  4. arXiv:2307.07442  [pdf]

    stat.ME

    Sensitivity Analysis for Unmeasured Confounding in Medical Product Development and Evaluation Using Real World Evidence

    Authors: Peng Ding, Yixin Fang, Doug Faries, Susan Gruber, Hana Lee, Joo-Yeon Lee, Pallavi Mishra-Kalyani, Mingyang Shan, Mark van der Laan, Shu Yang, Xiang Zhang

    Abstract: The American Statistical Association Biopharmaceutical Section (ASA BIOP) working group on real-world evidence (RWE) has been making continuous, extended effort towards a goal of supporting and advancing regulatory science with respect to non-interventional, clinical studies intended to use real-world data for evidence generation for the purpose of medical product development and evaluation (i.e.,…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: 17 pages, 2 figures

  5. arXiv:2303.14211  [pdf, other]

    stat.ME

    Tackling the infinite likelihood problem when fitting mixtures of shifted asymmetric Laplace distributions

    Authors: Yuan Fang, Brian C. Franczak, Sanjeena Subedi

    Abstract: Mixtures of shifted asymmetric Laplace distributions were introduced as a tool for model-based clustering that allowed for the direct parameterization of skewness in addition to location and scale. Following common practices, an expectation-maximization algorithm was developed to fit these mixtures. However, adaptations to account for the 'infinite likelihood problem' led to fits that gave good cl…

    Submitted 24 March, 2023; originally announced March 2023.

  6. arXiv:2302.08606  [pdf, other]

    stat.ML cs.LG stat.ME

    Intrinsic and extrinsic deep learning on manifolds

    Authors: Yihao Fang, Ilsang Ohn, Vijay Gupta, Lizhen Lin

    Abstract: We propose extrinsic and intrinsic deep neural network architectures as general frameworks for deep learning on manifolds. Specifically, extrinsic deep neural networks (eDNNs) preserve geometric features on manifolds by utilizing an equivariant embedding from the manifold to its image in the Euclidean space. Moreover, intrinsic deep neural networks (iDNNs) incorporate the underlying intrinsic geom…

    Submitted 16 February, 2023; originally announced February 2023.

  7. arXiv:2211.15943  [pdf, other]

    math.OC stat.CO stat.ML

    Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems

    Authors: Yuchen Fang, Sen Na, Michael W. Mahoney, Mladen Kolar

    Abstract: We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints. We consider a fully stochastic setting, where at each step a single sample is generated to estimate the objective gradient. The algorithm adaptively selects the trust-region radius and, compared to th…

    Submitted 28 January, 2024; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: 10 figures, 33 pages

  8. Learning and Inference in Sparse Coding Models with Langevin Dynamics

    Authors: Michael Y. -S. Fang, Mayur Mudigonda, Ryan Zarcone, Amir Khosrowshahi, Bruno A. Olshausen

    Abstract: We describe a stochastic, dynamical system capable of inference and learning in a probabilistic latent variable model. The most challenging problem in such models - sampling the posterior distribution over latent variables - is proposed to be solved by harnessing natural sources of stochasticity inherent in electronic and neural systems. We demonstrate this idea for a sparse coding model by derivi…

    Submitted 23 April, 2022; originally announced April 2022.

  9. arXiv:2203.11748  [pdf, other]

    stat.ME

    On p-value combination of independent and frequent signals: asymptotic efficiency and Fisher ensemble

    Authors: Yusi Fang, Chung Chang, George Tseng

    Abstract: Combining p-values to integrate multiple effects is of long-standing interest in social science and biomedical research. In this paper, we focus on revisiting a classical scenario closely related to meta-analysis, which combines a relatively small (finite and fixed) number of p-values while the sample size for generating each p-value is large (asymptotically goes to infinity). We evaluate a list o…

    Submitted 14 April, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: 25 pages, 1 table, 5 figures

  10. arXiv:2112.05818  [pdf, other]

    stat.ME

    Association study between gene expression and multiple phenotypes in omics applications of complex diseases

    Authors: Yujia Li, Yusi Fang, Peng Liu, George C. Tseng

    Abstract: Studying phenotype-gene association can uncover mechanisms of diseases and help develop efficient treatments. In complex diseases where multiple phenotypes are available and correlated, analyzing and interpreting associated genes for each phenotype separately may decrease statistical power and lose interpretation due to not considering the correlation between phenotypes. The typical approaches are many…

    Submitted 10 December, 2021; originally announced December 2021.

  11. arXiv:2109.12962  [pdf, other]

    stat.CO stat.AP stat.ME

    pyStoNED: A Python Package for Convex Regression and Frontier Estimation

    Authors: Sheng Dai, Yu-Hsueh Fang, Chia-Yen Lee, Timo Kuosmanen

    Abstract: Shape-constrained nonparametric regression is a growing area in econometrics, statistics, operations research, machine learning and related fields. In the field of productivity and efficiency analysis, recent developments in the multivariate convex regression and related techniques such as convex quantile regression and convex expectile regression have bridged the long-standing gap between the con…

    Submitted 27 September, 2021; originally announced September 2021.

  12. arXiv:2109.00539  [pdf, other]

    stat.ME cs.LG

    Spatially and Robustly Hybrid Mixture Regression Model for Inference of Spatial Dependence

    Authors: Wennan Chang, Pengtao Dang, Changlin Wan, Xiaoyu Lu, Yue Fang, Tong Zhao, Yong Zang, Bo Li, Chi Zhang, Sha Cao

    Abstract: In this paper, we propose a Spatial Robust Mixture Regression model to investigate the relationship between a response variable and a set of explanatory variables over the spatial domain, assuming that the relationships may exhibit complex spatially dynamic patterns that cannot be captured by constant regression coefficients. Our method integrates the robust finite mixture Gaussian regression mode…

    Submitted 28 September, 2021; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: Accepted by ICDM IEEE 2021

  13. arXiv:2106.13097  [pdf, other]

    cs.LG stat.ML

    Understanding the Spread of COVID-19 Epidemic: A Spatio-Temporal Point Process View

    Authors: Shuang Li, Lu Wang, Xinyun Chen, Yixiang Fang, Yan Song

    Abstract: Since the first coronavirus case was identified in the U.S. on Jan. 21, more than 1 million people in the U.S. have confirmed cases of COVID-19. This infectious respiratory disease has spread rapidly across more than 3000 counties and 50 states in the U.S. and has exhibited evolutionary clustering and complex triggering patterns. It is essential to understand the complex spacetime intertwined pro…

    Submitted 24 June, 2021; originally announced June 2021.

  14. arXiv:2103.12967  [pdf, other]

    stat.ME

    Heavy-tailed distribution for combining dependent $p$-values with asymptotic robustness

    Authors: Yusi Fang, George C. Tseng, Chung Chang

    Abstract: The issue of combining individual $p$-values to aggregate multiple small effects is prevalent in many scientific investigations and is a long-standing statistical topic. Many classical methods are designed for combining independent and frequent signals in a traditional meta-analysis sense using the sum of transformed $p$-values with the transformation of light-tailed distributions, in which Fisher…

    Submitted 7 September, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: 34 pages, 3 figures

  15. arXiv:2011.06682  [pdf, other]

    stat.ME stat.CO

    Clustering microbiome data using mixtures of logistic normal multinomial models

    Authors: Yuan Fang, Sanjeena Subedi

    Abstract: Discrete data such as counts of microbiome taxa resulting from next-generation sequencing are routinely encountered in bioinformatics. Taxa count data in microbiome studies are typically high-dimensional, over-dispersed, and can only reveal relative abundance, and are therefore treated as compositional. Analyzing compositional data presents many challenges because they are restricted to a simplex. In…

    Submitted 21 June, 2022; v1 submitted 12 November, 2020; originally announced November 2020.

    Comments: 53 pages, 6 Figures

    MSC Class: 62H30

  16. arXiv:2009.05102  [pdf, other]

    cs.CV cs.LG stat.ML

    Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection

    Authors: Xuehao Wang, Shuai Li, Chenglizhao Chen, Yuming Fang, Aimin Hao, Hong Qin

    Abstract: Existing RGB-D salient object detection methods treat depth information as an independent component to complement its RGB part, and widely follow the bi-stream parallel network architecture. To selectively fuse the CNN features extracted from both RGB and depth as a final result, the state-of-the-art (SOTA) bi-stream networks usually consist of two independent subbranches; i.e., one subbranch is…

    Submitted 7 August, 2020; originally announced September 2020.

  17. arXiv:2008.09624  [pdf, other]

    cs.LG stat.ML

    Optimization of Graph Neural Networks with Natural Gradient Descent

    Authors: Mohammad Rasool Izadi, Yihao Fang, Robert Stevenson, Lizhen Lin

    Abstract: In this work, we propose to employ information-geometric tools to optimize a graph neural network architecture such as the graph convolutional networks. More specifically, we develop optimization algorithms for the graph-based semi-supervised learning by employing the natural gradient information in the optimization process. This allows us to efficiently exploit the geometry of the underlying stat…

    Submitted 21 August, 2020; originally announced August 2020.

  18. arXiv:2007.11123  [pdf, other]

    stat.ME stat.AP

    Outcome-Guided Disease Subtyping for High-Dimensional Omics Data

    Authors: Peng Liu, Yusi Fang, Zhao Ren, Lu Tang, George C. Tseng

    Abstract: High-throughput microarray and sequencing technologies have been used to identify disease subtypes that could not be observed otherwise by using clinical variables alone. The classical unsupervised clustering strategy concerns primarily the identification of subpopulations that have similar patterns in gene features. However, as the features corresponding to irrelevant confounders (e.g. gender or ag…

    Submitted 21 July, 2020; originally announced July 2020.

    Comments: 29 pages in total, 4 figures, 2 tables and 1 supplement

  19. arXiv:2007.08053  [pdf, other]

    cs.LG cs.SI stat.ML

    Inductive Link Prediction for Nodes Having Only Attribute Information

    Authors: Yu Hao, Xin Cao, Yixiang Fang, Xike Xie, Sibo Wang

    Abstract: Predicting the link between two nodes is a fundamental problem for graph data analytics. In attributed graphs, both the structure and attribute information can be utilized for link prediction. Most existing studies focus on transductive link prediction where both nodes are already in the graph. However, many real-world applications require inductive prediction for new nodes having only attribute i…

    Submitted 15 July, 2020; originally announced July 2020.

    Comments: IJCAI2020

  20. arXiv:2007.05188  [pdf, other]

    cs.LG cs.CE econ.EM stat.ML

    Intelligent Credit Limit Management in Consumer Loans Based on Causal Inference

    Authors: Hang Miao, Kui Zhao, Zhun Wang, Linbo Jiang, Quanhui Jia, Yanming Fang, Quan Yu

    Abstract: Nowadays, consumer loans play an important role in promoting economic growth, and credit cards are the most popular consumer loan. One of the most essential parts of credit cards is credit limit management. Traditionally, credit limits are adjusted based on limited heuristic strategies, which are developed by experienced professionals. In this paper, we present a data-driven approach to man…

    Submitted 10 July, 2020; originally announced July 2020.

    Comments: 7 pages

  21. arXiv:2005.08189  [pdf, other]

    cs.LG stat.ML

    Multi-View Collaborative Network Embedding

    Authors: Sezin Kircali Ata, Yuan Fang, Min Wu, Jiaqi Shi, Chee Keong Kwoh, Xiaoli Li

    Abstract: Real-world networks often exist with multiple views, where each view describes one type of interaction among a common set of nodes. For example, on a video-sharing network, while two user nodes are linked if they have common favorite videos in one view, they can also be linked in another view if they share common subscribers. Unlike traditional single-view networks, multiple views maintain differe…

    Submitted 17 December, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

    Comments: Accepted for publication in the ACM Transactions on Knowledge Discovery from Data, TKDD

    Journal ref: ACM Trans. Knowl. Discov. Data 15, 3, Article 39 (April 2021), 18 pages

  22. arXiv:2005.05324  [pdf, other]

    stat.ME

    Infinite mixtures of multivariate normal-inverse Gaussian distributions for clustering of skewed data

    Authors: Yuan Fang, Dimitris Karlis, Sanjeena Subedi

    Abstract: Mixtures of multivariate normal inverse Gaussian (MNIG) distributions can be used to cluster data that exhibit features such as skewness and heavy tails. However, for cluster analysis, using a traditional finite mixture model framework, either the number of components needs to be known a priori or needs to be estimated a posteriori using some model selection criterion after deriving result…

    Submitted 11 May, 2020; originally announced May 2020.

    Comments: 61 pages. arXiv admin note: text overlap with arXiv:2005.02585

    MSC Class: 62H30

  23. arXiv:2005.02585  [pdf, other]

    stat.CO

    A Bayesian approach for clustering skewed data using mixtures of multivariate normal-inverse Gaussian distributions

    Authors: Yuan Fang, Dimitris Karlis, Sanjeena Subedi

    Abstract: Non-Gaussian mixture models are gaining increasing attention for mixture model-based clustering particularly when dealing with data that exhibit features such as skewness and heavy tails. Here, such a mixture distribution is presented, based on the multivariate normal inverse Gaussian (MNIG) distribution. For parameter estimation of the mixture, a Bayesian approach via Gibbs sampler is used; for t…

    Submitted 5 May, 2020; originally announced May 2020.

    Comments: 40 pages, 7 figures

    MSC Class: 62H30

  24. arXiv:2005.00718  [pdf, other]

    cs.LG stat.ML

    Large-scale Uncertainty Estimation and Its Application in Revenue Forecast of SMEs

    Authors: Zebang Zhang, Kui Zhao, Kai Huang, Quanhui Jia, Yanming Fang, Quan Yu

    Abstract: The economic and banking importance of the small and medium enterprise (SME) sector is well recognized in contemporary society. Business credit loans are very important for the operation of SMEs, and the revenue is a key indicator of credit limit management. Therefore, it is very beneficial to construct a reliable revenue forecasting model. If the uncertainty of an enterprise's revenue forecasting…

    Submitted 2 May, 2020; originally announced May 2020.

  25. arXiv:2004.00201  [pdf, other]

    cs.LG q-fin.ST stat.ML

    NetDP: An Industrial-Scale Distributed Network Representation Framework for Default Prediction in Ant Credit Pay

    Authors: Jianbin Lin, Zhiqiang Zhang, Jun Zhou, Xiaolong Li, Jingli Fang, Yanming Fang, Quan Yu, Yuan Qi

    Abstract: Ant Credit Pay is a consumer credit service in Ant Financial Service Group. As with credit cards, loan default is one of the major risks of this credit product. Hence, an effective algorithm for default prediction is the key to reducing losses and increasing profits for the company. However, the challenges faced in our scenario are different from those in conventional credit card service. The firs…

    Submitted 31 March, 2020; originally announced April 2020.

    Comments: 2018 IEEE International Conference on Big Data (Big Data)

  26. arXiv:2003.01171  [pdf, other]

    cs.SI cs.CR cs.LG stat.ML

    A Semi-supervised Graph Attentive Network for Financial Fraud Detection

    Authors: Daixin Wang, Jianbin Lin, Peng Cui, Quanhui Jia, Zhen Wang, Yanming Fang, Quan Yu, Jun Zhou, Shuang Yang, Yuan Qi

    Abstract: With the rapid growth of financial services, fraud detection has been a very important problem to guarantee a healthy environment for both users and providers. Conventional solutions for fraud detection mainly use some rule-based methods or extract some features manually to perform prediction. However, in financial services, users have rich interactions and they themselves always show multifacete…

    Submitted 28 February, 2020; originally announced March 2020.

    Comments: ICDM

  27. arXiv:1812.11262  [pdf]

    cs.LG stat.ML

    Autoencoder Based Residual Deep Networks for Robust Regression Prediction and Spatiotemporal Estimation

    Authors: Lianfa Li, Ying Fang, Jun Wu, Jinfeng Wang

    Abstract: To generalize well, a deep learning neural network often requires a large training sample. As hidden layers are added to increase learning ability, a neural network may suffer degradation in accuracy. Both could seriously limit the applicability of deep learning in some domains, particularly those involving predictions of continuous variables with a small sample size.…

    Submitted 28 December, 2018; originally announced December 2018.

    Comments: including supplementary materials

  28. arXiv:1808.05347  [pdf, other]

    cs.LG stat.ML

    Tool Breakage Detection using Deep Learning

    Authors: Guang Li, Xin Yang, Duanbing Chen, Anxing Song, Yuke Fang, Junlin Zhou

    Abstract: In manufacturing, steel and other metals are mainly cut and shaped during the fabrication process by computer numerical control (CNC) machines. To keep high productivity and efficiency of the fabrication process, engineers need to monitor the real-time process of CNC machines, and the lifetime management of machine tools. In a real manufacturing process, breakage of machine tools usually happens wit…

    Submitted 16 August, 2018; originally announced August 2018.

    Comments: 6 pages, BCD2018

  29. arXiv:1803.04640  [pdf, other]

    q-bio.QM stat.AP

    Bayesian Detection of Abnormal ADS in Mutant Caenorhabditis elegans Embryos

    Authors: Wei Liang, Yuxiao Yang, Yusi Fang, Zhongying Zhao, Jie Hu

    Abstract: Cell division timing is critical for cell fate specification and morphogenesis during embryogenesis. How division timings are regulated among cells during development is poorly understood. Here we focus on the comparison of asynchrony of division between sister cells (ADS) between wild-type and mutant individuals of Caenorhabditis elegans. Since the replicate number of mutant individuals of each m…

    Submitted 13 March, 2018; originally announced March 2018.

  30. arXiv:1707.00192  [pdf, other]

    stat.ML cs.LG

    On Scalable Inference with Stochastic Gradient Descent

    Authors: Yixin Fang, Jinfeng Xu, Lei Yang

    Abstract: In many applications involving large datasets or online updating, stochastic gradient descent (SGD) provides a scalable way to compute parameter estimates and has gained increasing popularity due to its numerical convenience and memory efficiency. While the asymptotic properties of SGD-based estimators were established decades ago, statistical inference such as interval estimation remains much…

    Submitted 1 July, 2017; originally announced July 2017.

  31. arXiv:1705.03831  [pdf, ps, other]

    stat.CO

    Quasi-Reliable Estimates of Effective Sample Size

    Authors: Youhan Fang, Yudong Cao, Robert D. Skeel

    Abstract: The efficiency of a Markov chain Monte Carlo algorithm might be measured by the cost of generating one independent sample, or equivalently, the total cost divided by the effective sample size, defined in terms of the integrated autocorrelation time. To ensure the reliability of such an estimate, it is suggested that there be an adequate sampling of state space, to the extent that this can be det…

    Submitted 11 May, 2017; v1 submitted 10 May, 2017; originally announced May 2017.

  32. arXiv:1701.03772  [pdf, other]

    stat.ME math.ST

    Additive Partially Linear Models for Massive Heterogeneous Data

    Authors: Binhuan Wang, Yixin Fang, Heng Lian, Hua Liang

    Abstract: We consider an additive partially linear framework for modelling massive heterogeneous data. The major goal is to extract multiple common features simultaneously across all sub-populations while exploring heterogeneity of each sub-population. We propose an aggregation type of estimators for the commonality parameters that possess the asymptotic optimal bounds and the asymptotic distributions as if…

    Submitted 28 December, 2018; v1 submitted 13 January, 2017; originally announced January 2017.

  33. arXiv:1603.07427  [pdf, other]

    stat.ME

    Penalized Weighted Least Squares for Outlier Detection and Robust Regression

    Authors: Xiaoli Gao, Yixin Fang

    Abstract: To conduct regression analysis for data contaminated with outliers, many approaches have been proposed for simultaneous outlier detection and robust regression, and the approach proposed in this manuscript is one of them. This new approach is called "penalized weighted least squares" (PWLS). By assigning each observation an individual weight and incorporating a lasso-type penalty on the log-transformation of t…

    Submitted 23 March, 2016; originally announced March 2016.

    Comments: 27 pages; 4 figures

  34. arXiv:1601.04586  [pdf, other]

    stat.ME cs.LG stat.ML

    Sparse Convex Clustering

    Authors: Binhuan Wang, Yilong Zhang, Will Wei Sun, Yixin Fang

    Abstract: Convex clustering, a convex relaxation of k-means clustering and hierarchical clustering, has drawn recent attention since it nicely addresses the instability issue of traditional nonconvex clustering methods. Although its computational and statistical properties have been recently studied, the performance of convex clustering has not yet been investigated in the high-dimensional clustering scena…

    Submitted 10 February, 2017; v1 submitted 18 January, 2016; originally announced January 2016.

  35. arXiv:1503.03970  [pdf, other]

    stat.AP

    Flexible combination of multiple diagnostic biomarkers to improve diagnostic accuracy

    Authors: Tu Xu, Yixin Fang, Alan Rong, Junhui Wang

    Abstract: In medical research, it is common to collect information on multiple continuous biomarkers to improve the accuracy of diagnostic tests. Combining the measurements of these biomarkers into one single score is a popular practice to integrate the collected information, where the accuracy of the resultant diagnostic test is usually improved. To measure the accuracy of a diagnostic test, the Youden ind…

    Submitted 7 July, 2015; v1 submitted 13 March, 2015; originally announced March 2015.

  36. arXiv:1402.7107  [pdf, ps, other]

    physics.comp-ph stat.CO

    Compressible Generalized Hybrid Monte Carlo

    Authors: Youhan Fang, Jesus-Maria Sanz-Serna, Robert D. Skeel

    Abstract: One of the most demanding calculations is to generate random samples from a specified probability distribution (usually with an unknown normalizing prefactor) in a high-dimensional configuration space. One often has to resort to using a Markov chain Monte Carlo method, which converges only in the limit to the prescribed distribution. Such methods typically inch through configuration space step by…

    Submitted 27 February, 2014; originally announced February 2014.

    Comments: 27 pages, 2 figures

  37. arXiv:1402.1835  [pdf, other]

    stat.AP

    A model-free estimation for the covariate-adjusted Youden index and its associated cut-point

    Authors: Tu Xu, Junhui Wang, Yixin Fang

    Abstract: In medical research, continuous markers are widely employed in diagnostic tests to distinguish diseased and non-diseased subjects. The accuracy of such diagnostic tests is commonly assessed using the receiver operating characteristic (ROC) curve. To summarize an ROC curve and determine its optimal cut-point, the Youden index is popularly used. In the literature, estimation of the Youden index has been…

    Submitted 8 February, 2014; originally announced February 2014.

  38. arXiv:1308.3416  [pdf, other]

    stat.ME stat.CO

    Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices

    Authors: Yixin Fang, Binhuan Wang, Yang Feng

    Abstract: Recently many regularized estimators of large covariance matrices have been proposed, and the tuning parameters in these estimators are usually selected via cross-validation. However, there is no guideline on the number of folds for conducting cross-validation and there is no comparison between cross-validation and the methods based on bootstrap. Through extensive simulations, we suggest 10-fold c…

    Submitted 15 August, 2013; originally announced August 2013.

  39. arXiv:1301.7118  [pdf, ps, other]

    stat.ME stat.ML

    A note on selection stability: combining stability and prediction

    Authors: Yixin Fang, Junhui Wang, Wei Sun

    Abstract: Recently, many regularized procedures have been proposed for variable selection in linear regression, but their performance depends on the tuning parameter selection. Here a criterion for the tuning parameter selection is proposed, which combines the strength of both stability selection and cross-validation and therefore is referred to as the prediction and stability selection (PASS). The selection c…

    Submitted 29 January, 2013; originally announced January 2013.

  40. arXiv:1208.3380  [pdf, ps, other]

    stat.ML stat.ME

    Consistent selection of tuning parameters via variable selection stability

    Authors: Wei Sun, Junhui Wang, Yixin Fang

    Abstract: Penalized regression models are popularly used in high-dimensional data analysis to conduct variable selection and model fitting simultaneously. Whereas success has been widely reported in the literature, their performance largely depends on the tuning parameters that balance the trade-off between model fitting and model sparsity. Existing tuning criteria mainly follow the route of minimizing the esti…

    Submitted 13 December, 2013; v1 submitted 16 August, 2012; originally announced August 2012.

    Comments: Published in JMLR (http://jmlr.org/papers/v14/)

    Journal ref: Journal of Machine Learning Research 2013, Vol. 14, 3419-3440

  41. arXiv:1203.3559  [pdf, other]

    stat.OT

    A divergence formula for regularization methods with an L2 constraint

    Authors: Yixin Fang, Yuanjia Wang, Xin Huang

    Abstract: We derive a divergence formula for a group of regularization methods with an L2 constraint. The formula is useful for regularization parameter selection, because it provides an unbiased estimate for the number of degrees of freedom. We begin with deriving the formula for smoothing splines and then extend it to other settings such as penalized splines, ridge regression, and functional linear regres…

    Submitted 15 March, 2012; originally announced March 2012.