-
Towards Understanding Human Emotional Fluctuations with Sparse Check-In Data
Authors:
Sagar Paresh Shah,
Ga Wu,
Sean W. Kortschot,
Samuel Daviau
Abstract:
Data sparsity is a key challenge limiting the power of AI tools across various domains. The problem is especially pronounced in domains that require active user input rather than measurements derived from automated sensors. It is a critical barrier to harnessing the full potential of AI in domains requiring active user engagement, such as self-reported mood check-ins, where capturing a continuous…
▽ More
Data sparsity is a key challenge limiting the power of AI tools across various domains. The problem is especially pronounced in domains that require active user input rather than measurements derived from automated sensors. It is a critical barrier to harnessing the full potential of AI in domains requiring active user engagement, such as self-reported mood check-ins, where capturing a continuous picture of emotional states is essential. In this context, sparse data can hinder efforts to capture the nuances of individual emotional experiences such as causes, triggers, and contributing factors. Existing methods for addressing data scarcity often rely on heuristics or large established datasets, favoring deep learning models that lack adaptability to new domains. This paper proposes a novel probabilistic framework that integrates user-centric feedback-based learning, allowing for personalized predictions despite limited data. Achieving 60% accuracy in predicting user states among 64 options (chance of 1/64), this framework effectively mitigates data sparsity. It is versatile across various applications, bridging the gap between theoretical AI research and practical deployment.
△ Less
Submitted 10 September, 2024;
originally announced September 2024.
-
Managed Geo-Distributed Feature Store: Architecture and System Design
Authors:
Anya Li,
Bhala Ranganathan,
Feng Pan,
Mickey Zhang,
Qianjun Xu,
Runhan Li,
Sethu Raman,
Shail Paragbhai Shah,
Vivienne Tang
Abstract:
Companies are using machine learning to solve real-world problems and are developing hundreds to thousands of features in the process. They are building feature engineering pipelines as part of MLOps life cycle to transform data from various data sources and materialize the same for future consumption. Without feature stores, different teams across various business groups would maintain the above…
▽ More
Companies are using machine learning to solve real-world problems and are developing hundreds to thousands of features in the process. They are building feature engineering pipelines as part of MLOps life cycle to transform data from various data sources and materialize the same for future consumption. Without feature stores, different teams across various business groups would maintain the above process independently, which can lead to conflicting and duplicated features in the system. Data scientists find it hard to search for and reuse existing features and it is painful to maintain version control. Furthermore, feature correctness violations related to online (inferencing) - offline (training) skews and data leakage are common. Although the machine learning community has extensively discussed the need for feature stores and their purpose, this paper aims to capture the core architectural components that make up a managed feature store and to share the design learning in building such a system.
△ Less
Submitted 31 May, 2023;
originally announced May 2023.
-
Joint Inference of Genome Structure and Content in Heterogeneous Tumour Samples
Authors:
Andrew McPherson,
Andrew Roth,
Gavin Ha,
Sohrab P. Shah,
Cedric Chauve,
S. Cenk Sahinalp
Abstract:
For a genomically unstable cancer, a single tumour biopsy will often contain a mixture of competing tumour clones. These tumour clones frequently differ with respect to their genomic content (copy number of each gene) and structure (order of genes on each chromosome). Modern bulk genome sequencing mixes the signals of tumour clones and contaminating normal cells, complicating inference of genomic…
▽ More
For a genomically unstable cancer, a single tumour biopsy will often contain a mixture of competing tumour clones. These tumour clones frequently differ with respect to their genomic content (copy number of each gene) and structure (order of genes on each chromosome). Modern bulk genome sequencing mixes the signals of tumour clones and contaminating normal cells, complicating inference of genomic content and structure. We propose a method to unmix tumour and contaminating normal signals and jointly predict genomic structure and content of each tumour clone. We use genome graphs to represent tumour clones, and model the likelihood of the observed reads given clones and mixing proportions. Our use of haplotype blocks allows us to accurately measure allele specific read counts, and infer allele specific copy number for each clone. The proposed method is a heuristic local search based on applying incremental, locally optimal modifications of the genome graphs. Using simulated data, we show that our method predicts copy counts and gene adjacencies with reasonable accuracy.
△ Less
Submitted 24 April, 2015; v1 submitted 13 April, 2015;
originally announced April 2015.