SummerTime: Variable-length Time SeriesSummarization with Applications to PhysicalActivity Analysis
Authors:
Kevin M. Amaral,
Zihan Li,
Wei Ding,
Scott Crouter,
Ping Chen
Abstract:
\textit{SummerTime} seeks to summarize globally time series signals and provides a fixed-length, robust summarization of the variable-length time series. Many classical machine learning methods for classification and regression depend on data instances with a fixed number of features. As a result, those methods cannot be directly applied to variable-length time series data. One common approach is…
▽ More
\textit{SummerTime} seeks to summarize globally time series signals and provides a fixed-length, robust summarization of the variable-length time series. Many classical machine learning methods for classification and regression depend on data instances with a fixed number of features. As a result, those methods cannot be directly applied to variable-length time series data. One common approach is to perform classification over a sliding window on the data and aggregate the decisions made at local sections of the time series in some way, through majority voting for classification or averaging for regression. The downside to this approach is that minority local information is lost in the voting process and averaging assumes that each time series measurement is equal in significance. Also, since time series can be of varying length, the quality of votes and averages could vary greatly in cases where there is a close voting tie or bimodal distribution of regression domain. Summarization conducted by the \textit{SummerTime} method will be a fixed-length feature vector which can be used in-place of the time series dataset for use with classical machine learning methods. We use Gaussian Mixture models (GMM) over small same-length disjoint windows in the time series to group local data into clusters. The time series' rate of membership for each cluster will be a feature in the summarization. The model is naturally capable of converging to an appropriate cluster count. We compare our results to state-of-the-art studies in physical activity classification and show high-quality improvement by classifying with only the summarization. Finally, we show that regression using the summarization can augment energy expenditure estimation, producing more robust and precise results.
△ Less
Submitted 20 February, 2020;
originally announced February 2020.
Bag-of-Words Method Applied to Accelerometer Measurements for the Purpose of Classification and Energy Estimation
Authors:
Kevin M. Amaral,
Ping Chen,
Scott Crouter,
Wei Ding
Abstract:
Accelerometer measurements are the prime type of sensor information most think of when seeking to measure physical activity. On the market, there are many fitness measuring devices which aim to track calories burned and steps counted through the use of accelerometers. These measurements, though good enough for the average consumer, are noisy and unreliable in terms of the precision of measurement…
▽ More
Accelerometer measurements are the prime type of sensor information most think of when seeking to measure physical activity. On the market, there are many fitness measuring devices which aim to track calories burned and steps counted through the use of accelerometers. These measurements, though good enough for the average consumer, are noisy and unreliable in terms of the precision of measurement needed in a scientific setting. The contribution of this paper is an innovative and highly accurate regression method which uses an intermediary two-stage classification step to better direct the regression of energy expenditure values from accelerometer counts.
We show that through an additional unsupervised layer of intermediate feature construction, we can leverage latent patterns within accelerometer counts to provide better grounds for activity classification than expert-constructed timeseries features. For this, our approach utilizes a mathematical model originating in natural language processing, the bag-of-words model, that has in the past years been appearing in diverse disciplines outside of the natural language processing field such as image processing. Further emphasizing the natural language connection to stochastics, we use a gaussian mixture model to learn the dictionary upon which the bag-of-words model is built. Moreover, we show that with the addition of these features, we're able to improve regression root mean-squared error of energy expenditure by approximately 1.4 units over existing state-of-the-art methods.
△ Less
Submitted 12 April, 2017; v1 submitted 5 April, 2017;
originally announced April 2017.