Dehydration is a common, serious issue among older adults. It is important to drink fluid to prevent dehydration and the complications that come with it. As many older adults forget to drink regularly, there is a need for an automated approach, tracking intake throughout the day with limited user interaction. The current literature has used vision-based approaches with deep learning models to detect drink events; however, most use static frames (2D networks) in a lab-based setting, only performing eating and drinking. This study proposes a 3D convolutional neural network using video segments to detect drinking events. In this preliminary study, we collected data from 9 participants in a home simulated environment performing daily activities as well as eating and drinking from various containers to create a robust environment and dataset. Using state-of-the-art deep learning models, we trained our CNN using both static images and video segments to compare the results. The 3D model attained higher performance (compared to 2D CNN) with F1 scores of 93.7% and 84.2% using 10-fold and leave-one-subject-out cross-validations, respectively.
Keywords: artificial neural networks; fluid intake monitoring; image recognition; intake gesture detection; video signal processing.