Time Series Clustering for Grouping Products Based on Price and Sales Patterns

Authors: Aysun Bozanta, Sean Berry, Mucahit Cevik, Beste Bulut, Deniz Yigit, Fahrettin F. Gonen, Ayşe Başar

Abstract: Developing technology and changing lifestyles have made online grocery delivery applications an indispensable part of urban life. Since the beginning of the COVID-19 pandemic, the demand for such applications has dramatically increased, creating new competitors that disrupt the market. An increasing level of competition might prompt companies to frequently restructure their marketing and product pricing strategies. Therefore, identifying the change patterns in product prices and sales volumes would provide a competitive advantage for the companies in the marketplace. In this paper, we investigate alternative clustering methodologies to group the products based on the price patterns and sales volumes. We propose a novel distance metric that takes into account how product prices and sales move together rather than calculating the distance using numerical values. We compare our approach with traditional clustering algorithms, which typically rely on generic distance metrics such as Euclidean distance, and image clustering approaches that aim to group data by capturing its visual patterns. We evaluate the performance of different clustering algorithms using our custom evaluation metric as well as the Calinski-Harabasz and Davies-Bouldin indices, which are commonly used internal validity metrics. We conduct our numerical study using a proprietary price dataset from an online food and grocery delivery company and the publicly available Favorita sales dataset. We find that our proposed clustering approach and image clustering both perform well for finding the products with similar price and sales patterns within large datasets.

Submitted 18 April, 2022; originally announced April 2022.

Comments: 16 pages, 6 figures

arXiv:2102.13541 [pdf, other]

Nested-block self-attention for robust radiotherapy planning segmentation

Authors: Harini Veeraraghavan, Jue Jiang, Sharif Elguindi, Sean L. Berry, Ifeanyirochukwu Onochie, Aditya Apte, Laura Cervino, Joseph O. Deasy

Abstract: Although deep convolutional networks have been widely studied for head and neck (HN) organs at risk (OAR) segmentation, their use for routine clinical treatment planning is limited by a lack of robustness to imaging artifacts, low soft tissue contrast on CT, and the presence of abnormal anatomy. To address these challenges, we developed a computationally efficient nested block self-attention (NBSA) method that can be combined with any convolutional network. Our method achieves computational efficiency by performing non-local calculations within memory blocks of fixed spatial extent. Contextual dependencies are captured by passing information in a raster scan order between blocks, as well as through a second attention layer that causes bi-directional attention flow. We implemented our approach on three different networks to demonstrate feasibility. Following training using 200 cases, we performed comprehensive evaluations using conventional and clinical metrics on a separate set of 172 test scans sourced from external and internal institution datasets without any exclusion criteria. NBSA required a similar number of computations (15.7 GFLOPs) as the most efficient criss-cross attention (CCA) method and generated significantly more accurate segmentations for brain stem (Dice of 0.89 vs. 0.86) and parotid glands (0.86 vs. 0.84) than CCA. NBSA's segmentations were less variable than multiple 3D methods, including for small organs with low soft-tissue contrast such as the submandibular glands (surface Dice of 0.90).

Submitted 26 February, 2021; originally announced February 2021.

Comments: Under review at Medical Image Analysis

arXiv:2007.09465 [pdf, other]

doi 10.1109/TMI.2020.3011626

PSIGAN: Joint probabilistic segmentation and image distribution matching for unpaired cross-modality adaptation based MRI segmentation

Authors: Jue Jiang, Yu Chi Hu, Neelam Tyagi, Andreas Rimner, Nancy Lee, Joseph O. Deasy, Sean Berry, Harini Veeraraghavan

Abstract: We developed a new joint probabilistic segmentation and image distribution matching generative adversarial network (PSIGAN) for unsupervised domain adaptation (UDA) and multi-organ segmentation from magnetic resonance imaging (MRI) images. Our UDA approach models the co-dependency between images and their segmentation as a joint probability distribution using a new structure discriminator. The structure discriminator computes a structure-of-interest-focused adversarial loss by combining the generated pseudo MRI with probabilistic segmentations produced by a simultaneously trained segmentation sub-network. The segmentation sub-network is trained using the pseudo MRI produced by the generator sub-network. This leads to a cyclical optimization of both the generator and segmentation sub-networks, which are jointly trained as part of an end-to-end network. Extensive experiments and comparisons against multiple state-of-the-art methods were done on four different MRI sequences totalling 257 scans for generating multi-organ and tumor segmentation. The experiments included (a) 20 T1-weighted (T1w) in-phase mDixon and (b) 20 T2-weighted (T2w) abdominal MRI for segmenting liver, spleen, left and right kidneys, (c) 162 T2-weighted fat suppressed head and neck MRI (T2wFS) for parotid gland segmentation, and (d) 75 T2w MRI for lung tumor segmentation. Our method achieved an overall average DSC of 0.87 on T1w and 0.90 on T2w for the abdominal organs, 0.82 on T2wFS for the parotid glands, and 0.77 on T2w MRI for lung tumors.

Submitted 18 July, 2021; v1 submitted 18 July, 2020; originally announced July 2020.

Comments: This paper has been accepted by IEEE Transactions on Medical Imaging

Journal ref: IEEE Transactions on Medical Imaging, 2020

arXiv:1909.05054 [pdf, other]

Local block-wise self attention for normal organ segmentation

Authors: Jue Jiang, Elguindi Sharif, Hyemin Um, Sean Berry, Harini Veeraraghavan

Abstract: We developed a new and computationally simple local block-wise self attention based normal structures segmentation approach applied to head and neck computed tomography (CT) images. Our method uses the insight that normal organs exhibit regularity in their spatial location and inter-relation within images, which can be leveraged to simplify the computations required to aggregate feature information. We accomplish this by using local self attention blocks that pass information between each other to derive the attention map. We show that adding attention layers increases the contextual field and captures focused attention from relevant structures. We developed our approach using U-net and compared it against multiple state-of-the-art self attention methods. All models were trained on 48 internal head and neck CT scans and tested on 48 CT scans from the external public domain database of computational anatomy dataset. Our method achieved the highest Dice similarity coefficient segmentation accuracy of 0.85$\pm$0.04 and 0.86$\pm$0.04 for left and right parotid glands, 0.79$\pm$0.07 and 0.77$\pm$0.05 for left and right submandibular glands, 0.93$\pm$0.01 for mandible, and 0.88$\pm$0.02 for the brain stem, with the lowest increase of 66.7% in computing time per image and a 0.15% increase in model parameters compared with standard U-net. The best state-of-the-art method, point-wise spatial attention, achieved comparable accuracy but with a 516.7% increase in computing time and an 8.14% increase in parameters compared with standard U-net. Finally, we performed ablation tests and studied the impact of attention block size, overlap of the attention blocks, additional attention layers, and attention block placement on segmentation performance.

Submitted 11 September, 2019; originally announced September 2019.

arXiv:1909.04542 [pdf, other]

Integrating cross-modality hallucinated MRI with CT to aid mediastinal lung tumor segmentation

Authors: Jue Jiang, Jason Hu, Neelam Tyagi, Andreas Rimner, Sean L. Berry, Joseph O. Deasy, Harini Veeraraghavan

Abstract: Lung tumors, especially those located close to or surrounded by soft tissues like the mediastinum, are difficult to segment due to the low soft tissue contrast on computed tomography images. Magnetic resonance images contain superior soft-tissue contrast information that can be leveraged if both modalities are available for training. Therefore, we developed a cross-modality educed learning approach where MR information that is educed from CT is used to hallucinate MRI and improve CT segmentation. Our approach, called cross-modality educed deep learning segmentation (CMEDL), combines CT and pseudo MR produced from CT by aligning their features to obtain segmentation on CT. Features computed in the last two layers of CT and MR segmentation networks trained in parallel are aligned. We implemented this approach on U-net and dense fully convolutional networks (dense-FCN). Our networks were trained on unrelated cohorts from the open-source Cancer Imaging Archive CT images (N=377) and an internal archive of T2-weighted MR (N=81), and evaluated using separate validation (N=304) and testing (N=333) CT-delineated tumors. Our approach using both networks was significantly more accurate (U-net $P <0.001$; denseFCN $P <0.001$) than CT-only networks and achieved an accuracy (Dice similarity coefficient) of 0.71$\pm$0.15 (U-net) and 0.74$\pm$0.12 (denseFCN) on validation and 0.72$\pm$0.14 (U-net) and 0.73$\pm$0.12 (denseFCN) on the testing sets. Our novel approach demonstrated that educing cross-modality information through learned priors enhances CT segmentation performance.

Submitted 10 September, 2019; originally announced September 2019.

Comments: This paper has been accepted by MICCAI 2019

arXiv:1904.09609 [pdf, other]

doi 10.1002/sam.11416

TiK-means: $K$-means clustering for skewed groups

Authors: Nicholas S. Berry, Ranjan Maitra

Abstract: The $K$-means algorithm is extended to allow for partitioning of skewed groups. Our algorithm is called TiK-Means and contributes a $K$-means type algorithm that assigns observations to groups while estimating their skewness-transformation parameters. The resulting groups and transformation reveal general-structured clusters that can be explained by inverting the estimated transformation. Further, a modification of the jump statistic chooses the number of groups. Our algorithm is evaluated on simulated and real-life datasets and then applied to a long-standing astronomical dispute regarding the distinct kinds of gamma ray bursts.

Submitted 21 April, 2019; originally announced April 2019.

Comments: 15 pages, 6 figures, to appear in Statistical Analysis and Data Mining - The ASA Data Science Journal

Journal ref: Statistical Analysis and Data Mining -- The ASA Data Science Journal, 2019, volume 12, number 3, pages 223-233

arXiv:1502.00996 [pdf, other]

doi 10.1016/j.ascom.2015.01.009

Learning from FITS: Limitations in use in modern astronomical research

Authors: Brian Thomas, Tim Jenness, Frossie Economou, Perry Greenfield, Paul Hirst, David S. Berry, Erik Bray, Norman Gray, Demitri Muna, James Turner, Miguel de Val-Borro, Juande Santander-Vela, David Shupe, John Good, G. Bruce Berriman, Slava Kitaeff, Jonathan Fay, Omar Laurino, Anastasia Alexov, Walter Landry, Joe Masters, Adam Brazier, Reinhold Schaaf, Kevin Edwards, Russell O. Redman, et al. (13 additional authors not shown)

Abstract: The Flexible Image Transport System (FITS) standard has been a great boon to astronomy, allowing observatories, scientists and the public to exchange astronomical information easily. The FITS standard, however, is showing its age. When FITS was developed in the late 1970s, its authors made a number of implementation choices that, while common at the time, are now seen to limit its utility with modern data. The authors of the FITS standard could not anticipate the challenges which we are facing today in astronomical computing. Difficulties we now face include, but are not limited to, addressing the need to handle an expanded range of specialized data product types (data models), being more conducive to the networked exchange and storage of data, handling very large datasets, and capturing significantly more complex metadata and data relationships. There are members of the community today who find some or all of these limitations unworkable, and have decided to move ahead with storing data in other formats. If this fragmentation continues, we risk abandoning the advantages of broad interoperability, and ready archivability, that the FITS format provides for astronomy. In this paper we detail some selected important problems which exist within the FITS standard today. These problems may provide insight into deeper underlying issues which reside in the format and we provide a discussion of some lessons learned. It is not our intention here to prescribe specific remedies to these issues; rather, it is to call the attention of the FITS and greater astronomical computing communities to these problems in the hope that it will spur action to address them.

Submitted 10 February, 2015; v1 submitted 3 February, 2015; originally announced February 2015.

arXiv:cmp-lg/9604024 [pdf, ps, other]

Connectivity in Bag Generation

Authors: Arturo Trujillo, Simon Berry

Abstract: This paper presents a pruning technique which can be used to reduce the number of paths searched in rule-based bag generators of the type proposed by \cite{poznanskietal95} and \cite{popowich95}. Pruning the search space in these generators is important given the computational cost of bag generation. The technique relies on a connectivity constraint between the semantic indices associated with each lexical sign in a bag. Testing the algorithm on a range of sentences shows reductions in the generation time and the number of edges constructed.

Submitted 30 April, 1996; originally announced April 1996.

Comments: Latex, 6 pages, needs colap.sty. To appear in COLING-96

Showing 1–8 of 8 results for author: Berry, S