
doi 10.1145/3687041

Investigating Characteristics of Media Recommendation Solicitation in r/ifyoulikeblank

Authors: Md Momen Bhuiyan, Donghan Hu, Andrew Jelson, Tanushree Mitra, Sang Won Lee

Abstract: Despite the existence of search-based recommender systems like Google, Netflix, and Spotify, online users sometimes turn to crowdsourced recommendations in places like the r/ifyoulikeblank subreddit. In this exploratory study, we probe why users go to r/ifyoulikeblank, how they look for recommendations, and how the subreddit's users respond to recommendation requests. To answer these questions, we collected sample posts from r/ifyoulikeblank and analyzed them using a qualitative approach. Our analysis reveals that users come to this subreddit for various reasons, such as having exhausted popular search systems, not knowing what or how to search for an item, and believing that the crowd has better knowledge than search systems. Examining users' queries and their descriptions, we found novel information that users provide when seeking recommendations on r/ifyoulikeblank. For example, they sometimes ask for artifact recommendations based on the tools used to create them. In other cases, indicating a recommendation seeker's time constraints can help better suit recommendations to their needs. Finally, recommendation responses and interactions revealed patterns in how requesters and responders refine queries and recommendations. Our work informs the design of future intelligent recommender systems.

Submitted 12 August, 2024; originally announced August 2024.

Comments: 23 pages

arXiv:2406.07485

PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction

Authors: Adnan Abbas, Sang Won Lee

Abstract: Efficient task planning is essential for productivity and mental well-being, yet individuals often struggle to create realistic plans and reflect upon their productivity. Leveraging advances in artificial intelligence (AI), conversational agents have emerged as a promising tool for enhancing productivity. Our work focuses on externalizing plans through conversation, aiming to solidify intentions and foster focused action, thereby positively impacting users' productivity and mental well-being. We share our plan for designing a conversational agent that offers insightful questions and reflective prompts to increase plan adherence by leveraging the social interactivity of natural conversations. Previous studies have shown the effectiveness of such agents, but many interventions remain static, leading to decreased user engagement over time. To address this limitation, we propose a novel rotation and context-aware prompting strategy, providing users with varied interventions daily. Our system, PITCH, utilizes large language models (LLMs) to facilitate externalization of and reflection on daily plans. Through this study, we investigate the impact of externalizing tasks with conversational agents on productivity and mental well-being, and the effectiveness of a rotation strategy in maintaining user engagement.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.16731

Pretraining with Random Noise for Fast and Robust Learning without Weight Transport

Authors: Jeonghwan Cheon, Sang Wan Lee, Se-Bum Paik

Abstract: The brain prepares for learning even before interacting with the environment, by refining and optimizing its structures through spontaneous neural activity that resembles random noise. However, the mechanism of such a process has yet to be thoroughly understood, and it is unclear whether this process can benefit machine learning algorithms. Here, we study this issue using a neural network with a feedback alignment algorithm, demonstrating that pretraining neural networks with random noise increases learning efficiency as well as generalization ability without weight transport. First, we found that random noise training modifies forward weights to match backward synaptic feedback, which is necessary for teaching errors by feedback alignment. As a result, a network with pre-aligned weights learns notably faster than a network without random noise training, even reaching a convergence speed comparable to that of a backpropagation algorithm. Sequential training with both random noise and data brings weights closer to synaptic feedback than training solely with data, enabling more precise credit assignment and faster learning. We also found that each readout probability approaches the chance level and that the effective dimensionality of weights decreases in a network pretrained with random noise. This pre-regularization allows the network to learn simple, low-rank solutions, reducing the generalization loss during subsequent training. It also enables the network to generalize robustly to a novel, out-of-distribution dataset. Lastly, we confirmed that random noise pretraining reduces meta-loss, enhancing the network's ability to adapt to various tasks. Overall, our results suggest that random noise training with feedback alignment offers a straightforward yet effective pretraining method that facilitates quick and reliable learning without weight transport.
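The feedback-alignment mechanism described above can be illustrated with a small sketch. It is not the authors' code: the two-layer tanh network, the mean-squared-error loss, the Gaussian noise inputs and targets, the layer sizes, and the learning rate are all assumptions, and helper names such as `pretrain_with_noise` are hypothetical. The key step is in `fa_step`: the hidden-layer error is computed through a fixed random matrix `B` instead of the transpose of the forward weights `W2` (no weight transport), and `alignment` measures the cosine similarity between `W2` and `B` transposed.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.5):
    """Gaussian random matrix stored as a list of row lists."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def alignment(W2, B):
    """Cosine similarity between forward weights W2 (out x hid) and B^T."""
    w = [v for row in W2 for v in row]
    bt = [B[j][i] for i in range(len(W2)) for j in range(len(W2[0]))]
    dot = sum(a * b for a, b in zip(w, bt))
    return dot / (math.sqrt(sum(a * a for a in w)) * math.sqrt(sum(b * b for b in bt)))


def fa_step(W1, W2, B, x, t, lr):
    """One feedback-alignment update of a 2-layer tanh network, in place."""
    h = [math.tanh(z) for z in matvec(W1, x)]
    e = [y - ti for y, ti in zip(matvec(W2, h), t)]  # output error (MSE gradient)
    # Send the error back through the fixed random matrix B, not W2 transposed.
    delta_h = [be * (1.0 - hi * hi) for be, hi in zip(matvec(B, e), h)]
    for i, row in enumerate(W2):
        for j in range(len(row)):
            row[j] -= lr * e[i] * h[j]
    for i, row in enumerate(W1):
        for j in range(len(row)):
            row[j] -= lr * delta_h[i] * x[j]


def pretrain_with_noise(n_in=8, n_hid=16, n_out=4, steps=2000, lr=0.01, seed=0):
    """Train on pure noise (random inputs, random targets); report alignment."""
    rng = random.Random(seed)
    W1 = rand_matrix(n_hid, n_in, rng)
    W2 = rand_matrix(n_out, n_hid, rng)
    B = rand_matrix(n_hid, n_out, rng)  # fixed random feedback, never updated
    before = alignment(W2, B)
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
        t = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        fa_step(W1, W2, B, x, t, lr)
    return before, alignment(W2, B)


before, after = pretrain_with_noise()
print(f"alignment before: {before:.3f}, after noise pretraining: {after:.3f}")
```

How much the alignment changes depends on the assumed sizes, learning rate, and number of steps; the sketch only reports the value before and after noise training rather than reproducing the paper's results.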

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.13968

TaleMate: Exploring the use of Voice Agents for Parent-Child Joint Reading Experiences

Authors: Daniel Vargas-Diaz, Jisun Kim, Sulakna Karunaratna, Maegan Reinhardt, Caroline Hornburg, Koeun Choi, Sang Won Lee

Abstract: Joint reading is a key activity for early learners, with caregiver-child interactions such as questioning and feedback playing an essential role in children's cognitive and linguistic development. However, for some parents, actively engaging children in storytelling can be challenging. To address this, we introduce TaleMate, a platform designed to enhance shared reading by leveraging conversational agents, which have been shown to support children's engagement and learning. TaleMate enables a dynamic, participatory reading experience in which parents and children can choose which characters they wish to embody. Moreover, the system navigates the challenges posed by digital reading tools, such as decreased parent-child interaction, and builds upon the benefits of traditional and digital reading techniques. TaleMate offers an innovative approach to fostering early reading habits, bridging the gap between traditional joint reading practices and the digital reading landscape.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 4 pages, 2 figures, CHI 2024 Workshop on Child-centred AI Design

arXiv:2405.13890

An empirical study to understand how students use ChatGPT for writing essays and how it affects their ownership

Authors: Andrew Jelson, Sang Won Lee

Abstract: As large language models (LLMs) become more powerful and ubiquitous, systems like ChatGPT are increasingly used by students to help them with writing tasks. To better understand how these tools are used, we investigate how students might use an LLM for essay writing, for example, by studying the queries students ask ChatGPT and the responses it gives. To that end, we plan to conduct a user study that will record participants' writing process and give them the opportunity to use ChatGPT as an AI assistant. This study's findings will help us understand how these tools are used and how practitioners, such as educators and essay readers, should approach writing education and the evaluation of essays.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 5 pages, 2 figures, submitted and accepted to ACM CHI Workshop In2Writing in 2024

arXiv:2405.13154

Generating A Crowdsourced Conversation Dataset to Combat Cybergrooming

Authors: Xinyi Zhang, Pamela J. Wisniewski, Jin-hee Cho, Lifu Huang, Sang Won Lee

Abstract: Cybergrooming is a growing threat to adolescent safety and mental health. One way to combat cybergrooming is to leverage predictive artificial intelligence (AI) to detect predatory behaviors in social media. However, these methods can face challenges such as false positives and have negative implications such as privacy concerns. Another complementary strategy involves using generative artificial intelligence to empower adolescents by educating them about predatory behaviors. To this end, we envision developing state-of-the-art conversational agents to simulate conversations between adolescents and predators for educational purposes. Yet one key challenge is the lack of a dataset to train such conversational agents. In this position paper, we present our motivation for empowering adolescents to cope with cybergrooming. We propose to develop large-scale, authentic datasets through an online survey targeting adolescents and parents. We discuss some initial background behind our motivation and the proposed design of the survey, such as situating participants in artificial cybergrooming scenarios and then asking them to respond, so as to capture authentic responses. We also present several open questions related to our proposed approach and hope to discuss them with the workshop attendees.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2404.02135

Enhancing Ship Classification in Optical Satellite Imagery: Integrating Convolutional Block Attention Module with ResNet for Improved Performance

Authors: Ryan Donghan Kwon, Gangjoo Robin Nam, Jisoo Tak, Junseob Shin, Hyerin Cha, Seung Won Lee

Abstract: In this study, we present an advanced convolutional neural network (CNN) architecture for ship classification based on optical satellite imagery, which significantly enhances performance through the integration of a convolutional block attention module (CBAM) and additional architectural innovations. Building upon the foundational ResNet50 model, we first incorporated a standard CBAM to direct the model's focus toward more informative features, achieving an accuracy of 87% compared to 85% for the baseline ResNet50. Further augmentations involved multiscale feature integration, depthwise separable convolutions, and dilated convolutions, culminating in an enhanced ResNet model with an improved CBAM. This model reached an accuracy of 95%, with precision, recall, and F1 scores all improving substantially across various ship classes. In particular, the bulk carrier and oil tanker classes exhibited nearly perfect precision and recall, underscoring the model's enhanced capability to accurately identify and classify ships. Attention heatmap analyses further validated the efficacy of the improved model, revealing more focused attention on relevant ship features regardless of background complexity. These findings underscore the potential of integrating attention mechanisms and architectural innovations into CNNs for high-resolution satellite imagery classification. The study also addresses class imbalance and computational costs and proposes future directions for scalability and adaptability in recognizing new or rare ship types. It lays the groundwork for applying advanced deep learning techniques in remote sensing, offering insights into scalable and efficient satellite image classification.
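For readers unfamiliar with CBAM, its channel-attention half can be sketched as below. This follows the generic CBAM formulation (average- and max-pooled channel descriptors passed through one shared two-layer MLP, summed, and squashed by a sigmoid), not the paper's "improved CBAM"; the spatial-attention half is omitted, and the toy weights `W0`, `W1` and the reduction ratio are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def shared_mlp(v, W0, W1):
    """Two-layer MLP (C -> C/r -> C) with ReLU, shared by both pooled vectors."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in W0]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W1]


def channel_attention(fmap, W0, W1):
    """CBAM-style channel attention on a C x H x W map given as nested lists.

    Returns the reweighted map and the per-channel attention weights.
    """
    avg = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(max(r) for r in ch) for ch in fmap]
    weights = [sigmoid(a + m) for a, m in zip(shared_mlp(avg, W0, W1),
                                               shared_mlp(mx, W0, W1))]
    # Rescale every activation of channel c by that channel's weight.
    out = [[[weights[c] * v for v in row] for row in ch]
           for c, ch in enumerate(fmap)]
    return out, weights


# Toy example: C = 2 channels of size 2 x 2, reduction ratio r = 2 (hidden size 1).
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 1.0]]]
W0 = [[0.5, -0.5]]    # shape (C/r) x C
W1 = [[1.0], [-1.0]]  # shape C x (C/r)
out, weights = channel_attention(fmap, W0, W1)
print(weights)  # channel 0 receives the larger weight with these toy weights
```

In a real network the pooled descriptors and MLP weights would be tensors learned by backpropagation; nested lists are used here only to keep the sketch dependency-free.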

Submitted 20 August, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: Submitted to IEEE Access on August 16, 2024

arXiv:2403.15249

Spectral Motion Alignment for Video Motion Transfer using Diffusion Models

Authors: Geon Yeong Park, Hyeonho Jeong, Sang Wan Lee, Jong Chul Ye

Abstract: The evolution of diffusion models has greatly impacted video generation and understanding. In particular, text-to-video diffusion models (VDMs) have significantly facilitated the customization of input videos with target appearance, motion, etc. Despite these advances, challenges persist in accurately distilling motion information from video frames. While existing works use the consecutive-frame residual as the target motion vector, such residuals inherently lack global motion context and are vulnerable to frame-wise distortions. To address this, we present Spectral Motion Alignment (SMA), a novel framework that refines and aligns motion vectors using Fourier and wavelet transforms. SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics and mitigating spatial artifacts. Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
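To make the idea of comparing motion in the frequency domain concrete, the sketch below treats consecutive-frame residuals as motion vectors and compares their lowest temporal Fourier coefficients. It is a hypothetical illustration, not SMA's actual objective: the abstract does not give SMA's loss, the wavelet (spatial) component is omitted, and the names `low_freq_motion_loss` and `keep` are assumptions.

```python
import cmath


def frame_residuals(frames):
    """Motion vectors as consecutive-frame residuals: v_t = x_{t+1} - x_t."""
    return [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(frames, frames[1:])]


def dft(signal):
    """Naive discrete Fourier transform, O(n^2); adequate for a toy example."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def low_freq_motion_loss(ref_frames, gen_frames, keep=2):
    """Squared distance between the `keep` lowest temporal frequencies of the
    reference and generated motion vectors, averaged over pixels."""
    ref_v, gen_v = frame_residuals(ref_frames), frame_residuals(gen_frames)
    n_pix = len(ref_v[0])
    total = 0.0
    for p in range(n_pix):
        ref_spec = dft([v[p] for v in ref_v])
        gen_spec = dft([v[p] for v in gen_v])
        total += sum(abs(r - g) ** 2 for r, g in zip(ref_spec[:keep], gen_spec[:keep]))
    return total / n_pix


# Each frame is a flattened list of pixel intensities (2 pixels, 4 frames).
ref = [[0, 5], [1, 5], [2, 5], [3, 5]]        # pixel 0 moves steadily
shifted = [[x + 10 for x in f] for f in ref]  # same motion, different appearance
static = [[0, 5]] * 4                         # no motion at all
print(low_freq_motion_loss(ref, shifted))     # 0.0: residuals ignore static offsets
print(low_freq_motion_loss(ref, static))      # positive: the motion differs
```

The example shows why residuals are attractive as motion targets (a constant appearance change cancels out) and how a low-frequency comparison emphasizes global, whole-clip motion over frame-wise jitter.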

Submitted 22 March, 2024; originally announced March 2024.

Comments: Project page: https://geonyeong-park.github.io/spectral-motion-alignment/

arXiv:2312.02557

BOgen: Generating Part-Level 3D Designs Based on User Intention Inference through Bayesian Optimization and Variational Autoencoder

Authors: Seung Won Lee, Jiin Choi, Kyung Hoon Hyun

Abstract: Advancements in generative artificial intelligence (AI) have introduced various AI models capable of producing impressive visual design outputs. However, when AI models are used in the design process, prioritizing outputs that align with designers' needs over mere visual craftsmanship becomes even more crucial. Furthermore, designers often intricately combine parts of various designs to create novel designs. The ability to generate designs that align with designers' intentions at the part level is therefore pivotal for assisting designers. Hence, we introduce BOgen, which empowers designers to proactively generate and explore part-level designs through Bayesian optimization and variational autoencoders, thereby enhancing their overall user experience. We assessed BOgen's performance in a study involving 30 designers. The results revealed that, compared to the baseline, BOgen fulfilled designers' requirements for part recommendations and design exploration space guidance. BOgen assists designers in navigation and development, offering valuable design suggestions and fostering proactive design exploration and creation.

Submitted 5 December, 2023; originally announced December 2023.

Comments: 17 pages, 13 figures

ACM Class: H.5.2; I.2.1

arXiv:2311.10430

Deep Residual CNN for Multi-Class Chest Infection Diagnosis

Authors: Ryan Donghan Kwon, Dohyun Lim, Yoonha Lee, Seung Won Lee

Abstract: The advent of deep learning has significantly propelled the capabilities of automated medical image diagnosis, providing valuable tools and resources in the realm of healthcare and medical diagnostics. This research delves into the development and evaluation of a Deep Residual Convolutional Neural Network (CNN) for the multi-class diagnosis of chest infections, utilizing chest X-ray images. The implemented model, trained and validated on a dataset amalgamated from diverse sources, demonstrated a robust overall accuracy of 93%. However, nuanced disparities in performance across different classes, particularly Fibrosis, underscored the complexity and challenges inherent in automated medical image diagnosis. The insights derived pave the way for future research, focusing on enhancing the model's proficiency in classifying conditions that present more subtle and nuanced visual features in the images, as well as optimizing and refining the model architecture and training process. This paper provides a comprehensive exploration into the development, implementation, and evaluation of the model, offering insights and directions for future research and development in the field.

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2306.09869

Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models

Authors: Geon Yeong Park, Jeongsol Kim, Beomsu Kim, Sang Wan Lee, Jong Chul Ye

Abstract: Despite the remarkable performance of text-to-image diffusion models in image generation tasks, recent studies have raised the issue that generated images sometimes fail to capture the intended semantic content of the text prompts, a phenomenon often called semantic misalignment. To address this, we present a novel energy-based model (EBM) framework for adaptive context control by modeling the posterior of context vectors. Specifically, we first formulate EBMs of latent image representations and text embeddings in each cross-attention layer of the denoising autoencoder. Then, we obtain the gradient of the log posterior of context vectors, which can be updated and transferred to the subsequent cross-attention layer, thereby implicitly minimizing a nested hierarchy of energy functions. Our latent EBMs further allow zero-shot compositional generation as a linear combination of cross-attention outputs from different contexts. Through extensive experiments, we demonstrate that the proposed method is highly effective in handling various image generation tasks, including multi-concept generation, text-guided image inpainting, and real and synthetic image editing. Code: https://github.com/EnergyAttention/Energy-Based-CrossAttention.
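The compositional step mentioned in the abstract, a linear combination of cross-attention outputs computed against different contexts, can be sketched with plain scaled dot-product attention. This is only an illustration of that combination: the energy-based update of the context vectors, which is the paper's core contribution, is not shown, and the combination weights `gammas` and the toy vectors are hypothetical. For the actual method, see the linked repository.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(query, keys, values):
    """Scaled dot-product attention of one image query over a context's tokens."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def composed_attention(query, contexts, gammas):
    """Linear combination of cross-attention outputs from several contexts.

    `contexts` is a list of (keys, values) pairs, one per text prompt;
    `gammas` are hypothetical per-context mixing weights.
    """
    outs = [cross_attention(query, k, v) for k, v in contexts]
    return [sum(g * o[i] for g, o in zip(gammas, outs)) for i in range(len(outs[0]))]


query = [1.0, 0.0]
contexts = [([[1.0, 0.0]], [[2.0, 0.0]]),  # context A: one token
            ([[0.0, 1.0]], [[0.0, 4.0]])]  # context B: one token
print(composed_attention(query, contexts, [0.5, 0.5]))
```

With one token per context, each attention output is simply that token's value, so the composition reduces to the weighted average of the two values.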

Submitted 4 November, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023

arXiv:2302.04219

doi 10.1145/3544548.3581244

NewsComp: Facilitating Diverse News Reading through Comparative Annotation

Authors: Md Momen Bhuiyan, Sang Won Lee, Nitesh Goyal, Tanushree Mitra

Abstract: To support efficient, balanced news consumption, merging articles from diverse sources into one, potentially through crowdsourcing, could alleviate some hurdles. However, the merging process could also impact annotators' attitudes towards the content. To test this theory, we propose comparative news annotation, i.e., annotating similarities and differences between a pair of articles. By developing and deploying NewsComp, a prototype system, we conducted a between-subjects experiment (N=109) to examine how users' annotations compare to experts', and how comparative annotation affects users' perceptions of article credibility and quality. We found that comparative annotation can marginally impact users' credibility perceptions in certain cases. While users' annotations were not on par with experts', they showed greater precision in finding similarities than in identifying disparate important statements. The comparison process led users to notice differences in information placement/depth, degree of factuality/opinion, and empathetic/inflammatory language use. We discuss implications for the design of future comparative annotation tasks.

Submitted 8 February, 2023; originally announced February 2023.

Comments: 2023 ACM CHI Conference on Human Factors in Computing Systems, 17 pages

arXiv:2210.09012

SAICL: Student Modelling with Interaction-level Auxiliary Contrastive Tasks for Knowledge Tracing and Dropout Prediction

Authors: Jungbae Park, Jinyoung Kim, Soonwoo Kwon, Sang Wan Lee

Abstract: Knowledge tracing and dropout prediction are crucial for online education, to estimate students' knowledge states and to prevent dropout. While traditional systems interacting with students suffered from data sparsity and overfitting, recent sample-level contrastive learning helps alleviate this issue. One major limitation of sample-level approaches is that they regard students' behavior interaction sequences as a bundle, so they often fail to encode temporal contexts and track their dynamic changes, making it hard to find optimal representations for knowledge tracing and dropout prediction. To apply temporal context within the sequence, this study introduces a novel student modeling framework, SAICL (Student modeling with Auxiliary Interaction-level Contrastive Learning). In detail, SAICL can utilize both of the proposed self-supervised and supervised interaction-level contrastive objectives: MilCPC (Multi-Interaction-Level Contrastive Predictive Coding) and SupCPC (Supervised Contrastive Predictive Coding). While previous sample-level contrastive methods for student modeling are highly dependent on data augmentation, SAICL is free of data augmentation while showing better performance in both self-supervised and supervised settings. By combining cross-entropy with contrastive objectives, SAICL achieved knowledge tracing and dropout prediction performance comparable to other state-of-the-art models without compromising inference costs.
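The combination of cross-entropy with a contrastive predictive coding objective can be sketched with the generic InfoNCE loss used in CPC: a prediction made from the interaction history should score its true future interaction embedding higher than embeddings of other interactions. The exact MilCPC and SupCPC formulations are not given in the abstract, so this is an assumption-laden sketch; the weighting `lam`, the binary label (for example, correct answer or dropout), and all vectors are hypothetical.

```python
import math


def info_nce(prediction, positive, negatives):
    """InfoNCE: negative log-softmax score of the positive among all candidates."""
    scores = [sum(p * z for p, z in zip(prediction, cand)) for cand in [positive] + negatives]
    m = max(scores)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - scores[0]


def binary_cross_entropy(prob, label, eps=1e-12):
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps))


def combined_loss(prob, label, prediction, positive, negatives, lam=0.1):
    """Task cross-entropy plus a weighted auxiliary contrastive term."""
    return binary_cross_entropy(prob, label) + lam * info_nce(prediction, positive, negatives)


pred = [1.0, 0.0]  # prediction computed from the interaction history
good = info_nce(pred, [5.0, 0.0], [[0.0, 5.0]])  # true next interaction matches
bad = info_nce(pred, [0.0, 5.0], [[5.0, 0.0]])   # a negative matches better
total = combined_loss(0.9, 1, pred, [5.0, 0.0], [[0.0, 5.0]])
print(good, bad, total)
```

Because the contrastive term only adds a training-time loss, it leaves the inference path unchanged, which is consistent with the abstract's claim of no added inference cost.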

Submitted 19 October, 2022; v1 submitted 7 October, 2022; originally announced October 2022.

Comments: preprint, under review

arXiv:2210.05248

Self-supervised debiasing using low rank regularization

Authors: Geon Yeong Park, Chanyong Jung, Sangmin Lee, Jong Chul Ye, Sang Wan Lee

Abstract: Spurious correlations can cause strong biases in deep neural networks, impairing generalization ability. While most existing debiasing methods require full supervision on either spurious attributes or target labels, training a debiased model from a limited amount of both annotations remains an open question. To address this issue, we investigate an interesting phenomenon using spectral analysis of latent representations: spuriously correlated attributes make neural networks inductively biased towards encoding lower effective rank representations. We also show that rank regularization can amplify this bias in a way that encourages highly correlated features. Leveraging these findings, we propose a self-supervised debiasing framework potentially compatible with unlabeled samples. Specifically, we first pretrain a biased encoder in a self-supervised manner with the rank regularization, which serves as a semantic bottleneck that forces the encoder to learn the spuriously correlated attributes. This biased encoder is then used to discover and upweight bias-conflicting samples in a downstream task, serving as a form of boosting to effectively debias the main model. Remarkably, the proposed debiasing framework significantly improves the generalization performance of self-supervised learning baselines and, in some cases, even outperforms state-of-the-art supervised debiasing approaches.
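"Effective rank" has several definitions; one widely used one is the exponential of the Shannon entropy of the normalized singular values. Whether the paper uses exactly this definition is an assumption here. The sketch takes singular values as input (in practice they could come from, for example, `numpy.linalg.svd(Z, compute_uv=False)` applied to a batch of latent representations `Z`).

```python
import math


def effective_rank(singular_values, eps=1e-12):
    """exp(entropy) of the singular-value distribution p_i = s_i / sum(s)."""
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > eps]  # skip zero modes
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


print(effective_rank([1.0, 1.0, 1.0, 1.0]))  # about 4: energy spread evenly
print(effective_rank([5.0, 0.0, 0.0]))       # 1.0: a single dominant direction
print(effective_rank([3.0, 1.0]))            # between 1 and 2
```

A representation whose variance concentrates in a few directions, as the abstract reports for spuriously correlated attributes, yields a value close to 1 even when the embedding dimension is large.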

Submitted 8 October, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

arXiv:2210.05247

Training Debiased Subnetworks with Contrastive Weight Pruning

Authors: Geon Yeong Park, Sangmin Lee, Sang Wan Lee, Jong Chul Ye

Abstract: Neural networks are often biased toward spuriously correlated features that provide misleading statistical evidence that does not generalize. This raises an interesting question: "Does an optimal unbiased functional subnetwork exist in a severely biased network? If so, how can such a subnetwork be extracted?" While empirical evidence has accumulated about the existence of such unbiased subnetworks, these observations are mainly based on the guidance of ground-truth unbiased samples. Thus, it remains unexplored how to discover the optimal subnetworks with biased training datasets in practice. To address this, we first present a theoretical insight that highlights potential limitations of existing algorithms in exploring unbiased subnetworks in the presence of strong spurious correlations. We then further elucidate the importance of bias-conflicting samples for structure learning. Motivated by these observations, we propose a Debiased Contrastive Weight Pruning (DCWP) algorithm, which probes unbiased subnetworks without expensive group annotations. Experimental results demonstrate that our approach significantly outperforms state-of-the-art debiasing methods despite its considerable reduction in the number of parameters.

Submitted 26 June, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: CVPR 2023, code: https://github.com/ParkGeonYeong/DCWP

arXiv:2209.12318

doi 10.1145/3526113.3545678

Scrapbook: Screenshot-Based Bookmarks for Effective Digital Resource Curation across Applications

Authors: Donghan Hu, Sang Won Lee

Abstract: Modern knowledge workers typically need to use multiple resources, such as documents, web pages, and applications, at the same time. This complexity in their computing environments forces workers to restore various resources in the course of their work. However, conventional curation methods like bookmarks, recent document histories, and file systems limit effective retrieval. Such features typically work only for resources of one type within one application, ignoring the interdependency between resources needed for a single task. In addition, text-based handles do not provide rich cues for users to recognize their associated resources. Hence, the need to locate and reopen relevant resources can significantly hinder knowledge workers' productivity. To address these issues, we designed and developed Scrapbook, a novel application for digital resource curation across applications that uses screenshot-based bookmarks. Scrapbook extracts and stores all the metadata (URL, file location, and application name) of windows visible in a captured screenshot to facilitate restoring them later. A week-long field study indicated that screenshot-based bookmarks helped participants curate digital resources. Additionally, participants reported that multimodal (visual and textual) data helped them recall past computer activities and reconstruct working contexts efficiently.
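The bookmark described above pairs one screenshot with the metadata of every visible window. A minimal data-structure sketch of that idea follows; the class and field names are hypothetical (the abstract names only URL, file location, and application name), and actually reopening resources is OS-specific and omitted.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class WindowMetadata:
    app_name: str
    url: Optional[str] = None        # set for browser windows
    file_path: Optional[str] = None  # set for document windows


@dataclass
class ScreenshotBookmark:
    image_path: str
    captured_at: datetime
    windows: List[WindowMetadata] = field(default_factory=list)

    def restore_targets(self):
        """What to reopen per window: the URL if known, else the file, else the app."""
        return [w.url or w.file_path or w.app_name for w in self.windows]


bm = ScreenshotBookmark("shot.png", datetime(2022, 9, 25, 10, 0), [
    WindowMetadata("Chrome", url="https://example.org"),
    WindowMetadata("Word", file_path="/docs/draft.docx"),
    WindowMetadata("Calculator"),
])
print(bm.restore_targets())
```

Keeping all windows of one capture in a single record reflects the abstract's point that resources needed for one task are interdependent, rather than bookmarking each resource separately.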

Submitted 25 September, 2022; originally announced September 2022.
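The Scrapbook abstract says each screenshot-based bookmark keeps the URL, file location, and application name of every window visible in the capture. A minimal Python sketch of such a record, assuming a simple in-memory layout; the class names, fields, and the `restore_targets` helper are hypothetical and not taken from Scrapbook itself:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class WindowMetadata:
    """Metadata for one window visible when the screenshot was taken."""
    app_name: str
    url: Optional[str] = None        # set for browser windows
    file_path: Optional[str] = None  # set for documents opened from disk


@dataclass
class ScreenshotBookmark:
    """Hypothetical record tying one screenshot to the windows it shows."""
    screenshot_path: str
    captured_at: datetime
    windows: List[WindowMetadata] = field(default_factory=list)

    def restore_targets(self) -> List[str]:
        """Return the URLs or file paths to reopen, in window order.

        Windows with neither a URL nor a file path are skipped in this sketch.
        """
        targets = []
        for w in self.windows:
            if w.url:
                targets.append(w.url)
            elif w.file_path:
                targets.append(w.file_path)
        return targets


bookmark = ScreenshotBookmark(
    screenshot_path="captures/task-42.png",
    captured_at=datetime(2022, 9, 25, 14, 30),
    windows=[
        WindowMetadata("Chrome", url="https://arxiv.org/abs/2209.12318"),
        WindowMetadata("Word", file_path="/home/user/draft.docx"),
        WindowMetadata("Terminal"),
    ],
)
print(bookmark.restore_targets())
# ['https://arxiv.org/abs/2209.12318', '/home/user/draft.docx']
```

How Scrapbook itself treats windows that expose no URL or path (the terminal here) is not stated in the abstract.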

arXiv:2208.05568

doi 10.1098/rspb.2023.1716

The emergence of division of labor through decentralized social sanctioning

Authors: Anil Yaman, Joel Z. Leibo, Giovanni Iacca, Sang Wan Lee

Abstract: Human ecological success relies on our characteristic ability to flexibly self-organize into cooperative social groups, the most successful of which employ substantial specialization and division of labor. Unlike most other animals, humans learn by trial and error during their lives what role to take on. However, when some critical roles are more attractive than others, and individuals are self-interested, then there is a social dilemma: each individual would prefer that others take on the critical but unremunerative roles so they may remain free to take one that pays better. But disaster occurs if all act thusly and a critical role goes unfilled. In such situations, learning an optimum role distribution may not be possible. Consequently, a fundamental question is: how can division of labor emerge in groups of self-interested lifetime-learning individuals? Here we show that by introducing a model of social norms, which we regard as emergent patterns of decentralized social sanctioning, it becomes possible for groups of self-interested individuals to learn a productive division of labor involving all critical roles. Such social norms work by redistributing rewards within the population to disincentivize antisocial roles while incentivizing prosocial roles that do not intrinsically pay as well as others.

Submitted 30 September, 2023; v1 submitted 10 August, 2022; originally announced August 2022.

arXiv:2205.09185

doi 10.1016/j.nima.2022.167748

AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider

Authors: C. Fanelli, Z. Papandreou, K. Suresh, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann, et al. (258 additional authors not shown)

Abstract: The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to leverage Artificial Intelligence (AI) from as early as the design and R&D phases. The EIC Comprehensive Chromodynamics Experiment (ECCE) is a consortium that proposed a detector design based on a 1.5 T solenoid. The EIC detector proposal review concluded that the ECCE design will serve as the reference design for an EIC detector. Herein we describe a comprehensive optimization of the ECCE tracker using AI. The work required a complex parametrization of the simulated detector system. Our approach dealt with an optimization problem in a multidimensional design space driven by multiple objectives that encode the detector performance, while satisfying several mechanical constraints. We describe our strategy and show results obtained for the ECCE tracking system. The AI-assisted design is agnostic to the simulation framework and can be extended to other sub-detectors or to a system of sub-detectors to further optimize the performance of the EIC detector.

Submitted 19 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: 16 pages, 18 figures, 2 appendices, 3 tables
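The ECCE abstract frames tracker design as optimization over several performance objectives subject to mechanical constraints. The basic notion behind multi-objective optimization is Pareto dominance, and the sketch below filters a handful of candidate designs down to the non-dominated set. The design names, objective values, and feasibility predicate are invented for illustration, all objectives are assumed to be minimized, and this is not the AI-assisted optimizer used in the paper:

```python
from typing import Callable, Dict, List, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(designs: Dict[str, Objectives],
                 feasible: Callable[[str], bool] = lambda name: True) -> List[str]:
    """Names of feasible designs that no other feasible design dominates."""
    # Constraint handling in this sketch: drop infeasible designs up front.
    candidates = {n: o for n, o in designs.items() if feasible(n)}
    # A design never dominates itself, so no self-exclusion is needed.
    return [n for n, o in candidates.items()
            if not any(dominates(other, o) for other in candidates.values())]


# Hypothetical designs scored on two objectives to minimize.
designs = {
    "A": (1.0, 3.0),
    "B": (2.0, 2.0),
    "C": (2.5, 2.5),   # dominated by B
    "D": (3.0, 1.0),
    "E": (0.5, 0.5),   # best on paper, but violates a hypothetical constraint
}
print(pareto_front(designs, feasible=lambda name: name != "E"))  # ['A', 'B', 'D']
```

Without the constraint, design E dominates everything and would be the only survivor, which is why feasibility is applied before the dominance check.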

arXiv:2201.11709

doi 10.1145/3491102.3502028

OtherTube: Facilitating Content Discovery and Reflection by Exchanging YouTube Recommendations with Strangers

Authors: Md Momen Bhuiyan, Carlos Augusto Bautista Isaza, Tanushree Mitra, Sang Won Lee

Abstract: To promote engagement, recommendation algorithms on platforms like YouTube increasingly personalize users' feeds, limiting users' exposure to diverse content and depriving them of opportunities to reflect on their interests compared to others'. In this work, we investigate how exchanging recommendations with strangers can help users discover new content and reflect on their interests. We tested this idea by developing OtherTube, a browser extension for YouTube that displays strangers' personalized YouTube recommendations. OtherTube allows users to (i) create an anonymized profile for social comparison, (ii) share their recommended videos with others, and (iii) browse strangers' YouTube recommendations. We conducted a 10-day user study (n=41) followed by post-study interviews (n=11). Our results reveal that users discovered and developed new interests from seeing OtherTube recommendations. We identified user and content characteristics that affect interaction and engagement with exchanged recommendations; for example, younger users interacted more with OtherTube, while the perceived irrelevance of some content discouraged users from watching certain videos. Users reflected on their interests as well as others', recognizing similarities and differences. Our work shows promise for designs leveraging the exchange of personalized recommendations with strangers.

Submitted 27 January, 2022; originally announced January 2022.

Comments: CHI 2022, 17 pages

arXiv:2112.05403

Computing Diverse Shortest Paths Efficiently: A Theoretical and Experimental Study

Authors: Tesshu Hanaka, Yasuaki Kobayashi, Kazuhiro Kurita, See Woo Lee, Yota Otachi

Abstract: Finding diverse solutions in combinatorial problems has recently received considerable attention (Baste et al. 2020; Fomin et al. 2020; Hanaka et al. 2021). In this paper we study the following type of problem: given an integer $k$, find $k$ solutions such that the sum of pairwise (weighted) Hamming distances between these solutions is maximized. Such solutions are called diverse solutions. We present a polynomial-time algorithm for finding diverse shortest $st$-paths in weighted directed graphs. Moreover, we study the diverse version of other classical combinatorial problems such as diverse weighted matroid bases, diverse weighted arborescences, and diverse bipartite matchings. We show that these problems can be solved in polynomial time as well. To evaluate the practical performance of our algorithm for finding diverse shortest $st$-paths, we conduct a computational experiment with synthetic and real-world instances. The experiment shows that our algorithm successfully computes diverse solutions within reasonable computational time.

Submitted 15 December, 2021; v1 submitted 10 December, 2021; originally announced December 2021.
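To make the diversity objective concrete: if each $st$-path is treated as its set of edges, the Hamming distance between two paths is the size of the symmetric difference of their edge sets. The Python sketch below scores candidate sets this way (unweighted) and picks the best $k$ shortest paths by brute force on a made-up toy graph. It is exponential and only illustrates the objective; it is not the polynomial-time algorithm from the paper:

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Tuple

Graph = Dict[str, Dict[str, int]]  # adjacency: u -> {v: weight}


def simple_paths(graph: Graph, s: str, t: str, path: List[str]) -> Iterator[List[str]]:
    """Depth-first enumeration of all simple s-t paths (exponential; tiny graphs only)."""
    if s == t:
        yield list(path)
        return
    for v in graph.get(s, {}):
        if v not in path:
            path.append(v)
            yield from simple_paths(graph, v, t, path)
            path.pop()


def edge_set(path: List[str]) -> FrozenSet[Tuple[str, str]]:
    return frozenset(zip(path, path[1:]))


def diversity(paths: List[List[str]]) -> int:
    """Sum of pairwise unweighted Hamming distances between edge-indicator vectors,
    i.e. the sizes of pairwise symmetric differences of the edge sets."""
    return sum(len(edge_set(p) ^ edge_set(q)) for p, q in combinations(paths, 2))


def brute_force_diverse_shortest_paths(graph: Graph, s: str, t: str, k: int):
    """Try every k-subset of shortest s-t paths; requires at least k shortest paths."""
    def length(p: List[str]) -> int:
        return sum(graph[u][v] for u, v in zip(p, p[1:]))

    paths = list(simple_paths(graph, s, t, [s]))
    best = min(length(p) for p in paths)
    shortest = [p for p in paths if length(p) == best]
    chosen = max(combinations(shortest, k), key=lambda combo: diversity(list(combo)))
    return diversity(list(chosen)), [list(p) for p in chosen]


graph = {"s": {"a": 1, "b": 1}, "a": {"t": 1, "b": 0}, "b": {"t": 1}}
print(brute_force_diverse_shortest_paths(graph, "s", "t", k=2))
# (4, [['s', 'a', 't'], ['s', 'b', 't']])
```

All three $s$-$t$ paths in this graph have length 2; the pair that shares no edge has the largest symmetric difference and therefore wins.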

arXiv:2108.02325

doi 10.1145/3479539

Designing Transparency Cues in Online News Platforms to Promote Trust: Journalists' & Consumers' Perspectives

Authors: Md Momen Bhuiyan, Hayden Whitley, Michael Horning, Sang Won Lee, Tanushree Mitra

Abstract: As news organizations embrace transparency practices on their websites to distinguish themselves from those spreading misinformation, HCI designers have the opportunity to help them effectively utilize the ideals of transparency to build trust. How can we utilize transparency to promote trust in news? We examine this question through a qualitative lens by interviewing journalists and news consumers, the two stakeholders in a news system. We designed a scenario to demonstrate transparency features using two fundamental news attributes that convey the trustworthiness of a news article: source and message. In the interviews, our news consumers suggested that news transparency could be best shown by providing indicators of objectivity in two areas (news selection and framing) and indicators of evidence in four areas (presence of source materials, anonymous sourcing, verification, and corrections upon erroneous reporting). While our journalists agreed with news consumers' suggestions of using evidence indicators, they also suggested additional transparency indicators in areas such as the news reporting process and personal/organizational conflicts of interest. Prompted by our scenario, participants offered new design considerations for building trustworthy news platforms, such as designing for easy comprehension, presenting appropriate details in news articles (e.g., showing the number and nature of corrections made to an article), and comparing attributes across news organizations to highlight diverging practices.
Comparing the responses from our two stakeholder groups reveals conflicting suggestions with trade-offs between them. Our study has implications for HCI designers in building trustworthy news systems.

Submitted 20 September, 2021; v1 submitted 4 August, 2021; originally announced August 2021.

Comments: 31 pages, CSCW 2021

arXiv:2108.01536

doi 10.1145/3479571

NudgeCred: Supporting News Credibility Assessment on Social Media Through Nudges

Authors: Md Momen Bhuiyan, Michael Horning, Sang Won Lee, Tanushree Mitra

Abstract: Struggling to curb misinformation, social media platforms are experimenting with design interventions to enhance consumption of credible news on their platforms. Some of these interventions, such as the use of warning messages, are examples of nudges: choice-preserving techniques to steer behavior. Despite their application, we do not know whether nudges could steer people into making conscious news credibility judgments online and, if they do, under what constraints. To answer this, we combine nudge techniques with heuristic-based information processing to design NudgeCred, a browser extension for Twitter. NudgeCred directs users' attention to two design cues, the authority of a source and other users' collective opinion on a report, by activating three design nudges (Reliable, Questionable, and Unreliable), each denoting a particular level of credibility for news tweets. In a controlled experiment, we found that NudgeCred significantly helped users (n=430) distinguish news tweets' credibility, unrestricted by three behavioral confounds: political ideology, political cynicism, and media skepticism. A five-day field deployment with twelve participants revealed that NudgeCred improved their recognition of news items and attention towards all of our nudges, particularly towards Questionable. Among other considerations, participants proposed that designers should incorporate heuristics that users would trust. Our work informs nudge-based system design approaches for online media.

Submitted 20 September, 2021; v1 submitted 3 August, 2021; originally announced August 2021.

Comments: 30 pages, CSCW 2021

arXiv:2106.10015

doi 10.1371/journal.pcbi.1009882

Meta-control of social learning strategies

Authors: Anil Yaman, Nicolas Bredeche, Onur Çaylak, Joel Z. Leibo, Sang Wan Lee

Abstract: Social learning, copying others' behavior without actual experience, offers a cost-effective means of knowledge acquisition. However, it raises the fundamental question of which individuals have reliable information: successful individuals versus the majority. The former and the latter are known respectively as success-based and conformist social learning strategies. We show here that while the success-based strategy fully exploits the benign environment of low uncertainty, it fails in uncertain environments. On the other hand, the conformist strategy can effectively mitigate this adverse effect. Based on these findings, we hypothesized that meta-control of individual and social learning strategies provides effective and sample-efficient learning in volatile and uncertain environments. Simulations on a set of environments with various levels of volatility and uncertainty confirmed our hypothesis. The results imply that meta-control of social learning affords agents the leverage to resolve environmental uncertainty with minimal exploration cost, by exploiting others' learning as an external knowledge base.

Submitted 7 March, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

Journal ref: PLoS Comput Biol 18(2): e1009882 (2022)
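The two social learning strategies contrasted in the abstract differ only in whom they copy. A minimal Python sketch, assuming each demonstrator is summarized by one observed action and one payoff; `meta_control` and its fixed uncertainty threshold are hypothetical stand-ins, since the paper's meta-control also arbitrates with individual learning and its mechanism is not specified in the abstract:

```python
from collections import Counter
from typing import List


def success_based(actions: List[str], payoffs: List[float]) -> str:
    """Copy the action of the demonstrator with the highest observed payoff."""
    best = max(range(len(actions)), key=lambda i: payoffs[i])
    return actions[best]


def conformist(actions: List[str]) -> str:
    """Copy the action chosen by the largest number of demonstrators."""
    return Counter(actions).most_common(1)[0][0]


def meta_control(actions: List[str], payoffs: List[float],
                 uncertainty: float, threshold: float = 0.5) -> str:
    """Hypothetical switch: follow the most successful demonstrator when the
    environment looks stable, and the majority when it looks uncertain."""
    if uncertainty < threshold:
        return success_based(actions, payoffs)
    return conformist(actions)


actions = ["left", "right", "right", "left", "right"]
payoffs = [0.2, 0.5, 0.4, 0.9, 0.3]
print(success_based(actions, payoffs))  # left  (demonstrator 3 earned 0.9)
print(conformist(actions))              # right (3 of 5 demonstrators)
```

A single lucky high payoff can mislead the success-based rule when outcomes are noisy, whereas the majority vote averages over many demonstrators; this is the intuition behind switching on uncertainty.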

arXiv:2104.01575

Reliably fast adversarial training via latent adversarial perturbation

Authors: Geon Yeong Park, Sang Wan Lee

Abstract: While multi-step adversarial training is widely popular as an effective defense against strong adversarial attacks, it is notoriously expensive computationally compared to standard training. Several single-step adversarial training methods have been proposed to reduce this overhead; however, their performance is not sufficiently reliable, as it depends on the optimization setting. To overcome such limitations, we deviate from the existing input-space-based adversarial training regime and propose a single-step latent adversarial training method (SLAT), which leverages the gradients of latent representations as the latent adversarial perturbation. We demonstrate that the L1 norm of feature gradients is implicitly regularized through the adopted latent perturbation, thereby recovering local linearity and ensuring reliable performance compared to existing single-step adversarial training methods. Because the latent perturbation is based on the gradients of the latent representations, which can be obtained for free in the process of computing input gradients, the proposed method costs roughly the same time as the fast gradient sign method. Experimental results demonstrate that the proposed method, despite its structural simplicity, outperforms state-of-the-art accelerated adversarial training methods.

Submitted 29 November, 2021; v1 submitted 4 April, 2021; originally announced April 2021.

Comments: ICCV 2021 (Oral)
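The abstract benchmarks SLAT's cost against the fast gradient sign method (FGSM). The FGSM update itself is simple: move each coordinate by a fixed step eps in the direction of the sign of the loss gradient. The sketch below applies that rule to a latent vector instead of an input; the vector, gradient, and eps are hypothetical, and how SLAT chooses layers, step sizes, and folds the latent perturbation into training is not reproduced here:

```python
from typing import List


def sign(x: float) -> int:
    """Return 1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def fgsm_perturb(vec: List[float], grad: List[float], eps: float) -> List[float]:
    """One signed-gradient step of size eps per coordinate (the FGSM update rule)."""
    return [v + eps * sign(g) for v, g in zip(vec, grad)]


h = [0.5, -1.0, 2.0]       # hypothetical latent activations
dL_dh = [0.3, -0.1, 0.0]   # hypothetical loss gradient w.r.t. h, available from backprop
h_adv = fgsm_perturb(h, dL_dh, eps=0.1)  # approximately [0.6, -1.1, 2.0]
```

Because the step only needs the gradient's sign, a gradient already computed during the backward pass can be reused at negligible extra cost, which is the efficiency argument the abstract makes.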

arXiv:2104.01568

Information-theoretic regularization for Multi-source Domain Adaptation

Authors: Geon Yeong Park, Sang Wan Lee

Abstract: Adversarial learning strategies have demonstrated remarkable performance in dealing with single-source Domain Adaptation (DA) problems and have recently been applied to Multi-source DA (MDA) problems. Although most existing MDA strategies rely on a multiple-domain-discriminator setting, the effect of this setting on latent space representations has been poorly understood. Here we adopt an information-theoretic approach to identify and resolve the potential adverse effects of multiple domain discriminators on MDA: disintegration of domain-discriminative information, limited computational scalability, and a large variance in the gradient of the loss during training. We examine these issues by situating adversarial DA in the context of information regularization. This also provides a theoretical justification for using a single, unified domain discriminator. Based on this idea, we implement a novel neural architecture called Multi-source Information-regularized Adaptation Networks (MIAN). Large-scale experiments demonstrate that MIAN, despite its structural simplicity, reliably and significantly outperforms other state-of-the-art methods.

Submitted 29 November, 2021; v1 submitted 4 April, 2021; originally announced April 2021.

Comments: ICCV 2021

arXiv:2009.09417

F^2-Softmax: Diversifying Neural Text Generation via Frequency Factorized Softmax

Authors: Byung-Ju Choi, Jimin Hong, David Keetae Park, Sang Wan Lee

Abstract: Despite recent advances in neural text generation, encoding the rich diversity in human language remains elusive. We argue that sub-optimal text generation is mainly attributable to the imbalanced token distribution, which particularly misdirects the learning model when trained with the maximum-likelihood objective. As a simple yet effective remedy, we propose two novel methods, F^2-Softmax and MefMax, for balanced training even under a skewed frequency distribution. MefMax assigns tokens uniquely to frequency classes, trying to group tokens with similar frequencies and equalize frequency mass between the classes. F^2-Softmax then decomposes a probability distribution of the target token into a product of two conditional probabilities of (i) the frequency class and (ii) the token from the target frequency class. Models learn more uniform probability distributions because they are confined to subsets of vocabularies. Significant performance gains on seven relevant metrics suggest the superiority of our approach in improving not only the diversity but also the quality of generated texts.

Submitted 4 October, 2020; v1 submitted 20 September, 2020; originally announced September 2020.

Comments: EMNLP 2020
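The factorization described above can be checked numerically: if every token belongs to exactly one frequency class, then P(token) = P(class) · P(token | class) is a proper distribution over the whole vocabulary. The Python sketch below pairs that factorization with a simple greedy class assignment. The greedy split (contiguous cuts of the frequency-sorted vocabulary at equal cumulative-mass quotas) is a stand-in chosen for illustration, not the paper's MefMax procedure, and the toy frequencies and logits are hypothetical:

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def split_by_frequency_mass(freqs: Dict[str, int], num_classes: int) -> List[List[str]]:
    """Greedy stand-in for MefMax: sort tokens by frequency, then cut the sorted
    list into contiguous classes whose cumulative mass crosses equal quotas."""
    tokens = sorted(freqs, key=freqs.get, reverse=True)
    quota = sum(freqs.values()) / num_classes
    classes: List[List[str]] = []
    current: List[str] = []
    cumulative = 0
    for tok in tokens:
        current.append(tok)
        cumulative += freqs[tok]
        # Close a class once the running mass reaches the next quota;
        # the last class absorbs whatever remains.
        if len(classes) < num_classes - 1 and cumulative >= quota * (len(classes) + 1):
            classes.append(current)
            current = []
    if current:
        classes.append(current)
    return classes


def factorized_token_prob(class_logits: List[float], token_logits: List[List[float]],
                          c: int, j: int) -> float:
    """P(token j of class c) = P(class c) * P(token j | class c).
    token_logits[c] holds scores only for the tokens assigned to class c."""
    return softmax(class_logits)[c] * softmax(token_logits[c])[j]


freqs = {"the": 50, "a": 30, "cat": 10, "sat": 6, "mat": 4}
classes = split_by_frequency_mass(freqs, num_classes=2)
print(classes)  # [['the'], ['a', 'cat', 'sat', 'mat']]

class_logits = [0.2, -0.1]
token_logits = [[0.0], [1.0, 0.5, -0.5, -1.0]]  # one logit per token in each class
total = sum(
    factorized_token_prob(class_logits, token_logits, c, j)
    for c in range(len(classes))
    for j in range(len(classes[c]))
)
print(round(total, 6))  # 1.0
```

In this toy split, each class carries half of the frequency mass, so the class-level softmax sees balanced targets even though the token-level frequencies are highly skewed.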

arXiv:2008.06146

End-to-End Trainable Self-Attentive Shallow Network for Text-Independent Speaker Verification

Authors: Hyeonmook Park, Jungbae Park, Sang Wan Lee

Abstract: The generalized end-to-end (GE2E) model is widely used in speaker verification (SV) because of its expandability and its generality across languages. However, the long short-term memory (LSTM)-based GE2E model has two limitations. First, the GE2E embedding suffers from vanishing gradients, which degrade performance for very long input sequences. Second, utterances are not represented as properly fixed-dimensional vectors. To overcome these issues, we propose a novel framework for SV, the end-to-end trainable self-attentive shallow network (SASN), which incorporates a time-delay neural network (TDNN) and a self-attentive pooling mechanism based on the self-attentive x-vector system during the utterance embedding phase. We demonstrate that the proposed model is highly efficient and provides more accurate speaker verification than GE2E. On the VCTK dataset, with less than half the size of GE2E, the proposed model showed significant improvements over GE2E of about 63%, 67%, and 85% in EER (equal error rate), DCF (detection cost function), and AUC (area under the curve), respectively. Notably, as the input length grows, the DCF improvement of the proposed model is about 17 times greater than that of GE2E.

Submitted 13 August, 2020; originally announced August 2020.

Comments: 5 pages, 3 figures, 3 tables
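One limitation named above is that GE2E does not map utterances to properly fixed-dimensional vectors. Attentive pooling addresses this by scoring each frame, normalizing the scores over time, and taking the weighted mean, so any number of frames yields one vector of the frame dimension. A minimal Python sketch, assuming a single linear scoring vector `v` for simplicity (self-attentive x-vector systems usually compute the scores with a small learned network); the frame values are hypothetical:

```python
import math
from typing import List


def attentive_pool(frames: List[List[float]], v: List[float]) -> List[float]:
    """Pool T frame-level vectors (T varies per utterance) into one fixed-size embedding.

    Each frame is scored with a linear projection, the scores are softmax-normalized
    over time, and the output is the attention-weighted mean of the frames.
    """
    scores = [sum(vi * hi for vi, hi in zip(v, h)) for h in frames]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(frames[0])
    return [sum(w * h[d] for w, h in zip(weights, frames)) for d in range(dim)]


short_utt = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]          # 3 frames
long_utt = [[0.5, 0.1]] * 5 + [[2.0, -1.0]] * 3           # 8 frames
v = [1.0, 0.0]                                            # hypothetical scoring vector
print(len(attentive_pool(short_utt, v)), len(attentive_pool(long_utt, v)))  # 2 2
```

With an all-zero scoring vector the weights become uniform and the pooling reduces to a plain mean over frames; a trained scorer instead emphasizes the frames that carry more speaker information.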

arXiv:2007.11653

Darwin's Neural Network: AI-based Strategies for Rapid and Scalable Cell and Coronavirus Screening

Authors: Sang Won Lee, Yueh-Ting Chiu, Philip Brudnicki, Audrey M. Bischoff, Angus Jelinek, Jenny Zijun Wang, Danielle R. Bogdanowicz, Andrew F. Laine, Jia Guo, Helen H. Lu

Abstract: Recent advances in the interdisciplinary scientific fields of machine perception, computer vision, and biomedical engineering underpin a collection of machine learning algorithms with a remarkable ability to decipher the contents of microscope and nanoscope images. Machine learning algorithms are transforming the interpretation and analysis of microscope and nanoscope imaging data through use in conjunction with biological imaging modalities. These advances are enabling researchers to carry out real-time experiments that were previously thought to be computationally impossible. Here we adapt the theory of survival of the fittest to computer vision and machine perception to introduce a new framework for multi-class instance segmentation with deep learning, Darwin's Neural Network (DNN), to carry out morphometric analysis and classification of COVID-19 and MERS-CoV collected in vivo and of multiple mammalian cell types in vitro.

Submitted 22 July, 2020; originally announced July 2020.

Comments: 19 pages, 7 figures

ACM Class: I.5.0

arXiv:2007.04578

On the Reliability and Generalizability of Brain-inspired Reinforcement Learning Algorithms

Authors: Dongjae Kim, Jee Hang Lee, Jae Hoon Shin, Minsu Abel Yang, Sang Wan Lee

Abstract: Although deep RL models have shown great potential for solving various types of tasks with minimal supervision, several key challenges remain in terms of learning from limited experience, adapting to environmental changes, and generalizing learning from a single task. Recent evidence in decision neuroscience has shown that the human brain has an innate capacity to resolve these issues, leading to optimism regarding the development of neuroscience-inspired solutions toward sample-efficient and generalizable RL algorithms. We show that the computational model combining model-based and model-free control, which we term the prefrontal RL, reliably encodes the information of high-level policy that humans learned, and that this model can generalize the learned policy to a wide range of tasks. First, we trained the prefrontal RL and deep RL algorithms on data from 82 subjects, collected while human participants were performing two-stage Markov decision tasks in which we manipulated the goal, state-transition uncertainty, and state-space complexity. In the reliability test, which includes the latent behavior profile and the parameter recoverability test, we showed that the prefrontal RL reliably learned the latent policies of the humans, while all the other models failed. Second, to test the ability to generalize what these models learned from the original task, we situated them in the context of environmental volatility. Specifically, we ran large-scale simulations with 10 Markov decision tasks in which latent context variables change over time.
Our information-theoretic analysis revealed that the prefrontal RL achieved the highest level of adaptability and episodic encoding efficacy. This is the first attempt to formally test the possibility that computational models mimicking the way the brain solves general problems can lead to practical solutions to key challenges in machine learning.

Submitted 9 July, 2020; originally announced July 2020.
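The prefrontal RL model combines model-based and model-free control. A common way to express such a hybrid, sketched below under stated assumptions, is to compute model-based values from a learned transition model, maintain model-free values with a prediction-error update, and mix the two with a weight derived from each controller's reliability. The reliability numbers, learning rate, and the simple ratio used for the weight are hypothetical; the paper's arbitration scheme is not reproduced:

```python
from typing import List


def model_based_values(transitions: List[List[float]], state_values: List[float]) -> List[float]:
    """Q_MB(a) = sum over s' of P(s' | a) * V(s'), using a learned transition model."""
    return [sum(p * v for p, v in zip(row, state_values)) for row in transitions]


def model_free_update(q: float, reward: float, alpha: float) -> float:
    """Q_MF <- Q_MF + alpha * (reward - Q_MF): a cached value nudged by the prediction error."""
    return q + alpha * (reward - q)


def arbitrate(q_mb: List[float], q_mf: List[float], rel_mb: float, rel_mf: float) -> List[float]:
    """Hypothetical arbitration: weight each controller by its relative reliability."""
    w = rel_mb / (rel_mb + rel_mf)
    return [w * a + (1 - w) * b for a, b in zip(q_mb, q_mf)]


# Two first-stage actions, each leading to one of two second-stage states.
q_mb = model_based_values([[0.7, 0.3], [0.3, 0.7]], [1.0, 0.0])              # approx [0.7, 0.3]
q_mf = [model_free_update(0.0, 1.0, 0.2), model_free_update(0.5, 1.0, 0.2)]  # approx [0.2, 0.6]
q = arbitrate(q_mb, q_mf, rel_mb=0.75, rel_mf=0.25)                         # approx [0.575, 0.375]
```

Here the two controllers disagree about which action is better; because the model-based controller is assumed to be more reliable, its preference for the first action carries the mixed values.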

arXiv:2002.01171

Towards a Fast Steady-State Visual Evoked Potentials (SSVEP) Brain-Computer Interface (BCI)

Authors: Aung Aung Phyo Wai, Yangsong Zhang, Heng Guo, Ying Chi, Lei Zhang, Xian-Sheng Hua, Seong Whan Lee, Cuntai Guan

Abstract: Steady-state visual evoked potential (SSVEP) brain-computer interfaces (BCIs) provide reliable responses, leading to high accuracy and information throughput. However, achieving high accuracy typically requires a relatively long time window of one second or more. Various methods have been proposed to improve sub-second response accuracy through subject-specific training and calibration. These methods achieved substantial performance improvements, but their tedious calibration and subject-specific training cause user discomfort. We therefore propose a training-free method that combines spatial filtering and temporal alignment (CSTA) to recognize SSVEP responses in sub-second response time. CSTA exploits linear correlation and non-linear similarity between steady-state responses and stimulus templates with complementary fusion to achieve the desired performance improvements. We evaluated the performance of CSTA in terms of accuracy and Information Transfer Rate (ITR) in comparison with both training-based and training-free methods using two SSVEP datasets. We observed that CSTA achieves maximum mean accuracies of 97.43 $\pm$ 2.26% and 85.71 $\pm$ 13.41% on the four-class and forty-class SSVEP datasets, respectively, in sub-second response time in offline analysis. CSTA yields significantly higher mean performance (p<0.001) than the training-free method on both datasets. Compared with training-based methods, CSTA shows 29.33 $\pm$ 19.65% higher mean accuracy, with statistically significant differences for time windows shorter than 0.5 s.
In longer time windows, CSTA exhibits either better or comparable performance, though not statistically significantly better than training-based methods. We show that the proposed method enables subject-independent SSVEP classification without requiring training while achieving high target recognition performance in sub-second response time.

Submitted 12 May, 2020; v1 submitted 4 February, 2020; originally announced February 2020.

Comments: Further improvements or modifications required to algorithm design
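ITR, one of the two evaluation measures above, combines accuracy, the number of targets, and the time per selection into bits per minute. The widely used Wolpaw formula gives $B = \log_2 N + P\log_2 P + (1-P)\log_2\frac{1-P}{N-1}$ bits per selection for $N$ targets and accuracy $P$, scaled by $60/T$ for a selection time of $T$ seconds. A small Python helper, assuming the common convention of reporting 0 at or below chance accuracy; the example inputs are illustrative, not values from the paper:

```python
import math


def wolpaw_itr(n_targets: int, accuracy: float, seconds_per_selection: float) -> float:
    """Information transfer rate in bits/min using the Wolpaw formula."""
    n, p = n_targets, accuracy
    if p <= 1.0 / n:
        return 0.0  # convention: no information at or below chance level
    bits = math.log2(n)
    if p < 1.0:  # the extra terms vanish at perfect accuracy (0 * log 0 -> 0)
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * 60.0 / seconds_per_selection


print(wolpaw_itr(4, 1.0, 1.0))  # 120.0  (2 bits per selection, 60 selections per minute)
```

Shrinking the time window raises ITR only if accuracy holds up, which is why sub-second accuracy is the quantity the abstract focuses on. Whether $T$ should also include gaze-shift time between selections depends on the evaluation protocol, which the abstract does not specify.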

arXiv:2001.10898

doi 10.1145/3313831.3376753

ScreenTrack: Using a Visual History of a Computer Screen to Retrieve Documents and Web Pages

Authors: Donghan Hu, Sang Won Lee

Abstract: Computers are used for various purposes, so frequent context switching is inevitable. In this setting, retrieving the documents, files, and web pages that have been used for a task can be a challenge. While modern applications provide a history of recent documents for users to resume work, this is not sufficient to retrieve all the digital resources relevant to a given primary document. The histories currently available do not take into account the complex dependencies among resources across applications. To address this problem, we tested the idea of using a visual history of a computer screen to retrieve digital resources within a few days of their use through the development of ScreenTrack. ScreenTrack is software that captures screenshots of a computer at regular intervals. It then generates a time-lapse video from the captured screenshots and lets users retrieve a recently opened document or web page from a screenshot after recognizing the resource by its appearance. A controlled user study found that participants were able to retrieve requested information more quickly with ScreenTrack than under the baseline condition with existing tools. A follow-up study showed that the participants used ScreenTrack to retrieve previously used resources and to recover the context for task resumption.

Submitted 31 January, 2020; v1 submitted 29 January, 2020; originally announced January 2020.

Comments: CHI 2020, 10 pages, 7 figures
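The ScreenTrack abstract describes periodic screenshot capture followed by retrieval of the resource visible in a chosen frame. As a minimal sketch of that idea, assuming each frame is tagged at capture time with the foreground file path or URL (the abstract does not say how ScreenTrack obtains this), a time-ordered list with binary search is enough to map a moment the user scrubs to back to a resource. All names here (`Frame`, `ScreenTimeline`, `resource_at`) and the sample paths are hypothetical, not ScreenTrack's code.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Frame:
    timestamp: float  # seconds since capture started
    image_path: str   # where the screenshot image is stored (hypothetical)
    resource: str     # file path or URL in the foreground at capture time (assumed known)


@dataclass
class ScreenTimeline:
    frames: list = field(default_factory=list)

    def capture(self, timestamp, image_path, resource):
        """Record one periodic screenshot; captures must arrive in time order."""
        if self.frames and timestamp < self.frames[-1].timestamp:
            raise ValueError("captures must be appended in time order")
        self.frames.append(Frame(timestamp, image_path, resource))

    def frame_at(self, t):
        """Return the latest frame captured at or before time t."""
        times = [f.timestamp for f in self.frames]
        i = bisect_right(times, t)
        if i == 0:
            raise LookupError("no frame captured at or before this time")
        return self.frames[i - 1]

    def resource_at(self, t):
        """Map a scrubbed moment back to the resource that was on screen then."""
        return self.frame_at(t).resource


timeline = ScreenTimeline()
timeline.capture(0.0, "shot_0000.png", "/home/user/report.docx")
timeline.capture(5.0, "shot_0001.png", "https://example.org/paper.pdf")
timeline.capture(10.0, "shot_0002.png", "/home/user/data.csv")
print(timeline.resource_at(7.5))  # https://example.org/paper.pdf
```

Binary search keeps lookup logarithmic in the number of frames, which matters if screenshots are kept for several days at short intervals.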

arXiv:1910.02377

doi 10.5281/zenodo.1471026

Liveness in Interactive Systems

Authors: Sang Won Lee

Abstract: Creating an artifact in front of the public offers an opportunity to involve spectators in the creation process. For example, in a live music concert, audience members can clap, stomp, and sing with the musicians to become part of the piece. Live creation can facilitate collaboration with the spectators. The question I set out to answer is what it means to have liveness in interactive systems that support large-scale hybrid events involving audience participation. The notion of liveness is subtle in human-computer interaction. In this paper, I revisit the notion of liveness and provide definitions of both live and liveness from the perspective of designing interactive systems. In addition, I discuss why liveness matters in facilitating hybrid events and suggest directions for future research.

Submitted 6 October, 2019; originally announced October 2019.

Journal ref: the CSCW 2018 workshop on Hybrid Events (CSCW), 2018

arXiv:1910.02368

Computer-mediated Empathy

Authors: Sang Won Lee

Abstract: While novel social networks and emerging technologies help us transcend the spatial and temporal constraints inherent to in-person communication, the trade-off is a loss of natural expressivity. Empathetic interaction is already challenging in person, and computer-mediated communication makes such empathetically rich communication even more difficult. Are technology and intelligent systems opportunities or threats to more empathic interpersonal communication? Realizing empathy has been suggested not only as a way to communicate with others but also as a way to design products for users and to facilitate creativity. In this position paper, I suggest a framework to break down empathy, introduce each element, and show how computing, technologies, and algorithms can support (or hinder) certain elements of the empathy framework.

Submitted 6 October, 2019; originally announced October 2019.

Journal ref: Virginia Tech Workshop on the Future of Human-Computer Interaction, 2019

arXiv:1807.09408

Deterministic Hypothesis Generation for Robust Fitting of Multiple Structures

Authors: Kwang Hee Lee, Chanki Yu, Sang Wook Lee

Abstract: We present a novel algorithm for generating robust and consistent hypotheses for multiple-structure model fitting. Most existing methods use random sampling, which produces varying results, especially when the outlier ratio is high. When multiple structures are present, the inliers of other structures are regarded as outliers for the structure to which a model is being fitted. Global optimization has recently been investigated to provide stable and unique solutions, but the computational cost of such algorithms is prohibitively high for most image data of reasonable size. The algorithm presented in this paper uses a maximum feasible subsystem (MaxFS) algorithm to generate consistent initial hypotheses only from partial datasets in spatially overlapping local image regions. Our assumption is that each genuine structure will exist as a dominant structure in at least one of the local regions. To refine the initial hypotheses estimated from partial datasets and to remove the residual-tolerance dependency of the MaxFS algorithm, iteratively reweighted L1 (IRL1) minimization is performed over all the image data. The initial weights of the IRL1 framework are determined from the initial hypotheses generated in local regions. Our approach is significantly more efficient than those that use only global optimization for all the image data. Experimental results demonstrate that the presented method can generate more reliable and consistent hypotheses than random-sampling methods for estimating single and multiple structures from data with a large number of outliers.
We clearly expose the influence of algorithm parameter settings on the results in our experiments.

Submitted 24 July, 2018; originally announced July 2018.

arXiv:1807.09210

doi 10.1109/ICCV.2013.12

Deterministic Fitting of Multiple Structures using Iterative MaxFS with Inlier Scale Estimation and Subset Updating

Authors: Kwang Hee Lee, Sang Wook Lee

Abstract: We present an efficient deterministic hypothesis generation algorithm for robust fitting of multiple structures based on the maximum feasible subsystem (MaxFS) framework. Despite its advantages, a global optimization method such as MaxFS has two main limitations for geometric model fitting. First, its performance is strongly influenced by the user-specified inlier scale. Second, it is computationally inefficient for large data. The presented MaxFS-based algorithm addresses the first limitation by iteratively estimating the model parameters and the inlier scale, and overcomes the second by reducing the data for the MaxFS problem. Further, it generates hypotheses only with the top-n ranked subsets based on matching scores and data-fitting residuals. This reduction of data for the MaxFS problem makes the algorithm computationally practical. Our method, called iterative MaxFS with inlier scale estimation and subset updating (IMaxFS-ISE-SU) in this paper, alternates between hypothesis generation and fitting until all true structures are found. The IMaxFS-ISE-SU algorithm generates substantially more reliable hypotheses than random sampling-based methods, especially as (pseudo-)outlier ratios increase. Experimental results demonstrate that our method can generate more reliable and consistent hypotheses than random sampling-based methods for estimating multiple structures from data with many outliers.

Submitted 24 July, 2018; originally announced July 2018.

Comments: An extended version of our ICCV 2013 paper

arXiv:1701.02123

Green-Blue Stripe Pattern for Range Sensing from a Single Image

Authors: Changsoo Je, Kyuhyoung Choi, Sang Wook Lee

Abstract: In this paper, we present a novel method for rapid high-resolution range sensing using a green-blue stripe pattern. We use green and blue to design a high-frequency stripe projection pattern. For accurate and reliable range recovery, we identify the stripe patterns with our color-stripe segmentation and unwrapping algorithms. The experimental result for a naked human face shows the effectiveness of our method.

Submitted 20 July, 2021; v1 submitted 9 January, 2017; originally announced January 2017.

Comments: 7 pages, 5 figures. Updated version of a conference paper

ACM Class: I.2.10; I.4.8

Journal ref: Proc. 30th Fall Semiannual Conference of Korea Information Science Society, vol. 2, pp. 661-663, Seoul, Korea, October, 2003

arXiv:1609.01382

Creating Interactive Behaviors in Early Sketch by Recording and Remixing Crowd Demonstrations

Authors: Sang Won Lee, Yi Wei Yang, Shiyan Yan, Yujin Zhang, Isabelle Wong, Zhengxi Tan, Miles McGruder, Christopher Homan, Walter Lasecki

Abstract: In the early stages of designing graphical user interfaces (GUIs), the look (appearance) can be easily presented by sketching, but the feel (interactive behaviors) cannot, and often requires an accompanying description of how it works (Myers et al. 2008). We propose to use crowdsourcing to augment early sketches with interactive behaviors generated, used, and reused by collective "wizards-of-oz" as opposed to a single wizard as in prior work (Davis et al. 2007). This demo presents an extension of Apparition (Lasecki et al. 2015), a crowd-powered prototyping tool that allows end users to create functional GUIs using speech and sketch. In Apparition, crowd workers collaborate in real-time on a shared canvas to refine the user-requested sketch interactively, and with the assistance of the end users. Our demo extends this functionality to let crowd workers "demonstrate" the canvas changes that are needed for a behavior and refine their demonstrations to improve the fidelity of interactive behaviors. The system then lets workers "remix" these behaviors to make creating future behaviors more efficient.

Submitted 5 September, 2016; originally announced September 2016.

Comments: HCOMP conference 2016

ACM Class: H.5.2; D.2.2

arXiv:1512.01809

doi 10.1007/s11042-015-3039-x

High quality voice conversion using prosodic and high-resolution spectral features

Authors: Hy Quy Nguyen, Siu Wa Lee, Xiaohai Tian, Minghui Dong, Eng Siong Chng

Abstract: Voice conversion methods have advanced rapidly over the last decade. Studies have shown that speaker characteristics are captured by spectral features as well as various prosodic features. Most existing conversion methods focus on the spectral feature, as it directly represents the timbre characteristics, while some conversion methods have focused only on the prosodic feature represented by the fundamental frequency. In this paper, a comprehensive framework using deep neural networks (DNNs) to convert both timbre and prosodic features is proposed. The timbre feature is represented by a high-resolution spectral feature. The prosodic features include F0, intensity, and duration. DNNs are well known to be useful tools for modeling high-dimensional features. In this work, we show that a DNN initialized by our proposed autoencoder pretraining yields good-quality conversion models. This pretraining is tailor-made for voice conversion and leverages an autoencoder to capture the generic spectral shape of source speech. Additionally, our framework uses segmental DNN models to capture the evolution of the prosodic features over time. To reconstruct the converted speech, the spectral feature produced by the DNN model is combined with the three prosodic features produced by the segmental DNN models. Our experimental results show that using both prosodic and high-resolution spectral features leads to high-quality converted speech, as measured by objective evaluation and subjective listening tests.

Submitted 6 December, 2015; originally announced December 2015.

arXiv:1510.01443

A Waveform Representation Framework for High-quality Statistical Parametric Speech Synthesis

Authors: Bo Fan, Siu Wa Lee, Xiaohai Tian, Lei Xie, Minghui Dong

Abstract: State-of-the-art statistical parametric speech synthesis (SPSS) generally uses a vocoder to represent speech signals and parameterize them into features for subsequent modeling. The magnitude spectrum has been the dominant feature over the years. Although perceptual studies have shown that the phase spectrum is essential to the quality of synthesized speech, it is often ignored by using a minimum-phase filter during synthesis, and speech quality suffers. To bypass this bottleneck in vocoded speech, this paper proposes a phase-embedded waveform representation framework and establishes a magnitude-phase joint modeling platform for high-quality SPSS. Our experiments on waveform reconstruction show that its performance is better than that of the widely used STRAIGHT vocoder. Furthermore, the proposed modeling and synthesis platform outperforms a leading-edge, vocoded, deep bidirectional long short-term memory recurrent neural network (DBLSTM-RNN)-based baseline system on various objective evaluation metrics.

Submitted 6 October, 2015; originally announced October 2015.

Comments: accepted and will appear in APSIPA2015; keywords: speech synthesis, LSTM-RNN, vocoder, phase, waveform, modeling

MSC Class: 68T10

arXiv:1509.05592

doi 10.1007/978-3-540-76390-1_50

Color-Stripe Structured Light Robust to Surface Color and Discontinuity

Authors: Kwang Hee Lee, Changsoo Je, Sang Wook Lee

Abstract: Multiple color stripes have been employed for structured light-based rapid range imaging to increase the number of uniquely identifiable stripes. The use of multiple color stripes poses two problems: (1) object surface color may disturb the stripe color and (2) the number of adjacent stripes required for identifying a stripe may not be maintained near surface discontinuities such as occluding boundaries. In this paper, we present methods to alleviate those problems. Log-gradient filters are employed to reduce the influence of object colors, and color stripes in two and three directions are used to increase the chance of identifying correct stripes near surface discontinuities. Experimental results demonstrate the effectiveness of our methods.

Submitted 18 September, 2015; originally announced September 2015.

Comments: 10 pages, 9 figures, 8th Asian Conference on Computer Vision (ACCV), Tokyo, Japan, November 2007, Proceedings, Part II

ACM Class: I.2.10; I.4.8

Journal ref: Computer Vision - ACCV 2007, LNCS 4844, pp. 507-516, Springer Berlin Heidelberg, November 14, 2007
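The abstract does not spell out the filter, but a common rationale for log-gradient filtering is a multiplicative image model: if one channel's observed intensity is I_c(x) = rho_c(x) E_c(x), with surface reflectance rho_c and projected stripe irradiance E_c, then log I_c(x+1) - log I_c(x) no longer depends on rho_c wherever the reflectance is locally constant. The sketch below illustrates only that idealized model (no ambient light, no camera nonlinearity, hypothetical values); it is not the authors' filter.

```python
import math


def log_gradient(channel, eps=1e-9):
    """Forward difference of log intensity along one scanline of one color channel.

    eps only guards against log(0) on fully dark pixels.
    """
    logs = [math.log(v + eps) for v in channel]
    return [b - a for a, b in zip(logs, logs[1:])]


# Hypothetical projected stripe irradiance in one channel along a scanline.
stripe = [0.2, 0.2, 0.9, 0.9, 0.2, 0.2]

# The same stripes seen on two surfaces with different, locally uniform reflectance.
dark_surface = [0.3 * e for e in stripe]
bright_surface = [0.8 * e for e in stripe]

raw_dark = [b - a for a, b in zip(dark_surface, dark_surface[1:])]
raw_bright = [b - a for a, b in zip(bright_surface, bright_surface[1:])]
lg_dark = log_gradient(dark_surface)
lg_bright = log_gradient(bright_surface)

# Raw steps depend on reflectance (about 0.21 vs 0.56); log steps do not (about log 4.5).
print(round(raw_dark[1], 2), round(raw_bright[1], 2))
print(round(lg_dark[1], 4), round(lg_bright[1], 4))
```

At a reflectance boundary the first term no longer cancels, which is one reason such filters are usually combined with other cues near discontinuities.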

arXiv:1509.04115

Color-Phase Analysis for Sinusoidal Structured Light in Rapid Range Imaging

Authors: Changsoo Je, Sang Wook Lee, Rae-Hong Park

Abstract: Active range sensing using structured light is the most accurate and reliable method for obtaining 3D information. However, most of the work has been limited to range sensing of static objects, and range sensing of dynamic (moving or deforming) objects has been investigated only recently, by a few researchers. Sinusoidal structured light is one of the well-known optical methods for 3D measurement. In this paper, we present a novel method for rapid high-resolution range imaging using a color sinusoidal pattern. We consider the real-world problem of nonlinearity and color-band crosstalk in the color light projector and color camera, and present methods for accurate recovery of color phase. For high-resolution ranging, we use high-frequency patterns and describe new unwrapping algorithms for reliable range recovery. The experimental results demonstrate the effectiveness of our methods.

Submitted 14 September, 2015; originally announced September 2015.

Comments: 6 pages, 12 figures. 6th Asian Conference on Computer Vision (ACCV 2004)

ACM Class: I.2.10; I.4.8

Journal ref: Proc. 6th Asian Conference on Computer Vision (ACCV 2004), vol. 1, pp. 270-275, Jeju Island, Korea, January 27, 2004
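As a hedged illustration of color-phase recovery, the sketch below assumes the three color channels carry the same sinusoid shifted by -2π/3, 0, and +2π/3, recovers the wrapped phase with the standard three-step formula, and then unwraps it along a scanline with the classic Itoh method. It deliberately ignores the projector/camera nonlinearity and color-band crosstalk that the paper addresses, and the unwrapping shown is the textbook method, not the paper's new unwrapping algorithms. The channel assignment, amplitudes, and function names are assumptions.

```python
import math


def wrap(phi):
    """Map an angle into (-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def three_step_phase(i1, i2, i3):
    """Wrapped phase from three samples shifted by -2pi/3, 0, +2pi/3.

    Assumes I_k = A + B*cos(phi + delta_k) with no crosstalk or gamma.
    """
    return math.atan2(math.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)


def unwrap_1d(wrapped):
    """Itoh-style unwrapping: integrate the wrapped phase differences.

    Valid only while the true phase changes by less than pi between samples.
    """
    out = [wrapped[0]]
    for prev, cur in zip(wrapped, wrapped[1:]):
        out.append(out[-1] + wrap(cur - prev))
    return out


# Hypothetical scanline whose true phase ramps over several fringe periods.
A, B = 0.5, 0.4
true_phase = [0.25 * k for k in range(50)]
shifts = (-2.0 * math.pi / 3.0, 0.0, 2.0 * math.pi / 3.0)
red, green, blue = [[A + B * math.cos(p + d) for p in true_phase] for d in shifts]

wrapped = [three_step_phase(r, g, b) for r, g, b in zip(red, green, blue)]
recovered = unwrap_1d(wrapped)
print(max(abs(a - b) for a, b in zip(recovered, true_phase)))
```

The three-step formula follows from I1 - I3 = √3 B sin φ and 2 I2 - I1 - I3 = 3 B cos φ, so the offset A and amplitude B cancel; real projectors break that assumption, which is the problem the abstract targets.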

arXiv:1508.07859

doi 10.1016/j.image.2013.05.005

Multi-Projector Color Structured-Light Vision

Authors: Changsoo Je, Kwang Hee Lee, Sang Wook Lee

Abstract: Research interest in rapid structured-light imaging for modeling moving objects has grown, and a number of methods have been suggested for range capture in a single video frame. The imaging area of a 3D object using a single projector is restricted, since the structured light is projected only onto a limited area of the object surface. Employing additional projectors to broaden the imaging area is challenging, since simultaneous projection of multiple patterns results in their superposition in the light-intersected areas, and recognizing the original patterns is by no means trivial. This paper presents a novel method of multi-projector color structured-light vision based on projector-camera triangulation. By analyzing the behavior of superposed-light colors in a chromaticity domain, we show that the original light colors cannot be properly extracted by conventional direct estimation. We disambiguate multiple projectors by multiplexing the orientations of projector patterns so that the superposed patterns can be separated by explicit derivative computations. Experimental studies are carried out to demonstrate the validity of the presented method. The proposed method increases the efficiency of range acquisition compared to conventional active stereo using multiple projectors.

Submitted 31 August, 2015; originally announced August 2015.

Comments: 25 pages, 13 figures

ACM Class: I.2.10; I.4.8

Journal ref: Signal Processing: Image Communication, Volume 28, Issue 9, pp. 1046-1058, October, 2013
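To see why multiplexing pattern orientations can allow separation by derivatives, consider an idealized, hypothetical case: a flat, uniformly reflective surface where projector 1's pattern varies only along x and projector 2's only along y, so the superposed image is I(x, y) = P1(x) + P2(y). Then ∂I/∂x = P1'(x) and ∂I/∂y = P2'(y), so each derivative sees only one projector. The sketch checks this with finite differences; real scenes add surface geometry, reflectance, and the chromaticity effects the paper analyzes, so this is a simplification and not the paper's method. The pattern functions `p1` and `p2` are invented for illustration.

```python
import math

WIDTH, HEIGHT = 12, 10


def p1(x):
    """Hypothetical projector-1 pattern: stripes varying along x only."""
    return 0.5 + 0.5 * math.sin(2.0 * math.pi * x / 4.0)


def p2(y):
    """Hypothetical projector-2 pattern: stripes varying along y only."""
    return 0.5 + 0.5 * math.cos(2.0 * math.pi * y / 5.0)


# Superposition of both projectors in the light-intersected area.
image = [[p1(x) + p2(y) for x in range(WIDTH)] for y in range(HEIGHT)]


def diff_x(img):
    """Forward difference along x (within each row)."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in img]


def diff_y(img):
    """Forward difference along y (between consecutive rows)."""
    return [
        [img[y + 1][x] - img[y][x] for x in range(len(img[0]))]
        for y in range(len(img) - 1)
    ]


gx = diff_x(image)
gy = diff_y(image)

# Every row of gx reproduces projector 1's derivative; every column of gy, projector 2's.
err1 = max(abs(gx[y][x] - (p1(x + 1) - p1(x))) for y in range(HEIGHT) for x in range(WIDTH - 1))
err2 = max(abs(gy[y][x] - (p2(y + 1) - p2(y))) for y in range(HEIGHT - 1) for x in range(WIDTH))
print(err1 < 1e-12, err2 < 1e-12)  # True True
```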

arXiv:1508.04981

doi 10.1007/978-3-540-24670-1_8

High-Contrast Color-Stripe Pattern for Rapid Structured-Light Range Imaging

Authors: Changsoo Je, Sang Wook Lee, Rae-Hong Park

Abstract: For structured-light range imaging, color stripes can be used to increase the number of distinguishable light patterns compared to binary BW stripes. Therefore, an appropriate use of color patterns can reduce the number of light projections, and range imaging is achievable in a single video frame, or in "one shot". On the other hand, the reliability and range resolution attainable from color stripes are generally lower than those from multiply projected binary BW patterns, since color contrast is affected by object color reflectance and ambient light. This paper presents new methods for selecting stripe colors and designing multiple-stripe patterns for "one-shot" and "two-shot" imaging. We show that maximizing color contrast between the stripes in one-shot imaging reduces the ambiguities resulting from colored object surfaces and limitations in sensor/projector resolution. Two-shot imaging adds an extra video frame and maximizes the color contrast between the first and second video frames to diminish the ambiguities even further. Experimental results demonstrate the effectiveness of the presented one-shot and two-shot color-stripe imaging schemes.

Submitted 20 August, 2015; originally announced August 2015.

Comments: 13 pages, 12 figures, 8th European Conference on Computer Vision (ECCV), Prague, Czech Republic, May 2004, Proceedings, Part I

ACM Class: I.2.10; I.4.8

Journal ref: Computer Vision - ECCV 2004, LNCS 3021, pp. 95-107, Springer-Verlag Berlin Heidelberg, May 10, 2004

Showing 1–43 of 43 results for author: Lee, S W