Search | arXiv e-print repository

FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering

Authors: Xiaochen Wang, Junqing He, Zhe yang, Yiru Wang, Xiangdi Meng, Kunhao Pan, Zhifang Sui

Abstract: Large Language Models (LLMs) with chain-of-thought (COT) prompting have demonstrated impressive abilities on simple natural language inference tasks. However, they tend to perform poorly on Multi-hop Question Answering (MHQA) tasks due to several challenges, including hallucination, error propagation and limited context length. We propose a prompting method, Finite State Machine (FSM), to enhance the reasoning capabilities of LLMs for complex tasks in addition to improved effectiveness and trustworthiness. Different from COT methods, FSM addresses MHQA by iteratively decomposing a question into multi-turn sub-questions, and self-correcting in time, improving the accuracy of answers in each step. Specifically, FSM addresses one sub-question at a time and decides on the next step based on its current result and state, in an automaton-like format. Experiments on benchmarks show the effectiveness of our method. Although our method performs on par with the baseline on relatively simpler datasets, it excels on challenging datasets like Musique. Moreover, this approach mitigates the hallucination phenomenon, wherein the correct final answer can be recovered despite errors in intermediate reasoning. Furthermore, our method improves LLMs' ability to follow specified output format requirements, significantly reducing the difficulty of answer interpretation and the need for reformatting.

Submitted 3 July, 2024; originally announced July 2024.
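
The automaton-like loop described in the FSM abstract above (handle one sub-question at a time, then choose the next step from the current result and state) can be illustrated with a minimal sketch. This is not the authors' prompt design or code: the state names, the `run_fsm` function, the "DONE"/"OK" signals, and the scripted stand-in for an LLM are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical state names; the paper's actual states may differ.
DECOMPOSE, ANSWER, VERIFY, FINISH = "decompose", "answer", "verify", "finish"


@dataclass
class Trace:
    """Accepted sub-questions and their answers, in order."""
    sub_questions: List[str] = field(default_factory=list)
    answers: List[str] = field(default_factory=list)


def run_fsm(question: str,
            llm: Callable[[str, str, Trace], str],
            max_steps: int = 12) -> Tuple[str, Trace]:
    """Drive a small state machine over calls to `llm(role, text, trace)`."""
    state, trace = DECOMPOSE, Trace()
    pending, candidate = "", ""
    for _ in range(max_steps):
        if state == DECOMPOSE:
            # Ask for the next sub-question, or "DONE" when none remain.
            pending = llm("decompose", question, trace)
            state = FINISH if pending == "DONE" else ANSWER
        elif state == ANSWER:
            candidate = llm("answer", pending, trace)
            state = VERIFY
        elif state == VERIFY:
            verdict = llm("verify", f"{pending} -> {candidate}", trace)
            if verdict == "OK":
                trace.sub_questions.append(pending)
                trace.answers.append(candidate)
                state = DECOMPOSE
            else:
                state = ANSWER  # self-correct: retry the same sub-question
        else:  # FINISH
            break
    final = trace.answers[-1] if trace.answers else ""
    return final, trace


# Scripted stand-in for an LLM (assumption: a real system would call a model).
HOPS = ["Who directed Film X?", "Where was that director born?"]
FACTS = {HOPS[0]: "Director Y", HOPS[1]: "City Z"}


def scripted_llm(role: str, text: str, trace: Trace) -> str:
    if role == "decompose":
        n = len(trace.answers)
        return HOPS[n] if n < len(HOPS) else "DONE"
    if role == "answer":
        return FACTS[text]
    return "OK"  # verifier accepts every candidate in this toy setup


answer, trace = run_fsm("Where was the director of Film X born?", scripted_llm)
print(answer, trace.sub_questions)
```

Keeping the control flow in an explicit state variable, rather than in one long prompt, is what allows a rejected intermediate answer to be retried without restarting the whole chain.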

arXiv:2406.19593 [pdf, other]

SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs

Authors: Xin Su, Man Luo, Kris W Pan, Tien Pei Chou, Vasudev Lal, Phillip Howard

Abstract: Synthetic data generation has gained significant attention recently for its utility in training large vision and language models. However, the application of synthetic data to the training of multimodal context-augmented generation systems has been relatively unexplored. This gap in existing work is important because existing vision and language models (VLMs) are not trained specifically for context-augmented generation. Resources for adapting such models are therefore crucial for enabling their use in retrieval-augmented generation (RAG) settings, where a retriever is used to gather relevant information that is then subsequently provided to a generative model via context augmentation. To address this challenging problem, we generate SK-VQA: a large synthetic multimodal dataset containing over 2 million question-answer pairs which require external knowledge to determine the final answer. Our dataset is both larger and significantly more diverse than existing resources of its kind, possessing over 11x more unique questions and containing images from a greater variety of sources than previously-proposed datasets. Through extensive experiments, we demonstrate that our synthetic dataset can not only serve as a challenging benchmark, but is also highly effective for adapting existing generative multimodal models for context-augmented generation.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18070 [pdf, other]

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation

Authors: Baoqi Pei, Guo Chen, Jilan Xu, Yuping He, Yicheng Liu, Kanghua Pan, Yifei Huang, Yali Wang, Tong Lu, Limin Wang, Yu Qiao

Abstract: In this report, we present our solutions to the EgoVis Challenges in CVPR 2024, including five tracks in the Ego4D challenge and three tracks in the EPIC-Kitchens challenge. Building upon the video-language two-tower model and leveraging our meticulously organized egocentric video data, we introduce a novel foundation model called EgoVideo. This model is specifically designed to cater to the unique characteristics of egocentric videos and provides strong support for our competition submissions. In the Ego4D challenges, we tackle various tasks including Natural Language Queries, Step Grounding, Moment Queries, Short-term Object Interaction Anticipation, and Long-term Action Anticipation. In addition, we also participate in the EPIC-Kitchens challenge, where we engage in the Action Recognition, Multiple Instance Retrieval, and Domain Adaptation for Action Recognition tracks. By adapting EgoVideo to these diverse tasks, we showcase its versatility and effectiveness in different egocentric video analysis scenarios, demonstrating the powerful representation ability of EgoVideo as an egocentric foundation model. Our codebase and pretrained models are publicly available at https://github.com/OpenGVLab/EgoVideo.

Submitted 30 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: Champion solutions in the EgoVis CVPR 2024 workshop

arXiv:2405.01926 [pdf, other]

Auto-Encoding Morph-Tokens for Multimodal LLM

Authors: Kaihang Pan, Siliang Tang, Juncheng Li, Zhaoyu Fan, Wei Chow, Shuicheng Yan, Tat-Seng Chua, Yueting Zhuang, Hanwang Zhang

Abstract: For multimodal LLMs, the synergy of visual comprehension (textual output) and generation (visual output) presents an ongoing challenge. This is due to a conflicting objective: for comprehension, an MLLM needs to abstract the visuals; for generation, it needs to preserve the visuals as much as possible. Thus, the objective is a dilemma for visual-tokens. To resolve the conflict, we propose encoding images into morph-tokens to serve a dual purpose: for comprehension, they act as visual prompts instructing MLLM to generate texts; for generation, they take on a different, non-conflicting role as complete visual-tokens for image reconstruction, where the missing visual cues are recovered by the MLLM. Extensive experiments show that morph-tokens can achieve a new SOTA for multimodal comprehension and generation simultaneously. Our project is available at https://github.com/DCDmllm/MorphTokens.

Submitted 3 May, 2024; originally announced May 2024.

Comments: Accepted by ICML 2024

arXiv:2312.07226 [pdf, other]

Super-Resolution on Rotationally Scanned Photoacoustic Microscopy Images Incorporating Scanning Prior

Authors: Kai Pan, Linyang Li, Li Lin, Pujin Cheng, Junyan Lyu, Lei Xi, Xiaoyin Tang

Abstract: Photoacoustic Microscopy (PAM) images integrating the advantages of optical contrast and acoustic resolution have been widely used in brain studies. However, there exists a trade-off between scanning speed and image resolution. Compared with traditional raster scanning, rotational scanning provides good opportunities for fast PAM imaging by optimizing the scanning mechanism. Recently, there has been a trend to incorporate deep learning into the scanning process to further increase the scanning speed. Yet, most such attempts are performed for raster scanning while those for rotational scanning are relatively rare. In this study, we propose a novel and well-performing super-resolution framework for rotational scanning-based PAM imaging. To eliminate adjacent rows' displacements due to subject motion or high-frequency scanning distortion, we introduce a registration module across odd and even rows in the preprocessing and incorporate displacement degradation in the training. Besides, gradient-based patch selection is proposed to increase the probability of blood vessel patches being selected for training. A Transformer-based network with a global receptive field is applied for better performance. Experimental results on both synthetic and real datasets demonstrate the effectiveness and generalizability of our proposed framework for rotationally scanned PAM images' super-resolution, both quantitatively and qualitatively. Code is available at https://github.com/11710615/PAMSR.git.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2311.09198 [pdf, other]

Never Lost in the Middle: Improving Large Language Models via Attention Strengthening Question Answering

Authors: Junqing He, Kunhao Pan, Xiaoqun Dong, Zhuoyang Song, Yibo Liu, Yuxin Liang, Hao Wang, Qianguo Sun, Songxin Zhang, Zejian Xie, Jiaxing Zhang

Abstract: While large language models (LLMs) are equipped with longer text input capabilities than before, they struggle to find correct information in long contexts. The "lost in the middle" problem challenges most LLMs, referring to the dramatic decline in accuracy when correct information is located in the middle. To overcome this crucial issue, this paper proposes to enhance the information searching and reflection ability of LLMs in long contexts via specially designed tasks called Attention Strengthening Multi-doc QA (ASM QA). Following these tasks, our model excels in focusing more precisely on the desired information. Experimental results show substantial improvement in Multi-doc QA and other benchmarks, superior to state-of-the-art models by 13.7% absolute gain in shuffled settings and by 21.5% in the passage retrieval task. We release our model, Ziya-Reader, to promote related research in the community.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.03301 [pdf, other]

Ziya2: Data-centric Learning is All LLMs Need

Authors: Ruyi Gan, Ziwei Wu, Renliang Sun, Junyu Lu, Xiaojun Wu, Dixiang Zhang, Kunhao Pan, Junqing He, Yuanhe Tian, Ping Yang, Qi Yang, Hao Wang, Jiaxing Zhang, Yan Song

Abstract: Various large language models (LLMs) have been proposed in recent years, including closed- and open-source ones, continually setting new records on multiple benchmarks. However, the development of LLMs still faces several issues, such as high cost of training models from scratch, and continual pre-training leading to catastrophic forgetting, etc. Although many such issues are addressed along the line of research on LLMs, an important yet practical limitation is that many studies overly pursue enlarging model sizes without comprehensively analyzing and optimizing the use of pre-training data in their learning process, as well as appropriate organization and leveraging of such data in training LLMs under cost-effective settings. In this work, we propose Ziya2, a model with 13 billion parameters adopting LLaMA2 as the foundation model, and further pre-trained on 700 billion tokens, where we focus on pre-training techniques and use data-centric optimization to enhance the learning process of Ziya2 on different stages. We define three data attributes and firstly establish data-centric scaling laws to illustrate how different data impacts LLMs. Experiments show that Ziya2 significantly outperforms other models in multiple benchmarks especially with promising results compared to representative open-source ones. Ziya2 (Base) is released at https://huggingface.co/IDEA-CCNL/Ziya2-13B-Base and https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary.

Submitted 4 April, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.02821 [pdf, other]

Improving Vision Anomaly Detection with the Guidance of Language Modality

Authors: Dong Chen, Kaihang Pan, Guoming Wang, Yueting Zhuang, Siliang Tang

Abstract: Recent years have seen a surge of interest in anomaly detection for tackling industrial defect detection, event detection, etc. However, existing unsupervised anomaly detectors, particularly those for the vision modality, face significant challenges due to redundant information and sparse latent space. Conversely, the language modality performs well due to its relatively single data. This paper tackles the aforementioned challenges for vision modality from a multimodal point of view. Specifically, we propose Cross-modal Guidance (CMG), which consists of Cross-modal Entropy Reduction (CMER) and Cross-modal Linear Embedding (CMLE), to tackle the redundant information issue and sparse space issue, respectively. CMER masks parts of the raw image and computes the matching score with the text. Then, CMER discards irrelevant pixels to make the detector focus on critical contents. To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality, and then the latent space of vision modality will be learned with the guidance of the matrix. Thereafter, the vision latent space will get semantically similar images closer. Extensive experiments demonstrate the effectiveness of the proposed methods. Particularly, CMG outperforms the baseline that only uses images by 16.81%. Ablation experiments further confirm the synergy among the proposed methods, as each component depends on the other to achieve optimal performance.

Submitted 4 October, 2023; originally announced October 2023.

Comments: 9 pages, 10 figures

arXiv:2309.09526 [pdf, other]

DFIL: Deepfake Incremental Learning by Exploiting Domain-invariant Forgery Clues

Authors: Kun Pan, Yin Yifang, Yao Wei, Feng Lin, Zhongjie Ba, Zhenguang Liu, ZhiBo Wang, Lorenzo Cavallaro, Kui Ren

Abstract: The malicious use and widespread dissemination of deepfakes pose a significant crisis of trust. Current deepfake detection models can generally recognize forgery images by training on a large dataset. However, the accuracy of detection models degrades significantly on images generated by new deepfake methods due to the difference in data distribution. To tackle this issue, we present a novel incremental learning framework that improves the generalization of deepfake detection models by continual learning from a small number of new samples. To cope with different data distributions, we propose to learn a domain-invariant representation based on supervised contrastive learning, preventing overfitting to the insufficient new data. To mitigate catastrophic forgetting, we regularize our model at both the feature level and the label level based on a multi-perspective knowledge distillation approach. Finally, we propose to select both central and hard representative samples to update the replay set, which is beneficial for both domain-invariant representation learning and rehearsal-based knowledge preserving. We conduct extensive experiments on four benchmark datasets, obtaining the new state-of-the-art average forgetting rate of 7.01 and average accuracy of 85.49 on FF++, DFDC-P, DFD, and CDF2. Our code is released at https://github.com/DeepFakeIL/DFIL.

Submitted 18 September, 2023; originally announced September 2023.

Comments: Accepted by ACMMM2023

arXiv:2308.10025 [pdf, other]

I3: Intent-Introspective Retrieval Conditioned on Instructions

Authors: Kaihang Pan, Juncheng Li, Wenjie Wang, Hao Fei, Hongye Song, Wei Ji, Jun Lin, Xiaozhong Liu, Tat-Seng Chua, Siliang Tang

Abstract: Recent studies indicate that dense retrieval models struggle to perform well on a wide variety of retrieval tasks that lack dedicated training data, as different retrieval tasks often entail distinct search intents. To address this challenge, in this work we leverage instructions to flexibly describe retrieval intents and introduce I3, a unified retrieval system that performs Intent-Introspective retrieval across various tasks, conditioned on Instructions without any task-specific training. I3 innovatively incorporates a pluggable introspector in a parameter-isolated manner to comprehend specific retrieval intents by jointly reasoning over the input query and instruction, and seamlessly integrates the introspected intent into the original retrieval model for intent-aware retrieval. Furthermore, we propose progressively-pruned intent learning. It utilizes extensive LLM-generated data to train I3 phase-by-phase, embodying two key designs: progressive structure pruning and drawback extrapolation-based data refinement. Extensive experiments show that in the BEIR benchmark, I3 significantly outperforms baseline methods designed with task-specific retrievers, achieving state-of-the-art zero-shot performance without any task-specific tuning.

Submitted 25 April, 2024; v1 submitted 19 August, 2023; originally announced August 2023.

Comments: Accepted by SIGIR 2024

arXiv:2308.04152 [pdf, other]

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

Authors: Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Hanwang Zhang, Yueting Zhuang

Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have been utilizing Visual Prompt Generators (VPGs) to convert visual features into tokens that LLMs can recognize. This is achieved by training the VPGs on millions of image-caption pairs, where the VPG-generated tokens of images are fed into a frozen LLM to generate the corresponding captions. However, this image-captioning based training objective inherently biases the VPG to concentrate solely on the primary visual contents sufficient for caption generation, often neglecting other visual details. This shortcoming results in MLLMs' underperformance in comprehending demonstrative instructions consisting of multiple, interleaved, and multimodal instructions that demonstrate the required context to complete a task. To address this issue, we introduce a generic and lightweight Visual Prompt Generator Complete module (VPG-C), which can infer and complete the missing details essential for comprehending demonstrative instructions. Further, we propose a synthetic discriminative training strategy to fine-tune VPG-C, eliminating the need for supervised demonstrative instructions. As for evaluation, we build DEMON, a comprehensive benchmark for demonstrative instruction understanding. Synthetically trained with the proposed strategy, VPG-C achieves significantly stronger zero-shot performance across all tasks of DEMON. Further evaluation on the MME and OwlEval benchmarks also demonstrates the superiority of VPG-C. Our benchmark, code, and pre-trained models are available at https://github.com/DCDmllm/Cheetah.

Submitted 25 May, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

Comments: Accepted by ICLR 2024 (Spotlight)

arXiv:2307.16180 [pdf, other]

Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models

Authors: Keyu Pan, Yawen Zeng

Abstract: The field of large language models (LLMs) has made significant progress, and their knowledge storage capacity is approaching that of human beings. Furthermore, advanced techniques, such as prompt learning and reinforcement learning, are being employed to address ethical concerns and hallucination problems associated with LLMs, bringing them closer to aligning with human values. This situation naturally raises the question of whether LLMs with human-like abilities possess a human-like personality. In this paper, we aim to investigate the feasibility of using the Myers-Briggs Type Indicator (MBTI), a widespread human personality assessment tool, as an evaluation metric for LLMs. Specifically, extensive experiments will be conducted to explore: 1) the personality types of different LLMs, 2) the possibility of changing the personality types by prompt engineering, and 3) how the training dataset affects the model's personality. Although the MBTI is not a rigorous assessment, it can still reflect the similarity between LLMs and human personality. In practice, the MBTI has the potential to serve as a rough indicator. Our codes are available at https://github.com/HarderThenHarder/transformers_tasks/tree/main/LLM/llms_mbti.

Submitted 30 July, 2023; originally announced July 2023.

arXiv:2303.12314 [pdf, other]

Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization

Authors: Kaihang Pan, Juncheng Li, Hongye Song, Jun Lin, Xiaozhong Liu, Siliang Tang

Abstract: Prompt tuning is a parameter-efficient method, which learns soft prompts and conditions frozen language models to perform specific downstream tasks. Though effective, prompt tuning under few-shot settings on the one hand heavily relies on a good initialization of soft prompts. On the other hand, it can easily overfit to few-shot training samples, thereby undermining generalizability. Existing works leverage pre-training or supervised meta-learning to initialize soft prompts but they fail to data-efficiently generalize to unseen downstream tasks. To address the above problems, this paper proposes a novel Self-sUpervised meta-Prompt learning framework with MEta-gradient Regularization for few-shot generalization (SUPMER). SUPMER leverages self-supervised meta-learning with a diverse set of well-designed meta-training tasks to learn a universal prompt initialization for efficient adaptation using only unlabeled data. Additionally, it jointly meta-learns a gradient regularization function to transform raw gradients into a domain-generalizable direction, thus alleviating the problem of overfitting. Extensive experiments show that SUPMER achieves better performance for different few-shot downstream tasks, and also exhibits a stronger domain generalization ability. The code for SUPMER will be available at https://github.com/beepkh/SUPMER.

Submitted 23 October, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: Accepted by EMNLP 2023 (Findings)

arXiv:2211.11304 [pdf, other]

TCBERT: A Technical Report for Chinese Topic Classification BERT

Authors: Ting Han, Kunhao Pan, Xinyu Chen, Dingjie Song, Yuchen Fan, Xinyu Gao, Ruyi Gan, Jiaxing Zhang

Abstract: Bidirectional Encoder Representations from Transformers or BERT (Devlin et al., 2019) has been one of the base models for various NLP tasks due to its remarkable performance. Variants customized for different languages and tasks are proposed to further improve the performance. In this work, we investigate supervised continued pre-training (Gururangan et al., 2020) on BERT for Chinese topic classification task. Specifically, we incorporate prompt-based learning and contrastive learning into the pre-training. To adapt to the task of Chinese topic classification, we collect around 2.1M Chinese data spanning various topics. The pre-trained Chinese Topic Classification BERTs (TCBERTs) with different parameter sizes are open-sourced at https://huggingface.co/IDEA-CCNL.

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2209.02970 [pdf, other]

Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence

Authors: Jiaxing Zhang, Ruyi Gan, Junjie Wang, Yuxiang Zhang, Lin Zhang, Ping Yang, Xinyu Gao, Ziwei Wu, Xiaoqun Dong, Junqing He, Jianheng Zhuo, Qi Yang, Yongfeng Huang, Xiayu Li, Yanghan Wu, Junyu Lu, Xinyu Zhu, Weifeng Chen, Ting Han, Kunhao Pan, Rui Wang, Hao Wang, Xiaojun Wu, Zhongshen Zeng, Chongpei Chen

Abstract: Nowadays, foundation models have become one of the fundamental infrastructures in artificial intelligence, paving the way to general intelligence. However, the reality presents two urgent challenges: existing foundation models are dominated by the English-language community; users are often given limited resources and thus cannot always use foundation models. To support the development of the Chinese-language community, we introduce an open-source project, called Fengshenbang, which is led by the research center for Cognitive Computing and Natural Language (CCNL). Our project has comprehensive capabilities, including large pre-trained models, user-friendly APIs, benchmarks, datasets, and others. We wrap all these in three sub-projects: the Fengshenbang Model, the Fengshen Framework, and the Fengshen Benchmark. An open-source roadmap, Fengshenbang, aims to re-evaluate the open-source community of Chinese pre-trained large-scale models, prompting the development of the entire Chinese large-scale model community. We also want to build a user-centered open-source ecosystem to allow individuals to access the desired models to match their computing resources. Furthermore, we invite companies, colleges, and research institutions to collaborate with us to build the large-scale open-source model-based ecosystem. We hope that this project will be the foundation of Chinese cognitive intelligence.

Submitted 30 March, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: Added the Chinese version and is now a bilingual paper

arXiv:2203.06920 [pdf, other]

DS3-Net: Difficulty-perceived Common-to-T1ce Semi-Supervised Multimodal MRI Synthesis Network

Authors: Ziqi Huang, Li Lin, Pujin Cheng, Kai Pan, Xiaoying Tang

Abstract: Contrast-enhanced T1 (T1ce) is one of the most essential magnetic resonance imaging (MRI) modalities for diagnosing and analyzing brain tumors, especially gliomas. In clinical practice, common MRI modalities such as T1, T2, and fluid attenuation inversion recovery are relatively easy to access while T1ce is more challenging considering the additional cost and potential risk of allergies to the contrast agent. Therefore, it is of great clinical necessity to develop a method to synthesize T1ce from other common modalities. Current paired image translation methods typically have the issue of requiring a large amount of paired data and do not focus on specific regions of interest, e.g., the tumor region, in the synthesization process. To address these issues, we propose a Difficulty-perceived common-to-T1ce Semi-Supervised multimodal MRI Synthesis network (DS3-Net), involving both paired and unpaired data together with dual-level knowledge distillation. DS3-Net predicts a difficulty map to progressively promote the synthesis task. Specifically, a pixelwise constraint and a patchwise contrastive constraint are guided by the predicted difficulty map. Through extensive experiments on the publicly available BraTS2020 dataset, DS3-Net outperforms its supervised counterpart in each respect. Furthermore, with only 5% paired data, the proposed DS3-Net achieves competitive performance with state-of-the-art image translation methods utilizing 100% paired data, delivering an average SSIM of 0.8947 and an average PSNR of 23.60.

Submitted 14 March, 2022; originally announced March 2022.

Comments: 10 pages, 2 figures

arXiv:2104.06677 [pdf, ps, other]

Multi-Party Dual Learning

Authors: Maoguo Gong, Yuan Gao, Yu Xie, A. K. Qin, Ke Pan, Yew-Soon Ong

Abstract: The performance of machine learning algorithms heavily relies on the availability of a large amount of training data. However, in reality, data usually reside in distributed parties such as different institutions and may not be directly gathered and integrated due to various data policy constraints. As a result, some parties may suffer from insufficient data available for training machine learning models. In this paper, we propose a multi-party dual learning (MPDL) framework to alleviate the problem of limited data with poor quality in an isolated party. Since the knowledge sharing processes for multiple parties always emerge in dual forms, we show that dual learning is naturally suitable to handle the challenge of missing data, and explicitly exploits the probabilistic correlation and structural relationship between dual tasks to regularize the training process. We introduce a feature-oriented differential privacy with mathematical proof, in order to avoid possible privacy leakage of raw features in the dual inference process. The approach requires minimal modifications to the existing multi-party learning structure, and each party can build flexible and powerful models separately, whose accuracy is no less than non-distributed self-learning approaches. The MPDL framework achieves significant improvement compared with state-of-the-art multi-party learning methods, as we demonstrated through simulations on real-world datasets.

Submitted 14 April, 2021; originally announced April 2021.

Comments: submitted to IEEE Transactions on Cybernetics

arXiv:2006.14390 [pdf, other]

A New Modal Autoencoder for Functionally Independent Feature Extraction

Authors: Yuzhu Guo, Kang Pan, Simeng Li, Zongchang Han, Kexin Wang, Li Li

Abstract: Autoencoders have been widely used for dimensionality reduction and feature extraction. Various types of autoencoders have been proposed by introducing regularization terms. Most of these regularizations improve representation learning by constraining the weights in the encoder part, which maps input into hidden nodes and affects the generation of features. In this study, we show that a constraint to the decoder can also significantly improve its performance because the decoder determines how the latent variables contribute to the reconstruction of input. Inspired by the structural modal analysis method in mechanical engineering, a new modal autoencoder (MAE) is proposed by orthogonalising the columns of the readout weight matrix. The new regularization helps to disentangle explanatory factors of variation and forces the MAE to extract fundamental modes in data. The learned representations are functionally independent in the reconstruction of input and perform better in consecutive classification tasks. The results were validated on the MNIST variations and USPS classification benchmark suite. Comparative experiments clearly show that the new algorithm has a surprising advantage. The new MAE introduces a very simple training principle for autoencoders and could be promising for the pre-training of deep neural networks.

Submitted 25 June, 2020; originally announced June 2020.

arXiv:2004.13927 [pdf, other]

Dynamic Anomaly Detection with High-fidelity Simulators: A Convex Optimization Approach

Authors: Kaikai Pan, Peter Palensky, Peyman Mohajerin Esfahani

Abstract: The main objective of this article is to develop scalable dynamic anomaly detectors when high-fidelity simulators of power systems are at our disposal. On the one hand, mathematical models of these high-fidelity simulators are typically too "intractable" for existing model-based approaches to be applied. On the other hand, pure data-driven methods developed primarily in the machine learning literature neglect our knowledge about the underlying dynamics of the systems. In this study, we combine tools from these two mainstream approaches to develop a diagnosis filter that utilizes the knowledge of both the dynamical system and the simulation data of the high-fidelity simulators. The proposed diagnosis filter aims to achieve two desired features: (i) performance robustness with respect to model mismatch; (ii) high scalability. To this end, we propose a tractable (convex) optimization-based reformulation in which decisions are the filter parameters, the model-based information introduces feasible sets, and the data from the simulator forms the objective function to be minimized regarding the effect of model mismatch on the filter performance. To validate the theoretical results, we implement the developed diagnosis filter in DIgSILENT PowerFactory to detect false data injection attacks on the Automatic Generation Control measurements in the three-area IEEE 39-bus system.

Submitted 6 October, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

Comments: 19 pages

arXiv:2003.02229 [pdf, other]

doi 10.1109/PMAPS47429.2020.9183526

Detection of False Data Injection Attacks Using the Autoencoder Approach

Authors: Chenguang Wang, Simon Tindemans, Kaikai Pan, Peter Palensky

Abstract: State estimation is of considerable significance for the power system operation and control. However, well-designed false data injection attacks can utilize blind spots in conventional residual-based bad data detection methods to manipulate measurements in a coordinated manner and thus affect the secure operation and economic dispatch of grids. In this paper, we propose a detection approach based on an autoencoder neural network. By training the network on the dependencies intrinsic in 'normal' operation data, it effectively overcomes the challenge of unbalanced training data that is inherent in power system attack detection. To evaluate the detection performance of the proposed mechanism, we conduct a series of experiments on the IEEE 118-bus power system. The experiments demonstrate that the proposed autoencoder detector displays robust detection performance under a variety of attack scenarios.

Submitted 14 December, 2022; v1 submitted 4 March, 2020; originally announced March 2020.

Comments: 6 pages, 5 figures, 1 table, conference

Journal ref: 2020 International Conference on Probabilistic Methods Applied to Power Systems (PMAPS), IEEE, Liege, Belgium, 2020, pp. 1-6

arXiv:2001.07068 [pdf, other]

False Data Injection Attacks on Hybrid AC/HVDC Interconnected System with Virtual Inertia -- Vulnerability, Impact and Detection

Authors: Kaikai Pan, Elyas Rakhshani, Peter Palensky

Abstract: Power systems are moving towards hybrid AC/DC grids with the integration of HVDC links, renewable resources and energy storage modules. New models of frequency control have to consider the complex interactions between these components. Meanwhile, more attention should be paid to cyber security concerns as these control strategies highly depend on data communications which may be exposed to cyber attacks. In this regard, this article aims to analyze the false data injection (FDI) attacks on the AC/DC interconnected system with virtual inertia and develop advanced diagnosis tools to reveal their occurrence. We build an optimization-based framework for the purpose of vulnerability and attack impact analysis. Considering the attack impact on the system frequency stability, it is shown that the hybrid grid with parallel AC/DC links and emulated inertia is more vulnerable to the FDI attacks, compared with the one without virtual inertia and the normal AC system. We then propose a detection approach to detect and isolate each FDI intrusion with a sufficiently fast response, and even recover the attack value. In addition to theoretical results, the effectiveness of the proposed methods is validated through simulations on the two-area AC/DC interconnected system with virtual inertia emulation capabilities.

Submitted 30 May, 2020; v1 submitted 20 January, 2020; originally announced January 2020.

arXiv:1911.08802 [pdf]

Blockchain-Assisted Spectrum Trading between Elastic Virtual Optical Networks

Authors: Shifeng Ding, Kevin X. Pan, Sanjay K. Bose, Qiong Zhang, Gangxiang Shen

Abstract: In communication networks, network virtualization can usually provide better capacity utilization and quality of service (QoS) than what can be achieved otherwise. However, conventional resource allocation for virtualized networks would still follow a fixed pattern based on the predicted capacity needs of the users, even though, in reality, the actual traffic demand of a user will always tend to fluctuate. The mismatch between the fixed capacity allocation and the actual fluctuating traffic would lead to degradation of provisioned network services and inefficiency in the assigned network capacity. To overcome this, we propose a new spectrum trading (ST) scheme between virtual optical networks (VONs) in the context of an elastic optical network (EON). The key idea here is to allow different VONs to trade their spectrum resources according to the actual capacity they need at different time instants. A VON with unused spectra can then trade away its unused spectra to other VONs that are short of spectrum resources at that time. In exchange, it is rewarded with a certain amount of credit for its contribution to the ST community, which it can then use later to get extra bandwidth, if needed. The trustworthiness of the trading records between the VONs is ensured in a distributed fashion through a blockchain-assisted account book that is updated whenever a new trade occurs. For this, we develop a software-defined control plane to enable spectrum trading in an EON. The performance of the ST scheme is evaluated and compared with a scenario without such trading. Our results show that the proposed ST scheme is effective in improving the QoS of each VON and significantly improves the overall network capacity utilization.

Submitted 20 November, 2019; originally announced November 2019.

Comments: 7 pages, 5 figures

arXiv:1708.08355 [pdf, other]

Data Attacks on Power System State Estimation: Limited Adversarial Knowledge vs. Limited Attack Resources

Authors: Kaikai Pan, André Teixeira, Milos Cvetkovic, Peter Palensky

Abstract: A class of data integrity attack, known as false data injection (FDI) attack, has been studied in a considerable amount of work. It has been shown that with perfect knowledge of the system model and the capability to manipulate a certain number of measurements, the FDI attacks can coordinate measurement corruption to remain stealthy against bad data detection. However, a more realistic attack is essentially an attack with limited adversarial knowledge of the system model and limited attack resources due to various reasons. In this paper, we generalize the data attacks so that they can be pure FDI attacks or combined with availability attacks (e.g., DoS attacks) and analyze the attacks with limited adversarial knowledge or limited attack resources. The attack impact is evaluated by the proposed metrics and the detection probability of attacks is calculated using the distribution property of data with or without attacks. The analysis is supported with results from a power system use case. The results show how important the knowledge is to the attacker and which measurements are more vulnerable to attacks with limited resources.

Submitted 28 August, 2017; originally announced August 2017.

Comments: Accepted in the 43rd Annual Conference of the IEEE Industrial Electronics Society (IECON 2017)

arXiv:1708.08349 [pdf, other]

Cyber Risk Analysis of Combined Data Attacks Against Power System State Estimation

Authors: Kaikai Pan, André Teixeira, Milos Cvetkovic, Peter Palensky

Abstract: Understanding smart grid cyber attacks is key for developing appropriate protection and recovery measures. Advanced attacks pursue maximized impact at minimized costs and detectability. This paper conducts risk analysis of combined data integrity and availability attacks against the power system state estimation. We compare the combined attacks with pure integrity attacks - false data injection (FDI) attacks. A security index for vulnerability assessment to these two kinds of attacks is proposed and formulated as a mixed integer linear programming problem. We show that such combined attacks can succeed with fewer resources than FDI attacks. The combined attacks with limited knowledge of the system model also show advantages in remaining stealthy against bad data detection. Finally, the risk of combined attacks to reliable system operation is evaluated using the results from vulnerability assessment and attack impact analysis. The findings in this paper are validated and supported by a detailed case study.

Submitted 28 August, 2017; originally announced August 2017.

Comments: Submitted to IEEE Transactions on Smart Grid on August 14th, 2017

arXiv:1708.08322 [pdf, other]

Co-simulation for Cyber Security Analysis: Data Attacks against Energy Management System

Authors: Kaikai Pan, André Teixeira, Claudio López, Peter Palensky

Abstract: It is challenging to assess the vulnerability of a cyber-physical power system to data attacks from an integral perspective. In order to support vulnerability assessment beyond analytic analysis, a suitable platform for security tests needs to be developed. In this paper we analyze the cyber security of energy management system (EMS) against data attacks. First we extend our analytic framework that characterizes data attacks as optimization problems with the objectives specified as security metrics and constraints corresponding to the communication network properties. Second, we build a platform in the form of co-simulation - coupling the power system simulator DIgSILENT PowerFactory with communication network simulator OMNeT++, and Matlab for EMS applications (state estimation, optimal power flow). Then the framework is used to conduct attack simulations on the co-simulation based platform for a power grid test case. The results indicate how vulnerable the EMS is to data attacks and how co-simulation can help assess vulnerability.

Submitted 28 August, 2017; originally announced August 2017.

Comments: Accepted in 8th IEEE International Conference on Smart Grid Communications (SmartGridComm 2017)

arXiv:1607.05606 [pdf, other]

doi 10.1016/j.joi.2018.06.005

The Memory of Science: Inflation, Myopia, and the Knowledge Network

Authors: Raj K. Pan, Alexander M. Petersen, Fabio Pammolli, Santo Fortunato

Abstract: Science is a growing system, exhibiting ~4% annual growth in publications and ~1.8% annual growth in the number of references per publication. Combined, these trends correspond to a 12-year doubling period in the total supply of references, thereby challenging traditional methods of evaluating scientific production, from researchers to institutions. Against this background, we analyzed a citation network comprised of 837 million references produced by 32.6 million publications over the period 1965-2012, allowing for a temporal analysis of the 'attention economy' in science. Unlike previous studies, we analyzed the entire probability distribution of reference ages - the time difference between a citing and cited paper - thereby capturing previously overlooked trends. Over this half-century period we observe a narrowing range of attention - both classic and recent literature are being cited increasingly less, pointing to the important role of socio-technical processes. To better understand the impact of exponential growth on the underlying knowledge network we develop a network-based model, featuring the redirection of scientific attention via publications' reference lists, and validate the model against several empirical benchmarks. We then use the model to test the causal impact of real paradigm shifts, thereby providing guidance for science policy analysis. In particular, we show how perturbations to the growth rate of scientific output affect the reference age distribution and the functionality of the vast science citation network as an aid for the search & retrieval of knowledge. In order to account for the inflation of science, our study points to the need for a systemic overhaul of the counting methods used to evaluate citation impact - especially in the case of evaluating science careers, which can span several decades and thus several doubling periods.
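The 12-year doubling period quoted in the abstract can be checked with a short calculation. The sketch below assumes the two growth rates compound multiplicatively each year; the function name `doubling_time` and the variable names are illustrative and do not come from the paper.

```python
import math


def doubling_time(*annual_growth_rates: float) -> float:
    """Years for a quantity to double when the given annual growth rates compound.

    Assumption: the rates multiply, i.e. the yearly factor is prod(1 + r).
    """
    combined_factor = math.prod(1.0 + r for r in annual_growth_rates)
    return math.log(2.0) / math.log(combined_factor)


# Approximate rates quoted in the abstract.
publications_growth = 0.04            # ~4% more publications per year
references_per_paper_growth = 0.018   # ~1.8% more references per publication per year

years = doubling_time(publications_growth, references_per_paper_growth)
print(round(years, 1))  # 12.1
```

Under this assumption the total supply of references grows by about 5.9% per year, which gives a doubling time of roughly 12 years, in line with the figure stated above.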

Submitted 19 July, 2016; originally announced July 2016.

Comments: 17 pages, 8 figures, Supplementary Material available at http://physics.bu.edu/~amp17/webpage_files/MyPapers/pppf_Arxiv_July2016_SI.pdf

Journal ref: Journal of Informetrics 12, 656-678 (2018)

arXiv:1503.01881 [pdf, other]

doi 10.1016/j.joi.2015.07.006

Attention decay in science

Authors: Pietro Della Briotta Parolo, Raj Kumar Pan, Rumi Ghosh, Bernardo A. Huberman, Kimmo Kaski, Santo Fortunato

Abstract: The exponential growth in the number of scientific papers makes it increasingly difficult for researchers to keep track of all the publications relevant to their work. Consequently, the attention that can be devoted to individual papers, measured by their citation counts, is bound to decay rapidly. In this work we make a thorough study of the life-cycle of papers in different disciplines. Typically, the citation rate of a paper increases up to a few years after its publication, reaches a peak and then decreases rapidly. This decay can be described by an exponential or a power law behavior, as in ultradiffusive processes, with exponential fitting better than power law for the majority of cases. The decay is also becoming faster over the years, signaling that nowadays papers are forgotten more quickly. However, when time is counted in terms of the number of published papers, the rate of decay of citations is fairly independent of the period considered. This indicates that the attention of scholars depends on the number of published items, and not on real time.

Submitted 23 November, 2015; v1 submitted 6 March, 2015; originally announced March 2015.

Comments: Published version. 14 pages, 9 figures

Journal ref: Journal of Informetrics, 9, 734-745 (2015)

arXiv:1405.7136 [pdf, other]

The Nobel Prize delay

Authors: Francesco Becattini, Arnab Chatterjee, Santo Fortunato, Marija Mitrović, Raj Kumar Pan, Pietro Della Briotta Parolo

Abstract: The time lag between the publication of a Nobel discovery and the conferment of the prize has been rapidly increasing for all disciplines, especially for Physics. Does this mean that fundamental science is running out of groundbreaking discoveries?

Submitted 28 May, 2014; originally announced May 2014.

Comments: Extended version of Nature 508, 186 (2014) http://www.nature.com/nature/journal/v508/n7495/full/508186a.html ; http://scitation.aip.org/content/aip/magazine/physicstoday/news/10.1063/PT.5.2012

arXiv:1403.1177 [pdf, other]

doi 10.1103/PhysRevE.89.062815

Effects of temporal correlations on cascades: Threshold models on temporal networks

Authors: Ville-Pekka Backlund, Jari Saramäki, Raj Kumar Pan

Abstract: A person's decision to adopt an idea or product is often driven by the decisions of peers, mediated through a network of social ties. A common way of modeling adoption dynamics is to use threshold models, where a node may become an adopter given a high enough rate of contacts with adopted neighbors. We study the dynamics of threshold models that take both the network topology and the timings of contacts into account, using empirical contact sequences as substrates. The models are designed such that adoption is driven by the number of contacts with different adopted neighbors within a chosen time. We find that while some networks support cascades leading to network-level adoption, some do not: the propagation of adoption depends on several factors from the frequency of contacts to burstiness and timing correlations of contact sequences. More specifically, burstiness is seen to suppress cascade sizes when compared to randomised contact timings, while timing correlations between contacts on adjacent links facilitate cascades.

Submitted 27 June, 2014; v1 submitted 5 March, 2014; originally announced March 2014.

Comments: 9 pages, 7 figures, Published version

Journal ref: Phys. Rev. E 89, 062815 (2014)

arXiv:1312.2650 [pdf, other]

doi 10.1038/srep04880

Author Impact Factor: tracking the dynamics of individual scientific impact

Authors: Raj Kumar Pan, Santo Fortunato

Abstract: The impact factor (IF) of scientific journals has acquired a major role in the evaluations of the output of scholars, departments and whole institutions. Typically papers appearing in journals with large values of the IF receive a high weight in such evaluations. However, at the end of the day one is interested in assessing the impact of individuals, rather than papers. Here we introduce Author Impact Factor (AIF), which is the extension of the IF to authors. The AIF of an author A in year $t$ is the average number of citations given by papers published in year $t$ to papers published by A in a period of $Δt$ years before year $t$. Due to its intrinsic dynamic character, AIF is capable of capturing trends and variations of the impact of the scientific output of scholars in time, unlike the $h$-index, which is a growing measure taking into account the whole career path.
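The AIF definition in the abstract can be illustrated with a small sketch. It rests on two assumptions that the abstract does not spell out: the window covers publication years $t - Δt$ through $t - 1$, and the average is taken over the author's papers in that window. The function name and the data layout are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Each paper: (publication_year, {citing_year: citations received that year}).
Paper = Tuple[int, Dict[int, int]]


def author_impact_factor(papers: Iterable[Paper], year: int, window: int) -> float:
    """Average citations received in `year` by papers published in the
    `window` years before `year` (assumed window: year - window .. year - 1)."""
    in_window = [
        citations_by_year
        for pub_year, citations_by_year in papers
        if year - window <= pub_year <= year - 1
    ]
    if not in_window:
        return 0.0  # no eligible papers; returning 0.0 is a convention chosen here
    total = sum(c.get(year, 0) for c in in_window)
    return total / len(in_window)


# Toy data, invented purely for illustration.
papers = [
    (2010, {2012: 3, 2013: 5}),
    (2011, {2013: 1}),
    (2005, {2013: 10}),  # outside a 5-year window ending before 2013
]
print(author_impact_factor(papers, year=2013, window=5))  # 3.0
```

Because the window moves with $t$, the value can fall as well as rise from year to year, which is the dynamic behaviour the abstract contrasts with the $h$-index.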

Submitted 12 May, 2014; v1 submitted 9 December, 2013; originally announced December 2013.

Comments: Published version. 6 pages, 5 figures + Appendix

Journal ref: Sci. Rep. 4, 4880 (2014)

arXiv:1306.0114 [pdf, other]

doi 10.1038/srep03052

On the Predictability of Future Impact in Science

Authors: Orion Penner, Raj Kumar Pan, Alexander M. Petersen, Kimmo Kaski, Santo Fortunato

Abstract: Correctly assessing a scientist's past research impact and potential for future impact is key in recruitment decisions and other evaluation processes. While a candidate's future impact is the main concern for these decisions, most measures only quantify the impact of previous work. Recently, it has been argued that linear regression models are capable of predicting a scientist's future impact. By applying that future impact model to 762 careers drawn from three disciplines: physics, biology, and mathematics, we identify a number of subtle, but critical, flaws in current models. Specifically, cumulative non-decreasing measures like the h-index contain intrinsic autocorrelation, resulting in significant overestimation of their "predictive power". Moreover, the predictive power of these models depends heavily upon scientists' career age, producing the least accurate estimates for young researchers. Our results place in doubt the suitability of such models, and indicate further investigation is required before they can be used in recruiting decisions.

Submitted 29 October, 2013; v1 submitted 1 June, 2013; originally announced June 2013.

Comments: Published version, 8 pages, 5 figures + Appendix

Journal ref: Scientific Reports 3, 3052 (2013)

arXiv:1304.0627 [pdf, other]

doi 10.1063/PT.3.1928

The case for caution in predicting scientists' future impact

Authors: Orion Penner, Raj K. Pan, Alexander M. Petersen, Santo Fortunato

Abstract: We stress-test the career predictability model proposed by Acuna et al. [Nature 489, 201-202 (2012)] by applying their model to a longitudinal career data set of 100 assistant professors in physics, two from each of the top 50 physics departments in the US. The Acuna model claims to predict h(t+Δt), a scientist's h-index Δt years into the future, using a linear combination of 5 cumulative career measures taken at career age t. Here we investigate how the "predictability" depends on the aggregation of career data across multiple age cohorts. We confirm that the Acuna model does a respectable job of predicting h(t+Δt) up to roughly 6 years into the future when aggregating all age cohorts together. However, when calculated using subsets of specific age cohorts (e.g. using data for only t=3), we find that the model's predictive power significantly decreases, especially when applied to early career years. For young careers, the model does a much worse job of predicting future impact, and hence, exposes a serious limitation. The limitation is particularly concerning as early career decisions make up a significant portion, if not the majority, of cases where quantitative approaches are likely to be applied.

Submitted 2 April, 2013; originally announced April 2013.

Comments: 2 pages, 1 figure

Journal ref: Physics Today 66, 8-9 (2013)

arXiv:1303.7274 [pdf, other]

doi 10.1073/pnas.1323111111

Reputation and Impact in Academic Careers

Authors: Alexander M. Petersen, Santo Fortunato, Raj K. Pan, Kimmo Kaski, Orion Penner, Armando Rungi, Massimo Riccaboni, H. Eugene Stanley, Fabio Pammolli

Abstract: Reputation is an important social construct in science, which enables informed quality assessments of both publications and careers of scientists in the absence of complete systemic information. However, the relation between reputation and career growth of an individual remains poorly understood, despite recent proliferation of quantitative research evaluation methods. Here we develop an original framework for measuring how a publication's citation rate $Δc$ depends on the reputation of its central author $i$, in addition to its net citation count $c$. To estimate the strength of the reputation effect, we perform a longitudinal analysis on the careers of 450 highly-cited scientists, using the total citations $C_{i}$ of each scientist as his/her reputation measure. We find a citation crossover $c_{\times}$ which distinguishes the strength of the reputation effect. For publications with $c < c_{\times}$, the author's reputation is found to dominate the annual citation rate. Hence, a new publication may gain a significant early advantage corresponding to roughly a 66% increase in the citation rate for each tenfold increase in $C_{i}$. However, the reputation effect becomes negligible for highly cited publications meaning that for $c\geq c_{\times}$ the citation rate measures scientific impact more transparently. In addition we have developed a stochastic reputation model, which is found to reproduce numerous statistical observations for real careers, thus providing insight into the microscopic mechanisms underlying cumulative advantage in science.

Submitted 7 October, 2014; v1 submitted 28 March, 2013; originally announced March 2013.

Comments: Final published version of the main manuscript including additional analysis: 9 pages, 4 figures, 1 table, and full reference list, including those in the Supplementary Information. For the SI Appendix, see http://physics.bu.edu/~amp17/webpage_files/MyPapers/Reputation_SI.pdf

Journal ref: Proceedings of the National Academy of Sciences 111, 15316-15321 (2014)

arXiv:1209.0781 [pdf, other]

doi 10.1038/srep00902

World citation and collaboration networks: uncovering the role of geography in science

Authors: Raj Kumar Pan, Kimmo Kaski, Santo Fortunato

Abstract: Modern information and communication technologies, especially the Internet, have diminished the role of spatial distances and territorial boundaries on the access and transmissibility of information. This has enabled closer collaboration and internationalization among scientists. Nevertheless, geography remains an important factor affecting the dynamics of science. Here we present a systematic analysis of citation and collaboration networks between cities and countries, by assigning papers to the geographic locations of their authors' affiliations. The citation flows as well as the collaboration strengths between cities decrease with the distance between them and follow gravity laws. In addition, the total research impact of a country grows linearly with the amount of national funding for research & development. However, the average impact reveals a peculiar threshold effect: the scientific output of a country may reach an impact larger than the world average only if the country invests more than about 100,000 USD per researcher annually.

Submitted 17 December, 2012; v1 submitted 4 September, 2012; originally announced September 2012.

Comments: Published version. 9 pages, 5 figures + Appendix, The world citation and collaboration networks at both city and country level are available at http://becs.aalto.fi/~rajkp/datasets.html

Journal ref: Scientific Reports 2, 902 (2012)

arXiv:1206.0108 [pdf, other]

doi 10.1038/srep00551

The evolution of interdisciplinarity in physics research

Authors: Raj Kumar Pan, Sitabhra Sinha, Kimmo Kaski, Jari Saramäki

Abstract: Science, being a social enterprise, is subject to fragmentation into groups that focus on specialized areas or topics. Often new advances occur through cross-fertilization of ideas between sub-fields that otherwise have little overlap as they study dissimilar phenomena using different techniques. Thus to explore the nature and dynamics of scientific progress one needs to consider the large-scale organization and interactions between different subject areas. Here, we study the relationships between the sub-fields of Physics using the Physics and Astronomy Classification Scheme (PACS) codes employed for self-categorization of articles published over the past 25 years (1985-2009). We observe a clear trend towards increasing interactions between the different sub-fields. The network of sub-fields also exhibits core-periphery organization, the nucleus being dominated by Condensed Matter and General Physics. However, over time Interdisciplinary Physics is steadily increasing its share in the network core, reflecting a shift in the overall trend of Physics research.

Submitted 16 August, 2012; v1 submitted 1 June, 2012; originally announced June 2012.

Comments: Published version, 10 pages, 8 figures + Supplementary Information

Journal ref: Scientific Reports 2, 551 (2012)

arXiv:1112.4312 [pdf, other]

doi 10.1088/1742-5468/2012/03/P03005

Multiscale Analysis of Spreading in a Large Communication Network

Authors: Mikko Kivelä, Raj Kumar Pan, Kimmo Kaski, János Kertész, Jari Saramäki, Márton Karsai

Abstract: In temporal networks, both the topology of the underlying network and the timings of interaction events can be crucial in determining how some dynamic process mediated by the network unfolds. We have explored the limiting case of the speed of spreading in the SI model, set up such that an event between an infectious and susceptible individual always transmits the infection. The speed of this process sets an upper bound for the speed of any dynamic process that is mediated through the interaction events of the network. With the help of temporal networks derived from large scale time-stamped data on mobile phone calls, we extend earlier results that point out the slowing-down effects of burstiness and temporal inhomogeneities. In such networks, links are not permanently active, but dynamic processes are mediated by recurrent events taking place on the links at specific points in time. We perform a multi-scale analysis and pinpoint the importance of the timings of event sequences on individual links, their correlations with neighboring sequences, and the temporal pathways taken by the network-scale spreading process. This is achieved by studying empirically and analytically different characteristic relay times of links, relevant to the respective scales, and a set of temporal reference models that allow for removing selected time-domain correlations one by one.

Submitted 19 December, 2011; originally announced December 2011.

Journal ref: J. Stat. Mech. (2012) P03005

arXiv:1106.5249 [pdf, ps, other]

doi 10.1209/0295-5075/97/18007

The strength of strong ties in scientific collaboration networks

Authors: Raj Kumar Pan, Jari Saramäki

Abstract: Network topology and its relationship to tie strengths may hinder or enhance the spreading of information in social networks. We study the correlations between tie strengths and topology in networks of scientific collaboration, and show that these are very different from ordinary social networks. For the latter, it has earlier been shown that strong ties are associated with dense network neighborhoods, while weaker ties act as bridges between these. Because of this, weak links act as bottlenecks for the diffusion of information. We show that on the contrary, in co-authorship networks dense local neighborhoods mainly consist of weak links, whereas strong links are more important for overall connectivity. The important role of strong links is further highlighted in simulations of information spreading, where their topological position is seen to dramatically speed up spreading dynamics. Thus, in contrast to ordinary social networks, weight-topology correlations enhance the flow of information across scientific collaboration networks.

Submitted 11 January, 2012; v1 submitted 26 June, 2011; originally announced June 2011.

Comments: 6 Pages, 6 Figures, Published version, Minor changes, Results also verified using new weight-scheme

Journal ref: Europhys. Lett. 97, 18007 (2012)

arXiv:1106.0288 [pdf, other]

doi 10.1371/journal.pone.0022687

Emergence of Bursts and Communities in Evolving Weighted Networks

Authors: Hang-Hyun Jo, Raj Kumar Pan, Kimmo Kaski

Abstract: Understanding the patterns of human dynamics and social interaction, and the way they lead to the formation of an organized and functional society are important issues especially for techno-social development. Addressing these issues of social networks has recently become possible through large scale data analysis of e.g. mobile phone call records, which has revealed the existence of modular or community structure with many links between nodes of the same community and relatively few links between nodes of different communities. The weights of links, e.g. the number of calls between two users, and the network topology are found to be correlated such that intra-community links are stronger compared to the weak inter-community links. This is known as Granovetter's "The strength of weak ties" hypothesis. In addition to this inhomogeneous community structure, the temporal patterns of human dynamics turn out to be inhomogeneous or bursty, characterized by the heavy tailed distribution of inter-event time between two consecutive events. In this paper, we study how the community structure and the bursty dynamics emerge together in an evolving weighted network model. The principal mechanisms behind these patterns are social interaction by cyclic closure, i.e. links to friends of friends and the focal closure, i.e. links to individuals sharing similar attributes or interests, and human dynamics by task handling process. These three mechanisms have been implemented as a network model with local attachment, global attachment, and priority-based queuing processes.
By comprehensive numerical simulations we show that the interplay of these mechanisms leads to the emergence of heavy tailed inter-event time distribution and the evolution of Granovetter-type community structure. Moreover, the numerical results are found to be in qualitative agreement with empirical results from a mobile phone call dataset.

Submitted 1 June, 2011; originally announced June 2011.

Comments: 9 pages, 6 figures

Journal ref: PLoS ONE 6(8): e22687 (2011)

arXiv:1101.5913 [pdf, other]

doi 10.1103/PhysRevE.84.016105

Path lengths, correlations, and centrality in temporal networks

Authors: Raj Kumar Pan, Jari Saramäki

Abstract: In temporal networks, where nodes interact via sequences of temporary events, information or resources can only flow through paths that follow the time-ordering of events. Such temporal paths play a crucial role in dynamic processes. However, since networks have so far been usually considered static or quasi-static, the properties of temporal paths are not yet well understood. Building on a definition and algorithmic implementation of the average temporal distance between nodes, we study temporal paths in empirical networks of human communication and air transport. Although temporal distances correlate with static graph distances, there is a large spread, and nodes that appear close from the static network view may be connected via slow paths or not at all. Differences between static and temporal properties are further highlighted in studies of the temporal closeness centrality. In addition, correlations and heterogeneities in the underlying event sequences affect temporal path lengths, increasing temporal distances in communication networks and decreasing them in the air transport network.

Submitted 19 July, 2011; v1 submitted 31 January, 2011; originally announced January 2011.

Comments: 10 pages, 8 figures, Published version

Journal ref: Phys. Rev. E 84, 016105 (2011)

arXiv:1010.3171 [pdf, other]

doi 10.1103/PhysRevE.83.046112

Using explosive percolation in analysis of real-world networks

Authors: Raj Kumar Pan, Mikko Kivelä, Jari Saramäki, Kimmo Kaski, János Kertész

Abstract: We apply a variant of the explosive percolation procedure to large real-world networks, and show with finite-size scaling that the universality class, ordinary or explosive, of the resulting percolation transition depends on the structural properties of the network as well as the number of unoccupied links considered for comparison in our procedure. We observe that in our social networks, the percolation clusters close to the critical point are related to the community structure. This relationship is further highlighted by applying the procedure to model networks with pre-defined communities.

Submitted 18 April, 2011; v1 submitted 15 October, 2010; originally announced October 2010.

Comments: 6 pages, 4 figures. Published version. Elongated to include the results and figures of finite-size scaling and modularity analysis

Journal ref: Phys. Rev. E 83, 046112 (2011)

arXiv:1006.2125 [pdf, ps, other]

doi 10.1103/PhysRevE.83.025102

Small But Slow World: How Network Topology and Burstiness Slow Down Spreading

Authors: M. Karsai, M. Kivelä, R. K. Pan, K. Kaski, J. Kertész, A. -L. Barabási, J. Saramäki

Abstract: Communication networks show the small-world property of short paths, but the spreading dynamics in them turns out to be slow. We follow the time evolution of information propagation through communication networks by using the SI model with empirical data on contact sequences. We introduce null models where the sequences are randomly shuffled in different ways, enabling us to distinguish between the contributions of different impeding effects. The slowing down of spreading is found to be caused mostly by weight-topology correlations and the bursty activity patterns of individuals.

Submitted 22 August, 2010; v1 submitted 10 June, 2010; originally announced June 2010.

Journal ref: Phys. Rev. E 83, 025102(R) (2011)

arXiv:1005.4997 [pdf, ps, other]

doi 10.1016/j.csl.2010.05.007

Network analysis of a corpus of undeciphered Indus civilization inscriptions indicates syntactic organization

Authors: Sitabhra Sinha, Md Izhar Ashraf, Raj Kumar Pan, Bryan Kenneth Wells

Abstract: Archaeological excavations in the sites of the Indus Valley civilization (2500-1900 BCE) in Pakistan and northwestern India have unearthed a large number of artifacts with inscriptions made up of hundreds of distinct signs. To date there is no generally accepted decipherment of these sign sequences and there have been suggestions that the signs could be non-linguistic. Here we apply complex network analysis techniques to a database of available Indus inscriptions, with the aim of detecting patterns indicative of syntactic organization. Our results show the presence of patterns, e.g., recursive structures in the segmentation trees of the sequences, that suggest the existence of a grammar underlying these inscriptions.

Submitted 27 May, 2010; originally announced May 2010.

Comments: 17 pages (includes 4 page appendix containing Indus sign list), 14 figures

Journal ref: Computer Speech and Language, 25 (2011) 639-654

Showing 1–42 of 42 results for author: Pan, K