-
Reasoning or Simply Next Token Prediction? A Benchmark for Stress-Testing Large Language Models
Authors:
Wentian Wang,
Paul Kantor,
Jacob Feldman,
Lazaros Gallos,
Hao Wang
Abstract:
We propose MMLU-SR, a novel dataset designed to measure the true comprehension abilities of Large Language Models (LLMs) by challenging their performance in question-answering tasks with modified terms. We reasoned that an agent that "truly" understands a concept can still evaluate it when key terms are replaced by suitably defined alternate terms, and sought to differentiate such comprehension from mere text replacement. In our study, we modified standardized test questions by replacing a key term with a dummy word along with its definition. The key term could be in the context of questions, answers, or both questions and answers.
Notwithstanding the high scores achieved by recent popular LLMs on the MMLU leaderboard, we found a substantial reduction in model performance after such replacement, suggesting poor comprehension. MMLU-SR provides a rigorous benchmark for testing true model comprehension and poses a challenge to the broader scientific community.
Submitted 15 June, 2024;
originally announced June 2024.
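The term replacement described in the MMLU-SR abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the dummy word, and the wording of the definition sentence are all assumptions made for illustration. Only the idea of swapping a key term for a defined dummy word in the question, the answers, or both comes from the abstract.

```python
import re


def substitute_term(text, term, dummy):
    """Replace whole-word, case-insensitive occurrences of `term` with `dummy`."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)
    return pattern.sub(dummy, text)


def make_variant(question, choices, term, definition, dummy="Dummy", where="both"):
    """Build a modified multiple-choice item (hypothetical helper).

    where: "question", "answer", or "both", mirroring the three settings
    named in the abstract. The preamble wording is an assumption.
    """
    if where not in {"question", "answer", "both"}:
        raise ValueError("where must be 'question', 'answer', or 'both'")

    new_question = question
    new_choices = list(choices)
    if where in ("question", "both"):
        new_question = substitute_term(question, term, dummy)
    if where in ("answer", "both"):
        new_choices = [substitute_term(c, term, dummy) for c in choices]

    # State the dummy word's definition up front so the item stays answerable.
    preamble = f"Suppose '{dummy}' means '{definition}'. "
    return preamble + new_question, new_choices


if __name__ == "__main__":
    q, c = make_variant(
        "What is the derivative of x squared?",
        ["2x", "the derivative is x"],
        term="derivative",
        definition="the rate of change of a function",
        dummy="Zorp",
    )
    print(q)
    print(c)
```

A model that relies on surface patterns around the original term has less to match against in the rewritten item, while the definition keeps the question answerable in principle.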
-
Simple and efficient self-healing strategy for damaged complex networks
Authors:
Lazaros K. Gallos,
Nina H. Fefferman
Abstract:
The process of destroying a complex network through node removal has been the subject of extensive interest and research. Node loss typically leaves the network disintegrated into many small and isolated clusters. Here we show that these clusters typically remain close to each other and we suggest a simple algorithm that is able to reverse the inflicted damage by restoring the network's functionality. After damage, each node decides independently whether to create a new link depending on the fraction of neighbors it has lost. In addition to relying only on local information, where nodes do not need knowledge of the global network status, we impose the additional constraint that new links should be as short as possible (i.e. that the new edge completes a shortest possible new cycle). We demonstrate that this self-healing method operates very efficiently, both in model and real networks. For example, after removing the most connected airports in the USA, the self-healing algorithm re-joined almost 90% of the surviving airports.
Submitted 20 November, 2015;
originally announced November 2015.
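The local healing rule outlined in the self-healing abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' algorithm. Several choices here are assumptions that the abstract does not specify: the fixed `threshold` on the fraction of lost neighbors, the use of pre-damage distances to decide which surviving node is "closest", the deterministic tie-breaking by node id, and the fact that this simplified version runs a full breadth-first search rather than restricting itself to strictly local information.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node of `adj`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def self_heal(original, removed, threshold=0.5):
    """Return a healed adjacency dict (node -> set of neighbors).

    original:  undirected graph before damage, as {node: set(neighbors)}
    removed:   iterable of removed nodes
    threshold: hypothetical cutoff; a node adds one link only if it lost at
               least this fraction of its original neighbors
    """
    removed = set(removed)
    damaged = {
        u: {v for v in nbrs if v not in removed}
        for u, nbrs in original.items()
        if u not in removed
    }
    healed = {u: set(nbrs) for u, nbrs in damaged.items()}

    for u in sorted(damaged):
        k0 = len(original[u])
        lost = k0 - len(damaged[u])
        if lost == 0 or lost / k0 < threshold:
            continue  # this node decides not to act
        # Assumption: closeness is measured in the pre-damage network.
        dist = bfs_distances(original, u)
        candidates = [
            (d, v)
            for v, d in dist.items()
            if v != u and v not in removed and v not in healed[u]
        ]
        if candidates:
            _, target = min(candidates)  # shortest new link, ties by node id
            healed[u].add(target)
            healed[target].add(u)
    return healed


if __name__ == "__main__":
    # Star graph: hub 0 with leaves 1, 2, 3; removing the hub isolates them.
    star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
    print(self_heal(star, removed=[0]))
```

In the star example, removing the hub leaves three isolated leaves, and each one reconnects to a former second neighbor, so the survivors end up in a single cluster again.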
-
Revealing effective classifiers through network comparison
Authors:
Lazaros K. Gallos,
Nina H. Fefferman
Abstract:
The ability to compare complex systems can provide new insight into the fundamental nature of the processes captured in ways that are otherwise inaccessible to observation. Here, we introduce the $n$-tangle method to directly compare two networks for structural similarity, based on the distribution of edge density in network subgraphs. We demonstrate that this method can efficiently introduce comparative analysis into network science and open the road for many new applications. For example, we show how the construction of a phylogenetic tree across animal taxa according to their social structure can reveal commonalities in the behavioral ecology of the populations, or how students create similar networks according to university size. Our method can be expanded to study a multitude of additional properties, such as network classification, changes during time evolution, convergence of growth models, and detection of structural changes during damage.
Submitted 26 November, 2014; v1 submitted 11 March, 2014;
originally announced March 2014.
-
IMDB network revisited: unveiling fractal and modular properties from a typical small-world network
Authors:
Lazaros K. Gallos,
Fabricio Q. Potiguar,
José S. Andrade Jr.,
Hernan A. Makse
Abstract:
We study a subset of the movie collaboration network, imdb.com, where only adult movies are included. We show that there are many benefits in using such a network, which can serve as a prototype for studying social interactions. We find that the strength of links, i.e., how many times two actors have collaborated with each other, is an important factor that can significantly influence the network topology. We see that when we link all actors in the same movie with each other, the network becomes small-world, lacking a proper modular structure. On the other hand, by imposing a threshold on the minimum number of links two actors should have to be in our studied subset, the network topology becomes naturally fractal. This occurs due to a large number of meaningless links, namely, links connecting actors that did not actually interact. We focus our analysis on the fractal and modular properties of this resulting network, and show that the renormalization group analysis can characterize the self-similar structure of these networks.
Submitted 7 May, 2013; v1 submitted 6 May, 2013;
originally announced May 2013.
-
Collective behavior in the spatial spreading of obesity
Authors:
Lazaros K. Gallos,
Pablo Barttfeld,
Shlomo Havlin,
Mariano Sigman,
Hernan A. Makse
Abstract:
Non-communicable diseases like diabetes, obesity and certain forms of cancer have been increasing in many countries at alarming levels. A difficulty in the conception of policies to reverse these trends is the identification of the drivers behind the global epidemics. Here, we implement a spatial spreading analysis to investigate whether diabetes, obesity and cancer show spatial correlations revealing the effect of collective and global factors acting above individual choices. We adapt a theoretical framework for critical physical systems displaying collective behavior to decipher the laws of spatial spreading of diseases. We find a regularity in the spatial fluctuations of their prevalence revealed by a pattern of scale-free long-range correlations. The fluctuations are anomalous, deviating in a fundamental way from the weaker correlations found in the underlying population distribution. This collective behavior indicates that the spreading dynamics of obesity, diabetes and some forms of cancer like lung cancer are analogous to a critical point of fluctuations, just as a physical system in a second-order phase transition. According to this notion, individual interactions and habits may have negligible influence in shaping the global patterns of spreading. Thus, obesity turns out to be a global problem where local details are of little importance. Interestingly, we find the same critical fluctuations in obesity and diabetes, and in the activities of economic sectors associated with food production such as supermarkets and food and beverage stores, which cluster in a different universality class than other generic sectors of the economy. These results motivate future interventions to investigate the causality of this relation, providing guidance for the implementation of preventive health policies.
Submitted 27 March, 2016; v1 submitted 28 February, 2012;
originally announced February 2012.
-
How people interact in evolving online affiliation networks
Authors:
Lazaros K. Gallos,
Diego Rybski,
Fredrik Liljeros,
Shlomo Havlin,
Hernan A. Makse
Abstract:
The study of human interactions is of central importance for understanding the behavior of individuals, groups and societies. Here, we observe the formation and evolution of networks by monitoring the addition of all new links and we analyze quantitatively the tendencies used to create ties in these evolving online affiliation networks. We first show that an accurate estimation of these probabilistic tendencies can only be achieved by following the time evolution of the network. For example, actions that are attributed to the usual friend-of-a-friend mechanism through a static snapshot of the network are overestimated by a factor of two. A detailed analysis of the dynamic network evolution shows that half of those triangles were generated through other mechanisms, in spite of the characteristic static pattern. We start by characterizing every single link when the tie was established in the network. This allows us to describe the probabilistic tendencies of tie formation and extract sociological conclusions as follows. The tendencies to add new links differ significantly from what we would expect if they were not affected by the individuals' structural position in the network, i.e., from random link formation. We also find significant differences in behavioral traits among individuals according to their degree of activity, gender, age, popularity and other attributes. For instance, in the particular datasets analyzed here, we find that women reciprocate connections three times as much as men and this difference increases with age. Men tend to connect with the most popular people more often than women across all ages. On the other hand, triangular tie tendencies are similar and independent of gender. Our findings can be useful to build models of realistic social network structures and discover the underlying laws that govern establishment of ties in evolving social networks.
Submitted 23 November, 2011;
originally announced November 2011.
-
A small-world of weak ties provides optimal global integration of self-similar modules in functional brain networks
Authors:
Lazaros K. Gallos,
Hernan A. Makse,
Mariano Sigman
Abstract:
The human brain is organized in functional modules. Such an organization presents a basic conundrum: modules ought to be sufficiently independent to guarantee functional specialization and sufficiently connected to bind multiple processors for efficient information transfer. It is commonly accepted that small-world architecture of short lengths and large local clustering may solve this problem. However, there is intrinsic tension between shortcuts generating small-worlds and the persistence of modularity, a global property unrelated to local clustering. Here, we present a possible solution to this puzzle. We first show that a modified percolation theory can define a set of hierarchically organized modules made of strong links in functional brain networks. These modules are "large-world" self-similar structures and, therefore, are far from being small-world. However, incorporating weaker ties to the network converts it into a small-world preserving an underlying backbone of well-defined modules. Remarkably, weak ties are precisely organized as predicted by theory maximizing information transfer with minimal wiring cost. This trade-off architecture is reminiscent of the crucial "strength of weak ties" concept in social networks. Such a design suggests a natural solution to the paradox of efficient information flow in the highly modular structure of the brain.
Submitted 11 May, 2012; v1 submitted 3 February, 2011;
originally announced February 2011.