-
AI and Social Theory
Authors:
Jakob Mokander,
Ralph Schroeder
Abstract:
In this paper, we sketch a programme for AI-driven social theory. We begin by defining what we mean by artificial intelligence (AI) in this context. We then lay out our model for how AI-based models can draw on the growing availability of digital data to help test the validity of different social theories based on their predictive power. In doing so, we use the work of Randall Collins and his state breakdown model to exemplify that, already today, AI-based models can help synthesize knowledge from a variety of sources, reason about the world, and apply what is known across a wide range of problems in a systematic way. However, we also find that AI-driven social theory remains subject to a range of practical, technical, and epistemological limitations. Most critically, existing AI systems lack three essential capabilities needed to advance social theory in ways that are cumulative, holistic, open-ended, and purposeful. These are (1) semanticization, i.e., the ability to develop and operationalize verbal concepts to represent machine-manipulable knowledge, (2) transferability, i.e., the ability to transfer what has been learned in one context to another, and (3) generativity, i.e., the ability to independently create and improve on concepts and models. We argue that if the gaps identified here are addressed by further research, there is no reason why, in the future, the most advanced programme in social theory should not be led by AI-driven cumulative advances.
Submitted 7 July, 2024;
originally announced July 2024.
-
Artificial intelligence, rationalization, and the limits of control in the public sector: the case of tax policy optimization
Authors:
Jakob Mokander,
Ralph Schroeder
Abstract:
The use of artificial intelligence (AI) in the public sector is best understood as a continuation and intensification of long-standing rationalization and bureaucratization processes. Drawing on Weber, we take the core of these processes to be the replacement of traditions with instrumental rationality, i.e., the most calculable and efficient way of achieving any given policy objective. In this article, we demonstrate how much of the criticism directed towards AI systems, both among the public and in scholarship, springs from well-known tensions at the heart of Weberian rationalization. To illustrate this point, we introduce a thought experiment whereby AI systems are used to optimize tax policy to advance a specific normative end, reducing economic inequality. Our analysis shows that building a machine-like tax system that promotes social and economic equality is possible. However, it also highlights that AI-driven policy optimization (i) comes at the exclusion of other competing political values, (ii) overrides citizens' sense of their noninstrumental obligations to each other, and (iii) undermines the notion of humans as self-determining beings. Contemporary scholarship and advocacy directed towards ensuring that AI systems are legal, ethical, and safe build on and reinforce central assumptions that underpin the process of rationalization, including the modern idea that science can sweep away oppressive systems and replace them with a rule of reason that would rescue humans from moral injustices. That is overly optimistic. Science can only provide the means; it cannot dictate the ends. Nonetheless, the use of AI in the public sector can also benefit the institutions and processes of liberal democracies. Most importantly, AI-driven policy optimization demands that normative ends are made explicit and formalized, thereby subjecting them to public scrutiny and debate.
Submitted 7 July, 2024;
originally announced July 2024.
-
From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets
Authors:
Manuel Tonneau,
Diyi Liu,
Samuel Fraiberger,
Ralph Schroeder,
Scott A. Hale,
Paul Röttger
Abstract:
Perceptions of hate can vary greatly across cultural contexts. Hate speech (HS) datasets, however, have traditionally been developed by language. This hides potential cultural biases, as one language may be spoken in different countries home to different cultures. In this work, we evaluate cultural bias in HS datasets by leveraging two interrelated cultural proxies: language and geography. We conduct a systematic survey of HS datasets in eight languages and confirm past findings on their English-language bias, but also show that this bias has been steadily decreasing in the past few years. For three geographically-widespread languages -- English, Arabic and Spanish -- we then leverage geographical metadata from tweets to approximate geo-cultural contexts by pairing language and country information. We find that HS datasets for these languages exhibit a strong geo-cultural bias, largely overrepresenting a handful of countries (e.g., US and UK for English) relative to their prominence in both the broader social media population and the general population speaking these languages. Based on these findings, we formulate recommendations for the creation of future HS datasets.
Submitted 27 April, 2024;
originally announced April 2024.
-
Local elimination in the traveling salesman problem
Authors:
William Cook,
Keld Helsgaun,
Stefan Hougardy,
Rasmus T. Schroeder
Abstract:
Hougardy and Schroeder (WG 2014) proposed a combinatorial technique for pruning the search space in the traveling salesman problem, establishing that, for a given instance, certain edges cannot be present in any optimal tour. We describe an implementation of their technique, employing an exact TSP solver to locate k-opt moves in the elimination process. In our computational study, we combine LP reduced-cost elimination together with the new combinatorial algorithm. We report results on a set of geometric instances, with the number of points n ranging from 3,038 up to 115,475. The test set includes all TSPLIB instances having at least 3,000 points, together with 250 randomly generated instances, each with 10,000 points, and three currently unsolved instances having 100,000 or more points. In all but two of the test instances, the complete-graph edge sets were reduced to under 3n edges. For the three large unsolved instances, repeated runs of the elimination process reduced the graphs to under 2.5n edges.
Submitted 13 July, 2023;
originally announced July 2023.
-
SYNTA: A novel approach for deep learning-based image analysis in muscle histopathology using photo-realistic synthetic data
Authors:
Leonid Mill,
Oliver Aust,
Jochen A. Ackermann,
Philipp Burger,
Monica Pascual,
Katrin Palumbo-Zerr,
Gerhard Krönke,
Stefan Uderhardt,
Georg Schett,
Christoph S. Clemen,
Rolf Schröder,
Christian Holtzhausen,
Samir Jabari,
Andreas Maier,
Anika Grüneboom
Abstract:
Artificial intelligence (AI), machine learning, and deep learning (DL) methods are becoming increasingly important in the field of biomedical image analysis. However, to exploit the full potential of such methods, a representative number of experimentally acquired images containing a significant number of manually annotated objects is needed as training data. Here we introduce SYNTA (synthetic data) as a novel approach for the generation of synthetic, photo-realistic, and highly complex biomedical images as training data for DL systems. We show the versatility of our approach in the context of muscle fiber and connective tissue analysis in histological sections. We demonstrate that it is possible to perform robust and expert-level segmentation tasks on previously unseen real-world data, without the need for manual annotations using synthetic training data alone. Being a fully parametric technique, our approach poses an interpretable and controllable alternative to Generative Adversarial Networks (GANs) and has the potential to significantly accelerate quantitative image analysis in a variety of biomedical applications in microscopy and beyond.
Submitted 3 January, 2024; v1 submitted 29 July, 2022;
originally announced July 2022.
-
Channel Estimation and Hybrid Architectures for RIS-Assisted Communications
Authors:
Jiguang He,
Nhan Thanh Nguyen,
Rafaela Schroeder,
Visa Tapio,
Joonas Kokkoniemi,
Markku Juntti
Abstract:
Reconfigurable intelligent surfaces (RISs) are considered as potential technologies for the upcoming sixth-generation (6G) wireless communication system. Various benefits brought by deploying one or multiple RISs include increased spectrum and energy efficiency, enhanced connectivity, extended communication coverage, reduced complexity at transceivers, and even improved localization accuracy. However, to unleash their full potential, fundamentals related to RISs, ranging from physical-layer (PHY) modelling to RIS phase control, need to be addressed thoroughly. In this paper, we provide an overview of some timely research problems related to the RIS technology, i.e., PHY modelling (including also physics), channel estimation, potential RIS architectures, and RIS phase control (via both model-based and data-driven approaches), along with recent numerical results. We envision that more efforts will be devoted towards intelligent wireless environments, enabled by RISs.
Submitted 14 April, 2021;
originally announced April 2021.
-
Mapping the UK Webspace: Fifteen Years of British Universities on the Web
Authors:
Scott A. Hale,
Taha Yasseri,
Josh Cowls,
Eric T. Meyer,
Ralph Schroeder,
Helen Margetts
Abstract:
This paper maps the national UK web presence on the basis of an analysis of the .uk domain from 1996 to 2010. It reviews previous attempts to use web archives to understand national web domains and describes the dataset. Next, it presents an analysis of the .uk domain, including the overall number of links in the archive and changes in the link density of different second-level domains over time. We then explore changes over time within a particular second-level domain, the academic subdomain .ac.uk, and compare linking practices with variables, including institutional affiliation, league table ranking, and geographic location. We do not detect institutional affiliation affecting linking practices and find only partial evidence of league table ranking affecting network centrality, but find a clear inverse relationship between the density of links and the geographical distance between universities. This echoes prior findings regarding offline academic activity, which allows us to argue that real-world factors like geography continue to shape academic relationships even in the Internet age. We conclude with directions for future uses of web archive resources in this emerging area of research.
Submitted 12 May, 2014;
originally announced May 2014.
-
Edge Elimination in TSP Instances
Authors:
Stefan Hougardy,
Rasmus T. Schroeder
Abstract:
The Traveling Salesman Problem is one of the best-studied NP-hard problems in combinatorial optimization. Powerful methods have been developed over the last 60 years to find optimum solutions to large TSP instances. The largest TSP instance so far that has been solved optimally has 85,900 vertices. Its solution required more than 136 years of total CPU time using the branch-and-cut-based Concorde TSP code [1]. In this paper we present graph theoretic results that allow us to prove that some edges of a TSP instance cannot occur in any optimum TSP tour. Based on these results we propose a combinatorial algorithm to identify such edges. The runtime of the main part of our algorithm is $O(n^2 \log n)$ for an n-vertex TSP instance. By combining our approach with the Concorde TSP solver we are able to solve a large TSPLIB instance more than 11 times faster than Concorde alone.
Submitted 28 February, 2014;
originally announced February 2014.
-
MANCaLog: A Logic for Multi-Attribute Network Cascades (Technical Report)
Authors:
Paulo Shakarian,
Gerardo I. Simari,
Robert Schroeder
Abstract:
The modeling of cascade processes in multi-agent systems in the form of complex networks has in recent years become an important topic of study due to its many applications: the adoption of commercial products, spread of disease, the diffusion of an idea, etc. In this paper, we begin by identifying seven desiderata that a framework for modeling such processes should satisfy: the ability to represent attributes of both nodes and edges, an explicit representation of time, the ability to represent non-Markovian temporal relationships, representation of uncertain information, the ability to represent competing cascades, allowance of non-monotonic diffusion, and computational tractability. We then present the MANCaLog language, a formalism based on logic programming that satisfies all these desiderata, and focus on algorithms for finding minimal models (from which the outcome of cascades can be obtained) as well as how this formalism can be applied in real-world scenarios. We are not aware of any other formalism in the literature that meets all of the above requirements.
Submitted 18 January, 2013; v1 submitted 2 January, 2013;
originally announced January 2013.
-
Affinity-based XML Fragmentation
Authors:
Rebeca Schroeder,
Ronaldo Santos Mello,
Carmem Satie Hara
Abstract:
In this paper we tackle the fragmentation problem for highly distributed databases. In such an environment, a suitable fragmentation strategy may provide scalability and availability by minimizing distributed transactions. We propose an approach for XML fragmentation that takes as input both the application's expected workload and a storage threshold, and produces as output an XML fragmentation schema. Our workload-aware method aims to minimize the execution of distributed transactions by packing up related data in a small set of fragments. We present experiments that compare alternative fragmentation schemas, showing that the one produced by our technique provides a finer-grained result and better system throughput.
Submitted 24 April, 2013; v1 submitted 23 October, 2012;
originally announced October 2012.
-
Untangling the Web of E-Research: Towards a Sociology of Online Knowledge
Authors:
Eric T. Meyer,
Ralph Schroeder
Abstract:
e-Research is a rapidly growing research area, both in terms of publications and in terms of funding. In this article we argue that it is necessary to reconceptualize the ways in which we seek to measure and understand e-Research by developing a sociology of knowledge based on our understanding of how science has been transformed historically and shifted into online forms. Next, we report data that allow the examination of e-Research through a variety of traces in order to begin to understand how knowledge in the realm of e-Research has been and is being constructed. These data indicate that e-Research has had a variable impact in different fields of research. We argue that only an overall account of the scale and scope of e-Research within and between different fields makes it possible to identify the organizational coherence and diffuseness of e-Research in terms of its socio-technical networks, and thus to identify the contributions of e-Research to various research fronts in the online production of knowledge.
Submitted 14 August, 2009;
originally announced August 2009.