-
The Problems with Proxies: Making Data Work Visible through Requester Practices
Authors:
Annabel Rothschild,
Ding Wang,
Niveditha Jayakumar Vilvanathan,
Lauren Wilcox,
Carl DiSalvo,
Betsy DiSalvo
Abstract:
Fairness in AI and ML systems is increasingly linked to the proper treatment and recognition of data workers involved in training dataset development. Yet, those who collect and annotate the data, and thus have the most intimate knowledge of its development, are often excluded from critical discussions. This exclusion prevents data annotators, who are domain experts, from contributing effectively to dataset contextualization. Our investigation into the hiring and engagement practices of 52 data work requesters on platforms like Amazon Mechanical Turk reveals a gap: requesters frequently hold naive or unchallenged notions of worker identities and capabilities and rely on ad-hoc qualification tasks that fail to respect the workers' expertise. These practices undermine not only the quality of data but also the ethical standards of AI development. To rectify these issues, we advocate for policy changes to enhance how data annotation tasks are designed and managed and to ensure data workers are treated with the respect they deserve.
Submitted 21 August, 2024;
originally announced August 2024.
-
Polynomials: a new tool for length reduction in binary discrete convolutions
Authors:
Amihood Amir,
Oren Kapah,
Ely Porat,
Amir Rothschild
Abstract:
Efficient handling of sparse data is a key challenge in Computer Science. Binary convolutions, such as polynomial multiplication or the Walsh transform, are useful tools in many applications and can be computed efficiently.
In the last decade, several problems required an efficient solution of sparse binary convolutions. Both randomized and deterministic algorithms were developed for efficiently computing sparse polynomial multiplication. The key operation in all these algorithms was length reduction: the sparse data is mapped into small vectors that preserve the convolution result. The reduction method used to date was the modulo function, since it preserves location (of the "1" bits) up to cyclic shift.
To date, there is no known efficient algorithm for computing the sparse Walsh transform. Since the modulo function does not preserve the Walsh transform, a new method for length reduction is needed. In this paper we present such a new method: polynomials. This method enables the development of an efficient algorithm for computing the binary sparse Walsh transform. To our knowledge, this is the first such algorithm. We also show that this method allows a faster deterministic computation of sparse polynomial multiplication than currently known in the literature.
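The modulo-based length reduction that the abstract describes as the earlier method can be illustrated with a small sketch. This is not the polynomial method introduced in the paper; it only shows why mapping index i to i mod m preserves a convolution result up to cyclic wrap-around. The function names and the choice m = 17 are assumptions made for illustration.

```python
def sparse_convolve(a, b):
    """Full convolution of two sparse vectors stored as {index: value}."""
    out = {}
    for i, x in a.items():
        for j, y in b.items():
            out[i + j] = out.get(i + j, 0) + x * y
    return out


def reduce_mod(v, m):
    """Length reduction: fold a sparse vector into m slots via i -> i mod m."""
    out = [0] * m
    for i, x in v.items():
        out[i % m] += x
    return out


def cyclic_convolve(u, w):
    """Cyclic convolution of two dense vectors of equal length."""
    m = len(u)
    out = [0] * m
    for i, x in enumerate(u):
        if x:
            for j, y in enumerate(w):
                out[(i + j) % m] += x * y
    return out


# Two sparse 0/1 vectors with widely spread indices (illustrative data).
a = {3: 1, 1000: 1, 70000: 1}
b = {0: 1, 42: 1}
m = 17  # hypothetical reduced length

# Convolving the reduced vectors equals reducing the full convolution.
reduced_then_convolved = cyclic_convolve(reduce_mod(a, m), reduce_mod(b, m))
convolved_then_reduced = reduce_mod(sparse_convolve(a, b), m)
```

Distinct indices may collide in the same slot for a poorly chosen m, which is why the algorithms in this line of work put effort into choosing the reduction carefully.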
Submitted 21 October, 2014;
originally announced October 2014.
-
Improved Deterministic Length Reduction
Authors:
Amihood Amir,
Klim Efremenko,
Oren Kapah,
Ely Porat,
Amir Rothschild
Abstract:
This paper presents a new technique for deterministic length reduction. This technique improves the running time of the algorithm presented in \cite{LR07} for performing fast convolution in sparse data. While the regular fast convolution of vectors $V_1,V_2$ whose sizes are $N_1,N_2$ respectively, takes $O(N_1 \log N_2)$ using FFT, using the new technique for length reduction, the algorithm proposed in \cite{LR07} performs the convolution in $O(n_1 \log^3 n_1)$, where $n_1$ is the number of non-zero values in $V_1$. The algorithm assumes that $V_1$ is given in advance, and $V_2$ is given at running time. The novel technique presented in this paper improves the convolution time to $O(n_1 \log^2 n_1)$ {\sl deterministically}, which equals the best running time achieved by a {\sl randomized} algorithm.
The preprocessing time of the new technique remains the same as the preprocessing time of \cite{LR07}, which is $O(n_1^2)$. This assumes, and deals with, the case where $N_1$ is polynomial in $n_1$. In the case where $N_1$ is exponential in $n_1$, a reduction to a polynomial case can be used. In this paper we also improve the preprocessing time of this reduction from $O(n_1^4)$ to $O(n_1^3{\rm polylog}(n_1))$.
Submitted 31 January, 2008;
originally announced February 2008.
-
Explicit Non-Adaptive Combinatorial Group Testing Schemes
Authors:
Ely Porat,
Amir Rothschild
Abstract:
Group testing is a long-studied problem in combinatorics: a small set of $r$ ill people should be identified out of the whole population ($n$ people) by using only queries (tests) of the form "Does set X contain an ill human?". In this paper we provide an explicit construction of a testing scheme which is better (smaller) than any known explicit construction. This scheme has $\Theta(\min[r^2 \ln n,n])$ tests, which is as many as the best non-explicit schemes have. In our construction we use a fact that may be of value in its own right: linear error-correction codes with parameters $[m,k,δm]_q$ meeting the Gilbert-Varshamov bound may be constructed quite efficiently, in $\Theta(q^km)$ time.
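To make the query model concrete, here is a minimal sketch of non-adaptive group testing for the simplest case $r = 1$. It is not the code-based construction from the paper: it uses an assumed bit-and-complement test design (one test per bit position and bit value) and a naive decoder that clears everyone who appears in a negative test. All function names are hypothetical.

```python
def build_tests(n):
    """One test per (bit position, bit value): people whose index has that bit."""
    bits = max(1, (n - 1).bit_length())
    tests = []
    for k in range(bits):
        for b in (0, 1):
            tests.append({p for p in range(n) if (p >> k) & 1 == b})
    return tests


def run_tests(tests, ill):
    """Answer each query "does this set contain an ill person?"."""
    return [bool(t & ill) for t in tests]


def decode(tests, outcomes, n):
    """Naive decoder: anyone who appears in a negative test is healthy."""
    candidates = set(range(n))
    for t, positive in zip(tests, outcomes):
        if not positive:
            candidates -= t
    return candidates


n = 16
tests = build_tests(n)  # all tests are fixed in advance (non-adaptive)
ill = {11}              # illustrative ground truth with r = 1
found = decode(tests, run_tests(tests, ill), n)
```

With a single ill person, every other person differs from them in at least one bit and is therefore cleared by the matching negative test. Handling larger $r$ needs stronger test designs, which is where constructions like the one in the abstract come in.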
Submitted 29 April, 2008; v1 submitted 22 December, 2007;
originally announced December 2007.