-
A Clustering Framework for Lexical Normalization of Roman Urdu
Authors:
Abdul Rafae Khan,
Asim Karim,
Hassan Sajjad,
Faisal Kamiran,
Jia Xu
Abstract:
Roman Urdu is an informal form of the Urdu language written in Roman script, which is widely used in South Asia for online textual content. It lacks standard spelling and hence poses several normalization challenges during automatic language processing. In this article, we present a feature-based clustering framework for the lexical normalization of Roman Urdu corpora, which includes a phonetic algorithm UrduPhone, a string matching component, a feature-based similarity function, and a clustering algorithm Lex-Var. UrduPhone encodes Roman Urdu strings to their pronunciation-based representations. The string matching component handles character-level variations that occur when writing Urdu using Roman script.
Submitted 31 March, 2020;
originally announced April 2020.
-
Machine Learning Based Student Grade Prediction: A Case Study
Authors:
Zafar Iqbal,
Junaid Qadir,
Adnan Noor Mian,
Faisal Kamiran
Abstract:
In higher educational institutes, many students struggle to complete their courses because no dedicated support is offered to those who need special attention in the courses they are registered for. Machine learning techniques can be used to predict students' grades in different courses. Such techniques would help students improve their performance based on predicted grades and would enable instructors to identify individuals who might need assistance. In this paper, we use Collaborative Filtering (CF), Matrix Factorization (MF), and Restricted Boltzmann Machines (RBM) techniques to systematically analyze real-world data collected from Information Technology University (ITU), Lahore, Pakistan. We evaluate the academic performance of ITU students admitted to the bachelor's degree program in ITU's Electrical Engineering department. The RBM technique is found to be better than the other techniques used in predicting the students' performance in the particular course.
Submitted 17 August, 2017;
originally announced August 2017.
-
Causal Inference for Social Discrimination Reasoning
Authors:
Bilal Qureshi,
Faisal Kamiran,
Asim Karim,
Salvatore Ruggieri,
Dino Pedreschi
Abstract:
The discovery of discriminatory bias in human or automated decision making is a task of increasing importance and difficulty, exacerbated by the pervasive use of machine learning and data mining. Currently, discrimination discovery largely relies upon correlation analysis of decision records, disregarding the impact of confounding biases. We present a method for causal discrimination discovery based on propensity score analysis, a statistical tool for filtering out the effect of confounding variables. We introduce causal measures of discrimination which quantify the effect of group membership on the decisions, and highlight causal discrimination/favoritism patterns by learning regression trees over the novel measures. We validate our approach on two real-world datasets. Our proposed framework for causal discrimination has the potential to enhance the transparency of machine learning with tools for detecting discriminatory bias both in the training data and in the learning algorithms.
Submitted 4 November, 2019; v1 submitted 12 August, 2016;
originally announced August 2016.