RapidProM: Mine Your Processes and Not Just Your Data

Authors: Wil M. P. van der Aalst, Alfredo Bolt, Sebastiaan J. van Zelst

Abstract: The number of events recorded for operational processes is growing every year. This applies to all domains: from health care and e-government to production and maintenance. Event data are a valuable source of information for organizations that need to meet requirements related to compliance, efficiency, and customer service. Process mining helps to turn these data into real value: by discovering the real processes, by automatically identifying bottlenecks, by analyzing deviations and sources of non-compliance, by revealing the actual behavior of people, etc. Process mining is very different from conventional data mining and machine learning techniques. ProM is a powerful open-source process mining tool supporting hundreds of analysis techniques. However, ProM does not support analysis based on scientific workflows. RapidProM, an extension of RapidMiner based on ProM, combines the best of both worlds. Complex process mining workflows can be modeled and executed easily and subsequently reused for other data sets. Moreover, using RapidProM, one can benefit from combinations of process mining with other types of analysis available through the RapidMiner marketplace.

Submitted 10 March, 2017; originally announced March 2017.

Comments: Will be published in 2nd version of "RapidMiner: Data Mining Use Cases and Business Analytics Applications"; Markus Hofmann, Ralf Klinkenberg; published by Chapman & Hall/CRC Data Mining and Knowledge Discovery Series

arXiv:1610.02876 [pdf, other]

doi 10.1109/SSCI.2016.7849948

Heuristic Approaches for Generating Local Process Models through Log Projections

Authors: Niek Tax, Natalia Sidorova, Wil M. P. van der Aalst, Reinder Haakma

Abstract: Local Process Model (LPM) discovery is focused on the mining of a set of process models where each model describes the behavior represented in the event log only partially, i.e. subsets of possible events are taken into account to create so-called local process models. Often such smaller models provide valuable insights into the behavior of the process, especially when no adequate and comprehensible single overall process model exists that is able to describe the traces of the process from start to end. The practical application of LPM discovery is however hindered by computational issues in the case of logs with many activities (problems may already occur when there are more than 17 unique activities). In this paper, we explore three heuristics to discover subsets of activities that lead to useful log projections with the goal of speeding up LPM discovery considerably while still finding high-quality LPMs. We found that a Markov clustering approach to create projection sets results in the largest improvement of execution time, with discovered LPMs still being better than with the use of randomly generated activity sets of the same size. Another heuristic, based on log entropy, yields a more moderate speedup, but enables the discovery of higher quality LPMs. The third heuristic, based on the relative information gain, shows unstable performance: for some data sets the speedup and LPM quality are higher than with the log entropy based method, while for other data sets there is no speedup at all.

Submitted 10 October, 2016; originally announced October 2016.

Comments: paper accepted and to appear in the proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM), special session on Process Mining, part of the Symposium Series on Computational Intelligence (SSCI)

Journal ref: Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI), (2016) 1-8

arXiv:1606.07283 [pdf, other]

doi 10.1007/978-3-319-56994-9_18

Event Abstraction for Process Mining using Supervised Learning Techniques

Authors: Niek Tax, Natalia Sidorova, Reinder Haakma, Wil M. P. van der Aalst

Abstract: Process mining techniques focus on extracting insight in processes from event logs. In many cases, events recorded in the event log are too fine-grained, causing process discovery algorithms to discover incomprehensible process models or process models that are not representative of the event log. We show that when process discovery algorithms are only able to discover an unrepresentative process model from a low-level event log, structure in the process can in some cases still be discovered by first abstracting the event log to a higher level of granularity. This gives rise to the challenge to bridge the gap between an original low-level event log and a desired high-level perspective on this log, such that a more structured or more comprehensible process model can be discovered. We show that supervised learning can be leveraged for the event abstraction task when annotations with high-level interpretations of the low-level events are available for a subset of the sequences (i.e., traces). We present a method to generate feature vector representations of events based on XES extensions, and describe an approach to abstract events in an event log with Conditional Random Fields using these event features. Furthermore, we propose a sequence-focused metric to evaluate supervised event abstraction results that fits closely to the tasks of process discovery and conformance checking. We conclude this paper by demonstrating the usefulness of supervised event abstraction for obtaining more structured and/or more comprehensible process models using both real life event data and synthetic event data.

Submitted 23 June, 2016; originally announced June 2016.

Comments: paper to appear in the proceedings of the SAI Intelligent Systems Conference 2016

Journal ref: Lecture Notes in Networks and Systems, 15 (2017), 251-269

arXiv:1606.07259 [pdf, other]

doi 10.1016/j.procs.2016.08.096

Log-based Evaluation of Label Splits for Process Models

Authors: Niek Tax, Natalia Sidorova, Reinder Haakma, Wil M. P. van der Aalst

Abstract: Process mining techniques aim to extract insights in processes from event logs. One of the challenges in process mining is identifying interesting and meaningful event labels that contribute to a better understanding of the process. Our application area is mining data from smart homes for the elderly, where the ultimate goal is to signal deviations from usual behavior and provide timely recommendations in order to extend the period of independent living. Extracting individual process models showing user behavior is an important instrument in achieving this goal. However, the interpretation of sensor data at an appropriate abstraction level is not straightforward. For example, a motion sensor in a bedroom can be triggered by tossing and turning in bed or by getting up. We try to derive the actual activity depending on the context (time, previous events, etc.). In this paper we introduce the notion of label refinements, which links more abstract event descriptions with their more refined counterparts. We present a statistical evaluation method to determine the usefulness of a label refinement for a given event log from a process perspective. Based on data from smart homes, we show how our statistical evaluation method for label refinements can be used in practice. Our method was able to select two label refinements out of a set of candidate label refinements that both had a positive effect on model precision.

Submitted 23 June, 2016; originally announced June 2016.

Comments: Paper accepted at the 20th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, to appear in Procedia Computer Science

Journal ref: Procedia Computer Science, 96 (2016) 63-72

arXiv:1606.06066 [pdf, ps, other]

doi 10.1016/j.jides.2016.11.001

Mining Local Process Models

Authors: Niek Tax, Natalia Sidorova, Reinder Haakma, Wil M. P. van der Aalst

Abstract: In this paper we describe a method to discover frequent behavioral patterns in event logs. We express these patterns as local process models. Local process model mining can be positioned in-between process discovery and episode / sequential pattern mining. The technique presented in this paper is able to learn behavioral patterns involving sequential composition, concurrency, choice and loop, like in process mining. However, we do not look at start-to-end models, which distinguishes our approach from process discovery and creates a link to episode / sequential pattern mining. We propose an incremental procedure for building local process models capturing frequent patterns based on so-called process trees. We propose five quality dimensions and corresponding metrics for local process models, given an event log. We show monotonicity properties for some quality dimensions, enabling a speedup of local process model discovery through pruning. We demonstrate through a real life case study that mining local patterns allows us to get insights in processes where regular start-to-end process discovery techniques are only able to learn unstructured, flower-like models.

Submitted 16 May, 2017; v1 submitted 20 June, 2016; originally announced June 2016.

Comments: Published in Elsevier's Journal of Innovation in Digital Ecosystems, Special Issue on Data Mining

Journal ref: Journal of Innovation in Digital Ecosystems volume 3 issue 2 (2016), pages 183-196

arXiv:1212.6383 [pdf, other]

doi 10.1109/CEC.2014.6900341

Heuristics Miners for Streaming Event Data

Authors: Andrea Burattin, Alessandro Sperduti, Wil M. P. van der Aalst

Abstract: More and more business activities are performed using information systems. These systems produce such huge amounts of event data that existing systems are unable to store and process them. Moreover, few processes are in steady-state and due to changing circumstances processes evolve and systems need to adapt continuously. Since conventional process discovery algorithms have been defined for batch processing, it is difficult to apply them in such evolving environments. Existing algorithms cannot cope with streaming event data and tend to generate unreliable and obsolete results. In this paper, we discuss the peculiarities of dealing with streaming event data in the context of process mining. Subsequently, we present a general framework for defining process mining algorithms in settings where it is impossible to store all events over an extended period or where processes evolve while being analyzed. We show how the Heuristics Miner, one of the most effective process discovery algorithms for practical applications, can be modified using this framework. Different stream-aware versions of the Heuristics Miner are defined and implemented in ProM. Moreover, experimental results on artificial and real logs are reported.

Submitted 27 December, 2012; originally announced December 2012.

ACM Class: H.2.8; F.1.2

Showing 101–106 of 106 results for author: van der Aalst, W M