A Compact Trace Representation Using Deep
Neural Networks for Process Mining
1st Hong-Nhung Bui
Vietnam National University (VNU)
VNU-University of Engineering and Technology (UET)
Banking Academy of Vietnam
Hanoi, Vietnam
nhungbth@hvnh.edu.vn

2nd Trong-Sinh Vu
School of Information Science
Japan Advanced Institute of Science and Technology
Ishikawa, Japan
sinhvtr@jaist.ac.jp

3rd Tri-Thanh Nguyen
Vietnam National University (VNU)
VNU-University of Engineering and Technology (UET)
Hanoi, Vietnam
ntthanh@vnu.edu.vn

4th Thi-Cham Nguyen
Vietnam National University (VNU)
VNU-University of Engineering and Technology (UET)
Hanoi, Vietnam
Hai Phong University of Medicine and Pharmacy
Haiphong, Vietnam
nthicham@hpmu.edu.vn

5th Quang-Thuy Ha
Vietnam National University (VNU)
VNU-University of Engineering and Technology (UET)
Hanoi, Vietnam
thuyhq@vnu.edu.vn
Abstract— In process mining, trace representation has a significant effect on the process discovery problem. The challenge is to obtain a highly informative but low-dimensional vector space from event logs. This is required to improve the quality of trace clustering, which generates process models clear enough to inspect. Though traditional trace representation methods have specific advantages, their vector spaces often have a large number of dimensions. In this paper, we address this problem by proposing a new trace representation method based on deep neural networks. Experimental results show that our proposal not only outperforms the alternatives, but also significantly reduces the dimension of the trace representation.
Keywords—event logs, trace clustering, trace representation,
deep neural networks, compact trace representation
I. INTRODUCTION
Process mining, with three main tasks, i.e., process discovery, conformance checking, and process enhancement, plays a vital role in most of today's companies. Many enterprises thus need process models generated from the event logs of their software systems to assist in monitoring their employees' process behaviors as well as in process optimization. However, generating process models from the whole raw event log often results in complicated models that cannot be used for inspection. Trace clustering is an effective solution that separates an input event log into groups (clusters) of similar traces. The model generated from an event cluster has much lower complexity [1, 3, 4, 5, 6, 7, 8, 9] and is easy to inspect.
The trace representation method is one of the most important factors affecting the quality of trace clustering. Traditional trace representations, such as bag-of-activities, k-grams, maximal repeats, distance graphs, etc., create vector space models of high dimension [2, 10]. This increases the computational complexity, the execution time, and the storage space required by clustering algorithms. Motivated by studies in natural language processing [18], this article proposes a new method for trace representation, i.e., a compact representation, built via deep neural networks.
Deep learning methods are based on several layers of neural networks and are able to learn at multiple layers of the network; the knowledge at a higher layer is thus richer than that at a lower one. This motivates us to train a deep neural network and take the data at the last hidden layer, i.e., the one just before the output layer of the network, as the trace representation. This representation has low dimension (hence "compact") yet rich information; it therefore reduces the complexity of clustering algorithms and improves their results.
We also propose a method to transform unlabeled traces into labeled ones for training the deep neural network in order to obtain the compact trace representation. The results of experiments on three real event logs indicate the effectiveness of our method: it not only increases the quality of the generated process models, but also reduces the dimension of the trace representation vector space.
The rest of this paper is organized as follows. Section 2 introduces some traditional trace representation methods. Section 3 presents the application of deep neural networks to trace representation. Section 4 provides the experimental results. Finally, Section 5 gives some conclusions and future work.
II. TRACE REPRESENTATION METHODS IN PROCESS DISCOVERY
A. The process discovery task
Process discovery reconstructs an actual business process model by extracting information from an event log recorded by transactional systems. In this task, an event log is taken as the input, and a business process model is produced without using any prior information.
Fig. 1. A fragment of the airline compensation requests event log [19]
To discover the process models, the α-algorithm can be utilized [12], and the obtained models can be represented by Petri nets. The input of process discovery is an event log consisting of a list of events. An event is an actual action accompanied by some information, such as event id, activity, timestamp, resources (e.g., the involved person and/or device), and cost. Fig. 1 provides a snippet of an event log.
In this task, we only consider the activity of each event. A set of events having the same "case id", ordered by timestamp, forms a case, which can be represented as a "trace" of the actual process, as depicted in Fig. 2 [10].
Fig. 2. The traces in an event log, where a = "register request", b = "examine thoroughly", c = "examine casually", d = "check ticket", e = "decide", f = "reinitiate request", g = "pay compensation", h = "reject request"
B. Traditional trace representation methods
In the process discovery problem, the quality of the discovered model depends not only on the complexity of the event log but also on the trace representation method. Different methods exploit different relationships/characteristics between the activities. Similar to document representation, a trace can be represented in a vector space model: the relationships/characteristics of the activities are converted into numerical values that become the elements of a vector. There are two objectives of trace representation: one is to capture the relationships between the activities; the other is to keep the dimension of the vector space model small. Existing work suggests many different approaches for trace representation, such as bag-of-activities, k-grams, maximal repeats, distance graphs, etc. [2, 10], as briefly described below:
1) Bag-of-activities: This is the most common trace representation method for clustering. A trace is transformed into a vector over the distinct activities appearing in the event log; each trace is converted into a vector in the form of the binary vector space model [2]. If an activity appears in the trace, its corresponding element in the vector is 1, otherwise 0.
For example, let an event log be L = [⟨a,b,d,e,h⟩, ⟨a,b,c,d,e,f,b,d,e,h⟩, ⟨a,b,d,e,h⟩, ⟨a,c,d,e,f,d,c,e,h⟩], using the activity names of Fig. 2. The set of distinct activities is {a, b, c, d, e, f, h}, so the set of binary bag-of-activities vectors of the above event log is {(1,1,0,1,1,0,1), (1,1,1,1,1,1,1), (1,1,0,1,1,0,1), (1,0,1,1,1,1,1)}. The dimension of the vector space model is 7.
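As a minimal sketch of this encoding (assuming traces are given as lists of activity names; this is an illustration, not the authors' code):

```python
def bag_of_activities(log):
    """Binary bag-of-activities vectors over the distinct activities of a log."""
    activities = sorted({a for trace in log for a in trace})  # distinct activities
    vectors = [[1 if a in set(trace) else 0 for a in activities] for trace in log]
    return activities, vectors

activities, vectors = bag_of_activities([
    ["a", "b", "d", "e", "h"],
    ["a", "b", "c", "d", "e", "f", "b", "d", "e", "h"],
    ["a", "b", "d", "e", "h"],
    ["a", "c", "d", "e", "f", "d", "c", "e", "h"],
])
print(activities)  # ['a', 'b', 'c', 'd', 'e', 'f', 'h'] -> dimension 7
print(vectors[0])  # [1, 1, 0, 1, 1, 0, 1]
```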
2) k-grams: A k-gram is a sequence of k consecutive activities. For the trace ⟨a,b,c,d,e,f,b,d,e,h⟩, the set of 1-grams corresponds to {a, b, c, d, e, f, h}; the set of 2-grams is {ab, bc, cd, de, ef, fb, bd, eh}, etc. Each distinct k-gram is mapped into a feature in the vector space.
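A short sketch of k-gram extraction (an illustration under the same assumptions as above):

```python
def k_grams(trace, k):
    """Distinct k-grams (sequences of k consecutive activities) of a trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}

trace = ["a", "b", "c", "d", "e", "f", "b", "d", "e", "h"]
print(sorted(k_grams(trace, 1)))  # 7 distinct 1-grams
print(sorted(k_grams(trace, 2)))  # 8 distinct 2-grams: ab, bc, bd, cd, de, ef, eh, fb
```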
3) Maximal repeats: A maximal repeat is defined as follows [2]: a maximal pair in a sequence s is a subsequence α that occurs at two distinct positions i and j such that the element to the immediate left (right) of the occurrence of α at position i is different from the element to the immediate left (right) of the occurrence of α at position j, i.e., s(i, i+|α|-1) = s(j, j+|α|-1) = α, s(i-1) ≠ s(j-1), and s(i+|α|) ≠ s(j+|α|), for 1 ≤ i < j ≤ |s| (s(0) and s(|s|+1) are considered null, i.e., ⟨⟩). A maximal repeat is a subsequence that occurs in a maximal pair. Given the event log L, all traces in L are concatenated to form a single sequence, and the maximal repeats of that sequence become the features of the vector space. For the example event log above, the dimension is 12.
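A naive enumeration of this definition (a sketch for illustration only; practical implementations use suffix trees [2]):

```python
def maximal_repeats(s):
    """Enumerate maximal repeats of sequence s per the definition above:
    alpha occurs at positions i < j, and both the left and the right
    neighbouring elements differ (sequence boundaries count as null)."""
    n = len(s)
    repeats = set()
    for length in range(1, n):
        for i in range(n - length + 1):
            for j in range(i + 1, n - length + 1):
                if s[i:i + length] != s[j:j + length]:
                    continue
                left_differs = (s[i - 1] if i > 0 else None) != (s[j - 1] if j > 0 else None)
                right_differs = (s[i + length] if i + length < n else None) != (s[j + length] if j + length < n else None)
                if left_differs and right_differs:
                    repeats.add(tuple(s[i:i + length]))
    return repeats
```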
4) Distance graph: Given a corpus C, the order-k distance graph of a document D generated from C is defined as G(C, D, k) = (V(C), E(D, k)), where V(C) is the set of nodes, i.e., the set of distinct words in the entire corpus C, and E(D, k) is the set of edges in the graph. The set E(D, k) contains a directed edge from node i to node j if word i precedes word j by at most k positions. Each edge in the graph is mapped into a feature in a vector space [13].
To apply the distance graph theory to the trace representation problem, the set of activities in the event log L is considered as the set of distinct words in the corpus C, and a trace in the event log is considered as a document D; thus, the distance graphs for an event log can be constructed [10]. For the example trace above, the features of the order-0 distance graph are the self-edges of the 7 distinct activities; the order-1 graph adds the edges {ab, bc, cd, de, ef, fb, bd, eh}; and the order-2 graph further adds the edges at distance two. The corresponding dimensions are 7, 15, and 21.
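Under one reading of this definition (distance 0 producing the self-edges of the order-0 graph), the edge features can be enumerated as follows; this is an illustrative sketch, not the authors' code:

```python
def distance_graph_edges(trace, k):
    """Directed edges (u, v) where u precedes v by at most k positions."""
    edges = set()
    for i in range(len(trace)):
        for d in range(k + 1):  # d = 0 yields self-edges (order-0 graph)
            if i + d < len(trace):
                edges.add((trace[i], trace[i + d]))
    return edges

trace = ["a", "b", "c", "d", "e", "f", "b", "d", "e", "h"]
print(len(distance_graph_edges(trace, 0)))  # 7 self-edges
print(len(distance_graph_edges(trace, 1)))  # 15 = 7 + 8 consecutive-pair edges
```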
Even for a small event log with only 4 traces, the dimension of the vector space is already rather large, depending on the representation method. For real-life event logs, the dimension usually reaches thousands or even tens of thousands. This greatly affects the performance of clustering algorithms in terms of execution time and storage space. Therefore, reducing the number of dimensions of the vector space model is a significant problem. We present our solution to this problem in the next section.
III. DEEP NEURAL NETWORKS IN TRACE REPRESENTATION
A. Deep neural networks
Deep Neural Networks (DNNs) are a class of machine learning models. Based on Artificial Neural Networks (ANNs), DNNs allow computers to "learn" at different levels of abstraction. Built on artificial neural networks with multiple layers between the input and output layers, deep neural networks imitate the human brain's activity by using a great number of neurons connected to each other to process information [14, 15, 16]. The structure of a deep neural network includes three kinds of layers, as described in Fig. 3:
- Input layer: consists of neurons receiving the input values.
- Hidden layers: consist of neurons performing transformations; the output of one layer is the input of the next layer.
- Output layer: contains neurons returning the desired output data.
Fig. 3. Deep neural network model (image source: https://www.rsipvision.com/wp-content/uploads/2015/04/Slide5.png)
The neurons are connected to each other by the following formulas in the hidden layer and the output layer:

    h = f(W1 x + b1)    (1)
    y = f(W2 h + b2)    (2)

where x, h, and y are the input, hidden, and output values, correspondingly; f is an activation function (common activation functions are sigmoid, tanh, and ReLU); and W, b are the parameters of the network. The connection weights W are very important in a DNN, representing the importance of each input in the information conversion from one layer to another. Learning in a DNN can be described as the process of obtaining the expected results by adjusting the weights on the inputs. The bias value b permits the activation function to be shifted so that the prediction fits the data more effectively.
Supervised learning and unsupervised learning are the two basic techniques by which a DNN is trained. Supervised learning uses labeled data Z, and the learning process is repeated until the output value reaches the desired value. In this work, the supervised learning technique is applied with three steps: (1) the output value y is calculated; (2) the output y is compared with the desired value z; (3) if the desired value is not reached, the weights W and biases b are adjusted, and the output is recalculated by going back to step (1).
In the training process, the initial weights W and biases b of a deep neural network are initialized randomly, with dimensions depending on the dimensions of the input values and the desired values. Assume that the input X is an n×m matrix and the label Z is an n×p matrix. At the hidden layer, W1 is initialized as an m×k matrix and b1 as a 1×k matrix; at the output layer, W2 is initialized as a k×p matrix and b2 as a 1×p matrix, where k is a value defined by the user that is much smaller than m. After applying formula (1), we obtain a matrix of dimension n×k; after applying formula (2), we obtain a matrix of dimension n×p.
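As a shape check, the following numpy sketch (Python 3) applies formulas (1) and (2); the concrete sizes n, m, k, p are borrowed from the prAm6 setting reported in Section IV and are assumptions for illustration:

```python
import numpy as np

n, m, k, p = 1200, 317, 30, 11  # traces, input dim, hidden dim, contexts
rng = np.random.default_rng(0)

X = rng.integers(0, 2, size=(n, m)).astype(float)   # binary input traces
W1, b1 = rng.normal(size=(m, k)), np.zeros((1, k))  # hidden-layer parameters
W2, b2 = rng.normal(size=(k, p)), np.zeros((1, p))  # output-layer parameters

f = lambda t: 1.0 / (1.0 + np.exp(-t))  # sigmoid activation
H = f(X @ W1 + b1)  # formula (1): n x k
Y = f(H @ W2 + b2)  # formula (2): n x p
print(H.shape, Y.shape)  # (1200, 30) (1200, 11)
```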
B. Trace representation based on deep neural networks
One purpose of a deep neural network is to transform the input value into a compact intermediate representation (the hidden value h), i.e., a new and better representation from which the output value can be accurately predicted. In this paper, we apply this idea of the supervised learning technique in deep neural networks to improve the efficiency of trace representation for event logs: instead of the original trace representation, the compact trace representation will be used for clustering.
For instance, a credit process has several procedures, i.e., personal loan, corporate loan, home loan, and consumer loan, where each procedure shares common characteristics or activities. This common characteristic is defined as the trace context in [17]. In other words, each procedure may contain a common sequence of activities, which is defined as a trace context. Let L = {t1, t2, ...} be an event log, where ti is a trace. Let c be the longest common prefix of a trace subset T_c = {t ∈ L | t = c^d} such that |T_c| > 1, where d is an activity sequence that can be empty, and the notation '^' in c^d denotes the sequence concatenation operation; then c is called a trace context [17]. For example, if the traces of an event log L share three distinct longest common prefixes c1, c2, and c3, then the set of trace contexts of L is {c1, c2, c3}.
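A sketch of this definition (an assumption-laden illustration, not the authors' implementation; traces are tuples of activities):

```python
from collections import defaultdict

def trace_contexts(log):
    """Contexts = longest common prefixes shared by at least two traces."""
    counts = defaultdict(int)  # number of traces supporting each prefix
    for t in log:
        for i in range(1, len(t) + 1):
            counts[tuple(t[:i])] += 1
    contexts = set()
    for c, n in counts.items():
        if n < 2:
            continue
        # c is a *longest* common prefix iff no one-activity extension
        # of c is shared by the same n traces
        extensions = [d for d in counts if len(d) == len(c) + 1 and d[:len(c)] == c]
        if all(counts[d] < n for d in extensions):
            contexts.add(c)
    return contexts

log = [("a", "b", "c"), ("a", "b", "d"), ("e", "f", "g"), ("e", "h")]
print(trace_contexts(log))  # {('a', 'b'), ('e',)}
```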
In order to apply supervised learning in deep neural networks, the set of traces in the event log is taken as the input data, and the trace context set is taken as the labeled data Z. We design a deep neural network consisting of one input layer that receives the traces of the event log, two hidden layers, and one output layer that predicts the trace context, as depicted in Fig. 4.
Fig. 4. The idea of using deep neural networks for trace representation
At the input layer, we represent a trace by the binary bag-of-activities model. Each input neuron receives a trace as an m-dimensional binary vector x = [x1, ..., xm], where xi is either 0 or 1. For the labeled data Z, the trace context set is represented by one-hot vectors. Suppose an event log has p different trace contexts; each trace context ci is then represented by a one-hot vector zi = [zi1, ..., zip] with zij = 1 if i = j, otherwise zij = 0 (1 ≤ i, j ≤ p). For example, if an event log has 3 different trace contexts {c1, c2, c3}, then z1 = [1,0,0], z2 = [0,1,0], and z3 = [0,0,1].
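A short sketch of preparing such one-hot labels (the context assignment itself is assumed to be given):

```python
import numpy as np

contexts = ["c1", "c2", "c3"]  # p = 3 trace contexts
index = {c: i for i, c in enumerate(contexts)}

def one_hot(context):
    z = np.zeros(len(contexts))
    z[index[context]] = 1.0
    return z

print(one_hot("c2"))  # [0. 1. 0.]
```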
At the hidden layers, the hidden values are calculated according to the following formula:

    h1 = f(W1 x + b1),  h2 = f(W2 h1 + b2)    (3)

The final vector h2 obtained at the last hidden layer has dimension k and will be used as the compact trace representation. Any input trace vector of dimension m is thus transformed into a vector of dimension k, and we expect k to be much smaller than m. Moreover, since the value of h2 is adjusted during the training process, it carries richer information than the input vector. The value of each element of h2 is a real
number in (0, 1) instead of the two discrete values 0 and 1 of the input vectors; thus it contains finer information. This characteristic is another reason why we select this vector as the input for the clustering task: its richer information helps to improve the clustering performance.
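For illustration, a sketch of extracting the compact representation after training and feeding it to k-means (sklearn's k-means stands in here for the RapidMiner k-means used in our experiments; the trained weights are assumed to be given):

```python
import numpy as np
from sklearn.cluster import KMeans

def compact_representation(X, W1, b1, W2, b2):
    """Forward pass through the two hidden layers, formula (3)."""
    f = lambda t: 1.0 / (1.0 + np.exp(-t))
    h1 = f(X @ W1 + b1)
    return f(h1 @ W2 + b2)  # the compact n x k trace vectors

# H2 = compact_representation(X, W1, b1, W2, b2)
# labels = KMeans(n_clusters=11, n_init=10).fit_predict(H2)
```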
The output layer receives the hidden values h2, and the output values are calculated by the following formula:

    y = f(W3 h2 + b3)    (4)

where W1, b1, W2, b2, W3, and b3 are the model parameters, which are adjusted repeatedly during the training process. The sigmoid function is used as the activation function, as in (5):

    f(t) = 1 / (1 + e^(-t))    (5)

The training process finishes when the output value y reaches the trace context z within an allowable error.
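A minimal numpy training sketch of the network in Fig. 4 (two hidden layers, sigmoid activations). It uses plain gradient descent on a squared-error loss for brevity; our experiments trained with TensorFlow's softmax cross-entropy, so treat this only as an illustration:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train(X, Z, k1, k, lr=0.1, iters=10000):
    n, m = X.shape
    p = Z.shape[1]
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(0, 0.1, (m, k1)), np.zeros(k1)
    W2, b2 = rng.normal(0, 0.1, (k1, k)), np.zeros(k)
    W3, b3 = rng.normal(0, 0.1, (k, p)), np.zeros(p)
    for _ in range(iters):
        h1 = sigmoid(X @ W1 + b1)   # first hidden layer
        h2 = sigmoid(h1 @ W2 + b2)  # compact representation (n x k)
        y = sigmoid(h2 @ W3 + b3)   # predicted trace contexts
        d3 = (y - Z) * y * (1 - y)  # backpropagate the error
        d2 = (d3 @ W3.T) * h2 * (1 - h2)
        d1 = (d2 @ W2.T) * h1 * (1 - h1)
        W3 -= lr * h2.T @ d3 / n; b3 -= lr * d3.mean(0)
        W2 -= lr * h1.T @ d2 / n; b2 -= lr * d2.mean(0)
        W1 -= lr * X.T @ d1 / n;  b1 -= lr * d1.mean(0)
    return W1, b1, W2, b2, W3, b3
```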
IV. EXPERIMENTAL RESULTS
A. Experimental Method
For the experiments, we used the three-phase process discovery framework [10], i.e., "Trace Processing and Clustering", "Discovering Process Model", and "Evaluating Model", as depicted in Fig. 5.
For evaluation, in the Trace Processing step, we implemented some other trace representation methods as baselines, i.e., bag-of-activities, k-grams, maximal repeats, and the distance graph model. The experimental platform is Ubuntu 16.04, Python 2.7, and TensorFlow 1.2. In the Clustering step, the k-means clustering algorithm of the data mining tool RapidMiner Studio was used. In the Discovering Process Model phase, we use the α-algorithm (a plug-in of the process mining tool ProM 6.6) to get the process models from the event clusters.
The other hyperparameters of the model are set as follows: learning rate = 0.1 and iterations = 10,000; the softmax_cross_entropy_with_logits_v2 loss of TensorFlow is used.
The Evaluating Model phase determines the quality of the generated process models using two main measures, Fitness and Precision [17]. These two measures are in the range [0, 1], and bigger is better. We use the "conformance checker" plug-in of ProM 6.6 to calculate the fitness and precision measures for each sub-model. Since there is more than one sub-model, we calculate the weighted average of the fitness and precision over all sub-models for comparison, as follows:

    M = (1/N) Σ_{i=1..K} n_i M_i    (6)

where M is the average value of the fitness or precision measure; K is the number of models; N is the number of traces in the event log; n_i is the number of traces in the i-th cluster; and M_i is the value of the fitness or precision measure of the i-th model, correspondingly.
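Formula (6) amounts to the following short computation (the numbers in the example are hypothetical):

```python
def weighted_average(measures, cluster_sizes):
    """Formula (6): weight each sub-model's fitness/precision by the
    number of traces in its cluster."""
    total = sum(cluster_sizes)
    return sum(m_i * n_i for m_i, n_i in zip(measures, cluster_sizes)) / total

# e.g. three sub-models with fitness 0.99, 0.95, 0.97 on clusters of
# 500, 400, and 300 traces:
print(weighted_average([0.99, 0.95, 0.97], [500, 400, 300]))  # ~0.9717
```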
2 https://www.tensorflow.org/
3 https://rapidminer.com
4 http://www.promtools.org/
5 www.processmining.org/event_logs_and_models_used_in_book
Fig. 5. A three-phase framework of process discovery
B. Experimental Data
For the objectivity of the experiments, we used three event logs, Lfull, prAm6, and prHm6, from the process mining community, with the characteristics shown in Table I.
TABLE I. THE CHARACTERISTICS OF THE THREE EVENT LOGS

Event log | #cases | #events | #contexts | Characteristics
Lfull     |        |         |           | Duplicated traces; repeated activities in a trace
prAm6     | 1200   | 49792   | 11        | Few duplicated traces; no repeated activities
prHm6     | 1155   | 1720    | 7         | No duplicated traces; no repeated activities
C. Experimental Results
We set the parameter k to each value in [30, 40, 50, 60, 70, 80] to evaluate its importance. The best experimental results, i.e., the dimension (Dim) of the traces and the time (s: seconds, h: hours) to create a trace representation, as well as the Fitness and Precision of the resulting process models, are given in Table II. For Lfull, when k runs from 30 to 80, the best result is at k = 50, and the other results are almost the same. For the other datasets, i.e., prAm6 and prHm6, the results are almost the same when this parameter is changed.
6 https://data.4tu.nl/repository/uuid:44c32783-15d0-4dbd-af8a-78b97be3de49
7 https://data.4tu.nl/repository/uuid:44c32783-15d0-4dbd-af8a-78b97be3de49
TABLE II. THE RESULTS OF TRADITIONAL AND COMPACT TRACE REPRESENTATIONS

Event log | Scenario                  | Method of Trace Representation | Dim  | Time | Fitness | Precision
Lfull     | 1: Traditional            | Bag-of-activities              | 8    | 0.1s | 0.991   | 0.754
Lfull     | 1: Traditional            | Maximal Repeats                | 50   | 2s   | 0.950   | 1
Lfull     | 1: Traditional            | Distance Graphs                | 43   | 1.9s | 0.992   | 1
Lfull     | 2: Deep neural networks   | Compact trace                  | 50   | 17s  | 0.99995 | 0.794
prAm6     | 1: Traditional            | Bag-of-activities              | 317  | 0.3s | 0.968   | 0.809
prAm6     | 1: Traditional            | Maximal Repeats                | 9493 | 8h   | 0.968   | 0.332
prAm6     | 1: Traditional            | Distance Graphs                | 1927 | 93s  | 0.968   | 0.809
prAm6     | 2: Deep neural networks   | Compact trace                  | 30   | 43s  | 0.973   | 0.911
prHm6     | 1: Traditional            | Bag-of-activities              | 321  | 0.2s | 0.902   | 0.660
prHm6     | 1: Traditional            | Maximal Repeats                | 592  | 59s  | 0.897   | 0.730
prHm6     | 1: Traditional            | Distance Graphs                | 1841 | 54s  | 0.902   | 0.660
prHm6     | 2: Deep neural networks   | Compact trace                  | 30   | 37s  | 0.902   | 0.762
The experimental results show that, with the compact trace representation, the fitness is higher than the precision on all datasets. Training the DNN does not take too much time. Except for the Lfull dataset, which has a small number of activities, the compact trace representation always gives the best results on the datasets with a large number of activities.
The deep neural network has the ability to learn the relations among the activities in the input vector to generate the compact representation, which may contain richer information than the input. Thanks to this fact, the clustering can produce better results. In particular, the dimension of the compact trace representation is very small in comparison with the other representation methods. This indicates that our method is a good choice for complex event logs with a large number of activities. For the two complex datasets in the experiments, the dimension of the compact trace representation is reduced by about ten times, i.e., its dimension is 30 versus the input trace dimensions of 317 and 321. This is exactly the reduction of the feature space dimension that we aimed for.
V. CONCLUSIONS AND FUTURE WORK
This paper proposes a new trace representation method using deep neural networks. The output vectors at the last hidden layer of the deep neural network are used as the trace representation for the later clustering phase. The compactness of this representation helps to reduce the clustering complexity, while its richness of information helps to improve the clustering performance. The experimental results indicate that this method is quite suitable for complex event logs containing a large number of activities: the dimension of the representation is reduced by about ten times, while the precision and fitness are improved.
As a possible future direction, we will try advanced deep learning methods based on Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM), i.e., an improved deep learning architecture based on recurrent neural networks, to investigate whether they can improve the performance of trace clustering.
REFERENCES
[1] R. P. J. Chandra Bose, W. M. P. van der Aalst, "Trace Clustering Based on Conserved Patterns: Towards Achieving Better Process Models", Business Process Management Workshops, pp. 170 (2009).
[2] R. P. J. Chandra Bose, "Process Mining in the Large: Preprocessing, Discovery, and Diagnostics", PhD Thesis, Eindhoven University of Technology (2012).
[3] G. Greco, A. Guzzo, L. Pontieri, D. Saccà, "Discovering Expressive Process Models by Clustering Log Traces", IEEE Trans. Knowl. Data Eng., pp. 1010 (2006).
[4] A. K. A. de Medeiros, A. Guzzo, G. Greco, W. M. P. van der Aalst, A. J. M. M. Weijters, B. F. van Dongen, D. Saccà, "Process Mining Based on Clustering: A Quest for Precision", BPM Workshops, pp. 17 (2007).
[5] M. Song, C. W. Günther, W. M. P. van der Aalst, "Trace Clustering in Process Mining", Business Process Management Workshops, pp. 109 (2008).
[6] J. De Weerdt, S. K. L. M. vanden Broucke, J. Vanthienen, B. Baesens, "Leveraging process discovery with trace clustering and text mining for intelligent analysis of incident management processes", IEEE Congress on Evolutionary Computation, pp. 1 (2012).
[7] J. De Weerdt, S. K. L. M. vanden Broucke, J. Vanthienen, B. Baesens, "Active Trace Clustering for Improved Process Discovery", IEEE Trans. Knowl. Data Eng. 25(12), pp. 2708 (2013).
[8] I. Fischer, J. Poland, "New Methods for Spectral Clustering", IDSIA Technical Report (2004).
[9] J. Evermann, T. Thaler, P. Fettke, "Clustering Traces using Sequence Alignment", Business Process Management Workshops, pp. 179-190 (2015).
[10] Q.-T. Ha, H.-N. Bui, T.-T. Nguyen, "A trace clustering solution based on using the distance graph model", in Proceedings of ICCCI, pp. 313-322 (2016).
[11] T. Thaler, S. F. Ternis, P. Fettke, P. Loos, "A Comparative Analysis of Process Instance Cluster Techniques", Wirtschaftsinformatik, pp. 423 (2015).
[12] W. M. P. van der Aalst, "Process Mining - Data Science in Action", Springer, 2nd edition (2016).
[13] C. C. Aggarwal, P. Zhao, "Towards graphical models for text processing", Knowl. Inf. Syst. 36(1), pp. 1-21 (2013).
[14] L. Deng, D. Yu, "Deep Learning: Methods and Applications", NOW Publishers (2014).
[15] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, F. E. Alsaadi, "A survey of deep neural network architectures and their applications", Neurocomputing 234, pp. 11-26 (2017).
[16] M. Z. Alom, T. M. Taha, C. Yakopcic, S. Westberg, P. Sidike, M. S. Nasrin, M. Hasan, B. C. Van Essen, A. A. S. Awwal, V. K. Asari, "A State-of-the-Art Survey on Deep Learning Theory and Architectures", Electronics 8(3), 292 (2019).
[17] H.-N. Bui, T.-T. Nguyen, T.-C. Nguyen, Q.-T. Ha, "A New Trace Clustering Algorithm Based on Context in Process Mining", in Proceedings of IJCRS, pp. 644-657 (2018).
[18] T. Young, D. Hazarika, S. Poria, E. Cambria, "Recent Trends in Deep Learning Based Natural Language Processing", IEEE Comp. Int. Mag. 13(3), pp. 55-75 (2018).
[19] W. M. P. van der Aalst, "Process Mining: Discovery, Conformance and Enhancement of Business Processes", Springer (2011).