A Compact Trace Representation Using Deep
Neural Networks for Process Mining
1st Hong-Nhung Bui
Vietnam National University (VNU)
VNU-University of Engineering and Technology (UET)
Banking Academy of Vietnam
Hanoi, Vietnam
nhungbth@hvnh.edu.vn

2nd Trong-Sinh Vu
School of Information Science
Japan Advanced Institute of Science and Technology
Ishikawa, Japan
sinhvtr@jaist.ac.jp

3rd Tri-Thanh Nguyen
Vietnam National University (VNU)
VNU-University of Engineering and Technology (UET)
Hanoi, Vietnam
ntthanh@vnu.edu.vn

4th Thi-Cham Nguyen
Vietnam National University (VNU)
VNU-University of Engineering and Technology (UET)
Hanoi, Vietnam
Hai Phong University of Medicine and Pharmacy
Haiphong, Vietnam
nthicham@hpmu.edu.vn

5th Quang-Thuy Ha
Vietnam National University (VNU)
VNU-University of Engineering and Technology (UET)
Hanoi, Vietnam
thuyhq@vnu.edu.vn
Abstract— In process mining, trace representation has a significant effect on the process discovery problem. The challenge is to obtain a highly informative but low-dimensional vector space from event logs. This is required to improve the quality of trace clustering, which generates process models clear enough to inspect. Though traditional trace representation methods have specific advantages, their vector spaces often have a large number of dimensions. In this paper, we address this problem by proposing a new trace representation method based on deep neural networks. Experimental results show that our proposal not only outperforms the alternatives, but also significantly reduces the dimension of the trace representation.
Keywords—event logs, trace clustering, trace representation,
deep neural networks, compact trace representation
I. INTRODUCTION
Process mining, with three main tasks, i.e., process discovery, conformance checking, and process enhancement, plays a vital role in most of today's companies. Many enterprises thus need process models generated from the event logs of their software systems to assist in monitoring their employees' process behaviors as well as in process optimization. However, generating process models from the whole raw event log often results in complicated models that cannot be used for inspection. Trace clustering is an effective solution that separates an input event log into groups (clusters) of similar traces. The model generated from an event cluster has much lower complexity [1, 3, 4, 5, 6, 7, 8, 9] and is easy to inspect.
The trace representation method is one of the most important factors affecting the quality of trace clustering. Traditional trace representations, such as bag-of-activities, k-grams, maximal repeats, distance graphs, etc., create vector space models of high dimension [2, 10]. This increases the computational complexity, the execution time, and the storage space required by clustering algorithms. Motivated by studies in natural language processing [18], this article proposes a new method for trace representation, i.e., a compact representation, built via deep neural networks.
Deep learning methods are based on several layers of neural networks and are able to learn at multiple layers of the network; the knowledge at a higher layer is thus richer than that at a lower one. This motivates us to train a deep neural network and take the data at the last hidden layer, i.e., the one just before the output layer of the network, as the trace representation. This representation has low dimension (hence "compact") yet rich information; it therefore reduces the complexity of clustering algorithms and improves their results.
We also propose a method to transform unlabeled traces into labeled ones for training the deep neural network in order to obtain the compact trace representation. The results of experiments on three real event logs indicate the effectiveness of our method: it not only increases the quality of the generated process models, but also reduces the dimension of the trace representation vector space.
The rest of this paper is organized as follows. Section 2 introduces some traditional trace representation methods. Section 3 presents the application of deep neural networks to trace representation. Section 4 provides the experimental results. Finally, Section 5 gives some conclusions and future work.
II. TRACE REPRESENTATION METHODS IN PROCESS DISCOVERY
A. The process discovery task
Process discovery reconstructs an actual business process model by extracting information from an event log recorded by transactional systems. In this task, an event log is taken as the input, and a business process model is produced without using any prior information.
Fig. 1. A fragment of the airline compensation requests event log [19]
To discover the process models, the α-algorithm can be utilized [12], and the obtained models can be represented by Petri nets. The input of process discovery is an event log consisting of a list of events. An event is an actual action accompanied by some information, such as event id, activity, timestamp, resources (e.g., the involved person and/or device), and cost. Fig. 1 provides a snippet of an event log.
In this task, we only consider the activity of each event. A set of events having the same "case id", ordered by timestamp, forms a case, which can be represented as a "trace" of the actual process, as depicted in Fig. 2 [10].
Fig. 2. The traces in an event log, where a = "register request", b = "examine thoroughly", c = "examine casually", d = "check ticket", e = "decide", f = "reinitiate request", g = "pay compensation", h = "reject request"
B. Traditional trace representation methods
In the process discovery problem, the quality of the discovered model depends not only on the complexity of the event log but also on the trace representation method. Different methods exploit different relationships/characteristics between the activities. Similar to document representation, a trace can be represented in a vector space model: the relationships/characteristics of the activities are converted into numerical values that become the elements of a vector. There are two objectives of trace representation: one is to capture the relationships between the activities; the other is to keep the dimension of the vector space model small. Existing work suggests many different approaches for trace representation, such as bag-of-activities, k-grams, maximal repeats, distance graphs, etc. [2, 10], as briefly described below:
1) Bag-of-activities: This is the most common trace representation method for clustering. A trace is transformed into a vector over the distinct activities appearing in the event log; each trace is converted into a vector in the form of the binary vector space model [2]. If an activity appears in the trace, its corresponding element in the vector is 1, otherwise 0.
For example, let an event log be L = [⟨a,b,d,e,h⟩, ⟨a,b,c,d,e,f,b,d,e,h⟩, ⟨a,b,d,e,h⟩, ⟨a,c,d,e,f,d,c,e,h⟩], using the activity names of Fig. 2. The set of distinct activities is {a, b, c, d, e, f, h}, so the set of binary bag-of-activities vectors of the above event log is {(1,1,0,1,1,0,1), (1,1,1,1,1,1,1), (1,1,0,1,1,0,1), (1,0,1,1,1,1,1)}. The dimension of the vector space model is 7.
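As a minimal sketch of this encoding (assuming traces are given as lists of activity names; this is an illustration, not the authors' code):

```python
def bag_of_activities(log):
    """Binary bag-of-activities vectors over the distinct activities of a log."""
    activities = sorted({a for trace in log for a in trace})  # distinct activities
    vectors = [[1 if a in set(trace) else 0 for a in activities] for trace in log]
    return activities, vectors

activities, vectors = bag_of_activities([
    ["a", "b", "d", "e", "h"],
    ["a", "b", "c", "d", "e", "f", "b", "d", "e", "h"],
    ["a", "b", "d", "e", "h"],
    ["a", "c", "d", "e", "f", "d", "c", "e", "h"],
])
print(activities)  # ['a', 'b', 'c', 'd', 'e', 'f', 'h'] -> dimension 7
print(vectors[0])  # [1, 1, 0, 1, 1, 0, 1]
```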
2) k-grams: A k-gram is a sequence of k consecutive activities. For the trace ⟨a,b,c,d,e,f,b,d,e,h⟩, the set of 1-grams corresponds to {a, b, c, d, e, f, h}; the set of 2-grams is {ab, bc, cd, de, ef, fb, bd, eh}, etc. Each distinct k-gram is mapped into a feature in the vector space.
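A short sketch of k-gram extraction (an illustration under the same assumptions as above):

```python
def k_grams(trace, k):
    """Distinct k-grams (sequences of k consecutive activities) of a trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}

trace = ["a", "b", "c", "d", "e", "f", "b", "d", "e", "h"]
print(sorted(k_grams(trace, 1)))  # 7 distinct 1-grams
print(sorted(k_grams(trace, 2)))  # 8 distinct 2-grams: ab, bc, bd, cd, de, ef, eh, fb
```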
3) Maximal repeats: A maximal repeat is defined as follows [2]: a maximal pair in a sequence s is a subsequence α that occurs at two distinct positions i and j such that the element to the immediate left (right) of the occurrence of α at position i is different from the element to the immediate left (right) of the occurrence of α at position j, i.e., s(i, i+|α|-1) = s(j, j+|α|-1) = α, s(i-1) ≠ s(j-1), and s(i+|α|) ≠ s(j+|α|), for 1 ≤ i < j ≤ |s| (s(0) and s(|s|+1) are considered null, i.e., ⟨⟩). A maximal repeat is a subsequence that occurs in a maximal pair. Given the event log L, all traces in L are concatenated to form a single sequence, and the maximal repeats of that sequence become the features of the vector space. For the example event log above, the dimension is 12.
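A naive enumeration of this definition (a sketch for illustration only; practical implementations use suffix trees [2]):

```python
def maximal_repeats(s):
    """Enumerate maximal repeats of sequence s per the definition above:
    alpha occurs at positions i < j, and both the left and the right
    neighbouring elements differ (sequence boundaries count as null)."""
    n = len(s)
    repeats = set()
    for length in range(1, n):
        for i in range(n - length + 1):
            for j in range(i + 1, n - length + 1):
                if s[i:i + length] != s[j:j + length]:
                    continue
                left_differs = (s[i - 1] if i > 0 else None) != (s[j - 1] if j > 0 else None)
                right_differs = (s[i + length] if i + length < n else None) != (s[j + length] if j + length < n else None)
                if left_differs and right_differs:
                    repeats.add(tuple(s[i:i + length]))
    return repeats
```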
4) Distance graph: Given a corpus C, the order-k distance graph of a document D generated from C is defined as G(C, D, k) = (V(C), E(D, k)), where V(C) is the set of nodes, i.e., the set of distinct words in the entire corpus C, and E(D, k) is the set of edges in the graph. The set E(D, k) contains a directed edge from node i to node j if word i precedes word j by at most k positions. Each edge in the graph is mapped into a feature in a vector space [13].
To apply the distance graph theory to the trace representation problem, the set of activities in the event log L is considered as the set of distinct words in the corpus C, and a trace in the event log is considered as a document D; thus, the distance graphs for an event log can be constructed [10]. For the example trace above, the features of the order-0 distance graph are the self-edges of the 7 distinct activities; the order-1 graph adds the edges {ab, bc, cd, de, ef, fb, bd, eh}; and the order-2 graph further adds the edges at distance two. The corresponding dimensions are 7, 15, and 21.
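Under one reading of this definition (distance 0 producing the self-edges of the order-0 graph), the edge features can be enumerated as follows; this is an illustrative sketch, not the authors' code:

```python
def distance_graph_edges(trace, k):
    """Directed edges (u, v) where u precedes v by at most k positions."""
    edges = set()
    for i in range(len(trace)):
        for d in range(k + 1):  # d = 0 yields self-edges (order-0 graph)
            if i + d < len(trace):
                edges.add((trace[i], trace[i + d]))
    return edges

trace = ["a", "b", "c", "d", "e", "f", "b", "d", "e", "h"]
print(len(distance_graph_edges(trace, 0)))  # 7 self-edges
print(len(distance_graph_edges(trace, 1)))  # 15 = 7 + 8 consecutive-pair edges
```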
Even for a small event log with only 4 traces, the dimension of the vector space is already rather large, depending on the representation method. For real-life event logs, the dimension usually reaches thousands or even tens of thousands. This greatly affects the performance of clustering algorithms in terms of execution time and storage space. Therefore, reducing the number of dimensions of the vector space model is a significant problem. We present our solution to this problem in the next section.
III. DEEP NEURAL NETWORKS IN TRACE REPRESENTATION
A. Deep neural networks
Deep Neural Networks (DNNs) are a class of machine learning models. Based on Artificial Neural Networks (ANNs), DNNs allow computers to "learn" at different levels of abstraction. Built on artificial neural networks with multiple layers between the input and output layers, deep neural networks imitate the human brain's activity by using a great number of neurons connected to each other to process information [14, 15, 16]. The structure of a deep neural network includes three kinds of layers, as described in Fig. 3:
- Input layer: consists of neurons receiving the input values.
- Hidden layers: consist of neurons performing transformations; the output of one layer is the input of the next layer.
- Output layer: contains neurons returning the desired output data.
Fig. 3. Deep neural network model (image source: https://www.rsipvision.com/wp-content/uploads/2015/04/Slide5.png)
The neurons are connected to each other by the following formulas in the hidden layer and the output layer:

    h = f(W1 x + b1)    (1)
    y = f(W2 h + b2)    (2)

where x, h, and y are the input, hidden, and output values, correspondingly; f is an activation function (common activation functions are sigmoid, tanh, and ReLU); and W, b are the parameters of the network. The connection weights W are very important in a DNN, representing the importance of each input in the information conversion from one layer to another. Learning in a DNN can be described as the process of obtaining the expected results by adjusting the weights on the inputs. The bias value b permits the activation function to be shifted so that the prediction fits the data more effectively.
Supervised learning and unsupervised learning are the two basic techniques by which a DNN is trained. Supervised learning uses labeled data Z, and the learning process is repeated until the output value reaches the desired value. In this work, the supervised learning technique is applied with three steps: (1) the output value y is calculated; (2) the output y is compared with the desired value z; (3) if the desired value is not reached, the weights W and biases b are adjusted, and the output is recalculated by going back to step (1).
In the training process, the initial weights W and biases b of a deep neural network are initialized randomly, with dimensions depending on the dimensions of the input values and the desired values. Assume that the input X is an n×m matrix and the label Z is an n×p matrix. At the hidden layer, W1 is initialized as an m×k matrix and b1 as a 1×k matrix; at the output layer, W2 is initialized as a k×p matrix and b2 as a 1×p matrix, where k is a value defined by the user that is much smaller than m. After applying formula (1), we obtain a matrix of dimension n×k; after applying formula (2), we obtain a matrix of dimension n×p.
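As a shape check, the following numpy sketch (Python 3) applies formulas (1) and (2); the concrete sizes n, m, k, p are borrowed from the prAm6 setting reported in Section IV and are assumptions for illustration:

```python
import numpy as np

n, m, k, p = 1200, 317, 30, 11  # traces, input dim, hidden dim, contexts
rng = np.random.default_rng(0)

X = rng.integers(0, 2, size=(n, m)).astype(float)   # binary input traces
W1, b1 = rng.normal(size=(m, k)), np.zeros((1, k))  # hidden-layer parameters
W2, b2 = rng.normal(size=(k, p)), np.zeros((1, p))  # output-layer parameters

f = lambda t: 1.0 / (1.0 + np.exp(-t))  # sigmoid activation
H = f(X @ W1 + b1)  # formula (1): n x k
Y = f(H @ W2 + b2)  # formula (2): n x p
print(H.shape, Y.shape)  # (1200, 30) (1200, 11)
```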
B. Trace representation based on deep neural networks
One purpose of a deep neural network is to transform the input value into a compact intermediate representation (the hidden value h), i.e., a new and better representation from which the output value can be accurately predicted. In this paper, we apply this idea of the supervised learning technique in deep neural networks to improve the efficiency of trace representation for event logs: instead of the original trace representation, the compact trace representation will be used for clustering.
For instance, a credit process has several procedures, i.e., personal loan, corporate loan, home loan, and consumer loan, where each procedure shares common characteristics or activities. This common characteristic is defined as the trace context in [17]. In other words, each procedure may contain a common sequence of activities, which is defined as a trace context. Let L = {t1, t2, ...} be an event log, where ti is a trace. Let c be the longest common prefix of a trace subset T_c = {t ∈ L | t = c^d} such that |T_c| > 1, where d is an activity sequence that can be empty, and the notation '^' in c^d denotes the sequence concatenation operation; then c is called a trace context [17]. For example, if the traces of an event log L share three distinct longest common prefixes c1, c2, and c3, then the set of trace contexts of L is {c1, c2, c3}.
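A sketch of this definition (an assumption-laden illustration, not the authors' implementation; traces are tuples of activities):

```python
from collections import defaultdict

def trace_contexts(log):
    """Contexts = longest common prefixes shared by at least two traces."""
    counts = defaultdict(int)  # number of traces supporting each prefix
    for t in log:
        for i in range(1, len(t) + 1):
            counts[tuple(t[:i])] += 1
    contexts = set()
    for c, n in counts.items():
        if n < 2:
            continue
        # c is a *longest* common prefix iff no one-activity extension
        # of c is shared by the same n traces
        extensions = [d for d in counts if len(d) == len(c) + 1 and d[:len(c)] == c]
        if all(counts[d] < n for d in extensions):
            contexts.add(c)
    return contexts

log = [("a", "b", "c"), ("a", "b", "d"), ("e", "f", "g"), ("e", "h")]
print(trace_contexts(log))  # {('a', 'b'), ('e',)}
```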
In order to apply supervised learning in deep neural networks, the set of traces in the event log is taken as the input data, and the trace context set is taken as the labeled data Z. We design a deep neural network consisting of one input layer that receives the traces of the event log, two hidden layers, and one output layer that predicts the trace context, as depicted in Fig. 4.
Fig. 4. The idea of using deep neural networks for trace representation
At the input layer, we represent a trace by the binary bag-of-activities model. Each input neuron receives a trace as an m-dimensional binary vector x = [x1, ..., xm], where xi is either 0 or 1. For the labeled data Z, the trace context set is represented by one-hot vectors. Suppose an event log has p different trace contexts; each trace context ci is then represented by a one-hot vector zi = [zi1, ..., zip] with zij = 1 if i = j, otherwise zij = 0 (1 ≤ i, j ≤ p). For example, if an event log has 3 different trace contexts {c1, c2, c3}, then z1 = [1,0,0], z2 = [0,1,0], and z3 = [0,0,1].
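A short sketch of preparing such one-hot labels (the context assignment itself is assumed to be given):

```python
import numpy as np

contexts = ["c1", "c2", "c3"]  # p = 3 trace contexts
index = {c: i for i, c in enumerate(contexts)}

def one_hot(context):
    z = np.zeros(len(contexts))
    z[index[context]] = 1.0
    return z

print(one_hot("c2"))  # [0. 1. 0.]
```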
At the hidden layers, the hidden values are calculated according to the following formula:

    h1 = f(W1 x + b1),  h2 = f(W2 h1 + b2)    (3)

The final vector h2 obtained at the last hidden layer has dimension k and will be used as the compact trace representation. Any input trace vector of dimension m is thus transformed into a vector of dimension k, and we expect k to be much smaller than m. Moreover, since the value of h2 is adjusted during the training process, it carries richer information than the input vector. The value of each element of h2 is a real
number in (0, 1) instead of the two discrete values 0 and 1 of the input vectors; thus it contains finer information. This characteristic is another reason why we select this vector as the input for the clustering task: its richer information helps to improve the clustering performance.
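For illustration, a sketch of extracting the compact representation after training and feeding it to k-means (sklearn's k-means stands in here for the RapidMiner k-means used in our experiments; the trained weights are assumed to be given):

```python
import numpy as np
from sklearn.cluster import KMeans

def compact_representation(X, W1, b1, W2, b2):
    """Forward pass through the two hidden layers, formula (3)."""
    f = lambda t: 1.0 / (1.0 + np.exp(-t))
    h1 = f(X @ W1 + b1)
    return f(h1 @ W2 + b2)  # the compact n x k trace vectors

# H2 = compact_representation(X, W1, b1, W2, b2)
# labels = KMeans(n_clusters=11, n_init=10).fit_predict(H2)
```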
The output layer receives the hidden values h2, and the output values are calculated by the following formula:

    y = f(W3 h2 + b3)    (4)

where W1, b1, W2, b2, W3, and b3 are the model parameters, which are adjusted repeatedly during the training process. The sigmoid function is used as the activation function, as in (5):

    f(t) = 1 / (1 + e^(-t))    (5)

The training process finishes when the output value y reaches the trace context z within an allowable error.
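A minimal numpy training sketch of the network in Fig. 4 (two hidden layers, sigmoid activations). It uses plain gradient descent on a squared-error loss for brevity; our experiments trained with TensorFlow's softmax cross-entropy, so treat this only as an illustration:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train(X, Z, k1, k, lr=0.1, iters=10000):
    n, m = X.shape
    p = Z.shape[1]
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(0, 0.1, (m, k1)), np.zeros(k1)
    W2, b2 = rng.normal(0, 0.1, (k1, k)), np.zeros(k)
    W3, b3 = rng.normal(0, 0.1, (k, p)), np.zeros(p)
    for _ in range(iters):
        h1 = sigmoid(X @ W1 + b1)   # first hidden layer
        h2 = sigmoid(h1 @ W2 + b2)  # compact representation (n x k)
        y = sigmoid(h2 @ W3 + b3)   # predicted trace contexts
        d3 = (y - Z) * y * (1 - y)  # backpropagate the error
        d2 = (d3 @ W3.T) * h2 * (1 - h2)
        d1 = (d2 @ W2.T) * h1 * (1 - h1)
        W3 -= lr * h2.T @ d3 / n; b3 -= lr * d3.mean(0)
        W2 -= lr * h1.T @ d2 / n; b2 -= lr * d2.mean(0)
        W1 -= lr * X.T @ d1 / n;  b1 -= lr * d1.mean(0)
    return W1, b1, W2, b2, W3, b3
```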
IV. EXPERIMENTAL RESULTS
A. Experimental Method
For the experiments, we used the three-phase process discovery framework [10], i.e., "Trace Processing and Clustering", "Discovering Process Model", and "Evaluating Model", as depicted in Fig. 5.
For evaluation, in the Trace Processing step, we implemented some other trace representation methods as baselines, i.e., bag-of-activities, k-grams, maximal repeats, and the distance graph model. The experimental platform is Ubuntu 16.04, Python 2.7, and TensorFlow 1.2. In the Clustering step, the k-means clustering algorithm of the data mining tool RapidMiner Studio was used. In the Discovering Process Model phase, we use the α-algorithm (a plug-in of the process mining tool ProM 6.6) to get the process models from the event clusters.
The other hyperparameters of the model are set as follows: learning rate = 0.1 and iterations = 10,000; the softmax_cross_entropy_with_logits_v2 loss of TensorFlow is used.
The Evaluating Model phase determines the quality of the generated process models using two main measures, Fitness and Precision [17]. These two measures are in the range [0, 1], and bigger is better. We use the "conformance checker" plug-in of ProM 6.6 to calculate the fitness and precision measures for each sub-model. Since there is more than one sub-model, we calculate the weighted average of the fitness and precision over all sub-models for comparison, as follows:

    M = (1/N) Σ_{i=1..K} n_i M_i    (6)

where M is the average value of the fitness or precision measure; K is the number of models; N is the number of traces in the event log; n_i is the number of traces in the i-th cluster; and M_i is the value of the fitness or precision measure of the i-th model, correspondingly.
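Formula (6) amounts to the following short computation (the numbers in the example are hypothetical):

```python
def weighted_average(measures, cluster_sizes):
    """Formula (6): weight each sub-model's fitness/precision by the
    number of traces in its cluster."""
    total = sum(cluster_sizes)
    return sum(m_i * n_i for m_i, n_i in zip(measures, cluster_sizes)) / total

# e.g. three sub-models with fitness 0.99, 0.95, 0.97 on clusters of
# 500, 400, and 300 traces:
print(weighted_average([0.99, 0.95, 0.97], [500, 400, 300]))  # ~0.9717
```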
2 https://www.tensorflow.org/
3 https://rapidminer.com
4 http://www.promtools.org/
5 www.processmining.org/event_logs_and_models_used_in_book
Fig. 5. A three-phase framework of process discovery
B. Experimental Data
For the objectivity of the experiments, we used three event logs, Lfull, prAm6, and prHm6, from the process mining community, with the characteristics shown in Table I.
TABLE I. THE CHARACTERISTICS OF THE THREE EVENT LOGS

Event log | #cases | #events | #contexts | Characteristics
Lfull     |        |         |           | Duplicated traces; repeated activities in a trace
prAm6     | 1200   | 49792   | 11        | Few duplicated traces; no repeated activities
prHm6     | 1155   | 1720    | 7         | No duplicated traces; no repeated activities
C. Experimental Results
We set the parameter k to each value in [30, 40, 50, 60, 70, 80] to evaluate its importance. The best experimental results, i.e., the dimension (Dim) of the traces and the time (s: seconds, h: hours) to create a trace representation, as well as the Fitness and Precision of the resulting process models, are given in Table II. For Lfull, when k runs from 30 to 80, the best result is at k = 50, and the other results are almost the same. For the other datasets, i.e., prAm6 and prHm6, the results are almost the same when this parameter is changed.
6 https://data.4tu.nl/repository/uuid:44c32783-15d0-4dbd-af8a-78b97be3de49
7 https://data.4tu.nl/repository/uuid:44c32783-15d0-4dbd-af8a-78b97be3de49
TABLE II. THE RESULTS OF TRADITIONAL AND COMPACT TRACE REPRESENTATIONS

Event log | Scenario                  | Method of Trace Representation | Dim  | Time | Fitness | Precision
Lfull     | 1: Traditional            | Bag-of-activities              | 8    | 0.1s | 0.991   | 0.754
Lfull     | 1: Traditional            | Maximal Repeats                | 50   | 2s   | 0.950   | 1
Lfull     | 1: Traditional            | Distance Graphs                | 43   | 1.9s | 0.992   | 1
Lfull     | 2: Deep neural networks   | Compact trace                  | 50   | 17s  | 0.99995 | 0.794
prAm6     | 1: Traditional            | Bag-of-activities              | 317  | 0.3s | 0.968   | 0.809
prAm6     | 1: Traditional            | Maximal Repeats                | 9493 | 8h   | 0.968   | 0.332
prAm6     | 1: Traditional            | Distance Graphs                | 1927 | 93s  | 0.968   | 0.809
prAm6     | 2: Deep neural networks   | Compact trace                  | 30   | 43s  | 0.973   | 0.911
prHm6     | 1: Traditional            | Bag-of-activities              | 321  | 0.2s | 0.902   | 0.660
prHm6     | 1: Traditional            | Maximal Repeats                | 592  | 59s  | 0.897   | 0.730
prHm6     | 1: Traditional            | Distance Graphs                | 1841 | 54s  | 0.902   | 0.660
prHm6     | 2: Deep neural networks   | Compact trace                  | 30   | 37s  | 0.902   | 0.762
The experimental results show that, with the compact trace representation, the fitness is higher than the precision on all datasets. Training the DNN does not take too much time. Except for the Lfull dataset, which has a small number of activities, the compact trace representation always gives the best results on the datasets with a large number of activities.
The deep neural network has the ability to learn the relations among the activities in the input vector to generate the compact representation, which may contain richer information than the input. Thanks to this fact, the clustering can produce better results. In particular, the dimension of the compact trace representation is very small in comparison with the other representation methods. This indicates that our method is a good choice for complex event logs with a large number of activities. For the two complex datasets in the experiments, the dimension of the compact trace representation is reduced by about ten times, i.e., its dimension is 30 versus the input trace dimensions of 317 and 321. This is exactly the reduction of the feature space dimension that we aimed for.
V. CONCLUSIONS AND FUTURE WORK
This paper proposes a new trace representation method using deep neural networks. The output vectors at the last hidden layer of the deep neural network are used as the trace representation for the later clustering phase. The compactness of this representation helps to reduce the clustering complexity, while its richness of information helps to improve the clustering performance. The experimental results indicate that this method is quite suitable for complex event logs containing a large number of activities: the dimension of the representation is reduced by about ten times, while the precision and fitness are improved.
As a possible future direction, we will try advanced deep learning methods based on Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM), i.e., an improved deep learning architecture based on recurrent neural networks, to investigate whether they can improve the performance of trace clustering.
REFERENCES
[1] R. P. J. Chandra Bose, W. M. P. van der Aalst, "Trace Clustering Based on Conserved Patterns: Towards Achieving Better Process Models", Business Process Management Workshops, pp. 170 (2009).
[2] R. P. J. Chandra Bose, "Process Mining in the Large: Preprocessing, Discovery, and Diagnostics", PhD Thesis, Eindhoven University of Technology (2012).
[3] G. Greco, A. Guzzo, L. Pontieri, D. Saccà, "Discovering Expressive Process Models by Clustering Log Traces", IEEE Trans. Knowl. Data Eng., pp. 1010 (2006).
[4] A. K. A. de Medeiros, A. Guzzo, G. Greco, W. M. P. van der Aalst, A. J. M. M. Weijters, B. F. van Dongen, D. Saccà, "Process Mining Based on Clustering: A Quest for Precision", BPM Workshops, pp. 17 (2007).
[5] M. Song, C. W. Günther, W. M. P. van der Aalst, "Trace Clustering in Process Mining", Business Process Management Workshops, pp. 109 (2008).
[6] J. De Weerdt, S. K. L. M. vanden Broucke, J. Vanthienen, B. Baesens, "Leveraging process discovery with trace clustering and text mining for intelligent analysis of incident management processes", IEEE Congress on Evolutionary Computation, pp. 1 (2012).
[7] J. De Weerdt, S. K. L. M. vanden Broucke, J. Vanthienen, B. Baesens, "Active Trace Clustering for Improved Process Discovery", IEEE Trans. Knowl. Data Eng. 25(12), pp. 2708 (2013).
[8] I. Fischer, J. Poland, "New Methods for Spectral Clustering", IDSIA Technical Report (2004).
[9] J. Evermann, T. Thaler, P. Fettke, "Clustering Traces using Sequence Alignment", Business Process Management Workshops, pp. 179-190 (2015).
[10] Q.-T. Ha, H.-N. Bui, T.-T. Nguyen, "A trace clustering solution based on using the distance graph model", in Proceedings of ICCCI, pp. 313-322 (2016).
[11] T. Thaler, S. F. Ternis, P. Fettke, P. Loos, "A Comparative Analysis of Process Instance Cluster Techniques", Wirtschaftsinformatik, pp. 423 (2015).
[12] W. M. P. van der Aalst, "Process Mining - Data Science in Action", Springer, 2nd edition (2016).
[13] C. C. Aggarwal, P. Zhao, "Towards graphical models for text processing", Knowl. Inf. Syst. 36(1), pp. 1-21 (2013).
[14] L. Deng, D. Yu, "Deep Learning: Methods and Applications", NOW Publishers (2014).
[15] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, F. E. Alsaadi, "A survey of deep neural network architectures and their applications", Neurocomputing 234, pp. 11-26 (2017).
[16] M. Z. Alom, T. M. Taha, C. Yakopcic, S. Westberg, P. Sidike, M. S. Nasrin, M. Hasan, B. C. Van Essen, A. A. S. Awwal, V. K. Asari, "A State-of-the-Art Survey on Deep Learning Theory and Architectures", Electronics 8(3), 292 (2019).
[17] H.-N. Bui, T.-T. Nguyen, T.-C. Nguyen, Q.-T. Ha, "A New Trace Clustering Algorithm Based on Context in Process Mining", in Proceedings of IJCRS, pp. 644-657 (2018).
[18] T. Young, D. Hazarika, S. Poria, E. Cambria, "Recent Trends in Deep Learning Based Natural Language Processing", IEEE Comp. Int. Mag. 13(3), pp. 55-75 (2018).
[19] W. M. P. van der Aalst, "Process Mining: Discovery, Conformance and Enhancement of Business Processes", Springer (2011).