Mining Closed Strict Episodes

Abstract. Discovering patterns in a sequence is an important aspect of data mining. One
popular choice of such patterns is the episode, a pattern in sequential data
describing events that often occur in the vicinity of each other. Episodes also
constrain the order in which the events are allowed to occur.
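
To make the notion concrete, here is a minimal sketch of counting how often a serial episode (a totally ordered list of event labels) occurs within sliding windows of a sequence. It is an illustration only: the names `occurs_serial` and `window_frequency`, the restriction to serial episodes, and the choice of fixed-width windows lying fully inside the sequence are all assumptions made for this example, since the abstract does not spell out the frequency measure or the episode class in detail.

```python
from typing import Sequence


def occurs_serial(episode: Sequence[str], window: Sequence[str]) -> bool:
    """Return True if the labels of `episode` appear in `window` in the
    given order, possibly with other events in between."""
    it = iter(window)
    # `label in it` advances the iterator, so each match must come
    # after the previous one.
    return all(label in it for label in episode)


def window_frequency(episode: Sequence[str],
                     sequence: Sequence[str],
                     width: int) -> int:
    """Count the windows of `width` consecutive events (fully inside
    `sequence`) that contain the serial episode."""
    n = len(sequence)
    return sum(
        occurs_serial(episode, sequence[i:i + width])
        for i in range(max(0, n - width + 1))
    )


if __name__ == "__main__":
    seq = list("abcab")
    print(window_frequency(["a", "b"], seq, 3))
    print(window_frequency(["b", "a"], seq, 3))
```

The same two events yield different counts depending on the order the episode requires, which is the sense in which an episode constrains event order.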

In this work we introduce a technique for discovering closed episodes. Adapting
existing approaches for discovering traditional patterns, such as closed
itemsets, to episodes is not straightforward. First of all, we cannot define a
unique closure based on frequency because an episode may have several closed
superepisodes. Moreover, to define a closedness concept for episodes we need a
subset relationship between episodes, which is not trivial to define.

We approach these problems by introducing strict episodes. We argue that
this class is general enough, and at the same time we are able to define a
natural subset relationship within it and use it efficiently. In order to mine closed
episodes we define an auxiliary closure operator. We show that this closure
satisfies the needed properties so that we can use the existing framework
for mining closed patterns. Discovering the true closed episodes can be done as
a post-processing step. We combine these observations into an efficient mining
algorithm and demonstrate its performance empirically.
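
The post-processing idea can be sketched generically: given candidate episodes with their frequencies, keep only those that have no proper superepisode of equal frequency. The sketch below is an assumption-laden illustration, not the algorithm from this work. It uses serial episodes with the subsequence relation as a stand-in for the subset relationship between strict episodes (which the abstract does not define), and the names `is_subepisode` and `closed_only` are hypothetical.

```python
from typing import Dict, Tuple

Episode = Tuple[str, ...]


def is_subepisode(small: Episode, large: Episode) -> bool:
    """Stand-in subset relation (assumption): `small` is a subsequence
    of `large`. The relation for strict episodes is more general."""
    it = iter(large)
    return all(label in it for label in small)


def closed_only(freq: Dict[Episode, int]) -> Dict[Episode, int]:
    """Keep episodes with no proper superepisode of equal frequency.
    Quadratic in the number of candidates; fine for a sketch."""
    return {
        ep: f
        for ep, f in freq.items()
        if not any(
            other != ep and freq[other] == f and is_subepisode(ep, other)
            for other in freq
        )
    }


if __name__ == "__main__":
    # Toy frequencies, made up purely for illustration.
    toy = {("a",): 5, ("b",): 4, ("a", "b"): 4, ("b", "a"): 2}
    print(sorted(closed_only(toy)))
```

In this toy input, `("b",)` is dropped because its superepisode `("a", "b")` has the same frequency, while the other three candidates are kept.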

Source code: Mining Closed Strict Episodes (C++)