Recursive Tile Sampling

Abstract: Known pattern discovery algorithms for finding tilings (covers of 0/1-databases consisting of 1-rectangles) cannot be integrated in instant and interactive KD tools, because they do not satisfy at least one of two key requirements: a) to provide results within a short response time of only a few seconds and b) to return a concise set of patterns with only a few elements that nevertheless covers a large fraction of the input database. In this paper we present a novel randomized algorithm that works well under these requirements. It is based on the recursive application of a simple tile sample procedure that can be implemented efficiently using rejection sampling. While, as we analyse, the theoretical solution distribution can be weak in the worst case, the approach performs extremely well in practice and outperforms previous sampling as well as deterministic algorithms.

VERSION 2 (data model = RealKD)

Download RTS version 2.0.0 (version released on the 9th of June 2018)

VERSION 1 (data model = mime_plain)

Download RTS version 1.0.0 (version released on the 24th of Februari 2014)

CODING

All releases can be found on GitLab: Adrem Data Lab releases as maven repository. Source code is available at GitLab: Adrem Data Lab: rts.