This project is an implementation of the CartiClus algorithm, which is introduced in the paper:
"Cartification: A Neighborhood Preserving Transformation for Mining High Dimensional Data" by Emin Aksehirli, Bart Goethals, Emmanuel Müller, and Jilles Vreeken, in 2013 IEEE 13th International Conference on Data Mining (ICDM 2013), IEEE, 2013.
CartiClus is packaged as a runnable .jar file (carticlus.jar), which also includes the source code. You can run the application from the command line with the following command:
java -jar carticlus.jar data-file k minsup numOfdimensions [cartLog] [outputfile]
The following command cartifies the file without running the mining step. It creates the cartified files in the same directory as the source file.
java -cp carticlus.jar cart.CartifierDriver data-file k
CartiClus accepts parameters as command line arguments in a specified order. If the optional parameters are omitted, their default values will be used instead.
data-file
: Path to the multi-dimensional data file that will be cartified. Please find the properties of the data file below.

k
: Parameter for the k nearest neighbors.

minsup
: Minimum support count for the mining. This is the actual count, not a percentage.

numOfdimensions
: The number of dimensions in the data-file.

cartLog
: (optional) Direct the output of the mining step to this file instead of /dev/null.

outputfile
: (optional) Direct output to this file instead of standard output.

CartiClus outputs the found clusters to the standard output. Each line of output represents a subspace cluster. Output format:
Subspaces for cluster [Size of cluster] Objects of the cluster
For example, 1 1 0 0 0 1 0 [10] 0 1 2 3 4 5 6 7 8 9
means that a cluster is detected in the 1st, 2nd, and 6th subspaces (the positions set to 1) and that it has 10 objects, namely
0 1 2 3 4 5 6 7 8 9
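The output format above can be read back programmatically. The following is a minimal Python sketch, not part of CartiClus itself; the names `SubspaceCluster` and `parse_cluster_line` are hypothetical, and it assumes that the fields are whitespace-separated and that object identifiers are integers, as in the example line.

```python
import re
from typing import NamedTuple


class SubspaceCluster(NamedTuple):
    subspaces: list[int]  # 1-based positions whose flag is 1
    size: int             # the number given in square brackets
    objects: list[int]    # object identifiers (assumed to be integers)


# A run of 0/1 flags, then the size in square brackets, then the object ids.
_LINE = re.compile(r"^\s*([01](?:\s+[01])*)\s*\[(\d+)\]\s*(.*?)\s*$")


def parse_cluster_line(line: str) -> SubspaceCluster:
    """Parse one output line into its subspaces, size, and objects."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"not a cluster line: {line!r}")
    flags, size, objects = match.groups()
    subspaces = [i for i, flag in enumerate(flags.split(), start=1) if flag == "1"]
    return SubspaceCluster(subspaces, int(size), [int(o) for o in objects.split()])


cluster = parse_cluster_line("1 1 0 0 0 1 0 [10] 0 1 2 3 4 5 6 7 8 9")
print(cluster.subspaces)  # [1, 2, 6]
print(cluster.size)       # 10
```

Reporting the subspaces as 1-based positions matches the "1st, 2nd and 6th" reading used in the example.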
An example run with k = 125, minsup = 300, and 20 dimensions:
java -jar carticlus.jar data/10c20d.mime 125 300 20
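When CartiClus is called from a script, it can help to assemble the positional arguments in one place. The sketch below is a hypothetical Python wrapper, not something shipped with the project; the function names are illustrative, and it assumes that, because the arguments are positional, outputfile can only be supplied when cartLog is supplied as well.

```python
import subprocess


def carticlus_command(data_file, k, minsup, num_dimensions,
                      cart_log=None, output_file=None, jar="carticlus.jar"):
    """Build the argument list in the order: data-file k minsup numOfdimensions [cartLog] [outputfile]."""
    if output_file is not None and cart_log is None:
        # Assumption: outputfile is the sixth positional argument,
        # so it cannot be passed without cartLog before it.
        raise ValueError("output_file requires cart_log")
    cmd = ["java", "-jar", jar, str(data_file), str(k), str(minsup), str(num_dimensions)]
    if cart_log is not None:
        cmd.append(str(cart_log))
    if output_file is not None:
        cmd.append(str(output_file))
    return cmd


def run_carticlus(*args, **kwargs):
    """Run CartiClus and return its standard output (requires Java and the .jar file)."""
    cmd = carticlus_command(*args, **kwargs)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


print(" ".join(carticlus_command("data/10c20d.mime", 125, 300, 20)))
# java -jar carticlus.jar data/10c20d.mime 125 300 20
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the data file path contains spaces.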
A link to the artificial datasets with many irrelevant dimensions is given here. For the datasets from OpenSubspace, please refer to http://dme.rwth-aachen.de/en/OpenSubspace/evaluation
The code repository of the project is located at https://gitlab.com/adrem/carticlus
For more information, you can visit http://adrem.ua.ac.be/cartification or send an email to Emin Aksehirli (emin.aksehirli@uantwerpen.be).
Attachment | Size |
---|---|
noiseDimension-data.tar.bz2 | 3.09 MB |
carticlus.zip | 2.35 MB |