Published September 16, 2024 | Version 1.0.0
Dataset Open

SDOstreamclust: Stream Clustering Robust to Concept Drift - Evaluation Tests

  • 1. TU Wien

Description

SDOstreamclust Evaluation Tests

conducted for the paper: Stream Clustering Robust to Concept Drift 

Context and methodology

SDOstreamclust is a stream clustering algorithm that can process data incrementally or in batches. It combines the earlier SDOstream (anomaly detection in data streams) and SDOclust (static clustering). SDOstreamclust retains the characteristics of SDO algorithms: lightweight, intuitive, self-adjusting, resistant to noise, capable of identifying non-convex clusters, and built upon robust parameters and interpretable models. Moreover, it shows excellent adaptation to concept drift.

In this repository, SDOstreamclust is evaluated on 165 datasets (both synthetic and real) and compared with CluStream, DBstream, DenStream, and StreamKMeans.

This repository is framed within the research on the following domains: algorithm evaluation, stream clustering, unsupervised learning, machine learning, data mining, streaming data analysis. Datasets and algorithms can be used for experiment replication and for further evaluation and comparison.

Docker

A Docker version is also available at: https://hub.docker.com/r/fiv5/sdostreamclust

Technical details

Experiments are conducted in Python v3.8.14. The file and folder structure is as follows:

  • [algorithms] contains a script with functions related to algorithm configurations.
  • [data] contains datasets in ARFF format.
  • [results] contains CSV files with the algorithms' performance obtained from running the "run.sh" script (as shown in the paper).
  • "dependencies.sh" lists and installs Python dependencies.
  • "pysdoclust-stream-main.zip" contains the SDOstreamclust Python package.
  • "README.md" provides details and instructions for using this repository.
  • "run.sh" runs the complete experiments.
  • "run_comp.py" runs the experiments specified by arguments.
  • "TSindex.py" implements functions for the Temporal Silhouette index.

Note: if the SDOstreamclust code is modified, the SWIG (v4.2.1) wrappers have to be rebuilt and SDOstreamclust reinstalled with pip.

License

The CC-BY license applies to all data generated with MDCgen. All distributed code is under the GPLv3+ license.

 

Files

Files (227.4 MiB)

Name: evaluation_tests.zip
Size: 227.4 MiB
Checksum: md5:ebf480b322fb04120843cf29398b2c11
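
The MD5 checksum listed for evaluation_tests.zip can be used to confirm that a download is complete and uncorrupted. The sketch below uses only the Python standard library; the function names and the assumption that the archive sits in the current working directory are illustrative, not part of the repository.

```python
import hashlib
from pathlib import Path

# Checksum listed in this record for evaluation_tests.zip
EXPECTED_MD5 = "ebf480b322fb04120843cf29398b2c11"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("evaluation_tests.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use low, which matters for an archive of this size (227.4 MiB).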

Additional details