MClustRep, IDEA Group at SDU

Multiple Clustering Repository

Content List:
Description
Code of Multiple Clustering Methods
    Original space based
    Subspace based
    Multi-view based
    Multiple Co-Clusterings
Data
    Image data
    Text data
    Biology data

Description:
The code and data listed below were developed or collected by IDEA Lab members. We thank all the authors who generously shared their code and datasets with us. We also welcome new datasets and code related to multiple clusterings and will add them to this repository as soon as possible. These resources are aggregated here to support the multiple clustering research community and will be regularly maintained. The resources can be freely used for academic research, provided that the contributors are appropriately credited. For other purposes, please contact Prof. Yu Guoxian (guoxian85@gmail.com; gxyu@sdu.edu.cn) and Mr. Ren Liangrui. If you have any questions about using these resources, please do not hesitate to contact us. Click the name of a code package or dataset below to download it.

Code of Multiple Clustering Methods:
Original space based:

COALA (The Constrained Orthogonal Average Link Algorithm)
Ref.: E. Bae and J. Bailey, COALA: A novel approach for the extraction of an alternate clustering of high quality and high dissimilarity, in IEEE International Conference on Data Mining, 2006, pp. 53-62. [paper]

De-kmeans (The Decorrelated k-means approach)
Ref.: P. Jain, R. Meka, and I. S. Dhillon, Simultaneous unsupervised learning of disparate clusterings, Statistical Analysis and Data Mining, vol. 1, no. 3, pp. 195-210, 2008. [paper]

Meta Clustering
Ref.: R. Caruana, M. Elhawary, N. Nguyen, and C. Smith, Meta clustering, in IEEE International Conference on Data Mining, 2006, pp. 107-118. [paper]

MNMF (Multiple Nonnegative Matrix Factorization)
Ref.: S. Yang and L. Zhang, Non-redundant multiple clustering by nonnegative matrix factorization, Machine Learning, vol. 106, pp. 695-712, 2017. [paper]

Subspace based:

ADFT (The Alternative Distance Function Transformation Approach)
Ref.: I. Davidson and Z. Qi, Finding alternative clusterings using constraints, in IEEE International Conference on Data Mining, 2008, pp. 773-778. [paper]

ENRC (Embedded Non-Redundant Clustering)
Ref.: L. Miklautz, D. Mautz, M. C. Altinigneli, C. Böhm, and C. Plant, Deep embedded non-redundant clustering, in AAAI Conference on Artificial Intelligence, 2020, pp. 5174-5181. [paper]

ISAAC (Independent Subspace Analysis and Alternative Clustering)
Ref.: W. Ye, S. Maurus, N. Hubig, and C. Plant, Generalized independent subspace clustering, in IEEE International Conference on Data Mining, 2016, pp. 569-578. [paper]

ISM (Iterative Spectral Method)
Ref.: C. Wu, S. Ioannidis, M. Sznaier, X. Li, D. Kaeli, and J. Dy, Iterative spectral method for alternative clustering, in International Conference on Artificial Intelligence and Statistics, 2018, pp. 115-123. [paper]

MISC (Multiple Independent Subspace Clusterings)
Ref.: X. Wang, J. Wang, C. Domeniconi, G. Yu, G. Xiao, and M. Guo, Multiple independent subspace clusterings, in AAAI Conference on Artificial Intelligence, 2019, pp. 5353-5360. [paper]

mSC (multiple Spectral Clustering)
Ref.: D. Niu, J. G. Dy, and M. I. Jordan, Multiple non-redundant spectral clustering views, in International Conference on Machine Learning, 2010, pp. 831-838. [paper]

Nr-kmeans and Nr-Dipmeans (Non-redundant K-means and Non-redundant Dipmeans)
Ref.: D. Mautz, W. Ye, C. Plant, and C. Böhm, Discovering nonredundant k-means clusterings in optimal subspaces, in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2018, pp. 1973-1982. [paper]

OC and OSC (Orthogonal Clustering and Orthogonal Subspaces Clustering)
Ref.: Y. Cui, X. Z. Fern, and J. G. Dy, Non-redundant multi-view clustering via orthogonalization, in IEEE International Conference on Data Mining, 2007, pp. 133-142. [paper]

Multi-view based:

DMClusts (The Deep Matrix Factorization for Multiple Clusterings)
Ref.: S. Wei, J. Wang, G. Yu, C. Domeniconi, and X. Zhang, Multi-view multiple clusterings using deep matrix factorization, in AAAI Conference on Artificial Intelligence, 2020, pp. 6348-6355. [paper]

MVMC (Multiple View Multiple Clusterings)
Ref.: S. Yao, G. Yu, J. Wang, C. Domeniconi, and X. Zhang, Multi-view multiple clustering, in International Joint Conference on Artificial Intelligence, 2019, pp. 4121-4127. [paper]

NetMCs (Networks Multiple Clusterings)
Ref.: S. Wei, G. Yu, J. Wang, C. Domeniconi, and X. Zhang, Multiple clusterings of heterogeneous information networks, Machine Learning, vol. 110, pp. 1505-1526, 2021. [paper]

Multiple co-clusterings:

MultiCC (Multiple Co-Clusterings)
Ref.: J. Wang, X. Wang, G. Yu, C. Domeniconi, Z. Yu, and Z. Zhang, Discovering multiple co-clusterings with matrix factorization, IEEE Transactions on Cybernetics, vol. 51, no. 7, pp. 3576-3587, 2021. [paper]

MultiCC-SS (Multiple Co-Clusterings in Subspaces)
Ref.: S. Yao, G. Yu, X. Wang, J. Wang, C. Domeniconi, and M. Guo, Discovering multiple co-clusterings in subspaces, in SIAM International Conference on Data Mining, 2019, pp. 423-431. [paper]

NMCC (Nonparametric Multiple Co-Clusterings)
Ref.: T. Tokuda, J. Yoshimoto, Y. Shimizu, G. Okada, M. Takamura, Y. Okamoto, S. Yamawaki, and K. Doya, Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions, PLoS ONE, vol. 12, no. 10, p. e0186566, 2017. [paper]

Data:
Image data:

ALOI
ALOI (Amsterdam Library of Object Images) is a classic image dataset that contains 1000 common objects photographed under different angles and illumination conditions. The data we share here has 288 samples and 287 features, and it can be clustered from two perspectives: color and shape. [link]
Ref.: J.-M. Geusebroek, G. J. Burghouts, and A. W. Smeulders, The Amsterdam library of object images, International Journal of Computer Vision, vol. 61, no. 1, pp. 103-112, 2005.

Card
Card is a playing card image dataset with 8029 images of size 224x224x3 and two clusterings: suit (clubs, diamonds, hearts, spades) and rank (Ace, King, Queen, etc.). [link]

CMUface
CMUface has 640 gray images of size 120x128 from 20 individuals. These images can be clustered by pose, identity, with/without glasses, and emotion. [link]

Fruits
Fruits has 7 subgroups (red apples, yellow apples, green apples, yellow bananas, green bananas, red grapes, and green grapes), and each subgroup has 15 images. The currently available data has been processed into 105 samples with 6 features, and it can be clustered according to color (red, yellow, and green) and species (apples, bananas, and grapes). [link]
Ref.: J. Hu, Q. Qian, J. Pei, R. Jin, and S. Zhu, Finding multiple stable clusterings, Knowledge and Information Systems, vol. 51, no. 3, pp. 991-1021, 2017.

Fruit360
Fruit360 has 4856 images of size 100x100 and has two clusterings: color (red, green, yellow, and maroon) and species (apple, banana, cherry, and grape). [link]
Ref.: H. Muresan and M. Oltean, Fruit recognition from images using deep learning, Acta Universitatis Sapientiae, Informatica, vol. 10, no. 1, pp. 26-42, 2018.

StickFigures
StickFigures has 900 images of size 20x20. According to body posture, these images can be divided into three clusters for the upper body and three clusters for the lower body. [link]
Ref.: S. Günnemann, I. Färber, M. Rüdiger, and T. Seidl, SMVC: Semi-supervised multi-view clustering in subspace projections, in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 253-262.

GTSRB
GTSRB is a subset of the real-world traffic sign images from the German Traffic Sign Benchmark (GTSRB) dataset. The data we share here has 6720 images, which can be non-redundantly clustered by traffic sign type (four types) and by color (two colors). [link]
Ref.: S. Houben, J. Stallkamp, J. Salmen, M. Schlipsing, and C. Igel, Detection of traffic signs in real-world images: The German traffic sign detection benchmark, in International Joint Conference on Neural Networks, 2013, pp. 1-8.

NR-Objects
NR-Objects has 10000 images generated with publicly available rendering software. The objects can be clustered by shape (three shapes), material (two materials), and color (six colors). [link]
Ref.: J. Johnson, B. Hariharan, L. van der Maaten, L. Fei-Fei, C. L. Zitnick, and R. Girshick, CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning, in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2901-2910.

Text data:

WebKB
WebKB contains 1041 web documents that can be grouped by university (Cornell, Austin, Washington, and Madison) or by topic (courses, teachers, projects, and students). [link]
Ref.: M. Craven, A. McCallum, D. DiPasquo, T. Mitchell, and D. Freitag, Learning to extract symbolic knowledge from the world wide web, in AAAI Conference on Artificial Intelligence, 1998, pp. 509-516.

Biology data:

Mice
Mice is collected from the well-known functional genomics data repository GEO, with 146 samples and 41092 features. It contains 74 individual oocyte samples and 72 cumulus cell samples, and it can be clustered by cell type (oocytes and cumulus) or age (young, middle, old, and middle with calorie restriction). [link]
Ref.: T. Mishina, N. Tabata, T. Hayashi, M. Yoshimura, M. Umeda, M. Mori, Y. Ikawa, H. Hamada, I. Nikaido, and T. S. Kitajima, Single-oocyte transcriptome analysis reveals aging-associated effects influenced by life stage and calorie restriction, Aging Cell, vol. 20, no. 8, p. e13428, 2021.
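
Example (illustrative only): the datasets above each carry more than one valid ground-truth clustering. The sketch below rebuilds the two label sets of Fruits purely from its description (7 subgroups of 15 images, labeled by color and by species) and measures how much the two labelings overlap using normalized mutual information (NMI). It does not read the downloadable files, whose format is not assumed here; the ordering of samples and the helper names `entropy` and `nmi` are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """Normalized mutual information, normalized by the mean entropy."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # MI = sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    mi = sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    mean_entropy = (entropy(a) + entropy(b)) / 2
    return mi / mean_entropy if mean_entropy > 0 else 1.0

# The 7 subgroups named in the Fruits description, 15 images each.
SUBGROUPS = [
    ("red", "apple"), ("yellow", "apple"), ("green", "apple"),
    ("yellow", "banana"), ("green", "banana"),
    ("red", "grape"), ("green", "grape"),
]
IMAGES_PER_SUBGROUP = 15

# Assumed sample order: subgroup by subgroup (not taken from the real files).
color = [c for c, s in SUBGROUPS for _ in range(IMAGES_PER_SUBGROUP)]
species = [s for c, s in SUBGROUPS for _ in range(IMAGES_PER_SUBGROUP)]

print("samples:", len(color))
print("NMI(color, species):", round(nmi(color, species), 3))
print("NMI(color, color):", round(nmi(color, color), 3))
```

A low NMI between the two ground-truth labelings indicates that they describe largely different groupings of the same samples, which is the property that makes a dataset suitable for evaluating multiple clustering methods. The same function can be used to compare each clustering produced by a method against each ground-truth labeling.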

last modified: 2023-11-29