DenGraph - Density based Graph Clustering

Inspired by the algorithm DBSCAN [1] for spatial data, Falkowski et al. [2] propose the density based graph clustering algorithm DenGraph. The intention of DenGraph is to cluster similar nodes in a graph into communities. The density-based approach applies a local cluster criterion. Clusters are regarded as regions in the graph in which the nodes are dense, and which are separated by regions of low node density. To detect regions of higher density, DenGraph computes neighborhoods which have a given radius and must contain a minimum of nodes to ensure that the neighborhood is dense. A node that has such a neighborhood is termed a core node. Nodes that have no such neighborhood are either border nodes if they are in the neighborhood of a core node or noise nodes.

To build a cluster, DenGraph traverses the graph by randomly picking nodes and places all density connected nodes it encounters to the same cluster. If a node is not density connected to the nodes seen thus far, it is assigned to the next cluster candidate. Not each node becomes member of a cluster: If a node does not have an adequately dense neighborhood and is not density connected to any other node, then it is termed a noise node.

To allow for tracking and analyzing the temporal dynamics of clusters, DenGraph-I [3][4] is designed as an incremental procedure: The clustering is updated incrementally based on the changes that are observed in the graph structure from one interval to another. These changes may evoke one of the following clustering updates: creation of a new cluster, removal of a cluster, absorption of a new cluster member, reduction of a cluster member, merge of two or mores clusters and split of a cluster into two or more clusters.

In social networking sites it is often observable that members belong to more than one community. So far, if a member is close to more than one community, it is assigned to the cluster which is discovered first. In this case, the clustering result is not deterministic but depends on the order in which the vertices are visited. To overcome this problem we propose DenGraph-IO that extends the existing algorithm to handle overlapping clusters. By this, we also achieve a more realistic clustering as individuals can be members in different communities now.

In 2008, DenGraph was used for a temporal analysis of the Enron email data set. Later on, we applied the proposed method on a music data set to analyse the music listen behaviour of users on the Last.fm platform. Based on the incremental algorithm we present the evolution of groups of users with similar music listening behaviour over time in [5] and [6].


References

  1. Ester M, Kriegel HP, Sander J, Xu X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Proc. of 2nd International Conference on Knowledge Discovery and.; 1996. p. 226-31.
  2. Falkowski T, Barth A, Spiliopoulou M. DENGRAPH: A Density-based Community Detection Algorithm. In: In Proc. of the 2007 IEEE / WIC / ACM International Conference on.; 2007. p. 112-5.
  3. Falkowski T, Barth A, Spiliopoulou M. Studying Community Dynamics with an Incremental Graph Mining Algorithm. In: Proc. of the 14 th Americas Conference on Information Systems (AMCIS).; 2008.
  4. Falkowski T, Barth A. Density-based Temporal Graph Clustering for Subgroup Detection in Social Networks.; 2007.
  5. Falkowski T, Schlitter N. Analyzing the Music Listening Behavior and its Temporal Dynamics Using Data from a Social Networking Site. Zurich; 2008.
  6. Schlitter N, Falkowski T. Mining the Dynamics of Music Preferences from a Social Networking Site. In: Proceedings of the 2009 International Conference on Advances in Social Network Analysis and Mining. Athens: IEEE Computer Society; 2009. p. 243-8.