Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.

Tips

Spectral graph multiway partitioning for large and sparse graph Laplacians

Spectral graph partitioning is of interest in education because it allows a graph to be efficiently divided into two subgraphs (by removing only a few arcs); such a graph can range from a simple social network between learners to one in which nodes/vertices are questions and edges/arcs are relationships established between them based on their keywords.

Assuming for simplicity that the graph is undirected, the procedure consists of first forming the Laplacian matrix $L=D-A$ of the graph (where $D$ is the diagonal degree matrix and $A$ is the adjacency matrix) and normalizing it as $L_N=D^{-1/2}LD^{-1/2}$. We then compute its Fiedler eigenvector (the one associated with the second smallest eigenvalue), sort its components in ascending order (which induces a reordering of the nodes, with the arcs modified accordingly), and heuristically analyze how the conductance (a measure of connectivity after a partition) varies when the network is split in the vicinity of the sign change between the components of the Fiedler eigenvector, choosing the split that provides the lowest conductance.
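The bisection procedure above can be sketched as follows (a minimal dense sketch with NumPy; the function name and the sweep over all prefix cuts of the Fiedler ordering, rather than only the neighborhood of the sign change, are illustrative choices):

```python
import numpy as np

def fiedler_sweep(A):
    """Bisect an undirected graph by a conductance sweep along the
    Fiedler vector of the normalized Laplacian L_N = D^{-1/2} L D^{-1/2}.
    A is a dense symmetric adjacency matrix of a connected graph."""
    d = A.sum(axis=1)
    Dmh = np.diag(1.0 / np.sqrt(d))          # D^{-1/2}
    Ln = np.eye(len(d)) - Dmh @ A @ Dmh      # L_N = I - D^{-1/2} A D^{-1/2}
    w, V = np.linalg.eigh(Ln)                # eigenvalues in ascending order
    fiedler = V[:, 1]                        # second smallest eigenvalue
    order = np.argsort(fiedler)              # induced reordering of the nodes
    vol = d.sum()
    best_phi, best_cut = np.inf, None
    for i in range(1, len(d)):               # sweep over prefix cuts
        S, T = order[:i], order[i:]
        cut = A[np.ix_(S, T)].sum()          # weight of arcs leaving S
        volS = d[S].sum()
        phi = cut / min(volS, vol - volS)    # conductance of this split
        if phi < best_phi:
            best_phi, best_cut = phi, set(S.tolist())
    return best_cut, best_phi
```

On a toy graph made of two triangles joined by a single arc, the sweep recovers the two triangles, since removing the bridge gives the lowest conductance.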

In order to spectrally partition into more than two parts (multiway partitioning), it is not a good idea to bisect repeatedly with the algorithm described above: it is better to compute the $k$ smallest eigenvalues and consider the rows of the matrix formed by their eigenvectors as vectors of $k-1$ components (the trivial eigenvector associated with the zero eigenvalue can be discarded), to which a $k$-means type algorithm is applied to cluster them into $k$ groups. In addition, when the matrix $A$ is large and sparse, it is computationally better to work equivalently (using Krylov methods, not computing SVDs) with the eigenvector associated with the second largest eigenvalue of the normalized adjacency matrix $D^{-1/2}AD^{-1/2}$.
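Both ideas can be combined in a short sketch: SciPy's `eigsh` (a Lanczos-type Krylov solver) extracts the $k$ largest eigenpairs of the sparse normalized adjacency matrix, and the rows of the eigenvector matrix are clustered with $k$-means. This variant keeps all $k$ eigenvectors and row-normalizes them, a common alternative to discarding the trivial one; the function name and toy graph are illustrative:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh
from scipy.cluster.vq import kmeans2

def spectral_multiway(A, k, seed=0):
    """k-way spectral partitioning of a sparse undirected graph:
    take the k largest eigenpairs of D^{-1/2} A D^{-1/2} with a
    Krylov method, then cluster the rows of the eigenvector matrix."""
    d = np.asarray(A.sum(axis=1)).ravel()
    Dmh = sp.diags(1.0 / np.sqrt(d))
    An = Dmh @ A @ Dmh                       # normalized adjacency (sparse)
    w, V = eigsh(An, k=k, which='LA')        # k largest algebraic eigenvalues
    rows = V / np.linalg.norm(V, axis=1, keepdims=True)  # row-normalize
    _, labels = kmeans2(rows, k, minit='++', seed=seed)  # k-means clustering
    return labels
```

On three triangles chained by two bridge arcs, the three clusters are recovered without any recursive bisection.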

Examples:
To the classic example of Zachary's 34 karate club members after the conflict between two leaders, two further examples are added. In the first (taken from the SuiteSparse Matrix Collection), the 1224 blogs of two political parties are split in two, based on the links between them and the fact that a party usually cites more of its own blogs than the other's. In the second, texts are classified (without training) on the basis of the bipartite graph representing a term-phrase matrix $M$ of 389 terms and 545 phrases: the terms are split in two with $MM^T$ as adjacency matrix on one side, and the phrases with $M^TM$ as adjacency matrix on the other.
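The term-phrase idea can be illustrated on a hypothetical 4x4 incidence matrix (not the 389x545 matrix of the actual example); for brevity the sketch cuts at the sign change of the Fiedler vector of the unnormalized Laplacian $L=D-A$, skipping the conductance sweep:

```python
import numpy as np

def sign_split(A):
    """Bisect by the sign of the Fiedler vector of L = D - A
    (the simplest cut, without the conductance sweep)."""
    L = np.diag(A.sum(axis=1)) - A
    _, V = np.linalg.eigh(L)
    return V[:, 1] >= 0              # Fiedler vector: second column

# Hypothetical term-phrase incidence matrix M: terms 0-1 occur in
# phrases 0-1, terms 2-3 in phrases 2-3, with one overlap (term 1
# also occurs in phrase 2) so that both induced graphs are connected.
M = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

terms = sign_split(M @ M.T)          # splits terms {0,1} vs {2,3}
phrases = sign_split(M.T @ M)        # splits phrases {0,1} vs {2,3}
```

Both sides of the bipartite graph are thus partitioned independently, each with its own induced adjacency matrix.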

References:
Ulrike von Luxburg, A tutorial on spectral clustering, Statistics and Computing 17(4):395--416, December 2007.

Maria A. Riolo and Mark E. J. Newman, First-principles multiway spectral partitioning of graphs, Journal of Complex Networks 2(2):121--140, June 2014.

Lars Eldén, Matrix Methods in Data Mining and Pattern Recognition, 2nd edition, SIAM, Philadelphia, 2019 (ISBN 978-1-611975-85-7), chapters 1, 10 and 16.

Author of the tip:
Pablo Guerrero-Garcia
University of Malaga