Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.

Tips

Face recognition using tensor SVD

Face recognition in education is interesting because it will allow, for example, to gauge the degree of attention (and thus, the degree of involvement ) that the learner is paying during the performance of an activity. Each two-dimensional image of $n_p$ people is stored in columns forming a single vector of $n_i$ components; in addition, $n_e$ different facial expressions are taken from each person, and all this is organized in a three-dimensional ${\cal A}$ matrix (also called trimodal tensor (3-mode tensor ) ) with $n_i\gg n_en_p$. The classification problem consists of determining to which of the $n_p$ persons a given $z$ vector of $n_i$ components corresponds, or conclude that it is not in the database.

To approach this classification problem, the first thing to do is to compute the reduced (thin ) HOSVD (higher-order SVD ) factorization of ${\cal A}={\cal S}\times_i F\times_e G\times_p H = {\cal C}\times_p H$ and then identify the two tensors corresponding to a particular expression $e$ with matrices $A_e$ and $C_e$ (which is $n_i\times n_p$ ). Since they verify $A_e=C_eH^T$ for the same orthogonal $H^T$ matrix whose $p$-th column $h_p$ will have the coordinates of the images of the person $p$ whatever the basis (i.e., the expression ) is, this will allow us to solve the problem by dealing again with standard two-dimensional matrices.

In this way, the classification problem is reduced to solving $n_e$ linear least-squares problems $\min ||C_e\alpha-z||_2$ to obtain the $\alpha_e$ coordinates and then classify the image as corresponding to person $p$ if its Euclidean distance to $h_p$ is sufficiently small. The algorithm can be improved in two ways: one is by factoring $C_e=FB_e$ and having precomputed the reduced (also called thin or skinny ) QR factorization of $B_e$ (which is $n_en_p\times n_p$ ), and the other is by considering a truncated SVD factorization with rank $k$ approximations (with $n_i\gg k\geq n_p$ if the compression is performed only on the image dimension ) of ${\cal A}={\cal S}\times_i F\times_e G\times_p H$.

Example:
As an example of application, the images $n_i=112\times 78=8736$ of $n_p=10$ people from the Yale Face Database with $n_e=10$ different expressions are used. In all cases, when $z$ was taken as the image of the winking person, the algorithm identified the correct person with $k$ going from 100 to 10 (with $k=5$, two of the ten images were misclassified ). Compression on the expression and person dimensions was not required. Compared to the usual PCA techniques (which only work well if the images were taken under similar conditions ), the accuracy of these recognition algorithms is improved.

Reference:
Lieven de Lathauwer, Bart de Moor and Joos Vandewalle, A multilinear singular value decomposition, SIAM Journal on Matrix Analysis and Applications 21(4, April 2000 ): 1253--1278.

M. Alex O. Vasilescu and Demetri Terzopoulos, Multilinear image analysis for facial recognition, Proceedings of the 16th IEEE International Conference on Pattern Recognition, vol. 2(August, 2002 ): 511--514.

Lars Eldén, Matrix Methods in Data Mining and Pattern Recognition, 2nd edition (ISBN 978-1-611975-85-7 ). SIAM Publications, Philadelphia, 2019, chapters 8 and 15.

Author of the tip:
Pablo Guerrero-Garcia
University of Malaga