iMath Project

Classification of Handwritten Digits

Classifying handwritten digits is a common task in pattern recognition, often illustrated with the widely used MNIST database. A simple algorithm treats each image as a vector in a Euclidean space whose dimensions correspond to the image pixels, and computes the mean (centroid) of each class of labeled training data. An unknown digit is then assigned to the class with the closest centroid.

A better-fitting approach uses orthogonal basis vectors computed with the Singular Value Decomposition (SVD). The SVD of each set of digits in the training set is computed to derive an orthogonal basis that captures the variation within that set. During the test phase, the unknown digit is approximated in the basis of each class, and the residual (distance) between the digit and its approximation is computed. The unknown digit is then assigned to the class with the smallest residual.

Another algorithm, tangent distance, estimates the distance between transformation manifolds by using a linear approximation of them. During the training phase, the tangent matrix T_p is computed for each digit in the training set. During the classification phase, the tangent matrix of each test digit is computed, and the tangent distance to all training digits is calculated. The test digit is then given the class of the closest training digit. This makes the algorithm computationally expensive, since each test digit must be compared with every training digit.
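The centroid and SVD-basis classifiers described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the toy data generator `make_toy_data` (a hypothetical stand-in for MNIST with two artificial classes) and the choice of `k` basis vectors are all introduced here for the example. Tangent distance is left out of the sketch.

```python
import numpy as np

def make_toy_data(n_per_class=20, n_pixels=16, noise=0.05, seed=0):
    """Hypothetical stand-in for MNIST: two classes, each a fixed template
    plus small Gaussian noise, with images flattened to vectors."""
    rng = np.random.default_rng(seed)
    half = n_pixels // 2
    templates = np.zeros((2, n_pixels))
    templates[0, :half] = 1.0   # class 0: first half of the pixels is "on"
    templates[1, half:] = 1.0   # class 1: second half of the pixels is "on"
    y = np.repeat([0, 1], n_per_class)
    X = templates[y] + noise * rng.normal(size=(2 * n_per_class, n_pixels))
    return X, y, templates

def fit_centroids(X, y):
    """Return the class labels and the mean image (centroid) of each class."""
    labels = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in labels])
    return labels, centroids

def predict_centroid(x, labels, centroids):
    """Assign x to the class whose centroid is nearest in Euclidean distance."""
    distances = np.linalg.norm(centroids - x, axis=1)
    return int(labels[np.argmin(distances)])

def fit_svd_bases(X, y, k):
    """For each class, keep the first k left singular vectors of the matrix
    whose columns are that class's training images."""
    bases = {}
    for c in np.unique(y):
        A = X[y == c].T                       # shape: (n_pixels, n_samples)
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        bases[int(c)] = U[:, :k]              # orthonormal basis for class c
    return bases

def predict_svd(x, bases):
    """Assign x to the class whose basis leaves the smallest residual
    ||x - U U^T x||."""
    residuals = {c: np.linalg.norm(x - U @ (U.T @ x)) for c, U in bases.items()}
    return min(residuals, key=residuals.get)

if __name__ == "__main__":
    X, y, templates = make_toy_data()
    labels, centroids = fit_centroids(X, y)
    bases = fit_svd_bases(X, y, k=2)
    for x in templates:
        print(predict_centroid(x, labels, centroids), predict_svd(x, bases))
```

With real digit images, each row of `X` would be one flattened training image and `k` would be tuned on held-out data; the value used here is only for illustration.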

Given the extensive research in the field, a great number of classification algorithms and data preprocessing techniques have been explored. Other interesting examples are convolutional neural networks, k-nearest neighbors, and random forest classifiers, which also achieve low error rates.
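As an illustration of one of these alternatives, the following is a minimal k-nearest neighbors sketch in NumPy. It assumes Euclidean distance, majority voting, and non-negative integer class labels; the function name `knn_predict` and the tiny two-dimensional data set are hypothetical and only stand in for flattened digit images.

```python
import numpy as np

def knn_predict(x, X_train, y_train, k=3):
    """Classify x by majority vote among its k nearest training samples.
    Assumes y_train holds non-negative integer class labels."""
    distances = np.linalg.norm(X_train - x, axis=1)   # distance to every sample
    nearest = np.argsort(distances)[:k]               # indices of the k closest
    votes = np.bincount(y_train[nearest])             # count labels among them
    return int(np.argmax(votes))

# Toy data: two well-separated clusters standing in for two digit classes.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

if __name__ == "__main__":
    print(knn_predict(np.array([0.2, 0.1]), X_train, y_train))
    print(knn_predict(np.array([5.5, 5.2]), X_train, y_train))
```

Like tangent distance, this method compares each test sample with every training sample, so its cost grows with the size of the training set.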

Example:

Given a dataset consisting of mathematical questions in image form, it is possible to use these techniques to extract information for further analysis. Obtaining LaTeX-formatted questions from these images could facilitate the retrieval of information in an easy-to-work-with manner, particularly when employing other techniques like LLMs.

References:

An, S., Lee, M., Park, S., Yang, H., & So, J. (2020). An ensemble of simple convolutional neural network models for MNIST digit recognition. arXiv preprint arXiv:2008.10400.

Keysers, D., Deselaers, T., Gollan, C., & Ney, H. (2007). Deformation models for image recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(8), 1422-1435.

Paul, A., Mukherjee, D. P., Das, P., Gangopadhyay, A., Chintha, A. R., & Kundu, S. (2018). Improved random forest for classification. IEEE Transactions on Image Processing, 27(8), 4012-4024.

Eldén, L. (2019). Chapter 11. In Matrix Methods in Data Mining and Pattern Recognition, Second Edition (pp. 119-132).

Author of the tip:

Alejandro Fuster López

University of Malaga
