iMath Project


Use Large Language Models (LLMs) to evaluate the difficulty of a test and to check whether a question is written correctly

Large Language Models (LLMs) such as GPT-4 can be used to estimate the difficulty of a test and to check whether a question is written correctly. Through semantic analysis, LLMs can detect nuances and complexities in test items, which supports an estimate of their difficulty. For grammatical correctness, LLMs can identify errors in syntax, punctuation, and semantics. Their generation abilities can also be used to reformulate poorly constructed questions. This application of LLMs builds on advances in Natural Language Processing (NLP) and Machine Learning (ML), and promises a more accurate and efficient assessment process.

Example:

This is an example of a question and an answer using ChatGPT (https://chat.openai.com/). ChatGPT was asked how difficult the question “What is the pseudocode of the gradient descent algorithm” is, and it answered: “The difficulty of a question can vary depending on the individual’s familiarity with the topic. However, the question you provided can be considered moderately difficult for someone who is already familiar with the gradient descent algorithm and its implementation.”
It then included a pseudocode representation of the algorithm and added that the question can be challenging for anyone new to the concept of gradient descent.
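The same kind of request can also be scripted so that many questions are reviewed in a uniform way. The following is only a minimal sketch in Python, not a definitive implementation: the prompt wording, the three reply keys and the names build_prompt, review_question and ask_llm are assumptions made for illustration. In particular, ask_llm is a hypothetical function that you supply yourself; it sends a prompt to the LLM of your choice and returns the reply as text.

```python
import json
from typing import Callable

# Illustrative prompt: the wording and the three reply keys are assumptions.
PROMPT_TEMPLATE = (
    "You are reviewing an exam question.\n"
    "Question: {question}\n"
    "Reply with a JSON object with exactly these keys:\n"
    '  "difficulty": one of "easy", "moderate", "hard"\n'
    '  "well_written": true or false\n'
    '  "rewrite": an improved wording, or null if none is needed\n'
)


def build_prompt(question: str) -> str:
    """Insert the question into the review prompt."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def review_question(question: str, ask_llm: Callable[[str], str]) -> dict:
    """Ask the model to review one question and parse its reply.

    ask_llm is a hypothetical, user-supplied function: it takes the prompt
    text and returns the model's reply as a string.
    """
    reply = ask_llm(build_prompt(question))
    # Always keep the raw reply, since models do not always return valid JSON.
    review = {"difficulty": None, "well_written": None, "rewrite": None, "raw": reply}
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        return review  # free-text answer: leave the structured fields empty
    if isinstance(parsed, dict):
        for key in ("difficulty", "well_written", "rewrite"):
            review[key] = parsed.get(key)
    return review
```

Calling review_question("What is the pseudocode of the gradient descent algorithm?", ask_llm) returns a dictionary with the estimated difficulty, a flag telling whether the question is well written, and a suggested rewrite if the model proposed one. As the answer quoted above points out, difficulty depends on how familiar the student is with the topic, so the returned label is best treated as a first estimate to be checked by the teacher.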

Reference:

[1] Luca Benedetto, Paolo Cremonesi, Andrew Caines, Paula Buttery, Andrea Cappelli, Andrea Giussani, and Roberto Turrin. 2023. A Survey on Recent Approaches to Question Difficulty Estimation from Text. ACM Comput. Surv. 55, 9, Article 178 (September 2023), 37 pages. https://doi.org/10.1145/3556538

Author of the tip:

Giulia Cademartori

University of Genoa
