Learning Management Systems (LMSs) increasingly rely on digital assessments to support automated evaluation, content reuse, and flexible learning scenarios. However, assessment content is often created and distributed in document-oriented formats, particularly PDF. These formats lack explicit structural and semantic information and are therefore difficult to transform into machine-processable representations. Effective techniques for extracting assessment structure and semantics from unstructured or semi-structured documents are thus essential.

The IMS Question and Test Interoperability (QTI) specification provides a standardized format for representing assessment items. However, its practical adoption remains limited due to fragmented tool support, partial specification coverage, and insufficient integration with execution environments. As a result, assessment content is often restricted to proprietary formats or manually re-authored, thereby increasing development effort and the risk of semantic inconsistencies.

Our recent work (accepted at the RCIS conference) proposes a novel transformation pipeline for the automated generation of LMS-ready assessment content from document-based sources. This pipeline combines Large Language Models (LLMs) with Model-Driven Engineering (MDE) to convert the assessment content into LMS-compatible artifacts. Download the full paper or keep reading for a summary.

As illustrated in the figure below, the pipeline is organized into two phases:

  • PDF-to-QTI transformation (LLM-based phase): an LLM transforms assessment content from PDF documents into QTI 3.0-compliant XML.
  • QTI-to-LMS transformation (deterministic phase): the QTI XML is converted into a model-based representation, and the model is then transformed into a concrete LMS format.

Flowchart showing PDF-to-QTI transformation and code generation pipeline for LMS assessments

PDF-to-QTI Transformation

An LLM-based module transforms assessment content from PDF documents into QTI 3.0-compliant XML. The text extracted from the PDF is provided to the LLM via a carefully designed prompt that enforces well-formed QTI output and specifies the required constructs. Because this LLM-based phase is decoupled from the deterministic transformation stages, additional input modalities can be supported without affecting those stages.
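To make the idea of a prompt that enforces well-formed output more concrete, here is a minimal Python sketch. It is not the prompt or the validation code from the paper: the prompt wording, the list of required constructs, and the function names `build_prompt` and `check_llm_output` are all assumptions made for illustration. The actual LLM call is left out on purpose.

```python
import xml.etree.ElementTree as ET

# Hypothetical prompt template; the prompt used in the paper is not reproduced here.
PROMPT_TEMPLATE = (
    "Convert the following assessment text into a single QTI 3.0 item.\n"
    "Return only well-formed XML with a <qti-assessment-item> root element.\n"
    "Required constructs: {constructs}\n\n"
    "Assessment text:\n{text}"
)

# Assumed set of constructs to demand from the LLM and to check afterwards.
REQUIRED = ("qti-response-declaration", "qti-item-body")


def build_prompt(extracted_text: str) -> str:
    """Embed the text extracted from the PDF into the prompt."""
    return PROMPT_TEMPLATE.format(
        constructs=", ".join(REQUIRED), text=extracted_text.strip()
    )


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def check_llm_output(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    if local_name(root.tag) != "qti-assessment-item":
        problems.append("unexpected root element")
    present = {local_name(el.tag) for el in root.iter()}
    for name in REQUIRED:
        if name not in present:
            problems.append(f"missing <{name}>")
    return problems
```

A check like this only covers well-formedness and the presence of a few elements. Full conformance would require validating against the QTI 3.0 schema.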

Figure 2 shows an assessment item taken from a PDF and used as input to this phase. Figure 3 shows the outcome: the structured QTI representation produced by the LLM.

Multiple-choice question about binary search on reversed array with five answer options

Figure 2. Example of PDF-Based Assessment Item.

QTI 3.0 XML code for binary search assessment question with multiple choice options.

Figure 3. Generated QTI 3.0 Representation of a PDF-Based Assessment Item.

QTI-to-LMS Transformation

This phase proceeds in two steps: first, the QTI XML is transformed into a model-based representation; second, the model is transformed into a concrete LMS format.

Text-to-Model Transformation

To enable LMS-independent processing, the QTI XML from the previous phase is parsed into a model based on QTI 3.0. To this end, we propose a QTI-based metamodel that captures the essential semantics required for assessment execution. This abstraction enables clearer reasoning about assessment logic and facilitates deterministic downstream transformations. The proposed metamodel is structured into three layers: (i) Assessment Organization, (ii) Content and Presentation, and (iii) Response and Evaluation Semantics.
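The text-to-model step can be pictured with a small Python sketch that parses a QTI 3.0 choice item into plain objects. The sample item, the simplified `Question` and `Choice` classes, and the `parse_item` function are illustrative assumptions rather than the metamodel or parser from the paper. The element names and namespace follow QTI 3.0.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

NS = {"q": "http://www.imsglobal.org/xsd/imsqtiasi_v3p0"}

# Hand-written sample item for illustration (not taken from the paper).
SAMPLE = """<qti-assessment-item xmlns="http://www.imsglobal.org/xsd/imsqtiasi_v3p0"
    identifier="item1" title="Sample item" adaptive="false" time-dependent="false">
  <qti-response-declaration identifier="RESPONSE" cardinality="single" base-type="identifier">
    <qti-correct-response><qti-value>B</qti-value></qti-correct-response>
  </qti-response-declaration>
  <qti-item-body>
    <qti-choice-interaction response-identifier="RESPONSE" max-choices="1">
      <qti-prompt>Which data structure is FIFO?</qti-prompt>
      <qti-simple-choice identifier="A">Stack</qti-simple-choice>
      <qti-simple-choice identifier="B">Queue</qti-simple-choice>
    </qti-choice-interaction>
  </qti-item-body>
</qti-assessment-item>"""


@dataclass
class Choice:
    identifier: str
    text: str


@dataclass
class Question:
    identifier: str
    title: str
    prompt: str
    choices: list[Choice] = field(default_factory=list)
    correct: list[str] = field(default_factory=list)


def parse_item(xml_text: str) -> Question:
    """Build a simplified Question model from one QTI 3.0 choice item."""
    root = ET.fromstring(xml_text)
    prompt = root.find(".//q:qti-prompt", NS)
    return Question(
        identifier=root.get("identifier", ""),
        title=root.get("title", ""),
        prompt="".join(prompt.itertext()).strip() if prompt is not None else "",
        choices=[
            Choice(c.get("identifier", ""), "".join(c.itertext()).strip())
            for c in root.iterfind(".//q:qti-simple-choice", NS)
        ],
        correct=[
            (v.text or "").strip()
            for v in root.iterfind(".//q:qti-correct-response/q:qti-value", NS)
        ],
    )
```

Calling `parse_item(SAMPLE)` yields a `Question` with two choices and `correct == ["B"]`. From that point on, downstream steps no longer need to touch XML.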

Assessment Organization

Assessments (see Figure 4) are modeled using the concepts AssessmentDefinition, AssessmentPart, AssessmentSection, and Question. These elements capture the logical composition and flow of an assessment. This layer avoids presentation and scoring details to maintain a clear, high-level abstraction of assessment structure.

Assessment metamodel showing TestDefinition, TestPart, TestSection, and Question relationships

Figure 4. Assessment Metamodel
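As a rough illustration of this layer, the containment hierarchy can be written as Python dataclasses. The class names follow the concepts named above. The attributes, the strictly flat nesting (definition, parts, sections, questions), and the `iter_questions` helper are assumptions for this sketch and may differ from the actual metamodel.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Question:
    identifier: str  # content and scoring live in the other two layers


@dataclass
class AssessmentSection:
    identifier: str
    title: str = ""
    questions: list[Question] = field(default_factory=list)


@dataclass
class AssessmentPart:
    identifier: str
    sections: list[AssessmentSection] = field(default_factory=list)


@dataclass
class AssessmentDefinition:
    identifier: str
    title: str = ""
    parts: list[AssessmentPart] = field(default_factory=list)

    def iter_questions(self) -> Iterator[Question]:
        """Yield questions in document order: part by part, section by section."""
        for part in self.parts:
            for section in part.sections:
                yield from section.questions
```

Note that nothing in these classes refers to prompts, choices, or scores, which mirrors the separation of concerns described for this layer.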

Content and Presentation

Question content and learner interaction (see Figure 5) are modeled through the Question and QuestionBody abstractions. A Question represents a complete assessment item and aggregates content, response declarations, and feedback. The QuestionBody encapsulates the instructional and interactional elements of an item, including prompts, paragraph blocks, and selection constraints. This layered design preserves essential assessment semantics while maintaining a level of abstraction suitable for model-driven transformation and reuse.

UML class diagram showing Question, QuestionBody, Choice, Prompt, and other assessment metamodel classes

Figure 5. Content and Presentation Metamodel
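A possible reading of this layer in Python is sketched below. The attribute names, the use of plain strings for prompts and paragraph blocks, and the `selection_is_valid` helper are assumptions for illustration. Treating `max_choices == 0` as "no upper limit" mirrors the QTI convention for choice interactions. Response declarations are left out here because they belong to the next layer.

```python
from dataclasses import dataclass, field


@dataclass
class Choice:
    identifier: str
    text: str


@dataclass
class QuestionBody:
    prompt: str
    paragraphs: list[str] = field(default_factory=list)
    choices: list[Choice] = field(default_factory=list)
    min_choices: int = 0
    max_choices: int = 1  # assumption: 0 means "no upper limit"

    def selection_is_valid(self, selected: set[str]) -> bool:
        """Check a learner's selection against the selection constraints."""
        known = {c.identifier for c in self.choices}
        if not selected <= known:
            return False  # refers to a choice that does not exist
        if len(selected) < self.min_choices:
            return False
        return self.max_choices == 0 or len(selected) <= self.max_choices


@dataclass
class Question:
    identifier: str
    title: str
    body: QuestionBody
    feedback: list[str] = field(default_factory=list)
```

Keeping the constraints on `QuestionBody` rather than on `Question` means the same body could, in principle, be reused with different response and scoring definitions.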

Response and Evaluation Semantics

A Question (see Figure 6) may be associated with a ResponseDeclaration, which models response semantics independently of the interaction content. This design gives correct and alternative answers a uniform representation while keeping response structure clearly separated from evaluation logic.

UML diagram showing Question, ResponseDeclaration, Answer, Choice, alternatives, and Penalty entities with relationships

Figure 6. Response and Evaluation Semantics Metamodel
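The sketch below shows one way such a response model could be expressed and evaluated in Python. Everything beyond the concept names is an assumption: the attributes, the single-selection scoring rule, and the idea that a penalty is subtracted for answers that are neither correct nor listed as alternatives are illustrative choices, not the evaluation semantics defined in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    choice_id: str      # identifier of the choice this answer refers to
    score: float = 1.0  # credit awarded when this answer is selected


@dataclass
class ResponseDeclaration:
    identifier: str
    correct: list[Answer] = field(default_factory=list)
    alternatives: list[Answer] = field(default_factory=list)
    penalty: float = 0.0  # assumed: deducted for any unlisted answer

    def score(self, selected: str) -> float:
        """Score a single selected choice (hypothetical evaluation rule)."""
        for answer in self.correct + self.alternatives:
            if answer.choice_id == selected:
                return answer.score
        return -self.penalty if self.penalty else 0.0
```

For example, a declaration with correct answer `B` (score 1.0), alternative `C` (score 0.5), and a penalty of 0.25 would score a response of `A` as -0.25. Because the declaration only refers to choices by identifier, the presentation of those choices can change without affecting scoring.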

Model-to-Text Transformation

The final step generates executable LMS-compatible artifacts from the model-based representation using a model-to-text transformation. To demonstrate the feasibility of our approach, Moodle is considered as a concrete target LMS.
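For intuition, a model-to-text step targeting Moodle might look like the following Python sketch. It assumes the target is Moodle's XML question import format and handles only single-answer multiple-choice items. The simplified `Question` and `Choice` classes and the `to_moodle_xml` function are hypothetical, and the generator in the paper may be implemented quite differently.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Choice:
    text: str
    correct: bool = False


@dataclass
class Question:
    title: str
    prompt: str
    choices: list[Choice]


def _text_child(parent: ET.Element, tag: str, text: str, **attrs: str) -> ET.Element:
    """Create <tag attrs><text>...</text></tag>, the wrapper Moodle XML expects."""
    el = ET.SubElement(parent, tag, attrs)
    ET.SubElement(el, "text").text = text
    return el


def to_moodle_xml(questions: list[Question]) -> str:
    """Serialize single-answer multiple-choice questions as a Moodle XML quiz."""
    quiz = ET.Element("quiz")
    for q in questions:
        node = ET.SubElement(quiz, "question", {"type": "multichoice"})
        _text_child(node, "name", q.title)
        _text_child(node, "questiontext", q.prompt, format="html")
        ET.SubElement(node, "single").text = "true"
        for c in q.choices:
            # fraction is the percentage of the grade this answer earns
            _text_child(node, "answer", c.text, fraction="100" if c.correct else "0")
    return ET.tostring(quiz, encoding="unicode")
```

Because the generator reads only from the model, supporting another LMS would mean writing a second serializer rather than changing the earlier phases.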

Figure 7 shows the successful rendering, within the LMS environment, of an item generated by the pipeline from the PDF example in Figure 2. This result illustrates the end-to-end feasibility and practical applicability of the proposed approach.

LMS quiz interface showing binary search algorithm question with multiple choice answers

Figure 7. LMS Rendering of an Assessment Item Generated by the Pipeline.

Evaluation & Results

To evaluate our pipeline, we consider 120 case studies from real-world repositories, including the official IMS QTI examples and the Canterbury Question Bank. The results demonstrate effective semantic mapping, coverage of key constructs, and successful import into an LMS environment.

Try It Yourself

The entire infrastructure is available as an open-source project. If you’re interested in trying our tool, check out the GitHub repository.
