Learning Management Systems (LMSs) increasingly rely on digital assessments to support automated evaluation, content reuse, and flexible learning scenarios. However, assessment content is often created and distributed in document-oriented formats, particularly PDFs. These formats lack explicit structural and semantic information, which makes them difficult to transform into machine-processable representations. Effective techniques for extracting assessment structure and semantics from unstructured or semi-structured documents are therefore essential.
The IMS Question and Test Interoperability (QTI) specification provides a standardized format for representing assessment items. However, its practical adoption remains limited due to fragmented tool support, partial specification coverage, and insufficient integration with execution environments. As a result, assessment content is often restricted to proprietary formats or manually re-authored, thereby increasing development effort and the risk of semantic inconsistencies.
Our recent work (accepted at the RCIS conference) proposes a novel transformation pipeline for the automated generation of LMS-ready assessment content from document-based sources. This pipeline combines Large Language Models (LLMs) with Model-Driven Engineering (MDE) to convert the assessment content into LMS-compatible artifacts. Download the full paper or keep reading for a summary.
As illustrated in the figure below, the pipeline is organized into two phases:
- PDF-to-QTI transformation (LLM-based phase): an LLM transforms assessment content from PDF documents into QTI 3.0–compliant XML.
- QTI-to-LMS transformation (deterministic phase): the QTI XML is converted into a model-based representation, which is then transformed into a concrete LMS format.
PDF-to-QTI Transformation
An LLM-based module transforms assessment content from PDF documents into QTI 3.0–compliant XML. The extracted text is provided to the LLM via a carefully designed prompt that enforces well-formed QTI output and specifies required constructs. This design also enables extensibility to additional input modalities without affecting the deterministic transformation stages.
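A minimal Python sketch of this step, under stated assumptions: the prompt wording, the names `build_prompt` and `is_well_formed`, and the list of required constructs are illustrative and not taken from the paper, and the LLM call itself is left out. It shows how a prompt can name the required QTI constructs and how a reply can be checked for XML well-formedness before it enters the deterministic phase.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of constructs the prompt asks for; the actual prompt may differ.
REQUIRED_CONSTRUCTS = [
    "qti-assessment-item",
    "qti-response-declaration",
    "qti-item-body",
]


def build_prompt(extracted_text: str) -> str:
    """Assemble an illustrative prompt from text extracted from a PDF."""
    constructs = ", ".join(REQUIRED_CONSTRUCTS)
    return (
        "Convert the assessment item below into QTI 3.0 XML.\n"
        "Return only well-formed XML, with no commentary.\n"
        f"The output must contain these elements: {constructs}.\n\n"
        f"Assessment text:\n{extracted_text}"
    )


def is_well_formed(xml_text: str) -> bool:
    """Return True if the reply parses as XML.

    This checks well-formedness only, not validity against the QTI schema.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

A reply that fails `is_well_formed` could be rejected or sent back to the LLM for another attempt, so that only parseable XML reaches the deterministic stages.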
Figure 2 shows an assessment item taken from a PDF and used as input to this phase. Figure 3 shows the outcome: the structured QTI representation produced by the LLM.
QTI-to-LMS Transformation
This phase has two steps. The first transforms the QTI XML into a model-based representation. The second transforms the model into a concrete LMS format.
Text-to-Model Transformation
To enable LMS-independent processing, the QTI XML from the previous phase is parsed into a model based on QTI 3.0. To this end, we propose a QTI-based metamodel that captures the essential semantics required for assessment execution. This abstraction enables clearer reasoning about assessment logic and facilitates deterministic downstream transformations. The metamodel is structured into three layers: (i) Assessment Organization, (ii) Content and Presentation, and (iii) Response and Evaluation Semantics.
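The parsing step can be sketched in Python as follows. This is an illustration, not the tool's implementation: `ParsedItem` and `parse_item` are hypothetical names, only a single-choice item is handled, and the element names and namespace follow our reading of QTI 3.0 and should be checked against the specification.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Namespace assumed for QTI 3.0 assessment items.
QTI_NS = {"qti": "http://www.imsglobal.org/xsd/imsqtiasi_v3p0"}

SAMPLE = """<qti-assessment-item xmlns="http://www.imsglobal.org/xsd/imsqtiasi_v3p0"
    identifier="q1" title="Capital" adaptive="false" time-dependent="false">
  <qti-response-declaration identifier="RESPONSE" cardinality="single" base-type="identifier">
    <qti-correct-response><qti-value>A</qti-value></qti-correct-response>
  </qti-response-declaration>
  <qti-item-body>
    <qti-choice-interaction response-identifier="RESPONSE" max-choices="1">
      <qti-prompt>What is the capital of France?</qti-prompt>
      <qti-simple-choice identifier="A">Paris</qti-simple-choice>
      <qti-simple-choice identifier="B">Rome</qti-simple-choice>
    </qti-choice-interaction>
  </qti-item-body>
</qti-assessment-item>"""


@dataclass
class ParsedItem:
    """Hypothetical, flattened stand-in for a model element."""
    identifier: str
    prompt: str
    choices: dict[str, str] = field(default_factory=dict)
    correct: list[str] = field(default_factory=list)


def parse_item(xml_text: str) -> ParsedItem:
    """Parse one QTI choice item into a ParsedItem."""
    root = ET.fromstring(xml_text)
    interaction = root.find(".//qti:qti-choice-interaction", QTI_NS)
    if interaction is None:
        raise ValueError("no qti-choice-interaction found")
    prompt = interaction.findtext("qti:qti-prompt", default="", namespaces=QTI_NS)
    # Map each choice identifier to its visible label.
    choices = {
        c.get("identifier"): (c.text or "").strip()
        for c in interaction.findall("qti:qti-simple-choice", QTI_NS)
    }
    # Collect the identifiers declared as correct.
    correct = [
        (v.text or "").strip()
        for v in root.findall(
            "qti:qti-response-declaration/qti:qti-correct-response/qti:qti-value",
            QTI_NS,
        )
    ]
    return ParsedItem(root.get("identifier", ""), prompt.strip(), choices, correct)
```

Calling `parse_item(SAMPLE)` yields an object whose fields no longer depend on XML syntax, which is the property the later transformation steps rely on.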
Assessment Organization
Assessments (see Figure 4) are modeled using the concepts AssessmentDefinition, AssessmentPart, AssessmentSection, and Question. These elements capture the logical composition and flow of an assessment. This layer avoids presentation and scoring details to maintain a clear, high-level abstraction of assessment structure.
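As a rough illustration, the organization layer could be written as Python dataclasses. The class names come from the metamodel, but the containment hierarchy (definition, then parts, then sections, then questions), the attribute names, and the helper `questions_in_order` are our assumptions for this sketch.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Question:
    identifier: str
    title: str = ""


@dataclass
class AssessmentSection:
    identifier: str
    questions: list[Question] = field(default_factory=list)


@dataclass
class AssessmentPart:
    identifier: str
    sections: list[AssessmentSection] = field(default_factory=list)


@dataclass
class AssessmentDefinition:
    identifier: str
    parts: list[AssessmentPart] = field(default_factory=list)

    def questions_in_order(self) -> Iterator[Question]:
        """Walk parts and sections in declaration order (hypothetical helper)."""
        for part in self.parts:
            for section in part.sections:
                yield from section.questions
```

Note that nothing here refers to rendering or scoring, in line with the layer's intent of describing structure only.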
Content and Presentation
Question content and learner interaction (see Figure 5) are modeled through the Question and QuestionBody abstractions. A Question represents a complete assessment item and aggregates content, response declarations, and feedback. The QuestionBody encapsulates the instructional and interactional elements of an item, including prompts, paragraph blocks, and selection constraints. This layered design preserves essential assessment semantics while maintaining a level of abstraction suitable for model-driven transformation and reuse.
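One way to picture this layer is the Python sketch below. `Question` and `QuestionBody` are metamodel names, while the attribute names, the `min_choices`/`max_choices` encoding of selection constraints, and the `selection_is_valid` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionBody:
    """Instructional and interactional content of an item."""
    prompt: str
    paragraphs: list[str] = field(default_factory=list)
    choices: dict[str, str] = field(default_factory=dict)  # identifier -> label
    min_choices: int = 0
    max_choices: int = 1

    def selection_is_valid(self, selected: list[str]) -> bool:
        """Check a learner selection against the selection constraints."""
        distinct = set(selected)
        all_known = all(s in self.choices for s in distinct)
        return all_known and self.min_choices <= len(distinct) <= self.max_choices


@dataclass
class Question:
    """A complete item: content plus response declarations and feedback."""
    identifier: str
    body: QuestionBody
    response_declaration_ids: list[str] = field(default_factory=list)
    feedback: list[str] = field(default_factory=list)
```

Keeping the constraints on the body, rather than on a particular widget, is what lets the same item be emitted for different target formats.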
Response and Evaluation Semantics
A Question (see Figure 6) may be associated with the ResponseDeclaration metaclass. The ResponseDeclaration abstraction is employed to model response semantics independently from interaction content. This design ensures uniform representation of correct and alternative answers, while maintaining a clear separation between response structure and evaluation logic.
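The separation between response structure and evaluation logic can be sketched in Python as a data class plus a standalone function. Only `ResponseDeclaration` is a metamodel name. The attributes, the per-value scores for alternative answers, and the `score` function are assumptions for this sketch, not the paper's evaluation rules.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseDeclaration:
    """Response structure only: which values are correct or partially accepted."""
    identifier: str
    correct_values: set[str]
    # Hypothetical encoding of alternative answers: value -> partial score.
    alternative_scores: dict[str, float] = field(default_factory=dict)


def score(declaration: ResponseDeclaration, given: set[str]) -> float:
    """Illustrative evaluation logic, kept outside the declaration itself.

    Full marks for an exact match, otherwise the sum of any
    alternative-answer scores among the given values.
    """
    if given == declaration.correct_values:
        return 1.0
    return sum(declaration.alternative_scores.get(v, 0.0) for v in given)
```

Because `score` only reads the declaration, a different scoring policy can be swapped in without touching the model.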
Model-to-Text Transformation
The final step generates executable LMS-compatible artifacts from the model-based representation using a model-to-text transformation. To demonstrate the feasibility of our approach, Moodle is considered as a concrete target LMS.
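To give a feel for this step, the Python sketch below emits a single-answer multiple-choice question in Moodle's XML import format. That the pipeline targets this particular format is our assumption, and `to_moodle_xml` and its parameters are hypothetical; the real generator works from the full model rather than from loose arguments.

```python
import xml.etree.ElementTree as ET


def to_moodle_xml(name: str, prompt: str, choices: dict[str, str], correct: str) -> str:
    """Render one single-answer multiple-choice question as Moodle XML."""
    if correct not in choices:
        raise ValueError(f"unknown correct choice: {correct}")
    quiz = ET.Element("quiz")
    question = ET.SubElement(quiz, "question", type="multichoice")
    ET.SubElement(ET.SubElement(question, "name"), "text").text = name
    question_text = ET.SubElement(question, "questiontext", format="html")
    ET.SubElement(question_text, "text").text = prompt
    ET.SubElement(question, "single").text = "true"
    for identifier, label in choices.items():
        # The correct answer gets the full grade, every other answer gets none.
        fraction = "100" if identifier == correct else "0"
        answer = ET.SubElement(question, "answer", fraction=fraction, format="html")
        ET.SubElement(answer, "text").text = label
    return ET.tostring(quiz, encoding="unicode")
```

For example, `to_moodle_xml("Capital", "What is the capital of France?", {"A": "Paris", "B": "Rome"}, "A")` returns a `<quiz>` document with one `<question type="multichoice">` element.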
Figure 7 shows the successful rendering of an item generated by the pipeline within the LMS environment, based on the PDF example shown in Figure 2. This result confirms both the correctness and practical applicability of the proposed approach.
Evaluation & Results
To evaluate our pipeline, we consider 120 case studies from real-world repositories, including the official IMS QTI examples and the Canterbury Question Bank. The results demonstrate effective semantic mapping, coverage of key constructs, and successful import into an LMS environment.
Try It Yourself
The entire infrastructure is available as an open-source project. If you’re interested in trying our tool, check out the GitHub repository.






