{"id":7701,"date":"2021-03-16T03:15:17","date_gmt":"2021-03-16T03:15:17","guid":{"rendered":"https:\/\/modeling-languages.com\/?p=7701"},"modified":"2021-08-06T14:45:14","modified_gmt":"2021-08-06T14:45:14","slug":"nlp-architecture-model-autocompletion-domain","status":"publish","type":"post","link":"https:\/\/modeling-languages.com\/nlp-architecture-model-autocompletion-domain\/","title":{"rendered":"An NLP-based architecture for the autocompletion of partial domain models"},"content":{"rendered":"
Domain models capture the key concepts and relationships of a business domain<\/strong>, leaving out superfluous details. During the domain modeling activity carried out as part of a software development project, informal descriptions of a domain are translated into a structured and unambiguous representation using a concrete (formal) notation.<\/p>\n Despite the broad variety of languages (UML<\/a>, DSLs<\/a>, ER<\/a>, etc<\/a>.), tools and methods for domain modeling, these models are typically created by hand,<\/strong> making their definition a crucial (but also time-consuming) task in the development life-cycle. Given that the knowledge to be used as input to define such domain models is already (partially) captured in textual format (manuals, requirements documents, technical reports, transcripts of interviews, etc.) and provided by the different stakeholders in the project, we propose to move towards a more assisted domain model building process<\/strong>.<\/p>\n To facilitate the definition of domain models and improve their quality, we present an approach where a natural language processing-based (NLP-based) assistant will provide autocomplete suggestions for the partial model<\/strong> under construction, based on the automatic analysis of the textual information available for the project (contextual knowledge<\/strong>) and\/or its related business domain (general knowledge<\/strong>). The process will also take into account the feedback collected from the designer’s interaction with the assistant. This is joint work by L. Burgue\u00f1o<\/a>, R. Claris\u00f3<\/a>, S. G\u00e9rard<\/a>, S. Li<\/a> and J. 
Cabot<\/a> that will be part of the 33rd International Conference on Advanced Information Systems Engineering<\/a> (CAiSE’21). Summary slides are also available at the end of the post.<\/p>\n\n Our proposal aims to assist designers while they build their domain models. Given a partial domain model, our system is able to propose new model elements that seem relevant to the model-under-construction but are still missing<\/strong>. That is, it assists the software designer by generating potential new model elements to add to the partial model she is already authoring. We believe this is more realistic than trying to generate full models<\/strong> out of the requirements documents in a fully automated way.<\/p>\n We propose a configurable framework that follows an iterative approach to help in the modeling process. It uses Natural Language Processing (NLP) techniques for the creation of word embeddings from text documents, together with additional NLP tools for the morphological analysis and lemmatization of words<\/strong>. With this NLP support, we have designed a model recommendation engine that queries the NLP models and historical data about previous suggestions accepted or rejected by the designer, and builds and suggests potential new domain model elements to add to the domain model under construction. Our first experiments show the potential of this line of work.<\/p>\n To provide meaningful suggestions, our framework relies on knowledge extracted from textual documents. Two kinds of knowledge\/sources are considered: contextual knowledge<\/strong>, extracted from the documents directly related to the project (requirements documents, technical reports, interview transcripts, etc.), and general knowledge<\/strong>, extracted from generic, publicly available sources such as Wikipedia.<\/p>\n We do not require these documents to follow any specific template to exploit the information they contain.<\/p>\n General and contextual knowledge complement each other. The need for contextual knowledge<\/strong> is obvious and intuitive: designers appreciate suggestions coming from documents directly related to the project they are modeling. 
General knowledge<\/strong> is needed when there is no contextual knowledge or when it is not enough to provide all meaningful suggestions (i.e., it may not cover all the aspects that have to be described in the domain model, as some textual specifications omit aspects considered to be commonly understood by all parties). For instance, project documents may never explicitly state that users have a name, since it is common sense and both concepts go hand in hand. Thus, general sources of knowledge fill the gaps in contextual knowledge and make this implicit knowledge explicit. Leveraging both types of knowledge to provide model autocomplete suggestions to the designer would significantly improve the quality and completeness of the specified domain models. As most common knowledge sources are available as some type of text document (this is especially true for contextual knowledge, embedded in the myriad of documents created during the initial discussions on the scope and features of any software project), we propose to use state-of-the-art NLP techniques to leverage these text-based knowledge sources.<\/p>\n Methods such as GloVe<\/a>, word2vec<\/a>, BERT<\/a> and GPT-3<\/a> create word embeddings<\/em> (i.e., vector representations of words) that preserve certain semantic relationships among words and reflect the context in which they usually appear. For instance, an NLP model trained on a general knowledge corpus is able to tell us that the concepts plane<\/em> and airport<\/em> are more closely related than plane<\/em> and cat<\/em> because they appear together more frequently. As an example, the Stanford NLP Group’s GloVe model pretrained on the Wikipedia corpus estimates that the relatedness (measured as the Euclidean distance between vectors, where a smaller distance means a closer relationship) between plane<\/em> and airport<\/em> is 6.94, while the distance between plane<\/em> and cat<\/em> is 9.04. Relatedness reflects the frequency with which words appear close together in a corpus of text. 
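To make the distance-based notion of relatedness concrete, here is a minimal sketch in Python. The vectors below are illustrative 3-dimensional toy values, not actual GloVe embeddings (pretrained GloVe vectors typically have 50–300 dimensions, and the 6.94/9.04 figures above come from the real pretrained model):

```python
import math

# Toy word embeddings: hypothetical 3-dimensional vectors chosen so that
# "plane" and "airport" lie close together, while "cat" lies far away.
embeddings = {
    "plane":   [0.9, 0.8, 0.1],
    "airport": [0.8, 0.7, 0.2],
    "cat":     [0.1, 0.2, 0.9],
}

def euclidean_distance(u, v):
    """Relatedness proxy: a smaller distance means more closely related words."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d_airport = euclidean_distance(embeddings["plane"], embeddings["airport"])
d_cat = euclidean_distance(embeddings["plane"], embeddings["cat"])

print(f"plane-airport: {d_airport:.2f}")
print(f"plane-cat:     {d_cat:.2f}")
assert d_airport < d_cat  # plane is more related to airport than to cat
```

With real pretrained vectors, the same computation is what yields the plane/airport vs. plane/cat distances quoted above.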
Apart from giving a quantifiable measure of relatedness between words, once an NLP model is trained, it enables us to make queries to obtain an ordered list of the words closest to a given word or set of words. This latter functionality is the one we use in our approach. Another advantage of these techniques is that they are able to deal with text documents regardless of whether they contain structured or unstructured data.<\/p>\n Our framework uses the lexical and semantic information provided by NLP learning algorithms and tools, together with the current state of the partial model and the historical data stored about the designer’s interaction with the framework. As output, it provides recommendations for new model elements (classes, attributes and relationships). The main components of our configurable architecture, as well as the process it follows to generate autocompletion suggestions, are depicted in Fig. 1. The logic of the algorithm implemented for the recommendation engine is depicted using a UML Activity Diagram.<\/p>\nAutocompletion of partial domain models<\/h2>\n
Framework and Process<\/h2>\n
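The core recommendation step described above can be sketched as a small Python function. Everything here is a hypothetical toy: the `embeddings` dictionary stands in for a trained GloVe/word2vec model, and `rejected` stands in for the historical data about suggestions the designer previously turned down. A real deployment would query the pretrained NLP model instead:

```python
import math

# Stand-in for a trained embedding model (word -> vector). Hypothetical
# 3-dimensional toy vectors, not real embedding values.
embeddings = {
    "flight":    [0.9, 0.1, 0.0],
    "passenger": [0.8, 0.2, 0.1],
    "airline":   [0.7, 0.3, 0.0],
    "luggage":   [0.6, 0.4, 0.2],
    "recipe":    [0.0, 0.1, 0.9],
}

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def suggest(partial_model, rejected, k=2):
    """Rank candidate concepts by average distance to the classes already in
    the partial model, skipping concepts already present and concepts the
    designer rejected in earlier iterations."""
    in_model = set(partial_model)
    candidates = [w for w in embeddings if w not in in_model and w not in rejected]

    def avg_dist(word):
        return sum(distance(embeddings[word], embeddings[c])
                   for c in partial_model) / len(partial_model)

    return sorted(candidates, key=avg_dist)[:k]

# The designer's partial model has classes Flight and Passenger, and she
# previously rejected the suggestion "luggage".
print(suggest(["flight", "passenger"], rejected={"luggage"}))  # → ['airline', 'recipe']
```

Each accepted or rejected suggestion feeds back into the next iteration, which is what makes the process iterative rather than a one-shot generation of the full model.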
\n