Robert Tairas explains our work on applying cloning techniques to DSLs (you can also read the full paper, recently presented at the SLE conference).
Code clones are similar fragments of source code. Their degree of similarity varies, ranging from clones that are syntactically identical to clones that are similar because they represent the same semantics.
Research on code clones has received much attention in the past decade (see Code Clones Literature for a listing of clone-related papers and tools). However, most current clone-related research targets clones in source code written in general-purpose languages (GPLs). Cloning in domain-specific languages (DSLs), specifically textual DSLs, has not received as much attention. Programs written in DSLs are typically smaller than programs written in GPLs. However, DSL language constructs are more specific to a domain and hence may be used more often in the code, which can introduce duplication in the form of clones.
Motivated by this, we wanted to determine how relevant cloning is for DSLs, in order to assess the utility of clone-based tools for DSLs. Our paper evaluates cloning in artifacts containing Object Constraint Language (OCL) code, as an initial step toward a broader understanding of cloning in DSLs.
The existence of cloning in DSLs suggests potential application scenarios involving the use of clone detection and its associated techniques. These scenarios include:
- In-place maintenance – An alternative to modularizing clones (e.g., in Java) is to keep track of where the clones are located and notify the user when one of them is edited, so that the user can consider editing the remaining clone(s) as well. A limitation of this technique in GPLs is that clones are identified only once (i.e., at the beginning of the process), because clone detection on programs written in Java and most other GPLs can take a considerable amount of time. Programs written in DSLs are generally smaller than GPL programs, which could allow clone detection to be run much more frequently.
- Optimal language construct – This scenario suggests to the user the optimal language construct among various clones that perform the same functionality. If the clone detection process can recognize as clones language constructs that differ more widely in form but represent the same functionality, a DSL expert can determine the optimal version among these clones.
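The in-place maintenance scenario boils down to an index from each clone's location to its clone group, consulted whenever the user edits a fragment. The following is a minimal Python sketch under our own assumptions, not part of the tooling described in the paper: the `CloneTracker` class, its method names, and the `(file, first_line, last_line)` location format are all hypothetical, and `detect` stands for any clone detector that returns groups of locations.

```python
from dataclasses import dataclass, field


@dataclass
class CloneTracker:
    """Hypothetical helper for in-place clone maintenance.

    A location is any hashable value, here (file, first_line, last_line).
    """
    group_of: dict = field(default_factory=dict)   # location -> clone group id
    members: dict = field(default_factory=dict)    # clone group id -> [locations]

    def record_group(self, group_id, locations):
        self.members[group_id] = list(locations)
        for loc in locations:
            self.group_of[loc] = group_id

    def refresh(self, detect, sources):
        """Re-run a clone detector and rebuild the index from scratch.

        `detect` is any callable returning an iterable of clone groups (each an
        iterable of locations); for small DSL code bases this could run after every edit.
        """
        self.group_of.clear()
        self.members.clear()
        for group_id, locations in enumerate(detect(sources)):
            self.record_group(group_id, locations)

    def on_edit(self, edited):
        """Return the remaining clones the user may want to update as well."""
        group_id = self.group_of.get(edited)
        if group_id is None:
            return []
        return [loc for loc in self.members[group_id] if loc != edited]


tracker = CloneTracker()
tracker.record_group("g1", [("a.ocl", 3, 4), ("b.ocl", 10, 11), ("c.ocl", 7, 8)])
print(tracker.on_edit(("b.ocl", 10, 11)))   # [('a.ocl', 3, 4), ('c.ocl', 7, 8)]
```

The `refresh` method is where DSL size matters: in a GPL setting the detector behind it would typically run once, whereas for small DSL artifacts it could plausibly be called after each edit.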
The scenarios outlined above are only meaningful if cloning is actually relevant in DSLs. As an initial answer to whether it is, we look for clones in OCL using a model-driven engineering (MDE) based detection process. In our case, models represent DSL code, and the goal of the transformations on these models is to determine the duplication in the code. In the top part of Figure 1, a model of the DSL code is generated, and a model transformation is then executed that detects duplicate language constructs. The resulting target model contains information about how the detected clones are grouped: clones that represent the same duplication are placed in the same clone group. The bottom part of Figure 1 shows this process customized for OCL.
Figure 1. Clone detection process (in general and for OCL)
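To make the source-model/target-model split of Figure 1 concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' model transformation: the `Expr` tree is a hypothetical, heavily simplified stand-in for an OCL model (the metaclass names are only illustrative), `CloneGroup` plays the role of the target model, and `detect_clones` groups only exact duplicates without any of the filtering described in the sub-steps below.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Expr:
    """Source-model element (hypothetical): one node of a simplified OCL expression tree."""
    kind: str            # metaclass name, e.g. "OperationCallExp"
    label: str = ""      # operation, property, or variable name
    children: tuple = ()


@dataclass
class CloneGroup:
    """Target-model element (hypothetical): locations of fragments that duplicate each other."""
    members: list = field(default_factory=list)


def subtrees(expr, path=()):
    """Yield (path, subtree) pairs; a path is the tuple of child indices from the root."""
    yield path, expr
    for i, child in enumerate(expr.children):
        yield from subtrees(child, path + (i,))


def detect_clones(models):
    """Toy model-to-model transformation: expression trees in, clone groups out.

    Only exact duplicates are grouped (frozen dataclasses compare and hash by
    value); no size or containment filtering is applied.
    """
    buckets = defaultdict(list)
    for model_id, root in enumerate(models):
        for path, sub in subtrees(root):
            buckets[sub].append((model_id, path))
    return [CloneGroup(locs) for locs in buckets.values() if len(locs) > 1]


def name_neq(a, b):
    """Build the expression tree for  a.name <> b.name."""
    def name_of(v):
        return Expr("PropertyCallExp", "name", (Expr("VariableExp", v),))
    return Expr("OperationCallExp", "<>", (name_of(a), name_of(b)))


groups = detect_clones([name_neq("c1", "c2"), name_neq("c1", "c2")])
for group in groups:
    print(group.members)
```

Because this sketch matches only exact duplicates, `name_neq("c1", "c2")` and `name_neq("a", "b")` would not be grouped, and tiny fragments such as single variables are reported too; these are exactly the gaps that the size, matching, and containment rules of the detection transformation address.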
Our model transformation (i.e., clone detection transformation) can be conceptually separated into three main sub-steps that are described below:
- Count – Determines the size of a language construct (in our case, an OCL expression). This sub-step filters out elements smaller than a specified size so that they are not considered in the next sub-step.
- Match – Determines whether two language constructs match each other based on predefined similarity rules. For this work, we detect clones that are exactly the same and clones that are syntactically the same but may use different names.
- Contain – Determines whether one language construct is contained in (i.e., is a sub-construct of) another. This is used to filter the detection results so that clone groups subsumed by other clone groups are not reported.
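The three sub-steps can be sketched in Python as follows. This is a minimal sketch under our own assumptions, not the paper's implementation: size is measured as the number of nodes in a subtree, the `min_size=3` threshold is hypothetical, "differing names" is approximated by consistently renaming `VariableExp` labels to `v0`, `v1`, ..., and a group counts as subsumed when each of its members lies inside some member of another group. The `Node` tree and the `unique_names` builder are hypothetical simplifications of an OCL model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical simplified model element of an OCL expression tree."""
    kind: str            # metaclass name, e.g. "OperationCallExp"
    label: str = ""      # operation, property, or variable name
    children: tuple = ()


def walk(node, path=()):
    """Yield (path, subtree) pairs; a path is the tuple of child indices from the root."""
    yield path, node
    for i, child in enumerate(node.children):
        yield from walk(child, path + (i,))


def count(node):
    """Count: size of a construct, measured here as the number of nodes in its subtree."""
    return 1 + sum(count(child) for child in node.children)


def match_key(node, renamed=None):
    """Match: constructs match if their keys are equal.

    Variable names are consistently renamed to v0, v1, ... in order of first use,
    so exact clones and clones differing only in variable names share a key.
    """
    renamed = {} if renamed is None else renamed
    label = node.label
    if node.kind == "VariableExp":
        label = renamed.setdefault(label, f"v{len(renamed)}")
    return (node.kind, label, tuple(match_key(c, renamed) for c in node.children))


def contains(outer, inner):
    """Contain: True if location `inner` lies within (or equals) location `outer`."""
    (root_a, path_a), (root_b, path_b) = outer, inner
    return root_a == root_b and path_b[:len(path_a)] == path_a


def detect(roots, min_size=3):
    buckets = defaultdict(list)
    for r, root in enumerate(roots):
        for path, sub in walk(root):
            if count(sub) >= min_size:                       # Count: skip small constructs
                buckets[match_key(sub)].append((r, path))    # Match: group by key
    groups = [locs for locs in buckets.values() if len(locs) > 1]

    def subsumed(g):
        # Contain: every member of g sits inside some member of another group h
        return any(h is not g and all(any(contains(o, i) for o in h) for i in g)
                   for h in groups)

    return [g for g in groups if not subsumed(g)]


def unique_names(x, y):
    """forAll(x, y | x <> y implies x.name <> y.name), as a simplified tree."""
    def var(v):
        return Node("VariableExp", v)

    def name(v):
        return Node("PropertyCallExp", "name", (var(v),))

    body = Node("OperationCallExp", "implies", (
        Node("OperationCallExp", "<>", (var(x), var(y))),
        Node("OperationCallExp", "<>", (name(x), name(y)))))
    return Node("IteratorExp", "forAll", (body,))


print(detect([unique_names("c1", "c2"), unique_names("a", "b")]))  # [[(0, ()), (1, ())]]
```

In the usage line, the two constraints differ only in variable names, so Match groups them; the clone groups for their inner sub-expressions are then dropped by Contain, leaving a single group for the whole `forAll` expression.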
Recent progress in higher-order transformations (HOTs) provides a promising mechanism for automating the construction of transformations for different metamodels. In our case, HOTs could be used to automatically generate the three sub-steps of the clone detection transformation from the metamodel of the DSL in question, since all three sub-steps perform their tasks by traversing models that conform to that metamodel. We envision HOTs that produce, from the metamodel, a transformation for each of the sub-steps, which together provide the functionality needed to perform the detection process on a new DSL.
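A real HOT is itself a model transformation whose output is another transformation, which is beyond a short example. As a loose Python analogy only, the sketch below derives Count and Match functions from a metamodel description instead of hard-coding them for OCL. The dictionary format (`"contains"` for containment features, `"names"` for attributes treated as renamable names) and the `generate_substeps` function are our own assumptions; Contain is omitted for brevity.

```python
def generate_substeps(metamodel):
    """Derive Count and Match functions from a metamodel description.

    metamodel: {metaclass: {"contains": [feature, ...], "names": [attribute, ...]}}
    A model element is a dict with a "class" key plus its feature values.
    """
    def children(elem):
        for feature in metamodel[elem["class"]]["contains"]:
            value = elem.get(feature)
            if value is None:
                continue
            # a containment feature may hold a single element or a list of elements
            yield from (value if isinstance(value, list) else [value])

    def count(elem):
        return 1 + sum(count(child) for child in children(elem))

    def match_key(elem, renamed=None):
        renamed = {} if renamed is None else renamed
        spec = metamodel[elem["class"]]
        attrs = []
        for key in sorted(elem):
            if key == "class" or key in spec["contains"]:
                continue
            value = elem[key]
            if key in spec["names"]:
                # names declared renamable in the metamodel are replaced consistently
                value = renamed.setdefault(value, f"n{len(renamed)}")
            attrs.append((key, value))
        kids = tuple(match_key(child, renamed) for child in children(elem))
        return (elem["class"], tuple(attrs), kids)

    return count, match_key


# A toy metamodel for a tiny expression DSL (hypothetical).
toy_metamodel = {
    "Binary": {"contains": ["left", "right"], "names": []},
    "Var": {"contains": [], "names": ["name"]},
}
count, match_key = generate_substeps(toy_metamodel)

e1 = {"class": "Binary", "op": "<>",
      "left": {"class": "Var", "name": "c1"}, "right": {"class": "Var", "name": "c2"}}
e2 = {"class": "Binary", "op": "<>",
      "left": {"class": "Var", "name": "a"}, "right": {"class": "Var", "name": "b"}}
print(count(e1), match_key(e1) == match_key(e2))  # 3 True
```

Supporting another DSL would then mean supplying a different metamodel description rather than rewriting the traversal code, which is the kind of reuse the HOT-based approach aims for.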
Table 1 lists the sizes of the detected clone groups found in the OCL artifacts that we evaluated. We would like to thank those who responded to a related blog post for suggesting publicly available OCL artifacts. The table suggests that there is considerable cloning within the evaluated OCL artifacts. It also shows that clone groups of small size make up around half of the clone groups reported by the detection process; however, larger clone groups are also present and relevant.
Table 1. Clone group sizes
Some observations of cloning within the detected clone groups include:
- OCL cloning in ATL models – Cloning among multiple ATL models (i.e., inter-duplication) occurred in transformations sharing either the same source or the same target metamodel. We saw this specifically for the XML metamodel, where cloning occurred among models that used XML as either the source or the target metamodel. We did not find much cloning among ATL models that shared neither their source nor their target metamodel. Some of these clones had in fact already been modularized, but only within individual transformation models. We suggest building a general library of helper definitions for common tasks performed in XML-related transformations.
- Commonly used OCL constraints – The OCL expression clone below represents a typical constraint, asserting that two distinct customers cannot have the same name. In this case, modularizing the clones would not be the main goal. Instead, this pattern could be included in a repository of patterns and used to assist users when writing such constraints.
->forAll(c1 : Customer, c2 : Customer |
    c1 <> c2 implies c1.name <> c2.name)
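To spell out what this constraint checks, here is a minimal Python rendering of its semantics. The `Customer` class is a hypothetical stand-in for the model class: a `forAll` with two iterators ranges over every pair `(c1, c2)` of the collection, `p implies q` is equivalent to `not p or q`, and Python object identity (`is`) stands in for OCL's `<>` on model elements.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Customer:
    name: str


def names_unique(customers):
    """Evaluate  ->forAll(c1 : Customer, c2 : Customer | c1 <> c2 implies c1.name <> c2.name)

    forAll over two iterators ranges over all pairs (including c1 == c2);
    'c1 <> c2 implies ...' becomes 'c1 is c2 or ...'.
    """
    return all(c1 is c2 or c1.name != c2.name
               for c1, c2 in product(customers, repeat=2))


alice, bob = Customer("Alice"), Customer("Bob")
print(names_unique([alice, bob]))                      # True
print(names_unique([alice, bob, Customer("Alice")]))   # False
```

In effect, the pattern states that names are unique across the collection. OCL also provides an `isUnique` iterator (e.g., `->isUnique(name)`) that arguably expresses the same intent more compactly; choosing between such equivalent formulations is the kind of suggestion the optimal-language-construct scenario has in mind.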
Based on the evaluation of OCL artifacts, we conclude that cloning in OCL is evident and that interesting clones can be found, which warrants future efforts to provide clone-related maintenance assistance for OCL, including clone detection. We would like to conduct further studies of artifacts (both OCL and other DSLs) to obtain more representative characteristics of cloning within DSLs. However, because DSLs are by nature domain-specific, a general understanding encompassing all DSLs may not be possible, in which case the evaluation will focus on individual DSLs or families of DSLs.