{"id":6396,"date":"2018-05-25T12:20:06","date_gmt":"2018-05-25T12:20:06","guid":{"rendered":"https:\/\/modeling-languages.com\/?p=6396"},"modified":"2018-06-08T10:16:13","modified_gmt":"2018-06-08T10:16:13","slug":"discovery-and-visualization-of-nosql-database-schemas","status":"publish","type":"post","link":"https:\/\/modeling-languages.com\/discovery-and-visualization-of-nosql-database-schemas\/","title":{"rendered":"Discovery and Visualization of NoSQL Database Schemas"},"content":{"rendered":"<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">Database schemas are a key element in relational database systems. Before data can be stored, its structure must be specified in the form of a database schema. Schemas not only restrict the structure of stored data, but also ensure that data are read correctly by database applications. However, with a few exceptions, most NoSQL database systems do not require the definition of schemas. This schemaless nature provides the flexibility required to cope <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">with<\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"> frequent schema changes, and it is one of the most attractive NoSQL database features for developers.<\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"><a href=\"https:\/\/modeling-languages.com\/modeling-big-data-compatible\/\">Not having to define an explicit schema does not mean the absence of a database schema<\/a>, but that <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">it <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">is implicit in the data. In fact, a database is always characterized by the structure of its stored data, which can be explicitly or implicitly specified. 
This schemaless feature is not a novelty of NoSQL systems but a property of semi-structured <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">data<\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"> (e.g. XML or JSON), which are \u201cself-describing\u201d, so a separate schema definition is not needed. The need <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">(or lack thereof)<\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"> to define a database schema is similar to the distinction between static and dynamic typing in programming languages.<\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">Schemaless databases offer some advantages that can be very useful in scenarios where changes <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">in<\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"> the data structure are frequent [1]. For example, they make it easy to have custom fields and non-uniform types for database entities, and data with a new structure can be added at any moment <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">without a schema that would impose those restrictions.<\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"> However, this flexibility should not be obtained at the expense of losing the benefits provided by having schemas.<\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">Developers need to keep the implicit schema in mind when they write (or read) the code of applications that manage NoSQL databases. 
Also, database tools usually require knowledge of the schema to implement their functionality.<\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">Therefore, NoSQL schema extraction is increasingly receiving attention from industry and academia, as discussed in [2]. The report \u201cInsights into NoSQL Modeling\u201d (Dataversity, 2015) [3] highlighted that data modeling will be a crucial activity for NoSQL databases and drew attention to the need for NoSQL tools <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">to<\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"> provide functionality similar to that available for relational databases. In particular, three main types of desired functionality were identified from the survey carried out with data management experts: diagramming, code generation, and metadata management. The report also remarked that schema discovery would be a common task to be implemented to achieve such functionality.<\/span><\/span><\/p>\n<h2>Schemas for NoSQL Databases<\/h2>\n<p>\u201c<span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">NoSQL databases<\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">\u201d actually denotes a varied set of database modeling paradigms that are <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">usually<\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"> grouped into four main types: document, wide-column, key-value and graph-based databases. 
The first three types are categorized as \u201caggregation-oriented paradigms\u201d because object aggregation prevails over connections between objects (i.e. references). More details on this classification can be found in [5].<\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">The notion of schema is well defined for relational databases. However, NoSQL databases can store several versions or variations of a particular entity. For example, a movie database could have movie and director objects with different structures. Next, we show a movie database example that includes three versions of movie objects and three versions of director objects. We will use this example to illustrate the schema visualization.<\/span><\/span><\/p>\n<p id=\"cImXvNT\"><img loading=\"lazy\" decoding=\"async\" width=\"832\" height=\"730\" class=\"size-full wp-image-6398 aligncenter\" src=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f78e301f5.png\" alt=\"\" srcset=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f78e301f5.png 832w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f78e301f5-300x263.png 300w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f78e301f5-768x674.png 768w\" sizes=\"(max-width: 832px) 100vw, 832px\" \/><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">Taking into account that data of the same entity can be stored with different structures (i.e. non-uniform types), we have considered several notions of schema for NoSQL databases:<\/span><\/span><\/p>\n<ul>\n<li><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"><b>Object schema <\/b><\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">(or object type): it is obtained by replacing, recursively, each atomic value of a semi-structured object (JSON in our case) with an identifier that denotes its type (e.g. 
String, Number). <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">The schema extraction process analyzes the set of object schemas to discover the set of entities and the relationships between them.<\/span><\/span><\/li>\n<li><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"><b>Entity version schema<\/b><\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"> (or simply version schema): it is obtained from the <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">object schema of an entity version by replacing each embedded or referenced <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">object with the name of the corresponding embedded or target entity version, <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">respectively. These schemas can specify both root (<\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"><b>root version schema<\/b><\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">) and <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">embedded objects (<\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"><b>embedded version schema<\/b><\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">). 
Next, we show <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">the root version schema for the movie object with _id=\u201d1\u201d (Movie_1, each version is named by the entity name followed by the id number).<\/span><\/span><\/li>\n<\/ul>\n<pre><span style=\"color: #1469b1;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">{<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"title \"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: <\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"String\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"year \"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: <\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"Number\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"genre 
\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: <\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"String\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"director_id \"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: <\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"ref ( Director )\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"prizes\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: <\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"Prize_1\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: 
Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"criticisms \"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: <\/span><\/span><\/span><\/span><span style=\"color: #1469b1;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">[<\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"Criticism_1\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span> <span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"Criticism_2\"<\/span><\/span><\/span><\/span><span style=\"color: #1469b1;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">]<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #1469b1;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">}<\/span><\/span><\/span><\/span><\/pre>\n<ul>\n<li><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"><b>Entity schema<\/b><\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">: <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">T<\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">he set of version schemas of a given entity.<\/span><\/span><\/li>\n<li><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"><b>Entity union schema<\/b><\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">: It is a view of all the version schemas of an entity. 
It can be obtained by merging all the properties contained in the version schemas and applying some rules to resolve name conflicts. We have applied the following rule: when a property name appears in more than one version schema and its type differs among them, the union type is applied. The union schema for the two movie entities of the movie database example would be the following:<\/span><\/span><\/li>\n<\/ul>\n<pre><span style=\"color: #1469b1;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">{<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"title\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: <\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"String\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"year\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: <\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: 
Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"Number\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"genre\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: <\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"String\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"director_id\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: <\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"ref ( Director )\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"ratings\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: 
<\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"Rating_1\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"running_time\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: <\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"Number\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"criticisms\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: <\/span><\/span><\/span><\/span><span style=\"color: #1469b1;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">[<\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"Criticism_1\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: 
Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"Criticism_2\"<\/span><\/span><\/span><\/span><span style=\"color: #1469b1;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">]<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">,<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #000000;\"> <span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"prizes\"<\/span><\/span><\/span><\/span><span style=\"color: #9a0000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">: <\/span><\/span><\/span><\/span><span style=\"color: #000000;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">\"Prize_1\"<\/span><\/span><\/span><\/span>\r\n<span style=\"color: #1469b1;\"><span style=\"font-family: Consolas, serif;\"><span style=\"font-size: small;\"><span lang=\"en-US\">}<\/span><\/span><\/span><\/span><\/pre>\n<h2>Discovering NoSQL Schemas<\/h2>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">We have implemented a model-driven reverse engineering approach to infer NoSQL schemas for aggregation-oriented systems. The main features of our inference strategy with respect to other proposed approaches (such as the <a href=\"https:\/\/modeling-languages.com\/json-schema-discoverer\/\" target=\"_blank\" rel=\"noopener\">JSONDiscoverer<\/a>) are the following: (i) to extract all the versions or variations of each entity; (ii) to discover all the relationships among the extracted entity versions, both aggregations and references; and (iii) to consider the scalability and performance of the inference algorithm. 
Instead of obtaining a succinct, approximate or skeleton schema, we record all the entity versions and the relationships between them.<\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">The schema discovery strategy consists of the three stages shown in the following figure.<\/span><\/span><\/p>\n<p id=\"RXdFwRX\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-6399 aligncenter\" src=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f7a0db7c8.png\" alt=\"\" width=\"669\" height=\"333\" srcset=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f7a0db7c8.png 1197w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f7a0db7c8-300x149.png 300w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f7a0db7c8-768x382.png 768w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f7a0db7c8-1024x510.png 1024w\" sizes=\"(max-width: 669px) 100vw, 669px\" \/><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">First, a MapReduce operation is applied to access the database directly and obtain the minimum set of JSON objects needed to apply the inference process. Second, the object schema for each JSON object is calculated (version archetypes). Third, these object versions are analyzed to discover entities and relationships. 
This information is represented as a model that conforms to the following metamodel.<\/span><\/span><\/p>\n<p id=\"XCBpdPj\"><img loading=\"lazy\" decoding=\"async\" width=\"1404\" height=\"706\" class=\"size-full wp-image-6400 aligncenter\" src=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f7b6bc427.png\" alt=\"\" srcset=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f7b6bc427.png 1404w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f7b6bc427-300x151.png 300w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f7b6bc427-768x386.png 768w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f7b6bc427-1024x515.png 1024w\" sizes=\"(max-width: 1404px) 100vw, 1404px\" \/><\/p>\n<h2>Visualization of NoSQL Schemas<\/h2>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">The hierarchical structure of version schemas can be appropriately represented with class diagrams. 
For example, Movie_3 would be represented by the following diagram:<\/span><\/span><\/p>\n<p id=\"shIaCbK\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-6410 aligncenter\" src=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fbc98ec79.png\" alt=\"\" width=\"623\" height=\"380\" srcset=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fbc98ec79.png 935w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fbc98ec79-300x183.png 300w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fbc98ec79-768x468.png 768w\" sizes=\"(max-width: 623px) 100vw, 623px\" \/><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">These diagrams have been generated with PlantUML (<\/span><\/span><span style=\"color: #0563c1;\"><u><a href=\"http:\/\/plantuml.com\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">http:\/\/plantuml.com\/<\/span><\/span><\/a><\/u><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">). A model-to-text transformation generates the PlantUML code from the extracted NoSQL model. Each entity version is represented as a class, and a letter within a small circle is <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">used to distinguish the root entities (\u201cR\u201d) from the embedded entity versions (\u201cV\u201d). <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">All the version schemas directly or indirectly nested in the root entity version are shown <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">by means of unidirectional composition relationships whose name and cardinality <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">are the same as those of the corresponding aggregation elements of the schema model. 
<\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">The letter \u201cE\u201d denotes the referenced entities. <\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">Entity union schemas have also been represented as class diagrams by using PlantUML. When several version schemas have a property with the same name but different types, the union type inferred for this property can be visualized only when it combines exactly two variants, one in which the property has a primitive type (or a tuple) and another in which it is a relationship; any other union of types would cause an error in PlantUML. The representation of the Movie union schema would be the following:<\/span><\/span><\/p>\n<p id=\"GyRXGpe\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-6411 aligncenter\" src=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fbda7aeab.png\" alt=\"\" width=\"675\" height=\"382\" srcset=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fbda7aeab.png 1299w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fbda7aeab-300x170.png 300w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fbda7aeab-768x435.png 768w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fbda7aeab-1024x579.png 1024w\" sizes=\"(max-width: 675px) 100vw, 675px\" \/><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">A diagram of the entity database schema is formed by superimposing all the diagrams for root union schemas. 
In our database example, this diagram would be the same as that shown above.<\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">UML class diagrams cannot represent entity schemas or database schemas, and <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">visualizing the other kinds of schemas has significant limitations. Therefore, we have defined a specific notation to visualize all the kinds of NoSQL schemas considered. We have taken advantage of the fact that the extracted model conforms to an Ecore metamodel to develop our visualization tool. We have used Sirius, a robust and powerful tool for defining graphical notations for existing metamodels. Sirius automatically generates an editor and injector from the notation specified by the developer. We have created a tree view and diagrams to represent schemas with Sirius. In addition, we have added some browsing and navigation capabilities.<\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">A <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"><b>Global View<\/b><\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"> shows a tree with three branches: <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"><i>Schemas<\/i><\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">, <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"><i>Inverted Index<\/i><\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">, and <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"><i>Entities<\/i><\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">. 
The global view for our movie database example would be the following:<\/span><\/span><\/p>\n<p id=\"mVesfEl\"><img loading=\"lazy\" decoding=\"async\" width=\"747\" height=\"601\" class=\"size-full wp-image-6406 aligncenter\" src=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f9992e8f0.png\" alt=\"\" srcset=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f9992e8f0.png 747w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f9992e8f0-300x241.png 300w\" sizes=\"(max-width: 747px) 100vw, 747px\" \/><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">Schemas lists all the root entities with their version schemas; given a root version schema, the user can browse its embedded and referenced schemas. Inverted Index provides an inverted index of versions that lets the user navigate from a root or embedded version schema to all the root version schemas that reference it. Entities lists all the entities that exist in the database; the user can select an entity to display its entity versions and then inspect their properties and types. These three branches show, in different forms, the information included in the database schema. 
It is possible to navigate from <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"><i>the Global View Tree<\/i><\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"> to the other diagrams by means of contextual menus.<\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">Next, we show the diagrams defined for entity schemas, database schemas, and union database schemas.<\/span><\/span><\/p>\n<p id=\"qWVoYPC\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-6407 aligncenter\" src=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f9ae786ad.png\" alt=\"\" width=\"549\" height=\"487\" srcset=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f9ae786ad.png 764w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f9ae786ad-300x266.png 300w\" sizes=\"(max-width: 549px) 100vw, 549px\" \/><\/p>\n<p id=\"vcEWxir\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-6408 aligncenter\" src=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f9f5ebdf4.png\" alt=\"\" width=\"720\" height=\"489\" srcset=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f9f5ebdf4.png 1099w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f9f5ebdf4-300x204.png 300w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f9f5ebdf4-768x522.png 768w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f9f5ebdf4-1024x696.png 1024w\" sizes=\"(max-width: 720px) 100vw, 720px\" \/><\/p>\n<p id=\"WCsPwnd\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-6405 aligncenter\" src=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f8e2174db.png\" alt=\"\" width=\"824\" height=\"327\" srcset=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f8e2174db.png 1296w, 
https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f8e2174db-300x119.png 300w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f8e2174db-768x305.png 768w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07f8e2174db-1024x407.png 1024w\" sizes=\"(max-width: 824px) 100vw, 824px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">We have validated our tools with several MongoDB datasets generated by importing open data from sources such as <em>Stackoverflow<\/em>, <em>Facebook<\/em>, or <em>EveryPolitician<\/em>. Next, we show the schema obtained for Stackoverflow.<\/span><\/span><\/p>\n<p id=\"GpsJXPR\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-6409 aligncenter\" src=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fb6ba04bb.png\" alt=\"\" width=\"812\" height=\"483\" srcset=\"https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fb6ba04bb.png 1421w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fb6ba04bb-300x178.png 300w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fb6ba04bb-768x457.png 768w, https:\/\/modeling-languages.com\/wp-content\/uploads\/2018\/05\/img_5b07fb6ba04bb-1024x609.png 1024w\" sizes=\"(max-width: 812px) 100vw, 812px\" \/><\/p>\n<p><span lang=\"en-US\">Further information on the schema extraction process and the database utilities we developed can be found in [2]. A discussion of the limitations of our approach and some directions for further work can be found in [6]. 
<\/span><span lang=\"en-US\">The schema extraction and schema visualization tooling can be found in the repository <\/span><u><a href=\"https:\/\/github.com\/catedrasaes-umu\/NoSQLDataEngineering\/\" target=\"_blank\" rel=\"noopener\"><span lang=\"en-US\">https:\/\/github.com\/catedrasaes-umu\/NoSQLDataEngineering\/.<\/span><\/a><\/u><\/p>\n<p>The work presented here has been developed by Alberto Hern\u00e1ndez, Diego Sevilla,\u00a0Severino Feliciano and Jes\u00fas Garc\u00eda-Molina under the <a href=\"http:\/\/www.modelum.es\" target=\"_blank\" rel=\"noopener\">Modelum Group<\/a> and the <a href=\"http:\/\/www.catedrasaes.org\" target=\"_blank\" rel=\"noopener\">C\u00e1tedra SAES of the University of Murcia<\/a>.<\/p>\n<h2>References<\/h2>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">[1] Martin Fowler, <\/span><\/span><span style=\"color: #0563c1;\"><u><a href=\"https:\/\/martinfowler.com\/articles\/schemaless\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">https:\/\/martinfowler.com\/articles\/schemaless\/<\/span><\/span><\/a><\/u><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">[2] Severino Feliciano, \u201cInferring NoSQL Data Schemas with <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">Model-Driven Engineering Techniques\u201d, Doctoral Thesis, Murcia University, 2017. 
<\/span><\/span><span style=\"color: #222222;\"><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">\u00a0(<\/span><\/span><\/span><span style=\"color: #0563c1;\"><u><a href=\"http:\/\/hdl.handle.net\/10201\/53472\" target=\"_blank\" rel=\"noopener\"><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">http:\/\/hdl.handle.net\/10201\/53472<\/span><\/span><\/a><\/u><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">)<\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">[3] Insights into NoSQL Modeling: A Dataversity Report, 2015. <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">(<\/span><\/span><span style=\"color: #0563c1;\"><u><a href=\"http:\/\/forms.embarcadero.com\/2015-Dataversity-Survey-Report?cid=701G0000000tKU2\" target=\"_blank\" rel=\"noopener\"><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">http:\/\/forms.embarcadero.com\/2015-Dataversity-Survey-Report?cid=701G0000000tKU2<\/span><\/span><\/a><\/u><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\"> )<\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\">[4] Sevilla Ruiz, D., Feliciano Morales, S., Garc\u00eda Molina, J.: Inferring Versioned <\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">Schemas from NoSQL Databases and Its Applications. In Proceedings ER 2015, pp. 467\u2013480 (2015).<\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">[5] Sadalage, P., Fowler, M.: NoSQL Distilled. A Brief Guide to the Emerging World <\/span><\/span><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">of Polyglot Persistence. 
Addison-Wesley (2012)<\/span><\/span><\/p>\n<p><span style=\"font-family: Arial, serif;\"><span lang=\"en-US\">[6] <a href=\"https:\/\/dblp.uni-trier.de\/pers\/hd\/c\/Chill=oacute=n:Alberto_Hern=aacute=ndez\" target=\"_blank\" rel=\"noopener\">Alberto Hern\u00e1ndez Chill\u00f3n<\/a>,\u00a0<a href=\"https:\/\/dblp.uni-trier.de\/pers\/hd\/m\/Morales:Severino_Feliciano\" target=\"_blank\" rel=\"noopener\">Severino Feliciano Morales<\/a>,\u00a0<a href=\"https:\/\/dblp.uni-trier.de\/pers\/hd\/r\/Ruiz:Diego_Sevilla\" target=\"_blank\" rel=\"noopener\">Diego Sevilla<\/a>,\u00a0<a href=\"https:\/\/dblp.uni-trier.de\/pers\/hd\/m\/Molina:Jes=uacute=s_Garc=iacute=a\" target=\"_blank\" rel=\"noopener\">Jes\u00fas Garc\u00eda Molina<\/a>: Exploring the Visualization of Schemas for Aggregate-Oriented NoSQL Databases.\u00a0<\/span><\/span><span style=\"font-family: Arial, serif;\"><a href=\"https:\/\/dblp.uni-trier.de\/db\/conf\/er\/erf2017.html#ChillonMSM17\" target=\"_blank\" rel=\"noopener\">ER Forum\/Demos\u00a02017<\/a>:\u00a072-85 (<\/span><span style=\"color: #0563c1;\"><u><a href=\"http:\/\/ceur-ws.org\/Vol-1979\/paper-11.pdf\" target=\"_blank\" rel=\"noopener\"><span style=\"color: #1155cc;\"><span style=\"font-family: Arial, serif;\">http:\/\/ceur-ws.org\/Vol-1979\/paper-11.pdf<\/span><\/span><\/a><\/u><\/span><span style=\"font-family: Arial, serif;\">)<\/span><\/p>\n<span class=\"et_bloom_bottom_trigger\"><\/span>","protected":false},"excerpt":{"rendered":"<p>Most NoSQL database systems do not require the definition of schemas but this does not mean such schema does not (implicitly) exist. 
We have implemented a model-driven reverse engineering approach to infer such NoSQL implicit schemas<\/p>\n","protected":false},"author":52,"featured_media":6399,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[17,36,24],"tags":[142,95,271,153,433,627,622,629],"hashtags":[],"_links":{"self":[{"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/posts\/6396"}],"collection":[{"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/users\/52"}],"replies":[{"embeddable":true,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/comments?post=6396"}],"version-history":[{"count":0,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/posts\/6396\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/media\/6399"}],"wp:attachment":[{"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/media?parent=6396"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/categories?post=6396"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/tags?post=6396"},{"taxonomy":"hashtags","embeddable":true,"href":"https:\/\/modeling-languages.com\/wp-json\/wp\/v2\/hashtags?post=6396"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}