{"id":6396,"date":"2018-05-25T12:20:06","date_gmt":"2018-05-25T12:20:06","guid":{"rendered":"https:\/\/modeling-languages.com\/?p=6396"},"modified":"2018-06-08T10:16:13","modified_gmt":"2018-06-08T10:16:13","slug":"discovery-and-visualization-of-nosql-database-schemas","status":"publish","type":"post","link":"https:\/\/modeling-languages.com\/discovery-and-visualization-of-nosql-database-schemas\/","title":{"rendered":"Discovery and Visualization of NoSQL Database Schemas"},"content":{"rendered":"

Database schemas are a key element of relational database systems. Before data can be stored, its structure must be specified in the form of a database schema. Schemas not only restrict the structure of stored data, but also ensure that database applications read the data correctly. However, with a few exceptions, most NoSQL database systems do not require schemas to be defined. This schemaless nature provides the flexibility required to cope <\/span><\/span>with<\/span><\/span> frequent schema changes, and it is one of the NoSQL features that developers find most attractive.<\/span><\/span><\/p>\n

Not having to define an explicit schema does not mean the absence of a database schema<\/a>, but that <\/span><\/span>it <\/span><\/span>is implicit in the data. In fact, a database is always characterized by the structure of its stored data, which can be specified explicitly or implicitly. Being schemaless is not a novelty of NoSQL systems but a property of semi-structured <\/span><\/span>data<\/span><\/span> (e.g. XML or JSON), which are \u201cself-describing\u201d and therefore do not need a separate schema definition. The need <\/span><\/span>(or lack thereof)<\/span><\/span> to define a database schema is similar to the distinction between static and dynamic typing in programming languages.<\/span><\/span><\/p>\n

Schemaless databases offer some advantages that can be very useful in scenarios where changes <\/span><\/span>in<\/span><\/span> the data structure are frequent [1]. For example, they make it easy to have custom fields and non-uniform types for database entities, and data with a new structure can be added at any time <\/span><\/span>because no schema imposes restrictions on it.<\/span><\/span> However, this flexibility should not come at the expense of the benefits that schemas provide.<\/span><\/span><\/p>\n

Developers need to keep the implicit schema in mind when they write (or read) the code of applications that manage NoSQL databases. In addition, database tools usually need to know the schema to implement their functionality.<\/span><\/span><\/p>\n

Therefore, NoSQL schema extraction is receiving increasing attention from industry and academi<\/span><\/span>a<\/span><\/span>, as discussed in [2]. The report \u201cInsights into NoSQL Modeling\u201d (Dataversity, 2015) [3] highlighted that data modeling will be a crucial activity for NoSQL databases and drew attention to the need for NoSQL tools <\/span><\/span>to<\/span><\/span> provide functionality similar to that available for relational databases. In particular, the survey carried out among data management experts identified three main types of desired functionality: diagramming, code generation, and metadata management. The report also remarked that schema discovery would be a common task required to achieve such functionality.<\/span><\/span><\/p>\n

Schemas for NoSQL Databases<\/h2>\n

\u201cNoSQL database<\/span><\/span>s<\/span><\/span>\u201d actually denotes a varied set of database modeling paradigms that are <\/span><\/span>usually<\/span><\/span> grouped into four main types: document, wide column, key-value stores and graph-based databases. The first three types are categorized as \u201caggregation-oriented paradigms\u201d because object aggregation prevails over connections between objects (i.e. references). More details on this classification can be found in [5].<\/span><\/span><\/p>\n

The notion of schema is well-defined for relational databases. However, NoSQL databases can store several versions or variations of a particular entity. For example, a movie database could have movie and director objects with different structures. Next, we show a movie database example that includes three versions of movie objects and three versions of director objects. We will use this example to illustrate the schema visualization.<\/span><\/span><\/p>\n

\"\"<\/p>\n

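The idea of entity versions described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the approach used by the authors: the sample movie objects, their field names, and the helper functions structure_of and entity_versions are all hypothetical, and a version is assumed here to be a distinct set of field names with their value types.

```python
from collections import defaultdict

# Hypothetical sample data: field names and values are invented for
# illustration and are not taken from the example shown in the article.
movies = [
    {'title': 'Movie A', 'year': 2001, 'director_id': 1},
    {'title': 'Movie B', 'year': 1999, 'director_id': 2, 'genre': 'Drama'},
    {'title': 'Movie C', 'year': 2010, 'director_id': 1},
    {'title': 'Movie D', 'year': 2015, 'director_id': 3, 'genre': 'Comedy', 'rating': 7.5},
]


def structure_of(obj):
    """Return a hashable description of an object's structure:
    its field names paired with the Python type name of each value."""
    return tuple(sorted((name, type(value).__name__) for name, value in obj.items()))


def entity_versions(objects):
    """Group objects that share exactly the same structure."""
    versions = defaultdict(list)
    for obj in objects:
        versions[structure_of(obj)].append(obj)
    return dict(versions)


versions = entity_versions(movies)
for number, (structure, members) in enumerate(versions.items(), start=1):
    fields = ', '.join(f'{name}: {kind}' for name, kind in structure)
    print(f'Movie version {number} ({len(members)} objects): {fields}')
```

With this sample data, the first and third objects share one structure, while the other two each form a version of their own, so the movie entity ends up with three versions.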
Taking into account that data of the same entity can be stored with different structures (i.e. non-uniform types), we have considered several notions of schema for NoSQL databases:<\/span><\/span><\/p>\n