JSONDiscoverer: Visualizing the schema lurking behind JSON documents


Today we present the new version of our JSON Discoverer tool, which helps to discover the schema behind JSON documents. In short, given a set of schemaless JSON documents, our tool parses them to infer and make explicit the implicit schema shared by all documents. This can be especially useful for learning how to work with data stored in a NoSQL backend or behind an API, as happened with one happy user.

In this blog post, we recap the tool's features and goals and include at the end a longer summary of our latest publication on this topic, titled "JSONDiscoverer: Visualizing the schema lurking behind JSON documents", to appear in the Knowledge-Based Systems journal.

In recent years, the JavaScript Object Notation (JSON) has been gaining popularity since it provides a lightweight data exchange format with a significant performance improvement. JSON consists of sets of objects described by name/value pairs. It is schemaless, i.e., there is no explicit structural definition of JSON objects; instead, the structure is implicit. Schemaless data is particularly interesting when dealing with non-uniform data or schema migration. However, it can become a burden in data integration scenarios (e.g., consuming JSON-based APIs), where it becomes necessary to discover the underlying structure, at least partially, in order to process the data properly.
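To make "implicit structure" concrete, here is a minimal sketch in Python. The two documents are made up for illustration (they do not come from any real API): nothing in either one declares its structure, yet comparing them reveals which name/value pairs are shared and which appear only sometimes.

```python
import json

# Two hypothetical responses from the same (imaginary) service.
doc_a = json.loads('{"id": 1, "name": "Ada", "email": "ada@example.org"}')
doc_b = json.loads('{"id": 2, "name": "Linus", "tags": ["kernel"]}')


def field_types(obj):
    """Map each name in a JSON object to the Python type name of its value."""
    return {name: type(value).__name__ for name, value in obj.items()}


# Names present in both documents, and names present in only one of them.
shared = sorted(set(doc_a) & set(doc_b))
only_one = sorted(set(doc_a) ^ set(doc_b))

print(field_types(doc_a))
print("shared:", shared)
print("in only one document:", only_one)
```

A consumer has to perform this kind of comparison by hand (or with a tool) before it can rely on any field being present.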

Nowadays, a considerable number of web applications provide an external API consisting of a set of interrelated JSON-based services. Each service gives access to a subset of the application domain, and developers must combine them to build any kind of non-trivial functionality on top of that API. Since JSON is a schemaless format, deducing the right way of combining those services is not a trivial task. JSONDiscoverer aims to free developers from performing these tasks by inferring and visualizing the implicit schema of JSON data as well as the possible composition links among JSON-based Web APIs.

The figure below illustrates the typical development scenario where JSON-based Web APIs define a set of services, each one returning JSON documents when they are called.

[Figure: doc-example]

Our tool applies a discovery process to uncover the data model (i.e., schema) behind JSON-based Web APIs and assists in the discovery of composition links among them. These are currently the main functionalities provided:

  • Simple discovery, which discovers the schema of a given set of JSON documents returned by a single service. You can find a more detailed description of this step in our paper and the corresponding presentation slides.
  • Advanced discovery, which discovers the schema from a set of JSON-based services. First, the schema of each JSON-based service is discovered (by using the simple discoverer); then the resulting schemas are composed to obtain a general one. You can find a more detailed description of this step in our paper and the corresponding presentation slides.
  • API Composer, which takes a set of API schemas, looks for composition links (i.e., common concepts or attributes) and generates a composition graph. The result is used to help developers compose APIs. The tool currently incorporates a sequence diagram generator to visualize API compositions. This paper describes the process in detail (and these are the presentation slides).
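To give a feel for what simple discovery involves, here is a rough Python sketch. It is not the tool's actual algorithm (the real discovery rules are described in the papers); the function names, the output layout and the rule of naming a nested concept after the attribute that holds it are all assumptions made for this example.

```python
import json


def json_type(value):
    """Return a coarse JSON type name for a parsed value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    return "object"


def discover(concept, documents):
    """Infer {concept: {attribute: {"types": [...], "required": bool}}}.

    Nested objects become concepts of their own, named after the attribute
    that holds them (a naming assumption of this sketch).
    """
    raw = {}

    def visit(name, obj):
        entry = raw.setdefault(name, {"count": 0, "attrs": {}})
        entry["count"] += 1
        for key, value in obj.items():
            attr = entry["attrs"].setdefault(key, {"types": set(), "seen": 0})
            attr["seen"] += 1
            attr["types"].add(json_type(value))
            if isinstance(value, dict):
                visit(key, value)
            elif isinstance(value, list):
                for item in value:
                    if isinstance(item, dict):
                        visit(key, item)

    for doc in documents:
        visit(concept, doc)

    # An attribute is "required" only if every instance of the concept had it.
    return {
        name: {
            key: {"types": sorted(a["types"]), "required": a["seen"] == entry["count"]}
            for key, a in entry["attrs"].items()
        }
        for name, entry in raw.items()
    }


# Hypothetical documents returned by a single (imaginary) service.
docs = [
    json.loads('{"id": 1, "name": "Ada", "address": {"city": "London"}}'),
    json.loads('{"id": 2, "name": "Linus", "email": "l@example.org"}'),
]
schema = discover("User", docs)
print(json.dumps(schema, indent=2))
```

Note that "email" and "address" can only be recognized as optional because two documents were provided, which is why feeding the discoverer several examples pays off.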

Our tool draws schema information as UML class diagrams, including concepts (i.e., classes) and their properties (i.e., attributes and associations linking the different concepts). As the tool leverages EMF (Eclipse Modeling Framework) to represent the schema information, Ecore models can also be obtained from your JSON documents. Potential API compositions are represented by means of UML sequence diagrams showing the possible sequence of API calls.

The tool is available on GitHub, where you will find the sources to import the tool into your application. We have also developed a website to use the tool in any web browser.


The following demonstration video shows how to use the website:



This tool is part of one of our research lines, and several papers have been published about it. Our latest paper describes the latest version of the tool, which incorporates the features described above. It is available here, but you can also read a summary below.

Abstract

The so-called API economy is pushing more and more companies to provide open Web APIs to access their data, typically using the JavaScript Object Notation (JSON) as the data interchange format. While JSON has been designed to be easy to read and parse, its structure is implicit. This poses a serious problem when consuming and integrating Web APIs from different sources, since it forces us to manually analyze each individual API in detail. This paper presents JSONDiscoverer, a tool that alleviates this problem by discovering (and visualizing) the implicit schema of JSON documents as well as possible composition links among JSON-based Web APIs.

1  Introduction

The number of Web APIs is growing every day (the ProgrammableWeb website alone indexes over 13,000 APIs), opening the door to an unlimited number of new services built on top of such APIs. Most of those Web APIs use JavaScript Object Notation (JSON) as a data interchange format. JSON mimics the JavaScript syntax, which makes it human-readable and easy to parse. However, it is schemaless, i.e., JSON documents do not include an explicit definition of the structure of the JSON objects contained in them.

This has several advantages, but it is a serious problem when developing Web services that need to consume and exchange information among a set of APIs, since developers need to figure out the structure of the JSON data provided by each one and the possible relationships between them. JSONDiscoverer aims to free developers from performing these tasks by inferring and visualizing the implicit schema of JSON data as well as the possible composition links among JSON-based Web APIs. The tool has been made available as a web application. Since its release, JSONDiscoverer has been used to parse on average 375 JSON documents each month.

2  Problem and Background

The first thing needed to reuse/combine Web APIs is a good understanding of the data model behind them: what the data is about, what attributes each data object has, how they are related and so on.

JSON is schemaless, and the few languages for specifying Web APIs are either still under development (e.g., RAML or Swagger, which allow specifying API services and their parameters but not the full API schema) or not widely used (e.g., JSON Schema [1]). As a result, developers must manually test Web APIs and try to deduce the data model lurking behind their services. Earlier research efforts (e.g., [2] and [3]) could be applied to analyze JSON documents; however, they are specifically tailored to NoSQL databases and do not provide assistance for integrating Web APIs.

Figure 1 (grey boxes) illustrates the typical scenario when trying to integrate several JSON-based Web APIs. First, developers test each service provided and reverse engineer the implicit structure of the JSON data returned when calling them. Then, these individual service schemas need to be combined to build the full Web API schema (i.e., the global data model the API is giving access to). If two or more Web APIs need to be integrated, a last step is required to identify possible connections by searching for similar JSON elements that could represent the same concept. In addition, the signature of each individual Web API service must also be considered to guarantee the accessibility of the target resources. Needless to say, this is a time-consuming and error-prone task.


Figure 1: Discovery of the implicit structure from a set of JSON-based Web APIs. The main functionalities of our tool are represented with black-filled rounded boxes while input/output data is depicted as white-filled boxes.

3  Software Functionalities

JSONDiscoverer alleviates this situation by executing an automatic discovery process on the JSON data that uncovers the schema behind JSON-based Web APIs and suggests possible composition paths. The tool provides three main functionalities, which can be used separately or chained (see black-filled boxes in Figure 1):

  1. Simple discovery. It discovers the schema of a given set of JSON documents returned by a single service. The more documents provided, the better, since some properties of the (implicit) model can only be deduced from several examples.
  2. Advanced discovery. It takes the output of the simple discoverer for each service of a given Web API to infer its global schema.
  3. Composer. It takes a set of inferred Web API schemas and looks for composition links (i.e., common concepts or attributes). As a result, it generates a composition graph. The Composition Assistant will use this graph to help you find composition paths among the Web APIs.
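The second and third functionalities can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the rules from [4] and [5]: schemas are plain dictionaries of the form {concept: {attribute: type}}, concepts are merged when their names are equal, and a composition link is reported when two concepts from different APIs share an attribute with the same name and type. All names and sample data below are hypothetical.

```python
from itertools import combinations


def merge_service_schemas(service_schemas):
    """Advanced-discovery sketch: union concepts that share a name across services.

    If two services disagree on an attribute type, the last one wins here;
    a real implementation would need an explicit conflict rule.
    """
    merged = {}
    for schema in service_schemas.values():
        for concept, attrs in schema.items():
            merged.setdefault(concept, {}).update(attrs)
    return merged


def composition_links(api_schemas):
    """Composer sketch: link concepts of different APIs with common attributes."""
    links = []
    for (api_a, schema_a), (api_b, schema_b) in combinations(sorted(api_schemas.items()), 2):
        for concept_a, attrs_a in sorted(schema_a.items()):
            for concept_b, attrs_b in sorted(schema_b.items()):
                # Same attribute name and same type in both concepts.
                common = sorted(k for k in attrs_a if attrs_a[k] == attrs_b.get(k))
                if common:
                    links.append(((api_a, concept_a), (api_b, concept_b), common))
    return links


# Hypothetical per-service schemas of one API, as simple discovery might produce them.
transport_services = {
    "stops": {"Stop": {"name": "string", "lat": "number", "lon": "number"}},
    "lines": {"Line": {"id": "number"}, "Stop": {"name": "string", "zone": "string"}},
}
transport = merge_service_schemas(transport_services)

# Hypothetical schema of a second API.
places = {"Place": {"title": "string", "lat": "number", "lon": "number"}}

links = composition_links({"places": places, "transport": transport})
for link in links:
    print(link)
```

The resulting list of links plays the role of the composition graph: each entry is an edge between two concepts, labeled with the attributes that could be used to pass data from one API call to the next.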

JSONDiscoverer draws schema information as UML class diagrams, including concepts (i.e., classes) and their properties (i.e., attributes and associations linking the different concepts). Potential Web API compositions are represented by means of UML sequence diagrams showing the possible sequence of Web API calls.

More information about the discovery rules applied in the simple and advanced discoverers can be found in [4], while the composition rules and sequence diagram generation are described in [5].

4  Implementation

JSONDiscoverer has been developed as a servlet-based web application including: (1) a backend developed in Java that provides the functionalities listed above; and (2) a front-end website implemented as an AngularJS web application. The website includes overlays to help newcomers use the tool and a section explaining the inner workings for advanced users. Beyond this web frontend, JSONDiscoverer can also be executed from regular Java programs (see details on the tool website).

The parsing and management of JSON documents is performed using the GSON library. Model management relies on the Eclipse Modeling Framework (EMF), while EMF2GV is used to render the models.
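As an illustration of the rendering step, the following Python sketch turns a schema into Graphviz DOT text that draws class-like boxes. It does not reproduce EMF2GV or its output; the use of DOT, the dictionary layout and the function name are assumptions made for this example.

```python
def schema_to_dot(schema, associations=()):
    """Render {concept: {attribute: type}} as Graphviz DOT record nodes.

    `associations` is an iterable of (source, target, role) triples.
    Concept names are assumed to be valid DOT identifiers.
    """
    lines = ["digraph schema {", "  node [shape=record];"]
    for concept, attrs in schema.items():
        # "\l" is Graphviz's left-justified line break inside a label.
        fields = "\\l".join(f"{name} : {type_}" for name, type_ in attrs.items())
        lines.append(f'  {concept} [label="{{{concept}|{fields}\\l}}"];')
    for source, target, role in associations:
        lines.append(f'  {source} -> {target} [label="{role}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical schema with one association between two concepts.
dot = schema_to_dot(
    {"User": {"id": "number"}, "Address": {"city": "string"}},
    associations=[("User", "Address", "address")],
)
print(dot)
```

The printed text can be saved to a file and passed to the Graphviz `dot` command to obtain a picture.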

5  Example

Figure 2 shows the Simple Discoverer page with an example. Once the user provides the JSON document to analyze in the textbox (or uses the default example for testing purposes), the Discover Schema button launches the discovery process, which sends the JSON document to the backend. As a result, the backend returns the domain model (i.e., schema) and the JSON data (as an instance of the generated model), both as EMF models and as pictures. The EMF models can be downloaded by the user and directly imported into other modeling tools for further analysis and manipulation.


Figure 2: Webpage for the simple discovery.

6  Conclusion

In this paper we have presented JSONDiscoverer, a tool aimed at promoting the integration and composition of JSON-based Web APIs. As further work, we plan to enhance the concept-matching heuristics in the (advanced) discovery and composition process, and to provide code-generation facilities to realize the selected API integrations.

References

[1] IETF, JSON Schema Specification. http://json-schema.org/.

[2] M. Klettke, U. Störl, S. Scherzinger, Schema Extraction and Structural Outlier Detection for JSON-based NoSQL Data Stores, in: BTW conf., 2015, pp. 425–444.

[3] D. Sevilla, S. Feliciano, J. G. Molina, Inferring Versioned Schemas from NoSQL Databases and Its Applications, in: ER conf., 2015, pp. 467–480.

[4] J. L. Cánovas Izquierdo, J. Cabot, Discovering Implicit Schemas in JSON Data, in: ICWE conf., Vol. 7977, LNCS, 2013, pp. 68–83.

[5] J. L. Cánovas Izquierdo, J. Cabot, Composing JSON-based Web APIs, in: ICWE conf., Vol. 8541, LNCS, 2014, pp. 390–399.
