Today we present the new version of our JSON Discoverer tool, which helps you discover the schema behind JSON documents. In short, given a set of schemaless JSON documents, the tool parses them to infer and make explicit the implicit schema shared by all of them. This can be especially useful when learning how to work with data stored in a NoSQL backend or behind an API, as happened with this happy user:
— George Davies (@seivadge) June 20, 2016
In this blog post, we recap the tool's features and goals and, at the end, include a longer summary drawn from our latest publication on the topic, "JSONDiscoverer: Visualizing the schema lurking behind JSON documents", to appear in the Knowledge-Based Systems journal.
Nowadays, a considerable number of web applications provide an external API consisting of a set of interrelated JSON-based services. Each service gives access to a subset of the application domain, and developers must combine them to build any non-trivial functionality on top of that API. Since JSON is a schemaless format, deducing the right way to combine those services is not a trivial task. JSONDiscoverer aims to free developers from these tasks by inferring and visualizing the implicit schema of JSON data as well as the possible composition links among JSON-based Web APIs.
The figure below illustrates the typical development scenario where JSON-based Web APIs define a set of services, each one returning JSON documents when they are called.
Our tool applies a discovery process to uncover the data model (i.e., schema) behind JSON-based Web APIs and assists in discovering composition links among them. These are currently the main functionalities provided:
- Simple discovery, which discovers the schema of a given set of JSON documents returned by a single service. You can find a more detailed description of this step in our paper and the corresponding presentation slides.
- Advanced discovery, which discovers the schema from a set of JSON-based services. First, the schema of each JSON-based service is discovered (using the simple discoverer); then the resulting schemas are composed to obtain a general one. You can find a more detailed description of this step in our paper and the corresponding presentation slides.
- API Composer, which takes a set of API schemas, looks for composition links (i.e., common concepts or attributes) and generates a composition graph. The result is used to help developers compose APIs. The tool currently incorporates a sequence diagram generator to visualize API compositions. This paper describes the process in detail (and these are the presentation slides).
Our tool draws schema information as UML class diagrams, including concepts (i.e., classes) and their properties (i.e., attributes and associations linking the different concepts). As the tool leverages EMF (Eclipse Modeling Framework) to represent the schema information, Ecore models can also be obtained from your JSON documents. Potential API compositions are represented by means of UML sequence diagrams showing the possible sequence of API calls.
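To give a feel for what "making the implicit schema explicit" means, here is a minimal Python sketch of a simple discovery over a handful of documents. It is not the tool's implementation (which is written in Java and applies its own discovery rules); the function names, the `?` and `[*]` notation, and the convention of naming a nested class after its key are all assumptions made for illustration.

```python
import json


def type_name(value):
    """Name the primitive type of a JSON value (bool is checked before int)."""
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    return "Unknown"  # e.g. null or an empty array


def discover(concept, objects, schema):
    """Record the properties seen on `objects` under the class `concept`."""
    entry = schema.setdefault(concept, {"objects": 0, "properties": {}})
    for obj in objects:
        entry["objects"] += 1
        for key, value in obj.items():
            many = isinstance(value, list)
            items = value if many else [value]
            nested = [item for item in items if isinstance(item, dict)]
            if nested:
                # A nested object becomes an association to another class,
                # named after the key (a naming convention assumed here).
                target = key.capitalize()
                discover(target, nested, schema)
                kind = target
            else:
                kind = type_name(items[0]) if items else "Unknown"
            # Simplification: the first type and multiplicity seen for a
            # property win; conflicting later values are not reconciled.
            prop = entry["properties"].setdefault(
                key, {"type": kind, "many": many, "seen": 0})
            prop["seen"] += 1
    return schema


def describe(schema):
    """Render each class with its properties; '?' marks optional ones."""
    lines = []
    for concept, entry in schema.items():
        lines.append(f"class {concept}")
        for key, prop in entry["properties"].items():
            optional = "?" if prop["seen"] < entry["objects"] else ""
            many = "[*]" if prop["many"] else ""
            lines.append(f"  {key}{optional}: {prop['type']}{many}")
    return "\n".join(lines)


# Two made-up documents, as a single service might return them.
documents = [json.loads(text) for text in (
    '{"id": 1, "name": "Ada", "address": {"city": "Nantes"}}',
    '{"id": 2, "name": "Bob", "tags": ["a", "b"]}',
)]
schema = discover("User", documents, {})
print(describe(schema))
```

With these two documents the sketch reports `address` and `tags` as optional properties of `User`, something that could not be deduced from either document alone.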
The following demonstration video shows how to use the website:
JSON's schemaless nature has several advantages, but it is a serious problem when developing Web services that need to consume and exchange information among a set of APIs: developers have to figure out the structure of the JSON data provided by each API and the possible relationships between them. JSONDiscoverer aims to free developers from these tasks by inferring and visualizing the implicit schema of JSON data as well as the possible composition links among JSON-based Web APIs. The tool has been made available as a web application. Since its release, JSONDiscoverer has been used to parse an average of 375 JSON documents per month.
2 Problem and Background
The first thing needed to reuse/combine Web APIs is a good understanding of the data model behind them: what the data is about, what attributes each data object has, how they are related and so on.
JSON is schemaless, and the few languages for specifying Web APIs are either still under development (e.g., RAML or Swagger, which allow specifying API services and their parameters but not the full API schema) or not widely used (e.g., JSON Schema). As a result, developers must manually test Web APIs and try to deduce the data model lurking behind their services. Earlier research efforts (e.g., Klettke et al. and Sevilla et al.) could be applied to analyze JSON documents; however, they are specifically tailored to NoSQL databases and do not provide assistance for integrating Web APIs.
Figure 1 (grey boxes) illustrates the typical scenario when trying to integrate several JSON-based Web APIs. First, developers test each service provided and reverse engineer the implicit structure of the JSON data returned when calling it. Then, these individual service schemas need to be combined to build the full Web API schema (i.e., the global data model the API gives access to). If two or more Web APIs need to be integrated, a final step is required to identify possible connections by searching for similar JSON elements that could represent the same concept. In addition, the signature of each individual Web API service must also be considered to guarantee the accessibility of the target resources. Needless to say, this is a time-consuming and error-prone task.
Figure 1: Discovery of the implicit structure from a set of JSON-based Web APIs. The main functionalities of our tool are represented with black-filled rounded boxes while input/output data is depicted as white-filled boxes.
3 Software Functionalities
JSONDiscoverer alleviates this situation by executing an automatic discovery process on the JSON data that uncovers the schema behind JSON-based Web APIs and suggests possible composition paths. The tool provides three main functionalities, which can be used separately or chained (see black-filled boxes in Figure 1):
- Simple discovery. It discovers the schema of a given set of JSON documents returned by a single service. The more documents provided, the better, since some properties of the (implicit) model can only be deduced from several examples.
- Advanced discovery. It takes the output of the simple discoverer for each service of a given Web API to infer its global schema.
- Composer. It takes a set of inferred Web API schemas and looks for composition links (i.e., common concepts or attributes). As a result, it generates a composition graph. The Composition Assistant will use this graph to help you find composition paths among the Web APIs.
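The advanced discovery step can be pictured as a merge of per-service schemas. The Python sketch below assumes a deliberately simple representation (a dictionary from class name to attribute types) and merges classes purely by name; both the representation and the matching rule are illustrative assumptions, not the rules JSONDiscoverer actually applies.

```python
def merge_schemas(service_schemas):
    """Union the classes of several per-service schemas into one API schema."""
    merged = {}
    for schema in service_schemas:
        for concept, attributes in schema.items():
            target = merged.setdefault(concept, {})
            for name, kind in attributes.items():
                # Keep the first type seen; a real tool would need a rule
                # for reconciling conflicting types.
                target.setdefault(name, kind)
    return merged


# Hypothetical schemas discovered from two services of the same API.
stops_service = {"Stop": {"id": "Integer", "name": "String"}}
lines_service = {
    "Line": {"number": "Integer", "stops": "Stop[*]"},
    "Stop": {"id": "Integer", "latitude": "Float"},
}

api_schema = merge_schemas([stops_service, lines_service])
print(api_schema["Stop"])
# → {'id': 'Integer', 'name': 'String', 'latitude': 'Float'}
```

Here the `Stop` class ends up with the attributes contributed by both services, which is the kind of global view no single service exposes on its own.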
JSONDiscoverer draws schema information as UML class diagrams, including concepts (i.e., classes) and their properties (i.e., attributes and associations linking the different concepts). Potential Web API compositions are represented by means of UML sequence diagrams showing the possible sequence of Web API calls.
More information about the discovery rules applied in the simple and advanced discoverers can be found in Cánovas Izquierdo and Cabot (ICWE 2013), while the composition rules and the sequence diagram generation are described in Cánovas Izquierdo and Cabot (ICWE 2014).
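As a rough illustration of what a composition link is, the following Python sketch compares the classes of several API schemas and records an edge whenever two classes share a name or at least one attribute name. This exact-name matching is a naive stand-in chosen for brevity; the API names, schema contents and edge format are hypothetical, and the tool's own composition rules are the ones described in the publications above.

```python
from itertools import combinations


def composition_links(api_schemas):
    """Return edges (class_a, class_b, shared_attributes) between APIs."""
    edges = []
    for (api_a, schema_a), (api_b, schema_b) in combinations(api_schemas.items(), 2):
        for concept_a, attrs_a in schema_a.items():
            for concept_b, attrs_b in schema_b.items():
                shared = sorted(set(attrs_a) & set(attrs_b))
                # Link classes with the same name or with common attributes.
                if concept_a == concept_b or shared:
                    edges.append(
                        (f"{api_a}.{concept_a}", f"{api_b}.{concept_b}", shared))
    return edges


# Hypothetical schemas for two unrelated APIs.
apis = {
    "transport": {
        "Stop": {"label": "String", "latitude": "Float", "longitude": "Float"},
    },
    "places": {
        "Venue": {"title": "String", "latitude": "Float", "longitude": "Float"},
    },
}

for edge in composition_links(apis):
    print(edge)
# → ('transport.Stop', 'places.Venue', ['latitude', 'longitude'])
```

The resulting edge list is a small composition graph: it suggests that the coordinates of a `Stop` could be passed to a service of the other API that works with `Venue` data.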
JSONDiscoverer has been developed as a servlet-based web application comprising: (1) a backend developed in Java that provides the functionalities listed above; and (2) a front-end website implemented as an AngularJS web application. The website includes overlays to help newcomers use the tool and a section explaining the inner workings for advanced users. Beyond this web frontend, JSONDiscoverer can also be executed from regular Java programs (see details on the tool website).
JSON documents are parsed and managed with the GSON library. Model management relies on the Eclipse Modeling Framework (EMF), while EMF2GV is used to render the models.
Figure 2 shows the Simple Discoverer page with an example. Once the user provides the JSON document to analyze in the textbox (or uses the default example for testing purposes) the button Discover Schema launches the discovery process, which sends the JSON document to the backend. As a result, the backend returns the domain model (i.e., schema) and JSON data (as an instance of the generated model), both as EMF models and pictures. The EMF models can be downloaded by the user and directly imported into other modeling tools for further analysis and manipulation.
Figure 2: Webpage for the simple discovery.
In this paper we have presented JSONDiscoverer, a tool aimed at promoting the integration and composition of JSON-based Web APIs. As further work, we plan to enhance the concept-matching heuristics in the (advanced) discovery and composition processes, and to provide code-generation facilities to realize the selected API integrations.
 IETF, JSON Schema Specification. http://json-schema.org/.
 M. Klettke, U. Störl, S. Scherzinger, Schema Extraction and Structural Outlier Detection for JSON-based NoSQL Data Stores, in: BTW conf., 2015, pp. 425–444.
 D. Sevilla, S. Feliciano, J. G. Molina, Inferring Versioned Schemas from NoSQL Databases and Its Applications, in: ER conf., 2015, pp. 467–480.
 J. L. Cánovas Izquierdo, J. Cabot, Discovering Implicit Schemas in JSON Data, in: ICWE conf., Vol. 7977, LNCS, 2013, pp. 68–83.
 J. L. Cánovas Izquierdo, J. Cabot, Composing JSON-based Web APIs, in: ICWE conf., Vol. 8541, LNCS, 2014, pp. 390–399.