I was invited to participate in the panel “Big Data and Conceptual models: Are they mutually compatible?”, part of the ER 2016 conference (where we were already presenting our UMLtoGraphDB, metaScience, and human factors in MDE works).
The panel was organized as a highly interactive session where panelists took questions from the organizer (Sudha Ram) and the audience, but without an initial position statement. Therefore, for the record, I’m going to use this post to make such a statement 🙂. So, here we go, my key messages regarding the relationship between (conceptual) models and the world of big data:
- Big data is not schemaless. At most, we can say it is “less-schema” than other data
- When accessing the data you need a schema that helps to interpret that data. If there is no explicit model to use, you have to infer one (e.g. using JSON Discoverer)
- For big data, models are not a static, fixed and complete artifact, but rather a partial, dynamic and temporary view of the data that makes it easier to manipulate the data at that specific instant.
- In traditional software development, we follow a “model-down” approach. For big data, we have to switch to a “data-up” one. That is, the data drives the models we have to use, instead of the models defining what data we can have in the system
- Uncertainty becomes a first-class citizen: we may not be sure about the schema to use, about the quality of the data, about the reliability of the source… Every interpretation is correct only with some probability
- Big data is more and more linked to APIs, since plenty of data is being released through some kind of web API instead of using linked data / semantic web technologies. This is especially true for open data
- The user of big data is more and more a non-technical end user (the so-called “citizen developer”). Modeling approaches for big data need to keep this profile in mind. This is, for instance, one of the goals of our “Open data for All” funded project
- Models can also play a key role in achieving interoperability between different data sources
- Models of big data don’t need to be unique. They can even be personal models that cover only the specific parts of the data a given user wants to explore.
- Temporal and spatial properties are key elements in big data. Most modeling languages are not good at representing spatial or temporal information.
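To make the “data-up” and uncertainty points a bit more concrete, here is a minimal sketch of inferring a partial schema from a handful of JSON-like records. It is only an illustration: the function name `infer_schema`, the sample records and the “presence” ratio are my own assumptions for this example, and this is not how JSON Discoverer works internally.

```python
from collections import Counter, defaultdict


def infer_schema(records):
    """Infer a partial schema from a list of dict records (hypothetical helper).

    For every field seen in the data, report which value types were
    observed and in what fraction of the records the field appears.
    """
    type_counts = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            type_counts[field][type(value).__name__] += 1

    total = len(records)
    schema = {}
    for field, counts in type_counts.items():
        seen = sum(counts.values())
        schema[field] = {
            "types": dict(counts),      # observed types and their counts
            "presence": seen / total,   # fraction of records with this field
        }
    return schema


# Made-up sample data: "city" is optional and "id" has inconsistent types.
records = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Linus"},
    {"id": "3", "name": "Grace", "city": "Arlington"},
]

schema = infer_schema(records)
for field, info in schema.items():
    print(field, info)
```

The result is a view of the data as it looks right now, not a contract: feed it more records and the inferred types and presence ratios will change, which is exactly the kind of partial, dynamic model described above.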
As always, happy to hear your opinion and listen to your disagreements!