From theory to practice
Everyone is talking about Big Data, and for good reason. Scientists are confronted with more and more data at an accelerating pace, and turning Big Data into insightful information remains a challenge. Interacting with a visual representation of the data is a great way to obtain such insight, and at the same time a very efficient way to communicate that insight to a broader public.
This project will facilitate an eScience setting that combines systems of interactive visualization and visual analytics. Extremely large and distributed data sets can be explored in a collaborative fashion by combining the visual representation of data with human interaction devices.
One of the difficult parts of interactive visualization of Big Data is getting the data into the visualization. The focus of this project is to stop pushing data into the visualization and instead start pulling data from the data center. The Target environment handles very large data sets, with file systems of up to 10 petabytes and databases of up to 100 terabytes. To manage these, a paradigm has been developed that involves ‘extreme data lineage’: full traceability of data items and their dependencies. On top of this, a new access method has been developed, nicknamed ‘query driven visualization’. These techniques will be combined with novel techniques from Visual Analytics to integrate human reasoning and creativity with automated, scalable data handling.
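The difference between pushing and pulling can be illustrated with a small sketch. It is not the Target implementation: the class DataCenter, its methods store, derive, lineage and query, and the item names are all hypothetical, and an in-memory dictionary stands in for the remote data center. The sketch only assumes that every derived item records its parents (the lineage) and that the visualization asks for just the window it can display, at a resolution it can draw.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class DataItem:
    """A stored data item plus the names of the items it was derived from."""
    name: str
    values: Tuple[float, ...]
    parents: Tuple[str, ...] = ()


class DataCenter:
    """Hypothetical in-memory stand-in for a remote data center."""

    def __init__(self) -> None:
        self._items: Dict[str, DataItem] = {}

    def store(self, name: str, values: Iterable[float],
              parents: Tuple[str, ...] = ()) -> DataItem:
        item = DataItem(name, tuple(values), tuple(parents))
        self._items[name] = item
        return item

    def derive(self, name: str, parent: str,
               fn: Callable[[float], float]) -> DataItem:
        # Record the parent so the new item stays traceable to its source.
        source = self._items[parent]
        return self.store(name, (fn(v) for v in source.values), (parent,))

    def lineage(self, name: str) -> List[str]:
        # Walk the parent links back to the original data.
        chain: List[str] = []
        stack = list(self._items[name].parents)
        while stack:
            current = stack.pop()
            chain.append(current)
            stack.extend(self._items[current].parents)
        return chain

    def query(self, name: str, start: int, stop: int,
              max_points: int) -> List[float]:
        # Pull only the requested window, thinned to at most max_points
        # values (max_points must be positive), instead of pushing the
        # whole data set to the client.
        window = self._items[name].values[start:stop]
        step = max(1, -(-len(window) // max_points))  # ceiling division
        return list(window[::step])


center = DataCenter()
center.store("raw", range(1000))
center.derive("calibrated", "raw", lambda v: v * 2.0)
center.derive("normalised", "calibrated", lambda v: v / 10.0)

visible = center.query("calibrated", start=100, stop=200, max_points=10)
print(len(visible), "points pulled for the current view")
print("derived from:", center.lineage("normalised"))
```

In this sketch the visualization never holds the full data set: each change of view becomes a new query, and the lineage call shows which earlier items any displayed result depends on.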
This research is needed in several disciplines, for example in medical and astronomy projects. Since the problems addressed are generic, further dissemination to other eScience projects is envisioned. The developed tools will be implemented in research and decision environments on workstations, but also in 3D venues such as the 3D theatre in the Donald Smits Center for Information Technology and the Infoversum 3D discovery theatre in Groningen, which is currently under construction. These implementations will provide real-time 3D access and mining tools that can reach extremely large data sets, distributed worldwide, for decision making, scientific exploration, and dissemination of knowledge.