A brave new world of data
18 Feb 2016 by Evoluted New Media
A more open future with data flowing between institutions will be a scientific utopia, says Mark Hahnel. But can the flexibility of data sharing really be integrated with the control needed for sensitive research assets?
Researchers across all disciplines are seeing the amount of data growing exponentially year-on-year. They are consequently faced with the challenge of storing, managing, disseminating and reproducing the large and diverse data sets that are being generated each day.
In the past, researchers put together their own systems and methods to address these problems, each taking a different approach, with the result that few economies of scale or consistent best practices emerged. However, in an increasingly collaborative world, with interdisciplinary work occurring more often as scientists take on more diverse and complicated problems, this approach is no longer feasible.
Today, as market dynamics change, researchers are being told by their institutions, publishers and funders that they must have a data management plan and make their data openly available, in order to comply with publisher policies and funder mandates.
Researchers disseminate their findings through the academic paper, an established medium that has seen barely any change or evolution in centuries. Technological advances have undoubtedly made the process of publishing research much easier, but these changes have also come with their own set of challenges and drawbacks.
Researchers are rapidly creating vast amounts of data which need to be easily accessed and propagated. Yet they find themselves restricted and hindered in their ability to share data and results by the limited functionality of the fixed PDF document, a static format which does not lend itself to an open or collaborative process.
In attempting to address such issues, academic research is striving towards a far more dynamic future: a world in which the process of academic discovery is increasingly innovative, transformable and collaborative.
Researchers will generate new information based on new hypotheses, and then make this information conveniently available to others.
[caption id="attachment_51930" align="alignnone" width="529"] Formats such as PDF make the sharing of data cumbersome at times[/caption]
From this point, the world's knowledge can be pulled together, as other researchers and academics identify new patterns and make new discoveries in the data that the original authors may not have seen, or may not have thought to look for.
The future of managing research data for institutions lies in the strength of application programming interfaces (APIs). In the short term, allowing files, metadata and identifiers to flow between institutional systems is an achievable goal that has already been put into practice by several universities. For example, Loughborough University has built a solution that connects DSpace, Symplectic Elements and Figshare.
In the longer term, we expect to see the power of APIs harnessed to query datasets in the browser, allowing researchers to build on the work that has gone before them without needing to download and parse open, accessible data by hand.
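As a concrete illustration, the short Python sketch below pulls public metadata from Figshare's v2 REST API. The endpoint and field names reflect the public documentation at the time of writing; treat them as assumptions and verify against the current docs before relying on them.

```python
# A minimal sketch of pulling public metadata over an API, here assuming
# the Figshare v2 endpoint GET /v2/articles; check the live documentation.
import requests

BASE = "https://api.figshare.com/v2"

def list_public_articles(page_size=5):
    """Fetch one page of public article metadata; no authentication required."""
    resp = requests.get(f"{BASE}/articles", params={"page_size": page_size})
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    for article in list_public_articles():
        # Each record carries a persistent identifier and a resolvable URL,
        # which is what lets files and metadata flow between systems.
        print(article["doi"], "-", article["title"])
```

Because the response is plain JSON, any institutional system, not just a human with a browser, can consume the same records.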
[caption id="attachment_51934" align="alignnone" width="620"] APIs will in future allow vast quantities of data to be shared among many institutions worldwide.[/caption]
Almost every university and academic institution has its own set of requirements and the research systems it believes best ease its administrative burden. Today, there is an ever broader set of tools to choose from when considering a new system.
This wealth of choice is good news for research institutions, enabling them to adopt whichever tools are best suited to their needs. However, if optimum efficiency is to be achieved, it is crucial that these systems are complementary and able to talk to and interact with one another.
If university efforts are truly to be furthered and enhanced, it is imperative that interoperability between systems is facilitated as much as possible. In today's collaborative world, research systems that cannot be integrated into and accommodated by the existing ecosystem risk becoming obsolete.
The ideal is a system in which researchers and administrators are not required to provide information to university systems multiple times. Instead, they enter the appropriate data once, automatically or manually, and it then propagates throughout the institution's different systems. This streamlines data input, maximising efficiency and saving time for academics and administrators alike. It should also rely on single records of truth, so that inconsistencies cannot arise between systems. The benefits of these workflows can already be seen in systems that have been widely adopted around the world, where researchers accurately capture great quantities of data with minimal administrative input.
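A hedged sketch of this "enter once, propagate everywhere" idea follows; the endpoint URLs and record fields are hypothetical placeholders, not any particular university's real systems.

```python
# A sketch of single-entry propagation: one canonical record is serialised
# once and the same payload is handed to every downstream system.
# All endpoints and fields below are hypothetical illustrations.
from dataclasses import dataclass, asdict
import json

@dataclass
class Record:
    """A single record of truth; the fields here are illustrative only."""
    identifier: str   # e.g. a DOI, entered exactly once
    title: str
    authors: list

# Hypothetical downstream systems (CRIS, repository, staff profiles).
DOWNSTREAM = {
    "cris": "https://cris.example.ac.uk/api/records",
    "repository": "https://repo.example.ac.uk/api/items",
}

def propagate(record: Record) -> None:
    """Serialise the record once and send the same payload to every system."""
    payload = json.dumps(asdict(record))
    for name, url in DOWNSTREAM.items():
        # A real deployment would make an authenticated HTTP POST here;
        # printing keeps this sketch self-contained and runnable.
        print(f"POST {url} ({name}): {payload}")

propagate(Record("10.1234/example.1", "Example dataset", ["A. Researcher"]))
```

Because every system consumes the same payload derived from one record, inconsistencies cannot creep in at the point of entry.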
The cloud is becoming an increasingly attractive place for many of the products of academia to reside. Several factors make it appealing to academics and researchers, including fast load times, scalability, automated application deployment, multiple back-ups and constantly updated hardware.
These features mean institutions need not build their own server centres, with the associated running costs and rapid obsolescence of the technology. The ability of any academic developer to access the processing power of thousands of servers at the click of a button also demonstrates the inherent power of scale that commercial cloud services can provide. Universities will be looking to harness this potential in a move that is both cost-effective and efficient.
For researchers accessing and managing large data sets, the ability to collect and find their research easily is paramount. Enhanced discoverability features are particularly important, enabling users to conduct more comprehensive searches and to collate, classify and cite data and their sources in a way that surfaces more accurate and relevant content.
[caption id="attachment_51931" align="alignnone" width="620"] As data sets become larger and larger, it's important that they can be stored in such a way that they are easily accessible.[/caption]
An increasing priority for researchers is access to tools that enable both control and reuse. Some data contains personal, ethical or commercially sensitive information that cannot be made available to the general public, so researchers require levels of control that promote best practice while still enabling openness.
The features that provide this range from embargoes to confidential and linked files, giving control over the specific time and date a file is published as well as the manner and location in which it is stored.
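How such controls might gate access can be sketched with two illustrative fields, an embargo date and a confidentiality flag. The schema below is hypothetical, not any particular repository's data model.

```python
# A hedged sketch of embargo and confidentiality checks; field names
# are illustrative placeholders, not a real repository schema.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ManagedFile:
    title: str
    embargo_until: Optional[date]  # metadata stays visible, content withheld
    confidential: bool             # content is never exposed publicly

def can_download(f: ManagedFile, today: date) -> bool:
    """Openness by default, with control where the data demands it."""
    if f.confidential:
        return False
    if f.embargo_until is not None and today < f.embargo_until:
        return False
    return True

trial = ManagedFile("Trial results", embargo_until=date(2017, 1, 1), confidential=False)
print(can_download(trial, date(2016, 2, 18)))  # False: still under embargo
print(can_download(trial, date(2017, 6, 1)))   # True: embargo has lapsed
```

The point of the design is that openness is the default, and restriction is an explicit, auditable property of the record rather than an afterthought.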
The next generation of academic data publishing is moving towards an ever more collaborative and open place, in which researchers can easily choose to make the desired data sets available. These advancements will integrate flexibility with control, allowing research to be built upon and expanded in beneficial ways without compromising confidentiality or classification. Ultimately, it will be a world in which researchers and academics can focus on their discoveries and intellectual pursuits without being distracted by administrative burden or inefficient systems.
Author:
Mark Hahnel, founder and CEO of Figshare