Arcot Rajasekar is a Professor in the School of Information and Library Science at the University of North Carolina at Chapel Hill, a Chief Scientist at the Renaissance Computing Institute (RENCI), and co-Director of the Data Intensive Cyber Environments (DICE) Center, also at the University of North Carolina at Chapel Hill. Previously, he led the Data Grids Technology Group at the San Diego Supercomputer Center at the University of California, San Diego.

He has been involved in the research and development of data grid middleware systems for over a decade and is a lead originator of the concepts behind the Storage Resource Broker (SRB) and the integrated Rule-Oriented Data System (iRODS), two premier data grid middleware systems developed by the Data Intensive Cyber Environments Group.

A leading proponent of policy-oriented large-scale data management, Rajasekar has several research projects funded by the National Science Foundation, the National Archives, the National Institutes of Health, and other federal agencies.

Rajasekar has a PhD in Computer Science from the University of Maryland at College Park and has more than 200 publications in the areas of data grids, digital libraries, persistent archives, artificial intelligence, and smart cities. His latest projects include the DataNet Federation Consortium, the DataBridge, and the Smart and Connected Communities Initiative.

 

Presentation: Big Data & Data Science: A Practitioner’s Perspective

This talk explores four distinct types of big data, each of which poses a different challenge for analysis, storage, and dissemination. The "traditional" big data are the archetype, characterized by large size and high volume; the crowd-sourced and public-facing big data have high velocity and transient value; the long tail of science data forms the dark data, heterogeneous in form and suffering from first- and last-mile problems; and sensor-stream big data are time-critical, have unknown value, and may yet emerge as the most challenging.

This talk will examine the characteristics of these four types of big data and consider the challenges in developing data-centric platforms for handling them. We will discuss emerging solutions to these issues, such as the integrated Rule-Oriented Data System (iRODS) and the DataBridge.