Deb Agarwal

On October 1, Deb Agarwal assumed the role of interim division director of the Scientific Data Division, the newest division in the Computing Sciences Area (CSA). A strong advocate for diversity, inclusion, and mentoring the next generation of scientists, Deb was formerly head of the Data Science and Technology Department in the Computational Research Division (CRD), which will contribute four of the seven groups in the new Division. Deb, who has been recognized for her leadership in the technology sector, shares her thoughts about the new Division and the value of data science.

You are heading up a new division in computing sciences. What is its purpose, and why now?

The Computing Sciences Area has been thinking about forming the Scientific Data Division for a while. As the amount of data involved in scientific research has continued to grow, and techniques for data collection and processing have evolved, our own capabilities have grown as well. We are already a premier organization in terms of these capabilities, and creating this new Division will shine a light on them. It will create new career paths for scientists who work with data. It will build more durable collaborations with Areas and Divisions at the Lab. It will allow us to think more strategically, rather than opportunistically, about our data capabilities, and broadcast the message that this field is important to us. Now is a good time to do this, as the DOE is paying more attention to data science. With this new Division, we will have a seat at the table in discussions and initiatives around data science.

Can you give examples of the kinds of projects and initiatives that your division will engage in?

The Division will engage in two areas of work. The first is to advance data science, for example, using machine learning and artificial intelligence so that data science can help us understand and predict scientific processes. We also plan to build tools, structures, and workflows for processing and managing data at all scales. Science now requires handling both large-scale data and a huge diversity of data, which need to be integrated, and it is important that the data be usable and shareable. Throughout the whole process of scientific research, there is a lot that data science can contribute.

Data science can also contribute to understanding more deeply what is going on in the real world. Data science can help us understand complex systems where we have data from the system but not models. Ultimately, we hope that data science can also help predict outcomes. This predictive capability will require a merging of first-principles understanding with data science techniques and is one of the holy grails of data science.

The second area of work will be our collaborative engagements with Areas across the Lab. Data science demands collaboration at all stages of the data collection, preparation, and analysis process. For many research projects, there are large teams of scientists collecting data across scales and disciplines; we need to work with these teams to bring together the data, to build capabilities that help turn that data into knowledge and understanding, and to preserve that data as a body of work for reuse.

How will your division partner with or support other Areas and Divisions?

We already partner with many Areas and Divisions at the Lab. For example, our Physics and X-Ray Science Computing group is embedded in large experimental collaborations to provide cyberinfrastructure for experiments like the ATLAS high energy physics project. The Computational Cosmology Center is collaborating with astrophysicists to develop the tools, techniques, and technologies to meet the analysis challenges posed by cosmological data sets, and the Computational Biosciences group is developing tools and frameworks to meet analysis challenges in collaboration with the Biosciences Area. I work on teams spanning the Computing Sciences Area (CSA) and the Earth and Environmental Sciences Area (EESA) on a number of projects related to watersheds, tropical forests, and carbon flux. These are just a few examples of the partnerships that the groups in the new Division have with other Divisions and Areas across Berkeley Lab and DOE.

Each of these partnerships evolved organically and is unique, driven by the particular needs of its project or Division. We want to explore how we can strengthen and expand these partnerships across the scientific Areas to address data science challenges together.

It’s exciting that data science, which has been growing over the last few decades, is now getting broader recognition as an area of research. As the new Division looks more holistically and strategically at our capabilities, gaps in current research will reveal many data science topics and opportunities. One example where we have already done this successfully is our User Experience research in the Data Science and Technology Department, which grew from the recognition that solving scientific data science challenges required a better understanding of user needs.