
Keeping Up with Big Data

  • By AMS Staff
  • Sep 20, 2023

BAMS talked with Thomas Huang about the book Big Data Analytics in Earth, Atmospheric, and Ocean Sciences, of which he is the lead editor. The book explores new tools for the analysis and display of the rapidly increasing volume of data about the Earth and is part of the AGU Special Publications Series.
Huang is a group supervisor at the NASA Jet Propulsion Laboratory (JPL)’s Instrument Software and Science Data Systems section and the strategic lead for Interactive Analytics for the National Space Technology Applications Program Office at JPL. He is the NASA principal investigator for Earth System Digital Twins and the system architect for NASA’s Sea Level Change Portal. As an expert in large-scale, distributed intelligent data systems, Thomas has led both planetary and Earth information system projects, and as an advocate for free and open-source software, he led the open-sourcing of many JPL-developed technologies. He is the founder and creator of the Apache Science Data Analytics Platform technology as a community-driven, cloud-based analytics framework.

Why do this book?
While there have been many advances in the collection of observations, reflected in the fast increase in the Earth Observations archive and in forecast modeling, no one measurement or method can provide all the answers. Our rapidly growing collections of observational and model data require us to be smarter about when and what data to include. Tiffany Vance of NOAA, Christopher Lynnes of NASA, and I discussed the need for a book on big data analytics for Earth science. We all thought that it would be a great resource if it could capture various innovative works in system architecture, data processing, access, management, analysis, and visualization that are foundational to today’s Earth science tools and solutions.

Who is the book for?
Tackling big climate data requires professional software engineering and repeatable scientific methods. The goal is to make science sustainable and repeatable. I think sustainable includes affordable, maintainable, and extensible. We are living in an exciting time with many powerful computing options, intelligent IoT accessible via the internet, and a rapidly growing, diverse collection of Earth observation and model data. Let’s face it, there is no silver bullet for Earth science analysis. A sustainable big data solution for Earth science needs professional software engineering and solid science. The book targets both audiences. 

What obstacles did you face in working on this book?
It is a humbling experience to be able to put together this groundbreaking book. I had the privilege to work with fantastic coeditors. Through this book, I met and interacted with innovators around the world, something that I will always treasure. I think the most time-consuming part was coordinating between my authors and reviewers through the entire review and revision process. Everyone’s contributions to this book were done outside of their very busy schedules.

What did you learn in the process?
Great minds think alike. The book presents big data solutions from different scientific disciplines and applications. There are synergies in their methods—that is, to reduce data movement and promote parallelization and scalability. They are foundational to sustainable big data solutions.

What are the implications of this work?
This book took a couple of years to complete. The technology world has evolved. However, the fundamental methodology for designing and developing big data analytic platforms and tools hasn’t changed. I would compare this book to the popular Design Patterns: Elements of Reusable Object-Oriented Software, by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. That book was published in 1994, but it is still one of the best sellers and is still referred to by software professionals. The works presented in this book gave life to some of the new emerging architectural patterns, analysis-optimized formats, and tools.

What's next?
My current project is creating an open-source framework for Earth System Digital Twins. It aims to establish a digital representation of the Earth system to enable actionable prediction and dynamic acquisition of data and analysis. It is an opportunity for me to finally establish a federated big data architecture to include multicomputing, machine learning, and IoT. For a digital twin to be successful, it needs to provide an accurate representation of the Earth system. It is a technology that I don’t think should be tackled by or owned by a single organization. My goal is to establish an open implementation to encourage community contribution and to establish a federation of digital twins across the globe. It is not about building a bigger computer. It is about building a solution that can connect to mature, managed models and data services.

What are your plans for the future?
I have been designing and developing large-scale, distributed intelligent data systems for planetary and Earth science projects for almost three decades at the NASA Jet Propulsion Laboratory. While the fundamentals don’t change, I am always excited to learn about how to be smarter with big data.
