Trustworthy Scientific Machine Learning

Funding Source: NSF CAREER

Budget: $572,765

Time: 09/2024 - 08/2029

Trustworthy machine learning for geo-distributed scientific data analytics.

Description: This project aims to develop a trustworthy optimization toolbox for geo-distributed scientific data analytics. It addresses gaps in AI/ML practice, where models trained on historical or regional data struggle with the complex and evolving dynamics of phenomena such as extreme weather events and climate change. The project pioneers optimization methods that enhance prediction robustness, explanation reliability, and scalable privacy protection, all of which are crucial for rare or unseen scenarios in safety-critical applications. It pursues three aims: bridging data topology and robust optimization, revolutionizing explainable machine learning for scientific discovery, and ensuring trustworthy collaborative learning. The project integrates these advancements into education and promotes diversity and inclusion in STEM through interdisciplinary outreach and curricula.

Outcomes:

Double-correct Prediction in Sciences

Funding Source: NSF III CORE

Budget: $599,411

Time: 08/2024 - 07/2027

PI: Dr. Xi Peng (Machine Learning)

Co-PI: Dr. Rudolf Eigenmann (HPC)

A trustworthy toolbox for double-correct predictive modeling in sciences.

Description: AI and ML have driven scientific advances in critical domains like climate change and extreme weather prediction, but challenges remain due to unpredictable data shifts and unseen variables. This project introduces a novel trustworthy toolbox prioritizing both prediction robustness and rationale validity, ensuring accurate outcomes are backed by scientifically grounded rationales, even with unforeseen data variations. The toolbox will be optimized for scalability on HPC and released as open-source software, benefiting researchers in Earth, Marine, and Environmental Sciences through accessible, generalizable workflows and AI-ready datasets.

Outcomes:

Safe Learning-Enabled System

Funding Source: NSF SLES

Budget: $1,499,949

Time: 10/2024 - 09/2028

PI: Dr. Xi Peng (Machine Learning)

Co-PI: Dr. Weisong Shi (Autonomous Vehicle)

Co-PI: Dr. Chengmo Yang (Hardware)

The proposed OSLA (Orchestrated Safe Learning for Autonomous driving) system.

Description: Machine Learning (ML) has transformed autonomous driving by enabling vehicles to perceive their environment with high precision, make real-time decisions, and operate without human intervention. However, safety risks can arise at three layers: the model, for example through inappropriate extrapolation in unique scenarios; the hardware, which is subject to faults and errors; and the system, where the real-time operating system (RTOS) may fail to deliver decisions on time. Developing a safe learning-enabled system for autonomous vehicles (AVs) therefore requires orchestrating the model, hardware, and system. This project focuses on cross-layer optimizations that achieve end-to-end safety by developing rational ML models with valid rationales, integrating hardware reliability into ML design to tolerate runtime faults, and designing an RTOS scheduler that ensures time predictability while accounting for model and hardware reliability. Implementing these advancements on real autonomous driving platforms will enhance AV safety, promote efficient transportation, and advance education and workforce development in AI and autonomous driving, with a commitment to diversity and inclusion in STEM fields.

Outcomes:

Robust & Explainable Seafloor AI

Funding Source: DoD DEPSCoR

Budget: $577,825

Time: 08/2023 - 07/2026

PI: Dr. Xi Peng (Machine Learning)

Co-PI: Dr. Arthur Trembanis (Marine Science)

SeafloorAI, the first large AI-ready dataset for seafloor mapping using sonar imagery.

Description: Characterizing seafloor morphodynamics is essential for naval applications that rely on extensive geoacoustic and environmental data collected over broad spatiotemporal scales. Understanding the dynamics of, and uncertainty among, numerous variables presents a significant out-of-distribution challenge, because AI/ML models often struggle with unseen distributions, which leads to fragile predictions and unreliable explanations. To overcome these challenges, this project will develop new robust and explainable optimization methods using the newly curated SeafloorAI datasets. The research outcomes, including an AI-ready multi-site seabed morphodynamic database, innovative trustworthy AI/ML optimization methods, and scalable implementations, will be linked to the Ocean Biogeographic Information System and Seabed 2030 to benefit a broad range of communities and stakeholders.

Outcomes:

Safe AI for Prostate Cancer Diagnosis

Funding Source: MSK Cancer Center

Budget: $277,275

Time: 01/2024 - 12/2026

Description: Advances in natural language processing can build on breakthroughs in image processing to offer clinicians new AI tools against prostate cancer (PCa). Current AI for interpreting multiparametric MRI (mp-MRI) scans relies on visual encodings such as lesion annotations, but it has not translated into patient care because it does not use the standardized PI-RADS (Prostate Imaging Reporting and Data System) format that clinicians accept. The expertise captured in PI-RADS reports is therefore a major resource for training AI that can achieve clinical acceptance. This research addresses two gaps: (1) data availability, since public PCa data repositories lack PI-RADS reports; and (2) AI modeling, since existing approaches cannot integrate the complex radiologist expertise expressed through language. We will test the hypothesis that PI-RADS reports can be made machine-readable and combined with visual data so that AI can be trained to interpret MRIs according to the reasoning processes of radiologists. University of Delaware researchers and Memorial Sloan Kettering radiologists will collaborate to develop datasets and tools for safe AI-assisted prostate MRI interpretation.

Outcomes:

Characterizing the Global Illicit Trade

Funding Source: NSF CMMI

Budget: $999,984

Time: 08/2021 - 07/2026

Co-PI: Dr. Xi Peng (Machine Learning) with Dr. Julie Klinger (PI, Geoscience)

Description: The objective of this five-year project is to map and characterize the volume of illicitly sourced materials in energy-critical mineral (ECM) supply chains. ECMs are essential to renewable, nuclear, and fossil energy generation and are included in the U.S. government's list of 35 critical materials, yet their supply chains remain opaque and vulnerable to illicit activity. There are currently no global measurements of the licit-illicit composition of ECM trade flows or of how that composition evolves over time. To address this problem, the project will map and model global ECM flows based on original research in several source, transit, and destination countries. The findings and tools developed in this study will improve the discovery and traceability of illicitly sourced ECMs, identify vulnerable points along several ECM supply chains, and generate predictive models of supply-chain dynamics to identify effective disruption strategies. The results will draw on open and proprietary datasets, as well as data gathered at national and subnational levels, and will be tested through extensive field research.

Outcomes: