Double-correct Prediction in Sciences

Funding Source: NSF III CORE

Budget: $599,411

Time: 08/2024 - 07/2027

PI: Dr. Xi Peng (Machine Learning)

Co-PI: Dr. Rudolf Eigenmann (HPC)

A trustworthy toolbox for double-correct predictive modeling in sciences.

Abstract: AI and ML have driven scientific advances in critical domains like climate change and extreme weather prediction, but challenges remain due to unpredictable data shifts and unseen variables. This project introduces a novel trustworthy toolbox prioritizing both prediction robustness and rationale validity, ensuring accurate outcomes are backed by scientifically grounded rationales, even with unforeseen data variations. The toolbox will be optimized for scalability on HPC and released as open-source software, benefiting researchers in Earth, Marine, and Environmental Sciences through accessible, generalizable workflows and AI-ready datasets.

Publications:

Open-Sourced Data:

  • Prediction Rationale Dataset for ImageNet: We construct a new rationale dataset that covers all 1,000 ImageNet categories. For each category, we generate an ontology tree with a maximum height of two. Combining attributes and sub-attributes, the dataset contains over 4,000 unique rationales. [https://github.com/deep-real/DCP/tree/main/Rationale%20Dataset]
  • Prostate MRI and PIRAIDl Dataset: We curate a prediction rationale dataset to address the lack of paired MRI images, annotations, and textual rationales for AI training. The dataset includes 180K image-mask-rationale triples, with quality evaluated by expert radiologists. [https://github.com/deep-real/MedRationale]
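
The ontology-tree structure described for the ImageNet rationale dataset can be pictured with a small sketch. This is only an illustration under stated assumptions: the dictionary layout, the example category, its attribute names, and the helper functions `enumerate_rationales` and `tree_height` are all hypothetical and are not taken from the released files, whose actual format may differ.

```python
# Hypothetical ontology tree for a single category. The layout and the
# attribute names are illustrative assumptions, not the released format.
ontology = {
    "category": "bald eagle",
    "attributes": {
        "head": ["white feathers", "hooked yellow beak"],
        "wings": ["broad brown wingspan"],
        "talons": [],  # an attribute with no sub-attributes
    },
}


def enumerate_rationales(tree):
    """Flatten a height-two tree into a list of rationale strings.

    An attribute with sub-attributes yields one rationale per
    sub-attribute; an attribute without any yields itself.
    """
    rationales = []
    for attribute, sub_attributes in tree["attributes"].items():
        if sub_attributes:
            for sub in sub_attributes:
                rationales.append(f"{attribute}: {sub}")
        else:
            rationales.append(attribute)
    return rationales


def tree_height(tree):
    """Height in edges from the category root (0, 1, or 2)."""
    if not tree["attributes"]:
        return 0
    return 2 if any(tree["attributes"].values()) else 1


rationales = enumerate_rationales(ontology)
print(tree_height(ontology))
for rationale in rationales:
    print(rationale)
```

In this sketch the category is the root, attributes sit at depth one, and sub-attributes sit at depth two, which matches the stated maximum height of two. How the released dataset counts its unique rationales is not specified here, so the flattening rule above is one plausible reading rather than the project's own procedure.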

Open-Sourced Software: