📄 Paper (NeurIPS 2024) ⬇️ Dataset (Hugging Face) 🔗 Code (GitHub)

Abstract

A major obstacle to the advancement of machine learning models in marine science, particularly in sonar imagery analysis, is the scarcity of AI-ready datasets. While there have been efforts to make AI-ready sonar image datasets publicly available, they are limited in both environmental coverage and scale. To bridge this gap, we introduce SeafloorAI, the first extensive AI-ready dataset for seafloor mapping across 5 geological layers, curated in collaboration with marine scientists. We further extend the dataset to SeafloorGenAI by incorporating a language component, in order to facilitate the development of both vision- and language-capable machine learning models for sonar imagery. The dataset consists of 62 geo-distributed data surveys spanning 17,300 square kilometers, with 696K sonar images, 827K annotated segmentation masks, 696K detailed language descriptions, and approximately 7M question-answer pairs. By making our data processing source code publicly available, we aim to engage the marine science community in enriching the data pool and to inspire the machine learning community to develop more robust models. This collaborative approach will enhance the capabilities and applications of our datasets within both fields.

What is Seafloor Mapping?

Seafloor mapping is the process of creating detailed maps of the ocean floor to understand its shape, depth, and the types of materials that cover it, like sand, rocks, or mud. This information helps scientists and industries make decisions about marine life conservation, offshore construction, and resource exploration.

The photo shows a tow ship and an autonomous underwater vehicle (AUV) capturing detailed images of the seafloor's texture (backscatter) and depth (bathymetry) using multi-beam echosounders or side-scan sonar.

Seafloor mapping system
Tow ship and AUV recording the seafloor details. Photo courtesy of Nautilus Magazine.

Why Seafloor Mapping?

Seafloor mapping stands at the forefront of marine science, utilizing cutting-edge technologies like multi-beam echo sounders and side-scan sonar to unveil the hidden complexities of the ocean floor. Beyond scientific research, seafloor mapping is instrumental in identifying potential resources, assessing environmental impacts, and supporting sustainable ocean management practices in the context of the blue economy. However, current analysis techniques in seafloor mapping are predominantly labor-intensive and reliant on manual interpretation by marine scientists, requiring hundreds of hours of meticulous examination of survey data to analyze seabed imagery. This hands-on approach is not only time-consuming but also susceptible to user subjectivity and the limitations of individual expertise, introducing potential inconsistencies into the analysis.

The integration of machine learning (ML) holds the promise of enhancing efficiency and reliability in seafloor mapping by automating the segmentation and classification tasks. To this end, we introduce SeafloorAI, the first extensive AI-ready sonar imagery dataset for seafloor mapping. We also incorporate language into our dataset, extending it to SeafloorGenAI.

SeafloorAI public survey regions
Overview of our datasets, SeafloorAI and SeafloorGenAI. The table highlights key dataset statistics. We incorporate 62 public data surveys published by USGS and NOAA from 9 major regions to construct the SeafloorAI and SeafloorGenAI datasets. Our dataset contains 9 geological layers, 4 of which are raw signals (Backscatter, Bathymetry, Slope, and Rugosity) and 5 of which are annotated by human experts (Sediment, Physiographic Zone, Habitat, Fault, and Fold). SeafloorAI serves as a dataset for standard computer vision tasks, i.e., semantic segmentation, whereas SeafloorGenAI constitutes a dataset for generative vision-language tasks, i.e., general visual question answering and instruction-following mapping.

SeafloorAI Dataset

The SeafloorAI dataset provides 827K ground-truth segmentation masks for 696K sonar image tiles across 9 regions. It includes 9 geospatial data layers: 4 raw input signals (Backscatter, Bathymetry, Slope, and Rugosity) and 5 expert-annotated label layers (Sediment, Physiographic Zone, Habitat, Fault, and Fold). Each sonar image tile is 224 × 224 pixels. To download the full dataset, please visit our Hugging Face page.
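As an illustration of the layer structure described above, the sketch below stacks the 4 raw signal layers into a single model-ready input tensor and pairs it with one expert-annotated label layer as a segmentation mask. This is a hypothetical example, not the official data loader: the array layout, dtypes, and the use of random dummy tiles are our own assumptions for demonstration.

```python
import numpy as np

# The 4 raw input signal layers named in the dataset description.
INPUT_LAYERS = ["backscatter", "bathymetry", "slope", "rugosity"]
TILE_SIZE = 224  # each sonar image tile is 224 x 224 pixels

def make_training_pair(layers: dict, label: np.ndarray):
    """Stack per-layer tiles into a (4, 224, 224) input and validate the mask.

    `layers` maps each raw-signal name to a (224, 224) array; `label` is one
    of the expert-annotated layers (e.g. Sediment) as a per-pixel class map.
    """
    x = np.stack([layers[name] for name in INPUT_LAYERS], axis=0)
    assert x.shape == (len(INPUT_LAYERS), TILE_SIZE, TILE_SIZE)
    assert label.shape == (TILE_SIZE, TILE_SIZE)
    return x.astype(np.float32), label.astype(np.int64)

# Dummy data standing in for real survey tiles.
rng = np.random.default_rng(0)
layers = {name: rng.random((TILE_SIZE, TILE_SIZE)) for name in INPUT_LAYERS}
mask = rng.integers(0, 5, (TILE_SIZE, TILE_SIZE))  # hypothetical class ids
x, y = make_training_pair(layers, mask)
print(x.shape, y.shape)  # (4, 224, 224) (224, 224)
```

Stacking the raw signals channel-wise mirrors how multi-band geospatial inputs are typically fed to a semantic segmentation network, with the annotated layer serving as the per-pixel target.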

SeafloorGenAI Dataset

SeafloorGenAI extends SeafloorAI by incorporating a language component. We aim to equip each sonar image with a detailed language description as well as several question-answer pairs. The ultimate goal is to develop a large vision-language model for marine science. The SeafloorGenAI dataset is coming soon.

The Language Annotation Pipeline for SeafloorGenAI Dataset

We leverage in-context learning (ICL) in GPT-4 to automate the language annotation process, providing few-shot input-output pairs to the LLM. In this case, the input contains the key analytical indicators, and the output is the description written by marine scientists for the same image. To construct the ICL input, we collaborate with marine scientists to identify the essential information required for analysis. Subsequently, we use standard statistical and computer vision tools to extract three categories of information: (1) geophysical parameters, (2) spatial distribution, and (3) geological composition.

The objective is to help the model "see" the sonar image through language descriptions that are as detailed as possible. For the ICL output, we ask marine scientists to manually describe, in domain language, 50 randomly selected samples from the SeafloorAI dataset. ICL ensures that GPT-4 can accurately mimic domain-specific language, enhancing the quality and relevance of the generated answers.

Next, we design a prompt for GPT-4, comprising the few-shot input-output pairs and the extracted analytical indicators, to generate general descriptions and question-answer pairs for the remaining images. Finally, domain experts carefully evaluate the generated language annotations to ensure quality and consistency. The last two steps form a feedback loop, creating an iterative prompt-refinement process.
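The ICL prompt construction described above can be sketched as plain string assembly: few-shot (indicators → expert description) pairs followed by the indicators of a new image. This is a minimal illustration, not the paper's actual prompt; the field names, wording, and example indicators are hypothetical.

```python
# Hypothetical few-shot pair: analytical indicators extracted by the
# pipeline (input) paired with a marine scientist's description (output).
FEWSHOT = [
    {
        "indicators": ("geophysical: mean depth 42 m; "
                       "spatial: patchy distribution; "
                       "composition: 70% sand, 30% gravel"),
        "description": ("A shallow, patchy seabed dominated by sand with "
                        "interspersed gravel deposits."),
    },
]

def build_icl_prompt(fewshot, new_indicators):
    """Concatenate few-shot input-output pairs, then append the new query."""
    parts = ["You are a marine scientist describing sonar imagery."]
    for ex in fewshot:
        parts.append(f"Indicators: {ex['indicators']}\n"
                     f"Description: {ex['description']}")
    # Leave the final description blank for the model to complete.
    parts.append(f"Indicators: {new_indicators}\nDescription:")
    return "\n\n".join(parts)

prompt = build_icl_prompt(
    FEWSHOT,
    "geophysical: mean depth 120 m; spatial: uniform; composition: mud",
)
print(prompt)
```

In the actual pipeline, the completed prompt would be sent to GPT-4, and the expert-review step would feed corrections back into the few-shot pool, forming the iterative refinement loop described above.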

Annotation pipeline

Research Outcome

Publications

Open-Sourced Data

Open-Sourced Software

Acknowledgement

We sincerely thank the Department of Defense (DoD) and National Science Foundation (NSF) for supporting this research.

We also acknowledge the U.S. Geological Survey (USGS) and the National Oceanic and Atmospheric Administration (NOAA) for providing the raw survey data.