Vision–Language Modeling for Large-Scale Geospatial Data

Undergraduate Research, Cornell University, CIS (BURE), 2024

  • Developed automated data-generation pipelines using LLaVA-1.5 and LLaMA-3.
  • Generated detailed captions for large-scale internet imagery.
  • Curated and aligned over one million internet–satellite image pairs, enabling large-scale training of geospatial vision–language models.
  • Implemented distributed captioning and model pretraining with DeepSpeed.