Vision–Language Modeling for Large-Scale Geospatial Data
Undergraduate Research, Cornell University, CIS (BURE), 2024
- Developed automated data-generation pipelines using LLaVA-1.5 and Llama 3.
- Generated detailed captions from large-scale internet imagery.
- Curated and aligned over one million internet–satellite image pairs.
- Enabled large-scale training of geospatial vision–language models.
- Implemented distributed captioning and model pretraining with DeepSpeed.
