Hi, this is Nahid. I am an independent researcher with the Cohere Labs community, working at the intersection of multimodal learning, computer vision, and embodied AI. I recently created Maya, a multilingual multimodal LLM, and I build models that perceive, reason, and act in the physical world.
My current interests include:
- Spatial understanding in VLMs for real-world perception
- Physics-aware world models
- Multimodal learning
- Simulation and embodied AI
Selected publications:
- Behind Maya: Building a Multilingual Vision-Language Model.
  Nahid Alam et al. CVPR 2025 Workshop (VLMs4All).
  arXiv · Google Scholar
- Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA.
  Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Shayekh Islam.
  CVPR 2025 Workshop (ReGenAI), Oral.
  arXiv · Google Scholar
- Embedding Geometries of Contrastive Language-Image Pre-Training.
  Jason Chuan-Chih Chou, Nahid Alam. ECCV 2024 Workshop (Beyond Euclidean).
  arXiv · Google Scholar
More at Google Scholar
Selected projects:
- Maya: Multilingual multimodal foundation model (two CVPR workshops)
- Gemma3n-VLA: Vision-Language-Action model built with Hugging Face LeRobot
- GR00T-N1 Hackathon: Bimanual robot manipulation with multimodal control
Contact:
- LinkedIn: nahidalam
- Twitter / X: @nahidalam