nahidalam (Nahid Alam) · GitHub
nahidalam/README.md

Hi, this is Nahid. I am an independent researcher with the Cohere Labs community, working on Multimodal Learning, Computer Vision, and Embodied AI.

I recently created Maya, a multilingual multimodal LLM. I work at the intersection of multimodal learning, computer vision, and embodied AI, developing models that perceive, reason, and act in the physical world.
My current interests include:

  • Spatial understanding in VLMs for real-world perception
  • Physics-aware world models
  • Multimodal Learning
  • Simulation and Embodied AI

Publications

  • Behind Maya: Building a Multilingual Vision-Language Model.
    Nahid Alam et al. CVPR 2025 Workshop (VLMs4All).
    arXiv · Google Scholar

  • Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA.
    Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Shayekh Islam.
    CVPR 2025 Workshop (ReGenAI), Oral.
    arXiv · Google Scholar

  • Embedding Geometries of Contrastive Language-Image Pre-Training.
    Jason Chuan-Chih Chou, Nahid Alam. ECCV 2024 Workshop (Beyond Euclidean).
    arXiv · Google Scholar

More at Google Scholar


Recent Projects

  • Maya: Multilingual multimodal foundation model (2 CVPR workshops)
  • Gemma3n-VLA: Vision-Language-Action model built with Hugging Face LeRobot
  • GR00T-N1 Hackathon: Bimanual robot manipulation with multimodal control

🌐 Connect


Pinned

  1. maya Public

    Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya

    Python · 117 stars · 11 forks

  2. customer_bot Public

    Simple chatbot using Rasa.ai

    Python · 46 stars · 47 forks

  3. modnet_docker Public

    Dockerized container for MODNet - a Real-Time Portrait Matting solution

    Python · 13 stars · 4 forks

  4. LLaVA Public

    Forked from haotian-liu/LLaVA

    [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

    Python · 5 stars · 15 forks