BanglaLLM is an independent, open research effort building language models for Bangla. We think there's a real difference between treating a language as an afterthought and designing for it from day one.
What We Work On
Our work spans foundation models, benchmarks, data infrastructure, and real-world applications.
New tokenization, continued pre-training, and instruction tuning for Bangla, built on Llama and Qwen. The BanglaLlama family ranges from 3B to 33B parameters, all released openly on HuggingFace.
Measuring how well models perform in Bangla is still largely an open question. We're building benchmarks around political-bias detection, mathematical reasoning, and test-time scaling.
Good models need good data, and for Bangla we've built most of it ourselves. This includes news crawlers, translated instruction datasets (Bangla-Alpaca, Bangla-Orca), and math datasets, all openly released.
Research that reaches people matters more than research that stays on a shelf. Drishtikon, a news-literacy platform for Bangladesh, is built on our work.
Publications
Abdullah Khan Zehady, Shubhashis Roy Dipta, Naymul Islam, Safi Al Mamun, Santu Karmaker
Introduces Bangla-Alpaca (52k) and Bangla-Orca (172k) instruction datasets, plus 5 open BanglaLlama model variants.
Nusrat Jahan Lia, Shubhashis Roy Dipta, Abdullah Khan Zehady, Naymul Islam, Madhusodan Chakraborty, Abdullah Al Wasif
BanglaBias, a 200-article benchmark with three-way labels (gov-leaning / gov-critique / neutral), evaluated across 28 LLMs.
Building tutoring-oriented Bengali models.
Preprint coming soon
Models & Datasets
31+ models and 7+ datasets on HuggingFace, all freely available.
Built on Llama 3 / 3.1 / 3.2. Base and instruction-tuned variants, from 3B to 11B parameters.
Test-time scaling adapted for Bengali, built on Qwen2.5 (3B/32B) and QwQ-32B. Optimized for reasoning tasks.
Team
Researchers and advisors building language technology for Bangla.

BanglaLLM
Open Source
All our code is publicly available. Contributions welcome!
Training notebooks and configs for the BanglaLlama family. Covers Llama 2/3/3.1/3.2, Mistral, and Mixtral, plus Unsloth.
Test-time scaling adapted for Bengali reasoning and complex tasks.
Evaluation framework fork with Bangla-oriented benchmarks and custom tasks.
Dataset management infrastructure for Bangla LLM work.
Crawlers for Bangla news sources and blogs, used for data collection.
Open-source translation agent for Bangla and other low-resource languages.
Q&A system over Bangla YouTube content using language models.
Multi-agent interactive classroom platform powered by Bangla language models.
Research in Production

Multi-perspective analysis platform for understanding complex information. Real-time insights powered by research-grade language models.
Visit Perspectivity
Bengali news-literacy platform with real-time bias detection. Multi-perspective analysis and source transparency for informed readers.
Visit Drishtikon
We're an open research group. Contributions, collaborations, and feedback are always welcome. The easiest way to get started is to open a GitHub issue or send a pull request.