The first open-source machine translation model for Bishnupriya Manipuri (BPY). Built with Meta AI's NLLB-200 and fine-tuned by the community.
Bishnupriya Manipuri is spoken by over 500,000 people across Assam, Tripura, Manipur, and Bangladesh. Despite this, it has zero support in Google Translate, Microsoft Translator, or any major AI model.
This project changes that. We fine-tuned Meta's NLLB-200-distilled-600M model using LoRA to create the world's first English → BPY translator. Version 8.4 outputs pure Bishnupriya Manipuri, not Assamese or Bengali.
Languages without digital tools fade faster. Every app, website, and AI model that skips BPY pushes young speakers toward Hindi/English. This model puts BPY on the digital map.
BPY speakers can now translate English educational content, health information, and news into their mother tongue. No more relying on Assamese or Bengali as a bridge.
Unlike Big Tech models, this is 100% open source. The training data, model weights, and code are public. The BPY community owns and controls it.
The base model, facebook/nllb-200-distilled-600M, has a strong Assamese/Bengali bias. It likely saw the word "জল" (water) millions of times in training, but never the BPY word "পানীহান".
The breakthrough: we oversampled critical BPY vocabulary 25x in the training data. This gave the LoRA weights enough signal to override the bias built into NLLB's 600M parameters.
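A minimal sketch of 25x oversampling, assuming the training rows use the english and bpy_beng columns of training_data_v8_4.csv and that a pair counts as "critical" when its BPY side contains one of a hand-picked list of key words. The selection rule and the `oversample` name are hypothetical; the project's own training scripts may pick pairs differently.

```python
def oversample(rows, key_words, factor=25):
    """Repeat each row whose bpy_beng text contains any key word `factor` times.

    Hypothetical selection rule: a row is "critical" if its BPY side contains
    one of `key_words` (e.g. "পানীহান"). All other rows are kept once.
    """
    out = []
    for row in rows:
        critical = any(word in row["bpy_beng"] for word in key_words)
        out.extend([row] * (factor if critical else 1))
    return out
```

Rows can be read with `csv.DictReader` (UTF-8, `newline=""`). Shuffling the result with `random.shuffle` before training spreads the duplicates across batches instead of leaving them in one contiguous run.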
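Key words such as পানীহান, হাগহান, and মর were repeated this way to teach the model BPY vocabulary. As a result, the model no longer outputs জলহান and correctly outputs পানীহান. Since NLLB-200 has no dedicated BPY language code, we use asm_Beng as the target token: the output script is Bengali, but the vocabulary and grammar are pure BPY.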
The model is available on the Hugging Face Hub under the Apache 2.0 license. Use it in any commercial or non-commercial project:
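```python
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
base = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
model = PeftModel.from_pretrained(base, "Emarthar/nllb-bpy-beng-v8_4")  # apply the BPY LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("Emarthar/nllb-bpy-beng-v8_4")
# Translate English; asm_Beng is the target token (BPY vocabulary in Bengali script)
inputs = tokenizer("Give me water.", return_tensors="pt")
out = model.generate(**inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("asm_Beng"))
print(tokenizer.decode(out[0], skip_special_tokens=True))
```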
Test the model directly on Hugging Face. Type English, get BPY back instantly. No coding needed.
This is V8.4, with 67% accuracy. We need community help to reach 95%+. Here's how:
Found a wrong translation? Download training_data_v8_4.csv, add the correct english,bpy_beng pair, and email it to us. We duplicate it 25x and retrain.
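English sentences often contain commas, so a hand-typed `english,bpy_beng` line can split into the wrong columns. A minimal sketch, assuming the CSV is UTF-8 and already ends with a newline, that lets Python's csv module quote fields only when needed; `to_csv_line` and `append_pair` are hypothetical helper names, not part of the project's scripts.

```python
import csv
import io


def to_csv_line(english, bpy_beng):
    """Format one english,bpy_beng pair as a CSV line, quoting fields that contain commas or quotes."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([english.strip(), bpy_beng.strip()])
    return buf.getvalue()


def append_pair(path, english, bpy_beng):
    """Append a corrected pair to the training CSV (assumes the file ends with a newline)."""
    with open(path, "a", encoding="utf-8", newline="") as f:
        f.write(to_csv_line(english, bpy_beng))
```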
We estimate that 5,000+ pairs are needed for 90% accuracy. Send us English-BPY sentence pairs on any topic: family, food, agriculture, daily life. Format: one pair per line, as English sentence,BPY translation
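Before sending a batch of pairs, a contributor could sanity-check it. A minimal sketch, assuming one pair per line in CSV form (an optional english,bpy_beng header is skipped) and treating any BPY cell with no Bengali-script characters (Unicode block U+0980 to U+09FF) as a likely mistake; `validate_pairs` is a hypothetical helper.

```python
import csv
import io
import re

# Any character from the Bengali Unicode block, the script BPY is written in
BENGALI_SCRIPT = re.compile(r"[\u0980-\u09FF]")


def validate_pairs(text):
    """Check contributed pairs; return (good_pairs, problems) where problems are (line, reason)."""
    good, problems, seen = [], [], set()
    reader = csv.reader(io.StringIO(text))
    for row in reader:
        line = reader.line_num
        if not row or row == ["english", "bpy_beng"]:
            continue  # skip blank lines and the optional header
        if len(row) != 2:
            problems.append((line, f"expected 2 columns, got {len(row)}"))
            continue
        english, bpy = (cell.strip() for cell in row)
        if not english or not bpy:
            problems.append((line, "empty field"))
        elif not BENGALI_SCRIPT.search(bpy):
            problems.append((line, "BPY side is not in Bengali script"))
        elif (english, bpy) in seen:
            problems.append((line, "duplicate pair"))
        else:
            seen.add((english, bpy))
            good.append((english, bpy))
    return good, problems
```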
Are you a BPY teacher or scholar? Review our outputs for tense, plurals, and honorifics. The model currently handles simple SOV sentences best.
Developers: load V8.4, add your dialect data, train for 1-2 epochs, and push the result as V8.5. All training scripts are in the repo.
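A minimal sketch of that loop using plain PyTorch with peft and transformers, not the repo's own scripts (which remain authoritative): load V8.4 with its LoRA weights trainable, tokenize English as the source and BPY as the target under the asm_Beng token, and run a few epochs. The learning rate, batch size, and max_length are assumptions, and `continue_training` and `batches` are hypothetical names. The heavy imports sit inside the function so the module loads without them.

```python
BASE_MODEL = "facebook/nllb-200-distilled-600M"
ADAPTER = "Emarthar/nllb-bpy-beng-v8_4"


def batches(pairs, size):
    """Split a list of (english, bpy_beng) pairs into consecutive batches of at most `size`."""
    return [pairs[i:i + size] for i in range(0, len(pairs), size)]


def continue_training(pairs, epochs=1, batch_size=8, lr=1e-4):
    """Continue LoRA fine-tuning of V8.4 on new pairs; returns (model, tokenizer)."""
    # Deferred imports: only needed when training actually runs
    import torch
    from peft import PeftModel
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # English source; asm_Beng is the target token used for BPY in Bengali script
    tokenizer = AutoTokenizer.from_pretrained(ADAPTER, src_lang="eng_Latn", tgt_lang="asm_Beng")
    base = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
    # is_trainable=True loads the LoRA weights unfrozen so training resumes from V8.4
    model = PeftModel.from_pretrained(base, ADAPTER, is_trainable=True)
    optimizer = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=lr)

    model.train()
    for _ in range(epochs):
        for batch in batches(pairs, batch_size):
            english, bpy = zip(*batch)
            # text_target tokenizes the BPY side into `labels`
            enc = tokenizer(list(english), text_target=list(bpy), padding=True,
                            truncation=True, max_length=128, return_tensors="pt")
            # Mask padding in the labels so it is ignored by the loss
            enc["labels"][enc["labels"] == tokenizer.pad_token_id] = -100
            loss = model(**enc).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model, tokenizer
```

After training, `model.push_to_hub("your-username/nllb-bpy-beng-v8_5")` uploads the adapter weights (the repository name is a placeholder), and `tokenizer.push_to_hub(...)` can upload the tokenizer alongside it.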
| Version | Status | Key Improvement |
|---|---|---|
| V8.4 | ✅ Released | Fixed Bengali bias: outputs পানীহান, not জলহান |
| V8.5 | In Progress | Add sun/hot/father/work vocabulary. Target: 80% accuracy |
| V9.0 | Planned | 5000+ pairs, LoRA rank r=32, handle complex sentences |
Model by: Emarthar / Uttam Singha / Bishnupriya Manipuri Language Development Project
Base model: Meta AI NLLB-200
License: Apache 2.0 - Free for commercial use
Dataset: Community contributed BPY corpus
To submit corrections or volunteer, contact us through Hugging Face or by email via manipuri.com.