Bishnupriya Manipuri (BPM) Transcription Rules — Latin / ASCII / IPA

Reference rules used for your BPM dictionary conversions into Latin, ASCII Roman, and IPA.

Scope and goals

These rules are designed to convert Bishnupriya Manipuri (BPM) headwords into: Latin (IAST-like), ASCII Roman, and IPA. The main goals are:

  • Detect & fix common OCR artifacts from scanned dictionaries.
  • Use consistent transliteration mappings.
  • Apply BPM-specific pronunciation logic (especially final schwa and Ba-fala behavior).
  • Support a Tatsama option for Sanskritized spellings.

This page documents the rules; your conversion tool applies them programmatically. If you update your converter, update this page to match.

Processing pipeline

  1. OCR cleanup (remove unwanted marks, normalize spacing and punctuation).
  2. Core BPM normalization (standardize variants: nukta forms, ya-phala, etc.).
  3. Ba-fala rewrite pass (rewrite BPM text so all outputs follow the same logic).
  4. Generate Latin output.
  5. Generate ASCII Roman output from Latin.
  6. Generate IPA output directly from BPM (same normalization).

BPM normalization rules

OCR and dictionary marker cleanup

  • Remove zero-width characters (ZWJ / ZWNJ / BOM).
  • Normalize multiple spaces → single space.
  • Normalize hyphen spacing: স - ত → স-ত.
  • Dictionary markers:
    • + removed (prefix marker).
    • Parentheses removed: (অ).
    • Trailing hyphen removed: অকাতা- → অকাতা.
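
The cleanup rules above can be sketched in Python as follows. This is a minimal illustration, not the converter's actual code: the function name clean_headword is hypothetical, and it assumes that "+" is only a leading marker and that removing parentheses keeps the text inside them.

```python
import re

# Zero-width joiner, zero-width non-joiner, and byte-order mark.
ZERO_WIDTH = "\u200d\u200c\ufeff"


def clean_headword(text: str) -> str:
    """Apply the OCR and dictionary-marker cleanup rules to one headword.

    Hypothetical helper; the order of the steps is an assumption.
    """
    # Remove zero-width characters (ZWJ / ZWNJ / BOM).
    text = text.translate({ord(ch): None for ch in ZERO_WIDTH})
    # Drop a leading "+" prefix marker (assumed to occur only at the start).
    text = re.sub(r"^\s*\+\s*", "", text)
    # Drop parentheses but keep their content (assumption).
    text = re.sub(r"[()]", "", text)
    # Close up spaces around hyphens: "স - ত" becomes "স-ত".
    text = re.sub(r"\s*-\s*", "-", text)
    # Collapse runs of whitespace into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove a trailing hyphen: "অকাতা-" becomes "অকাতা".
    return text.rstrip("-")
```

Running the hyphen step before the whitespace step means a spaced hyphen never leaves a stray space behind.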

Character normalizations

  • ওয় is treated as .
  • is normalized to apostrophe '.
  • (curly apostrophe) is removed.
  • Nukta normalization:
    • ড়
    • ঢ়
    • য়
    • Remove nukta mark if present.
  • Ya-phala:
    • ্য় and ্য are treated as the same behavior (y).
  • OCR redundancy fix:
    • Remove redundant independent after a consonant (e.g., অমঅমঅমম).

Apostrophe rule

Apostrophe ' is preserved as a grammatical marker (e.g., plural). It is not automatically converted into a final schwa. Example:

অকর   → ɔkɔr
অকর'  → ɔkɔr'

Latin (IAST-like) rules

Latin output is IAST-like transliteration with inherent vowel behavior:

  • Consonants carry an inherent a unless it is suppressed by ্ (virama) or replaced by a vowel sign.
  • Vowel signs map to long/short vowels (e.g., ā).
  • Anusvara, visarga:
  • Ya/Ya-phala:
    • y
    • ্য behaves like (y behavior)

Key consonant mapping highlights

BPM     | Latin | Notes
ড় / ড় | ṛ     | Normalized to a single form and transliterated as ṛ.
ঋ / ৃ   | ṛ     | Independent vowel and vowel sign both render as ṛ in Latin.
ং       | ṃ     | Anusvara is ṃ in Latin; ASCII may convert it to ng.

ASCII Roman rules

ASCII romanization is derived from Latin by replacing diacritics:

Latin  | ASCII    | Example
ā ī ū  | aa ii uu | bhāṛā → bhaariaa
ṛ      | ri       | akṛta → akrita (ASCII uses ri)
ṅ      | ng       | hurkāṅ → hurkaang
ś / ṣ  | sh       | śuddha → shuddha
ṃ      | ng       | aṃś → angsh (project choice)

ASCII rules are meant for search and typing convenience, not phonetics.
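
As a rough sketch, the Latin-to-ASCII step can be written as a per-character lookup. The names LATIN_TO_ASCII and latin_to_ascii are hypothetical, and the table holds only the replacements listed above; a real converter may need more entries.

```python
import unicodedata

# Diacritic replacements taken from the table above (hypothetical name).
# Keys are NFC-normalized so they match normalized input.
LATIN_TO_ASCII = {
    unicodedata.normalize("NFC", key): value
    for key, value in {
        "ā": "aa", "ī": "ii", "ū": "uu",
        "ṛ": "ri",
        "ṅ": "ng", "ṃ": "ng",
        "ś": "sh", "ṣ": "sh",
    }.items()
}


def latin_to_ascii(latin: str) -> str:
    """Replace each diacritic letter with its ASCII spelling."""
    # NFC turns "letter + combining mark" into the precomposed letter,
    # so both encodings of the same Latin text give the same result.
    latin = unicodedata.normalize("NFC", latin)
    return "".join(LATIN_TO_ASCII.get(ch, ch) for ch in latin)
```

For example, latin_to_ascii("śuddha") gives "shuddha". Characters without an entry pass through unchanged.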

IPA rules

IPA output is generated from BPM with BPM-specific phonology rules. The default inherent vowel is represented as ɔ when applicable.

Core mapping notes

  • Default inherent vowel: ɔ.
  • Long vowel signs: .
  • Anusvara is rendered as ŋ (unless context rules are added later).

Special conjuncts

Conjunct | Non-Tatsama | Tatsama         | Notes
ক্ষ      | kkʰ         | kṣ (Latin only) | Project convention: IPA uses kkʰ for non-Tatsama.
জ্ঞ      | ɡɡ          | (Latin only)    | IPA uses ɡɡ (not gy).
স্ব      | ʃ           |                 | Non-Tatsama treats as -like.

Final schwa rules

BPM words often drop the final inherent vowel, but many words must keep it. The tool uses these rules to decide whether to keep a final a (Latin) / ɔ (IPA):

Always keep final schwa

  • Any single-letter word.
  • Words ending with ্ + letter (final conjunct marker).
  • Words ending with or .
  • Words ending with these sequences:
    • ্ + মত
    • ্ + রত
    • ্ + রথ
    • ৃ + ত (i.e., ৃত)
    • ্ধ + ত (i.e., ্ধত)
    • ন্দ
  • Words ending in ক্ষ or জ্ঞ keep final schwa.
  • Project exception list: a curated list of words that must keep final schwa (maintained in the converter code).

Example

অর্ধাঙ্গ → ardhāṅga / ɔrdʱaːŋɡɔ (final schwa kept)
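
The "always keep" checks can be sketched as a single predicate. This is an illustration under stated assumptions, not the converter's implementation: keeps_final_schwa and KEEP_ENDINGS are hypothetical names, the word is assumed to end in a bare consonant, "single-letter" is read as a single code point, and only the endings spelled out in the list above are covered. The curated exception list is passed in by the caller.

```python
VIRAMA = "\u09cd"          # ্
VOCALIC_R_SIGN = "\u09c3"  # ৃ

# Endings listed above as always keeping the final schwa (hypothetical name).
KEEP_ENDINGS = (
    VIRAMA + "মত",
    VIRAMA + "রত",
    VIRAMA + "রথ",
    VOCALIC_R_SIGN + "ত",
    VIRAMA + "ধত",
    "ন" + VIRAMA + "দ",
)


def keeps_final_schwa(word: str, exceptions: frozenset = frozenset()) -> bool:
    """Return True if the final inherent vowel should be kept.

    Assumes `word` is already normalized and ends in a bare consonant.
    """
    # Any single-letter word keeps its vowel.
    if len(word) == 1:
        return True
    # Project exception list, supplied by the caller.
    if word in exceptions:
        return True
    # Final conjunct: virama + letter at the end (also covers ক্ষ and জ্ঞ).
    if word[-2] == VIRAMA:
        return True
    # Listed endings such as ৃত.
    return word.endswith(KEEP_ENDINGS)
```

With this sketch, অর্ধাঙ্গ keeps its final vowel because it ends in a conjunct, while অকর does not unless it is on the exception list.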

Ba-fala (্ব) rules

Ba-fala (্ব) behaves like Bengali with BPM-specific refinements. The converter applies a rewrite pass so Latin/ASCII/IPA follow the same behavior.

Rules

  1. Beginning of word: usually silent (dropped).
  2. Beginning + "া" (ā-kar): if the word begins with a Ba-fala conjunct that is followed by া, add a w-sound by rewriting ্ব → ্ৱ (at the beginning of the word only).
  3. Middle/end: consonant is doubled (geminate), and the Ba-fala itself is dropped.
  4. Exceptions where "b" remains: if attached to ম, ব, গ, হ, or in র্‌ব (rb), keep the b-sound.
  5. Special conjunct behavior: শ্ব in the middle behaves like a doubled consonant (e.g., “śś / ʃʃ”).

Examples

অশ্ব      → aśśa / ashsha / ɔʃʃɔ
বিশ্বাস   → biśśās / bishshaas / biʃʃaːs
দ্বাদশ    → dwādaśa (beginning + ā-kar gives w-sound)
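
One way to picture the rewrite pass is as a single scan over the BPM text that rewrites each consonant + ্ব sequence according to rules 1 to 4 (rule 5 then falls out of rule 3). The sketch below is hypothetical: rewrite_ba_fala is an illustrative name, the input is assumed to be one cleaned word with zero-width characters already removed, and gemination is written as consonant + virama + the same consonant.

```python
VIRAMA = "\u09cd"  # ্
BA = "\u09ac"      # ব
WA = "\u09f1"      # ৱ
AA_KAR = "\u09be"  # া

# Rule 4: after these consonants the b-sound is kept (ম, ব, গ, হ, and র for rb).
KEEP_B_AFTER = set("মবগহর")


def rewrite_ba_fala(word: str) -> str:
    """Rewrite Ba-fala conjuncts in a single cleaned BPM word."""
    out = []
    i = 0
    while i < len(word):
        base = word[i]
        if word[i + 1:i + 3] == VIRAMA + BA:
            if base in KEEP_B_AFTER:
                # Rule 4: leave the conjunct unchanged.
                out.append(base + VIRAMA + BA)
            elif i == 0:
                if word[3:4] == AA_KAR:
                    # Rule 2: word-initial conjunct before ā-kar gets a w-sound.
                    out.append(base + VIRAMA + WA)
                else:
                    # Rule 1: word-initial Ba-fala is silent.
                    out.append(base)
            else:
                # Rule 3: double the consonant and drop the Ba-fala.
                out.append(base + VIRAMA + base)
            i += 3
        else:
            out.append(base)
            i += 1
    return "".join(out)
```

Because the rewrite happens on the BPM text itself, the Latin, ASCII, and IPA generators can all consume the same rewritten string.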

Tatsama toggle

Some words are Sanskritized (Tatsama) and should follow different conjunct spellings:

  • ক্ষ: Tatsama kṣ, non-Tatsama kkh (Latin) / kkʰ (IPA).
  • জ্ঞ: Tatsama , non-Tatsama gg (Latin/ASCII), IPA uses ɡɡ.
  • স্ব: Tatsama sw (Latin) / (IPA), non-Tatsama behaves like .

Your UI can offer a checkbox “Tatsama” that switches these mappings.

Worked examples

BPM    | Latin               | ASCII   | IPA
অকখাক  | akkhāk              | akkhaak | ɔkkʰaːk
অকচাক  | akcāk               | akcaak  | ɔkt͡ʃaːk
অংশ    | aṃśa                | angsha  | ɔŋʃɔ
যজ্ঞ   | jagga (non-Tatsama) | jagga   | d͡ʒɔɡɡɔ
অকর    | akara               | akara   | ɔkɔr
অকর'   | akara'              | akara'  | ɔkɔr'

Some examples depend on the evolving exception list for final schwa and other refinements. If an example differs from your current converter output, adjust either the converter or this page for consistency.