A technical overview of the Bishnupriya Manipuri text-to-speech pipeline
Architecture
This page explains the technical architecture of the Bishnupriya Manipuri text-to-speech system, showing how dictionary entries are converted into pronunciation data, diphone filenames, and final browser playback.
Dictionary word → Orthographic normalization → BPM to IPA conversion → Phoneme tokenization → Diphone generation → Safe filename mapping → Audio file lookup → Browser playback
Each stage depends on the previous one being stable. If one layer changes unexpectedly, later stages may fail even if their own code is correct.
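The goal of the TTS system is to turn a Bishnupriya Manipuri word into playable speech by combining the stages listed above, from orthographic normalization through diphone audio lookup to browser playback.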
One shared pronunciation engine should feed every page, validator, and playback path.
The pipeline begins with a dictionary word record. The dictionary is the lexical foundation of the system and supplies the source form that needs pronunciation and speech output.
| Field | Use in TTS |
|---|---|
| Word / BPM form | Primary text input |
| ID | Stable lookup key for API and word pages |
| IPA field (if stored) | Can support validation or display |
| Part of speech / metadata | May support future linguistic refinement |
Before pronunciation logic runs, the input should be normalized. This reduces errors caused by inconsistent encoding or unexpected character forms.
Raw text → Unicode normalization → clean internal form
This step is especially important for Eastern Nagari text processing.
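As a minimal sketch of this step, the snippet below trims the input and applies Unicode NFC normalization with Python's standard `unicodedata` module. The function name `normalize_word` and the choice of NFC are assumptions for illustration; the project's actual normalization rules are not described on this page.

```python
import unicodedata


def normalize_word(raw: str) -> str:
    """Return a clean internal form: trimmed and NFC-normalized."""
    # NFC composes canonically equivalent sequences into a single form,
    # so the same word always reaches the converter as the same code points.
    return unicodedata.normalize("NFC", raw.strip())


# The vowel sign O (U+09CB) can also arrive as U+09C7 followed by U+09BE.
decomposed = "\u0995\u09c7\u09be"  # KA + vowel sign E + vowel sign AA
composed = "\u0995\u09cb"          # KA + vowel sign O
print(normalize_word(decomposed) == composed)
```

Without this step, the two spellings above would look identical on screen but could be treated as different words by later stages.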
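The rule-based converter transforms the written word into an IPA representation. This stage is one of the most important in the whole architecture, because every later stage consumes its output.
The converter must handle the orthography-to-IPA rules of the script, including schwa rules.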
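The shape of a rule-based converter can be illustrated with a deliberately tiny lookup table. Everything here is hypothetical: `GRAPHEME_TO_IPA` lists only the four characters needed for the example word দিশা, and `bpm_to_ipa` omits schwa handling, conjuncts, and every other context-dependent rule a real converter needs.

```python
# Hypothetical grapheme-to-IPA table covering only the example word.
# A real table would cover the full script and its contextual rules.
GRAPHEME_TO_IPA = {
    "\u09a6": "d",  # consonant দ
    "\u09b6": "ʃ",  # consonant শ
    "\u09bf": "i",  # vowel sign ি
    "\u09be": "a",  # vowel sign া
}


def bpm_to_ipa(word: str) -> str:
    """Map each character through the table and fail loudly on gaps."""
    pieces = []
    for ch in word:
        if ch not in GRAPHEME_TO_IPA:
            raise KeyError(f"no IPA rule for U+{ord(ch):04X}")
        pieces.append(GRAPHEME_TO_IPA[ch])
    return "".join(pieces)


print(bpm_to_ipa("\u09a6\u09bf\u09b6\u09be"))  # input is দিশা
```

Raising on an unknown character, rather than skipping it, keeps a gap in the rules from silently producing a shorter IPA string.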
Once IPA is generated, the next step is to split it into phoneme units.
This stage must use the same phoneme rules everywhere. If tokenization differs across pages, the diphone sequence will also differ.
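One way to keep tokenization identical everywhere is a single shared function driven by a single phoneme inventory. The sketch below uses greedy longest-match splitting; the inventory shown is a hypothetical sample, not the project's real phoneme set.

```python
# Hypothetical inventory. Multi-character phonemes must be listed so
# that greedy matching keeps them together as one unit.
PHONEME_INVENTORY = {"d", "i", "ʃ", "a", "t", "tʃ", "k", "kʰ"}


def tokenize_ipa(ipa: str, inventory: set[str] = PHONEME_INVENTORY) -> list[str]:
    """Split an IPA string into phonemes, longest match first."""
    max_len = max(len(p) for p in inventory)
    tokens = []
    pos = 0
    while pos < len(ipa):
        # Try the longest possible slice first, then shrink.
        for size in range(min(max_len, len(ipa) - pos), 0, -1):
            candidate = ipa[pos:pos + size]
            if candidate in inventory:
                tokens.append(candidate)
                pos += size
                break
        else:
            raise ValueError(f"unknown symbol at position {pos}: {ipa[pos]!r}")
    return tokens


print(tokenize_ipa("diʃa"))
```

Because `tʃ` is tried before `t`, a page using this function cannot split an affricate into two phonemes while another page keeps it whole.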
The phoneme sequence is transformed into diphone transitions. These are the units that connect the linguistic layer to the audio layer.
Phonemes: d i ʃ a
Diphones: #-d d-i i-ʃ ʃ-a a-#
Boundary diphones are included so the word has a natural entry and exit in playback.
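The transformation can be sketched as pairing each phoneme with its neighbor after padding the sequence with the boundary symbol `#` on both sides. The function name `make_diphones` is an assumption for illustration.

```python
BOUNDARY = "#"


def make_diphones(phonemes: list[str]) -> list[str]:
    """Return the diphone transitions for a word, including boundaries."""
    if not phonemes:
        return []
    # Pad both ends so the word gets an entry and an exit transition.
    padded = [BOUNDARY, *phonemes, BOUNDARY]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


print(make_diphones(["d", "i", "ʃ", "a"]))
```

A word with n phonemes always yields n + 1 diphones, which is a quick sanity check for validators.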
Diphone strings are then converted into filesystem-safe names. This lets the audio library use predictable WAV filenames instead of raw IPA symbols.
| IPA Form | Safe Form | Filename |
|---|---|---|
| #-d | sil-d | sil-d.wav |
| i-ʃ | i-sh | i-sh.wav |
| ʃ-a | sh-a | sh-a.wav |
| a-# | a-sil | a-sil.wav |
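The mapping in the table can be sketched as a symbol-by-symbol substitution. Only the two substitutions shown in the table (`#` to `sil`, `ʃ` to `sh`) are taken from this page; the name `SAFE_SYMBOLS` and the decision to reject any leftover non-ASCII symbol are assumptions.

```python
# Substitutions taken from the table above; a full deployment would
# list every non-ASCII phoneme in the inventory.
SAFE_SYMBOLS = {"#": "sil", "ʃ": "sh"}


def safe_filename(diphone: str) -> str:
    """Convert an IPA diphone such as 'i-ʃ' into a WAV filename."""
    left, right = diphone.split("-")
    stem = f"{SAFE_SYMBOLS.get(left, left)}-{SAFE_SYMBOLS.get(right, right)}"
    # Refuse to emit a name that still contains raw IPA or punctuation.
    if not (stem.isascii() and stem.replace("-", "").isalnum()):
        raise ValueError(f"no safe mapping for diphone {diphone!r}")
    return stem + ".wav"


for d in ["#-d", "d-i", "i-ʃ", "ʃ-a", "a-#"]:
    print(d, "->", safe_filename(d))
```

Keeping the substitution table in one place matters because the audio library on disk must have been named with exactly the same rules.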
Once safe filenames are produced, the system checks whether the expected WAV files exist.
sil-d.wav d-i.wav i-sh.wav sh-a.wav a-sil.wav
If one or more expected files are missing, playback becomes partial or fails. This is where validator tools become essential.
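A file-existence check of this kind can be sketched with `pathlib`. The function name `find_missing` and the temporary directory standing in for the audio library are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def find_missing(filenames: list[str], audio_dir: Path) -> list[str]:
    """Return the expected filenames that are not present in audio_dir."""
    return [name for name in filenames if not (audio_dir / name).is_file()]


# Demo: build a throwaway audio directory with one file left out.
with tempfile.TemporaryDirectory() as tmp:
    audio_dir = Path(tmp)
    for name in ["sil-d.wav", "d-i.wav", "sh-a.wav", "a-sil.wav"]:
        (audio_dir / name).touch()
    expected = ["sil-d.wav", "d-i.wav", "i-sh.wav", "sh-a.wav", "a-sil.wav"]
    missing = find_missing(expected, audio_dir)

print(missing)
```

Reporting the missing names, rather than a bare pass or fail, tells the maintainer which diphones still need to be recorded or rebuilt.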
On the client side, JavaScript or browser audio logic loads the diphone files and plays them in sequence.
load filenames → request WAV files → play in order → heard as one synthesized word
This stage is the user-facing end of the TTS pipeline, but it depends entirely on the earlier linguistic and file-generation stages being correct.
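The browser performs the sequencing itself, but the same ordering can be checked offline by joining the diphone files into one WAV. The sketch below is a stand-in for that check, not the site's playback code: it uses Python's standard `wave` module, and the helper names and the 16 kHz mono test clips are assumptions.

```python
import tempfile
import wave
from pathlib import Path


def concatenate_wavs(paths: list[Path], out_path: Path) -> int:
    """Join WAV files in order and return the number of frames written."""
    fmt = None
    chunks = []
    total_frames = 0
    for path in paths:
        with wave.open(str(path), "rb") as src:
            this_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError(f"{path.name} has a different audio format")
            total_frames += src.getnframes()
            chunks.append(src.readframes(src.getnframes()))
    if fmt is None:
        raise ValueError("no input files")
    with wave.open(str(out_path), "wb") as dst:
        dst.setnchannels(fmt[0])
        dst.setsampwidth(fmt[1])
        dst.setframerate(fmt[2])
        for chunk in chunks:
            dst.writeframes(chunk)
    return total_frames


def write_silence(path: Path, frames: int) -> None:
    """Write a silent 16-bit mono clip, used here only as test input."""
    with wave.open(str(path), "wb") as clip:
        clip.setnchannels(1)
        clip.setsampwidth(2)
        clip.setframerate(16000)
        clip.writeframes(b"\x00\x00" * frames)


with tempfile.TemporaryDirectory() as tmp:
    first, second = Path(tmp) / "sil-d.wav", Path(tmp) / "d-i.wav"
    write_silence(first, 100)
    write_silence(second, 50)
    joined = Path(tmp) / "word.wav"
    total = concatenate_wavs([first, second], joined)
    with wave.open(str(joined), "rb") as result:
        joined_frames = result.getnframes()

print(total, joined_frames)
```

The format check is there because diphone files recorded at different sample rates would join without error but play back at the wrong speed.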
Dictionary entries and source words.
IPA conversion, schwa rules, phoneme inventory, and tokenization.
Diphone generation and boundary handling.
Safe filename conversion and deployment naming standards.
Diphone WAV inventory, segmentation outputs, and file coverage.
Browser-side loading, sequencing, and speech playback.
Validation is not a separate afterthought. It cuts across the whole architecture.
Dictionary word → IPA check → Phoneme check → Diphone check → Filename check → Audio file existence check → Playback check
A good validator can show exactly where the pipeline breaks.
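A validator of this kind can be sketched as a function that runs the checks in pipeline order and records the result of each one, stopping at the first failure. The function `trace_pipeline`, its report format, and the sample inventory are assumptions; the IPA and playback checks are left out to keep the example short.

```python
import tempfile
from pathlib import Path


def trace_pipeline(ipa: str, inventory: set[str], safe_symbols: dict[str, str],
                   audio_dir: Path) -> list[tuple[str, bool, str]]:
    """Return (stage, ok, detail) rows, stopping at the first failure."""
    report = []

    # Phoneme check: greedy longest match against the inventory.
    phonemes, pos = [], 0
    longest = max(len(p) for p in inventory)
    while pos < len(ipa):
        match = next((ipa[pos:pos + n] for n in range(longest, 0, -1)
                      if ipa[pos:pos + n] in inventory), None)
        if match is None:
            report.append(("phonemes", False, f"unknown symbol {ipa[pos]!r}"))
            return report
        phonemes.append(match)
        pos += len(match)
    report.append(("phonemes", True, " ".join(phonemes)))

    # Diphone check: pad with boundaries and pair neighbors.
    padded = ["#", *phonemes, "#"]
    diphones = [f"{a}-{b}" for a, b in zip(padded, padded[1:])]
    report.append(("diphones", True, " ".join(diphones)))

    # Filename check: every name must be plain ASCII after substitution.
    names = []
    for diphone in diphones:
        name = "-".join(safe_symbols.get(p, p) for p in diphone.split("-")) + ".wav"
        if not name.isascii():
            report.append(("filenames", False, f"unsafe name for {diphone}"))
            return report
        names.append(name)
    report.append(("filenames", True, " ".join(names)))

    # Audio file existence check.
    missing = [n for n in names if not (audio_dir / n).is_file()]
    detail = ("missing: " + " ".join(missing)) if missing else "all files present"
    report.append(("audio", not missing, detail))
    return report


with tempfile.TemporaryDirectory() as tmp:
    for name in ["sil-d.wav", "d-i.wav", "sh-a.wav", "a-sil.wav"]:
        (Path(tmp) / name).touch()
    report = trace_pipeline("diʃa", {"d", "i", "ʃ", "a"},
                            {"#": "sil", "ʃ": "sh"}, Path(tmp))

for stage, ok, detail in report:
    print(f"{stage:10} {'ok' if ok else 'FAIL'}  {detail}")
```

In this demo the first three stages pass and the audio stage fails on `i-sh.wav`, which points the maintainer at the audio library rather than at the pronunciation rules.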
Typical failures, in pipeline order:
IPA output changes in one page but not another.
One tokenizer splits sounds differently from the stable shared one.
The expected diphone sequence no longer matches the audio library.
Safe filename rules have changed but audio files still use the old form.
Old diphone files remain mixed with rebuilt files.
Browser-side code asks for filenames that do not exist.
Feeding every page, validator, and playback path from one shared pronunciation engine reduces mismatch and makes the system much easier to maintain.
Input word: দিশা
IPA: diʃa
Phonemes: d i ʃ a
Diphones: #-d d-i i-ʃ ʃ-a a-#
Safe filenames: sil-d.wav d-i.wav i-sh.wav sh-a.wav a-sil.wav
Playback: load and play these 5 files in sequence
Review the pronunciation logic and orthography-to-IPA rules.
Review the transition units used in synthesis.
Review how IPA diphones become deployment-ready filenames.
Review how the pipeline is tested and checked.
Review the source audio workflow that supports the architecture.
Review the operational checklist for clean rebuild and deployment.