Article 1
1. Introduction
A practical Bishnupriya Manipuri TTS engine must perform more than simple audio playback. It must convert a word into a sequence of diphone audio files and then play them back in the correct order.
This creates a full speech pipeline:
Dictionary word
↓
PHP pronunciation engine
↓
IPA
↓
Phonemes
↓
Diphones
↓
Safe filenames
↓
JavaScript audio playback
The system is therefore divided into two main components:
- Server side (PHP): pronunciation analysis and diphone filename generation
- Client side (JavaScript): audio loading, sequencing, and playback
2. Server-Side Architecture
On the server side, PHP handles the language-specific logic.
Its main tasks are:
- read the BPM word from the dictionary
- convert the word to IPA
- extract phonemes
- build the diphone list
- convert diphones to safe filenames
- return this information as JSON
{
  "ipa": "diʃa",
  "phonemes": ["d", "i", "ʃ", "a"],
  "diphones": ["#-d", "d-i", "i-ʃ", "ʃ-a", "a-#"],
  "diphone_files": [
    "sil-d.wav",
    "d-i.wav",
    "i-sh.wav",
    "sh-a.wav",
    "a-sil.wav"
  ]
}
3. The Role of the Analyze API
A dedicated API endpoint such as analyze_api.php is a central part of the system.
This endpoint receives a word ID or BPM word and returns structured pronunciation data.
A typical request may look like:
/analyze_api.php?id=4716
The JSON response may contain:
- normalized word
- IPA
- phoneme list
- diphone list
- safe filenames
- trace information explaining the pronunciation
This API makes the system modular, because the same pronunciation engine can be reused by:
- word pages
- diphone batch tools
- validator pages
- fallback audio generators
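As a rough client-side illustration, the request shown above could be wrapped as follows. This is a minimal sketch, not the article's actual code: the function names `buildAnalyzeUrl` and `fetchPronunciation` are hypothetical, only the `id` parameter from the example request is used, and the check on `diphone_files` assumes the JSON shape shown in Section 2.

```javascript
// Endpoint path taken from the example request /analyze_api.php?id=4716.
const ANALYZE_ENDPOINT = "/analyze_api.php";

// Build the request URL for a dictionary entry ID.
function buildAnalyzeUrl(id) {
  return `${ANALYZE_ENDPOINT}?id=${encodeURIComponent(id)}`;
}

// Fetch the pronunciation JSON and check that it carries a file list.
// fetchImpl is injectable (an assumption for testability); browsers use the global fetch.
async function fetchPronunciation(id, fetchImpl = fetch) {
  const response = await fetchImpl(buildAnalyzeUrl(id));
  if (!response.ok) {
    throw new Error(`analyze_api returned HTTP ${response.status}`);
  }
  const data = await response.json();
  if (!Array.isArray(data.diphone_files)) {
    throw new Error("Response has no diphone_files array");
  }
  return data;
}
```

Because every page goes through one function like this, a change to the API response only has to be handled in one place on the client.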
4. IPA to Diphone Processing in PHP
Once PHP has produced the IPA string, the next step is phoneme tokenization.
IPA: diʃa
Phonemes: d i ʃ a
Diphones: #-d d-i i-ʃ ʃ-a a-#
The PHP pipeline typically includes these functions:
- IPA tokenization
- diphone generation
- safe filename conversion
For example:
#-d → sil-d.wav
d-i → d-i.wav
i-ʃ → i-sh.wav
ʃ-a → sh-a.wav
a-# → a-sil.wav
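The tokenization and diphone steps can be sketched as below. The article performs them in PHP; JavaScript is used here only for illustration. The function names are hypothetical, and the tokenizer assumes one phoneme per IPA symbol with the length mark ː attached to the preceding symbol. Multi-symbol phonemes such as affricates or aspirated consonants would need extra rules that this sketch does not cover.

```javascript
// IPA length mark (U+02D0); it attaches to the preceding phoneme, as in aː.
const LENGTH_MARK = "\u02D0";

// Split an IPA string into phonemes, keeping long vowels as one token.
function tokenizeIpa(ipa) {
  const phonemes = [];
  for (const ch of ipa.normalize("NFC")) {
    if (ch === LENGTH_MARK && phonemes.length > 0) {
      phonemes[phonemes.length - 1] += ch;
    } else if (ch.trim() !== "") {
      phonemes.push(ch); // whitespace is skipped
    }
  }
  return phonemes;
}

// Pair each phoneme with its neighbour, padding both ends with "#" (silence).
function buildDiphones(phonemes) {
  if (phonemes.length === 0) return [];
  const padded = ["#", ...phonemes, "#"];
  const diphones = [];
  for (let i = 0; i < padded.length - 1; i++) {
    diphones.push(`${padded[i]}-${padded[i + 1]}`);
  }
  return diphones;
}
```

A word with n phonemes always yields n + 1 diphones, which is a useful sanity check when comparing output between pages.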
5. Safe Filename Mapping
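A browser audio engine cannot reliably load files named with raw IPA symbols. Non-ASCII characters such as ʃ or ŋ must be percent-encoded in URLs and may be handled inconsistently across file systems, and the boundary symbol # is interpreted in a URL as the start of a fragment identifier. Therefore, the server converts each diphone into a safe ASCII filename.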
| IPA form | Safe form |
|---|---|
| # | sil |
| aː | aa |
| iː | ii |
| uː | uu |
| ʃ | sh |
| ŋ | ng |
| ɔ | aw |
| ə | schwa |
#-d → sil-d.wav
ʃ-aː → sh-aa.wav
aː-# → aa-sil.wav
Once this mapping is fixed, every page in the system must use the same conversion rules.
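The mapping table can be expressed as a small lookup, sketched here in JavaScript for illustration (the article's own conversion runs in PHP). The names `SAFE_NAMES`, `safePhoneme`, and `diphoneToFilename` are hypothetical. Passing plain lowercase ASCII letters through unchanged and throwing on any unmapped symbol are assumptions of this sketch, chosen so that a gap in the table is noticed rather than silently producing a wrong filename.

```javascript
// Mapping copied from the table above; "#" marks silence at a word edge.
const SAFE_NAMES = new Map([
  ["#", "sil"],
  ["aː", "aa"],
  ["iː", "ii"],
  ["uː", "uu"],
  ["ʃ", "sh"],
  ["ŋ", "ng"],
  ["ɔ", "aw"],
  ["ə", "schwa"],
]);

// Convert one phoneme to its safe form.
function safePhoneme(phoneme) {
  if (SAFE_NAMES.has(phoneme)) return SAFE_NAMES.get(phoneme);
  if (/^[a-z]+$/.test(phoneme)) return phoneme; // already ASCII-safe
  throw new Error(`No safe name defined for phoneme "${phoneme}"`);
}

// Convert a diphone such as "ʃ-aː" to a filename such as "sh-aa.wav".
function diphoneToFilename(diphone) {
  const parts = diphone.split("-");
  if (parts.length !== 2) {
    throw new Error(`Malformed diphone "${diphone}"`);
  }
  return `${safePhoneme(parts[0])}-${safePhoneme(parts[1])}.wav`;
}
```

Keeping the table in a single data structure makes it straightforward to export the same rules to every page that needs them.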
6. Client-Side Playback with JavaScript
On the client side, JavaScript receives the diphone file list and plays the audio files in order.
A simplified playback logic is:
1. Request pronunciation JSON from the API
2. Read diphone filename list
3. Build audio URLs
4. Load each WAV file
5. Play them sequentially
This can be implemented with the HTML5 audio element (HTMLAudioElement) or with the Web Audio API.
7. Sequential Audio Playback
The simplest approach is to play each WAV file one after another.
sil-d.wav
d-i.wav
i-sh.wav
sh-a.wav
a-sil.wav
Each file is loaded and played, and when one file ends, the next begins.
Pseudo-code:
load diphone file list
for each file, in order:
    create an audio object for the file
    play it
    wait until it ends before moving to the next file
This approach is easy to implement, but because each file starts only after the previous one has finished, it may introduce small audible gaps between segments.
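In a browser, the sequential approach might look like the following sketch. The function names `playOne` and `playSequence` are hypothetical, and the `createAudio` parameter is an assumption added so the logic can be exercised without a real audio device; by default it creates a standard `Audio` element.

```javascript
// Play one clip and resolve when it finishes; reject if it cannot load or play.
function playOne(audio) {
  return new Promise((resolve, reject) => {
    audio.addEventListener("ended", () => resolve(), { once: true });
    audio.addEventListener(
      "error",
      () => reject(new Error(`Cannot play ${audio.src}`)),
      { once: true }
    );
    // play() returns a promise in modern browsers; it rejects if playback is blocked.
    Promise.resolve(audio.play()).catch(reject);
  });
}

// Play the given URLs strictly one after another.
async function playSequence(urls, createAudio = (url) => new Audio(url)) {
  for (const url of urls) {
    await playOne(createAudio(url));
  }
}
```

Since browsers usually block audio that is not triggered by a user gesture, `playSequence` should be called from a click handler rather than on page load.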
8. Improved Playback with Web Audio API
For smoother synthesis, the Web Audio API can be used. This allows:
- preloading all diphone files
- buffer-based playback
- gap reduction
- crossfade smoothing
A more advanced playback engine can:
- fetch all diphone WAV files
- decode audio buffers
- schedule them back-to-back
- optionally overlap by 5–10 ms
This reduces clicks and improves continuity.
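A buffer-based engine along these lines could be sketched as follows. The function names are hypothetical, the 5 ms default overlap is taken from the range mentioned above, and the linear fades are one simple choice of crossfade. The sketch assumes every clip is longer than twice the overlap.

```javascript
// Compute start offsets (seconds) so each clip begins `overlap` seconds
// before the previous one ends. With overlap = 0 the clips are back-to-back.
function scheduleStartTimes(durations, overlap = 0.005) {
  const starts = [];
  let t = 0;
  for (const d of durations) {
    starts.push(t);
    t += Math.max(d - overlap, 0);
  }
  return starts;
}

// Fetch and decode every file up front so playback is not delayed by loading.
async function loadBuffers(ctx, urls) {
  return Promise.all(urls.map(async (url) => {
    const response = await fetch(url);
    if (!response.ok) throw new Error(`Missing diphone file: ${url}`);
    return ctx.decodeAudioData(await response.arrayBuffer());
  }));
}

// Schedule all buffers on the AudioContext clock, with short fades at the joins.
function playBuffers(ctx, buffers, overlap = 0.005) {
  const starts = scheduleStartTimes(buffers.map((b) => b.duration), overlap);
  const t0 = ctx.currentTime + 0.05; // small lead time before the first clip
  buffers.forEach((buffer, i) => {
    const source = ctx.createBufferSource();
    const gain = ctx.createGain();
    source.buffer = buffer;
    source.connect(gain);
    gain.connect(ctx.destination);
    const start = t0 + starts[i];
    const end = start + buffer.duration;
    if (overlap > 0 && i > 0) { // fade in over the overlap region
      gain.gain.setValueAtTime(0, start);
      gain.gain.linearRampToValueAtTime(1, start + overlap);
    }
    if (overlap > 0 && i < buffers.length - 1) { // fade out into the next clip
      gain.gain.setValueAtTime(1, end - overlap);
      gain.gain.linearRampToValueAtTime(0, end);
    }
    source.start(start);
  });
}
```

In a page this would be used as `const ctx = new AudioContext(); playBuffers(ctx, await loadBuffers(ctx, urls));`. Scheduling against `ctx.currentTime` lets the audio hardware clock place each clip, instead of relying on the timing of `ended` events.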
9. Missing Diphone Handling
A robust TTS engine must handle missing diphone files gracefully.
When a file is missing, the system may:
- show an error message
- display the missing filenames
- skip the missing segment
- fall back to whole-word audio if available
Missing (2) - Coverage: 60%
sil-d.wav
aa-sil.wav
This type of feedback is extremely useful during diphone inventory rebuilding.
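A coverage report of this kind can be computed with a few lines. The sketch below uses hypothetical function names; probing files with HTTP `HEAD` requests is one possible way to test availability, and it assumes the web server answers `HEAD` for static WAV files.

```javascript
// Compare the required files with those known to exist and summarise the gap.
function coverageReport(requiredFiles, availableFiles) {
  const available = new Set(availableFiles);
  const missing = requiredFiles.filter((file) => !available.has(file));
  const found = requiredFiles.length - missing.length;
  const coverage = requiredFiles.length === 0
    ? 100
    : Math.round((found / requiredFiles.length) * 100);
  return { missing, coverage };
}

// Probe each file with a HEAD request; returns the files that responded OK.
// fetchImpl is injectable (an assumption for testability).
async function findAvailable(files, baseUrl, fetchImpl = fetch) {
  const checks = await Promise.all(files.map(async (file) => {
    try {
      const response = await fetchImpl(baseUrl + file, { method: "HEAD" });
      return response.ok ? file : null;
    } catch {
      return null; // a network failure counts as missing
    }
  }));
  return checks.filter((file) => file !== null);
}
```

With five required files of which two are absent, `coverageReport` returns the two missing names and a coverage of 60, matching the sample output above. The caller can then decide whether to skip segments or fall back to whole-word audio.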
10. Integration into Dictionary Pages
On a dictionary word page, the TTS button can trigger a JavaScript call that:
- reads the dictionary word or entry ID
- calls the pronunciation API
- receives diphone files
- plays them automatically
Word page
↓
Click “Play TTS”
↓
Fetch pronunciation JSON
↓
Receive diphone_files[]
↓
Play WAV files in sequence
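The wiring for such a button might look like the sketch below. `createTtsClickHandler` is a hypothetical name, and the two helpers it receives, `fetchPronunciation(id)` and `playSequence(urls)`, are assumed to be provided elsewhere on the page: one returns the pronunciation JSON, the other plays a list of URLs in order. The `/audio/diphone/` directory follows the article's own examples.

```javascript
// Directory that holds the diphone WAV files.
const DIPHONE_BASE_URL = "/audio/diphone/";

// Build a click handler for a "Play TTS" button.
// The helpers are passed in, so the handler does not depend on how they work.
function createTtsClickHandler(button, entryId, { fetchPronunciation, playSequence }) {
  return async function onClick() {
    button.disabled = true; // prevent overlapping playback from repeated clicks
    try {
      const data = await fetchPronunciation(entryId);
      const urls = data.diphone_files.map(
        (file) => DIPHONE_BASE_URL + encodeURIComponent(file)
      );
      await playSequence(urls);
    } finally {
      button.disabled = false; // re-enable even if fetching or playback failed
    }
  };
}

// Browser usage (assumes a button with id "play-tts" and a data-entry-id attribute):
// const button = document.getElementById("play-tts");
// button.addEventListener("click", createTtsClickHandler(
//   button, button.dataset.entryId, { fetchPronunciation, playSequence }));
```

Disabling the button while audio is playing is a small design choice that avoids two overlapping renditions of the same word.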
11. Why Consistency Across Pages Matters
One of the biggest engineering problems in diphone TTS systems is inconsistency. If different pages generate different IPA forms or different safe filenames, the audio files will not match.
Therefore, all pages must use:
- the same IPA converter
- the same phoneme tokenizer
- the same diphone generator
- the same safe filename rules
A shared engine file, such as a common PHP module, helps enforce this consistency.
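One way to detect inconsistency early is to check each pronunciation record for internal agreement. The sketch below is an illustration, not part of the article's engine: `checkPronunciationRecord` is a hypothetical name, the field names follow the JSON shown in Section 2, and the filename pattern assumes safe names consist only of lowercase ASCII letters, as in the mapping table.

```javascript
// Return a list of problems found in one pronunciation record (empty if none).
function checkPronunciationRecord(record) {
  const problems = [];
  const { phonemes = [], diphones = [], diphone_files: files = [] } = record;
  const padded = ["#", ...phonemes, "#"];

  // n phonemes padded with silence on both sides give n + 1 diphones.
  if (diphones.length !== phonemes.length + 1) {
    problems.push(`expected ${phonemes.length + 1} diphones, got ${diphones.length}`);
  }

  // Each diphone must join two neighbouring entries of the padded phoneme list.
  diphones.forEach((diphone, i) => {
    if (i + 1 >= padded.length) return; // surplus diphone, already reported above
    const want = `${padded[i]}-${padded[i + 1]}`;
    if (diphone !== want) {
      problems.push(`diphone ${i} is "${diphone}", expected "${want}"`);
    }
  });

  // One file per diphone, each with an ASCII-safe name.
  if (files.length !== diphones.length) {
    problems.push(`expected ${diphones.length} files, got ${files.length}`);
  }
  files.forEach((file) => {
    if (!/^[a-z]+-[a-z]+\.wav$/.test(file)) {
      problems.push(`unsafe filename "${file}"`);
    }
  });
  return problems;
}
```

Running a check like this on the output of every page that produces pronunciation data gives a quick signal when one of them has drifted from the shared rules.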
12. Recommended System Structure
A clean Bishnupriya Manipuri TTS web implementation may use the following structure:
/bpm_converter_core.php
/diphone_engine.php
/analyze_api.php
/word.php
/diphone_validator.php
/audio/diphone/
Roles:
- bpm_converter_core.php → pronunciation engine
- diphone_engine.php → diphone tokenization and safe filename logic
- analyze_api.php → JSON pronunciation output
- word.php → user interface and playback button
- diphone_validator.php → inventory testing
- /audio/diphone/ → actual WAV files
13. Practical Playback Example
Server output:
IPA: diʃa
Phonemes: d i ʃ a
Diphones: #-d d-i i-ʃ ʃ-a a-#
Safe filenames: sil-d.wav d-i.wav i-sh.wav sh-a.wav a-sil.wav
JavaScript playback order:
1. /audio/diphone/sil-d.wav
2. /audio/diphone/d-i.wav
3. /audio/diphone/i-sh.wav
4. /audio/diphone/sh-a.wav
5. /audio/diphone/a-sil.wav
These files are played sequentially to synthesize the word.
14. Advantages of PHP + JavaScript for TTS
This implementation model has several advantages:
- easy web integration
- no special client installation required
- server-side pronunciation control
- browser-based playback
- good compatibility with dictionary websites
It is also highly suitable for research projects and language preservation platforms.
15. Limitations
A diphone TTS engine implemented this way still has some limitations:
- quality depends heavily on diphone recording consistency
- missing files interrupt playback
- plain sequential playback may sound slightly segmented
- rare clusters may require additional recordings
These limitations can be reduced through:
- better normalization
- crossfade smoothing
- clean diphone inventory design
- systematic validation
16. Conclusion
A Bishnupriya Manipuri TTS engine can be implemented effectively in PHP and JavaScript using a diphone-based design.
The architecture is:
PHP: word → IPA → phonemes → diphones → filenames
JavaScript: filenames → audio loading → playback → speech
This design makes it possible to integrate speech synthesis directly into an online dictionary and provides a practical path for language technology in an under-resourced language.
Next Article
Article 9: Integrating Speech into an Online Bishnupriya Manipuri Dictionary
The next article explains how dictionary pages, word records, audio tools, and TTS playback can be combined into a unified web platform.