Article 1

Abstract. A diphone-based text-to-speech engine can be implemented efficiently in a web environment using PHP on the server side and JavaScript on the client side. PHP handles dictionary lookup, IPA conversion, phoneme extraction, diphone generation, and API output, while JavaScript loads and concatenates the required diphone audio files for playback in the browser. This article describes the architecture of such a TTS engine for Bishnupriya Manipuri.

1. Introduction

A practical Bishnupriya Manipuri TTS engine must perform more than simple audio playback. It must convert a word into a sequence of diphone audio files and then play them back in the correct order.

This creates a full speech pipeline:

Dictionary word
    ↓
PHP pronunciation engine
    ↓
IPA
    ↓
Phonemes
    ↓
Diphones
    ↓
Safe filenames
    ↓
JavaScript audio playback

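The system is therefore divided into two main components: a PHP component on the server, which handles pronunciation analysis, and a JavaScript component in the browser, which handles audio loading and playback.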

2. Server-Side Architecture

On the server side, PHP handles the language-specific logic.

Its main tasks are dictionary lookup, IPA conversion, phoneme extraction, diphone generation, and API output.

Typical PHP output:
{
  "ipa": "diʃa",
  "phonemes": ["d","i","ʃ","a"],
  "diphones": ["#-d","d-i","i-ʃ","ʃ-a","a-#"],
  "diphone_files": [
    "sil-d.wav",
    "d-i.wav",
    "i-sh.wav",
    "sh-a.wav",
    "a-sil.wav"
  ]
}

3. The Role of the Analyze API

A dedicated API endpoint such as analyze_api.php is a central part of the system. This endpoint receives a word ID or a Bishnupriya Manipuri (BPM) word and returns structured pronunciation data.

A typical request may look like:

/analyze_api.php?id=4716

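The JSON response may contain the IPA transcription (ipa), the phoneme list (phonemes), the diphone list (diphones), and the safe filenames (diphone_files).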

This API makes the system modular, because the same pronunciation engine can be reused by every page or tool that needs pronunciation data.

4. IPA to Diphone Processing in PHP

Once PHP has produced the IPA string, the next step is phoneme tokenization.

Example:
IPA: diʃa
Phonemes: d i ʃ a
Diphones: #-d d-i i-ʃ ʃ-a a-#

The PHP pipeline typically includes these functions:

  1. IPA tokenization
  2. diphone generation
  3. safe filename conversion

For example:

#-d   → sil-d.wav
d-i   → d-i.wav
i-ʃ   → i-sh.wav
ʃ-a   → sh-a.wav
a-#   → a-sil.wav
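
The tokenization and diphone-generation steps can be sketched as follows. The article places this logic in PHP; the sketch is written in JavaScript only so that all examples in this article share one language. The function names tokenizeIpa and buildDiphones are hypothetical, and the sketch assumes that the length mark ː attaches to the preceding symbol, so that aː counts as one phoneme.

```javascript
// Hypothetical helper: split an IPA string into phoneme tokens.
// Assumption: the length mark "ː" belongs to the preceding symbol,
// so "aː" is one phoneme. A full tokenizer would need more rules
// (affricates, diacritics, multi-character symbols).
function tokenizeIpa(ipa) {
  const phonemes = [];
  for (const ch of ipa.normalize("NFC")) {
    if (ch === "ː" && phonemes.length > 0) {
      phonemes[phonemes.length - 1] += ch;
    } else if (ch.trim() !== "") {
      phonemes.push(ch); // skip whitespace, keep every other symbol
    }
  }
  return phonemes;
}

// Pad the phoneme list with "#" (silence) at both ends,
// then pair each symbol with its right-hand neighbour.
function buildDiphones(phonemes) {
  const padded = ["#", ...phonemes, "#"];
  const diphones = [];
  for (let i = 0; i < padded.length - 1; i++) {
    diphones.push(`${padded[i]}-${padded[i + 1]}`);
  }
  return diphones;
}

const phonemes = tokenizeIpa("diʃa");
console.log(phonemes.join(" "));                // d i ʃ a
console.log(buildDiphones(phonemes).join(" ")); // #-d d-i i-ʃ ʃ-a a-#
```

A word with n phonemes always yields n + 1 diphones, because the silence marker is added on both sides.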

5. Safe Filename Mapping

A browser audio engine cannot reliably load raw IPA filenames because IPA symbols may not be safe in URLs or file systems. Therefore, the server converts each diphone into a safe filename.

IPA form   Safe form
#          sil
a          a
i          i
u          u
ʃ          sh
ŋ          ng
ɔ          aw
ə          schwa

Examples:
#-d   → sil-d.wav
ʃ-aː  → sh-aa.wav
aː-#  → aa-sil.wav

Once this mapping is fixed, every page in the system must use the same conversion rules.
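
A minimal sketch of the conversion is shown below, in JavaScript for illustration even though the article performs this step on the server. The mapping entries come from the table and examples in this section; the names SAFE_NAMES, safePhoneme, and diphoneToFilename are hypothetical. Only aː → aa is attested for long vowels, so the sketch rejects any symbol it does not know instead of guessing a name.

```javascript
// Mapping taken from the table and examples in this section.
// Assumption: other long vowels or symbols must be added explicitly.
const SAFE_NAMES = new Map([
  ["#", "sil"],
  ["aː", "aa"],
  ["ʃ", "sh"],
  ["ŋ", "ng"],
  ["ɔ", "aw"],
  ["ə", "schwa"],
]);

// Hypothetical helper: convert one phoneme symbol to its safe form.
function safePhoneme(symbol) {
  if (SAFE_NAMES.has(symbol)) return SAFE_NAMES.get(symbol);
  // Plain lowercase ASCII letters (a, i, u, d, ...) map to themselves.
  if (/^[a-z]+$/.test(symbol)) return symbol;
  throw new Error(`No safe name defined for phoneme "${symbol}"`);
}

// Convert a diphone such as "ʃ-aː" into a filename such as "sh-aa.wav".
function diphoneToFilename(diphone) {
  const parts = diphone.split("-");
  if (parts.length !== 2) throw new Error(`Malformed diphone "${diphone}"`);
  return `${parts.map((p) => safePhoneme(p)).join("-")}.wav`;
}

console.log(["#-d", "ʃ-aː", "aː-#"].map(diphoneToFilename).join(" "));
// sil-d.wav sh-aa.wav aa-sil.wav
```

Failing loudly on an unmapped symbol makes a gap in the mapping visible immediately, instead of producing a filename that silently matches no recording.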

6. Client-Side Playback with JavaScript

On the client side, JavaScript receives the diphone file list and plays the audio files in order.

A simplified playback logic is:

1. Request pronunciation JSON from the API
2. Read diphone filename list
3. Build audio URLs
4. Load each WAV file
5. Play them sequentially

This can be implemented using the HTML5 Audio API or the Web Audio API.
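
Steps 1 to 3 above can be sketched as follows. The endpoint and the /audio/diphone/ path are the ones named in this article; the function names and the injectable fetchImpl parameter are assumptions made for illustration.

```javascript
// Base path for diphone audio, as used elsewhere in this article.
const AUDIO_BASE = "/audio/diphone/";

// Turn the API's diphone_files list into loadable URLs.
function buildAudioUrls(diphoneFiles, base = AUDIO_BASE) {
  return diphoneFiles.map((name) => base + encodeURIComponent(name));
}

// Request pronunciation data for a word ID and return the audio URLs.
// fetchImpl is injectable so the function can be exercised without a network.
async function fetchDiphoneUrls(wordId, fetchImpl = fetch) {
  const response = await fetchImpl(
    `/analyze_api.php?id=${encodeURIComponent(wordId)}`
  );
  if (!response.ok) {
    throw new Error(`API request failed: ${response.status}`);
  }
  const data = await response.json();
  if (!Array.isArray(data.diphone_files)) {
    throw new Error("Response has no diphone_files array");
  }
  return buildAudioUrls(data.diphone_files);
}
```

In a page, fetchDiphoneUrls(4716) would resolve to the list of URLs that the playback step then loads.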

7. Sequential Audio Playback

The simplest approach is to play each WAV file one after another.

Example file sequence:
sil-d.wav
d-i.wav
i-sh.wav
sh-a.wav
a-sil.wav

Each file is loaded and played, and when one file ends, the next begins.

Pseudo-code:

load diphone file list
for each file:
    create audio object
    wait until previous file ends
    play next file

This approach is easy to implement but may introduce small audible gaps between segments, because each file starts only after the previous one has reported that it ended.
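
The pseudo-code above could be written with the HTML5 Audio element roughly as follows. This is a sketch, not a tuned player: the function names are hypothetical, and the createAudio parameter is an assumption added so that the loop does not depend directly on the browser's Audio constructor.

```javascript
// Play one audio element and resolve when it finishes.
function playOne(audio) {
  return new Promise((resolve, reject) => {
    audio.addEventListener("ended", resolve, { once: true });
    audio.addEventListener(
      "error",
      () => reject(new Error(`Cannot play ${audio.src}`)),
      { once: true }
    );
    // play() returns a promise in modern browsers; forward its rejection.
    Promise.resolve(audio.play()).catch(reject);
  });
}

// Play each URL in order, starting the next file when the previous one ends.
// createAudio defaults to the browser's Audio constructor.
async function playSequentially(urls, createAudio = (url) => new Audio(url)) {
  for (const url of urls) {
    await playOne(createAudio(url));
  }
}
```

Browsers generally allow playback to start only in response to a user action, so playSequentially would normally be called from a click handler.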

8. Improved Playback with Web Audio API

For smoother synthesis, the Web Audio API can be used, because it allows decoded audio buffers to be scheduled with precise timing.

A more advanced playback engine can:

1. Fetch all diphone WAV files
2. Decode them into audio buffers
3. Schedule the buffers back-to-back
4. Optionally overlap them by 5–10 ms

This reduces clicks and improves continuity.
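
A minimal sketch of such an engine follows. The function names and the 50 ms lead time are assumptions. The overlap here simply starts each buffer slightly before the previous one ends; it does not apply a fade, so a production engine might add gain ramps at the joins.

```javascript
// Compute start offsets (in seconds) so that each buffer begins
// `overlap` seconds before the previous one ends.
function computeStartOffsets(durations, overlap = 0.005) {
  const offsets = [];
  let cursor = 0;
  for (const duration of durations) {
    offsets.push(cursor);
    // Never move the cursor backwards, even for very short segments.
    cursor += Math.max(duration - overlap, 0);
  }
  return offsets;
}

// Fetch, decode, and schedule all diphone files on one AudioContext.
async function playWithWebAudio(urls, overlap = 0.005) {
  const ctx = new AudioContext();
  const buffers = await Promise.all(
    urls.map(async (url) => {
      const response = await fetch(url);
      if (!response.ok) throw new Error(`Missing audio file: ${url}`);
      return ctx.decodeAudioData(await response.arrayBuffer());
    })
  );
  const offsets = computeStartOffsets(buffers.map((b) => b.duration), overlap);
  const startAt = ctx.currentTime + 0.05; // assumed lead time before the first segment
  buffers.forEach((buffer, i) => {
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    source.start(startAt + offsets[i]);
  });
}
```

Because every start time is computed in advance from the decoded durations, the joins no longer depend on event timing in the page.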

9. Missing Diphone Handling

A robust TTS engine must handle missing diphone files gracefully.

When a file is missing, the system may, for example, report the missing filenames together with a coverage figure.

Example missing-file message:
Missing (2) — Coverage: 60%
sil-d.wav
aa-sil.wav

This feedback shows exactly which recordings are still needed while the diphone inventory is being rebuilt.
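
One way to compute such a report is sketched below. It assumes that coverage means the share of a word's required files that exist, and that the set of available filenames is already known, for instance from a directory listing on the server. The function name coverageReport is hypothetical.

```javascript
// Compare the files a word needs against the files known to exist.
// `available` is a Set of filenames (assumed to be supplied by the caller).
function coverageReport(requiredFiles, available) {
  const missing = requiredFiles.filter((name) => !available.has(name));
  const found = requiredFiles.length - missing.length;
  const coverage =
    requiredFiles.length === 0
      ? 100
      : Math.round((found / requiredFiles.length) * 100);
  return { missing, coverage };
}

const report = coverageReport(
  ["sil-d.wav", "d-i.wav", "i-sh.wav", "sh-a.wav", "a-sil.wav"],
  new Set(["d-i.wav", "i-sh.wav", "sh-a.wav"])
);
console.log(report.missing.length, report.coverage); // 2 60
```

With two of five files missing, the report gives the same figures as the example message: two missing files and 60% coverage.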

10. Integration into Dictionary Pages

On a dictionary word page, the TTS button can trigger a JavaScript call that:

  1. reads the dictionary word or entry ID
  2. calls the pronunciation API
  3. receives diphone files
  4. plays them automatically

Example interface flow:
Word page
   ↓
Click “Play TTS”
   ↓
Fetch pronunciation JSON
   ↓
Receive diphone_files[]
   ↓
Play WAV files in sequence
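
The button wiring for this flow might look like the sketch below. The function attachTtsButton and its three injected callbacks (getWordId, fetchUrls, playUrls) are hypothetical names; they stand for whatever the page uses to read the entry ID, call the pronunciation API, and play the returned files.

```javascript
// Attach TTS playback to a button. The three dependencies are injected so the
// wiring stays independent of how fetching and playback are implemented.
function attachTtsButton(button, { getWordId, fetchUrls, playUrls }) {
  button.addEventListener("click", async () => {
    button.disabled = true; // prevent overlapping playback from double clicks
    try {
      const urls = await fetchUrls(getWordId());
      await playUrls(urls);
    } catch (err) {
      console.error("TTS playback failed:", err);
    } finally {
      button.disabled = false;
    }
  });
}
```

Disabling the button during playback is a design choice of this sketch: it keeps a second click from starting the same word again while the first playback is still running.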

11. Why Consistency Across Pages Matters

One of the biggest engineering problems in diphone TTS systems is inconsistency. If different pages generate different IPA forms or different safe filenames, the audio files will not match.

Therefore, all pages must use the same IPA conversion and the same safe filename rules.

A shared engine file, such as a common PHP module, helps enforce this consistency.

12. Recommended System Structure

A clean Bishnupriya Manipuri TTS web implementation may use the following structure:

/bpm_converter_core.php
/diphone_engine.php
/analyze_api.php
/word.php
/diphone_validator.php
/audio/diphone/

Roles:

13. Practical Playback Example

Input word: দিশা

Server output:

IPA: diʃa
Phonemes: d i ʃ a
Diphones:
#-d
d-i
i-ʃ
ʃ-a
a-#
Safe filenames:
sil-d.wav
d-i.wav
i-sh.wav
sh-a.wav
a-sil.wav

JavaScript playback order:

1. /audio/diphone/sil-d.wav
2. /audio/diphone/d-i.wav
3. /audio/diphone/i-sh.wav
4. /audio/diphone/sh-a.wav
5. /audio/diphone/a-sil.wav

These files are played sequentially to synthesize the word.

14. Advantages of PHP + JavaScript for TTS

This implementation model has several advantages:

It is also highly suitable for research projects and language preservation platforms.

15. Limitations

A diphone TTS engine implemented this way still has some limitations:

These limitations can be reduced through:

16. Conclusion

A Bishnupriya Manipuri TTS engine can be implemented effectively in PHP and JavaScript using a diphone-based design.

The architecture is:

PHP:
word → IPA → phonemes → diphones → filenames

JavaScript:
filenames → audio loading → playback → speech

This design makes it possible to integrate speech synthesis directly into an online dictionary and provides a practical path for language technology in an under-resourced language.

Next Article

Article 9
Integrating Speech into an Online Bishnupriya Manipuri Dictionary

The next article explains how dictionary pages, word records, audio tools, and TTS playback can be combined into a unified web platform.