From Bishnupriya Manipuri Script to Speech

Building a Computational Pipeline for IPA, Phonemes, Diphones, and Text-to-Speech

Abstract. Bishnupriya Manipuri is an Indo-Aryan language spoken primarily in Northeast India and Bangladesh. Despite its historical and linguistic importance, the language has limited computational resources. This article presents a systematic approach for developing a speech synthesis pipeline for Bishnupriya Manipuri, beginning with orthographic text and progressing through phonetic representation, phoneme segmentation, diphone generation, and finally diphone-based text-to-speech (TTS). The work demonstrates how linguistic knowledge and software engineering can be combined to create a functional speech technology framework for under-resourced languages.

1. Introduction

Speech technology development for under-resourced languages requires careful integration of linguistic analysis and computational tools. Bishnupriya Manipuri presents several challenges:

To address these challenges, a complete computational pipeline was developed:

Bishnupriya Manipuri Script
        ↓
Phonetic transcription (IPA)
        ↓
Phoneme sequence
        ↓
Diphone segmentation
        ↓
Audio diphone database
        ↓
Text-to-Speech synthesis

This pipeline enables automatic pronunciation generation and speech synthesis from dictionary data.

2. Bishnupriya Manipuri Writing System

Bishnupriya Manipuri is typically written using the Eastern Nagari script, the same script used for Bengali and Assamese.

Example:
Script: কথা
Romanization: kôtha
IPA: kɔtʰa

The script contains a standard set of vowels and consonants.

Vowels

অ আ ই ঈ উ ঊ এ ঐ ও ঔ

Consonants

ক খ গ ঘ ঙ
চ ছ জ ঝ ঞ
ট ঠ ড ঢ ণ
ত থ দ ধ ন
প ফ ব ভ ম
য র ল শ ষ স হ

However, Bishnupriya Manipuri pronunciation differs from standard Bengali in several ways, making rule-based phonetic modeling essential.

3. Orthography to IPA Conversion

The first step in speech synthesis is converting the text into a phonetic representation.

Example dictionary entry:
Word: অক্ষর
IPA output: ɔkʰʃɔr

Conversion rules include letter-to-sound mappings.

Example consonant mappings

Letter IPA
ক k
গ g
শ ʃ
স s
র r
ল l
Example vowel mappings

Letter IPA
অ ɔ
আ a
ই i
উ u
এ e
ও o
Schwa Handling

A critical part of pronunciation is handling the inherent vowel, transcribed as ɔ in the examples below. In many Indic scripts, each consonant letter carries this default vowel unless a vowel sign replaces it or a rule suppresses it.

Example 1:
কথা → kɔtʰa
Example 2:
অগ্নি → ɔgni

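In Example 1, ক keeps its inherent vowel ɔ, while the vowel sign া replaces it on থ. In Example 2, the conjunct গ্ন suppresses the inherent vowel after গ.

This requires rule-based schwa deletion and consonant-cluster analysis.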
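A minimal sketch of inherent-vowel handling is shown here, under stated assumptions. The function name to_ipa and the small tables are hypothetical and cover only a handful of letters. The rule that a word-final consonant drops its inherent vowel is an assumption, and the sketch does not model changes in the quality of the inherent vowel or special conjuncts.

```python
INHERENT_VOWEL = "ɔ"
HASANTA = "\u09CD"  # vowel-killer sign used inside conjuncts

# Small illustrative tables; a real converter needs the full script.
INDEPENDENT_VOWELS = {"অ": "ɔ", "আ": "a", "ই": "i", "উ": "u", "এ": "e", "ও": "o"}
VOWEL_SIGNS = {
    "\u09BE": "a",  # aa sign
    "\u09BF": "i",  # i sign
    "\u09C1": "u",  # u sign
    "\u09C7": "e",  # e sign
    "\u09CB": "o",  # o sign
}
CONSONANTS = {"ক": "k", "গ": "g", "থ": "tʰ", "ন": "n", "র": "r", "ল": "l"}


def to_ipa(word):
    """Convert a word to IPA, inserting the inherent vowel where it survives."""
    out = []
    for i, ch in enumerate(word):
        if ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch == HASANTA:
            continue  # its effect is applied when the preceding consonant is read
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = word[i + 1] if i + 1 < len(word) else None
            # Keep the inherent vowel only if nothing overrides it: no hasanta,
            # no vowel sign, and (assumed rule) the consonant is not word-final.
            if nxt is not None and nxt != HASANTA and nxt not in VOWEL_SIGNS:
                out.append(INHERENT_VOWEL)
        else:
            raise ValueError(f"no rule for character {ch!r}")
    return "".join(out)
```

With these tables, to_ipa("কথা") returns kɔtʰa and to_ipa("অগ্নি") returns ɔgni, matching the two examples above. Characters are processed in stored (logical) order, so a vowel sign that is displayed before its consonant still follows it in the string.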

4. Phoneme Extraction

Once IPA transcription is produced, the next stage is to extract phonemes.

Example:
Word: উপকার
IPA: upokar
Phoneme sequence: u p o k a r

A practical phoneme inventory for Bishnupriya Manipuri TTS includes both vowels and consonants.

Vowels

a  aː  i  iː  u  uː  e  o  ɔ  ə

Consonants

k g kʰ
t d tʰ dʰ
p b pʰ
m n ŋ
s ʃ h
r l
j w
tʃ dʒ
ɽ

These phonemes form the foundation of the speech synthesis system.
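Because some phonemes in this inventory are written with two IPA characters (for example tʰ, tʃ, aː), the transcription cannot simply be split character by character. One possible approach, sketched here with the hypothetical function name split_phonemes, is greedy longest-match against the inventory listed above.

```python
# Phoneme inventory copied from the lists in this section.
PHONEMES = [
    "a", "aː", "i", "iː", "u", "uː", "e", "o", "ɔ", "ə",
    "k", "g", "kʰ",
    "t", "d", "tʰ", "dʰ",
    "p", "b", "pʰ",
    "m", "n", "ŋ",
    "s", "ʃ", "h",
    "r", "l",
    "j", "w",
    "tʃ", "dʒ",
    "ɽ",
]

# Try longer symbols first so that "tʃ" wins over "t" followed by "ʃ".
_BY_LENGTH = sorted(PHONEMES, key=len, reverse=True)


def split_phonemes(ipa):
    """Split an IPA string into a list of phonemes from the inventory."""
    phonemes = []
    pos = 0
    while pos < len(ipa):
        for symbol in _BY_LENGTH:
            if ipa.startswith(symbol, pos):
                phonemes.append(symbol)
                pos += len(symbol)
                break
        else:
            # No inventory symbol matches at this position.
            raise ValueError(f"no phoneme matches {ipa[pos:]!r}")
    return phonemes
```

For the example above, split_phonemes("upokar") yields u, p, o, k, a, r. Greedy matching always reads t followed by ʃ as the affricate tʃ, so a word in which the two belong to separate phonemes would need an explicit separator in the transcription.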

5. Diphone Concept

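Instead of storing recordings of entire words, many concatenative TTS systems store diphones. A diphone covers the transition between two adjacent phonemes, typically running from the middle of the first phoneme to the middle of the second.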

Example:
Word: কথা
Phonemes: k ɔ tʰ a
Diphones:
#-k
k-ɔ
ɔ-tʰ
tʰ-a
a-#

The symbol # represents the beginning or end of a word.
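Generating the diphone list from a phoneme sequence amounts to padding the sequence with the boundary symbol and pairing each element with its successor. A minimal sketch follows; the function name to_diphones is hypothetical.

```python
def to_diphones(phonemes, boundary="#"):
    """Return the diphone names for a phoneme sequence, with word boundaries."""
    if not phonemes:
        return []
    padded = [boundary, *phonemes, boundary]
    # Pair every element with the one that follows it.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]
```

Calling to_diphones(["k", "ɔ", "tʰ", "a"]) gives the five diphones listed above, from #-k to a-#. A word of n phonemes always produces n + 1 diphones.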

6. Diphone Audio Database

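Each diphone is stored as a small audio file. For a practical diphone-based TTS system, the files are given consistent, ASCII-only names so that they are safe on any file system. In the examples below, sil stands for the word boundary #, aw for ɔ, and th for tʰ.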

Example filenames:
sil-k.wav
k-aw.wav
aw-th.wav
th-a.wav
a-sil.wav

A functional diphone inventory may contain around 200 to 300 files, yet this can be sufficient to synthesize thousands of words.
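The naming scheme can be sketched as a small lookup from phoneme symbols to ASCII names. Only the three substitutions visible in the example filenames (sil, aw, th) come from the article; the function name diphone_filename is hypothetical, and every other non-ASCII phoneme would need its own entry before it can be named.

```python
# Substitutions taken from the example filenames in this section.
# Any further entries (for instance for tʃ or ʃ) would be a design choice
# that this sketch deliberately leaves open.
SAFE_NAMES = {"#": "sil", "ɔ": "aw", "tʰ": "th"}


def diphone_filename(diphone):
    """Map a diphone such as 'k-ɔ' to a safe filename such as 'k-aw.wav'."""
    left, right = diphone.split("-")
    parts = []
    for phoneme in (left, right):
        name = SAFE_NAMES.get(phoneme, phoneme)
        # Refuse to emit names that are not plain ASCII letters or digits.
        if not (name.isascii() and name.isalnum()):
            raise ValueError(f"no safe name defined for {phoneme!r}")
        parts.append(name)
    return f"{parts[0]}-{parts[1]}.wav"
```

Raising an error for unmapped symbols, rather than silently writing a non-ASCII filename, keeps the database consistent as the inventory grows.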

7. Diphone Segmentation

Audio recordings of words are segmented automatically or semi-automatically into diphones.

Example recording:
Word file: উপকার.wav
Segmented diphones:
sil-u
u-p
p-o
o-k
k-a
a-r
r-sil

Each diphone is extracted and saved to the diphone database.
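The article does not specify how the cut points are chosen. The sketch below assumes that the start and end time of every phoneme in the recording are already known (from manual labelling or an aligner) and that each diphone is cut from the midpoint of one phoneme to the midpoint of the next. The function names cut_points and segment_word, and the form of the boundary list, are hypothetical. It uses only Python's standard wave module.

```python
import wave
from pathlib import Path


def cut_points(boundaries):
    """Turn phoneme edge times into diphone cut times (phoneme midpoints)."""
    mids = [(a + b) / 2 for a, b in zip(boundaries, boundaries[1:])]
    return [boundaries[0], *mids, boundaries[-1]]


def segment_word(wav_path, names, boundaries, out_dir):
    """Cut one word recording into diphone files.

    names:      diphone names in order, e.g. ["sil-u", "u-p", ...]
    boundaries: phoneme edge times in seconds (one more than the phoneme count)
    """
    cuts = cut_points(boundaries)
    if len(names) != len(cuts) - 1:
        raise ValueError("diphone names and boundaries do not line up")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(wav_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for name, start, end in zip(names, cuts, cuts[1:]):
            first = round(start * rate)
            last = round(end * rate)
            src.setpos(first)
            frames = src.readframes(last - first)
            target = out_dir / f"{name}.wav"
            with wave.open(str(target), "wb") as dst:
                dst.setparams(params)  # same channels, sample width, rate
                dst.writeframes(frames)
            written.append(target)
    return written
```

For a six-phoneme word such as the example above, seven boundary times produce seven diphone files. The first and last files cover only half a phoneme each, because no surrounding silence is assumed in the recording.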

8. Diphone-Based Text-to-Speech

During synthesis, the system performs the following steps:

  1. Read text input
  2. Convert the word to IPA
  3. Extract phoneme sequence
  4. Generate diphone list
  5. Concatenate audio diphones to produce speech

Example:
Input word: অপরিচিত
IPA: ɔporitʃit
Phonemes: ɔ p o r i tʃ i t
Diphones:
#-ɔ
ɔ-p
p-o
o-r
r-i
i-tʃ
tʃ-i
i-t
t-#

The corresponding diphone WAV files are then joined to synthesize the word.
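The joining step can be sketched with Python's standard wave module. This is a minimal illustration rather than the system's actual synthesizer: the function name concatenate_diphones and the directory layout are hypothetical, all files are assumed to share one audio format, and the units are joined end to end with no smoothing at the joins.

```python
import wave
from pathlib import Path


def concatenate_diphones(filenames, database_dir, output_path):
    """Join diphone WAV files, in order, into a single output WAV file."""
    database_dir = Path(database_dir)
    audio_format = None
    chunks = []
    for filename in filenames:
        with wave.open(str(database_dir / filename), "rb") as unit:
            fmt = (unit.getnchannels(), unit.getsampwidth(), unit.getframerate())
            if audio_format is None:
                audio_format = fmt
            elif fmt != audio_format:
                raise ValueError(f"{filename} has a different audio format")
            chunks.append(unit.readframes(unit.getnframes()))
    if audio_format is None:
        raise ValueError("no diphone files given")
    with wave.open(str(output_path), "wb") as out:
        out.setnchannels(audio_format[0])
        out.setsampwidth(audio_format[1])
        out.setframerate(audio_format[2])
        out.writeframes(b"".join(chunks))
```

A missing diphone surfaces as a FileNotFoundError from wave.open, which is a convenient signal that the database does not yet cover the requested word.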

9. Advantages of the Diphone Method

The diphone method offers several practical advantages for under-resourced languages:

It is particularly suitable for languages with limited speech resources and limited annotated corpora.

10. Conclusion

The Bishnupriya Manipuri TTS system demonstrates how a combination of linguistic analysis and computational tools can produce speech technology for an under-resourced language.

The pipeline includes:

Script → IPA → Phoneme → Diphone → Speech

This framework can serve as the foundation for future research, including:

Suggested Follow-Up Articles

  1. Designing a Rule-Based Bishnupriya Manipuri → IPA Converter
  2. Schwa Deletion and Consonant Cluster Handling in Bishnupriya Manipuri
  3. Building a Diphone Database for a Low-Resource Language
  4. Implementing Bishnupriya Manipuri Text-to-Speech in PHP and JavaScript