From Bishnupriya Manipuri Script to Speech

Building a Computational Pipeline for IPA, Phonemes, Diphones, and Text-to-Speech

Abstract. Bishnupriya Manipuri is an Indo-Aryan language spoken primarily in Northeast India and Bangladesh. Despite its historical and linguistic importance, the language has limited computational resources. This article presents a systematic approach for developing a speech synthesis pipeline for Bishnupriya Manipuri, beginning with orthographic text and progressing through phonetic representation, phoneme segmentation, diphone generation, and finally diphone-based text-to-speech (TTS). The work demonstrates how linguistic knowledge and software engineering can be combined to create a functional speech technology framework for under-resourced languages.

1. Introduction

Speech technology development for under-resourced languages requires careful integration of linguistic analysis and computational tools. Bishnupriya Manipuri presents several challenges:

variation in orthographic conventions
influence from Sanskrit, Bengali, and Assamese
complex consonant clusters
schwa deletion patterns
lack of standardized phonetic resources

To address these challenges, a complete computational pipeline was developed:

Bishnupriya Manipuri Script
        ↓
Phonetic transcription (IPA)
        ↓
Phoneme sequence
        ↓
Diphone segmentation
        ↓
Audio diphone database
        ↓
Text-to-Speech synthesis

This pipeline enables automatic pronunciation generation and speech synthesis from dictionary data.

2. Bishnupriya Manipuri Writing System

Bishnupriya Manipuri is typically written using the Eastern Nagari script, the same script used for Bengali and Assamese.

Example:
Script: কথা
Romanization: kôtha
IPA: kɔtʰa

The script contains a standard set of vowels and consonants.

Vowels

অ আ ই ঈ উ ঊ এ ঐ ও ঔ

Consonants

ক খ গ ঘ ঙ
চ ছ জ ঝ ঞ
ট ঠ ড ঢ ণ
ত থ দ ধ ন
প ফ ব ভ ম
য র ল শ ষ স হ

However, Bishnupriya Manipuri pronunciation differs from standard Bengali in several ways, making rule-based phonetic modeling essential.

3. Orthography to IPA Conversion

The first step in speech synthesis is converting text into a phonetic representation.

Example dictionary entry:
Word: অক্ষর
IPA output: ɔkʰʃɔr

Conversion rules include letter-to-sound mappings.

Example consonant mappings

Letter	IPA
ক	k
খ	kʰ
গ	g
চ	tʃ
জ	dʒ
শ	ʃ
স	s
র	r
ল	l

Example vowel mappings

Letter	IPA
অ	ɔ
আ	a
ই	i
উ	u
এ	e
ও	o

Schwa Handling

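A critical part of pronunciation is handling the inherent vowel (ɔ in the mappings above), which work on Indic scripts often calls the schwa. In many Indic scripts, each consonant letter carries this default vowel unless a vowel sign, a hasanta (্), or a pronunciation rule suppresses it.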

Example 1:
কথা → kɔtʰa

Example 2:
অগ্নি → ɔgni

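In Example 1 the inherent vowel ɔ is pronounced after ক because neither a vowel sign nor a hasanta follows it. In Example 2 the hasanta joining গ and ন suppresses it, giving the cluster gn. Handling both cases requires rule-based schwa deletion and consonant-cluster analysis.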
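The mapping and inherent-vowel behaviour described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the converter used in the project: the function name to_ipa is hypothetical, the consonant and vowel tables contain only the letters shown in this section (থ and ন are read off the two examples), and the vowel-sign table is an assumption derived from the vowel table. Context-dependent schwa deletion, such as dropping a word-final inherent vowel, is deliberately left out.

```python
# Letter-to-IPA tables limited to the letters shown in this section.
CONSONANTS = {
    "ক": "k", "খ": "kʰ", "গ": "g", "চ": "tʃ", "জ": "dʒ",
    "শ": "ʃ", "স": "s", "র": "r", "ল": "l",
    "থ": "tʰ", "ন": "n",  # inferred from the examples কথা and অগ্নি
}
INDEPENDENT_VOWELS = {"অ": "ɔ", "আ": "a", "ই": "i", "উ": "u", "এ": "e", "ও": "o"}

# Dependent vowel signs (assumed to share the values of the full vowels).
VOWEL_SIGNS = {
    "\u09be": "a",  # BENGALI VOWEL SIGN AA
    "\u09bf": "i",  # BENGALI VOWEL SIGN I
    "\u09c1": "u",  # BENGALI VOWEL SIGN U
    "\u09c7": "e",  # BENGALI VOWEL SIGN E
    "\u09cb": "o",  # BENGALI VOWEL SIGN O
}
HASANTA = "\u09cd"      # suppresses the inherent vowel and forms clusters
INHERENT_VOWEL = "ɔ"


def to_ipa(word: str) -> str:
    """Convert a word to IPA using letter mappings and explicit vowel marks only."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])  # explicit vowel replaces the inherent one
                i += 1
            elif nxt == HASANTA:
                i += 1                        # vowel suppressed: consonant cluster
            else:
                out.append(INHERENT_VOWEL)    # no mark: inherent vowel is kept
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
        else:
            raise ValueError(f"no mapping for character {ch!r}")
        i += 1
    return "".join(out)


print(to_ipa("কথা"))    # kɔtʰa
print(to_ipa("অগ্নি"))  # ɔgni
```

Because the sketch always keeps an unmarked inherent vowel, a full converter would add a later pass that applies the language's schwa-deletion rules to this raw output.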

4. Phoneme Extraction

Once IPA transcription is produced, the next stage is to extract phonemes.

Example:
Word: উপকার
IPA: upokar
Phoneme sequence: u p o k a r

A practical phoneme inventory for Bishnupriya Manipuri TTS includes both vowels and consonants.

Vowels

a  aː  i  iː  u  uː  e  o  ɔ  ə

Consonants

k g kʰ
t d tʰ dʰ
p b pʰ
m n ŋ
s ʃ h
r l
j w
tʃ dʒ
ɽ

These phonemes form the foundation of the speech synthesis system.
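Splitting an IPA string into phonemes is not a character-by-character operation, because symbols such as kʰ and tʃ are single phonemes written with two characters. One simple way to handle this, shown here as a sketch rather than the project's actual method, is greedy longest-match against the inventory above. The function name split_phonemes is hypothetical.

```python
# Phoneme inventory copied from the lists in this section.
PHONEMES = {
    "a", "aː", "i", "iː", "u", "uː", "e", "o", "ɔ", "ə",
    "k", "g", "kʰ", "t", "d", "tʰ", "dʰ", "p", "b", "pʰ",
    "m", "n", "ŋ", "s", "ʃ", "h", "r", "l", "j", "w",
    "tʃ", "dʒ", "ɽ",
}
MAX_LEN = max(len(p) for p in PHONEMES)


def split_phonemes(ipa: str) -> list[str]:
    """Split an IPA string into phonemes, preferring the longest match."""
    result = []
    i = 0
    while i < len(ipa):
        for size in range(MAX_LEN, 0, -1):
            candidate = ipa[i:i + size]
            if candidate in PHONEMES:
                result.append(candidate)
                i += len(candidate)
                break
        else:
            raise ValueError(f"unknown symbol at position {i}: {ipa[i]!r}")
    return result


print(split_phonemes("upokar"))     # ['u', 'p', 'o', 'k', 'a', 'r']
print(split_phonemes("ɔporitʃit"))  # ['ɔ', 'p', 'o', 'r', 'i', 'tʃ', 'i', 't']
```

Greedy matching always reads t followed by ʃ as the affricate tʃ. If the language contrasts the two readings, the transcription would need an explicit separator.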

5. Diphone Concept

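Instead of storing recordings of entire words, many concatenative TTS systems store diphones. A diphone covers the transition between two adjacent phonemes, typically running from the middle of the first to the middle of the second, so that joins fall in the relatively stable part of each sound.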

Example:
Word: কথা
Phonemes: k ɔ tʰ a
Diphones:

#-k
k-ɔ
ɔ-tʰ
tʰ-a
a-#

The symbol # represents the beginning or end of a word.
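Generating the diphone list from a phoneme sequence amounts to padding the sequence with the boundary symbol and pairing each item with its successor. The following sketch illustrates this; the function name to_diphones is hypothetical.

```python
def to_diphones(phonemes: list[str], boundary: str = "#") -> list[str]:
    """Return the diphone labels for one word, including both word boundaries."""
    if not phonemes:
        return []
    padded = [boundary, *phonemes, boundary]
    # Pair every element with the one that follows it.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


print(to_diphones(["k", "ɔ", "tʰ", "a"]))
# ['#-k', 'k-ɔ', 'ɔ-tʰ', 'tʰ-a', 'a-#']
```

A word with n phonemes always yields n + 1 diphones, which is a useful sanity check when building the database.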

6. Diphone Audio Database

Each diphone is stored as a small audio file. For a practical diphone-based TTS system, these files are named consistently using safe filenames.

Example filenames for কথা, with # written as sil, ɔ as aw, and tʰ as th:

sil-k.wav
k-aw.wav
aw-th.wav
th-a.wav
a-sil.wav

A functional diphone inventory may contain around 200 to 300 files, yet this can be sufficient to synthesize thousands of words, because each diphone is reused in every word that contains that pair of phonemes.
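The filename convention can be captured in a small lookup table. In the sketch below, only the first three entries (sil, aw, th) come from the example filenames above. The remaining entries and the names safe_name and diphone_filename are hypothetical choices made for illustration.

```python
SAFE_NAMES = {
    "#": "sil", "ɔ": "aw", "tʰ": "th",   # taken from the example filenames
    "kʰ": "kh", "tʃ": "ch", "ʃ": "sh",   # hypothetical, for illustration only
}


def safe_name(phoneme: str) -> str:
    """Map one phoneme symbol to an ASCII-only name usable in a filename."""
    if phoneme in SAFE_NAMES:
        return SAFE_NAMES[phoneme]
    if phoneme.isascii() and phoneme.isalnum():
        return phoneme  # plain letters such as k, a, r are already safe
    raise ValueError(f"no safe name defined for {phoneme!r}")


def diphone_filename(left: str, right: str) -> str:
    """Build the WAV filename for the diphone left-right."""
    return f"{safe_name(left)}-{safe_name(right)}.wav"


sequence = ["#", "k", "ɔ", "tʰ", "a", "#"]
print([diphone_filename(a, b) for a, b in zip(sequence, sequence[1:])])
# ['sil-k.wav', 'k-aw.wav', 'aw-th.wav', 'th-a.wav', 'a-sil.wav']
```

Whatever table is used, it should be one-to-one so that two different phonemes never map to the same filename component.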

7. Diphone Segmentation

Audio recordings of words are segmented automatically or semi-automatically into diphones.

Example recording:
Word file: উপকার.wav
Segmented diphones (sil marks the word boundary, written # in Section 5):

sil-u
u-p
p-o
o-k
k-a
a-r
r-sil

Each diphone is extracted and saved to the diphone database.
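The article does not specify how the cut points are chosen. One common convention, assumed here, is to cut at the midpoint of each phoneme once the phoneme boundaries are known from manual labelling or an alignment tool. The sketch below computes the resulting diphone spans. The function name diphone_spans and the timings in the example are hypothetical.

```python
def diphone_spans(segments, word_start, word_end):
    """Compute (label, start_ms, end_ms) for each diphone in one recording.

    segments: ordered list of (phoneme, start_ms, end_ms) tuples.
    word_start, word_end: limits of the recording, including edge silence.
    Cuts are placed at phoneme midpoints (an assumed convention).
    """
    midpoints = [(start + end) // 2 for _, start, end in segments]
    labels = ["sil"] + [phoneme for phoneme, _, _ in segments] + ["sil"]
    cuts = [word_start] + midpoints + [word_end]
    return [
        (f"{labels[i]}-{labels[i + 1]}", cuts[i], cuts[i + 1])
        for i in range(len(cuts) - 1)
    ]


# Hypothetical phoneme boundaries (milliseconds) for উপকার.wav.
segments = [
    ("u", 100, 200), ("p", 200, 300), ("o", 300, 400),
    ("k", 400, 500), ("a", 500, 600), ("r", 600, 700),
]
for label, start, end in diphone_spans(segments, 0, 800):
    print(label, start, end)
```

With these timings the first span is sil-u from 0 to 150 ms and the last is r-sil from 650 to 800 ms. Each span can then be copied out of the recording and saved under its diphone filename.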

8. Diphone-Based Text-to-Speech

During synthesis, the system performs the following steps:

Read text input
Convert the word to IPA
Extract phoneme sequence
Generate diphone list
Concatenate audio diphones to produce speech

Example:
Input word: অপরিচিত
IPA: ɔporitʃit
Phonemes: ɔ p o r i tʃ i t
Diphones:

#-ɔ
ɔ-p
p-o
o-r
r-i
i-tʃ
tʃ-i
i-t
t-#

The corresponding diphone WAV files are then joined to synthesize the word.
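The final joining step can be sketched with Python's standard wave module. This is a minimal illustration that assumes all diphone files share the same channel count, sample width, and sample rate. The function name concatenate_wavs and the file paths in the usage comment are hypothetical, and no smoothing is applied at the joins.

```python
import wave


def concatenate_wavs(paths, out_path):
    """Join WAV files end to end into out_path, without any smoothing."""
    if not paths:
        raise ValueError("no input files given")
    fmt = None
    chunks = []
    for path in paths:
        with wave.open(path, "rb") as wf:
            this_fmt = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError(f"{path} has a different audio format: {this_fmt}")
            chunks.append(wf.readframes(wf.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        out.writeframes(b"".join(chunks))


# Usage (hypothetical paths):
# concatenate_wavs(["diphones/sil-k.wav", "diphones/k-aw.wav"], "output.wav")
```

Plain concatenation can leave audible discontinuities where two files meet. A short cross-fade or amplitude matching at each join is a common refinement, but it is outside the scope of this sketch.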

9. Advantages of the Diphone Method

The diphone method offers several practical advantages for under-resourced languages:

small audio database
relatively natural sound
easy expansion and correction
compatibility with dictionary-based systems
good balance between quality and implementation simplicity

It is particularly suitable for languages with limited speech resources and limited annotated corpora.

10. Conclusion

The Bishnupriya Manipuri TTS system demonstrates how a combination of linguistic analysis and computational tools can produce speech technology for an under-resourced language.

The pipeline includes:

Script → IPA → Phoneme → Diphone → Speech

This framework can serve as the foundation for future research, including:

neural speech synthesis
speech recognition
pronunciation dictionaries
language learning tools
digital preservation of Bishnupriya Manipuri

Bishnupriya Manipuri Research Archive

Language, linguistics, dictionary, IPA, phonemes, diphones, and speech technology
