Future Directions: Neural TTS and Advanced Speech Technology for Bishnupriya Manipuri

Abstract. A rule-based International Phonetic Alphabet (IPA) converter, phoneme extractor, diphone inventory, and dictionary-integrated text-to-speech (TTS) system together form the first foundation for Bishnupriya Manipuri speech technology. This work also opens the way toward more advanced systems such as neural TTS, automatic speech recognition, pronunciation learning tools, and digital language preservation platforms. This article explores these future directions.

1. Introduction

The current Bishnupriya Manipuri speech system is built on a rule-based pipeline:

Script → IPA → Phoneme → Diphone → Speech

This architecture is highly valuable because it provides:

a stable pronunciation engine
a reusable diphone audio database
a web-based TTS implementation
a computational framework for future research

Once such a foundation exists, more advanced speech technologies become possible.
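
To make the Phoneme → Diphone stage of the pipeline concrete, here is a minimal Python sketch that pairs each phoneme with its neighbour. The `_` silence marker, the hyphenated unit names, and the sample phoneme list are illustrative assumptions, not the project's actual inventory or naming scheme.

```python
from typing import List

SILENCE = "_"  # hypothetical marker for the silence before and after a word


def phonemes_to_diphones(phonemes: List[str]) -> List[str]:
    """Pair each phoneme with its successor, padded by silence at both ends."""
    if not phonemes:
        return []
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Illustrative phoneme sequence (assumed, not taken from the project's lexicon).
print(phonemes_to_diphones(["k", "ɔ", "t", "a"]))
# ['_-k', 'k-ɔ', 'ɔ-t', 't-a', 'a-_']
```

A word of n phonemes yields n + 1 diphones under this padding, which is why the size of the diphone inventory grows with the number of phoneme pairs the language permits.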

2. From Diphone TTS to Neural TTS

A diphone TTS system is an efficient and practical solution for a low-resource language, but it has some inherent limitations:

speech may sound segmented
prosody is limited
intonation is not modeled deeply
quality depends on diphone coverage

Neural TTS systems can overcome many of these limitations.

What neural TTS adds

smoother transitions
better naturalness
improved prosody and rhythm
speaker adaptation possibilities
sentence-level fluency

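However, neural TTS usually requires much more training data than a diphone-based approach: typically hours of transcribed speech, ideally from a consistent speaker, rather than a fixed set of recorded units.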
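
The point that diphone quality depends on coverage can be made measurable. The sketch below, in Python, checks which diphones a pronunciation lexicon needs against the set already recorded. The function names, the `_` silence marker, and the toy lexicon are assumptions for illustration only.

```python
from typing import Dict, List, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phonemes: List[str], silence: str = "_") -> List[Diphone]:
    """Adjacent phoneme pairs, with silence padding at both word edges."""
    padded = [silence] + list(phonemes) + [silence]
    return list(zip(padded, padded[1:]))


def coverage_report(
    lexicon: Dict[str, List[str]], inventory: Set[Diphone]
) -> Tuple[float, List[Diphone]]:
    """Return (fraction of needed diphones that are recorded, missing diphones)."""
    needed: Set[Diphone] = set()
    for phonemes in lexicon.values():
        if phonemes:
            needed.update(diphones(phonemes))
    if not needed:
        return 1.0, []
    missing = sorted(needed - inventory)
    return (len(needed) - len(missing)) / len(needed), missing


# Toy data: placeholder entries, not real dictionary content.
lexicon = {"word_a": ["k", "a"], "word_b": ["t", "a"]}
recorded = {("_", "k"), ("k", "a"), ("a", "_")}

fraction, missing = coverage_report(lexicon, recorded)
print(fraction)  # 0.6
print(missing)   # [('_', 't'), ('t', 'a')]
```

Running such a report over the full dictionary would show which units still need recording before every entry can be synthesized.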

3. Why the Current System Is Still Important

Even if the long-term goal is neural TTS, the current diphone-based system remains essential.

It provides:

a pronunciation lexicon
an IPA-conversion engine
a phoneme inventory
aligned speech data
recorded word audio

These are exactly the kinds of resources needed later for neural training.

In other words, the current rule-based and diphone-based system is not a dead end. It is the training and documentation foundation for future neural systems.

4. Data Requirements for Neural TTS

A future Bishnupriya Manipuri neural TTS system would need:

Resource	Purpose
clean speech recordings	training the acoustic model
text transcripts	text-audio alignment
IPA or phoneme representation	pronunciation supervision
consistent speaker	voice stability
normalized audio	training quality

The current dictionary audio project already contributes toward these resources.
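
One way to keep the resources in the table together is a single record per recording. The following Python sketch is an assumption about how such a record might look: the field names, the `validate` checks, the file path, and the placeholder phoneme list are hypothetical and not the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass
class Utterance:
    audio_path: str       # clean, normalized recording
    text: str             # orthographic transcript
    phonemes: List[str]   # IPA or phoneme representation
    speaker_id: str       # used to check speaker consistency
    sample_rate_hz: int   # recorded sample rate


def validate(utt: Utterance) -> List[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    if not utt.text.strip():
        problems.append("empty transcript")
    if not utt.phonemes:
        problems.append("missing phonemes")
    if utt.sample_rate_hz <= 0:
        problems.append("invalid sample rate")
    return problems


def to_jsonl(utterances: Iterable[Utterance]) -> str:
    """Serialize records as JSON Lines, keeping the script readable."""
    return "\n".join(json.dumps(asdict(u), ensure_ascii=False) for u in utterances)


# Placeholder record: the path and phoneme list are unverified examples.
sample = Utterance("audio/word_0001.wav", "কথা", ["k", "ɔ", "tʰ", "a"], "spk01", 22050)
print(validate(sample))   # []
print(to_jsonl([sample]))
```

Using `ensure_ascii=False` keeps the Bengali-script text human-readable in the output file instead of escaping it.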

5. Expanding from Words to Sentences

The current TTS system primarily synthesizes individual words. A future system should expand to sentence-level speech.

This requires:

word boundary handling
phrase-level prosody
stress and rhythm modeling
punctuation-sensitive intonation

Current focus (isolated words):

কথা
দিশা
অক্ষর

Future focus (full sentences):

আজি মি স্কুলে যিতউগা।
তি কথাহান হুন।
এরে ৱাহি এহানর অর্থহান কিহান?
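
A first step toward sentence-level synthesis is splitting a sentence into words and punctuation-driven pauses. The Python sketch below does only that. The pause durations are hypothetical placeholders, and fixed pauses are a crude stand-in for real prosody and intonation modeling.

```python
import re
from typing import List, Tuple, Union

# Hypothetical pause lengths in milliseconds; real values would need tuning.
PAUSE_MS = {"।": 400, "?": 400, "!": 400, ",": 200}

# Match either one punctuation mark or a run of non-space, non-punctuation text.
TOKEN_RE = re.compile(r"[।?!,]|[^\s।?!,]+")

Step = Tuple[str, Union[str, int]]


def plan_sentence(sentence: str) -> List[Step]:
    """Turn a sentence into ('word', text) and ('pause', milliseconds) steps."""
    plan: List[Step] = []
    for token in TOKEN_RE.findall(sentence):
        if token in PAUSE_MS:
            plan.append(("pause", PAUSE_MS[token]))
        else:
            plan.append(("word", token))
    return plan


for step in plan_sentence("তি কথাহান হুন।"):
    print(step)
```

Each `word` step could then be passed to the existing word-level synthesizer, with silence of the given length inserted for each `pause` step.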

6. Automatic Speech Recognition

Another future direction is automatic speech recognition (ASR), which converts speech into text.

If Bishnupriya Manipuri audio and transcription resources continue to grow, the following applications become possible:

voice search in the dictionary
speech-to-text tools
pronunciation feedback for language learners
oral archive transcription

ASR development would require:

larger audio corpora
carefully transcribed speech
speaker variation
sentence-level recordings

7. Language Learning Applications

One of the most promising future uses of the current work is language learning.

A speech-enabled Bishnupriya Manipuri dictionary can support:

pronunciation playback for each word
IPA and phoneme visualization
syllable segmentation
pronunciation comparison tools
learner speaking practice

Possible learner workflow:

Search word
   ↓
Read meaning
   ↓
Listen to pronunciation
   ↓
See IPA
   ↓
Repeat and compare
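
The final "Repeat and compare" step could, in principle, score a learner's attempt against the dictionary pronunciation. The Python sketch below uses a standard Levenshtein edit distance over phoneme sequences. It assumes the learner's speech has already been converted to phonemes by some recognizer, which does not yet exist for this language, and the phoneme lists shown are illustrative.

```python
from typing import Sequence


def phoneme_distance(reference: Sequence[str], attempt: Sequence[str]) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(attempt) + 1))
    for i, ref in enumerate(reference, start=1):
        current = [i]
        for j, att in enumerate(attempt, start=1):
            cost = 0 if ref == att else 1
            current.append(min(
                previous[j] + 1,         # delete a reference phoneme
                current[j - 1] + 1,      # insert an extra phoneme
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def similarity(reference: Sequence[str], attempt: Sequence[str]) -> float:
    """Score from 0.0 (nothing matches) to 1.0 (identical sequences)."""
    longest = max(len(reference), len(attempt))
    if longest == 0:
        return 1.0
    return 1 - phoneme_distance(reference, attempt) / longest


# Illustrative sequences: one vowel substituted in the learner's attempt.
print(phoneme_distance(["k", "ɔ", "t", "a"], ["k", "a", "t", "a"]))  # 1
print(similarity(["k", "ɔ", "t", "a"], ["k", "a", "t", "a"]))        # 0.75
```

Treating every substitution as equally costly is a simplification; a fuller tool might weight errors by phonetic similarity.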

8. Digital Preservation of Bishnupriya Manipuri

Speech technology is not only a technical goal. It is also a method of language preservation.

For an under-resourced language, a digital archive is itself a major act of preservation when it brings together:

dictionary words
recorded pronunciation
phonetic transcription
speech synthesis tools

It helps ensure that future generations can study and hear the language, even if spoken usage changes over time.

9. Building a Full Linguistic Platform

The current dictionary and TTS system could eventually become part of a much larger Bishnupriya Manipuri language platform.

Such a platform might include:

dictionary
TTS playback
IPA converter
morphological tools
sentence parser
speech recognition
educational content
audio archive

This would transform the project from a dictionary into a full digital language resource.

10. Research Questions for the Future

The current work opens several important research questions:

What is the most stable phoneme inventory for Bishnupriya Manipuri (BPM) TTS?
How should schwa deletion be modeled across lexical classes?
Which diphone inventory provides the best balance of quality and size?
How much dictionary audio is needed for neural TTS training?
Can sentence-level prosody be modeled with rule-based methods?
How can speaker variation be documented without harming consistency?

These questions can guide future publications and linguistic investigation.

11. A Possible Development Roadmap

A realistic future roadmap could look like this:

Phase 1: Stabilize the current system

freeze IPA rules
rebuild clean diphone inventory
validate dictionary TTS playback

Phase 2: Expand the audio resource

record more word audio
add sentence recordings
improve coverage of rare phonotactic patterns

Phase 3: Build a training corpus

align text with audio
normalize metadata
prepare machine-readable datasets
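
One small, concrete part of preparing machine-readable datasets is making sure that visually identical text is stored identically. The Python sketch below applies Unicode NFC normalization and collapses whitespace. It addresses only encoding-level differences, not genuine spelling variation between sources, and the function name is an assumption.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Apply Unicode NFC and collapse runs of whitespace to single spaces."""
    composed = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", composed).strip()


# The vowel sign "ো" can be stored as one code point (U+09CB) or as the
# sequence U+09C7 U+09BE; both render the same but compare as different.
one_code_point = "ক\u09cb"
two_code_points = "ক\u09c7\u09be"

print(one_code_point == two_code_points)                                  # False
print(normalize_text(one_code_point) == normalize_text(two_code_points))  # True
```

Normalizing transcripts before alignment avoids treating the same word as two different dictionary entries.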

Phase 4: Research advanced speech models

experiment with neural TTS
explore ASR
compare rule-based and neural pronunciation modeling

12. Challenges Ahead

Future work also faces several challenges:

limited amount of high-quality audio data
inconsistent orthography in real sources
speaker and dialect variation
lack of large annotated corpora
technical resource constraints

These are normal challenges for under-resourced language technology, and they do not prevent progress.

13. Why This Work Matters

The creation of speech technology for Bishnupriya Manipuri is important for:

language preservation
digital humanities
linguistic research
education
cultural continuity

A functioning IPA converter, diphone engine, and web-based TTS system already represent a major contribution.

Together, they make the language computationally documented and interactively accessible.

14. Conclusion

The future of Bishnupriya Manipuri speech technology extends beyond a dictionary or a simple diphone TTS engine. The work completed so far provides a base for:

neural speech synthesis
automatic speech recognition
language learning tools
digital preservation systems
computational linguistic research

The most important lesson is that advanced language technology grows from carefully built foundations. A stable rule-based system, a clean phoneme inventory, and a validated diphone database are the first steps toward a much larger future.

Series Conclusion

This ten-article series has documented the full progression:

1. Script to Speech Pipeline
2. Rule-Based IPA Conversion
3. Schwa Deletion Rules
4. Phoneme Inventory
5. Diphone Inventory Design
6. Recording and Normalization
7. Automatic Diphone Segmentation
8. PHP + JavaScript TTS Engine
9. Dictionary Integration
10. Future Directions

Together, these articles form a structured documentation framework for Bishnupriya Manipuri computational linguistics and speech technology.

Bishnupriya Manipuri Research Archive

Language, linguistics, dictionary, IPA, phonemes, diphones, and speech technology