Future Directions: Neural TTS and Advanced Speech Technology for Bishnupriya Manipuri

Abstract. A rule-based IPA converter, phoneme extractor, diphone inventory, and dictionary-integrated TTS system together form the first foundation for Bishnupriya Manipuri speech technology. This work also opens the way toward more advanced systems such as neural TTS, automatic speech recognition, pronunciation learning tools, and digital language preservation platforms. This article explores these future directions of speech technology research for Bishnupriya Manipuri.

1. Introduction

The current Bishnupriya Manipuri speech system is built on a rule-based pipeline:

Script → IPA → Phoneme → Diphone → Speech

This architecture is highly valuable because it provides:

Once such a foundation exists, more advanced speech technologies become possible.
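
As an illustration of the Phoneme → Diphone step in the pipeline above, the following is a minimal sketch, not the project's actual implementation. The boundary symbol `_`, the `a-b` unit naming, and the sample phonemes are assumptions made for this example only.

```python
# Hypothetical boundary (silence) symbol; the real inventory may use another.
SILENCE = "_"


def phonemes_to_diphones(phonemes):
    """Turn a phoneme sequence into the diphone units that span it.

    Each diphone covers the transition from one phoneme to the next,
    with a silence symbol padded at the start and end of the word.
    """
    if not phonemes:
        return []
    padded = [SILENCE, *phonemes, SILENCE]
    # Pair every symbol with its right-hand neighbour.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Illustrative phonemes only, not a claim about the actual inventory.
print(phonemes_to_diphones(["k", "a"]))  # ['_-k', 'k-a', 'a-_']
```

A word of n phonemes yields n + 1 diphones under this padding scheme, which is why a complete diphone database must also cover transitions to and from silence.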

2. From Diphone TTS to Neural TTS

A diphone TTS system is an efficient and practical solution for a low-resource language, but it has some natural limitations:

Neural TTS systems can overcome many of these limitations.

What neural TTS adds

However, neural TTS usually requires much more training data than a diphone-based approach: typically hours of transcribed speech, rather than a single clean recording of each diphone.

3. Why the Current System Is Still Important

Even if the long-term goal is neural TTS, the current diphone-based system remains essential.

It provides:

These are exactly the kinds of resources needed later for neural training.

In other words, the current rule-based and diphone-based system is not a dead end. It is the training and documentation foundation for future neural systems.

4. Data Requirements for Neural TTS

A future Bishnupriya Manipuri neural TTS system would need:

Resource | Purpose
clean speech recordings | training acoustic model
text transcripts | text-audio alignment
IPA or phoneme representation | pronunciation supervision
speaker consistency | voice stability
normalized audio | training quality

The current dictionary audio project already contributes toward these resources.
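
The resource requirements above can be checked mechanically before any training begins. The following is a rough sketch under stated assumptions: the `Utterance` fields, the `check_corpus` helper, and the target peak level of 0.9 are all hypothetical and do not describe the project's actual data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    """One training item. All field names are hypothetical."""
    audio_path: str
    transcript: str
    phonemes: List[str]
    speaker_id: str
    peak_amplitude: float  # measured peak level on a 0.0 to 1.0 scale


def check_corpus(utterances, target_peak=0.9, tolerance=0.05):
    """Return a list of human-readable problems found in the corpus."""
    problems = []
    # Speaker consistency: a single-voice corpus should have one speaker.
    speakers = {u.speaker_id for u in utterances}
    if len(speakers) > 1:
        problems.append(f"multiple speakers: {sorted(speakers)}")
    for u in utterances:
        # Text transcript is needed for text-audio alignment.
        if not u.transcript.strip():
            problems.append(f"{u.audio_path}: missing transcript")
        # Phoneme sequence is needed for pronunciation supervision.
        if not u.phonemes:
            problems.append(f"{u.audio_path}: missing phoneme sequence")
        # Normalized audio: peak should sit close to the target level.
        if abs(u.peak_amplitude - target_peak) > tolerance:
            problems.append(f"{u.audio_path}: peak level not normalized")
    return problems
```

An empty result means every item passed the checks this sketch happens to perform. It says nothing about recording quality itself, which still needs listening tests.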

5. Expanding from Words to Sentences

The current TTS system primarily synthesizes individual words. A future system should expand to sentence-level speech.

This requires:

Current focus:
কথা
দিশা
অক্ষর

Future focus:
আজি মি স্কুলে যিতউগা।
তি কথাহান হুন।
এরে ৱাহি এহানর অর্থহান কিহান?
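
One early step toward sentence-level synthesis is splitting running text into sentences and words, so that the existing word-level engine can be reused. The sketch below is a minimal illustration using the example sentences above. The function names and the punctuation sets are assumptions, and the sketch does not address prosody, pauses, or cross-word diphones.

```python
# Characters assumed to end a sentence (danda, question mark, exclamation mark).
SENTENCE_END = "।?!"
# Punctuation stripped from the edges of each word.
PUNCTUATION = "।?!,;:"


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation mark."""
    sentences, current = [], []
    for ch in text:
        current.append(ch)
        if ch in SENTENCE_END:
            sentence = "".join(current).strip()
            if sentence:
                sentences.append(sentence)
            current = []
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences


def split_words(sentence):
    """Split on whitespace and strip punctuation from each word."""
    words = (token.strip(PUNCTUATION) for token in sentence.split())
    return [word for word in words if word]


text = "আজি মি স্কুলে যিতউগা। তি কথাহান হুন। এরে ৱাহি এহানর অর্থহান কিহান?"
for sentence in split_sentences(text):
    print(split_words(sentence))
```

The sketch splits on whitespace rather than using the regular-expression class `\w`, because `\w` in Python's `re` module does not match Bengali vowel signs and would break words apart.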

6. Automatic Speech Recognition

Another future direction is automatic speech recognition (ASR), which converts speech into text.

If Bishnupriya Manipuri audio and transcription resources continue to grow, the following applications become possible:

ASR development would require:

7. Language Learning Applications

One of the most promising future uses of the current work is language learning.

A speech-enabled Bishnupriya Manipuri dictionary can support:

Possible learner workflow:
Search word
   ↓
Read meaning
   ↓
Listen to pronunciation
   ↓
See IPA
   ↓
Repeat and compare
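
The learner workflow above maps naturally onto a single dictionary record. The following is only a sketch of one possible data shape: the `DictionaryEntry` fields, the helper names, and the placeholder values are hypothetical and are not taken from the actual dictionary.

```python
from dataclasses import dataclass


@dataclass
class DictionaryEntry:
    """One dictionary record. All field names are hypothetical."""
    headword: str
    meaning: str
    ipa: str
    audio_url: str


def search(entries, word):
    """Step 1: look up a headword; returns None if it is not found."""
    return entries.get(word)


def learner_steps(entry):
    """Steps 2 to 5 of the workflow, paired with the data each one needs."""
    return [
        ("Read meaning", entry.meaning),
        ("Listen to pronunciation", entry.audio_url),
        ("See IPA", entry.ipa),
        ("Repeat and compare", entry.audio_url),
    ]


# Placeholder values only; real meanings and IPA come from the dictionary.
entries = {
    "কথা": DictionaryEntry("কথা", "<meaning>", "<ipa>", "audio/example.wav"),
}
entry = search(entries, "কথা")
if entry is not None:
    for step, data in learner_steps(entry):
        print(step, "->", data)
```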

8. Digital Preservation of Bishnupriya Manipuri

Speech technology is not only a technical goal. It is also a method of language preservation.

For an under-resourced language, a digital archive of:

is itself a major act of preservation.

It helps ensure that future generations can study and hear the language, even if spoken usage changes over time.

9. Building a Full Linguistic Platform

The current dictionary and TTS system could eventually become part of a much larger Bishnupriya Manipuri language platform.

Such a platform might include:

This would transform the project from a dictionary into a full digital language resource.

10. Research Questions for the Future

The current work opens several important research questions:

These questions can guide future publications and linguistic investigation.

11. A Possible Development Roadmap

A realistic future roadmap could look like this:

Phase 1: Stabilize the current system

Phase 2: Expand the audio resource

Phase 3: Build a training corpus

Phase 4: Research advanced speech models

12. Challenges Ahead

Future work also faces several challenges:

These are normal challenges for under-resourced language technology, and they do not prevent progress.

13. Why This Work Matters

The creation of speech technology for Bishnupriya Manipuri is important for:

A functioning IPA converter, diphone engine, and web-based TTS system already represent a major contribution.

They make the language computationally documented and interactively accessible.

14. Conclusion

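The future of Bishnupriya Manipuri speech technology extends beyond a dictionary or a simple diphone TTS engine. The work completed so far provides a base for neural TTS, sentence-level synthesis, automatic speech recognition, language learning tools, and digital preservation.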

The most important lesson is that advanced language technology grows from carefully built foundations. A stable rule-based system, a clean phoneme inventory, and a validated diphone database are the first steps toward a much larger future.

Series Conclusion

This ten-article series has documented the full progression:

1. Script to Speech Pipeline
2. Rule-Based IPA Conversion
3. Schwa Deletion Rules
4. Phoneme Inventory
5. Diphone Inventory Design
6. Recording and Normalization
7. Automatic Diphone Segmentation
8. PHP + JavaScript TTS Engine
9. Dictionary Integration
10. Future Directions

Together, these articles form a structured documentation framework for Bishnupriya Manipuri computational linguistics and speech technology.