Core transition units, safe filenames, coverage strategy, and practical TTS resources
This page documents the practical diphone inventory used in the Bishnupriya Manipuri TTS workflow, including diphone structure, safe filename mapping, inventory layers, recording priorities, and validator-facing organization.
A diphone is the transition between two adjacent phonemes. In a diphone-based TTS system, words are synthesized by joining the audio files corresponding to these transitions.
Word → IPA → Phoneme sequence → Diphone sequence → Safe filenames → Audio playback
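For example, the word কথা (phonemes k, ɔ, tʰ, a) yields the diphone sequence #-k, k-ɔ, ɔ-tʰ, tʰ-a, a-#, where # marks the word boundary.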
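The phoneme-to-diphone step can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual code: the function name `to_diphones` is hypothetical, and the sketch assumes the phoneme sequence has already been segmented so that multi-character symbols such as tʰ are single list items.

```python
BOUNDARY = "#"  # word-boundary symbol used in the diphone notation


def to_diphones(phonemes):
    """Return the diphone sequence for one word.

    `phonemes` is a list of already-segmented phoneme symbols.
    The word is padded with a boundary on each side, and every
    adjacent pair becomes one "left-right" diphone.
    """
    if not phonemes:
        return []
    padded = [BOUNDARY, *phonemes, BOUNDARY]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


print(to_diphones(["k", "ɔ", "tʰ", "a"]))
# ['#-k', 'k-ɔ', 'ɔ-tʰ', 'tʰ-a', 'a-#']
```

A word with n phonemes always produces n + 1 diphones, because the two boundary transitions are counted as well.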
The diphone inventory determines which words can be synthesized successfully.
A stable inventory ensures that dictionary pages, validators, and TTS playback all expect the same filenames and transitions.
A well-designed inventory avoids recording unnecessary combinations while still covering the language effectively.
| Type | Example | Purpose |
|---|---|---|
| Boundary → phoneme | #-k | Word-initial entry into speech |
| Phoneme → boundary | a-# | Word-final exit from speech |
| Consonant → vowel | k-a | Very common transition type |
| Vowel → consonant | a-r | Common syllable-closing transition |
| Consonant → consonant | g-n | Needed for clusters and learned forms |
| Vowel → vowel | a-i | Needed in vowel sequences and transitions |
Raw IPA symbols are awkward in filesystem paths and URLs, so each diphone is converted to a stable, safe filename form.
| IPA Symbol | Safe Form | Example |
|---|---|---|
| # | sil | #-d → sil-d.wav |
| aː | aa | ʃ-aː → sh-aa.wav |
| iː | ii | d-iː → d-ii.wav |
| uː | uu | k-uː → k-uu.wav |
| ʃ | sh | i-ʃ → i-sh.wav |
| ŋ | ng | a-ŋ → a-ng.wav |
| ɔ | aw | k-ɔ → k-aw.wav |
| ə | schwa | k-ə → k-schwa.wav |
| tʰ | th | ɔ-tʰ → aw-th.wav |
| tʃ | ch | i-tʃ → i-ch.wav |
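The mapping can be expressed as a small lookup table plus a conversion function. The sketch below is illustrative only: `SAFE_FORMS` and `safe_filename` are hypothetical names, the `th` and `ch` entries are read off the filenames that appear in the inventory rows (aw-th.wav, i-ch.wav), and passing unmapped symbols through unchanged is an assumption that matches filenames such as d-i.wav and a-r.wav.

```python
# IPA symbol -> safe form. Symbols not listed here (k, d, a, i, r, ...)
# are assumed to be usable as-is.
SAFE_FORMS = {
    "#": "sil",
    "aː": "aa",
    "iː": "ii",
    "uː": "uu",
    "ʃ": "sh",
    "ŋ": "ng",
    "ɔ": "aw",
    "ə": "schwa",
    "tʰ": "th",  # inferred from aw-th.wav in the inventory rows
    "tʃ": "ch",  # inferred from i-ch.wav in the inventory rows
}


def safe_filename(diphone):
    """Convert a diphone such as 'ɔ-tʰ' to a filename such as 'aw-th.wav'."""
    left, right = diphone.split("-")
    name = f"{SAFE_FORMS.get(left, left)}-{SAFE_FORMS.get(right, right)}.wav"
    if not name.isascii():
        # An unmapped IPA symbol slipped through; fail loudly rather than
        # write a filename that validators and URL loading will not expect.
        raise ValueError(f"no safe form for a symbol in {diphone!r}")
    return name


print(safe_filename("#-d"))   # sil-d.wav
print(safe_filename("ʃ-aː"))  # sh-aa.wav
print(safe_filename("ɔ-tʰ"))  # aw-th.wav
```

Raising on any leftover non-ASCII character is a design choice for this sketch: it surfaces gaps in the mapping early instead of producing filenames that differ between tools.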
Core layer: the smallest practical set for broad TTS usefulness.
Target size: about 180–220 diphones
Extended layer: additional transitions for broader lexical coverage.
Target size: about 250–320 diphones
| Diphone | Safe Filename | Priority | Example Word |
|---|---|---|---|
| #-k | sil-k.wav | Core | কর |
| k-ɔ | k-aw.wav | Core | কথা |
| ɔ-tʰ | aw-th.wav | Core | কথা |
| tʰ-a | th-a.wav | Core | কথা |
| a-# | a-sil.wav | Core | কথা |
| #-d | sil-d.wav | Core | দিশা |
| d-i | d-i.wav | Core | দিশা |
| i-ʃ | i-sh.wav | Core | দিশা |
| ʃ-a | sh-a.wav | Core | দিশা |
| a-r | a-r.wav | Core | উপকার |
| Diphone | Safe Filename | Priority | Example Word |
|---|---|---|---|
| g-n | g-n.wav | Extended | অগ্নি |
| k-ʃ | k-sh.wav | Extended | অক্ষর |
| ʃ-ɔ | sh-aw.wav | Extended | অক্ষর |
| i-tʃ | i-ch.wav | Extended | ইচ্ছা |
| tʃ-i | ch-i.wav | Extended | ইচ্ছা |
| ɔ-r | aw-r.wav | Extended | অক্ষর |
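Because a word can only be synthesized when every diphone it needs has been recorded, a coverage check reduces to a set lookup. The sketch below uses only the sample rows from the two tables above, not the full inventory; `INVENTORY` and `missing_diphones` are hypothetical names, and the phoneme sequence given for অক্ষর is an assumption read off its table rows.

```python
# Sample rows from the Core and Extended tables (not the full inventory).
INVENTORY = {
    "#-k", "k-ɔ", "ɔ-tʰ", "tʰ-a", "a-#",
    "#-d", "d-i", "i-ʃ", "ʃ-a", "a-r",
    "g-n", "k-ʃ", "ʃ-ɔ", "i-tʃ", "tʃ-i", "ɔ-r",
}


def to_diphones(phonemes):
    """Pad the word with boundaries and pair up adjacent symbols."""
    padded = ["#", *phonemes, "#"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def missing_diphones(phonemes, inventory=INVENTORY):
    """Return the diphones a word needs that the inventory lacks, in order."""
    return [d for d in to_diphones(phonemes) if d not in inventory]


# কথা is fully covered by the sample rows.
print(missing_diphones(["k", "ɔ", "tʰ", "a"]))
# []

# অক্ষর (assumed phonemes: ɔ k ʃ ɔ r) still needs three transitions.
print(missing_diphones(["ɔ", "k", "ʃ", "ɔ", "r"]))
# ['#-ɔ', 'ɔ-k', 'r-#']
```

An empty result means the word can be synthesized from the recorded files; a non-empty result lists exactly which transitions still have to be recorded.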
Not all diphones need to be recorded at once. A practical recording plan starts with the most useful transitions and proceeds in order of priority:
First: high-frequency boundary, consonant-vowel, and vowel-consonant diphones.
Second: common lexical cluster transitions and high-value extended diphones.
Third: rare learned forms, less frequent clusters, and low-coverage diphones.
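One way to make "most useful" concrete is to count how often each diphone occurs across a word list and record the most frequent ones first. This is a sketch under that assumption, not the project's stated method: `rank_diphones` and `LEXICON` are hypothetical, the three-word lexicon is only a toy sample, and the phonemes given for কর are assumed.

```python
from collections import Counter


def to_diphones(phonemes):
    """Pad the word with boundaries and pair up adjacent symbols."""
    padded = ["#", *phonemes, "#"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def rank_diphones(lexicon):
    """Return (diphone, count) pairs, most frequent first.

    Ties are broken alphabetically so the order is reproducible.
    """
    counts = Counter(
        diphone
        for phonemes in lexicon.values()
        for diphone in to_diphones(phonemes)
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy lexicon: word -> segmented phonemes.
LEXICON = {
    "কথা": ["k", "ɔ", "tʰ", "a"],
    "দিশা": ["d", "i", "ʃ", "a"],
    "কর": ["k", "ɔ", "r"],  # assumed transcription
}

for diphone, count in rank_diphones(LEXICON)[:3]:
    print(diphone, count)
# #-k 2
# a-# 2
# k-ɔ 2
```

With a realistic word list, the head of this ranking would be a candidate for the first recording pass and the long tail for the last.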
The inventory should be more than a concept: it should be organized so that validators can check it in practice.
Rebuild only after pronunciation logic is stable.
Do not rename mapping conventions in the middle of the rebuild.
Rebuild in a fresh diphone directory to avoid mixing old and new files.
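A validator for a rebuilt directory can be as simple as comparing the files on disk with the expected safe filenames. The sketch below is an assumption about how such a check might look, not the project's validator; `validate_directory` is a hypothetical name, and it only inspects filenames, not audio content.

```python
from pathlib import Path


def validate_directory(directory, expected_filenames):
    """Compare the .wav files in `directory` with the expected safe filenames.

    Returns (missing, unexpected):
      missing    - expected files that are not in the directory
      unexpected - files in the directory that the inventory does not list,
                   for example leftovers from an older naming convention
    """
    present = {path.name for path in Path(directory).glob("*.wav")}
    expected = set(expected_filenames)
    missing = sorted(expected - present)
    unexpected = sorted(present - expected)
    return missing, unexpected
```

Calling `validate_directory("diphones_new", ["sil-k.wav", "k-aw.wav"])` (the directory name is illustrative) returns two sorted lists. In a fresh rebuild directory the `unexpected` list should stay empty, which is one way to confirm that old and new files have not been mixed.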