Diphone Inventory

Core transition units, safe filenames, coverage strategy, and practical TTS resources

This page documents the practical diphone inventory used in the Bishnupriya Manipuri TTS workflow, including diphone structure, safe filename mapping, inventory layers, recording priorities, and validator-facing organization.

About this dataset page. The article series explains the theory of diphone-based synthesis. This page is the working resource layer: a practical reference for organizing, recording, naming, validating, and expanding the diphone inventory.

1. What Is a Diphone?

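A diphone is a unit of recorded speech that spans the transition between two adjacent phonemes, conventionally cut from the middle of the first phoneme to the middle of the second. In a diphone-based TTS system, a word is synthesized by concatenating the audio files for its diphones in order.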

Word
  ↓
IPA
  ↓
Phoneme sequence
  ↓
Diphone sequence
  ↓
Safe filenames
  ↓
Audio playback
  
Example (# marks a word boundary, stored as silence):
Word: কথা
IPA: kɔtʰa
Phonemes: k ɔ tʰ a
Diphones:
#-k
k-ɔ
ɔ-tʰ
tʰ-a
a-#
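
The phoneme-to-diphone step of the pipeline above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `phonemes_to_diphones` and the `BOUNDARY` constant are hypothetical, and the sketch assumes the word has already been converted to a list of phonemes.

```python
BOUNDARY = "#"  # word-boundary (silence) marker, as used in the example above


def phonemes_to_diphones(phonemes):
    """Return the diphone sequence for one word, padded with boundaries."""
    if not phonemes:
        return []
    padded = [BOUNDARY, *phonemes, BOUNDARY]
    # Pair each symbol with its right-hand neighbour.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


print(phonemes_to_diphones(["k", "ɔ", "tʰ", "a"]))
# ['#-k', 'k-ɔ', 'ɔ-tʰ', 'tʰ-a', 'a-#']
```

A word of n phonemes always yields n + 1 diphones, because the two boundary markers add one transition at each end.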
    

2. Why the Inventory Matters

Coverage

The diphone inventory determines which words can be synthesized successfully.

Consistency

A stable inventory ensures that dictionary pages, validators, and TTS playback all expect the same filenames and transitions.

Efficiency

A well-designed inventory avoids recording unnecessary combinations while still covering the language effectively.
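
One way to make the coverage idea concrete is to compare the diphones each word needs against the set that has been recorded. The sketch below is illustrative only: `missing_diphones`, `coverage`, and the shape of the `lexicon` dictionary (word mapped to its diphone list) are assumptions, not part of the documented workflow.

```python
def missing_diphones(word_diphones, inventory):
    """Return the diphones a word needs that the inventory lacks, in order."""
    return [d for d in word_diphones if d not in inventory]


def coverage(lexicon, inventory):
    """Fraction of words whose every diphone is present in the inventory."""
    if not lexicon:
        return 0.0
    covered = sum(
        1 for diphones in lexicon.values()
        if not missing_diphones(diphones, inventory)
    )
    return covered / len(lexicon)


# Hypothetical data: only the diphones of the first word are recorded.
inventory = {"#-k", "k-ɔ", "ɔ-tʰ", "tʰ-a", "a-#"}
lexicon = {
    "কথা": ["#-k", "k-ɔ", "ɔ-tʰ", "tʰ-a", "a-#"],
    "দিশা": ["#-d", "d-i", "i-ʃ", "ʃ-a", "a-#"],
}
print(missing_diphones(lexicon["দিশা"], inventory))  # ['#-d', 'd-i', 'i-ʃ', 'ʃ-a']
print(coverage(lexicon, inventory))                  # 0.5
```

A single missing diphone makes the whole word unsynthesizable, which is why the sketch counts a word as covered only when nothing is missing.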

3. Structural Types of Diphones

Type                    Example  Purpose
Boundary → phoneme      #-k      Word-initial entry into speech
Phoneme → boundary      a-#      Word-final exit from speech
Consonant → vowel       k-a      Very common transition type
Vowel → consonant       a-r      Common syllable-closing transition
Consonant → consonant   g-n      Needed for clusters and learned forms
Vowel → vowel           a-i      Needed in vowel sequences and transitions
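
The six structural types can be assigned mechanically once the vowel set is known. The following is a rough sketch under a loud assumption: the `VOWELS` set is a placeholder chosen for illustration and is not the project's confirmed phoneme inventory. The function name `diphone_type` is likewise hypothetical.

```python
# ASSUMPTION: placeholder vowel set for illustration only; the real set
# should come from the project's frozen IPA rules.
VOWELS = {"a", "i", "u", "e", "o", "ɔ", "ə", "aː", "iː", "uː"}
BOUNDARY = "#"


def diphone_type(diphone):
    """Classify a diphone such as 'k-a' into one of the six structural types."""
    left, right = diphone.split("-")
    if left == BOUNDARY:
        return "boundary-phoneme"
    if right == BOUNDARY:
        return "phoneme-boundary"
    kind = {True: "vowel", False: "consonant"}
    return f"{kind[left in VOWELS]}-{kind[right in VOWELS]}"


for d in ["#-k", "a-#", "k-a", "a-r", "g-n", "a-i"]:
    print(d, diphone_type(d))
```

Grouping the inventory by type in this way makes it easier to see whether a layer is missing a whole class of transitions.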

4. Safe Filename Mapping

Raw IPA symbols are awkward in filenames and URLs: non-ASCII characters must be percent-encoded in URLs and can be normalized differently across filesystems. Each diphone is therefore converted to a stable, ASCII-only safe filename.

IPA Symbol  Safe Form  Example
#           sil        #-d → sil-d.wav
aː          aa         ʃ-aː → sh-aa.wav
iː          ii         d-iː → d-ii.wav
uː          uu         k-uː → k-uu.wav
ʃ           sh         i-ʃ → i-sh.wav
ŋ           ng         a-ŋ → a-ng.wav
ɔ           aw         k-ɔ → k-aw.wav
ə           schwa      k-ə → k-schwa.wav

Safe filename rules should be frozen before rebuilding the diphone folder. If the naming rules change mid-project, old files become incompatible with new TTS expectations.
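
The mapping can be expressed as a small lookup table. The sketch below copies the eight rows listed above and adds `tʰ → th` and `tʃ → ch`, which appear only in the example inventory tables of sections 6 and 7. The names `SAFE_FORMS`, `safe_symbol`, and `safe_filename` are hypothetical, and the pass-through rule for plain ASCII phonemes is an assumption inferred from examples such as `sil-d.wav`.

```python
# Rows copied from the mapping table; "tʰ" and "tʃ" are taken from the
# example inventory tables rather than from the mapping table itself.
SAFE_FORMS = {
    "#": "sil",
    "aː": "aa",
    "iː": "ii",
    "uː": "uu",
    "ʃ": "sh",
    "ŋ": "ng",
    "ɔ": "aw",
    "ə": "schwa",
    "tʰ": "th",
    "tʃ": "ch",
}


def safe_symbol(symbol):
    """Map one phoneme (or '#') to its filesystem-safe form."""
    if symbol in SAFE_FORMS:
        return SAFE_FORMS[symbol]
    # ASSUMPTION: plain lowercase ASCII phonemes such as "k" pass through.
    if symbol.isascii() and symbol.isalpha() and symbol.islower():
        return symbol
    raise ValueError(f"no safe form defined for {symbol!r}")


def safe_filename(diphone):
    """Convert a diphone such as 'k-ɔ' to a filename such as 'k-aw.wav'."""
    left, right = diphone.split("-")
    return f"{safe_symbol(left)}-{safe_symbol(right)}.wav"


print(safe_filename("#-d"))   # sil-d.wav
print(safe_filename("ʃ-aː"))  # sh-aa.wav
```

Raising an error for unmapped symbols, rather than guessing, keeps a rebuild from silently producing filenames that the validator and the TTS player do not expect.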

5. Inventory Layers

Core Inventory

The smallest practical set for broad TTS usefulness.

  • word-initial boundary diphones
  • word-final boundary diphones
  • common consonant-vowel transitions
  • common vowel-consonant transitions

Target size: about 180–220 diphones

Extended Inventory

Additional transitions for broader lexical coverage.

  • rare clusters
  • learned Sanskritic forms
  • less frequent consonant transitions
  • special vowel transitions

Target size: about 250–320 diphones

6. Example Core Inventory Table

Diphone  Safe Filename  Priority  Example Word
#-k      sil-k.wav      Core      কর
k-ɔ      k-aw.wav       Core      কথা
ɔ-tʰ     aw-th.wav      Core      কথা
tʰ-a     th-a.wav       Core      কথা
a-#      a-sil.wav      Core      কথা
#-d      sil-d.wav      Core      দিশা
d-i      d-i.wav        Core      দিশা
i-ʃ      i-sh.wav       Core      দিশা
ʃ-a      sh-a.wav       Core      দিশা
a-r      a-r.wav        Core      উপকার

7. Example Extended Inventory Table

Diphone  Safe Filename  Priority  Example Word
g-n      g-n.wav        Extended  অগ্নি
k-ʃ      k-sh.wav       Extended  অক্ষর
ʃ-ɔ      sh-aw.wav      Extended  অক্ষর
i-tʃ     i-ch.wav       Extended  ইচ্ছা
tʃ-i     ch-i.wav       Extended  ইচ্ছা
ɔ-r      aw-r.wav       Extended  অক্ষর

8. Recording Priority Strategy

Not all diphones should be recorded at the same time. A practical recording plan should start with the most useful transitions.

Priority 1

High-frequency boundary, consonant-vowel, and vowel-consonant diphones.

Priority 2

Common lexical cluster transitions and high-value extended diphones.

Priority 3

Rare learned forms, less frequent clusters, and low-coverage diphones.
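
One possible way to decide which transitions are "most useful" is to count how many dictionary words need each diphone and record the most widely used ones first. This is only a sketch of that idea: the page does not specify how frequency is measured, and `rank_diphones` and the `lexicon` structure are hypothetical.

```python
from collections import Counter


def rank_diphones(lexicon):
    """Return (diphone, word_count) pairs, most widely used first."""
    counts = Counter()
    for diphones in lexicon.values():
        # De-duplicate (keeping order) so a diphone is counted once per
        # word, no matter how often it repeats inside that word.
        counts.update(list(dict.fromkeys(diphones)))
    return counts.most_common()


# Hypothetical two-word lexicon built from the example tables.
lexicon = {
    "কথা": ["#-k", "k-ɔ", "ɔ-tʰ", "tʰ-a", "a-#"],
    "দিশা": ["#-d", "d-i", "i-ʃ", "ʃ-a", "a-#"],
}
ranked = rank_diphones(lexicon)
print(ranked[0])  # ('a-#', 2)
```

The ranked list can then be cut into Priority 1, 2, and 3 bands at whatever thresholds suit the recording schedule.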

9. Validator-Facing Organization

The inventory should exist as structured data, not only as a concept, so that every diphone can be tracked through recording, segmentation, upload, and validation.

Useful validator-facing fields:
  • diphone
  • safe filename
  • priority level
  • recorded status
  • segmented status
  • uploaded status
  • validator pass/fail
  • example source word
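
The fields listed above map naturally onto one record per diphone. The sketch below shows one possible layout and a CSV export suitable for a spreadsheet. The class name `DiphoneRecord`, the field names, and the choice of `None` for "not yet validated" are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class DiphoneRecord:
    diphone: str
    safe_filename: str
    priority: str                 # e.g. "Core" or "Extended"
    example_word: str
    recorded: bool = False
    segmented: bool = False
    uploaded: bool = False
    validator_pass: Optional[bool] = None  # None = not yet validated


def to_csv(records):
    """Serialize records to CSV text with one column per field."""
    buffer = io.StringIO()
    names = [f.name for f in fields(DiphoneRecord)]
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


rows = [DiphoneRecord("k-ɔ", "k-aw.wav", "Core", "কথা", recorded=True)]
print(to_csv(rows))
```

Keeping the status flags separate makes it possible to see at a glance where a diphone is stuck, for example recorded but not yet segmented.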

10. Rebuild Principles

Freeze IPA Rules

Rebuild only after pronunciation logic is stable.

Freeze Safe Filenames

Do not change the safe-filename mapping in the middle of a rebuild.

Use a Clean Folder

Rebuild in a fresh diphone directory to avoid mixing old and new files.
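
The clean-folder principle can be enforced in code rather than by convention. The following sketch refuses to reuse a non-empty directory and reports which expected files are still missing. The function names and the example directory name `diphones_v2` are hypothetical.

```python
from pathlib import Path


def prepare_clean_dir(path):
    """Create the rebuild directory, refusing to reuse a non-empty one."""
    target = Path(path)
    if target.exists() and any(target.iterdir()):
        raise FileExistsError(f"{target} is not empty; choose a fresh directory")
    target.mkdir(parents=True, exist_ok=True)
    return target


def missing_files(directory, expected_filenames):
    """List expected filenames that are absent from the directory."""
    present = {p.name for p in Path(directory).iterdir()}
    return sorted(name for name in expected_filenames if name not in present)
```

A typical use would be `prepare_clean_dir("diphones_v2")` before recording starts, followed by `missing_files("diphones_v2", expected)` after each upload batch, where `expected` is the list of safe filenames in the frozen inventory.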

11. Related Archive Pages

Inventory note. This page should gradually evolve from a conceptual inventory page into a live working dataset page, including actual downloadable tables, coverage trackers, and validator-facing spreadsheets.