Coverage checks, missing-file logic, rebuild validation, and deployment-ready testing
Workflow
This page documents the practical validator workflow used to test diphone coverage, detect missing audio files, verify safe filename consistency, and confirm that the TTS system is ready for clean deployment.
The validator exists to answer one practical question:
To answer that, the validator checks:
1. Word
2. IPA conversion
3. Phoneme sequence
4. Diphone sequence
5. Safe filename generation
6. Check diphone WAV files
7. Pass / Missing / Mismatch result
If any stage is inconsistent, the validator can expose the exact point of failure.
| Field | Purpose |
|---|---|
| Word | The source dictionary word being tested |
| IPA | The pronunciation form produced by the converter |
| Phonemes | The tokenized sound sequence |
| Diphones | The expected diphone transitions |
| Safe filenames | The expected filesystem names for each diphone |
| Missing files | Expected diphone files not found in the audio folder |
| Coverage | Percentage of expected diphones currently present |
| Status | Pass / Partial / Fail summary |
| Field | Value |
|---|---|
| Word | দিশা |
| IPA | diʃa |
| Phonemes | d i ʃ a |
| Diphones | #-d d-i i-ʃ ʃ-a a-# |
| Safe filenames | sil-d.wav d-i.wav i-sh.wav sh-a.wav a-sil.wav |
| Missing | i-sh.wav |
| Coverage | 4 / 5 = 80% |
| Status | Partial |
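The diphone and safe-filename steps of this example can be sketched in a few lines. This is a minimal illustration, not the validator's actual code: the function names and the `SAFE_NAMES` table are assumptions, and the table contains only the two mappings visible in the example (`#` to `sil`, `ʃ` to `sh`).

```python
# Hypothetical symbol-to-safe-token table; only the two mappings
# visible in the example above are included.
SAFE_NAMES = {"#": "sil", "ʃ": "sh"}


def to_diphones(phonemes):
    """Pad the phoneme list with '#' boundaries and pair adjacent units."""
    padded = ["#", *phonemes, "#"]
    return list(zip(padded, padded[1:]))


def safe_filename(left, right):
    """Build the expected WAV name for one diphone."""
    left_safe = SAFE_NAMES.get(left, left)
    right_safe = SAFE_NAMES.get(right, right)
    return f"{left_safe}-{right_safe}.wav"


phonemes = ["d", "i", "ʃ", "a"]
filenames = [safe_filename(left, right) for left, right in to_diphones(phonemes)]
print(filenames)
# ['sil-d.wav', 'd-i.wav', 'i-sh.wav', 'sh-a.wav', 'a-sil.wav']
```

Keeping the mapping in one shared table is what lets the validator and the file-generation step agree on names.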
- The expected WAV file does not exist in the diphone folder.
- The diphone exists conceptually, but the validator expects a different safe filename.
- A page is producing a different IPA form than the stable shared converter.
- The expected diphone sequence differs from the sequence used when the files were created.
- Old diphone files from a previous rule system remain in the folder and cause confusion.
- Initial or final boundary mapping differs between validation logic and file generation.
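The stale-file case listed above can be detected by comparing the folder contents against the set of filenames the current rules expect. The sketch below assumes a flat folder of `.wav` files; `find_leftovers` and the old-rule name `i-esh.wav` are hypothetical.

```python
from pathlib import Path
import tempfile


def find_leftovers(folder, expected_names):
    """Return WAV files in `folder` that no current rule expects."""
    present = {path.name for path in Path(folder).glob("*.wav")}
    return sorted(present - set(expected_names))


# Demonstration in a throwaway folder with one hypothetical old-rule file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["sil-d.wav", "d-i.wav", "i-esh.wav"]:
        (Path(tmp) / name).write_bytes(b"")
    expected = {"sil-d.wav", "d-i.wav", "i-sh.wav"}
    leftovers = find_leftovers(tmp, expected)

print(leftovers)  # ['i-esh.wav']
```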
A validator should not only detect missing files. It should identify them clearly and predictably.
This makes it easy to rebuild only the missing parts of the inventory.
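One way to make missing-file reports predictable is to report them per word and also as a single sorted, de-duplicated rebuild list. The sketch below is illustrative only: the function names are assumptions, and the second dictionary entry is a placeholder rather than a real word.

```python
def missing_by_word(expected_by_word, present):
    """Map each word to its expected filenames that are not present."""
    return {
        word: [name for name in names if name not in present]
        for word, names in expected_by_word.items()
    }


def rebuild_targets(missing):
    """Collapse per-word results into one sorted list without duplicates."""
    return sorted({name for names in missing.values() for name in names})


expected_by_word = {
    "দিশা": ["sil-d.wav", "d-i.wav", "i-sh.wav", "sh-a.wav", "a-sil.wav"],
    # Placeholder entry for illustration, not a real dictionary word.
    "placeholder-word": ["sil-a.wav", "a-sh.wav", "sh-a.wav", "a-sil.wav"],
}
present = {"sil-d.wav", "d-i.wav", "sh-a.wav", "a-sil.wav", "sil-a.wav"}

missing = missing_by_word(expected_by_word, present)
targets = rebuild_targets(missing)
print(targets)  # ['a-sh.wav', 'i-sh.wav']
```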
Coverage is a simple but powerful summary:
Coverage = existing diphone files / expected diphone files
Example:
Expected: 5, Found: 3, Coverage: 3 / 5 = 60%
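The coverage formula can be sketched as a small function. Two choices here are assumptions rather than documented behaviour: repeated diphones are counted once, and an empty expected set yields 0. The two files treated as missing are picked arbitrarily for illustration.

```python
def coverage_percent(expected, present):
    """Percentage of expected diphone files that are present."""
    expected = set(expected)  # assumption: count each diphone once
    if not expected:
        return 0.0  # assumption: nothing expected means nothing covered
    found = len(expected & set(present))
    return 100 * found / len(expected)


expected = ["sil-d.wav", "d-i.wav", "i-sh.wav", "sh-a.wav", "a-sil.wav"]
present = ["sil-d.wav", "d-i.wav", "a-sil.wav"]
print(coverage_percent(expected, present))  # 60.0
```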
This is especially useful when testing large batches of words or comparing progress across rebuild stages.
Single-word validation is best for debugging one word or one pronunciation problem at a time.
Batch validation is best for rebuild testing and inventory progress tracking.
1. Freeze IPA rules
2. Freeze safe filename rules
3. Back up old diphone folder
4. Create fresh diphone folder
5. Copy only current-generation files
6. Run validator on sample test list
7. Fix missing / mismatched items
8. Run validator again
9. Deploy when stable
This is much safer than mixing new files into an old folder with unknown leftovers.
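The backup, fresh-folder, and copy steps can be sketched with the standard library. This is a minimal sketch under assumptions: the folder names and `rebuild_folder` are hypothetical, and "current-generation" is taken to mean "named in the current expected set".

```python
from pathlib import Path
import shutil
import tempfile


def rebuild_folder(old_dir, backup_dir, fresh_dir, current_names):
    """Back up old_dir, then copy only current-generation files to fresh_dir.

    Returns the expected names that were not found in old_dir.
    """
    old_dir, fresh_dir = Path(old_dir), Path(fresh_dir)
    shutil.copytree(old_dir, backup_dir)  # raises if the backup already exists
    fresh_dir.mkdir(parents=True)         # raises if the fresh folder exists
    not_found = []
    for name in sorted(current_names):
        source = old_dir / name
        if source.is_file():
            shutil.copy2(source, fresh_dir / name)
        else:
            not_found.append(name)
    return not_found


# Demonstration in a throwaway directory with one hypothetical leftover.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old = root / "diphones"
    old.mkdir()
    for name in ["sil-d.wav", "d-i.wav", "old-rule.wav"]:
        (old / name).write_bytes(b"")
    still_missing = rebuild_folder(
        old, root / "diphones_backup", root / "diphones_fresh",
        {"sil-d.wav", "d-i.wav", "i-sh.wav"},
    )
    fresh_contents = sorted(p.name for p in (root / "diphones_fresh").iterdir())

print(fresh_contents)  # ['d-i.wav', 'sil-d.wav']
print(still_missing)   # ['i-sh.wav']
```

Both folder-creation calls fail if the target already exists, which guards against overwriting an earlier backup or mixing files into a non-empty folder.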
| Status | Meaning |
|---|---|
| Pass | All expected diphones exist and filenames match |
| Partial | Some expected files are present, but one or more are missing |
| Fail | Major mismatch in IPA, diphone logic, or file availability |
| Rule Mismatch | Current validation output disagrees with the stable shared converter |
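
The status table can be turned into a small classifier. The thresholds are assumptions: the table does not define how large a "major mismatch" is, so this sketch reports Fail only when no expected file is found, and it checks the IPA comparison first. The name `classify` and the mismatching IPA string `disa` are hypothetical.

```python
def classify(expected, present, page_ipa, shared_ipa):
    """Return one of the four status labels for a single word."""
    if page_ipa != shared_ipa:
        return "Rule Mismatch"
    expected = set(expected)
    found = expected & set(present)
    if expected and found == expected:
        return "Pass"
    if not found:
        return "Fail"  # assumption: Fail means no expected file was found
    return "Partial"


files = ["sil-d.wav", "d-i.wav", "i-sh.wav", "sh-a.wav", "a-sil.wav"]
print(classify(files, files, "diʃa", "diʃa"))       # Pass
print(classify(files, files[:2], "diʃa", "diʃa"))   # Partial
print(classify(files, [], "diʃa", "diʃa"))          # Fail
print(classify(files, files, "disa", "diʃa"))       # Rule Mismatch
```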
A good validator test list should not be random. It should be coverage-driven.
- Simple high-frequency words that test common CV and VC transitions.
- Words chosen specifically to test initial and final diphone boundaries.
- Words containing consonant clusters, learned forms, or diphone edge cases.
A small curated list of 50 words is often enough to catch most rebuild problems before wider deployment.
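One possible way to build a coverage-driven list is a greedy selection: repeatedly pick the candidate word that adds the most diphones not yet covered. This is a standard greedy set-cover heuristic offered as an illustration, not a method described by this page. The function name and the candidate keys `w1` to `w3` are placeholders.

```python
def pick_test_words(diphones_by_word, limit):
    """Greedily choose up to `limit` words that cover the most new diphones."""
    uncovered = set().union(*diphones_by_word.values())
    chosen = []
    while uncovered and len(chosen) < limit:
        remaining = [word for word in diphones_by_word if word not in chosen]
        if not remaining:
            break
        # Ties go to the word that appears first in the dictionary.
        best = max(remaining, key=lambda w: len(uncovered & set(diphones_by_word[w])))
        if not uncovered & set(diphones_by_word[best]):
            break
        chosen.append(best)
        uncovered -= set(diphones_by_word[best])
    return chosen


# Placeholder candidates keyed by hypothetical word IDs.
candidates = {
    "w1": ["#-d", "d-i", "i-ʃ", "ʃ-a", "a-#"],
    "w2": ["#-d", "d-i", "i-#"],
    "w3": ["#-a", "a-#"],
}
print(pick_test_words(candidates, limit=2))  # ['w1', 'w2']
```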
A spreadsheet-based validator tracker can be extremely useful during rebuild work.
| Field | Use |
|---|---|
| Word | Source item being tested |
| IPA | Expected pronunciation output |
| Diphone count | How many files are expected |
| Found count | How many files are present |
| Coverage % | Quick progress summary |
| Missing filenames | Rebuild targets |
| Status | Pass / Partial / Fail |
| Notes | Special mismatch or rule problems |
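
A tracker with these columns can be produced as CSV and opened in any spreadsheet tool. The sketch below writes to an in-memory buffer. The `tracker_row` helper and its status rule (Fail only when nothing is found) are assumptions, and the row reuses the দিশা example from this page.

```python
import csv
import io

COLUMNS = ["Word", "IPA", "Diphone count", "Found count",
           "Coverage %", "Missing filenames", "Status", "Notes"]


def tracker_row(word, ipa, expected, present, notes=""):
    """Build one tracker row from expected and present filenames."""
    missing = [name for name in expected if name not in present]
    found = len(expected) - len(missing)
    if expected and not missing:
        status = "Pass"
    elif found == 0:
        status = "Fail"  # assumption: Fail means no expected file was found
    else:
        status = "Partial"
    return {
        "Word": word,
        "IPA": ipa,
        "Diphone count": len(expected),
        "Found count": found,
        "Coverage %": round(100 * found / len(expected)) if expected else 0,
        "Missing filenames": " ".join(missing),
        "Status": status,
        "Notes": notes,
    }


expected = ["sil-d.wav", "d-i.wav", "i-sh.wav", "sh-a.wav", "a-sil.wav"]
present = {"sil-d.wav", "d-i.wav", "sh-a.wav", "a-sil.wav"}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow(tracker_row("দিশা", "diʃa", expected, present))
lines = buffer.getvalue().splitlines()
print(lines[1])  # দিশা,diʃa,5,4,80,i-sh.wav,Partial,
```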