Validator Workflow

Coverage checks, missing-file logic, rebuild validation, and deployment-ready testing

This page documents the practical validator workflow used to test diphone coverage, detect missing audio files, verify safe filename consistency, and confirm that the TTS system is ready for clean deployment.

About this workflow. A diphone-based TTS system is only reliable when the expected diphone sequence, safe filename mapping, and actual audio files all agree. This validator workflow is the practical quality-control layer that connects pronunciation logic to working playback.

1. Purpose of the Validator

The validator exists to answer one practical question:

Can the current TTS system successfully build and play the expected diphone sequence for a word?

To answer that, the validator checks diphone coverage, missing audio files, and safe filename consistency.

2. Core Validation Path

Word
  ↓
IPA conversion
  ↓
Phoneme sequence
  ↓
Diphone sequence
  ↓
Safe filename generation
  ↓
Check diphone WAV files
  ↓
Pass / Missing / Mismatch result
  

If any stage is inconsistent, the validator can expose the exact point of failure.
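The middle stages of this path (phonemes to diphones to safe filenames) can be sketched as follows. This is a minimal illustration, not the project's actual converter: the function names and the `SAFE_TOKENS` table are hypothetical, and only the two mappings `#` to `sil` and `ʃ` to `sh` are taken from the example in section 4.

```python
# Hypothetical IPA-to-safe-token table. Only these two entries appear in
# the example on this page; a real table would cover the full inventory.
SAFE_TOKENS = {"#": "sil", "ʃ": "sh"}


def to_diphones(phonemes):
    """Pad the phoneme list with '#' boundaries and pair adjacent symbols."""
    padded = ["#", *phonemes, "#"]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def to_safe_filename(diphone):
    """Map each half of a diphone to its safe token and append '.wav'."""
    left, right = diphone.split("-", 1)
    return f"{SAFE_TOKENS.get(left, left)}-{SAFE_TOKENS.get(right, right)}.wav"


phonemes = ["d", "i", "ʃ", "a"]
diphones = to_diphones(phonemes)
filenames = [to_safe_filename(d) for d in diphones]
print(diphones)
print(filenames)
```

Keeping both functions in one shared module is one way to make sure validation logic and file generation cannot drift apart.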

3. What the Validator Should Report

Field           Purpose
--------------  ----------------------------------------------------
Word            The source dictionary word being tested
IPA             The pronunciation form produced by the converter
Phonemes        The tokenized sound sequence
Diphones        The expected diphone transitions
Safe filenames  The expected filesystem names for each diphone
Missing files   Expected diphone files not found in the audio folder
Coverage        Percentage of expected diphones currently present
Status          Pass / Partial / Fail summary
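The fields above map naturally onto a small record type. The sketch below assumes Python 3.9 or later; the class name `ValidationReport` and its attribute names are hypothetical, chosen to mirror the table, and the sample values are the ones from the দিশা example in section 4.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ValidationReport:
    """One validator result; attribute names mirror the report fields."""
    word: str
    ipa: str
    phonemes: list[str]
    diphones: list[str]
    safe_filenames: list[str]
    missing_files: list[str] = field(default_factory=list)
    coverage_percent: float = 100.0
    status: str = "Pass"


report = ValidationReport(
    word="দিশা",
    ipa="diʃa",
    phonemes=["d", "i", "ʃ", "a"],
    diphones=["#-d", "d-i", "i-ʃ", "ʃ-a", "a-#"],
    safe_filenames=["sil-d.wav", "d-i.wav", "i-sh.wav", "sh-a.wav", "a-sil.wav"],
    missing_files=["i-sh.wav"],
    coverage_percent=80.0,
    status="Partial",
)

# asdict() turns the record into a plain dict, ready for JSON or CSV export.
print(asdict(report))
```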

4. Example Validator Output

Example word: দিশা
Word: দিশা
IPA: diʃa
Phonemes: d i ʃ a
Diphones:
#-d
d-i
i-ʃ
ʃ-a
a-#

Safe filenames:
sil-d.wav
d-i.wav
i-sh.wav
sh-a.wav
a-sil.wav

Missing:
i-sh.wav

Coverage:
4 / 5 = 80%

Status:
Partial
  

5. Common Failure Types

Missing File

The expected WAV file does not exist in the diphone folder.

Filename Mismatch

The diphone exists conceptually, but the validator expects a different safe filename.

IPA Rule Drift

A page produces a different IPA form than the stable shared converter does.

Diphone Segmentation Drift

The expected diphone sequence differs from the sequence used when the files were created.

Old File Pollution

Old diphone files from a previous rule system remain in the folder, where they can be mistaken for current-generation files.

Boundary Rule Mismatch

Initial or final boundary mapping differs between validation logic and file generation.
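Old file pollution in particular is easy to check for mechanically. The sketch below lists WAV files in a folder that are not in the expected inventory. The function name `find_stale_files` is hypothetical, the expected list is assumed to cover the whole inventory rather than a single word, and `i-S.wav` is a made-up leftover from an imagined older naming scheme.

```python
import tempfile
from pathlib import Path


def find_stale_files(diphone_dir, expected_filenames):
    """Return WAV files present in the folder but absent from the expected set."""
    expected = set(expected_filenames)
    return sorted(
        p.name for p in Path(diphone_dir).glob("*.wav") if p.name not in expected
    )


# Demo in a throwaway folder containing one leftover file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["sil-d.wav", "d-i.wav", "i-S.wav"]:
        (Path(tmp) / name).touch()
    stale = find_stale_files(tmp, ["sil-d.wav", "d-i.wav", "i-sh.wav"])

print(stale)
```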

6. Missing-File Logic

A validator should do more than detect missing files: it should report them clearly and predictably.

Recommended missing-file output:
  • show exact expected filename
  • group missing files separately from present files
  • show coverage percentage
  • keep output stable and machine-readable if possible

This makes it easy to rebuild only the missing parts of the inventory.
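One possible shape for such output is sketched below. The function name `check_files` and the JSON key names are assumptions made for illustration; the filenames are those of the দিশা example. Coverage follows directly from the `expected` and `found` counts.

```python
import json
import tempfile
from pathlib import Path


def check_files(diphone_dir, expected_filenames):
    """Split expected filenames into present and missing, keeping input order."""
    folder = Path(diphone_dir)
    present, missing = [], []
    for name in expected_filenames:
        (present if (folder / name).is_file() else missing).append(name)
    return {
        "expected": len(expected_filenames),
        "found": len(present),
        "present": present,
        "missing": missing,
    }


# Demo in a throwaway folder: create every expected file except i-sh.wav.
expected = ["sil-d.wav", "d-i.wav", "i-sh.wav", "sh-a.wav", "a-sil.wav"]
with tempfile.TemporaryDirectory() as tmp:
    for name in expected:
        if name != "i-sh.wav":
            (Path(tmp) / name).touch()
    result = check_files(tmp, expected)

# sort_keys keeps the key order stable between runs.
print(json.dumps(result, sort_keys=True, indent=2))
```

Because the missing list contains exact filenames, it can be passed straight to whatever step records or regenerates diphones.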

7. Coverage Calculation

Coverage is a simple but powerful summary:

Coverage (%) = (existing diphone files / expected diphone files) × 100
  

Example:

Expected: 5
Found: 3
Coverage: 60%
  

This is especially useful when testing large batches of words or comparing progress across rebuild stages.
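The calculation can be sketched as below, both for one word and pooled across a batch. The function names are hypothetical, and returning 0.0 when nothing is expected is an assumption; a real validator might prefer to flag that case as an error.

```python
def coverage_percent(found, expected):
    """Coverage as a percentage; 0.0 when nothing is expected (an assumption)."""
    if expected == 0:
        return 0.0
    return 100.0 * found / expected


def batch_coverage(results):
    """Pool (found, expected) pairs so longer words carry more weight."""
    total_found = sum(found for found, _ in results)
    total_expected = sum(expected for _, expected in results)
    return coverage_percent(total_found, total_expected)


print(coverage_percent(3, 5))            # the example above: 60.0
print(batch_coverage([(4, 5), (3, 5)]))  # 7 of 10 files present: 70.0
```

Pooling the counts, rather than averaging per-word percentages, keeps the batch figure equal to the share of files that actually exist.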

8. Recommended Validation Levels

Single-Word Validation

Best for debugging one word or one pronunciation problem at a time.

  • shows complete trace
  • ideal for word page debugging
  • good for pronunciation rule testing

Batch Validation

Best for rebuild testing and inventory progress tracking.

  • tests many words at once
  • finds repeated missing diphones
  • good for coverage analysis
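For batch runs, counting how often each missing file recurs shows which recordings would unblock the most words. The sketch below is illustrative only: `rank_missing` is a hypothetical name and the word keys and missing lists are made-up placeholder data.

```python
from collections import Counter


def rank_missing(missing_by_word):
    """Count how many words need each missing file, most needed first."""
    counts = Counter()
    for names in missing_by_word.values():
        counts.update(set(names))  # count each file once per word
    # Sort by descending count, then by name, so the output is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Placeholder batch result: word -> missing safe filenames.
missing_by_word = {
    "word_1": ["i-sh.wav", "a-sil.wav"],
    "word_2": ["i-sh.wav"],
    "word_3": ["i-sh.wav", "d-i.wav"],
}
ranking = rank_missing(missing_by_word)
print(ranking)
```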

9. Clean Rebuild Validator Process

Freeze IPA rules
   ↓
Freeze safe filename rules
   ↓
Back up old diphone folder
   ↓
Create fresh diphone folder
   ↓
Copy only current-generation files
   ↓
Run validator on sample test list
   ↓
Fix missing / mismatched items
   ↓
Run validator again
   ↓
Deploy when stable
  

This is much safer than mixing new files into an old folder with unknown leftovers.
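The folder-handling steps (back up, create fresh, copy current-generation files) can be sketched as follows. This assumes the current-generation files already sit in their own source folder; the function name `start_clean_rebuild` and all folder names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def start_clean_rebuild(live_dir, backup_dir, current_gen_dir):
    """Move the live folder aside, recreate it, and copy in current files only."""
    live, backup, source = Path(live_dir), Path(backup_dir), Path(current_gen_dir)
    if backup.exists():
        raise FileExistsError(f"backup folder already exists: {backup}")
    shutil.move(str(live), str(backup))  # back up the old diphone folder
    live.mkdir(parents=True)             # create the fresh diphone folder
    copied = []
    for wav in sorted(source.glob("*.wav")):
        shutil.copy2(wav, live / wav.name)
        copied.append(wav.name)
    return copied


# Demo in a throwaway directory with one old file and one current file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "diphones").mkdir()
    (root / "diphones" / "old-x.wav").touch()
    (root / "current_gen").mkdir()
    (root / "current_gen" / "sil-d.wav").touch()
    copied = start_clean_rebuild(
        root / "diphones", root / "diphones_backup", root / "current_gen"
    )
    live_after = sorted(p.name for p in (root / "diphones").iterdir())
    backup_after = sorted(p.name for p in (root / "diphones_backup").iterdir())

print(copied, live_after, backup_after)
```

Refusing to overwrite an existing backup is a deliberate choice here: it prevents a second run from destroying the only copy of the old inventory.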

10. Recommended Validation Status Labels

Status         Meaning
-------------  -----------------------------------------------------------------------
Pass           All expected diphones exist and filenames match
Partial        Some expected files are present, but one or more are missing
Fail           Major mismatch in IPA, diphone logic, or file availability
Rule Mismatch  Current validation output disagrees with the stable shared converter
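A sketch of how these labels might be assigned is shown below. The thresholds are assumptions: the table does not define "major mismatch", so this version treats a word with no expected files present as Fail, and the function name `classify` is hypothetical.

```python
def classify(expected_count, missing_count, ipa, reference_ipa):
    """Assign a status label; the Fail rule here is an assumed threshold."""
    if ipa != reference_ipa:
        return "Rule Mismatch"  # output disagrees with the shared converter
    if expected_count > 0 and missing_count == 0:
        return "Pass"
    if 0 < missing_count < expected_count:
        return "Partial"
    return "Fail"  # nothing expected, or nothing found


print(classify(5, 1, "diʃa", "diʃa"))
```

Checking the IPA first means a rule disagreement is reported even when every file happens to exist.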

11. Recommended Validator Test List Strategy

A good validator test list should not be random. It should be coverage-driven.

Core Coverage Words

Simple high-frequency words that test common CV and VC transitions.

Boundary Test Words

Words chosen specifically to test initial and final diphone boundaries.

Cluster Test Words

Words containing consonant clusters, learned forms, or diphone edge cases.

A small curated list of 50 words is often enough to catch most rebuild problems before wider deployment.
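One way to make a list coverage-driven is a greedy selection: repeatedly pick the word that adds the most diphones not yet covered. This is a swapped-in illustration of the idea, not a method described on this page; `pick_test_words` is a hypothetical name and the candidate data is made up.

```python
def pick_test_words(diphones_by_word, limit=50):
    """Greedily choose words that each add the most uncovered diphones."""
    covered, chosen = set(), []
    remaining = dict(diphones_by_word)
    while remaining and len(chosen) < limit:
        # Ties go to the earliest candidate, so the result is deterministic.
        best = max(remaining, key=lambda w: len(set(remaining[w]) - covered))
        gain = set(remaining.pop(best)) - covered
        if not gain:
            break  # no remaining word adds anything new
        chosen.append(best)
        covered |= gain
    return chosen


# Placeholder candidates: word -> expected diphones.
candidates = {
    "word_1": ["#-d", "d-i", "i-#"],
    "word_2": ["#-d", "d-i"],
    "word_3": ["#-k", "k-a", "a-#"],
}
chosen = pick_test_words(candidates)
print(chosen)
```

A greedy pick does not guarantee the smallest possible list, but it is simple and tends to drop words that only repeat transitions already tested.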

12. Validator Checklist Before Deployment

Deployment checklist:
  • shared converter produces stable IPA
  • phoneme tokenization matches expected rule set
  • diphone generation is stable across all pages
  • safe filename mapping is frozen
  • old files are removed from live folder
  • sample validation list passes at acceptable coverage
  • TTS playback on word pages matches validator expectations

13. Validator Data Fields for Spreadsheet Tracking

A spreadsheet-based validator tracker can be extremely useful during rebuild work.

Field              Use
-----------------  ---------------------------------
Word               Source item being tested
IPA                Expected pronunciation output
Diphone count      How many files are expected
Found count        How many files are present
Coverage %         Quick progress summary
Missing filenames  Rebuild targets
Status             Pass / Partial / Fail
Notes              Special mismatch or rule problems
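A tracker with these columns can be written as CSV, which any spreadsheet application can open. The sketch below writes to an in-memory buffer for demonstration; the column names follow the table, while joining multiple missing filenames with a space is an assumption. The sample row reuses the দিশা example from section 4.

```python
import csv
import io

FIELDS = [
    "Word", "IPA", "Diphone count", "Found count",
    "Coverage %", "Missing filenames", "Status", "Notes",
]

rows = [
    {
        "Word": "দিশা",
        "IPA": "diʃa",
        "Diphone count": 5,
        "Found count": 4,
        "Coverage %": 80,
        "Missing filenames": " ".join(["i-sh.wav"]),
        "Status": "Partial",
        "Notes": "",
    }
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file instead, opening it with `newline=""` and `encoding="utf-8"` keeps line endings and the Bengali and IPA characters intact.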

Workflow note. This page should gradually evolve into a fuller operational validator guide, including real sample outputs, downloadable checklists, validation spreadsheets, and clean rebuild test protocols.