Safe Filename Mapping

IPA-safe conversion rules for diphone filenames, validator logic, and rebuild stability

This page documents the filename-safe mapping layer that connects IPA output to diphone audio files. It is one of the most important practical bridges between pronunciation logic, validator checks, and working TTS playback.

About this page. Raw IPA symbols are excellent for linguistic analysis, but they are not always convenient for filesystem storage, URLs, or browser-based audio lookup. This page defines the stable safe-filename layer used to convert phonetic forms into predictable, deployment-ready diphone filenames.

1. Why Safe Filename Mapping Is Necessary

A TTS system needs more than correct IPA. It also needs filenames that can be stored on ordinary filesystems, used in URLs, and looked up reliably by browser-based audio code.

IPA symbols such as ʃ, ŋ, ə, or boundary markers like # are linguistically useful, but not ideal as raw filenames. The safe mapping layer converts them into stable equivalents.

2. Core Principle

IPA diphone
   ↓
Safe symbol mapping
   ↓
Filename-safe diphone
   ↓
WAV file lookup
  
Example:
IPA diphone: ʃ-aː
Safe form:   sh-aa
Filename:    sh-aa.wav
    

3. Boundary Handling

Word boundaries are crucial in diphone-based synthesis. These are usually represented in phonological notation as #. In safe filenames, a readable replacement should be used.

IPA / Symbol    Safe Form    Example
#               sil          #-d → sil-d.wav
#               sil          a-# → a-sil.wav

Using sil is practical because it is readable and clearly signals word-initial or word-final silence/boundary behavior.

4. Recommended Safe Symbol Table

IPA Symbol    Safe Form    Reason
#             sil          Boundary marker
aː            aa           Long vowel made ASCII-safe
iː            ii           Long vowel made ASCII-safe
uː            uu           Long vowel made ASCII-safe
eː            ee           Long vowel made ASCII-safe
oː            oo           Long vowel made ASCII-safe
ʃ             sh           Readable fricative mapping
ŋ             ng           Readable nasal mapping
ɔ             aw           Readable vowel mapping
ə             schwa        Explicit reduced vowel name
ɽ             rr           Avoid raw IPA in filename
j             y            Readable glide mapping if needed
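
The table above can be expressed as a single lookup table plus two small functions. This is a minimal sketch, not the project's actual code: the names SAFE_SYMBOLS, to_safe_symbol, and diphone_to_filename are hypothetical, and the pass-through rule for plain lowercase ASCII letters is an assumption based on the examples on this page (d, k, a, i stay unchanged).

```python
# Hypothetical mapping table copied from the rows above.
SAFE_SYMBOLS = {
    "#": "sil",
    "a\u02d0": "aa",   # aː
    "i\u02d0": "ii",   # iː
    "u\u02d0": "uu",   # uː
    "e\u02d0": "ee",   # eː
    "o\u02d0": "oo",   # oː
    "ʃ": "sh",
    "ŋ": "ng",
    "ɔ": "aw",
    "ə": "schwa",
    "ɽ": "rr",
    "j": "y",
}


def to_safe_symbol(symbol: str) -> str:
    """Map one IPA symbol to its safe form."""
    if symbol in SAFE_SYMBOLS:
        return SAFE_SYMBOLS[symbol]
    # Assumption: plain lowercase ASCII letters are already filename-safe.
    if symbol.isascii() and symbol.isalpha() and symbol.islower():
        return symbol
    raise ValueError(f"no safe mapping defined for {symbol!r}")


def diphone_to_filename(diphone: str) -> str:
    """Convert an IPA diphone such as 'ʃ-a' into a filename such as 'sh-a.wav'."""
    left, right = diphone.split("-")
    return f"{to_safe_symbol(left)}-{to_safe_symbol(right)}.wav"


print(diphone_to_filename("#-d"))   # sil-d.wav
print(diphone_to_filename("k-ə"))   # k-schwa.wav
```

Raising an error for unmapped symbols, rather than passing them through silently, keeps raw IPA from leaking into filenames unnoticed.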

5. Example Diphone Conversions

IPA Diphone    Safe Form    Filename
#-d            sil-d        sil-d.wav
d-i            d-i          d-i.wav
i-ʃ            i-sh         i-sh.wav
ʃ-a            sh-a         sh-a.wav
a-#            a-sil        a-sil.wav
k-ɔ            k-aw         k-aw.wav
ʃ-aː           sh-aa        sh-aa.wav
aː-#           aa-sil       aa-sil.wav
a-ŋ            a-ng         a-ng.wav
k-ə            k-schwa      k-schwa.wav

6. Mapping Rules Should Be One-Way and Stable

The safest workflow is to treat the mapping as a one-way transformation:

IPA → safe filename form
  

The TTS engine, validator, batch tools, and deployment scripts should all use the same mapping logic. Do not maintain slightly different versions in different pages.

A stable safe filename system is as important as a stable IPA converter. If mapping rules drift, the validator and the live TTS engine will disagree about which files should exist.
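
One possible way to notice drift early is to have every tool report a fingerprint of the mapping table it loaded and compare the values. The sketch below is only an illustration of that idea, not something this page prescribes: the function name mapping_fingerprint and the three example tables are hypothetical.

```python
import hashlib
import json


def mapping_fingerprint(mapping: dict[str, str]) -> str:
    """Return a short hash of a mapping table that ignores key order."""
    canonical = json.dumps(mapping, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Illustrative subsets of a mapping table as loaded by different tools.
engine_table = {"#": "sil", "ʃ": "sh", "ŋ": "ng"}
validator_table = {"ŋ": "ng", "#": "sil", "ʃ": "sh"}   # same rules, different order
drifted_table = {"#": "sil", "ʃ": "s", "ŋ": "ng"}      # one rule has changed

print(mapping_fingerprint(engine_table) == mapping_fingerprint(validator_table))  # True
print(mapping_fingerprint(engine_table) == mapping_fingerprint(drifted_table))    # False
```

A shared module imported by every tool avoids the problem entirely; a fingerprint is mainly useful where the table has to be copied, for example between a Python batch tool and browser code.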

7. Long Vowel Encoding

Long vowels should not be stored using IPA length marks in filenames. A doubled-letter form is safer and easier to read.

Recommended long-vowel strategy:
aː → aa
iː → ii
uː → uu
eː → ee
oː → oo
    

This makes filenames more portable and avoids encoding issues in some environments.
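
The doubling strategy can also be written as a rule instead of five separate table entries. This is a sketch under the assumption that length is always marked with the IPA length mark ː (U+02D0) directly after a single plain vowel letter; encode_length is a hypothetical name.

```python
LENGTH_MARK = "\u02d0"  # the IPA length mark ː


def encode_length(symbol: str) -> str:
    """Replace a trailing length mark by doubling the preceding letter."""
    if not symbol.endswith(LENGTH_MARK):
        return symbol
    base = symbol[: -len(LENGTH_MARK)]
    # Assumption: only single lowercase ASCII vowels carry a length mark.
    if len(base) != 1 or base not in "aeiou":
        raise ValueError(f"unexpected long symbol: {symbol!r}")
    return base * 2


for vowel in "aiueo":
    print(vowel + LENGTH_MARK, "->", encode_length(vowel + LENGTH_MARK))
```

A long vowel that is not a plain ASCII letter (for example a long ɔ) is rejected here on purpose, since this page does not define a safe form for it.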

8. Safe Filename Workflow

Word
  ↓
IPA
  ↓
Phoneme sequence
  ↓
Diphone sequence
  ↓
Safe mapping
  ↓
Expected filenames
  ↓
Validator / playback
  

This means safe filename generation should happen after diphone construction, not during early grapheme or IPA processing.
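
The ordering of the last stages can be sketched as follows. The function names are hypothetical, the phoneme sequence is assumed to arrive without boundary markers, and the tiny demo table stands in for the shared mapping table.

```python
from typing import Callable


def build_diphones(phonemes: list[str]) -> list[str]:
    """Pair adjacent phonemes, adding the boundary marker '#' at both ends."""
    padded = ["#", *phonemes, "#"]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def expected_filenames(phonemes: list[str], to_safe: Callable[[str], str]) -> list[str]:
    """Apply the safe mapping only after the diphones have been built."""
    names = []
    for diphone in build_diphones(phonemes):
        left, right = diphone.split("-")
        names.append(f"{to_safe(left)}-{to_safe(right)}.wav")
    return names


# Illustrative subset of the mapping table; other symbols pass through unchanged.
DEMO_TABLE = {"#": "sil", "ʃ": "sh"}


def demo_to_safe(symbol: str) -> str:
    return DEMO_TABLE.get(symbol, symbol)


print(build_diphones(["d", "i", "ʃ", "a"]))
# ['#-d', 'd-i', 'i-ʃ', 'ʃ-a', 'a-#']
print(expected_filenames(["d", "i", "ʃ", "a"], demo_to_safe))
# ['sil-d.wav', 'd-i.wav', 'i-sh.wav', 'sh-a.wav', 'a-sil.wav']
```

Because the diphones are built from IPA symbols first, the linguistic form stays available for the validator, and the safe form is derived from it in exactly one place.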

9. Common Problems Caused by Mapping Drift

Validator Failure

The validator expects one filename but the audio folder contains another.

Playback Failure

JavaScript tries to load a file that does not exist because the mapping changed.

Old File Pollution

Old files remain from a previous naming system and create confusion during rebuilds.

Cross-Page Mismatch

One page uses sh while another uses raw ʃ or a different alias.

10. Recommended Rebuild Rule

Before rebuilding a diphone folder, freeze the mapping rules first, then work through the following steps in order:

Freeze rules
   ↓
Back up old folder
   ↓
Create fresh diphone folder
   ↓
Generate / copy only current-system files
   ↓
Run validator
   ↓
Deploy
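
The back-up, fresh-folder, and copy steps could be scripted roughly like this. It is a sketch under several assumptions: the function name and parameters are hypothetical, the source recordings live outside the diphone folder, and the set of expected filenames has already been generated from the frozen mapping.

```python
import shutil
import time
from pathlib import Path


def rebuild_diphone_folder(folder: Path, source: Path, expected: set[str]) -> list[str]:
    """Back up `folder`, recreate it, and copy in only the expected files.

    Returns the sorted list of expected filenames that were not found in `source`.
    """
    if folder.exists():
        stamp = time.strftime("%Y%m%d-%H%M%S")
        backup = folder.with_name(f"{folder.name}-backup-{stamp}")
        shutil.move(str(folder), str(backup))   # back up the old folder
    folder.mkdir(parents=True)                  # create a fresh, empty folder

    missing = []
    for name in sorted(expected):
        candidate = source / name
        if candidate.is_file():
            shutil.copy2(candidate, folder / name)   # copy only current-system files
        else:
            missing.append(name)
    return missing
```

Files in the source folder that are not in the expected set are never copied, so names left over from an older naming system cannot re-enter the fresh folder. The returned list is what the validator would report as missing before deployment.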
  

11. Recommended Validator Fields

A validator or spreadsheet tracker should include explicit safe filename columns.

Field                Purpose
IPA diphone          Linguistic form
Safe diphone form    Mapped deployment form
Filename             Actual expected WAV filename
Exists?              Yes/No validation
Status               Pass / Missing / Mismatch
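
The fields above map naturally onto one record per diphone. The following is a minimal sketch with hypothetical names (ValidatorRow, validate); it only assigns Pass or Missing, because this page does not define what counts as a Mismatch.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ValidatorRow:
    ipa_diphone: str   # linguistic form
    safe_form: str     # mapped deployment form
    filename: str      # expected WAV filename
    exists: bool       # whether the file is present in the folder
    status: str        # "Pass" or "Missing" in this sketch


def validate(pairs: dict[str, str], folder: Path) -> list[ValidatorRow]:
    """Build one row per diphone; `pairs` maps IPA diphone to safe form."""
    rows = []
    for ipa, safe in pairs.items():
        filename = f"{safe}.wav"
        exists = (folder / filename).is_file()
        status = "Pass" if exists else "Missing"
        rows.append(ValidatorRow(ipa, safe, filename, exists, status))
    return rows


# The folder name "diphones" is only an example path.
for row in validate({"#-d": "sil-d", "i-ʃ": "i-sh"}, Path("diphones")):
    print(row.ipa_diphone, row.safe_form, row.filename, row.exists, row.status)
```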

12. Suggested Practical Mapping Standard

Simple working standard:
  • use lowercase ASCII only
  • use hyphen between diphone parts
  • use sil for boundaries
  • use doubled vowels for length
  • avoid raw IPA in filenames whenever practical
  • freeze the mapping before rebuilding audio inventory
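
The naming part of this standard can be checked mechanically. The pattern below is one possible reading of the rules, assuming every filename has exactly two lowercase ASCII parts joined by a single hyphen and ends in .wav; is_safe_filename is a hypothetical helper.

```python
import re

# Two lowercase ASCII parts, one hyphen, and the .wav extension.
SAFE_FILENAME = re.compile(r"[a-z]+-[a-z]+\.wav")


def is_safe_filename(name: str) -> bool:
    """Return True if `name` follows the safe filename standard sketched above."""
    return SAFE_FILENAME.fullmatch(name) is not None


for name in ["sil-d.wav", "k-schwa.wav", "ʃ-a.wav", "Sh-a.wav", "sh_a.wav"]:
    print(name, is_safe_filename(name))
```

Running such a check over an audio folder is a quick way to spot raw IPA, uppercase letters, or a different separator left over from an earlier naming system.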

13. Related Archive Pages

IPA Toolkit

Review the pronunciation side that feeds this filename mapping layer.


Mapping note. This page should eventually become the single source of truth for safe filename rules, so every validator, batch tool, page, and TTS component follows exactly the same naming system.