System File Map / Codebase Guide

A practical guide to the archive structure, shared files, and TTS-related code responsibilities

This page explains the structure of the Bishnupriya Manipuri research archive and speech-technology codebase, including shared infrastructure, article pages, toolkit pages, and the logic layers that must remain synchronized.

About this guide. As the project grows, the most common source of breakage is not a single bug, but a mismatch between related files. This guide identifies which files are responsible for which parts of the system and which ones must stay aligned.

1. High-Level Structure

/che/
   pages
   shared templates
   metadata helpers
   publication helpers
   contributor helpers
   glossary / reference / index systems
   search system
   toolkit and dataset pages
  

The archive is designed as a shared PHP publication framework rather than a collection of unrelated pages. That means many pages depend on the same shared logic.

2. Main File Categories

Shared Configuration

Site-wide settings, publication metadata, and shared contributor/reference/index data.

Shared Helpers

Functions used across the site for metadata, references, search, glossary linking, publication display, and contributor rendering.

Shared Layout

Header, footer, templates, print styles, and site-wide CSS.

Content Pages

Article pages, book page, glossary, references, and public-facing archive pages.

Toolkit Pages

IPA toolkit, diphone inventory, recording protocol, validator workflow, safe filename mapping, rebuild checklist, and related operational pages.

Search / Navigation Pages

Search, book index, glossary, resources, downloads, and archive landing pages.

3. Core Shared Files

File                      Role                                 Importance
site_config.php           Site-wide configuration              Core
meta_helpers.php          SEO/meta/canonical/JSON-LD output    Core
publication_config.php    Publication defaults                 Core
publication_helpers.php   Edition/version/revision display     Core
contributors_data.php     Contributor master data              Core
contributors_helpers.php  Contributor rendering functions      Core
articles_data.php         Master article listing               Core
article_helpers.php       Article lookup/navigation helpers    Core
references_data.php       Shared bibliography data             Core
references_helpers.php    Reference formatting and linking     Core
index_terms.php           Master index term list               Core
index_helpers.php         Auto-index logic                     Core
glossary_data.php         Glossary term data                   Core
glossary_helpers.php      Glossary rendering and auto-linking  Core
search_helpers.php        Archive search logic                 Core

4. Shared Layout Files

File                       Role
header.php                 Site header, navigation, meta loading, global page start
footer.php                 Site footer, page close
style.css                  Main archive styling
print.css                  Print/PDF formatting
article_template.php       Shared article wrapper
book_chapter_template.php  Shared chapter/book wrapper
chapter_nav.php            Previous/next chapter navigation

If navigation, styling, or layout breaks across many pages at once, the first places to inspect are header.php, footer.php, and style.css.

5. Main Public Pages

File              Role
index.php         Landing page / home page for the archive
about.php         About the language and project
book.php          Combined book page
references.php    Shared bibliography page
glossary.php      Glossary page
book_index.php    Auto-generated archive/book index
contributors.php  Contributors and credits page
updates.php       Project updates / news
downloads.php     Download / export center
resources.php     Datasets / resources hub
search.php        Site-wide search results page

6. Article Pages

The article series pages are the main long-form research content:

article1.php
article2.php
...
article10.php
  

Each article file should ideally do only a few things. The less page-specific infrastructure is repeated inside article files, the more stable the system remains.
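
One way to keep article files thin is to derive previous/next links from the shared article list instead of hard-coding them in each page. The sketch below is only an illustration: the data shape (an ordered list of entries, each with a file key) and the function name adjacent_articles() are assumptions, and the project's actual articles_data.php and chapter_nav.php may be organized differently.

```php
<?php
// Hypothetical data shape: an ordered list where every entry has a 'file' key.
// The real articles_data.php may use different keys.

/**
 * Find the neighbours of the current page in an ordered article list.
 * Returns ['prev' => ?array, 'next' => ?array]; both are null if the
 * current page is not listed at all.
 */
function adjacent_articles(array $articles, string $currentFile): array
{
    $articles = array_values($articles); // normalise to 0..n-1 keys
    $index = array_search($currentFile, array_column($articles, 'file'), true);
    if ($index === false) {
        return ['prev' => null, 'next' => null];
    }
    return [
        'prev' => $articles[$index - 1] ?? null, // null on the first article
        'next' => $articles[$index + 1] ?? null, // null on the last article
    ];
}
```

With a helper of this kind, adding or reordering an article means editing the shared list once; the article pages themselves do not change.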

7. Toolkit / Operational Pages

These pages document the practical working system behind the archive:

File Role
ipa_toolkit.phpOrthography-to-IPA and pronunciation toolkit
diphone_inventory.phpDiphone structure, layers, priorities, and coverage
recording_protocol.phpRecording and normalization workflow
validator_workflow.phpValidation logic and pass/fail workflow
safe_filename_mapping.phpIPA-to-safe-filename rules
rebuild_checklist.phpOperational rebuild deployment checklist
test_word_list.phpCurated validation words for rebuild testing
tts_architecture.phpFull technical TTS pipeline overview

8. Which Files Must Stay Synchronized

Critical synchronization groups:
  • articles_data.php ↔ article pages ↔ navigation
  • glossary_data.php ↔ glossary_helpers.php ↔ glossary auto-linking
  • references_data.php ↔ references_helpers.php ↔ references page
  • index_terms.php ↔ index_helpers.php ↔ book index
  • header.php ↔ navigation links ↔ actual page paths
  • IPA logic ↔ diphone logic ↔ safe filename logic ↔ validator expectations

When something looks correct on one page but wrong on another, the issue is often one of these synchronization points.
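
The first synchronization group (article data ↔ article pages) can be checked mechanically. The following is a minimal sketch, assuming each entry in the article list carries a file key and that article pages follow the articleN.php naming shown in section 6; both function names are hypothetical.

```php
<?php
// Hypothetical data shape: each entry has a 'file' key naming its page.

/** Return listed page files that do not exist in $baseDir. */
function find_missing_article_pages(array $articles, string $baseDir): array
{
    $missing = [];
    foreach ($articles as $article) {
        $path = rtrim($baseDir, '/') . '/' . $article['file'];
        if (!is_file($path)) {
            $missing[] = $article['file'];
        }
    }
    return $missing;
}

/** Return articleN.php files in $baseDir that the article list does not mention. */
function find_unlisted_article_pages(array $articles, string $baseDir): array
{
    $listed = array_column($articles, 'file');
    // glob() can return false on error, so fall back to an empty list.
    $found = glob(rtrim($baseDir, '/') . '/article[0-9]*.php') ?: [];
    $onDisk = array_map('basename', $found);
    return array_values(array_diff($onDisk, $listed));
}
```

If both functions return empty arrays, the listing and the page files agree. A non-empty result points to the side that needs updating.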

9. Common Maintenance Rules

Rule 1

Do not duplicate core logic across multiple pages unless absolutely necessary.

Rule 2

Keep one shared source of truth for article listings, references, glossary terms, and index terms.

Rule 3

Freeze naming and pronunciation rules before rebuilding dependent datasets.

Rule 4

When many pages fail at once, inspect shared files before page files.

Rule 5

When one page fails but others work, inspect that page’s local includes and metadata first.

Rule 6

Use smoke-test pages when debugging shared include chains.

10. Fast Debug Strategy

Page fails
   ↓
Check which include is last known good
   ↓
Test shared helper chain
   ↓
Inspect data file or helper file
   ↓
Confirm encoding and syntax
   ↓
Reload page
  

A small smoke test can isolate errors much faster than guessing across many files.
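
A smoke test of this kind might look like the sketch below. It loads a list of files one at a time and reports the first one that is missing or throws while loading. The function name smoke_test_includes() is hypothetical, and the sketch relies on PHP 7 or later, where a syntax error in an included file is thrown as a ParseError.

```php
<?php
/**
 * Load each file in order. Returns null if every file loads, or a short
 * message naming the first file that is missing or throws while loading.
 */
function smoke_test_includes(array $files, string $baseDir): ?string
{
    foreach ($files as $file) {
        $path = rtrim($baseDir, '/') . '/' . $file;
        if (!is_file($path)) {
            return "missing: $file";
        }
        ob_start(); // swallow any stray output from the included file
        try {
            require_once $path;
            ob_end_clean();
        } catch (Throwable $e) {
            ob_end_clean();
            return "failed: $file (" . $e->getMessage() . ")";
        }
    }
    return null;
}

// Note: fatal errors that PHP does not throw as exceptions (for example
// redeclaring an existing function) still stop the script at that file.
```

Calling it with the helper chain in load order shows the "last known good" include directly: every file before the one reported has loaded cleanly.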

11. Recommended Load Order for Complex Pages

meta_helpers.php
article_helpers.php
references_helpers.php
index_helpers.php
glossary_helpers.php
publication_helpers.php
contributors_helpers.php
  

Not every page needs every helper; each page should load only the helpers it actually uses, in the order shown above.
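
One possible way to combine both points (load only what is needed, but keep the recommended order) is sketched below. The constant and function names are hypothetical; only the file names and their order come from this guide.

```php
<?php
// Recommended order as listed in this guide.
const HELPER_LOAD_ORDER = [
    'meta_helpers.php',
    'article_helpers.php',
    'references_helpers.php',
    'index_helpers.php',
    'glossary_helpers.php',
    'publication_helpers.php',
    'contributors_helpers.php',
];

/**
 * Return the requested helpers sorted into the recommended load order.
 * Throws if a name is not part of the known helper list.
 */
function helpers_in_load_order(array $needed): array
{
    $unknown = array_diff($needed, HELPER_LOAD_ORDER);
    if ($unknown !== []) {
        throw new InvalidArgumentException('Unknown helper(s): ' . implode(', ', $unknown));
    }
    return array_values(array_intersect(HELPER_LOAD_ORDER, $needed));
}

// Possible use in a page (paths are an assumption about the flat layout):
// foreach (helpers_in_load_order(['glossary_helpers.php', 'meta_helpers.php']) as $helper) {
//     require_once __DIR__ . '/' . $helper;
// }
```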

12. Where to Patch What

If the problem is...          Patch here first
navigation or top bar         header.php, style.css
footer across pages           footer.php
article previous/next links   articles_data.php, chapter_nav.php
glossary linking              glossary_data.php, glossary_helpers.php
reference formatting          references_data.php, references_helpers.php
index page results            index_terms.php, index_helpers.php
search results                search_helpers.php
page titles / canonical / SEO meta_helpers.php, page metadata arrays
publication versioning        publication_config.php, publication_helpers.php
contributors / credits        contributors_data.php, contributors_helpers.php

13. Practical TTS Synchronization Rule

The most important technical maintenance rule in the speech system is this: the IPA converter, phoneme tokenizer, diphone generator, safe filename mapper, validator, and playback logic must agree.

If one of those layers changes while the others are not updated to match, the system becomes unreliable.
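
The agreement between layers can be tested end to end on a single word. The sketch below is purely illustrative: the silence marker, the phoneme-to-name table, the hyphenated .wav naming, and all function names are invented for the example and are not the project's actual rules, which live in the toolkit pages. It builds diphones from a phoneme list, maps them to safe filenames, and reports which files the recorded inventory lacks.

```php
<?php
// Everything here is an assumed stand-in for the project's real layers.

/** Build diphones as adjacent phoneme pairs, padded with a silence marker. */
function diphones_for(array $phonemes, string $silence = '_'): array
{
    $padded = array_merge([$silence], $phonemes, [$silence]);
    $pairs = [];
    for ($i = 0; $i < count($padded) - 1; $i++) {
        $pairs[] = [$padded[$i], $padded[$i + 1]];
    }
    return $pairs;
}

/** Map one diphone to a safe filename; throws if a phoneme has no safe name. */
function safe_filename(array $diphone, array $safeNames): string
{
    $parts = [];
    foreach ($diphone as $phoneme) {
        if (!isset($safeNames[$phoneme])) {
            throw new DomainException("No safe name for phoneme: $phoneme");
        }
        $parts[] = $safeNames[$phoneme];
    }
    return implode('-', $parts) . '.wav';
}

/** Return the filenames a word needs that are absent from the inventory. */
function missing_units(array $phonemes, array $safeNames, array $inventory): array
{
    $missing = [];
    foreach (diphones_for($phonemes) as $diphone) {
        $name = safe_filename($diphone, $safeNames);
        if (!in_array($name, $inventory, true)) {
            $missing[] = $name;
        }
    }
    return $missing;
}
```

A thrown DomainException signals that the tokenizer produced a symbol the filename mapper does not know; a non-empty result from missing_units() signals that the diphone generator expects recordings the inventory does not contain. Either outcome indicates that two layers have drifted apart.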

14. Recommended Future Code Organization

As the project grows, it may be useful to group code more explicitly:

/che/
   /core/
   /pages/
   /data/
   /toolkit/
   /assets/
  

The current flat structure is still workable, but this future structure can help once the archive becomes larger.

15. Related Archive Pages

Guide note. This page should become the main maintenance reference for the archive. As new systems are added, update this guide so future work remains understandable and stable.