WCAG Success Criteria · Level AAA
WCAG 3.1.6: Pronunciation
WCAG 3.1.6 requires that a mechanism be available to identify the specific pronunciation of words where meaning is ambiguous without knowing the pronunciation. This criterion ensures users who rely on text-to-speech technology or who encounter unfamiliar language can access the correct meaning of ambiguous content.
- Level AAA
- WCAG
- WCAG 2.2 AAA
- Understandable
- Accessibility
What This Rule Means
WCAG 3.1.6 Pronunciation is a Level AAA success criterion under the Understandable principle. It states: "A mechanism is available for identifying specific pronunciation of words where meaning of the words, in context, is ambiguous without knowing the pronunciation."
The core requirement is that when a word's meaning depends entirely on how it is pronounced, and that pronunciation cannot be determined from the surrounding context, authors must provide a way for users to discover the correct pronunciation. This is distinct from simply providing a definition; the criterion is specifically about phonetic pronunciation that resolves semantic ambiguity.
The criterion targets situations where the same string of characters can be read in multiple ways, each yielding a different meaning. Classic examples from English include the word "read" (present tense, rhymes with "reed") versus "read" (past tense, rhymes with "red"), or "wind" (moving air, rhymes with "sinned") versus "wind" (to coil, rhymes with "find"). In languages with more complex writing systems or tonal distinctions, such as Japanese, Chinese, or Arabic, the problem is even more prevalent and consequential.
Turkish, while largely phonetically regular compared to many other languages, still has words and loanwords whose pronunciation may be unclear in specialized, technical, or formal contexts, particularly for screen reader users whose synthesized speech engine may mis-stress or mispronounce unfamiliar terminology or foreign loan words.
What counts as a pass: A page passes if, wherever a word is ambiguous without knowing its pronunciation, at least one of the following mechanisms is present:
- An inline phonetic guide immediately adjacent to the word (e.g., using the HTML <ruby> element and its associated <rt> and <rp> tags for East Asian scripts, or a parenthetical pronunciation key in IPA or another recognized notation system).
- A link to a glossary entry or pronunciation guide that explicitly covers the ambiguous word.
- An audio pronunciation clip associated with the word.
- Inline text immediately following or preceding the word that describes its pronunciation in a way the reader can interpret (e.g., "The word 'bass' here refers to the fish, pronounced like 'mass'").
What counts as a fail: A page fails if a word's meaning is genuinely ambiguous without hearing it spoken, and no mechanism exists to resolve that ambiguity through pronunciation information. Simply providing a text definition that does not clarify pronunciation is insufficient if the meaning cannot be derived from the definition alone without knowing how the word sounds. Note that if context (such as the surrounding sentence, heading, or image) already makes the pronunciation clear, the criterion is satisfied without any additional mechanism.
Official exceptions: The WCAG specification explicitly scopes this criterion to cases where ambiguity exists without knowing the pronunciation. If the surrounding text, visuals, or semantic structure already resolves the ambiguity unambiguously, no additional pronunciation mechanism is required. The criterion does not require phonetic annotation for every word on every page â only those where meaning genuinely depends on pronunciation that cannot be inferred from context.
Why It Matters
Pronunciation ambiguity creates meaningful barriers for several distinct user groups, and the impact is particularly acute for those who cannot rely on visual or auditory cues outside of the primary text.
Blind and low-vision users relying on screen readers are the most directly affected group. Screen readers convert text to synthesized speech, and when a word has multiple valid pronunciations with different meanings, the text-to-speech engine must make a choice, and it frequently chooses incorrectly. A user listening to a financial article about "compound interest" may hear "compound" stressed as the verb (com-POUND, to combine or worsen) rather than the adjective (COM-pound), creating momentary or sustained confusion. For users who cannot quickly glance at surrounding visual context, resolving this confusion requires re-reading passages or seeking clarification elsewhere. According to the World Health Organization, approximately 2.2 billion people globally have some form of vision impairment, a significant proportion of whom use screen reading technology as their primary means of accessing digital content.
Users with cognitive and learning disabilities, including those with dyslexia or language processing disorders, often rely on text-to-speech tools even when they have functional vision. For these users, hearing an incorrect pronunciation of a homograph can disrupt comprehension in ways that are difficult to recover from, particularly when the passage is technical or unfamiliar.
Deaf and hard-of-hearing users who use sign languages as their primary language may encounter written text in a second or third language. For them, seeing a phonetic representation of a word, even if they cannot hear it, can connect the written form to a known concept more reliably than a text definition alone.
Non-native speakers and language learners benefit enormously from pronunciation guidance. A learner of Turkish encountering a specialized medical or legal term, or a foreign technical term rendered in Turkish transliteration, may not know whether stress falls on the first or second syllable, which can change meaning or simply impede comprehension.
A concrete real-world scenario: Consider a Turkish healthcare portal describing a procedure involving the word "ileum" (a section of the small intestine) alongside content that also references the ilium (a pelvic bone). In English, these words sound identical in many dialects. On a page read aloud by a screen reader, a patient preparing for surgery who is blind or has low vision would have no way to distinguish between the two terms from audio alone unless pronunciation or phonetic context is provided. This is not a hypothetical edge case: medical documentation is a high-stakes domain where such ambiguities can cause real harm.
SEO and usability benefits also exist. Pronunciation guides encourage the use of precise, well-defined terminology. Glossaries with phonetic annotations improve time-on-page metrics and reduce user frustration. Rich structured content that explains terminology tends to attract more inbound links and signals subject-matter authority to search engines.
Related Axe-core Rules
WCAG 3.1.6 requires manual testing only. There are no automated axe-core rules that map directly to this criterion. The following explanation clarifies why automation cannot reliably detect violations and what testers must look for manually.
- No automated rule exists for pronunciation ambiguity. Automated accessibility testing engines such as axe-core operate by scanning the DOM for structural patterns, missing attributes, invalid roles, and other rule-based conditions. Determining whether a specific word is ambiguous without knowing its pronunciation requires semantic and linguistic understanding of the content, a judgment that depends on vocabulary, language, domain context, and reader background. No current static analysis engine can reliably determine that the word "read" in a given sentence is ambiguous in pronunciation without human interpretation of the surrounding meaning. This is why WCAG itself acknowledges that this criterion is difficult to test programmatically and places it at Level AAA.
- What manual testers must check: Testers must read through page content with domain knowledge of the language(s) used and flag any word where (a) two or more valid pronunciations exist, (b) each pronunciation corresponds to a different meaning, and (c) the surrounding context does not unambiguously resolve which meaning is intended. For each flagged word, the tester must then verify that a pronunciation mechanism (phonetic guide, audio clip, glossary link, or contextual clarification) is present and accessible.
- Screen reader spot-check: Testers using screen readers (NVDA, JAWS, VoiceOver, TalkBack) should listen to the content and note any instances where the synthesized voice pronounces a word in a way that conflicts with the intended meaning in context. This is a strong signal that a pronunciation mechanism is needed.
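The manual flagging step above can be supported by a simple first-pass helper that surfaces candidate words for a human reviewer. This is a minimal sketch: the word list, function name, and context-window size are illustrative assumptions, and a hit is only a candidate; deciding whether a word is genuinely ambiguous in context remains a human judgment.

```javascript
// Sketch of a first-pass homograph flagger for manual review.
// HOMOGRAPHS is an illustrative, hand-maintained list, not exhaustive;
// an automated scan can surface candidates but cannot decide ambiguity.
const HOMOGRAPHS = new Set(["read", "wind", "bass", "lead", "tear", "bow"]);

function flagHomographs(text, contextChars = 30) {
  const hits = [];
  const wordRe = /[A-Za-z]+/g;
  let match;
  while ((match = wordRe.exec(text)) !== null) {
    if (HOMOGRAPHS.has(match[0].toLowerCase())) {
      // Capture a window of surrounding text so the reviewer can judge
      // whether the context already resolves the pronunciation.
      const start = Math.max(0, match.index - contextChars);
      const end = Math.min(text.length, match.index + match[0].length + contextChars);
      hits.push({ word: match[0], index: match.index, context: text.slice(start, end) });
    }
  }
  return hits;
}
```

Each returned hit pairs the flagged word with its surrounding context, which is what the tester actually needs in order to apply conditions (a) through (c) above.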
How to Test
- Run an automated scan first (for baseline): Use axe DevTools or Lighthouse to perform a general accessibility audit of the page. While neither tool has a dedicated rule for WCAG 3.1.6, the scan may surface related language issues such as a missing or incorrect lang attribute on the <html> element (WCAG 3.1.1) or missing language identification for passages in a different language (WCAG 3.1.2). These issues can compound pronunciation problems by causing the screen reader to apply the wrong language engine entirely. Verify that <html lang='tr'> (or the appropriate language code) is present and correct.
- Conduct a content audit for homographs and ambiguous terms: With domain expertise in the page's subject matter and language, read through all text content. Create a list of any words that have multiple pronunciations with distinct meanings. Pay special attention to: loanwords from English, French, Arabic, or other languages that may not follow standard Turkish phonetic rules; technical jargon in medicine, law, or engineering; proper nouns with non-obvious pronunciation; and any word explicitly flagged in editorial review as potentially confusing.
- Test with NVDA + Firefox: Open the page in Firefox with NVDA running. Use NVDA's continuous reading mode (Insert + Down Arrow) to listen to the entire page or the relevant sections. Note any word that the synthesizer pronounces in a way that could be misunderstood. Check whether any pronunciation mechanism (phonetic annotation, audio button, glossary link) is available and whether NVDA announces it clearly.
- Test with JAWS + Chrome: Repeat the above listening test in Chrome with JAWS. JAWS and NVDA use different speech synthesizers and may pronounce the same word differently, so both tests are valuable. Use JAWS's verbosity settings to ensure all inline annotations and <ruby> element content are being read aloud.
- Test with VoiceOver + Safari (macOS/iOS): Enable VoiceOver and navigate the page using Safari. Use VO + A to read the page continuously. Apple's speech synthesizer has its own pronunciation logic; verify that any <ruby> annotations or aria-label overrides are being surfaced correctly.
- Verify the pronunciation mechanism is accessible: For each pronunciation mechanism present on the page, confirm that it is reachable by keyboard alone, that it is announced by screen readers, and that the pronunciation information provided actually resolves the ambiguity (e.g., an IPA transcription is only useful if the target audience can read IPA; a plain-language phonetic spelling like "pronounced: EYE-lee-um" may be more universally helpful).
- Check audio pronunciation clips: If audio clips are used, verify they have accessible controls (play button with a label, volume control) and that transcripts or text alternatives are available for deaf users who cannot benefit from audio.
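The language-declaration check in the first step above can also be automated as a build-time lint. The sketch below is an illustrative rough check under stated assumptions (a regex match, not a full HTML parser; the function name is invented for this example), but it catches the common failure of a missing or wrong lang attribute before the page ever reaches a screen reader.

```javascript
// Build-time sketch: verify the document declares the expected primary
// language (WCAG 3.1.1). A missing or wrong lang attribute makes every
// pronunciation problem worse by selecting the wrong TTS language engine.
// This is a rough regex lint, not a full HTML parser.
function checkHtmlLang(html, expected = "tr") {
  const m = html.match(/<html[^>]*\blang\s*=\s*['"]([^'"]+)['"]/i);
  if (!m) {
    return { ok: false, reason: "no lang attribute on <html>" };
  }
  const lang = m[1].toLowerCase();
  // Accept exact matches and regional variants (e.g., "tr-TR" for "tr").
  if (lang !== expected && !lang.startsWith(expected + "-")) {
    return { ok: false, reason: "lang is '" + m[1] + "', expected '" + expected + "'" };
  }
  return { ok: true, lang: m[1] };
}
```

A check like this belongs in CI alongside the axe DevTools or Lighthouse scan; it does not test WCAG 3.1.6 itself, but it removes a compounding failure mode before the manual listening tests begin.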
How to Fix
Homograph in body text â Incorrect
<!-- The word "bass" is used in a music context, but its pronunciation
is ambiguous (rhymes with "face" not "mass" in this context).
No mechanism is provided to clarify. -->
<p>
The bass guitar part in the recording was improvised live during
the studio session.
</p>
Homograph in body text â Correct
<!-- A parenthetical phonetic guide immediately resolves the ambiguity.
Alternatively, a link to a glossary entry with an audio clip
would also satisfy the criterion. -->
<p>
The bass <span lang='en'>(pronounced: "base", rhymes with "face")</span>
guitar part in the recording was improvised live during the studio session.
</p>
East Asian or ruby-annotated script â Incorrect
<!-- Japanese kanji without furigana: the reading of this compound
is not clear to all readers and screen readers may mispronounce it. -->
<p>本日の<span>音楽</span>イベントへようこそ。</p>
East Asian or ruby-annotated script â Correct
<!-- The <ruby> element with <rt> provides the phonetic reading.
<rp> provides fallback parentheses for browsers that do not
support ruby annotations, ensuring backward compatibility. -->
<p>本日の
<ruby>
音楽
<rp>(</rp>
<rt>おんがく</rt>
<rp>)</rp>
</ruby>
イベントへようこそ。</p>
Technical term with ambiguous pronunciation â Incorrect
<!-- "Ileum" and "ilium" sound identical when spoken aloud by a TTS engine.
No disambiguation mechanism is present in this medical content. -->
<p>
The surgical procedure involves resection of the terminal ileum
to treat the affected region.
</p>
Technical term with ambiguous pronunciation â Correct
<!-- A glossary link provides access to a page with an audio pronunciation
clip and IPA notation, satisfying the criterion. The link text is
descriptive so screen reader users understand where it leads. -->
<p>
The surgical procedure involves resection of the terminal
<a href='/glossary/ileum' aria-label='ileum: view pronunciation and definition'>ileum</a>
to treat the affected region.
</p>
<!-- The linked glossary entry should contain: -->
<article id='glossary-ileum'>
<h2>Ileum</h2>
<p><strong>Pronunciation:</strong> ILL-ee-um (/ˈɪliəm/)</p>
<audio controls aria-label='Audio pronunciation of ileum'>
<source src='/audio/ileum.mp3' type='audio/mpeg'>
Your browser does not support the audio element.
</audio>
<p><strong>Definition:</strong> The final section of the small intestine,
connecting to the large intestine. Not to be confused with the ilium
(a bone of the pelvis, pronounced identically).</p>
</article>
Loanword with non-standard pronunciation in Turkish â Incorrect
<!-- The English loanword "cache" is used in a Turkish tech article
(the sentence means "Clearing browser cache files can improve
performance"). Turkish TTS engines may read "cache" letter by letter
or as "kah-cheh" rather than the intended "keş" (kash).
No guidance is provided. -->
<p>Tarayıcı cache dosyalarını temizlemek performansı artırabilir.</p>
Loanword with non-standard pronunciation in Turkish â Correct
<!-- A phonetic clarification in parentheses guides the reader using
familiar Turkish spelling conventions ("telaffuz" is Turkish for
"pronunciation"). -->
<p>
Tarayıcı cache
<span class='pronunciation-guide'>(telaffuz: keş)</span>
dosyalarını temizlemek performansı artırabilir.
</p>
Common Mistakes
- Providing only a text definition without pronunciation: Adding a tooltip or glossary definition that explains the meaning of a word does not satisfy WCAG 3.1.6 if the definition itself does not clarify pronunciation. For example, defining "bass" as "a low-frequency sound or musical instrument" still leaves the pronunciation ambiguous; the mechanism must specifically address how the word is pronounced.
- Using <ruby> without <rp> fallback tags: In browsers that do not support ruby annotations natively, omitting <rp> (ruby parentheses) causes the phonetic annotation to run straight into the base text with no visual separation, which is confusing or misleading. Always include <rp>(</rp> and <rp>)</rp> around each <rt> element so users on non-supporting platforms still see the pronunciation text clearly set off inline.
- Providing audio clips without accessible controls or text alternatives: An audio pronunciation button that has no label (e.g., <button><img src='speaker.png'></button> with no alt or aria-label) is inaccessible to the very users who need it most. Every audio control must have a descriptive label, and the pronunciation content of the audio must also be available in text form for deaf users.
- Assuming the TTS engine will get it right: Many teams skip pronunciation mechanisms because their internal testing (done visually or aurally by sighted/hearing testers) does not expose the ambiguity. Relying on a text-to-speech engine's heuristics to select the correct pronunciation of a homograph is not a valid accessibility strategy; those heuristics fail regularly, especially for domain-specific or multilingual content.
- Placing pronunciation guidance too far from the word: Linking to a site-wide pronunciation glossary at the bottom of the page or in a help section does not meet the criterion if users must navigate away from the content to find it, losing their reading position. The mechanism must be clearly associated with the specific ambiguous word, either inline or via a proximate, clearly labeled link.
- Using IPA notation without considering the audience: International Phonetic Alphabet transcriptions are precise but are not readable by most general audiences. If your users are not language professionals, plain-language phonetic respellings ("pronounced: KAY-oss" for "chaos") are more practically useful. Choosing an inaccessible format for the pronunciation guide undermines the entire purpose of providing one.
- Failing to mark up pronunciation spans with appropriate language attributes: When providing a phonetic respelling in a language or notation system different from the page's primary language, omitting the correct lang attribute on the containing element causes screen readers to apply incorrect phonetic rules to the very text meant to guide pronunciation, creating a compounding problem.
- Applying the criterion only to body text and ignoring headings, navigation, and UI labels: Ambiguous homographs can appear in headings, button labels, link text, form field labels, and error messages. These locations are often read in isolation by screen reader users navigating by landmark or element type, making contextual disambiguation even less reliable than in body text.
- Conflating WCAG 3.1.3 (Unusual Words) with 3.1.6 (Pronunciation): WCAG 3.1.3 requires mechanisms for words used in an unusual or specialized way. WCAG 3.1.6 targets a distinct problem: words whose very meaning depends on how they are pronounced. A word can require a fix under 3.1.6 even if it is not unusual; "read" and "wind" are common words. Do not assume that satisfying one criterion satisfies the other.
- Not testing with multiple screen readers and TTS engines: Different synthesizers (NVDA's eSpeak, JAWS's Eloquence or Vocalizer, Apple's built-in voices) have different pronunciation heuristics and will handle homographs differently. A word that a particular engine happens to pronounce correctly may be mispronounced by another. Content authors should test with at least two screen reader/browser combinations to identify pronunciation failures that affect real users.
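The <rp> mistake above can be avoided by construction: if ruby markup is generated by a template helper rather than typed by hand, the fallback parentheses are always present. Below is a minimal sketch; the function names are illustrative, and escapeHtml is a deliberately small helper for this example, not a full sanitizer.

```javascript
// Sketch: generate a <ruby> annotation with <rp> fallback parentheses,
// so the fallback cannot be forgotten. escapeHtml is a minimal helper
// for this sketch, not a substitute for a real templating library.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function rubyAnnotation(base, reading) {
  // Browsers without ruby support render the content inline, so the
  // <rp> parentheses keep base text and reading visually separated.
  return (
    "<ruby>" + escapeHtml(base) +
    "<rp>(</rp><rt>" + escapeHtml(reading) + "</rt><rp>)</rp>" +
    "</ruby>"
  );
}
```

In a non-supporting browser the output degrades to "base(reading)", which is exactly the inline parenthetical form the criterion's other examples use.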
Relation to Turkey's Accessibility Regulations
Turkey's Presidential Circular 2025/10, published in the Official Gazette No. 32933 on June 21, 2025, establishes binding web accessibility requirements for a broad range of entities operating in Turkey. The circular mandates compliance with WCAG 2.2 standards, with a primary emphasis on Level A and Level AA criteria for covered entities. The entities explicitly subject to the circular include public institutions and agencies, e-commerce platforms, banks and financial services providers, hospitals and healthcare organizations, telecommunications companies with 200,000 or more subscribers, travel agencies, private transportation companies, and private schools operating under authorization from the Ministry of National Education (MoNE).
WCAG 3.1.6 Pronunciation is a Level AAA criterion and is therefore not among the legally mandated requirements under the circular. Covered entities are not obligated by the circular to implement pronunciation mechanisms as a baseline compliance measure. However, the circular's broader purpose (ensuring that digital services are genuinely usable by all citizens, including those with disabilities) is well served by voluntary adoption of Level AAA criteria wherever technically and editorially feasible.
For certain covered entity categories, the practical case for implementing WCAG 3.1.6 is especially strong even absent a legal mandate. Healthcare portals operated by hospitals covered under the circular deal in terminology where pronunciation ambiguity can cause genuine patient harm. Legal or regulatory text published by public institutions may contain specialized vocabulary with non-obvious pronunciation that creates barriers for screen reader users. E-commerce platforms that serve diverse linguistic audiences, including non-native Turkish speakers, may find that pronunciation guidance reduces customer confusion and abandonment.
Turkish is a phonetically regular language, meaning that the correspondence between spelling and pronunciation is more consistent than in languages like English or French. This does reduce (but does not eliminate) the scope of WCAG 3.1.6 compliance work for Turkish-language content. However, the prevalence of English and French loanwords in Turkish technical, commercial, and digital content â particularly in the sectors covered by the circular â means that pronunciation ambiguity remains a real concern. Words borrowed from other languages do not always follow Turkish phonetic conventions and may be rendered differently by Turkish TTS engines depending on the synthesizer's configuration.
Organizations subject to the circular that aspire to best-in-class accessibility, serve users in multilingual contexts, operate in high-stakes domains such as health or finance, or wish to demonstrate accessibility leadership in the Turkish digital market should consider WCAG 3.1.6 as part of a comprehensive accessibility programme that extends beyond minimum legal compliance. Implementing pronunciation mechanisms is a relatively low-cost enhancement for most content types and signals a genuine commitment to inclusive design that aligns with both the spirit of the circular and with international best practice.
