WCAG Success Criteria · Level AAA
WCAG 1.2.7: Extended Audio Description (Prerecorded)
WCAG 1.2.7 requires that when pauses in foreground audio are insufficient to convey all visual information, extended audio descriptions—achieved by pausing the video—must be provided for prerecorded synchronized media. This ensures blind and low-vision users can fully understand complex visual content that standard audio descriptions cannot cover.
What This Rule Means
WCAG Success Criterion 1.2.7 — Extended Audio Description (Prerecorded) operates at Level AAA and builds directly on the Level AA requirement in SC 1.2.5 (Audio Description for Prerecorded). Where standard audio description simply uses natural pauses in a video's audio track to narrate visual information, extended audio description goes further: when those pauses are too short or too infrequent to accommodate all the necessary description, the video is paused and the audio description plays, after which the video resumes.
The criterion applies specifically to prerecorded synchronized media — that is, video content that has a soundtrack synchronized with its visuals, such as instructional films, corporate training videos, documentaries, product demonstrations, and similar content. It does not apply to live media, to audio-only content, or to video-only content where there is no soundtrack.
A pass under this criterion requires one of the following: (a) an extended audio description track or version of the media is provided that pauses playback to deliver descriptions of critical visual information that cannot fit within natural pauses, or (b) all visual information is already conveyed through the existing audio track without any need for additional description (sometimes called an "equivalently described" version). A fail occurs when a prerecorded synchronized video contains meaningful visual information — such as on-screen text, diagrams, facial expressions driving the narrative, or demonstrations — that is not conveyed through either natural-pause audio descriptions or extended descriptions, and the existing audio alone leaves a blind user without that information.
Two scoping points are worth noting. First, if the foreground audio already describes all visual content fully, no additional description is needed. Second, the "media alternative for text" exception is written into SC 1.2.1 through 1.2.3, not into the normative text of SC 1.2.7, so a video that is a clearly labeled alternative to a text document should not be assumed exempt from this criterion on that basis alone.
It is worth noting that providing extended audio description often requires producing an entirely separate version of the video, since most media players do not natively support the pause-and-resume mechanism for description delivery. Common approaches include a dedicated "described version" accessible via a separate URL or a toggle button on the player, or use of a media player that supports TTML (Timed Text Markup Language) or SMIL-based extended description tracks.
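For a player that scripts the pause-and-resume mechanism itself, the core logic can be sketched as below. This is a minimal illustration, not a production player: `attachExtendedDescriptions` and `speakWithSpeechSynthesis` are hypothetical names, the `video` argument is assumed to behave like an `HTMLVideoElement`, and `track` like a `TextTrack` of kind `descriptions`. The `speak` callback is injected so that recorded audio clips could be used instead of synthesized speech.

```javascript
// Sketch only: function names are illustrative, not part of any library.
// `speak(text, onDone)` must call onDone once the description has finished.
function attachExtendedDescriptions(video, track, speak) {
  // 'hidden' keeps cues loading and cuechange firing without rendering them.
  track.mode = 'hidden';
  let lastSpoken = null;
  let busy = false;

  track.addEventListener('cuechange', () => {
    const cues = track.activeCues;
    const cue = cues && cues.length > 0 ? cues[0] : null;
    if (!cue) {
      lastSpoken = null; // cue ended; allow it to be spoken again after a rewind
      return;
    }
    if (cue === lastSpoken || busy) return;

    lastSpoken = cue;
    busy = true;
    video.pause(); // freeze picture and foreground audio at the cue start

    speak(cue.text, () => {
      busy = false;
      // Playback position is unchanged while paused, so play() resumes
      // from the point where the description interrupted.
      const result = video.play();
      if (result && typeof result.catch === 'function') {
        result.catch(() => {}); // ignore autoplay-policy rejections in this sketch
      }
    });
  });
}

// Browser-only speaker built on the Web Speech API.
function speakWithSpeechSynthesis(text, onDone) {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.onend = onDone;
  window.speechSynthesis.speak(utterance);
}
```

In a browser, this might be wired up by picking the `descriptions` entry from `video.textTracks` and passing `speakWithSpeechSynthesis` as the third argument. Synthesized speech quality varies by platform, so a separately produced described version remains the more predictable option.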
Why It Matters
Extended audio description is critical for users who are blind or have severe low vision — an audience that is larger than many developers assume. According to the World Health Organization, approximately 2.2 billion people worldwide have some form of vision impairment, with at least 1 billion of those experiencing conditions that could have been prevented or remain unaddressed. For users who rely entirely on screen readers and audio output, a video that describes only what its natural pauses allow may leave enormous gaps in comprehension.
Consider a medical training video demonstrating a surgical technique. The narrator might say, "Now we make the incision here," while the camera zooms into a specific anatomical landmark and the surgeon's hands perform a precise maneuver. The spoken narration assumes the viewer can see the visual context. A standard audio description might be able to insert a short note during a brief pause, but if the action is continuous for two minutes with constant speech, a blind medical student receives almost none of the visual detail that is central to learning the technique. Extended audio description pauses the video and delivers the full description: the anatomy visible, the exact tool used, the angle of approach, the tissue reaction. The student then has equivalent access to the learning material.
Beyond blindness, extended audio description benefits users with cognitive disabilities who process information more slowly and benefit from the additional context that descriptive narration provides. It also helps users in audio-only contexts — such as someone listening to a training video while commuting — who cannot see the screen regardless of visual ability.
From a business and legal standpoint, providing extended audio descriptions signals a serious, measurable commitment to inclusion. For organizations in regulated sectors — public institutions, banks, healthcare providers, educational establishments — demonstrating Level AAA compliance on complex media content can meaningfully reduce legal risk and reputational exposure. There is also a practical SEO benefit: the scripts used to produce extended audio descriptions often serve as rich transcripts, which search engines index as meaningful content, improving discoverability of video-based resources.
Related Axe-core Rules
WCAG 1.2.7 requires manual testing because automated tools cannot evaluate the semantic content of a video, compare the audio track against the visual track, or determine whether visual information is adequately described. No axe-core rule exists that can watch a video, understand what is visually depicted, and judge whether an extended audio description is present, accurate, and complete. This is a fundamentally human judgment task.
- Manual evaluation — visual content vs. audio content comparison: A human tester must watch the video with eyes open and with eyes closed (or using a screen reader) and determine whether the audio alone — including any standard audio description — conveys all meaningful visual information. If it does not, the tester must check whether an extended audio description version is provided. Automated tools cannot perform this comparison because they lack the ability to interpret video frames as meaningful visual events or correlate them with semantic meaning in the audio.
- Manual evaluation — pause-and-resume mechanism: If an extended description is claimed, a tester must verify that the player actually pauses during description delivery and resumes correctly afterward. This behavior is a media player and timing concern that requires active playback testing by a human, as automated scanners do not execute or observe media playback states.
- Manual evaluation — description accuracy and completeness: Even where an extended audio description track exists, its content must be accurate and cover all visually critical information. No automated rule can assess whether the description text correctly and completely represents what is shown on screen. A description that says "the presenter points at the board" when the board contains a critical diagram with labeled data points would fail this criterion despite being technically present.
How to Test
- Run an automated accessibility scan first. Use axe DevTools (browser extension) or Lighthouse on the page containing the video. Neither tool can verify extended audio description compliance, but both can surface structural issues, such as a <video> element with no captions track. Treat any media-related findings as a starting point that narrows your manual review scope, and do not rely on either tool to report a missing or inadequate audio description.
- Identify all prerecorded synchronized media on the page. Locate every <video> element or embedded media player (YouTube iframes, Vimeo embeds, custom players). Confirm each one contains synchronized audio and video. Pure audio podcasts or silent videos are out of scope for this criterion.
- Watch the video with audio only. Close your eyes or use a screen reader (NVDA with Firefox, VoiceOver with Safari, or JAWS with Chrome) and listen to the full video including any existing audio description track. Note every moment where you lack understanding of what is happening visually: actions, on-screen text, diagrams, scene transitions, character expressions that drive the narrative.
- Compare your notes against the visual track. Now watch the video with audio muted and note all visual information that appears on screen. Cross-reference with what you heard. If meaningful visual content was not conveyed in the audio, the video requires audio description. If natural pauses in the audio were too short or absent to accommodate those descriptions, extended audio description is required.
- Check for an extended description version. Look for a clearly labeled "Audio Described Version" link, a toggle in the video player, or a described version at an alternate URL. If present, activate it and repeat the audio-only listening and comparison steps with the described version playing, verifying that the pauses and descriptions now cover the missing visual information.
- Test the pause-and-resume behavior with NVDA + Firefox. With the extended description version playing, confirm that the video pauses, the audio description is delivered clearly, and the video resumes from the correct point. Verify the screen reader announces the described content or that it is otherwise audible to a non-sighted user.
- Test with VoiceOver + Safari on macOS/iOS. Repeat the playback test. Ensure that the described version is operable with keyboard navigation (Tab, Space, Enter) and that VoiceOver announces player controls correctly, including any description toggle.
- Verify the description script for accuracy. Obtain the extended description script or transcript if available. Review it against the video to confirm it is factually accurate, covers all critical visual events, and does not omit information that a sighted viewer would use to understand the content.
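The comparison step above, deciding whether a description fits into a natural pause, can be roughed out numerically before a human reviewer makes the final call. The sketch below is an assumption-laden helper, not a conformance test: the function names, the data shapes, and the 160 words-per-minute speaking rate are all illustrative choices.

```javascript
// Assumed narration speed; adjust to match the actual describer or voice.
const WORDS_PER_MINUTE = 160;

// Estimate how long a description takes to speak, in seconds.
function estimateSpeechSeconds(text, wordsPerMinute = WORDS_PER_MINUTE) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return (words * 60) / wordsPerMinute;
}

// pauses: [{ start, end }] gaps in the foreground audio, in seconds.
// descriptions: [{ at, text }] visual events that need describing.
// Returns one record per description, flagging those that overflow the pause.
function classifyDescriptions(pauses, descriptions, wordsPerMinute = WORDS_PER_MINUTE) {
  return descriptions.map((d) => {
    const needed = estimateSpeechSeconds(d.text, wordsPerMinute);
    // Find a pause that contains the moment the visual event occurs.
    const pause = pauses.find((p) => d.at >= p.start && d.at <= p.end);
    // Time left in that pause from the event onward; zero if there is no pause.
    const available = pause ? pause.end - d.at : 0;
    return { at: d.at, needed, available, extended: needed > available };
  });
}
```

Any record with `extended: true` marks a spot where standard audio description is unlikely to fit and an extended description should be considered. The estimate says nothing about whether the description text itself is accurate or complete.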
How to Fix
Scenario 1: Video with no audio description at all — Incorrect
<!-- A training video with no audio description track and no described version link.
Blind users receive only the foreground narration, missing all visual demonstrations. -->
<video controls width='800'>
<source src='surgical-technique.mp4' type='video/mp4'>
<track kind='captions' src='captions-en.vtt' srclang='en' label='English Captions' default>
</video>
Scenario 1: Video with extended audio description version — Correct
<!-- Provide a clearly labeled link to the extended described version.
The described version pauses at critical moments to deliver full visual descriptions.
This is the most reliable cross-browser approach. -->
<video controls width='800' id='main-video'>
<source src='surgical-technique.mp4' type='video/mp4'>
<track kind='captions' src='captions-en.vtt' srclang='en' label='English Captions' default>
<track kind='descriptions' src='descriptions-en.vtt' srclang='en' label='Audio Descriptions'>
</video>
<p>
<a href='surgical-technique-extended-described.mp4'>
Watch the extended audio described version of the surgical technique video
</a>
</p>
Scenario 2: Embedded YouTube video with fast-paced visuals — Incorrect
<!-- An iframe embed of a product demo video. The YouTube auto-captions exist
but there is no audio description, and the visual demonstrations are rapid
with no natural pauses long enough for description. -->
<iframe
width='560'
height='315'
src='https://www.youtube.com/embed/EXAMPLE_ID'
title='Product demonstration video'
allowfullscreen>
</iframe>
Scenario 2: Embedded video with toggle for described version — Correct
<!-- Offer a button that swaps the src to the extended described version.
The described version was produced as a separate MP4 with pauses built in.
The button is keyboard-accessible and has a clear accessible name. -->
<div role='region' aria-label='Product demonstration video player'>
<iframe
id='demo-video-frame'
width='560'
height='315'
src='https://www.youtube.com/embed/EXAMPLE_ID'
title='Product demonstration video'
allowfullscreen>
</iframe>
<p>
<button
type='button'
aria-pressed='false'
onclick='toggleDescribedVersion(this)'>
Enable extended audio description
</button>
</p>
</div>
<!-- The toggleDescribedVersion() function swaps the iframe src
to the described YouTube video ID and updates aria-pressed. -->
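The markup above calls `toggleDescribedVersion()` without defining it. One possible implementation is sketched here, under stated assumptions: `DESCRIBED_EXAMPLE_ID` is a placeholder for the described video's ID, not a real value, and the optional `doc` parameter exists only so the function can be exercised outside a browser.

```javascript
// Placeholder embed URLs; replace DESCRIBED_EXAMPLE_ID with the real described video.
const STANDARD_SRC = 'https://www.youtube.com/embed/EXAMPLE_ID';
const DESCRIBED_SRC = 'https://www.youtube.com/embed/DESCRIBED_EXAMPLE_ID';

const STANDARD_TITLE = 'Product demonstration video';
const DESCRIBED_TITLE = 'Product demonstration video with extended audio description';

// Called from the button's onclick with the button element as `button`.
function toggleDescribedVersion(button, doc = globalThis.document) {
  const frame = doc.getElementById('demo-video-frame');
  // Flip the current state: anything other than 'true' counts as off.
  const enable = button.getAttribute('aria-pressed') !== 'true';

  frame.setAttribute('src', enable ? DESCRIBED_SRC : STANDARD_SRC);
  // Keep the iframe title in step so screen reader users know which version is loaded.
  frame.setAttribute('title', enable ? DESCRIBED_TITLE : STANDARD_TITLE);
  // aria-pressed conveys the state, so the visible label stays constant.
  button.setAttribute('aria-pressed', String(enable));
  return enable;
}
```

Swapping `src` reloads the embed, so playback restarts from the beginning. Carrying the playback position across the two versions would need extra work, because the described version runs longer than the original.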
Scenario 3: HTML5 video player with a description track that is too brief — Incorrect
<!-- A descriptions track exists but its cue text is truncated to fit within
existing audio pauses. Key visual information (a data chart with five labeled
columns) is summarized as 'a chart appears on screen' — insufficient. -->
<video controls width='800'>
<source src='annual-report.mp4' type='video/mp4'>
<track kind='captions' src='captions-en.vtt' srclang='en' label='English' default>
<track kind='descriptions' src='brief-descriptions.vtt' srclang='en' label='Descriptions'>
</video>
<!-- brief-descriptions.vtt contains only: 'A chart appears on screen.' -->
Scenario 3: Separate extended described version with full narration — Correct
<!-- The extended described version pauses playback at the chart moment
and delivers: 'A bar chart titled Annual Revenue by Region appears.
Five bars are shown: Europe 2.1 million, Asia 3.4 million,
North America 4.8 million, South America 1.2 million, Africa 0.9 million.
North America leads all regions.' The video then resumes. -->
<video controls width='800'>
<source src='annual-report.mp4' type='video/mp4'>
<track kind='captions' src='captions-en.vtt' srclang='en' label='English' default>
</video>
<p>
<strong>Extended audio described version:</strong>
<a href='annual-report-extended-described.mp4'>
Annual report video with extended audio descriptions
</a>
</p>
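When the described version is produced as a separate file with pauses built in, its running time grows by the length of every inserted description, and each later description shifts accordingly. The sketch below works out that timeline. The function name and the cue shape (`at`, `seconds`) are hypothetical, and the numbers in any usage are invented for illustration.

```javascript
// cues: [{ at, seconds }] where `at` is the source-video time (in seconds)
// at which the picture freezes and `seconds` is the length of the recorded
// description that plays during the freeze.
function buildExtendedTimeline(sourceDuration, cues) {
  // Work in chronological order without mutating the caller's array.
  const sorted = [...cues].sort((a, b) => a.at - b.at);
  let added = 0; // total pause time inserted before the current cue

  const segments = sorted.map((cue) => {
    const segment = {
      sourceTime: cue.at,                        // where the original video freezes
      describedStart: cue.at + added,            // same moment in the described file
      describedEnd: cue.at + added + cue.seconds // where the original footage resumes
    };
    added += cue.seconds;
    return segment;
  });

  return { segments, describedDuration: sourceDuration + added };
}
```

For a 120-second source with a 6-second description at 0:20 and a 14-second description at 0:45, the described file would run 140 seconds, with the second description starting at 0:51. The same offsets apply to any captions reused for the described version, and `describedEnd` gives testers the exact point at which the original footage should resume.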
Common Mistakes
- Treating a captions track as a substitute for audio description: Captions convey spoken dialogue and sound effects as text for deaf users. They do not describe visual content for blind users. A video with only a captions track still fails this criterion if visual information is not described in the audio.
- Providing a standard audio description track without checking whether pauses are sufficient: Many teams add a <track kind='descriptions'> element and consider the job done, without verifying that every significant visual event has a pause long enough for the description to fit. Fast-paced demos, complex diagrams, and dense on-screen text typically require extended descriptions.
- Describing only obvious visual changes and omitting textual content on screen: On-screen text (slide titles, form labels, chart axes, button labels shown in a demo) must be read in full during the audio description. Saying "a slide appears" instead of reading the slide's title and key bullet points leaves critical information inaccessible.
- Linking to a described version without a clear accessible name: A link that says "click here" or "described version" without identifying which video it describes risks failing SC 2.4.4 (Link Purpose (In Context)) and creates confusion for screen reader users navigating a page with multiple videos.
- Using the described version toggle button without updating aria-pressed or providing feedback: If a button switches between standard and described playback, it must use aria-pressed (true/false) or an equivalent live region announcement so screen reader users know the current state and that their action took effect.
- Producing the extended described version without testing resume accuracy: After the description pause, the video must resume from exactly where it left off, not from a slightly earlier or later frame. Incorrect resume points cause loss of narrative context and compound confusion for blind users.
- Assuming that a video transcript alone satisfies this criterion: A transcript is valuable and supports SC 1.2.8 (Media Alternative), but it does not fulfill 1.2.7. Extended audio description is a time-synchronized, audio-delivered mechanism, not a separate document to read independently.
- Failing to describe visual information that drives the emotional or narrative meaning of a scene: If a character's facial expression, body language, or visual reaction is central to understanding what is happening — in a customer testimonial video, for example — omitting that description leaves blind users without equivalent understanding even if the spoken dialogue is intact.
- Not updating the described version when the main video is updated: If the source video is re-edited, updated, or replaced (common with instructional content), the extended description track or version must also be updated. Stale descriptions can actively mislead by describing scenes that no longer exist.
- Embedding videos via third-party iframes (YouTube, Vimeo) and assuming the platform handles description: Platform-provided audio descriptions (where they exist) are rarely extended descriptions. The content owner is responsible for ensuring an extended described version exists and is linked or accessible from the embedding page.
Relation to Turkey's Accessibility Regulations
Turkey's Presidential Circular 2025/10, published in the Official Gazette No. 32933 on June 21, 2025, establishes accessibility obligations for a defined set of digital service providers. The circular mandates conformance with accessibility standards for digital products and services offered to the public, aligning broadly with WCAG 2.1 Level AA as its baseline compliance requirement.
The entity types covered by the circular include public institutions and agencies, e-commerce platforms, banks and financial institutions, hospitals and healthcare providers, telecommunications operators with 200,000 or more subscribers, licensed travel agencies, private transport companies, and private schools authorized by the Ministry of National Education (MoNE). For these entities, WCAG 2.1 Level AA conformance is the enforceable floor.
WCAG 1.2.7 (Extended Audio Description) is a Level AAA criterion, which means it is not directly mandated by the circular's baseline requirements. However, its importance should not be understated in the Turkish regulatory context for several reasons. First, organizations that produce complex media content — such as healthcare providers publishing surgical training videos, public institutions releasing policy explainer films, or private schools distributing educational video content — have a strong ethical and practical case for implementing extended audio descriptions on their most critical materials, even without a strict legal mandate.
Second, as Turkish digital accessibility regulation matures and enforcement mechanisms are strengthened, Level AAA criteria are increasingly referenced as indicators of best-in-class practice. Organizations that demonstrate voluntary AAA compliance — particularly in high-stakes domains like health, education, and finance — are better positioned for future regulatory updates and face reduced risk of complaints under broader anti-discrimination frameworks.
Third, for public broadcasters and media organizations — even if not directly named in the 2025/10 circular — Turkey's Radio and Television Supreme Council (RTÜK) has historically engaged with accessibility provisions for broadcast content. Extended audio description aligns with the spirit of those obligations when applied to on-demand and web-distributed video.
Organizations using the Accsible widget SDK should be aware that while the overlay widget can surface accessibility features and controls to end users, extended audio description itself must be implemented at the content production level — it cannot be retrofitted automatically by a client-side tool. The SDK can, however, surface a described version toggle or link within the accessibility panel, making that alternative version more discoverable for users who need it.
Sources and references
- W3C Understanding 1.2.7 Extended Audio Description (Prerecorded)
- W3C Techniques for WCAG 1.2.7
- WebAIM: Captions, Transcripts, and Audio Descriptions
- MDN: HTMLTrackElement and the track element
- W3C Technique G8: Providing a movie with extended audio descriptions
- W3C Technique G78: Providing a second, user-selectable, audio track
