WCAG Success Criteria · Level A
WCAG 1.2.3: Audio Description or Media Alternative (Prerecorded)
WCAG 1.2.3 requires that prerecorded synchronized media (video with audio) provides either an audio description of the visual content or a full text alternative, ensuring users who are blind or have low vision can access information conveyed visually.
What This Rule Means
WCAG Success Criterion 1.2.3 addresses one of the most fundamental barriers for blind and low-vision users consuming video content: the loss of visual information that is never spoken aloud. The criterion states that for all prerecorded synchronized media — that is, video content paired with audio — web authors must provide either an audio description of the visual track or a full media alternative in text form.
An audio description is a narration added to the audio track of a video that describes important visual details that cannot be understood from the main audio alone. These descriptions are typically inserted during natural pauses in the dialogue, or the video may be paused momentarily to allow the narrator time to describe complex visual events. For example, if a training video shows a presenter drawing a diagram on a whiteboard without commenting on it verbally, an audio description would narrate what is being drawn and why it matters.
A full text alternative for media is a document that conveys all the information in the synchronized media — both the audio content (dialogue, narration, sound effects) and the visual content (actions, settings, on-screen text, speaker identification) — in text form. This is a more expansive version of a transcript; it must describe visual events precisely enough that a user who cannot see or hear the video can understand all the information the video communicates.
This criterion applies specifically to prerecorded synchronized media. Live video streams are addressed by other criteria (1.2.4 for captions), and prerecorded audio-only or video-only content is covered by 1.2.1. The criterion itself exempts media that is a media alternative for text and is clearly labeled as such. In addition, if the video track is purely decorative (for example, an animated background that conveys no information), there is no meaningful visual content to describe. Similarly, if the audio track of a video already fully describes all meaningful visual information (a situation sometimes called "equivalent audio"), no additional audio description is required.
A pass under 1.2.3 requires that at least one of the following is true for each piece of prerecorded synchronized media: an audio description is provided, or a text alternative that conveys all audio and visual information is linked or directly adjacent to the media. A fail occurs when video content contains meaningful visual elements — on-screen text, graphical data, facial expressions conveying key emotion, demonstration steps — that are not conveyed through any audio or text alternative.
Note that 1.2.3 is a Level A requirement, making it the baseline expectation. The more robust Level AA criterion 1.2.5 (Audio Description — Prerecorded) requires audio descriptions in all cases where they are needed, whereas 1.2.3 permits the text alternative as a substitute at Level A.
Why It Matters
Approximately 2.2 billion people worldwide have some form of vision impairment, according to the World Health Organization. For users who are blind, video content without an audio description or text alternative is completely inaccessible as a source of visual information. A screen reader can announce that a video element is present and can read any associated captions, but it cannot interpret the visual content of the video frames themselves. Without an audio description or media alternative, those users simply miss whatever the video shows but does not say.
Consider a concrete scenario: a Turkish e-commerce platform publishes a product demonstration video for a smart home device. The video shows a presenter pairing the device with a smartphone app, navigating menus on both screens, and plugging cables into specific ports. The presenter's narration focuses on the benefits of the device but does not describe which buttons are being pressed or which menu items are being selected. A blind user watching this video with a screen reader hears only the narration — they receive none of the procedural visual information that would allow them to replicate the setup at home. With an audio description or a detailed text alternative, that user gains full access to the same instructional content.
Beyond blind users, detailed text alternatives benefit users with cognitive disabilities who may process written instructions more easily than a fast-moving video. They also benefit users in bandwidth-constrained environments who cannot stream video, users on corporate networks where video is blocked, and users whose devices or browsers do not support certain video formats. Search engines also index text alternatives, meaning that providing them improves SEO by making video content discoverable through full-text search — a meaningful business benefit alongside the accessibility value.
For motor-impaired users who cannot operate video controls precisely, a text alternative allows them to consume the content at their own pace without struggling with pause, rewind, or playback controls. In short, audio descriptions and media alternatives serve a wide population and improve the overall quality and reach of video content far beyond the users who strictly require them for access.
Related Axe-core Rules
WCAG 1.2.3 requires manual testing. There is no axe-core rule that automatically flags a violation of this criterion, and understanding why helps clarify what testers must look for manually.
- Manual testing required (visual content analysis): Automated tools can detect the presence of a <video> element, a <track> element, or an associated transcript link, but they cannot evaluate whether the content of an audio description or text alternative is sufficient. Sufficiency depends on whether all meaningful visual information is conveyed, a judgment that requires a human to watch the video, read the alternative, and compare them. An axe scan might confirm that a <track kind='descriptions'> element is present, but it cannot verify that the descriptions actually cover all critical visual events in the video.
- Manual testing required (equivalence assessment): Determining whether the main audio track already describes all visual information (making an additional audio description unnecessary) is inherently a content judgment. A human reviewer must watch the video and assess whether a blind user listening only to the audio would miss any meaningful information. No automated rule can make this determination reliably.
- Manual testing required (text alternative completeness): If a text alternative (full media alternative) is provided instead of an audio description, a human must read the text alternative and compare it against the video to confirm that all visual events, on-screen text, and meaningful actions are represented. Automated tools can check that a link to a transcript exists but cannot assess whether that transcript is complete and accurate.
How to Test
1. Automated scan baseline: Run axe DevTools or Google Lighthouse against the page containing the video. While neither tool will flag a 1.2.3 violation directly, the scan can surface related issues such as missing <track> elements (flagged under 1.2.2 for captions) or missing text alternatives for image-based media. Note any video elements present on the page so you know which ones require manual review under 1.2.3.
2. Identify synchronized media: Locate every <video> element (or embedded third-party player such as YouTube or Vimeo iframes) on the page. Confirm whether each video is prerecorded and synchronized (i.e., it has both audio and video tracks that are meaningful). If a video is audio-only or has a decorative video track, it falls outside the scope of 1.2.3.
3. Watch the video with sound on: View the video normally and pay close attention to any information that is conveyed visually but not described in the audio. Common examples include on-screen text overlays, diagrams or charts being drawn, step-by-step demonstrations of a physical process, facial expressions or body language that carry emotional meaning, and speaker identification when multiple people appear on screen.
4. Check for an audio description track: Inspect the video element's markup for a <track kind='descriptions'> element. If present, enable the descriptions in the video player (or use a browser that surfaces them) and re-watch the video. Native browser support for this track kind is limited, so a player that voices the cues may be needed. Verify that every meaningful visual event identified in step 3 is described in the audio description track at an appropriate time.
5. Check for a full text alternative: If no audio description track is present, look for a link to a transcript or a full media alternative adjacent to or immediately following the video. Confirm that the linked document or inline text describes all audio content (dialogue, narration, relevant sound effects) and all visual content (actions, on-screen text, setting descriptions, speaker identification).
6. Screen reader verification (NVDA + Firefox): Open the page with NVDA running. Navigate to the video element and confirm that NVDA announces the presence of the video and any associated controls. If a text alternative is provided inline or via a link, navigate to it and confirm NVDA reads the full content without omissions. Note: NVDA cannot read the visual content of the video frames, which underscores why the human comparison in step 3 is essential.
7. Screen reader verification (VoiceOver + Safari on macOS): Activate VoiceOver and navigate to the video. Use VoiceOver's rotor to find the video element and any associated track or link elements. Confirm that the description track, if present, is accessible through Safari's media controls.
8. Third-party players: For YouTube embeds, check whether the video has an audio description version (often a separate video linked in the description) or whether an associated transcript is available and linked on the embedding page. For Vimeo, check the video's accessibility settings. Third-party players do not automatically satisfy 1.2.3; the page author is responsible for ensuring an alternative is provided or linked.
How to Fix
Scenario 1: HTML5 video with no audio description — Incorrect
<!-- A product demo video with meaningful visual content but no audio description or text alternative -->
<video controls width='800'>
  <source src='product-demo.mp4' type='video/mp4'>
  <track kind='captions' src='captions-en.vtt' srclang='en' label='English' default>
</video>
Scenario 1: HTML5 video with audio description track — Correct
<!-- Audio description track added using kind='descriptions'.
     The VTT file contains timed narrations of visual events
     that are not conveyed through the main audio. -->
<video controls width='800'>
  <source src='product-demo.mp4' type='video/mp4'>
  <track kind='captions' src='captions-en.vtt' srclang='en' label='English' default>
  <track kind='descriptions' src='descriptions-en.vtt' srclang='en' label='Audio Descriptions'>
</video>
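Because native browser support for kind='descriptions' text tracks is limited, some teams also publish a separately mixed, audio-described version of the video and link to it next to the original, an approach that W3C's techniques for 1.2.3 also document. The following is a minimal sketch of that arrangement, not the only conforming one. The page name product-demo-described.html, the heading, and the link text are hypothetical.

```html
<!-- Sketch only: the described-version page and the link text are assumptions. -->
<section>
  <h2>Product Demo</h2>
  <video controls width='800'>
    <source src='product-demo.mp4' type='video/mp4'>
    <track kind='captions' src='captions-en.vtt' srclang='en' label='English' default>
  </video>
  <!-- Link placed directly after the player so it follows the video in reading order -->
  <p>
    <a href='product-demo-described.html'>
      Watch the audio-described version of this video
    </a>
  </p>
</section>
```

The described version is an ordinary video whose audio track already includes the narration of visual events, so it does not depend on the player's handling of text tracks.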
Scenario 2: HTML5 video with no text alternative — Incorrect
<!-- Tutorial video with on-screen steps and diagrams; no transcript provided -->
<section>
  <h2>How to Configure Your Router</h2>
  <video controls width='800'>
    <source src='router-setup.mp4' type='video/mp4'>
    <track kind='captions' src='captions-tr.vtt' srclang='tr' label='Turkish' default>
  </video>
</section>
Scenario 2: HTML5 video with a full media alternative — Correct
<!-- Full media alternative linked immediately after the video.
     The linked page contains both transcript text (all dialogue and narration)
     and descriptions of all visual steps shown in the video. -->
<section>
  <h2>How to Configure Your Router</h2>
  <video controls width='800'>
    <source src='router-setup.mp4' type='video/mp4'>
    <track kind='captions' src='captions-tr.vtt' srclang='tr' label='Turkish' default>
  </video>
  <p>
    <a href='router-setup-full-transcript.html'>
      Full text alternative for this video (includes all dialogue and visual descriptions)
    </a>
  </p>
</section>
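The linked page router-setup-full-transcript.html is not shown above. As a rough sketch of what such a page might contain, the example below interleaves spoken content, visual descriptions, and relevant sounds in the order they occur. The scene content, the "Visual / Presenter / Sound" labels, and the back link target router-setup.html are illustrative assumptions, not a required format.

```html
<!DOCTYPE html>
<html lang='en'>
<head>
  <meta charset='utf-8'>
  <title>Full text alternative: How to Configure Your Router</title>
</head>
<body>
  <h1>Full text alternative: How to Configure Your Router</h1>
  <!-- Hypothetical link back to the page that hosts the video -->
  <p><a href='router-setup.html'>Back to the video</a></p>

  <h2>Scene 1: Connecting power</h2>
  <!-- Visual content that is never spoken, including on-screen text -->
  <p><strong>Visual:</strong> A presenter stands at a desk holding a router.
     On-screen text reads "Step 1: Connect the power cable".</p>
  <!-- Spoken content, attributed to the speaker -->
  <p><strong>Presenter:</strong> "Let's start by powering on the router."</p>
  <p><strong>Visual:</strong> The presenter plugs the power cable into the round
     port at the far left of the back panel. A light on the front turns green.</p>
  <!-- Sounds that carry meaning -->
  <p><strong>Sound:</strong> A short beep confirms the router is on.</p>

  <!-- Continue scene by scene until every spoken line and visual step is covered -->
</body>
</html>
```

Keeping visual descriptions and dialogue in chronological order lets a reader follow the same sequence of steps that a sighted viewer sees.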
Scenario 3: YouTube embed with no supplemental alternative — Incorrect
<!-- Embedded YouTube video; the video on YouTube has no audio description
     and no transcript is linked on this page -->
<iframe width='560' height='315'
        src='https://www.youtube.com/embed/XXXXXXXXXXX'
        title='Annual Report Highlights 2024'
        allowfullscreen>
</iframe>
Scenario 3: YouTube embed with linked text alternative — Correct
<!-- Embedding page provides a link to a full text alternative.
     The linked document describes all visual content in the video
     (slides, charts, on-screen data) in addition to the spoken content. -->
<figure>
  <iframe width='560' height='315'
          src='https://www.youtube.com/embed/XXXXXXXXXXX'
          title='Annual Report Highlights 2024'
          allowfullscreen>
  </iframe>
  <figcaption>
    <a href='annual-report-2024-full-transcript.html'>
      Read the full text alternative for Annual Report Highlights 2024
    </a>
  </figcaption>
</figure>
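Where a separate page is not practical, the text alternative can instead sit inline, directly after the player. The sketch below is one possible variant that uses a details element to keep the text collapsed until requested. The bracketed descriptions are placeholders for the real content of the video, not actual report data.

```html
<!-- Sketch only: bracketed text marks placeholders to be replaced with real content. -->
<figure>
  <iframe width='560' height='315'
          src='https://www.youtube.com/embed/XXXXXXXXXXX'
          title='Annual Report Highlights 2024'
          allowfullscreen>
  </iframe>
  <figcaption>Annual Report Highlights 2024</figcaption>
</figure>
<!-- The alternative follows the player immediately in reading order -->
<details>
  <summary>Full text alternative for Annual Report Highlights 2024</summary>
  <p><strong>Visual:</strong> [Description of the opening slide, including its on-screen title]</p>
  <p><strong>Narrator:</strong> "[Spoken introduction, transcribed verbatim]"</p>
  <p><strong>Visual:</strong> [Description of the chart shown, including each labeled value]</p>
</details>
```

An inline alternative avoids an extra page load, at the cost of a longer page; either placement can work as long as the alternative is complete and easy to find from the video.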
Scenario 4: Video whose audio already describes all visual content (exception) — Correct
<!-- This video features a narrator who explicitly describes every action
     being performed on screen: 'I am now clicking the blue Settings button
     in the top-right corner and selecting Account from the dropdown menu.'
     Because the audio fully conveys all visual information, no separate
     audio description is required under 1.2.3. -->
<video controls width='800'>
  <source src='fully-described-tutorial.mp4' type='video/mp4'>
  <track kind='captions' src='captions-en.vtt' srclang='en' label='English' default>
</video>
<!-- Document the rationale in an internal accessibility conformance note -->
Common Mistakes
- Providing captions instead of an audio description: Captions transcribe the spoken audio for deaf users; they do not describe visual information for blind users. Adding a <track kind='captions'> element satisfies 1.2.2 but does not satisfy 1.2.3. These are two separate requirements addressing two different disability groups.
- Linking a transcript that only covers dialogue: A text alternative for 1.2.3 must describe all meaningful visual content (on-screen text, diagrams, physical actions, speaker identification), not just what is spoken. A script-only transcript typically fails this criterion if the video contains visual-only information.
- Placing the text alternative link far from the video: If the full media alternative is buried in a footnote or on a separate page without a clear, adjacent link, users may not find it. The link should appear immediately before or after the video element so that screen reader users encounter it in natural reading order.
- Assuming YouTube's auto-generated transcript satisfies the criterion: Auto-generated YouTube transcripts cover only the spoken audio. They do not describe visual content and are often inaccurate. They do not constitute a sufficient full media alternative under 1.2.3.
- Using a <track kind='descriptions'> element but leaving the VTT file empty or incomplete: The presence of the track element is not sufficient; the VTT file must contain accurate, timely descriptions of all meaningful visual events. An empty or skeletal VTT file does not satisfy the criterion.
- Failing to describe on-screen text overlays: Marketing videos frequently display statistics, product names, or call-to-action text as animated overlays. If these overlays are not read aloud by a narrator, they must appear in the audio description or text alternative. Authors frequently overlook them.
- Writing audio descriptions that are too vague: Descriptions such as "the presenter demonstrates the process" are insufficient. Effective descriptions name specific actions, interface elements, colors where meaningful, and spatial relationships: "The presenter clicks the red Delete button on the right side of the toolbar, then confirms by selecting OK in the dialog box."
- Not providing an alternative for autoplay or background videos that carry information: A video that plays automatically and displays important information (such as a hero section showing product features with text overlays) is still time-based media and requires an alternative if it conveys meaningful content: under 1.2.3 when it has an audio track, or under 1.2.1 when it is video-only.
- Treating decorative videos as exempt without verification: Teams sometimes label a video as "decorative" to avoid the requirement, even when it actually conveys product information or instructional content. The decorative exception applies only when the video truly adds no meaningful information beyond what is already available in adjacent text.
- Forgetting to update the audio description or text alternative when the video is updated: If the video content changes — for example, product steps are revised or pricing data is updated — the audio description and text alternative must be updated to match. Stale alternatives are a conformance failure even if the original alternatives were accurate.
Relation to Turkey's Accessibility Regulations
Turkey's Presidential Circular 2025/10, published in Official Gazette No. 32933 on June 21, 2025, establishes mandatory web accessibility standards for a broad range of public and private entities operating in Turkey. The circular references internationally recognized accessibility standards, with WCAG 2.2 Level A and Level AA serving as the technical baseline for conformance. Because WCAG 1.2.3 is a Level A requirement, it is among the most fundamental obligations under the circular — there is no lower level of conformance that permits organizations to ignore it.
The circular covers a wide range of entity types. Public institutions and government bodies — including ministries, municipalities, state universities, and other public agencies — must achieve conformance within one year of the circular's publication date. Private sector entities covered by the circular include e-commerce platforms, banks and financial institutions, hospitals and private healthcare providers, telecommunications companies with 200,000 or more subscribers, licensed travel agencies, private transportation companies, and private schools authorized by the Ministry of National Education (MoNE). These private sector organizations have two years from the publication date to achieve conformance.
For any covered entity that publishes video content — which today includes virtually every major Turkish institution and business — WCAG 1.2.3 creates a concrete, enforceable obligation. A bank that publishes video tutorials explaining how to use its mobile app, a public hospital that posts video guides for patient registration, a telecom provider that shares promotional videos with on-screen plan comparisons, or an e-commerce site that includes product demonstration videos must all ensure that each prerecorded synchronized media asset is accompanied by an audio description or full text alternative.
Non-compliance with the circular's requirements can result in regulatory scrutiny and reputational harm and, as Turkish digital accessibility enforcement matures, legal exposure for covered entities. Organizations should treat WCAG 1.2.3 not as an optional enhancement but as a baseline legal obligation. Practically, this means conducting an inventory of all video content, assessing which videos contain visual-only information, and systematically producing audio descriptions or full text alternatives for any that do. New video production workflows should include accessibility deliverables (description scripts and text alternatives) as standard outputs alongside captions and subtitles.
Sources and References
- W3C Understanding 1.2.3 Audio Description or Media Alternative (Prerecorded)
- W3C Techniques for 1.2.3
- WebAIM: Captions, Transcripts, and Audio Descriptions
- MDN: HTMLTrackElement kind attribute
- MDN: The HTML track element
- W3C Technique G78: Providing a second, user-selectable, audio track
- W3C Technique G69: Providing an alternative for time-based media
