WCAG Success Criteria · Level A
WCAG 1.2.2: Captions (Prerecorded)
WCAG 1.2.2 requires that all prerecorded audio content in synchronized media (video with audio) includes accurate captions. This ensures deaf and hard-of-hearing users can access spoken dialogue, sound effects, and other meaningful audio information.
What This Rule Means
WCAG 1.2.2, Captions (Prerecorded), requires that captions are provided for all prerecorded audio content in synchronized media. WCAG defines synchronized media as audio or video synchronized with another format for presenting information and/or with time-based interactive components. In practice, this means any video file that also contains audio, such as recorded lectures, product demonstrations, news clips, testimonials, or marketing videos.
Captions are a text alternative that is synchronized with the video timeline. They must convey not only the spoken dialogue but also any other meaningful audio — including speaker identification, relevant sound effects (such as [applause], [door slams], or [music plays]), and tone or manner of speech where it affects comprehension. This distinguishes captions from subtitles, which typically present only spoken words and are intended for audiences who can hear the audio but do not understand the spoken language.
A pass under this criterion requires all of the following: captions exist for all prerecorded synchronized media, they are accurately synchronized with the corresponding audio, they describe all meaningful audio (dialogue, speaker identity, relevant sound effects), and they are available in the same location as the media itself — not merely linked from a separate page.
A fail occurs when: no captions are provided at all, captions exist but are inaccurate or incomplete (e.g., auto-generated without correction), captions omit meaningful non-speech audio, captions are present but not properly synchronized, or captions are provided as a separate transcript only (a transcript alone does not satisfy this criterion).
The one official exception defined in WCAG is media that is itself a media alternative for text. For example, if a web page contains a written article and a video that simply re-presents the same information as that article, and the video is clearly labeled as such, then captions for the video are not required. This exception is narrow and intentional — it should not be used as a loophole to avoid captioning videos that contain substantive content not otherwise available in text form.
Captions may be open (burned directly into the video and always visible) or closed (delivered as a separate track that users can toggle on or off). Both are acceptable under WCAG 1.2.2, though closed captions are generally preferable because they allow users to customize appearance and can be turned off by those who do not need them. In HTML, closed captions are typically implemented using the <track> element with kind='captions' inside a <video> element, pointing to a WebVTT caption file. Browsers do not natively support SRT files in <track>, so SRT captions must be converted to WebVTT or rendered by a JavaScript-based player.
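Because browsers read only WebVTT through the <track> element, SRT files delivered by a captioning vendor or exported from editing software usually need converting before they can be linked. The following Python function is a minimal sketch of that conversion, assuming well-formed SRT input: it adds the required WEBVTT signature line and swaps the comma before the milliseconds for a period in each timing line. The function name is illustrative, and SRT positioning or styling extensions are not handled.

```python
import re

# An SRT timing line, e.g. "00:00:01,000 --> 00:00:04,200"
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})\s*$"
)


def srt_to_webvtt(srt_text: str) -> str:
    """Convert well-formed SRT caption text to WebVTT (minimal sketch)."""
    # Normalize line endings and drop a UTF-8 byte-order mark if present.
    text = srt_text.replace("\r\n", "\n").replace("\r", "\n").lstrip("\ufeff")
    # Every WebVTT file must start with the "WEBVTT" signature and a blank line.
    lines = ["WEBVTT", ""]
    for line in text.split("\n"):
        match = SRT_TIMING.match(line)
        if match:
            start, start_ms, end, end_ms = match.groups()
            # WebVTT separates seconds from milliseconds with "." instead of ",".
            line = f"{start}.{start_ms} --> {end}.{end_ms}"
        lines.append(line)
    return "\n".join(lines).rstrip("\n") + "\n"


sample_srt = """1
00:00:01,000 --> 00:00:04,200
[Speaker: Dr. Aydin] Welcome to the webinar.

2
00:00:04,500 --> 00:00:06,000
[applause]
"""
print(srt_to_webvtt(sample_srt))
```

SRT's numeric cue counters are kept as they are, which WebVTT accepts as cue identifiers. For production files, a captioning vendor's native WebVTT export or an established conversion tool is likely the safer choice.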
Why It Matters
Approximately 466 million people worldwide live with disabling hearing loss, according to the World Health Organization — a figure projected to rise to over 900 million by 2050. Beyond those with profound deafness, a much larger population experiences situational hearing difficulties: people in noisy environments such as public transport, open-plan offices, or crowded cafés who cannot hear audio even with functioning hearing, users who have temporarily lost access to audio hardware, and people watching content in quiet settings where audio must be muted.
For users who are deaf or hard of hearing, captions are not a convenience — they are the only way to access the spoken content of a video. Without captions, a deaf user visiting an e-commerce site cannot understand a product demonstration video, a deaf student cannot follow a recorded lecture, and a deaf patient cannot absorb the information in a hospital's instructional video. The information is entirely inaccessible to them, regardless of how clearly the video is filmed or how high the production quality.
Captions also benefit people with cognitive and learning disabilities, such as those with attention-deficit/hyperactivity disorder or dyslexia, who may find it easier to process information when it is presented simultaneously in audio and text form. Non-native speakers benefit because captions reinforce comprehension of accented or fast-paced speech. Elderly users with age-related hearing decline — a condition affecting roughly one-third of people over 65 — also rely on captions.
Consider a concrete scenario: a Turkish bank publishes a video on its website explaining how to open a digital account. A potential customer who is deaf visits the site. Without captions, she cannot understand any of the spoken steps, cannot follow the process, and is unable to complete the task — creating both an accessibility failure and a lost business opportunity for the bank. With accurate, synchronized captions, she follows every step, completes the onboarding, and becomes a customer.
Beyond disability access, captions can also bring SEO and usability benefits. Caption files in WebVTT or SRT format are machine-readable text, and when that text is published on the page or uploaded to a video platform, search engines can index it, which may improve the discoverability of video content. Industry surveys have reported that many viewers watch social media video with the sound off, which makes captions a usability feature for the mainstream audience as well. Caption text may also help search engines interpret what an embedded video is about.
Related Axe-core Rules
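WCAG 1.2.2 requires manual testing. axe-core's video-caption rule checks whether a <video> element contains a <track kind='captions'> element, but when no such track is found it typically reports the element as needing review rather than as a definite violation, and no automated rule can judge whether captions are accurate, complete, or synchronized. The reasons for this are fundamental to how automated testing works: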
- Absence of a caption track (manual testing required): An automated tool can detect that a <video> element lacks a <track kind='captions'> element, and some tools will flag this as a potential issue. However, the tool cannot determine whether the video actually contains meaningful audio content (it might be a silent video), whether open (burned-in) captions are present in the video itself, or whether the video qualifies for the media-alternative exception. Human review is required to make that determination.
- Caption accuracy and completeness (manual testing required): Even when a caption track file is present and linked, an automated tool cannot assess whether the captions accurately reflect what is spoken, whether they include relevant sound effects and speaker identification, or whether they are properly synchronized with the audio. A caption file containing entirely wrong text, auto-generated gibberish, or captions consistently 10 seconds out of sync would pass automated detection but fail the criterion. Only a human reviewer who watches the video while reading the captions can evaluate accuracy.
- Open captions (manual testing required): If captions are burned into the video file itself (open captions), automated tools have no way to detect this at all. They see only a <video> element without a track, and cannot analyze the visual content of video frames to determine whether text is present.
Because of these limitations, WCAG 1.2.2 must always include a human review step as part of any comprehensive audit. Automated scans serve as a useful first pass to identify obviously missing track elements, but they cannot substitute for manual evaluation of caption quality, accuracy, and synchronization.
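To make the division of labor concrete, the sketch below approximates the structural part of the check using Python's standard html.parser module. It is not axe-core's implementation, and the class and function names are hypothetical. It lists each <video> element that has no <track kind='captions'> child so that a human can review it; like automated tools generally, it cannot see inside iframe players, detect burned-in captions, or judge caption quality.

```python
from html.parser import HTMLParser


class VideoTrackScanner(HTMLParser):
    """Records each <video> element's sources and how many captions tracks it has."""

    def __init__(self):
        super().__init__()
        self.videos = []          # one dict per completed <video> element
        self._open_video = None   # the <video> currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # boolean attributes such as "default" map to None
        if tag == "video":
            self._open_video = {"sources": [], "caption_tracks": 0}
            if attributes.get("src"):
                self._open_video["sources"].append(attributes["src"])
        elif self._open_video is not None:
            if tag == "source" and attributes.get("src"):
                self._open_video["sources"].append(attributes["src"])
            # A <track> with no kind attribute defaults to subtitles, not captions.
            elif tag == "track" and (attributes.get("kind") or "").lower() == "captions":
                self._open_video["caption_tracks"] += 1

    def handle_endtag(self, tag):
        if tag == "video" and self._open_video is not None:
            self.videos.append(self._open_video)
            self._open_video = None


def videos_for_review(html):
    """Return the <video> elements that have no <track kind='captions'>."""
    scanner = VideoTrackScanner()
    scanner.feed(html)
    scanner.close()
    return [video for video in scanner.videos if video["caption_tracks"] == 0]


page = """
<video controls width='800'>
  <source src='product-demo.mp4' type='video/mp4'>
  Your browser does not support the video element.
</video>
"""
for video in videos_for_review(page):
    print("Needs manual review:", video["sources"])
```

A track with kind='subtitles' is deliberately not counted, so a video carrying only subtitles is still listed for review. Anything this sketch lists is a candidate for human judgment, not an automatic failure.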
How to Test
- Inventory all synchronized media: Before running any tool, manually review the page to identify every video element, including embedded <video> tags, iframe-embedded players (YouTube, Vimeo, third-party), and any HTML5 media players. List each video and note whether it contains meaningful audio (dialogue, narration, significant sound effects). Videos with no audio track or with only background music that carries no informational content may be treated differently, but document your reasoning.
- Run an automated scan with axe DevTools or Lighthouse: Open axe DevTools in the browser's developer tools and run a full-page scan. Look for any violations or items marked as needing review related to video or audio elements. Lighthouse (run via Chrome DevTools under the Accessibility audit category) may similarly flag videos missing a <track> element. Note that a clean automated result does not mean the criterion is satisfied; it only means no obviously missing track was detected. Treat automated results as a starting point, not a conclusion.
- Inspect the HTML source: For each <video> element on the page, inspect the DOM to verify whether a <track kind='captions'> element is present and whether its src attribute points to a valid caption file. Browsers load only WebVTT through <track>, so an SRT file referenced here will not display. Check that the srclang attribute is set to the language of the captions. Confirm the file loads without a network error by opening the URL directly in the browser.
- Play the video and evaluate captions manually: Enable captions in the video player (or confirm they appear automatically if open captions are used). Watch the video in its entirety or, for long videos, a representative sample covering the beginning, middle, and end. For each segment, verify: (a) dialogue is accurately transcribed with no significant errors, (b) speaker changes are identified where it aids comprehension, (c) meaningful non-speech audio is described in brackets (e.g., [phone ringing]), and (d) captions appear and disappear in sync with the audio, not significantly ahead of or behind the spoken words.
- Test with a screen reader where relevant: Using NVDA with Firefox, VoiceOver with Safari, or JAWS with Chrome, navigate to the video element. Confirm the player controls are keyboard accessible and that toggling captions on and off can be accomplished without a mouse. This tests the usability of the caption feature in addition to its existence.
- Test third-party embedded players: For iframes embedding YouTube or Vimeo players, open the video in the platform directly and verify that captions have been uploaded and are enabled by default or easily toggleable. Auto-generated captions on platforms like YouTube do not satisfy WCAG 1.2.2 unless they have been reviewed and corrected for accuracy.
- Document pass/fail with evidence: For each video tested, record the page URL, video title or description, whether captions were present, and your accuracy assessment. Screenshots or timestamped notes from the review serve as audit evidence.
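The step of confirming that each caption file loads can be scripted when a site has many videos. The sketch below assumes the caption URLs have already been collected (for example, from the src attributes found while inspecting the HTML source). It uses only Python's standard library to fetch each file and check that the response starts with the WEBVTT signature, so both a 404 and an HTML error page served with a 200 status are caught. The function names are illustrative.

```python
import urllib.error
import urllib.request


def is_webvtt(body: bytes) -> bool:
    """Return True if body begins with the WebVTT signature."""
    # A UTF-8 byte-order mark may precede the signature.
    text = body.decode("utf-8", errors="replace").lstrip("\ufeff")
    # "WEBVTT" must be followed by a line break, space, tab, or end of file.
    return text == "WEBVTT" or text.startswith(
        ("WEBVTT\n", "WEBVTT\r", "WEBVTT ", "WEBVTT\t")
    )


def check_caption_url(url: str, timeout: float = 10.0) -> str:
    """Fetch a caption file and report whether it looks usable (sketch)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as err:  # e.g. 404 Not Found
        return f"FAIL: HTTP {err.code}"
    except urllib.error.URLError as err:  # DNS failure, refused connection, etc.
        return f"FAIL: {err.reason}"
    except OSError as err:  # e.g. a timeout while reading the response
        return f"FAIL: {err}"
    if not is_webvtt(body):
        return "FAIL: response is not a WebVTT file"
    return "OK"
```

A file that passes here can still fail inside the page. If it is served from a different origin, browsers fetch <track> files with CORS, so the <video> element generally needs a crossorigin attribute and the caption server must send matching CORS headers. Confirming that captions actually appear in the player remains part of the manual review.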
How to Fix
HTML5 video without any caption track — Incorrect
<!-- Fails 1.2.2: video with audio has no caption track at all -->
<video controls width='800'>
<source src='product-demo.mp4' type='video/mp4'>
Your browser does not support the video element.
</video>
HTML5 video without any caption track — Correct
<!-- Passes 1.2.2: a WebVTT caption track is linked with kind='captions' -->
<video controls width='800'>
<source src='product-demo.mp4' type='video/mp4'>
<!-- The track element links the WebVTT file; srclang and label aid player UI -->
<track
kind='captions'
src='product-demo-captions-en.vtt'
srclang='en'
label='English Captions'
default
>
Your browser does not support the video element.
</video>
Multilingual video with only subtitle track, no captions — Incorrect
<!-- Fails 1.2.2: kind='subtitles' provides translated dialogue but omits -->
<!-- sound effect descriptions and speaker identification needed for deaf users -->
<video controls width='800'>
<source src='webinar.mp4' type='video/mp4'>
<track kind='subtitles' src='webinar-tr.vtt' srclang='tr' label='Turkish'>
</video>
Multilingual video with only subtitle track, no captions — Correct
<!-- Passes 1.2.2: a dedicated captions track is provided alongside subtitles. -->
<!-- The captions file includes [Speaker: Dr. Aydin], [applause], etc. -->
<video controls width='800'>
<source src='webinar.mp4' type='video/mp4'>
<track
kind='captions'
src='webinar-captions-tr.vtt'
srclang='tr'
label='Turkish Captions'
default
>
<track kind='subtitles' src='webinar-en.vtt' srclang='en' label='English'>
</video>
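Whichever tool produces the captions track, its cues must carry speaker and sound information, not only dialogue. As an illustration, the Python sketch below writes a WebVTT file from a list of cues and rejects timings or text that WebVTT cannot represent. The Cue class and function names are hypothetical, and the bracket labels follow this article's convention; WebVTT also offers a <v> voice tag for marking speakers.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # dialogue plus bracketed speaker and sound information


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_webvtt(cues: list) -> str:
    """Serialize cues into WebVTT text, rejecting cues WebVTT cannot represent."""
    blocks = ["WEBVTT"]
    for cue in cues:
        if cue.end <= cue.start:
            raise ValueError(f"cue must end after it starts: {cue}")
        if "-->" in cue.text or "\n\n" in cue.text:
            raise ValueError(f"cue text may not contain '-->' or blank lines: {cue}")
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    # Cues are separated from the header and from each other by a blank line.
    return "\n\n".join(blocks) + "\n"


captions = [
    Cue(0.0, 3.2, "[Speaker: Dr. Aydin] Welcome to today's webinar."),
    Cue(3.2, 5.0, "[applause]"),
]
print(build_webvtt(captions))
```

Generating the file is the easy part; the cue text itself still has to be written or corrected by a person who has listened to the audio.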
YouTube embed relying on uncorrected auto-generated captions — Incorrect
<!-- Fails 1.2.2: uncorrected auto-captions are not considered accurate captions -->
<iframe
width='800'
height='450'
src='https://www.youtube.com/embed/VIDEOID'
title='Company Introduction Video'
allowfullscreen
></iframe>
YouTube embed with verified human-edited captions — Correct
<!-- Passes 1.2.2 provided that captions have been uploaded or reviewed -->
<!-- in YouTube Studio and confirmed accurate by a human reviewer. -->
<!-- cc_load_policy=1 shows captions by default; cc_lang_pref=tr sets the default caption language. -->
<iframe
width='800'
height='450'
src='https://www.youtube.com/embed/VIDEOID?cc_load_policy=1&cc_lang_pref=tr'
title='Company Introduction Video'
allowfullscreen
></iframe>
<!-- Also ensure in YouTube Studio that a human-edited caption track is published -->
<!-- for the video, not only the automatic (auto-generated) track. -->
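When embed markup is generated by a template or CMS, building the query string programmatically avoids mistakes such as joining the first parameter with & instead of ?. A minimal Python sketch, assuming the two YouTube player parameters used in the example above; the function name is hypothetical. These parameters only control how existing captions are displayed, so they do nothing to make auto-generated captions accurate.

```python
from urllib.parse import quote, urlencode


def youtube_embed_src(video_id: str, caption_lang: str) -> str:
    """Build a YouTube embed URL that shows captions by default (sketch)."""
    params = {
        "cc_load_policy": "1",         # show captions by default
        "cc_lang_pref": caption_lang,  # default caption language, e.g. "tr"
    }
    # urlencode joins parameters with "&"; the first one follows "?".
    return f"https://www.youtube.com/embed/{quote(video_id, safe='')}?{urlencode(params)}"


print(youtube_embed_src("VIDEOID", "tr"))
# prints https://www.youtube.com/embed/VIDEOID?cc_load_policy=1&cc_lang_pref=tr
```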
Common Mistakes
- Providing a text transcript instead of synchronized captions: A transcript that appears below or beside a video satisfies WCAG 1.2.3 (Audio Description or Media Alternative) in some contexts, but it does not satisfy 1.2.2. Captions must be synchronized with the timeline of the video so that text appears at the moment the corresponding audio occurs. A static block of text does not meet this requirement.
- Using kind='subtitles' instead of kind='captions': Subtitles are designed for viewers who can hear the audio but do not understand the language; they typically include only spoken dialogue translated into another language. Captions are designed for viewers who cannot hear the audio at all; they must include speaker identification, meaningful sound effects, and other non-speech information. Substituting a subtitle track where a caption track is required is a common and consequential error.
- Relying on uncorrected auto-generated captions from YouTube or similar platforms: Auto-generated captions use speech recognition and frequently produce errors, especially for technical terminology, proper nouns, accents, or fast speech. WCAG requires captions to be accurate. Auto-generated captions that have not been reviewed and corrected by a human do not meet the accuracy standard of 1.2.2.
- Including a <track> element but pointing to a broken or missing VTT file: If the src attribute of the track element references a file path that returns a 404 error, the captions will silently fail to load. The HTML passes automated inspection, but the user receives no captions. Always verify the caption file URL resolves correctly in production.
- Omitting meaningful non-speech audio from caption files: A caption file that transcribes only spoken words but ignores important sounds, such as an alarm, a phone ringing, applause, or a crash that is central to understanding the video, fails the criterion. Captions must describe all audio information necessary to understand the content, not just the dialogue.
- Captions that are significantly out of sync with the audio: A WebVTT file with incorrect timestamp entries may display captions seconds before or after the corresponding speech. This disrupts comprehension and constitutes a failure, even if the caption text itself is accurate. Always review synchronization across the full duration of the video, particularly around scene cuts and pauses.
- Assuming the media-alternative exception applies broadly: Some teams assume that because they have a written article accompanying a video, the video qualifies as a media alternative and does not need captions. The exception applies only when the video adds no information beyond what the text already contains, the relationship is explicit to users, and the video is clearly labeled as an alternative. A video that adds demonstrations, speaker tone, or visual information not covered in the text does not qualify.
- Making caption controls accessible only by mouse: Even if captions exist, if the button to enable them in a custom video player is not keyboard accessible (for example, a styled <div> with an onclick handler but no tabindex='0' or keyboard event listener), users who rely on keyboard navigation cannot turn captions on. The caption feature itself must be operable by keyboard; a native <button> element provides this by default.
- Not testing captions in embedded or third-party players: Teams often test caption behavior in their development environment but forget that production embeds through third-party players (Vimeo, Wistia, JW Player) may have different caption loading behavior, default states, or API configurations. Always test the caption experience in the actual production embed context.
- Providing captions in only one language when the site serves multilingual audiences: While WCAG does not strictly require captions in every language a site supports, providing captions only in one language when the site and video content are available in Turkish and English, for example, means some users receive accessible content while others do not. Best practice is to provide caption tracks matching each language version of the video.
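Some of these mistakes can be partly screened before the manual review. The sketch below parses the cue timings in a WebVTT file and flags two structural errors that a valid WebVTT file must not contain: a cue whose end time is not after its start time, and a cue that starts earlier than the cue before it. The function names are illustrative. A file that passes can still be badly out of sync with the speech, so this complements, rather than replaces, watching the video with captions on.

```python
import re

# One WebVTT timestamp: optional hours, then MM:SS.mmm.
TIMESTAMP = r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
TIMING_LINE = re.compile(rf"^{TIMESTAMP}\s+-->\s+{TIMESTAMP}")


def to_seconds(hours, minutes, seconds, millis) -> float:
    """Convert captured timestamp parts to seconds (hours may be None)."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def timing_problems(vtt_text: str) -> list:
    """Return descriptions of structurally invalid cue timings."""
    problems = []
    previous_start = None
    for line_number, line in enumerate(vtt_text.splitlines(), start=1):
        match = TIMING_LINE.match(line)
        if not match:
            continue  # header, cue identifier, cue text, or blank line
        parts = match.groups()
        start, end = to_seconds(*parts[:4]), to_seconds(*parts[4:])
        if end <= start:
            problems.append(f"line {line_number}: cue ends at or before its start")
        if previous_start is not None and start < previous_start:
            problems.append(f"line {line_number}: cue starts before the previous cue")
        previous_start = start
    return problems


sample = """WEBVTT

00:00:05.000 --> 00:00:07.000
Hello

00:00:03.000 --> 00:00:02.000
[door slams]
"""
for problem in timing_problems(sample):
    print(problem)
```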
Relation to Turkey's Accessibility Regulations
Turkey's Presidential Circular 2025/10, published in the Official Gazette numbered 32933 on June 21, 2025, establishes binding web accessibility obligations for a broad range of public and private entities operating in Turkey. The circular mandates conformance with WCAG 2.2 at Level A as a minimum baseline, with Level AA conformance strongly recommended. WCAG 1.2.2 — Captions (Prerecorded) is a Level A requirement, meaning it is among the most fundamental obligations under the circular and non-conformance constitutes a direct regulatory violation.
The entities covered by the circular include public institutions and government bodies at all levels, e-commerce platforms, banks and financial institutions, hospitals and healthcare providers, telecommunications operators with 200,000 or more subscribers, licensed travel agencies, private transportation companies, and private schools authorized by the Ministry of National Education (MoNE). Public institutions are required to achieve compliance within one year of the circular's publication date. Private sector entities within the covered categories have a two-year compliance window.
For organizations in these categories, the practical implication of WCAG 1.2.2 is clear: any video content published on a website or digital platform that includes meaningful audio must be captioned accurately. This is particularly significant for sectors such as banking, where product explanation and onboarding videos are common; healthcare, where patient education videos are frequently published online; and e-commerce, where product demonstration videos are central to the shopping experience. A bank publishing an uncaptioned video tutorial on how to use mobile banking, or a hospital posting an uncaptioned video about post-operative care instructions, would be in direct violation of the circular's Level A requirements.
The circular does not establish a separate Turkish standard — it directly references internationally recognized WCAG 2.2 criteria, which means Turkish compliance teams should follow the WCAG 1.2.2 specification as defined by the W3C and described in this article. Organizations should document their captioning practices, maintain records of caption file versions alongside video content, and include caption accuracy as a standard step in their video content publication workflow. Accessibility audits conducted for regulatory compliance purposes must include manual review of caption accuracy, as automated tools alone are insufficient to demonstrate conformance with this criterion.
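One lightweight way to keep such records is to store a cryptographic fingerprint of each caption file version together with who reviewed it and when, so an auditor can later confirm that the captions currently published are the ones that were checked. The sketch below illustrates this with Python's hashlib; the record fields, function names, and sample values are hypothetical, not a format required by the circular or by WCAG.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CaptionReviewRecord:
    video_id: str        # hypothetical internal identifier for the video
    caption_url: str     # where the reviewed caption file is published
    caption_sha256: str  # fingerprint of the exact caption version reviewed
    reviewed_by: str     # person who checked accuracy and synchronization
    reviewed_on: date


def record_review(video_id, caption_url, caption_bytes, reviewed_by, reviewed_on):
    """Create a review record for one caption file version."""
    digest = hashlib.sha256(caption_bytes).hexdigest()
    return CaptionReviewRecord(video_id, caption_url, digest, reviewed_by, reviewed_on)


def still_matches(record: CaptionReviewRecord, published_bytes: bytes) -> bool:
    """True if the published caption file is byte-identical to the reviewed one."""
    return hashlib.sha256(published_bytes).hexdigest() == record.caption_sha256


record = record_review(
    "onboarding-video",
    "https://example.com/captions/onboarding-tr.vtt",
    b"WEBVTT\n",
    "A. Reviewer",
    date(2025, 1, 15),
)
print(still_matches(record, b"WEBVTT\n"))
```

When a check like still_matches fails, the caption file has changed since its last review, which is a signal to repeat the accuracy and synchronization check before the change is treated as compliant.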
