Video is the web's dominant content format — but without captions, transcripts, and audio descriptions, it excludes millions of users and exposes your organization to serious legal risk. This guide breaks down exactly what WCAG requires, how each accessibility layer works, and the practical steps to implement them across your site.

Here is a number that should stop any website owner cold: 4,187 digital accessibility lawsuits were filed in the United States in 2024 alone, and 2025 is tracking 37% higher. Video content sits at the center of many of these cases, because video without proper accessibility features is one of the most conspicuous compliance failures an auditor — or a plaintiff's attorney — can find. Yet the problem goes far beyond legal exposure. With more than 48 million Americans experiencing some degree of hearing loss and millions more who are blind or have low vision, inaccessible video simply means your message never reaches a significant share of your audience. The good news is that making video accessible is entirely achievable, and the techniques involved — captions, transcripts, and audio descriptions — also deliver measurable business benefits that have nothing to do with compliance.

Why Video Accessibility Is No Longer Optional

The legal landscape around video accessibility has sharpened dramatically in recent years. On April 8, 2024, the U.S. Department of Justice (DOJ) issued a final rule on web and mobile app accessibility under Title II of the ADA, adopting WCAG 2.1 Level AA as the technical standard for state and local government websites and apps. For video, that means these entities must provide captions, audio descriptions, and accessible, keyboard-operable video players. For public entities serving populations of 50,000 or more, the compliance deadline is April 24, 2026. Smaller public entities have until April 26, 2027.

Private businesses fall under ADA Title III, which carries no single federal deadline but is the subject of vigorous, ongoing litigation. Courts increasingly reference WCAG 2.1 Level AA as the standard, making proactive compliance the safer path. Together, the Title II rule and this case law reinforce that captions and audio descriptions are essential "auxiliary aids" under the ADA.

Beyond legal risk, there is a powerful audience argument. According to a survey conducted by Verizon Media and Publicis Media, 80% of those who use captions don't have a hearing impairment — and 50% believe captions are important since they often watch videos with no audio. Video viewing is increasingly done in public, with 69% of those surveyed saying they watched video with the sound off in public areas. In other words, accessibility features are mainstream viewer preferences, not niche accommodations.

71% of people with disabilities leave a website immediately if it is not accessible. Every inaccessible video on your site is a door you are actively closing to a substantial segment of your audience — and search engines are not immune to the problem either, as we will see later.

The WCAG Framework: What the Guidelines Actually Require

The Web Content Accessibility Guidelines (WCAG) are organized into four core principles — Perceivable, Operable, Understandable, and Robust — and three conformance levels: A, AA, and AAA. Level AA is the target for legal compliance in most jurisdictions, and it covers the full spectrum of video accessibility requirements. Understanding which success criteria apply to which type of content is essential before you can prioritize your remediation work.

For time-based media, which covers audio-only content, video-only content, and synchronized media (video with both audio and visual content), the key Level A and AA success criteria are:

SC 1.2.1 — Audio-only and Video-only (Prerecorded): For prerecorded audio-only and prerecorded video-only media, an alternative for time-based media must be provided that presents equivalent information for the content.
SC 1.2.2 — Captions (Prerecorded): Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such.
SC 1.2.3 — Audio Description or Media Alternative (Prerecorded): An alternative for time-based media or audio description of the prerecorded video content is provided for synchronized media, except when the media is a media alternative for text and is clearly labeled as such. This is a Level A requirement.
SC 1.2.4 — Captions (Live): Captions are provided for all live audio content in synchronized media. This is a Level AA requirement.
SC 1.2.5 — Audio Description (Prerecorded): Audio description is provided for all prerecorded video content in synchronized media. This is the stricter Level AA version of SC 1.2.3.

It is worth noting that WCAG 2.1 and 2.2 do not introduce differences from 2.0 that apply to captioning or audio description requirements, so the fundamental obligations have been consistent across recent versions. What has changed is the legal and regulatory landscape that references these criteria.

A common misconception is that providing a transcript satisfies the captioning requirement. It does not. Transcripts alone are insufficient for video content, because the text must be synchronized with the video. A transcript and captions serve overlapping but distinct purposes.

Captions: The Foundation of Accessible Video

Captions are a synchronized, time-coded text representation of a video's audio track. Unlike subtitles, which assume the viewer can hear but does not understand the language, closed captions assume the viewer cannot hear. They make video accessible to deaf and hard of hearing users by providing a time-to-text track as a supplement to, or substitute for, the audio — and while caption text is predominantly speech, captions also include non-speech elements like speaker IDs and sound effects that are critical to understanding the content.

Quality is the variable that separates genuinely accessible captions from a checkbox exercise. The industry standard for caption accuracy is 99%. The University of Minnesota at Duluth's Media Hub reports that YouTube's automatic captions are only 60–70% accurate, depending on audio quality. That gap matters enormously: captions riddled with errors are not just unhelpful — they actively mislead deaf and hard-of-hearing viewers, misrepresenting the content they depend on. For production workflows, AI-generated captions should be treated as a first draft requiring human review, not a finished product.
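
To make the 99% target concrete, one common way to estimate caption accuracy is 1 minus the word error rate (WER), measured against a human-verified reference transcript. The sketch below is a minimal illustration rather than an industry-standard scorer: the function name and the normalization rules (lowercasing, ignoring punctuation) are assumptions, and commercial tools may weight errors differently.

```python
import re


def caption_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - WER, where WER is word-level edit distance / reference length."""
    def words(text):
        # Assumed normalization: lowercase, keep letters, digits, apostrophes
        return re.findall(r"[a-z0-9']+", text.lower())

    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")

    # Dynamic-programming edit distance computed over words, row by row
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from captions
                            curr[j - 1] + 1,      # extra word in captions
                            prev[j - 1] + cost))  # match or substitution
        prev = curr

    wer = prev[-1] / len(ref)
    return max(0.0, 1.0 - wer)
```

A single wrong word in an eight-word sentence ("safe" for "save") scores 0.875 with this function, which shows how quickly small errors pull a caption file below the 99% mark.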

High-quality captions share three characteristics described by the Described and Captioned Media Program (DCMP): they are accurate (errorless captions are the goal), consistent (uniformity in style and presentation), and clear (a complete textual representation of the audio, including speaker identification and non-speech information). On the technical side, readability also depends on caption display. WCAG SC 1.4.3 sets a minimum contrast ratio of 4.5:1 for normal-size text. WCAG does not set a minimum font size, but a common recommendation is at least 14 point, and fonts with thin strokes or unusual characteristics are discouraged because they are harder to read.
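
The 4.5:1 contrast threshold can be checked numerically. The following sketch applies the relative luminance and contrast ratio formulas from WCAG 2.x to two hex colors. The function names are illustrative, and the code assumes six-digit hex values such as "#ffffff".

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a six-digit hex color, from 0 (black) to 1 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors; the order of arguments does not matter."""
    l1, l2 = relative_luminance(foreground), relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)
```

White caption text on a black background gives the maximum ratio of 21:1. Mid-grey text such as #767676 on white sits just above the 4.5:1 line, while the slightly lighter #777777 falls just below it.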

The two dominant caption file formats for the web are WebVTT and SRT. WebVTT is the recommended format for web video — it is the native caption format for HTML5 video players, supports styling options, and is widely supported across browsers and video platforms. SRT is the other common format and works well for most platforms but has fewer styling options than VTT. Here is a minimal example of an HTML5 video element with a caption track attached:

<video controls>
  <source src='product-demo.mp4' type='video/mp4'>
  <track
    kind='captions'
    src='product-demo-en.vtt'
    srclang='en'
    label='English'
    default>
</video>

The kind='captions' attribute is important — it signals to the browser and assistive technologies that this track is intended for deaf and hard-of-hearing users rather than for language translation. Adding the default attribute causes captions to be shown automatically, which is worth considering for content-heavy pages where a user may not notice the CC button.
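
The .vtt file referenced by the track element is plain text: a WEBVTT header followed by timed cues. As a rough sketch of what such a file contains, the Python below builds one from a list of cues. The helper names and the sample cue text are invented for illustration, and a real workflow would usually export this file from a captioning tool instead.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues) -> str:
    """Build a WebVTT document from (start_seconds, end_seconds, text) tuples."""
    blocks = ["WEBVTT"]
    for start, end, text in cues:
        if end <= start:
            raise ValueError("a cue must end after it starts")
        blocks.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n{text}")
    # Cues are separated from the header and from each other by blank lines
    return "\n\n".join(blocks) + "\n"


# Hypothetical cues: a sound effect, then a line with a speaker label
example_cues = [
    (0.0, 2.5, "[upbeat music]"),
    (2.5, 6.0, "<v Narrator>Welcome to the product demo."),
]
print(build_webvtt(example_cues))
```

The bracketed sound effect and the voice tag carry the non-speech information and speaker identification that distinguish captions from plain subtitles.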

For live video such as webinars, live streams, and virtual events, WCAG 2.1 Level AA requires captions for all live audio in synchronized media (SC 1.2.4). Platforms like Zoom support live captions using automatic speech recognition, and also provide a mechanism for integrating human captioners when higher accuracy is required.

Transcripts: Broader Accessibility, Deeper Reach

A transcript is a written document that captures a video's audio content in text: a word-for-word version of the spoken dialogue, plus non-speech audio information (such as relevant sound effects) that helps the reader understand the content. A descriptive transcript goes a step further and adds the important visual information as well.

Under WCAG 2.1 Level AA, transcripts are strictly required for audio-only content such as podcasts and audio recordings. For captioned video, transcripts are not required by WCAG 2.1 Level AA — however, transcripts are recommended for all videos since they are more accessible than captions for people who are deaf-blind, and also benefit people with slow internet connections, those who want to quickly scan or search a video's content, and people who simply prefer text. Best practice is to provide them regardless of strict WCAG obligation.

When writing a descriptive transcript, aim to include:

All spoken dialogue, attributed to individual speakers
Meaningful sound effects and non-verbal audio cues (e.g., [applause], [alarm sounds])
Descriptions of on-screen text, charts, or visuals that are not explained verbally
Scene-setting information where it affects comprehension
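
The checklist above can be thought of as a merge of two time-ordered sources: the caption cues and a set of notes about meaningful visuals. The sketch below shows one possible way to interleave them into a descriptive transcript. The Entry structure, the "[Visual: ...]" convention, and the sample content are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    start: float          # seconds from the start of the video
    text: str
    speaker: str = ""     # empty for sound effects and visual notes
    visual: bool = False  # True for descriptions of on-screen content


def descriptive_transcript(captions, visuals) -> str:
    """Interleave caption entries and visual descriptions by start time."""
    lines = []
    for entry in sorted([*captions, *visuals], key=lambda e: e.start):
        if entry.visual:
            lines.append(f"[Visual: {entry.text}]")
        elif entry.speaker:
            lines.append(f"{entry.speaker}: {entry.text}")
        else:
            lines.append(entry.text)  # e.g. "[applause]"
    return "\n".join(lines)


# Hypothetical sample data
captions = [
    Entry(0.0, "[applause]"),
    Entry(2.0, "Thanks for joining.", speaker="Maria"),
]
visuals = [
    Entry(1.0, "Slide shows a bar chart of quarterly sign-ups.", visual=True),
]
print(descriptive_transcript(captions, visuals))
```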

One practical debate is whether transcripts should be verbatim or lightly edited. While some resources insist on verbatim transcripts, edited transcripts are often the better option — because you are writing for real humans, and clear and concise language improves accessibility. Removing filler words like "um" and "uh" generally improves readability without sacrificing accuracy.

Transcripts also deliver a significant SEO dividend. Search engines cannot watch your video, but they can index your captions and transcripts — and adding a text transcript to your video page gives search engines crawlable content that matches search queries. Discovery Digital Networks performed an experiment on their YouTube channel comparing videos with and without closed captions; they found that captioned videos enjoyed 7.32% more views on average, and confirmed that captions were indexed by search bots — testing this by querying a phrase that appeared nowhere except in the captions, with the video appearing fourth in YouTube search results.

Audio Descriptions: Accessibility for Blind and Low-Vision Users

Audio descriptions (AD) address a completely different accessibility barrier from captions. Where captions serve users who cannot hear, audio descriptions serve users who cannot see. Audio description is a narration of meaningful visual information in a video to provide context, clarify speakers, and articulate visual elements — think of it like alternative text for videos. Examples of relevant information include facial expressions and scenes — anything that a sighted viewer absorbs visually but that is not conveyed through dialogue or narration.

Not every video needs audio descriptions. Generally, if you close your eyes but can still follow the program — such as a talking-head interview where the speaker explains everything verbally — you probably do not need it. However, if someone refers to visuals in a presentation without describing them aloud, audio description would likely be necessary. A product demo that shows a UI being clicked without narrating the actions, a training video describing a diagram, or a marketing video heavy on scene-setting visuals — these all require audio description.

There are two types of audio description to understand:

Standard audio description: Descriptions use natural pauses in the existing soundtrack to insert narration of visual elements such as actions, settings, character appearance, body language, costumes, lighting, and on-screen text.
Extended audio description: The video momentarily pauses to allow more time for descriptions when needed. If you use extended AD, provide one version of the video with the extended descriptions and one without. Extended description is required only at WCAG Level AAA (SC 1.2.7), but it is a best practice at any level when the natural pauses are too short for the descriptions.
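
One rough way to judge whether standard description will fit is to look for gaps in the dialogue, using the caption cue timings as a proxy for speech. The sketch below lists the silent stretches that are long enough to hold a short description. The two-second default is an arbitrary assumption, and the approach ignores music or sound effects that may also need to stay audible.

```python
def dialogue_gaps(cues, video_length, min_gap=2.0):
    """Return (start, end) gaps of at least min_gap seconds with no speech.

    cues: list of (start_seconds, end_seconds) pairs for spoken captions.
    """
    gaps = []
    cursor = 0.0  # end time of the latest speech seen so far
    for start, end in sorted(cues):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # handles overlapping cues
    if video_length - cursor >= min_gap:
        gaps.append((cursor, video_length))
    return gaps
```

If the descriptions you need are longer than every gap this returns, that is a signal to consider extended audio description or to narrate more of the visuals in the main soundtrack.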

Implementing audio descriptions in a web context has practical challenges, chiefly player support: most browsers and video players do not support audio descriptions in the same way they support captions. One exception is Able Player, a fully accessible cross-browser HTML5 media player that supports audio description either as a separate described video or as a WebVTT file read aloud by modern browsers. The most reliable production technique remains recording a separate version of the video with the description audio baked into the soundtrack, and offering users a clearly labeled toggle between the standard and described versions.

WCAG 2.1 AA standards require that audio descriptions provide equivalent access to visual information, meaning they must capture the key details a sighted viewer would understand. Write descriptions in plain, objective language. Describe what is actually on screen, not your interpretation of it — for example, say "A student raises her hand," not "A student looks eager to answer."

Accessible Video Players: The Often-Overlooked Layer

Even perfect captions and audio descriptions are worthless if the video player itself cannot be operated by keyboard or assistive technology. The player is the delivery mechanism, and it must be accessible in its own right. Many users navigate the web using only a keyboard or assistive technology, so all content should be operable via a keyboard interface without a mouse.

Key player accessibility requirements include full keyboard operability (play, pause, seek, volume, caption toggle, full-screen all reachable by keyboard), visible focus indicators on controls, ARIA labels for all interactive elements, and caption controls that are easy to locate. Section 508 also requires that user controls for captions and audio descriptions be available at the same level as volume controls or play/pause buttons.

Auto-play is a common accessibility hazard that deserves special attention. Videos that start automatically can be frustrating for many users and pose serious issues for viewers with attention disorders, autism, or visual impairments who rely on screen readers — auto-playing content may interfere with screen reader output, creating confusion and hindering access. Disable auto-play by default on all video embeds, and if you must use it, ensure volume starts muted and a pause mechanism is immediately accessible.

When embedding third-party video (YouTube, Vimeo, Wistia, etc.), confirm that the platform's embed code passes keyboard focus correctly and that the iframe has a meaningful title attribute so screen reader users know what they are interacting with before they navigate into the player:

<iframe
  src='https://www.youtube-nocookie.com/embed/VIDEO_ID'
  title='Product walkthrough: Setting up your dashboard'
  allowfullscreen>
</iframe>
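
Some of the checks in this section can be partly automated. The sketch below uses Python's standard html.parser module to flag three issues in a page's markup: iframes without a title, videos that autoplay without being muted, and video elements with no captions track. The class name and warning strings are illustrative. A static check like this cannot confirm keyboard operability, focus visibility, or caption quality, so it complements manual testing and does not replace it.

```python
from html.parser import HTMLParser


class VideoAudit(HTMLParser):
    """Collect basic video accessibility warnings from an HTML string."""

    def __init__(self):
        super().__init__()
        self.warnings = []
        self._video = None  # state for the <video> element currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # boolean attributes appear with a value of None
        if tag == "iframe" and not (attrs.get("title") or "").strip():
            self.warnings.append("iframe has no title attribute")
        elif tag == "video":
            self._video = {"captions": False}
            if "autoplay" in attrs and "muted" not in attrs:
                self.warnings.append("video autoplays with sound")
        elif tag == "track" and self._video is not None:
            if attrs.get("kind") == "captions":
                self._video["captions"] = True

    def handle_endtag(self, tag):
        if tag == "video" and self._video is not None:
            if not self._video["captions"]:
                self.warnings.append("video has no captions track")
            self._video = None


def audit(html: str):
    """Return a list of warning strings for the given HTML."""
    parser = VideoAudit()
    parser.feed(html)
    parser.close()
    return parser.warnings
```

Running audit() over the markup of a page before publishing gives a quick first pass. An empty list means none of these three specific problems were found, not that the page is fully accessible.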

Building an Accessible Video Workflow

The most sustainable approach to video accessibility is not remediation after the fact — it is integrating accessibility into your production and publishing pipeline from the start. The cost of retrofitting a large video library can be substantial; the cost of building it right the first time is marginal in comparison.

A practical workflow looks like this. During pre-production, write a detailed script. A complete script is the foundation for every downstream accessibility asset — captions, transcripts, and audio description scripts all become dramatically easier when good source material exists. During production, minimize background noise, use clear speech, and ensure on-screen text, graphics, and meaningful visual actions are verbally narrated where possible. This reduces the audio description burden significantly.

Post-production is where the accessibility assets are produced. Use your AI captioning tool of choice to generate a first draft, then have it reviewed and corrected by a human — especially for technical terminology, proper names, and domain-specific language where AI transcription is most prone to errors. Create the descriptive transcript by combining your caption file with descriptions of meaningful visual information. Produce audio description narration using either in-house voice talent or a professional AD service.

For organizations with large existing video libraries, prioritize remediation by usage. Start with your highest-traffic videos, onboarding and training content, product demos, and any video embedded on pages that appear in conversion funnels. Begin accessibility audits now, prioritize high-use materials first, then build accessibility into all new video workflows moving forward.
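
Prioritizing by usage can be as simple as sorting an inventory of your videos. The sketch below is one hypothetical way to do it: the Video fields, the sample library, and the rule that funnel and training content counts double are all assumptions that you would replace with your own analytics and priorities.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    monthly_views: int
    has_captions: bool
    in_conversion_funnel: bool = False
    is_training: bool = False


def remediation_queue(videos):
    """Return uncaptioned videos, most urgent first."""
    def priority(video):
        # Hypothetical weighting: funnel and training content counts double
        boost = 2 if (video.in_conversion_funnel or video.is_training) else 1
        return video.monthly_views * boost

    backlog = [v for v in videos if not v.has_captions]
    return sorted(backlog, key=priority, reverse=True)


# Hypothetical library
library = [
    Video("Homepage hero", 9000, has_captions=False),
    Video("Onboarding 101", 5000, has_captions=False, is_training=True),
    Video("Old webinar", 300, has_captions=False),
    Video("Pricing demo", 7000, has_captions=True, in_conversion_funnel=True),
]
for video in remediation_queue(library):
    print(video.title)
```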

A common, expensive mistake is treating captions as a final-stage deliverable — something added just before publishing. Build caption review into your QA checklist the same way you would check video encoding or thumbnail creation. One hour of effort at the right point in the workflow saves many hours of remediation later.

The Business Case: Beyond Compliance

Accessible video is better video for every viewer, not just those with disabilities. The data on this point is compelling. A national research study surveying 2,124 students from 15 different colleges and universities found that 98.6% of students find captions helpful. In the same study, 71% of students without hearing difficulties reported using captions at least some of the time, and 66% of ESL students found captions "very" or "extremely" helpful.

The engagement impact is similarly significant. Facebook found that captions increased video views by 12% compared with uncaptioned videos; a separate study measured a 40% increase in views for captioned videos, and found that viewers were 80% more likely to watch a video to completion when closed captions were available.

The SEO benefits stack on top of the engagement benefits. Video transcripts help maximize SEO because they give search engines context — this can mean videos have higher visibility in the search engine results pages when a user types in a related search. Transcripts also make it easy to create blog posts, newsletters, or social media snippets from your videos — turning a single piece of video content into a multi-channel content asset at minimal additional cost.

Finally, consider the long-term demographic trajectory. The World Health Organization estimates that by 2050, nearly 2.5 billion people will have some degree of hearing loss, and 1 in 10 will have significant hearing loss. The audience that depends on accessible video is not shrinking. Every investment you make in video accessibility today pays compounding dividends as that audience grows.

Key Takeaways

Captions are mandatory for all prerecorded and live synchronized media under WCAG 2.1 Level AA. Auto-generated captions are a starting point only — industry best practice requires 99% accuracy, which means human review of AI output is non-negotiable for any public-facing content.
Transcripts are strongly recommended for all video even where not strictly required, because they serve deaf-blind users, improve SEO by giving search engines crawlable text, and benefit any viewer who prefers to skim or reference content in text form.
Audio descriptions are required at WCAG Level AA for prerecorded video that contains meaningful visual information not conveyed through audio. Test by closing your eyes — if you miss important content, audio description is needed.
Your video player must be keyboard-accessible with properly labeled controls for captions and audio descriptions. An inaccessible player undermines every other accessibility investment you have made in the content itself.
The business case for video accessibility is independently strong: captioned videos receive significantly more views and completions, transcripts improve SEO rankings, and 80% of caption users have no hearing impairment — accessible video reaches a wider audience on every metric that matters to your organization.

Making Videos Accessible: Captions, Transcripts, and Audio Descriptions