AI Avatar Glossary

Plain-English definitions for the terms you'll meet when evaluating real-time avatars. Each is written to stand alone.

The terms

Real-time AI avatar (interactive avatar)

A rendered digital person you can talk to live: it listens, understands, replies in a synthesized voice, and animates its face in sync. The result is a two-way conversation, unlike generated avatar videos, which are pre-rendered one-way clips. See: full explainer.

AI video generator

Software that produces finished videos of an avatar reading a script; Synthesia, HeyGen and D-ID are the best-known examples. Well suited to training and marketing content, but not conversational. See: interactive vs video-gen.

Digital human

Umbrella term for lifelike rendered people, both interactive and pre-rendered. See: full explainer.

AI receptionist

Software handling a business's front-desk interactions (answering, booking, greeting, routing) via conversational AI. Comes in two forms: phone voice agents and visual avatar receptionists. See: full explainer.

Avatar memory (per-user memory)

The ability to recognize a returning user and recall their previous conversations and preferences, so that each interaction continues a relationship instead of starting fresh. Three levels exist: stateless (no memory at all), in-session context (forgotten at hangup), and persistent per-user memory (kept across sessions). Few platforms include the third by default (Selvia AI does; others market it or leave it to you to build). See: the memory page.
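
To make the three levels concrete, here is a minimal sketch in Python. The names (MemoryLevel, AvatarMemory, start_session, remember) are hypothetical and chosen for illustration; they do not describe any platform's actual API, and a real system would keep the per-user store in a database rather than in a dictionary.

```python
from enum import Enum


class MemoryLevel(Enum):
    STATELESS = "stateless"    # nothing is kept, even within a call
    IN_SESSION = "in-session"  # context lasts until hangup
    PERSISTENT = "persistent"  # context is kept per user across calls


class AvatarMemory:
    """Illustrative model of the three memory levels (hypothetical API)."""

    def __init__(self, level):
        self.level = level
        self._session = []  # facts usable during the current call
        self._store = {}    # user_id -> facts kept across calls

    def start_session(self, user_id):
        """Begin a call and return what the avatar already knows."""
        if self.level is MemoryLevel.PERSISTENT:
            self._session = list(self._store.get(user_id, []))
        else:
            self._session = []  # every call starts from nothing
        return list(self._session)

    def remember(self, user_id, fact):
        """Record a fact, subject to the memory level."""
        if self.level is MemoryLevel.STATELESS:
            return
        self._session.append(fact)
        if self.level is MemoryLevel.PERSISTENT:
            self._store.setdefault(user_id, []).append(fact)

    def known_facts(self):
        """Facts the avatar can use right now, mid-conversation."""
        return list(self._session)


in_session = AvatarMemory(MemoryLevel.IN_SESSION)
in_session.start_session("user-1")
in_session.remember("user-1", "prefers morning appointments")
second_call_in_session = in_session.start_session("user-1")

persistent = AvatarMemory(MemoryLevel.PERSISTENT)
persistent.start_session("user-1")
persistent.remember("user-1", "prefers morning appointments")
second_call_persistent = persistent.start_session("user-1")
```

In this sketch the in-session avatar begins its second call knowing nothing, while the persistent one begins it already knowing the user's preference. That difference is what the third level buys.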

Turn-taking latency

The pause between a user finishing a sentence and the avatar starting its reply — the single biggest factor in whether a conversation feels natural. Measured end-to-end (hearing → thinking → speaking → rendering), not per-component.
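
As a rough illustration of end-to-end measurement, the sketch below takes timestamps at each hand-off in a single turn and reports both the per-stage durations and the total pause. The class name, field names, and millisecond values are all hypothetical, and the stages are assumed to run strictly one after another; pipelines that stream or overlap stages would need timestamps defined differently.

```python
from dataclasses import dataclass


@dataclass
class TurnTimestamps:
    """Clock readings (ms) at each hand-off in one turn. Hypothetical names."""
    user_stopped_speaking_ms: int
    transcript_ready_ms: int      # hearing finished
    reply_text_ready_ms: int      # thinking finished
    reply_audio_ready_ms: int     # speaking (synthesis) finished
    first_frame_rendered_ms: int  # avatar visibly starts replying

    def stage_durations_ms(self):
        """Per-component view: how long each stage took on its own."""
        return {
            "hearing": self.transcript_ready_ms - self.user_stopped_speaking_ms,
            "thinking": self.reply_text_ready_ms - self.transcript_ready_ms,
            "speaking": self.reply_audio_ready_ms - self.reply_text_ready_ms,
            "rendering": self.first_frame_rendered_ms - self.reply_audio_ready_ms,
        }

    def end_to_end_ms(self):
        """What the user experiences: the pause from their last word to the reply."""
        return self.first_frame_rendered_ms - self.user_stopped_speaking_ms


# Made-up timings for one turn, for illustration only.
turn = TurnTimestamps(0, 200, 650, 800, 900)
stages = turn.stage_durations_ms()
total = turn.end_to_end_ms()
```

With these made-up numbers, rendering alone takes 100 ms, but the user waits 900 ms. A vendor quoting only the rendering figure would be describing about a ninth of the pause.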

STT / TTS

Speech-to-text (the avatar's ears) and text-to-speech (its voice). In all-inclusive platforms they're bundled; in avatar-only APIs you buy and integrate them separately.
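
For readers integrating the pieces themselves, one conversational turn can be sketched as three functions chained together. Everything here is a stand-in: run_turn and the fake_* functions are hypothetical placeholders for whichever STT, language-model, and TTS services you choose, not real vendor calls.

```python
from typing import Callable


def run_turn(
    audio_in: bytes,
    stt: Callable[[bytes], str],  # speech-to-text: the avatar's ears
    llm: Callable[[str], str],    # language model: decides what to say
    tts: Callable[[str], bytes],  # text-to-speech: the avatar's voice
) -> bytes:
    """Run one turn: user audio in, reply audio out."""
    heard_text = stt(audio_in)
    reply_text = llm(heard_text)
    return tts(reply_text)


# Stand-ins so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = run_turn(b"hello", fake_stt, fake_llm, fake_tts)
```

On an all-inclusive platform this wiring is done for you; with an avatar-only API, each of the three callables is a service you select, pay for, and connect yourself.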

Lip sync

Matching the avatar's mouth movements to its speech audio. Quality here, more than overall realism, determines whether an avatar reads as 'off'.

All-inclusive (turnkey) pricing

One rate covering the full pipeline: avatar rendering, speech, language model, and (rarely) memory. Contrast with render-only (avatar-only) pricing, where the rate covers just the face and you supply the speech services and language model yourself. Comparing one of each on per-minute price is the classic apples-to-oranges mistake. See: the true cost, decoded.
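
The comparison can be made like-for-like by adding the separately purchased components back onto the render-only rate. The sketch below does that arithmetic in whole cents. Every rate in it is made up for illustration and is not a quote from any vendor; effective_cents_per_minute is a hypothetical helper name.

```python
def effective_cents_per_minute(render, stt=0, llm=0, tts=0):
    """Total per-minute cost of the full pipeline, in cents."""
    return render + stt + llm + tts


# Hypothetical rates in cents per minute, for illustration only.
all_inclusive = effective_cents_per_minute(render=15)  # speech and model bundled
render_only = effective_cents_per_minute(render=10, stt=2, llm=3, tts=4)

headline_winner = "render-only" if 10 < 15 else "all-inclusive"
true_winner = "render-only" if render_only < all_inclusive else "all-inclusive"
```

With these made-up numbers the render-only plan looks cheaper on its headline rate (10 cents against 15) but costs more once speech and the language model are added (19 cents against 15).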

Session cap

A plan limit on concurrent sessions or minutes per session. Irrelevant for short web chats; disqualifying for always-on uses.
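
A quick way to see why the same cap is harmless for one use and disqualifying for another is to check each use against the plan's limits. The Plan fields and the cap values below are hypothetical and do not describe any specific vendor's plan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Plan:
    """Session limits of a plan; None means that limit is uncapped."""
    max_concurrent_sessions: Optional[int]
    max_minutes_per_session: Optional[int]


def fits(plan: Plan, concurrent_needed: int, minutes_needed: int) -> bool:
    """True if a use case stays within both of the plan's session limits."""
    within_concurrency = (
        plan.max_concurrent_sessions is None
        or concurrent_needed <= plan.max_concurrent_sessions
    )
    within_duration = (
        plan.max_minutes_per_session is None
        or minutes_needed <= plan.max_minutes_per_session
    )
    return within_concurrency and within_duration


# Hypothetical plan: 2 sessions at once, 20 minutes per session.
capped_plan = Plan(max_concurrent_sessions=2, max_minutes_per_session=20)

web_chat_ok = fits(capped_plan, concurrent_needed=1, minutes_needed=5)
kiosk_ok = fits(capped_plan, concurrent_needed=1, minutes_needed=8 * 60)
```

The five-minute web chat fits comfortably, while a single kiosk session lasting an 8-hour day exceeds the per-session limit many times over.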

Kiosk avatar (always-on avatar)

An avatar deployed on a physical screen in a public space, running continuously. It is the most demanding deployment type: it needs a plan without session caps, robustness to noisy rooms, and pricing that survives an 8-hour day. See: kiosk guide.
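
To check whether pricing "survives an 8-hour day", multiply the per-minute rate out to a full month of continuous running. The sketch below uses whole cents. The 10-cents-per-minute rate and the 30-day month are hypothetical inputs for illustration, and monthly_cost_cents is a made-up helper name.

```python
def monthly_cost_cents(cents_per_minute, hours_per_day=8, days_per_month=30):
    """Cost of running one always-on avatar for a month, in cents."""
    minutes_per_month = hours_per_day * 60 * days_per_month
    return cents_per_minute * minutes_per_month


# Hypothetical rate: 10 cents per minute.
kiosk_month = monthly_cost_cents(10)
kiosk_month_dollars = kiosk_month / 100
```

At this made-up rate, a figure that sounds small per minute comes to $1,440 per month for a single screen, which is why per-minute prices should be multiplied out before choosing a plan for a kiosk.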

Custom avatar / replica

An avatar generated to look like a specific real person, typically from short video samples (Tavus's replicas are the best-known example). Distinct from choosing a face from a curated library.

Embedded widget vs API

The two integration routes: a drop-in script or iframe a non-developer can add to a site, versus programmatic endpoints for building the avatar into your own product. Platforms offer one or both.

See the terms in action

A real-time avatar with per-user memory, $1/hour all-inclusive — 60 free minutes monthly.