Glossary · plain English
AI Avatar Glossary
Plain-English definitions for the terms you'll meet when evaluating real-time avatars. Each is written to stand alone.
The terms
Real-time AI avatar (interactive avatar)
A rendered digital person you can talk to live: it listens, understands, replies in a synthesized voice, and animates its face in sync — a two-way conversation, unlike generated avatar videos, which are pre-rendered one-way clips. → full explainer
AI video generator
Software that produces finished videos of an avatar reading a script — Synthesia, HeyGen and D-ID are the best-known. Great for training and marketing content; not conversational. → interactive vs video-gen
Digital human
Umbrella term for lifelike rendered people, both interactive and pre-rendered. → full explainer
AI receptionist
Software handling a business's front-desk interactions — answering, booking, greeting, routing — via conversational AI. Comes in two forms: phone voice agents and visual avatar receptionists. → full explainer
Avatar memory (per-user memory)
The ability to recognize a returning user and recall their previous conversations and preferences — turning each interaction into a continuing relationship instead of a fresh start. Three levels exist: stateless (none), in-session context (forgets at hangup), and persistent per-user memory. Few platforms include the third by default (Selvia AI does; others market it or leave it to you to build). → the memory page
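For readers who think in code, the three levels can be sketched roughly as follows. This is a toy illustration, not any platform's API: the `MemoryLevel` and `AvatarMemory` names, and the idea of storing facts as plain strings, are all assumptions made for the example.

```python
from enum import Enum


class MemoryLevel(Enum):
    STATELESS = "stateless"      # remembers nothing
    IN_SESSION = "in_session"    # forgets at hangup
    PERSISTENT = "persistent"    # recalls a returning user


class AvatarMemory:
    """Hypothetical store showing what each level can recall."""

    def __init__(self, level: MemoryLevel) -> None:
        self.level = level
        self.user_id = None
        self._session = []       # cleared when the session ends
        self._per_user = {}      # survives across sessions

    def start_session(self, user_id: str) -> None:
        self.user_id = user_id
        self._session = []

    def remember(self, fact: str) -> None:
        if self.level is MemoryLevel.STATELESS:
            return
        self._session.append(fact)
        if self.level is MemoryLevel.PERSISTENT:
            self._per_user.setdefault(self.user_id, []).append(fact)

    def recall(self) -> list:
        if self.level is MemoryLevel.STATELESS:
            return []
        if self.level is MemoryLevel.PERSISTENT:
            return list(self._per_user.get(self.user_id, []))
        return list(self._session)

    def hang_up(self) -> None:
        self._session = []
```

With `PERSISTENT`, a fact stored in one session is still returned by `recall()` after `hang_up()` and a new `start_session()` for the same user. With `IN_SESSION` it is gone, and with `STATELESS` it was never kept.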
Turn-taking latency
The pause between a user finishing a sentence and the avatar starting its reply — the single biggest factor in whether a conversation feels natural. Measured end-to-end (hearing → thinking → speaking → rendering), not per-component.
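A small sketch of why the end-to-end number is the one to ask for. The component figures below are invented for illustration, not measurements of any product, and the sum assumes the stages run strictly one after another; pipelines that stream can overlap stages and come in under it.

```python
from dataclasses import dataclass


@dataclass
class TurnLatency:
    """Per-stage delays in milliseconds (illustrative values only)."""
    stt_ms: int      # hearing: speech-to-text
    llm_ms: int      # thinking: language model
    tts_ms: int      # speaking: text-to-speech
    render_ms: int   # rendering: face animation

    def end_to_end_ms(self) -> int:
        # Assumes sequential stages with no overlap.
        return self.stt_ms + self.llm_ms + self.tts_ms + self.render_ms


example = TurnLatency(stt_ms=200, llm_ms=500, tts_ms=250, render_ms=150)
print(example.end_to_end_ms())  # prints 1100
```

Each stage looks quick in isolation, yet the user waits for the total, which is why a per-component figure on its own says little.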
STT / TTS
Speech-to-text (the avatar's ears) and text-to-speech (its voice). In all-inclusive platforms they're bundled; in avatar-only APIs you buy and integrate them separately.
Lip sync
Matching the avatar's mouth movements to its speech audio. Quality here, more than overall realism, determines whether an avatar reads as 'off'.
All-inclusive (turnkey) pricing
One rate covering the full pipeline — avatar rendering, speech, language model, and (rarely) memory. Contrast: render-only / avatar-only pricing, where the rate covers just the face and you supply speech and brain yourself. Comparing one of each on per-minute price is the classic apples-to-oranges mistake. → the true cost, decoded
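The apples-to-oranges mistake is easy to see with numbers. The rates below are hypothetical, in cents per hour, and chosen only to show the arithmetic; they are not quotes from any vendor.

```python
def full_stack_cost(render: int, speech: int = 0, llm: int = 0) -> int:
    """Total cents per hour once every piece of the pipeline is paid for."""
    return render + speech + llm


# Hypothetical rates, cents per hour.
all_inclusive_rate = 100    # one rate, covers rendering, speech and language model
render_only_sticker = 60    # covers the face only

# Speech and language model must be bought separately (assumed prices).
render_only_true = full_stack_cost(render=render_only_sticker, speech=50, llm=40)

print(render_only_sticker < all_inclusive_rate)  # True: looks cheaper on the sticker
print(render_only_true > all_inclusive_rate)     # True: costs more once complete
```

A fair comparison puts `render_only_true` next to `all_inclusive_rate`, never the sticker price.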
Session cap
A plan limit on concurrent sessions or minutes per session. Irrelevant for short web chats; disqualifying for always-on uses.
Kiosk avatar (always-on avatar)
An avatar deployed on a physical screen in a public space, running continuously. It is the most demanding deployment type: it requires a plan without session caps, robustness to noisy rooms, and pricing that survives an 8-hour day. → kiosk guide
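A rough way to check whether a plan suits a kiosk is to work out the 8-hour day in numbers. The 20-minute cap, the 30-day month and the $2 hourly rate below are assumptions picked for the example, and both function names are hypothetical.

```python
import math


def sessions_needed(hours_per_day: float, cap_minutes: int) -> int:
    """How many separate sessions a per-session cap forces in one day."""
    return math.ceil(hours_per_day * 60 / cap_minutes)


def monthly_cost_usd(hours_per_day: float, days: int, rate_per_hour: float) -> float:
    """Cost of running continuously for the given hours, every day."""
    return hours_per_day * days * rate_per_hour


print(sessions_needed(hours_per_day=8, cap_minutes=20))                 # prints 24
print(monthly_cost_usd(hours_per_day=8, days=30, rate_per_hour=2.0))    # prints 480.0
```

Under these assumed figures, a 20-minute cap would mean 24 restarts a day, and an hourly rate that looks small adds up over 240 hours a month.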
Custom avatar / replica
An avatar generated to look like a specific real person, typically from short video samples (Tavus's replicas are the best-known example). Distinct from choosing a face from a curated library.
Embedded widget vs API
The two integration routes: a drop-in script or iframe a non-developer can add to a site, versus programmatic endpoints for building the avatar into your own product. Platforms offer one or both.
See the terms in action
A real-time avatar with per-user memory, $1/hour all-inclusive — 60 free minutes monthly.