AI Voice Agent for Real Estate: Build One with OpenAI Realtime + LiveKit
Build an AI real estate voice agent with LiveKit + OpenAI Realtime. Answer calls, qualify leads, and book showings. Optional Twilio SIP.
By James Le
This step‑by‑step tutorial guides you from zero to a phone‑call‑capable AI assistant. You’ll learn the building blocks—LiveKit rooms and agents, realtime speech with OpenAI Realtime, telephony via Twilio SIP routing into LiveKit, and traces in Langfuse—then assemble them into a production-ready flow.
You will build:
- a minimal LiveKit Agent that connects to a room,
- the same agent speaking & listening via OpenAI Realtime,
- a telephony entry point (Twilio → LiveKit SIP inbound trunk + dispatch rule),
- a clean hang‑up tool, BVC noise‑cancellation, and Langfuse telemetry.
LiveKit Agents is the realtime framework we’ll use to build, run, and dispatch voice agents. OpenAI Realtime API gives low‑latency “speech‑in/speech‑out” with turn detection. Twilio provides the PSTN phone number and SIP routing into LiveKit. Langfuse ingests OTLP traces so you can inspect sessions.
0) Prerequisites
Why this matters: you’ll use uv to manage the Python project, LiveKit Cloud to host rooms & SIP, Twilio for PSTN, OpenAI for realtime speech, and Langfuse for observability.
- Python 3.11+
- uv package & project manager (install and basics)
- LiveKit Cloud account + lk CLI (auth, projects)
- Twilio account with a phone number.
- OpenAI API key (Realtime).
- Langfuse project (OTLP ingestion).
1) Concepts & Architecture (the mental model)
Goal: understand the moving parts before writing code.
- LiveKit room: a realtime space where participants (our agent and a SIP “caller” participant) exchange audio.
- Agent worker: a Python process running LiveKit Agents that joins rooms and speaks for your app.
- OpenAI Realtime: turns your agent into “ears + brain + voice” in one model with built‑in turn detection (no manual STT/TTS).
- SIP inbound + dispatch rule: Twilio routes calls to LiveKit SIP; a dispatch rule decides which room and which agent join.
- Hang‑up: end the call by deleting the room—disconnects everyone.
- Noise cancellation: enable BVC (background voice cancellation) to keep telephony audio clean.
- Tracing: send OTLP traces to Langfuse for session‑level observability.
Call flow in words:
Caller dials Twilio number → Twilio SIP routes INVITE → LiveKit SIP inbound trunk → dispatch rule creates room (e.g., call‑abc) and dispatches agent → agent joins room and speaks via OpenAI Realtime. Hang‑up deletes the room.
2) Project Scaffold with uv
Purpose: create a clean, reproducible Python project.
Directory layout
phone-ai
├── .python-version
├── .env.local # secrets (DO NOT COMMIT)
├── pyproject.toml
├── livekit.toml # created later by CLI or provided manually
└── src
├── __init__.py
├── agent.py
└── tools.py
uv is a fast package & project manager; we’ll use uv sync to install and uv run to execute. If you’re starting from scratch, the scaffold commands below are one way to set up.
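A minimal scaffold sketch using uv commands (the directory and file names simply mirror the layout above):

mkdir phone-ai && cd phone-ai
uv init .              # writes pyproject.toml and .python-version
uv python pin 3.11     # pin the interpreter version
mkdir -p src && touch src/__init__.py src/agent.py src/tools.py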
pyproject.toml
[project]
name = "phone-ai"
version = "0.1.0"
description = "Phone AI: LiveKit + Twilio + OpenAI Realtime + Langfuse"
requires-python = ">=3.11"
dependencies = [
"python-dotenv>=1.0",
"livekit-agents[mcp,openai,silero,turn-detector", # core agent framework
"livekit-plugins-noise-cancellation", # BVC & telephony-optimized filters
"twilio>=9", # useful for phone-side utilities (optional)
]
[tool.uv] # optional; uv works without this section too
Install deps:
uv sync
LiveKit Agents & the OpenAI plugin are published on PyPI. The noise‑cancellation plugin is separate and supports BVC. The OTLP HTTP exporter is the standard way to send traces.
3) Configure local secrets
Create .env.local (never commit):
# LiveKit (Cloud)
LIVEKIT_URL=wss://<your-project-subdomain>.livekit.cloud
LIVEKIT_API_KEY=REDACTED
LIVEKIT_API_SECRET=REDACTED
# OpenAI
OPENAI_API_KEY=REDACTED
# Langfuse (OTLP HTTP endpoint)
LANGFUSE_HOST=https://us.cloud.langfuse.com
LANGFUSE_PUBLIC_KEY=REDACTED
LANGFUSE_SECRET_KEY=REDACTED
We’ll programmatically set OTEL_EXPORTER_OTLP_* to point at Langfuse’s OTLP endpoint (/api/public/otel) with Basic Auth.
4) Hello, Room: the smallest possible Agent
Purpose: prove we can start a worker and join a room—no speech yet.
Concepts: Agent, AgentSession, cli.run_app, and the worker entrypoint.
Create src/agent.py (step 1):
# src/agent.py
from dotenv import load_dotenv
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, RoomInputOptions, cli
load_dotenv(".env.local", override=True)
INSTRUCTIONS = "You are a polite assistant. Keep replies short."
class HelloAgent(Agent):
async def on_enter(self) -> None:
# Generate a first reply when the agent joins the room.
await self.session.generate_reply()
async def entrypoint(ctx: JobContext):
session = AgentSession() # No speech yet
await session.start(
room=ctx.room,
agent=HelloAgent(),
room_input_options=RoomInputOptions(), # defaults
)
if __name__ == "__main__":
cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, agent_name="phone-ai"))
Run it:
uv run src/agent.py start
This mirrors the official “build a voice agent” pattern (we’ll add audio shortly).
5) Give the agent a voice with OpenAI Realtime
Purpose: add realtime speech comprehension + synthesis (one model), plus turn detection.
Why Realtime: keeps latency low and handles endpointing/interruptions for you. In LiveKit Agents, you pass a RealtimeModel instead of a text‑only LLM or separate STT/TTS.
Replace entrypoint() in src/agent.py (step 2):
from livekit.plugins import openai # add this import at top
async def entrypoint(ctx: JobContext):
# 1) Create a Realtime model with a built-in voice.
# (You can change "marin" to any available Realtime voice.)
session = AgentSession(
llm=openai.realtime.RealtimeModel(voice="marin")
# If you want fine-grained control, see docs for configuring
# input_audio_transcription and turn_detection parameters.
)
# 2) Start the session with your agent in the current room.
await session.start(room=ctx.room, agent=HelloAgent())
LiveKit’s OpenAI Realtime plugin exposes a Python RealtimeModel. Use it directly in the session config to get speech‑in/speech‑out. For deeper control (semantic VAD turn detection, transcription model, and more), consult the Resources section; a configuration sketch follows below. OpenAI Realtime overview & API reference are listed in Resources.
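For example, here is a minimal sketch of passing turn-detection options through the plugin. It assumes a plugin version that accepts OpenAI’s TurnDetection session type; the import path and parameter names may differ in yours:

from livekit.agents import AgentSession
from livekit.plugins import openai
from openai.types.beta.realtime.session import TurnDetection  # assumption: path as in current openai SDK

session = AgentSession(
    llm=openai.realtime.RealtimeModel(
        voice="marin",
        # Semantic VAD lets the model decide when the caller has finished a turn.
        turn_detection=TurnDetection(
            type="semantic_vad",
            eagerness="medium",       # how eagerly the model ends a turn
            create_response=True,     # reply automatically at end of turn
            interrupt_response=True,  # allow the caller to barge in
        ),
    ),
)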
6) Add a tool to hang up cleanly + telephony noise cancellation
Why: when the conversation ends, delete the room so the phone leg drops immediately; enable BVC (telephony‑optimized) noise cancellation for clarity.
Create src/tools.py:
# src/tools.py
from livekit import api
from livekit.agents import get_job_context
async def hangup_call():
"""
Ends the call by deleting the room (disconnects all participants).
Requires LIVEKIT_* env vars.
"""
ctx = get_job_context()
if ctx and ctx.room:
await ctx.api.room.delete_room(api.DeleteRoomRequest(room=ctx.room.name))
Update src/agent.py (step 3):
from livekit.agents.llm import function_tool
from livekit.plugins import noise_cancellation # add
class PhoneAgent(Agent):
def __init__(self) -> None:
super().__init__(instructions=INSTRUCTIONS)
@function_tool
async def end_call(self, ctx):
"""Politely end the call and hang up."""
from tools import hangup_call
await ctx.wait_for_playout()
await hangup_call()
async def on_enter(self) -> None:
await self.session.generate_reply()
async def entrypoint(ctx: JobContext):
session = AgentSession(
llm=openai.realtime.RealtimeModel(voice="marin"),
)
await session.start(
room=ctx.room,
agent=PhoneAgent(),
room_input_options=RoomInputOptions(
# Enable Telephony-optimized Background Voice Cancellation
noise_cancellation=noise_cancellation.BVCTelephony(),
),
)
Deleting the room is the recommended way to hang up for all participants. BVC is LiveKit Cloud’s enhanced noise/voice cancellation (Krisp), ideal for voice AI; a SIP‑telephony setting is available both on trunks and as an in‑agent filter.
7) Run locally (no phones yet)
# Start the worker
uv run src/agent.py start
You can join a test room from another client (e.g., the LiveKit Agents Playground) to see the agent respond. A terminal-only alternative is sketched below.
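If you prefer to stay in the terminal, recent livekit-agents releases also ship a console mode that uses your local microphone and speakers instead of a room client (verify your installed version supports it):

uv run src/agent.py console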
8) Connect a phone number: Twilio → LiveKit SIP → Dispatch your agent
You have two common inbound paths. Choose one:
A) Twilio Programmable Voice (TwiML <Sip> to LiveKit)
- Create a TwiML Bin in Twilio:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Dial>
<Sip>
sip:<your-project-subdomain>.sip.livekit.cloud
</Sip>
</Dial>
</Response>
- Point your Twilio phone number’s Voice URL to this TwiML Bin.
- In LiveKit, create an inbound trunk and a dispatch rule (below).
Twilio <Sip> dials any SIP endpoint; see Resources for a full TwiML path for inbound voice.
B) Twilio Elastic SIP Trunking (scalable, SIP‑native)
Set your Origination SIP URI to your LiveKit SIP domain (visible in LiveKit UI or project settings) and proceed with the same LiveKit trunk/dispatch setup.
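For orientation, the origination URI generally has this shape; confirm the exact value in your LiveKit project settings, and note that the transport parameter shown here is an assumption based on common Twilio trunking setups:

sip:<your-project-subdomain>.sip.livekit.cloud;transport=tcp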
Create LiveKit inbound trunk (with Krisp enabled) and dispatch rule
inbound-trunk.json:
{
"trunk": {
"name": "Inbound Trunk",
"numbers": ["+1YOURNUMBER"],
"krispEnabled": true
}
}
lk sip inbound create inbound-trunk.json
The inbound trunk accepts calls from your provider and can enable Krisp noise cancellation globally.
dispatch-rule.json (create per‑caller rooms and explicitly dispatch your agent):
{
"dispatch_rule": {
"rule": {
"dispatchRuleIndividual": { "roomPrefix": "call-" }
},
"roomConfig": {
"agents": [{ "agentName": "phone-ai" }]
}
}
}
lk sip dispatch-rule create dispatch-rule.json
Dispatch rules control how inbound calls land in rooms and which agents join (explicit dispatch is recommended for SIP).
Test: start your worker (uv run src/agent.py start), call your Twilio number, and confirm a LiveKit room like call-xyz appears with your agent joined.
Notes
- Twilio <Sip> TwiML reference (if using Programmable Voice).
- Twilio SIP interface docs (general).
- LiveKit inbound calls & Twilio flow.
9) Add Langfuse observability via OpenTelemetry
Purpose: ship traces for each session to Langfuse’s OTLP endpoint—debug audio turns, latencies, and errors.
Add this helper to src/agent.py (near imports):
import base64, os
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor
def setup_langfuse_otel():
host = os.getenv("LANGFUSE_HOST", "").rstrip("/")
pub = os.getenv("LANGFUSE_PUBLIC_KEY")
sec = os.getenv("LANGFUSE_SECRET_KEY")
if not (host and pub and sec):
return None
# Langfuse OTLP endpoint + Basic Auth
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = f"{host}/api/public/otel/v1/traces"
os.environ["OTEL_EXPORTER_OTLP_PROTOCOL"] = "http/protobuf"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = "Authorization=Basic " + base64.b64encode(f"{pub}:{sec}".encode()).decode()
tp = TracerProvider()
tp.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
return tp
Initialize it in entrypoint():
tp = setup_langfuse_otel()
# (Optionally register a shutdown callback to flush; see the sketch below.)
Langfuse provides an OTLP HTTP ingestion endpoint; the standard Python exporter is opentelemetry-exporter-otlp-proto-http. The OTLP/HTTP protocol and exporter environment variables are standard OpenTelemetry.
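One gap in the helper above: the TracerProvider is created but never registered, so spans may not be exported. A minimal sketch of wiring it up in entrypoint(), assuming JobContext.add_shutdown_callback accepts an async callable (as in LiveKit’s examples); recent livekit-agents versions may also expose their own telemetry registration helper, so check your version’s docs:

from opentelemetry import trace

async def entrypoint(ctx: JobContext):
    tp = setup_langfuse_otel()
    if tp:
        # Register globally so instrumented spans are exported via OTLP.
        trace.set_tracer_provider(tp)

        async def flush_traces():
            # Push any buffered spans to Langfuse before the job exits.
            tp.force_flush()

        ctx.add_shutdown_callback(flush_traces)
    # ... continue with AgentSession setup as in the previous sections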
10) Make it truly conversational (prompt & behavior)
Refine the agent’s instructions:
INSTRUCTIONS = """
You are a friendly receptionist for ACME.
- Greet the caller briefly.
- Collect name and reason for calling.
- Offer to schedule a follow-up if appropriate.
- Keep turns short; don't overtalk the caller.
- When the caller is finished, call the `end_call` tool.
"""11) Optional: Scheduling via an MCP server (Cal.com)
If you want the agent to book meetings, you can run a Cal.com Model Context Protocol server and expose scheduling tools. Start one via NPX and point it to your Cal.com API key:
npx -y @calcom/cal-mcp
Then, in LiveKit Agents, you can attach MCP servers (stdio) to your session if you choose to expand beyond basic voice; a hedged sketch follows. (Full MCP integration is out of scope for the core telephony build, but the server exists and is easy to pilot.)
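A minimal sketch of attaching the server. Assumptions: livekit-agents’ mcp module exposes an MCPServerStdio with command/args/env parameters, and the Cal.com environment variable name is illustrative; check the cal-mcp README and your plugin version before relying on either:

import os
from livekit.agents import AgentSession, mcp
from livekit.plugins import openai

session = AgentSession(
    llm=openai.realtime.RealtimeModel(voice="marin"),
    mcp_servers=[
        mcp.MCPServerStdio(
            command="npx",
            args=["-y", "@calcom/cal-mcp"],
            env={"CALCOM_API_KEY": os.environ["CALCOM_API_KEY"]},  # hypothetical variable name
        )
    ],
)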
12) Deploy to LiveKit Cloud (production path)
While local is great for development, you’ll likely deploy the worker and let LiveKit Cloud handle dispatch at scale:
# Authenticate once
lk cloud auth
# Create & deploy your agent from the working dir
lk agent create
lk agent deploy
The CLI will generate/update a livekit.toml in your project, track your agent ID, and build a container image for you.
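For reference only, a generated livekit.toml has roughly this shape; the CLI writes the authoritative file, and the exact field names may vary by CLI version:

[project]
  subdomain = "<your-project-subdomain>"

[agent]
  id = "<agent-id created by lk agent create>"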
13) End‑to‑end test checklist
- Worker running locally or deployed.
- Call your Twilio number; LiveKit SIP should create a room (e.g., call-...) and dispatch your agent_name.
- Speak and interrupt the agent; Realtime handles turn detection.
- Ask the agent to end the call → verifies the end_call tool (room deletion).
- Confirm noise cancellation is effective (BVC telephony).
- See Langfuse traces arriving via OTLP.
14) Troubleshooting
- Agent joins but call doesn’t end: ensure you’re calling delete_room(...) on the LiveKit API; merely stopping the agent leaves the caller in silence.
- Dispatch not working: verify your dispatch-rule.json schema and that agentName matches the agent_name you passed to WorkerOptions.
- No audio / choppy audio: enable BVC (either trunk‑level krispEnabled or the RoomInputOptions filter) and test again.
- Realtime voice not speaking: confirm OPENAI_API_KEY is set and you’re using the Realtime plugin (not a text‑only LLM).
- uv environment issues: re‑run uv sync and verify Python 3.11+ is active.
15) Full, self‑contained file listing (final state)
pyproject.toml — see §2.
.env.local — see §3 (Sensitive keys omitted).
src/tools.py — see §6.
src/agent.py — consolidated:
from dotenv import load_dotenv
import base64, os
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, RoomInputOptions, cli
from livekit.agents.llm import function_tool
from livekit.plugins import openai, noise_cancellation
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor
load_dotenv(".env.local", override=True)
def setup_langfuse_otel():
host = os.getenv("LANGFUSE_HOST", "").rstrip("/")
pub = os.getenv("LANGFUSE_PUBLIC_KEY")
sec = os.getenv("LANGFUSE_SECRET_KEY")
if not (host and pub and sec):
return None
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = f"{host}/api/public/otel/v1/traces"
os.environ["OTEL_EXPORTER_OTLP_PROTOCOL"] = "http/protobuf"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = "Authorization=Basic " + base64.b64encode(f"{pub}:{sec}".encode()).decode()
tp = TracerProvider()
tp.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
return tp
INSTRUCTIONS = """
You are a friendly receptionist. Keep turns short.
Collect caller name & reason. Offer a follow-up. Use the end_call tool to hang up.
"""
class PhoneAgent(Agent):
def __init__(self) -> None:
super().__init__(instructions=INSTRUCTIONS)
@function_tool
async def end_call(self, ctx):
"""Politely end the call and hang up for everyone."""
from tools import hangup_call
await ctx.wait_for_playout()
await hangup_call()
async def on_enter(self) -> None:
await self.session.generate_reply()
async def entrypoint(ctx: JobContext):
setup_langfuse_otel()
session = AgentSession(
llm=openai.realtime.RealtimeModel(voice="marin"),
)
await session.start(
room=ctx.room,
agent=PhoneAgent(),
room_input_options=RoomInputOptions(
noise_cancellation=noise_cancellation.BVCTelephony(),
),
)
if __name__ == "__main__":
cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, agent_name="phone-ai"))
16) What you learned
- Core LiveKit building blocks: rooms, agents, dispatch, and SIP inbound trunks.
- Realtime voice with OpenAI Realtime via LiveKit’s plugin (low latency, turn detection).
- Telephony routing (Twilio → LiveKit SIP) with explicit agent dispatch.
- Clean termination (delete room) and noise cancellation (BVC).
- Observability with OTLP → Langfuse.
17) Extensions & next steps
- Separate TTS while keeping Realtime STT: configure the Realtime model for text output (modalities=["text"]) and pair it with your chosen TTS (see the sketch after this list).
- Outbound calling (dial out to users) with SIP participants.
- Advanced dispatch and scaling in LiveKit Cloud via lk agent tooling.
- Add MCP tools (e.g., Cal.com) for scheduling workflows.
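A minimal sketch of the first extension, assuming your plugin version accepts modalities=["text"] on RealtimeModel and ships an openai.TTS wrapper:

from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    # Realtime still listens and reasons, but returns text only...
    llm=openai.realtime.RealtimeModel(modalities=["text"]),
    # ...while a separate TTS voice speaks the replies.
    tts=openai.TTS(voice="alloy"),
)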
Resources
- LiveKit Agents (Python) docs & reference: building agents, sessions, worker lifecycle. (LiveKit docs)
- OpenAI Realtime: conceptual overview & API reference. (OpenAI Platform)
- LiveKit ↔ OpenAI Realtime plugin (Python): how to pass RealtimeModel. (LiveKit docs)
- SIP inbound & Twilio Voice: trunks, dispatch rules, TwiML <Sip>. (LiveKit docs)
- Noise cancellation (Krisp/BVC): feature overview & trunk options. (LiveKit docs)
- Langfuse OTLP ingestion (getting started). (Langfuse)
- Example repos
Redactions / sensitive items
- API keys and secrets are omitted by design. Use .env.local locally and LiveKit secrets for production deployments.
That’s it! You’ve built a phone‑call voice AI from fundamentals, not just copy‑pasting: LiveKit room/agent basics → Realtime speech → Twilio SIP → clean hang‑up → Langfuse traces. From here, iterate on prompts, add tools (search, calendar), and move to LiveKit Cloud for resilient, auto‑scaled deployments.