Case study
Self‑Hosted AI Chatbot Saves 90k per year and securely never leaves your tenant
Book a FREE 15-minute strategy call to unlock this solution at your company

The Problem
A mid‑market retailer needed a secure, self‑hosted AI chatbot to deflect routine support tickets, speed internal knowledge lookup, and provide 24/7 customer assistance without exposing PII to third parties. We designed and launched a .NET‑based solution on Azure using Semantic Kernel (SK) for orchestration and a Retrieval‑Augmented Generation (RAG) pipeline backed by Azure AI Search. The chatbot now resolves ~62% of Tier‑1 inquiries autonomously, reduces average handle time (AHT) by 34%, and pays for itself in < 90 days.
Headline outcomes (first 60 days):
-
62% Tier‑1 auto‑resolution; CSAT +11 pts; AHT ↓ 34%
-
Containment rate: 68% (sessions never escalated to agent)
-
SLA: P95 latency 1.2s (search) / 2.8s (answer)
-
Cost to serve: ~$0.018 per resolved session (incl. compute + inference)
-
Savings: Migrating from a vendor SaaS chatbot at ~$8k/month to self‑hosted Azure AI/OpenAI/OpenRouter (~$500/month) delivers $7,500 monthly savings (~$90k annually)
Why ask a third party to guard your secrets? We brought the guardhouse inside.
— T
(Fractional CTO Leadership)
The Approach
As the Fractional CTO, I led the company through a Self‑hosted RAG chatbot, orchestrated by Semantic Kernel, exposed via web and Teams. Content indexed from SOPs, product catalog, order status API, and knowledge base articles.
Key Technologies:
-
.NET 8 (Minimal APIs + ASP.NET Core)
-
Semantic Kernel (skills/functions, planners, memory, pipelines)
-
Azure AI Search (vector + BM25 hybrid, semantic ranker)
-
Model hosting: Azure OpenAI (for prod) and Azure ML/AKS w/ open‑source LLM for fallback/offline testing
-
Azure App Service (API + web), Azure Functions (ETL/indexers)
-
Azure API Management (APIM) for throttling, auth, and versioning
-
Azure Cosmos DB (chat transcripts + preferences) & Blob Storage (documents)
-
Azure Monitor / App Insights (observability + prompt/response traces)
-
Microsoft Entra ID (AuthN/Z, Managed Identity), Private Endpoints
-
GitHub Actions (CI/CD), Bicep/Terraform for IaC
Reference Architecture (high level)
RAG & Orchestration Flow
-
User query → Validate + classify intent (support vs. sales vs. internal)
-
Grounding: Query Azure AI Search (hybrid: BM25 + vectors). Use filters for locale, product, doc freshness.
-
Synthesis: SK composes system + user + retrieved context; applies policies (disclaimers, citations).
-
Tool use: If intent requires live data (order status, inventory), SK calls typed .NET skills.
-
Answer: Return concise response with citations + next‑best actions. Log prompts/outputs with PII hashing.
-
Escalation: If confidence < threshold or user requests human, handoff via webhook to CRM/agent.
Security & Compliance
-
All services in private VNets + Private Endpoints; egress restricted
-
Entra ID auth, per‑role policies (customer vs. agent vs. admin)
-
Managed Identity for data plane access (Search, Cosmos, Storage)
-
Prompt, input, and output logs stored with PII tokenization (SHA‑256 + salt)
-
Content filters + profanity checks; jailbreak/hallucination detectors (confidence + guardrails)
-
Data residency: single Azure region with zone redundancy; optional geo‑replication
The solution
Faster, cheaper, and more secure.
— T