
Case study

Self‑Hosted AI Chatbot Saves $90k per Year, and Your Data Never Leaves Your Tenant



The Problem

A mid‑market retailer needed a secure, self‑hosted AI chatbot to deflect routine support tickets, speed up internal knowledge lookup, and provide 24/7 customer assistance without exposing PII to third parties. We designed and launched a .NET‑based solution on Azure using Semantic Kernel (SK) for orchestration and a Retrieval‑Augmented Generation (RAG) pipeline backed by Azure AI Search. The chatbot now resolves ~62% of Tier‑1 inquiries autonomously, reduces average handle time (AHT) by 34%, and pays for itself in under 90 days.

Headline outcomes (first 60 days):

  • 62% Tier‑1 auto‑resolution; CSAT +11 pts; AHT ↓ 34%

  • Containment rate: 68% (sessions never escalated to agent)

  • SLA: P95 latency 1.2s (search) / 2.8s (answer)

  • Cost to serve: ~$0.018 per resolved session (incl. compute + inference)

  • Savings: Migrating from a vendor SaaS chatbot at ~$8k/month to self‑hosted Azure AI/OpenAI/OpenRouter (~$500/month) delivers $7,500 monthly savings (~$90k annually)
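The savings arithmetic above is straightforward to verify. A minimal sketch using only the figures stated in this case study:

```python
# Cost comparison from the figures in this case study.
VENDOR_SAAS_MONTHLY = 8_000   # prior vendor SaaS chatbot (~$8k/month)
SELF_HOSTED_MONTHLY = 500     # self-hosted Azure AI/OpenAI/OpenRouter (~$500/month)

monthly_savings = VENDOR_SAAS_MONTHLY - SELF_HOSTED_MONTHLY
annual_savings = monthly_savings * 12

print(f"Monthly savings: ${monthly_savings:,}")  # $7,500
print(f"Annual savings:  ${annual_savings:,}")   # $90,000
```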


Why ask a third party to guard your secrets? We brought the guardhouse inside. 

— T 

(Fractional CTO Leadership)

The Approach

As the Fractional CTO, I led the design and delivery of a self‑hosted RAG chatbot, orchestrated by Semantic Kernel and exposed via web and Teams. Content is indexed from SOPs, the product catalog, an order‑status API, and knowledge‑base articles.

Key Technologies:

  • .NET 8 (Minimal APIs + ASP.NET Core)

  • Semantic Kernel (skills/functions, planners, memory, pipelines)

  • Azure AI Search (vector + BM25 hybrid, semantic ranker)

  • Model hosting: Azure OpenAI (production) and Azure ML/AKS with an open‑source LLM for fallback/offline testing

  • Azure App Service (API + web), Azure Functions (ETL/indexers)

  • Azure API Management (APIM) for throttling, auth, and versioning

  • Azure Cosmos DB (chat transcripts + preferences) & Blob Storage (documents)

  • Azure Monitor / App Insights (observability + prompt/response traces)

  • Microsoft Entra ID (AuthN/Z, Managed Identity), Private Endpoints

  • GitHub Actions (CI/CD), Bicep/Terraform for IaC
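Azure AI Search's hybrid mode fuses the BM25 keyword ranking and the vector ranking into a single result list using reciprocal rank fusion (RRF). The fusion step can be sketched in a few lines; this is an illustrative Python sketch, not the service's implementation, and the document IDs are made up:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears
    in; k=60 is the commonly cited RRF constant.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["doc-faq-12", "doc-sop-3", "doc-cat-7"]   # keyword ranking
vector_hits = ["doc-sop-3", "doc-kb-9", "doc-faq-12"]    # embedding ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # doc-sop-3 wins: high in both lists
```

Documents that rank well in both lists rise to the top, which is why hybrid retrieval tends to beat either keyword or vector search alone on grounding quality.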

Reference Architecture (high level)

User (Web / Teams)
        │
Frontend (ASP.NET Core Blazor/React)
        │
API Gateway (APIM)
        │
Chat Orchestrator (.NET + Semantic Kernel)
        │
┌───────┴──────┬──────────────┐
│ Tools/Skills │ Memory/RAG   │
│ (order API,  │ Azure AI     │
│ returns, ERP)│ Search (vec) │
└───────┬──────┴──────┬───────┘
        │             │
  Azure OpenAI   Azure ML/AKS (OSS fallback)
        │             │
        └──────┬──────┘
               │
App Insights (traces, tokens, safety)

RAG & Orchestration Flow

  1. User query → Validate + classify intent (support vs. sales vs. internal)

  2. Grounding: Query Azure AI Search (hybrid: BM25 + vectors). Use filters for locale, product, doc freshness.

  3. Synthesis: SK composes system + user + retrieved context; applies policies (disclaimers, citations).

  4. Tool use: If intent requires live data (order status, inventory), SK calls typed .NET skills.

  5. Answer: Return concise response with citations + next‑best actions. Log prompts/outputs with PII hashing.

  6. Escalation: If confidence < threshold or user requests human, handoff via webhook to CRM/agent.
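The six steps above can be sketched end‑to‑end. This is a language‑agnostic control‑flow sketch in Python (the production code is .NET + Semantic Kernel); the 0.7 confidence threshold, helper names, and stub services are illustrative assumptions, not the real APIs:

```python
CONFIDENCE_THRESHOLD = 0.7  # illustrative; tuned per deployment

def handle_turn(query, classify, retrieve, call_tool, synthesize, escalate):
    """One chat turn: classify -> ground -> tool use -> synthesize -> escalate."""
    # 1. Validate + classify intent (support vs. sales vs. internal)
    intent = classify(query)

    # 2. Grounding: hybrid retrieval with locale/freshness filters
    docs = retrieve(query, filters={"locale": "en-US", "max_age_days": 365})

    # 4. Tool use: live data (order status, inventory) when the intent needs it
    live_data = call_tool(intent, query) if intent == "order_status" else None

    # 3 + 5. Synthesis: compose system + user + retrieved context, cite sources
    answer, confidence = synthesize(query, docs, live_data)

    # 6. Escalation: below threshold or explicit request for a human
    if confidence < CONFIDENCE_THRESHOLD or "agent" in query.lower():
        return escalate(query, answer)
    return answer

# Demo with trivial stand-ins for the real services:
reply = handle_turn(
    "Where is my order 1234?",
    classify=lambda q: "order_status",
    retrieve=lambda q, filters: ["returns-policy.md"],
    call_tool=lambda intent, q: {"order": 1234, "status": "shipped"},
    synthesize=lambda q, docs, live: (f"Order 1234 is shipped. [{docs[0]}]", 0.92),
    escalate=lambda q, a: "handed off to agent",
)
print(reply)
```

Keeping each step behind an injected callable mirrors how typed SK skills plug into the orchestrator, and makes the flow unit‑testable without live Azure services.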

Security & Compliance

  • All services in private VNets + Private Endpoints; egress restricted

  • Entra ID auth, per‑role policies (customer vs. agent vs. admin)

  • Managed Identity for data plane access (Search, Cosmos, Storage)

  • Prompt, input, and output logs stored with PII tokenization (SHA‑256 + salt)

  • Content filters + profanity checks; jailbreak/hallucination detectors (confidence + guardrails)

  • Data residency: single Azure region with zone redundancy; optional geo‑replication
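The salted‑hash tokenization used for the prompt/response logs can be sketched in a few lines. This is an illustrative Python sketch; in production the salt lives in a secret store (e.g. Key Vault), never in source code:

```python
import hashlib

def tokenize_pii(value: str, salt: bytes) -> str:
    """Replace a PII value with a stable, irreversible token (SHA-256 + salt)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()

SALT = b"load-from-key-vault-not-source"  # illustrative placeholder

email_token = tokenize_pii("jane@example.com", SALT)

# Deterministic: the same input + salt always yields the same token,
# so transcripts stay joinable for analytics without storing raw PII.
assert email_token == tokenize_pii("jane@example.com", SALT)
print(email_token)
```

The salt defeats rainbow‑table lookups of common values (emails, phone numbers), while determinism preserves the ability to correlate a user's sessions across the transcript store.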


The Solution


Faster, cheaper, and more secure. 

— T