
Case study

Self‑Hosted AI Chatbot Saves $90k per Year, and Your Data Never Leaves Your Tenant



The Problem

A mid‑market retailer needed a secure, self‑hosted AI chatbot to deflect routine support tickets, speed up internal knowledge lookup, and provide 24/7 customer assistance without exposing PII to third parties. We designed and launched a .NET‑based solution on Azure using Semantic Kernel (SK) for orchestration and a Retrieval‑Augmented Generation (RAG) pipeline backed by Azure AI Search. The chatbot now resolves ~62% of Tier‑1 inquiries autonomously, reduces average handle time (AHT) by 34%, and pays for itself in under 90 days.

Headline outcomes (first 60 days):

  • 62% Tier‑1 auto‑resolution; CSAT +11 pts; AHT ↓ 34%

  • Containment rate: 68% (sessions never escalated to agent)

  • SLA: P95 latency 1.2s (search) / 2.8s (answer)

  • Cost to serve: ~$0.018 per resolved session (incl. compute + inference)

  • Savings: Migrating from a vendor SaaS chatbot at ~$8k/month to self‑hosted Azure AI/OpenAI/OpenRouter (~$500/month) delivers $7,500 monthly savings (~$90k annually)
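The savings arithmetic above is straightforward to verify. A minimal sketch using only the figures stated in this case study:

```python
# Cost comparison from the figures in this case study.
VENDOR_SAAS_MONTHLY = 8_000   # prior vendor SaaS chatbot (~$8k/month)
SELF_HOSTED_MONTHLY = 500     # self-hosted Azure AI/OpenAI/OpenRouter (~$500/month)

monthly_savings = VENDOR_SAAS_MONTHLY - SELF_HOSTED_MONTHLY
annual_savings = monthly_savings * 12

print(f"Monthly savings: ${monthly_savings:,}")  # $7,500
print(f"Annual savings:  ${annual_savings:,}")   # $90,000
```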


Why ask a third party to guard your secrets? We brought the guardhouse inside. 

— T 

(Fractional CTO Leadership)

The Approach

As the Fractional CTO, I led the design and delivery of a self‑hosted RAG chatbot, orchestrated by Semantic Kernel and exposed via web and Teams. Content is indexed from SOPs, the product catalog, an order‑status API, and knowledge‑base articles.

Key Technologies:

  • .NET 8 (Minimal APIs + ASP.NET Core)

  • Semantic Kernel (skills/functions, planners, memory, pipelines)

  • Azure AI Search (vector + BM25 hybrid, semantic ranker)

  • Model hosting: Azure OpenAI (production) and Azure ML/AKS with an open‑source LLM for fallback/offline testing

  • Azure App Service (API + web), Azure Functions (ETL/indexers)

  • Azure API Management (APIM) for throttling, auth, and versioning

  • Azure Cosmos DB (chat transcripts + preferences) & Blob Storage (documents)

  • Azure Monitor / App Insights (observability + prompt/response traces)

  • Microsoft Entra ID (AuthN/Z, Managed Identity), Private Endpoints

  • GitHub Actions (CI/CD), Bicep/Terraform for IaC
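Azure AI Search's hybrid mode fuses the BM25 keyword ranking and the vector ranking into a single result list using reciprocal rank fusion (RRF). The fusion step can be sketched in a few lines; this is an illustrative Python sketch, not the service's implementation, and the document IDs are made up:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears
    in; k=60 is the commonly cited RRF constant.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["doc-faq-12", "doc-sop-3", "doc-cat-7"]   # keyword ranking
vector_hits = ["doc-sop-3", "doc-kb-9", "doc-faq-12"]    # embedding ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # doc-sop-3 wins: high in both lists
```

Documents that rank well in both lists rise to the top, which is why hybrid retrieval tends to beat either keyword or vector search alone on grounding quality.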

Reference Architecture (high level)

User (Web / Teams)
        │
Frontend (ASP.NET Core Blazor/React)
        │
API Gateway (APIM)
        │
Chat Orchestrator (.NET + Semantic Kernel)
        │
┌───────┴──────┬──────────────┐
│ Tools/Skills │ Memory/RAG   │
│ (order API,  │ Azure AI     │
│ returns, ERP)│ Search (vec) │
└───────┬──────┴──────┬───────┘
        │             │
  Azure OpenAI   Azure ML/AKS (OSS fallback)
        │             │
        └──────┬──────┘
               │
App Insights (traces, tokens, safety)

RAG & Orchestration Flow

  1. User query → Validate + classify intent (support vs. sales vs. internal)

  2. Grounding: Query Azure AI Search (hybrid: BM25 + vectors). Use filters for locale, product, doc freshness.

  3. Synthesis: SK composes system + user + retrieved context; applies policies (disclaimers, citations).

  4. Tool use: If intent requires live data (order status, inventory), SK calls typed .NET skills.

  5. Answer: Return concise response with citations + next‑best actions. Log prompts/outputs with PII hashing.

  6. Escalation: If confidence < threshold or user requests human, handoff via webhook to CRM/agent.
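The six steps above can be sketched end‑to‑end. This is a language‑agnostic control‑flow sketch in Python (the production code is .NET + Semantic Kernel); the 0.7 confidence threshold, helper names, and stub services are illustrative assumptions, not the real APIs:

```python
CONFIDENCE_THRESHOLD = 0.7  # illustrative; tuned per deployment

def handle_turn(query, classify, retrieve, call_tool, synthesize, escalate):
    """One chat turn: classify -> ground -> tool use -> synthesize -> escalate."""
    # 1. Validate + classify intent (support vs. sales vs. internal)
    intent = classify(query)

    # 2. Grounding: hybrid retrieval with locale/freshness filters
    docs = retrieve(query, filters={"locale": "en-US", "max_age_days": 365})

    # 4. Tool use: live data (order status, inventory) when the intent needs it
    live_data = call_tool(intent, query) if intent == "order_status" else None

    # 3 + 5. Synthesis: compose system + user + retrieved context, cite sources
    answer, confidence = synthesize(query, docs, live_data)

    # 6. Escalation: below threshold or explicit request for a human
    if confidence < CONFIDENCE_THRESHOLD or "agent" in query.lower():
        return escalate(query, answer)
    return answer

# Demo with trivial stand-ins for the real services:
reply = handle_turn(
    "Where is my order 1234?",
    classify=lambda q: "order_status",
    retrieve=lambda q, filters: ["returns-policy.md"],
    call_tool=lambda intent, q: {"order": 1234, "status": "shipped"},
    synthesize=lambda q, docs, live: (f"Order 1234 is shipped. [{docs[0]}]", 0.92),
    escalate=lambda q, a: "handed off to agent",
)
print(reply)
```

Keeping each step behind an injected callable mirrors how typed SK skills plug into the orchestrator, and makes the flow unit‑testable without live Azure services.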

Security & Compliance

  • All services in private VNets + Private Endpoints; egress restricted

  • Entra ID auth, per‑role policies (customer vs. agent vs. admin)

  • Managed Identity for data plane access (Search, Cosmos, Storage)

  • Prompt, input, and output logs stored with PII tokenization (SHA‑256 + salt)

  • Content filters + profanity checks; jailbreak/hallucination detectors (confidence + guardrails)

  • Data residency: single Azure region with zone redundancy; optional geo‑replication
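The salted‑hash tokenization used for the prompt/response logs can be sketched in a few lines. This is an illustrative Python sketch; in production the salt lives in a secret store (e.g. Key Vault), never in source code:

```python
import hashlib

def tokenize_pii(value: str, salt: bytes) -> str:
    """Replace a PII value with a stable, irreversible token (SHA-256 + salt)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()

SALT = b"load-from-key-vault-not-source"  # illustrative placeholder

email_token = tokenize_pii("jane@example.com", SALT)

# Deterministic: the same input + salt always yields the same token,
# so transcripts stay joinable for analytics without storing raw PII.
assert email_token == tokenize_pii("jane@example.com", SALT)
print(email_token)
```

The salt defeats rainbow‑table lookups of common values (emails, phone numbers), while determinism preserves the ability to correlate a user's sessions across the transcript store.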


The Solution


Faster, cheaper, and more secure. 

— T