Letting Data Speak, AI Act!

Case Study

Real-Time AI Chatbot Platform’s Lambda to ECS Migration

About the Client

A technology startup in the AI/ML industry developing an intelligent conversational AI platform. The platform provides real-time group chat collaboration features powered by machine learning classification models. This case study demonstrates how serverless Lambda infrastructure can be successfully migrated to container-based ECS Fargate to resolve critical performance issues affecting ML-intensive workloads.

Challenge

The client’s Lambda-based backend faced three critical architectural failures threatening platform viability:


  • Memory Constraint Crisis: The Lambda function consistently hit 100% memory utilization at the 2048 MB limit when loading the ML classification model. This caused frequent out-of-memory crashes, function throttling, and complete service unavailability during peak periods. The ephemeral nature of Lambda meant the model had to be reloaded from S3 on every cold start.

  • Severe Cold Start Latency: Users experienced 7-8 second delays on initial requests as Lambda reloaded the entire ML model from S3 into memory. For a real-time chat application where sub-second responsiveness is expected, this created an unacceptable user experience driving customer complaints and churn.

  • Real-Time Communication Architecture Mismatch: The group chat feature was completely non-functional. The frontend used a Socket.IO client while the backend relied on API Gateway WebSocket, creating fundamental protocol incompatibility. WebSocket connections failed with timeout errors within seconds, and Lambda’s stateless execution model proved incompatible with WebSocket’s requirement for persistent long-lived connections. When User A sent messages, User B never received them due to lack of room-based broadcast logic.
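The missing broadcast behavior can be illustrated with a minimal, framework-free sketch (names and structure are ours, not the client's code): a message sent to a room should fan out to every member except the sender, which is exactly what the Lambda backend never did.

```python
# Hypothetical sketch of room-based broadcast logic; in production this
# role is played by a Socket.IO server, not a hand-rolled class.

class ChatRooms:
    def __init__(self):
        # room_id -> {user_id: deliver callback}
        self.rooms = {}

    def join(self, room_id, user_id, deliver):
        self.rooms.setdefault(room_id, {})[user_id] = deliver

    def broadcast(self, room_id, sender_id, text):
        # Fan out to every room member except the sender.
        for user_id, deliver in self.rooms.get(room_id, {}).items():
            if user_id != sender_id:
                deliver({"from": sender_id, "text": text})

# Usage: with room membership tracked, User A's message reaches User B.
inbox_b = []
rooms = ChatRooms()
rooms.join("team-1", "userA", lambda msg: None)
rooms.join("team-1", "userB", inbox_b.append)
rooms.broadcast("team-1", "userA", "hello")
# inbox_b -> [{"from": "userA", "text": "hello"}]
```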


Business Impact: Without resolution, the client faced customer churn, scalability issues, and potential platform shutdown due to the lack of core functionality. A structured POC was mandated to validate whether ECS Fargate could solve these issues before committing to a full production migration.

Key Results

  • Eliminated 100% of Cold Starts: Migrated to persistent ECS Fargate containers that maintain the 500 MB RoBERTa model in memory continuously, completely removing the 7-8 second initialization delay and providing instant response to all user requests.

  • Reduced Average Response Time by 94%: Achieved roughly one-second response times (964 ms for login, 1,045 ms for ML classification, 1,204 ms for chat) for 1,000 concurrent users in JMeter load testing, down from 17+ seconds in the Lambda architecture.

  • Validated Linear Horizontal Scalability: Demonstrated near-perfect scaling efficiency where increasing from 1 to 6 ECS containers reduced response times proportionally (17.6s → 1.0s for 500 users) while maintaining stable 73% CPU utilization with 27% headroom for traffic spikes.
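A back-of-the-envelope check of the scaling figures (assuming the reported 17.6 s with 1 container and 1.0 s with 6 containers at 500 users) shows why response time improves faster than container count: under saturation, queuing delay collapses once capacity matches demand, so latency drops super-linearly even though throughput itself scales roughly linearly.

```python
# Sanity check on the reported scaling numbers (illustrative only).
baseline_s = 17.6          # avg response time with 1 container (saturated)
scaled_s = 1.0             # avg response time with 6 containers
containers = 6

speedup = baseline_s / scaled_s          # overall latency improvement
per_container = speedup / containers     # improvement per added container
print(f"{speedup:.1f}x faster overall; {per_container:.2f}x per container")
```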

Solution

The team executed a structured 4-week POC following agile methodology, with three specialized roles: Technical Lead, Cloud Engineer, and Backend Engineer.

  • Architecture Design and Infrastructure Foundation - Analyzed the existing Lambda function to document its architecture and memory bottlenecks. Designed an ECS Fargate architecture with direct Application Load Balancer routing, rejecting the alternative API Gateway + VPC Link + NLB approach due to its 10-20 ms higher latency and increased operational cost. Created comprehensive Terraform Infrastructure-as-Code templates for the ECS cluster configuration, ALB, security groups, IAM roles, and auto-scaling policies.

  • Containerization - Developed optimized Dockerfiles using python:3.10-slim base image with multi-stage builds, baking the 500 MB RoBERTa model directly into container images to eliminate runtime S3 downloads. Refactored Lambda event handlers to Flask REST API endpoints and replaced API Gateway WebSocket with Flask-SocketIO using eventlet async workers. Pushed all versioned container images to Amazon ECR.

  • Performance Test Suite Development - Provisioned dedicated JMeter infrastructure and created comprehensive test plans simulating complete user journeys across login, classification, and chat APIs. Developed automated test execution framework with configurable thread groups for 100, 500, and 1,000 concurrent users. Created test data generation scripts producing realistic message payloads and user credentials for parameterized testing with CSV data sets.

  • Optimization, Validation, and Documentation - Executed systematic performance testing across 16 configurations varying ECS container count (1-6), Gunicorn workers (1-4), and threads per worker (4-16). Identified CPU as the primary bottleneck for ML inference and discovered Python’s Global Interpreter Lock (GIL) prevents thread-based parallelism, requiring multi-process workers. Determined optimal configuration: Workers = CPU cores with 8 threads per worker, providing best throughput/resource balance. Validated linear scalability with 1,000 users achieving 1.1s response time at 73% CPU. Delivered complete Terraform templates, technical documentation with architecture diagrams, and performance analysis reports.

  • Critical Bug Resolution - Proactively identified and fixed several production-blocking bugs during POC validation: a LocalStorage key mismatch breaking “My Chats” navigation, message alignment issues requiring userId in WebSocket payloads, JWT error handling, missing UI icons (notification bell, user placeholder), sender name display in group chat, and WebSocket authentication flow corrections.

  • POC Deliverables Achieved - Container Infrastructure (1 Lambda function containerized and pushed to ECR), AWS Infrastructure as Code (complete Terraform templates), Performance Test Suite (JMeter scripts, test data generation, automated execution framework), Performance Test Results (baseline and load test reports with response time analysis), and comprehensive Documentation (technical guides, architecture diagrams, knowledge transfer sessions).
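The worker tuning described above (worker processes equal to CPU cores, since the GIL blocks thread-based parallelism for CPU-bound inference, plus 8 threads per worker for I/O overlap) can be sketched as a Gunicorn configuration file. The exact values and bind address are illustrative assumptions, not the client's production config.

```python
# Illustrative gunicorn.conf.py reflecting the POC's optimal configuration.
import multiprocessing

workers = multiprocessing.cpu_count()   # "Workers = CPU cores": processes
                                        # sidestep the GIL for ML inference
threads = 8                             # 8 threads per worker for I/O overlap
bind = "0.0.0.0:8000"                   # assumed container port
timeout = 120                           # headroom for slower inference requests
```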

Technologies Used

  • Amazon ECS Fargate

  • Flask-SocketIO

  • Gunicorn with Eventlet Workers

  • Docker & Amazon ECR

  • Application Load Balancer (ALB)

  • Terraform

  • Apache JMeter

  • AWS Cognito

  • Amazon DynamoDB

  • Transformers (Hugging Face) & PyTorch

  • Python 3.10

Other Case Study Items

Implementation of Cloud-Agnostic Smart Meter Billing Solution

A leading Indian smart meter provider partnered with JashDS to transform their AWS-locked system into a cloud-agnostic solution built on Kubernetes, achieving an 80% reduction in processing time for managing millions of consumer accounts. The new system revolutionized smart meter management through the implementation of FastAPI and TimescaleDB, enabling efficient charge calculations, automated connection management, and comprehensive usage tracking for 6 million consumers.

Modernizing Data Ingestion for Green Energy AI

JashDS modernized and automated data ingestion for a green energy AI solutions provider by developing a pipeline_builder library, reducing pipeline creation time by 40%, and improving data accessibility for 40+ utility sources.

Revolutionizing Data Infrastructure for AI-Driven Green Energy Solutions

JashDS revolutionized a green energy tech company's data infrastructure by implementing a scalable Matillion-based ETL solution and automated CI/CD processes, resulting in 2-3x faster client onboarding and a 35% reduction in Google Cloud costs. The comprehensive solution included reusable components, optimized SQL queries, and efficient data aggregation techniques, enhancing the client's ability to process vast amounts of utility data from 40+ companies and support their AI-driven green energy initiatives.
