DEO SHANKAR
Multi Cloud Architect • Level 9 Clearance
SYSTEMS OPTIMAL
AI AGENTS: 12 ACTIVE

NEURAL COMMAND CENTER

Real-time AI Systems Monitoring & Control

2M+ Documents Processed
13Y Experience
500+ Services Migrated
99.9% Uptime

🚀 CURRENT IMPACT

Live systems processing millions of transactions daily

Agentic RAG System

DEPLOYED

Enterprise-scale RAG with LangGraph + MCP Protocol. Self-correcting agents achieve 87% F1 score across 2M+ documents.

87% F1 Score
60% Fewer Hallucinations
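
For readers less familiar with the metric cited above: F1 is the harmonic mean of precision and recall. The sketch below only shows the arithmetic. The precision and recall inputs are illustrative assumptions, not measurements from this system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs only (assumed, not taken from the deployed system).
example_f1 = f1_score(precision=0.90, recall=0.84)
```

Because the harmonic mean is pulled toward the lower of the two inputs, a high F1 requires both precision and recall to be high.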

Payment Processing

SCALING

Distributed system handling 5M+ daily transactions with 99.9% uptime. Event-driven architecture with Kafka.

5M+ Daily TXN
50ms p99 Latency

💼 EXPERIENCE MATRIX

13 years of building systems that scale and survive

Solutions Architect

Tiger Analytics | 2023 - Present

CURRENT

Built agentic RAG with LangGraph + MCP + Bedrock for 2M documents. Autonomous agents self-correct to achieve 87% F1 score.

87% F1 Score | 60% fewer hallucinations | 2M+ documents

Senior Cloud Engineer

Oracle | 2020 - 2022

Migrated 50TB Oracle DB to OCI using GoldenGate with zero downtime. Built facial recognition pipeline on OKE with 4x A100s for 500K images.

500+ services | 30% cost reduction | Zero downtime

Technical Lead

Paytm | 2019 - 2020

Crisis: Black Friday outage under an 8x traffic spike. Implemented ProxySQL for connection multiplexing plus circuit breakers within 2 hours. Reduced cart abandonment by 15% ($2M ARR).

5M+ daily TXN | $2M ARR impact | 75% latency reduction

⚡ CORE EXPERTISE

Multi-cloud architecture & cutting-edge AI systems

☁️ Cloud Computing

  • AWS (Certified Solutions Architect)
  • Google Cloud Platform (GCP)
  • Oracle Cloud Infrastructure (OCI)
  • Multi-Cloud Architecture
  • Cloud Cost Optimization

🤖 AI/GenAI Systems

  • Generative AI (GenAI)
  • Agentic AI Systems
  • RAG Architecture
  • ML/Transformers
  • LangGraph & LangChain

🏗️ Architecture & Systems

  • Distributed Systems
  • Event-Driven Architecture
  • Microservices
  • System Design
  • High Availability & Scalability

🚀 DevOps & Infrastructure

  • Kubernetes
  • Infrastructure as Code (Terraform)
  • CI/CD Pipelines
  • Docker & Containerization
  • GitOps & Automation

🔬 RESEARCH & INNOVATION

Pushing boundaries in AI agent interoperability

Semantic Protocol Layer Translation for AI Agent Interoperability

IN REVIEW - FGCS

Novel approach to bidirectional protocol translation enabling seamless communication between heterogeneous AI agents. Production-tested with real-world results achieving 94% cross-protocol accuracy.

// Key Contributions
• Semantic layer for protocol abstraction
• Bidirectional translation engine
• 94% accuracy in cross-protocol communication
• Production-tested with 30+ AI agents
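
One way to picture the semantic layer listed above is to map each protocol's message into a protocol-neutral form and back out again. The sketch below is an illustration under stated assumptions, not the method from the paper. The `Intent` schema and the two wire formats ("A" and "B") are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Intent:
    """Protocol-neutral form of a tool call (hypothetical schema)."""
    action: str
    arguments: Dict[str, Any]


# Two made-up wire formats, used only to illustrate the idea.
def from_proto_a(msg: Dict[str, Any]) -> Intent:
    return Intent(action=msg["method"], arguments=dict(msg["params"]))


def to_proto_a(intent: Intent) -> Dict[str, Any]:
    return {"method": intent.action, "params": dict(intent.arguments)}


def from_proto_b(msg: Dict[str, Any]) -> Intent:
    return Intent(action=msg["tool"], arguments=dict(msg["input"]))


def to_proto_b(intent: Intent) -> Dict[str, Any]:
    return {"tool": intent.action, "input": dict(intent.arguments)}


def a_to_b(msg: Dict[str, Any]) -> Dict[str, Any]:
    """Translate A -> B by passing through the semantic layer."""
    return to_proto_b(from_proto_a(msg))


def b_to_a(msg: Dict[str, Any]) -> Dict[str, Any]:
    """Translate B -> A by passing through the semantic layer."""
    return to_proto_a(from_proto_b(msg))
```

Routing every translation through one neutral form means N protocols need N adapters rather than one translator per pair of protocols.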

🔒 secscan-cli

Security Scanner Tool

Multi-language security vulnerability scanner with support for Python, JavaScript, Java, and more.

1.4K+ downloads v1.0.3

🎓 MS in Data Science

IIM & IIT Indore

2022-2024 • Applied ML, Deep Learning, Statistical Analysis

Key Projects:
• Predictive Analytics for Healthcare
• NLP for Document Classification
• Time Series Forecasting

🔒 SECSCAN-CLI LIVE DEMO

Interactive security vulnerability scanner - Try it yourself!

📦 Quick Install

$ pip install secscan-cli

🖥️ Try It Live

secscan-cli v1.0.0
$ secscan /path/to/project
Scanning for vulnerabilities...
🌐 Multi-Language: Python, JS, Go, Java
🚀 CI/CD Ready: GitHub Actions, Jenkins
📊 Multiple Formats: JSON, CSV, SARIF
OSV.dev API: Real-time updates
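
To give a feel for the OSV.dev data source mentioned above, here is a rough sketch of a lookup against the public OSV query endpoint. It is not secscan's internal code. The helper names `build_osv_query` and `query_osv` are hypothetical, and the default ecosystem is an assumption for the example.

```python
import json
import urllib.request
from typing import Any, Dict, List

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name: str, version: str, ecosystem: str = "PyPI") -> Dict[str, Any]:
    """Build the JSON body for a single package/version lookup."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}


def query_osv(name: str, version: str, ecosystem: str = "PyPI",
              timeout: float = 10.0) -> List[str]:
    """Return the IDs of known vulnerabilities for one package version.

    Performs a network call; errors such as timeouts propagate to the caller.
    """
    body = json.dumps(build_osv_query(name, version, ecosystem)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # The response omits "vulns" when nothing is known for this version.
    return [vuln["id"] for vuln in payload.get("vulns", [])]
```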

📝 Common Usage Examples

# Basic scan
$ secscan
# JSON output for CI/CD
$ secscan -f json --ci --fail-on high
# Scan with auto-fix
$ secscan --fix --backup
# Generate SARIF report for GitHub
$ secscan -f sarif -o results.sarif
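
Once a SARIF report such as `results.sarif` exists, a pipeline step may want a quick severity summary. The sketch below assumes the file follows the standard SARIF 2.1.0 layout (`runs[].results[].level`). The helper names are hypothetical, and treating a missing `level` as `warning` is a simplification.

```python
import json
from collections import Counter
from typing import Any, Dict


def summarize_sarif(sarif: Dict[str, Any]) -> Counter:
    """Count results per SARIF level across all runs."""
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: a result without an explicit level counts as "warning".
            counts[result.get("level", "warning")] += 1
    return counts


def summarize_sarif_file(path: str = "results.sarif") -> Counter:
    """Load a SARIF file from disk and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_sarif(json.load(handle))
```

A CI job could then fail the build when `summarize_sarif_file()["error"]` is greater than zero.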

🧠 ENGINEERING PHILOSOPHY

Hard-earned principles from production battles

Boring > Clever

I've seen clever die in production. GraphQL federation across 8 services? Reverted to REST after 3 months. The Docker optimization algorithm that found 100% optimal solutions? I chose heuristics instead: 95% optimal in 30 seconds beats 100% optimal in 5 minutes.

"How does this fail?" > "How does this work?"

Every architecture review starts with failure modes. An ETL design of mine that shipped without CDC crashed production while processing 500M CDRs. Now I always ask: what happens when this breaks at 10x scale?

SYSTEM ARCHITECTURES

Agentic RAG Platform with LangGraph + MCP + Bedrock

Components:
• User Query Layer: L1 Support Agents
• API Gateway: Rate Limiting | Auth | Load Balancing
• LangGraph Orchestrator: Self-Correction Loops | Multi-Agent Coordination (Extract → Validate → Correct → Response)
• MCP Protocol Server: 30+ Specialized AI Agents, Tool Registry | Message Bus (60% Hallucination Reduction)
• AWS Bedrock: Claude 3 | Titan | Llama, Multi-Model Strategy (87% F1 Score)
• OpenSearch Vector DB: 2M+ Documents, Semantic Search (100ms P99 Latency)

Data Persistence Layer:
• S3 Data Lake: Policy Docs | Claims
• DynamoDB: Session State | Cache
• Aurora PostgreSQL: Metadata | Analytics

Monitoring:
• CloudWatch: Metrics | Alarms

Performance Metrics:
• Response Time: 3 min (was 15 min)
• Accuracy: 87% F1 Score
• Hallucination Rate: 7.2% (was 18%)
• Cost Savings: $2M/year
• Documents Processed: 2M+
• Agent Collaboration: 30+ agents

MCP: 30+ Agents
AWS: Bedrock
Vector: 2M+ Docs

87% F1 Score
60%↓ Hallucinations
3min Response
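
The Extract → Validate → Correct → Response loop in the architecture above can be sketched in plain Python. This is a minimal illustration, not the production LangGraph graph. The function name, the callback signatures, and the `max_rounds` limit are all assumptions.

```python
from typing import Callable, List, Tuple

Extractor = Callable[[str], str]
Validator = Callable[[str, str], List[str]]
Corrector = Callable[[str, str, List[str]], str]


def self_correcting_answer(
    query: str,
    extract: Extractor,
    validate: Validator,
    correct: Corrector,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Draft an answer, then validate and correct it until it passes.

    Returns the final draft and the number of correction rounds used.
    If max_rounds is exhausted, the last corrected draft is returned
    without a final validation pass.
    """
    draft = extract(query)
    for round_no in range(max_rounds):
        issues = validate(query, draft)
        if not issues:
            return draft, round_no
        draft = correct(query, draft, issues)
    return draft, max_rounds
```

Bounding the loop keeps a stubborn validator from turning one query into an unbounded number of model calls.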

High-Scale Payment Platform with Event Sourcing

• Load: 5M+ Transactions Per Second
• Entry: ALB Cluster, API Gateway, ProxySQL
• Event Sourcing Layer: Kinesis Shards 1 to 5, 1M TPS each
• Lambda Event Processors: Auto-scaling 100-10,000 concurrent, 15ms avg processing time
• DynamoDB Event Store (Sharded): Table Shard 0, Table Shard 1, ... Table Shard 9 (each 4K WCU | 10K RCU | 500K TPS)
• Circuit Breaker: Hystrix, Fallback Logic, 99.9% uptime
• RDS Read Replicas: Async Projection, 100ms eventual consistency
• ElastiCache Redis: Hot Data Cache, 1ms latency
• SageMaker Endpoint: Fraud Detection, 89% catch rate

Kafka: Event Stream
Redis: Cache
PostgreSQL: CQRS

5M+ Daily TXN
50ms p99 Latency
99.9% Uptime
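
Event sourcing with a CQRS-style read model, as named above, comes down to an append-only log plus a projection that can be rebuilt by replaying it. The in-memory sketch below only shows that shape. The event kinds and field names are hypothetical, and a real system would use a durable store such as the ones listed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PaymentEvent:
    account_id: str
    kind: str          # hypothetical kinds: "authorized", "captured", "refunded"
    amount_minor: int  # amount in minor currency units


class EventStore:
    """Append-only log: events are never updated or deleted."""

    def __init__(self) -> None:
        self._log: List[PaymentEvent] = []

    def append(self, event: PaymentEvent) -> None:
        self._log.append(event)

    def read_all(self) -> List[PaymentEvent]:
        return list(self._log)


def project_balances(events: Iterable[PaymentEvent]) -> Dict[str, int]:
    """Read-side projection: net captured amount per account."""
    balances: Dict[str, int] = {}
    for event in events:
        # Kinds that move no money (e.g. "authorized") contribute zero.
        sign = {"captured": 1, "refunded": -1}.get(event.kind, 0)
        balances[event.account_id] = balances.get(event.account_id, 0) + sign * event.amount_minor
    return balances
```

Because the projection is derived entirely from the log, it can lag behind writes and still be rebuilt from scratch after a failure.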

Docker Image Optimization Platform

Input: Dockerfiles from 100+ Teams (Avg: 3.5GB)

Heuristic Optimization Engine:
• Merge RUN Commands: -40% layers
• Slim Base Images: -60% size
• Multi-Stage Builds: -70% size

Output: Optimized Images, 95% reduction (Avg: 350MB)

Before vs After Comparison (BEFORE → AFTER Optimization):
• Image Size: 3.5GB → 350MB
• Build Time: 25 min → 5 min
• Deploy Time: 15 min → 3 min
• ECR Cost: $500K/month → $200K/month
• Layers: 47 average → 12 average
• Cache Hit: 20% → 85%
• Dev Feedback: 5 min → 30 sec

Impact Metrics: 90% Size Reduction | $300K/mo Saved | 80% Faster Deploys | 95% Adoption

Before:
• Image: 1.2GB
• Build: 15min
• Layers: 47
After:
• Image: 67MB
• Build: 3min
• Layers: 12

94% Size Reduction
5x Faster Builds
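
Of the heuristics listed above, merging RUN commands is the easiest to show in a few lines. The sketch below is a simplified illustration, not the platform's engine. The function name is hypothetical, and it deliberately leaves exec-form and multi-line RUN instructions untouched.

```python
from typing import List


def merge_run_commands(dockerfile: str) -> str:
    """Merge consecutive shell-form RUN lines into one RUN joined by '&&'.

    Exec-form RUN (JSON array) and RUN lines ending in a backslash are
    passed through unchanged to keep the sketch safe.
    """
    output: List[str] = []
    pending: List[str] = []

    def flush() -> None:
        if pending:
            output.append("RUN " + " && \\\n    ".join(pending))
            pending.clear()

    for line in dockerfile.splitlines():
        stripped = line.strip()
        command = stripped[4:].strip()
        mergeable = (
            stripped.upper().startswith("RUN ")
            and not command.startswith("[")
            and not command.endswith("\\")
        )
        if mergeable:
            pending.append(command)
        else:
            flush()
            output.append(line)
    flush()
    return "\n".join(output)
```

Each RUN instruction creates an image layer, so folding adjacent ones together reduces the layer count without changing what gets installed.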

Model Context Protocol (MCP) Integration

MCP Protocol: Message Bus | Tool Registry
• Data Agents: ETL, SQL, NoSQL, Stream, Batch
• ML Agents: Train, Infer, Eval, NLP, Vision
• Code Agents: Review, Gen, Test, Debug, Doc
• Ops Agents: Deploy, Monitor, Alert, Scale, Heal

Hallucination: -60% | Tool Reuse: 80% | Dev Speed: 3x
Bidirectional Protocol Translation | Event-Driven Communication | Tool Discovery

MCP-Based Multi-Agent Orchestration

MCP
• Extract Agent
• Validate Agent
• Transform Agent
• Analyze Agent
• Report Agent
• Monitor Agent
Protocol Translation: 94% Cross-Protocol Accuracy
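
The tool registry and tool discovery ideas above can be illustrated with a small in-process registry. This is a conceptual sketch only. It does not use or reproduce the MCP SDK, and the class, method names, and tags are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Set


class ToolRegistry:
    """Minimal in-process registry: agents register tools, others discover them by tag."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}
        self._tags: Dict[str, Set[str]] = {}

    def register(self, name: str, func: Callable[..., Any], tags: Iterable[str]) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = func
        self._tags[name] = set(tags)

    def discover(self, tag: str) -> List[str]:
        """Return the names of all tools carrying the given tag, sorted."""
        return sorted(name for name, tags in self._tags.items() if tag in tags)

    def call(self, name: str, **kwargs: Any) -> Any:
        """Invoke a registered tool by name; raises KeyError if unknown."""
        return self._tools[name](**kwargs)
```

Sharing one registry across agents is what makes tool reuse possible: a validation tool registered once can be discovered and called by any agent that needs it.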

PEER ENDORSEMENTS

"I have known Deo for last 2 years since the time he has joined Oracle. He has excellent technical knowledge around cloud solutions right from automation, IaaS, DevOps to Networking etc. He articulates his thoughts very well while delivering a session or having a deep technical conversation with customers or colleagues. His positive attitude and never say 'No' even in a toughest of situations is an asset for any organization. Above all a great human being."

Nitin Kaushik
Director - Cloud Solution Engineering, Oracle
Managed Deo directly at Oracle
CLOUD LEADERSHIP

CONFIDENCE vs COMPETENCE TRAJECTORY

13 Years: From "I know everything" to "I know what I don't know"

Stages: Mount Stupid → Valley of Despair → Slope of Enlightenment → Plateau of Sustainability
Lesson 1: "No CDC = Production will crash"
Lesson 2: "Scale breaks everything"
Lesson 3: "Boring > Clever"

FAILURE LOGS (LESSONS LEARNED)

INCIDENT: BLACK_FRIDAY_2019
Timestamp: 2019-11-29 10:00:00 IST
# System Collapse Detected
Traffic: 8x normal (1M concurrent users)
Connection Pool: EXHAUSTED (100/100)
Response Time: TIMEOUT (>30s)
Revenue Loss Rate: $10K/minute

# Root Cause Analysis
- Fixed connection pool (hardcoded 100)
- No circuit breakers implemented
- Load testing only covered 2x traffic
- Single point of failure: main DB

# Emergency Fix (2 hours)
1. Deployed ProxySQL for connection multiplexing
2. Implemented Hystrix circuit breakers
3. Scaled read replicas from 1 to 5
4. Added aggressive caching layer

# Lessons Encoded
- Always design for 10x predicted load
- Circuit breakers are NOT optional
- Connection pools must be dynamic
- Boring solutions > Clever solutions
- Test failure modes, not success paths

Status: RESOLVED | Revenue Saved: $1M+
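
The circuit-breaker lesson from this incident can be sketched in a few lines. The code below is a generic illustration of the pattern in Python, not the Hystrix configuration used in the fix. The class name, thresholds, and injectable clock are assumptions.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                # Open: fail fast instead of piling more load on a sick dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self._failures >= self._failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Failing fast while the breaker is open is what protects a fixed-size connection pool: callers get an immediate error or fallback instead of holding connections until they time out.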
                            

KNOWLEDGE STREAMS

📝 LATEST POST

Why Your RAG System Hallucinates

Deep dive into semantic drift, context window limitations, and how self-correcting agents reduced hallucinations by 60%...

🔧 TECHNICAL DEEP DIVE

Circuit Breakers in Production

Real-world implementation of Hystrix patterns that saved $2M during Black Friday. Code samples included...

🚀 CASE STUDY

Zero-Downtime 50TB Migration

How we moved Oracle's largest database to OCI using GoldenGate while serving 500+ microservices...
