Hello, I'm
Rohan Patil
Building AI Systems
That Scale
AI/ML Engineer with experience at Perplexity and Amazon, building production-grade LLM pipelines, RAG systems, and distributed ML infrastructure for real-world high-scale environments.
5+ yrs
Experience
25%
Latency Improvement
1M+
Requests handled
Selected Projects
Adaptive RAG Chatbot
Dynamic retrieval + query routing system for grounded LLM responses.
↓ Latency 40% | ↑ Accuracy 25%
Resume → Job Matching Agent
Agentic AI system for matching resumes to job descriptions using embeddings.
↑ Match Precision 30%
Real-Time ML Pipeline
Kafka + Spark streaming pipeline for low-latency feature engineering.
1M+ events/day processed
LLM Evaluation Dashboard
Tracking model performance, latency, and hallucination metrics.
Improved evaluation accuracy
Vector Search Engine
Hybrid FAISS + Redis retrieval system for semantic search.
Recall ↑ 20%
Inference Optimization System
Optimized GPU inference using batching and Triton.
Throughput ↑ 25%
Experience

AI/ML Engineer - Perplexity
June 2024 – Present · San Francisco, CA
- Architected RAG pipelines integrating vector search + web indexing.
- Built FAISS + Redis hybrid retrieval improving recall/precision tradeoff.
- Optimized Triton GPU inference → +25% throughput.
- Designed LLM routing (on-device + cloud) for sub-second latency.
- Improved factual consistency via ranking + citation pipelines.
- Built evaluation systems tracking latency, accuracy, UX metrics.
- Led 0→1 agentic AI features → +18% engagement.

AI/ML Engineer - Amazon
Oct 2019 – June 2023 · India
- Built batch + streaming pipelines using AWS, Spark, Kafka.
- Designed feature systems → +30% faster data access.
- Prevented training-serving skew in real-time ML systems.
- Built Kafka + Spark streaming pipelines for low-latency updates.
- Orchestrated ML workflows with Airflow + SageMaker.
- Built drift detection + monitoring datasets.
- Reduced infra cost by ~15% via optimization.