Resume

👋 Hello, I’m

Rohan Patil

Building AI Systems
That Scale 🚀

AI/ML Engineer with experience at Perplexity and Amazon, building production-grade LLM pipelines, RAG systems, and distributed ML infrastructure for real-world high-scale environments.


5+ yrs

Experience

25%

Latency Improvement

1M+

Requests handled

Selected Projects

Adaptive RAG Chatbot

Dynamic retrieval + query routing system for grounded LLM responses.

LangGraph · FastAPI · Qdrant · OpenAI

↓ Latency 40% | ↑ Accuracy 25%

Resume → Job Matching Agent

Agentic AI system for matching resumes to job descriptions using embeddings.

LangGraph · Embeddings · Vector DB

↑ Match Precision 30%

Real-Time ML Pipeline

Kafka + Spark streaming pipeline for low-latency feature engineering.

Kafka · Spark · AWS

1M+ events/day processed

LLM Evaluation Dashboard

Tracking model performance, latency, and hallucination metrics.

Python · LLMs · Evaluation

Improved evaluation accuracy

Vector Search Engine

Hybrid FAISS + Redis retrieval system for semantic search.

FAISS · Redis · Embeddings

Recall ↑ 20%

Inference Optimization System

Optimized GPU inference using batching and Triton.

Triton · GPU · Kubernetes

Throughput ↑ 25%

Experience

Perplexity

AI/ML Engineer, Perplexity

June 2024 – Present · San Francisco, CA

  • Architected RAG pipelines integrating vector search + web indexing.
  • Built FAISS + Redis hybrid retrieval improving recall/precision tradeoff.
  • Optimized Triton GPU inference → +25% throughput.
  • Designed LLM routing (on-device + cloud) for sub-second latency.
  • Improved factual consistency via ranking + citation pipelines.
  • Built evaluation systems tracking latency, accuracy, UX metrics.
  • Led 0→1 agentic AI features → +18% engagement.

Amazon

AI/ML Engineer, Amazon

Oct 2019 – June 2023 · India

  • Built batch + streaming pipelines using AWS, Spark, Kafka.
  • Designed feature systems → +30% faster data access.
  • Prevented training-serving skew in real-time ML systems.
  • Built Kafka + Spark streaming pipelines for low-latency updates.
  • Orchestrated ML workflows with Airflow + SageMaker.
  • Built drift detection + monitoring datasets.
  • Reduced infra cost by ~15% via optimization.

Tools & Technologies

AI / ML

PyTorch · TensorFlow · Scikit-learn · NumPy · Pandas · Time Series · Model Evaluation

LLM / GenAI

RAG · LangChain · LangGraph · OpenAI APIs · Embeddings · Prompt Engineering · Semantic Search · LLM Evaluation

Data Engineering

Apache Spark · Kafka · Airflow · ETL Pipelines · Parquet · Streaming Systems · Feature Engineering

Infrastructure & Cloud

AWS · Kubernetes · Docker · SageMaker · Redshift · GPU Inference · Triton Server · CI/CD · Prometheus · Grafana

Let’s Build Something Great 🚀

I’m open to AI/ML Engineering roles, collaborations, and interesting problems. Feel free to reach out. I’d love to connect.

© 2026 Rohan Patil · Built with Next.js 🚀