Professional Summary
Data Scientist with 2 years of experience deploying LLM/NLP apps, building vector-search recommendations,
and partnering with stakeholders to deliver RAG, AI agents, and AWS ML solutions reducing churn by 15%.
Key Achievements
500+
New User Acquisitions
20%
Engagement Boost
30%
Turnaround Time Reduction
95%
Fraud Detection Recall
Technical Skills
Languages & Libraries
SQL
Python
PySpark
Pandas
NumPy
Scikit-learn
TensorFlow
PyTorch
Tools & Technologies
Hugging Face
Git
Tableau
Power BI
MS Excel
Cloud & Deployment
AWS EC2
AWS ECS
AWS Lambda
AWS SageMaker
AWS Bedrock
MS Azure
Streamlit
Docker
CI/CD
MLOps
Machine Learning
Regression
Classification
Clustering
Deep Neural Networks
Random Forest
XGBoost
NLP
Statistical & Analytical
Predictive Modelling
Statistical Analysis
Time Series Forecasting
A/B Testing
Data Mining
Data Engineering
Spark
Airflow
Kafka
MySQL
PostgreSQL
MongoDB
ETL
GenAI
MCP
LLM
LLM Fine-Tuning
LoRA
QLoRA
PEFT
AI Agents
Quantization
LangChain
LangGraph
CrewAI
Other Skills
Business Analysis
Root Cause Analysis
Issue Tree Framework
Work Experience
Data Scientist
GradMeet LLC | Oct 2024 – May 2025
- Launched a vector-search-driven recommendation engine on AWS EC2, driving 500+ new user acquisitions and boosting engagement by 20% through data-driven personalization
- Boosted satisfaction for 100+ users by integrating NLP-driven sentiment analysis and regex into feedback processing, enhancing personalized experiences
- Built an internal RAG system to organize documents and enable fast, accurate retrieval
- Partnered with Marketing and Finance teams to validate growth strategies using A/B testing and root cause analysis, directly supporting business objectives
- Productionized analytics with AWS, Docker, and CI/CD, improving reliability and time-to-insight for iterative experiments
Data Scientist
Indiana University Bloomington | Sep 2023 – Aug 2024
- Enhanced spatial genomic data, improving cluster quality by 18% (Silhouette score) and 68% (Davies-Bouldin index) using advanced statistical methods
- Engineered a Graph Neural Network that integrated 6,000 genes and 23,000 drug–cell line pairs via a feed-forward network, reducing prediction error (RMSE) by 7% and improving goodness of fit (R-squared) by 2%
- Streamlined validation workflows for 15,000+ RNA-seq samples, reducing manual review time by 15% and accelerating three concurrent lab research projects
- Translated complex 2D gene patterns into actionable insights and presented key findings to stakeholders, supporting data-driven decision-making in biomedical research
Associate Instructor
Indiana University Bloomington | Jan 2023 – May 2023
- Mentored 30+ students in machine learning workshops, resulting in a 2% average performance improvement across participants
Research Data Analyst
Biostatistics Consulting Center, Indiana University Bloomington | Aug 2022 – Dec 2022
- Spearheaded analysis of 35,000+ COVID-19 PCR test records, pinpointing and remedying process bottlenecks to slash lab turnaround time by 30%
- Built Power BI dashboards for real-time test tracking, cutting testing backlogs by 40% and saving over 10 staff hours weekly with Python automation
- Maintained 99% diagnostic accuracy by rigorously validating laboratory processes with ANOVA and t-tests, supporting reliable results for critical healthcare decisions
Education
MS in Data Science
Indiana University, Bloomington, IN, USA | May 2023
Post Graduate Diploma in Data Science
IIIT Bangalore, India | Aug 2021
Bachelor's in Computer Science
Symbiosis International University, Pune, India | May 2019
Featured Projects
ArXiv Multi-Agent Research Assistant
AI-powered system for researching top papers using CrewAI and LangChain
CrewAI
LangChain
Python
Real-time Sentiment Analysis Pipeline
Streaming data pipeline for Reddit sentiment analysis using Kafka and Spark
Kafka
Spark
NLP
Credit Card Fraud Detection
XGBoost model deployed on Azure with 95% accuracy
XGBoost
Azure
ML
RAG Chat Application
PDF chat application using locally hosted LLMs and ChromaDB
RAG
ChromaDB
Ollama
Certifications & Publications