Nitanshu Joshi

Data Scientist | Machine Learning Engineer | AI Engineer

Transforming business problems into AI-based applications

About Me


Results-driven Data Scientist with 2 years of experience building scalable ML solutions in industry and academia.

Deployed recommendation systems and predictive models to boost user growth and retention.

Skilled in real-time analytics, NLP, AWS, and Azure; expert at transforming data into actionable insights.

Proficient in Python, SQL, Power BI, Tableau and advanced statistics, with a passion for continuous learning.

Tech Stack

Data Science & ML

TensorFlow, scikit-learn, PyTorch, Python, GitLab CI/CD

LLM & GenAI

Hugging Face, LangChain, ChromaDB, LangGraph, CrewAI

Data Engineering

PostgreSQL, Snowflake, Amazon Redshift, Apache Spark, Kafka

Data Analysis & Business Analysis

Tableau, Power BI, Python, Excel

MLOps & CI/CD

Docker, Git, MLflow, Airflow

Cloud

Snowflake, Streamlit, AWS, Azure

My Projects

Have a Glimpse

EcoAction AI
AI Agent, CrewAI, OpenAI GPT-4, Python, Streamlit

A hyper-personalized, AI-powered coaching platform that translates the abstract goal of "living sustainably" into a manageable, engaging, and highly effective personal journey.

arXiv Research Assistant (Multi-Agent)
AI Agent, CrewAI, LM Studio, Local LLM, Python, Streamlit

Designed a locally hosted, LLM-based AI agent for finding and researching top papers on arXiv.

Reddit Sentiment Analysis
Data Engineering, NLP, Classification, Data Streaming, Python, Hugging Face

Designed a pipeline that streams data from Reddit and then runs sentiment classification on it.

RAG App for Chatting with PDFs
Ollama, Streamlit, Llama 3.1, LangChain, Python

Designed a RAG application for chatting with PDFs, using a locally hosted LLM served through Ollama.

Hospital CMS Rating Improvement
Business Analysis, Regression, Data Analysis, Data Visualization, Root Cause Analysis

Analyzed CMS rating data and performed a business analysis of the root causes behind a hospital's poor performance and low CMS score.

Sales Forecasting for Global Mart
Data Analysis, Time Series Forecasting, Data Visualization, Root Cause Analysis, Python

Forecasted six months of sales for a retail chain by analyzing time series data with smoothing and autoregressive methods.

Credit Card Fraud Detection
Data Analysis, Azure, XGBoost, End-to-End ML, Root Cause Analysis, Python

Designed and deployed an XGBoost model on Azure after performing root cause analysis of the likely drivers of credit card fraud.

Telecom Churn Analysis
Data Analysis, Logistic Regression, Churn Analysis, Root Cause Analysis, Python

Designed an ML-based analytical technique to identify and predict the drivers of churn in the telecom sector.

Articles

Insights and knowledge sharing

10 Interesting MLOps Interview Questions for ML Engineers

Essential Questions to Prepare for Your Next Role in MLOps & ML Engineering

Read Article
Production ML Is Different: A Brutally Honest Guide to MLOps Pipelines

The 11-Stage Pipeline That Separates Data Scientists From ML Engineers

Read Article
Why Your RAG Sucks (And How to Fix It)

From Generic Chunking-based RAG to Category-Aware RAG: A Deep Dive into Production-Ready Document Processing

Read Article
A Multi-Agent arXiv Research Assistant

A multi-agent framework that helps research the top AI papers on arXiv

Read Article
The Future of LLMs in Business

My Experiments with Sleep

Time series analysis of sleep

Read Article

Active Projects

Ongoing projects and contributions

Professional Experience

A journey of growth and innovation

Analytics Engineer
Swiss Re
November 2025 – Present | Bangalore, India
Python, Palantir Foundry, Document Search, NLP, MS Azure
Data Scientist
GradMeet LLC
October 2024 – May 2025 | Remote, USA
  • Launched a vector-search-based recommendation engine on AWS EC2 that drove 500+ new user acquisitions and boosted engagement by 20% through data-driven personalization
  • Improved satisfaction for 100+ active users by adding NLP-based sentiment analysis and regex parsing to feedback processing, enabling more personalized experiences
  • Built an internal RAG system to organize documents and enable fast, accurate retrieval
  • Partnered with Marketing and Finance teams to validate growth strategies using A/B testing and root cause analysis, directly supporting business objectives
  • Productionized analytics with AWS, Docker, and CI/CD, improving reliability and time-to-insight for iterative experiments
Python, PostgreSQL, AWS EC2, Vector Search, NLP, RAG, Docker, CI/CD
Data Scientist
Indiana University Bloomington
September 2023 – August 2024 | Bloomington, IN, USA
  • Improved clustering of spatial genomic data with advanced statistical methods, raising cluster quality by 18% (Silhouette score) and 68% (Davies-Bouldin index)
  • Engineered a Graph Neural Network that integrated 6,000 genes and 23,000 drug–cell line pairs through a feed-forward network, reducing prediction error (RMSE) by 7% and increasing R-squared (variance explained) by 2%
  • Streamlined validation workflows for 15,000+ RNA-seq samples, reducing manual review time by 15% and accelerating three concurrent lab research projects
  • Translated complex 2D gene patterns into actionable insights and presented key findings to stakeholders, supporting data-driven decision-making in biomedical research
Python, Graph Neural Networks, PyTorch, Bioinformatics, Statistical Analysis
Associate Instructor
Indiana University Bloomington
January 2023 – May 2023 | Bloomington, IN, USA
  • Mentored 30 Master's and PhD students in machine learning workshops, resulting in a 2% average performance improvement across participants
R Programming, Python, Statistical Analysis, Machine Learning, Bioinformatics, Teaching
Research Data Analyst
Biostatistics Consulting Center, Indiana University Bloomington
August 2022 – December 2022 | Bloomington, IN, USA
  • Spearheaded analysis of 35,000+ COVID-19 PCR test records, pinpointing and remedying process bottlenecks to slash lab turnaround time by 30%
  • Built Power BI dashboards for real-time test tracking, cutting testing backlogs by 40% and saving over 10 staff hours weekly with Python automation
  • Ensured 99% diagnostic accuracy by rigorously validating laboratory processes with ANOVA and t-tests, providing reliable results for critical healthcare decisions
Python, MS Excel, Power BI, Statistical Analysis, ANOVA, Clinical Data Analysis

Education

Academic journey that shaped my expertise

Master of Science in Data Science
Indiana University, Bloomington, IN, USA
Aug 2021 - May 2023
Comprehensive program focusing on advanced statistical modeling, machine learning algorithms, and big data analytics. Specialized in predictive modeling, deep learning, and data visualization techniques with hands-on experience in real-world datasets.
Key Coursework:
Advanced Machine Learning, Deep Learning, Statistical Modeling, Big Data Analytics, Data Mining, Bayesian Methods, Time Series Analysis, Natural Language Processing
Post Graduate Diploma in Data Science
IIIT Bangalore, India
Aug 2020 - Aug 2021
Intensive industry-focused program designed to bridge the gap between academic knowledge and practical applications. Emphasized real-world business problems, industry case studies, and hands-on projects with leading technology companies.
Key Coursework:
Business Analytics, Predictive Modeling, Data Visualization, Applied Statistics, Market Research, Operations Research
Bachelor of Science in Computer Science
Symbiosis International University, Pune, India
May 2019
Comprehensive undergraduate program providing a strong foundation in computer science fundamentals, software engineering principles, and emerging technologies.
Key Coursework:
Artificial Intelligence, Data Structures & Algorithms, Database Management, Software Engineering

Contact

Let's connect and collaborate