Learn Data Science in 90 Days: A Complete Roadmap
Data science is one of the most transformative and in-demand disciplines of the 21st century. By combining statistics, programming, and domain expertise, data scientists extract meaningful insights from data to drive informed decision-making across industries, from healthcare to finance and from e-commerce to public policy.
This 90-day roadmap is designed for dedicated learners willing to commit 3–4 hours daily (or 20–25 hours per week) to go from absolute beginner to job-ready data scientist. The plan is structured into three 30-day phases, each building on the previous one, ensuring a progressive, scaffolded learning experience.
Who is this for? Career switchers, self-taught programmers, recent graduates, or anyone who wants a structured, time-bound path into data science.
The core areas you'll master include:
- Python programming (required by 86% of data science job postings in one analysis of 101 postings)
- SQL for data querying and manipulation
- Statistics & probability — the mathematical backbone of all modeling
- Machine learning — from linear regression to ensemble methods
- Data visualization & storytelling — turning analysis into impact
- Portfolio building & career preparation
Below is a high-level visual of your 90-day journey:
Footnotes
- Coursera – Data Science Learning Roadmap for Beginners: comprehensive beginner-to-expert roadmap covering essential topics and courses.
- Dawn Choo / LinkedIn – Most In-Demand Data Science Skills in 2025: analysis of 101 data science job postings showing Python at 86%, ML at 65%, and SQL demand trends.
Become a Data Scientist in 90 Days — Step-by-Step Roadmap
90-Day Data Science Learning Lifecycle
Python Fundamentals
Days 1–10: Set up your environment, learn Python syntax, data types, control flow, and functions. Install Jupyter Notebook and run your first scripts.
Data Manipulation with Pandas & NumPy
Days 11–20: Master the core data science libraries: NumPy for numerical operations and Pandas for data wrangling, cleaning, and transformation.
SQL & Statistics Foundations
Days 21–30: Learn SQL queries (SELECT, JOIN, GROUP BY, window functions) and core statistics: descriptive stats, probability distributions, and hypothesis testing.
Exploratory Data Analysis & Visualization
Days 31–45: Perform EDA on real datasets with Matplotlib, Seaborn, and Plotly. Learn to tell stories with data and build your first dashboards.
Machine Learning Fundamentals
Days 46–60: Implement supervised and unsupervised learning algorithms with scikit-learn: linear/logistic regression, decision trees, clustering, and model evaluation.
Advanced ML & Deployment
Days 61–75: Explore ensemble methods, feature engineering, model tuning with cross-validation, and deploy models using Flask, FastAPI, or Streamlit.
Portfolio, Networking & Job Prep
Days 76–90: Complete 3 capstone projects, publish to GitHub, build your personal brand on LinkedIn, craft your resume, and practice interview questions.
Figure: Weekly Time Allocation Across 90 Days (recommended hours per week for each skill area)
Common Pitfall: Tutorial Hell
Watching tutorials endlessly without practicing is the #1 reason learners fail. For every hour of video you watch, spend at least 2 hours coding on your own. Use platforms like Kaggle, LeetCode (SQL), and HackerRank to practice actively.
Phase 1: Foundations (Days 1–30)
The first 30 days are about building a rock-solid foundation. You cannot build a house on sand — and you cannot do data science without fluency in programming, data manipulation, and mathematical reasoning.
1.1 Python Programming (Days 1–10)
Python is the lingua franca of data science. According to one 2025 analysis of 101 job postings, 86% of data science roles require Python. Start with the fundamentals before touching any data science library.
Core topics to cover:
| Topic | Key Concepts | Estimated Hours |
|---|---|---|
| Python Basics | Variables, data types, operators, strings | 6 |
| Control Flow | if/elif/else, for loops, while loops | 4 |
| Data Structures | Lists, tuples, dictionaries, sets | 6 |
| Functions | def, lambda, args/kwargs, scope | 4 |
| OOP Basics | Classes, objects, methods, inheritance | 4 |
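To make the topics in the table concrete, here is a minimal sketch that touches each row once: data structures, control flow, functions with args/kwargs, a lambda, and a small class hierarchy. The names `Dataset`, `LabeledDataset`, and `summarize` are invented for illustration and are not part of any library.

```python
from collections import Counter


class Dataset:
    """Tiny container class illustrating attributes and methods."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


class LabeledDataset(Dataset):
    """Subclass (inheritance) that adds one label per value."""

    def __init__(self, name, values, labels):
        super().__init__(name, values)
        self.labels = list(labels)

    def label_counts(self):
        return dict(Counter(self.labels))


def summarize(*values, **options):
    """Return min, max, and mean; 'ndigits' is an optional keyword argument."""
    ndigits = options.get("ndigits", 2)
    mean = sum(values) / len(values)
    return {"min": min(values), "max": max(values), "mean": round(mean, ndigits)}


scores = [72, 88, 95, 61, 88]            # list: ordered, mutable
unique_scores = set(scores)              # set: duplicates removed
passed = [s for s in scores if s >= 70]  # control flow inside a comprehension
square = lambda x: x * x                 # lambda: small anonymous function

data = LabeledDataset("exam", scores, ["pass", "pass", "pass", "fail", "pass"])
print(summarize(*scores, ndigits=1))
print(data.mean(), data.label_counts())
```

Typing this out by hand and then changing one piece at a time (for example, adding a `median` method) is a reasonable first drill.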
Essential libraries to learn early:
- pandas: the workhorse of data manipulation
- numpy: fast numerical computation
- matplotlib: basic plotting
Use Jupyter Notebook as your primary development environment.
1.2 Data Manipulation (Days 11–20)
Once you know Python basics, shift to data wrangling: by common estimates, this is what you'll spend 60-80% of your time doing in real data science roles.
Pandas mastery checklist:
- Importing data: pd.read_csv(), pd.read_excel(), pd.read_sql()
- Inspection: .head(), .info(), .describe(), .shape
- Selection & filtering: .loc[], .iloc[], Boolean indexing
- Cleaning: handling missing values (.dropna(), .fillna()), duplicates (.drop_duplicates())
- Transformation: .groupby(), .agg(), .pivot_table(), .merge(), .join()
- Apply functions: .apply(), .map(), vectorized operations
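The checklist above can be exercised end to end on a toy table. The sketch below is one possible walk-through, not a canonical workflow: the `orders` and `customers` frames and their column names are made up, and in practice the data would come from something like `pd.read_csv()`.

```python
import numpy as np
import pandas as pd

# Hypothetical order data; in practice this would come from pd.read_csv(...)
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4, 5],
    "customer_id": [10, 11, 11, 10, 12, 11],
    "category": ["books", "toys", "toys", "books", "games", None],
    "amount": [20.0, 35.0, 35.0, np.nan, 50.0, 15.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["north", "south", "north"],
})

# Inspection
print(orders.shape)
orders.info()

# Cleaning: drop exact duplicate rows, then fill missing values
clean = orders.drop_duplicates().copy()
clean["amount"] = clean["amount"].fillna(clean["amount"].median())
clean["category"] = clean["category"].fillna("unknown")

# Selection and filtering with a Boolean mask and .loc
large = clean.loc[clean["amount"] >= 30, ["order_id", "amount"]]

# Transformation: merge, then group and aggregate
merged = clean.merge(customers, on="customer_id", how="left")
by_region = merged.groupby("region")["amount"].agg(["sum", "count"])

# Vectorized operation instead of a row-wise .apply()
merged["amount_with_tax"] = merged["amount"] * 1.1

print(large)
print(by_region)
```

Filling the missing amount with the median is only one choice; dropping the row with `.dropna()` would be equally valid depending on the question being asked.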
1.3 SQL (Days 21–25)
SQL remains one of the most critical skills: approximately 70-80% of data science roles list it as a requirement. You'll use SQL to extract, filter, and aggregate data from relational databases.
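One low-friction way to practice extracting and aggregating data is to run SQL from Python against an in-memory SQLite database, using the standard-library `sqlite3` module. The sketch below assumes two made-up tables, `customers` and `orders`, and a SQLite build of version 3.25 or later (needed for window functions; recent Python distributions typically bundle a newer one).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana', 'north'), (2, 'Ben', 'south'), (3, 'Cy', 'north');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 50.0), (4, 3, 10.0);
""")

# JOIN + GROUP BY: total order amount per region
totals = conn.execute("""
    SELECT c.region, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

# Window function: rank each order by amount within its customer
ranked = conn.execute("""
    SELECT customer_id, amount,
           RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rnk
    FROM orders
    ORDER BY customer_id, rnk
""").fetchall()

print(totals)
print(ranked)
conn.close()
```

The same queries should carry over to PostgreSQL or MySQL with little or no change, which makes SQLite a convenient sandbox before connecting to a real database.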
SQL learning path:
1.4 Statistics & Probability (Days 26–30)
Statistics is the intellectual foundation of data science. Without it, you're just guessing. Key areas:
- Descriptive statistics: mean ($\mu$), median, mode, variance ($\sigma^2$), standard deviation
- Probability distributions: Normal, Binomial, Poisson
- Central Limit Theorem: understanding why the distribution of sample means approaches $\mathcal{N}(\mu, \sigma^2/n)$ as the sample size $n$ grows
- Hypothesis testing: null/alternative hypotheses, p-values, confidence intervals
- Correlation vs. causation: Pearson's $r$, Spearman's $\rho$
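The ideas in this list are easier to retain once you have simulated them. The following sketch, which assumes NumPy and SciPy are installed and uses only synthetic data, computes descriptive statistics, checks the Central Limit Theorem empirically, runs a two-sample t-test, and compares Pearson and Spearman correlation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Descriptive statistics on a small sample
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mean, median = sample.mean(), np.median(sample)
variance, std = sample.var(), sample.std()  # population versions (ddof=0)

# Central Limit Theorem: means of samples drawn from a skewed (exponential)
# population are approximately normal, with spread close to sigma / sqrt(n)
n, sigma = 50, 1.0
sample_means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)

# Hypothesis test: two-sample t-test on synthetic groups with different means
group_a = rng.normal(loc=100.0, scale=10.0, size=200)
group_b = rng.normal(loc=105.0, scale=10.0, size=200)
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Correlation: Pearson's r (linear) and Spearman's rho (rank-based)
x = rng.normal(size=300)
y = 2.0 * x + rng.normal(scale=0.5, size=300)
r, _ = stats.pearsonr(x, y)
rho, _ = stats.spearmanr(x, y)

print(mean, median, variance, std)
print(sample_means.std(), sigma / np.sqrt(n))
print(p_value, r, rho)
```

A strong correlation here exists only because `y` was constructed from `x`; on real data, a high $r$ alone says nothing about which variable, if either, causes the other.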
Footnotes
- Dawn Choo / LinkedIn – Most In-Demand Data Science Skills in 2025: analysis of 101 data science job postings showing Python at 86%, ML at 65%, and SQL demand trends.
- Databricks – Uncovering Data Science: Skills, Careers, and Education: research on data science career pathways, required skills, and hiring signals.
Phase 1 Daily Study Protocol (Days 1–30)
1. Watch tutorials or read documentation on the day's core topic. Take handwritten notes for retention. Focus on understanding, not memorizing.
2. Open Jupyter Notebook and implement every concept yourself. Modify examples, break things, fix errors. Use LeetCode (Python) or HackerRank for structured drills.
3. Apply what you learned to a small, self-contained task. For example, after learning Pandas groupby, analyze a Kaggle dataset and find top categories.
4. Revisit your notes, summarize the day's learning in 3-5 bullet points, and preview tomorrow's topics. Spaced repetition dramatically improves retention.
Phase 2: Core Skills (Days 31–60)
With foundations in place, you now shift to the heart of data science: analysis, visualization, and machine learning.
2.1 Exploratory Data Analysis (Days 31–40)
Exploratory Data Analysis (EDA) is the disciplined practice of understanding your data before modeling. A thorough EDA typically reveals:
- Data distributions and outliers
- Missing data patterns
- Relationships between variables
- Potential feature engineering opportunities
EDA workflow tools: Use pandas-profiling (now published as ydata-profiling) for automated reports, then manually explore with seaborn (pairplots, heatmaps, boxplots, violin plots).
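Before reaching for automated reports, it helps to know the handful of pandas calls that cover the four EDA findings listed above. This is a minimal sketch on synthetic data; the column names (`age`, `income`, `spend`, `segment`) are hypothetical, and the 1.5 × IQR rule is just one common outlier heuristic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)

# Synthetic stand-in for a real dataset (column names are hypothetical)
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=200).astype(float),
    "income": rng.normal(50_000, 12_000, size=200),
    "segment": rng.choice(["a", "b", "c"], size=200),
})
df["spend"] = 0.02 * df["income"] + rng.normal(0, 100, size=200)
df.loc[::25, "age"] = np.nan          # inject missing values
df.loc[0, "income"] = 500_000.0       # inject one obvious outlier

# Distributions and summary statistics
print(df.describe())

# Missing data patterns: count of missing values per column
missing = df.isna().sum()

# Outliers via the 1.5 * IQR rule on one column
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)]

# Relationships between numeric variables
corr = df[["age", "income", "spend"]].corr()

# Category balance, a quick hint for later feature engineering
segment_share = df["segment"].value_counts(normalize=True)

print(missing, len(outliers), corr, segment_share, sep="\n\n")
```

Note how a single injected outlier can distort the income/spend correlation; rerunning `corr` after dropping the rows in `outliers` is a useful exercise.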
2.2 Data Visualization & Storytelling (Days 41–45)
Data without visualization is just noise. The most effective data scientists are not just analysts — they are storytellers.
Visualization stack:
| Library | Best For | Difficulty |
|---|---|---|
| Matplotlib | Custom, publication-quality plots | Medium |
| Seaborn | Statistical plots with minimal code | Low |
| Plotly | Interactive, web-based visualizations | Medium |
| Tableau / Power BI | Business dashboards, stakeholder reporting | Low |
Key principles:
- Choose the right chart type — bar charts for comparison, line charts for trends, scatter plots for relationships
- Minimize chart junk — remove unnecessary grid lines, borders, 3D effects
- Use color intentionally — highlight the key insight, not decoration
- Always label axes and provide context
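The four principles above translate directly into a few Matplotlib calls. The sketch below uses invented quarterly revenue numbers and the non-interactive `Agg` backend so it runs without a display; the color values and output filename are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Hypothetical quarterly revenue figures, used only for illustration
quarters = ["Q1", "Q2", "Q3", "Q4"]
revenue = [120, 135, 180, 150]

# Use color intentionally: grey for context, one accent for the key bar
best = revenue.index(max(revenue))
colors = ["#b0b0b0"] * len(revenue)
colors[best] = "#d62728"

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(quarters, revenue, color=colors)  # bar chart for comparison

# Minimize chart junk: drop the top/right borders and the grid
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(False)

# Always label axes and provide context; the title states the takeaway
ax.set_xlabel("Quarter")
ax.set_ylabel("Revenue (thousand USD)")
ax.set_title("Q3 had the highest revenue of the year")

fig.tight_layout()
fig.savefig("revenue_by_quarter.png", dpi=150)
```

Writing the title as the conclusion rather than a description ("Revenue by quarter") is one simple way to practice storytelling with a chart.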
2.3 Machine Learning Fundamentals (Days 46–60)
Machine learning is where many learners get most excited — but remember, without solid EDA and clean data, models are useless ("garbage in, garbage out").
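Supervised Learning algorithms to master:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon$$
The equation above represents linear regression, the simplest and most interpretable model: $y$ is the target, $x_1, \dots, x_n$ are the features, $\beta_0$ is the intercept, $\beta_1, \dots, \beta_n$ are the coefficients, and $\varepsilon$ is the error term. Start here, then progress to: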
| Algorithm | Type | Use Case | scikit-learn Class |
|---|---|---|---|
| Linear Regression | Supervised | Predicting continuous values | LinearRegression |
| Logistic Regression | Supervised | Binary classification | LogisticRegression |
| Decision Tree | Supervised | Interpretable classification/regression | DecisionTreeClassifier |
| Random Forest | Supervised | Ensemble, robust | RandomForestClassifier |
| K-Means Clustering | Unsupervised | Customer segmentation | KMeans |
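As a first hands-on step with the table's first row, the sketch below fits scikit-learn's `LinearRegression` to synthetic data generated from a known linear model (intercept 3, slopes 2 and -1, chosen arbitrarily) and checks that the fitted parameters land close to those values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(seed=42)

# Synthetic data from a known linear model: y = 3 + 2*x1 - 1*x2 + noise
X = rng.normal(size=(500, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = LinearRegression().fit(X, y)

print(model.intercept_)  # estimate of the intercept
print(model.coef_)       # estimates of the two slopes
```

Because the true parameters are known, this kind of experiment is a cheap way to build intuition: increase the noise scale or shrink the sample and watch the estimates drift.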
Model evaluation metrics you must know:
- Regression: $R^2$, MAE, RMSE
- Classification: Accuracy, Precision, Recall, F1-Score, ROC-AUC
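All of these metrics are available in `sklearn.metrics`. The sketch below computes the regression metrics on four hand-written values so the arithmetic can be checked by hand, then the classification metrics on a synthetic binary problem from `make_classification`; the dataset and model settings are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, f1_score, mean_absolute_error, mean_squared_error,
    precision_score, r2_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Regression metrics on hand-written values (errors: 1, 0, 1, 2)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 11.0])
mae = mean_absolute_error(y_true, y_hat)            # (1 + 0 + 1 + 2) / 4
rmse = np.sqrt(mean_squared_error(y_true, y_hat))   # sqrt((1 + 0 + 1 + 4) / 4)
r2 = r2_score(y_true, y_hat)                        # 1 - SS_res / SS_tot

# Classification metrics on a synthetic binary problem
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]  # ROC-AUC needs scores, not labels

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}

print(mae, rmse, r2)
print(metrics)
```

Note that ROC-AUC is computed from predicted probabilities while the other classification metrics use hard labels; mixing these up is a common early mistake.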
The 80/20 Rule of Machine Learning
In practice, roughly 80% of your time goes to data cleaning and feature engineering, and only about 20% to model building. Don't skip EDA to reach modeling faster: the quality of your data determines the quality of your model. A simple linear regression on clean, well-engineered features will often outperform a complex neural network trained on noisy, unprocessed data.
Phase 3: Specialization & Portfolio (Days 61–90)
The final phase transitions you from learner to practitioner. Your goal is to build a portfolio of 3 substantial projects that demonstrate real-world data science skills, then prepare strategically for the job market.
3.1 Advanced ML & Feature Engineering (Days 61–72)
Go beyond the basics:
- Ensemble methods: Bagging (Random Forests), Boosting (XGBoost, LightGBM, AdaBoost)
- Cross-validation: k-fold CV, stratified k-fold for reliable model assessment
- Hyperparameter tuning: GridSearchCV, RandomizedSearchCV, Bayesian optimization
- Feature engineering: creating interaction terms, binning, encoding cyclical features, target encoding
- Handling imbalanced data: SMOTE, class weights, under-sampling
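Several items in this list combine naturally in a single experiment. The sketch below, on a synthetic imbalanced dataset, uses stratified k-fold cross-validation inside `GridSearchCV` and handles imbalance with class weights rather than SMOTE (which lives in the separate imbalanced-learn package). The parameter grid is deliberately tiny and chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic imbalanced problem: roughly 90% negatives, 10% positives
X, y = make_classification(
    n_samples=600, n_features=12, weights=[0.9, 0.1], random_state=0
)

# Stratified k-fold keeps the class ratio similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# class_weight="balanced" reweights classes instead of resampling
model = RandomForestClassifier(class_weight="balanced", random_state=0)

# Small illustrative grid; real searches usually cover more values
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(model, param_grid, scoring="f1", cv=cv)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Scoring on F1 rather than accuracy matters here: on a 90/10 split, a model that always predicts the majority class would reach about 90% accuracy while being useless.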
Modern additions (2024–2025): Many data science roles now require familiarity with Generative AI and LLM tools. Consider adding:
- prompt engineering basics
- using OpenAI/HuggingFace APIs for text analysis tasks
- RAG (Retrieval-Augmented Generation) fundamentals
3.2 Model Deployment (Days 73–80)
A model on your laptop is not a product. Deploying your work demonstrates engineering maturity:
- Streamlit — fastest way to build ML web apps (under 1 hour)
- Flask / FastAPI — lightweight REST API frameworks
- Docker — containerize your application for reproducibility
- Cloud platforms: AWS (SageMaker), GCP (Vertex AI), or Heroku for hosting
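Whichever framework you pick, the core of a deployment is usually the same two steps: persist the trained model, then load it once and call it from a request handler. The sketch below shows that core with `joblib` (installed alongside scikit-learn) and the Iris dataset. The function `predict_from_payload` is a hypothetical handler body; a Flask or FastAPI route, or a Streamlit callback, would wrap something like it.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a small model and persist it, as a training script might
iris = load_iris()
model = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, model_path)

# At serving time the app loads the artifact once at startup
loaded = joblib.load(model_path)


def predict_from_payload(payload):
    """Hypothetical handler body: validate a JSON-like dict, return a prediction."""
    features = payload.get("features")
    if not isinstance(features, list) or len(features) != 4:
        raise ValueError("expected 'features' to be a list of 4 numbers")
    label = int(loaded.predict([features])[0])
    return {"label": label, "species": str(iris.target_names[label])}


print(predict_from_payload({"features": [5.1, 3.5, 1.4, 0.2]}))
```

Keeping validation and prediction in a plain function like this makes it testable without starting a web server, and the same function can later be reused inside a Docker image.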
3.3 Portfolio Projects (Days 81–88)
Your portfolio is your resume in data science. Build 3 projects covering different skill areas:
- End-to-End EDA Project — analyze a public dataset (e.g., NYC taxi, WHO health data), clean it, visualize insights, write a blog post
- Predictive Modeling Project — pick a Kaggle competition (e.g., House Prices, Titanic), build and tune a model, document your process
- Applied ML Dashboard — wrap a trained model in a Streamlit app and deploy it
Every project must include:
- A README.md with problem statement, methodology, and results
- Clean, commented code in Jupyter notebooks
- Visualizations with clear takeaways
- A deployed link or demo
3.4 Career Preparation (Days 89–90)
- Write a targeted resume highlighting projects and technical skills
- Optimize your LinkedIn profile with data science keywords
- Practice common interview questions: SQL queries, probability brainteasers, ML theory, and take-home case studies
- Network: attend local meetups, join data science Discord communities, contribute to open-source projects
Footnotes
- Towards Data Science – The 5 Data Science Skills You Can't Ignore in 2024: evolving skill requirements including deep learning, GenAI, and cloud/ML engineering overlap.
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Load and inspect data
df = pd.read_csv('dataset.csv')
df.info()  # prints its summary directly and returns None
print(df.describe())

# Train / test split (stratified so both sets keep the class balance)
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
```
End-to-End Data Science Project Workflow
1. Start with a clear, measurable question. For example: 'Can we predict customer churn with >80% accuracy using transaction history?' A well-defined problem drives every downstream decision.
2. Gather data from databases (SQL), APIs, CSVs, or web scraping. Assess data quality: check for missing values, duplicates, inconsistent formatting. Document every assumption.
3. Compute summary statistics. Plot distributions, correlations, and outliers. Ask: What patterns exist? What anomalies need investigation? EDA is where real insight lives.
4. Transform raw columns into model-ready features: encode categoricals, scale numeric variables, create interaction terms, handle datetime fields, and address skewness with log transforms.
5. Start with a simple baseline (e.g., Logistic Regression for classification). Then iterate: try Random Forests, XGBoost, or gradient boosting. Use k-fold cross-validation and compare metrics systematically.
6. Build a Streamlit or Flask app to make predictions available. Write a clear report explaining methodology, limitations, and business impact. Communication separates good data scientists from great ones.
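The feature-engineering and modeling steps of this workflow can be sketched with a scikit-learn `Pipeline`. The example below is one possible arrangement under stated assumptions: the churn-style data is synthetic, every column name is hypothetical, and ROC-AUC with 5-fold cross-validation is an arbitrary but common comparison setup.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(seed=7)

# Synthetic churn-style data; every column name here is hypothetical
n = 400
df = pd.DataFrame({
    "monthly_spend": rng.gamma(shape=2.0, scale=30.0, size=n),
    "tenure_months": rng.integers(1, 60, size=n),
    "plan": rng.choice(["basic", "plus", "pro"], size=n),
})
# Churn is made more likely for short-tenure customers
churn_prob = 1 / (1 + np.exp(0.08 * (df["tenure_months"] - 24)))
df["churned"] = (rng.random(n) < churn_prob).astype(int)

X = df.drop(columns="churned")
y = df["churned"]

# Feature engineering inside the pipeline so it is re-fit within each CV fold
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["monthly_spend", "tenure_months"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])

candidates = {
    "logistic_baseline": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Compare models on the same folds and the same metric
results = {}
for name, estimator in candidates.items():
    pipe = Pipeline([("preprocess", preprocess), ("model", estimator)])
    scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
    results[name] = scores.mean()

print(results)
```

Putting scaling and encoding inside the pipeline, rather than applying them to the full dataset first, avoids leaking information from validation folds into training.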
Figure: Data Science Job Skill Requirements (2025), based on analysis of 101 data science job postings
Avoid These Common Mistakes
- Skipping SQL — Roughly 70-80% of data science roles list SQL as a requirement. Don't assume Python replaces it.
- Ignoring statistics — ML algorithms are statistics dressed in code. Without statistical literacy, you cannot interpret results correctly.
- Too many tools, too little depth — Master Python + SQL + one BI tool deeply before adding more.
- No projects — Certificates without projects don't get interviews. Hiring managers want to see what you've built.
- Applying too late — Start networking and submitting applications in Phase 3, not after.
Recommended Resources & Weekly Schedule
Below is a suggested weekly structure to keep you on track across all 90 days:
| Day | Morning (60–90 min) | Afternoon/Evening (60–90 min) |
|---|---|---|
| Mon | New concept learning | Coding practice |
| Tue | Coding practice | Project work |
| Wed | New concept learning | Coding practice |
| Thu | Tutorial / documentation | Project work |
| Fri | Weekly review & quizzes | Blog / note-taking |
| Sat | Deep-dive project session (3–4 hrs) | — |
| Sun | Rest or light review | — |
Top free resources:
- Python: Python.org tutorial, Codecademy, Automate the Boring Stuff
- SQL: W3Schools SQL, StrataScratch, SQLZoo
- Statistics: Khan Academy Statistics, StatQuest (YouTube)
- ML: Andrew Ng's Coursera course, scikit-learn documentation
- Projects: Kaggle datasets, UCI ML Repository, data.gov
The key metric for success isn't hours studied — it's projects completed and concepts understood deeply enough to explain to others.
Knowledge Check
Approximately what percentage of data science job postings require Python, according to recent analysis?