Back to Projects

SQL PROJECT

COVID-19 Data Analysis

A comprehensive exploration of global pandemic data using advanced SQL techniques to track infection rates, mortality, and vaccination progress.

SQL Joins CTEs Window Functions Temp Tables Views Data Conversion

Project Overview

This project analyzes the Covid Deaths and Covid Vaccinations datasets from "Our World in Data". The goal is to derive meaningful insights regarding the likelihood of mortality, infection rates relative to population, and the global rollout of vaccinations.

The analysis follows a structured approach, starting from basic data exploration and moving towards complex joins and window functions to calculate rolling metrics.

Usage Instructions

  1. Clone the repository via Git.
  2. Import the Our World in Data dataset into your SQL Server environment.
  3. Run the queries below sequentially to reproduce the analysis.

Key Analysis & SQL Queries

1. Total Cases vs Total Deaths

Evaluates the likelihood of dying if you contract COVID-19 in a specific country (e.g., India).

SQL
SELECT Location, date, total_cases, total_deaths, 
(CAST(total_deaths AS decimal) / total_cases) * 100 AS DeathPercentage
FROM [Covid Data Exploration]..CovidDeaths
WHERE location LIKE '%india%' 
AND continent IS NOT NULL 
ORDER BY 1,2

2. Total Cases vs Population

Examines what percentage of the population has contracted COVID-19.

SQL
SELECT Location, date, population, total_cases, 
(total_cases/population)*100 AS DeathPercentage
FROM [Covid Data Exploration]..CovidDeaths
WHERE location LIKE '%india%'
AND continent IS NOT NULL 
ORDER BY 1,2

3. Highest Infection Rates

Identifies countries with the highest infection rates relative to their population.

SQL
SELECT Location, population, MAX(total_cases) AS HighestInfectionCount, 
MAX((total_cases/population))*100 AS PercentPopulationInfected
FROM [Covid Data Exploration]..CovidDeaths
GROUP BY location, population
ORDER BY PercentPopulationInfected DESC

4 & 5. Highest Death Counts (Country & Continent)

Breaking down total death counts by country and by continent to identify the hardest-hit regions.

SQL
-- By Country
SELECT location, MAX(total_deaths) AS TotalDeathCount
FROM [Covid Data Exploration]..CovidDeaths
WHERE continent IS NOT NULL
GROUP BY location
ORDER BY TotalDeathCount DESC

-- By Continent
SELECT continent, MAX(total_deaths) AS TotalDeathCount
FROM [Covid Data Exploration]..CovidDeaths
WHERE continent IS NOT NULL
GROUP BY continent
ORDER BY TotalDeathCount DESC

6. Global Statistics

Calculates the total new cases, new deaths, and the global death percentage across the world.

SQL
SELECT SUM(new_cases) AS TotalCases, 
SUM(new_deaths) AS TotalDeaths, 
(SUM(CAST(new_deaths AS decimal)) / SUM(CAST(new_cases AS decimal))) * 100 AS DeathPercentage
FROM [Covid Data Exploration]..CovidDeaths
WHERE continent IS NOT NULL
ORDER BY 1,2

8 & 9. Vaccination Rollout (Window Functions & CTE)

Uses Window Functions to calculate a rolling count of vaccinated people, then uses a CTE (Common Table Expression) to perform calculations on that rolling count.

SQL
WITH PopVsVac (Continent, Location, Date, Population, new_vaccinations, RollingPeopleVaccinated)
AS (
    SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,
    SUM(CONVERT(int, vac.new_vaccinations)) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
    FROM [Covid Data Exploration]..CovidDeaths dea 
    JOIN [Covid Data Exploration]..CovidVaccinations vac 
        ON dea.[location] = vac.[location] AND dea.[date] = vac.[date]
    WHERE dea.continent IS NOT NULL 
)
SELECT *, (RollingPeopleVaccinated/population)*100 AS PercentVaccinated
FROM PopVsVac

10. Creating Views for Visualization

Storing the complex query logic into a View for later use in visualization tools like Tableau or PowerBI.

SQL
CREATE VIEW VaccinationPercentage AS
SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,
SUM(CONVERT(int, vac.new_vaccinations)) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
FROM [Covid Data Exploration]..CovidDeaths dea 
JOIN [Covid Data Exploration]..CovidVaccinations vac 
    ON dea.[location] = vac.[location] AND dea.[date] = vac.[date]
WHERE dea.continent IS NOT NULL

Outcomes & Final Thoughts

Through this extensive data exploration project, we successfully analyzed the global impact of COVID-19. Key insights derived include:

Mortality Risk

Established precise death percentages for individual countries, allowing for risk assessment.

Infection Spread

Identified nations with the highest infection rates relative to their population density.

Global Impact

Quantified the global death percentage and isolated the hardest-hit continents.

Vaccination Progress

Tracked rolling vaccination counts and calculated percentage of vaccinated populations.

The skills utilized here - complex joins, CTEs, and window functions - are foundational for any robust data analysis workflow. This project highlights the power of SQL for deep data exploration. By converting raw data into meaningful metrics, we can tell a compelling story about the global impact of the pandemic.

© 2026 Sai Suraj Matta. All rights reserved.