SQL PROJECT
A comprehensive exploration of global pandemic data using advanced SQL techniques to track infection rates, mortality, and vaccination progress.
This project analyzes the Covid Deaths and Covid Vaccinations datasets from "Our World in Data". The goal is to derive meaningful insights regarding the likelihood of mortality, infection rates relative to population, and the global rollout of vaccinations.
The analysis follows a structured approach, starting from basic data exploration and moving towards complex joins and window functions to calculate rolling metrics.
Evaluates the likelihood of dying if you contract COVID-19 in a specific country (e.g., India).
SELECT Location, date, total_cases, total_deaths,
(CAST(total_deaths AS decimal) / total_cases) * 100 AS DeathPercentage
FROM [Covid Data Exploration]..CovidDeaths
WHERE location LIKE '%india%'
AND continent IS NOT NULL
ORDER BY 1,2
Examines what percentage of the population has contracted COVID-19.
SELECT Location, date, population, total_cases,
(total_cases/population)*100 AS DeathPercentage
FROM [Covid Data Exploration]..CovidDeaths
WHERE location LIKE '%india%'
AND continent IS NOT NULL
ORDER BY 1,2
Identifies countries with the highest infection rates relative to their population.
SELECT Location, population, MAX(total_cases) AS HighestInfectionCount,
MAX((total_cases/population))*100 AS PercentPopulationInfected
FROM [Covid Data Exploration]..CovidDeaths
GROUP BY location, population
ORDER BY PercentPopulationInfected DESC
Breaking down total death counts by country and by continent to identify the hardest-hit regions.
-- By Country
SELECT location, MAX(total_deaths) AS TotalDeathCount
FROM [Covid Data Exploration]..CovidDeaths
WHERE continent IS NOT NULL
GROUP BY location
ORDER BY TotalDeathCount DESC
-- By Continent
SELECT continent, MAX(total_deaths) AS TotalDeathCount
FROM [Covid Data Exploration]..CovidDeaths
WHERE continent IS NOT NULL
GROUP BY continent
ORDER BY TotalDeathCount DESC
Calculates the total new cases, new deaths, and the global death percentage across the world.
SELECT SUM(new_cases) AS TotalCases,
SUM(new_deaths) AS TotalDeaths,
(SUM(CAST(new_deaths AS decimal)) / SUM(CAST(new_cases AS decimal))) * 100 AS DeathPercentage
FROM [Covid Data Exploration]..CovidDeaths
WHERE continent IS NOT NULL
ORDER BY 1,2
Uses Window Functions to calculate a rolling count of vaccinated people, then uses a CTE (Common Table Expression) to perform calculations on that rolling count.
WITH PopVsVac (Continent, Location, Date, Population, new_vaccinations, RollingPeopleVaccinated)
AS (
SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,
SUM(CONVERT(int, vac.new_vaccinations)) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
FROM [Covid Data Exploration]..CovidDeaths dea
JOIN [Covid Data Exploration]..CovidVaccinations vac
ON dea.[location] = vac.[location] AND dea.[date] = vac.[date]
WHERE dea.continent IS NOT NULL
)
SELECT *, (RollingPeopleVaccinated/population)*100 AS PercentVaccinated
FROM PopVsVac
Storing the complex query logic into a View for later use in visualization tools like Tableau or PowerBI.
CREATE VIEW VaccinationPercentage AS
SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,
SUM(CONVERT(int, vac.new_vaccinations)) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
FROM [Covid Data Exploration]..CovidDeaths dea
JOIN [Covid Data Exploration]..CovidVaccinations vac
ON dea.[location] = vac.[location] AND dea.[date] = vac.[date]
WHERE dea.continent IS NOT NULL
Through this extensive data exploration project, we successfully analyzed the global impact of COVID-19. Key insights derived include:
Established precise death percentages for individual countries, allowing for risk assessment.
Identified nations with the highest infection rates relative to their population density.
Quantified the global death percentage and isolated the hardest-hit continents.
Tracked rolling vaccination counts and calculated percentage of vaccinated populations.
The skills utilized here - complex joins, CTEs, and window functions - are foundational for any robust data analysis workflow. This project highlights the power of SQL for deep data exploration. By converting raw data into meaningful metrics, we can tell a compelling story about the global impact of the pandemic.