Heart Disease Prediction Tool
Documentation and Technical Specifications
Project Overview
This project implements machine learning classifiers (K-Nearest Neighbors, Random Forest, and XGBoost) to predict heart disease risk from patient risk factors. The models are trained and validated on the "Heart Disease Dataset" from Kaggle.
Key Problems
Leading Cause of Death
Coronary heart disease remains one of the leading causes of death worldwide.
Risk Awareness Gap
Limited awareness of critical risk factors among the general population.
Technology Integration
Underutilization of data-driven prediction technologies in healthcare.
Statistical Support
Cardiovascular diseases account for approximately 30% of global deaths. In Indonesia, hypertension prevalence among people aged ≥18 is 34.1% (Riskesdas 2018).
Over 70% of heart disease risk is attributed to lifestyle factors according to the Journal of the American Medical Association.
Dataset Attributes
| ATTRIBUTE | DESCRIPTION |
|---|---|
| age | Age in years |
| sex | Sex (1 = male; 0 = female) |
| cp | Chest pain type |
| trestbps | Resting blood pressure (in mm Hg) |
| chol | Serum cholesterol in mg/dl |
| fbs | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) |
| restecg | Resting electrocardiographic results |
| thalach | Maximum heart rate achieved during exercise |
| exang | Exercise induced angina (1 = yes; 0 = no) |
| oldpeak | ST depression induced by exercise relative to rest |
| slope | Slope of peak exercise ST segment |
| ca | Number of major vessels (0-3) colored by fluoroscopy |
| thal | 3 = normal; 6 = fixed defect; 7 = reversible defect |
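The attribute table above can double as a simple input check before a record reaches the models. The following is a minimal sketch, not the project's actual code: the function name `validate_record` and the sample record are hypothetical, and only constraints stated in the table (the 0/1 flags and the 0-3 range for `ca`) are checked.

```python
# Attribute names taken from the table above.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]
BINARY_COLUMNS = ("sex", "fbs", "exang")  # attributes documented as 0/1 flags


def validate_record(record):
    """Return a list of problems found in one patient record (empty if none)."""
    problems = []
    # Every attribute from the table must be present.
    for name in EXPECTED_COLUMNS:
        if name not in record:
            problems.append(f"missing attribute: {name}")
    # Flags must be encoded as 0 or 1.
    for name in BINARY_COLUMNS:
        if name in record and record[name] not in (0, 1):
            problems.append(f"{name} must be 0 or 1, got {record[name]!r}")
    # The table documents ca as a count from 0 to 3.
    if "ca" in record and not 0 <= record["ca"] <= 3:
        problems.append(f"ca must be between 0 and 3, got {record['ca']!r}")
    return problems


# Hypothetical patient record, for illustration only.
sample = {
    "age": 54, "sex": 1, "cp": 0, "trestbps": 130, "chol": 246, "fbs": 0,
    "restecg": 1, "thalach": 150, "exang": 0, "oldpeak": 1.2, "slope": 1,
    "ca": 0, "thal": 3,
}
print(validate_record(sample))  # prints []
```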
Model Configuration
K-Nearest Neighbors (KNN)
n_neighbors = 5
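To illustrate what n_neighbors = 5 means in practice, here is a minimal pure-Python sketch of the KNN decision rule: find the five training points closest to the query and take a majority vote of their labels. The function `knn_predict` and the toy data are hypothetical; Euclidean distance is assumed, and the project itself presumably relies on a library implementation rather than this code.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, n_neighbors=5):
    """Predict the label of `query` by majority vote of its nearest neighbors."""
    # Pair each training label with its Euclidean distance to the query.
    distances = [
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    # Keep the labels of the n_neighbors closest points and vote.
    nearest_labels = [label for _, label in distances[:n_neighbors]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up points with two already-scaled features (0 = no disease, 1 = disease).
train_X = [
    (0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
    (0.8, 0.9), (0.9, 0.8), (0.85, 0.95), (0.7, 0.75),
]
train_y = [0, 0, 0, 1, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.8, 0.8)))
```

Because the vote depends on raw distances, features on large scales (such as chol) would dominate features on small scales (such as oldpeak) unless the inputs are scaled first.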
Random Forest
n_estimators = 20, criterion = 'entropy'
XGBoost
learning_rate = 0.1, max_depth = 15, n_estimators = 100
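The hyperparameters listed above could be wired up roughly as follows. This is a sketch under assumptions: it presumes scikit-learn and the xgboost Python package are the libraries in use (the source does not name them), and the names `MODEL_PARAMS`, `build_models`, and the `random_state` argument are illustrative additions, not part of the documented configuration.

```python
# Hyperparameters exactly as listed in the Model Configuration section.
MODEL_PARAMS = {
    "knn": {"n_neighbors": 5},
    "random_forest": {"n_estimators": 20, "criterion": "entropy"},
    "xgboost": {"learning_rate": 0.1, "max_depth": 15, "n_estimators": 100},
}


def build_models(random_state=42):
    """Instantiate the three classifiers (assumes scikit-learn and xgboost)."""
    # Imported lazily so the parameter table can be read without the libraries.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from xgboost import XGBClassifier

    return {
        "knn": KNeighborsClassifier(**MODEL_PARAMS["knn"]),
        # random_state is an assumption added here for reproducibility.
        "random_forest": RandomForestClassifier(
            random_state=random_state, **MODEL_PARAMS["random_forest"]
        ),
        "xgboost": XGBClassifier(
            random_state=random_state, **MODEL_PARAMS["xgboost"]
        ),
    }
```

Each returned object exposes the usual `fit(X, y)` and `predict(X)` methods, so the three models can be trained and compared in a single loop.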
Performance Metrics
Accuracy
Measures the percentage of correct predictions
Precision
Measures the proportion of positive predictions that are actually positive
Recall
Measures the proportion of actual positives correctly predicted
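The three metrics above can be computed directly from the counts of true positives (TP), false positives (FP), and false negatives (FN): precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below is illustrative only; the function name `classification_metrics` and the label lists are made up, and label 1 is assumed to mean "heart disease present".

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when no positives are predicted/present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels: 1 = disease (assumed positive class), 0 = no disease.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

print(classification_metrics(y_true, y_pred))
# prints {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75}
```

In a screening context, recall is often the metric to watch, since a false negative means a patient at risk goes unflagged.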