APPLICATION OF THE LOGISTIC REGRESSION ALGORITHM FOR STROKE DISEASE CLASSIFICATION
DOI: https://doi.org/10.21067/bimasakti.v8i2.13201
Abstract
Stroke is one of the leading causes of death worldwide, ranking after heart disease and cancer. Early detection of stroke risk is essential for faster and more accurate treatment. This study applies the Logistic Regression algorithm to classify stroke cases from the following attributes: gender, age, hypertension, heart disease, marital status, occupation, residence type, average glucose level, body mass index (BMI), and smoking status, with stroke status as the class label. The dataset was obtained from Kaggle and consists of 5,110 patient records. The research process comprises data cleaning, data transformation, and normalization with the Min-Max Scaler method, followed by splitting the data into training and testing sets in five proportions (90%-10%, 85%-15%, 80%-20%, 70%-30%, and 65%-35%). The model was evaluated with a Confusion Matrix and the metrics accuracy, precision, recall, and F1-score. The 90%-10% split achieved the highest accuracy, 76.17%, and the precision and recall values indicate that the model performs well in identifying non-stroke cases. However, performance on the minority class (stroke) remains relatively low, which suggests the need for data imbalance handling. Overall, Logistic Regression proved effective for initial stroke classification, although accuracy can still be improved through resampling techniques or advanced model optimization.
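The normalization and evaluation steps named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the study's code: the function names, the glucose values, and the ten toy labels are all hypothetical, chosen only to show how Min-Max scaling and the Confusion Matrix metrics are computed, and why accuracy alone can look acceptable while the stroke class is recognized poorly.

```python
from typing import Dict, List, Sequence, Tuple


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: return zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating `positive` as the stroke class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1-score for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical average glucose levels, used only to demonstrate the scaling.
scaled_glucose = min_max_scale([50.0, 100.0, 150.0, 250.0])

# Hypothetical toy labels: 8 non-stroke (0) and 2 stroke (1) records.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
```

In this toy case the accuracy is 0.8 while precision and recall for the stroke class are only 0.5, which mirrors the kind of gap between overall accuracy and minority-class performance that the abstract reports.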


