Capstone Project IBM (SpaceX dataset)

  • The following is a capstone project for the “IBM Data Science” course. For more details about the Professional Certificate, click here

  • Full project on GitHub, click here

Executive Summary

For this project, the data was collected from the public SpaceX API and the SpaceX Wikipedia page. A column called ‘class’ was then added to label each landing as successful or unsuccessful. The data was explored using SQL, visualizations, Folium maps, and dashboards, and the relevant columns were selected as features for further analysis.
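The labeling step could look roughly like the sketch below. The column name `Outcome`, its sample values, and the helper `add_landing_class` are assumptions made for illustration; the summary only states that a ‘class’ column was added to mark successful landings.

```python
import pandas as pd

# Hypothetical sample rows. The "Outcome" column and its values are
# assumptions for this sketch, not taken from the project data.
launches = pd.DataFrame({
    "FlightNumber": [1, 2, 3, 4],
    "Outcome": ["True ASDS", "False Ocean", "None None", "True RTLS"],
})


def add_landing_class(df, outcome_col="Outcome"):
    """Return a copy of df with a binary 'class' column.

    1 means the landing succeeded, 0 means it failed or was not attempted.
    Assumes a successful outcome string starts with "True".
    """
    out = df.copy()
    out["class"] = out[outcome_col].str.startswith("True").astype(int)
    return out


labeled = add_landing_class(launches)
print(labeled)
```

Keeping the label as 0/1 lets the same column serve both as the target for the classifiers and as a quick success rate via `labeled["class"].mean()`.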

Next, the categorical variables were converted to binary columns using one-hot encoding. The data was then standardized, and GridSearchCV was used to find the best parameters for each machine learning model. Four models were considered: Logistic Regression, Support Vector Machine, Decision Tree Classifier, and K Nearest Neighbors. Finally, the accuracy scores of all the models were visualized. Surprisingly, all the models produced similar results, with an accuracy of around 83.33%. However, it is important to note that all the models tended to over-predict successful landings. The following PDF file shows the final presentation for this project.
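The modeling workflow described above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual code: the feature names (`PayloadMass`, `Orbit`, `LaunchSite`), the random labels, the parameter grids, and the train/test split settings are all assumptions. It one-hot encodes the categorical columns, standardizes the features, tunes each of the four models with GridSearchCV, and records test accuracy along with the number of false positives (failed landings predicted as successful), which is one way to check for over-prediction of success.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): column names and values are invented.
rng = np.random.default_rng(0)
n = 90
raw = pd.DataFrame({
    "PayloadMass": rng.uniform(500, 15000, n),
    "Orbit": rng.choice(["LEO", "GTO", "ISS"], n),
    "LaunchSite": rng.choice(["SiteA", "SiteB"], n),
})
y = rng.integers(0, 2, n)  # random 0/1 labels standing in for 'class'

# One-hot encode the categorical columns into binary indicator columns.
features = pd.get_dummies(raw, columns=["Orbit", "LaunchSite"]).astype(float)

X_train, X_test, y_train, y_test = train_test_split(
    features, y, test_size=0.2, random_state=2
)

# Illustrative parameter grids (assumptions), keyed by pipeline step name.
candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000),
                            {"model__C": [0.01, 0.1, 1]}),
    "svm": (SVC(), {"model__C": [0.1, 1, 10],
                    "model__kernel": ["linear", "rbf"]}),
    "decision_tree": (DecisionTreeClassifier(random_state=0),
                      {"model__max_depth": [2, 4, 6]}),
    "knn": (KNeighborsClassifier(), {"model__n_neighbors": [3, 5, 7]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scaling lives inside the pipeline so each CV fold is standardized
    # using only its own training portion.
    pipe = Pipeline([("scale", StandardScaler()), ("model", estimator)])
    search = GridSearchCV(pipe, grid, cv=5)
    search.fit(X_train, y_train)
    tn, fp, fn, tp = confusion_matrix(
        y_test, search.predict(X_test), labels=[0, 1]
    ).ravel()
    results[name] = {
        "accuracy": search.score(X_test, y_test),
        "false_positives": int(fp),
    }

for name, res in results.items():
    print(f"{name}: accuracy={res['accuracy']:.2f}, "
          f"false positives={res['false_positives']}")
```

Because the labels here are random, the printed scores carry no meaning; the sketch only shows the shape of the workflow. On real data, comparing `false_positives` across models is a direct way to see whether they share the same bias toward predicting success.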

Tools: Python Management Studio, SQL Server Integration Services (SSIS)
Languages: SQL

Andres Camilo Viloria Garcia
Data Scientist | Data Analyst