Wind vs. Coal Logistic Regressions

Project Lead: Angela Voit

Project Member: Geric Panr

Motivation:

Complete an ML project using the Yale Climate Opinion Survey data. After looking into other related county-level data, we decided it would be interesting to examine the differences between coal counties and wind counties.

Methods:

Python analysis in a Jupyter notebook, using data from Yale, the US Census Bureau, and the US Energy Information Administration.
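Combining the three sources requires a county-level join. The following is a minimal sketch of that step. It assumes each source can be keyed on a five-digit county FIPS code. The tables, the column names (`fips`, `happening`, `median_income`, `plant_type`), and the values are hypothetical stand-ins, not the project's actual files.

```python
import pandas as pd

# Hypothetical miniature tables standing in for the Yale, Census, and EIA downloads.
# Column names and values are illustrative only.
yale = pd.DataFrame({"fips": [1001, 4013, 19153], "happening": [62.1, 70.4, 66.0]})
census = pd.DataFrame({"fips": ["01001", "04013", "19153"], "median_income": [58000, 65000, 70000]})
eia = pd.DataFrame({"fips": ["01001", "19153"], "plant_type": ["coal", "wind"]})


def normalize_fips(series: pd.Series) -> pd.Series:
    """Return 5-character, zero-padded county FIPS codes as strings."""
    return series.astype(str).str.zfill(5)


# Integer FIPS codes lose their leading zero (01001 -> 1001), so normalize every key first.
for df in (yale, census, eia):
    df["fips"] = normalize_fips(df["fips"])

# Inner joins keep only counties present in all three sources;
# validate="one_to_one" raises if any county appears twice in a table.
merged = (
    yale.merge(census, on="fips", how="inner", validate="one_to_one")
        .merge(eia, on="fips", how="inner", validate="one_to_one")
)

# Binary target for the classifier: 1 = wind county, 0 = coal county.
merged["is_wind"] = (merged["plant_type"] == "wind").astype(int)
print(merged)
```

Zero-padding the keys before merging matters because a CSV reader will often parse FIPS codes as integers. Without padding, a county like `01001` would silently fail to match across sources.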

Findings:

We started by visualizing our county-level coal versus wind data. We then fit a logistic regression using sklearn with the liblinear solver, which reached an accuracy of 88%. Finally, we examined the model's coefficients. Despite the model's overall accuracy, we suspect these coefficients were strongly influenced by outliers and multicollinearity. Although the model can classify coal versus wind counties, the two categories are similar on many of the opinion measurements. (Note: in the visualizations displayed below, "not wind" indicates a coal county.)
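The modeling steps above can be sketched as follows. This runs on synthetic data, so it will not reproduce the 88% figure. The feature names (`worried`, `happening`, `regulate`), the data-generating process, the train/test split, and the `StandardScaler` step are all assumptions for illustration. The last part adds variance inflation factors (VIF), which are one common way to check the suspected multicollinearity. VIF is a swapped-in diagnostic, not necessarily what the project used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400
# Synthetic stand-ins for county opinion measures (hypothetical, not the survey data).
worried = rng.normal(60, 8, n)
happening = worried + rng.normal(5, 2, n)  # deliberately near-collinear with "worried"
regulate = rng.normal(70, 6, n)
X = np.column_stack([worried, happening, regulate])
feature_names = ["worried", "happening", "regulate"]

# Synthetic label: 1 = wind county, 0 = coal county.
logit = 0.15 * (regulate - 70) + 0.05 * (worried - 60)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling puts coefficients on a comparable scale; liblinear is the solver named above.
model = make_pipeline(StandardScaler(), LogisticRegression(solver="liblinear"))
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

coefs = model.named_steps["logisticregression"].coef_[0]
for name, c in zip(feature_names, coefs):
    print(f"{name:>10}: {c:+.3f}")
print(f"test accuracy: {accuracy:.2f}")


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j regresses column j on the other columns."""
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


vif = variance_inflation_factors(X)
for name, v in zip(feature_names, vif):
    print(f"{name:>10}: VIF = {v:.1f}")
```

In this toy setup, `worried` and `happening` receive large VIFs while `regulate` stays near 1. This is the pattern under which individual coefficients become unstable even when overall accuracy looks good. A VIF above roughly 5 to 10 is a common rule-of-thumb warning sign.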

Cleaned Dataset:

https://www.kaggle.com/srikantsahu/co2-and-ghg-emission-data

Sources:

https://www.outdoorphotographer.com/blog/on-assignment-nextera-energy-resources//