@evrenarslan
Created May 17, 2020 15:43
Machine Learning - Polynomial Regression
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning - Polynomial Regression"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import pylab as pl\n",
"import numpy as np\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I load the data into a DataFrame. The dataset can be downloaded from http://open.canada.ca/data/en/dataset/98f1a129-f628-4ce4-b24d-6f16bf24dd64"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MODELYEAR</th>\n",
" <th>MAKE</th>\n",
" <th>MODEL</th>\n",
" <th>VEHICLECLASS</th>\n",
" <th>ENGINESIZE</th>\n",
" <th>CYLINDERS</th>\n",
" <th>TRANSMISSION</th>\n",
" <th>FUELTYPE</th>\n",
" <th>FUELCONSUMPTION_CITY</th>\n",
" <th>FUELCONSUMPTION_HWY</th>\n",
" <th>FUELCONSUMPTION_COMB</th>\n",
" <th>FUELCONSUMPTION_COMB_MPG</th>\n",
" <th>CO2EMISSIONS</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2014</td>\n",
" <td>ACURA</td>\n",
" <td>ILX</td>\n",
" <td>COMPACT</td>\n",
" <td>2.0</td>\n",
" <td>4</td>\n",
" <td>AS5</td>\n",
" <td>Z</td>\n",
" <td>9.9</td>\n",
" <td>6.7</td>\n",
" <td>8.5</td>\n",
" <td>33</td>\n",
" <td>196</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2014</td>\n",
" <td>ACURA</td>\n",
" <td>ILX</td>\n",
" <td>COMPACT</td>\n",
" <td>2.4</td>\n",
" <td>4</td>\n",
" <td>M6</td>\n",
" <td>Z</td>\n",
" <td>11.2</td>\n",
" <td>7.7</td>\n",
" <td>9.6</td>\n",
" <td>29</td>\n",
" <td>221</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2014</td>\n",
" <td>ACURA</td>\n",
" <td>ILX HYBRID</td>\n",
" <td>COMPACT</td>\n",
" <td>1.5</td>\n",
" <td>4</td>\n",
" <td>AV7</td>\n",
" <td>Z</td>\n",
" <td>6.0</td>\n",
" <td>5.8</td>\n",
" <td>5.9</td>\n",
" <td>48</td>\n",
" <td>136</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2014</td>\n",
" <td>ACURA</td>\n",
" <td>MDX 4WD</td>\n",
" <td>SUV - SMALL</td>\n",
" <td>3.5</td>\n",
" <td>6</td>\n",
" <td>AS6</td>\n",
" <td>Z</td>\n",
" <td>12.7</td>\n",
" <td>9.1</td>\n",
" <td>11.1</td>\n",
" <td>25</td>\n",
" <td>255</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2014</td>\n",
" <td>ACURA</td>\n",
" <td>RDX AWD</td>\n",
" <td>SUV - SMALL</td>\n",
" <td>3.5</td>\n",
" <td>6</td>\n",
" <td>AS6</td>\n",
" <td>Z</td>\n",
" <td>12.1</td>\n",
" <td>8.7</td>\n",
" <td>10.6</td>\n",
" <td>27</td>\n",
" <td>244</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" MODELYEAR MAKE MODEL VEHICLECLASS ENGINESIZE CYLINDERS \\\n",
"0 2014 ACURA ILX COMPACT 2.0 4 \n",
"1 2014 ACURA ILX COMPACT 2.4 4 \n",
"2 2014 ACURA ILX HYBRID COMPACT 1.5 4 \n",
"3 2014 ACURA MDX 4WD SUV - SMALL 3.5 6 \n",
"4 2014 ACURA RDX AWD SUV - SMALL 3.5 6 \n",
"\n",
" TRANSMISSION FUELTYPE FUELCONSUMPTION_CITY FUELCONSUMPTION_HWY \\\n",
"0 AS5 Z 9.9 6.7 \n",
"1 M6 Z 11.2 7.7 \n",
"2 AV7 Z 6.0 5.8 \n",
"3 AS6 Z 12.7 9.1 \n",
"4 AS6 Z 12.1 8.7 \n",
"\n",
" FUELCONSUMPTION_COMB FUELCONSUMPTION_COMB_MPG CO2EMISSIONS \n",
"0 8.5 33 196 \n",
"1 9.6 29 221 \n",
"2 5.9 48 136 \n",
"3 11.1 25 255 \n",
"4 10.6 27 244 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(\"FuelConsumption.csv\")\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To decide which columns of this dataset to use, I can apply the corr method. Because the dataset has only a few columns, this is not very informative here. The results are shown below."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>CO2EMISSIONS</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>FUELCONSUMPTION_COMB_MPG</th>\n",
" <td>0.906394</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FUELCONSUMPTION_CITY</th>\n",
" <td>0.898039</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FUELCONSUMPTION_COMB</th>\n",
" <td>0.892129</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ENGINESIZE</th>\n",
" <td>0.874154</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FUELCONSUMPTION_HWY</th>\n",
" <td>0.861748</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CYLINDERS</th>\n",
" <td>0.849685</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" CO2EMISSIONS\n",
"FUELCONSUMPTION_COMB_MPG 0.906394\n",
"FUELCONSUMPTION_CITY 0.898039\n",
"FUELCONSUMPTION_COMB 0.892129\n",
"ENGINESIZE 0.874154\n",
"FUELCONSUMPTION_HWY 0.861748\n",
"CYLINDERS 0.849685"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.select_dtypes(\"number\").corr()[[\"CO2EMISSIONS\"]].dropna().abs().sort_values(by=[\"CO2EMISSIONS\"], ascending=False)[1:]"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>FUELCONSUMPTION_COMB</th>\n",
" <th>ENGINESIZE</th>\n",
" <th>CYLINDERS</th>\n",
" <th>CO2EMISSIONS</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>8.5</td>\n",
" <td>2.0</td>\n",
" <td>4</td>\n",
" <td>196</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>9.6</td>\n",
" <td>2.4</td>\n",
" <td>4</td>\n",
" <td>221</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>5.9</td>\n",
" <td>1.5</td>\n",
" <td>4</td>\n",
" <td>136</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>11.1</td>\n",
" <td>3.5</td>\n",
" <td>6</td>\n",
" <td>255</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>10.6</td>\n",
" <td>3.5</td>\n",
" <td>6</td>\n",
" <td>244</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" FUELCONSUMPTION_COMB ENGINESIZE CYLINDERS CO2EMISSIONS\n",
"0 8.5 2.0 4 196\n",
"1 9.6 2.4 4 221\n",
"2 5.9 1.5 4 136\n",
"3 11.1 3.5 6 255\n",
"4 10.6 3.5 6 244"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Build a dataset that contains only the columns selected for training the model\n",
"cdf = df[[\"FUELCONSUMPTION_COMB\", \"ENGINESIZE\", \"CYLINDERS\", \"CO2EMISSIONS\"]]\n",
"cdf.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's plot the relationship between engine size and CO2 emissions."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.scatter(cdf[\"ENGINESIZE\"], cdf[\"CO2EMISSIONS\"], color=\"blue\")\n",
"plt.xlabel(\"Engine size\")\n",
"plt.ylabel(\"CO2 emissions\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***We will split the data into two parts: train and test.***"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"msk = np.random.rand(len(cdf)) < 0.8  # boolean mask that is True for roughly 80% of the rows, chosen at random\n",
"train = cdf[msk]  # rows where the mask is True (about 80%) form the training set\n",
"test = cdf[~msk]  # the remaining rows (about 20%) form the test set"
]
},
{
"attachments": {
"PolynomialFunctionsGraph.png": {
"image/png": ""
}
},
"cell_type": "markdown",
"metadata": {},
"source": [
"When the data appears to follow a curve rather than a straight line, we can assume that the function describing it is a polynomial. As shown below, curves of different shapes are produced by polynomials of different degrees. Whichever curve our distribution resembles, we can assume that the function we are looking for has that degree.\n",
"\n",
"![PolynomialFunctionsGraph.png](attachment:PolynomialFunctionsGraph.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For polynomial regression we use the PolynomialFeatures class from scikit-learn. It builds a matrix containing every polynomial term of the independent features (ENGINESIZE in the example below) whose degree is less than or equal to the degree passed as a parameter. Because the code below asks for degree 2, the matrix consists of the <code>0th, 1st and 2nd powers</code> of the feature."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1. , 2. , 4. ],\n",
" [ 1. , 2.4 , 5.76],\n",
" [ 1. , 1.5 , 2.25],\n",
" ...,\n",
" [ 1. , 3. , 9. ],\n",
" [ 1. , 3.2 , 10.24],\n",
" [ 1. , 3.2 , 10.24]])"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.preprocessing import PolynomialFeatures\n",
"from sklearn import linear_model\n",
"\n",
"train_x = np.asanyarray(train[['ENGINESIZE']])\n",
"train_y = np.asanyarray(train[['CO2EMISSIONS']])\n",
"\n",
"test_x = np.asanyarray(test[['ENGINESIZE']])\n",
"test_y = np.asanyarray(test[['CO2EMISSIONS']])\n",
"\n",
"poly=PolynomialFeatures(degree=2)\n",
"train_x_poly=poly.fit_transform(train_x)\n",
"train_x_poly"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because this new matrix contains the variable raised to each degree, our problem has become a linear regression problem. What remains is to find the Theta values that best fit this matrix, where $x_1$ is ENGINESIZE and $x_2$ is ENGINESIZE squared.\n",
"\n",
"$$\\hat{y} = b + \\theta_1 x_1 + \\theta_2 x_2$$"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Coefficients: [[ 0. 49.99413743 -1.4940489 ]]\n",
"Intercept: [108.35041419]\n"
]
}
],
"source": [
"clf = linear_model.LinearRegression()\n",
"clf.fit(train_x_poly, train_y)  # fit() returns the fitted estimator, not predictions\n",
"print ('Coefficients: ', clf.coef_)\n",
"print ('Intercept: ',clf.intercept_)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The coefficient and intercept values here are the constants that define our function. Using them, we can plot the fitted curve together with the original data."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'Emission')"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.scatter(train.ENGINESIZE, train.CO2EMISSIONS, color='blue')\n",
"XX = np.arange(0.0, 10.0, 0.1)\n",
"yy = clf.intercept_[0]+ clf.coef_[0][1]*XX+ clf.coef_[0][2]*np.power(XX, 2)\n",
"plt.plot(XX, yy, '-r' )\n",
"plt.xlabel(\"Engine size\")\n",
"plt.ylabel(\"Emission\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Model Evaluation***"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean absolute error: 23.77\n",
"Mean squared error (MSE): 942.47\n",
"R2-score: 0.69\n"
]
}
],
"source": [
"from sklearn.metrics import r2_score\n",
"\n",
"# Reuse the transformer fitted on the training data instead of refitting it on the test set\n",
"test_x_poly = poly.transform(test_x)\n",
"test_y_ = clf.predict(test_x_poly)\n",
"print(\"Mean absolute error: %.2f\" % np.mean(np.absolute(test_y_ - test_y)))\n",
"print(\"Mean squared error (MSE): %.2f\" % np.mean((test_y_ - test_y) ** 2))\n",
"# r2_score expects its arguments in the order (y_true, y_pred)\n",
"print(\"R2-score: %.2f\" % r2_score(test_y, test_y_))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
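
The workflow in the notebook above (split the data, expand the feature with PolynomialFeatures, fit a linear model, evaluate on held-out rows) can be sketched end to end as follows. This is only a minimal illustration under stated assumptions: it uses synthetic data in place of FuelConsumption.csv, the variable names (`engine_size`, `co2`, `model`) are hypothetical, and it swaps the random mask for scikit-learn's `train_test_split` so that the split is reproducible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for ENGINESIZE / CO2EMISSIONS (assumption: NOT the real dataset).
rng = np.random.default_rng(0)
engine_size = rng.uniform(1.0, 8.0, size=(500, 1))
x = engine_size[:, 0]
co2 = 108.0 + 50.0 * x - 1.5 * x ** 2 + rng.normal(0.0, 5.0, size=500)

# Reproducible 80/20 split into train and test sets.
x_train, x_test, y_train, y_test = train_test_split(
    engine_size, co2, test_size=0.2, random_state=0
)

# Fit the feature expansion on the training data, then reuse it for the test data.
poly = PolynomialFeatures(degree=2)
x_train_poly = poly.fit_transform(x_train)  # columns: 1, x, x^2
x_test_poly = poly.transform(x_test)

# Ordinary linear regression on the expanded features.
model = LinearRegression()
model.fit(x_train_poly, y_train)
y_pred = model.predict(x_test_poly)

# All three metrics take (y_true, y_pred) in that order.
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean absolute error: %.2f" % mae)
print("Mean squared error (MSE): %.2f" % mse)
print("R2-score: %.2f" % r2)
```

Calling `transform` rather than `fit_transform` on the test features keeps the habit of fitting preprocessing steps on training data only, which matters for transformers that learn statistics from the data.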