In this blog, I will write about a very famous supervised learning algorithm: k-nearest neighbors, or KNN for short.
This algorithm is interesting because it differs from most other classification algorithms: it is often called a lazy learner. KNN falls under instance-based learning, where models are characterized by memorizing the training dataset; lazy learning is the special case of instance-based learning with essentially zero cost during the learning process, because all the computation is deferred until prediction time.
The algorithm can be summarized in three steps:
Choose the number of neighbors k, and compute the distance between the test object and every object in the training dataset
Find the k nearest neighbors of the test object we want to classify
Assign the test object to the class with the maximum frequency among those neighbors.
Exception: if neighbors sit at identical distances, the algorithm chooses the class label that appears first in the training dataset, so ties are broken by training order.
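To make the lazy-learning idea concrete, here is a minimal from-scratch sketch of the three steps above. This is illustrative only, not scikit-learn's internals: knn_predict is a name of my own choosing, the inputs are assumed to be NumPy arrays, and Euclidean distance is just one possible metric.
import numpy as np
from collections import Counter
def knn_predict(X_train, y_train, x_test, k=5):
    # Step 1: distance from the test object to every training object (Euclidean here)
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # Step 2: indices of the k nearest neighbors; a stable sort breaks
    # distance ties by training order, matching the exception above
    nearest = np.argsort(distances, kind='stable')[:k]
    # Step 3: majority vote among the neighbors' class labels
    return Counter(y_train[nearest]).most_common(1)[0][0]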
Let’s implement the KNN classifier on the breast cancer dataset.
# Import the required packages
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn import metrics
# Load the dataset and take a first look
data = pd.read_csv('data.csv')
data.head()
# Drop the id column, which carries no predictive information
data.drop(['id'], axis=1, inplace=True)
data.columns
data.info()
# Dimensions of the dataset
print("Dimension:", data.shape)
# Class distribution of the target
data['diagnosis'].value_counts()
Out of 569 samples, 212 are categorized as malignant (M) and the remaining 357 as benign (B).
# Separate features and target; 'Unnamed: 32' is an empty trailing column in the CSV
X = data.drop(['diagnosis', 'Unnamed: 32'], axis=1)
y = data['diagnosis']
# Hold out a third of the data for testing
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
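One caveat worth noting: KNN is distance-based, so features with large numeric ranges can dominate the distance calculation. A common optional preprocessing step is to standardize the features; a minimal sketch using scikit-learn's StandardScaler is below. The rest of this post keeps the raw features, so the numbers match the grid search that follows.
from sklearn.preprocessing import StandardScaler
# Fit the scaler on the training split only, so no test information leaks in
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)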
# Hyperparameter tuning - GridSearchCV (cross-validates every parameter combination)
parameters = {'metric': ('manhattan', 'euclidean', 'minkowski'),
              'n_neighbors': range(1, 21)}
model = KNeighborsClassifier()
tuning = GridSearchCV(model, param_grid=parameters, scoring='accuracy')
tuning.fit(x_train, y_train)
print("Best parameters -", tuning.best_params_)
print("Best accuracy score -", tuning.best_score_)
The best accuracy score is achieved with k = 5 (and, per the best parameters above, the Manhattan distance metric).
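If you want to see how accuracy varies with k rather than just the single best value, GridSearchCV stores every tried combination in its cv_results_ attribute. A quick sketch, assuming the parameter grid above (the column names come from the grid keys):
results = pd.DataFrame(tuning.cv_results_)
# Keep only the rows for the Manhattan metric and plot mean CV accuracy against k
manhattan = results[results['param_metric'] == 'manhattan']
plt.plot(manhattan['param_n_neighbors'], manhattan['mean_test_score'])
plt.xlabel('n_neighbors (k)')
plt.ylabel('Mean CV accuracy')
plt.show()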
# Refit with the best parameters; p only applies to the Minkowski metric, so it is dropped here
knn = KNeighborsClassifier(n_neighbors=5, metric='manhattan')
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
print("The accuracy score of the training dataset:", metrics.accuracy_score(y_train, knn.predict(x_train)))
print("The accuracy score of the testing dataset:", metrics.accuracy_score(y_test, y_pred))