KNN Algorithm in Machine Learning

Updated on Mar 24, 2023 16:57 IST

In this article, we will briefly discuss the KNN algorithm in machine learning, how to find the value of K, and how to build a KNN classifier. Finally, we will discuss the pros and cons of KNN.

As a machine learning practitioner working with labelled training datasets, one of the simplest and most widely used algorithms you will encounter is K Nearest Neighbors (KNN). This algorithm works for both classification and regression problems. You might have heard the popular phrase “Birds of a feather flock together”, which basically means you are like the company you keep. Our behavior and characteristics tend to be affected by the people around us. Similarly, the KNN algorithm in machine learning determines the characteristics of a data point from the data points surrounding it.

In this article, we will focus on how KNN in machine learning is used for classification.

Quick Introduction to KNN

In the KNN model, a prediction is based on the data points (neighbors) in the training dataset that lie closest to the query data point. The number of nearest neighbors consulted is given by ‘K’.

Let’s understand with an analogy: say you have a single close friend you hang out with all the time. You will probably end up sharing similar interests with them. In KNN, that corresponds to K = 1.

Similarly, if you have a group of four friends, your characteristics will tend to be the average of them all. That is KNN with a value of K = 4.

Once the value of K is determined:

A KNN classifier determines the class of a data point through a majority vote of its nearest neighbors.
A KNN regressor predicts a numeric value by calculating the mean of the target values of its nearest neighbors.
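The two aggregation rules can be sketched in a few lines of plain Python. This is only an illustration: the helper names `classify` and `regress` and the neighbor labels and values are made up, and the K nearest neighbors are assumed to have been found already.

```python
from collections import Counter
from statistics import mean

def classify(neighbor_labels):
    """Return the most common label among the K nearest neighbors."""
    return Counter(neighbor_labels).most_common(1)[0][0]

def regress(neighbor_values):
    """Return the mean target value of the K nearest neighbors."""
    return mean(neighbor_values)

# Hypothetical labels and target values of the K = 5 nearest neighbors
labels = ["diabetic", "healthy", "diabetic", "diabetic", "healthy"]
targets = [110.0, 95.0, 120.0, 105.0, 100.0]

predicted_class = classify(labels)    # 3 votes against 2
predicted_value = regress(targets)    # 530.0 / 5

print(predicted_class)  # diabetic
print(predicted_value)  # 106.0
```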

The KNN algorithm is an instance-based method and is called a lazy learner. It is lazy because it doesn’t build an explicit model from the training data. It simply stores the training instances, which are used as “knowledge” during prediction.

How to find the optimal value of K?

The value of K plays a significant role in the performance of the model. The chosen K value should be neither too large nor too small. As K increases, the error usually goes down, then stabilizes, and starts rising again when K becomes too large.

If K is too small, the model might perform very well on training data but would drastically fail on testing data (overfitting).

If K is too large, on the other hand, the model becomes too generalized and underfits. A large K value also adds somewhat to the computational expense of each prediction.
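To make the overfitting risk of a very small K concrete, here is a minimal sketch in plain Python. The points, the labels, and the helper names `knn_predict` and `accuracy` are made up for illustration. One class-1 point is deliberately placed inside the class-0 cluster to act as noise.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to query."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, eval_X, eval_y, k):
    """Fraction of eval points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(eval_X, eval_y))
    return hits / len(eval_X)

# Toy data: class 0 near (1, 1), class 1 near (3, 3),
# plus one noisy class-1 point at (1.05, 0.95) inside the class-0 cluster
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3), (1.1, 1.1),
           (3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (1.05, 0.95)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

# With K = 1 every training point is its own nearest neighbor,
# so the training accuracy is perfect even though one label is noise.
print(accuracy(train_X, train_y, train_X, train_y, 1))  # 1.0
print(accuracy(train_X, train_y, train_X, train_y, 3))  # 0.875

# A new point in the class-0 region: K = 1 follows the noisy neighbor,
# K = 3 outvotes it.
query = (1.0, 0.9)
print(knn_predict(train_X, train_y, query, 1))  # 1
print(knn_predict(train_X, train_y, query, 3))  # 0
```

The perfect training score at K = 1 says nothing about how the model behaves on unseen points, which is why the walkthrough later compares training and testing accuracy side by side.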

The optimal value of K depends on the dataset being used and is largely found through trial and error. Domain knowledge plays a vital role in narrowing down the values worth trying.

It is preferable to choose an odd number for K to minimize the chances of a tie during class prediction through the majority voting mechanism in KNN classifiers. Note that an odd K rules out ties only when there are exactly two classes.

More ways that can help with the estimation of the K value are:

Square Root Method: As a rule of thumb, we can take the square root of the number of data points in the training dataset as a starting value for K.
Cross-Validation Method: We start with K=1, run cross-validation (5 to 10-fold), measure the accuracy, and repeat with increasing values of K until the results become consistent.
Elbow Method: As stated above, as the value of K increases, the error stabilizes at some point before it rises again. We choose the value at the beginning of the stable zone as the optimal value of K.
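The square-root heuristic and the cross-validation search can be sketched as follows. This is a plain-Python illustration under stated assumptions: the two-class data is synthetic, and `knn_predict` and `cv_accuracy` are hypothetical helpers written for this example rather than library functions.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k):
    """train is a list of (features, label) pairs; returns the majority label."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_accuracy(data, k, n_folds=5):
    """Mean KNN accuracy over n_folds folds (each fold is held out once)."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / n_folds

# Synthetic data: two overlapping Gaussian clusters, 50 points each
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(50)]
data += [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(50)]
rng.shuffle(data)

# Square-root heuristic: a starting guess, not a guarantee
sqrt_k = round(math.sqrt(len(data)))
print("square-root starting point:", sqrt_k)  # 10

# Cross-validation over odd values of K
candidates = range(1, 16, 2)
scores = {k: cv_accuracy(data, k) for k in candidates}
for k, score in scores.items():
    print(k, round(score, 3))

best_k = max(scores, key=scores.get)
print("best K by 5-fold accuracy:", best_k)
```

Printing the score for every candidate, rather than only the best one, makes it easier to spot the stable zone that the elbow method refers to.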

How to find the K Nearest Neighbors?

To determine which data points are close enough to be considered nearest neighbors, we commonly use the following distance measuring techniques:

Euclidean distance (most commonly used method)
Manhattan distance
Minkowski distance

The figure below illustrates how to calculate the Euclidean distance between two points in a 2D space:
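For reference, the three distance measures can be written as short Python functions. The function names and the two sample points are chosen for this sketch. Manhattan and Euclidean distance are the special cases of Minkowski distance with p = 1 and p = 2.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance: (sum of |a_i - b_i|^p) ^ (1/p)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def manhattan(a, b):
    """Sum of absolute coordinate differences (Minkowski with p = 1)."""
    return minkowski(a, b, 1)

def euclidean(a, b):
    """Straight-line distance (Minkowski with p = 2)."""
    return minkowski(a, b, 2)

# Two sample points in 2D space
p1, p2 = (1, 2), (4, 6)

print(euclidean(p1, p2))     # sqrt(3^2 + 4^2) = 5.0
print(manhattan(p1, p2))     # 3 + 4 = 7.0
print(minkowski(p1, p2, 3))  # (27 + 64) ^ (1/3), roughly 4.498
```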

How Does KNN Work?

Step 1: The optimal value of K is determined.
Step 2: The KNN algorithm calculates the distance of all data points from the query data point using the distance measuring techniques stated above.
Step 3: It ranks the data points by increasing distance. The closest K points in the data space of the query point are its nearest neighbors.
Step 4: The K nearest neighbors together determine the prediction for the query point as follows:
- Counting the data points in each category and taking the majority vote – KNN Classifier model.
- Calculating the average of the nearest neighbors' target values – KNN Regressor model.
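The steps above can be sketched as one small function in plain Python. This is a teaching sketch with made-up points, not how scikit-learn implements the algorithm. The function name `knn` and its `mode` parameter are assumptions of this example, and K is taken as given (Step 1).

```python
import math
from collections import Counter

def knn(train_X, train_y, query, k, mode="classify"):
    # Step 2: distance from the query to every training point
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]

    # Step 3: rank by increasing distance and keep the K closest
    distances.sort(key=lambda pair: pair[0])
    neighbors = [y for _, y in distances[:k]]

    # Step 4: majority vote (classifier) or mean (regressor)
    if mode == "classify":
        return Counter(neighbors).most_common(1)[0][0]
    return sum(neighbors) / k

# Made-up training points: three near (1, 1) and three near (6, 6)
train_X = [(1, 1), (2, 1), (1, 2), (6, 6), (7, 6), (6, 7)]
class_labels = ["A", "A", "A", "B", "B", "B"]
numeric_targets = [1.0, 1.2, 0.8, 5.0, 5.5, 4.5]

print(knn(train_X, class_labels, (2, 2), k=3))                         # A
print(knn(train_X, numeric_targets, (6.5, 6.5), k=3, mode="regress"))  # 5.0
```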

Building a KNN Classifier in Python

Problem Statement:

For demonstration, we are going to build a classifier model using the K Nearest Neighbors algorithm to predict whether patients have diabetes or not based on the features in the given data. We will also find the optimal value of K using the GridSearchCV() method in the Scikit-learn library.

So, let’s get started!

Dataset Description:

The dataset has 8 features and one target column, as given below:

Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
Insulin: 2-Hour serum insulin (mu U/ml)
BMI: Body mass index
DiabetesPedigreeFunction: Diabetes pedigree function
Age: Age in years
Outcome: Class variable (0 or 1)

The Outcome column is our target variable.

Tasks to be performed:

Load the data
Perform feature scaling
Perform label encoding
Split the data into training and testing sets
Fit the KNN Classifier model
Generate an accuracy plot
Create a KNN Classifier with K=7
Fit the Classifier and get the Accuracy Score
Create a confusion matrix
Perform cross-validation

Step 1 – Load the data

<pre class="python" style="font-family:monospace">
#Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')

#Load the dataset
data = pd.read_csv('diabetes.csv')
data.head()
</pre>

Step 2 – Perform feature scaling

We will use StandardScaler() to standardize the features so that they share a common scale. This matters for KNN because distances would otherwise be dominated by the features with the largest numeric ranges:

<pre class="python" style="font-family:monospace">
from sklearn.preprocessing import StandardScaler

#Perform feature scaling
#Note: fitting the scaler on the full dataset lets test-set statistics leak
#into training; fitting it on the training split only is the safer practice
sc_X = StandardScaler()
X = pd.DataFrame(sc_X.fit_transform(data.drop(['Outcome'], axis=1).values))
y = data['Outcome'].values
</pre>
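What standardization does to a single column can be sketched in plain Python. The values below are invented for illustration and are not taken from the diabetes dataset. The helper `standardize` is hypothetical. It uses the population standard deviation, which is also what StandardScaler uses.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(value - mu) / sigma for value in column]

# Hypothetical raw values on very different scales
glucose = [85, 140, 120, 95]
pedigree = [0.2, 0.6, 0.3, 0.5]

glucose_scaled = standardize(glucose)
pedigree_scaled = standardize(pedigree)

# After scaling, both columns are centered on 0 with a spread of 1,
# so neither dominates a distance computation purely because of its units.
print([round(v, 3) for v in glucose_scaled])
print([round(v, 3) for v in pedigree_scaled])
```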

Step 3 – Perform label encoding

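We will use LabelEncoder() to convert the target variable's class labels into numeric form. The Outcome column is already coded as 0 or 1, so its values stay the same here, but the step matters when class labels are strings: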

<pre class="python" style="font-family:monospace">
from sklearn.preprocessing import LabelEncoder

#Perform label encoding
le = LabelEncoder()
y = le.fit_transform(y)
</pre>
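The idea behind label encoding can be sketched in plain Python: each distinct class is assigned an integer according to its sorted order. The helper `label_encode` and the sample labels are made up for this illustration.

```python
def label_encode(labels):
    """Map each distinct label to an integer, using sorted order of the classes."""
    classes = sorted(set(labels))
    index = {cls: i for i, cls in enumerate(classes)}
    return [index[label] for label in labels], classes

# String labels become integers
encoded, classes = label_encode(["no", "yes", "yes", "no"])
print(encoded)  # [0, 1, 1, 0]
print(classes)  # ['no', 'yes']

# Labels that are already 0/1 come out unchanged
already_numeric, _ = label_encode([0, 1, 1, 0])
print(already_numeric)  # [0, 1, 1, 0]
```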

Step 4 – Split the data into training and testing sets

We will split the data into 70% training and 30% testing sets:

<pre class="python" style="font-family:monospace">
from sklearn.model_selection import train_test_split

#Splitting the Data into Training and Testing Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
</pre>
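The stratify argument keeps the class proportions the same in both splits. A rough plain-Python sketch of that idea is shown below. The helper `stratified_split` is hypothetical and will not reproduce the exact rows that train_test_split selects. The toy labels (70 of class 0, 30 of class 1) are made up.

```python
import random
from collections import defaultdict

def stratified_split(X, y, test_size=0.3, seed=42):
    """Split so that each class contributes about test_size of its rows to the test set."""
    rng = random.Random(seed)

    # Group row indices by class label
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    # Take the same fraction from every class
    test_idx = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])

    test_set = set(test_idx)
    train_idx = [i for i in range(len(y)) if i not in test_set]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])

# Toy data: 100 rows, 30% of them in class 1
X = [[i] for i in range(100)]
y = [0] * 70 + [1] * 30

X_train, X_test, y_train, y_test = stratified_split(X, y)

print(len(X_train), len(X_test))             # 70 30
print(sum(y_train) / len(y_train))           # share of class 1 in training
print(sum(y_test) / len(y_test))             # share of class 1 in testing
```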

Step 5 – Fit the KNN Classifier model

<pre class="python" style="font-family:monospace">
from sklearn.neighbors import KNeighborsClassifier

#Setup arrays to store training and test accuracies
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

for i, k in enumerate(neighbors):
    #Setup a knn classifier with k neighbors
    knn = KNeighborsClassifier(n_neighbors=k)

    #Fit the model
    knn.fit(X_train, y_train)

    #Compute accuracy on the training set
    train_accuracy[i] = knn.score(X_train, y_train)

    #Compute accuracy on the test set
    test_accuracy[i] = knn.score(X_test, y_test)
</pre>

Step 6 – Generate an accuracy plot

<pre class="python" style="font-family:monospace">
#Generate plot
plt.title('k-NN Varying number of neighbors')
plt.plot(neighbors, test_accuracy, label='Testing Accuracy')
plt.plot(neighbors, train_accuracy, label='Training accuracy')
plt.legend()
plt.xlabel('Number of neighbors')
plt.ylabel('Accuracy')
plt.show()
style="color: #808080;font-style: italic">\n  \n  \n  </pre class="python" style="font-family:monospace">

From the above plot, we can observe that the testing accuracy is highest at K=7 and K=8. As discussed above, we will choose an odd value for K, so we pick K=7.
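The selection described above can also be done in code instead of reading the value off the plot. The following is a minimal sketch, not the article's exact code: the data is generated with `make_classification` as a stand-in (an assumption), and the names `X_demo`, `y_demo` and `best_odd_k` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; the article uses its own dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Record the test accuracy for K = 1..9
neighbors = np.arange(1, 10)
test_accuracy = np.empty(len(neighbors))
for i, k in enumerate(neighbors):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_tr, y_tr)
    test_accuracy[i] = model.score(X_te, y_te)

# Keep only the odd K values, then take the one with the highest test accuracy
odd_mask = neighbors % 2 == 1
best_odd_k = int(neighbors[odd_mask][np.argmax(test_accuracy[odd_mask])])
print("Best odd K:", best_odd_k)
```

On a different dataset the best value will differ, so the result here should not be compared with the K=7 chosen in this article.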

So, let’s create a KNeighborsClassifier with the number of nearest neighbors set to 7:

Step 8 – Fit the Classifier and get the Accuracy Score

<pre class="python" style="font-family:monospace">
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Create the classifier with K = 7 and fit it on the training data
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Calculate the model accuracy on the test set, as a percentage
print("Accuracy of test set =", accuracy_score(y_test, y_pred) * 100)
</pre>

Step 9 – Create a confusion matrix

A confusion matrix describes the performance of a classifier on a set of test data for which the true values are known. We will calculate the confusion matrix using the confusion_matrix() function of Scikit-learn:

<pre class="python" style="font-family:monospace">
from sklearn.metrics import confusion_matrix

# Confusion matrix: rows are the true classes, columns are the predicted classes.
# The result is stored under a different name so the function is not overwritten.
cm = confusion_matrix(y_test, y_pred)
print(cm)
</pre>

Considering the obtained confusion matrix, we have:

True negative = 131
False positive = 19
True positive = 42
False negative = 39
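These four counts are enough to derive the usual summary metrics. The short sketch below plugs in the values listed above; the variable names are our own, and the formulas are the standard definitions of accuracy, precision, recall and F1 score.

```python
# Counts taken from the confusion matrix above
tn, fp, tp, fn = 131, 19, 42, 39

total = tn + fp + tp + fn                            # 231 test samples
accuracy = (tp + tn) / total                         # share of correct predictions
precision = tp / (tp + fp)                           # predicted positives that are correct
recall = tp / (tp + fn)                              # actual positives that were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print(f"Accuracy:  {accuracy:.3f}")   # 0.749
print(f"Precision: {precision:.3f}")  # 0.689
print(f"Recall:    {recall:.3f}")     # 0.519
print(f"F1 score:  {f1:.3f}")         # 0.592
```

The recall of about 0.52 shows that the classifier misses almost half of the positive cases, which the overall accuracy alone does not reveal.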

Step 10 – Perform cross-validation

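Cross-validation is a technique to evaluate predictive models by repeatedly dividing the dataset into a training part to train the model and a held-out part to evaluate it. In k-fold cross-validation, the data is split into k folds, each fold is used once for evaluation while the model is trained on the remaining folds, and the scores are averaged.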
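One common form, 5-fold cross-validation, can be sketched with Scikit-learn's `cross_val_score`. This is only an illustration: the data is synthetic (an assumption), and `X_demo`, `y_demo` and `fold_scores` are hypothetical names.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption, not the article's dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

# cv=5 splits the data into 5 folds; each fold is used once for evaluation
# while the model is trained on the other 4
fold_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=7), X_demo, y_demo, cv=5
)

print("Accuracy per fold:", fold_scores)
print("Mean accuracy:", fold_scores.mean())
```

Averaging over several folds gives a more stable estimate than a single train/test split, because every sample is used for evaluation exactly once.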

We will be using Scikit-learn’s GridSearchCV class, i.e., grid search with cross-validation, as shown:

<pre class="python" style="font-family:monospace">
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# In KNN, the parameter to be tuned is n_neighbors (values 1 to 49 are tried)
param_grid = {'n_neighbors': np.arange(1, 50)}

knn = KNeighborsClassifier()
# Evaluate every candidate value with 5-fold cross-validation
knn_cv = GridSearchCV(knn, param_grid, cv=5)
knn_cv.fit(X, y)
</pre>

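After fitting, the best mean cross-validated accuracy and the value of n_neighbors that produced it are available as attributes:

knn_cv.best_score_

knn_cv.best_params_

So, our KNN classifier with 17 nearest neighbors achieves the best cross-validated accuracy score of about 76%.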

Pros and Cons of KNN

Pros –

Simple and intuitive algorithm.
It is a non-parametric algorithm, which means it makes no assumptions about the underlying data distribution.
It has no explicit training phase, so new data can be added at any time and is reflected in the predictions immediately.
Works well with multiclass data in classification problems.
Can also be used for regression problems.
Can model non-linear decision boundaries.
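To illustrate the regression case, here is a minimal sketch using Scikit-learn's `KNeighborsRegressor` on a tiny made-up dataset (the numbers are purely illustrative). With the default uniform weights, the prediction is the average of the target values of the K nearest neighbors.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Tiny made-up dataset: one feature, one numeric target
X_small = np.array([[1.0], [2.0], [3.0], [10.0]])
y_small = np.array([1.0, 2.0, 3.0, 10.0])

reg = KNeighborsRegressor(n_neighbors=2)
reg.fit(X_small, y_small)

# The two nearest neighbors of 2.4 are 2.0 and 3.0,
# so the prediction is the mean of their targets: (2.0 + 3.0) / 2
prediction = reg.predict(np.array([[2.4]]))
print(prediction)  # [2.5]
```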

Cons –

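Prediction is slow on large datasets, because each query has to be compared against the stored training data.
Sensitive to outliers.
Suffers from the curse of dimensionality.
Features should be on comparable scales for the algorithm to predict accurately, because distances are dominated by features with larger ranges.
Cannot handle missing values in the data directly.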
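The last two points can often be worked around with preprocessing. The sketch below is one possible approach, not a method from this article: it chains median imputation, standardization and KNN in a Scikit-learn pipeline. The data and the names `X_small`, `y_small` and `pipeline` are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data: two features on very different scales, with some missing values
X_small = np.array([
    [1.0, 2000.0],
    [2.0, np.nan],
    [1.5, 2100.0],
    [8.0, 9000.0],
    [9.0, 9500.0],
    [np.nan, 8800.0],
])
y_small = np.array([0, 0, 0, 1, 1, 1])

# Fill missing values with the column median, rescale each feature to
# zero mean and unit variance, then classify with K = 3
pipeline = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=3),
)
pipeline.fit(X_small, y_small)

# New points may also contain missing values; the pipeline imputes them
pred = pipeline.predict(np.array([[1.2, 2050.0], [8.5, np.nan]]))
print(pred)
```

Putting the steps in one pipeline ensures that the imputation and scaling statistics are learned from the training data only and then reused for new points.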

Conclusion

The KNN algorithm in machine learning is a simple yet versatile supervised algorithm that can be used to solve both classification and regression problems. Machine learning and artificial intelligence are rapidly growing areas in the IT industry and have a huge impact on big businesses across the globe.

About the Author

Shiksha Online

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski...
