
Iris Flower Dataset Classification with K-Nearest Neighbors

Introduction:

In the vast landscape of machine learning, classifying iris flowers from their sepal and petal measurements is a quintessential introductory challenge. In this blog post, we'll walk through a short Python snippet that leverages the simplicity and effectiveness of the K-Nearest Neighbors (KNN) algorithm. Using the scikit-learn library, we'll see how KNN classifies iris flowers, unpacking both the code and the underlying principles of this intuitive, versatile algorithm.


Libraries Used:

The code employs various modules from scikit-learn, with a specific focus on the K-Nearest Neighbors classifier.

1. scikit-learn: A comprehensive machine learning library, scikit-learn provides tools for data analysis, model building, and evaluation.

2. K-Nearest Neighbors (KNN): KNN is a versatile algorithm used for classification and regression tasks.

3. Iris Dataset: The Iris dataset is a classic dataset for machine learning, often used for classification tasks.
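
For readers who want a quick feel for the data before diving into the classifier, the following minimal sketch loads the dataset and prints its dimensions and class names:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# 150 samples, 4 features (sepal length/width, petal length/width)
print(iris.data.shape)           # (150, 4)
print(iris.feature_names)        # the four measurement columns
print(list(iris.target_names))   # ['setosa', 'versicolor', 'virginica']
```

Each of the three species contributes exactly 50 samples, which makes the dataset well balanced for a first classification experiment.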


Code Explanation:


# Import necessary modules
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize a K-Nearest Neighbors classifier with 5 neighbors
clf = KNeighborsClassifier(n_neighbors=5)
# Train the classifier on the training data
clf.fit(X_train, y_train)
# Make predictions on the test data
y_pred = clf.predict(X_test)
# Print the accuracy score of the classifier
print(accuracy_score(y_test, y_pred))
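
A single accuracy number can hide per-class mistakes. As an optional extension (a sketch, not part of the original snippet), a confusion matrix shows which species, if any, get confused with one another:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Rows are true classes, columns are predicted classes;
# off-diagonal entries count misclassified test samples.
print(confusion_matrix(y_test, clf.predict(X_test)))
```

On this dataset the matrix is typically close to diagonal: setosa is linearly separable from the other two species, and most errors, when they occur, are between versicolor and virginica.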

Explanation:

1. Loading the Dataset: Our exploration commences with loading the Iris dataset using the `load_iris` function from scikit-learn. This dataset contains measurements of sepal length, sepal width, petal length, and petal width for three species of iris flowers.

2. Data Splitting: The dataset is then split into training and testing sets using the `train_test_split` function. This ensures that the model is trained on a subset of the data and evaluated on a separate, unseen subset.

3. K-Nearest Neighbors Classifier Initialization: An instance of the K-Nearest Neighbors classifier is initialized using the `KNeighborsClassifier` class from scikit-learn. In this case, the classifier is set to consider 5 neighbors when making predictions.

4. Training the Classifier: The classifier is trained on the training data using the `fit` method. KNN is a "lazy" learner: `fit` essentially stores the training points (optionally building an index such as a KD-tree for fast neighbor lookup) rather than learning explicit model parameters.

5. Making Predictions: Predictions are then made on the test data using the `predict` method. The KNN algorithm classifies each test point based on the majority class among its nearest neighbors.

6. Accuracy Calculation and Output: The accuracy score, the fraction of test instances predicted correctly, is calculated using the `accuracy_score` function from scikit-learn and printed to the console.
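
The choice of 5 neighbors above is a reasonable default, but the best value of k varies by dataset. A common way to pick it, sketched below, is cross-validation over a few candidate values (the candidate list here is an arbitrary illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score each candidate k with 5-fold cross-validation on the full dataset
for k in (1, 3, 5, 7, 9):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k}: mean accuracy = {scores.mean():.3f}")
```

Odd values of k are conventional for multi-class majority voting since they reduce (though, with three classes, do not eliminate) the chance of ties.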


Conclusion:

In this exploration, we've navigated through a concise yet powerful machine learning code snippet employing the K-Nearest Neighbors algorithm for iris flower classification. KNN, with its simplicity and intuitive nature, stands as a go-to algorithm for various classification tasks. As you continue your journey in machine learning, experimenting with different algorithms and understanding their strengths will empower you to tackle diverse challenges in data classification, fostering blooms of insights and knowledge.


