Top libraries that you should know to master Machine Learning
Machine Learning is the study of computer algorithms that improve automatically through experience and the use of data. Many real-world projects rely on it, and a wide range of libraries exists to support the work.
What is a library?
A library is a collection of non-volatile resources, such as prewritten code and routines, that programs reuse during development. In Machine Learning, libraries are chosen according to the task at hand. If you are new to the domain, it helps to know the frequently used libraries and what each one is for. Python has ready-made libraries for reading a dataset, displaying output, building a model and performing exploratory data analysis (EDA). Picking the right library for each step matters for getting effective results.
How to choose a library?
1. Analyze your dataset
- Identify the independent variables and the target.
- Find the type of data: labeled or unlabeled.
2. Analyze the requirements to be satisfied
3. List down the steps to be done
Generally, the steps include:
- Read the dataset
- Check for null values
- Fill the null values
- Split the dataset into x and y
- Choose a suitable model
- Prepare train and test data
- Build the model / Train the model
- Test the model using test data
- Check for accuracy
- Optimize the model
4. Match the libraries that satisfy each step
5. If more than one library fits a step, compare them on accuracy and on how well they are optimized.
Let us list some basic machine learning libraries and their purposes.
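The general steps listed under step 3 can be sketched end to end with pandas and scikit-learn. This is only an illustration: the small generated dataset, its column names (`hours`, `score`, `passed`) and the choice of logistic regression are assumptions made for the example, not something the steps prescribe.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Read the dataset: a made-up table stands in for pd.read_csv(...)
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=100)
df = pd.DataFrame({
    "hours": hours,
    "score": hours * 8 + rng.normal(0, 5, size=100),
})
df["passed"] = (df["hours"] > 5).astype(int)
df.loc[[3, 17, 42], "score"] = np.nan  # inject a few missing values

# Check for null values
print(df.isnull().sum())

# Fill the null values (the column mean is one simple choice)
df["score"] = df["score"].fillna(df["score"].mean())

# Split the dataset into x (independent variables) and y (target)
X = df[["hours", "score"]]
y = df["passed"]

# Choose a suitable model (assumption: a simple classifier is enough here)
model = LogisticRegression(max_iter=1000)

# Prepare train and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Build / train the model
model.fit(X_train, y_train)

# Test the model using test data and check for accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```

The last step, optimizing the model, is left out of the sketch; in practice it often means trying other models or tuning parameters and repeating the accuracy check.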
NumPy:
NumPy is an array-processing package that handles large, multi-dimensional arrays and matrices. It can also serve as an efficient multi-dimensional container for generic data of arbitrary data types. The notebook linked below shows the library in practical use.
https://github.com/manish2509/Machine-Learning-Libraries/blob/main/Numpy.ipynb
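As a quick taste, here is a minimal sketch of the array handling described above. It is not taken from the linked notebook; the values are made up for illustration.

```python
import numpy as np

# A 2-D array (matrix) built from nested Python lists
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)  # (2, 3)

# Element-wise arithmetic works on the whole array, no explicit loop needed
doubled = a * 2
print(doubled)

# Reductions along an axis: axis=0 sums down each column
col_sums = a.sum(axis=0)
print(col_sums)  # [5 7 9]

# Matrix product of a (2x3) with its transpose (3x2) gives a 2x2 matrix
b = a @ a.T
print(b)  # [[14 32]
          #  [32 77]]

# Arrays are not limited to numbers: this one holds strings
names = np.array(["numpy", "pandas", "scipy"])
print(names.dtype)
```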
Pandas:
Pandas is a library for data analysis. It was developed for data extraction and preparation, and it has a wide range of built-in functions that support filtering, joining, and grouping data.
Basic operations done by pandas include:
- Creating a dataframe
- Dealing with rows and columns in a dataframe
- Indexing data
- Iterating over rows and columns
- Working with missing data
A practical use case of this library is shown in the link below.
https://github.com/manish2509/Machine-Learning-Libraries/blob/main/Pandas.ipynb
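The basic operations listed above can be sketched in a few lines. This is an illustrative example rather than the content of the linked notebook; the table and its column names (`person`, `city`, `age`) are invented.

```python
import numpy as np
import pandas as pd

# Creating a dataframe (values are made up; one age is missing on purpose)
df = pd.DataFrame({
    "person": ["Asha", "Ben", "Chen", "Dina"],
    "city": ["Chennai", "Austin", "Chennai", "Austin"],
    "age": [25, np.nan, 31, 40],
})

# Dealing with rows and columns: pick a column, filter rows
cities = df["city"]
older = df[df["age"] > 30]
print(older)

# Indexing data: by position (iloc) and by label (loc)
first_row = df.iloc[0]
chen_age = df.loc[2, "age"]
print(chen_age)  # 31.0

# Iterating over rows
for row in df.itertuples(index=False):
    print(row.person, row.city)

# Working with missing data: count the gaps, then fill with the column mean
missing_count = int(df["age"].isnull().sum())
df["age"] = df["age"].fillna(df["age"].mean())

# Grouping: mean age per city after filling
mean_age_by_city = df.groupby("city")["age"].mean()
print(mean_age_by_city)
```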
SciPy:
SciPy stands for Scientific Python. It is a scientific computation library that builds on NumPy and adds further functions for optimization and statistics, among others. Its basic data structure is the multi-dimensional array provided by NumPy. It gives the user high-level commands and classes for manipulating and visualizing data.
Refer: https://www.w3schools.com/python/scipy/index.php
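To make the optimization and statistics functions a little more concrete, here is a small sketch. The function being minimized and the two random samples are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import optimize, stats

# Optimization: find the x that minimizes f(x) = (x - 3)^2 + 1
def f(x):
    return (x - 3) ** 2 + 1

result = optimize.minimize_scalar(f)
print(result.x)  # close to 3

# Statistics: compare the means of two samples with an independent t-test
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=0.0, scale=1.0, size=200)
sample_b = rng.normal(loc=1.0, scale=1.0, size=200)
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print(t_stat, p_value)
```

A small p-value here suggests that the two samples come from distributions with different means, which is how they were generated.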
Matplotlib:
It is a go-to library for data visualization. Data can be visualized using:
1. Line chart
It displays data points connected by straight line segments.
2. Bar chart
It represents categorical data with rectangular bars whose heights are proportional to the values they represent.
3. Pie chart
It is a circular statistical graphic divided into slices to illustrate numerical proportions.
4. Histogram
It is a graphical representation of the distribution of numerical data.
5. Scatter chart
It is a type of plot that uses Cartesian coordinates to display the values of two variables for a set of data. You can learn how to implement the graphs one by one using the link below.
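The five chart types can be drawn side by side with a short script like the following. It is a sketch with made-up data; the output file name `charts.png` is an arbitrary choice, and the `Agg` backend is selected only so the script also runs without a display (leave that line out in Jupyter).

```python
import matplotlib
matplotlib.use("Agg")  # file-only backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 6)
y = np.array([2, 4, 3, 6, 5])
labels = ["A", "B", "C", "D", "E"]

fig, axes = plt.subplots(2, 3, figsize=(12, 7))

# 1. Line chart: points joined by straight segments
axes[0, 0].plot(x, y, marker="o")
axes[0, 0].set_title("Line chart")

# 2. Bar chart: one bar per category, height proportional to the value
axes[0, 1].bar(labels, y)
axes[0, 1].set_title("Bar chart")

# 3. Pie chart: each slice shows a share of the total
axes[0, 2].pie(y, labels=labels, autopct="%1.0f%%")
axes[0, 2].set_title("Pie chart")

# 4. Histogram: distribution of 500 random values in 20 bins
axes[1, 0].hist(rng.normal(size=500), bins=20)
axes[1, 0].set_title("Histogram")

# 5. Scatter chart: two variables plotted against each other
axes[1, 1].scatter(rng.uniform(size=50), rng.uniform(size=50))
axes[1, 1].set_title("Scatter chart")

axes[1, 2].axis("off")  # unused sixth panel

fig.tight_layout()
fig.savefig("charts.png")
```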
Seaborn:
It is a data visualization library based on Matplotlib. What seaborn adds is a high-level interface for drawing statistical graphics. Behind the scenes, seaborn uses matplotlib to draw its plots. For interactive work, it’s recommended to use a Jupyter/IPython interface in matplotlib mode, or else you’ll have to call matplotlib.pyplot.show() when you want to see the plot. A combination of seaborn’s high-level interface and matplotlib’s deep customizability will allow you both to quickly explore your data and to create graphics that can be tailored into a publication-quality final product.
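The split between seaborn's high-level call and matplotlib's customization can be seen in a short sketch. The data frame and its columns (`height`, `weight`, `group`) are invented for the example, and the `Agg` backend and output file name are assumptions so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # file-only backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: two numeric columns and one categorical column
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "height": rng.normal(170, 10, size=100),
    "group": rng.choice(["a", "b"], size=100),
})
df["weight"] = df["height"] * 0.5 + rng.normal(0, 5, size=100)

# seaborn: one high-level call draws the plot, colors by group, adds a legend
sns.set_theme()
ax = sns.scatterplot(data=df, x="height", y="weight", hue="group")

# matplotlib: the returned Axes object can be customized as usual
ax.set_title("Weight vs. height")

plt.savefig("seaborn_scatter.png")  # or plt.show() in an interactive session
```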
Scikit Learn (sklearn):
It is a package that supports a range of supervised and unsupervised learning algorithms. It was developed for model building. Algorithms such as Linear and Logistic Regression, Decision Tree, Random Forest, K-means, and Support Vector Machine are available in scikit-learn.
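Here is a minimal sketch of one supervised and one unsupervised algorithm from that list. The iris dataset bundled with scikit-learn and the parameter values are choices made for this example only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Small labeled dataset that ships with scikit-learn
X, y = load_iris(return_X_y=True)

# Supervised learning: Random Forest trained on labeled data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(f"Random Forest test accuracy: {test_accuracy:.2f}")

# Unsupervised learning: K-means groups the same rows without using y
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)
print(cluster_labels[:10])
```

Most scikit-learn estimators share this `fit` / `predict` / `score` pattern, so swapping in another algorithm from the list usually means changing only the line that creates the model.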
OpenCV:
It is a library of algorithms for Computer Vision. Image processing, object detection, and video analysis are some real-world use cases of this package. The link below leads to a hands-on, basic real-time face detection project using OpenCV.
The figure above illustrates the process of face detection:
- A Haar Cascade Classifier is a pretrained model that encodes the features of a face.
- An image from the user is taken and converted into an array of pixels.
- The image is compared with the pretrained model and the face is detected.
https://github.com/manish2509/Face-Detection
NLTK:
It is a library for Natural Language Processing. It contains text processing libraries for tokenization, parsing, classification, stemming, tagging, and semantic reasoning. Summarization is one real-world use case of this package. A project that summarizes text using NLTK is given in the link below.
https://github.com/manish2509/Notes-Summarization
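Tokenization and stemming, two of the building blocks mentioned above, look roughly like this. The sketch is not taken from the linked project; the sample sentence is made up, and the tokenizer and stemmer were picked because they need no extra NLTK data downloads.

```python
from nltk.probability import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "Learning is fun. She learns quickly and has learned a lot."

# Tokenization: split the text into words and punctuation marks
tokens = wordpunct_tokenize(text.lower())
words = [t for t in tokens if t.isalpha()]  # drop punctuation
print(words)

# Stemming: reduce each word to its stem
stemmer = PorterStemmer()
stems = [stemmer.stem(w) for w in words]
print(stems)

# Word frequencies over the stems; counts like these are a common
# starting point for scoring sentences in simple summarizers
freq = FreqDist(stems)
print(freq.most_common(3))
```

Here "learning", "learns" and "learned" all reduce to the stem "learn", so they are counted as the same word.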
TensorFlow:
It is a Machine Learning library used for large-scale computation and deployment. Real-world projects such as image recognition, video detection, voice/sound recognition, text-based applications, and recommendation systems are built using TensorFlow.
Keras:
Keras follows best practices for reducing cognitive load. It offers consistent and simple APIs, and it acts as a high-level interface to the TensorFlow library.
Take a look at this image classification project by Google that uses the TensorFlow and Keras libraries: https://github.com/manish2509/Machine-Learning-Libraries/blob/main/Image_classification.ipynb
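Before opening the notebook, it may help to see how little code a Keras model needs. The sketch below is far simpler than image classification and is not taken from the linked notebook: the synthetic two-feature dataset, the layer sizes and the number of epochs are all assumptions made for illustration.

```python
import numpy as np
from tensorflow import keras

# Made-up data: the label is 1 when the two features sum to more than zero
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

# Define the model layer by layer
model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Configure training, then train, evaluate and predict
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
loss, accuracy = model.evaluate(X, y, verbose=0)
predictions = model.predict(X[:3], verbose=0)

print(f"Training accuracy: {accuracy:.2f}")
print(predictions.shape)  # (3, 1)
```

Keras supplies the simple `compile` / `fit` / `evaluate` / `predict` interface, while TensorFlow runs the underlying computation.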
Go ahead and dig in more.
Happy Learning!
-Manishma Sundararajan