Machine Learning Project Development Step 2: Choice of Algorithm for Supervised Learning
Step 2 of machine learning project development: choose an algorithm when the project falls under supervised learning.
There are many supervised learning algorithms. Supervised learning is divided into regression and classification. If the output is discrete (for example binary, or one of a fixed set of class labels), choose a classification algorithm; if it is continuous, choose a regression algorithm. So step 2 is:
Identify whether the problem is a regression problem or a classification problem, then choose the algorithm accordingly.
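The regression-or-classification decision above can be sketched as a small check on the target values. This is only a heuristic under stated assumptions: the function name `problem_type` and the cutoff `max_classes` are hypothetical, chosen for illustration, and whole-number targets with few distinct values (for example a short list of prices) would be misread as class labels.

```python
def problem_type(targets, max_classes=20):
    """Guess 'classification' or 'regression' from a list of target values.

    Heuristic only: max_classes is an illustrative cutoff, not a standard value.
    """
    distinct = set(targets)
    # Text labels are always discrete, so they indicate classification.
    if any(isinstance(t, str) for t in distinct):
        return "classification"
    # Whole-number targets (including booleans) with few distinct values
    # behave like class labels.
    if all(float(t).is_integer() for t in distinct) and len(distinct) <= max_classes:
        return "classification"
    # Anything else is treated as a continuous output.
    return "regression"


if __name__ == "__main__":
    print(problem_type(["spam", "ham", "spam"]))
    print(problem_type([0, 1, 1, 0]))
    print(problem_type([12.5, 30.1, 18.75]))
```

In practice the final decision should come from what the output means in the project, not only from how the values look.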
Different algorithms are used for regression and classification problems. A few of them are listed below.
Algorithms for regression analysis:
1. Linear Regression
2. Decision tree regression
3. Random Forest
4. Gradient boosted trees
5. Neural Network
Algorithms for classification:
1. Logistic Regression
2. Naïve Bayes
3. Stochastic Gradient Descent
4. K-Nearest Neighbors
5. Decision Tree
6. Random Forest
7. Support Vector Machine
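The two lists above can be kept as a lookup table so that the candidates for a problem type are easy to retrieve. The sketch below assumes the scikit-learn library, which the text does not name; the class paths are given as plain strings so the block runs without it installed. The names `CANDIDATES` and `candidates` are hypothetical, and `GaussianNB` is only one of several Naive Bayes variants.

```python
# Candidate algorithms per problem type, mapped to the scikit-learn class
# that implements each one (assumption: scikit-learn is the library used).
CANDIDATES = {
    "regression": {
        "Linear Regression": "sklearn.linear_model.LinearRegression",
        "Decision Tree Regression": "sklearn.tree.DecisionTreeRegressor",
        "Random Forest": "sklearn.ensemble.RandomForestRegressor",
        "Gradient Boosted Trees": "sklearn.ensemble.GradientBoostingRegressor",
        "Neural Network": "sklearn.neural_network.MLPRegressor",
    },
    "classification": {
        "Logistic Regression": "sklearn.linear_model.LogisticRegression",
        "Naive Bayes": "sklearn.naive_bayes.GaussianNB",
        "Stochastic Gradient Descent": "sklearn.linear_model.SGDClassifier",
        "K-Nearest Neighbors": "sklearn.neighbors.KNeighborsClassifier",
        "Decision Tree": "sklearn.tree.DecisionTreeClassifier",
        "Random Forest": "sklearn.ensemble.RandomForestClassifier",
        "Support Vector Machine": "sklearn.svm.SVC",
    },
}


def candidates(problem_type):
    """Return the name-to-class mapping for 'regression' or 'classification'."""
    if problem_type not in CANDIDATES:
        raise ValueError(f"unknown problem type: {problem_type!r}")
    return CANDIDATES[problem_type]


if __name__ == "__main__":
    for name, cls in candidates("classification").items():
        print(f"{name}: {cls}")
```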
Criteria for choosing the algorithm:
1. Size of the data set: i) number of samples, ii) number of features per sample
2. Non-linearity present in the data
Case 1: If the data set is small (few samples and few features) and has little non-linearity, a traditional algorithm such as linear regression, logistic regression, or a decision tree will work well.
In the majority of cases, an Artificial Neural Network (ANN) of a suitable size will also give good results.
As the data size keeps increasing, the performance gain of a traditional algorithm or a small network levels off, and a bigger network has to be built to improve the results.
Case 2: A data set is an essential requirement for any machine learning algorithm. Companies, government offices, and the medical, education, and business sectors produce a huge amount of data every day, and machine learning models are built from this data. All of this data is generated in raw form. Raw data cannot be used directly; data analysis tools have to be applied to it first. Data analysis is the process of collecting, cleaning, and transforming data into useful information.
Various data analysis tools are available, such as the R and Python languages and Tableau for data visualization. By applying statistics and analytics to the data, we can find the non-linearities in it, which helps in choosing the algorithm.
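As one possible way to look for non-linearity with plain statistics, the sketch below compares the Pearson correlation (which measures a straight-line relationship) with the Spearman rank correlation (which measures any steadily rising or falling relationship). A Spearman value clearly higher than the Pearson value hints at a curved relationship. The function names and the sample data are hypothetical, the rank helper assumes no tied values, and this check only catches monotonic curves; other shapes need other tools such as plots.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """Rank of each value (1 = smallest); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up data: one feature with a straight-line target and a curved target.
xs = list(range(1, 21))
linear = [2 * x + 1 for x in xs]
curved = [math.exp(0.5 * x) for x in xs]

if __name__ == "__main__":
    print("linear:", round(pearson(xs, linear), 3), round(spearman(xs, linear), 3))
    print("curved:", round(pearson(xs, curved), 3), round(spearman(xs, curved), 3))
```

For the straight-line target both values agree, while for the curved target the rank correlation stays high and the Pearson value drops, which suggests that a non-linear model is worth considering.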
As the data size keeps increasing, the performance of a fixed-size network levels off. Computing speed is rarely a constraint nowadays, so an ANN is often the best choice: the network size can be increased as the data size increases, and the performance of the model keeps improving with it. For very large data sets, Deep Learning is the preferred choice.
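The two cases can be summarized as a rule of thumb in code. This is a rough sketch, not a definitive selection procedure: the function name `suggest_family` and every threshold (`small_samples`, `small_features`, `large_samples`) are hypothetical values picked for illustration, since the text gives no numeric limits.

```python
def suggest_family(n_samples, n_features, nonlinear,
                   small_samples=10_000, small_features=50,
                   large_samples=1_000_000):
    """Suggest an algorithm family from data size and non-linearity.

    All thresholds are illustrative assumptions, not established limits.
    """
    is_small = n_samples < small_samples and n_features < small_features
    # Case 1: small data set with little non-linearity.
    if is_small and not nonlinear:
        return "traditional"       # linear/logistic regression, decision tree
    # Very large data sets: a deep network can keep improving with more data.
    if n_samples >= large_samples:
        return "deep learning"
    # Everything in between: an ANN whose size grows with the data.
    return "neural network"


if __name__ == "__main__":
    print(suggest_family(500, 10, nonlinear=False))
    print(suggest_family(50_000, 30, nonlinear=True))
    print(suggest_family(5_000_000, 200, nonlinear=True))
```

The suggestion is only a starting point; comparing a few candidate models on held-out data remains the reliable way to choose.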
(Ref.: The diagram is taken from a Coursera course on ANN.)