# Machine Learning Simplified

Machine learning is a subset of artificial intelligence. It is the science of getting computers to act without being explicitly programmed, and it draws heavily on statistics. Machine learning is used to find patterns in data that can then be used to make predictions. It can be subdivided into **supervised learning** and **unsupervised learning**, or some mixture of both.

A computer program is said to learn from experience ‘E’ with respect to some class of tasks ‘T’ and performance measure ‘P’, if its performance at tasks in ‘T’, as measured by ‘P’, improves with experience ‘E’.

— Tom Mitchell

# What makes machine learning so amazing?

The great thing about machine learning is its “magic” component. Machine learning can find patterns and make many successful predictions, such as a person’s movie preferences or the expected price of a house. It powers many useful applications, such as automatic face tagging, cancer diagnosis, self-driving cars, fraud detection, personalized marketing, and voice recognition. If you really want to dig into this subject, there is a great book on machine learning called “Introduction to Machine Learning with Python”.

## 1. Diagnosis With Machine Learning:

Diagnostic errors contribute to about ten percent (10%) of patient deaths, according to the Institute of Medicine at the National Academies of Sciences, Engineering, and Medicine. The following are some of the causes of diagnostic errors:

- Inadequate communication among patients, their families, and clinicians.
- Failing to make the best use of collaboration, and failing to integrate health information technology (health IT).
- A health care work system that, by design, does not adequately support the diagnostic process.

To address these errors, many researchers and companies are using machine learning to improve medical diagnosis. Below are some current applications of artificial intelligence (AI) and machine learning.

- **Oncology**: Researchers are using **deep learning** (part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms) to train algorithms to identify cancerous tissue as well as or better than trained physicians. *(Readers with a specific focus on cancer treatments may be interested in reading a full article on deep learning in oncology.)*
- **Chatbots**: AI chatbots that can recognize speech are used to find patterns in patient symptoms, form a potential diagnosis, and recommend an appropriate course of action or help prevent disease. Chatbots can also be used to support mental health patients, for example.
- **Pathology**: Pathology is the medical specialty concerned with diagnosing disease based on the laboratory analysis of bodily fluids, such as blood and urine, as well as tissues. Machine vision and other machine learning technologies can enhance work that was traditionally left only to pathologists with microscopes.
- **Rare Diseases**: Facial recognition software is being combined with machine learning to help clinicians diagnose rare diseases. Patient photos are analyzed using facial analysis and deep learning to detect phenotypes that correlate with rare genetic diseases.

The diagnostics market size is projected to reach $76 billion by 2023. National health expenditures are expected to have reached $3.4 trillion in 2016, and the health share of the GDP is projected to reach nearly 20 percent by 2025.


## 2. Machine Learning in Fraud Detection:

Eighty-three percent of businesses in North America review orders manually, and on average they manually review twenty-nine percent of their orders, according to CyberSource’s Fraud Benchmark Report. Computers are much faster and more accurate than humans at processing extremely large data sets. They can learn the patterns in a user’s purchasing habits, so when a new purchase breaks from that pattern, the irregularity can be an indication of fraud. By applying machine learning to data, computers can predict fraud across a large volume of transactions. This is another reason we use machine learning algorithms: to help prevent fraud.
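The core idea, learning what “normal” looks like for a user and flagging purchases that break the pattern, can be sketched in a few lines. This is a minimal sketch that assumes a single feature (the purchase amount) and uses a simple statistical z-score rule as a stand-in for a trained model; real fraud systems combine many features and learned models. The function name `is_unusual`, the 3-standard-deviation threshold, and the purchase history are all hypothetical.

```python
from statistics import mean, stdev

def is_unusual(history, amount, threshold=3.0):
    """Flag `amount` if it is more than `threshold` standard deviations away
    from the mean of the user's past purchase amounts (a z-score rule)."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least 2 values
    if sigma == 0:
        # Every past purchase was identical, so any different amount is unusual.
        return amount != mu
    z_score = abs(amount - mu) / sigma
    return z_score > threshold

# Made-up purchase history for a single user, in dollars.
past_purchases = [42.0, 38.5, 51.0, 45.25, 40.0, 47.8]

print(is_unusual(past_purchases, 44.0))   # False: in line with past habits
print(is_unusual(past_purchases, 950.0))  # True: far outside the usual pattern
```

A learned model plays the same role as the threshold here, but it decides what counts as “irregular” from labeled examples of fraud instead of a fixed rule.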

# Where is machine learning being applied today?

Machine learning is being used everywhere today. Some of the most notable companies using it are Facebook, Google, Amazon, Apple, Netflix, and Fitbit.

Personal assistants are another example: the likes of Siri and Google Now use machine learning, largely to better understand speech patterns. With so many people using Siri, the system has a great deal of data from which to improve how it handles languages, accents, and so on.

Netflix uses machine learning to predict which movies or videos you would like to watch so it can give you better suggestions. It learns from the videos you have watched in the past, or from people with demographics similar to yours, and based on that data it predicts which videos you are likely to enjoy next, perhaps because they share a genre or much of the same cast as videos you have already watched. Netflix also uses machine learning to choose the image, thumbnail, or artwork to display with its recommendations. For example, Netflix may personalize the image used to depict the movie Good Will Hunting based on how much a member prefers different genres and themes. Someone who has watched a lot of romantic movies may be drawn to Good Will Hunting by artwork showing Matt Damon and Minnie Driver, whereas a user who has watched a lot of comedies might be more interested if the artwork shows Robin Williams, a well-known comedian.
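A common way to implement “people like you also watched” is user-based collaborative filtering. The sketch below is a minimal illustration of that general technique, not Netflix’s actual system: it assumes users are compared by their ratings (using cosine similarity) rather than by demographics, and every user, title, and rating in it is made up.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two users' rating dicts (title -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(target, others):
    """Rank titles the target has not seen by similarity-weighted ratings."""
    scores = {}
    for ratings in others.values():
        sim = cosine(target, ratings)
        for title, rating in ratings.items():
            if title not in target:
                scores[title] = scores.get(title, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

me = {"Good Will Hunting": 5, "Dead Poets Society": 4}
others = {
    "ana": {"Good Will Hunting": 5, "Dead Poets Society": 5, "Mrs. Doubtfire": 4},
    "ben": {"Alien": 5, "Aliens": 5},
    "cy": {"Good Will Hunting": 4, "Aliens": 2, "Mrs. Doubtfire": 5},
}

print(recommend(me, others))  # ['Mrs. Doubtfire', 'Aliens', 'Alien']
```

Users who share no titles with `me` get a similarity of 0, so their favorites contribute nothing to the ranking.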

Amazon makes good use of machine learning by suggesting products you may like based on your past purchases, which also allows for more targeted marketing. If you bought Harry Potter and the Sorcerer’s Stone, Amazon may suggest Harry Potter and the Chamber of Secrets, because its data shows that you are likely to buy that second book.
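“Customers who bought this also bought” suggestions can be approximated by counting which items appear in the same orders. The sketch below is a hypothetical co-purchase counter illustrating that idea, not Amazon’s implementation; the function names and the order data are invented.

```python
from collections import Counter
from itertools import permutations

def co_purchase_counts(orders):
    """Count, for every ordered pair (a, b), how many orders contain both items."""
    counts = Counter()
    for items in orders:
        for a, b in permutations(set(items), 2):
            counts[(a, b)] += 1
    return counts

def also_bought(item, counts, top=3):
    """Items most often bought together with `item`, most frequent first."""
    related = Counter({b: n for (a, b), n in counts.items() if a == item})
    return [b for b, _ in related.most_common(top)]

# Made-up order history: each inner list is the contents of one order.
orders = [
    ["Sorcerer's Stone", "Chamber of Secrets"],
    ["Sorcerer's Stone", "Chamber of Secrets", "Prisoner of Azkaban"],
    ["Sorcerer's Stone", "The Hobbit"],
    ["The Hobbit", "The Fellowship of the Ring"],
]

counts = co_purchase_counts(orders)
print(also_bought("Sorcerer's Stone", counts))  # "Chamber of Secrets" is listed first
```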

Fitbit trackers are now ubiquitous and capture data from millions of individuals, and according to a Fitbit employee named Raj, the company is leveraging machine learning to provide smart guidance as part of a personalized experience. Fitbit integrated Fitstar data with Fitbit device data, so whether users have a proclivity toward cycling, running, using the elliptical, or hiking, Fitbit automatically tracks those preferences and uses them to generate a custom workout for the user.

# Supervised Learning (Labeled Data)

*The Right Answer Is Given*

Supervised learning is a type of machine learning in which a model (a function) is created from labeled data. The training set contains inputs paired with their desired outputs: the correct outcome for each data point is explicitly labeled when training the model. This means the learning algorithm is already given the answer when reading the data. The two main tasks of supervised learning are **classification** and **regression**.

# Classification

Classification predicts a discrete-valued output: it assigns a label. For example, is this a horse or a tiger? A more practical example is predicting whether a tumor is malignant (harmful) or benign (not harmful).
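To make “assigns a label” concrete, here is one of the simplest classifiers, 1-nearest neighbor. The article does not cover it, but it shows the supervised pattern clearly: a new example receives the label of the most similar labeled training example. The two features and all of the numbers are made up for illustration and have no medical meaning.

```python
from math import dist

def nearest_neighbor(train, query):
    """Return the label of the training example closest to `query` (1-nearest neighbor)."""
    _, label = min(train, key=lambda example: dist(example[0], query))
    return label

# Made-up labeled training data: (feature_1, feature_2) -> label.
train = [
    ((1.2, 2.0), "benign"),
    ((0.8, 1.5), "benign"),
    ((4.5, 7.0), "malignant"),
    ((5.1, 8.2), "malignant"),
]

print(nearest_neighbor(train, (1.0, 1.8)))  # benign
print(nearest_neighbor(train, (4.8, 7.5)))  # malignant
```

Whatever the algorithm, the output is one of a fixed set of labels, which is what separates classification from regression.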

# Support Vector Machines (SVM)

A **support vector machine** (**SVM**) is a supervised machine learning algorithm that can be used for both classification and regression analysis. In its basic form, an SVM looks at labeled data and sorts it into one of two categories by finding the boundary (a hyperplane) that separates the categories with the widest possible margin between them. SVMs are used in text categorization, image classification, handwriting recognition, and the sciences.
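The idea of separating two categories with as wide a margin as possible can be sketched as a linear SVM trained by subgradient descent on the hinge loss. This is a minimal teaching sketch under stated assumptions: two linearly separable classes labeled -1 and +1, a hand-picked learning rate `lr` and regularization strength `lam`, and made-up data. Libraries such as scikit-learn use more sophisticated solvers.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM (weights w, bias b) by subgradient descent on the
    L2-regularized hinge loss. X is a list of feature tuples; y holds -1/+1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Inside the margin or misclassified: the hinge term pulls w toward yi * xi.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct with room to spare: only the regularizer acts, shrinking w
                # (a smaller ||w|| corresponds to a wider margin, 2 / ||w||).
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Classify x by which side of the hyperplane w.x + b = 0 it falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Made-up, linearly separable training data.
X = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [-1, -1, -1, 1, 1, 1]
```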

# Regression

Regression predicts a continuous numerical value. In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. Examples include estimating the value of a house based on its size in square feet, or predicting a person’s annual income based on their years of higher education. Many regression algorithms are in use, such as **linear regression** and **logistic regression**.

# Linear Regression

In its simplest form, linear regression is a linear model between two variables: a dependent variable ‘y’ and an independent variable ‘x’. Does f(x) = mx + b look familiar? It is not only the slope-intercept equation of a line but also the equation for a linear model, where f(x) is the estimated value of the dependent variable, i.e., the prediction for a given value of the independent variable ‘x’. Linear regression is a common type of predictive analysis.
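Fitting m and b from data is usually done with ordinary least squares, which for a single variable has a closed-form solution: m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b = ȳ - m·x̄. Below is a minimal sketch; the house sizes (in thousands of square feet) and prices (in thousands of dollars) are made up and chosen to lie exactly on a line so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns slope m and intercept b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b

sizes = [1.0, 2.0, 3.0, 4.0]           # thousands of square feet (made up)
prices = [150.0, 250.0, 350.0, 450.0]  # thousands of dollars (made up)

m, b = fit_line(sizes, prices)
print(m, b)         # 100.0 50.0
print(m * 2.5 + b)  # 300.0: predicted price for a 2,500 sq ft house
```

With real, noisy data the points will not sit exactly on the line, and the same formulas return the line that minimizes the sum of squared errors.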

The overall idea of regression is to examine two things:

1. Does a set of predictor variables do a good job of predicting an outcome (dependent) variable?
2. Which variables in particular are significant predictors of the outcome variable, and in what way (indicated by the magnitude and sign of the beta estimates) do they impact the outcome variable?

# Logistic Regression

Logistic regression is a technique used in machine learning and, like linear regression, is a form of predictive analysis. In statistics, the logistic model is a widely used statistical model that, in its basic form, uses a logistic function to model a binary dependent variable. **Logistic regression** is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval, or ratio-level independent variables.

Logistic regression is emphatically **not** a classification algorithm on its own. It becomes a classification algorithm only *in combination with* a decision rule that turns the predicted probabilities into a dichotomous outcome. Logistic regression *is* a regression model because it estimates the probability of class membership as a (transformation of a) multi-linear function of the features.


Unlike linear **regression**, which outputs continuous numeric values, **logistic regression** transforms its output using the **sigmoid function** to return a probability, which can then be mapped to two or more discrete classes (classification). The sigmoid function maps any real value to a value between 0 and 1, which is why machine learning uses it to turn predictions into probabilities. A **sigmoid function** is a mathematical function with a characteristic “S”-shaped curve, or **sigmoid curve**; the term usually refers to the special case of the logistic function:

$$s(z) = \frac{1}{1 + e^{-z}}$$

- s(z) = output between 0 and 1 (probability estimate)
- z = input to the function (your algorithm’s prediction, e.g. mx + b)
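The sigmoid, s(z) = 1 / (1 + e^(-z)), takes only a few lines to write. This sketch branches on the sign of z so that very large negative inputs do not overflow `math.exp`; that is a common implementation choice, not something the math requires.

```python
import math

def sigmoid(z):
    """Logistic sigmoid s(z) = 1 / (1 + e^(-z)), mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) so exp never overflows.
    e = math.exp(z)
    return e / (1.0 + e)

# z would normally be the linear prediction, e.g. m * x + b.
for z in (-4, 0, 4):
    print(z, round(sigmoid(z), 3))
# -4 0.018
# 0 0.5
# 4 0.982
```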

One of the nice properties of logistic regression is that the sigmoid function outputs the conditional probabilities of the prediction, the class probabilities. How does it work? Start with the odds, *p / (1 - p)* (often called the “odds ratio”), which is the probability that a positive or successful event occurs divided by the probability that it does not occur, where “positive” refers to the event we want to predict, i.e., *p(y=1 | x)*. Taking the logarithm of the odds gives the logit function, *logit(p) = log(p / (1 - p))*. Logistic regression models the logit as a linear function of the features, *z*, and because the sigmoid is the inverse of the logit, *s(z)* gives back the probability *p(y=1 | x)*.
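The relationship between probability, odds, log-odds (logit), and the sigmoid can be checked numerically. A minimal sketch; the helper names `odds` and `logit` and the probability 0.75 are chosen for illustration.

```python
import math

def odds(p):
    """Odds of the positive event: p / (1 - p)."""
    return p / (1 - p)

def logit(p):
    """Log-odds, log(p / (1 - p)); this is the inverse of the sigmoid."""
    return math.log(odds(p))

def sigmoid(z):
    """Logistic sigmoid (simple form, fine for moderate z)."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.75                      # e.g. p(y=1 | x)
print(odds(p))                # 3.0: the event is three times as likely as not
z = logit(p)                  # about 1.0986, the linear score logistic regression models
print(round(sigmoid(z), 10))  # 0.75: the sigmoid undoes the logit
```

Because the logit can be any real number while p stays between 0 and 1, modeling the logit as a linear function of the features is what lets a linear score produce valid probabilities.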

Our current prediction function (Sigmoid) returns a probability score between 0 and 1. In order to map this to a discrete class (true/false, cat/dog), we select a threshold value or tipping point above which we will classify values into class 1 and below which we classify values into class 0.

$$
\text{class} =
\begin{cases}
1 & \text{if } p \ge 0.5 \\
0 & \text{if } p < 0.5
\end{cases}
$$

For example, if our threshold was .5 and our prediction function returned .7, we would classify this observation as positive. If our prediction was .2 we would classify the observation as negative. For logistic regression with multiple classes we could select the class with the highest predicted probability.
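Both decision rules, the binary threshold and “pick the highest probability” for multiple classes, can be written directly. A minimal sketch; the function names and example probabilities are hypothetical.

```python
def classify(p, threshold=0.5):
    """Binary decision rule: class 1 if p >= threshold, otherwise class 0."""
    return 1 if p >= threshold else 0

def classify_multiclass(probabilities):
    """Multi-class rule: pick the class with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)

print(classify(0.7))  # 1 (positive)
print(classify(0.2))  # 0 (negative)
print(classify_multiclass({"cat": 0.2, "dog": 0.7, "sheep": 0.1}))  # dog
```

The threshold is a parameter rather than a constant because 0.5 is only a default; an application that cares more about missing positives might lower it.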

# Types of logistic regression

- Binary (Pass/Fail)
- Multinomial (Cats, Dogs, Sheep)
- Ordinal (Low, Medium, High)
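For the multinomial case, logistic regression typically replaces the sigmoid with the softmax function, which turns one linear score per class into probabilities that sum to 1 (with two classes it reduces to the sigmoid of the difference between the two scores). A minimal sketch with made-up scores; ordinal logistic regression uses a different formulation over ordered categories and is not shown.

```python
import math

def softmax(scores):
    """Convert a dict of per-class scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtracting the max avoids overflow in exp
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Made-up linear scores (z values) for one animal photo.
probs = softmax({"cat": 1.0, "dog": 2.0, "sheep": 0.5})
print(max(probs, key=probs.get))      # dog
print(round(sum(probs.values()), 6))  # 1.0
```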

# Decision Tree

A **decision tree** is a decision support tool that uses a tree-like model of decisions. A decision tree can be used to visually and explicitly represent decisions and decision making. Think of it in programming terms as a bunch of **if and else statements**. However, the tree is learned from the data and the decision boundaries on that data rather than being explicitly programmed.
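To show how a tree’s if/else rules come from data rather than being hand-written, here is a sketch that learns a single split (a “decision stump”), the building block that tree algorithms such as CART apply recursively to grow a full tree. It assumes one numeric feature with at least two distinct values and uses Gini impurity as the split criterion, which is one common choice; the hours-studied data is invented.

```python
def gini(labels):
    """Gini impurity: 0 when every label is the same, larger when labels are mixed."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def majority(labels):
    """Most common label in a non-empty list."""
    return max(set(labels), key=labels.count)

def learn_stump(xs, labels):
    """Learn one split 'x <= threshold' by choosing the threshold with the
    lowest weighted Gini impurity across the two resulting groups."""
    best = None
    for t in sorted(set(xs)):
        left = [lab for x, lab in zip(xs, labels) if x <= t]
        right = [lab for x, lab in zip(xs, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if right and (best is None or score < best[0]):
            best = (score, t, majority(left), majority(right))
    _, threshold, left_label, right_label = best
    return threshold, left_label, right_label

hours = [1, 2, 3, 6, 7, 8]  # made-up hours studied
outcome = ["fail", "fail", "fail", "pass", "pass", "pass"]
threshold, left_label, right_label = learn_stump(hours, outcome)

def predict(hours_studied):
    # The learned model is an ordinary if/else, but the threshold came from the data.
    if hours_studied <= threshold:
        return left_label
    else:
        return right_label

print(threshold)   # 3
print(predict(5))  # pass
```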

# Unsupervised Learning (Unlabeled data)

*Here is the data set; can you find some structure?*

# Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.

# k-means clustering

K-Means is a very well known clustering algorithm. It’s taught in a lot of introductory data science and machine learning classes. It’s easy to understand and implement in code!

- To begin, we first select a number of classes/groups to use and randomly initialize their respective center points. To figure out the number of classes to use, it’s good to take a quick look at the data and try to identify any distinct groupings. The center points are vectors of the same length as each data point vector.
- Each data point is classified by computing the distance between that point and each group center, and then classifying the point to be in the group whose center is closest to it.
- Based on these classified points, we recompute the group center by taking the mean of all the vectors in the group.
- Repeat these steps for a set number of iterations or until the group centers don’t change much between iterations. You can also opt to randomly initialize the group centers a few times, and then select the run that looks like it provided the best results.
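The four steps translate almost line for line into code. This is a minimal sketch that assumes points are equal-length tuples, uses Euclidean distance, and fixes the random seed for reproducibility. It performs a single random initialization, so the “initialize several times and keep the best run” idea is left out. The data is made up.

```python
import random
from math import dist

def kmeans(points, k, max_iters=100, seed=0):
    """Basic k-means on a list of equal-length tuples; returns (centers, groups)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # Step 1: pick k data points as initial centers
    for _ in range(max_iters):
        # Step 2: assign every point to the group with the nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        # Step 3: move each center to the mean of the points assigned to it
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups

points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centers, groups = kmeans(points, k=2)
print(sorted(centers))  # [(1.5, 1.5), (8.5, 8.5)]
```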

K-Means has the advantage of being pretty fast, since all we’re really doing is computing the distances between points and group centers; very few computations! Each iteration therefore has linear complexity *O*(*n*) in the number of points (for a fixed number of clusters and dimensions).

## Artificial Neural Networks

Artificial neural networks are loosely modeled on the human brain and nervous system. An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in a brain: each node represents an artificial neuron, and each connection carries the output of one artificial neuron to the input of another. Neural networks can be used for classification, regression, and clustering.


A **deep neural network** (DNN) is an **artificial neural network** (ANN) with multiple layers between the input and output layers.
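The idea of neurons passing their outputs to one another can be sketched as a forward pass. This minimal example assumes sigmoid activations and uses hand-picked weights (not learned by training) so that a network with one hidden layer of two neurons computes XOR, a function no single linear neuron can represent. Stacking more hidden layers between the input and output layers is what would make it a deep network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each neuron takes a weighted sum of all
    inputs plus its bias, then applies the sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(x):
    # Hidden layer: neuron 1 behaves like OR, neuron 2 like NAND (hand-picked weights).
    hidden = layer(x, [[20, 20], [-20, -20]], [-10, 30])
    # Output neuron behaves like AND of the two hidden outputs, giving XOR overall.
    output = layer(hidden, [[20, 20]], [-30])
    return output[0]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(forward(x)))
# (0, 0) 0
# (0, 1) 1
# (1, 0) 1
# (1, 1) 0
```

In practice the weights are not chosen by hand; training algorithms such as backpropagation adjust them from labeled data.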

# Where Can I Learn More?

## Dive Right Into Machine Learning

Here are some of the best **free** **introductory** **courses** on the “interwebs”.

- Udacity Machine Learning Intro
- Udacity Intro to Statistics
- Udacity Intro to Data Science
- EverythingComputerScience.com
- Free online YouTube videos for CS231n Winter 2016
- Machine Learning Crash Course by Google
- Andrew Ng’s Machine Learning course on Coursera
- https://edgylabs.com/top-10-free-deep-learning-moocs

Thanks for reading this article; I hope it’s helpful to you all! If you enjoyed this article and found it helpful, please leave some claps to show your appreciation. Keep up the learning, and if you would like more mathematics, computer science, programming, and algorithm analysis videos, please visit and subscribe to my YouTube channels (randerson112358 & compsci112358).

SOURCES: