According to Wikipedia, in an artificial neural network the activation function of a node defines the output of that node given an input or set of inputs. A standard integrated circuit can be seen as a digital network of activation functions that can be “ON” (1) or “OFF” (0), depending on the input.

There are different types of activation functions:

· Binary Step
· Linear
· Sigmoid
· Tanh
· ReLU
· Leaky ReLU
· Parameterised ReLU
· Exponential Linear Unit
· Swish
· Softmax

Today, we will take a closer look at one of the most common activation functions: the sigmoid function.

**What is the Sigmoid Function?**

The sigmoid function is a mathematical function with a characteristic “S”-shaped curve that transforms values into the range between 0 and 1. The sigmoid function is also called the sigmoidal curve or logistic function. It is one of the most widely used non-linear activation functions.

The mathematical expression for sigmoid:
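σ(z) = 1 / (1 + e^(−z))

where z is the input to the function and e is Euler’s number (≈ 2.718).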

**Graph**

In the graph above, as the value of x goes to positive infinity, the predicted value of y approaches 1; as x goes to negative infinity, the predicted value of y approaches 0.
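As a quick numerical check (a minimal sketch using NumPy), the curve rises monotonically from values near 0 on the far left to values near 1 on the far right:

```python
import numpy as np

# Evaluate the sigmoid over a wide range to see its limiting behavior
x = np.linspace(-10, 10, 201)
y = 1.0 / (1.0 + np.exp(-x))

print(float(y[0]))   # close to 0 at x = -10
print(float(y[-1]))  # close to 1 at x = +10
```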

I assume you are familiar with logistic regression, a common algorithm used for binary classification, i.e., when the target variable is categorical in nature. The sigmoid (the inverse of the logit function) is used to predict the probabilities of a binary outcome. For example, we use logistic regression for classification in spam detection, fraud detection, etc.

**Let’s walk through one of the examples of implementing Logistic Regression in Python.**

First, we have to import the required libraries.
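The imports below are a reasonable assumption based on the steps that follow (the article’s original code was shown as an image):

```python
# Libraries used throughout this walkthrough
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
```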

Next, let’s load the dataset. Today we are going to use data collected by some doctors on heart attack patients. We copied the data from here into an Excel sheet.

Our target is to predict whether the patient had a second heart attack within 1 year (yes = 1). We have two independent variables: one is whether the patient completed a treatment consisting of anger-control practices (yes = 1); the other is a score on a trait-anxiety scale (a higher score means more anxious).

I assume you have a basic understanding of training models on datasets. Now, we will use the *“train_test_split”* function from scikit-learn to split the data into training and testing sets.
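A sketch of the split, using hypothetical feature values in place of the original sheet:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical features (anger_treatment, trait_anxiety) and target
X = np.array([[1, 40], [1, 55], [0, 70], [0, 80],
              [1, 50], [0, 65], [1, 45], [0, 75]], dtype=float)
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])

# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
print(X_train.shape, X_test.shape)  # (6, 2) (2, 2)
```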

We use the StandardScaler from scikit-learn to rescale the data, since the features may have extremely different ranges or units.
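For example (with made-up training rows), scaling centers each column at 0 with unit variance:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training features with very different ranges
X_train = np.array([[1, 40], [0, 70], [1, 55], [0, 80]], dtype=float)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# After scaling, each column has mean ~0 and standard deviation 1
print(X_train_scaled.mean(axis=0))
print(X_train_scaled.std(axis=0))
```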

Now, we’ll import the logistic regression algorithm from scikit-learn, create an instance of the classifier, and fit it to the training data.

Once we fit the model to the training data, we can see the coefficient for each variable and the intercept.
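The fit-and-inspect step might look like this (the training rows are hypothetical, so the printed numbers will differ from the article’s):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data (anger_treatment, trait_anxiety)
X_train = np.array([[1, 40], [0, 70], [1, 55],
                    [0, 80], [1, 45], [0, 75]], dtype=float)
y_train = np.array([0, 1, 0, 1, 0, 1])

logistic_clf = LogisticRegression()
logistic_clf.fit(X_train, y_train)

# One coefficient per independent variable, plus the intercept
print(logistic_clf.coef_)       # shape (1, 2)
print(logistic_clf.intercept_)  # shape (1,)
```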

Below is the equation for z inside the sigmoid function, with the respective coefficients and intercept.
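With our two independent variables, z is a linear combination of the inputs (here w1, w2, and b denote the fitted coefficients and intercept):

z = b + w1·x1 + w2·x2

where x1 is the anger-treatment indicator and x2 is the trait-anxiety score.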

We calculate z for each observation in the test data using the equation above.

Equivalently, we can obtain the z values we calculated above from the decision_function method of the logistic_clf variable.
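The equivalence can be checked directly; the data below is hypothetical, but the relationship between the manual calculation and decision_function holds for any fitted model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the patient data
X = np.array([[1, 40], [0, 70], [1, 55],
              [0, 80], [1, 45], [0, 75]], dtype=float)
y = np.array([0, 1, 0, 1, 0, 1])
logistic_clf = LogisticRegression().fit(X, y)

x_new = np.array([[1.0, 50.0]])

# z computed by hand from the intercept and coefficients...
z_manual = logistic_clf.intercept_ + x_new @ logistic_clf.coef_.T
# ...matches what decision_function returns
z_builtin = logistic_clf.decision_function(x_new)
print(np.allclose(z_manual.ravel(), z_builtin))  # True
```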

We will define the sigmoid function as below.

We already know that the sigmoid function converts any real value into a value between 0 and 1. We can verify that with our calculated z values (the decision function). For example, we calculated z1 as -1.061, which is not between 0 and 1; below we can see the sigmoid converts it to approximately 0.257.

Alternatively, we can use the predict_proba method:
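With hypothetical data, the positive-class column of predict_proba equals the sigmoid of the decision function:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the patient data
X = np.array([[1, 40], [0, 70], [1, 55],
              [0, 80], [1, 45], [0, 75]], dtype=float)
y = np.array([0, 1, 0, 1, 0, 1])
logistic_clf = LogisticRegression().fit(X, y)

x_new = np.array([[1.0, 50.0]])
proba = logistic_clf.predict_proba(x_new)  # columns: [P(class 0), P(class 1)]

# P(class 1) is exactly sigmoid(z)
z = logistic_clf.decision_function(x_new)
print(np.allclose(proba[:, 1], 1.0 / (1.0 + np.exp(-z))))  # True
```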

Here, we use the sigmoid or logistic function to map predicted values to probabilities. The function maps any real value into a value between 0 and 1; in other words, it turns predictions into probabilities.

**Decision boundary**

What happens after our sigmoid function converts values to probabilities? To map them to one of the two classes, we choose a threshold value, which varies with the use case, and classify the probabilities into the two categories. The common threshold is 0.5.
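The thresholding rule itself is a one-liner:

```python
def classify(probability, threshold=0.5):
    # Probability at or above the threshold -> positive class (1)
    return 1 if probability >= threshold else 0

print(classify(0.7))  # 1
print(classify(0.2))  # 0
```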

**Making Predictions**

Using our knowledge of sigmoid functions and decision boundaries, we can now write a prediction function. A prediction function in logistic regression returns the probability of our observation being positive, True, or “Yes”.

For example, if our threshold was .5 and our prediction function returned .7, we would classify the observation as positive: the model gives a 70% chance of a second heart attack. If our prediction was .2, the model gives only a 20% chance of a second heart attack, so we would classify the observation as negative. As the probability gets closer to 1, our model is more confident that the observation is positive.
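Thresholding the probabilities at 0.5 ourselves reproduces what scikit-learn’s built-in predict does (again using hypothetical data in place of the original sheet):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the patient data
X = np.array([[1, 40], [0, 70], [1, 55],
              [0, 80], [1, 45], [0, 75]], dtype=float)
y = np.array([0, 1, 0, 1, 0, 1])
logistic_clf = LogisticRegression().fit(X, y)

X_new = np.array([[1.0, 50.0], [0.0, 78.0]])
p = logistic_clf.predict_proba(X_new)[:, 1]  # P(second attack)

# Manual threshold at 0.5 vs. the library's predict()
manual = (p >= 0.5).astype(int)
builtin = logistic_clf.predict(X_new)
print(np.array_equal(manual, builtin))  # True
```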