Exploring Neural Network Activation Functions: ReLU, Sigmoid, Tanh, Softmax, Identity

Neural Network Activation Functions

import numpy as np
import tensorflow as tf

# Define activation functions
relu = lambda x: tf.nn.relu(x) # ReLU
sigmoid = lambda x: tf.nn.sigmoid(x) # Sigmoid
tanh = lambda x: tf.nn.tanh(x) # Tanh
softmax = lambda x: tf.nn.softmax(x) # Softmax

# Example usage of the activation functions inside Keras layers (functional API)
input_layer = tf.keras.Input(shape=(784,))                                    # e.g. flattened 28x28 images
hidden1 = tf.keras.layers.Dense(units=512, activation=relu)(input_layer)      # ReLU hidden layer
sigmoid_head = tf.keras.layers.Dense(units=10, activation=sigmoid)(hidden1)   # alternative sigmoid head
tanh_head = tf.keras.layers.Dense(units=10, activation=tanh)(hidden1)         # alternative tanh head
outputs = tf.keras.layers.Dense(units=10, activation=softmax)(hidden1)        # softmax output layer

# Create and compile the model with the selected activation functions
model = tf.keras.Model(inputs=input_layer, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=10)  # requires training data (x_train, y_train), e.g. MNIST

1. ReLU (Rectified Linear Unit)

ReLU is defined as f(x) = max(0, x)

  • Characteristics:
    • Linear for all positive values and zero for all negative values.
    • Cheap to compute, with a simple derivative (1 for x > 0, 0 for x < 0).
    • Helps mitigate the vanishing gradient problem.
  • Use cases:
    • Default choice for many hidden layers in neural networks.
    • Works well in convolutional neural networks.
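
A quick numerical check of the definition above, reusing the tf.nn.relu call from the snippet at the top (the sample values here are arbitrary):

import tensorflow as tf

# ReLU zeroes out negative inputs and passes positive inputs through unchanged
x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])
print(tf.nn.relu(x).numpy())  # [0.  0.  0.  0.5 2. ]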

2. Sigmoid

Sigmoid is defined as f(x) = 1 / (1 + e^(-x))

  • Characteristics:
    • Outputs values between 0 and 1.
    • S-shaped curve, smoothly differentiable.
    • Can cause the vanishing gradient problem for very large positive or negative inputs, where the curve saturates.
  • Use cases:
    • Binary classification problems (output layer).
    • Gates in certain recurrent neural network architectures.
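
A minimal sketch of both points: the squashing into (0, 1) and a sigmoid output layer for binary classification. The 8-feature input and the layer widths are placeholder assumptions:

import tensorflow as tf

# Sigmoid squashes any real input into the (0, 1) range
x = tf.constant([-4.0, 0.0, 4.0])
print(tf.nn.sigmoid(x).numpy())  # approx [0.018 0.5 0.982]

# Hypothetical binary classifier: a single sigmoid unit paired with binary cross-entropy
binary_model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),                      # assumed 8 input features
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # output is the probability of the positive class
])
binary_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])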

3. Tanh (Hyperbolic Tangent)

Tanh is defined as f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

  • Characteristics:
    • Outputs values between -1 and 1.
    • Zero-centered, which helps keep activations centered around zero and can speed up training.
    • Similar to sigmoid but with a steeper derivative.
  • Use cases:
    • Hidden layers in neural networks, especially in recurrent neural networks.
    • Can be preferred over sigmoid in hidden layers due to its zero-centered nature.
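
The zero-centered range is easiest to see next to sigmoid on the same (arbitrary) inputs:

import tensorflow as tf

# Tanh maps inputs into (-1, 1) and is zero-centered: tanh(0) = 0
x = tf.constant([-2.0, 0.0, 2.0])
print(tf.nn.tanh(x).numpy())     # approx [-0.964  0.     0.964]
print(tf.nn.sigmoid(x).numpy())  # approx [ 0.119  0.5    0.881] -- not zero-centered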

4. Softmax

Softmax is defined as f(x_i) = e^(x_i) / Σ(e^(x_j)) for j = 1 to n

  • Characteristics:
    • Converts a vector of real numbers into a probability distribution.
    • Outputs sum to 1, each output is between 0 and 1.
    • Emphasizes the largest values while suppressing those significantly below the maximum.
  • Use cases:
    • Multi-class classification problems (output layer).
    • Useful when you need to represent a categorical distribution.
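
A short sketch of the logits-to-probabilities conversion; the three scores are made up:

import tensorflow as tf

# Softmax turns raw scores (logits) into a probability distribution that sums to 1
logits = tf.constant([2.0, 1.0, 0.1])
probs = tf.nn.softmax(logits)
print(probs.numpy())                # approx [0.659 0.242 0.099]
print(float(tf.reduce_sum(probs)))  # 1.0 (up to floating-point rounding)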

5. Identity

Identity is defined as f(x) = x

  • Characteristics:
    • Simply returns the input value unchanged.
    • Linear activation with a constant derivative of 1.
  • Use cases:
    • Rarely used as an activation function in hidden layers.
    • Sometimes used in the output layer for regression problems.
    • Often used in the final layer of autoencoders.
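
A minimal regression sketch using the identity output; in Keras this is simply a Dense layer with activation='linear'. The 4-feature input and layer width are assumptions for illustration:

import tensorflow as tf

# Regression model whose output layer is the identity (linear) activation: f(x) = x
regression_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                     # assumed 4 input features
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='linear'),  # identity output for a real-valued target
])
regression_model.compile(optimizer='adam', loss='mse')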

Each of these activation functions has its strengths and is suited for different types of layers or problems in neural networks. The choice of activation function can significantly impact the network's ability to learn and its overall performance.

Author: Jason Walsh

j@wal.sh

Last Updated: 2024-10-30 16:43:54