AI 22 September 2023

Self-Organizing Maps For Machine Learning: A Practical Guide

Self-organizing maps (SOMs), also known as Kohonen maps, are a type of artificial neural network used for unsupervised learning tasks such as clustering and dimensionality reduction. SOMs are trained with a competitive learning algorithm in which neurons compete to represent the input data. The result is a two-dimensional grid of neurons, where similar data points map to nearby neurons.

This blog post introduces self-organizing maps (SOMs) and their applications, including fraud detection. It also provides a step-by-step guide to implementing SOMs for fraud detection.

What is a Self-Organizing Map (SOM)?

Imagine you have a map, and you want to organize data points on it in a way that captures their inherent structure and relationships. This is precisely what a Self-Organizing Map (SOM) does in the world of machine learning. SOM was developed by Teuvo Kohonen in the 1980s.

At its core, SOM consists of an array of neurons arranged in a grid-like fashion. Each neuron has an associated weight vector that represents a point in the data space. During training, SOM learns to adapt these weight vectors to the input data in a way that nearby neurons on the map respond to similar input patterns. This self-organization process is what makes SOM so powerful.

How Does SOM Work?

A SOM is built on a two-dimensional grid of neurons. Each neuron in the grid has a weight vector with the same number of dimensions as the input data, and this vector is initialized randomly. The weight vector represents the position of the neuron in the input space.

Step-by-step guide to how a SOM works

Initialization: Each neuron’s weight vector is assigned random values or values drawn from the input data.

Competition and Cooperation: When presented with an input data point, neurons compete to be the best match. The winning neuron (the one with the closest weight vector) and its neighbors are updated to become more similar to the input.

Iterative Learning: This process is repeated for multiple data points, gradually fine-tuning the neurons’ weight vectors.

Over time, the SOM’s neurons learn to represent different features of the input data, which organizes the data into a two-dimensional map.
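The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used later in this post: the function name train_som, the linear decay of the learning rate and radius, and the Gaussian neighborhood are all assumptions chosen for clarity.

```python
import numpy as np

def train_som(data, rows=5, cols=5, n_iter=500, lr0=0.5, sigma0=1.5, seed=0):
    """Train a tiny SOM on data of shape (n_samples, n_features); return the weight grid."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]

    # Initialization: start every neuron at a randomly chosen data point.
    idx = rng.integers(0, len(data), size=rows * cols)
    weights = data[idx].astype(float).reshape(rows, cols, n_features)

    # Grid coordinates of every neuron, used to measure distance on the map.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

    for t in range(n_iter):
        x = data[rng.integers(len(data))]

        # Competition: the best matching unit (BMU) has the closest weight vector.
        dists = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)

        # Assumed schedule: learning rate and radius shrink linearly over time.
        frac = 1.0 - t / n_iter
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 0.5)

        # Cooperation: neurons close to the BMU on the grid are pulled harder.
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=2)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[:, :, None] * (x - weights)

    return weights

rng = np.random.default_rng(42)
data = rng.random((200, 3))      # 200 samples, 3 features, already in [0, 1)
weights = train_som(data)
print(weights.shape)             # (5, 5, 3)
```

Each update moves a weight vector part of the way toward the current sample, so the weights never leave the range spanned by the data.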

Applications of SOMs

SOMs have a wide range of applications in machine learning, including:

Data Clustering: A SOM can be used to cluster similar data points together on a map. This helps identify natural groupings or patterns within complex datasets.
Dimensionality Reduction: A SOM reduces the dimensions of the data while preserving its structure, which is useful for visualization and feature selection.
Data Visualization: A SOM creates 2D and 3D representations of high-dimensional data that make it easier for humans to interpret and comprehend.
Anomaly Detection: A SOM can detect anomalies or outliers by identifying data points that do not conform to established patterns.
Customer Segmentation: In marketing and business, a SOM can be applied to segment customers based on their behavior, which allows for targeted marketing strategies.

Implementation of Fraud Detection Using SOMs

SOMs can be used to implement a fraud detection system by following these steps:

Part 1: Data Preprocessing

Preprocess the data so that it is in a format compatible with the SOM algorithm. The preprocessing consists of the following steps:

1.1 Import Libraries

For data preprocessing we import three libraries:

NumPy, used for multidimensional arrays
Pandas, used for importing the dataset
Matplotlib, used for plotting graphs

# import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

1.2 Import Dataset

To import the dataset, we use the Pandas library to load the Credit Card Application dataset from Kaggle into a DataFrame. We then split the dataset into independent and dependent variables using the iloc indexer and store them in the variables X and y, respectively.

# load the dataset (downloaded from Kaggle)
dataset = pd.read_csv('Credit_Card_Applications.csv')

# independent variables: every column except the last
X = dataset.iloc[:, :-1].values

# dependent variable: the last column
y = dataset.iloc[:, -1].values

1.3 Feature Scaling

Feature scaling is an important preprocessing step because it brings all features onto the same scale. A SOM compares data points by distance, so without scaling the features with the largest numeric ranges would dominate. The MinMaxScaler class from the scikit-learn library is a popular normalization technique that scales each feature so that it falls within the range [0, 1].

from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0,1))
X = sc.fit_transform(X)
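To make the scaling concrete, here is a small hand-written version of min-max scaling, x' = (x - min) / (max - min), applied per column. It is a sketch for illustration only: the helper name min_max_scale and the toy numbers are made up, and in the tutorial itself we keep using MinMaxScaler.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X to [0, 1]; also return the column minima and ranges."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Guard against constant columns, which would otherwise divide by zero.
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range, col_min, col_range

# Toy data (hypothetical): column 0 has large values, column 1 small ones.
X_toy = np.array([[15000.0, 0.2],
                  [15500.0, 0.8],
                  [16000.0, 0.5]])

X_scaled, col_min, col_range = min_max_scale(X_toy)
print(X_scaled)

# Keeping the minima and ranges makes the scaling reversible.
X_back = X_scaled * col_range + col_min
```

Before scaling, distances between rows are driven almost entirely by the first column. After scaling, both columns contribute on the same [0, 1] range.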

Part 2: Train model

Train a SOM on the preprocessed data.

2.1 Import The Model

We use the MiniSom library for the SOM. If the import below fails with an error, install the package first by running the commented-out pip command:

#!pip install minisom
from minisom import MiniSom

2.2 Initialize SOM Model

To initialize a SOM model, we need to specify a few parameters:

Dimensions of the SOM Map: This defines the size of the SOM grid. For example, a 10×10 grid would have 100 nodes.
Input Length: This is the number of features in the input data. We have 15 different attributes available in the dataset, so we set input_len = 15.
Neighborhood Radius: This defines the size of the neighborhood around the best matching unit (BMU) that is updated during training. A larger neighborhood radius will result in smoother updates to the SOM weights. Here we’ll use a radius (sigma) of 1.0.
Learning Rate: This hyperparameter defines how much the SOM weights are updated during each iteration of training. We use 0.5, which is also the default.

som = MiniSom(x=10,y=10,input_len=15, sigma=1.0,learning_rate=0.5)
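To get a feel for how the radius and learning rate interact, the sketch below computes a Gaussian neighborhood on a 10×10 grid. It assumes the common form exp(-d² / (2·sigma²)), where d is the grid distance to the BMU. The function name gaussian_neighborhood is ours, and the code illustrates the idea rather than reproducing MiniSom's internals.

```python
import numpy as np

def gaussian_neighborhood(bmu, shape=(10, 10), sigma=1.0):
    """Return a grid of update strengths centered on the BMU coordinates."""
    rows, cols = np.indices(shape)
    d2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

h = gaussian_neighborhood(bmu=(4, 4), sigma=1.0)
learning_rate = 0.5

# Effective step size for the BMU and the nodes next to it in the same row.
print(np.round(learning_rate * h[4, 3:7], 3))
```

With sigma = 1.0 and a learning rate of 0.5, the BMU takes the full step of 0.5, its direct neighbors a step of about 0.30, and nodes two cells away only about 0.07. A larger sigma spreads each update over more of the map.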

2.3 Initialize The Weight

We initialize the weights of our SOM model with samples picked at random from our data (X).

som.random_weights_init(X)

2.4 Train the model

In this step, we train the SOM model. We pass two parameters to the training function: the input data and the number of iterations. In this example, we choose 100 iterations.

som.train_random(data = X, num_iteration = 100)
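Whether 100 iterations is enough depends on the data. One common check is the quantization error: the average distance between each sample and its best matching unit, which should level off as training converges. The helper below is a hedged NumPy sketch of that measure. It expects a weight grid of shape (rows, cols, n_features), which with MiniSom should be available from som.get_weights(). The toy arrays are made up for illustration.

```python
import numpy as np

def quantization_error(weights, data):
    """Mean distance between each sample and its closest node."""
    flat = weights.reshape(-1, weights.shape[-1])            # (n_nodes, n_features)
    dists = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    return dists.min(axis=1).mean()

# Toy check on a 1x2 map with two samples.
data = np.array([[0.0, 0.0], [1.0, 1.0]])
perfect = data.reshape(1, 2, 2)               # each node sits exactly on a sample
shifted = perfect + np.array([0.3, 0.4])      # each node is 0.5 away from its sample

print(quantization_error(perfect, data))
print(quantization_error(shifted, data))
```

Retraining with a larger num_iteration and comparing the error is a simple way to decide when more training stops paying off.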

Part 3: Visualize The Model

To visualize the trained self-organizing map (SOM), use the following code:

# Import the plotting helpers from pylab
from pylab import bone, pcolor, colorbar, plot, show

# Use the bone colormap
bone()

# Draw the SOM's distance map as a pseudocolor plot
pcolor(som.distance_map().T)

# Add a colorbar to the plot
colorbar()

# Marker shapes and colors, indexed by the class label (0 or 1)
markers = ['o', 's']
colors = ['r', 'g']

# Iterate over the input data
for i, x in enumerate(X):
    # Find the winning node for the current data point
    w = som.winner(x)

    # Plot a marker at the center of the winning node
    plot(w[0] + 0.5,
         w[1] + 0.5,
         markers[y[i]],
         markeredgecolor=colors[y[i]],
         markerfacecolor='None',
         markersize=10,
         markeredgewidth=2)

# Display the plot
show()

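The plot shows the customer data on the Self-Organizing Map (SOM): red circles represent customers who did not get approval and green squares represent customers who did get approval. The background shade of each node comes from the distance map, so the lightest nodes are the ones that differ most from their neighbors. These white nodes are the outliers, and the customers mapped to them have a high potential for fraud.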
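The idea behind the distance map can be sketched directly in NumPy: for each node, average the distance between its weight vector and those of its grid neighbors, then rescale so the largest value is 1. The version below is a simplified illustration that only looks at the four direct neighbors and assumes a grid with at least two distinct nodes. The library's own distance_map() may use a different neighbor set and scaling.

```python
import numpy as np

def mean_neighbor_distance(weights):
    """Mean distance from each node to its up/down/left/right neighbors, scaled to a maximum of 1."""
    rows, cols, _ = weights.shape
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))

    # Distances between vertically adjacent nodes, credited to both nodes of each pair.
    d_vert = np.linalg.norm(weights[1:] - weights[:-1], axis=2)
    total[1:] += d_vert
    total[:-1] += d_vert
    count[1:] += 1
    count[:-1] += 1

    # Distances between horizontally adjacent nodes.
    d_horz = np.linalg.norm(weights[:, 1:] - weights[:, :-1], axis=2)
    total[:, 1:] += d_horz
    total[:, :-1] += d_horz
    count[:, 1:] += 1
    count[:, :-1] += 1

    mean = total / count
    return mean / mean.max()

# Toy 3x3 map: every node sits at the origin except the center one.
weights = np.zeros((3, 3, 2))
weights[1, 1] = [3.0, 4.0]       # distance 5 from each of its four neighbors
u_matrix = mean_neighbor_distance(weights)
print(np.round(u_matrix, 2))
```

In the toy map the center node scores 1.0 (it would be drawn white), its four neighbors score about 0.33, and the corners score 0.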

Part 4: Catch the Potential Fraud

4.1 Map the winning node

To catch the potential fraud, we need to map each winning node to its customers using the given code.

# mapping the winning node
mappings = som.win_map(X)

The code creates a dictionary that maps each winning node to its customers. This dictionary can be used to identify potential fraud by looking for customers that are mapped to winning nodes with high inter-neuron distances (IND). IND is a measure of how different a winning node is from its neighbors. Outlier data points, such as fraudulent customers, are more likely to have higher INDs.
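Conceptually, the dictionary is built by finding the winning node of every sample and grouping samples that share a winner. The sketch below shows that grouping with plain NumPy and a defaultdict. The helper names winner and build_win_map and the tiny 1×2 map are assumptions for illustration, not the library's code.

```python
from collections import defaultdict

import numpy as np

def winner(weights, x):
    """Grid coordinates of the node whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=2)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dists), dists.shape))

def build_win_map(weights, data):
    """Group samples by the coordinates of their winning node."""
    mapping = defaultdict(list)
    for x in data:
        mapping[winner(weights, x)].append(x)
    return mapping

# Toy 1x2 map: one node at (0, 0) in feature space, one at (1, 1).
weights = np.array([[[0.0, 0.0], [1.0, 1.0]]])
data = np.array([[0.1, 0.0], [0.9, 1.0], [0.2, 0.1]])

toy_mappings = build_win_map(weights, data)
for node, samples in sorted(toy_mappings.items()):
    print(node, len(samples))
```

Here two samples land on node (0, 0) and one on node (0, 1). In the fraud example, each key is a node on the 10×10 grid and each value is the list of scaled customer rows assigned to it.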

4.2 Catch The Cheater

Use the given Python code to catch the cheater using SOM.

frauds=np.concatenate((mappings[(7,8)], mappings[(3,1)], mappings[(5,1)]), axis=0)

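The code looks up, in the mappings dictionary, the customers assigned to specific winning nodes. The winning node is the node on the SOM that is most similar to the customer’s data. The code keeps only the customers whose winning nodes are (7,8), (3,1), and (5,1). These coordinates are read off the white cells of the distance map from one particular run. Because initialization and training are random, the outlier nodes will generally differ from run to run, so read the coordinates from your own plot. These customers are flagged as potential fraud customers because their winning nodes are outliers on the SOM.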
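Reading coordinates off the plot by hand is easy to get wrong, and a node with no customers may make the concatenation fail. As an alternative, the outlier nodes can be selected with a threshold on the distance map. The sketch below assumes a distance map scaled to [0, 1] and a dictionary like the one returned by win_map. The function name collect_outliers and the 0.9 threshold are hypothetical choices, and the toy inputs are made up.

```python
import numpy as np

def collect_outliers(distance_map, mappings, threshold=0.9):
    """Return the outlier node coordinates and the samples mapped to them."""
    coords = [(int(r), int(c)) for r, c in np.argwhere(distance_map > threshold)]
    groups = []
    for node in coords:
        samples = mappings.get(node, [])
        if len(samples) > 0:          # skip nodes that won no samples
            groups.append(np.asarray(samples))
    if not groups:
        return coords, np.empty((0, 0))
    return coords, np.concatenate(groups, axis=0)

# Toy inputs: a 2x2 distance map and the samples won by each node.
toy_distance_map = np.array([[0.10, 0.95],
                             [0.30, 0.92]])
toy_mappings = {
    (0, 0): [np.array([0.1, 0.1])],
    (0, 1): [np.array([0.5, 0.5]), np.array([0.6, 0.4])],
    # node (1, 1) is above the threshold but has no samples
}

outlier_nodes, frauds_demo = collect_outliers(toy_distance_map, toy_mappings)
print(outlier_nodes)
print(frauds_demo)
```

With the real model, the inputs would be som.distance_map() and the mappings dictionary from step 4.1. The threshold is a tuning choice: lowering it flags more customers for review.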

4.3 Rescale the value

# Rescale the values of the potential fraud customers
# using the MinMaxScaler (sc) fitted in step 1.3
frauds = sc.inverse_transform(frauds)

This code converts the values of the potential fraud customers back to the original scale. It calls inverse_transform on sc, the MinMaxScaler fitted during feature scaling, which reverses the scaling that was applied before training the SOM.

Part 5: Final Result

# Print the list of potential fraud customers in the original scale
print(frauds)

The final result of catching fraud using SOM is a list of customers who are most likely to be cheating. These are the customers whose winning nodes are outliers on the map, where the winning node for a customer is the node on the SOM that is most similar to the customer’s data.

Conclusion

This guide showed how to detect potential fraud using SOMs. More generally, Self-Organizing Maps (SOMs) are a powerful type of unsupervised machine-learning algorithm that helps to reduce dimensionality, visualize data, and cluster data points. They are particularly useful for analyzing high-dimensional data, as they can project the data into a lower-dimensional space while preserving the underlying relationships between the data points.

If you are working with high-dimensional data or if you need to develop a machine learning algorithm that can learn without labeled data, then SOMs are a good option to consider.

Get a free consultation for AI & ML from CodeTrade, a leading AI & ML Development Company in India. We offer a free consultation to help you understand how to use AI and ML to improve your business. Our team of experienced AI and ML developers can provide you with high-quality AI and ML services tailored to your specific needs.

Author

Chand Prakash

Chand Prakash founded CodeTrade India and continues to lead it as CTO, shaping the technical direction of the company since its early days. He has spent his career solving hard engineering problems and building teams that ship reliable software, with a focus on ERP, e-commerce, and custom enterprise platforms.
