AI and Machine Learning: How to Learn them Visually
I created this tutorial as an entry-level piece on Artificial Intelligence.
Any new subject must be presented in language matching the learner’s level of skill at that time. So don’t expect crazy math formulas just yet.
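In particular, we'll take a look at Machine Learning, and more specifically at Deep Learning, the branch of it that is built on neural networks with several layers.
The depth of a Neural Network is determined by the number of layers between its input and its output.
Machine Learning algorithms estimate how likely it is that a particular set of data matches a specific pattern.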
Thinking In Ranges
Neurons in your brain are definitely not digital, but they resemble binary logic in that they are either in an on or an off state. In software, we use a range of values instead.
The result of a calculation cycle in an AI operation is a confidence estimate in the range 0.0 to 1.0. Ultimately, an output value is produced based on how well the input data matches a specific pattern, with 1.0 being a 100% match. (You rarely reach that, but 0.95 to 0.97 is good.)
This pattern is usually trained before meaningful results can be produced. More on this a bit later in this tutorial. But first, here’s ML at its most basic.
It all begins with neural networks — a software imitation of the physical structure of the neurons in a brain.
Simple Neural Network Structure
In this minimalist example, one input layer consisting of 3 input nodes is shown.
Each layer is usually given multiple inputs. Each input is gathered from some type of source, such as an array of pixels from an image used for face recognition, or any other data. It depends on what you're trying to accomplish with your AI algorithm.
Both input and output values are floating-point numbers between 0.0 and 1.0.
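As a minimal sketch of what preparing such inputs might look like, assume the source is a list of 8-bit grayscale pixels (0 to 255). The helper name normalizePixels is made up for this example:

```javascript
// Hypothetical helper: scale 8-bit grayscale pixels (0-255) into the 0.0-1.0 range.
function normalizePixels(pixels) {
  return pixels.map((p) => p / 255);
}

const inputs = normalizePixels([0, 51, 255]);
console.log(inputs); // [ 0, 0.2, 1 ]
```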
During network operation the data is fed from left to right. However, back-propagation is what is typically used to train the Neural Network. That's when we travel the network in reverse to adjust its weights. But for now we don't need to concern ourselves with that.
Sum
The sum for a node is just what it sounds like. Each value coming from a node in the previous layer is multiplied by the weight of its connection, and the results are added together. After the sum is calculated, it is passed into the Activation Function for processing.
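Here is a rough sketch of that sum in JavaScript. The function name weightedSum and the optional bias term are assumptions added for illustration; the numbers are arbitrary:

```javascript
// Weighted sum for one node: each incoming value is multiplied by the weight
// of its connection, and the products are added up (plus an optional bias).
function weightedSum(inputs, weights, bias = 0) {
  if (inputs.length !== weights.length) {
    throw new Error("inputs and weights must have the same length");
  }
  let sum = bias;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sum;
}

// Three input nodes feeding one node in the next layer.
const sum = weightedSum([1.0, 0.5, 0.0], [0.5, 0.5, 0.25]);
console.log(sum); // 0.75
```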
Activation Function
The Activation Function converts the sum of input values into an output value.
But how exactly does it work?
We need to take a look at another aspect of Machine Learning.
Remember those math equations from high-school? Parabolas — anyone?
An Activation Function is literally just a math equation. So for those with a math background this might be a bit easier to grasp. If not, read on to the visual diagrams and the rest of this tutorial so it starts to sink in!
The reason we can't use simple linear equations is their limitations.
They are not sufficient for creating useful neural networks: a chain of linear equations is itself just one linear equation, so adding more layers would add nothing.
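To see that limitation in action, here is a tiny sketch with two made-up linear "layers". Chaining them gives exactly the same results as a single linear equation:

```javascript
// Two "layers" that only use linear equations of the form f(x) = a * x + b.
const layer1 = (x) => 2 * x + 1;
const layer2 = (x) => 3 * x - 4;

// Chained together: 3 * (2x + 1) - 4 = 6x - 1, which is still one straight line.
const stacked = (x) => layer2(layer1(x));
const single = (x) => 6 * x - 1;

for (const x of [-2, 0, 0.5, 10]) {
  console.log(x, stacked(x), single(x)); // the last two columns always match
}
```

No matter how many such layers you chain, the result never gets more expressive than one line, which is why a non-linear function is needed in between.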
Neural Networks are designed around more complex equations. For example, the Sigmoid (also known as Logistic) function is quite common. (We'll take a look at a few different ones in the section below.)
They all take on the form of f(x) = … and then crunch the x value in a way unique to that function. Why this matters and why we have different activation functions will become more apparent a bit later.
What happens once we get our result?
The AF passes the calculated value on to the next layer, where it becomes one of the inputs to the activation function of each node there.
You can think of it as taking a set of multiple inputs and passing the calculated value on to the next node. It's the value gateway between layers.
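Putting the sum and the activation function together, a single node might be sketched like this. The Sigmoid function is used as the AF here; the name nodeOutput, the bias value and all the weights are made up for illustration:

```javascript
// Sigmoid (logistic) activation: squashes any number into the range 0.0-1.0.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// One node: weighted sum of the incoming values, then the activation function.
function nodeOutput(inputs, weights, bias) {
  let sum = bias;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sigmoid(sum);
}

// The output of one node becomes one of the inputs of a node further right.
const first = nodeOutput([0.2, 0.9, 0.4], [0.5, -0.3, 0.8], 0.1);
const next = nodeOutput([first], [1.5], -0.5);
console.log(first, next); // both values fall between 0.0 and 1.0
```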
Different Types Of Activation Functions
Just like there are different types of math equations…there are different types of activation functions.
Exactly how they crunch numbers to arrive at the final output value is tightly related to training an existing network first. So we can’t go that deep into the subject just yet, because overall, the system is not based on something as simple as calculating and returning a numeric result.
But what we can do — to deepen our understanding, thus far — is take a look at the visual representation of each mathematical equation behind different activation functions!
This is a visual tutorial. And to give you a basic idea of what you'll be grappling with, here is a table of the classic math equations that many Activation Functions are based on.
The most basic AF is represented by f(x)=x or the Identity Function.
There are several others. But they are a bit more complex.
Essentially these functions are used to determine the resulting node value.
How Exactly Does An Activation Function Determine Its Value?
Well, that's what an AF is. It takes an input in the form of a number and produces a return value, usually between 0.0 and 1.0 (for some functions the range is unbounded, from -infinity to +infinity). The actual formulas are described above. You can re-write these equations as functions in Python, JavaScript or any other programming language.
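For example, here is how a handful of widely used activation functions could be written in JavaScript. This particular selection is my own pick for illustration and may not match the table one for one:

```javascript
const identity = (x) => x;                      // f(x) = x, range -infinity..+infinity
const binaryStep = (x) => (x >= 0 ? 1 : 0);     // either 0 or 1
const sigmoid = (x) => 1 / (1 + Math.exp(-x));  // range 0.0..1.0
const tanh = (x) => Math.tanh(x);               // range -1.0..1.0
const relu = (x) => Math.max(0, x);             // range 0..+infinity

console.log(identity(2));    // 2
console.log(binaryStep(-1)); // 0
console.log(sigmoid(0));     // 0.5
console.log(tanh(0));        // 0
console.log(relu(-3));       // 0
```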
If you are into math and have a lot of time on your hands you will love writing out these functions in code! But often you don’t have to. And that’s because already existing A.I. libraries take care of that for you. This way you can focus on building your Neural Network and training it for a specific purpose.
Each Node Carries A Calculated Weight
So these Activation Functions produce a value.
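The most important thing to notice at this point is that each node carries that calculated value, which this tutorial calls its weight. (Most ML texts call a node's value its activation and reserve the word weight for the strength of a connection between two nodes.)
This value measures the likelihood that a certain pattern was matched.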
But multiple layers are possible, as shown in the next example.
Every single node communicates with every single node in the next layer, making up this cross-connected communication highway.
The number of items in each layer is arbitrary. It doesn't have to match the diagram above; it depends on which problem you're trying to solve.
It will take some intuition and creativity to determine the number of nodes you want to use in each layer. Even the same problem can be solved by different neural network structures.
Because the calculations are non-linear, there is no single correct structure.
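A fully connected layer like that could be sketched as follows. The name denseLayer, the Sigmoid activation and the weight values are assumptions for illustration. Notice that the two layers have different sizes (3 nodes feeding 2 nodes):

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// weights[j][i] is the weight of the connection from input node i to node j,
// so every input node is connected to every node of the next layer.
function denseLayer(inputs, weights, biases) {
  return weights.map((row, j) => {
    let sum = biases[j];
    for (let i = 0; i < inputs.length; i++) {
      sum += row[i] * inputs[i];
    }
    return sigmoid(sum);
  });
}

// 3 nodes on the left, 2 nodes on the right: 3 * 2 = 6 connections.
const out = denseLayer(
  [0.1, 0.7, 0.3],
  [
    [0.2, -0.4, 0.6],
    [0.9, 0.1, -0.5],
  ],
  [0, 0]
);
console.log(out.length); // 2
```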
Hidden Layers
We've just discussed how a Neural Network can have multiple layers. They can be thought of as vertical columns of nodes.
All of the inner layers between the input layer and the output node are often referred to as hidden layers. That makes sense because this is where most of the gritty AI processing work is done. Basically it's the AI mystery box.
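To peek inside the mystery box, here is a small sketch of a forward pass that keeps the values of every layer instead of only returning the final output. The network layout (3 inputs, 2 hidden nodes, 1 output), the forward function and all the numbers are untrained placeholders made up for this example:

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Each layer has one row of weights and one bias per node.
const network = [
  { weights: [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], biases: [0.0, 0.1] }, // hidden layer, 2 nodes
  { weights: [[1.2, -0.7]], biases: [-0.1] },                             // output layer, 1 node
];

function forward(network, inputs) {
  const layers = [inputs];
  for (const { weights, biases } of network) {
    const previous = layers[layers.length - 1];
    layers.push(
      weights.map((row, j) =>
        sigmoid(row.reduce((sum, w, i) => sum + w * previous[i], biases[j]))
      )
    );
  }
  return layers; // [input values, hidden values, output values]
}

const [input, hidden, output] = forward(network, [0.9, 0.1, 0.4]);
console.log("hidden:", hidden); // intermediate values that a user normally never sees
console.log("output:", output); // an array holding a single value between 0.0 and 1.0
```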
Different Types of Neural Network Patterns
At times ML may seem a lot like crafting a network pattern to match patterns.
Neural networks come in different shapes and forms.
Different types of neural network structures are better suited to particular types of problems.
OK — But How Do We Write The Code?
That was a lot of theory.
But how do we actually implement it in code?
You can use a library like TensorFlow.js to get started.
But that won’t do any good because there is still so much to cover.
OK — But How Does It Produce Meaningful Results?
We’ve discussed the structure of a neural network up to this point.
We talked about activation functions, data inputs and hidden layers.
We also talked about weights passed back and forth across the simulated connections.
In order for a non-linear Machine Learning algorithm to produce any sensible outcome, it first needs to be trained on a set of pre-existing data.
You always start by choosing data to train your AI algorithm.
That depends on what problem you’re trying to solve.
If you want to recognize numbers in an image you start with images of digits.
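Digit images are too big for a short example, so here is a deliberately tiny stand-in: one Sigmoid node trained on the logical OR of two inputs. The data set, the learning rate and the number of passes are all assumptions picked for illustration, and this is plain gradient descent on a single node rather than a full network:

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Toy training data: the logical OR of two inputs.
const samples = [
  { inputs: [0, 0], target: 0 },
  { inputs: [0, 1], target: 1 },
  { inputs: [1, 0], target: 1 },
  { inputs: [1, 1], target: 1 },
];

// Start untrained: all weights and the bias at zero.
const weights = [0, 0];
let bias = 0;
const learningRate = 0.5;

const predict = (inputs) =>
  sigmoid(inputs[0] * weights[0] + inputs[1] * weights[1] + bias);

// Show the data many times, nudging the weights to shrink the error each time.
for (let epoch = 0; epoch < 2000; epoch++) {
  for (const { inputs, target } of samples) {
    const error = predict(inputs) - target; // how far off the node currently is
    weights[0] -= learningRate * error * inputs[0];
    weights[1] -= learningRate * error * inputs[1];
    bias -= learningRate * error;
  }
}

for (const { inputs, target } of samples) {
  console.log(inputs, "target:", target, "output:", predict(inputs).toFixed(3));
}
```

Before training, every input produces 0.5. After training, the outputs sit close to the targets but never land exactly on 0.0 or 1.0.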
Recognizing Numbers From A Screenshot
The classic AI example is to teach a neural network to recognize the digits 0 to 9. In the same way, you can train an algorithm to recognize the letters A to Z or even parts of a human face: an eye or a mouth in a photograph also represents a particular type of shape or pattern that is common to all humans but might appear slightly different.
Remember, all we are dealing with here is patterns.
When the algorithm recognizes a pattern it is never a 100% match. But the closer we can get to 1.0 (100%) the more likely the shape we’re looking for represents what it was trained to recognize.
If we used a standard font, we wouldn't even have to do any AI work. We could simply scan each digit for an exact pixel pattern. But the key point of AI is to recognize a pattern even when it is obscured.
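The difference between an exact scan and a tolerant one can be sketched with two tiny 3 x 3 bitmaps. The function names are made up, and counting matching pixels is only a stand-in for what a trained network does:

```javascript
// A reference glyph for "1" and a slightly sloppy drawing of it (1 = ink, 0 = blank).
const template = [0, 1, 0,
                  0, 1, 0,
                  0, 1, 0];
const drawing  = [0, 1, 0,
                  0, 1, 0,
                  1, 1, 0];

// Exact scan: every single pixel has to agree.
const exactMatch = (a, b) => a.every((pixel, i) => pixel === b[i]);

// Tolerant scan: what fraction of the pixels agree?
const similarity = (a, b) =>
  a.filter((pixel, i) => pixel === b[i]).length / a.length;

console.log(exactMatch(template, drawing));            // false
console.log(similarity(template, drawing).toFixed(2)); // "0.89"
```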
First, we need to have some type of a medium which will be used as a piece of training data. Each digit can be represented by an image:
You can easily recognize each digit by sight. But an AI algorithm needs to be trained to recognize similar patterns because while they are similar they are still not 100% identical.
In order to achieve this we can break down the primary pattern into smaller blocks and implement something referred to as feature extraction.
Feature Extraction
To identify a digit, the algorithm implements a feature extraction system which breaks down common patterns into the components needed to construct the complete digit, symbol, letter and so on.
The essence of a pattern remains the same. For example 0 is mostly a circle — you can break it down into smaller patterns with an arch on each of the sides:
If we can train our algorithm to recognize these 4 unique patterns and check for their presence within a localized area of an image, we can calculate the certainty with which it can be said that the image might be a zero.
It’s the same for other digits. Digit 1 for example is a single vertical bar. Or perhaps with a smaller line at a slight angle at the top.
Number 2 is half a circle on top, a diagonal line and a horizontal line.
Number 3 can be broken into two semi-arch patterns.
Number 4 can be thought of as 3 lines: vertical, horizontal and diagonal.
…and so on.
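Here is a hand-made sketch of that idea for the digit 0, using a 6 x 6 image split into four corners. The four arch patterns are typed in by hand and the names are hypothetical; in a real network such features are learned during training rather than written out like this:

```javascript
// Hand-written 3x3 "arch" patterns, one per corner of a zero (1 = ink, 0 = blank).
const arches = {
  topLeft:     [[0, 1, 1], [1, 1, 0], [1, 0, 0]],
  topRight:    [[1, 1, 0], [0, 1, 1], [0, 0, 1]],
  bottomLeft:  [[1, 0, 0], [1, 1, 0], [0, 1, 1]],
  bottomRight: [[0, 0, 1], [0, 1, 1], [1, 1, 0]],
};

// Fraction of pixels in the 3x3 area at (top, left) that agree with the pattern.
function matchScore(image, pattern, top, left) {
  let matches = 0;
  for (let y = 0; y < 3; y++) {
    for (let x = 0; x < 3; x++) {
      if (image[top + y][left + x] === pattern[y][x]) matches++;
    }
  }
  return matches / 9;
}

// Certainty that a 6x6 image shows a zero: the average score of the four corners.
function zeroCertainty(image) {
  return (
    matchScore(image, arches.topLeft, 0, 0) +
    matchScore(image, arches.topRight, 0, 3) +
    matchScore(image, arches.bottomLeft, 3, 0) +
    matchScore(image, arches.bottomRight, 3, 3)
  ) / 4;
}

const sloppyZero = [
  [0, 1, 1, 1, 1, 0],
  [1, 1, 0, 0, 1, 1],
  [1, 0, 0, 0, 0, 1],
  [1, 0, 0, 0, 0, 0], // one pixel is missing on the right edge
  [1, 1, 0, 0, 1, 1],
  [0, 1, 1, 1, 1, 0],
];
console.log(zeroCertainty(sloppyZero).toFixed(2)); // "0.97"
```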
What if it’s a hand-written digit? It still has the same properties of that digit: the same edges, the same loops.
What if the digit appears on a speed limit sign out on the street, photographed from an indirect angle? Much like our own vision, AI should be able to tolerate some amount of error.
Try out this AI JavaScript demo that allows you to draw something on the screen and have the pre-trained algorithm tell you what you just drew.
The algorithm will try to give you the best match even if what you draw isn't really a number. Still, you can see artificial intelligence at work trying to provide the closest approximation it can muster.
What Does The Trained Set Look Like?
Here is a snippet of the trained data used by the algorithm. It's just a list of weights stored in a very long array (thousands of values):
// The neural network's weights (unit-unit weights, and unit biases)
// training was done in Matlab with the MNIST dataset.
// this data is for a 784-200-10 unit, with logistic non-linearity
// in the hidden and softmax in the output layer. The input is a
// [-1;1] gray level image, background == 1, 28x28 pixels linearized
// in column order (i.e. column1(:); column2(:); ...) i-th output
// being the maximum means the network thinks the input encodes
// (i-1) the weights below showed a 1.92% error rate on the test
// data set (9808/10000 digits recognized correctly).
let w12 = [[-0.00718674, 0.00941102, -0.0310175, -0.00121102, -0.00978546, -4.65943e-05, 0.0150367, 0.0101846, 0.0482145, 0.00291535, -0.00172736, 0.0234746, 0.0416268, 0.0315077, -0.00252011, 0.0163985, 0.00853601, 0.00836308, 0.00692898, 0.0215552, 0.0540464, 0.0393167, 0.0668207, 0.0232665, 0.031598, 0.0143047, 0.0156885, -0.0269579, -0.00777022, 0.0397823, -0.00825727, 0.0212889, -0.00755215, 0.0353843, 0.0297246, ...
/* ... Thousands weights more follow ... */
The complete source code wouldn’t fit into this article. But the sets are usually pretty long even for what seems to be trivial tests.
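To get a feel for how long, here is a quick count based on the 784-200-10 layout mentioned in the comments. It assumes one weight per connection between neighboring layers and one bias per hidden and output unit; the function name is made up:

```javascript
// Layer sizes from the comments: 784 inputs (28x28 pixels), 200 hidden units, 10 outputs.
const layerSizes = [784, 200, 10];

function countParameters(sizes) {
  let total = 0;
  for (let i = 1; i < sizes.length; i++) {
    total += sizes[i - 1] * sizes[i]; // one weight per connection between neighboring layers
    total += sizes[i];                // one bias per unit
  }
  return total;
}

console.log(countParameters(layerSizes)); // 159010
```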
Painting Image Input Into Neural Net
This bit of code was taken from the recognize() function written in JavaScript.
It was taken from the demo at http://myselph.de
You can check out the entire source code here.
// for visualization/debugging: paint the input to the neural net.
if (document.getElementById('preprocessing').checked == true)
{
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.drawImage(copyCtx.canvas, 0, 0);
  // each of the 28x28 network inputs is drawn as a 10x10 pixel block
  for (var y = 0; y < 28; y++) {
    for (var x = 0; x < 28; x++) {
      var block = ctx.getImageData(x * 10, y * 10, 10, 10);
      // map the input value from [-1, 1] to a gray level from 255 down to 0
      var newVal = 255 * (0.5 - nnInput[x*28+y]/2);
      // 4 bytes per pixel (RGBA): same gray for R, G and B, fully opaque alpha
      for (var i = 0; i < 4 * 10 * 10; i+=4) {
        block.data[i] = newVal;
        block.data[i+1] = newVal;
        block.data[i+2] = newVal;
        block.data[i+3] = 255;
      }
      ctx.putImageData(block, x * 10, y * 10);
    }
  }
}
This partial piece of code paints the image input (a freehand drawing) back onto the canvas. The drawing was previously reduced to a 28 x 28 grid of values, each one storing the average grayscale of a 10 x 10 pixel area of the image, and each value is drawn as a 10 x 10 block.
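The averaging step itself is not part of the snippet, so here is a rough sketch of how it could work. This is not the demo's actual preprocessing code; the function name downsample and the small 4 x 4 test image are assumptions:

```javascript
// Average a square grayscale image (flat, row-major array) down to coarse blocks.
// The result is stored in column order, out[x * blocksPerSide + y], mirroring
// the nnInput[x*28+y] indexing used in the demo code.
function downsample(pixels, size, blockSize) {
  const blocksPerSide = size / blockSize;
  const out = new Array(blocksPerSide * blocksPerSide).fill(0);
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      const bx = Math.floor(x / blockSize);
      const by = Math.floor(y / blockSize);
      out[bx * blocksPerSide + by] += pixels[y * size + x] / (blockSize * blockSize);
    }
  }
  return out;
}

// A 4x4 image reduced to 2x2 blocks. For a 280x280 drawing you would call
// downsample(pixels, 280, 10) to get the 28x28 grid.
const image = [
    0,   0, 255, 255,
    0,   0, 255, 255,
    0, 255,   0,   0,
  255,   0,   0,   0,
];
console.log(downsample(image, 4, 2)); // [ 0, 127.5, 255, 0 ]
```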
The network then checks the input against the trained set and, after crunching the weighted sums, returns the likelihood that your HTML canvas drawing matches each particular digit.
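The comments in the weights file mention softmax in the output layer. Here is a minimal sketch of that last step, turning ten raw output sums into likelihoods and picking the best digit. The raw values are made up for illustration:

```javascript
// Softmax: turn raw output sums into likelihoods that add up to 1.0.
function softmax(values) {
  const max = Math.max(...values); // subtracting the max keeps Math.exp from overflowing
  const exps = values.map((v) => Math.exp(v - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Made-up raw outputs for the digits 0-9 (array index = digit).
const raw = [0.1, 0.3, 4.2, 0.5, 0.2, 0.1, 0.0, 1.1, 0.4, 0.3];
const likelihoods = softmax(raw);

// The digit with the highest likelihood is the network's answer.
const digit = likelihoods.indexOf(Math.max(...likelihoods));
console.log(digit, likelihoods[digit].toFixed(2));
```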
Final Words
Artificial Intelligence is a vast subject. There are different types of machine learning patterns and tutorials coming out each day. This tutorial should serve only as an introduction for someone who is just starting out!