From Basic Gates to Deep Neural Networks: The Definitive Perceptron Tutorial | by Joseph Robinson, Ph.D. | Apr, 2023

By Jessie Hobb On Apr 28, 2023

Towards Mastering AI

Mathematics, binary classification, logic gates, and more

TL;DR

The world of perceptrons is fascinating. Perceptron models are the building blocks of modern artificial intelligence. This blog post makes a long story short. Let us learn the story of the perceptron (i.e., from neural network to multilayer perceptron and beyond). We will dig into the simple mathematics that drives the model deployable as a binary classifier and a simulated computer transistor, multiplier, and even logic gate. Let us look at how the perceptron model paved the way for more advanced classifiers, such as logistic regression, SVM, and deep learning. Sample code snippets and illustrations are provided throughout to enhance our understanding. Furthermore, we will examine the perceptron model using practical use cases to learn how and where it should be used.

Whether you are a self-taught data scientist, an AI practitioner, or a seasoned professional proficient in ML, there is likely something in here for you! Let us dive deep and look wide at the model that was there when AI was in its infancy and is still here today. Let us look at how and in what ways the perceptron works, walk through its history, build models and gates, compare it to other models, and forecast where we’ll go from here.

1.1 A Brief History of the Perceptron Model

Warren McCulloch and Walter Pitts’ work on artificial neurons in 1943 [1] inspired a psychologist named Frank Rosenblatt to make the perceptron model in 1957 [2]. Rosenblatt’s perceptron was the first neural network (NN) to be described with an algorithm, paving the way for modern techniques for machine learning (ML). Upon its discovery, the perceptron got much attention from scientists and the general public. Some saw this new technology as essential for intelligent machines—a model for learning and changing [3].

However, the perceptron’s popularity did not persist. Then, in 1969, Marvin Minsky and Seymour Papert published their book, “Perceptrons,” which highlighted the limitations of the perceptron model while revealing that it could not solve problems like the XOR classification [4] (Section 3). This work triggered a significant loss of interest in NNs, turning their attention to other methods. The early years of the perceptron are listed in Fig. 1.

**Fig 1.** Significant milestones in the history of the perceptron (1943–1982). Figure created by the author.

It took over a decade, but the 1980s saw interest in NNs rekindle. Many thanks, in part, for introducing multilayer NN training via the back-propagation algorithm by Rumelhart, Hinton, and Williams [5] (Section 5).

In 2012, using the prior development and the advancements in computing power (i.e., GP-GPUs), big data, non-linear activations (i.e., RELU), and dropout, the largest convolutional neural networks trained to date were produced. ImageNet provided the large labeled dataset needed to fill its capacity.

**Fig 2.** Significant milestones in the history of the perceptron (1985–1997). Figure created by the author.

Out came the rise of today’s frenzy for deep learning. Hence, the perceptron model plays a pivotal role in their foundation. Figs. 2 and 3 list the remaining milestones (continuation of Fig. 1).

**Fig 3.** Significant milestones in the history of the perceptron (2006–2018). Figure created by the author.

1.2. The Importance of the Perceptron Model in Machine Learning

Despite its limitations, the perceptron model remains an essential building block in ML. It is a fundamental part of artificial neural networks, which are now used in many different ways, from recognizing images to figuring out what people say.

The simplicity of the perceptron model makes it a great place to start for people new to machine learning. It makes linear classification and learning from data easy to understand. Also, the perceptron algorithm can be easily changed to create more complex models, such as multilayer perceptrons (MLP) and support vector machines (SVMs), which can be used in more situations and get around many of the problems with the original perceptron model.

In the following sections, we’ll cover the math behind the perceptron model; how it can be used as a binary classifier and to make logic gates, and how it can be used to do multiplication tasks like a computer’s transistors. We’ll also talk about the differences between the perceptron model and logistic regression and show how the perceptron model can be used in new and exciting ways.

2.1. Linear Separability

At its core, the perceptron model is a linear classifier. It aims to find a “hyperplane” (a line in two-dimensional space, a plane in three-dimensional space, or a higher-dimensional analog) separating two data classes. For a dataset to be linearly separable, a hyperplane must correctly sort all data points [6].

Mathematically, a perceptron model can be represented as follows:

y = f(w * x + b).

xis the input vector;w is the weight vector;b is the bias term; and f is the activation function. In the case of a perceptron, the activation function is a step function that maps the output to either 1 or 0, representing the two classes (Fig. 4).

**Fig. 4.** Depiction of the unit step function, with the piece-wise conditions for mapping outputs to 0 or 1. Figure created by the author.

A perceptron model can be extended to have multiple features in inputx, which are defined as follows:

y = f(w_1 * x_1 + w_1 * x_1 ... w_n * x_n + b).

The above equation, along with the step function for its output, is activated (i.e., turned off via 0 or on via 1), as depicted in the following figure, Fig. 5.

**Fig. 5.** Multi-variant linear classification. Note that the weighted sum is passed through the activation, the step function mentioned above—source link.

2.2. The Perceptron Learning Algorithm

The perceptron learning algorithm is a way to keep the weights and biases up-to-date to reduce classification errors [2]. The algorithm can be summarized as follows:

Initialize the weights and the bias to small random values.
For each input-output pair(x, d), compute the predicted outputy = f(w * x + b).
Update the weights and bias based on the errore = d - y:

w = w + η * e * x

b = b + η * e,

whereη is the learning rate, a small positive constant that controls the step size of the updates.

4. Repeat steps 2 and 3 for a fixed number of iterations or until the error converges.

We can use Python and Sklearn to implement the steps above quickly:

import numpy as np
from sklearn.linear_model import PerceptronX = np.array([2, 3], [1, 4], [4, 1], [3, 2])
y = np.array([1, 1, 0, 0])
perceptron = Perceptron()
perceptron.fit(X, y)

Then, using the fitted model, we can predict as follows:

new_data_point = np.array([[1, 2]])
prediction = perceptron.predict(new_data_point)
print(prediction)

The perceptron learning algorithm guarantees convergence if the data is linearly separable [7].

**Fig. 6.** Boolean classification, where the classes are linearly separable. Image created by the author.

2.3. The Perceptron Convergence Theorem

Rosenblatt proved the perceptron convergence theorem in 1960. It says that if a dataset can be separated linearly, the perceptron learning algorithm will find a solution in a finite number of steps [8]. The theorem says that, given enough time, the perceptron model will find the best weights and biases to classify all data points in a linearly separable dataset.

But if the dataset isn’t linearly separable, the perceptron learning algorithm might not find a suitable solution or converge. Because of this, researchers have developed more complex algorithms, like multilayer perceptrons and support vector machines, that can deal with data that doesn’t separate in a straight line [9].

3.1. Linear Classification

As previously mentioned, the perceptron model is a linear classifier. It makes a decision boundary, a feature-space line separating the two classes [6]. When a new data point is added, the perceptron model sorts it based on where it falls on the decision boundary. The perceptron is fast and easy to use because it is simple, but it can only solve problems with data that can be separated linearly.

3.2. Limitations of the Perceptron Model

One big problem with the perceptron model is that it can’t deal with data that doesn’t separate in a straight line. The XOR problem is an example of how some datasets are impossible to divide by a single hyperplane, which prevents the perceptron from finding a solution [4]. Researchers have developed more advanced methods to get around this problem, such as multilayer perceptrons, which have more than one layer of neurons and can learn to make decisions that don’t follow a straight line [5].

The perceptron model is also sensitive to setting the learning rate and initial weights. For example, if the learning rate is too low, convergence might be slow, whereas a large learning rate may cause oscillations or divergence. In the same way, the choice of initial weights can affect how fast the solution converges and how it turns out [10].

3.3. Multi-class Classification with the Perceptron Model

Even though the basic perceptron model is made for two-class problems, it can solve problems with more than two classes by training multiple perceptron classifiers, one for each category [11]. The most common approach is one-vs-all (OvA), in which a separate perceptron is trained to distinguish classes. Then, when classifying a new data point, the perceptron with the highest output is chosen as the predicted class.

Another approach is the one-versus-one (OvO) method, in which a perceptron is trained for each pair of classes. The final classification decision is made using a voting scheme, where each perceptron casts a vote for its predicted class, and the type with the most votes is selected. While OvO requires training more classifiers than OvA, each perceptron only needs to handle a smaller subset of the data, which can benefit large datasets or problems with high computational complexity.

4.1. How Perceptrons Can Be Used to Generate Logic Gates

Perceptron models can be used to represent logic gates, which are the most basic building blocks of digital circuits. By appropriately adjusting the weights and biases of a perceptron, it can be trained to perform logical operations such as AND, OR, and NOT [12]. This link between perceptrons and logic gates shows that neural networks can do computation and have the potential to simulate complex systems.

**Fig. 6.** Linearly separable logic gates: **AND** and OR (left and middle, respectively). On the other hand, **XOR** cannot be separated by a single linear classifier (right) but can be with a two-layer network (more on this later)—figure created by the author.

4.2. Example: Implementing a NAND Gate Using a Perceptron

A NAND gate is a fundamental logic gate that produces an output of 0 only when both inputs are 1, resulting in 1 in all other cases. The truth table for a NAND gate is as follows:

NAND Gate Truth Table. Table created by the author.

To implement a NAND gate using a perceptron, we can either manually set the weights and biases or train the perceptron using the perceptron learning algorithm. Here’s a possible configuration of weights and bias:

w1 = -1;

w2 = -1;

b = 1.5.

With these parameters, the perceptron can be represented as:

y = f((-1 * A) + (-1 * B) + 1.5).

**Fig. 6.** Training data, graphical depiction, and linear function for an AND gate. Figure created by the author.

Here, f is the step function, and A and B are the inputs. If you test this setup with values from the truth table, you’ll get the right results from a NAND gate:

**Fig. 7.** Truth table for the logic NAND, along with the output of the perceptron trained above. Created by the author.

In Python, the NAND can be implemented as follows:

def nand_gate(x1, x2):
w1, w2, b = -1, -1, 1.5
return int(w1 * x1 + w2 * x2 + b > 0)binary_inputs = [(0,0), (0,1), (1,0), (1,1)]
for A, B in binary_inputs:
print(f"({A}, {B}) --> {nand_gate(A, B)}")

As expected, reproducing the table summarizing the NAND gate above:

(0, 0) --> 1
(0, 1) --> 1
(1, 0) --> 1
(1, 1) --> 0

A NAND gate can be used to build all other gates because it is functionally complete, meaning that any other logic function can be derived using just NAND gates. Here’s a brief explanation of how to create some of the basic gates using NAND gates:

NOT gate: Connect both inputs of the NAND gate to the input value.
AND gate: First, create a NAND gate and then pass the output through a NOT gate.
OR gate: Apply a NOT gate to each input before feeding them into a NAND gate.

To create a NAND gate that accepts an arbitrary number of inputs, you can use Python to define a function that takes a list of inputs and returns the NAND output. Here’s a code snippet demonstrating this:

def nand_gate(inputs):
assert len(inputs) > 1, "At least two inputs are required."# Helper function to create a 2-input AND gate
def and_gate (x1, x2):
w1, w2, b = 1, 1, -1.5
return int(w1 * x1 + w2 * x2 + b > 0)
# Reduce the inputs to a single NAND output using the helper function
result = and_gate(inputs[0], inputs[1])
for i in range (2, len (inputs)):
result = and_gate(result, inputs[i])
return 0 if result > 0 else 1
# Example usage
inputs = [(0, 0, 0, 0),
(0, 0, 0, 1),
(0, 0, 1, 0),
(0, 0, 1, 1),
(0, 1, 0, 0),
(0, 1, 0, 1),
(0, 1, 1, 1),
(1, 0, 0, 0),
(1, 0, 0, 1),
(1, 0, 1, 0),
(1, 0, 1, 1),
(1, 1, 0, 0),
(1, 1, 0, 1),
(1, 1, 1, 0),
(1, 1, 1, 1)]
for A0, A1, A2, and A3 inputs:
output = nand_gate((A0, A1, A2, A3))
print(f"({A0}, {A1}, {A2}, {A3}) --> {output}")

This function uses a helper function (i.e., and_gate) to make a NAND gate with two or more inputs. The AND operation is then repeated on the given inputs. The final result is the output of the NAND gate, with an arbitrary number of input bits, which is the negated value of the AND gates.

4.3. Extending to Other Logic Gates: AND, OR, XOR

Similarly, perceptrons can model other logic gates, such as AND, OR, and NOT. For example, an AND gate can be represented by a perceptron with weights w1 = 1, w2 = 1, andb = -1.5.

def and_gate(x1, x2):
w1, w2, b = 1, 1, -1.5
return int(w1 * x1 + w2 * x2 + b > 0)binary_inputs = [(0,0), (0,1), (1,0), (1,1)]
for A, B in binary_inputs:
print(f"({A}, {B}) --> {and_gate(A, B)}")

Again, outputs mimic those of the intended AND gate.

(0, 0) --> 0
(0, 1) --> 0
(1, 0) --> 0
(1, 1) --> 1

However, a single perceptron cannot model the XOR gate, which is not linearly separable. Instead, a multi-layer perceptron or a combination of perceptrons must be used to solve the XOR problem [5].

5.1. Analogies Between Perceptrons and Transistors

Transistors are the basic building blocks of electronic devices. They are in charge of simple tasks like adding and multiplying. Interestingly, perceptrons can also be viewed as computational units that exhibit similar functionality. For example, perceptrons are used in machine learning and artificial neurons. Conversely, transistors are physical parts that change how electrical signals flow [13]. Still, as the last section showed, both systems can model and carry out logical operations.

5.2. Performing Multiplication with Perceptrons

We can leverage their capabilities for binary operations to perform multiplication using perceptrons. For example, let’s consider the expansion of two binary digits (i.e., A and B), which can be represented as a simple AND gate. As demonstrated in Section 4, an AND gate can be modeled using a perceptron.

But for more complicated multiplication tasks involving binary numbers with more than two bits, we need to add more parts, like half and full adders, which require a combination of logic gates [14]. Using perceptrons to build these parts makes making an artificial neural network that can perform binary multiplications possible.

For example, suppose we want to multiply two 2-bit binary numbers, A1A0 and B1B0. Then, we can break down multiplication into a series of AND operations and additions:

Compute the partial products: P00 = A0 * B0, P01 = A0 * B1, P10 = A1 * B0, and P11 = A1 * B1.
Add the partial products using half and full adders, resulting in a 4-bit binary product.

Each AND operation and addition can be done with perceptrons or groups of perceptrons that represent the logic gates needed.

Using the AND gate function, we set up in the last section, we can do the following in Python to implement perceptron-based multiplication:

A1A0 = [1, 0]
B1B0 = [1, 1]P00 = and_gate(A1A0[1], B1B0[1])
P01 = and_gate(A1A0[1], B1B0[0])
P10 = and_gate(A1A0[0], B1B0[1])
P11 = and_gate(A1A0[0], B1B0[0])
# Implement a simple adder using perceptron-based logic gates
result = [P00, P01 ^ P10, (P01 & P10) ^ P11, P11]
print(result)

5.3. The Future of Perceptrons and Hardware Implementation

Even though perceptrons can act like transistors and perform basic math operations, their hardware implementation is less efficient than traditional transistors. But recent improvements in neuromorphic computing have shown that it might be possible to make hardware that acts like neural networks, like perceptrons [15]. These neuromorphic chips could help machine learning tasks use less energy and open the door to new ways of thinking about computers.

6.1. Similarities Between Perceptron and Logistic Regression

Both the perceptron model and logistic regression are linear classifiers that can be used to solve binary classification problems. They both rely on finding a decision boundary (a hyperplane) that separates the classes in the feature space [6]. Moreover, they can be extended to handle multi-class classification problems through techniques like one-vs-all and one-vs-one [11].

Let’s take a look at the differences in Python implementation:

from sklearn.linear_model import LogisticRegressionlog_reg = LogisticRegression()
log_reg.fit(X, y)
new_data_point = np.array([[1, 2]])
prob_prediction = log_reg.predict_proba(new_data_point)
print(prob_prediction)

import numpy as np
from sklearn.linear_model import Perceptron, LogisticRegression# Dataset
X = np.array([[2, 3], [1, 4], [4, 1], [3, 2]])
y = np.array([1, 1, 0, 0])
# Train Perceptron
perceptron = Perceptron()
perceptron.fit(X, y)
# Train Logistic Regression
log_reg = LogisticRegression()
log_reg.fit(X, y)
# New data point
new_data_point = np.array([[1, 2]])
# Perceptron prediction
perc_prediction = perceptron.predict(new_data_point)
print("Perceptron prediction:", perc_prediction)
# Logistic Regression prediction
log_reg_prediction = log_reg.predict(new_data_point)
print("Logistic Regression prediction:", log_reg_prediction)
# Logistic Regression probability prediction
prob_prediction = log_reg.predict_proba(new_data_point)
print("Logistic Regression probability prediction:", prob_prediction)

This outputs:

Perceptron prediction: [1]
Logistic Regression prediction: [1]
Logistic Regression probability prediction: [[0.33610873 0.66389127]]

6.2. Differences Between Perceptron and Logistic Regression

Even though the perceptron model and logistic regression have some similarities, there are some essential differences between the two:

Activation function: The perceptron model uses a step function as its activation function, while logistic regression uses the logistic (sigmoid) function [10]. This difference results in a perceptron having a binary output (0 or 1). At the same time, logistic regression produces a probability value (between 0 and 1) representing the likelihood of an instance belonging to a particular class.
Loss function: The perceptron learning algorithm minimizes the misclassification errors, whereas logistic regression minimizes the log-likelihood or cross-entropy loss [16]. This distinction makes logistic regression more robust to noise and outliers in the dataset, as it considers the magnitude of the errors rather than just the number of misclassified instances.
Convergence: The perceptron learning algorithm can converge if the data is linearly separable but may fail to converge otherwise [7]. Logistic regression, on the other hand, employs gradient-based optimization techniques like gradient descent or Newton-Raphson, which are guaranteed to reach a global optimum for convex loss functions like the log-likelihood [17].
Non-linearly separable data: While the perceptron model struggles with non-linearly separable data, logistic regression can be extended to handle non-linear decision boundaries by incorporating higher-order polynomial features or using kernel methods [18].

6.3. Choosing Between Perceptron and Logistic Regression

The perceptron model and logistic regression choices depend on the problem and dataset. Logistic regression is more reliable and can deal with a broader range of problems because it is based on probabilities and can model non-linear decision boundaries. But the perceptron model may be easier to use and use less computing power in some situations, especially when dealing with data that can be separated linearly.

7.1. Optical Character Recognition (OCR)

The perceptron model has been used in optical character recognition (OCR) tasks, where the goal is to recognize and turn printed or handwritten text into machine-encoded text [19]. A perceptron or other machine learning algorithm is often used for OCR tasks to preprocess the image that will be read, pull out features from it, and classify them. The perceptron model is a good choice for OCR tasks with characters that can be separated in a straight line because it is easy to use and works well with computers.

7.2. Music Genre Classification

Perceptrons can also be used for music genre classification, which involves identifying the genre of a given audio track. A perceptron model can be trained to classify audio into already-set genres [20]. This is done by taking relevant parts of audio signals, such as spectral or temporal features, and putting them together. Even though more advanced methods like deep learning and convolutional neural networks often give better results, the perceptron model can work well, especially when only a few genres or features can be separated linearly.

7.3. Intrusion Detection Systems

Intrusion detection systems, or IDS, are used in cybersecurity to look for bad behavior or unauthorized access to computer networks. IDS can use perceptrons as classifiers by looking at packet size, protocol type, and network traffic connection length to determine if the activity is regular or malicious [21]. Support vector machines and deep learning may better detect things, but the perceptron model can be used for simple IDS tasks or as a comparison point.

7.4. Sentiment Analysis

Perceptrons can be applied to sentiment analysis, a natural language processing task determining the sentiment (e.g., positive, negative, or neutral) expressed in text. By turning text into numerical feature vectors like term frequency-inverse document frequency (TF-IDF) representations [22], a perceptron model can be taught to classify text based on its tone. More advanced techniques like recurrent neural networks or transformers have since surpassed perceptrons in sentiment analysis performance. However, perceptrons can still be an introduction to text classification or a simpler alternative for specific use cases.

8.1. The Evolution of Perceptrons to Multi-Layer Perceptrons (MLPs)

The perceptron model has been able to solve problems with clear decision lines, but it needs help with tasks that need clear decision lines. The introduction of multi-layer perceptrons (MLPs), consisting of multiple layers of perceptron-like units, marked a significant advancement in artificial neural networks [5]. MLPs can approximate any continuous function, given a sufficient number of hidden layers and neurons [23]. By employing the backpropagation algorithm, MLPs can be trained to solve more complex tasks, such as the XOR problem, which is not solvable by a single perceptron.

8.2. Deep Learning and Perceptron’s Legacy

The perceptron model laid the foundation for deep learning, a subfield of machine learning focused on neural networks with multiple layers (deep neural networks). The perceptron model was the basis for deep learning techniques like convolutional neural networks (CNNs) and recurrent neural networks (RNNs), which have reached state-of-the-art performance in tasks like image classification, natural language processing, and speech recognition [24].

In CNNs, the idea of weighted input signals and activation functions from perceptrons is carried over to the convolutional layers. To learn about spatial hierarchies in the data, these layers apply filters to the input regions near them. In the same way, RNNs build on the perceptron model by adding recurrent connections. This lets the network learn temporal dependencies in sequential data [25].

Deep learning versus other models: Google trend over time. Image created by the author following Carrie Fowle’s TDS Medium blog (link).

8.3. The Future of Perceptrons and Deep Learning

While fundamental, more sophisticated deep learning techniques have primarily eclipsed the perceptron model. But it is still valuable for machine learning because it is a simple but effective way to teach the basics of neural networks and get ideas for making more complicated models. As deep learning keeps improving, the perceptron model’s core ideas and principles will likely stay the same and influence the design of new architectures and algorithms.

This blog comprehensively explores the perceptron model, its mathematics, binary classification, and logic gate generation applications. By understanding these fundamentals, we have unlocked the potential to harness the perceptron’s power in various neat applications and even construct more advanced models like multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs).

We also compared perceptrons and logistic regression, highlighting the differences and similarities by examining the role of a perceptron as a foundation for more advanced techniques in ML. We extended this upon setting perceptron’s role in artificial intelligence, historical significance, and ongoing influence.

Let us remember that ‌perceptron is just one piece of the puzzle. Countless other models and techniques, either discovered or waiting to be, each with unique strengths and applications. Nonetheless, with a solid foundation provided by this tutorial, you are well-equipped to tackle the challenges and opportunities in your journey through artificial intelligence.

I hope this blog is engaging, informative, and inspiring, and I encourage you to continue learning and experimenting with the perceptron model and beyond. Embrace your newfound knowledge, and let your creativity and curiosity guide you toward the exciting world of AI and machine learning. Please share your thoughts and comments below!

[1] McCulloch, W.S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity Bulletin of Mathematical Biophysics, 5, 115–133.

[2] Rosenblatt, F. (1958). The perceptron is a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408.

[3] The New York Times (1958, July 8). A New Navy Device Learns by Doing The New York Times

[4] Minsky, M., & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry, MIT Press.

[5] Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors Nature, 323 (6088), 533–536.

[6] Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification (2nd ed.). Wiley.

[7] Novikoff, A. B. (1962), on convergence proofs for perceptrons Symposium on the Mathematical Theory of Automata, 12, 615–622.

[8] Rosenblatt, F. (1960). The perceptron: A theory of statistical separability in cognitive systems (Project PARA Report 60–3777). Cornell Aeronautical Laboratory

[9] Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

[10] Bishop, C. M. (2006). Pattern Recognition and Machine Learning, Springer,

[11] Rifkin, R., & Klautau, A. (2004). In defense of the one-vs-all classification Journal of Machine Learning Research, 5, 101–141.

[12] Minsky, M. L. (1961). Steps toward artificial intelligence. Proceedings of the IRE, 49(1), 8–30.

[13] Horowitz, P., & Hill, W. (1989). The Art of Electronics (2nd ed.). Cambridge University Press

[14] Mano, M. M., & Ciletti, M. D. (2007). Digital Design (4th ed.). Prentice Hall.

[15] Merolla, P. A., Arthur, J. V., Alvarez-Icaza, R., Cassidy, A. S., Sawada, J., Akopyan, F.,… & Modha, D. S. (2014). A million spike-neuron integrated circuits with a scalable communication network and interface Science, 345 (6197), 668–673.

[16] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer.

[17] Nocedal, J., & Wright, S. (2006). Numerical Optimization (2nd ed.). Springer.

[18] Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press,

[19] LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation was applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.

[20] Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals IEEE Transactions on Speech and Audio Processing, 10(5), 293–302.

[21] Garcia-Teodoro, P., Diaz-Verdejo, J., Maciá-Fernández, G., & Vázquez, E. (2009). Anomaly-based network intrusion detection: techniques, systems, and challenges Computers & Security, 28 (1–2), 18–28.

[22] Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, 10, 79–86.

[23] Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366

[24] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521 (7553), 436–444.

[25] Hochreiter, S., & Schmidhuber, J. (1997). long-term memory. Neural Computation, 9(8), 1735–1780.

Want to Connect? Follow Dr. Robinson on LinkedIn, Twitter, Facebook, and Instagram. Visit my homepage for papers, blogs, email signups, and more!