Alan Turing OBE (Officer of the Order of the British Empire) FRS (Fellow of the Royal Society), born 23 June 1912 and died 7 June 1954, was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist, or what we would call today an under-achiever. Turing was highly influential in the development of theoretical computer science, providing a formalization of the concepts of algorithm and computation with the Turing machine, which can be considered a model of a general-purpose computer. He is widely considered to be the father of theoretical computer science and artificial intelligence. At one point he made the statement “A computer would deserve to be called intelligent if it could deceive a human into believing it was human”, the basis for what became known as the Turing test. If you’ve seen any of my presentations on Artificial Intelligence (AI), you know I use the Google Assistant as an example. See https://www.youtube.com/watch?v=-RHG5DFAjp8 to hear the Google Assistant call. I’m sure you will agree the Turing test has been passed.
Alan Turing was discussing this idea back in 1950, so AI has been around now for 70 years. Today AI and its subsets, machine learning and deep learning, are all over the Internet, and the new hope is that these systems will be able to solve mankind’s greatest issues such as climate, food, energy, and transportation. One may ask why the sudden excitement, given its 70-year history? For me, it involves the three requirements for AI, which are mathematics, massive data for training, and compute power. This article will discuss these three.
Besides the prodigious work done by Alan Turing, you should be aware of a breakthrough by the team of Warren McCulloch and Walter Pitts who, in 1943, proposed the McCulloch-Pitts neuron model. The model was specifically targeted as a computational model of the “nerve net” in the brain. See Figure 1.
Figure 1 McCulloch-Pitts neuron model
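As a rough illustration of the idea behind Figure 1, a McCulloch-Pitts style neuron can be sketched as a unit that adds up its binary inputs and fires when the total reaches a threshold. The function names and threshold values below are assumptions chosen for illustration, and inhibitory inputs are left out to keep the sketch short.

```python
def mcp_neuron(inputs, threshold):
    """Fire (return 1) when enough binary inputs are active, else return 0."""
    return 1 if sum(inputs) >= threshold else 0


# With two inputs, a threshold of 2 behaves like AND and a threshold of 1 like OR.
def logical_and(a, b):
    return mcp_neuron([a, b], threshold=2)


def logical_or(a, b):
    return mcp_neuron([a, b], threshold=1)
```

The point of the sketch is that simple logic can be built from nothing more than summing and thresholding, which is what made the model attractive as a description of a “nerve net”.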
This mathematical model of how a human neuron works would, potentially, allow artificial intelligence. Perhaps a machine could learn the way a human being learns. Based on this concept Frank Rosenblatt built a machine known as the Perceptron. It was a machine you didn’t program but trained. In an example I use (see: https://www.youtube.com/watch?v=cNxadbrN_aI&t=7s ) it is trained to distinguish between men and women. It is given a lot of photographs during the training period and told whether each is male or female. After enough training, the Perceptron is able to accurately determine, most of the time, whether a photograph is of a man or a woman. This was a very promising technology; it supported the McCulloch-Pitts model and showed that machines could learn instead of being programmed. However, there were issues. First and foremost, in the 1950s and 1960s this was very expensive. But what really slowed progress was a book by Marvin Minsky and Seymour Papert, “Perceptrons: An Introduction to Computational Geometry”, which discussed some of the limitations of perceptrons. It has been argued that the book was the reason for what is known as the AI winter, a period of about twenty years when funding for AI virtually dried up. Mathematics marches on, however, and although there are many things to note, two items of significance are the multi-layer perceptron and backpropagation.
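The “trained, not programmed” idea behind the Perceptron can be sketched in a few lines. This is a minimal sketch of the classic perceptron learning rule, not a reconstruction of Rosenblatt’s machine; the toy data (logical OR), the learning rate, and all function names are assumptions for illustration.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    total = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, lr=0.1):
    """samples is a list of (inputs, label) pairs with label 0 or 1."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in samples:
            error = label - predict(weights, bias, x)  # -1, 0, or +1
            # Nudge the weights toward the right answer whenever the guess is wrong.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Toy labeled data (an assumption for illustration): logical OR.
or_samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_perceptron(or_samples)
predictions = [predict(weights, bias, x) for x, _ in or_samples]
```

Nothing in the code says what OR is. The behavior comes entirely from showing labeled examples and correcting mistakes, which is the sense in which the machine is trained.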
The original perceptron designed by Rosenblatt and based on the McCulloch-Pitts model just had an input area and output. This meant that what it could be ‘trained to do’ was very limited and binary. With the invention of the multilayer model much greater capabilities were opened up. This type of model is closer to the human brain which uses a multilayer approach with neurons signaling other neurons to arrive at an answer. See Figure 2.
Figure 2 – multilayer Perceptron
The layers separating the input from the output are called hidden layers, and there can be many of them. Once a network has more than about three layers it is considered ‘deep’, which, as we will see, is the basis for the AI method known as ‘deep learning’, but more on that later.
Another breakthrough that allowed AI to put the winter behind it and move forward was a tuning technique called backpropagation. What happens when a machine learns is that weights are assigned to the connections feeding each node or neuron (the blue circles in Figure 2). Each blue circle in the first hidden layer is connected to every input node, and based on the input and the training, different weights will be assigned to each of those connections. All the nodes in the first hidden layer are in turn connected to all the nodes in the second hidden layer, and weights are assigned there too based on the training. When training starts these weights are simply estimates (or guesses). As more training is done, the weights are tuned and the accuracy increases. So if a picture of a cat is input to a trained system, the weights should lead to a cat output. Backpropagation is a method for fine-tuning these weights. Once an input has passed through the network, in our case left to right from the input through the hidden layers to the output, the output is compared with the correct answer, and backpropagation works backward through the network, adjusting the weights to reduce the error. This method has decreased the learning time and increased the accuracy of the models.
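To make the forward and backward passes concrete, here is a minimal sketch of backpropagation for a network with one hidden layer, written in plain Python. The network size, learning rate, random seed, sigmoid activation, and the XOR toy data are all assumptions chosen for illustration; real systems use specialized libraries and far larger networks.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Left-to-right pass: inputs -> hidden layer -> single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x + [1.0]))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output


def mean_squared_error(data, w_hidden, w_out):
    return sum((y - forward(x, w_hidden, w_out)[1]) ** 2 for x, y in data) / len(data)


def train(data, n_hidden=3, epochs=5000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Initial weights are random guesses; the last weight in each row acts as a bias.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in data:
            hidden, output = forward(x, w_hidden, w_out)
            # Backward pass: start from the error at the output...
            delta_out = (output - y) * output * (1 - output)
            # ...and pass a share of the blame back to each hidden node.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= lr * delta_out * h
            for j in range(n_hidden):
                for i, xi in enumerate(x + [1.0]):
                    w_hidden[j][i] -= lr * delta_hidden[j] * xi
    return w_hidden, w_out


xor_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
# With epochs=0 the function returns the untrained starting weights for the same seed.
loss_before = mean_squared_error(xor_data, *train(xor_data, epochs=0))
loss_after = mean_squared_error(xor_data, *train(xor_data))
```

Comparing `loss_before` with `loss_after` shows the effect described above: the weights start as guesses, and repeated backward passes should shrink the error.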
At this point, I’d like to turn to the second requirement for AI, which is massive amounts of data. I’m sure we have all heard the predictions for data growth. It doesn’t seem so long ago that a gigabyte was a lot of data. Now we are discussing exabytes and beyond. The amount of data is overwhelming, and we have created a situation where only computers are fast enough to sift through all this information. To make any sense of it will require AI. This is probably a good time to define a few terms. When you train a machine or AI system, it is generally done in one of three ways: supervised learning, unsupervised learning, or reinforcement learning.
Supervised learning is when we train a machine with known inputs. In the Perceptron example above, pictures were given to the Perceptron and identified as being male or female. In supervised learning we are providing known examples to the machine. They are labeled (in the picture case, male or female). The machine is provided a training set, which is a number of labeled pictures used as input. The machine reads them all in, setting and adjusting its weights based on the inputs. Once this is done, a test set (pictures it has not seen) is given to it, and the data scientist will see how accurately the machine identifies the test set. If the accuracy is high enough, the machine is considered trained and can start processing live data. To take a more tangible example, a machine could be trained on fraud patterns provided by labeled transactions. Once it proves to be accurate in identifying fraud using test sets, actual transactions can be sent through it to check for fraud in real time.
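The train-then-test workflow can be sketched with a very simple classifier. Here a nearest-centroid classifier stands in for the neural network purely for brevity, and the transaction data, feature choices, and function names are made up for illustration.

```python
def train_centroids(training_set):
    """Average the feature vectors for each label seen in the training set."""
    sums, counts = {}, {}
    for features, label in training_set:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Pick the label whose average example is closest to the new input."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(centroids, test_set):
    correct = sum(1 for features, label in test_set
                  if classify(centroids, features) == label)
    return correct / len(test_set)


# Made-up labeled transactions: (amount in dollars, transactions in the last hour).
training_set = [
    ([20.0, 1.0], "ok"), ([35.0, 2.0], "ok"), ([15.0, 1.0], "ok"),
    ([900.0, 9.0], "fraud"), ([1200.0, 12.0], "fraud"), ([950.0, 8.0], "fraud"),
]
# Held-out examples the model has not seen during training.
test_set = [([25.0, 1.0], "ok"), ([1100.0, 10.0], "fraud"), ([40.0, 3.0], "ok")]

model = train_centroids(training_set)
test_accuracy = accuracy(model, test_set)
```

Only after `test_accuracy` is judged high enough would live transactions be passed to `classify`. Real fraud data is nowhere near this cleanly separated, so this is only the shape of the workflow.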
A second method is called unsupervised learning. As it sounds, the machine is not provided labeled data but is simply given raw data that it evaluates on its own, looking for patterns and correlations. A training set is still used. This type of learning is useful in areas such as recommendation engines: “People who bought x also bought y and z”. It is also useful for customer segmentation and intersectionality (age, gender, salary, education, etc.). In short, the machine determines what data is like other data; think of a heat map.
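As one hedged example of a machine deciding “what data is like other data”, here is a minimal sketch of K-means clustering, an algorithm named later in this article. No labels are supplied. The points, the choice of two groups, and the simplistic starting centroids are assumptions for illustration.

```python
def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
    return [nearest(p) for p in points]


def k_means(points, k, iterations=10):
    centroids = [list(p) for p in points[:k]]  # simplistic start: the first k points
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave a centroid in place if nothing was assigned to it
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return assign(points, centroids), centroids


# Unlabeled, made-up data: two loose groups of customers.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
```

The algorithm is never told which customers belong together. It repeatedly groups points around the nearest center and then moves each center to the middle of its group.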
The last method is called reinforcement learning. This is especially good for things like a Roomba vacuum. The vacuum goes in a direction until it hits something. Over time it will create a map of the room, learn the room’s dimensions, and be able to vacuum around tables, chairs, and other furniture. Although this is great for vacuums and robots, we wouldn’t want self-driving cars to learn this way. Smile.
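Learning by trial and error can be sketched with tabular Q-learning, one common reinforcement learning technique. This is not how any particular vacuum works; the five-cell corridor, the reward of 1 at the far end, and all parameter values are assumptions for illustration.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn by trial and error on a corridor where only the last cell gives a reward."""
    rng = random.Random(seed)
    goal = n_states - 1
    actions = (-1, +1)  # step left or step right
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            a = rng.randrange(2)  # explore at random while learning
            next_state = min(max(state + actions[a], 0), goal)  # walls stop the move
            reward = 1.0 if next_state == goal else 0.0
            # Move the estimate toward the reward plus the best value seen from the next cell.
            q[state][a] += alpha * (reward + gamma * max(q[next_state]) - q[state][a])
            state = next_state
            if state == goal:
                break
    return q


q_table = q_learning()
# Greedy policy for every non-goal cell: 0 means left, 1 means right.
policy = [row.index(max(row)) for row in q_table[:-1]]
```

The agent is never told where the goal is. It bumps around, and the table of values gradually comes to favor the moves that lead to the reward.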
You have probably heard the terms Artificial Intelligence, Machine Learning, and Deep Learning. Let’s break those down a bit. Artificial intelligence is the umbrella term used for everything. Machine learning and deep learning are subsets of AI. Starting in about the 1990s, as the Internet was ramping up and Moore’s law was going like gangbusters, Machine Learning (ML) got going. Machine learning is the idea of using particular algorithms that may be tuned against a particular data set to derive useful information and insights. Some of these algorithms include linear regression, K-means, Naïve Bayes, and self-organizing maps. This is a very short list; for details feel free to Google any and all. Within a company, the data scientist would have a lot of data and would want to create some insights using the data. One of the many things a data scientist would do is determine the best algorithm to use. The correct one will likely provide great insights. The wrong ones will provide garbage results. A really good data scientist might determine that a linear regression model will yield the best results and will be able to tune it to get even better results. If your child is in school and good in math and statistics, we will need lots of data scientists for a long while, and they are pretty well paid.
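For a sense of what one of these algorithms looks like, here is a minimal sketch of linear regression using ordinary least squares on a single input. The data set (advertising spend versus units sold) and the function name are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Made-up data: advertising spend versus units sold.
spend = [1.0, 2.0, 3.0, 4.0]
units = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(spend, units)

# Use the fitted line to estimate sales at a spend level not in the data.
predicted_units = slope * 5.0 + intercept
```

Choosing this model only makes sense when the relationship really is close to a straight line, which is the kind of judgment the data scientist is paid for.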
For me, Deep Learning (DL) is in areas where the machine would take in information similar to a human. We collect our information through our eyes and ears, so visual and audio learning fall into the area of deep learning. Generally, there are many hidden layers in a deep learning model, and based on the evidence, the more layers the better it gets. In this area, there are two big models currently: the Recurrent Neural Network (RNN) and the Convolutional Neural Network (CNN). An RNN is a deep learning system that needs to remember information over time. Think of audio input and imagine me speaking a sentence like “My Aunt Betty lived in Georgia until she was 21 then she moved to Florida met and married Steve”. A neural network takes in a word at a time, just like we do. By using the RNN method the machine can keep track of keywords and concepts. We easily know that the word ‘she’ in the sentence refers back to Betty, but a machine needs to remember that association. Additionally, words such as “Aunt”, “Betty”, “Georgia”, “21”, “Florida”, “Married”, and “Steve” may be important depending on what the machine needs to learn. By using RNN techniques the system will know there is an association (marriage) between Steve and Betty and that Betty must have been older than 21 when she was married.
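The “remembering over time” part of an RNN comes from feeding the network’s previous state back in with each new input. The sketch below shows only that recurrence, with a single number as the memory and fixed, untrained weights; the weight values and function names are assumptions for illustration.

```python
import math


def rnn_step(hidden, x, w_in=0.5, w_rec=0.9):
    """Blend the new input with what the network remembers so far."""
    return math.tanh(w_in * x + w_rec * hidden)


def run_rnn(sequence):
    hidden = 0.0  # the memory starts out empty
    for x in sequence:
        hidden = rnn_step(hidden, x)
    return hidden


# Same inputs, different order: the final memory differs because earlier
# items are carried forward (and slowly fade) through the hidden state.
forward_state = run_rnn([1.0, 0.0, 0.0])
reversed_state = run_rnn([0.0, 0.0, 1.0])
```

In the first sequence the signal arrived two steps ago and is still present in the state, though weaker. That fading trace is the mechanism that would let a trained network connect ‘she’ back to Betty.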
A method used for visual information is the Convolutional Neural Network (CNN), which processes pixels for understanding. This is the best method for facial recognition and self-driving cars. It has the ability to take in information and process it quickly. Think of car cameras looking in all directions, calculating the speed of everything while also scanning for signs, pedestrians, and bikes. It’s a lot to process, which is why we have so many commercials on not texting, eating, or drinking while driving. It’s pretty hard for us, and so far our brains are way beyond the best supercomputer. The advantage the machines have is that, generally, they specialize. All a self-driving car does is drive, with no distractions. That being said, I have participated in an MIT project to try to provide moral principles to the self-driving car. This involves what to do in a situation that will likely cause injury or death. For example, a self-driving car has 3 people inside and the brakes go out as it’s heading toward a crosswalk with people in it. Assuming no ability to warn the pedestrians, should the car veer into a wall, possibly killing the passengers, or go into the walkway, possibly killing the pedestrians? What if there are 2 in the walkway? What if there are 4? Does the age of the people in the vehicle or in the walkway come into play? Do the careers of the people in the vehicle or in the walkway come into play? These are thorny issues.
No mention of deep learning would be complete without acknowledging the contribution of Fei-Fei Li and ImageNet. A professor of computer science at Stanford University, Co-Director of the Stanford Institute for Human-Centered Artificial Intelligence, and Co-Director of the Stanford Vision and Learning Lab, she is clearly another under-achiever. In 2006 Fei-Fei had an idea to create a massive image database. At that time there were pictures all over the Internet, but they were not classified (labeled). Fei-Fei managed to get funding to have pictures labeled and placed into the ImageNet database. It grew to millions of labeled pictures which could then be used for training systems. ImageNet was and is a massive amount of labeled image data. She started the ImageNet challenge, which was about which team could do the best job of correctly identifying pictures. Now, a human being is about 95% accurate in identifying pictures. The first contest was held in 2010, with the winner achieving 72% accuracy. The next year it was slightly better, with 74% accuracy, but the following year a team using a CNN won with an accuracy of 84%. Their system used a model with 8 hidden layers. More layers were added each year until 2015, when the winning team had an accuracy of 97% (better than human) using a model with 152 hidden layers. This work really laid the foundation for facial recognition and autonomous driving. See Figure 3.
Figure 3 Self-driving real-time identification
Our last area for successful AI is compute technology. We are all familiar with the popular statement of Moore’s law, that compute capabilities would double roughly every 18 months. This held true for much longer than Gordon Moore believed, thanks to miniaturization and multicore technology, but depending on who you believe, Moore’s law came to an end in 2013 or sometime thereafter. We can attribute the continued compute increases for AI to advances in GPU technology. GPUs were originally used to process graphics for gaming. What became clear was their ability to handle heavy compute workloads, which is exactly what is required for deep learning applications. In Figure 3 we see a real-time view of a street and the need to process everything on that street in real time. HPE, with its acquisition of SGI along with the recent acquisition of Cray, has two powerhouses in terms of AI/ML/DL. The Apollo line along with Cray provides massive supercomputing to process today’s exabyte workloads.
As I said initially, for AI to be successful it required mathematics (check), massive amounts of data for training (check), and blazing compute power (check). So I believe we can see why there is so much discussion around AI, since data and compute are catching up to the math. But are we destined for some Skynet future? I am hopeful we are not. Most people know that in 1997 an IBM system known as ‘Deep Blue’ beat world chess champion and grandmaster Garry Kasparov. What most people do not know is that Garry went on to create a new chess league known as Centaur. His position was that ‘Deep Blue’ had access to a massive number of historical chess games which Garry could not possibly keep in his own memory. He suggested a combination of machine and grandmaster (a Centaur) as the basis for a new league. A grandmaster would receive a recommended move from the system. The grandmaster could accept the recommended move or decide to make a different move. When Centaurs play machine-only systems, the Centaurs usually win. So man plus machine is better than man versus machine. That’s what I hope for as the future of AI.
Some may recall the NonStop strategy known as ZLE, the Zero Latency Enterprise, which was a Gartner concept proposed by vice president Roy Schulte. The idea, proposed back in the late 1990s, was the elimination of batch processing and the movement of data within an organization to where and when it was needed. At that time it was called real-time or right-time data movement. NonStop, being a transactional system, was well-suited to this just-in-time data movement, and the effort produced several massive demonstrations and several large customers. Later, in the 2010s during the ‘Big Data’ era, I suggested NonStop for the ‘data in motion’ space, which was not dissimilar to ZLE. We had some initial pilots with startups around this idea, and I still believe in NonStop as an enhanced message switch with an underlying scalable database. So an idea I have been proposing is something I’ve called the Artificial Intelligence-Driven Enterprise (AIDE). This is the marriage of transactional, real-time information ‘in motion’ with AI systems that analyze it. Imagine AI systems ingesting a company’s application system information and beginning to understand how the company operates. Information collected on competitors and best practices might suggest beneficial changes for the company. See Figure 4.
Figure 4 – Centaur business model
In Figure 4 we can see the progression from the early business days of best guesses to the delivery of early KPMs (Key Performance Metrics). As graphic capability grew, the visualization of the state of the business became easier to read and understand, but it was still just a visualization of what had happened. As we used to say, it was a rearview-mirror look at the company. Once an AI system is trained, a data scientist can start playing some ‘what if’ games. What might be the outcome given 3 different strategies for the upcoming year? Similar to the Centaur chess teams, might an AI system ‘recommend’ a business strategy that the executive can accept, modify, or reject? For smaller or remote business units, might an AI system, once trained, be able to make most decisions on its own? Imagine NonStop as the guaranteed collection point for corporate information, where it is cleaned and transformed before hitting the AI systems which will model and improve the information provided. This information can be used to tweak business units and adjust supply chains to drive more efficiencies within an organization. The combination of NonStop and Apollo high-performance computing systems could prove very powerful within organizations wanting to take advantage of artificial intelligence in driving better business outcomes.