Topic: General history: Art of AI and Automation
Author: Andrew Liew weida
Hi, my name is Andrew. What I want to do in the next half an hour is to explain the technology shift happening in the world, and especially in Singapore. Singapore has recently been investing in initiatives to become an AI tech hub.
In a short time, mobile and cloud have become the two dominant platform shifts, creating tons of opportunities for global society and for individual countries. Now, as we speak, AI is dominating the agenda of all the tech giants, the most important companies in and outside Silicon Valley. Look at Sundar Pichai from Google and Mark Zuckerberg from Facebook. These CEOs are saying that artificial intelligence is at the heart of what they are doing and that AI is guiding their R&D agenda.
Is this something so new that we haven’t even heard of it?
It turns out that humans have long been dreaming about making machines that behave like humans or exhibit human intelligence. Ancient Chinese civilizations tried to build mechanical devices that behaved like automatons or autonomous tools. In science fiction, we have Frankenstein, the recent Star Wars movies and so on. All of these feed the human impulse that maybe we can create something that behaves like humans.
It only seems new because the public doesn’t understand what AI is. So how did AI get started?
OK, so what is AI? Artificial intelligence is basically what we call it when machines or applications exhibit or mimic human behavior or human intelligence.
So if AI behaves like the human mind, then how should we think about AI?
Like a servant, our human mind can help us and help each other. Or our human mind can destroy us when we give in to evil thoughts and to careless decisions that we did not think about deeply and slowly.
This is echoed by Christian Lous Lange, a Norwegian historian and political scientist. Lange shared the Nobel Peace Prize with Hjalmar Branting in 1921. He believed that technology is a useful servant but a dangerous master.
A famous Ghanaian proverb says that a slave does not choose his master. Because of that, we need to learn how to master AI. So how can we master AI? If we don’t collectively take the responsibility to share this knowledge and discuss it, what would happen?
Henry Wilson Allen, the author of over 50 novels, a five-time winner of the Spur Award from the Western Writers of America and a recipient of the Levi Strauss Award for lifetime achievement, believed that knowledge should be shared so that humanity can better control its destiny. That’s why he said this: Keep it a secret and it’s your slave. Tell it and it’s your master. In the same fashion, I’m here to explain the history of AI and what AI is as simply as I can, so that hopefully we can be its master.
We think artificial intelligence has had a profound impact on our daily lives in recent times and will keep doing so in the near future. Yet at the same time, a lot of my friends and the public are clueless about what artificial intelligence is and what its history is.
By understanding the history of artificial intelligence, we can appreciate its implications and, most importantly, not worry about its effects, because that history shows us its limitations.
7 Key Schools of thought
There are 7 key schools of thought. Let me quickly give an overview of these schools of thought, and then we will dive deep into them with the stories and histories.
The one that we hear about most often today is machine learning, a class of study that uses data to analyze, to decipher patterns, to learn, to predict and to automate tasks. Machine learning has other branches, but the main driving forces are deep learning and predictive analytics.
Before you can learn to predict or learn to automate complex tasks, you probably need to talk to the machine, and the machine needs to understand you. That’s what natural language processing, or NLP for short, is about.
So how do you translate verbal communication into digital bits? That’s speech-to-text translation. What about the other way? That’s text-to-speech translation.
Suppose you find an expert. If you codify their thinking processes and workflow using machine learning, speech recognition and NLP, you might get an expert system that can solve a problem like an expert.
So is a driverless car an expert system? It is, if you add planning, navigation and optimization. Planning, navigation and optimization is a category of AI that figures out the possible paths and options under a set of constraints to reach a particular goal.
Now if you put these into a moving machine with smart sensors, you get robots. If you want moving drones or robots that can see, you give them vision. So these are the basic key schools of thought.
Have this picture at the back of your mind as I share with you the stories of these disciplines over time.
To understand the history, we have to go back in time. There are currently 3 different stories that are commonly shared as the key historical chapters of AI: the history of planning and scheduling, the history of predictive analytics and the modern history of AI.
History of Optimization
Let’s begin with the history of planning, scheduling and optimization.
In 1754, the mathematician Joseph-Louis Lagrange discovered a method of maximising and minimising functionals. Lagrange wrote several letters to Leonhard Euler to describe his results. In this way, Lagrange created the Lagrange optimization technique. Because of it, we are able to find the maximum or minimum value of a function by changing different parameters subject to various constraints.
Although Lagrange created Lagrange optimization, it was Isaac Newton who developed the iterative approach that lets us compute a value step by step until we reach the answer.
Newton's method was first published in 1685 in A Treatise of Algebra both Historical and Practical by John Wallis. In 1690, Joseph Raphson published a simplified description in Analysis aequationum universalis. In 1740, Thomas Simpson described Newton's method as an iterative method for solving general nonlinear equations using calculus, and noted that Newton's method can be used for solving optimization problems by setting the gradient to zero.
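To make that last point concrete, here is a minimal sketch of using Newton's iteration to find where the gradient is zero. It is only an illustration, not something from the historical texts: the function f(x) = x^4 - 4x, the starting point 2.0 and the name newton_minimize are my own assumptions.

```python
def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Find a point where grad(x) = 0 by applying Newton's iteration to the gradient."""
    x = x0
    for _ in range(max_iter):
        step = grad(x) / hess(x)  # Newton step: f'(x) / f''(x)
        x -= step
        if abs(step) < tol:       # stop once the update is negligible
            break
    return x

# Illustrative function (an assumption): f(x) = x**4 - 4*x
# f'(x) = 4*x**3 - 4 and f''(x) = 12*x**2, so the minimum is at x = 1.
x_min = newton_minimize(lambda x: 4 * x**3 - 4, lambda x: 12 * x**2, x0=2.0)
print(round(x_min, 6))
```

Each pass moves x a little closer to the point where the slope is zero, which is the "step by step" idea described above.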
Yet we know that it is important to be able to program these methods on a computer. The breakthrough of turning them into a programmable algorithm came from George Bernard Dantzig, an American scientist. He created his method while he was looking for the best assignment of 70 people to 70 jobs. The computing power required to test all the permutations and select the best assignment is enormous; the number of possible configurations exceeds the number of particles in the universe. Dantzig created the simplex algorithm and laid the foundations for linear programming. The theory behind linear programming drastically reduces the number of candidate solutions that must be checked.
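To get a feel for the scale of that problem, here is a small sketch. It counts the possible assignments for 70 people and then solves a tiny 3-person version by brute force. Note the assumptions: 10**80 is only a commonly quoted rough estimate for the number of particles, the cost table is made up, and the brute-force search is not Dantzig's simplex algorithm. It is exactly the approach that stops working as the problem grows.

```python
import math
from itertools import permutations

# Number of ways to assign 70 people to 70 distinct jobs.
n_assignments = math.factorial(70)

# Rough, commonly quoted estimate of the particle count (an assumption).
PARTICLES_ESTIMATE = 10**80
exceeds_particles = n_assignments > PARTICLES_ESTIMATE

def best_assignment_brute_force(cost):
    """Try every permutation; only feasible for very small problems."""
    n = len(cost)
    def total_cost(perm):
        return sum(cost[person][perm[person]] for person in range(n))
    best = min(permutations(range(n)), key=total_cost)
    return best, total_cost(best)

# Hypothetical table: cost[i][j] is the cost of giving job j to person i.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assignment, total = best_assignment_brute_force(cost)
print(exceeds_particles, assignment, total)
```

With 3 people there are only 6 permutations to try. With 70 people the same loop would never finish, which is why a smarter method was needed.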
So to recap, the idea of computing incrementally to reach a goal originated with Newton and was first published in 1685. Along the way, the mathematical solution of optimization was conceived by Lagrange in 1754. It was only in 1947, when Dantzig developed the simplex algorithm and laid the foundation of linear programming, that we could apply optimization as an AI tool.
Now let’s turn to the history of predictive analytics.
History of Predictive Analytics
When you want to predict something, you first need to draw some inference from the past and then evaluate an outcome based on that inference. For example, if you notice that it has rained in Singapore on 200 of the past 365 days, you will most likely predict that the chance of rain tomorrow is more than 50%. To reach that prediction, you associate Singapore with its historical probability of rain. So how do we make the computer infer such a pattern and then make a prediction out of that pattern?
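The rain example can be written down in a few lines. This is only a sketch of the simplest possible inference, a historical frequency used as a probability; the function name and the 0.5 threshold are my own choices for illustration.

```python
def estimate_probability(event_count, total_count):
    """Use the historical frequency of an event as its estimated probability."""
    return event_count / total_count

rainy_days = 200      # days with rain in the past year (from the example above)
days_observed = 365

p_rain = estimate_probability(rainy_days, days_observed)
prediction = "rain" if p_rain > 0.5 else "no rain"
print(round(p_rain, 3), prediction)
```

Real predictive models use many more inputs than a single count, but the pattern is the same: learn from past data, then score a future outcome.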
It turns out that back in 1859, Charles Darwin wrote The Origin of Species, and its first chapter was about "Variation under Domestication". Darwin’s cousin, Francis Galton, devoted much of the rest of his life to exploring variation in human populations. He established a research program which embraced multiple aspects of human variation, from mental characteristics to height, and from facial images to fingerprint patterns. This required inventing novel measures of traits, devising large-scale collection of data using those measures and, in the end, discovering new statistical techniques for describing and understanding the data. After examining forearm and height measurements, Galton independently rediscovered the concept of correlation in 1888 and demonstrated its application in the study of heredity, anthropology and psychology. Galton's statistical study of the probability of extinction of surnames led to the concept of Galton–Watson stochastic processes. All of these laid the foundations for modern statistics and regression. So who put them into large-scale applications, especially business and economics applications?
It turns out that Ragnar Frisch and Jan Tinbergen started applying these methods to economic problems. They won the first Nobel Prize in economics in 1969 for their work in econometrics, a class of study that uses statistics and regression to solve business and economic problems.
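Here is a small sketch of the two ideas just mentioned, correlation and regression, using only the Python standard library. The forearm and height numbers are made up for illustration; they are not Galton's data.

```python
from statistics import mean

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Made-up measurements in centimetres (assumption, for illustration only).
forearm_cm = [24.0, 25.0, 26.0, 27.0, 28.0]
height_cm = [160.0, 166.0, 169.0, 176.0, 180.0]

r = pearson_correlation(forearm_cm, height_cm)
slope, intercept = fit_line(forearm_cm, height_cm)
print(round(r, 3), round(slope, 2), round(intercept, 2))
```

A correlation close to 1 says the two measurements move together, and the fitted line is what lets you predict one from the other.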
So these inventors created the techniques and the applications of the techniques. But what about those who created the first statistical computing machines? It turns out that one of the early pieces of statistical computing hardware was developed by John Atanasoff.
John Atanasoff was a mathematics and physics professor at Iowa State College. Atanasoff built his machine between 1940 and 1942. His machine was inspired by the Iowa State Statistical Lab. The first statistical lab in the United States had been set up at the University of Michigan by James Glover, a professor of mathematics, back in 1910. Atanasoff saw his machine within the context of a computing lab, much like the Statistics Lab, and expected that it would solve linear systems “at low cost for technical and research purposes.”
Modern History of AI
Now let’s turn to the modern history of AI, which can be traced back to the year 1956. Coming back to the origins of AI, a group of scientists kicked off a series of research projects whose explicit goal was to program computers to behave like humans. So we have Marvin Minsky, John McCarthy, Claude Shannon and Nathaniel Rochester. All of them came together at Dartmouth and said, let’s do research. And the aim of the research was to create artificial intelligence that behaves like human beings. So from the point of view of the academic discipline that later became a sub-discipline of computer science, this marks the birthdate of AI.
So what did they set out to do?
Their research agenda basically said: “Computers are sophisticated. Human beings are sophisticated too. Let's try to program computers to do sophisticated tasks just like human beings.”
So let's first try to see whether we can teach computers to reason like a human being. In other words, let's see whether a computer can play chess, or solve algebra word problems, or prove geometry theorems, or diagnose diseases. The computer is presented with a problem and we see whether it can reason its way to the answer. So here you see AlphaGo playing Go. This kind of task was the prototypical reasoning system that computer scientists tried to design.
Another thing that we try to teach computers to do is to represent knowledge about the real world. In order for the computer to understand and interact with people, it has to understand and interact with the real world. What are objects? What are people? What is language? All of this has to be programmed into the computer, using specific languages like Lisp, which is a language that was invented for this purpose. If you recall one of the computer scientists, John McCarthy, he is the inventor of Lisp. He was trying to teach computers about the real things in the world.
[Planning and Navigation]
The next thing that we need in our daily lives is planning and navigating. We want to teach computers to know how to plan and navigate around a wall. Something like this. How do we get from City Hall to SGinnovate? How do we know where the bus is? Where are the traffic lights? How do we stop, and what path should we take to get there? Which is the safest path to the destination if there are many ways to get there? So in the mid 1960s, a group of scientists at SRI in Menlo Park built robots with cameras that could plan and navigate a short path.
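To show what "what path should we take" looks like to a computer, here is a minimal sketch of Dijkstra's shortest-path algorithm. The map is entirely hypothetical: the junction names and travel times are invented for illustration, and this is not the method used by the SRI robots.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative costs."""
    queue = [(0, start, [start])]   # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None  # goal not reachable

# Hypothetical street map with travel times in minutes (assumption).
graph = {
    "City Hall": {"Junction A": 4, "Junction B": 2},
    "Junction A": {"SGinnovate": 5},
    "Junction B": {"Junction A": 1, "SGinnovate": 9},
    "SGinnovate": {},
}
cost, path = shortest_path(graph, "City Hall", "SGinnovate")
print(cost, path)
```

The weights could just as well encode safety instead of time, which is how the same search answers the "safest path" question.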
[Natural Language Processing]
Another thing that we want to teach computers to do is to speak a language. How do you understand a language? How do you understand the context of a sentence? You and I use language to express feelings and ideas about the world in very subtle and yet powerful ways. The goal is to teach computers to understand these as much as we can. One of the first natural language processing systems was built by IBM together with Georgetown University. It tried to translate Russian into English during the Cold War.
Another thing that we need to teach computers to do is to perceive things in the world. How do we see things in the world? How do we hear things in the world? How do we feel things in the world? The team at Dartmouth wanted to teach the computer to perceive through the five human senses. The most tractable problem, the scientists thought, was sight. So Marvin Minsky from MIT created a project to focus on how computers perceive objects in the real world. That experiment was basically getting a robot to figure out how to pick up an object and put it on top of another object. The computer has to use a camera to recognize the first object in order to pick it up, recognize the position of the other object, and then place the first object onto the second one.
And the hope is that if we can teach the computer to perceive with the five senses, to understand language in context, and to plan, navigate and apply logical reasoning, then perhaps we can get the computer to exhibit general intelligence. So what is general intelligence? It is the set of intelligence that we human beings have. It’s like having emotional intelligence: the ability to read, understand and respond to another human being in a specific situation, in a specific society, using a specific set of moral values. It’s like having intuition that is drawn from data, and being able to explain that intuition in common-sense terms to the other party, or being able to act with ‘common sense’. We also hope that computers can be creative, which means seeing new patterns and applying that creativity to practical use. Eventually we will be able to create a robot that really thinks, behaves and reasons like a human being. It’s like the character David from the movie Prometheus, an android learning to fly a spaceship. Or another character, the android bartender Arthur in the 2016 movie Passengers. At that point we can no longer tell the difference between a human being and a smart robot. The hope is that if we teach the robot the basic knowledge of a specific subject, it will be able to learn, apply and think like a human on that same subject.
And that is the goal of the hugely ambitious undertaking that we are seeing in the world right now.
So let’s now begin to talk about the general history of AI winters. The period before each AI winter is a period when a lot of investment and a lot of resources are dedicated by the government and by the private sector.
Yet when you think about it, in the 1960s the computers were mainframes. So the computer scientists at the time were very visionary. What happened was a series of business cycles, which we call AI winters. People would produce some super compelling demos, and a lot of research and resources would start pouring into this space. Then these startups and research projects would run their course using these resources and go bust in each cycle.
And so during each winter, people start to ponder and feel disillusioned about the implications of AI research. As we know, these AI winter cycles happened not just once but a couple of times.
I'm here to explain the key AI winters to you. For each cycle, I am going to show you why people got so excited and why people then suddenly became so disillusioned.
Bear in mind that the first AI winter began between the 1950s and the 1960s, and the name is an analogy to a nuclear winter. If a nuclear explosion occurred in an area, no one would want to go there anymore, and we see the same effect in each AI winter. The bust was so bad that people wondered whether we should ever put resources into AI research again. During each AI winter, the funding dried up for years and years.
So let me take you through the stories of these cycles now.
[1st AI winter]
So the first cycle begins with the interest in translating between English and Russian, which is basically natural language processing. At that time, the Americans were coming out of the Korean War and entering the Cold War, when missiles were shipped from Russia into Cuba and the Americans were scared. And so the first NLP experiment began in 1954. Translating the first sixty sentences seemed like such a simple success that companies and the government dedicated a lot of resources to it. They were hoping to come up with a generalized translation system. It turned out to be incredibly hard to do.
To put that into context, here’s a simple example. If you take out a Bible, you will find the following sentence: “The spirit is willing but the flesh is weak.” It's a metaphor. It is basically saying that we have the motivation to do something but we just don't have the discipline to do it.
And yet look at what we get when we translate those English words into Russian and then translate the Russian back into English. It says that the whiskey is strong but the meat is rotten. If you think about it for a second, this is an obvious mistranslation. It is obvious to us because we, as human beings, understand context.
So back in the 1960s, NLP didn't capture the contextual meaning of sentences. It just literally translated word for word. A linguist would call this a semantic translation error, while a computer scientist would call it a syntactic, word-level translation. Taken one word at a time, the translation at that time was not bad. And yet when you have a lot of these errors across many sentences, you get a horrible body of text that doesn't capture the essence of the meaning that the original writer wished to express.
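Here is a toy sketch of how a word-for-word round trip loses the meaning. Everything in it is an assumption for illustration: the tokens T0 to T6 are invented placeholders rather than real Russian words, and the lexicon is built by hand so that each foreign token has more than one English sense.

```python
# Toy lexicons with invented placeholder tokens (not real Russian words).
ENGLISH_TO_FOREIGN = {
    "the": "T0", "spirit": "T1", "is": "T2", "willing": "T3",
    "but": "T4", "flesh": "T5", "weak": "T6",
}
# Each foreign token lists its possible English senses.
FOREIGN_TO_ENGLISH = {
    "T0": ["the"], "T1": ["whiskey", "spirit"], "T2": ["is"],
    "T3": ["strong", "willing"], "T4": ["but"],
    "T5": ["meat", "flesh"], "T6": ["rotten", "weak"],
}

def to_foreign(sentence):
    """Translate one word at a time, ignoring the rest of the sentence."""
    return [ENGLISH_TO_FOREIGN[word] for word in sentence.lower().split()]

def to_english(tokens):
    """Always pick the first listed sense; no context is consulted."""
    return " ".join(FOREIGN_TO_ENGLISH[token][0] for token in tokens)

original = "The spirit is willing but the flesh is weak"
round_trip = to_english(to_foreign(original))
print(round_trip)
```

Every single word choice is defensible on its own, yet the sentence as a whole no longer says what the writer meant.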
And this is what led to the first AI winter. The disillusionment was about how hard it is to actually capture the essence of the contextual meaning of a body of text, instead of producing a word-by-word translation.
It was the first time that computer scientists realized there is a need to capture context, idioms and the nuances of a specific language within a body of text. During that time, the automated language processing community said that progress on developing smart NLP capabilities was so slow that it made no sense to invest any further. This began the first AI winter.
[2nd AI winter]
Now let’s look at the second AI winter. After the attempt to translate languages ended in the first AI winter, computer scientists decided to try another approach. They said, “Perhaps we should constrain the language problem to a specific domain.” The kind of projects that they began researching along this direction are called microworlds.
So the computer science community thought like this: by further constraining AI to a specific domain of knowledge, perhaps we can make further progress in the AI space.
During that time, Marvin Minsky started some of these projects. He focused on developing the perception capabilities of robots. That’s like the robot looking at objects through the camera within a specific context. In that microworld, the specific focus was to say to the computer: ‘Pick up a block and put that block on top of another block.’ The system has to figure out whether this is possible. In that project, the second block is a pyramid, and it is not possible to put a cube onto a pyramid, so the system replies that it is not possible.
At the same time, Joseph Weizenbaum started a project called Eliza, which is commonly referenced and taught in AI classes in computer science faculties today. The Eliza project is about talk therapy. Imagine you go and see a talk therapist, and you want to talk about a set of problems. The system mimics the behavior of a professional talk therapist or counselor. Let me give you a simple example of how this talk therapist actually works.
This is how the conversation would go, and I'm going to do my best to go slowly so you can keep up. The typical order is that the user says something and then the computer says something in response. So imagine now that you are sitting in front of the computer and you type this input.
You: Men are all alike.
Computer: In what way?
You: They are always bugging us about something or other.
Computer: Can you think of a specific example?
You: Well, my boyfriend makes me come here.
Computer: Your boyfriend makes you come here?
You: He says I’m depressed much of the time.
Computer: I’m sorry to hear that you are depressed.
You: It’s true. I’m very unhappy.
Computer: Do you think coming here will help you not to be unhappy?
You: I need some help, and that might seem like a solution.
Computer: What would it mean to you if you got some help?
You: Well, maybe I could get along well with my mother.
Computer: Tell me more about your family.
You: My mother takes good care of me.
Computer: Who else in your family takes good care of you?
So you see through this example that it’s incredibly lifelike. You can imagine a real talk therapist saying exactly what the computer says to you in this conversation!
People are very sophisticated, and yet the computer can parse their language. You can see that the above example is super compelling. But here’s the problem. As soon as you start talking to that computer about anything other than what a talk therapist talks about, the illusion completely breaks down. The responses become nonsensical.
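To see why the illusion breaks, here is a minimal sketch of a pattern-matching responder in the spirit of Eliza. The rules, templates and fallback line are hypothetical ones that I wrote for illustration; they are not Weizenbaum's original script.

```python
import re

# Hypothetical rules: (pattern to look for, reply template).
RULES = [
    (re.compile(r"\bmy (mother|father|family)\b", re.I),
     "Tell me more about your family."),
    (re.compile(r"\bi am (.+)", re.I),
     "Why do you say you are {0}?"),
    (re.compile(r"\balways\b", re.I),
     "Can you think of a specific example?"),
]
FALLBACK = "Please go on."

def respond(user_input):
    """Return the reply of the first matching rule, or a canned fallback."""
    for pattern, template in RULES:
        match = pattern.search(user_input)
        if match:
            return template.format(*match.groups())
    return FALLBACK

on_script = respond("My mother takes good care of me")
reflected = respond("I am unhappy")
off_script = respond("What is the weather in Singapore?")
print(on_script, reflected, off_script, sep="\n")
```

The program has no understanding at all. It only reacts to keywords, so anything outside its small list of patterns gets a canned or irrelevant reply.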
Let me give you another example:
You: I wish to know about you.
Computer: Why would you say that I want to know me?
You: So I can write my report.
Computer: What makes you believe that you can write your report?
You: I don’t understand your question, please clarify?
Computer: Earlier you said that you would like to know me.
Computer: Can you elaborate on that?
You: I wish to know Eliza.
Computer: How does it feel to want?
You: Want what?
Computer: Maybe your plan has got something to do with this?
You: Errr, what plans?
Computer: Is it because you say about your life? Errr, what plans?
You: It’s because of you, I said that? Huh?
So you see that in any context other than the one Eliza was programmed for, the system's responses become incorrect and irrelevant. It’s almost as if you are playing with the syntax of English sentences and trying to trick somebody into misunderstanding them. As soon as you say anything outside the programmed script, the illusion that Eliza is a real talk therapist breaks down. In fact, there was a report published by the British government which stated that AI had totally failed to achieve any of its grandiose objectives. And so this began the second AI winter. No more funding. No more startups.
[3rd AI winter]
And now we come to the 3rd AI winter, which might be the last winter.
This winter begins in the early 1980s. Yet the work on expert systems actually began in the 1960s, startups started coming in during the 1970s and companies started dabbling in it in the 1980s.
So we are bad at general machine translation, in that we cannot turn arbitrary English text into meaningful Russian sentences and vice versa. We also can’t seem to be effective in the microworlds, be it stacking blocks or talking like a talk therapist. Maybe we should go the opposite way.
Why don’t we model something sophisticated, an incredibly complex behavior like diagnosing diseases or acting as a chemist? Let's see whether we can get into these sophisticated domains and program so-called expert systems.
The intuition was that we find an expert and interview them. By doing so, we try to understand their world and codify their knowledge into an expert system. The computer then tries to mimic this expert's behavior.
Here’s a good example of this. In 1965, Edward Feigenbaum and Carl Djerassi wrote a system which would take the output from a mass spectrometer and identify the molecules in chemical compounds.
That inspired Edward Shortliffe at Stanford in 1972 to write a program called Mycin to diagnose infectious blood diseases. You feed in a bunch of data that represents the symptoms, and the computer diagnoses what kind of infectious blood disease it is. It was about 50 to 60% accurate. That level of accuracy is pretty comparable to doctors doing the same diagnosis, so people got excited again.
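To give a flavor of what "codifying an expert's knowledge into rules" means, here is a minimal sketch of a rule-based diagnoser. The rules and condition names are hypothetical and medically meaningless; they are not taken from Mycin and only show the mechanism.

```python
# Hypothetical rules: (set of required findings, conclusion).
# These are placeholders for illustration, not real medical knowledge.
RULES = [
    ({"fever", "stiff_neck"}, "condition_A"),
    ({"fever", "cough"}, "condition_B"),
    ({"rash"}, "condition_C"),
]

def diagnose(symptoms):
    """Return every conclusion whose required findings are all present."""
    observed = set(symptoms)
    return [conclusion for required, conclusion in RULES if required <= observed]

findings = diagnose(["fever", "cough", "rash"])
unknown_case = diagnose(["headache"])
print(findings, unknown_case)
```

Every rule has to be obtained from a human expert and typed in by hand, and a case that no rule covers simply gets no answer.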
And so they thought like this: hey, we can take this highly expert behavior and turn it into a set of rules that makes the computer an expert. And presumably the next expert system will be easier to build. If we just keep on iterating, we might write thousands of these systems, and eventually we will get a full artificial intelligence system. With this excitement, IBM came out with an integrated reasoning system called the Shell, and Reid Hoffman, the founder of LinkedIn, interned on this system. The other thought was that if we can churn out these systems very quickly, we can solve a lot of problems in fields where experts are in short supply. That thought might also get humans to the promised land of AI.
During this time there were a lot of startups trying to build one expert system at a time. One of the most famous startups at that time was called Symbolics. It built Lisp machines, computers designed to run Lisp, the programming language invented by John McCarthy. The community was super excited that this was the path forward.
Unfortunately, building one expert system didn't give you a leg up in building other expert systems. At that time data storage was very expensive, computing was expensive and everything was in silos. You had to build each system by itself and the systems could not talk to each other, which meant that this path forward was almost unscalable. Furthermore, most of the startups were not founded by founders with deep domain expertise. So the startup had to find the expert, talk to them, decode their knowledge and code it into the system, and this entire process was long and tedious in terms of time and resources. This led to the collapse of expert systems and the start of the third AI winter.
So we have these 3 boom and bust cycles, 3 AI winters. If you read Wikipedia and do a bit of research, you'll find that there are a lot more AI winters. Nonetheless, these are the three major AI winter business cycles.
[Deep learning: beginning of AI spring?]
In very recent times, we hit a breakthrough. Or maybe we are getting out of future AI winters. I want to describe this breakthrough to you.
So what is this breakthrough? This breakthrough is deep learning, and it is one class of machine learning algorithms.
Let’s recap the previous techniques that I talked about, where we attempt to talk to experts, learn how the experts behave, codify these rules and then put them into the computer.
Unlike those previous techniques, deep learning techniques are modeled on the human brain and help computers to learn like babies from a set of data.
This is the opposite of the approach taken before the previous AI winters. The idea originated in the 1940s with 2 researchers, McCulloch and Pitts, who proposed modeling data structures and algorithms on the human brain, which is what we call a neural network today.
Many researchers elaborated on those ideas to make the algorithms faster and more accurate and to make better predictions. Since I'm not trying to give a comprehensive history of deep learning or neural networks, let me share a couple of key stories along the way to give you the gist of the most recent times.
Yann LeCun, who runs Facebook's AI lab, used neural networks to recognise handwritten postal codes. His method could take a handwritten postal code and automatically digitize it. This was done in the 1980s.
Picking up from that, Geoffrey Hinton and Yoshua Bengio worked on deep belief networks. Geoffrey Hinton is now part of the Google AI team and Yoshua Bengio is at the University of Montreal. Their research led to Google voice search. So speech-to-text translation is a direct descendant of deep belief networks.
Another significant contributor is the German researcher Jürgen Schmidhuber, whose work is on long short-term memory recurrent networks and deep feedforward neural networks. The basic idea is that your brain has many neurons connected to many other neurons. So why don't we write an algorithm that mimics the neural networks in the brain, with nodes that interact with each other?
So continuing along the history of neural networks, Google decided to run an experiment. As you might expect, Google can contribute by using its economies of scale to run seemingly unfundable projects.
And there are 2 dimensions of scale here.
The first dimension of scale is data. Google has a lot of emails, a lot of videos and a lot of search results. Andrew Ng used YouTube videos for his experiment. At that time he was a Stanford professor, and he took 10 million videos for his training set.
The second dimension of scale is computing power. Google loves distributed computing. Andrew ran neural networks on 16,000 computer cores over the 10 million videos for 1 week. Guess what they found? The first thing you find is cats. People love uploading cat videos. Out of 20,000 objects in the videos, the neural network recognised 16%. Thousands of objects got recognized by this neural network.
So here's the fascinating thing. We did not have an expert teach the computer to identify a cat. We basically fed a bunch of data into the model, and the model learned to categorize the inputs, in this case to recognize the cats and other similar objects. This was done without any guidance from an expert and without any rules. This forms the heart of the revolution. This is the big breakthrough of artificial intelligence. Just feed in a bunch of data and the computer will learn to classify by itself.
So you might ask: why are people so excited about neural networks or deep learning at this very juncture, when we have been working on this since the 1940s?
Here’s the answer. The answer is scale. Compared to Yann LeCun's experiment, Andrew’s experiment had 1 million times more computing cycles and 33,000 times more pixels of data. So you can see that as computing power gets cheaper and cheaper and as storage gets cheaper and cheaper, the resources required to train computers using deep learning become cheaper and cheaper. Therefore we start to see a lot of startups and projects using deep learning.
So let me give you a sense of what deep learning is. I'll be using TensorFlow, which is a deep learning tool from Google, to show you. The objective here is to see whether the neural network can learn to classify, or distinguish, a set of patterns in a bunch of data. You will see the neural network mark out the blue portion and the orange portion. So let’s get started.
On the right hand side we see orange dots and blue dots. The orange dots and blue dots are data. Think of the orange dots as spam and the blue dots as non-spam. Or the orange dots are offensive forum posts and the blue dots are non-offensive forum posts. Or the orange dots are the cats in the YouTube videos and the blue dots are the dogs in the YouTube videos. All of these categories are represented by these dots. The job of the neural network is to draw a boundary around these dots.
On the left side, you can see that we can select different types of data. The data can be email messages, forum posts or pictures that we are trying to categorize.
Now let’s start to build the neural network. You can see that I’m adding layers, and each one is a layer of the neural network. This is why we call it deep learning: deep refers to the multiple layers that we are adding to the network. Then I am going to add a number of neurons to each layer. As I add an extra neuron to a layer, it gets connected to the other neurons. What is going to happen is that the neural network is going to train itself on the data that we feed to it, and that is going to set the connection strengths between the nodes.
So you will see that as I feed in data, the blue and orange lines connecting the nodes become darker or lighter depending on the connection strength between each pair of nodes. Now let’s see what these neurons are fundamentally doing.
So I am going to hit the play button here to start feeding the model data. What you will see is that the connection strengths between all of the nodes are adjusted, and the resulting output appears as shaded portions. The goal for the network is to draw a boundary that separates the blue dots in the middle from the orange dots around them.
Notice that it only takes a few iterations to draw that boundary. So that gives you an intuition of what these tools are doing. And notice that I haven’t told the system what I am trying to accomplish. All I have done is feed a bunch of data into a specific data structure, and the computer has itself learned to set the connection weights between all of these nodes in the network so that it can mathematically draw the boundary between the blue dots and the orange dots.
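For listeners who want to see roughly what happens when I hit play, here is a minimal sketch in plain Python. It is not the TensorFlow tool itself. It assumes made-up data (blue dots near the centre, orange dots in a ring around them), one hidden layer of 4 neurons, and ordinary gradient descent. The names make_dots and TinyNetwork, the learning rate and the number of rounds are my own choices for illustration.

```python
import math
import random


def make_dots(n_per_class, rng):
    """Blue dots (label 1) sit near the centre; orange dots (label 0) form a ring around them."""
    dots = []
    for _ in range(n_per_class):
        for label, r_min, r_max in ((1, 0.0, 1.0), (0, 2.0, 3.0)):
            radius = rng.uniform(r_min, r_max)
            angle = rng.uniform(0.0, 2.0 * math.pi)
            dots.append(((radius * math.cos(angle), radius * math.sin(angle)), label))
    return dots


def sigmoid(z):
    """Squash any number into the range 0..1 without overflowing."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyNetwork:
    """Two inputs (x, y), one hidden layer of tanh neurons, one output neuron."""

    def __init__(self, n_hidden, rng):
        # Connection strengths start out random; training adjusts them.
        self.w_hidden = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, point):
        hidden = [math.tanh(w[0] * point[0] + w[1] * point[1] + b)
                  for w, b in zip(self.w_hidden, self.b_hidden)]
        z = sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out
        return hidden, sigmoid(z)  # second value: estimated probability of "blue"

    def train_step(self, dots, learning_rate):
        """One round of full-batch gradient descent; returns the average loss before the update."""
        n_hidden = len(self.w_out)
        g_wh = [[0.0, 0.0] for _ in range(n_hidden)]
        g_bh = [0.0] * n_hidden
        g_wo = [0.0] * n_hidden
        g_bo = 0.0
        loss = 0.0
        for point, label in dots:
            hidden, p = self.forward(point)
            p_safe = min(max(p, 1e-12), 1.0 - 1e-12)
            loss -= label * math.log(p_safe) + (1 - label) * math.log(1.0 - p_safe)
            err = p - label  # how wrong the output neuron was for this dot
            g_bo += err
            for j, h in enumerate(hidden):
                g_wo[j] += err * h
                back = err * self.w_out[j] * (1.0 - h * h)  # error pushed back through tanh
                g_wh[j][0] += back * point[0]
                g_wh[j][1] += back * point[1]
                g_bh[j] += back
        step = learning_rate / len(dots)
        for j in range(n_hidden):
            self.w_out[j] -= step * g_wo[j]
            self.w_hidden[j][0] -= step * g_wh[j][0]
            self.w_hidden[j][1] -= step * g_wh[j][1]
            self.b_hidden[j] -= step * g_bh[j]
        self.b_out -= step * g_bo
        return loss / len(dots)


rng = random.Random(0)
dots = make_dots(40, rng)
network = TinyNetwork(n_hidden=4, rng=rng)
losses = [network.train_step(dots, learning_rate=0.1) for _ in range(500)]
correct = sum((network.forward(point)[1] > 0.5) == (label == 1) for point, label in dots)
print(f"loss went from {losses[0]:.3f} to {losses[-1]:.3f}; "
      f"{correct} of {len(dots)} dots classified correctly")
```

Nowhere in this code do I describe a circle or a boundary. The only thing the program does is nudge the connection strengths, round after round, in the direction that makes its guesses less wrong. How well it does depends on the random starting weights and the settings I picked.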
I hope you now have a sense of what the data structure and the algorithm are doing in a deep learning system and what they can do. To give you a sense of the scale, we have a couple of hundred connections between the nodes of the network up on the screen. If you were to buy an Nvidia Drive computer card and run a network on it, it would most probably have about 27 million connections. If you recall the Andrew Ng YouTube cat project, it had about a billion connections. If you are feeling a bit intimidated, remember that your brain's visual cortex is about 10 to the power of 6 times larger than that in neurons and connections! It will take a while before we can build a deep learning system as complicated as our human brain.
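If you want to check the arithmetic behind a connection count yourself, here is a small sketch in Python. It assumes a fully connected network, where every neuron links to every neuron in the next layer. The layer sizes are hypothetical numbers I picked so that the total lands at a couple of hundred. They are not read off the screen.

```python
def count_connections(layer_sizes):
    """Number of connections when every neuron links to every neuron in the next layer."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical network: 7 inputs, three hidden layers of 8 neurons, 1 output.
demo_layers = [7, 8, 8, 8, 1]
print(count_connections(demo_layers))  # 7*8 + 8*8 + 8*8 + 8*1 = 192
```

Because each pair of neighbouring layers multiplies together, the count grows very quickly as you widen the layers, which is how real systems reach millions of connections and more.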
Did I hear someone say Phew! What a relief! That’s one perspective. The other perspective is: oh, this is exciting! We are on the path to building more sophisticated systems through bigger data and bigger neural networks!
So how useful is deep learning in everyday life? If you have a smartphone, you will have Siri or Google's voice search, which use machine learning. If you are buying something from Amazon, the next recommendation comes from a recommendation engine that uses machine learning. If you take Grab or Uber, the pricing mechanism is powered by machine learning. Because deep learning has become so productive at making applications better, companies are seeking to apply deep learning and machine learning in every business unit, every department, every country.
So where is deep learning going to take us?
Let me give you an example of deep learning in navigation for autonomous driving. George Hotz is the founder of Comma.ai. He built a self-driving car all by himself. Let me put this in context.
In 2004, DARPA sponsored the first Grand Challenge: build a self-driving vehicle that can navigate the Mojave Desert. They got 20 entries from leading universities and startups at that time. The leading entry was from Carnegie Mellon, and its self-driving car, called Sandstorm, went about 7.5 miles of a desert course that was well over 100 miles long.
In 2007, DARPA sponsored a 60-mile city course challenge. CMU took the top spot and finished the course. So you can see the progress of autonomous driving: only about 7.5 miles in 2004, and the full 60 miles in 2007.
This set off an arms race in autonomous driving programs. Now every major manufacturer has a huge autonomous driving program. Think of Tesla, Grab, Uber and Google: they have thousands of engineers working on this.
NOW George has built a self-driving car ALL by himself using deep learning techniques. So you can see how these advanced techniques and data can enable 1 man to achieve this feat, when there are thousands of engineers working on the same stuff in some of the best labs in the world!
George was able to do this thanks to open source. By the way, George's solution works! So that shows the promise of deep learning: one guy can build a self-driving car all by himself.
And there are many other great examples of deep learning, in NLP and in perception. So from here on, all the corporates and venture capital companies are looking at startups and their products to see whether they have incorporated deep learning or machine learning techniques into their solutions. So it seems that we are in an AI spring and that deep learning has made a fundamental breakthrough in all of the AI sub-disciplines.
Hold your horses! Hold on. I want to let everybody know that, as exciting as this is, we need to be cautious about the limitations of neural networks and deep learning. If a pure deep learning application really works like a child who learns by observing all the data and making sense of it without any expert guiding him, then it will behave like the child in this story.
So I was reading my baby nephew a story about a car that gets stuck and receives help from a helper. And the baby recognized that if he gets stuck, he can ask for help and any loving person will come and help him.
So one day he went cycling at the void deck. He was cycling so fast that his head hit the edge of a wall and bled profusely. Fortunately his mom and his dad are medical professionals, and they decided to stitch his head up. That left a small scar on his head, and he cried a lot. And so I saw his head and I asked him what had happened to it.
So instead of telling me that he hit his head against the wall because he cycled very fast and rammed into the edge of the wall, guess what he said? This is what he said. He pointed to his head and said, Papa Mama make me pain. And so what does this example illustrate? It illustrates the importance of really understanding how the child learns and how the child tells the story. If I had not asked his mom and dad what had happened, I would not have had any idea that the driving force behind his head injury was his fast cycling and his need for his parents’ attention. Because the child is so young, he tells the story from his most recent memory instead of revealing the mechanics of his fall.
And so let’s come back to the cautionary message behind "let’s just nail every problem using deep learning". As in the example above, suppose you apply the current deep learning approach purely on its own, without a deep understanding of what you are doing. You might know the algorithm, but there might come a day when you fit deep learning to the data, something happens, and you don't really know what causes the system to respond in ways you don’t understand. So deep learning sometimes does not allow you to interpret the mechanics of your results. This is the part that is lacking, and it is what we call interpretability. It is very important to learn to interpret the model and to know what triggers the outcome, so that in certain use cases we can get a better sense of how to apply deep learning to solve problems. This is very different from an expert system or supervised machine learning, where you basically understand and are able to interpret the results because you know what is in the system. The deep learning model is what we call a black box model, while by contrast the other kind of model is a white box model.
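To show what I mean by a white box, here is a minimal sketch in Python of an expert-system style spam filter. The three rules and the function name classify are hypothetical, made up purely for illustration. The point is that every decision comes back together with the rules that triggered it, so you can always explain the outcome.

```python
# Hypothetical hand-written rules: (human-readable reason, test on the lowercased text).
RULES = [
    ("mentions a free prize", lambda text: "free prize" in text),
    ("asks for a password", lambda text: "password" in text),
    ("contains more than 3 exclamation marks", lambda text: text.count("!") > 3),
]


def classify(message):
    """Return (label, reasons) so that every decision can be explained."""
    text = message.lower()
    reasons = [reason for reason, rule in RULES if rule(text)]
    label = "spam" if reasons else "not spam"
    return label, reasons


print(classify("Claim your FREE PRIZE now"))  # ('spam', ['mentions a free prize'])
print(classify("Lunch at noon?"))             # ('not spam', [])
```

A trained neural network gives you the label but not the list of reasons. All you have is a large set of connection weights, and that is why we call it a black box.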
Having said that, deep learning is still one great tool alongside the other tools of AI for solving problems. As the old management saying goes, if you only have 1 hammer in your toolbox, everything looks like a nail. As such, it is important to understand the business problem, make sense of it, and use various computer science techniques together with skills like critical thinking, design thinking and econometric thinking to really build a deep solution to that problem.
Thank you very much and we have come to the end of the podcast.
I hope you enjoyed this podcast. Please share it so we can build a better world.
This is Andrew Liew speaking, and have a great day ahead.