Distributed Artificial Intelligence
Table of Contents
Introduction
The similarity between Brain and Deep Neural Networks
How does the Deep Neural Network work?
Terminology
Number of Neurons
Energy Preservation
Enough Speed
The Problem
Training Expense
Minimum Activation Model
Input Data Preprocessing and Filtering
Distributed Model
The Cloud Model API
Introduction
In this article, I am describing an alternative way of thinking about machine learning and artificial intelligence (AI).
Most scientific papers focus on a single machine learning model that solves a specific problem. The media and literature, with their omnipresent robots, reinforce that mental shortcut of a self-contained AI.
For me, that all-encompassing, über-model approach to machine learning problems is not practical; biology can pack more petabytes of data per gram of living mass than anything humans are able to create.
Instead, I am suggesting a more nature-inspired approach, modeled on how our brain thinks, but without the human limitations of space and time.
The similarity between Brain and Deep Neural Networks
First, let’s consider how the brain learns.
As children, we have many more neurons in the brain than adults do. Kids’ neurons fire at random, which results in their “silly”, yet all-important behavior. Silliness allows kids to try new things. Occasionally, a particular neural path receives more positive feedback than others, and the synapse that connects two neurons gets flooded with chemicals that strengthen the path. If the action is not repeated, we most likely do not remember it, but if it is repeated frequently, over time it becomes semi-permanent. With age, the synaptic connections that are not being used weaken and disappear. The highly practiced skills become effortless.
We trade some of that flexibility for specialization, but also for rigidity.
How does the Deep Neural Network work?
Deep Neural Networks (DNNs) work in a very similar way.
A DNN is nothing more than a polynomial: a set of mathematical neurons that transform the input data into the desired outcome through simple multiplication and addition.
Each neuron is called a node, and there can be thousands of nodes in a network. By comparison, there are about 100 billion neurons in the human brain.
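To make that concrete, here is a minimal sketch of a single node in plain Python with NumPy, assuming a sigmoid activation; the numbers are arbitrary, not taken from any real network:

```python
import numpy as np

def node(inputs, weights, bias):
    """One 'mathematical neuron': multiply, add, then squash."""
    z = np.dot(inputs, weights) + bias      # simple multiplication and addition
    return 1.0 / (1.0 + np.exp(-z))         # sigmoid activation

# Example: 3 inputs flowing into a single node
x = np.array([0.2, 0.7, 0.1])
w = np.array([0.5, -0.3, 0.8])              # the "weights" the network learns
print(node(x, w, bias=0.1))
```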
At first, the neural network nodes receive random values which we call “weights”; at this point, the network is an infant.
During training, the network receives a multitude of data samples and analyzes each one by pushing it through the chains of nodes, constantly comparing the output with reality. The results for random weights are usually abysmal, so the weights are adjusted, usually with a technique called gradient descent. If a particular chain of nodes produces a positive result, it is noted. At that point, the weights may still be changed slightly to optimize for the best results, but these changes become increasingly smaller. The learning continues until the system gives satisfactory results, or runs out of time and resources. That is a very simplistic view, but it suffices.
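A toy illustration of that loop, assuming a single linear node trained with plain gradient descent; this is a sketch of the principle, not how production frameworks are implemented:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3))                 # training data
y = X @ np.array([0.4, 0.2, 0.4])        # the "reality" we compare against

w = rng.random(3)                        # random starting weights: the "infant" network
learning_rate = 0.1

for epoch in range(500):
    prediction = X @ w                   # forward pass: multiply and add
    error = prediction - y               # compare output with reality
    gradient = X.T @ error / len(X)      # how each weight contributed to the error
    w -= learning_rate * gradient        # adjust the weights (gradient descent)

print(w)                                 # close to [0.4, 0.2, 0.4] after training
```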
The process of DNN training gives excellent, better-than-human results, but only for a very particular application such as object recognition. At this time, however, computers are no match for general-purpose thinking. Let’s analyze a few factors.
Terminology
In this article, I will use prefixes to indicate large numbers.
Many people commonly use mega (e.g. a 10-megapixel camera), giga (e.g. a 1-gigabyte file), or even tera (e.g. a 1-terabyte disk).
However, for clarity, I am providing a reference table.
- deca, 1 da = 10¹ = 10
- hecto, 1 h = 10² = 100
- kilo, 1 k = 10³ = 1,000 or thousand
- mega, 1 M = 10⁶ = 1,000,000 or million
- giga, 1 G = 10⁹ = 1,000,000,000 or billion
- tera, 1 T = 10¹² = 1,000,000,000,000 or trillion
- peta, 1 P = 10¹⁵ = 1,000,000,000,000,000 or quadrillion
- exa, 1 E = 10¹⁸ = 1,000,000,000,000,000,000 or quintillion
- zetta, 1 Z = 10²¹ = 1,000,000,000,000,000,000,000 or sextillion
- yotta, 1 Y = 10²⁴ = 1,000,000,000,000,000,000,000,000 or septillion
Ok, that should be big enough for now. I ran out of fingers and toes anyway.
In addition, I use:
- OPS = numeric (integer) operations per second, e.g. 2x2
- FLOPS = decimal number (floating point) operations per second, e.g. 2.5x2.5
- Hertz, 1 Hz = doing something once per second
FLOPS, of course, require much more computing power than just OPS, but surprisingly OPS suffice perfectly well for most machine learning applications.
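As a rough illustration of why integer OPS often suffice, here is a hedged sketch of 8-bit quantization, where float weights and inputs are mapped to integers before the multiply-and-add; the per-tensor scaling shown is one common choice, not the only scheme:

```python
import numpy as np

weights = np.array([0.12, -0.83, 0.40], dtype=np.float32)
inputs  = np.array([0.50,  0.25, 0.75], dtype=np.float32)

# Map floats to int8 with a per-tensor scale (a common quantization scheme).
scale_w = np.abs(weights).max() / 127
scale_x = np.abs(inputs).max() / 127
w_int = np.round(weights / scale_w).astype(np.int8)
x_int = np.round(inputs / scale_x).astype(np.int8)

# Integer multiply-accumulate (OPS), rescaled back to a float only at the end.
acc = np.dot(w_int.astype(np.int32), x_int.astype(np.int32))
print(acc * scale_w * scale_x)      # close to the floating-point result below
print(np.dot(weights, inputs))      # reference FLOPS result
```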
Number of Neurons
As already mentioned, a single DNN has thousands of neurons, whereas the human brain has about 100 billion. That is 10¹¹ brain cells, or neurons.
The number of DNNs we can run is limited only by computational power and time. Today, it is entirely feasible that I could afford and easily build a system that runs 1,000 neural networks at the same time (i.e. in parallel); my current problem is that I am underutilizing the computing power I already own. The idea is still valid.
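A minimal sketch of that idea, fanning many small "networks" out across processes; the run_model function is a hypothetical stand-in for one specialized model's inference, not a real framework call:

```python
from concurrent.futures import ProcessPoolExecutor

def run_model(model_id, payload):
    """Placeholder for one small, specialized network's inference."""
    return model_id, sum(payload) / len(payload)   # stand-in computation

if __name__ == "__main__":
    payload = [0.1, 0.9, 0.5]
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(run_model, i, payload) for i in range(1000)]
        results = dict(f.result() for f in futures)
    print(len(results), "networks answered")
```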
At this point, I am not concerned about the amount of processing power per volume of space, because we do not have to store all of it in the brain-sized head of a robot. On the contrary, distributed computing has its advantages.
Energy Preservation
We have to ask ourselves how much we are willing to spend on day-to-day operations.
Our brain consumes somewhere between 20 Watts with the snooze button on and up to 100 Watts when working on a hard, multi-dimensional problem. That is very efficient and rather thrifty; in fact, early evolution might not have been able to afford humans a bigger energy expenditure.
To compare, a powerful workstation designed for machine learning training consumes closer to 1,000 Watts. The biggest supercomputers require megawatts of power and cost millions of dollars in annual operations. This, in turn, is cost-prohibitive. In the future, we should expect lower levels of energy consumption with RISC ARM core-based architectures, as mobile devices, some supercomputers, and some cloud servers already demonstrate.
Also, a proliferation of cheaper power sources will make vast computing power accessible. A 4-acre solar farm in the desert can provide a megawatt of power at a relatively low expense.
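A back-of-envelope comparison using the wattage figures above; the $0.12 per kWh electricity price is my own assumption, purely for illustration:

```python
WATTS = {"brain (idle)": 20, "brain (busy)": 100,
         "ML workstation": 1_000, "supercomputer": 1_000_000}
PRICE_PER_KWH = 0.12          # assumed average electricity price in USD
HOURS_PER_YEAR = 24 * 365

for name, watts in WATTS.items():
    kwh_per_year = watts * HOURS_PER_YEAR / 1_000
    cost = kwh_per_year * PRICE_PER_KWH
    print(f"{name:>16}: {kwh_per_year:>12,.0f} kWh/year, ~${cost:,.0f}/year")
```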
Enough Speed
Next, you have to ask yourself how fast you can process the data.
There are a couple of ways to describe the processing speed of a system; given the same throughput of data, we can talk about cycles per second. Human neurology (the wetware) is massively parallel but very slow, running at a rate of about 200 Hz. Computers can run as fast as 4 GHz. This metric alone, however, does not explain much.
We know we can drive a car by processing at the same time the following inputs:
- 2 high-definition movable cameras (i.e. our eyes), which some estimate at 572 megapixels each, or 50 times better than what your iPhone has
- 2 microphones, i.e. your ears
- millions of touch, temperature, smell, and taste sensors spread across your body
- 2 IMUs, or inertial measurement units, located in your inner ears
The amount of processing power required to make sense of the above is mind-blowing.
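A rough back-of-envelope on the vision input alone, using the 572-megapixel estimate above; the 3 bytes per pixel and 30 "frames" per second are my illustrative assumptions, not biology:

```python
MEGAPIXELS_PER_EYE = 572
EYES = 2
BYTES_PER_PIXEL = 3          # assumed RGB-like encoding
FRAMES_PER_SECOND = 30       # assumed, purely for illustration

bytes_per_second = (MEGAPIXELS_PER_EYE * 1_000_000 * EYES
                    * BYTES_PER_PIXEL * FRAMES_PER_SECOND)
print(f"{bytes_per_second / 1e9:,.0f} GB/s of raw visual input")   # ~100 GB/s
```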
Computers are ever faster, processing data at a rate of hundreds of trillions (tera) of calculations per second, and that is multiplied by the number of parallel cores in a given system; this will be the case for every single autonomous car of the near future.
This technology is becoming increasingly inexpensive; today, even a student can buy 100 TOPS of hardware, which is enough to run 200 neural networks at the same time.
In the near future, maybe by the time you are reading this, we might commonly talk about peta operations per second (POPS) being available to any research organization. That is 10¹⁵ operations per second.
This is staggering compared to our “wetware” running at less than 200 Hz. Computers, in theory, are millions of times faster.
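The ratio behind that claim, taking the figures above at face value (a single core versus a single neuron, ignoring the brain's massive parallelism):

```python
neuron_rate_hz = 200            # wetware firing rate used above
cpu_clock_hz = 4_000_000_000    # 4 GHz
print(f"{cpu_clock_hz / neuron_rate_hz:,.0f}x faster per unit")   # 20,000,000x
```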
The Problem
So what is the problem with computers? Actually, there is none. The problem is with humans. We are impatient, and we do not architect machine learning systems properly.
Training Expense
The general opinion is that training a machine learning model takes time and is energy-consuming, but really, is it?
Let’s take the two most well-known examples: Google’s AlphaGo (Zero), which mastered the game of Go, and IBM Watson, the Jeopardy winner. The latter used 2,880 CPU threads and 16 TB of RAM; that is a lot, and it is expensive, too.
From the examples above, we know we can teach computers skills that were thought to be exclusively in the domain of human intelligence.
At the same time, teaching a human player is far more expensive, and risky, than teaching a computer. Things that would take us a decade to master take only weeks on a computer.
Let’s imagine that we feed a massive amount of new data to a computer for 12 hours a day for 10 years. What would you expect? Let’s break it down.
Minimum Activation Model
The human brain is not all busy at the same time, only some centers are active. Depending on the context of the incoming data and desired output, a particular center is active.
The architecture of a computer AI has to be similar. There have to be centers that specialize in a very particular task. Training huge and complex models is hard, if not impossible. The task of training very specific models is relatively easy.
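A minimal sketch of that routing idea, with hypothetical specialized models registered by the kind of input they handle; the specialist registry and the classify_context function are illustrative, not an existing API:

```python
from typing import Callable, Dict

SPECIALISTS: Dict[str, Callable] = {}

def specialist(context: str):
    """Register a small model that handles one very particular task."""
    def register(fn):
        SPECIALISTS[context] = fn
        return fn
    return register

@specialist("speech")
def speech_model(data):
    return f"transcribed {len(data)} audio samples"

@specialist("vision")
def vision_model(data):
    return f"detected objects in a {len(data)}-pixel frame"

def classify_context(data) -> str:
    """Stand-in for the logic that decides which center should wake up."""
    return "speech" if isinstance(data, list) else "vision"

def process(data):
    context = classify_context(data)
    return SPECIALISTS[context](data)      # only the needed model is activated

print(process([0.1] * 16_000))             # routed to the speech specialist
```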
Input Data Preprocessing and Filtering
The human body has many sensors, such as touch, smell, temperature, and more, but we only feel the first moments of a sensor’s activation and ignore the rest. The input data that a human receives is heavily filtered. When it comes to vision, the human eye can see in hundreds of megapixels (an estimated 324 megapixels). The brain, however, focuses on only a small thumbnail of that at a time.
In machine learning terms, that is the equivalent of running object detection on the image using “You Only Look Once” (YOLO) bounding boxes, then selecting which few objects are interesting, cutting out the bounding boxes, and passing the smaller images to more specialized machine learning models. Only the models that are needed should be active at any given time.
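A sketch of that pipeline, assuming a detector that returns (label, bounding box) pairs; detect_objects and CLASSIFIERS are placeholders, not a particular YOLO implementation:

```python
import numpy as np

def detect_objects(frame):
    """Placeholder for a YOLO-style detector: returns labels and bounding boxes."""
    return [("car", (40, 60, 200, 180)), ("pedestrian", (300, 80, 360, 220))]

CLASSIFIERS = {
    "car": lambda crop: f"car crop {crop.shape}",
    "pedestrian": lambda crop: f"pedestrian crop {crop.shape}",
}

def process_frame(frame):
    results = []
    for label, (x1, y1, x2, y2) in detect_objects(frame):
        crop = frame[y1:y2, x1:x2]             # cut out just the interesting region
        if label in CLASSIFIERS:               # activate only the model that is needed
            results.append(CLASSIFIERS[label](crop))
    return results

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in camera frame
print(process_frame(frame))
```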
Distributed Model
The next logical question is: how can I load all of this into a computer? Google’s and IBM’s computers are all distributed and connected with gigabit fiber optics. Our devices, such as laptops, phones, or cars, are not connected at high speed, yet.
I had been ignoring all solutions that cannot run on a local Android device. However, I have realized that by the time I perfect my models, 5G connectivity with ultra-fast data transfer speeds will be prevalent.
Remember, today you can already stream 30-frames-per-second video and audio from a service across the country to a phone’s hot-spot, and from that to a tablet, all while driving the kids to school.
When we consider an ultra-high-speed connection, it becomes very feasible to run a low-power 8- or 16-core Android device that captures the audio, the video, and many other streams of input data, then locally detects what is important and passes subsets of that information to cloud machine learning models that specialize in that particular aspect. These models will be relatively light and will answer quickly with an even smaller set of information that will be combined and further processed by other specialized nodes until the output is achieved.
When the device is in a “No Internet” zone, the more powerful onboard computer activates (additional cores) and does the essential processing, or, if that is not necessary, the experience gets downgraded to more manual interaction.
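A sketch of that edge-side logic; the cloud endpoint URL, the payload shape, and the on-device fallback model are all illustrative placeholders:

```python
import json
from urllib import request

CLOUD_ENDPOINT = "https://example.com/models/traffic-signs"   # hypothetical model API

def local_model(features):
    """Essential on-device processing used in the 'No Internet' zone."""
    return {"label": "unknown", "source": "onboard"}

def infer(features):
    payload = json.dumps({"features": features}).encode()
    req = request.Request(CLOUD_ENDPOINT, data=payload,
                          headers={"Content-Type": "application/json"})
    try:
        with request.urlopen(req, timeout=2.0) as resp:
            return json.load(resp)        # small answer from the specialized cloud model
    except OSError:                       # no connectivity or unreachable endpoint
        return local_model(features)      # degrade gracefully when offline

print(infer([0.3, 0.8, 0.1]))
```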
The Cloud Model API
Now, let’s get back to training.
Imagine that we train small, very focused machine learning models. Once we train one, we deploy it to the cloud with a very well-defined interface (API). We do this every day for 10 years, and we end up with thousands of such services. The services are regularly evaluated, retrained, and redeployed, but the interface (API) remains the same. When a particular model is not busy, it scales down, and when it is in demand, it scales up, just as AWS Lambda does today.
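Here is a sketch of what one such service could look like behind a stable interface, using Flask purely as an example; the route name, payload shape, and model are hypothetical:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def road_sign_model(features):
    """Stand-in for one small, regularly retrained model."""
    return {"label": "stop_sign", "confidence": 0.97}

# The route and payload shape are the contract (API); the model behind it
# can be retrained and redeployed without callers ever noticing.
@app.route("/v1/road-signs", methods=["POST"])
def classify():
    features = request.get_json()["features"]
    return jsonify(road_sign_model(features))

if __name__ == "__main__":
    app.run(port=8080)
```

The point is the contract: as long as the endpoint keeps accepting and returning the same shapes, the model behind it can be swapped freely.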
With the proliferation of the Maker movement and IoT solutions, the data will go far beyond images, sound, and basics like weather information. Millions of sensors will gather data, and new machine learning models will process it.
In this vision, the only things missing today are a directory of the machine learning model APIs and the connectivity, both of which are coming soon.
I am leaving the conversation of the singularity for another occasion, but I think it is clear that you will not require a supercomputer for AI.
The first cloud-connected devices may use a dozen machine learning models, later hundreds, and eventually thousands. At that point, you will be able to play Jeopardy in your self-driving car, receive personalized one-on-one history or quantum physics lessons, or simply have a nice conversation with the car on any topic imaginable.
For better, or worse, that is the future.
The speculative opinions in this article are my own and do not originate from, or reflect, the opinions of my employer.
If you would like to contact me, let’s connect on LinkedIn https://www.linkedin.com/in/ukidlucas/
or follow me on Twitter @UkiDLucas