The Age of Spiritual Machines: When Computers Exceed Human Intelligence

by Ray Kurzweil


  (i) wire the neural net randomly; or

  (ii) use an evolutionary algorithm (see next section of this Appendix) to determine an optimal wiring; or

  (iii) use the system designer’s best judgment in determining the wiring.

  • The initial synaptic strengths (i.e., weights) of each connection. There are a number of possible ways to do this:

  (i) set the synaptic strengths to the same value; or

  (ii) set the synaptic strengths to different random values; or

  (iii) use an evolutionary algorithm to determine an optimal set of initial values; or

  (iv) use the system designer’s best judgment in determining the initial values.

  • The firing threshold of each neuron.

  • Determine the output. The output can be:

  (i) the outputs of the neurons in layer_M; or

  (ii) the output of a single output neuron, whose inputs are the outputs of the neurons in layer_M; or

  (iii) a function (e.g., the sum) of the outputs of the neurons in layer_M; or

  (iv) another function of the neuron outputs in multiple layers.

  • Determine how the synaptic strengths of all the connections are adjusted during the training of this neural net. This is a key design decision and the subject of a great deal of neural net research and discussion. There are a number of possible ways to do this:

  (i) For each recognition trial, increment or decrement each synaptic strength by a (generally small) fixed amount so that the neural net’s output more closely matches the correct answer. One way to do this is to try both incrementing and decrementing and see which has the more desirable effect. This can be time consuming, so other methods exist for making local decisions on whether to increment or decrement each synaptic strength.

  (ii) Other statistical methods exist for modifying the synaptic strengths after each recognition trial so that the performance of the neural net on that trial more closely matches the correct answer.

  Note that neural net training will work even if the answers to the training trials are not all correct. This allows using real-world training data that may have an inherent error rate. One key to the success of a neural net-based recognition system is the amount of data used for training. Usually a very substantial amount is needed to obtain satisfactory results. Just as with human students, the amount of time that a neural net spends learning its lessons is a key factor in its performance.
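
  To make the above schema concrete, here is a minimal sketch in Python of a small net of all-or-nothing threshold neurons trained by the “try both incrementing and decrementing” method described above. The toy recognition problem, the layer sizes, the step size, and the function names are illustrative choices rather than anything specified in the text, and, as noted above, this greedy scheme can be slow.

    import random

    # Minimal sketch: one layer of threshold neurons feeding a single output
    # neuron (output option (ii) above). Names and parameter values are
    # illustrative.

    def fire(inputs, weights, threshold=0.5):
        # All-or-nothing firing: output 1 if the weighted sum exceeds the threshold.
        return 1 if sum(i * w for i, w in zip(inputs, weights)) > threshold else 0

    def net_output(pattern, hidden_weights, output_weights):
        hidden = [fire(pattern, w) for w in hidden_weights]   # layer_1 outputs
        return fire(hidden, output_weights)                   # single output neuron

    def total_error(trials, hidden_weights, output_weights):
        return sum(abs(net_output(p, hidden_weights, output_weights) - answer)
                   for p, answer in trials)

    def train(trials, hidden_weights, output_weights, step=0.2, passes=50):
        # For each synaptic strength, try incrementing and then decrementing by a
        # small fixed amount, keeping whichever change reduces the net's error
        # (the "try both and see which has the more desirable effect" method).
        for _ in range(passes):
            for weights in hidden_weights + [output_weights]:
                for i in range(len(weights)):
                    before = total_error(trials, hidden_weights, output_weights)
                    for delta in (step, -step):
                        weights[i] += delta
                        if total_error(trials, hidden_weights, output_weights) < before:
                            break            # keep the helpful change
                        weights[i] -= delta  # otherwise undo it and try the other way

    # Toy recognition task: does a 3-bit pattern contain at least two 1s?
    trials = [((a, b, c), 1 if a + b + c >= 2 else 0)
              for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    random.seed(0)
    hidden_weights = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
    output_weights = [random.uniform(-1, 1) for _ in range(4)]
    train(trials, hidden_weights, output_weights)
    print("errors after training:", total_error(trials, hidden_weights, output_weights))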

  Variations

  Many variations of the above are feasible. Some variations include:

  • There are different ways of determining the topology, as described above. In particular, the interneuronal wiring can be set either randomly or using an evolutionary algorithm.

  • There are different ways of setting the initial synaptic strengths, as described above.

  • The inputs to the neurons in layer_i do not necessarily need to come from the outputs of the neurons in layer_i-1. Alternatively, the inputs to the neurons in each layer can come from any lower layer, or indeed from any layer.

  • There are different ways to determine the final output, as described above.

  • For each neuron, the method described above compares the sum of the weighted inputs to the threshold for that neuron. If the threshold is exceeded, the neuron fires and its output is 1; otherwise, its output is 0. This “all or nothing” firing is called a nonlinearity. Other nonlinear functions can be used. Commonly a function is used that goes from 0 to 1 in a rapid but more gradual fashion than all-or-nothing (see the sketch following this list). Also, the outputs can be numbers other than 0 and 1.

  • The different methods for adjusting the synaptic strengths during training, briefly described above, represent a key design decision.

  • The above schema describes a “synchronous” neural net, in which each recognition trial proceeds by computing the outputs of each layer in turn, from layer_0 through layer_M. In a true parallel system, in which each neuron operates independently of the others, the neurons can operate asynchronously. In an asynchronous approach, each neuron constantly scans its inputs and fires (i.e., changes its output from 0 to 1) whenever the sum of its weighted inputs exceeds its threshold (or, alternatively, whenever its other nonlinear output function so indicates).
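
  As a sketch of the “rapid but more gradual” nonlinearity mentioned in the variations above, the logistic (sigmoid) function is one common choice; the steepness parameter and the values below are illustrative assumptions, not something given in the text.

    import math

    # A common smooth alternative to all-or-nothing firing: the logistic
    # (sigmoid) function, which rises gradually from 0 to 1 around the
    # neuron's threshold.

    def sigmoid_fire(inputs, weights, threshold=0.5, steepness=4.0):
        activation = sum(i * w for i, w in zip(inputs, weights))
        return 1.0 / (1.0 + math.exp(-steepness * (activation - threshold)))

    print(sigmoid_fire([1, 1], [0.3, 0.1]))  # slightly below threshold: output below 0.5
    print(sigmoid_fire([1, 1], [0.6, 0.4]))  # well above threshold: output approaches 1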

  Happy Adaptation!

  EVOLUTIONARY ALGORITHMS

  If biologists have ignored self-organization, it is not because self-ordering is not pervasive and profound. It is because we biologists have yet to understand how to think about systems governed simultaneously by two sources of order. Yet who seeing the snowflake, who seeing simple lipid molecules cast adrift in water forming themselves into cell-like hollow lipid vesicles, who seeing the potential for the crystallization of life in swarms of reacting molecules, who seeing the stunning order in networks linking tens upon tens of thousands of variables, can fail to entertain a central thought: if ever we are to attain a final theory in biology, we will surely have to understand the commingling of self-organization and selection. We will have to see that we are the natural expressions of a deeper order. Ultimately, we will discover in our creation myth that we are expected after all.

  —Stuart Kauffman

  As I discussed earlier, an evolutionary algorithm involves a simulated environment in which simulated software “creatures” compete for survival and the right to reproduce. Each software creature represents a possible solution to a problem encoded in its digital “DNA.”

  The creatures allowed to survive and reproduce into the next generation are the ones that do a better job of solving the problem. Evolutionary algorithms are considered to be part of a class of “emergent” methods because the solutions emerge gradually and usually cannot be predicted by the designers of the system. Evolutionary algorithms are particularly powerful when they are combined with our other paradigms. Here is a unique way of combining all of our “intelligent” paradigms.

  Combining All Three Paradigms

  The human genome contains three billion rungs of base pairs, which equals six billion bits of data. With a little data compression, your genetic code will fit on a single CD-ROM. You can store your whole family on a DVD (digital video disc). But your brain has 100 trillion “wires,” which would require about 3,000 trillion bits to represent. How did the mere 12 billion bits of data in your chromosomes (with contemporary estimates indicating that only 3 percent of that is active) designate the wiring of your brain, which constitutes about a quarter million times more information?
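
  The “quarter million times” factor follows directly from the figures just quoted; a quick check, taking the text’s own numbers at face value:

    # Quick check of the ratio quoted above, using the figures as stated in the text.
    genome_bits = 12e9             # bits of data in the chromosomes
    brain_wiring_bits = 3000e12    # bits needed to represent ~100 trillion connections
    print(brain_wiring_bits / genome_bits)   # 250000.0, i.e., about a quarter million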

  Obviously the genetic code does not specify the exact wiring. I said earlier that we can wire a neural net randomly and obtain satisfactory results. That’s true, but there is a better way to do it, and that is to use evolution. I am not referring to the billions of years of evolution that produced the human brain. I am referring to the months of evolution that go on during gestation and early childhood. Early in our lives, our interneuronal connections are engaged in a fight for survival. Those that make better sense of the world survive. By late childhood, these connections become relatively fixed, which is why it is worthwhile exposing babies and young children to a stimulating environment. Otherwise, this evolutionary process runs out of real-world chaos from which to draw inspiration.

  We can do the same thing with our synthetic neural nets: use an evolutionary algorithm to determine the optimal wiring. This is exactly what the Kyoto Advanced Telecommunications Research Lab’s ambitious brain-building project is doing.

  Now here’s how you can intelligently solve a challenging problem using all three paradigms. First, carefully state your problem. This is actually the hardest step; most people try to solve problems without bothering to understand what the problem really is. Next, analyze the logical contours of your problem recursively by searching through as many combinations of elements (for example, moves in a game, steps in a solution) as you and your computer have the patience to sort through. Evaluate the terminal leaves of this recursive expansion of possible solutions with a neural net. Determine the optimal topology of that neural net using an evolutionary algorithm. And if all of this doesn’t work, then you have a difficult problem indeed.
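
  Here is a runnable toy sketch of how the three paradigms slot together. The “game,” the evaluator, and the evolutionary step are deliberately trivial stand-ins with hypothetical names; the point is only to show where the recursive search, the neural-net scoring of terminal leaves, and the evolutionary tuning of the evaluator each plug in.

    import random

    def legal_moves(position):
        return [1, 2]                        # toy "game": add 1 or 2 to a running total

    def evaluator_score(position, weights):
        # Stand-in for a neural-net evaluation of a terminal leaf.
        return weights[0] * position + weights[1] * (position % 3)

    def evolve_evaluator(generations=20, population=10):
        # Stand-in for evolving the evaluator (a real system would evolve the
        # net's topology); fitness here is just closeness of one score to 1.0.
        creatures = [[random.uniform(-1, 1), random.uniform(-1, 1)]
                     for _ in range(population)]
        for _ in range(generations):
            creatures.sort(key=lambda w: abs(evaluator_score(10, w) - 1.0))
            survivors = creatures[:population // 2]
            creatures = survivors + [[g + random.gauss(0, 0.1)
                                      for g in random.choice(survivors)]
                                     for _ in range(population - len(survivors))]
        creatures.sort(key=lambda w: abs(evaluator_score(10, w) - 1.0))
        return creatures[0]

    def choose_move(position, depth, weights):
        # Recursive expansion of possible move sequences, scored at the leaves.
        if depth == 0:
            return evaluator_score(position, weights), None
        best_score, best_move = float("-inf"), None
        for move in legal_moves(position):
            score, _ = choose_move(position + move, depth - 1, weights)
            if score > best_score:
                best_score, best_move = score, move
        return best_score, best_move

    random.seed(0)
    weights = evolve_evaluator()                    # paradigm 3: evolve the evaluator
    print(choose_move(0, depth=3, weights=weights)) # paradigms 1 and 2: search + evaluate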

  “PSEUDO CODE” FOR THE EVOLUTIONARY ALGORITHM

  Here is the basic schema for an evolutionary algorithm. Many variations are possible, and the designer of the system needs to provide certain critical parameters and methods, detailed below.

  THE EVOLUTIONARY ALGORITHM

  Create N solution “creatures.” Each one has:

  • A genetic code: a sequence of numbers that characterizes a possible solution to the problem. The numbers can represent critical parameters, steps to a solution, rules, and so on.

  For each generation of evolution, do the following:

  • Do the following for each of the N solution creatures:

  (i) Apply this solution creature’s solution (as represented by its genetic code) to the problem, or simulated environment.

  (ii) Rate the solution.

  • Pick the L solution creatures with the highest ratings to survive into the next generation.

  • Eliminate the (N-L) nonsurviving solution creatures.

  • Create (N-L) new solution creatures from the L surviving solution creatures by:

  (i) making copies of the L surviving creatures, while introducing small random variations (mutations) into each copy; or

  (ii) creating additional solution creatures by combining parts of the genetic code (using “sexual” reproduction, or otherwise combining portions of the chromosomes) from the L surviving creatures; or

  (iii) doing a combination of (i) and (ii) above.

  • Determine whether or not to continue evolving:

  Improvement = (highest rating in this generation) - (highest rating in the previous generation). If Improvement < Improvement Threshold, then we’re done.

  • The solution creature with the highest rating from the last generation of evolution has the best solution. Apply the solution defined by its genetic code to the problem.
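
  Here is a minimal runnable sketch of this schema in Python, applied to a toy problem: evolving a genetic code of eight numbers whose sum is as close to 20 as possible. N, L, the mutation size, the Improvement Threshold, and the rating function are illustrative choices, not values taken from the text.

    import random

    N, L = 40, 10                 # population size and number of survivors per generation
    GENES = 8
    IMPROVEMENT_THRESHOLD = 1e-6

    def rate(creature):
        # Rate the solution: higher is better (closeness of the gene sum to 20).
        return -abs(sum(creature) - 20.0)

    def reproduce(survivors):
        # Create (N - L) new creatures: copy a survivor with small random variation (i),
        # or combine parts of two survivors' genetic codes (ii).
        children = []
        while len(children) < N - L:
            if random.random() < 0.5:
                parent = random.choice(survivors)
                children.append([g + random.gauss(0, 0.1) for g in parent])
            else:
                a, b = random.sample(survivors, 2)
                cut = random.randrange(1, GENES)
                children.append(a[:cut] + b[cut:])
        return children

    random.seed(1)
    population = [[random.uniform(0, 5) for _ in range(GENES)] for _ in range(N)]
    previous_best = float("-inf")
    while True:
        population.sort(key=rate, reverse=True)        # rate every creature
        improvement = rate(population[0]) - previous_best
        if improvement < IMPROVEMENT_THRESHOLD:        # Improvement < threshold: done
            break
        previous_best = rate(population[0])
        survivors = population[:L]                     # the L highest-rated survive
        population = survivors + reproduce(survivors)  # the rest are eliminated and replaced

    print("best rating:", rate(population[0]))
    print("best genetic code:", [round(g, 2) for g in population[0]])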

  Key Design Decisions

  In the simple schema above, the designer of this evolutionary algorithm needs to determine at the outset:

  • Key parameters:

  N

  L

  Improvement Threshold

  • What the numbers in the genetic code represent and how the solution is computed from the genetic code.

  • A method for determining the N solution creatures in the first generation. In general, these need only be “reasonable” attempts at a solution. If these first-generation solutions are too far afield, the evolutionary algorithm may have difficulty converging on a good solution. It is often worthwhile to create the initial solution creatures in such a way that they are reasonably diverse. This will help prevent the evolutionary process from just finding a “locally” optimal solution.

  • How the solutions are rated.

  • How the surviving solution creatures reproduce.

  Variations

  Many variations of the above are feasible. Some variations include:

  • There does not need to be a fixed number of surviving solution creatures (i.e., “L”) from each generation. The survival rule(s) can allow for a variable number of survivors.

  • There does not need to be a fixed number of new solution creatures created in each generation (i.e., N - L). The procreation rules can be independent of the size of the population. Procreation can also be related to survival, thereby allowing the fittest solution creatures to procreate the most (see the sketch following this list).

  • The decision as to whether or not to continue evolving can be varied. It can consider more than just the highest-rated solution creature from the most recent generation(s). It can also consider a trend that goes beyond just the last two generations.
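
  One standard way to tie procreation to fitness, as the variation above suggests, is “roulette wheel” (fitness-proportionate) selection. A small sketch, assuming the ratings have already been shifted so that they are all non-negative; the names and numbers are illustrative.

    import random

    def pick_parent(population, ratings):
        # Each creature's chance of being chosen is proportional to its rating,
        # so the fittest solution creatures procreate the most.
        total = sum(ratings)
        if total == 0:
            return random.choice(population)   # all ratings zero: pick uniformly
        spin = random.uniform(0, total)
        running = 0.0
        for creature, rating in zip(population, ratings):
            running += rating
            if spin <= running:
                return creature
        return population[-1]                  # guard against floating-point drift

    random.seed(0)
    population = ["A", "B", "C"]
    ratings = [1.0, 3.0, 6.0]                  # "C" should be picked about 60% of the time
    picks = [pick_parent(population, ratings) for _ in range(10000)]
    print({c: picks.count(c) for c in population})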

  Happy Evolving!

  GLOSSARY

  Aaron A computerized robot (and associated software), designed by Harold Cohen, that creates original drawings and paintings.

  Alexander’s solution A term referring to Alexander the Great’s slicing of the Gordian knot with his sword. A reference to solving a seemingly insoluble problem by decisive yet unexpected and indirect means.

  Algorithm A sequence of rules and instructions that describes a procedure to solve a problem. A computer program expresses one or more algorithms in a manner understandable by a computer.

  Alu A meaningless sequence of 300 nucleotide letters that occurs 300,000 times in the human genome.

  Analog A quantity that is continuously varying, as opposed to varying in discrete steps. Most phenomena in the natural world are analog. When we measure and give them a numeric value, we digitize them. The human brain uses both digital and analog computation.

  Analytical Engine The first programmable computer, created in the 1840s by Charles Babbage and Ada Lovelace. The Analytical Engine had a random access memory (RAM) consisting of one thousand words of fifty decimal digits each, a central processing unit, a special storage unit for software, and a printer. Although it foreshadowed modern computers, Babbage’s invention never worked.

  Angel Capital Refers to funds available for investment by networks of wealthy investors who invest in start-up companies. A key source of capital for high-tech start-up companies in the United States.

  Artificial intelligence (AI) The field of research that attempts to emulate human intelligence in a machine. Fields within AI include knowledge-based systems, expert systems, pattern recognition, automatic learning, natural-language understanding, robotics, and others.

  Artificial life Simulated organisms, each including a set of behavior and reproduction rules (a simulated “genetic code”), and a simulated environment. The simulated organisms simulate multiple generations of evolution. The term can refer to any self-replicating pattern.

  ASR See Automatic speech recognition.

  Automatic speech recognition (ASR) Software that recognizes human speech. In general, ASR systems include the ability to extract high-level patterns in speech data.

  BGM See Brain-generated music.

  Big bang theory A prominent theory on the beginning of the Universe: the cosmic explosion, from a single point of infinite density, that marked the beginning of the Universe billions of years ago.

  Big crunch A theory that the Universe will eventually lose momentum in expanding and contract and collapse in an event that is the opposite of the big bang.

  Bioengineering The field of designing pharmaceutical drugs and strains of plant and animal life by directly modifying the genetic code. Bioengineered materials, drugs, and life-forms are used in agriculture, medicine, and the treatment of disease.

  Biology The study of life-forms. In evolutionary terms, the emergence of patterns of matter and energy that could survive and replicate to form future generations.

  Bionic organ In 2029, an artificial organ built using nanoengineering.

  Biowarfare Agency (BWA) In the second decade of the twenty-first century, a government agency that monitors and polices bioengineering technology applied to weapons.

  Bit A contraction of the phrase “binary digit.” In a binary code, one of two possible values, usually zero and one. In information theory, the fundamental unit of information.

  Brain-generated music (BGM) A music technology pioneered by NeuroSonics, Inc., that creates music in response to the listener’s brain waves. This brain-wave biofeedback system appears to evoke the Relaxation Response by encouraging the generation of alpha waves in the brain.

  BRUTUS.1 A computer program that creates fictional stories with a theme of betrayal; invented by Selmer Bringsjord, Dave Ferucci, and a team of software engineers at Rensselaer Polytechnic Institute in New York.

  Buckyball A soccer-ball-shaped molecule formed of a large number of carbon atoms. Because of their hexagonal and pentagonal shapes, the molecules were dubbed “buckyballs” in reference to R. Buckminster Fuller’s building designs.

  Busy beaver One example of a class of noncomputable functions; an unsolvable problem in mathematics. Being a “Turing machine unsolvable problem,” the busy beaver function cannot be computed by a Turing machine. To compute busy beaver of n, one considers all the n-state Turing machines that do not write an infinite number of 1s on their tape. The largest number of 1s written by any Turing machine in this set is busy beaver of n.

  BWA See Biowarfare Agency

  Byte A contraction for “by eight.” A group of eight bits clustered together to store one unit of information on a computer. A byte may correspond, for example, to a letter of the English alphabet.

  CD-ROM See Compact disc read-only memory.

  Chaos The amount of disorder or unpredictable behavior in a system. In reference to the Law of Time and Chaos, chaos refers to the quantity of random and unpredictable events that are relevant to a process.

  Chaos theory The study of patterns and emergent behavior in complex systems composed of many unpredictable elements (e.g., the weather).

 
