**Overview**

Watson is the product of IBM’s DeepQA project, whose goal is to develop a computer system which excels at the classic AI problem of question answering (QA). As with any complicated engineering system, one of the best ways to learn how it works is with a high-level diagram:

One striking attribute of this system, clear from the diagram, is that Watson does a 2-pass search: first a broad “primary search”, and then a more detailed “evidence scoring” search. I’ll talk more about these below. The other feature which stands out is a parallel path (the gray boxes) for breaking up a single question into multiple ones, based on the question having multiple parts and various potential meanings due to ambiguities in the language. Watson evaluates all of the resulting questions in parallel and compares the results. Let’s go into more detail on each of the steps.

**Content Acquisition**

Before the first question can even be answered, Watson must acquire its knowledge. This is a non-real-time process which is done before, and independently of, the actual question-answering process. The resulting knowledge base contains many sources of structured (e.g. a database or triplestore) and unstructured (e.g. text passages) data.

. . . the sources for Watson include a wide range of encyclopedias, dictionaries, thesauri, newswire articles, literary works, . . . dbPedia, WordNet (Miller 1995), and the Yago ontology. (Ferrucci 2010, p. 69)

Watson will access this content during the real-time answering process within the **Hypothesis Generation** and **Hypothesis and Evidence Scoring** processes which are described below.

**Question Analysis**

When a human is asked a question, deriving its semantic meaning (what the question is actually asking) is usually the easy part, but for AI, understanding even simple questions is a formidable task. Question analysis is where Watson determines how many parts or meanings are embedded in the question. Later in the process, Watson will analyze the different questions separately (the gray boxes in the above diagram) and compare the results. In this stage, Watson also determines the expected lexical answer type (LAT).

. . . a lexical answer type is a word or noun phrase in the question that specifies the type of the answer without any attempt to understand its semantics. Determining whether or not a candidate answer can be considered an instance of the LAT is an important kind of scoring and a common source of critical errors. (Ferrucci 2010, p. 70)

**Hypothesis Generation**

The next step of the process is to use the data from the question analysis step to generate candidate answers (hypotheses). Watson does this by executing a primary search across its answer sources (described above), a process known as information retrieval, of which Internet search engines are a common example.

A variety of search techniques are used, including the use of multiple text search engines with different underlying approaches (for example, Indri and Lucene), document search as well as passage search, knowledge base search using SPARQL . . . . (Ferrucci 2010, p. 71)

These results are used to generate candidate answers, which will later be scored and compared. These hypotheses are created in various ways depending on the type of data they came from. For example, an answer from a text passage (unstructured data) might come from named entity recognition techniques, while an answer from a knowledge base (structured data) might be the exact result of the query.

If the correct answer(s) are not generated at this stage as a candidate, the system has no hope of answering the question. This step therefore significantly favors recall over precision, with the expectation that the rest of the processing pipeline will tease out the correct answer, even if the set of candidates is quite large. (Ferrucci 2010, p.72)

Watson is configured to return about 250 candidate answers during this stage, which are then trimmed down to about 100 after **soft filtering**. Soft filters are simpler and cheaper than the intensive evidence scoring which follows. A basic soft filter estimates the likelihood that a candidate answer is of the correct LAT, which was determined in the question analysis stage.
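As a rough sketch of the soft-filtering idea above (not Watson’s actual implementation), the following assumes each candidate already carries a cheap, hypothetical `lat_score` estimating how well it matches the LAT. The `Candidate` class, the scores, and the `soft_filter` function are all made up for illustration; only the sizes (250 in, 100 out) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    lat_score: float  # hypothetical cheap estimate that the answer matches the LAT


def soft_filter(candidates, keep=100):
    """Keep the `keep` candidates most likely to match the LAT."""
    ranked = sorted(candidates, key=lambda c: c.lat_score, reverse=True)
    return ranked[:keep]


# Toy data: 250 candidates with made-up scores, mirroring the sizes in the text.
candidates = [Candidate(f"answer-{i}", lat_score=i / 250) for i in range(250)]
survivors = soft_filter(candidates, keep=100)
print(len(survivors), survivors[0].answer)
```

The point of the design is that the filter only needs a sort on a cheap score, so the expensive evidence scoring runs on far fewer candidates.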

**Hypothesis and Evidence Scoring**

In this step, Watson performs new searches which look for evidence of the generated hypotheses being the correct answer. For example:

One particularly effective technique is passage search where the candidate answer is added as a required term to the primary search query derived from the question. This will retrieve passages that contain the candidate answer used in the context of the original question terms. (Ferrucci 2010, p. 72)

The candidate answers and their corresponding evidence are then sent to various scorers. Watson uses about 50 different scorers, which work in parallel; their results are compared at the end.

These scorers consider things like the degree of match between a passage’s predicate-argument structure and the question, passage source reliability, geospatial location, temporal relationships, taxonomic classification, the lexical and semantic relations the candidate is known to participate in, the candidate’s correlation with question terms, its popularity (or obscurity), its aliases, and so on. (Ferrucci 2010, p. 72)
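To make the idea of many independent scorers concrete, here is a minimal sketch with two toy scorers. Both scorers, their names, and the example strings are assumptions for illustration; the real scorers listed in the quote are far more sophisticated, and this sketch runs them one after another rather than in parallel.

```python
def term_overlap(question, passage, candidate):
    """Fraction of question words that also appear in the passage."""
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def candidate_in_passage(question, passage, candidate):
    """1.0 if the candidate string occurs in the passage, else 0.0."""
    return 1.0 if candidate.lower() in passage.lower() else 0.0


# Every scorer shares one signature, so new ones can be added to the list.
SCORERS = [term_overlap, candidate_in_passage]


def score_evidence(question, passage, candidate):
    """Run every scorer and return one feature value per scorer name."""
    return {s.__name__: s(question, passage, candidate) for s in SCORERS}


features = score_evidence(
    "who wrote hamlet",
    "Hamlet was written by William Shakespeare",
    "William Shakespeare",
)
print(features)
```

Because each scorer is independent of the others, this shape is easy to spread across many workers, which is presumably what makes the parallel design attractive.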

**Final Merging and Ranking**

This final processing stage first merges answers which are deemed equivalent by using basic matching techniques and more complicated coreference resolution algorithms. Scores from equivalent answers are then combined. Finally, Watson uses machine learning techniques to assign a confidence level to each of the merged answers, indicating how likely it is to be correct. Depending on the confidence level, the highest ranking hypothesis is presented as the answer.
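The merge-then-rank flow can be sketched as follows. This is only an illustration under loud assumptions: `normalize` stands in for real equivalence matching and coreference resolution, summing is one arbitrary way to combine scores, and the logistic function is a placeholder for a trained confidence model.

```python
import math
from collections import defaultdict


def normalize(answer):
    """Crude stand-in for equivalence matching: case and whitespace only."""
    return " ".join(answer.lower().split())


def merge_and_rank(scored_answers, threshold=0.5):
    """Merge equivalent answers, combine scores, and rank by confidence.

    The logistic squashing below is a placeholder for a trained model.
    """
    combined = defaultdict(float)
    for answer, score in scored_answers:
        combined[normalize(answer)] += score
    ranked = sorted(
        ((a, 1 / (1 + math.exp(-s))) for a, s in combined.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    if not ranked:
        return None, []
    best, confidence = ranked[0]
    # Only present an answer when the confidence clears the threshold.
    return (best if confidence >= threshold else None), ranked


best, ranked = merge_and_rank(
    [("Shakespeare", 1.2), ("shakespeare ", 0.9), ("Marlowe", 0.4)]
)
print(best, ranked)
```

Here the two spellings of the same answer are merged before ranking, so their evidence adds up instead of competing.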

**Final Thoughts**

There is much debate over the significance of this achievement, even within the AI community. I personally feel it is very significant that a task we once thought reserved for human thought has now been shown to be possible for machines. It’s easy to forget that before Deep Blue defeated Kasparov in 1997, that task was also thought to be impossible, and now chess programs running on cell phones can beat grandmasters. Before we know it, deep QA software will be running on cell phones as well.

This is also setting the stage for what is considered the hallmark test of AI, the Turing Test. Again this task still seems impossible, but perhaps a bit less so after Watson. After the Turing Test is passed, people will debate on whether it is “true” intelligence, but if you cannot measure the difference between computer intelligence and human intelligence, it doesn’t seem to be a very valuable distinction.


**The Standard Model**

The best description we have of all the elementary particles is called the Standard Model. It’s very technical, and not intuitive at all (to me at least). It’s pretty amazing that it describes the way Nature works with such accuracy. The model is a gauge theory, and before we get to the Higgs, we have to understand what that means.

Wikipedia tells us:

In physics, a gauge theory is a type of field theory in which the Lagrangian is invariant under a continuous group of local transformations.

Let’s define each of the important terms:

- **field theory** – A theory which assigns a value to every point in space and time.
- **Lagrangian** – Defines the complete dynamics of a physical system. The Lagrangian may have terms which depend on mass; for example, a standard Lagrangian for a classical object is $L = \frac{1}{2} m v^2 - V(x)$, where $m$ is the mass of the object, $v$ is its velocity and $V(x)$ is its potential energy.
- **invariant** – Independent of. If Y is invariant under X, then the value of Y will not change if X changes (and all other inputs remain the same).
- **continuous group** – We’ve discussed groups on this blog before because they’re so fundamental, but if you need a refresher, a group is an algebraic object which has a set of elements and is closed under a product operation. Each element has an inverse so that the product of an element with its inverse is the identity element. A continuous group is one where the product is a continuous function: the group has infinitely many elements and no discontinuities.
- **local transformations** – These are the elements of the above group. These types of groups are called Lie groups. An example is the rotations of the sphere.

Putting it all together, a gauge theory defines a system whose global state is unchanged under specific local changes.

**Symmetry Group**

These “local changes” are the elements of symmetry groups. An example of such a group is the circle group, called U(1), which consists of the rotations of a circle about its center. The rotations have a product (adding angles together), an identity (the rotation of 0 degrees), and an inverse (a rotation in the opposite direction). It’s called U(1) because this group of transformations is represented by the set of all unitary matrices of dimension 1.
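The group properties just listed can be checked numerically. In this small sketch a rotation by an angle is represented as a unit complex number, which is the same thing as a 1×1 unitary matrix; the function name `rotation` and the chosen angles are just for illustration.

```python
import cmath
import math


def rotation(theta):
    """Element of U(1): a unit complex number, i.e. a 1x1 unitary matrix."""
    return cmath.exp(1j * theta)


a, b = rotation(math.pi / 3), rotation(math.pi / 6)

product = a * b            # composing rotations adds the angles
identity = rotation(0.0)   # rotation by 0 degrees
inverse = a.conjugate()    # rotation in the opposite direction

print(cmath.isclose(product, rotation(math.pi / 2)))  # 60 + 30 = 90 degrees
print(cmath.isclose(a * identity, a))
print(cmath.isclose(a * inverse, identity))
print(math.isclose(abs(product), 1.0))                # unitarity: |z| = 1
```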

The Standard Model has 3 such symmetry groups: U(1), SU(2), and SU(3). They represent 3 of the fundamental forces of nature: electromagnetic, weak nuclear, and strong nuclear respectively. SU stands for special unitary, and SU(n) is the group of n × n unitary matrices with determinant 1.

Somewhat bizarrely, the Standard Model says that the generators of these symmetry groups actually represent particles! For U(1) there is 1 generator, which is the photon. SU(2) has 3 generators, which are the Z, W+, and W- particles. SU(3) has 8 generators, which are the 8 different gluons.
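The generator counts above follow from a standard dimension formula: U(n) has n² generators and SU(n) has n² − 1. A tiny sketch (the function names are my own) reproduces the numbers in the paragraph:

```python
def generators_u(n):
    """Number of generators of U(n)."""
    return n * n


def generators_su(n):
    """Number of generators of SU(n): one fewer, due to the determinant-1 condition."""
    return n * n - 1


counts = {
    "U(1)": generators_u(1),    # photon
    "SU(2)": generators_su(2),  # Z, W+, W-
    "SU(3)": generators_su(3),  # gluons
}
print(counts)
```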

**Symmetry Breaking**

The Lagrangians described by these symmetry groups describe the complete dynamics of these quantum systems; however, they do not allow any of these particles to have mass. While this is fine for the photon and the gluons, the Z, W+, and W- must have mass to explain their observed behavior: namely, their very limited range.

Incredibly, one can add mass to the Lagrangian for the weak force, if the SU(2) symmetry is broken. As a first example, let’s see what it means to break U(1) symmetry. If someone hands you a perfect circle, it is impossible to differentiate any point on the circle from any other. However, if one breaks the symmetry by marking a point, then all points can be differentiated by describing how far they are from the marked point.

The Higgs field breaks the symmetry of SU(2) and gives us mass terms in the Lagrangian. This field can be represented by a Mexican hat potential, at the very center of which is a local maximum which represents an unstable equilibrium. The stable states lie in a circle around this local maximum, at the minimum of the potential.
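A quick numerical sketch shows the shape described above. It assumes the usual textbook form of the potential, V(φ) = −μ²|φ|² + λ|φ|⁴, which is not spelled out in this post; the values of `mu` and `lam` are illustrative, not physical.

```python
import math


def potential(phi, mu=1.0, lam=1.0):
    """Mexican hat potential V = -mu^2 |phi|^2 + lam |phi|^4 for complex phi."""
    r2 = abs(phi) ** 2
    return -mu**2 * r2 + lam * r2**2


mu, lam = 1.0, 1.0               # illustrative values, not physical ones
r_min = mu / math.sqrt(2 * lam)  # radius of the circle of minima

center = potential(0j, mu, lam)
on_circle = [
    potential(r_min * complex(math.cos(t), math.sin(t)), mu, lam)
    for t in (0.0, 1.0, 2.0, 3.0)
]
print(center, on_circle)
```

Every point on the circle has the same, lower energy than the center. Picking one of them is the analogue of marking a point on the perfect circle: the potential is symmetric, but the state the field settles into is not.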

Fields in Quantum Field Theory have an associated particle, and the particle representing this Higgs field is our Higgs boson.

It is the only particle predicted by the Standard Model which has not been discovered, but the LHC hopes to change that. What’s interesting is that there are other ways to get mass to appear in the Lagrangians. The Higgs field is the simplest and most intuitive; however, it is possible that there is no Higgs particle and that mass is derived in some other manner. That would certainly shake up the field of physics!

**UPDATE (7/22/2012):** The LHC found the Higgs and it appears to be just as the Standard Model predicted!


Well the answer to these riddles lies in a surprisingly simple statement known as the 2nd law of thermodynamics:

In a closed system, heat flows from hot to cold.

Yup, that’s it. This statement is pretty clear in everyday life: if you drop an ice cube in hot water, the ice cube will melt and cool the water. What we never experience is heat flowing out of the ice cube and into the water, which would make the ice cube grow and the rest of the water get hotter. Of course, adding work to a system can cause the 2nd law to locally go in reverse. For example, a freezer causes an ice cube to grow, but it does so by heating the air outside the freezer. The system as a whole still heats up.

Here’s a chart of some equivalent ways to describe the 2nd law in other domains:

Domain | Statement of the 2nd law | Example “special” state | Example equilibrium state |
---|---|---|---|
Heat | Heat flows from hot to cold | Ice cube in hot water | Water all the same temperature |
Entropy | Disorder increases | A person | Ashes |
Information | Information is lost | Beethoven’s 5th Symphony | Static noise |

The first thing to notice is that these “special” states can be considered start states, while the equilibrium states can be considered end states. That is, if somehow the system is put into a special state, it will, after a series of random and natural processes, end in the equilibrium state.

So why does the universe work this way, and what does this have to do with the flow of time? Let’s look at the ice cube in hot water example in more detail because it is the most tangible of the above examples, but the other examples can be explained in a similar fashion.

**Why heat flows from hot to cold**

As heat flows from hot to cold, the system will eventually reach an equilibrium, where everything is the same temperature. At that point there is no more heat flow. Dropping an ice cube in a cup of warm water will eventually lead to a cup of water which is all the same temperature, somewhere in between the 2 starting temperatures.

If you think about all of the water molecules in the cup (let’s say for this example there are 2 billion molecules: 1 billion hot and 1 billion cold), we can count all the ways in which these molecules can be distributed. The total number of possible states in this system is astronomically large. These state spaces are so large it’s impossible to even begin to imagine their enormity. They make other “large” numbers like the number of elementary particles in the universe (roughly $10^{80}$) and 1 googol ($10^{100}$) seem minuscule in comparison. The vast majority of these states are equilibrium states, while the special starting states are insignificant.

If you pick a state at random from the total space of water molecules in a cup, the probability of landing on a state where some of the molecules are ice and some are hot is effectively 0. (For a quick calculation, try a binomial coefficient with something like Binomial(2 billion, 2 million).)
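To get a feel for these numbers, here is a sketch that counts states in a deliberately simplified toy model of my own choosing: a “state” is just the choice of which 1 billion of the 2 billion molecule positions hold the hot molecules, so there are C(2 billion, 1 billion) states and a perfectly separated arrangement is exactly one of them. The number is far too large to compute directly, so the code works with its base-10 logarithm via the log-gamma function.

```python
import math


def log10_binomial(n, k):
    """log10 of the binomial coefficient C(n, k), via log-gamma to avoid huge ints."""
    ln = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return ln / math.log(10)


n = 2_000_000_000  # molecules in the cup (from the example above)
k = n // 2         # hot molecules

digits = log10_binomial(n, k)
print(f"log10 C(n, k) is about {digits:.4g}")
```

In this toy model the count of states has roughly 600 million decimal digits, compared with about 80 digits for the number of particles in the universe, so the chance of hitting one specific separated arrangement at random is effectively 0.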

**Maxwell’s Demon**

Many physicists thought the 2nd law was purely statistical and could easily be violated by intelligent manipulation of states. One example of this is called Maxwell’s demon, where a theoretical “demon” reverses the flow of heat without doing any work. However, physicists have shown that no matter how hard you try, the system as a whole cannot violate the 2nd law. For example, Maxwell’s demon must either do work, which will cause heat in another part of the system, or lose information, which we learned is equivalent to the 2nd law. So the 2nd law survives even with small state spaces; however, the huge number of states in everyday molecular interactions is what makes time so indomitable.

**Time**

So the answer to why heat flows from hot to cold is essentially a question of probability. The equilibrium states dominate the special states, so unless you start in one of the special states you will never reach it. The same can be said about our universe. The Big Bang is the ultimate start state and Heat Death is the ultimate end state. The Big Bang represents time = 0 while Heat Death represents the end of time.

The flow of time from past to future can therefore be defined as special states of a system transitioning to their equilibrium. The sheer magnitude of the state space is what makes time so dominant in the Universe.


Every simply connected, closed 3-manifold is homeomorphic to the 3-sphere.

This is a statement about topological spaces. Let’s define each of the terms in the conjecture:

simply connected space – This means the space has no “holes.” A football is simply connected, but a donut is not. Technically we can say $\pi_1(X) = 0$ for such a space $X$, but I’ll explain this notation further on.

closed space – The space is finite and has no boundaries. A sphere (more technically a 2-sphere, or $S^2$) is closed, but the plane ($\mathbb{R}^2$) is not because it is infinite. A disk is also not closed because, even though it is finite, it has a boundary.

manifold – At every small neighborhood on the space, it approximates Euclidean space. A standard sphere is called a 2-sphere because it is actually a 2-manifold. Its surface resembles the 2d plane if you zoom into it so that the curvature approaches 0. Continuing this logic the 1-sphere is a circle. A 3-sphere is very difficult to visualize because it has a 3d surface and exists in 4d space.

homeomorphic – If one space is homeomorphic to another, it means you can continuously deform the one space into the other. The 2-sphere and a football are homeomorphic. The 2-sphere and a donut are not; no matter how much you deform a sphere, you can’t get that pesky hole in the donut, and vice-versa.

What the conjecture is basically saying then is this:

Any finite 3-dimensional space, which doesn’t have any “holes” in it, can be continuously deformed into a 3-sphere.

This was proven for all dimensions > 3, but 3-manifolds proved to be the trickiest.

Now, how do you show that 2 spaces are homeomorphic to each other? One key tool is the fundamental group: homeomorphic spaces always share the same fundamental group. Hopefully you remember what a group is, but basically it’s a collection of elements which includes an identity element, and a product. Also each element has an inverse. The fundamental group basically describes how many ways you can draw a path in a space. It is organized as follows: each element is a path which starts and ends at the same point. The identity element of the fundamental group is the trivial path of length 0, i.e. a single point. The product in the fundamental group is just combining 2 paths end-to-end. Since all paths start and end at the same point you can always take their product and get a new element in the group.

The fundamental group of the n-sphere (for n ≥ 2) is equivalent to the group with one element (the identity element), because any path can always “shrink” down into a single point due to the fact that the n-sphere has no hole. If a space’s fundamental group is trivial, then that means it is simply connected, as described above. To say the fundamental group of the n-sphere is trivial you use the following notation:

$\pi_1(S^n) = 0$

Here $\pi_1$ means this group contains elements of 1-dimensional closed paths. A little bit off topic but even more fascinating is that you can also examine higher dimensional paths, and these non-fundamental groups develop an interesting and complex structure.

Anyway, the Poincaré conjecture says that for any closed 3-manifold $M$, if $\pi_1(M) = 0$, then $M$ can be continuously deformed into $S^3$.

Perelman finally solved this for 3-manifolds by using Ricci flow techniques to show that these manifolds are homeomorphic to the 3-sphere.

**Update:** 6 years after his proof of the Poincaré conjecture and 5 days after this post, the Clay Mathematics Institute officially awarded the Millennium Prize to Grigori Perelman. If he declines, this could be the first time in modern history where someone living at home refuses $1 million.


First Gorton explains that basically all banking crises work the same way: too many investors want to withdraw their deposits from banks which cannot cover all of the withdrawals. This type of crisis is probably as old as banks themselves. Banks lend out a certain percentage of deposits, so if there is a panic, not everyone can get their money back. This financial crisis is similar, but less tangible, because it involves an unregulated banking system used by institutional investors and not average individuals.
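The basic mechanics of a run can be sketched in a few lines. This is a toy model, not anything from Gorton’s paper: the function name, the 10% reserve ratio, and the withdrawal fractions are all made-up illustrations.

```python
def run_shortfall(deposits, reserve_ratio, withdrawal_fraction):
    """Amount depositors demand beyond what the bank kept in reserve."""
    reserves = deposits * reserve_ratio
    demanded = deposits * withdrawal_fraction
    return max(0.0, demanded - reserves)


# A bank holding 10% of deposits in reserve (illustrative numbers only).
calm = run_shortfall(deposits=100.0, reserve_ratio=0.1, withdrawal_fraction=0.05)
panic = run_shortfall(deposits=100.0, reserve_ratio=0.1, withdrawal_fraction=0.5)
print(calm, panic)
```

On a normal day withdrawals stay below reserves and nothing happens; in a panic the demand exceeds reserves and the bank cannot pay everyone.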

The Repo market has been growing steadily for the past 30 years. It is basically an unregulated market where institutions can deposit and borrow money outside of the normal banking system. Because there is no FDIC for this market, deposits are backed by collateral, which is how most US banks worked pre-1930. This collateral consists of longer-term, secure investments such as bonds and other securities.

Why is there demand for such a market? As Gorton explains it, imagine you are an institutional investor and you need a secure short-term (overnight) interest-gaining deposit for $100 million. You cannot deposit this at the local bank, since the FDIC will only cover $100k, so you must go to another banking system. Now imagine you are a bank with mortgages trickling in cash. You can package these mortgages and sell them to add immediate cash to your bottom line. These securities can then be used as collateral for a Repo deposit. Both parties have benefited from the Repo market and it has been growing accordingly, but because it is so unregulated, no one really knows how large it became! Estimates are at around $12T, which means there should be at least that much in collateral for it to function properly. To give you an example of how large that is, the entire US GDP for 2009 was $14.2T.

This is where the subprime mortgages come in. While subprimes are a small percentage of the total mortgage-security market ($1.2T out of $20T total), no one knew which securities were contaminated by them. Gorton compares this to an E. coli scare where all ground beef is shunned even though only a small percentage is actually contaminated. The Repo market became the target of a panic because the collateral used for deposits became suspect. There was a run on Repo banks which could not possibly honor all of the withdrawals (he estimates a deficit of $2T). The US Government was forced to step-in and bail out some of the crippled banks while many others simply failed.
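Putting the figures quoted above side by side makes the scale clearer. The dollar amounts below are the ones from this post; the three ratios are simply derived from them and are not figures taken from Gorton’s paper.

```python
# All amounts in trillions of dollars, as quoted in the text.
subprime_t = 1.2     # subprime mortgage securities
mortgage_t = 20.0    # total mortgage-security market
repo_t = 12.0        # estimated size of the Repo market
gdp_2009_t = 14.2    # US GDP in 2009
shortfall_t = 2.0    # Gorton's estimated deficit

subprime_share = subprime_t / mortgage_t   # how much was actually "contaminated"
repo_vs_gdp = repo_t / gdp_2009_t          # Repo market relative to GDP
shortfall_share = shortfall_t / repo_t     # deficit relative to the Repo market

print(f"{subprime_share:.1%} {repo_vs_gdp:.1%} {shortfall_share:.1%}")
```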

So while this crisis may seem extraordinary at first, when compared with the history of banking it can be seen as the same old story repeating itself. There are many more details and references in the paper, including some nice graphs, so check it out.

**Update 9/7/2010**: Ben Bernanke references Gorton’s research into the financial crisis in his testimony before the Financial Crisis Inquiry Commission.


Something interesting about polarization is that it works even with a single photon. If you send an unpolarized photon through an L-polarization filter there is a 50% chance it will be blocked and a 50% chance it will pass through. If you then send this L-polarized light through another L-polarization filter it will pass through 100% of the time, but with an R-polarization filter it will pass through 0% of the time. This is a purely quantum mechanical (QM) effect that cannot be explained through classical means.
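The probabilities in the paragraph above can be turned into a small Monte Carlo sketch. This is only bookkeeping with random numbers, not a real quantum simulation; the function names and the convention of using `None` for an unpolarized photon are my own assumptions.

```python
import random


def pass_probability(state, filter_axis):
    """Chance that a photon in `state` passes a filter of type `filter_axis`."""
    if state is None:  # unpolarized photon
        return 0.5
    return 1.0 if state == filter_axis else 0.0


def send(state, filter_axis, rng):
    """Return the photon's new state, or 'blocked' if it is absorbed."""
    if rng.random() < pass_probability(state, filter_axis):
        return filter_axis  # a photon that passes leaves with the filter's polarization
    return "blocked"


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

after_first = [send(None, "L", rng) for _ in range(n)]
passed = [s for s in after_first if s != "blocked"]
through_l = sum(send(s, "L", rng) == "L" for s in passed)
through_r = sum(send(s, "R", rng) == "R" for s in passed)

print(len(passed) / n, through_l / len(passed), through_r / len(passed))
```

About half the photons survive the first filter; of those, all pass a second L filter and none pass an R filter.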

A basic rule of QM is that if you want to observe the state of a system, you must make an observation by operating on the system. In this case our observable is the polarization of the photon and our operator is the polarization filter.

Incredibly, QM says that the eigenvalues of a quantum operator are the observables, while the states of the quantum system are the eigenvectors! If we can remember our eigenvectors and eigenvalues from linear algebra we have the following:

$A v = \lambda v$

where $A$ is the operator, $v$ is the eigenvector and $\lambda$ is the eigenvalue.

Now for our observables to be real, our eigenvalue solutions to this equation must be real, since QM tells us they are equivalent. (We assume our observables are real because we can only measure real values in nature, not imaginary ones.) In linear algebra, to ensure that we have real eigenvalue solutions, the operator should be a Hermitian matrix. This is a square matrix where the entries on opposite sides of the main diagonal are complex conjugates of each other, which also forces the diagonal entries to be real.
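Here is a small check of that claim for 2×2 matrices, using only the standard library. The helper names are my own, and the two example matrices (a Pauli matrix, which is Hermitian, and a plain rotation matrix, which is not) are chosen just for illustration.

```python
import cmath


def is_hermitian(m):
    """True if every entry equals the complex conjugate of its mirror entry."""
    n = len(m)
    return all(
        m[i][j] == complex(m[j][i]).conjugate()
        for i in range(n)
        for j in range(n)
    )


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = m
    half_trace = (a + d) / 2
    det = a * d - b * c
    root = cmath.sqrt(half_trace**2 - det)
    return half_trace + root, half_trace - root


pauli_y = [[0, -1j], [1j, 0]]  # Hermitian
rotation = [[0, -1], [1, 0]]   # not Hermitian

print(is_hermitian(pauli_y), eigenvalues_2x2(pauli_y))
print(is_hermitian(rotation), eigenvalues_2x2(rotation))
```

The Hermitian matrix has the real eigenvalues +1 and −1, while the non-Hermitian one has the purely imaginary eigenvalues ±i, which could never be the result of a measurement.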

Here is a table which sums up the relations in this equation:

Symbol | Math | QM | 3D Movies |
---|---|---|---|
$A$ | Hermitian matrix | quantum operator | L or R polarization filter |
$v$ | eigenvector | quantum state | L or R polarized light |
$\lambda$ | eigenvalue | observable | L or R measurement |

What’s bizarre is that when unpolarized light passes through the filter, it must “decohere” into the eigenvectors of L or R polarized light so that we get a real observable. The process of decoherence is one of the great mysteries of QM: particles only exist as probabilities until they are measured; at that point, Nature appears to “reset” them into a well-defined state.


Even in my quaint neighborhood of Noe Valley in San Francisco, we are inundated with this ubiquitous symbol of openness. When I see this sign the following thoughts go through my head in this order:

- **Shock** – Wow, another one of these signs (?!)
- **Enlightenment** – This business is probably low quality / unoriginal / not trying hard.
- **Confusion** – Why would the proprietor want to convey these connotations?
- **Trivia** – I guess this establishment is open / closed.

In just a few blocks along 24th Street in Noe Valley I counted six of these eyesores. It is a perfect example of horrible design and poor marketing. A hand-painted open sign is friendlier and unique, conveys personality, and costs only a fraction as much. Here is my favorite sign on 24th Street for a shop called ladybug ladybug:


“This outage was due to unforeseen system difficulties.” and “Unfortunately, . . . we cannot provide the specific system issues.”

As you can see from the following screenshot, this contradicts the message that the disruption was due to “planned maintenance.”

Evite was very fortunate that there was not more public fallout from this.


Humans have a very advanced part of their immune system known as the adaptive immune system, which is only found in jawed vertebrates. In this system certain leukocytes, called Antigen-presenting cells (APCs), can bring in another line of defense by notifying T-cells that there is an infection.

How they accomplish this is quite remarkable: by tearing apart the internalized pathogen and presenting the pathogen’s antigens on their own surface. Antigens are structures which identify the pathogen and produce a specific immune response in T-cells. The APC must now move to the T-cells, which are located in our lymph nodes, and present the antigens to activate them. The APC again uses chemotaxis to travel through lymph vessels to reach the nodes. It presents the antigens, attached to a special structure called MHC, to “naive” T-cells.

T-cells have special T-cell receptors (TCRs), which have constant parts which always bind to MHC, and variable parts which will only bind to specific antigens. Upon activation some T-cells become “T helper cells” (CD4+) and others become “cytotoxic T-cells” (CD8+). The CD4+ cells cause the growth of more CD4+ cells, which release chemicals that draw more white blood cells to the site of the infection. Cytotoxic T-cells kill infected cells, as their name implies. Both return to the site of the infection through chemotaxis.

After the infection is eliminated, the specific antigens which the T-cells bound to will be “remembered” by memory T-cells. These cells bind only with the previous antigen and so the immune system is more prepared for a recurrence of the same infection.

Clearly I glossed over much, so read more for the real details. It’s an amazing system all accomplished with molecular machines!


- A rhinovirus infects the upper respiratory system
- Leukocytes detect the virus
- Leukocytes initiate inflammation
- Adaptive immune system is activated
- The virus is eliminated

Now let’s check these steps out in more detail:

A virus (usually a rhinovirus) enters your nose and lands on your adenoids (part of your tonsils). The virus binds to the epithelial cells with proteins called cell adhesion molecules. You can think of it like cellular Velcro, but instead of little plastic hooks they’re made of protein structures which interlock. This binding changes the structure of the virus’s protein shell (the capsid), which in turn causes myristic acid to be released onto the infected cell. This acid creates a pore through which the virus can inject its RNA. Once the RNA is within the host cell, it begins replication through RNA transcription.

**Detection**

The immune system has cells which contain special “pattern recognition receptors” (PRRs) made of protein, which bind with various pathogens including viruses. Examples of these cells are leukocytes (white blood cells), including macrophages and dendritic cells, which both bind with viruses. If a leukocyte binds with a rhinovirus (or other pathogen) with its PRRs, then it will internalize it in a process called phagocytosis.

Here is an incredible video of another type of leukocyte, a neutrophil, chasing down a bacterium and phagocytosing it. A neutrophil is about 14 microns across, and I believe this video was shot over several hours:

These activated leukocytes respond to the binding of pathogens by releasing chemicals called cytokines. These chemicals are one way in which cells communicate and organize their efforts. In the immune system, the release of cytokines, as well as other chemicals, draws more leukocytes to the site of the infection through an amazing process of cellular motion called chemotaxis. Chemotaxis is basically proteins moving in response to their chemical environment, which causes a cell to propel itself in a specific direction to reach its chemical goal. This process escalates the inflammation in a feedback loop of chemotaxis, phagocytosis, and cytokine release. These released chemicals also cause the familiar symptoms of inflammation: redness, heat, swelling, pain, and loss of function.
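The idea of a cell following a chemical gradient can be caricatured in one dimension. Everything in this sketch is a made-up toy: the exponential concentration profile, the source position, and the rule of always stepping toward the higher concentration are assumptions for illustration, not a model of real cell biology.

```python
import math


def concentration(x, source=10.0):
    """Toy cytokine concentration that decays with distance from the source."""
    return math.exp(-abs(x - source))


def chemotaxis(start, steps=50, step_size=0.5):
    """Move a 1D 'cell' one step at a time toward higher concentration."""
    x = start
    path = [x]
    for _ in range(steps):
        left, right = x - step_size, x + step_size
        best = max((x, left, right), key=concentration)
        if best == x:  # neither neighbor is better: the cell is at the source
            break
        x = best
        path.append(x)
    return path


path = chemotaxis(start=0.0)
print(path[0], path[-1], len(path))
```

The cell only ever senses its immediate surroundings, yet it still ends up at the source of the signal, which is the essence of the feedback loop described above.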

In my next post I’ll talk about the adaptive immune system, which is pretty incredible.
