Artificial Intelligence: 2012-09-09

Saturday, September 15, 2012

What is Boltzmann machine

A Boltzmann machine is the name given to a type of stochastic recurrent neural network by Geoffrey Hinton and Terry Sejnowski. Boltzmann machines can be seen as the stochastic, generative counterpart of Hopfield nets. They were one of the first examples of a neural network capable of learning internal representations, and are able to represent and (given sufficient time) solve difficult combinatoric problems. However, due to a number of issues discussed below, Boltzmann machines with unconstrained connectivity have not proven useful for practical problems in machine learning or inference. They are still theoretically intriguing, however, due to the locality and Hebbian nature of their training algorithm, as well as their parallelism and the resemblance of their dynamics to simple physical processes. If the connectivity is constrained, the learning can be made efficient enough to be useful for practical problems.

They are named after the Boltzmann distribution in statistical mechanics, which is used in their sampling function.

Define simulated annealing?

A technique to find a good solution to an optimization problem by trying random variations of the current solution. A worse variation is accepted as the new solution with a probability that decreases as the computation proceeds. The slower the cooling schedule, or rate of decrease, the more likely the algorithm is to find an optimal or near-optimal solution.

Simulated annealing is a generalization of a Monte Carlo method for examining the equations of state and frozen states of n-body systems [Metropolis et al. 1953]. The concept is based on the manner in which liquids freeze or metals recrystalize in the process of annealing. In an annealing process a melt, initially at high temperature and disordered, is slowly cooled so that the system at any time is approximately in thermodynamic equilibrium. As cooling proceeds, the system becomes more ordered and approaches a "frozen" ground state at T=0. Hence the process can be thought of as an adiabatic approach to the lowest energy state. If the initial temperature of the system is too low or cooling is done insufficiently slowly the system may become quenched forming defects or freezing out in metastable states (ie. trapped in a local minimum energy state).

The original Metropolis scheme was that an initial state of a thermodynamic system was chosen at energy E and temperature T, holding T constant the initial configuration is perturbed and the change in energy dE is computed. If the change in energy is negative the new configuration is accepted. If the change in energy is positive it is accepted with a probability given by the Boltzmann factor exp -(dE/T). This processes is then repeated sufficient times to give good sampling statistics for the current temperature, and then the temperature is decremented and the entire process repeated until a frozen state is achieved at T=0.

By analogy the generalization of this Monte Carlo approach to combinatorial problems is straight forward [Kirkpatrick et al. 1983, Cerny 1985]. The current state of the thermodynamic system is analogous to the current solution to the combinatorial problem, the energy equation for the thermodynamic system is analogous to at the objective function, and ground state is analogous to the global minimum. The major difficulty (art) in implementation of the algorithm is that there is no obvious analogy for the temperature T with respect to a free parameter in the combinatorial problem. Furthermore, avoidance of entrainment in local minima (quenching) is dependent on the "annealing schedule", the choice of initial temperature, how many iterations are performed at each temperature, and how much the temperature is decremented at each step as cooling proceeds.

Simulated annealing has been used in various combinatorial optimization problems and has been particularly successful in circuit design problems

Describe Pattern Recognition problem?

Methods for solving pattern recognition tasks generally assume a sequential model for the pattern recognition process, consisting of pattern environment, sensors to collect data from the environment, feature extraction from the data and association/ storage/classification/clustering using the features. The simplest solution to a pattern recognition problem is to use template matching, where the data of the test pattern are matched point by point with the corresponding data in the reference pattern. Obviously, this can work only for very simple and highly restricted pattern recognition tasks. At the next level of complexity, one can assume a deterministic model for the pattern generation process, and derive the parameters of the model from given data in order to represent the pattern information in the data. Matching test and reference patterns are done at the parametric level. This works well when the model of the gene;ation process is known with reasonable accuracy. One could also assume a stochastic model for the pattern generation process, and derive the parameters of the model from a large set of training patterns. Matching between test and reference patterns can be performed by several statistical methods like likelihood ratio, variance weighted distance, Bayesian classification etc. Other approaches for pattern recognition tasks depend on extracting features from parameters or data. These features may be specific for the task. A pattern is described in terms of features, and pattern matching is done using descriptions of the features.

Another method based on descriptions is called syntactic or structural pattern recognition in which a pattern in expressed in terms of primitives suitable for the classes of pattern under study (Schalkoft 1992). Pattern matching is performed by matching the descriptions of the patterns in terms of the primitives. More recently, methods based on the knowledge of the sources generating the patterns are being explored for pattern recognition tasks. These knowledge-based systems express I knowledge in the form of rules for generating and perceiving patterns.

The main difficulty in each of the pattern recognition techniques alluded to above is that of choosing an appropriate model for the pattern generating process and estimating the parameters of the model in the case of a model-based approach, or extraction of features from data parameters in the case of feature-based methods, or selecting appropriate primitives in the case of syntactic pattern recognition, or deriving rules in the case of a knowledge-based approach. It is all the more difficult when the test patterns are noisy and distorted versions of the patterns used in the training process. The ultimate goal is to impart to a machine the pattern recognition capabilities comparable to those of human beings. This goal is difficult to achieve using most of the conventional methods, because, as mentioned earlier, these methods assume a sequential model for the pattern recognition process. On the other hand, the human pattern recognition process is an integrated process involving the use of biological neural processing even from the stage of sensing the environment. Thus the neural processing takes place directly on the data for feature extraction and pattern matching. Moreover, the large size (in terms of number of neurons and interconnections) of the biological neural network and the inherently different r mechanism of processing are attributed to our abilities of pattern recognition in spite of variability and noise in the data. Moreover, we are able to deal effortlessly with temporal patterns and also with the so-called stability-plasticity dilemma as well.

It is for these reasons attempts are being made to explore new models of computing, inspired by the structure and function of the biological neural network. Such models for computing are based on artificial neural networks,

A Simple Artificial Neuron

Our basic computational element (model neuron) is often called a node or unit. It receives input from some other units, or perhaps from an external source. Each input has an associated weight w, which can be modified so as to model synaptic learning. The unit computes some function f of the weighted sum of its inputs:

Its output, in turn, can serve as input to other units

The weighted sum is called the net input to unit i, often written neti.

Note that wij refers to the weight from unit j to unit i (not the other way around).
The function f is the unit's activation function. In the simplest case, f is the identity function, and the unit's output is just its net input. This is called a linear unit.

Describe neural model?

Computational neurobiologists have constructed very elaborate computer models of neurons in order to run detailed simulations of particular circuits in the brain. As Computer Scientists, we are more interested in the general properties of neural networks, independent of how they are actually "implemented" in the brain. This means that we can use much simpler, abstract "neurons", which (hopefully) capture the essence of neural computation even if they leave out much of the details of how biological neurons work.

People have implemented model neurons in hardware as electronic circuits, often integrated on VLSI chips. Remember though that computers run much faster than brains - we can therefore run fairly large networks of simple model neurons as software simulations in reasonable time. This has obvious advantages over having to use special "neural" computer hardware.

Explain with example traveling salesman problem?

The "Traveling Salesman Problem" (TSP) is a common problem applied to artificial intelligence. The TSP presents the computer with a number of cities, and the computer must compute the optimal path between the cities. This applet uses a genetic algorithm to produce a solution to the "Traveling Salesman Problem".

The traveling salesman problem (TSP) is a problem in discrete or combinatorial optimization. It is a prominent illustration of a class of problems in computational complexity theory, which are classified as NP-hard. In the traveling-salesman problem, which is closely related to the Hamiltonian cycle problem, a salesman must visit n cities. Modeling the problem as a complete graph with n vertices, we can say that the salesman wishes to make a tour, or hamiltonian cycle, visiting each city exactly once and to finishing at the city he starts from. There is an integer cost c(i, j) to travel from city i to city j , and the salesman wishes to make the tour whose total cost is minimum, where the total cost is the sum of the individual costs along the edges of the tour.

Representing the cities by vertices and the roads between them by edges. We get a graph. In this graph, with every edge there is associated a real number such a graph is called a weighted graph being the weight of edge.

In our problem, if each of the cities has a road to every other city, we have a complete weighted graph. This graph has numerators Hamiltonian circuits, and we are to pick the one that has the smallest sum of the distance

The total number of different Hamiltonian circuits in a complete graph of n vertices can be shown to be (n - 1)!/2. This follows from the fact that starting from any vertex we have n - 1 edges to choose from the first vertex , n- 2 from the second, n- 3 from the third, and so on, these being independent, result with (n-1) choices This number is, however, divided y2, because each Hamiftonian circuit has been counted twice

Theoretically the problem of the traveling salesman can always be solved by enumeration all (n-1)!/2 Hamtltonian circuits, calculation the distance traveled in each, and then picking the shortest one. However for a large value of n, the labor involved is too great even for a digital computer.

The problem is to prescribe a manageable algorithm for finding the shortest route. No efficient algorithm for problems of arbitrary size has yet been found, although many attempts have been made. Since this problem has application in operations research, some specific large-scale examples have been worked out. There are also available several heuristic methods of solution that give a route very close to the shortest one, but do not guarantee the shortest.

Write a short note on Best first research?

Best-first search, rather than plunging as deep as possible into the tree (as in depth-first search), or traversing each level of the tree in succession (as in breadth-first search), uses a heuristic to decide at each stage which is the best place to continue the search.

Best-first search in its most basic form consists of the following algorithm (adapted from Pearl, 1984):

The first step is to define the OPEN list with a single node, the starting node. The second step is to check whether or not OPEN is empty. If it is empty, then the algorithm returns failure and exits. The third step is to remove the node with the best score, n, from OPEN and place it in CLOSED. The fourth step “expands” the node n, where expansion is the identification of successor nodes of n. The fifth step then checks each of the successor nodes to see whether of not one of them is the goal node. If any successor is the goal node, the algorithm returns success and the solution, which consists of a path traced backwards from the goal to the start node. Otherwise, the algorithm proceeds to the sixth step. For every successor node, the algorithm applies the evaluation function, f, to it, then checks to see if the node has been in either OPEN or CLOSED. If the node has not been in either, it gets added to OPEN. Finally, the seventh step establishes a looping structure by sending the algorithm back to the second step. This loop will only be broken if the algorithm returns success in step five or failure in step two.

The algorithm is represented here in pseudo-code:

1. Define a list, OPEN, consisting solely of a single node, the start node, s.

2. IF the list is empty, return failure.

3. Remove from the list the node n with the best score (the node where f is the minimum), and move it to a list, CLOSED.

4. Expand node n.

5. IF any successor to n is the goal node, return success and the solution (by tracing the path from the goal node to s).

6. FOR each successor node:

a) apply the evaluation function, f, to the node.

b) IF the node has not been in either list, add it to OPEN.

7. GOTO 2.

Applications

Best-first search and its more advanced variants have been used in such applications as games and web crawlers. In a web crawler, each web page is treated as a node, and all the hyperlinks on the page are treated as unvisited successor nodes. A crawler that uses best-first search generally uses an evaluation function that assigns priority to links based on how closely the contents of their parent page resemble the search query (Menczer, Pant, Ruiz, and Srinivasan, 2001). In games, best-first search may be used as a path-finding algorithm for game characters. For example, it could be used by an enemy agent to find the location of the player in the game world. Some games divide up the terrain into “tiles” which can either be blocked or unblocked. In such cases, the search algorithm treats each tile as a node, with the neighbouring unblocked tiles being successor nodes, and the goal node being the destination tile (Koenig, 2004).

Describe procedural Vs declarative knowledge?

Declarative knowledge is defined as the factual information stored in memory and known to be static in nature. Other names, e.g. descriptive knowledge, propositional knowledge, etc. are also given. It is the part of knowledge which describes how things are. Things/events/processes, their attributes, and the relations between these things/events/processes and their attributes define the domain of declarative knowledge.

Procedural knowledge is the knowledge of how to perform, or how to operate. Names such as know-how are also given. It is said that one becomes more skilled in problem solving when he relies more on procedural knowledge than declarative knowledge.

Procedural knowledge

Declarative knowledge

· high efficiency

· low modifiability

· low cognitive adequacy (better for knowledge engineers)

· higher level of abstraction

· suitable for indipendent facts

· good modifiability

· good readablity

· good cognitive matching (better for domain experts and end-users)

low computational efficiency

What do you understand by forward Vs backward reasoning?

Whether you use forward or backwards reasoning to solve a problem depends on the properties of your rule set and initial facts. Sometimes, if you have some particular goal (to test some hypothesis), then backward chaining will be much more efficient, as you avoid drawing conclusions from irrelevant facts. However, sometimes backward chaining can be very wasteful - there may be many possible ways of trying to prove something, and you may have to try almost all of them before you find one that works. Forward chaining may be better if you have lots of things you want to prove (or if you just want to find out in general what new facts are true); when you have a small set of initial facts; and when there tend to be lots of different rules which allow you to draw the same conclusion. Backward chaining may be better if you are trying to prove a single fact, given a large set of initial facts, and where, if you used forward chaining, lots of rules would be eligible to fire in any cycle.

Forward Reasoning	Backward Reasoning
Forward: from the start states.	Backward: from the goal states.
Forward rules: to encode knowledge about how to respond to certain input.	Backward rules: to encode knowledge about how to achieve particular goals.
Combining forward and backward reasoning ¬ A1, …, Ak-1, Ak, Ak+1, …, An achieved by achieved by forward reasoning backward reasoning

What is Expert System?

· An expert system, is an interactive computer-based decision tool that uses both facts and heuristics to solve difficult decision making problems, based on knowledge acquired from an expert.

· An expert system is a model and associated procedure that exhibits, within a specific domain, a degree of expertise in problem solving that is comparable to that of a human expert.

· An expert system compared with traditional computer : Inference engine + Knowledge = Expert system ( Algorithm + Data structures = Program in traditional computer )

· First expert system, called DENDRAL, was developed in the early 70's at Stanford University

Expert systems are computer applications which embody some non-algorithmic expertise for solving certain types of problems. For example, expert systems are used in diagnostic applications. They also play chess, make financial planning decisions, configure computers, monitor real time systems, underwrite insurance policies, and perform many services which previously required human expertise.

Expert System Components And Human Interfaces :

Expert systems have a number of major system components and interface with individuals who interact with the system in various roles. These are illustrated below.

What are the issues in knowledge representation?

The fundamental goal of Knowledge Representation is to facilitate inferencing (conclusions) from knowledge. The issues that arise while using KR techniques are many. Some of these are explained below.

· Important Attributes : Any attribute of objects so basic that they occur in almost every problem domain ?

· Relationship among attributes: Any important relationship that exists among object attributes ?

· Choosing Granularity : At what level of detail should the knowledge be represented ?

· Set of objects : How sets of objects be represented ?

· Finding Right structure : Given a large amount of knowledge stored, how can relevant parts be accessed ?