# Top Artificial Intelligence Interview Questions And Answers


| Weak AI | Strong AI |
| --- | --- |
| Narrow application; scope is very limited | Widely applied; scope is vast |
| Good at specific tasks | Human-level intelligence |
| Uses supervised and unsupervised learning to process data | Uses clustering and association to process data |
| E.g., Siri, Alexa | E.g., advanced robotics |

AI is a field of computer science wherein the cognitive functions of the human brain are studied and replicated on a machine/system. Artificial Intelligence is today widely used for applications such as computer vision, speech recognition, decision-making, perception, and reasoning.

- Natural language processing
- Chat bots
- Sentiment analysis
- Sales Prediction
- Self-driving cars
- Facial expression recognition
- Image tagging

- Python
- R
- Lisp
- Prolog
- Java

Tower of Hanoi is a mathematical puzzle that shows how recursion can be used as a device for building an algorithm to solve a specific problem. Using a decision tree and the breadth-first search (BFS) algorithm, we can solve the Tower of Hanoi with AI.
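The recursive solution itself can be sketched in a few lines of Python (the function and peg names here are illustrative): move the n-1 smaller disks to the spare peg, move the largest disk, then move the smaller disks back on top of it.

```python
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the list of moves that transfers n disks from source to target."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((source, target))
        return moves
    hanoi(n - 1, source, spare, target, moves)  # clear the n-1 smaller disks
    moves.append((source, target))              # move the largest disk
    hanoi(n - 1, spare, target, source, moves)  # stack the smaller disks back
    return moves
```

For n disks this produces the minimal 2^n - 1 moves.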

The Turing test is a method of testing a machine's ability to match human-level intelligence. A machine is set to challenge human intelligence, and when it passes the test it is considered intelligent. However, a machine could be viewed as intelligent without knowing enough about people to mimic a human.

An expert system is an artificial intelligence program that has expert-level knowledge about a specific area and knows how to use its knowledge to respond appropriately. Such systems have the expertise to substitute for a human expert. Their characteristics include:

- High performance
- Adequate response time
- Reliability
- Understandable

- Consistency
- Memory
- Diligence
- Logic
- Multiple expertise
- Ability to reason
- Fast response
- Unbiased in nature

A* is a computer algorithm extensively used for pathfinding and graph traversal: finding the most efficient route between various points, called nodes.

Breadth-first search (BFS) starts with the root node and proceeds through the neighboring nodes before moving on to the next level of nodes, producing one tree at a time until the solution is found. Because this search can be implemented using a FIFO (First In, First Out) queue, the strategy gives the shortest path to the solution.
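A minimal BFS sketch with a FIFO queue, assuming the graph is given as an adjacency list (the example graph in the test below is hypothetical):

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Breadth-first search over an adjacency-list graph.

    Because the frontier is a FIFO queue, the first time `goal` is
    reached the path found is the shortest in number of edges."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()      # FIFO: expand the oldest frontier node
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None                     # goal unreachable
```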

Depth-first search (DFS) is based on LIFO (Last In, First Out). The recursion is implemented with a LIFO stack data structure, so nodes are visited in a different order than in BFS. The path from root to leaf node is stored in each iteration, which gives a linear space requirement.

Bidirectional search proceeds forward from the initial state and backward from the goal state until the two searches meet at a common state. The initial-state path is then joined with the inverted goal-state path. Each search covers only about half of the total path.

In iterative deepening depth-first search, the search of level 1, level 2, and so on is repeated until the solution is found. Nodes are generated until a single goal node is created, and the stack of nodes is saved. The search ends once the solution is found.

Uniform-cost search sorts nodes in increasing order of path cost and always expands the least-cost node. It is identical to BFS if every step has the same cost, and it explores paths in increasing order of cost.
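A uniform-cost search sketch using Python's `heapq` as the priority queue; the weighted graph format (node -> list of `(neighbor, step_cost)` pairs) and the example graph are assumptions for illustration:

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Expand the frontier node with the lowest path cost first."""
    frontier = [(0, start, [start])]   # (cost so far, node, path)
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)  # least-cost node
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for neighbor, step_cost in graph.get(node, []):
            if neighbor not in explored:
                heapq.heappush(frontier,
                               (cost + step_cost, neighbor, path + [neighbor]))
    return None
```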

AI systems use game theory for enhancement when more than one participant is involved, which narrows the field quite a bit. The two fundamental roles are:

- Participant Design: Game theory is used to enhance the decision of a participant to get maximum utility.
- Mechanism Design: Inverse game theory; designs a game for a group of intelligent participants, e.g., auctions.

Alpha-beta pruning is a search algorithm that tries to reduce the number of nodes the minimax algorithm evaluates in the search tree. It can be applied at any depth and can prune entire subtrees, not just leaves.
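A sketch of minimax with alpha-beta pruning; `children` and `evaluate` are caller-supplied assumptions standing in for a real game's move generator and position evaluator:

```python
def minimax(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta pruning over a generic game tree."""
    kids = children(node)
    if depth == 0 or not kids:          # leaf or depth limit: score the node
        return evaluate(node)
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, minimax(child, depth - 1, alpha, beta,
                                       False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:           # beta cutoff: prune remaining siblings
                break
        return value
    else:
        value = float("inf")
        for child in kids:
            value = min(value, minimax(child, depth - 1, alpha, beta,
                                       True, children, evaluate))
            beta = min(beta, value)
            if beta <= alpha:           # alpha cutoff
                break
        return value
```

In the usage below the tree is a nested list whose leaves are static scores, so the maximizer picks the better of the two minimizer results.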

Fuzzy logic is a subset of AI; it is a way of encoding human learning for artificial processing. It is a form of many-valued logic and is represented as IF-THEN rules.

- Facial pattern recognition
- Air conditioners, washing machines, vacuum cleaners
- Antiskid braking systems, transmission systems
- Control of subway systems and unmanned helicopters
- Weather forecasting systems
- Project risk assessment
- Medical diagnosis and treatment plans
- Stock trading

When a problem has to be solved through a sequence of actions to attain a goal, a partial-order plan specifies all the actions that need to be taken but specifies an ordering between actions only where required.

First-order predicate logic is a collection of formal systems where each statement is divided into a subject and a predicate. The predicate refers to only one subject, and it can either modify or define the properties of the subject.

| Inductive machine learning | Deductive machine learning | Abductive machine learning |
| --- | --- | --- |
| Learns from a set of instances to draw a conclusion | Derives a conclusion and then improves it based on previous decisions | A deep learning technique where conclusions are derived based on various instances |
| Statistical machine learning such as KNN (k-nearest neighbors) or SVM (support vector machine) | Machine learning using deductive reasoning, e.g., a decision tree | Deep neural networks are used in abductive learning |
| A ⋀ B ⊢ A → B (Induction) | A ⋀ (A → B) ⊢ B (Deduction) | B ⋀ (A → B) ⊢ A (Abduction) |

- Supervised Learning
- Unsupervised Learning
- Semi-supervised Learning
- Reinforcement Learning
- Transduction
- Learning to Learn

Deep learning is a subset of machine learning used to create artificial multi-layer neural networks. These networks have a self-learning capability based on previous instances and provide high accuracy.

| | Supervised learning | Unsupervised learning | Reinforcement learning |
| --- | --- | --- | --- |
| Definition | The training set has both predictors and predictions. | The training set has only predictors. | An agent learns from rewards through trial and error; such methods can establish state-of-the-art results on many tasks. |
| Algorithms | Linear and logistic regression, support vector machine, Naive Bayes | K-means, clustering algorithms, dimensionality-reduction algorithms | Q-learning, State-Action-Reward-State-Action (SARSA), Deep Q Network (DQN) |
| Uses | Image recognition, speech recognition, forecasting | Pre-processing data, pre-training supervised learning algorithms | Warehouses, inventory management, delivery management, power systems, financial systems |

| | Parametric model | Non-parametric model |
| --- | --- | --- |
| Definition | A finite number of parameters is used to predict new data | An unbounded number of parameters |
| Algorithms | Logistic regression, linear discriminant analysis, perceptron, Naive Bayes | k-nearest neighbors, decision trees such as CART and C4.5, support vector machines |
| Benefits | Simple, fast, needs less data | Flexibility, power, performance |
| Limitations | Constrained, limited complexity, poor fit | Needs more data, slower, prone to overfitting |

- Logistic regression
- Linear regression
- Decision Trees
- Support vector machines
- Naive Bayes etc.

Naive Bayes is a powerful machine learning algorithm for predictive modeling. It is a set of algorithms sharing a common principle based on Bayes' theorem. The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome.

The perceptron is an algorithm that simulates the ability of the brain to understand and discard information; it is used for the supervised classification of an input into one of two possible outputs, i.e., it is a binary classifier.
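The classic perceptron learning rule can be sketched in plain Python; this is a minimal illustration (the AND-gate data in the test is an assumption), not a production implementation:

```python
def train_perceptron(samples, labels, epochs=10, lr=0.1):
    """Train a single perceptron (binary classifier, labels 0/1)."""
    n = len(samples[0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation >= 0 else 0
            error = y - prediction      # 0 when correct, +1/-1 when wrong
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, x):
    """Threshold the weighted sum at zero."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0
```

On linearly separable data (such as an AND gate) the rule is guaranteed to converge.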

- Independent Component Analysis
- Principal Component Analysis
- Kernel Based Principal Component Analysis

| kNN | k-means |
| --- | --- |
| Supervised | Unsupervised |
| Classification algorithm | Clustering algorithm |
| Minimal training model | Exhaustive training model |
| Used for classification and regression of known data | Used in population demographics, market segmentation, social media trends, anomaly detection |
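A minimal kNN classifier sketch using only the standard library (`math.dist` requires Python 3.8+; the two-cluster data in the test is illustrative):

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors
    under Euclidean distance."""
    dists = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in dists[:k]]  # k closest labels
    return Counter(nearest).most_common(1)[0][0]
```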

Ensemble learning is a computational technique in which classifiers or experts are strategically generated and combined. It is used to improve a model's classification, prediction, function approximation, etc.
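The simplest way to combine classifiers is a majority vote over their individual predictions; a minimal sketch (the decision-stump classifiers in the test are illustrative stand-ins for trained models):

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Combine independently trained classifiers by majority vote,
    the basic idea behind ensembles such as bagging."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]
```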

- Data collection
- Data preparation
- Choosing an appropriate model
- Training the data set
- Evaluation
- Parameter tuning
- Predictions

A hash table is a data structure used to produce an associative array; hash tables are mostly used for database indexing.

Regularization comes into the picture when a model is either overfit or underfit. It is basically used to minimize the error on the dataset: a penalty term is added to the model's objective to avoid fitting issues.

- Data Acquisition
- Ground Truth Acquisition
- Cross Validation Technique
- Query Type
- Scoring Metric
- Significance Test

Model accuracy is just a subset of model performance. Accuracy is one measure of how the algorithm performs, while model performance also depends on the datasets we feed the algorithm as input.

The F1 score is the weighted harmonic mean of precision and recall. It takes both false positives and false negatives into account and is used to measure a model's performance.
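Given the counts of true positives, false positives, and false negatives from a confusion matrix, the score can be computed as:

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual positives, how many were found
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```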

- Image, speech, face detection
- Bioinformatics
- Market segmentation
- Manufacturing and Inventory management
- Fraud detection etc.

- Lasso Regression method.
- Random Forest, plot variable importance chart.
- Linear regression.

A recommendation system is an information-filtering system used to predict user preference based on the choice patterns the user follows while browsing/using the system.

Dimensionality reduction is the process of reducing the number of random variables under consideration. It can be done with techniques such as the missing-values ratio, low-variance filter, high-correlation filter, random forest, and PCA.

- Sliding-window methods
- Recurrent sliding windows
- Hidden Markov models
- Maximum entropy Markov models
- Conditional random fields
- Graph transformer networks

- Requires less formal statistical training
- Ability to detect nonlinear relationship between variables
- Detects all possible interactions between predictor variables
- Availability of multiple training algorithms

Bias error measures how much, on average, the predicted values vary from the actual values. A high bias error means we have an under-performing model.

Variance measures how predictions made on the same observation differ from each other. A high-variance model will overfit the dataset and perform badly on new observations.

TensorFlow is an open-source machine learning library. It is a fast, flexible, low-level toolkit for implementing complex algorithms, and it gives users the customizability to build experimental learning architectures and work with them to produce the desired outputs.

TensorFlow installation guide:

```shell
pip install tensorflow-cpu   # CPU-only build
pip install tensorflow-gpu   # GPU-enabled build
```

- Constants
- Variables
- Placeholder
- Graph
- Session

A cost function is a scalar function that quantifies the error factor of a neural network: the lower the cost function, the better the neural network. E.g., when classifying images from the MNIST dataset, if the input image is the digit 2 and the neural network wrongly predicts it to be 3, the cost function registers that error.

- Linear Neuron
- Binary Threshold Neuron
- Stochastic Binary Neuron
- Sigmoid Neuron
- Tanh function
- Rectified Linear Unit (ReLU)

**Learning rate:** How fast the network learns new beliefs.

**Momentum:** A parameter that helps the optimizer escape local minima and smooths the jumps during gradient descent.

**Epoch:** One complete forward and backward propagation over the training data to correct the weights. As epochs increase, the loss or error decreases because the network learns better and better.

As we add more and more hidden layers, backpropagation becomes less and less useful in passing information to the lower layers. In effect, as information is passed back, the gradients begin to vanish and become small relative to the weights of the network.

Dropout is a simple way to prevent a neural network from overfitting: some of the units in the network are randomly dropped out during training. The analogy is reproduction in nature, which produces offspring by combining distinct genes rather than strengthening their co-adaptation.

Long Short-Term Memory (LSTM) networks are explicitly designed to address the long-term dependency problem by maintaining a state that decides what to remember and what to forget. Their key components are:

Gates (forget, memory, update, and read)

tanh(x) (values between -1 and 1)

sigmoid(x) (values between 0 and 1)
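The two squashing functions (plus ReLU from the list of neuron types above) can be sketched directly from their definitions:

```python
import math

def sigmoid(x):
    """Squashes any real input into (0, 1); used for LSTM gate activations."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squashes any real input into (-1, 1); used for cell-state candidates."""
    return math.tanh(x)

def relu(x):
    """Rectified Linear Unit: passes positives through, zeroes out negatives."""
    return max(0.0, x)
```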

- LSTM: Long Short Term Memory
- GRU: Gated Recurrent Unit
- End to End Network
- Memory Network

An autoencoder is basically used to learn a compressed representation of given data. A few applications include:

- Data denoising
- Dimensionality reduction
- Image reconstruction
- Image colorization

**Components of GAN –**

- Generator
- Discriminator

**Deployment steps –**

- Train the model
- Validate & Finalize the model
- Save the model
- Load the saved model for next prediction.

Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) that minimize a cost function.

Step 1: Initialize the weights (x, y) with random values and calculate the error (SSE).

Step 2: Calculate the gradient, i.e., the variation in SSE when the weights (x, y) are changed by a very small value. This tells us in which direction to move x and y so that SSE is minimized.

Step 3: Adjust the weights with the gradients to move towards the optimal values where SSE is minimized.

Step 4: Use the new weights for prediction and to calculate the new SSE.

Step 5: Repeat steps 2 to 4 till further adjustments to the weights don't significantly reduce the error.
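The steps above can be sketched for a simple linear model y = w*x + b, minimizing the mean squared error (proportional to the SSE); the data points, learning rate, and epoch count below are illustrative:

```python
def gradient_descent(points, lr=0.05, epochs=2000):
    """Fit y = w*x + b to (x, y) pairs by gradient descent."""
    w, b = 0.0, 0.0                                        # Step 1: initial weights
    n = len(points)
    for _ in range(epochs):
        # Step 2: gradient of the mean squared error w.r.t. w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        # Step 3: move the weights against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
        # Steps 4-5: the next iteration uses the new weights; repeat
    return w, b
```

On data generated by y = 2x, the fit converges to w close to 2 and b close to 0.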

**Syntax – Class Session**

It is a class for running TensorFlow operations. The environment is encapsulated in the session object wherein the operation objects are executed and Tensor objects are evaluated.

```python
import tensorflow as tf

# Build a graph.
x = tf.constant(2.0)
y = tf.constant(5.0)
z = x * y

# Launch the graph in a session.
sess = tf.Session()

# Evaluate the tensor `z`.
print(sess.run(z))  # 10.0
```

TensorFlow “cluster” is a set of “tasks” that participate in the distributed execution of a TensorFlow graph. Each task is associated with a TensorFlow “server”, which contains a “master” that can be used to create sessions, and a “worker” that executes operations in the graph. A cluster can also be divided into one or more “jobs”, where each job contains one or more tasks.

To use HDFS with TensorFlow, we need to change the file paths to read and write data to an HDFS path. For example:

```python
filename_queue = tf.train.string_input_producer([
    "hdfs://namenode:8020/path/to/file1.csv",
    "hdfs://namenode:8020/path/to/file2.csv",
])
```

The intermediate tensors created as part of a call to Session.run() are freed at or before the end of the call. Sessions can also own resources: a few classes, such as tf.Variable, tf.QueueBase, and tf.ReaderBase, use a significant amount of memory. These resources (and the associated memory) are released when the session is closed by calling tf.Session.close.

A variable's lifetime starts when you first run the tf.Variable.initializer operation for that variable in a session. It is destroyed when tf.Session.close is called on that session.
