## Top 100 questions for Machine Learning Interview

- How is machine learning different from normal computer programs?

- What is the difference between supervised and unsupervised machine learning techniques? Explain with one example.

- Can you give us one example of a reinforcement learning technique?

- If some data are missing from your dataset, how would you treat them?

- In case you do not apply techniques such as MICE, would you like to fill the missing data in a continuous variable with the mean value or some other measure of central tendency and why?

- If there are n variables in your dataset, how many parameters would you have in your linear regression model?

- What is the most basic check that we need to perform before we proceed with linear regression?

- What is the relationship between covariance and correlation?

- Can we conclude that if there is a strong correlation between two variables there should also be a causal relationship between them?

- In a simple linear regression model, what is the coefficient of the variable equal to?

- If the R-squared value is 0.98, what does it mean in the case of a SISO (single-input, single-output) model?

- What is adjusted R-squared? If we add more variables, will the multiple R-squared and the adjusted R-squared always increase? Why?

- What do you understand about the terms heteroscedasticity and homoscedasticity?

- What is the assumption with regard to outliers in the case of linear regression?

- Do you think outliers in the input variable will have any impact on the linear regression model?

- Can you explain the term multicollinearity? How can we identify multicollinearity in our regression model?

- Why is the best-fit line in linear regression also called the least-squares line?

- What is the advantage of standardizing the data set before we start creating a linear regression model?

- Which diagnostic plot can you use to identify outliers in your linear regression model?

- What is the impact of including an interaction term in your linear regression model?

- Which test would let you conclude whether a variable is significant in your linear regression model? What are the null and alternative hypotheses in that test?

- In linear regression, why do we need to calculate the square of the error terms instead of simply taking a sum of all the error terms?

- How would you confirm that the errors or the residuals are normally distributed?

- If there are 5 candidate linear regression models, how would you choose the best model among them?

- What do you mean by AIC and how is it calculated?

- Is there any relationship between the coefficient of determination and the Pearson correlation coefficient of the input and output variables of a SISO model?

- What is the difference between a gradient descent algorithm and a stochastic gradient descent algorithm? Which will converge faster? Of the two, which is bound to reach the global minimum and which may get stuck at a local minimum?

- To do binary classification, which ML technique would you like to use?

- What is the advantage of a Logistic regression model over a CART model?

- Can you explain the ROC curve? What do we have on the x-axis and the y-axis?

- How do we decide the threshold value by looking at the ROC plot?

- If the total area under the ROC curve is 1, what does it mean in terms of predictive capability?

- What is a confusion matrix? Explain sensitivity, specificity, and the kappa statistic.

- What do you understand by the terms false positive rate (FPR) and false negative rate (FNR)?

- Suppose you are working in a diagnostic lab that is doing COVID tests. Which error would you prefer, FPR or FNR?

- I am creating a logistic regression model to predict whether a person is likely to have cancer. What impact will a higher threshold have on my predicted classes?

- What advantage does linear discriminant analysis have over logistic regression?

- There are 5 input variables and 3 classes of the dependent variable. How many linear discriminant functions will there be in the model?

- Are 2 linear discriminant functions always going to be orthogonal to each other?

- What are the two most important assumptions of linear discriminant analysis?

- Can we say that linear discriminant analysis is also a dimension reduction technique besides being a classifier?

- Is a decision tree a parametric machine learning algorithm? What advantage does it have over a logistic regression model?

- Can you explain the concept of entropy and information gain?

- What are the root node and the decision nodes in a CART?

- If we are using CART for regression, what criterion would you use to make splits?

- What is the Gini index?

- What advantage does a random forest have over a decision tree?

- Can you explain the concept of bagging and boosting?

- Can you give us an example where you would like to apply the Naive Bayes technique?

- Are KNN and K-means both supervised machine learning techniques?

- Are KNN and K-means both parametric machine learning techniques?

- Which proximity metric do we use to calculate the distance among data points in the case of KNN?

- What is a single linkage method?

- What is the difference between Euclidean distance and Manhattan distance?

- Why is KNN called a lazy learning algorithm?

- How do we decide the value of K in KNN?

- What advantage do we have when we normalize the data before going ahead with KNN?

- What is the biggest drawback of KNN, particularly for a high-dimensional dataset?

- Which machine learning technique would you like to use for market segmentation?

- Can you explain the process of K-means? Is K a parameter?

- Have you done Hierarchical clustering? Is it a supervised or unsupervised technique?

- Which algorithm in machine learning is also referred to as a maximum margin classifier?

- What constraint do we have on the margin width in the case of a support vector machine?

- What is the kernel trick, and in which scenarios is it useful? Can you give examples of some kernel functions?

- Is the alpha value of a support vector always non-zero?

- When we are doing hyperparameter tuning in the case of SVM, what impact will a low gamma value have?

- What is a slack variable in SVM?

- What are some of the characteristics of time series data?

- Explain the various components of a time series.

- How will you identify whether a time series is stationary or not? What can be done to make it stationary?

- What is an exponential smoothing technique in time series?

- Explain the difference between the additive and the multiplicative time series model.

- What terms are there in the AR, MA, ARMA, ARIMA, and SARIMA models of time series?

- What are the null and alternative hypotheses in the Augmented Dickey-Fuller test?

- What is the use of ACF and PACF plots?

- What is the assumption with regard to the error terms in a time series model? How do you verify it?

- Is the sum of squared errors sensitive to outliers?

- In which scenarios do we use a log transformation of the time series data?

- What are the 2 main advantages of a dimension reduction technique?

- What is the difference between feature extraction and feature selection?

- What is the filter strategy in feature selection?

- Can you give us scenarios where you can use PCA and FA? Is there any specific requirement with respect to the data type, in order to proceed with any dimension reduction technique?

- What are the three key differences between PCA and FA?

- What is the difference between EFA and CFA?

- What is the purpose of Bartlett’s test of sphericity? In your opinion, which test is better, KMO or Bartlett’s test?

- Have you heard of the phrase “curse of dimensionality”? What does it mean?

- Is there any relationship between the variance explained and the principal components?

- What is a scree plot?

- What are an eigenvector and an eigenvalue?

- On what basis do we identify and retain the important principal components?

- What are communality and factor loading in factor analysis?

- What is oblique rotation? How is it different from Quartimax rotation?

- What is the flipside of using an oblique rotation?

- Is there any relationship between two principal components?

- What will be the dot product of two orthogonal principal components?

- Which algorithm do we use in mining the frequent pattern from the data of a retail chain?

- What is the meaning of support and confidence of a rule in Market basket analysis?

- Can you explain the term bias-variance tradeoff?

- Do you see any problem if we try to improve the machine learning model by making it more complex and thereby getting better accuracy on the training and validation data?

- What is K-fold cross-validation? Where have you used it?
