

Question

  1. The default value of test_size parameter in train_test_split() is _____. (1 point)
     - 0.25
     - 0.2
     - 0.8
     - 0.3
  2. The confusion_matrix() function comes under _____ module. (1 point)
     - sklearn.utils
     - sklearn.metrics
     - sklearn.model_selection
     - sklearn.calibration
  3. Pandas ______ is used to view some basic statistical details like percentile, mean, std etc. of a data frame. (1 point)
     - describe()
     - desc()
     - details()
     - info()
  4. Consider a dataframe df containing two tuples. Then df.head() will return (1 point)
     - Five tuples where bottom 3 containing None
     - Five tuples where bottom 3 containing garbage values
     - Two tuples
     - Error
  5. To select a specific column (say 'col3') from a dataframe (say 'df'), we have to write (1 point)
     - df('col3')
     - df[['col3']]
     - df.col3
     - df[3]
  6. To implement linear regression, we can use _____. (1 point)
     - sklearn.model_selection.LinearRegression()
     - sklearn.multiclass.LinearRegression()
     - sklearn.preprocessing.LinearRegression()
     - sklearn.linear_model.LinearRegression()
  7. What is the effect of the following line: df = df.dropna(axis=0) (1 point)
     - Drops all rows
     - Drops all columns
     - Drop rows with null values
     - Drop columns with null values
  8. Following data points represents ___________. (1 point)
     - Positive Correlation
     - Negative Correlation
     - Negative Covariance
     - Zero Covariance
  9. Regression is one of the types of supervised learning models, where data is classified according to labels and output data need not be continuous. (True/False) (1 point)
     - True
     - False
  10. Which of the following is defined as the measure of balance between precision and recall? (1 point)
     - Accuracy
     - F1-score
     - Reliability
     - Punctuality
  11. _____ helps to find the best model that represents our data and how well the chosen model will work in future. (1 point)
     - Evaluation
     - Performance Measure
     - Learning
     - Validation
  12. While evaluating a model's performance, recall parameter considers _____. (1 point)
     - False Positive
     - False Negative
     - True Positive
     - True Negative
  13. Two conditions when prediction matches with the reality are true positive and __________. (1 point)
     - False Positive
     - False Negative
     - True Positive
     - True Negative
  14. Odd man out: Regression, Classification, Clustering (1 point)
     - Regression
     - Classification
     - Clustering
  15. Which of the following talks about how true the predictions are by any model? (1 point)
     - Accuracy
     - Reliability
     - Recall
     - F1-score
  16. Which of the following tasks can be best solved using reinforcement learning? (1 point)
     - Predicting the amount of rainfall based on various cues
     - Detecting fraudulent credit card transactions
     - Training a robot to solve a maze
  17. During linear regression, with regard to residuals, which among the following is true? (1 point)
     - Lower is better
     - Higher is better
     - Depends upon the data
     - None of the above
  18. We can handle missing values in Machine Learning by (1 point)
     - Deleting rows with missing values
     - Replacing with the mean, median, or mode of remaining values in the column
     - Replacing with the most frequent category
     - All of the mentioned
  19. Which of the following is NOT supervised learning? (1 point)
     - PCA
     - Decision Tree
     - Linear Regression
     - Naive Bayesian
  20. A computer program is said to learn if (1 point)
     - It improves with experience
     - It learns from experience
     - It learns from mistakes
     - It learns from supervisor
  21. A well-defined learning problem must include (1 point)
     - Task
     - Performance measure
     - Training experience
     - All of the mentioned
  22. Inductive bias is the assumption made by the learner. (1 point)
     - True
     - False
  23. If X represents a matrix of features, then (1 point)
     - A row in the X represents one data point or one instance
     - A column in the X represents one feature or one attribute
     - All of the mentioned
     - None of the mentioned
  24. Semi-supervised Learning combines a __________ with a __________ during training. (1 point)
     - small amount of labelled data, large amount of unlabelled data
     - small amount of labelled data, small amount of unlabelled data
     - large amount of labelled data, large amount of unlabelled data
     - large amount of labelled data, small amount of unlabelled data
  25. In multiple regression, we have ____ independent variable and _____ dependent variable. (1 point)
     - single, single
     - more than one, single
     - more than one, more than one
     - single, more than one
  26. Entropy([9+,5-]) = ? (1 point)
     - 0.246
     - 0.283
     - 0.94
     - 0.65
  27. Entropy([5+,0-]) = ? (1 point)
     - 0.5
     - 0.25
     - 0
     - 1
     - 0.75
  28. To measure the overall strength of the model in regression analysis, we use _______. (1 point)
     - Factor analysis
     - Coefficient of partial correlation
     - Coefficient of partial regression
     - Coefficient of determination
  29. What is the purpose of performing cross-validation? (1 point)
     - To assess the predictive performance of the models
     - To judge how the trained model performs outside the sample on test data
     - All of the mentioned
     - None of the above
  30. What does p indicate in the following figure? (1 point)
     - Proportion
     - Probability
     - Precision
     - Percentage


Solution

  1. The default value of test_size parameter in train_test_split() is 0.25 (this applies when train_size is also left unset).
  2. The confusion_matrix() function comes under sklearn.metrics module.
  3. Pandas describe() is used to view some basic statistical details like percentile, mean, std etc. of a data frame.
  4. Consider a dataframe df containing two tuples. Then df.head() will return two tuples.
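  5. To select a specific column (say 'col3') from a dataframe (say 'df'), the usual form is df['col3']. Of the listed options, df.col3 returns the same Series, and df[['col3']] returns the column as a one-column dataframe.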
  6. To implement linear regression, we can use sklearn.linear_model.LinearRegression().
  7. The effect of the following line: df = df.dropna(axis=0) is to drop rows with null values.
  8. Without the data points, it's impossible to determine what they represent.
  9. False. Regression is a type of supervised learning model where the output data is continuous; assigning data to discrete labels describes classification.
  10. F1-score is defined as the measure of balance between precision and recall.
  11. Evaluation helps to find the best model that represents our data and how well the chosen model will work in future.
  12. While evaluating a model's performance, recall parameter considers True Positive and False Negative.
  13. Two conditions when prediction matches with the reality are true positive and true negative.
  14. Clustering is the odd one out as it is a type of unsupervised learning, while regression and classification are types of supervised learning.
  15. Accuracy talks about how true the predictions are by any model.
  16. Training a robot to solve a maze can be best solved using reinforcement learning.
  17. During linear regression, with regard to residuals, lower is better.
  18. We can handle missing values in Machine Learning by all of the mentioned methods.
  19. PCA (Principal Component Analysis) is not a supervised learning method.
  20. A computer program is said to learn if it improves with experience.
  21. A well-defined learning problem must include all of the mentioned: task, performance measure, and training experience.
  22. True. Inductive bias is the assumption made by the learner.
  23. If X represents a matrix of features, then both statements are true (all of the mentioned): a row in X represents one data point or instance, and a column in X represents one feature or attribute.
  24. Semi-supervised Learning combines a small amount of labelled data with a large amount of unlabelled data during training.
  25. In multiple regression, we have more than one independent variable and a single dependent variable.
  26. Entropy([9+,5-]) = 0.94.
  27. Entropy([5+,0-]) = 0.
  28. To measure the overall strength of the model in regression analysis, we use the coefficient of determination.
  29. All of the mentioned. The purpose of performing cross-validation is to assess the predictive performance of the models and to judge how the trained model performs outside the sample on test data.
  30. Without the figure, it's impossible to determine what p indicates.
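
The two entropy answers (items 26 and 27) can be checked with the standard Shannon entropy formula, H = -Σ p_i log2(p_i), taking 0·log2(0) as 0. The following is a minimal sketch; the helper name `entropy` is hypothetical and not part of any library.

```python
import math

def entropy(positives, negatives):
    """Shannon entropy (base 2) of a two-class sample given its class counts."""
    total = positives + negatives
    result = 0.0
    for count in (positives, negatives):
        if count:  # an empty class contributes nothing (0 * log2(0) is taken as 0)
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy(9, 5), 2))  # 0.94
print(entropy(5, 0))            # 0.0
```

A sample with only one class present is perfectly pure, which is why Entropy([5+,0-]) comes out as 0.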


Similar Questions

Question 9. Select the correct syntax to obtain the data split that will result in a train set that is 60% of the size of your available data. (1 point)
     - X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.6)
     - X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
     - X_train, y_test = train_test_split(X, y, test_size=0.40)
     - X_train, y_test = train_test_split(X, y, test_size=0.6)
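
Since test_size is the fraction of data held out, a 60% train set corresponds to test_size=0.4, and the function returns four arrays when given X and y. The sketch below illustrates this on placeholder data; the arrays X and y are hypothetical stand-ins, not data from the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical placeholder data: 100 samples, 3 features.
X = np.arange(300).reshape(100, 3)
y = np.arange(100) % 2

# test_size is the held-out fraction, so 0.4 leaves 60% for training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
print(len(X_train), len(X_test))  # 60 40

# With neither test_size nor train_size given, 25% is held out.
X_train_d, X_test_d, _, _ = train_test_split(X, y, random_state=0)
print(len(X_train_d), len(X_test_d))  # 75 25
```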

Which function in scikit-learn is used to split data into training and testing sets?
     - train_test_split()
     - split_data()
     - data_split()
     - train_test()

10.
    import pandas as pd
    from sklearn.preprocessing import train_test_split
    df = pd.read_csv('insurance_claims.csv')
    xtrain, xtest, ytrain, ytest = train_test_split(df.drop("is_claim", axis=1), df.is_claim, test_size=0.3, random_state=42)

Which of the following is true about the code above?
     - The code reads a csv file named insurance_claims. It splits the data into train and test sets. The test split contains 30% of the data. The random state makes sure that the data is split at random to remove inherent order which may be in the data. When the code is run multiple times it produces different splits since `train_test_split` with the parameter `random_state` splits data at random.
     - None of the given answers
     - The code reads a csv file named insurance_claims. The `train_test_split` function will give an error since the second positional argument `df.is_claim` is referencing a column that has been dropped in the first positional argument `df.drop("is_claim", axis=1)`.
     - The code reads a csv file named insurance_claims. It splits the data into train and test sets. The train split contains 70% of the data. The random state makes sure that when the code is run multiple times it produces the same identical splits since `train_test_split` splits data at random.
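
Two points in this question can be checked directly. In scikit-learn, train_test_split is provided by sklearn.model_selection (the snippet in the question imports it from sklearn.preprocessing), and a fixed random_state makes the shuffle repeatable. The sketch below uses small placeholder arrays rather than the insurance_claims.csv file, which is not available here.

```python
import numpy as np
from sklearn.model_selection import train_test_split  # not sklearn.preprocessing

# Hypothetical placeholder data: 20 samples, 2 features.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# Two independent calls with the same random_state.
first = train_test_split(X, y, test_size=0.3, random_state=42)
second = train_test_split(X, y, test_size=0.3, random_state=42)

# Every returned array matches, so the split is reproducible.
same = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same)  # True
print(len(first[0]), len(first[1]))  # 14 6
```

Note also that df.drop(...) returns a new dataframe and leaves df unchanged unless inplace=True is passed, so df.is_claim is still available as the target.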

This question refers to the following code snippet, which assumes that all required libraries have been imported.

    Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3)
    yhat = GaussianNB().fit(Xtrain, ytrain).predict(Xtest)
    acc = accuracy_score(ytest, yhat)

This code uses _____ with _____ of available data used for training. It outputs the _____ based on _____. Every time we run this code, we will get _____.
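
To see how this snippet behaves, it can be run a few times on synthetic data. The sketch below assumes make_classification as a stand-in for the X and y that the question takes as given; the dataset size and feature count are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for the X and y that the question assumes already exist.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

scores = []
for _ in range(3):
    # No random_state: each call shuffles differently, so the split changes.
    Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3)
    yhat = GaussianNB().fit(Xtrain, ytrain).predict(Xtest)
    scores.append(accuracy_score(ytest, yhat))

print(len(Xtrain), len(Xtest))  # 140 60
print(scores)  # three accuracies in [0, 1]; they usually differ between runs
```

With test_size=0.3, 70% of the data is used for training, and because no random_state is set the accuracy can change from run to run.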

What is the value of True Positive (TP) in the confusion matrix generated by the RandomForestClassifier below? Modify the code to print the value.

    from sklearn.metrics import confusion_matrix
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier

    # Generate synthetic binary classification dataset
    X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

    # Split the dataset into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

    # Initialise and train the RandomForestClassifier
    rf_classifier = RandomForestClassifier(random_state=42)
    rf_classifier.fit(X_train, y_train)

    # Predict the test set results
    y_pred = rf_classifier.predict(X_test)

    # Generate the confusion matrix
    cm = confusion_matrix(y_test, y_pred)

    # insert code here
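
One possible way to complete the "insert code here" step is sketched below. For binary labels 0 and 1, scikit-learn arranges the confusion matrix as [[TN, FP], [FN, TP]], so the true-positive count is the bottom-right cell. The exact number printed may vary with the installed scikit-learn version, so no value is quoted here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Same setup as in the question.
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

rf_classifier = RandomForestClassifier(random_state=42)
rf_classifier.fit(X_train, y_train)
y_pred = rf_classifier.predict(X_test)

cm = confusion_matrix(y_test, y_pred)

# Flatten the 2x2 matrix row by row: TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print("True Positives:", tp)
```

Indexing with cm[1, 1] gives the same value; unpacking all four cells just makes the layout explicit.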
