You want to predict if a hospital patient is in danger of cardiac arrest and needs to be moved to the ICU. You have collected data from multiple patients consisting of vital signs with labels of whether or not the patient experienced cardiac arrest. Some of the data was collected on the same patient who may have experienced multiple cardiac arrests at different times. How would you split the data effectively into a training set and a validation set?Group of answer choicesYou should split by age of patient.You should split uniformly at random by visit.You should split uniformly at random by patient.You should split by the presence of cardiac arrest in medical history.
Question
You want to predict if a hospital patient is in danger of cardiac arrest and needs to be moved to the ICU. You have collected data from multiple patients consisting of vital signs with labels of whether or not the patient experienced cardiac arrest. Some of the data was collected on the same patient who may have experienced multiple cardiac arrests at different times. How would you split the data effectively into a training set and a validation set?Group of answer choicesYou should split by age of patient.You should split uniformly at random by visit.You should split uniformly at random by patient.You should split by the presence of cardiac arrest in medical history.
Solution
You should split uniformly at random by patient. This is because if a patient's data is in both the training and validation set, the model might simply memorize the patient's data and perform well on the validation set, giving a false sense of accuracy. By ensuring all of a patient's data is in either the training or validation set, but not both, you can better evaluate the model's ability to generalize to unseen patients.
Similar Questions
A data analyst trained a machine learning model to classify whether a transaction is fraudulent or not. The model had a training score of 89% and a test score of 92% percent. It was suggested that the high score on the test set may be due to the fact that the test split was easy to predict on. The analyst was then advised to use cross validation so that he gets metrics on different and random splits. Which of the following is/are the possible algorithms that can be used?- i. KFold cross validation- ii. Shuffle Split Cross validation- iii. Grid search cross validation- iv. Linear regressioni. onlyi. and ii onlyiii. and iv onlyiii. onlyiv. onlyii. only
5. During a pandemic, hospitals have to decide quickly which patients should be admitted for further treatment. The hospitals have already manually scanned through a small number of medical records (including data such as daily blood pressure, daily body temperature) and decided which patients should be admitted. The hospitals would like to explore using AI to scan through more medical records of patients for making admissions decisions. Which type of technique should they use?在一次傳染病全球大流行期間,醫院需要迅速地決定病人應否入院作進一步的治療。現時醫院已透過工作人員檢視小部份病人的醫療紀錄(數據包括:每日血壓及體溫)以決定病人應否入院。醫院希望研究運用人工智能檢視更多病人醫療紀錄用作決定病人入院的需要。醫院應使用以下哪一種技術?Classification algorithm分類演算法Convolution neural network 卷積神經網路Clustering algorithm分群演算法Object detection 物件偵測
You want to classify images of dogs from cats. You have collected 2,000 images of dogs and 2,000 images of cats. How would you split the data effectively into a training set and a validation set?Group of answer choicesYou should split by light or dark fur color.You should split by whether the image contains a cat or dog.You should split by high- or low-quality images.You should split uniformly at random.
You want to classify Twitter posts, or tweets, by sentiment. You have collected two months' worth of tweets which were labeled by volunteers as positive or negative. How would you split the data effectively into a training set and a validation set?Group of answer choicesYou should split at a single point in time; e.g., before or after a certain date.You should split by time of day; e.g., before or after noon every day.You should split by length of tweet; e.g., more or less than 40 characters.You should split uniformly at random.
What is the main objective to split the time series data in to training and test set?Answer choicesSelect only one optionREVISITBoth,training and test set are used to train the forecasting modelThe training set is used to train the forecasting model and the Test set is used to validate the modelThe training set is used to train and evaluate the model and the test set is used to check the model performanceThe training set is used to train the forecasting model and the test set is to retrain the forecasting model
Upgrade your grade with Knowee
Get personalized homework help. Review tough concepts in more detail, or go deeper into your topic by exploring other relevant questions.