Question
The data

The dataset we will be using is called MNIST. It is a large collection of hand-drawn digits, 0 to 9, and is a good dataset for learning image classification because it requires little to no preprocessing.

The dataset can be downloaded from The MNIST Database. Download all four files. These files contain the images and their respective labels. (Normally we would have to split the x (image data / characteristics) and y (labels) out during preprocessing, but this has already been done for us.) The dataset has also already been split into a train and a test set.

Once you've downloaded the data, make sure that the files are in the same folder as this Jupyter notebook. If you've managed to do all that, we can now begin!

By default, the MNIST files are compressed in the gzip format. The following two functions will extract the data for you. Don't change this code.

```python
import gzip

import numpy as np


def extract_data(filename, num_images, IMAGE_WIDTH):
    """Extract the images into a 2D array [image index, flattened pixels]."""
    with gzip.open(filename) as bytestream:
        bytestream.read(16)  # skip the 16-byte header of the image file
        buf = bytestream.read(IMAGE_WIDTH * IMAGE_WIDTH * num_images)
        data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
        data = data.reshape(num_images, IMAGE_WIDTH * IMAGE_WIDTH)
        return data


def extract_labels(filename, num_images):
    """Extract the labels into a vector of int64 label IDs."""
    with gzip.open(filename) as bytestream:
        bytestream.read(8)  # skip the 8-byte header of the label file
        buf = bytestream.read(1 * num_images)
        labels = np.frombuffer(buf, dtype=np.uint8).astype(np.int64)
    return labels
```

Challenge 1: Extracting the data

The MNIST dataset consists of 60,000 training images and 10,000 testing images. This is a lot of data! Let's not extract all of that right now. Create a function get_data that uses the above functions to extract a certain number of images and their labels from the gzip files.

The function will take as input two integer values: the number of train images and the number of test images to be extracted. Let's extract 5000 train images and 1000 test images.

The function then returns four variables in the form (X_train, y_train), (X_test, y_test), where (X_train, y_train) are the extracted images and labels of the training set, and (X_test, y_test) are the extracted images and labels of the testing set. (Hint: you'll have to use the provided functions more than once.)

Image pixel values range from 0 to 255. We need to normalise the image pixels so that they are in the range 0 to 1.

Function specifications:

- Should take two integers as input, one representing the number of training images and the other the number of testing images.
- Should return two tuples of the form (X_train, y_train), (X_test, y_test).

Note that the size of the MNIST images is 28x28.

Usually when setting up your dataset, it is a good idea to randomly shuffle your data in case your data are ordered. Think of this as shuffling a pack of cards. Here, however, we aren't going to shuffle the data, so that all our answers are the same.
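The provided functions skip 16 bytes in the image file and 8 bytes in the label file before reading any data. These are the file headers: in the IDX format that MNIST uses, each file starts with a 4-byte big-endian magic number whose last byte gives the number of dimensions, followed by one 4-byte size per dimension. The sketch below reads the header instead of skipping it, which can be a handy check that the right file was downloaded. The name read_idx_header is a hypothetical helper and not part of the notebook.

```python
import gzip
import struct


def read_idx_header(filename):
    """Return (magic number, dimension sizes) from a gzipped IDX file.

    Hypothetical helper for inspection only; not part of the notebook.
    """
    with gzip.open(filename) as bytestream:
        # The magic number is a big-endian 32-bit integer.
        magic = struct.unpack(">I", bytestream.read(4))[0]
        # Its lowest byte holds the number of dimensions (3 for images, 1 for labels).
        num_dims = magic & 0xFF
        # Each dimension size is another big-endian 32-bit integer.
        dims = struct.unpack(">" + "I" * num_dims, bytestream.read(4 * num_dims))
    return magic, dims
```

For an image file the header is 4 + 3 × 4 = 16 bytes (magic number, image count, rows, columns), and for a label file it is 4 + 1 × 4 = 8 bytes (magic number, label count), which matches the two read calls in the provided functions.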
Solution
The text is asking you to create a function named get_data that will extract a specified number of training and testing images from the MNIST dataset. The MNIST dataset is a large collection of hand-drawn digits from 0 to 9, and it's a good dataset for learning image classification.
Here are the steps to follow:
- Download the MNIST dataset from The MNIST Database. The dataset is split into training and testing sets, and each set includes images and their respective labels.
- Place the downloaded files in the same folder as your Jupyter notebook.
- Use the provided extract_data and extract_labels functions to extract the images and labels from the gzip files. These functions read the data from the gzip files and convert them into a format that can be used for machine learning.
- Create the get_data function. This function should take two integers as input, representing the number of training images and the number of testing images to extract. The function should use the extract_data and extract_labels functions to extract the specified number of images and labels from the gzip files.
- The get_data function should return two tuples: (X_train, y_train) and (X_test, y_test). X_train and X_test are the extracted images for the training and testing sets, respectively, and y_train and y_test are the corresponding labels.
- Normalize the image pixels so that they are in the range 0 to 1. This is because the pixel values in the images range from 0 to 255, and normalizing these values will make it easier for the machine learning model to process the data.
- Although it's usually a good idea to randomly shuffle your data to ensure that the model doesn't learn any unintended patterns from the order of the data, you're not going to do this in this case so that everyone's answers are the same.
Remember, the size of the MNIST images is 28x28.
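The steps above can be put together roughly as follows. This is a minimal sketch rather than the official answer to the challenge. It assumes the four archives keep their standard MNIST file names, and it adds a hypothetical data_dir keyword (not in the specification) so the folder can be changed. The two provided helpers are repeated so that the block runs on its own.

```python
import gzip
import os

import numpy as np

IMAGE_WIDTH = 28  # MNIST images are 28x28 pixels

# Assumed file names (the standard MNIST archive names); adjust if yours differ.
TRAIN_IMAGES = "train-images-idx3-ubyte.gz"
TRAIN_LABELS = "train-labels-idx1-ubyte.gz"
TEST_IMAGES = "t10k-images-idx3-ubyte.gz"
TEST_LABELS = "t10k-labels-idx1-ubyte.gz"


def extract_data(filename, num_images, IMAGE_WIDTH):
    """Provided helper, repeated here so the sketch is self-contained."""
    with gzip.open(filename) as bytestream:
        bytestream.read(16)  # skip the image-file header
        buf = bytestream.read(IMAGE_WIDTH * IMAGE_WIDTH * num_images)
        data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
        return data.reshape(num_images, IMAGE_WIDTH * IMAGE_WIDTH)


def extract_labels(filename, num_images):
    """Provided helper, repeated here so the sketch is self-contained."""
    with gzip.open(filename) as bytestream:
        bytestream.read(8)  # skip the label-file header
        buf = bytestream.read(1 * num_images)
        return np.frombuffer(buf, dtype=np.uint8).astype(np.int64)


def get_data(num_train_images, num_test_images, data_dir="."):
    """Return (X_train, y_train), (X_test, y_test) with pixels scaled to [0, 1].

    data_dir is a hypothetical convenience parameter, not part of the spec.
    """
    # Each helper is called twice: once for the training files, once for the test files.
    X_train = extract_data(os.path.join(data_dir, TRAIN_IMAGES), num_train_images, IMAGE_WIDTH)
    y_train = extract_labels(os.path.join(data_dir, TRAIN_LABELS), num_train_images)
    X_test = extract_data(os.path.join(data_dir, TEST_IMAGES), num_test_images, IMAGE_WIDTH)
    y_test = extract_labels(os.path.join(data_dir, TEST_LABELS), num_test_images)

    # Pixel values are 0..255, so dividing by 255 maps them onto 0..1.
    X_train = X_train / 255.0
    X_test = X_test / 255.0

    # No shuffling, so that everyone gets the same answers.
    return (X_train, y_train), (X_test, y_test)
```

With the files in the notebook's folder, the call for this challenge would be (X_train, y_train), (X_test, y_test) = get_data(5000, 1000). Each image comes back as a flattened row of 28 × 28 = 784 values, because that is the shape extract_data produces.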