The data

The dataset we will be using is called MNIST. It is a large collection of hand-drawn digits from 0 to 9, and it is a good dataset for learning image classification because it requires little to no preprocessing.

The dataset can be downloaded from The MNIST Database. Download all four files. These files contain the images and their respective labels. Normally we would have to separate X (the image data, or features) from y (the labels) during preprocessing, but this has already been done for us. The dataset has also already been split into a train set and a test set.

Once you've downloaded the data, make sure that the files are in the same folder as this Jupyter notebook. If you've managed to do all that, we can now begin!

By default, the MNIST files are compressed in the gzip format. The following two functions will extract the data for you. Don't change this code.

In [2]:

    # Imports the two functions rely on (presumably loaded in an earlier cell).
    import gzip
    import numpy as np

    def extract_data(filename, num_images, IMAGE_WIDTH):
        """Extract the images into a 2D array [image index, flattened pixels]."""
        with gzip.open(filename) as bytestream:
            bytestream.read(16)  # skip the 16-byte header of the image file
            buf = bytestream.read(IMAGE_WIDTH * IMAGE_WIDTH * num_images)
            data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
            data = data.reshape(num_images, IMAGE_WIDTH * IMAGE_WIDTH)
        return data

    def extract_labels(filename, num_images):
        """Extract the labels into a vector of int64 label IDs."""
        with gzip.open(filename) as bytestream:
            bytestream.read(8)  # skip the 8-byte header of the label file
            buf = bytestream.read(1 * num_images)
            labels = np.frombuffer(buf, dtype=np.uint8).astype(np.int64)
        return labels

Challenge 1: Extracting the data

The MNIST dataset consists of 60,000 training images and 10,000 testing images. This is a lot of data! Let's not extract all of that right now. Create a function get_data that uses the above functions to extract a certain number of images and their labels from the gzip files.

The function will take as input two integer values: the number of train images and the number of test images to be extracted. Let's extract 5000 train images and 1000 test images.

The function then returns four arrays as two tuples, (X_train, y_train), (X_test, y_test), where (X_train, y_train) are the extracted images and labels of the training set, and (X_test, y_test) are the extracted images and labels of the testing set. (Hint: you'll have to use the functions provided more than once.)

Image pixel values range from 0 to 255. We need to normalise the image pixels so that they are in the range 0 to 1.

Function specifications:

- Should take two integers as input, one representing the number of training images and the other the number of testing images.
- Should return two tuples of the form (X_train, y_train), (X_test, y_test).

Note that the size of the MNIST images is 28x28.

Usually when setting up your dataset, it is a good idea to randomly shuffle your data in case your data are ordered. Think of this as shuffling a pack of cards. Here, however, we aren't going to shuffle the data, so that all our answers are the same.
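
The challenge itself skips shuffling, but for reference, here is a minimal sketch of what shuffling images and labels together could look like. The helper name shuffle_in_unison, the seed argument and the toy arrays are all hypothetical and not part of the notebook. The key point is that one permutation is applied to both arrays, so each image stays paired with its label.

```python
import numpy as np


def shuffle_in_unison(X, y, seed=0):
    """Return shuffled copies of X and y, keeping each row paired with its label."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    rng = np.random.default_rng(seed)  # fixed seed gives a reproducible order
    order = rng.permutation(len(X))    # one random ordering of the row indices
    return X[order], y[order]          # same ordering applied to both arrays


# Toy stand-in for image data: 5 "images" of 4 pixels each, with labels 0..4.
# Row k of X_demo starts with the value 4 * k, so pairings are easy to check.
X_demo = np.arange(20, dtype=np.float32).reshape(5, 4)
y_demo = np.arange(5, dtype=np.int64)
X_shuf, y_shuf = shuffle_in_unison(X_demo, y_demo)
```

Shuffling X and y separately would break the pairing between images and labels, which is why a single index permutation is used for both.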

Solution
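
One possible way to write get_data is sketched below; it is not an official answer. It assumes the four downloads keep their usual MNIST file names (train-images-idx3-ubyte.gz and so on) and adds an optional data_dir argument, which the question does not ask for, so that the files can live somewhere other than the notebook folder. The two helper functions are repeated from the question only so that the block runs on its own.

```python
import gzip
import os

import numpy as np

IMAGE_WIDTH = 28  # MNIST images are 28x28 pixels


def extract_data(filename, num_images, image_width):
    """Helper repeated from the question: flattened float32 images."""
    with gzip.open(filename) as bytestream:
        bytestream.read(16)  # skip the image-file header
        buf = bytestream.read(image_width * image_width * num_images)
    data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
    return data.reshape(num_images, image_width * image_width)


def extract_labels(filename, num_images):
    """Helper repeated from the question: int64 label IDs."""
    with gzip.open(filename) as bytestream:
        bytestream.read(8)  # skip the label-file header
        buf = bytestream.read(num_images)
    return np.frombuffer(buf, dtype=np.uint8).astype(np.int64)


def get_data(num_train_images, num_test_images, data_dir="."):
    """Return (X_train, y_train), (X_test, y_test) with pixels scaled to [0, 1].

    data_dir is an assumed convenience argument; the file names below are
    assumed to be the unmodified names of the four downloaded files.
    """
    train_images = os.path.join(data_dir, "train-images-idx3-ubyte.gz")
    train_labels = os.path.join(data_dir, "train-labels-idx1-ubyte.gz")
    test_images = os.path.join(data_dir, "t10k-images-idx3-ubyte.gz")
    test_labels = os.path.join(data_dir, "t10k-labels-idx1-ubyte.gz")

    # Pixel values are 0..255, so dividing by 255 maps them onto 0..1.
    X_train = extract_data(train_images, num_train_images, IMAGE_WIDTH) / 255.0
    y_train = extract_labels(train_labels, num_train_images)
    X_test = extract_data(test_images, num_test_images, IMAGE_WIDTH) / 255.0
    y_test = extract_labels(test_labels, num_test_images)

    # No shuffling, as the question requests, so results are reproducible.
    return (X_train, y_train), (X_test, y_test)
```

With the four files next to the notebook, the call requested in the question would be (X_train, y_train), (X_test, y_test) = get_data(5000, 1000). Because the helpers flatten each image, X_train would then have shape (5000, 784) and X_test shape (1000, 784).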
