Question

The file ScottMorrisonMP.json contains tweets from April 10th, 2015 until August 3rd, 2020. The data are organised as follows:

'created_at', 'favorite_count', 'followers_count', 'full_text', 'id', 'retweet_count', 'source'

- created_at: the date and time the tweet was created
- favorite_count: number of people who liked the tweet
- followers_count: number of Twitter users that follow Scott Morrison
- full_text: full text of the tweet
- id: tweet ID
- retweet_count: number of times a tweet has been retweeted
- source: the device used to send the tweet

Question 1

Write a Python script to:

- import the required libraries
- load the data (ScottMorrisonMP.json)
- convert to a data frame called tweet_df
- find the 5 most retweeted tweets and order them by the most likes, and display only the tweet text (full_text), number of times the tweet has been retweeted (retweet_count), the date and time of the tweet (created_at) and number of people who liked the tweet (favorite_count), in this order. Save this into a dataframe called top5.

The data are in your working directory.

PLEASE LEAVE THE COMMENT ### edTest(test_q1) ### in the code box below for auto-marking.

Hint

- import json
- use the "open("ScottMorrisonMP.json")" and "json.load()" functions
- use the "pandas.DataFrame.from_records()" function


Solution

Here is a Python script that accomplishes the tasks you've outlined:

# Import required libraries
import pandas as pd
import json

# Load the data
with open('ScottMorrisonMP.json') as f:
    data = json.load(f)

# Convert to a DataFrame
tweet_df = pd.DataFrame.from_records(data)

# Find the 5 most retweeted tweets and order them by the most likes
top5 = tweet_df.nlargest(5, 'retweet_count').sort_values(by='favorite_count', ascending=False)

# Display only the tweet text, number of times the tweet has been retweeted, the date and time of the tweet, and number of people who liked the tweet
top5 = top5[['full_text', 'retweet_count', 'created_at', 'favorite_count']]

### edTest(test_q1) ###

This script first imports the necessary libraries, then loads the JSON data from the file 'ScottMorrisonMP.json'. It converts this data into a pandas DataFrame, then finds the five tweets with the most retweets and sorts these by the number of likes. It then selects only the columns of interest ('full_text', 'retweet_count', 'created_at', 'favorite_count') and saves this into a new DataFrame called 'top5'.
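The order of the two steps matters: the script picks the most retweeted rows first and only then reorders that subset by likes. The sketch below illustrates this on a small made-up sample. The records and the names sample_df and top2 are hypothetical, since the contents of ScottMorrisonMP.json are not shown in the question.

```python
import pandas as pd

# Hypothetical records with the same keys as the columns used in the solution.
records = [
    {"full_text": "a", "retweet_count": 50, "created_at": "2020-01-01 10:00:00", "favorite_count": 10},
    {"full_text": "b", "retweet_count": 40, "created_at": "2020-01-02 11:00:00", "favorite_count": 90},
    {"full_text": "c", "retweet_count": 5, "created_at": "2020-01-03 12:00:00", "favorite_count": 500},
    {"full_text": "d", "retweet_count": 30, "created_at": "2020-01-04 13:00:00", "favorite_count": 20},
]
sample_df = pd.DataFrame.from_records(records)

# Step 1: keep the 2 most retweeted rows (the solution keeps 5).
# Step 2: order that subset by likes, highest first.
# Step 3: select the columns in the requested order.
top2 = (
    sample_df.nlargest(2, "retweet_count")
    .sort_values(by="favorite_count", ascending=False)
    [["full_text", "retweet_count", "created_at", "favorite_count"]]
)

print(top2)
```

Tweet "c" has by far the most likes but is not among the most retweeted, so it does not appear in the result. When several rows tie on retweet_count, nlargest keeps the first ones it encounters by default (keep='first').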


