

Question

Write Python code to count the frequency of hashtags in a Twitter feed.

Your code assumes a Twitter feed variable tweets exists, which is a list of strings containing tweets. Each element of this list is a single tweet, stored as a string. For example, tweets may look like:

tweets = ["Happy #IlliniFriday!", "It is a pretty campus, isn't it, #illini?", "Diving into the last weekend of winter break like... #ILLINI #JoinTheFight", "Are you wearing your Orange and Blue today, #Illini Nation?"]

Your code should produce a sorted list of tuples stored in hashtag_counts, where each tuple looks like (hashtag, count), hashtag is a string and count is an integer. The list should be sorted by count in descending order, and if there are hashtags with identical counts, these should be sorted alphabetically, in ascending order, by hashtag.

From the above example, our unsorted hashtag_counts might look like:

[('#illini', 2), ('#jointhefight', 1), ('#illinifriday!', 1), ('#illini?', 1)]

The hashtag_counts sorted by the above specifications will look like:

[('#illini', 2), ('#illini?', 1), ('#illinifriday!', 1), ('#jointhefight', 1)]

You may use str.split() to split each tweet into a list of words. A hashtag is any word that starts with a hash mark (#). (That means that the hash mark # should be included in the hashtag value above.)

Steps/Hints:

Preprocessing: You will need to convert each hashtag to lower case before you count it. For example, for this question #UIUC and #Uiuc add to the count of the same hashtag (#uiuc).

Do not further process the tweets or hashtags beyond using .split(), such as attempting to remove punctuation. While in the 'real world' you would absolutely do this, in this problem the autograder will be unhappy with you if you do.

If using .split(), do not pass any arguments (when no arguments are passed, every kind of whitespace is treated as a separator).

You may find it helpful to use an intermediate data structure for this problem to count the frequency of each hashtag.

If you aren't sure how to sort or convert to lowercase, you may find the Python docs on how to sort and the Python docs for string methods useful.
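As a small illustration of the preprocessing rules in the question, the sketch below runs split() with no arguments and lower() on two of the example tweets. The variable names (tweet, words, hashtags, punctuated) are chosen here for illustration only and are not part of the assignment.

```python
# One tweet taken from the question's example list.
tweet = "Diving into the last weekend of winter break like... #ILLINI #JoinTheFight"

# split() with no arguments splits on any run of whitespace.
words = tweet.split()

# Keep only words that start with '#', lowercased; nothing else is changed.
hashtags = [word.lower() for word in words if word.startswith("#")]
print(hashtags)  # ['#illini', '#jointhefight']

# Trailing punctuation stays part of the hashtag, as the question requires.
punctuated = [w.lower() for w in "Happy #IlliniFriday!".split() if w.startswith("#")]
print(punctuated)  # ['#illinifriday!']
```

Because punctuation is not stripped, '#illini' and '#illini?' are counted as two different hashtags in this problem.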


Solution

Here is a step-by-step solution to your problem:

  1. First, we need to initialize an empty dictionary to store the hashtags and their counts.

  2. Then, we iterate over each tweet in the list of tweets.

  3. For each tweet, we split it into words using the split() method.

  4. We then iterate over each word in the list of words.

  5. If a word starts with '#', we convert it to lowercase and use it as a dictionary key: if the hashtag is already in the dictionary, we increment its count; otherwise, we add it with a count of 1.

  6. After counting all the hashtags, we convert the dictionary to a list of tuples, where each tuple contains a hashtag and its count.

  7. Finally, we sort the list of tuples first by count in descending order, and then by hashtag in ascending order.
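The two-level ordering in step 7 can be expressed as a single sort key. The sketch below is one way to do it, applied to the unsorted example list from the question: negating the count makes an ascending sort put larger counts first, and the hashtag string breaks ties in ascending order. The names unsorted_counts and sorted_counts are illustrative.

```python
# Unsorted example taken from the question.
unsorted_counts = [
    ("#illini", 2),
    ("#jointhefight", 1),
    ("#illinifriday!", 1),
    ("#illini?", 1),
]

# Sort key: (-count, hashtag). Tuples compare element by element, so the
# negated count is compared first and the hashtag only matters on ties.
sorted_counts = sorted(unsorted_counts, key=lambda pair: (-pair[1], pair[0]))
print(sorted_counts)
# [('#illini', 2), ('#illini?', 1), ('#illinifriday!', 1), ('#jointhefight', 1)]
```

Note that '#illini?' sorts before '#illinifriday!' because '?' has a lower code point than 'f'.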

Here is the Python code that implements these steps:

# Initialize an empty dictionary
hashtag_counts = {}

# Iterate over each tweet
for tweet in tweets:
    # Split the tweet into words
    words = tweet.split()
    # Iterate over each word
    for word in words:
        # If the word is a hashtag
        if word.startswith('#'):
            # Convert the hashtag to lowercase
            hashtag = word.lower()
            # If the hashtag is in the dictionary, increment its count
            if hashtag in hashtag_counts:
                hashtag_counts[hashtag] += 1
            # If the hashtag is not in the dictionary, add it with a count of 1
            else:
                hashtag_counts[hashtag] = 1

# Convert the dictionary to a list of tuples
hashtag_counts = list(hashtag_counts.items())

# Sort the list of tuples by count in descending order and then by hashtag in ascending order
hashtag_counts.sort(key=lambda x: (-x[1], x[0]))

This code will give you the sorted list of tuples stored in hashtag_counts, where each tuple contains a hashtag and its count.
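For comparison, here is a more compact sketch of the same approach that uses collections.Counter from the standard library as the intermediate data structure. The function name count_hashtags is hypothetical and introduced only for this example; the assignment itself expects the result in the variable hashtag_counts.

```python
from collections import Counter


def count_hashtags(tweets):
    """Return (hashtag, count) tuples sorted by count desc, then hashtag asc."""
    # Count every lowercased word that starts with '#'; no other cleanup.
    counts = Counter(
        word.lower()
        for tweet in tweets
        for word in tweet.split()
        if word.startswith("#")
    )
    # Negated count gives descending order; the hashtag breaks ties.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Example tweets from the question.
tweets = [
    "Happy #IlliniFriday!",
    "It is a pretty campus, isn't it, #illini?",
    "Diving into the last weekend of winter break like... #ILLINI #JoinTheFight",
    "Are you wearing your Orange and Blue today, #Illini Nation?",
]

hashtag_counts = count_hashtags(tweets)
print(hashtag_counts)
# [('#illini', 2), ('#illini?', 1), ('#illinifriday!', 1), ('#jointhefight', 1)]
```

The printed result matches the sorted list given in the question, which is a reasonable sanity check before submitting to the autograder.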


