Question
Suppose that you are employed as a data mining consultant for an Internet search engine company. Describe how data mining can help the company by giving specific examples of how techniques such as clustering, classification, association rule mining, and anomaly detection can be applied.
Discuss whether or not each of the following activities is a data mining task.
- Dividing the customers of a company according to their gender.
- Dividing the customers of a company according to their profitability.
- Computing the total sales of a company.
- Sorting a student database based on student identification numbers.
- Predicting the outcomes of tossing a (fair) pair of dice.
- Predicting the future stock price of a company using historical records.
- Monitoring the heart rate of a patient for abnormalities.
- Monitoring seismic waves for earthquake activities.
- Extracting the frequencies of a sound wave.
For each of the following data sets, explain whether or not data privacy is an important issue.
- Census data collected from 1900-1950.
- IP addresses and visit times of Web users who visit your Website.
- Images from Earth-orbiting satellites.
- Names and addresses of people from the telephone book.
- Names and email addresses collected from the Web.
Solution
Data mining can help an Internet search engine company in several ways:
- Clustering: This technique groups similar data objects together. For example, the company can cluster similar search queries to understand what kinds of queries users make, and then use those groups to improve the search engine's results.
- Classification: This technique predicts the category of a given data object. For example, the company can use classification to predict the type of content a user is likely to click on based on their search query, which helps it return more relevant search results.
- Association Rule Mining: This technique discovers relationships between items that frequently occur together. For example, the company can use association rule mining to discover that users who search for 'X' also tend to search for 'Y', which helps it provide better search suggestions.
- Anomaly Detection: This technique identifies data that deviates from the normal pattern. For example, the company can use anomaly detection to identify unusual search queries that could indicate spam or malicious activity.
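As a rough illustration of the association rule example above, the following Python sketch counts how often pairs of queries appear in the same search session and reports rules of the form "sessions containing X also contain Y". The function name `mine_query_rules`, the sample sessions, and the support and confidence thresholds are all assumptions made for this example, not part of the original question; a production system would more likely use an algorithm such as Apriori over far larger logs.

```python
from collections import Counter
from itertools import permutations


def mine_query_rules(sessions, min_support=0.4, min_confidence=0.6):
    """Return (x, y, support, confidence) for rules 'x -> y' over query sessions.

    support    = fraction of sessions that contain both x and y
    confidence = fraction of sessions containing x that also contain y
    """
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        queries = set(session)  # count each query at most once per session
        item_counts.update(queries)
        # Ordered pairs, so that x -> y and y -> x are scored separately.
        pair_counts.update(permutations(sorted(queries), 2))

    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n
        confidence = count / item_counts[x]
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical search sessions, invented purely for illustration.
sessions = [
    ["flights", "hotels", "car rental"],
    ["flights", "hotels"],
    ["flights", "weather"],
    ["hotels", "flights"],
    ["weather"],
]

for x, y, support, confidence in mine_query_rules(sessions):
    print(f"{x} -> {y}: support={support:.2f}, confidence={confidence:.2f}")
```

With this toy data, only the rules between "flights" and "hotels" pass both thresholds, which is the kind of relationship that could feed a search suggestion feature.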
Now, let's discuss whether the following activities are data mining tasks:
- Dividing the customers of a company according to their gender: This is not a data mining task. It is a simple database query on a known attribute.
- Dividing the customers of a company according to their profitability: If profitability is already recorded, this is a simple calculation followed by a threshold and is not a data mining task. It becomes a data mining task if profitability is not directly known and must be inferred from other data, for example when predicting the profitability of a new customer.
- Computing the total sales of a company: This is not a data mining task. This is a simple aggregation of data.
- Sorting a student database based on student identification numbers: This is not a data mining task. This is a simple sorting operation.
- Predicting the outcomes of tossing a (fair) pair of dice: This is not a data mining task. Because the dice are fair, the probability of each outcome is known in advance, so this is a probability calculation and no amount of data can improve the prediction.
- Predicting the future stock price of a company using historical records: This is a data mining task. It is predictive modeling, in which historical data is used to build a model that forecasts future values.
- Monitoring the heart rate of a patient for abnormalities: This is a data mining task when 'abnormal' has no fixed definition and must be inferred from data: a model of the patient's normal heart rate is built, and readings that deviate from it are flagged (anomaly detection). If abnormality is defined by a fixed threshold, it is a simple comparison.
- Monitoring seismic waves for earthquake activities: This is a data mining task. It involves analyzing complex data to identify patterns that could indicate an earthquake.
- Extracting the frequencies of a sound wave: This is not a data mining task. This is a signal processing task.
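To make the heart rate item more concrete, here is a minimal Python sketch of one way 'abnormal' could be inferred from data: estimate the patient's normal range from a baseline window, then flag readings that fall too many standard deviations away. The function name `flag_abnormal`, the sample values, and the threshold of 3 standard deviations are assumptions chosen for illustration; this is a simple z-score rule, not a clinical method.

```python
from statistics import mean, stdev


def flag_abnormal(baseline, readings, threshold=3.0):
    """Return (index, bpm) for readings far from the baseline mean.

    A reading is flagged when it lies more than `threshold` sample
    standard deviations from the mean of the baseline readings.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # needs at least two baseline readings
    return [
        (i, bpm)
        for i, bpm in enumerate(readings)
        if abs(bpm - mu) > threshold * sigma
    ]


# Hypothetical resting heart rates in beats per minute (illustrative only).
baseline = [72, 75, 71, 74, 73, 76, 70, 74, 72, 73]
readings = [74, 71, 118, 73, 45, 75]

print(flag_abnormal(baseline, readings))  # [(2, 118), (4, 45)]
```

The key point is that the boundary between normal and abnormal is learned from the patient's own data rather than fixed in advance, which is what makes the task anomaly detection rather than a plain threshold check.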
Finally, let's discuss whether data privacy is an important issue for the following data sets:
- Census data collected from 1900-1950: Data privacy is a limited concern here because the data is old and many of the individuals recorded are no longer living, although people counted in the later years may still be alive.
- IP addresses and visit times of Web users who visit your Website: Data privacy is a major concern here as this data can be used to track individual users.
- Images from Earth-orbiting satellites: Data privacy is generally not a major concern here as the images are not likely to contain personally identifiable information.
- Names and addresses of people from the telephone book: Data privacy is a concern here because the data is personally identifiable, although it is less sensitive than the other personal data sets since a telephone book is already published.
- Names and email addresses collected from the Web: Data privacy is a major concern here as this data contains personally identifiable information and can be misused, for example to send spam.