Azure ML Studio and Tableau

For the first assignment, download the Tableau workbook file below and complete the instructions inside it using the embedded data (no dataset upload is required). Submit the completed Tableau workbook.

For the second assignment, create a basic machine learning experiment in Azure ML Studio to predict one of the candidate labels listed below.

Potential features:

Gender: the gender of the person who posted the tweet

Country or State: the location where the tweet originated

Weekday, Day, Hour: when the tweet was posted

Klout: a score representing how “popular” or “important” the person who posted the tweet is

Sentiment: a score representing the tone of the tweet text

Reach: how many people had viewed the tweet at the time the data was collected

IsReshare: whether or not the tweet was a reshare of another tweet

RetweetCount: the number of “Retweets” other users had given the tweet

Likes: the number of “Likes” other users had given the tweet

Lang: the language that the tweet was written in
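Linear regression works on numbers, so text categories such as Gender, Lang, or Country/State and true/false columns such as IsReshare have to be turned into numeric columns before training. Azure ML Studio may handle this through its own modules; the sketch below only shows what the transformation amounts to. It is a minimal pure-Python illustration under assumptions: the rows, the category values ("Male", "Female", "Unknown"), and the helper names `one_hot` and `bool_to_int` are hypothetical and not taken from the course dataset.

```python
# Hypothetical tweet rows; field names mirror the feature list above,
# but the values are invented purely for illustration.
rows = [
    {"Gender": "Male", "IsReshare": True, "Klout": 44},
    {"Gender": "Female", "IsReshare": False, "Klout": 31},
    {"Gender": "Unknown", "IsReshare": False, "Klout": 52},
]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged, then add the indicator columns.
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


def bool_to_int(rows, column):
    """Convert a True/False column such as IsReshare into 1/0."""
    return [{**row, column: int(row[column])} for row in rows]


prepared = bool_to_int(one_hot(rows, "Gender"), "IsReshare")
for row in prepared:
    print(row)
```

If the regression also fits an intercept, one of the indicator columns is redundant (the indicators always sum to 1), so dropping one category is a common way to avoid that collinearity.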

Candidate labels: Each of the columns listed below could represent the popularity or impact of a tweet, but you can use only one as your label. Your goal is to select a label that is 1) as meaningful as possible and 2) as easy as possible to predict with strong accuracy and fit metrics. You will find that these two objectives can conflict: gaining more of one may mean giving up some of the other. Choose carefully.

Reach

IsReshare

RetweetCount

Likes
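Count labels such as RetweetCount, Likes, and Reach are often skewed: many tweets sit at or near zero while a few have very large values. One common "derived version" of such a label is log(1 + count). The sketch below is one option under stated assumptions, not a required step; the counts are invented for illustration and are not from the course dataset.

```python
import math

# Hypothetical RetweetCount values, invented purely for illustration.
retweet_counts = [0, 0, 1, 3, 0, 250, 12]

# Derived label: log(1 + count). The +1 keeps zero counts defined (log1p(0) == 0),
# and the log pulls in the long right tail of large counts.
log_labels = [math.log1p(count) for count in retweet_counts]

# A prediction made on the log scale can be mapped back to a count with expm1,
# the inverse of log1p.
recovered = [round(math.expm1(value)) for value in log_labels]

print([round(value, 3) for value in log_labels])
print(recovered)
```

Keep in mind that metrics computed on a transformed label describe errors on the transformed scale, so compare candidate labels with that in mind.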

Requirements:

Build an experiment in Azure ML Studio to predict one of the candidate labels listed above or some derived version of those labels.

Follow the pattern and techniques learned in this module to select columns, split the data into training and testing sets, and then train, score, and evaluate the model.

Include any feature that you think logically explains or predicts your label.

Use linear regression to train the model.

You will learn later that other algorithms are better suited to count-based data such as RetweetCount, Likes, and Reach, but don’t worry about that for now.

Complete any relevant data preparation tasks demonstrated in the textbook chapters and in class (at least three types).
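The requirements above follow a prepare, split, train, score, evaluate pattern. The pure-Python sketch below walks through that same sequence so the logic behind each step is visible; it is not Azure ML Studio code, only a rough analogue of what modules such as Clean Missing Data, Normalize Data, Split Data, Train Model with Linear Regression, Score Model, and Evaluate Model do. Assumptions: the rows are invented (not from the course dataset), the three preparation types shown (mean-filling missing values, min-max scaling, log-transforming the label) are examples rather than the course's required list, and all helper names are hypothetical.

```python
import math
import random

# Hypothetical rows invented for illustration: (Klout, Sentiment, RetweetCount).
# None marks a missing value. These are NOT values from the course dataset.
ROWS = [
    (44, 0.5, 3), (31, None, 0), (52, 2.0, 12), (None, -1.0, 1),
    (60, 1.5, 25), (38, 0.0, 2), (47, -0.5, 5), (55, 3.0, 40),
    (29, 0.0, 0), (41, 1.0, 4), (66, 2.5, 90), (35, -2.0, 7),
]


def fill_missing(column):
    """Prep type 1: replace missing values with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max(column):
    """Prep type 2: rescale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_regression(features, labels):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    x = [[1.0] + list(f) for f in features]  # leading 1.0 is the intercept term
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, labels)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, feature_row):
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature_row))


# Prepare: clean missing data and normalize the features, then transform the label.
klout = min_max(fill_missing([r[0] for r in ROWS]))
sentiment = min_max(fill_missing([r[1] for r in ROWS]))
label = [math.log1p(r[2]) for r in ROWS]  # prep type 3: log-transform the count
data = list(zip(zip(klout, sentiment), label))

# Split: shuffle with a fixed seed, then hold out 30% of the rows for testing.
random.Random(42).shuffle(data)
cut = int(len(data) * 0.7)
train, test = data[:cut], data[cut:]

# Train on the training rows, then score (predict) the held-out rows.
weights = fit_linear_regression([f for f, _ in train], [y for _, y in train])
scored = [(y, predict(weights, f)) for f, y in test]

# Evaluate on the log scale; R^2 can be negative on held-out data.
errors = [y - p for y, p in scored]
mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
mean_y = sum(y for y, _ in scored) / len(scored)
ss_tot = sum((y - mean_y) ** 2 for y, _ in scored)
r2 = 1 - sum(e * e for e in errors) / ss_tot
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```

For simplicity this sketch prepares the whole dataset before splitting, as a typical drag-and-drop pipeline might; computing the fill values and scaling ranges from the training rows only is the stricter choice because it keeps test information out of training.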