Datasets are essential for data analysis since datasets contain the observations and variables that you are interested in analyzing.

Polimetrics

About

Datasets are spreadsheets that contain rows and columns. The intersection of rows and columns creates cells. Numeric, alpha, and alphanumeric data can reside in these cells.

The image below is a screenshot of a Microsoft Excel spreadsheet, a widely used spreadsheet program. There are four rows, marked 1, 2, 3, and 4, and four columns, marked A, B, C, and D. These 4 rows and 4 columns create 16 cells. Cells A1, B1, and C1 are populated with the following data: “123” (numeric), “abc” (alpha), and “123abc” (alphanumeric), respectively. Note that the remaining 13 cells are empty.

Figure 3‑1: Screenshot of Excel spreadsheet with 4 rows and 4 columns
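The same grid can be sketched in a few lines of Python to confirm the cell counts. This is only an illustration: the dictionary layout and the helper name kind are assumptions made for this example, not part of Excel.

```python
# Model the 4x4 spreadsheet described above as a dictionary of cells.
rows = [1, 2, 3, 4]
columns = ["A", "B", "C", "D"]

# Every cell starts empty (None) and is keyed by its address, such as "A1".
cells = {f"{col}{row}": None for row in rows for col in columns}

# Populate the three cells named in the text.
cells["A1"] = "123"     # numeric
cells["B1"] = "abc"     # alpha
cells["C1"] = "123abc"  # alphanumeric


def kind(value):
    """Label a cell value as empty, numeric, alpha, alphanumeric, or other."""
    if value is None:
        return "empty"
    if value.isdigit():
        return "numeric"
    if value.isalpha():
        return "alpha"
    if value.isalnum():
        return "alphanumeric"
    return "other"


total = len(cells)
empty = sum(1 for value in cells.values() if value is None)

print(total, empty)  # 16 13
print(kind(cells["A1"]), kind(cells["B1"]), kind(cells["C1"]))
```

The counts match the description: 16 cells in total, 3 populated, and 13 empty.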

Estimated Time

An estimated 90-120 minutes is needed to complete this activity.

Cross-Section dataset

A cross-section, or cross-sectional, dataset looks at many objects in a single time period.

Observations can be persons, cities, states, countries, legislation, committees, schools, and so on. Variables are concepts that have at least two values. For example, the variable age can have values from 0 to 100+. Or the variable race can have the values African American, White, Hispanic, Asian American, and so on.

The data shown in Figure 3‑2 is cross-sectional because we are looking at many objects (notable persons) in a single time period (year 2020).

Figure 3‑2: Example of a cross-sectional dataset
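To make the idea of observations and variables concrete, here is a minimal Python sketch. The three people and their ages are hypothetical placeholders invented for this example; they are not the values in the figure. The helper name varies is also an assumption.

```python
# Hypothetical cross-sectional records: placeholder people and ages,
# NOT the values shown in the figure. Each dictionary is one observation.
cross_section = [
    {"name": "Person A", "age": 28, "year": 2020},
    {"name": "Person B", "age": 41, "year": 2020},
    {"name": "Person C", "age": 35, "year": 2020},
]

objects = {row["name"] for row in cross_section}
periods = {row["year"] for row in cross_section}

# Cross-sectional: more than one object, exactly one time period.
is_cross_section = len(objects) > 1 and len(periods) == 1


def varies(records, column):
    """Return True if the column takes at least two distinct values in these records."""
    return len({row[column] for row in records}) >= 2


print(is_cross_section)               # True
print(varies(cross_section, "age"))   # True
print(varies(cross_section, "year"))  # False
```

In a cross-section, year holds the same value in every row, which is why the last check returns False.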

Time Series dataset

A time series dataset looks at a single object over multiple time periods.

To illustrate a time series dataset, I decided to focus on Cardi B, one of my notable persons from the cross-section dataset. In cells A1 through F1, we see six variables: name, gender, age, race, year, and singlerecords. The variable singlerecords refers to the number of singles released with Cardi B as lead artist (Cardi B discography – Wikipedia).

The data is time series because we are looking at one object (Cardi B) over multiple time periods (years 2017 to 2020). In this case, the variables age, year, and singlerecords can change from one row to the next.

Figure 3‑3: Example of a time series dataset
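A rough Python sketch of this structure is shown below. Only the singlerecords values quoted in this activity's text (2017 and 2019) are filled in; the other years are left as None rather than guessed, and the remaining columns are omitted for brevity.

```python
# One object (Cardi B) observed in several years.
# singlerecords is None where the text of this activity does not state a value.
time_series = [
    {"name": "Cardi B", "year": 2017, "singlerecords": 3},
    {"name": "Cardi B", "year": 2018, "singlerecords": None},
    {"name": "Cardi B", "year": 2019, "singlerecords": 3},
    {"name": "Cardi B", "year": 2020, "singlerecords": None},
]

objects = {row["name"] for row in time_series}
years = sorted(row["year"] for row in time_series)

# Time series: exactly one object, more than one time period.
is_time_series = len(objects) == 1 and len(years) > 1

print(is_time_series)       # True
print(years[0], years[-1])  # 2017 2020
```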

Panel dataset

A panel dataset looks at multiple objects over multiple time periods.

To demonstrate a panel dataset, I updated the time series dataset to include a second musical artist: Harry Styles. Again, in cells A1 through F1, we see six variables: name, gender, age, race, year, and singlerecords.

The data is panel because we are looking at multiple objects (Cardi B and Harry Styles) over multiple time periods (years 2017 to 2020). Again, the variables age, year, and singlerecords can change from row to row for each artist. For example, in 2017, both Cardi B and Harry Styles (Harry Styles discography – Wikipedia) had 3 singles. But in 2019, Cardi B had 3 singles compared to Harry Styles's 2.

Figure 3‑4: Example of a panel dataset
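The three definitions differ only in how many objects and how many time periods appear, so they can be compared with one small function. The sketch below is an illustration, not a standard routine: the function name classify and its parameters are assumptions, "many" is simplified to "more than one", and the rows include only the four singlerecords values quoted in the text above.

```python
def classify(records, object_key="name", time_key="year"):
    """Label records by counting distinct objects and distinct time periods."""
    n_objects = len({row[object_key] for row in records})
    n_periods = len({row[time_key] for row in records})
    if n_objects > 1 and n_periods > 1:
        return "panel"
    if n_objects > 1:
        return "cross-section"
    if n_periods > 1:
        return "time series"
    return "single observation"


# Only the rows whose singlerecords values are quoted in the text.
panel = [
    {"name": "Cardi B", "year": 2017, "singlerecords": 3},
    {"name": "Harry Styles", "year": 2017, "singlerecords": 3},
    {"name": "Cardi B", "year": 2019, "singlerecords": 3},
    {"name": "Harry Styles", "year": 2019, "singlerecords": 2},
]

# Slicing the panel recovers the other two dataset types.
only_2017 = [row for row in panel if row["year"] == 2017]
only_cardi_b = [row for row in panel if row["name"] == "Cardi B"]

print(classify(panel))         # panel
print(classify(only_2017))     # cross-section
print(classify(only_cardi_b))  # time series
```

Seen this way, a panel dataset contains both of the other types: fixing one year gives a cross-section, and fixing one artist gives a time series.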

Mini-Assignment 1: Instructions

Step 1: Select 1 dataset type that interests you.

Your dataset choices are:

Cross-section

Time series

Panel

Step 2: In 4 or more sentences, explain why you selected this dataset type.

To help write your explanation, consider the following questions:

What is one strength of the dataset you selected?

What is one weakness of the dataset you selected?

How does your dataset compare to one of the other datasets?