%%{init: {"themeVariables": {"fontSize": "24px"}}}%%
flowchart LR
G[Goals] ==> P[Problem]
P ==> Q[Question]
Q ==> Da[Data]
Da ==> M[Model]
M ==> R[Result]
R ==> D[Decision]
What is the USDA’s goal?
Problem?
Question?
You have county-year yield data, but weather data are monthly. What should you do before merging?
A. Drop the weather data because it is higher frequency.
B. Aggregate weather to the county-year level to match the unit of observation.
C. Duplicate yield rows to match each month.
D. Convert yield to monthly values by dividing by 12.
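A minimal dplyr sketch of option B. The column names (`county`, `year`, `month`, `precip`, `temp`) and values are hypothetical. Summing precipitation and averaging temperature are illustrative choices, not necessarily the lab's definitions.

```r
library(dplyr)

# Hypothetical monthly weather: one row per county-month (values are made up)
weather_monthly <- tibble(
  county = rep(c("X", "Y"), each = 3),
  year   = 2020,
  month  = rep(6:8, times = 2),
  precip = c(80, 60, 40, 100, 90, 70),
  temp   = c(24, 27, 27, 22, 25, 25)
)

# Collapse to one row per county-year so the unit matches the yield data
weather_annual <- weather_monthly %>%
  group_by(county, year) %>%
  summarise(
    total_precip = sum(precip),  # season total
    mean_temp    = mean(temp),   # season average
    .groups = "drop"
  )

print(weather_annual)
```

Aggregating first means each county-year yield row matches at most one weather row, so the later join cannot duplicate yield observations.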
Which handling of missing values preserves information about missingness itself?
A. Dropping all rows with any missing values.
B. Replacing missing values with zero.
C. Creating a missingness indicator or category.
D. Replacing missing values with the mean.
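A minimal sketch of option C in dplyr, using hypothetical columns `irrigation` (numeric) and `soil` (categorical). The fill value of 0 is an arbitrary placeholder; the indicator column is what carries the missingness information into a model.

```r
library(dplyr)

# Hypothetical county data with missing values (values are made up)
counties <- tibble(
  county     = c("X", "Y", "Z"),
  irrigation = c(0.30, NA, 0.10),
  soil       = c("loam", NA, "clay")
)

counties <- counties %>%
  mutate(
    # Numeric variable: flag missingness, then fill so the row stays usable
    irrigation_missing = as.integer(is.na(irrigation)),
    irrigation_filled  = coalesce(irrigation, 0),
    # Categorical variable: make "Missing" its own category
    soil = coalesce(soil, "Missing")
  )

print(counties)
```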
What variables make up the compound key in our corn yield and drought example in lab?
A. state and year
B. county and year
C. year only
D. state only
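One way to check a candidate compound key before joining, sketched with dplyr on a made-up county-year table: count rows per key combination and look for any count above 1. The table and its values are hypothetical.

```r
library(dplyr)

# Hypothetical county-year yield table (values are made up)
yield <- tibble(
  county = c("X", "X", "Y", "Y"),
  year   = c(2019, 2020, 2019, 2020),
  yield  = c(150, 160, 140, 135)
)

# A valid compound key identifies exactly one row per combination
key_dupes <- yield %>%
  count(county, year) %>%
  filter(n > 1)

# year alone is not a key: each year appears once per county
year_dupes <- yield %>%
  count(year) %>%
  filter(n > 1)

nrow(key_dupes)   # 0
nrow(year_dupes)  # 2
```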

- left_join(): keep all rows in the left table
- inner_join(): keep only matched rows
- right_join()/full_join(): mostly for debugging

| state | year | yield |
|---|---|---|
| A | 2020 | 120 |
| B | 2020 | 95 |
| C | 2020 | 105 |

| state | year | drought |
|---|---|---|
| A | 2020 | 0.8 |
| D | 2020 | 0.6 |

What would an inner join create?

| state | year | yield | drought |
|---|---|---|---|
| A | 2020 | 120 | 0.8 |

Only state A appears because it is the only (state, year) combination present in both tables.
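The inner join above can be reproduced with dplyr on the same two example tables. Passing `by` explicitly is a stylistic choice that makes the compound key visible.

```r
library(dplyr)

# The two example tables from the slide
yield   <- tibble(state = c("A", "B", "C"), year = 2020, yield = c(120, 95, 105))
drought <- tibble(state = c("A", "D"), year = 2020, drought = c(0.8, 0.6))

# Keep only (state, year) pairs found in both tables
inner <- inner_join(yield, drought, by = c("state", "year"))
print(inner)
```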

| state | year | yield |
|---|---|---|
| A | 2020 | 120 |
| B | 2020 | 95 |
| C | 2020 | 105 |

| state | year | drought |
|---|---|---|
| A | 2020 | 0.8 |
| D | 2020 | 0.6 |

What would a left join create?

| state | year | yield | drought |
|---|---|---|---|
| A | 2020 | 120 | 0.8 |
| B | 2020 | 95 | NA |
| C | 2020 | 105 | NA |

All yield rows are kept; drought is filled in where (state, year) matches and is NA otherwise. State D is dropped because it has no yield row.
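The left join on the same example tables, with a row-count check that the target sample (the yield rows) survives intact:

```r
library(dplyr)

yield   <- tibble(state = c("A", "B", "C"), year = 2020, yield = c(120, 95, 105))
drought <- tibble(state = c("A", "D"), year = 2020, drought = c(0.8, 0.6))

# Keep every yield row; unmatched rows get NA for drought
left <- left_join(yield, drought, by = c("state", "year"))
print(left)

# Same number of rows as the yield table: no rows lost or duplicated
stopifnot(nrow(left) == nrow(yield))
```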

| state | year | yield |
|---|---|---|
| A | 2020 | 120 |
| B | 2020 | 95 |
| C | 2020 | 105 |

| state | year | drought |
|---|---|---|
| A | 2020 | 0.8 |
| D | 2020 | 0.6 |

What would a full join create?

| state | year | yield | drought |
|---|---|---|---|
| A | 2020 | 120 | 0.8 |
| B | 2020 | 95 | NA |
| C | 2020 | 105 | NA |
| D | 2020 | NA | 0.6 |

All rows from both tables are kept: matched rows are combined, and columns from the table without a match are NA (B and C have no drought value; D has no yield value).
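The full join on the same example tables. Unmatched rows from the right-hand table (state D) are appended after the left-hand rows, which is why full joins are handy for spotting records that failed to match.

```r
library(dplyr)

yield   <- tibble(state = c("A", "B", "C"), year = 2020, yield = c(120, 95, 105))
drought <- tibble(state = c("A", "D"), year = 2020, drought = c(0.8, 0.6))

# Keep all rows from both tables; missing sides are filled with NA
full <- full_join(yield, drought, by = c("state", "year"))
print(full)
```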

left_join() preserves your target sample; inner_join() restricts to matches; full_join() keeps all records from both tables.