2 Data Quality
The U.S. Department of Education released their own reports regarding the important properties and limitations of the data along with a data quality segment, delving into detail regarding consistency and completeness assessment of reporting by institutions.
All blocked quotes in this chapter are directly obtained from one of the two technical documentation reports released by the U.S. Department of Education.
Believability
The data is gathered by the U.S. Department of Education, therefore there is no reason to presume intentional falsity in the distributed data. However, credibility of the information is also relient on institution transparency and accurate algorithmic matching. Earnings-related information is one such example of algorithmically obtained/matched information.
>NSLDS is the Department’s central database for monitoring Title IV federal student aid…provides administrative data from which loan debt data elements and cohorts for earnings can be constructed.
Scorecard linked NSLDS records to administrative tax records maintained by the IRS within the Department of the Treasury.
Completeness
##
## Percentage of Values for costt4_a
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 22.94713 77.05287
##
## Percentage of Values for costt4_p
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 1.308124 98.69188
##
## Percentage of Values for tuitionfee_in
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 39.73838 60.26162
##
## Percentage of Values for tuitionfee_out
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 39.73838 60.26162
##
## Percentage of Values for mn_earn_wne_p10
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 11.4983 88.5017
##
## Percentage of Values for mn_earn_wne_p6
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 15.02144 84.97856
##
## Percentage of Values for mn_earn_wne_p8
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 13.41101 86.58899
##
## Percentage of Values for sd_earn_wne_p10
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 11.4983 88.5017
##
## Percentage of Values for sd_earn_wne_p6
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 15.02144 84.97856
##
## Percentage of Values for sd_earn_wne_p8
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 13.41101 86.58899
##
## Percentage of Values for md_earn_wne_p10
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 11.4983 88.5017
##
## Percentage of Values for md_earn_wne_p6
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 15.02144 84.97856
##
## Percentage of Values for md_earn_wne_p8
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 13.41101 86.58899
##
## Percentage of Values for pct10_earn_wne_p10
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 3.891393 96.10861
##
## Percentage of Values for pct25_earn_wne_p10
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 9.096405 90.90359
##
## Percentage of Values for pct75_earn_wne_p10
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 9.096405 90.90359
##
## Percentage of Values for pct90_earn_wne_p10
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 3.891393 96.10861
##
## Percentage of Values for pct10_earn_wne_p6
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 6.667033 93.33297
##
## Percentage of Values for pct25_earn_wne_p6
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 11.938 88.062
##
## Percentage of Values for pct75_earn_wne_p6
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 11.938 88.062
##
## Percentage of Values for pct90_earn_wne_p6
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 6.667033 93.33297
##
## Percentage of Values for pct10_earn_wne_p8
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 5.29845 94.70155
##
## Percentage of Values for pct25_earn_wne_p8
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 10.75629 89.24371
##
## Percentage of Values for pct75_earn_wne_p8
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 10.75629 89.24371
##
## Percentage of Values for pct90_earn_wne_p8
## Column Type: Numeric
## Accepted Value Null/Infinite Value
## [1,] 5.29845 94.70155
##
## Percentage of Values for year
## Column Type: Numeric
## No NA Values Found
## Accepted Value Null/Infinite Value
## [1,] 100 0
To test completeness we checked the number of NA values contained in each column and present the number of values that can be pushed into statistical analysis as percentages of the total entries. The general test checks for boolean, numeric and character/factor column types, with the majority of variables we chose being numeric in nature (the character columns, which we subset out, are observation identifiers). Many of the variables related to earnings had roughly \(%85-90\) of the values as N/A’s whereas there was more usable data in variables pertaining to cost/tuition, with roughly \(%60-70\) NA values.
Consistent Representation
There is inconsistencies regarding which monetary amounts are numeric and which are inflation-adjusted. For example, tuition/cost are raw values representative of the year they were documented in whereas earning values are presented in inflation-adjusted dollars. More inconsistency lies in the reference year for inflation adjustment.
Earnings included in the 2011_12 and prior Scorecard data files are inflation adjusted to 2014 dollars using the Consumer Price Index for all Urban Consumers (CPI-U). Beginning with the 2012_13 Scorecard data file, earnings included in the XXXX_YY data file are inflation adjusted to XXXX+3 dollars using the CPI- U. For example, earnings included in the 2014_15 Scorecard data file are inflation adjusted to 2017 dollars.
The reference year for inflation adjustment is one year following the latest measurement year for the cohort (e.g., earnings measured in calendar years 2016 and 2017 were adjusted to 2018 dollars)
Beginning with the 2018_19 data files…earnings inflation adjusted to correspond to the closing year of the data file they are included in (e.g, adjusted to 2019 dollars if included in the 2018_19 data file)
Objectivity
According to the technical documentation, the earnings data is only tracked for Title IV (federal financial aid) receiving students. Therefore, the data may not be representative of institutions or fields of study with a low proportion of Title IV-eligible students. Furthermore, not all income for Title IV-receiving students is accounted for.
>Earnings are defined as the sum of wages and deferred compensation40 from all W-2 forms received for each individual, plus self-employment earnings from Schedule SE.
Timeliness
New data is released annually. Updates of older data also occurs once newer information becomes available (i.e. retrospectively populating columns regarding metrics that were not tracked at the time of the initial data release).