This post uses the public school characteristics dataset collected in the 2017-2018 school year.
Let’s get familiar with the data set by looking at the first few records:
# A tibble: 6 x 79
X Y OBJECTID NCESSCH NMCNTY SURVYEAR STABR LEAID ST_LEAID
<dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 -149. 61.6 1 0200510~ Matanus~ 2017-20~ AK 0200~ AK-33
2 -157. 71.3 2 0200610~ North S~ 2017-20~ AK 0200~ AK-36
3 -151. 60.5 3 0200390~ Kenai P~ 2017-20~ AK 0200~ AK-24
4 -151. 60.6 4 0200390~ Kenai P~ 2017-20~ AK 0200~ AK-24
5 -151. 60.6 5 0200390~ Kenai P~ 2017-20~ AK 0200~ AK-24
6 -133. 56.1 6 0200700~ Prince ~ 2017-20~ AK 0200~ AK-44
# ... with 70 more variables: LEA_NAME <chr>, SCH_NAME <chr>,
# LSTREET1 <chr>, LSTREET2 <chr>, LSTREET3 <lgl>, LCITY <chr>,
# LSTATE <chr>, LZIP <chr>, LZIP4 <chr>, PHONE <chr>, GSLO <chr>,
# GSHI <chr>, VIRTUAL <chr>, TOTFRL <dbl>, FRELCH <dbl>,
# REDLCH <dbl>, PK <dbl>, KG <dbl>, G01 <dbl>, G02 <dbl>,
# G03 <dbl>, G04 <dbl>, G05 <dbl>, G06 <dbl>, G07 <dbl>, G08 <dbl>,
# G09 <dbl>, G10 <dbl>, G11 <dbl>, G12 <dbl>, G13 <lgl>, ...
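The code behind the preview above is not shown in the post. A minimal sketch of how such a preview could be produced, assuming the data were loaded into an object we call `schools` (a hypothetical name) with something like `readr::read_csv()`; the file name in the comment is also an assumption. A small stand-in data frame built from the six rows displayed above lets the sketch run on its own:

```r
# Hypothetical load step; the file name is an assumption, not from the post:
# schools <- readr::read_csv("public_schools_2017_18.csv")

# Stand-in built from a few columns of the six rows printed above.
schools <- data.frame(
  OBJECTID = 1:6,
  STABR    = rep("AK", 6),
  ST_LEAID = c("AK-33", "AK-36", "AK-24", "AK-24", "AK-24", "AK-44"),
  stringsAsFactors = FALSE
)

# head() returns the first six rows by default.
preview <- head(schools)
print(preview)

# Pass n to ask for a different number of rows.
top3 <- head(schools, n = 3)
print(top3)
```

With a tibble, as in the output above, printing also shows the column types and truncates wide tables, which is why only nine of the 79 columns are displayed in full.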
Hmm. That looks like a lot of data. How many rows and columns are here?
[1] 100729 79
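The two numbers above are rows first, then columns, which is the order `dim()` reports them in. Here is a small sketch of that check, again on a stand-in data frame (the name `schools` and the rounded coordinate values copied from the preview are assumptions for illustration):

```r
# Stand-in with three rows and three columns, using rounded values from the preview.
schools <- data.frame(
  X     = c(-149, -157, -151),
  Y     = c(61.6, 71.3, 60.5),
  STABR = c("AK", "AK", "AK"),
  stringsAsFactors = FALSE
)

# dim() returns a vector of length two: c(number of rows, number of columns).
shape <- dim(schools)
print(shape)

# nrow() and ncol() give the same two numbers individually.
n_rows <- nrow(schools)
n_cols <- ncol(schools)
cat("rows:", n_rows, "columns:", n_cols, "\n")
```

On the full data set the same call reports 100729 rows and 79 columns, as shown above.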
Finally, the column names may help us understand the types of information collected.
[1] "X" "Y" "OBJECTID"
[4] "NCESSCH" "NMCNTY" "SURVYEAR"
[7] "STABR" "LEAID" "ST_LEAID"
[10] "LEA_NAME" "SCH_NAME" "LSTREET1"
[13] "LSTREET2" "LSTREET3" "LCITY"
[16] "LSTATE" "LZIP" "LZIP4"
[19] "PHONE" "GSLO" "GSHI"
[22] "VIRTUAL" "TOTFRL" "FRELCH"
[25] "REDLCH" "PK" "KG"
[28] "G01" "G02" "G03"
[31] "G04" "G05" "G06"
[34] "G07" "G08" "G09"
[37] "G10" "G11" "G12"
[40] "G13" "TOTAL" "MEMBER"
[43] "AM" "HI" "BL"
[46] "WH" "HP" "TR"
[49] "FTE" "LATCOD" "LONCOD"
[52] "ULOCALE" "STUTERATIO" "STITLEI"
[55] "AMALM" "AMALF" "ASALM"
[58] "ASALF" "HIALM" "HIALF"
[61] "BLALM" "BLALF" "WHALM"
[64] "WHALF" "HPALM" "HPALF"
[67] "TRALM" "TRALF" "TOTMENROL"
[70] "TOTFENROL" "STATUS" "UG"
[73] "AE" "SCHOOL_TYPE_TEXT" "SY_STATUS_TEXT"
[76] "SCHOOL_LEVEL" "AS" "CHARTER_TEXT"
[79] "MAGNET_TEXT"
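A listing like the one above comes from `colnames()`. With 79 names to scan, it can also help to pull out groups of related columns by pattern. The sketch below does that for the grade-level enrollment columns, using a zero-row stand-in that carries only a handful of the real column names (the object name `schools` and the choice of pattern are assumptions for illustration):

```r
# Zero-row stand-in carrying a few of the column names listed above.
schools <- data.frame(
  SCH_NAME = character(0),
  PK       = numeric(0),
  KG       = numeric(0),
  G01      = numeric(0),
  G02      = numeric(0),
  G12      = numeric(0),
  G13      = logical(0),
  TOTAL    = numeric(0),
  stringsAsFactors = FALSE
)

# colnames() returns the column names as a character vector.
all_names <- colnames(schools)
print(all_names)

# Keep only names of the form "G" followed by two digits (G01, G02, ...).
grade_cols <- grep("^G[0-9]{2}$", all_names, value = TRUE)
print(grade_cols)
```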
Now that we are more familiar with the data set, the next blog post will start to wrangle the data for our eventual analysis.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Kenison (2021, Sept. 16). DACSS 601 Fall 2021: HW#2 - Getting familiar with the US public schools data set. Retrieved from https://mrolfe.github.io/DACSS601Fall21/posts/2021-09-15-examining-characteristics-of-public-schools/
BibTeX citation
@misc{kenison2021hw#2, author = {Kenison, Brittany}, title = {DACSS 601 Fall 2021: HW#2 - Getting familiar with the US public schools data set}, url = {https://mrolfe.github.io/DACSS601Fall21/posts/2021-09-15-examining-characteristics-of-public-schools/}, year = {2021} }