HW2: Data wrangling
Objective: Identify the Chicago Public Schools with the highest safety scores and the highest college enrollments.
Tasks: (1) read the data set into R, (2) explain the variables in the data set, and (3) perform two data wrangling operations.
Confirm that the following packages are installed, then load them: distill, dplyr, and readr.
library(distill)
library(dplyr)
library(readr)
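library() stops with an error when a package is missing. A minimal sketch of the "confirm" step, assuming only base R, checks all three packages first with requireNamespace(), which returns TRUE or FALSE instead of failing. The variable names pkgs, installed and missing_pkgs are illustrative.

```r
# Packages used in this post
pkgs <- c("distill", "dplyr", "readr")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Install before running: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)
} else {
  message("All packages are available.")
}
```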
# Read the CSV file into a data frame named HW2
HW2 <- read.csv("ChicagoPublicSchools.csv", header = TRUE, sep = ",")
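In read.csv(), the second and third positional arguments are header and sep, and TRUE and "," are already their defaults. Naming them makes the call self-documenting. Because ChicagoPublicSchools.csv is not included here, the sketch below reads a made-up three-column excerpt through the text argument, which read.csv() passes on to read.table(). The values are invented for illustration.

```r
# Made-up stand-in for a few columns of ChicagoPublicSchools.csv
csv_text <- "School_ID,NAME_OF_SCHOOL,SAFETY_SCORE
1,School A,99
2,School B,NA"

# header = TRUE and sep = "," are read.csv()'s defaults; naming them makes the intent explicit
toy <- read.csv(text = csv_text, header = TRUE, sep = ",")

# The literal string "NA" becomes a missing value, so SAFETY_SCORE is read as integer
str(toy)

# With readr (loaded above), readr::read_csv("ChicagoPublicSchools.csv") would return a tibble instead
```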
class(HW2)
[1] "data.frame"
colnames(HW2)
[1] "School_ID"
[2] "NAME_OF_SCHOOL"
[3] "Elementary..Middle..or.High.School"
[4] "Street_Address"
[5] "City"
[6] "State"
[7] "ZIP_Code"
[8] "Phone_Number"
[9] "Link"
[10] "Network_Manager"
[11] "Collaborative_Name"
[12] "Adequate_Yearly_Progress_Made_"
[13] "Track_Schedule"
[14] "CPS_Performance_Policy_Status"
[15] "CPS_Performance_Policy_Level"
[16] "HEALTHY_SCHOOL_CERTIFIED"
[17] "Safety_Icon"
[18] "SAFETY_SCORE"
[19] "Family_Involvement_Icon"
[20] "Family_Involvement_Score"
[21] "Environment_Icon"
[22] "Environment_Score"
[23] "Instruction_Icon"
[24] "Instruction_Score"
[25] "Leaders_Icon"
[26] "Leaders_Score"
[27] "Teachers_Icon"
[28] "Teachers_Score"
[29] "Parent_Engagement_Icon"
[30] "Parent_Engagement_Score"
[31] "Parent_Environment_Icon"
[32] "Parent_Environment_Score"
[33] "AVERAGE_STUDENT_ATTENDANCE"
[34] "Rate_of_Misconducts__per_100_students_"
[35] "Average_Teacher_Attendance"
[36] "Individualized_Education_Program_Compliance_Rate"
[37] "Pk_2_Literacy__"
[38] "Pk_2_Math__"
[39] "Gr3_5_Grade_Level_Math__"
[40] "Gr3_5_Grade_Level_Read__"
[41] "Gr3_5_Keep_Pace_Read__"
[42] "Gr3_5_Keep_Pace_Math__"
[43] "Gr6_8_Grade_Level_Math__"
[44] "Gr6_8_Grade_Level_Read__"
[45] "Gr6_8_Keep_Pace_Math_"
[46] "Gr6_8_Keep_Pace_Read__"
[47] "Gr_8_Explore_Math__"
[48] "Gr_8_Explore_Read__"
[49] "ISAT_Exceeding_Math__"
[50] "ISAT_Exceeding_Reading__"
[51] "ISAT_Value_Add_Math"
[52] "ISAT_Value_Add_Read"
[53] "ISAT_Value_Add_Color_Math"
[54] "ISAT_Value_Add_Color_Read"
[55] "Students_Taking__Algebra__"
[56] "Students_Passing__Algebra__"
[57] "X9th.Grade.EXPLORE..2009."
[58] "X9th.Grade.EXPLORE..2010."
[59] "X10th.Grade.PLAN..2009."
[60] "X10th.Grade.PLAN..2010."
[61] "Net_Change_EXPLORE_and_PLAN"
[62] "X11th.Grade.Average.ACT..2011."
[63] "Net_Change_PLAN_and_ACT"
[64] "College_Eligibility__"
[65] "Graduation_Rate__"
[66] "College_Enrollment_Rate__"
[67] "COLLEGE_ENROLLMENT"
[68] "General_Services_Route"
[69] "Freshman_on_Track_Rate__"
[70] "X_COORDINATE"
[71] "Y_COORDINATE"
[72] "Latitude"
[73] "Longitude"
[74] "COMMUNITY_AREA_NUMBER"
[75] "COMMUNITY_AREA_NAME"
[76] "Ward"
[77] "Police_District"
[78] "Location"
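Names such as "Elementary..Middle..or.High.School" and "X9th.Grade.EXPLORE..2009." come from read.csv()'s default check.names = TRUE, which runs the header through make.names(): invalid characters become "." and names that start with a digit get an "X" prefix. The raw header strings below are an assumption, since the post does not show the original file.

```r
# Assumed raw header strings (the original CSV header is not shown in the post)
raw_headers <- c("Elementary, Middle, or High School",
                 "9th Grade EXPLORE (2009)",
                 "School_ID")

# read.csv() applies make.names() to the header when check.names = TRUE (the default)
syntactic <- make.names(raw_headers)
print(syntactic)

# To keep headers verbatim, pass check.names = FALSE to read.csv()
# and refer to such columns with backticks, e.g. HW2$`9th Grade EXPLORE (2009)`
```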
dim(HW2)
[1] 566 78
The data set contains numeric, integer, and character columns.
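One way to check this claim across every column at once is to apply class() to each column and tally the results. The sketch uses a small made-up data frame in place of HW2. On the real data, the same call would be table(sapply(HW2, class)).

```r
# Made-up stand-in for HW2 (column names borrowed from the data set, values invented)
toy <- data.frame(
  NAME_OF_SCHOOL     = c("School A", "School B"),
  SAFETY_SCORE       = c(99L, NA),
  COLLEGE_ENROLLMENT = c(1200, 850),
  stringsAsFactors   = FALSE
)

# One class per column, then a count of how many columns share each class
col_classes <- sapply(toy, class)
print(col_classes)
print(table(col_classes))
```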
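Each column has to be accessed through the data frame, for example with HW2$SAFETY_SCORE. Passing only the quoted name, as in class("SAFETY_SCORE"), inspects that string literal and therefore always returns "character", whatever the column actually holds.
class(HW2$NAME_OF_SCHOOL)
typeof(HW2$SAFETY_SCORE)
class(HW2$COLLEGE_ENROLLMENT)
class(HW2$Graduation_Rate__)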
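A made-up two-row data frame shows how a quoted column name differs from a reference to the column. Only the column reference reflects the stored type. Both $ and [[ ]] select the column itself.

```r
# Made-up data frame with two columns named as in the data set
toy <- data.frame(SAFETY_SCORE = c(99L, 98L), COLLEGE_ENROLLMENT = c(1200, 850))

class("SAFETY_SCORE")               # the string literal itself: "character"
class(toy$SAFETY_SCORE)             # the column: "integer" for this toy frame
class(toy[["COLLEGE_ENROLLMENT"]])  # same idea with [[ ]]: "numeric" here
```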
The data set has 566 rows and 78 columns, so only selected columns of interest are shown.
TASK: Identify the schools with the highest safety scores and the highest college enrollment.
   NAME_OF_SCHOOL                                    SAFETY_SCORE
 1 Abraham Lincoln Elementary School                 99
 2 Alexander Graham Bell Elementary School           99
 3 Annie Keller Elementary Gifted Magnet School      99
 4 Augustus H Burley Elementary School               99
 5 Edgar Allan Poe Elementary Classical School       99
 6 Edgebrook Elementary School                       99
 7 Ellen Mitchell Elementary School                  99
 8 James E McDade Elementary Classical School        99
 9 James G Blaine Elementary School                  99
10 LaSalle Elementary Language Academy               99
11 Mary E Courtenay Elementary Language Arts Center  99
12 Northside College Preparatory High School         99
13 Northside Learning Center High School             99
14 Norwood Park Elementary School                    99
15 Oriole Park Elementary School                     99
16 Sauganash Elementary School                       99
17 Stephen Decatur Classical Elementary School       99
18 Talman Elementary School                          99
19 Walter Payton College Preparatory High School     98
20 Wildwood Elementary School                        99
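The code behind this table is not shown. The sketch below is one plausible dplyr approach, not necessarily the author's. It selects the two columns and keeps rows whose rank is within the top n by filtering on min_rank(desc(SAFETY_SCORE)) <= n. This is the same rule dplyr's superseded top_n() applies. Tied scores are all kept, NA scores are dropped, and rows stay in their original order, which would be consistent with the single 98 appearing among the 99s. The helper name top_safety and the toy values are hypothetical.

```r
library(dplyr)

# Hypothetical helper: rows with the n best SAFETY_SCORE ranks, keeping ties and original order
top_safety <- function(df, n = 20) {
  df %>%
    select(NAME_OF_SCHOOL, SAFETY_SCORE) %>%
    # min_rank() gives tied scores the same rank; NA scores get an NA rank and filter() drops them
    filter(min_rank(desc(SAFETY_SCORE)) <= n)
}

# Made-up data to show the behaviour
toy <- data.frame(
  NAME_OF_SCHOOL   = c("School A", "School B", "School C", "School D"),
  SAFETY_SCORE     = c(99, NA, 98, 50),
  stringsAsFactors = FALSE
)

top_safety(toy, n = 2)
# On the real data: top_safety(HW2)
```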
   NAME_OF_SCHOOL                                   COLLEGE_ENROLLMENT
 1 Albert G Lane Technical High School              4368
 2 Marie Sklodowska Curie Metropolitan High School  3320
 3 William Howard Taft High School                  2922
 4 Thomas Kelly High School                         2883
 5 Carl Schurz High School                          2366
 6 Lincoln Park High School                         2342
 7 Whitney M Young Magnet High School               2166
 8 Charles P Steinmetz Academic Centre High School  1890
 9 Kenwood Academy High School                      1852
10 Sidney Sawyer Elementary School                  1846
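Here too the code is not shown. This table has the shape of a descending sort followed by the first ten rows, sketched below with dplyr. The helper name top_enrollment and the toy values are hypothetical. COLLEGE_ENROLLMENT holds counts in the thousands, and an elementary school appears in the list. The data set also has a separate College_Enrollment_Rate__ column, so it is worth confirming which of the two best answers "highest college enrollment".

```r
library(dplyr)

# Hypothetical helper: the n rows with the largest COLLEGE_ENROLLMENT values
top_enrollment <- function(df, n = 10) {
  df %>%
    select(NAME_OF_SCHOOL, COLLEGE_ENROLLMENT) %>%
    arrange(desc(COLLEGE_ENROLLMENT)) %>%  # largest first; arrange() always sorts NA last
    head(n)
}

# Made-up data to show the behaviour
toy <- data.frame(
  NAME_OF_SCHOOL     = c("School A", "School B", "School C", "School D"),
  COLLEGE_ENROLLMENT = c(100, 4368, NA, 250),
  stringsAsFactors   = FALSE
)

top_enrollment(toy, n = 2)
# On the real data: top_enrollment(HW2)
```

head(n) returns exactly n rows. By contrast, slice_max(COLLEGE_ENROLLMENT, n = 10) keeps ties by default and can return more than ten.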
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Vespa (2021, Dec. 28). Data Analytics and Computational Social Science: HW2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/hw2/
BibTeX citation
@misc{vespa2021hw2, author = {Vespa, Rhowena}, title = {Data Analytics and Computational Social Science: HW2}, url = {https://github.com/DACSS/dacss_course_website/posts/hw2/}, year = {2021} }