HW2

HW2-Data wrangling

Rhowena Vespa
12/22/2021

Homework 2, using a data set from Chicago Public Schools.

Objective: Identify the Chicago Public Schools with the highest safety scores and college enrollments.

Tasks: (1) read the data set into R; (2) explain the variables in the data set; (3) perform two data wrangling operations.

First, confirm that the following packages are installed: distill, dplyr, and readr.
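A minimal sketch of that check, using base R's requireNamespace(), which returns FALSE instead of failing when a package is absent. It only reports what is missing; the install step is left commented out.

```r
# Packages this post relies on
pkgs <- c("distill", "dplyr", "readr")

# requireNamespace() returns TRUE/FALSE without attaching the package
have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!have]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All packages are installed.")
}
```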

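  1. Read the CSV file into R
library(distill)  # R Markdown format used to publish this post
library(dplyr)    # select(), filter(), arrange(), slice(), %>%
library(readr)    # loaded, but read.csv() below is base R
# header = TRUE: first row holds column names; sep = ",": comma-separated
HW2 <- read.csv("ChicagoPublicSchools.csv", header = TRUE, sep = ",")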
class(HW2)
[1] "data.frame"
colnames(HW2)
 [1] "School_ID"                                       
 [2] "NAME_OF_SCHOOL"                                  
 [3] "Elementary..Middle..or.High.School"              
 [4] "Street_Address"                                  
 [5] "City"                                            
 [6] "State"                                           
 [7] "ZIP_Code"                                        
 [8] "Phone_Number"                                    
 [9] "Link"                                            
[10] "Network_Manager"                                 
[11] "Collaborative_Name"                              
[12] "Adequate_Yearly_Progress_Made_"                  
[13] "Track_Schedule"                                  
[14] "CPS_Performance_Policy_Status"                   
[15] "CPS_Performance_Policy_Level"                    
[16] "HEALTHY_SCHOOL_CERTIFIED"                        
[17] "Safety_Icon"                                     
[18] "SAFETY_SCORE"                                    
[19] "Family_Involvement_Icon"                         
[20] "Family_Involvement_Score"                        
[21] "Environment_Icon"                                
[22] "Environment_Score"                               
[23] "Instruction_Icon"                                
[24] "Instruction_Score"                               
[25] "Leaders_Icon"                                    
[26] "Leaders_Score"                                   
[27] "Teachers_Icon"                                   
[28] "Teachers_Score"                                  
[29] "Parent_Engagement_Icon"                          
[30] "Parent_Engagement_Score"                         
[31] "Parent_Environment_Icon"                         
[32] "Parent_Environment_Score"                        
[33] "AVERAGE_STUDENT_ATTENDANCE"                      
[34] "Rate_of_Misconducts__per_100_students_"          
[35] "Average_Teacher_Attendance"                      
[36] "Individualized_Education_Program_Compliance_Rate"
[37] "Pk_2_Literacy__"                                 
[38] "Pk_2_Math__"                                     
[39] "Gr3_5_Grade_Level_Math__"                        
[40] "Gr3_5_Grade_Level_Read__"                        
[41] "Gr3_5_Keep_Pace_Read__"                          
[42] "Gr3_5_Keep_Pace_Math__"                          
[43] "Gr6_8_Grade_Level_Math__"                        
[44] "Gr6_8_Grade_Level_Read__"                        
[45] "Gr6_8_Keep_Pace_Math_"                           
[46] "Gr6_8_Keep_Pace_Read__"                          
[47] "Gr_8_Explore_Math__"                             
[48] "Gr_8_Explore_Read__"                             
[49] "ISAT_Exceeding_Math__"                           
[50] "ISAT_Exceeding_Reading__"                        
[51] "ISAT_Value_Add_Math"                             
[52] "ISAT_Value_Add_Read"                             
[53] "ISAT_Value_Add_Color_Math"                       
[54] "ISAT_Value_Add_Color_Read"                       
[55] "Students_Taking__Algebra__"                      
[56] "Students_Passing__Algebra__"                     
[57] "X9th.Grade.EXPLORE..2009."                       
[58] "X9th.Grade.EXPLORE..2010."                       
[59] "X10th.Grade.PLAN..2009."                         
[60] "X10th.Grade.PLAN..2010."                         
[61] "Net_Change_EXPLORE_and_PLAN"                     
[62] "X11th.Grade.Average.ACT..2011."                  
[63] "Net_Change_PLAN_and_ACT"                         
[64] "College_Eligibility__"                           
[65] "Graduation_Rate__"                               
[66] "College_Enrollment_Rate__"                       
[67] "COLLEGE_ENROLLMENT"                              
[68] "General_Services_Route"                          
[69] "Freshman_on_Track_Rate__"                        
[70] "X_COORDINATE"                                    
[71] "Y_COORDINATE"                                    
[72] "Latitude"                                        
[73] "Longitude"                                       
[74] "COMMUNITY_AREA_NUMBER"                           
[75] "COMMUNITY_AREA_NAME"                             
[76] "Ward"                                            
[77] "Police_District"                                 
[78] "Location"                                        
dim(HW2)
[1] 566  78
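Several names above, such as Elementary..Middle..or.High.School and X9th.Grade.EXPLORE..2009., do not look like a typical spreadsheet header. That is because read.csv() defaults to check.names = TRUE, which adjusts each header with make.names(): invalid characters become "." and names starting with a digit get an "X" prefix. The sketch below assumes raw headers like "Elementary, Middle, or High School" (a guess, since the file's header row is not shown) and reads a hypothetical two-row CSV from a string.

```r
# Hypothetical CSV text; the header wording is assumed, not taken from the source
csv_text <- '"Elementary, Middle, or High School","9th Grade EXPLORE (2009)",SAFETY_SCORE
ES,14.2,99
HS,15.1,98'

# Default check.names = TRUE rewrites headers into syntactic R names
toy <- read.csv(text = csv_text, header = TRUE, sep = ",")
print(colnames(toy))

# check.names = FALSE keeps the raw header text
toy_raw <- read.csv(text = csv_text, header = TRUE, sep = ",", check.names = FALSE)
print(colnames(toy_raw))

# The same rewriting, applied directly
print(make.names(c("Elementary, Middle, or High School", "9th Grade EXPLORE (2009)")))
```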
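  2. Explain the variables in the data set

The data set contains numeric, integer, and character columns. A column's type has to be checked through the data frame, as in HW2$SAFETY_SCORE; quoting the name, as in class("SAFETY_SCORE"), inspects the string literal itself and always returns "character", whatever the column holds.

class(HW2$NAME_OF_SCHOOL)
class(HW2$SAFETY_SCORE)
class(HW2$COLLEGE_ENROLLMENT)
class(HW2$Graduation_Rate__)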
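To see the type of every column at once, sapply() can apply class() to each column of a data frame. The sketch uses a small hypothetical stand-in for HW2 with invented values, and assumes a missing value is coded as the text "NDA" (a hypothetical placeholder) so that one rate column is read as character; as.numeric() then converts it, turning the placeholder into NA.

```r
# Hypothetical stand-in for HW2; all values are invented for illustration
toy <- data.frame(
  NAME_OF_SCHOOL    = c("School A", "School B", "School C"),
  SAFETY_SCORE      = c(99L, 45L, NA),
  Graduation_Rate__ = c("81.5", "NDA", "70.2"),  # "NDA" is an assumed placeholder
  stringsAsFactors  = FALSE
)

# Quoting a name inspects the string, not the column: always "character"
print(class("SAFETY_SCORE"))

# Referencing the column inspects its contents
print(class(toy$SAFETY_SCORE))

# class() of every column in one call
col_types <- sapply(toy, class)
print(col_types)

# Convert a text column to numeric; non-numeric entries become NA
grad_rate <- suppressWarnings(as.numeric(toy$Graduation_Rate__))
print(grad_rate)
```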
  3. Data Wrangling Operations

The data set has 566 rows and 78 columns, so we show only the columns of interest.

Task: identify the schools with the highest safety scores and the highest college enrollment.

  # Schools with a safety score above 95; rows with a missing score are dropped
  HW2 %>% select(NAME_OF_SCHOOL, SAFETY_SCORE) %>% filter(SAFETY_SCORE > 95)
                                     NAME_OF_SCHOOL SAFETY_SCORE
1                 Abraham Lincoln Elementary School           99
2           Alexander Graham Bell Elementary School           99
3      Annie Keller Elementary Gifted Magnet School           99
4               Augustus H Burley Elementary School           99
5       Edgar Allan Poe Elementary Classical School           99
6                       Edgebrook Elementary School           99
7                  Ellen Mitchell Elementary School           99
8        James E McDade Elementary Classical School           99
9                  James G Blaine Elementary School           99
10              LaSalle Elementary Language Academy           99
11 Mary E Courtenay Elementary Language Arts Center           99
12        Northside College Preparatory High School           99
13            Northside Learning Center High School           99
14                   Norwood Park Elementary School           99
15                    Oriole Park Elementary School           99
16                      Sauganash Elementary School           99
17      Stephen Decatur Classical Elementary School           99
18                         Talman Elementary School           99
19    Walter Payton College Preparatory High School           98
20                       Wildwood Elementary School           99
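The same select-then-filter step can be written in base R as a cross-check of the dplyr result. One difference matters when scores are missing: filter() drops rows whose condition is NA, while indexing with a bare logical vector returns an all-NA row for each missing score, so the sketch wraps the condition in which(). The data frame is a hypothetical stand-in with invented values.

```r
# Hypothetical stand-in for HW2; values invented for illustration
toy <- data.frame(
  NAME_OF_SCHOOL   = c("School A", "School B", "School C", "School D"),
  SAFETY_SCORE     = c(99L, 45L, NA, 98L),
  Ward             = c(1L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
)

cols <- c("NAME_OF_SCHOOL", "SAFETY_SCORE")

# Bare logical indexing keeps an all-NA row for the missing score
naive <- toy[toy$SAFETY_SCORE > 95, cols]
print(nrow(naive))   # 3: two real rows plus one all-NA row

# which() drops NA conditions, matching dplyr::filter()
safest <- toy[which(toy$SAFETY_SCORE > 95), cols]
print(safest)

# subset() also treats NA conditions as FALSE
safest2 <- subset(toy, SAFETY_SCORE > 95, select = c(NAME_OF_SCHOOL, SAFETY_SCORE))
print(safest2)
```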
  HW2 %>%
    select(NAME_OF_SCHOOL, COLLEGE_ENROLLMENT) %>%
    arrange(desc(COLLEGE_ENROLLMENT)) %>%  # largest enrollment first
    slice(1:10)                            # keep the top 10 rows
                                    NAME_OF_SCHOOL COLLEGE_ENROLLMENT
1              Albert G Lane Technical High School               4368
2  Marie Sklodowska Curie Metropolitan High School               3320
3                  William Howard Taft High School               2922
4                         Thomas Kelly High School               2883
5                          Carl Schurz High School               2366
6                         Lincoln Park High School               2342
7               Whitney M Young Magnet High School               2166
8  Charles P Steinmetz Academic Centre High School               1890
9                      Kenwood Academy High School               1852
10                 Sidney Sawyer Elementary School               1846
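In base R, the arrange-and-slice step corresponds to order() with decreasing = TRUE followed by head(). The appearance of Sidney Sawyer Elementary School in the list above suggests that COLLEGE_ENROLLMENT counts students enrolled at each school rather than graduates who go on to college; that reading is an inference, not stated in the source. If the objective is college-going, the College_Enrollment_Rate__ column may fit better, though it may need converting to numeric first. The sketch uses a hypothetical stand-in with invented values, an assumed "NDA" placeholder, and keeps the top 2 rows instead of 10.

```r
# Hypothetical stand-in for HW2; values invented for illustration
toy <- data.frame(
  NAME_OF_SCHOOL            = c("School A", "School B", "School C", "School D"),
  COLLEGE_ENROLLMENT        = c(1200L, 450L, 3100L, NA),
  College_Enrollment_Rate__ = c("62.5", "NDA", "48.0", "71.3"),  # "NDA" is assumed
  stringsAsFactors = FALSE
)

n_top <- 2  # the post keeps 10

# Sort by enrollment, largest first; order() puts NA last by default
by_enrollment <- toy[order(toy$COLLEGE_ENROLLMENT, decreasing = TRUE),
                     c("NAME_OF_SCHOOL", "COLLEGE_ENROLLMENT")]
print(head(by_enrollment, n_top))

# Rank by the rate column instead, after converting text to numeric
toy$rate_num <- suppressWarnings(as.numeric(toy$College_Enrollment_Rate__))
by_rate <- toy[order(toy$rate_num, decreasing = TRUE),
               c("NAME_OF_SCHOOL", "rate_num")]
print(head(by_rate, n_top))
```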

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Vespa (2021, Dec. 28). Data Analytics and Computational Social Science: HW2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/hw2/

BibTeX citation

@misc{vespa2021hw2,
  author = {Vespa, Rhowena},
  title = {Data Analytics and Computational Social Science: HW2},
  url = {https://github.com/DACSS/dacss_course_website/posts/hw2/},
  year = {2021}
}