Data Analysis 2014
For the 2014 Georgia SGP analyses, we are following an analysis workflow established in previous years that includes the following 7 steps:
- Update the Georgia meta-data elements in SGPstateData.
- Create annual SGP configurations for EOCT analyses as well as the associated norm group preferences included in SGPstateData.
- Create any baseline matrices and SIMEX baseline coefficient matrices needed for new content area sequences.
- Conduct CRCT SGP Analyses.
- Conduct EOCT SGP Analyses.
- Combine results into the master longitudinal data set, summarize results and output unformatted data.
- Export formatted data from Georgia_SGP object.
The use of higher level functions included in the SGP package (e.g. analyzeSGP) requires the availability of state assessment specific information. This meta-data is compiled in an R object named SGPstateData that is housed in the package. The required updates for the 2014 analyses included a) the addition of knots and boundaries, proficiency level cutscores, and conditional standard errors of measurement (CSEMs) for Analytic Geometry, b) adding a new variable to the Variable Name Lookup table, and c) adding the updated norm group preferences object.
Calculation of SGPs includes the use of cubic B-spline basis functions to more adequately model the heteroscedasticity and non-linearity found in assessment data. These functions require the selection of boundary and interior knots. Boundary knots are end points outside of the scale score distribution that anchor the B-spline basis. These are generally selected by extending the range of scale scores by 10%. That is, they are defined as lying 10% below the lowest obtainable (or observed) scale score (LOSS) and 10% above the highest obtainable scale score (HOSS). The interior knots are the internal breakpoints that define the spline. The default choice in the SGP package is to select the 20th, 40th, 60th and 80th quantiles of the observed scale score distribution.
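The defaults described above can be illustrated with a small base R sketch. It is not the package's createKnotsBoundaries implementation: the simulated scale scores are hypothetical, and the sketch reads "extending the range by 10%" as widening the observed range by 10% of its width at each end, which is an assumption.

```r
# Hypothetical scale scores standing in for several years of test data
set.seed(2014)
scale_scores <- round(rnorm(5000, mean = 800, sd = 40))

# Interior knots: 20th, 40th, 60th and 80th percentiles of the observed scores
knots <- quantile(scale_scores, probs = c(0.2, 0.4, 0.6, 0.8), names = FALSE)

# Boundary knots: observed range widened by 10% of its width at each end
# (assumed reading of "extending the range by 10%")
boundaries <- grDevices::extendrange(scale_scores, f = 0.1)

# Cubic B-spline basis built on those knots and boundaries
basis <- splines::bs(scale_scores, knots = knots, Boundary.knots = boundaries)

print(knots)
print(boundaries)
print(dim(basis))
```

With four interior knots and a cubic spline, the basis has seven columns per prior score, which is what the quantile regressions would use in place of the raw score.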
In general, the knots and boundaries are computed from a distribution composed of several years of test data (i.e. multiple cohorts) so that any irregularities in a single year are smoothed out. Subsequent annual analyses use these same knots and boundaries as well. All defaults were used to compile the knots and boundaries for Georgia from the CRCT and EOCT tests in previous years, and were also used in 2014 to compute the Analytic Geometry knots and boundaries. New knots and boundaries will be required for Georgia Milestones assessments beginning in 2016, at which point Milestones scores will be used as prior scores (independent variables) in the quantile regressions.
### Calculate Knots and Boundaries For ANALYTIC_GEOMETRY
createKnotsBoundaries(Georgia_SGP@Data[CONTENT_AREA=='ANALYTIC_GEOMETRY'])
The output from this was then manually added to the SGPstateData.R file and committed to GitHub. Note that these knots and boundaries were not used in the 2014 analyses because this was the first year in which the Analytic Geometry test was administered. Given that they were not used, and that this calculation was based on only a single year of data, it may be necessary to recalculate them next year and change them in SGPstateData if the values are quite different (small changes would likely not have a large impact on analyses going forward).
Cutscores, which are set by the GaDOE, are mainly required for student growth projections, which were not computed as part of the 2014 analyses.^[Projections were not included in the 2014 analyses due to the switch to Georgia Milestones Assessments in 2015. Student growth projections assume consistency in assessment programs, so it would be nonsensical to project progress toward tests that will never be taken.] However, they will likely be used in future years and so were added at this point.
The calculation of SIMEX adjusted student growth percentiles requires the availability of the standard errors of measurement. The CSEM data for all other content areas had been compiled and added in previous years, but still needed to be added for Analytic Geometry in this initial year of testing. Raw CSEM data was provided by GaDOE in spreadsheets, and the NCIEA compiled this into an appropriate R data object for inclusion in the SGPstateData file. See the GitHub change commit here. The base file "AGE CSEMs Spring 2014.txt" was read into R, appended with rbind to the existing CSEM database ("Georgia_CSEM.Rdata"), and saved. That file is then automatically included when SGPstateData is compiled.
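A minimal sketch of that read-and-append step is given here. The column names, the values, and the inline text standing in for the state's file are all hypothetical; the actual layout of the Georgia CSEM object is not shown in this document.

```r
# Existing CSEM table (hypothetical column names and values)
existing_csem <- data.frame(
  CONTENT_AREA     = c("MATHEMATICS_II", "MATHEMATICS_II"),
  SCALE_SCORE      = c(400, 410),
  SCALE_SCORE_CSEM = c(9.1, 8.4),
  stringsAsFactors = FALSE)

# Inline tab-delimited text stands in for the file supplied by the state
raw_text <- "SCALE_SCORE\tSCALE_SCORE_CSEM\n200\t12.3\n210\t11.0\n220\t10.2"
new_csem <- read.delim(text = raw_text, stringsAsFactors = FALSE)
new_csem$CONTENT_AREA <- "ANALYTIC_GEOMETRY"

# Put the columns in the same order as the existing table, then stack the rows
combined_csem <- rbind(existing_csem, new_csem[, names(existing_csem)])
print(combined_csem)
```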
The SGPstateData file also includes a lookup table that allows various functions in the SGP package to translate between the naming conventions used within them and the variable names GaDOE uses. The 2014 data included an identifier for gifted and talented students, which will be used in result summarizations, so this new variable was added to the lookup table.
The process through which EOCT analyses are run can produce multiple SGPs for some students. In order to identify which quantity will be used as the students' "official" SGP and subsequently merged into the master longitudinal data set, a system of norm group preferencing is established and is encoded into a lookup table and included in the SGPstateData. In general, the preference is given to:
- Progressions with the greatest number of prior scale scores.
- Progressions in which a student has repeated a course.
- Progressions that do not include a skipped year (i.e. a gap in the scale score history).
- Progressions that are set up for block-schedule course taking patterns.
The next section describes the process by which the course progressions are established and encoded, and how the norm group preference object is created. Here is the GitHub commit in which the object was included in SGPstateData.
Unlike CRCT analyses, EOCT analyses are specialized enough so that it is necessary to specify the analyses to be performed via a configuration. For several years, configurations have been employed to conduct EOCT SGP analyses for Georgia. The configurations associated with the 2014 annual EOCT SGP analyses are located in the Georgia Repo folder SGP_CONFIG. The configurations are broken up into four separate R scripts: ELA, MATHEMATICS, SCIENCE, and SOCIAL_STUDIES.
Each configuration specifies a set of parameters that defines the norm group of students to be examined. Every potential norm group is defined by, at a minimum, the progressions of content area, academic year and grade-level. Other parameters may also be defined. Each configuration used for the Georgia EOCT analyses contains these elements:
- sgp.content.areas: A progression of values that specifies the content areas to be looked at and the order in which the courses were taken.
- sgp.panel.years: The progression of the years associated with the content area progression (sgp.content.areas) provided in the configuration, potentially allowing for skipped years, block schedules, etc.
- sgp.grade.sequences: The grade progression associated with the content area and year progressions provided in the configuration. 'EOCT' stands for 'End Of Course Test'. The use of the generic 'EOCT' allows for secondary students to be compared based on the pattern of course taking rather than being dependent upon grade-level/class-designation.
- sgp.panel.years.within: A vector of the same length as the year progression (sgp.panel.years) indicating what observation is to be used for the individual student (when multiple observations exist within a single year). Typically the "Last" observation is used as the prior score (covariate) and the "First" observation is used as the current year score (outcome).
- sgp.exact.grade.progression: A Boolean argument (set to TRUE) indicating whether to run the EXACT configuration as written (rather than taking progressively restricted nested subsets of the configuration if FALSE).
- sgp.calculate.simex: A Boolean argument indicating whether cohort referenced SIMEX adjustment analyses should be run as part of the analysis for this configuration. Excluding the argument (or explicitly setting it to NULL) has the same effect as setting it to FALSE.
- sgp.calculate.simex.baseline: A Boolean argument indicating whether baseline referenced SIMEX adjustment analyses should be run as part of the analysis for this configuration. Excluding the argument (or explicitly setting it to NULL) has the same effect as setting it to FALSE.
- sgp.norm.group.preference: Because a student can potentially be analyzed by more than one configuration, this argument provides a ranking specifying which SGP is preferable for being matched with the student in the combineSGP step. Lower numbers correspond with higher preference.
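The relationships among these elements can be checked before a configuration is run. The helper below is a hypothetical sketch, not a function from the SGP package, and it assumes that each progression carries one entry per score in the sequence (priors plus the current score).

```r
# Hypothetical helper (not part of the SGP package): checks that the
# progressions in one configuration element line up with each other
check_config <- function(cfg) {
  n <- length(cfg$sgp.content.areas)
  c(years_match  = length(cfg$sgp.panel.years) == n,
    within_match = length(cfg$sgp.panel.years.within) == n,
    grades_match = all(lengths(cfg$sgp.grade.sequences) == n))
}

# Example element with one prior (2013) and one current (2014) score
example_cfg <- list(
  sgp.content.areas = c("MATHEMATICS_I", "MATHEMATICS_II"),
  sgp.panel.years = c("2013", "2014"),
  sgp.grade.sequences = list(c("EOCT", "EOCT")),
  sgp.panel.years.within = c("LAST_OBSERVATION", "FIRST_OBSERVATION"),
  sgp.exact.grade.progression = TRUE,
  sgp.norm.group.preference = 2)

checks <- check_config(example_cfg)
print(checks)

# An element with a missing year fails the corresponding check
broken_cfg <- example_cfg
broken_cfg$sgp.panel.years <- "2014"
broken_checks <- check_config(broken_cfg)
print(broken_checks)
```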
Note that the sgp.content.areas, sgp.panel.years, and sgp.grade.sequences elements are all character strings, and their values correspond to levels found in the CONTENT_AREA, YEAR, and GRADE variables in the Georgia_SGP@Data slot, respectively. As an example, here is the Mathematics II configuration script used to define the 2014 SGP analyses:
### Mathematics II
MATHEMATICS_II_2014.config <- list(
MATHEMATICS_II.2014 = list( #32
sgp.content.areas=c('MATHEMATICS_I', 'MATHEMATICS_II'),
sgp.panel.years=c('2012', '2014'),
sgp.grade.sequences=list(c('EOCT', 'EOCT')),
sgp.panel.years.within=c('LAST_OBSERVATION', 'FIRST_OBSERVATION'),
sgp.exact.grade.progression=TRUE,
sgp.calculate.simex=TRUE,
sgp.norm.group.preference=4),
MATHEMATICS_II.2014 = list( #33
sgp.content.areas=c('MATHEMATICS', 'MATHEMATICS_I', 'MATHEMATICS_II'),
sgp.panel.years=c('2011', '2012', '2014'),
sgp.grade.sequences=list(c('8', 'EOCT', 'EOCT')),
sgp.panel.years.within=c('LAST_OBSERVATION', 'LAST_OBSERVATION', 'FIRST_OBSERVATION'),
sgp.exact.grade.progression=TRUE,
sgp.calculate.simex=TRUE,
sgp.norm.group.preference=3),
MATHEMATICS_II.2014 = list( #34
sgp.content.areas=c('MATHEMATICS_I', 'MATHEMATICS_II'),
sgp.panel.years=c('2013', '2014'),
sgp.grade.sequences=list(c('EOCT', 'EOCT')),
sgp.panel.years.within=c('LAST_OBSERVATION', 'FIRST_OBSERVATION'),
sgp.exact.grade.progression=TRUE,
sgp.calculate.simex=TRUE,
sgp.norm.group.preference=2),
MATHEMATICS_II.2014 = list( #35
sgp.content.areas=c('MATHEMATICS_II', 'MATHEMATICS_II'),
sgp.panel.years=c('2013', '2014'),
sgp.grade.sequences=list(c('EOCT', 'EOCT')),
sgp.panel.years.within=c('LAST_OBSERVATION', 'FIRST_OBSERVATION'),
sgp.exact.grade.progression=TRUE,
sgp.calculate.simex=TRUE,
sgp.norm.group.preference=1)#,
# MATHEMATICS_II.2014 = list( #36 - Too few kids ( ~ 400 )
# sgp.content.areas=c('MATHEMATICS_II', 'MATHEMATICS_II'),
# sgp.panel.years=c('2014', '2014'),
# sgp.grade.sequences=list(c('EOCT', 'EOCT')),
# sgp.panel.years.within=c('FIRST_OBSERVATION', 'LAST_OBSERVATION'),
# sgp.exact.grade.progression=TRUE,
# sgp.norm.group.preference=0)
) ### END MATHEMATICS_II_2014.config
Configurations are R scripts that are sourced as part of the larger SGP analysis to be discussed later. In addition, SGPstateData needs to be updated with the norm group preferences embedded within the configurations. To do this, an Rdata object needs to be constructed and embedded within SGPstateData (either manually or through the package build itself). To create the Rdata object with the norm group preferences, source the R script configToSGPNormGroup.R in the SGP_CONFIG folder as follows:
source("configToSGPNormGroup.R")
This creates the Rdata object GA_SGP_Norm_Group_Preference.Rdata containing the norm group preferences (the GA_SGP_Norm_Group_Preference object is just a data.frame/data.table that records the rank ordering of the configurations in terms of preference).
The GA_SGP_Norm_Group_Preference can either be embedded into SGPstateData manually (see Step 4 below) or submitted to the SGP Package maintainers for inclusion in the package so that it is contained in SGPstateData when the package is loaded.
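To make the role of the preference table concrete, here is a base R sketch of how such a lookup could pick one "official" SGP per student. It is an illustration only, not the logic inside combineSGP: the student IDs, SGP values, column names, and the norm group label format are all hypothetical. It follows the rule stated earlier that lower numbers correspond with higher preference.

```r
# Hypothetical SGP results: student "A" was analyzed under two norm groups
results <- data.frame(
  ID = c("A", "A", "B"),
  SGP_NORM_GROUP = c("2013/MATHEMATICS_I_EOCT; 2014/MATHEMATICS_II_EOCT",
                     "2012/MATHEMATICS_I_EOCT; 2014/MATHEMATICS_II_EOCT",
                     "2012/MATHEMATICS_I_EOCT; 2014/MATHEMATICS_II_EOCT"),
  SGP = c(55, 61, 40),
  stringsAsFactors = FALSE)

# Hypothetical preference lookup (lower number = preferred)
preference <- data.frame(
  SGP_NORM_GROUP = c("2013/MATHEMATICS_I_EOCT; 2014/MATHEMATICS_II_EOCT",
                     "2012/MATHEMATICS_I_EOCT; 2014/MATHEMATICS_II_EOCT"),
  PREFERENCE = c(2, 4),
  stringsAsFactors = FALSE)

# Attach the preference, sort so the preferred row comes first per student,
# then keep only the first row for each student
merged <- merge(results, preference, by = "SGP_NORM_GROUP")
merged <- merged[order(merged$ID, merged$PREFERENCE), ]
official <- merged[!duplicated(merged$ID), c("ID", "SGP", "PREFERENCE")]
print(official)
```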
For the 2014 CRCT and EOCT analyses, Georgia will employ baseline referenced and SIMEX adjusted baseline referenced SGPs. For most grade and content area analyses, the coefficient matrices required to produce these results were produced prior to 2014. The following script creates baseline matrices for content areas and cohorts with adequate data as well as baseline matrices for SIMEX adjusted SGPs.
####################################################################
###
### R Script to create Baseline and Simex Adjusted Baseline Matrices
### for 2014 Georgia Analyses
###
####################################################################
### Load SGP Package
require(SGP)
### Load Data
load("Data/Georgia_SGP.Rdata")
# Extract/save the existing Baseline Matrices from the object first in order to create additional matrices.
# Then remove all baseline matrices from the object.
Georgia_Baseline_Matrices <-
Georgia_SGP@SGP$Coefficient_Matrices[grep("BASELINE", names(Georgia_SGP@SGP$Coefficient_Matrices))]
Georgia_SGP@SGP$Coefficient_Matrices <-
Georgia_SGP@SGP$Coefficient_Matrices[-grep("BASELINE", names(Georgia_SGP@SGP$Coefficient_Matrices))]
### Construct baseline analysis configuration lists for each content area and
### GRADE_9_LIT
g9l.baseline.config <- list(
list( # 7,584 students #1
sgp.baseline.content.areas=c('ELA', 'READING', 'GRADE_9_LIT'),
sgp.baseline.panel.years=c('2007', '2008', '2009', '2010', '2011', '2012'),
sgp.baseline.grade.sequences=c(8,8, 'EOCT'),
sgp.baseline.panel.years.within=c('LAST_OBSERVATION', 'LAST_OBSERVATION', 'FIRST_OBSERVATION'),
sgp.baseline.grade.sequences.lags=c(0, 3)),
list( # 4,399 students #2
sgp.baseline.content.areas=c('ELA', 'READING', 'ELA', 'READING', 'GRADE_9_LIT'),
sgp.baseline.panel.years=c('2007', '2008', '2009', '2010', '2011', '2012'),
sgp.baseline.grade.sequences=c(7,7, 8,8, 'EOCT'),
sgp.baseline.panel.years.within=c('LAST_OBSERVATION', 'LAST_OBSERVATION', 'LAST_OBSERVATION', 'LAST_OBSERVATION', 'FIRST_OBSERVATION'),
sgp.baseline.grade.sequences.lags=c(0, 1, 0, 3)),
list( # 4,882 students #7
sgp.baseline.content.areas=c('ELA', 'READING', 'GRADE_9_LIT'),
sgp.baseline.panel.years=c('2007', '2008', '2009', '2010', '2011', '2012'),
sgp.baseline.grade.sequences=c(7,7, 'EOCT'),
sgp.baseline.panel.years.within=c('LAST_OBSERVATION', 'LAST_OBSERVATION', 'FIRST_OBSERVATION'),
sgp.baseline.grade.sequences.lags=c(0, 1)),
list( # 3,813 students #8
sgp.baseline.content.areas=c('ELA', 'READING', 'ELA', 'READING', 'GRADE_9_LIT'),
sgp.baseline.panel.years=c('2007', '2008', '2009', '2010', '2011', '2012'),
sgp.baseline.grade.sequences=c(6,6, 7,7, 'EOCT'),
sgp.baseline.panel.years.within=c('LAST_OBSERVATION', 'LAST_OBSERVATION', 'LAST_OBSERVATION', 'LAST_OBSERVATION', 'FIRST_OBSERVATION'),
sgp.baseline.grade.sequences.lags=c(0, 1, 0, 1))) # Continuous NO 8th grade ELA/Reading
GA_GRADE_9_LIT_Baseline_Matrices <- baselineSGP(
Georgia_SGP,
sgp.baseline.config=g9l.baseline.config,
sgp.percentiles.baseline.max.order=4, ## NOTE Change here
return.matrices.only=TRUE,
calculate.baseline.sgps=FALSE,
goodness.of.fit.print=FALSE,
parallel.config=list(
BACKEND="PARALLEL",
WORKERS=list(TAUS=20)))
## Loop to investigate the N size and other info
for (i in 1:length(GA_GRADE_9_LIT_Baseline_Matrices[[1]])) {
print(paste(GA_GRADE_9_LIT_Baseline_Matrices[[1]][[i]]@Version$Matrix_Information$N,
GA_GRADE_9_LIT_Baseline_Matrices[[1]][[i]]@Version$Date_Prepared,
GA_GRADE_9_LIT_Baseline_Matrices[[1]][[i]]@Grade_Progression,
GA_GRADE_9_LIT_Baseline_Matrices[[1]][[i]]@Content_Areas,
GA_GRADE_9_LIT_Baseline_Matrices[[1]][[i]]@Time_Lags[[1]], "\n", sep=", "))
}
### AMERICAN_LIT
aml.baseline.config <- list(
list( # 19,722 students #11
sgp.baseline.content.areas=c('GRADE_9_LIT', 'AMERICAN_LIT'),
sgp.baseline.panel.years=c('2007', '2008', '2009', '2010', '2011', '2012'),
sgp.baseline.grade.sequences=c('EOCT', 'EOCT'),
sgp.baseline.panel.years.within=c('LAST_OBSERVATION', 'FIRST_OBSERVATION'),
sgp.baseline.grade.sequences.lags=3), # skip 2 years
list( # 11,293 students #12
sgp.baseline.content.areas=c('ELA', 'READING', 'GRADE_9_LIT', 'AMERICAN_LIT'),
sgp.baseline.panel.years=c('2007', '2008', '2009', '2010', '2011', '2012'),
sgp.baseline.grade.sequences=c(8,8, 'EOCT', 'EOCT'),
sgp.baseline.panel.years.within=c('LAST_OBSERVATION', 'LAST_OBSERVATION', 'LAST_OBSERVATION', 'FIRST_OBSERVATION'),
sgp.baseline.grade.sequences.lags=c(0, 1, 3))) # skip 2 years
GA_AMERICAN_LIT_Baseline_Matrices <- baselineSGP(
Georgia_SGP,
sgp.baseline.config=aml.baseline.config,
sgp.percentiles.baseline.max.order=3, ## NOTE Change here
return.matrices.only=TRUE,
calculate.baseline.sgps=FALSE,
goodness.of.fit.print=FALSE,
parallel.config=list(
BACKEND="PARALLEL",
WORKERS=list(TAUS=20)))
for (i in 1:length(GA_AMERICAN_LIT_Baseline_Matrices[[1]])) {
print(paste(GA_AMERICAN_LIT_Baseline_Matrices[[1]][[i]]@Version$Matrix_Information$N,
GA_AMERICAN_LIT_Baseline_Matrices[[1]][[i]]@Version$Date_Prepared,
GA_AMERICAN_LIT_Baseline_Matrices[[1]][[i]]@Grade_Progression,
GA_AMERICAN_LIT_Baseline_Matrices[[1]][[i]]@Content_Areas,
GA_AMERICAN_LIT_Baseline_Matrices[[1]][[i]]@Time_Lags[[1]], "\n", sep=", "))
}
### US History
ush.baseline.config <- list(
list( # 11,507 students #62
sgp.baseline.content.areas=c('SOCIAL_STUDIES', 'US_HISTORY'),
sgp.baseline.panel.years=c('2008', '2009', '2010', '2011', '2012', '2013'),
sgp.baseline.grade.sequences=c('8', 'EOCT'),
sgp.baseline.grade.sequences.lags=4,
sgp.baseline.panel.years.within=c('LAST_OBSERVATION', 'FIRST_OBSERVATION')),
list( # 15,155 students #67
sgp.baseline.content.areas=c('ECONOMICS', 'US_HISTORY'),
sgp.baseline.panel.years=c('2008', '2009', '2010', '2011', '2012', '2013'),
sgp.baseline.grade.sequences=c('EOCT', 'EOCT'),
sgp.baseline.grade.sequences.lags=1,
sgp.baseline.panel.years.within=c('LAST_OBSERVATION', 'FIRST_OBSERVATION')),
# list( # 924 students #68 TOO FEW STUDENTS - NOT RUN / MATRICES NOT KEPT
# sgp.baseline.content.areas=c('SOCIAL_STUDIES', 'ECONOMICS', 'US_HISTORY'),
# sgp.baseline.panel.years=c('2008', '2009', '2010', '2011', '2012', '2013'),
# sgp.baseline.grade.sequences=c('8', 'EOCT', 'EOCT'),
# sgp.baseline.grade.sequences.lags=c(1,1),
# sgp.baseline.panel.years.within=c('LAST_OBSERVATION', 'LAST_OBSERVATION', 'FIRST_OBSERVATION')),
list( # 5,938 students #69
sgp.baseline.content.areas=c('ECONOMICS', 'US_HISTORY'),
sgp.baseline.panel.years=c('2008', '2009', '2010', '2011', '2012', '2013'),
sgp.baseline.grade.sequences=c('EOCT', 'EOCT'),
sgp.baseline.grade.sequences.lags=2,
sgp.baseline.panel.years.within=c('LAST_OBSERVATION', 'FIRST_OBSERVATION')),
list( # 12,060 students #73
sgp.baseline.content.areas=c('ECONOMICS', 'US_HISTORY'),
sgp.baseline.panel.years=c('2008', '2009', '2010', '2011', '2012', '2013'),
sgp.baseline.grade.sequences=c('EOCT', 'EOCT'),
sgp.baseline.grade.sequences.lags=0,
sgp.baseline.panel.years.within=c('FIRST_OBSERVATION', 'LAST_OBSERVATION')))
GA_USHIST_Baseline_Matrices <- baselineSGP(
Georgia_SGP,
sgp.baseline.config=ush.baseline.config,
sgp.percentiles.baseline.max.order=1,
return.matrices.only=TRUE,
calculate.baseline.sgps=FALSE,
goodness.of.fit.print=FALSE,
parallel.config=list(
BACKEND="PARALLEL",
WORKERS=list(TAUS=20)))
for (i in 1:length(GA_USHIST_Baseline_Matrices[[1]])) {
print(paste(GA_USHIST_Baseline_Matrices[[1]][[i]]@Version$Matrix_Information$N,
GA_USHIST_Baseline_Matrices[[1]][[i]]@Version$Date_Prepared,
GA_USHIST_Baseline_Matrices[[1]][[i]]@Grade_Progression,
GA_USHIST_Baseline_Matrices[[1]][[i]]@Content_Areas,
GA_USHIST_Baseline_Matrices[[1]][[i]]@Time_Lags[[1]], "\n", sep=", "))
}
## Now combine the newly computed coefficient matrices with the previously existing ones.
## Save and add into SGP object before running analyses; also needed to produce the SIMEX coefficient matrices below.
Georgia_Baseline_Matrices[["AMERICAN_LIT.BASELINE"]] <-
c(Georgia_Baseline_Matrices[["AMERICAN_LIT.BASELINE"]], GA_AMERICAN_LIT_Baseline_Matrices[["AMERICAN_LIT.BASELINE"]])
Georgia_Baseline_Matrices[["GRADE_9_LIT.BASELINE"]] <-
c(Georgia_Baseline_Matrices[["GRADE_9_LIT.BASELINE"]], GA_GRADE_9_LIT_Baseline_Matrices[["GRADE_9_LIT.BASELINE"]])
Georgia_Baseline_Matrices[["US_HISTORY.BASELINE"]] <-
c(Georgia_Baseline_Matrices[["US_HISTORY.BASELINE"]], GA_USHIST_Baseline_Matrices[["US_HISTORY.BASELINE"]])
save(Georgia_Baseline_Matrices, file="Georgia_Baseline_Matrices.Rdata")
###################################################################################################
###
### Georgia Baseline SIMEX matrix calculation
###
###################################################################################################
SGPstateData[["GA"]][["Baseline_splineMatrix"]][["Coefficient_Matrices"]] <-
SGPstateData[["GA"]][["Baseline_splineMatrix"]][["Coefficient_Matrices"]][
-grep("BASELINE.SIMEX", names(SGPstateData[["GA"]][["Baseline_splineMatrix"]][["Coefficient_Matrices"]]))]
### GRADE_9_LIT
GA_GRADE_9_LIT_SIMEX_Baseline_Matrices <- baselineSGP(
Georgia_SGP,
sgp.baseline.config=g9l.baseline.config,
sgp.percentiles.baseline.max.order=4, ## NOTE Change here
return.matrices.only=TRUE,
calculate.baseline.sgps=FALSE,
calculate.simex.baseline=TRUE,
goodness.of.fit.print=FALSE,
parallel.config=list(BACKEND="PARALLEL", WORKERS=list(SIMEX=25)))
### AMERICAN_LIT
GA_AMERICAN_LIT_SIMEX_Baseline_Matrices <- baselineSGP(
Georgia_SGP,
sgp.baseline.config=aml.baseline.config,
sgp.percentiles.baseline.max.order=3, ## NOTE Change here
return.matrices.only=TRUE,
calculate.baseline.sgps=FALSE,
calculate.simex.baseline=TRUE,
goodness.of.fit.print=FALSE,
parallel.config=list(BACKEND="PARALLEL", WORKERS=list(SIMEX=25))) #16
### US History
GA_USHIST_SIMEX_Baseline_Matrices <- baselineSGP(
Georgia_SGP,
sgp.baseline.config=ush.baseline.config,
sgp.percentiles.baseline.max.order=1,
return.matrices.only=TRUE,
calculate.baseline.sgps=FALSE,
calculate.simex.baseline=TRUE,
goodness.of.fit.print=FALSE,
parallel.config=list(BACKEND="PARALLEL", WORKERS=list(SIMEX=25)))
Georgia_SIMEX_Baseline_Matrices <- Georgia_Baseline_Matrices[grep("BASELINE.SIMEX", names(Georgia_Baseline_Matrices))]
Georgia_Baseline_Matrices <- Georgia_Baseline_Matrices[-grep("BASELINE.SIMEX", names(Georgia_Baseline_Matrices))]
Tmp_SIMEX_Baseline_Matrices <-
c(GA_GRADE_9_LIT_SIMEX_Baseline_Matrices, GA_AMERICAN_LIT_SIMEX_Baseline_Matrices, GA_USHIST_SIMEX_Baseline_Matrices)
SIMEX_Baseline_Matrices <-
SGP:::mergeSGP(list(Coefficient_Matrices= Georgia_SIMEX_Baseline_Matrices), list(Coefficient_Matrices= Tmp_SIMEX_Baseline_Matrices))
SIMEX_Baseline_Matrices$Coefficient_Matrices$GRADE_9_LIT.BASELINE.SIMEX[[2]][[1]][[101]]@Version
Georgia_Baseline_Matrices <- c(Georgia_Baseline_Matrices, SIMEX_Baseline_Matrices$Coefficient_Matrices)
save(Georgia_Baseline_Matrices, file="Georgia_Baseline_Matrices.Rdata")
#########################################################
###
### Calculate SGPs for Georgia - 2014
###
##########################################################
### Load SGP Package
require(SGP)
### Load Georgia SGP object
load("Data/Georgia_SGP.Rdata")
load('Data/Georgia_Baseline_Matrices.Rdata' )
### AnalyzeSGP : Grade level CRCT tests
SGPstateData[["GA"]][["Baseline_splineMatrix"]][["Coefficient_Matrices"]] <- Georgia_Baseline_Matrices
Georgia_SGP <- analyzeSGP(
Georgia_SGP,
years='2014',
content_areas=c("ELA", "READING", "MATHEMATICS", "SOCIAL_STUDIES"), # "SCIENCE" is produced in SGP_Config
sgp.percentiles=TRUE,
sgp.projections=TRUE,
sgp.projections.lagged=TRUE,
sgp.percentiles.baseline=TRUE,
sgp.projections.baseline=TRUE,
sgp.projections.lagged.baseline=TRUE,
simulate.sgps=TRUE, # Needed for SGP_STANDARD_ERROR and SGP_BASELINE_STANDARD_ERROR
calculate.simex = TRUE, # Produce Cohort SIMEX for CRCT now.
calculate.simex.baseline=TRUE, # TRUE or NULL
goodness.of.fit.print="GROB", # Print all out once after running EOCTs - "GROB" produces R graphical object, but doesn't print.
parallel.config=list(BACKEND="PARALLEL", WORKERS=list(PERCENTILES=12, BASELINE_PERCENTILES=12, PROJECTIONS=6, LAGGED_PROJECTIONS=6)))
### Save Results
#save(Georgia_SGP, file="Data/Georgia_SGP.Rdata")
#########################################################
###
### Calculate EOCT SGPs for Georgia for 2014
###
##########################################################
### Load SGP Package
require(SGP)
### Load Georgia SGP object
load("Data/Georgia_SGP.Rdata")
### Load EOCT configurations
require(SGP)
setwd('Github_Repos/Projects/Georgia')
source("SGP_CONFIG/EOCT/2014/ELA.R")
source("SGP_CONFIG/EOCT/2014/SCIENCE.R")
source("SGP_CONFIG/EOCT/2014/SOCIAL_STUDIES.R")
source("SGP_CONFIG/EOCT/2014/MATHEMATICS.R")
####################################################################################
###
### EOCT Analyses
###
####################################################################################
###
### Cohort referenced EOCT content areas
### - originally run separately to keep SIMEX production limited to "official" version
###
GA_EOCT.config <- c(
ANALYTIC_GEOMETRY_2014.config,
COORDINATE_ALGEBRA_2014.config,
MATHEMATICS_II_2014.config)
Georgia_SGP <- analyzeSGP(
Georgia_SGP,
sgp.config=GA_EOCT.config,
sgp.percentiles=TRUE,
sgp.projections= TRUE,
sgp.projections.lagged= TRUE,
sgp.percentiles.baseline= FALSE,
sgp.projections.baseline= FALSE,
sgp.projections.lagged.baseline=FALSE,
simulate.sgps = TRUE, # Needed for SGP_STANDARD_ERROR and SGP_BASELINE_STANDARD_ERROR
parallel.config=list(BACKEND='PARALLEL', WORKERS=list(SIMEX=15, TAUS=15)))
###
### BASELINE SGPs
###
GA_EOCT.config <- c(
AMERICAN_LIT_2014.config,
BIOLOGY_2014.config,
ECONOMICS_2014.config,
GRADE_9_LIT_2014.config,
PHYSICAL_SCIENCE_2014.config,
US_HISTORY_2014.config)
load('Georgia_Baseline_Matrices.Rdata' )
SGPstateData[["GA"]][["Baseline_splineMatrix"]][["Coefficient_Matrices"]] <- Georgia_Baseline_Matrices
### Replace original baseline matrices with updated matrices (original + additional US Hist and ELA progressions)
Georgia_SGP@SGP$Coefficient_Matrices <- Georgia_SGP@SGP$Coefficient_Matrices[-grep("BASELINE", names(Georgia_SGP@SGP$Coefficient_Matrices))]
### analyzeSGP
Georgia_SGP <- analyzeSGP(
Georgia_SGP,
sgp.config=GA_EOCT.config,
sgp.percentiles= FALSE,
sgp.projections=FALSE,
sgp.projections.lagged=FALSE,
sgp.percentiles.baseline=TRUE,
sgp.projections.baseline= TRUE,
sgp.projections.lagged.baseline= TRUE,
simulate.sgps = TRUE, # Needed for SGP_STANDARD_ERROR and SGP_BASELINE_STANDARD_ERROR
calculate.simex.baseline = TRUE,
goodness.of.fit.print=FALSE,
parallel.config=list(BACKEND="PARALLEL", WORKERS=list(BASELINE_PERCENTILES=6, PROJECTIONS=3, LAGGED_PROJECTIONS=2)))
save(Georgia_SGP, file="Georgia_SGP.Rdata")
Once all analyses are completed, the results are merged into the master longitudinal data set in the @Data slot of the SGP class object using the combineSGP function. The data are then summarized using the summarizeSGP function, which produces many tables of descriptive statistics disaggregated at the state, district, school and instructor levels. These basic summary tables are also further disaggregated by the demographic groups available in the data set and listed in Georgia's Variable Name Lookup table.
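As a rough picture of what one such summary table contains, the base R sketch below aggregates hypothetical student-level SGPs to the school level. It is not how summarizeSGP is implemented, and the column names and values are illustrative assumptions.

```r
# Hypothetical student-level results (column names are illustrative)
long_data <- data.frame(
  SCHOOL_NUMBER = c("0101", "0101", "0101", "0202", "0202"),
  CONTENT_AREA  = c("ELA", "ELA", "ELA", "ELA", "ELA"),
  SGP           = c(35, 50, 80, 20, 60),
  stringsAsFactors = FALSE)

# One row per school and content area with median, mean and count
school_summary <- aggregate(
  SGP ~ SCHOOL_NUMBER + CONTENT_AREA, data = long_data,
  FUN = function(x) c(MEDIAN_SGP = median(x), MEAN_SGP = mean(x), COUNT = length(x)))

# aggregate() returns the statistics as a matrix column; flatten it
school_summary <- do.call(data.frame, school_summary)
print(school_summary)
```

Adding a demographic variable to the right-hand side of the formula would produce the further disaggregated version of the same table.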
### combineSGP
# Add Norm Group Preferences if they have not been submitted to SGPstateData:
SGPstateData[["GA"]][["SGP_Norm_Group_Preference"]] <- GA_SGP_Norm_Group_Preference
Georgia_SGP <- combineSGP(Georgia_SGP, years='2014')
### Save results
save(Georgia_SGP, file="Data/Georgia_SGP.Rdata")
Note that the INSTRUCTOR_NUMBER data set was not available during the initial data cleaning and preparation in September 2014, and therefore was not included in the prepareSGP (or updateSGP) step. It was added to the Georgia_SGP object before summarizeSGP was run in order to produce teacher level mean/median SGP summary statistic tables. The data cleaning of this data set was addressed above, but it was added to the existing data using this code:
### Merge 2014 data with existing @Data_Supplementary$INSTRUCTOR_NUMBER
load('Data/Georgia_SGP.Rdata')
load('Data/Base_Files/GA_INSTRUCTOR_2013-14.Rdata')
# First change existing Instructor LName FName to character
unlist(sapply(Georgia_SGP@Data_Supplementary[[1]], class))
Georgia_SGP@Data_Supplementary[[1]][, INSTRUCTOR_LAST_NAME := as.character(INSTRUCTOR_LAST_NAME)]
Georgia_SGP@Data_Supplementary[[1]][, INSTRUCTOR_FIRST_NAME := as.character(INSTRUCTOR_FIRST_NAME)]
# Use rbind to combine the existing and new data
Georgia_SGP@Data_Supplementary[["INSTRUCTOR_NUMBER"]] <- rbind(Georgia_SGP@Data_Supplementary[["INSTRUCTOR_NUMBER"]], GA_INSTRUCTOR_2014)
# You may want to save Georgia_SGP and restart before running summarizeSGP
# if CPU memory is now exhausted. Just to be safe ...
At this point we proceeded with the summarization and output of the data.
### summarizeSGP (Produces aggregate tables)
Georgia_SGP <- summarizeSGP(Georgia_SGP, parallel.config=list(BACKEND="PARALLEL", WORKERS=list(SUMMARY=20)))
# Extract and save the summary tables separately
Georgia_Summary <- Georgia_SGP@Summary
save(Georgia_Summary, file="Data/Georgia_Summary_2014.Rdata")
Georgia_SGP@Summary <- NULL
### outputSGP
outputSGP(Georgia_SGP, output.type=c("LONG_Data", "LONG_FINAL_YEAR_Data"))
In the final line of code above, a pipe delimited version of the complete long data is exported via outputSGP. Additionally for Georgia, the NCIEA produces a formatted version of the 2014 results, which contains fields needed for rendering data in the state's visualization tool, such as students' entire prior score and course progression history. The code used in 2014 is presented below.
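Before the full script, a brief aside on how a prior score and course history can be recovered from a norm group label. The sketch below is not part of the 2014 script. The label is hypothetical, and its format ("YEAR/CONTENT_AREA_GRADE" pieces joined by "; ") is an assumption inferred from the string splitting that the script performs.

```r
# Hypothetical norm group label in the assumed format
norm_group <- "2011/MATHEMATICS_8; 2012/MATHEMATICS_I_EOCT; 2014/MATHEMATICS_II_EOCT"

# Split into steps and reverse so the current score comes first;
# everything after the first element is a prior
steps <- rev(strsplit(norm_group, "; ", fixed = TRUE)[[1]])
priors <- steps[-1]

# Break one "YEAR/CONTENT_AREA_GRADE" step into its parts
parse_step <- function(step) {
  parts <- strsplit(step, "/", fixed = TRUE)[[1]]
  tokens <- strsplit(parts[2], "_", fixed = TRUE)[[1]]
  data.frame(
    SCHOOL_YEAR = parts[1],
    SUBJECT_CODE = paste(head(tokens, -1), collapse = "_"),  # drop trailing grade
    GRADE = tail(tokens, 1),
    stringsAsFactors = FALSE)
}

prior_history <- do.call(rbind, lapply(priors, parse_step))
prior_history$PRIOR <- seq_len(nrow(prior_history))
print(prior_history)
```

Here the most recent prior is numbered 1, matching the *_PRIOR_1 through *_PRIOR_4 naming used for the output variables.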
######################################################################################
###
### Script to produce formatted text output for Georgia from annual long data
###
######################################################################################
### Load packages
require(SGP)
require(data.table)
### Load data
setwd('Georgia')
load("Data/Georgia_SGP.Rdata")
load("Data/Georgia_SGP_LONG_Data_2014.Rdata")
### Variables to output
variables.to.output <- c("VALID_CASE", "GTID", "SCHOOL_YEAR", "SUBJECT_CODE", "YEAR_WITHIN", "GRADE", "GRADE_REPORTED", "SCALE_SCORE", "SCALE_SCORE_PRIOR_STANDARDIZED",
"ADMINISTRATION_PERIOD", "FIRST_OBSERVATION", "LAST_OBSERVATION", "PERFORMANCE_LEVEL", "SR_SYSTEM_ID", "SCHOOL_NUMBER", "ADMIN_INVALIDATION", "ADMIN_TYPE", "MATCH_STATUS",
"RACE_CODE", "GENDER_CODE", "ED", "SWD", "LEP", "GIFT", "BIRTH_DATE", "LAST_NAME", "FIRST_NAME", "MIDDLE_NAME",
"SGP_NORM_GROUP", "SGP", "SGP_SIMEX", "SGP_LEVEL", "SGP_STANDARD_ERROR", "SGP_NORM_GROUP_SCALE_SCORES",
"SGP_NORM_GROUP_BASELINE", "SGP_BASELINE", "SGP_SIMEX_BASELINE", "SGP_LEVEL_BASELINE", "SGP_NORM_GROUP_BASELINE_SCALE_SCORES",
"SGP_NORM_GROUP_FINAL", "SGP_FINAL", "SGP_SIMEX_FINAL", "SGP_LEVEL_FINAL", "SGP_SIMEX_LEVEL_FINAL", "SGP_NORM_GROUP_FINAL_SCALE_SCORES",
"SCHOOL_YEAR_PRIOR_1", "SUBJECT_CODE_PRIOR_1", "SCALE_SCORE_PRIOR_1", "PERFORMANCE_LEVEL_PRIOR_1", "GRADE_PRIOR_1", "ADMINISTRATION_PERIOD_PRIOR_1",
"SCHOOL_YEAR_PRIOR_2", "SUBJECT_CODE_PRIOR_2", "SCALE_SCORE_PRIOR_2", "PERFORMANCE_LEVEL_PRIOR_2", "GRADE_PRIOR_2", "ADMINISTRATION_PERIOD_PRIOR_2",
"SCHOOL_YEAR_PRIOR_3", "SUBJECT_CODE_PRIOR_3", "SCALE_SCORE_PRIOR_3", "PERFORMANCE_LEVEL_PRIOR_3", "GRADE_PRIOR_3", "ADMINISTRATION_PERIOD_PRIOR_3",
"SCHOOL_YEAR_PRIOR_4", "SUBJECT_CODE_PRIOR_4", "SCALE_SCORE_PRIOR_4", "PERFORMANCE_LEVEL_PRIOR_4", "GRADE_PRIOR_4", "ADMINISTRATION_PERIOD_PRIOR_4")
### Subset out relevant variables
tmp.long.data <- subset(Georgia_SGP_LONG_Data_2014, select=intersect(variables.to.output, names(Georgia_SGP_LONG_Data_2014)))
# ### Clean up ADMINISTRATION_PERIOD - Changed in Georgia_SGP@Data and Georgia_SGP_LONG_Data_2014 output, so not needed anymore
# tmp.long.data[, ADMINISTRATION_PERIOD := paste(YEAR_WITHIN, ADMINISTRATION_PERIOD, sep=": ")]
### Create SGP_*_FINAL Variables
### Start with baseline SGPs and then fill in missing values (EOCT Math subjects) with cohort-referenced SGPs
tmp.long.data[, SGP_FINAL := SGP_BASELINE]
tmp.long.data[which(is.na(SGP_FINAL)), SGP_FINAL := SGP]
tmp.long.data[, SGP_SIMEX_FINAL := SGP_SIMEX_BASELINE]
tmp.long.data[which(is.na(SGP_SIMEX_FINAL)), SGP_SIMEX_FINAL := SGP_SIMEX]
tmp.long.data[, SGP_LEVEL_FINAL := SGP_LEVEL_BASELINE]
tmp.long.data[which(is.na(SGP_LEVEL_FINAL)), SGP_LEVEL_FINAL := SGP_LEVEL]
tmp.long.data[, SGP_SIMEX_LEVEL_FINAL := ordered(
	findInterval(SGP_SIMEX_FINAL, SGPstateData[["GA"]][["Growth"]][["Cutscores"]][["Cuts"]]), labels=c("Low", "Typical", "High"))]
tmp.long.data[, SGP_NORM_GROUP_FINAL := SGP_NORM_GROUP_BASELINE]
tmp.long.data[which(is.na(SGP_NORM_GROUP_FINAL)), SGP_NORM_GROUP_FINAL := SGP_NORM_GROUP]
tmp.long.data[, SGP_NORM_GROUP_FINAL_SCALE_SCORES := SGP_NORM_GROUP_BASELINE_SCALE_SCORES]
tmp.long.data[which(is.na(SGP_NORM_GROUP_FINAL_SCALE_SCORES)), SGP_NORM_GROUP_FINAL_SCALE_SCORES := SGP_NORM_GROUP_SCALE_SCORES]
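### Split SGP_NORM_GROUP_FINAL into its "; "-separated YEAR/SUBJECT_GRADE elements
### The last element is the current score, so rev(x)[2] is the 1st prior, rev(x)[3] the 2nd prior, etc.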
my.tmp.split <- strsplit(as.character(tmp.long.data$SGP_NORM_GROUP_FINAL), "; ")
### YEAR Prior
tmp.long.data$SCHOOL_YEAR_PRIOR_1 <- sapply(strsplit(sapply(my.tmp.split, function(x) rev(x)[2]), "/"), '[', 1)
tmp.long.data$SCHOOL_YEAR_PRIOR_2 <- sapply(strsplit(sapply(my.tmp.split, function(x) rev(x)[3]), "/"), '[', 1)
tmp.long.data$SCHOOL_YEAR_PRIOR_3 <- sapply(strsplit(sapply(my.tmp.split, function(x) rev(x)[4]), "/"), '[', 1)
tmp.long.data$SCHOOL_YEAR_PRIOR_4 <- sapply(strsplit(sapply(my.tmp.split, function(x) rev(x)[5]), "/"), '[', 1)
### SUBJECT_CODE Prior
tmp.long.data$SUBJECT_CODE_PRIOR_1 <- sapply(sapply(strsplit(sapply(strsplit(
sapply(my.tmp.split, function(x) rev(x)[2]), "/"), '[', 2), "_"), head, -1), paste, collapse="_")
tmp.long.data$SUBJECT_CODE_PRIOR_2 <- sapply(sapply(strsplit(sapply(strsplit(
sapply(my.tmp.split, function(x) rev(x)[3]), "/"), '[', 2), "_"), head, -1), paste, collapse="_")
tmp.long.data$SUBJECT_CODE_PRIOR_3 <- sapply(sapply(strsplit(sapply(strsplit(
sapply(my.tmp.split, function(x) rev(x)[4]), "/"), '[', 2), "_"), head, -1), paste, collapse="_")
tmp.long.data$SUBJECT_CODE_PRIOR_4 <- sapply(sapply(strsplit(sapply(strsplit(
sapply(my.tmp.split, function(x) rev(x)[5]), "/"), '[', 2), "_"), head, -1), paste, collapse="_")
### GRADE Prior
tmp.long.data$GRADE_PRIOR_1 <- sapply(strsplit(sapply(strsplit(
sapply(my.tmp.split, function(x) rev(x)[2]), "/"), '[', 2), "_"), tail, 1)
tmp.long.data$GRADE_PRIOR_2 <- sapply(strsplit(sapply(strsplit(
sapply(my.tmp.split, function(x) rev(x)[3]), "/"), '[', 2), "_"), tail, 1)
tmp.long.data$GRADE_PRIOR_3 <- sapply(strsplit(sapply(strsplit(
sapply(my.tmp.split, function(x) rev(x)[4]), "/"), '[', 2), "_"), tail, 1)
tmp.long.data$GRADE_PRIOR_4 <- sapply(strsplit(sapply(strsplit(
sapply(my.tmp.split, function(x) rev(x)[5]), "/"), '[', 2), "_"), tail, 1)
### SCALE_SCORE Prior
my.tmp.split.scale_score <- strsplit(as.character(tmp.long.data$SGP_NORM_GROUP_FINAL_SCALE_SCORES), "; ")
tmp.long.data$SCALE_SCORE_PRIOR_1 <- sapply(my.tmp.split.scale_score, function(x) rev(x)[2])
tmp.long.data$SCALE_SCORE_PRIOR_2 <- sapply(my.tmp.split.scale_score, function(x) rev(x)[3])
tmp.long.data$SCALE_SCORE_PRIOR_3 <- sapply(my.tmp.split.scale_score, function(x) rev(x)[4])
tmp.long.data$SCALE_SCORE_PRIOR_4 <- sapply(my.tmp.split.scale_score, function(x) rev(x)[5])
### PERFORMANCE_LEVEL Prior
## Create all 4 PERFORMANCE_LEVEL PRIOR vars as NA vectors
tmp.long.data[, paste("PERFORMANCE_LEVEL_PRIOR", 1:4, sep="_") := factor(NA)]
## Fill in the 1st Prior PERFORMANCE_LEVEL info for CRCT and EOCT. 2nd - 4th Priors will all be CRCT
tmp.long.data[which(GRADE_PRIOR_1 != "EOCT"), PERFORMANCE_LEVEL_PRIOR_1 :=
ordered(findInterval(as.numeric(SCALE_SCORE_PRIOR_1), c(800, 850)),
labels=c("Does Not Meet Expectations", "Meets Expectations", "Exceeds Expectations"))]
tmp.long.data[which(GRADE_PRIOR_1 == "EOCT"), PERFORMANCE_LEVEL_PRIOR_1 :=
	ordered(findInterval(as.numeric(SCALE_SCORE_PRIOR_1), c(400, 450)),
	labels=c("Does Not Meet Expectations", "Meets Expectations", "Exceeds Expectations"))]
tmp.long.data[which(GRADE_PRIOR_2 != "EOCT"), PERFORMANCE_LEVEL_PRIOR_2 :=
ordered(findInterval(as.numeric(SCALE_SCORE_PRIOR_2), c(800, 850)),
labels=c("Does Not Meet Expectations", "Meets Expectations", "Exceeds Expectations"))]
## 2nd -> 4th Priors will all be CRCT
# tmp.long.data[which(GRADE == "EOCT"), PERFORMANCE_LEVEL_PRIOR_2 :=
# ordered(findInterval(SCALE_SCORE_PRIOR_2, c(400, 450)),
# labels=c("Does Not Meet Expectations", "Meets Expectations", "Exceeds Expectations"))]
tmp.long.data[which(GRADE_PRIOR_3 != "EOCT"), PERFORMANCE_LEVEL_PRIOR_3 :=
ordered(findInterval(as.numeric(SCALE_SCORE_PRIOR_3), c(800, 850)),
labels=c("Does Not Meet Expectations", "Meets Expectations", "Exceeds Expectations"))]
tmp.long.data[which(GRADE_PRIOR_4 != "EOCT"), PERFORMANCE_LEVEL_PRIOR_4 :=
ordered(findInterval(as.numeric(SCALE_SCORE_PRIOR_4), c(800, 850)),
labels=c("Does Not Meet Expectations", "Meets Expectations", "Exceeds Expectations"))]
### ADMINISTRATION_PERIOD Prior
tmp.admin.period <- Georgia_SGP@Data[,
	c(key(Georgia_SGP@Data)[1:4], "GRADE", "ADMINISTRATION_PERIOD", "SCALE_SCORE"), with=FALSE][VALID_CASE=="VALID_CASE" & GRADE=="EOCT"]
setnames(tmp.admin.period, c("CONTENT_AREA", "YEAR", "GRADE", "ID", "ADMINISTRATION_PERIOD", "SCALE_SCORE"),
	c("SUBJECT_CODE_PRIOR_1", "SCHOOL_YEAR_PRIOR_1", "GRADE_PRIOR_1", "GTID", "ADMINISTRATION_PERIOD_PRIOR_1", "SCALE_SCORE_PRIOR_1"))
tmp.admin.period[, SCALE_SCORE_PRIOR_1 := as.character(SCALE_SCORE_PRIOR_1)]
## Remove the "LAST_OBSERVATION" for within year repeaters that had the exact same score in both Admin Periods
## Sort on ADMINISTRATION_PERIOD_PRIOR_1 first so that the first record in each group is the one kept, then re-key without it
setkeyv(tmp.admin.period, c("GTID", "SUBJECT_CODE_PRIOR_1", "SCHOOL_YEAR_PRIOR_1", "GRADE_PRIOR_1",
	"SCALE_SCORE_PRIOR_1", "ADMINISTRATION_PERIOD_PRIOR_1", "VALID_CASE"))
setkeyv(tmp.admin.period, c("GTID", "SUBJECT_CODE_PRIOR_1", "SCHOOL_YEAR_PRIOR_1", "GRADE_PRIOR_1",
	"SCALE_SCORE_PRIOR_1", "VALID_CASE"))
## Pass the key explicitly: newer data.table versions compare all columns by default
tmp.admin.period <- tmp.admin.period[which(!duplicated(tmp.admin.period, by=key(tmp.admin.period)))]
setkeyv(tmp.long.data, c("GTID", "SUBJECT_CODE_PRIOR_1", "SCHOOL_YEAR_PRIOR_1", "GRADE_PRIOR_1",
"SCALE_SCORE_PRIOR_1", "VALID_CASE"))
tmp.long.data <- merge(tmp.long.data, tmp.admin.period, all.x=TRUE)
tmp.long.data[which(!is.na(GRADE_PRIOR_1) & is.na(ADMINISTRATION_PERIOD_PRIOR_1)),
ADMINISTRATION_PERIOD_PRIOR_1 := "2: SPRING"]
table(tmp.long.data$ADMINISTRATION_PERIOD_PRIOR_1, tmp.long.data$ADMINISTRATION_PERIOD, tmp.long.data$SCHOOL_YEAR_PRIOR_1)
tmp.long.data[, paste("ADMINISTRATION_PERIOD_PRIOR", 2:4, sep="_") := as.character(NA)]
tmp.long.data[which(!is.na(GRADE_PRIOR_2)), ADMINISTRATION_PERIOD_PRIOR_2 := "2: SPRING"]
tmp.long.data[which(!is.na(GRADE_PRIOR_3)), ADMINISTRATION_PERIOD_PRIOR_3 := "2: SPRING"]
tmp.long.data[which(!is.na(GRADE_PRIOR_4)), ADMINISTRATION_PERIOD_PRIOR_4 := "2: SPRING"]
Georgia_SGP_Data_LONG_2014_FORMATTED <- tmp.long.data[, variables.to.output, with=FALSE]
setkeyv(Georgia_SGP_Data_LONG_2014_FORMATTED, c("VALID_CASE", "SUBJECT_CODE", "SCHOOL_YEAR", "GTID", "YEAR_WITHIN"))
### Save results
save(Georgia_SGP_Data_LONG_2014_FORMATTED, file="Data/Georgia_SGP_Data_LONG_2014_FORMATTED.Rdata")
write.table(Georgia_SGP_Data_LONG_2014_FORMATTED, file="Data/Georgia_SGP_Data_LONG_2014_FORMATTED.txt",
sep="|", row.names=FALSE, na="", quote=FALSE)
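The prior-year columns above are all derived by splitting the norm group string and counting back from the current score. The following is a minimal base R sketch of that parsing on toy data. The norm group values and the helper `parse.prior()` are hypothetical and used only for illustration; they are not Georgia data and not part of the SGP package.

```r
# Toy norm-group strings (hypothetical values, not Georgia data).
# Elements are "YEAR/SUBJECT_GRADE", oldest first, current score last.
norm.group <- c(
  "2012/MATHEMATICS_6; 2013/MATHEMATICS_7; 2014/MATHEMATICS_8",
  "2013/COORDINATE_ALGEBRA_EOCT; 2014/ANALYTIC_GEOMETRY_EOCT",
  NA)

# parse.prior() is an illustrative helper, not an SGP package function.
parse.prior <- function(x, k) {
  # The k-th prior is the (k + 1)-th element counting back from the current score
  prior <- sapply(strsplit(as.character(x), "; "), function(el) rev(el)[k + 1])
  pieces <- strsplit(prior, "/")
  subject.grade <- strsplit(sapply(pieces, `[`, 2), "_")
  data.frame(
    YEAR = sapply(pieces, `[`, 1),
    # Everything before the last "_" is the subject; return NA when there is no prior
    SUBJECT = sapply(subject.grade, function(p)
      if (all(is.na(p))) NA_character_ else paste(head(p, -1), collapse = "_")),
    # The piece after the last "_" is the grade (or "EOCT")
    GRADE = sapply(subject.grade, tail, 1),
    stringsAsFactors = FALSE)
}

prior.1 <- parse.prior(norm.group, 1)
prior.2 <- parse.prior(norm.group, 2)
print(prior.1)
print(prior.2)
```

One difference from the script above is worth noting: `paste(character(0), collapse="_")` returns `""` rather than `NA`, so the sketch checks for a missing prior explicitly before pasting the subject back together.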