Data Analysis 2015

2015 SGP Analyses

The objective of the student growth percentile (SGP) analysis is to describe how (a)typical a student's growth is by examining his/her current achievement relative to students with a similar achievement history, i.e., his/her academic peers (see Section 2 of the GSGM FAQ). The estimation of this norm-referenced growth quantity is conducted using quantile regression to model curvilinear functional relationships between students' prior and current scores. One hundred such regression models are calculated for each separate analysis (defined as a unique year by content area by grade by prior order combination). The end product of these 100 separate regression models is a single coefficient matrix, which serves as a look-up table to relate prior student achievement to current achievement for each percentile. This process ultimately leads to tens of thousands of model calculations (and many more when SIMEX measurement error corrections are performed) during each of Georgia's annual batches of analyses.
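
The look-up step described above can be illustrated with a minimal sketch. It assumes that the 100 models are spaced at the midpoints of each percentile band and that reported values are bounded to 1-99; both are assumptions made for illustration, and the `predicted_cuts` vector is a made-up stand-in for the predictions obtained from a real coefficient matrix.

```r
# Hypothetical illustration: convert 100 predicted percentile cuts into an SGP.
# The tau grid below is an assumption about how the 100 models are spaced.
taus <- (1:100 - 0.5) / 100

# Stand-in for the current scores predicted at each tau for students who share
# one prior-score history. A normal curve is used purely for illustration.
predicted_cuts <- qnorm(taus, mean = 500, sd = 30)

# The growth percentile is the number of predicted cuts that the observed
# current score meets or exceeds.
observed_score <- 520
sgp <- findInterval(observed_score, predicted_cuts)

# Reported percentiles are assumed to be bounded to the 1-99 range.
sgp <- min(max(sgp, 1L), 99L)
print(sgp)
```

In this made-up case the observed score exceeds 75 of the 100 predicted cuts, so the student's growth would be described as being at the 75th percentile among academic peers.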

The 2015 Georgia SGP analyses follow a work flow established in previous years that includes the following five steps:

Update the Georgia assessment meta-data required for SGP calculations using the SGP package.
Create annual SGP configurations for analyses.
Conduct all EOG and EOC SGP analyses (concurrently).
Combine results into the master longitudinal data set, summarize/visualize results.
Format the initial 2015 data output to add additional variables before returning data to Georgia DOE.

1. Update the Georgia metadata in `SGPstateData`.

The use of higher-level functions included in the SGP package (e.g. analyzeSGP) requires the availability of state specific assessment information. This meta-data is compiled in an R object named SGPstateData that is housed in the SGP package. The required updates for the 2015 analyses included a) the addition of Milestones knots and boundaries, proficiency level cutscores and other configuration metadata, and b) updating the norm group preferences object. The entry for Georgia as it appeared in 2015 can be seen here on Github.

Calculation and addition of knots and boundaries

Cubic B-spline basis functions are used in the calculation of SGPs to more adequately model the heteroscedasticity and non-linearity found in assessment data. These functions require the selection of boundary and interior knots. Boundary knots are end points outside of the scale score distribution that anchor the B-spline basis. These are generally selected by extending the range of scale scores by 10%. That is, they are defined as lying 10% below the lowest obtainable scale score (LOSS) and 10% above the highest obtainable scale score (HOSS). The interior knots are the internal breakpoints that define the spline. The default choice in the SGP package is to select the 20th, 40th, 60th and 80th quantiles of the observed scale score distribution.
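
A minimal sketch of these defaults in base R follows. The scores are simulated, the observed minimum and maximum stand in for the LOSS and HOSS, and the 10% extension is taken relative to the score range; the SGP package's own routine may differ in its details.

```r
library(splines)  # ships with base R

set.seed(2015)
# Simulated scale scores standing in for several combined cohorts (assumption).
scale_scores <- round(rnorm(5000, mean = 500, sd = 40))

# Boundary knots: extend the score range by 10% on each side.
# Here the observed minimum and maximum stand in for the LOSS and HOSS.
boundaries <- grDevices::extendrange(scale_scores, f = 0.1)

# Interior knots: 20th, 40th, 60th and 80th percentiles of the observed scores.
knots <- quantile(scale_scores, probs = c(0.2, 0.4, 0.6, 0.8), names = FALSE)

# Cubic B-spline basis built on those knots and boundaries.
basis <- bs(scale_scores, knots = knots, Boundary.knots = boundaries, degree = 3)
dim(basis)  # one row per score, length(knots) + 3 basis columns
```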

In general the knots and boundaries are computed from a distribution comprised of several years of test data (i.e. multiple cohorts combined) so that any irregularities in a single year are smoothed out. Subsequent annual analyses use these same knots and boundaries as well. All defaults were used to compile the knots and boundaries for Georgia from the CRCT and EOCT tests in previous years, and were also used in 2015 to compute the Milestones knots and boundaries required for EOC block schedule and within-year repeater analyses. These knots and boundaries should be recalculated in 2016 when two years of data are available, which should provide better knot estimates for 2016 and subsequent years. The Milestones knots and boundaries values were calculated automatically as part of the updateSGP function when preliminary "equated" SGP analyses were run. This routine saves the computed knots and boundaries, and these values were then added to the SGPstateData object (committed to Github here) for use in the final 2015 analyses.

Proficiency level cutscores

Cutscores, which are set externally by the Georgia DOE through standard-setting processes, are mainly required for student growth projections. Because student growth projections assume consistency in assessment programs, the switch to the Georgia Milestones Assessments meant that projections in the 2015 analyses were calculated through the use of equating methods. However, these projections will not be used for official reporting purposes.

The performance (achievement) level metadata, such as labels and descriptions, were also updated to reflect the new Milestones standards.

Conditional standard errors of measurement (CSEMs)

The calculation of SIMEX adjusted SGPs and SGP standard errors requires the availability of each assessment's standard errors of measurement. In the CRCT analyses in previous years the CSEM data for all other content areas had been compiled in the SGPstateData file. These values are no longer scale score specific (i.e. there are some identical test scores with different CSEM values, whereas previously these were all identical) and so are now included in the student level data. Although the CSEM data were not added to the SGPstateData object, this change still required the identification of the name of the student level CSEM variable, "SCALE_SCORE_CSEM", in the metadata here.

Set the `sgp.cohort.size` SGP Configuration

Following preliminary analyses, five EOC course progressions were identified as having fewer than 1,500 students. Given the decreasing quality of model fit and the increasing difficulty of interpreting what SGP values from such a small norm group represent, the decision was made to institute a minimum N size of 1,500 for all analyses. This could be done either manually by removing the configuration scripts, or by adding an SGP_Configuration entry for it in the SGPstateData. The latter option was adopted, and therefore any future attempts to examine course progressions with fewer than 1,500 students will require a manual override of this (i.e. set to NULL).

SGPstateData[["GA"]][["SGP_Configuration"]][["sgp.cohort.size"]] <- 1500 # NULL to remove restriction
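
The effect of such a threshold, and of the NULL override, can be sketched with a mock list in place of the real SGPstateData object. The progression names and counts are hypothetical, and whether the package treats a cohort of exactly 1,500 as sufficient is an assumption here.

```r
# Mock stand-in for the relevant part of SGPstateData (not the real object).
state_data <- list(GA = list(SGP_Configuration = list(sgp.cohort.size = 1500)))

# Hypothetical norm group sizes for three course progressions.
cohort_n <- c(progression_A = 48210, progression_B = 1499, progression_C = 1500)

min_n <- state_data[["GA"]][["SGP_Configuration"]][["sgp.cohort.size"]]
# With no restriction (NULL), every progression is kept.
keep <- if (is.null(min_n)) rep(TRUE, length(cohort_n)) else cohort_n >= min_n
kept_progressions <- names(cohort_n)[keep]
kept_progressions

# Assigning NULL removes the list element, which lifts the restriction.
state_data[["GA"]][["SGP_Configuration"]][["sgp.cohort.size"]] <- NULL
is.null(state_data[["GA"]][["SGP_Configuration"]][["sgp.cohort.size"]])
```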

Norm group preferences

The process through which EOC and ELA analyses are run can produce multiple SGPs for some students. In order to identify which quantity will be used as the students' "official" SGP and subsequently merged into the master longitudinal data set, a system of norm group preferencing is established and is encoded into a lookup table and included in the SGPstateData. In general, the preference is given to:

Progressions with the greatest number of prior scale scores.
Progressions in which a student has repeated a course.
Progressions that do not include a skipped year (i.e. a gap in the scale score history).
Progressions for block-schedule course taking patterns.

The next section describes the process by which the individual course progression analyses are established and how the preferencing system is included within their configuration code. Here is the Github commit in which the final object was included in SGPstateData.

2. Create annual SGP configurations.

Unlike most EOG analyses, EOC analyses are specialized enough that it is necessary to specify the analyses to be performed via explicit configuration code. For several years, configurations have been employed to conduct EOC SGP analyses for Georgia, and beginning in 2015 configurations are used for EOG SGP analyses as well. In part, their use for EOG was required by the ELA analyses, because this assessment now includes both Reading and ELA content and its analysis uses separate CRCT Reading and ELA tests as priors. The configurations associated with the 2015 annual SGP analyses are located in the Georgia Github repository folder named SGP_CONFIG. This folder should also be available on existing Georgia DOE hard and/or shared drives.

The configurations are broken up into four separate R scripts: ELA.R, MATHEMATICS.R, SCIENCE.R, and SOCIAL_STUDIES.R.

Each configuration specifies a set of parameters that defines the norm group of students to be examined. Every potential norm group is defined by, at a minimum, the progressions of content area, academic year and grade-level. Each configuration used for the Georgia EOG analyses contains the first three elements. The EOC test analyses also contain the fourth through sixth elements:

sgp.content.areas: A progression of values that specifies the content areas to be looked at and their order.
sgp.panel.years: The progression of the years associated with the content area progression (sgp.content.areas) provided in the configuration, potentially allowing for skipped years, repeated years, etc.
sgp.grade.sequences: The grade progression associated with the content area and year progressions provided in the configuration. 'EOCT' stands for 'End Of Course Test'. The use of the generic 'EOCT' allows for secondary students to be compared based on the pattern of course taking rather than being dependent upon grade-level/class-designation.
sgp.projection.grade.sequences: This element is used to identify the grade sequence that will be used to produce straight and/or lagged student growth projections. It can, somewhat counter-intuitively, be left out or set to NULL, in which case projections will be produced and the package functions will populate the grade sequence to use based on the values provided in the sgp.grade.sequences element. Alternatively, when set to "NO_PROJECTIONS", no projections will be produced. For EOCT analyses, only configurations that correspond to the canonical course progressions can produce student growth projections. The canonical progressions are codified and stored in the SGP package/SGPstateData object here: SGPstateData[["GA"]][["SGP_Configuration"]][["content_area.projection.sequence"]].
sgp.norm.group.preference: Because a student can potentially be included in more than one analysis/configuration, multiple SGPs will be produced for some students and a system is required to identify the preferred SGP that will be matched with the student in the combineSGP step. This argument provides a ranking that specifies how preferable SGPs produced from the analysis in question are relative to other possible analyses. Lower numbers correspond with higher preference.
sgp.exclude.sequences: Identifies grade, subject, and year combinations that flag students who should be excluded from the norm-group cohort. Generally used in progressions in which a year or similar time period is skipped (i.e. a gap in time exists). For example, in a progression that goes from 8th grade Science to EOCT Biology with a skipped year in between, one may want to exclude students who fit into that progression but also repeated either 8th grade Science or EOCT Biology, or took EOCT Physical Science in the skipped year. Students with different course progressions may be inappropriate to include with the cohort of students who truly had no science related course in the intervening year.
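
The role of sgp.norm.group.preference can be sketched with a few made-up rows. The data frame and its column names are hypothetical and only illustrate the rule that the lowest-numbered result is retained for each student; the combineSGP step itself is not reproduced here.

```r
# Hypothetical SGP results: student "B" was included in two analyses.
results <- data.frame(
  ID         = c("A", "B", "B", "C"),
  SGP        = c(45L, 60L, 72L, 33L),
  PREFERENCE = c(5, 7, 4, 1),
  stringsAsFactors = FALSE
)

# Sort so that each student's most preferred (lowest-numbered) result comes
# first, then keep only the first row per student.
ordered_results <- results[order(results$ID, results$PREFERENCE), ]
official <- ordered_results[!duplicated(ordered_results$ID), ]
official
```

Student "B" keeps the SGP of 72 because its preference value (4) is lower than that of the competing result (7).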

Note that sgp.content.areas, sgp.panel.years, and sgp.grade.sequences elements are all character strings, and their values correspond to levels found in the CONTENT_AREA, YEAR, and GRADE variables in the Georgia_SGP@Data slot respectively.

As an example, here is the complete Coordinate Algebra configuration code used to define the 2015 SGP analyses:

### Coordinate Algebra

COORDINATE_ALGEBRA_2015.config <- list(
  COORDINATE_ALGEBRA.2015 = list( #14
    sgp.content.areas=c('MATHEMATICS', 'COORDINATE_ALGEBRA'),
    sgp.panel.years=c('2013', '2015'),
    sgp.grade.sequences=list(c(8, 'EOCT')),
    sgp.panel.years.within=c('LAST_OBSERVATION', 'FIRST_OBSERVATION'),
    sgp.exact.grade.progression=TRUE,
    sgp.exclude.sequences = data.table(VALID_CASE = 'VALID_CASE', 
      CONTENT_AREA=c('MATHEMATICS', 'COORDINATE_ALGEBRA'), 
      YEAR=c('2014', '2014'), GRADE=c(8, 'EOCT')),
    sgp.norm.group.preference=7),
  COORDINATE_ALGEBRA.2015 = list( #15
    sgp.content.areas=c('MATHEMATICS', 'MATHEMATICS', 'COORDINATE_ALGEBRA'),
    sgp.panel.years=c('2012', '2013', '2015'),
    sgp.grade.sequences=list(c(7, 8, 'EOCT')),
    sgp.panel.years.within=c('LAST_OBSERVATION', 'LAST_OBSERVATION', 'FIRST_OBSERVATION'),
    sgp.exact.grade.progression=TRUE,
    sgp.exclude.sequences = data.table(VALID_CASE = 'VALID_CASE',
      CONTENT_AREA=c('MATHEMATICS', 'COORDINATE_ALGEBRA'), 
      YEAR=c('2014', '2014'), GRADE=c(8, 'EOCT')),
    sgp.norm.group.preference=6),

  COORDINATE_ALGEBRA.2015 = list( #16
    sgp.content.areas=c('MATHEMATICS', 'COORDINATE_ALGEBRA'),
    sgp.panel.years=c('2014', '2015'),
    sgp.grade.sequences=list(c(8, 'EOCT')),
    sgp.panel.years.within=c('LAST_OBSERVATION', 'FIRST_OBSERVATION'),
    sgp.exact.grade.progression=TRUE,
    sgp.norm.group.preference=5),
  COORDINATE_ALGEBRA.2015 = list( #17
    sgp.content.areas=c('MATHEMATICS', 'MATHEMATICS', 'COORDINATE_ALGEBRA'),
    sgp.panel.years=c('2013', '2014', '2015'),
    sgp.grade.sequences=list(c(7, 8, 'EOCT')),
    sgp.panel.years.within=c('LAST_OBSERVATION', 'LAST_OBSERVATION', 'FIRST_OBSERVATION'),
    sgp.exact.grade.progression=TRUE,
    sgp.norm.group.preference=4),
  COORDINATE_ALGEBRA.2015 = list( #18
    sgp.content.areas=c('MATHEMATICS', 'COORDINATE_ALGEBRA'),
    sgp.panel.years=c('2014', '2015'),
    sgp.grade.sequences=list(c('7', 'EOCT')),
    sgp.panel.years.within=c('LAST_OBSERVATION', 'FIRST_OBSERVATION'),
    sgp.exact.grade.progression=TRUE,
    sgp.norm.group.preference=3),
  COORDINATE_ALGEBRA.2015 = list( #19
    sgp.content.areas=c('MATHEMATICS', 'MATHEMATICS', 'COORDINATE_ALGEBRA'),
    sgp.panel.years=c('2013', '2014', '2015'),
    sgp.grade.sequences=list(c('6', '7', 'EOCT')),
    sgp.panel.years.within=c('LAST_OBSERVATION', 'LAST_OBSERVATION', 'FIRST_OBSERVATION'),
    sgp.exact.grade.progression=TRUE,
    sgp.norm.group.preference=2),
  COORDINATE_ALGEBRA.2015 = list( #20 - Repeaters
    sgp.content.areas=c('COORDINATE_ALGEBRA', 'COORDINATE_ALGEBRA'),
    sgp.panel.years=c('2014', '2015'),
    sgp.grade.sequences=list(c('EOCT', 'EOCT')),
    sgp.panel.years.within=c('LAST_OBSERVATION', 'FIRST_OBSERVATION'),
    sgp.exact.grade.progression=TRUE,
    sgp.norm.group.preference=1),
  COORDINATE_ALGEBRA.2015 = list( #21 - Repeaters (Same Year)
    sgp.content.areas=c('COORDINATE_ALGEBRA', 'COORDINATE_ALGEBRA'),
    sgp.panel.years=c('2015', '2015'),
    sgp.grade.sequences=list(c('EOCT', 'EOCT')),
    sgp.panel.years.within=c('FIRST_OBSERVATION', 'LAST_OBSERVATION'),
    sgp.exact.grade.progression=TRUE,
    sgp.norm.group.preference=0)
) ### END COORDINATE_ALGEBRA_2015.config

Notice the first analysis in particular, which contains a skipped year and requires the sgp.exclude.sequences element. The sgp.exclude.sequences element provides a data table with the cases to exclude from the analysis cohort. Here we want to exclude any students who did not actually skip a math related course in 2014, which might include students who repeated either 8th grade Math or Coordinate Algebra. These students will be included in other course progressions/configurations and should be excluded from the calculation of the skipped year SGPs.

   VALID_CASE         CONTENT_AREA    YEAR     GRADE
1: VALID_CASE          MATHEMATICS    2014         8
2: VALID_CASE   COORDINATE_ALGEBRA    2014      EOCT

Create the norm group preferences lookup table

Configurations are R scripts that are used as part of the larger SGP analysis to be discussed later. In addition, the SGPstateData needs to be updated with the norm group preference data included within the configurations. To do this, an .Rdata object needs to be constructed that is embedded within SGPstateData (either manually or included in the package build itself). To create the object with the norm groups preferences utilize/source the R script configToSGPNormGroup.R in the SGP_CONFIG folder as follows:

source("configToSGPNormGroup.R")

This creates the .Rdata object (GA_SGP_Norm_Group_Preference.Rdata) containing the norm group preferences. This object is a data.frame/data.table containing information about what the rank ordering of the configurations is in terms of preference.

The GA_SGP_Norm_Group_Preference object should be submitted to the SGP package maintainers for inclusion in the package so that it is contained in SGPstateData when the package is loaded. The R script "configToSGPNormGroup.R" in the Github repository uses the code below to produce the object GA_SGP_Norm_Group_Preference.

##################################################################################
###                                                                            ###
###   Convert SGP analysis configurations to SGP_NORM_GROUP preference table   ###
###                                                                            ###
##################################################################################

### Load packages

require("data.table")
options(error=recover)

### utility functions

configToSGPNormGroup <- function(sgp.config) {
  if ("sgp.norm.group.preference" %in% names(sgp.config)) {
    tmp.data.all <- data.table()
    for (g in 1:length(sgp.config$sgp.grade.sequences)) {
      l <- length(sgp.config$sgp.grade.sequences[[g]])
      tmp.norm.group <- tmp.norm.group.baseline <- 
        paste(tail(sgp.config$sgp.panel.years, l),
        paste(tail(sgp.config$sgp.content.areas, l), 
          unlist(sgp.config$sgp.grade.sequences[[g]]), sep="_"), sep="/") 
            
      tmp.data <- data.table(
        SGP_NORM_GROUP=paste(tmp.norm.group, collapse="; "), 
        SGP_NORM_GROUP_BASELINE=paste(tmp.norm.group.baseline, collapse="; "),
        PREFERENCE= sgp.config$sgp.norm.group.preference*100)
            
      if (length(tmp.norm.group) > 2) {
        if ("sgp.exact.grade.progression" %in% names(sgp.config)) {
          if (sgp.config$sgp.exact.grade.progression) {
            tmp.all.prog <- FALSE 
          } else tmp.all.prog <- TRUE
        } else tmp.all.prog <- TRUE
        if (tmp.all.prog) {
          for (n in 1:(length(tmp.norm.group)-2)) {
            tmp.data <- rbind(tmp.data, data.table(
              SGP_NORM_GROUP=paste(tail(tmp.norm.group, -n), collapse="; "), 
              SGP_NORM_GROUP_BASELINE=paste(tmp.norm.group.baseline, collapse="; "),
              PREFERENCE= (sgp.config$sgp.norm.group.preference*100)+n))
          }
        }
      }
      tmp.data.all <- rbind(tmp.data.all, tmp.data)
    }
    return(unique(tmp.data.all))
  } else {
    return(NULL)
  }
}

configToSGPNormGroup_ORIG <- function(sgp.config) {
    tmp.norm.group <- tmp.norm.group.baseline <- 
      paste(sgp.config$sgp.panel.years, paste(sgp.config$sgp.content.areas, 
            unlist(sgp.config$sgp.grade.sequences), sep="_"), sep="/")
    return(data.table(
      SGP_NORM_GROUP=paste(tmp.norm.group, collapse="; "), 
      SGP_NORM_GROUP_BASELINE=paste(tmp.norm.group.baseline, collapse="; "),
      PREFERENCE=as.integer(sgp.config$sgp.norm.group.preference)))
}

### Load 2010 - 2015 EOCT Configurations
sapply(list.files("EOCT/2010/", full.names=TRUE), source, .GlobalEnv)
sapply(list.files("EOCT/2011/", full.names=TRUE), source, .GlobalEnv)
sapply(list.files("EOCT/2012/", full.names=TRUE), source, .GlobalEnv)
sapply(list.files("EOCT/2013/", full.names=TRUE), source, .GlobalEnv)
sapply(list.files("EOCT/2014/", full.names=TRUE), source, .GlobalEnv)
sapply(list.files("EOCT/2015/", full.names=TRUE), source, .GlobalEnv)

###  Compile annual configuration lists
GA_EOCT_2010.config <- c(
	GRADE_9_LIT_2010.config, AMERICAN_LIT_2010.config,
	US_HISTORY_2010.config, ECONOMICS_2010.config,
	BIOLOGY_2010.config, PHYSICAL_SCIENCE_2010.config,
	MATHEMATICS_I_2010.config)

GA_EOCT_2011.config <- c(
	GRADE_9_LIT_2011.config, AMERICAN_LIT_2011.config,
	US_HISTORY_2011.config, ECONOMICS_2011.config,
	BIOLOGY_2011.config, PHYSICAL_SCIENCE_2011.config,
	MATHEMATICS_I_2011.config, MATHEMATICS_II_2011.config)

GA_EOCT_2012.config <- c(
	GRADE_9_LIT_2012.config, AMERICAN_LIT_2012.config,
	US_HISTORY_2012.config, ECONOMICS_2012.config,
	BIOLOGY_2012.config, PHYSICAL_SCIENCE_2012.config,
	ALGEBRA_2012.config, GEOMETRY_2012.config, 
	MATHEMATICS_I_2012.config, MATHEMATICS_II_2012.config)

GA_EOCT_2013.config <- c(
	GRADE_9_LIT_2013.config, AMERICAN_LIT_2013.config,
	US_HISTORY_2013.config, ECONOMICS_2013.config,
	PHYSICAL_SCIENCE_2013.config, BIOLOGY_2013.config,
	COORDINATE_ALGEBRA_2013.config, GEOMETRY_2013.config, 
	MATHEMATICS_I_2013.config, MATHEMATICS_II_2013.config)

GA_EOCT_2014.config <- c(
	GRADE_9_LIT_2014.config, AMERICAN_LIT_2014.config,
	US_HISTORY_2014.config, ECONOMICS_2014.config,
	BIOLOGY_2014.config, PHYSICAL_SCIENCE_2014.config,
	ANALYTIC_GEOMETRY_2014.config, COORDINATE_ALGEBRA_2014.config, 
	MATHEMATICS_II_2014.config)

GA_EOCT_2015.config <- c(
	ELA_2015.config, GRADE_9_LIT_2015.config, AMERICAN_LIT_2015.config,
	US_HISTORY_2015.config, ECONOMICS_2015.config,
	BIOLOGY_2015.config, PHYSICAL_SCIENCE_2015.config,
	ANALYTIC_GEOMETRY_2015.config, COORDINATE_ALGEBRA_2015.config)

### Create configToNormGroup data.frame
tmp.configToNormGroup <- lapply(GA_EOCT_2010.config, configToSGPNormGroup_ORIG)
    GA_SGP_Norm_Group_Preference_2010 <- data.table(
					YEAR="2010", rbindlist(tmp.configToNormGroup))

tmp.configToNormGroup <- lapply(GA_EOCT_2011.config, configToSGPNormGroup_ORIG)
    GA_SGP_Norm_Group_Preference_2011 <- data.table(
					YEAR="2011", rbindlist(tmp.configToNormGroup))

tmp.configToNormGroup <- lapply(GA_EOCT_2012.config, configToSGPNormGroup_ORIG)
    GA_SGP_Norm_Group_Preference_2012 <- data.table(
					YEAR="2012", rbindlist(tmp.configToNormGroup))

tmp.configToNormGroup <- lapply(GA_EOCT_2013.config, configToSGPNormGroup_ORIG)
    GA_SGP_Norm_Group_Preference_2013 <- data.table(
					YEAR="2013", rbindlist(tmp.configToNormGroup))

tmp.configToNormGroup <- lapply(GA_EOCT_2014.config, configToSGPNormGroup_ORIG)
    GA_SGP_Norm_Group_Preference_2014 <- data.table(
					YEAR="2014", rbindlist(tmp.configToNormGroup))

tmp.configToNormGroup <- lapply(GA_EOCT_2015.config, configToSGPNormGroup)
    GA_SGP_Norm_Group_Preference_2015 <- data.table(
					YEAR="2015", rbindlist(tmp.configToNormGroup))

GA_SGP_Norm_Group_Preference <- rbind(
      GA_SGP_Norm_Group_Preference_2010, GA_SGP_Norm_Group_Preference_2011,
      GA_SGP_Norm_Group_Preference_2012, GA_SGP_Norm_Group_Preference_2013,
      GA_SGP_Norm_Group_Preference_2014, GA_SGP_Norm_Group_Preference_2015)

GA_SGP_Norm_Group_Preference$SGP_NORM_GROUP <- 
    as.factor(GA_SGP_Norm_Group_Preference$SGP_NORM_GROUP)
GA_SGP_Norm_Group_Preference$SGP_NORM_GROUP_BASELINE <- 
    as.factor(GA_SGP_Norm_Group_Preference$SGP_NORM_GROUP_BASELINE)

### Save result
setkey(GA_SGP_Norm_Group_Preference, YEAR, SGP_NORM_GROUP)
save(GA_SGP_Norm_Group_Preference, file="GA_SGP_Norm_Group_Preference.Rdata")
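
To make the SGP_NORM_GROUP label concrete, the paste logic used in the utility functions above can be applied to a single configuration. The `cfg` list is a reduced, illustrative copy of one Coordinate Algebra entry and is not part of the original script.

```r
# One configuration, reduced to the three elements used to build the label.
cfg <- list(
  sgp.content.areas   = c("MATHEMATICS", "MATHEMATICS", "COORDINATE_ALGEBRA"),
  sgp.panel.years     = c("2013", "2014", "2015"),
  sgp.grade.sequences = list(c(7, 8, "EOCT"))
)

# Join each year to its content area and grade, then collapse the progression.
pieces <- paste(cfg$sgp.panel.years,
                paste(cfg$sgp.content.areas,
                      unlist(cfg$sgp.grade.sequences), sep = "_"),
                sep = "/")
label <- paste(pieces, collapse = "; ")
label
```

The resulting string is "2013/MATHEMATICS_7; 2014/MATHEMATICS_8; 2015/COORDINATE_ALGEBRA_EOCT", which is the form stored in the SGP_NORM_GROUP column of the lookup table.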

3. Conduct SGP analyses

All cohort-referenced (uncorrected) and SIMEX corrected EOG and EOC SGPs were calculated concurrently. We use the updateSGP function to a) do the final preparation and addition of the new cleaned and formatted long data to the new SGP class object (prepareSGP step) and b) calculate SGP estimates (analyzeSGP step).

##################################################################################
###                                                                            ###
###                     Calculate SGPs for Georgia - 2015                      ###
###                                                                            ###
##################################################################################

### Load required packages
require(SGP)
require(data.table)

###  Load NEW Georgia SGP object and 2015 data
load("Data/Georgia_SGP.Rdata")
load("Data/Georgia_Data_LONG_2015.Rdata")

###  Read in 2015 SGP Configuration Scripts and Combine
source("SGP_CONFIG/EOCT/2015/ELA.R")
source("SGP_CONFIG/EOCT/2015/SCIENCE.R")
source("SGP_CONFIG/EOCT/2015/SOCIAL_STUDIES.R")
source("SGP_CONFIG/EOCT/2015/MATHEMATICS.R")

GA_2015.config <- c(
    ELA_2015.config, GRADE_9_LIT_2015.config, AMERICAN_LIT_2015.config,
    SOCIAL_STUDIES_2015.config, US_HISTORY_2015.config, ECONOMICS_2015.config,
    SCIENCE_2015.config, BIOLOGY_2015.config, PHYSICAL_SCIENCE_2015.config,
    MATHEMATICS_2015.config, COORDINATE_ALGEBRA_2015.config, 
    ANALYTIC_GEOMETRY_2015.config)

### updateSGP

Georgia_SGP <- updateSGP(
        what_sgp_object=Georgia_SGP,
        with_sgp_data_LONG=Georgia_Data_LONG_2015,
        sgp.config = GA_2015.config,
        steps=c("prepareSGP", "analyzeSGP", "combineSGP", "outputSGP"),
        sgp.percentiles = TRUE,
        sgp.projections = TRUE,
        sgp.projections.lagged = TRUE,
        sgp.percentiles.baseline=FALSE,
        sgp.projections.baseline = FALSE,
        sgp.projections.lagged.baseline = FALSE,
        sgp.percentiles.equated = TRUE,
        simulate.sgps = TRUE,
        calculate.simex = TRUE,
        goodness.of.fit.print=TRUE,
        save.intermediate.results=FALSE,
        outputSGP.output.type=c("LONG_Data", "LONG_FINAL_YEAR_Data"),
        # parallel.config = list( # Ubuntu/Linux - Adam VI
        #    BACKEND="PARALLEL", WORKERS=list(TAUS=22, SIMEX=20))) 
        parallel.config = list(
           BACKEND="FOREACH", 
           TYPE="doParallel", 
           WORKERS=list(TAUS=11, SIMEX=11))) # WINDOWS - Qi Qin

4. Merge 2015 results into the `@Data` slot, and output data

Once all analyses were completed the results were merged into the master longitudinal data set (combineSGP step). A pipe delimited version of the complete long data is output (outputSGP step). This file requires additional formatting to add fields needed for rendering data in the visualization tool such as students' entire prior score and course progression history. The formatting process is described in detail below.
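
The prior-history fields are recovered from the SGP_NORM_GROUP label attached to each SGP. A minimal sketch of that parsing for a single, illustrative label follows; the variable names are hypothetical and the full script handles up to four priors across all records.

```r
norm_group <- "2013/MATHEMATICS_7; 2014/MATHEMATICS_8; 2015/COORDINATE_ALGEBRA_EOCT"

# Split the progression into its year/course elements, most recent first.
elements <- rev(strsplit(norm_group, "; ")[[1]])

# The first element is the current course; the second is the most recent prior.
prior_1 <- strsplit(elements[2], "/")[[1]]
year_prior_1 <- prior_1[1]

# The text after the last underscore is the grade; the rest is the subject.
parts <- strsplit(prior_1[2], "_")[[1]]
subject_prior_1 <- paste(head(parts, -1), collapse = "_")
grade_prior_1 <- tail(parts, 1)

c(year_prior_1, subject_prior_1, grade_prior_1)
```

Taking the grade from the last underscore-separated piece matters because some subject codes, such as COORDINATE_ALGEBRA, contain underscores themselves.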

Summarize and visualize data

The data is then summarized using the summarizeSGP function, which produces many tables of descriptive statistics that are disaggregated at the state, district, school and instructor (if/when available) levels. These basic summary tables are also further disaggregated by the demographic groups available in the data set and listed in Georgia's Variable Name Lookup table. Finally, visualizations (such as bubble charts) are produced from the data and summary tables using the visualizeSGP function.
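
As a rough picture of what one such summary table contains, the following base R sketch computes the median SGP and student count by school for a few made-up records. It is not the summarizeSGP implementation, and the school numbers and values are hypothetical.

```r
# Hypothetical student-level results for two schools.
sgp_data <- data.frame(
  SCHOOL_NUMBER = c("0101", "0101", "0101", "0202", "0202"),
  SGP           = c(35, 50, 80, 20, 64),
  stringsAsFactors = FALSE
)

# Median SGP and number of students for each school.
median_sgp <- tapply(sgp_data$SGP, sgp_data$SCHOOL_NUMBER, median)
n_students <- tapply(sgp_data$SGP, sgp_data$SCHOOL_NUMBER, length)

school_summary <- data.frame(
  SCHOOL_NUMBER = names(median_sgp),
  MEDIAN_SGP    = as.numeric(median_sgp),
  N             = as.integer(n_students)
)
school_summary
```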

###  Summarize Results

# Fill in ACHIEVEMENT_LEVEL_PRIOR for ELA First
Georgia_SGP@Data[which(CONTENT_AREA=="ELA" & YEAR=="2015" & VALID_CASE=="VALID_CASE"), 
  ACHIEVEMENT_LEVEL_PRIOR :=
    # findInterval returns 0, 1 or 2 for scores below 800, 800-849, and 850+
    ordered(findInterval(as.numeric(SCALE_SCORE_PRIOR), c(800, 850)), levels = 0:2,
    labels=c("Does Not Meet Expectations", "Meets Expectations", 
             "Exceeds Expectations"))]

### Create Data_Supplementary $ INSTRUCTOR_NUMBER slot in the new SGP object
load("Data/Base_Files/GA_INSTRUCTOR_2015.Rdata")
Georgia_SGP@Data_Supplementary[["INSTRUCTOR_NUMBER"]] <- GA_INSTRUCTOR_2015

###  Calculate summary tables
Georgia_SGP <- summarizeSGP(
    sgp_object = Georgia_SGP,
    parallel.config=list(
      BACKEND="FOREACH", TYPE="doParallel", SNOW_TEST=TRUE, WORKERS=list(SUMMARY=5))
)

###  Visualize Results

##   Need to add School and District names for bubble plots
Georgia_SGP@Data$SCHOOL_NAME <- as.character(NA); gc()
Georgia_SGP@Data$DISTRICT_NAME <- as.character(NA); gc()

visualizeSGP(
    sgp_object=Georgia_SGP,
    plot.types = "bubblePlot",
    bPlot.years= "2015",
    bPlot.content_areas=c("ELA", "SOCIAL_STUDIES", "SCIENCE", "MATHEMATICS"),
    bPlot.anonymize=TRUE,
    gaPlot.content_areas = c("SOCIAL_STUDIES", "SCIENCE", "MATHEMATICS"),
    parallel.config=list(
      BACKEND="FOREACH", TYPE="doParallel", WORKERS=list(GA_PLOTS=12)))

Merge Supplementary Instructor Data

Note that the INSTRUCTOR_NUMBER data set referenced in the above code was not available during the initial data cleaning and preparation in 2015, and therefore was not included in the prepareSGP (or updateSGP) step. It was added to the Georgia_SGP object before summarizeSGP was run in order to produce teacher level mean/median SGP summary statistic tables. The data cleaning of this data set was addressed in the 2015 Data Preparation page.

5. Format the exported data

A pipe delimited version of the complete long data is exported in the outputSGP step of the updateSGP function call above. A formatted version of these results is produced externally, which contains fields needed for rendering data in the state's visualization tool, such as students' entire prior score and course progression history. The code used in 2015 is presented here:

##################################################################################
###                                                                            ###
###     Script to produce formatted output for Georgia from 2015 long data     ###
###                                                                            ###
##################################################################################

### Load packages

require(SGP)
require(data.table)

### Load data

load("Data/Georgia_SGP.Rdata")
load("Data/Georgia_SGP_LONG_Data_2015.Rdata")

### Variables to output
variables.to.output <- c("VALID_CASE", "GTID", "SCHOOL_YEAR", "SUBJECT_CODE", 
  "YEAR_WITHIN", "GRADE", "GRADE_REPORTED", "SCALE_SCORE", 
  "SCALE_SCORE_PRIOR_STANDARDIZED","ADMINISTRATION_PERIOD","FIRST_OBSERVATION", 
  "LAST_OBSERVATION", "PERFORMANCE_LEVEL", "SR_SYSTEM_ID", "SCHOOL_NUMBER", 
  "ADMIN_INVALIDATION","ADMIN_TYPE","MATCH_STATUS","RACE_CODE","GENDER_CODE",
  "ED", "SWD", "LEP", "GIFT", "BIRTH_DATE", "LAST_NAME", "FIRST_NAME", 
  "MIDDLE_NAME", "SGP_NORM_GROUP", "SGP", "SGP_SIMEX", "SGP_LEVEL", 
  "SGP_STANDARD_ERROR","SGP_NORM_GROUP_SCALE_SCORES","SCHOOL_YEAR_PRIOR_1", 
  "SUBJECT_CODE_PRIOR_1","SCALE_SCORE_PRIOR_1","PERFORMANCE_LEVEL_PRIOR_1", 
  "GRADE_PRIOR_1","ADMINISTRATION_PERIOD_PRIOR_1","ASSESSMENT_TYPE_PRIOR_1",
  "SCHOOL_YEAR_PRIOR_2", "SUBJECT_CODE_PRIOR_2", "SCALE_SCORE_PRIOR_2", 
  "PERFORMANCE_LEVEL_PRIOR_2","GRADE_PRIOR_2","ADMINISTRATION_PERIOD_PRIOR_2", 
  "ASSESSMENT_TYPE_PRIOR_2", "SCHOOL_YEAR_PRIOR_3", "SUBJECT_CODE_PRIOR_3", 
  "SCALE_SCORE_PRIOR_3", "PERFORMANCE_LEVEL_PRIOR_3", "GRADE_PRIOR_3", 
  "ADMINISTRATION_PERIOD_PRIOR_3", "ASSESSMENT_TYPE_PRIOR_3", 
  "SCHOOL_YEAR_PRIOR_4", "SUBJECT_CODE_PRIOR_4", "SCALE_SCORE_PRIOR_4", 
  "PERFORMANCE_LEVEL_PRIOR_4", "GRADE_PRIOR_4", 
  "ADMINISTRATION_PERIOD_PRIOR_4", "ASSESSMENT_TYPE_PRIOR_4")

### Subset out relevant variables

tmp.long.data <- subset(Georgia_SGP_LONG_Data_2015, 
    select=intersect(variables.to.output, names(Georgia_SGP_LONG_Data_2015)))

### Split SGP_NORM_GROUP
my.tmp.split <- strsplit(as.character(tmp.long.data$SGP_NORM_GROUP), "; ")


### YEAR Prior
tmp.long.data$SCHOOL_YEAR_PRIOR_1 <- 
    sapply(strsplit(sapply(my.tmp.split, function(x) rev(x)[2]), "/"), '[', 1)
tmp.long.data$SCHOOL_YEAR_PRIOR_2 <- 
    sapply(strsplit(sapply(my.tmp.split, function(x) rev(x)[3]), "/"), '[', 1)
tmp.long.data$SCHOOL_YEAR_PRIOR_3 <- 
    sapply(strsplit(sapply(my.tmp.split, function(x) rev(x)[4]), "/"), '[', 1)
tmp.long.data$SCHOOL_YEAR_PRIOR_4 <- 
    sapply(strsplit(sapply(my.tmp.split, function(x) rev(x)[5]), "/"), '[', 1)

### SUBJECT_CODE Prior
tmp.long.data$SUBJECT_CODE_PRIOR_1 <- 
  sapply(sapply(strsplit(sapply(strsplit(sapply(my.tmp.split, 
  function(x) rev(x)[2]), "/"), '[', 2), "_"), head, -1), paste, collapse="_")
tmp.long.data$SUBJECT_CODE_PRIOR_2 <- 
  sapply(sapply(strsplit(sapply(strsplit(sapply(my.tmp.split, 
  function(x) rev(x)[3]), "/"), '[', 2), "_"), head, -1), paste, collapse="_")
tmp.long.data$SUBJECT_CODE_PRIOR_3 <- 
  sapply(sapply(strsplit(sapply(strsplit(sapply(my.tmp.split, 
  function(x) rev(x)[4]), "/"), '[', 2), "_"), head, -1), paste, collapse="_")
tmp.long.data$SUBJECT_CODE_PRIOR_4 <- 
  sapply(sapply(strsplit(sapply(strsplit(sapply(my.tmp.split, 
  function(x) rev(x)[5]), "/"), '[', 2), "_"), head, -1), paste, collapse="_")

### GRADE Prior
tmp.long.data$GRADE_PRIOR_1 <- 
  sapply(strsplit(sapply(strsplit(sapply(my.tmp.split, 
    function(x) rev(x)[2]), "/"), '[', 2), "_"), tail, 1)
tmp.long.data$GRADE_PRIOR_2 <- 
  sapply(strsplit(sapply(strsplit(sapply(my.tmp.split, 
    function(x) rev(x)[3]), "/"), '[', 2), "_"), tail, 1)
tmp.long.data$GRADE_PRIOR_3 <- 
  sapply(strsplit(sapply(strsplit(sapply(my.tmp.split, 
    function(x) rev(x)[4]), "/"), '[', 2), "_"), tail, 1)
tmp.long.data$GRADE_PRIOR_4 <- 
  sapply(strsplit(sapply(strsplit(sapply(my.tmp.split,
    function(x) rev(x)[5]), "/"), '[', 2), "_"), tail, 1)

### SCALE_SCORE Prior
my.tmp.split.scale_score <- 
    strsplit(as.character(tmp.long.data$SGP_NORM_GROUP_SCALE_SCORES), "; ")

tmp.long.data$SCALE_SCORE_PRIOR_1 <- 
  sapply(my.tmp.split.scale_score, function(x) rev(x)[2])
tmp.long.data$SCALE_SCORE_PRIOR_2 <- 
  sapply(my.tmp.split.scale_score, function(x) rev(x)[3])
tmp.long.data$SCALE_SCORE_PRIOR_3 <- 
  sapply(my.tmp.split.scale_score, function(x) rev(x)[4])
tmp.long.data$SCALE_SCORE_PRIOR_4 <- 
  sapply(my.tmp.split.scale_score, function(x) rev(x)[5])


###  PERFORMANCE_LEVEL Prior

## Set name of GRADE to GRADE_CURRENT for all 4 prior var creations
setnames(tmp.long.data, "GRADE", "GRADE_CURRENT")

## Create the 1st Prior PERFORMANCE_LEVEL info for Milestones & CRCT
##  2nd - 4th Priors will all be CRCT EOGT only in 2015

setnames(tmp.long.data, 
  c("SCHOOL_YEAR_PRIOR_1", "SUBJECT_CODE_PRIOR_1", "GRADE_PRIOR_1"),
  c("YEAR", "CONTENT_AREA", "GRADE"))

tmp.long.data <- SGP:::getAchievementLevel(tmp.long.data, state="GA", 
  achievement.level.name="PERFORMANCE_LEVEL_PRIOR_1", 
  scale.score.name="SCALE_SCORE_PRIOR_1")

setnames(tmp.long.data, 
  c("YEAR", "CONTENT_AREA", "GRADE"), 
  c("SCHOOL_YEAR_PRIOR_1", "SUBJECT_CODE_PRIOR_1", "GRADE_PRIOR_1"))


## Create the 2nd Prior PERFORMANCE_LEVEL info -- CRCT EOGT only in 2015
setnames(tmp.long.data, 
  c("SCHOOL_YEAR_PRIOR_2", "SUBJECT_CODE_PRIOR_2", "GRADE_PRIOR_2"), 
  c("YEAR", "CONTENT_AREA", "GRADE"))

tmp.long.data <- SGP:::getAchievementLevel(tmp.long.data, state="GA", 
  achievement.level.name="PERFORMANCE_LEVEL_PRIOR_2", 
  scale.score.name="SCALE_SCORE_PRIOR_2")

setnames(tmp.long.data, 
  c("YEAR", "CONTENT_AREA", "GRADE"), 
  c("SCHOOL_YEAR_PRIOR_2", "SUBJECT_CODE_PRIOR_2", "GRADE_PRIOR_2"))

## Create the 3rd Prior PERFORMANCE_LEVEL info -- CRCT EOGT only in 2015
setnames(tmp.long.data, 
  c("SCHOOL_YEAR_PRIOR_3", "SUBJECT_CODE_PRIOR_3", "GRADE_PRIOR_3"), 
  c("YEAR", "CONTENT_AREA", "GRADE"))
  
tmp.long.data <- SGP:::getAchievementLevel(tmp.long.data, state="GA", 
  achievement.level.name="PERFORMANCE_LEVEL_PRIOR_3", 
  scale.score.name="SCALE_SCORE_PRIOR_3")
  
setnames(tmp.long.data, 
  c("YEAR", "CONTENT_AREA", "GRADE"), 
  c("SCHOOL_YEAR_PRIOR_3", "SUBJECT_CODE_PRIOR_3", "GRADE_PRIOR_3"))

## Create the 4th Prior PERFORMANCE_LEVEL info -- CRCT EOGT only in 2015
setnames(tmp.long.data, 
  c("SCHOOL_YEAR_PRIOR_4", "SUBJECT_CODE_PRIOR_4", "GRADE_PRIOR_4"), 
  c("YEAR", "CONTENT_AREA", "GRADE"))
tmp.long.data <- SGP:::getAchievementLevel(tmp.long.data, state="GA", 
  achievement.level.name="PERFORMANCE_LEVEL_PRIOR_4", 
  scale.score.name="SCALE_SCORE_PRIOR_4")
  
setnames(tmp.long.data, 
  c("YEAR", "CONTENT_AREA", "GRADE"), 
  c("SCHOOL_YEAR_PRIOR_4", "SUBJECT_CODE_PRIOR_4", "GRADE_PRIOR_4"))

## Set name of GRADE_CURRENT back to GRADE
setnames(tmp.long.data, "GRADE_CURRENT", "GRADE")

###  ADMINISTRATION_PERIOD_PRIOR_*

tmp.admin.period <- Georgia_SGP@Data[, 
  c(key(Georgia_SGP@Data)[1:4], 
  "GRADE", "ADMINISTRATION_PERIOD", "SCALE_SCORE"), with=FALSE][
    VALID_CASE=="VALID_CASE" & GRADE=="EOCT"]

setnames(tmp.admin.period, 
  c("CONTENT_AREA", "YEAR", "GRADE", "ID", 
    "ADMINISTRATION_PERIOD", "SCALE_SCORE"), 
  c("SUBJECT_CODE_PRIOR_1", "SCHOOL_YEAR_PRIOR_1", "GRADE_PRIOR_1", "GTID", 
    "ADMINISTRATION_PERIOD_PRIOR_1", "SCALE_SCORE_PRIOR_1"))

tmp.admin.period[, SCALE_SCORE_PRIOR_1 := as.character(SCALE_SCORE_PRIOR_1)]

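##  Remove the "LAST_OBSERVATION" for within-year repeaters that had the 
##  exact same score in both Admin Periods.  The first setkeyv sorts by 
##  ADMINISTRATION_PERIOD within each score so the first period is retained; 
##  the second sets the key that defines a duplicate.
setkeyv(tmp.admin.period, 
  c("GTID", "SUBJECT_CODE_PRIOR_1", "SCHOOL_YEAR_PRIOR_1", "GRADE_PRIOR_1", 
    "SCALE_SCORE_PRIOR_1", "ADMINISTRATION_PERIOD_PRIOR_1", "VALID_CASE"))
setkeyv(tmp.admin.period, 
  c("GTID", "SUBJECT_CODE_PRIOR_1", "SCHOOL_YEAR_PRIOR_1", "GRADE_PRIOR_1", 
    "SCALE_SCORE_PRIOR_1", "VALID_CASE"))
##  Pass `by` explicitly: data.table >= 1.9.8 compares all columns by default
tmp.admin.period <- 
  tmp.admin.period[!duplicated(tmp.admin.period, by = key(tmp.admin.period))]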

setkeyv(tmp.long.data, 
  c("GTID", "SUBJECT_CODE_PRIOR_1", "SCHOOL_YEAR_PRIOR_1", "GRADE_PRIOR_1", 
    "SCALE_SCORE_PRIOR_1", "VALID_CASE"))

tmp.long.data <- merge(tmp.long.data, tmp.admin.period, all.x=TRUE)

tmp.long.data[which(!is.na(GRADE_PRIOR_1) & is.na(ADMINISTRATION_PERIOD_PRIOR_1)),
    ADMINISTRATION_PERIOD_PRIOR_1 := "2: SPRING"]

tmp.long.data[, 
    paste("ADMINISTRATION_PERIOD_PRIOR", 2:4, sep="_") := as.character(NA)]

tmp.long.data[which(!is.na(GRADE_PRIOR_2)), 
    ADMINISTRATION_PERIOD_PRIOR_2 := "2: SPRING"]
tmp.long.data[which(!is.na(GRADE_PRIOR_3)), 
    ADMINISTRATION_PERIOD_PRIOR_3 := "2: SPRING"]
tmp.long.data[which(!is.na(GRADE_PRIOR_4)), 
    ADMINISTRATION_PERIOD_PRIOR_4 := "2: SPRING"]


###  ASSESSMENT_TYPE_PRIOR_*
tmp.long.data[, paste("ASSESSMENT_TYPE_PRIOR", 1:4, sep="_") := as.character(NA)]

##   Order matters for the 1st prior: each assignment overrides the one before it
tmp.long.data[which(!is.na(SCALE_SCORE_PRIOR_1)), ASSESSMENT_TYPE_PRIOR_1 := "CRCT"]
tmp.long.data[which(GRADE_PRIOR_1 == "EOCT"), ASSESSMENT_TYPE_PRIOR_1 := "EOCT"]
tmp.long.data[which(SCHOOL_YEAR_PRIOR_1 == "2015"), ASSESSMENT_TYPE_PRIOR_1 := "EOC"]

##   CRCT EOGT only in 2015 for priors 2 - 4
tmp.long.data[which(!is.na(SCALE_SCORE_PRIOR_2)), ASSESSMENT_TYPE_PRIOR_2 := "CRCT"]
tmp.long.data[which(!is.na(SCALE_SCORE_PRIOR_3)), ASSESSMENT_TYPE_PRIOR_3 := "CRCT"]
tmp.long.data[which(!is.na(SCALE_SCORE_PRIOR_4)), ASSESSMENT_TYPE_PRIOR_4 := "CRCT"]


###  Final arrangement of variables
Georgia_SGP_Data_LONG_2015_FORMATTED <- 
    tmp.long.data[, variables.to.output, with=FALSE]
setkeyv(Georgia_SGP_Data_LONG_2015_FORMATTED, 
    c("VALID_CASE", "SUBJECT_CODE", "SCHOOL_YEAR", "GTID", "YEAR_WITHIN"))


### Save results

save(Georgia_SGP_Data_LONG_2015_FORMATTED, 
    file = "Data/Georgia_SGP_Data_LONG_2015_FORMATTED.Rdata")
write.table(Georgia_SGP_Data_LONG_2015_FORMATTED, 
    file = "Data/Georgia_SGP_Data_LONG_2015_FORMATTED.txt", 
    sep = "|", row.names=FALSE, na = "", quote = FALSE)
zip(zipfile = "Data/Georgia_SGP_Data_LONG_2015_FORMATTED.txt.zip", 
    files = "Data/Georgia_SGP_Data_LONG_2015_FORMATTED.txt")
unlink("Data/Georgia_SGP_Data_LONG_2015_FORMATTED.txt")
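
The prior-year, subject and grade columns above are all recovered by splitting the `SGP_NORM_GROUP` string. The parsing logic can be sketched as a single helper on toy data. This is only an illustration: `parse_prior` is a hypothetical function (it is not part of the SGP package or of the script above), and the norm group strings below are made up to match the `YEAR/SUBJECT_GRADE; ...` layout that the script assumes.

```r
## Hypothetical helper: extract the k-th most recent prior from norm group strings
parse_prior <- function(norm.group, k) {
  pieces <- strsplit(as.character(norm.group), "; ", fixed = TRUE)
  ## rev(x)[1] is the current score, so the k-th prior is rev(x)[k + 1]
  prior <- vapply(pieces, function(x) rev(x)[k + 1], character(1))
  year.rest <- strsplit(prior, "/", fixed = TRUE)
  year <- vapply(year.rest, function(x) x[1], character(1))
  subj.grade <- strsplit(
    vapply(year.rest, function(x) x[2], character(1)), "_", fixed = TRUE)
  ## Subject is everything before the last "_"; grade is the last piece
  subject <- vapply(subj.grade, function(x) {
    if (all(is.na(x))) NA_character_ else paste(head(x, -1), collapse = "_")
  }, character(1))
  grade <- vapply(subj.grade, function(x) tail(x, 1), character(1))
  data.frame(year = year, subject = subject, grade = grade,
             stringsAsFactors = FALSE)
}

## Made-up norm groups: one with two priors, one with a single EOCT prior
toy.norm.groups <- c(
  "2013/MATHEMATICS_6; 2014/MATHEMATICS_7; 2015/MATHEMATICS_8",
  "2014/COORDINATE_ALGEBRA_EOCT; 2015/ANALYTIC_GEOMETRY_EOCT")

prior.1 <- parse_prior(toy.norm.groups, 1)
prior.2 <- parse_prior(toy.norm.groups, 2)
```

One difference from the script is worth noting: when a student has no k-th prior, this helper returns `NA` for the subject, whereas the nested `paste(..., collapse="_")` call used above yields an empty string in that case.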

Additional Steps for Future Analyses

We conclude the 2015 analysis section with a list of tasks and steps to keep in mind for future analyses:

Update the SGPstateData entry annually. Contact the package maintainers (Damian and Adam) directly or file an issue. The entry will need updates to the following data:
- Knots, boundary knots, and LOSS/HOSS values for the Milestones assessments in 2016 (replacing the temporary ones from 2015)
- Proficiency level cutscores
Continue including student-level CSEMs in the longitudinal data.
Create baseline matrices. The number of prior scores required must be established first (2017 at the earliest).
- Unadjusted (non-SIMEX) versions are created first
- SIMEX-adjusted versions (these require the unadjusted matrices)
Make miscellaneous configuration and analysis code adjustments based on changes in course progressions and the incorporation of the new assessment program.

