
Clone of OpenSAFELY project: Impact of vaccination on the association of COVID-19 with arterial and venous thrombotic diseases: an OpenSAFELY cohort study using linked electronic health records

opensafely/post-covid-cvd-methods


  • Detailed protocols are in the protocol folder.

  • If you are interested in how we defined our code lists, look in the codelists folder.

  • Analyses scripts are in the analysis directory:

    • Dataset definition scripts are in the dataset_definition directory:

      • If you are interested in how we defined our variables, we use the variable script variable_helper_fuctions to define functions that generate variables. We then apply these functions in variables_cohorts to create a dictionary of variables for cohort definitions, and in variables_dates to create a dictionary of variables for calculating study start dates and end dates.
      • If you are interested in how we defined study dates (e.g., index and end dates), these vary by cohort and are described in the protocol. We use the script dataset_definition_dates to generate a dataset with all required dates for each cohort. This script imports all variables generated from variables_dates.
      • If you are interested in how we defined our cohorts, we use the dataset definition script dataset_definition_cohorts to define a function that generates cohorts. This script imports all variables generated from variables_cohorts using the patient's index date, the cohort start date and the cohort end date. This approach is used to generate three cohorts: pre-vaccination, vaccinated, and unvaccinated, found in dataset_definition_prevax, dataset_definition_vax, and dataset_definition_unvax, respectively. For each cohort, the extracted data is initially processed in the preprocess data script, which generates a flag variable for pre-existing respiratory conditions and restricts the data to the relevant variables.
    • Dataset cleaning scripts are in the dataset_clean directory:

      • This directory also contains all the R scripts that process, describe, and analyse the extracted data.
      • dataset_clean is the core script, which executes all the other scripts in this folder.
      • fn-preprocess is the function that carries out initial preprocessing, formatting columns correctly.
      • fn-modify_dummy is called from within fn-preprocess.R and alters the proportions of dummy variables to better suit the analyses.
      • fn-inex is the inclusion/exclusion function.
      • fn-qa is the quality assurance function.
      • fn-ref is the function that sets the reference levels for factors.
    • Modelling scripts are in the model directory:

      • make_model_input.R works with the output of dataset_clean to prepare suitable data subsets for Cox analysis, combining each outcome and subgroup into one formatted .rds file.
      • fn-prepare_model_input.R is a companion function to make_model_input.R which handles the interaction with active_analyses.rds.
      • cox-ipw is a reusable action which uses the output of make_model_input.R to fit a Cox model to the data.
      • make_model_output.R combines all the Cox results in one formatted .csv file.
    • The script for generating a random 10% sample of the study population is in the generate_subsample directory:

      • generate_subsample.R generates the subsample itself. The subsample is randomly sampled but, for the sake of reproducibility, the seed is set in the program.
    • The script for conducting variable selection using a LASSO (least absolute shrinkage and selection operator) model is in the lasso_var_selection directory:

      • lasso_var_selection.R fits a Cox regression model (family = "cox") using the subsample data (the 10% subsample generated by generate_subsample.R) and applies a LASSO penalty (alpha = 1). The regularisation parameter lambda is tuned using cross-validation (cv.glmnet) to minimise cvm (the mean cross-validated error). The result is a subset of selected variables whose corresponding coefficients are not shrunk to zero. For further information, please see the documentation for the glmnet and cv.glmnet functions.
    • The script for conducting variable selection using a LASSO-X (least absolute shrinkage and selection operator for the exposure) model, which takes the exposure (COVID-19) as the response variable, is in the lasso_X_var_selection directory:

      • lasso_X_var_selection.R fits a logistic regression (family = "binomial") using binary exposure (X) to COVID-19 as the response variable and excluding the outcomes (Y, acute MI and subarachnoid haemorrhage / haemorrhagic stroke) from the dataset. The model is fit using the subsample data (the 10% subsample generated by generate_subsample.R) with a LASSO penalty (alpha = 1). The regularisation parameter lambda is tuned using cross-validation (cv.glmnet) to minimise cvm (the mean cross-validated error). The result is a subset of selected variables whose corresponding coefficients are not shrunk to zero. For further information, please see the documentation for the glmnet and cv.glmnet functions.
    • The script for conducting variable selection using a Union LASSO (least absolute shrinkage and selection operator) model is in the lasso_union_var_selection directory:

    • The script which implements the Hartwig et al., 2024 empirical unconfoundedness test is in the unconfoundedness_test directory:

      • unconfoundedness_test.R performs the empirical unconfoundedness test in the following manner:
        • A Cox regression model taking the outcomes (Y) as the response is fit in the same manner as in lasso_var_selection.R.
        • A logistic regression model taking the exposure (X) as the response is fit in the same manner as in lasso_X_var_selection.R.
        • These two regression models are used to evaluate associations of each confounder (Z) with the exposure (X) and outcome (Y) in the following manner:
          • For every candidate confounder Z, condition (i) is checked (is Z associated with (i.e., not independent of) X given all other covariates?)
          • For every candidate confounder Z, condition (ii) is checked (are Z and Y conditionally independent given X and all other covariates?)
          • If any covariate Z satisfies both (i) and (ii), then the covariate set is sufficient for confounding adjustment. If not, then the test is inconclusive.
        • Test conditions, coefficient values, p-values, and standard errors are saved for each confounder Z.
  • The active_analyses file contains a list of active analyses.

  • The project.yaml defines the run order and dependencies for all the analysis scripts. This file should not be edited directly. To make changes to the yaml, edit and run the create_project_actions.R script, which generates all the actions.

  • Descriptive and model outputs, including figures and tables, are in the released_outputs directory.
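As an illustration of the cross-validated LASSO selection step described above, the following is a minimal Python sketch of the same idea using scikit-learn. The project itself implements this in R with glmnet/cv.glmnet, and all data, coefficients, and variable names below are synthetic assumptions for demonstration only:

```python
# Illustrative sketch of the LASSO-X step: fit an L1-penalised logistic
# regression with the binary exposure as the response, tune the penalty
# strength by cross-validation, and keep the covariates whose
# coefficients are not shrunk to zero.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(42)  # fixed seed, mirroring generate_subsample.R
n, p = 500, 10
Z = rng.normal(size=(n, p))                    # candidate confounders
logit = 1.5 * Z[:, 0] - 1.0 * Z[:, 1]          # only Z0 and Z1 drive exposure
x = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # binary exposure

# L1 penalty (alpha = 1 in glmnet terms); cross-validation chooses the
# penalty strength, analogous to cv.glmnet minimising the mean
# cross-validated error.
model = LogisticRegressionCV(
    penalty="l1", solver="saga", Cs=10, cv=5,
    max_iter=5000, random_state=0,
).fit(Z, x)

selected = np.flatnonzero(model.coef_.ravel() != 0)
print("selected covariate indices:", selected)
```

Note that glmnet's lambda corresponds to the inverse of scikit-learn's C; in both frameworks, the covariates whose coefficients survive the penalty form the selected set.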

Output

Outputs follow OpenSAFELY naming conventions related to suppression rules by adding the suffix "_midpoint6". The suffix "_midpoint6_derived" means that the value(s) are derived from the midpoint6 values. Detailed information regarding these naming conventions can be found in the OpenSAFELY documentation.
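As a hypothetical sketch only, midpoint-style rounding can be thought of as mapping each non-zero count to the midpoint of the band of width 6 that contains it. The exact rule below is an assumption for illustration; the authoritative definition is in the OpenSAFELY documentation:

```python
def midpoint6(count: int) -> int:
    # ASSUMED behaviour, for illustration only: non-zero counts are
    # mapped to the midpoint of their band of width 6 (1-6 -> 3,
    # 7-12 -> 9, ...), so raw small counts cannot be recovered.
    # Consult the OpenSAFELY docs for the exact disclosure rule.
    if count == 0:
        return 0
    return ((count - 1) // 6) * 6 + 3

print(midpoint6(1), midpoint6(6), midpoint6(7))
```

Under this assumed banding, any raw count between 1 and 6 is reported identically, which is the point of the suppression rule.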

About the OpenSAFELY framework

The OpenSAFELY framework is a Trusted Research Environment (TRE) for electronic health records research in the NHS, with a focus on public accountability and research quality.

Read more at OpenSAFELY.org.

Licences

As standard, research projects have an MIT license.
