The CEDS data model scripts are available here: https://ceds.ed.gov/dataModelNDS.aspx
- Convert the create script from UTF-16 BE into UTF-8.
- Switch `\nGO$` to `;\n`, replacing each T-SQL `GO` batch separator with a semicolon.
- Remove extra white space until the interpreter stops throwing errors.
- Split the scripts into blocks of fewer than 10,000 lines (labelled parts 1-11) so that DbSchema can load them.
- Remove semicolons and line-endings in the Element tables
- Confirm that the Populate scripts are encoded as Western (Windows 1252), or convert them.
- Confirm that all tables have foreign keys attached and are fully loaded.
- Use the query below to find which, if any, of the reference (`Ref%`) tables are empty, and note them down so you don't look for them after conversion.
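```sql
-- Empty reference tables (SQL Server / T-SQL)
SELECT
    T.NAME AS 'TABLE NAME',
    P.[ROWS] AS 'NO OF ROWS'
FROM SYS.TABLES T
INNER JOIN SYS.PARTITIONS P ON T.OBJECT_ID = P.OBJECT_ID
WHERE T.NAME LIKE ('Ref%')
  AND P.INDEX_ID IN (0, 1) -- heap or clustered index only, so nonclustered indexes don't add duplicate rows
  AND P.[ROWS] = 0;
```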
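The create-script preparation steps at the top of this list (re-encoding from UTF-16 BE to UTF-8, swapping `GO` batch separators for semicolons, trimming stray white space) can be sketched in Python as below. This is not the repository's own tooling, which is in R. The function names are hypothetical, and treating "extra white space" as trailing blanks on each line is an assumption.

```python
import re
from pathlib import Path

# A line holding only the T-SQL batch separator GO. Mirrors the "\nGO$ -> ;\n"
# substitution: the semicolon ends up on the line of the previous statement.
GO_LINE = re.compile(r"\nGO[ \t]*$", re.MULTILINE)


def prepare_create_script(raw: bytes) -> str:
    """Decode a UTF-16 BE create script and rewrite it for PostgreSQL-style tools."""
    text = raw.decode("utf-16-be")
    text = text.lstrip("\ufeff")           # drop a byte-order mark, if present
    text = text.replace("\r\n", "\n")      # normalise Windows line endings
    # Assumption: "extra white space" means trailing blanks at the end of each line.
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    return GO_LINE.sub(";\n", text)


def convert_file(src: str, dst: str) -> None:
    """Hypothetical helper: read the UTF-16 BE original and write a UTF-8 copy."""
    Path(dst).write_text(prepare_create_script(Path(src).read_bytes()), encoding="utf-8")
```

Trimming before the substitution matters: otherwise trailing blanks would end up between a statement and its new semicolon.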
- Search the load script and confirm that these tables are not in it. When I first did this, I contacted the really friendly and helpful folks who manage the CEDS schema to confirm that these tables are not available. Duane Brown in particular was extremely receptive and helpful, and many improvements were made in the 7th release.
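Searching the load script for the empty tables can be automated with a small regular-expression scan. This is a sketch that assumes the populate script writes its targets as `INSERT INTO [schema].[Table]` or `INSERT INTO Table`. The function name is hypothetical.

```python
import re

# Captures the table name from "INSERT INTO [dbo].[RefFoo]" or "INSERT INTO RefFoo".
INSERT_TARGET = re.compile(r"INSERT\s+INTO\s+(?:\[?\w+\]?\.)?\[?(\w+)\]?", re.IGNORECASE)


def empty_tables_with_inserts(script: str, empty_tables: list[str]) -> list[str]:
    """Return the tables reported empty that nonetheless appear as INSERT targets."""
    targets = {name.lower() for name in INSERT_TARGET.findall(script)}
    return [t for t in empty_tables if t.lower() in targets]
```

An empty result means none of the empty tables are populated by the script, which is what this step sets out to confirm.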
- Load and bind together all .sql script blocks (`1.TSQLv7camelCaseLoadScriptCollectBind`).
- Convert all camelCase and PascalCase identifiers into snake_case (`2.TSQLv7camel2snakeConvert.R`). Without the underscores, the word boundaries would be lost during loading, because case-insensitive, PostgreSQL-compatible environments fold unquoted identifiers to lowercase.
- Abbreviation collisions and other exceptions are handled in code, albeit less gracefully toward the end, when dealing with small or one-off (n = 1) idiosyncratic issues.
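The repository does the snake_case step with an R script. As a rough Python analogue, not a port of that script, two regular-expression passes cover most names. A hypothetical `overrides` mapping stands in for the hand-coded exceptions such as abbreviation collisions. The sketch assumes that bracketed T-SQL identifiers are the only things that need renaming.

```python
import re
from typing import Optional

# "IDEAIndicator" -> "IDEA_Indicator": split an acronym from the word that follows it.
ACRONYM_BOUNDARY = re.compile(r"([A-Z]+)([A-Z][a-z])")
# "roleStatus" -> "role_Status", "K12Staff" -> "K12_Staff": lowercase/digit then uppercase.
WORD_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")
# A bracketed T-SQL identifier such as [RefRoleStatusType].
BRACKETED = re.compile(r"\[([A-Za-z][A-Za-z0-9]*)\]")


def to_snake(name: str, overrides: Optional[dict[str, str]] = None) -> str:
    """Convert one camelCase/PascalCase identifier; hypothetical overrides take priority."""
    if overrides and name in overrides:
        return overrides[name]
    name = ACRONYM_BOUNDARY.sub(r"\1_\2", name)
    name = WORD_BOUNDARY.sub(r"\1_\2", name)
    return name.lower()


def snake_case_script(sql: str, overrides: Optional[dict[str, str]] = None) -> str:
    """Replace every [Identifier] in a T-SQL script with its unbracketed snake_case form."""
    return BRACKETED.sub(lambda m: to_snake(m.group(1), overrides), sql)
```

Running the acronym pass first keeps runs of capitals such as `IDEA` together. Any bracketed text inside string data would also be renamed, so that data needs a check.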
- Convert project to Redshift.
- Export data and files
- Slice up into chunks of fewer than 10K lines to facilitate loading in DbSchema or your development tool of choice.
- Load into a SQL design tool (I used DbSchema 7.6.0) or another SQL dialect conversion tool.
- Check that all tables are populated; the query is the same as above, since we are still in T-SQL.
- Convert the project properties to Amazon Redshift.
- Export Schema and Data
- Rename the schemas to match PostgreSQL conventions.
- Clean up errors introduced in the translation process (`3.redshift_ceds_v7_snake_case_script_clean.R`). These are identified most easily with your favorite text editor's syntax highlighting (my preference is Sublime Text 3). Typical sources:
- possessive apostrophes inside single-quoted string literals in Redshift/PostgreSQL INSERT statements
- plural abbreviations
- proper nouns with apostrophes included
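Most of these apostrophe problems come down to a single quote inside a single-quoted literal that has not been doubled (`''`). The sketch below is a heuristic, not the repository's R clean-up. It doubles any apostrophe that sits between two letters. This covers possessives like `Children's`, abbreviations written like `IEP's` (a hypothetical example), and names like `O'Brien`. It assumes no `N'...'` literal prefixes remain, and it misses trailing possessives such as `parents'`, so the output still needs a visual check.

```python
import re

# An apostrophe with a letter on both sides is almost never a string delimiter.
INNER_APOSTROPHE = re.compile(r"(?<=[A-Za-z])'(?=[A-Za-z])")


def escape_inner_apostrophes(line: str) -> str:
    """Double apostrophes that sit between letters; already-doubled ones are left alone."""
    return INNER_APOSTROPHE.sub("''", line)
```

Already-escaped quotes pass through unchanged, because each half of a `''` pair has another quote rather than a letter as a neighbour.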
- Split into files of fewer than 10K lines.
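Splitting is safest on statement boundaries, so that no INSERT is cut in half across two files. This sketch assumes every statement ends with a semicolon at the end of a line. A string value that happens to end a line with `;` would fool it. The part-file naming in `write_parts` is hypothetical.

```python
from pathlib import Path


def split_on_statements(sql: str, max_lines: int = 9_999) -> list[str]:
    """Group whole statements into chunks of at most max_lines lines where possible."""
    statements, pending = [], []
    for line in sql.splitlines():
        pending.append(line)
        if line.rstrip().endswith(";"):   # assumed end-of-statement marker
            statements.append(pending)
            pending = []
    if pending:
        statements.append(pending)        # trailing lines without a final ';'
    chunks, current = [], []
    for stmt in statements:
        # Start a new chunk rather than let this statement overflow the current one.
        # A single statement longer than max_lines still gets a (too long) chunk of its own.
        if current and len(current) + len(stmt) > max_lines:
            chunks.append("\n".join(current) + "\n")
            current = []
        current.extend(stmt)
    if current:
        chunks.append("\n".join(current) + "\n")
    return chunks


def write_parts(chunks: list[str], prefix: str = "part") -> list[Path]:
    """Write chunks as prefix_1.sql, prefix_2.sql, ... (hypothetical naming scheme)."""
    paths = []
    for i, chunk in enumerate(chunks, start=1):
        path = Path(f"{prefix}_{i}.sql")
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths
```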
- Load into the Redshift environment.
- Fix any errors raised on upload:
- Resequence the table loads where necessary. For example, the ref_role_status_type table must be populated before the ref_role_status table, which has a foreign key that references it.
- Use the interpreter to identify any further syntax errors introduced in the translation process.
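Rather than resequencing by trial and error, the load order can be derived from the foreign keys with a topological sort. This sketch uses Python's standard `graphlib` module (3.9+). It assumes the foreign keys are declared as `ALTER TABLE ... ADD CONSTRAINT ... FOREIGN KEY (...) REFERENCES ...` statements, which may not match the exact shape of the translated script. Self-references are skipped.

```python
import re
from graphlib import TopologicalSorter

# Assumed DDL shape: ALTER TABLE child ADD CONSTRAINT name FOREIGN KEY (cols) REFERENCES parent (cols)
FK_PATTERN = re.compile(
    r"ALTER\s+TABLE\s+([\w.]+)\s+ADD\s+CONSTRAINT\s+\w+\s+"
    r"FOREIGN\s+KEY\s*\([^)]*\)\s*REFERENCES\s+([\w.]+)",
    re.IGNORECASE,
)


def load_order(tables: list[str], ddl: str) -> list[str]:
    """Order tables so every referenced (parent) table comes before the tables that use it."""
    sorter = TopologicalSorter()
    for table in tables:
        sorter.add(table)                 # tables without foreign keys still need a slot
    for child, parent in FK_PATTERN.findall(ddl):
        if child != parent:               # a self-reference would be reported as a cycle
            sorter.add(child, parent)     # child depends on parent
    return list(sorter.static_order())    # raises graphlib.CycleError on circular keys
```

The resulting list gives an order in which to arrange the INSERT blocks. For the example above, `ref_role_status_type` comes before `ref_role_status`.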
- Check the completeness of the loads with this query:
```sql
SELECT schemaname, relname, n_tup_ins
FROM pg_stat_all_tables
WHERE schemaname = 'rs_ceds7_sc'
  AND relname LIKE ('ref%')
ORDER BY n_tup_ins;
```
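To check completeness rather than just scan the list by eye, the `n_tup_ins` figures can be compared against SQL Server row counts. Those counts come from the earlier T-SQL query run without the `= 0` filter. This sketch assumes both result sets have already been exported into dictionaries. Names are matched by lowercasing and dropping underscores, so `RefRoleStatusType` lines up with `ref_role_status_type`. Tables missing on the Redshift side count as 0 rows, so the known-empty tables don't show up as errors.

```python
def normalise(name: str) -> str:
    """Compare names ignoring case and underscores: RefRoleStatusType ~ ref_role_status_type."""
    return name.replace("_", "").lower()


def count_mismatches(source: dict[str, int], loaded: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {source_table: (expected_rows, loaded_rows)} for every table whose counts differ."""
    loaded_by_key = {normalise(name): rows for name, rows in loaded.items()}
    mismatches = {}
    for table, expected in source.items():
        got = loaded_by_key.get(normalise(table), 0)  # absent on the Redshift side -> 0 rows
        if got != expected:
            mismatches[table] = (expected, got)
    return mismatches
```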
- Once all the script blocks load completely and correctly, combine them into a single file, which is now in the proper sequence to load correctly.
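When recombining the blocks, plain alphabetical sorting would put parts 10 and 11 before part 2. This sketch orders the part files by the last number in each file name and concatenates them. The `ceds7_part_N.sql` naming is hypothetical, and a file name with no digits would raise an `IndexError`.

```python
import re
from pathlib import Path


def part_number(path: Path) -> int:
    """Use the last run of digits in the file name, e.g. ceds7_part_10.sql -> 10."""
    return int(re.findall(r"\d+", path.stem)[-1])


def order_parts(paths: list[Path]) -> list[Path]:
    """Sort part files numerically rather than alphabetically."""
    return sorted(paths, key=part_number)


def combine_parts(paths: list[Path], out_path: Path) -> None:
    """Concatenate the parts, in numeric order, into one UTF-8 script."""
    with out_path.open("w", encoding="utf-8") as out:
        for path in order_parts(paths):
            text = path.read_text(encoding="utf-8")
            out.write(text if text.endswith("\n") else text + "\n")
```

Using the last number rather than the first keeps version digits in a prefix (the `7` in `ceds7`) from affecting the order.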
Please let me know if you have any questions or encounter any problems. Issues, and especially pull requests, are always appreciated!
Happy loading!