johnnyzhuzu/README.md

H-1B Visa Petitions Exploratory Data Analysis

The H-1B is an employment-based, non-immigrant visa category for temporary foreign workers in the United States. Every year, the US immigration department receives over 200,000 petitions and selects 85,000 applications through a random process. The application data is publicly available for in-depth longitudinal research and analysis. It provides key insights into the prevailing wages for the job titles that US employers sponsor under the H-1B visa category. In particular, I use the 2011-2016 H-1B petition disclosure data to analyze the employers with the most applications, data science related job positions, and the relationship between offered salaries and the cost of living index.

Data Set Source

The Office of Foreign Labor Certification (OFLC) publishes program data on its immigration programs, including the H-1B visa. The disclosure data, updated annually, is available at https://www.foreignlaborcert.doleta.gov/performancedata.cfm

  • Click on the Disclosure Data tab.
  • Go to the section LCA Programs (H-1B, H-1B1, E-3).
  • You will find data from 2008 onwards.

Requirements

  • R
  • RStudio
  • Packages: readxl, dplyr, hashmap, ggplot2, ggmap, ggrepel

Use install.packages("package_name") to install new packages in R.
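
As a convenience, the packages listed above can be checked in one step instead of one at a time. The following is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and the install call is left commented out so that nothing is installed without you choosing to.

```r
# Hypothetical helper (not part of this repository): report which of the
# required packages are not yet installed.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

# Package names taken from the Requirements list above.
required <- c("readxl", "dplyr", "hashmap", "ggplot2", "ggmap", "ggrepel")

to_install <- missing_packages(required)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

Depending on your R version, some of these packages may not be available from your default repository, in which case `install.packages()` will warn about them.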

Files

  • data_processing.Rmd: R notebook performing the key data transformations on the raw dataset.
  • data_analysis.Rmd: R notebook with code for plots and corresponding
  • helpers.R: helper functions used mainly for data analysis.
  • spell_correcter.R: A suite of functions for performing spell correction in a given vector using the frequencies of occurrence of different elements in the vector.
  • coli/: Python Scrapy code directory for scraping cost of living plus rent index. The spider crawl file is the main file describing how the data should be scraped.
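
To illustrate the idea behind frequency-based spell correction described for spell_correcter.R, here is a minimal base R sketch. It is an assumption about the general approach, not the code in that file: the function name `correct_by_frequency`, its parameters `max_dist` and `min_count`, and the sample employer names are all hypothetical. Rare values are replaced by the most frequent value within a small edit distance.

```r
# Illustrative sketch only; the actual spell_correcter.R may work differently.
# x         : character vector to correct
# max_dist  : maximum edit distance allowed between a rare value and its fix
# min_count : values seen at least this often are treated as correct spellings
correct_by_frequency <- function(x, max_dist = 1, min_count = 2) {
  counts <- table(x)
  values <- names(counts)
  frequent <- values[as.vector(counts) >= min_count]
  rare <- setdiff(values, frequent)
  if (length(frequent) == 0 || length(rare) == 0) return(x)

  # Edit distances: one row per rare value, one column per frequent value.
  d <- utils::adist(rare, frequent)

  # Start with every value mapped to itself, then redirect rare values.
  mapping <- setNames(values, values)
  for (i in seq_along(rare)) {
    candidates <- which(d[i, ] <= max_dist)
    if (length(candidates) > 0) {
      # Among close candidates, pick the one that occurs most often.
      best <- candidates[which.max(counts[frequent[candidates]])]
      mapping[rare[i]] <- frequent[best]
    }
  }
  unname(mapping[x])
}

# Made-up sample data for demonstration.
employers <- c("GOOGLE INC", "GOOGLE INC", "GOOGLE INC", "GOGLE INC",
               "INFOSYS", "INFOSYS", "INFOSIS")
corrected <- correct_by_frequency(employers)
print(table(corrected))
```

Using the most frequent neighbor as the correction assumes that misspellings are rarer than the correct form, which is a reasonable starting point for employer names and job titles but should be checked against the data.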

Shiny app

I extended this project to build a Shiny app based on the transformed data set.

Blogs

Please read my blogs for key data insights and more details:

Kaggle

I have released the transformed data set on Kaggle for public use under the CC BY-NC-SA 4.0 license.

Acknowledgements

License

Open sourced under the MIT License.
