AndriiQwq/DETT

Objective

Match companies between two datasets based on company names and locations, and produce a merged output.
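Matching on company names usually works better when each name is first reduced to a comparable key. The sketch below shows one possible normalization using only the standard library. The function `normalize_name` and the `LEGAL_SUFFIXES` set are illustrative assumptions, not part of this repository, and the suffix list would need tuning against the actual data.

```python
import re
import unicodedata

# Hypothetical set of legal-form suffixes to drop; extend it for the real data.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh", "plc"}


def normalize_name(name: str) -> str:
    """Return a lowercase, accent-free, punctuation-free key for a company name."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, then turn every run of non-alphanumeric characters into a space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    tokens = text.split()
    # Strip trailing legal-form tokens, but never remove the last remaining token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

With this key, spellings such as `Acme, Inc.` and `ACME Inc` collapse to the same value, so they can be joined with an exact match instead of a fuzzy comparison.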

Tech Stack

  • Python
  • pandas, numpy, matplotlib

Requirements

  1. Match companies between Dataset 1 and Dataset 2 based on:
    • company name
    • location information
  2. Create a merged dataset that:
    • contains all unique companies from Dataset 1
    • includes corresponding company matches from Dataset 2 where they exist
    • contains a column listing each company's locations from Dataset 1
    • contains a column listing each company's locations from Dataset 2
    • contains a column with the locations that overlap between the two matched companies
    • if no locations overlap, keeps the company name match and leaves the overlapping locations column empty
  3. Calculate the following metrics:
    • match rate: % of Dataset 1 companies that have a match in Dataset 2
    • unmatched records: % of companies with no match in either dataset
    • one-to-many matches: % of companies with multiple matched entries
    • other metrics you consider useful
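The requirements above can be sketched with pandas roughly as follows. This is a minimal illustration under stated assumptions, not the repository's implementation. It assumes each dataset has one row per company location, with hypothetical columns `name` and `location`. It matches on a trimmed, lowercased name only, and it treats "one-to-many" as a Dataset 1 company whose key covers more than one distinct raw spelling in Dataset 2. The function names `per_company`, `merge_companies`, and `metrics` are made up for the example.

```python
import pandas as pd


def per_company(df: pd.DataFrame, suffix: str) -> pd.DataFrame:
    """Collapse rows to one per company key, collecting locations into a list."""
    # Assumed key: trimmed, lowercased name. Real data likely needs more cleanup.
    df = df.assign(key=df["name"].str.strip().str.lower())
    grouped = df.groupby("key")
    out = pd.DataFrame({
        "name" + suffix: grouped["name"].first(),
        # Number of distinct raw spellings that share this key.
        "variants" + suffix: grouped["name"].nunique(),
        "locations" + suffix: grouped["location"].agg(
            lambda s: sorted(set(s.dropna()))
        ),
    })
    return out.rename_axis("key").reset_index()


def merge_companies(d1: pd.DataFrame, d2: pd.DataFrame) -> pd.DataFrame:
    """Left-join Dataset 2 onto Dataset 1 and add an overlapping-locations column."""
    left = per_company(d1, "_d1")
    right = per_company(d2, "_d2")
    # how="left" keeps every unique Dataset 1 company, matched or not.
    merged = left.merge(right, on="key", how="left")
    # Unmatched rows get NaN here; use empty lists so the overlap step is uniform.
    merged["locations_d2"] = merged["locations_d2"].apply(
        lambda v: v if isinstance(v, list) else []
    )
    # Overlap is empty when there is no match or no shared location.
    merged["overlap"] = pd.Series(
        [
            sorted(set(a) & set(b))
            for a, b in zip(merged["locations_d1"], merged["locations_d2"])
        ],
        index=merged.index,
    )
    return merged


def metrics(merged: pd.DataFrame) -> dict:
    """Compute percentage metrics over the unique Dataset 1 companies."""
    matched = merged["name_d2"].notna()
    return {
        "match_rate_pct": 100 * matched.mean(),
        "unmatched_d1_pct": 100 * (~matched).mean(),
        # NaN > 1 evaluates to False, so unmatched rows are not counted.
        "one_to_many_pct": 100 * (merged["variants_d2"] > 1).mean(),
    }
```

The left join is what guarantees that all Dataset 1 companies survive into the output. The list-valued columns would need to be serialized (for example, joined with a separator) before writing the merged dataset to CSV.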

Deliverables

  1. Merged dataset (CSV)
  2. Code scripts
  3. Documentation:
    • matching approach
    • data quality issues found
    • normalization / transformations applied
    • calculated metrics

Documentation
