pebblefoot31/SMC18

Running Awk in Parallel to process 256M Records

I wrote a blog post about this work, which was also discussed on Hacker News.

This repo contains the artefacts from the Smoky Mountains Data Challenge 2018, which I solved and for which I won first prize. Below, I describe the approach, the method, and some interesting tidbits.

A PDF report can be found in the /report folder.

SMC Data Challenge 4: Scientific Publications Mining

To run the awk code:

awk -f prob2.awk stop_words.txt data_dir/*.txt

To compile the Swift code:

stc runprob2.swift # generates a tic file

To run the Swift code:

turbine -n 340 runprob2.tic
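
The commands above pass a stop-word file followed by data files to awk, and then fan the work out over many processes (`stc` and `turbine` are the Swift/T compiler and launcher, and `-n 340` sets the process count). As a rough illustration of that pattern only, the sketch below runs one awk process per file with `xargs -P` and merges the partial counts afterwards. It is not the contents of `prob2.awk` or `runprob2.swift`: the script `count.awk`, the sample files, and the word-counting task are all assumptions made up for this example, and `xargs -P` is a single-machine stand-in for Swift/T.

```shell
# Hypothetical sample inputs; the real stop_words.txt and data_dir are not shown here.
tmp=$(mktemp -d)
mkdir "$tmp/data_dir" "$tmp/partial"
printf 'the\nof\n' > "$tmp/stop_words.txt"
printf 'the mining of the papers\n' > "$tmp/data_dir/a.txt"
printf 'papers of science\n' > "$tmp/data_dir/b.txt"

# Hypothetical stand-in for prob2.awk: count words that are not stop words.
# NR == FNR is true only while awk reads its first file, so that file
# fills the stop-word set and every later file is treated as data.
cat > "$tmp/count.awk" <<'EOF'
NR == FNR { stop[$1] = 1; next }
{ for (i = 1; i <= NF; i++) if (!($i in stop)) count[$i]++ }
END { for (w in count) print w, count[w] }
EOF

# Map step: one awk process per data file, at most 2 running at once.
# (-P is supported by GNU and BSD xargs but is not required by POSIX.)
printf '%s\n' "$tmp"/data_dir/*.txt |
  xargs -P 2 -I{} sh -c \
    'awk -f "$1/count.awk" "$1/stop_words.txt" "$2" > "$1/partial/$(basename "$2").out"' \
    _ "$tmp" {}

# Reduce step: sum the per-file counts, then sort by word for stable output.
merged=$(cat "$tmp"/partial/*.out |
  awk '{ total[$1] += $2 } END { for (w in total) print total[w], w }' |
  LC_ALL=C sort -k2)
echo "$merged"
```

Each worker writes to its own file under `partial/`, so the parallel processes never share an output stream, and the merge is a second, cheap awk pass over the partial results.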
