Working memo.
Scraping scripts for Go expert players' game records (SGF file format).
Websites:
Working container:
docker build -t go.data .
docker run -it go.data bash
Scripts:
## gokifu
python scripts/gokifu.py ./expert_play/gokifu/
## kihuu
python scripts/kihuu.py ./expert_play/kihuu/
## kihuu (prolist)
python scripts/kihuu_prolist.py ./expert_play/kihuu_prolist/
## kifudepot
python scripts/kifudepot.py ./expert_play/kifudepot/
## u-go.net
# Download the .tar.gz archives
python scripts/kgs4d.py ./expert_play/kgs4d/
# Extract the SGFs from each .tar.gz (into the directory that holds the archive)
find ./expert_play/kgs4d/ -type f -name "*.tar.gz" -exec sh -c 'tar -xzf "$1" -C "$(dirname "$1")"' _ {} \;
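If `find`/`tar` are not available in the container, the same extraction can be sketched in Python. This is only an illustration, not part of the repo: the function name `extract_archives` is made up here, and it assumes each archive should be unpacked next to itself, as the `find` command above does.

```python
import tarfile
from pathlib import Path


def extract_archives(root: Path) -> int:
    """Extract every *.tar.gz under root into the directory that contains it."""
    count = 0
    # sorted() materializes the list first, so files created by extraction
    # are not picked up during the same pass.
    for archive in sorted(root.rglob("*.tar.gz")):
        if not archive.is_file():
            continue
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Pythons with extraction filters: reject members that would
                # escape the target directory.
                tar.extractall(path=archive.parent, filter="data")
            else:
                tar.extractall(path=archive.parent)
        count += 1
    return count
```

Usage would be something like `extract_archives(Path("./expert_play/kgs4d/"))`, which returns the number of archives processed.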
De-duplication:
find ./expert_play/ -name "*.sgf" | python ./scripts/dedupe_sgfs.py /path/to/output/dir/
Final result: about 15,000 expert game records (as of 2024.12).
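For reference, a rough sketch of how a de-duplication step with the same interface (paths on stdin, output directory as the argument) could work. This is not the actual `dedupe_sgfs.py`; the helper names and the definition of "duplicate" (identical content once whitespace is removed) are assumptions made for illustration. The same game scraped from two sites with different metadata would not be caught by this.

```python
import hashlib
import shutil
import sys
from pathlib import Path


def sgf_fingerprint(path: Path) -> str:
    """Hash the file content with all ASCII whitespace removed (assumed notion of 'same file')."""
    normalized = b"".join(path.read_bytes().split())
    return hashlib.sha1(normalized).hexdigest()


def dedupe(paths, out_dir: Path) -> int:
    """Copy the first file seen for each fingerprint into out_dir; return how many were kept."""
    out_dir.mkdir(parents=True, exist_ok=True)
    seen = set()
    for p in paths:
        fp = sgf_fingerprint(p)
        if fp in seen:
            continue
        seen.add(fp)
        # Name the copy after its fingerprint so files from different sites cannot collide.
        shutil.copyfile(p, out_dir / f"{fp}.sgf")
    return len(seen)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Paths arrive one per line on stdin, as produced by `find ... -name "*.sgf"`.
    paths = [Path(line.strip()) for line in sys.stdin if line.strip()]
    kept = dedupe(paths, Path(sys.argv[1]))
    print(f"kept {kept} of {len(paths)} files")
```

Hashing normalized bytes keeps the pass cheap and order-independent; catching the same game with differing headers would need an SGF parser and a comparison of the move sequence instead.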