
Mont9165/cregit

 
 


Cregit - Token-Level Git Blame

Cregit tokenizes source code files in a Git repository and creates a token-level blame view, showing which commit introduced each token.
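For intuition on why tokenizing enables token-level blame: `git blame` attributes lines, so once each token sits on its own line, blame effectively attributes tokens. The sketch below is only a crude whitespace splitter. The helper name `one_token_per_line` is hypothetical and not part of cregit, whose srcML-based tokenizer produces a different, richer output. It only illustrates the one-token-per-line shape.

```shell
# Crude stand-in for a tokenizer: split on spaces, one token per line.
# NOT cregit's tokenizer; for illustration only.
one_token_per_line() {
  tr -s ' ' '\n' | sed '/^$/d'
}

# Each printed line is a unit that a line-based blame could attribute.
printf 'int count = 0 ;\n' | one_token_per_line
```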

See readme.org for the original documentation.


Fork Changes

This fork includes the following modifications for improved compatibility:

Dependency Updates

| Component | Original | Updated |
|---|---|---|
| sqlite-jdbc | 3.8.0-SNAPSHOT | 3.8.11.2 |
| sbt | 0.13.7 | 0.13.18 |

Portability Fix

  • Changed shebang in tokenBySha.pl from #!/usr/bin/perl to #!/usr/bin/env perl for better cross-platform compatibility.
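To confirm that a checkout carries the portable shebang, a small check like the following may help. The helper `uses_env_perl` is hypothetical, and the demo runs on a throwaway file rather than the real tokenBySha.pl.

```shell
# Succeed when the first line of the given script is the env-based Perl shebang.
uses_env_perl() {
  [ "$(head -n 1 "$1")" = '#!/usr/bin/env perl' ]
}

# Demo on a temporary file standing in for tokenBySha.pl.
demo=$(mktemp)
printf '#!/usr/bin/env perl\nprint "ok\\n";\n' > "$demo"
if uses_env_perl "$demo"; then
  echo "portable shebang"
fi
```

On a real checkout the call would be `uses_env_perl "$CREGIT_HOME/tokenizeByBlobId/tokenBySha.pl"`. With the env form, the script runs under whichever `perl` comes first on PATH instead of a fixed /usr/bin/perl.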

Prerequisites

| Tool | URL | Notes |
|---|---|---|
| srcml | https://www.srcml.org/ | Must be in PATH |
| ctags | https://github.com/universal-ctags | Universal Ctags recommended |
| bfg | https://github.com/Mont9165/bfg-repo-cleaner/tree/blobexec | Use the blobexec branch |
| Perl | - | With modules: DBI, DBD::SQLite, Set::Scalar, HTML::FromText |
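
A preflight check along these lines can report missing prerequisites before any build step runs. This is a minimal sketch: the helper `missing_tools` is hypothetical, and `java` and `sbt` are included because the build and usage steps in this README invoke them.

```shell
# Print each command name from the arguments that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools srcml ctags perl java sbt)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All tools found"
fi

# Check the Perl modules only when perl itself is available.
if command -v perl >/dev/null 2>&1; then
  for mod in DBI DBD::SQLite Set::Scalar HTML::FromText; do
    perl -M"$mod" -e 1 2>/dev/null || echo "Missing Perl module: $mod"
  done
fi
```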

macOS Installation

# Install dependencies with Homebrew
brew install srcml universal-ctags

# Install Perl modules
cpan DBI DBD::SQLite Set::Scalar HTML::FromText

Debian/Ubuntu Installation

apt-get install cmake libarchive-dev libxml++2.6-dev libxml2-dev \
  libcurl4-openssl-dev libxslt1-dev libboost-all-dev libantlr-dev \
  libssl-dev libxerces-c-dev exuberant-ctags libdbi-perl \
  libjgit-java libhtml-fromtext-perl libset-scalar-perl libdbd-sqlite3-perl

Quick Start

1. Clone and Build BFG

git clone https://github.com/Mont9165/bfg-repo-cleaner.git --branch blobexec
cd bfg-repo-cleaner
sbt "bfg/assembly"

2. Clone Cregit

git clone https://github.com/Mont9165/cregit.git
cd cregit

3. Build srcml2token

cd tokenize/srcMLtoken
make
cd ../..

4. Build Scala Components

# Build slickGitLog
cd slickGitLog && sbt one-jar && cd ..

# Build persons
cd persons && sbt one-jar && cd ..

# Build remapCommits
cd remapCommits && sbt one-jar && cd ..

Usage: Tokenizing a Repository

Step 1: Set Environment Variables

export CREGIT_HOME=/path/to/cregit
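# Resolve the wildcard here: later commands use "$BFG_JAR" in quotes, where it is not glob-expanded
export BFG_JAR=$(ls /path/to/bfg-repo-cleaner/bfg/target/bfg-*.jar | head -n 1)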
export BFG_MEMO_DIR=/tmp/memo
export BFG_TOKENIZE_CMD="$CREGIT_HOME/tokenize/tokenizeSrcMl.pl \
  --srcml2token=$CREGIT_HOME/tokenize/srcMLtoken/srcml2token \
  --srcml=$(which srcml) \
  --ctags=$(which ctags)"

mkdir -p "$BFG_MEMO_DIR"
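
Because BFG and the tokenizer run as child processes, these variables are only visible to them when exported. The sketch below checks this from a child shell. The helper name `require_exported` is hypothetical, and the assumption that all four variables are needed follows from the commands in this README.

```shell
# Return non-zero and name each variable that a child process would not see.
require_exported() {
  status=0
  for name in "$@"; do
    # A child shell only sees exported variables, as the java process will.
    if ! sh -c "[ -n \"\${$name:-}\" ]"; then
      echo "Not exported or empty: $name" >&2
      status=1
    fi
  done
  return "$status"
}

if require_exported CREGIT_HOME BFG_JAR BFG_MEMO_DIR BFG_TOKENIZE_CMD; then
  echo "Environment looks complete"
else
  echo "Fix the variables listed above before running BFG"
fi
```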

Step 2: Clone Your Target Repository

# Clone the repository you want to tokenize
git clone --mirror https://github.com/example/repo.git /tmp/repo.git

Step 3: Run BFG with Cregit

For Java files:

java -jar "$BFG_JAR" \
  --blob-exec "$CREGIT_HOME"'/tokenizeByBlobId/tokenBySha.pl=\.java$' \
  --no-blob-protection \
  /tmp/repo.git

For C/C++ files:

java -jar "$BFG_JAR" \
  --blob-exec "$CREGIT_HOME"'/tokenizeByBlobId/tokenBySha.pl=\.[ch]$' \
  --no-blob-protection \
  /tmp/repo.git

Note: Keep the regex part of the --blob-exec argument in single quotes so the shell passes the backslash and the trailing $ through unchanged, and keep $CREGIT_HOME outside the single quotes so the shell expands it. The pattern \.[ch]$ matches only .c and .h files.
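
To preview which files a pattern selects and what string the shell actually hands to BFG, a sketch like this can be run first. It assumes the text after `=` is applied to file paths as an extended regular expression, with `grep -E` standing in for BFG's matcher. The helper `matches`, the sample file names, and the `/opt/cregit` location are all hypothetical.

```shell
# Assumption: the pattern is matched against each path as an extended regex.
matches() {
  printf '%s\n' "$2" | grep -Eq "$1"
}

for f in Foo.java foo.c foo.h foo.cpp Foo.java.bak; do
  for pat in '\.java$' '\.[ch]$'; do
    if matches "$pat" "$f"; then
      echo "$f matches $pat"
    fi
  done
done

# Quoting: the shell expands variables only outside single quotes.
CREGIT_HOME=/opt/cregit   # hypothetical install location, for the demo only
literal='$CREGIT_HOME/tokenizeByBlobId/tokenBySha.pl=\.java$'
expanded="$CREGIT_HOME"'/tokenizeByBlobId/tokenBySha.pl=\.java$'
printf '%s\n' "$literal" "$expanded"
```

The first printed argument keeps the text `$CREGIT_HOME` verbatim, while the second carries the expanded path. Under the matching assumption above, files such as foo.cpp are not selected by `\.[ch]$` and would need a broader pattern.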


Complete Workflow Example

Here's a complete example for processing the Apache Camel repository:

#!/bin/bash
set -e

# Configuration
REPO_NAME="camel"
REPO_URL="https://github.com/apache/camel.git"
WORK_DIR="/tmp/cregit-work"
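export CREGIT_HOME=/path/to/cregit
# Resolve the wildcard now; "$BFG_JAR" is quoted below and would not be glob-expanded there
BFG_JAR=$(ls /path/to/bfg-*.jar | head -n 1)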

# Setup
mkdir -p "$WORK_DIR"
export BFG_MEMO_DIR="$WORK_DIR/memo"
export BFG_TOKENIZE_CMD="$CREGIT_HOME/tokenize/tokenizeSrcMl.pl \
  --srcml2token=$CREGIT_HOME/tokenize/srcMLtoken/srcml2token \
  --srcml=$(which srcml) \
  --ctags=$(which ctags)"
mkdir -p "$BFG_MEMO_DIR"

# 1. Clone repository
git clone --mirror "$REPO_URL" "$WORK_DIR/$REPO_NAME.git"

# 2. Keep a copy of the original
cp -r "$WORK_DIR/$REPO_NAME.git" "$WORK_DIR/$REPO_NAME-original.git"

# 3. Tokenize with BFG + cregit
java -jar "$BFG_JAR" \
  --blob-exec "$CREGIT_HOME"'/tokenizeByBlobId/tokenBySha.pl=\.java$' \
  --no-blob-protection \
  "$WORK_DIR/$REPO_NAME.git"

# 4. Create history database for original repo
java -jar "$CREGIT_HOME"/slickGitLog/target/scala-2.11/slickgitlog_2.11-*-one-jar.jar \
  "$WORK_DIR/$REPO_NAME-original.db" \
  "$WORK_DIR/$REPO_NAME-original.git"

# 5. Create history database for tokenized repo
java -jar "$CREGIT_HOME"/slickGitLog/target/scala-2.11/slickgitlog_2.11-*-one-jar.jar \
  "$WORK_DIR/$REPO_NAME-cregit.db" \
  "$WORK_DIR/$REPO_NAME.git"

echo "Done! Tokenized repository is at: $WORK_DIR/$REPO_NAME.git"
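
The script refers to jar files through `*` wildcards, and a wildcard inside double quotes is passed to `java` literally rather than expanded. One way to make this robust is to resolve each wildcard once and fail early unless it matches exactly one file. The helper `resolve_one` and the demo jar name below are hypothetical.

```shell
# Succeed only when the caller's glob expanded to exactly one existing file.
resolve_one() {
  if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
    echo "expected exactly one match, got $#: $*" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Demo in a scratch directory with a made-up jar name.
scratch=$(mktemp -d)
touch "$scratch/bfg-0.0.0-demo.jar"
jar=$(resolve_one "$scratch"/bfg-*.jar)
echo "Resolved: $jar"
```

In the workflow this could be used as `BFG_JAR=$(resolve_one /path/to/bfg-*.jar)`, with the wildcard left outside the quotes so the shell expands it before the function sees it.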

Troubleshooting

Error: BFG_TOKENIZE_CMD environment variable not set

Make sure to export the environment variable before running:

export BFG_TOKENIZE_CMD="..."

Error: None.get when running BFG

Use the patched BFG from this fork: https://github.com/Mont9165/bfg-repo-cleaner/tree/blobexec

Error: srcml not found

Install srcml and ensure it's in your PATH:

# macOS
brew install srcml

# Verify installation
which srcml

Error: ctags not found

Install Universal Ctags (recommended over Exuberant Ctags for best results):

# macOS
brew install universal-ctags

# Verify installation
which ctags
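
Since `which ctags` only shows that some ctags exists, it can help to check which flavor is first on PATH. The sketch below classifies the first line of `ctags --version`, assuming the banner starts with "Universal Ctags" or "Exuberant Ctags". The helper `ctags_flavor` is hypothetical, and any other output, including from a ctags that does not support `--version`, is reported as unknown.

```shell
# Classify a ctags version banner read from standard input.
ctags_flavor() {
  banner=$(head -n 1)
  case "$banner" in
    "Universal Ctags"*) echo universal ;;
    "Exuberant Ctags"*) echo exuberant ;;
    *) echo unknown ;;
  esac
}

if command -v ctags >/dev/null 2>&1; then
  echo "ctags on PATH is: $(ctags --version 2>/dev/null | ctags_flavor)"
fi
```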

License

GPL-3.0+
