An access log parser that creates basic statistics from huge log files.

Benchmark on a machine with 4 cores and 16 GB RAM:
## Single Thread
- Took 4,810.503 seconds (80.167 minutes)
- Processed 14 files
- Processed 445,052,807 lines
- Processed 92,516.897 lines per second
## 3 Threads
- Took 2,439.198 seconds (40.653 minutes)
- Processed 32 files
- Processed 445,289,011 lines
- Processed 182,555.498 lines per second
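The lines-per-second figures follow directly from the totals above; a quick sanity check in plain Python (not part of the tool):

```python
# Reproduce the reported throughput from the benchmark totals.
single_lines, single_secs = 445_052_807, 4810.503
threaded_lines, threaded_secs = 445_289_011, 2439.198

single_rate = single_lines / single_secs        # ~92,516.9 lines/s
threaded_rate = threaded_lines / threaded_secs  # ~182,555.5 lines/s
speedup = single_secs / threaded_secs           # ~1.97x with 3 threads

print(f"{single_rate:,.3f} {threaded_rate:,.3f} {speedup:.2f}")
```

So three threads give roughly a 2x speedup over a single thread.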
This tool is developed with .NET Core. To run it, the .NET Core Runtime Environment is required:

```shell
dotnet run -- --help
```
Alternatively, build it in self-contained mode:

```shell
mkdir dist
dotnet publish -c Release -r linux-x64 --self-contained -o dist/
cd dist
./logsplit --help
```
Early alpha: this tool is just for nerds at the moment.

```shell
mkdir /home/christian/weblogs
logsplit init -d /home/christian/weblogs -f examplewebsite
cd /home/christian/weblogs
```
(See `logsplit init --help` for more info.)
Now you can put all log files from the website `examplewebsite` into `/home/christian/weblogs/input/examplewebsite`.
Inside the folder there is also a `loginfo.json`, which contains the configuration for parsing the access logs. The defaults should fit NGINX logs when the filename format is something like `example-access.log.1.gz`. Just edit the JSON file if something is not working.
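The exact schema of `loginfo.json` is defined by logsplit itself, but as an illustration of what such a parser configuration describes, here is a plain-Python sketch (not part of the tool) that parses NGINX's default "combined" log format with a named-group regex:

```python
import re

# Regex for NGINX's default "combined" log format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<addr>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('203.0.113.7 - - [12/Mar/2020:10:15:32 +0100] '
        '"GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"')

m = COMBINED.match(line)
print(m.group('status'), m.group('request'))
```

If your `log_format` differs from the default, the field layout (and likewise the JSON config) has to be adjusted accordingly.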
The import splits the log files into one access log per host, per hostgroup, per month.
```shell
cd /home/christian/weblogs
logsplit import
```
If something goes wrong while importing, just delete all `*.new` files in `/home/christian/weblogs/repository` and try again.
(See `logsplit import --help` for more info.)
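The splitting idea behind `logsplit import` can be sketched as follows. This is a conceptual illustration in plain Python, not logsplit's actual implementation; logsplit additionally groups by hostgroup and takes its field layout from `loginfo.json`:

```python
import os
import tempfile
from collections import defaultdict

def split_log(records, out_dir):
    """Route each record into one output log per (host, month) bucket.

    records: iterable of (host, month, raw_line) tuples.
    Returns the sorted list of buckets that were written.
    """
    buckets = defaultdict(list)
    for host, month, line in records:
        buckets[(host, month)].append(line)
    os.makedirs(out_dir, exist_ok=True)
    for (host, month), lines in buckets.items():
        with open(os.path.join(out_dir, f"{host}-{month}.log"), "w") as f:
            f.writelines(lines)
    return sorted(buckets)

demo_dir = tempfile.mkdtemp()
groups = split_log(
    [("example.com", "2020-03", "GET / 200\n"),
     ("example.com", "2020-04", "GET /a 200\n"),
     ("other.org",   "2020-03", "GET /b 404\n")],
    demo_dir,
)
print(groups)
```

Here three input lines end up in three separate per-host, per-month files.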
The analyze process parses the log files and generates a summary JSON file, which can be used to generate the actual statistics. After this process, the raw access logs are no longer needed.
```shell
cd /home/christian/weblogs
logsplit analyze
```
(See `logsplit analyze --help` for more info.)
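logsplit's summary format is its own, but the idea (raw lines in, small aggregate out, raw logs disposable afterwards) can be illustrated with a plain-Python sketch. The field names here are made up for the example:

```python
import json
from collections import Counter

def summarize(entries):
    """Reduce parsed log entries to a small aggregate (hypothetical shape)."""
    return {
        "requests": len(entries),
        "bytes": sum(e["bytes"] for e in entries),
        "status": dict(Counter(e["status"] for e in entries)),
    }

entries = [
    {"status": 200, "bytes": 2326},
    {"status": 200, "bytes": 512},
    {"status": 404, "bytes": 0},
]
summary = summarize(entries)
print(json.dumps(summary))
```

Once such a summary exists, statistics can be computed from it without touching the original logs again.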
This module is in a very early state. C# skills may be required to get the information you desire.
```shell
logsplit statistic -p "-access_log_examplewebsite-"
```
The `-p` parameter is a regular expression which must match a `.gz.json` file in the repository folder.
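For example, checking the pattern from above against a repository filename in Python (the filename here is hypothetical; actual names depend on your hosts and months):

```python
import re

# "-access_log_examplewebsite-" is the -p value from the example above.
pattern = re.compile(r"-access_log_examplewebsite-")
name = "host1-access_log_examplewebsite-2020-03.gz.json"  # hypothetical
print(bool(pattern.search(name)))  # True
```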