A log mining tool for Stackdriver logs, using "stopfiles" and "filterfiles" to store occurrences of previously seen log entries and report on new ones
tlichtenberg/stackdriver_logmine
LogMine

A Stackdriver Log mining tool

Jan 22, 2018

Tom Lichtenberg

  • OVERVIEW

The Theory of Artificial Ignorance

The process of establishing invariants and subsequently detecting new events is not new; it has been practiced for decades, pioneered by Unix sysadmins and known as Artificial Ignorance.

In essence, they used grep to detect patterns in log lines, stored the patterns already seen in a "stoplist", and then used that stoplist in a subsequent call such as 'grep -v -f stoplist' to ignore previously seen patterns or log lines and report only on new ones.

This is done progressively: each time you review a batch of log files, you call out any previously unseen entries and add them to the stoplist. The stoplist builds up over time, while each run produces a fresh "newlist". Inspect or report on the newlist and save the stoplist on each iteration.
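The stoplist loop described above can be sketched as follows. This is a minimal illustration of the idea, not the tool's actual implementation; the function names are hypothetical:

```python
# Minimal sketch of the Artificial Ignorance stoplist loop described above.
# Function names are illustrative, not LogMine's actual API.

def find_new_lines(log_lines, stoplist):
    """Return only the log lines not already present in the stoplist."""
    seen = set(stoplist)
    return [line for line in log_lines if line not in seen]

def update_stoplist(log_lines, stoplist):
    """Return (newlist, grown stoplist) for one batch of log lines."""
    newlist = find_new_lines(log_lines, stoplist)
    return newlist, stoplist + newlist
```

On each iteration, the newlist is what you inspect or report on, and the grown stoplist is what you persist for the next run.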

Along the way, some log entries may vary in details that are not critical. The logmine tool allows these entries to be grouped together through a "filterfile". For example, a call to a method FindEvents() may return a different user name or number of messages found, but we are mainly interested in the number of FindEvents calls. The filter file can grow over time, like the stoplist. It is not updated automatically like the stoplist, however; it must be modified manually.

  • USAGE

Use a virtual environment and pip to install the required modules:

virtualenv env
source env/bin/activate
pip install -r requirements.txt

Sample run args:

python -m logmine.LogMine --query --parse --process --log_filter 'resource.type=dataflow_step resource.labels.job_id=DATAFLOW_JOB_NAME severity>=WARNING' --output_dir ./logmine --source my-dataflow-job --limit 100 --range_in_minutes 10 --output_file my-dataflow-job.log --project my-gcp-project

  • IDENTIFY PROJECT SOURCES

You'll want to distinguish between project sources so that you can maintain stoplists per source, since log entries vary considerably between projects.

For example, with source set to a GKE container or a Dataflow job, the tool produces output files such as stoplist_my_container or stoplist_my_dataflow_job, along with matching newlist_* files.

  • FILTER COMMON LOG ENTRIES

A filter file is a text file containing a list of strings from commonly found log entries, used to condense them into a common set where individual differences won't matter. For example, a container log file could contain many entries like this:

'Completed call to FindEvents(elapsedMs=98 receivedMessageCount=1 sentMessageCount=1 statusCode=OK)'

where the message count and elapsed time fields will differ, but we don't care about that. We want to count them all as if they were exactly the same, so the filter file would include a line like this:

Completed call to FindEvents

and any log line containing that substring will be treated as the same entry.

Filterfiles must be named according to the pattern:

'{}/filterfile_{}.txt'.format(self.args.output_dir, self.args.source)

for example:

/tmp/filterfile_my_container.txt
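One way this substring grouping might be implemented (a sketch under assumed names; this is not LogMine's actual code):

```python
# Sketch of filterfile-based grouping: a log line that contains any filter
# substring is collapsed onto that substring, so variants of the same
# message (differing timings, counts, etc.) are counted together.
from collections import Counter

def normalize(line, filters):
    """Map a log line to the first matching filter substring, else itself."""
    for f in filters:
        if f in line:
            return f
    return line

def count_entries(log_lines, filters):
    """Count log lines after collapsing filter-matched variants."""
    return Counter(normalize(line, filters) for line in log_lines)
```

With a filterfile containing "Completed call to FindEvents", every FindEvents log line counts toward a single entry regardless of its elapsedMs or message counts.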

  • DATAFLOW JOBS - A SPECIAL CASE

The Stackdriver client's log filter requires the job_id, but we only know the job_name. The same job name acquires a new job_id on every launch, and Stackdriver logs don't intrinsically map job names to job_ids. We can use the Dataflow API client to find the job_id from the job name.

--log_filter 'resource.type=dataflow_step resource.labels.job_id=DATAFLOW_JOB_NAME severity>=WARNING'

When the tool sees the string "DATAFLOW_JOB_NAME" in the --log_filter argument, it will look up the job_id and replace that string with it, if it can find one. The actual Dataflow job name should be passed as the --source parameter:

--source my_dataflow_job

--log_filter 'resource.type=dataflow_step resource.labels.job_id=DATAFLOW_JOB_NAME severity>=WARNING'
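Once the job_id has been resolved, the substitution itself is a simple string replacement. The sketch below takes the already-resolved job_id as an argument (the tool performs the name-to-id lookup via the Dataflow API, which is not shown here; the function name is hypothetical):

```python
# Sketch of the placeholder substitution described above. The real tool
# resolves job_id from the job name via the Dataflow API first.
PLACEHOLDER = "DATAFLOW_JOB_NAME"

def resolve_log_filter(log_filter, job_id):
    """Replace the DATAFLOW_JOB_NAME placeholder with the resolved job_id."""
    if PLACEHOLDER in log_filter:
        return log_filter.replace(PLACEHOLDER, job_id)
    return log_filter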
