LogMine
A Stackdriver Log mining tool
Jan 22, 2018
Tom Lichtenberg
- OVERVIEW
The Theory of Artificial Ignorance
The process of establishing invariants and then detecting new events is not new; it has been practiced for decades, pioneered by Unix sysadmins and known as Artificial Ignorance.
In essence, they used grep to detect patterns in log lines and stored the ones already seen in a "stoplist". Subsequent calls such as 'grep -v -f stoplist' then ignored those previously seen patterns or loglines and reported only on new ones.
This is done progressively. Each time you review a batch of log files, call out any previously unseen entries and add them to the stoplist. The stoplist builds up over time while producing a new "newlist" each time. Inspect or report on the newlist and save the stoplist each iteration.
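For illustration, here is a minimal Python sketch of that loop (illustrative only, not the logmine tool's actual code; the file handling and names are assumptions):

def review(log_lines, stoplist_path, newlist_path):
    # Load the patterns seen on previous runs; start empty on the first run.
    try:
        with open(stoplist_path) as f:
            stoplist = set(line.rstrip('\n') for line in f)
    except IOError:
        stoplist = set()
    # Anything not in the stoplist is new: report it and remember it.
    newlist = []
    for line in log_lines:
        if line not in stoplist:
            newlist.append(line)
            stoplist.add(line)
    # Write the newlist for inspection and grow the stoplist for next time.
    with open(newlist_path, 'w') as f:
        for line in newlist:
            f.write(line + '\n')
    with open(stoplist_path, 'a') as f:
        for line in newlist:
            f.write(line + '\n')
    return newlist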
Along the way some log entries may vary in details but not critically. The logmine tool allows these entries to be grouped together through a "filterfile". For example, a call to a method FindEvents() may return a different user name or number of messages found, but we are mainly interested in the number of FindEvents calls. The filter file can grow over time, like the stoplist. It is not updated automatically like the stoplist, but must be modified manually.
- USAGE
Use a virtual environment and pip to install the required modules:
virtualenv env
source env/bin/activate
pip install -r requirements.txt
sample run args:
python -m logmine.LogMine --query --parse --process --log_filter 'resource.type=dataflow_step resource.labels.job_id=DATAFLOW_JOB_NAME severity>=WARNING' --output_dir ./logmine --source my-dataflow-job --limit 100 --range_in_minutes 10 --output_file my-dataflow-job.log --project my-gcp-project
- IDENTIFY PROJECT SOURCES
You'll want to distinguish between project sources so that you can keep a stoplist per source, since log entries vary considerably between projects.
For example, a source such as a GKE container or a Dataflow job would produce output files such as stoplist_my_container and stoplist_my_dataflow_job,
along with corresponding newlist_* files.
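For instance, the per-source paths might be derived from the --output_dir and --source arguments along these lines (a sketch mirroring the filterfile naming shown below, not necessarily the tool's exact code):

stoplist_path = '{}/stoplist_{}.txt'.format(args.output_dir, args.source)
newlist_path = '{}/newlist_{}.txt'.format(args.output_dir, args.source)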
- FILTER COMMON LOG ENTRIES
A filter file is a text file containing a list of strings from commonly found log entries, used to condense them into a common set where individual differences don't matter. For example, a container log file could contain many entries like this:
'Completed call to FindEvents(elapsedMs=98 receivedMessageCount=1 sentMessageCount=1 statusCode=OK)'
where the message count and elapsed time fields will differ, but we don't care about that. We want to count them all as if they were exactly the same, so the filter file would include a line like this:
Completed call to FindEvents
and any log line containing that substring will be treated as the same entry.
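A small Python sketch of how this collapsing might work (a hypothetical helper, not the tool's actual API):

def collapse(line, filters):
    # If the line contains any filter substring, use that substring as the
    # entry's key, so all matching lines are counted as the same entry.
    for f in filters:
        if f in line:
            return f
    # Unfiltered lines keep their full text.
    return line

With filters = ['Completed call to FindEvents'], every FindEvents line above collapses to a single entry regardless of its elapsed time or message counts.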
Filter files must be named according to the pattern:
'{}/filterfile_{}.txt'.format(self.args.output_dir, self.args.source)
for example:
/tmp/filterfile_my_container.txt
- DATAFLOW JOBS - A SPECIAL CASE
The Stackdriver client's log filter requires the job_id, but we only know the job name. The same job name acquires a new job_id on every launch, and Stackdriver logs don't intrinsically map one to the other by name. We can use the Dataflow API client to look up the job_id from the job name.
--log_filter 'resource.type=dataflow_step resource.labels.job_id=DATAFLOW_JOB_NAME severity>=WARNING'
When the tool sees the string "DATAFLOW_JOB_NAME" in the --log_filter argument, it will look up the job_id and replace that string with it, if one can be found. The actual Dataflow job name should be passed as the --source parameter:
--source my_dataflow_job
--log_filter 'resource.type=dataflow_step resource.labels.job_id=DATAFLOW_JOB_NAME severity>=WARNING'
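A sketch of how that lookup could be done with the google-api-python-client library and the Dataflow v1b3 jobs.list call (an assumption about the approach, not necessarily the tool's exact implementation):

from googleapiclient.discovery import build

def find_dataflow_job_id(project, job_name):
    # List the project's Dataflow jobs and return the id of the one whose
    # name matches; returns None if no such job is found.
    dataflow = build('dataflow', 'v1b3')
    response = dataflow.projects().jobs().list(projectId=project).execute()
    for job in response.get('jobs', []):
        if job.get('name') == job_name:
            return job.get('id')
    return None

The resolved job_id can then be substituted for the DATAFLOW_JOB_NAME placeholder in the --log_filter string before the Stackdriver query is issued.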