This Zeek package generates schemas for Zeek's logs. For every log your Zeek installation produces (such as conn.log or tls.log) the schema describes each log field including name, type, docstring, and more. The package supports popular schema formats and understands Zeek's log customization in detail. The schema export code is extensible, allowing you to produce your own schemas.
Install this package via zkg install logschema. The package has no dependencies and
currently works with Zeek 5.2 and newer.
To get a JSON Schema of each Zeek log in your installation, run:
$ zeek logschema/export/jsonschema

Your local directory now contains a JSON Schema file for each of Zeek's logs. For example, for your conn.log:
$ cat zeek-conn-log.schema.json | jq
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Schema for Zeek conn.log",
"description": "JSON Schema for Zeek conn.log",
"type": "object",
"properties": {
"ts": {
"description": "This is the time of the first packet.",
"type": "number",
...
},
...
}

To instead get a schema in CSV format, run this:
$ zeek logschema/export/csv

This combines all schema information in one file:
$ cat zeek-logschema.csv
log,field,type,record_type,script,is_optional,default,docstring,package
analyzer,ts,time,Analyzer::Logging::Info,base/frameworks/analyzer/logging.zeek,false,-,"Timestamp of confirmation or violation.",-
analyzer,cause,string,Analyzer::Logging::Info,base/frameworks/analyzer/logging.zeek,false,-,"What caused this log entry to be produced. This can\ncurrently be ""violation"" or ""confirmation"".",-
analyzer,analyzer_kind,string,Analyzer::Logging::Info,base/frameworks/analyzer/logging.zeek,false,-,"The kind of analyzer involved. Currently ""packet"", ""file""\nor ""protocol"".",-
...

Zeek features a powerful logging framework that manages Zeek's log streams, log writes, and their eventual output format. The format of Zeek's log entries is highly site-specific and depends on the configuration of log filters, enrichments that add additional fields to existing logs, new logs produced by add-on protocol parsers, etc.
Zeek does not automatically provide a description of what the resulting log data, after all of this customization, look like. This package closes this gap, allowing users to verify that their logs still look the same after an upgrade, that they're compatible with a given log ingester, etc.
The package does this by using reflection APIs at runtime. It scans registered
log streams to retrieve each log's underlying Zeek record type and study its
fields, and inspects a configurable log filter on each of those streams to
understand included/excluded fields, separator naming, field name mappings,
etc. For each schema format a registered exporter then translates the gathered
information into suitable output.
The package does nothing when loaded via @load packages or @load logschema.
Instead, you load the desired exporters, each of which resides in its own script
in logschema/export/<format>. Exports run at startup: in standalone Zeek this
means right after zeek_init() handlers have executed, and when running in a
cluster, it means once the cluster is up and running.
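For instance, to produce both JSON Schema and CSV output from the same run, load the corresponding exporter scripts (both paths ship with this package) in local.zeek or on the command line:

```zeek
# Each exporter runs independently at startup:
@load logschema/export/jsonschema
@load logschema/export/csv
```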
Many aspects of the export are customizable, and you can roll your own logic for when to run (and perhaps re-run) schema generation at runtime if desired.
For each log stream known to Zeek, the package determines for each of the log's fields:
- the name (such as uid or service),
- its type in the Zeek scripting language (such as string or count),
- the record type containing the field (such as Conn::Info or conn_id),
- whether the field is optional (*),
- the default value of the field, if any,
- the field's docstring,
- the Zeek script that defined the field (*),
- the package that added the field, if applicable (*).
(*) Only available when using Zeek 6 or newer.
The package then filters this information based on modifications applied by the log filter in effect, which can include/exclude fields, transform field names, add extension fields, etc.
At this point, each schema exporter decides how to use the resulting field metadata. Not all schema formats support all of this information -- for example, a schema language may have no concept of the Zeek package providing a log field.
@load logschema/export/jsonschema

This exporter provides JSON Schema files. By default
the exporter writes one schema file per log, named
zeek-{logname}-log.schema.json. Each log field becomes a property in the
schema. The schemas feature the type of each field when rendered in JSON, a
description (from Zeek's docstrings), default values, and whether a field is
required. They do not annotate or enforce formats (e.g. to convey that an
address string is formatted as an IP address), and they don't apply all
conceivable constraints. The schemas also don't prohibit
additionalProperties. In short, they're somewhat "loose".
Zeek knows more about its log schema than what JSON Schema's expressiveness can
capture naturally. For example, there's no immediate "vocabulary" in JSON Schema
to express that a log field has a certain Zeek type, or that a particular Zeek
package added it. To convey these properties, the package adds an x-zeek
annotation to each field's property. Schema validators and other JSON
Schema-centric applications safely ignore such annotations. The annotation
is an object including the Zeek type, the record type containing the field,
the script that defined the field, and the package that added the field, if
applicable. (See Schema information above for details.)
Each log's schema is self-contained.
Note that Zeek logs written in JSON format are technically JSONL documents, i.e., every log line is a JSON document. Keep this in mind when validating logs, since the validator might need nudging to accept this format.
$ cat zeek-conn-log.schema.json | jq
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Schema for Zeek conn.log",
"description": "JSON Schema for Zeek conn.log",
"type": "object",
"properties": {
"ts": {
"description": "This is the time of the first packet.",
"type": "number",
"examples": [
"1737691432.132607"
],
"x-zeek": {
"type": "time",
"record_type": "Conn::Info",
"script": "base/protocols/conn/main.zeek"
}
},
...
}

Redef Log::Schema::JSONSchema::filename to control the file output, see
below for details.
To omit the x-zeek annotation:
redef Log::Schema::JSONSchema::add_zeek_annotations = F;

By default, the schema includes various constraining keywords on the log fields, such as the fact that a valid port number ranges from 0 to 65535.
To omit these and similar constraints:
redef Log::Schema::JSONSchema::add_detailed_constraints = F;

By default, the schema also contains example values for types where the presentation may not be immediately clear:
"id.orig_h": {
"description": "The originator's IP address.",
"type": "string",
"examples": [
"192.168.0.1",
"fe80::208:74ff:feda:6210"
],
//...

To omit these and similar examples:
redef Log::Schema::JSONSchema::add_examples = F;

You can also adjust the example values, see the exporter source for details.
Using the Sourcemeta jsonschema CLI:
$ npm install --global @sourcemeta/jsonschema
$ zeek -r test.pcap LogAscii::use_json=T
$ zeek logschema/export/jsonschema

Now:
$ jsonschema validate zeek-conn-log.schema.json conn.log
$
$ # Pass! Now mismatch schema and log:
$ jsonschema validate zeek-conn-log.schema.json ssl.log
fail: /home/christian/t4/logs/ssl.log
error: Schema validation failure
The value was expected to be an object that defines properties "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p", "proto", "ts", and "uid"
at instance location ""
at evaluate path "/required"

@load logschema/export/csv

The CSV exporter renders the schema into comma-separated rows, with one row per
log field. By default it produces a file called zeek-logschema.csv. A header
line explaining each column is optional and included by default. The
line-oriented nature makes this format great for diffing.
For "complex" columns, such as default values or docstrings, the formatter
uses the JSON representation of the resulting strings. Per CSV convention,
it renders embedded double quotes (\") as "", but leaves escaped newlines
(\n) in place.
$ cat zeek-logschema.csv
log,field,type,record_type,script,is_optional,default,docstring,package
analyzer,ts,time,Analyzer::Logging::Info,base/frameworks/analyzer/logging.zeek,false,-,"Timestamp of confirmation or violation.",-
analyzer,cause,string,Analyzer::Logging::Info,base/frameworks/analyzer/logging.zeek,false,-,"What caused this log entry to be produced. This can\ncurrently be ""violation"" or ""confirmation"".",-
analyzer,analyzer_kind,string,Analyzer::Logging::Info,base/frameworks/analyzer/logging.zeek,false,-,"The kind of analyzer involved. Currently ""packet"", ""file""\nor ""protocol"".",-
...

Redef Log::Schema::CSV::filename to control the file output, see
below for details.
To disable the header line, use the following:
redef Log::Schema::CSV::add_header = F;

To change the separator from commas to another string:
redef Log::Schema::CSV::separator = ":";

The string used for unset &optional fields defaults to "-"; see the exporter source for the corresponding redef.

@load logschema/export/log

This export looks a lot like the CSV format, but produces a regular Zeek log
named logschema with the schema information (and yes, the log itself gets
reflected in the schema :-). This is a handy way to record and archive schema
information as part of your regular Zeek setup.
$ cat logschema.log
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path logschema
#open 2025-05-20-18-00-08
#fields log field _type record_type script is_optional _default docstring package
#types string string string string string bool string string string
analyzer ts time Analyzer::Logging::Info base/frameworks/analyzer/logging.zeek F - Timestamp of confirmation or violation. -
analyzer cause string Analyzer::Logging::Info base/frameworks/analyzer/logging.zeek F - What caused this log entry to be produced. This can\x0acurrently be "violation" or "confirmation". -
analyzer analyzer_kind string Analyzer::Logging::Info base/frameworks/analyzer/logging.zeek F - The kind of analyzer involved. Currently "packet", "file"\x0aor "protocol". -
...

@load logschema/export/json

This exporter essentially runs the package's internal log analysis state through
to_json() to produce the schema, and is just a handful of lines of code. While
simple, this naturally features all log information the schema analysis is aware
of.
By default, this writes a single output file called zeek-logschema.json. The
result is a JSON array of objects, each representing a log. Each object has three members:
- "name", the name of the log (such as "conn" for conn.log),
- "id", the Log::ID enum identifying the log stream in Zeek,
- "fields", an array of fields that each contain the JSON rendering of a Log::Schema::Field record.
The sequence of logs is sorted alphabetically by name, and the sequence of fields is in the order they're defined in the corresponding Zeek records.
When writing individual schema files per log, each file contains the JSON object for the respective log.
$ cat zeek-logschema.json | jq
[
{
"name": "analyzer",
"id": "Analyzer::Logging::LOG",
"fields": [
{
"name": "ts",
"type": "time",
"record_type": "Analyzer::Logging::Info",
"is_optional": false,
"docstring": "Timestamp of confirmation or violation.",
"script": "base/frameworks/analyzer/logging.zeek"
},
...

Redef Log::Schema::JSON::filename to control the file output, see
below for details.
By default, the package studies the default filter on each log stream. You can
adjust this by redef'ing Log::Schema::logfilter.
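For example, assuming this option holds the filter's name as a string (and assuming you have attached a filter named "my-filter" to your streams; the name here is hypothetical), you could point the analysis at it:

```zeek
# Hypothetical filter name; the package then inspects this filter
# on each log stream instead of the default one.
redef Log::Schema::logfilter = "my-filter";
```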
All exporters except the Zeek log one write their schemas to files. You can configure how they do this by adjusting a per-exporter filename pattern. This pattern supports keyword substitutions, as follows:
- {log}: the name of the log, such as "conn". This keyword also controls
  whether the exporter writes one file per log or a single file with all
  schemas: when the filename pattern features this keyword, it's
  one-file-per-log, otherwise a single file.
- {filter}: the log filter used for the export, such as "default".
- {pid}: the PID of the Zeek process, handy for disambiguating multiple runs.
- {version}: the Zeek version string, as produced by zeek_version().
- strftime() conversion characters, such as %Y-%m-%d, based on current_time().
Using "-" as filename will cause the schemas to be written to stdout.
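As a sketch combining several of these keywords (the patterns below are illustrative, not defaults):

```zeek
# One schema file per log ({log} switches the exporter to
# one-file-per-log mode), tagged with Zeek version and current date:
redef Log::Schema::JSONSchema::filename =
    "zeek-{version}-{log}-%Y-%m-%d.schema.json";

# Alternatively, write the CSV exporter's single combined schema to stdout:
redef Log::Schema::CSV::filename = "-";
```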
The package provides a hook to make arbitrary changes to the log metadata before the exporters produce schemas from it. Let's say you want to patch up the docstring of the conn.log's service field. With this in test.zeek ...
hook Log::Schema::adapt(logs: Log::Schema::LogsTable) {
logs[Conn::LOG]$fields["service"]$docstring = "My much better docstring";
}

... creating a JSON Schema yields:
$ zeek logschema/export/jsonschema ./test.zeek
$ cat zeek-conn-log.schema.json | jq '.properties["service"]'
{
"type": "string",
"description": "My much better docstring"
}

Consult the logschema package's Field record
for details on the available log field metadata.
For each field in the logs, logschema attempts to identify Zeek packages (as
installed with zkg) that contribute it. Since zkg does not add package
metadata to installed scripts, the logschema package uses a heuristic: it
derives this information from the filename path of the Zeek script that
contributes a field. Using this path, the package scans the prefixes in
Log::Schema::package_prefixes for matches. Once it finds one, the path
component following the prefix becomes the package name. The defaults
match Zeek's standard
installation of packages into the share/zeek/site/packages directory.
For example, say you have a Zeek package "foobar" whose main.zeek script adds a
field to conn.log. zkg will typically install the script in
share/zeek/site/packages/foobar/main.zeek. When starting up, Zeek learns that
site/packages/foobar/main.zeek is responsible for the log field. The
site/packages entry in Log::Schema::package_prefixes matches this path, and
logschema sets the field's Zeek package name to "foobar".
You can adjust this behavior in two ways. First, to recognize additional paths,
adjust the set of prefixes by redefining
Log::Schema::package_prefixes. Second, you can write arbitrary logic in a
Log::Schema::adapt hook handler to assign package names, perhaps as follows:
hook Log::Schema::adapt(logs: Log::Schema::LogsTable) {
for ( _, field in logs[MY::LOG]$fields ) {
field$package = "mypackage";
}
}

The path-based behavior is only available in Zeek 6 and newer.
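For example, to also recognize packages installed under an additional, nonstandard prefix (the path below is hypothetical, and this assumes Log::Schema::package_prefixes is a redef-able set of path strings):

```zeek
# Hypothetical extra installation prefix to scan for package names:
redef Log::Schema::package_prefixes += { "opt/zeek/site/packages" };
```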
Writing an exporter involves three steps:
- Create a record of type Log::Schema::Exporter with a name for your
  exporter and the needed function callbacks. The record features callbacks
  for every log the reflection processes ($process_log()), a finalization
  over all state prior to output ($finalize_schema()), a callback to write
  all information to a single file ($write_all_schemas()), a callback to
  write a single log's schema to a file ($write_single_schema()), and a
  custom output routine for when filenames don't apply ($custom_export()).
- Register this exporter with a call to Log::Schema::add_exporter(). This
  usually happens in a zeek_init() handler.
- Run the export. You can use the default logic, in which case you need to
  do nothing. To roll your own logic, redef Log::Schema::run_at_startup to
  F to disable built-in schema production, and call
  Log::Schema::run_export() where- and whenever you see fit.
Take a look at the exporters in this package to get you started.
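The following sketch outlines the registration step. The callback field names come from the list above, but their exact signatures are defined by the Log::Schema::Exporter record; consult the bundled exporters for the authoritative types. Everything below is illustrative:

```zeek
module MyFormat;

event zeek_init()
	{
	# Illustrative: construct an exporter record with a name, assign
	# only the callbacks your format needs ($process_log,
	# $write_all_schemas, etc.), then register it.
	local e = Log::Schema::Exporter($name = "myformat");

	# ... assign callback fields here ...

	Log::Schema::add_exporter(e);
	}
```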
Log streams nearly always get defined in zeek_init() event handlers. That's
why the package looks for registered log streams after those handlers have
run. However, script authors are free to create Zeek logs at any time and under
arbitrary conditions, so the package will not automatically see such logs. We
suggest the use of custom Log::Schema::run_export() invocations in that case.
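A minimal sketch of that approach (the triggering event is hypothetical; use whatever condition signals that your late log streams exist):

```zeek
# Disable the package's built-in export at startup:
redef Log::Schema::run_at_startup = F;

# Hypothetical trigger: once your code has created its log stream(s),
# run the export explicitly:
event MyModule::streams_ready()
	{
	Log::Schema::run_export();
	}
```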
A few Zeek logs use &default attributes for which this package produces
different output from run to run in schema formats that capture default values,
such as CSV. For example, the SMB logs have timestamps defaulting to current
network time, producing different timestamps whenever you generate the schemas.
The Log::Schema::show_defaults toggle, T by default, lets you suppress
defaults in generated schemas when you set it to F. This is the easiest way to
tame such fields, but affects all of them. You can also operate more surgically,
adjusting this and other troublesome output via the Log::Schema::adapt() hook
mentioned above:
hook Log::Schema::adapt(logs: Log::Schema::LogsTable) {
logs[SMB::FILES_LOG]$fields["ts"]$_default = 0.0;
logs[SMB::MAPPING_LOG]$fields["ts"]$_default = 0.0;
}

You can also suppress this particular churn by redef'ing
allow_network_time_forward=F, which will keep these timestamps at 0.0 when
producing the schema at startup. You likely don't want to use this approach when
running Zeek in production, since it affects Zeek's internal handling of time.