Recently, I was involved in a McAfee Security Information and Event
Management (SIEM) 11 deployment. SIEM 11 has a new requirement that
TCP port 9092 be open for Kafka on the McAfee Event Receiver,
McAfee Advanced Correlation Engine (ACE), McAfee Application Data
Monitor, and the Enterprise Security Manager (ESM) Data Streaming Bus.
The Kafka data bus concept is new to some security engineers. I want to use
this week's blog to help demystify Apache Kafka: how it compares with
traditional messaging systems, request/response-style applications,
relational databases, and Hadoop-style big data systems, and what
value a Security Information and Event Management system can gain by
integrating with a Kafka data bus.
Apache Kafka is used by thousands of organizations and is one of
the fastest-growing open-source projects. It was born as LinkedIn's internal
infrastructure system, and its three original core developers went on to
found their own company, Confluent, to promote Kafka adoption across
industries: the Confluent Platform is an event streaming platform built by
the original creators of Apache Kafka.
1. How does Kafka treat Data?
In our legacy way of thinking, we treat data as static.
We build a database. We develop a data warehouse with Extract,
Transform, and Load (ETL) tools. To some degree, we envision data as
having a static character: we move "static" data with request/response
protocols such as File Transfer Protocol (FTP), or with a
Python script that automates the data movement from A to B.
Kafka treats data as dynamic, like water flowing in a river, so Kafka
focuses on continuous streams of data: what Kafka calls a stream bus or
data bus.
There are two roles along the Kafka data bus: one
is the Producer, the other is the Consumer. In other words, legacy data
management follows a hub-and-spoke or brokerage model, like the
traditional car-dealer business. Kafka is more like Tesla's direct-sales
model: straight from car producer to car consumer.
“Our observation was really simple: there were lots of databases and
other systems built to store data, but what was missing in our architecture
was something that would help us to handle the continuous flow of
data.” - Jay Kreps, Cofounder and CEO at Confluent [1]
Kafka is a data streaming platform that lets
us publish and subscribe to streams of data, store those streams, and
process them.
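To make the publish/subscribe model concrete, here is a minimal in-memory sketch in Python. It is a toy, not a real Kafka client: each topic is an append-only log, producers publish records to it, and every consumer reads at its own offset, so independent consumers each see the full stream at their own pace.

```python
from collections import defaultdict

class ToyDataBus:
    """In-memory sketch of a Kafka-style data bus: topics are append-only logs."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic name -> ordered log of records
        self.offsets = defaultdict(int)   # (consumer, topic) -> next read position

    def publish(self, topic, record):
        """Producer side: append a record to the topic's log."""
        self.topics[topic].append(record)

    def consume(self, consumer, topic):
        """Consumer side: return records after this consumer's offset."""
        log = self.topics[topic]
        start = self.offsets[(consumer, topic)]
        self.offsets[(consumer, topic)] = len(log)
        return log[start:]

bus = ToyDataBus()
bus.publish("events", {"id": 1})
bus.publish("events", {"id": 2})

# Two independent consumers each see the full stream, at their own pace.
print(bus.consume("dashboard", "events"))  # the first two records
bus.publish("events", {"id": 3})
print(bus.consume("dashboard", "events"))  # only the new record
print(bus.consume("archiver", "events"))   # all three records, from offset 0
```

Note how the log is never "consumed away": the bus stores every record, and reading merely advances a per-consumer offset. That is the essential difference from a traditional message queue, where a delivered message is gone.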
Kafka combines elements of three kinds of systems: enterprise
messaging systems (such as ActiveMQ or IBM's MQSeries), big data
systems (like a real-time version of Hadoop), and data
integration or ETL tools. So Kafka is a new kind of animal, like a
dragon: a combination of elements from different creatures.
2. Top 5 Characteristics of Apache Kafka
1. Kafka is a modern distributed system that runs as a cluster and can
scale to handle all the applications in even the most massive of
companies.
2. There is no need for individual messaging brokers (like car dealers)
for different applications.
3. It is a central platform that scales elastically to handle all the streams of data.
4. It is a true storage system, built to store data for as long as you might like.
5. Kafka is a connecting layer with real delivery guarantees: data is
replicated and persistent.
Image Credit: Kafka: The Definitive Guide
Kafka is applicable for core applications that directly power a business:
"React to real-time events to directly power the operation of the business
and feed back into customer experiences." - Jay Kreps, Cofounder and CEO
at Confluent [1]
3. Five Business Use Cases
1. A next-generation Security Information and Event Management (SIEM) system with
built-in Kafka could run in a clustered environment, with scaling and
replication, to serve as the central data backbone for a large cybersecurity
ecosystem, sharing raw, parsed, or correlated events via a
Kafka data-bus architecture. The SIEM's data sources on the Receivers can be
configured to publish different topics onto the Kafka streams data bus. This new
SIEM integration architecture will improve security operations efficiency
and effectiveness. For example, a Security Orchestration, Automation and Response
(SOAR) product can act as a consumer that leverages the correlated events
published by the producer (the SIEM) via the Kafka data bus.
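To illustrate that producer/consumer split, here is a small self-contained Python sketch. The topic name, event fields, and severity threshold below are illustrative assumptions, not McAfee's actual schema: the SIEM acts as the producer, publishing correlated events to a topic, and a SOAR process acts as a consumer, picking out high-severity events to respond to. The topic is modeled as a plain in-memory list; a real deployment would use a Kafka client library against the broker on TCP port 9092.

```python
# Toy illustration of the SIEM (producer) -> Kafka topic -> SOAR (consumer) flow.
correlated_events = []  # stands in for a "correlated-events" Kafka topic

def siem_publish(event):
    """Producer side: the SIEM appends a correlated event to the topic."""
    correlated_events.append(event)

def soar_consume(threshold=7):
    """Consumer side: the SOAR reads the topic and returns events to act on."""
    return [e for e in correlated_events if e["severity"] >= threshold]

# The SIEM correlation engine emits events of varying severity.
siem_publish({"rule": "brute-force-login", "severity": 9})
siem_publish({"rule": "port-scan", "severity": 4})
siem_publish({"rule": "malware-beacon", "severity": 8})

# The SOAR, as a consumer, acts only on high-severity correlated events.
for event in soar_consume():
    print(f"SOAR playbook triggered for: {event['rule']}")
```

Because the topic retains every event, other consumers (a dashboard, an archiver, a threat-intelligence feed) could read the same stream independently without the SIEM knowing about them: that decoupling is the point of the data-bus architecture.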
2. Powering real-time applications and data flows behind the scenes of a
social network like LinkedIn. [1]
3. Big retailers are re-working fundamental business processes around
continuous data streams. [1]
4. Car companies are collecting and processing real-time data streams
from internet-connected cars. [1]
5. Banks are re-working fundamental business processes around
continuous data streams. [1] Please find the following testimonial from
RBC, my favorite bank, where I worked for more than 10 years.
(Image: testimonial from RBC)
In summary, just as cash flow is critical in the financial industry and traffic flow
is important in the transportation industry, data flow is paramount in
the information technology industry, especially in the age of big data, IoT,
artificial intelligence, and machine learning, where real-time data feeds
train the algorithms. Kafka is the new-generation platform that brings all the
streams of data together across all of these use cases.
Disclaimer: The opinions expressed in this article are my own and do not
reflect the views of my employer.
Reference:
1. Kafka: The Definitive Guide: Real-Time Data and Stream Processing at
Scale. O'Reilly Media.