Versions - Hadoop 3.3.1, Kafka_2.13-3.0.0, Python 3.8.10
Kafka Commands
Start services required for Kafka
sh bin/zookeeper-server-start.sh config/zookeeper.properties
sh bin/kafka-server-start.sh config/server.properties
Create Topic
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic testTopic
List existing Topics
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
Start Producer
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic testTopic
Start Consumer
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testTopic --from-beginning
Using Kafka with Python (kafka-python library)
Sending message as a producer using python
Receiving message as a consumer using python
Multi-Broker Setup in Kafka
Update the server.properties file
Make two copies of the server.properties file and rename them to server1.properties and server2.properties
Changes in server1.properties
broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs1
Changes in server2.properties
broker.id=2
listeners=PLAINTEXT://:9094
log.dirs=/tmp/kafka-logs2
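The edits above can be scripted; a sketch using cp and GNU sed, run from the Kafka installation directory and assuming the stock config/server.properties (where the listeners line ships commented out):

```shell
# Copy the base broker config twice
cp config/server.properties config/server1.properties
cp config/server.properties config/server2.properties

# Apply the three changes to each copy (handles a commented-out listeners line)
sed -i 's|^broker.id=.*|broker.id=1|; s|^#\?listeners=.*|listeners=PLAINTEXT://:9093|; s|^log.dirs=.*|log.dirs=/tmp/kafka-logs1|' config/server1.properties
sed -i 's|^broker.id=.*|broker.id=2|; s|^#\?listeners=.*|listeners=PLAINTEXT://:9094|; s|^log.dirs=.*|log.dirs=/tmp/kafka-logs2|' config/server2.properties
```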
Start zookeeper and kafka servers
sh bin/zookeeper-server-start.sh config/zookeeper.properties
sh bin/kafka-server-start.sh config/server.properties
sh bin/kafka-server-start.sh config/server1.properties
sh bin/kafka-server-start.sh config/server2.properties
Create Topic (use any one of the port numbers)
bin/kafka-topics.sh --create --bootstrap-server localhost:9093 --replication-factor 1 --partitions 1 --topic testTopic
bin/kafka-topics.sh --create --bootstrap-server localhost:9094 --replication-factor 1 --partitions 1 --topic testTopic2
Alter Topic (use any one of the port numbers)
bin/kafka-topics.sh --bootstrap-server localhost:9093 --alter --topic testTopic --partitions 3
List existing Topics (use any one of the port numbers)
bin/kafka-topics.sh --list --bootstrap-server localhost:9093
Start Producer
bin/kafka-console-producer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --topic testTopic
Start Consumer
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testTopic --from-beginning
Kafka output to HDFS
Producer
import csv

import requests
from json import dumps
from kafka import KafkaProducer

# replace the "demo" apikey below with your own key from
# https://www.alphavantage.co/support/#api-key
CSV_URL = ('https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED'
           '&symbol=IBM&interval=15min&slice=year1month1&apikey=demo')

# Download the intraday CSV and parse it into a list of rows
with requests.Session() as s:
    download = s.get(CSV_URL)
    decoded_content = download.content.decode('utf-8')
    cr = csv.reader(decoded_content.splitlines(), delimiter=',')
    my_list = list(cr)

producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                         value_serializer=lambda k: dumps(k).encode('utf-8'))
for row in my_list:
    producer.send('testTopic', row)
producer.flush()  # make sure all rows are delivered before the script exits
Consumer
from kafka import KafkaConsumer
import pydoop.hdfs as hdfs

consumer = KafkaConsumer('testTopic', bootstrap_servers=['localhost:9092'])
hdfs_path = 'hdfs://localhost:9000/StockDatapydoop/stock_file.txt'
for message in consumer:
    values = message.value.decode('utf-8')
    with hdfs.open(hdfs_path, 'at') as f:
        print(message.value)
        f.write(f"{values}\n")
Kafka TWITTER tweets to HDFS
Producer
import json
from json import dumps

import tweepy
from kafka import KafkaProducer
from textblob import TextBlob

# Load the Twitter API credentials from config.json
with open('config.json') as json_file:
    data = json.load(json_file)
CONSUMER_KEY = data['CONSUMER_KEY']
CONSUMER_SECRET = data['CONSUMER_SECRET']
ACCESS_TOKEN = data['ACCESS_TOKEN']
ACCESS_TOKEN_SECRET = data['ACCESS_TOKEN_SECRET']

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                         value_serializer=lambda k: dumps(k).encode('utf-8'))

# Fetch 100 recent tweets matching "crypto" and publish their full text
cursor = tweepy.Cursor(api.search_tweets, q="crypto", tweet_mode='extended').items(100)
for tweet in cursor:
    producer.send('testTopic', tweet.full_text)
producer.flush()  # make sure all tweets are delivered before the script exits
Consumer
from kafka import KafkaConsumer
import pydoop.hdfs as hdfs

consumer = KafkaConsumer('testTopic', bootstrap_servers=['localhost:9092'])
hdfs_path = 'hdfs://localhost:9000/StockDatapydoop/stock_file.txt'
for message in consumer:
    values = message.value.decode('utf-8')
    with hdfs.open(hdfs_path, 'at') as f:
        print(message.value)
        f.write(f"{values}\n")