After months of inactivity for several reasons, I’m glad to say that the code base will soon be refreshed and a fully functional data collection and processing pipeline will be available.
More coming soon!!
Honu, or Chukwa-Streaming, is an agentless solution compatible with the Apache Chukwa project.
Like Chukwa, the goal is to collect a large volume of structured and unstructured logs, process them, and gain business insights.
Mailing-list: http://groups.google.com/group/honu-dev
Chukwa has a few constraints:
- An agent must be installed on every single machine
- The agent needs read access to a file in order to send it over to the collector
- It is a Java-only solution
For at least those reasons, I had to rewrite a large portion of Chukwa to meet my needs, but because of time constraints I couldn’t fix those issues and submit patches to Chukwa at the same time.
Honu has been in production and stable for 3 months now, here at Netflix, collecting over 135 million log events every day.
So it’s time to open it!
The goal is to make the production version of Honu running at Netflix available here on GitHub, so that others can take advantage of it and contribute back to Honu.
The main advantages of Honu over the standard Chukwa project are:
- Proven scalability: Honu is currently processing over 70 billion events/day at Netflix
- Agentless solution (chunks are sent directly to one or more remote collectors)
- Multi-language support for the collector (internally, Honu uses Thrift for transport and encoding)
- The Java client is fully implemented, with batching, queuing, and so on
- All other Thrift-supported languages can easily be added
- New Demux (a MapReduce job that automatically processes all the data)
- Dynamic & multiple output formats for the Mapper and/or Reducer
- Hive output format and Hive schemas are natively supported
- Structured log API
- key/value log helper
- Dynamic Hive table creation
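To illustrate the structured log idea behind the key/value log helper, here is a minimal, self-contained sketch. This is not Honu’s actual API; the class name `KeyValueLogSketch`, the method `toLogLine`, and the tab delimiter are all assumptions made for illustration. The point is that ordered key/value pairs can be serialized into a single flat line that a Demux job can later split back into named columns (e.g., for a Hive table).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of a key/value structured-log helper.
// It serializes ordered key/value pairs into one tab-separated
// log line of the form key=value<TAB>key=value...
public class KeyValueLogSketch {

    // Joins key=value pairs with a tab separator (an assumed delimiter,
    // not necessarily what Honu uses internally).
    static String toLogLine(Map<String, String> fields) {
        StringJoiner line = new StringJoiner("\t");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            line.add(e.getKey() + "=" + e.getValue());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order, so the column
        // order of the emitted line is deterministic.
        Map<String, String> event = new LinkedHashMap<>();
        event.put("app", "billing");
        event.put("level", "INFO");
        event.put("msg", "charge completed");
        System.out.println(toLogLine(event));
    }
}
```

Because every field carries its own key, a downstream MapReduce job can parse lines like this without a fixed schema, which is what makes dynamic output formats and dynamic Hive table creation tractable.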
- Milestone 3 goals (Done)
- HBase integration for near real time data access (Done)
- Generic forwarder
- (06/01/2012) Honu is running in production collecting over 70 billion events/day
- Milestone 2 goals (Done)
- Multiple writers on the collector side. This is required so that not everything has to be processed at the same time, enabling SLA-driven processing.
- 600 million log events/day
- Open source the agent-less solution + collector
- (06/28/2010) Honu is running in production collecting over 1 billion events/day
- (02/03/2011) Honu is running in production collecting over 12 billion events/day
- Milestone 1 goals (Done)
- Agentless streaming solution
- Compressed output
- 50 million log events/day on 4 EC2 small instances
- Usage:
- (03/01/2010) In production, collecting over 95 million log events a day.
- CPU usage is between 0.8% and 2%.
- Compression is done using LZO.
Jerome Boulon – (jboulon at apache.org)