Description
Add Framework-Agnostic Connector Library for Stream Processors
Summary
Create a reusable, framework-agnostic connector library for Apache Iggy that provides core abstractions for integrating with stream processing engines (Apache Flink, Apache Spark, Apache Beam, etc.). This library will serve as the foundation for the upcoming Flink connector implementation while enabling future integrations with other stream processors.
Motivation
Currently, Iggy has connectors implemented as "sources" and "sinks" in specific language ecosystems. As we expand support to stream processing frameworks like Apache Flink and Apache Spark, we need:
- Reusability: Common abstractions that can be shared across multiple stream processor integrations (estimated 85-95% code reuse)
- Consistency: Unified configuration and error handling across all stream processor connectors
- Maintainability: Single source of truth for core connector logic, reducing duplication
- Future-proofing: Clean separation between framework-agnostic and framework-specific code
This issue introduces "external processors" as a third category of connectors, distinct from sources and sinks, specifically for stream processing engine integrations.
Proposed Solution
Directory Structure
foreign/java/external-processors/iggy-connector-flink/
├── iggy-connector-library/ # Core reusable library
│ ├── src/main/java/org/apache/iggy/connector/
│ │ ├── config/ # Configuration classes
│ │ ├── error/ # Exception hierarchy
│ │ ├── partition/ # Partition metadata
│ │ └── serialization/ # SerDe interfaces
│ └── src/test/java/ # Comprehensive unit tests
└── iggy-flink-examples/ # Example Flink jobs (future work)
Package Design
Framework-Agnostic Packages (org.apache.iggy.connector.*):
- No dependencies on Flink, Spark, or any specific framework
- Pure Java abstractions with well-defined interfaces
- Serializable for distributed execution
- Documented for external reuse
Future Framework-Specific Packages (org.apache.iggy.connector.flink.*):
- Adapters wrapping common abstractions
- Framework-specific implementations (Source/Sink APIs, checkpointing, etc.)
- Minimal framework-specific logic
Scope of Work
Phase 1: Core Abstractions (This Issue)
Implement the following framework-agnostic classes:
Configuration (org.apache.iggy.connector.config)
- IggyConnectionConfig - Connection settings
  - Server address, credentials, timeouts
  - Retry configuration (maxRetries, retryBackoff)
  - TLS settings
  - Builder pattern with sensible defaults
- OffsetConfig - Offset management
  - Reset strategies: EARLIEST, LATEST, NONE
  - Auto-commit configuration
  - Commit interval and batch size
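To make the configuration items above concrete, here is a minimal sketch of how the two classes could look. Only the class names, `maxRetries`, `retryBackoff`, and the three reset strategies come from this issue; every other field name, accessor, default value, and the `OffsetResetStrategy` enum name are assumptions for illustration, not the final API.

```java
import java.io.Serializable;
import java.time.Duration;
import java.util.Objects;

// Reset strategies listed in this issue; the enum name is an assumption.
enum OffsetResetStrategy { EARLIEST, LATEST, NONE }

// Hypothetical shape: accessor names and default values are assumptions.
final class IggyConnectionConfig implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String serverAddress;
    private final String username;
    private final String password;
    private final Duration requestTimeout;
    private final int maxRetries;
    private final Duration retryBackoff;
    private final boolean tlsEnabled;

    private IggyConnectionConfig(Builder b) {
        // The server address is the only setting without a default.
        this.serverAddress = Objects.requireNonNull(b.serverAddress, "serverAddress is required");
        this.username = b.username;
        this.password = b.password;
        this.requestTimeout = b.requestTimeout;
        this.maxRetries = b.maxRetries;
        this.retryBackoff = b.retryBackoff;
        this.tlsEnabled = b.tlsEnabled;
    }

    static Builder builder() { return new Builder(); }

    String serverAddress() { return serverAddress; }
    String username() { return username; }
    String password() { return password; }
    Duration requestTimeout() { return requestTimeout; }
    int maxRetries() { return maxRetries; }
    Duration retryBackoff() { return retryBackoff; }
    boolean tlsEnabled() { return tlsEnabled; }

    static final class Builder {
        private String serverAddress;
        private String username;
        private String password;
        // Assumed defaults, chosen only for this sketch.
        private Duration requestTimeout = Duration.ofSeconds(30);
        private int maxRetries = 3;
        private Duration retryBackoff = Duration.ofMillis(100);
        private boolean tlsEnabled = false;

        Builder serverAddress(String v) { this.serverAddress = v; return this; }
        Builder credentials(String user, String pass) { this.username = user; this.password = pass; return this; }
        Builder requestTimeout(Duration v) { this.requestTimeout = Objects.requireNonNull(v); return this; }
        Builder maxRetries(int v) { this.maxRetries = v; return this; }
        Builder retryBackoff(Duration v) { this.retryBackoff = Objects.requireNonNull(v); return this; }
        Builder tlsEnabled(boolean v) { this.tlsEnabled = v; return this; }

        IggyConnectionConfig build() {
            if (maxRetries < 0) {
                throw new IllegalArgumentException("maxRetries must be >= 0");
            }
            return new IggyConnectionConfig(this);
        }
    }
}

// Hypothetical offset settings as an immutable record (requires Java 16+).
record OffsetConfig(OffsetResetStrategy resetStrategy,
                    boolean autoCommit,
                    Duration commitInterval,
                    int commitBatchSize) implements Serializable {
    OffsetConfig {
        Objects.requireNonNull(resetStrategy, "resetStrategy is required");
        Objects.requireNonNull(commitInterval, "commitInterval is required");
        if (commitBatchSize <= 0) {
            throw new IllegalArgumentException("commitBatchSize must be > 0");
        }
    }
}
```

A caller would write something like `IggyConnectionConfig.builder().serverAddress("localhost:8090").credentials("user", "secret").build()`, where the address and credentials are placeholders. Keeping both types immutable and `Serializable` matches the requirement that configuration can be shipped to distributed workers.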
Serialization (org.apache.iggy.connector.serialization)
- DeserializationSchema - Framework-agnostic deserialization
  - Independent of Flink's DeserializationSchema
  - Supports metadata access during deserialization
  - Type descriptor for framework mapping
- SerializationSchema - Framework-agnostic serialization
  - Optional partition key extraction
  - Nullable support
- RecordMetadata - Message metadata
  - Stream, topic, partition, offset information
  - Immutable value object
- TypeDescriptor - Type information abstraction
  - Maps to Flink's TypeInformation or Spark's Encoder
  - Supports generic types
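The four serialization types above could fit together roughly as follows. This is a sketch under assumptions: the type names come from this issue, but every method signature, the `StringSchema` example class, and the simplified `TypeDescriptor` (which carries only a raw `Class` and does not yet model generic types) are hypothetical.

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Immutable metadata for one message (record requires Java 16+).
record RecordMetadata(String stream, String topic, int partition, long offset)
        implements Serializable {}

// Simplified type descriptor: a framework adapter would map this to
// Flink's TypeInformation or Spark's Encoder. Generic types are omitted here.
interface TypeDescriptor<T> extends Serializable {
    Class<T> typeClass();

    static <T> TypeDescriptor<T> of(Class<T> cls) {
        return () -> cls;
    }
}

// Assumed signature: payload bytes plus metadata in, typed value out.
interface DeserializationSchema<T> extends Serializable {
    T deserialize(byte[] payload, RecordMetadata metadata);

    TypeDescriptor<T> producedType();
}

// Assumed signature: the partition key is optional and absent by default.
interface SerializationSchema<T> extends Serializable {
    byte[] serialize(T element);

    default Optional<byte[]> partitionKey(T element) {
        return Optional.empty();
    }
}

// Hypothetical UTF-8 string schema implementing both directions.
final class StringSchema implements DeserializationSchema<String>, SerializationSchema<String> {
    private static final long serialVersionUID = 1L;

    @Override
    public String deserialize(byte[] payload, RecordMetadata metadata) {
        // Null payloads pass through as null ("nullable support").
        return payload == null ? null : new String(payload, StandardCharsets.UTF_8);
    }

    @Override
    public TypeDescriptor<String> producedType() {
        return TypeDescriptor.of(String.class);
    }

    @Override
    public byte[] serialize(String element) {
        return element == null ? null : element.getBytes(StandardCharsets.UTF_8);
    }
}
```

None of these types import a stream processing framework, so a Flink or Spark adapter can wrap them without the core library gaining a dependency on either.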
Error Handling (org.apache.iggy.connector.error)
- ConnectorException - Base exception hierarchy
  - Error codes: CONNECTION_FAILED, AUTHENTICATION_FAILED, RESOURCE_NOT_FOUND, SERIALIZATION_ERROR, OFFSET_ERROR, PARTITION_ERROR, CONFIGURATION_ERROR, TIMEOUT, RESOURCE_EXHAUSTED
  - Retryability flag for transient vs permanent failures
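One possible shape for the exception type is sketched below. The error code names are taken from this issue; which codes count as retryable, the `ErrorCode` enum name, the accessor names, and the `Retries` helper are all assumptions made for illustration.

```java
import java.util.function.Supplier;

// Error codes from this issue. The retryable defaults are assumptions:
// connection loss, timeouts, and resource exhaustion are treated as transient.
enum ErrorCode {
    CONNECTION_FAILED(true),
    AUTHENTICATION_FAILED(false),
    RESOURCE_NOT_FOUND(false),
    SERIALIZATION_ERROR(false),
    OFFSET_ERROR(false),
    PARTITION_ERROR(false),
    CONFIGURATION_ERROR(false),
    TIMEOUT(true),
    RESOURCE_EXHAUSTED(true);

    private final boolean retryableByDefault;

    ErrorCode(boolean retryableByDefault) {
        this.retryableByDefault = retryableByDefault;
    }

    boolean isRetryableByDefault() {
        return retryableByDefault;
    }
}

// Base exception carrying an error code and a retryability flag.
class ConnectorException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    private final ErrorCode errorCode;
    private final boolean retryable;

    ConnectorException(ErrorCode errorCode, String message) {
        this(errorCode, message, null);
    }

    ConnectorException(ErrorCode errorCode, String message, Throwable cause) {
        super(message, cause);
        this.errorCode = errorCode;
        this.retryable = errorCode.isRetryableByDefault();
    }

    ErrorCode errorCode() { return errorCode; }

    boolean isRetryable() { return retryable; }
}

// Hypothetical helper showing how a caller might use the flag.
// Backoff between attempts is left out to keep the sketch short.
final class Retries {
    private Retries() {}

    static <T> T call(Supplier<T> action, int maxRetries) {
        int attempt = 0;
        while (true) {
            try {
                return action.get();
            } catch (ConnectorException e) {
                // Permanent failures and exhausted budgets are rethrown unchanged.
                if (!e.isRetryable() || attempt >= maxRetries) {
                    throw e;
                }
                attempt++;
            }
        }
    }
}
```

Putting the flag on the exception lets framework adapters decide between retrying locally and failing the task without inspecting individual error codes.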
Partition Management (org.apache.iggy.connector.partition)
- PartitionInfo - Partition metadata
  - Stream, topic, partition identifiers
  - Current and end offsets
  - Available messages calculation
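As an illustration of the available-messages calculation, the sketch below assumes that `currentOffset` is the next offset to read and `endOffset` is exclusive (one past the last stored message). The issue does not define these semantics, so both that convention and the field names are assumptions.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical partition metadata as an immutable record (requires Java 16+).
record PartitionInfo(String stream,
                     String topic,
                     int partitionId,
                     long currentOffset,
                     long endOffset) implements Serializable {
    PartitionInfo {
        Objects.requireNonNull(stream, "stream is required");
        Objects.requireNonNull(topic, "topic is required");
        if (currentOffset < 0 || endOffset < currentOffset) {
            throw new IllegalArgumentException(
                    "offsets must satisfy 0 <= currentOffset <= endOffset");
        }
    }

    // Messages not yet consumed, under the exclusive end offset assumption.
    long availableMessages() {
        return endOffset - currentOffset;
    }
}
```

If the final design uses an inclusive end offset instead, the calculation would need a `+ 1` and separate handling for an empty partition.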
Phase 2: Flink Integration (Future Issue)
- Implement org.apache.iggy.connector.flink.* packages
- Flink Source API v2 implementation (SplitEnumerator, SourceReader)
- Flink Sink API v2 implementation
- Checkpointing and state management
- Integration tests with Testcontainers
Phase 3: Examples and Documentation (Future Issue)
- Example Flink jobs demonstrating connector usage
- Docker Compose setup for testing (Flink + Iggy)
- Comprehensive README and usage documentation