Thanks to visit codestin.com
Credit goes to github.com

Skip to content

[Processor]: Implement Processor Infrastructure/Libraries (for flink to start with) #2276

@chiradip

Description

@chiradip

Add Framework-Agnostic Connector Library for Stream Processors

Summary

Create a reusable, framework-agnostic connector library for Apache Iggy that provides core abstractions for integrating with stream processing engines (Apache Flink, Apache Spark, Apache Beam, etc.). This library will serve as the foundation for the upcoming Flink connector implementation while enabling future integrations with other stream processors.

Motivation

Currently, Iggy has connectors implemented as "sources" and "sinks" in specific language ecosystems. As we expand support to stream processing frameworks like Apache Flink and Apache Spark, we need:

  1. Reusability: Common abstractions that can be shared across multiple stream processor integrations (estimated 85-95% code reuse)
  2. Consistency: Unified configuration and error handling across all stream processor connectors
  3. Maintainability: Single source of truth for core connector logic, reducing duplication
  4. Future-proofing: Clean separation between framework-agnostic and framework-specific code

This issue introduces "external processors" as a third category of connectors, distinct from sources and sinks, specifically for stream processing engine integrations.

Proposed Solution

Directory Structure

foreign/java/external-processors/iggy-connector-flink/
├── iggy-connector-library/          # Core reusable library
│   ├── src/main/java/org/apache/iggy/connector/
│   │   ├── config/                  # Configuration classes
│   │   ├── error/                   # Exception hierarchy
│   │   ├── partition/               # Partition metadata
│   │   └── serialization/           # SerDe interfaces
│   └── src/test/java/               # Comprehensive unit tests
└── iggy-flink-examples/             # Example Flink jobs (future work)

Package Design

Framework-Agnostic Packages (org.apache.iggy.connector.*):

  • No dependencies on Flink, Spark, or any specific framework
  • Pure Java abstractions with well-defined interfaces
  • Serializable for distributed execution
  • Documented for external reuse

Future Framework-Specific Packages (org.apache.iggy.connector.flink.*):

  • Adapters wrapping common abstractions
  • Framework-specific implementations (Source/Sink APIs, checkpointing, etc.)
  • Minimal framework-specific logic

Scope of Work

Phase 1: Core Abstractions (This Issue)

Implement the following framework-agnostic classes:

Configuration (org.apache.iggy.connector.config)

  1. IggyConnectionConfig - Connection settings

    • Server address, credentials, timeouts
    • Retry configuration (maxRetries, retryBackoff)
    • TLS settings
    • Builder pattern with sensible defaults
  2. OffsetConfig - Offset management

    • Reset strategies: EARLIEST, LATEST, NONE
    • Auto-commit configuration
    • Commit interval and batch size

Serialization (org.apache.iggy.connector.serialization)

  1. DeserializationSchema - Framework-agnostic deserialization

    • Independent of Flink's SerializationSchema
    • Supports metadata access during deserialization
    • Type descriptor for framework mapping
  2. SerializationSchema - Framework-agnostic serialization

    • Optional partition key extraction
    • Nullable support
  3. RecordMetadata - Message metadata

    • Stream, topic, partition, offset information
    • Immutable value object
  4. TypeDescriptor - Type information abstraction

    • Maps to Flink's TypeInformation or Spark's Encoder
    • Supports generic types

Error Handling (org.apache.iggy.connector.error)

  1. ConnectorException - Base exception hierarchy
    • Error codes: CONNECTION_FAILED, AUTHENTICATION_FAILED, RESOURCE_NOT_FOUND, SERIALIZATION_ERROR, OFFSET_ERROR, PARTITION_ERROR, CONFIGURATION_ERROR, TIMEOUT, RESOURCE_EXHAUSTED
    • Retryability flag for transient vs permanent failures

Partition Management (org.apache.iggy.connector.partition)

  1. PartitionInfo - Partition metadata
    • Stream, topic, partition identifiers
    • Current and end offsets
    • Available messages calculation

Phase 2: Flink Integration (Future Issue)

  • Implement org.apache.iggy.connector.flink.* packages
  • Flink Source API v2 implementation (SplitEnumerator, SourceReader)
  • Flink Sink API v2 implementation
  • Checkpointing and state management
  • Integration tests with Testcontainers

Phase 3: Examples and Documentation (Future Issue)

  • Example Flink jobs demonstrating connector usage
  • Docker Compose setup for testing (Flink + Iggy)
  • Comprehensive README and usage documentation

Metadata

Metadata

Assignees

Labels

javaIssues related to Java SDK

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions