@@ -66,11 +66,6 @@ The Distribution-Aware Encoder is an advanced preprocessing layer that automatic
6666 - Handled via rate parameter estimation
6767 - Detection: Integer values and variance≈mean
6868
69- 13. **Weibull Distribution**
70- - For lifetime/failure data
71- - Handled via Weibull CDF
72- - Detection: Shape and scale analysis
73-
7469 14. **Cauchy Distribution**
7570 - For extremely heavy-tailed data
7671 - Handled via robust location-scale estimation
@@ -81,29 +76,61 @@ The Distribution-Aware Encoder is an advanced preprocessing layer that automatic
8176 - Handled via mixture model approach
8277 - Detection: Zero proportion analysis
8378
84- 16. **Bounded Distribution**
85- - For data with known bounds
86- - Handled via scaled beta transformation
87- - Detection: Value range analysis
88-
89- 17. **Ordinal Distribution**
90- - For ordered categorical data
91- - Handled via learned mapping
92- - Detection: Discrete ordered values
93-
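The detection rules listed above are stated only briefly, so here is a minimal sketch of how two of them ("integer values and variance≈mean" for Poisson data, and "zero proportion analysis" for zero-inflated data) might be expressed. This is not the encoder's actual implementation: the function names `looks_poisson` and `looks_zero_inflated` and the thresholds `rel_tol` and `zero_threshold` are hypothetical, chosen purely for illustration.

```python
import numpy as np


def looks_poisson(values, rel_tol=0.2):
    """Heuristic check: non-negative integers whose variance is close to the mean.

    `rel_tol` is an assumed tolerance, not a value taken from the library.
    """
    x = np.asarray(values, dtype=float)
    if x.size == 0 or np.any(x < 0) or not np.all(x == np.round(x)):
        return False
    mean, var = x.mean(), x.var()
    if mean == 0:
        return False
    # For a Poisson variable the variance equals the mean.
    return bool(abs(var - mean) <= rel_tol * mean)


def looks_zero_inflated(values, zero_threshold=0.5):
    """Heuristic check: the share of exact zeros reaches an assumed threshold."""
    x = np.asarray(values, dtype=float)
    if x.size == 0:
        return False
    return bool(np.mean(x == 0) >= zero_threshold)


rng = np.random.default_rng(0)
counts = rng.poisson(lam=4.0, size=10_000)
print(looks_poisson(counts))
print(looks_zero_inflated([0, 0, 0, 1, 2]))
```

Checks like these are cheap to run per feature, which is one reason a moment-based rule might be preferred over a full goodness-of-fit test at detection time.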
9479## Usage
9580
9681### Basic Usage
82+
83+ Distribution-aware encoding works only with numerical features.
84+
9785``` python
9886from kdp.processor import PreprocessingModel
99-
100- preprocessor = PreprocessingModel(
101- features_stats = stats,
102- features_specs = specs,
87+ from kdp.features import NumericalFeature
88+
89+ # Define features
90+ features = {
91+     # Numerical features
92+     "feature1": NumericalFeature(),
93+     "feature2": NumericalFeature(),
94+     # etc.
95+ }
96+
97+ # Initialize the model
98+ model = PreprocessingModel(
99+     features=features,
103100     use_distribution_aware=True
104101 )
105102```
106103
104+ ### Manual Usage
105+
106+ ``` python
107+ from kdp.processor import PreprocessingModel
108+ from kdp.features import FeatureType, NumericalFeature
109+
110+ # Define features
111+ features = {
112+     # Numerical features
114+     "feature1": NumericalFeature(
115+         name="feature1",
116+         feature_type=FeatureType.FLOAT_NORMALIZED,
117+     ),
118+     "feature2": NumericalFeature(
119+         name="feature2",
120+         feature_type=FeatureType.FLOAT_RESCALED,
121+         prefered_distribution="log_normal",  # optionally specify a preferred distribution (normal, periodic, etc.)
122+     ),
123+     # etc.
124+ }
125+
126+ # Initialize the model
127+ model = PreprocessingModel(
128+     features=features,
129+     use_distribution_aware=True,
130+     distribution_aware_bins=1000,  # 1000 is the default; change it for finer-grained data
131+ )
132+ ```
133+
107134### Advanced Configuration
108135``` python
109136encoder = DistributionAwareEncoder(
@@ -123,7 +150,7 @@ encoder = DistributionAwareEncoder(
123150| -----------| ------| ---------| -------------|
124151| num_bins | int | 1000 | Number of bins for quantile encoding |
125152| epsilon | float | 1e-6 | Small value for numerical stability |
126- | detect_periodicity | bool | True | Enable periodic pattern detection |
153+ | detect_periodicity | bool | True | Enable periodic pattern detection; remove this parameter for multimodal distributions |
127154| handle_sparsity | bool | True | Enable special handling for sparse data |
128155| adaptive_binning | bool | True | Enable adaptive bin boundaries |
129156| mixture_components | int | 3 | Number of components for mixture models |
@@ -272,40 +299,6 @@ The DistributionAwareEncoder is integrated into the numeric feature processing p
272299 - Enable caching for repeated processing
273300 - Adjust mixture components based on data
274301
275- ## Example Use Cases
276-
277- ### 1. Financial Data
278- ``` python
279- # Handle heavy-tailed return distributions
280- preprocessor = PreprocessingModel(
281- use_distribution_aware = True ,
282- handle_sparsity = False ,
283- mixture_components = 2
284- )
285- ```
286-
287- ### 2. Temporal Data
288- ``` python
289- # Handle periodic patterns
290- preprocessor = PreprocessingModel(
291- use_distribution_aware = True ,
292- detect_periodicity = True ,
293- adaptive_binning = True
294- )
295- ```
296-
297- ### 3. Sparse Features
298- ``` python
299- # Handle sparse categorical data
300- preprocessor = PreprocessingModel(
301- use_distribution_aware = True ,
302- handle_sparsity = True ,
303- mixture_components = 1
304- )
305- ```
306-
307- ## Monitoring and Debugging
308-
309302### Distribution Detection
310303``` python
311304# Access distribution information