Both layers offer additional parameters to fine-tune the embedding process. You can adjust dropout rates, batch normalization, and binning strategies to best suit your data. For more detailed information, please refer to the API documentation.
---
This document highlights the key differences and usage examples for the new advanced numerical embeddings available in KDP.

# Global Numerical Embedding

## Overview

Global Numerical Embedding is a powerful technique for processing numerical features collectively rather than individually. It transforms batches of numerical features through a unified embedding approach, capturing relationships across the entire numerical feature space.
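To build intuition for the bin-embed-pool idea described above, here is a minimal NumPy sketch. It is illustrative only; all names are hypothetical and this is not KDP's actual implementation.

```python
# Illustrative sketch (not KDP's implementation): embed each numerical
# value via binning plus a shared lookup table, then pool across features.
import numpy as np

rng = np.random.default_rng(0)

num_bins = 8
embedding_dim = 4
# One shared embedding table used by all numerical features.
table = rng.normal(size=(num_bins, embedding_dim))

def global_numerical_embedding(row, pooling="average"):
    """row: 1-D array of numerical features, assumed scaled to [0, 1]."""
    bins = np.clip((row * num_bins).astype(int), 0, num_bins - 1)
    embedded = table[bins]               # (n_features, embedding_dim)
    if pooling == "average":
        return embedded.mean(axis=0)     # fixed-size vector
    if pooling == "max":
        return embedded.max(axis=0)
    return embedded.reshape(-1)          # "concat": size grows with n_features

vec = global_numerical_embedding(np.array([0.1, 0.5, 0.9]))
print(vec.shape)  # → (4,)
```

With `"average"` or `"max"` pooling the output size is independent of how many numerical features go in, which is the dimensionality-control property the technique provides.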

## Key Benefits

- **Cross-Feature Learning**: Captures relationships between different numerical features
- **Unified Representation**: Creates a consistent embedding space for all numerical data
- **Dimensionality Control**: Transforms a variable number of features into fixed-size embeddings
- **Performance Enhancement**: Typically improves performance on complex tabular datasets

## Usage

### Basic Configuration

Enable Global Numerical Embedding by setting the appropriate parameters in your `PreprocessingModel`:

```python
from kdp.processor import PreprocessingModel
from kdp.features import FeatureType

# Define features
features_specs = {
    "feature1": FeatureType.FLOAT_NORMALIZED,
    "feature2": FeatureType.FLOAT_NORMALIZED,
    "feature3": FeatureType.FLOAT_RESCALED,
    # more numerical features...
}

# Initialize with Global Numerical Embedding
preprocessor = PreprocessingModel(
    features_specs=features_specs,
    use_global_numerical_embedding=True,  # Enable the feature
    global_embedding_dim=16,              # Output dimension per feature
    global_pooling="average",             # Pooling strategy
)

# Build the model
result = preprocessor.build_preprocessor()
```

### Advanced Configuration

Fine-tune Global Numerical Embedding with additional parameters:
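The full parameter listing for this section is not included in this excerpt. As a hedged sketch, the example below combines the `global_*` options that appear in the Getting Started section of these docs; verify the exact names and defaults against the API documentation.

```python
# Hedged sketch: advanced options assembled from the `global_*` parameters
# shown elsewhere in these docs. Names and defaults should be checked
# against the API reference.
from kdp.processor import PreprocessingModel

advanced_config = {
    "use_global_numerical_embedding": True,
    "global_embedding_dim": 16,      # output dimension per feature
    "global_pooling": "max",         # 'average', 'max', or 'concat'
    "global_dropout_rate": 0.1,      # dropout for regularization
    "global_use_batch_norm": True,   # batch-normalize the embeddings
}

preprocessor = PreprocessingModel(
    features_specs=features_specs,   # as defined in Basic Configuration
    **advanced_config,
)
```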
---
- **Enhanced with Transformer Blocks**: Incorporate transformer blocks into your preprocessing model to boost feature interaction analysis and uncover complex patterns, enhancing predictive model accuracy.
- **Distribution-Aware Encoding**: Automatically detect underlying data distributions and apply specialized transformations to preserve statistical properties and improve model performance.
- **Global Numerical Embedding**: Transform batches of numerical features with a unified embedding approach, capturing relationships across the entire feature space and enhancing model performance on tabular data.
- **Tabular Attention Mechanisms**: Implement powerful attention-based learning on tabular data with standard and multi-resolution approaches to capture complex feature interactions.
- **Easy Integration**: Designed to seamlessly integrate as the first layers in your TensorFlow Keras models, facilitating a smooth transition from raw data to trained model, accelerating your workflow significantly.
## Getting started:
### Global Numerical Embedding
The Global Numerical Embedding layer offers a powerful way to process numerical features collectively, capturing relationships across your entire numerical feature space. This is particularly useful for tabular data with many numerical columns.
- **Unified Embedding**: Process all numerical features together through a shared embedding space
- **Advanced Pooling**: Aggregate information across features with various pooling strategies
- **Adaptive Binning**: Intelligently discretize continuous values for more effective embedding

```python
numerical_embedding_config = {
    # ... (earlier global embedding parameters elided in this excerpt)
    'global_init_max': 2.0,         # Maximum initialization bound
    'global_dropout_rate': 0.1,     # Dropout rate for regularization
    'global_use_batch_norm': True,  # Whether to use batch normalization
    'global_pooling': 'average',    # Pooling strategy ('average', 'max', or 'concat')
}

ppr = PreprocessingModel(
    path_data="data/my_data.csv",
    features_specs=features_spec,
    **numerical_embedding_config,
)
```
### Tabular Attention Configuration
Leverage attention mechanisms specifically designed for tabular data to capture complex feature interactions. See [Tabular Attention](tabular_attention.md) for detailed information.
- **Standard Attention**: Apply uniform attention across all features
- **Multi-Resolution Attention**: Use different attention mechanisms for numerical and categorical data
- **Placement Options**: Control where attention is applied in your feature space

Example configuration:
```python
attention_config = {
    'tabular_attention': True,
    'tabular_attention_heads': 4,                   # Number of attention heads
    # ... (additional attention parameters elided in this excerpt)
    'tabular_attention_placement': 'ALL_FEATURES',  # Where to apply attention
    'tabular_attention_embedding_dim': 32,          # For multi-resolution attention
}

ppr = PreprocessingModel(
    path_data="data/my_data.csv",
    features_specs=features_spec,
    **attention_config,
)
```
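To illustrate what attention over tabular features computes, here is a self-contained NumPy sketch of single-head scaled dot-product attention applied across the feature axis. It is conceptual only and not KDP's implementation.

```python
# Conceptual sketch (not KDP's implementation): scaled dot-product
# attention across the feature axis of a per-feature embedding matrix.
import numpy as np

def tabular_attention(x):
    """x: (n_features, d) matrix of per-feature embeddings."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                        # feature-feature affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over features
    return weights @ x                                   # each feature attends to all others

x = np.random.default_rng(0).normal(size=(5, 8))         # 5 features, embedding dim 8
out = tabular_attention(x)
print(out.shape)  # → (5, 8)
```

Each row of the output mixes information from every feature, weighted by learned (here random) affinities, which is how attention captures feature interactions.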
### Custom Preprocessors
Tailor your preprocessing steps with custom preprocessors for each feature type. Define specific preprocessing logic that fits your data characteristics or domain-specific requirements; see [Custom Preprocessors](features.md#π-custom-preprocessing-steps).
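As a generic illustration of the idea (KDP's actual registration API is described in the linked Custom Preprocessors documentation), a custom preprocessing step is essentially a callable that maps raw feature values to transformed ones. The `log1p_step` name below is hypothetical.

```python
# Hypothetical custom preprocessing step: a log1p transform for a
# heavy-tailed numerical column. How such a callable is attached to a
# feature is defined by KDP's custom-preprocessor API, not shown here.
import math

def log1p_step(values):
    return [math.log1p(v) for v in values]

print(log1p_step([0.0, 9.0]))  # → [0.0, 2.302585092994046]
```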