Releases: datacoon/undatum
Release v1.0.17
Changed
- Improved CLI documentation: Enhanced all command-line interface functions with detailed help text using Typer's `Annotated` types
- Code refactoring: Refactored analyzer output writing into a separate `_write_analysis_output()` function for better maintainability
- Better file handling: Improved file output handling in the analyzer command with proper context managers
Fixed
- Fixed analyzer output not writing to files correctly when the `--output` option was used
- Improved consistency between stdout and file output formatting
Release 1.0.16 - Multi-Provider AI Support
🎉 Major Features
Multi-Provider AI Support
undatum now supports multiple AI providers for automatic field and dataset documentation:
- OpenAI - GPT-4o-mini, GPT-4o, GPT-3.5-turbo, and more
- OpenRouter - Unified API for accessing models from OpenAI, Anthropic, Google, and others
- Ollama - Run local models without API keys
- LM Studio - Local models via OpenAI-compatible API
- Perplexity - Backward compatible with existing Perplexity integration
Structured AI Output
- Replaced fragile text parsing with JSON Schema-based structured output
- More reliable AI response parsing
- Better error handling and fallback mechanisms
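For illustration, here is a minimal sketch of what JSON-based structured output looks like against an OpenAI-compatible chat completions endpoint; the model, prompt, and field layout are placeholders, not undatum's actual request.

```python
import json
from openai import OpenAI

# Illustrative sketch only: the model, prompt and field layout are placeholders,
# not undatum's actual request. The same client can target OpenRouter, Ollama or
# LM Studio by passing their OpenAI-compatible base_url.
client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # ask for machine-parseable JSON, not markdown
    messages=[
        {"role": "system",
         "content": "Return a JSON object with a 'fields' array; each item has "
                    "'name', 'type' and 'description'."},
        {"role": "user", "content": "Describe these fields: id, created_at, amount"},
    ],
)

# The reply body is a JSON string, so json.loads replaces fragile text extraction.
fields = json.loads(response.choices[0].message.content)["fields"]
print(fields)
```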
Flexible Configuration
Configure AI providers through:
- Environment variables (lowest precedence)
- Config files (`undatum.yaml` or `~/.undatum/config.yaml`)
- CLI arguments (highest precedence)
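A minimal sketch of how that precedence order can be resolved; the `resolve_setting()` helper and the `UNDATUM_*` environment-variable naming are hypothetical, not undatum's actual configuration code.

```python
import os
from pathlib import Path

import yaml  # PyYAML

# Hypothetical helper, not undatum's actual code: resolve one AI setting with
# CLI arguments > config files > environment variables precedence.
def resolve_setting(name, cli_value=None):
    if cli_value is not None:                       # CLI arguments win
        return cli_value
    for path in (Path("undatum.yaml"), Path.home() / ".undatum" / "config.yaml"):
        if path.exists():                           # then config files
            config = yaml.safe_load(path.read_text()) or {}
            if name in config:
                return config[name]
    return os.environ.get(f"UNDATUM_{name.upper()}")  # env vars last

provider = resolve_setting("ai_provider") or "openai"
```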
✨ What's New
Added
- Multi-provider AI support: Added support for OpenAI, OpenRouter, Ollama, LM Studio, and Perplexity APIs
- Structured AI output: Replaced fragile text parsing with JSON Schema-based structured output for reliable AI responses
- Flexible AI configuration: Support for environment variables, config files (`undatum.yaml` or `~/.undatum/config.yaml`), and CLI arguments with proper precedence
- AI provider factory: New `get_ai_service()` function for easy provider instantiation (see the sketch after this list)
- Enhanced error handling: Proper exception classes (`AIServiceError`, `AIConfigurationError`, `AIAPIError`) with clear error messages
- CLI arguments for AI: Added `--ai-provider`, `--ai-model`, and `--ai-base-url` options to the `analyze` command
- Configuration management: New `undatum/ai/config.py` module for unified configuration handling
- Backward compatibility: Old `get_fields_info()` and `get_description()` functions maintained for compatibility
- Code quality and Pylint score improvements
- Better error handling and resource management
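A simplified sketch of a provider factory with dedicated exception classes; the function and exception names follow the release notes, while the service classes and their wiring are illustrative, not undatum's implementation.

```python
# Simplified sketch of a provider factory with dedicated exception classes.
# The function and exception names follow the release notes; the service
# classes and their wiring are illustrative, not undatum's implementation.
class AIServiceError(Exception):
    """Base class for AI-related errors."""

class AIConfigurationError(AIServiceError):
    """Raised when provider settings are missing or inconsistent."""

class AIAPIError(AIServiceError):
    """Raised when a call to the provider API fails."""


class OpenAIService:
    def __init__(self, model, base_url=None):
        self.model = model
        self.base_url = base_url

class OllamaService(OpenAIService):
    """Same OpenAI-compatible wire format, different default base_url."""


def get_ai_service(provider, model, base_url=None):
    """Instantiate the service class registered for the requested provider."""
    providers = {"openai": OpenAIService, "ollama": OllamaService}
    try:
        service_cls = providers[provider.lower()]
    except KeyError as exc:
        raise AIConfigurationError(f"Unknown AI provider: {provider}") from exc
    return service_cls(model=model, base_url=base_url)
```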
Changed
- AI system refactoring: Completely refactored AI documentation system from Perplexity-only to multi-provider architecture
- Structured responses: All AI providers now use JSON Schema (`response_format: json_object`) instead of parsing CSV from markdown code blocks
- Provider architecture: Implemented abstract base class `AIService` with concrete provider implementations (sketched after this list)
- Improved code quality: Fixed indentation, trailing whitespace, and formatting issues
- Refactored file operations to use `with` statements for better resource management
- Updated string formatting to use f-strings and lazy logging
- Fixed dangerous default arguments in function signatures
- Improved type hints and code documentation
- Updated the `analyze` command to accept AI provider configuration
- Updated the `schemer` command to use the new AI service interface
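An illustrative sketch of the abstract-base-class layout described above; the `describe_fields()` method name is a hypothetical stand-in for undatum's actual interface.

```python
from abc import ABC, abstractmethod

# Illustrative sketch of the abstract-base-class layout; the describe_fields()
# method name is a hypothetical stand-in for undatum's actual interface.
class AIService(ABC):
    def __init__(self, model, base_url=None):
        self.model = model
        self.base_url = base_url

    @abstractmethod
    def describe_fields(self, field_names):
        """Return a JSON-compatible description of the given fields."""


class OpenAICompatibleService(AIService):
    def describe_fields(self, field_names):
        # A real provider would call its chat completions endpoint here with
        # response_format={"type": "json_object"} and json.loads the reply.
        return {"fields": [{"name": name, "type": "unknown", "description": ""}
                           for name in field_names]}
```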
Fixed
- Fixed critical bug: Added missing `_process_json_data` function in the analyzer module
- Fixed bad indentation issues in the `duckdb_decompose` function
- Fixed redefined builtin `id` parameter (renamed to `table_id`)
- Fixed unused imports and arguments
- Fixed dictionary iteration patterns (removed unnecessary `.keys()` calls)
- Fixed `isinstance()` calls to use tuple syntax for better performance (illustrated after this list)
- Improved file handling with proper context managers
- Fixed fragile AI response parsing: Replaced error-prone text extraction with proper JSON parsing
- Fixed AI service initialization: Added proper error handling and fallback when the AI service fails to initialize
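Generic illustrations of the lint-style fixes listed above; these snippets are not excerpts from undatum's source.

```python
# Generic illustrations of the lint-style fixes listed above; these snippets
# are not excerpts from undatum's source.
record = {"id": 1, "name": "dataset", "size": 42}

# Dict iteration: "for key in record" instead of "for key in record.keys()".
columns = [key for key in record]

# isinstance() with a tuple instead of two separate checks chained with "or".
numeric_values = [value for value in record.values() if isinstance(value, (int, float))]

# Context manager instead of a bare open()/close() pair.
with open("columns.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(columns))
```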
📦 Installation
`pip install --upgrade undatum`
Release 1.0.14
Added JSON to JSON lines conversion (see the sketch below)
Fixed #19: missing xmltodict dependency
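For context, a minimal sketch of what JSON to JSON lines conversion means (reading a JSON document or array and emitting one object per line); this is a conceptual illustration, not undatum's actual convert implementation.

```python
import json

# Conceptual illustration of JSON -> JSON lines conversion (one object per
# output line); not undatum's actual convert implementation.
def json_to_jsonl(src_path, dst_path):
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)
    records = data if isinstance(data, list) else [data]
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
```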
Release 1.0.12
Changes:
- Added command "analyze" it provides human-readable information about data files: CSV, BSON, JSON lines, JSON, XML. Detects encoding, delimiters, type of files, fields with objects for JSON and XML files. Doesn't support Gzipped, ZIPped and other compressed files yet.
- Updated setup.py and requirements.txt to require certain versions of libs and Python 3.8
Analyze command is very helpful working with JSON and XML files. Next step is to update convert command and re-use analyze code. Convert command should support small XML files to process them without SAX parser, using xmltodict instead and automatically detected list tags and convert command should support JSON files, with detection of JSON file type.
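A minimal sketch of the xmltodict approach described above; the repeating record tag is assumed to be known here, whereas the convert command would detect such list tags automatically.

```python
import json
import xmltodict

# Illustrative only: the repeating record tag is assumed to be known ("row"),
# while the convert command described above would detect list tags automatically.
def xml_to_jsonl(src_path, dst_path, record_tag="row"):
    with open(src_path, "rb") as src:
        doc = xmltodict.parse(src, force_list=(record_tag,))
    root = next(iter(doc.values()))           # contents of the root element
    records = root.get(record_tag, []) if isinstance(root, dict) else []
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
```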
Release 1.0.10
- Added encoding and delimiter detection for the uniq, select, frequency, and headers commands, and completely rewrote these functions. If the encoding and delimiter options are set, they override the detected values; otherwise the detected delimiter and encoding are used (a rough sketch follows this list).
- Added support for converting to .parquet files, done in the simplest way using pandas' `to_parquet` function.
- Added support for CSV and BSON files in the `stats` command.
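A rough sketch of detect-then-override handling for encoding and delimiter using chardet and csv.Sniffer; the function and option names are illustrative, not undatum's implementation.

```python
import csv
import chardet

# Rough illustration of detect-then-override behaviour; option names and
# defaults are illustrative, not undatum's implementation.
def read_rows(path, encoding=None, delimiter=None):
    with open(path, "rb") as raw:
        sample = raw.read(64 * 1024)
    # Explicit options win; otherwise fall back to detected values.
    encoding = encoding or chardet.detect(sample)["encoding"] or "utf-8"
    if delimiter is None:
        delimiter = csv.Sniffer().sniff(sample.decode(encoding, errors="replace")).delimiter
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f, delimiter=delimiter))
```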