GH-148: [Go] Tabular Data Extraction in the CRUD API #155
// Initialize PostgreSQL config
postgresConfig := &postgres.Config{
	Host: os.Getenv("POSTGRES_HOST"),
	Port: 5432, // Default PostgreSQL port
@Isuru-rangana shall we take this from an environment variable as well? We may need this to be configurable in the future. WDYT?
We'll modify the configuration to align with our existing env pattern:
Host: os.Getenv("POSTGRES_HOST"),
Port: os.Getenv("POSTGRES_PORT"),
User: os.Getenv("POSTGRES_USER"),
Password: os.Getenv("POSTGRES_PASSWORD"),
DBName: os.Getenv("POSTGRES_DB"),
SSLMode: os.Getenv("POSTGRES_SSL_MODE"),
Yes, this is better. Let's do that.
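One thing to watch with the proposal above: `os.Getenv` returns a string, while the original snippet sets `Port: 5432`, which suggests the field is an `int`. Below is a minimal sketch of one way to bridge that, assuming the field stays numeric. The helper name `portFromEnv` and the fallback to 5432 are illustrative assumptions, not code from this PR.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// portFromEnv is a hypothetical helper: it parses a TCP port from the raw
// environment value and falls back to def when the variable is unset.
func portFromEnv(raw string, def int) (int, error) {
	if raw == "" {
		return def, nil
	}
	port, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid POSTGRES_PORT %q: %w", raw, err)
	}
	if port < 1 || port > 65535 {
		return 0, fmt.Errorf("POSTGRES_PORT %d is out of range", port)
	}
	return port, nil
}

func main() {
	// 5432 is the default PostgreSQL port used in the original snippet.
	port, err := portFromEnv(os.Getenv("POSTGRES_PORT"), 5432)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("using PostgreSQL port", port)
}
```

If the `Port` field is changed to a string instead, the conversion is unnecessary, but validating the value at startup may still be worthwhile.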
func (c *Client) InitializeTables(ctx context.Context) error {
	// Create entity_attributes table
	entityAttributesSQL := `
		CREATE TABLE IF NOT EXISTS entity_attributes (
@Isuru-rangana shall we explain a bit why we are creating the entity_attributes table? Or should this have the name of the key we get from the map of attributes?
The entity_attributes table serves as the core mapping between entities and their dynamic attributes.
It manages:
- Entity-Attribute Relationships: Maps each entity (entity_id) to its attributes (attribute_name)
- Dynamic Table References: Stores the actual table name (table_name) where the attribute data is stored
This allows efficient querying.
Let's add a comment for this in the code. Let's be as descriptive as possible. Also explain the purpose of this table clearly and how it is used in the code.
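As a rough illustration of the kind of comment being requested, here is a sketch of a documented DDL constant. The three named columns come from the explanation above; the `id` column, the column types, and the uniqueness constraint are assumptions made for the example and may not match the PR.

```go
package main

import "fmt"

// entityAttributesSQL creates the entity_attributes table, the mapping
// between entities and their dynamic attributes.
//
//   - entity_id:      the entity that owns the attribute
//   - attribute_name: the name of the attribute on that entity
//   - table_name:     the dynamically created table holding the attribute data
//
// To read an attribute, the code first looks up table_name here and then
// queries that table. Column types and the UNIQUE constraint are assumed.
const entityAttributesSQL = `
CREATE TABLE IF NOT EXISTS entity_attributes (
    id SERIAL PRIMARY KEY,
    entity_id VARCHAR(255) NOT NULL,
    attribute_name VARCHAR(255) NOT NULL,
    table_name VARCHAR(255) NOT NULL,
    UNIQUE (entity_id, attribute_name)
)`

func main() {
	fmt.Println(entityAttributesSQL)
}
```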
	// Create attribute_schemas table
	attributeSchemasSQL := `
		CREATE TABLE IF NOT EXISTS attribute_schemas (
Same here: what is attribute_schemas? Shall we explain it a bit in the code comments?
The attribute_schemas table is our schema registry; it manages the structure of the dynamic attribute tables.
Right, let's write a clear description here with a good explanation of how it is defined and how it is used in the code.
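To make the "schema registry" idea concrete, here is a hedged sketch. The column names (id, table_name, schema_version, created_at, schema_definition) appear in the check script later in this PR; the column types, the `ColumnDef` struct, and the JSON shape of schema_definition are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// attributeSchemasSQL creates the attribute_schemas table, which records one
// row per version of each dynamic attribute table's structure.
// Column types are assumed, not taken from the PR.
const attributeSchemasSQL = `
CREATE TABLE IF NOT EXISTS attribute_schemas (
    id SERIAL PRIMARY KEY,
    table_name VARCHAR(255) NOT NULL,
    schema_version INTEGER NOT NULL,
    schema_definition JSONB NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
)`

// ColumnDef is a hypothetical shape for one entry in schema_definition.
type ColumnDef struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

// encodeSchemaDefinition serializes column definitions for storage.
func encodeSchemaDefinition(cols []ColumnDef) (string, error) {
	b, err := json.Marshal(cols)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	def, err := encodeSchemaDefinition([]ColumnDef{{Name: "age", Type: "int"}})
	fmt.Println(def, err)
}
```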
neo4j_data:
neo4j_logs: No newline at end of file
neo4j_logs:
postgres_data: No newline at end of file
Let's add a new line at the end of the file. See the red circle with the - mark; it indicates that the file should end with a newline. It is a norm we follow. I also see that I missed that in the original code.
@Isuru-rangana awesome work. I did one pass. I have to check it out locally and play with it a bit, but overall it seems to do what we discussed. I will check tomorrow.
@Isuru-rangana we should also update the |
## Database Setup

### PostgreSQL Setup

1. **Using Docker**:
   ```bash
   # Run PostgreSQL container
   docker run -d \
     --name postgres \
     -e POSTGRES_PASSWORD=postgres \
     -e POSTGRES_USER=postgres \
     -p 5432:5432 \
     postgres:16
   ```

2. **Create Test Database**:
   ```bash
   docker exec postgres psql -U postgres -c "CREATE DATABASE test_db;"
   ```

3. **Environment Variables**:
   ```bash
   # Add to your .env file
   POSTGRES_HOST=localhost
   POSTGRES_PORT=5432
   POSTGRES_USER=postgres
   POSTGRES_PASSWORD=postgres
   POSTGRES_DB=test_db
   POSTGRES_TEST_DB_URI="postgresql://postgres:postgres@localhost:5432/test_db?sslmode=disable"
   ```

### Running PostgreSQL Tests

1. **Run All Tests**:
   ```bash
   POSTGRES_TEST_DB_URI="postgresql://postgres:postgres@localhost:5432/test_db?sslmode=disable" go test -v ./db/repository/postgres/...
   ```

2. **Run Specific Tests**:
   ```bash
   # Run client tests
   POSTGRES_TEST_DB_URI="postgresql://postgres:postgres@localhost:5432/test_db?sslmode=disable" go test -v -run TestNewClient ./db/repository/postgres/...

   # Run data insertion tests
   POSTGRES_TEST_DB_URI="postgresql://postgres:postgres@localhost:5432/test_db?sslmode=disable" go test -v -run TestInsertSampleData ./db/repository/postgres/...
   ```

3. **Run Tests with Coverage**:
   ```bash
   POSTGRES_TEST_DB_URI="postgresql://postgres:postgres@localhost:5432/test_db?sslmode=disable" go test -v -cover ./db/repository/postgres/...
   ```

4. **Run Tests with Race Detection**:
   ```bash
   POSTGRES_TEST_DB_URI="postgresql://postgres:postgres@localhost:5432/test_db?sslmode=disable" go test -v -race ./db/repository/postgres/...
   ```
@Isuru-rangana I think we don't have to add these here; we should only add the required new configs to the env.template file. If you feel this needs to be mentioned, let's add it by creating a new DEVELOPMENT.md within the crud-api folder.
```bash
docker exec postgres psql -U postgres -c "CREATE DATABASE test_db;"
```
@Isuru-rangana do we have to run the docker exec command that creates the database? The reason for asking is that we should just be able to do the following and do the development and deployment quite easily.
@Isuru-rangana there is a problem, I don't think the CIs have been updated. Look at the code segment in here. Here we are initializing the databases within the container, and then we run the test cases and that is defined in the |
@Isuru-rangana I checked out the PR locally and I am seeing a few issues; maybe I am missing something here. This is what I get when I try running the tests locally:
POSTGRES_TEST_DB_URI="postgresql://postgres:postgres@localhost:5432/test_db?sslmode=disable" go test -v ./db/repository/postgres/...
=== RUN TestInsertSampleData
postgres_client_test.go:358: Failed to create client: error connecting to the database: pq: role "postgres" does not exist
--- FAIL: TestInsertSampleData (0.00s)
=== RUN TestQuerySampleData
postgres_client_test.go:470: Failed to create client: error connecting to the database: pq: role "postgres" does not exist
--- FAIL: TestQuerySampleData (0.00s)
FAIL
Also, in the CIs we don't see these test cases running:
crud | 2025/06/05 05:01:44 [neo4j_client.NewNeo4jRepository] Connected to Neo4j successfully!
crud | === RUN TestCreateEntity
crud | --- PASS: TestCreateEntity (0.00s)
crud | PASS
crud | ok lk/datafoundation/crud-api/cmd/server 0.036s
crud | ? lk/datafoundation/crud-api/db/config [no test files]
crud | 2025/06/05 05:01:45 Successfully connected to MongoDB
crud | 2025/06/05 05:01:45 Running tests
crud | 2025/06/05 05:01:45 DB Name: testdb
crud | 2025/06/05 05:01:45 Collection Name: metadata_test
crud | === RUN TestCreateAndReadEntity
crud | 2025/06/05 05:01:45 Test using database: testdb, collection: metadata_test
crud | 2025/06/05 05:01:45 Inserted document with ID: test-entity-1
crud | --- PASS: TestCreateAndReadEntity (0.02s)
crud | === RUN TestUpdateEntityMetadata
crud | 2025/06/05 05:01:45 Test using database: testdb, collection: metadata_test
crud | --- PASS: TestUpdateEntityMetadata (0.00s)
crud | === RUN TestDeleteEntity
crud | 2025/06/05 05:01:45 Test using database: testdb, collection: metadata_test
crud | --- PASS: TestDeleteEntity (0.00s)
crud | === RUN TestMetadataHandling
crud | 2025/06/05 05:01:45 Test using database: testdb, collection: metadata_test
crud | --- PASS: TestMetadataHandling (0.00s)
crud | PASS
crud | 2025/06/05 05:01:45 Tests completed
crud | ok lk/datafoundation/crud-api/db/repository/mongo 0.042s
time="2025-06-05T05:01:46Z" level=warning msg="/home/runner/work/nexoan/nexoan/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion"
These are the only ones running, so we must check what is going on. We basically need to figure out two things,
I've analyzed the issues with the PostgreSQL tests and identified the root causes: The "role 'postgres' does not exist" error occurs because the PostgreSQL tests expect a 'postgres' user in your local setup, which might not exist or have different credentials. The PostgreSQL tests are being skipped in CI because the |
- Created DEVELOPMENT.md with PostgreSQL setup and testing instructions
- Updated README.md with references to DEVELOPMENT.md
- Updated env.template with PostgreSQL test connection string
-e NEO4J_USER=${NEO4J_USER} \
-e NEO4J_PASSWORD=${NEO4J_PASSWORD} \
-e MONGO_URI=${MONGO_URI} \
-e POSTGRES_TEST_DB_URI=${POSTGRES_TEST_DB_URI} \
@Isuru-rangana why do we need POSTGRES_TEST_DB_URI? Can't we just use POSTGRES_DB_URI as the parameter name and give it the value of the test DB URI?
// Handle attributes
_, err = postgres.HandleAttributes(req.Attributes)
err = postgres.HandleAttributes(ctx, s.postgresClient, req.Id, req.Attributes)

// Create PostgreSQL client
postgresClient, err := postgres.NewClient(*postgresConfig)
if err != nil {
	log.Fatalf("[service.main] Failed to create PostgreSQL client: %v", err)
}
defer postgresClient.Close()
@Isuru-rangana a minor change, but an important one. We should stick to the convention we have used for mongo and neo4j.
Looking at the code, we use neo4jRepo and mongoRepo where you have used postgresClient. Let's use postgresRepo. Also, please rename the NewClient method to NewPostgresRepository.
neo4jRepo: neo4jRepo,
mongoRepo: mongoRepo,
neo4jRepo: neo4jRepo,
postgresClient: postgresClient,
See, here it is very clear how it differs from the others.
Use postgresRepo instead.
var repository *Neo4jRepository

// cleanupDatabase deletes all nodes and relationships in the database
func cleanupDatabase(ctx context.Context, repo *Neo4jRepository) error {
@Isuru-rangana I checked out the latest PR, and I see the following issue when running the test cases:
=== RUN TestInsertSampleData
postgres_client_test.go:359: Failed to create client: error connecting to the database: pq: role "postgres" does not exist
@Isuru-rangana there are a few things we need to improve in the PR.
}

// validateTabularDataTypes validates that all values in each column have consistent types
func validateTabularDataTypes(data *structpb.Struct) (map[string]typeinference.TypeInfo, error) {
I think this function is not only doing validation; it also returns the extracted information as a map of TypeInfo. So it would be best to rename it to validateAndReturnTabularDataTypes.
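To illustrate the "validate and return" behaviour described above, here is a simplified sketch. It deliberately uses plain Go slices and string type names instead of `*structpb.Struct` and `typeinference.TypeInfo`, so the names `inferType` and `validateAndReturnColumnTypes`, and the rule that nulls are ignored, are assumptions rather than the PR's actual logic.

```go
package main

import "fmt"

// inferType returns a coarse type name for a decoded JSON-like value.
func inferType(v any) string {
	switch v.(type) {
	case nil:
		return "null"
	case bool:
		return "bool"
	case float64:
		return "number"
	case string:
		return "string"
	default:
		return "unknown"
	}
}

// validateAndReturnColumnTypes checks that all non-null values in a column
// share one inferred type and returns the type found for each column.
func validateAndReturnColumnTypes(columns []string, rows [][]any) (map[string]string, error) {
	types := make(map[string]string, len(columns))
	for r, row := range rows {
		if len(row) != len(columns) {
			return nil, fmt.Errorf("row %d has %d values, want %d", r, len(row), len(columns))
		}
		for c, v := range row {
			t := inferType(v)
			if t == "null" {
				continue // assumption: nulls do not constrain the column type
			}
			col := columns[c]
			if prev, ok := types[col]; ok && prev != t {
				return nil, fmt.Errorf("column %q: row %d is %s, earlier rows are %s", col, r, t, prev)
			}
			types[col] = t
		}
	}
	return types, nil
}

func main() {
	cols := []string{"name", "age"}
	rows := [][]any{{"alice", 30.0}, {"bob", nil}}
	types, err := validateAndReturnColumnTypes(cols, rows)
	fmt.Println(types, err)
}
```

Because the function both rejects inconsistent columns and hands back the inferred types, a name that mentions both responsibilities describes it more accurately.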
}

// compareSchemas compares two schemas and returns true if they are compatible
func compareSchemas(existing, new *schema.SchemaInfo) (bool, error) {
@Isuru-rangana let's not use parameter names like new. It is a reserved keyword in many languages, and in Go it shadows the built-in new function. So let's use something like newSchema here.
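For context, `new` is a predeclared identifier in Go rather than a keyword, so a parameter named `new` compiles but hides the built-in inside the function body. The sketch below shows the renamed signature; the `SchemaInfo` struct and the compatibility rule (every existing column must keep its type) are hypothetical stand-ins, not the PR's `schema.SchemaInfo`.

```go
package main

import "fmt"

// SchemaInfo is a hypothetical stand-in for schema.SchemaInfo.
type SchemaInfo struct {
	Columns map[string]string // column name -> type name
}

// compareSchemas reports whether newSchema is compatible with existing.
// Assumed rule: every existing column must still be present with the same type.
func compareSchemas(existing, newSchema *SchemaInfo) (bool, error) {
	if existing == nil || newSchema == nil {
		return false, fmt.Errorf("compareSchemas: nil schema")
	}
	for name, typ := range existing.Columns {
		if got, ok := newSchema.Columns[name]; !ok || got != typ {
			return false, nil
		}
	}
	// The built-in new remains usable here because no parameter shadows it.
	_ = new(SchemaInfo)
	return true, nil
}

func main() {
	existing := &SchemaInfo{Columns: map[string]string{"age": "int"}}
	newSchema := &SchemaInfo{Columns: map[string]string{"age": "int", "name": "string"}}
	ok, err := compareSchemas(existing, newSchema)
	fmt.Println(ok, err)
}
```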
- Add postgresql-client installation
- Add PostgreSQL environment variables
- Add PostgreSQL connection health check
- Update service description
- Renamed Client to PostgresRepository for consistency
- Updated all method receivers and constructor functions
- Updated service and test files to use new naming
- Updated DEVELOPMENT.md with correct env vars and table structure
target/

# artifacts and env configs not to be committed
artifacts and env configs not to be committed
@Isuru-rangana was this by mistake? This line is supposed to be a comment.
#!/bin/bash

echo "=== Checking entity_attributes table ==="
docker exec postgres psql -U postgres -d nexoan -c "SELECT * FROM entity_attributes;"

echo -e "\n=== Checking attribute_schemas table ==="
docker exec postgres psql -U postgres -d nexoan -c "SELECT id, table_name, schema_version, created_at, schema_definition::text FROM attribute_schemas;"

echo -e "\n=== Table Descriptions ==="
docker exec postgres psql -U postgres -d nexoan -c "\d+ entity_attributes"
docker exec postgres psql -U postgres -d nexoan -c "\d+ attribute_schemas" No newline at end of file
}

// isInteger checks if a float64 is actually an integer
func isInteger(val float64) bool {
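The body of `isInteger` is not shown in this snippet. One common way to implement such a check is sketched below, as an assumption about the intent rather than the PR's actual code: compare the value with its truncation, and reject NaN and infinities explicitly, since `math.Trunc` returns an infinity unchanged and the comparison would otherwise succeed.

```go
package main

import (
	"fmt"
	"math"
)

// isInteger reports whether val has no fractional part.
// NaN and infinities are treated as non-integers.
func isInteger(val float64) bool {
	if math.IsNaN(val) || math.IsInf(val, 0) {
		return false
	}
	return val == math.Trunc(val)
}

func main() {
	fmt.Println(isInteger(42.0), isInteger(42.5))
}
```

A check like this matters for tabular data decoded from JSON-style values, where every number arrives as a float64 and the code has to decide between an integer and a floating-point column type.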
Overview
This PR implements type handling for tabular data in PostgreSQL. The implementation provides robust type validation, conversion, and preservation for structured data.
Key Features
1. Type System Implementation
2. Schema Management
- entity_attributes: Tracks entity-attribute relationships
- attribute_schemas: Stores schema versions and definitions
3. Data Validation & Conversion
Closes #148