RFC-005: Functional Testing Framework for DocumentDB #368
Conversation
This RFC proposes a comprehensive functional testing framework to validate DocumentDB's correctness and MongoDB compatibility.
Key Components:
• Test Runner: Orchestrates test execution and scheduling
• Test Executor: Executes tests with namespace isolation
• Result Analyzer: Analyzes compatibility and generates reports
• Custom test suite using Python/PyMongo with multi-dimensional tagging
Architecture:
• Single Docker container deployment for simplicity
• Parallel test execution using ThreadPoolExecutor
• Tests run against both DocumentDB and MongoDB for compatibility validation
• Automatic namespace isolation and cleanup
Output Formats:
• JSON Report: Machine consumption (APIs, monitoring, historical tracking)
• JUnit XML: Human consumption (GitHub Actions UI, PR reviews)
• Dashboard: Visual consumption (charts, trend analysis)
The framework enables contributors to easily write new functional tests and use those tests to identify regressions from new changes.
cooltodinesh
left a comment
Thanks for putting together such a detailed document on the approach.
rfcs/005-functional-testing.md
Outdated
- **Compatibility Gaps**: No systematic way to measure and track compatibility with MongoDB
- **Manual Testing Burden**: Contributors rely on manual testing, slowing development velocity
- **Regression Risk**: Lack of automated testing increases the risk of introducing regressions
- **Feature Validation**: There is no testing framework to allow contributors to write new test cases to validate their features and comnpatibility with MongoDB behavior
nit: comnpatibility -> compatibility
rfcs/005-functional-testing.md
Outdated
- `documentdb_uri`: Connection string for DocumentDB instance
- `mongodb_uri`: Connection string for MongoDB reference instance
Shouldn't this configuration parameter have just "database_uri" and separate configuration parameters for each database that we want to test against? We might want to run same test suite against multiple versions of mongodb to observe functional differences across new major versions.
I'm not sure what the term 'reference instance' implies here. The suite should have a self-contained set of expectations.
That being said, being able to run in a comparison mode against two (or more?) URIs does seem like a good idea.
Yes I think we should make this generic to be able to test against any MongoDB compatible engine. We could do something like this:

```bash
pytest --engine documentdb=mongodb://localhost:27017 --engine mongodb-7.0=mongodb://mongo:27017
```

Or a config file:

```bash
pytest --config engines.yaml
```
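To make the idea concrete, here is a minimal sketch of how such an `--engine` option could be wired up with pytest's standard `pytest_addoption`/`pytest_generate_tests` hooks. The option names, the `engines.yaml` schema, and the `engine_uri`/`engine_client` fixture names are illustrative assumptions, not settled API:

```python
# conftest.py -- hypothetical sketch, not the RFC's final design
import pytest
import yaml  # assumes PyYAML is available
from pymongo import MongoClient


def pytest_addoption(parser):
    # Repeatable: --engine documentdb=mongodb://localhost:27017 --engine mongodb-7.0=...
    parser.addoption("--engine", action="append", default=[],
                     help="name=connection-uri of an engine to run the suite against")
    parser.addoption("--engine-config", default=None,
                     help="YAML file mapping engine names to connection URIs")


def _engines(config):
    engines = dict(opt.split("=", 1) for opt in config.getoption("--engine"))
    path = config.getoption("--engine-config")
    if path:
        with open(path) as f:
            engines.update(yaml.safe_load(f))
    return engines


def pytest_generate_tests(metafunc):
    # Run every test once per configured engine, with the engine name in the test id.
    if "engine_uri" in metafunc.fixturenames:
        engines = _engines(metafunc.config)
        metafunc.parametrize("engine_uri", list(engines.values()), ids=list(engines.keys()))


@pytest.fixture
def engine_client(engine_uri):
    client = MongoClient(engine_uri)
    yield client
    client.close()
```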
rfcs/005-functional-testing.md
Outdated
- Scan test directories recursively for Python files matching pattern `test_*.py`
- Parse test files to extract:
  - Test function names (functions starting with `test_`)
  - Test tags (from decorators like `@tags(['find', 'rbac'])`)
If we have tag-based selection, IMO that suffices.
rfcs/005-functional-testing.md
Outdated
- Build test registry with metadata for each discovered test

**Test Scheduling Algorithm:**
- Build dependency graph from test dependencies
Can you please share an example of why we would want one test to depend on another?
Yes, technically we can handle any required dependency as setup for the test, so we shouldn't really need dependent tests.
rfcs/005-functional-testing.md
Outdated
- Example: "`aggregate` + `decimal128` tests are failing"

**Metrics Calculation:**
- **DocumentDB Pass Rate**: `tests_passed_on_documentdb / total_tests * 100`
Should this always be presented at the tag level? If we try to come up with a general compatibility %, it always gets skewed by whichever tag has a large number of tests.
Also, under every test suite, should we have a set of smoke tests identified by a "smoke" tag, which determines whether a given feature is even implemented in the target database before deciding to run the whole lot of tests against it?
We should provide both tag-level and overall metrics to avoid skewing. Overall compatibility should be weighted or normalized to prevent high-volume tags from dominating.
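A sketch of how both views could be computed: per-tag pass rates plus an overall number normalized across tags so high-volume tags don't dominate. The function name, input shape, and the simple unweighted mean across tags are illustrative assumptions:

```python
# Hypothetical metrics calculation; an unweighted mean across tags is one possible
# normalization, not necessarily the one the RFC will settle on.
from collections import defaultdict


def compatibility_metrics(results):
    """results: iterable of (tags, passed) pairs, e.g. (["find", "smoke"], True)."""
    per_tag = defaultdict(lambda: [0, 0])  # tag -> [passed, total]
    for tags, passed in results:
        for tag in tags:
            per_tag[tag][0] += int(passed)
            per_tag[tag][1] += 1
    tag_rates = {tag: passed / total * 100 for tag, (passed, total) in per_tag.items()}
    overall = sum(tag_rates.values()) / len(tag_rates) if tag_rates else 0.0
    return tag_rates, overall
```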
Yes, having smoke tagged tests would be valuable for quick feature detection and failing fast on unsupported features before running comprehensive test suites.
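As an illustration, a smoke test could be just a marker on a minimal test, so a quick `pytest -m smoke` pass can gate the full run. The `collection` fixture and the marker names follow the sketches elsewhere in this thread and are not final:

```python
# Hypothetical smoke test for the "find" feature; assumes a "collection" fixture
# like the one sketched later in this review.
import pytest


@pytest.mark.smoke
@pytest.mark.find
def test_find_smoke(collection):
    collection.insert_one({"probe": 1})
    assert collection.find_one({"probe": 1}) is not None
```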
Can we have some cross-references with #344, since it looks like this RFC is proposing the implementation for layers C and D of the previous RFC?
rfcs/005-functional-testing.md
Outdated
The Test Executor is architecturally separated from the Test Runner to enable future extensibility:

1. **Distributed Test Execution**: Test Executor could be deployed across multiple machines for large-scale parallel testing, while Test Runner remains centralized for coordination.
Something like that would need to coordinate closely with the infrastructure that manages driver and engine instances (i.e. github workflows, AWS, or Azure).
Not against the idea, but I think it'll be a while before the complexity is worth it.
Yes this is just a future prospect and not something we would focus on now.
rfcs/005-functional-testing.md
Outdated
The functional testing framework consists of several key components that work together to provide comprehensive testing capabilities.

**1. Test Runner:**
'Runner' and 'Executor' are a little too similar. Can we call it the 'Orchestrator' or something similar?
rfcs/005-functional-testing.md
Outdated
1. **Custom Functional Test Suite**: A purpose-built test suite specifically designed for DocumentDB to validate functional correctness and compatibility with MongoDB.
2. **MongoDB Service Tests Integration**: Leveraging existing MongoDB service tests without modification to measure compatibility.
I'm not sure combining these into one test suite has a lot of value. The MongoDB service tests are not actively maintained and the signal we'll get from them is noisy since they include things that we'll likely never support (like replication set management). Their licensing also puts some restrictions on how they can be run.
Agreed - additionally what are the licensing requirements for doing this?
I'm not really proposing to combine them. I was proposing that we run MongoDB service tests separately to measure compatibility percentage. We would need to engage legal before we actually do this to make sure we have the green light for doing this.
rfcs/005-functional-testing.md
Outdated
3. **Pluggable Execution Strategies**: Test Executor can be swapped or extended without modifying Test Runner:
   - Local execution (initial implementation)
   - Remote execution (cloud-based test execution)
   - Containerized execution (each test in isolated container)
   - Custom execution strategies for specific test types
I'm a little skeptical about us trying to enable this from the framework layer. We might be better off focusing on a small, portable framework that makes it easy for the user to fit into whatever workflow they have.
This was meant to address future prospects. For now we will have the framework focus on the core correctness and compatibility testing.
rfcs/005-functional-testing.md
Outdated
### Deployment Architecture

The functional testing framework is deployed as a single Docker container that packages all components together.
It might be better to prioritize making this easy to run directly. i.e. something like a python package with a small requirements.txt that's easy to get working in any setup. Using docker might otherwise be more difficult to work with than common python environment management tools like conda or venv.
Even for our use it might be a little unwieldy since github workflows run in a docker container already and nested docker containers can be fickle.
I think supporting both approaches to run the tests would be useful. Contributors can clone the git repo and run the tests directly from the package. Other users wanting to run these tests against a remote cluster can leverage the docker image.
rfcs/005-functional-testing.md
Outdated
**3. Result Analyzer:**
- **Purpose**: Analyzes and compares test results
- **Responsibilities**:
  - Compares results between DocumentDB and MongoDB
I think comparison between multiple engines should be an optional mode. The core responsibility should be to compare against defined expectations.
Yes, that was the goal, but the wording in my RFC was confusing. See my comment below, which addresses a similar concern.
rfcs/005-functional-testing.md
Outdated
```
Test Runner:
  → Receive all test results
  → Pass results to Result Analyzer
  ↓
Result Analyzer:
  → Receive test results
  → Compare DocumentDB vs MongoDB results for each test
  → Calculate compatibility metrics and statistics
```
Can you elaborate on how the communication will work here? Are the 'results' here just the pass/fail or are we capturing and transferring all of the output from each engine so that the analyzer can inspect it?
The final outcome for each test case is either passed or failed.
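For illustration, the per-test record handed to the Result Analyzer might look like the sketch below; only the outcome is used for classification, while the failure message is kept for debugging. The field names are assumptions, not part of the RFC:

```python
# Hypothetical shape of a single test result record consumed by the Result Analyzer.
result_record = {
    "test_id": "test_find.py::test_find_basic",
    "tags": ["find", "smoke"],
    "engine": "documentdb",
    "outcome": "passed",        # "passed" | "failed" | "skipped"
    "duration_seconds": 0.042,
    "failure_message": None,    # populated only on failure, for debugging
}
```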
rfcs/005-functional-testing.md
Outdated
**Parallelism Mechanism:**
- Use Python's `concurrent.futures.ThreadPoolExecutor` for I/O-bound test execution
- Thread pool size determined by parallelism configuration parameter
Not sure it's relevant at this stage of the design, but Python's global interpreter lock might severely limit the actual amount of parallelism you get out of the runner with a simple thread pool. Multiprocessing might be required for scale.
With the pytest approach this concern would be resolved since pytest-xdist uses multiprocessing for parallel test execution.
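For reference, a minimal sketch of driving the suite with pytest-xdist from a Python entry point, assuming pytest and the pytest-xdist plugin are installed; the path and worker count are illustrative:

```python
# run_tests.py -- illustrative wrapper around pytest + pytest-xdist
import sys
import pytest

if __name__ == "__main__":
    # "-n auto" asks pytest-xdist to spawn one worker process per CPU core,
    # sidestepping the GIL because each worker is a separate process.
    sys.exit(pytest.main(["-n", "auto", "functional-tests/"]))
```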
rfcs/005-functional-testing.md
Outdated
```
functional-tests/
├── documentdb-functional-test/    # DocumentDB functional tests
│   ├── test_find.py
│   ├── test_insert.py
│   ├── test_aggregate.py
│   └── fixtures/                  # Test data fixtures
│       ├── users.json
│       └── products.json
```
Can we elaborate more on how we want to model the directory structure? Presumably we won't have a single test_aggregate.py for all stages and aggregation operators.
I suggest using the same structure as the MongoDB documentation tree.
The directory structure shown was just a high-level example. We'd definitely need more granular organization. For large features like find, we'd split by sub-functionality: basic queries, query operators, logical operators, projections, sorting, cursors, etc.
rfcs/005-functional-testing.md
Outdated
```python
    def setUp(self):
        """Setup RBAC users and test data"""
        # Create a user with read role (base class method handles admin operations)
        self.create_user(
            username='reader_user',
            password='reader_pass',
            roles=[{'role': 'read', 'db': self.test_database.name}]
        )

        # Load product fixture data (base class method)
        # Reads products.json and inserts into both DocumentDB and MongoDB
        self.load_fixture('products.json')
```
Is the idea to have one setup per file that initializes data for all test cases? Some tests might need their own specialized fixtures or a data generator function, and centralizing it in one place might be hard to maintain. Also, tests that write data might each need their own version of the same fixture.
Would it make more sense to model this with pytest fixtures instead?
i.e. a fixture that'll create a collection based on the test name and drop it afterward:

```python
@pytest.fixture
def collection(request):
    collection = my_db[request.node.name]
    yield collection
    collection.drop()
```

Used like this:

```python
@pytest.mark.collection
def test_something(collection):
    collection.find(...)
    assert(whatever())
```

A similar thing can be done for load_fixture or other resources, and we could have shared vs individual flavors for efficiency.
I think most test cases should only use the minimum data required for the purpose of the test case.
For example:

```js
// Test $add returns null if input has null.
db.test.drop()
db.test.insert({})
db.test.find({}, {res: {$add: [1, null]}})

// Test $add can use field expression
db.test.drop()
db.test.insert({a: null})
db.test.find({}, {res: {$add: [1, "$a"]}})
```

- It is more readable which specific case the test covers. If we have a complex document or many rows, we cannot rely on the reviewer to track the coverage of the test file, since one command can cover multiple edge cases. IMO, if we have `db.test.insert({a: null, b: 1})`, `b` is a distraction for the reviewer trying to understand the test case, let alone a collection of real-world data.
- If one command covers multiple test cases, we can have buried behavior differences.
For example:

```js
> db.test.find()
{ "_id" : ObjectId("690eace782576799b86f4622"), "a" : 1 }
{ "_id" : ObjectId("690eacf382576799b86f4623"), "a" : "1" }
{ "_id" : ObjectId("690ead21fad8dbab68c7896c"), "a" : [ ] }
> db.test.find({}, {res: {$add: [1, "$a"]}})
Error: error: {
    "ok" : 0,
    "errmsg" : "Executor error during find command: test.test :: caused by :: $add only supports numeric or date types, not string",
    "code" : 14,
    "codeName" : "TypeMismatch"
}
```

We don't know the behavior of `a` being `[]`.
The idea for using a 'collection' fixture was just to remove the boilerplate logic of having to create a collection with a unique name (for safe parallelism) and make sure to drop it afterwards. We might end up having hundreds or thousands of test cases, so we ought to streamline as much as possible.
I agree that loading pre-defined collections might make it more difficult to see what the test is doing, but I'd prefer something like this to explicit drop/insert/drop statements:

```python
@pytest.mark.collection
def test_something(collection):
    collection.insert({...})
    assert(whatever())
```

Alternatively, something like this:

```python
@pytest.mark.documents([
    { "_id" : ObjectId("690eace782576799b86f4622"), "a" : 1 },
    { "_id" : ObjectId("690eacf382576799b86f4623"), "a" : "1" },
])
def my_test():
    ...
```

Maybe even with some basic generator functions:

```python
@pytest.mark.documents([
    repeat( 10, { "a" : randomString() } )
])
```
Yeah, I agree, the fixture will make it easier.
Yes, using pytest fixtures makes sense. We can also use the documents decorator for declarative test data in a test case that needs it.
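A possible implementation sketch of that combination: a `collection` fixture that also reads a `documents` marker for declarative seeding. The fixture/marker names follow this thread's examples, and the `engine_client` fixture is assumed, so none of this is settled API:

```python
# conftest.py -- hypothetical combination of the "collection" fixture and the
# "documents" marker discussed above; assumes an engine_client fixture exists.
import pytest


@pytest.fixture
def collection(request, engine_client):
    """Create a uniquely named collection per test and drop it afterwards."""
    coll = engine_client["functional_tests"][request.node.name]
    marker = request.node.get_closest_marker("documents")
    if marker and marker.args:
        coll.insert_many(marker.args[0])  # seed declarative test data, if provided
    yield coll
    coll.drop()
```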
rfcs/005-functional-testing.md
Outdated
**Key benefits:**
- Automated compatibility measurement and reporting
- Easy test authoring for contributors using familiar Python/PyMongo
I assume Python is the primary language for most test coverage. One exception: Python cannot imitate "undefined" in JS, which can be used in the mongo shell. Do we want to include that test coverage at the initial stage? It requires a JS executor. Or should we consider it a language-driver-specific test case?
For which test do we need this?
rfcs/005-functional-testing.md
Outdated
**Success criteria:**
- Users are able to run all functional tests against locally hosted or remotely hosted DocumentDB with a single command and no setup involved
- Contributors can simply add new test files, and they will get picked up by the test runner and included in the functional test suite
Do we rely on the contributor and reviewer to verify whether the tests for a feature are comprehensive? Since the framework is also used to evaluate compatibility, I think we should have some system to validate the testing space for any feature supported in the MongoDB API. Or better, some degree of automation that systematically generates test categories or even test cases based on predefined or contributor-added rules. e.g. a test category could be BASIC BSON TYPE TEST CASES: for any new feature that takes an input, we automatically generate a test case for each supported BSON type.
I can think of several levels of approach from fully manual to fully automated (the latter might be over-engineering):
- A well-structured README listing all the aspects or categories each new feature should consider adding tests for.
- A CI/CD check that uses maintained rules to verify whether a feature has all expected test categories, e.g. an operator should have a test for each data type input.
- A maintained set of rules that generates templates for each new feature, into which contributors add individual test cases. Contributors should first add rules for the new feature. Think of adding a horizontal feature: assume we just added support for Decimal128; almost all features should have a test for it. With this level of automation, contributors just need to add the new data type to the list of supported BSON types, and then they know the minimum test cases to add. The template also ensures we maintain a good folder structure.
- One step further: generate the test cases automatically. Contributors can still add more edge cases for features that are too special to express as a new rule, but the automation covers the minimum.
I suggest we go with option 3.
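As a sketch of what level 3 (rule-generated templates) could look like in practice, a maintained list of BSON-type sample values that feature tests parametrize over; the list contents, the `collection` fixture, and the expectation placeholder are all illustrative:

```python
# Hypothetical rule-driven coverage: every operator test module can parametrize
# over a shared, maintained list of BSON-type inputs.
from datetime import datetime, timezone

import pytest
from bson import ObjectId
from bson.decimal128 import Decimal128

BSON_TYPE_SAMPLES = [
    None,
    True,
    42,
    3.14,
    Decimal128("1.5"),
    "a string",
    [],
    {"nested": 1},
    ObjectId(),
    datetime(2024, 1, 1, tzinfo=timezone.utc),
]


@pytest.mark.parametrize("value", BSON_TYPE_SAMPLES)
def test_operator_against_each_bson_type(collection, value):
    collection.insert_one({"a": value})
    # Placeholder: the per-type expectation (result vs. TypeMismatch error)
    # would come from the maintained rule table for the operator under test.
    ...
```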
rfcs/005-functional-testing.md
Outdated
- Executes individual tests from the specified test suite
- Manages test lifecycle (setup, run test, collect result, cleanup)
- Handles test isolation (namespace isolation, data seeding)
- Runs tests in parallel internally using threads or processes based on parallelism configuration
The Test Runner section also mentioned that it handles parallelism. The differentiation between these two layers isn't clear to me.
Discussed this:
- the parallelism means running multiple tests in parallel, not the parallel command execution needed within each test.
- the runner only plans the execution; the executor does the rest.
rfcs/005-functional-testing.md
Outdated
```python
result_list = list(result)

# Assertions
assert len(result_list) == 2, "Expected to find exactly 2 electronics products"
```
Can you show what log we generate and pass to the Analyzer?
Discussed this: we can use shared assertion functions to separate the raw logs from the assertion result format. The Analyzer will only parse the formatted assertion results; raw logs are used for debugging.
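A minimal sketch of such a shared assertion helper, under the assumption that the Analyzer parses a fixed failure-message format while raw output goes to a debug log; the helper name and message format are hypothetical:

```python
# Hypothetical shared assertion helper: raw command output is written to the debug
# log, and the assertion failure message follows a fixed, machine-parseable format.
import json
import logging

log = logging.getLogger("functional_tests")


def assert_result_equal(actual, expected, context=""):
    log.debug("raw result (%s): %s", context, json.dumps(actual, default=str))
    assert actual == expected, (
        f"ASSERTION|{context}|expected={expected!r}|actual={actual!r}"
    )
```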
rfcs/005-functional-testing.md
Outdated
Fixtures contain test data (documents) that get inserted into the test collection.

Example - Product documents for testing (products.json):
I don't think real application data adds value to functional tests, e.g. every price being a double.
Doesn't necessarily need to be real application data. We should use whatever data makes sense to appropriately test the desired behavior.
rfcs/005-functional-testing.md
Outdated
2. **Failed on DocumentDB Only (Compatibility Issue)**
   - Test passed on MongoDB but failed on DocumentDB
   - Indicates a compatibility gap
   - **This is the primary concern** - these tests identify areas where DocumentDB has gaps in functionality

3. **Failed on MongoDB Only (Difference in Behavior)**
How can a test run have assertions that fail on DocumentDB while another test fails on MongoDB? The expected outputs of a test should come from either DocumentDB or MongoDB.
I think your question is how can the test fail on both MongoDB and DocumentDB when in reality we would have taken the test assertion from at least one of them. This can happen when there is a regression on the engine where the test was originally passing.
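To make the classification concrete, a sketch of how the analyzer could bucket a test's pass/fail outcomes from two engines; the function and label names are illustrative:

```python
# Hypothetical outcome classification for one test id across two engines.
def classify(documentdb_passed: bool, mongodb_passed: bool) -> str:
    if documentdb_passed and mongodb_passed:
        return "compatible"            # passed on both
    if mongodb_passed:
        return "compatibility_gap"     # failed on DocumentDB only: the primary concern
    if documentdb_passed:
        return "behavior_difference"   # failed on MongoDB only, e.g. a regression there
    return "failed_on_both"            # likely a regression against the original baseline
```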
rfcs/005-functional-testing.md
Outdated
The functional testing framework consists of several key components that work together to provide comprehensive testing capabilities.

**1. Test Runner:**
Are there any existing frameworks and runners we can leverage to do this? Like pytest or some base framework that does the heavy lifting for us here, where we only need to provide libraries to extend the db-specific stuff?
Yes I was considering pytest and it can actually handle the vast majority of this framework's desired functionality. I think we can go with that. We can keep the test result analyzer component which will do the post processing after we obtain the test results.
rfcs/005-functional-testing.md
Outdated
**3. Result Analyzer:**
- **Purpose**: Analyzes and compares test results
- **Responsibilities**:
  - Compares results between DocumentDB and MongoDB
What does comparison mean here? Would appreciate more detail on this one - I presume a test generates a pass or a fail explicitly.
Also, why is this comparison between DocumentDB and MongoDB? Isn't the test written to define a contract? The underlying presumption is that if the test passes, we meet the "spec" or the contract here. We could separately run this against other database implementations such as MongoDB to ascertain the relative merits of the tests and the contract, but why is that coupled here?
I think the original wording in the RFC is confusing. What I meant was that the test cases would be written with specific asserts, and those tests could be run against any engine that supports the Mongo protocol. The comparison here just means comparing the result (pass or fail) of the test cases across different engines. It doesn't mean comparing the actual result of each query across the various engines under consideration and determining whether they are the same or different.
So essentially the goal is to do exactly what you mentioned, visridha.
In short, we would do specification-based testing, not result comparison testing
rfcs/005-functional-testing.md
Outdated
```
functional-tests/
├── documentdb-functional-test/    # DocumentDB functional tests
│   ├── test_find.py
```
Are we planning to have only one test file for something like find? Is that desirable? How would you split up stuff like aggregation pipelines?
And wouldn't this cause test files to be several thousands of lines? Is there a flow that allows for splitting these up more?
The directory structure shown was just a high-level example. We'd definitely need more granular organization. For large features like find, we'd split by sub-functionality: basic queries, query operators, logical operators, projections, sorting, cursors, etc.
rfcs/005-functional-testing.md
Outdated
```python
@parallel_safe
class TestRBACFind(DocumentDBTestCase):

    def setUp(self):
```
nit: style consistency - are we doing methods as snake_cased or camelCased?
Good catch! We should follow Python conventions and use snake_case for all method names. I'll update the examples to be consistent throughout the RFC.
rfcs/005-functional-testing.md
Outdated
**Test Outcome Classification:**
For each test, analyze the results from both databases:

1. **Passed on Both (Compatible)**
This should be a separate entity - also, "Both" seems disingenuous; the test should describe a spec and should be targeted at an implementation. I would think this shouldn't be comparing against native Mongo as part of this spec. IMO we should set up a separate flow that compares across database implementations, which may include MongoDB or any other implementation as needed.
I think this question is similar to the previous one. I wasn't clear in the RFC that we will do specification based testing. To run the test against multiple engines we will simply provide an option when running tests (e.g., --engines=documentdb,mongodb,other-engine).
- Changed to pytest-based approach instead of custom framework
- Focused on functional correctness rather than compatibility testing
- Updated architecture to use pytest + Result Analyzer
- Added multi-cloud examples (AWS DocumentDB, Azure Cosmos DB)
- Simplified test organization and fixture approach
- Removed test dependencies and distributed execution complexity
- Updated tagging system to use pytest markers
- Combined distribution and running sections for better flow
This RFC proposes a functional testing framework to validate DocumentDB's correctness using specification-based testing.
Key Components:
• pytest-based framework: Leverages proven testing infrastructure instead of custom components
• Result Analyzer: Post-processes pytest output to generate functionality metrics
• Self-contained tests: Each test defines explicit specifications for DocumentDB behavior
• Multi-dimensional tagging: Uses pytest markers for test organization and filtering
Architecture:
• pytest + custom fixtures: Handles test discovery, execution, and parallelization
• Multiprocessing: Uses pytest-xdist to avoid Python GIL limitations
• Flexible deployment: Both git repository (contributors) and Docker image (cluster testing)
• Engine-agnostic: Tests can run against any MongoDB wire protocol-compatible engine
Output Formats:
• JSON Report: Machine consumption (APIs, monitoring, historical tracking)
• JUnit XML: GitHub Actions integration (PR reviews, build gating)
• Dashboard: Visual consumption (charts, trend analysis)
Key Benefits:
• Specification-based: Tests define expected DocumentDB behavior, not comparisons
• Easy contribution: Familiar pytest fixtures and markers
• Future-proof: Supports DocumentDB-unique features and functionality
The framework enables contributors to easily write functional tests that validate DocumentDB correctness against explicit specifications.
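For illustration, a specification-based test under this framework might look like the sketch below; the `collection` fixture and the `documents`/feature markers follow the examples discussed in the review thread and are not final API:

```python
# test_find_basic.py -- illustrative specification-based test
import pytest

pytestmark = [pytest.mark.find, pytest.mark.smoke]


@pytest.mark.documents([{"category": "electronics"}, {"category": "books"}])
def test_find_by_category_returns_matching_documents(collection):
    result = list(collection.find({"category": "electronics"}))
    # The expectation is an explicit specification of DocumentDB behavior,
    # not a comparison against another engine's output.
    assert len(result) == 1
    assert result[0]["category"] == "electronics"
```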
Closes #367