liferoad
Contributor

@liferoad liferoad commented Aug 24, 2025

Addresses #35858

Update MongoDB Java driver from 3.12.11 to 5.5.0 and refactor code to use new API
Add mongo-bson dependency required by new driver version
Replace deprecated MongoClient with MongoClients and update GridFS implementation

MongoDB Java driver 5.5.0 supports MongoDB >=6.0

https://www.mongodb.com/docs/drivers/java/sync/current/reference/compatibility/


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

Update MongoDB Java driver from 3.12.11 to 5.5.0 and refactor code to use new API
Add mongo-bson dependency required by new driver version
Replace deprecated MongoClient with MongoClients and update GridFS implementation
Replace deprecated MongoClient with MongoClients.create() and update database drop method
Add mongodb-driver-core to support MongoDB Java driver functionality.
Also mark mongo_java_driver as permitUnusedDeclared and add testImplementation.
Update embedded MongoDB test dependency to version 3.5.4 and simplify split key filtering logic by using BsonObjectId for range queries. This ensures proper type handling when filtering MongoDB documents by _id field.
Add mongodb-driver-core version 5.5.0 to support MongoDB Java driver functionality
@liferoad
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request upgrades the MongoDB Java driver from version 3.12.11 to 5.5.0. The changes correctly refactor the code to use the new driver's API, especially for MongoClient creation and the GridFS implementation. The dependency updates in the build files are also appropriate. I have a couple of suggestions for improvement: one to use a library alias for a dependency for better maintainability, and another to simplify a method by removing a redundant check.

@liferoad
Contributor Author

A lot of changes. If we think these are good, I can update CHANGES.md. We also need to consider Dataflow Templates.

Remove redundant null check and consolidate uri handling in MongoDbGridFSIO
@liferoad liferoad changed the title from "[Test-only] feat(mongodb): upgrade MongoDB Java driver to version 5.5.0" to "feat(mongodb): upgrade MongoDB Java driver to version 5.5.0" Aug 24, 2025
@liferoad liferoad requested a review from Abacn August 24, 2025 21:03
@liferoad liferoad marked this pull request as ready for review August 24, 2025 21:03
Contributor

Assigning reviewers:

R: @ahmedabu98 for label java.
R: @Abacn for label build.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Contributor

@Abacn Abacn left a comment

I remember this update is not trivial: #34100. Just had one comment. The change looks good if we are confident about current test coverage.

Contributor

@Abacn Abacn left a comment

Thanks!

Consider adding an item to CHANGES.md, since this is a major version bump?

@liferoad
Contributor Author

Thanks!

Consider adding an item to CHANGES.md, since this is a major version bump?

Let me do this later. I want to merge this first and then use SNAPSHOT to test Dataflow Templates to get more testing coverage.

@liferoad liferoad merged commit cfd07be into apache:master Aug 27, 2025
29 checks passed
damccorm added a commit that referenced this pull request Aug 27, 2025
* sdks/python: properly make milvus as extra dependency

* sdks/python: update image requirements

* .github: trigger postcommit python

* sdks/python: fix linting issues

* sdks/python: fix formatting issues

* .github: trigger beam postcommit python

* sdks/python: revert milvus version in itests

* sdks/python: update image requirements

* trigger_files: trigger postcommit python

* Bump github.com/docker/go-connections from 0.5.0 to 0.6.0 in /sdks (#35906)

Bumps [github.com/docker/go-connections](https://github.com/docker/go-connections) from 0.5.0 to 0.6.0.
- [Commits](docker/go-connections@v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: github.com/docker/go-connections
  dependency-version: 0.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add the readme link to new YAML examples (#35941)

* Bump google.golang.org/api from 0.247.0 to 0.248.0 in /sdks (#35969)

* Remove mysql-connector-python dependency (#35932)

* Fix typos and update test implementation from #35656 (#35958)

* implement lambda name pickling in cloudpickle

* add enable_lambda_name to __init__

* fix formatting and lint

* fix typo

* fix code paths in test

* fix tests

* fix lint

* fix formatting and failing test

* fix formatting again

* remove cloudpickle implementation to leave only typo fixes and fixing test structure.

* fix _make_function typo

* revert regex

* fix failing tests

* fix formatting

* update prefix to not hardcode

* feat(mongodb): upgrade MongoDB Java driver to version 5.5.0 (#35946)

* feat(mongodb): upgrade MongoDB Java driver to version 5.5.0

Update MongoDB Java driver from 3.12.11 to 5.5.0 and refactor code to use new API
Add mongo-bson dependency required by new driver version
Replace deprecated MongoClient with MongoClients and update GridFS implementation

* refactor(mongodb): update MongoDB client usage to modern API

Replace deprecated MongoClient with MongoClients.create() and update database drop method

* build(dependencies): add mongodb driver core dependency

Add mongodb-driver-core to support MongoDB Java driver functionality.
Also mark mongo_java_driver as permitUnusedDeclared and add testImplementation.

* fix(mongodb): update embedded mongo version and fix split key filtering

Update embedded MongoDB test dependency to version 3.5.4 and simplify split key filtering logic by using BsonObjectId for range queries. This ensures proper type handling when filtering MongoDB documents by _id field.

* build: add mongodb-driver-core dependency

Add mongodb-driver-core version 5.5.0 to support MongoDB Java driver functionality

* use version

* refactor: simplify mongo client creation logic

Remove redundant null check and consolidate uri handling in MongoDbGridFSIO

* Bump github.com/aws/aws-sdk-go-v2/credentials in /sdks (#35974)

Bumps [github.com/aws/aws-sdk-go-v2/credentials](https://github.com/aws/aws-sdk-go-v2) from 1.18.6 to 1.18.7.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/config/v1.18.7/CHANGELOG.md)
- [Commits](aws/aws-sdk-go-v2@config/v1.18.6...config/v1.18.7)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/credentials
  dependency-version: 1.18.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump google.golang.org/grpc from 1.74.2 to 1.75.0 in /sdks (#35971)

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.74.2 to 1.75.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.74.2...v1.75.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.75.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Override localhost endpoint when a worker is running in docker on mac (#35964)

* fix(parquetio): handle missing nullable fields in row conversion (#35948)

* fix(parquetio): handle missing nullable fields in row conversion

Add null value handling when converting rows to Arrow tables for nullable fields that are missing from input data. This fixes KeyError when writing to Parquet with missing nullable fields, addressing issue #35791.

* fix lint

* Bump cloud.google.com/go/storage from 1.56.0 to 1.56.1 in /sdks (#35980)

Bumps [cloud.google.com/go/storage](https://github.com/googleapis/google-cloud-go) from 1.56.0 to 1.56.1.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@spanner/v1.56.0...storage/v1.56.1)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/storage
  dependency-version: 1.56.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Prism] Fix segv when docker container self-terminated. (#35977)

* Fix segv when docker container is self-terminated

* Add some debug logging for docker and process env.

* add a jinja % include/import pipeline example to docs (#35931)

* add a jinja include pipeline example

* update yaml doc with import example

* address gemini and other comments

* fix table of contents for readme

* add link to jinja pipeline examples

* Bump github.com/aws/aws-sdk-go-v2/config from 1.31.2 to 1.31.3 in /sdks (#35983)

* Add a security GCP log analyzer (#35922)

* Add the base log_analyzer

* Add github action for security logging

* Enhance LogAnalyzer to filter logs by time range and include file names in event summary

* Add dry-run option for weekly email report generation in LogAnalyzer

* Better error handling for timezones and missing details

* Refactor LogAnalyzer to use SinkCls for type consistency and enhance bucket permission management for log sinks

* update py containers (#35982)

* [YAML]: add import jinja pipeline example (#35945)

* add import jinja pipeline example

* revert name change

* update overall examples readme

* fix lint issue

* fix gemini small issue

* Update sdks/python/apache_beam/yaml/examples/transforms/jinja/import/README.md

---------

Co-authored-by: tvalentyn <[email protected]>

* workflows: capture DinD tests in PreCommit Py Coverage workflow

* workflows: temporarily removing `ubuntu-latest` till resolving deps

* workflows: add `matrix.os` label to `beam_PreCommit_Python_Coverage`

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Mohamed Awnallah <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chamikara Jayalath <[email protected]>
Co-authored-by: Yi Hu <[email protected]>
Co-authored-by: kristynsmith <[email protected]>
Co-authored-by: liferoad <[email protected]>
Co-authored-by: Shunping Huang <[email protected]>
Co-authored-by: Derrick Williams <[email protected]>
Co-authored-by: Enrique Calderon <[email protected]>
Co-authored-by: Ahmed Abualsaud <[email protected]>
Co-authored-by: tvalentyn <[email protected]>
@Abacn
Contributor

Abacn commented Aug 28, 2025

Looks like this caused the PostCommit SQL test to fail (#35514): https://github.com/apache/beam/runs/49030465739

java.lang.AssertionError: 
    Expected: <Document{{$or=[Document{{c_varchar=varchar}}, Document{{c_varchar=Document{{$not=Document{{$eq=fakeString}}}}}}], c_boolean=true, c_integer=2147483647}}>
     but: was <Document{{$and=[Document{{$or=[Document{{c_varchar=varchar}}, Document{{c_varchar=Document{{$not=Document{{$eq=fakeString}}}}}}]}}, Document{{c_boolean=true}}, Document{{c_integer=2147483647}}]}}>
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)

Need to fix the assertion after checking whether it's a test issue.
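
For context, the two documents in the assertion look logically equivalent: MongoDB treats multiple top-level fields in a filter as an implicit AND, so the explicit $and form should select the same documents. Below is a minimal sketch of how a test could compare the two shapes. It uses plain java.util maps as a stand-in for org.bson.Document, and flattenTopLevelAnd and doc are hypothetical helpers for illustration, not part of Beam or the MongoDB driver.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FilterNormalizer {

  /**
   * Flattens a top-level explicit $and into an implicit AND.
   * Returns the input unchanged if it is not a lone top-level $and,
   * or if two clauses share a key (flattening would drop a condition).
   */
  static Map<String, Object> flattenTopLevelAnd(Map<String, Object> filter) {
    Object clauses = filter.get("$and");
    if (filter.size() != 1 || !(clauses instanceof List)) {
      return filter;
    }
    Map<String, Object> flat = new LinkedHashMap<>();
    for (Object clause : (List<?>) clauses) {
      if (!(clause instanceof Map)) {
        return filter;
      }
      for (Map.Entry<?, ?> e : ((Map<?, ?>) clause).entrySet()) {
        String key = String.valueOf(e.getKey());
        if (flat.containsKey(key)) {
          return filter; // duplicate key: keep the explicit $and
        }
        flat.put(key, e.getValue());
      }
    }
    return flat;
  }

  /** Builds an insertion-ordered map from alternating key/value arguments. */
  static Map<String, Object> doc(Object... kv) {
    Map<String, Object> m = new LinkedHashMap<>();
    for (int i = 0; i < kv.length; i += 2) {
      m.put((String) kv[i], kv[i + 1]);
    }
    return m;
  }

  public static void main(String[] args) {
    // The $or clause shared by both shapes in the failing assertion.
    Map<String, Object> orClause = doc("$or", Arrays.asList(
        doc("c_varchar", "varchar"),
        doc("c_varchar", doc("$not", doc("$eq", "fakeString")))));

    // Shape the test expected: implicit AND of top-level fields.
    Map<String, Object> expected = doc(
        "$or", orClause.get("$or"),
        "c_boolean", true,
        "c_integer", 2147483647);

    // Shape the test received: explicit $and wrapping each clause.
    Map<String, Object> actual = doc("$and", Arrays.asList(
        orClause, doc("c_boolean", true), doc("c_integer", 2147483647)));

    System.out.println(expected.equals(actual));                     // false
    System.out.println(expected.equals(flattenTopLevelAnd(actual))); // true
  }
}
```

If the real filters turn out to be equivalent in this way, updating the expected value in the test may be enough. Whether that is the right fix still depends on the check mentioned above.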

@liferoad
Contributor Author

#36004 to check whether we can fix the SQL postcommit.
