[native] Dynamically Linked Library in Presto CPP #24330

Open. soumiiow wants to merge 1 commit into base: master

@soumiiow soumiiow commented Jan 7, 2025

Description

Depends on facebookincubator/velox#11439 in the Velox space
and based off of the following PR: https://github.com/facebookincubator/velox/pull/1005/files

Motivation and Context

Having these changes will enable users to register custom functions dynamically without requiring a fork of Prestissimo.

Impact

This extends Prestissimo functionality to include dynamic loading of functions, types, connectors, etc.

Test Plan

Unit tested, and manually tested the changes end to end.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... :pr:`12345`
* ... :pr:`12345`

Hive Connector Changes
* ... :pr:`12345`
* ... :pr:`12345`

If release note is NOT required, use:

== NO RELEASE NOTE ==

@soumiiow soumiiow self-assigned this Jan 7, 2025

linux-foundation-easycla bot commented Jan 7, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: soumiiow / name: Soumya Duriseti (526ee07)

@soumiiow soumiiow marked this pull request as ready for review January 7, 2025 18:04
@soumiiow soumiiow requested a review from a team as a code owner January 7, 2025 18:04
@tdcmeehan tdcmeehan self-assigned this Jan 8, 2025
@tdcmeehan tdcmeehan added the from:IBM PR from IBM label Jan 30, 2025
@prestodb-ci prestodb-ci requested review from a team, pdabre12 and psnv03 and removed request for a team January 30, 2025 18:47
@czentgr czentgr left a comment

Please also rebase.

@@ -76,7 +76,8 @@ target_link_libraries(
 ${FOLLY_WITH_DEPENDENCIES}
 ${GLOG}
 ${GFLAGS_LIBRARIES}
-pthread)
+pthread
+velox_dynamic_function_loader)
Contributor

Move this before the velox_encode library.

@@ -89,6 +90,7 @@ set_property(TARGET presto_server_lib PROPERTY JOB_POOL_LINK
presto_link_job_pool)

add_executable(presto_server PrestoMain.cpp)
target_link_options(presto_server BEFORE PUBLIC "-Wl,-export-dynamic")
Contributor

Can we add a comment here why we need these flags?

const fs::path path(systemConfig->pluginDir());
PRESTO_STARTUP_LOG(INFO) << path;
std::error_code
ec; // For using the non-throwing overloads of functions below.
Contributor

We don't need this comment here, so we can also fix up the odd formatting.

void PrestoServer::registerDynamicFunctions() {
auto systemConfig = SystemConfig::instance();
if (!systemConfig->pluginDir().empty()) {
// if it is a valid directory, traverse and call dynamic function loader
Contributor

Please make sure the comments are full sentences beginning with capitalization etc.

auto dirEntryPath = dirEntry.path();
if (!fs::is_directory(dirEntry, ec) &&
extensions.find(dirEntryPath.extension()) != extensions.end()) {
facebook::velox::loadDynamicLibrary(dirEntryPath.c_str());
Contributor

facebook is not needed here because we are already in the facebook namespace.
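
For readers following the snippets above, here is a minimal sketch of the directory scan they describe: walk the plugin directory with the non-throwing std::filesystem overloads, skip subdirectories, and keep only files with a shared-library extension. The helper names and the extension list (.so, .dylib) are assumptions for illustration, not the PR's code, and the sketch records paths instead of calling velox::loadDynamicLibrary.

```cpp
#include <cassert>
#include <filesystem>
#include <set>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical predicate: true when the file name ends in a shared-library
// extension. The extension list is an assumption for this sketch.
bool hasSharedLibraryExtension(const fs::path& path) {
  static const std::set<std::string> kExtensions = {".so", ".dylib"};
  return kExtensions.count(path.extension().string()) > 0;
}

// Hypothetical helper: collects the non-directory entries directly inside
// pluginDir that look like shared libraries. Only non-throwing overloads are
// used, so an invalid or missing directory yields an empty result.
std::vector<fs::path> findSharedLibraries(const fs::path& pluginDir) {
  std::vector<fs::path> libraries;
  std::error_code ec;
  if (!fs::is_directory(pluginDir, ec)) {
    return libraries;
  }
  fs::directory_iterator it(pluginDir, ec);
  const fs::directory_iterator end;
  while (!ec && it != end) {
    std::error_code entryEc; // Kept separate so one bad entry does not stop the scan.
    const fs::path entryPath = it->path();
    if (!fs::is_directory(entryPath, entryEc) &&
        hasSharedLibraryExtension(entryPath)) {
      // The PR calls loadDynamicLibrary(entryPath.c_str()) at this point;
      // the sketch only records the path.
      libraries.push_back(entryPath);
    }
    it.increment(ec);
  }
  return libraries;
}
```

Separating the extension check from the traversal keeps the filter testable without touching the file system.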

@steveburnett steveburnett left a comment

Thanks for the doc! A few minor formatting and phrasing suggestions.

@steveburnett steveburnett left a comment

Thanks for the documentation! I really liked that you included the setup steps in plugin.rst. I have some minor suggestions for formatting and phrasing, but it looks good overall.

@soumiiow
Author

Thanks @steveburnett, please take another look; I've made the changes. I'm not sure about the tone of the intro to UDFs I have in function_plugin.rst and would appreciate another set of eyes there.

@steveburnett steveburnett left a comment

Thanks for the revision! Nice work, your unordered list of UDF benefits was great and the format fixes in the README look good.

I made a couple of small suggestions about the intro to UDFs, let me know what you think.

@steveburnett steveburnett left a comment

I should have noticed this spelling nit earlier! After I found this one, I did a complete review of the doc in this PR and found no other errors so I think this is the last one.

@soumiiow soumiiow requested a review from PingLiuPing February 19, 2025 22:19
@soumiiow
Author

Hey @pedroerp! I was looking through remote_function_server.json so that the config for the dylib changes is simple to use and similar to the remote function registrations, making for a seamless user experience.

I was curious about the intended purpose of the schema object in the remote function registrations.

In Prestissimo, when we use a prefix/namespace like presto.default, is the equivalent for remote function registrations that "default" is the schema and "presto" is the prefix?

Also, I notice that in PrestoServer.cpp, prefix is set to systemConfig->remoteFunctionServerCatalogName(). For remote functions, does this imply that the prefix name is the same for all remote functions and only the schema name changes?

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
add_subdirectory(examples)

I think there are potential issues with regard to upgrading.
You need to fix this by changing some script to trigger a rebuild of the dynamic library when upgrading (from the customer's perspective). This guarantees ABI compatibility in case of a major compiler upgrade.
However, this requires the source code to be under a specific directory of the Presto installation path.

Author

We discussed the ABI compatibility issues offline and settled on documentation changes that alert users to manually rebuild shared libraries after upgrades.
See presto-docs/src/main/sphinx/presto_cpp/plugin.rst

@soumiiow
Author

Updated the RFC (prestodb/rfcs#24) with the discussions about adding a JSON config, function validation, a customizable entrypoint, and signal handling so as not to crash the worker. Please take a look!

@soumiiow soumiiow force-pushed the dylib_new branch 2 times, most recently from a8d2204 to 173ecc9 Compare March 27, 2025 01:16
@steveburnett steveburnett left a comment

Thanks for the doc! This will be great to have in the doc. I have some suggestions for conciseness and readability, and a couple of questions for you to consider. Let me know what you think, please!

@aditi-pandit aditi-pandit left a comment

@soumiiow: Did a quick pass of your code. It's overall a good design for the first-cut implementation.

/// "my_function": [
/// {
/// "outputType": "integer",
/// "entrypoint": "nameOfRegistryFnCall",
Contributor

Fix the formatting of this line.

int64_t compareConfigWithRegisteredFunctionSignatures(
facebook::velox::FunctionSignatureMap fnSignaturesBefore);

std::
Contributor

Add a shorthand for
std::unordered_map<std::string, std::vector<velox::exec::FunctionSignaturePtr>> with a "using" statement.
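
A minimal sketch of what such a shorthand could look like. The alias name SignaturesByName is hypothetical, and FunctionSignature here is a stand-in struct so the example compiles without Velox; in the PR the element type would be velox::exec::FunctionSignaturePtr.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for velox::exec::FunctionSignature so the sketch compiles alone.
struct FunctionSignature {
  std::string returnType;
  std::vector<std::string> argumentTypes;
};
using FunctionSignaturePtr = std::shared_ptr<FunctionSignature>;

// The shorthand: one alias instead of spelling out the nested template type
// in every declaration. The alias name is hypothetical.
using SignaturesByName =
    std::unordered_map<std::string, std::vector<FunctionSignaturePtr>>;

// A declaration that reads more easily with the alias: counts all signatures
// across all function names.
std::size_t countSignatures(const SignaturesByName& signatures) {
  std::size_t total = 0;
  for (const auto& entry : signatures) {
    total += entry.second.size();
  }
  return total;
}
```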

DynamicLibraryValidator dVal(configPath, systemConfig->pluginDir());
auto filenameAndEntrypointMap = dVal.getEntrypointMap();
auto registeredFnSignaturesBefore = velox::getFunctionSignatures();
for (const auto& entryPointItr : filenameAndEntrypointMap) {
Contributor

All this logic can be moved into the DynamicLibraryValidator class (maybe call it DynamicLibraryLoader instead).

The code here would only read the config entries for plugin.dir and dynamiclibraryvalidator and pass them to DynamicLibraryLoader.

@@ -197,5 +197,52 @@ TEST_F(JsonSignatureParserTest, multiple) {
EXPECT_EQ(signature1->argumentTypes()[1].baseName(), "varchar");
}

TEST_F(JsonSignatureParserTest, dynamic) {
auto input = R"(
Contributor

Please add a function which uses a complex type for either paramType or outputType

}
})";

// Emulate user provided config file.
Contributor

There is some repetition of code between this and the test below. Can you make common functions for reuse?

Contributor

Is there a need to separate the directories for this file from the previous directory ?

If yes, then it might be worth trying a recursive directory scenario as well.

Contributor

Please remove the usage of "My" in the naming of these files.

@soumiiow soumiiow force-pushed the dylib_new branch 2 times, most recently from 94d57ba to 9ef0b63 Compare April 7, 2025 22:36
@steveburnett steveburnett left a comment

Thanks for the updated doc! Just a couple of small nits.

@steveburnett steveburnett left a comment

Thanks for the doc update! A couple minor formatting issues, nothing important.

@aditi-pandit aditi-pandit left a comment

Thanks @soumiiow. Have a bunch of minor comments.

@@ -188,6 +188,8 @@ class PrestoServer {

VeloxPlanValidator* getVeloxPlanValidator();

virtual void registerDynamicFunctions();
Contributor

virtual might not be needed here. Can you please remove it?

void PrestoServer::registerDynamicFunctions() {
auto systemConfig = SystemConfig::instance();
std::error_code ec;
const fs::path configPath(systemConfig->dynamicLibraryValidatorConfig());
Contributor

This code should check for empty config and just log a message and return in that case.

LOG(ERROR) << "Config file not found in path: " << configPath
<< ". Dynamic libraries will not be loaded on this worker run.";
return;
} else {
Contributor

else is not needed because of the return in the if part of the condition.

<< ". Dynamic libraries will not be loaded on this worker run.";
return;
} else {
if (!systemConfig->pluginDir().empty()) {
Contributor

If there is a config but the pluginDir is not there, then should there be a log message for it ?
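
The three review comments on this function (handle an empty config, drop the else after a return, and log when plugin.dir is missing) point toward a guard-clause shape. The sketch below illustrates that shape under stated assumptions: PluginSettings, LoadDecision, and decideDynamicLoading are hypothetical names standing in for the SystemConfig lookups, and the message strings are illustrative rather than the PR's wording.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two SystemConfig values involved.
struct PluginSettings {
  std::string validatorConfigPath; // Stand-in for dynamicLibraryValidatorConfig().
  std::string pluginDir;           // Stand-in for pluginDir().
};

enum class LoadDecision { kSkipNoConfig, kSkipNoPluginDir, kLoad };

// Each precondition returns early, so no else branch is needed and the main
// path stays at one indentation level. logMessage receives the text a real
// implementation would send to its logger.
LoadDecision decideDynamicLoading(
    const PluginSettings& settings,
    std::string& logMessage) {
  if (settings.validatorConfigPath.empty()) {
    logMessage = "No dynamic library config set. Skipping dynamic libraries.";
    return LoadDecision::kSkipNoConfig;
  }
  if (settings.pluginDir.empty()) {
    logMessage = "Config is set but plugin.dir is empty. Skipping dynamic libraries.";
    return LoadDecision::kSkipNoPluginDir;
  }
  logMessage = "Loading dynamic libraries from " + settings.pluginDir;
  return LoadDecision::kLoad;
}
```

Returning a decision value rather than logging directly keeps the branching logic easy to unit test; whether each skipped case should log at INFO or ERROR is a separate choice.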

# See the License for the specific language governing permissions and
# limitations under the License.

add_library(presto_function_my_dynamic SHARED DynamicFunction.cpp)
Contributor

Please remove the usage of "my" in all the library names, etc in this file.

# See the License for the specific language governing permissions and
# limitations under the License.

add_library(presto_function_my_dynamic SHARED DynamicFunction.cpp)
Contributor

It's all right to remove the presto prefix as well.

outputFile.close();
}

void emulateConfigFile(std::string_view json, std::string pathStr) {
Contributor

Remove commented code.

outputFile.close();
}

void emulateConfigFile(std::string_view json, std::string pathStr) {
Contributor

Nit: rename to "createConfigFile".
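
As an illustration of the renamed test helper, here is a minimal sketch. Only the name createConfigFile and the parameter list come from the thread; the body, the bool return value, and the readConfigFile companion are assumptions made so the example is self-contained.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <string_view>

// Writes json to pathStr, replacing any existing content. Returns false if
// the file cannot be opened or the write fails. The body is an assumption;
// the PR's helper returns void.
bool createConfigFile(std::string_view json, const std::string& pathStr) {
  std::ofstream outputFile(pathStr, std::ios::out | std::ios::trunc);
  if (!outputFile.is_open()) {
    return false;
  }
  outputFile << json;
  outputFile.close();
  return !outputFile.fail();
}

// Hypothetical companion: reads the whole file back so a test can verify
// what was written. Returns an empty string if the file cannot be read.
std::string readConfigFile(const std::string& pathStr) {
  std::ifstream inputFile(pathStr);
  std::stringstream buffer;
  buffer << inputFile.rdbuf();
  return buffer.str();
}
```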


// Check functions do not exist first.
auto registeredFnSignaturesBefore = velox::getFunctionSignatures();
EXPECT_TRUE(
Contributor

Write functions for existsBefore/After or notExistsBefore/After and use them in the tests instead of doing a find and checking the returned iterator.
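
One way to read this suggestion, sketched with a stand-in map type: wrap the find-and-compare in small named predicates so the tests state intent directly. SignatureMap below uses plain strings instead of Velox signature pointers, and both helper names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for the map returned by velox::getFunctionSignatures(); the
// signatures are plain strings here so the sketch compiles without Velox.
using SignatureMap = std::unordered_map<std::string, std::vector<std::string>>;

// True when a function with this name is present in the snapshot.
bool functionExists(const SignatureMap& signatures, const std::string& name) {
  return signatures.find(name) != signatures.end();
}

// True when name is absent from the snapshot taken before loading and present
// in the snapshot taken afterwards.
bool registeredByLoad(
    const SignatureMap& before,
    const SignatureMap& after,
    const std::string& name) {
  return !functionExists(before, name) && functionExists(after, name);
}
```

A test could then use EXPECT_TRUE(registeredByLoad(before, after, name)) in place of two iterator comparisons.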

@steveburnett steveburnett left a comment

Thanks for the update! Just one nit of formatting.

@soumiiow soumiiow force-pushed the dylib_new branch 2 times, most recently from d9ea7d6 to 55d1941 Compare April 29, 2025 16:43
@steveburnett steveburnett left a comment

Hi! I think these suggestions should fix the doc.

@czentgr czentgr left a comment

Before looking too deeply into this, there are failures in the CI. Please investigate.

@@ -100,6 +105,16 @@ set_property(TARGET presto_server_lib PROPERTY JOB_POOL_LINK
presto_link_job_pool)

add_executable(presto_server PrestoMain.cpp)
target_link_options(presto_server PRIVATE "-no-pie")
Contributor

This might be a problem because there are security concerns with this. Is this because the shared library loaded is not setting PIC?

Author

I see what you mean; it looks like this has something to do with position-independent code (PIC).

However, when I remove this flag on my Linux build and run make, I get the error below:
/usr/bin/ld: /usr/local/lib/libthriftprotocol.a(CompactProtocol.cpp.o): relocation R_X86_64_PC32 against undefined hidden symbol `_ZTCN6apache6thrift8protocol18TProtocolExceptionE0_NS0_17TLibraryExceptionE' can not be used when making a PIE object
/usr/bin/ld: final link failed: bad value

Author

It looks like the Thrift library is being built statically, and PIC is not being set for it because it is a static library.

@@ -70,12 +83,19 @@ JsonSignatureParser::FunctionSignatureItem parseSignature(
 std::move(paramTypeSignatures),
 std::move(constantArguments),
 /*variableArity=*/false),
-(schema != nullptr) ? schema->asString() : ""};
+(schema != nullptr) ? schema->asString() : "",
Contributor

This looks weird because these are additional arguments to FunctionSignature so should all be aligned with the previous ones. The make format-fix didn't do anything here?

Author

Hmm, I just reran make format-fix and nothing changed.
I think (schema != nullptr) ? schema->asString() : "", is an additional argument to FunctionSignatureItem, not FunctionSignature, and is therefore indented at the same level as FunctionSignature.

@soumiiow
Author

soumiiow commented May 5, 2025

@czentgr I am currently expecting CI build errors because the relevant Velox PR, facebookincubator/velox#12878, is still in review and has not been merged yet.

Other than that, I see CI errors in product-tests-specific-environment2 that are unrelated to this change; I will look into them.

@aditi-pandit aditi-pandit left a comment

@soumiiow :

Have you used the functions in the examples folder anywhere ... don't think they are exercised in any test ? Are you intending that they will be used in an e2e test ?

}
if (!systemConfig->pluginDir().empty()) {
const fs::path path(systemConfig->pluginDir());
PRESTO_STARTUP_LOG(INFO) << "Dynamic library loading path: " << path;
Contributor

Change this log message to "Loading dynamic libraries from in "

<< systemConfig->pluginDir() << "is invalid.";
}
} else {
LOG(ERROR) << "plugin.dir configuration not set up.";
Contributor

Many installations, such as Meta's, will not set up plugin.dir, so this shouldn't be an error, right?

Author

That's a great point. @aditi-pandit, is it OK to make it a LOG(INFO) statement, or would it be unnecessary given that not all installations will want to set a plugin dir?

std::string name;
if (nameSpace.empty()) {
// auto systemConfig = SystemConfig::instance();
// std::string defaultPrefix = systemConfig->prestoDefaultNamespacePrefix();
Contributor

Remove commented code

namespace facebook::velox::common::dynamicRegistry {
template <typename T>
struct DynamicFunction {
FOLLY_ALWAYS_INLINE bool call(int64_t& result, int64_t in) {
Contributor

This is almost the same as the previous function. Is this needed ?

Author

Good point. Getting rid of this.

@soumiiow
Author

soumiiow commented May 5, 2025

Have you used the functions in the examples folder anywhere ... don't think they are exercised in any test ? Are you intending that they will be used in an e2e test ?

Hi @aditi-pandit, yes, currently they're not used in any tests; they'll serve as examples alongside the Presto documentation for now. The idea is that the e2e tests should use these examples to test for proper registration.

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull updated branch, new local doc build. Thanks!

Labels
from:IBM PR from IBM
7 participants