RapidUDF is a high-performance SIMD vectorized expression/script computation execution engine library designed for online systems. It can be used in scenarios requiring high performance and flexibility such as rule engines, storage systems, and feature computation.
- C++17
- Easy to Use:
- Provides support for conventional expression syntax
- For more complex logic, supports a C-like DSL including if-elif-else* conditional control, while loop control, auto temporary variables, etc.;
- For columnar memory data (vector), provides dynamic Table APIs similar to Spark's DataFrame and operations like filter/order_by/topk/take;
- High Performance:
- Based on LLVM JIT compilation, startup and execution performance comparable to native cpp implementation;
- For columnar memory data (vector), provides SIMD vectorization acceleration implementation
- Thread Safe:
- State-less JIT-generated C methods are naturally thread-safe
- FFI:
- Supports zero-cost access to C++ defined class objects (custom classes/stl/protobufs/flatbuffers/...) in expressions/UDFs
- Supports zero-cost calls to methods/class methods defined in C++ within expressions/UDFs
- Rich Built-in Data Types, Operators, and Functions:
Compilation requires a compiler that supports C++17
Add in WORKSPACE:
git_repository(
name = "rapidudf",
remote = "https://github.com/yinqiwen/rapidudf.git",
commit = "...",
)
load("@rapidudf//:rapidudf.bzl", "rapidudf_workspace")
rapidudf_workspace()Add in the BUILD file for relevant code compilation rules:
cc_library(
name = "mylib",
srcs = ["mylib.cc"],
hdrs = [
"mylib.h",
],
deps = [
"@rapidudf",
],
)First, compile and instal rapidudf
cd <rapidudf src dir>
mkdir build; cd build;
cmake ..
make installAdd the following to the CMake configuration of the related project:
find_package(rapidudf REQUIRED)
....
# link rapidudf
target_link_libraries(mylib PRIVATE rapidudf::rapidudf)#include "rapidudf/rapidudf.h"
int main() {
// 1. If needed, set up rapidudf logger
// std::shared_ptr<spdlog::logger> mylogger;
// rapidudf::set_default_logger(mylogger);
// 2. Expression string
std::string expression = "x >= 1 && y < 10";
// 3. Compile to generate Function, the generated Function object can be saved for subsequent repeated execution; compilation usually takes between 10ms-100ms;
rapidudf::JitCompiler compiler;
// CompileExpression's template parameters support multiple types, the first template parameter is the return type, the rest are function parameter types;
// Variable names used in the expression need to be passed in as a parameter name list, otherwise compilation fails
auto result = compiler.CompileExpression<bool, int, int>(expression, {"x", "y"});
if (!result.ok()) {
RUDF_ERROR("{}", result.status().ToString());
return -1;
}
// 4. Execute function
rapidudf::JitFunction<bool, int, int> f = std::move(result.value());
bool v = f(2, 3); // true
v = f(0, 1); // false
return 0;
};Fibonacci function
#include "rapidudf/rapidudf.h"
int main() {
// 1. If needed, can set up rapidudf logger
// std::shared_ptr<spdlog::logger> mylogger;
// rapidudf::set_default_logger(mylogger);
// 2. UDF string
std::string source = R"(
int fib(int n)
{
if (n <= 1){
return n;
}
// Supports cpp // comments
return fib(n - 1) + fib(n - 2); // Recursive call
}
)";
// 3. Compile to generate Function, the generated Function object can be saved for subsequent repeated execution; compilation usually takes between 10ms-100ms;
rapidudf::JitCompiler compiler;
// CompileFunction's template parameters support multiple types, the first template parameter is the return type, the rest are function parameter types
auto result = compiler.CompileFunction<int, int>(source);
if (!result.ok()) {
RUDF_ERROR("{}", result.status().ToString());
return -1;
}
// 4. Execute function
rapidudf::JitFunction<int, int> f = std::move(result.value());
int n = 9;
int x = f(n); // 34
RUDF_INFO("fib({}):{}", n, x);
return 0;
};#include "rapidudf/rapidudf.h"
using namespace rapidudf;
int main() {
// 2. UDF string
std::string source = R"(
simd_vector<f32> boost_scores(Context ctx, simd_vector<string_view> location, simd_vector<f32> score)
{
auto boost=(location=="home"?2.0_f32:0_f32);
return score*boost;
}
)";
// 3. Compile to generate Function, the generated Function object can be saved for subsequent use
rapidudf::JitCompiler compiler;
// CompileFunction's template parameters support multiple types, the first template parameter is the return type, the rest are function parameter types
// 'rapidudf::Context' is a mandatory parameter involved in arena memory allocation in the simd implementation
auto result =
compiler.CompileFunction<simd::Vector<float>, rapidudf::Context&, simd::Vector<StringView>, simd::Vector<float>>(
source);
if (!result.ok()) {
RUDF_ERROR("{}", result.status().ToString());
return -1;
}
// 4.1 Test data, need to convert raw data into columnar data
std::vector<float> scores;
std::vector<std::string> locations;
for (size_t i = 0; i < 4096; i++) {
scores.emplace_back(1.1 + i);
locations.emplace_back(i % 3 == 0 ? "home" : "other");
}
// 5. Execute function
rapidudf::Context ctx;
auto f = std::move(result.value());
auto new_scores = f(ctx, ctx.NewSimdVector(locations), ctx.NewSimdVector(scores));
for (size_t i = 0; i < new_scores.Size(); i++) {
// RUDF_INFO("{}", new_scores[i]);
}
return 0;
};RapidUDF supports dynamically creating vector tables, allowing arbitrary computational operations on table columns (accelerated through SIMD) in expressions/UDFs; The table class also provides operations similar to Spark DataFrame, such as:
.filter(simd::Vector<Bit>)returns a new table instance filtered by condition.order_by(simd::Vector<T> column, bool descending)returns a new table instance sorted by condition.topk(simd::Vector<T> column, uint32_t k, bool descending)returns a new table instance with top k entries
#include "rapidudf/rapidudf.h"
using namespace rapidudf;
struct Student {
std::string name;
uint16_t age = 0;
float score = 0;
bool gender = false;
};
RUDF_STRUCT_FIELDS(Student, name, age, score, gender)
int main() {
// 1. Create table schema
auto schema =
simd::TableSchema::GetOrCreate("Student", [](simd::TableSchema* s) { std::ignore = s->AddColumns<Student>(); });
// 2. UDF string, table<TABLE_NAME> generic format where TABLE_NAME must match the previously created table schema name
// table supports filter/order_by/topk/take, etc. operations
std::string source = R"(
table<Student> select_students(Context ctx, table<Student> x)
{
auto filtered = x.filter(x.score >90 && x.age<10);
// Sort by score in descending order and take top 10
return filtered.topk(filtered.score,10,true);
}
)";
// 3. Compile to generate Function, the generated Function object can be saved for subsequent use
rapidudf::JitCompiler compiler;
// CompileFunction's template parameters support multiple types, the first template parameter is the return type, the rest are function parameter types
auto result = compiler.CompileFunction<simd::Table*, Context&, simd::Table*>(source);
if (!result.ok()) {
RUDF_ERROR("{}", result.status().ToString());
return -1;
}
auto f = std::move(result.value());
// 4.1 Test data, need to convert raw data into columnar data
std::vector<Student> students;
for (size_t i = 0; i < 128; i++) {
float score = (i + 1) % 150;
uint16_t age = i % 5 + 8;
bool gender = i % 2 == 0;
students.emplace_back(Student{"test_" + std::to_string(i), age, score, gender});
}
// 4.2 Create table instance
rapidudf::Context ctx;
auto table = schema->NewTable(ctx);
std::ignore = table->AddRows(students);
// 5. Execute function
auto result_table = f(ctx, table.get());
auto result_scores = result_table->Get<float>("score").value();
auto result_names = result_table->Get<StringView>("name").value();
auto result_ages = result_table->Get<uint16_t>("age").value();
auto result_genders = result_table->Get<Bit>("gender").value();
for (size_t i = 0; i < result_scores.Size(); i++) {
RUDF_INFO("name:{},score:{},age:{},gender:{}", result_names[i], result_scores[i], result_ages[i],
result_genders[i] ? true : false);
}
return 0;
};RapidUDF can also create a table from Protobuf/Flatbuffers, avoiding the tedious process of creating a TableSchema. Building table instances can be done directly from arrays of Protobuf objects such as std::vector<T>, std::vector<const T*>, std::vector<T*>.
Here is an example of creating a vector table based on Protobuf;
Examples based on flatbuffers can be found in fbs_vector_table_udf;
Examples based on struct can be found in struct_vector_table_udf;
#include "rapidudf/examples/student.pb.h"
#include "rapidudf/rapidudf.h"
using namespace rapidudf;
int main() {
// 1. Create table schema
auto schema = simd::TableSchema::GetOrCreate(
"Student", [](simd::TableSchema* s) { std::ignore = s->AddColumns<examples::Student>(); });
// 2. UDF string
std::string source = R"(
table<Student> select_students(Context ctx, table<Student> x)
{
auto filtered = x.filter(x.score >90 && x.age<10);
// Sort in descending order
return filtered.topk(filtered.score,10, true);
}
)";
// 3. Compile to generate Function, the generated Function object can be saved for subsequent use
rapidudf::JitCompiler compiler;
auto result = compiler.CompileFunction<simd::Table*, Context&, simd::Table*>(source);
if (!result.ok()) {
RUDF_ERROR("{}", result.status().ToString());
return -1;
}
auto f = std::move(result.value());
// 4.1 Test data
std::vector<examples::Student> students;
for (size_t i = 0; i < 150; i++) {
examples::Student student;
student.set_score((i + 1) % 150);
student.set_name("test_" + std::to_string(i));
student.set_age(i % 5 + 8);
students.emplace_back(std::move(student));
}
// 4.2 Create table instance and populate data
rapidudf::Context ctx;
auto table = schema->NewTable(ctx);
std::ignore = table->AddRows(students);
// 5. Execute function
auto result_table = f(ctx, table.get());
// 5.1 Fetch columns
auto result_scores = result_table->Get<float>("score").value();
auto result_names = result_table->Get<StringView>("name").value();
auto result_ages = result_table->Get<int32_t>("age").value();
for (size_t i = 0; i < result_scores.Size(); i++) {
RUDF_INFO("name:{},score:{},age:{}", result_names[i], result_scores[i], result_ages[i]);
}
return 0;
};RapidUDF incorporates an LRU cache with keys as the string of expressions/UDFs. Users can retrieve compiled JitFunction objects from the cache to avoid parse/compile overhead each time they are used:
std::vector<int> vec{1, 2, 3};
JitCompiler compiler;
JsonObject json;
json["key"] = 123;
std::string content = R"(
bool test_func(json x){
return x["key"] == 123;
}
)";
auto rc = GlobalJitCompiler::GetFunction<bool, const JsonObject&>(content);
ASSERT_TRUE(rc.ok());
auto f = std::move(rc.value());
ASSERT_TRUE(f(json));
ASSERT_FALSE(f.IsFromCache()); // 第一次编译
rc = GlobalJitCompiler::GetFunction<bool, const JsonObject&>(content);
ASSERT_TRUE(rc.ok());
f = std::move(rc.value());
ASSERT_TRUE(f(json));
ASSERT_TRUE(f.IsFromCache()); //后续从cache中获取- Using Custom C++ Classes in Expressions/UDFs
- Using Member Functions of Custom C++ Classes in Expressions/UDFs
- Using Protobuf Objects in Expressions/UDFs
- Using FlatBuffers Objects in Expressions/UDFs
- Using STL Objects in Expressions/UDFs
There are more examples for different scenarios in the tests code directory.
Since RapidUDF is based on LLVM Jit, it theoretically can achieve performance very close to native C++ code. Comparison results for compiling the Fibonacci method with O0:
Benchmark Time CPU Iterations
---------------------------------------------------------------
BM_rapidudf_fib_func 22547 ns 22547 ns 31060
BM_native_fib_func 38933 ns 38933 ns 17964
Fibonacci method GCC O2 compilation comparison results:
Benchmark Time CPU Iterations
---------------------------------------------------------------
BM_rapidudf_fib_func 22557 ns 22555 ns 31065
BM_native_fib_func 19246 ns 19239 ns 36395
Note: The Jit implementation currently uses the same jit compilation logic under O0/O2 compilation switches; theoretically, the generated code should be identical.
The following tests were run on a CPU that supports AVX2, with the compilation optimization flag O2, and an array length of 4099.
The calculation is to execute the double array x + (cos(y - sin(2 / x * pi)) - sin(x - cos(2 * y / pi))) - y; theoretically, the acceleration ratio should be the multiple of the AVX2 register width to the double width, which is 4.
Actual results are as follows, showing that the acceleration ratio has exceeded 4 times, reaching 6.09:
Benchmark Time CPU Iterations
-------------------------------------------------------------------------
BM_rapidudf_expr_func 207713 ns 207648 ns 3362
BM_rapidudf_vector_expr_func 33962 ns 33962 ns 20594
BM_native_func 207145 ns 207136 ns 3387
Original function prototype:
float wilson_ctr(float exp_cnt, float clk_cnt) {
return std::log10(exp_cnt) *
(clk_cnt / exp_cnt + 1.96 * 1.96 / (2 * exp_cnt) -
1.96 / (2 * exp_cnt) * std::sqrt(4 * exp_cnt * (1 - clk_cnt / exp_cnt) * clk_cnt / exp_cnt + 1.96 * 1.96)) /
(1 + 1.96 * 1.96 / exp_cnt);
}Corresponding vector UDF script implementation:
simd_vector<f32> wilson_ctr(Context ctx, simd_vector<f32> exp_cnt, simd_vector<f32> clk_cnt)
{
return log10(exp_cnt) *
(clk_cnt / exp_cnt + 1.96 * 1.96 / (2 * exp_cnt) -
1.96 / (2 * exp_cnt) * sqrt(4 * exp_cnt * (1 - clk_cnt / exp_cnt) * clk_cnt / exp_cnt + 1.96 * 1.96)) /
(1 + 1.96 * 1.96 / exp_cnt);
}Theoretically, the acceleration ratio should be the multiple of the AVX2 register width to the float width, which is 8;
Actual results are as follows, showing that the acceleration ratio has exceeded 8 times, reaching 10.5:
Benchmark Time CPU Iterations
-------------------------------------------------------------------------
BM_native_wilson_ctr 69961 ns 69957 ns 9960
BM_rapidudf_vector_wilson_ctr 6661 ns 6659 ns 105270