Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Parallelization of reducing the size of embedded features#2818

Open
Maslyanko wants to merge 8 commits into
catboost:masterfrom
Maslyanko:parallelization-of-reducing-the-size-of-embeded-features
Open

Parallelization of reducing the size of embedded features#2818
Maslyanko wants to merge 8 commits into
catboost:masterfrom
Maslyanko:parallelization-of-reducing-the-size-of-embeded-features

Conversation

@Maslyanko
Copy link
Copy Markdown

Based on #2743.

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

@andrey-khropov andrey-khropov changed the title Parallelization of reducing the size of embeded features Parallelization of reducing the size of embedded features Mar 13, 2025
localExecutor->ExecRange([&](int id) {
int linesPerThread = samplesCount / numThreads;

for (int i = id; i < samplesCount; i += numThreads) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/home/espetrov/arcadia/catboost/private/libs/feature_estimator/base_embedding_feature_estimator.h:300:36: error: comparison of integers of different signs: 'int' and 'const ui64' (aka 'const unsigned long') [-Werror,-Wsign-compare]
  300 |                 for (int i = id; i < samplesCount; i += numThreads) {

NPar::ILocalExecutor* localExecutor
) const {
int numThreads = localExecutor->GetThreadCount();
TVector<TKNNCalcer> featureCalcers(numThreads);;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra ;

TConstArrayRef<TCalculatedFeatureVisitor> testVisitors,
NPar::ILocalExecutor* localExecutor
) const {
int numThreads = localExecutor->GetThreadCount();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be GetThreadCount() + 1 to include main thread

TConstArrayRef<TCalculatedFeatureVisitor> visitors,
NPar::ILocalExecutor* localExecutor) const {

const ui32 numThreads = localExecutor->GetThreadCount();
Copy link
Copy Markdown
Contributor

@Evgueni-Petrov-aka-espetrov Evgueni-Petrov-aka-espetrov Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+ 1

Comment on lines +281 to +295
localExecutor->ExecRange([&](int id) {
int linesPerThread = samplesCount / numThreads;
int begin = id * linesPerThread;
int end = (id + 1) * linesPerThread;
if (id == numThreads - 1) {
end = samplesCount;
}

for (int i = begin; i < end; ++i) {
auto line = learnPermutation[i];
const TEmbeddingsArray& vector = learnDataset.GetVector(line);
vectorsNeighbors[line] = featureCalcers[id].GetNearestNeighborsAndDistances(vector);
calcerVisitors[id].Update(target[line], vector, &featureCalcers[id]);
}
}, params, NPar::TLocalExecutor::WAIT_COMPLETE);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please reformat all similar code as follows

            localExecutor->ExecRange(
                [&](int id) {
                    // ...
                }, 
                params, 
                NPar::TLocalExecutor::WAIT_COMPLETE);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants