@yupyub (Contributor) commented Jan 9, 2024

The W2V model is vulnerable to a memory error when the input stream data contains empty lines (lines with no characters).

Cause
While reading stream data, if a line contains only a newline character, the num_nnz variable is incremented by 1:

data_size = len(data)  # 0 for an empty line
_vali_size = min(vali_n, len(data) - 1)  # min(vali_n, -1) == -1
num_nnz += (data_size - _vali_size)  # 0 - (-1) == +1
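To make the arithmetic concrete, here is a minimal sketch that isolates those three lines in a standalone function. The function name nnz_increment and the value vali_n=2 are hypothetical, chosen only for illustration; they are not taken from the library.

```python
def nnz_increment(data, vali_n):
    """Return how much num_nnz grows for one tokenized line.

    Mirrors the three lines quoted above; `data` is the list of
    tokens on the line and `vali_n` is the validation sample count.
    """
    data_size = len(data)
    vali_size = min(vali_n, len(data) - 1)
    return data_size - vali_size


# vali_n=2 is an arbitrary illustrative value.
empty_increment = nnz_increment([], vali_n=2)
normal_increment = nnz_increment(["a", "b", "c", "d"], vali_n=2)

print(empty_increment)   # an empty line still adds to the count
print(normal_increment)  # a four-token line with two held out
```

An empty line contributes no tokens, yet the subtraction of a negative validation size still produces a positive increment.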

Later, num_nnz is used as total_lines in the _sort_and_compressed_binarization() function.
The values stored in the path file are passed to the records vector, which is then read based on total_lines.
If num_nnz is inflated by empty lines, the read runs past the end of the records vector and references memory outside its bounds.
Reading these unexpected values triggers a segmentation fault or other program malfunction.
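The mismatch described above can be illustrated with a small pure-Python simulation. This is a sketch under assumptions, not the library's code: the function count_and_collect, the rule that the first data_size - _vali_size tokens become records, and the sample input are all hypothetical. In Python an over-long read would raise IndexError; in native code the same mismatch is an out-of-bounds read.

```python
def count_and_collect(lines, vali_n):
    """Count records the way the quoted snippet does, without a guard.

    Assumption: the first (data_size - vali_size) tokens of each line
    are stored as records; the rest are held out for validation.
    """
    num_nnz = 0
    records = []
    for line in lines:
        data = line.strip().split()
        data_size = len(data)
        vali_size = min(vali_n, len(data) - 1)
        num_nnz += data_size - vali_size
        # For an empty line this slice is empty, so nothing is stored.
        records.extend(data[: data_size - vali_size])
    return num_nnz, records


sample_lines = ["a b c\n", "\n", "d e\n"]
claimed, stored = count_and_collect(sample_lines, vali_n=1)

# The claimed count is larger than the number of stored records,
# so a reader that trusts `claimed` would run past the end.
print(claimed, len(stored))
```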

Changes
Empty input lines are now skipped with a continue statement. A typo found during debugging has also been fixed.
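A minimal sketch of the guard, assuming the same simplified record-keeping rule as the snippet in the Cause section: the function name count_and_collect_fixed, the slicing rule, and the sample input are hypothetical and only illustrate the idea of skipping empty lines before any counting happens.

```python
def count_and_collect_fixed(lines, vali_n):
    """Count and store records, skipping lines that have no tokens."""
    num_nnz = 0
    records = []
    for line in lines:
        data = line.strip().split()
        if not data:
            # Empty line: skip it so it cannot inflate num_nnz.
            continue
        data_size = len(data)
        vali_size = min(vali_n, data_size - 1)
        num_nnz += data_size - vali_size
        records.extend(data[: data_size - vali_size])
    return num_nnz, records


fixed_count, fixed_records = count_and_collect_fixed(
    ["a b c\n", "\n", "d e\n"], vali_n=1
)

# With the guard in place, the count matches the stored records.
print(fixed_count, len(fixed_records))
```

Placing the check before the size computation keeps data_size - 1 from going negative, so the count and the stored records stay in step.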

@ita9naiwa ita9naiwa self-requested a review January 9, 2024 11:46
@ita9naiwa ita9naiwa merged commit efd7d0c into kakao:dev Jan 9, 2024