Processing text with tds.Hash is very slow #26

@tastyminerals

Description

I have created a few scripts to preprocess a text corpus (~6 MB). In order to preserve the text formatting I need to iterate over each line and do some text manipulation on it. This in turn produces PANIC: unprotected error in call to Lua API (not enough memory), so I decided to try tds.Hash to hold my corpus table.

Here is the code I am using:

  text_arr = tokenize(text)
  text_arr = tds.Hash(text_arr)
  -- replace rare tokens with <unk>
  -- text_arr is a {idx: {tokens arr}}
  for l=1,#text_arr do -- iterate over lines
    for t=1,#text_arr[l] do -- iterate over tokens
      -- rare is an array of rare words
      for r=1,#rare do
        if text_arr[l][t] == rare[r] then text_arr[l][t] = "<unk>" end
      end
    end
  end

text_arr is a table of size 2900, and this triple-loop operation becomes really slow when using tds.Hash.
I am by no means a Lua expert, but am I doing something wrong?
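
For reference, this is the kind of rewrite I was thinking about: build a set from rare once, so each token is checked with a single hash lookup instead of a scan over the whole rare array, and pull text_arr[l] into a local so the tds hash is only indexed once per line. That would turn the per-token cost from O(#rare) into O(1). This is just a sketch; it assumes rare is a plain Lua array of strings and that the nested line entries are still mutable by reference after the tds.Hash conversion.

  local rare_set = {}
  for r=1,#rare do
    rare_set[rare[r]] = true -- mark each rare word for O(1) membership tests
  end

  for l=1,#text_arr do
    local line = text_arr[l] -- index the tds.Hash once per line
    for t=1,#line do
      if rare_set[line[t]] then line[t] = "<unk>" end
    end
  end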
