Bug in Table (hash table) implementation (and potential GC bug) #144

@srdjanstipic

I was playing with libCello (git commit: dfcd86c)
and found strange behaviour in the Table implementation.
When the table resizes, some values are modified or lost (see the lines marked // BUG).

Also, the GC seems to take forever (O(n^2) or worse) as the PROBLEM SIZE increases.

The code to reproduce the issue is here:

#include <assert.h>
#include "Cello.h"

// count word frequency
int main(int argc, char **argv) {
  (void)argc, (void)argv;
  // wget https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt
  FILE *fp = fopen("./t8.shakespeare.txt", "r");
  if (!fp) { perror("fopen"); return 1; } // getline() on a NULL stream would crash
  char *line = NULL, *sep = " \f\n\r\t\v";
  var d = new (Table, String, Int), the = $S("the");
  int i = 0, the_count = 0;
  for (size_t len = 0; getline(&line, &len, fp) != EOF;) {
    for (char *tmp = line, *word; (word = strsep(&tmp, sep));) {
      if (i % 10000 == 0) {
        printf("%d\n", i);
      }
      if (i++ > 300000) { // ******* PROBLEM SIZE *********
        goto end;
      }
      the_count += strcmp(*(char**)the, word) == 0;
      if (!mem(d, $S(word))) {    // create if missing
        set(d, $S(word), $I(0));
      }
      *(long *)get(d, $S(word)) += 1; // OK, time 0.127s
      // set(d, $S(word), $I(c_int(get(d, $S(word))) + 1)); // BUG, time 0.261s
      // set(d, $S(word), new(Int, $I(c_int(get(d, $S(word))) + 1))); // BUG, SLOW, time 32.854s
    }
  }
end:
  if (line) {
    free(line);
  }
  if (fp) {
    fclose(fp);
  }
  println("%$ %$\n", $I(the_count), get(d, the));
  assert(the_count == c_int(get(d, the))); // FAIL for BUGs
  return 0;
}
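
For comparison, here is a minimal plain-C sketch of the property the assert above checks: a word-count table whose counts survive a resize. This is not libCello's Table and says nothing about where the actual bug is. The names (`wc_table`, `wc_add`, `wc_get`, `wc_resize`, `wc_free`), the FNV-1a hash, linear probing and the 1/2 load factor are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical word-count table for illustration; NOT libCello's Table. */
struct slot { char *key; long count; };
struct wc_table { struct slot *slots; size_t nslots, nitems; };

/* FNV-1a string hash (an arbitrary choice for this sketch). */
static uint64_t wc_hash(const char *s) {
  uint64_t h = 14695981039346656037ULL;
  for (; *s; s++) { h ^= (unsigned char)*s; h *= 1099511628211ULL; }
  return h;
}

/* Linear probing: return the slot holding key, or the first empty slot. */
static struct slot *wc_find(struct slot *slots, size_t nslots, const char *key) {
  size_t i = wc_hash(key) % nslots;
  while (slots[i].key && strcmp(slots[i].key, key) != 0) i = (i + 1) % nslots;
  return &slots[i];
}

/* Grow the table and re-insert every entry; keys and counts move unchanged. */
static int wc_resize(struct wc_table *t, size_t nslots) {
  struct slot *fresh = calloc(nslots, sizeof *fresh);
  if (!fresh) return -1;
  for (size_t i = 0; i < t->nslots; i++) {
    if (t->slots[i].key) *wc_find(fresh, nslots, t->slots[i].key) = t->slots[i];
  }
  free(t->slots);
  t->slots = fresh;
  t->nslots = nslots;
  return 0;
}

/* Increment the count for word, inserting it on first sight. 0 on success. */
static int wc_add(struct wc_table *t, const char *word) {
  /* Keep the load factor at or below 1/2 so probing always finds a free slot. */
  if ((t->nitems + 1) * 2 > t->nslots &&
      wc_resize(t, t->nslots ? t->nslots * 2 : 8) != 0) return -1;
  struct slot *s = wc_find(t->slots, t->nslots, word);
  if (!s->key) {
    s->key = malloc(strlen(word) + 1);
    if (!s->key) return -1;
    strcpy(s->key, word);
    t->nitems++;
  }
  s->count++;
  return 0;
}

/* Return the count stored for word, or 0 if it was never added. */
static long wc_get(const struct wc_table *t, const char *word) {
  if (!t->nslots) return 0;
  struct slot *s = wc_find(t->slots, t->nslots, word);
  return s->key ? s->count : 0;
}

/* Release all keys and the slot array. */
static void wc_free(struct wc_table *t) {
  for (size_t i = 0; i < t->nslots; i++) free(t->slots[i].key);
  free(t->slots);
  t->slots = NULL;
  t->nslots = t->nitems = 0;
}
```

Starting from a zero-initialised `struct wc_table`, repeated `wc_add` calls trigger several resizes. After each one, `wc_get` should still return the same tally as an independently kept counter, which is the invariant the `the_count` assert checks in the repro above.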
