help! Jieba user-defined dictionary function doesn't work at all!  

**_Hello, I am preparing to do Chinese text mining with jiebaR, working in Korea on a Korean-language Windows OS.
My computing environment, as reported by library(jiebaR); sessionInfo(), is as follows._**

> library(jiebaR); sessionInfo();
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Korean_Korea.949  LC_CTYPE=Korean_Korea.949    LC_MONETARY=Korean_Korea.949
[4] LC_NUMERIC=C                 LC_TIME=Korean_Korea.949    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.7    tidytext_0.3.2 stringr_1.4.0  jiebaR_0.11    jiebaRD_0.1   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7        rstudioapi_0.13   magrittr_2.0.1    tidyselect_1.1.1  lattice_0.20-44  
 [6] R6_2.5.1          rlang_0.4.11      fansi_0.5.0       tools_4.1.1       grid_4.1.1       
[11] utf8_1.2.2        cli_3.0.1         DBI_1.1.1         janeaustenr_0.1.5 ellipsis_0.3.2   
[16] assertthat_0.2.1  tibble_3.1.3      lifecycle_1.0.0   crayon_1.4.1      Matrix_1.3-4     
[21] purrr_0.3.4       SnowballC_0.7.0   tokenizers_0.2.1  vctrs_0.3.8       glue_1.4.2       
[26] stringi_1.7.4     compiler_4.1.1    pillar_1.6.2      generics_0.1.0    pkgconfig_2.0.3

**_I set up the word segmentation process as follows. The result, however, is disappointing: the user-defined dictionary has no effect._**

> bri_text <- readLines("BRIVA_revised3.txt", encoding = "UTF-8")
> bri_stnc <- bri_text %>% as_tibble() %>% unnest_tokens(input = value, output = sentence, token = "sentences")
> bri_stnc <- bri_stnc %>% mutate(sentence_id = row_number())
> bri_df <- bri_stnc %>% mutate(text = sapply(segment(bri_stnc$sentence, worker(bylines = TRUE, user = "C:/Users/user/Documents/R/win-library/4.1/jiebaRD/dict/user.dict.utf8")), function(x) { paste(x, collapse = " ") })) %>% unnest_tokens(word, text)
> bri_df
# A tibble: 4,175 x 3
   sentence                                                                    sentence_id word 
   <chr>                                                                             <int> <chr>
 1 2000多年前，<U+4E9A><U+6B27><U+2F24><U+9646>上勤<U+52B3>勇敢的<U+2F08><U+2EA0>，探索出多<U+6761><U+8FDE>接<U+4E9A><U+6B27><U+2FAE><U+2F0F><U+2F24><U+2F42>明的<U+8D38>易和~           1 2000 
 2 2000多年前，<U+4E9A><U+6B27><U+2F24><U+9646>上勤<U+52B3>勇敢的<U+2F08><U+2EA0>，探索出多<U+6761><U+8FDE>接<U+4E9A><U+6B27><U+2FAE><U+2F0F><U+2F24><U+2F42>明的<U+8D38>易和~           1 多年 
 3 2000多年前，<U+4E9A><U+6B27><U+2F24><U+9646>上勤<U+52B3>勇敢的<U+2F08><U+2EA0>，探索出多<U+6761><U+8FDE>接<U+4E9A><U+6B27><U+2FAE><U+2F0F><U+2F24><U+2F42>明的<U+8D38>易和~           1 前   
 4 2000多年前，<U+4E9A><U+6B27><U+2F24><U+9646>上勤<U+52B3>勇敢的<U+2F08><U+2EA0>，探索出多<U+6761><U+8FDE>接<U+4E9A><U+6B27><U+2FAE><U+2F0F><U+2F24><U+2F42>明的<U+8D38>易和~           1 <U+4E9A>   
 5 2000多年前，<U+4E9A><U+6B27><U+2F24><U+9646>上勤<U+52B3>勇敢的<U+2F08><U+2EA0>，探索出多<U+6761><U+8FDE>接<U+4E9A><U+6B27><U+2FAE><U+2F0F><U+2F24><U+2F42>明的<U+8D38>易和~           1 <U+6B27>   
 6 2000多年前，<U+4E9A><U+6B27><U+2F24><U+9646>上勤<U+52B3>勇敢的<U+2F08><U+2EA0>，探索出多<U+6761><U+8FDE>接<U+4E9A><U+6B27><U+2FAE><U+2F0F><U+2F24><U+2F42>明的<U+8D38>易和~           1 <U+9646>上 
 7 2000多年前，<U+4E9A><U+6B27><U+2F24><U+9646>上勤<U+52B3>勇敢的<U+2F08><U+2EA0>，探索出多<U+6761><U+8FDE>接<U+4E9A><U+6B27><U+2FAE><U+2F0F><U+2F24><U+2F42>明的<U+8D38>易和~           1 勤<U+52B3> 
 8 2000多年前，<U+4E9A><U+6B27><U+2F24><U+9646>上勤<U+52B3>勇敢的<U+2F08><U+2EA0>，探索出多<U+6761><U+8FDE>接<U+4E9A><U+6B27><U+2FAE><U+2F0F><U+2F24><U+2F42>明的<U+8D38>易和~           1 勇敢 
 9 2000多年前，<U+4E9A><U+6B27><U+2F24><U+9646>上勤<U+52B3>勇敢的<U+2F08><U+2EA0>，探索出多<U+6761><U+8FDE>接<U+4E9A><U+6B27><U+2FAE><U+2F0F><U+2F24><U+2F42>明的<U+8D38>易和~           1 的   
10 2000多年前，<U+4E9A><U+6B27><U+2F24><U+9646>上勤<U+52B3>勇敢的<U+2F08><U+2EA0>，探索出多<U+6761><U+8FDE>接<U+4E9A><U+6B27><U+2FAE><U+2F0F><U+2F24><U+2F42>明的<U+8D38>易和~           1 探索 
# ... with 4,165 more rows. 

**The problem is that the output is the same with and without the user-defined dictionary: the tibble stays at 4,175 x 3 even with user.dict. By the way, I confirmed that stopwords.dict works well. I have no idea what the problem is.**
For reference, I attach a screen capture of the "user.dict.utf8" file below.
Thanks for any advice!
![screen capture_briva_user dict](https://user-images.githubusercontent.com/92848809/138052017-50bced31-3dbc-4b4b-98dd-53694c5b6cbf.png)



