Stylometric analysis of poetic texts based on their versification

versotym/stichometry

Parameters

lang :      'cs', 'de', 'es' ... 
  Language to process (subfolder in "pickle" folder)  

method :     'sticho', 'word', 'lemma', '3gram_t' ...
  Which data to use for attribution (file in pickle > lang folder)

Methods

Data filtering

reduce_features(filters)
Only for method == 'sticho'
Select the features (columns) to be used for attribution,
e.g. drop all rhyme statistics, or keep only the stress profile.
  filters:     conditions to filter features (format accepted by pandas .query method)
               default: None    

mfi(n)
Only for method != 'sticho'
Select how many of the most frequent items (words, lemmata, n-grams) will be analyzed.
  n:           int
               number of mfi
               default: 500    
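
A minimal sketch of what selecting the most frequent items amounts to, using only the standard library. The function names `most_frequent_items` and `relative_frequencies` and the toy documents are hypothetical illustrations, not part of this repository, and the repository may compute frequencies differently.

```python
from collections import Counter

def most_frequent_items(documents, n=500):
    """Return the n most frequent items across all tokenized documents."""
    counts = Counter()
    for tokens in documents:
        counts.update(tokens)
    return [item for item, _ in counts.most_common(n)]

def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary item within one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[item] / total if total else 0.0 for item in vocabulary]

# Toy corpus (hypothetical): each document is a list of tokens.
docs = [
    "the rose is red the violet blue".split(),
    "the night is dark".split(),
]
vocab = most_frequent_items(docs, n=2)
vector = relative_frequencies(docs[1], vocab)
```

Each document is then represented by one frequency per selected item, so a larger `n` gives longer feature vectors.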

reduce_sets(filters, n_min, remove_singles)
Filter datasets (rows) according to specified conditions.
  filters:         conditions to filter datasets (format accepted by pandas .query method)
                   default: None
  n_min:           int
                   minimum total number of features a dataset must contain to be kept
                   default: 0
  remove_singles:  boolean
                   whether to drop datasets whose author has no other dataset
                   default: True
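
The effect of remove_singles can be sketched as follows. This is an illustration only: the function name `drop_single_author_sets` and the dict-based dataset records are assumptions, not the repository's data structures.

```python
from collections import Counter

def drop_single_author_sets(datasets):
    """Keep only datasets whose author also wrote at least one other dataset."""
    per_author = Counter(d["author"] for d in datasets)
    return [d for d in datasets if per_author[d["author"]] > 1]

# Hypothetical records: author B has a single dataset and is dropped.
datasets = [
    {"author": "A", "title": "a1"},
    {"author": "A", "title": "a2"},
    {"author": "B", "title": "b1"},
]
kept = drop_single_author_sets(datasets)
```

Dropping such datasets matters for attribution because a dataset by an author with no other dataset has no correct candidate left to be matched against.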

Normalization

zscores()
Normalize data to z-scores across datasets.
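
A minimal sketch of z-score normalization across datasets, where each row is a dataset and each column a feature. The function name `zscore_columns` is hypothetical, and the use of the population standard deviation is an assumption; the repository may use the sample standard deviation instead.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Normalize each column (feature) to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]  # assumption: population std
    return [
        # A constant column has std 0; map it to 0.0 instead of dividing by zero.
        [(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Three datasets, two features; the second feature is constant.
rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
z = zscore_columns(rows)
```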

Attribution

nearest_neighbour()
Classification by nearest neighbour (various distance metrics)
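
Nearest-neighbour attribution can be sketched as below. The Manhattan distance is chosen here purely for illustration; which metrics the repository actually supports is not stated above, and the names `manhattan` and `nearest_neighbour` in this snippet are stand-ins rather than the repository's implementation.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_neighbour(sample, training, distance=manhattan):
    """Return the author label of the training vector closest to sample."""
    best_label, _ = min(
        ((label, distance(sample, vector)) for label, vector in training),
        key=lambda pair: pair[1],
    )
    return best_label

# Hypothetical (author, feature vector) pairs, e.g. z-scored features.
training = [("A", [0.0, 0.0]), ("A", [0.2, 0.1]), ("B", [1.0, 1.0])]
predicted = nearest_neighbour([0.9, 0.8], training)
```

Any other distance function taking two vectors can be passed through the `distance` argument of this sketch.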

svm(multiclass, **kwargs)
Classification by support vector machine
  multiclass:      boolean
                   whether to perform multiclass or binary classification
                   when 'True' each dataset is assigned to one author
                   when 'False' a one-vs.-rest classifier is trained for every author, resulting in:
                      (a) assigning an author to the dataset if precisely one classifier 
                          gives a decision other than 'rest'
                      (b) an "I don't know" answer in all other cases
                   default: True
  **kwargs:        Parameters for sklearn.svm.SVC (e.g. kernel, gamma...)
  
random_forest(multiclass, **kwargs)  
Classification by random forest
  multiclass:      boolean
                   whether to perform multiclass or binary classification
                   when 'True' each dataset is assigned to one author
                   when 'False' a one-vs.-rest classifier is trained for every author, resulting in:
                      (a) assigning an author to the dataset if precisely one classifier 
                          gives a decision other than 'rest'
                      (b) an "I don't know" answer in all other cases
                   default: True
  **kwargs:        Parameters for sklearn.ensemble.RandomForestClassifier 
                   (e.g. n_estimators, class_weight...)
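
The decision rule used by svm and random_forest when multiclass is False can be sketched independently of the classifier. The function name `resolve_one_vs_rest` and the dict of per-author decisions are assumptions made for this illustration; `None` stands in for the "I don't know" answer.

```python
def resolve_one_vs_rest(decisions):
    """Combine per-author binary decisions into a single attribution.

    decisions maps each author to True if that author's classifier
    claimed the dataset, and to False if it answered 'rest'.
    """
    claimed = [author for author, accepted in decisions.items() if accepted]
    # Exactly one claim: attribute to that author. Otherwise: no answer.
    return claimed[0] if len(claimed) == 1 else None

one_claim = resolve_one_vs_rest({"A": True, "B": False, "C": False})
no_claim = resolve_one_vs_rest({"A": False, "B": False, "C": False})
two_claims = resolve_one_vs_rest({"A": True, "B": True, "C": False})
```

Unlike the multiclass setting, this rule can abstain, which trades coverage for fewer forced attributions.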

Evaluation

evaluate()
Print an evaluation of each method that was applied

dendrograms()
Plot dendrograms (only if nearest_neighbour has been applied)

complete_results(pickle, filename)
Return a dictionary with the complete results
  pickle:          boolean
                   whether to pickle dict into a file (stored in 'pickle' folder)
                   default: True
  filename:        specifies the name of a pickled file
                   default: method name (e.g. sticho, word...)
