versotym/stichometry

Parameters

lang :      'cs', 'de', 'es' ... 
  Language to process (subfolder in "pickle" folder)  

method :     'sticho', 'word', 'lemma', '3gram_t' ...
  Which data to use for attribution (file in the lang subfolder of the "pickle" folder)
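Together, the two parameters locate the input data. Below is a minimal sketch of that lookup. It assumes the file is named exactly after the method, with no extension, which the real files may not follow. `dataset_path` and `load_dataset` are hypothetical helpers, not part of the package:

```python
import pickle
from pathlib import Path


def dataset_path(lang: str, method: str, base_dir: str = "pickle") -> Path:
    # Hypothetical layout: <base_dir>/<lang>/<method>, e.g. pickle/cs/sticho.
    # The exact file name (and any extension) is an assumption.
    return Path(base_dir) / lang / method


def load_dataset(lang: str, method: str, base_dir: str = "pickle"):
    # Unpickle whatever object is stored for this language and method.
    with dataset_path(lang, method, base_dir).open("rb") as fh:
        return pickle.load(fh)
```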

Methods

Data filtering

reduce_features(filters)
Only for method == 'sticho'
Select the features (columns) to be used for attribution,
e.g. drop all rhyme statistics, or keep only the stress profile.
  filters:     conditions to filter features (format accepted by pandas .query method)
               default: None    
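The README does not say what the `filters` query runs against. One plausible reading, sketched here purely as an assumption, is a table with one row per feature and some descriptive columns (a made-up `group` column in the example). The query selects rows of that table, and only the matching feature columns are kept:

```python
from typing import Optional

import pandas as pd


def reduce_features(data: pd.DataFrame, feature_info: pd.DataFrame,
                    filters: Optional[str] = None) -> pd.DataFrame:
    # data: one row per dataset, one column per feature.
    # feature_info (hypothetical): one row per feature, indexed by column name.
    if filters is None:
        return data
    keep = set(feature_info.query(filters).index)
    # Keep the matching feature columns in their original order.
    return data[[c for c in data.columns if c in keep]]
```

For example, `reduce_features(data, info, "group != 'rhyme'")` would drop every column tagged as a rhyme statistic.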

mfi(n)
Only for method != 'sticho'
Select how many of the most frequent items (words, lemmata, n-grams) will be analyzed.
  n:           int
               number of mfi
               default: 500    
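Selecting the most frequent items amounts to ranking the columns by their total count across all datasets and keeping the top n. The pandas sketch below assumes a count table with one row per dataset and one column per item. The README does not state whether the package ranks raw or relative frequencies:

```python
import pandas as pd


def mfi(counts: pd.DataFrame, n: int = 500) -> pd.DataFrame:
    # Total occurrences of each item (column) summed over all datasets (rows).
    totals = counts.sum(axis=0)
    # nlargest keeps the first column on ties, so the result is deterministic.
    top = totals.nlargest(n).index
    return counts[top]
```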

reduce_sets(filters, n_min, remove_singles)
Filter datasets (rows) according to specified conditions.
  filters:         conditions to filter datasets (format accepted by pandas .query method)
                   default: None
  n_min:           int
                   minimum total count of all features required to keep a dataset
                   default: 0
  remove_singles:  boolean
                   whether to drop datasets whose author has no other dataset
                   default: True
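The three row filters can be chained as shown below. This sketch assumes one row per dataset, an `author` column, and numeric feature columns holding counts. The order of the steps, with singles removed last, is also an assumption:

```python
from typing import Optional

import pandas as pd


def reduce_sets(df: pd.DataFrame, filters: Optional[str] = None,
                n_min: int = 0, remove_singles: bool = True) -> pd.DataFrame:
    # Assumed layout: one row per dataset, an 'author' column,
    # and numeric feature columns holding counts.
    if filters is not None:
        df = df.query(filters)  # row conditions in pandas .query syntax
    totals = df.drop(columns="author").sum(axis=1)
    df = df[totals >= n_min]  # keep datasets with enough feature occurrences
    if remove_singles:
        # Number of remaining datasets per author, aligned to each row.
        per_author = df["author"].map(df["author"].value_counts())
        df = df[per_author > 1]
    return df
```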

Normalization

zscores()
Normalize data to z-scores across datasets.
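Z-scoring across datasets centres each feature column on its mean over all datasets and then divides it by that column's standard deviation. The pandas sketch below uses the population standard deviation (`ddof=0`) and handles constant columns in its own way. Both choices are assumptions:

```python
import pandas as pd


def zscores(df: pd.DataFrame) -> pd.DataFrame:
    # Column-wise: subtract each feature's mean over all datasets and divide
    # by its population standard deviation.
    std = df.std(ddof=0).replace(0, 1)  # constant features become all zeros
    return (df - df.mean()) / std
```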

Attribution

nearest_neighbour()
Classification by nearest neighbour (various distance metrics)
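Below is a leave-one-out sketch of nearest-neighbour attribution. Each dataset is attributed to the author of the closest other dataset, and the metric name is passed through to SciPy's `cdist`. The function name and the leave-one-out setup are assumptions. On z-scored data, `metric="cityblock"` divided by the number of features equals Burrows's Delta, so both rankings pick the same neighbour:

```python
from typing import List, Sequence

import numpy as np
from scipy.spatial.distance import cdist


def nearest_neighbour(X: np.ndarray, authors: Sequence[str],
                      metric: str = "cityblock") -> List[str]:
    # Pairwise distances between all datasets (rows of X).
    d = cdist(X, X, metric=metric)
    np.fill_diagonal(d, np.inf)  # a dataset must not be its own neighbour
    # Attribute each dataset to the author of its closest other dataset.
    return [authors[j] for j in d.argmin(axis=1)]
```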

svm(multiclass, **kwargs)
Classification by support vector machine
  multiclass:      boolean
                   whether to perform multiclass or binary classification
                   when 'True', each dataset is assigned to exactly one author
                   when 'False', a one-vs-rest classifier is trained for every author, resulting in:
                      (a) the author being assigned to the dataset if precisely one classifier
                          gives a decision other than 'rest'
                      (b) an "I don't know" answer in all other cases
                   default: True
  **kwargs:        Parameters for sklearn.svm.SVC (e.g. kernel, gamma...)
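The one-vs-rest mode (`multiclass=False`) described above can be sketched with scikit-learn. The sketch trains one binary `SVC` per author, and an author is assigned only when exactly one classifier claims the dataset. `one_vs_rest_predict` is a hypothetical name. The sketch illustrates the decision rule and is not the package's own code:

```python
from typing import List, Sequence

import numpy as np
from sklearn.svm import SVC

IDK = "I don't know"


def one_vs_rest_predict(X_train, y_train: Sequence[str], X_test, **kwargs) -> List[str]:
    y_train = np.asarray(y_train)
    authors = np.unique(y_train)
    # claims[i, k] is True when author k's classifier says "this author, not rest".
    claims = np.zeros((len(X_test), len(authors)), dtype=bool)
    for k, author in enumerate(authors):
        clf = SVC(**kwargs).fit(X_train, y_train == author)  # kwargs go to SVC
        claims[:, k] = clf.predict(X_test)
    # Attribute only when exactly one classifier claims the dataset.
    return [str(authors[row][0]) if row.sum() == 1 else IDK for row in claims]
```

With `multiclass=True`, the equivalent is simply `SVC(**kwargs).fit(X_train, y_train).predict(X_test)`.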
  
random_forest(multiclass, **kwargs)  
Classification by random forest
  multiclass:      boolean
                   whether to perform multiclass or binary classification
                   when 'True', each dataset is assigned to exactly one author
                   when 'False', a one-vs-rest classifier is trained for every author, resulting in:
                      (a) the author being assigned to the dataset if precisely one classifier
                          gives a decision other than 'rest'
                      (b) an "I don't know" answer in all other cases
                   default: True
  **kwargs:        Parameters for sklearn.ensemble.RandomForestClassifier
                   (e.g. n_estimators, class_weight...)
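In the multiclass case, a random forest can be evaluated so that each dataset is predicted by a model trained on all the other datasets. The sketch below does this with scikit-learn's leave-one-out cross-validation. The evaluation scheme is an assumption, and `**kwargs` are forwarded to `RandomForestClassifier` as in the parameter list above. Every author needs at least two datasets, which is what `remove_singles` guarantees:

```python
from typing import Sequence

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_predict


def random_forest_loo(X, y: Sequence[str], **kwargs) -> np.ndarray:
    # Each dataset is predicted by a forest fitted on all remaining datasets.
    clf = RandomForestClassifier(**kwargs)
    return cross_val_predict(clf, X, np.asarray(y), cv=LeaveOneOut())
```

Passing `random_state` through `**kwargs` makes repeated runs reproducible.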

Evaluation

evaluate()
Print an evaluation of each attribution method that has been applied
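The README does not list which scores are printed. Because the one-vs-rest modes can answer "I don't know", the hypothetical summary below keeps abstentions separate from wrong answers. The metric names are illustrative only:

```python
from typing import Dict, Sequence

IDK = "I don't know"


def evaluation_summary(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    n = len(y_true)
    # Pairs for which the classifier actually named an author.
    answered = [(t, p) for t, p in zip(y_true, y_pred) if p != IDK]
    correct = sum(t == p for t, p in answered)
    return {
        "accuracy": correct / n,                  # "I don't know" counts as wrong
        "answered": len(answered) / n,            # share of datasets given an author
        "accuracy_answered": correct / len(answered) if answered else 0.0,
    }
```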

dendrograms()
Plot dendrograms (only if nearest_neighbour has been applied)
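A dendrogram comes from hierarchical clustering of the same distances that nearest-neighbour attribution uses. The SciPy sketch below uses average linkage on city-block distances, and both choices are assumptions. With `no_plot=True`, it only returns the leaf order. Passing `no_plot=False` draws the tree on the current matplotlib axes:

```python
from typing import Sequence

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist


def dendrogram_data(X: np.ndarray, labels: Sequence[str], metric: str = "cityblock",
                    method: str = "average", no_plot: bool = True) -> dict:
    # Condensed pairwise distances between datasets, then agglomerative clustering.
    Z = linkage(pdist(X, metric=metric), method=method)
    # The returned dict's "ivl" entry lists the leaf labels in plotted order.
    return dendrogram(Z, labels=list(labels), no_plot=no_plot)
```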

complete_results(pickle, filename)
Return a dictionary with the complete results
  pickle:          boolean
                   whether to also pickle the dictionary into a file (stored in the 'pickle' folder)
                   default: True
  filename:        name of the pickled file
                   default: method name (e.g. sticho, word...)
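The optional pickling step can be sketched with the standard library. `save_complete_results` is a hypothetical stand-in, and the results dictionary is assumed to exist already. The flag is renamed to `do_pickle` so it does not shadow the `pickle` module inside the function:

```python
import pickle
from pathlib import Path
from typing import Any, Dict, Optional


def save_complete_results(results: Dict[str, Any], method: str, do_pickle: bool = True,
                          filename: Optional[str] = None,
                          folder: str = "pickle") -> Dict[str, Any]:
    # `do_pickle` stands in for the README's `pickle` flag.
    if do_pickle:
        path = Path(folder) / (filename or method)  # default name: the method
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump(results, fh)
    return results
```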
