simDNAgen: Simulate DNA data with motifs (short 'relevant' sequences)

This repository contains Python code that generates 'fake' DNA sequences of different lengths, based on the known underlying statistics of DNA data. For a more detailed explanation, see section 4.1 of report.pdf, which can be found here: repo

How to use

motifs directory

Stores the target motif position probability matrix (PPM) files. Each matrix gives the probability of each letter (A, C, G, T) at each position of a motif.
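To make the PPM idea concrete, here is a minimal sketch of how one motif instance could be drawn from such a matrix. It does not reproduce the repository's code or its file format: the names `EXAMPLE_PPM`, `validate_ppm` and `sample_motif`, the A, C, G, T column order, and the example probabilities are all assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"

# Hypothetical 3-position PPM (not taken from the motifs directory):
# one row per motif position, columns ordered A, C, G, T.
EXAMPLE_PPM = [
    [0.7, 0.1, 0.1, 0.1],
    [0.0, 0.0, 1.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]


def validate_ppm(ppm, tol=1e-6):
    """Check that every row is a probability distribution over ALPHABET."""
    for i, row in enumerate(ppm):
        if len(row) != len(ALPHABET):
            raise ValueError(f"row {i} has {len(row)} entries, expected {len(ALPHABET)}")
        if any(p < 0 for p in row) or abs(sum(row) - 1.0) > tol:
            raise ValueError(f"row {i} is not a valid probability distribution")


def sample_motif(ppm, rng=None):
    """Draw one letter per position, independently, according to the PPM."""
    rng = rng or random.Random()
    validate_ppm(ppm)
    return "".join(rng.choices(ALPHABET, weights=row, k=1)[0] for row in ppm)


if __name__ == "__main__":
    print(sample_motif(EXAMPLE_PPM, random.Random(0)))
```

Because the second row puts all of its probability on G, every sampled motif has G in its second position, while the third position is uniformly random.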

root directory

generate.py: generates a position probability matrix for each motif listed in the motifs directory.
motif_probabilities.py: generates DNA sequences as txt files and stores them in output. In this file you can define the number of sequences to be generated, as well as their length.
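The overall workflow described above (choose a number of sequences and a length, generate them, write them to a text file) can be sketched as follows. This is an illustrative sketch only, not the code in this repository: the function names, the uniform background distribution, the single fixed motif string, and the file name `output/sequences.txt` are all assumptions.

```python
import random
from pathlib import Path

ALPHABET = "ACGT"


def random_background(length, rng, weights=(0.25, 0.25, 0.25, 0.25)):
    """Background DNA; uniform letter frequencies are an assumption here."""
    return "".join(rng.choices(ALPHABET, weights=weights, k=length))


def embed_motif(background, motif, rng):
    """Overwrite a random window of the background with the motif."""
    if len(motif) > len(background):
        raise ValueError("motif is longer than the sequence")
    start = rng.randrange(len(background) - len(motif) + 1)
    return background[:start] + motif + background[start + len(motif):]


def generate_sequences(n_sequences, seq_length, motif, seed=None):
    """Return n_sequences strings of seq_length, each containing the motif."""
    rng = random.Random(seed)
    return [
        embed_motif(random_background(seq_length, rng), motif, rng)
        for _ in range(n_sequences)
    ]


def write_sequences(sequences, path):
    """Write one sequence per line to a text file, creating the directory."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(sequences) + "\n")


if __name__ == "__main__":
    # Hypothetical settings: 10 sequences of length 100 with one motif each.
    seqs = generate_sequences(n_sequences=10, seq_length=100, motif="GATTACA", seed=42)
    write_sequences(seqs, "output/sequences.txt")
```

Passing a `seed` makes the generated data reproducible, which is convenient when the sequences are used as a benchmark.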

IlzeAmandaA/simDNAgen