GMATS: A Minimal, Reproducible Framework for LLM‑Augmented Market Agents

Data

Reddit

We release a curated subset of Reddit posts and comments that mention
{AAPL, AMZN, NFLX, MSFT, GOOG, META, TSLA} between 2021-09-30 and 2022-09-30.

Source subreddits:

r/investing
r/StockMarket
r/stocks
r/wallstreetbets

The raw data were obtained from publicly available Pushshift Reddit dumps (Baumgartner et al., 2020). For each post/comment we keep text and basic metadata (id, subreddit, created_utc, score, num_comments, link_id/parent_id, etc.). Reddit usernames and profile information are not included.

Reddit data used in our experiments can be downloaded from:

https://drive.google.com/drive/folders/1zgphtMdGT9ZR2le_24EkprXdtRYnJcq8

Use of this data must comply with Reddit’s User Agreement, Developer Terms, and Data API Terms. This repository does not grant any additional rights to Reddit content beyond what Reddit itself permits.

To reproduce our experiments, download the Reddit data and convert it into the same tabular format as the example files in the data/ directory (e.g., JSONL/CSV with matching column names). You are free to implement your own preprocessing pipeline; we only require that the final format matches the examples.

Twitter

For the Twitter side, we do not redistribute tweet text in this repository. Instead, we rely on the public Kaggle dataset:

equinxx, “Stock Tweets for Sentiment Analysis and Prediction”
https://www.kaggle.com/datasets/equinxx/stock-tweets-for-sentiment-analysis-and-prediction

Please download the dataset directly from Kaggle (subject to Twitter/X and Kaggle terms of use), then manually process it into the same format as the example Twitter files in data/ (e.g., one row per tweet with the expected columns).

Commands

Run clean baseline

python scripts/run.py --config configs/gmats.yaml --debug --log_by_asset --log_agents

Train RL agent:

python scripts/train_attacker.py --config configs/gmats.yaml --timesteps 14

Run Attack:

python scripts/run_attack.py --config configs/gmats.yaml --budget 1 --attacker random

Evalute Metrics:

python scripts/metrics.py --attack_logs results/attack/logs --clean_logs results/baseline/logs --poison_ids results/attack/poison_ids.jsonl --attack_results_csv results/attack/results/attack_summary_by_ticker.csv --clean_results_csv results/baseline/results/gmats_local_GMATSLLMStrategy.csv

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
backtest		backtest
configs		configs
data		data
gmats		gmats
scripts		scripts
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

GMATS: A Minimal, Reproducible Framework for LLM‑Augmented Market Agents

Data

Reddit

Twitter

Commands

Run clean baseline

Train RL agent:

Run Attack:

Evalute Metrics:

About

Uh oh!

Releases

Packages

Languages

Soqoro/GMATS

Folders and files

Latest commit

History

Repository files navigation

GMATS: A Minimal, Reproducible Framework for LLM‑Augmented Market Agents

Data

Reddit

Twitter

Commands

Run clean baseline

Train RL agent:

Run Attack:

Evalute Metrics:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages