An API for scripting automated AI media generation at scale.
Note: All code was written and tested with 64GB RAM and an Nvidia RTX 4090 (24GB VRAM). Your mileage may vary.
- SDXL: text-to-image and image-to-image
- SDXL: upscale images
- Wan: text-to-video and image-to-video
- Wan: generate multiple segments for long videos
- Ollama: prompt variations
- More coming soon! Contributions welcome.
Need help making your own AI tools or websites?
Hire dotfinally and we can build any AI tool, extension, plugin, API, or website that you need.
- Nvidia GPU with CUDA v12.8
- (optional) Docker
- (optional) Ollama
- Start: `docker compose up -d`
- Stop: `docker compose down`
- Stop and clear: `docker compose down --volumes --remove-orphans`
- Stop, clear, rebuild, and restart: `docker compose down --volumes --remove-orphans && docker compose up -d --build --force-recreate`
- Use an isolated Python environment (conda, venv, etc.)
  ```
  conda create --name=aapi python=3.12.11
  conda activate aapi
  ```
- Install the CUDA toolkit
  ```
  sudo apt update
  sudo apt install nvidia-cuda-toolkit
  ```
- Verify with
  ```
  nvcc --version
  ```
- Install dependencies
  ```
  pip install -r requirements.txt
  pip install -U xformers --index-url https://download.pytorch.org/whl/cu128
  pip install git+https://github.com/xhinker/sd_embed.git@main
  ```
- Start the server
  ```
  python -m src.server
  ```
- Runs at `http://localhost:5700` by default; change host/port in server.py if needed (a quick reachability check is sketched below)
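Once the server is up, you can sanity-check it from a script rather than a browser. A minimal sketch using `requests`, assuming the default host and port; the root path is not a documented endpoint, so any HTTP response (even a 404) simply confirms the process is listening:

```python
import requests

try:
    # Any HTTP response means the server process is listening on the port.
    resp = requests.get("http://localhost:5700", timeout=5)
    print(f"Server is up (HTTP {resp.status_code})")
except requests.ConnectionError:
    print("Server is not reachable on http://localhost:5700")
```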
Generate images using an SDXL checkpoint. Supports text-to-image and image-to-image.
- POST `http://localhost:5700/api/sdxl`
- Download a checkpoint, recommended from: CivitAI SDXL Checkpoints
- Optionally download loras: CivitAI SDXL Loras
| Name | Required | Type | Default | Description |
|---|---|---|---|---|
| checkpoint_file_path | Yes | string | — | Path to the SDXL checkpoint to load |
| loras | No | list | — | List of objects, each with a path and strength. Strength is between 1 and 100 inclusive. |
| prompt | Yes | string or list | — | Text prompt used to generate the image; can be a single string or a list of strings. If a list is given, each prompt triggers a separate generation request using the same remaining parameters. Optionally include double-bracket keys (e.g. {{key}}) for variations. |
| prompt_replacements | No | object | — | Used with double-bracket keys in prompts; each key in this object maps to an array of replacement variations (see the expansion sketch below this table). |
| negative_prompt | Yes | string | — | Negative prompt to discourage content |
| seed | No | integer | — | Random seed. Empty or -1 uses a random seed for each image; otherwise the given seed is used for all images. |
| width | No | integer | 1024 | Output image width in pixels. Must be divisible by 8. |
| height | No | integer | 1024 | Output image height in pixels. Must be divisible by 8. |
| num_images | No | integer | 1 | Number of images to generate for the prompt. Each image will be saved separately. |
| num_steps | No | integer | 60 | Number of inference steps |
| output_folder_path | No | string | "output" | Folder to place saved images and metadata |
| output_image_prefix | No | string | — | Optional filename prefix for saved images |
| output_image_suffix | No | string | — | Optional filename suffix for saved images |
| input_image_path | No | string | — | Path to image or folder of images for image-to-image generation. If folder, then each image in the folder will trigger a separate generation request. |
| input_image_strength | No | integer | 70 | Amount of change applied to input image, must be between 1 and 100 inclusive. Higher number means more change. |
| shuffle_prompts | No | boolean | false | Shuffle all expanded prompts to get outputs in random order |
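The double-bracket keys in `prompt` work together with `prompt_replacements` to fan a single template out into many concrete prompts. The server performs this expansion itself; the sketch below is only an illustration of the idea, assuming every combination of the referenced keys is expanded (the actual expansion rules and ordering are defined by the server):

```python
import itertools
import re

def expand_prompt(template: str, replacements: dict[str, list[str]]) -> list[str]:
    """Illustrative expansion of {{key}} placeholders into concrete prompts."""
    keys = list(dict.fromkeys(re.findall(r"\{\{(\w+)\}\}", template)))
    if not keys:
        return [template]
    pools = [replacements[key] for key in keys]
    expanded = []
    for combo in itertools.product(*pools):
        prompt = template
        for key, value in zip(keys, combo):
            prompt = prompt.replace("{{" + key + "}}", value)
        expanded.append(prompt)
    return expanded

replacements = {
    "landscape_type": ["fantasy", "cyberpunk"],
    "lighting_type": ["cinematic", "neon"],
}
for p in expand_prompt("A {{landscape_type}} landscape, {{lighting_type}} lighting", replacements):
    print(p)
# A fantasy landscape, cinematic lighting
# A fantasy landscape, neon lighting
# A cyberpunk landscape, cinematic lighting
# A cyberpunk landscape, neon lighting
```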
```bash
curl -X POST http://localhost:5700/api/sdxl \
-H "Content-Type: application/json" \
-d '{
"checkpoint_file_path": "models/sdxl/checkpoint/wow.safetensors",
"loras": [
{
"path": "models/sdxl/lora/one.safetensors",
"strength": 80
},
{
"path": "models/sdxl/lora/two.safetensors",
"strength": 50
}
],
"prompt": [
"A {{landscape_type}} landscape, vibrant colors, {{lighting_type}} lighting",
"A {{species}} flying through the air",
],
"prompt_replacements": {
"landscape_type": ["fantasy", "cyberpunk"],
"lighting_type": ["cinematic", "neon"],
"species": ["cat", "dog", "hamster"]
},
"negative_prompt": "blurry, cartoon",
"seed": -1,
"width": 1024,
"height": 1024,
"num_images": 2,
"num_steps": 60,
"output_folder_path": "output/sdxl_images",
"output_image_prefix": "fantasy",
"output_image_suffix": "v1",
"input_image_path": "input/reference/test.png",
"input_image_strength": 70
}'
```

---

```json
{
"saved_files": [
"output/sdxl_images/fantasy-1758590980-v1.png",
"output/sdxl_images/fantasy-1758590999-v1.png"
]
}
```
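The same request is easy to script from Python, which is the intended use for queuing large batches back to back. A minimal sketch with `requests`, assuming the server is running on the default port and using only the documented request and response fields:

```python
import requests

API_URL = "http://localhost:5700/api/sdxl"

payload = {
    "checkpoint_file_path": "models/sdxl/checkpoint/wow.safetensors",
    "prompt": [
        "A {{landscape_type}} landscape, vibrant colors, {{lighting_type}} lighting",
    ],
    "prompt_replacements": {
        "landscape_type": ["fantasy", "cyberpunk"],
        "lighting_type": ["cinematic", "neon"],
    },
    "negative_prompt": "blurry, cartoon",
    "num_images": 2,
    "output_folder_path": "output/sdxl_images",
}

# Generation can take minutes, so avoid an aggressive timeout.
response = requests.post(API_URL, json=payload, timeout=None)
response.raise_for_status()
for path in response.json()["saved_files"]:
    print("saved:", path)
```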
Upscale images using an SDXL checkpoint.
- POST `http://localhost:5700/api/sdxl/upscale`
- Download a checkpoint, recommended from: CivitAI SDXL Checkpoints
- Optionally download loras: CivitAI SDXL Loras
| Name | Required | Type | Default | Description |
|---|---|---|---|---|
| checkpoint_file_path | Yes | string | — | Path to the SDXL checkpoint to load |
| loras | No | list | — | List of objects, each with a path and strength. Strength is between 1 and 100 inclusive. |
| upscale_path | Yes | string | — | Path to image or folder of images for upscaling. If folder, then each image in the folder will trigger a separate upscale request. |
| prompt | No | string | — | Text prompt to guide the upscaled image. If not provided, the API will look for a .json file containing a prompt. |
| prompt_prefix | No | string | — | Prefix to add to every upscale prompt |
| prompt_suffix | No | string | — | Suffix to add to every upscale prompt |
| negative_prompt | Yes | string | — | Negative prompt to discourage content. If not provided, the API will look for a .json file containing a negative_prompt. |
| negative_prompt_prefix | No | string | — | Prefix to add to every upscale negative prompt |
| negative_prompt_suffix | No | string | — | Suffix to add to every upscale negative prompt |
| num_images | No | integer | 1 | Number of images to generate for the prompt. Each image will be saved separately. |
| num_steps | No | integer | 30 | Number of inference steps |
| input_image_strength | No | integer | 51 | Amount of change applied to input image, must be between 1 and 100 inclusive. Higher number means more change. |
| scale | No | number | 1.5 | Scale for size of new upscaled image |
```bash
curl -X POST http://localhost:5700/api/sdxl/upscale \
-H "Content-Type: application/json" \
-d '{
"checkpoint_file_path": "models/sdxl/checkpoint/wow.safetensors",
"loras": [
{
"path": "models/sdxl/lora/one.safetensors",
"strength": 80
},
{
"path": "models/sdxl/lora/two.safetensors",
"strength": 50
}
],
"upscale_path": "input/reference/images",
"prompt": "A fantasy landscape, vibrant colors, cinematic lighting",
"negative_prompt": "blurry, cartoon",
"num_images": 2,
"num_steps": 30,
"input_image_strength": 51,
"scale": 1.5
}'
```

---

```json
{
"saved_files": [
"output/sdxl_images/fantasy-1758590980-v1_upscaled_1758590983.png",
"output/sdxl_images/fantasy-1758590982-v1_upscaled_1758590984.png"
]
}
```
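Because `upscale_path` accepts a folder, a common pattern is to generate a batch first and then upscale everything it produced in a single follow-up request. A rough sketch, assuming the default port and the folder layout used in the examples above:

```python
import requests

BASE = "http://localhost:5700"

# 1. Generate a batch of images.
gen = requests.post(f"{BASE}/api/sdxl", json={
    "checkpoint_file_path": "models/sdxl/checkpoint/wow.safetensors",
    "prompt": "A fantasy landscape, vibrant colors, cinematic lighting",
    "negative_prompt": "blurry, cartoon",
    "num_images": 4,
    "output_folder_path": "output/sdxl_images",
}, timeout=None)
gen.raise_for_status()
print("generated:", gen.json()["saved_files"])

# 2. Upscale everything in that output folder with one request.
up = requests.post(f"{BASE}/api/sdxl/upscale", json={
    "checkpoint_file_path": "models/sdxl/checkpoint/wow.safetensors",
    "upscale_path": "output/sdxl_images",
    "prompt": "A fantasy landscape, vibrant colors, cinematic lighting",
    "negative_prompt": "blurry, cartoon",
    "scale": 1.5,
}, timeout=None)
up.raise_for_status()
print("upscaled:", up.json()["saved_files"])
```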
Generate videos using a Wan GGUF checkpoint. Supports text-to-video and image-to-video.
- POST `http://localhost:5700/api/wan`
- Download a Wan GGUF t2v or i2v model, recommended from: Wan AIO GGUF
- Optionally download loras: CivitAI Wan Loras
| Name | Required | Type | Default | Description |
|---|---|---|---|---|
| gguf_path | Yes | string | — | URL or path to Wan GGUF model |
| loras | No | list | — | List of objects, each with a path and strength. Strength is between 1 and 100 inclusive. |
| prompt | No | string | — | Text prompt to generate the video. If not provided, the API will look for a .json file containing a prompt. |
| prompt_prefix | No | string | — | Prefix to add to every video prompt |
| prompt_suffix | No | string | — | Suffix to add to every video prompt |
| negative_prompt | Yes | string | — | Negative prompt to discourage content |
| negative_prompt_prefix | No | string | — | Prefix to add to every video negative prompt |
| negative_prompt_suffix | No | string | — | Suffix to add to every video negative prompt |
| seed | No | integer | — | Starting randomness of video. Empty or -1 will use random seeds; else given seed will be used for all videos. |
| width | No | integer | 480 | Output video width in pixels. Width - 1 must be divisible by 4. |
| height | No | integer | 720 | Output video height in pixels. Height - 1 must be divisible by 4. |
| num_videos | No | integer | 1 | Number of videos to generate for the prompt. Each video will be saved separately. |
| num_steps | No | integer | 4 | Number of inference steps |
| num_frames | No | integer | 81 | Number of total frames in the video (frames / fps = length of video) |
| fps | No | integer | 16 | Frames per second for generated video (frames / fps = length of video) |
| guidance_scale | No | integer | 1 | How closely to follow the prompt |
| output_folder_path | No | string | "output" | Folder to place saved videos and metadata |
| output_video_prefix | No | string | — | Optional filename prefix for saved videos |
| output_video_suffix | No | string | — | Optional filename suffix for saved videos |
| input_image_path | No | string | — | Path to image or folder of images for image-to-video generation. If folder, then each image in the folder will trigger a separate generation request. |
| shuffle_input_images | No | boolean | false | If input_image_path is provided and images found, randomly shuffle the order of generations |
| only_include_prompts_with_keywords | No | string[] | — | Only generate videos for prompts that include any of the provided keywords |
```bash
curl -X POST http://localhost:5700/api/wan \
-H "Content-Type: application/json" \
-d '{
"gguf_path": "models/wan/wan2.2-i2v-rapid-aio-v10-Q8_0.gguf",
"loras": [
{
"path": "models/wan/lora/one.safetensors",
"strength": 80
},
{
"path": "models/wan/lora/two.safetensors",
"strength": 50
}
],
"prompt": "a cat flying through the sky",
"negative_prompt": "blurry, cartoon, anime",
"seed": -1,
"width": 480,
"height": 720,
"num_videos": 2,
"num_steps": 4,
"num_frames": 81,
"fps": 16,
"guidance_scale": 1,
"output_folder_path": "output/wan_videos",
"output_video_prefix": "cat",
"output_video_suffix": "v1",
"input_image_path": "input/reference/test.png",
"shuffle_input_images": true,
"only_include_prompts_with_keywords": ["cat", "sky"]
}'
```

---

```json
{
"saved_files": [
"output/wan_videos/cat-1758590980-v1.mp4",
"output/wan_videos/cat-1758590999-v1.mp4"
]
}
```
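Since clip length is `num_frames / fps`, it is convenient to derive `num_frames` from a target duration when scripting batches of clips. A minimal sketch with `requests`, assuming the default port; the `+ 1` simply mirrors the 81-frame default, which gives a roughly five-second clip at 16 fps:

```python
import requests

API_URL = "http://localhost:5700/api/wan"

fps = 16
target_seconds = 5
num_frames = fps * target_seconds + 1  # 81 frames at 16 fps, per the documented defaults

payload = {
    "gguf_path": "models/wan/wan2.2-i2v-rapid-aio-v10-Q8_0.gguf",
    "prompt": "a cat flying through the sky",
    "negative_prompt": "blurry, cartoon, anime",
    "num_frames": num_frames,
    "fps": fps,
    "output_folder_path": "output/wan_videos",
    "input_image_path": "input/reference/test.png",
}

# Video generation is slow, so avoid an aggressive timeout.
response = requests.post(API_URL, json=payload, timeout=None)
response.raise_for_status()
for path in response.json()["saved_files"]:
    print("saved:", path)
```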
Generate multiple Wan video segments and combine them into a single long video.
- POST `http://localhost:5700/api/wan/segments`
- Each segment in the request will have the same parameters as the /wan endpoint above
- Each segment uses the parameters from the first segment as a base, so you only need to define the fields that change for that segment (like prompt or loras); see the merge sketch below this list
- If no input_image_path is given for a segment, then the last frame of the previous segment will be used as the starting image
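The override behaviour can be pictured as a dictionary merge: every segment starts from the first segment's fields and replaces only what it specifies. The snippet below illustrates that documented behaviour; it is not the server's actual implementation:

```python
segments = [
    {
        "gguf_path": "models/wan/wan2.2-i2v-rapid-aio-v10-Q8_0.gguf",
        "prompt": "a cat flying through the sky",
        "negative_prompt": "blurry, cartoon, anime",
    },
    {"prompt": "a cat flying into space"},
]

base = segments[0]
# Each effective segment is the base with that segment's own fields layered on top.
effective = [{**base, **segment} for segment in segments]

print(effective[1]["gguf_path"])  # inherited from the first segment
print(effective[1]["prompt"])     # overridden: "a cat flying into space"
```

This is why the second segment in the example below only needs to specify its own loras and prompt.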
```bash
curl -X POST http://localhost:5700/api/wan/segments \
-H "Content-Type: application/json" \
-d '{
"segments": [
{
"gguf_path": "models/wan/wan2.2-i2v-rapid-aio-v10-Q8_0.gguf",
"loras": [
{
"path": "models/wan/lora/one.safetensors",
"strength": 80
}
],
"prompt": "a cat flying through the sky",
"negative_prompt": "blurry, cartoon, anime",
"output_folder_path": "output/wan_videos",
"input_image_path": "input/reference/test.png"
},
{
"loras": [
{
"path": "models/wan/lora/two.safetensors",
"strength": 80
}
],
"prompt": "a cat flying into space",
}
]
}'
```

---

```json
{
"all_files": [
"output/wan_videos/cat-1758590980-v1.mp4",
"output/wan_videos/cat-1758590999-v1.mp4",
"output/wan_videos/cat-1758591200-v1.mp4"
]
}
```
Returns variations of a given prompt, using Ollama structured outputs.
- POST `http://localhost:5700/api/ollama/prompt_variation`
- Ollama must be running for this endpoint. We recommend running it locally in Docker: https://hub.docker.com/r/ollama/ollama
| Name | Required | Type | Default | Description |
|---|---|---|---|---|
| base_prompt | Yes | string | — | Base prompt you want to vary |
| variation_prompt | Yes | string | — | Prompt to guide variations of the base prompt |
| num_variations | No | integer | 1 | Number of different variations you want |
| ollama_url | No | string | "http://localhost:11434/api/generate" | URL of the generate endpoint on your Ollama instance |
| ollama_model | No | string | "gemma3:27b" | Ollama model to use |
```bash
curl -X POST http://localhost:5700/api/ollama/prompt_variation \
-H "Content-Type: application/json" \
-d '{
"base_prompt": "a cat wearing an astronaut suit and floating in a spaceship",
"variation_prompt": "change what the cat is wearing, what it's doing, and it's location. Choose very random and unique variations. Must be a cat.",
"num_variations": 3,
"ollama_url": "http://localhost:11434/api/generate",
"ollama_model": "gemma3:27b"
}'
```

---

```json
{
"base_prompt": "A cat wearing an astronaut suit and floating in a spaceship",
"variation_prompt": "Change what the cat is wearing, what it's doing, and it's location. Choose very random and unique variations. Must be a cat.",
"variations": [
"A regal Tabby cat dressed as a Victorian-era royal, lifting a miniature planet above its head inside a giant teapot.",
"A fluffy Persian cat dressed as a medieval knight, jousting with a rubber chicken in a giant bowl of petunias",
"A fluffy calico cat in a superhero costume, flying through the clouds at night"
]
}
```
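Because `/api/sdxl` accepts a list of prompts, the variations returned here can be fed straight back in to fan out a batch of images. A rough end-to-end sketch with `requests`, assuming both this API and Ollama are running on their default ports:

```python
import requests

BASE = "http://localhost:5700"

# 1. Ask Ollama (via the API) for prompt variations.
var = requests.post(f"{BASE}/api/ollama/prompt_variation", json={
    "base_prompt": "a cat wearing an astronaut suit and floating in a spaceship",
    "variation_prompt": "change the outfit, the activity, and the location; must be a cat",
    "num_variations": 5,
}, timeout=None)
var.raise_for_status()
prompts = var.json()["variations"]

# 2. Generate images for every variation in a single request.
gen = requests.post(f"{BASE}/api/sdxl", json={
    "checkpoint_file_path": "models/sdxl/checkpoint/wow.safetensors",
    "prompt": prompts,
    "negative_prompt": "blurry, cartoon",
    "output_folder_path": "output/sdxl_images",
}, timeout=None)
gen.raise_for_status()
print(gen.json()["saved_files"])
```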
This repo is MIT Licensed, but please check the licenses of any models you use.
Contributions welcome.