MyAI

This project began as a way to document my progress through the IBM AI Developer certification course. As the course went on, I found that the format of many of its labs was less than ideal: they ran inside in-browser virtual IDEs with no way to refer back to past work. I also felt that the labs could be combined into a single AI application with multiple features, rather than a simple one-dimensional chatbot.

With this in mind, I built on my original app.py and implemented the following:

A custom logger that can:
- log informational messages to the console
- save an array of logs to a file
  - This is intended to maintain a conversation history between the user and the chatbot.
  - The log directory is listed in .gitignore to keep the user's conversation history private.
A Flask server that handles all interactions with the different AI models and APIs
A React frontend
- This began in response to a lab on building a speech-to-text/text-to-speech application. I took the starter code from the lab and refactored it from a pure JavaScript/HTML application into a React application that is far more readable and maintainable.
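
The logger behaviour listed above could look roughly like the sketch below. It uses only the Python standard library and is not the project's actual implementation: the class name `ConversationLogger`, the `logs` directory, the `conversation.json` file name, and the JSON layout are all assumptions made for illustration.

```python
import json
import logging
from datetime import datetime, timezone
from pathlib import Path


class ConversationLogger:
    """Hypothetical logger: prints info messages and persists a list of entries."""

    def __init__(self, log_dir="logs"):
        # Assumed directory name; the real log directory is whatever .gitignore excludes.
        self.log_dir = Path(log_dir)
        self._console = logging.getLogger("myai")
        if not self._console.handlers:
            self._console.addHandler(logging.StreamHandler())
        self._console.setLevel(logging.INFO)

    def info(self, message):
        """Log an informational message to the console."""
        self._console.info(message)

    def save(self, entries, filename="conversation.json"):
        """Write an array of log entries to a file and return the file's path."""
        self.log_dir.mkdir(parents=True, exist_ok=True)
        path = self.log_dir / filename
        payload = {
            "saved_at": datetime.now(timezone.utc).isoformat(),
            "entries": list(entries),
        }
        path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        return path
```

A caller would collect the user and chatbot messages in a list during a session and pass that list to `save()` when the conversation ends.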

How to start the project:

TODO: Section about starting the web client
- install npm
- npm install
- npm run start
TODO: Section about starting the server
- install python
- set up python venv
- pip install -r requirements.txt
TODO: Section about Docker deployment

Environment Variables

Key	Example Value	Description
DEBUG	True	Enable or disable debug mode.
DEFAULT_MODEL	openai/whisper-large-v3-turbo	Identifier of the default model to load for inference.
DEVICE_MAP	cuda	Device mapping used for model loading (e.g., `cpu`, `cuda`).
SELECTED_PRETRAINED_MODEL	local	User-defined name of the model being trained.
PRETRAINED_MODEL_DIR	C:/models/pretrained	Where on your local filesystem to save your trained models.
TRAINING_ARGS_NUM_EPOCHS	2	Number of training epochs to run when training the local model.
MAX_NEW_TOKENS	128	Maximum number of tokens to generate per inference step.
SERVER_HOST	0.0.0.0	Server address where local app is hosted.
SERVER_PORT	1587	Port number for your local application instance.
ROUTE_ASR	/api/v1/asr	Endpoint for automatic-speech-recognition API.
ROUTE_IS_ALIVE	/api/v1/is_alive	Endpoint for health check to verify service availability.
ROUTE_TTS	/api/v1/tts	Endpoint for text-to-speech API.
ROUTE_TRAINING_INIT	/api/v1/training	Endpoint to initialize the training loop for user-defined datasets.
STT_COMPUTATION_DEVICE	cpu	Device used for speech-to-text computation (e.g., `cpu`, `cuda`).
STT_SAMPLE_RATE	16000	Sample rate for speech-to-text processing.
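
As a rough illustration of how the server might consume these variables, the sketch below reads a subset of them from the environment and converts each to a suitable Python type. The `Settings` class, the `load_settings` function, and the choice to fall back to the table's example values are assumptions, not the project's actual code.

```python
import os
from dataclasses import dataclass


def _as_bool(value):
    """Interpret common truthy strings such as 'True', '1', or 'yes'."""
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view over a subset of the environment variables."""

    debug: bool
    default_model: str
    device_map: str
    max_new_tokens: int
    server_host: str
    server_port: int
    route_is_alive: str
    stt_sample_rate: int


def load_settings(env=os.environ):
    """Build Settings from a mapping, falling back to the table's example values."""
    return Settings(
        debug=_as_bool(env.get("DEBUG", "False")),
        default_model=env.get("DEFAULT_MODEL", "openai/whisper-large-v3-turbo"),
        device_map=env.get("DEVICE_MAP", "cpu"),
        max_new_tokens=int(env.get("MAX_NEW_TOKENS", "128")),
        server_host=env.get("SERVER_HOST", "0.0.0.0"),
        server_port=int(env.get("SERVER_PORT", "1587")),
        route_is_alive=env.get("ROUTE_IS_ALIVE", "/api/v1/is_alive"),
        stt_sample_rate=int(env.get("STT_SAMPLE_RATE", "16000")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests, for example `load_settings({"SERVER_PORT": "8080"})`.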

Datasets.json

datasets.json is a configuration file designed to streamline importing multiple datasets at run time. At a high level, each entry has the following structure:

{
	name: string, // Human-readable name of the dataset.
	hf_id: string, // The ID of the dataset in the Hugging Face datasets repository.
	pattern: string, // The pattern used to format the dataset prior to tokenization.
	columns: string[], // The dataset's column keys, used together with pattern to build the prompt input.
	config_type: "main" | "socratic", // Required second parameter when calling load_dataset() for GSM8K.
	reference: object, // An object containing citation data for the dataset. Provided for credit and reference.
	split: "train" | "test" // Which portion of the dataset to use for building the model.
}
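
To make the roles of `pattern` and `columns` concrete, here is a minimal sketch of how an entry might be validated and turned into a prompt. It assumes the pattern uses `str.format`-style placeholders named after the columns, which this README does not confirm. The set of required keys, the helper names, and the example entry (including its `hf_id`) are also hypothetical.

```python
# Assumed set of mandatory keys; config_type and reference are treated as optional here.
REQUIRED_KEYS = {"name", "hf_id", "pattern", "columns", "split"}


def validate_entry(entry):
    """Raise ValueError if a datasets.json entry lacks an (assumed) required key."""
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        raise ValueError(f"dataset entry missing keys: {sorted(missing)}")
    return entry


def build_prompt(entry, row):
    """Fill the entry's pattern with the row's values for the listed columns.

    Assumption: placeholders in the pattern are named after the columns.
    """
    values = {column: row[column] for column in entry["columns"]}
    return entry["pattern"].format(**values)


# Made-up entry for illustration only; the hf_id is not a real dataset.
example_entry = {
    "name": "Example QA",
    "hf_id": "example/qa-dataset",
    "pattern": "Question: {question}\nAnswer: {answer}",
    "columns": ["question", "answer"],
    "config_type": "main",
    "split": "train",
}

# With the Hugging Face `datasets` package installed, the entry's fields would
# plausibly map onto: load_dataset(hf_id, config_type, split=split)
```

Calling `build_prompt(example_entry, row)` on a row with `question` and `answer` fields yields one formatted training string per row; columns not listed in the entry are ignored.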

The datasets this project uses as its baseline training data, together with their import configuration, can be found in the file server/datasets.json.

System requirements

CUDA Toolkit
CUDA-compatible PyTorch
- Listing pytorch in requirements.txt installs, by default, a PyTorch build that supports only the CPU.
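
One way to check which kind of PyTorch build ended up in the environment is sketched below. The function name `describe_torch_build` is hypothetical; `torch.cuda.is_available()` and `torch.version.cuda` are existing PyTorch attributes, and the import is wrapped so the snippet also runs when PyTorch is absent.

```python
def describe_torch_build():
    """Return a short description of the installed PyTorch build, if any."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    cuda_version = torch.version.cuda  # None when the build has no CUDA support
    if torch.cuda.is_available():
        return f"PyTorch {torch.__version__}: GPU available (CUDA build: {cuda_version})"
    if cuda_version is None:
        return f"PyTorch {torch.__version__}: build without CUDA support"
    return f"PyTorch {torch.__version__}: built for CUDA {cuda_version}, but no usable GPU found"


if __name__ == "__main__":
    print(describe_torch_build())
```

If this reports a build without CUDA support while `DEVICE_MAP` is set to `cuda`, the likely fix is reinstalling PyTorch from a CUDA-enabled package index.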

References

Projects

Hugging Face Kernel Hub
- Intro wiki

Have fun!
