MyAI

This project began as a way to document my progression through the IBM AI Developer certification course. As the course went on, I realized that the format of many of its labs was not optimal: the labs were performed in in-browser virtual IDEs with no way to reference past work. I also felt the labs could be leveraged as opportunities to build out a single AI application with multiple features, rather than just a simple, one-dimensional chatbot application.

With this in mind, I built on my original app.py and implemented the following:

  • A custom logger that can (see the sketch after this list):
    • log informational messages to the console
    • save an array of logs to a file
      • The intent is to maintain a conversation history between the user and the chatbot.
      • The log output directory is added to .gitignore to keep the user's conversation history private.
  • A Flask server that handles all interactions with the various AI models and APIs
  • A React frontend
    • This work began in response to a lab about building a speech-to-text/text-to-speech application. I took the lab's starter code and refactored it from a pure JavaScript/HTML application into a React app that is far more readable and maintainable.
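
Below is a minimal sketch of what such a logger might look like. The class name, method names, and log-file layout are illustrative assumptions, not the repository's actual implementation.

```python
# Illustrative sketch only -- names and file layout are assumptions,
# not this repo's actual logger.
import json
import logging
import os
from datetime import datetime, timezone

class ConversationLogger:
    """Logs informational messages to the console and can persist
    an array of log entries (e.g., a chat history) to a file."""

    def __init__(self, log_dir: str = "logs"):
        self.log_dir = log_dir  # directory is expected to be .gitignore'd
        self.console = logging.getLogger("myai")
        logging.basicConfig(level=logging.INFO)

    def info(self, message: str) -> None:
        # Log an informational message to the console.
        self.console.info(message)

    def save_history(self, entries: list[dict]) -> str:
        """Write an array of log entries to a timestamped JSON file."""
        os.makedirs(self.log_dir, exist_ok=True)
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        path = os.path.join(self.log_dir, f"conversation_{stamp}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(entries, f, indent=2)
        return path
```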

How to start the project:

  • TODO: Section about starting the web client
    • Install Node.js (which includes npm)
    • npm install
    • npm run start
  • TODO: Section about starting the server
    • Install Python
    • Set up a Python venv
    • pip install -r requirements.txt
  • TODO: Section about Docker deployment

Environment Variables

| Key | Example Value | Description |
| --- | --- | --- |
| DEBUG | True | Enable or disable debug mode. |
| DEFAULT_MODEL | openai/whisper-large-v3-turbo | Identifier of the default model to load for inference. |
| DEVICE_MAP | cuda | Device mapping used for model loading (e.g., cpu, cuda). |
| SELECTED_PRETRAINED_MODEL | local | User-defined name of the model being trained. |
| PRETRAINED_MODEL_DIR | C:/models/pretrained | Where on your local filesystem to save your trained models. |
| TRAINING_ARGS_NUM_EPOCHS | 2 | Number of training epochs to execute when training the local model. |
| MAX_NEW_TOKENS | 128 | Maximum number of tokens to generate per inference step. |
| SERVER_HOST | 0.0.0.0 | Host address where the local app is served. |
| SERVER_PORT | 1587 | Port number for your local application instance. |
| ROUTE_ASR | /api/v1/asr | Endpoint for the automatic speech recognition (ASR) API. |
| ROUTE_IS_ALIVE | /api/v1/is_alive | Endpoint for the health check used to verify service availability. |
| ROUTE_TTS | /api/v1/tts | Endpoint for the text-to-speech (TTS) API. |
| ROUTE_TRAINING_INIT | /api/v1/training | Endpoint to initialize the training loop for user-defined datasets. |
| STT_COMPUTATION_DEVICE | cpu | Device (or device index) to use for speech-to-text computation. |
| STT_SAMPLE_RATE | 16000 | Sample rate (Hz) for speech-to-text processing. |
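
As a hedged illustration of how these variables fit together, the sketch below shows a Flask app reading the SERVER_* and ROUTE_* values from the environment and registering the health-check endpoint. The handler body and fallback defaults are assumptions for illustration, not the repo's actual code.

```python
# Hypothetical sketch of wiring the SERVER_* and ROUTE_* variables into Flask.
import os
from flask import Flask, jsonify

app = Flask(__name__)

# Fallback values below mirror the example values in the table above.
ROUTE_IS_ALIVE = os.getenv("ROUTE_IS_ALIVE", "/api/v1/is_alive")
SERVER_HOST = os.getenv("SERVER_HOST", "0.0.0.0")
SERVER_PORT = int(os.getenv("SERVER_PORT", "1587"))
DEBUG = os.getenv("DEBUG", "False").lower() == "true"

@app.route(ROUTE_IS_ALIVE, methods=["GET"])
def is_alive():
    # Health check endpoint used to verify service availability.
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    app.run(host=SERVER_HOST, port=SERVER_PORT, debug=DEBUG)
```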

Datasets.json

datasets.json is a configuration file designed to streamline importing multiple datasets at runtime. At a high level, each entry takes the following structure:

```
{
	name: string,             // Human-readable name of the dataset.
	hf_id: string,            // The ID of the dataset in Hugging Face's datasets repo.
	pattern: string,          // The pattern used to format the dataset prior to tokenization.
	columns: string[],        // The dataset's column keys, used with `pattern` to build the prompt input.
	config_type: "main" | "socratic", // Required second param when calling load_dataset() for GSM8K.
	reference: obj,           // An object containing citation data for the dataset, provided for credit and reference.
	split: "train" | "test"   // Which portion of the dataset to use when building the model.
}
```

This project uses the datasets configured in server/datasets.json as its baseline training data; the full import configuration can be referenced in that file.
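
As a rough sketch of how such entries could be consumed, the snippet below loads each configured dataset with Hugging Face's `datasets` library, assuming datasets.json holds a JSON array of entries and that `pattern` uses positional placeholders; the actual loading code in server/ may differ.

```python
# Hedged sketch of consuming datasets.json entries with Hugging Face `datasets`.
import json
from datasets import load_dataset

with open("server/datasets.json", encoding="utf-8") as f:
    entries = json.load(f)  # assumed to be a JSON array of entry objects

for entry in entries:
    # config_type is the optional second positional argument to load_dataset()
    # (e.g., "main" or "socratic" for GSM8K).
    ds = load_dataset(entry["hf_id"], entry.get("config_type"), split=entry["split"])

    def build_prompt(row):
        # Format each row using the entry's pattern and column keys,
        # e.g. pattern = "Question: {0}\nAnswer: {1}".
        values = [row[col] for col in entry["columns"]]
        return {"text": entry["pattern"].format(*values)}

    ds = ds.map(build_prompt)
```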

System requirements

References

Projects

Have fun!
