This is a tutorial project from Boot.dev focused on building an AI agent from scratch using Google's Gemini API.
This project implements a functional AI coding agent that can autonomously perform file operations, execute Python scripts, and interact with a sandboxed working directory. The agent uses natural language understanding to interpret user requests and execute appropriate functions.
The agent uses Google's Gemini 2.0 Flash model with function declarations to understand user intent and automatically call the appropriate functions.
- **List Files and Directories** (`get_files_info`)
  - Lists contents of a directory with file sizes and type information
  - Constrained to the working directory for security
- **Read File Contents** (`get_file_content`)
  - Reads and returns file contents
  - Automatically truncates files larger than 10,000 characters
  - Works with files in subdirectories
- **Write/Overwrite Files** (`write_file`)
  - Creates new files or overwrites existing ones
  - Automatically creates parent directories if needed
  - Returns a confirmation with the character count
- **Execute Python Files** (`run_python_file`)
  - Runs Python scripts with optional command-line arguments
  - 30-second timeout for safety
  - Captures both stdout and stderr
  - Returns exit codes for error handling
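The execution feature above can be sketched with the standard library; this is a simplified version for illustration, and the actual `run_python_file` in this repo may differ:

```python
import os
import subprocess
import sys

def run_python_file(working_directory, file_path, args=None):
    """Run a Python script inside the working directory with a 30-second timeout.

    A simplified sketch; the repo's real run_python_file may differ.
    """
    abs_working = os.path.abspath(working_directory)
    abs_target = os.path.abspath(os.path.join(abs_working, file_path))

    # Refuse to touch anything outside the sandboxed working directory.
    if not abs_target.startswith(abs_working + os.sep):
        return f'Error: Cannot execute "{file_path}": outside the working directory'
    if not abs_target.endswith(".py"):
        return f'Error: "{file_path}" is not a Python file'

    try:
        completed = subprocess.run(
            [sys.executable, abs_target] + list(args or []),
            capture_output=True,
            text=True,
            timeout=30,          # 30-second safety timeout
            cwd=abs_working,
        )
    except subprocess.TimeoutExpired:
        return "Error: execution timed out after 30 seconds"

    # Surface stdout, stderr, and the exit code so the agent can react to errors.
    return (
        f"STDOUT: {completed.stdout}"
        f"STDERR: {completed.stderr}"
        f"Exit code: {completed.returncode}"
    )
```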
- **Sandboxed Working Directory**: All operations are restricted to the `./calculator` directory
- **Path Validation**: Security checks prevent access outside permitted directories
- **File Size Limits**: 10,000-character limit on file reads to prevent memory issues
- **Execution Timeout**: 30-second timeout on Python script execution
- **Input Validation**: Verifies file types and paths before operations
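The path-validation guardrail boils down to resolving the requested path and checking that it stays under the sandbox root. A minimal sketch of the idea (the helper name is illustrative, not taken from the repo):

```python
import os

def validate_path(working_directory, file_path):
    """Return the absolute target path if it stays inside the sandbox, else None.

    Illustrative sketch of the path-validation guardrail, not the repo's exact code.
    """
    abs_working = os.path.abspath(working_directory)
    abs_target = os.path.abspath(os.path.join(abs_working, file_path))
    # commonpath catches "../" traversal as well as absolute paths that escape.
    if os.path.commonpath([abs_working, abs_target]) != abs_working:
        return None
    return abs_target
```

Resolving with `os.path.abspath` before the check matters: a naive string comparison on the raw input would miss `..` segments.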
Every interaction with the AI agent is automatically logged in timestamped session folders:
```
data/
└── session_20251102_074629/
    ├── user_prompt.txt          # Original user request
    ├── function_call_1.txt      # First function call details
    ├── function_call_2.txt      # Second function call (if any)
    └── ai_response_summary.txt  # Summary of all operations
```
For text responses:
```
data/
└── session_20251102_074733/
    ├── user_prompt.txt          # Original user request
    └── ai_response.txt          # AI's text response
```
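Session folders of this shape need only a few lines of stdlib code. A sketch, assuming logs are plain text files (the repo's actual logging code may differ):

```python
import os
from datetime import datetime

def create_session_dir(base="data"):
    """Create a timestamped session folder like data/session_20251102_074629/."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = os.path.join(base, f"session_{stamp}")
    os.makedirs(path, exist_ok=True)
    return path

def log_text(session_dir, filename, text):
    """Write one log entry (e.g. user_prompt.txt) into the session folder."""
    with open(os.path.join(session_dir, filename), "w", encoding="utf-8") as f:
        f.write(text)
```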
- Python 3.x
- Google Gemini API (`gemini-2.0-flash-001`)
- `google-genai` library
- `python-dotenv` for environment variable management
```
tutorial.build-ai-agent/
├── main.py                  # Main AI agent script
├── functions/
│   ├── config.py            # Configuration constants
│   ├── get_files_info.py    # Directory listing function + schema
│   ├── get_file_content.py  # File reading function + schema
│   ├── write_file.py        # File writing function + schema
│   └── run_python_file.py   # Python execution function + schema
├── calculator/              # Sandboxed working directory
│   ├── main.py              # Calculator application
│   ├── tests.py             # Unit tests
│   └── pkg/
│       ├── calculator.py    # Calculator logic
│       └── render.py        # Output formatting
└── data/                    # Session logs
    └── session_[timestamp]/ # Individual session folders
```
```
uv run main.py "your prompt here"
uv run main.py "your prompt here" --verbose
```

Verbose mode shows:
- Detailed function call information (name and arguments)
- Function results
- Token usage statistics (prompt and response tokens)
**List Files:**
```
uv run main.py "list files in the root directory"
uv run main.py "what files are in the pkg directory?"
```

**Read Files:**
```
uv run main.py "read the contents of main.py"
uv run main.py "show me calculator/pkg/calculator.py"
```

**Write Files:**
```
uv run main.py "create a file called test.txt with content 'Hello World'"
uv run main.py "write 'def hello(): print(\"Hi\")' to utils.py"
```

**Execute Python:**
```
uv run main.py "run the tests.py file"
uv run main.py "run main.py with arguments 10 + 5"
```

**Complex Operations:**
```
uv run main.py "create a file hello.py that prints 'Hello World' and then run it"
```

- Install dependencies:
  ```
  uv sync
  ```
- Create a `.env` file with your Gemini API key:
  ```
  GEMINI_API_KEY=your_api_key_here
  ```
- Run the agent:
  ```
  uv run main.py "your request"
  ```

Each function has a schema that describes it to the LLM:
```python
schema_get_files_info = types.FunctionDeclaration(
    name="get_files_info",
    description="Lists files in the specified directory...",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "directory": types.Schema(
                type=types.Type.STRING,
                description="The directory to list files from..."
            ),
        },
    ),
)
```

- User provides a natural language prompt
- Gemini model analyzes the request
- AI decides which function(s) to call
- Agent executes functions with security checks
- Results are returned to user and logged
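The "agent executes functions" step typically reduces to a name-to-callable dispatch table keyed by the names declared in the schemas. A minimal sketch (the function names come from this project; the dispatch code and stand-in body are illustrative):

```python
def get_files_info(directory="."):
    # Stand-in for the real implementation in functions/get_files_info.py.
    return f"listing of {directory}"

# Name-to-callable dispatch table; one entry per declared function schema.
FUNCTION_MAP = {
    "get_files_info": get_files_info,
    # "get_file_content": ..., "write_file": ..., "run_python_file": ...
}

def call_function(name, arguments):
    """Execute the function the model requested, or report an unknown name."""
    func = FUNCTION_MAP.get(name)
    if func is None:
        return f"Error: unknown function {name}"
    return func(**arguments)
```

Returning an error string (rather than raising) lets the agent feed the failure back to the model so it can try a different call.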
The agent is guided by a system prompt that defines its role and capabilities:
```
You are a helpful AI coding agent.

When a user asks a question or makes a request, make a function call plan.
You can perform the following operations:

- List files and directories
- Read file contents
- Execute Python files with optional arguments
- Write or overwrite files

All paths you provide should be relative to the working directory.
```
- Function Calling with LLMs: How to declare functions that AI can understand and call
- Security in AI Agents: Implementing guardrails to prevent unauthorized access
- Tool Integration: Connecting LLMs to real-world capabilities
- Error Handling: Graceful error management in AI systems
- Session Management: Tracking and logging AI interactions
- Prompt Engineering: Crafting system prompts for desired behavior
This is a learning project. Do not:
- Use it on production systems
- Give it access to sensitive directories
- Share it with untrusted users without additional security measures
For production use, implement additional security layers:
- Container isolation (Docker)
- Resource limits
- Code review before execution
- User authentication
- Audit logging
- Rate limiting
- The agent uses the `gemini-2.0-flash-001` model
- The working directory is hardcoded to `./calculator`
- Function results are wrapped in `types.Content` with `from_function_response`
- All file operations use UTF-8 encoding
- Session logs include timestamps for easy tracking
Potential improvements:
- Multi-turn conversations with context
- Web search integration
- Git operations support
- Database query capabilities
- File upload/download
- Interactive mode
- Configuration file for working directory
- Multi-language support beyond Python