Introduction

Good evening, and thank you for the opportunity to interview at Engati. I'm Jyoti Sahu, a
final-year B.Tech student at IIIT Naya Raipur, specializing in Data Science and Artificial
Intelligence. Throughout my academic journey, I’ve maintained a strong GPA of 8.84/10 while
engaging in impactful research and hands-on applications of AI and ML technologies.

What excites me most about Engati is your mission to build AI-native customer experiences that
make every interaction count. My background in conversational AI and generative models aligns
well with your platform’s focus on intelligent customer journeys across WhatsApp, Voice, and
Web channels.

I’ve built several projects, including an AI-powered nutrition analyzer using FastAPI, GPT-4, and
LlamaIndex, which demonstrates conversational AI in action—users ask questions in natural
language and receive intelligent responses from both custom knowledge bases and real-time
generation. In another project, I developed an ASL gesture recognition system using LSTM
networks and MediaPipe, and also built a full-stack prompt-sharing platform.

Recently, I worked as an ML Researcher at the University of Manitoba, focusing on generative models. Additionally, being selected for the Amazon ML School program deepened my understanding of supervised learning and deep learning, further strengthening my expertise in advanced AI technologies.

What particularly draws me to the Forward Deployed Engineer role is the chance to translate
these technical capabilities into real business value for enterprise clients.



"I'm Jyoti, a final-year DSAI student at IIIT Naya Raipur. I've built projects like an AI-based nutrition analyzer, an ASL detector using Python, and worked on generative models during my ML research internship. Engati's focus on conversational AI excites me because I'm passionate about turning technical solutions into real business value."



Lab-to-Field Image Translation


I worked on a research project aimed at bridging the gap between lab-acquired plant images and real-world field environments using generative AI, specifically diffusion models.

In agricultural machine learning, models are often trained on clean lab images like those from
the EAGL-I dataset, which contain high-resolution, well-annotated plant photos taken against a
blue background. While ideal for training, these lab images lack the complexity of real outdoor
scenes—like variable lighting, soil textures, and surrounding vegetation—which causes a
performance drop when models are deployed in the field.

To solve this, I developed a pipeline that transforms lab plant images into synthetic
field-like images, preserving the plant’s structure but generating a realistic outdoor
background.

🔍 How I Did It
1. Segmentation Pipeline:
○ I first isolate the plant from the blue lab background using a combination of color space thresholding (RGB, HSV, and LAB) and refinement techniques.
○ I tested and improved this pipeline across several versions, adding plant color detection and size-based filtering for cleaner masks (a thresholding sketch follows this list item).
○ V1: Basic (Blue RGB + SAM Grid) → fast but messy edges
○ V2: Color + Size (HSV/LAB + Contour Filtering) → handles partial leaves
○ V3: Multi-Stage (RGB + LAB + Connected Components) → precision masking
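A minimal sketch of the color-space thresholding and size-based filtering described above, assuming OpenCV and NumPy; the file names and every threshold value are illustrative rather than the project's actual settings.

import cv2
import numpy as np

# Load a lab image with a blue backdrop (file name is a placeholder).
img = cv2.imread("lab_plant.png")

# RGB rule: background pixels are strongly blue (B well above R and G).
b, g, r = [c.astype(int) for c in cv2.split(img)]
rgb_bg = (b > 120) & (b > r + 40) & (b > g + 40)

# LAB rule: the b* channel (blue-yellow axis) is low for blue pixels.
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
lab_bg = lab[:, :, 2] < 110  # threshold chosen purely for illustration

# A pixel is background if either rule fires; the plant is the complement.
plant_mask = (~(rgb_bg | lab_bg)).astype(np.uint8) * 255

# Size-based filtering via connected components: keep only large regions,
# dropping the speckle noise that raw thresholds leave behind.
n, labels, stats, _ = cv2.connectedComponentsWithStats(plant_mask)
clean = np.zeros_like(plant_mask)
for i in range(1, n):  # label 0 is the image background
    if stats[i, cv2.CC_STAT_AREA] > 500:
        clean[labels == i] = 255

cv2.imwrite("plant_mask.png", clean)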
2. Framework:

I developed a progressive two-stage domain adaptation framework using state-of-the-art diffusion models that bridges this lab-to-field gap while preserving botanical accuracy. The key innovation is transforming laboratory weed imagery into photorealistic field representations without losing crucial structural information.

My approach leverages four distinct model configurations that I systematically evaluated:

● SDXL LoRA + ControlNet (best performing)
● SD1.5 LoRA + ControlNet
● SDXL Vanilla + ControlNet
● SD1.5 Vanilla + ControlNet

The methodology uses progressive fine-tuning, a two-stage approach I designed to prevent catastrophic forgetting:
● Stage 1: Domain adaptation using LoRA on field images only, establishing agricultural expertise within SDXL while maintaining parameter efficiency.
● Stage 2: Spatial conditioning integration using ControlNet with laboratory segmentation masks, employing a "weak conditioning" strategy (λ=0.3) that allows domain knowledge to shine through while maintaining structural guidance.

●​ I worked with 14 Canadian weed species and 5 field crop species, using 30,000
field images per species spanning multiple growth stages and seasonal
variations from 2021-2024.
●​ The implementation required significant computational resources - I used NVIDIA
Tesla V100-SXM3-32GB with mixed precision training at 1024×1024 resolution.

● The results demonstrated improved image quality: a 68% improvement in FID score and a 46% improvement in SSIM.
3. ControlNet for Contextual Conditioning:
○ I experimented with ControlNet, which allows me to condition the image generation on structural inputs such as edge maps and segmentation masks.
○ I used plant segmentation masks as guides so that the generated field background respects the plant's original shape and positioning (an inference sketch follows this list item).
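A minimal inference sketch of this conditioning step, assuming the Hugging Face diffusers API; a public canny ControlNet checkpoint stands in for the mask-conditioned one trained in Stage 2, and the LoRA path, mask file, and prompt are illustrative, while the 0.3 conditioning scale mirrors the weak-conditioning strategy above.

import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# Illustrative checkpoints: SDXL base plus a ControlNet for spatial conditioning.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Stage 1 output: LoRA weights fine-tuned on field images (path hypothetical).
pipe.load_lora_weights("checkpoints/field_domain_lora")

# The laboratory segmentation mask acts as the spatial condition.
mask = load_image("plant_mask.png")

# Weak conditioning (scale = 0.3): the mask guides structure, while the
# field-domain LoRA dominates the appearance of the generated background.
image = pipe(
    prompt="a weed growing in an outdoor crop field, natural lighting",
    image=mask,
    controlnet_conditioning_scale=0.3,
    num_inference_steps=30,
).images[0]
image.save("field_synthetic.png")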

4. Model Training and Optimization:
○ Trained models with various learning rates and applied stabilization tricks like gradient clipping and NaN skipping (sketched below).
○ Logged and visualized training progress using image previews and loss curves.
○ Only the best-performing model weights were saved to reduce storage overhead.
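A minimal PyTorch sketch of these stabilization tricks around a stand-in model; the clipping norm, skip logic, and best-checkpoint criterion are illustrative, and mixed precision mirrors the V100 setup mentioned above.

import torch
from torch.cuda.amp import GradScaler, autocast

# Stand-in model, data, and optimizer; any diffusion training step slots in here.
model = torch.nn.Linear(128, 128).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = [(torch.randn(8, 128), torch.randn(8, 128)) for _ in range(100)]
scaler = GradScaler()  # mixed-precision training
best_loss = float("inf")

for x, y in loader:
    optimizer.zero_grad()
    with autocast():
        loss = torch.nn.functional.mse_loss(model(x.cuda()), y.cuda())

    # NaN skipping: discard the batch rather than corrupting the weights.
    if torch.isnan(loss):
        continue

    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)
    # Gradient clipping keeps rare exploding gradients from destabilizing training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)
    scaler.update()

    # Save only the best-performing weights to reduce storage overhead.
    if loss.item() < best_loss:
        best_loss = loss.item()
        torch.save(model.state_dict(), "best_model.pt")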

Color Space Thresholding:

● RGB: Detect blue pixels using simple color value thresholds.
● HSV (Hue, Saturation, Value): Better for separating green/brown plant tones from blue.
● CIELAB: Good for separating the blue background thanks to the b* channel (blue-yellow axis).

SAM (Segment Anything Model):

● Uses point prompts to segment complex shapes.
● Very effective but computationally expensive (a usage sketch follows).
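A minimal point-prompt sketch, assuming Meta's segment-anything package; the checkpoint path and click coordinates are placeholders.

import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a SAM checkpoint (path and model type are placeholders).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("lab_plant.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # expensive: one heavy image-embedding pass

# A single foreground click on the plant (label 1 = foreground point).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[512, 384]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # pick the highest-scoring proposal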

Hayasaka AI:

Project Overview
Hayasaka AI is a real-time American Sign Language (ASL) detection system I developed that
uses deep learning to recognize sign language gestures. The system was designed to bridge
communication gaps for the deaf and hard-of-hearing community by providing accurate,
real-time translation of ASL gestures.

Technical Implementation
●​ Core Technologies: I implemented the system using MediaPipe for keypoint extraction,
TensorFlow/Keras for model development, and OpenCV for video processing
● Model Architecture: The system utilizes a Long Short-Term Memory (LSTM) neural network that processes sequences of hand keypoints to classify ASL gestures (a minimal sketch follows this list)
● Data Processing: Our approach extracts 21 keypoints from each hand using MediaPipe and organizes them into 30-frame sequences for temporal analysis
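A minimal Keras sketch of such an LSTM classifier, assuming one hand's 21 keypoints with (x, y, z) coordinates (63 features per frame), 30-frame sequences as above, and a placeholder number of gesture classes; layer sizes are illustrative.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 10   # placeholder gesture vocabulary size
SEQ_LEN = 30       # 30-frame sequences, as described above
FEATURES = 21 * 3  # 21 hand keypoints, each with (x, y, z)

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, FEATURES)),
    # Stacked LSTMs: the first returns the full sequence so the second
    # can keep modeling the temporal structure of the gesture.
    layers.LSTM(64, return_sequences=True),
    layers.Dropout(0.3),           # regularization against overfitting
    layers.LSTM(128),
    layers.BatchNormalization(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()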

Key Features
●​ Real-time gesture recognition from webcam feeds
●​ High accuracy with 0.94 precision, 0.92 recall, and 0.93 F1 score
●​ Dynamic overlay predictions displayed on video feeds
●​ Robust performance under varying lighting conditions and hand orientations

Development Challenges & Solutions


● Limited Dataset: I addressed this through data augmentation techniques
● Overfitting: I implemented dropout regularization, batch normalization, and early stopping
● Model Optimization: I incorporated learning rate scheduling and experimented with various LSTM architectures (see the callback sketch below)
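A short sketch of how these mitigations wire together as Keras callbacks, continuing the model sketched above; the stand-in data shapes follow the keypoint sequences, and the patience values are illustrative.

import numpy as np
from tensorflow.keras import callbacks

# Stand-in data shaped like the 30-frame keypoint sequences above.
X_train = np.random.rand(200, 30, 63)
y_train = np.eye(10)[np.random.randint(0, 10, size=200)]
X_val = np.random.rand(40, 30, 63)
y_val = np.eye(10)[np.random.randint(0, 10, size=40)]

cbs = [
    # Early stopping: halt when validation loss stops improving, keep best weights.
    callbacks.EarlyStopping(monitor="val_loss", patience=10,
                            restore_best_weights=True),
    # Learning rate scheduling: halve the LR whenever progress plateaus.
    callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
]
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=200, batch_size=32, callbacks=cbs)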

Results & Impact


●​ The final model achieved a training loss of 0.099
●​ Successfully detected common ASL gestures in real-time
●​ Created a practical tool that enhances accessibility for sign language users

Future Directions
●​ Expanding the gesture vocabulary
●​ Mobile deployment
●​ Integration with wearable devices

AI-based Food Nutrition Analyzer:

This project aims to build a modular, production-ready AI-powered food nutrition analyzer that
leverages multiple advanced technologies including FastAPI, LlamaIndex, OpenAI’s GPT-4, and
Streamlit to deliver an intelligent, interactive nutrition assistant.​

Workflow Summary:

Upon startup, environment variables are loaded, logging is configured, and the LlamaIndex service context is initialized with GPT-4. The custom nutrition documents are then loaded and indexed into a vector store, which enables semantic search.

When a user queries nutrition info for a food item, the system either:
● queries the local indexed knowledge base for matching info, or
● calls GPT-4 directly to fetch nutrition details if they are not available locally.

Responses are logged, formatted, and returned as JSON via the FastAPI endpoints. The Streamlit UI then visualizes these responses interactively. (A sketch of the index-then-fallback flow follows.)
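A minimal sketch of this index-then-fallback flow, assuming the legacy llama_index ServiceContext API that this write-up references and the pre-1.0 openai client; the data folder, model names, and the retrieval heuristic are illustrative.

import openai
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import OpenAI

# Build the vector index over the custom nutrition documents (folder illustrative).
service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))
docs = SimpleDirectoryReader("data/nutrition").load_data()
index = VectorStoreIndex.from_documents(docs, service_context=service_context)
query_engine = index.as_query_engine()

def get_nutrition(question: str) -> str:
    # First try the local knowledge base via semantic search.
    response = query_engine.query(question)
    if response.source_nodes:  # something relevant was retrieved locally
        return str(response)
    # Fallback: ask GPT-4 directly when the local index has no answer.
    chat = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
    )
    return chat.choices[0].message.content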

Core Functionality:

User Interaction:
The user inputs any food item via an interactive Streamlit dashboard or API endpoint. The
system then returns detailed nutrition information dynamically.

The backbone of the system is twofold:

A custom nutrition knowledge base created from user-provided nutrition documents (like PDFs
or text files) stored in a dedicated data folder. These documents are processed using
LlamaIndex’s vector store mechanism, enabling semantic search over this custom dataset. For
example, users can ask domain-specific questions such as “Which fruits are rich in potassium?”
and get precise, document-based answers.
A fallback to direct OpenAI GPT-4 queries that fetch nutrition facts in real-time when information
is not present in the local knowledge base.​

Architecture and Components:

FastAPI Backend: Serves API endpoints for nutrition analysis (/analyze/{food_item}) and free-form questioning (/ask/{question}). It handles request processing, invokes AI model interactions, and returns structured JSON responses (sketched below).
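A minimal FastAPI sketch of these two endpoints; get_nutrition is the hypothetical index-plus-fallback helper from the workflow sketch above.

from fastapi import FastAPI

# get_nutrition is the helper sketched in the Workflow Summary above.
app = FastAPI(title="Nutrition Analyzer")

@app.get("/analyze/{food_item}")
def analyze(food_item: str):
    # Structured nutrition lookup for a single food item.
    return {"food": food_item,
            "nutrition": get_nutrition(f"Nutrition facts for {food_item}")}

@app.get("/ask/{question}")
def ask(question: str):
    # Free-form questioning against the same knowledge base + GPT-4 fallback.
    return {"question": question, "answer": get_nutrition(question)}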

LlamaIndex Integration: Used to create a vectorized index from uploaded nutrition documents.
This index supports efficient semantic querying and is wrapped inside a service context
configured with GPT-4 as the underlying language model.​

OpenAI GPT-4 API: Direct calls to OpenAI’s chat completion API handle queries outside the
indexed data scope, ensuring comprehensive coverage.​

Streamlit Frontend: Provides a user-friendly, interactive dashboard that displays nutrition facts,
macro/micronutrient visualizations (like protein, carbs, fats, fiber), health scores, and smart food
pairing suggestions. The frontend communicates with FastAPI endpoints to fetch and display
results.​

Logging: Implements robust logging with rotation (10MB files) to monitor operations and errors.
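One way to realize the rotating log described here with Python's standard library; the file name, format, and backup count are illustrative.

import logging
from logging.handlers import RotatingFileHandler

# Rotate at 10 MB, keeping a handful of old files (backupCount illustrative).
handler = RotatingFileHandler("nutrition_app.log",
                              maxBytes=10 * 1024 * 1024, backupCount=5)
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s %(name)s: %(message)s"))
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)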

Real-World Relevance:
This project exemplifies how to combine large language models with vector search to build
domain-specific AI assistants that are both knowledgeable (via custom data) and flexible (via
GPT fallback). The modular design and API-first approach facilitate easy deployment and
integration into existing systems, making it ideal for forward deployed engineers aiming to
deliver intelligent nutrition insights in health, wellness, or food tech applications.

AI Prompt Share:

1: Objective

"This project, PromptShare, is a full-stack web application designed to help users share and discover AI prompts. It provides a platform for collaborative learning and creativity in the AI space."

2: Tech Stack and Responsibilities

"I developed this application using a modern tech stack:

●​ Backend: Built RESTful APIs with Next.js to handle key functionalities like prompt
search, submission, and feedback collection. These APIs ensure seamless
communication between the frontend and backend.
●​ Database: Used MongoDB to store user data, prompts, and interactions efficiently.
●​ Authentication: Integrated NextAuth for secure and scalable user authentication.
●​ Frontend & Deployment: Implemented the UI and deployed the application on Vercel
for smooth, production-ready delivery."

3: Features and Functionalities

"The app includes:

● A searchable prompt library that allows users to find AI prompts easily.
● Features for submitting new prompts and providing feedback, fostering a collaborative environment.
● Secure user login and profile management to ensure privacy and personalization."


4: Extra Questions

Sign in: creates a user.
Authentication: NextAuth automatically handles Google authentication.
After clicking on Create:
1) we can add a prompt (type: string)
2) we can add tags (type: string)
In the Prompt schema we also store the creator's id.

MongoDB has two schemas:
1) User
2) Prompt

Operations:
1) Create
After clicking on Profile:
1) Edit
2) Delete

Why Next.js and not React?
Next.js has caching and SEO support built in; in React, the logic for SEO and caching must be defined separately.

File-based routing in Next.js:
Create a folder and write the logic in its page.jsx; the folder path becomes the route.

Backend (HTTP methods: GET, POST, PUT, PATCH, DELETE): API calls.
When you click on Create, an API is called in the backend.

React uses plain JavaScript, and Next.js is built on top of React.

Functional components are used.

GET: connect to the DB, find the prompt documents we created, and send them to the frontend.

Publications

● SyncNet: Harmonizing Nodes for Efficient Learning: This research introduces SyncNet, a communication-efficient algorithm for training large neural networks on massive datasets. By transmitting only the sign of gradient vectors to a central server for aggregation via majority vote, SyncNet reduces communication overhead and enhances fault tolerance. Applied to training ResNet50 on ImageNet, SyncNet reduced training time by 25% compared to existing communication libraries while achieving similar convergence to SGD. Tests on the MNIST dataset further validated its effectiveness in distributed deep learning. (A toy sketch of the sign-and-vote step follows.)
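A toy NumPy sketch of the sign-compression and majority-vote aggregation idea; this is a didactic single-step illustration, not the paper's distributed implementation.

import numpy as np

def worker_message(gradient: np.ndarray) -> np.ndarray:
    # Each worker transmits only the sign of its gradient (1 bit per coordinate).
    return np.sign(gradient)

def server_aggregate(messages: list) -> np.ndarray:
    # Majority vote: the server returns the sign of the summed signs.
    return np.sign(np.sum(messages, axis=0))

# Three workers with noisy versions of the same true gradient.
rng = np.random.default_rng(0)
true_grad = rng.normal(size=5)
msgs = [worker_message(true_grad + rng.normal(scale=0.1, size=5))
        for _ in range(3)]

update_direction = server_aggregate(msgs)
lr = 0.01
weights = np.zeros(5)
weights -= lr * update_direction  # SGD-style step along the voted sign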

● A Novel Multimodal Framework for Early Detection of Congestive Heart Failure using Ensemble Learning based Fusion Approach: Proposed a multimodal ensemble learning approach for CHF classification using ECG data. The following machine learning models were evaluated: XGBoost, Support Vector Machine (SVM), Random Forest (RF), and Tiny UNet. The models were combined using a soft voting ensemble learning technique, which enabled each classifier to contribute class probabilities instead of making hard decisions, with the final prediction being a weighted sum of the predicted probabilities. The weights were allocated based on each classifier's performance, using the inverse of the classification error, achieving 98.52% accuracy. (A small sketch of this weighting follows.)
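A small NumPy sketch of the inverse-error weighted soft voting described above; the error values and probability rows are made up for illustration.

import numpy as np

# Per-classifier validation errors (illustrative, not the paper's numbers),
# e.g. XGBoost, SVM, RF, Tiny UNet.
errors = np.array([0.05, 0.08, 0.10, 0.12])
weights = (1.0 / errors) / np.sum(1.0 / errors)  # inverse-error, normalized

# Each classifier outputs class probabilities for one sample (CHF vs. healthy).
probs = np.array([
    [0.90, 0.10],
    [0.80, 0.20],
    [0.70, 0.30],
    [0.60, 0.40],
])

# Soft voting: weighted sum of probabilities, then argmax for the prediction.
fused = weights @ probs
prediction = int(np.argmax(fused))
print(fused, prediction)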

● Comparative Analysis of Regression Techniques for Accurate House Price Prediction Using Machine Learning: A Statistical Analysis Method: Conducted a comparative analysis of regression techniques, leveraging models like Linear Regression, SVR, Decision Tree, Gradient Boosting (GBR), Random Forest, and XGBoost. The dataset, with 81 features on house characteristics, was preprocessed for missing values, categorical encoding, and scaling. Models were evaluated using 3-fold cross-validation, with GBR achieving the highest R² score: 0.8545 before tuning and 0.8699 after optimizing hyperparameters like learning rate and tree depth.

House Price Prediction

The purpose of this project is to accurately predict house prices using advanced regression techniques, enabling better decision-making in real estate. Its real-life applications include assisting buyers, sellers, and real estate agents in determining fair market values, optimizing property investments, and improving pricing strategies for developers.



Project Introduction

"For this project, we conducted a comparative analysis of various regression techniques to predict house prices. This research is particularly relevant in the real estate industry, where accurate price predictions are crucial for informed decision-making. We evaluated multiple regression models including Support Vector Regression, Linear Regression, Decision Tree Regressor, Gradient Boosting Regressor, Random Forest Regressor, and XGBoost Regressor."

Methodology

"Our methodology involved four main phases: data collection, preprocessing, model training,
and performance analysis. We worked with a dataset containing 81 features related to house
attributes. During preprocessing, we handled missing values using different imputation
techniques based on data distribution - mode for categorical variables, mean for normally
distributed data, and median for skewed distributions. We encoded categorical variables using
label encoding for ordinal features and one-hot encoding for nominal features."
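A brief pandas/scikit-learn sketch of these imputation and encoding choices on a toy frame; the column names and which-distribution-gets-which-imputer assignments are placeholders standing in for the real 81 features.

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame standing in for the 81-feature dataset (columns are placeholders).
df = pd.DataFrame({
    "LotArea": [8450, None, 11250],    # skewed numeric -> median imputation
    "GarageCars": [2.0, 1.0, None],    # roughly normal numeric -> mean
    "MSZoning": ["RL", None, "RM"],    # categorical -> mode
    "ExterQual": ["Gd", "TA", "Ex"],   # ordinal -> label encoding
})

df["LotArea"] = df["LotArea"].fillna(df["LotArea"].median())
df["GarageCars"] = df["GarageCars"].fillna(df["GarageCars"].mean())
df["MSZoning"] = df["MSZoning"].fillna(df["MSZoning"].mode()[0])

# Ordinal feature: label encoding keeps a single integer column.
df["ExterQual"] = LabelEncoder().fit_transform(df["ExterQual"])
# Nominal feature: one-hot encoding avoids implying a false order.
df = pd.get_dummies(df, columns=["MSZoning"])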

Model Selection

"After thorough evaluation using 3-fold cross-validation and the mean R² score as our metric,
the Gradient Boosting Regressor emerged as the best-performing model. It achieved an initial
R² score of 0.854, indicating it could explain about 85.4% of the variance in house prices. We
then performed hyperparameter tuning using Randomized Search, which improved the R² score
to 0.87."
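A condensed sklearn sketch of this selection step on stand-in data; the synthetic regression dataset and the hyperparameter grid are placeholders, while the 3-fold CV, R² scoring, and randomized search mirror the process described above.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, cross_val_score

# Stand-in data in place of the real 81-feature housing dataset.
X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=42)

gbr = GradientBoostingRegressor(random_state=42)
baseline = cross_val_score(gbr, X, y, cv=3, scoring="r2").mean()

# Randomized Search over a few key hyperparameters (grid is illustrative).
search = RandomizedSearchCV(
    gbr,
    param_distributions={
        "learning_rate": [0.01, 0.05, 0.1],
        "max_depth": [2, 3, 4, 5],
        "n_estimators": [200, 400, 800],
    },
    n_iter=10,
    cv=3,
    scoring="r2",
    random_state=42,
)
search.fit(X, y)
print(f"baseline R2={baseline:.3f}, tuned R2={search.best_score_:.3f}",
      search.best_params_)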

Technical Skills Demonstrated

"This project showcased several key skills including data preprocessing, feature engineering,
model selection, hyperparameter tuning, and performance evaluation - all essential for effective
ML solution development."

Business Impact

"The practical application of this model would help homeowners, buyers, and real estate agents
make more informed decisions by providing accurate price predictions based on property
attributes."




