Hello, world! This is Jagadeesh.
Hey, I'm Jagadeesh. I work across VLSI design, robotics, and AI hardware. My work spans digital/analog design, circuit-level implementation, and low-level architecture optimization for ML workloads. I am currently learning open-source IC design. I also work with embedded systems, MCUs, and SBCs, applying them in robotics and prototyping. Beyond that, I take part in competitions, open-source projects, and hardware/software co-design challenges.
My Key Interests:
- VLSI Design: Analog Circuits & Digital Design
- Computer Architecture and Protocols
- Neural Network Hardware Acceleration
- Microcontrollers & Electronics
- Computer Vision, Sensor Fusion
- Robotics, AMR Path Planning & Navigation
Feel free to check out my projects here, and if you’re interested in collaborating or discussing hardware, AI, or robotics, let’s connect!
$ ls ~/projects --filter=feat | All Projects
Hardware Accelerators for Image Processing and Neural Network Inference | Link
" I tried to ImProVe, but NeVer really did — so I MOVe-d on ¯\_(ツ)_/¯ "
Current Project Overview
Duration: Individual, Ongoing
Tools: Verilog (Icarus Verilog, Yosys) | Python (TensorFlow, PyTorch, OpenCV, NumPy, Tkinter) | Scripting (TCL, Perl)
Project Link
Verilog | Basic Architecture | Digital Electronics
- Designed a shallow residual-style CNN for CIFAR-10, achieving ~84% accuracy (< 1% loss) with a 52 KB model, only about 17× the size of a single 3 KB input image. Applied post-training quantization variants, including Q1.7 (8-bit signed fixed point), to balance accuracy, model size, and inference efficiency.
- Implemented synthesizable, testbench-verified Verilog modules with FSM-based control, a 2-cycle handshake, and auto-generated ROMs (14 for weights/biases and 3 for the RGB input channels). Intermediate values are stored in registers and computed using systolic-array-based MAC units.
- Explored key image-processing techniques including edge detection, noise reduction, filtering, and enhancement. Implemented (E)MNIST classification using an MLP, achieving >75% accuracy. Automated the inference flow with TCL/Python scripts and manual GUI inputs.
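The Q1.7 format mentioned above can be illustrated with a minimal Python sketch. It assumes round-to-nearest with saturation; the function names `quantize_q1_7` and `dequantize_q1_7` are hypothetical and the project's actual quantization scripts may differ.

```python
FRAC_BITS = 7               # Q1.7: 1 sign bit + 7 fractional bits
SCALE = 1 << FRAC_BITS      # 128 codes per unit
Q_MIN, Q_MAX = -128, 127    # signed 8-bit range


def quantize_q1_7(x):
    """Map a float to a signed 8-bit Q1.7 code (assumed: round, then saturate)."""
    code = int(round(x * SCALE))
    return max(Q_MIN, min(Q_MAX, code))


def dequantize_q1_7(code):
    """Recover the real value represented by a Q1.7 code."""
    return code / SCALE


if __name__ == "__main__":
    for w in (0.5, -0.25, 0.999, -1.0, 1.5):
        q = quantize_q1_7(w)
        print(f"{w:+.3f} -> code {q:+4d} -> {dequantize_q1_7(q):+.7f}")
```

Q1.7 covers the range [-1, 127/128] in steps of 1/128, so values outside that range saturate, which is why weights are usually scaled into it before quantization.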
Digital Logic Design | Synthesis
- Compared six 8-bit adder and multiplier designs for systolic-array MACs using PPA metrics (latency / throughput / area, sky130 130 nm PDK) and analyzed the trade-offs.
- The final design uses a Carry-Save Adder (CSA) and a Modified Booth Encoder (MBE) multiplier for 3×3 convolution and GEMM operations with 3-stage pipelined systolic arrays, verified for 0 / same padding modes.
- Pipeline Stages: sampling image → truncating & flipping → MAC accumulation.
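To make the systolic-array GEMM idea concrete, here is a small cycle-level Python model. It is only a sketch: it assumes an output-stationary dataflow (each processing element keeps its own accumulator) and does not model the 3-stage MAC pipeline, the CSA/MBE arithmetic, or 8-bit truncation. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing A @ B.

    Rows of A enter from the left and columns of B from the top, each
    skewed by one cycle per row/column, so PE(i, j) sees A[i][k] and
    B[k][j] in the same cycle t = i + j + k.
    """
    n, k_dim, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]      # per-PE accumulators
    a_reg = [[0] * m for _ in range(n)]    # operands travelling right
    b_reg = [[0] * m for _ in range(n)]    # operands travelling down

    for t in range(n + m + k_dim - 2):
        # Walk from the far corner so each PE reads its neighbour's
        # value from the previous cycle before it is overwritten.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    k = t - i
                    a_in = A[i][k] if 0 <= k < k_dim else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    k = t - j
                    b_in = B[k][j] if 0 <= k < k_dim else 0
                else:
                    b_in = b_reg[i - 1][j]
                a_reg[i][j], b_reg[i][j] = a_in, b_in
                acc[i][j] += a_in * b_in   # the MAC operation
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

With this dataflow an n×m array finishes a product with inner dimension k in n + m + k - 2 cycles, which is where the parallel speed-up over a single MAC comes from.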
ViSiON – Verilog for Image Processing and Simulation-based Inference Of Neural Networks
This repo includes all related projects as submodules in one place
ImProVe – IMage PROcessing using VErilog: A collection of image processing algorithms implemented in Verilog, including geometric transformations, color space conversions, and other foundational operations.
NeVer – NEural NEtwork on VERilog: A hardware-implemented MLP in Verilog for character recognition on (E)MNIST, alongside a lightweight CNN for CIFAR-10 image classification
MOVe – Math Ops in VErilog
- CORDIC Algorithm – Implements Coordinate Rotation Digital Computer (CORDIC) algorithms in Verilog for efficient hardware-based calculation of sine, cosine, tangent, square root, magnitude, and more.
- Systolic Array Matrix Multiplication – Verilog implementation of matrix multiplication using systolic arrays to enable parallel computation and hardware-level performance optimization. Each processing element leverages a Multiply-Accumulate (MAC) unit for core operations.
- Hardware Multiply-Accumulate Unit – Implements and compares 8-bit multipliers and 8-bit adders in synthesizable Verilog, analyzing their area, timing, and power characteristics in MAC datapath architectures.
- Posit Arithmetic (Python) – Currently using fixed-point arithmetic; considering Posit as an alternative to IEEE 754 for better precision and dynamic range. Still working through the trade-off.
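As an illustration of the CORDIC entry above, the following Python sketch shows rotation-mode CORDIC for sine and cosine. It uses floating point for readability, whereas a Verilog implementation would typically use fixed-point values, a small arctangent lookup table, and arithmetic shifts. The function name and the 16-iteration default are assumptions, not taken from the repository.

```python
import math


def cordic_sin_cos(theta, iterations=16):
    """Rotation-mode CORDIC; valid for theta in [-pi/2, pi/2].

    Returns (sin(theta), cos(theta)) using only add/subtract and
    multiplications by 2**-i, which become shifts in hardware.
    """
    # Precomputed micro-rotation angles atan(2^-i) (a ROM in hardware).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Accumulated gain of the micro-rotations, compensated up front.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward z == 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print(f"sin ~ {s:.5f}, cos ~ {c:.5f}")
```

Each iteration adds roughly one bit of precision, so the iteration count trades latency against accuracy.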
Storage and Buffer Modules
- RAM1KB – A 1KB (1024 x 8-bit) memory module in Verilog with write-once locking for even addresses. Includes a randomized testbench. Also forms the base for a ROM3KB variant to store 32×32 RGB CIFAR-10 image data.
- FIFO Buffer – Not started. Planned as a synchronous FIFO with fixed depth, single clock domain, and standard full/empty flag logic.
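A behavioral Python model can clarify what write-once locking might look like for RAM1KB. This sketch assumes that each even address accepts its first write and then ignores later writes, while odd addresses stay freely writable. The class name `Ram1KB` and its methods are hypothetical, and the real module's locking rules may differ.

```python
class Ram1KB:
    """Behavioral model of a 1024 x 8-bit RAM (assumed locking rules)."""

    DEPTH = 1024

    def __init__(self):
        self.mem = [0] * self.DEPTH
        self.locked = [False] * self.DEPTH

    def write(self, addr, data):
        """Write one byte; returns False if the address is locked."""
        if not 0 <= addr < self.DEPTH:
            raise IndexError(f"address {addr} out of range")
        if self.locked[addr]:
            return False                  # write ignored
        self.mem[addr] = data & 0xFF      # keep 8 bits
        if addr % 2 == 0:
            self.locked[addr] = True      # even address: lock after first write
        return True

    def read(self, addr):
        return self.mem[addr]


if __name__ == "__main__":
    ram = Ram1KB()
    print(ram.write(4, 0xAA), ram.write(4, 0x55), hex(ram.read(4)))
```

A 32×32 RGB image needs 32 × 32 × 3 = 3072 bytes, which matches the 3 KB size of the ROM3KB variant.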
ANAV for Martian Surface Exploration (ISRO IRoC‑U 2025) | Link
An autonomous aerial system designed for reliable navigation and landing in environments without GPS, using onboard visual-inertial mapping, real-time obstacle awareness, and wireless telemetry.
Duration: Team-based (ISRO RIG), Ongoing
Tools: Jetson Nano | Pixhawk | RealSense D435i | ESP32 (ESP‑Now) | VINS‑Fusion | ROS2
- Built a <2 kg autonomous quadrotor for GNSS-denied environments, capable of real-time mapping, navigation, and safe-zone detection with zero manual intervention; Jetson Nano was used for onboard compute and Pixhawk handled flight control.
- Calibrated ESCs and implemented embedded power distribution via BEC module to ensure stable regulation for compute/sensing; integrated barometer and external optical flow sensor with Pixhawk for redundancy in low-texture or drifting conditions.
- Fused stereo-IMU data from Intel RealSense D435i using VINS-Fusion on ROS2, achieving <5 cm drift over ~5 m; transmitted real-time telemetry using ESP32 modules (ESP‑Now); autonomously landed on obstacle-free 1.5×1.5 m zones with <15° slopes.
RV32I RTL CPU DESIGN | Link
Duration: Individual, Ongoing
Tools: Verilog (Icarus Verilog) | TL-Verilog (Makerchip)
- Implemented a fully synthesizable RV32I RISC-V core in TL-Verilog with a single-stage pipeline, supporting all base integer instructions and immediate formats (I, S, B, U, J).
- Developed a test program summing integers 1 to 9, verified correct ALU operations, branching, and control flow within 50 simulation cycles, with pass/fail status stored in registers x30 and x31.
- Designed a 32-register file with dual-read and single-write ports, enforcing write-disable on register x0, and integrated instruction decode logic handling opcode, funct3, and funct7 fields.
- Implemented comprehensive ALU supporting arithmetic, logic, shifts, and comparisons, with immediate extraction and flexible program counter update logic including branch and jump target calculation.
- Enabled simulation and debugging via Makerchip integration using m4+cpu_viz(), with waveform visualization and automated test validation through register monitoring.
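The immediate extraction for the five RV32I formats can be sketched in Python as a reference model. The bit positions follow the RISC-V base ISA encoding; the helper names `sign_extend` and `decode_imm` are hypothetical and are not taken from the TL-Verilog source.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def decode_imm(instr, fmt):
    """Extract the immediate from a 32-bit RV32I instruction word."""
    if fmt == "I":   # imm[11:0] = instr[31:20]
        return sign_extend(instr >> 20, 12)
    if fmt == "S":   # imm[11:5] = instr[31:25], imm[4:0] = instr[11:7]
        imm = (((instr >> 25) & 0x7F) << 5) | ((instr >> 7) & 0x1F)
        return sign_extend(imm, 12)
    if fmt == "B":   # imm[12|10:5] = instr[31|30:25], imm[4:1|11] = instr[11:8|7]
        imm = ((((instr >> 31) & 1) << 12) | (((instr >> 7) & 1) << 11)
               | (((instr >> 25) & 0x3F) << 5) | (((instr >> 8) & 0xF) << 1))
        return sign_extend(imm, 13)
    if fmt == "U":   # imm[31:12] = instr[31:12], low 12 bits are zero
        return sign_extend(instr & 0xFFFFF000, 32)
    if fmt == "J":   # imm[20|10:1|11|19:12] = instr[31|30:21|20|19:12]
        imm = ((((instr >> 31) & 1) << 20) | (((instr >> 12) & 0xFF) << 12)
               | (((instr >> 20) & 1) << 11) | (((instr >> 21) & 0x3FF) << 1))
        return sign_extend(imm, 21)
    raise ValueError(f"unknown format {fmt!r}")


if __name__ == "__main__":
    print(decode_imm(0xFFF00093, "I"))   # addi x1, x0, -1
    print(decode_imm(0xFE000EE3, "B"))   # beq x0, x0, -4
```

A software model like this is handy for cross-checking decoder outputs in simulation. For the test program described above, the expected final sum of the integers 1 to 9 is 45.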
Peripheral Serial Communication Protocols [I2C/SPI/UART-TX] | Link
Designed and implemented serial communication protocols in Verilog:
- I2C: Single-master, multi-slave with clock stretching & configurable delays.
- SPI: Supports modes 0–3 via CPOL/CPHA, performs single 8-bit full-duplex transfers, and allows clock frequency scaling.
- UART TX Soft Core IP: Customizable baud rate, lightweight transmitter module for FPGA/ASIC integration.
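The SPI mode numbering and the single 8-bit full-duplex transfer described above can be modeled in a few lines of Python. This is a behavioral sketch only: the MSB-first bit order and the function names are assumptions, and clock timing is abstracted away.

```python
def spi_mode(cpol, cpha):
    """Standard SPI mode number: mode = (CPOL << 1) | CPHA."""
    return (cpol << 1) | cpha


def sample_edge(cpol, cpha):
    """Clock edge on which data is sampled for a given CPOL/CPHA.

    CPHA = 0 samples on the leading edge, CPHA = 1 on the trailing edge;
    the leading edge is rising when CPOL = 0 and falling when CPOL = 1.
    """
    return "rising" if cpol == cpha else "falling"


def spi_transfer(master_byte, slave_byte):
    """One 8-bit full-duplex exchange, MSB first (assumed bit order).

    Both sides shift one bit out and one bit in per clock, so after
    eight clocks the two shift registers have swapped contents.
    """
    m, s = master_byte & 0xFF, slave_byte & 0xFF
    for _ in range(8):
        mosi, miso = (m >> 7) & 1, (s >> 7) & 1
        m = ((m << 1) & 0xFF) | miso
        s = ((s << 1) & 0xFF) | mosi
    return m, s


if __name__ == "__main__":
    for cpol in (0, 1):
        for cpha in (0, 1):
            print(spi_mode(cpol, cpha), sample_edge(cpol, cpha))
    print([hex(v) for v in spi_transfer(0xA5, 0x3C)])
```

Modes 0 and 3 both sample on the rising edge and differ only in the idle clock level, which is why many devices support exactly those two.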
Device Modeling using Sentaurus TCAD | Link
Designed and simulated semiconductor structures (N-resistor, PN diode, NMOS) using Sentaurus TCAD; explored effects of doping, geometry, and physical models through process setup, simulation scripting, and visual analysis of internal device behavior
RU83C – Rubik’s Cube Solving Robot | Link
A vision-guided, algorithm-driven robot that solves the Rubik’s Cube using Kociemba’s two-phase algorithm to find short, near-optimal move sequences, developed in Unity3D with C# scripting.