Norman-Ou/README.md

Hi there 👋

I'm Ruizhe Ou

I explore how multi-modal large language models (MLLMs) can advance remote sensing tasks.

I work on text-to-image, image-to-image, and text-to-video generation, blending creativity with machine learning.

Things I've Been Discovering

GeoPix is a remote sensing MLLM that extends image understanding capabilities to the pixel level. It integrates a mask predictor into the MLLM, transforming visual features from the vision encoder into masks conditioned on the segmentation token embeddings generated by the LLM.
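
To illustrate what "masks conditioned on the segmentation token embeddings" can mean, here is a minimal pure-Python sketch. It assumes the simplest possible conditioning: a dot product between each pixel's feature vector and a single segmentation token embedding, followed by a sigmoid and a threshold. The function name `predict_mask`, the toy feature values, and the dot-product design are all assumptions for illustration; GeoPix's actual mask predictor may work differently.

```python
import math

def predict_mask(pixel_features, seg_embedding, threshold=0.5):
    """Toy mask head (illustrative assumption, not GeoPix's real predictor).

    pixel_features: H x W x C nested lists of per-pixel visual features.
    seg_embedding:  length-C embedding of the segmentation token.
    Returns an H x W binary mask.
    """
    mask = []
    for row in pixel_features:
        mask_row = []
        for feat in row:
            # Similarity between this pixel and the segmentation token.
            logit = sum(f * e for f, e in zip(feat, seg_embedding))
            prob = 1.0 / (1.0 + math.exp(-logit))
            mask_row.append(1 if prob > threshold else 0)
        mask.append(mask_row)
    return mask

# Hypothetical 2 x 2 feature map with 2 channels per pixel.
features = [
    [[2.0, 0.0], [0.0, 2.0]],
    [[1.5, 0.5], [-1.0, 0.0]],
]
seg_token = [1.0, -1.0]
mask = predict_mask(features, seg_token)
```

Pixels whose features align with the token embedding end up inside the mask, so a different token embedding yields a different mask over the same visual features.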

This work presents a method for generating disaster-affected remote sensing images by combining Stable Diffusion, BLIP, and GPT-4 with human-in-the-loop feedback. The pipeline starts with only 97 unlabelled 512×512 remote sensing images. BLIP first generates initial captions, which are then refined into enhanced prompts through expert feedback and GPT-based semantic rewriting. These enhanced prompts, paired with the original images, form a synthetic training set.
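
The data flow of this pipeline can be sketched as follows. This is only an outline under stated assumptions: `build_training_set` and the three stand-in functions are hypothetical names, and the stand-ins return fixed strings rather than calling BLIP, an expert reviewer, or a GPT model.

```python
def build_training_set(image_paths, caption_fn, feedback_fn, rewrite_fn):
    """Pair each image with a refined prompt.

    caption_fn, feedback_fn and rewrite_fn are placeholders for the
    captioning, expert-feedback and semantic-rewriting steps.
    """
    pairs = []
    for path in image_paths:
        draft = caption_fn(path)             # initial caption
        reviewed = feedback_fn(path, draft)  # caption after expert correction
        prompt = rewrite_fn(reviewed)        # enhanced prompt
        pairs.append({"image": path, "prompt": prompt})
    return pairs

# Stand-ins so the sketch runs without any model (purely illustrative).
def fake_caption(path):
    return "an aerial view of buildings"

def fake_feedback(path, caption):
    return caption + ", flooded streets"

def fake_rewrite(caption):
    return "Remote sensing image: " + caption + "."

training_set = build_training_set(
    ["img_000.png", "img_001.png"], fake_caption, fake_feedback, fake_rewrite
)
```

Passing the three steps in as functions keeps the pairing logic independent of whichever captioning or rewriting model is plugged in.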

Things I've Been Creating

Line Art to Anime

I developed a ControlNet model designed to transform line art into fully colored anime-style images. This model enables precise and high-quality generation by conditioning the diffusion process on clean line drawings, making it easier to create vibrant and consistent anime artwork from simple sketches.

Image Generation Pipeline for Linky

I am the main developer of the image generation pipeline for Linky, which supports anime-style, realistic, and film-style image stylization, pose editing, and face consistency modeling.

This pipeline covers:

  • Prompt cleaning and expansion (similar to a prompt helper)
  • Image style selection
  • Pose extraction and editing
  • Face consistency enhancement
  • Risk control evaluation
  • Compute resource scheduling strategies
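
One way to picture a staged pipeline like this is a chain of small functions applied to a request object. The sketch below is hypothetical throughout: the `Request` fields, the stage functions, the style names, the banned-word list, and the stage order are assumptions, and it covers only three of the stages listed above (pose, face consistency, and scheduling are omitted).

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    style: str = "anime"
    log: list = field(default_factory=list)
    blocked: bool = False

def clean_prompt(req):
    # Collapse repeated whitespace; a real cleaner would do much more.
    req.prompt = " ".join(req.prompt.split())
    req.log.append("clean_prompt")
    return req

def select_style(req):
    # Fall back to a default when the requested style is unknown.
    if req.style not in {"anime", "real", "film"}:
        req.style = "anime"
    req.log.append("select_style")
    return req

def risk_control(req):
    banned = {"forbidden"}  # hypothetical word list
    req.blocked = any(word in banned for word in req.prompt.lower().split())
    req.log.append("risk_control")
    return req

STAGES = [clean_prompt, select_style, risk_control]

def run_pipeline(req, stages=STAGES):
    for stage in stages:
        req = stage(req)
        if req.blocked:
            break  # stop as soon as a stage rejects the request
    return req

result = run_pipeline(Request(prompt="  a girl   reading a book ", style="watercolor"))
```

Keeping each stage as a separate function makes it straightforward to reorder stages or add new ones without touching the others.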

Popular repositories

  1. GeoPix

    [GRSM] Project Page for "GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing"

    Python, 55 stars, 5 forks

  2. InstantID-with-FouriScale

    Combines InstantID🔥 and FouriScale to generate high-resolution images!

    Python, 11 stars, 1 fork

  3. Deformable-DETR-Torch2.x-cuda12

    🔧 A minimal, PyTorch 2.x and CUDA 12 compatible reimplementation of the Multi-Scale Deformable Attention CUDA ops from Deformable DETR. Drop-in replacement for the original ms_deform_attn modules.

    Cuda, 8 stars

  4. BUPT-QM-InnovationTutor_Group61

    CSS, 2 stars, 1 fork

  5. EBU6304-2022-Software-Engineering-Group-8

    Java, 2 stars

  6. Norman-Ou

    Config files for my GitHub profile.
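
For readers unfamiliar with the Multi-Scale Deformable Attention op mentioned in the Deformable-DETR-Torch2.x-cuda12 entry, here is a small pure-Python reference of its core idea for one query, one head, and one scalar channel: sample each feature level at a few learned locations with bilinear interpolation, then take the attention-weighted sum. This is not the repository's CUDA code. The function names and toy inputs are hypothetical, and the sampling convention (normalized coordinates, pixel centers at half-integers, zeros outside the map) is an assumption chosen to mirror the usual PyTorch reference behavior.

```python
import math

def bilinear_sample(fmap, x, y):
    """Sample a 2D map at normalized (x, y) in [0, 1], with zeros outside."""
    h, w = len(fmap), len(fmap[0])
    # Map normalized coordinates to pixel space (pixel centers at +0.5).
    px, py = x * w - 0.5, y * h - 0.5
    x0, y0 = math.floor(px), math.floor(py)
    total = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                weight = (1 - abs(px - xi)) * (1 - abs(py - yi))
                total += weight * fmap[yi][xi]
    return total

def ms_deform_attn_query(levels, locations, weights):
    """Weighted sum of bilinear samples across feature levels.

    levels:    list of 2D feature maps (nested lists), one per scale.
    locations: per level, a list of normalized (x, y) sampling points.
    weights:   per level, the matching attention weights.
    """
    out = 0.0
    for fmap, locs, wts in zip(levels, locations, weights):
        for (x, y), a in zip(locs, wts):
            out += a * bilinear_sample(fmap, x, y)
    return out

level0 = [[1.0, 2.0], [3.0, 4.0]]
level1 = [[10.0]]
out = ms_deform_attn_query(
    [level0, level1],
    [[(0.25, 0.25), (0.5, 0.5)], [(0.5, 0.5)]],
    [[0.25, 0.25], [0.5]],
)
```

In this toy case the three samples are 1.0, 2.5, and 10.0, so the weighted sum is 5.875. A real implementation does the same computation in parallel over all queries, heads, and channels, which is why it is usually written as a custom CUDA kernel.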