I'm a data engineer and environmental data enthusiast passionate about empowering scientists with tools to acquire and analyze data. My largest side projects focus on:
- Remote sensing data analysis and publishing related research
- Using infrastructure-as-code tools for the orchestration and containerization of workflows (MLOps)
- Webscraping and archiving
📫 Questions? Connect with me at:
LinkedIn • [email protected]
| Section | Technologies | Project |
|---|---|---|
| Infrastructure-as-Code | Ansible, Terraform, Bash | ml_ops_tree_learn: Object detection MLOps |
| | Prometheus, Grafana, NodeJS, Docker | ArchiveTeam IaC: Distributed compute observability stack |
| Geospatial & Remote Sensing | Open3D, PyTorch, OpenCV, Rasterio | pyQSM: Image processing and spatial algorithms |
| | NumPy, Matplotlib, GeoPandas, GDAL | canopyHydrodynamics: Simulating water movement within tree canopies |
| Data Engineering / DevOps | DLT, DuckDB, Web Scraping, Streamlit | LinkedInScraper: Automated data acquisition |
| | GitOps, Pandoc, PyPI | canopyHydrodynamics: Robust GitOps CI/CD workflows |
> [!NOTE]
> Detailed project descriptions are available via the dropdowns below.
🌳 canopyHydrodynamics
GitOps, NumPy, Matplotlib, GeoPandas, GDAL, Pandoc, PyPI
Simulating water movement within tree canopies under varied meteorological conditions.
Identifies key structural traits:
- Stemflow- and throughfall-generating areas of the canopy
- The 'drip points' to which throughfall is directed, complete with their relative volumes
- 'Divides' and 'confluences' that dictate how water flows through the canopy
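As a rough illustration of the idea (a hypothetical sketch, not the canopyHydrodynamics API), throughfall can be routed across a small canopy-surface elevation grid: each cell drains to its lowest neighbor, cells with no lower neighbor act as drip points, and the water routed into each one gives its relative volume.

```python
# Hypothetical sketch (not the canopyHydrodynamics API): route one unit of
# rain per grid cell downhill to its lowest neighbor. Local minima are
# 'drip points'; the volume accumulated there is each point's relative share.

def find_drip_points(elev):
    rows, cols = len(elev), len(elev[0])
    volume = [[1.0] * cols for _ in range(rows)]  # one unit of rain per cell

    def lowest_neighbor(r, c):
        best = None
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                    if elev[rr][cc] < elev[r][c] and (
                        best is None or elev[rr][cc] < elev[best[0]][best[1]]
                    ):
                        best = (rr, cc)
        return best

    # Visit cells from highest to lowest so upstream volume settles first
    order = sorted(((r, c) for r in range(rows) for c in range(cols)),
                   key=lambda rc: elev[rc[0]][rc[1]], reverse=True)
    drips = {}
    for r, c in order:
        nb = lowest_neighbor(r, c)
        if nb is None:
            drips[(r, c)] = volume[r][c]  # local minimum: water drips here
        else:
            volume[nb[0]][nb[1]] += volume[r][c]
    total = sum(drips.values())
    return {pt: v / total for pt, v in drips.items()}  # relative volumes
```

A bowl-shaped 3x3 canopy patch, for example, funnels all nine cells' water to the single central drip point.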
Leverages GitOps for robust CI/CD capabilities:
- Automated linting and testing for all changes
- Dynamically created version-upgrade branches
- Auto-generated method documentation
- Automated versioned deployment for release branches
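A minimal sketch of what such a workflow can look like as a GitHub Actions config (hypothetical file, job, and branch names, not the project's actual workflow):

```yaml
# .github/workflows/ci.yml -- hypothetical sketch, not the project's workflow
name: CI
on:
  push:
    branches: [main, "release/**"]
  pull_request:

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install ruff pytest
      - run: ruff check .          # automated linting for all changes
      - run: pytest                # automated testing for all changes

  deploy:
    # versioned deployment runs only for release branches
    if: startsWith(github.ref, 'refs/heads/release/')
    needs: lint-and-test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pipx run build        # build sdist/wheel for a PyPI release
```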
🗄️ ArchiveTeam IaC
Prometheus, Grafana, NodeJS, Docker, Bash
Infrastructure-as-code to provision and configure a multi-server, multi-container cluster with a modern observability stack, used for the community archiving project ArchiveTeam.
Consists of:
- Docker containerization monitored by CloudWatch
- Prometheus for node management/aggregation
- Grafana dashboards for visualization
- A custom Node.js metrics server for exporting telemetry
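The project's metrics server is Node.js; as a hedged illustration of the same idea, here is a stdlib-Python sketch that exposes telemetry in Prometheus' text exposition format at `/metrics`, where a Prometheus scraper can collect it (the counter name is a made-up example):

```python
# Hypothetical sketch of a tiny custom metrics exporter: serve counters in
# Prometheus' plain-text exposition format so a scraper can poll /metrics.
from http.server import BaseHTTPRequestHandler, HTTPServer

COUNTERS = {"items_archived_total": 0}  # hypothetical telemetry counter

def render_metrics():
    lines = []
    for name, value in COUNTERS.items():
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep request logging quiet
        pass

def serve(port=0):
    """Create the server; port=0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), MetricsHandler)
```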
🌲 pyQSM
SciPy, Open3D, OpenCV, Rasterio
Image processing and spatial algorithms to clean and segment trees and their components within terrestrial LiDAR point clouds.
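One common cleaning step for LiDAR point clouds is statistical outlier removal; the sketch below (a hypothetical pure-Python illustration, not pyQSM's API) drops points whose mean distance to their k nearest neighbors sits far above the cloud-wide average:

```python
# Hypothetical sketch (not pyQSM's API) of statistical outlier removal:
# for each point, take the mean distance to its k nearest neighbors, then
# drop points whose mean distance exceeds mean + std_ratio * stddev.
import math
import statistics

def remove_statistical_outliers(points, k=3, std_ratio=2.0):
    mean_dists = []
    for p in points:
        # brute-force k nearest neighbors; fine for a small illustration
        nn = sorted(math.dist(p, q) for q in points if q is not p)[:k]
        mean_dists.append(sum(nn) / len(nn))

    mu = statistics.mean(mean_dists)
    sigma = statistics.pstdev(mean_dists)
    cutoff = mu + std_ratio * sigma
    return [p for p, d in zip(points, mean_dists) if d <= cutoff]
```

Real implementations (e.g. in Open3D) use spatial indexing instead of the O(n²) neighbor search shown here.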
🌱 ml_ops_tree_learn
Laspy, Terraform, PyTorch, Open3D
An MLOps pipeline for the configuration and deployment of a convolutional neural network on GPU-enabled, cloud-hosted clusters.
Automates the provisioning of DigitalOcean GPU droplets so users can leverage CUDA-friendly compute. Designed as a 'one-click' solution enabling researchers without specialized hardware to process LiDAR data at minimal cost.

🕸️ linkedInScraper
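Droplet provisioning with Terraform can be sketched roughly as follows (a hypothetical fragment, not the project's actual config; the name and GPU size slug are placeholders):

```hcl
# Hypothetical sketch, not the project's actual Terraform. Provisions a
# DigitalOcean droplet via the official provider; the GPU size slug is a
# placeholder -- check `doctl compute size list` for current slugs.
terraform {
  required_providers {
    digitalocean = {
      source = "digitalocean/digitalocean"
    }
  }
}

resource "digitalocean_droplet" "gpu_worker" {
  name   = "tree-learn-worker"     # hypothetical name
  region = "nyc2"
  image  = "ubuntu-22-04-x64"
  size   = "gpu-h100x1-80gb"       # placeholder GPU size slug
}
```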
DLT, DuckDB, Web Scraping, Streamlit
A DLT pipeline leveraging LinkedIn's 'hidden' Voyager API to retrieve job and company data.
- Built on DLT, which provides a UI for viewing pipeline status and exploring data
- Custom DLT source automatically handles REST requests, pagination, data extraction and relational DB storage
- Predefined endpoints/available datasets:
  - `get_companies`: scrape followed companies via GraphQL profile components
  - `get_job_urls`: fetch job cards per company
  - `get_descriptions`: fetch job descriptions and details
- Extensible, with additional resources configured via JSON
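The pagination handling such a source needs can be sketched like this (a stand-in illustration, not DLT or the project's code; `fetch_page` is any callable mapping `(start, count)` to a page dict shaped like `{"elements": [...], "paging": {"total": N}}`, a common offset-paginated response shape):

```python
# Hypothetical sketch of offset-based pagination: request pages of `count`
# items, yield their elements, and stop once the reported total is reached
# (or a page comes back empty, guarding against a bad `total`).

def paginate(fetch_page, count=25):
    """Yield every element from an offset-paginated endpoint."""
    start = 0
    while True:
        page = fetch_page(start=start, count=count)
        elements = page.get("elements", [])
        yield from elements
        start += count
        if start >= page.get("paging", {}).get("total", 0) or not elements:
            break
```

In a real DLT source, a generator like this would back a resource so that extraction, normalization, and loading into the relational store are handled by the pipeline.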