Data processing Articles

6/18/2025 • EN

This AI Agent Should Have Been a SQL Query

Explores building AI agents as streaming SQL queries using platforms like Apache Flink for improved consistency, scalability, and developer experience.

AI Agents, Apache Flink, data processing, Microservices, Streaming SQL

Gunnar Morling

3/26/2025 • EN

Building a Personal Content Recommendation System, Part Two: Data Processing and Cleaning

Part two of building a personal recommendation system, covering data collection from Pocket and content extraction using the Jina Reader API.

Data Cleaning, data processing, GitHub, recommendation systems, web scraping

Saeed Esmaili

3/16/2025 • EN

Building a Personal Content Recommendation System, Part One: Introduction

A developer documents the first steps in building a personalized content recommendation system using saved articles, text embeddings, and algorithms.

API Integration, Content Embeddings, data processing, Personalization, recommendation systems

Saeed Esmaili

12/19/2024 • EN

Quickly Filter and Aggregate Python Lists

Introduces the 'leopards' Python library for filtering and aggregating lists, offering a lightweight alternative to pandas for basic data operations.

Data Filtering, data processing, Leopards Library, List Aggregation, Python

Saeed Esmaili

10/1/2022 • EN

Chat log exhibits from Twitter v. Musk case

A cleaned-up, de-interleaved transcript of text message exhibits from the Twitter v. Elon Musk lawsuit, presented for clarity.

chat logs, cybersecurity, data processing, legal discovery, OCR

Dan Luu

11/14/2019 • EN

Python Tears Through Mass Spectrometry Data

A talk on using Python to efficiently process and analyze large datasets from mass spectrometry, presented at a Python Frederick event.

data processing, Mass Spectrometry, Python

Matt Layman

10/13/2018 • EN

Approximate Distinct Count

Explains the APPROX_COUNT_DISTINCT function for faster, memory-efficient distinct counts in SQL, comparing it to exact COUNT(DISTINCT).

algorithm, Approximate Distinct Count, Big Data, data processing, HyperLogLog

Niko Neugebauer

12/7/2013 • EN

Lean, Mean Data Science Machine

A guide to using the Unix command-line for efficient data science workflows, including data processing, exploration, and modeling.

data processing, Data Science, Exploratory Data Analysis, REPL, Unix Command Line

Jeroen Janssens

11/3/2013 • EN

SQLite

A guide to using SQLite and Python's sqlite3 module to efficiently manage and query large datasets from text files.

data processing, database, performance, Python, SQLite

Sebastian Raschka

9/19/2013 • EN

7 Command-Line Tools for Data Science

A guide to seven essential command-line tools (jq, csvkit, Rio, etc.) for data scientists to obtain, scrub, explore, and model data.

API, Command Line Tools, data processing, Data Science, JSON

Jeroen Janssens
