I'll try to update this on Tuesday so I can join in the fun, but more than likely, I'll be updating it on Wednesday or Thursday because life, ya know? All data come from Posit's Tidy Tuesday repo.
-
April 15, 2025: Palmer Penguins. A classic from R that I explored in Python. The Bambi documentation is the real MVP of my script for this week. I used the data as a chance to show why it's generally better to fit a multi-level model instead of a basic linear model when you know there is variation in your data by some grouping. Code, viz
-
April 22, 2025: 420 and auto fatalities. Some basic exploratory work made me realize that the 420 piece of this was not all that interesting, but that thinking about how to treat count data, especially when the data set at hand is not particularly wide, is kind of neat. I haven't run many Poisson regressions, so that was the task I gave myself this week. Code, Viz
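For anyone else who hasn't run many Poisson regressions, here is a minimal sketch of what the model is doing, in plain Python with no libraries. It is not the script linked above: the counts are made up, the `fit_poisson` helper is hypothetical, and it handles only a single predictor. A real analysis would more likely lean on a modeling package.

```python
import math

def fit_poisson(x, y, iters=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson (one predictor only)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]  # expected counts
        # Gradient of the Poisson log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2)
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up daily counts: x = 0 for an ordinary day, x = 1 for the day of interest
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [4, 6, 5, 5, 7, 9, 8, 8]

b0, b1 = fit_poisson(x, y)
baseline_rate = math.exp(b0)  # expected count when x = 0
rate_ratio = math.exp(b1)     # multiplicative change in the count when x = 1
print(f"baseline rate: {baseline_rate:.2f}, rate ratio: {rate_ratio:.2f}")
```

The piece worth remembering is that coefficients live on the log scale, so exponentiating the slope gives a rate ratio rather than an additive difference.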
-
April 25, 2025: UseR conference sessions. I like streamgraphs. I never use streamgraphs. Let's try streamgraphs! This week, I took a look at the density of conference sessions by timeslot. I don't totally think this works, but it looks kind of neat and it was helpful for looking under the hood of the geom_stream function to see how to tinker with the output a bit. Code, Viz.
-
May 6, 2025: Cancelled National Science Foundation grants. This was a tough one! I made a Shiny dashboard that uses the very neat querychat package, which connects the OpenAI API to Shiny to filter data. The really amazing thing to me is that it basically calls DuckDB via natural language so that you can essentially run SQL without knowing SQL, but it gives you the code you would need to filter the data so that you can actually learn how to run queries. Code, Dashboard.
-
May 13, 2025: Mt. Vesuvius induced earthquakes. I really struggled to not turn this into a big project even though I don't know anything about earthquakes. Just having a nice set of geocoords and some interesting patterns set me down a modeling path that took a bit to pull myself out of. It seems obvious that the closer you are to the volcano, the larger the magnitude you'd feel, but a little descriptive work shows that's not necessarily the case and also that quakes are being felt farther away. Code, Viz.
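Checking the distance question starts with turning each pair of coordinates into a distance from the volcano. Here is a rough sketch using the haversine formula. The crater coordinates are approximate, the three events are invented for illustration, and `haversine_km` is a hypothetical helper rather than anything from the linked script.

```python
import math

VESUVIUS = (40.821, 14.426)  # approximate crater latitude/longitude (assumption)

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against tiny floating-point overshoot above 1
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))

# Invented events: (latitude, longitude, magnitude)
events = [(40.822, 14.428, 0.4), (40.815, 14.440, 1.1), (40.800, 14.400, 0.7)]

for lat, lon, mag in events:
    dist = haversine_km(VESUVIUS[0], VESUVIUS[1], lat, lon)
    print(f"magnitude {mag:.1f} at {dist:.2f} km from the crater")
```

With a distance column in hand, a scatter of distance against magnitude is enough to see whether "closer means stronger" actually holds.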
-
May 27, 2025: Dungeons and Dragons. This started as a way to explore Polars, so I spent most of my time in the Polars docs rather than thinking about what to analyze or viz. That said, I decided to use a kitchen-sink approach to compare model types. Overall, Polars is fine. It does some stuff better, or at least more simply, than Pandas, but it seems less like a close approximation of dplyr in R and more like a different approach from Pandas: not better, just different. Code, Viz
-
June 3, 2025: This one was perhaps too interesting and I had too little time to truly dive into the data, but I'm making a note to myself here to revisit the Project Gutenberg data, which is ripe for LLM work. I wanted to check out Bézier curves, which can make cool visuals (ggforce does the heavy lifting here), and life-death cycles seemed like an easy way to do that. The key thing here is that you have to do a bit of processing to get the midpoint for each curve or you'll end up with straight lines. I also had to tinker with figuring out how to collapse and merge the data functionally with itself to get the right structure. There's probably a better way to do this, so input is welcome! Code, Viz
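To make the midpoint step concrete, here is a small Python sketch of the geometry only. It is not the ggforce code linked above: `arc_points` is a hypothetical helper, the years are invented, and it assumes a quadratic Bézier curve whose control point sits halfway between the two endpoints and is lifted off the baseline.

```python
def arc_points(start, end, height=1.0, n=50):
    """Points on a quadratic Bezier curve from (start, 0) to (end, 0).

    The control point sits at the x-midpoint, raised to `height`.
    With height=0 all three points are collinear and the "curve" is a straight line.
    """
    mid = (start + end) / 2
    p0, p1, p2 = (start, 0.0), (mid, height), (end, 0.0)
    points = []
    for i in range(n + 1):
        t = i / n
        w0, w1, w2 = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2  # Bernstein weights
        x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0]
        y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1]
        points.append((x, y))
    return points

# Invented life span: born 1800, died 1880
arc = arc_points(1800, 1880, height=10.0)
flat = arc_points(1800, 1880, height=0.0)

print(arc[0], arc[len(arc) // 2], arc[-1])
print(max(y for _, y in flat))  # stays on the baseline when the midpoint is not lifted
```

The curve peaks at half the control point's height, so the lift needs to be about twice as tall as the arc you want to see.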