A new package index for Python
The Python Package Index (PyPI) is the principal repository of libraries for the Python programming language, serving more than 170 million downloads each week. Fifteen years after PyPI launched, a new edition is in beta at pypi.org, with features like better search, a refreshed layout, and Markdown README files (and with some old features removed, like viewing GPG package signatures). Starting April 16, users visiting the site or running pip install will be seamlessly redirected to the new site. Two weeks after that, the legacy site is expected to be shut down and the team will turn toward new features; in the meantime, it is worth a look at what the new PyPI brings to the table.
Growing needs and a restart
In the early 2000s, several Python developers wrote and ran their own tools cataloging and linking to available Python packages. In 2002, Richard Jones successfully proposed PEP 301 to create an official index, meant to run on a single server and to link to Python packages hosted elsewhere. Jones, and Martin von Löwis who joined him as a core maintainer soon after, started, administered, and improved the site, all before the advent of Django, Flask, Pyramid, and other Python web frameworks.
Jones, von Löwis, and (starting in the 2010s) Donald Stufft were volunteers — as with Wikipedia, "the Cheese Shop" (as it was named by Barry Warsaw) became popular before it got consistent upkeep from paid staffers, and demands on PyPI's infrastructure grew steadily. For packagers' convenience and to improve the experience of end users, the maintainers started allowing packagers to upload files onto a central server via ssh; the PyPI application assumed those files lived on its local filesystem.
For security, performance, and user experience reasons, in 2015 PyPI stopped indexing projects whose files were hosted externally. As PEP 470 ("Removing External Hosting Support on PyPI") stated, often "end users want to use PyPI as a repository but the author wants to use PyPI solely as an index". Meanwhile, PyPI's file-hosting needs grew to over a terabyte. Volunteer developers and system administrators battled outages, malicious packages, and spam attacks, while the age of the code base and its structure made it hard to maintain, and Sisyphean to develop new features. Generous infrastructure donations helped; for instance, Fastly's donation of Content Delivery Network (CDN) service in 2013 improved performance substantially.
A slow takeover of functionality
Stufft worked on packaging and distribution projects for several years. He did this mostly as a volunteer, though he now works for Amazon Web Services and spends two paid days per week on PyPI, pip, and related tools. He started a replacement PyPI effort, Crate, in 2011. A few years later, he changed tack and began work on what turned into Warehouse, which proved to be a solid foundation for PyPI 2.0. Warehouse is a web application using the Pyramid web framework, with 100% test coverage of its Python code, and a Docker-based development environment to make it easier to hack on locally.
Volunteer contributors, such as developer Dustin Ingram, joined the project. Designer and front-end developer Nicole Harris volunteered, assessing legacy PyPI, articulating design goals, and starting an overhaul of the user interface. Ernest W. Durbin III worked steadily in development and operations as a volunteer, improving the infrastructure behind Warehouse's pre-production installations, first at preview-pypi.python.org, then warehouse.python.org, then pypi.io, and, since late 2016, pypi.org.
Given its years of live testing, calling pypi.org a "beta" belies its longevity and durability. (Stufft's original migration plan predicted Warehouse would gradually come to "own" various database tables "as time goes on" but didn't predict it would take quite this long.) Warehouse always had read access to the canonical PyPI database; this was easier than creating a mirror database, and enforced discipline for Warehouse developers. Legacy PyPI allowed packagers to upload releases via command-line tools like setuptools or through an in-browser interface. However, its uploading routines increasingly failed to fully record new releases (causing HTTP 500 internal server errors), which led to a ~10% error rate by June 2016. At that point, Stufft advised Python packagers that it was a better experience to upload releases to the canonical PyPI database via Warehouse, using the command-line tool Twine, than via pypi.python.org. Starting in July 2017, PyPI went so far as to disable uploading via the old site.
Throughout this period, Warehouse was labeled "pre-production" to acknowledge its missing features, layout changes, and occasional outages. Uploading (an API interaction) worked well, but the browser user interface still lacked significant features. Most notably, features such as email management, and significant project owner/maintainer administrative functionality such as release deletion, were available only on the legacy site.
Fresh code and momentum
In early 2016, maintainers of Python packaging and distribution tools were eager to see Warehouse development speed up so that it could replace legacy PyPI. I started speaking with the Python Software Foundation (PSF) Packaging Working Group about applying for Mozilla Open Source Support (MOSS) funding; Mozilla accepted the award proposal that was submitted in 2017. MOSS-funded work started in December 2017; I serve as Warehouse's project manager. Harris, Durbin, Ingram, Laura Hampton, and I have improved PyPI's code base and infrastructure toward the goal of redirecting traffic to the new site and shutting down the old one.
The group has also nurtured new contributors. Jones and Stufft found that legacy PyPI could not attract a group of volunteer contributors to reduce the workload on the core maintainers, mainly because newcomers found it nearly impossible to understand, or even locally deploy, the code base. Warehouse's frameworks, docs, developer environment setup, and configuration are superior, making onboarding new developers and deploying their work far easier. Just between February 20 and March 20 this year, Warehouse merged 127 pull requests by 20 distinct authors; it continues to attract new contributors, some of whom are entirely new to open source.
Changes, new features, and deprecations
The most obvious improvement in Warehouse is the browser interface. The new site looks, as longtime Python users finding the site often notice, like a site from the current decade. The colors have changed, it's mobile-responsive, and the layout reflects what Harris has learned from user testing. The new front-end is more accessible to people with visual and motor-control disabilities (with more work to come). In the legacy code base, it was difficult to change the interface because content and presentation were mixed together. In contrast, Warehouse uses model/view/controller conventions, and uses front-end frameworks and tools: Jinja2 for templates, Sass with SCSS to handle CSS, Stimulus for JavaScript, and gulp to process and prepare front-end files for serving.
Beyond just the new interface is new functionality. Warehouse provides a chronological release history for each project (example), an easy-to-read project activity journal for project maintainers (see screen shot below), user-visible Gravatars and email addresses for maintainers on project pages, and support for multiple project URLs (e.g., for a homepage and a repo) on a project's PyPI page. Previously, to put a project description on PyPI, maintainers had to submit documents formatted in reStructuredText. Warehouse supports Markdown README files, thanks to improved metadata handling that required improvements to many parts of the Python packaging toolchain.
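As a rough illustration of the Markdown support described above, a project's setup script can declare the format of its description. The sketch below assumes a setuptools recent enough to understand the `long_description_content_type` keyword; the project name, version, and README text are hypothetical.

```python
# Hypothetical README text; a real setup.py would read it from README.md.
README_MARKDOWN = "# example-project\n\nA *hypothetical* package used for illustration.\n"


def build_setup_kwargs(readme_text):
    """Return keyword arguments for setuptools.setup() with a Markdown description."""
    return {
        "name": "example-project",  # hypothetical project name
        "version": "0.1.0",         # hypothetical version
        "long_description": readme_text,
        # Declares the description format so the index can render it as
        # Markdown instead of assuming reStructuredText.
        "long_description_content_type": "text/markdown",
    }


setup_kwargs = build_setup_kwargs(README_MARKDOWN)
# In a real setup.py this would be followed by:
#     from setuptools import setup
#     setup(**setup_kwargs)
```

The declared content type travels with the rest of the release metadata, which is why the change touched several tools in the packaging toolchain rather than PyPI alone.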
The original PyPI drew upon SourceForge and Freshmeat.net software categories to create a list of standard "Trove classifiers". Packagers label their releases with these classifiers to describe their target platforms, Python versions, intended audience, and frameworks, and to suggest the project's maturity status. Warehouse uses Elasticsearch for faceted search. This lets users perform intersection searches and filter the project list by multiple classifiers, making packagers' classifier choices more useful (see screen shot below). Project maintainers also no longer need to register a project with a separate command before initially uploading it to PyPI.
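The idea of an intersection search over classifiers can be sketched in a few lines of plain Python. This is not how Warehouse or Elasticsearch implements it; the catalog, project names, and the `filter_by_classifiers` helper are hypothetical and exist only to show the filtering logic.

```python
# Hypothetical in-memory catalog: project name -> set of Trove classifiers.
CATALOG = {
    "alpha": {
        "Programming Language :: Python :: 3",
        "Framework :: Pyramid",
        "Development Status :: 5 - Production/Stable",
    },
    "beta": {
        "Programming Language :: Python :: 3",
        "Framework :: Django",
    },
    "gamma": {
        "Programming Language :: Python :: 2",
        "Framework :: Pyramid",
    },
}


def filter_by_classifiers(catalog, wanted):
    """Return sorted names of projects that carry every classifier in `wanted`."""
    wanted = set(wanted)
    # `wanted <= tags` is a subset test: all requested classifiers must be present.
    return sorted(name for name, tags in catalog.items() if wanted <= tags)


matches = filter_by_classifiers(
    CATALOG,
    ["Programming Language :: Python :: 3", "Framework :: Pyramid"],
)
print(matches)  # prints ['alpha']
```

Each additional classifier can only narrow the result, which is what makes careful classifier choices by packagers pay off for people searching.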
Overall, Warehouse has newer back-end infrastructure than legacy PyPI did, supporting new features and a more scalable site. Instead of assuming that it lives on a single server, Warehouse assumes that its PostgreSQL database, file storage, search, queueing (Redis), and other parts may live in different containers or on different machines. Durbin added configuration management and instrumented Warehouse to gather statistics to view using Datadog. In the course of his infrastructure work, he built cabotage, a new deployment infrastructure tool that securely manages secrets with end-to-end TLS and lets PyPI maintainers automate managing software and configuration changes.
In the interests of more sustainable long-term policies and to fight spam, PyPI has removed or deprecated several features already. For instance, one of the steps taken to handle a spam attack earlier this year was to require that users verify an email address in order to upload releases. And uploading new releases via the web interface instead of the API is no longer allowed, which simplifies PyPI's job.
In general, very little about a PyPI project can now be altered via the browser. Project maintainers used to be able to update release descriptions in the browser, but release metadata is now immutable: to change it, maintainers need to upload a new release. PyPI no longer allows HTTP access to APIs; it's now HTTPS-only. Also, in advance of PyPI's CDN (Fastly) turning off support for TLS 1.0 and 1.1 on June 30, Warehouse supports only TLS versions 1.2 and above.
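Users who are unsure whether their Python can reach a TLS 1.2-only server can inspect the `ssl` module. The following is a minimal sketch that uses only standard-library attributes; the `describe_tls_support` helper and the shape of its report are assumptions made for illustration, and it checks local capability only, without contacting PyPI.

```python
import ssl


def describe_tls_support():
    """Report the linked OpenSSL build and whether it can speak TLS 1.2."""
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        # OpenSSL added TLS 1.2 in release 1.0.1; older builds cannot negotiate it.
        "openssl_at_least_1_0_1": ssl.OPENSSL_VERSION_INFO >= (1, 0, 1),
        # HAS_TLSv1_2 exists on Python 3.7+; fall back to the protocol constant.
        "has_tls_1_2": bool(
            getattr(ssl, "HAS_TLSv1_2", hasattr(ssl, "PROTOCOL_TLSv1_2"))
        ),
    }


report = describe_tls_support()
print(report)
```

If `has_tls_1_2` is false, the interpreter is linked against a TLS library too old to talk to the new site, and upgrading Python or its OpenSSL is the likely remedy.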
Download counts are no longer visible in PyPI's API; instead, PyPI advises curious statisticians to use the data set it uploads to Google BigQuery. As the open-source service Libraries.io improves its PyPI dependency analysis and metrics coverage, PyPI is increasingly directing users there, instead of providing such services itself. Similarly, getting out of the documentation-hosting game and deferring to Read the Docs, PyPI no longer allows package maintainers to upload docs to pythonhosted.org. In addition, as legacy PyPI shuts down, users will also lose the ability to log in with OpenID and Google auth.
Warehouse's signature handling demonstrates a shift in Python's thinking regarding key management and package signatures. Ideally, package users, software distributors, and package distribution tools would regularly use signatures to verify Python package integrity. For the most part, however, they don't, and there are major infrastructural barriers to them effectively doing so. Therefore, GPG/PGP signatures for packages are no longer visible in PyPI's web interface. Project maintainers can still attach signatures to their release uploads, and those signatures still appear in the Simple Project API as described in PEP 503. Stufft has made no secret of his opinion that "package signing is not the Holy Grail"; current discussion among packaging-tools developers leans toward removing signing features from another part of the Python packaging ecology (the wheel library) and working toward implementing The Update Framework instead. Relatedly, Warehouse, unlike legacy PyPI, does not provide an interface for users to manage GPG or SSH public keys.
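PEP 503 lets a repository mark each file link with a `data-gpg-sig` attribute whose value is `true` or `false`, and defines how project names are normalized in URLs. The sketch below reads that flag from a page using only the standard library; the sample HTML, file names, and the `SimpleIndexParser` class are hypothetical, and no request is made to PyPI.

```python
import re
from html.parser import HTMLParser


def normalize_name(name):
    """Normalize a project name the way PEP 503 specifies."""
    return re.sub(r"[-_.]+", "-", name).lower()


class SimpleIndexParser(HTMLParser):
    """Collect file links and their optional data-gpg-sig flag."""

    def __init__(self):
        super().__init__()
        self.files = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "href" in attributes:
            # The flag is "true" or "false" when present, None when omitted.
            self.files.append((attributes["href"], attributes.get("data-gpg-sig")))


# Hypothetical page body in the shape PEP 503 describes.
SAMPLE_PAGE = """
<html><body>
<a href="/files/example-1.0.tar.gz#sha256=abc" data-gpg-sig="true">example-1.0.tar.gz</a>
<a href="/files/example-1.1.tar.gz#sha256=def">example-1.1.tar.gz</a>
</body></html>
"""

parser = SimpleIndexParser()
parser.feed(SAMPLE_PAGE)
signed = [href for href, flag in parser.files if flag == "true"]
```

A client that still wants signatures can use a listing like `signed` to decide which files to fetch a detached signature for; per PEP 503, the signature lives alongside the file with `.asc` appended to its name.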
Thanks to redirects, most sites, services, and tools will probably be able to seamlessly switch to the new site when it launches on April 16. Migration guides for Python users, project maintainers, and API users are available. Currently the main snags seem to be the TLS 1.0/1.1 deprecation affecting users with old versions of OpenSSL (including users on some versions of Mac OS X) and the redirects affecting companies whose private internal package indexes include packaging clients that cannot follow an HTTP 302 redirect from pypi.python.org to pypi.org.
Future features
Shutting down legacy PyPI frees Warehouse to make major database schema changes that would have broken features in legacy PyPI and frees maintainers to concentrate on new improvements. As the MOSS award runs out, PSF's Packaging Working Group is seeking further funding to continue Warehouse work, particularly to audit and improve accessibility and application security. Volunteer Luke Sneeringer and others are discussing better authentication for release uploaders, including a bearer token authentication scheme involving Macaroons, and two-factor authentication. While Stufft is deferring to Ingram, Harris, and Durbin for day-to-day Warehouse leadership, he aims to eventually deprecate its XML-RPC API and architect new APIs, probably along RESTful lines. Warehouse developers will discuss and work on some of these tasks during sprints at PyCon in Cleveland, Ohio this May and at EuroPython in Edinburgh, Scotland in July.
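For readers unfamiliar with Macaroons, the core idea is a bearer token whose signature is an HMAC chain, so that the holder can add restrictions ("caveats") without knowing the server's secret. The following is a minimal sketch of that chaining under simplifying assumptions (first-party caveats only, plain-string caveats, SHA-256); it is not Warehouse's design, which was still under discussion, and every name in it is hypothetical.

```python
import hashlib
import hmac


def _chain(key, message):
    """One HMAC step: the previous signature keys the next one."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def mint(root_key, identifier, caveats=()):
    """Create a token; only the server knows root_key."""
    signature = _chain(root_key, identifier)
    for caveat in caveats:
        signature = _chain(signature, caveat)
    return {"identifier": identifier, "caveats": list(caveats), "signature": signature}


def attenuate(token, caveat):
    """Add a restriction; this needs the token but not the root key."""
    return {
        "identifier": token["identifier"],
        "caveats": token["caveats"] + [caveat],
        "signature": _chain(token["signature"], caveat),
    }


def verify(root_key, token, is_satisfied):
    """Recompute the chain, then require every caveat to hold."""
    expected = mint(root_key, token["identifier"], token["caveats"])["signature"]
    if not hmac.compare_digest(expected, token["signature"]):
        return False
    return all(is_satisfied(caveat) for caveat in token["caveats"])


ROOT_KEY = b"hypothetical-server-secret"
token = mint(ROOT_KEY, "uploader-42")
narrowed = attenuate(token, "project = example-project")


def allow(caveat):
    # Hypothetical request context: an upload to example-project.
    return caveat == "project = example-project"


print(verify(ROOT_KEY, narrowed, allow))  # prints True
```

Because each caveat is folded into the signature, a holder can narrow a token (for example, to a single project) but cannot strip a caveat off again without invalidating it.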
Beyond security, accessibility, and APIs, Warehouse contributors are interested in performing further systematic user testing and adding user-friendly features like group/organization support for related projects and, potentially, language localization. Warehouse will also need to make it easier to change project ownership: with the acceptance of PEP 541, a long-awaited policy on "Package Index Name Retention," PyPI administrators have a policy framework to address requests to take over maintainership and ownership of abandoned project names. PyPI administrators are finalizing the implementation details now, which will enable administrators to start resolving hundreds of backlogged requests. Rather than treat user support requests like bug reports, Warehouse developers plan to create or integrate with a proper user support ticket system.
The pace of further improvements will depend on whether Python packaging and distribution tools receive further financial support and on how volunteers' enthusiasm and investment grow or shift once the deadline urgency has passed. There is plenty to do even after the switch. The ongoing story of Python packaging will continue to evolve, and Warehouse, or something that eventually replaces it, will have to adapt.
[I would like to thank Ernest W. Durbin III, Nicole Harris, Dustin Ingram, and Donald Stufft for reviewing this article.]
| Index entries for this article | |
|---|---|
| GuestArticles | Harihareswara, Sumana |
| Python | Packaging |
