On pages and folios

By Jonathan Corbet
April 24, 2026
The kernel coverage here at LWN often touches on memory-management topics and, as a result, tends to talk a lot about both pages and folios. As the folio transition in the kernel has moved forward, it has often become difficult to decide which term to use in writing that is meant to be both approachable and technically correct. As this work continues, it will be increasingly common to use "folio" rather than "page". This article is intended to be a convenient reference for readers wanting to differentiate the two terms or understand the state of this transition.

Pages

Memory in all but the smallest of computing systems is divided into regularly sized units called "pages"; the most common page size is 4KB, but there are systems that run with larger page sizes. A page is the smallest unit that the system's memory-management hardware, including the memory-management unit and the translation lookaside buffer (TLB), works with. When memory is swapped in or out, or when it is moved between NUMA nodes in a larger system, it is moved in chunks that are an integral number of pages. Pages are thus fundamental to the management of memory in Linux systems.

This structure is reflected in the way virtual addresses work. On a 64-bit x86 system with a 4KB page size, the upper 52 bits (the "page-frame number" or "PFN") identify the page referred to by the address, while the bottom 12 bits give the offset within the page:

[Simple address structure]

Naturally, the full story is a bit more complex than that, starting with the fact that the PFN is really a physical concept, not a virtual one, so the usage here is a bit sloppy — the real page-frame number is only found as part of the address-translation process. Also, the PFN does not usually occupy the full upper 52 bits. Logically, the PFN can be looked at as an index into a large table that stores information about each page, most importantly whether it is resident in RAM and, if so, what its physical address is. In practice, the PFN is split into a maximum of five (as of this writing) nine-bit indices (on x86-64 systems), each of which indexes a different table; those tables are then organized into a hierarchy:

[More complete address structure]

This structure enables a far more efficient representation of the page tables, and it also, as we will see, comes into play in how huge pages are implemented. On the other hand, it makes for expensive memory access. Every time the CPU encounters a virtual address, it must translate it into a physical address; that means iterating through that series of five levels of page tables, which is going to be slow. To minimize that expense, CPUs maintain a translation lookaside buffer to cache the result of address translations. If a given PFN is in the TLB, the translation will happen quickly; otherwise it will be slow. The TLB is not huge, so a lot of attention goes into ensuring that code uses the TLB efficiently.
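
To make the decomposition concrete, here is a small user-space C sketch (an illustration only, assuming x86-64 with 4KB pages and five-level paging; the names are made up for this article) that extracts the page offset and the five nine-bit table indices from a virtual address:

    #include <stdio.h>

    #define PAGE_SHIFT  12                        /* 4KB pages            */
    #define LEVEL_BITS   9                        /* nine bits per level  */
    #define LEVEL_MASK  ((1UL << LEVEL_BITS) - 1)
    #define NR_LEVELS    5                        /* five-level paging    */

    /* Split a virtual address into its page offset and the index used at
     * each level of the page-table hierarchy, lowest level first. */
    static void decompose(unsigned long vaddr)
    {
            printf("address 0x%lx: offset 0x%lx\n",
                   vaddr, vaddr & ((1UL << PAGE_SHIFT) - 1));
            for (int level = 0; level < NR_LEVELS; level++)
                    printf("  level-%d index: %lu\n", level,
                           (vaddr >> (PAGE_SHIFT + level * LEVEL_BITS)) & LEVEL_MASK);
    }

    int main(void)
    {
            decompose(0x7f3b2a1c4d20UL);
            return 0;
    }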

The system memory map and struct page

The kernel needs to keep track of how every page of memory is being used — that is what "memory management" is all about, after all. To that end, it maintains a large array of page structures, one for each page of physical memory in the system. This structure has been made as small as kernel developers can get it, but it still (typically) requires 64 bytes. As a result, on a system with 4KB pages, the associated page structures occupy 1.6% of physical memory. That is a cost that the kernel community has long wanted to reduce.
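
The arithmetic behind that figure is straightforward; this trivial C program (illustrative only, with the 64-byte size hard-coded rather than taken from a real kernel build) works out the memory-map overhead for a machine with 1TB of RAM:

    #include <stdio.h>

    int main(void)
    {
            unsigned long ram_bytes   = 1UL << 40;  /* 1TB of installed RAM         */
            unsigned long page_size   = 4096;       /* 4KB base pages               */
            unsigned long struct_size = 64;         /* typical sizeof(struct page)  */
            unsigned long npages      = ram_bytes / page_size;
            unsigned long map_bytes   = npages * struct_size;

            printf("%lu pages, %luMB of page structures (%.2f%% of RAM)\n",
                   npages, map_bytes >> 20, 100.0 * map_bytes / ram_bytes);
            return 0;
    }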

The page structure has been used throughout the kernel for many years to refer to specific pages of physical memory. At times this ubiquity has proved to be problematic; struct page is a core memory-management data structure, but code in other parts of the system often makes surprising use of its fields. Current work in the memory-management subsystem is reducing the importance of struct page, which may, someday, wither away altogether.

Huge pages

One way to get better use out of the TLB is to have each TLB entry cover a larger area of memory — to make the page size larger, in other words. Most contemporary CPUs implement a huge-page mechanism that does exactly that. In essence, the CPU implements the smallest huge-page size by taking the PTE portion of the page-frame number and turning it into the offset within a larger page instead:

[Huge-page address structure]

The entry at the PMD level of the page tables is specially marked to indicate that the PFN stops there and points to a 2MB huge page (again, on x86; other architectures can vary somewhat but the idea remains the same). For this reason, this type of huge page is often referred to as a "PMD-level" (or just "PMD") huge page. By extending the range of a TLB entry from 4KB to 2MB, huge pages can significantly increase the amount of memory that can be addressed without having to go through the whole translation routine.

Traditionally, applications had to explicitly request huge pages to be able to make use of them. The transparent huge page (THP) feature makes it possible for the kernel to provide PMD-level huge pages to user space automatically in situations where it appears that they will help performance. THPs are not always a performance win; they can waste a lot of memory if the pages are only sparsely used, and they put more stress on the memory-management subsystem, so they can slow some workloads down. For this reason, the feature ends up being disabled on some systems.
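
For reference, here is a minimal user-space sketch of how an application can opt into THPs for a specific region with madvise(); whether huge pages are actually used still depends on the system-wide THP settings and on memory availability:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
            size_t len = 32UL << 20;        /* 32MB, a multiple of the 2MB THP size */
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (p == MAP_FAILED) {
                    perror("mmap");
                    return EXIT_FAILURE;
            }

            /* Hint that this region is a good candidate for transparent huge
             * pages; the kernel is free to ignore the hint. */
            if (madvise(p, len, MADV_HUGEPAGE))
                    perror("madvise(MADV_HUGEPAGE)");

            memset(p, 0, len);      /* touching the memory causes it to be populated */
            munmap(p, len);
            return 0;
    }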

Larger huge pages exist as well; a PUD-level huge page removes the PMD layer of the page-table hierarchy, yielding a 1GB page size. Such pages can be somewhat unwieldy to work with and can be difficult for the memory-management subsystem to reliably supply, but one common use case is to allocate them for use by virtual machines, which manage them internally, in smaller chunks, as the virtual machine's "physical" memory.
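
Both huge-page sizes fall directly out of the nine-bit level indices: collapsing the PTE level multiplies the 4KB base-page size by 512, and collapsing the PMD level as well multiplies it by another 512. A quick check (again for x86-64; other architectures differ):

    #include <stdio.h>

    int main(void)
    {
            unsigned long base = 4096;              /* 4KB base page             */
            unsigned long pmd  = base * 512;        /* PTE level collapsed: 2MB  */
            unsigned long pud  = pmd * 512;         /* PMD level collapsed: 1GB  */

            printf("PMD-level huge page: %luMB\n", pmd >> 20);
            printf("PUD-level huge page: %luGB\n", pud >> 30);
            return 0;
    }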

More recent processors have gained a separate, not-so-huge-page concept. Some x86 processors can mark a TLB entry as covering eight pages, and some Arm processors can perform a similar trick with 16-page chunks. That, again, allows the TLB to cover more of working memory, but without requiring the use of 2MB (or larger) huge pages. The result of these changes is that the sizing of huge pages is becoming more flexible; the term "mTHP" (multi-size transparent huge page) is often used for these smaller page clusters.

Folios

Even in the absence of huge pages, the kernel has long needed to work with larger chunks of physically contiguous memory. The concept of compound pages was added to the 2.6.6 kernel release in 2004 as one way of organizing such a chunk; a compound page is a power-of-two-sized group of pages managed, for a period of time, as a single unit. Since a compound page consists of at least two physically contiguous pages, it is represented by an equal number of adjacent page structures. The kernel takes advantage of this fact by treating the page structure for the first ("head") page as representing the whole set, and storing related information in the page structures for the following ("tail") pages.
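
A simplified sketch of the idea behind the kernel's compound_head() helper may make this clearer; the real version lives in include/linux/page-flags.h and handles some additional cases, so this fragment is illustrative rather than something to build against:

    /* A tail page stores a pointer to its head page in page->compound_head,
     * with bit zero set to mark the page as a tail. Head pages (and base
     * pages that are not part of a compound page) have that bit clear. */
    static inline struct page *compound_head_sketch(struct page *page)
    {
            unsigned long head = READ_ONCE(page->compound_head);

            if (head & 1)                   /* this is a tail page */
                    return (struct page *)(head - 1);
            return page;                    /* already a head (or base) page */
    }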

Back in 2021, Matthew Wilcox noticed that there was a lot of kernel code that could be handed either a compound page or a single ("base") page, with the base page possibly being located within a compound page. A surprising amount of overhead went into ensuring, in many places in the kernel, that any passed-in struct page pointer referred to the head page of a compound page, or to a solitary base page. He decided to improve the kernel's internal APIs to reduce that overhead. The result was the "folio", which was defined as a struct page that is known not to be a tail page of a compound page. After some significant discussion, the initial folio patches were merged for the 5.16 release at the beginning of 2022.

It became evident fairly quickly, though, that the folio concept has uses far beyond reducing the overhead of supporting compound pages. For decades, kernel developers have contemplated managing memory in larger chunks; the 4KB page size is unchanged from the 1990s, even though the amount of installed memory has grown by several orders of magnitude since then. Contemporary systems have to manage vast numbers of pages, and the associated overhead, in terms of both CPU and memory use, hurts. But attempts to move to larger pages have generally been thwarted by other costs, primarily the lost memory due to internal fragmentation.

What was needed was a way to deal with memory in variably sized chunks, rather than working with one fixed size (and, perhaps, the vastly larger huge-page sizes). Folios have, since their introduction, been evolving into that way. Over time, areas of the kernel that dealt with pages have been modified to work with variably sized folios instead.

For example, consider the page cache, which caches portions of files in memory to speed access. The page cache once, true to its name, cached data one page at a time. Now, though, it would be more properly called the "folio cache", with the ability to cache file contents in appropriately sized folios. A small file might well fit within a single-page folio in the page cache, while a much larger file could be cached in a relatively small number of large folios. Making this work required a lot of changes to the memory-management subsystem, the readahead code, and the individual filesystems as well.
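
Inside the kernel, code that looks up cached file data now deals in folios; a hedged fragment (not a complete module, and assuming the current folio-returning lookup API) might look like this:

    /* Look up whatever folio caches the given offset of a file; the folio,
     * if present, may cover many pages of the file's contents. */
    struct folio *folio = filemap_get_folio(mapping, index);

    if (!IS_ERR(folio)) {
            pr_info("cached %zu bytes starting at file offset %lld\n",
                    folio_size(folio), (long long)folio_pos(folio));
            folio_put(folio);       /* drop the reference from the lookup */
    }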

To see how far this transformation has progressed, compare the definitions of struct address_space_operations, which (to simplify) describes the functions that move data between the page cache and the underlying persistent storage, from the 5.16 kernel (when folios were introduced but not yet widely used) and 7.0-rc5. The readpage() method is now read_folio(), many other methods have been changed similarly, and none of them take struct page arguments in the current version. These changes were not easy, but they allow the management of the page cache at varying levels of granularity, enable the support of filesystems with block sizes larger than the system page size, and ease the creation of larger (more efficient) I/O operations.
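
As a concrete (and heavily abridged) example of that change, the read entry point in struct address_space_operations went from taking a page to taking a folio; the declarations below are simplified excerpts, not the full structure:

    /* 5.16 era: the filesystem was asked to read one page. */
    int (*readpage)(struct file *file, struct page *page);

    /* Current kernels: the filesystem is asked to read a whole folio,
     * which may span many pages. */
    int (*read_folio)(struct file *file, struct folio *folio);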

Anonymous memory for user-space processes has also traditionally been allocated and managed one page at a time. The addition of transparent huge pages helped in some situations, but THPs are too large to be a net performance improvement for many workloads. Instead, mTHPs are easier to work with, waste less memory through internal fragmentation, and can boost performance significantly; folios can represent them nicely within the kernel. The work to make full use of mTHPs is still ongoing, and may take a while yet to settle, but mTHPs may prove to be a more generally applicable performance enhancement than PMD-level huge pages for many workloads.

One significant advantage of moving to folios for both the page cache and anonymous memory is the effect on the kernel's least-recently-used (LRU) lists, which are used to identify which pages (now folios) have not been accessed for a while and should be considered for reclamation. Large numbers of pages lead to extremely long LRU lists, which are more expensive for the kernel to manipulate. Managing folios in those lists makes them shorter, again improving performance.

Within the kernel, folios are represented by struct folio. Since the introduction of folios, this structure has been carefully designed to overlay struct page (more correctly, it overlays the first four page structures representing a large folio). This design ensures that folio structures are made up of valid page structures, allowing the transition to folios to be implemented incrementally. There will come a time, though, when struct folio will become entirely separate from struct page, but that will require some fundamental changes to the system's memory map.
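
An abridged sketch of struct folio shows the overlay at work; the real definition (in include/linux/mm_types.h) contains more fields and more unions, but the first members deliberately mirror those of struct page, so a folio pointer is also a valid pointer to its head page:

    struct folio {
            union {
                    struct {
                            unsigned long flags;            /* same bits as page->flags  */
                            struct list_head lru;           /* LRU-list linkage          */
                            struct address_space *mapping;  /* file (or anon) mapping    */
                            pgoff_t index;                  /* offset within the mapping */
                            void *private;
                            atomic_t _mapcount;
                            atomic_t _refcount;
                    };
                    struct page page;                       /* overlays the head page    */
            };
            /* ... further fields overlay the following tail pages ... */
    };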

Shrinking the memory map

As mentioned above, the kernel's memory map itself takes up a significant amount of memory, which developers would like to see put to better uses. The static nature of the map means that there must be a page structure for each physical page, and that said structure must be large enough to handle all of the possible uses to which a page might be put. Making the map more dynamic offers the hope of reducing its memory footprint considerably.

The eventual plan is to replace struct page with an eight-byte memory descriptor; it can be thought of as a pointer to a type-specific structure describing the memory in question, though the real story is a bit more complex. For memory that is organized into folios, the folio structure will be the descriptor. Unlike page structures, though, there would only need to be a single folio structure regardless of how many pages the folio holds. There would still need to be a descriptor entry for each PFN, but the entries in the memory map for the base pages that make up a single folio would all point to a single folio structure. There will be other descriptor types for other memory uses, including slab pages, page tables, and so on. See this page for a description of the descriptor types and how they are expected to work.
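
As a purely conceptual sketch (none of these names exist in the kernel; they are invented for this article), a memory descriptor can be imagined as an eight-byte word whose low bits give the type of use and whose remaining bits point to the type-specific structure:

    /* Hypothetical illustration of the memory-descriptor idea. */
    enum memdesc_type {
            MEMDESC_FOLIO,          /* points to a struct folio          */
            MEMDESC_SLAB,           /* points to a struct slab           */
            MEMDESC_PGTABLE,        /* points to a page-table descriptor */
            /* ... */
    };

    struct memdesc {
            unsigned long word;     /* type in the low bits, pointer above */
    };

    static inline enum memdesc_type memdesc_type(struct memdesc d)
    {
            return d.word & 0xf;
    }

    static inline void *memdesc_ptr(struct memdesc d)
    {
            return (void *)(d.word & ~0xfUL);
    }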

The memory-descriptor work is underway, and may take years yet to complete. This sort of transition in a production kernel can be compared to replacing the foundation of a building that is in heavy use; it is not a small task. But the fundamental rethinking of the memory-management subsystem that was kicked off by the introduction of folios is moving quickly and has already shown some significant results.

Index entries for this article
Kernel: Memory management/Folios
Kernel: Memory management/struct page



Good article

Posted Apr 24, 2026 18:08 UTC (Fri) by q3cpma (subscriber, #120859) [Link] (4 responses)

Thanks for this clearly written recap, I think it's even simple enough to try to get my colleagues to read it!

A relevant question just popped into my mind: do you (or anyone) know how other OSes with a modicum of focus on performance (e.g. Windows, macOS, FreeBSD) approach the problem?

Good article

Posted Apr 24, 2026 18:39 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

Windows NT has always used a flexible directory of pages, with page frame numbers providing a level of indirection in this directory. So it could manage pages of different size from the very beginning.

Filesystems and apps could use large pages to save on mapping costs, but they were not swappable. Which is a bigger deal in Windows, because it doesn't really support overcommit. If you allocate something, it must be backed by a (theoretically accessible) pagefile. I stopped doing Windows NT kernel work in the 2000s, but I think they have since fixed that.

https://github.com/ayoubfaouzi/windows-internals/blob/mai...

Good article

Posted Apr 24, 2026 20:52 UTC (Fri) by q3cpma (subscriber, #120859) [Link]

Thanks for the info. Knowing NT was designed and partly made by VMS people certainly makes sense sometimes.

Good article

Posted Apr 26, 2026 12:27 UTC (Sun) by micka (subscriber, #38720) [Link] (1 responses)

Having to use it daily at work, I object to the notion that macos has any kind of focus on performance.

Good article

Posted Apr 26, 2026 13:29 UTC (Sun) by q3cpma (subscriber, #120859) [Link]

Oh, you and me both, man. I have a M2 Pro at work and that's because it was that or Windows 11. Coming from Gentoo, I learned to tolerate MacPorts and XQuartz enough, haha.

But well, I meant compared to stuff like OpenBSD or Plan 9 that don't have the resources to _try_ to focus on performance. And macOS did manage to switch to APFS where MS fumbled with ReFS...

History is a little backwards ...

Posted Apr 24, 2026 20:57 UTC (Fri) by willy (subscriber, #9762) [Link]

The original motivation for all of this work was the effort to support THPs in the page cache (other than shmem). Kiryl did that work, and I referred to it extensively at the beginning.

https://lore.kernel.org/linux-fsdevel/20170126115819.5887...

I had argued with Kiryl (as early as 2015, I think) that to be successful, we needed to support arbitrary order pages, not just THP sizes. When he stopped working on his patches, I took the opportunity to do things the way I thought they should be done.

As I worked on it, I realised that I didn't understand what it meant to pass a struct page pointer to, eg, readpage(). Was it legitimate to pass any page, even a tail page to readpage()? Or did it have to be a head page? And what, exactly, did that mean if we did pass a tail page? Should we fill in the entire compound page, or just the precise page that was requested? We have only one PageUptodate bit, and it's stored on the head page, so that pointed towards an answer of sorts.

Eventually I couldn't handle the ambiguity any more and decided we needed a new type. It's every bit as painful as everybody said it would be, but it's offered the opportunity to clean up a lot of code.

For the hyper-interested, you can see my earliest presentation on this (before the name folio had been coined) at Linux Conf AU 2020 here: https://youtu.be/p5u-vbwu3Fs

PMD? PUD?

Posted Apr 27, 2026 9:31 UTC (Mon) by taladar (subscriber, #68407) [Link] (5 responses)

Pretty clearly written article, it just could use a paragraph introducing the terms PMD and PUD before you use them. I know they are just names for different levels of that 5 level hierarchy you do introduce but it might be good to mention which ones.

PMD? PUD?

Posted Apr 27, 2026 9:38 UTC (Mon) by corbet (editor, #1) [Link] (3 responses)

I deliberately left that out because the article was already getting long and, as you say, they are just names for the various levels of the page-table hierarchy. Adding that PMD = "page middle directory" and PUD = "page upper directory" didn't seem worth the clutter. And how does one define P4D?

Apologies, though, if that decision reduced the clarity of the article, that certainly wasn't the effect I was after.

PMD? PUD?

Posted Apr 27, 2026 11:08 UTC (Mon) by taladar (subscriber, #68407) [Link]

I was thinking more along the lines of a sentence that says "The levels of this hierarchy are called..." I agree that the actual abbreviations don't really matter to the context here.

PMD? PUD?

Posted Apr 27, 2026 11:20 UTC (Mon) by joib (subscriber, #8541) [Link]

It would perhaps, at this point, be clearer to just rename them to P1D, P2D, etc.

That's of course not the fault of the article, whose job is to report on things as they are, not as I might wish them to be.

PMD? PUD?

Posted Apr 28, 2026 18:58 UTC (Tue) by songmaster (subscriber, #1748) [Link]

All my browsers are configured for dark-mode (white text on black) and while reading this I wondered if the address-bit diagrams might have included some black text on a transparent background explaining what the different colored parts were.

PMD? PUD?

Posted Apr 29, 2026 2:18 UTC (Wed) by linusw (subscriber, #40300) [Link]

There are detailed explanations in the kernel page table documentation:
https://docs.kernel.org/mm/page_tables.html


Copyright © 2026, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds