Stop saving Runs in OldDataMonitor #26711

Merged
timja merged 6 commits into jenkinsci:master from jglick:OldDataMonitor
Apr 29, 2026

Member

@jglick jglick commented Apr 27, 2026

I reviewed a heap dump from a Jenkins controller (2.541.x) which was having severe memory pressure problems; it contained hundreds of thousands of WorkflowRun objects, most of which were completed. FINEST logging on RunMap revealed that almost all of them were being loaded by one of the following code paths (or minor variants):

at hudson.model.RunMap.retrieve(RunMap.java:292)
at hudson.model.RunMap.retrieve(RunMap.java:64)
at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:451)
at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:445)
at jenkins.model.lazy.AbstractLazyLoadRunMap.resolveBuildRef(AbstractLazyLoadRunMap.java:371)
at jenkins.model.lazy.AbstractLazyLoadRunMap$BuildReferenceMapAdapterResolver.resolveBuildRef(AbstractLazyLoadRunMap.java:528)
at jenkins.model.lazy.BuildReferenceMapAdapter$EntryAdapter.getValue(BuildReferenceMapAdapter.java:345)
at …
at hudson.tasks.LogRotator.perform(LogRotator.java:194)
at jenkins.model.SimpleGlobalBuildDiscarderStrategy.apply(SimpleGlobalBuildDiscarderStrategy.java:61)
at jenkins.model.BackgroundGlobalBuildDiscarder.lambda$processJob$0(BackgroundGlobalBuildDiscarder.java:75)
at …
at jenkins.model.BackgroundGlobalBuildDiscarder.processJob(BackgroundGlobalBuildDiscarder.java:71)
at jenkins.model.BackgroundGlobalBuildDiscarder.processJob(BackgroundGlobalBuildDiscarder.java:64)
at jenkins.model.BackgroundGlobalBuildDiscarder.execute(BackgroundGlobalBuildDiscarder.java:56)
at …

at hudson.model.RunMap.retrieve(RunMap.java:292)
at hudson.model.RunMap.retrieve(RunMap.java:64)
at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:451)
at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:445)
at jenkins.model.lazy.AbstractLazyLoadRunMap.resolveBuildRef(AbstractLazyLoadRunMap.java:371)
at jenkins.model.lazy.AbstractLazyLoadRunMap$BuildReferenceMapAdapterResolver.resolveBuildRef(AbstractLazyLoadRunMap.java:528)
at jenkins.model.lazy.BuildReferenceMapAdapter.get(BuildReferenceMapAdapter.java:133)
at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:387)
at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:240)
at PluginClassLoader for workflow-job//org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:233)
at PluginClassLoader for workflow-job//org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:104)
at hudson.model.Run.fromExternalizableId(Run.java:2474)
at hudson.diagnosis.OldDataMonitor$RunSaveableReference.get(OldDataMonitor.java:431)
at hudson.diagnosis.OldDataMonitor.getData(OldDataMonitor.java:113)
at PluginClassLoader for support-core//com.cloudbees.jenkins.support.impl.AdministrativeMonitors$1.lambda$printTo$2(AdministrativeMonitors.java:85)
at …
at PluginClassLoader for support-core//com.cloudbees.jenkins.support.impl.AdministrativeMonitors$1.printTo(AdministrativeMonitors.java:78)
at PluginClassLoader for support-core//com.cloudbees.jenkins.support.api.PrefilteredPrintedContent.writeTo(PrefilteredPrintedContent.java:63)
at PluginClassLoader for support-core//com.cloudbees.jenkins.support.SupportPlugin.writeBundle(SupportPlugin.java:498)
at PluginClassLoader for support-core//com.cloudbees.jenkins.support.SupportPlugin.writeBundle(SupportPlugin.java:415)
at PluginClassLoader for support-core//com.cloudbees.jenkins.support.SupportAction$SupportBundleAsyncGenerator.compute(SupportAction.java:546)
at jenkins.util.ProgressiveRendering$1.run(ProgressiveRendering.java:121)
at …

Reviewing the admin-monitors.md showed hundreds of thousands of records like

OldData Manage Old Data

  • Problematic object: folder/job #12345
    • CannotResolveClassException: libraryPath
  • Problematic object: otherfolder/otherjob #67890
    • CannotResolveClassException: libraryPath

The reported old data appears to trace to jenkinsci/pipeline-groovy-lib-plugin#202 (comment) which indeed deleted this (apparently) unused field.

The trigger for the problem seems to have been misuse of artifactNumToKeep. If you configure this build discarder option but do not set it to just slightly less than numToKeep, and your jobs have thousands of builds, then every hour when the background discarder runs it will be forced to load thousands of historical builds which have not been deleted yet but which might still have artifacts (even though only an hour’s worth of builds could still match). Thus if XStream reported a problem loading those old builds, they would be listed in the monitor. Now 681a8ff (#14915) ensured that the memory held in OldDataMonitor itself for each build is minimal (just folder/job#12345 in the example above), but jenkinsci/support-core-plugin@33475d1 means that every support bundle generated will also have to load all of those builds again—merely to call Run.toString on them! Taken together, these behaviors effectively defeat lazy loading—or worse, mean that builds do get evicted from heap during full GC cycles, but then soon reloaded, forcing more disk I/O.
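
To make the cost described above concrete, here is a toy model of one discarder pass. It is only a sketch: the class and method names (`ArtifactDiscarderSketch`, `loadsPerPass`) are hypothetical, "loading" a build is reduced to incrementing a counter, and the real `LogRotator` applies further checks that are ignored here. The one assumption it encodes is the behavior described in this paragraph, namely that every build older than the newest `artifactNumToKeep` has to be loaded so its artifacts can be inspected.

```java
import java.util.function.IntFunction;

/** Toy model of a lazily loaded build list; not the real RunMap or LogRotator. */
class ArtifactDiscarderSketch {

    /** Counts how many build records a single discarder pass has to load. */
    static int loadsPerPass(int buildCount, int artifactNumToKeep) {
        int[] loads = {0};
        // Stand-in for lazy loading: "loading" a build just bumps a counter.
        IntFunction<String> loadBuild = number -> {
            loads[0]++;
            return "build #" + number;
        };
        // Builds are visited newest first. Every build past the first
        // artifactNumToKeep must be loaded to check for leftover artifacts.
        for (int index = artifactNumToKeep; index < buildCount; index++) {
            int number = buildCount - index;
            loadBuild.apply(number);
        }
        return loads[0];
    }

    public static void main(String[] args) {
        // numToKeep = 5000 builds retained, artifacts kept for only 10 of them.
        System.out.println(loadsPerPass(5000, 10));   // prints 4990
        // artifactNumToKeep just slightly less than numToKeep.
        System.out.println(loadsPerPass(5000, 4990)); // prints 10
    }
}
```

In this model the number of loads per pass is simply the gap between the two settings, which is why a small `artifactNumToKeep` on a job with thousands of builds is so expensive when the pass repeats every hour.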

At first I was considering exposing SaveableReference as an API for use by AdministrativeMonitors to avoid forcing build loading when just printing data to a support bundle. However after some reflection I realized that there is no compelling reason to save Runs to begin with. These are not configuration like global settings or jobs (or even users), so there is no need to “clean up” after a plugin upgrade; in fact it is unusual for a build.xml to be resaved after the build is complete (the only common use cases are adding a description manually or marking it keepLog). Most builds are deleted eventually by a build discarder anyway. Dropping the requirement to track Runs here simplifies the implementation considerably and bypasses any possible unbounded memory consumption.

CloudBees-internal issue

Testing done

OldDataMonitorTest passes locally 57×.

Proposed changelog entries

  • Stop retaining build references in the old data monitor.

Proposed changelog category

/label bug

Proposed upgrade guidelines

N/A

Desired reviewers

Before the changes are marked as ready-for-merge:

Maintainer checklist

  • There are at least two (2) approvals for the pull request and no outstanding requests for change.
  • Conversations in the pull request are over, or it is explicit that a reviewer is not blocking the change.
  • Changelog entries in the pull request title and/or Proposed changelog entries are accurate, human-readable, and in the imperative mood.
  • Proper changelog labels are set so that the changelog can be generated automatically.
  • If the change needs additional upgrade steps from users, the upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the pull request title (see example).
  • If it would make sense to backport the change to LTS, the change must be a Bug or Improvement, and either the issue or the pull request must be labeled lts-candidate to be considered.

Copilot AI review requested due to automatic review settings April 27, 2026 13:33
comment-ops-bot added the bug label (For changelog: Minor bug. Will be listed after features) on Apr 27, 2026
r = rule;
}

@Disabled("constantly failing on CI builders, makes problems for memory()")
Member Author

Comment on lines -144 to -146
@Issue("JENKINS-26718")
@Test
void unlocatableRun() throws Exception {
Member Author

*/
@NonNull
-static OldDataMonitor get(Jenkins j) throws IllegalStateException {
+static OldDataMonitor get() throws IllegalStateException {
Member Author

unused since #22515 / #3240

private static void remove(Saveable obj, boolean isDelete) {
Jenkins j = Jenkins.get();
OldDataMonitor odm = get(j);
try (ACLContext ctx = ACL.as2(ACL.SYSTEM2)) {
Member Author

#21339 / #1796 specific to Runs

private static final Logger LOGGER = Logger.getLogger(OldDataMonitor.class.getName());

-private ConcurrentMap<SaveableReference, VersionRange> data = new ConcurrentHashMap<>();
+private ConcurrentMap<Saveable, VersionRange> data = new ConcurrentHashMap<>();
Member Author

concurrency from #21369 / #1825

Contributor

Copilot AI left a comment

Pull request overview

This PR adjusts OldDataMonitor behavior to avoid retaining/loading Run (WorkflowRun) records, reducing memory pressure and disk I/O caused by repeatedly materializing historical builds when old-data entries are enumerated (e.g., for support bundles).

Changes:

  • Stop tracking/reporting Run instances in OldDataMonitor and simplify storage to a ConcurrentMap<Saveable, VersionRange>.
  • Simplify the discard handler signature and update call sites/tests accordingly.
  • Remove tests which asserted behavior around Run retention/unlocatable runs, and update remaining tests to the new API.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| core/src/main/java/hudson/diagnosis/OldDataMonitor.java | Removes Run tracking and SaveableReference indirection; changes discard handler signature; simplifies getData() implementation. |
| test/src/test/java/hudson/diagnosis/OldDataMonitorTest.java | Updates usages of OldDataMonitor.get() / doDiscard() and removes Run-focused tests. |
| test/src/test/java/hudson/model/ViewTest.java | Updates tests to call odm.doDiscard() with new signature. |
| test/src/test/java/hudson/model/ItemGroupMixInTest.java | Updates tests to call odm.doDiscard() with new signature. |
| test/src/test/java/hudson/model/FreeStyleProjectTest.java | Updates tests to call odm.doDiscard() with new signature. |
| test/src/test/java/hudson/model/ComputerTest.java | Updates tests to call odm.doDiscard() with new signature. |

Comments suppressed due to low confidence (1)

test/src/test/java/hudson/diagnosis/OldDataMonitorTest.java:83

  • The Javadoc link still references the old doDiscard(StaplerRequest2, StaplerResponse2) signature, but doDiscard no longer takes request/response parameters. This should be updated to the new method signature to avoid broken Javadoc links / doclint warnings.
    /**
     * Note that this doesn't actually run slowly, it just ensures that
     * the {@link OldDataMonitor#changeListener}'s {@code onChange()} can complete
     * while {@link OldDataMonitor#doDiscard(org.kohsuke.stapler.StaplerRequest2, org.kohsuke.stapler.StaplerResponse2)}
     * is still running.

Comment on lines 323 to 324
*/
@RequirePOST
Copilot AI Apr 27, 2026

Changing doDiscard from doDiscard(StaplerRequest2, StaplerResponse2) to a no-arg method removes a public method and is a binary-incompatible change for any plugins calling it directly. Consider keeping the old signature as a deprecated overload delegating to the new implementation (or vice versa) to preserve compatibility.

Suggested change
*/
@RequirePOST
*/
@Deprecated
@RequirePOST
public HttpResponse doDiscard(StaplerRequest2 req, StaplerResponse2 rsp) {
return doDiscard();
}
/**
* Save all files containing only unreadable data (no data upgrades), which discards this data.
* Remove those items from the data map.
*/
@RequirePOST

Member Author

@jglick jglick Apr 27, 2026

Not an API. Only apparent usage from plugin tests is in https://github.com/jenkinsci/freestyle-project-plugin/blob/8e20b062fc37a0213c154475016cf2292f7bcfad/src/test/java/hudson/model/FreeStyleProjectTest.java#L208 which is just moved from core and outdated. I think this repo should be archived unless and until someone actually follows through with removal from core.

Comment on lines 144 to +148
public static void report(Saveable obj, String version) {
-OldDataMonitor odm = get(Jenkins.get());
+if (obj instanceof Run<?, ?>) {
+return; // somewhat ephemeral
+}
+OldDataMonitor odm = get();
Copilot AI Apr 27, 2026

obj instanceof Run<?, ?> is now explicitly ignored, but there is no regression test asserting that OldDataMonitor.report(run, …) does not add an entry. Adding a small test would help prevent accidental reintroduction of run tracking (and the associated eager loading/memory issues).

Member Author

Perhaps, but the code is sufficiently trivial that it seems improbable this clause would be deleted accidentally.
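
For readers following this thread, the check under discussion can be sketched against simplified stand-ins. Everything below is hypothetical: `Saveable`, `Job`, `Run`, and `Monitor` are local toy types, not the Jenkins classes of the same names, and the monitor stores a plain version string instead of a `VersionRange`. The sketch only mirrors the early return for runs shown in the diff above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Simplified stand-ins; these are NOT the Jenkins classes of the same names. */
class OldDataReportSketch {

    interface Saveable {}

    static final class Job implements Saveable {
        final String name;
        Job(String name) { this.name = name; }
    }

    static final class Run implements Saveable {
        final Job parent;
        final int number;
        Run(Job parent, int number) { this.parent = parent; this.number = number; }
    }

    /** Hypothetical reduced monitor: only the report-and-skip behavior. */
    static final class Monitor {
        private final ConcurrentMap<Saveable, String> data = new ConcurrentHashMap<>();

        void report(Saveable obj, String version) {
            if (obj instanceof Run) {
                return; // builds are somewhat ephemeral, so they are not tracked
            }
            data.put(obj, version);
        }

        Map<Saveable, String> getData() {
            return Map.copyOf(data);
        }
    }

    public static void main(String[] args) {
        Monitor monitor = new Monitor();
        Job job = new Job("folder/job");
        monitor.report(job, "1.0");
        monitor.report(new Run(job, 12345), "1.0");
        System.out.println(monitor.getData().size());           // prints 1
        System.out.println(monitor.getData().containsKey(job)); // prints true
    }
}
```

A regression test in core would presumably make the same two assertions against the real monitor: the job is recorded and the run is not.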

Copilot AI review requested due to automatic review settings April 27, 2026 21:54
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Member

timja commented Apr 28, 2026

/label ready-for-merge


This PR is now ready for merge, after ~24 hours, we will merge it if there's no negative feedback.

Thanks!

comment-ops-bot added the ready-for-merge label (The PR is ready to go, and it will be merged soon if there is no negative feedback) on Apr 28, 2026
@jglick jglick closed this Apr 28, 2026
@jglick jglick reopened this Apr 29, 2026
Member Author

jglick commented Apr 29, 2026

(sorry, closed by accident)

@timja timja merged commit 9f083a1 into jenkinsci:master Apr 29, 2026
17 checks passed
@jglick jglick deleted the OldDataMonitor branch April 29, 2026 15:36
Contributor

MarkEWaite commented May 6, 2026

The plugin BOM testing of Jenkins 2.563 shows that parameterized trigger plugin test CapturedEnvironmentActionTest#onLoad has an assertion that seems to depend on OldDataMonitor saving Runs.

I'm one of the maintainers of the parameterized trigger plugin, so I can adjust the test to pass with new and old versions of Jenkins, but I'm wondering what the desired result should be. That test is checking SECURITY-2185, "Sensitive parameter values captured in build metadata files by Parameterized Trigger Plugin". Since the OldDataMonitor no longer records Runs, it can't be used to warn the user that one of their build records might include sensitive information.

Since we are intentionally not capturing Runs in OldDataMonitor any longer, should that test just be deleted? If not deleted, should it be adapted to only run the test when the Jenkins version is 2.562 or older?

@jglick, I'd appreciate your guidance on this one, since I feel "out of my league" on the topic.
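
If the test were kept and gated on the Jenkins version, the condition could look roughly like the sketch below. The 2.563 cutoff is taken from this comment and is an assumption about where the behavior changes. The helper names (`VersionGateSketch`, `isOlderThan`, `monitorStillTracksRuns`) are hypothetical, and the hand-rolled comparison handles purely numeric dotted versions only. A real Jenkins test would more likely rely on Jenkins's own `hudson.util.VersionNumber` type together with a JUnit assumption.

```java
/** Hypothetical version gate for a test that depends on pre-2.563 behavior. */
class VersionGateSketch {

    /** Returns true when dotted numeric version a is strictly older than b. */
    static boolean isOlderThan(String a, String b) {
        String[] left = a.split("\\.");
        String[] right = b.split("\\.");
        int length = Math.max(left.length, right.length);
        for (int i = 0; i < length; i++) {
            // Missing components count as 0, so "2.563" equals "2.563.0".
            int l = i < left.length ? Integer.parseInt(left[i]) : 0;
            int r = i < right.length ? Integer.parseInt(right[i]) : 0;
            if (l != r) {
                return l < r;
            }
        }
        return false;
    }

    /** Assumption from the comment above: runs are tracked only before 2.563. */
    static boolean monitorStillTracksRuns(String jenkinsVersion) {
        return isOlderThan(jenkinsVersion, "2.563");
    }

    public static void main(String[] args) {
        System.out.println(monitorStillTracksRuns("2.562"));   // prints true
        System.out.println(monitorStillTracksRuns("2.563"));   // prints false
        System.out.println(monitorStillTracksRuns("2.541.3")); // prints true
    }
}
```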
