Fix NullPointerException in LazyBuildMixIn on jenkins reload #26399

Merged
timja merged 1 commit into jenkinsci:master from dukhlov:npe-fix
Mar 4, 2026

Contributor

@dukhlov dukhlov commented Mar 3, 2026

Fixes #26397

This PR removes the lazy loading behavior of RunMap’s entry.getValue(). The value will now be resolved inside iterator.next() instead of in entry.getValue() itself.

The weak semantics will be preserved for keySet() iteration, allowing iteration without resolving values on each step. Values can still be resolved explicitly by calling RunMap.get(someKey).
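
The change described above can be pictured with a small stand-alone model. This is only a sketch under stated assumptions, not the RunMap code: `EagerEntryIterationSketch`, `load`, and `entries` are hypothetical names, and a `String` stands in for a build. It shows an entry iterator that resolves each value while advancing, so every entry it hands out already carries its value and `getValue()` does no loading of its own.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical sketch; not the Jenkins implementation. */
public class EagerEntryIterationSketch {

    /** Records which build numbers were "loaded", so the effect is observable. */
    public static final List<Integer> loaded = new ArrayList<>();

    /** Stand-in for loading a build from disk; build 2 is treated as unloadable. */
    public static String load(int number) {
        loaded.add(number);
        return number == 2 ? null : "build #" + number;
    }

    /** Iterates entries, resolving each value while advancing rather than in getValue(). */
    public static Iterator<Map.Entry<Integer, String>> entries(Iterable<Integer> keys) {
        final Iterator<Integer> it = keys.iterator();
        return new Iterator<Map.Entry<Integer, String>>() {
            private Map.Entry<Integer, String> pending;

            @Override
            public boolean hasNext() {
                // Skip keys whose value cannot be resolved, so no entry ever holds null.
                while (pending == null && it.hasNext()) {
                    int n = it.next();
                    String value = load(n); // resolved here, during iteration
                    if (value != null) {
                        pending = new AbstractMap.SimpleImmutableEntry<>(n, value);
                    }
                }
                return pending != null;
            }

            @Override
            public Map.Entry<Integer, String> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Map.Entry<Integer, String> entry = pending;
                pending = null;
                return entry;
            }
        };
    }
}
```

In this model, iterating the keys alone never calls `load`, which corresponds to the `keySet()` behaviour the description aims for; whether the real `keySet()` achieves that is discussed further down in the review thread.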

Testing done

Only the Jenkins integration tests were run. The issue is hard to reproduce.

Screenshots (UI changes only)

Before

After

Proposed changelog entries

  • Partially revert optimisation in RunMap that causes issues when reloading

Proposed changelog category

/label regression-fix

Proposed upgrade guidelines

N/A

Submitter checklist

  • The issue, if it exists, is well-described.
  • The changelog entries and upgrade guidelines are appropriate for the audience affected by the change (users or developers, depending on the change) and are in the imperative mood (see examples). Fill in the Proposed upgrade guidelines section only if there are breaking changes or changes that may require extra steps from users during upgrade.
  • There is automated testing or an explanation as to why this change has no tests.
  • New public classes, fields, and methods are annotated with @Restricted or have @since TODO Javadocs, as appropriate.
  • New deprecations are annotated with @Deprecated(since = "TODO") or @Deprecated(forRemoval = true, since = "TODO"), if applicable.
  • UI changes do not introduce regressions when enforcing the current default rules of Content Security Policy Plugin. In particular, new or substantially changed JavaScript is not defined inline and does not call eval to ease future introduction of Content Security Policy (CSP) directives (see documentation).
  • For dependency updates, there are links to external changelogs and, if possible, full differentials.
  • For new APIs and extension points, there is a link to at least one consumer.

Desired reviewers

@jglick

Before the changes are marked as ready-for-merge:

Maintainer checklist

  • There are at least two (2) approvals for the pull request and no outstanding requests for change.
  • Conversations in the pull request are over, or it is explicit that a reviewer is not blocking the change.
  • Changelog entries in the pull request title and/or Proposed changelog entries are accurate, human-readable, and in the imperative mood.
  • Proper changelog labels are set so that the changelog can be generated automatically.
  • If the change needs additional upgrade steps from users, the upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the pull request title (see example).
  • If it would make sense to backport the change to LTS, be a Bug or Improvement, and either the issue or pull request must be labeled as lts-candidate to be considered.

@comment-ops-bot comment-ops-bot Bot added the regression-fix Pull request that fixes a regression in one of the previous Jenkins releases label Mar 3, 2026
Member

@timja timja left a comment

Thanks for sorting this so quickly. I've tweaked the changelog entry a bit. I'm not entirely happy with it, but I think it's better than before.

@timja timja requested a review from jglick March 3, 2026 22:20
Member

@jglick jglick left a comment

To be clear, is this reverting all or part of some specific prior PR?

Contributor Author

dukhlov commented Mar 3, 2026

To be clear, is this reverting all or part of some specific prior PR?

I would say that it's a rework of #11038

Member

timja commented Mar 4, 2026

/label ready-for-merge


This PR is now ready for merge. After ~24 hours, we will merge it if there's no negative feedback.

Thanks!

@comment-ops-bot comment-ops-bot Bot added the ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback label Mar 4, 2026

sxa commented Mar 4, 2026

Thanks for getting this fix done so quickly. I'm not too familiar with your LTS build cut-off dates, but assuming it gets merged this week, is it likely to make it into 2.541.3 a couple of weeks from now?

Member

timja commented Mar 4, 2026

see #26397 (comment)

@timja timja merged commit ef7e6e6 into jenkinsci:master Mar 4, 2026
18 checks passed
@krisstern krisstern added the bug For changelog: Minor bug. Will be listed after features label Mar 8, 2026
meetgoti07 pushed a commit to meetgoti07/jenkins that referenced this pull request Mar 14, 2026
shalinisudarsan pushed a commit to shalinisudarsan/jenkins that referenced this pull request Mar 30, 2026
Comment on lines +168 to +171
// Iterate through keySet() instead of entrySet() or values() to avoid triggering lazy loading
// for the first `numToKeep` builds
runMap.keySet().stream().skip(numToKeep).map(runMap::get)
.filter(r -> r != null && !shouldKeepRun(r, lsb, lstb)).forEach(r -> {
Member

As far as I can tell, this skip does not actually work as advertised. In 2.555.x using RunLoadCounter it seems that even the first numToKeep builds are loaded:

hudson.model.Run.onLoad(Run.java:376)
hudson.model.RunMap.retrieve(RunMap.java:290)
hudson.model.RunMap.retrieve(RunMap.java:64)
jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:451)
jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:445)
jenkins.model.lazy.AbstractLazyLoadRunMap.resolveBuildRef(AbstractLazyLoadRunMap.java:371)
jenkins.model.lazy.AbstractLazyLoadRunMap$BuildReferenceMapAdapterResolver.resolveBuildRef(AbstractLazyLoadRunMap.java:528)
jenkins.model.lazy.BuildReferenceMapAdapter$KeySetAdapter.lambda$iterator$0(BuildReferenceMapAdapter.java:176)
hudson.util.Iterators$6.adapt(Iterators.java:337)
hudson.util.AdaptedIterator.next(AdaptedIterator.java:57)
com.google.common.collect.Iterators$5.computeNext(Iterators.java:674)
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
jenkins.model.lazy.BuildReferenceMapAdapter$KeySetAdapter$1.tryAdvance(BuildReferenceMapAdapter.java:197)
java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:673)
java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:718)
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
hudson.tasks.LogRotator.perform(LogRotator.java:171)
hudson.model.Job.logRotate(Job.java:519)
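
The trace is consistent with how `Stream.skip` behaves on a sequential stream: the skipped elements are still pulled from the underlying iterator and only then discarded, so any work the iterator does per element happens for them too. The following is a minimal sketch of that general Java behaviour, not Jenkins code; `SkipStillIteratesSketch`, `visited`, and `afterSkip` are hypothetical names, and recording a key in `visited` stands in for whatever the key iterator does while advancing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

/** Hypothetical sketch of skip() over an iterator with per-element side effects. */
public class SkipStillIteratesSketch {

    /** Every key the underlying iterator was asked to produce, in order. */
    public static final List<Integer> visited = new ArrayList<>();

    /** Streams the keys through a side-effecting iterator and skips the first {@code skip}. */
    public static List<Integer> afterSkip(List<Integer> keys, long skip) {
        final Iterator<Integer> source = keys.iterator();
        Iterator<Integer> sideEffecting = new Iterator<Integer>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public Integer next() {
                Integer key = source.next();
                visited.add(key); // stand-in for work done while advancing
                return key;
            }
        };
        Spliterator<Integer> spliterator =
                Spliterators.spliteratorUnknownSize(sideEffecting, Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, false)
                .skip(skip)
                .collect(Collectors.toList());
    }
}
```

With five keys and `skip(2)`, the result contains the last three keys, but `visited` contains all five.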

Contributor Author

@dukhlov dukhlov May 2, 2026

The current keySet() implementation returns core BuildReferenceMap keys filtered by resolver.isBuildRefResolvable(ref) == true.

@Override
public Iterator<Integer> iterator() {
    return Iterators.removeNull(Iterators.map(
            BuildReferenceMapAdapter.this.core.entrySet().iterator(), coreEntry -> {
                BuildReference<R> ref = coreEntry.getValue();
                return resolver.isBuildRefResolvable(ref) ? ref.number : null;
            }));
}

isBuildRefResolvable resolves a build if the corresponding BuildReference has not been resolved before (isSet() == false).

    /**
     * Checks if the reference can be resolved. If the reference was already resolved,
     * returns the cached status. Otherwise, attempts to resolve it and returns the result.
     *
     * @param buildRef the reference to check
     * @return {@code true} if the reference points to a valid build
     */
    @Override
    public boolean isBuildRefResolvable(BuildReference<R> buildRef) {
        return buildRef != null &&
                (buildRef.isSet() ? !buildRef.isUnloadable() : this.resolveBuildRef(buildRef) != null);
    }
}

So the first iteration over RunMap.keySet() (or the first one after reload) may trigger BuildReference resolution.

I consider this a reasonable compromise between optimization and consistency.

I'm not sure about all possible use cases. We could make it more consistent by resolving build references on every iteration, or relax consistency further by simply returning the core BuildReferenceMap keys as-is.

Member

I do not have a strong opinion at this point, but at least the existing comment

avoid triggering lazy loading for the first numToKeep builds

does not seem to be true as written.

Context: analyzing a severe performance problem reported by a CloudBees CI customer (running in high availability mode, though I do not believe it matters here) and examining FINEST stack traces from RunMap I found that hundreds of thousands of build records were being loaded by LogRotator when iterating artifactNumToKeep, because (I inferred) this was set to a value far lower than numToKeep (or numToKeep was not configured at all).

Imagine you have a job whose last build is number 1000. numToKeep is set to 100 while artifactNumToKeep is set to 10. So the existing builds will be 901–1000, of which only 991–1000 have artifacts while 901–990 still exist but have no artifacts. Now say you run builds 1001, 1002, and 1003, and then LogRotator runs due to the hourly background build discarder. So it should process numToKeep by skipping over the last 100, examining 901, 902, and 903, and deleting each. Fine so far.

Then it processes artifactNumToKeep by skipping over the last 10, examining 904–993. It will delete artifacts from 991, 992, and 993 (forcing them to be loaded, fine); but it will also load 904–990 into memory only to find that they did not have any artifacts (they were deleted long ago).

This is a serious performance problem. Without some optimization in core, my conclusion is that you just should not configure artifactNumToKeep on a job with a lot of builds (unless it is nearly as big as numToKeep) because you will be constantly checking builds for nonexistent artifacts and increasing heap (and I/O and CPU).
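
The arithmetic in this example can be checked with a small model. It is a sketch of the intended behaviour described above, not of LogRotator itself; `RotationRangeSketch`, `newestFirst`, and `examined` are hypothetical helpers that only manipulate build numbers.

```java
import java.util.ArrayList;
import java.util.List;

/** Worked version of the example above; a plain model, not LogRotator. */
public class RotationRangeSketch {

    /** Build numbers from newest to oldest, the order in which the builds are skipped. */
    public static List<Integer> newestFirst(int oldest, int newest) {
        List<Integer> builds = new ArrayList<>();
        for (int n = newest; n >= oldest; n--) {
            builds.add(n);
        }
        return builds;
    }

    /** Builds left to examine once the newest {@code keep} builds are skipped. */
    public static List<Integer> examined(List<Integer> newestFirst, int keep) {
        int from = Math.min(keep, newestFirst.size());
        return new ArrayList<>(newestFirst.subList(from, newestFirst.size()));
    }

    public static void main(String[] args) {
        // Builds 901..1003 exist; numToKeep = 100.
        List<Integer> toDelete = examined(newestFirst(901, 1003), 100);
        System.out.println("numToKeep pass examines: " + toDelete);

        // After 901..903 are deleted, 904..1003 remain; artifactNumToKeep = 10.
        List<Integer> artifactPass = examined(newestFirst(904, 1003), 10);
        long withoutArtifacts = artifactPass.stream().filter(n -> n <= 990).count();
        System.out.println("artifactNumToKeep pass examines " + artifactPass.size()
                + " builds, of which " + withoutArtifacts + " have no artifacts left");
    }
}
```

Under these numbers the first pass examines 903, 902, and 901, and the second pass examines 90 builds (904–993), 87 of which (904–990) have no artifacts to delete.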

To confirm my suspicion, I wrote a test using RunLoadCounter.countLoads. But to my surprise, the performance was even worse than predicted: the numToKeep loop loads not just 901–903 but 901–1003.

Contributor Author

@dukhlov dukhlov May 13, 2026

I do not have a strong opinion at this point, but at least the existing comment

Before my change, this caused a performance issue for our setup as well. I made a minimal performance improvement, which was enough to fix the problem in my setup.

But to my surprise, the performance was even worse than predicted: the numToKeep loop loads not just 901–903 but 901–1003.

Yes, but only the first time after a RunMap is loaded; during the next hourly log rotator run it won't do this. That was enough for my use case.

avoid triggering lazy loading for the first numToKeep builds

Agreed, it is better to correct the comment (or change the behaviour).

Without some optimization in core, my conclusion is that you just should not configure artifactNumToKeep

Yes, we also decided not to use it. We could try to keep the mandatory fields in memory as BuildReference fields (start/end datetime, number of artifacts, shouldKeep, etc.). That should be easy for read-only fields, but it is harder for a field like shouldKeep, which can change after the build completes (possibly we could use a ReferenceQueue and store the build's mandatory data in the BuildReference object when the build is garbage-collected, or find some other way).

Contributor Author

I do not have a strong opinion at this point, but at least the existing comment

Please decide what to do (fix the comment or align the code with the comment); I can create a PR for this.

Member

just first time after loading a RunMap, during next-hour log rotator job it won't do this

Even if the SoftReferences are cleared? In the case I found, even loading the old builds once per session was bad enough at that scale, in conjunction with OldDataMonitor. Now I remember that I already wrote up the problem in #26711.

we can try to keep mandatory fields in memory

The trouble (for this case) is that currently there is no API to determine given a Job × number whether artifacts exist except by loading the Run, in the general case that a plugin like artifact-manager-s3 is in use:

* Delete all artifacts associated with an earlier build (if any).
* @return true if there was actually anything to delete
via
public abstract @CheckForNull ArtifactManager managerFor(Run<?, ?> build);
I suppose LogRotator could write some placeholder file .artifacts-deleted to avoid reprocessing builds (at the expense of a little I/O).
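
The placeholder idea could look roughly like the sketch below. Everything here is an assumption made for illustration: `ArtifactsDeletedMarkerSketch` and its methods are hypothetical, the marker is assumed to live in the build's local directory, and nothing in this thread says how LogRotator would locate that directory without loading the Run.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the placeholder-file idea; not part of Jenkins. */
public class ArtifactsDeletedMarkerSketch {

    /** Marker name taken from the suggestion above. */
    public static final String MARKER = ".artifacts-deleted";

    /** True if a previous pass already handled this build directory. */
    public static boolean alreadyProcessed(Path buildDir) {
        return Files.exists(buildDir.resolve(MARKER));
    }

    /** Records that artifact deletion was attempted, so later passes can skip the build. */
    public static void markProcessed(Path buildDir) {
        try {
            Files.createFile(buildDir.resolve(MARKER));
        } catch (FileAlreadyExistsException e) {
            // Already marked; nothing to do.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A rotation pass would check `alreadyProcessed` before loading a build and call `markProcessed` after deleting its artifacts, trading one file-existence check per build for the cost of loading it.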

decide what to do (fix the comment or align the code with the comment), I can create a PR for this

I guess fix the comment for now, since I do not have a clear idea of what to improve in the code. Thanks!

Contributor Author

dukhlov commented May 14, 2026

Even if the SoftReferences are cleared?

Yes, but when you iterate over keySet():

    @Override
    public boolean isBuildRefResolvable(BuildReference<R> buildRef) {
        return buildRef != null &&
                (buildRef.isSet() ? !buildRef.isUnloadable() : this.resolveBuildRef(buildRef) != null);
    }
}

isSet() returns true even if the SoftReference has been cleared.
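
The behaviour claimed here can be modelled in isolation. This is a simplified sketch, assuming a reference that remembers whether resolution was ever attempted; `ResolvableCheckSketch`, `Ref`, `resolve`, and `keys` are hypothetical stand-ins, not the real BuildReference or resolver, and a `String` stands in for a build.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the check quoted above; not Jenkins code. */
public class ResolvableCheckSketch {

    /** Simplified reference that remembers whether resolution was ever attempted. */
    public static final class Ref {
        public final int number;
        private SoftReference<String> holder; // null until the first resolution attempt
        private boolean unloadable;

        public Ref(int number) {
            this.number = number;
        }

        public boolean isSet() {
            return holder != null;
        }

        public boolean isUnloadable() {
            return unloadable;
        }

        void set(String build) {
            holder = new SoftReference<>(build);
            unloadable = build == null;
        }

        /** Simulates the garbage collector clearing the soft reference. */
        public void clear() {
            if (holder != null) {
                holder.clear();
            }
        }
    }

    /** Number of simulated loads from disk. */
    public static int loads = 0;

    /** Stand-in for loading a build; even build numbers are treated as unloadable. */
    static String resolve(Ref ref) {
        loads++;
        String build = ref.number % 2 == 0 ? null : "build #" + ref.number;
        ref.set(build);
        return build;
    }

    /** Same shape as the check quoted above. */
    public static boolean isResolvable(Ref ref) {
        return ref != null && (ref.isSet() ? !ref.isUnloadable() : resolve(ref) != null);
    }

    /** Keys of all resolvable references, like the filtered keySet() iteration. */
    public static List<Integer> keys(List<Ref> core) {
        List<Integer> result = new ArrayList<>();
        for (Ref ref : core) {
            if (isResolvable(ref)) {
                result.add(ref.number);
            }
        }
        return result;
    }
}
```

In this model the first call to `keys` loads every reference once; a second call, even after `clear()` on each reference, performs no further loads because `isSet()` still reports true.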

I suppose LogRotator could write some placeholder file .artifacts-deleted to avoid reprocessing builds (at the expense of a little I/O).

Yes, that is a possible way to go.

Development

Successfully merging this pull request may close these issues.

NullPointerException in LazyBuildMixIn on jenkins reload

6 participants