Oct 2, 2021

Fixing a Simple JRuby Bug

A little over a week ago, JRuby 9.3 was released which is kind of a major thing for the project. Unbeknownst to us, our GitHub Actions spec tests for Facter were running with jruby-head, meaning that the workflows silently switched to testing with the newly released JRuby. Shortly after, some of our tests started to fail. But first, some background.

What is JRuby?

JRuby is an implementation of the Ruby programming language atop the Java Virtual Machine, written largely in Java.

The Puppet Agent and Facter run under MRI, the standard C Ruby implementation, but there are some puppetlabs projects which actively use JRuby.

Why JRuby?

I just said that Facter runs under CRuby, so why do we run our tests on JRuby? The answer is: puppetserver.

Like its name suggests, Puppet Server is an application that compiles configurations for nodes, and serves them to the agents. Historically, the server has been a WEBrick/Passenger Ruby application, but due to poor performance under heavy loads it was since rewritten and now runs on the Java Virtual Machine. The server is mostly a Clojure application now, but Ruby is still an important part of it, especially since critical parts of the puppet code are executed both by the server and the agent. So, what better way to run Ruby in Java than JRuby?

With this in mind, and because Facter is a tight dependency of Puppet, we have to validate that the project behaves properly when ran under JRuby.

The problem

Some time this week, our Facter PR spec tests started failing on JRuby. At first we thought the failures were transient—this often happens for JRuby, but we get them predominantly on Puppet, not Facter. We rekicked the jobs, they failed again. The error messages were not suggestive at all, so we thought we’d rekick the jobs some more before actually starting to investigate what happened.

Unluckily enough, the same tests were failing over and over again. Here’s what a failing test looked like:

  4) Facter#fact when there is a resolved fact with type nil returns nil
     Failure/Error: value = fact_collection.value(user_query)

     ArgumentError:
       wrong number of arguments calling `initialize` (given 0, expected 1..3)
     # ./lib/facter.rb:569:in `resolve_fact'
     # ./lib/facter.rb:430:in `fact'
     # ./spec/facter/facter_spec.rb:279:in `block in <main>'

Now, whether you know about Facter or not, this functionality is pretty important; it handles retrieving information from the fact collection, which is like the central point of Facter.

There were 6 failures in total, all of them complaining about the same thing:

ArgumentError: wrong number of arguments calling `initialize` (given 0, expected 1..3)

Who was calling initialize? And on what?

The cause

Jumping to the location of the code in question, we can narrow things down to the following:

begin
  value = fact_collection.value(user_query)
  add_fact_to_searched_facts(user_query, value)
rescue KeyError, TypeError
  nil
end

The block rescues KeyError and TypeError, but our actual exception is an ArgumentError. Let’s dig deeper by checking the FactCollection#value method.

def value(user_query)
  fetch(user_query) do
    split_user_query = Facter::Utils.split_user_query(user_query)
    split_user_query.reduce(self) do |memo, key|
      raise KeyError unless memo.respond_to?(:fetch)

      memo.fetch(key) { memo.fetch(key.to_s) }
    end
  end
end

What the heck? There’s nothing calling initialize on anything here. It’s pry time. I set a breakpoint in the #value method and ran a specific failing test, then attempted to execute everything in that method:

From: /home/gabi/repo/facter/lib/facter/models/fact_collection.rb @ line 50 Facter::FactCollection#value:

    45: def value(user_query)
    46:   require 'pry'; binding.pry
    47:   fetch(user_query) do
    48:     split_user_query = Facter::Utils.split_user_query(user_query)
    49:     split_user_query.reduce(self) do |memo, key|
 => 50:       raise KeyError unless memo.respond_to?(:fetch)
    51:
    52:       memo.fetch(key) { memo.fetch(key.to_s) }
    53:     end
    54:   end
    55: end

[1] pry(#<Facter::FactCollection>)> raise KeyError unless memo.respond_to?(:fetch)
=> nil
[2] pry(#<Facter::FactCollection>)> raise KeyError
ArgumentError: wrong number of arguments calling `initialize` (given 0, expected 1..3)
[3] pry(#<Facter::FactCollection>)> KeyError.new
ArgumentError: wrong number of arguments calling `initialize` (given 0, expected 1..3)
from org/jruby/RubyClass.java:878:in `new'

And there you have it, something went wrong with the KeyError exception which could no longer be initialized without arguments. As a sanity check I attempted the same on CRuby:

irb(main):001:0> raise KeyError
Traceback (most recent call last):
        4: from /home/gabi/.rbenv/versions/2.7.1/bin/irb:23:in `<main>'
        3: from /home/gabi/.rbenv/versions/2.7.1/bin/irb:23:in `load'
        2: from /home/gabi/.rbenv/versions/2.7.1/lib/ruby/gems/2.7.0/gems/irb-1.2.3/exe/irb:11:in `<top (required)>'
        1: from (irb):1
KeyError (KeyError)
irb(main):002:0> KeyError.new
=> #<KeyError: KeyError>
irb(main):003:0> RUBY_VERSION
=> "2.7.1"

The reason

At this point I didn’t know for sure, but I suspected something changed within JRuby since this definitely worked with the latest JRuby 9.2 release. I found this commit which changed some stuff within the KeyError class and went in the 9.3.0.0 release. I don’t know Java so I have no idea whether or not this is the actual culprit.

I searched through the pull requests and issues on GitHub, and wasn’t able to find anything related to my issue. After all, 9.3 had been out for only a week or so. Then I said, jokingly, what if I fixed this?

The fix

I have no experience whatsoever in Java, I always associated the language with guys in suits working at large enterprises, and I always preferred the conciseness of Ruby. Java was… long.

Either way, I decided to shoot my shot. Building JRuby is quite easy and the compile/test feedback loop is fast so I assured myself that I’d be able to quickly iterate on the dumb code I was about to write.

I discovered that there was a test suite pulled from CRuby which validated the behavior of the KeyError exception. I ran it. There were 3 failures, and this is before I started writing any code. Exciting!

I went ahead and implemented an initialize method with no arguments in the RubyKeyError class:

@JRubyMethod
public IRubyObject initialize(ThreadContext context) {
    return context.nil;
}

I ran the test suite again:

$ jruby test/mri/runner.rb test/mri/ruby/test_key_error.rb
Run options:

# Running tests:

[3/6] TestKeyError#test_key = 0.01 s
  1) Failure:
TestKeyError#test_key [/home/gabi/repo/jruby/test/mri/ruby/test_key_error.rb:26]:
ArgumentError expected but nothing was raised.

[5/6] TestKeyError#test_receiver = 0.00 s
  2) Failure:
TestKeyError#test_receiver [/home/gabi/repo/jruby/test/mri/ruby/test_key_error.rb:19]:
ArgumentError expected but nothing was raised.

Finished tests in 0.140380s, 42.7411 tests/s, 142.4704 assertions/s.
6 tests, 20 assertions, 2 failures, 0 errors, 0 skips

ruby -v: jruby 9.3.1.0-SNAPSHOT (2.6.8) 2021-10-02 e793316754 OpenJDK 64-Bit Server VM 11.0.12+7 on 11.0.12+7 +jit [linux-x86_64]

So, initializing a KeyError with no arguments worked now, but there were still some inconsistencies between JRuby and CRuby. Both rubies allow you to initialize a KeyError with a key and receiver. If you attempt to query one of those and they are not set, you get an ArgumentError in CRuby:

irb(main):001:0> KeyError.new.key
...
ArgumentError (no key is available)

irb(main):002:0> KeyError.new.receiver
...
ArgumentError (no receiver is available)

Compare this with JRuby:

irb(main):001:0> KeyError.new.key
=> nil
irb(main):002:0> KeyError.new.receiver
=> nil

The JRuby implementation mistakenly sets the variables to nil if uninitialized, so I set them to the Java null instead, and for the #key and #receiver methods themselves, I added a null check where I threw the ArgumentError. This is how the impacted code looks now (my changes are highlighted):

private IRubyObject initializeCommon(ThreadContext context, IRubyObject message, IRubyObject[] receiverKey) {
    IRubyObject receiver;
    IRubyObject key;
    if (receiverKey == null) {
        receiver = null;
        key = null;
    } else {
        receiver = receiverKey[0];
        key = receiverKey[1];
    }

    setMessage(message);
    this.receiver = receiver;
    this.key = key;

    return context.nil;
}

@JRubyMethod
public IRubyObject receiver() {
    if (receiver == null) {
        throw getRuntime().newArgumentError("no receiver is available");
    }
    return receiver;
}

@JRubyMethod
public IRubyObject key() {
    if (key == null) {
        throw getRuntime().newArgumentError("no key is available");
    }
    return key;
}

I reran the test suite and everything passed now! It was peculiar to me that the tests were failing from the get-go, so I correctly assumed that they were not running in JRuby’s CI. The rake task which executes MRI tests collects files to run from a file, and this specific suite wasn’t there. I added it, and then I was able to properly run the tests with jruby -S rake test:mri.

I opened a pull request which was promptly merged, so I guess this was my first open-source contribution to a Java project, which to my luck happened to be JRuby.

The moral of this story is: don’t let your dreams be dreams, if you see a bug that seems simple enough to fix, try to do it (even if it’s Java and you have 0 experience with the language).