Ruby, Facter and Windows Sockets

Part of my talk at the first Timișoara Open Source Meetup was about Ruby and embedding the Ruby interpreter in a C++ library. This is what one of our most popular open-source Puppet projects does. The tool is called facter and, as its name suggests, it queries facts about your system.

What’s a fact?

Anything about your system, basically. Disks, mountpoints, operating system, network interfaces, SSH keys, EC2/GCE instances, etc. Facter can render these facts in various serialization formats. This is what the system_uptime fact looks like in JSON format:

$ facter --json system_uptime
{
  "system_uptime": {
    "days": 0,
    "hours": 1,
    "seconds": 4322,
    "uptime": "1:12 hours"
  }
}

You can access the elements inside a structured fact by using the dot notation:

$ facter system_uptime.seconds
4322
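
To make the dot notation a bit more concrete, here is a minimal Ruby sketch of how such a query could be resolved against the JSON shown above. The lookup_fact helper is a hypothetical name invented for this example; it is not how Facter implements the lookup, only an illustration of walking a nested structure one key at a time.

```ruby
require 'json'

# Hypothetical helper (not part of Facter): walk a nested fact hash
# using a dotted query such as "system_uptime.seconds".
def lookup_fact(facts, query)
  query.split('.').reduce(facts) do |node, key|
    # Give up with nil as soon as the path leaves the structure.
    return nil unless node.is_a?(Hash) && node.key?(key)

    node[key]
  end
end

FACTS = JSON.parse(<<~JSON)
  {
    "system_uptime": {
      "days": 0,
      "hours": 1,
      "seconds": 4322,
      "uptime": "1:12 hours"
    }
  }
JSON

puts lookup_fact(FACTS, 'system_uptime.seconds')
```

Returning nil for an unknown path is a choice made for this sketch; the real tool may report missing facts differently.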

Apart from the standalone executable, Facter also ships a shared library to be used from Ruby. Something like a native extension, but with a cherry on top. Instead of delegating methods to C code, Facter actually embeds the Ruby interpreter inside C++ and evaluates code with it.

So, after installing Facter you can do something like this in Ruby:

# requires facter.rb which is actually
# a wrapper for the underlying .so/.dll library
require 'facter'

puts Facter::Core::Execution.execute("ls /").split.join(" ")

This should output something like the following (on Linux at least):

bin boot dev etc home lib lib64 lost+found mnt opt proc root run sbin srv sys tmp usr var

So far, this looks like regular native extension behavior (apart from everything being actually written in C++). The part where everything goes to Funkytown is when Facter uses the executable to load and evaluate Ruby code.

Custom facts!

Not to be confused with external facts.

Custom facts are written in Ruby and evaluated by Facter during its run. Here’s a fact that should return the UNIX time when queried:

Facter.add('epoch') do
  time = Time.now.to_i
  setcode { time }
end

Assuming you saved the fact in the current directory, query it with:

$ FACTERLIB=$PWD facter epoch
1575294364

You’re probably getting the gist of what’s going on. We’re not manually invoking the Ruby interpreter now; Facter takes care of that for us. It sees that we’re querying a non-core[1] fact and tries to resolve it in other ways. First it tries to resolve it as a custom fact (Ruby), then it goes through external facts (if applicable).

In broad strokes, Facter needs to do a couple of things to sort this out. First it needs to evaluate the Facter.add call, then it should set the fact value to whatever results from calling the setcode block.
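
Those two steps can be modeled in a few lines of plain Ruby. What follows is a toy sketch, not Facter’s implementation (which does this from C++ through the Ruby C API): MiniFacter and Resolution are hypothetical names, and the only thing taken from the article is the shape of the add/setcode calls. The crash report further down does list add followed by instance_eval in its control frames, which is the pattern imitated here.

```ruby
# A toy model of the two steps. MiniFacter and Resolution are hypothetical
# names; this is not how Facter itself is implemented.
module MiniFacter
  class Resolution
    def initialize
      @code = nil
    end

    # Step 2: remember the block; its return value becomes the fact value.
    def setcode(&block)
      @code = block
    end

    def value
      @code&.call
    end
  end

  @facts = {}

  # Step 1: evaluate the body of the add call in the context of a
  # resolution object, so that `setcode` is available inside the block.
  def self.add(name, &block)
    resolution = Resolution.new
    resolution.instance_eval(&block)
    @facts[name.to_s] = resolution
  end

  # Returns nil for facts that were never added.
  def self.value(name)
    resolution = @facts[name.to_s]
    resolution && resolution.value
  end
end

MiniFacter.add('epoch') do
  time = Time.now.to_i
  setcode { time }
end

puts MiniFacter.value('epoch')
```

Note that, just like in the epoch fact above, the time is captured when the add block is evaluated, not when the value is queried.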

Since the Ruby C API is pretty saucy, Facter relies on Leatherman (another Puppet component), which rewraps the Ruby API in an easier-to-use way.

If you got to this point, you’re probably asking yourself: why? In broader terms, you can call this a re-implementation of the Ruby interpreter. Why not write everything in Ruby then? Actually… Facter was previously written in Ruby, then for performance reasons it was rewritten in C++11. The gnarly part is that once you’ve shipped a stable version of your application to a ton of people, deciding to rewrite it in a different programming language means you have to re-implement all the features. In conclusion, this C++ application must also be a Ruby application, for the sake of backwards compatibility. However, five years later, our team is working to reimplement Facter in Ruby again. This time faster! With concurrency! And threads!

Getting over the history lesson, let’s get to the issue in the title.

Facter on Windows

There was an issue in Facter where a custom fact that seemingly only opened a URI would fail with a segmentation fault on Windows. The same fact would resolve cleanly on Linux.[2]

The custom fact looks something like this:

require 'open-uri'
require 'json'
require 'timeout'

Facter.add('test') do
  response = nil
  begin
    url = 'https://api.ipify.org?format=json'
    Timeout::timeout(4) do
      response = open(url).read
    end
  rescue
    nil
  end
  if !response.to_s.empty?
    result = JSON.parse(response)
    setcode do
      result['ip']
    end
  end
end

It’s fairly simple to parse, and it should evaluate to something like this (provided it works):

$ FACTERLIB=$PWD facter test
85.204.138.112
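
The parsing half of that fact can be tried without any network access. Below is a small sketch under a few assumptions: extract_ip is a hypothetical helper, and unlike the original fact it also swallows malformed JSON, which the fact above does not guard against.

```ruby
require 'json'

# Hypothetical helper mirroring the fact's logic: return the "ip" field of
# the response body, or nil when the body is empty or not a JSON object.
def extract_ip(body)
  return nil if body.to_s.empty?

  parsed = JSON.parse(body)
  parsed.is_a?(Hash) ? parsed['ip'] : nil
rescue JSON::ParserError
  nil
end

puts extract_ip('{"ip":"85.204.138.112"}') # a well-formed response
p extract_ip(nil)                          # request failed or timed out
p extract_ip('<html>not json</html>')      # unexpected body
```

Keeping the fetch and the parse apart like this makes it easier to tell a parsing problem from a networking one, which is where the Windows crash turned out to live.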

On Windows systems the evaluation would fail in a non-graceful way:

C:/puppet/lib/ruby/2.5.0/net/protocol.rb:45: [BUG] Segmentation fault
ruby 2.5.3p105 (2018-10-18 revision 65156) [x64-mingw32]

-- Control frame information -----------------------------------------------
c:0024 p:---- s:0165 e:000164 CFUNC  :wait_readable
c:0023 p:0093 s:0160 e:000159 METHOD C:/puppet/lib/ruby/2.5.0/net/protocol.rb:45
c:0022 p:0557 s:0153 E:001568 METHOD C:/puppet/lib/ruby/2.5.0/net/http.rb:981
c:0021 p:0004 s:0140 e:000139 METHOD C:/puppet/lib/ruby/2.5.0/net/http.rb:920
c:0020 p:0029 s:0136 e:000135 METHOD C:/puppet/lib/ruby/2.5.0/net/http.rb:909
c:0019 p:0521 s:0132 e:000131 METHOD C:/puppet/lib/ruby/2.5.0/open-uri.rb:337
c:0018 p:0017 s:0111 e:000110 METHOD C:/puppet/lib/ruby/2.5.0/open-uri.rb:755
c:0017 p:0029 s:0104 e:000103 BLOCK  C:/puppet/lib/ruby/2.5.0/open-uri.rb:226 [FINISH]
c:0016 p:---- s:0101 e:000100 CFUNC  :catch
c:0015 p:0365 s:0096 E:0010e8 METHOD C:/puppet/lib/ruby/2.5.0/open-uri.rb:224
c:0014 p:0328 s:0081 e:000080 METHOD C:/puppet/lib/ruby/2.5.0/open-uri.rb:165
c:0013 p:0018 s:0069 e:000068 METHOD C:/puppet/lib/ruby/2.5.0/open-uri.rb:735
c:0012 p:0071 s:0063 e:000062 METHOD C:/puppet/lib/ruby/2.5.0/open-uri.rb:35
c:0011 p:0007 s:0055 e:000054 BLOCK  C:/cygwin64/home/Administrator/facts/ip.rb:10
c:0010 p:0030 s:0052 E:000948 BLOCK  C:/puppet/lib/ruby/2.5.0/timeout.rb:93
c:0009 p:0005 s:0046 e:000045 BLOCK  C:/puppet/lib/ruby/2.5.0/timeout.rb:33 [FINISH]
c:0008 p:---- s:0043 e:000042 CFUNC  :catch
c:0007 p:0044 s:0038 e:000037 METHOD C:/puppet/lib/ruby/2.5.0/timeout.rb:33
c:0006 p:0113 s:0032 E:000650 METHOD C:/puppet/lib/ruby/2.5.0/timeout.rb:108
c:0005 p:0021 s:0020 E:000748 BLOCK  C:/cygwin64/home/Administrator/facts/ip.rb:9 [FINISH]
c:0004 p:---- s:0014 e:000013 CFUNC  :instance_eval
c:0003 p:---- s:0011 e:000010 CFUNC  :add
c:0002 p:0034 s:0006 E:000790 TOP    C:/cygwin64/home/Administrator/facts/ip.rb:5 [FINISH]
c:0001 p:0000 s:0003 E:000640 (none) [FINISH]

Finding the problem

We currently provide AIO Puppet packages with different versions of Ruby. The older versions (5.5.x stream) use Ruby 2.4, while newer versions use Ruby 2.5. One of the first things we tested was whether this issue occurs on all of our release streams, and we concluded that it only happens with Ruby 2.5 or newer.
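
If you want to check from Ruby whether a given interpreter falls in that range, a sketch could look like the one below. The in_affected_range? helper and the AFFECTED_SINCE constant are made up for this example, and the range only reflects what we observed (2.4 fine, 2.5 and 2.6 crashing); it says nothing about later Ruby releases.

```ruby
require 'rubygems' # provides Gem::Version; normally loaded by default

# Hypothetical check based only on the observations in this post:
# the crash showed up on Windows builds of Ruby 2.5 and newer.
AFFECTED_SINCE = Gem::Version.new('2.5.0')

def in_affected_range?(ruby_version, platform)
  windows = platform.match?(/mingw|mswin/)
  windows && Gem::Version.new(ruby_version) >= AFFECTED_SINCE
end

puts in_affected_range?(RUBY_VERSION, RUBY_PLATFORM)
```

Gem::Version is used instead of comparing strings because a plain string comparison would sort "2.10" before "2.5".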

Checking what changed between our passing and broken versions would be tedious, but the number of commits on the Windows tree seemed reasonable enough for us to manually check them.

$ git diff --stat v2_4_5..v2_5_3
...
6101 files changed, 340476 insertions(+), 79434 deletions(-)

$ git rev-list --count v2_4_5..v2_5_3 -- win32/
60

We found something interesting shortly after, in commit ruby/ruby@e33b169 (win32.c: vm_exit_handler):

diff --git a/win32/win32.c b/win32/win32.c
index 38871931160c..c6183091fb75 100644
--- a/win32/win32.c
+++ b/win32/win32.c
@@ -689,7 +690,7 @@ rtc_error_handler(int e, const char *src, int line, const char *exe, const char
 #endif

 static CRITICAL_SECTION select_mutex;
-static int NtSocketsInitialized = 0;
+#define NtSocketsInitialized 1
 static st_table *socklist = NULL;
 static st_table *conlist = NULL;
 #define conlist_disabled ((st_table *)-1)

Going from a static variable to a hardcoded 1 looks like a significant change to me. We immediately compiled Ruby with the above hunk reverted and the crash was gone!

To delve into details a bit, the NtSocketsInitialized variable was referenced more than 30 times in that file, and most of them looked like this:

if (!NtSocketsInitialized) {
    StartSockets();
}

Obviously, given that NtSocketsInitialized is now 1, and !1 in C-land means 0, that code is not executed anymore.
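
The effect of that one-line change can be modeled in Ruby. This is only a sketch of the guard pattern, not a translation of win32.c: SocketLayer, start_sockets and socket_call are hypothetical names, and the boolean passed to the constructor stands in for NtSocketsInitialized.

```ruby
# A Ruby model of the C guard, not the actual win32.c code. SocketLayer and
# its methods are hypothetical names used only for illustration.
class SocketLayer
  attr_reader :startup_calls

  def initialize(initialized)
    @initialized = initialized # stands in for NtSocketsInitialized
    @startup_calls = 0
  end

  # Stands in for StartSockets: do the one-time setup and flip the flag.
  def start_sockets
    @startup_calls += 1
    @initialized = true
  end

  # Every socket entry point begins with the same lazy-initialization guard.
  def socket_call
    start_sockets unless @initialized
    :ok
  end
end

lazy = SocketLayer.new(false)     # old behavior: the flag starts at 0
2.times { lazy.socket_call }
puts lazy.startup_calls           # prints 1: setup ran once, on first use

hardcoded = SocketLayer.new(true) # new behavior: the flag is always 1
2.times { hardcoded.socket_call }
puts hardcoded.startup_calls      # prints 0: the guard never triggers setup
```

With the flag hardcoded, whatever the setup function was responsible for has to be triggered from somewhere else.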

Looking at the StartSockets function, we notice that it’s responsible for starting Windows Sockets (the Winsock API).

/* License: Artistic or GPL */
static void
StartSockets(void)
{
    WORD version;
    WSADATA retdata;

    //
    // initialize the winsock interface and insure that it's
    // cleaned up at exit.
    //
    version = MAKEWORD(2, 0);
    if (WSAStartup(version, &retdata))
	rb_fatal("Unable to locate winsock library!");
    if (LOBYTE(retdata.wVersion) != 2)
	rb_fatal("could not find version 2 of winsock dll");

    InitializeCriticalSection(&select_mutex);

    NtSocketsInitialized = 1;
}

On Windows, the Winsock API is required for anything network-based. In other words, you can’t open and read a webpage without loading the Winsock DLL. That explained why only Net::HTTP and open-uri calls failed in our case.

After NtSocketsInitialized changed, the only place where StartSockets was still called in the code was the rb_w32_sysinit function.

//
// Initialization stuff
//
/* License: Ruby's */
void
rb_w32_sysinit(int *argc, char ***argv)
{

    ...

    get_version();

    //
    // subvert cmd.exe's feeble attempt at command line parsing
    //
    *argc = w32_cmdvector(GetCommandLineW(), argv, CP_UTF8, &OnigEncodingUTF_8);

    //
    // Now set up the correct time stuff
    //
    tzset();
    init_env();
    init_stdhandle();
    atexit(exit_handler);

    // Initialize Winsock
    StartSockets();
}

As you can probably see, this function does a lot of other things besides initializing Windows Sockets. Judging by the documentation of its parent function, ruby_sysinit, we shouldn’t necessarily call it when embedding the interpreter; we should rather do our own initialization.

/*! Initializes the process for ruby(1).
 *
 * This function assumes this process is ruby(1) and it has just started.
 * Usually programs that embeds CRuby interpreter should not call this function,
 * and should do their own initialization.
 */
void
ruby_sysinit(int *argc, char ***argv)

The problem is, we were doing our own initialization. To summarize this behavior in a few lines of C, we got the development headers and the compiled ruby library and ran some test programs.

Reproducing the problem

The working version

We first compiled and ran a basic program, linking with Ruby 2.4. Error checking is skipped for brevity.

#include <ruby.h>

int main()
{
    ruby_init(); // sets up some basic things
    char* options[] = { "-v", "-e", "" };
    ruby_options(3, options); // sets up more stuff

    rb_require("open-uri");
    rb_eval_string("puts open('http://api.ipify.org?format=json').read");
    return 0;
}

Sure enough, this errors due to us not setting up Winsock:

Traceback (most recent call last):
        15: from -e:1:in `<main>'
        14: from C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:35:in `open'
... blah ...
         2: from C:/tools/ruby26/lib/ruby/2.6.0/timeout.rb:93:in `block in timeout'
         1: from C:/tools/ruby26/lib/ruby/2.6.0/net/http.rb:946:in `block in connect'
C:/tools/ruby26/lib/ruby/2.6.0/net/http.rb:949:in `rescue in block in connect':
Failed to open TCP connection to api.ipify.org:80 (getaddrinfo: Either the application has not called WSAStartup, or WSAStartup failed.) (SocketError)

We call WSAStartup which should start Windows Sockets, provided we included the required header file:

#include <ruby.h>
#include <Winsock2.h>

int main()
{
    // Winsock initialization (we do this in Facter)
    WSADATA wsaData;
    int wVersionRequested = MAKEWORD(2, 2);
    int err = WSAStartup(wVersionRequested, &wsaData);

    ruby_init(); // sets up some basic things
    char* options[] = { "-v", "-e", "" };
    ruby_options(3, options); // sets up more stuff

    rb_require("open-uri");
    rb_eval_string("puts open('http://api.ipify.org?format=json').read");
    return 0;
}

{"ip":"192.69.65.12"}

Lo and behold, it works! We managed to reproduce our working case in its most basic form. However, things look different with a newer version of Ruby.

Note

You can try this yourself by getting a Puppet 5 nightly build, which should contain the Ruby 2.4 library and headers.

Assuming you saved the above C code as test.c, these are the CFLAGS/LDFLAGS to correctly compile it under Command Prompt:

C:\>gcc -I"C:\Program Files\Puppet Labs\Puppet\sys\ruby\include\ruby-2.4.0" ^
-I"C:\Program Files\Puppet Labs\Puppet\sys\ruby\include\ruby-2.4.0\x64-mingw32" ^
-L"C:\Program Files\Puppet Labs\Puppet\sys\ruby\bin" ^
-lx64-msvcrt-ruby240 -lwsock32 test.c

Then just run it with a.exe.

Getting it to fail

Progress! Now let’s see how this behaves with a newer version of Ruby, which makes Facter with Puppet 6 segfault.

As with the previous try, we get some Ruby libraries and headers. I chose Ruby 2.6 but you can also get 2.5 from RubyInstaller (make sure you choose a version with devkit).

Compiling the same program with Ruby 2.6 instead gives us this:

C:/tools/ruby26/lib/ruby/2.6.0/net/protocol.rb:217: [BUG] Segmentation fault
ruby 2.6.5p114 (2019-10-01 revision 67812) [x64-mingw32]

-- Control frame information -----------------------------------------------
c:0022 p:---- s:0157 e:000156 CFUNC  :wait_readable
c:0021 p:0122 s:0152 e:000151 METHOD C:/tools/ruby26/lib/ruby/2.6.0/net/protocol.rb:217
c:0020 p:0014 s:0145 e:000144 METHOD C:/tools/ruby26/lib/ruby/2.6.0/net/protocol.rb:191
c:0019 p:0006 s:0138 e:000137 METHOD C:/tools/ruby26/lib/ruby/2.6.0/net/protocol.rb:201
c:0018 p:0005 s:0134 e:000133 METHOD C:/tools/ruby26/lib/ruby/2.6.0/net/http/response.rb:40
c:0017 p:0006 s:0127 e:000126 METHOD C:/tools/ruby26/lib/ruby/2.6.0/net/http/response.rb:29
c:0016 p:0041 s:0118 e:000117 BLOCK  C:/tools/ruby26/lib/ruby/2.6.0/net/http.rb:1509 [FINISH]
c:0015 p:---- s:0115 e:000114 CFUNC  :catch
c:0014 p:0017 s:0110 e:000109 METHOD C:/tools/ruby26/lib/ruby/2.6.0/net/http.rb:1506
c:0013 p:0060 s:0102 e:000101 METHOD C:/tools/ruby26/lib/ruby/2.6.0/net/http.rb:1479
c:0012 p:0062 s:0094 e:000093 BLOCK  C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:343
c:0011 p:0033 s:0088 e:000087 METHOD C:/tools/ruby26/lib/ruby/2.6.0/net/http.rb:920
c:0010 p:0526 s:0084 e:000083 METHOD C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:337
c:0009 p:0017 s:0063 e:000062 METHOD C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:756
c:0008 p:0029 s:0056 e:000055 BLOCK  C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:226 [FINISH]
c:0007 p:---- s:0053 e:000052 CFUNC  :catch
c:0006 p:0371 s:0048 E:000d28 METHOD C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:224
c:0005 p:0330 s:0033 e:000032 METHOD C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:165
c:0004 p:0019 s:0021 e:000020 METHOD C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:736
c:0003 p:0073 s:0015 e:000014 METHOD C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:35
c:0002 p:0007 s:0007 e:000005 EVAL   eval:1 [FINISH]
c:0001 p:0000 s:0003 E:0008e0 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
eval:1:in `<main>'
C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:35:in `open'
C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:736:in `open'
C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:165:in `open_uri'
C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:224:in `open_loop'
C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:224:in `catch'
C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:226:in `block in open_loop'
C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:756:in `buffer_open'
C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:337:in `open_http'
C:/tools/ruby26/lib/ruby/2.6.0/net/http.rb:920:in `start'
C:/tools/ruby26/lib/ruby/2.6.0/open-uri.rb:343:in `block in open_http'
C:/tools/ruby26/lib/ruby/2.6.0/net/http.rb:1479:in `request'
C:/tools/ruby26/lib/ruby/2.6.0/net/http.rb:1506:in `transport_request'
C:/tools/ruby26/lib/ruby/2.6.0/net/http.rb:1506:in `catch'
C:/tools/ruby26/lib/ruby/2.6.0/net/http.rb:1509:in `block in transport_request'
C:/tools/ruby26/lib/ruby/2.6.0/net/http/response.rb:29:in `read_new'
C:/tools/ruby26/lib/ruby/2.6.0/net/http/response.rb:40:in `read_status_line'
C:/tools/ruby26/lib/ruby/2.6.0/net/protocol.rb:201:in `readline'
C:/tools/ruby26/lib/ruby/2.6.0/net/protocol.rb:191:in `readuntil'
C:/tools/ruby26/lib/ruby/2.6.0/net/protocol.rb:217:in `rbuf_fill'
C:/tools/ruby26/lib/ruby/2.6.0/net/protocol.rb:217:in `wait_readable'

-- C level backtrace information -------------------------------------------
C:\WINDOWS\SYSTEM32\ntdll.dll(NtWaitForSingleObject+0x14) [0x00007ffe5f08ae54]
C:\WINDOWS\System32\KERNELBASE.dll(WaitForSingleObjectEx+0x8e) [0x00007ffe5caa26de]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_vm_bugreport+0x2eb) [0x000000006a680c4b]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_bug_context+0x70) [0x000000006a4ce9f0]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_check_safe_obj+0x4d0) [0x000000006a5e4460]
 [0x00000000004024a6]
C:\WINDOWS\System32\msvcrt.dll(_C_specific_handler+0x98) [0x00007ffe5e2d7ff8]
C:\WINDOWS\SYSTEM32\ntdll.dll(_chkstk+0x11f) [0x00007ffe5f09010f]
C:\WINDOWS\SYSTEM32\ntdll.dll(RtlRaiseException+0x434) [0x00007ffe5f03b4e4]
C:\WINDOWS\SYSTEM32\ntdll.dll(KiUserExceptionDispatcher+0x2e) [0x00007ffe5f08ec3e]
C:\WINDOWS\SYSTEM32\ntdll.dll(RtlWaitOnAddress+0xc6) [0x00007ffe5f035e86]
C:\WINDOWS\SYSTEM32\ntdll.dll(RtlEnterCriticalSection+0x214) [0x00007ffe5f0115b4]
C:\WINDOWS\SYSTEM32\ntdll.dll(RtlEnterCriticalSection+0x42) [0x00007ffe5f0113e2]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_w32_select_with_thread+0xc5d) [0x000000006a693f0d]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_hrtime_now+0x383) [0x000000006a62cc73]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_ensure+0x12b) [0x000000006a4d832b]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_thread_fd_select+0x141) [0x000000006a62ebb1]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_thread_fd_select+0x338) [0x000000006a62eda8]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_ensure+0x12b) [0x000000006a4d832b]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_wait_for_single_fd+0xb8) [0x000000006a62ef28]
 [0x00000000706c19f2]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_error_arity+0x131) [0x000000006a65df51]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_vm_invoke_bmethod+0xd01) [0x000000006a6685f1]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_check_funcall+0xa02) [0x000000006a673102]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_vm_exec+0x9f5) [0x000000006a667465]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_yield_1+0x64f) [0x000000006a66a64f]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_check_block_call+0x19d) [0x000000006a66216d]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_catch+0xa1) [0x000000006a6623d1]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_error_arity+0x131) [0x000000006a65df51]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_vm_invoke_bmethod+0xd01) [0x000000006a6685f1]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_check_funcall+0xaa5) [0x000000006a6731a5]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_vm_exec+0x9f5) [0x000000006a667465]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_yield_1+0x64f) [0x000000006a66a64f]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_check_block_call+0x19d) [0x000000006a66216d]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_catch+0xa1) [0x000000006a6623d1]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_error_arity+0x131) [0x000000006a65df51]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_vm_invoke_bmethod+0xd01) [0x000000006a6685f1]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_check_funcall+0xaa5) [0x000000006a6731a5]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_vm_exec+0x21f) [0x000000006a666c8f]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_vm_invoke_bmethod+0x1a82) [0x000000006a669372]
C:\tools\ruby26\bin\x64-msvcrt-ruby260.dll(rb_eval_string+0x45) [0x000000006a6696a5]
 [0x00000000004015ec]
 [0x00000000004013b4]
 [0x000000000040150b]
C:\WINDOWS\System32\KERNEL32.DLL(BaseThreadInitThunk+0x14) [0x00007ffe5e0f6fd4]
...

Look familiar? It’s extremely similar to the segfault we were getting with Facter. Even now I’m not sure whether this is a bug in Ruby or not. Obviously, initializing Windows Sockets ourselves won’t do the job, so we decided to use Ruby’s function which mostly does the same things… remember StartSockets? Unfortunately Ruby does not export that function by itself, so ultimately we call the function that calls the function that calls StartSockets.

After some really weird C type casting, our code looked like this:

#include <ruby.h>

int main()
{
    ruby_init(); // sets up some basic things
    char* options[] = { "-v", "-e", "" };
    ruby_options(3, options); // sets up more stuff

    int sysinit_opts_size = 1;
    char const* sysinit_opts[] = {
        "ruby"
    };

    // This calls StartSockets (after doing some unnecessary setup)
    ruby_sysinit(&sysinit_opts_size, (char***)(&sysinit_opts));

    rb_require("open-uri");
    rb_eval_string("puts open('http://api.ipify.org?format=json').read");
    return 0;
}

Again, compile and run:

{"ip":"213.233.104.44"}

Surprisingly, this works. Even though the ruby_sysinit function does other setup (timezone, environment, extra command-line parsing), it seems to Work Fine™ and not break things (that we know of, at least).

Looking some more at commit ruby/ruby@e33b169 (win32.c: vm_exit_handler) and at the segmentation fault call trace, this might be some kind of use-after-free issue. However, the CRuby source still looks very cryptic to me, and I wish I had more C debugging knowledge to properly pursue this, since it definitely looks fun.

Ultimately, we decided to call ruby_sysinit in our code too, and everything is well again. We also left in the code which initializes Windows Sockets, for backwards compatibility with different combinations of Facter/Ruby versions. I also submitted a pull request to Ruby, helping them by removing dead code, which I was surprised to see get merged. Even though I didn’t do anything of note, it was satisfying to see all that debugging and Windows spelunking pay off this way.

During this journey we learned a lot of things:

  • there aren’t a lot of people swarming around the Ruby implementation
  • the Ruby C API is not that well documented and changes a lot, which is why, if you’re looking to understand something, it’s not a bad idea to check out the code yourself
  • due to the extremely high number of commits and lines changed in Ruby, GitHub user nobu may or may not be an actual human being (relevant talk)

TL;DR Ruby no longer likes us initializing Windows Sockets, so we have to let it do the initialization itself


  1. not in the C++ implementation

  2. FACT-1936: facter -p segfaults with facts that call open-uri.open