Adventures in Ruby Source Code

From BenningtonWiki

This page documents one part of Ruby's source code that I have explored. It was supposed to be updated weekly, but alas, it has not been.

Methods with a Bang!

The "bang" refers to methods whose names end in a "!". Strings, for example, have both an upcase method and an upcase! method. What's the difference?

<source lang=ruby>
> (string = "hello").upcase
> string
=> "hello"

> (string = "hello").upcase!
> string
=> "HELLO"
</source>

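The code for string methods is in "string.c". The function behind upcase! is rb_str_upcase_bang, written in the old K&R style where the parameter type is declared after the parameter list:

    static VALUE
    rb_str_upcase_bang(str)
        VALUE str;
    {
        char *s, *send;
        int modify = 0;

        s = RSTRING(str)->ptr; send = s + RSTRING(str)->len;
        while (s < send) {
            if (ismbchar(*s)) {
                /* lead byte of a multibyte character: skip its trailing bytes */
                s+=mbclen(*s) - 1;
            }
            else if (ISLOWER(*s)) {
                *s = toupper(*s);
                modify = 1;
            }
            s++;    /* step to the next byte */
        }

        if (modify) return str;
        return Qnil;
    }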

The while loop steps through the string one byte at a time. To accomplish this, it uses pointer arithmetic: "s" starts at the address of the first byte of the string's data, and "send" is that address plus the string's length, so it points just past the last byte. The loop runs for as long as "s" has not reached "send". Note that "send" means the string's end (as opposed to, say, "send" and "receive").
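To convince myself I understood the loop, I sketched the same walk in Ruby. This is only a rough model, not what Ruby does internally: the method name ascii_upcase_bang is made up, a byte index stands in for the pointer "s", the byte size stands in for "send", and the multibyte branch is left out entirely.

```ruby
# Rough Ruby model of the C loop (hypothetical helper, ASCII letters only).
def ascii_upcase_bang(str)
  i = 0                    # plays the role of the pointer s
  stop = str.bytesize      # plays the role of send (one past the last byte)
  modified = false

  while i < stop
    byte = str.getbyte(i)
    if byte >= 97 && byte <= 122      # ASCII 'a'..'z', like ISLOWER
      str.setbyte(i, byte - 32)       # like toupper for ASCII letters
      modified = true
    end
    i += 1                            # step to the next byte
  end

  modified ? str : nil
end

greeting = "hello, World".dup
ascii_upcase_bang(greeting)
puts greeting
```

The index never has to be checked against the string contents, only against the stop position, which is the same idea as comparing "s" to "send".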

What is mbc? From regex.h:

const unsigned char *re_mbctab;
#define ismbchar(c) re_mbctab[(unsigned char)(c)]
#define mbclen(c)   (re_mbctab[(unsigned char)(c)]+1)

I can find nothing else on either of those macros or on that variable in the regex.h file. The string.c file includes "re.h", which in turn includes "regex.h".

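The first "if" statement is something like this: if the current byte is the first byte of a multibyte character (its entry in re_mbctab is non-zero), then s skips ahead by mbclen(*s) - 1 bytes. Together with the loop's usual one-byte step, that moves s past the whole character, so the bytes of a multibyte character are never mistaken for lowercase letters.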
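Here is a small Ruby sketch of how I read the two macros: a 256-entry table indexed by byte value, where each entry is the number of bytes that follow a lead byte. The table contents below are my own assumption (a simplified UTF-8-style layout), not something taken from regex.h, and MBC_TABLE and char_starts are made-up names.

```ruby
# Hypothetical stand-in for re_mbctab: entry = number of trailing bytes.
# The ranges are an assumed, simplified UTF-8 layout.
MBC_TABLE = Array.new(256, 0)
(0xC0..0xDF).each { |b| MBC_TABLE[b] = 1 }
(0xE0..0xEF).each { |b| MBC_TABLE[b] = 2 }
(0xF0..0xF7).each { |b| MBC_TABLE[b] = 3 }

# Mirrors: #define ismbchar(c) re_mbctab[(unsigned char)(c)]
def ismbchar(byte)
  MBC_TABLE[byte] != 0
end

# Mirrors: #define mbclen(c) (re_mbctab[(unsigned char)(c)]+1)
def mbclen(byte)
  MBC_TABLE[byte] + 1
end

# Collect the byte offset where each character starts, skipping
# trailing bytes the same way the C loop does.
def char_starts(str)
  starts = []
  i = 0
  while i < str.bytesize
    starts << i
    byte = str.getbyte(i)
    i += mbclen(byte) - 1 if ismbchar(byte)
    i += 1
  end
  starts
end

p char_starts("a\u00E9b")
```

With this table, the two-byte character in the middle of the sample string is stepped over as a unit, so the walk visits three positions rather than four.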

Clearly, the upcase! method returns nil if nothing has changed, and the string with the changed values if something did change.
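The return value is easy to check from Ruby itself. A quick sketch (the variable names are just for illustration):

```ruby
changed   = "hello".dup
unchanged = "HELLO".dup

result_changed   = changed.upcase!     # modified in place, returns the string
result_unchanged = unchanged.upcase!   # nothing to change, returns nil

p result_changed
p result_unchanged
p result_changed.equal?(changed)       # same object, not a copy
```

One practical consequence is that the result of a bang method should not be chained blindly, since it may be nil.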

For reference, from "ruby.h":

struct RString {
    struct RBasic basic;
    long len;
    char *ptr;
    union {
	long capa;
	VALUE shared;
    } aux;
};

(Note: there are a lot of macros defined to make working with these structs easier on the eyes)

Interestingly, for string.upcase, the code for .upcase! is used on a copy of the string.
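The copy-then-modify pattern can be sketched in Ruby as well. The helper name upcase_via_bang is made up, and this only models the idea rather than reproducing the C code:

```ruby
# Hypothetical helper: build the non-bang behavior out of the bang method.
def upcase_via_bang(str)
  copy = str.dup
  copy.upcase!   # return value ignored, since it is nil when nothing changed
  copy           # always return the copy, never nil
end

original = "hello"
shouted  = upcase_via_bang(original)

p original
p shouted
```

Ignoring the bang method's return value is the important part: that is why the non-bang version always hands back a string, even when there was nothing to change.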