Let’s Build objc_msgSend

Published: November 16, 2012

Author: TommyWu

Translation · Original: Friday Q&A 2012-11-16: Let's Build objc_msgSend · by Mike Ash

Original: https://www.mikeash.com/pyblog/friday-qa-2012-11-16-lets-build-objc_msgsend.html Published: 2012-11-16 Author: Mike Ash Translator: MiMo (mimo-v2.5-pro); code blocks are kept in the original English

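The objc_msgSend function underlies everything we do in Objective-C. Gwynne Raskind, reader and occasional Friday Q&A guest contributor, suggested that I talk about how objc_msgSend works on the inside. What better way to understand how something works than to build it from scratch? Let’s build objc_msgSend.

Tramapoline! Trampopoline!

Whenever you write an Objective-C message send:

    [obj message]

The compiler generates a call to objc_msgSend:

    objc_msgSend(obj, @selector(message));

objc_msgSend then takes care of dispatching the message.

How does it do that? It looks up the appropriate function pointer, or IMP (method implementation), and then jumps to it. Any arguments passed to objc_msgSend end up being arguments to the IMP after the jump. The return value from the IMP ends up as the return value seen by the caller.

Because objc_msgSend only takes control long enough to obtain the right function pointer and jump directly to it, it’s sometimes referred to as a trampoline. In general, any small piece of code that serves to redirect execution somewhere else can be called a trampoline.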
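To make the look-up-then-hand-off idea concrete, here is a minimal C sketch for one fixed signature. All names (`binary_op`, `resolve`, `dispatch`) are hypothetical and invented for illustration. Note also that plain C can only call the target and return its result; it cannot perform the true jump that the real trampoline uses.

```c
/* Every "implementation" in this toy example shares one fixed signature. */
typedef int (*binary_op)(int, int);

static int add_op(int a, int b) { return a + b; }
static int mul_op(int a, int b) { return a * b; }

/* Lookup step: pick the implementation for a one-character "selector".
   Anything other than '*' falls back to addition. */
static binary_op resolve(char selector)
{
    return selector == '*' ? mul_op : add_op;
}

/* Hand-off step: pass the untouched arguments to whatever was found
   and return its result to the caller. */
int dispatch(char selector, int a, int b)
{
    return resolve(selector)(a, b);
}
```

This works only because every target takes exactly two ints. The rest of the article is about removing that restriction.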

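It is this trampolining behavior that makes objc_msgSend special. Because it simply looks up the right code and then jumps directly to it, it’s relatively generic. It works with any combination of parameters passed to it, because it just leaves them alone for the method IMP to read. Return values are a bit trickier, but it turns out that every possible return type can be accounted for with just a few variants of objc_msgSend.

Unfortunately, this trampoline behavior cannot be written in pure C. There is no way to write a C function that passes arbitrary parameters through to another function. You can come close by using variable arguments, but variable arguments are passed differently from normal arguments, and more slowly, so they are not compatible with regular C parameters.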
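A small C sketch may help show where variable arguments fall short. The names `vsum_ints` and `forward_sum` are hypothetical. A variadic function can forward its arguments only to a callee that was written specifically to accept a `va_list`. It has no portable way to re-issue them as an ordinary call to an arbitrary function pointer, which is what a message-send trampoline would need.

```c
#include <stdarg.h>

/* Callee written specifically to accept a va_list. */
static int vsum_ints(int count, va_list ap)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* each argument is read with an explicit type */
    return total;
}

/* Variadic forwarder: this works only because vsum_ints takes a va_list.
   An ordinary function such as int f(int, int, int) could not be called this way. */
int forward_sum(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    int total = vsum_ints(count, ap);
    va_end(ap);
    return total;
}
```

For example, `forward_sum(3, 1, 2, 3)` adds the three trailing integers, but the forwarder had to know that every argument is an int.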

If you could write objc_msgSend in C, the basic idea would look something like this:

    id objc_msgSend(id self, SEL _cmd, ...)
    {
        Class c = object_getClass(self);
        IMP imp = class_getMethodImplementation(c, _cmd);
        return imp(self, _cmd, ...);
    }

This is actually a bit oversimplified. There’s a method cache to make the whole lookup faster, so it’s more like this:

    id objc_msgSend(id self, SEL _cmd, ...)
    {
        Class c = object_getClass(self);
        IMP imp = cache_lookup(c, _cmd);
        if(!imp)
            imp = class_getMethodImplementation(c, _cmd);
        return imp(self, _cmd, ...);
    }

Except that, for speed, cache_lookup is implemented inline.
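The cache-first pattern in the pseudocode above can be tried out in ordinary C. The sketch below is a toy under stated assumptions: it uses string names instead of selectors, a one-entry cache instead of a hash table, and hypothetical names (`lookup`, `slow_lookup`, `slow_lookups`) that do not come from the runtime.

```c
#include <stddef.h>
#include <string.h>

typedef int (*handler_fn)(int);

static int double_it(int x) { return x * 2; }
static int negate_it(int x) { return -x; }

/* Toy "method list": the authoritative but slow source of truth. */
static const struct { const char *name; handler_fn fn; } methods[] = {
    { "double", double_it },
    { "negate", negate_it },
};

/* One-entry cache; a real cache would be a hash table. */
static const char *cached_name;
static handler_fn cached_fn;

int slow_lookups;   /* counts how often the slow path runs */

static handler_fn slow_lookup(const char *name)
{
    slow_lookups++;
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++)
        if (strcmp(methods[i].name, name) == 0)
            return methods[i].fn;
    return NULL;
}

handler_fn lookup(const char *name)
{
    if (cached_name != NULL && strcmp(cached_name, name) == 0)
        return cached_fn;               /* fast path: cache hit */
    handler_fn fn = slow_lookup(name);  /* slow path: full search */
    if (fn != NULL) {                   /* remember the result for next time */
        cached_name = name;
        cached_fn = fn;
    }
    return fn;
}
```

Looking up the same name twice in a row runs the slow path only once, which is the whole point of checking the cache first.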

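Assembly

In Apple’s runtime, the whole function is implemented in assembly for maximum speed. objc_msgSend runs for every single Objective-C message send, and even the simplest action in an app can result in thousands or millions of messages.

To simplify things a bit, my own implementation does the bare minimum in assembly, with all of the smarts in a separate C function. The assembly itself does the equivalent of:

    id objc_msgSend(id self, SEL _cmd, ...)
    {
        IMP imp = GetImplementation(self, _cmd);
        return imp(self, _cmd, ...);
    }

Then GetImplementation can do all of the work in a more understandable fashion.

The assembly code needs to:

Save all potential parameters somewhere safe, so that GetImplementation won’t overwrite them.
Call GetImplementation.
Save the return value somewhere.
Restore all of the parameter values.
Jump to the IMP returned from GetImplementation.

So let’s get started!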

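I’m going to use x86-64 assembly here, as it’s the most convenient to work with on a Mac. The same principles apply to i386 or ARM.

This function goes into its own file, which I called msgsend-asm.s. This file can be passed to the compiler as just another source file, and it will be assembled and linked into the rest of the program.

The first thing to do is to actually declare the global symbol. For boring historical reasons, C functions get an extra leading underscore in their global symbol name:

    .globl _objc_msgSend
    _objc_msgSend:

The compiler will happily link against the nearest available objc_msgSend. Simply linking this into a test app is enough to get [obj message] expressions going to our own code rather than Apple’s runtime, which is terribly convenient when it comes to testing this code to make sure it actually works.

Integer and pointer parameters are passed in registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9, in that order. Any additional parameters beyond what fits there are passed on the stack. The first thing this function does is save those six registers onto the stack as well, so they can be restored later. The order of the pushes doesn’t matter, as long as the pops later mirror it:

    pushq %rsi
    pushq %rdi
    pushq %rdx
    pushq %rcx
    pushq %r8
    pushq %r9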

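In addition to these registers, the %rax register acts as something of a hidden parameter. It’s used for variable-argument calls, in which case it stores the number of vector registers passed in, and the called function uses that count to properly prepare the variable argument list. In case the target method is a variable-argument method, I save this register as well:

    pushq %rax

For completeness, the %xmm registers used to pass floating-point arguments really ought to be saved as well. However, if I can safely assume that GetImplementation doesn’t use any floating point, then I can ignore them, which I do simply to keep the code shorter.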

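Next, I align the stack. Mac OS X requires that the stack be aligned to a 16-byte boundary when making function calls. The above code leaves us with an aligned stack anyway, but it’s nice to have code that handles this explicitly, so that you don’t have to worry about making sure everything is lined up, or wonder why your app is crashing in dyld functions. To align the stack, I first save the original value of %r12 onto the stack, then copy the current stack pointer into %r12. The choice of %r12 is somewhat arbitrary, and any callee-saved register would do. The important thing is that the value is guaranteed to survive across the call to GetImplementation. Then I AND the stack pointer with -0x10, which just clears the bottom four bits:

    pushq %r12
    mov %rsp, %r12
    andq $-0x10, %rsp

Now the stack pointer is aligned. It’s also safely past all of the saved registers from above, since the stack grows down, and this alignment procedure can only move it further down.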
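As a quick check on the arithmetic, the effect of `andq $-0x10, %rsp` can be reproduced in C. This is only an illustrative sketch; `align_down_16` is a hypothetical helper name, not part of the article's code.

```c
#include <stdint.h>

/* Round an address down to the nearest 16-byte boundary.
   -0x10 is ...11110000 in two's complement, so the AND clears the low four bits
   and can never produce a value larger than the input. */
uintptr_t align_down_16(uintptr_t sp)
{
    return sp & (uintptr_t)-0x10;
}
```

An address that is already a multiple of 16 comes back unchanged, and any other address moves down by at most 15 bytes, which on a downward-growing stack means the pointer only ever moves further away from the saved registers.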

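It’s finally time to call into GetImplementation. It takes two parameters, self and _cmd. The calling convention puts those two parameters into %rdi and %rsi, respectively. However, that is exactly how they were passed into objc_msgSend, and they haven’t been moved, so nothing has to be done to get them into place. All that has to be done is to actually make the call to GetImplementation, whose name also gets a leading underscore:

    callq _GetImplementation

Integer and pointer return values are returned in %rax, so that’s where the returned IMP is found. Since %rax has to be restored to its original state, the returned IMP needs to be moved elsewhere. I arbitrarily chose to store it in %r11:

    mov %rax, %r11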

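Now it’s time to start putting things back the way they were. The first item is to restore the stack pointer, which was stashed in %r12, and then restore the old value of %r12:

    mov %r12, %rsp
    popq %r12

Then pop all of the argument registers off the stack in the opposite order from the one in which they were pushed:

    popq %rax
    popq %r9
    popq %r8
    popq %rcx
    popq %rdx
    popq %rdi
    popq %rsi

Everything is now ready. The argument registers are restored to how they were before. All parameters intended for the target method are in the place where the target method will expect to find them. The IMP itself is in %r11, so all that has to be done is to jump there:

    jmp *%r11

And that’s it! There’s nothing more to be done in the assembly code. The jump passes control to the method implementation. From the perspective of that code, it looks exactly as if the message sender had invoked the method directly. All of the indirection above just disappears. When the method returns, it returns directly to the caller of objc_msgSend without any further intervention. Any return value from the method will be found in the correct place.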

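There’s a bit of subtlety when it comes to unusual return values. Large structs (anything too large to be returned in registers) are the most common example. On x86-64, large structs are returned by using a hidden first parameter. When you make a call like this:

    NSRect r = SomeFunc(a, b, c);

The call gets translated to something more like this:

    NSRect r;
    SomeFunc(&r, a, b, c);

The address of the memory to use for the return value gets passed in %rdi. Since objc_msgSend expects %rdi and %rsi to contain self and _cmd, it won’t work for messages that return large structs. This same basic problem exists on many different platforms. The runtime solves it by providing a separate objc_msgSend_stret function used for struct returns, which works like objc_msgSend but knows to find self in %rsi and _cmd in %rdx.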
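The hidden-parameter transformation can be mimicked by hand in plain C. The sketch below is an illustration under assumptions: `Rect4` is a made-up stand-in for NSRect, and `make_rect_hidden` only approximates what a compiler arranges internally for a struct that is returned in memory.

```c
/* Four doubles: with 8-byte doubles this is 32 bytes, too large to come back in registers. */
typedef struct { double x, y, width, height; } Rect4;

/* What the programmer writes: return the struct by value. */
Rect4 make_rect(double x, double y, double w, double h)
{
    Rect4 r = { x, y, w, h };
    return r;
}

/* Roughly what happens underneath: the caller supplies the storage
   and passes its address as an extra first parameter. */
void make_rect_hidden(Rect4 *out, double x, double y, double w, double h)
{
    out->x = x;
    out->y = y;
    out->width = w;
    out->height = h;
}
```

Both functions produce the same value. The second form makes it visible that every explicit argument shifts over by one slot, which is why a message send returning such a struct finds self and _cmd one register later than usual.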

A similar problem arises on some platforms with messages that return floating-point values. On those platforms, the runtime provides objc_msgSend_fpret (and, on x86-64, objc_msgSend_fpret2 for extremely special cases).

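Method Lookup

Let’s move on to the implementation of GetImplementation. The assembly trampoline above means that this code can be written in C. Remember that in the real runtime, this code is all straight assembly, in order to get the best speed possible. Not only does this allow for fine control over the code, but it also eliminates the need to save and restore all of those registers the way the code above does.

GetImplementation could simply call class_getMethodImplementation and be done with it, foisting all of the work onto the Objective-C runtime. This is a bit boring, though. The real objc_msgSend looks in the class’s method cache first, for maximum speed. Since GetImplementation is intended to mimic objc_msgSend, it will do the same. Only if the cache doesn’t contain an entry for the given selector will it fall back to querying the runtime.

The first thing we need is some struct definitions. The method cache is a private set of structures accessed through the class structure, so to get to it we need our own definitions. Note that, while private, these definitions are all available as part of Apple’s open source release of the Objective-C runtime.

First comes the definition for a single cache entry:

    typedef struct {
        SEL name;
        void *unused;
        IMP imp;
    } cache_entry;

Pretty easy. Don’t ask me about the unused field; I don’t know why it’s there. Here’s the definition for the cache as a whole:

    struct objc_cache {
        uintptr_t mask;
        uintptr_t occupied;
        cache_entry *buckets[1];
    };

The cache is implemented as a hash table. This table is built for speed and simplicity over all else, so it’s a bit unusual. The table size is always a power of two. The table is indexed by selector, and the bucket index is computed by simply taking the selector’s value, possibly shifting it to get rid of irrelevant low bits, and performing a bitwise AND with the appropriate mask. While we’re at it, here are the macros used to compute the bucket index for a particular selector and mask:

    #ifndef __LP64__
    # define CACHE_HASH(sel, mask) (((uintptr_t)(sel)>>2) & (mask))
    #else
    # define CACHE_HASH(sel, mask) (((unsigned int)((uintptr_t)(sel)>>0)) & (mask))
    #endif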
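To see why a power-of-two table size makes this hash so cheap, here is a small C sketch of the same computation. The function names are hypothetical, and the relationship `mask = size - 1` is the usual convention for power-of-two tables, stated here as an assumption rather than quoted from the runtime.

```c
#include <stdint.h>

/* Same shape as the 64-bit CACHE_HASH above: truncate to 32 bits, then mask. */
uintptr_t bucket_index(uintptr_t sel, uintptr_t mask)
{
    return (unsigned int)sel & mask;
}

/* Assumed convention: for a table whose size is a power of two,
   the mask is size - 1, i.e. all of the low bits set. */
uintptr_t mask_for_size(uintptr_t size)
{
    return size - 1;
}
```

With a table of 8 buckets the mask is 7, so the index is just the low three bits of the selector value and always lands inside the table, with no division or modulo needed.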

Finally, there’s the structure for the class itself. This is what a Class actually points to:

    struct class_t {
        struct class_t *isa;
        struct class_t *superclass;
        struct objc_cache *cache;
        IMP *vtable;
    };

Now that the necessary structs are in place, let’s get started with GetImplementation:

    IMP GetImplementation(id self, SEL _cmd)
    {

The first thing it does is get the object’s class. The real objc_msgSend does this with the equivalent of self->isa, but I’ll be gentle and use the official API for that part:

        Class c = object_getClass(self);

Since I want access to the guts, I’ll immediately cast it to a pointer to the class_t struct:

        struct class_t *classInternals = (struct class_t *)c;

Now it’s time to look up the IMP. We’ll start off with it set to NULL. If we find an entry in the cache, we’ll set it. If it’s still NULL after checking the cache, we’ll fall back to the slow path:

        IMP imp = NULL;

Next, grab a pointer to the cache:

        struct objc_cache *cache = classInternals->cache;

Compute the bucket index, and grab a pointer to the array of buckets:

        uintptr_t index = CACHE_HASH(_cmd, cache->mask);
        cache_entry **buckets = cache->buckets;

Next, we search for a cache entry with the appropriate selector. The runtime resolves collisions with linear probing, so it’s just a matter of searching subsequent buckets until we either find a match or hit a NULL entry:

        for(; buckets[index] != NULL; index = (index + 1) & cache->mask)
        {
            if(buckets[index]->name == _cmd)
            {
                imp = buckets[index]->imp;
                break;
            }
        }
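The probing loop is easier to follow next to the matching insert operation, which the article does not show. The following C sketch is a toy table under assumptions: integer keys instead of selectors, a fixed size of 8, and hypothetical names (`entry`, `probe_insert`, `probe_find`). It is not the runtime's code.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 8                    /* must be a power of two */
#define TABLE_MASK (TABLE_SIZE - 1)

typedef struct { uintptr_t key; int value; } entry;

static entry *table[TABLE_SIZE];        /* NULL marks an empty bucket */

/* Insert by walking forward from the hashed bucket to the first empty one.
   Assumes the table never becomes completely full. */
void probe_insert(entry *e)
{
    uintptr_t index = e->key & TABLE_MASK;
    while (table[index] != NULL)
        index = (index + 1) & TABLE_MASK;   /* wraps around at the end */
    table[index] = e;
}

/* Search the same sequence of buckets; an empty bucket means "not present". */
entry *probe_find(uintptr_t key)
{
    for (uintptr_t index = key & TABLE_MASK; table[index] != NULL;
         index = (index + 1) & TABLE_MASK)
        if (table[index]->key == key)
            return table[index];
    return NULL;
}
```

Keys 3 and 11 both hash to bucket 3 in this table, so the second one lands in bucket 4, and a lookup for 11 finds it by stepping past the first. The search terminates only because at least one bucket stays empty, which is why the sketch assumes the table is never full.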

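If no entry was found, we fall back to the slow path and call into the runtime. In the real objc_msgSend, all of the above code is written in assembly, and this is the point where it would drop out of assembly and call into the runtime itself. Once the cache has been tried and no entry was found, any hope for a fast message send is gone. The need to go fast becomes much less important at this point, partly because the send is already doomed to be slow, and partly because this path should be taken extremely rarely. Because of that, it’s acceptable to drop out of the assembly code and call into more maintainable C:

        if(imp == NULL)
            imp = class_getMethodImplementation(c, _cmd);

The IMP has now been obtained, one way or another. If it was in the cache, it was retrieved from there; otherwise it was supplied by the runtime. The class_getMethodImplementation call also populates the cache, so subsequent calls will go faster. All that’s left is to return the IMP:

        return imp;
    }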

Testing

To make sure this stuff actually works, I whipped up a quick test program:

    #import <Foundation/Foundation.h>

    @interface Test : NSObject
    - (void)none;
    - (void)param: (int)x;
    - (void)params: (int)a : (int)b : (int)c : (int)d : (int)e : (int)f : (int)g;
    - (int)retval;
    @end

    @implementation Test

    - (id)init
    {
        fprintf(stderr, "in init method, self is %p\n", self);
        return self;
    }

    - (void)none
    {
        fprintf(stderr, "in none method\n");
    }

    - (void)param: (int)x
    {
        fprintf(stderr, "got parameter %d\n", x);
    }

    - (void)params: (int)a : (int)b : (int)c : (int)d : (int)e : (int)f : (int)g
    {
        fprintf(stderr, "got params %d %d %d %d %d %d %d\n", a, b, c, d, e, f, g);
    }

    - (int)retval
    {
        fprintf(stderr, "in retval method\n");
        return 42;
    }

    @end


    int main(int argc, char **argv)
    {
        for(int i = 0; i < 20; i++)
        {
            Test *t = [[Test alloc] init];
            [t none];
            [t param: 9999];
            [t params: 1 : 2 : 3 : 4 : 5 : 6 : 7];
            fprintf(stderr, "retval gave us %d\n", [t retval]);

            NSMutableArray *a = [[NSMutableArray alloc] init];
            [a addObject: @1];
            [a addObject: @{ @"foo" : @"bar" }];
            [a addObject: @("blah")];
            a[0] = @2;
            NSLog(@"%@", a);
        }
    }

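I also added some debug logs to GetImplementation to make sure it actually got called, in case I screwed up the build and ended up calling the runtime’s implementation by mistake. Everything worked, and even the literals and subscripting called the replacement implementation.

Conclusion

At its core, objc_msgSend is relatively simple. The way that it’s used requires assembly code, however, which makes it more difficult to understand than it really needs to be. Additionally, the extreme performance demands and the optimizations they require mean that the real thing is pretty dense and tricky assembly. However, by building a simple assembly trampoline and then reimplementing the logic in C, we can see just how it works, and there really isn’t all that much to it.

This should be obvious, but never ship your own objc_msgSend in your own app. You’ll break stuff and you’ll be sorry. Do this for educational purposes only.

That’s it for today’s hallucinatory, assembly-soaked article. Come back next time for more fun, games, and hacking. As I’ve said roughly one thousand times by now, but can’t help reminding you, Friday Q&A is driven by reader suggestions. If you have a topic that you’d like to see me write about, please send it in!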

#Original (English)

Source: https://www.mikeash.com/pyblog/friday-qa-2012-11-16-lets-build-objc_msgsend.html

The objc_msgSend function underlies everything we do in Objective-C. Gwynne Raskind, reader and occasional Friday Q&A guest contributor, suggested that I talk about how objc_msgSend works on the inside. What better way to understand how something works than to build it from scratch? Let’s build objc_msgSend.

Tramapoline! Trampopoline!

Whenever you write an Objective-C message send:

    [obj message]

The compiler generates a call to objc_msgSend:

    objc_msgSend(obj, @selector(message));

objc_msgSend then takes care of dispatching the message.

How does it do that? It looks up the appropriate function pointer, or IMP, to invoke, then jumps to it. Any arguments passed to objc_msgSend end up being arguments to the IMP after the jump. The return value from the IMP ends up as the return value seen by the caller.

Because objc_msgSend only takes control long enough to obtain the right function pointer and directly jump to it, it’s sometimes referred to as a trampoline. In general, any small piece of code that serves to redirect code somewhere else can be called a trampoline.

It is this trampolining behavior that makes objc_msgSend special. Because it simply looks up the right code and then jumps directly to it, it’s relatively generic. It works with any combination of parameters passed to it, because it just leaves them alone for the method IMP to read. Return values are a bit trickier, but it turns out that every possible return type can be accounted for with just a couple of variants of objc_msgSend.

Unfortunately, this trampoline behavior cannot be written in pure C. There is no way to write a C function that passes through generic parameters to another function. You can come close by using variable arguments, but variable arguments are passed differently from normal arguments and in a way that’s slower, so it’s not compatible with regular C parameters.

If you could write objc_msgSend in C, the basic idea would look something like this:

    id objc_msgSend(id self, SEL _cmd, ...)
    {
        Class c = object_getClass(self);
        IMP imp = class_getMethodImplementation(c, _cmd);
        return imp(self, _cmd, ...);
    }

This is actually a bit over-simplified. There’s a method cache to make the whole lookup faster, so it’s more like this:

    id objc_msgSend(id self, SEL _cmd, ...)
    {
        Class c = object_getClass(self);
        IMP imp = cache_lookup(c, _cmd);
        if(!imp)
            imp = class_getMethodImplementation(c, _cmd);
        return imp(self, _cmd, ...);
    }

Except that, for speed, cache_lookup is implemented inline.

Assembly

In Apple’s runtime, the whole function is implemented in assembly for maximum speed. objc_msgSend runs for every single Objective-C message send, and the simplest action in an app can result in thousands or millions of messages.

To simplify things a bit, my own implementation will do the bare minimum in assembly, with all of the smarts in a separate C function. The assembly itself will do the equivalent of:

    id objc_msgSend(id self, SEL _cmd, ...)
    {
        IMP imp = GetImplementation(self, _cmd);
        return imp(self, _cmd, ...);
    }

Then GetImplementation can do all of the work in a more understandable fashion.

The assembly code needs to:

Save all potential parameters somewhere safe, so that GetImplementation won’t overwrite them.
Call GetImplementation.
Save the return value somewhere.
Restore all of the parameter values.
Jump to the IMP returned from GetImplementation.

So let’s get started!

I’m going to use x86-64 assembly here, as it’s the most convenient to work with on a Mac. The same principles would apply for i386 or ARM.

This function goes into its own file, which I called msgsend-asm.s. This file can be passed to the compiler as just another source file, and it will assemble it and link it into the rest of the program.

The first thing to do is to actually declare the global symbol. For boring historical reasons, C functions get an extra leading underscore in their global symbol name:

    .globl _objc_msgSend
    _objc_msgSend:

The compiler will happily link against the nearest available objc_msgSend. Simply linking this into a test app is enough to get [obj message] expressions going to our own code rather than Apple’s runtime, which is terribly convenient when it comes to testing this code to make sure it actually works.

Integer and pointer parameters are passed in registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9, in that order. Any additional parameters beyond what would fit in there get passed on the stack. The first thing this function does is save those six registers onto the stack as well, so they can be restored later:

    pushq %rsi
    pushq %rdi
    pushq %rdx
    pushq %rcx
    pushq %r8
    pushq %r9

In addition to these registers, the %rax register acts as something of a hidden parameter. It’s used for variable-argument calls, and in that case it stores the number of vector registers passed in, which is used by the called function to properly prepare the variable argument list. In case the target method is a variable-argument method, I save this register as well:

    pushq %rax

For completeness, the %xmm registers used to pass floating-point arguments really ought to be saved as well. However, if I can safely assume that GetImplementation doesn’t use any floating point, then I can ignore them, which I do simply to keep the code shorter.

Next, I align the stack. Mac OS X requires that the stack be aligned to a 16-byte boundary when making function calls. The above code leaves us with an aligned stack anyway, but it’s nice to have code to explicitly handle it so that you don’t have to worry about making sure everything is lined up, or wondering why your app is crashing in dyld functions. To align the stack, I save the existing stack pointer into %r12 after saving the original value of %r12 onto the stack. The choice of %r12 is somewhat arbitrary, and any callee-saved register would do. The important thing is that the value is guaranteed to survive across the call to GetImplementation. Then I AND the stack pointer with -0x10, which just clears the bottom four bits:

    pushq %r12
    mov %rsp, %r12
    andq $-0x10, %rsp

Now the stack pointer is aligned. It’s also safely past any of the saved registers from above, since the stack grows down, and this alignment procedure will only move it further down.

It’s finally time to call into GetImplementation. It takes two parameters, self and _cmd. Calling conventions are for those two parameters to go into %rdi and %rsi, respectively. However, they were passed into objc_msgSend like that, and haven’t been moved, so nothing has to be done to get them into place. All that has to be done is actually make the call to GetImplementation, which also gets a leading underscore:

    callq _GetImplementation

Integer and pointer return values are returned in %rax, so that’s where the returned IMP is found. Since %rax has to be restored to its original state, the returned IMP needs to be moved elsewhere. I arbitrarily chose to store it into %r11:

    mov %rax, %r11

Now it’s time to start putting things back the way they were. The first item is to restore the stack pointer, which was stashed in %r12, and restore the old value of %r12:

    mov %r12, %rsp
    popq %r12

Then pop all of the argument registers off the stack in the opposite order from when they were pushed:

    popq %rax
    popq %r9
    popq %r8
    popq %rcx
    popq %rdx
    popq %rdi
    popq %rsi

Everything is now ready. The argument registers are restored to how they were before. All parameters intended for the target method are in the place where the target method will expect to find them. The IMP itself is in %r11, so all that has to be done is to jump there:

    jmp *%r11

And that’s it! There’s nothing more to be done in the assembly code. The jump passes control to the method implementation. From the perspective of that code, it looks exactly as if the message sender directly invoked the method. All of the indirection above just disappears. When the method returns, it will return directly to the caller of objc_msgSend without any further intervention. Any return value from the method will be found in the correct place.

There’s a bit of subtlety when it comes to unusual return values. Large structs (anything too large to be returned in a register) are the most common example of this. On x86-64, large structs are returned by using a hidden first parameter. When you make a call like this:

    NSRect r = SomeFunc(a, b, c);

The call gets translated to something more like this:

    NSRect r;
    SomeFunc(&r, a, b, c);

The address of memory to use for the return value gets passed in %rdi. Since objc_msgSend expects %rdi and %rsi to contain self and _cmd, it won’t work for messages that return large structs. This same basic problem exists on many different platforms. The runtime solves this problem by providing a separate objc_msgSend_stret function used for struct returns, which works like objc_msgSend, but knows to find self in %rsi and _cmd in %rdx.

A similar problem arises on some platforms with messages that return floating point values. On those platforms, the runtime provides objc_msgSend_fpret (and on x86-64, objc_msgSend_fpret2 for extremely special cases).

Method Lookup

Let’s move on to the implementation of GetImplementation. The above assembly trampoline means that this code can be written in C. Remember that in the real runtime, this code is all straight assembly, in order to get the best speed possible. Not only does this allow for fine control over the code, but it also eliminates the need to save and restore all of those registers like the code above does.

GetImplementation could simply call class_getMethodImplementation and be done with it, foisting all of the work onto the Objective-C runtime. This is a bit boring, though. The real objc_msgSend looks in the class’s method cache first, for maximum speed. Since GetImplementation is intended to mimic objc_msgSend, it will do the same. Only if the cache doesn’t contain an entry for the given selector will it fall back to querying the runtime.

The first thing we need is some struct definitions. The method cache is a private set of structures accessed through the class structure, so to get to it we need our own definitions. Note that, while private, these definitions are all available as part of Apple’s open source release of the Objective-C runtime.

First comes the definition for a single cache entry:

    typedef struct {
        SEL name;
        void *unused;
        IMP imp;
    } cache_entry;

Pretty easy. Don’t ask me about the unused field, I don’t know why that’s there. Here’s the definition for the cache as a whole:

    struct objc_cache {
        uintptr_t mask;
        uintptr_t occupied;
        cache_entry *buckets[1];
    };

The cache is implemented as a hash table. This table is built for speed and simplicity over all else, so it’s a bit unusual. The table size is always a power of two. The table is indexed by selector, and the bucket index is computed by simply taking the selector’s value, possibly shifting it to get rid of irrelevant low bits, and performing a logical and with the appropriate mask. While we’re at it, here are macros used to compute the bucket index for a particular selector and mask:

    #ifndef __LP64__
    # define CACHE_HASH(sel, mask) (((uintptr_t)(sel)>>2) & (mask))
    #else
    # define CACHE_HASH(sel, mask) (((unsigned int)((uintptr_t)(sel)>>0)) & (mask))
    #endif

Finally, there’s the structure for the class itself. This is what a Class actually points to:

    struct class_t {
        struct class_t *isa;
        struct class_t *superclass;
        struct objc_cache *cache;
        IMP *vtable;
    };

Let’s get started with GetImplementation now that the necessary structs are there:

    IMP GetImplementation(id self, SEL _cmd)
    {

The first thing it does is get the object’s class. The real objc_msgSend does this with the equivalent of self->isa, but I’ll be gentle and use the official API for that part:

        Class c = object_getClass(self);

Since I want access to the guts, I’ll immediately cast to a pointer to the class_t struct:

        struct class_t *classInternals = (struct class_t *)c;

Now it’s time to look up the IMP. We’ll start off with it set to NULL. If we find an entry in the cache, we’ll set it. If it’s still NULL after checking the cache, we’ll fall back to the slow path:

        IMP imp = NULL;

Next, grab a pointer to the cache:

        struct objc_cache *cache = classInternals->cache;

Compute the bucket index, and grab a pointer to the array of buckets:

        uintptr_t index = CACHE_HASH(_cmd, cache->mask);
        cache_entry **buckets = cache->buckets;

Next, we search for a cache entry with the appropriate selector. The runtime uses linear probing, so it’s just a matter of searching subsequent buckets until either we find a match or find a NULL entry:

        for(; buckets[index] != NULL; index = (index + 1) & cache->mask)
        {
            if(buckets[index]->name == _cmd)
            {
                imp = buckets[index]->imp;
                break;
            }
        }

If no entry was found, we fall back to the slow path and call into the runtime. In the real objc_msgSend, all of the above code is written in assembly, and this is the point where it would drop out of assembly and call into the runtime itself. Once the cache has been tried and no entry was found, any hope for a fast message send is gone. The need to go fast becomes much less important at this point, partly because it’s already doomed to be slow, and partly because this path should be taken extremely rarely. Because of that, it’s acceptable to drop out of the assembly code and call into more maintainable C:

        if(imp == NULL)
            imp = class_getMethodImplementation(c, _cmd);

The IMP has now been obtained, one way or another. If it was in the cache, it was retrieved from there, and otherwise it was supplied by the runtime. The class_getMethodImplementation call will also populate the cache, so subsequent calls will go faster. All that’s left is to return the IMP:

        return imp;
    }

Testing

To make sure this stuff actually works, I whipped up a quick test program:

    #import <Foundation/Foundation.h>

    @interface Test : NSObject
    - (void)none;
    - (void)param: (int)x;
    - (void)params: (int)a : (int)b : (int)c : (int)d : (int)e : (int)f : (int)g;
    - (int)retval;
    @end

    @implementation Test

    - (id)init
    {
        fprintf(stderr, "in init method, self is %p\n", self);
        return self;
    }

    - (void)none
    {
        fprintf(stderr, "in none method\n");
    }

    - (void)param: (int)x
    {
        fprintf(stderr, "got parameter %d\n", x);
    }

    - (void)params: (int)a : (int)b : (int)c : (int)d : (int)e : (int)f : (int)g
    {
        fprintf(stderr, "got params %d %d %d %d %d %d %d\n", a, b, c, d, e, f, g);
    }

    - (int)retval
    {
        fprintf(stderr, "in retval method\n");
        return 42;
    }

    @end


    int main(int argc, char **argv)
    {
        for(int i = 0; i < 20; i++)
        {
            Test *t = [[Test alloc] init];
            [t none];
            [t param: 9999];
            [t params: 1 : 2 : 3 : 4 : 5 : 6 : 7];
            fprintf(stderr, "retval gave us %d\n", [t retval]);

            NSMutableArray *a = [[NSMutableArray alloc] init];
            [a addObject: @1];
            [a addObject: @{ @"foo" : @"bar" }];
            [a addObject: @("blah")];
            a[0] = @2;
            NSLog(@"%@", a);
        }
    }

I also added some debug logs to GetImplementation to make sure it actually got called, in case I screwed up the build and ended up calling the runtime’s implementation by mistake. Everything worked, and even the literals and subscripting called the replacement implementation.

Conclusion

At its core, objc_msgSend is relatively simple. The way that it’s used requires the use of assembly code, however, which makes it more difficult to understand than it really needs to be. Additionally, the extreme performance demands and requisite optimizations mean that it’s pretty dense and tricky assembly. However, by building a simple assembly trampoline and then reimplementing the logic in C, we can see just how it works, and there really isn’t all that much to it.

This should be obvious, but never ship your own objc_msgSend in your own app. You’ll break stuff and you’ll be sorry. Do this for educational purposes only.

That’s it for today’s hallucinatory, assembly-soaked article. Come back next time for more fun, games, and hacking. As I’ve said roughly one thousand times by now, but can’t help reminding you, Friday Q&A is driven by reader suggestions. If you have a topic that you’d like to see me write about, please send it in!