Let’s Build NSInvocation (Part I)

Posted March 8, 2013

Author: TommyWu


Translation · Original: Friday Q&A 2013-03-08: Let’s Build NSInvocation, Part I · by Mike Ash

Original: https://www.mikeash.com/pyblog/friday-qa-2013-03-08-lets-build-nsinvocation-part-i.html Published: 2013-03-08 Author: Mike Ash Translator: MiMo (mimo-v2.5-pro); code blocks are kept verbatim in the original English



# Original (English)

Source: https://www.mikeash.com/pyblog/friday-qa-2013-03-08-lets-build-nsinvocation-part-i.html

It’s time for another trip into the nether regions of the soul. Reader Robby Walker suggested an article about NSInvocation, and I have obliged, implementing it from scratch for your amusement. Today I’ll start on a guided tour down the hall of horrors that is MAInvocation, my reimplementation of the NSInvocation API. It’s a big project, so today I’m going to focus on the basic principles and the assembly language glue code, with the rest of the implementation to follow.

Code

The code for MAInvocation is available on GitHub here:

https://github.com/mikeash/MAInvocation

Overview

An NSInvocation object represents a single method invocation. A method invocation has a target, a selector, a set of arguments, and a return value.

Just holding these values would be pretty boring. You can whip up a model class pretty easily to do that. Have a variable for the return value, an array for the arguments, and you’re done. (The target and selector are just the first and second arguments.) Where NSInvocation gets interesting is in its ability to actually capture and send the invocations that it represents.

An NSInvocation can be invoked on a particular object. This does the equivalent of code like [target message: argument], except that the target, the message, and the arguments are all determined entirely at runtime. The NSInvocation can be constructed in code using runtime introspection without knowing anything about the method ahead of time.

Furthermore, an NSInvocation can be constructed from an attempted message send. If you write [target message: argument], and target doesn’t actually implement message:, then it gets a forwardInvocation: call, which is given an NSInvocation * representing the invocation. It can then do whatever it wishes with that invocation, such as invoking it on another object, fiddling with the parameters, or setting an arbitrary return value which is passed back to the caller.

NSInvocation therefore has two complementary pieces of tricky business:

Code that’s able to take a set of arguments, use them to make a method call, and collect the return value.
Code that’s able to receive a method call, collect the arguments, then return an arbitrary return value to the caller.

Both pieces require extensive knowledge of the CPU architecture’s calling conventions encoded in the implementation, as well as assembly language glue code.

Calling Conventions

Because so much architecture-specific code is needed, I decided to focus on a single architecture. x86-64 is the most convenient one to use for us Mac types. To further simplify things, I decided not to support floating-point arguments or return values, and also gave up on struct arguments, although I did implement support for struct return values. The following discussion ignores those parts that I didn’t implement.

In order to implement even this limited MAInvocation, it’s necessary to understand the relevant parts of the x86-64 function calling conventions, and in order to understand that, you must first understand at least a bit of the x86-64 architecture in general.

The x86-64 architecture is a 64-bit extension of Intel’s 32-bit x86 architecture introduced with the 386 CPU. That is in turn an extension of the Intel 8086’s 16-bit architecture which is in turn heavily based on the 8-bit architecture of the Intel 8080, generally considered to be the first microprocessor worth building a computer around. It could address a whopping 64kB of RAM, just enough to hold one medium-sized app icon these days.

There are sixteen general-purpose registers: rax, rbx, rcx, rdx, rbp, rsp, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, and r15. The first half are all inherited from Intel’s 32-bit architecture, while the second half are new additions for x86-64. Each register holds 64 bits.

Pointers and integers are treated identically when it comes to these calling conventions. Both are simply 64-bit quantities. Smaller integers are extended to 64 bits in size.

When calling a function, the first six parameters are passed by filling these registers in order: rdi, rsi, rdx, rcx, r8, and r9. Additional arguments, if any, are passed on the stack as 64-bit quantities, so, as seen by the caller just before the call instruction, subsequent parameters can be found in memory at rsp, rsp + 8, rsp + 16, etc.

If the function returns a value, that value is returned by storing it in rax. If the function returns two values, such as when returning a struct like NSRange that contains two values, rdx is used for the second one. If the function returns a larger struct, this is handled by having the caller allocate enough memory to hold it, and then a pointer to that memory is passed as an implicit first argument to the function in rdi, with all of the explicit parameters moved down by one.

Note that, for Objective-C methods, the first two parameters are self and _cmd, which are therefore passed in rdi and rsi (or, if the method returns a larger struct, in rsi and rdx). The explicit parameters, if any, come after those two.
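These register assignment rules are mechanical enough to express as a lookup. The C sketch below is illustrative only and is not taken from MAInvocation. The function `arg_register_index` and the table `kArgRegisterNames` are hypothetical names. The sketch covers only integer and pointer arguments, numbers arguments with self at index 0 and _cmd at index 1, and applies the one-slot shift caused by a stret call’s hidden return pointer.

```c
/* Integer/pointer argument registers, in the order the x86-64
   convention fills them. Index 0 is rdi, index 5 is r9. */
const char *const kArgRegisterNames[6] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Return the index into kArgRegisterNames of the register that carries
   argument `index` (0 = self, 1 = _cmd, 2 = first explicit argument),
   or -1 if the argument is passed on the stack. In the stack case,
   *stackOffset receives its offset from rsp as seen by the caller just
   before the call instruction. A stret call reserves rdi for the hidden
   return-buffer pointer, which shifts every argument down by one slot. */
int arg_register_index(int index, int isStret, int *stackOffset)
{
    int slot = index + (isStret ? 1 : 0);
    if (slot < 6)
        return slot;
    *stackOffset = (slot - 6) * 8;
    return -1;
}
```

For example, `arg_register_index(0, 1, &offset)` returns 1, which places self in rsi for a stret call. The seventh argument of a normal call (index 6) comes back as -1 with an offset of 0, so it is found at `rsp + 0`.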

As far as I know, there’s no particular fundamental reason for the number of registers used to pass parameters, or which ones are used. Calling conventions are a tradeoff between placing a burden on the caller, placing a burden on the callee, making parameter passing more efficient, and making surrounding code more efficient. These conventions presumably sit near some reasonable compromise between all of the competing desires.

In order to make a function call, MAInvocation needs to take the parameters to the function, place the first six in the appropriate registers, place any additional ones on the stack, then needs to actually jump to the function’s address. Upon return, it needs to record the values in the two return-value registers.

In order to receive a function call, MAInvocation needs to record the values of the six parameter-passing registers, as well as the location of the stack pointer, and use these to extract the argument values. Upon returning, it needs to place the desired return values into the two return-value registers. The logic of which values go into registers and the stack can be written in Objective-C, but the code that actually manipulates the registers and the stack needs to be written in assembly.

Data Structure

In order to cleanly communicate between the Objective-C and assembly code, I defined a struct that contains all of the relevant data. When making a call, MAInvocation will fill out the struct as appropriate, then invoke the assembly language glue code. When receiving a call, the assembly language glue code will construct the struct from the current state, then pass it over to the Objective-C code. Not all fields will be useful in both situations, but it’s easier to use the same struct for everything rather than try to specialize.

The first thing this struct contains is the address of the function to call:

    struct RawArguments
    {
        void *fptr;

Next, it stores the values of the six 64-bit parameter-passing registers:

        uint64_t rdi;
        uint64_t rsi;
        uint64_t rdx;
        uint64_t rcx;
        uint64_t r8;
        uint64_t r9;

It then stores the address of the arguments passed on the stack, as well as how many stack arguments there are:

        uint64_t stackArgsCount;
        uint64_t *stackArgs;

After that, it stores the two return-value registers:

        uint64_t rax_ret;
        uint64_t rdx_ret;

rdx already exists in the parameter-passing section, but it’s easier to make a separate entry for return values than to reuse that field.

Finally, it keeps a flag that records whether or not the call uses struct return conventions, i.e. whether rdi is used to store a pointer to space allocated for the return value. In Objective-C runtime terminology, such calls are called stret, short for “struct return”:

        uint64_t isStretCall;
    };

“Struct return” is something of a misnomer, since small structs are returned in registers, but that’s how it is. When you see “struct return” or “stret”, think “sufficiently large struct return”.

Function Call Glue

The function call glue is a function with this C signature:

    void MAInvocationCall(struct RawArguments *);

It is implemented in assembly, but with the above prototype, the Objective-C code can call it as if it were a C function. It will pass a filled-out struct RawArguments and the assembly glue will make the call.

The assembly code first declares the symbol. It’s marked as global so it’s accessible from other parts of the program. The leading underscore is due to ancient history involving Fortran, and every C symbol implicitly gets one. A non-C symbol that expects to be accessible from C code needs to have it as well:

    .globl _MAInvocationCall
    _MAInvocationCall:

The first thing any well-behaved x86-64 function does is save the old frame pointer (stored in rbp) and set up a new one by copying the stack pointer over:

    pushq %rbp
    movq %rsp, %rbp

I’ll use r12 through r15 in the following code. These registers are designated as callee-saved by the platform calling conventions, meaning that we’re not allowed to just obliterate their contents. Instead, we save their values onto the stack so they can be restored later:

    pushq %r12
    pushq %r13
    pushq %r14
    pushq %r15

The struct RawArguments * parameter is stored in rdi. It’s the first parameter to the function, and the calling conventions state that the first parameter is passed in rdi. We need to use rdi for the first parameter to the function being called, so we save the current value into r12. The various elements of the struct RawArguments parameter can be accessed by loading various offsets from r12:

    mov %rdi, %r12

Now it’s ready to start copying arguments where they need to go. Because this requires manipulating the stack pointer, it copies the stack pointer into r15 so it’s easy to restore later:

    mov %rsp, %r15

Stack arguments get copied first, for no particular reason. It does make the code to copy them slightly easier to write, as it can use the argument-passing registers as scratch space, since they don’t contain anything important yet. The first thing it does is load the number of stack arguments, which is located at offset 56 in the struct RawArguments:

    movq 56(%r12), %r10

If you’re wondering where 56 comes from, each member in this struct is 8 bytes, and the number of stack arguments is the 8th element in the struct, meaning that it comes after space for 7 other elements. 7 * 8 = 56. All the offsets in this code are computed in the same way.
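Hard-coding offsets such as 56 in assembly is fragile if the struct ever changes. One way to guard against that is to restate the layout in C11 and have `_Static_assert` check, at compile time, each `offsetof` that the assembly relies on. This sketch assumes an x86-64 (LP64) target where pointers and `uint64_t` are both 8 bytes. The checks are an illustrative addition and are not part of the code described here.

```c
#include <stddef.h>
#include <stdint.h>

/* The RawArguments layout described above. On x86-64 every member is
   8 bytes, so each offset is the member's position times 8. */
struct RawArguments {
    void *fptr;                          /* 0 */
    uint64_t rdi, rsi, rdx, rcx, r8, r9; /* 8, 16, 24, 32, 40, 48 */
    uint64_t stackArgsCount;             /* 56 */
    uint64_t *stackArgs;                 /* 64 */
    uint64_t rax_ret, rdx_ret;           /* 72, 80 */
    uint64_t isStretCall;                /* 88 */
};

/* Break the build, instead of corrupting memory at run time, if the
   offsets hard-coded in the assembly ever drift from the struct. */
_Static_assert(offsetof(struct RawArguments, stackArgsCount) == 56,
               "stackArgsCount must be at 56(%r12)");
_Static_assert(offsetof(struct RawArguments, stackArgs) == 64,
               "stackArgs must be at 64(%r12)");
_Static_assert(offsetof(struct RawArguments, rax_ret) == 72,
               "rax_ret must be at 72(%r12)");
_Static_assert(offsetof(struct RawArguments, rdx_ret) == 80,
               "rdx_ret must be at 80(%r12)");
_Static_assert(sizeof(struct RawArguments) == 96,
               "twelve 8-byte members");
```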

r10 now contains the number of stack arguments that need to be copied. Next, it computes the amount of stack space needed for these arguments. This is equal to the number of arguments multiplied by 8 (each argument is 64 bits, or 8 bytes). It does this by copying the number of arguments into r11, then shifting it left by three bits, which is equivalent to multiplying by 8:

    movq %r10, %r11
    shlq $3, %r11

Next, it loads the stack argument pointer from offset 64 in the struct RawArguments into r13:

    movq 64(%r12), %r13

Let’s take a moment to recap what the temporary registers contain at the moment:

r10: the number of stack arguments to copy.
r11: the number of bytes needed for stack arguments.
r13: the stack argument pointer.

We don’t get to give things convenient names in assembly, so it’s essential to keep careful track of what contains what at any given moment.

The next step is to move the stack pointer down to make room for the arguments, which is done by subtracting r11 from the stack pointer:

    subq %r11, %rsp

The stack is also required to be 16-byte aligned before making a function call, and this is done by just doing a logical AND with a value that has the bottom four bits cleared:

    andq $-0x10, %rsp
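In C terms, the shift, subtraction, and AND amount to the small function below. It models only the arithmetic, and the name `reserve_stack_args` is made up. It also shows why rounding down is safe: the stack grows toward lower addresses, so aligning downward can only enlarge the reserved region. The region can never overlap the saved registers above it.

```c
#include <stdint.h>

/* Model of shlq/subq/andq: reserve count * 8 bytes below `sp`, then
   round down to a 16-byte boundary. Because the result is at most
   sp - count * 8, the count words starting at the result still fit
   below the original sp. */
uintptr_t reserve_stack_args(uintptr_t sp, uint64_t count)
{
    uintptr_t bytes = (uintptr_t)(count << 3); /* shlq $3: count * 8 */
    sp -= bytes;                               /* subq %r11, %rsp */
    return sp & ~(uintptr_t)0xF;               /* andq $-0x10, %rsp */
}
```

For example, with three stack arguments and a starting stack pointer of 0x1000, the subtraction gives 0xFE8 and the AND rounds that down to 0xFE0.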

The stage is now set. At this point, we just execute a simple memory copy loop. The equivalent C code would be:

    for(int i = 0; i != r10; i++)
        rsp[i] = r13[i];

r14 will serve as the loop counter. The first step is to initialize it to zero:

    movq $0, %r14

The top of the loop needs a label so that later code can easily jump back to it:

    stackargs_loop:

Next comes the check for r14 != r10:

    cmpq %r14, %r10
    je done

The cmp instruction compares the two registers and sets the contents of the FLAGS register accordingly. The je instruction then jumps to the done label if the FLAGS register indicates that the two are equal. This two-stage construct is a bit odd, but it’s how x86-64 works.

If the two aren’t equal, the loop continues. The next step is to copy the current argument. This is done in two stages. First, the argument is copied from the memory pointed to by r13 into a temporary register, in this case rdi. Next, the argument is copied from rdi into the memory pointed to by rsp:

    movq 0(%r13, %r14, 8), %rdi
    movq %rdi, 0(%rsp, %r14, 8)

The parenthetical expressions are a little scary. x86-64 allows memory references with a bunch of different components, which makes it easier to do computed array dereferences like this. The general form of the expression looks like:

    offset(%r1, %r2, elementSize)

This refers to this address:

    r1 + r2 * elementSize + offset

This can be thought of as an array dereference. r1 is the array pointer, r2 is the index, elementSize is the size of each element in the array, and offset is just a final fixup to apply to the whole result. In short, 0(%r13, %r14, 8) is equivalent to ((uint64_t *)r13)[r14].
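This address arithmetic can be checked in plain C. The sketch below computes `base + index * scale + offset` the way the hardware does for `offset(%base, %index, scale)`, and the function name is hypothetical. The real instruction encoding only permits scale factors of 1, 2, 4, or 8 and a signed 32-bit displacement. The sketch mirrors the displacement limit in its parameter type but does not validate the scale.

```c
#include <stdint.h>

/* Model of the x86-64 memory operand offset(%base, %index, scale):
   the address is base + index * scale + offset. A negative offset
   wraps around modulo 2^64, which amounts to a subtraction. */
uintptr_t effective_address(uintptr_t base, uint64_t index,
                            unsigned scale, int32_t offset)
{
    return base + (uintptr_t)(index * scale) + (uintptr_t)(intptr_t)offset;
}
```

With `r13` as the base and `r14` as the index, `effective_address((uintptr_t)r13, r14, 8, 0)` is the address named by `0(%r13, %r14, 8)`, i.e. `&((uint64_t *)r13)[r14]`.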

After that comes the i++, which has a simple assembly equivalent:

    inc %r14

Finally, a jump back to stackargs_loop completes the loop, with the done label following it so that execution resumes below once the loop exits:

    jmp stackargs_loop

    done:
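Putting the pieces together, the whole loop from `movq $0, %r14` to `done:` corresponds to the C function below. The function deliberately keeps the assembly’s shape, including the labels and `goto`s, and names its variables after the registers they stand in for. Ordinary C would use the `for` loop given earlier instead. The function name `copy_stack_args` is hypothetical.

```c
#include <stdint.h>

/* The stack-argument copy loop with the same control flow as the
   assembly: r14 is the counter, r10 the count, r13 the source, and
   rsp the destination. Returns the number of words copied. */
uint64_t copy_stack_args(uint64_t *rsp, const uint64_t *r13, uint64_t r10)
{
    uint64_t r14 = 0;              /* movq $0, %r14 */
stackargs_loop:
    if (r14 == r10)                /* cmpq %r14, %r10 */
        goto done;                 /* je done */
    {
        uint64_t rdi = r13[r14];   /* movq 0(%r13, %r14, 8), %rdi */
        rsp[r14] = rdi;            /* movq %rdi, 0(%rsp, %r14, 8) */
    }
    r14++;                         /* inc %r14 */
    goto stackargs_loop;           /* jmp stackargs_loop */
done:
    return r14;
}
```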

The stack arguments are now ready to go. All that remains is to copy the register arguments into their actual registers. This is done by writing a sequence of move instructions:

    movq 8(%r12), %rdi
    movq 16(%r12), %rsi
    movq 24(%r12), %rdx
    movq 32(%r12), %rcx
    movq 40(%r12), %r8
    movq 48(%r12), %r9

With everything ready, it’s time to call the target function. The function pointer is conveniently located right at the location pointed to by r12, since it’s the first element in the struct RawArguments. This instruction makes the call:

    callq *(%r12)

Once the call returns, the return value (if any) is found in rax and rdx. The code immediately copies the contents of these registers into the struct RawArguments:

    movq %rax, 72(%r12)
    movq %rdx, 80(%r12)

It’s just about done. The only thing that needs to be done, aside from returning, is to restore the values stored in r12-r15 to whatever the caller had in them. First, the stack pointer needs to be restored to what it was after those registers were pushed onto the stack:

    mov %r15, %rsp

Then they’re popped off in the opposite order from which they were pushed:

    popq %r15
    popq %r14
    popq %r13
    popq %r12

Finally, control is returned to the caller, using a magic combination of instructions which readjust the stack and frame pointer before jumping to the caller’s address:

    leave
    ret

That takes care of the glue code for function calls. The Objective-C code can now fill out a struct RawArguments to suit the call being made, then call MAInvocationCall and pass the pointer to the struct to make the call.
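The Objective-C side of this comes in the next installment. The register-versus-stack bookkeeping it has to do can already be sketched in C, though. The helper `fill_raw_arguments` below is hypothetical and handles only integer and pointer arguments, matching the subset discussed here. It assumes the caller supplies three things: the argument values in order (self, _cmd, then the explicit arguments), an optional buffer for a stret return, and a buffer for any overflow arguments. The overflow buffer must be large enough and must stay alive until `MAInvocationCall` returns.

```c
#include <stdint.h>
#include <string.h>

struct RawArguments {
    void *fptr;
    uint64_t rdi, rsi, rdx, rcx, r8, r9;
    uint64_t stackArgsCount;
    uint64_t *stackArgs;
    uint64_t rax_ret, rdx_ret;
    uint64_t isStretCall;
};

/* Hypothetical helper: lay out `count` integer/pointer arguments for
   MAInvocationCall. If stretBuffer is non-NULL, the call is treated as
   stret and the buffer's address takes the first register (rdi).
   Arguments beyond the six registers go to stackBuffer, in order. */
void fill_raw_arguments(struct RawArguments *r, void *fptr,
                        const uint64_t *args, uint64_t count,
                        void *stretBuffer, uint64_t *stackBuffer)
{
    uint64_t regs[6] = {0};
    uint64_t slot = 0;

    memset(r, 0, sizeof *r);
    r->fptr = fptr;
    r->isStretCall = (stretBuffer != NULL);
    if (r->isStretCall)
        regs[slot++] = (uint64_t)(uintptr_t)stretBuffer;

    for (uint64_t i = 0; i < count; i++, slot++) {
        if (slot < 6)
            regs[slot] = args[i];              /* register argument */
        else
            stackBuffer[slot - 6] = args[i];   /* stack argument */
    }

    r->rdi = regs[0]; r->rsi = regs[1]; r->rdx = regs[2];
    r->rcx = regs[3]; r->r8 = regs[4]; r->r9 = regs[5];
    r->stackArgsCount = slot > 6 ? slot - 6 : 0;
    r->stackArgs = stackBuffer;
}
```

With eight arguments, a normal call puts the first six in rdi through r9 and the last two on the stack. A stret call puts the buffer in rdi, so only five arguments fit in registers and three spill onto the stack.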

Forwarding Glue

Capturing a method invocation is called “forwarding” in Objective-C. The runtime has a special forwarding handler, which is called any time an implementation can’t be found for a particular selector. In fact, there are two different forwarding handlers: one for normal calls, and one for stret calls. The forwarding handler needs to know where to find the self and _cmd parameters, and the locations of those parameters change for a stret call, so a bit of specialization is required.

The strategy here is to have two entry points that call through to a common implementation after making a note of whether or not it’s a stret call. The common implementation then fills out a new struct RawArguments accordingly and calls into an Objective-C function. Once that function returns, it copies the return value back out into the return value registers, then returns.

The r10 register doesn’t contain anything in particular when a function is called, and a function isn’t required to preserve its value either. This makes it a good spot to store the stret flag temporarily. The normal forwarding handler will set it to 0 before jumping to the common implementation, and the stret handler will set it to 1. Here’s the normal handler in its entirety:

    .globl _MAInvocationForward
    _MAInvocationForward:
    movq $0, %r10
    jmp _MAInvocationForwardCommon

The stret handler is nearly identical:

    .globl _MAInvocationForwardStret
    _MAInvocationForwardStret:
    movq $1, %r10
    jmp _MAInvocationForwardCommon
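Once the stret flag is known, finding self and _cmd among the captured registers is a single decision. Here is a minimal C sketch. The helper `locate_self_and_cmd` is hypothetical and takes the six argument registers in convention order.

```c
#include <stdint.h>

/* regs holds the captured argument registers in the order rdi, rsi,
   rdx, rcx, r8, r9. In a stret call rdi carries the hidden pointer to
   the return buffer, so self and _cmd move up to rsi and rdx. */
void locate_self_and_cmd(const uint64_t regs[6], uint64_t isStretCall,
                         uint64_t *self, uint64_t *cmd)
{
    unsigned first = isStretCall ? 1 : 0;
    *self = regs[first];
    *cmd = regs[first + 1];
}
```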

All the interesting stuff happens in the common handler:

    .globl _MAInvocationForwardCommon
    _MAInvocationForwardCommon:

The first thing it does is calculate the location of the stack arguments passed in to the function. From the callee’s point of view, the stack arguments start at rsp + 8. On the caller’s side they start right at rsp, but the call instruction then pushes the return address onto the stack, which leaves them 8 bytes above the stack pointer by the time this code runs. r11 is another convenient register that neither contains anything useful nor needs to be saved, so the code computes the address in that register:

    movq %rsp, %r11
    addq $8, %r11

Then the function performs the standard prologue of setting up the frame pointer:

    pushq %rbp
    movq %rsp, %rbp

Now it’s finally time to construct the struct RawArguments. This is done by pushing values onto the stack. First, a quick recap of what the various registers contain right now:

r10: the isStretCall flag.
r11: the pointer to the stack arguments.
rdi-r9: register arguments.

The handler uses the pushq instruction to construct the struct on the stack. Because it’s pushing onto the stack, it needs to push everything in reverse order. Because isStretCall is the last thing in the struct, it’s the first thing to be pushed:

    pushq %r10

The return value registers don’t need to contain anything in particular, so it makes space for them by pushing zero twice:

    pushq $0
    pushq $0

Next comes the stackArgs pointer, whose value is currently in r11:

    pushq %r11

After that comes the number of stack arguments. This is not currently known, so the handler just pushes a zero to make room for it. That field will be filled out by the Objective-C code:

    pushq $0

Next come the argument registers, which are pushed in reverse order:

    pushq %r9
    pushq %r8
    pushq %rcx
    pushq %rdx
    pushq %rsi
    pushq %rdi

The very first field of the struct is the function pointer. That’s not used here, so another zero is pushed to make room for it:

    pushq $0
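The push sequence can be sanity-checked with a toy model. The C sketch below simulates a downward-growing stack of 8-byte slots, replays the handler’s pushes in the same order, and then reads the memory at the final stack pointer as a `struct RawArguments`. All names in it are hypothetical. Each push moves the stack pointer down before storing, so the last value pushed lands at the lowest address. That is why the fields come out in declaration order.

```c
#include <stdint.h>
#include <string.h>

struct RawArguments {
    void *fptr;
    uint64_t rdi, rsi, rdx, rcx, r8, r9;
    uint64_t stackArgsCount;
    uint64_t *stackArgs;
    uint64_t rax_ret, rdx_ret;
    uint64_t isStretCall;
};

/* A toy stack of 8-byte slots that grows toward index 0. */
struct ToyStack {
    uint64_t slots[32];
    unsigned sp;               /* index of the current top of stack */
};

/* Like pushq: move the stack pointer down one slot, then store. */
static void toy_pushq(struct ToyStack *s, uint64_t value)
{
    s->slots[--s->sp] = value;
}

/* Replay the common handler's pushes and decode the result. regs holds
   rdi, rsi, rdx, rcx, r8, r9 in that order. */
struct RawArguments build_by_pushing(const uint64_t regs[6],
                                     uint64_t stackArgsAddr,
                                     uint64_t isStretCall)
{
    struct ToyStack s = { .sp = 32 };
    struct RawArguments r;

    toy_pushq(&s, isStretCall);      /* pushq %r10 */
    toy_pushq(&s, 0);                /* rdx_ret */
    toy_pushq(&s, 0);                /* rax_ret */
    toy_pushq(&s, stackArgsAddr);    /* pushq %r11: stackArgs */
    toy_pushq(&s, 0);                /* stackArgsCount, filled in later */
    for (int i = 5; i >= 0; i--)     /* pushq %r9 ... pushq %rdi */
        toy_pushq(&s, regs[i]);
    toy_pushq(&s, 0);                /* fptr, unused when forwarding */

    /* The final "rsp" is the lowest slot: the start of the struct. */
    memcpy(&r, &s.slots[s.sp], sizeof r);
    return r;
}
```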

At this point, rsp now contains a pointer to the newly-built struct RawArguments. The goal is to call a C function with this prototype:

    void MAInvocationForwardC(struct RawArguments *r);

The pointer to the struct is its only parameter, so that address needs to be moved to rdi, where the first parameter is passed:

    movq %rsp, %rdi

The handler needs to consult the struct afterwards to extract the return value registers. Since rdi isn’t saved across the function call, and rsp may be changed when aligning the stack for the call, the handler also copies the address into r12 so it can be used afterwards:

    movq %rdi, %r12

It’s now time to align the stack and call into Objective-C:

    andq $-0x10, %rsp
    callq _MAInvocationForwardC

The Objective-C code will now construct an MAInvocation instance and invoke the object’s forwardInvocation: method.

Once control returns, the return value, if any, is found in the struct. To make it visible to the caller, that value is copied out of the struct and into the appropriate registers:

    movq 72(%r12), %rax
    movq 80(%r12), %rdx
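For this to work, whatever runs inside `MAInvocationForwardC` has to leave the return value in `rax_ret` and `rdx_ret`. Below is a C sketch of that step using the hypothetical helper `store_return_value`. It assumes integer-class values only, so values of up to 16 bytes are split across the two slots in memory order. A stret call instead writes the value into the caller’s buffer, whose address arrived in rdi. In that case the helper also sets `rax_ret` to the buffer address, because the x86-64 System V ABI expects a function returning through memory to hand the buffer pointer back in rax.

```c
#include <stdint.h>
#include <string.h>

struct RawArguments {
    void *fptr;
    uint64_t rdi, rsi, rdx, rcx, r8, r9;
    uint64_t stackArgsCount;
    uint64_t *stackArgs;
    uint64_t rax_ret, rdx_ret;
    uint64_t isStretCall;
};

/* Hypothetical helper: stage an integer-class return value so that the
   forwarding glue's two movq loads put it where the caller expects. */
void store_return_value(struct RawArguments *r, const void *value, size_t size)
{
    if (r->isStretCall) {
        /* Large struct: copy into the caller-allocated buffer and
           return its address in rax. */
        void *buffer = (void *)(uintptr_t)r->rdi;
        memcpy(buffer, value, size);
        r->rax_ret = r->rdi;
        return;
    }
    /* Small value: the first 8 bytes go to rax, the next 8 to rdx. */
    uint64_t halves[2] = {0, 0};
    memcpy(halves, value, size <= sizeof halves ? size : sizeof halves);
    r->rax_ret = halves[0];
    r->rdx_ret = halves[1];
}
```

An NSRange-like struct of two 64-bit fields comes back as location in rax and length in rdx, matching the two-register case described in the calling conventions section.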

That’s it! Return to the caller:

    leave
    ret

The Objective-C runtime’s forward handlers are, amazingly, configurable. To set them to this code, all you have to do is call this somewhere convenient:

    objc_setForwardHandler(MAInvocationForward, MAInvocationForwardStret);

The runtime will then use these forward handlers for all unimplemented selectors.

Conclusion

That wraps up the assembly language glue code and the basic knowledge of calling conventions. Much work remains, but the two glue functions here provide the necessary foundation that the Objective-C parts of MAInvocation can be built on. MAInvocation needs to manage a struct RawArguments and translate between the contents of that struct and the arguments and return values provided and requested by the clients of the API. To make a method call, it needs to arrange the struct properly, then call into the above glue code. To receive a method call, it needs to construct a new MAInvocation from the struct contents.

All this shall be covered next time. Until then, please send in your ideas for topics to cover on Friday Q&A. The next article may be spoken for, but your suggestions for the future are always welcome.
