A Heartbleed-Inspired Paranoid Memory Allocator

Published: May 23, 2014

Author: TommyWu

Translation · Original: Friday Q&A 2014-05-23: A Heartbleed-Inspired Paranoid Memory Allocator · Author: Mike Ash

Original: https://www.mikeash.com/pyblog/friday-qa-2014-05-23-a-heartbleed-inspired-paranoid-memory-allocator.html Published: 2014-05-23 Author: Mike Ash Translator: MiMo (mimo-v2.5-pro); code blocks are kept in the original English

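The Heartbleed vulnerability made a big splash a couple of months ago, and rightly so. It could be described as a "memory leak", but it's not the standard kind where a program fails to free allocated memory. Instead, it allowed an attacker to dump memory contents from a remote program nearly at will, potentially leaking private keys, passwords, source code, and other data intended to stay secret. This got me thinking about ways to protect sensitive data against similar attacks. The result is MAParanoidAllocator, and in this article I'll discuss its implementation.

Background
The Heartbleed bug involves a heartbeat message added to the TLS protocol, which is used for secure HTTP connections and many other encrypted internet protocols. The message is a simple ping: one side asks, "Are you still there?" and the other side responds, "Yes." As with many pings, the requester can include a payload which the other side repeats back. The payload is a string of bytes prefixed with a 16-bit length that indicates how many bytes the payload contains.

The Heartbleed bug is fundamentally due to mishandling that length prefix. A requester can send a request with a large value in the length field but a shorter actual payload. The proper response is to treat this as a fatal error and terminate the connection.

OpenSSL failed to check, and so proceeded with the response. It copied the payload from the request to the response, using the length field to determine how many bytes to copy. If the actual payload was shorter than the length field claimed, it ended up copying whatever data was sitting beyond the incoming packet data.

OpenSSL uses an internal allocator for this and other tasks, and that allocator doesn't zero memory when it's allocated or freed. Thus, the extra data copied into the response is whatever arbitrary data happened to be left there by the last allocation at that spot. Since the same allocator is used for things like private key storage, that data can end up being sent back to the requester.

To scoop up as much data as possible, the attacker can specify a long length field with no payload at all. A vulnerable TLS implementation will then reply with kilobytes of data from its internal memory. Depending on luck, the attacker stands a good chance of getting something interesting in that dump. If there's no interesting data, the attacker can just try again. The process is quick and can be repeated without limit.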
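To make the missing check concrete, here is a minimal sketch in plain C. It is not OpenSSL's code: the function name `echo_payload` and the simplified message layout (a 16-bit big-endian length followed directly by the payload) are assumptions made for illustration, and real heartbeat messages also carry a type byte and padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified heartbeat body: a 16-bit big-endian length
 * followed by the payload bytes. */

/* Copies the payload into `out` and returns its length, or returns -1 when
 * the claimed length does not fit in the bytes that were actually received. */
static long echo_payload(const uint8_t *msg, size_t msg_len,
                         uint8_t *out, size_t out_cap)
{
    assert(msg != NULL && out != NULL);

    if (msg_len < 2)
        return -1;                    /* too short to hold the length field */

    size_t claimed = ((size_t)msg[0] << 8) | msg[1];

    /* The step Heartbleed skipped: compare the claimed length against the
     * number of payload bytes really present before copying anything. */
    if (claimed > msg_len - 2 || claimed > out_cap)
        return -1;

    memcpy(out, msg + 2, claimed);
    return (long)claimed;
}
```

A request that claims 65535 bytes but carries one is rejected here instead of being answered with whatever sits after the packet in memory.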

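A Paranoid Allocator
The proper fix for Heartbleed is to validate the length against the provided payload before sending a reply. Beyond that, there are various ways to try to avoid this whole class of security vulnerabilities.

One way is to have better processes for developing security-critical software. There was no particularly good reason to add the heartbeat extension to TLS in the first place, or to implement it in OpenSSL, so the whole thing should have been stopped before it even got started. (Translator's note: this view reflects the historical context of Heartbleed; modern TLS implementations and development processes have improved significantly.)

Another way is to use static analysis tools that are able to detect when too much trust is placed in input data. Unfortunately, the Heartbleed bug was too subtle for most static analyzers to detect. While it has prompted a lot of work in that area, it's hard to know how much of that work will help detect similar bugs in the future, and how much of it really just helps detect Heartbleed specifically.

Finally, another way to avoid these vulnerabilities is to program in a safer language than C. C can be enormously unforgiving, and its concept of undefined behavior means that mistakes can easily turn into security vulnerabilities. The use of a safer language has been suggested for ages, but security-critical code keeps being written in C despite the problems.

Interestingly, I'm not too sure a better language would have prevented this particular bug. It would be relatively natural to recycle buffers even in a language with more guarantees, and that could end up recycling their contents as well. If the buffer isn't truncated to the actual length of each incoming packet, bounds checking won't catch the excessively large copy. That said, better languages will shut out whole classes of security vulnerabilities, and it would probably be wise to start seriously looking at alternatives to C.

In theory, the above should be enough. In practice, it's a good idea to take a layered approach to security: not only try to avoid vulnerabilities, but also try to mitigate them if and when they do occur. For example, OS X and most other operating systems mark memory as non-executable by default. This means that even if an attacker is able to write machine code into your process's memory and jump to it, the attempt to take control will fail unless the attacker can first arrange for that code to be marked as executable.

Given that, I thought about what measures could be taken when allocating memory for sensitive data, such as private keys, that could help keep it safe even after a vulnerability is exploited. I came up with these features:

Memory should be zeroed when allocated, and again when freed.
Guard pages with no read or write permission should be placed before and after the memory allocation so that any overflow from an adjacent allocation turns into a crash.
The memory used for storage should be kept with minimal permissions. It should be neither readable nor writable by default. When a read is requested, it should be changed to read-only, and when a write is requested, it should be changed to read-write.
The API should be designed such that it's difficult to leave read or write permissions enabled for longer than necessary, and especially to accidentally leave them enabled permanently.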

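Code
The code is available on GitHub as usual:

https://github.com/mikeash/MAParanoidAllocator

API
This is the public-facing API for the class:

    @interface MAParanoidAllocator : NSObject

    - (id)init;
    - (id)initWithSize: (size_t)size;

    - (size_t)size;
    - (void)setSize: (size_t)size;

    - (void)read: (void (^)(const void *ptr))block;
    - (void)write: (void (^)(void *ptr))block;

    @end

Most of it is obvious, but read: and write: need some explaining.

Conceptually, this class is similar to NSMutableData. It's an object wrapper around an arbitrary chunk of bytes. The NSMutableData API provides the bytes method for reading, and the mutableBytes method for reading and writing. However, these methods make it impossible for the data object to know when the caller is done reading or writing. It would be possible to add a method that's called at the end to explicitly signal that the operation is done, so that calling code would look like:

    const void *ptr = [dataObject bytes];
    // ...use ptr here...
    [dataObject recycleBytes: ptr];

However, it would be too easy to forget the recycleBytes: call and thus leave the memory readable forever.

Instead, the read: and write: methods each take a block as a parameter. The block is called synchronously and is passed a pointer to the memory held by the object. This pointer is only valid inside the block, and the permissions are automatically reset to make the memory unreadable and unwriteable as soon as the block returns.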

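Implementation Strategy
The usual memory allocation APIs like malloc and free won't suffice for this class.

Instead, it allocates memory using mmap. This allows it to use mprotect to change permissions on the memory. Both mmap and mprotect work with page granularity, meaning that the class has to allocate memory in 4 KB chunks. When allocating memory, it rounds the requested size up to the nearest multiple of 4 KB. It also adds two more 4 KB pages to the size, one guard page before the allocation and one after. The guard pages are permanently marked as unreadable and unwriteable. The allocated memory in the middle is normally marked as unreadable and unwriteable too, but its permissions are temporarily changed using mprotect when necessary. (Translator's note: on modern systems the page size may be larger, for example 16 KB.)

To resize an existing allocation, the approach is nothing special: allocate new memory, copy the contents across, deallocate the old memory. Since the memory is allocated with mmap, it's deallocated with munmap. For the sake of paranoia, the code zeroes out the allocated memory before returning it to the operating system.

Instance Variables

The class needs three instance variables: the current allocation size, a pointer to the allocated memory, and the system's page size. The page size could be a global variable, but it's mildly more convenient to keep it as an instance variable:

    @implementation MAParanoidAllocator {
        size_t _size;
        char *_memory;
        size_t _pageSize;
    }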

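Error Checking

This code doesn't try too hard to report or recover from errors. The calls generally shouldn't fail, and if they do, something has gone terribly wrong. If they fail, the code just logs the error and then calls abort(). Error checking is essential in all code, but it's especially important for security-critical code, as unchecked errors can easily turn into exploitable vulnerabilities. (See for example Exploitable Userland NULL Pointer Dereference.)

I wrote a simple error checking macro that's little more than a custom assert:

    #define CHECK(condition) do { \
            if(!(condition)) { \
                NSLog(@"%s: %s (%d)", #condition, strerror(errno), errno); \
                abort(); \
            } \
        } while(0)

In addition to logging the failed condition and calling abort(), it also prints out the value of errno to help indicate what went wrong.

Initialization and Deallocation
The init method calls super, then sets the page size variable:

    - (id)init {
        if((self = [super init])) {
            CHECK((_pageSize = sysconf(_SC_PAGESIZE)) > 0);
        }
        return self;
    }

The other variables are left as zero, indicating that a freshly initialized instance has a size of zero. The class is built so that a size of zero means no memory is allocated and no resources need to be freed. This means that making the first allocation can be done by just calling setSize: with the desired size, and that cleaning up in dealloc can be done by setting the size back to zero.

The initWithSize: method therefore just calls init, then setSize:

    - (id)initWithSize: (size_t)size {
        self = [self init];
        [self setSize: size];
        return self;
    }

The dealloc method again just calls setSize:

    - (void)dealloc {
        [self setSize: 0];
    }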

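Page Size Rounding
The API promises byte granularity, but all of the calls behind the scenes have to work with entire pages. This isn't a problem, but it does require rounding the requested sizes up to the nearest multiple of the page size. This simple method takes care of that:

    - (size_t)roundToPageSize: (size_t)size {
        size_t pageCount = (size + _pageSize - 1) / _pageSize;
        return pageCount * _pageSize;
    }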
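As a worked example of that arithmetic, the sketch below restates the rounding in plain C and adds the two guard pages on top. The names `round_to_page` and `mapping_size` are made up for illustration, and like the method above it does not guard against overflow for sizes near SIZE_MAX.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds `size` up to the next multiple of `page_size`; 0 stays 0. */
static size_t round_to_page(size_t size, size_t page_size)
{
    assert(page_size > 0);
    size_t pages = (size + page_size - 1) / page_size;
    return pages * page_size;
}

/* Total bytes to map for a non-empty allocation: the rounded size plus one
 * guard page before it and one after it. */
static size_t mapping_size(size_t size, size_t page_size)
{
    return round_to_page(size, page_size) + 2 * page_size;
}
```

With 4096-byte pages, a 100-byte request rounds up to 4096 bytes and maps 12288 bytes in total, while a 4097-byte request rounds up to 8192 bytes.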

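Changing Memory Permissions
Several places in the code need to call mprotect on the entire chunk of allocated memory. This entails rounding the allocation size up to a multiple of the page size, calling mprotect, and checking for errors. That process is wrapped in a small helper method:

    - (void)mprotect: (int)prot {
        size_t size = [self roundToPageSize: _size];
        if(size > 0) {
            CHECK(mprotect(_memory, size, prot) == 0);
        }
    }

Setting the Size
Most of the complexity of this class is in setSize:. That's where memory is allocated, deallocated, and copied from an old allocation to a new one.

The first thing it does is round both the new and old sizes up to multiples of the page size:

    - (void)setSize: (size_t)newSize {
        size_t beforeSize = [self roundToPageSize: _size];
        size_t afterSize = [self roundToPageSize: newSize];

These values get used a lot, so it's easiest to compute them once up front. Next, check if they're actually different. If the rounded sizes are equal, no memory needs to be reallocated:

        if(beforeSize != afterSize) {

If they are different, then the next task is to allocate a new chunk of memory with the new size. Since size zero is handled by having no memory allocated at all, this is only done if the new size is not zero:

            char *afterPointer = NULL;
            if(afterSize > 0) {

The total amount of memory to allocate is the new size plus two additional guard pages:

                size_t guardPagesSize = _pageSize * 2;
                size_t toAllocate = afterSize + guardPagesSize;

Then it's time to call mmap. It requests an anonymous, private mapping (in other words, it's not trying to memory map a file or anything like that) with read and write permissions. It needs to copy the existing data into the new allocation, so making the new memory unreadable and unwriteable comes later.

                char *allocatedPointer;
                CHECK((allocatedPointer = mmap(NULL, toAllocate, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, 0, 0)) != MAP_FAILED);

The pointer returned by mmap points to the leading guard page. The pointer to the memory that's actually used to store data is one page beyond that:

                afterPointer = allocatedPointer + _pageSize;

The memory returned from mmap is already zeroed by the operating system, so it's not necessary to explicitly clear it in code.

With the new allocation in place, the leading and trailing pages are set to be unreadable and unwriteable so that they act as guard pages:

                CHECK(mprotect(allocatedPointer, _pageSize, PROT_NONE) == 0);
                CHECK(mprotect(afterPointer + afterSize, _pageSize, PROT_NONE) == 0);
            }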
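The allocation steps so far can be condensed into a standalone C sketch. The helpers `page_size`, `guarded_alloc` and `guarded_free` are hypothetical names, not part of MAParanoidAllocator, and errors are returned rather than routed through CHECK. One deliberate difference: the sketch passes -1 as the file descriptor, since some systems expect that for anonymous mappings, whereas the code above passes 0.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the system page size in bytes. */
static size_t page_size(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    return (size_t)page;
}

/* Hypothetical helper: maps `pages` usable pages with a PROT_NONE guard page
 * on each side. Returns the start of the usable region, or NULL on failure.
 * The usable region is left readable and writable, as in setSize:. */
static char *guarded_alloc(size_t pages)
{
    if (pages == 0)
        return NULL;

    size_t page = page_size();
    size_t usable = pages * page;
    size_t total = usable + 2 * page;

    char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_ANON | MAP_PRIVATE, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    char *data = base + page;             /* skip the leading guard page */
    if (mprotect(base, page, PROT_NONE) != 0 ||
        mprotect(data + usable, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return data;
}

/* Unmaps a region from guarded_alloc, guard pages included. Returns 0 on
 * success. */
static int guarded_free(char *data, size_t pages)
{
    size_t page = page_size();
    return munmap(data - page, pages * page + 2 * page);
}
```

Touching `data[-1]` or the byte just past the usable region would fault, which is the point of the guard pages.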

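If there's an existing allocation and a new allocation, then the data needs to be copied across. The new allocation is currently writeable, but the existing allocation is completely inaccessible. Thus, the copy is done inside of a call to the read: method:

            if(beforeSize > 0 && afterSize > 0) {
                [self read: ^(const void *ptr) {
                    memcpy(afterPointer, ptr, MIN(beforeSize, afterSize));
                }];
            }

At this point, the new memory is allocated (if necessary) and the existing data (if any) has been copied into it. The next step is to deallocate the old memory, if there is any:

            if(beforeSize > 0) {

Before returning it to the operating system, zero it out:

                [self write: ^(void *ptr) {
                    memset(ptr, 0, beforeSize);
                }];

This is probably unnecessary. The munmap documentation doesn't guarantee that memory is zeroed out, but it should be safe since the only way to get the memory back again is with a call to mmap, and mmap zeroes all memory before providing it to the caller. However, this is a class with "paranoid" in its name, and that guarantee is a little too indirect for my comfort.

(Aside: there's a common problem in code like this where the memset call gets optimized away by the compiler. Compilers are intelligent enough to know that writing to a buffer that is then immediately passed to free() is pointless, and the write can be eliminated. While this is correct according to the language standard, it's undesirable behavior for paranoid, security-conscious code like this. To work around that problem, the memset_s function was introduced in the C11 standard. It does the same thing as memset, but is guaranteed not to be optimized away. It's available on Apple platforms starting with Mac OS X 10.9 and iOS 7. Fortunately, memset_s is unnecessary here, as the indirection of calling memset in a block, and the fact that the memory is freed using munmap rather than free, mean that it can't be optimized away. Using plain memset here allows the code to be compatible with earlier OS releases.)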
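Where memset_s is not available, a commonly used workaround is to write through a volatile pointer. The sketch below shows that technique; it is not what MAParanoidAllocator does, the name `secure_zero` is made up, and it relies on compilers keeping volatile stores, which holds in practice but is a weaker promise than the one memset_s makes.

```c
#include <assert.h>
#include <stddef.h>

/* Zeroes `len` bytes at `ptr` through a volatile pointer, so the compiler
 * treats every store as observable and does not drop the loop. */
static void secure_zero(void *ptr, size_t len)
{
    assert(ptr != NULL || len == 0);

    volatile unsigned char *p = ptr;
    while (len--)
        *p++ = 0;
}
```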

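Now that the memory is zeroed, the code has to calculate the total size of the original allocation, and the pointer to the beginning of the allocation, taking into account the leading and trailing guard pages:

                size_t guardPagesSize = _pageSize * 2;
                size_t toDeallocate = beforeSize + guardPagesSize;
                char *pointerWithGuards = _memory - _pageSize;

Then a call to munmap deallocates the memory.

                CHECK(munmap(pointerWithGuards, toDeallocate) == 0);
            }

With the new memory allocated and the old memory deallocated, it's time to update the instance variables:

            _memory = afterPointer;
            _size = newSize;

Finally, the newly allocated memory can be made unreadable and unwriteable:

            [self mprotect: PROT_NONE];

In the case where the two allocations are the same (rounded) size, not much needs to be done. In addition to updating the _size instance variable, the code also zeroes out any extra memory beyond the end of the new size when the size shrinks. This ensures that leftover, potentially sensitive data doesn't remain after the caller expected it to be gone:

        } else {
            if (newSize < _size) {
                [self write:^(void *ptr) {
                    memset((char *)ptr + newSize, 0, _size - newSize);
                }];
            }
            _size = newSize;
        }
    }

Size Getter
Compared to setSize:, the implementation of the size method is somewhat less exciting:

    - (size_t)size {
        return _size;
    }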

Reading and Writing

Let's look at the implementation of read: and write:. Both methods follow the same basic pattern: mprotect the allocation to have some permissions, call a block, then mprotect the allocation again to make it inaccessible. This pattern can be wrapped up in a common method:

    - (void)withProtections: (int)prot call: (void (^)(void))block {
        [self mprotect: prot];
        block();
        [self mprotect: PROT_NONE];
    }

With that available, the read: method just calls withProtections:call: with PROT_READ:

    - (void)read: (void (^)(const void *))block {
        [self withProtections: PROT_READ call: ^{
            block(_memory);
        }];
    }

The write: method is nearly identical, just with PROT_READ | PROT_WRITE:

    - (void)write: (void (^)(void *))block {
        [self withProtections: PROT_READ | PROT_WRITE call: ^{
            block(_memory);
        }];
    }
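The same open, call, close pattern can be sketched in plain C with a function pointer standing in for the block. Everything here (`region_fn`, `with_protection`, `store_byte`, `load_byte`) is a hypothetical name for illustration, and failures are reported through the return value instead of CHECK.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Callback that receives the temporarily accessible region plus a
 * caller-supplied context pointer. */
typedef void (*region_fn)(void *ptr, void *ctx);

/* Opens `region` with `prot`, runs `fn`, then makes the region inaccessible
 * again. Returns 0 on success, -1 if either mprotect call fails. */
static int with_protection(void *region, size_t len, int prot,
                           region_fn fn, void *ctx)
{
    assert(fn != NULL);

    if (mprotect(region, len, prot) != 0)
        return -1;
    fn(region, ctx);
    return mprotect(region, len, PROT_NONE);
}

/* Example callbacks: store one byte into the region, or load one from it. */
static void store_byte(void *ptr, void *ctx)
{
    *(unsigned char *)ptr = *(unsigned char *)ctx;
}

static void load_byte(void *ptr, void *ctx)
{
    *(unsigned char *)ctx = *(unsigned char *)ptr;
}
```

A caller would pass PROT_READ | PROT_WRITE with `store_byte` and PROT_READ with `load_byte`; outside those calls the region stays at PROT_NONE.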

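Tests
I wanted to be sure to test all of the features of this code. A lot of those features involve ensuring that code crashes when it tries to access memory outside of the provided API. My first approach was to write code that would crash, then use something like PLCrashReporter to catch the crash and resume execution. Unfortunately, this doesn't play well with the debugger, as lldb strongly insists on stopping execution when the program crashes even if the crash was going to be caught. Since debugging tests is really useful, I didn't want to settle for that approach.

After a lot of pain and suffering trying to get a custom mach exception handler set up, I realized that I could use mach calls like mach_vm_read and mach_vm_write to perform illegal memory reads and writes without crashing. These calls allow reading and writing memory, but when the given address is inaccessible, they return an error instead of raising a signal. This simplified the test code a lot. I won't get into details here, but if you're interested, you can read through the test code on GitHub.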
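The mach calls are specific to Apple platforms. As a rough portable analogue (a different technique from the one the tests use), a process can ask the kernel to read from an address on its behalf: write() on a pipe fails with EFAULT when the source buffer is inaccessible, instead of raising a signal. The helper names below are hypothetical.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if one byte at `addr` is readable, 0 if the kernel reports
 * EFAULT, and -1 on any other error. */
static int is_readable(const void *addr)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* The kernel copies from `addr`; a bad address yields EFAULT, not a crash. */
    ssize_t n = write(fds[1], addr, 1);
    int saved = errno;

    close(fds[0]);
    close(fds[1]);

    if (n == 1)
        return 1;
    return saved == EFAULT ? 0 : -1;
}

/* Maps one PROT_NONE page, probes it, and returns the probe's verdict. */
static int probe_protected_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    void *p = mmap(NULL, (size_t)page, PROT_NONE, MAP_ANON | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int result = is_readable(p);
    munmap(p, (size_t)page);
    return result;
}
```

A test can then assert that a guard page or a locked allocation reports as unreadable without having to survive a crash.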

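Conclusion
This code defends against a scenario that should never happen, and where you've effectively already lost the game. Bugs which allow an attacker to bump into the various protections used here shouldn't happen in the first place. However, since bugs are inevitable, a layered approach to security can help mitigate their effects. I'm not sure if it's truly useful in this case, but it's an interesting exercise to work through, and it doesn't seem unreasonable to make sensitive data unreadable when it's not explicitly being used. The techniques used to do that are all fairly straightforward POSIX calls, albeit ones that don't see a lot of use in normal code.

That's it for today. Come back next time for more frightening adventures. Friday Q&A is driven by reader ideas, so as always, if you have an idea for a topic that you'd like to see covered here, please send it in!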

#Original (English)

Source: https://www.mikeash.com/pyblog/friday-qa-2014-05-23-a-heartbleed-inspired-paranoid-memory-allocator.html

The Heartbleed vulnerability made a big splash a couple of months ago, and rightly so. It could be described as a “memory leak”, but it’s not the standard kind where a program fails to free allocated memory. Instead, it allowed an attacker to dump memory contents from a remote program nearly at will, potentially leaking private keys, passwords, source code, and other data intended to stay secret. This got me thinking about ways to protect sensitive data against similar attacks. The result is MAParanoidAllocator, and in this article I’ll discuss the implementation.

Background
The Heartbleed bug involves a heartbeat message added to the TLS protocol used for secure HTTP connections and many other encrypted internet protocols. The message is a simple ping, where one side can ask, “Are you still there?” and the other side will respond with, “Yes.” As with many pings, the requester can include a payload which the other side will repeat back to it. The payload consists of a string of bytes prefixed with a 16-bit length to indicate how many bytes are in the payload.

The Heartbleed bug is fundamentally due to mishandling the length prefix. It’s possible for the requester to send a request with a large value in the length field but with a shorter payload. The proper response to this would be to treat it as a fatal error and terminate the connection.

OpenSSL failed to check, and so proceeded with the response in this case. It copied the payload from the request to the response, using the length field to determine how many bytes to copy. If the actual payload is shorter than what the length field says, it ended up copying whatever data was sitting beyond the incoming packet data.

OpenSSL uses an internal allocator for this and other tasks, and that allocator doesn’t zero memory when it’s allocated or freed. Thus, the extra data that’s copied into the response ends up being whatever arbitrary data happened to be stored there from the last allocation at that spot. Since that same allocator is used for things like private key storage, that data can end up being sent back to the requester.

To scoop up as much data as possible, the attacker can specify a long length field with no payload. A vulnerable TLS implementation will then reply with kilobytes of data from its internal memory. Depending on luck, the attacker stands a good chance of getting something interesting in that dump. If there’s no interesting data, the attacker can just try again. The process is quick and can be repeated without limit.

A Paranoid Allocator
The proper fix for Heartbleed is to validate the length against the provided payload before sending a reply. There are various ways to try to avoid this whole class of security vulnerabilities.

One way is to have better processes for developing security-critical software. There was no particularly good reason to add the heartbeat extension to TLS in the first place, or to implement it in OpenSSL, so the whole thing should have been stopped before it even got started.

Another way is to use static analysis tools that are able to detect when too much trust is placed in input data. Unfortunately, the Heartbleed bug was too subtle for most static analyzers to detect, and while it’s prompted a lot of work in that area, it’s hard to know how much of that work will help to detect similar bugs in the future, and how much of it really just helps them detect Heartbleed specifically.

Finally, another way to avoid these vulnerabilities is to program in a safer language than C. C can be enormously unforgiving, and its concept of undefined behavior means that mistakes can easily turn into security vulnerabilities. The use of a safer language has been suggested for ages, but security-critical code keeps being written in C despite the problems.

Interestingly, I’m not too sure if a better language would have prevented this particular bug. It would be relatively natural to recycle buffers even in a language with more guarantees, and that could end up recycling the contents as well. If the buffer isn’t truncated to the actual length of each incoming packet, bounds checking won’t catch the excessively large copy. That said, better languages will shut out whole classes of security vulnerabilities and it would probably be wise to start seriously looking at alternatives to C.

In theory, the above should be enough. In practice, it’s a good idea to take a layered approach to security and not only try to avoid vulnerabilities, but also try to mitigate them if and when they do occur. For example, OS X and most other operating systems will mark memory as being non-executable by default. This means that even if an attacker is able to write out machine code into your process’s memory and jump to it, the attempt to take control will fail unless the attacker can first arrange for that code to be marked as executable.

Given that, I thought about what measures could be taken when allocating memory for sensitive data, such as private keys, that could help keep it safe even after a vulnerability is exploited. I came up with these features:

Memory should be zeroed when allocated, and again when freed.
Guard pages with no read or write permission should be placed before and after the memory allocation so that any overflow from an adjacent allocation turns into a crash.
The memory used for storage should be kept with minimal permissions. It should be unreadable or unwriteable by default. When a read is requested, it should be changed to read-only, and when a write is requested, it should be changed to read-write.
The API should be designed such that it’s difficult to leave read or write permissions enabled for longer than necessary, and especially to accidentally leave them enabled permanently.

Code
The code is available on GitHub as usual:

https://github.com/mikeash/MAParanoidAllocator

API
This is the public-facing API for the class:

    @interface MAParanoidAllocator : NSObject

    - (id)init;
    - (id)initWithSize: (size_t)size;

    - (size_t)size;
    - (void)setSize: (size_t)size;

    - (void)read: (void (^)(const void *ptr))block;
    - (void)write: (void (^)(void *ptr))block;

    @end

Most of it is obvious, but read: and write: need some explaining.

Conceptually, this class is similar to NSMutableData. It’s an object wrapper around an arbitrary chunk of bytes. The NSMutableData API provides the bytes method for reading, and the mutableBytes method for reading and writing. However, these methods make it impossible for the data object to know when the caller is done reading or writing. It would be possible to add a method that’s called at the end to explicitly signal that the operation is done, so that calling code would look like:

    const void *ptr = [dataObject bytes];
    // ...use ptr here...
    [dataObject recycleBytes: ptr];

However, it would be too easy to forget the recycleBytes: call and thus leave the memory readable forever.

The read: and write: methods each take a block as a parameter. The block is called synchronously and is passed in a pointer to the memory held by the object. This pointer is only valid inside the block, and the permissions are automatically reset to make the memory unreadable and unwriteable as soon as the block returns.

Implementation Strategy
The usual memory allocation APIs like malloc and free won’t suffice for this class.

Instead, it will allocate memory using mmap. This allows it to use mprotect to change permissions on the memory. Both mmap and mprotect work with page granularity, meaning that the class has to allocate memory in 4kB chunks. When allocating memory, it will round the requested size up to the nearest multiple of 4kB. It will also add two more 4kB pages to the size, one guard page before the allocation and one after. The guard pages will be permanently marked as unreadable and unwriteable. The allocated memory in the middle will normally be marked as unreadable and unwriteable, but the permissions will be temporarily changed using mprotect when necessary.

To resize an existing allocation, the approach is nothing special: allocate new memory, copy the contents across, deallocate the old memory. Since the memory is allocated with mmap, it’s deallocated with munmap, and for the sake of paranoia, the code will zero out the allocated memory before returning it to the operating system.

Instance Variables
The class needs three instance variables: the current allocation size, a pointer to the allocated memory, and the system’s page size. The page size could be a global variable, but it’s mildly more convenient to keep it as an instance variable:

    @implementation MAParanoidAllocator {
        size_t _size;
        char *_memory;
        size_t _pageSize;
    }

Error Checking
This code doesn’t try too hard to report or recover from errors. The calls generally shouldn’t fail, and if they do, something has gone terribly wrong. If they fail, the code just logs the error and then calls abort(). Error checking is essential in all code, but it’s especially true for security-critical code, as unchecked errors can easily turn into exploitable vulnerabilities. (See for example Exploitable Userland NULL Pointer Dereference.)

I wrote a simple error checking macro that’s little more than a custom assert:

    #define CHECK(condition) do { \
            if(!(condition)) { \
                NSLog(@"%s: %s (%d)", #condition, strerror(errno), errno); \
                abort(); \
            } \
        } while(0)

In addition to logging the failed condition and calling abort(), it also prints out the value of errno to help indicate what went wrong.

Initialization and Deallocation
The init method calls super, then sets the page size variable:

    - (id)init {
        if((self = [super init])) {
            CHECK((_pageSize = sysconf(_SC_PAGESIZE)) > 0);
        }
        return self;
    }

The other variables are left as zero, indicating that a freshly initialized instance has a size of zero. The class will be built so that a size of zero means no memory is allocated and no resources need to be freed. This means that making the first allocation can be done by just calling setSize: with the desired size, and that cleaning up in dealloc can be done by setting the size back to zero.

The initWithSize: method therefore just calls init, then setSize:

    - (id)initWithSize: (size_t)size {
        self = [self init];
        [self setSize: size];
        return self;
    }

The dealloc method again just calls setSize:

    - (void)dealloc {
        [self setSize: 0];
    }

Page Size Rounding
The API promises byte granularity, but all of the calls behind the scenes have to work with entire pages. This isn’t a problem, but it does require rounding the requested sizes up to the nearest multiple of the page size. This simple method takes care of that:

    - (size_t)roundToPageSize: (size_t)size {
        size_t pageCount = (size + _pageSize - 1) / _pageSize;
        return pageCount * _pageSize;
    }

Changing Memory Permissions
Several places in the code need to call mprotect on the entire chunk of allocated memory. This entails rounding the allocation size up to the nearest page size, calling mprotect, and checking for errors. That process is wrapped in a small helper method:

    - (void)mprotect: (int)prot {
        size_t size = [self roundToPageSize: _size];
        if(size > 0) {
            CHECK(mprotect(_memory, size, prot) == 0);
        }
    }

Setting the Size
Most of the complexity of this class is in setSize:. That’s where memory is allocated, deallocated, and copied from an old allocation to a new one.

The first thing it does is round both new and old sizes to multiples of the page size:

    - (void)setSize: (size_t)newSize {
        size_t beforeSize = [self roundToPageSize: _size];
        size_t afterSize = [self roundToPageSize: newSize];

These values get used a lot, so it’s easiest to compute them once up front. Next, check if they’re actually different. If the rounded sizes are equal, no memory needs to be reallocated:

        if(beforeSize != afterSize) {

If they are different, then the next task is to allocate a new chunk of memory with the new size. Since size zero is handled by having no memory allocated at all, this is only done if the new size is not zero:

            char *afterPointer = NULL;
            if(afterSize > 0) {

The total amount of memory to allocate is the new size plus two additional guard pages:

                size_t guardPagesSize = _pageSize * 2;
                size_t toAllocate = afterSize + guardPagesSize;

Then it’s time to call mmap. It requests an anonymous, private mapping (in other words, it’s not trying to memory map a file or anything like that) with read and write permissions. It needs to copy the existing data into the new allocation, so making the new memory unreadable and unwriteable comes later.

                char *allocatedPointer;
                CHECK((allocatedPointer = mmap(NULL, toAllocate, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, 0, 0)) != MAP_FAILED);

The pointer returned by mmap points to the leading guard page. The pointer to the memory that’s actually used to store data is one page beyond that:

                afterPointer = allocatedPointer + _pageSize;

The memory returned from mmap is already zeroed by the operating system, so it’s not necessary to explicitly clear it in code.

With the new allocation in place, the leading and trailing pages are set to be unreadable and unwriteable so that they act as guard pages:

                CHECK(mprotect(allocatedPointer, _pageSize, PROT_NONE) == 0);
                CHECK(mprotect(afterPointer + afterSize, _pageSize, PROT_NONE) == 0);
            }

If there’s an existing allocation and a new allocation, then the data needs to be copied across. The new allocation is currently writeable, but the existing allocation is completely inaccessible. Thus, the copy is done inside of a call to the read: method:

            if(beforeSize > 0 && afterSize > 0) {
                [self read: ^(const void *ptr) {
                    memcpy(afterPointer, ptr, MIN(beforeSize, afterSize));
                }];
            }

At this point, the new memory is allocated (if necessary) and the existing data (if any) has been copied into it. The next step is to deallocate the old memory, if there is any:

            if(beforeSize > 0) {

Before returning it to the operating system, zero it out:

                [self write: ^(void *ptr) {
                    memset(ptr, 0, beforeSize);
                }];

This is probably unnecessary. The munmap documentation doesn’t guarantee that memory is zeroed out, but it should be safe since the only way to get the memory back again is with a call to mmap, and mmap zeroes all memory before providing it to the caller. However, this is a class with “paranoid” in its name, and that guarantee is a little too indirect for my comfort.

(Aside: there’s a common problem in code like this where the memset call gets optimized away by the compiler. Compilers are intelligent enough to know that writing to a buffer that is then immediately passed to free() is pointless, and the write can be eliminated. While this is correct according to the language standard, it’s undesirable behavior for paranoid, security-conscious code like this. To work around that problem, the memset_s function was introduced in the C11 standard. It does the same thing as memset, but is guaranteed not to be optimized away. It’s available on Apple platforms starting with Mac OS X 10.9 and iOS 7. Fortunately, memset_s is unnecessary here, as the indirection of calling memset in a block, and the fact that the memory is freed using munmap rather than free, mean that it can’t be optimized away. Using plain memset here allows the code to be compatible with earlier OS releases.)

Now that the memory is zeroed, it has to calculate the total size of the original allocation, and the pointer to the beginning of the allocation, taking into account the leading and trailing guard pages:

                size_t guardPagesSize = _pageSize * 2;
                size_t toDeallocate = beforeSize + guardPagesSize;
                char *pointerWithGuards = _memory - _pageSize;

Then a call to munmap deallocates the memory.

                CHECK(munmap(pointerWithGuards, toDeallocate) == 0);
            }

With the new memory allocated and the old memory deallocated, it’s time to update the instance variables:

            _memory = afterPointer;
            _size = newSize;

Finally, the newly allocated memory can be made unreadable and unwriteable:

            [self mprotect: PROT_NONE];

In the case where the two allocations are the same (rounded) size, not much needs to be done. In addition to updating the _size instance variable, it also zeroes out any extra memory beyond the end of the new size in the case where the size shrinks. This ensures that leftover, potentially sensitive data doesn’t remain after the caller expected it to be gone:

        } else {
            if (newSize < _size) {
                [self write:^(void *ptr) {
                    memset((char *)ptr + newSize, 0, _size - newSize);
                }];
            }
            _size = newSize;
        }
    }

Size Getter
Compared to setSize:, the implementation of the size method is somewhat less exciting:

    - (size_t)size {
        return _size;
    }

Reading and Writing
Let’s look at the implementation of read: and write:. Both methods follow the same basic pattern: mprotect the allocation to have some permissions, call a block, then mprotect the allocation again to make it inaccessible. This pattern can be wrapped up in a common method:

    - (void)withProtections: (int)prot call: (void (^)(void))block {
        [self mprotect: prot];
        block();
        [self mprotect: PROT_NONE];
    }

With that available, the read: method just calls withProtections:call: with PROT_READ:

    - (void)read: (void (^)(const void *))block {
        [self withProtections: PROT_READ call: ^{
            block(_memory);
        }];
    }

The write: method is nearly identical, just with PROT_READ | PROT_WRITE:

    - (void)write: (void (^)(void *))block {
        [self withProtections: PROT_READ | PROT_WRITE call: ^{
            block(_memory);
        }];
    }

Tests
I wanted to be sure to test all of the features of this code. A lot of those features involve ensuring that code crashes when it tries to access stuff outside of the provided API. My first approach was to write code that would crash, then use something like PLCrashReporter to catch the crash and resume execution. Unfortunately, this doesn’t play well with the debugger, as lldb strongly insists on stopping execution when the program crashes even if the crash was going to be caught. Since debugging tests is really useful, I didn’t want to settle for that approach.

After a lot of pain and suffering trying to get a custom mach exception handler set up, I realized that I could use mach calls like mach_vm_read and mach_vm_write to perform illegal memory reads and writes without crashing. These calls allow reading and writing memory, but when the given address is inaccessible, they return an error instead of raising a signal. This simplified the test code a lot. I won’t get into details here, but if you’re interested, you can read through the test code on GitHub.

Conclusion
This code defends against a scenario that should never happen, and where you’ve effectively already lost the game. Bugs which allow an attacker to bump into the various protections used here shouldn’t happen in the first place. However, since bugs are inevitable, a layered approach to security can help mitigate their effects. I’m not sure if it’s truly useful in this case, but it’s an interesting exercise to work through, and it doesn’t seem unreasonable to make sensitive data unreadable when it’s not explicitly being used. The techniques used to do that are all fairly straightforward POSIX calls, albeit ones that don’t see a lot of use in normal code.

That’s it for today. Come back next time for more frightening adventures. Friday Q&A is driven by reader ideas, so as always, if you have an idea for a topic that you’d like to see covered here, please send it in!