Translation · Original: Friday Q&A 2014-11-07: Let's Build NSZombie · Author: Mike Ash
Original: https://www.mikeash.com/pyblog/friday-qa-2014-11-07-lets-build-nszombie.html Published: 2014-11-07 Author: Mike Ash Translator: MiMo (mimo-v2.5-pro); code blocks kept in the original English
Original (English)
Source: https://www.mikeash.com/pyblog/friday-qa-2014-11-07-lets-build-nszombie.html
Zombies are a valuable tool for debugging memory management problems. I previously discussed the implementation of zombies, and today I’m going to go one step further and build them from scratch, a topic suggested by Шпирко Алексей.
Review
Zombies detect memory management errors. Specifically, they detect the scenario where an Objective-C object is deallocated and then a message is sent using a pointer to where that object used to be. This is a specific case of the general “use after free” error.
In normal operation, this results in a message being sent to memory that may have been overwritten or returned to the kernel. If the memory has been returned to the kernel, the result is a crash. If it has been overwritten, a crash is possible but not guaranteed. If the memory was overwritten with a new Objective-C object, the message goes to that new object, which is probably completely unrelated to the original one. That can throw an exception for an unrecognized selector, or cause bizarre misbehavior if the new object happens to respond to the message.
It’s also possible that the memory hasn’t been touched and still contains the original object, in a post-dealloc state. This can lead to other interesting and bizarre failures. For example, if the object contains a UNIX file handle, it may call close on a file descriptor twice, which can end up closing a file descriptor owned by some other part of the program, causing a failure far away from the bug.
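To make the double-close failure concrete, here is a minimal sketch in plain C. It assumes a POSIX system where /dev/null exists, and the function name is made up for this illustration. It relies on the POSIX rule that open returns the lowest-numbered free descriptor, so the second open reuses the number that was just closed.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: returns the errno that unrelated code sees after a
   stale close, 0 if the write unexpectedly succeeds, -1 if setup fails. */
int errno_seen_by_victim_after_double_close(void) {
    /* "Object A" opens a descriptor and closes it during deallocation. */
    int stale_fd = open("/dev/null", O_WRONLY);
    if (stale_fd < 0)
        return -1;
    close(stale_fd);

    /* Unrelated code opens a file and is handed the same number back. */
    int other_fd = open("/dev/null", O_WRONLY);
    if (other_fd != stale_fd) {
        if (other_fd >= 0)
            close(other_fd);
        return -1;
    }

    /* The bug: A's cleanup runs again on the dead object and closes the
       number that now belongs to the unrelated code. */
    close(stale_fd);

    /* The victim fails here, far away from the code that caused it. */
    if (write(other_fd, "x", 1) == -1)
        return errno;
    return 0;
}
```

In a single-threaded program the function should return EBADF: the unrelated code's descriptor was closed out from under it.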
ARC has greatly reduced the frequency of these errors, but it hasn’t eliminated them altogether. These problems can still occur due to problems with multithreading, interactions with non-ARC code, mismatched method declarations, or type system abuse that strips or changes ARC storage modifiers.
Zombies hook into object deallocation. Instead of freeing the underlying memory as the last step in object deallocation, zombies change the object to a new zombie class which intercepts all messages sent to it. Any message sent to a zombie object results in a diagnostic error message instead of the bizarre behavior you get in normal operation. There is also a mode that rewrites the class and then frees the memory anyway, but it is much less useful because the memory typically gets reused quickly, so I’ll ignore that option here.
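The idea can be modeled in a few lines of plain C. This is a toy sketch, not the Objective-C runtime: ToyClass, ToyObject, and every function name here are invented for illustration. Each object starts with a class pointer, every message goes through the class's single handler, and the final release swaps the class pointer instead of freeing the memory.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct ToyObject ToyObject;

/* A toy "class": a name plus one function that handles every message. */
typedef struct ToyClass {
    const char *name;
    const char *original_name;  /* only meaningful on zombie classes */
    int (*handle)(ToyObject *self, const char *selector);
} ToyClass;

struct ToyObject {
    const ToyClass *isa;  /* class pointer, playing the role of isa */
    int retain_count;
};

/* Holds the most recent diagnostic so callers can inspect it. */
static char last_diagnostic[128];

/* A live object handles the message normally. */
int live_handle(ToyObject *self, const char *selector) {
    (void)self;
    (void)selector;
    return 1;
}

/* A zombie class intercepts every message and reports it instead. */
int zombie_handle(ToyObject *self, const char *selector) {
    snprintf(last_diagnostic, sizeof last_diagnostic,
             "Selector %s sent to deallocated instance of class %s",
             selector, self->isa->original_name);
    return 0;
}

static const ToyClass Widget = { "Widget", NULL, live_handle };
static const ToyClass ZombieWidget = { "MAZombie_Widget", "Widget", zombie_handle };

ToyObject *toy_new(void) {
    ToyObject *obj = malloc(sizeof *obj);
    if (obj == NULL)
        abort();
    obj->isa = &Widget;
    obj->retain_count = 1;
    return obj;
}

/* The last release swaps the class instead of calling free(). The memory
   is leaked on purpose so that stale pointers keep pointing at a zombie. */
void toy_release(ToyObject *obj) {
    if (--obj->retain_count == 0)
        obj->isa = &ZombieWidget;
}

int toy_send(ToyObject *obj, const char *selector) {
    return obj->isa->handle(obj, selector);
}
```

Calling toy_send on an object after its last toy_release returns 0 and leaves a message naming the selector and the original class in last_diagnostic, which is the behavior the real implementation aims for.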
To write our own zombies implementation, we need to hook object deallocation and build the appropriate zombie classes. Let’s get started!
Catching All Messages
If we make a root class without any methods, then any message sent to an instance of that class will go into the runtime’s forwarding machinery. This would seem to make forwardInvocation: a natural point to catch messages. However, that one happens a bit too late. Before forwardInvocation: can run, the runtime needs a method signature to construct an NSInvocation object, and that means that methodSignatureForSelector: runs first. This, then, is the override point for catching messages sent to a zombie object.
Dynamically Allocated Classes
In addition to the selector that was sent, zombies also remember the original class of the object. However, there may not be any room in the object’s memory to store a reference to that original class. If the original class had no additional instance variables, then there’s no space that can be repurposed for storage. The original class must therefore be stored in the zombie class rather than in the zombie object, and that means the zombie class needs to be dynamically allocated. Each class which has an instance that becomes a zombie will get its own zombie class.
The next question is where to store the reference to the original class. It’s possible to allocate a class with some extra storage for things like this, but it’s somewhat inconvenient to use. An easier way is to simply use the class name. Since Objective-C classes all live in one big namespace, the class name is sufficient to uniquely identify it within a process. By sticking a prefix on the original class name to generate the zombie class name, we end up with something that’s both descriptive on its own and can be used to recover the original class name. We’ll use MAZombie_ as the prefix.
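The naming scheme is easy to sketch on plain C strings. The two helper names below are hypothetical and only illustrate the round trip: add the prefix to build the zombie class name, then skip the prefix to recover the original.

```c
#include <stdio.h>
#include <string.h>

#define ZOMBIE_PREFIX "MAZombie_"

/* Writes the zombie class name for class_name into out.
   Returns 0 on success, -1 if out is too small to hold it. */
int zombie_name_for_class(const char *class_name, char *out, size_t out_size) {
    int written = snprintf(out, out_size, "%s%s", ZOMBIE_PREFIX, class_name);
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}

/* Returns the original class name embedded in zombie_name, or NULL if
   the name does not start with the zombie prefix. */
const char *original_name_from_zombie(const char *zombie_name) {
    size_t prefix_len = strlen(ZOMBIE_PREFIX);
    if (strncmp(zombie_name, ZOMBIE_PREFIX, prefix_len) != 0)
        return NULL;
    return zombie_name + prefix_len;
}
```

For NSIndexSet this produces MAZombie_NSIndexSet, and stripping the prefix gives NSIndexSet back. The prefix check is an extra safety step in this sketch. The article's code can skip it because only zombie classes ever receive the zombie method implementation.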
Method Implementations
Note that all of the code here is built without ARC, since ARC memory management calls really get in the way here.
Let’s start off with a simple method implementation, which is an empty one:
    void EmptyIMP(id obj, SEL _cmd) {}
It turns out that the Objective-C runtime assumes that every class implements +initialize. This is sent to a class before the first message sent to the class to allow it to do any setup it needs. If it’s not implemented, the runtime sends it anyway and hits the forwarding machinery instead, which isn’t helpful here. Adding an empty implementation of +initialize avoids that problem. EmptyIMP will be used as the implementation of +initialize on zombie classes.
The implementation of -methodSignatureForSelector: is a bit more interesting:
    NSMethodSignature *ZombieMethodSignatureForSelector(id obj, SEL _cmd, SEL selector) {
It retrieves the class of the object and that class’s name. This is the name of the zombie class:
        Class class = object_getClass(obj);
        NSString *className = NSStringFromClass(class);
The original class name can be retrieved by stripping off the prefix:
        className = [className substringFromIndex: [@"MAZombie_" length]];
Then it logs the error and calls abort() to make sure you’re paying attention:
        NSLog(@"Selector %@ sent to deallocated instance %p of class %@", NSStringFromSelector(selector), obj, className);
        abort();
    }
Creating the Classes
The ZombifyClass function takes a normal class and returns a zombie class, creating it if necessary:
    Class ZombifyClass(Class class) {
The zombie class name is useful both for checking to see if a zombie class exists and for creating it if it doesn’t:
        NSString *className = NSStringFromClass(class);
        NSString *zombieClassName = [@"MAZombie_" stringByAppendingString: className];
The existence of the zombie class can be checked using NSClassFromString. This also provides the zombie class so it can be returned immediately if it exists:
        Class zombieClass = NSClassFromString(zombieClassName);
        if(zombieClass) return zombieClass;
Note that there’s a race condition here: if two instances of the same class are zombified from two threads simultaneously, they’ll both try to create the zombie class. In real code, you’d need to wrap this whole chunk of code in a lock to ensure that doesn’t happen.
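One way to picture that lock is the sketch below. It is plain C with a pthread mutex, and it uses a hypothetical in-memory table as a stand-in for the runtime's class registry. The point is only that the lookup and the creation sit inside the same critical section, so two threads can never both see the class as missing.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ZOMBIE_CLASSES 64

/* Hypothetical stand-in for the runtime's table of registered classes. */
static char *registered_names[MAX_ZOMBIE_CLASSES];
static int registered_count = 0;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns the table index for zombie_name, creating the entry on first
   use. Returns -1 if the table is full or memory runs out. */
int zombie_class_index(const char *zombie_name) {
    int index = -1;

    pthread_mutex_lock(&registry_lock);

    /* Check step: look for an existing entry. */
    for (int i = 0; i < registered_count; i++) {
        if (strcmp(registered_names[i], zombie_name) == 0) {
            index = i;
            break;
        }
    }

    /* Create step: still under the lock, so no other thread can have
       registered the same name since the check above. */
    if (index < 0 && registered_count < MAX_ZOMBIE_CLASSES) {
        char *copy = malloc(strlen(zombie_name) + 1);
        if (copy != NULL) {
            strcpy(copy, zombie_name);
            registered_names[registered_count] = copy;
            index = registered_count++;
        }
    }

    pthread_mutex_unlock(&registry_lock);
    return index;
}
```

Repeated calls with the same name return the same index and create only one entry. In the real function the equivalent critical section would start before the NSClassFromString check and end after objc_registerClassPair.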
A call to the objc_allocateClassPair function allocates the zombie class:
        zombieClass = objc_allocateClassPair(nil, [zombieClassName UTF8String], 0);
We add the implementation of -methodSignatureForSelector: using the class_addMethod function. The signature of "@@::" means that it returns an object, and takes three parameters: an object (self), a selector (_cmd), and another selector (the explicit selector parameter):
        class_addMethod(zombieClass, @selector(methodSignatureForSelector:), (IMP)ZombieMethodSignatureForSelector, "@@::");
The empty method is also added as the implementation of +initialize. There’s no separate function for adding class methods. Instead, we add a method to the class’s class, which is the metaclass:
        class_addMethod(object_getClass(zombieClass), @selector(initialize), (IMP)EmptyIMP, "v@:");
Now that the class is set up, it can be registered with the runtime and returned:
        objc_registerClassPair(zombieClass);
        return zombieClass;
    }
Zombifying Objects
In order to turn objects into zombies, we’ll replace the implementation of NSObject’s dealloc method. Subclasses’ dealloc methods will still run, but once they go up the chain to NSObject, the zombie code will run. This will prevent the object from being destroyed, and provides a place to set the object’s class to the zombie class. This operation gets wrapped up into a function to enable zombies:
    void EnableZombies(void) {
        Method m = class_getInstanceMethod([NSObject class], @selector(dealloc));
        method_setImplementation(m, (IMP)ZombieDealloc);
    }
We can then put a call to EnableZombies at the top of main() or similar, and the rest takes care of itself. The implementation of ZombieDealloc is straightforward. It calls ZombifyClass to obtain the zombie class for the object being deallocated, then uses object_setClass to change the class of the object to the zombie class:
    void ZombieDealloc(id obj, SEL _cmd) {
        Class c = ZombifyClass(object_getClass(obj));
        object_setClass(obj, c);
    }
Testing
Let’s make sure it works:
    NSIndexSet *obj = [[NSIndexSet alloc] init];
    [obj release];
    [obj count];
I chose NSIndexSet semi-arbitrarily, as a convenient class that doesn’t hit CoreFoundation bridging weirdness. Running this code with zombies enabled produces:
    a.out[5796:527741] Selector count sent to deallocated instance 0x100111240 of class NSIndexSet
Success!
Conclusion
Zombies are fairly simple to implement in the end. By dynamically allocating classes, we can easily keep track of the original class without needing to rely on storage within the zombie object. methodSignatureForSelector: provides a convenient choke point for intercepting messages sent to the zombie object. A quick hook on -[NSObject dealloc] lets us turn objects into zombies instead of destroying them when their retain count goes to zero.
That’s it for today. Come back next time for more frightening tales. Until then, keep sending in your suggestions for topics.