Translation · Original: Friday Q&A 2011-06-03: Objective-C Blocks vs. C++0x Lambdas: Fight! · by Mike Ash
Original: https://www.mikeash.com/pyblog/friday-qa-2011-06-03-objective-c-blocks-vs-c0x-lambdas-fight.html Published: 2011-06-03 Author: Mike Ash Translator: MiMo (mimo-v2.5-pro); code blocks kept in the original English
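Blocks are probably the most important new language feature Apple has introduced in recent years, and I have written a lot about them before. The new C++ standard, C++0x, introduces a similar feature: lambda expressions. Today I want to discuss how the two features are alike and how they differ, a topic suggested by David Dunham.
Terminology
I will refer to Apple’s blocks extension as “Objective-C blocks”, even though this is not entirely accurate. They are actually an extension to C (and can even be used in C++), with some extra behaviors that make them more useful in Objective-C. At the implementation level, however, they are deeply intertwined with Objective-C, and “C blocks” is too vague, so I think “Objective-C blocks” is the best way to refer to them.
C++0x lambda expressions are a C++-only feature and cannot be used from C. Assuming the compiler supports C++0x, they should be usable in Objective-C++.
As a generic term covering both Objective-C blocks and C++0x lambda expressions, I will use “anonymous function”.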
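Syntax
Objective-C blocks and C++0x lambda expressions share the same basic goal: to allow writing anonymous inline functions. Whether they are called closures, blocks, lambdas, or anonymous functions, they are a common feature of higher-level languages. They are very useful for building convenient, concise libraries for array iteration, multithreading, delayed computation, and many other tasks.
As lower-level languages, C and C++ had no concept of anonymous functions. Adding them required new syntax, and as a result Objective-C blocks and C++0x lambdas ended up with somewhat different syntax. An empty Objective-C block looks like this:

    ^{}

An empty C++0x lambda looks like this:

    []{}

An anonymous function can take arguments by writing them in parentheses after the leading part, in the style of function arguments:

    ^(int x, NSString *y){} // ObjC, take int and NSString*
    [](int x, std::string y){} // C++, take int and std::string

Both forms can return a value, and the return type can be inferred:

    ^{ return 42; } // ObjC, returns int
    []{ return 42; } // C++, returns int

A C++0x lambda with a more complicated body, however, needs an explicit return type. This one is invalid:

    []{ if(something) return 42; else return 43; }

In contrast, Objective-C blocks support return type inference no matter how complicated the code inside the block is. If there is no return statement, the type is inferred as void. Otherwise, the compiler examines every return statement in the block. If they all return the same type, the block’s return type is inferred to be that type. If they conflict, an error is generated. The Objective-C counterpart of the invalid C++0x lambda example therefore works fine:

    ^{ if(something) return 42; else return 43; }

With Objective-C blocks, the return type is declared immediately after the ^. In C++0x, the return type is declared by placing ->type after the lambda’s argument list. Here are examples with explicit return types, which makes them valid in both languages:

    id (^block)(void) = ^ id (void) { ... };
    auto lambda = [] () -> id { ... };
    []()->int { if(something) return 42; else return 43; }
    ^int { if(something) return 42; else return 43; }
    ^int (void) { if(something) return 42; else return 43; }

Type
Objective-C blocks introduce a new class of language-level types to represent block types. Their syntax matches the standard (but tricky) syntax for C function pointer types, with a ^ in place of the *:

    void (*)(int) // function pointer taking int and returning void
    void (^)(int) // block taking int and returning void

C++0x takes a completely different approach. A lambda has a unique anonymous type that implements the function call operator, operator(). In other words, you can call it, but you do not have an accessible type the way you do with Objective-C blocks. To store a lambda in a variable, pass it to a function, or return it from a function, you must use C++ templates or the C++0x auto keyword (which infers the type of a variable from the type of its initializer).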
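Captured Variables
One of the most important features of these anonymous functions is the ability to capture variables from the enclosing scope. For example:

    int x = 42;
    void (^block)(void) = ^{ printf("%d\n", x); };
    block(); // prints 42

As shown above, Objective-C blocks capture variables from the enclosing scope simply by referring to them inside the block. By default, every captured variable is copied at the point where the block is created and cannot be modified from within the block.
If a block captures another block variable, that block’s memory is managed automatically: it is copied and released as necessary. Objective-C object pointers are also managed automatically, by retaining and releasing them as necessary. This deep, implicit integration with Objective-C memory management makes many tasks significantly easier, because most of the time blocks automatically do the right thing with the variables they capture.
When a block needs to modify a captured variable, the variable must be declared with the special __block qualifier to mark it as mutable:

    __block int x = 42;
    void (^block)(void) = ^{ x = 43; };
    block(); // x is now 43

Unsurprisingly, C++0x lambda expressions are considerably more flexible in this area, and also considerably more complicated. The overall philosophy of C++ appears to be to give the programmer as many tools and choices as possible, which has its pros and cons.
The square brackets [] that begin a C++0x lambda control how local variables are captured. If they are left empty like this, no variables can be captured at all. To capture variables, you must tell the compiler explicitly what you want.
The most direct way is to list the variables to capture inside the []. Any variable listed by name alone is captured by value. Any variable listed with a leading & is captured by reference. For example:

    int x = 42;
    int y = 99;
    auto lambda = [x, &y]{ y = 100; };
    lambda(); // y is now 100

With the mutable keyword, the lambda can also modify its own copy of a variable captured by value; the original variable is unaffected:

    int x = 42;
    int y = 99;
    auto lambda = [x, &y]() mutable { x++, y++; printf("%d, %d\n", x, y); };
    lambda(); // prints 43, 100
    printf("%d, %d\n", x, y); // prints 42, 100
    lambda(); // prints 44, 101!

Listing every variable to capture can be tedious, so a default capture behavior can be set by putting either = or & inside the []. For example:

    int x = 42;
    int y = 99;
    auto lambda = [&] { x++, y++; };
    lambda(); // x, y are now 43, 100

A default can be combined with explicit exceptions:

    int x = 42;
    int y = 99;
    int z = 1001;
    auto lambda = [=, &z] {
        // can't modify x or y here, but we can read them
        z++;
        printf("%d, %d, %d\n", x, y, z);
    };
    lambda(); // prints 42, 99, 1002
    // z is now 1002

Memory Management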
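Objective-C blocks and C++0x lambdas both start their lives as stack objects. After that point, however, they diverge significantly.
Objective-C blocks are also Objective-C objects. Like all Objective-C objects, they are stored by reference, never by value. When a block literal is written, the block object is created on the stack, and the literal expression evaluates to the address of that block.
For a block to outlive its slot on the stack, it must be copied. Because the value is only a reference, simply assigning it with = is not enough:

    void (^block)(void);
    {
        block = ^{ printf("hello world"); };
    }
    block(); // bad!

Copying the block before its scope ends makes the call safe:

    void (^block)(void);
    {
        block = ^{ printf("hello world"); };
        block = [block copy];
    }
    block(); // good!

C++0x lambda expressions are stored by value, not by reference. They can be copied onto the heap if needed, but the process is entirely manual. All captured variables are stored as member variables of the anonymous lambda object, so when the lambda is copied, they are copied as well, firing the appropriate constructors and destructors.
One extremely important aspect of this behavior is that variables captured by reference are stored as references within the lambda object. They get no special treatment in this respect. This means that a lambda which accesses one of those variables after the original enclosing scope has been destroyed invokes undefined behavior and will likely crash. Compare this to __block variables, whose storage is transparently moved to the heap and is guaranteed to live at least as long as the block does.
On the other hand, as long as nothing is captured by reference, a C++0x lambda can be returned without any extra work. The return copies it, and the copy continues to work. Objective-C blocks must be explicitly copied before being returned, otherwise they become invalid immediately.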
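Performance
Objective-C blocks are objects that contain an embedded function pointer. A block call translates to a call to that function pointer, passing the block itself as an implicit parameter:

    block();
    // equivalent to:
    block->impl(block);

Opportunities for optimization are rare in most use cases. For example, this code calls a method to iterate over an array with a block:

    [array do: ^(id obj) { NSLog(@"Obj is %@", obj); }];

The cases where blocks can be optimized are mostly cases where they are not needed in the first place, such as a block that is defined and then called in the same scope. One place where useful optimizations could be made is inline functions that take block parameters, since the optimizer can improve the inlined code based on the calling code. However, as far as I know, no current blocks-capable compiler performs any of these optimizations, although I have not investigated it thoroughly.
C++0x lambdas are objects with an operator(). No dynamic dispatch is involved, so calling a lambda works out to be a simple function call, with no pointer dereference beforehand.
Because passing a lambda to another function involves templates, there are further opportunities for optimization. Consider this code to iterate over a vector:

    for_each(v.begin(), v.end(), [](int x) { printf("x is %d\n", x); });

Because for_each is a template, the compiler sees the lambda’s exact type at the call site and can inline the call. This is a typical tradeoff between C++ and Objective-C. C++ often favors the fastest possible generated code at the level of individual functions, sacrificing ease and speed of compilation, and sometimes ease of programming, to get it. Objective-C more often favors implementations that are simpler to create, compile, and use, at the cost of additional runtime overhead.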
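Conclusion
Objective-C blocks and C++0x lambdas are similar language features with similar goals but considerably different approaches. Objective-C blocks are somewhat simpler to write and to use, especially for asynchronous or background tasks where the block has to be copied and kept alive beyond the lifetime of the scope where it was created. C++0x lambdas ultimately provide more flexibility and potential speed, but at the cost of considerable added complexity. Comparing the two, I believe that Apple ultimately made the better set of tradeoffs, at least for the cases where I am likely to use blocks in my own programming.
That wraps things up for this week. Come back in 14 days for the next baffling edition. As I say every time, Friday Q&A is driven by reader ideas. If you have a topic that you would like to see covered here, please send it in.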
Original (English)
Source: https://www.mikeash.com/pyblog/friday-qa-2011-06-03-objective-c-blocks-vs-c0x-lambdas-fight.html
Blocks are perhaps the most significant new language feature introduced by Apple in years, and I’ve written a lot about them before. The new C++ standard, C++0x, introduces lambdas, a similar feature. Today, I want to discuss the two features and how they are alike and how they differ, a topic suggested by David Dunham.
Terminology
I will refer to Apple’s blocks extension as “Objective-C blocks” even though this is not entirely correct. They are actually an addition to C (and can even be used in C++), with some extra behaviors to make them more useful in Objective-C. However, they are deeply intertwined with Objective-C in their implementation, and “C blocks” is vague, so I think that “Objective-C blocks” is the best way to refer to them here.
C++0x lambdas are part of C++ only and can’t be used from C. Presumably they can be used in Objective-C++ if the compiler supports C++0x.
For a generic term to refer to both Objective-C blocks and C++0x lambdas, I will use “anonymous function”.
Syntax
Both Objective-C blocks and C++0x lambdas have the same basic goal: to allow writing anonymous inline functions. Called closures, blocks, lambdas, or just anonymous functions, these are a common feature in higher level languages. They are extremely useful for building convenient, succinct libraries for things like array iteration, multithreading, delayed computation, and many others.
As lower level languages, C and C++ had no concept of anonymous functions. To add them, new syntax had to be created. Because of this, Objective-C blocks and C++0x lambdas ended up with somewhat different syntax. An empty Objective-C block looks like this:

    ^{}

An empty C++0x lambda looks like this:

    []{}

The anonymous function can take arguments by writing them in parentheses, in the style of function arguments, after the leading bit:

    ^(int x, NSString *y){} // ObjC, take int and NSString*
    [](int x, std::string y){} // C++, take int and std::string

Both forms can return a value, and the return type can be inferred:

    ^{ return 42; } // ObjC, returns int
    []{ return 42; } // C++, returns int

A C++0x lambda with a more complicated body, however, needs an explicit return type. This one is invalid:

    []{ if(something) return 42; else return 43; }

In contrast, Objective-C blocks do return type inference no matter how complicated the code is inside of the block. If no return statements are present, the type is inferred as void. Otherwise, the compiler examines all of the return statements in the block. If they all return the same type, then the return type of the block is inferred to be that same type. If they conflict, an error is generated. Thus, the equivalent Objective-C example to the invalid C++0x lambda example works fine:

    ^{ if(something) return 42; else return 43; }

With Objective-C blocks, the return type is declared immediately after the ^. In C++0x, the return type is declared by placing ->type after the lambda’s argument list. Here are those two examples with explicit return types, which makes them valid in both languages:

    []()->int { if(something) return 42; else return 43; }
    ^int { if(something) return 42; else return 43; }
    ^int (void) { if(something) return 42; else return 43; }

Type
Objective-C blocks introduce a new class of language-level types to represent block types. They match the standard (but tricky) syntax for C function pointer types, but with a ^ in place of the *:

    void (*)(int) // function pointer taking int and returning void
    void (^)(int) // block taking int and returning void

C++0x takes a completely different approach. Lambdas have a unique anonymous type which implements operator(). In other words, you can call it, but otherwise you don’t have an accessible type the way Objective-C blocks do. In order to store them in variables, pass them to functions, or return them from functions, C++ templates or the C++0x auto keyword (which infers the type of a variable from the type of its initializer) must be used.
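To make the point about unnamed lambda types concrete, here is a minimal C++11 sketch of the two approaches the paragraph names: auto for a local variable and a template parameter for passing the lambda to a function. The names apply_twice, apply_erased, and demo_lambda_types are hypothetical. The std::function variant is an addition not discussed in the article: it is the standard library's type-erased wrapper, shown only as one way to get a nameable type, at the cost of an indirect call.

```cpp
#include <functional>

// F is deduced as the lambda's unnamed closure type, so no type name is needed.
template <typename F>
int apply_twice(F f, int value) {
    return f(f(value));
}

// std::function<int(int)> gives the callable a nameable type through type
// erasure (an addition for illustration; the article does not cover it).
int apply_erased(const std::function<int(int)>& f, int value) {
    return f(value);
}

// Hypothetical demo: stores a lambda with auto and passes it both ways.
int demo_lambda_types() {
    auto add_one = [](int x) { return x + 1; };   // auto holds the closure type
    int via_template = apply_twice(add_one, 40);  // 40 -> 41 -> 42
    int via_erased = apply_erased(add_one, 41);   // 41 -> 42
    return via_template + via_erased;
}
```

Calling demo_lambda_types() exercises both paths with the same lambda; the template version keeps the exact closure type visible to the compiler, while the std::function version hides it behind a fixed signature.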
Captured Variables
One of the most significant features of these anonymous functions is the ability to capture variables from the enclosing scope. For example:

    int x = 42;
    void (^block)(void) = ^{ printf("%d\n", x); };
    block(); // prints 42

Objective-C blocks can capture variables from the enclosing scope by simply referring to them within the block, as seen above. By default, all captured variables are copied at the point where the block is created and cannot be modified from within the block.
If a block captures another block variable, that block’s memory is automatically managed. It’s copied and released as necessary. Objective-C object pointers are also automatically managed by retaining and releasing them as necessary. This deep, implicit integration with Objective-C memory management makes a lot of tasks significantly easier, because most of the time blocks automatically do the right thing with the variables they capture.
For cases where the block needs to be able to modify a captured variable, the variable must be declared with the special __block qualifier to mark it as being mutable:

    __block int x = 42;
    void (^block)(void) = ^{ x = 43; };
    block(); // x is now 43

It should come as no surprise that C++0x lambdas offer considerably greater flexibility but also considerably greater complication in this area. The overall philosophy of C++ appears to be to give the programmer as many tools and choices as possible, which has its pros and cons.
The initial [] which begins a C++0x lambda controls how local variables are captured. If it’s left empty like that, then no variables can be captured at all. In order to capture variables, it’s necessary to tell the compiler what you want to do.
The most explicit way to do this is to list the variables to be captured inside the []. Any variable listed directly by name is captured by value. Any variable listed with a leading & is captured by reference. For example:

    int x = 42;
    int y = 99;
    auto lambda = [x, &y]{ y = 100; };
    lambda(); // y is now 100

With the mutable keyword, the lambda can also modify its own copy of a variable captured by value; the original variable is unaffected:

    int x = 42;
    int y = 99;
    auto lambda = [x, &y]() mutable { x++, y++; printf("%d, %d\n", x, y); };
    lambda(); // prints 43, 100
    printf("%d, %d\n", x, y); // prints 42, 100
    lambda(); // prints 44, 101!

It can be inconvenient to list every variable to be captured, so it is possible to specify a default capture behavior by putting either = or & within the []. For example:

    int x = 42;
    int y = 99;
    auto lambda = [&] { x++, y++; };
    lambda(); // x, y are now 43, 100

A default can be combined with explicit exceptions:

    int x = 42;
    int y = 99;
    int z = 1001;
    auto lambda = [=, &z] {
        // can't modify x or y here, but we can read them
        z++;
        printf("%d, %d, %d\n", x, y, z);
    };
    lambda(); // prints 42, 99, 1002
    // z is now 1002

Memory Management
Both Objective-C blocks and C++0x lambdas start their lives as stack objects. After that point, however, they diverge significantly.
Objective-C blocks are also Objective-C objects. Like all Objective-C objects, they are stored by reference, never by value. When a block literal is written, the block object is created on the stack, and the literal expression evaluates to the address of that block.
In order for a block to outlive its slot on the stack, it must be copied. Because the value is just a reference, simply assigning it with = is not enough:

    void (^block)(void);
    {
        block = ^{ printf("hello world"); };
    }
    block(); // bad!

Copying the block before its scope ends makes the call safe:

    void (^block)(void);
    {
        block = ^{ printf("hello world"); };
        block = [block copy];
    }
    block(); // good!

C++0x lambdas are stored by value, not by reference. They can be copied onto the heap if needed, but the process is entirely manual. All captured variables are stored as member variables within the anonymous lambda object, so when the lambda is copied, those get copied as well, firing the appropriate constructors and destructors.
One extremely important aspect of this behavior is that variables which are captured by reference are stored as references within the lambda object. They get no special treatment in this respect. This means that a lambda which accesses one of those variables after the original enclosing scope has been destroyed is engaging in undefined behavior and will likely crash. Compare this to __block variables, where the storage is transparently moved to the heap and is guaranteed to live at least as long as the block does.
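As a rough illustration of the lifetime problem described above, the following C++11 sketch shows one common way to give a lambda mutable state that outlives the enclosing scope: capture a std::shared_ptr by value instead of capturing a local variable by reference. This is only loosely analogous to what __block does automatically, and it is an assumption of this sketch rather than anything the article prescribes. The name make_counter is hypothetical, and std::function is used because C++11 cannot spell a closure type in a return type.

```cpp
#include <functional>
#include <memory>

// Returns a counter whose state outlives this function's scope.
// Capturing a local int by reference here instead would leave the returned
// lambda holding a dangling reference once make_counter returns.
std::function<int()> make_counter() {
    // The shared_ptr is captured by value, so the heap-allocated int stays
    // alive as long as any copy of the lambda exists.
    auto count = std::make_shared<int>(0);
    return [count]() { return ++*count; };
}
```

Copies of the returned counter share the same heap state, much as copies of a block share a __block variable; the difference is that in C++ this sharing has to be arranged by hand.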
On the other hand, as long as nothing is captured by reference, a C++0x lambda can be returned without any extra work. The return will copy it and the copy will continue to function. With Objective-C blocks, they must be explicitly copied before returning, otherwise they become immediately invalid.
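A small C++11 sketch of returning a lambda that captures only by value, as described above. The name make_adder is hypothetical. Since C++11 cannot name the closure type in a return type, the sketch wraps the lambda in std::function, which is a choice of this example and not something the article discusses.

```cpp
#include <functional>

// Returns a lambda that captured n by value. The return copies the closure
// (including its copy of n), so nothing refers back to this function's stack.
std::function<int(int)> make_adder(int n) {
    return [n](int x) { return x + n; };
}
```

A caller can keep the result of make_adder(5) for as long as it likes, and copy it freely, because each copy carries its own n.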
Performance
Objective-C blocks are objects which contain an embedded function pointer. A block call translates to a call to that function pointer, passing the block as an implicit parameter:

    block();
    // equivalent to:
    block->impl(block);

Opportunities for optimization are rare in most use cases. For example, this code calls a method to iterate over an array with a block:

    [array do: ^(id obj) { NSLog(@"Obj is %@", obj); }];

The cases where blocks can be optimized are mostly cases where they’re not needed in the first place, for example where they are defined and then called in the same scope. One place where useful optimizations could be made is inline functions which take block parameters, since the optimizer is able to improve the inlined code based on the calling code. However, as far as I know, no current blocks-capable compilers perform any of these optimizations, although I haven’t investigated it thoroughly.
C++0x lambdas are objects with an operator(). There is no dynamic dispatch involved, so calling the lambda works out to be a simple function call, with no dereferencing ahead of time.
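To show what “objects with an operator()” means in practice, here is a sketch of a hand-written class that behaves like a small lambda. The names AddOffset and call_both are hypothetical, and the struct only approximates what a compiler generates for a closure; the real closure type is unnamed and its layout is up to the implementation.

```cpp
// Roughly what a compiler might generate for:
//     [offset](int x) { return x + offset; }
struct AddOffset {
    int offset;                      // a by-value capture becomes a member
    int operator()(int x) const {    // const, because the lambda is not mutable
        return x + offset;
    }
};

// Calls the lambda and the hand-written functor with the same inputs.
// Both calls resolve statically to a known operator(), with no function
// pointer lookup at run time.
int call_both(int offset, int x) {
    auto lambda = [offset](int v) { return v + offset; };
    AddOffset functor = {offset};
    return lambda(x) + functor(x);
}
```

For example, call_both(2, 40) adds 42 from the lambda to 42 from the functor.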
Because passing a lambda to another function involves templates, there are further opportunities for optimization. Consider this code to iterate over a vector:

    for_each(v.begin(), v.end(), [](int x) { printf("x is %d\n", x); });

Because for_each is a template, the compiler sees the lambda’s exact type at the call site and can inline the call. This is a typical tradeoff between C++ and Objective-C. C++ often favors the fastest possible generated code at the level of individual functions, sacrificing ease and speed of compilation and sometimes ease of programming to get it. Objective-C more often favors implementations which are simpler to create, compile, and use, at the cost of additional runtime overhead.
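The for_each example can be unpacked with a simplified stand-in for the standard algorithm. The sketch below is not the library's implementation; my_for_each and sum_with_lambda are hypothetical names. It shows why templates help: Func is deduced as the lambda's own closure type, so each call site gets a specialization in which the call f(*first) targets a known function that the optimizer is free to inline.

```cpp
#include <vector>

// Simplified stand-in for std::for_each. Func is the lambda's closure type,
// so f(*first) is a direct call to a statically known operator().
template <typename Iter, typename Func>
Func my_for_each(Iter first, Iter last, Func f) {
    for (; first != last; ++first) {
        f(*first);
    }
    return f;
}

// Sums a vector by passing a lambda that captures total by reference.
int sum_with_lambda(const std::vector<int>& v) {
    int total = 0;
    my_for_each(v.begin(), v.end(), [&total](int x) { total += x; });
    return total;
}
```

Whether the call is actually inlined depends on the compiler and optimization level; the template only makes that optimization possible.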
Conclusion
Objective-C blocks and C++0x lambdas are similar language features with similar goals but considerably different approaches. Objective-C blocks are somewhat simpler to write and to use, especially in the case of using them for asynchronous or background tasks where the block has to be copied and kept alive beyond the lifetime of the scope where it was created. C++0x lambdas ultimately provide more flexibility and potential speed, but at the cost of considerable added complexity. In comparing the two, I believe that Apple ultimately made the better set of tradeoffs, at least for the cases where I am likely to use blocks in my own programming.
That wraps things up for this week. Come back in another 14 days for the next baffling edition. As I say every time, Friday Q&A is driven by reader ideas. If you have a topic that you would like to see covered here, please send it in.