Intro to GCD, Part I: Basics and Dispatch Queues

By TommyWu

Translation. Original: Friday Q&A 2009-08-28: Intro to Grand Central Dispatch, Part I: Basics and Dispatch Queues, by Mike Ash

Source: https://www.mikeash.com/pyblog/friday-qa-2009-08-28-intro-to-grand-central-dispatch-part-i-basics-and-dispatch-queues.html
Published: 2009-08-28. Author: Mike Ash. Translator: MiMo (mimo-v2.5-pro). Code blocks are kept as in the English original.


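Welcome back to Friday Q&A. This week's edition coincides with Apple's release of Snow Leopard, so I'm taking the opportunity to open up discussion of technologies that were previously under NDA and to look at some of the notable new features in Snow Leopard. This week I'm starting a planned series on Grand Central Dispatch (GCD), a topic suggested by Chris Liscio.

What Is GCD?

Grand Central Dispatch (GCD) is a low-level API that introduces a new way to do concurrent programming. In its basic functionality it is somewhat like NSOperationQueue: it lets a program's work be split into independent tasks, which are then submitted to work queues to run concurrently or serially. It is lower level and higher performance than NSOperationQueue, and it is not part of the Cocoa frameworks.

Besides running code in parallel, GCD provides a fully integrated event handling system. Handlers can be set up to respond to events on file descriptors, Mach ports, processes, timers, and signals, as well as to user-generated events. These handlers run through GCD's own facilities for concurrent execution.

GCD's API relies heavily on blocks, which I covered in earlier Friday Q&As, first introducing the basics of blocks and then discussing the practical side of using them in real code. GCD can also be used through the traditional C mechanism of supplying a function pointer and a context pointer, but it is far more convenient with blocks and, from a practical standpoint, more capable.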
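To make the two calling styles concrete, here is a minimal sketch that doubles an integer once through the function-pointer variant (dispatch_sync_f) and once through the block variant (dispatch_sync). The helper names are invented for this illustration and are not from the article. The synchronous calls are used only so that the result is ready when each helper returns.

```objc
#import <Foundation/Foundation.h>

// Work function for the function-pointer API. All state has to travel
// through the untyped context pointer.
static void doubleInPlace(void *context)
{
    int *value = context;
    *value *= 2;
}

// Hypothetical helper: function pointer plus context pointer.
static int doubledWithFunctionPointer(int input)
{
    int value = input;
    dispatch_queue_t queue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_sync_f(queue, &value, doubleInPlace);
    return value;
}

// Hypothetical helper: the same job as a block. The __block variable
// replaces the hand-rolled context pointer.
static int doubledWithBlock(int input)
{
    __block int value = input;
    dispatch_queue_t queue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_sync(queue, ^{ value *= 2; });
    return value;
}
```

Both helpers compute the same thing. The block version needs no separate work function and no casting, which is the convenience the paragraph above refers to.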

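For documentation on GCD, run man dispatch in a terminal on a Snow Leopard system.

Why Use It?

GCD has many advantages over traditional multithreaded programming:

  • Ease of use: GCD is easier to work with than threads. Because it is built around work units rather than threads of computation, it can take care of common chores such as waiting for work to finish, monitoring file descriptors, running code periodically, and suspending work. The block-based API makes it extremely easy to pass context between different pieces of code.

  • Efficiency: GCD is implemented in a lightweight way, which makes it practical and fast in many places where creating a dedicated thread would cost too much. This ties into ease of use: part of what makes GCD easy is that, most of the time, you can simply use it without worrying much about efficiency.

  • Performance: GCD automatically scales its use of threads to the system load, which reduces context switches and improves computational efficiency.

Dispatch queues and dispatch sources (more on these later) can be suspended and resumed, can have an arbitrary context pointer attached, and can have a finalizer function attached. For more on these facilities, see man dispatch_object.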
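The sketch below exercises those three facilities on a serial queue: suspend and resume, a context pointer, and a finalizer. The helper names and queue labels are made up for this example. It assumes a modern SDK built with ARC, where dispatch objects are released automatically; under the manual memory management of 2009 you would also call dispatch_release on each queue you create.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper: shows that a suspended queue holds its jobs
// until it is resumed. Returns YES if that is what happened.
static BOOL suspendedQueueHoldsJobs(void)
{
    dispatch_queue_t queue = dispatch_queue_create("com.example.suspend", NULL);
    __block BOOL ran = NO;

    dispatch_suspend(queue);
    dispatch_async(queue, ^{ ran = YES; });  // queued, but not started
    BOOL ranWhileSuspended = ran;

    dispatch_resume(queue);
    dispatch_sync(queue, ^{});               // wait for the queued job
    return !ranWhileSuspended && ran;
}

// Hypothetical helper: attaches heap-allocated context to a queue and
// reads it back. The finalizer frees the context when the queue is
// destroyed, some time after the last reference goes away.
static int contextRoundTrip(int value)
{
    dispatch_queue_t queue = dispatch_queue_create("com.example.context", NULL);
    int *box = malloc(sizeof *box);
    *box = value;

    dispatch_set_context(queue, box);
    dispatch_set_finalizer_f(queue, free);

    int *stored = dispatch_get_context(queue);
    return *stored;
}
```

Calls to dispatch_suspend and dispatch_resume must be balanced; a queue that is released while still suspended is a programming error.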

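Dispatch Queues

A fundamental concept in GCD is the dispatch queue. A dispatch queue is an object that accepts jobs and executes them in the order in which they arrive. A dispatch queue is either concurrent or serial. A concurrent queue runs many jobs at once, as the system load allows, much like NSOperationQueue. A serial queue runs only one job at a time.

There are three main types of queues in GCD:

  • The main queue: analogous to the main thread. In fact, jobs submitted to the main queue execute on the process's main thread. The main queue is obtained by calling dispatch_get_main_queue(). Because it is inherently tied to the main thread, it is a serial queue.

  • Global queues: concurrent queues shared by the whole process. There are three of them: a high, a default, and a low priority queue. You get one by calling dispatch_get_global_queue and telling it which priority you want.

  • Custom queues: queues created with the dispatch_queue_create function. (GCD does not call them "custom", but it has no specific name for them, so that is what I call them.) These are serial queues that execute one job at a time. Because of that, they can serve as a synchronization mechanism, much like a mutex in a traditional threaded program.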
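As a small sketch of the three kinds of queue, the code below obtains each one and then demonstrates the ordering guarantee of a custom serial queue. The function names and the queue label are invented for this example, and the code assumes ARC on a modern SDK.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: fetches one queue of each kind and logs its label.
static void logQueueLabels(void)
{
    dispatch_queue_t mainQueue = dispatch_get_main_queue();
    dispatch_queue_t globalQueue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_queue_t customQueue =
        dispatch_queue_create("com.example.custom", NULL);

    NSLog(@"main:   %s", dispatch_queue_get_label(mainQueue));
    NSLog(@"global: %s", dispatch_queue_get_label(globalQueue));
    NSLog(@"custom: %s", dispatch_queue_get_label(customQueue));
}

// Hypothetical helper: submits `count` jobs to a new serial queue and
// returns the order in which they actually ran.
static NSArray *runOrderOnSerialQueue(int count)
{
    dispatch_queue_t queue = dispatch_queue_create("com.example.serial", NULL);
    NSMutableArray *order = [NSMutableArray array];

    for (int i = 0; i < count; i++) {
        // Only this queue touches `order`, so no lock is needed.
        dispatch_async(queue, ^{ [order addObject:@(i)]; });
    }

    // An empty synchronous job runs after everything queued before it,
    // so returning from it means all earlier jobs have finished.
    dispatch_sync(queue, ^{});
    return order;
}
```

On a serial queue the jobs run one at a time in submission order, so the returned array is always in ascending order. The same loop on a global queue would give no such guarantee and would also race on the mutable array.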

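Submitting Jobs

Submitting a job to a queue is easy: call the dispatch_async function and pass it a queue and a block. The queue runs the block when its turn comes. Here is an example that uses a global queue to run a long job in the background:

    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        [self goDoSomethingLongAndInvolved];
        NSLog(@"Done doing something long and involved");
    });

Of course, an NSLog when the work is done is not very useful. In a typical Cocoa application you probably want to update part of the GUI, which in turn means running code on the main thread. Nested dispatches make this easy: the outer one does the background work, and the background block then dispatches onto the main queue, like this:

    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        [self goDoSomethingLongAndInvolved];
        dispatch_async(dispatch_get_main_queue(), ^{
            [textField setStringValue:@"Done doing something long and involved"];
        });
    });

There is also dispatch_sync, which works the same way but waits for the block to finish before returning. Combined with a __block variable, it can be used to get a value back out of the block. For example, code running in the background can fetch a value from a GUI control by dispatching synchronously onto the main queue:

    __block NSString *stringValue;
    dispatch_sync(dispatch_get_main_queue(), ^{
        // __block variables aren't automatically retained
        // so we'd better make sure we have a reference we can keep
        stringValue = [[textField stringValue] copy];
    });
    [stringValue autorelease];
    // use stringValue in the background now

A more asynchronous style avoids blocking the background thread while it waits. The version below ends the background work, fetches the value on the main queue, and then submits the remaining work back to a background queue (myQueue here can be any queue you have on hand, custom or global):

    dispatch_queue_t bgQueue = myQueue;
    dispatch_async(dispatch_get_main_queue(), ^{
        NSString *stringValue = [[[textField stringValue] copy] autorelease];
        dispatch_async(bgQueue, ^{
            // use stringValue in the background now
        });
    });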
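The nested-dispatch pattern can be packaged as a function that takes a reply queue and a completion block. The following is a sketch under that assumption: squareInBackground and its parameters are hypothetical names, and the squaring stands in for real long-running work. In an application the reply queue would usually be dispatch_get_main_queue().

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: does the work on a global queue, then delivers
// the result by dispatching the completion block onto replyQueue.
static void squareInBackground(int input,
                               dispatch_queue_t replyQueue,
                               void (^completion)(int result))
{
    dispatch_queue_t worker =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    dispatch_async(worker, ^{
        int result = input * input;       // stand-in for the slow part
        dispatch_async(replyQueue, ^{
            completion(result);           // runs on the caller's queue
        });
    });
}
```

A caller on the main thread might write squareInBackground(7, dispatch_get_main_queue(), ^(int r) { ... }) and update the GUI inside the block. The call returns immediately, so nothing blocks while the work runs.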

Replacing Locks

Custom queues can be used as a synchronization mechanism in place of locks. In traditional multithreaded programming, you might have an object designed to be usable from multiple threads. To make that work, it protects every access to shared data with a lock, typically stored in an instance variable:

    NSLock *lock;

    - (id)something
    {
        id localSomething;
        [lock lock];
        localSomething = [[something retain] autorelease];
        [lock unlock];
        return localSomething;
    }

    - (void)setSomething:(id)newSomething
    {
        [lock lock];
        if(newSomething != something)
        {
            [something release];
            something = [newSomething retain];
            [self updateSomethingCaches];
        }
        [lock unlock];
    }

With GCD, the same accessors can be written with a custom queue in place of the lock:

    dispatch_queue_t queue;

    - (id)something
    {
        __block id localSomething;
        dispatch_sync(queue, ^{
            localSomething = [something retain];
        });
        return [localSomething autorelease];
    }

    - (void)setSomething:(id)newSomething
    {
        dispatch_async(queue, ^{
            if(newSomething != something)
            {
                [something release];
                something = [newSomething retain];
                [self updateSomethingCaches];
            }
        });
    }
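For readers who want something they can compile, here is a self-contained variation on the queue-based accessors. It is a sketch, not the article's code: the class name Box and the queue label are invented, the property is narrowed to NSString, and it assumes ARC, so the retain, release, and autorelease calls disappear.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class: guards one instance variable with a private
// serial queue instead of an NSLock.
@interface Box : NSObject
- (NSString *)something;
- (void)setSomething:(NSString *)newSomething;
@end

@implementation Box {
    dispatch_queue_t _queue;   // serial queue standing in for the lock
    NSString *_something;      // shared state, touched only on _queue
}

- (instancetype)init
{
    if ((self = [super init])) {
        _queue = dispatch_queue_create("com.example.box", NULL);
    }
    return self;
}

- (NSString *)something
{
    __block NSString *result;
    // Synchronous: the caller needs the value before continuing.
    dispatch_sync(_queue, ^{ result = self->_something; });
    return result;
}

- (void)setSomething:(NSString *)newSomething
{
    NSString *copy = [newSomething copy];
    // Asynchronous: the caller does not wait for the store.
    dispatch_async(_queue, ^{ self->_something = copy; });
}
@end
```

Because the queue is serial and runs jobs in submission order, a read issued after a write from the same thread always sees that write, even though the setter returned before the store happened.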

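At this point you may be asking: this is all well and good, but what is the point? I just moved the code from one mechanism to another that looks almost the same. Why bother?

The GCD approach actually has several clear advantages:

  • Parallelism: Notice that in the second version of the code, -setSomething: uses dispatch_async. A call to -setSomething: therefore returns right away, and the bulk of the work happens in the background. If updateSomethingCaches is expensive and the caller is about to do something processor intensive as well, this can be a significant win.

  • Safety: With GCD, you cannot accidentally write a code path that fails to release the lock. In ordinary locking code it is not unusual to put a return statement in the middle of the locked region by mistake, to make the exit conditional, or to do something equally unfortunate. With GCD, the queue always keeps running, and you cannot help but return control to it normally.

  • Control: Dispatch queues can be suspended and resumed at will, which is not easy to do with a lock-based approach. A custom queue can also be pointed at another dispatch queue so that it inherits that queue's attributes. In this way the queue's priority can be adjusted by pointing it at different global queues, and the queue can even be made to run its code on the main thread if that is needed for some reason.

  • Integration: The GCD event system integrates with dispatch queues. Any events or timers an object needs can be pointed at the object's queue, so that their handlers automatically run on that queue and are automatically synchronized with the object.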
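As a rough illustration of the integration point, the sketch below attaches a timer source to a queue so that its handler runs there, serialized with everything else on that queue. The helper name, the 10 ms interval, and the use of a semaphore to wait for the demo to finish are all choices made for this example. It assumes ARC on a modern SDK and a tick count of at least 1.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: fires a repeating timer on `queue` until the
// handler has run `ticks` times, then returns the number of runs.
static int countTimerTicks(dispatch_queue_t queue, int ticks)
{
    __block int count = 0;
    dispatch_semaphore_t done = dispatch_semaphore_create(0);

    dispatch_source_t timer =
        dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, queue);
    dispatch_source_set_timer(timer,
                              dispatch_time(DISPATCH_TIME_NOW, 0),
                              10 * NSEC_PER_MSEC,   // interval
                              1 * NSEC_PER_MSEC);   // leeway

    dispatch_source_set_event_handler(timer, ^{
        // Runs on `queue`, so state owned by that queue needs no lock.
        count++;
        if (count == ticks) {
            dispatch_source_cancel(timer);
            dispatch_semaphore_signal(done);
        }
    });

    dispatch_resume(timer);   // sources are created suspended
    dispatch_semaphore_wait(done, DISPATCH_TIME_FOREVER);
    return count;
}
```

If the queue passed in is the same serial queue an object uses to guard its state, the timer handler can read and write that state directly, which is the automatic synchronization the last bullet describes.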

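Conclusion

You now know the basics of Grand Central Dispatch: how to create dispatch queues, how to submit jobs to them, and how to use queues in place of locks in multithreaded programs. Next week I'll show some techniques for using GCD to write code that does parallel processing and gets more performance out of multicore systems. In the coming weeks I'll go into more of GCD in depth, including the event system and queue targeting.

That wraps up this week's Friday Q&A. Come back next week for more on GCD. Having topics mapped out for the next few weeks doesn't mean I don't need your suggestions. Quite the contrary: Friday Q&A is driven by your suggestions, and the more I have by the time this series ends, the better the posts I'll be able to write. If you have a topic you'd like covered, post it in the comments or e-mail it directly to me.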


#Original (English)

Source: https://www.mikeash.com/pyblog/friday-qa-2009-08-28-intro-to-grand-central-dispatch-part-i-basics-and-dispatch-queues.html

Welcome back to Friday Q&A. This week’s edition lines up with Apple’s release of Snow Leopard, so I’m going to take this opportunity to open up the discussion on previously NDA’d technologies and talk about some of the cool stuff now available in Snow Leopard. For this week I’m going to start what I plan to be an ongoing series on Grand Central Dispatch, a topic suggested by Chris Liscio.

What Is It?

Grand Central Dispatch, or GCD for short, is a low level API which introduces a new way to perform concurrent programming. For basic functionality it’s a bit like NSOperationQueue, in that it allows a program’s work to be divided up into individual tasks which are then submitted to work queues to run concurrently or serially. It’s lower level and higher performance than NSOperationQueue, and is not part of the Cocoa frameworks.

In addition to the facilities for parallel execution of code, GCD also provides a fully integrated event handling system. Handlers can be set up to respond to events on file descriptors, mach ports, and processes, to timers and signals, and to user-generated events. These handlers are executed through the GCD facilities for concurrent execution.

GCD’s API is heavily based around blocks, which I talked about in previous Friday Q&A’s, first to introduce the basics of blocks and then to discuss the practical aspects of using blocks in real-world code. While GCD can be used without blocks, by using the traditional C mechanism of providing a function pointer and a context pointer, it’s vastly easier to use and ultimately more capable, from a practical standpoint, when used with blocks.

For documentation on GCD, start with man dispatch on a Snow Leopard machine.

Why Use It?

GCD offers many advantages over traditional multi-threaded programming:

  • Ease of use: GCD is much easier to work with than threads. Because it’s based around work units rather than threads of computation, it can take care of common tasks such as waiting for work to finish, monitoring file descriptors, executing code periodically, and suspending work. The blocks-based APIs make it extremely easy to pass context between different sections of code.

  • Efficiency: GCD is implemented in a lightweight manner which makes it practical and fast to use GCD in many places where creating dedicated threads is too costly. This ties into ease of use: part of what makes GCD so easy to use is that for the most part you can just use it, and not worry too much about using it efficiently.

  • Performance: GCD automatically scales its use of threads according to system load, which in turn leads to fewer context switches and more computational efficiency.

Dispatch queues and dispatch sources (more on what these are later) can be suspended and resumed, can have an arbitrary context pointer associated with them, and can have a finalizer function associated with them. For more information on these facilities, see man dispatch_object.

Dispatch Queues

A fundamental concept in GCD is that of the dispatch queue. A dispatch queue is an object which accepts jobs and which executes them in the order in which they arrive. A dispatch queue can either be concurrent or serial. A concurrent queue will execute many jobs simultaneously, as appropriate for system load, much like NSOperationQueue. A serial queue will only execute a single job at a time.

There are three main types of queues in GCD:

  • The main queue: Analogous to the main thread. In fact, jobs submitted to the main queue execute on the main thread of the process. The main queue can be obtained by calling dispatch_get_main_queue(). Since the main queue is inherently tied to the main thread, it is a serial queue.

  • Global queues: Global queues are concurrent queues shared through the entire process. Three global queues exist: a high, a default, and a low priority queue. Global queues can be accessed by calling dispatch_get_global_queue and telling it which priority you want.

  • Custom queues: Custom queues (GCD does not call them this, but doesn’t have a specific name for these, so I call them “custom”) are queues created with the dispatch_queue_create function. These are serial queues which only execute one job at a time. Because of this, they can be used as a synchronization mechanism, much like a mutex in a traditional threaded program.

Submitting Jobs

Submitting a job to a queue is easy: call the dispatch_async function, and pass it a queue and a block. The queue will then execute that block when it’s that block’s turn to execute. Here is an example of executing some long-running job in the background using a global queue:

    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        [self goDoSomethingLongAndInvolved];
        NSLog(@"Done doing something long and involved");
    });

Of course, it’s not really very useful to perform an NSLog when the work is done. In a typical Cocoa application, you probably want to update a part of your GUI, and that in turn means running code on the main thread. You can easily accomplish this by using nested dispatches, with the outer one performing the background work, and then from within the background block dispatching onto the main queue, like this:

    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        [self goDoSomethingLongAndInvolved];
        dispatch_async(dispatch_get_main_queue(), ^{
            [textField setStringValue:@"Done doing something long and involved"];
        });
    });

    __block NSString *stringValue;
    dispatch_sync(dispatch_get_main_queue(), ^{
        // __block variables aren't automatically retained
        // so we'd better make sure we have a reference we can keep
        stringValue = [[textField stringValue] copy];
    });
    [stringValue autorelease];
    // use stringValue in the background now

    dispatch_queue_t bgQueue = myQueue;
    dispatch_async(dispatch_get_main_queue(), ^{
        NSString *stringValue = [[[textField stringValue] copy] autorelease];
        dispatch_async(bgQueue, ^{
            // use stringValue in the background now
        });
    });

Replacing Locks

Custom queues can be used as a synchronization mechanism in place of locks. In traditional multi-threaded programming, you might have an object which is designed to be usable from multiple threads. In order to accomplish this, it protects all accesses to shared data using a lock, which you might find in an instance variable:

    NSLock *lock;

    - (id)something
    {
        id localSomething;
        [lock lock];
        localSomething = [[something retain] autorelease];
        [lock unlock];
        return localSomething;
    }

    - (void)setSomething:(id)newSomething
    {
        [lock lock];
        if(newSomething != something)
        {
            [something release];
            something = [newSomething retain];
            [self updateSomethingCaches];
        }
        [lock unlock];
    }

    dispatch_queue_t queue;

    - (id)something
    {
        __block id localSomething;
        dispatch_sync(queue, ^{
            localSomething = [something retain];
        });
        return [localSomething autorelease];
    }

    - (void)setSomething:(id)newSomething
    {
        dispatch_async(queue, ^{
            if(newSomething != something)
            {
                [something release];
                something = [newSomething retain];
                [self updateSomethingCaches];
            }
        });
    }

At this point you may be asking, this is all well and good, but what’s the point? I just switched code from one mechanism to another mechanism that looks pretty much the same. Why would you do this?

There are actually several advantages to the GCD approach:

  • Parallelism: Notice how -setSomething: uses dispatch_async in the second version of the code. This means that the call to -setSomething: will return right away, and then the bulk of the work will happen in the background. This could be a significant win if updateSomethingCaches is a costly operation and the caller will be doing something processor intensive as well.

  • Safety: It’s impossible to accidentally write a code path that doesn’t unlock the lock using GCD. In normal locked code it’s not unusual to inadvertently put a return statement in the middle of the lock, or conditionalize the exit, or something equally unfortunate. With GCD, the queue always continues to run and you can’t help but return control to it normally.

  • Control: It’s possible to suspend and resume dispatch queues at will, which cannot easily be done with a locks-based approach. It’s also possible to point a custom queue at another dispatch queue, making it inherit the attributes of that other dispatch queue. Using this, the priority of the queue can be adjusted by making it point to the different global queues, and the queue can even be made to execute code on the main thread if this were required for some reason.

  • Integration: The GCD event system integrates with dispatch queues. Any events or timers that the object needs to use can be pointed at the object’s queue, causing the handlers to automatically run on that queue, making them automatically synchronized with the object.

Conclusion

Now you know the basics of Grand Central Dispatch, how to create dispatch queues, how to submit jobs to dispatch queues, and how to use queues as a substitute for locks in multithreaded programs. Next week I’ll show you techniques for using GCD to write code which performs parallel processing to extract more performance out of multi-core systems. And in the coming weeks, I’ll discuss more of GCD in depth, including the event system and queue targeting.

That wraps up this week’s Friday Q&A. Come back next week for more GCD goodness. And just because I have a subject mapped out for a few weeks doesn’t mean I don’t need your suggestions. Quite the contrary: Friday Q&A is driven by your suggestions, and the more I have, the better posts I’ll be able to make when I get to the end of this series. If you have a suggestion for a topic to cover, please post it in the comments or e-mail it directly to me.