Friday Q&A 2009-09-11: Intro to Grand Central Dispatch, Part III: Dispatch Sources · by Mike Ash
Source: https://www.mikeash.com/pyblog/friday-qa-2009-09-11-intro-to-grand-central-dispatch-part-iii-dispatch-sources.html · Published: 2009-09-11 · Author: Mike Ash · Chinese translation by MiMo (mimo-v2.5-pro); code blocks kept in the original English
Welcome back to another Friday Q&A. This week I continue the discussion of Grand Central Dispatch from the past two weeks. In the last two weeks I mainly focused on dispatch queues. This week I’m going to examine dispatch sources, how they work, and how to use them.
Note that I assume you’ve already read the first two posts in this series. The first post is particularly important, the second one less so. If you have not, go read them now.
Before I go any further, there’s been some great news this week: GCD has been open sourced! This is a very nice move on Apple’s part. The source is relatively clean and very interesting to read through.
What Are Dispatch Sources
In short, a dispatch source is an object which monitors for some type of event. When the event occurs, it automatically schedules a block for execution on a dispatch queue.
That’s kind of vague. What kind of events are we talking about?
Here is the full list of events supported by GCD in 10.6.0:
- Mach port send right state changes.
- Mach port receive right state changes.
- External process state change.
- File descriptor ready for read.
- File descriptor ready for write.
- Filesystem node event.
- POSIX signal.
- Custom timer.
- Custom event.
Custom Events
Most of these events are pretty much self-explanatory, but you may be wondering what a custom event is. In short, this is an event which you signal yourself by calling the dispatch_source_merge_data function.
This is a bit of an odd name for a function that signals an event. The reason it’s named this way is because GCD will automatically coalesce multiple events that happen before the event handler has a chance to run. You can “merge” data into the dispatch source as many times as you want, and if the dispatch queue was busy for this whole period, GCD will only invoke the event handler once.
Two types of custom events are available, DISPATCH_SOURCE_TYPE_DATA_ADD and DISPATCH_SOURCE_TYPE_DATA_OR. A custom event source has an unsigned long data attribute, and you also pass an unsigned long to dispatch_source_merge_data. When using the _ADD variant, events are coalesced by adding all of the numbers together. When using the _OR variant, events are coalesced by combining the values with a bitwise OR. When the event handler executes, it can access the current value using dispatch_source_get_data, and the data is then reset to 0.
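To make the two coalescing rules concrete, here is a toy model in plain C. It is not libdispatch code and says nothing about how GCD is implemented; the names toy_source, toy_merge and toy_take are invented for this sketch and merely stand in for the source, dispatch_source_merge_data and the handler's call to dispatch_source_get_data.

```c
/* Toy stand-in for a custom event source. NOT libdispatch code:
   every name here is hypothetical and exists only for illustration. */
typedef enum { TOY_DATA_ADD, TOY_DATA_OR } toy_kind;

typedef struct {
    toy_kind kind;
    unsigned long pending;   /* data merged since the handler last ran */
} toy_source;

/* Plays the role of dispatch_source_merge_data. */
static void toy_merge(toy_source *s, unsigned long value) {
    if (s->kind == TOY_DATA_ADD)
        s->pending += value;     /* _ADD: values are summed */
    else
        s->pending |= value;     /* _OR: values are combined bit by bit */
}

/* Plays the role of the handler reading the data:
   returns the coalesced value and resets it to 0. */
static unsigned long toy_take(toy_source *s) {
    unsigned long value = s->pending;
    s->pending = 0;
    return value;
}
```

Merging 1, 2 and 4 into an _ADD source before the handler runs yields 7 in a single handler call, while merging 1, 1 and 4 into an _OR source yields 5. The _ADD form suits counters such as units of work done; the _OR form suits sets of flags.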
Let’s look at a scenario where this could be useful. Imagine some asynchronous code performing some work that needs to update a progress bar. Since the main thread is just another dispatch queue to GCD, we can push the GUI work onto the main queue. However, there may be a lot of events, and we don’t want to make redundant updates to the GUI; it’s much better to coalesce all of the changes as much as possible if the main thread is busy with other work.
Dispatch sources are perfect for this, using the DISPATCH_SOURCE_TYPE_DATA_ADD type. We can merge the amount of work done, and then the main thread code can find out how much work has been performed since the last event, and update the progress indicator by that amount.
Enough talk, here’s some code:
    // Custom source whose handler runs on the main queue (the main thread).
    dispatch_source_t source = dispatch_source_create(DISPATCH_SOURCE_TYPE_DATA_ADD, 0, 0, dispatch_get_main_queue());
    dispatch_source_set_event_handler(source, ^{
        // The data is the number of work units finished since the handler last ran.
        [progressIndicator incrementBy:dispatch_source_get_data(source)];
    });
    dispatch_resume(source);

    // Run this from a background queue: dispatch_apply blocks its caller until every iteration finishes.
    dispatch_queue_t globalQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_apply([array count], globalQueue, ^(size_t index) {
        // do some work on data at index
        dispatch_source_merge_data(source, 1);
    });

Assuming you've configured the progress indicator to have the correct min/max value, this will all work perfectly. The data will be processed in parallel. As each chunk of data finishes, it signals the dispatch source and adds 1 to the dispatch source data, which we treat as the number of work units completed. The event handler increments the progress indicator by the number of work units that have been completed since the last time it ran. If the main thread is idle and work units complete slowly, the event handler will be called for every work unit completion, giving real time results. If the main thread is busy or work units complete quickly, completion events will be coalesced and the progress indicator will be updated only once each time the main thread becomes available to process it.
At this point you may be thinking, this all sounds great, but what if I don't want my events to be coalesced? Sometimes you just want every signal to cause an action, without any smarts going on behind the scenes. Well, this is actually really easy; you just need to think a bit outside the box. If you want every signal to cause an action, use dispatch_async instead of a dispatch source. That's what it does, after all: schedules a block to be executed on the queue in question. In fact, the only reason to use a custom-event dispatch source instead of dispatch_async is to take advantage of coalescing.
Built-In Events
That's how to use a custom event, how about a built-in event? Let's look at an example of reading from standard input using GCD:
    dispatch_queue_t globalQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_source_t stdinSource = dispatch_source_create(DISPATCH_SOURCE_TYPE_READ, STDIN_FILENO, 0, globalQueue);
    dispatch_source_set_event_handler(stdinSource, ^{
        char buf[1024];
        // read returns ssize_t: possibly fewer bytes than requested, or -1 on error.
        ssize_t len = read(STDIN_FILENO, buf, sizeof(buf));
        if(len > 0)
            NSLog(@"Got data from stdin: %.*s", (int)len, buf);
    });
    dispatch_resume(stdinSource);

This has a nice benefit over the standard UNIX way of doing things in that there's no need to write a loop. With typical calls to read, you always have to be wary because it can return less data than requested, and can also suffer from transient "errors" like EINTR (interrupted system call). With GCD, you can just bail out in those cases and not do anything. If you leave unread data on the file descriptor, GCD will just invoke your handler again.
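For contrast, the conventional blocking loop alluded to above might look roughly like this in plain POSIX C. This is only a sketch of the usual idiom; read_fully is a hypothetical helper name, not something from the article or from any library.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keeps reading until `count` bytes have arrived,
   end-of-file is reached, or a real error occurs.
   Returns the number of bytes read, or -1 on error (errno set by read). */
static ssize_t read_fully(int fd, void *buffer, size_t count) {
    char *out = buffer;
    size_t total = 0;
    while (total < count) {
        ssize_t n = read(fd, out + total, count - total);
        if (n > 0) {
            total += (size_t)n;      /* short read: keep going */
        } else if (n == 0) {
            break;                   /* end-of-file */
        } else if (errno != EINTR) {
            return -1;               /* genuine error */
        }
        /* n < 0 with errno == EINTR: interrupted, so retry */
    }
    return (ssize_t)total;
}
```

The dispatch source version needs none of this bookkeeping because the handler is simply invoked again while data remains.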
For standard input it’s not a problem, but for other file descriptors you need to consider how to clean up once you’re done reading from (or writing to) the descriptor. You must not close the descriptor while the dispatch source is still active. If another file descriptor is created (perhaps from another thread) and happens to get the same number, your dispatch source will suddenly be reading from (or writing to) something it shouldn’t be. This will not be fun to debug.
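The hazard is easy to reproduce, because POSIX hands out the lowest-numbered unused descriptor. The following sketch, which assumes a single-threaded process so nothing else allocates descriptors in between, shows a freshly created descriptor receiving the number that was just closed. The function name is hypothetical.

```c
#include <unistd.h>

/* Returns 1 if a new descriptor reuses the number of one that was just
   closed, 0 if not, and -1 if the pipe could not be created.
   Assumes no other thread is creating descriptors at the same time. */
static int descriptor_number_is_reused(void) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    int stale = fds[0];
    close(fds[0]);               /* imagine a source is still watching `stale` */
    int fresh = dup(fds[1]);     /* an unrelated descriptor created afterwards */
    int reused = (fresh == stale);
    if (fresh >= 0)
        close(fresh);
    close(fds[1]);
    return reused;
}
```

A dispatch source still attached to the stale number would now be observing the new, unrelated descriptor.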
The way to properly implement cleanup is to use dispatch_source_set_cancel_handler and give it a block which closes your file descriptor. You can then use dispatch_source_cancel to cancel the dispatch source, causing the handler to be invoked and the file descriptor to be closed.
Using other dispatch source types is much the same. In general, you give the identifier of the source (mach port, file descriptor, process ID, etc.) as the dispatch source handle. The mask argument is usually unused, but for some types, such as DISPATCH_SOURCE_TYPE_PROC, it indicates what kind of events you're interested in receiving. Then just provide a handler, resume the source, and off you go. These dispatch sources also provide source-specific data which can be accessed using the dispatch_source_get_data function. For example, file descriptors will give the rough number of bytes available on the descriptor as the dispatch source data. Process sources will give a mask of events which occurred since the last call. For a complete listing of what the data means for each type of source, see the dispatch_source_create man page.
Timers
Timer events are a bit different. They don't use the handle/mask arguments, but instead use a separate function, dispatch_source_set_timer, to configure the timer. This function takes three separate parameters to control when the timer fires:
The start parameter controls when the timer first fires. This parameter is of type dispatch_time_t, which is an opaque type that you can’t manipulate directly. The functions dispatch_time and dispatch_walltime can be used to create them, and the constants DISPATCH_TIME_NOW and DISPATCH_TIME_FOREVER can be used if those are the values you’re after.
The interval argument is an integer number of nanoseconds and is self-explanatory. The leeway argument, also in nanoseconds, is an interesting one. This argument tells the system how much precision you want on your timer firing. Timers are never guaranteed to be absolutely 100% precise, but this argument lets you tell the system how hard you want it to try. If you want a timer to fire every 5 seconds and be as exact as possible, you would pass 0. On the other hand, consider a periodic task like checking for new e-mail. You want to check every 10 minutes, but this doesn't have to be exact. You might pass a leeway of 60 seconds, telling the system that you'll accept the timer running up to 60 seconds later than the scheduled time.
What’s the point of this? In short, reduced power consumption. It’s more energy efficient if the OS can let the CPU sleep for as long as possible, and then accomplish a bunch of things at once when it wakes up, rather than cycling between sleep and wake constantly to accomplish tasks in a spread-out manner. By giving a large leeway to your timer, you allow the system to lump your timer with other actions in order to group tasks together like this.
Conclusion
Now you know how to use GCD's dispatch source facilities to monitor file descriptors, run timers, coalesce custom events, and other similar activities. Because dispatch sources are fully integrated with dispatch queues, you can use any dispatch queue you have available. You can have a dispatch source run its handler on the main thread, in parallel on one of the global queues, or serialized with respect to a particular module of your program by using a custom queue.
That’s it for this week. Come back next week as I wrap up the discussion of Grand Central Dispatch and talk about how to suspend, resume, and retarget dispatch queues, how to use dispatch semaphores, and how to use GCD’s one-time initialization facility. As always, if you have a suggestion for a topic to cover for a future Friday Q&A, please post it in the comments or e-mail it directly to me.