Let's Build NSNumber | TommyWu's Lab

Published July 6, 2012

Author: TommyWu

Translation · Original: Friday Q&A 2012-07-06: Let's Build NSNumber · by Mike Ash

Original: https://www.mikeash.com/pyblog/friday-qa-2012-07-06-lets-build-nsnumber.html · Published: 2012-07-06 · Author: Mike Ash · Translator: MiMo (mimo-v2.5-pro); code blocks are kept in the original English

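NSNumber is a deceptively simple class with some interesting implementation details. In today's edition of Friday Q&A, I'll explore how to build a class that works like NSNumber, a topic suggested by Jay Tamboli.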

Overview

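Like many (but not all) object-oriented languages, Objective-C draws a line between objects and non-objects. Objects respond to messages, can be queried at runtime without knowing their exact type, can be placed in collections and compared for equality, and share a common set of behaviors. Non-objects are largely compile-time constructs whose type information is essentially gone at runtime. In Objective-C, the non-objects are everything that comes from C, from the integer 42 to the string “Hello, world” to complicated structs.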

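Boxing is the process of placing these non-objects inside an object so that they can be used like any other object, typically so that they can be stored in a collection. NSNumber is the Cocoa class for boxing C numbers. You can't have an NSArray of int, but you can have an NSArray of NSNumber. NSNumber shows up a lot in Cocoa programming: just about anywhere a Cocoa collection stores a number, NSNumber is there. Among many other places, NSNumber objects are what NSUserDefaults stores and retrieves when you ask it to save a number.

Interface

Our stand-in for NSNumber will be called MANumber. Unlike the Cocoa version, which is a subclass of the more general boxing class NSValue, this one subclasses NSObject directly: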

    @interface MANumber : NSObject

There are many ways to initialize an instance: one initializer for each C numeric type, plus a few extra ones for Cocoa-specific types:

    - (id)initWithChar:(char)value;
    - (id)initWithUnsignedChar:(unsigned char)value;
    - (id)initWithShort:(short)value;
    - (id)initWithUnsignedShort:(unsigned short)value;
    - (id)initWithInt:(int)value;
    - (id)initWithUnsignedInt:(unsigned int)value;
    - (id)initWithLong:(long)value;
    - (id)initWithUnsignedLong:(unsigned long)value;
    - (id)initWithLongLong:(long long)value;
    - (id)initWithUnsignedLongLong:(unsigned long long)value;
    - (id)initWithFloat:(float)value;
    - (id)initWithDouble:(double)value;
    - (id)initWithBool:(BOOL)value;
    - (id)initWithInteger:(NSInteger)value;
    - (id)initWithUnsignedInteger:(NSUInteger)value;

There are matching getters for these types:

    - (char)charValue;
    - (unsigned char)unsignedCharValue;
    - (short)shortValue;
    - (unsigned short)unsignedShortValue;
    - (int)intValue;
    - (unsigned int)unsignedIntValue;
    - (long)longValue;
    - (unsigned long)unsignedLongValue;
    - (long long)longLongValue;
    - (unsigned long long)unsignedLongLongValue;
    - (float)floatValue;
    - (double)doubleValue;
    - (BOOL)boolValue;
    - (NSInteger)integerValue;
    - (NSUInteger)unsignedIntegerValue;

Note that every one of these getters works no matter which initializer was used, so MANumber has to perform the appropriate conversions.

Finally, there are a few other methods for string conversion and comparison:

    - (NSString *)stringValue;
    - (NSComparisonResult)compare:(MANumber *)otherNumber;
    - (BOOL)isEqualToNumber:(MANumber *)number;
    - (NSString *)descriptionWithLocale:(id)locale;

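Implementation Strategy

MANumber will use a union to store the underlying numeric value. union is a rarely seen feature of standard C. It looks just like a struct but works differently. A struct stores several values side by side. A union declares several fields too, but they all share the same storage, so you can only access the one you stored most recently. When you store a value into one field of a union, the values of all the other fields become undefined.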

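In typical unhelpful-but-efficient C fashion, the compiler doesn't enforce this rule. It also doesn't help you follow it, for example by letting you ask which field was set last. You have to track that yourself, usually with an accompanying enum.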
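Because Objective-C is a superset of C, the union-plus-tag pattern can be sketched in plain C. This is a minimal illustration, not the article's code. `tagged_number`, `make_int`, `make_double` and `as_double` are hypothetical names. Keeping the tag in sync with the union is entirely the programmer's job.

```c
#include <assert.h>

/* Hypothetical tagged union: the enum records which union field was written last. */
typedef struct {
    enum { TAG_INT, TAG_UINT, TAG_DOUBLE } tag;
    union {
        long long i;
        unsigned long long u;
        double d;
    } value;
} tagged_number;

tagged_number make_int(long long v)
{
    tagged_number n;
    n.tag = TAG_INT;    /* remember which field is active... */
    n.value.i = v;      /* ...and store into exactly that field */
    return n;
}

tagged_number make_double(double v)
{
    tagged_number n;
    n.tag = TAG_DOUBLE;
    n.value.d = v;
    return n;
}

/* Reads only the field the tag marks as active. The compiler would let
   us read any field, so this switch is our only guard. */
double as_double(tagged_number n)
{
    switch (n.tag) {
    case TAG_INT:  return (double)n.value.i;
    case TAG_UINT: return (double)n.value.u;
    default:
        assert(n.tag == TAG_DOUBLE);   /* anything else means a corrupt tag */
        return n.value.d;
    }
}
```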

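The union could hold every C numeric type, with a big enum recording which one is in use. However, that is needlessly complex. All we really need are three fields: the largest signed integer type, the largest unsigned integer type, and the largest floating-point type. Among the types we have to handle, these are long long, unsigned long long, and double. Every other type can be converted to and from one of these without loss.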
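The "without loss" claim can be spot-checked in C. The check widens a value to its storage type and narrows it back, which is what an initializer followed by the matching getter does. This sketch assumes IEEE 754 float and double. The function names are illustrative only, and a handful of extreme values stand in for an exhaustive proof.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>

/* Widen to the storage type (as initWithInt: -> initWithLongLong: does),
   then narrow back (as -intValue does). Returns 1 if nothing was lost. */
int int_roundtrips(int v)
{
    long long stored = v;
    return (int)stored == v;
}

int uint_roundtrips(unsigned int v)
{
    unsigned long long stored = v;
    return (unsigned int)stored == v;
}

/* Every float value is exactly representable as a double. (For NaN the
   bits survive, but NaN != NaN, so this check would report 0.) */
int float_roundtrips(float v)
{
    double stored = v;
    return (float)stored == v;
}
```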

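This implementation doesn't exactly match NSNumber, which keeps track of the specific type used to create it. However, these three types come plenty close and eliminate a lot of repetitive code. NSNumber's precise tracking of the original type is invisible most of the time. It only shows up through methods like -descriptionWithLocale: or -objCType.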

Storage

Here are the instance variables:

    @implementation MANumber {
        enum { INT, UINT, DOUBLE } _type;
        union {
            long long i;
            unsigned long long u;
            double d;
        } _value;
    }

The _type variable holds an anonymous enum saying whether the value is an INT (long long), a UINT (unsigned long long), or a DOUBLE (guess). The _value variable then holds the actual number, using a union so that only one value is stored.

The initializers set _type and the corresponding field of _value. The getters can then check _type and extract the value accordingly.

Initializers

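There's a ton of boilerplate to deal with all of the different types. All of the signed integer types (and BOOL) simply call through to initWithLongLong:, and the unsigned types call through to initWithUnsignedLongLong: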

    - (id)initWithChar:(char)value
    {
        return [self initWithLongLong: value];
    }

    - (id)initWithUnsignedChar:(unsigned char)value
    {
        return [self initWithUnsignedLongLong: value];
    }

    - (id)initWithShort:(short)value
    {
        return [self initWithLongLong: value];
    }

    - (id)initWithUnsignedShort:(unsigned short)value
    {
        return [self initWithUnsignedLongLong: value];
    }

    - (id)initWithInt:(int)value
    {
        return [self initWithLongLong: value];
    }

    - (id)initWithUnsignedInt:(unsigned int)value
    {
        return [self initWithUnsignedLongLong: value];
    }

    - (id)initWithLong:(long)value
    {
        return [self initWithLongLong: value];
    }

    - (id)initWithUnsignedLong:(unsigned long)value
    {
        return [self initWithUnsignedLongLong: value];
    }

    - (id)initWithBool:(BOOL)value
    {
        return [self initWithLongLong: value];
    }

    - (id)initWithInteger:(NSInteger)value
    {
        return [self initWithLongLong: value];
    }

    - (id)initWithUnsignedInteger:(NSUInteger)value
    {
        return [self initWithUnsignedLongLong: value];
    }

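Those initializers then simply set _type and _value and return self. (Note that I'm leaving out the traditional call to [super init] for brevity. It isn't strictly necessary when the superclass is NSObject, although it's still a good idea.)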

    - (id)initWithLongLong:(long long)value
    {
        _type = INT;
        _value.i = value;
        return self;
    }

    - (id)initWithUnsignedLongLong:(unsigned long long)value
    {
        _type = UINT;
        _value.u = value;
        return self;
    }

The floating-point initializers are similar. The one for float just calls through to initWithDouble:, which sets _type and _value directly:

    - (id)initWithFloat:(float)value
    {
        return [self initWithDouble: value];
    }

    - (id)initWithDouble:(double)value
    {
        _type = DOUBLE;
        _value.d = value;
        return self;
    }

Getters

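The getters are even more alike than the initializers. They all check _type, then return the appropriate field of _value. The compiler handles the final conversion from the active field of _value to the requested return type. Since these methods all contain the same code, they are a perfect candidate for a macro that encapsulates the identical bits. Here's a macro that checks _type and then returns the corresponding field of _value: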

    #define RETURN() do { \
            if(_type == INT) \
                return _value.i; \
            else if(_type == UINT) \
                return _value.u; \
            else \
                return _value.d; \
        } while(0)

With that macro, the getters pretty much write themselves:

    - (char)charValue
    {
        RETURN();
    }

    - (unsigned char)unsignedCharValue
    {
        RETURN();
    }

    - (short)shortValue
    {
        RETURN();
    }

    - (unsigned short)unsignedShortValue
    {
        RETURN();
    }

    - (int)intValue
    {
        RETURN();
    }

    - (unsigned int)unsignedIntValue
    {
        RETURN();
    }

    - (long)longValue
    {
        RETURN();
    }

    - (unsigned long)unsignedLongValue
    {
        RETURN();
    }

    - (long long)longLongValue
    {
        RETURN();
    }

    - (unsigned long long)unsignedLongLongValue
    {
        RETURN();
    }

    - (float)floatValue
    {
        RETURN();
    }

    - (double)doubleValue
    {
        RETURN();
    }

    - (NSInteger)integerValue
    {
        RETURN();
    }

    - (NSUInteger)unsignedIntegerValue
    {
        RETURN();
    }

That's a lot of boring and ugly code.
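If that much repetition grates, a macro can stamp out entire functions rather than just their bodies. The plain C sketch below shows an alternative the article doesn't use. `boxed`, `DEFINE_GETTER` and the `get_*` functions are hypothetical names, and the struct simply mirrors MANumber's two instance variables.

```c
#include <assert.h>

typedef struct {
    enum { BOX_INT, BOX_UINT, BOX_DOUBLE } type;
    union { long long i; unsigned long long u; double d; } value;
} boxed;

/* Same dispatch as RETURN(). Each return statement converts the active
   field to the function's declared return type, just as in the getters. */
#define DEFINE_GETTER(rettype, name)                 \
    rettype name(const boxed *b)                     \
    {                                                \
        if (b->type == BOX_INT)  return b->value.i;  \
        if (b->type == BOX_UINT) return b->value.u;  \
        return b->value.d;                           \
    }

DEFINE_GETTER(short, get_short)
DEFINE_GETTER(int, get_int)
DEFINE_GETTER(long long, get_long_long)
DEFINE_GETTER(double, get_double)
```

Each DEFINE_GETTER line expands to a complete function, so adding a getter for another type becomes a one-line change.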

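The one exception to this uniform sea of macro invocations is -boolValue. Since BOOL pretends to be a real boolean type, this method should return YES for any non-zero value stored in the MANumber. The compiler's built-in conversion won't do that. For example, the integer 256 becomes NO when converted to a BOOL, because BOOL is just a signed char (an 8-bit integer) and the low 8 bits of 256 are all zero. That was true on the Mac and iOS of the time; on Apple's 64-bit ARM platforms, BOOL is now a real bool. Because of this, -boolValue replicates the macro's logic, but with an explicit comparison against zero: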

    - (BOOL)boolValue
    {
        if(_type == INT)
            return _value.i != 0;
        else if(_type == UINT)
            return _value.u != 0;
        else
            return _value.d != 0;
    }
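The 256 example is easy to reproduce in C. In this sketch, `signed char` stands in for BOOL, matching the description above (the `char_bool` typedef is hypothetical). Converting an out-of-range value to a signed type is implementation-defined in C. GCC and Clang keep the low 8 bits, which is the behavior described above.

```c
#include <assert.h>

typedef signed char char_bool;   /* hypothetical stand-in for a signed-char BOOL */

/* The compiler's built-in conversion: only the low 8 bits survive,
   and a fractional double is truncated toward zero. */
char_bool converted_bool(long long v) { return (char_bool)v; }
char_bool converted_bool_d(double v)  { return (char_bool)v; }

/* What -boolValue does instead: any non-zero value becomes 1 (YES). */
char_bool explicit_bool(long long v)  { return v != 0; }
char_bool explicit_bool_d(double v)   { return v != 0; }
```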

String Conversion

There are two string conversion methods: -stringValue and -descriptionWithLocale:. -stringValue simply calls -descriptionWithLocale: with a nil argument:

    - (NSString *)stringValue
    {
        return [self descriptionWithLocale: nil];
    }

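-descriptionWithLocale: then uses -[NSString initWithFormat:locale:] to build the string. There's no fancy way to deal with the different numeric types here, so it simply checks _type and uses a different format string for each case: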

    - (NSString *)descriptionWithLocale:(id)locale
    {
        if(_type == INT)
            return [[NSString alloc] initWithFormat: @"%lld" locale: locale, _value.i];
        else if(_type == UINT)
            return [[NSString alloc] initWithFormat: @"%llu" locale: locale, _value.u];
        else
            return [[NSString alloc] initWithFormat: @"%f" locale: locale, _value.d];
    }

Note that I'm using ARC, which is why there are no autorelease calls here.
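Outside Cocoa, the same per-type choice of format string can be sketched with the C standard library's snprintf. This is an approximation, not the article's code. It ignores locales: snprintf follows the C library's current locale, which is "C" unless the program changes it. `number` and `describe_number` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    enum { NUM_INT, NUM_UINT, NUM_DOUBLE } type;
    union { long long i; unsigned long long u; double d; } value;
} number;

/* Mirrors the %lld / %llu / %f choice in -descriptionWithLocale:.
   Returns what snprintf returns: the length of the full formatted text. */
int describe_number(const number *n, char *buf, size_t size)
{
    switch (n->type) {
    case NUM_INT:  return snprintf(buf, size, "%lld", n->value.i);
    case NUM_UINT: return snprintf(buf, size, "%llu", n->value.u);
    default:       return snprintf(buf, size, "%f", n->value.d);
    }
}
```

Note that %f always prints six digits after the decimal point, so 1.5 comes out as 1.500000.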

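Comparison

The comparison methods get interesting, because they need to work between MANumber objects of different types. For example, the double value -1.1 should compare less than the unsigned integer value 99999.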

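With three types there are nine ordered pairs, so nine different cases to handle. Enforcing an order reduces this to six. For example, if the two objects have types INT and UINT, the two orderings collapse into one. We handle only the case where self is INT and the other object is UINT, and swap the two objects when they show up the other way around.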

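To help compare the different types, I wrote a simple macro that takes two numbers and returns the appropriate NSComparisonResult. All it does is save its two arguments into temporary variables to avoid multiple evaluation, then return the constant that matches their ordering. There's also a bit of floating-point trickery. NAN (not a number) never compares equal to anything, itself included, and every comparison involving it is false. NSComparisonResult has no way to represent an ordering that means "this number is not equal to anything, not even itself." So, for the purposes of MANumber comparison, I arbitrarily decided that NAN is equal to itself and less than every other number: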

    #define COMPARE(a, b) do { \
            __typeof__(a) __a_local = a; \
            __typeof__(b) __b_local = b; \
            BOOL __a_isnan = isnan(__a_local); \
            BOOL __b_isnan = isnan(__b_local); \
            if(__a_isnan && __b_isnan) \
                return NSOrderedSame; \
            else if(__a_isnan) \
                return NSOrderedAscending; \
            else if(__b_isnan) \
                return NSOrderedDescending; \
            else if(__a_local > __b_local) \
                return NSOrderedDescending; \
            else if(__a_local < __b_local) \
                return NSOrderedAscending; \
            else \
                return NSOrderedSame; \
        } while(0)
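As a plain C function, the same NaN policy might look like the following sketch. `compare_doubles` and `qsort_doubles` are hypothetical names, and -1, 0 and 1 stand in for NSOrderedAscending, NSOrderedSame and NSOrderedDescending. Giving NaN a fixed place makes the ordering total, which is what sorting routines such as qsort assume of a comparator.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* NaN equals NaN and sorts below every other value, as in COMPARE(). */
int compare_doubles(double a, double b)
{
    int a_nan = isnan(a), b_nan = isnan(b);
    if (a_nan && b_nan) return 0;
    if (a_nan) return -1;
    if (b_nan) return 1;
    if (a > b) return 1;
    if (a < b) return -1;
    return 0;
}

/* Adapter so the same policy can drive qsort(). */
int qsort_doubles(const void *x, const void *y)
{
    return compare_doubles(*(const double *)x, *(const double *)y);
}
```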

The first thing the comparison method itself does is extract the types of the two objects being compared:

    - (NSComparisonResult)compare:(MANumber *)otherNumber
    {
        int selfType = _type;
        int otherType = otherNumber->_type;

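If the two types aren't in order, we reverse the comparison: we call compare: again with the arguments swapped and return the inverse of the result. Since NSComparisonResult is just -1, 0, or 1, negating it inverts its meaning: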

        if(selfType > otherType)
            return -[otherNumber compare: self];

Now we're left with sorted types, and six cases. If selfType is INT, otherType could be anything. If selfType is UINT, otherType can only be UINT or DOUBLE. If selfType is DOUBLE, otherType must be DOUBLE as well.

Let's look at the cases where selfType is INT. If both values are INT, the code is easy:

        if(selfType == INT)
        {
            if(otherType == INT)
            {
                COMPARE([self longLongValue], [otherNumber longLongValue]);
            }

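If otherType is UINT, there's a bit of extra work, because comparing directly against [otherNumber unsignedLongLongValue] won't work. C converts [self longLongValue] to unsigned before the comparison, which turns negative numbers into huge positive ones and wrecks the result. Because of this, -1 would compare greater than 1. To prevent that, we first check for a negative self. Only once both values are known to be non-negative do we compare them as unsigned values: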

            else if(otherType == UINT)
            {
                if([self longLongValue] < 0)
                    return NSOrderedAscending;
                else
                    COMPARE([self unsignedLongLongValue], [otherNumber unsignedLongLongValue]);
            }
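The promotion problem is easy to demonstrate in C, along with the fix. In this sketch, `naive_less` and `compare_ll_ull` are hypothetical helpers. The second mirrors the INT-versus-UINT branch above and returns -1, 0 or 1. Compilers can warn about the naive version, for example with GCC and Clang's -Wsign-compare.

```c
#include <assert.h>

/* Naive: the usual arithmetic conversions turn s into unsigned long long,
   so a negative s becomes a huge positive value. */
int naive_less(long long s, unsigned long long u)
{
    return s < u;
}

/* Mirrors the INT-vs-UINT branch: settle negatives first. After that,
   both values are non-negative and the unsigned comparison is safe. */
int compare_ll_ull(long long s, unsigned long long u)
{
    if (s < 0)
        return -1;
    unsigned long long su = (unsigned long long)s;
    if (su > u) return 1;
    if (su < u) return -1;
    return 0;
}
```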

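Next comes the DOUBLE case. This gets pretty complicated, because floating-point numbers work quite differently from integers. There are several subcases, which I'll take one at a time. First, though, the code extracts the doubleValue of the other number to make it more convenient to work with: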

            else
            {
                double other = [otherNumber doubleValue];

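A double can hold a much larger range than a long long. The first subcase is to work out the largest number a long long can hold and check whether other lies beyond it. If it does, other is obviously larger than self, since self is a long long.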

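The built-in macro LLONG_MAX gives the largest number a long long can hold. However, we can't convert it directly to a double. That number is 2^63 - 1, which a double can't represent exactly. A double carries 53 bits of precision, so once it gets beyond 2^53 it can only represent even numbers. To perform the comparison accurately, we compute the number one beyond the largest long long and compare against that. We do the addition in unsigned arithmetic so that it doesn't overflow: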

                double longLongMaxPlusOne = LLONG_MAX + 1ULL;
                if(other >= longLongMaxPlusOne)
                    return NSOrderedAscending;

We also check in the negative direction. This is a bit easier, because the smallest possible long long, -2^63, can be represented exactly in a double:

                if(other < LLONG_MIN)
                    return NSOrderedDescending;
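The floating-point facts these two bounds rely on can be checked directly. This C sketch assumes IEEE 754 doubles, the default round-to-nearest conversion, and a 64-bit long long. `ll_exact_in_double` and `long_long_max_plus_one` are hypothetical helpers.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>

/* 1 if converting v to double and back gives v again. */
int ll_exact_in_double(long long v)
{
    double d = (double)v;
    /* Guard the way back: converting an out-of-range double to long long
       is undefined behavior, and 2^63 itself is out of range. */
    if (d >= 9223372036854775808.0)
        return 0;
    return (long long)d == v;
}

/* The upper bound used by the INT-vs-DOUBLE branch: 2^63, computed in
   unsigned arithmetic so the addition cannot overflow. */
double long_long_max_plus_one(void)
{
    return LLONG_MAX + 1ULL;
}
```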

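If we're still running at this point, the double lies within the range of a long long, and the two need to be compared directly. However, we can't just whip out the > operator. A mixed comparison like that makes C convert the long long to a double first, which can silently round it. Many doubles can't be represented as a long long (1.5, for example), and many long longs can't be represented as a double (for example, any odd number above the threshold mentioned above).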

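Above a certain threshold, a double can only hold integer values, because the magnitude uses up all of its precision and no bits remain for a fractional part. When the double is above that threshold but still within long long range, it can be converted to a long long without loss, and the two values compared as long longs. Below the threshold, a double can represent every integer in that range exactly, so the two values can be compared as doubles. If self is also below the threshold, it converts exactly. If self is above it, rounding can't pull it across the threshold, so the order still comes out right.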

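The threshold is easy to find. C provides the macro DBL_MANT_DIG, the number of binary digits of precision in a double's significand (53 for IEEE 754 doubles). Since doubles are binary, raising 2 to that power gives the threshold: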

                double pureIntegerStart = 1LL << DBL_MANT_DIG;

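Then we simply compare according to where other falls relative to the threshold. The threshold applies to negative numbers too, so we have to check in both directions: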

                if(other >= pureIntegerStart || other <= -pureIntegerStart)
                    COMPARE([self longLongValue], (long long)other);
                else
                    COMPARE([self doubleValue], other);
            }
        }
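Put together, the INT-versus-DOUBLE branch can be written as a standalone C function. This sketch follows the steps above, with -1, 0 and 1 in place of the NSComparisonResult constants. `compare_ll_double` is a hypothetical name. NaN is handled up front, with the same "NaN sorts lowest" result that COMPARE() produces at the end of the Objective-C version.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <math.h>

/* Orders a long long against a double without losing precision.
   Returns -1 if self < other, 1 if self > other, 0 if they are equal. */
int compare_ll_double(long long self, double other)
{
    if (isnan(other))
        return 1;                       /* NaN sorts below every number */

    double long_long_max_plus_one = LLONG_MAX + 1ULL;   /* 2^63, exact */
    if (other >= long_long_max_plus_one)
        return -1;                      /* beyond any long long */
    if (other < LLONG_MIN)
        return 1;                       /* LLONG_MIN (-2^63) is exact in a double */

    double pure_integer_start = 1LL << DBL_MANT_DIG;    /* 2^53 */
    if (other >= pure_integer_start || other <= -pure_integer_start) {
        /* other is a whole number within long long range: compare as integers */
        long long o = (long long)other;
        return (self > o) - (self < o);
    }
    /* |other| < 2^53: compare as doubles. If self lies beyond 2^53,
       rounding cannot carry it across the threshold, so order is kept. */
    double s = (double)self;
    return (s > other) - (s < other);
}
```

The final test in the usage checks below is the interesting one. 2^53 + 1 rounds to 2^53 as a double, so a naive double comparison would call the two values equal.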

Next up are the cases where selfType is UINT. As before, when otherType is also UINT, the code is simple:

        else if(selfType == UINT)
        {
            if(otherType == UINT)
            {
                COMPARE([self unsignedLongLongValue], [otherNumber unsignedLongLongValue]);
            }

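Note that, thanks to the earlier type sorting, we don't need to handle INT here. That leaves DOUBLE, which once again gets complicated. As before, we pull otherNumber's value into a local variable: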

            else
            {
                double other = [otherNumber doubleValue];

First, we check whether other is negative. If it is, the order is settled, because self is unsigned and therefore either zero or positive:

                if(other < 0)
                    return NSOrderedDescending;

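Otherwise, we follow the same basic threshold approach as before, this time comparing other against the largest unsigned long long. This is tricky. As with long long, we want to add 1 to get a value that a double can represent exactly. But no integer can hold anything larger than the largest unsigned long long, since unsigned long long is already the biggest integer type we have. Instead, we compute (LLONG_MAX + 1) * 2, which is one more than the largest unsigned long long (that is, 2^64). Every type in the expression is chosen carefully to avoid overflow or loss of precision: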

                double unsignedLongLongMaxPlusOne = (double)(LLONG_MAX + 1ULL) * 2.0;
                if(other >= unsignedLongLongMaxPlusOne)
                    return NSOrderedAscending;

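At this point we know both numbers are within range, so we compare them directly using the same pureIntegerStart strategy as before. Since other is known to be non-negative here, only the upper side of the threshold needs checking: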

                double pureIntegerStart = 1LL << DBL_MANT_DIG;
                if(other >= pureIntegerStart)
                    COMPARE([self unsignedLongLongValue], (unsigned long long)other);
                else
                    COMPARE([self doubleValue], other);
            }
        }
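The UINT-versus-DOUBLE branch translates to C the same way. This is again a sketch with a hypothetical name, returning -1, 0 or 1 and handling NaN up front, the way COMPARE() would at the end.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <math.h>

/* Orders an unsigned long long against a double without losing precision. */
int compare_ull_double(unsigned long long self, double other)
{
    if (isnan(other))
        return 1;                       /* NaN sorts below every number */
    if (other < 0)
        return 1;                       /* self is never negative */

    /* 2^64, one past ULLONG_MAX, built without overflowing any integer type */
    double unsigned_long_long_max_plus_one = (double)(LLONG_MAX + 1ULL) * 2.0;
    if (other >= unsigned_long_long_max_plus_one)
        return -1;

    double pure_integer_start = 1LL << DBL_MANT_DIG;    /* 2^53 */
    if (other >= pure_integer_start) {
        unsigned long long o = (unsigned long long)other;  /* exact here */
        return (self > o) - (self < o);
    }
    double s = (double)self;            /* rounding can't cross 2^53 */
    return (s > other) - (s < other);
}
```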

That leaves only the DOUBLE case, which is actually very simple. Because of the type ordering, both sides must be doubles here, so we can compare them directly:

        else
        {
            COMPARE([self doubleValue], [otherNumber doubleValue]);
        }
    }

Now that compare: is implemented, checking for equality is trivial:

    - (BOOL)isEqualToNumber:(MANumber *)number
    {
        return [self compare: number] == NSOrderedSame;
    }

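We'd also like to override isEqual:, which MANumber inherits from NSObject. The override simply checks the other object's class, then defers to isEqualToNumber: for the actual comparison: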

    - (BOOL)isEqual: (id)other
    {
        if(![other isKindOfClass: [MANumber class]])
            return NO;

        return [self isEqualToNumber: other];
    }

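Finally, since we've overridden isEqual:, we must override hash as well, because objects that compare equal are required to return the same hash. Floating-point semantics make this a bit tricky. For non-floating-point values, we can simply return the integer value as the hash: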

    - (NSUInteger)hash
    {
        if(_type != DOUBLE)
            return [self unsignedIntegerValue];

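We want to do the same for floating-point values that happen to be integers. Our isEqual: treats an integral DOUBLE as equal to the INT or UINT with the same value, so it must return the same hash as those. To achieve this, we check whether the DOUBLE actually holds an integer, and if so, return that integer value: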

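        // Go through (unsigned) long long: casting a negative or out-of-range double straight to NSUInteger is undefined in C.
        if(_value.d == floor(_value.d) && _value.d >= LLONG_MIN && _value.d < (double)(LLONG_MAX + 1ULL) * 2.0)
            return _value.d < 0 ? (NSUInteger)(long long)_value.d : (NSUInteger)(unsigned long long)_value.d;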

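That leaves non-integer values. The ultimate goal is to return the double's bit pattern directly, which makes a decent hash. However, this only works for values where isEqual: implies identical bit patterns, and that isn't true of every double. The first exception is NAN. We've made it equal to itself, but it can be represented by many different bit patterns. To handle it, we check for NAN explicitly and return a constant hash: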

        if(isnan(_value.d))
            return 0;

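The other special case is a little stranger. IEEE 754 floating point, which nearly every modern CPU uses, has two zeros: positive zero and negative zero. They're usually indistinguishable, since they compare equal and produce the same results in most calculations. However, their bit patterns differ, so they need special treatment. I take advantage of the fact that negative zero compares equal to positive zero, and handle both with a simple check that returns a constant hash: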

        if(_value.d == 0.0)
            return 0;

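With all the special cases out of the way, any number that reaches this point is one for which numeric equality and bit-pattern equality coincide. So we return the bit pattern itself as the hash, by returning the union's u field: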

        return _value.u;
    }

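But wait! Earlier I said you aren't allowed to access any field of a union other than the one set most recently, so this looks forbidden. By the rule as I stated it, it is. In practice, though, C compilers allow it and simply reinterpret the existing bits. This code reads the double stored in the union back as the bits of an unsigned long long, which is exactly what we want. The compilers we actually use officially sanction this: GCC documents union type punning as supported. The C99 (as amended) and C11 standards also describe it as reinterpreting the stored bytes, whereas in C++ it is undefined behavior.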
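Here is the whole hashing recipe as a C sketch, with a 64-bit unsigned long long standing in for NSUInteger. `hash_double` and `hash_long_long` are hypothetical names. A few choices are worth flagging:

- It reads the bit pattern with memcpy, a well-defined alternative to union punning in both C and C++.
- It tests integrality by converting to an integer and back instead of calling floor(). The conversion is routed through long long (or, for values of 2^63 and above, done only where the result fits in unsigned long long), because converting a negative or out-of-range double straight to an unsigned type is undefined behavior in C.
- It has no separate zero check. Both +0.0 and -0.0 pass the integral test and hash to 0 there. The same is true in the Objective-C version, where the zero check is never actually reached.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(unsigned long long),
               "bit-pattern hash assumes a 64-bit double");

/* Integer hash, standing in for -hash on INT and UINT values. */
unsigned long long hash_long_long(long long v) { return (unsigned long long)v; }

/* Hash for a double that agrees with MANumber-style equality. */
unsigned long long hash_double(double d)
{
    double two_pow_63 = LLONG_MAX + 1ULL;
    double two_pow_64 = two_pow_63 * 2.0;

    if (d >= LLONG_MIN && d < two_pow_63) {
        long long i = (long long)d;          /* in range, so well-defined */
        if ((double)i == d)
            return (unsigned long long)i;    /* integral: hash like the INT */
    } else if (d >= two_pow_63 && d < two_pow_64) {
        return (unsigned long long)d;        /* always integral this large: hash like the UINT */
    }
    /* +0.0 and -0.0 were both caught above as the integer 0. */
    if (isnan(d))
        return 0;                            /* every NaN bit pattern hashes alike */

    unsigned long long bits;
    memcpy(&bits, &d, sizeof bits);          /* read the raw bit pattern */
    return bits;
}
```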

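Conclusion

NSNumber is a conceptually simple class that exists mainly so we can stuff numbers into Cocoa collections, but its flexibility means a lot of complexity underneath. By building MANumber, a class that works much the same way, we can see what NSNumber has to do internally. Automatic conversion to all the different integer types takes a lot of boilerplate, and reliably converting between the different numeric types can get quite complicated.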

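That's it for today. Come back next time for another Friday Q&A. As always, Friday Q&A is driven by reader suggestions, so if you have a topic you'd like to see covered, please send it in!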

#Original (English)

Source: https://www.mikeash.com/pyblog/friday-qa-2012-07-06-lets-build-nsnumber.html

NSNumber is a deceptively simple class with some interesting implementation details. In today’s edition of Friday Q&A, I’ll explore how to build a class that works like NSNumber, a topic suggested by Jay Tamboli.

Overview

Like many (but not all) object-oriented languages, Objective-C has a divide between objects and non-objects. Objects respond to messages, can be queried at runtime without knowing their exact type, placed in collections, compared for equality, and share a common set of behavior. Non-objects are largely compile-time constructs, with all of their type information essentially gone at runtime. In Objective-C, these non-objects are everything that comes from C, from the integer 42 to the string “Hello, world” to complicated structs.

Boxing is the process of placing these non-objects into an object so that they can be used like other objects, typically so that they can be placed in a collection. NSNumber is the Cocoa class used to box C numbers. You can’t have an NSArray of int, but you can have an NSArray of NSNumber. NSNumber shows up a lot in Cocoa programming. Just about any place a Cocoa collection is used to store a number, NSNumber is there. Among many other places, NSNumber objects are what NSUserDefaults stores and retrieves when you ask it to save a number.

Interface

Our surrogate NSNumber will be called MANumber. Unlike the Cocoa version, which is a subclass of the more general boxing class NSValue, this one will directly subclass NSObject:

    @interface MANumber : NSObject

There are a lot of methods for initializing an instance. There’s one initializer for each C numeric type, plus some extra ones for types specific to Cocoa:

    - (id)initWithChar:(char)value;
    - (id)initWithUnsignedChar:(unsigned char)value;
    - (id)initWithShort:(short)value;
    - (id)initWithUnsignedShort:(unsigned short)value;
    - (id)initWithInt:(int)value;
    - (id)initWithUnsignedInt:(unsigned int)value;
    - (id)initWithLong:(long)value;
    - (id)initWithUnsignedLong:(unsigned long)value;
    - (id)initWithLongLong:(long long)value;
    - (id)initWithUnsignedLongLong:(unsigned long long)value;
    - (id)initWithFloat:(float)value;
    - (id)initWithDouble:(double)value;
    - (id)initWithBool:(BOOL)value;
    - (id)initWithInteger:(NSInteger)value;
    - (id)initWithUnsignedInteger:(NSUInteger)value;

There are also getters for these types:

    - (char)charValue;
    - (unsigned char)unsignedCharValue;
    - (short)shortValue;
    - (unsigned short)unsignedShortValue;
    - (int)intValue;
    - (unsigned int)unsignedIntValue;
    - (long)longValue;
    - (unsigned long)unsignedLongValue;
    - (long long)longLongValue;
    - (unsigned long long)unsignedLongLongValue;
    - (float)floatValue;
    - (double)doubleValue;
    - (BOOL)boolValue;
    - (NSInteger)integerValue;
    - (NSUInteger)unsignedIntegerValue;

Note that any of these getters works no matter which initializer was used. MANumber will have to perform the appropriate conversions.

Finally, there are a few other methods for string conversion and comparison:

    - (NSString *)stringValue;
    - (NSComparisonResult)compare:(MANumber *)otherNumber;
    - (BOOL)isEqualToNumber:(MANumber *)number;
    - (NSString *)descriptionWithLocale:(id)locale;

Implementation Strategy

MANumber will use a union to store the underlying numeric value. union is a rarely-seen feature of standard C. It looks just like a struct, but works differently. A struct stores many values together in one spot. A union does this as well, but you can only access the last one you stored. When you store a value in a union, the value of all other fields becomes undefined.

In typical unhelpful-but-efficient C fashion, the compiler doesn’t enforce that rule, nor does it help you follow it by, say, letting you query which field was the last one set. You have to keep track of this yourself, typically with an accompanying enum.

The union could be used to hold every C numeric type, with a big enum to say which one is in use. However, this is unnecessarily complex. All we really need is three fields: the largest possible integer type, the largest possible unsigned integer type, and the largest possible floating-point type. From the types we have to handle, these are long long, unsigned long long, and double. Everything else can be converted to and from those without loss.

This implementation does not precisely match that of NSNumber, which keeps track of the specific type used to create it. However, using these three types is plenty close enough, and eliminates a lot of extra repetitive code. The fact that NSNumber precisely tracks the original type isn’t visible most of the time, and only shows up when using a method like -descriptionWithLocale: or -objCType.

Storage

Here are the instance variables:

    @implementation MANumber {
        enum { INT, UINT, DOUBLE } _type;
        union {
            long long i;
            unsigned long long u;
            double d;
        } _value;
    }

The _type variable holds an anonymous enum saying whether the value is an INT (long long), UINT (unsigned long long), or DOUBLE (guess). The _value variable then holds the actual number, using a union so that it only ends up storing one.

The code will set _type and the corresponding _value in the initializers. The getters can then check the _type and extract the value accordingly.

Initializers

There’s a ton of boilerplate to deal with all of the different types. All of the signed integer types just call through to initWithLongLong:, and the unsigned types call through to initWithUnsignedLongLong:

    - (id)initWithChar:(char)value
    {
        return [self initWithLongLong: value];
    }

    - (id)initWithUnsignedChar:(unsigned char)value
    {
        return [self initWithUnsignedLongLong: value];
    }

    - (id)initWithShort:(short)value
    {
        return [self initWithLongLong: value];
    }

    - (id)initWithUnsignedShort:(unsigned short)value
    {
        return [self initWithUnsignedLongLong: value];
    }

    - (id)initWithInt:(int)value
    {
        return [self initWithLongLong: value];
    }

    - (id)initWithUnsignedInt:(unsigned int)value
    {
        return [self initWithUnsignedLongLong: value];
    }

    - (id)initWithLong:(long)value
    {
        return [self initWithLongLong: value];
    }

    - (id)initWithUnsignedLong:(unsigned long)value
    {
        return [self initWithUnsignedLongLong: value];
    }

    - (id)initWithBool:(BOOL)value
    {
        return [self initWithLongLong: value];
    }

    - (id)initWithInteger:(NSInteger)value
    {
        return [self initWithLongLong: value];
    }

    - (id)initWithUnsignedInteger:(NSUInteger)value
    {
        return [self initWithUnsignedLongLong: value];
    }

Those initialisers then simply set the _type, _value, and return self. (Note that I’m leaving out the traditional call to [super init] for brevity, as it’s not strictly necessary when your superclass is NSObject, although still a good idea.)

    - (id)initWithLongLong:(long long)value
    {
        _type = INT;
        _value.i = value;
        return self;
    }

    - (id)initWithUnsignedLongLong:(unsigned long long)value
    {
        _type = UINT;
        _value.u = value;
        return self;
    }

The floating-point initializers are similar. The one for float just calls through to initWithDouble:, and that one just sets _type and _value appropriately:

    - (id)initWithFloat:(float)value
    {
        return [self initWithDouble: value];
    }

    - (id)initWithDouble:(double)value
    {
        _type = DOUBLE;
        _value.d = value;
        return self;
    }

Getters

The getters are even more similar than the initializers. They all check the _type, then return the appropriate field of _value. The compiler will handle the final conversion from the active field of _value to the requested return type.

Since these methods all contain the same code, this is a perfect candidate for a macro to encapsulate the identical bits. Here’s a macro that checks _type and then returns the corresponding field of _value:

    #define RETURN() do { \
            if(_type == INT) \
                return _value.i; \
            else if(_type == UINT) \
                return _value.u; \
            else \
                return _value.d; \
        } while(0)

With that macro, the getters pretty much write themselves:

    - (char)charValue
    {
        RETURN();
    }

    - (unsigned char)unsignedCharValue
    {
        RETURN();
    }

    - (short)shortValue
    {
        RETURN();
    }

    - (unsigned short)unsignedShortValue
    {
        RETURN();
    }

    - (int)intValue
    {
        RETURN();
    }

    - (unsigned int)unsignedIntValue
    {
        RETURN();
    }

    - (long)longValue
    {
        RETURN();
    }

    - (unsigned long)unsignedLongValue
    {
        RETURN();
    }

    - (long long)longLongValue
    {
        RETURN();
    }

    - (unsigned long long)unsignedLongLongValue
    {
        RETURN();
    }

    - (float)floatValue
    {
        RETURN();
    }

    - (double)doubleValue
    {
        RETURN();
    }

    - (NSInteger)integerValue
    {
        RETURN();
    }

    - (NSUInteger)unsignedIntegerValue
    {
        RETURN();
    }

That’s a lot of boring and ugly code.

The one exception to this uniform sea of macro invocations is the -boolValue method. Since BOOL pretends to be a real boolean value, this method should always return YES for any non-zero value stored in the MANumber object. The compiler’s built-in conversion won’t do this. For example, the integer 256 will return NO if converted to a BOOL, since BOOL is just a signed char, which is an 8-bit integer. Because of that, -boolValue replicates the macro logic, but with an explicit check for zero:

    - (BOOL)boolValue
    {
        if(_type == INT)
            return _value.i != 0;
        else if(_type == UINT)
            return _value.u != 0;
        else
            return _value.d != 0;
    }

String Conversion

There are two string conversion methods: -stringValue and -descriptionWithLocale:. -stringValue simply calls -descriptionWithLocale: with a nil parameter:

    - (NSString *)stringValue
    {
        return [self descriptionWithLocale: nil];
    }

-descriptionWithLocale: then uses -[NSString initWithFormat:locale:] to build the string. There’s no fancy way to deal with the different numeric types here, so it simply checks _type and uses a different format string for each case:

    - (NSString *)descriptionWithLocale:(id)locale
    {
        if(_type == INT)
            return [[NSString alloc] initWithFormat: @"%lld" locale: locale, _value.i];
        else if(_type == UINT)
            return [[NSString alloc] initWithFormat: @"%llu" locale: locale, _value.u];
        else
            return [[NSString alloc] initWithFormat: @"%f" locale: locale, _value.d];
    }

Note that I’m using ARC, which is why there are no autorelease calls here.

Comparison

The comparison methods get interesting, because they need to work between MANumber objects of different types. For example, the double value -1.1 should compare less than the unsigned integer value 99999.

There are nine permutations of the types, so nine different cases to handle. This can be reduced to only six cases by enforcing an order. If the two objects have types INT and UINT, the two cases for that can be reduced to one by only handling the case where self is INT and the other object is UINT, and swapping the two objects if they show up the other way around.

To help with comparison between the different types, I wrote a simple macro that takes two numbers and returns the appropriate NSComparisonResult. All it does is take two arguments, save them into temporary variables to avoid multiple evaluation, then return the appropriate constant depending on how they’re ordered. There’s also a bit of floating-point trickery here. With floating-point numbers, NAN (not a number) never compares equal to anything, and all comparisons with it are false. Since NSComparisonResult has no way to represent an ordering which means, “this number is not equal to anything, not even itself,” I arbitrarily decide to make NAN equal to itself and less than any other number, for the purposes of MANumber comparison:

    #define COMPARE(a, b) do { \
            __typeof__(a) __a_local = a; \
            __typeof__(b) __b_local = b; \
            BOOL __a_isnan = isnan(__a_local); \
            BOOL __b_isnan = isnan(__b_local); \
            if(__a_isnan && __b_isnan) \
                return NSOrderedSame; \
            else if(__a_isnan) \
                return NSOrderedAscending; \
            else if(__b_isnan) \
                return NSOrderedDescending; \
            else if(__a_local > __b_local) \
                return NSOrderedDescending; \
            else if(__a_local < __b_local) \
                return NSOrderedAscending; \
            else \
                return NSOrderedSame; \
        } while(0)

The first thing the comparison method itself does is extract the types of the two objects to compare:

    - (NSComparisonResult)compare:(MANumber *)otherNumber
    {
        int selfType = _type;
        int otherType = otherNumber->_type;

If the two types aren’t in order, we reverse the comparison by calling compare: again with the arguments reversed, and returning the inverse of the result. Since NSComparisonResult is just -1, 0, or 1, we can invert its meaning by negating it:

        if(selfType > otherType)
            return -[otherNumber compare: self];

Now we’re left with sorted types. There are six cases. If selfType is INT, then otherType could be anything. If selfType is UINT, then otherType can only be UINT or DOUBLE. If selfType is DOUBLE, then otherType must be DOUBLE as well.

Let’s look at the cases where selfType is INT. If both values are INT, the code is easy:

        if(selfType == INT)
        {
            if(otherType == INT)
            {
                COMPARE([self longLongValue], [otherNumber longLongValue]);
            }

If otherType is UINT, there’s a bit of extra work. Directly comparing with [otherNumber unsignedLongLongValue] won’t work. C will promote [self longLongValue] to unsigned before the comparison, turning negative numbers into positive numbers and wrecking the comparison. -1 will compare greater than 1 because of this. To prevent that, we make a special check for negative numbers, then compare their unsigned values if both are known to be positive:

            else if(otherType == UINT)
            {
                if([self longLongValue] < 0)
                    return NSOrderedAscending;
                else
                    COMPARE([self unsignedLongLongValue], [otherNumber unsignedLongLongValue]);
            }
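The conversion problem is easy to reproduce in plain C. In this sketch, naive_less and compare_ll_ull are hypothetical helpers: the first lets the implicit conversion go wrong, and the second applies the same negative-first check as the method above, returning -1, 0 or 1.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Naive version: under C's usual arithmetic conversions, a is converted
   to unsigned long long before the comparison, so -1 becomes ULLONG_MAX. */
static bool naive_less(long long a, unsigned long long b)
{
    return a < b;
}

/* Checked version: a negative a is always smaller; otherwise a fits in
   unsigned long long and the two can be compared directly. */
static int compare_ll_ull(long long a, unsigned long long b)
{
    if (a < 0)
        return -1;
    unsigned long long ua = (unsigned long long)a;
    return (ua > b) - (ua < b);
}

static void check_promotion_pitfall(void)
{
    assert(!naive_less(-1, 1));            /* -1 is "not less than" 1 */
    assert(compare_ll_ull(-1, 1) == -1);   /* the checked version is right */
}
```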

Next comes the case for DOUBLE. This gets pretty complicated, because floating-point numbers work fairly differently from integers. There are several different subcases here, which I’ll take one by one. However, the first thing it does is extract the doubleValue from the other number to make it more convenient to work with:

            else
            {
                double other = [otherNumber doubleValue];

double can hold a much larger range than long long. The first subcase is to figure out the largest possible number a long long can hold, and see if other is beyond it. If it is, it’s obviously larger than self, since self is a long long.

The standard macro LLONG_MAX (from <limits.h>) gives us the largest number a long long can hold. However, we can't usefully convert this to a double. That number is equal to 2^63-1, which can't be represented in a double, so the conversion rounds it up to 2^63. Due to the internal format of double, it can only represent even numbers once it gets beyond 2^53. To perform the comparison accurately, we calculate one number beyond the largest long long, taking care to add an unsigned 1 so the addition can't overflow, and compare against that:

                double longLongMaxPlusOne = LLONG_MAX + 1ULL;
                if(other >= longLongMaxPlusOne)
                    return NSOrderedAscending;

We also check in the negative direction. This is a bit easier, as the smallest possible long long can be directly represented in a double:

                if(other < LLONG_MIN)
                    return NSOrderedDescending;
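Both bounds can be confirmed by checking the conversions directly. This sketch assumes a 64-bit long long and IEEE 754 doubles with round-to-nearest; check_long_long_bounds is a hypothetical name. It also shows why comparing against (double)LLONG_MAX would be a mistake: that value is 2^63, the same as longLongMaxPlusOne, so a test like other > LLONG_MAX would let 2^63 slip through to a cast into a long long that can't hold it.

```c
#include <assert.h>
#include <limits.h>

static void check_long_long_bounds(void)
{
    /* LLONG_MAX (2^63 - 1) has no exact double; it rounds up to 2^63. */
    assert((double)LLONG_MAX == 9223372036854775808.0);

    /* Adding 1 in unsigned arithmetic first gives exactly 2^63. */
    double longLongMaxPlusOne = LLONG_MAX + 1ULL;
    assert(longLongMaxPlusOne == 9223372036854775808.0);

    /* LLONG_MIN (-2^63) is a power of two, so it survives a round trip. */
    assert((double)LLONG_MIN == -9223372036854775808.0);
    assert((long long)(double)LLONG_MIN == LLONG_MIN);
}
```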

If we’re still running at this point, then the double is within the range of a long long and they need to be compared directly. However, we can’t just whip out the > operator, because there are a lot of doubles that can’t be represented in long long (e.g. 1.5), and there are a lot of long longs that can’t be represented as a double (e.g. any odd number above a threshold, as mentioned above).

Beyond a certain threshold, double can only represent integer values, as the magnitude of the value exceeds the precision of the representation. When other is beyond that threshold, and below the maximum possible long long, it can safely be converted to a long long with no loss of precision. The two values can then be compared as long longs. Below that threshold, double can represent every integer exactly, so any long long in that range converts to a double with no loss of precision. A long long outside that range may round when converted, but rounding can't carry it back across the threshold, so it still lands on the correct side of other. The two values can then be compared as doubles.

The location of that threshold is actually fairly easy to figure out. C provides a macro, DBL_MANT_DIG, which gives the precision of the double type in bits (53 for IEEE 754 doubles). Raising two to that power (since double is a binary representation) gives us the threshold, 2^53:

                double pureIntegerStart = 1LL << DBL_MANT_DIG;
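The threshold can be checked the same way. This sketch, using the hypothetical name check_pure_integer_threshold, assumes IEEE 754 doubles (so DBL_MANT_DIG is 53) and shows that an integer just below 2^53 survives a round trip through double while 2^53 + 1 does not.

```c
#include <assert.h>
#include <float.h>

static void check_pure_integer_threshold(void)
{
    assert(DBL_MANT_DIG == 53);                     /* bits of precision */

    double pureIntegerStart = 1LL << DBL_MANT_DIG;  /* 2^53 */

    /* Integers up to 2^53 survive a round trip through double... */
    long long below = (1LL << DBL_MANT_DIG) - 1;
    assert((long long)(double)below == below);

    /* ...but 2^53 + 1 does not: it rounds to its even neighbour, 2^53. */
    long long above = (1LL << DBL_MANT_DIG) + 1;
    assert((double)above == pureIntegerStart);

    /* From 2^53 upward, adjacent doubles are at least 2 apart. */
    double next = pureIntegerStart + 1.0;
    assert(next == pureIntegerStart);
}
```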

Then we simply compare based on where other lies relative to that. Note that the threshold applies equally for negative numbers, so we must check it in both directions:

                if(other >= pureIntegerStart || other <= -pureIntegerStart)
                    COMPARE([self longLongValue], (long long)other);
                else
                    COMPARE([self doubleValue], other);
            }
        }
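Putting the INT-versus-DOUBLE steps together, here is a plain C sketch of the same logic as a standalone function. compare_ll_double is a hypothetical name; it returns -1, 0 or 1 in place of NSComparisonResult, and it leaves out NaN, which the article handles inside the COMPARE macro. An assert documents that assumption.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <math.h>

/* Returns -1, 0 or 1 for a compared with b. b must not be NaN. */
static int compare_ll_double(long long a, double b)
{
    assert(!isnan(b));

    double longLongMaxPlusOne = LLONG_MAX + 1ULL;   /* exactly 2^63 */
    if (b >= longLongMaxPlusOne)
        return -1;                                  /* above every long long */
    if (b < LLONG_MIN)
        return 1;                                   /* below every long long */

    double pureIntegerStart = 1LL << DBL_MANT_DIG;  /* 2^53 */
    if (b >= pureIntegerStart || b <= -pureIntegerStart) {
        /* b is an integer in long long range: compare as integers. */
        long long bl = (long long)b;
        return (a > bl) - (a < bl);
    }

    /* |b| < 2^53: compare as doubles. */
    double ad = (double)a;
    return (ad > b) - (ad < b);
}
```

For example, (1LL << 53) + 1 compared against 2^53 as a double comes out as 1 here, whereas a naive (double)a > b test would call the two equal, because the conversion rounds 2^53 + 1 down to 2^53.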

Next up comes the case where selfType is UINT. As before, when otherType is also UINT, the code is easy:

        else if(selfType == UINT)
        {
            if(otherType == UINT)
            {
                COMPARE([self unsignedLongLongValue], [otherNumber unsignedLongLongValue]);
            }

Note that we don’t have to handle INT, due to the type sorting performed above. We move on to DOUBLE, which is once again complicated. As before, we fetch the value of otherNumber into a local variable:

            else
            {
                double other = [otherNumber doubleValue];

The first thing we do is see if other is negative. If it is, then we know the order, as self is unsigned (and thus either zero or positive):

                if(other < 0)
                    return NSOrderedDescending;

Otherwise, we do the same basic threshold calculations as before. This time we have to compare other against the largest possible unsigned long long. Doing this is a bit tricky. Just like with long long, we have to add 1 to get a number that works as a double. However, we can’t represent anything greater than the largest possible unsigned long long as an integer, since unsigned long long is the largest integer type we have. Instead, we calculate (LLONG_MAX + 1) * 2, which gives one greater than the largest unsigned long long, carefully doing so with all the right types to avoid overflow or imprecision:

                double unsignedLongLongMaxPlusOne = (double)(LLONG_MAX + 1ULL) * 2.0;
                if(other >= unsignedLongLongMaxPlusOne)
                    return NSOrderedAscending;
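The same kind of check works for the unsigned bound. This sketch assumes a 64-bit unsigned long long and IEEE 754 doubles; check_unsigned_long_long_bound is a hypothetical name. As with the signed case, ULLONG_MAX itself rounds up to the bound when converted, which is why the comparison uses the computed 2^64 with >=.

```c
#include <assert.h>
#include <limits.h>

static void check_unsigned_long_long_bound(void)
{
    /* (LLONG_MAX + 1) * 2 == 2^64, one more than ULLONG_MAX. The doubling
       happens in double, so nothing overflows. */
    double unsignedLongLongMaxPlusOne = (double)(LLONG_MAX + 1ULL) * 2.0;
    assert(unsignedLongLongMaxPlusOne == 18446744073709551616.0);

    /* ULLONG_MAX (2^64 - 1) rounds up to 2^64 as a double, so a test like
       other > ULLONG_MAX would let other == 2^64 slip past. */
    assert((double)ULLONG_MAX == unsignedLongLongMaxPlusOne);
}
```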

At this point, we know that both numbers are within each type’s range, and so we use the same pureIntegerStart strategy as before to compare them directly:

                double pureIntegerStart = 1LL << DBL_MANT_DIG;
                if(other >= pureIntegerStart)
                    COMPARE([self unsignedLongLongValue], (unsigned long long)other);
                else
                    COMPARE([self doubleValue], other);
            }
        }
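As with the signed case, the UINT-versus-DOUBLE steps can be collected into one plain C function. compare_ull_double is a hypothetical name; it returns -1, 0 or 1 and assumes b is not NaN.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <math.h>

/* Returns -1, 0 or 1 for a compared with b. b must not be NaN. */
static int compare_ull_double(unsigned long long a, double b)
{
    assert(!isnan(b));

    if (b < 0)
        return 1;                                   /* a is never negative */

    double unsignedLongLongMaxPlusOne = (double)(LLONG_MAX + 1ULL) * 2.0;  /* 2^64 */
    if (b >= unsignedLongLongMaxPlusOne)
        return -1;                                  /* above every unsigned long long */

    double pureIntegerStart = 1LL << DBL_MANT_DIG;  /* 2^53 */
    if (b >= pureIntegerStart) {
        /* b is an integer in range: the cast is exact. */
        unsigned long long bu = (unsigned long long)b;
        return (a > bu) - (a < bu);
    }

    /* b < 2^53: compare as doubles. */
    double ad = (double)a;
    return (ad > b) - (ad < b);
}
```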

All that’s left now is the DOUBLE case, which is actually really easy. Due to the type sorting, the only possible case here is when they’re both DOUBLE, so we can just directly compare them:

        else
        {
            COMPARE([self doubleValue], [otherNumber doubleValue]);
        }
    }

Now that compare: is implemented, equality checking is trivial:

    - (BOOL)isEqualToNumber:(MANumber *)number
    {
        return [self compare: number] == NSOrderedSame;
    }

We also want isEqual: from NSObject. This can simply check the class of the other object, then leverage isEqualToNumber:

    - (BOOL)isEqual: (id)other
    {
        if(![other isKindOfClass: [MANumber class]])
            return NO;

        return [self isEqualToNumber: other];
    }

Finally, since we override isEqual:, we must also override hash. The implementation of hash gets mildly tricky due to the semantics of floating-point numbers. For non-floats, we can simply return the straight integer value as the hash:

    - (NSUInteger)hash
    {
        if(_type != DOUBLE)
            return [self unsignedIntegerValue];

For floats that are integer values, we want to do the same thing. Since our isEqual: considers an integer-valued DOUBLE equal to an INT or UINT of the same value, we must return the same hash as the INT and UINT equivalent. To accomplish this, we check to see if the DOUBLE value is actually an integer, and return the integer value if so:

        if(_value.d == floor(_value.d))
            return [self unsignedIntegerValue];

Beyond this, we have non-integer values. The ultimate goal is to simply return the bit pattern of the double, which will give a nice hash. However, this only works for numbers where bit pattern equality implies isEqual:. This is not true for all doubles. First is NAN, which we made compare equal to itself, but which has many different possible bit representations. To handle that, we check for NAN explicitly and return a constant hash for it:

        if(isnan(_value.d))
            return 0;
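To make the NaN problem concrete, this sketch builds two NaNs from different bit patterns. bits_of, double_from_bits and check_nan_patterns are hypothetical helpers; the first two use memcpy to move bits between a double and a uint64_t, assuming both are 64 bits wide.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Copy a double's bits into an integer. */
static uint64_t bits_of(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Build a double from a raw bit pattern. */
static double double_from_bits(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

static void check_nan_patterns(void)
{
    /* Both patterns have an all-ones exponent and a non-zero fraction,
       so both are NaN, yet their bits differ. */
    double a = double_from_bits(0x7FF8000000000000ULL);
    double b = double_from_bits(0x7FF8000000000001ULL);
    assert(isnan(a) && isnan(b));
    assert(bits_of(a) != bits_of(b));
}
```

Hashing the raw bits of these two values would give different results, which is why the method returns a single constant for every NaN.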

The other special case is a bit weirder. IEEE 754 floats (the kind used by just about any modern CPU) have two possible values for zero: positive and negative. These are typically indistinguishable, as they compare equal and produce the same results for most calculations. However, they have different bit patterns, so we have to special-case them. I take advantage of the fact that negative zero compares equal to positive zero to make a simple check and return a constant hash for both zeroes:

        if(_value.d == 0.0)
            return 0;
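The two zeroes can be inspected the same way. This sketch assumes IEEE 754 doubles, where negative zero differs from positive zero only in the sign bit; check_signed_zeroes is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static void check_signed_zeroes(void)
{
    double pos = 0.0, neg = -0.0;
    uint64_t pbits, nbits;
    memcpy(&pbits, &pos, sizeof pbits);
    memcpy(&nbits, &neg, sizeof nbits);

    assert(pos == neg);                       /* they compare equal... */
    assert(pbits != nbits);                   /* ...but their bits differ */
    assert(nbits == 0x8000000000000000ULL);   /* only the sign bit is set */
    assert(neg == 0.0);                       /* so one == 0.0 test catches both */
}
```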

Having ruled out all the special cases, if the code reaches this point then the number must be one where numerical equality is the same as bit pattern equality. Thus we simply return the bit pattern for the hash. We do this by returning the u field of the union:

        return _value.u;
    }
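For reference, here is the DOUBLE branch of the hash as a standalone C sketch, alongside hypothetical integer hashes that return the value itself, as the INT and UINT branches do. It differs from the article in one respect: instead of comparing against floor(), it range-checks before converting, because converting a double outside the target type's range (including infinity) to an integer is undefined in C. All names are hypothetical, and it assumes IEEE 754 doubles and 64-bit integers.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the INT and UINT branches, which hash to the value itself. */
static uint64_t hash_ll(long long v)           { return (uint64_t)v; }
static uint64_t hash_ull(unsigned long long v) { return (uint64_t)v; }

/* The DOUBLE branch. */
static uint64_t hash_double(double d)
{
    /* Integer values in [0, 2^64) hash like the equal UINT. -0.0 also
       passes d >= 0.0, so both zeroes hash to 0 here. */
    if (d >= 0.0 && d < 18446744073709551616.0) {
        uint64_t u = (uint64_t)d;             /* in range; truncates */
        if ((double)u == d)
            return u;
    }
    /* Integer values in [-2^63, 0) hash like the equal INT. */
    if (d < 0.0 && d >= -9223372036854775808.0) {
        long long l = (long long)d;
        if ((double)l == d)
            return (uint64_t)l;
    }
    /* Every NaN, whatever its bits, shares one hash. */
    if (isnan(d))
        return 0;
    /* Everything else: the raw bit pattern. */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Values that compare equal must hash equally. */
static void check_hash_consistency(void)
{
    assert(hash_double(3.0) == hash_ll(3));
    assert(hash_double(-3.0) == hash_ll(-3));
    assert(hash_double(1e19) == hash_ull(10000000000000000000ULL));
    assert(hash_double(0.0) == hash_double(-0.0));
    assert(hash_double(NAN) == hash_double(-NAN));
}
```

check_hash_consistency spells out the property that matters: values that compare equal, such as 3.0 and 3, the two zeroes, or differently encoded NaNs, get the same hash.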

But wait! Previously I said that you're not allowed to access any field in a union besides the one that was last set, so this looks like a violation. In practice it isn't a problem: C99 (as amended by Technical Corrigendum 3) and C11 specify in a footnote that reading a different member reinterprets the stored bytes as the new type, and while C++ does treat this as undefined behavior, the compilers we're actually using explicitly support it. This code takes the double that's stored in the union and reinterprets its bits as an unsigned long long, which is exactly what we want.
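If you'd rather not rely on union punning at all, copying the bytes with memcpy performs the same reinterpretation and is well defined in both C and C++. This sketch compares the two approaches. The names are hypothetical, and it assumes double and uint64_t are both 64 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store to one union member and read the other, as the hash method
   does with _value. */
static uint64_t bits_via_union(double d)
{
    union { double d; uint64_t u; } value;
    value.d = d;
    return value.u;
}

/* The same reinterpretation via memcpy. */
static uint64_t bits_via_memcpy(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

static void check_bits(void)
{
    assert(bits_via_union(1.0) == 0x3FF0000000000000ULL);  /* IEEE 754 1.0 */
    assert(bits_via_union(1.0) == bits_via_memcpy(1.0));
    assert(bits_via_union(-2.5) == bits_via_memcpy(-2.5));
}
```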

Conclusion

NSNumber is a conceptually simple class which mainly exists so that we can stuff numeric values into Cocoa collections, but its flexibility implies a fair amount of underlying complication. By implementing a workalike MANumber class, we can see what kinds of things NSNumber has to be doing on the inside. Automatic conversion to different integer types requires a fair amount of boilerplate code, and reliable conversion between numbers of different types can get pretty complicated.

That’s it for today. Come back next time for yet another Friday Q&A. As always, Friday Q&A is driven by reader suggestions, so if you have a topic you’d like to see covered, please send it in!