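Parsing Mach-O Files

Introduction

Mach-O is the executable file format used on macOS and iOS. Its layout is shown in the figure below:

[Figure 1]

Opened in a Mach-O analysis tool (MachOView), the file looks like this:

[Figure 2]

Now compare this with the Apple source code.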

  • For a multi-architecture fat file, the header is defined as:

struct fat_header {
uint32_t magic; /* FAT_MAGIC */
uint32_t nfat_arch; /* number of structs that follow */
};

struct fat_arch {
cpu_type_t cputype; /* cpu specifier (int) */
cpu_subtype_t cpusubtype; /* machine specifier (int) */
uint32_t offset; /* file offset to this object file */
uint32_t size; /* size of this object file */
uint32_t align; /* alignment as a power of 2 */
};

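The nfat_arch field of fat_header gives the number of architectures the fat file contains. Each architecture is described by a fat_arch struct that records its CPU type and its file offset. At that offset, the data has exactly the same layout as a single-architecture file, so parsing a fat file differs from parsing a single-architecture file only in the header.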
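The lookup described above can be sketched in a few lines of C. This is a minimal illustration, not Apple's implementation: the fat header and its fat_arch table are stored big-endian, FAT_MAGIC is the value from Apple's <mach-o/fat.h>, and the names read_be32, fat_slice, and parse_fat are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC 0xcafebabeu   /* value from Apple's <mach-o/fat.h> */
#define FAT_HEADER_SIZE 8u      /* sizeof(struct fat_header) */
#define FAT_ARCH_SIZE 20u       /* sizeof(struct fat_arch) */

/* Hypothetical helper: decode a big-endian 32-bit value. */
static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Host-order copy of one fat_arch entry (hypothetical type name). */
struct fat_slice {
    uint32_t cputype, cpusubtype, offset, size, align;
};

/* Decode up to max_out fat_arch entries from buf.
 * Returns the number decoded, or -1 if buf does not start with a
 * complete fat header and architecture table. */
static int parse_fat(const uint8_t *buf, size_t len,
                     struct fat_slice *out, int max_out) {
    assert(buf != NULL && out != NULL);
    if (len < FAT_HEADER_SIZE || read_be32(buf) != FAT_MAGIC)
        return -1;
    uint32_t nfat_arch = read_be32(buf + 4);
    if (nfat_arch > (len - FAT_HEADER_SIZE) / FAT_ARCH_SIZE)
        return -1;                  /* table would run past the buffer */
    int n = 0;
    for (uint32_t i = 0; i < nfat_arch && n < max_out; i++, n++) {
        const uint8_t *p = buf + FAT_HEADER_SIZE + (size_t)i * FAT_ARCH_SIZE;
        out[n].cputype    = read_be32(p);
        out[n].cpusubtype = read_be32(p + 4);
        out[n].offset     = read_be32(p + 8);
        out[n].size       = read_be32(p + 12);
        out[n].align      = read_be32(p + 16);
    }
    return n;
}
```

Each decoded offset is where a complete single-architecture Mach-O image begins, so the rest of the parsing can start from there.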

  • For a single-architecture Mach-O file, the header is defined as:
/*
* The 32-bit mach header appears at the very beginning of the object file for
* 32-bit architectures.
*/
struct mach_header {
uint32_t magic; /* mach magic number identifier */
cpu_type_t cputype; /* cpu specifier */
cpu_subtype_t cpusubtype; /* machine specifier */
uint32_t filetype; /* type of file */
uint32_t ncmds; /* number of load commands */
uint32_t sizeofcmds; /* the size of all the load commands */
uint32_t flags; /* flags */
};

/* Constant for the magic field of the mach_header (32-bit architectures) */
#define MH_MAGIC 0xfeedface /* the mach magic number */
#define MH_CIGAM 0xcefaedfe /* NXSwapInt(MH_MAGIC) */

/*
* The 64-bit mach header appears at the very beginning of object files for
* 64-bit architectures.
*/
struct mach_header_64 {
uint32_t magic; /* mach magic number identifier */
cpu_type_t cputype; /* cpu specifier */
cpu_subtype_t cpusubtype; /* machine specifier */
uint32_t filetype; /* type of file */
uint32_t ncmds; /* number of load commands */
uint32_t sizeofcmds; /* the size of all the load commands */
uint32_t flags; /* flags */
uint32_t reserved; /* reserved */
};

The comments explain the meaning of each field, so they are not repeated here.
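One practical use of the magic field is deciding which of the two headers is present, and therefore where the load commands begin. The following is a minimal sketch under some assumptions: MH_MAGIC_64 and MH_CIGAM_64 are taken from Apple's <mach-o/loader.h> (they are not quoted above), and macho_header_size is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC    0xfeedfaceu
#define MH_CIGAM    0xcefaedfeu
#define MH_MAGIC_64 0xfeedfacfu  /* from <mach-o/loader.h>, not quoted above */
#define MH_CIGAM_64 0xcffaedfeu  /* byte-swapped MH_MAGIC_64 */

/* Returns the size in bytes of the Mach-O header at the start of buf:
 * 28 for mach_header (7 x uint32_t), 32 for mach_header_64 (8 x uint32_t),
 * or 0 if the magic is not recognized. The load commands start at this
 * offset. */
static size_t macho_header_size(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    uint32_t magic;
    if (len < sizeof magic)
        return 0;
    memcpy(&magic, buf, sizeof magic);  /* read in host byte order */
    switch (magic) {
    case MH_MAGIC:
    case MH_CIGAM:      /* same layout, opposite byte order */
        return 28;
    case MH_MAGIC_64:
    case MH_CIGAM_64:
        return 32;
    default:
        return 0;
    }
}
```

A CIGAM value means the file was written with the opposite byte order from the host, so every later multi-byte field would need to be swapped when read.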

load_command

Below the header come several load commands. The header gives the number of load commands in the file and their total size in bytes (ncmds / sizeofcmds). The load_command struct is defined as:

struct load_command {
uint32_t cmd; /* type of load command */
uint32_t cmdsize; /* total size of command in bytes */
};

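The definition has only two fields, but that is not the whole story. The cmd field identifies the type of each load command, and each type has its own larger struct that begins with these same two fields.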
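Because every command starts with cmd and cmdsize, the whole list can be walked without understanding each type: start at the first command and advance by cmdsize, ncmds times. The sketch below counts the commands of one type. It assumes a little-endian image (as on x86_64 and arm64), and read_le32 and count_load_commands are illustrative names, not part of any Apple API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a little-endian 32-bit value. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Count the load commands whose cmd field equals wanted.
 * cmds points at the first load command (right after the header);
 * ncmds and sizeofcmds come from the header.
 * Returns -1 if a command is malformed or runs past sizeofcmds. */
static int count_load_commands(const uint8_t *cmds, uint32_t ncmds,
                               uint32_t sizeofcmds, uint32_t wanted) {
    assert(cmds != NULL);
    uint32_t pos = 0;   /* invariant: pos <= sizeofcmds */
    int found = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (sizeofcmds - pos < 8)
            return -1;                       /* no room for cmd/cmdsize */
        uint32_t cmd = read_le32(cmds + pos);
        uint32_t cmdsize = read_le32(cmds + pos + 4);
        if (cmdsize < 8 || cmdsize > sizeofcmds - pos)
            return -1;                       /* command overruns the area */
        if (cmd == wanted)
            found++;
        pos += cmdsize;                      /* next command starts here */
    }
    return found;
}
```

Checking cmdsize against the remaining space keeps a damaged or hostile file from pushing the walk outside the load-command area.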

The cmd values defined in this version of the Apple source are listed below:

/*
* After MacOS X 10.1 when a new load command is added that is required to be
* understood by the dynamic linker for the image to execute properly the
* LC_REQ_DYLD bit will be or'ed into the load command constant. If the dynamic
* linker sees such a load command it it does not understand will issue a
* "unknown load command required for execution" error and refuse to use the
* image. Other load commands without this bit that are not understood will
* simply be ignored.
*/
#define LC_REQ_DYLD 0x80000000

/* Constants for the cmd field of all load commands, the type */
#define LC_SEGMENT 0x1 /* segment of this file to be mapped */
#define LC_SYMTAB 0x2 /* link-edit stab symbol table info */
#define LC_SYMSEG 0x3 /* link-edit gdb symbol table info (obsolete) */
#define LC_THREAD 0x4 /* thread */
#define LC_UNIXTHREAD 0x5 /* unix thread (includes a stack) */
#define LC_LOADFVMLIB 0x6 /* load a specified fixed VM shared library */
#define LC_IDFVMLIB 0x7 /* fixed VM shared library identification */
#define LC_IDENT 0x8 /* object identification info (obsolete) */
#define LC_FVMFILE 0x9 /* fixed VM file inclusion (internal use) */
#define LC_PREPAGE 0xa /* prepage command (internal use) */
#define LC_DYSYMTAB 0xb /* dynamic link-edit symbol table info */
#define LC_LOAD_DYLIB 0xc /* load a dynamically linked shared library */
#define LC_ID_DYLIB 0xd /* dynamically linked shared lib ident */
#define LC_LOAD_DYLINKER 0xe /* load a dynamic linker */
#define LC_ID_DYLINKER 0xf /* dynamic linker identification */
#define LC_PREBOUND_DYLIB 0x10 /* modules prebound for a dynamically */
/* linked shared library */
#define LC_ROUTINES 0x11 /* image routines */
#define LC_SUB_FRAMEWORK 0x12 /* sub framework */
#define LC_SUB_UMBRELLA 0x13 /* sub umbrella */
#define LC_SUB_CLIENT 0x14 /* sub client */
#define LC_SUB_LIBRARY 0x15 /* sub library */
#define LC_TWOLEVEL_HINTS 0x16 /* two-level namespace lookup hints */
#define LC_PREBIND_CKSUM 0x17 /* prebind checksum */

/*
* load a dynamically linked shared library that is allowed to be missing
* (all symbols are weak imported).
*/
#define LC_LOAD_WEAK_DYLIB (0x18 | LC_REQ_DYLD)

#define LC_SEGMENT_64 0x19 /* 64-bit segment of this file to be
mapped */
#define LC_ROUTINES_64 0x1a /* 64-bit image routines */
#define LC_UUID 0x1b /* the uuid */
#define LC_RPATH (0x1c | LC_REQ_DYLD) /* runpath additions */
#define LC_CODE_SIGNATURE 0x1d /* local of code signature */
#define LC_SEGMENT_SPLIT_INFO 0x1e /* local of info to split segments */
#define LC_REEXPORT_DYLIB (0x1f | LC_REQ_DYLD) /* load and re-export dylib */
#define LC_LAZY_LOAD_DYLIB 0x20 /* delay load of dylib until first use */
#define LC_ENCRYPTION_INFO 0x21 /* encrypted segment information */
#define LC_DYLD_INFO 0x22 /* compressed dyld information */
#define LC_DYLD_INFO_ONLY (0x22|LC_REQ_DYLD) /* compressed dyld information only */
#define LC_LOAD_UPWARD_DYLIB (0x23 | LC_REQ_DYLD) /* load upward dylib */
#define LC_VERSION_MIN_MACOSX 0x24 /* build for MacOSX min OS version */
#define LC_VERSION_MIN_IPHONEOS 0x25 /* build for iPhoneOS min OS version */
#define LC_FUNCTION_STARTS 0x26 /* compressed table of function start addresses */
#define LC_DYLD_ENVIRONMENT 0x27 /* string for dyld to treat
like environment variable */
#define LC_MAIN (0x28|LC_REQ_DYLD) /* replacement for LC_UNIXTHREAD */
#define LC_DATA_IN_CODE 0x29 /* table of non-instructions in __text */
#define LC_SOURCE_VERSION 0x2A /* source version used to build binary */
#define LC_DYLIB_CODE_SIGN_DRS 0x2B /* Code signing DRs copied from linked dylibs */

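The comments explain the definition and meaning of each command. Here we focus on LC_SEGMENT_64 (LC_SEGMENT).

Segments loaded through LC_SEGMENT_64 (LC_SEGMENT) commands include: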

#define SEG_PAGEZERO "__PAGEZERO" /* the pagezero segment which has no */
/* protections and catches NULL */
/* references for MH_EXECUTE files */
#define SEG_TEXT "__TEXT" /* the tradition UNIX text segment */
#define SEG_DATA "__DATA" /* the tradition UNIX data segment */
#define SEG_OBJC "__OBJC" /* objective-C runtime segment */
#define SEG_LINKEDIT "__LINKEDIT" /* the segment containing all structs */
/* created and maintained by the link */
/* editor. Created with -seglinkedit */
/* option to ld(1) for MH_EXECUTE and */
/* FVMLIB file types only */
section

The SEG_TEXT and SEG_DATA segments in turn contain multiple sections:

struct section { /* for 32-bit architectures */
char sectname[16]; /* name of this section */
char segname[16]; /* segment this section goes in */
uint32_t addr; /* memory address of this section */
uint32_t size; /* size in bytes of this section */
uint32_t offset; /* file offset of this section */
uint32_t align; /* section alignment (power of 2) */
uint32_t reloff; /* file offset of relocation entries */
uint32_t nreloc; /* number of relocation entries */
uint32_t flags; /* flags (section type and attributes)*/
uint32_t reserved1; /* reserved (for offset or index) */
uint32_t reserved2; /* reserved (for count or sizeof) */
};

struct section_64 { /* for 64-bit architectures */
char sectname[16]; /* name of this section */
char segname[16]; /* segment this section goes in */
uint64_t addr; /* memory address of this section */
uint64_t size; /* size in bytes of this section */
uint32_t offset; /* file offset of this section */
uint32_t align; /* section alignment (power of 2) */
uint32_t reloff; /* file offset of relocation entries */
uint32_t nreloc; /* number of relocation entries */
uint32_t flags; /* flags (section type and attributes)*/
uint32_t reserved1; /* reserved (for offset or index) */
uint32_t reserved2; /* reserved (for count or sizeof) */
uint32_t reserved3; /* reserved */
};

Some common sections:

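__TEXT,__text: executable code
__TEXT,__cstring: C string literals defined in the code
__TEXT,__stubs: symbol stubs, small trampolines that jump to imported functions through a pointer table
__TEXT,__objc_classname: Objective-C class names
__TEXT,__objc_methname: Objective-C method names
__DATA,__la_symbol_ptr: lazy symbol pointer table
__DATA,__objc_classlist: list of pointers to Objective-C classes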
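Names such as __objc_classlist fill the entire 16-byte sectname field and leave no room for a terminating NUL, so comparisons should be bounded to 16 bytes. A rough sketch of a lookup over an array of section_64 records follows; find_section and name_equals are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as section_64 in the listing above. */
struct section_64 {
    char sectname[16];
    char segname[16];
    uint64_t addr;
    uint64_t size;
    uint32_t offset;
    uint32_t align;
    uint32_t reloff;
    uint32_t nreloc;
    uint32_t flags;
    uint32_t reserved1;
    uint32_t reserved2;
    uint32_t reserved3;
};

/* Compare a fixed 16-byte name field with a C string. strncmp stops
 * after 16 bytes, so a full-width field without a NUL is handled. */
static int name_equals(const char field[16], const char *name) {
    return strncmp(field, name, 16) == 0;
}

/* Return the first section matching segname and sectname, or NULL. */
static const struct section_64 *find_section(const struct section_64 *sects,
                                             uint32_t nsects,
                                             const char *segname,
                                             const char *sectname) {
    assert(sects != NULL);
    for (uint32_t i = 0; i < nsects; i++) {
        if (name_equals(sects[i].segname, segname) &&
            name_equals(sects[i].sectname, sectname))
            return &sects[i];
    }
    return NULL;
}
```

The addr and offset fields of the section that is found are the two values needed to convert virtual addresses inside it to file offsets.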

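Analyzing a Mach-O file by hand (a look at how class-dump works)

As the analysis above shows, a multi-architecture file is just a fat container holding several single-architecture files. This walkthrough therefore parses a single-architecture Mach-O file directly.

Analysis with MachOView + 010 Editor
Take a quick look at the file in MachOView

[Figure 3]

Locate __DATA,__objc_classlist

[Figure 4]

This section stores a pointer to every class. Take the first one: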

uintptr_t cls_ptr = 0x0000000100514468;
Get the file offset for the pointer

First, use 010 Editor to look up the start address and file offset of __DATA,__objc_classlist

[Figure 5]

uintptr_t address = 0x100429400;
uint32_t offset = 4363264; /* 0x429400 */

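In general, a virtual address converts to a file offset as follows: file offset = virtual address - start address of the containing section + file offset of that section.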

uint32_t cls_offset = 0x100514468 - 0x100429400 + 4363264; /* = 0x514468 */

The calculation shows that cls_ptr points to offset 0x514468 in the file.
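The same conversion can be written as a small lookup that first finds the section containing the address and then applies the formula. This is only a sketch: struct region and vmaddr_to_fileoff are names invented here, and the caller is assumed to have filled the table from the addr, size, and offset fields of the file's sections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: the three section fields the formula needs. */
struct region {
    uint64_t addr;     /* virtual address of the first byte */
    uint64_t size;     /* size in bytes */
    uint64_t fileoff;  /* file offset of the first byte */
};

/* Translate vmaddr to a file offset using the region that contains it.
 * Returns 1 and stores the result in *out, or 0 if no region matches. */
static int vmaddr_to_fileoff(const struct region *r, size_t n,
                             uint64_t vmaddr, uint64_t *out) {
    assert(r != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (vmaddr >= r[i].addr && vmaddr - r[i].addr < r[i].size) {
            /* file offset = address - section start + section file offset */
            *out = vmaddr - r[i].addr + r[i].fileoff;
            return 1;
        }
    }
    return 0;
}
```

Looking up the containing section first avoids relying on every section sharing the same difference between virtual address and file offset, which happens to hold in this file but is not guaranteed in general.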

The class-dump source defines the data at the location cls_ptr points to as follows:

struct cd_objc2_class {
uint64_t isa;
uint64_t superclass;
uint64_t cache;
uint64_t vtable;
uint64_t data; // points to class_ro_t
uint64_t reserved1;
uint64_t reserved2;
uint64_t reserved3;
};

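This means that offset 0x514468 holds the data of a cd_objc2_class struct.

The data field of this struct is the pointer to the class's data.

Inspect the cd_objc2_class data

[Figure 6]

Looking at file offset 0x514468 in MachOView shows that this is the first address of __DATA,__objc_data. The value stored at 0x00514488 is the data pointer.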

uintptr_t data = 0x010042baa0;
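The location 0x00514488 follows from the struct layout: data is the fifth 8-byte field, so it sits 0x20 bytes past the start of the record. A minimal sketch of that arithmetic, reusing the cd_objc2_class layout quoted from class-dump (the helper name class_data_field_offset is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout quoted from class-dump above. */
struct cd_objc2_class {
    uint64_t isa;
    uint64_t superclass;
    uint64_t cache;
    uint64_t vtable;
    uint64_t data; /* points to class_ro_t */
    uint64_t reserved1;
    uint64_t reserved2;
    uint64_t reserved3;
};

/* File offset of the data field for a class record at class_fileoff. */
static uint64_t class_data_field_offset(uint64_t class_fileoff) {
    /* Four 8-byte fields precede data, so it sits 0x20 bytes in. */
    assert(offsetof(struct cd_objc2_class, data) == 0x20);
    return class_fileoff + offsetof(struct cd_objc2_class, data);
}
```

For the record at 0x514468 this gives 0x514488, the location read in MachOView.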
Get the file offset for the data pointer

As before, look up the base address and file offset in 010 Editor

[Figure 7]

uintptr_t address = 0x100514468;
uint32_t offset = 5325928; /* 0x514468 */

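Applying the formula from earlier:

uint32_t data_offset = 0x10042baa0 - 0x100514468 + 5325928; /* = 0x42BAA0 */

Note that data is lower than the base address used here, so it does not actually lie inside __DATA,__objc_data. The result is still correct because every section consulted in this walkthrough has the same difference (0x100000000) between its virtual address and its file offset.

The definition of cd_objc2_class already tells us that data points to a class_ro_t, so offset 0x42BAA0 should hold the data of a class_ro_t struct.

Definition of class_ro_t: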

struct cd_objc2_class_ro_t {
uint32_t flags;
uint32_t instanceStart;
uint32_t instanceSize;
uint32_t reserved; // *** this field does not exist in the 32-bit version ***
uint64_t ivarLayout;
uint64_t name;
uint64_t baseMethods;
uint64_t baseProtocols;
uint64_t ivars;
uint64_t weakIvarLayout;
uint64_t baseProperties;
};
Inspect the cd_objc2_class_ro_t data

[Figure 8]

According to the struct definition:

name = 0x01003d1cdc;
baseMethods = 0x010042b670;
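These two values sit at fixed positions in the record: four uint32_t fields occupy the first 16 bytes, ivarLayout follows at offset 16, name at 24, and baseMethods at 32. A minimal sketch of decoding them from raw bytes, assuming a little-endian 64-bit image (read_le64, ro_summary, and decode_ro are illustrative names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a little-endian 64-bit value. */
static uint64_t read_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* The two fields this walkthrough follows (hypothetical type name). */
struct ro_summary {
    uint64_t name;
    uint64_t base_methods;
};

/* Decode name and baseMethods from the raw bytes of a 64-bit
 * cd_objc2_class_ro_t. Returns 0 if fewer than 40 bytes are available,
 * which is the end of the baseMethods field. */
static int decode_ro(const uint8_t *ro, size_t len, struct ro_summary *out) {
    assert(ro != NULL && out != NULL);
    if (len < 40)
        return 0;
    out->name = read_le64(ro + 24);          /* after 16 + 8 bytes */
    out->base_methods = read_le64(ro + 32);  /* directly after name */
    return 1;
}
```

On a 32-bit image the reserved field is absent and the pointers are 4 bytes wide, so these offsets would not apply.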

Using the same method, find the base address and file offset

[Figure 9]

uintptr_t address = 0x10042B1F8;
uint32_t offset = 4370936; /* 0x42B1F8 */

Compute the file offsets of name and baseMethods

name_offset = 0x1003d1cdc - 0x10042B1F8 + 4370936; /* = 0x3D1CDC */
baseMethods_offset = 0x10042b670 - 0x10042B1F8 + 4370936; /* = 0x42B670 */
Inspect name

[Figure 10]

name = "WXShotSendPersonCardView";

Inspect baseMethods

[Figure 11]

This location stores the class's methods.

They are defined as:

struct method_t {
char *sel;
char *type;
void *imp;
};
struct method_list_t {
uint32_t entsizeAndFlags;
uint32_t count;
struct method_t first[1];
};

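The first eight bytes of method_list_t hold the size of one method_t (entsizeAndFlags) and the number of method_t entries (count).

The method_t entries are laid out as follows:

[Figure 12]

Starting at 0x42B670 + 0x8, read three consecutive pointers: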

sel = 0x0100365c2f;
type = 0x01003d6ced;
imp = 0x010000587c;
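Reading an entry this way generalizes to any index: skip the 8-byte header, then step by the entry size. The sketch below assumes a little-endian 64-bit image with pointer-based method_t entries of 24 bytes. Masking the low two bits of entsizeAndFlags follows the Objective-C runtime's convention of keeping flags there and should be treated as an assumption, as should all the helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define METHOD_LIST_HEADER 8u   /* entsizeAndFlags + count */
#define METHOD_ENTRY_SIZE 24u   /* three 8-byte pointers */

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t read_le64(const uint8_t *p) {
    return (uint64_t)read_le32(p) | ((uint64_t)read_le32(p + 4) << 32);
}

/* Host-order copy of one 64-bit method_t (hypothetical type name). */
struct method_entry {
    uint64_t sel, type, imp;
};

/* Decode entry `index` of the method_list_t stored in list.
 * Returns 0 if the list uses another entry size, the index is out of
 * range, or the buffer is too short. */
static int read_method(const uint8_t *list, size_t len, uint32_t index,
                       struct method_entry *out) {
    assert(list != NULL && out != NULL);
    if (len < METHOD_LIST_HEADER)
        return 0;
    /* Assumption: the low two bits carry flags, not size. */
    uint32_t entsize = read_le32(list) & ~(uint32_t)3;
    uint32_t count = read_le32(list + 4);
    if (entsize != METHOD_ENTRY_SIZE || index >= count)
        return 0;
    uint64_t pos = METHOD_LIST_HEADER + (uint64_t)index * entsize;
    if (pos > len || len - pos < METHOD_ENTRY_SIZE)
        return 0;
    const uint8_t *p = list + pos;
    out->sel = read_le64(p);
    out->type = read_le64(p + 8);
    out->imp = read_le64(p + 16);
    return 1;
}
```

Rejecting any entry size other than 24 keeps the sketch from misreading lists that use a different entry layout.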

Compute the file offsets with the formula above:

sel_offset = 0x100365c2f - 0x10042B1F8 + 4370936; /* = 0x365C2F */
type_offset = 0x1003d6ced - 0x10042B1F8 + 4370936; /* = 0x3D6CED */
imp_offset = 0x10000587c - 0x10042B1F8 + 4370936; /* = 0x587C */
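  • Inspect sel

[Figure 13]

sel = initWithCustomFrame:
  • Inspect type

[Figure 14]

type = "@48@0:8{CGRect={CGPoint=dd}{CGSize=dd}}16";
  • Inspect imp

imp points into the code segment, so inspect it directly in IDA.

[Figure 15]

The offset lines up exactly with the implementation of [WXShotSendPersonCardView initWithCustomFrame:].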