背景知识

Android手机大多采用NAND Flash架构的闪存卡来存储内容。NAND Flash的内部存储单位从小到大依次为:Page、Block、Plane、Die,而一个Device上可以封装若干个Die。下图就是一个NAND Flash组成结构的示意图。

磁盘的结构

它的读写操作均是以Page为单位进行的,但擦除操作却是按Block为单位进行的。

读写规则:

  1. 删除数据时,芯片将标记这些Page为闲置状态,但并不会立马执行擦除操作。
  2. 写入数据时,如果目前磁盘剩余空间充足,则由芯片指定Block后直接按Page为单位进行写入即可。
  3. “写入放大”:写入数据时,如果目前磁盘剩余空间不足,为了获得足够的空间,磁盘先将某块Block的内容读至缓存,然后再在该Block上进行擦除操作,最后将新内容与原先内容一起写入至该Block。

原始数据写入是整体写入,是否会改变原始数据真实的地址?

TRIM

TRIM是一条ATA指令,由操作系统发送给闪存主控制器,告诉它哪些数据占的地址是“无效”的。在TRIM的帮助下,闪存主控制器就可以提前知道哪些Page是“无效”的,便可以在适当的时机做出优化,从而改善性能。实际操作是有两种方案:

  1. discard选项:在挂载 ext4 分区时加上 discard 选项,此后操作系统在执行每一个磁盘操作时同时都会执行 TRIM 指令。该方案的优点是总体耗时短,但影响会到删除文件时的性能。
  2. fstrim命令:选择合适的时机对整个分区执行TRIM操作。相对于方案一,该方案总体耗时较长,但不会影响正常操作时的磁盘性能。

TRIM在Android中的实现

磁盘的结构

启动fstrim

1
su fstrim -v /data

查看是否启动

1
adb logcat -d |grep -i fstrim

Android 文件系统演进

Nexus 手机开始从Ext3迁移到Ext4文件系统.迁移过程中无需格式化磁盘和,Ext4作用新数据。inode大小有128字节变成256字节;
系统版本:>Android 2.3

Ext4 inode 的分配策略

  • 相关的数据寻处位置相近
  • 文件系统的顶层目录,尽量要伸展开。兄弟目录,父子目录尽可能的靠近
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
static int find_group_orlov(struct super_block *sb, struct inode *parent,
ext4_group_t *group, umode_t mode,
const struct qstr *qstr)
{
ext4_group_t parent_group = EXT4_I(parent)->i_block_group;
struct ext4_sb_info *sbi = EXT4_SB(sb);
/*获取块组的个数,存放到real_ngroups变量中*/
ext4_group_t real_ngroups = ext4_get_groups_count(sb);
int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
unsigned int freei, avefreei, grp_free;
ext4_fsblk_t freeb, avefreec;
unsigned int ndirs;
int max_dirs, min_inodes;
ext4_grpblk_t min_clusters;
ext4_group_t i, grp, g, ngroups;
struct ext4_group_desc *desc;
struct orlov_stats stats;
int flex_size = ext4_flex_bg_size(sbi);
struct dx_hash_info hinfo;

ngroups = real_ngroups;
/*如果文件系统打开了flex_bg选项,就以逻辑块组为单位分配*/
if (flex_size > 1) {
ngroups = (real_ngroups + flex_size - 1) >>
sbi->s_log_groups_per_flex;
parent_group >>= sbi->s_log_groups_per_flex;
}

freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
avefreei = freei / ngroups;
freeb = EXT4_C2B(sbi,
percpu_counter_read_positive(&sbi->s_freeclusters_counter));
avefreec = freeb;
do_div(avefreec, ngroups);
ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);

if (S_ISDIR(mode) &&
((parent == d_inode(sb->s_root)) ||
(ext4_test_inode_flag(parent, EXT4_INODE_TOPDIR)))) {
int best_ndir = inodes_per_group;
int ret = -1;

if (qstr) {
hinfo.hash_version = DX_HASH_HALF_MD4;
hinfo.seed = sbi->s_hash_seed;
ext4fs_dirhash(qstr->name, qstr->len, &hinfo);
grp = hinfo.hash;
} else
grp = prandom_u32();
parent_group = (unsigned)grp % ngroups;
for (i = 0; i < ngroups; i++) {
g = (parent_group + i) % ngroups;
get_orlov_stats(sb, g, flex_size, &stats);
if (!stats.free_inodes)
continue;
if (stats.used_dirs >= best_ndir)
continue;
if (stats.free_inodes < avefreei)
continue;
if (stats.free_clusters < avefreec)
continue;
grp = g;
ret = 0;
best_ndir = stats.used_dirs;
}
if (ret)
goto fallback;
found_flex_bg:
if (flex_size == 1) {
*group = grp;
return 0;
}

/*
* We pack inodes at the beginning of the flexgroup's
* inode tables. Block allocation decisions will do
* something similar, although regular files will
* start at 2nd block group of the flexgroup. See
* ext4_ext_find_goal() and ext4_find_near().
*/
grp *= flex_size;
for (i = 0; i < flex_size; i++) {
if (grp+i >= real_ngroups)
break;
desc = ext4_get_group_desc(sb, grp+i, NULL);
if (desc && ext4_free_inodes_count(sb, desc)) {
*group = grp+i;
return 0;
}
}
goto fallback;
}

max_dirs = ndirs / ngroups + inodes_per_group / 16;
min_inodes = avefreei - inodes_per_group*flex_size / 4;
if (min_inodes < 1)
min_inodes = 1;
min_clusters = avefreec - EXT4_CLUSTERS_PER_GROUP(sb)*flex_size / 4;

/*
* Start looking in the flex group where we last allocated an
* inode for this parent directory
*/
if (EXT4_I(parent)->i_last_alloc_group != ~0) {
parent_group = EXT4_I(parent)->i_last_alloc_group;
if (flex_size > 1)
parent_group >>= sbi->s_log_groups_per_flex;
}

for (i = 0; i < ngroups; i++) {
grp = (parent_group + i) % ngroups;
get_orlov_stats(sb, grp, flex_size, &stats);
if (stats.used_dirs >= max_dirs)
continue;
if (stats.free_inodes < min_inodes)
continue;
if (stats.free_clusters < min_clusters)
continue;
goto found_flex_bg;
}

fallback:
ngroups = real_ngroups;
avefreei = freei / ngroups;
fallback_retry:
parent_group = EXT4_I(parent)->i_block_group;
for (i = 0; i < ngroups; i++) {
grp = (parent_group + i) % ngroups;
desc = ext4_get_group_desc(sb, grp, NULL);
if (desc) {
grp_free = ext4_free_inodes_count(sb, desc);
if (grp_free && grp_free >= avefreei) {
*group = grp;
return 0;
}
}
}

if (avefreei) {
/*
* The free-inodes counter is approximate, and for really small
* filesystems the above test can fail to find any blockgroups
*/
avefreei = 0;
goto fallback_retry;
}

return -1;
}

文件的分配策略

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
tatic int find_group_other(struct super_block *sb, struct inode *parent,
ext4_group_t *group, umode_t mode)
{
ext4_group_t parent_group = EXT4_I(parent)->i_block_group;
ext4_group_t i, last, ngroups = ext4_get_groups_count(sb);
struct ext4_group_desc *desc;
int flex_size = ext4_flex_bg_size(EXT4_SB(sb));

/*
* Try to place the inode is the same flex group as its
* parent. If we can't find space, use the Orlov algorithm to
* find another flex group, and store that information in the
* parent directory's inode information so that use that flex
* group for future allocations.
*/
if (flex_size > 1) {
int retry = 0;

try_again:
parent_group &= ~(flex_size-1);
last = parent_group + flex_size;
if (last > ngroups)
last = ngroups;
for (i = parent_group; i < last; i++) {
desc = ext4_get_group_desc(sb, i, NULL);
if (desc && ext4_free_inodes_count(sb, desc)) {
*group = i;
return 0;
}
}
if (!retry && EXT4_I(parent)->i_last_alloc_group != ~0) {
retry = 1;
parent_group = EXT4_I(parent)->i_last_alloc_group;
goto try_again;
}
/*
* If this didn't work, use the Orlov search algorithm
* to find a new flex group; we pass in the mode to
* avoid the topdir algorithms.
*/
*group = parent_group + flex_size;
if (*group > ngroups)
*group = 0;
return find_group_orlov(sb, parent, group, mode, NULL);
}

/*
* Try to place the inode in its parent directory
*/
*group = parent_group;
desc = ext4_get_group_desc(sb, *group, NULL);
if (desc && ext4_free_inodes_count(sb, desc) &&
ext4_free_group_clusters(sb, desc))
return 0;

/*
* We're going to place this inode in a different blockgroup from its
* parent. We want to cause files in a common directory to all land in
* the same blockgroup. But we want files which are in a different
* directory which shares a blockgroup with our parent to land in a
* different blockgroup.
*
* So add our directory's i_ino into the starting point for the hash.
*/
*group = (*group + parent->i_ino) % ngroups;

/*
* Use a quadratic hash to find a group with a free inode and some free
* blocks.
*/
for (i = 1; i < ngroups; i <<= 1) {
*group += i;
if (*group >= ngroups)
*group -= ngroups;
desc = ext4_get_group_desc(sb, *group, NULL);
if (desc && ext4_free_inodes_count(sb, desc) &&
ext4_free_group_clusters(sb, desc))
return 0;
}

/*
* That failed: try linear search for a free inode, even if that group
* has no free blocks.
*/
*group = parent_group;
for (i = 0; i < ngroups; i++) {
if (++*group >= ngroups)
*group = 0;
desc = ext4_get_group_desc(sb, *group, NULL);
if (desc && ext4_free_inodes_count(sb, desc))
return 0;
}

return -1;
}

参考文献:

https://blog.csdn.net/xiaosongluo/article/details/52014585