Level: Introductory
Daniel Robbins (drobbins@gentoo.org), President/CEO, Gentoo Technologies, Inc.
01 Jan 2002
With the 2.4 release of Linux come many new filesystem possibilities, including ReiserFS, XFS, GFS, and others. These filesystems sound cool, but what exactly can they do, what are they good at, and exactly how do you go about safely using them in a production Linux environment? Daniel Robbins answers these questions by showing you how to set up these new advanced filesystems under Linux 2.4. In this installment, Daniel introduces XFS, SGI's free enterprise-class filesystem now available for Linux.
In this article, we'll take a look at XFS, SGI's free, 64-bit high-performance
filesystem for Linux. First, I'll explain how XFS compares to ext3 and
ReiserFS, and describe many of the technologies that XFS uses internally.
Then, in the next article, I'll guide you through the process of setting up XFS on
your own system, as well as cover XFS tuning tips and useful XFS features like
ACLs (access control lists) and extended attribute support.
Introducing XFS
XFS was originally developed by Silicon Graphics, Inc. back in the early 90s. At that time, SGI
found that their existing filesystem (EFS) was quickly becoming unsuitable for tackling the extreme computing challenges of the day.
Addressing this problem, SGI decided to design a completely new
high-performance 64-bit filesystem rather than attempting to tweak EFS to do
something that it was never designed to do. Thus, XFS was born, and was made
available to the computing public with the release of IRIX 5.3 in 1994. To
this day, it continues to be used as the underlying filesystem for all of SGI's
IRIX-based products, from workstations to supercomputers. And now, XFS is also
available for Linux. The arrival of XFS for Linux is exciting, primarily
because it provides the Linux community with a robust, refined, and very
feature-rich filesystem that's capable of scaling to meet the toughest storage
challenges.
XFS, ReiserFS, and ext3 performance
Up until now, choosing the appropriate next-generation Linux filesystem has
been refreshingly straightforward. Those who were looking for raw performance
generally leaned towards ReiserFS, while those more interested in meticulous
data integrity features preferred ext3. However, with the release of XFS for
Linux, things have suddenly become much more confusing. In particular, it's no
longer clear that ReiserFS is still the next-gen performance leader.
Recently, I performed a series of tests in an attempt to figure out how XFS,
ReiserFS, and ext3 compare in terms of raw performance. Before I share my
results, it's important to understand that my results only highlight
general filesystem performance trends under light system loads on a
uniprocessor system, and are not an absolute measure of whether a
particular filesystem is "better" than another. Despite this, my results
should help give you an idea of what filesystem may be best suited for a
particular task. Again, my results should not be considered conclusive; the
best test is always to try your particular application under each filesystem to
see how it performs.
The results
In my tests, I found XFS to be generally quite speedy. XFS consistently won
all tests that involved manipulating large files, which should be expected
since it has been designed and tuned over the years to do this very well. I
also discovered that XFS has a singular performance quirk: it doesn't delete
files very quickly; it was easily bested by both ReiserFS and ext3 in this
area. According to Steve Lord, the Principal Engineer of filesystem software
for SGI, a patch has just been written to address this problem, and it
should be available soon.
Other than that, XFS performance was very close to that of ReiserFS and
generally surpassed that of ext3. One of the nicest things about XFS is that,
like ReiserFS, it doesn't generate a lot of unnecessary disk activity. XFS
tries to cache as much data in memory as possible, and generally only writes
things out to disk when memory pressure dictates that it do so. When it's
flushing data to disk, other IO operations seem largely unaffected. In
contrast, when ext3 (in "data=ordered" mode, the default) flushes data to the
drive, it can result in a lot of additional seeks and, depending on the IO
load, even some unnecessary disk thrashing.
My performance and tuning tests were primarily focused around extracting an
uncompressed kernel source tarball from a RAM disk to the test filesystem, and
then recursively copying the new source tree to a new directory on the same
filesystem. XFS performed these tasks quite well, although initially, XFS
performance was slightly worse than that of ReiserFS. However, after tweaking
the mkfs.xfs and mount options for my test XFS filesystem, I was able to get
XFS to perform slightly better than ReiserFS when handling medium-sized files
such as those found in the kernel source tree. That is, except for deletes;
both ReiserFS and ext3 delete files much more quickly than XFS, at least for
now.
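If you'd like to try a similar workload on your own hardware, here's a minimal Python sketch of the extract, copy, and delete sequence described above. It is not the harness I used for these tests; the function name `time_workload` and its arguments are made up for illustration, and a serious benchmark would also need to control caching and call sync between phases.

```python
import os
import shutil
import tarfile
import time


def time_workload(tarball, target_dir):
    """Time extract, recursive copy, and delete inside target_dir.

    target_dir should live on the filesystem under test, and the
    tarball should ideally sit on a RAM disk so reads don't dominate.
    """
    src = os.path.join(target_dir, "src")
    dst = os.path.join(target_dir, "copy")
    timings = {}

    # Phase 1: extract the uncompressed tarball onto the test filesystem.
    start = time.perf_counter()
    with tarfile.open(tarball) as tar:
        tar.extractall(src)
    timings["extract"] = time.perf_counter() - start

    # Phase 2: recursively copy the new tree on the same filesystem.
    start = time.perf_counter()
    shutil.copytree(src, dst)
    timings["copy"] = time.perf_counter() - start

    # Phase 3: delete both trees (the area where XFS lagged in my tests).
    start = time.perf_counter()
    shutil.rmtree(src)
    shutil.rmtree(dst)
    timings["delete"] = time.perf_counter() - start

    return timings
```

Run it once per filesystem with the same tarball and compare the three numbers, keeping in mind that a single run says little about behavior under load.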
Performance summary
I hope I've given you a general idea of what kind of performance you can
expect from XFS; my results show that XFS is the best filesystem to use if you
need to manipulate large files. For small to medium-sized files, XFS can be
competitive with, and sometimes even faster than, ReiserFS if you create and mount your
XFS filesystem with some performance-enhancing options. Ext3 in "data=journal"
mode offered good performance, but it was difficult to get consistent
performance numbers due to apparent irregularities in how ext3 flushed data
from previous tests to disk, which would result in some disk thrashing.
XFS design
In the "Scalability in the XFS Filesystem" paper (see Resources later in this article)
featured at USENIX '96, the
SGI engineers explain that XFS was designed with a single main idea: "think
big". Indeed, XFS has been designed to eliminate the limitations found in
traditional filesystems. Now, let's take a look at some of the intriguing
design features behind XFS that make this possible.
Introducing allocation groups
When an XFS filesystem is created, the underlying block device is split into
eight or more equally-sized linear regions. You can think of them as "chunks"
or "linear ranges", but in XFS terminology each region is called an "allocation
group". Allocation groups are unique in that each allocation
group manages its own inodes and free space, in effect turning them into a kind
of sub-filesystem that exists transparently within the XFS filesystem proper.
Allocation groups and scalability
So, why exactly does XFS have allocation groups? Primarily, XFS uses
allocation groups so that it can efficiently handle parallel IO. Because each
allocation group is effectively its own independent entity, the kernel can
interact with multiple allocation groups simultaneously. Without
allocation groups, the XFS filesystem code could become a performance
bottleneck, forcing IO-hungry processes to "get in line" to make inode
modifications or perform other kinds of metadata-intensive operations.
Thanks to allocation groups, the XFS code will allow multiple threads and
processes to continue to run in parallel, even if many of them are performing
non-trivial IO on the same filesystem. So, match XFS with some high-end
hardware and you'll get high-end results rather than a filesystem bottleneck.
Allocation groups also help to optimize parallel IO performance on
multiprocessor systems, because more than one metadata update can be "in transit"
at the same time.
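To make the idea concrete, here's a toy Python model of allocation groups. It is only a sketch under simplifying assumptions: the class and function names are hypothetical, each group uses a trivial "bump" allocator rather than real XFS data structures, and any blocks left over after the even split are simply ignored. The point is that every group carries its own lock, so work in one group never waits on another.

```python
import threading


class AllocationGroup:
    """One linear region of the device with its own lock and free space."""

    def __init__(self, index, start_block, block_count):
        self.index = index
        self.start_block = start_block
        self.block_count = block_count
        self.next_free = start_block   # toy bump allocator, not a B+ tree
        self.lock = threading.Lock()   # per-group lock: no global bottleneck

    def allocate(self, nblocks):
        """Return the first block of a new extent, or None if it won't fit."""
        with self.lock:
            end = self.start_block + self.block_count
            if self.next_free + nblocks > end:
                return None
            first = self.next_free
            self.next_free += nblocks
            return first


def make_groups(total_blocks, ag_count=8):
    """Split the device into ag_count equally sized linear regions."""
    size = total_blocks // ag_count    # leftover blocks are ignored here
    return [AllocationGroup(i, i * size, size) for i in range(ag_count)]


def group_for_block(groups, block):
    """Map a block number back to the group that owns it."""
    return groups[block // groups[0].block_count]
```

With `groups = make_groups(8000)`, a thread calling `groups[0].allocate(10)` and another calling `groups[1].allocate(10)` take different locks and can proceed at the same time.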
B+ trees everywhere
Internally, allocation groups use efficient B+ trees to keep track of important
data such as ranges (also called "extents") of free space, as well as inodes.
In fact, each allocation group has two B+ trees used to keep track of
free space; one stores the extents of free space ordered by size, and the other
tree has the regions ordered by their starting physical location on the block
device. The ability to find regions of free space quickly is critical for
maximizing write performance, which is something that XFS is very good at.
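Here's a small Python sketch of why keeping two indexes of the same free extents is useful. It assumes sorted lists searched with `bisect` as a stand-in for the B+ trees, and the class and method names are hypothetical; it shows the lookup pattern, not the on-disk format.

```python
import bisect


class FreeSpaceIndex:
    """Free extents indexed two ways, standing in for the two B+ trees."""

    def __init__(self, extents):
        # Each extent is a (start_block, length) tuple.
        self.by_start = sorted(extents)
        self.by_size = sorted((length, start) for start, length in extents)

    def find_by_size(self, wanted):
        """Smallest free extent holding at least `wanted` blocks."""
        i = bisect.bisect_left(self.by_size, (wanted, 0))
        if i == len(self.by_size):
            return None
        length, start = self.by_size[i]
        return (start, length)

    def find_near(self, block):
        """First free extent that starts at or after `block`."""
        i = bisect.bisect_left(self.by_start, (block, 0))
        if i == len(self.by_start):
            return None
        return self.by_start[i]

    def remove(self, extent):
        """Drop an extent from both indexes so they stay in step."""
        start, length = extent
        self.by_start.remove((start, length))
        self.by_size.remove((length, start))
```

The size-ordered index answers "where can I fit this many blocks?", while the location-ordered index answers "what's free near this spot?", which helps keep related data close together.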
XFS is also very efficient when it comes to the management of inodes. Each
allocation group allocates inodes as needed, in groups of 64. An allocation
group keeps track of its own inodes by using a B+ tree that records where each
particular inode number can be found on disk. You'll find that XFS uses B+
trees as much as possible, due to their excellent performance and tremendous
scalability.
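The on-demand inode allocation described above can be sketched in a few lines of Python. This is a toy model with assumptions: a dict stands in for the inode B+ tree, inode numbers are handed out sequentially (real XFS inode numbering is more involved), and `InodeIndex` and `pick_chunk_block` are hypothetical names.

```python
INODES_PER_CHUNK = 64  # from the text: inodes are allocated in groups of 64


class InodeIndex:
    """Maps inode numbers to the on-disk location of their chunk."""

    def __init__(self):
        self.chunks = {}   # first inode number in chunk -> disk block of chunk
        self.in_use = 0    # inodes handed out so far

    def allocate_inode(self, pick_chunk_block):
        """Hand out one inode, creating a new chunk only when needed."""
        if self.in_use == len(self.chunks) * INODES_PER_CHUNK:
            # All existing chunks are full: allocate another 64 inodes.
            self.chunks[self.in_use] = pick_chunk_block()
        ino = self.in_use
        self.in_use += 1
        return ino

    def locate(self, ino):
        """Return (chunk disk block, slot within chunk) for an inode number."""
        first = ino - ino % INODES_PER_CHUNK
        return self.chunks[first], ino % INODES_PER_CHUNK
```

No space is set aside for inodes until files actually need them, and a lookup only has to find the chunk that contains the inode.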
Journaling
Of course, XFS is a journaling filesystem, allowing for fast recovery after an
unexpected reboot. Like ReiserFS, XFS uses a logical journal; that is, it does
not journal literal filesystem blocks like ext3, and instead uses an efficient
on-disk format to log metadata changes. In the case of XFS, logical journaling
is a good fit; on high-end hardware, the journal is often the most contentious
resource of the entire filesystem. By using a space-efficient logical journal,
contention for the journal can be minimized. In addition, XFS allows the
journal to be stored on another block device, such as a partition on another
disk. This feature works well to improve XFS filesystem performance even
further.
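To see why a logical journal is so space-efficient, compare the two approaches in this Python sketch. The record layouts are invented for illustration and are not the XFS or ext3 log formats; the 4096-byte block size is also an assumption.

```python
import struct

BLOCK_SIZE = 4096  # assumed block size for the comparison


def physical_record(block_no, block_bytes):
    """Block-style journaling: log the entire modified block."""
    return struct.pack("<Q", block_no) + block_bytes


def logical_record(block_no, offset, new_bytes):
    """Logical journaling: log only where the change is and what it is."""
    header = struct.pack("<QHH", block_no, offset, len(new_bytes))
    return header + new_bytes


def replay(block, record):
    """Apply one logical record to an in-memory copy of a block."""
    header_size = struct.calcsize("<QHH")
    _block_no, offset, length = struct.unpack_from("<QHH", record)
    payload = record[header_size:header_size + length]
    return block[:offset] + payload + block[offset + length:]
```

Changing four bytes of metadata costs a full block plus a header in the first scheme, but only a 12-byte header plus four bytes in the second, so far more changes fit through the journal in the same amount of IO.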
Like ReiserFS, XFS only journals metadata, and does not take any special
precautions to ensure that the data makes it to disk before metadata is
written. This means that with XFS (just like with ReiserFS), it's possible for
recently modified data to be lost in the event of an unexpected reboot.
However, a couple of properties of XFS' journal make this issue less common
than it is with ReiserFS.
With ReiserFS, an unexpected reboot can result in recently modified files
containing portions of previously deleted files. Besides the obvious data
loss, this could also theoretically pose a security threat. In contrast, XFS
ensures that any unwritten data blocks are zeroed on reboot, when the XFS
journal is replayed. Thus, missing blocks are filled with null bytes,
eliminating the security hole -- a much better approach.
Now, what about the data loss issue itself? In general, this problem is
minimized with XFS due to the fact that XFS generally writes pending metadata
updates to disk much more frequently than ReiserFS does, especially during
periods of high disk activity. Thus, in the event of a lockup, you will
generally lose fewer of your recent metadata modifications than you would
with ReiserFS. Of course, this does not directly address the problem of
not writing data blocks in time, but writing metadata more frequently
does encourage data to be written more frequently as well.
Delayed allocation
We'll finish our technical overview of XFS by taking a look at delayed
allocation, a feature unique to XFS. As you probably know, the term
allocation refers to the process of finding regions of free space to use
for storing new data.
XFS handles allocation by breaking it into a two-step process. First, when XFS
receives new data to be written, it records the pending transaction in RAM and
simply reserves an appropriate amount of space on the underlying
filesystem. However, while XFS reserves space for the new data, it doesn't
decide what filesystem blocks will be used to store the data, at least not
yet. XFS procrastinates, delaying this decision to the last possible moment,
right before this data is actually written to disk.
By delaying allocation, XFS gains many opportunities to optimize write
performance. When it comes time to write the data to disk, XFS can now
allocate free space intelligently, in a way that optimizes filesystem
performance. In particular, if a bunch of new data is being appended to a
single file, XFS can allocate a single, contiguous region on disk to
store this data. If XFS hadn't delayed its allocation decision, it might have
unknowingly written the data into multiple non-contiguous chunks, reducing
write performance significantly. But, because XFS delayed its allocation
decision, it was able to write the data in one fell swoop, improving write
performance as well as reducing overall filesystem fragmentation.
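A quick simulation shows the effect. In this Python sketch, two files are appended to in alternation, one block per write. The function names are hypothetical and the "disk" is just a counter, so treat it as an illustration of the principle rather than of how XFS is implemented.

```python
def count_extents(blocks):
    """Number of contiguous runs in a sorted list of block numbers."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


def immediate_allocation(writes):
    """Pick a block as each write arrives. `writes` is a list of file names."""
    layout, next_block = {}, 0
    for name in writes:
        layout.setdefault(name, []).append(next_block)
        next_block += 1
    return layout


def delayed_allocation(writes):
    """Reserve space per write, but choose locations only at flush time."""
    reserved = {}
    for name in writes:
        reserved[name] = reserved.get(name, 0) + 1   # reserve, don't place
    layout, next_block = {}, 0
    for name, nblocks in reserved.items():           # flush: one run per file
        layout[name] = list(range(next_block, next_block + nblocks))
        next_block += nblocks
    return layout
```

With `writes = ["a", "b"] * 4`, immediate allocation interleaves the two files so that each ends up in four separate extents, while delayed allocation gives each file a single contiguous extent.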
Delayed allocation also has another performance benefit. In situations where
many short-lived temporary files are created, XFS may never need to write these
files to disk at all. Since no blocks are ever allocated, there's no need to
deallocate any blocks, and the underlying filesystem metadata doesn't even get
touched.
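The temporary-file case can be sketched the same way. In this toy Python model (all names are hypothetical), a file that is deleted before the next flush never triggers block allocation at all; its reservation is simply dropped.

```python
class DelayedAllocCache:
    """Tracks reserved-but-unplaced data until it is flushed."""

    def __init__(self):
        self.pending = {}      # file name -> reserved block count
        self.allocations = 0   # how many times block placement actually ran

    def write(self, name, nblocks):
        """Buffer new data in memory and reserve space for it."""
        self.pending[name] = self.pending.get(name, 0) + nblocks

    def unlink(self, name):
        """Deleting a never-flushed file just drops its reservation."""
        self.pending.pop(name, None)

    def flush(self):
        """Place every file that is still pending, then clear the cache."""
        for name in list(self.pending):
            self.allocations += 1   # stand-in for real block allocation
            del self.pending[name]
```

Writing a temporary file, unlinking it, and then flushing leaves `allocations` at zero, whereas a file that survives until the flush costs exactly one allocation.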
Conclusion
I hope you've enjoyed reading about the performance and technical characteristics
of XFS, one of Linux's powerful next-generation filesystems. Join me in my next
article, when I'll show you how to get XFS up and running on your system. We'll
also take a look at some of XFS' advanced features, such as ACLs
and extended attributes. I'll see you then!
Resources
About the author
Residing in Albuquerque, New Mexico, Daniel Robbins is the President/CEO of
Gentoo Technologies, Inc., the creator of Gentoo Linux, an advanced Linux for the PC,
and the Portage system, a next-generation ports system for Linux. He
has also served as a contributing author for the Macmillan books
Caldera OpenLinux Unleashed, SuSE Linux
Unleashed, and Samba Unleashed. Daniel has been involved with computers in some fashion since the second grade, when he was first exposed to the Logo programming language as well as a
potentially dangerous dose of Pac Man. This probably explains why he has since
served as a Lead Graphic Artist at SONY Electronic Publishing/Psygnosis.
Daniel enjoys spending time with his wife, Mary, and their daughter, Hadassah. You can contact Daniel at drobbins@gentoo.org.