best OS/file system for many small files



FoBoT
03-28-2005, 10:01 AM
what OS/file system is best to process/store millions of small files (from 1 to 50 KB, most are < 5 KB)?

by process i mean do some simple text parsing, then consolidate the data from thousands or tens of thousands of them into single larger .csv files, which are then used to populate a DB

thanks

needs to work well with about 10-20 million source files per year
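
For anyone reading along, here is a minimal sketch of the parse-and-consolidate step described above. It is not the poster's actual code: the "key=value" line format, the function names parse_small_file and consolidate, and the batch file naming are all assumptions made for illustration. It walks the source directory with os.scandir so the full listing is never held in memory, and starts a new .csv every batch_size files.

```python
import csv
import os
from pathlib import Path


def parse_small_file(path):
    """Hypothetical parser: assumes each source file holds 'key=value' lines."""
    record = {"source": os.path.basename(path)}
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            key, sep, value = line.partition("=")
            if sep:  # skip lines that do not contain '='
                record[key.strip()] = value.strip()
    return record


def consolidate(src_dir, out_dir, fields, batch_size=10000):
    """Write one CSV per batch_size source files; return the CSV paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_paths = []
    out = writer = None
    count = 0
    # os.scandir yields entries lazily instead of building one huge list.
    with os.scandir(src_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            if count % batch_size == 0:
                if out:
                    out.close()
                csv_path = out_dir / f"batch_{count // batch_size:06d}.csv"
                out = open(csv_path, "w", newline="", encoding="utf-8")
                writer = csv.DictWriter(
                    out,
                    fieldnames=["source"] + list(fields),
                    restval="",             # missing keys become empty cells
                    extrasaction="ignore",  # unknown keys are dropped
                )
                writer.writeheader()
                csv_paths.append(csv_path)
            writer.writerow(parse_small_file(entry.path))
            count += 1
    if out:
        out.close()
    return csv_paths
```

Something like consolidate("incoming", "csv_out", ["id", "name"]) would then leave a handful of large .csv files for the DB load, so the database never has to touch the small files directly.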

Leviathann
03-28-2005, 10:38 AM
linux = ext3 (ReiserFS is supposed to be better, but I've heard it's not worth the small performance gains). You might have to visit some linux forums/irc channels and get some help here, as there are several linux FS's that you can use but I've only really used ext3. But I don't like linux anyway. Gentoo is THE optimized linux distro but is a pain in the arse

windows = ntfs5 (winXP Pro SP2)

FoBoT
03-28-2005, 12:15 PM
is ntfs5 already in server 2003? or will it be part of SP1 for server 2003?

Leviathann
03-28-2005, 01:35 PM
I'm pretty sure it's already ntfs5. I think ntfs5 started with win2000

ECL
03-28-2005, 01:40 PM
Win2K and XP use NTFS5. It started showing up as an option in NT 4 SP4. As far as MS is concerned, NTFS5 became the only supported NTFS once Win2K shipped, and it's no longer identified with a version number (the plan, apparently, was that NTFS5 would be the last NTFS before WinFS shipped with Longhorn in 2003. Oops.)

I'm OS agnostic, but if the filesystem is the critical element, I'd consider the issue of fragmentation.

NTFS is much more prone to performance-killing fragmentation than the Linux filesystems. If it's practical to periodically stop writing to the volume and run the defragger, then NTFS might be the way to go. If that's impractical, I'd consider going with Linux and probably the ext3 filesystem.

Linux filesystems are something of a religious issue to some, but the "common wisdom" is that ext3 is extremely stable while reiserfs is slightly faster but slightly less stable. I build my home Linux boxes with reiser (it's the default with Suse) but I'd use ext3 on production servers.


Without knowing more about the actual implementation you're planning, I can't say too much more. Is processing-time critical? Are you using Perl or something else to crunch the files? How often does this process happen? Is this a Windows shop or are you running a mixed environment?

IronBits
03-28-2005, 06:35 PM
Get a copy of Executive Software Diskeeper.
It's a MUST have, to keep your performance at its maximum.

Scoofy12
03-28-2005, 08:30 PM
i've heard reiserfs is good for lots of tiny files... but that may have been in a benchmark on www.namesys.com (the reiserfs home site :)
i'm not sure reiser3 is any less stable than ext3, they've both been around for about the same amount of time, and seem (to me) to be about equally popular

i use reiser4, but you need to be running a -mm linux kernel for that.

on the windows side of things, FAT16 is the fastest filesystem, but... well you probably don't want that :) might as well do what MS tells you to and use NTFS, its benefits probably outweigh any speed gains FAT32 would give you.
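
Since most of the "which filesystem is faster for tiny files" claims in this thread come down to benchmarks, here is a rough sketch for measuring it on your own hardware. The function name time_small_files and its defaults are made up for illustration, and this is only a crude timing: the read pass will mostly hit the OS cache right after the write pass, so treat the numbers as a relative comparison between volumes, not an absolute result.

```python
import os
import tempfile
import time


def time_small_files(base_dir, n_files=10000, size_bytes=4096):
    """Create n_files small files under base_dir, read them back, time both."""
    payload = b"x" * size_bytes
    # The scratch directory (and everything in it) is removed on exit.
    with tempfile.TemporaryDirectory(dir=base_dir) as work:
        t0 = time.perf_counter()
        for i in range(n_files):
            with open(os.path.join(work, f"{i:08d}.txt"), "wb") as f:
                f.write(payload)
        t1 = time.perf_counter()

        total = 0
        with os.scandir(work) as entries:
            for entry in entries:
                with open(entry.path, "rb") as f:
                    total += len(f.read())
        t2 = time.perf_counter()
    return {"create_s": t1 - t0, "read_s": t2 - t1, "bytes_read": total}
```

Point base_dir at a directory on each volume you want to compare (ext3, reiserfs, NTFS, ...) and run it with the same n_files and size_bytes on each.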