PythonFuse (PyCon4)
Python FUSE               File-System in USErspace


                             Beyond the Traditional File-Systems
                                 http://mbertozzi.develer.com/python-fuse




Matteo Bertozzi (Th30z)
      http://th30z.netsons.org
Talk Overview
• What is a File-System
• Brief File-Systems History
• What is FUSE
• Beyond the Traditional File-System
• API Overview
• Examples (Finally some code!!!)
  http://mbertozzi.develer.com/python-fuse


• Q&A
What is a File-System
Is a Method of storing and organizing data
     to make it easy to find and access.



...to interact with an object
 You name it, and you say
     what you want it to do.


 The Filesystem takes the name you give,
 looks through the disk to find the object,
 and gives the object your request to do
 something.
What is a File-System



• On Disk Format (...serialized struct)
  ext2, ext3, reiserfs, btrfs...


• Namespace
  (Mapping between name and content)
  /home/th30z/, /usr/local/share/test.c, ...


• Runtime Service: open(), read(), write(), ...
(The Origins)

                         ...A bit of History
Multics 1965 (File-System Paper)
A General-Purpose File System For Secondary Storage
Unix Late 1969



                                           User Program

                  User Space

                 Kernel Space
                                        System Call Layer

                                           The File-System




   Only One File-System
(The Evolution)

                         ...A bit of History
Multics 1965 (File-System Paper)
A General-Purpose File System For Secondary Storage
Unix Late 1969



                                             User Program

                  User Space

                 Kernel Space
                                          System Call Layer
                                                    (which?)

                                The File-System 1              The File-System 2
(The Evolution)

                         ...A bit of History
Multics 1965 (File-System Paper)
A General-Purpose File System For Secondary Storage
Unix Late 1969



                                           User Program

                  User Space

                 Kernel Space
                                        System Call Layer
                                               (which?)

                                FS 1   FS 2    FS 3   FS 4    ...    FS N
(The Solution)

                         ...A bit of History
Multics 1965 (File-System Paper)
A General-Purpose File System For Secondary Storage
Unix Late 1969
Sun Microsystems 1984


                                              User Program

                  User Space

                 Kernel Space

                                         System Call Layer


                                       Vnode/VFS Layer

                                FS 1   FS 2    FS 3   FS 4   ...    FS N
Virtual File-System
•   Provides an abstraction within the kernel
    which allows different filesystem
    implementations to coexist.
•   Provides the filesystem interface to
    userspace programs.

VFS Concepts
    A super-block object represents a
    filesystem.
    I-Nodes are filesystem objects such as
    regular files, directories, FIFOs, ...
    A file object represents a file opened by a
    process.

[Diagram: layers from User Space down to Kernel Space]
    User Space:   C Library (open(), read(), write(), ...)
    Kernel Space: System Calls (sys_open(), sys_read(), ...)
                  VFS (vfs_read(), vfs_write(), ...)
                  Kernel Supported File-Systems:
                  ext2, ext3, ext4, ReiserFS, Reiser4, Btrfs, XFS, JFS, HFS+, ...
Wow, writing a filesystem
doesn't seem that difficult
Why File-Systems are Complex

• You need to know the Kernel (No helper libraries: Qt, Glib, ...)
• Reduce Disk Seeks / SSD Block limited write cycles
• Be consistent, Power Down, Unfinished Write...
   (Journal, Soft-Updates, Copy-on-Write, ...)
• Bad Blocks, Disk Error
• Don't waste too much space for Metadata
• Extra Features: Deduplication, Compression,
   Cryptography, Snapshots…
File-Systems Lines of Code
                  Kernel Space    User Space
    MinixFS           2,000
    MinixFS Fuse                       800
    UFS               2,000
    UFS Fuse                         1,000
    FAT               6,000
    ext2              8,000
    ext3             16,000
    ReiserFS         27,000
    ext4             30,000
    btrfs            50,000
    NFS              10,000          8,000
    Fuse              7,000          9,000
    FtpFS                              800
    SshFS                            2,000
Building a File-System is Difficult


• Writing good code is not easy (Bugs, Typos, ...)
• Writing good code in Kernel Space
  is much more difficult!
• Too many reboots during development
• Too many Kernel Panics during Reboot
• We need more flexibility and Speedups!
FUSE, develop your file-system
with your favorite language and library
             in user space
What is FUSE


• Kernel module! (like ext2, ReiserFS, XFS, ...)
• Allows non-privileged users to create their own
  file-system without editing the kernel code. (User Space)
• FUSE is particularly useful for writing "virtual file
  systems" that act as a view or translation of an
  existing file-system or storage device. (Facilitates
  Disk-Based, Network-Based and Pseudo File-Systems)
• Bindings: Python, Objective-C, Ruby, Java, C#, ...
File-Systems in User Space?
             ...Make File Systems Development Super Easy




• All UserSpace Libraries are Available
• ...Debugging Tools
• No Kernel Recompilation
• No Machine Reboot!
  ...File-System upgrade/fix
  2 sec downtime, app restart!
Yeah, ...but what’s FUSE?
 It’s a File-System with user-space callbacks

 ntfs-3g, ifuse, ChironFS, sshfs, zfs-fuse, YouTubeFS,
 gnome-vfs2, ftpfs, cryptoFS, RaleighFS
FUSE Kernel Space and User Space
 The FUSE kernel module and the FUSE library
 communicate via a special file descriptor
 which is obtained by opening /dev/fuse.

[Diagram: in User Space, Your FS, SshFS, FtpFS, ... sit on top of lib Fuse,
 which talks to /dev/fuse; in Kernel Space, FUSE is plugged into the VFS
 next to ext2, ext4, Btrfs, ...; other labels: drivers, firmware, kernel]

[Diagram: User Input (ls -l /myfuse/), Kernel (VFS, FUSE), lib FUSE, Your Fuse FS]
...be creative

   Beyond the Traditional File-Systems

• ImapFS: Access your mail with grep.
• SocialFS: Log all your social networks to collect
  news, jokes and other social things.
• YouTubeFS: Watch YouTube videos as if they were on your disk.
• GMailFS: Use your mailbox as a backup disk.

Thousands of tools available: cat/grep/sed
open() is the most used function in our applications
FUSE API Overview
•   create(path, mode)           • mkdir(path, mode)
•   truncate(path, size)         • unlink(path)
•   mknod(path, mode, dev)       • readdir(path)
•   open(path, mode)             • rmdir(path)
•   write(path, data, offset)    • rename(opath, npath)
•   read(path, length, offset)   • link(srcpath, dstpath)
•   release(path)
•   fsync(path)
•   chmod(path, mode)
•   chown(path, uid, gid)

[Diagram: User Input (ls -l /myfuse/), Kernel (VFS, FUSE), lib FUSE, Your Fuse FS]
(File Operations)

FUSE API Overview

   Reading       cat /myfuse/test.txt
       getattr() -> open() -> read() -> read() -> release()

   Writing       echo Hello > /myfuse/test2.txt
       getattr() -> create() -> write() -> flush() -> release()

   Appending     echo World >> /myfuse/test2.txt
       getattr() -> open() -> write() -> flush() -> release()

   Truncating    echo Woo > /myfuse/test2.txt
       getattr() -> truncate() -> open() -> write() -> flush() -> release()

   Removing      rm /myfuse/test.txt
       getattr() -> unlink()
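
The "Reading" sequence above can be imitated without mounting anything. The sketch below is only a simulation: `TraceFS` and `simulate_cat` are hypothetical names, and the driver loop is a guess at what `cat` triggers, not the kernel's real logic. It shows why read() appears twice: the second call returns no data, which signals end of file.

```python
class TraceFS:
    """Toy callbacks that record the order in which they are invoked."""

    def __init__(self, content):
        self.content = content
        self.calls = []

    def getattr(self, path):
        self.calls.append("getattr")
        return len(self.content)

    def open(self, path, flags):
        self.calls.append("open")

    def read(self, path, size, offset):
        self.calls.append("read")
        return self.content[offset:offset + size]

    def release(self, path, flags):
        self.calls.append("release")


def simulate_cat(fs, path, chunk=4096):
    """Hypothetical driver imitating the calls that `cat` would trigger."""
    fs.getattr(path)
    fs.open(path, 0)
    out, offset = b"", 0
    while True:
        data = fs.read(path, chunk, offset)
        if not data:  # an empty read signals end of file
            break
        out += data
        offset += len(data)
    fs.release(path, 0)
    return out


fs = TraceFS(b"Hello\n")
result = simulate_cat(fs, "/test.txt")
print(" -> ".join(fs.calls))
```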
(Directory Operations)

FUSE API Overview

   Creating      mkdir /myfuse/folder
       getattr() -> mkdir()

   Reading       ls /myfuse/folder/
       getattr() -> opendir() -> readdir() -> releasedir()

   Removing      rmdir /myfuse/folder
       getattr() -> rmdir()

   Other Methods (getattr() is always called)
       chown th30z:develer /myfuse/test.txt              getattr() -> chown()
       chmod 755 /myfuse/test.txt                        getattr() -> chmod()
       ln -s /myfuse/test.txt /myfuse/test-link.txt      getattr() -> symlink()
       mv /myfuse/folder /myfuse/fancy-folder            getattr() -> rename()
First Code Example!
       HTFS
   (HashTable File-System)
HTFS Overview
• Traditional Filesystem
  Object with Metadata
  (mode, uid, gid, ...)

• HashTable (dict): keys are
  paths, values are Items.

FS Item/Object
  Metadata
    Time of last access
    Time of last modification
    Time of last status change
    Protection and file-type (mode)
    User ID of owner (UID)
    Group ID of owner (GID)
    Extended Attributes (Key/Value)
  Data

Item can be a Regular File or Directory or FIFO...
Data is raw data, or a filename list if the item is a directory.

[Diagram: Disk - Storage HashTable mapping Path 1 ... Path 5 to Item 1 ... Item 4]
HTFS Item
import stat
import time

class Item(object):
  def __init__(self, mode, uid, gid):
    # --- Metadata ---
    self.atime = time.time()  # time of last access
    self.mtime = self.atime   # time of last modification
    self.ctime = self.atime   # time of last status change

    self.mode = mode  # protection and file-type
    self.uid = uid    # user ID of owner
    self.gid = gid    # group ID of owner

    # Extended Attributes
    self.xattr = {}

    # --- Data ---
    if stat.S_ISDIR(mode):
      self.data = set()   # directory: set of child names
    else:
      self.data = ''      # anything else: raw data

This is a File! We have metadata, data and even xattr.
(Data Helper)

                               HTFS Item
# Methods of Item
def read(self, offset, length):
  return self.data[offset:offset+length]

def write(self, offset, data):
  length = len(data)
  self.data = self.data[:offset] + data + self.data[offset+length:]
  return length

def truncate(self, length):
  if len(self.data) > length:
    self.data = self.data[:length]
  else:
    # pad with NUL bytes up to the requested length
    self.data += '\x00' * (length - len(self.data))

...a couple of utility methods to read/write and interact with data.
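
To see the helpers at work, here is a self-contained Python 3 sketch of the same idea using `bytes` instead of `str`. The zero padding inside `write()` for offsets past the end of the data is an assumption added for illustration; it is not on the slide.

```python
import stat
import time


class Item:
    """In-memory file-system object, modelled on the slide's Item class."""

    def __init__(self, mode, uid, gid):
        self.atime = self.mtime = self.ctime = time.time()
        self.mode = mode
        self.uid = uid
        self.gid = gid
        self.xattr = {}
        # Directories hold a set of child names, everything else holds bytes.
        self.data = set() if stat.S_ISDIR(mode) else b""

    def read(self, offset, length):
        return self.data[offset:offset + length]

    def write(self, offset, data):
        # Assumption (not on the slide): pad with zero bytes when
        # the write starts past the current end of the data.
        if offset > len(self.data):
            self.data += b"\x00" * (offset - len(self.data))
        self.data = self.data[:offset] + data + self.data[offset + len(data):]
        return len(data)

    def truncate(self, length):
        if len(self.data) > length:
            self.data = self.data[:length]
        else:
            self.data += b"\x00" * (length - len(self.data))


item = Item(0o644 | stat.S_IFREG, 1000, 1000)
item.write(0, b"Hello World")
item.write(6, b"FUSE!")   # overwrite in the middle
item.truncate(13)         # grow by two zero bytes
```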
HTFS Fuse Operations
getattr() is called before any operation. It tells the VFS whether you
can access the specified file, and its "State".

class HTFS(fuse.Fuse):
  def __init__(self, *args, **kwargs):
    fuse.Fuse.__init__(self, *args, **kwargs)

    self.uid = os.getuid()
    self.gid = os.getgid()

    # File-System must be initialized with the / directory
    root_dir = Item(0o755 | stat.S_IFDIR, self.uid, self.gid)
    self._storage = {'/': root_dir}

  def getattr(self, path):
    if path not in self._storage:
      return -errno.ENOENT

    # Lookup Item and fill the stat struct
    # (zstat() is a helper that is not shown on the slide)
    item = self._storage[path]
    st = zstat(fuse.Stat())
    st.st_mode = item.mode
    st.st_uid = item.uid
    st.st_gid = item.gid
    st.st_atime = item.atime
    st.st_mtime = item.mtime
    st.st_ctime = item.ctime
    st.st_size = len(item.data)
    return st

# Your FUSE File-System is like a Server...
def main():
  server = HTFS()
  server.main()
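
The error convention used by getattr() (return a negative errno value instead of raising) can be tried in isolation. This is a minimal sketch, assuming a plain dict in place of `self._storage` and a dict in place of `fuse.Stat`; `getattr_sketch` and `storage` are hypothetical names.

```python
import errno
import stat

# Hypothetical stand-in for self._storage: path -> (mode, size)
storage = {
    "/": (stat.S_IFDIR | 0o755, 0),
    "/hello.txt": (stat.S_IFREG | 0o644, 5),
}


def getattr_sketch(path):
    """Return a dict of stat fields, or a negative errno for unknown paths."""
    if path not in storage:
        return -errno.ENOENT
    mode, size = storage[path]
    return {"st_mode": mode, "st_size": size}


print(getattr_sketch("/hello.txt"))
print(getattr_sketch("/missing"))
```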
(File Operations)

                      HTFS Fuse Operations
def create(self, path, flags, mode):
  self._storage[path] = Item(mode | stat.S_IFREG, self.uid, self.gid)
  self._add_to_parent_dir(path)

def truncate(self, path, length):
  self._storage[path].truncate(length)

def read(self, path, size, offset):
  return self._storage[path].read(offset, size)

def write(self, path, buf, offset):
  return self._storage[path].write(offset, buf)

def unlink(self, path):
  self._remove_from_parent_dir(path)
  del self._storage[path]

def rename(self, oldpath, newpath):
  # keep the parent directories' name lists in sync with the move
  self._remove_from_parent_dir(oldpath)
  item = self._storage.pop(oldpath)
  self._storage[newpath] = item
  self._add_to_parent_dir(newpath)

Disk is just a big dictionary...
...and files are items
     key = name
     value = data
(Directory Operations)

                  HTFS Fuse Operations
def mkdir(self, path, mode):
  self._storage[path] = Item(mode | stat.S_IFDIR, self.uid, self.gid)
  self._add_to_parent_dir(path)

def rmdir(self, path):
  self._remove_from_parent_dir(path)
  del self._storage[path]

def readdir(self, path, offset):
  dir_items = self._storage[path].data
  for item in dir_items:
    yield fuse.Direntry(item)

def _add_to_parent_dir(self, path):
  parent_path = os.path.dirname(path)
  filename = os.path.basename(path)
  self._storage[parent_path].data.add(filename)

A Directory is a File that contains File names as data!
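
The slides call `_remove_from_parent_dir()` but do not show it. One plausible shape, mirroring `_add_to_parent_dir()`, is sketched below with a plain dict of sets standing in for the storage. The function bodies and the `dirs` dict are assumptions, not the talk's code; `posixpath` is used because FUSE paths always use `/`.

```python
import posixpath

# Hypothetical storage: directory path -> set of child names
dirs = {"/": set(), "/docs": set()}


def add_to_parent_dir(path):
    dirs[posixpath.dirname(path)].add(posixpath.basename(path))


def remove_from_parent_dir(path):
    # discard() does not raise if the name is already gone
    dirs[posixpath.dirname(path)].discard(posixpath.basename(path))


add_to_parent_dir("/docs")
add_to_parent_dir("/docs/a.txt")
add_to_parent_dir("/docs/b.txt")
remove_from_parent_dir("/docs/a.txt")
```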
(XAttr Operations)

                     HTFS Fuse Operations
def setxattr(self, path, name, value, flags):
  self._storage[path].xattr[name] = value

def getxattr(self, path, name, size):
  value = self._storage[path].xattr.get(name, '')
  if size == 0: # We are asked for size of the value
    return len(value)
  return value

def listxattr(self, path, size):
  attrs = self._storage[path].xattr.keys()
  if size == 0:
    return len(attrs) + len(''.join(attrs))
  return attrs

def removexattr(self, path, name):
  if name in self._storage[path].xattr:
    del self._storage[path].xattr[name]

Extended attributes extend the basic attributes associated with files
and directories in the file system. They are stored as name:data pairs
associated with file system objects.
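
The `size == 0` branches implement a two-step protocol: the caller first asks how much space is needed, then asks again for the data. A minimal sketch of that exchange, assuming a plain dict of attributes (the attribute names and values here are made up for illustration):

```python
# Hypothetical attributes of a single file
xattrs = {"user.author": b"th30z", "user.lang": b"python"}


def getxattr(name, size):
    value = xattrs.get(name, b"")
    if size == 0:          # caller is probing for the required buffer size
        return len(value)
    return value


def listxattr(size):
    names = sorted(xattrs)
    if size == 0:
        # every name is followed by one NUL terminator in the result buffer
        return sum(len(n) + 1 for n in names)
    return names


needed = getxattr("user.author", 0)       # step 1: ask for the size
value = getxattr("user.author", needed)   # step 2: fetch the value
```

The sum in `listxattr()` gives the same number as the slide's `len(attrs) + len(''.join(attrs))`: one terminator per name plus the length of all names.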
(Other Operations)

                   HTFS Fuse Operations
Look up the Item, access its information/data, and return or write it.
This is the File-System’s Job.

def chmod(self, path, mode):
  item = self._storage[path]
  item.mode = mode

def chown(self, path, uid, gid):
  item = self._storage[path]
  item.uid = uid
  item.gid = gid

def symlink(self, path, newpath):
  item = Item(0o644 | stat.S_IFLNK, self.uid, self.gid)
  item.data = path
  self._storage[newpath] = item
  self._add_to_parent_dir(newpath)

def readlink(self, path):
  return self._storage[path].data

A Symlink contains just the path of the file it points to.
Other small Examples
Simulate Terabyte Files
Read-Only FS with 1 file of 128TiB. No Disk/RAM Space Required!
read() sends data only when it is requested.

class TBFS(fuse.Fuse):
    def getattr(self, path):
        st = zstat(fuse.Stat())
        if path == '/':
            st.st_mode = 0o644 | stat.S_IFDIR
            st.st_size = 1
            return st
        elif path == '/tera.data':
            st.st_mode = 0o644 | stat.S_IFREG
            st.st_size = 128 * (2 ** 40)
            return st
        return -errno.ENOENT

    def read(self, path, size, offset):
        return '0' * size

    def readdir(self, path, offset):
        if path == '/':
            yield fuse.Direntry('tera.data')
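
The point of this example is that the content is generated on demand, so nothing is stored. A small sketch of such a read function follows; clamping at the advertised size is an assumption added here (the slide's read() always returns `size` characters).

```python
FILE_SIZE = 128 * (2 ** 40)  # 128 TiB, as on the slide


def read(size, offset):
    """Generate '0' bytes on demand, never past the advertised file size."""
    if offset >= FILE_SIZE:
        return b""
    return b"0" * min(size, FILE_SIZE - offset)


first = read(4096, 0)
tail = read(4096, FILE_SIZE - 10)
past_end = read(4096, FILE_SIZE)
```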
X^OR File-System
def _xorData(data):
  data = [chr(ord(c) ^ 10) for c in data]
  return ''.join(data)

class XorFS(fuse.Fuse):
  ...
  def write(self, path, buf, offset):
    data = _xorData(buf)
    return _writeData(path, offset, data)

  def read(self, path, length, offset):
    data = _readData(path, offset, length)
    return _xorData(data)
  ...

res = _xorData("xor")
print(res)   # "rex"
res2 = _xorData(res)
print(res2)  # "xor"

  10101010 ^
  01010101 =
  ---------
  11111111 ^
  01010101 =
  ---------
  10101010
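
In Python 3 the buffers handed to read() and write() are `bytes`, so the same transformation can be sketched as below. The name `xor_data` and the `key` parameter are illustrative choices; the key value 10 is the one used on the slide. Applying the function twice gives back the input, which is why the same helper serves both directions.

```python
def xor_data(data, key=10):
    """XOR every byte with the key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


res = xor_data(b"xor")
res2 = xor_data(res)
print(res, res2)
```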
Dup Write File-System
class DupFS(fuse.Fuse):
    def __init__(self, *args, **kwargs):
        ...
        self.fd_disk1 = open('/dev/hda1', ...)
        self.fd_disk2 = open('/dev/hdb5', ...)
        self.fd_log = open('/home/th30z/testfs.log', ...)
        self.fd_net = socket.socket(...)
        ...
    ...
    def write(self, path, buf, offset):
        ...
        disk_write(self.fd_disk1, path, offset, buf)
        disk_write(self.fd_disk2, path, offset, buf)
        net_write(self.fd_net, path, offset, buf)
        log_write(self.fd_log, path, offset, buf)
        ...

Write on your Disk partition 1 and 2.
Send data over the Network.
Log your file-system operations.
...do other fancy stuff
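
The duplication idea can be tried without real devices. The sketch below fans one write out to several in-memory sinks; `DupWriter` is a hypothetical helper and `io.BytesIO` objects stand in for the disks, the log and the socket of the slide.

```python
import io


class DupWriter:
    """Send every write to all sinks (file-like objects supporting seek/write)."""

    def __init__(self, *sinks):
        self.sinks = sinks

    def write(self, buf, offset):
        for sink in self.sinks:
            sink.seek(offset)
            sink.write(buf)
        return len(buf)


disk1, disk2 = io.BytesIO(), io.BytesIO()
dup = DupWriter(disk1, disk2)
written = dup.write(b"Hello", 0)
dup.write(b"World", 5)
```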
One more thing
(Files and Folders don’t fit)


Rethink the File-System
                 I don’t know
               where to put
                    this file...

                 ...Ok, for now
               the Desktop is a good
                     place...
(Mobile/Home Devices)


Rethink the File-System
             Small Devices
              Small Files
             EMails, Text...


       We need to look up
       our data quickly.
         Tags, Full-Text
            Search...

                           ...Encourage people
                         to view their content
                                    as objects.
(Large Clusters, The Cloud...)


 Rethink the File-System
Distributed data
   Scalability
   Fail over
    Cluster
  Rebalancing
Q&A
                                          Python FUSE
                                 http://mbertozzi.develer.com/python-fuse




Matteo Bertozzi (Th30z)
      http://th30z.netsons.org

PythonFuse (PyCon4)

  • 1.
    Python FUSE File-System in USErspace Beyond the Traditional File-Systems http://mbertozzi.develer.com/python-fuse Matteo Bertozzi (Th30z) http://th30z.netsons.org
  • 2.
    Talk Overview • Whatis a File-System • Brief File-Systems History • What is FUSE • Beyond the Traditional File-System • API Overview • Examples (Finally some code!!!) http://mbertozzi.develer.com/python-fuse • Q&A
  • 3.
    What is aFile-System Is a Method of storing and organizing data to make it easy to find and access. ...to interact with an object You name it, and you say what you want it do. The Filesystem takes the name you give Looks through disk to find the object Gives the object your request to do something.
  • 4.
    What is aFile-System • On Disk Format (...serialized struct) ext2, ext3, reiserfs, btrfs... • Namespace (Mapping between name and content) /home/th30z/, /usr/local/share/test.c, ... • Runtime Service: open(), read(), write(), ...
  • 5.
    (The Origins) ...A bit of History Multics 1965 (File-System Paper) A General-Purpose File System For Secondary Storage Unix Late 1969 User Program User Space Kernel Space System Call Layer The File-System Only One File-System
  • 6.
    (The Evolution) ...A bit of History Multics 1965 (File-System Paper) A General-Purpose File System For Secondary Storage Unix Late 1969 User Program User Space Kernel Space System Call Layer (which?) The File-System 1 The File-System 2
  • 7.
    (The Evolution) ...A bit of History Multics 1965 (File-System Paper) A General-Purpose File System For Secondary Storage Unix Late 1969 User Program User Space Kernel Space System Call Layer (which?) FS 1 FS 2 FS 3 FS 4 ... FS N
  • 8.
    (The Solution) ...A bit of History Multics 1965 (File-System Paper) A General-Purpose File System For Secondary Storage Unix Late 1969 Sun Microsystem 1984 User Program User Space Kernel Space System Call Layer Vnode/VFS Layer FS 1 FS 2 FS 3 FS 4 ... FS N
  • 9.
    Virtual File-System • Provides an abstraction within the kernel which allows different filesystem C Library implementations to coexist. (open(), read(), write(), ...) User Space • Provides the filesystem interface to userspace programs. Kernel Space System Calls (sys_open(), sys_read(), ...) VFS Concepts VFS A super-block object represents a (vfs_read(), vfs_write(), ...) filesystem. Kernel ext2 ReiserFS XFS I-Nodes are filesystem objects such as Supported File-Systems regular files, directories, FIFOs, ... ext3 Reiser4 JFS A file object represents a file opened by a ext4 Btrfs HFS+ process. ... ... ...
  • 10.
    Wow, It seemsnot much difficult writing a filesystem
  • 11.
    Why File-System areComplex • You need to know the Kernel (No helper libraries: Qt, Glib, ...) • Reduce Disk Seeks / SSD Block limited write cycles • Be consistent, Power Down, Unfinished Write... (Journal, Soft-Updates, Copy-on-Write, ...) • Bad Blocks, Disk Error • Don't waste to much space for Metadata • Extra Features: Deduplication, Compression, Cryptography, Snapshots…
  • 12.
    File-Systems Lines ofCode MinixFS 2,000 MinixFS Fuse 800 UFS 2,000 UFS Fuse 1,000 FAT 6,000 ext2 8,000 ext3 16,000 ReiserFS 27,000 ext4 30,000 btrfs 50,000 NFS 10,000 8,000 Fuse 7,000 9,000 FtpFS 800 SshFS 2,000 Kernel Space User Space
  • 13.
    Building a File-Systemis Difficult • Writing good code is not easy (Bugs, Typo, ...) • Writing good code in Kernel Space Is much more difficult! • Too many reboots during the development • Too many Kernel Panic during Reboot • We need more flexibility and Speedups!
  • 14.
    FUSE, develop yourfile-system with your favorite language and library in user space
  • 15.
    What is FUSE •Kernel module! (like ext2, ReiserFS, XFS, ...) • Allows non-privileged user to create their own file- system without editing the kernel code. (User Space) • FUSE is particularly useful for writing "virtual file systems", that act as a view or translation of an existing file-system storage device. (Facilitate Disk- Based, Network-Based and Pseudo File-System) • Bindings: Python, Objective-C, Ruby, Java, C#, ...
  • 16.
    File-Systems in UserSpace? ...Make File Systems Development Super Easy • All UserSpace Libraries are Available • ...Debugging Tools • No Kernel Recompilation • No Machine Reboot! ...File-System upgrade/fix 2 sec downtime, app restart!
  • 17.
    Yeah, ...but what’sFUSE? It’s a File-System with user-space callbacks ntfs-3g ifuse ChrionFS sshfs zfs-fuse YouTubeFS gnome-vfs2 ftpfs cryptoFS RaleighFS U n i x
  • 18.
    FUSE Kernel Spaceand User Space Your FS The FUSE kernel module SshFS lib Fuse and the FUSE library FtpFS communicate via a ... User Space special file descriptor /dev/fuse Kernel Space which is obtained by FUSE opening /dev/fuse ext2 ... VFS ext4 drivers Your Fuse FS ... firmware User Input kernel Btrfs ls -l /myfuse/ lib FUSE Kernel VFS FUSE
  • 19.
    ...be creative Beyond the Traditional File-Systems • ImapFS: Access to your Thousand of mail with grep. tools available • SocialFS: Log all your social cat/grep/sed network to collect news/ jokes and other social open() is the things. most used • YouTubeFS: Watch YouTube function in our video as on your disk. applications • GMailFS: Use your mailbox as backup disk.
  • 20.
    FUSE API Overview • create(path, mode) • mkdir(path, mode) • truncate(path, size) • unlink(path) • mknod(path, mode, dev) • readdir(path) • open(path, mode) • rmdir(path) • write(path, data, offset) • rename(opath, npath) • read(path, length, offset) • link(srcpath, dstpath) • release(path) Your Fuse FS • User Input fsync(path) ls -l /myfuse/ lib FUSE • chmod(path, mode) Kernel • chown(path, oid, gid) VFS FUSE
  • 21.
    (File Operations) FUSE API Overview Reading Appending cat /myfuse/test.txt Writing echo World >> /myfuse/test2.txt Truncating getattr() echo Hello > /myfuse/test2.txt getattr() echo Woo > /myfuse/test2.txt getattr() getattr() open() open() create() truncate() read() write() write() open() read() flush() flush() write() release() release() release() flush() release() Removing getattr() unlink() rm /myfuse/test.txt
  • 22.
    (Directory Operations) FUSE API Overview Creating Removing mkdir /myfuse/folder Reading rmdir /myfuse/folder getattr() ls /myfuse/folder/ getattr() getattr() mkdir() rmdir() opendir() readdir() releasedir() Other Methods (getattr() is always called) chown th30z:develer /myfuse/test.txt getattr() -> chown() chmod 755 /myfuse/test.txt getattr() -> chmod() ln -s /myfuse/test.txt /myfuse/test-link.txt getattr() -> symlink() mv /myfuse/folder /myfuse/fancy-folder getattr() -> rename()
  • 23.
    First Code Example! HTFS (HashTable File-System)
  • 24.
    HTFS Overview FS Item/Object • Traditional Filesystem Metadata Object with Metadata Time of last access Time of last modification (mode, uid, gid, ...) Time of last status change Protection and file-type (mode) • HashTable (dict) keys are User ID of owner (UID) Group ID of owner (GID) paths values are Items. Extended Attributes (Key/Value) Data Item 1 Path 1 Path 2 Item can be a Regular File or Path 3 Item 2 Directory or FIFO... Path 4 Item 3 Data is raw data or filename list if Path 5 Item 4 item is a directory. (Disk - Storage HashTable)
  • 25.
    HTFS Item class Item(object): def __init__(self, mode, uid, gid): # ----------------------------------- Metadata -- self.atime = time.time() # time of last acces self.mtime = self.atime # time of last modification self.ctime = self.atime # time of last status change self.mode = mode # protection and file-type self.uid = uid # user ID of owner self.gid = gid # group ID of owner # Extended Attributes self.xattr = {} # --- Data ----------- This is a File! if stat.S_ISDIR(mode): we’ve metadata self.data = set() data and even xattr else: self.data = ''
  • 26.
    (Data Helper) HTFS Item def read(self, offset, length): return self.data[offset:offset+length] def write(self, offset, data): length = len(data) self.data = self.data[:offset] + data + self.data[offset+length:] return length def truncate(self, length): if len(self.data) > length: self.data = self.data[:length] else: self.data += 'x00' * (length - len(self.data)) ...a couple of utility methods to read/write and interact with data.
  • 27.
    HTFS Fuse Operations classHTFS(fuse.Fuse): getattr() is called def __init__(self, *args, **kwargs): before any operation. fuse.Fuse.__init__(self, *args, **kwargs) Tells to the VFS if you self.uid = os.getuid() can access to the self.gid = os.getgid() specified file and the “State”. root_dir = Item(0755 | stat.S_IFDIR, self.uid, self.gid) self._storage = {'/': root_dir} def getattr(self, path): if not path in self._storage: return -errno.ENOENT File-System must be initialized with the / directory # Lookup Item and fill the stat struct item = self._storage[path] st = zstat(fuse.Stat()) def main(): st.st_mode = item.mode server = HTFS() st.st_uid = item.uid st.st_gid = item.gid server.main() st.st_atime = item.atime st.st_mtime = item.mtime Your FUSE File-System st.st_ctime = item.ctime is like a Server... st.st_size = len(item.data) return st
  • 28.
    (File Operations) HTFS Fuse Operations
    Disk is just a big dictionary... and files are items (key = name, value = data).

            def create(self, path, flags, mode):
                self._storage[path] = Item(mode | stat.S_IFREG, self.uid, self.gid)
                self._add_to_parent_dir(path)

            def truncate(self, path, length):
                self._storage[path].truncate(length)

            def read(self, path, size, offset):
                return self._storage[path].read(offset, size)

            def write(self, path, buf, offset):
                return self._storage[path].write(offset, buf)

            def unlink(self, path):
                self._remove_from_parent_dir(path)
                del self._storage[path]

            def rename(self, oldpath, newpath):
                item = self._storage.pop(oldpath)
                self._storage[newpath] = item
  • 29.
    (Directory Operations) HTFS Fuse Operations
    A Directory is a File that contains file names as data!

            def mkdir(self, path, mode):
                self._storage[path] = Item(mode | stat.S_IFDIR, self.uid, self.gid)
                self._add_to_parent_dir(path)

            def rmdir(self, path):
                self._remove_from_parent_dir(path)
                del self._storage[path]

            def readdir(self, path, offset):
                dir_items = self._storage[path].data
                for item in dir_items:
                    yield fuse.Direntry(item)

            def _add_to_parent_dir(self, path):
                parent_path = os.path.dirname(path)
                filename = os.path.basename(path)
                self._storage[parent_path].data.add(filename)
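The slides call `_remove_from_parent_dir` but only show `_add_to_parent_dir`. As a rough sketch of how the two could mirror each other, the hypothetical `Namespace` class below stands in for `HTFS._storage`, using plain sets for directories and bytes for files instead of `Item` objects. It is written in Python 3 and is not the author's code. Its `rename` also updates both parent listings, which the slide's version does not do.

```python
import posixpath


class Namespace:
    """Hypothetical stand-in for HTFS._storage: path -> set (dir) or bytes (file)."""

    def __init__(self):
        self.storage = {"/": set()}

    def _add_to_parent_dir(self, path):
        parent = posixpath.dirname(path)
        self.storage[parent].add(posixpath.basename(path))

    def _remove_from_parent_dir(self, path):
        parent = posixpath.dirname(path)
        self.storage[parent].discard(posixpath.basename(path))

    def create(self, path, data=b""):
        self.storage[path] = data
        self._add_to_parent_dir(path)

    def mkdir(self, path):
        self.storage[path] = set()
        self._add_to_parent_dir(path)

    def unlink(self, path):
        self._remove_from_parent_dir(path)
        del self.storage[path]

    def rename(self, oldpath, newpath):
        # Keep both parent listings in sync with the dictionary.
        # Note: entries below a renamed directory are not re-keyed here.
        self._remove_from_parent_dir(oldpath)
        self.storage[newpath] = self.storage.pop(oldpath)
        self._add_to_parent_dir(newpath)

    def readdir(self, path):
        return sorted(self.storage[path])


ns = Namespace()
ns.mkdir("/docs")
ns.create("/docs/a.txt", b"hi")
ns.rename("/docs/a.txt", "/b.txt")
print(ns.readdir("/"))      # ['b.txt', 'docs']
print(ns.readdir("/docs"))  # []
```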
  • 30.
    (XAttr Operations) HTFS Fuse Operations
    Extended attributes extend the basic attributes associated with files and directories in the file system. They are stored as name:data pairs associated with file system objects.

            def setxattr(self, path, name, value, flags):
                self._storage[path].xattr[name] = value

            def getxattr(self, path, name, size):
                value = self._storage[path].xattr.get(name, '')
                if size == 0:   # We are asked for size of the value
                    return len(value)
                return value

            def listxattr(self, path, size):
                attrs = self._storage[path].xattr.keys()
                if size == 0:
                    return len(attrs) + len(''.join(attrs))
                return attrs

            def removexattr(self, path, name):
                if name in self._storage[path].xattr:
                    del self._storage[path].xattr[name]
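The `size == 0` branches implement a two-step protocol: the caller first asks how large a buffer it needs, then asks again for the data. The sketch below isolates that logic as plain Python 3 functions over a dictionary. The function signatures and the sample attribute names are illustrative assumptions. The slide's `len(attrs) + len(''.join(attrs))` is read here as one terminating NUL byte per name, matching the NUL-separated list returned by the `listxattr(2)` system call.

```python
def getxattr(xattrs, name, size):
    """Return the value, or its length when the caller passes size == 0."""
    value = xattrs.get(name, b"")
    if size == 0:
        return len(value)
    return value


def listxattr(xattrs, size):
    """Return the names, or the buffer size needed when size == 0."""
    names = list(xattrs)
    if size == 0:
        # Each name is followed by one NUL terminator in the final buffer.
        return sum(len(n.encode()) + 1 for n in names)
    return names


# Illustrative attribute names and values only
xattrs = {"user.tag": b"slides", "user.lang": b"py"}
needed = listxattr(xattrs, 0)                  # size query
names = listxattr(xattrs, needed)              # real query
value_len = getxattr(xattrs, "user.tag", 0)    # size query
value = getxattr(xattrs, "user.tag", value_len)
print(needed, names, value_len, value)
```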
  • 31.
    (Other Operations) HTFS Fuse Operations
    Look up the Item, access its information/data, and return or write it. This is the File-System's job.
    Symlinks contain just the path of the file they point to.

            def chmod(self, path, mode):
                item = self._storage[path]
                item.mode = mode

            def chown(self, path, uid, gid):
                item = self._storage[path]
                item.uid = uid
                item.gid = gid

            def symlink(self, path, newpath):
                item = Item(0644 | stat.S_IFLNK, self.uid, self.gid)
                item.data = path
                self._storage[newpath] = item
                self._add_to_parent_dir(newpath)

            def readlink(self, path):
                return self._storage[path].data
  • 32.
  • 33.
    Simulate Tera Byte Files
    Read-Only FS with 1 file of 128TiB.
    No Disk/RAM Space Required!
    read(): send data only when it is requested.

        class TBFS(fuse.Fuse):
            def getattr(self, path):
                st = zstat(fuse.Stat())
                if path == '/':
                    st.st_mode = 0644 | stat.S_IFDIR
                    st.st_size = 1
                    return st
                elif path == '/tera.data':
                    st.st_mode = 0644 | stat.S_IFREG
                    st.st_size = 128 * (2 ** 40)
                    return st
                return -errno.ENOENT

            def read(self, path, size, offset):
                # Data is generated on demand; nothing is stored
                return '0' * size

            def readdir(self, path, offset):
                if path == '/':
                    yield fuse.Direntry('tera.data')
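The point of this slide is that file contents can be computed at read time. A minimal Python 3 sketch of that idea follows, written as a standalone function rather than a `fuse.Fuse` method. Clamping the result at the end of the file is an addition of this sketch; the slide's `read()` always returns `size` bytes.

```python
FILE_SIZE = 128 * (2 ** 40)  # 128 TiB, as reported by getattr() in the slide


def read(size, offset, file_size=FILE_SIZE):
    """Generate the requested window of a virtual file filled with b'0'."""
    if offset >= file_size:
        return b""                      # nothing past the end of the file
    return b"0" * min(size, file_size - offset)


head = read(4096, 0)
tail = read(4096, FILE_SIZE - 10)       # only 10 bytes remain
past = read(4096, FILE_SIZE)
print(len(head), len(tail), len(past))
```

Only the requested window is ever materialized, so memory use depends on the read size and not on the advertised file size.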
  • 34.
    XOR File-System

        10101010 ^ 01010101 = 11111111
        11111111 ^ 01010101 = 10101010

        def _xorData(data):
            # XOR every character with the key (10); applying it twice restores the input
            data = [chr(ord(c) ^ 10) for c in data]
            return "".join(data)

        class XorFS(fuse.Fuse):
            ...
            def write(self, path, buf, offset):
                data = _xorData(buf)
                return _writeData(path, offset, data)

            def read(self, path, length, offset):
                data = _readData(path, offset, length)
                return _xorData(data)
            ...

        res = _xorData("xor")
        print res     # "rex"
        res2 = _xorData(res)
        print res2    # "xor"
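The same round trip in Python 3, where file data arrives as `bytes`, could look like the sketch below. The name `xor_data` and the `key` parameter are choices of this sketch; the key value 10 is the one used in the slide.

```python
KEY = 10  # same constant as in the slide


def xor_data(data: bytes, key: int = KEY) -> bytes:
    """XOR every byte with the key; applying it twice gives back the input."""
    return bytes(b ^ key for b in data)


res = xor_data(b"xor")
res2 = xor_data(res)
print(res)   # b'rex'
print(res2)  # b'xor'
```

Because XOR is its own inverse, the same function serves both `write()` (encode) and `read()` (decode).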
  • 35.
    Dup Write File-System
    Write on your Disk partition 1 and 2.
    Send data over Network.
    Log your file-system operations.
    ...do other fancy stuff

        class DupFS(fuse.Fuse):
            def __init__(self, *args, **kwargs):
                ...
                fd_disk1 = open('/dev/hda1', ...)
                fd_disk2 = open('/dev/hdb5', ...)
                fd_log = open('/home/th30z/testfs.log', ...)
                fd_net = socket.socket(...)
                ...

            ...
            def write(self, path, buf, offset):
                ...
                disk_write(fd_disk1, path, offset, buf)
                disk_write(fd_disk2, path, offset, buf)
                net_write(fd_net, path, offset, buf)
                log_write(fd_log, path, offset, buf)
                ...
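The fan-out idea can be sketched without real devices or sockets. In the Python 3 example below, `DupWriter` is a hypothetical helper (not part of python-fuse or of the slides) that writes the same buffer to every backend and appends a line to a log. In-memory `io.BytesIO` objects stand in for the two partitions, and the network target is left out.

```python
import io


class DupWriter:
    """Hypothetical helper: duplicate each write to all backends and log it."""

    def __init__(self, backends, log):
        self.backends = backends  # seekable binary file-like objects
        self.log = log            # text file-like object

    def write(self, path, buf, offset):
        for backend in self.backends:
            backend.seek(offset)
            backend.write(buf)
        self.log.write("write %s offset=%d len=%d\n" % (path, offset, len(buf)))
        return len(buf)


disk1, disk2 = io.BytesIO(), io.BytesIO()
log = io.StringIO()
dup = DupWriter([disk1, disk2], log)

written = dup.write("/a.txt", b"hello", 0)
dup.write("/a.txt", b"LL", 2)
print(written, disk1.getvalue(), disk2.getvalue())
print(log.getvalue(), end="")
```

A real file system would also need to decide what to do when one backend fails while the others succeed; this sketch does not address that.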
  • 36.
  • 37.
    (Files and Folders don't fit) Rethink the File-System
    "I don't know where to put this file... OK, for now the Desktop is a good place..."
  • 38.
    (Mobile/Home Devices) Rethink the File-System
    Small Devices, Small Files: EMails, Text...
    We need to look up our data quickly: Tags, Full-Text Search...
    ...Encourage people to view their content as objects.
  • 39.
    (Large Clusters, The Cloud...) Rethink the File-System
    Distributed data, Scalability, Fail over, Cluster Rebalancing
  • 40.
    Q&A Python FUSE http://mbertozzi.develer.com/python-fuse Matteo Bertozzi (Th30z) http://th30z.netsons.org