logging in or signing up MOS19 Nellwyn Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 174 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: October 23, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript File Systems: File Systems © Jeff Parker, 2004 I haven’t lost my mind: it’s backed up on tape somewhereOutline: Outline Review what a file system must doFile Access Profile: File Access Profile Results of study by Satyanarayanan on Academic Unix File system interface allows many patterns of access. Most files are small (<10K), but largest files take up more than 1/2 disk Read far more frequent than write Data in files is often overwritten Most files written, read by a one user File reference is very local within file, and few files are read in a time slice Most common pattern: sequential read of all data (not true for DBs) Lifetime is Bimodal: temp files: have short life - system files: long lifeWhat goes into a file?: What goes into a file? In Unix, a file is a named sequence of bytes This is a reaction to the excesses of the time As with everything else, Unix provides the basic function, and lets others add value by implementing a database on top. Other alternatives include a sequence of records Once you have records, you want index access to those records File attributes: File attributes Typical to store the following for each file Name Type (if you keep) Version (if you keep versions) Location Size Current position of cursor - (is this per file, per user, or in between?) Protection Use Count Time Of Day of Creation, Last Modification, Last Use… File Type: File Type DOS and VMS take the extension to define the file type .ppt and .pdf .com and .exe Mac has resource fork that stores info on icon and application Tops-20 would keep track of the origin of an executable and rebuild if the source was more recent Typical in typed systems to limit what the user can do Thus you can run a binary file, and edit a source file You cannot edit a binary file or run a source file What do you do with an encrypted e-mail? Is it text or data? Unix does not have the notion of a file type Two types: text and data, and it is up to the application to guess File Type: File Type Unix does not have the notion of a file type Two types: text and data, and it is up to the application to guess, using the magic file % file a.out a.out: Mach-O executable ppc % more /usr/share/file/magic … # ALAN 0 beshort 0x0206 ALAN game data >2 byte <10 version 2.6%d # Infocom #0 byte 3 Infocom game data (Z-machine 3, #>2 beshort <0x7fff Release %3d, #>26 beshort >0 Size %d*2 #>18 string >\0 Serial %.6s)Blocks: Blocks We want to optimize the access of the file May depend upon selecting a good size for the disk block Usual trade-off applies with internal fragmentation vs overhead of fetch If you have a record-based file, can a record smaller than block size cross block boundaries? There are arguments for both answersDirectory Structure: Directory Structure Directories are ways to map from a file name to file attributes, including contents Early systems would have a single directory Simplified the OS Forces user to organize things by name alone Does not scale Other early systems would have a single directory per user Has all of the drawbacks of the previous system, as well as making it difficult to share To share executables, store them with the “bin” user - perhaps user 0 Include this directory in you directory PATH Hierarchical Directory is almost exclusively used today Directory Structure: Directory Structure If we have a Hierarchical Directory system, what structures are allowed? Tree structure Prevents a user from linking to my files Acyclic Graph Graph with no loops How do you check that there are no loops? When do you do that? General Graph Complicates garbage collection: use counter is no longer enough Protection: Protection What actions might we wish to do to a file? Read, Write, Execute, Append, Delete Various schemes Naming: if you can say it, you can have it Passwords: Lock with password ACLs: List of who can do what to the file On it’s own, this does not scale unless you can use groups Access Groups: How do you manage users who can be in multiple groups? Capabilities: Like the fabled Letter of Passage in Casablanca Directory Operations: Directory Operations The directory is a symbolic table that maps names into objects What do we want to do in a directory? Insert files Delete files Rename files List files in order There are multiple Data Structures that can do this Linked List Binary Search Tree Hash Table Let’s look at the operations we might wish to performFile Operations: File Operations Create File Find space to store the file Insert entry into Directory Write Find the file Update file structures Read Find file Copy data Update Cursor Lseek Find file Update Cursor Delete Find file Update directory Release resources Rather than find the file again and again, we typically open the file and keep a reference to it This state is kept by userProgramming Interface: Programming Interface Two major models for a file service 1) Map file into user address space, as done by Multics, Apollo Domain Interface becomes pointer or array based manipulations 2) Have the file service export an interface: traditional Unix model int fd; fd = open (filename, flags, mode); ftruncate(fd, (off_t) 0); len = strlen(st); count = write(fd, st, len); ... count = read(fd, buf, MAX); if (count != MAX) perror("Too few bytes"); Access Modes: Access Modes How do we access file? Sequential Access - one darn thing after another Direct Access - get the byte (record) at position 14 Index Access IBM offered ISAM - Indexed Sequential Access Method files Each file had one or more tree-structured indexs File was kept sorted on primary key for lists, etc Index allowed rapid search on other fields Unix offers sequential and direct access Designers felt that it was best to provide tools for others to write a DB system Consistency Semantics: Consistency Semantics What happens when two users are updating the same file? Unix Writes by User A are visible almost instantly by User B Users can share a cursor The file position is part of task state that is copied in fork Andrew (AFS) Writes by A are not seen by B After A closes the file, his updates will be seen by others who open the file Immutable File System User A cannot change the file A can make a new version of AFile Contents: File Contents How do we keep track of file contents? Contiguous Allocation Simple to implement - directory entry just needs to remember first block Inefficient if files are allowed to grow - external fragmentation Linked List Allocation Each block includes pointer to next block Directory only needs to keep pointer to the first block Blocks are no longer power of 2 in size Seek is slow to implement Linked List in Index Block We keep an index block on the disk. One word per disk block Each word holds the address of next block in the file Directory only needs to keep the index of the first work. To perform Random Access, we read index block into memory To keep the size of the table reasonable, need to use large block size File Contents: File Contents Unix uses the best of the these, by adding a level of indirection Defines a structure called an i-node (index node) Structure has room for file attributes Alternative is to store this in the directory Has room for about a dozen block pointers Then has an indirect block, with more pointers Double Indirect block… Triple Indirect Block Can deal with huge files even with small block sizeDirectory Entries: Directory Entries CP/M would have been DOS, if Gary Kildal had known enough to wear a suit User code is used in searches: users share one common directory, but only see their files The Extent is used if the file needs more than 16 blocks File gets multiple directory entries, and extent orders them The block count tells us how many blocks there are. There is not file size: file size is rounded up to number of blocks User code File Name .etn Extent Block Count Disk Block NumbersMS-DOS Directory Entries: MS-DOS Directory Entries Directory entry holds pointer to first block in index block Index block simplifies the entry considerably File Name .etn Attributes 8 8 1 10 2 2 2 4 size Time Date First Block numberUnix Directory Entries: Unix Directory Entries Directory Entry can be as simple as an ASCII string and an i-node number Consider a lookup of /usr/ast/mbox You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
MOS19 Nellwyn Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 174 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: October 23, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript File Systems: File Systems © Jeff Parker, 2004 I haven’t lost my mind: it’s backed up on tape somewhereOutline: Outline Review what a file system must doFile Access Profile: File Access Profile Results of study by Satyanarayanan on Academic Unix File system interface allows many patterns of access. Most files are small (<10K), but largest files take up more than 1/2 disk Read far more frequent than write Data in files is often overwritten Most files written, read by a one user File reference is very local within file, and few files are read in a time slice Most common pattern: sequential read of all data (not true for DBs) Lifetime is Bimodal: temp files: have short life - system files: long lifeWhat goes into a file?: What goes into a file? In Unix, a file is a named sequence of bytes This is a reaction to the excesses of the time As with everything else, Unix provides the basic function, and lets others add value by implementing a database on top. Other alternatives include a sequence of records Once you have records, you want index access to those records File attributes: File attributes Typical to store the following for each file Name Type (if you keep) Version (if you keep versions) Location Size Current position of cursor - (is this per file, per user, or in between?) Protection Use Count Time Of Day of Creation, Last Modification, Last Use… File Type: File Type DOS and VMS take the extension to define the file type .ppt and .pdf .com and .exe Mac has resource fork that stores info on icon and application Tops-20 would keep track of the origin of an executable and rebuild if the source was more recent Typical in typed systems to limit what the user can do Thus you can run a binary file, and edit a source file You cannot edit a binary file or run a source file What do you do with an encrypted e-mail? Is it text or data? Unix does not have the notion of a file type Two types: text and data, and it is up to the application to guess File Type: File Type Unix does not have the notion of a file type Two types: text and data, and it is up to the application to guess, using the magic file % file a.out a.out: Mach-O executable ppc % more /usr/share/file/magic … # ALAN 0 beshort 0x0206 ALAN game data >2 byte <10 version 2.6%d # Infocom #0 byte 3 Infocom game data (Z-machine 3, #>2 beshort <0x7fff Release %3d, #>26 beshort >0 Size %d*2 #>18 string >\0 Serial %.6s)Blocks: Blocks We want to optimize the access of the file May depend upon selecting a good size for the disk block Usual trade-off applies with internal fragmentation vs overhead of fetch If you have a record-based file, can a record smaller than block size cross block boundaries? There are arguments for both answersDirectory Structure: Directory Structure Directories are ways to map from a file name to file attributes, including contents Early systems would have a single directory Simplified the OS Forces user to organize things by name alone Does not scale Other early systems would have a single directory per user Has all of the drawbacks of the previous system, as well as making it difficult to share To share executables, store them with the “bin” user - perhaps user 0 Include this directory in you directory PATH Hierarchical Directory is almost exclusively used today Directory Structure: Directory Structure If we have a Hierarchical Directory system, what structures are allowed? Tree structure Prevents a user from linking to my files Acyclic Graph Graph with no loops How do you check that there are no loops? When do you do that? General Graph Complicates garbage collection: use counter is no longer enough Protection: Protection What actions might we wish to do to a file? Read, Write, Execute, Append, Delete Various schemes Naming: if you can say it, you can have it Passwords: Lock with password ACLs: List of who can do what to the file On it’s own, this does not scale unless you can use groups Access Groups: How do you manage users who can be in multiple groups? Capabilities: Like the fabled Letter of Passage in Casablanca Directory Operations: Directory Operations The directory is a symbolic table that maps names into objects What do we want to do in a directory? Insert files Delete files Rename files List files in order There are multiple Data Structures that can do this Linked List Binary Search Tree Hash Table Let’s look at the operations we might wish to performFile Operations: File Operations Create File Find space to store the file Insert entry into Directory Write Find the file Update file structures Read Find file Copy data Update Cursor Lseek Find file Update Cursor Delete Find file Update directory Release resources Rather than find the file again and again, we typically open the file and keep a reference to it This state is kept by userProgramming Interface: Programming Interface Two major models for a file service 1) Map file into user address space, as done by Multics, Apollo Domain Interface becomes pointer or array based manipulations 2) Have the file service export an interface: traditional Unix model int fd; fd = open (filename, flags, mode); ftruncate(fd, (off_t) 0); len = strlen(st); count = write(fd, st, len); ... count = read(fd, buf, MAX); if (count != MAX) perror("Too few bytes"); Access Modes: Access Modes How do we access file? Sequential Access - one darn thing after another Direct Access - get the byte (record) at position 14 Index Access IBM offered ISAM - Indexed Sequential Access Method files Each file had one or more tree-structured indexs File was kept sorted on primary key for lists, etc Index allowed rapid search on other fields Unix offers sequential and direct access Designers felt that it was best to provide tools for others to write a DB system Consistency Semantics: Consistency Semantics What happens when two users are updating the same file? Unix Writes by User A are visible almost instantly by User B Users can share a cursor The file position is part of task state that is copied in fork Andrew (AFS) Writes by A are not seen by B After A closes the file, his updates will be seen by others who open the file Immutable File System User A cannot change the file A can make a new version of AFile Contents: File Contents How do we keep track of file contents? Contiguous Allocation Simple to implement - directory entry just needs to remember first block Inefficient if files are allowed to grow - external fragmentation Linked List Allocation Each block includes pointer to next block Directory only needs to keep pointer to the first block Blocks are no longer power of 2 in size Seek is slow to implement Linked List in Index Block We keep an index block on the disk. One word per disk block Each word holds the address of next block in the file Directory only needs to keep the index of the first work. To perform Random Access, we read index block into memory To keep the size of the table reasonable, need to use large block size File Contents: File Contents Unix uses the best of the these, by adding a level of indirection Defines a structure called an i-node (index node) Structure has room for file attributes Alternative is to store this in the directory Has room for about a dozen block pointers Then has an indirect block, with more pointers Double Indirect block… Triple Indirect Block Can deal with huge files even with small block sizeDirectory Entries: Directory Entries CP/M would have been DOS, if Gary Kildal had known enough to wear a suit User code is used in searches: users share one common directory, but only see their files The Extent is used if the file needs more than 16 blocks File gets multiple directory entries, and extent orders them The block count tells us how many blocks there are. There is not file size: file size is rounded up to number of blocks User code File Name .etn Extent Block Count Disk Block NumbersMS-DOS Directory Entries: MS-DOS Directory Entries Directory entry holds pointer to first block in index block Index block simplifies the entry considerably File Name .etn Attributes 8 8 1 10 2 2 2 4 size Time Date First Block numberUnix Directory Entries: Unix Directory Entries Directory Entry can be as simple as an ASCII string and an i-node number Consider a lookup of /usr/ast/mbox