OPS102 - Filesystem Basics

From CDOT Wiki


Hierarchical File Systems

Most modern computers are equipped with one or more random-access storage devices: either a mechanical hard disk drive (HDD) or a fully electronic solid-state drive (SSD). Both present their storage as a numbered set of blocks or sectors, each of which stores a fixed amount of data (typically 512 or 4096 bytes).

In order to conveniently use this storage, it is arranged into files, which are named collections of bytes of arbitrary length. The organization of blocks/sectors into files is handled by a filesystem, which is a scheme for structuring data, along with the corresponding software to implement this scheme. Most filesystems track certain metadata about a file in addition to the file name (or "filename"), such as the date/time of creation, the date/time of last modification, the owner or original creator of the file, and the permissions applicable to the file (for example, who is permitted to read and to change the file contents).
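As a rough illustration of the metadata described above, the following Python sketch creates a temporary file and asks the filesystem for its size, modification time, permissions, and owner. The filename example.txt is a placeholder chosen for this example, and the exact fields that carry meaningful values vary between operating systems and filesystems.

```python
import os
import stat
import tempfile
import time

# Work in a throwaway directory so the example does not touch real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # placeholder filename
    with open(path, "wb") as f:
        f.write(b"hello\n")  # 6 bytes of content

    info = os.stat(path)  # ask the filesystem for the file's metadata

    size_bytes = info.st_size                   # length of the file in bytes
    modified = time.ctime(info.st_mtime)        # date/time of last modification
    permissions = stat.filemode(info.st_mode)   # e.g. a string like -rw-r--r--
    owner_id = info.st_uid                      # numeric owner ID (0 on Windows)

    print("size:", size_bytes, "bytes")
    print("modified:", modified)
    print("permissions:", permissions)
    print("owner id:", owner_id)
```

Note that none of these values are stored inside the file's contents; they are tracked separately by the filesystem.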

A hierarchical filesystem introduces the concept of directories, which are special files that hold zero or more other files. The files in a directory may themselves be directories, enabling files to be nested into an arbitrary hierarchy. The top-level directory, which contains all of the others, is called the root directory.
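The nesting of directories can be seen in a small Python sketch. It builds a two-level hierarchy inside a temporary directory and then lists everything below it. The directory and file names (ops102, labs, lab1.txt) are placeholders invented for this example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp)

    # A directory (ops102) that holds another directory (labs) and a file.
    (top / "ops102" / "labs").mkdir(parents=True)
    (top / "ops102" / "labs" / "lab1.txt").write_text("lab notes\n")
    (top / "ops102" / "presentation.pdf").write_bytes(b"")

    # rglob("*") visits every file and directory below 'top', at any depth.
    entries = sorted(p.relative_to(top).as_posix() for p in top.rglob("*"))
    for entry in entries:
        print(entry)
```

Each printed line shows the chain of directories leading to an entry, which is the idea behind the pathnames described later on this page.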

When graphical user interfaces were developed, the metaphor of a traditional paper-based office was introduced, and directories were called folders in this metaphor (a folder in a traditional office is a piece of card stock folded in half to group together related papers). Therefore, the terms directory and folder are synonyms.

Filenames

The rules for valid filenames vary with the filesystem, but generally, filenames may include letters, numbers, dashes, underscores, and periods. Other punctuation marks may be acceptable in some filesystems but not in others, and are therefore best avoided, especially if files may be transferred between different types of filesystems or between computers.

Spaces may be included in filenames, but such names must be quoted when used on the command line, so that the shell does not interpret the filename as two or more separate filenames. For example, the filename "red leaf" is interpreted as two separate filenames ("red" and "leaf") if written without quoting:

 ls red leaf

When quotes are added, the ambiguity is removed, and the shell will correctly interpret the filename as a single name:

 ls "red leaf"

For this reason, it is good practice to avoid using spaces in filenames (underscores are a good alternative).
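The effect of quoting can be approximated with Python's shlex module, which splits a command line into words using rules similar to those of a Unix shell. This is only a sketch of the splitting step; a real shell does considerably more processing.

```python
import shlex

# Without quotes, the space separates "red" and "leaf" into two arguments.
unquoted = shlex.split('ls red leaf')

# With quotes, "red leaf" stays together as a single argument.
quoted = shlex.split('ls "red leaf"')

print(unquoted)  # ['ls', 'red', 'leaf']
print(quoted)    # ['ls', 'red leaf']
```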

Extensions

Many operating systems use an extension at the end of a filename to denote the type of data stored in the file. An extension consists of a period followed by one or more characters. For example, in the filename:

 ops102_project.pdf

the extension is "pdf", denoting a file in Portable Document Format.

It is unusual to use multiple extensions on a Windows system, but not uncommon on a Linux (or other Unix-like) system.

For example, on a Linux system, the filename

 backup.tar.gz

has two extensions: "tar", indicating an archive created with the tar command, and "gz", indicating that the file was compressed with the gzip command.
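Extensions can be examined programmatically. The sketch below uses Python's pathlib module on the two filenames from this section. Note that pathlib includes the leading period in each extension it reports.

```python
from pathlib import PurePosixPath

single = PurePosixPath("ops102_project.pdf")
double = PurePosixPath("backup.tar.gz")

print(single.suffix)    # .pdf
print(double.suffixes)  # ['.tar', '.gz']
print(double.suffix)    # .gz  (only the last extension)
```

The pure path classes only manipulate the text of the name, so this runs without either file existing.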

Case Sensitivity

Some filesystems are case-sensitive, and UPPER- and lower-case letters are considered to be different. For example, the filenames MILK.PDF, Milk.pdf, and milk.pdf refer to three different files. This is the case with most Linux (and Unix-like) filesystems.

Most Windows filesystems are not case-sensitive, so the filenames MILK.PDF, Milk.pdf, and milk.pdf would refer to the same file.
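Python's pathlib module models this difference in its two path flavours, which gives a quick way to see it without access to both kinds of system. This sketch only compares names as text; the behaviour of a real filesystem depends on how that filesystem is configured.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Linux-style comparison: case matters, so the names differ.
posix_same = PurePosixPath("MILK.PDF") == PurePosixPath("milk.pdf")

# Windows-style comparison: case is ignored, so the names match.
windows_same = PureWindowsPath("MILK.PDF") == PureWindowsPath("milk.pdf")

print(posix_same)    # False
print(windows_same)  # True
```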

Current Directory or Working Directory

Most operating systems have the concept of a "current directory" or "working directory", which allows a directory to be temporarily designated as the current working location. The working directory may be changed at any time.
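As a minimal sketch, a Python program can read and change its own working directory with os.getcwd() and os.chdir(). A temporary directory is used here so that the example does not depend on any particular directory existing.

```python
import os
import tempfile

original = os.getcwd()  # the working directory when the program started

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)  # designate the temporary directory as the working directory
    inside = os.getcwd()
    changed = os.path.samefile(inside, tmp)
    os.chdir(original)  # change back before the temporary directory is removed

restored = os.getcwd()
print("changed:", changed)
print("back in original directory:", restored == original)
```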

Pathnames

A pathname is a filename that includes information about the directory in which the file is stored. (Sometimes pathnames are simply called filenames.)

There are three types of pathnames:

Absolute Pathname

An absolute pathname starts with a slash (on Unix-based operating systems, such as Linux), or a backslash (on Windows), indicating the root directory. It contains the names of all of the directories from the root directory to the specified file, separated by slash/backslash characters.

For example, on a Linux system, the pathname

 /home/kim/ops102/presentation.pdf

indicates that the file presentation.pdf can be found by starting at the root directory, then traversing to a directory named "home" containing the directory "kim" containing the directory "ops102" containing the file ("presentation.pdf").

Similarly, the Windows pathname

 \Users\kim\ops102\presentation.pdf

indicates that the file presentation.pdf can be found by starting at the root directory, then traversing to a directory named "Users" containing the directory "kim" containing the directory "ops102" containing the file ("presentation.pdf").

Absolute pathnames can be readily identified by the fact that they start with the slash/backslash character. They are often the longest form of the pathname, but they are unambiguous.
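The traversal described above can be made visible by splitting each pathname into its components. This sketch uses Python's pathlib module, whose pure path classes work on the text of a pathname alone, so the files do not need to exist.

```python
from pathlib import PurePosixPath, PureWindowsPath

linux_path = PurePosixPath("/home/kim/ops102/presentation.pdf")
windows_path = PureWindowsPath(r"\Users\kim\ops102\presentation.pdf")

# The first component is the root directory; the rest are traversed in order.
print(linux_path.parts)
print(windows_path.parts)

# Each pathname starts at the root: a slash on Linux, a backslash on Windows.
print(linux_path.root)    # /
print(windows_path.root)  # \

# A pathname with no leading slash is not absolute.
print(linux_path.is_absolute())                                  # True
print(PurePosixPath("ops102/presentation.pdf").is_absolute())    # False
```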

Relative-to-Home Pathnames

On Linux (and other Unix-like operating systems), pathnames may be specified starting with the tilde ("~") character followed by a slash, which represents the current user's home directory. This is a directory assigned by the system administrator which contains all of the user's personal files. The home directory is usually (but not always) /home/username.

For example, if the current user's home directory is /home/kim, then the filename

 ~/ops102/presentation.pdf

corresponds to the absolute pathname

 /home/kim/ops102/presentation.pdf

For any file in the user's home directory, a relative-to-home pathname is generally shorter than an absolute pathname. However, a relative-to-home pathname will have a different meaning for other users, since each user has a unique home directory.

You can also specify the user whose home directory is to be used as the starting point, by placing a userid between the tilde and slash characters. Thus the pathname

 ~kim/ops102/presentation.pdf

is relative to the home directory of the user "kim", regardless of which user is currently logged in, while the pathname

 ~sam/ops102/presentation.pdf

is relative to the home directory of the user "sam".
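Tilde expansion is normally performed by the shell, but Python offers a similar function, which makes the substitution easy to observe. The sketch below temporarily sets the HOME environment variable to /home/kim, an assumption made purely so that the result matches the example above on any machine.

```python
import os
import posixpath

saved_home = os.environ.get("HOME")
os.environ["HOME"] = "/home/kim"  # assumed home directory, for the demo only
try:
    # "~/" is replaced by the current user's home directory.
    expanded = posixpath.expanduser("~/ops102/presentation.pdf")
    # A pathname that does not start with a tilde is left untouched.
    unchanged = posixpath.expanduser("presentation.pdf")
finally:
    # Restore the real environment.
    if saved_home is None:
        del os.environ["HOME"]
    else:
        os.environ["HOME"] = saved_home

print(expanded)   # /home/kim/ops102/presentation.pdf
print(unchanged)  # presentation.pdf
```

The ~kim and ~sam forms are looked up in the system's user database rather than in HOME, so their results depend on which accounts exist on the machine.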

Relative Pathnames

Any pathname that does not start with a slash/backslash or a tilde character is a relative pathname, which is interpreted as starting at the current directory.

If the current directory is /home/kim/ops102, then the Linux pathname

 presentation.pdf

is interpreted as

 /home/kim/ops102/presentation.pdf

The symbol ".." means the parent directory. Assuming the same current directory as above (/home/kim/ops102), the Linux pathname

 ../Downloads/example.txt

is interpreted as

 /home/kim/Downloads/example.txt

In the same way, the symbol "." is interpreted as referring to the current directory, so

 ./test.odt

is the same as

 test.odt

and both refer to

 /home/kim/ops102/test.odt

Likewise, if the current directory on a Windows system is \Users\kim, then the pathname

 ops102\presentation.pdf

refers to the absolute pathname

 \Users\kim\ops102\presentation.pdf

And, if the current directory is instead \Users\kim\ops102, the relative pathname

 ..\Downloads\example.txt

refers to the absolute pathname

 \Users\kim\Downloads\example.txt

Relative pathnames are often the shortest form of pathname, but the meaning of a relative pathname changes based on the current working directory.
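The interpretation of a relative pathname can be sketched as two steps: join it onto the current directory, then simplify away any "." and ".." components. The helper function resolve() below is a hypothetical name written for this example. It uses Python's posixpath and ntpath modules, which apply Linux and Windows pathname rules respectively, regardless of the system the code runs on. The current directory is passed in explicitly for each call.

```python
import ntpath
import posixpath

def resolve(current_dir, pathname, rules=posixpath):
    """Interpret 'pathname' relative to 'current_dir' (text manipulation only)."""
    return rules.normpath(rules.join(current_dir, pathname))

# Linux examples, with /home/kim/ops102 as the current directory.
linux_plain = resolve("/home/kim/ops102", "presentation.pdf")
linux_parent = resolve("/home/kim/ops102", "../Downloads/example.txt")
linux_dot = resolve("/home/kim/ops102", "./test.odt")

# Windows example, with \Users\kim\ops102 as the current directory.
windows_parent = resolve(r"\Users\kim\ops102", r"..\Downloads\example.txt", ntpath)

print(linux_plain)     # /home/kim/ops102/presentation.pdf
print(linux_parent)    # /home/kim/Downloads/example.txt
print(linux_dot)       # /home/kim/ops102/test.odt
print(windows_parent)  # \Users\kim\Downloads\example.txt
```

Because this works purely on text, it does not account for symbolic links, which can make ".." lead somewhere different on a real system.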

Volume Designators

On Windows systems, a volume designator consisting of a letter followed by a colon may prefix a pathname. The volume may be a partition on a disk drive (HDD or SSD), a network storage location, or a multi-drive volume, where multiple partitions or disks are combined into a single storage pool.

Since the original IBM PC was designed to have up to two floppy disk drives, designated A: and B:, the main/first disk drive in a Windows system is usually designated as volume C:.

Therefore, the \Windows folder on the main/first disk drive on a Windows system may be referred to as

 C:\Windows

The volume designator is case-insensitive.

Each unique volume on a Windows system has its own root directory and its own current working directory.
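The parts of a Windows pathname with a volume designator can be separated using Python's PureWindowsPath class, which applies Windows rules on any operating system. The last two lines show a consequence of each volume having its own working directory: a pathname such as C:Windows (with no backslash after the colon) is relative to the working directory on volume C: rather than to its root. This is offered as a sketch of the naming rules only.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r"C:\Windows")

print(path.drive)  # C:   (the volume designator)
print(path.root)   # \    (the root directory of that volume)

# The volume designator is compared without regard to case.
same_volume = PureWindowsPath(r"c:\Windows") == path
print(same_volume)  # True

# With no backslash after the colon, the pathname is relative to the
# working directory of volume C:, so it is not absolute.
drive_relative = PureWindowsPath("C:Windows")
print(drive_relative.is_absolute())  # False
```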

To switch between volumes, type the volume designator by itself:

 C:

or

 E:

On a Linux system, instead of using volume designators, volumes are mounted into the filesystem hierarchy -- that is, volumes are attached as directories, creating a unified hierarchy with a single root directory.
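To see where a volume is attached, a program can walk upward from any directory until it reaches a mount point. The function find_mount_point() below is a hypothetical helper written for this example, and it is intended for Linux and other Unix-like systems. The directory it reports depends entirely on how the machine's volumes are arranged.

```python
import os

def find_mount_point(path):
    """Walk up from 'path' to the directory where its volume is mounted."""
    path = os.path.realpath(path)  # resolve symbolic links first
    while not os.path.ismount(path):
        path = os.path.dirname(path)  # move up to the parent directory
    return path

mount_point = find_mount_point(os.getcwd())
print("The current directory is on the volume mounted at:", mount_point)
```

On a system with a single volume, the result is simply the root directory.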