dir_it

iterator to get all files in a directory

Abstract

The Standard C++ Library provides no way to access the directory structure of a computer, because some C++ target platforms have no notion of directories at all. Many important platforms do have directories, but their system interfaces differ considerably. This class provides a standard interface that can be extended to suit platform-specific needs, in particular access to file attributes.

Synopsis

#include <boost/directory.h>

std::string dirname(...);

boost::filesystem::dir_it begin(dirname);
boost::filesystem::dir_it end;
boost::filesystem::dir_it it(begin);

it = begin
*it
++it
*it++
it == end
it != end

prop::value_type v = boost::filesystem::get<prop>(it)
boost::filesystem::set<prop>(it, value)
    

Description

The class boost::filesystem::dir_it (dir_it for short) is an input iterator which iterates over the entries in a directory. A begin iterator is constructed from a valid directory name in the platform-specific notation; an end iterator is constructed using the default constructor of the class. The two functions boost::filesystem::get() and boost::filesystem::set() are used to access specific properties of a file. The exact list of available properties depends on the system. Below are a list of common properties and lists of properties supported on specific systems.

Since file properties differ between systems, an extensible interface was chosen so that different sets of properties can be accessed. Users can even add their own properties. To define a new file property, a struct is defined which gives the property its name and its type. Of course, the corresponding get() and/or set() functions must also be defined. Details are given below.

Basic Functionality

The main functionality of the class dir_it is to iterate over the entries in a directory. Here is an example of how the class can be used to print the files in a directory:
#include <iterator>
#include <iostream>
#include <algorithm>
#include <string>
#include <boost/directory.h>

int main(int ac, char *av[])
{
  if (ac == 2)
  {
    typedef boost::filesystem::dir_it          InIt;
    typedef std::ostream_iterator<std::string> OutIt;

    // copy the name of every entry in directory av[1] to std::cout, one per line
    std::copy(InIt(av[1]), InIt(), OutIt(std::cout, "\n"));
  }
  return 0;
}
    

Of course, the loop can also be written by hand, since dir_it is just an input iterator. Note that the post increment operator returns only a proxy object, which can be dereferenced (using operator*()) as the input iterator requirements demand. However, the proxy object cannot be used to access file attributes other than the name.
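Why a proxy rather than a copy of the iterator? Copies of a dir_it share one position, so a copy taken before advancing would be advanced as well and would no longer show the previous entry. The sketch below illustrates this with a hypothetical stand-in, name_it, which walks an in-memory list of names instead of a real directory stream; its post increment operator saves the current name in a proxy whose only operation is operator*(). It is an illustration under these assumptions, not the library's code.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for dir_it: all copies share one cursor over a list
// of names, just as copies of a dir_it share one directory stream.
class name_it {
    struct state {
        std::vector<std::string> names;
        std::size_t pos = 0;
    };
    std::shared_ptr<state> st_;   // empty for the past-the-end iterator

    bool at_end() const { return !st_ || st_->pos >= st_->names.size(); }

public:
    // Returned by post increment: holds a copy of the old name and
    // supports nothing but dereferencing.
    class proxy {
        std::string name_;
    public:
        explicit proxy(std::string n) : name_(std::move(n)) {}
        std::string const& operator*() const { return name_; }
    };

    name_it() = default;          // past-the-end iterator
    explicit name_it(std::vector<std::string> names)
        : st_(std::make_shared<state>()) {
        st_->names = std::move(names);
    }

    // Precondition: *this != name_it()
    std::string const& operator*() const { return st_->names[st_->pos]; }

    name_it& operator++() { ++st_->pos; return *this; }

    proxy operator++(int) {
        proxy old(**this);        // save the name before the shared cursor moves
        ++*this;
        return old;
    }

    // Equal when both are at the end or both are not at the end.
    friend bool operator==(name_it const& a, name_it const& b) {
        return a.at_end() == b.at_end();
    }
    friend bool operator!=(name_it const& a, name_it const& b) {
        return !(a == b);
    }
};
```

With this design, an expression such as std::string s = *it++; yields the entry that was current before the increment, even though every copy of the iterator has already moved on.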

dir_it Members

Lifecycle

Default Constructor
The default constructor is used to create the "past the end" iterator. This construction never fails, and the resulting iterator cannot be dereferenced.
Constructor taking a std::string
A std::string naming a directory can be used to construct a "begin" iterator. If the argument does not name an accessible directory, the resulting iterator compares equal to the past the end iterator constructed with the default constructor. On most systems this way of indicating failure is unambiguous, because even an empty directory has entries, e.g. on POSIX systems the directories "." (the directory itself) and ".." (the parent directory).
Copy Constructor
The copy constructor creates a new instance which is always positioned on the same current entry as the original dir_it instance. This means that advancing either the original or the newly created iterator advances both iterators. It is not possible to copy a dir_it in order to iterate over the same directory entries twice. To do this, two objects of type dir_it have to be constructed from the directory name.
Destructor
The destructor releases the resources associated with the dir_it. However, if the dir_it was copied, the associated system resources are released only when the last copy is destroyed, because all copies share the same system resources.
Assignment
The assigned dir_it is always positioned on the same entry as the original iterator. Thus, the same restrictions apply to the assigned iterator as to iterators created with the copy constructor.
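One way to obtain the sharing described above on a POSIX system is to let every copy hold a std::shared_ptr to a single DIR stream from opendir(3), with closedir(3) as the deleter, so the stream is closed when the last copy is destroyed. The sketch below assumes POSIX; dir_handle, open_dir_handle() and next_name() are hypothetical names, not part of the library.

```cpp
#include <dirent.h>
#include <memory>
#include <string>

// Hypothetical: every copy of an iterator would hold one of these,
// all pointing at the same DIR stream.
using dir_handle = std::shared_ptr<DIR>;

// Opens `name`. On failure returns an empty handle, which an iterator
// would treat as the "past the end" state.
inline dir_handle open_dir_handle(std::string const& name) {
    DIR* d = ::opendir(name.c_str());
    if (d == nullptr)
        return dir_handle();
    // closedir() runs when the last shared_ptr copy is destroyed.
    return dir_handle(d, [](DIR* p) { ::closedir(p); });
}

// Reads the next entry name through the shared stream; false at the end.
// Every holder of the same handle observes the stream advancing.
inline bool next_name(dir_handle const& h, std::string& out) {
    if (!h)
        return false;
    dirent* e = ::readdir(h.get());
    if (e == nullptr)
        return false;
    out = e->d_name;
    return true;
}
```

Because both handles read from one stream, reading through a copy continues where the original stopped, which is why copying a dir_it cannot be used to traverse a directory twice.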

Operations

Dereference (operator*())
Dereferencing a dir_it returns the name of the current directory entry as std::string. A dir_it can only be dereferenced if it does not compare equal to the past the end iterator.
Pre Increment (operator++())
The primary way to advance a dir_it is the pre increment operator. It moves the object to the next directory entry, if there is one. Otherwise, the dir_it compares equal to the past the end iterator after the pre increment. The pre increment operator returns the object itself.
Post Increment (operator++(int))
The post increment advances the dir_it to the next entry and returns a proxy object which can be dereferenced as if it were an object of type dir_it. However, nothing else can be done with this object. This way of advancing the iterator is normally less efficient, so the pre increment operator should be preferred where possible.
Equals Operator (operator==())
The equals operator determines whether two objects of type dir_it either both refer to a current directory entry or are both past the end iterators. Because every dir_it turns into a past the end iterator once all entries in its directory have been seen, comparing against a past the end iterator tests whether there are any more entries. However, the equals operator cannot determine whether two dir_it objects are positioned on the same directory entry; that can be done by comparing the results of the dereference operator.
Not Equal Operator (operator!=())
The not equal operator returns the exact negation of the equals operator. Thus, this operator returns true if one of the two iterators indicates a current directory entry while the other iterator is a past the end iterator.
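The operations above can be sketched on top of readdir(3). In this minimal POSIX sketch, the hypothetical class posix_dir_it keeps the DIR stream and the current name in shared state, and operator==() looks only at whether each iterator is at the end, so two iterators on different entries of an unfinished directory compare equal. Post increment is left out for brevity. It illustrates the described semantics under these assumptions and is not the library's implementation.

```cpp
#include <dirent.h>
#include <memory>
#include <string>

// Hypothetical POSIX sketch of the dir_it operations.
class posix_dir_it {
    struct state {
        DIR* dir = nullptr;
        std::string name;            // name of the current entry
        bool done = true;            // true once there are no more entries
        state() = default;
        state(state const&) = delete;
        state& operator=(state const&) = delete;
        ~state() { if (dir) ::closedir(dir); }
    };
    std::shared_ptr<state> st_;      // shared by all copies

    bool at_end() const { return !st_ || st_->done; }

    void advance() {
        if (at_end()) return;        // incrementing the end is a no-op here
        if (dirent* e = ::readdir(st_->dir))
            st_->name = e->d_name;
        else
            st_->done = true;        // readdir() reported the end
    }

public:
    posix_dir_it() = default;        // past-the-end iterator

    explicit posix_dir_it(std::string const& dirname)
        : st_(std::make_shared<state>()) {
        st_->dir = ::opendir(dirname.c_str());
        if (st_->dir) {              // an inaccessible directory stays "at end"
            st_->done = false;
            advance();               // position on the first entry
        }
    }

    // Precondition: *this != posix_dir_it()
    std::string const& operator*() const { return st_->name; }

    posix_dir_it& operator++() { advance(); return *this; }

    // Only "at end" versus "not at end" is compared.
    friend bool operator==(posix_dir_it const& a, posix_dir_it const& b) {
        return a.at_end() == b.at_end();
    }
    friend bool operator!=(posix_dir_it const& a, posix_dir_it const& b) {
        return !(a == b);
    }
};
```

A loop such as for (posix_dir_it it(name); it != posix_dir_it(); ++it) visits every entry, including "." and "..". To test whether two iterators sit on the same entry, compare *a == *b rather than a == b.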

File Properties

The functions get() and set() provide access to file properties. Here is an example that prints each file's size in addition to its name:
#include <iomanip>
#include <iostream>
#include <boost/directory.h>

int main(int ac, char *av[])
{
  if (ac == 2)
  {
    using namespace boost::filesystem;

    // print each entry's size, right-aligned in 10 columns, then its name
    for (dir_it it(av[1]); it != dir_it(); ++it)
      std::cout << std::setw(10) << get<size>(it)
                << " " << *it << "\n";
  }
  return 0;
}
    

Each property consists of two major components:

  • A struct which names the property and defines the type accessed through it. The type of the property is given by a typedef named value_type in the struct. For the standard properties, these structs are defined in the namespace boost::filesystem.
  • Access functions which are just specializations of the functions boost::filesystem::get() and boost::filesystem::set(). Of course, if the property can only be read or only be written, only the corresponding access function is defined.

Example Property

The size property used in the above example might be defined as follows:
namespace boost {
  namespace filesystem {
    struct size
    {
      typedef size_t value_type;
    };

    template <>
    size::value_type get<size>(dir_it const &it)
    {
      return ... /* environment specific code */
    }
  }
}
      

The properties already provided by the implementation normally read a data structure kept inside the dir_it object, to avoid repeated system calls.
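The following self-contained sketch shows the property pattern together with that caching idea, assuming POSIX stat(2). The namespace sketch, the stand-in struct entry (playing the role of the data a dir_it keeps for its current entry) and its stat_info() helper are hypothetical; only the shape of the property structs and the get<>() specializations follows the text above.

```cpp
#include <sys/stat.h>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical namespace, not boost::filesystem

// Hypothetical stand-in for the data a dir_it keeps about its current entry.
struct entry {
    std::string path;                  // full path of the current entry
    mutable bool have_stat = false;    // has stat(2) been called yet?
    mutable struct stat info;          // cached result of stat(2)

    // Calls stat(2) at most once; later property reads reuse the result.
    // On failure the cache is zero-filled (error handling kept minimal).
    struct stat const& stat_info() const {
        if (!have_stat) {
            if (::stat(path.c_str(), &info) != 0)
                info = {};
            have_stat = true;
        }
        return info;
    }
};

// Primary template: declared but not defined, so asking for an
// unsupported property is rejected at link time.
template <class Prop>
typename Prop::value_type get(entry const& e);

struct size         { typedef std::size_t value_type; };
struct is_directory { typedef bool value_type; };

template <>
inline size::value_type get<size>(entry const& e) {
    return static_cast<std::size_t>(e.stat_info().st_size);
}

template <>
inline is_directory::value_type get<is_directory>(entry const& e) {
    return S_ISDIR(e.stat_info().st_mode);
}

}  // namespace sketch
```

Reading size and then is_directory for the same entry issues a single stat(2) call; an iterator built this way would reset the cache whenever it advances to the next entry.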

Details

Property Selection
The file property to be accessed is selected using a template argument to the get() or set() function. The template argument is a type which defines value_type as a nested type. The get() and set() functions are specialized for the properties provided by the system. By specializing additional versions of these functions, the user may extend the set of accessible properties.
Property Type
The type of a file property is determined from a typedef called value_type in the type selecting the property.
Reading a Property
To read a file property, a dir_it is passed as the argument to the template function boost::filesystem::get(). The template argument prop, which selects the file property to be accessed, is specified explicitly. The return type of get() is prop::value_type.
Setting a Property
To set a file property, a dir_it and the new value of the property are passed to the template function boost::filesystem::set(). The template argument prop selecting the file property to be accessed is explicitly specified. The type of the second argument to the set() function is prop::value_type const &.
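As an illustration of extending the set of properties, here is a sketch of a user-defined read/write property, assuming POSIX stat(2) and chmod(2). The property name permissions, the namespace sketch and the stand-in struct entry are hypothetical, and the void return type of set() is an assumption, since the text above fixes only its argument types.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <string>

namespace sketch {  // hypothetical names throughout

// Stand-in for a dir_it positioned on an entry.
struct entry { std::string path; };

// Primary templates: get<prop>() returns prop::value_type and
// set<prop>() takes prop::value_type const &. The void return is assumed.
template <class Prop>
typename Prop::value_type get(entry const& e);

template <class Prop>
void set(entry const& e, typename Prop::value_type const& value);

// A user-defined, read/write property: the POSIX permission bits.
struct permissions { typedef mode_t value_type; };

template <>
inline permissions::value_type get<permissions>(entry const& e) {
    struct stat st;
    if (::stat(e.path.c_str(), &st) != 0)
        return 0;                            // minimal error handling
    return st.st_mode & 07777;               // keep only the permission bits
}

template <>
inline void set<permissions>(entry const& e, permissions::value_type const& value) {
    (void)::chmod(e.path.c_str(), value);    // result ignored in this sketch
}

}  // namespace sketch
```

Because the primary templates are only declared, a combination the platform does not provide, such as set<> for a read only property, is reported by the linker instead of failing silently at run time.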

Standard Properties

The organization of files differs heavily between systems. As a result, the sets of file properties defined on different systems vary. The property interface was chosen so that it is obvious how a specific property is accessed, leaving only the names and the exact types to be determined for each system. To enhance portability, some common file properties are always defined:
is_directory
A boolean read only property which can be used to determine whether a directory entry is itself a directory.
is_hidden
A boolean property indicating whether the file is "hidden". By default, hidden files are not shown to the user, although appropriate options may show them anyway. On some systems, a special file flag marks a file as hidden; on such systems this is a read/write property. On other systems, e.g. POSIX systems, files whose names start with a dot (".") are considered hidden; on such systems this is a read only property.
size
A read only property of type size_t returning the size in bytes of a file. Note that the size returned is not necessarily identical to the number of characters retrieved from an ifstream created for this file: In text mode, some character sequences are replaced by single characters during reading. However, the number of characters in binary mode should normally match the size of the file.
mtime
A property of type time_t returning the last modification time of the file. It is read only in general, but on some systems, e.g. POSIX, it can also be written to set the modification time to an arbitrary value.
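On a POSIX system, is_hidden and mtime could be provided roughly as follows. This is a sketch under stated assumptions: the namespace sketch and the stand-in struct entry are hypothetical, is_hidden follows the leading-dot convention and is therefore read only, and mtime is written with utime(2), which also sets the access time.

```cpp
#include <sys/stat.h>
#include <utime.h>
#include <ctime>
#include <string>

namespace sketch {  // hypothetical names throughout

// Stand-in for a dir_it: the directory being read and the current entry name.
struct entry {
    std::string dir;
    std::string name;                      // what operator*() would return
    std::string path() const { return dir + "/" + name; }
};

template <class Prop> typename Prop::value_type get(entry const& e);
template <class Prop> void set(entry const& e, typename Prop::value_type const& v);

struct is_hidden { typedef bool value_type; };
struct mtime     { typedef std::time_t value_type; };

// POSIX convention: a leading dot hides the entry. No set<is_hidden>
// is provided, so the property is read only here.
template <>
inline is_hidden::value_type get<is_hidden>(entry const& e) {
    return !e.name.empty() && e.name[0] == '.';
}

// Returns (time_t)-1 if stat(2) fails.
template <>
inline mtime::value_type get<mtime>(entry const& e) {
    struct stat st;
    return ::stat(e.path().c_str(), &st) == 0 ? st.st_mtime : std::time_t(-1);
}

// Writable on POSIX via utime(2); the access time is set to t as well.
template <>
inline void set<mtime>(entry const& e, mtime::value_type const& t) {
    struct utimbuf times;
    times.actime = t;
    times.modtime = t;
    (void)::utime(e.path().c_str(), &times);   // result ignored in this sketch
}

}  // namespace sketch
```

Under this rule "." and ".." also count as hidden, which matches the usual POSIX practice of omitting them from plain listings.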

POSIX Properties

WinNT Properties

Future Directions

In computer systems there are structures other than the system's directories which can also be viewed as directories. An obvious example is archive files such as ZIP or tar files, which store copies of directory hierarchies. It might be useful to extend the class dir_it to treat such structures as directories too and to support iterating over them.

A potential approach might be to define a CORBA interface which the class dir_it uses internally to determine directory entries and to find out whether an entry is itself a directory. This way, it would even be possible to extend what is considered a directory and have the same class iterate over very different structures.

Whether this approach is reasonable will have to be evaluated in the future. Personally, I think this is an interesting direction, and I hope to find time to try it in the near future.

See Also

POSIX: opendir(3), readdir(3), closedir(3), stat(2)
Standard Template Library: Input Iterator Requirements
Dietmar Kühl <dietmar.kuehl@claas-solutions.de>