The spaceland package

The spaceland package, named after the three-dimensional world in Edwin Abbott’s book Flatland: A Romance of Many Dimensions, contains everything required to read ESRI shapefiles. It’s broken down into several core modules:

The spaceland.shp module

Read non-topological geometric records from the ESRI Shapefile format.

ESRI documented the Shapefile format in 1998 in a document titled ESRI Shapefile Technical Description.

class spaceland.shp.Shapefile(shp: typing.IO[bytes]) → None

Read records from an ESRI shapefile.

A shapefile is a binary format created by ESRI in the early 1990s for storing non-topological geometries. After a short header containing file metadata, the geometries are stored in a sequence of individual records. The format is compact and fast to read, but because it can’t contain indexes, details of the projection used, or metadata on individual shapes, it’s commonly accompanied by other files (e.g. a dBase III database for geometry metadata).

Class objects allow for iteration and can be used as context managers.
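The header-then-records layout described above can be illustrated with a minimal sketch that uses only the standard library. It is not the package's implementation: the name iter_raw_records is hypothetical, and the byte offsets are taken from the ESRI Shapefile Technical Description (a 100-byte file header, then records that each start with an 8-byte big-endian header).

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

HEADER_SIZE = 100  # the main file header is a fixed 100 bytes


def iter_raw_records(shp: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (shape_type, payload) pairs in file order (hypothetical helper)."""
    shp.seek(HEADER_SIZE)
    while True:
        record_header = shp.read(8)
        if len(record_header) < 8:
            return  # end of file
        # Record number and content length are big-endian 32-bit integers;
        # the length is counted in 16-bit words, not bytes.
        _number, length_words = struct.unpack(">2i", record_header)
        content = shp.read(length_words * 2)
        # The content begins with a little-endian shape type code.
        (shape_type,) = struct.unpack("<i", content[:4])
        yield shape_type, content[4:]


# An in-memory file holding a blank header and a single point record.
sample = io.BytesIO(
    bytes(HEADER_SIZE)               # placeholder header, all zeros
    + struct.pack(">2i", 1, 10)      # record 1, 10 words = 20 bytes
    + struct.pack("<i", 1)           # shape type 1 (point)
    + struct.pack("<2d", 1.0, 2.0)   # x, y
)
records = list(iter_raw_records(sample))
```

The sketch does no validation; a truncated record would raise struct.error rather than being reported cleanly.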

get_parse_function()

Return a function capable of parsing a particular type of shape.

The function returned will be suitable for parsing shapefile records of one type (e.g. two-dimensional points). The type is defined in the header of the shapefile, and so the returned function will handle all non-null records within a single shapefile.

Return type:Callable[[bytes], tuple]
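One plausible way to pick a parser from the header's shape type is a lookup table, sketched below. The names PARSERS and parser_for are hypothetical, only two shape types are covered, and the package may choose its parser differently.

```python
import struct
from typing import Callable, Dict

# Shape type codes from the ESRI specification; only two are shown.
NULL_SHAPE = 0
POINT = 1


def parse_null(content: bytes) -> tuple:
    """A null shape carries no geometry."""
    return ()


def parse_point(content: bytes) -> tuple:
    """Two little-endian doubles, ordered x, y."""
    return struct.unpack("<2d", content)


# Hypothetical dispatch table: shape type code -> parsing function.
PARSERS: Dict[int, Callable[[bytes], tuple]] = {
    NULL_SHAPE: parse_null,
    POINT: parse_point,
}


def parser_for(shape_type: int) -> Callable[[bytes], tuple]:
    """Look up the parser for the shape type named in the file header."""
    try:
        return PARSERS[shape_type]
    except KeyError:
        raise ValueError(f"unsupported shape type: {shape_type}") from None
```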
records()

Yield all geometric records in the shapefile, one-by-one.

Records are yielded in file order, each as a tuple whose structure depends on the shape type. The structure of each shape type’s tuple is detailed in the shape parsing functions (e.g. parse_null_record() and parse_point_record()).

The appropriate parsing function for a file can be found using Shapefile.get_parse_function().

Return type:Iterable[tuple]
class spaceland.shp.ShapefileMeta(shape_type, x_min, y_min, x_max, y_max, z_min, z_max, m_min, m_max)
m_max

Alias for field number 8

m_min

Alias for field number 7

shape_type

Alias for field number 0

x_max

Alias for field number 3

x_min

Alias for field number 1

y_max

Alias for field number 4

y_min

Alias for field number 2

z_max

Alias for field number 6

z_min

Alias for field number 5
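The field order above matches the order in which the ESRI specification stores the shape type and bounding box in the file header. As a rough sketch, the named tuple is re-created here as a stand-in so the example is self-contained, and read_meta is a hypothetical helper, not part of the package.

```python
import struct
from collections import namedtuple

# Stand-in with the same field names and order as documented above.
ShapefileMeta = namedtuple(
    "ShapefileMeta",
    "shape_type x_min y_min x_max y_max z_min z_max m_min m_max",
)


def read_meta(header: bytes) -> ShapefileMeta:
    """Build a ShapefileMeta from the 100-byte main file header."""
    # Bytes 32-35 hold the shape type and bytes 36-99 hold eight doubles
    # (the bounding box); all nine values are little-endian.
    return ShapefileMeta._make(struct.unpack_from("<i8d", header, 32))


# A fabricated header: 32 ignored bytes, then the shape type and bounds.
header = bytes(32) + struct.pack("<i8d", 1, -1.0, -2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0)
meta = read_meta(header)
```

Because it is a named tuple, each value can be read by name (meta.x_max) or by position (meta[3]).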

spaceland.shp.parse_null_record(content)

Parse a null shape record from a shapefile.

A null shape is an empty record with no geometric data. It can be used as a shape type for a shapefile but it’s also valid as a placeholder in a shapefile of any other type. That is, a shapefile of polygons can also include null shape records. This is the only valid way a shapefile can contain multiple shape types.

Parameters:content (bytes) – An empty byte string
Return type:tuple
Returns:An empty tuple.
spaceland.shp.parse_point_record(content)

Parse a point shape record from a shapefile.

A point consists of a pair of double-precision coordinates ordered x, y.

Parameters:content (bytes) – 16 bytes containing two 64-bit IEEE double-precision floating-point numbers, in little-endian byte order.
Return type:tuple
Returns:A tuple containing a point in x, y order.
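The byte layout described above maps directly onto the standard struct module. The following is an illustrative sketch rather than the package's code; point_from_bytes is a hypothetical name, and the length check is an added assumption.

```python
import struct


def point_from_bytes(content: bytes) -> tuple:
    """Decode 16 bytes into an (x, y) tuple of floats."""
    if len(content) != 16:
        raise ValueError(f"expected 16 bytes, got {len(content)}")
    # "<" selects little-endian byte order; "2d" reads two 64-bit doubles.
    return struct.unpack("<2d", content)


raw = struct.pack("<2d", -0.1278, 51.5074)  # x (longitude), y (latitude)
point = point_from_bytes(raw)
```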

The spaceland.dbf module

Reads the subset of the dBase III file format used by ESRI shapefiles.

The dBase III format was never specified publicly but it has been reverse-engineered. The best documentation on the subject can be found at http://www.clicketyclick.dk/databases/xbase/format/dbf.html.

class spaceland.dbf.DbaseFile(dbf: typing.IO[bytes], encoding: str = 'ascii') → None

Read fields and records from a dBase III binary file.

A dBase III file is a simple tabular data format consisting of a header, fields (columns), and records (rows). Fields are typed; as used in the ESRI shapefile format, each field in a dBase III file must have one of five types: string, float, integer, date, or boolean. All types allow null values.

Class objects allow for iteration and slicing, and they also work as context managers.

record(index)

Return the record at the given index.

Parameters:index (int) – The position of the record relative to the beginning of the file.
Return type:tuple
Returns:A namedtuple, each item matching one field in the record.
records(start=0)

Yield the records in the file.

A record is a set of fields and their values. The field names, types, and order are consistent across all records in the file.

It’s possible that a field has an invalid value (e.g. a non-numeric value in an integer field). When this happens the value becomes None and no error is raised.

Parameters:start (int) – The record from which to start iteration. By default starts with the first record in the file.
Yields:A namedtuple, each item matching one field in the record. Item names and order are consistent across records within the same file, but will differ between files.
Return type:Iterable[tuple]
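The behaviour described above (one namedtuple per record, with invalid values becoming None) could be sketched as follows. The helper names, the field names, and the sample bytes are all hypothetical; the package's own internals may differ.

```python
from collections import namedtuple
from typing import Callable, Optional, Sequence


def int_or_none(raw: bytes) -> Optional[int]:
    """Return an int, or None when the bytes are not a valid integer."""
    try:
        return int(raw.decode("ascii"))
    except ValueError:  # also covers UnicodeDecodeError
        return None


def text(raw: bytes) -> str:
    """Decode a space-padded string field."""
    return raw.decode("ascii").rstrip()


def make_row_parser(names: Sequence[str], parsers: Sequence[Callable]):
    """Build a function that turns raw field values into a namedtuple."""
    Row = namedtuple("Row", names)

    def parse_row(raw_values: Sequence[bytes]):
        return Row._make(parse(raw) for parse, raw in zip(parsers, raw_values))

    return parse_row


parse_row = make_row_parser(["name", "population"], [text, int_or_none])
good = parse_row([b"Oslo      ", b"   709037"])
bad = parse_row([b"Atlantis  ", b"  unknown"])
```

Here good.population is an integer, while bad.population is None because the raw value is not numeric.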
spaceland.dbf.get_parse_str(encoding)

Return a function that decodes bytes to strings.

The returned function decodes the bytes using the character encoding passed to this function.

>>> utf8 = get_parse_str("UTF-8")
>>> utf8(b'\xf0\x9f\x91\x8d')
'👍'
Parameters:encoding (str) – The name of a character encoding that can be used to decode the bytes to a string.
Return type:Callable[[bytes], str]
Returns:A function that uses the given character encoding to convert bytes to strings.
spaceland.dbf.parse_bool(value)

Convert bytes to a boolean value.

Parameters:value (bytes) – A bytes value to be converted to a boolean value.
Return type:Optional[bool]
Returns:True if the bytes value is Y, y, T, or t; False if the bytes value is N, n, F, or f; None otherwise.
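The mapping in the Returns field is small enough to restate as code. This is a sketch of the documented rule only; bool_from_bytes is a hypothetical name, not the package's function.

```python
from typing import Optional

TRUE_VALUES = (b"Y", b"y", b"T", b"t")
FALSE_VALUES = (b"N", b"n", b"F", b"f")


def bool_from_bytes(value: bytes) -> Optional[bool]:
    """Map a one-byte dBase logical value to True, False, or None."""
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    return None  # anything else is treated as a null value
```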
spaceland.dbf.parse_date(value)

Convert bytes in the format YYYYMMDD to a datetime.date object.

Parameters:value (bytes) – A bytes value to be converted to a date.
Return type:Optional[date]
Returns:A datetime.date object if the bytes value is a valid date, but None otherwise.
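A conversion with the documented behaviour might look like the sketch below. The name date_from_bytes is hypothetical, and the strict eight-digit check is an assumption about what counts as a valid date.

```python
from datetime import date
from typing import Optional


def date_from_bytes(value: bytes) -> Optional[date]:
    """Convert b'YYYYMMDD' to a date, or return None if it is not valid."""
    # Require exactly eight ASCII digits before attempting to build a date.
    if len(value) != 8 or not value.isdigit():
        return None
    text = value.decode("ascii")
    try:
        return date(int(text[:4]), int(text[4:6]), int(text[6:]))
    except ValueError:  # e.g. month 13, or 30 February
        return None
```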
spaceland.dbf.parse_float(value)

Convert bytes to a float.

Parameters:value (bytes) – A bytes value to be converted to a float.
Return type:Optional[float]
Returns:A float if the bytes value is a valid numeric value, but None otherwise.
spaceland.dbf.parse_int(value)

Convert bytes to an integer.

Parameters:value (bytes) – A bytes value to be converted to an integer.
Return type:Optional[int]
Returns:An integer if the bytes value is a valid numeric value, but None otherwise.
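The two numeric converters above share one pattern: attempt the conversion and fall back to None. A minimal sketch, using hypothetical names and assuming the field bytes are ASCII text padded with spaces:

```python
from typing import Optional


def float_from_bytes(value: bytes) -> Optional[float]:
    """Return a float, or None if the bytes are not numeric."""
    try:
        # float() ignores leading and trailing whitespace.
        return float(value.decode("ascii"))
    except ValueError:  # also covers UnicodeDecodeError
        return None


def int_from_bytes(value: bytes) -> Optional[int]:
    """Return an int, or None if the bytes are not a whole number."""
    try:
        # int() ignores surrounding whitespace but rejects "12.5".
        return int(value.decode("ascii"))
    except ValueError:
        return None
```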

The spaceland.cli module

Command-line interface to the library’s functionality.

This module provides the following functions that are registered as ‘console script’ entry points in setup.py:

  • dbf_to_csv(): convert dBase III files to CSVs (as command dbfr)

When the package is installed via setuptools (e.g. using pip install) the commands are immediately available to the user.

spaceland.cli.dbf_to_csv()

Read a dBase III file and convert it to a CSV.

Used as a ‘console script’ entry point in setup.py and available on the command-line as dbfr. The dBase III file named as an argument is parsed and converted to CSV, and output to stdout. The CSV dialect used can be configured using command-line options, as can the character-encoding used when reading the dBase file.

Return type:None
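The CSV-writing half of this command can be approximated with the standard csv module. The sketch below is not the command's implementation: write_csv and its delimiter parameter are hypothetical, and it writes to an in-memory buffer where the real command writes to stdout.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO


def write_csv(
    header: Sequence[str],
    rows: Iterable[Sequence],
    out: TextIO,
    delimiter: str = ",",
) -> None:
    """Write a header row followed by the records as CSV."""
    writer = csv.writer(out, delimiter=delimiter)
    writer.writerow(header)
    writer.writerows(rows)  # None values are written as empty fields


buffer = io.StringIO()  # pass sys.stdout instead to print to the terminal
write_csv(
    ["name", "population"],
    [("Oslo", 709037), ("Atlantis", None)],
    buffer,
    delimiter=";",
)
```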
spaceland.cli.extant_file(arg)

Type-check an argument to ensure it names an existing file.

Return type:Path
spaceland.cli.single_char(arg)

Type-check an argument to ensure it’s a string of length one.

Return type:str
spaceland.cli.valid_codec(arg)

Type-check an argument to ensure it names a known codec.

Return type:str
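Type-checking functions like the three above are typically passed to argparse through its type parameter. The following sketch re-implements them under that assumption; the bodies are guesses at the behaviour described, and the option names --delimiter and --encoding are hypothetical rather than the documented options of dbfr.

```python
import argparse
import codecs
from pathlib import Path


def extant_file(arg: str) -> Path:
    """Accept the argument only if it names an existing file."""
    path = Path(arg)
    if not path.is_file():
        raise argparse.ArgumentTypeError(f"{arg!r} is not an existing file")
    return path


def single_char(arg: str) -> str:
    """Accept the argument only if it is exactly one character long."""
    if len(arg) != 1:
        raise argparse.ArgumentTypeError(f"{arg!r} is not a single character")
    return arg


def valid_codec(arg: str) -> str:
    """Accept the argument only if Python knows a codec by that name."""
    try:
        codecs.lookup(arg)
    except LookupError:
        raise argparse.ArgumentTypeError(f"{arg!r} is not a known codec") from None
    return arg


# Hypothetical option names, for illustration only.
parser = argparse.ArgumentParser(prog="dbfr")
parser.add_argument("--delimiter", type=single_char, default=",")
parser.add_argument("--encoding", type=valid_codec, default="ascii")
args = parser.parse_args(["--delimiter", ";", "--encoding", "utf-8"])
```

When a type function raises ArgumentTypeError, argparse reports the message as a usage error instead of showing a traceback.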