Configuration Reference

datafaker is configured with a YAML file, which is passed to several commands through the --config-file option. Throughout these docs we refer to this file as config.yaml, but it can have any name, with one exception: a naming conflict arises if you have a vocabulary table called config.

You can generate an example configuration file, based on your source database, like this:

datafaker generate-config

The generated file contains only default values, so you can safely delete any parts of it that you don't need.

The schema for the configuration file is shown below. Note that the config file format includes a section of SmartNoise SQL metadata, which is described more fully in the SmartNoise SQL documentation (https://docs.smartnoise.org/sql/metadata.html#yaml-format).

datafaker Config

Type: object

A datafaker configuration YAML file

No Additional Properties

Type: boolean

Run source-statistics queries using asyncpg.

Type: string

The name of a local Python module of row generators (excluding .py).
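A row generators module is an ordinary Python file next to your config. The sketch below shows what a hypothetical myrowgenerators.py containing the my_gen function mentioned later in this reference might look like. It assumes datafaker calls the function with the positional and keyword arguments listed in the config and stores the return value. Any arguments datafaker passes implicitly are not modelled here, and the prefix/digits/rng parameters are illustrative.

```python
"""myrowgenerators.py: a hypothetical module of custom row generators.

Assumption: each function is called with the args/kwargs given in the
config file, and its return value is written to the assigned column(s).
"""
import random
from typing import Optional


def my_gen(prefix: str, digits: int = 4, rng: Optional[random.Random] = None) -> str:
    """Return an identifier such as 'WARD-0042' (prefix plus a zero-padded number)."""
    rng = rng or random.Random()
    number = rng.randrange(10 ** digits)
    return f"{prefix}-{number:0{digits}d}"


if __name__ == "__main__":
    # A fixed seed makes the output repeatable while experimenting.
    print(my_gen("WARD", digits=4, rng=random.Random(0)))
```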

Type: string

The name of a local Python module of story generators (excluding .py).

Type: object

Objects that need to be instantiated from the row generator and story generator modules.

Type: array

An array of source statistics queries.

No Additional Items

Each item of this array must be:

Type: object
No Additional Properties

Type: string

A name for the query, which will be used in the stats file.

Type: array of string

Comments describing the query results, to be copied into the src-stats.yaml file.

No Additional Items

Each item of this array must be:

Type: string

A SQL query.
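A source statistics query is ordinary SQL, usually an aggregate, whose result rows are saved under the query's name in the stats file. The sketch below imitates that flow with Python's built-in sqlite3 and a made-up person table. datafaker itself runs the query against your source database, and the layout shown for the stats (the query name mapped to a list of row dicts) is an assumption for illustration.

```python
import sqlite3

# A tiny stand-in for the source database (table and data are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, sex TEXT)")
conn.executemany(
    "INSERT INTO person (sex) VALUES (?)",
    [("F",), ("M",), ("F",), ("F",), ("M",)],
)

# A typical source-statistics query: aggregated, so no individual rows leave.
query = "SELECT sex, COUNT(*) AS n FROM person GROUP BY sex ORDER BY sex"

cursor = conn.execute(query)
columns = [d[0] for d in cursor.description]
# Hypothetical stats layout: the query's name maps to its result rows.
src_stats = {"count_by_sex": [dict(zip(columns, row)) for row in cursor.fetchall()]}
print(src_stats)
# {'count_by_sex': [{'sex': 'F', 'n': 3}, {'sex': 'M', 'n': 2}]}
```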

Type: string

A SmartNoise SQL query.

Type: number

The differential privacy epsilon value for the DP query.

Type: number

The differential privacy delta value for the DP query.
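Epsilon and delta set the privacy budget for a DP query: a smaller epsilon gives stronger privacy but noisier answers. SmartNoise SQL chooses its own mechanisms (and uses delta for some of them), so the sketch below only illustrates the trade-off. It uses the textbook Laplace mechanism for a count, whose noise scale is sensitivity / epsilon. laplace_count is a hypothetical helper, not part of datafaker or SmartNoise.

```python
import random


def laplace_count(true_count: float, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two independent Exp(rate=1/scale) draws is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


rng = random.Random(42)
for eps in (0.1, 1.0, 10.0):
    samples = [laplace_count(100, eps, rng) for _ in range(10_000)]
    spread = sum(abs(s - 100) for s in samples) / len(samples)
    # The mean absolute error of Laplace(0, b) is b, so expect about 10, 1 and 0.1.
    print(f"epsilon={eps}: mean absolute error ~ {spread:.2f}")
```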

Type: object

SmartNoise SQL metadata for the DP query. See https://docs.smartnoise.org/sql/metadata.html#yaml-format.

All properties whose name matches the following regular expression must respect the following conditions

Property name regular expression: ^(?!(max_ids|row_privacy|sample_max_ids|censor_dims|clamp_counts|clamp_columns|use_dpsu)).*$
Type: object
No Additional Properties
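The pattern above uses a negative lookahead. It accepts every property name that does not start with one of the listed SmartNoise option names, and those reserved options are handled separately. Because the lookahead checks only a prefix, a name such as row_privacy_note is also excluded. Python's re module shows the behaviour; the example property names are made up.

```python
import re

# The schema's property-name pattern, split over two string literals.
PATTERN = re.compile(
    r"^(?!(max_ids|row_privacy|sample_max_ids|censor_dims"
    r"|clamp_counts|clamp_columns|use_dpsu)).*$"
)

for name in ["person_id", "sex", "max_ids", "use_dpsu", "row_privacy_note"]:
    verdict = "matches" if PATTERN.match(name) else "excluded"
    print(f"{name!r}: {verdict}")
# 'person_id' and 'sex' match; the other three are excluded.
```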

Type: array of object

An array of story generators.

No Additional Items

Each item of this array must be:

Type: object
No Additional Properties

Type: string

The full name of a story generator (e.g. mystorygenerators.short_story).

Type: array

Positional arguments to pass to the story generator.

No Additional Items

Type: object

Keyword arguments to pass to the story generator.

Type: integer

The number of times to call the story generator per pass.
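A story generator produces several related rows together, for example a patient followed by that patient's visits. The sketch assumes a protocol in which the story generator is a Python generator function that yields (table name, column values) pairs and is sent back each row as stored, so later rows can refer to earlier keys. Treat that protocol, the table names and run_pass as hypothetical. run_pass merely stands in for datafaker to show the story being repeated the configured number of times per pass.

```python
"""mystorygenerators.py: a hypothetical story generator plus a stand-in driver."""
import itertools
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, Generator, Iterator, List, Tuple

Row = Dict[str, Any]
Story = Generator[Tuple[str, Row], Row, None]


def short_story(num_visits: int = 2) -> Story:
    """Yield one patient row, then visit rows that point back to it."""
    # Assumed protocol: yield (table_name, values); the stored row is sent back.
    patient = yield "patient", {"name": "Ann Example"}
    for day in range(1, num_visits + 1):
        yield "visit", {"patient_id": patient["id"], "day": day}


def run_pass(story_fn: Callable[..., Story], num_stories_per_pass: int,
             kwargs: Row) -> List[Tuple[str, Row]]:
    """Stand-in for datafaker: run the story N times and 'insert' each row."""
    next_ids: DefaultDict[str, Iterator[int]] = defaultdict(lambda: itertools.count(1))
    written: List[Tuple[str, Row]] = []
    for _ in range(num_stories_per_pass):
        story = story_fn(**kwargs)
        try:
            table, values = next(story)
            while True:
                # Pretend the database assigned a per-table primary key on insert.
                row = {"id": next(next_ids[table]), **values}
                written.append((table, row))
                table, values = story.send(row)
        except StopIteration:
            pass  # this story has finished
    return written


for table, row in run_pass(short_story, num_stories_per_pass=2, kwargs={"num_visits": 1}):
    print(table, row)
```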

Type: integer

The maximum number of attempts to generate a value that satisfies a uniqueness constraint.
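When a column has a uniqueness constraint, a randomly generated value can collide with one already used, so generation is retried up to a limit. A minimal sketch of such a loop follows. generate_unique is a hypothetical helper rather than datafaker's code, and its max_tries plays the role of this setting.

```python
import random
from typing import Callable, Hashable, Set, TypeVar

T = TypeVar("T", bound=Hashable)


def generate_unique(make_value: Callable[[], T], seen: Set[T], max_tries: int) -> T:
    """Call make_value until it returns an unused value, at most max_tries times."""
    for _ in range(max_tries):
        value = make_value()
        if value not in seen:
            seen.add(value)
            return value
    raise RuntimeError(f"no unique value found after {max_tries} tries")


rng = random.Random(1)
used: Set[int] = set()
# Only 5 possible codes: five requests can succeed, a sixth must give up.
codes = [generate_unique(lambda: rng.randint(1, 5), used, max_tries=200) for _ in range(5)]
print(sorted(codes))  # [1, 2, 3, 4, 5]
try:
    generate_unique(lambda: rng.randint(1, 5), used, max_tries=200)
except RuntimeError as err:
    print(err)
```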

Type: object

Table configurations.

All properties whose name matches the following regular expression must respect the following conditions

Property name regular expression: .*
Type: object

A table configuration.

No Additional Properties

Type: boolean

Whether to completely ignore this table.

Type: boolean

Whether to export the table data.

Type: boolean

Whether the table is a Primary Private table (for example, a table of patients).

Type: integer

The number of rows to generate per pass.

Type: array of object

An array of row generators to create column values.

No Additional Items

Each item of this array must be:

Type: object

Type: string

The name of a (built-in or custom) function (e.g. max or myrowgenerators.my_gen).

Type: array

Positional arguments to pass to the function.

No Additional Items

Type: object

Keyword arguments to pass to the function.

Type: array of string or string

One or more columns to assign the return value to.

No Additional Items

Each item of this array must be:

Type: string
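Each row generator entry pairs a function name with its arguments and the column or columns that receive the result. The hypothetical apply_row_generator below sketches how such an entry could be evaluated. A bare name is looked up among Python's built-ins, and a dotted name is imported. When several columns are listed, the return value is unpacked across them. The dictionary keys used here (name, args, kwargs, columns) are illustrative stand-ins, not necessarily datafaker's property names.

```python
import builtins
import importlib
from typing import Any, Callable, Dict, List, Union


def resolve(name: str) -> Callable[..., Any]:
    """Find a built-in (e.g. 'max') or a dotted module attribute (e.g. 'math.sqrt')."""
    if "." not in name:
        return getattr(builtins, name)
    module_name, _, attr = name.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def apply_row_generator(spec: Dict[str, Any], row: Dict[str, Any]) -> None:
    """Call the named function and store its result in the assigned column(s)."""
    func = resolve(spec["name"])
    result = func(*spec.get("args", []), **spec.get("kwargs", {}))
    columns: Union[str, List[str]] = spec["columns"]
    if isinstance(columns, str):
        row[columns] = result  # one column: store the value as-is
        return
    # Several columns: the function must return exactly one value per column.
    if len(result) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(result)}")
    row.update(zip(columns, result))


row: Dict[str, Any] = {}
apply_row_generator({"name": "max", "args": [3, 7], "columns": "score"}, row)
apply_row_generator({"name": "divmod", "args": [17, 5], "columns": ["weeks", "days"]}, row)
apply_row_generator({"name": "math.sqrt", "args": [16], "columns": "root"}, row)
print(row)  # {'score': 7, 'weeks': 3, 'days': 2, 'root': 4.0}
```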

Type: array of object

Functions that generate the set of nullable columns that should not be null.

No Additional Items

Each item of this array must be:

Type: object

Type: string

The name of a (built-in or custom) function (e.g. column_presence.sampled).

Type: object

Keyword arguments to pass to the function.

Type: array of string

Column names that might be returned.

No Additional Items

Each item of this array must be:

Type: string

Column that provides a name for each row in the table. Used to make foreign keys to this table more readable in the src-stats.yaml file.
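The missingness functions described above (for example column_presence.sampled) decide which nullable columns in a generated row receive a value. A minimal sketch of the idea follows. It assumes a function that draws one set of non-null columns, weighted by how often each combination was observed, and then sets every other nullable column to null. The helper names, the frequency table and its counts are hypothetical, and the sketch does not claim to reproduce what column_presence.sampled actually does.

```python
import random
from typing import Any, Dict, FrozenSet, List, Set, Tuple


def sample_present_columns(patterns: List[Tuple[FrozenSet[str], int]],
                           rng: random.Random) -> FrozenSet[str]:
    """Pick one combination of non-null columns, weighted by its observed count."""
    combos = [combo for combo, _ in patterns]
    weights = [count for _, count in patterns]
    return rng.choices(combos, weights=weights, k=1)[0]


def apply_missingness(row: Dict[str, Any], nullable: Set[str],
                      present: FrozenSet[str]) -> Dict[str, Any]:
    """Set every nullable column that was not chosen to None; keep the rest."""
    return {col: (None if col in nullable and col not in present else val)
            for col, val in row.items()}


# Hypothetical counts of how often each combination of non-null columns occurs.
patterns = [
    (frozenset({"phone", "email"}), 50),
    (frozenset({"email"}), 30),
    (frozenset(), 20),
]
nullable = {"phone", "email"}
rng = random.Random(3)
row = {"id": 1, "phone": "555-0100", "email": "a@example.org"}
present = sample_present_columns(patterns, rng)
print(apply_missingness(row, nullable, present))
```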