Configuration Reference
datafaker is configured using a YAML file, which is passed to several commands with the --config-file option.
Throughout the docs, we refer to this file as config.yaml, but it can be called anything. The one exception: the name config.yaml causes a naming conflict if you have a vocabulary table called config.
You can generate an example configuration file based on your source database. It is filled with default values only, so you can safely delete any parts you don’t need:
datafaker generate-config
Below is the schema for the configuration file. Note that the config file format includes a section of SmartNoise SQL metadata, which is explained more fully in the SmartNoise SQL documentation (https://docs.smartnoise.org/sql/metadata.html#yaml-format).
datafaker Config
Type: object
A datafaker configuration YAML file
No Additional Properties
Run source-statistics queries using asyncpg.
The name of a local Python module of row generators (excluding .py).
The name of a local Python module of story generators (excluding .py).
Objects that need to be instantiated from the row and story generators modules.
An array of source statistics queries.
No Additional Items
Each item of this array must be:
No Additional Properties
A name for the query, which will be used in the stats file.
Comments to be copied into the src-stats.yaml file describing the query results.
No Additional Items
Each item of this array must be:
A SQL query.
A SmartNoise SQL query.
The differential privacy epsilon value for the DP query.
The differential privacy delta value for the DP query.
See https://docs.smartnoise.org/sql/metadata.html#yaml-format.
All properties whose name matches the following regular expression must respect the following conditions
Property name regular expression: ^(?!(max_ids|row_privacy|sample_max_ids|censor_dims|clamp_counts|clamp_columns|use_dpsu)).*$
Type: object
No Additional Properties
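The pattern above uses a negative lookahead, which can be hard to read at a glance. The following is a minimal sketch, using only Python's standard re module, of which property names the pattern accepts. The helper name is_pattern_property and the sample keys are made up for illustration; the apparent intent (an assumption, not stated in the schema) is to exempt SmartNoise's own option keys from the rule.

```python
import re

# Pattern copied verbatim from the schema above.
PATTERN = re.compile(
    r"^(?!(max_ids|row_privacy|sample_max_ids|censor_dims"
    r"|clamp_counts|clamp_columns|use_dpsu)).*$"
)


def is_pattern_property(name: str) -> bool:
    """Return True if `name` is matched by the pattern (hypothetical helper)."""
    return PATTERN.match(name) is not None


# Sample keys are illustrative only.
for key in ["person", "max_ids", "row_privacy", "max_ids_backup"]:
    print(key, is_pattern_property(key))
```

Because the lookahead is not anchored at the end of the name, any key that merely starts with one of the listed words (such as max_ids_backup) is excluded as well.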
An array of story generators.
No Additional Items
Each item of this array must be:
No Additional Properties
The full name of a story generator (e.g. mystorygenerators.short_story).
Positional arguments to pass to the story generator.
No Additional Items
Keyword arguments to pass to the story generator.
The number of times to call the story generator per pass.
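To make the story generator fields more concrete, here is a minimal sketch of how a name, positional arguments, keyword arguments and a per-pass count could fit together. It assumes that a story generator is a Python generator yielding (table name, column values) pairs; that protocol, the short_story body, the run_stories driver and all table and column names are assumptions for illustration, not datafaker's actual implementation.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Row = Tuple[str, Dict[str, Any]]


def short_story(n_visits: int, site: str = "clinic") -> Iterator[Row]:
    """Hypothetical story: one person followed by their visits."""
    yield "person", {"name": "Alice"}
    for i in range(n_visits):
        yield "visit", {"person_name": "Alice", "site": site, "sequence": i}


def run_stories(
    generator: Callable[..., Iterator[Row]],
    args: List[Any],
    kwargs: Dict[str, Any],
    num_stories_per_pass: int,
) -> List[Row]:
    """Call the generator the configured number of times and collect its rows."""
    rows: List[Row] = []
    for _ in range(num_stories_per_pass):
        rows.extend(generator(*args, **kwargs))
    return rows


rows = run_stories(short_story, [2], {"site": "ward"}, num_stories_per_pass=3)
print(len(rows))  # 9: each story yields 1 person row and 2 visit rows
```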
The maximum number of tries to respect a uniqueness constraint.
Table configurations.
All properties whose name matches the following regular expression must respect the following conditions
Property name regular expression: .*
Type: object
A table configuration.
No Additional Properties
Whether to completely ignore this table.
Whether to export the table data.
Whether the table is a Primary Private table (perhaps a table of patients).
The number of rows to generate per pass.
An array of row generators to create column values.
No Additional Items
Each item of this array must be:
The name of a (built-in or custom) function (e.g. max or myrowgenerators.my_gen).
Positional arguments to pass to the function.
No Additional Items
Keyword arguments to pass to the function.
One or more columns to assign the return value to.
No Additional Items
Each item of this array must be:
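The row generator fields (function name, positional arguments, keyword arguments, assigned columns) can be pictured with the following sketch. It is only an illustration under stated assumptions: the dictionary keys name, args, kwargs and columns, the helper functions, the my_gen body and the column names are all hypothetical, and the rule that a tuple return value is spread across several columns is assumed rather than taken from the schema.

```python
import builtins
from typing import Any, Callable, Dict, Mapping


def resolve(name: str, custom: Mapping[str, Callable[..., Any]]) -> Callable[..., Any]:
    """Look a generator up by name: custom registry first, then Python built-ins."""
    if name in custom:
        return custom[name]
    return getattr(builtins, name)


def apply_row_generator(
    spec: Mapping[str, Any], custom: Mapping[str, Callable[..., Any]]
) -> Dict[str, Any]:
    """Call the named function and map its return value onto the assigned columns."""
    func = resolve(spec["name"], custom)
    result = func(*spec.get("args", []), **spec.get("kwargs", {}))
    columns = spec["columns"]
    if isinstance(columns, str):
        return {columns: result}
    # Assumption: several columns receive the elements of a tuple result.
    return dict(zip(columns, result))


def my_gen(low: int, high: int):
    """Hypothetical custom row generator returning two values."""
    return low, high


custom = {"myrowgenerators.my_gen": my_gen}
row: Dict[str, Any] = {}
row.update(apply_row_generator({"name": "max", "args": [3, 7], "columns": "score"}, custom))
row.update(
    apply_row_generator(
        {
            "name": "myrowgenerators.my_gen",
            "kwargs": {"low": 1, "high": 9},
            "columns": ["min_age", "max_age"],
        },
        custom,
    )
)
print(row)  # {'score': 7, 'min_age': 1, 'max_age': 9}
```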
Functions that generate the set of nullable columns that should not be null.
No Additional Items
Each item of this array must be:
The name of a (built-in or custom) function (e.g. column_presence.sampled).
Keyword arguments to pass to the function.
Column names that might be returned.
No Additional Items
Each item of this array must be:
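As a rough illustration of the idea described above (a function that returns the set of nullable columns that should not be null), consider the sketch below. The weighted-pattern sampling, the function names and the column names are assumptions made for this example; it does not reproduce what column_presence.sampled actually does.

```python
import random
from typing import Any, Dict, FrozenSet, Sequence, Set, Tuple

Pattern = Tuple[FrozenSet[str], float]  # (columns present, relative weight)


def sample_present_columns(patterns: Sequence[Pattern], rng: random.Random) -> FrozenSet[str]:
    """Pick one presence pattern at random, weighted by its relative frequency."""
    column_sets = [columns for columns, _ in patterns]
    weights = [weight for _, weight in patterns]
    return rng.choices(column_sets, weights=weights, k=1)[0]


def apply_presence(
    row: Dict[str, Any], nullable: Set[str], present: FrozenSet[str]
) -> Dict[str, Any]:
    """Null out every nullable column that is not in the chosen presence set."""
    return {
        col: (None if col in nullable and col not in present else value)
        for col, value in row.items()
    }


# Hypothetical patterns: both contact columns present 70% of the time.
patterns = [
    (frozenset({"email", "phone"}), 0.7),
    (frozenset({"email"}), 0.2),
    (frozenset(), 0.1),
]
present = sample_present_columns(patterns, random.Random(0))
full_row = {"id": 1, "email": "a@example.org", "phone": "555-0100"}
print(apply_presence(full_row, {"email", "phone"}, present))
```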
Column that provides a name for each row in the table. Used to make foreign keys to this table more readable in the src-stats.yaml file.