vinegar.data_source.text_file
Data source backed by a text file.
This data source is designed to work with any text file, where there is a line for each system. The exact format of the file can be configured through the use of regular expressions.
This data source supports the find_system method, which makes it
well suited for use as the root source that defines the list of existing
systems.
Specifying the file format
This section only describes the options related to the file format. For a full list of supported options, please refer to Configuration options. For an example configuration, please refer to Configuration example.
The centerpiece of the file format configuration is a regular expression that
defines the format of a single line in the file. This regular expression is
specified through the regular_expression option. This regular expression
must match the full line (the pattern is matched using fullmatch).
Consequently, there is no need to use start of string or end of string anchors.
This regular expression has to define groups that represent the various pieces
of data. Both regular groups (identified by their index) and named groups
((?P<...> syntax) can be used.
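As a minimal sketch of how such a pattern behaves, the following uses Python's standard re module directly; it does not call into the data source itself. The line format and the sample lines are made up for illustration.

```python
import re

# Hypothetical line format "<hostname>;<ip>", chosen only for illustration.
pattern = re.compile(r"(?P<hostname>[^;]+);([0-9.]+)")

match = pattern.fullmatch("system1;192.168.0.1")
# Named groups are addressed by name, regular groups by their index.
hostname = match.group("hostname")
ip = match.group(2)

# fullmatch only succeeds if the whole line is consumed, so trailing text
# turns the line into a mismatch even though no anchors are used.
partial = pattern.fullmatch("system1;192.168.0.1;extra")
print(hostname, ip, partial)
```

In a source configuration, "hostname" or 2 would be the values used for the source key of an extraction configuration.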
Often, it is desirable to ignore certain lines (e.g. empty lines or lines
representing comments). This can be achieved through the
regular_expression_ignore option. If a line matches that expression, it is
ignored entirely, without even a warning message being logged. Again, the
regular expression has to match the full line.
For each line, the system ID has to be extracted, and at least one associated piece of data has to be extracted as well. Both work through the same mechanism: a configuration that refers to one of the groups defined in the regular expression matching the line.
The configuration for extracting the system ID is specified through the
system_id configuration option. The configuration for extracting pieces of
data is specified through the variables option.
There are two differences between the two: First, the variables option is
actually a dict where each key is the name of the corresponding key that is
included in the data tree and the value is the configuration for extracting
that piece of data. Second, the configuration for the system ID must never
result in a value of None being extracted.
The keys in the dict of the variables option can define a hierarchy.
That hierarchy is specified by using the colon (:) in keys. Each key is
split at these colons and the components are used as keys into nested instances
of dict.
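The key splitting described above can be sketched as follows. The helper name set_nested is hypothetical and only illustrates the idea; it is not part of the data source's API.

```python
def set_nested(tree, key, value):
    """Store value in tree under a colon-separated key, creating dicts as needed."""
    *parents, leaf = key.split(":")
    node = tree
    for part in parents:
        # Descend into the nested dict, creating it on first use.
        node = node.setdefault(part, {})
    node[leaf] = value


data = {}
set_nested(data, "net:hostname", "system1")
set_nested(data, "net:ipv4_addr", "192.168.0.1")
set_nested(data, "info:extra_names", ["alias1"])
print(data)
```

Both net:hostname and net:ipv4_addr end up in the same nested dict stored under the net key.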
Each of the configurations for extracting a piece of data is itself a dict
that has the following keys:
source (mandatory):
    The name (as a str) or index (as an int) of the group in the regular expression that provides this piece of data.
transform (optional):
    A list defining the transformations that shall be applied to the string extracted through the regular expression. This list is passed to vinegar.transform.get_transformation_chain. If this list is empty (the default), no transformations are applied and the string extracted by the regular expression is used as is.
transform_none_value (optional):
    A bool defining whether a value of None should still be transformed. As most transformation functions do not support None values, the default is False. When setting this option to True, one has to ensure that only transformation functions that can handle a value of None are used. The value extracted from a line can be None if the corresponding capturing group in the regular expression is optional.
use_none_value (optional):
    A bool defining whether a value of None (possibly as a result of the transformations) should still result in the corresponding key being added to the data tree. Usually, there is no sense in adding a key without a value, so this option has a default value of False. Please note that this option does not have any effect when specified in the configuration for the system_id. The system ID is mandatory and thus a system ID of None is treated as an error. This should be avoided by ensuring that the group capturing the system ID is non-optional.
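How the four keys interact can be sketched roughly as follows. This is an illustration under assumptions, not the data source's implementation: extract_value is a hypothetical helper, and plain Python callables stand in for the chain that would normally be built by vinegar.transform.get_transformation_chain.

```python
import re


def extract_value(match, config):
    """Apply one extraction configuration to a regular expression match.

    Returns a (value, include) tuple, where include tells whether the key
    should be added to the data tree.
    """
    value = match.group(config["source"])
    # None is only passed through the transformations on explicit request.
    if value is not None or config.get("transform_none_value", False):
        for func in config.get("transform", []):
            value = func(value)
    include = value is not None or config.get("use_none_value", False)
    return value, include


# The second group is optional, so it yields None if it does not take part.
pattern = re.compile(r"(?P<hostname>[^,]+)(?:,(?P<extra_names>.+))?")
match = pattern.fullmatch("System1")

hostname = extract_value(match, {"source": "hostname", "transform": [str.lower]})
extra = extract_value(match, {"source": "extra_names", "transform": [str.lower]})
kept = extract_value(match, {"source": "extra_names", "use_none_value": True})
print(hostname, extra, kept)
```

With the defaults, the missing extra_names group neither reaches str.lower nor produces a key in the data tree. Only the variant with use_none_value set to True keeps the key.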
It might be that a file contains some lines that do not match the expected
format (as specified by regular_expression), but are not lines that shall
be ignored (as specified by regular_expression_ignore) either. The
mismatch_action option defines how to deal with those lines. By default, a
warning is logged when such a line is encountered. This can be changed to
raising an exception by setting mismatch_action to error. Such lines
can also be ignored completely (without logging a warning), by setting
mismatch_action to ignore.
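The decision made for each line can be sketched like this. The function classify_line is hypothetical, and the order in which the two expressions are tried is an assumption. The use of ValueError follows the description under Configuration options.

```python
import logging
import re

logger = logging.getLogger(__name__)


def classify_line(line, data_re, ignore_re, mismatch_action="warn"):
    """Return the match for a data line, or None if the line is skipped."""
    if ignore_re is not None and ignore_re.fullmatch(line):
        # Lines matching regular_expression_ignore are dropped silently.
        return None
    match = data_re.fullmatch(line)
    if match is not None:
        return match
    if mismatch_action == "error":
        raise ValueError(f"Line does not match the expected format: {line!r}")
    if mismatch_action == "warn":
        logger.warning("Skipping line with unexpected format: %r", line)
    # With "ignore" (and after the warning) the line is simply skipped.
    return None


data_re = re.compile(r"(?P<name>\w+);(?P<ip>[0-9.]+)")
ignore_re = re.compile(r"|(?:#.*)")
print(classify_line("host1;192.168.0.1", data_re, ignore_re).group("name"))
```

An empty line or a comment line never reaches the mismatch handling, so it does not trigger a warning or an exception regardless of mismatch_action.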
If there is more than one line specifying the same system ID, the behavior is
controlled by the duplicate_system_id_action option. By default, a warning
is logged and only the first line for the system ID is used (option value
warn). This can be changed to raising an exception by setting the
option to error. If the option is set to ignore, only the first line is
used, but no warning message is logged.
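A minimal sketch of this first-line-wins behavior, using the option values warn, error, and ignore as listed under Configuration options. The function collect_systems and its input format are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def collect_systems(entries, duplicate_system_id_action="warn"):
    """Build a dict of system ID -> data, keeping the first entry per ID."""
    systems = {}
    for system_id, data in entries:
        if system_id in systems:
            if duplicate_system_id_action == "error":
                raise ValueError(f"Duplicate system ID: {system_id!r}")
            if duplicate_system_id_action == "warn":
                logger.warning("Ignoring duplicate line for %r", system_id)
            # Later lines for a known system ID are never used.
            continue
        systems[system_id] = data
    return systems


entries = [("a", {"x": 1}), ("b", {"x": 2}), ("a", {"x": 3})]
print(collect_systems(entries, "ignore"))
```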
Configuration example
In order to get a better understanding of how the various configuration options work together, let us discuss the following example (in this example, we use YAML for describing the configuration):
# The cache is enabled by default, so we only specify it here for
# completeness.
cache_enabled: True
# The warn action is already the default, we only specify it here for
# completeness.
duplicate_system_id_action: warn
# This is the path to the text file.
file: /path/to/file.txt
# Enabling find_first_match has the effect that if multiple systems use the
# same value for a key (as defined in the variables dict), the first of the
# systems (the first line) is returned by the find_system method when
# looking for that specific key-value combination. If this option is not
# enabled, no system is returned if there is no unique match.
find_first_match: True
# The warn action is already the default, we only specify it here for
# completeness.
mismatch_action: warn
# This is the regular expression that matches the lines that we want to
# use. We specify the X flag first (?x) so that we can use the multi-line
# syntax, which makes the regular expression much more readable.
regular_expression: |
    (?x)
    # We expect a CSV file with three columns that are separated by
    # semicolons.
    # The first column specifies the MAC address.
    (?P<mac>[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5});
    # The second column specifies the IP address.
    (?P<ip>[0-9]{1,3}(?:\.[0-9]{1,3}){3});
    # The third column specifies the hostname and an optional list
    # of additional names.
    (?P<hostname>[^,]+)
    (,(?P<extra_names>.+))?
# We want to ignore empty lines and lines starting with a "#".
regular_expression_ignore: "|(?:#.*)"
# We build the system ID from the hostname by adding a domain name and
# ensuring that everything is in lower case.
system_id:
    source: hostname
    transform:
        - string.add_suffix: .mydomain.example.com
        - string.to_lower
# We define a couple of variables that will be available in the data tree
# for each system.
variables:
    'info:extra_names':
        source: extra_names
        transform:
            - string.to_lower
            # The additional names are separated by commas. Please note
            # that we could also write this shorter as
            # '- string.split: ","' because "sep" is the first argument
            # (after the value) and "maxsplit" defaults to -1.
            - string.split:
                sep: ","
                maxsplit: -1
    'net:fqdn':
        source: hostname
        transform:
            - string.add_suffix: .mydomain.example.com
            - string.to_lower
    'net:hostname':
        source: hostname
        transform:
            - string.to_lower
    'net:ipv4_addr':
        source: ip
        transform:
            - ipv4_address.normalize
    'net:mac_addr':
        source: mac
        transform:
            # The colon is the default delimiter, so we could also simply
            # write "- mac_address.normalize" without specifying any
            # options.
            - mac_address.normalize:
                delimiter: colon
Now, let us assume we have the following text file:
02:00:00:00:00:01;192.168.0.1;System1
02:00:00:00:00:02;192.168.0.2;system2,alias1,Alias2
02:00:00:00:00:0a;192.168.000.3;system3
02:00:00:00:00:0A;192.168.0.4;system4
Parsing this file with the configuration specified earlier would result in the
following data for the systems (we list the data in YAML format and use the
system IDs as the keys in the top dict):
system1.mydomain.example.com:
    net:
        fqdn: system1.mydomain.example.com
        hostname: system1
        ipv4_addr: 192.168.0.1
        mac_addr: '02:00:00:00:00:01'
system2.mydomain.example.com:
    info:
        extra_names:
            - alias1
            - alias2
    net:
        fqdn: system2.mydomain.example.com
        hostname: system2
        ipv4_addr: 192.168.0.2
        mac_addr: '02:00:00:00:00:02'
system3.mydomain.example.com:
    net:
        fqdn: system3.mydomain.example.com
        hostname: system3
        ipv4_addr: 192.168.0.3
        mac_addr: '02:00:00:00:00:0A'
system4.mydomain.example.com:
    net:
        fqdn: system4.mydomain.example.com
        hostname: system4
        ipv4_addr: 192.168.0.4
        mac_addr: '02:00:00:00:00:0A'
Thanks to the transformations, all names have been converted to lower case and IP and MAC addresses have been normalized.
With this data, it is possible to look up systems through
find_system. For example,
find_system('net:mac_addr', '02:00:00:00:00:0A') will return
system3.mydomain.example.com. This works because the look-up is done on the
final (transformed) data and the find_first_match configuration option has
been enabled. If it had not been enabled, the result would be None because
system4.mydomain.example.com has the same MAC address.
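The effect of find_first_match on this look-up can be sketched with plain dicts. The functions below are hypothetical stand-ins that mimic the described behavior on already transformed data; they are not the data source's implementation.

```python
_MISSING = object()


def get_nested(tree, key):
    """Look up a colon-separated key in nested dicts."""
    node = tree
    for part in key.split(":"):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def find_system(systems, lookup_key, lookup_value, find_first_match=False):
    """Return the matching system ID, honoring find_first_match."""
    # Dicts preserve insertion order, which stands in for the file order.
    matches = [
        system_id
        for system_id, data in systems.items()
        if get_nested(data, lookup_key) == lookup_value
    ]
    if len(matches) == 1 or (matches and find_first_match):
        return matches[0]
    return None


systems = {
    "system3.mydomain.example.com": {"net": {"mac_addr": "02:00:00:00:00:0A"}},
    "system4.mydomain.example.com": {"net": {"mac_addr": "02:00:00:00:00:0A"}},
}
print(find_system(systems, "net:mac_addr", "02:00:00:00:00:0A", True))
```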
Configuration options
This data source has several configuration options that can be used to control its behavior. This section only gives an overview of the available options. For a more detailed discussion about the options controlling the file format, please refer to Specifying the file format and Configuration example.
file (mandatory):
    Path to the text file (as a str).
regular_expression (mandatory):
    Regular expression (as a str) matching the data lines in the file. This regular expression must match the full line (the pattern is matched using fullmatch). Consequently, there is no need to use start of string or end of string anchors. The regular expression must define capturing groups that can then be referenced from the system_id and variables configuration. See Specifying the file format for details.
system_id (mandatory):
    Configuration describing how the system ID is extracted from a line. This configuration refers to a capturing group of regular_expression through its source option. See Specifying the file format for details.
variables (mandatory):
    Configuration describing how the various data items are extracted from a line. This configuration option expects a dict where each key-value pair refers to one data item, using the key as the key in the data tree generated for the system and the value as the configuration for that data item. See Specifying the file format for details.
cache_enabled (optional):
    If True (the default), the contents of the text file are read once and cached until the file changes. File changes are detected through the time-stamp of the file. If False, the file is read and parsed every time find_system() or get_data() is called.
duplicate_system_id_action (optional):
    If warn (the default), a warning message is logged when a line specifying the same system ID as an earlier line is encountered and the second line is ignored. If error, a ValueError is raised instead. If ignore, the second line is ignored without logging a warning.
find_first_match (optional):
    If True and there are multiple matches in a call to find_system(), the first system ID (this is the ID of the first system in the file that matches the specified query) is returned. If False (the default), no system ID is returned if there are multiple matches, so a system ID is only returned if there is only one system matching the query.
mismatch_action (optional):
    Controls the behavior when a line that matches neither regular_expression nor regular_expression_ignore is encountered. If warn (the default), a warning message is logged. If error, a ValueError is raised instead. If ignore, the line is ignored without logging a warning.
regular_expression_ignore (optional):
    Regular expression (as a str) matching the lines in the file that shall be ignored. This regular expression must match the full line (the pattern is matched using fullmatch). Consequently, there is no need to use start of string or end of string anchors. If None (the default), no lines are ignored.
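The time-stamp based caching described for cache_enabled could look roughly like the following sketch. The FileCache class is hypothetical and only illustrates the general technique of comparing the modification time before re-parsing; how the data source tracks the time-stamp internally is not shown in this documentation.

```python
import os


class FileCache:
    """Re-parse a file only when its modification time changes."""

    def __init__(self, path, parse):
        self._path = path
        self._parse = parse
        self._mtime = None
        self._data = None

    def get(self):
        # Compare the current time-stamp with the one seen at the last parse.
        mtime = os.stat(self._path).st_mtime_ns
        if mtime != self._mtime:
            with open(self._path, encoding="utf-8") as file:
                self._data = self._parse(file.read())
            self._mtime = mtime
        return self._data
```

A caller would create one FileCache per file, for example FileCache("/path/to/file.txt", str.splitlines), and call get() whenever the parsed content is needed.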
- class vinegar.data_source.text_file.TextFileSource(config: Mapping[Any, Any])
Data source that reads data from a text file.
For information about the configuration options supported by this data source, please refer to the module documentation.
- find_system(lookup_key: str, lookup_value: Any) -> str | None
Find a system given the specified key and value.
If no system can be found, the data source returns None.
- Parameters:
lookup_key – key for which to look. The interpretation of the key is up to the data source. Some data sources might use a flat structure, while others might support hierarchical data-structures. In the latter case, the use of the colon (:) as a hierarchy separator in the key is encouraged, but not required.
lookup_value – value for which to look. The interpretation of the value is up to the data source.
- Returns:
system identifier or None if no system could be identified using the specified key and value.
- get_data(system_id: str, preceding_data: Mapping[Any, Any], preceding_data_version: str) -> Tuple[Mapping[Any, Any], str]
Return data associated with the specified system.
If the data source does not have any information associated with the specified system ID, it should return an empty dictionary.
The return value of this method is in fact a tuple of the configuration data and a version string. The version string can be used by the calling code to decide whether the data has changed and thus caches have to be discarded. For example, the results of rendering a template might be cached and the cached version might be used as long as the version string returned by this method does not change. This means that implementations have to be careful to never return the same version string when the data for a system has changed. The vinegar.utils.version module provides utility functions for generating version strings in a way that makes accidental collisions unlikely.
Please note that it is not the job of a data source to merge the preceding_data with the data provided by itself. The calling code takes care of this. Code wanting to use multiple data sources in a chain can use the get_composite_data_source function.
Implementations are encouraged to use caching to improve performance when this method is repeatedly called for the same systems.
- Parameters:
system_id – ID of the system for which data is requested.
preceding_data – Data provided by the data source(s) that come earlier in the chain. This may be empty if there are no preceding data sources or if they did not provide any data for the system.
preceding_data_version – Version of the preceding_data. This is an arbitrary string (typically a hash) that can be used to detect when the data provided by the preceding sources has changed.
- Returns:
tuple where the first element is the data associated with the specified system and the second element is a version string that changes whenever the returned data changes (for the same system).
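To illustrate the contract of the returned tuple, here is a sketch that derives the version string from a hash of the data. It deliberately uses hashlib and json from the standard library rather than the functions in vinegar.utils.version, whose signatures are not shown here. The names version_for_data and get_data_sketch are hypothetical, and the sketch assumes that the data is JSON-serializable.

```python
import hashlib
import json


def version_for_data(data):
    """Return a hex digest that changes whenever the data changes."""
    # Sorting the keys makes the digest independent of the insertion order.
    serialized = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def get_data_sketch(systems, system_id):
    """Return (data, version); unknown systems yield an empty dict."""
    data = systems.get(system_id, {})
    return data, version_for_data(data)


systems = {"system1.mydomain.example.com": {"net": {"hostname": "system1"}}}
data, version = get_data_sketch(systems, "system1.mydomain.example.com")
print(data, version)
```

Because the version is derived from the data alone, it stays stable across calls as long as the data for the system does not change.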
- vinegar.data_source.text_file.get_instance(config: Mapping[Any, Any]) -> TextFileSource
Create a text file data source.
For information about the configuration options supported by that source, please refer to the module documentation.
- Parameters:
config – configuration for the data source.
- Returns:
text file data source using the specified configuration.