import gtn
import nb_utils
nb_utils.init()
Let's start by constructing some very basic automata to get a feel for their various properties.
$\newcommand{\vx}{{\bf x}} \newcommand{\vy}{{\bf y}} \newcommand{\gA}{\mathcal{A}} \newcommand{\gT}{\mathcal{T}} \DeclareMathOperator*{\LSE}{\textrm{LSE}}$The broad class of graphs we are going to look at are finite-state automata. These include deterministic finite-state automata (DFAs) and non-deterministic finite-state automata (NFAs). More specifically we will consider a generalization of DFAs and NFAs called weighted finite-state acceptors (WFSAs). That's a mouthful, so I will just call them acceptors. We will also consider a further generalization of an acceptor called a transducer (weighted finite-state transducers or WFSTs). The following figure shows the relation between these three graphs; transducers, acceptors, and automata. Transducers are the most expressive in terms of their representational power, followed by acceptors followed by unweighted automata.
Before we dive into acceptors and transducers, let's introduce some general graph terminology that I will use throughout. In the following graph a state or node is represented by a circle. The arrows represent connections between two states. We usually refer to these as arcs but sometimes also edges. The graph is directed since the connections between states are unidirectional arrows. The arcs in a graph can have labels. In the graph below the arc between states $0$ and $1$ has a label of $a$. Similarly, the arc between states $1$ and $2$ has a label of $b$. The graph is an example of a finite-state automata (FSA) or finite-state machine (FSM), so called because it has a finite number of nodes.
An automata is deterministic if for each state and label pair there is only one outgoing transition which matches that label. An automata is nondeterministic if multiple transitions leaving a state have the same label. The graphs below show an example of a deterministic and a nondeterministic automata. In general, acceptors and transducers to can be nondeterministic.
# Define the mapping from integer ids to arc symbol labels
symbols = {0: 'a', 1: 'b', 2: 'c'}
# Make a graph
fsa = gtn.Graph()
# Add a start node
fsa.add_node(start=True)
# Add an accepting node
fsa.add_node(accept=True)
# Add an internal node
fsa.add_node()
# Add an arc from node 0 to 2 with label 0
fsa.add_arc(src_node=0, dst_node=2, label=0)
# Add an arc from node 0 to 2 with input label 1 and output label 1
fsa.add_arc(src_node=0, dst_node=2, ilabel=1, olabel=1)
# Add an arc from node 2 to 1 with input label 0, output label 0 and weight 2
fsa.add_arc(src_node=2, dst_node=1, ilabel=0, olabel=0, weight=2)
# Save the FSA as an svg
gtn.draw(fsa, "figures/simple_fsa.svg", isymbols=symbols)
The start state $s = 0$ has a bold circle around it. The accepting state $1$ is represented with concentric circles. Each arc has a label and a corresponding weight. So the first arc from state $0$ to state $1$ with the text $a/0$ means the label is $a$ and the weight is $0$. The fact that there is only a single label on each arc means this graph is an acceptor. Since it has weights, we say its a weighted acceptor. Since the number of states is finite, some would call it a weighted finite-state acceptor or WFSA. Again, that's a mouthful, so I'll just call these graphs acceptors.
An accepting path in the graph is a sequence of arcs which begin at a start state and end in an accepting state. By concatenating the labels on an accepting path, we get a string which is accepted by the graph. So the string $aa$ is accepted by the graph by following the state sequence $0 \rightarrow 2 \rightarrow 1$. The string $ba$ is also accepted by the graph by following the same state sequence but taking the arc with label $b$ when traversing from state $0$ to state $1$. The language of the acceptor is the set of all strings which are accepted by it. You may also encounter "recognized" used as a synonym for "accepted". Let the variable $\gA$ represent the above acceptor. In general, I'll use uppercase script letters to represent graphs. Let $\mathcal{L}(\gA)$ denote the language of $\gA$. In this case $\mathcal{L}(\gA) = \{aa, ba\}$.
There are different ways to compute the weight of a string accepted by the graph. The most common is to sum the weights of the arcs on the accepting path for that string. For example the string $aa$ in the above graph has a weight of $0 + 2 = 2$. Another option would be to multiply the weights. These two options correspond to interpreting the weights as either log probabilities or probabilities. We'll have more to say about this later.
fsa.add_node()
fsa.add_arc(src_node=0, dst_node=3, ilabel=0, olabel=0, weight=1)
fsa.add_arc(src_node=3, dst_node=1, ilabel=0, olabel=0, weight=3)
gtn.draw(fsa, "figures/multi_path.svg", isymbols=symbols)
The below graph accepts the same sequence by multiple paths.
The string $aa$ is accepted along the state sequence $0 \rightarrow 2 \rightarrow 1$ and along the state sequence $0 \rightarrow 3 \rightarrow 1$. In this case, to compute the score of $aa$ we need to consider both paths. Again we have a couple of options here. The most common approach is to log-sum-exp the individual path scores. Again this corresponds to interpreting the path scores as log probabilities. We'll use $\LSE(s_1, s_2)$ to denote the log-sum-exp of the two scores $s_1$ and $s_2$:
$$ \LSE(s_1, s_2) = \log \left( e^{s_1} + e^{s_2}\right). $$So the overall weight for the string $aa$ in the above graph is given by $\log \left(e^{0 + 2} + e^{1 + 3}\right) = 4.13$.
fsa = gtn.Graph()
# Graphs can have multiple start-nodes
fsa.add_node(start=True)
fsa.add_node(start=True)
fsa.add_node()
# Graphs can also have multiple accept nodes
fsa.add_node(accept=True)
fsa.add_node(accept=True)
# Start nodes can have incoming arcs
fsa.add_arc(src_node=0, dst_node=1, label=1)
fsa.add_arc(src_node=0, dst_node=2, label=0)
fsa.add_arc(src_node=1, dst_node=3, label=0)
fsa.add_arc(src_node=2, dst_node=3, label=1)
fsa.add_arc(src_node=2, dst_node=3, label=0)
fsa.add_arc(src_node=2, dst_node=4, label=2)
# Accept nodes can have outgoing arcs
fsa.add_arc(src_node=3, dst_node=4, label=1)
# Set the arc weights
fsa.set_weights([1, 1, 1, 3, 1, 2, 2])
gtn.draw(fsa, "figures/multi_start_accept.svg", isymbols=symbols)
Acceptors can have multiple start states and multiple accept states. In the below graph the states $0$ and $1$ are both start states, and the states $3$ and $4$ are both accept states.
It turns out that allowing multiple start or accept states does not increase the expressive power of the graph. With $\epsilon$ transitions (which we will discuss soon), one can convert any graph with multiple start states and multiple accept states into an equivalent graph with a single start state and a single accept state.
Note also that start states can have incoming arcs (as in state $1$) and accept states can have outgoing arcs, as in state $3$.
Compute the score of the string $ab$ in figure 1.
The two state sequences which accept the string $ab$ are the states $0 \rightarrow 2 \rightarrow 3$ and $1 \rightarrow 3 \rightarrow 4$. The overall score is given by $\log (e^{1 + 3} + e^{1 + 2}) = 4.31$.
fsa = gtn.Graph()
fsa.add_node(start=True)
fsa.add_node()
fsa.add_node(accept=True)
# Self loops are allowed
fsa.add_arc(src_node=0, dst_node=0, label=0);
fsa.add_arc(src_node=0, dst_node=1, label=1);
fsa.add_arc(src_node=0, dst_node=1, label=2);
fsa.add_arc(src_node=1, dst_node=2, label=1);
# Cycles are also allowed
fsa.add_arc(src_node=2, dst_node=0, label=1);
gtn.draw(fsa, "figures/fsa_loops.svg", isymbols=symbols)
Graphs can also have self-loops and cycles. For example, the below graphs has a self-loop on the state $0$ and a cycle following the state sequence $0 \rightarrow 1 \rightarrow 2 \rightarrow 0$.
The language of a graph with cycles and self-loops contains infinitely many strings. For example, the language of the graph above includes any string that starts with zero or more $a$s and ends in $bb$. As a regular expression we write this as $a^*bb$ where the $^*$ denotes zero or more $a$s.
# We also allow ϵ transitions in acceptors
fsa = gtn.Graph()
fsa.add_node(start=True)
fsa.add_node()
fsa.add_node(accept=True)
fsa.add_arc(src_node=0, dst_node=1, label=0)
fsa.add_arc(src_node=0, dst_node=1, label=gtn.epsilon)
fsa.add_arc(src_node=1, dst_node=2, label=1)
gtn.draw(fsa, "figures/fsa_epsilon.svg", isymbols=symbols)
The $\epsilon$ symbols has a special meaning when it is the label on an arc. Any arc with an $\epsilon$ label can be traversed without consuming an input token in the string. So the below graph accepts the string $ab$, but it also accepts the string $b$ because we can traverse from state $0$ to state $1$ without consuming an input.
As it turns out, any graph with $\epsilon$-transitions can be converted to an equivalent graph without $\epsilon$ transitions. However, this usually comes at a large cost in the size of the graph. Complex languages can be represented by much more compact graphs with the use of $\epsilon$-transitions.
Convert the graph above which has multiple start and accept states to an equivalent graph with only a single start and accept state using $\epsilon$ transitions.
fsa = gtn.Graph()
# Recreate the old graph without any start or accept states
fsa.add_node()
fsa.add_node()
fsa.add_node()
fsa.add_node()
fsa.add_node()
fsa.add_arc(src_node=0, dst_node=1, label=1)
fsa.add_arc(src_node=0, dst_node=2, label=0)
fsa.add_arc(src_node=1, dst_node=3, label=0)
fsa.add_arc(src_node=2, dst_node=3, label=1)
fsa.add_arc(src_node=2, dst_node=3, label=0)
fsa.add_arc(src_node=2, dst_node=4, label=2)
fsa.add_arc(src_node=3, dst_node=4, label=1)
fsa.set_weights([1, 1, 1, 3, 1, 2, 2, 0, 0])
# The new start and accept states
start = fsa.add_node(start=True)
accept = fsa.add_node(accept=True)
# Connect the new start state to the old start states with
# ϵ transitions with a weight of 0
fsa.add_arc(src_node=start, dst_node=0, label=gtn.epsilon)
fsa.add_arc(src_node=start, dst_node=1, label=gtn.epsilon)
fsa.add_arc(src_node=3, dst_node=accept, label=gtn.epsilon)
fsa.add_arc(src_node=4, dst_node=accept, label=gtn.epsilon)
gtn.draw(fsa, "figures/epsilon_start_accept.svg", isymbols=symbols)
The graph below is the equivalent graph with a single start state and a single accept state.
The construction works by creating a new start state and connecting it to the old start states with $\epsilon$ transitions with a weight of $0$. The old start nodes are regular internal nodes in this new graph. Similarly the old accept states are now regular states and they connect to the new accept state with $\epsilon$ transitions with a weight of $0$.
A transducer maps input strings to output strings. Transducers are a generalization of acceptors. Every acceptor is a transducer, but not every transducer is an acceptor. Let's look at a few example transducers to understand how they work.
# Define the mappings from integer ids to arc input and output labels
isymbols = {0: 'a', 1: 'b', 2: 'c'}
osymbols = {0: 'x', 1: 'y', 2: 'z'}
fst = gtn.Graph()
fst.add_node(start=True)
fst.add_node()
fst.add_node(accept=True)
# Adding an arc with just an input label, the output label defaults to have
# the same value as the input label
fst.add_arc(src_node=0, dst_node=1, label=0)
# Add an arc from node 0 to 2 with the same input and output label index of 1
fst.add_arc(src_node=0, dst_node=1, ilabel=1, olabel=1, weight=2)
# However, we can add an arc with a different input and output label.
fst.add_arc(src_node=1, dst_node=2, ilabel=1, olabel=2, weight=3)
gtn.draw(fst, "figures/simple_fst.svg", isymbols=isymbols, osymbols=osymbols)
The arc labels distinguish an acceptor from a transducer. A transducer has both an input and output arc label. The arc labels are of the form $a\!:\!x/0$ where $a$ is the input label $x$ is the output label and $0$ is the weight. An acceptor can be represented as a transducer where the input and output labels on every arc are identical.
Instead of saying that a transducer accepts a given string, we say that it transduces one string to another. The above graph transduces the string $ab$ to the string $xz$ and the string $bb$ to the string $yz$. The weight of a transduced pair is computed in the same way as in an acceptor. The scores of the individual arcs on the path are summed. The path scores are combined with log-sum-exp. So the weight of the transduced pair $(ab, xz)$ in the above graph is $0+3 = 3$.
We have to generalize concept of the language from an acceptor to a transducer. I'll call this generalization the transduced set. Since it will always be clear from context if the graph is an acceptor or transducer, I'll use the same symbol $\mathcal{L}$ to represent the transduced set. If $\gT$ is a transducer, then $\mathcal{L}(\gT)$ is the set of pairs of strings transduced by $\gT$. More formally, a pair of strings $(\vx, \vy) \in \mathcal{L}(\gT)$ if $\gT$ transduces $\vx$ to $\vy$.
Compute the score of the transduced pair $(aab, zyy)$ in the graph below.
fst = gtn.Graph()
fst.add_node(start=True)
fst.add_node()
fst.add_node()
fst.add_node(accept=True)
fst.add_arc(src_node=0, dst_node=0, ilabel=0, olabel=2, weight=1)
fst.add_arc(src_node=0, dst_node=1, ilabel=0, olabel=2, weight=2)
fst.add_arc(src_node=1, dst_node=3, ilabel=0, olabel=1, weight=3)
fst.add_arc(src_node=3, dst_node=3, ilabel=1, olabel=1, weight=1)
fst.add_arc(src_node=0, dst_node=2, ilabel=0, olabel=1, weight=2)
fst.add_arc(src_node=2, dst_node=3, ilabel=1, olabel=1, weight=3)
gtn.draw(fst, "figures/fst_example_score.svg", isymbols=isymbols, osymbols=osymbols)
The two paths which transduce $aab$ to $zyy$ are following the state sequence $0 \rightarrow 1 \rightarrow 3 \rightarrow 3$ and $0 \rightarrow 0 \rightarrow 2 \rightarrow 3$. The score of the first path is $6$ and the score of the second path is $6$. So the overall score is $\log \left(e^6 + e^6\right) = 6.69$
fst = gtn.Graph()
fst.add_node(start=True)
fst.add_node()
fst.add_node()
fst.add_node(accept=True)
# The input label on an arc can be an ϵ
fst.add_arc(src_node=0, dst_node=1, ilabel=gtn.epsilon, olabel=0)
# The output label on an arc can be an ϵ
fst.add_arc(src_node=1, dst_node=2, ilabel=1, olabel=gtn.epsilon)
# And both an input label and the output label can be ϵ
fst.add_arc(src_node=2, dst_node=3, ilabel=gtn.epsilon, olabel=gtn.epsilon)
gtn.draw(fst, "figures/fst_epsilon.svg", isymbols=isymbols, osymbols=osymbols)
Transducers can also have $\epsilon$ transitions. The $\epsilon$ can be either the input label on an arc, the output label on an arc, or both. When the $\epsilon$ is the input label on an arc, it means we can traverse that arc without consuming an input token, but we still output the arc's corresponding output label. When the $\epsilon$ is the output label, the opposite is true. The input is consumed but no output is produced. And when the $\epsilon$ is both the input and the output label, the arc can be traversed without consuming an input or producing an output.
In the above graph, the string $b$ gets transduced to the string $x$. On the first arc between states $0$ and $1$, we output an $x$ without consuming any token. On the second arc between states $1$ and $2$, a $b$ is consumed without outputting any new token. Finally, on the arc between states $2$ and $3$ we neither consume nor output a token.