In [1]:

```
import gtn
import nb_utils
nb_utils.init()
```

$\newcommand{\vx}{{\bf x}} \newcommand{\vy}{{\bf y}} \newcommand{\gA}{\mathcal{A}} \newcommand{\gT}{\mathcal{T}} \newcommand{\gU}{\mathcal{U}} \DeclareMathOperator*{\LSE}{\textrm{LSE}}$An operation on a transducer (or acceptor) takes one or more transducers as input and outputs a transducer. You can think of these operations as functions on graphs. As a reminder, I'll use uppercase script letters to represent graphs, so $\gA$ for example can represent a graph. Functions will be denoted by lower case variables. So $f(\gA)$ is a function which takes as input a single graph and outputs a graph.

The closure, sometimes called the Kleene star, is a unary function (takes a single input) which can operate on either an acceptor or transducer. If the sequence $\vx$ is accepted by $\gA$, then zero or more copies of $\vx$ are accepted by the closure of $\gA$. More formally, if the language of an acceptor is $\mathcal{L}(\gA)$, then the language of the closure of $\gA$ is $\{\vx^n \mid \vx \in \mathcal{L}(\gA),\;\; n = 0, 1, \ldots, \}$. The notation $\vx^n$ means $\vx$ concatenated $n$ times. So $\vx^2$ is $\vx \vx$ and $\vx^0$ is the empty string. Usually the closure of an acceptor is denoted by $^*$, as in $\gA^*$. This is the same notation used in regular expressions.

In [2]:

```
# Define the mapping from integer ids to arc symbol labels:
symbols = {0: 'a', 1: 'b', 2: 'c'}
fsa = gtn.Graph()
fsa.add_node(start=True)
fsa.add_node()
fsa.add_node()
fsa.add_node(accept=True)
fsa.add_arc(src_node=0, dst_node=1, label=0)
fsa.add_arc(src_node=1, dst_node=2, label=1)
fsa.add_arc(src_node=2, dst_node=3, label=0)
gtn.draw(fsa, "figures/fsa_pre_closure.svg", isymbols=symbols)
```

The closure of a graph is easy to construct with the use of $\epsilon$ transitions. The language of the graph below is the string $aba$.

The closure of the graph needs to accept an arbitrary number of copies of $aba$ including the empty string. To accept the empty string we make the start state an accept state as well. To accept one or more copies of $aba$ we simply wire up the old accept states to the new start state with $\epsilon$ transitions.

In [3]:

```
# Recreate the old graph but without any start states:
fsa = gtn.Graph()
old_start = fsa.add_node()
fsa.add_node()
fsa.add_node()
accept = fsa.add_node(accept=True)
fsa.add_arc(src_node=0, dst_node=1, label=0)
fsa.add_arc(src_node=1, dst_node=2, label=1)
fsa.add_arc(src_node=2, dst_node=3, label=0)
# Add a new start state which is also an accept state
start = fsa.add_node(start=True, accept=True)
# Connect the new start state to the old start states with an ϵ transition:
fsa.add_arc(src_node=start, dst_node=old_start, label=gtn.epsilon)
# Connect the accept states to the new start state with an ϵ transition:
fsa.add_arc(src_node=accept, dst_node=start, label=gtn.epsilon)
gtn.draw(fsa, "figures/fsa_closure.svg", isymbols=symbols)
```

The closure of the graph is shown below.

You might notice that state $4$ in the above graph is not necessary. Consider an alternate construction for computing the closure of a graph. We could have made the state $0$ into an accept state and connected state $3$ to state $0$ with an $\epsilon$ transition, as in the graph below.

In [4]:

```
fsa = gtn.Graph()
# Make the start state also an accept state:
start = fsa.add_node(start=True, accept=True)
fsa.add_node()
fsa.add_node()
accept = fsa.add_node(accept=True)
fsa.add_arc(src_node=0, dst_node=1, label=0)
fsa.add_arc(src_node=1, dst_node=2, label=1)
fsa.add_arc(src_node=2, dst_node=3, label=0)
# Connect the accept states to the start state with an ϵ transition:
fsa.add_arc(src_node=accept, dst_node=start, label=gtn.epsilon)
gtn.draw(fsa, "figures/fsa_closure_2.svg", isymbols=symbols)
```

For the graph above, this alternate construction works and requires fewer states and arcs. In the general case, this construction turns every start state into an accept state instead of adding a new start state. Give an example where this doesn't work? In other words, give an example where the graph from this modified construction is not the correct closure of the original graph.

In [5]:

```
fsa = gtn.Graph()
fsa.add_node(start=True)
fsa.add_node(accept=True)
fsa.add_arc(src_node=0, dst_node=0, label=0)
fsa.add_arc(src_node=0, dst_node=1, label=1)
gtn.draw(fsa, "figures/fsa_pre_closure_wrong.svg", isymbols=symbols)
```

In [6]:

```
# The modified (and incorrect) construction for the closure:
fsa = gtn.Graph()
fsa.add_node(start=True, accept=True)
fsa.add_node(accept=True)
fsa.add_arc(src_node=0, dst_node=0, label=0)
fsa.add_arc(src_node=0, dst_node=1, label=1)
fsa.add_arc(src_node=1, dst_node=0, label=gtn.epsilon)
gtn.draw(fsa, "figures/fsa_closure_wrong.svg", isymbols=symbols)
```

In [7]:

```
# The correct construction for the closure
fsa = gtn.Graph()
fsa.add_node()
fsa.add_node(accept=True)
fsa.add_node(start=True, accept=True)
fsa.add_arc(src_node=0, dst_node=0, label=0)
fsa.add_arc(src_node=0, dst_node=1, label=1)
fsa.add_arc(src_node=2, dst_node=0, label=gtn.epsilon)
fsa.add_arc(src_node=1, dst_node=2, label=gtn.epsilon)
gtn.draw(fsa, "figures/fsa_closure_right.svg", isymbols=symbols)
```

The graph following the correct construction of the closure is shown below.

The union takes as input two or more graphs and produces a new graph. The language of the resultant graph is the union of the languages of the input graphs. More formally let $\gA_1, \ldots, \gA_n$ be $n$ graphs. The language of the union graph is given by $\{ x \mid x \in \gA_i \textrm{ for some } i = 1, \ldots, n \}$. I'll occasionally use the $+$ sign to denote the union, as in $\gU = \gA_1 + \gA_2$.

In [8]:

```
# A graph which recognizes "aba*"
g1 = gtn.Graph()
g1.add_node(start=True)
g1.add_node()
g1.add_node(accept=True)
g1.add_arc(src_node=0, dst_node=1, label=0)
g1.add_arc(src_node=1, dst_node=2, label=1)
g1.add_arc(src_node=2, dst_node=2, label=0)
# A graph which recognizes "ba"
g2 = gtn.Graph()
g2.add_node(start=True)
g2.add_node()
g2.add_node(accept=True)
g2.add_arc(src_node=0, dst_node=1, label=1)
g2.add_arc(src_node=1, dst_node=2, label=0)
# A graph which recognizes "ac"
g3 = gtn.Graph()
g3.add_node(start=True)
g3.add_node()
g3.add_node(accept=True)
g3.add_arc(src_node=0, dst_node=1, label=0)
g3.add_arc(src_node=1, dst_node=2, label=2)
gtn.draw(g1, "figures/union_1.svg", isymbols=symbols)
gtn.draw(g2, "figures/union_2.svg", isymbols=symbols)
gtn.draw(g3, "figures/union_3.svg", isymbols=symbols)
```

Since we let a graph have multiple start states and multiple accept states, the union is easy to construct. A state in the union graph is a start state if it was a start state in one of the original graphs. A state in the union graph is an accept state if it was an accept state in one of the original graphs.

Consider the three graphs below with languages $\{ab, aba, abaa, \ldots\}$, $\{ba\}$, and $\{ac\}$ respectively.

In [9]:

```
g = gtn.union([g1, g2, g3])
gtn.draw(g, "figures/union.svg", isymbols=symbols)
```

Notice in the union graph below the only visual distinction from the individual graphs is that the states are numbered consecutively from $0$ to $8$ indicating a single graph with nine states instead of three individual graphs. The language of the union graph below is $\{ab, aba, abaa, \ldots\} \cup \{ba\} \cup \{ac\}$.

Like union, concatenate produces a new graph given two or more graphs as input. The language of the concatenated graph is the set of strings which can be formed by any concatenation of strings from the individual graph. Concatenate is not commutative, the order of the input graphs matters. More formally the language of the concatenated graph is given by $\{\vx_1 \ldots \vx_n \mid \vx_1 \in \mathcal{L}(\gA_1), \ldots, \vx_n \in \mathcal{L}(\gA_n)\}$. Occasionally, I will denote concatenation by placing the graphs side-by-side. So $\gA_1 \gA_2$ represents the concatenation of $\gA_1$ and $\gA_2$.

In [10]:

```
# The graph which recognizes "ba"
g1 = gtn.Graph()
g1.add_node(start=True)
g1.add_node()
g1.add_node(accept=True)
g1.add_arc(src_node=0, dst_node=1, label=1)
g1.add_arc(src_node=1, dst_node=2, label=0)
# The graph which recognizes "ac" and "bc"
g2 = gtn.Graph()
g2.add_node(start=True)
g2.add_node()
g2.add_node()
g2.add_node(accept=True)
g2.add_arc(src_node=0, dst_node=1, label=0)
g2.add_arc(src_node=1, dst_node=3, label=2)
g2.add_arc(src_node=0, dst_node=2, label=1)
g2.add_arc(src_node=2, dst_node=3, label=2)
gtn.draw(g1, "figures/concat_1.svg", isymbols=symbols)
gtn.draw(g2, "figures/concat_2.svg", isymbols=symbols)
```

The concatenated graph can be constructed from the original input graphs by connecting the accept states of one graph to the start states of the next. Assume we are concatenating $\gA_1, \ldots, \gA_n$. The start states of the concatenated graph are the start states of the first graph, $\gA_1$. The accept states of the concatenated graph are the accept states of the last graph, $\gA_n$. For any two graph $\gA_i$ and $\gA_{i+1}$, we connect each accept state of $\gA_i$ to each start state of $\gA_{i+1}$ with an $\epsilon$ transition.

As an example, consider the two graphs above.

In [11]:

```
g = gtn.concat([g1, g2])
gtn.draw(g, "figures/concat.svg", isymbols=symbols)
```

The concatenated graph is below and has the language $\{baac, babc\}$.

What is the identity graph for the concatenation function? The identity in a binary operation is the value which when used in the operation leaves the second input unchanged. In multiplication this would be $1$ since $c * 1 = c$ for any real value $c$.

What is the equivalent of the annihilator graph in the concatenation function? The annihilator in a binary operation is the value such that the operation with the annihilator always returns the annihilator. For multiplication $0$ is the annihilator since $c *0 = 0$ for any real value $c$.

The graph which accepts the empty string is the identity. The graph which does not accept any strings is the annihilator. See the figures below for an example of these two graphs.

In [12]:

```
# The identity graph which accepts the empty string:
fsa = gtn.Graph()
fsa.add_node(start=True, accept=True)
gtn.draw(fsa, "figures/concat_identity.svg", isymbols=symbols)
# The annihilator graph which does not accept any strings:
fsa = gtn.Graph()
fsa.add_node(start=True, accept=False)
gtn.draw(fsa, "figures/concat_annihilator.svg", isymbols=symbols)
```

The identity graph is a single node which is both a start and accept state. The language of the identity graph is the empty string. The annihilator graph is a single non accepting state. The language of the annihilator graph is the empty set. Note the subtle distinction between the language that contains the empty string and the language that is the empty set. The former can be written as $\{\epsilon\}$ whereas the latter is $\{\}$ (also commonly denoted by $\varnothing$).

Construct the concatenation of the two graphs below.

In [13]:

```
g1 = gtn.Graph()
g1.add_node(start=True)
g1.add_node()
g1.add_node(accept=True)
g1.add_node(accept=True)
g1.add_arc(src_node=0, dst_node=1, label=1)
g1.add_arc(src_node=1, dst_node=2, label=0)
g1.add_arc(src_node=1, dst_node=3, label=2)
g2 = gtn.Graph()
g2.add_node(start=True)
g2.add_node(start=True)
g2.add_node(accept=True)
g2.add_arc(src_node=0, dst_node=2, label=0)
g2.add_arc(src_node=1, dst_node=2, label=2)
gtn.draw(g1, "figures/concat_example_1.svg", isymbols=symbols)
gtn.draw(g2, "figures/concat_example_2.svg", isymbols=symbols)
```

In [14]:

```
gtn.draw(gtn.concat([g1, g2]), "figures/concat_example.svg", isymbols=symbols)
```

The concatenated graph is shown below.

Suppose we have a list of graphs to concatenate $\gA_1, \ldots, \gA_n$ where the $i$-th graph has $s_i$ start states and $a_i$ accept states. How many new arcs will the concatenated graph require?

For each consecutive pair of graphs $\gA_i$ and $\gA_{i+1}$, we need to add $a_i * s_{i+1}$ connecting arcs in the concatenated graph. So the total number of additional arcs is:

$$\sum_{i=1}^{n-1} a_i * s_{i+1}$$.

We've seen three basic operations so far:

**Closure:**The closed graph accepts any string in the input graph repeated zero or more times. The closure of a graph $\gA$ is denoted $\gA^*$.**Union:**The union graph accepts any string from any of the input graphs. The union of two graphs $\gA_1$ and $\gA_2$ is denoted $\gA_1 + \gA_2$.**Concatenate:**The concatenated graph accepts any string which can be formed by concatenating strings (respecting order) from the input graphs. The concatenation of two graphs $\gA_1$ and $\gA_2$ is denoted $\gA_1\gA_2$.

Assume you are given the following individual graphs $\gA_a$, $\gA_b$, and $\gA_c$, which recognize $a$, $b$, and $c$ respectively. Using only closure, union, and concatenate, construct the graph which recognizes any number of repeats of the strings $aa$, $bb$, and $cc$. For example $aabb$ and $bbaacc$ are in the language but $b$ and $ccaab$ are not.

In [15]:

```
fsa_a = gtn.Graph()
fsa_a.add_node(start=True)
fsa_a.add_node(accept=True)
fsa_a.add_arc(src_node=0, dst_node=1, label=0)
fsa_b = gtn.Graph()
fsa_b.add_node(start=True)
fsa_b.add_node(accept=True)
fsa_b.add_arc(src_node=0, dst_node=1, label=1)
fsa_c = gtn.Graph()
fsa_c.add_node(start=True)
fsa_c.add_node(accept=True)
fsa_c.add_arc(src_node=0, dst_node=1, label=2)
gtn.draw(fsa_a, "figures/fsa_a.svg", isymbols=symbols)
gtn.draw(fsa_b, "figures/fsa_b.svg", isymbols=symbols)
gtn.draw(fsa_c, "figures/fsa_c.svg", isymbols=symbols)
```

In [16]:

```
aa = gtn.concat([fsa_a, fsa_a])
bb = gtn.concat([fsa_b, fsa_b])
cc = gtn.concat([fsa_c, fsa_c])
fsa_repeats = gtn.closure(gtn.union([aa, bb, cc]))
gtn.draw(fsa_repeats, "figures/fsa_repeats.svg", isymbols=symbols)
```

First concatenate the individual graphs with themselves to get graphs which recognize $aa$, $bb$, and $cc$. Then take the union of the three concatenated graphs followed by the closure. The resulting graph is shown below. The equation to compute the desired graph is $\gA = (\gA_a\gA_a + \gA_b\gA_b + \gA_c\gA_c)^*$.