Basic Operations

$\newcommand{\vx}{{\bf x}} \newcommand{\vy}{{\bf y}} \newcommand{\gA}{\mathcal{A}} \newcommand{\gT}{\mathcal{T}} \newcommand{\gU}{\mathcal{U}} \DeclareMathOperator*{\LSE}{\textrm{LSE}}$An operation on a transducer (or acceptor) takes one or more transducers as input and outputs a transducer. You can think of these operations as functions on graphs. As a reminder, I'll use uppercase script letters to represent graphs, so $\gA$ for example can represent a graph. Functions will be denoted by lower case variables. So $f(\gA)$ is a function which takes as input a single graph and outputs a graph.

Closure

The closure, sometimes called the Kleene star, is a unary function (takes a single input) which can operate on either an acceptor or transducer. If the sequence $\vx$ is accepted by $\gA$, then zero or more copies of $\vx$ are accepted by the closure of $\gA$. More formally, if the language of an acceptor is $\mathcal{L}(\gA)$, then the language of the closure of $\gA$ is $\{\vx^n \mid \vx \in \mathcal{L}(\gA),\;\; n = 0, 1, \ldots, \}$. The notation $\vx^n$ means $\vx$ concatenated $n$ times. So $\vx^2$ is $\vx \vx$ and $\vx^0$ is the empty string. Usually the closure of an acceptor is denoted by $^*$, as in $\gA^*$. This is the same notation used in regular expressions.

The closure of a graph is easy to construct with the use of $\epsilon$ transitions. The language of the graph below is the string $aba$.

An example of a simple acceptor with language $\{aba\}$, in other words $\mathcal{L}(\gA) = \{aba\}$.

The closure of the graph needs to accept an arbitrary number of copies of $aba$ including the empty string. To accept the empty string we make the start state an accept state as well. To accept one or more copies of $aba$ we simply wire up the old accept states to the new start state with $\epsilon$ transitions.

The closure of the graph is shown below.

The closure $\gA^*$ of the graph above. The language of the graph is $\{\epsilon, aba, abaaba, \ldots\}$.

Example

You might notice that state $4$ in the above graph is not necessary. Consider an alternate construction for computing the closure of a graph. We could have made the state $0$ into an accept state and connected state $3$ to state $0$ with an $\epsilon$ transition, as in the graph below.

The closure $\gA^*$ of the graph above using an alternate, simpler construction which connects the accept state to the original start state with an $\epsilon$ transition. This construction does not work for every case.

For the graph above, this alternate construction works and requires fewer states and arcs. In the general case, this construction turns every start state into an accept state instead of adding a new start state. Give an example where this doesn't work? In other words, give an example where the graph from this modified construction is not the correct closure of the original graph.

An example for which the alternate construction does not work is shown in the graph below. The language of the graph is $a^nb$ (any number of $a$'s followed by a $b$) and the closure is $(a^nb)^*$, or any sequence that ends with $b$.

An acceptor for which the alternate construction for the closure does not yield the correct result.

If we follow the modified construction for the closure, as in the graph below, then the language would incorrectly include sequences that do not end with $b$ such as $a^*$.

The alternate construction which incorrectly computes the closure of the graph above.

The graph following the correct construction of the closure is shown below.

The correct closure of the graph above which has the language $(a^nb)^*$, which is any sequence which ends with $b$.

Union

The union takes as input two or more graphs and produces a new graph. The language of the resultant graph is the union of the languages of the input graphs. More formally let $\gA_1, \ldots, \gA_n$ be $n$ graphs. The language of the union graph is given by $\{ x \mid x \in \gA_i \textrm{ for some } i = 1, \ldots, n \}$. I'll occasionally use the $+$ sign to denote the union, as in $\gU = \gA_1 + \gA_2$.

Since we let a graph have multiple start states and multiple accept states, the union is easy to construct. A state in the union graph is a start state if it was a start state in one of the original graphs. A state in the union graph is an accept state if it was an accept state in one of the original graphs.

Consider the three graphs below with languages $\{ab, aba, abaa, \ldots\}$, $\{ba\}$, and $\{ac\}$ respectively.

Three acceptors with languages $\{ab, aba, abaa, \ldots\}$, $\{ba\}$, and $\{ac\}$ from left to right, respectively.

Notice in the union graph below the only visual distinction from the individual graphs is that the states are numbered consecutively from $0$ to $8$ indicating a single graph with nine states instead of three individual graphs. The language of the union graph below is $\{ab, aba, abaa, \ldots\} \cup \{ba\} \cup \{ac\}$.

The union of the three acceptors above with language $\{ab, aba, abaa, \ldots\} \cup \{ba\} \cup \{ac\}$.

Concatenate

Like union, concatenate produces a new graph given two or more graphs as input. The language of the concatenated graph is the set of strings which can be formed by any concatenation of strings from the individual graph. Concatenate is not commutative, the order of the input graphs matters. More formally the language of the concatenated graph is given by $\{\vx_1 \ldots \vx_n \mid \vx_1 \in \mathcal{L}(\gA_1), \ldots, \vx_n \in \mathcal{L}(\gA_n)\}$. Occasionally, I will denote concatenation by placing the graphs side-by-side. So $\gA_1 \gA_2$ represents the concatenation of $\gA_1$ and $\gA_2$.

The concatenated graph can be constructed from the original input graphs by connecting the accept states of one graph to the start states of the next. Assume we are concatenating $\gA_1, \ldots, \gA_n$. The start states of the concatenated graph are the start states of the first graph, $\gA_1$. The accept states of the concatenated graph are the accept states of the last graph, $\gA_n$. For any two graph $\gA_i$ and $\gA_{i+1}$, we connect each accept state of $\gA_i$ to each start state of $\gA_{i+1}$ with an $\epsilon$ transition.

The acceptor on left has language $\{ba\}$ and the acceptor on the right has language $\{ac, bc\}$.

As an example, consider the two graphs above.

The concatenated graph is below and has the language $\{baac, babc\}$.

The graph is the concatenation of the two graphs above and has the language $\{baac, babc\}$.

Example

What is the identity graph for the concatenation function? The identity in a binary operation is the value which when used in the operation leaves the second input unchanged. In multiplication this would be $1$ since $c * 1 = c$ for any real value $c$.

What is the equivalent of the annihilator graph in the concatenation function? The annihilator in a binary operation is the value such that the operation with the annihilator always returns the annihilator. For multiplication $0$ is the annihilator since $c *0 = 0$ for any real value $c$.

The graph which accepts the empty string is the identity. The graph which does not accept any strings is the annihilator. See the figures below for an example of these two graphs.

The identity (left) and the annihilator (right) for the concatenate operation. The language of the identity graph is the empty string $\{\epsilon\}$. The language of the annihilator graph is the empty set $\{\}$.

The identity graph is a single node which is both a start and accept state. The language of the identity graph is the empty string. The annihilator graph is a single non accepting state. The language of the annihilator graph is the empty set. Note the subtle distinction between the language that contains the empty string and the language that is the empty set. The former can be written as $\{\epsilon\}$ whereas the latter is $\{\}$ (also commonly denoted by $\varnothing$).


Example

Construct the concatenation of the two graphs below.

The acceptor on the right has the language $\{ba, bc\}$, the acceptor on the left has the language $\{a, c\}$.

The concatenated graph is shown below.

The concatenation of the two graphs above has the language $\{baa, bac, bca, bcc\}$.

Example

Suppose we have a list of graphs to concatenate $\gA_1, \ldots, \gA_n$ where the $i$-th graph has $s_i$ start states and $a_i$ accept states. How many new arcs will the concatenated graph require?

For each consecutive pair of graphs $\gA_i$ and $\gA_{i+1}$, we need to add $a_i * s_{i+1}$ connecting arcs in the concatenated graph. So the total number of additional arcs is:

$$\sum_{i=1}^{n-1} a_i * s_{i+1}$$

.


Summary

We've seen three basic operations so far:


Example

Assume you are given the following individual graphs $\gA_a$, $\gA_b$, and $\gA_c$, which recognize $a$, $b$, and $c$ respectively. Using only closure, union, and concatenate, construct the graph which recognizes any number of repeats of the strings $aa$, $bb$, and $cc$. For example $aabb$ and $bbaacc$ are in the language but $b$ and $ccaab$ are not.

The three individual automata with languages $\{a\}$, $\{b\}$, and $\{c\}$ from left to right, respectively.

First concatenate the individual graphs with themselves to get graphs which recognize $aa$, $bb$, and $cc$. Then take the union of the three concatenated graphs followed by the closure. The resulting graph is shown below. The equation to compute the desired graph is $\gA = (\gA_a\gA_a + \gA_b\gA_b + \gA_c\gA_c)^*$.

The even numbered repeats graph constructed from the individual token graphs using the operations $\gA = (\gA_a\gA_a + \gA_b\gA_b + \gA_c\gA_c)^*$.