Strings

This chapter covers notes on strings-related algorithms.

In today's world, there's no such thing as a "string." There are ASCII strings and Unicode strings. While we can classify them as a single type "string," that classification isn't very useful β€” they're different enough that we must treat them separately. Accordingly, whenever we work with "strings," we must always β€” always β€” ask: Are working with ASCII strings or Unicode string?

Definitions

alphabet. A non-empty set of symbols is called an alphabet, denoted Ξ£.{\Sigma.}

For example, the set { 0,1 }{\set{0,1}} is an alphabet of two symbols. The set { a,b,c,d }{\set{a,b,c,d}} is an alphabet of four symbols.

string. Given an alphabet Ξ£,{\Sigma,} the finite sequence

Ο‰=Ο‰i=(Ο‰0Β Β Β Ο‰1Β Β Β Ο‰2   …   ωnβˆ’1) {\omega} = \omega_i = \ar{{\omega}_0~~~{\omega}_1~~~{\omega}_2~~~\ldots~~~{\omega}_{n-1}}

with Ο‰i∈Σ{\omega_i \in \Sigma} and n∈N{n \in \nat} is called a string over Ξ£{\Sigma} of length n.{n.} Given a symbol Ο‰i,{\omega_i,} we say that Ο‰0{\omega_0} is the first symbol, and Ο‰nβˆ’1{\omega_{n-1}} is the last symbol. We may denote the character o We denote the length n{n} of Ο‰{\omega} as lengthΒ (Ο‰).{\len(\omega).} If lengthΒ (Ο‰)=0,{\len(\omega)=0,} then we call the Ο‰{\omega} the empty string, and denote it Ο΅.{\epsilon.}

Note that we will denote strings with double quotes. Thus, the string cookie is denoted cookie.{{cookie}.} The string 2Ο€r{2 \pi r} is denoted 2Ο€r.{{2 \pi r}.} Additionally, the empty string may also be denoted with the Greek letter Ξ».{\lambda.}

Because how small subscripts are, we will adopt a few conventions when working with strings. First, given a string S,{S,} we will reference the first character with S[i],{S \ix{i},} where i{i} is an index. For example, given the string cake, writing cake[0]{{cake}\ix{0}} returns the string c,{{c},} and writing cake[2]{{cake}\ix{2}} returns the string k.{{k}.}

kleene closure. Given an alphabet Ξ£,{\Sigma,} the set of all strings over Ξ£{\Sigma} is called the Kleene closure of Ξ£,{\Sigma,} denoted Ξ£βˆ—.{\Sigma^*.}

language. Given a Kleene closure Ξ£βˆ—{\Sigma^*} and a set of strings L,{\cal{L},} if LβŠ†Ξ£βˆ—,{\cal{L} \subseteq \Sigma^*,} then L{\cal{L}} is a language.

word. Given a language L,{\cal{L},} we call each string β„“βˆˆL{\ell \in \cal{L}} a word.

character. Given a word W=c0Β c1 … cnβˆ’1{W = {c_0 ~ c_1 ~ \ldots ~ c_{n-1}}} of length n,{n,} we call each symbol ciβˆˆβ„“{c_i \in \ell} with i=0,1,…,nβˆ’1{i = 0,1,\ldots,n-1} a character.

character encoding. Let C{C} denote a character set, and { xi }i=1n{\set{x_i}_{i=1}^{n}} denote a finite range of integers { x0,x1,…,xn }{\set{x_0, x_1, \ldots, x_{n}}}. Then the injection

encode:{ xi }i=1n⇀C \df{encode} : \set{x_i}_{i=1}^n \inj C

is called a character encoding, whose domain is called the code space dom(encode),{\dom{\df{encode}},} and whose image β„‘(encode){\image{\df{(encode)}}} is called the coded character set. For each (xi,c)∈encode,{(x_i, c) \in \df{encode},} we call c{c} the coded character, and xi{x_i} by the code point of c.{c.}

k-Permutation String

Given an alphabet Ξ£,{\Sigma,} we define a k-permutation string as a permutation of a subset of Ξ£.{\Sigma.} That is, a string composed entirely of distinct symbols.

k-permutation?

Let S{S} be a finite, non-empty string s0,s1,…,snβˆ’1{s_0,s_1,\ldots,s_{n-1}} over an alphabet Ξ£.{\Sigma.} The function k-permutation?{\df{k-permutation?}} returns 0{0} if, and only if, each si{s_i} with 0<i<nβˆ’1{0 \lt i \lt n-1} and i,n∈N{i,n \in \nat} is distinct.

  1. L←lengthΒ S{\let{L}{\len{S}}}
  2. seen←newΒ hash-table{\let{seen}{\df{new}~\df{hash-table}}}
  3. seenβ‡š(S0,0){seen \Lleftarrow (S_0, 0)}
  4. i←1{\let{i}{1}}
  5. while i<L{i \lt L} do
    1. if Si∈seen{S_i \in seen} then return 0{0}
    2. else seenβ‡š(Si,i){seen \Lleftarrow (S_i, i)}
    3. i++{i \inc}
  6. return 1{1}