>#ifndef __GLASGOW_HASKELL__
> import Int( Num(fromInt) )
>#endif
%----------------------------------------------------------------------
\section {A Brief Review of Discrete-Time Signal Processing}
%----------------------------------------------------------------------

%%------------------------------------------------------------
\subsection {Discrete-Time Signals}
%%------------------------------------------------------------
This section only provides some background terminology;
see~\cite{OppeScha75} or \cite{ProaMano92} for motivation and more
details.

A {\em discrete-time signal\/} is simply a sequence of
numbers. That is, a discrete-time signal is a function whose domain
is the set of integers and whose range is some set of numbers, e.g.,
the real or complex numbers. The elements of the sequence are called
{\em samples}. The $n$th sample of the signal $x$ is denoted by
$x_n$. For the rest of this paper, we will refer to discrete-time
signals as ``signals'' or ``sequences'' interchangeably.

It is mathematically convenient to consider signals to be
defined for all integers, with samples taking the value zero if not
otherwise specified. A signal $x$ is {\em finite-duration\/} if there
are integers $n_1$ and $n_2$ such that $x_n = 0$ whenever $n<n_1$ or
$n>n_2$. A {\em right-sided sequence\/} $x$ is one for which there is
an integer $n_1$ such that $x_n = 0$ for all $n < n_1$; if $n_1 \geq
0$ then the signal is also said to be {\em causal}. {\em
Left-sided\/} and {\em two-sided\/} sequences are defined similarly.
All signals in this paper are assumed to be causal with $n_1 = 0$.

Since signals are sequences of numbers, they are naturally
represented in Haskell using lists, particularly since we are assuming
causal signals indexed from zero and Haskell lists are indexed from
zero.

We will often use the notation $x^n_k$ to denote the list of
samples $[x_k,x_{k+1},\ldots,x_n]$ of the signal $x$. If $x$ is
potentially infinite in duration or if knowing its duration is not
important, we will replace $n$ by $\infty$. For a causal signal, this
convention means that $x^\infty_0$ represents the entire signal.

For convenience, we introduce the following type synonym:

> type Signal a = [a]

The element type is unspecified because signals can be
sequences of integers, floating point numbers, or complex numbers.
Thus, \verb~Signal Int~ represents a signal whose samples are
finite-precision integers, \verb~Signal Float~ represents a signal
whose samples are floating point numbers, etc.

%%------------------------------------------------------------
\subsection {The $Z$ Transform}
%%------------------------------------------------------------

Given a signal $x$, the {\em Z transform\/} of $x$ is the
complex-valued function $X(z)$ of the complex variable $z$ defined by
\begin{equation}
X(z) \isDefined \sum_{k=-\infty}^\infty x_k z^{-k}
\label{eq:Z-transform}
\end{equation}
for all $z$ for which the power series converges. If the
signal $x$ has $Z$ transform $X(z)$, we will write
$x \transformPair X(z)$.

The $Z$ transform is useful because it allows us to study
discrete-time systems (Section~\ref{sc:discrete-time-systems}) using
algebraic manipulations. This is because the $Z$ transform has a
number of useful properties, only three of which are needed here.
\begin{itemize}
\item {\em scaling}

If we multiply each sample of the signal $x$ by a scalar $a$
and call the resulting sequence $y$, then
\begin{displaymath}
y \transformPair Y(z) = aX(z), \qquad y = map\ (a*)\ x.
\end{displaymath}

\item {\em superposition}

Given two signals $x$ and $y$, if we add them
sample-by-sample to get a new signal $w$, then
\begin{displaymath}
w \transformPair W(z) = X(z) + Y(z).
\end{displaymath}

\item {\em shift}

Given a signal $x$ and an integer $k$, define the signal $y$
by $y_n = x_{n-k}$, i.e., $y$ is $x$ delayed by $k$ samples; then
\begin{displaymath}
y \transformPair Y(z) = z^{-k}X(z).
\end{displaymath}
\end{itemize}

%%------------------------------------------------------------
\subsection {Discrete-Time Systems}
\label{sc:discrete-time-systems}
%%------------------------------------------------------------

A {\em discrete-time system\/} is an operator that transforms
an {\em input\/} signal into an {\em output\/} signal. These systems
are commonly called {\em filters\/} when their function is to suppress
or enhance some feature of the input signal. In Section~\ref{sc:fir}
we will define a function for performing a particular type of
filtering. A filter is naturally represented as a function from
signals to signals:

> type Filter a = Signal a -> Signal a

%%------------------------------------------------------------
\subsection {Finite Impulse Response Filtering}
\label{sc:fir}
%%------------------------------------------------------------

A (causal) {\em finite impulse response\/} (FIR) filter is
characterized by a finite-duration signal $h$ called the {\em impulse
response\/} of the filter. For an arbitrary input signal $x$, the
samples of the output signal $y$ are defined by the {\em convolution
sum\/}
\begin{equation}
y_n = \sum^{M-1}_{k=0} h_k x_{n-k}
\label{eq:convolution}
\end{equation}
where $M$ is the length of $h$. We will use FIR filtering
twice, once in the definition of the function \verb~preemph~ in
Section~\ref{sc:preemph} and again in the definition of the function
\verb~cepstra~ in Section~\ref{sc:cepstra}. Hence we need to define a
Haskell function that implements FIR filtering. In doing so, we will
demonstrate how it is often possible to derive an executable
definition from a mathematical specification.

$Z$ transforms can be used to express the filtering operation.
Suppose signals $x$, $h$, and $y$ are related by
(\ref{eq:convolution}). Denoting the $Z$ transforms of these signals
by $X(z), Y(z)$, and $H(z)$, respectively, it is easy to show that
$Y(z) = H(z)X(z)$.

We now derive a Haskell function for performing FIR filtering:
\begin{eqnarray*}
Y(z) & = & H(z)X(z) \\
& = & \sum^{M-1}_{k=0} h_k z^{-k} X(z)\\
& = & \sum^{M-1}_{k=0} z^{-k} (h_kX(z)) \\
& = & \sum^{M-1}_{k=0} z^{-k} W^{(k)}(z)
\end{eqnarray*}
where we have set $W^{(k)}(z) = h_kX(z)$. Using Horner's rule,
\begin{eqnarray*}
Y(z) & = & W^{(0)}(z)\ +\ z^{-1}(W^{(1)}(z)
\ +\ \ldots \\
& & +\ z^{-1}(W^{(M-2)}(z)
\ +\ z^{-1} W^{(M-1)}(z)) \ldots\ ))
\end{eqnarray*}

Defining the transform relationships $w^{(k)} \transformPair
W^{(k)}(z)$, we can write this in the signal domain as
\begin{equation}
y = w^{(0)} \oplus (w^{(1)} \oplus (w^{(2)} \oplus \ldots
\oplus (w^{(M-2)} \oplus w^{(M-1)}) \ldots\ ))
\label{eq:ws}
\end{equation}
where the operator $\oplus$ is such that, for two arbitrary
signals $u$ and $v$ having $Z$ transforms $U(z)$ and $V(z)$, we have
\begin{displaymath}
u \oplus v \ \transformPair\ U(z) + z^{-1}V(z).
\end{displaymath}

From the superposition and shift properties of the $Z$
transform, it is clear that the operator $\oplus$ is implemented by
delaying the second operand by one sample and then performing
sample-by-sample addition. Let's call this operation \verb~delayAdd~:

> delayAdd :: (Num a) => Signal a -> Signal a -> Signal a
> (x:rxs) `delayAdd` y = x : combine rxs y
> [] `delayAdd` y = 0 : y

(The second equation treats the empty list as the all-zero
signal.) This definition uses a function \verb~combine~ which is
similar to the Standard Prelude function \verb~zipWith~, except that
\verb~combine~ returns a list as long as the {\em longer\/} of the two
lists, with the ``missing'' values from the shorter list assumed to be
zero:

> combine :: (Num a) => Signal a -> Signal a -> Signal a
> combine (x:rxs) (y:rys) = x+y : combine rxs rys
> combine xs [] = xs
> combine [] ys = ys
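
As a quick sanity check, here is a standalone sketch of these two
functions over plain lists (with the base cases of \verb~combine~ for
unequal lengths spelled out, as described above):

```haskell
-- Restatement of delayAdd and combine from the text, specialized to
-- plain lists; combine's base cases treat missing samples as zero.
delayAdd :: Num a => [a] -> [a] -> [a]
(x:rxs) `delayAdd` y = x : combine rxs y
[]      `delayAdd` y = 0 : y

combine :: Num a => [a] -> [a] -> [a]
combine (x:rxs) (y:rys) = x + y : combine rxs rys
combine xs      []      = xs
combine []      ys      = ys

-- delayAdd delays the second operand by one sample and adds:
-- delayAdd [1,2,3] [10,20,30] = [1, 2+10, 3+20, 30] = [1,12,23,30]
```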

Now we observe that (\ref{eq:ws}) can be rewritten using
$foldr1$:
\begin{eqnarray*}
y & = & foldr1\ \ (\oplus)\ \ [\: w^{(0)},\
w^{(1)},\ \ldots,\ w^{(M-1)} \:] \\
& = & foldr1\ \ (\oplus)\ \ [\: map\ \ (h_k *)\ \ x \:|\:
k \leftarrow [0..M-1] \:]
\end{eqnarray*}

Thus, we can implement an FIR filter having impulse response
\verb~hs~ on an input signal \verb~x~ as follows:\footnote{My thanks
to Mark Jones of Yale University who first suggested this form of the
definition.}

> fir :: Signal Float -> Filter Float
> fir hs x = foldr1 delayAdd [map (h*) x | h <- hs]
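
For instance, filtering $[1,2,3]$ with the impulse response
$[1,-0.5]$ should give the convolution $[1,\ 1.5,\ 2,\ -1.5]$. A
standalone sketch (helpers repeated so the fragment runs on its own,
and using \verb~Double~ rather than \verb~Float~):

```haskell
-- fir as in the text, over plain Num lists.
fir :: Num a => [a] -> [a] -> [a]
fir hs x = foldr1 delayAdd [map (h*) x | h <- hs]

delayAdd :: Num a => [a] -> [a] -> [a]
(x:rxs) `delayAdd` y = x : combine rxs y
[]      `delayAdd` y = 0 : y

combine :: Num a => [a] -> [a] -> [a]
combine (x:rxs) (y:rys) = x + y : combine rxs rys
combine xs      []      = xs
combine []      ys      = ys

-- fir [1,-0.5] [1,2,3] = [1.0, 1.5, 2.0, -1.5]
```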

%----------------------------------------------------------------------
\section {Linear Predictive Analysis}
%----------------------------------------------------------------------

The processor described here is rather conventional; we just
implement most of the blocks described in~\cite[pp.\
112--117]{RabiJuan93}.

%--------------------------------------------------
\subsection {Preemphasis}
\label{sc:preemph}
%--------------------------------------------------

The speech signal is first passed through a simple high-pass
filter to reduce the degree of spectral tilt. The system function
typically used is
\begin{displaymath}
H(z) = 1 - az^{-1}
\end{displaymath}
where $a$ is a real number less than but close to 1; a typical
value is 0.95. Hence, the preemphasis filter is an FIR filter with
the impulse response $[1,\ -a\ ]$.

> preemph :: Float -> Filter Float
> preemph a = fir [1,-a]
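
A quick numerical check, as a standalone sketch over \verb~Double~
(helpers repeated from above): for a constant input, every interior
output sample settles to $1-a$, the DC gain of $1 - az^{-1}$, which
shows the high-pass effect.

```haskell
preemph :: Double -> [Double] -> [Double]
preemph a = fir [1, -a]

fir :: Num a => [a] -> [a] -> [a]
fir hs x = foldr1 delayAdd [map (h*) x | h <- hs]

delayAdd :: Num a => [a] -> [a] -> [a]
(x:rxs) `delayAdd` y = x : combine rxs y
[]      `delayAdd` y = 0 : y

combine :: Num a => [a] -> [a] -> [a]
combine (x:rxs) (y:rys) = x + y : combine rxs rys
combine xs      []      = xs
combine []      ys      = ys

-- preemph 0.95 [1,1,1] ~= [1.0, 0.05, 0.05, -0.95]
```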

%--------------------------------------------------
\subsection {Framing}
%--------------------------------------------------

Because of the time-varying nature of speech, the signal is
broken into a list of equal-length {\em frames}, each offset from its
predecessor by a fixed number of samples. Typically, the frame length
is about $20$ ms and the offset is about $10$ ms. We need a function
\verb~frames~ that takes three arguments: a frame length, an offset,
and the signal. We can derive the definition of \verb~frames~ as
follows.
\begin{eqnarray*}
frames\ n\ m\ x^\infty_0
& = & [\: x^{n-1}_0,\ x^{m+n-1}_m,\ x^{2m+n-1}_{2m},\ \ldots\:] \\
& = & map\ (take\ n)\
      [\: x^\infty_0,\ x^\infty_m,\ x^\infty_{2m},\ \ldots\:] \\
& = & map\ (take\ n)\ (iterate\ (drop\ m)\ x_0^\infty).
\end{eqnarray*}
Thus, for an infinite signal,
\begin{displaymath}
frames\ n\ m\ = map\ (take\ n)\ \circ\ iterate\ (drop\ m).
\end{displaymath}

But the signal may be finite, so we define and use a function
\verb~check_for_end~ to avoid processing an infinite stream of empty
frames:

> type Frame a = [a]

> frames :: Int -> Int -> Signal a -> [Frame a]
> frames n m = check_for_end . map (take n) . iterate (drop m)

Here is the definition of \verb~check_for_end~:

> check_for_end = takeWhile (not . null)
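
For example, with a frame length of 3 and an offset of 2 (toy
numbers), a seven-sample signal yields four frames, the last one
short; a standalone sketch:

```haskell
-- frames and check_for_end as in the text, over plain lists.
frames :: Int -> Int -> [a] -> [[a]]
frames n m = check_for_end . map (take n) . iterate (drop m)

check_for_end :: [[a]] -> [[a]]
check_for_end = takeWhile (not . null)

-- frames 3 2 [1..7] = [[1,2,3],[3,4,5],[5,6,7],[7]]
```

The trailing short frame is why the windowing step must guard against
segments shorter than the window.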

%--------------------------------------------------
\subsection {Windowing}
%--------------------------------------------------

A {\em window\/} is used to smoothly attenuate samples at the
edges of a frame prior to analysis. A window is applied by
sample-by-sample multiplication. It is tempting to define the
windowing operation using the Standard Prelude function
\verb~zipWith~; however, we don't want to analyze a frame of data that
is too short, as would happen if the length of the entire signal
exceeded that of an integral number of frames by only a few samples.
Hence we define the function \verb~window~ using a utility function
\verb~window'~ that makes sure that the segment being windowed is at
least as long as the window itself; otherwise an empty frame is
returned. Note that \verb~window'~ is not a stream function; the
output list is accumulated in the third argument so that it can be
forgotten if the signal turns out to be shorter than the window.

> type Window = [Float]

> window :: Window -> Signal Float -> Frame Float
> window w x = window' w x []

> window' (w:rws) (x:rxs) buf = window' rws rxs (w*x : buf)
> window' (_:_) [] _ = []
> window' [] _ buf = buf
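
Tracing \verb~window'~ on toy data (a standalone sketch over
\verb~Double~) shows both behaviors: a segment shorter than the window
yields an empty frame, and the samples of a full frame emerge in
reverse order. The reversal is harmless here because the
autocorrelation computed from the frame in the next section is
unchanged by reversing the frame.

```haskell
window :: [Double] -> [Double] -> [Double]
window w x = window' w x []

window' :: [Double] -> [Double] -> [Double] -> [Double]
window' (w:rws) (x:rxs) buf = window' rws rxs (w*x : buf)
window' (_:_)   []      _   = []
window' []      _       buf = buf

-- window [1,1,1] [2,3,4,5] = [4.0,3.0,2.0]  (one frame, reversed)
-- window [1,1,1] [2,3]     = []             (segment too short)
```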

A popular window is the {\em Hamming window}. If the window
is to have $N$ samples, the formula for the $n$th sample is:
\begin{displaymath}
w(n) = 0.54 - 0.46 \cos \left( \frac{2\pi}{N-1} n \right)
\end{displaymath}

Here is the function for building a Hamming window of \verb~npts~
points:

> hamming_window :: Int -> Window
> hamming_window npts =
>     let angle = (2*pi)/fromInt (npts-1)
>     in [0.54 - 0.46*cos (angle*fromInt n) | n <- [0..npts-1]]
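
As a check, a five-point Hamming window should be symmetric, with
$0.54-0.46=0.08$ at the edges and $0.54+0.46=1.0$ at the center. The
sketch below uses the standard \verb~fromIntegral~ where the text uses
the hbc-era \verb~fromInt~:

```haskell
hamming_window :: Int -> [Double]
hamming_window npts =
    let angle = (2*pi) / fromIntegral (npts-1)
    in [0.54 - 0.46 * cos (angle * fromIntegral n) | n <- [0..npts-1]]

-- hamming_window 5 ~= [0.08, 0.54, 1.0, 0.54, 0.08]
```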

One advantage of lazy evaluation is that by defining the
windowing function the way we have, we can do the windowing and the
framing simultaneously. Thus, it turns out that we don't need the
function \verb~frames~. Let the first argument be the window width
and the second argument be the offset between frames.

> windows :: Int -> Int -> Signal Float -> [Frame Float]
> windows n m =
>     let hw = hamming_window n
>         apply_window = window hw
>     in
>     check_for_end . map apply_window . iterate (drop m)

%--------------------------------------------------
\subsection {The Autocorrelation Sequence}
%--------------------------------------------------

The function \verb~autocorr~ computes the autocorrelation
sequence for the signal $x$ as a prelude to computing the coefficients
of the optimal $p$th-order linear prediction filter
(Section~\ref{sc:Durbin}). The $i$th sample of the autocorrelation
sequence for an $N$-point signal is defined by the equation
\begin{displaymath}
r_i = \left\{ \begin{array}{ll}
\sum^{N-1-i}_{n=0} x_n x_{n+i} & 0 \leq i < N \\
0 & i \geq N
\end{array} \right.
\end{displaymath}

The summation operation is just the familiar dot product of
linear algebra. The dot product can be coded in Haskell as follows:

> x `dot` y = sum (zipWith (*) x y)

However, the operator \verb~`dot`~ is actually more general
than the familiar dot product because the two argument lists can have
different lengths; the longer list is essentially truncated to the
length of the shorter list. For derivation purposes, let `$\odot$'
denote this more general operation, and let `$\cdot$' denote the
standard dot product. In general,
\begin{displaymath}
x^n_0 \cdot y^n_0 = x_0^{n+k} \odot y^n_0 =
x^n_0 \odot y^{n+k}_0
\end{displaymath}
for all $k \geq 0$, whereas $x^{n+k}_0 \cdot y^n_0$ would be
undefined. We can now formally derive the Haskell function for
computing the non-zero portion of the autocorrelation sequence as
follows.
\begin{eqnarray*}
[r_0,\ r_1,\ \ldots,\ r_{N-1} ]
& = & [ \sum^{N-1-i}_{n=0} x_n x_{n+i} \ |\ i \leftarrow [0..N-1]\:]\\
& = & [ x^{N-1-i}_0 \cdot x^{N-1}_i \ |\ i \leftarrow [0..N-1]\:]\\
& = & [ x^{N-1}_0 \odot x^{N-1}_i \ |\ i \leftarrow [0..N-1]\:]\\
& = & map\ \ (x^{N-1}_0\ \odot)\ \ [ x^{N-1}_i \ |\ i \leftarrow
      [0..N-1]\:] \\
& = & map\ \ (x^{N-1}_0\ \odot)\ \ (tails\ x^{N-1}_0)
\end{eqnarray*}
where the function $tails$ is such that
\begin{displaymath}
tails\ x^N_0 = [\ x^N_0,\ x^N_1,\ x^N_2,\ \ldots,\ x^N_N\ ].
\end{displaymath}

To get the complete autocorrelation sequence, we just append
an infinite list of zeros:

> autocorr x = map (x `dot`) (tails x) ++ repeat 0.0

The function \verb~tails~ is defined as follows:\footnote{A
similar function is provided in a module of the Chalmers hbc
library.}

> tails xs@(_:rxs) | null rxs = [xs]
>                  | otherwise = xs : tails rxs
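
For a toy three-point signal $x = [1,2,3]$ we expect
$r_0 = 1+4+9 = 14$, $r_1 = 1\cdot2 + 2\cdot3 = 8$, $r_2 = 1\cdot3 = 3$,
and zeros thereafter. A standalone sketch over \verb~Double~:

```haskell
dot :: Num a => [a] -> [a] -> a
x `dot` y = sum (zipWith (*) x y)

-- tails as in the text (defined for non-empty lists).
tails :: [a] -> [[a]]
tails xs@(_:rxs) | null rxs  = [xs]
                 | otherwise = xs : tails rxs

autocorr :: [Double] -> [Double]
autocorr x = map (x `dot`) (tails x) ++ repeat 0.0

-- take 5 (autocorr [1,2,3]) = [14.0, 8.0, 3.0, 0.0, 0.0]
```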

In a conventional imperative language we would need to pass
{\em two\/} arguments to this function, the second specifying how many
autocorrelation values are to be computed. The function would then
return an array containing precisely that many values. However, in a
lazy language the autocorrelation values will only be computed as
needed, so we do not need to specify such a value for this function.

The value $r_0$ is the {\em energy} of the signal. The
logarithm of the energy is often included as a parameter in feature
vectors used in recognizers.

%--------------------------------------------------
\subsection {The Durbin Algorithm}
\label{sc:Durbin}
%--------------------------------------------------

The next step is to solve for the optimum linear predictor
filter coefficients for a given order $p$. This can be done
efficiently using the {\em Durbin Algorithm}. The steps are listed in
Figure~\ref{fg:durbin} as they appear in~\cite[p.\ 411]{RabiScha78}
but with a few changes in notation. Basically, we successively
compute the optimum prediction filters for each order $i$ starting
with $i=1$. The optimum $i$th-order filter coefficients, $a^{(i)} =
[a^{(i)}_1,\ldots,a^{(i)}_i]$, are calculated from the optimum
$(i-1)$th-order filter coefficients, $a^{(i-1)} = [
a^{(i-1)}_1,\ldots, a^{(i-1)}_{i-1}]$, the mean squared error of that
filter, $e_{i-1}$, the autocorrelation value $r_i$, and the
autocorrelation values $r_1, r_2,\ldots, r_{i-1}$.

\begin{figure}
\begin{center}
\begin{tabular}{ll}
Initialization: & $e_0 = r_0$ \\[0.10in]
\shortstack{ Iteration: \\ $(i=1,2,\ldots,p)$} &
\begin{minipage}{2.850in}
\begin{eqnarray*}
k_i & = & \left( r_i - \sum^{i-1}_{j=1} a_j^{(i-1)} r_{i-j}
          \right) \left/ \: e_{i-1} \right. \\
a_j^{(i)} & = & \left\{ \begin{array}{ll}
                a_j^{(i-1)} - k_i a_{i-j}^{(i-1)}, &
                    j = 1, \ldots, i-1 \\
                k_i, & j = i
                \end{array} \right. \\
e_i & = & (1 - k_i^2) e_{i-1}
\end{eqnarray*}
\end{minipage}
\end{tabular}
\end{center}
\caption[]{The Durbin Algorithm.}
\label{fg:durbin}
\end{figure}

This description can be straightforwardly realized in a
conventional language using arrays; we will implement the algorithm in
Haskell without using arrays. An optimal predictor is represented by
the pair of its mean squared error and its coefficient list:

> type LPA_Filter = (Float, [Float])

%%------------------------------------------------------------
\subsubsection {Updating the Predictor Polynomial}
%%------------------------------------------------------------

First, we derive a function \verb~new_as~ that takes the
optimal $(i-1)$th-order linear prediction filter coefficients and the
$i$th reflection coefficient and returns the optimal $i$th-order
linear prediction filter coefficients. From Figure~\ref{fg:durbin},
the optimal $i$th-order linear prediction filter is given by
\begin{eqnarray*}
a^{(i)} & \isDefined &
[ a^{(i)}_1,\ a^{(i)}_2,\ \ldots,\ a^{(i)}_i ] \\
& = & [ a^{(i-1)}_1 - k_i a^{(i-1)}_{i-1},\ \
a^{(i-1)}_2 - k_i a^{(i-1)}_{i-2},\ \
\ldots, \\
& & \ \ a^{(i-1)}_{i-1} - k_i a^{(i-1)}_1,\ \
k_i\ ].
\end{eqnarray*}

Note that the subtraction operation involves the first and
last coefficients, the second and second-to-last coefficients, etc.
If we place two copies of $a^{(i-1)}$ side by side,
\begin{displaymath}
[ a^{(i-1)}_1,\ a^{(i-1)}_2,\ \ldots,\
a^{(i-1)}_{i-2},\ a^{(i-1)}_{i-1} ]\ \ \
[ a^{(i-1)}_1,\ a^{(i-1)}_2,\ \ldots,\
a^{(i-1)}_{i-2},\ a^{(i-1)}_{i-1} ]
\end{displaymath}
it is easy to see that what is called for is some type
of \verb~foldr~ operation that consumes the list on the right as it
moves through the list on the left. Indeed, if we define the binary
operator $\oplus$ by (assuming that $k_i$ is in scope)
\begin{equation}
a \oplus (p,\ b:bs) = ((a-k_i*b):p,\ bs)
\label{eq:oplus}
\end{equation}
then
\begin{equation}
a^{(i)} = fst\ \ (foldr\ \ (\oplus)\ \ ([k_i], a^{(i-1)})\ \
a^{(i-1)}).
\label{eq:new-as}
\end{equation}

Combining (\ref{eq:oplus}) and (\ref{eq:new-as}) into a
single Haskell definition yields:

> new_as as k = fst (foldr op ([k],as) as)
>     where a `op` (p,b:bs) = ((a-k*b):p,bs)
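
A small check against the formula: for $i=2$ with $a^{(1)} = [0.5]$
and $k_2 = 0.25$, we expect
$a^{(2)} = [a^{(1)}_1 - k_2 a^{(1)}_1,\ k_2] = [0.375,\ 0.25]$. A
standalone sketch over \verb~Double~:

```haskell
new_as :: [Double] -> Double -> [Double]
new_as as k = fst (foldr op ([k], as) as)
  where a `op` (p, b:bs) = ((a - k*b):p, bs)

-- new_as [0.5] 0.25 = [0.375, 0.25]
-- new_as [] k = [k]  (the first-order filter is just [k_1])
```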

%%------------------------------------------------------------
\subsubsection {Computing the Reflection Coefficient}
%%------------------------------------------------------------

Next we consider the calculation of the $i$th reflection
coefficient. Ignoring the division by $e_{i-1}$ for the moment,
consider just the expression
\begin{equation}
r_i - \sum^{i-1}_{j=1} a^{(i-1)}_j r_{i-j}.
\label{eq:k-numerator}
\end{equation}

For the summation term we can use the same approach we used
for calculating the new prediction filter coefficients: a
\verb~foldr~ operation that consumes one list while moving through
another. Defining the binary operator $\otimes$ by
\begin{equation}
a \otimes (s,\ b:bs) = (s + a*b,\ bs)
\label{eq:otimes}
\end{equation}
the summation is the first component of the pair returned by
\begin{equation}
foldr\ \ (\otimes)\ \ (0,rs)\ \ as
\label{eq:k-summation}
\end{equation}
where it is assumed that the length of $rs$ is at least as
great as that of $as$ (it is in practice). The second component of
the pair returned by (\ref{eq:k-summation}) is a list having $r_i$ as
its head, which, as we see from (\ref{eq:k-numerator}), is what we are
subtracting the summation from. Combining (\ref{eq:otimes}) and
(\ref{eq:k-summation}) with the division by $e_{i-1}$ yields the
following definition.

> new_k rs (e,as) = (r - summation)/e
>     where (summation,r:_) = foldr op (0,rs) as
>           a `op` (s,b:bs) = (s+a*b,bs)
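
Checking against the formula for $i=2$: with $a^{(1)} = [0.5]$,
$rs = [r_1, r_2] = [0.8, 0.6]$, and $e_1 = 0.5$ (toy numbers), we
expect $k_2 = (r_2 - a_1 r_1)/e_1 = (0.6 - 0.4)/0.5 = 0.4$. A
standalone sketch over \verb~Double~:

```haskell
new_k :: [Double] -> (Double, [Double]) -> Double
new_k rs (e, as) = (r - summation) / e
  where (summation, r:_) = foldr op (0, rs) as
        a `op` (s, b:bs) = (s + a*b, bs)
```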

%%------------------------------------------------------------
\subsubsection {One Step in the Durbin Algorithm}
%%------------------------------------------------------------

We can now define the function that produces the optimal
$i$th-order linear prediction filter from the autocorrelation sequence
and the optimal $(i-1)$th-order filter.

> durbin_step rs (e,as) = let k = new_k rs (e,as)
>                         in ((1-k*k)*e, new_as as k)

%%------------------------------------------------------------
\subsubsection {The Durbin Algorithm}
%%------------------------------------------------------------

To get the optimal $p$th-order linear predictor given the
autocorrelation sequence, we just iteratively apply the function
\verb~durbin_step~ to the appropriate initial condition and select the
$p$th element of the resulting sequence. If $(r : rrs)$ is the
autocorrelation sequence for a frame, then the expression
\begin{displaymath}
iterate\ \ (durbin\_step\ rrs)\ \ (r,[\ ])
\end{displaymath}
produces the list
\begin{displaymath}
[\: (e_0,[\ ]),\ (e_1,a^{(1)}),\ (e_2,a^{(2)}),\ \ldots \:]
\end{displaymath}
from which we want the element at position $p$.

> durbin :: Int -> Signal Float -> LPA_Filter
> durbin p (r:rrs) = (iterate (durbin_step rrs) (r,[]))!!p
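
The whole algorithm can be exercised on a toy autocorrelation
sequence. For $r = [1,\ 0.5,\ 0.25]$ (that of a first-order process),
the first-order predictor is $a_1 = r_1/r_0 = 0.5$ with error
$e_1 = (1-0.25)\cdot 1 = 0.75$, and the second-order solution adds a
zero coefficient. A self-contained sketch gathering the pieces over
\verb~Double~:

```haskell
durbin :: Int -> [Double] -> (Double, [Double])
durbin p (r:rrs) = iterate (durbin_step rrs) (r, []) !! p

durbin_step :: [Double] -> (Double, [Double]) -> (Double, [Double])
durbin_step rs (e, as) = let k = new_k rs (e, as)
                         in ((1 - k*k)*e, new_as as k)

new_k :: [Double] -> (Double, [Double]) -> Double
new_k rs (e, as) = (r - summation) / e
  where (summation, r:_) = foldr op (0, rs) as
        a `op` (s, b:bs) = (s + a*b, bs)

new_as :: [Double] -> Double -> [Double]
new_as as k = fst (foldr op ([k], as) as)
  where a `op` (p, b:bs) = ((a - k*b):p, bs)

-- durbin 1 [1, 0.5, 0.25] = (0.75, [0.5])
-- durbin 2 [1, 0.5, 0.25] = (0.75, [0.5, 0.0])
```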

%%------------------------------------------------------------
\subsection {Conversion to Cepstral Coefficients}
%%------------------------------------------------------------

Given a discrete-time linear system with the system function
\begin{displaymath}
\frac{G}{1 - \sum^p_{i=1} a_i z^{-i}}
\end{displaymath}
the sequence of {\em cepstral coefficients\/} $c^\infty_0$ for
this system is defined by
\begin{equation}
c_n = \left\{ \begin{array}{ll}
\ln G & n=0 \\[0.10in]
a_n + \frac{1}{n} \sum^{n-1}_{k=1} a_k \cdot (n-k) \cdot c_{n-k} &
1 \leq n \leq p \\[0.10in]
\frac{1}{n} \sum^p_{k=1} a_k \cdot (n-k) \cdot c_{n-k} & n > p
\end{array} \right.
\label{eq:cepstra}
\end{equation}

Note that the summation terms of (\ref{eq:cepstra}) are just
convolution sums for the FIR filter with $a^{(p)}$ as its impulse
response and the scaled cepstral coefficient signal $[c_1, 2c_2, 3c_3,
4c_4, \ldots]$. Also, the second and third formulas are the same
except for the addition of the $a_i$'s to the scaled and filtered
cepstral coefficients. Hence, we can use the function \verb~delayAdd~
that we defined when developing the FIR filtering function \verb~fir~
(Section~\ref{sc:fir}). Also, the gain term is usually ignored in
forming the feature vector, so we just compute the sequence
$[c_1,\ c_2,\ c_3,\ \ldots]$:

> cepstra :: LPA_Filter -> Signal Float
> cepstra (_,as) = cs
>     where cs = as `delayAdd` xs
>           xs = zipWith (/) (fir as (zipWith (*) [1..] cs)) [2..]
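
For a single-pole model ($p=1$, $a_1 = 0.5$) the recursion gives the
closed form $c_n = a_1^n/n$, so the first three cepstral coefficients
should be $0.5$, $0.125$, and $0.125/3 \approx 0.0417$. A
self-contained sketch over \verb~Double~ (gain component ignored, as
in the text; helper functions repeated):

```haskell
cepstra :: (Double, [Double]) -> [Double]
cepstra (_, as) = cs
  where cs = as `delayAdd` xs
        xs = zipWith (/) (fir as (zipWith (*) [1..] cs)) [2..]

fir :: Num a => [a] -> [a] -> [a]
fir hs x = foldr1 delayAdd [map (h*) x | h <- hs]

delayAdd :: Num a => [a] -> [a] -> [a]
(x:rxs) `delayAdd` y = x : combine rxs y
[]      `delayAdd` y = 0 : y

combine :: Num a => [a] -> [a] -> [a]
combine (x:rxs) (y:rys) = x + y : combine rxs rys
combine xs      []      = xs
combine []      ys      = ys

-- take 3 (cepstra (0, [0.5])) ~= [0.5, 0.125, 0.0416667]
```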

Because there is no terminating condition and because of the
recursion---the sequence \verb~cs~ appears in the definition of the
sequence \verb~xs~---this definition relies on lazy evaluation in a
fundamental way.

\subsection {Putting it all together}

The function \verb~analyze~ transforms a windowed frame of
samples to a pair of values, a log gain term and a list of cepstral
coefficients. The first argument is the order of the linear prediction
analysis. The second argument is the number of cepstral coefficients.

> analyze :: Int -> Int -> Frame Float -> (Float, [Float])
> analyze p q wxs = let
>       rs = autocorr wxs
>       log_energy = log10 (head rs)
>       cep = take q (cepstra (durbin p rs))
>    in (log_energy, cep)

> log10 :: Float -> Float
> log10 x = log x / log 10

\subsection {The Main Program}

Figure~\ref{fg:complete} shows the main Haskell program,
including the speech/feature I/O functions.

\begin{figure}
\caption[]{The linear predictive speech analysis main program}
\label{fg:complete}
\end{figure}