ghc/tests/programs/dmgob_native2/LPA.lhs

   1 \begin{comment}
   2
   3 > module LPA where
   4 >#ifndef __GLASGOW_HASKELL__
   5 > import Trace
   6 >#endif
   7
   8 \end{comment}
   9
  10
  11 %----------------------------------------------------------------------
  12 \section {A Brief Review of Discrete-Time Signal Processing}
  13 %----------------------------------------------------------------------
  14
  15 %%------------------------------------------------------------
  16 \subsection {Discrete-Time Signals}
  17 %%------------------------------------------------------------
  18
  19         This section only provides some background terminology;
  20 see~\cite{OppeScha75} or \cite{ProaMano92} for motivation and more
  21 details.
  22
  23
  24         A {\em discrete-time signal\/} is simply a sequence of
  25 numbers.  That is, a discrete-time signal is a function whose domain
  26 is the set of integers and whose range is some set of numbers, e.g.,
  27 the real or complex numbers.  The elements of the sequence are called
  28 {\em samples}.  The $n$th sample of the signal $x$ is denoted by
  29 $x_n$.  For the rest of this paper, we will refer to discrete-time
  30 signals as ``signals'' or ``sequences'' interchangeably.
  31
  32
  33         It is mathematically convenient to consider signals to be
  34 defined for all integers, with samples taking the value zero if not
  35 otherwise specified.  A signal $x$ is {\em finite-duration\/} if there
  36 are integers $n_1$ and $n_2$ such that $x_n = 0$ whenever $n<n_1$ or
  37 $n>n_2$.  A {\em right-sided sequence\/} $x$ is one for which there is
  38 an integer $n_1$ such that $x_n = 0$ for all $n < n_1$; if $n_1 \geq
  39 0$ then the signal is also said to be {\em causal}.  {\em
  40 Left-sided\/} and {\em two-sided\/} sequences are defined similarly.
  41 All signals in this paper are assumed to be causal with $n_1 = 0$.
  42
  43
  44         Since signals are sequences of numbers, they are naturally
  45 represented in Haskell using lists, particularly since we are assuming
  46 causal signals indexed from zero and Haskell lists are indexed from
  47 zero.
  48
  49
  50         We will often use the notation $x^n_k$ to denote the list of
  51 samples $[x_k,x_{k+1},\ldots,x_n]$ of the signal $x$.  If $x$ is
  52 potentially infinite in duration or if knowing its duration is not
  53 important, we will replace $n$ by $\infty$.  For a causal signal, this
  54 convention means that $x^\infty_0$ represents the entire signal.
  55
  56
  57         For convenience, we introduce the following type synonym:
  58         \begin{verbatim}
  59
  60 > type Signal a = [a]
  61
  62 \end{verbatim}
  63         The element type is unspecified because signals can be
  64 sequences of integers, floating point numbers, or complex numbers.
  65 Thus, \verb~Signal Int~ represents a signal whose samples are
  66 finite-precision integers, \verb~Signal Float~ represents a signal
  67 whose samples are floating point numbers, etc.
  68
  69
  70 %%------------------------------------------------------------
  71 \subsection {The $Z$ Transform}
  72 %%------------------------------------------------------------
  73
  74         Given a signal $x$, the {\em Z transform\/} of $x$ is the
  75 complex-valued function $X(z)$ of the complex variable $z$ defined by
  76         \begin{equation}
  77         X(z) \isDefined \sum_{k=-\infty}^\infty x_k z^{-k}
  78         \label{eq:Z-transform}
  79         \end{equation}
  80         for all $z$ for which the power series converges.  If the
  81 signal $x$ has $Z$ transform $X(z)$, we will write
  82         \[
  83         x \transformPair X(z)
  84         \]
  85         The $Z$ transform is useful because it allows us to study
  86 discrete-time systems (Section~\ref{sc:discrete-time-systems}) using
  87 algebraic manipulations.  This is because the $Z$ transform has a
  88 number of useful properties, only three of which are needed here.
  89         \begin{enumerate}
  90
  91         \item {\em scaling}
  92
  93         If we multiply each sample of the signal $x$ by a scalar $a$
  94 and call the resulting sequence $y$, then
  95         \[
  96         y \transformPair Y(z) = aX(z) \transformPair map\ (a*)\ x
  97         \]
  98
  99         \item {\em superposition}
 100
 101         Given two signals $x$ and $y$, if we add them
 102 sample-by-sample to get a new signal $w$, then
 103         \[
 104         w \transformPair W(z)  =  X(z) + Y(z)
 105         \]
 106
 107         \item {\em shift}
 108
 109         Given a signal $x$ and an integer $k$, define the signal $y$
 110 by
 111         \[
 112         y_n = x_{n-k}
 113         \]
 114         we have
 115         \[
 116         y \transformPair Y(z) = z^{-k}X(z)
 117         \]
 118
 119
 120         \end{enumerate}
 121
 122
 123 %%------------------------------------------------------------
 124 \subsection {Discrete-Time Systems}
 125 \label{sc:discrete-time-systems}
 126 %%------------------------------------------------------------
 127
 128         A {\em discrete-time system\/} is an operator that transforms
 129 an {\em input\/} signal into an {\em output\/} signal.  These systems
 130 are commonly called {\em filters\/} when their function is to suppress
 131 or enhance some feature of the input signal.  In Section~\ref{sc:fir}
 132 we will define a function for performing a particular type of
 133 filtering.
 134         \begin{verbatim}
 135
 136 > type Filter a = Signal a -> Signal a
 137
 138 \end{verbatim}
 139
 140
 141 %%------------------------------------------------------------
 142 \subsection {Finite Impulse Response Filtering}
 143 \label{sc:fir}
 144 %%------------------------------------------------------------
 145
 146         A (causal) {\em finite impulse response\/} (FIR) filter is
 147 characterized by a finite-duration signal $h$ called the {\em impulse
 148 response\/} of the filter.  For an arbitrary input signal $x$, the
 149 samples of the output signal $y$ are defined by the {\em convolution
 150 sum\/}
 151         \begin{equation}
 152         y_n = \sum^{M-1}_{k=0} h_k x_{n-k}
 153         \label{eq:convolution}
 154         \end{equation}
 155         where $M$ is the length of $h$. We will use FIR filtering
 156 twice, once in the definition of the function \verb~preemph~ in
 157 Section~\ref{sc:preemph} and again in the definition of the function
 158 \verb~cepstra~ in Section~\ref{sc:cepstra}.  Hence we need to define a
 159 Haskell function that implements FIR filtering.  In doing so, we will
 160 demonstrate how it is often possible to derive an executable
 161 definition from a mathematical specification.
 162
 163
 164         $Z$ transforms can be used to express the filtering operation.
 165 Suppose signals $x$, $h$, and $y$ are related by
 166 (\ref{eq:convolution}).  Denoting the $Z$ transforms of these signals
 167 by $X(z), Y(z)$, and $H(z)$, respectively, it is easy to show that
 168         \begin{equation}
 169         Y(z) = H(z)X(z)
 170         \end{equation}
 171         We now derive a Haskell function for performing FIR filtering.
 172         \begin{eqnarray*}
 173         Y(z) & = & H(z)X(z) \\
 174              & = & \sum^{M-1}_{k=0} h_k z^{-k} X(z)\\
 175              & = & \sum^{M-1}_{k=0} z^{-k} (h_kX(z)) \\
 176              & = & \sum^{M-1}_{k=0} z^{-k} W^{(k)}(z)
 177         \end{eqnarray*}
 178         where we have set $W^{(k)}(z) = h_kX(z)$.  Using Horner's rule,
 179         \begin{eqnarray*}
 180         Y(z) & = & W^{(0)}(z)\ +\ z^{-1}(W^{(1)}(z)
 181                                +\ z^{-1}(W^{(2)}(z)
 182                                +\ \ldots \nonumber \\
 183              &   & \hspace{1.2in}
 184                                +\ z^{-1}(W^{(M-2)}(z)
 185                                +\ z^{-1} W^{(M-1)}(z)) \ldots\ ))
 186         \end{eqnarray*}
 187         %
 188         %
 189         Defining the transform relationships $w^{(k)} \transformPair
 190 W^{(k)}(z)$, we get
 191         \begin{equation}
 192         y  =  w^{(0)} \oplus (w^{(1)} \oplus (w^{(2)} \oplus \ldots
 193                    \oplus (w^{(M-2)} \oplus w^{(M-1)}) \ldots\ ))
 194         \label{eq:ws}
 195         \end{equation}
 196         where the operator $\oplus$ is such that, for two arbitrary
 197 signals $u$ and $v$ having $Z$ transforms $U(z)$ and $V(z)$ we have
 198         \[
 199         u \oplus v \ \transformPair\  U(z) + z^{-1}V(z).
 200         \]
 201         From the superposition and shift properties of the $Z$
 202 transform, it is clear that the operator $\oplus$ is implemented by
 203 delaying the second operand by one sample and then performing
 204 sample-by-sample addition.  Let's call this operation
 205 ``\verb~delayAdd~'':
 206         \begin{verbatim}
 207
 208 > delayAdd :: (Num a) => Signal a -> Signal a -> Signal a
 209 > (x:rxs) `delayAdd` y  = x : combine rxs y
 210 > []      `delayAdd` y  = 0 : y
 211
 212 \end{verbatim}
 213         (The second equation treats the empty list as the all-zero
 214 signal.)  This definition uses a function \verb~combine~ which is
 215 similar to the Standard Prelude function \verb~zipWith~, except that
 216 \verb~combine~ returns a list as long as the {\em longer\/} of the two
 217 lists, with the ``missing'' values from the shorter list assumed to be
 218 zero.
 219         \begin{verbatim}
 220
 221 > combine :: (Num a) => Signal a -> Signal a -> Signal a
 222 > combine (x:rxs) (y:rys) = x+y : combine rxs rys
 223 > combine  []       ys    = ys
 224 > combine  xs       []    = xs
 225
 226 \end{verbatim}
 227
 228
 229         Now we observe that (\ref{eq:ws}) can be rewritten using
 230 $foldr1$:
 231         \begin{eqnarray*}
 232          y    & = & foldr1\ \ (\oplus)\ \ [\: w^{(0)},\
 233                                  w^{(1)},\ \ldots,\ w^{(M-1)} \:] \\
 234               & = & foldr1\ \ (\oplus)\ \ [\: map\ \ (h_k *)\ \ x \:|\:
 235                                            k \leftarrow [0..M-1] \:]
 236         \end{eqnarray*}
 237
 238
 239         Thus, we can implement an FIR filter having impulse response
 240 \verb~hs~ on an input signal \verb~x~ as follows:\footnote{My thanks
 241 to Mark Jones of Yale University who first suggested this form of the
 242 definition.}
 243         \begin{verbatim}
 244
 245 > fir :: Signal Float -> Filter Float
 246 > fir hs x = foldr1 delayAdd [map (h*) x | h <- hs]
 247
 248 \end{verbatim}
 249
 250
 251 %----------------------------------------------------------------------
 252 \section {Linear Predictive Analysis}
 253 %----------------------------------------------------------------------
 254
 255         The processor described here is rather conventional; we just
 256 implement most of the blocks described in~\cite[pp.\
 257 112--117]{RabiJuan93}.
 258
 259
 260 %--------------------------------------------------
 261 \subsection {Preemphasis}
 262 \label{sc:preemph}
 263 %--------------------------------------------------
 264
 265         The speech signal is first passed through a simple high-pass
 266 filter to reduce the degree of spectral tilt.  The system
 267 function is
 268         \[
 269         P(z) = 1 - a z^{-1}
 270         \]
 271         where $a$ is a real number less than but close to 1; a typical
 272 value is 0.95.  Hence, the preemphasis filter is an FIR filter with
 273 the impulse response $[1,\ -a\ ]$.
 274         \begin{verbatim}
 275
 276 > preemph :: Float -> Filter Float
 277 > preemph a =  fir [1,-a]
 278
 279 \end{verbatim}
 280
 281
 282 %--------------------------------------------------
 283 \subsection {Framing}
 284 %--------------------------------------------------
 285
 286         Because of the time-varying nature of speech, the signal is
 287 broken into a list of equal-length {\em frames}, each offset from its
 288 predecessor by a fixed number of samples.  Typically, the frame length
 289 is about $20$ ms and the offset is about $10$ ms.  We need a function
 290 \verb~frames~ that takes three arguments: a frame length, an offset,
 291 and the signal.  We can derive the definition of \verb~frames~ as
 292 follows:
 293         \begin{eqnarray*}
 294       frames\ n\ m\ x^\infty_0
 295       & = & [
 296           x_0^{n-1},\;
 297           x_m^{m+n-1},\;
 298           x_{2m}^{2m+n-1},\
 299           \ldots\ ] \\
 300       & = & map\ (take\ n)\
 301           [
 302           x_0^\infty,\;
 303           x_m^\infty,\;
 304           x_{2m}^\infty,\;
 305           \ldots\ ] \\
 306       & = & map\ (take\ n)\ (iterate\ (drop\ m)\ x_0^\infty).
 307 \end{eqnarray*}
 308 and thus
 309         \begin{equation}
 310         frame\ n\ m\ = map\ (take\ n)\ \circ\ iterate\ (drop\ m)
 311         \end{equation}
 312         But the signal may be finite, so we define and use a function
 313 \verb~check_for_end~ to avoid processing an infinite stream of empty
 314 lists.
 315         \begin{verbatim}
 316
 317 > type Frame a =  [a]
 318 > frames      :: Int -> Int -> Signal a -> [Frame a]
 319 > frames n m  =  check_for_end . map (take n) . iterate (drop m)
 320
 321 \end{verbatim}
 322         Here is the definition of \verb~check_for_end~:
 323         \begin{verbatim}
 324
 325 > check_for_end = takeWhile (not . null)
 326
 327 \end{verbatim}
 328
 329
 330 %--------------------------------------------------
 331 \subsection {Windowing}
 332 %--------------------------------------------------
 333
 334         A {\em window\/} is used to smoothly attenuate samples at the
 335 edges of a frame prior to analysis.  A window is applied by
 336 sample-by-sample multiplication. It is tempting to define the
 337 windowing operation using the Standard Prelude function
 338 \verb~zipWith~, however, we don't want to analyze a frame of data that
 339 is too short, as would happen if the length of the entire signal
 340 exceeded that of an integral number of frames by only a few samples.
 341 Hence we define the function \verb~window~ using a utility function
 342 \verb~window'~ that makes sure that the segment being windowed is at
 343 least as long as the window itself; otherwise an empty frame is
 344 returned.  Note that \verb~window'~ is not a stream function; the
 345 output list is accumulated in the third argument so that it can be
 346 forgotten if the signal turns out to be shorter than the window.
 347         \begin{verbatim}
 348
 349 > type Window = [Float]
 350
 351 > window      :: Window -> Signal Float -> Frame Float
 352 > window w x  =  window' w x []
 353
 354 > window' (w:rws) (x:rxs) buf = window' rws rxs (w*x : buf)
 355 > window' (_:_)    []     _   = []
 356 > window'  []      _      buf = buf
 357
 358 \end{verbatim}
 359
 360
 361         A popular window is the {\em Hamming window}.  If the window
 362 is to have $N$ samples, the formula for the $n$th sample is:
 363         \[
 364         w(n) = 0.54 - 0.46 \cos( \frac{2\pi}{N-1} n )
 365         \]
 366         Here is the function for building a Hamming window of \verb~n~
 367 samples:
 368         \begin{verbatim}
 369
 370 > hamming_window :: Int -> Window
 371 > hamming_window npts =
 372 >       let angle = (2*pi)/fromInt (npts-1)
 373 >       in  [0.54 - 0.46*cos (angle*fromInt n) | n <- [0..npts-1]]
 374
 375 \end{verbatim}
 376
 377
 378         One advantage of lazy evaluation is that by defining the
 379 windowing function the way we have we can do the windowing and the
 380 framing simultaneously. Thus, it turns out that we don't need the
 381 function \verb~frames~.  Let the first argument be the window width
 382 and the second argument be the offset between frames.
 383         \begin{verbatim}
 384
 385 > windows :: Int -> Int -> Signal Float -> [Frame Float]
 386 > windows n m =
 387 >       let
 388 >         hw           = hamming_window n
 389 >         apply_window = window hw
 390 >       in
 391 >         check_for_end . map apply_window . iterate (drop m)
 392
 393 \end{verbatim}
 394
 395
 396 %--------------------------------------------------
 397 \subsection {The Autocorrelation Sequence}
 398 %--------------------------------------------------
 399
 400         The function \verb~autocorr~ computes the autocorrelation
 401 sequence for the signal $x$ as a prelude to computing the coefficients
 402 of the optimal $p$th-order linear prediction filter
 403 (Section~\ref{sc:Durbin}). The $i$th sample of the autocorrelation
 404 sequence for an $N$-point signal is defined by the equation
 405         \[
 406         r_i = \left\{
 407               \begin{array}{ll}
 408               \sum^{N-1-i}_{n=0} x_n x_{n+i} & 0 \leq i < N \\
 409                0                             & i \geq N
 410               \end{array}
 411               \right.
 412         \]
 413         The summation operation is just the familiar dot product of
 414 linear algebra.  The dot product can be coded in Haskell as follows:
 415         \begin{verbatim}
 416
 417 > x `dot` y = sum (zipWith (*) x y)
 418
 419 \end{verbatim}
 420         However, the operator \verb~`dot`~ is actually more general
 421 than the familiar dot product because the two argument lists can have
 422 different length; the longer list is essentially truncated to the
 423 length of the shorter list.  For derivation purposes, let `$\odot$'
 424 denote this more general operation, and let `$\cdot$' denote the
 425 standard dot product.  In general,
 426         \[
 427         x^n_0 \cdot y^n_0 = x_0^{n+k} \odot y^n_0 =
 428         x^n_0 \odot y^{n+k}_0
 429         \]
 430         for all $k \geq 0$, whereas $x^{n+k}_0 \cdot y^n_0$ would be
 431 undefined.  We can now formally derive the Haskell function for
 432 computing the non-zero portion of the autocorrelation sequence as
 433 follows:
 434         \begin{eqnarray*}
 435         [r_0,\ r_1,\ \ldots,\ r_{N-1} ]
 436         & = & [ \sum^{N-1-i}_{n=0} x_n x_{n+i} \ |\ i \leftarrow [0..N-1]\:]\\
 437         & = & [ x^{N-1-i}_0 \cdot x^{N-1}_i    \ |\ i \leftarrow [0..N-1]\:]\\
 438         & = & [ x^{N-1}_0 \odot x^{N-1}_i      \ |\ i \leftarrow [0..N-1]\:]\\
 439         & = & map\ \ (x^{N-1}_0\ \odot)\ \ [ x^{N-1}_i \ |\ i \leftarrow
 440                                                          [0..N-1]\:]\\
 441         & = & map\ \ (x^{N-1}_0\ \odot)\ \ (tails\ x^{N-1}_0)
 442         \end{eqnarray*}
 443         where the function $tails$ is such that
 444         \[
 445         tails\ x^N_0 = [\ x^N_0,\ x^N_1,\ x^N_2,\ \ldots,\ x^N_N\ ]
 446         \]
 447         To get the complete autocorrelation sequence, we just append
 448 an infinite list of zeros:
 449         \begin{verbatim}
 450
 451 > autocorr x = map (x `dot`) (tails x) ++ repeat 0.0
 452
 453 \end{verbatim}
 454         The function \verb~tails~ is defined as follows:\footnote{This
 455 function is also provided in the Chalmer's hbc library module {tt
 456 ListUtil}.}
 457         \begin{verbatim}
 458
 459 > tails xs@(_:rxs) | null rxs   = [xs]
 460 >                  | otherwise  = xs : tails rxs
 461
 462 \end{verbatim}
 463
 464
 465         In a conventional imperative language we would need to pass
 466 {\em two\/} arguments to this function, one specifying how many
 467 autocorrelation values are to be computed.  The function would then
 468 return an array containing precisely that many values.  However, in a
 469 lazy language the autocorrelation values will only be computed as
 470 needed, so we do not need to specify such a value for this function.
 471
 472
 473         The value $r_0$ is the {\em energy} of the signal.  The
 474 logarithm of the energy is often included as a parameter in feature
 475 vectors used in recognizers.
 476
 477
 478 %--------------------------------------------------
 479 \subsection {The Durbin Algorithm}
 480 \label{sc:Durbin}
 481 %--------------------------------------------------
 482
 483         The next step is to solve for the optimum linear predictor
 484 filter coefficients for a given order $p$.  This can be done
 485 efficiently using the {\em Durbin Algorithm}.  The steps are listed in
 486 Figure~\ref{fg:durbin} as they appear in~\cite[p.\ 411]{RabiScha78}
 487 but with a few changes in notation.  Basically, we successively
 488 compute the optimum prediction filters for each order $i$ starting
 489 with $i=1$.  The optimum $i$th-order filter coefficients, $a^{(i)} =
 490 [a^{(i)}_1,\ldots,a^{(i)}_i]$, are calculated from the optimum
 491 $(i-1)$th-order filter coefficients, $a^{(i-1)} = [
 492 a^{(i-1)}_1,\ldots, a^{(i-1)}_{i-1}]$, the mean squared error of that
 493 filter, $e_{i-1}$, the autocorrelation value $r_i$, and the
 494 autocorrelation values $r_1, r_2,\ldots, r_{i-1}$.
 495         \begin{figure}
 496         \fbox{
 497         \begin{tabular}{cl}
 498 Initialization: & $e_0 = r_0$ \\[0.10in]
 499 %
 500 %  Durbin algorithm
 501 %
 502 \shortstack{ Iteration: \\ $(i=1,2,\ldots,p)$} &
 503    \begin{minipage}{2.850in}
 504         \begin{eqnarray*}
 505 %
 506       k_i & = & \left( r_i - \sum^{i-1}_{j=1} a_j^{(i-1)} r_{i-j}
 507                 \right) \left/ \: e_{i-1} \right. \\
 508 %
 509       a_j^{(i)} & = & \left\{ \begin{array}{ll}
 510                               a_j^{(i-1)} - k_i a_{i-j}^{(i-1)}, &
 511                                     j = 1, \ldots, i-1 \\
 512                               k_i, & j=i
 513                              \end{array}
 514                        \right.\\
 515 %
 516       e_i   & = & (1 - k_i^2) e_{i-1}
 517 %
 518         \end{eqnarray*}
 519    \end{minipage}
 520 %
 521 \end{tabular}
 522   }
 523 \caption[]{The Durbin Algorithm.}
 524 \label{fg:durbin}
 525 \end{figure}
 526
 527
 528         This description can be straightforwardly realized in a
 529 conventional language using arrays; we will implement the algorithm in
 530 Haskell without using arrays.
 531         \begin{verbatim}
 532
 533 > type LPA_Filter = (Float, [Float])
 534
 535 \end{verbatim}
 536
 537
 538 %%------------------------------------------------------------
 539         \subsubsection {Updating the Predictor Polynomial}
 540 %%------------------------------------------------------------
 541
 542         First, we derive a function \verb~new_as~ that takes the
 543 optimal $(i-1)$th-order linear prediction filter coefficients and the
 544 $i$th reflection coefficient and returns the optimal $i$th-order
 545 linear prediction filter coefficients.  From Figure~\ref{fg:durbin},
 546 the optimal $i$th-order linear prediction filter is given by
 547         \begin{eqnarray*}
 548         a^{(i)} & \isDefined &
 549                       [ a^{(i)}_1,\ a^{(i)}_2,\ \ldots,\ a^{(i)}_i ] \\
 550                 & = & [ a^{(i-1)}_1 - k_i a^{(i-1)}_{i-1},\ \
 551                         a^{(i-1)}_2 - k_i a^{(i-1)}_{i-2},\ \
 552                         \ldots,\ \
 553                         a^{(i-1)}_{i-1} - k_i a^{(i-1)}_1,\ \
 554                         k_i ]
 555         \end{eqnarray*}
 556         Note that the subtraction operation involves the first and
 557 last coefficients, the second and second-to-last coefficients, etc.
 558 If we place two copies of $a^{(i-1)}$ side by side,
 559         \[
 560         [ a^{(i-1)}_1,\ a^{(i-1)}_2,\ \ldots,\
 561           a^{(i-1)}_{i-2},\ a^{(i-1)}_{i-1} ]\ \ \
 562         [ a^{(i-1)}_1,\ a^{(i-1)}_2,\ \ldots,\
 563           a^{(i-1)}_{i-2},\ a^{(i-1)}_{i-1} ]
 564         \]
 565         it is easy to see that what is called for is some type
 566 of \verb~foldr~ operation that consumes the list on the right as it
 567 moves through the list on the left.  Indeed, if we define the binary
 568 operator $\oplus$ by (assuming that $k_i$ is in scope)
 569         \begin{equation}
 570         a \oplus (p,\ b:bs)  =  ((a-k_i*b):p,\ bs)
 571         \label{eq:oplus}
 572         \end{equation}
 573         then we have
 574         \begin{equation}
 575         a^{(i)} = fst\ \ (foldr\ \ (\oplus)\ \ ([k_i], a^{(i-1)})\ \
 576                         a^{(i-1)})
 577         \label{eq:new-as}
 578         \end{equation}
 579         Combining (\ref{eq:oplus}) and (\ref{eq:new-as}) into a
 580 Haskell definition:
 581         \begin{verbatim}
 582
 583 > new_as as k = fst (foldr op ([k],as) as)
 584 >               where  a `op` (p,b:bs) = ((a-k*b):p,bs)
 585
 586 \end{verbatim}
 587
 588
 589 %%------------------------------------------------------------
 590         \subsubsection {Computing the Reflection Coefficient}
 591 %%------------------------------------------------------------
 592
 593         Next we consider the calculation of the $i$th reflection
 594 coefficient.  Ignoring the division by $e_{i-1}$ for the moment,
 595 consider just the expression
 596         \begin{equation}
 597         r_i - \sum^{i-1}_{j=1} a^{(i-1)}_j r_{i-j}
 598         \label{eq:k-numerator}
 599         \end{equation}
 600         For the summation term we can use the same approach we used
 601 for calculating the new prediction filter coefficients: a
 602 \verb~foldr~ operation that consumes one list while moving through
 603 another.  Defining the binary operator $\otimes$ by
 604         \begin{equation}
 605         a \otimes (s,\ b:bs) = (s + a*b,\ bs)
 606         \label{eq:otimes}
 607         \end{equation}
 608         the summation is the first component of the pair returned by
 609 the expression
 610         \begin{equation}
 611         foldr\ \ (\otimes)\ \ (0,rs)\ \ as
 612         \label{eq:k-summation}
 613         \end{equation}
 614         where it is assumed that the length of $rs$ is at least as
 615 great as that of $as$ (it is in practice).  The second component of
 616 the pair returned by (\ref{eq:k-summation}) is a list having $r_i$ as
 617 its head, which, as we see from (\ref{eq:k-numerator}), is what we are
 618 subtracting the summation from.  Combining (\ref{eq:otimes}) and
 619 (\ref{eq:k-summation}) with the division by $e_{i-1}$ yields the
 620 following definition.
 621         \begin{verbatim}
 622
 623 > new_k rs (e,as) =
 624 >       let (summation,r:_) = foldr op (0,rs) as
 625 >                             where  a `op` (s,b:bs) = (s+a*b,bs)
 626 >       in (r - summation)/e
 627
 628 \end{verbatim}
 629
 630
 631 %%------------------------------------------------------------
 632         \subsubsection {One Step in the Durbin Algorithm}
 633 %%------------------------------------------------------------
 634
 635         We can now define the function that produces the optimal
 636 $i$th-order linear prediction filter from the autocorrelation sequence
 637 and the optimal $(i-1)$th-order filter.
 638         \begin{verbatim}
 639
 640 > durbin_step rs (e,as) = let k = new_k rs (e,as)
 641 >                         in  ((1-k*k)*e, new_as as k)
 642
 643 \end{verbatim}
 644
 645
 646 %%------------------------------------------------------------
 647         \subsubsection {The Durbin Algorithm}
 648 %%------------------------------------------------------------
 649
 650         To get the optimal $p$th-order linear predictor given the
 651 autocorrelation sequence, we just iteratively apply the function
 652 \verb~durbin_step~ to the appropriate initial condition and select the
 653 $p$th element of the resulting sequence.  If $(r : rrs)$ is the
 654 autocorrelation sequence for a frame, then the expression
 655         \[
 656         iterate\ \ (durbin\_step\ rrs)\ \ (r,[\ ])
 657         \]
 658         produces the list
 659         \[
 660         [\: (e_0,[\ ]),\ (e_1,a^{(1)}),\ (e_2,a^{(2)}),\ \ldots \:]
 661         \]
 662 from which we want the element at position $p$.
 663         \begin{verbatim}
 664
 665 > durbin :: Int -> Signal Float -> LPA_Filter
 666 > durbin p (r:rrs) = (iterate (durbin_step rrs) (r,[]))!!p
 667
 668 \end{verbatim}
 669
 670
 671 %%------------------------------------------------------------
 672 \subsection {Conversion to Cepstral Coefficients}
 673 \label{sc:cepstra}
 674 %%------------------------------------------------------------
 675
 676         Given a discrete-time linear system with the system function
 677         \[
 678         \frac{G}{1 - \sum^p_{i=1} a_i z^{-i}}
 679         \]
 680         the sequence of {\em cepstral coefficients\/} $c^\infty_0$ for
 681 this system is defined by
 682         \begin{equation}
 683         c_n = \left\{
 684               \begin{array}{ll}
 685               \ln G     & n=0 \\[0.10in]
 686               a_n + \frac{1}{n} \sum^{n-1}_{k=1} a_k \cdot (n-k)
 687                                                      \cdot c_{n-k} &
 688                         1 \leq n \leq p \\[0.10in]
 689         \frac{1}{n} \sum^p_{k=1} a_k \cdot (n-k) \cdot c_{n-k} & n > p
 690                 \end{array}
 691               \right.
 692         \label{eq:cepstra}
 693         \end{equation}
 694         Note that the summation terms of (\ref{eq:cepstra}) are just
 695 convolution sums for the FIR filter with $a^{(p)}$ as its impulse
 696 response and the scaled cepstral coefficient signal $[c_1, 2c_2, 3c_3,
 697 4c_4, \ldots]$.  Also, the second and third formulas are the same
 698 except for the adding of the $a_i$'s to the scaled and filtered
 699 cepstral coefficients.  Hence, we can use the function \verb~delayAdd~
 700 that we defined when developing the FIR filtering function \verb~fir~
 701 (Section~\ref{sc:fir}).  Also, the gain term is usually ignored in
 702 forming the feature vector, so we just compute the sequence
 703 $[c_1,c_2,\ldots]$.
 704         \begin{verbatim}
 705
 706 > cepstra :: LPA_Filter -> Signal Float
 707 > cepstra (_,as) = cs
 708 >       where
 709 >       cs  = as `delayAdd` xs
 710 >       xs  = zipWith (/) (fir as (zipWith (*) [1..] cs)) [2..]
 711
 712 \end{verbatim}
 713         Because there is no terminating condition and because of the
 714 recursion---the sequence \verb~cs~ appears in the definition of the
 715 sequence \verb~xs~---this definition relies on lazy evaluation in a
 716 critical way.
 717
 718
 719         \subsection {Putting it all together}
 720
 721         The function \verb~analyze~ transforms a windowed frame of
 722 samples to a pair of values, a log gain term and a list of cepstral
 723 coeficients.  The first argument is the order of the linear prediction
 724 analysis.  The second argument is the number of cepstral coefficients.
 725         \begin{verbatim}
 726
 727 > analyze :: Int -> Int -> Frame Float -> (Float, [Float])
 728
 729 > analyze p q wxs = let
 730 >                     rs         = autocorr wxs
 731 >                     log_energy = log10 (head rs)
 732 >                     cep        = take q (cepstra (durbin p rs))
 733 >                   in
 734 >                     (log_energy, cep)
 735
 736 > log10 :: Float -> Float
 737 > log10 x = log x / log 10
 738 > --log10 x = let result = log x / log 10 in
 739 > --  trace ("log10:"++(shows x ":")++(shows (log x) "/")++(shows (log 10) "=")++(show result)) result
 740
 741 \end{verbatim}
 742
 743
 744
 745         \subsection {The Main Program}
 746
 747         Figure~\ref{fg:complete} shows the main Haskell program,
 748 including the speech/feature I/O functions.
 749         \begin{figure}[p]
 750         \input{Main.lhs}
 751         \caption[]{The linear predictive speech analysis main program}
 752         \label{fg:complete}
 753         \end{figure}
 754