X-Git-Url: http://git.megacz.com/?p=ghc-hetmet.git;a=blobdiff_plain;f=docs%2Fcomm%2Fthe-beast%2Fsyntax.html;fp=docs%2Fcomm%2Fthe-beast%2Fsyntax.html;h=be5bbefa17a7c8bd1c6a09b8518bfd1270dd6725;hp=0000000000000000000000000000000000000000;hb=0065d5ab628975892cea1ec7303f968c3338cbe1;hpb=28a464a75e14cece5db40f2765a29348273ff2d2
diff --git a/docs/comm/the-beast/syntax.html b/docs/comm/the-beast/syntax.html
new file mode 100644
index 0000000..be5bbef
--- /dev/null
+++ b/docs/comm/the-beast/syntax.html
@@ -0,0 +1,99 @@
+
+ +
+ The lexical and syntactic analysers for Haskell programs are located in
+ fptools/ghc/compiler/parser/.
+
+ The lexer is a rather tedious piece of Haskell code contained in the
+ module Lex.
+ Its complexity partially stems from covering, in addition to Haskell 98,
+ the whole range of GHC language extensions, and from its ability to
+ analyse interface files in addition to normal Haskell source. The lexer
+ defines a parser monad P a, where a is the
+ type of the result expected from a successful parse. More precisely, a
+ result of type
+
+data ParseResult a = POk PState a
+                   | PFailed Message
+ is produced, with Message being from ErrUtils
+ (currently simply a synonym for SDoc).
+
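+ As a rough illustration of how a parser monad can be built around such a
+ result type, here is a minimal sketch. It is not GHC's code: PState is cut
+ down to two hypothetical fields (input, line), Message is taken to be a
+ plain String, and failP, nextChar and runP are names invented for this
+ example.

```haskell
-- Simplified stand-ins for GHC's real types (assumptions, not GHC's code).
type Message = String          -- GHC uses ErrUtils.Message instead

data PState = PState
  { input :: String            -- remaining input (GHC keeps a buffer)
  , line  :: !Int              -- current source line
  }

data ParseResult a
  = POk PState a
  | PFailed Message

-- A parser maps the current state to a result carrying the new state.
newtype P a = P { unP :: PState -> ParseResult a }

instance Functor P where
  fmap f (P m) = P $ \s -> case m s of
    POk s' a    -> POk s' (f a)
    PFailed msg -> PFailed msg

instance Applicative P where
  pure a = P $ \s -> POk s a
  P mf <*> P ma = P $ \s -> case mf s of
    PFailed msg -> PFailed msg
    POk s' f    -> case ma s' of
      POk s'' a   -> POk s'' (f a)
      PFailed msg -> PFailed msg

instance Monad P where
  P m >>= k = P $ \s -> case m s of
    POk s' a    -> unP (k a) s'   -- thread the updated state onwards
    PFailed msg -> PFailed msg    -- the first failure aborts the parse

-- Abort the parse with a message (hypothetical helper).
failP :: Message -> P a
failP msg = P $ \_ -> PFailed msg

-- Consume one character and keep the line counter up to date.
nextChar :: P Char
nextChar = P $ \s -> case input s of
  []       -> PFailed ("unexpected end of input at line " ++ show (line s))
  (c : cs) -> POk s { input = cs
                    , line  = if c == '\n' then line s + 1 else line s } c

-- Run a parser on a source string, starting at line 1.
runP :: P a -> String -> Either Message a
runP (P m) src = case m (PState src 1) of
  POk _ a     -> Right a
  PFailed msg -> Left msg
```

+ Loaded into GHCi, runP ((,) <$> nextChar <*> nextChar) "ab" yields the
+ pair of the first two characters, while the same parser run on "a" ends
+ in PFailed and is reported as a Left value.
+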
+ The record type PState contains information such as the
+ current source location, buffer state, contexts for layout processing,
+ and whether Glasgow extensions are accepted (either due to
+ -fglasgow-exts or due to reading an interface file). Most
+ of the fields of PState store unboxed values; in fact, even
+ the flag indicating whether Glasgow extensions are enabled is
+ represented by an unboxed integer instead of by a Bool. My
+ (= chak's) guess is that this is to avoid having to perform a
+ case on a boxed value in the inner loop of the lexer.
+
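+ The idea of storing the flag as an unboxed integer can be sketched as
+ follows. This is an illustration only, assuming a present-day GHC with the
+ MagicHash extension; the field names lineNo and glasgowExts and the
+ functions mkPState and extsEnabled are hypothetical and do not come from
+ the lexer itself.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int#)

-- Hypothetical cut-down parser state with unboxed fields.
data PState = PState
  { lineNo      :: Int#   -- current source line as a raw machine integer
  , glasgowExts :: Int#   -- 1# if Glasgow extensions are on, 0# otherwise
  }

-- Build an initial state from an ordinary boxed flag.
mkPState :: Bool -> PState
mkPState True  = PState 1# 1#
mkPState False = PState 1# 0#

-- The case scrutinises the raw integer directly; there is no boxed
-- value to evaluate before the comparison.
extsEnabled :: PState -> Bool
extsEnabled s = case glasgowExts s of
  1# -> True
  _  -> False
```

+ A field of type Int# can never be an unevaluated thunk, so testing it needs
+ no evaluation step, which fits the guess above about the lexer's inner loop.
+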
+ The same lexer is used by the Haskell source parser, the Haskell
+ interface parser, and the package configuration parser.
+
+ The parser for Haskell source files is defined in the form of a parser
+ specification for the parser generator Happy in the file Parser.y.
+ The parser exports three entry points for parsing entire modules
+ (parseModule), individual statements
+ (parseStmt), and individual identifiers
+ (parseIdentifier), respectively. The last two are needed
+ for GHCi. All three require a parser state (of type
+ PState) and are invoked from HscMain.
+
+ Parsing of Haskell is a rather involved process. The most challenging
+ features are probably the treatment of layout and expressions that
+ contain infix operators. The latter may be user-defined and so are not
+ easily captured in a static syntax specification. Infix operators may
+ also appear in the left-hand sides of value definitions, that is, in
+ patterns, and so GHC's parser treats those in the same way as
+ expressions. In other words, as
+ general expressions are a syntactic superset of patterns - ok, they
+ nearly are - the parser simply attempts to parse a general
+ expression in such positions. Afterwards, the generated parse tree is
+ inspected to ensure that the accepted phrase indeed forms a legal
+ pattern. This and similar checks are performed by the routines from
+ ParseUtil. In
+ some cases, these routines do, in addition to checking for
+ well-formedness, also transform the parse tree, such that it fits into
+ the syntactic context in which it has been parsed; in fact, this happens
+ for patterns, which are transformed from a representation of type
+ RdrNameHsExpr into a representation of type
+ RdrNamePat.
+
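+ The "parse as an expression, then check that it is a pattern" approach
+ can be sketched on a toy syntax tree. Everything below is an assumption
+ made for illustration: Expr and Pat are tiny stand-ins for RdrNameHsExpr
+ and RdrNamePat, and checkPattern, conApp and isConOp are hypothetical
+ names, not the routines found in ParseUtil.

```haskell
-- Toy expression tree, as a parser might produce it.
data Expr
  = EVar String               -- variable, e.g. x
  | ECon String               -- constructor, e.g. Just
  | ELit Integer              -- integer literal
  | EApp Expr Expr            -- application
  | EOpApp Expr String Expr   -- infix application, operator kept by name
  | ELam String Expr          -- lambda: legal in expressions only
  deriving (Eq, Show)

-- Toy pattern tree, the target of the conversion.
data Pat
  = PVar String
  | PLit Integer
  | PCon String [Pat]         -- constructor applied to argument patterns
  deriving (Eq, Show)

-- Check that an expression is a legal pattern and convert it.
checkPattern :: Expr -> Either String Pat
checkPattern e = case e of
  EVar v   -> Right (PVar v)
  ELit n   -> Right (PLit n)
  ELam _ _ -> Left "lambda abstraction in pattern"
  EOpApp l op r
    | isConOp op -> do
        l' <- checkPattern l
        r' <- checkPattern r
        Right (PCon op [l', r'])
    | otherwise  -> Left ("operator " ++ op ++ " is not a constructor")
  _        -> conApp e []

-- Unwind a left-nested application, collecting converted arguments.
conApp :: Expr -> [Pat] -> Either String Pat
conApp (ECon c)   args = Right (PCon c args)
conApp (EApp f x) args = do
  x' <- checkPattern x
  conApp f (x' : args)
conApp _ _ = Left "pattern must be headed by a constructor"

-- Constructor operators start with a colon.
isConOp :: String -> Bool
isConOp (':' : _) = True
isConOp _         = False
```

+ With these definitions, the tree for x : Just y converts to a constructor
+ pattern, whereas the tree for f 1 or for a lambda is rejected with an error
+ message, mirroring the well-formedness check described above.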
+
+ The parser for interface files is also generated by Happy from ParseIface.y.
+ Its main routine parseIface is invoked from RnHiFiles.readIface.
+
+
+ The parser for configuration files is by far the smallest of the three
+ and is defined in ParsePkgConf.y.
+ It exports loadPackageConfig, which is used by DriverState.readPackageConf.
+
+
+
+Last modified: Wed Jan 16 00:30:14 EST 2002