DOMjudge-devel
Threads by month
- ----- 2026 -----
- March
- February
- January
- ----- 2025 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
January 2017
- 2 participants
- 2 discussions
Hi all,
Checktestdata is a nice DSL and has proved helpful in many situation,
but from the moment I used it I was missing features and wondering why
it wasn't implemented as an EDSL (Embedded Domain Specific Language).
Now, many years later, I've finally found some time to create it. I've
implemented a little library in Haskell, and as a sneak preview I've
taken the interception problem from NCPC 2016 as referenced to in [1]
and attached it. The first question is: where shall I put the sources to
the library? Somewhere in the checktestdata repository (and if so,
where), or maybe create a checktestdata2 repository?
Let me give a little more information on why I think an EDSL is a good
idea here, and why I choose for Haskell (other than my background):
+ With an EDSL, you get so much for free from the underlying language,
such as (typed) variables, all sorts of abstraction mechanisms, compiler
optimizations, etc. For example, some testdata script may consist of
some repeated parts, which in an EDSL can simply be given a name to get
rid of the duplication.
+ Compiler optimizations, which of course also exist for C++ in which
checktestdata is built now, don't only optimize the checktestdata
runtime, but also the script itself since it is just a program and not
some custom script anymore. Especially with a functional language such
as Haskell this means definitions of primitives can be inlined and whole
parts of the script may get fused to a very efficient version.
+ Maintainability: my library is now +- 200 LOC, including docs and
empty lines, and contains quite a lot of the checktestdata language
already. Because Haskell has many well optimized parsing libraries, we
get a lot for free, and only have to create a nice generalization for
this particular use case.
+ Efficiency: the current checktestdata is just too slow, as mentioned
in [1]. As a reference, on my machine the checktestdata script for
interception 09.max_ans.in takes 35.5s, the python script takes 2.2s,
and the attached version only 2s. And I haven't done anything to
optimize specifically for this case, I've just translated the ctd script
to the syntax of my EDSL.
+ With this, the script can do more than just checking the syntax. I.e.
when the problem specifies that it contains a connected graph, one can
check the syntax, but also the connectedness of the graph, while still
separating concerns, i.e. the syntax of the input can be separated from
the semantic checks, but the semantic checks don't need to come in a
separate program which also has to read the input again, like we should
now. All can be combined naturally.
+ Create your own combinators: the primitives can be easily combined
into more complex combinators which you can then reuse. I.e. think of a
'point2d' combinator that reads two space separated integers.
- Haskell Platform needs to be available, this is easy on all major OS,
but is a quite large dependency. Then again, GCC and Boost are large as
well, and are less Windows-friendly.
- Slighly different syntax: although Haskell is quite liberal in its
syntax for EDSL's, users have to get used to particularities of the
language/
- Data generation is not easily possible, if not infeasible. There
should be some way of embedding the language to make this possible, and
this is actually what I've been working on for the past few years when
thinking about this problem, but since we mostly use it for checking
data I went with a monadic embedding which is very easy for data checking.
- Backwards compatibility: parsing existing scripts and running it with
the new backend is not going to be easy. I don't expect show stoppers,
but it definitely won't be as efficient as writing scripts within the
language because of variable bindings. I need to work on this a bit to
be sure though.
That's it, of course my libary is not done yet (no good error messages,
no regex yet), but it's a proof of concept that is already usable for
simpler problems. I hope to convince you that this is the way to go, and
to get some ideas to create the POC into a working tool. Also,
suggestions on 'interesting use cases' are welcome, the interception is
a nice one but I am sure there are many more problems for which the
standard checktestdata just not works.
Yours truly,
Jeroen
[1] https://github.com/DOMjudge/checktestdata/issues/3
2
10
Hi Jeroen,
Could you add a description or Makefile documenting how to build this? I
tried to do so by installing cabal, and the required packages, but
failed. I think I didn't have exactly the right versions, and even then,
the Setup binary complained that none of the packages were available.
Jaap
On 01-01-17 10:42, DOMjudge wrote:
> The branch, master has been updated
> from c525f8b20c1d59e54db2b6399fb5dab9475bb903 (commit)
> via c6bcd2e71281bf990b0ca9f6254236622d4ebc97 (commit)
> via d7fea2aabc5ad165ce93589d47398cc99a3f0765 (commit)
>
> - Log -----------------------------------------------------------------
> -----------------------------------------------------------------------
> https://www.domjudge.org/gitweb/?p=checktestdata.git;a=commitdiff;h=c6bcd2e7
> commit c6bcd2e71281bf990b0ca9f6254236622d4ebc97
> Author: Jeroen Bransen <jeroen(a)chordify.net>
> Date: Sun Jan 1 13:38:25 2017 +0100
>
> [haskell_edsl] Add regex support
>
> diff --git a/haskell_edsl/checktestdata.cabal b/haskell_edsl/checktestdata.cabal
> index 42babc6..9e42a30 100644
> --- a/haskell_edsl/checktestdata.cabal
> +++ b/haskell_edsl/checktestdata.cabal
> @@ -27,6 +27,7 @@ library
> containers >=0.5 && <0.6,
> either >=4.3 && <4.5,
> mtl >=2.2 && <2.3,
> + regex-tdfa >= 1.2,
> uu-parsinglib >= 2.9
> hs-source-dirs: src
> default-language: Haskell2010
> diff --git a/haskell_edsl/src/Checktestdata/Core.hs b/haskell_edsl/src/Checktestdata/Core.hs
> index e3242b3..11615dc 100644
> --- a/haskell_edsl/src/Checktestdata/Core.hs
> +++ b/haskell_edsl/src/Checktestdata/Core.hs
> @@ -12,6 +12,7 @@ module Checktestdata.Core (
> nextHex,
> nextFloat,
> string,
> + regex,
> eof,
> isEOF,
> ) where
> @@ -21,6 +22,9 @@ import qualified Data.ByteString.Char8 as BS
> import qualified Data.ByteString.Lex.Fractional as FR
> import qualified Data.ByteString.Lex.Integral as INT
>
> +import Text.Regex.TDFA
> +import Text.Regex.TDFA.ByteString
> +
> import Control.Monad.State
> import Control.Monad.Trans.Either
>
> @@ -172,6 +176,24 @@ string s = PrimOp $ do
> True -> do
> putRemaining $ BS.drop (length s) cs
>
> +-- | Match with the given regular expression
> +regex :: String -> CTD String
> +regex rs = PrimOp $ do
> + let reg = compile defaultCompOpt defaultExecOpt $ BS.pack rs
> + case reg of
> + Left e -> failWithLocation e
> + Right r -> do
> + cs <- getRemaining
> + let err = "Expression " ++ show rs ++ " does not match"
> + case regexec r cs of
> + Right (Just (pre,main,post,_)) -> case BS.null pre of
> + True -> do
> + putRemaining post
> + return $ BS.unpack main
> + False -> failWithLocation err
> + Left e -> failWithLocation e
> + Right _ -> failWithLocation err
> +
> -- | Check whether we are at the end of the file.
> isEOF :: CTD Bool
> isEOF = PrimOp $ do
>
> -----------------------------------------------------------------------
> https://www.domjudge.org/gitweb/?p=checktestdata.git;a=commitdiff;h=d7fea2aa
> commit d7fea2aabc5ad165ce93589d47398cc99a3f0765
> Author: Jeroen Bransen <jeroen(a)chordify.net>
> Date: Sun Jan 1 13:40:02 2017 +0100
>
> [haskell_edsl] Almost complete backwards compatibility, passes most tests
>
> diff --git a/haskell_edsl/src/Checktestdata/Script/AST.hs b/haskell_edsl/src/Checktestdata/Script/AST.hs
> index f116515..7542c24 100644
> --- a/haskell_edsl/src/Checktestdata/Script/AST.hs
> +++ b/haskell_edsl/src/Checktestdata/Script/AST.hs
> @@ -1,5 +1,6 @@
> module Checktestdata.Script.AST (
> - Var,
> + VarName,
> + Var (..),
> Block,
> AST (..),
> Expr (..),
> @@ -8,9 +9,11 @@ module Checktestdata.Script.AST (
> Test (..),
> CompOp (..),
> ) where
> --- data Var = Var String [Expr]
>
> -type Var = String
> +type VarName = String
> +
> +data Var = Var VarName [Expr]
> + deriving ( Show )
>
> type Block = [AST]
>
> @@ -20,11 +23,12 @@ data AST = CSpace
> | CInt Expr Expr (Maybe Var)
> | CFloat Expr Expr (Maybe Var) (Maybe FloatOption)
> | CString Expr
> + | CRegex Expr (Maybe Var)
> | CRep (Maybe Var) Expr (Maybe AST) Block -- var, count, separator, body
> | CWhile (Maybe Var) Test (Maybe AST) Block
> | CAssert Test
> | CSet [(Var, Expr)]
> - | CUnset [Var]
> + | CUnset [VarName]
> | CIf Test Block (Maybe Block)
> deriving (Show)
>
> @@ -54,8 +58,8 @@ data Test = Not Test
> | And Test Test
> | Or Test Test
> | Match Expr
> - | Unique [Var]
> - | InArray Expr Var
> + | Unique [VarName]
> + | InArray Expr VarName
> | IsEOF
> deriving (Show)
>
> diff --git a/haskell_edsl/src/Checktestdata/Script/Interpreter.hs b/haskell_edsl/src/Checktestdata/Script/Interpreter.hs
> index c74c9a9..8013e09 100644
> --- a/haskell_edsl/src/Checktestdata/Script/Interpreter.hs
> +++ b/haskell_edsl/src/Checktestdata/Script/Interpreter.hs
> @@ -8,6 +8,7 @@ import Checktestdata.Core
> import Checktestdata.Derived
> import Checktestdata.Script.AST
>
> +import Data.List ( genericLength, transpose )
> import Data.Map ( Map )
> import qualified Data.Map as Map
>
> @@ -19,26 +20,32 @@ import Control.Monad.State
> --------------------------------------------------------------------------------
>
> -- todo: error handling on script parsing level
> --- todo: arrays
>
> data Value = VInt Integer
> | VFloat Rational
> | VString String
> + deriving ( Show, Eq, Ord )
>
> -type ValueMap = Map Var Value
> +type ValueMap = Map VarName (Map [Value] Value)
>
> setValue :: Maybe Var -> Value -> InterpretMonad ()
> setValue Nothing _ = return ()
> -setValue (Just var) val = modify $ Map.insert var val
> +setValue (Just (Var var eidxs)) val = do
> + idxs <- mapM fExpr eidxs
> + modify $ Map.insertWith Map.union var $ Map.singleton idxs val
>
> getValue :: Var -> InterpretMonad Value
> -getValue var = do
> +getValue (Var var eidxs) = do
> vm <- get
> case Map.lookup var vm of
> - Nothing -> error $ "Undefined variable " ++ var
> - Just val -> return val
> -
> -unsetVar :: Var -> InterpretMonad ()
> + Nothing -> error $ "Undefined variable " ++ show var
> + Just vmi -> do
> + idxs <- mapM fExpr eidxs
> + case Map.lookup idxs vmi of
> + Nothing -> error $ "Undefined index " ++ show idxs
> + Just val -> return val
> +
> +unsetVar :: VarName -> InterpretMonad ()
> unsetVar var = modify $ Map.delete var
>
> toInt :: Value -> InterpretMonad Integer
> @@ -66,133 +73,158 @@ liftC = lift
>
> -- | Interpret an old checktestdata script into an executable 'CTD'
> interpret :: Block -> CTD ()
> -interpret = flip evalStateT Map.empty . fBlock where
> +interpret = flip evalStateT Map.empty . fBlock
> +
> -- Code blocks (simple fold)
> - fBlock :: Block -> InterpretMonad ()
> - fBlock = mapM_ fAST
> -
> - -- AST elements
> - fAST :: AST -> InterpretMonad ()
> - fAST CSpace = liftC space
> - fAST CNewline = liftC newline
> - fAST CEOF = liftC eof
> - fAST (CSet vs) = forM_ vs $ \(var,e) -> do
> - val <- fExpr e
> - setValue (Just var) val
> - fAST (CUnset vs) = mapM_ unsetVar vs
> - fAST (CInt low up var) = do
> - vlow <- fExpr low >>= toInt
> - vup <- fExpr up >>= toInt
> - val <- liftC $ int vlow vup
> - setValue var (VInt val)
> - fAST (CFloat low up var _) = do -- todo: scientific/fixed option
> - vlow <- fExpr low >>= toFloat
> - vup <- fExpr up >>= toFloat
> - val <- liftC $ float vlow vup
> - setValue var (VFloat val)
> - fAST (CString s) = do
> - str <- fExpr s >>= toString
> - liftC $ string str
> - fAST (CAssert test) = do
> - b <- fTest test
> - liftC $ assert b
> - fAST (CRep var count mbSep body) = do
> - vcount <- fExpr count >>= toInt
> - forM_ [0..vcount-1] $ \i -> do
> - -- Parse separator
> - case mbSep of
> - Just sep | i > 0 -> fAST sep
> - _ -> return ()
> -
> - -- Set iterator
> - setValue var (VInt i)
> +fBlock :: Block -> InterpretMonad ()
> +fBlock = mapM_ fAST
> +
> +-- AST elements
> +fAST :: AST -> InterpretMonad ()
> +fAST CSpace = liftC space
> +fAST CNewline = liftC newline
> +fAST CEOF = liftC eof
> +fAST (CSet vs) = forM_ vs $ \(var,e) -> do
> + val <- fExpr e
> + setValue (Just var) val
> +fAST (CUnset vs) = mapM_ unsetVar vs
> +fAST (CInt low up var) = do
> + vlow <- fExpr low >>= toInt
> + vup <- fExpr up >>= toInt
> + val <- liftC $ int vlow vup
> + setValue var (VInt val)
> +fAST (CFloat low up var _) = do -- todo: scientific/fixed option
> + vlow <- fExpr low >>= toFloat
> + vup <- fExpr up >>= toFloat
> + val <- liftC $ float vlow vup
> + setValue var (VFloat val)
> +fAST (CString s) = do
> + str <- fExpr s >>= toString
> + liftC $ string str
> +fAST (CRegex sr var) = do
> + r <- fExpr sr >>= toString
> + val <- liftC $ regex r
> + setValue var (VString val)
> +fAST (CAssert test) = do
> + b <- fTest test
> + liftC $ assert b
> +fAST (CRep var count mbSep body) = do
> + vcount <- fExpr count >>= toInt
> + forM_ [0..vcount-1] $ \i -> do
> + -- Parse separator
> + case mbSep of
> + Just sep | i > 0 -> fAST sep
> + _ -> return ()
>
> - -- Do the body
> - fBlock body
> - fAST (CWhile var test mbSep body) = do
> - let it :: Integer -> InterpretMonad ()
> - it i = do
> - setValue var (VInt i)
> - b <- fTest test
> - when b $ do
> - -- Condition true, so parse separator
> - case mbSep of
> - Just sep | i > 0 -> fAST sep
> - _ -> return ()
> - -- And parse body
> - fBlock body
> - -- And repeat
> - it $ i + 1
> - it 0
> - fAST (CIf test ifTrue mbIfFalse) = do
> - b <- fTest test
> - if b then fBlock ifTrue
> - else case mbIfFalse of
> - Nothing -> return ()
> - Just bl -> fBlock bl
> -
> - -- Expression evaluation
> - fExpr :: Expr -> InterpretMonad Value
> - fExpr (EVar var) = getValue var
> - fExpr (ConstI v) = return $ VInt v
> - fExpr (ConstF v) = return $ VFloat v
> - fExpr (ConstS v) = return $ VString v
> - fExpr (Negate e) = do
> - val <- fExpr e
> - case val of
> - VInt v -> return $ VInt $ negate v
> - VFloat v -> return $ VFloat $ negate v
> - _ -> error "Integer of float expected"
> - fExpr (BinOp op e1 e2) = do
> - v1 <- fExpr e1
> - v2 <- fExpr e2
> - fBinOp op v1 v2
> -
> - -- Binary operators
> - fBinOp :: BinOp -> Value -> Value -> InterpretMonad Value
> - fBinOp Plus v1 v2 = fNumOp (+) v1 v2
> - fBinOp Minus v1 v2 = fNumOp (-) v1 v2
> - fBinOp Times v1 v2 = fNumOp (*) v1 v2
> - fBinOp Div v1 v2 = case (v1, v2) of
> - (VInt i1, VInt i2) -> return $ VInt $ i1 `div` i2
> - _ -> do
> - f1 <- toFloat v1
> - f2 <- toFloat v2
> - return $ VFloat $ f1 / f2
> - fBinOp Modulo v1 v2 = do
> - i1 <- toInt v1
> - i2 <- toInt v2
> - return $ VInt $ i1 `mod` i2
> - fBinOp Pow v1 v2 = do
> - i2 <- toInt v2
> - case v1 of
> - VInt i1 -> return $ VInt $ i1 ^ i2
> - VFloat f1 -> return $ VFloat $ f1 ^^ i2
> - _ -> error "Integer of float expected"
> -
> - fNumOp :: (forall a. Num a => a -> a -> a) -> Value -> Value -> InterpretMonad Value
> - fNumOp op v1 v2 = case (v1, v2) of
> - (VInt i1, VInt i2) -> return $ VInt $ i1 `op` i2
> - _ -> do
> - f1 <- toFloat v1
> - f2 <- toFloat v2
> - return $ VFloat $ f1 `op` f2
> -
> - fTest :: Test -> InterpretMonad Bool
> - fTest IsEOF = liftC isEOF
> - fTest (Not test) = liftM not (fTest test)
> - fTest (And t1 t2) = liftM2 (&&) (fTest t1) (fTest t2)
> - fTest (Or t1 t2) = liftM2 (||) (fTest t1) (fTest t2)
> - fTest (CompOp cmp e1 e2) = do
> - v1 <- fExpr e1 >>= toFloat -- note that Float here is also exact
> - v2 <- fExpr e2 >>= toFloat
> - let Just op = cmp `lookup` [ (CompGT, (>))
> - , (CompGE, (>=))
> - , (CompEQ, (==))
> - , (CompNE, (/=))
> - , (CompLT, (<))
> - , (CompLE, (<=)) ]
> - return $ v1 `op` v2
> - fTest (Match e) = do
> - s <- fExpr e >>= toString
> - liftC $ match s
> + -- Set iterator
> + setValue var (VInt i)
> +
> + -- Do the body
> + fBlock body
> +fAST (CWhile var test mbSep body) = do
> + let it :: Integer -> InterpretMonad ()
> + it i = do
> + setValue var (VInt i)
> + b <- fTest test
> + when b $ do
> + -- Condition true, so parse separator
> + case mbSep of
> + Just sep | i > 0 -> fAST sep
> + _ -> return ()
> + -- And parse body
> + fBlock body
> + -- And repeat
> + it $ i + 1
> + it 0
> +fAST (CIf test ifTrue mbIfFalse) = do
> + b <- fTest test
> + if b then fBlock ifTrue
> + else case mbIfFalse of
> + Nothing -> return ()
> + Just bl -> fBlock bl
> +
> +-- Expression evaluation
> +fExpr :: Expr -> InterpretMonad Value
> +fExpr (EVar var) = getValue var
> +fExpr (ConstI v) = return $ VInt v
> +fExpr (ConstF v) = return $ VFloat v
> +fExpr (ConstS v) = return $ VString v
> +fExpr (StrLen e) = do
> + val <- fExpr e >>= toString
> + return $ VInt $ genericLength val
> +fExpr (Negate e) = do
> + val <- fExpr e
> + case val of
> + VInt v -> return $ VInt $ negate v
> + VFloat v -> return $ VFloat $ negate v
> + _ -> error "Integer of float expected"
> +fExpr (BinOp op e1 e2) = do
> + v1 <- fExpr e1
> + v2 <- fExpr e2
> + fBinOp op v1 v2
> +
> +-- Binary operators
> +fBinOp :: BinOp -> Value -> Value -> InterpretMonad Value
> +fBinOp Plus v1 v2 = fNumOp (+) v1 v2
> +fBinOp Minus v1 v2 = fNumOp (-) v1 v2
> +fBinOp Times v1 v2 = fNumOp (*) v1 v2
> +fBinOp Div v1 v2 = case (v1, v2) of
> + (VInt i1, VInt i2) -> return $ VInt $ i1 `div` i2
> + _ -> do
> + f1 <- toFloat v1
> + f2 <- toFloat v2
> + return $ VFloat $ f1 / f2
> +fBinOp Modulo v1 v2 = do
> + i1 <- toInt v1
> + i2 <- toInt v2
> + return $ VInt $ i1 `mod` i2
> +fBinOp Pow v1 v2 = do
> + i2 <- toInt v2
> + case v1 of
> + VInt i1 -> return $ VInt $ i1 ^ i2
> + VFloat f1 -> return $ VFloat $ f1 ^^ i2
> + _ -> error "Integer of float expected"
> +
> +fNumOp :: (forall a. Num a => a -> a -> a) -> Value -> Value -> InterpretMonad Value
> +fNumOp op v1 v2 = case (v1, v2) of
> + (VInt i1, VInt i2) -> return $ VInt $ i1 `op` i2
> + _ -> do
> + f1 <- toFloat v1
> + f2 <- toFloat v2
> + return $ VFloat $ f1 `op` f2
> +
> +fTest :: Test -> InterpretMonad Bool
> +fTest IsEOF = liftC isEOF
> +fTest (Not test) = liftM not (fTest test)
> +fTest (And t1 t2) = liftM2 (&&) (fTest t1) (fTest t2)
> +fTest (Or t1 t2) = liftM2 (||) (fTest t1) (fTest t2)
> +fTest (Unique vs) = do
> + vm <- get
> + keyvals <- forM vs $ \v -> do
> + let mp = Map.findWithDefault (error $ "Undefined variable " ++ v) v vm
> + let (keys, vals) = unzip $ Map.toList mp
> + return (keys, vals)
> + let (k:ks, vals) = unzip keyvals
> + let pairs = transpose vals
> + when (any (/=k) ks) $ liftC $ fail "Different sets of indices"
> + return $ unique pairs
> +fTest (InArray val var) = do
> + vm <- get
> + case Map.lookup var vm of
> + Nothing -> liftC $ fail $ "Undefined variable " ++ var
> + Just vals -> do
> + v <- fExpr val
> + return $ v `elem` Map.elems vals
> +fTest (CompOp cmp e1 e2) = do
> + v1 <- fExpr e1 >>= toFloat -- note that Float here is also exact
> + v2 <- fExpr e2 >>= toFloat
> + let Just op = cmp `lookup` [ (CompGT, (>))
> + , (CompGE, (>=))
> + , (CompEQ, (==))
> + , (CompNE, (/=))
> + , (CompLT, (<))
> + , (CompLE, (<=)) ]
> + return $ v1 `op` v2
> +fTest (Match e) = do
> + s <- fExpr e >>= toString
> + liftC $ match s
> diff --git a/haskell_edsl/src/Checktestdata/Script/Parser.hs b/haskell_edsl/src/Checktestdata/Script/Parser.hs
> index 125ce40..5618432 100644
> --- a/haskell_edsl/src/Checktestdata/Script/Parser.hs
> +++ b/haskell_edsl/src/Checktestdata/Script/Parser.hs
> @@ -6,11 +6,6 @@ module Checktestdata.Script.Parser (
>
> import Checktestdata.Script.AST
>
> ---import Text.ParserCombinators.Parsec
> ---import Text.ParserCombinators.Parsec.Language
> ---import Text.ParserCombinators.Parsec.Expr
> ---import qualified Text.ParserCombinators.Parsec.Token as Token
> -
> import Data.Char
> import Data.Ratio
>
> @@ -25,12 +20,24 @@ import Text.ParserCombinators.UU.Utils
> parseScript :: FilePath -> IO Block
> parseScript fp = do
> contents <- readFile fp
> - case execParser (pSpaces *> pBlock) (dropComments contents) of
> - (r, []) -> return r
> - (r, err) -> mapM_ print err >> return r
> + return $ runParser fp (pSpaces *> pBlock) (dropComments contents)
>
> +-- | Remove all comments from the text. This is less trivial than it seems
> +-- as # may also be inside a string literal.
> dropComments :: String -> String
> -dropComments = unlines . map (takeWhile (/='#')) . lines
> +dropComments = f False False where
> + f :: Bool -> Bool -> String -> String -- inComment, inString
> + f _ _ "" = ""
> + f True False ('\n':xs) = '\n' : f False False xs
> + f True False ( _:xs) = f True False xs
> + f False True ('\\':x:xs) = '\\' : x : f False True xs
> + f False True ('"' :xs) = '"' : f False False xs
> + f False True ( x:xs) = x : f False True xs
> + f False False ('"' :xs) = '"' : f False True xs
> + f False False ('#' :xs) = f True False xs
> + f False False ( x:xs) = x : f False False xs
> + f _ _ _ = error "dropComments: invariant failed"
> +
>
> --------------------------------------------------------------------------------
> -- Parsing
> @@ -48,19 +55,19 @@ pAST = CSpace <$ pSymbol "SPACE"
> <*> pExpr
> <* pComma
> <*> pExpr
> - <*> (Just <$ pComma <*> identifier <<|> pure Nothing)
> + <*> (Just <$ pComma <*> pVar <<|> pure Nothing)
> <* pRParen
> <<|> CFloat <$ pSymbol "FLOAT"
> <* pLParen
> <*> pExpr
> <* pComma
> <*> pExpr
> - <*> (Just <$ pComma <*> identifier <<|> pure Nothing)
> + <*> (Just <$ pComma <*> pVar <<|> pure Nothing)
> <*> (Just <$ pComma <*> pFloatOption <<|> pure Nothing)
> <* pRParen
> <<|> CRep <$ pSymbol "REPI"
> <* pLParen
> - <*> (Just <$> identifier <* pComma)
> + <*> (Just <$> pVar <* pComma)
> <*> pExpr
> <*> (Just <$ pComma <*> pAST <<|> pure Nothing)
> <* pRParen
> @@ -76,7 +83,7 @@ pAST = CSpace <$ pSymbol "SPACE"
> <* pSymbol "END"
> <<|> CWhile <$ pSymbol "WHILEI"
> <* pLParen
> - <*> (Just <$> identifier <* pComma)
> + <*> (Just <$> pVar <* pComma)
> <*> pTest
> <*> (Just <$ pComma <*> pAST <<|> pure Nothing)
> <* pRParen
> @@ -101,6 +108,11 @@ pAST = CSpace <$ pSymbol "SPACE"
> <* pLParen
> <*> pTest
> <* pRParen
> + <<|> CRegex <$ pSymbol "REGEX"
> + <* pLParen
> + <*> pExpr
> + <*> (Just <$ pComma <*> pVar <<|> pure Nothing)
> + <* pRParen
> <<|> CString <$ pSymbol "STRING"
> <* pLParen
> <*> pExpr
> @@ -109,11 +121,15 @@ pAST = CSpace <$ pSymbol "SPACE"
> <* pLParen
> <*> pListSep pComma
> (
> - (,) <$> identifier
> - <* pSym '='
> + (,) <$> pVar
> + <* lexeme (pSym '=')
> <*> pExpr
> )
> <* pRParen
> + <<|> CUnset <$ pSymbol "UNSET"
> + <* pLParen
> + <*> pListSep pComma identifier
> + <* pRParen
>
> pFloatOption :: Parser FloatOption
> pFloatOption = Scientific <$ pSymbol "SCIENTIFIC"
> @@ -121,12 +137,16 @@ pFloatOption = Scientific <$ pSymbol "SCIENTIFIC"
>
> pExpr :: Parser Expr
> pExpr = foldr pChainl pExprBase (map same_prio operators) where
> - same_prio ops = msum [ BinOp op <$ pSym c | (c, op) <- ops]
> + same_prio ops = msum [ BinOp op <$ lexeme (pSym c) | (c, op) <- ops]
> pExprBase :: Parser Expr
> pExprBase = pParens pExpr
> - <<|> Negate <$ pSym '-'
> + <<|> Negate <$ lexeme (pSym '-')
> <*> pExpr
> - <<|> EVar <$> identifier
> + <<|> StrLen <$ pSymbol "STRLEN"
> + <* pLParen
> + <*> pExpr
> + <* pRParen
> + <<|> EVar <$> pVar
> <<|> ConstS <$> pString
> <<|> lexeme pNumber
>
> @@ -151,6 +171,7 @@ pNumber = mkNum
> mkNum i mbF mbE = ConstF $ (fromInteger i + fpart) * epart where
> fpart = case mbF of
> Nothing -> 0
> + Just "" -> 0
> Just f -> (read f) % (10 ^ length f)
> epart = case mbE of
> Nothing -> 1
> @@ -158,16 +179,31 @@ pNumber = mkNum
>
> -- | Parse a literal string
> pString :: Parser String
> -pString = lexeme $ pSym '"' *> pList pChar <* pSym '"' where
> - pChar :: Parser Char -- todo: \[0-7]{1,3} denotes an octal escape for a character
> - pChar = '\n' <$ pToken "\\n"
> - <<|> '\t' <$ pToken "\\t"
> - <<|> '\r' <$ pToken "\\r"
> - <<|> '\b' <$ pToken "\\b"
> - <<|> '"' <$ pToken "\\\""
> - <<|> '\\' <$ pToken "\\\\"
> - <<|> pToken "\\\n" *> pSatisfy (/='"') (Insertion "x" 'x' 5)
> - <<|> pSatisfy (/='"') (Insertion "x" 'x' 5)
> +pString = lexeme $ pSym '"' *> pList pChar <* pSym '"' where
> + pChar :: Parser Char
> + pChar = pSym '\\' *> ( pOctal
> + <<|> '\n' <$ pSym 'n'
> + <<|> '\t' <$ pSym 't'
> + <<|> '\r' <$ pSym 'r'
> + <<|> '\b' <$ pSym 'b'
> + <<|> '\\' <$ pSym '\\'
> + <<|> '"' <$ pSym '"'
> + <<|> pSym '\n' *> pChar
> + <<|> pure '\\')
> + <<|> pSatisfy (/='"') (Insertion "x" 'x' 5)
> + pOctal :: Parser Char
> + pOctal = toOct <$> octDig
> + <*> (Just <$> octDig <<|> pure Nothing)
> + <*> (Just <$> octDig <<|> pure Nothing)
> + toOct :: Char -> Maybe Char -> Maybe Char -> Char
> + toOct d1 (Just d2) (Just d3) = chr $ 64 * toNum d1 + 8 * toNum d2 + toNum d3
> + toOct d1 (Just d2) Nothing = chr $ 8 * toNum d1 + toNum d2
> + toOct d1 Nothing Nothing = chr $ toNum d1
> + toOct _ _ _ = error $ "toOct: invariant failed"
> + toNum :: Char -> Int
> + toNum c = ord c - ord '0'
> + octDig :: Parser Char
> + octDig = pSatisfy (\c -> '0' <= c && c <= '7') (Insertion "0" '0' 5)
>
> -- | Parse boolean expressions
> pTest :: Parser Test
> @@ -175,7 +211,7 @@ pTest = pChainl bOps pTestBase where
> bOps = And <$ pSymbol "&&"
> <<|> Or <$ pSymbol "||"
> pTestBase = pParens pTest
> - <<|> Not <$ pSym '!'
> + <<|> Not <$ lexeme (pSym '!')
> <*> pTest
> <<|> IsEOF <$ pSymbol "ISEOF"
> <<|> Match <$ pSymbol "MATCH"
> @@ -203,9 +239,14 @@ pTest = pChainl bOps pTestBase where
> ] ]
>
>
> +-- | Parse a variable (possibly with array indices)
> +pVar :: Parser Var
> +pVar = Var <$> identifier
> + <*> ( pLBracket *> pListSep pComma pExpr <* pRBracket <<|> pure [])
> +
> --------------------------------------------------------------------------------
> -- Lexing
> --------------------------------------------------------------------------------
>
> identifier :: Parser String
> -identifier = lexeme $ (:) <$> pLetter <*> pMunch isAlphaNum
> +identifier = lexeme $ (:) <$> pLower <*> pMunch (\c -> isLower c || isDigit c)
>
> -----------------------------------------------------------------------
>
> Summary of changes:
> haskell_edsl/checktestdata.cabal | 1 +
> haskell_edsl/src/Checktestdata/Core.hs | 22 ++
> haskell_edsl/src/Checktestdata/Script/AST.hs | 16 +-
> .../src/Checktestdata/Script/Interpreter.hs | 304 ++++++++++++---------
> haskell_edsl/src/Checktestdata/Script/Parser.hs | 101 +++++--
> 5 files changed, 272 insertions(+), 172 deletions(-)
>
2
1