
Joachim Breitner: Showcasing Applicative

Planet Debian - Wed, 26/10/2016 - 6:00am

My plan for this week’s lecture of the CIS 194 Haskell course at the University of Pennsylvania is to dwell a bit on the concept of Functor, Applicative and Monad, and to highlight the value of the Applicative abstraction.

I quite like the example that I came up with, so I want to share it here. In the interest of long-term archival and stand-alone presentation, I include all the material in this post.1

Imports

In case you want to follow along, start with these imports:

import Data.Char
import Data.Maybe
import Data.List
import System.Environment
import System.IO
import System.Exit

The parser

The starting point for this exercise is a fairly standard parser-combinator monad, which happens to be the result of the student’s homework from last week:

newtype Parser a = P (String -> Maybe (a, String))

runParser :: Parser t -> String -> Maybe (t, String)
runParser (P p) = p

parse :: Parser a -> String -> Maybe a
parse p input = case runParser p input of
    Just (result, "") -> Just result
    _                 -> Nothing -- handles both no result and leftover input

noParserP :: Parser a
noParserP = P (\_ -> Nothing)

pureParserP :: a -> Parser a
pureParserP x = P (\input -> Just (x, input))

instance Functor Parser where
    fmap f p = P $ \input -> do
        (x, rest) <- runParser p input
        return (f x, rest)

instance Applicative Parser where
    pure = pureParserP
    p1 <*> p2 = P $ \input -> do
        (f, rest1) <- runParser p1 input
        (x, rest2) <- runParser p2 rest1
        return (f x, rest2)

instance Monad Parser where
    return = pure
    p1 >>= k = P $ \input -> do
        (x, rest1) <- runParser p1 input
        runParser (k x) rest1

anyCharP :: Parser Char
anyCharP = P $ \input -> case input of
    (c:rest) -> Just (c, rest)
    []       -> Nothing

charP :: Char -> Parser ()
charP c = do
    c' <- anyCharP
    if c == c' then return ()
               else noParserP

anyCharButP :: Char -> Parser Char
anyCharButP c = do
    c' <- anyCharP
    if c /= c' then return c'
               else noParserP

letterOrDigitP :: Parser Char
letterOrDigitP = do
    c <- anyCharP
    if isAlphaNum c then return c else noParserP

orElseP :: Parser a -> Parser a -> Parser a
orElseP p1 p2 = P $ \input -> case runParser p1 input of
    Just r  -> Just r
    Nothing -> runParser p2 input

manyP :: Parser a -> Parser [a]
manyP p = (pure (:) <*> p <*> manyP p) `orElseP` pure []

many1P :: Parser a -> Parser [a]
many1P p = pure (:) <*> p <*> manyP p

sepByP :: Parser a -> Parser () -> Parser [a]
sepByP p1 p2 = (pure (:) <*> p1 <*> manyP (p2 *> p1)) `orElseP` pure []
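A quick sanity check in GHCi (my own example, not from the original post); note how parse, unlike runParser, insists on consuming the whole input:

*Main> runParser (many1P letterOrDigitP) "ab, cd"
Just ("ab",", cd")
*Main> parse (many1P letterOrDigitP) "ab, cd"
Nothing
*Main> parse (many1P letterOrDigitP) "abcd"
Just "abcd"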

A parser for, say, CSV files written using this library could take this form:

parseCSVP :: Parser [[String]]
parseCSVP = manyP parseLine
  where
    parseLine = parseCell `sepByP` charP ',' <* charP '\n'
    parseCell = do
        charP '"'
        content <- manyP (anyCharButP '"')
        charP '"'
        return content

We want EBNF

Often when we write a parser for a file format, we might also want to have a formal specification of the format. A common form for such a specification is EBNF. This might look as follows, for a CSV file:

cell = '"', {not-quote}, '"';
line = (cell, {',', cell} | ''), newline;
csv = {line};

It is straightforward to create a Haskell data type to represent an EBNF syntax description. Here is a simple EBNF library (data type and pretty-printer) for your convenience:

data RHS
  = Terminal String
  | NonTerminal String
  | Choice RHS RHS
  | Sequence RHS RHS
  | Optional RHS
  | Repetition RHS
  deriving (Show, Eq)

ppRHS :: RHS -> String
ppRHS = go 0
  where
    go _ (Terminal s)     = surround "'" "'" $ concatMap quote s
    go _ (NonTerminal s)  = s
    go a (Choice x1 x2)   = p a 1 $ go 1 x1 ++ " | " ++ go 1 x2
    go a (Sequence x1 x2) = p a 2 $ go 2 x1 ++ ", " ++ go 2 x2
    go _ (Optional x)     = surround "[" "]" $ go 0 x
    go _ (Repetition x)   = surround "{" "}" $ go 0 x

    surround c1 c2 x = c1 ++ x ++ c2

    p a n | a > n     = surround "(" ")"
          | otherwise = id

    quote '\'' = "\\'"
    quote '\\' = "\\\\"
    quote c    = [c]

type Production = (String, RHS)
type BNF = [Production]

ppBNF :: BNF -> String
ppBNF = unlines . map (\(i,rhs) -> i ++ " = " ++ ppRHS rhs ++ ";")

Code to produce EBNF

We had a good time writing combinators that create complex parsers from primitive pieces. Let us do the same for EBNF grammars. We could simply work on the RHS type directly, but we can do something more nifty: We create a data type that keeps track, via a phantom type parameter, of which Haskell type the given EBNF syntax describes:

newtype Grammar a = G RHS

ppGrammar :: Grammar a -> String
ppGrammar (G rhs) = ppRHS rhs

So a value of type Grammar t is a description of the textual representation of the Haskell type t.

Here is one simple example:

anyCharG :: Grammar Char
anyCharG = G (NonTerminal "char")

Here is another one. This one does not describe any interesting Haskell type, but is useful when spelling out the special characters in the syntax described by the grammar:

charG :: Char -> Grammar ()
charG c = G (Terminal [c])

A combinator that creates a new grammar from two existing grammars:

orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G rhs1) (G rhs2) = G (Choice rhs1 rhs2)

We want the convenience of our well-known type classes in order to combine these values some more:

instance Functor Grammar where
    fmap _ (G rhs) = G rhs

instance Applicative Grammar where
    pure x = G (Terminal "")
    (G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)

Note how the Functor instance does not actually use the function. How could it? There are no values inside a Grammar!

We cannot define a Monad instance for Grammar: We would start with (G rhs1) >>= k = …, but there is simply no way of getting a value of type a that we can feed to k. So we will do without a Monad instance. This is interesting, and we will come back to that later.
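To make the problem concrete (a sketch of my own, not from the post): the best an attempted instance could do is fail at run time, precisely because there is no value of type a to hand to k:

instance Monad Grammar where
    return = pure
    -- rhs1 is available, but to obtain a second RHS we would need to
    -- apply k to a value of type a, which a Grammar simply does not contain
    G rhs1 >>= k = error "unimplementable: no value of type a to pass to k"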

Like with the parser, we can now build on these primitives to define more complicated combinators:

manyG :: Grammar a -> Grammar [a]
manyG p = (pure (:) <*> p <*> manyG p) `orElseG` pure []

many1G :: Grammar a -> Grammar [a]
many1G p = pure (:) <*> p <*> manyG p

sepByG :: Grammar a -> Grammar () -> Grammar [a]
sepByG p1 p2 = ((:) <$> p1 <*> manyG (p2 *> p1)) `orElseG` pure []

Let us run a small example:

dottedWordsG :: Grammar [String]
dottedWordsG = many1G (manyG anyCharG <* charG '.')

*Main> putStrLn $ ppGrammar dottedWordsG
'', ('', char, ('', char, ('', char, ('', char, ('', char, ('', …

Oh my, that is not good. It looks like the recursion in manyG does not work well, so we need to avoid it. But in any case we want the EBNF grammar to be explicit about where something can be repeated, so let us just make many a primitive:

manyG :: Grammar a -> Grammar [a]
manyG (G rhs) = G (Repetition rhs)
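A quick check of the new primitive on its own (my own example):

*Main> putStrLn $ ppGrammar (manyG anyCharG)
{char}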

With this definition, we already get a simple grammar for dottedWordsG:

*Main> putStrLn $ ppGrammar dottedWordsG
'', {char}, '.', {{char}, '.'}

This already looks like a proper EBNF grammar. One thing that is not nice about it is that there is an empty string ('') in a sequence (…,…). We do not want that.

Why is it there in the first place? Because our Applicative instance is not lawful! Remember that pure id <*> g == g should hold. One way to achieve that is to improve the Applicative instance to optimize this case away:

instance Applicative Grammar where
    pure x = G (Terminal "")
    G (Terminal "") <*> G rhs2 = G rhs2
    G rhs1 <*> G (Terminal "") = G rhs1
    (G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)

Now we get what we want:

*Main> putStrLn $ ppGrammar dottedWordsG
{char}, '.', {{char}, '.'}
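We can also convince ourselves that the offending law now holds, at least observably (my own check):

*Main> putStrLn $ ppGrammar (pure id <*> anyCharG)
char
*Main> putStrLn $ ppGrammar anyCharG
char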

Remember our parser for CSV files above? Let me repeat it here, this time using only Applicative combinators, i.e. avoiding (>>=), (>>), return and do-notation:

parseCSVP :: Parser [[String]]
parseCSVP = manyP parseLine
  where
    parseLine = parseCell `sepByP` charP ',' <* charP '\n'
    parseCell = charP '"' *> manyP (anyCharButP '"') <* charP '"'

And now we try to rewrite the code to produce Grammar instead of Parser. This is straightforward with the exception of anyCharButP. The parser code for that is inherently monadic, and we just do not have a Monad instance. So we work around the issue by making it a “primitive” grammar, i.e. introducing a non-terminal in the EBNF without a production rule – pretty much like we did for anyCharG:

primitiveG :: String -> Grammar a
primitiveG s = G (NonTerminal s)

parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
  where
    parseLine = parseCell `sepByG` charG ',' <* charG '\n'
    parseCell = charG '"' *> manyG (primitiveG "not-quote") <* charG '"'

Of course the names parse… are not quite right any more, but let us just leave that for now.

Here is the result:

*Main> putStrLn $ ppGrammar parseCSVG
{('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), '
'}

The line break is weird. We do not really want newlines in the grammar. So let us make that primitive as well, and replace charG '\n' with newlineG:

newlineG :: Grammar ()
newlineG = primitiveG "newline"

Now we get

*Main> putStrLn $ ppGrammar parseCSVG
{('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), newline}

which is nice and correct, but still not quite the easily readable EBNF that we saw further up.

Code to produce EBNF, with productions

We currently let our grammars produce only the right-hand side of one EBNF production, but really, we want to produce an RHS that may refer to other productions. So let us change the type accordingly:

newtype Grammar a = G (BNF, RHS)

runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]

ppGrammar :: String -> Grammar a -> String
ppGrammar main g = ppBNF $ runGrammer main g

Now we have to adjust all our primitive combinators (but not the derived ones!):

charG :: Char -> Grammar ()
charG c = G ([], Terminal [c])

anyCharG :: Grammar Char
anyCharG = G ([], NonTerminal "char")

manyG :: Grammar a -> Grammar [a]
manyG (G (prods, rhs)) = G (prods, Repetition rhs)

mergeProds :: [Production] -> [Production] -> [Production]
mergeProds prods1 prods2 = nub $ prods1 ++ prods2

orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G (prods1, rhs1)) (G (prods2, rhs2))
    = G (mergeProds prods1 prods2, Choice rhs1 rhs2)

instance Functor Grammar where
    fmap _ (G bnf) = G bnf

instance Applicative Grammar where
    pure x = G ([], Terminal "")
    G (prods1, Terminal "") <*> G (prods2, rhs2)
        = G (mergeProds prods1 prods2, rhs2)
    G (prods1, rhs1) <*> G (prods2, Terminal "")
        = G (mergeProds prods1 prods2, rhs1)
    G (prods1, rhs1) <*> G (prods2, rhs2)
        = G (mergeProds prods1 prods2, Sequence rhs1 rhs2)

primitiveG :: String -> Grammar a
primitiveG s = G ([], NonTerminal s)

The use of nub when combining productions removes the duplicates that arise when the same production is used in different parts of the grammar. Not efficient, but good enough for now.

Did we gain anything? Not yet:

*Main> putStr $ ppGrammar "csv" (parseCSVG)
csv = {('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), newline};

But we can now introduce a function that lets us tell the system where to give a name to a piece of grammar:

nonTerminalG :: String -> Grammar a -> Grammar a
nonTerminalG name (G (prods, rhs))
    = G (prods ++ [(name, rhs)], NonTerminal name)
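Here is a small example of my own showing how productions are collected; note that the quote production appears only once in the output, even though q is used twice, thanks to the nub in mergeProds:

*Main> let q = nonTerminalG "quote" (charG '"')
*Main> putStr $ ppGrammar "pair" ((,) <$> q <*> q)
quote = '"';
pair = quote, quote;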

Ample use of this in parseCSVG yields the desired result:

parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
  where
    parseLine = nonTerminalG "line" $
        parseCell `sepByG` charG ',' <* newlineG
    parseCell = nonTerminalG "cell" $
        charG '"' *> manyG (primitiveG "not-quote") <* charG '"'

*Main> putStr $ ppGrammar "csv" (parseCSVG)
cell = '"', {not-quote}, '"';
line = (cell, {',', cell} | ''), newline;
csv = {line};

This is great!

Unifying parsing and grammar-generating

Note how similar parseCSVG and parseCSVP are! Would it not be great if we could implement that functionality only once, and get both a parser and a grammar description out of it? This way, the two would never be out of sync!

And surely this must be possible. The tool to reach for is of course to define a type class that abstracts over the parts where Parser and Grammar differ. So we have to identify all functions that are primitive in one of the two worlds, and turn them into type class methods. This includes char and orElse. It includes many, too: Although manyP is not primitive, manyG is. It also includes nonTerminal, which does not exist in the world of parsers (yet), but we need it for the grammars.

The primitiveG function is tricky. We use it in grammars when the code that we might use while parsing is not expressible as a grammar. So the solution is to let it take two arguments: a String to be used as a descriptive non-terminal in a grammar, and a Parser a to be used in the parsing code.

Finally, the type classes that we expect, Applicative (and thus Functor), are added as constraints on our type class:

class Applicative f => Descr f where
    char :: Char -> f ()
    many :: f a -> f [a]
    orElse :: f a -> f a -> f a
    primitive :: String -> Parser a -> f a
    nonTerminal :: String -> f a -> f a

The instances are easily written:

instance Descr Parser where
    char = charP
    many = manyP
    orElse = orElseP
    primitive _ p = p
    nonTerminal _ p = p

instance Descr Grammar where
    char = charG
    many = manyG
    orElse = orElseG
    primitive s _ = primitiveG s
    nonTerminal = nonTerminalG

And we can now take the derived definitions, of which so far we had two copies, and define them once and for all:

many1 :: Descr f => f a -> f [a]
many1 p = pure (:) <*> p <*> many p

anyChar :: Descr f => f Char
anyChar = primitive "char" anyCharP

dottedWords :: Descr f => f [String]
dottedWords = many1 (many anyChar <* char '.')

sepBy :: Descr f => f a -> f () -> f [a]
sepBy p1 p2 = ((:) <$> p1 <*> many (p2 *> p1)) `orElse` pure []

newline :: Descr f => f ()
newline = primitive "newline" (charP '\n')
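These generic definitions can already be read both ways. For instance (my own illustration), the grammar interpretation of dottedWords is:

*Main> putStr $ ppGrammar "dotted" dottedWords
dotted = {char}, '.', {{char}, '.'};

(As a parser, dottedWords is less useful: many anyChar greedily consumes the rest of the input, so the following char '.' can never succeed.)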

And thus we now have our CSV parser/grammar generator:

parseCSV :: Descr f => f [[String]]
parseCSV = many parseLine
  where
    parseLine = nonTerminal "line" $
        parseCell `sepBy` char ',' <* newline
    parseCell = nonTerminal "cell" $
        char '"' *> many (primitive "not-quote" (anyCharButP '"')) <* char '"'

We can now use this definition both to parse and to generate grammars:

*Main> putStr $ ppGrammar "csv" (parseCSV)
cell = '"', {not-quote}, '"';
line = (cell, {',', cell} | ''), newline;
csv = {line};
*Main> parse parseCSV "\"ab\",\"cd\"\n\"\",\"de\"\n\n"
Just [["ab","cd"],["","de"],[]]

The INI file parser and grammar

As a final exercise, let us transform the INI file parser into a combined thing. Here is the parser (another artifact of last week’s homework) again using applicative style2:

parseINIP :: Parser INIFile
parseINIP = many1P parseSection
  where
    parseSection =
        (,) <$  charP '['
            <*> parseIdent
            <*  charP ']'
            <*  charP '\n'
            <*> (catMaybes <$> manyP parseLine)
    parseIdent = many1P letterOrDigitP
    parseLine = parseDecl `orElseP` parseComment `orElseP` parseEmpty
    parseDecl = Just <$> (
        (,) <$> parseIdent
            <*  manyP (charP ' ')
            <*  charP '='
            <*  manyP (charP ' ')
            <*> many1P (anyCharButP '\n')
            <*  charP '\n')
    parseComment =
        Nothing <$ charP '#'
                <* many1P (anyCharButP '\n')
                <* charP '\n'
    parseEmpty = Nothing <$ charP '\n'
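A quick test of the parser side (my own example; the INIFile type itself is from the previous homework and not defined in this post, presumably something like [(String, [(String, String)])]):

*Main> parse parseINIP "[section]\nkey=value\n"
Just [("section",[("key","value")])]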

Transforming that to a generic description is quite straightforward. We use primitive again to wrap letterOrDigitP:

descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
  where
    parseSection =
        (,) <$  char '['
            <*> parseIdent
            <*  char ']'
            <*  newline
            <*> (catMaybes <$> many parseLine)
    parseIdent = many1 (primitive "alphanum" letterOrDigitP)
    parseLine = parseDecl `orElse` parseComment `orElse` parseEmpty
    parseDecl = Just <$> (
        (,) <$> parseIdent
            <*  many (char ' ')
            <*  char '='
            <*  many (char ' ')
            <*> many1 (primitive "non-newline" (anyCharButP '\n'))
            <*  newline)
    parseComment =
        Nothing <$ char '#'
                <* many1 (primitive "non-newline" (anyCharButP '\n'))
                <* newline
    parseEmpty = Nothing <$ newline

This yields this not very helpful grammar (abbreviated here):

*Main> putStr $ ppGrammar "ini" descrINI
ini = '[', alphanum, {alphanum}, ']', newline, {alphanum, {alphanum}, {' '}…

But with a few uses of nonTerminal, we get something really nice:

descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
  where
    parseSection = nonTerminal "section" $
        (,) <$  char '['
            <*> parseIdent
            <*  char ']'
            <*  newline
            <*> (catMaybes <$> many parseLine)
    parseIdent = nonTerminal "identifier" $
        many1 (primitive "alphanum" letterOrDigitP)
    parseLine = nonTerminal "line" $
        parseDecl `orElse` parseComment `orElse` parseEmpty
    parseDecl = nonTerminal "declaration" $ Just <$> (
        (,) <$> parseIdent
            <*  spaces
            <*  char '='
            <*  spaces
            <*> remainder)
    parseComment = nonTerminal "comment" $
        Nothing <$ char '#' <* remainder
    remainder = nonTerminal "line-remainder" $
        many1 (primitive "non-newline" (anyCharButP '\n')) <* newline
    parseEmpty = Nothing <$ newline
    spaces = nonTerminal "spaces" $ many (char ' ')

*Main> putStr $ ppGrammar "ini" descrINI
identifier = alphanum, {alphanum};
spaces = {' '};
line-remainder = non-newline, {non-newline}, newline;
declaration = identifier, spaces, '=', spaces, line-remainder;
comment = '#', line-remainder;
line = declaration | comment | newline;
section = '[', identifier, ']', newline, {line};
ini = section, {section};

Recursion (variant 1)

What if we want to write a parser/grammar-generator that is able to generate the following grammar, which describes terms that are additions and multiplications of natural numbers:

const = digit, {digit};
spaces = {' ' | newline};
atom = const | '(', spaces, expr, spaces, ')', spaces;
mult = atom, {spaces, '*', spaces, atom}, spaces;
plus = mult, {spaces, '+', spaces, mult}, spaces;
expr = plus;

The production of expr is recursive (via plus, mult, atom). We have seen above that simply defining a Grammar a recursively does not go well.

One solution is to add a new combinator for explicit recursion, which replaces nonTerminal as a class method:

class Applicative f => Descr f where
    …
    recNonTerminal :: String -> (f a -> f a) -> f a

instance Descr Parser where
    …
    recNonTerminal _ p = let r = p r in r

instance Descr Grammar where
    …
    recNonTerminal = recNonTerminalG

recNonTerminalG :: String -> (Grammar a -> Grammar a) -> Grammar a
recNonTerminalG name f =
    let G (prods, rhs) = f (G ([], NonTerminal name))
    in G (prods ++ [(name, rhs)], NonTerminal name)

nonTerminal :: Descr f => String -> f a -> f a
nonTerminal name p = recNonTerminal name (const p)

runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, NonTerminal nt)) | main == nt = prods
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]

The change in runGrammer avoids adding a pointless expr = expr production to the output.
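Here is a small illustration of my own of the explicit recursion: a string of characters can be described recursively, and both interpretations do the right thing:

*Main> let letters = recNonTerminal "letters" $ \ls -> ((:) <$> anyChar <*> ls) `orElse` pure []
*Main> putStr $ ppGrammar "letters" letters
letters = char, letters | '';
*Main> parse letters "ab"
Just "ab"

(Note also how the runGrammer change keeps a pointless letters = letters production out of the output.)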

This lets us define a parser/grammar-generator for the arithmetic expressions given above:

data Expr = Plus Expr Expr | Mult Expr Expr | Const Integer
    deriving Show

mkPlus :: Expr -> [Expr] -> Expr
mkPlus = foldl Plus

mkMult :: Expr -> [Expr] -> Expr
mkMult = foldl Mult

-- digit and spaces are used below, but their definitions did not survive in
-- this copy of the post; these reconstructions match the grammar shown above:
digit :: Descr f => f Char
digit = primitive "digit" $ do
    c <- anyCharP
    if isDigit c then return c else noParserP

spaces :: Descr f => f ()
spaces = nonTerminal "spaces" $ () <$ many (char ' ' `orElse` newline)

parseExpr :: Descr f => f Expr
parseExpr = recNonTerminal "expr" $ \ exp -> ePlus exp

ePlus :: Descr f => f Expr -> f Expr
ePlus exp = nonTerminal "plus" $
    mkPlus <$> eMult exp
           <*> many (spaces *> char '+' *> spaces *> eMult exp)
           <*  spaces

eMult :: Descr f => f Expr -> f Expr
eMult exp = nonTerminal "mult" $
    mkMult <$> eAtom exp
           <*> many (spaces *> char '*' *> spaces *> eAtom exp)
           <*  spaces

eAtom :: Descr f => f Expr -> f Expr
eAtom exp = nonTerminal "atom" $
    aConst `orElse` eParens exp

aConst :: Descr f => f Expr
aConst = nonTerminal "const" $ Const . read <$> many1 digit

eParens :: Descr f => f a -> f a
eParens inner =
    id <$  char '('
       <*  spaces
       <*> inner
       <*  spaces
       <*  char ')'
       <*  spaces
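Before generating the grammar, we can check the parser interpretation (my own test); note how * binds tighter than + and how parentheses work:

*Main> parse parseExpr "1 + 2 * 3"
Just (Plus (Const 1) (Mult (Const 2) (Const 3)))
*Main> parse parseExpr "(1 + 2) * 3"
Just (Mult (Plus (Const 1) (Const 2)) (Const 3))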

And indeed, this works:

*Main> putStr $ ppGrammar "expr" parseExpr
const = digit, {digit};
spaces = {' ' | newline};
atom = const | '(', spaces, expr, spaces, ')', spaces;
mult = atom, {spaces, '*', spaces, atom}, spaces;
plus = mult, {spaces, '+', spaces, mult}, spaces;
expr = plus;

Recursion (variant 2)

Interestingly, there is another solution to this problem, which avoids introducing recNonTerminal and explicitly passing around the recursive call (i.e. the exp in the example). To implement that we have to adjust our Grammar type as follows:

newtype Grammar a = G ([String] -> (BNF, RHS))

The idea is that the list of strings is those non-terminals that we are currently defining. So in nonTerminal, we check if the non-terminal to be introduced is currently in the process of being defined, and then simply ignore the body. This way, the recursion is stopped automatically:

nonTerminalG :: String -> Grammar a -> Grammar a
nonTerminalG name (G g) = G $ \seen ->
    if name `elem` seen
    then ([], NonTerminal name)
    else let (prods, rhs) = g (name : seen)
         in (prods ++ [(name, rhs)], NonTerminal name)

After adjusting the other primitives of Grammar (including the Functor and Applicative instances, which now have to pass the list of currently-defined non-terminals around) so that everything type-checks again, we observe that this parser/grammar generator for expressions, with genuine recursion, now works:

parseExp :: Descr f => f Expr
parseExp = nonTerminal "expr" $ ePlus

ePlus :: Descr f => f Expr
ePlus = nonTerminal "plus" $
    mkPlus <$> eMult
           <*> many (spaces *> char '+' *> spaces *> eMult)
           <*  spaces

eMult :: Descr f => f Expr
eMult = nonTerminal "mult" $
    mkMult <$> eAtom
           <*> many (spaces *> char '*' *> spaces *> eAtom)
           <*  spaces

eAtom :: Descr f => f Expr
eAtom = nonTerminal "atom" $
    aConst `orElse` eParens parseExp

Note that the recursion is only going to work if there is at least one call to nonTerminal somewhere around the recursive calls. We still cannot implement many as naively as above.

Homework

If you want to play more with this: The homework is to define a parser/grammar-generator for EBNF itself, as specified in this variant:

identifier = letter, {letter | digit | '-'};
spaces = {' ' | newline};
quoted-char = non-quote-or-backslash | '\\', '\\' | '\\', '\'';
terminal = '\'', {quoted-char}, '\'', spaces;
non-terminal = identifier, spaces;
option = '[', spaces, rhs, spaces, ']', spaces;
repetition = '{', spaces, rhs, spaces, '}', spaces;
group = '(', spaces, rhs, spaces, ')', spaces;
atom = terminal | non-terminal | option | repetition | group;
sequence = atom, {spaces, ',', spaces, atom}, spaces;
choice = sequence, {spaces, '|', spaces, sequence}, spaces;
rhs = choice;
production = identifier, spaces, '=', spaces, rhs, ';', spaces;
bnf = production, {production};

This grammar is set up so that the precedence of , and | is correctly implemented: a , b | c will parse as (a, b) | c.

In this syntax for BNF, terminal characters are quoted, i.e. inside '…', a ' is replaced by \' and a \ is replaced by \\ – this is done by the function quote in ppRHS.
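For example (my own illustration of the quoting at work):

*Main> putStrLn $ ppRHS (Terminal "it's a \\")
'it\'s a \\'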

If you do this, you should be able to round-trip with the pretty-printer, i.e. parse back what it wrote:

*Main> let bnf1 = runGrammer "expr" parseExpr
*Main> let bnf2 = runGrammer "expr" parseBNF
*Main> let f = Data.Maybe.fromJust . parse parseBNF . ppBNF
*Main> f bnf1 == bnf1
True
*Main> f bnf2 == bnf2
True

The last line is quite meta: We are using parseBNF as a parser on the pretty-printed grammar produced from interpreting parseBNF as a grammar.

Conclusion

We have again seen an example of the excellent support for abstraction in Haskell: Being able to define so very different things such as a parser and a grammar description with the same code is great. Type classes helped us here.

Note that it was crucial that our combined parser/grammars are only able to use the methods of Applicative, and not Monad. Applicative is less powerful, so by giving less power to the user of our Descr interface, the other side, i.e. the implementation, can be more powerful.

The reason why Applicative is ok, but Monad is not, is that in Applicative, the results do not affect the shape of the computation, whereas in Monad, the whole point of the bind operator (>>=) is that the result of the computation is used to decide the next computation. And while this is perfectly fine for a parser, it just makes no sense for a grammar generator, where there simply are no values around!

We have also seen that a phantom type, namely the parameter of Grammar, can be useful, as it lets the type system make sure we do not write nonsense. For example, the type of orElseG ensures that both grammars that are combined here indeed describe something of the same type.

  1. It seems to be the week of applicative-appraising blog posts: Brent has posted a nice piece about enumerations using Applicative yesterday.

  2. I like how, in this alignment of <*> and <*, the > characters point out where the arguments are that are being passed to the function on the left.

Dirk Eddelbuettel: Rblpapi 0.3.5

Planet Debian - Wed, 26/10/2016 - 4:14am

A new release of Rblpapi is now on CRAN. Rblpapi provides a direct interface between R and the Bloomberg Terminal via the C++ API provided by Bloomberg Labs (but note that a valid Bloomberg license and installation is required).

This is the sixth release since the package first appeared on CRAN last year. This release brings new functionality via a new function (getPortfolio()) and an extended function (getTicks()), as well as several fixes:

Changes in Rblpapi version 0.3.5 (2016-10-25)
  • Add new function getPortfolio to retrieve portfolio data via bds (John in #176)

  • Extend getTicks() to (optionally) return non-numeric data as part of data.frame or data.table (Dirk in #200)

  • Similarly extend getMultipleTicks (Dirk in #202)

  • Correct statement on timestamp for getBars (Closes issue #192)

  • Minor edits to a few files in order to either please R(-devel) CMD check --as-cran, or update documentation

Courtesy of CRANberries, there is also a diffstat report for this release. As always, more detailed information is on the Rblpapi page. Questions, comments etc should go to the issue tickets system at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Laura Arjona Reina: Rankings, Condorcet and free software: Calculating the results for the Stretch Artwork Survey

Planet Debian - Tue, 25/10/2016 - 10:11pm
We had 12 candidates for the Debian Stretch artwork, and a survey was set up to allow people to vote for the one they prefer.

The survey was run on my LimeSurvey instance, surveys.larjona.net. LimeSurvey is a nice piece of free software with a lot of features. It provides a “Ranking” question type, and it was very easy to let people “vote” in the Debian style (Debian uses the Condorcet method in its elections).

However, although LimeSurvey offers statistics and even graphics to show the results of many types of questions, its output for the Ranking type is not useful, so I had to export the data and use another tool to find the winner.

Export the data from LimeSurvey

I've created a read-only user to visit the survey site. With this visitor you can explore the survey questionnaire, its results, and export the data.

URL: https://surveys.larjona.net/admin
Username: stretch
Password: artwork

First attempt, the quick and easy (and nonfree, I guess)

There is an online tool to calculate the Condorcet winner, http://www.ericgorr.net/condorcet/. The steps I followed to feed the tool with the data from LimeSurvey were these:

1. Went to the admin interface of LimeSurvey, selected the stretch artwork survey, responses and statistics, and export results to application.
2. Selected “Completed responses only”, “Question codes”, “Answer codes”, and exported to CSV (results_stretch1.csv).
3. Opened the CSV with LibreOffice Calc, and removed these columns: id, submitdate, lastpage, startlanguage.
4. Removed the first row containing the headers and saved the result (results_stretch2.csv).
5. On the command line: sort results_stretch2.csv | uniq -c > results_stretch3.csv
6. Opened results_stretch3.csv with LibreOffice Calc, with “merge delimiters” enabled when importing.
7. Removed the first column (blank), added a column between the numbers and the first ranked option, and filled that column with the value “:”. Saved (results_stretch4.csv).
8. Opened results_stretch4.csv with my preferred editor, and searched and replaced “,:,” with “:”, and after that, “,” with “>”. Saved the result (results_stretch5.csv).
9. Went to http://condorcet.ericgorr.net/, selected Condorcet basic, “tell me some things”, and pasted the contents of results_stretch5.csv there.

The results are in results_stretch1.html.

But where is the source code of this Condorcet tool?

I couldn't find the source code (nor license) of the solver by Eric Gorr. The tool is mentioned in http://www.accuratedemocracy.com/z_tools.htm, where other tools are listed and, when a tool is libre software, it is noted so. But not in this case. There, I found another tool, VoteEngine, which is open source, so I tried that instead.

Second attempt: VoteEngine, a Free Open Source Software tool made with Python

I used a modification of voteengine-0.99 (the original zip is available at http://vote.sourceforge.net/, and a diff with the changes I made, basically Numeric -> numpy and Int -> int so that it works in Debian stable, is here). Steps 1 to 4 are the same as in the first attempt.

5. Sorted the 12 different options to vote alphabetically, and assigned a letter to each one (saved the assignments in a file called stretch_key.txt).
6. Opened results_stretch2.csv with my favorite editor, and searched and replaced the names of the different options with their corresponding letters from the stretch_key.txt file. Searched and replaced “,” with “ ” (space). Then saved the results into the file results_stretch3_voteengine.txt.
7. Copied the input.txt file from voteengine-0.99 into stretch.txt and edited the options to our needs. Pasted the contents of results_stretch3_voteengine.csv at the end of stretch.txt.
8. On the command line: ./voteengine.py < stretch.txt > winner.txt (winner.txt contains the results for the Condorcet method).
9. Edited stretch.txt again to change the method to Schulze and calculated the results, and again with the Smith method. The winner is the same with all 3 methods.

I pasted the summary of these 3 methods (Schulze and Smith provide a ranked list) in stretch_results.txt.

If it can be done, it can be done with R…

I found the algstat R package, https://cran.r-project.org/web/packages/algstat/index.html, which includes a “condorcet” function, but I couldn't make it work with the data. I'm not sure how the data needs to be shaped. I'm sure that this can be done in R and the problem is me, in this case. Comments are welcome, and I'll try to ask a friend whose R skills are better than mine!

And another SaaS

I found https://www.condorcet.vote/ and its source code. It would be interesting to deploy a local instance to drive future surveys, but this time I didn't want to fight with PHP just to use the “solver” part, nor install another SaaS on my home server only to find that I need some other dependency or whatever. I'll keep an eye on this, though, because it looks like a modern and active project.

Finally, devotee

Well, and which software does Debian use for its elections? There is a git repository with devotee; you can clone it: https://vote.debian.org/~secretary/devotee.git/ I found that although the tool is quite modular, it's written specifically for the Debian case (votes received by mail, GPG signed, there is a quorum, and other particularities), and I was not sure if I could use it with my data. It is written in Perl, and so I understood it less well than the Python of VoteEngine. Maybe I'll return to it, though, when I have more time, to try to put our data in the shape of a typical tally.txt file and then see if the module solving the Condorcet winner can work for me.

That's all, folks! (for now…)

Comments

You can comment on this blog post in this pump.io thread.
Filed under: Tools Tagged: data mining, Debian, English, SaaS, statistics

Jose M. Calhariz: New packages for Amanda on the works

Planet Debian - Tue, 25/10/2016 - 8:41pm

Because of the upgrade of perl, amanda is currently broken on Debian testing and unstable. The problem is known, and I am working with my sponsor to create new packages that solve it. Please hang on a little longer.

Bits from Debian: "softWaves" will be the default theme for Debian 9

Planet Debian - Tue, 25/10/2016 - 7:50pm

The theme "softWaves" by Juliette Taka Belin has been selected as default theme for Debian 9 'stretch'.

After the Debian Desktop Team made the call for proposing themes, a total of twelve choices were submitted, and every Debian contributor had the opportunity to vote on them in a survey. We received 3,479 responses ranking the different choices, and softWaves was the winner among them.

We'd like to thank all the designers that have participated, providing nice wallpapers and artwork for Debian 9, and encourage everybody interested in this area of Debian to join the Design Team. We are considering packaging all of the proposals so that they are easily available in Debian. If you want to help in this effort, or package any other artwork (for example, designed particularly to be accessibility-friendly), please contact the Debian Desktop Team; but hurry up, because the freeze for new packages in the next release of Debian starts on January 5th, 2017.

This is the second time that Debian ships a theme by Juliette Belin, who also created the theme "Lines" that enhances our current stable release, Debian 8. Congratulations, Juliette, and thank you very much for your continued commitment to Debian!

Julian Andres Klode: Introducing DNS66, a host blocker for Android

Planet Debian - Tue, 25/10/2016 - 6:20pm

I’m proud (yes, really) to announce DNS66, my host/ad blocker for Android 5.0 and newer. It’s been around since last Thursday on F-Droid, but it never really got a formal announcement.

DNS66 creates a local VPN service on your Android device, and diverts all DNS traffic to it, possibly adding new DNS servers you can configure in its UI. It can use hosts files for blocking whole sets of hosts or you can just give it a domain name to block (or multiple hosts files/hosts). You can also whitelist individual hosts or entire files by adding them to the end of the list. When a host name is looked up, the query goes to the VPN which looks at the packet and responds with NXDOMAIN (non-existing domain) for hosts that are blocked.

You can find DNS66 here:

F-Droid is the recommended source to install from. DNS66 is licensed under the GNU GPL 3, or (mostly) any later version.

Implementation Notes

DNS66’s core logic is based on another project, dbrodie/AdBuster, which arguably has the cooler name. I translated that from Kotlin to Java, and cleaned up the implementation a bit:

All work is done in a single thread by using poll() to detect when to read/write stuff. Each DNS request is sent via a new UDP socket, and poll() polls over all UDP sockets, a Device Socket (for the VPN’s tun device) and a pipe (so we can interrupt the poll at any time by closing the pipe).

We literally redirect your DNS servers. Meaning if your DNS server is 1.2.3.4, all traffic to 1.2.3.4 is routed to the VPN. The VPN only understands DNS traffic, though, so you might have trouble if your DNS server also happens to serve something else. I plan to change that at some point to emulate multiple DNS servers with fake IPs, but this was a first step to get it working with fallback: Android can now transparently fallback to other DNS servers without having to be aware that they are routed via the VPN.

We also need to deal with timing out queries that we received no answer for: DNS66 stores the query into a LinkedHashMap and overrides the removeEldestEntry() method to remove the eldest entry if it is older than 10 seconds or there are more than 1024 pending queries. This means that it only times out up to one request per new request, but it eventually cleans up fine.

 


Filed under: Android, Uncategorized

Michal Čihař: New features on Hosted Weblate

Planet Debian - Tue, 25/10/2016 - 6:00pm

Today, a new version has been deployed on Hosted Weblate. It brings many long-requested features and enhancements.

Adding a project to your watched projects got way simpler; you can now do it on the project page using the watch button:

Another feature which will be liked by project admins is that they can now change project metadata without contacting me. This works at both the project and component level:

And to add some fancy things, there is a new badge showing the status of translations into all languages. This is how it looks for Weblate itself:

As you can see, it can get pretty big for projects with many translations, but you get a complete picture of the translation status in it.

You can find all these features in the upcoming Weblate 2.9, which should be released next week. The complete list of changes in Weblate 2.9 is described in our documentation.

Filed under: Debian English phpMyAdmin SUSE Weblate

Jaldhar Vyas: Aaargh gcc 5.x You Suck

Planet Debian - Tue, 25/10/2016 - 8:45am

Aaargh gcc 5.x You Suck

I had to write a quick program today which is going to be run many thousands of times a day, so it has to run fast. I decided to do it in c++ instead of the usual perl or javascript because it seemed appropriate, and I've been playing around a lot with c++ lately, trying to update my knowledge of its modern features. So 200 LOC later I was almost done, and I ran the program through valgrind, a good habit I've been trying to instill. That's when I got a reminder of why I avoid c++.

==37698== HEAP SUMMARY:
==37698==     in use at exit: 72,704 bytes in 1 blocks
==37698==   total heap usage: 5 allocs, 4 frees, 84,655 bytes allocated
==37698==
==37698== LEAK SUMMARY:
==37698==    definitely lost: 0 bytes in 0 blocks
==37698==    indirectly lost: 0 bytes in 0 blocks
==37698==      possibly lost: 0 bytes in 0 blocks
==37698==    still reachable: 72,704 bytes in 1 blocks
==37698==         suppressed: 0 bytes in 0 blocks

One of the things I've learnt, which I've been trying to apply more rigorously, is to avoid manual memory management (new/delete) as much as possible in favor of modern c++ features such as std::unique_ptr etc. By my estimation there should only be three places in my code where memory is allocated, and none of them should leak. Where do the others come from? And why is there a missing free (or delete)? Now the good news is that valgrind is saying that the memory is not technically leaking. It is still reachable at exit, but that's ok because the OS will reclaim it. But this program will run a lot and I think it could still lead to problems over time, such as memory fragmentation, so I wanted to understand what was going on. Not to mention the bad aesthetics of it.

My first assumption (one which has served me well over the years) was that I had screwed up somewhere. Or perhaps it could be some behind-the-scenes compiler magic. It turned out to be the latter -- sort of -- as I found out only after two hours of jiggling code in different ways and googling for clues. That's when I found this Stack Overflow question which suggests that it is either a valgrind or compiler bug. The answer specifically mentions gcc 5.1. I was using Ubuntu LTS which has gcc 5.4, so I have just gone ahead and assumed all 5.x versions of gcc have this problem. Sure enough, compiling the same program on Debian stable, which has gcc 4.9, gave this...

==6045==
==6045== HEAP SUMMARY:
==6045==     in use at exit: 0 bytes in 0 blocks
==6045==   total heap usage: 3 allocs, 3 frees, 10,967 bytes allocated
==6045==
==6045== All heap blocks were freed -- no leaks are possible
==6045==

...Much better. The executable was substantially smaller too. The time was not a total loss however. I learned that valgrind is pronounced val-grinned (it's from Norse mythology), not val-grind as I had thought. So I have that going for me, which is nice.

Gunnar Wolf: On the results of vote "gr_private2"

Planet Debian - Tue, 25/10/2016 - 3:46am

Given that I started the GR process, and that I called for discussion and votes, I somehow feel it is my duty to also put a simple wrap-up to this process. Of course, I'll say many things already well-known to my fellow Debian people, but non-Debianers read this too.

So, for further context, if you need to, please read my previous blog post, where I was about to send a call for votes. It summarizes the situation and proposals; you will find we had a nice set of messages in debian-vote@lists.debian.org during September; I have to thank all the involved parties, much specially to Ian Jackson, who spent a lot of energy summing up the situation and clarifying the different bits to everyone involved.

So, we held the vote; you may be interested in looking at the detailed vote statistics for the 235 correctly received votes and, most importantly, the results:

First of all, I'll say I'm actually surprised at the results, as I expected Ian's proposal (acknowledge difficulty; I actually voted this proposal as my top option) to win and mine (repeal previous GR) to be last; turns out, the winner option was Iain's (remain private). But all in all, I am happy with the results: As I said during the discussion, I was much disappointed with the results to the previous GR on this topic — And, yes, it seems the breaking point was when many people thought the privacy status of posted messages was in jeopardy; we cannot really compare what I would have liked to have in said vote if we had followed the strategy of leaving the original resolution text instead of replacing it, but I believe it would have passed. In fact, one more surprise of this iteration was that I expected Further Discussion to be ranked higher, somewhere between the three explicit options. I am happy, of course, we got such an overwhelming clarity of what does the project as a whole prefer.

And what was gained or lost with this whole exercise? Well, if nothing else, we gained that we stop lying. For over ten years, we have had an accepted resolution binding us to release the messages sent to debian-private given such-and-such conditions... but we never got around to implementing it. We now know that debian-private will remain private... but we should keep reminding ourselves to use the list as little as possible.

For a project such as Debian, which is often seen as a beacon of doing the right thing no matter what, I feel being explicit about not lying to ourselves is of great importance. Yes, we have the principle of not hiding our problems, but it has long been argued that the use of this list is not hiding our problems. Private communication can happen whenever you have humans involved, even if administratively we tried to avoid it.

Any of the three running options could have won, and I'd be happy. My #1 didn't win, but my #2 did. And, I am sure, it's for the best of the project as a whole.

Chris Lamb: Concorde

Planet Debian - Mon, 24/10/2016 - 8:59pm

Today marks the 13th anniversary since the last passenger flight from New York arrived in the UK. Every seat was filled, a feat that had become increasingly rare for a plane that was a technological marvel but a commercial flop….


  • Only 20 aircraft were ever built despite 100 orders, most of them cancelled in the early 1970s.
  • Taxiing to the runway consumed 2 tons of fuel.
  • The white colour scheme was specified to reduce the outer temperature by about 10°C.
  • In a promotional deal with Pepsi, F-BTSD was temporarily painted blue. Due to the change of colour, Air France were advised to remain at Mach 2 for no more than 20 minutes at a time.
  • At supersonic speed the fuselage would heat up and expand by as much as 30cm. The most obvious manifestation of this was a gap that opened up on the flight deck between the flight engineer's console and the bulkhead. On some aircraft conducting a retiring supersonic flight, the flight engineers placed their caps in this expanded gap, permanently wedging the cap as it shrank again.
  • At Concorde's altitude a breach of cabin integrity would result in a loss of pressure so severe that passengers would quickly suffer from hypoxia despite application of emergency oxygen. Concorde was thus built with smaller windows to reduce the rate of loss in such a breach.
  • The high cruising altitude meant passengers received almost twice the amount of radiation as on a conventional long-haul flight. To prevent excessive exposure, the flight deck included a radiometer; if the radiation level became too high, pilots would descend below 45,000 feet.
  • BA's service had a greater number of passengers who booked a flight and then failed to appear than any other aircraft in their fleet.
  • Market research later in Concorde's life revealed that customers thought Concorde was more expensive than it actually was. Ticket prices were progressively raised to match these perceptions.
  • The fastest transatlantic airliner flight was from New York JFK to London Heathrow on 7 February 1996 by British Airways' G-BOAD in 2 hours, 52 minutes, 59 seconds from takeoff to touchdown. It was aided by a 175 mph tailwind.


See also: A Rocket to Nowhere.

Reproducible builds folks: Reproducible Builds: week 78 in Stretch cycle

Planet Debian - Mon, 24/10/2016 - 6:10pm

What happened in the Reproducible Builds effort between Sunday October 16 and Saturday October 22 2016:

Media coverage

Upcoming events

buildinfo.debian.net

In order to build packages reproducibly, you not only need identical sources but also some external definition of the environment used for a particular build. This definition includes the inputs and the outputs and, in the Debian case, are available in a $package_$architecture_$version.buildinfo file.

We anticipate that the next dpkg upload to sid will create .buildinfo files by default. Whilst it's clear that we also need to teach dak to deal with them (#763822), it's not actually clear how to handle .buildinfo files after dak has processed them, and how to make them available to the world.

To this end, Chris Lamb has started development on a proof-of-concept .buildinfo server to see what issues arise. Source

Reproducible work in other projects
  • Ximin Luo submitted a patch to GCC as a prerequisite for future patches to make debugging symbols reproducible.
Packages reviewed and fixed, and bugs filed

Reviews of unreproducible packages

99 package reviews have been added, 3 have been updated and 6 have been removed in this week, adding to our knowledge about identified issues.

6 issue types have been added:

Weekly QA work

During reproducibility testing, some FTBFS bugs have been detected and reported by:

  • Chris Lamb (23)
  • Daniel Reichelt (2)
  • Lucas Nussbaum (1)
  • Santiago Vila (18)
diffoscope development

tests.reproducible-builds.org
  • h01ger increased the diskspace for reproducible content on Jenkins. Thanks to ProfitBricks.
  • Valerie Young supplied a patch to make the Python SQL interface more SQLite/PostgreSQL agnostic.
  • lynxis worked hard to make LEDE and OpenWrt builds happen on two hosts.
Misc.

Our poll to find a good time for an IRC meeting is still running until Tuesday, October 25th; please reply as soon as possible.

We need a logo! Some ideas and requirements for a Reproducible Builds logo have been documented in the wiki. Contributions very welcome, even if simply by forwarding this information.

This week's edition was written by Chris Lamb & Holger Levsen and reviewed by a bunch of Reproducible Builds folks on IRC.

Dirk Eddelbuettel: World Marathon Majors: Five Star Finisher!

Planet Debian - Mon, 24/10/2016 - 4:41am

A little over eight years ago, I wrote a short blog post which somewhat dryly noted that I had completed the five marathons constituting the World Marathon Majors. I had completed Boston, Chicago and New York during 2007, adding London and then Berlin (with a personal best) in 2008. The World Marathon Majors existed then, but I was not aware of a website. The organisation was aiming to raise the profile of the professional and very high-end aspect of the sport. But marathoning is funny as they let somewhat regular folks like you and me into the same race. And I always wondered if someone kept track of regular folks completing the suite...

I have been running a little less the last few years, though I did get around to completing the Illinois Marathon earlier this year (I only tweeted about it and still have not added anything to the running section of my blog). But two weeks ago, I was once again handing out water cups at the Chicago Marathon, sending along two tweets when the elite wheelchair and elite male runners flew by. To the first, the World Marathon Majors account replied, which led me to their website. Which in turn led me to the Five Star Finisher page, and the newer / larger Six Star Finisher page now that Tokyo has been added.

And in short, one can now request one's record to be added (if they check out). So I did. And now I am on the Five Star Finisher page!

I don't think I'll ever surpass that as a runner. The table header and my row look like this:

If only my fifth / sixth grade physical education teacher could see that---he was one of those early running nuts from the 1970s and made us run towards / around this (by now enlarged) pond and boy did I hate that :) Guess it did have some long lasting effects. And I casually circled the lake a few years ago, starting much further away from my parents place. Once you are in the groove for distance...

But leaving that aside, running has been fun, and with some luck I may have another one or two marathons or Ragnar Relays left. The only really bad part about this is that I may have to get myself to Tokyo after all (for something that is not an ISM workshop) ...

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Daniel Silverstone: Gitano - Approaching Release - Deprecated commands

Planet Debian - Mon, 24/10/2016 - 4:24am

As mentioned previously I am working toward getting Gitano into Stretch. Last time we spoke about lace, on which a colleague and friend of mine (Richard Maw) did a large pile of work. This time I'm going to discuss deprecation approaches and building more capability out of fewer features.

First, a little background -- Gitano is written in Lua which is a deliberately small language whose authors spend more time thinking about what they can remove from the language spec than they do what they could add in. I first came to Lua in the 3.2 days, a little before 4.0 came out. (The authors provide a lovely timeline in case you're interested.) With each of the releases of Lua which came after 3.2, I was struck with how the authors looked to take a number of features which the language had, and collapse them into more generic, more powerful, smaller, fewer features.

This approach to design stuck with me over the subsequent decade, and when I began Gitano I tried to have the smallest number of core features/behaviours, from which could grow the power and complexity I desired. Gitano is, at its core, a set of files in a single format (clod) stored in a consistent manner (Git) which mediate access to a resource (Git repositories). Some of those files result in emergent properties such as the concept of the 'owner' of a repository (though that can simply be considered the value of the project.owner property for the repository). Indeed the concept of the owner of a repository is a fiction generated by the ACL system with a very small amount of collusion from the core of Gitano. Yet until recently Gitano had a first class command set-owner which would alter that one configuration value.

[gitano] set-description ---- Set the repo's short description (Takes a repo)
[gitano] set-head ---- Set the repo's HEAD symbolic reference (Takes a repo)
[gitano] set-owner ---- Sets the owner of a repository (Takes a repo)

Those of you with Gitano installations may see the above if you ask it for help. Yet you'll also likely see:

[gitano] config ---- View and change configuration for a repository (Takes a repo)

The config command gives you access to the repository configuration file (which, yes, you could access over git instead, but the config command can be delegated in a more fine-grained fashion without having to write hooks). Given the config command has all the functionality of the three specific set-* commands shown above, it was time to remove the specific commands.

Migrating

If you had automation which used the set-description, set-head, or set-owner commands then you will want to switch to the config command before you migrate your server to the current or any future version of Gitano.

In brief, where you had:

ssh git@gitserver set-FOO repo something

You now need:

ssh git@gitserver config repo set project.FOO something

It looks a little more wordy but it is consistent with the other features that are keyed from the project configuration, such as:

ssh git@gitserver config repo set cgitrc.section Fooble Section Name

And, of course, you can see what configuration is present with:

ssh git@gitserver config repo show

Or look at a specific value with:

ssh git@gitserver config repo show specific.key

As always, you can get more detailed (if somewhat cryptic) help with:

ssh git@gitserver help config

Next time I'll try and touch on the new PGP/GPG integration support.

Vincent Sanders: Rabbit of Caerbannog

Planet Debian - Sun, 23/10/2016 - 11:27pm
Subsequent to my previous use of American Fuzzy Lop (AFL) on the NetSurf bitmap image library, I applied it to the gif library which, after fixing the test runner, failed to produce any crashes but did result in a better test corpus, improving coverage to above 90%.

I then turned my attention to the SVG processing library. This was different to the bitmap libraries in that it required parsing a much lower density text format and performing operations on the resulting tree representation.

The test program for the SVG library needed some improvement but is very basic in operation. It takes the test SVG, parses it using libsvgtiny and then uses the parsed output to write out an imagemagick mvg file.

The libsvg processing uses the NetSurf DOM library which in turn uses an expat binding to parse the SVG XML text. To process this with AFL required instrumenting not only the SVG library but also the DOM library. I did not initially understand this and my first run resulted in a "map coverage" indicating an issue. Helpfully the AFL docs do cover this so it was straightforward to rectify.

Once the test program was written and environment set up an AFL run was started and left to run. The next day I was somewhat alarmed to discover the fuzzer had made almost no progress and was running very slowly. I asked for help on the AFL mailing list and got a polite and helpful response, basically I needed to RTFM.

I must thank the members of the AFL mailing list for being so helpful and tolerating someone who ought to know better asking  dumb questions.

After reading the fine manual I understood I needed to ensure all my test cases were as small as possible and further that the fuzzer needed a dictionary as a hint to the file format because the text file was of such low data density compared to binary formats.

I crafted an SVG dictionary based on the XML one, ensured all the seed SVG files were as small as possible and tried again. The immediate result was thousands of crashes, nothing like being savaged by a rabbit to cause a surprise.

Not being in possession of the appropriate holy hand grenade I resorted instead to GDB and electric fence. Unlike the bitmap library crashes, memory bounds issues simply did not feature here. Instead the crashes mainly centered around actual logic errors when constructing and traversing the data structures.

For example Daniel Silverstone fixed an interesting bug where the XML parser binding would try and go "above" the root node in the tree if the source closed more tags than it opened which resulted in wild pointers and NULL references.

I found and squashed several others including dealing with SVG which has no valid root element and division by zero errors when things like colour gradients have no points.

I find it interesting that the type and texture of the crashes completely changed between the SVG and binary formats. Perhaps it is just the nature of the textural formats that causes this although it might be due to the techniques used to parse the formats.

Once all the immediately reproducible crashes were dealt with I performed a longer run. I used my monster system as previously described and ran the fuzzer for a whole week.

Summary stats
=============

Fuzzers alive : 10
Total run time : 68 days, 7 hours
Total execs : 9268 million
Cumulative speed : 15698 execs/sec
Pending paths : 0 faves, 2501 total
Pending per fuzzer : 0 faves, 250 total (on average)
Crashes found : 9 locally unique
After burning almost seventy days of processor time, AFL found me another nine crashes and, possibly more importantly, a test corpus that generates over 90% coverage.

A useful tool that AFL provides is afl-cmin. This reduces the number of test files in a corpus to only those required to exercise all the code paths reached by the test set. In this case it reduced the number of files from 8242 to 2612.

afl-cmin -i queue_all/ -o queue_cmin -- test_decode_svg @@ 1.0 /dev/null
corpus minimization tool for afl-fuzz by <lcamtuf@google.com>

[+] OK, 1447 tuples recorded.
[*] Obtaining traces for input files in 'queue_all/'...
Processing file 8242/8242...
[*] Sorting trace sets (this may take a while)...
[+] Found 23812 unique tuples across 8242 files.
[*] Finding best candidates for each tuple...
Processing file 8242/8242...
[*] Sorting candidate list (be patient)...
[*] Processing candidates and writing output files...
Processing tuple 23812/23812...
[+] Narrowed down to 2612 files, saved in 'queue_cmin'.
Additionally, the actual information within the test files can be minimised with the afl-tmin tool. This must be run on each file individually and can take a relatively long time. Fortunately, with GNU parallel one can run many of these jobs simultaneously, which merely required another three days of CPU time. The resulting test corpus weighs in at a svelte 15 Megabytes or so, against the 25 Megabytes before minimisation.
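A sketch of that parallel minimisation run, mirroring the afl-cmin invocation above (the directory names are illustrative):

# minimise each corpus file in parallel, one afl-tmin job per file
mkdir -p queue_tmin
ls queue_cmin/ | parallel afl-tmin -i queue_cmin/{} -o queue_tmin/{} -- test_decode_svg @@ 1.0 /dev/null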

The result is yet another NetSurf library significantly improved by the use of AFL, both from finding and squashing crashing bugs and from gaining a greatly improved test corpus that allows future library changes to be made with high confidence that there will not be any regressions.

Jaldhar Vyas: What I Did During My Summer Vacation

Planet Debian - Dje, 23/10/2016 - 7:01pd
That's So Raven

If I could sum up the past year in one word, that word would be distraction. There have been so many strange, confusing or simply unforeseen things going on that I have had trouble focusing like never before.

For instance, on the opposite side of the street from me is one of Jersey City's old reservoirs. It's not used for drinking water anymore and the city eventually plans on merging it into the park on the other side. In the meantime it has become something of a wildlife refuge. Which is nice, except one of the newly settled critters was a bird of prey -- the consensus is possibly some kind of hawk or raven. Starting your morning commute under the eyes of a harbinger of death is very goth, and I even learned to deal with the occasional piece of deconstructed rodent on my doorstep, but nighttime was a big problem. For contrary to popular belief, ravens do not quoth "nevermore" but "KRRAAAA". Very loudly. Just as soon as you have drifted off to sleep. Eventually my sleep-deprived neighbors and I appealed to the NJ Division of Environmental Protection to get it removed, but by the time they were ready to swing into action the bird had left for somewhere more congenial, like Transylvania or Newark.

Or here are some more complete wastes of time: I go to the doctor for my annual physical; the insurance company codes it as Adult Onset Diabetes by accident. One day I open the lid of my laptop; there's a "ping" sound and a piece of the hinge flies off. Apparently that also severed the connection to the screen, and naturally the warranty had just expired, so I had to spend the next month tethered to an external monitor until I could afford to buy a new one. Mix in all the usual social, political, family and work drama and you can see that it has been a very trying time for me.

Dovecot

I have managed to get some Debian work done. On Dovecot, my principal package, I have gotten tremendous support from Apollon Oikonomopoulos, whom I belatedly welcome as a member of the Dovecot maintainer team. He has been particularly helpful in fixing our systemd support and cleaning out a lot of the old and invalid bugs. We're in pretty good shape for the freeze. Upstream has released an RC of 2.2.26 and hopefully the final version will be out in the next couple of days so we can include it in Stretch. We can always use more help with the package, so let me know if you're interested.

Debian-IN

Most of the action has been going on without me but I've been lending support and sponsoring whenever I can. We have several new DDs and DMs but still no one north of the Vindhyas I'm afraid.

Debian Perl Group

gregoa did a ping of inactive maintainers and I regretfully had to admit to myself that I wasn't going to be of use anytime soon so I resigned. Perl remains my favorite language and I've actually been more involved in the meetings of my local Perlmongers group so hopefully I will be back again one day. And I still maintain the Perl modules I wrote myself.

Debian-Axe-Murderers*

May have gained a recruit.

*Strictly speaking it should be called Debian-People-Who-Dont-Think-Faults-in-One-Moral-Domain-Such-As-For-Example-Axe-Murdering-Should-Leak-Into-Another-Moral-Domain-Such-As-For-Example-Debian but come on, that's just silly.

Ingo Juergensmann: Automatically update TLSA records on new Letsencrypt Certs

Planet Debian - Dje, 23/10/2016 - 12:29pd

I've been using DNSSEC for quite some time now and it is working quite well. When LetsEncrypt went public beta I jumped on the train and migrated many services to LE-based TLS. However, there was still one small problem with LE certs:

When there is a new cert, all of the old TLSA resource records are no longer valid and can cause problems for clients doing strict DNSSEC checking. It took a while until my pain was big enough to finally fix it with some scripts.
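For context, the "3 1 1" TLSA records used below pin the SHA-256 digest of the certificate's public key (usage 3 = DANE-EE, selector 1 = SubjectPublicKeyInfo, matching type 1 = SHA-256), so a renewal that changes the key invalidates the published record. A sketch of how such a hash can be computed with plain openssl (the host name is made up):

# the zone entry being kept up to date has this shape:
# _443._tcp.www.example.org. IN TLSA 3 1 1 <sha256-hash>
openssl x509 -in fullchain.pem -pubkey -noout \
    | openssl pkey -pubin -outform DER \
    | openssl dgst -sha256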

There are at least two scripts involved:

1) dnssec.sh
This script does all of my DNSSEC handling. You can just do a "dnssec.sh enable-dnssec domain.tld" and everything is configured so that you only need to copy the appropriate keys into the web interface of your DNS registrar.

host:~/bin# dnssec.sh
No parameter given.
Usage: dnsec.sh MODE DOMAIN

MODE can be one of the following:
enable-dnssec : perform all steps to enable DNSSEC for your domain
edit-zone     : safely edit your zone after enabling DNSSEC
create-dnskey : create new dnskey only
load-dnskey   : loads new dnskeys and signs the zone with them
show-ds       : shows DS records of zone
zoneadd-ds    : adds DS records to the zone file
show-dnskey   : extract DNSKEY record that needs to uploaded to your registrar
update-tlsa   : update TLSA records with new TLSA hash, needs old and new TLSA hashes as additional parameters

For updating zone files just do a "dnssec.sh edit-zone domain.tld" to add new records and such, and the script will take care of, e.g., increasing the serial of the zone file. I find this very convenient, so I often use this script for non-DNSSEC-enabled domains as well.

You may have spotted the command line option "update-tlsa". Besides the domain.tld parameter, this option needs the old and the new TLSA hashes. It is normally invoked from the second script:

2) check_tlsa.sh
This is a quite simple Bash script that parses the domains.txt from the letsencrypt.sh script, looks up the old TLSA hash in the zone files (structured in TLD/domain.tld directories), compares the old with the new hash (obtained by invoking tlsagen.sh) and, if the hashes differ, calls dnssec.sh with the proper parameters:

#!/bin/bash
set -e
LEPATH="/etc/letsencrypt.sh"
for i in `cat /etc/letsencrypt.sh/domains.txt | awk '{print $1}'` ; do
        domain=`echo $i | awk 'BEGIN {FS="."} ; {print $(NF-1)"."$NF}'`
        #echo -n "Domain: $domain"
        TLD=`echo $i | awk 'BEGIN {FS="."}; {print $NF}'`
        #echo ", TLD: $TLD"
        OLDTLSA=`grep -i "in.*tlsa" /etc/bind/${TLD}/${domain} | grep ${i} | head -n 1 | awk '{print $NF}'`
        if [ -n "${OLDTLSA}" ] ; then
                #echo "--> ${OLDTLSA}"
                # Usage: tlsagen.sh cert.pem host[:port] usage selector mtype
                NEWTLSA=`/path/to/tlsagen.sh $LEPATH/certs/${i}/fullchain.pem ${i} 3 1 1 | awk '{print $NF}'`
                #echo "==> $NEWTLSA"
                if [ "${OLDTLSA}" != "${NEWTLSA}" ] ; then
                        /path/to/dnssec.sh update-tlsa ${domain} ${OLDTLSA} ${NEWTLSA} > /dev/null
                        echo "TLSA RR update for ${i}"
                fi
        fi
done

So, quite simple and obviously a quick hack. For sure someone else can write a cleaner and more sophisticated implementation to do the same stuff, but at least it works for me™. Use it at your own risk and do whatever you want with these scripts (licensed as public domain).

You can invoke check_tlsa.sh right after your crontab call for letsencrypt.sh. More elegantly, it should be fairly easy to invoke these scripts from letsencrypt.sh post hooks as well.
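A sketch of such a crontab entry (the times and paths are made up; I assume letsencrypt.sh's -c cron mode here):

# renew certificates every Monday night, then refresh any changed TLSA records
30 4 * * 1 /etc/letsencrypt.sh/letsencrypt.sh -c && /path/to/check_tlsa.sh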
Please find the files attached to this page (remove the .txt extension after saving, of course).

Attachments: check_tlsa.sh.txt (812 bytes), dnssec.sh.txt (3.88 KB)

Matthieu Caneill: Debugging 101

Planet Debian - Dje, 23/10/2016 - 12:00pd

While teaching a class on concurrent programming this semester, I realized during the labs that most of the students couldn't properly debug their code. They are at the end of a 2-year curriculum and know many different programming languages and frameworks, but when it comes to tracking down a bug in their own code, they often lack the basics. Instead of debugging for them I tried to give them general directions that they could apply to the next bugs. I will try here to summarize the very first basic things to know about debugging. Because, remember, writing software is 90% debugging, and 10% introducing new bugs (that is not from me, but I could not find the original quote).

So here is my take at Debugging 101.

Use the right tools

Many good tools exist to assist you in writing correct software, and it would put you behind in terms of productivity not to use them. Editors which catch syntax errors as you write, for example, will help you a lot. And there are many features out there in editors, compilers, and debuggers which will prevent you from introducing trivial bugs. Your editor should be your friend; explore its features and customization options, and find an efficient workflow with them that you like and can improve over time. The best way to fix bugs is not to have them in the first place, obviously.

Test early, test often

I've seen students write code for an hour before running make, only for it to fail so hard that hundreds of lines of errors and warnings were output. There are two main reasons doing this is a bad idea:

  • You have to debug all the errors at once, and the complexity of solving many bugs, some dependent on others, is way higher than the complexity of solving a single bug. Moreover, it's discouraging.
  • Wrong assumptions you made at the beginning will make the following lines of code wrong. For example if you chose the wrong data structure for storing some information, you will have to fix all the code using that structure. It's less painful to realize earlier it was the wrong one to choose, and you have more chances of knowing that if you compile and execute often.

I recommend testing your code (compilation and execution) every few lines you write. When something breaks, chances are it will come from the last line(s) you wrote. Compiler errors will be shorter, and will point you to the same place in the code. Once you get more confident using a particular language or framework, you can write more lines at once without testing. That's a slow process, but it's ok. If you set up the right keybinding for compiling and executing from within your editor, it shouldn't be painful to test early and often.
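One low-tech way to make this near-automatic is a rebuild-on-save loop; here is a sketch using inotifywait from the inotify-tools package (the source directory, make target and test runner are placeholders):

# rebuild and re-run the tests every time a source file changes
while inotifywait -qq -e modify -r src/ ; do
    make && ./run_tests
done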

Read the logs

Spot the places where your program/compiler/debugger writes text, and read it carefully. It can be your terminal (quite often), a file in your current directory, a file in /var/log/, a web page on a local server, anything. Learn where different software write logs on your system, and integrate reading them in your workflow. Often, it will be your only information about the bug. Often, it will tell you where the bug lies. Sometimes, it will even give you hints on how to fix it.

You may have to filter out a lot of garbage to find the relevant information about your bug. Learn to spot keywords like error or warning. In long stacktraces, spot the lines concerning your files, because more often than not your own code is to blame rather than deeper library code. grep the logs with relevant keywords. If you have the option, colorize the output. Use tail -f to follow a file getting updated. There are many ways to grasp logs, so find what works best for you and never forget to use it!
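For example, two shell one-liners along those lines (the log path is system-dependent):

# follow a log and highlight problems as they appear
tail -f /var/log/syslog | grep --color -iE "error|warn"
# jump to the first error in a saved build log, with its line number
grep -n -i -m 1 "error" build.log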

Print foobar

That one doesn't concern compilation errors (unless it's a Makefile error, in which case the Makefile is your code anyway).

When the program logs and output fail to tell you where an error occurred (oh hi Segmentation fault!), and before having to dive into a memory debugger or system trace tool, spot the portion of your program that causes the bug and add some print statements there. You can either print("foo") and print("bar"), just to know whether or not your program reaches a certain place in your code, or print(some_faulty_var) to get more insight into your program's state. It will give you precious information.

std::cerr << "foo" << std::endl;
my_db.connect(); // is this broken?
std::cerr << "bar" << std::endl;

In the example above, you can be sure it is the connection to the database my_db that is broken if you get foo and not bar on your standard error.

(That is a hypothetical example. If you know something can break, such as a database connection, then you should always enclose it in a try/catch structure.)

Isolate and reproduce the bug

This point is linked to the previous one. You may or may not have isolated the line(s) causing the bug, but maybe the issue is not always raised. It can depend on many other things: the program or function parameters, the network status, the amount of memory available, the decisions of the OS scheduler, the user rights on the system or on some files, etc. More generally, any assumption you made about any external dependency can turn out to be wrong (even if it's right 99% of the time). According to the context, try to isolate the set of conditions that trigger the bug. It can be as simple as "when there is no internet connection", or as complicated as "when the CPU load of some external machine is too high, it's a leap year, and the input contains illegal utf-8 characters" (ok, that one is fucked up; but it surely happens!). But you need to be able to reproduce the bug reliably, in order to be sure later that you indeed fixed it.

Of course when the bug is triggered at every run, it can be frustrating that your program never works but it will in general be easier to fix.
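Once you have a candidate set of conditions, it pays to encode them in a tiny script so the bug can be triggered on demand; a sketch with hypothetical program and file names throughout:

#!/bin/sh
# repro.sh -- trigger the bug reliably under pinned-down conditions
set -e
export LANG=C        # rule out locale-dependent behaviour
ulimit -v 262144     # reproduce the low-memory condition
exec ./myprog --input crash-case.txt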

RTFM

Always read the documentation before reaching out for help. Be it man, a book, a website or a wiki, you will find precious information there to assist you in using a language or a specific library. It can be quite intimidating at first, but it's often organized the same way. You're likely to find a search tool, an API reference, a tutorial, and many examples. Compare your code against them. Check the FAQ; maybe your bug and its solution are already referenced there.

You'll rapidly find yourself getting used to the way documentation is organized, and you'll be more and more efficient at finding instantly what you need. Always keep the doc window open!

Google and Stack Overflow are your friends

Let's be honest: many of the bugs you'll encounter have been encountered before. Learn to write efficient queries on search engines, and use the knowledge you can find on question-and-answer forums like Stack Overflow. Read the answers and comments. Be wise though, and never blindly copy and paste code from there. It can be as bad as introducing malicious security issues into your code, and you won't learn anything. Oh, and don't copy and paste anyway: you have to be sure you understand every single line, so better to write them by hand; it's also better for memorizing the issue.

Take notes

Once you have identified and solved a particular bug, I advise writing about it. No need for shiny interfaces: keep a list of your bugs along with their solutions in one or many text files, organized by language or framework, that you can easily grep.

It can seem slightly cumbersome to do so, but it proved (at least to me) to be very valuable. I can often recall I have encountered some buggy situation in the past, but don't always remember the solution. Instead of losing all the debugging time again, I search in my bug/solution list first, and when it's a hit I'm more than happy I kept it.
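Retrieval then stays trivial; for instance, with one hypothetical directory per language:

grep -ri "segmentation fault" ~/notes/bugs/c/
grep -ri "deadlock" ~/notes/bugs/java/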

Further reading on debugging

Remember, this was only Debugging 101: the very first steps on how to debug code on your own, instead of getting frustrated and helplessly staring at your screen without knowing where to begin. As you write more software, you'll get used to more efficient workflows, and you'll discover tools that are there to assist you in writing bug-free code and spotting complex bugs efficiently. Listed below are some of the tools and general ideas used to debug more complex software. They belong more in a software engineering course than in a Debugging 101 blog post. But it's good to know as early as possible that these exist, and if you read the manuals there's no reason you can't rock with them!

  • Loggers. To make the "foobar" debugging more efficient, some libraries are especially designed for the task of logging out information about a running program. They often have way more features than a simple print statement (at the price of being over-engineered for simple programs): severity levels (info, warning, error, fatal, etc), output in rotating files, and many more.

  • Version control. Following the evolution of a program over time, across multiple versions, contributors and forks, is a hard task. That's where version control comes into play: it allows you to keep the entire history of your program, and switch to any previous version. This way you can identify more easily when a bug was introduced (and by whom), along with the patch (a set of changes to a code base) that introduced it. Then you know where to apply your fix; see the git bisect sketch after this list. Famous version control tools include Git, Subversion, and Mercurial.

  • Debuggers. Last but not least, it wouldn't make sense to talk about debugging without mentioning debuggers. They are tools to inspect the state of a program (for example the type and value of variables) while it is running. You can pause the program, and execute it line by line, while watching the state evolve. Sometimes you can also manually change the value of variables to see what happens. Even though some of them are hard to use, they are very valuable tools, totally worth diving into!
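As a concrete illustration of the version control point above, git can even automate the hunt for the offending commit; a sketch assuming a test script that exits non-zero exactly when the bug is present:

git bisect start
git bisect bad HEAD        # the current version exhibits the bug
git bisect good v1.0       # this tag was known to be fine
git bisect run ./test.sh   # binary-search the history automatically
git bisect reset           # return to where you started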

Don't hesitate to comment on this, and provide your debugging 101 tips! I'll be happy to update the article with valuable feedback.

Happy debugging!

Iain R. Learmonth: The Domain Name System

Planet Debian - Sht, 22/10/2016 - 8:15md

As I posted yesterday, we released PATHspider 1.0.0. What I didn’t talk about in that post was an event that occurred only a few hours before the release.

Everything was going fine, proofreading of the documentation was in progress, a quick git push with the documentation updates and… CI FAILED!?! Our CI doesn’t build the documentation, only tests the core code. I’m planning to release real soon and something has broken.

Starting to panic.

irl@orbiter# ./tests.sh
................
----------------------------------------------------------------------
Ran 16 tests in 0.984s

OK

This makes no sense. Maybe I forgot to add a dependency and it’s been broken for a while? I scrutinise the dependencies list and it all looks fine.

In fairness, probably the first thing I should have done is look at the build log in Jenkins, but I’ve never had a failure that I couldn’t reproduce locally before.

It was at this point that I realised there was something screwy going on. A sigh of relief as I realise that there’s not a catastrophic test failure but now it looks like maybe there’s a problem with the University research group network, which is arguably worse.

Being focussed on getting the release ready, I didn’t realise that the Internet was falling apart. Unknown to me, a massive DDoS attack against Dyn, a major DNS host, was in progress. After a few attempts to debug the problem, I hardcoded a line into /etc/hosts, still believing it to be a localised issue.

192.30.253.112 github.com

I’ve just removed this line as the problem seems to have resolved itself for now. There are two main points I’ve taken away from this:

  • CI failure doesn’t necessarily mean that your code is broken, it can also indicate that your CI infrastructure is broken.
  • Decentralised internetwork routing is pretty worthless when the centralised name system goes down.

This afternoon I read a post by [tj] on the 57North Planet, and this is where I learnt what had really happened. He mentions multicast DNS and Namecoin as distributed name system alternatives. I’d like to add some more to that list:

Only the first of these is really a distributed solution.

My idea with ICMP Domain Name Messages is that you send an ICMP message towards a webserver. Somewhere along the path, you’ll hit either a surveillance or censorship middlebox. These middleboxes can provide value by caching any DNS replies they see, so that an ICMP DNS request message is not forwarded further but instead answered directly from the cache. If the middlebox cannot generate a reply, it can still forward the request to other surveillance and censorship boxes.

I think this would be a great secondary use for the NSA and GCHQ boxen on the Internet; it clearly fits within the scope of “defending national security” (if DNS is down, the Internet is kinda dead), plus it’d make it nice and easy to find the boxes with PATHspider.

Dirk Eddelbuettel: RcppArmadillo 0.7.500.0.0

Planet Debian - Sht, 22/10/2016 - 5:43md

A few days ago, Conrad released Armadillo 7.500.0. The corresponding RcppArmadillo release 0.7.500.0.0 is now on CRAN (and will get into Debian shortly).

Armadillo is a powerful and expressive C++ template library for linear algebra, aiming towards a good balance between speed and ease of use, with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language--and is widely used by (currently) 274 other packages on CRAN.

Changes in this release relative to the previous CRAN release are as follows:

Changes in RcppArmadillo version 0.7.500.0.0 (2016-10-20)
  • Upgraded to Armadillo release 7.500.0 (Coup d'Etat)

    • Expanded qz() to optionally specify ordering of the Schur form

    • Expanded each_slice() to support matrix multiplication

Courtesy of CRANberries, there is a diffstat report. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Christoph Egger: Running Debian on the ClearFog

Planet Debian - Sht, 22/10/2016 - 12:37md

Back in August, I was looking for a Homeserver replacement. During FrOSCon I was then reminded of the Turris Omnia project by NIC.cz. The basic SoC (Marvell Armada 38x) seemed to be nice and have decent mainline support (and, with the Turris, users interested in keeping it working). Only I don't want any WIFI and I wasn't sure the standard case would be all that useful. Fortunately, there's also a simple board available with the same SoC called ClearFog, and so I got one of these (the Base version). With shipping and the SSD (the only 2242 M.2 SSD with 250 GiB I could find, an ADATA SP600) it slightly exceeds the budget, but well.

When installing the machine, the obvious goal was to use mainline FOSS components wherever possible. Fortunately there's mainline kernel support for the device as well as mainline U-Boot. First attempts to boot from a micro SD card did not work out at all though, with either mainline U-Boot or the vendor version. It turns out the eMMC version of the board does not support micro SD cards at all, a fact that is documented but that others had failed to notice as well.

U-Boot

As the board does not come with any loader on eMMC, and booting directly from M.2 requires removing some resistors from the board, the easiest way is booting over UART. The vendor wiki has a shell script wrapping an included C fragment to feed U-Boot to the device, but all that is really needed is U-Boot's kwboot utility. For some reason the SPL didn't properly detect UART booting on my device (wrong magic number), but patching the relevant if (in arch-mvebu's spl.c) to always assume UART boot is an easy way around that.
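For reference, a kwboot invocation for this kind of UART boot looks roughly like this (the device node, baud rate and image name will vary with your setup):

kwboot -t -B 115200 -b u-boot-spl.kwb /dev/ttyUSB0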

The plan then was to boot a Debian armhf rootfs with a defconfig kernel from a USB stick and install U-Boot and the rootfs to eMMC from within that system. Unfortunately U-Boot seems to be unable to talk to the USB3 port, so no kernel loading from there. One could probably make UART loading work, but switching between screen for the serial console and xmodem seemed somewhat fragile and I never got it working. However, ethernet can be made to work, though you need to set eth1addr to eth3addr (or just the right one of these) in U-Boot, saveenv and reboot. After that TFTP works (but is somewhat slow).

eMMC

There's one last step required to allow U-Boot and Linux to access the eMMC. The eMMC is wired to the same pins as the SD card would be, but the SD card has an additional indicator pin showing whether a card is present. You might be lucky inserting a dummy card into the slot, or go the clean route and remove the pin specification from the device tree.

--- a/arch/arm/dts/armada-388-clearfog.dts
+++ b/arch/arm/dts/armada-388-clearfog.dts
@@ -306,7 +307,6 @@
 		sdhci@d8000 {
 			bus-width = <4>;
-			cd-gpios = <&gpio0 20 GPIO_ACTIVE_LOW>;
 			no-1-8-v;
 			pinctrl-0 = <&clearfog_sdhci_pins &clearfog_sdhci_cd_pins>;

Next up is flashing U-Boot to eMMC. This seems to work with the vendor U-Boot but proves to be tricky with mainline. The fun part boils down to the fact that the boot firmware reads the first block from eMMC, but the second from SD card. If you write the mainline U-Boot, which was written and tested for SD card, to eMMC, the SPL will try to load the main U-Boot starting from its second sector of flash -- obviously resulting in garbage. This one took me several tries to figure out and made me read most of the SPL code for the device. The fix, however, is trivial (apart from the question of how to support all the different variants from one codebase, which I'll leave to the U-Boot developers):

--- a/include/configs/clearfog.h
+++ b/include/configs/clearfog.h
@@ -143,8 +143,7 @@
 #define CONFIG_SPL_LIBDISK_SUPPORT
 #define CONFIG_SYS_MMC_U_BOOT_OFFS	(160 << 10)
 #define CONFIG_SYS_U_BOOT_OFFS		CONFIG_SYS_MMC_U_BOOT_OFFS
-#define CONFIG_SYS_MMCSD_RAW_MODE_U_BOOT_SECTOR	((CONFIG_SYS_U_BOOT_OFFS / 512)\
-	+ 1)
+#define CONFIG_SYS_MMCSD_RAW_MODE_U_BOOT_SECTOR	(CONFIG_SYS_U_BOOT_OFFS / 512)
 #define CONFIG_SYS_U_BOOT_MAX_SIZE_SECTORS	((512 << 10) / 512) /* 512KiB */
 #ifdef CONFIG_SPL_BUILD
 #define CONFIG_FIXED_SDHCI_ALIGNED_BUFFER	0x00180000 /* in SDRAM */

Linux

Now we have a system booting from eMMC with mainline U-Boot (a most welcome speedup compared to the UART and TFTP combination from the beginning). Next comes fine-tuning Linux on the device: we want to install the armmp Debian kernel and have it work. As all the drivers are built as modules for that kernel, this also means initrd support. Funnily, U-Boot's bootz allows booting a plain vmlinux kernel, but I couldn't get it to boot a plain initrd. Passing a uImage initrd and a normal kernel, however, works pretty well. Back when I first tried, there were some modules missing and ethernet didn't work with the PHY driver built as a module. In the meantime the PHY problem was fixed in the Debian kernel and almost all modules were added. Ben then only added the USB3 module on my suggestion, and as a result unstable's armhf armmp kernel should work perfectly well on the device (you still need to patch the device tree similarly to the patch above). Still missing is an updated flash-kernel to automatically generate the initrd uImage; that is work in progress but was stalled until I fixed the U-Boot-on-eMMC problem. After that everything should be fine -- and maybe we can get Debian U-Boot builds for that board.
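Until flash-kernel does it automatically, wrapping the initrd into a uImage by hand is a single mkimage call; a sketch (the kernel version here is just an example):

mkimage -A arm -O linux -T ramdisk -C gzip -n initramfs \
    -d /boot/initrd.img-4.7.0-1-armmp /boot/uInitrd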

Pro versus Base

The main difference so far between the Pro and the Base version of the ClearFog is the switch chip, which is included on the Pro. The Base instead "just" has two gigabit ethernet ports and an SFP. Both Linux's and U-Boot's device trees are intended for the Pro version, which makes one of the ethernet ports unusable (the driver tries to find the switch behind the ethernet port, and it isn't there). To get both ports working (or the one you settled on earlier), there's a second patch to the device tree (my version might be sub-optimal but works). Below is the U-Boot version -- the Linux kernel version is a trivial adaptation:

--- a/arch/arm/dts/armada-388-clearfog.dts
+++ b/arch/arm/dts/armada-388-clearfog.dts
@@ -89,13 +89,10 @@
 	internal-regs {
 		ethernet@30000 {
 			mac-address = [00 50 43 02 02 02];
+			managed = "in-band-status";
+			phy = <&phy1>;
 			phy-mode = "sgmii";
 			status = "okay";
-
-			fixed-link {
-				speed = <1000>;
-				full-duplex;
-			};
 		};

 		ethernet@34000 {
@@ -227,6 +224,10 @@
 			pinctrl-0 = <&mdio_pins>;
 			pinctrl-names = "default";

+			phy1: ethernet-phy@1 { /* Marvell 88E1512 */
+				reg = <1>;
+			};
+
 			phy_dedicated: ethernet-phy@0 {
 				/*
 				 * Annoyingly, the marvell phy driver
@@ -386,62 +386,6 @@
 			tx-fault-gpio = <&expander0 13 GPIO_ACTIVE_HIGH>;
 		};

-		dsa@0 {
-			compatible = "marvell,dsa";
-			dsa,ethernet = <&eth1>;
-			dsa,mii-bus = <&mdio>;
-			pinctrl-0 = <&clearfog_dsa0_clk_pins &clearfog_dsa0_pins>;
-			pinctrl-names = "default";
-			#address-cells = <2>;
-			#size-cells = <0>;
-
-			switch@0 {
-				#address-cells = <1>;
-				#size-cells = <0>;
-				reg = <4 0>;
-
-				port@0 {
-					reg = <0>;
-					label = "lan1";
-				};
-
-				port@1 {
-					reg = <1>;
-					label = "lan2";
-				};
-
-				port@2 {
-					reg = <2>;
-					label = "lan3";
-				};
-
-				port@3 {
-					reg = <3>;
-					label = "lan4";
-				};
-
-				port@4 {
-					reg = <4>;
-					label = "lan5";
-				};
-
-				port@5 {
-					reg = <5>;
-					label = "cpu";
-				};
-
-				port@6 {
-					/* 88E1512 external phy */
-					reg = <6>;
-					label = "lan6";
-					fixed-link {
-						speed = <1000>;
-						full-duplex;
-					};
-				};
-			};
-		};
-
 		gpio-keys {
 			compatible = "gpio-keys";
 			pinctrl-0 = <&rear_button_pins>;

Conclusion

Apart from the mess with eMMC this seems to be a pretty nice device. It's now happily running with an M.2 SSD providing enough storage for now, and it still has a mSATA/mPCIe plug left for future journeys. It seems to draw around 5.5 Watts with the SSD and one ethernet port connected while mostly idle, and it can feed around 500 Mb/s from disk over an encrypted ethernet connection, which is, I guess, not too bad. My plans now include helping to finish flash-kernel support, creating a nice case and probably getting it deployed. I might bring it to FOSDEM first though.

Working on it was really quite some fun (apart from the frustrating part of finding the one-block offset...) and people were really helpful. Big thanks here to Debian's ARM folks, to the kernel maintainer Ben Hutchings, and to U-Boot upstream (especially Tom Rini and Stefan Roese).
