Planet Debian
nanotime 0.2.3

Sun, 30/09/2018 - 5:14 PM

A minor maintenance release of the nanotime package for working with nanosecond timestamps just arrived on CRAN.

nanotime uses the RcppCCTZ package for (efficient) high(er) resolution time parsing and formatting up to nanosecond resolution, and the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it now uses a more rigorous S4-based approach thanks to a rewrite by Leonardo Silvestri.

This release disables some tests on the Slowlaris platform which we are asked to conform to (which is a good thing, as a wider variety of test platforms widens test coverage) yet have no real access to (which is a bad thing, obviously) beyond what the helpful rhub service offers. We also updated the Travis setup. No code changes.

Changes in version 0.2.3 (2018-09-30)
  • Skip some tests on Solaris which seems borked with timezones. As we have no real access, no fix is possible (Dirk in #42).

  • Update Travis setup

Once this updates on the next hourly cron iteration, we also have a diff to the previous version thanks to CRANberries. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Dirk Eddelbuettel Thinking inside the box

All i wanted to do is check an error code

Sun, 30/09/2018 - 2:03 PM
I was feeling a little under the weather last week and did not have enough concentration to work on developing a new NetSurf feature as I had planned. Instead I decided to look at a random bug from our worryingly large collection.

This led me to consider the HTML form submission function, at which point it was "can open, worms everywhere". The code in question has a fairly simple job to explain:
  1. A user submits a form (by clicking a button or such) and the Document Object Model (DOM) is used to create a list of information in the web form.
  2. The list is then converted to the appropriate format for sending to the web site server.
  3. An HTTP request is made using the correctly formatted information to the web server.
However the code I was faced with, while generally functional, was impenetrable, having accreted over a long time.

    At this point I was forced into a diversion to fix up the core URL library handling of query strings (this is used when the form data is submitted as part of the requested URL) which was necessary to simplify some complicated string handling and make the implementation more compliant with the specification.

My next step was to add some basic error reporting instead of warning the user the system was out of memory for every failure case, which was making debugging somewhat challenging. I was beginning to think I had discovered a series of very hairy yaks, although at least I was not trying to change a light bulb, which can get very complicated.

At this point I ran into the form_successful_controls_dom() function which performs step one of the process. This function had six hundred lines of code, hundreds of conditional branches, 26 local variables and five levels of indentation in places. These properties combined resulted in a cyclomatic complexity metric of 252. For reference, programmers generally try to keep a single function to no more than a hundred lines of code with as few local variables as possible, resulting in a CCM of around 20.
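For readers unfamiliar with the metric: McCabe's cyclomatic complexity is roughly one plus the number of decision points in a function. A crude illustrative sketch (substring counting on C-like source; real tools such as pmccabe parse the code properly):

```python
def cyclomatic_estimate(source: str) -> int:
    """Very rough McCabe estimate: 1 path through the function, plus one
    per decision point.  Substring counting is only an illustration; a
    real tool works on the parsed syntax tree."""
    decision_tokens = ("if (", "for (", "while (", "case ", "&&", "||")
    return 1 + sum(source.count(tok) for tok in decision_tokens)

simple = "int f(int x) { if (x > 0) { return 1; } return 0; }"
print(cyclomatic_estimate(simple))  # one branch -> 2
```

A function scoring 252 on this scale has hundreds of distinct paths through it, which is why changing it safely without tests is so hard.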

    I now had a choice:

• I could abandon investigating the bug, because even if I could find the issue, changing such a function without adequate testing is likely to introduce several more.
    • I could refactor the function into multiple simpler pieces.
    I slept on this decision and decided to at least try to refactor the code in an attempt to pay back a little of the technical debt in the browser (and maybe let me fix the bug). After several hours of work the refactored source has the desirable properties of:
    • multiple straightforward functions
    • no function much more than a hundred lines long
    • resource lifetime is now obvious and explicit
    • errors are correctly handled and reported

I carefully examined the change in generated code and was pleased to see the compiler output had become more compact. This is an important point that less experienced programmers sometimes miss: if your source code is written such that a compiler can reason about it easily, you often get much better results than with the compact alternative. However, even if the resulting code had been larger the improved source would have been worth it.
    After spending over ten hours working on this bug I have not resolved it yet, indeed one might suggest I have not even directly considered it yet! I wanted to use this to explain a little to users who have to wait a long time for their issues to get resolved (in any project not just NetSurf) just how much effort is sometimes involved in a simple bug.
    Vincent Sanders Vincents Random Waffle

    RcppAPT 0.0.5

Sun, 30/09/2018 - 2:08 AM

    A new version of RcppAPT – our interface from R to the C++ library behind the awesome apt, apt-get, apt-cache, … commands and their cache powering Debian, Ubuntu and the like – is now on CRAN.

This version is a bit of an experiment. I had asked on the r-package-devel and r-devel lists how I could suppress builds on macOS. As macOS does not have the libapt-pkg-dev library required to support apt, builds always failed. CRAN managed not to try on Solaris or Fedora, but somehow macOS would fail. Each. And. Every. Time. Sadly, nobody proposed a working solution.

    So I got tired of this. Now we detect where we build, and if we can infer that it is not a Debian or Ubuntu (or derived system) and no libapt-pkg-dev is found, we no longer fail. Rather, we just set a #define and at compile-time switch to essentially empty code. Et voilà: no more build errors.
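The package does this with a configure-time check and a C preprocessor #define; as a loose analogy only (the module name and helpers below are hypothetical, not RcppAPT's API), the same detect-and-degrade pattern looks like this in Python:

```python
import importlib.util

# Detect the optional backend once, instead of failing the build/install.
# (Loose analogy to RcppAPT's configure-time #define; the "apt" module
# name here is a hypothetical stand-in for libapt-pkg-dev.)
_HAVE_APT = importlib.util.find_spec("apt") is not None

def suitable() -> bool:
    """Report whether the real backend is available on this system."""
    return _HAVE_APT

def has_package(name: str):
    if not _HAVE_APT:
        return None          # stub path: succeed everywhere, do nothing
    import apt               # real path, only taken on suitable systems
    return name in apt.Cache()

print(suitable())
```

This mirrors the package's suitable() function mentioned in the changelog below: callers (and tests) can branch on it instead of discovering the missing backend through a hard failure.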

And as before, if you want to use the package to query the system packaging information, build it on a system using apt and with its libapt-pkg-dev installed.

    A few other cleanups were made too.

Changes in version 0.0.5 (2018-09-29)
    • NAMESPACE now sets symbol registration

    • configure checks for suitable system, no longer errors if none found, but sets good/bad define for the build

    • Existing C++ code is now conditional on having a 'good' build system, or else alternate code is used (which succeeds everywhere)

    • Added suitable() returning a boolean with configure result

    • Tests are conditional on suitable() to test good builds

    • The Travis setup was updated

    • The vignette was updated and expanded

    Courtesy of CRANberries, there is also a diffstat report for this release.

A bit more information about the package is available here as well as at the GitHub repo.

    This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

    Dirk Eddelbuettel Thinking inside the box

    Valutakrambod - A python and bitcoin love story

Sat, 29/09/2018 - 10:20 PM

It would come as no surprise to anyone that I am interested in bitcoins and virtual currencies. I've been keeping an eye on virtual currencies for many years, and it is part of the reason that, a few months ago, I started writing a python library for collecting currency exchange rates and trading on virtual currency exchanges. I decided to name the end result valutakrambod, which perhaps can be translated to small currency shop.

The library uses the tornado python library to handle HTTP and websocket connections, and provides an asynchronous system for connecting to and tracking several services. The code is available from github.

There are two example clients of the library. One is very simple and lists every updated buy/sell price received from the various services. This code is started by running bin/btc-rates, which calls the client code in valutakrambod/ The simple client looks like this:

import functools
import tornado.ioloop
import valutakrambod

class SimpleClient(object):
    def __init__(self):
        self.services = []
        self.streams = []
        pass

    def newdata(self, service, pair, changed):
        print("%-15s %s-%s: %8.3f %8.3f" % (
            service.servicename(), pair[0], pair[1],
            service.rates[pair]['ask'], service.rates[pair]['bid'])
        )

    async def refresh(self, service):
        await service.fetchRates(service.wantedpairs)

    def run(self):
        self.ioloop = tornado.ioloop.IOLoop.current()
        self.services = valutakrambod.service.knownServices()
        for e in self.services:
            service = e()
            service.subscribe(self.newdata)
            stream = service.websocket()
            if stream:
                self.streams.append(stream)
            else:
                # Fetch information from non-streaming services immediately
                self.ioloop.call_later(len(self.services),
                                       functools.partial(self.refresh, service))
                # as well as regularly
                service.periodicUpdate(60)
        for stream in self.streams:
            stream.connect()
        try:
            self.ioloop.start()
        except KeyboardInterrupt:
            print("Interrupted by keyboard, closing all connections.")
            pass
        for stream in self.streams:
            stream.close()

The library client loops over all known "public" services, initialises each one, subscribes to any updates from the service, checks for and activates websocket streaming if the service provides it, and, if no streaming is supported, fetches information from the service and sets up a periodic update every 60 seconds. The output from this client can look like this:

Bl3p            BTC-EUR: 5687.110 5653.690
Bl3p            BTC-EUR: 5687.110 5653.690
Bl3p            BTC-EUR: 5687.110 5653.690
Hitbtc          BTC-USD: 6594.560 6593.690
Hitbtc          BTC-USD: 6594.560 6593.690
Bl3p            BTC-EUR: 5687.110 5653.690
Hitbtc          BTC-USD: 6594.570 6593.690
Bitstamp        EUR-USD:    1.159    1.154
Hitbtc          BTC-USD: 6594.570 6593.690
Hitbtc          BTC-USD: 6594.580 6593.690
Hitbtc          BTC-USD: 6594.580 6593.690
Hitbtc          BTC-USD: 6594.580 6593.690
Bl3p            BTC-EUR: 5687.110 5653.690
Paymium         BTC-EUR: 5680.000 5620.240

    The exchange order book is tracked in addition to the best buy/sell price, for those that need to know the details.

The other example client focuses on providing a curses view with updated buy/sell prices as soon as they are received from the services. This code is located in bin/btc-rates-curses and activated by using the '-c' argument. Without the argument the "curses" output is printed without using curses, which is useful for debugging. The curses view looks like this:

Name           Pair         Bid        Ask   Spr  Ftcd  Age
BitcoinsNorway BTCEUR  5591.8400  5711.0800  2.1%   16  nan     60
Bitfinex       BTCEUR  5671.0000  5671.2000  0.0%   16   22     59
Bitmynt        BTCEUR  5580.8000  5807.5200  3.9%   16   41     60
Bitpay         BTCEUR  5663.2700        nan  nan%   15  nan     60
Bitstamp       BTCEUR  5664.8400  5676.5300  0.2%    0    1      1
Bl3p           BTCEUR  5653.6900  5684.9400  0.5%    0  nan     19
Coinbase       BTCEUR  5600.8200  5714.9000  2.0%   15  nan    nan
Kraken         BTCEUR  5670.1000  5670.2000  0.0%   14   17     60
Paymium        BTCEUR  5620.0600  5680.0000  1.1%    1  7515   nan
BitcoinsNorway BTCNOK 52898.9700 54034.6100  2.1%   16  nan     60
Bitmynt        BTCNOK 52960.3200 54031.1900  2.0%   16   41     60
Bitpay         BTCNOK 53477.7833        nan  nan%   16  nan     60
Coinbase       BTCNOK 52990.3500 54063.0600  2.0%   15  nan    nan
MiraiEx        BTCNOK 52856.5300 54100.6000  2.3%   16  nan    nan
BitcoinsNorway BTCUSD  6495.5300  6631.5400  2.1%   16  nan     60
Bitfinex       BTCUSD  6590.6000  6590.7000  0.0%   16   23     57
Bitpay         BTCUSD  6564.1300        nan  nan%   15  nan     60
Bitstamp       BTCUSD  6561.1400  6565.6200  0.1%    0    2      1
Coinbase       BTCUSD  6504.0600  6635.9700  2.0%   14  nan    117
Gemini         BTCUSD  6567.1300  6573.0700  0.1%   16   89    nan
Hitbtc         BTCUSD  6592.6200  6594.2100  0.0%    0    0      0
Kraken         BTCUSD  6565.2000  6570.9000  0.1%   15   17     58
Exchangerates  EURNOK     9.4665     9.4665  0.0%   16  107789 nan
Norgesbank     EURNOK     9.4665     9.4665  0.0%   16  107789 nan
Bitstamp       EURUSD     1.1537     1.1593  0.5%    4    5      1
Exchangerates  EURUSD     1.1576     1.1576  0.0%   16  107789 nan
BitcoinsNorway LTCEUR     1.0000    49.0000 98.0%   16  nan    nan
BitcoinsNorway LTCNOK   492.4800   503.7500  2.2%   16  nan     60
BitcoinsNorway LTCUSD     1.0221    49.0000 97.9%   15  nan    nan
Norgesbank     USDNOK     8.1777     8.1777  0.0%   16  107789 nan

The code for this client is too complex for a simple blog post, so you will have to check out the git repository to figure out how it works. What I can explain is how the last three numbers on each line should be interpreted. The first is how many seconds ago information was received from the service. The second is how long ago, according to the service, the provided information was updated. The last is an estimate of how often the buy/sell values change.

    If you find this library useful, or would like to improve it, I would love to hear from you. Note that for some of the services I've implemented a trading API. It might be the topic of a future blog post.

    As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

    Petter Reinholdtsen Petter Reinholdtsen - Entries tagged english

    Pulling back

Sat, 29/09/2018 - 6:15 PM

    I've updated my fork of the monkey programming language to allow object-based method calls.

That's allowed me to move some of my "standard-library" code into Monkey, and out of Go, which is neat. This is a simple example:

//
// Reverse a string.
//
function string.reverse() {
  let r = "";
  let l = len(self);
  for( l > 0 ) {
    r += self[l-1];
    l--;
  }
  return r;
}

    Usage is the obvious:

    puts( "Steve".reverse() );


    let s = "Input"; s = s.reverse(); puts( s + "\n" );

Most of the pain here was updating the parser to recognize that "." meant a method-call was happening; once that was done, it was only a matter of passing the implicit self object to the appropriate functions.
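The desugaring itself is the simple part. As a hypothetical sketch (not Monkey's actual Go implementation), a method call can be reduced to an ordinary function call with the receiver passed as an implicit self:

```python
# Hypothetical sketch: once the parser recognises "." as a method call,
# evaluation can desugar obj.method(args) into method(obj, *args),
# passing the receiver as an implicit "self".
STRING_METHODS = {
    "reverse": lambda self: self[::-1],
    "len": lambda self: len(self),
}

def call_method(receiver, name, *args):
    func = STRING_METHODS[name]
    return func(receiver, *args)

print(call_method("Steve", "reverse"))  # "Steve" -> "evetS"
```

The method table per type is the design choice that lets "standard-library" methods be registered from the host language or the interpreted one.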

    This work was done in a couple of 30-60 minute chunks. I find that I'm only really able to commit to that amount of work these days, so I've started to pull back from other projects.

    Oiva is now 21 months old and he sucks up all my time & energy. I can't complain, but equally I can't really start concentrating on longer-projects even when he's gone to sleep.

    And that concludes my news for the day.

    Goodnight dune..

    Steve Kemp Steve Kemp's Blog

    Self-plotting output from feedgnuplot and python-gnuplotlib

Sat, 29/09/2018 - 1:40 PM

    I just made a small update to feedgnuplot (version 1.51) and to python-gnuplotlib (version 0.23). Demo:

$ seq 5 | feedgnuplot --hardcopy
$ ./
[plot pops up]
$ cat
#!/usr/bin/gnuplot
set grid
set boxwidth 1
histbin(x) = 1 * floor(0.5 + x/1)
plot '-' notitle
1 1
2 2
3 3
4 4
5 5
e
pause mouse close

    I.e. there's now support for a fake gp terminal that's not a gnuplot terminal at all, but rather a way to produce a self-executable gnuplot script. 99% of this was already implemented in --dump, but this way to access that functionality is much nicer. In fact, the machine running feedgnuplot doesn't even need to have gnuplot installed at all. I needed this because I was making complicated plots on a remote box, and X-forwarding was being way too slow. Now the remote box creates the self-plotting gnuplot scripts, I scp those, evaluate them locally, and then work with interactive visualizations.
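The general trick of emitting a self-executable plot script can be sketched as follows (a simplified illustration, not feedgnuplot's Perl implementation; the output path is arbitrary):

```python
import os
import stat

def write_selfplot(path, points):
    """Write a self-executable gnuplot script with the data inlined
    via gnuplot's '-' pseudo-file, terminated by 'e'."""
    lines = ["#!/usr/bin/gnuplot", "set grid", "plot '-' notitle"]
    lines += ["%g %g" % (x, y) for x, y in points]
    lines += ["e", "pause mouse close"]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    # Mark it executable so it can be run directly after scp'ing it over.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)

write_selfplot("/tmp/demo.gp", [(i, i) for i in range(1, 6)])
```

Because the data is embedded in the script, the machine that generates it needs no gnuplot at all; only the machine that runs it does.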

    The python frontend gnuplotlib has received an analogous update.

    Dima Kogan Dima Kogan

    MicroDebConf Brasília 2018

Fri, 28/09/2018 - 11:20 PM

After I came back to my home city (Brasília), I felt the need to promote Debian and help people contribute to it. Some old friends from my former university (University of Brasília) and the local community (Debian Brasília) came up with the idea of running a Debian-related event, and I just thought: "That sounds amazing!". We contacted the university to book a small auditorium there for an entire day. After that we started to think: how should we name the event? Debian Day had been more or less one month earlier; someone suggested a MiniDebConf, but I thought ours was going to be much smaller than regular MiniDebConfs. So we decided to use a term that we had used some time ago here in Brasília: we called it MicroDebConf :)

MicroDebConf Brasília 2018 took place at the Gama campus of the University of Brasília on September 8th. It was amazing; we gathered a lot of students from the university and some high schools, and some free software enthusiasts too. We had 44 attendees in total; we did not expect all these people in the beginning! During the day we presented what the Debian Project is and the many different ways to contribute to it.

Since our focus was newcomers, we started from the beginning, explaining how to use Debian properly, how to interact with the community and how to contribute. We also introduced some other subjects such as management of PGP keys, network setup with Debian and some topics about Linux kernel contributions. As you probably know, students are never satisfied: sometimes the talks are too easy and basic, other times too hard and complex to follow. So we decided to balance the talk levels; we started from Debian basics and went on to details of the Linux kernel implementation. Their feedback was positive, so I think that we should do it again; attracting students is always a challenge.

At the end of the day we had some discussions about what we should do to grow our local community. We want more local people actually contributing to free software projects, especially Debian. A lot of people were interested, but some of them said that they need some guidance; the life of a newcomer is not so easy for now.

After some discussion we came up with the idea of a study group about Debian packaging. We will schedule meetings every week (or every two weeks, not decided yet), and during these meetings we will present about packaging (good practices, tooling and anything else that people need) and do some hands-on work. My intention is to document everything that we do, to make life easier for future newcomers who want to do Debian packaging. My main reference for this study group has been LKCamp; they are a more consolidated group and their focus is to help people start contributing to the Linux kernel.

In my opinion, this kind of initiative could help us bring new blood to the project and disseminate free software ideas and culture. Another idea that we have is to promote Debian and free software in general to non-technical people. We realized that we need to reach these people if we want a broader community; we do not know exactly how yet, but it is on our radar.

After all these talks and discussions we needed some time to relax, and we did that together! We went to a bar and got some beer (except those under 18 years old :) and food. Of course our discussions about free software kept running all night long.

The following is an overview of this conference:

• We probably coined this term and are the first to organize a MicroDebConf (we already did it in 2015). We should promote this kind of local event more

    • I guess we inspired a lot of young people to contribute to Debian (and free software in general)

• We defined a way to help local people start contributing to Debian with packaging. I really like this idea of a study group; meeting people in person is always the best way to create bonds

    • Now we hopefully will have a stronger Debian community in Brasília - Brazil \o/

Last but not least, I would like to thank LAPPIS (a research lab which I was part of during my undergrad), who helped us with all the logistics and bureaucracies, and Collabora for sponsoring the coffee break! Collabora, LAPPIS and us share the same goal: promoting FLOSS to all these young people and making our community grow!

    Lucas Kanashiro Lucas Kanashiro’s blog


Thu, 27/09/2018 - 10:12 PM

    Aristotle’s distinction in EN between brutishness and vice might be comparable to the distinction in Dungeons & Dragons between chaotic evil and lawful evil, respectively.

    I’ve always thought that the forces of lawful evil are more deeply threatening than those of chaotic evil. In the Critical Hit podcast, lawful evil is equated with tyranny.

    Of course, at least how I run it, Aristotelian ethics involves no notion of evil, only mistakes about the good.

    Sean Whitton Notes from the Library

    Debian Policy call for participation -- September 2018

Thu, 27/09/2018 - 10:07 PM

    Here’s a summary of some of the bugs against the Debian Policy Manual that are thought to be easy to resolve.

    Please consider getting involved, whether or not you’re an existing contributor.

    For more information, see our README.

    #152955 force-reload should not start the daemon if it is not running

    #172436 BROWSER and sensible-browser standardization

    #188731 Also strip .comment and .note sections

    #212814 Clarify relationship between long and short description

    #273093 document interactions of multiple clashing package diversions

    #314808 Web applications should use /usr/share/package, not /usr/share/doc/package

    #348336 Clarify Policy around shared configuration files

    #425523 Describe error unwind when unpacking a package fails

    #491647 debian-policy: X font policy unclear around TTF fonts

    #495233 debian-policy: README.source content should be more detailed

    #649679 [copyright-format] Clarify what distinguishes files and stand-alone license paragraphs.

    #682347 mark ‘editor’ virtual package name as obsolete

    #685506 copyright-format: new Files-Excluded field

    #685746 debian-policy Consider clarifying the use of recommends

    #694883 copyright-format: please clarify the recommended form for public domain files

    #696185 [copyright-format] Use short names from SPDX.

    #697039 expand cron and init requirement to check binary existence to other scripts

    #722535 debian-policy: To document: the “Binary-Only” field in Debian changes files.

    #759316 Document the use of /etc/default for cron jobs

    #770440 debian-policy: policy should mention systemd timers

    #780725 PATH used for building is not specified

    #794653 Recommend use of dpkg-maintscript-helper where appropriate

    #809637 DEP-5 does not support filenames with blanks

    #824495 debian-policy: Source packages “can” declare relationships

    #833401 debian-policy: virtual packages: dbus-session-bus, dbus-default-session-bus

    #845715 debian-policy: Please document that packages are not allowed to write outside their source directories

    #850171 debian-policy: Addition of having an ‘EXAMPLES’ section in manual pages debian policy 12.1

    #853779 debian-policy: Clarify requirements about update-rc.d and invoke-rc.d usage in maintainer scripts

    #904248 Add netbase to build-essential

    Sean Whitton Notes from the Library

    My Work on Debian LTS (September 2018)

Thu, 27/09/2018 - 11:40 AM

    In September 2018, I did 10 hours of work on the Debian LTS project as a paid contributor. Thanks to all LTS sponsors for making this possible.

    This is my list of work done in September 2018:

    • Upload of polarssl (DLA 1518-1) [1].
• Work on CVE-2018-16831 discovered in the smarty3 package. Plan (A) was to backport the latest smarty3 release to Debian stretch and jessie, but runtime tests against GOsa² (one of the PHP applications that utilize smarty3) already failed for Debian stretch, so this plan was dropped. Plan (B) then was extracting a patch [2] for fixing this issue in Debian stretch's smarty3 package version from a manifold of upstream code changes; this ended with the realization that smarty3 in Debian jessie is very likely not affected. Upstream feedback is still pending; upload(s) will occur in the coming week (first week of October).





    sunweaver sunweaver's blog

    A nice oneliner

Wed, 26/09/2018 - 7:51 PM

    Pop quiz! Let's say I have a datafile describing some items (images and feature points in this example):

# filename x y
000.jpg 79.932824 35.609049
000.jpg 95.174662 70.876506
001.jpg 19.655072 52.475315
002.jpg 19.515351 33.077847
002.jpg 3.010392 80.198282
003.jpg 84.183099 57.901647
003.jpg 93.237358 75.984036
004.jpg 99.102619 7.260851
005.jpg 24.738357 80.490116
005.jpg 53.424477 27.815635
....
....
149.jpg 92.258132 99.284486

    How do I get a random subset of N images, using only the shell and standard commandline tools?
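As a cross-check (and not the one-liner this post is about), the same random-subset selection can be sketched in Python; the sample rows here are abbreviated from the datafile above:

```python
import random

def random_subset(rows, n, key=lambda r: r[0]):
    """Keep only the rows whose filename is among n randomly chosen ones,
    preserving every feature point belonging to a chosen image."""
    names = sorted({key(r) for r in rows})
    chosen = set(random.sample(names, n))
    return [r for r in rows if key(r) in chosen]

rows = [("000.jpg", 79.9, 35.6), ("000.jpg", 95.1, 70.8),
        ("001.jpg", 19.6, 52.4), ("002.jpg", 19.5, 33.0)]
print(random_subset(rows, 2))
```

The shell version does the same thing: generate a random set of names, then join it against the datafile on the filename column.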


$ N=5; ( echo '# filename'; seq 0 149 | shuf | head -n $N | xargs -n1 printf "%03d.jpg\n" | sort) | vnl-join -j filename input.vnl -
# filename x y
017.jpg 41.752204 96.753914
017.jpg 86.232504 3.936258
027.jpg 41.839110 89.148368
027.jpg 82.772742 27.880592
067.jpg 57.790706 46.153623
067.jpg 87.804939 15.853087
076.jpg 41.447477 42.844849
076.jpg 93.399829 64.552090
142.jpg 18.045497 35.381083
142.jpg 83.037867 17.252172

Dima Kogan Dima Kogan

    Shannon’s Ghost

Wed, 26/09/2018 - 4:34 AM

    I’m spending the 2018-2019 academic year as a fellow at the Center for Advanced Study in the Behavioral Sciences (CASBS) at Stanford.

    Claude Shannon on a bicycle.

    Every CASBS study is labeled with a list of  “ghosts” who previously occupied the study. This year, I’m spending the year in Study 50 where I’m haunted by an incredible cast that includes many people whose scholarship has influenced and inspired me.

    The top part of the list of ghosts in Study #50 at CASBS.

    Foremost among this group is Study 50’s third occupant: Claude Shannon

At 21 years old, Shannon proved in his master's thesis (sometimes cited as the most important master's thesis in history) that electrical circuits could encode any relationship expressible in Boolean logic, opening the door to digital computing. Incredibly, this is almost never cited as Shannon's most important contribution. That came in 1948 when he published a paper titled A Mathematical Theory of Communication which effectively created the field of information theory. Less than a decade after its publication, Aleksandr Khinchin (the mathematician behind my favorite mathematical constant) described the paper saying:

    Rarely does it happen in mathematics that a new discipline achieves the character of a mature and developed scientific theory in the first investigation devoted to it…So it was with information theory after the work of Shannon.

    As someone whose own research is seeking to advance computation and mathematical study of communication, I find it incredibly propitious to be sharing a study with Shannon.

Although I teach in a communication department, I know Shannon from my background in computing. I've always found it curious that, despite the fact that Shannon's 1948 paper is almost certainly the most important single thing ever published with the word "communication" in its title, Shannon is rarely taught in communication curricula and is sometimes completely unknown to communication scholars.

In this regard, I've thought a lot about this passage in Robert Craig's influential article "Communication Theory as a Field", which argued:

    In establishing itself under the banner of communication, the discipline staked an academic claim to the entire field of communication theory and research—a very big claim indeed, since communication had already been widely studied and theorized. Peters writes that communication research became “an intellectual Taiwan-claiming to be all of China when, in fact, it was isolated on a small island” (p. 545). Perhaps the most egregious case involved Shannon’s mathematical theory of information (Shannon & Weaver, 1948), which communication scholars touted as evidence of their field’s potential scientific status even though they had nothing whatever to do with creating it, often poorly understood it, and seldom found any real use for it in their research.

    In preparation for moving into Study 50, I read a new biography of Shannon by Jimmy Soni and Rob Goodman and was excited to find that Craig—although accurately describing many communication scholars’ lack of familiarity—almost certainly understated the importance of Shannon to communication scholarship.

For example, the book form of Shannon's 1948 article was published by the University of Illinois at the urging, and under the editorial supervision, of Wilbur Schramm (one of the founders of modern mass communication scholarship), who was a major proponent of Shannon's work. Everett Rogers (another giant in communication) devotes a chapter of his "History of Communication Studies"² to Shannon and to tracing his impact in communication. Both Schramm and Rogers built on Shannon in parts of their own work. Shannon has had an enormous impact, it turns out, in several subareas of communication research (e.g., attempts to model communication processes).

Although I find these connections exciting, my own research—like most of the rest of communication—is far from the substance of the technical communication processes at the center of Shannon's own work. In this sense, it can be a challenge to explain to my colleagues in communication—and to my fellow CASBS fellows—why I'm so excited to be sharing a space with Shannon this year.

    Upon reflection, I think it boils down to two reasons:

    1. Shannon’s work is both mathematically beautiful and incredibly useful. His seminal 1948 article points to concrete ways that his theory can be useful in communication engineering including in compression, error correcting codes, and cryptography. Shannon’s focus on research that pushes forward the most basic type of basic research while remaining dedicated to developing solutions to real problems is a rare trait that I want to feature in my own scholarship.
2. Shannon was incredibly playful. Shannon played games, juggled constantly, and was always seeking to teach others to do so. He tinkered, rode unicycles, built a flame-throwing trumpet, and so on. With Marvin Minsky, he invented the "ultimate machine"—a machine whose only function is to turn itself off—which he kept on his desk.

A version of Shannon’s “ultimate machine” that is sitting on my desk at CASBS.

    I have no misapprehension that I will accomplish anything like Shannon’s greatest intellectual achievements during my year at CASBS. I do hope to be inspired by Shannon’s creativity, focus on impact, and playfulness. In my own little ways, I hope to build something at CASBS that will advance mathematical and computational theory in communication in ways that Shannon might have appreciated.

    1. Incredibly, the year that Shannon was in Study 50, his neighbor in Study 51 was Milton Friedman. Two thoughts: (i) Can you imagine?! (ii) I definitely chose the right study!
2. Rogers’ book was written, I found out, during his own stint at CASBS. Alas, it was not written in Study 50.
    Benjamin Mako Hill copyrighteous

    GSoC 2018: Final Report

Tue, 25/09/2018 - 7:08 PM

This is my final report for my Google Summer of Code 2018; it also serves as my final code submission.

    For the last 3 months I have been working with Debian on the project Extracting Data from PDF Invoices and Bills Details. Information about the project can be found here:

    My mentor and I agreed to modify the work to be done in the Summer. Already discussed here:
    We will advance the ecosystem for machine-readable invoice exchange and make it easily accessible for the whole Python community by making the following contributions:
    • Python library to read/write/add/edit Factur-x metadata in different XML-flavors in Python.
    • Command line interface to process PDF files and access the main library functions.
    • Way to add structured data to existing files or from legacy accounting systems. (via invoice2data project)
    • New desktop GUI to add, edit, import and export factur-x metadata in- and out of PDF files.
Short Overview

The project work can be divided into two parts:
    • Main Deliverable: GUI creation for Factur-X Library
    • Pre-requisites for Main Deliverable: Improvements to invoice2data library and updating Factur-X library to a working state
    Contributions to invoice2data

    invoice2data is a modular Python library to support your accounting process. Tested on Python 2.7 and 3.4+. Main steps:
    1. extracts text from PDF files using different techniques, like pdftotext, pdfminer or tesseract OCR.
    2. searches for regex in the result using a YAML-based template system
    3. saves results as CSV, JSON or XML or renames PDF files to match the content.
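    Those three steps translate into a short command-line session. The sketch below is only illustrative — the invoice file name is a placeholder, and it assumes invoice2data and a text-extraction backend such as pdftotext are installed; exact flags may differ between versions:

    ```shell
    # Install the library, which also provides the invoice2data command
    pip install invoice2data

    # Extract text from the PDF, match it against the YAML templates,
    # and save the recognized fields as CSV
    invoice2data --output-format csv my-invoice.pdf
    ```
    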
    My contributions:
    Contributions to Factur-X

    Factur-X is an EU standard for embedding XML representations of invoices in PDF files. This library provides an interface for reading, editing and saving this metadata. My contributions:
    Organisation Page

    An organisation, invoice-x, was created on GitHub to tie all the repositories together in a single place.
    link to organisation page:
    Organisation Website

    A static website briefly explaining the whole project. Link to website:
    Main Deliverable Repository

    This repository contains the code for the GUI for the Factur-X library. Link to the repository:

    invoicex-gui: invoice2data integration with invoicex-gui and factur-x-ng
    Overview

    Pre-requisites for Main Deliverable

    Factur-X

    To work on GUI creation for Factur-X, I first needed to bring the Factur-X library back to a working state. My mentor, Manuel, did the initial refactoring of the project after forking the original repository.

    Since then I have added a few features to the library:
    • Fix checking of embedded resources
    • Converting the documentation format from md to rst
    • Added unit tests for factur-x
    • Added new feature to export metadata in JSON and YAML format
    • Cleaned XML template to add
    • Added validation of country and currency codes with ISO standards.
    • Implemented Command Line Options
    Invoice2data

    I started contributing to invoice2data in February. It became the first open source project I contributed to. My first contribution was just fixing a typo in the documentation, but it introduced me to the world of Free and Open Source Software (FOSS).

    Since being selected for Google Summer of Code 2018, I have added the following commits:
    • Removed required fields in favour of providing flexibility to extract data
    • Added feature to extract all fields mentioned in template
    • Updated README and worked on conversion of md to rst
    • Added checks for dependencies: tesseract and imagemagick
    • Changed subprocess input from a normal string to a list
    • Added more tests and checked coverage locally
    • Fixed the ways invoice2data handles lists
    Main Deliverable

    Invoicex-GUI

    My main deliverable was to build a graphical user interface for the Factur-X library. For this I used the PyQt5 framework; the other options were Kivy and wxWidgets. I have some prior experience with PyQt5, and a Kivy bug related to the Debian touchpad driver inclined me towards PyQt5.

    Making the GUI was one of the most challenging parts of the GSoC project. The sparse documentation for PyQt5 didn’t help much. I have 3 years of experience with C++, which let me learn more about PyQt5 from the original Qt documentation, which is written for C++.

    The GUI includes:
    • Selecting a PDF and searching for any embedded standard
    • If no standard is found, showing a pop-up to select the standard to be added
    • Editing metadata of an existing embedded standard
    • Exporting metadata
    • Validating metadata
    • Using invoice2data to extract field data from an invoice
    Weekly Work Done (week 1) (week 2) (week 3) (week 4) (week 5) (week 6) (week 7) (week 8) (week 9, 10) (week 11) (week 12)  Harshit Joshi Harshit Joshi's Blog

    Reproducible Builds: Weekly report #178

    Mar, 25/09/2018 - 6:53md

    Here’s what happened in the Reproducible Builds effort between Sunday September 16 and Saturday September 22 2018:

    Patches filed diffoscope development

    diffoscope version 102 was uploaded to Debian unstable by Mattia Rizzolo. It included contributions already covered in previous weeks as well as new ones from:

    Test framework development

    There were a number of updates this month to our Jenkins-based testing framework, including:


    This week’s edition was written by Bernhard M. Wiedemann, Chris Lamb, Daniel Shahaf, Holger Levsen, Jelle van der Waa, Vagrant Cascadian & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

    Reproducible builds folks

    Crossing the Great St Bernard Pass

    Mar, 25/09/2018 - 4:26md

    It's a great day for the scenic route to Italy, home of Beethoven's Swiss cousins.

    What goes up, must come down...

    Descent into the Aosta valley

    Daniel.Pocock - debian

    Smallish haul

    Mar, 25/09/2018 - 6:34pd

    It's been a little while since I've made one of these posts, and of course I'm still picking up this and that. Books won't buy themselves!

    Elizabeth Bear & Katherine Addison — The Cobbler's Boy (sff)
    P. Djèlí Clark — The Black God's Drums (sff)
    Sabine Hossenfelder — Lost in Math (nonfiction)
    N.K. Jemisin — The Dreamblood Duology (sff)
    Mary Robinette Kowal — The Calculating Stars (sff)
    Yoon Ha Lee — Extracurricular Activities (sff)
    Seanan McGuire — Night and Silence (sff)
    Bruce Schneier — Click Here to Kill Everybody (nonfiction)

    I have several more pre-orders that will be coming out in the next couple of months. Still doing lots of reading, but behind on writing up reviews, since work has been busy and therefore weekends have been low-energy. That should hopefully change shortly.

    Russ Allbery Eagle's Path

    Archiving web sites

    Mar, 25/09/2018 - 2:00pd

    I recently took a deep dive into web site archival for friends who were worried about losing control over the hosting of their work online in the face of poor system administration or hostile removal. This makes web site archival an essential instrument in the toolbox of any system administrator. As it turns out, some sites are much harder to archive than others. This article goes through the process of archiving traditional web sites and shows how it falls short when confronted with the latest fashions in the single-page applications that are bloating the modern web.

    Converting simple sites

    The days of handcrafted HTML web sites are long gone. Now web sites are dynamic and built on the fly using the latest JavaScript, PHP, or Python framework. As a result, the sites are more fragile: a database crash, spurious upgrade, or unpatched vulnerability might lose data. In my previous life as a web developer, I had to come to terms with the idea that customers expect web sites to basically work forever. This expectation matches poorly with the "move fast and break things" attitude of web development. Working with the Drupal content-management system (CMS) was particularly challenging in that regard as major upgrades deliberately break compatibility with third-party modules, which implies a costly upgrade process that clients could seldom afford. The solution was to archive those sites: take a living, dynamic web site and turn it into plain HTML files that any web server can serve forever. This process is useful for your own dynamic sites but also for third-party sites that are outside of your control and you might want to safeguard.

    For simple or static sites, the venerable Wget program works well. The incantation to mirror a full web site, however, is byzantine:

    $ nice wget --mirror --execute robots=off --no-verbose --convert-links \
          --backup-converted --page-requisites --adjust-extension \
          --base=./ --directory-prefix=./ --span-hosts \

    The above downloads the content of the web page, but also crawls everything within the specified domains. Before you run this against your favorite site, consider the impact such a crawl might have on the site. The above command line deliberately ignores robots.txt rules, as is now common practice for archivists, and hammers the website as fast as it can. Most crawlers have options to pause between hits and limit bandwidth usage to avoid overwhelming the target site.

    The above command will also fetch "page requisites" like style sheets (CSS), images, and scripts. The downloaded page contents are modified so that links point to the local copy as well. Any web server can host the resulting file set, which results in a static copy of the original web site.

    That is, when things go well. Anyone who has ever worked with a computer knows that things seldom go according to plan; all sorts of things can make the procedure derail in interesting ways. For example, it was trendy for a while to have calendar blocks in web sites. A CMS would generate those on the fly and make crawlers go into an infinite loop trying to retrieve all of the pages. Crafty archivers can resort to regular expressions (e.g. Wget has a --reject-regex option) to ignore problematic resources. Another option, if the administration interface for the web site is accessible, is to disable calendars, login forms, comment forms, and other dynamic areas. Once the site becomes static, those will stop working anyway, so it makes sense to remove such clutter from the original site as well.
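    For instance, a politer variant of the earlier Wget incantation might pause between requests, cap bandwidth, and skip problematic resources such as auto-generated calendar pages. The rate limits and the reject pattern below are arbitrary examples:

    ```shell
    # Mirror a site gently: wait between requests, limit bandwidth,
    # and skip the infinite auto-generated calendar pages
    nice wget --mirror --page-requisites --convert-links \
         --adjust-extension --wait=1 --random-wait \
         --limit-rate=200k --reject-regex '/calendar/' \
         https://example.com/
    ```
    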

    JavaScript doom

    Unfortunately, some web sites are built with much more than pure HTML. In single-page sites, for example, the web browser builds the content itself by executing a small JavaScript program. A simple user agent like Wget will struggle to reconstruct a meaningful static copy of those sites as it does not support JavaScript at all. In theory, web sites should be using progressive enhancement to have content and functionality available without JavaScript but those directives are rarely followed, as anyone using plugins like NoScript or uMatrix will confirm.

    Traditional archival methods sometimes fail in the dumbest way. When trying to build an offsite backup of a local newspaper, I found that WordPress adds query strings (e.g. ?ver=1.12.4) at the end of JavaScript includes. This confuses content-type detection in the web servers that serve the archive, which rely on the file extension to send the right Content-Type header. When such an archive is loaded in a web browser, it fails to load scripts, which breaks dynamic websites.

    As the web moves toward using the browser as a virtual machine to run arbitrary code, archival methods relying on pure HTML parsing need to adapt. The solution for such problems is to record (and replay) the HTTP headers delivered by the server during the crawl and indeed professional archivists use just such an approach.

    Creating and displaying WARC files

    At the Internet Archive, Brewster Kahle and Mike Burner designed the ARC (for "ARChive") file format in 1996 to provide a way to aggregate the millions of small files produced by their archival efforts. The format was eventually standardized as the WARC ("Web ARChive") specification that was released as an ISO standard in 2009 and revised in 2017. The standardization effort was led by the International Internet Preservation Consortium (IIPC), which is an "international organization of libraries and other organizations established to coordinate efforts to preserve internet content for the future", according to Wikipedia; it includes members such as the US Library of Congress and the Internet Archive. The latter uses the WARC format internally in its Java-based Heritrix crawler.

    A WARC file aggregates multiple resources like HTTP headers, file contents, and other metadata in a single compressed archive. Conveniently, Wget actually supports the file format with the --warc parameter. Unfortunately, web browsers cannot render WARC files directly, so a viewer or some conversion is necessary to access the archive. The simplest such viewer I have found is pywb, a Python package that runs a simple webserver to offer a Wayback-Machine-like interface to browse the contents of WARC files. The following set of commands will render a WARC file on http://localhost:8080/:

    $ pip install pywb
    $ wb-manager init example
    $ wb-manager add example crawl.warc.gz
    $ wayback

    This tool was, incidentally, built by the folks behind the Webrecorder service, which can use a web browser to save dynamic page contents.
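    As for producing a WARC file to feed into pywb in the first place, Wget's --warc support mentioned above can be used. A minimal sketch — the site URL and the output name are placeholders:

    ```shell
    # Crawl a site and record the full HTTP requests and responses
    # into crawl.warc.gz, alongside the usual mirrored files
    wget --mirror --page-requisites --warc-file=crawl \
         https://example.com/
    ```
    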

    Unfortunately, pywb has trouble loading WARC files generated by Wget because it followed an inconsistency in the 1.0 specification, which was fixed in the 1.1 specification. Until Wget or pywb fix those problems, WARC files produced by Wget are not reliable enough for my uses, so I have looked at other alternatives. A crawler that got my attention is simply called crawl. Here is how it is invoked:

    $ crawl

    (It does say "very simple" in the README.) The program does support some command-line options, but most of its defaults are sane: it will fetch page requirements from other domains (unless the -exclude-related flag is used), but does not recurse out of the domain. By default, it fires up ten parallel connections to the remote site, a setting that can be changed with the -c flag. But, best of all, the resulting WARC files load perfectly in pywb.

    Future work and alternatives

    There are plenty more resources for using WARC files. In particular, there's a Wget drop-in replacement called Wpull that is specifically designed for archiving web sites. It has experimental support for PhantomJS and youtube-dl integration that should allow downloading more complex JavaScript sites and streaming multimedia, respectively. The software is the basis for an elaborate archival tool called ArchiveBot, which is used by the "loose collective of rogue archivists, programmers, writers and loudmouths" at ArchiveTeam in its struggle to "save the history before it's lost forever". It seems that PhantomJS integration does not work as well as the team wants, so ArchiveTeam also uses a rag-tag bunch of other tools to mirror more complex sites. For example, snscrape will crawl a social media profile to generate a list of pages to send into ArchiveBot. Another tool the team employs is crocoite, which uses the Chrome browser in headless mode to archive JavaScript-heavy sites.

    This article would also not be complete without a nod to the HTTrack project, the "website copier". Working similarly to Wget, HTTrack creates local copies of remote web sites but unfortunately does not support WARC output. Its interactive aspects might be of more interest to novice users unfamiliar with the command line.

    In the same vein, during my research I found a full rewrite of Wget called Wget2 that has support for multi-threaded operation, which might make it faster than its predecessor. It is missing some features from Wget, however, most notably reject patterns, WARC output, and FTP support, but adds RSS, DNS caching, and improved TLS support.

    Finally, my personal dream for these kinds of tools would be to have them integrated with my existing bookmark system. I currently keep interesting links in Wallabag, a self-hosted "read it later" service designed as a free-software alternative to Pocket (now owned by Mozilla). But Wallabag, by design, creates only a "readable" version of the article instead of a full copy. In some cases, the "readable version" is actually unreadable and Wallabag sometimes fails to parse the article. Instead, other tools like bookmark-archiver or reminiscence save a screenshot of the page along with full HTML but, unfortunately, no WARC file that would allow an even more faithful replay.

    The sad truth of my experiences with mirrors and archival is that data dies. Fortunately, amateur archivists have tools at their disposal to keep interesting content alive online. For those who do not want to go through that trouble, the Internet Archive seems to be here to stay and Archive Team is obviously working on a backup of the Internet Archive itself.

    This article first appeared in the Linux Weekly News.

    As usual, here's the list of issues and patches generated while researching this article:

    I also want to personally thank the folks in the #archivebot channel for their assistance and letting me play with their toys.

    The Pamplemousse crawl is now available on the Internet Archive; it might end up in the Wayback Machine at some point if the Archive curators think it is worth it.

    Another example of a crawl is this archive of two Bloomberg articles which the "save page now" feature of the Internet Archive wasn't able to save correctly, but Webrecorder could. Those pages can be seen in the Webrecorder player to get a better feel of how faithful a WARC file really is.

    Finally, this article was originally written as a set of notes and documentation in the archive page which may also be of interest to my readers.

    Antoine Beaupré pages tagged debian-planet

    VLC in Debian now can do bittorrent streaming

    Hën, 24/09/2018 - 9:20md

    Back in February, I got curious to see if VLC now supported Bittorrent streaming. It did not, despite the fact that the idea and code to handle such streaming had been floating around for years. I did however find a standalone plugin for VLC to do it, and half a year later I decided to wrap up the plugin and get it into Debian. I uploaded it to NEW a few days ago, and am very happy to report that it entered Debian a few hours ago, and should be available in Debian/Unstable tomorrow, and Debian/Testing in a few days.

    With the vlc-plugin-bittorrent package installed you should be able to stream videos using a simple call to


    It can handle magnet links too. Now if only native vlc had bittorrent support. Then a lot more people would be helping each other share public domain and creative commons movies. The plugin needs some stability work with seeking and with picking the right file in a torrent with many files, but it is already usable. Please note that the plugin does not remove downloaded files when vlc is stopped, so it can fill up your disk if you are not careful. Have fun. :)
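    As a sketch, a streaming session might look like the following; the torrent URL and the magnet link are placeholders:

    ```shell
    # Install the plugin, then hand VLC a torrent file or magnet link
    sudo apt install vlc-plugin-bittorrent
    vlc https://example.com/video.torrent
    # or: vlc "magnet:?xt=urn:btih:..."
    ```
    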

    I would love to get help maintaining this package. Get in touch if you are interested.

    As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

    Petter Reinholdtsen Petter Reinholdtsen - Entries tagged english

    Handling an old Digital Photo Frame (AX203) with Debian (and gphoto2)

    Sht, 22/09/2018 - 8:20md

    Some days ago I found a key chain at home that was a small digital photo frame; it seems it had not been used since 2009 (the old days, when I was not yet using Debian at home). The photo frame was still working (I connected it with a USB cable and after some seconds it turned on), and indeed showed 37 photos from 2009.

    When I connected it with the USB cable to the computer, it asked “Connect USB? Yes/No”. I pressed the button saying “yes” and nothing happened on the computer (I was expecting a USB drive to show up in Dolphin, but no).

    I looked at the “dmesg” output and the device was shown as a CD-ROM:

    [ 1620.497536] usb 3-2: new full-speed USB device number 4 using xhci_hcd
    [ 1620.639507] usb 3-2: New USB device found, idVendor=1908, idProduct=1320
    [ 1620.639513] usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
    [ 1620.639515] usb 3-2: Product: Photo Frame
    [ 1620.639518] usb 3-2: Manufacturer: BUILDWIN
    [ 1620.640549] usb-storage 3-2:1.0: USB Mass Storage device detected
    [ 1620.640770] usb-storage 3-2:1.0: Quirks match for vid 1908 pid 1320: 20000
    [ 1620.640807] scsi host7: usb-storage 3-2:1.0
    [ 1621.713594] scsi 7:0:0:0: CD-ROM buildwin Photo Frame 1.01 PQ: 0 ANSI: 2
    [ 1621.715400] sr 7:0:0:0: [sr1] scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray
    [ 1621.715745] sr 7:0:0:0: Attached scsi CD-ROM sr1
    [ 1621.715932] sr 7:0:0:0: Attached scsi generic sg1 type 5

    But not automounted.
    I mounted it and then looked at the files, but I couldn’t find photos there, only these files:

    Autorun.inf
    FEnCodeUnicode.dll
    LanguageUnicode.ini
    DPFMate.exe
    flashlib.dat
    StartInfoUnicode.ini

    The Autorun.inf file was pointing to the DPFMate.exe file.

    I connected the device to a Windows computer, where I could run the DPFMate.exe program, which turned out to be a program to manage the photos on the device.

    I was wondering if I could manage the device from Debian and then searched for «dpf “digital photo frame” linux dpfmate» and found this page:

    Yes, that one was my key chain!

    I looked for gphoto in Debian and learned that the program I needed to install was gphoto2.
    I installed it and then went to its Quick Start Guide to learn how to access the device, get the photos etc. In particular, I used these commands:

    gphoto2 --auto-detect
    Model                          Port
    ----------------------------------------------------------
    AX203 USB picture frame firmware ver 3.4.x usbscsi:/dev/sg1

    gphoto2 --get-all-files

    (it copied all the pictures that were in the photo frame, to the current folder in my computer)

    gphoto2 --upload-file=name_of_file

    (to put some file in the photo frame)

    gphoto2 --delete-file=1-38

    (to delete files 1 to 38 in the photo frame).
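    The whole session can be wrapped in a small backup script. This is only a sketch, assuming a single AX203 frame is connected; the folder naming is arbitrary:

    ```shell
    #!/bin/sh
    # Back up all pictures from the photo frame into a dated folder
    dir="photo-frame-$(date +%Y%m%d)"
    mkdir -p "$dir"
    cd "$dir" || exit 1

    # Verify the frame is detected, then download every picture
    gphoto2 --auto-detect
    gphoto2 --get-all-files
    ```
    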

    larjona English – The bright side