Export Drupal nodes to CSV

Want to extract all your content from Drupal? I bet you do… seems like the world has gotten a distaste for all things Drupal and is jumping on the WordPress or even Jekyll bandwagon. Anyway, this query will probably get you what you need:

SELECT DISTINCT no.nid, n.title, u.name, n.timestamp, n.body,
       GROUP_CONCAT(DISTINCT td.name) AS terms
FROM users u, node no, term_data td, node_revisions n
LEFT JOIN term_node ON (n.nid = term_node.nid)
WHERE n.uid = u.uid
  AND no.vid = n.vid
  AND term_node.tid = td.tid
GROUP BY no.nid
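
The query only gives you a result set; to end up with an actual CSV file you still have to write the rows out. Here is a minimal Python sketch of that step, assuming you have a DB-API cursor to hand. The function name export_query_to_csv is made up for illustration, and the demo runs against a tiny in-memory SQLite table as a stand-in, not a real Drupal schema.

```python
import csv
import io
import sqlite3


def export_query_to_csv(cursor, query, out):
    """Run `query` on a DB-API cursor and write the result set to `out` as CSV."""
    cursor.execute(query)
    writer = csv.writer(out)
    # Header row: the first item of each description tuple is the column name
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor.fetchall())


# Stand-in demo data (hypothetical): a two-row table in an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [(1, "Hello, world"), (2, "Second post")],
)

buf = io.StringIO()
export_query_to_csv(conn.cursor(), "SELECT nid, title FROM node ORDER BY nid", buf)
print(buf.getvalue())
```

For a real export you would pass a cursor from your MySQL driver, the Drupal query, and a file opened with open("nodes.csv", "w", newline=""). The csv module takes care of quoting titles and bodies that contain commas or line breaks.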

This was inspired by a more complicated method here; but why use node.js and a long JavaScript script to do something that can be done in one SQL statement?

Disclaimer: I’m still waiting for confirmation from my collaborator that this is actually what he needs, so there might be some edits coming.

Get pnet to do UDP blocking

Pnet is a MEX-file-based toolbox for Octave/MATLAB. The help instructions would lead you to believe that UDP reads block by default, but this didn’t work for me: I had to specify a timeout. You can specify Inf, but given that you can’t ctrl-c out of a program whilst the MEX part is running, I’d recommend choosing a reasonable value (like 20, = 20 seconds).

Here’s a simple example:
pnet(sock, 'setreadtimeout', 20);  % wait up to 20 seconds for data
data = pnet(sock, 'read')
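
For comparison, the same "block, but only for so long" idea can be sketched with Python's standard socket module. This is not pnet, just an illustration of a UDP read with a finite timeout; the helper name recv_udp and the loopback demo are assumptions made for the example.

```python
import socket


def recv_udp(sock, timeout_s=20.0, bufsize=65535):
    """Block until a datagram arrives or timeout_s seconds pass; return None on timeout."""
    sock.settimeout(timeout_s)
    try:
        data, _addr = sock.recvfrom(bufsize)
        return data
    except socket.timeout:
        return None


# Loopback demo: bind a receiver to a free port and send one datagram to it
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
port = receiver.getsockname()[1]

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", ("127.0.0.1", port))

got = recv_udp(receiver, timeout_s=2.0)    # the datagram sent above
empty = recv_udp(receiver, timeout_s=0.1)  # nothing left, so this times out

sender.close()
receiver.close()
```

A finite timeout means a read that never gets data still returns control to your script, which is the same reason for avoiding Inf above.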

Compiling pnet (or any other MEX file) with Visual Studio Express and Octave

Octave supports MEX files, just like MATLAB. Here’s how to compile a MEX file (in my case pnet):

  • Run Developer Tools (2012)
  • Change Octave’s math.h to specify correct MS math.h
  • Add Octave ‘bin’ folder to path
  • Change to the folder with the files
  • Run command:
    mkoctfile --mex -lws2_32.lib -DWIN32 -v -o pnet pnet.c
  • The last command fails because a random .lib gets added somehow. Run manually (change paths for your installation):
    link -nologo -DLL -out:pnet.mex pnet.o ws2_32.lib -LIBPATH:C:\bin\Octave-3.6.4\lib\octave\3.6.4 -LIBPATH:C:\bin\Octave-3.6.4\lib octinterp.lib octave.lib cruft.lib -export:mexFunction -incremental:no dirent.lib msvcmath.lib
    I’ve forgotten how I figured that out, so sorry if I’m not giving credit if it’s due somewhere!

R rJava (.jinit()) stops working – make it work again

Windows 8, 64 bit, 64 bit Java 7, 64 bit R…

Apparently (!) it worked before. And then R said it was “Unable to create a class loader” (no arguments) or that it couldn’t start the JVM (with arguments) when using .jinit.

Unfortunately, I can’t say exactly what was wrong, or why it apparently stopped working, but these two things fixed it:

Add the folder containing jvm.dll to the path

Set the JAVA_HOME environment variable to the location of Java (one folder above bin)

And I used the more recent JDK folders (including the server folder within the JRE that came with the JDK) and not the JRE-only that was also installed but appeared to be an earlier version.
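
If you are not sure which folder actually holds the JVM library, a few lines of Python can list the candidates under JAVA_HOME. This is only a diagnostic sketch; the function name find_jvm_dirs is made up for illustration, and it assumes the library is called jvm.dll as on Windows.

```python
import os


def find_jvm_dirs(java_home, library_name="jvm.dll"):
    """Return every folder under java_home that contains the JVM library."""
    matches = []
    for folder, _subdirs, files in os.walk(java_home):
        if library_name in files:
            matches.append(folder)
    return sorted(matches)


java_home = os.environ.get("JAVA_HOME")
if not java_home:
    print("JAVA_HOME is not set")
else:
    for folder in find_jvm_dirs(java_home):
        print("Candidate PATH entry:", folder)
```

If more than one folder turns up (for example from a JDK and a separate JRE), that is a hint that R may be picking up a different Java from the one you expect.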

Resizing an LVM 2 filesystem on Ubuntu inside Virtualbox (Mac)

Problem 1, VirtualBox can’t resize this kind of Linux volume, so, on the Mac:

Create a new HD (for me, ~10G)
VBoxManage createhd --filename Ubuntu.vdi --size 10000
Copy contents
VBoxManage clonehd CartoDB.vdi Ubuntu.vdi --existing

Then, Download GParted ISO, latest version so that you can resize LVM volumes
Add as CD volume to the machine and boot it, Gparted runs
Deactivate the volume, and resize to fill space, apply changes
Shutdown, ensure CD is no longer mounted in VirtualBox, reboot to real Ubuntu

Then, back in real Ubuntu (this is live resizing, use at your own risk!)
sudo lvextend /dev/lubuntu-vg/root /dev/sda5
sudo resize2fs /dev/mapper/lubuntu--vg-root


Cobbled together from:





…and quite possibly a few others

Using D3.js and reveal.js for better organised data presentation

D3.js is a Javascript library for attaching data to web elements for the purposes of visualising the data. Once the visualisation is more than a little complicated, this usually means generating SVG elements. SVG (Scalable Vector Graphics) is a vector graphics format – it’s a text based description of lines, points and so on. Due to its nature, it is scalable in size without any loss in quality because the elements are simply redrawn.

I’m (slowly) putting together a PhD thesis at the moment, as well as occasionally having to give presentations. One thing that has always infuriated me about research is the clutter that arises from multiple programming languages, scripts, graphics outputs, versions of files, presentations and papers. You think to yourself, where is that nice figure I made of [insert great discovery here], or you wonder which version of a script actually led to a particular figure. Of course, decent file organisation is important, but I think part of the problem also comes from figures inevitably getting duplicated depending on their use. A publisher might need the figure in high-res PNG or PDF or EPS, and once a figure is in Powerpoint you often lose track of where it came from. And then there is the increasing push to put stuff online.

D3 is great at generating figures, independent of the data source, and it’s a natural fit for displaying data on the web. But programmable SVG files seem like something too good to only use on the web. I’ve also come round to the idea that HTML displayed on Chrome in full-screen mode is a better fit for me than Powerpoint. That’s partly due to some of the above problems, but also for the same reasons that people use LaTeX instead of Word. In the end it’s a personal choice, and I’m happy to admit that LaTeX and HTML have their disadvantages compared to all-in-one tools like Word and Powerpoint. I was using my own cobbled-together solution until I discovered I was only reinventing the wheel. There are several solutions out there, but I like reveal.js a lot, and was able to adapt it to look suitably similar (without removing all of its innovative concepts!) to our department’s Powerpoint template pretty easily.

Anyway, now that most of my workflow can use named files for displaying graphics (presentations and text documents) I decided I wanted a way to have my scripts output data for plotting (so I’m programming language neutral), plot it with D3, and as automatically-as-possible generate identical versions in SVG, PDF and PNG. SVG is for the web or posters (i.e. Illustrator), PDF is for LaTeX and PNG is for, well, everything else, for example Powerpoint, which (no man being an island), we can never escape.

This is how it works:

  • The data sits in a CSV file.
  • An appropriate HTML file contains the D3 plotting code. So far I use one HTML file per plot; it seems every figure is different enough to warrant that, but of course some bits could be put into a common .js file to avoid duplication.
  • I run a single Python script with three arguments: python outputfiles.py [source html file] [html element name] [output prefix]
  • I then end up with the three files: [image].svg, [image].pdf, [image].png

    So, what’s going on in this Python script?

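    import subprocess
    import sys

    import cairosvg

    source = sys.argv[1]   # HTML file containing the D3 plotting code
    element = sys.argv[2]  # name of the element to extract
    target = sys.argv[3]   # output prefix

    # Run extract.js in PhantomJS and capture the SVG markup it prints.
    # The file is overwritten on each run, so old output never gets appended to.
    with open(target + ".svg", "w") as svg_file:
        subprocess.call(["../bin/phantomjs", "../lib/extract.js", source, element], stdout=svg_file)

    # Convert the SVG to PNG and PDF with CairoSVG
    cairosvg.svg2png(url=target + ".svg", write_to=target + ".png")
    cairosvg.svg2pdf(url=target + ".svg", write_to=target + ".pdf")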

    PhantomJS is used to run a piece of Javascript (extract.js) without a browser. This Javascript can extract any named element in the page and write it to a file, in my case an SVG file. Then, the cairosvg module (which requires the pycairo module; I don’t want to give any advice on adding modules to your Python environment as I’m not a Python guru and spent a lot of time messing around until it was installed… Google is your friend, as they say) lets us write a PNG and PDF file.

    One very important point: do your SVG styling using .style("attribute", "value") function calls and not using style sheets. If you use style sheets, your SVG is being styled by the page and the styles are not integrated into the SVG element, which becomes your file, and which will end up looking very different from what you see on the webpage.

    Finally, I have everything set up in a slightly pedantic folder structure (I’ve lost count of how many new miracle folder structures I’ve come up with, but this is perhaps the most all-encompassing yet) which looks like:

    • bin
      • Executable code, i.e. phantomjs
    • data
      • Data, regardless of whether it’s raw or processed from some script (there’s a big gray area in between those two seemingly distinguishable things)
    • figures
      • One folder per figure, usually containing an html file for plotting and the files output from the script above. But I also put readymade figures, and videos (see blog post on .webm format***) in here.
    • lib
      • Common files, at the moment all Javascript
    • presentations
      • Presentations, in the case of Impress.js (see below), this is a simple HTML file which can reference the figures using simple relative paths
    • scripts
      • Scripts for doing stuff – e.g. the Python script for outputting the images and a script for generating screenshots from SUMO. I suppose the distinction between this and lib is that things in lib should be referenced whereas things in here should be run.
    • text
      • Editable text – LaTeX files live in here
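
If you want to try the same layout, the skeleton is quick to create from a script. A minimal sketch, using the folder names from the list above; the function name create_skeleton and the project name in the usage note are made up for illustration.

```python
import os

# Folder names taken from the structure described above
FOLDERS = ["bin", "data", "figures", "lib", "presentations", "scripts", "text"]


def create_skeleton(root):
    """Create the project folders under root and return their paths."""
    created = []
    for name in FOLDERS:
        path = os.path.join(root, name)
        # exist_ok=True makes the call safe to repeat on an existing project
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

Calling create_skeleton("my-thesis") creates the seven folders under my-thesis and leaves anything already in them untouched.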

    So now, in theory, I can very quickly establish what data the figures in my thesis and presentations are based on, and always have the latest figure for a particular topic to hand. One advantage of keeping as many things as possible in text is that you can use the same editor for all of them, whether it be a text editor, an IDE, or a mobile app (I use PlainText on the iPad).

    In case this system resonates with you in any way, you can download a sample archive containing all the necessary bits and pieces (apart from the additions to your Python environment) here. Please note that the PhantomJS/CairoSVG part is only tested on Mac and the PhantomJS binary under /bin is the Mac version. In any case it’s probably worthwhile updating PhantomJS, reveal.js and D3.js to the latest versions after you download.

    Generate a movie from SUMO

    SUMO, the open-source traffic simulation tool, does not have a simple option to output a movie. Using TraCI, the TCP-based control interface for SUMO, you can quickly generate a set of images from your simulation:

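    import subprocess
    import sys

    import traci

    # Change if necessary; must match the remote port SUMO listens on
    PORT = 8816

    sumoProcess = subprocess.Popen("C:\\path\\to\\sumo-gui \"C:\\path\\to\\simfile.sumocfg\"", shell=True, stdout=sys.stdout)
    print("SUMO Started")

    # Connect to the running simulation
    traci.init(PORT)

    # Please note these numbers are x100 milliseconds only because the SUMO simulation
    # is set to have timesteps of 0.1s - i.e. the numbers are not in time but in timesteps
    START_STEP = 25000  # movie generation start
    END_STEP = 26000    # movie generation end (placeholder: the original value was lost, set your own)

    step = 0
    while step < END_STEP:
        traci.simulationStep()
        if step > START_STEP:
            traci.gui.screenshot("View #0", "..\\figures\\figurepath\\sumovideo\\sumo2Dvideo" + str(step) + ".png")
        step += 1

    traci.close()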


    Once this is done, you’ll have a set of images. Just open the first one in Virtualdub Mod, and it will show you all of them as a movie which you can output to the format of your choice – for example .webm.

    Generating HTML 5 videos for Chrome (with Virtualdub Mod)

    I finally took the time to educate myself on the virtues of HTML 5 video. I’ve been doing presentations in HTML for a while now, but had been using browser plugins, which were awkward. Chrome’s preferred format is “VP8” video inside a container called “webm”. I often prepare videos for presentations with Virtualdub Mod, as it’s an extremely fast and simple (and free) way to edit AVI video file(s) or image sequences into a compressed movie. And you can even generate modern .webm videos with it; here’s how:

    1. Download and install the Video for Windows VP8 codec. You can ignore the warning about 64-bit systems: although it’s probably true that Media Player, 64-bit Powerpoint etc. won’t cope, VirtualDub is 32-bit and can use the codec. There is a VP8 DirectShow Filter available for playback on more modern video software (Video for Windows is, err, old).
    2. Edit video as normal. Go to File… Save As, as normal. Select the Matroska (.mkv) container, Full Processing mode and under Configure, select the Google VP8 codec. You can change the target bit rate if desired, but the default was fine for me.
    3. Save! Be sure to (re)name the file with a .webm extension – the .webm container is based on a subset of the Matroska format, which is why saving as .mkv and renaming works.

    Another way to prepare .webm files if you are just doing a conversion is to use VLC’s Convert/Save function.