In distributed systems, it’s extraordinarily common to want to split a large dataset across some number of physical shards or partitions. This is commonly done by taking the key, hashing it, and then taking the hash modulo the number of partitions.
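A minimal sketch of that hash-then-modulo scheme, in Python. The function name `partition_for` and the choice of SHA-256 are assumptions made for illustration, not something taken from the post. Any hash that gives the same result on every machine would do.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in the range [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Use a stable hash: Python's built-in hash() is salted per process
    # for strings, so it would assign keys differently on each node.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce it
    # modulo the partition count.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Calling `partition_for("user-42", 8)` always returns the same index between 0 and 7. Note that changing `num_partitions` reassigns most keys to a different partition.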
Well, time flies! It’s well over a year ago that I promised to continue my writeups on the ethereum-haskell project with some notes on the Merkle trie implementation – but it’s been a long time since I’ve had time or inclination to chip at that project, and my notes have been collecting dust.
Today, I’m delighted to announce the 0.2 release of the coast project: a high-level streaming toolkit written in Scala. coast is designed around Kafka’s partitioned log model, and supports complex streaming topologies with unusually strong messaging guarantees and no need for a central coordinator. The current release includes a new backend that compiles to Samza and supports exactly-once semantics for messages and state, support for cyclic dataflow graphs, and a bunch of improvements to the core library and documentation.
Exactly-once messaging is something of a holy grail in the Kafka ecosystem – widely sought-after but rarely encountered. There are a handful of systems that promise exactly-once semantics, but none of them are a general-purpose solution: they’re often too task-specific, too heavyweight, or too broken, and sometimes all three. Complicating the picture is the fact that exactly-once message delivery is, in general, impossible.
This is my second post on a Haskell codebase I’ve been working on – a reimplementation of the Ethereum cryptocurrency / application platform in Haskell. (If you missed it, you might want to read the original post first.) Like last time, we’ll take some ideas from the original project, look at how they translate into Haskell, and compare things with one of the official implementations.
A lot of folks have been complaining about the gap between knowing elementary Haskell — the sort you learn in language tutorials — and real-world programming. Even when you’re comfortable with the core language, it’s not always obvious where to go from there, or how to translate those ideas into a larger system.