It’s been a little over five years since I first published decline, a functional and UNIXy command-line parser for Scala. (If you’re not familiar with decline, the documentation might help.)
In distributed systems, it’s extraordinarily common to want to split a large dataset across some number of physical shards or partitions. This is commonly done by taking the key, hashing it, and then taking the hash modulo the number of partitions.
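As a rough illustration of the hash-then-modulo scheme, here is a minimal sketch in Scala. The names `Partitioner` and `partitionFor` are hypothetical, and `String.hashCode` is only a stand-in for whatever hash function a real system would choose.

```scala
object Partitioner {
  // Hypothetical helper: maps a key to a partition index in [0, numPartitions).
  // String.hashCode is an assumption made for illustration; production systems
  // often pick a different, better-distributed hash function.
  def partitionFor(key: String, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    // hashCode can be negative, and % would then yield a negative remainder,
    // so floorMod is used to keep the result non-negative.
    Math.floorMod(key.hashCode, numPartitions)
  }
}
```

The same key always lands on the same partition for a fixed partition count, which is what lets readers and writers agree on where a record lives without coordinating.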
Today, I’m delighted to announce the 0.2 release of the coast project: a high-level streaming toolkit written in Scala. coast is designed around Kafka’s partitioned log model, and supports complex streaming topologies with unusually strong messaging guarantees and no need for a central coordinator. The current release includes a new backend that compiles to Samza and supports exactly-once semantics for messages and state, support for cyclic dataflow graphs, and a bunch of improvements to the core library and documentation.
Exactly-once messaging is something of a holy grail in the Kafka ecosystem – widely sought-after but rarely encountered. There are a handful of systems that promise exactly-once semantics, but none of them are a general-purpose solution: they’re often too task-specific, too heavyweight, or too broken, and sometimes all three. Complicating the picture is the fact that exactly-once message delivery is, in general, impossible.