CPM 2017

28th Annual Symposium on Combinatorial Pattern Matching

Warsaw, Poland, July 4-6, 2017


Papers and Proceedings

Local information


Keynote speakers

Artur Jeż (University of Wrocław, Poland)

Recompression of SLPs

In this talk I will survey the recompression technique in case of SLPs. The technique is based on applying simple compression operations (replacement of pairs of two different letters by a new letter and replacement of maximal repetition of a letter by a new symbol) to strings represented by SLPs. To this end we modify the SLPs, so that performing such compression operations on SLPs is possible. For instance, when we want to replace ab in the string and SLP has a production X → aY and the string generated by Y is bw, then we alter the rule of Y so that it generates w and replace Y with bY in all rules. In this way the rule becomes X → abY and so ab can be replaced, similar operations are defined for the right sides of the nonterminals. As a result, we are interested mostly in the SLP representation rather than the string itself and its combinatorial properties. What we need to control, though, is the size of the SLP. With appropriate choices of substrings to be compressed it can be shown that it stays linear.

The proposed method turned out to be surprisingly efficient and applicable in various scenarios: for instance it can be used to test the equality of SLPs in time O(n log N), where n is the size of the SLP and N the length of the generated string; on the other hand it can be used to approximate the smallest SLP for a given string, with the approximation ratio O(log(n/g)) where n is the length of the string and g the size of the smallest SLP for this string, matching the best known bounds.

Giovanni Manzini (University of Eastern Piedmont and IIT-CNR, Italy)

Wheeler Graphs: Variations on a Theme by Burrows and Wheeler

The famous Burrows-Wheeler Transform was originally defined for single strings but variations have been developed for sets of strings, labelled trees, de Bruijn graphs, alignments, etc. In this talk we propose a unifying view that includes many of these variations and that we hope will simplify the search for more.

Somewhat surprisingly we get our unifying view by considering the Nondeterministic Finite Automata related to different pattern-matching problems. We show that the state graphs associated with these automata have common properties that we summarize with the concept of a Wheeler graph1. Using the notion of a Wheeler graph, we show that it is possible to process strings efficiently even if the automaton is nondeterministic. In addition, we show that Wheeler graphs can be compactly represented and traversed using up to three arrays with additional data structures supporting efficient rank and select operations. It turns out that these arrays coincide with, or are substantially equivalent to, the output of many Burrows-Wheeler Transform variants described in the literature.

This is joint work with Travis Gagie and Jouni Sirén.

1On many occasions Mike Burrows stated that the original idea of the transformation is due to David Wheeler. We therefore decided to name this graph class after this pioneer of computer science.

Marcin Mucha (University of Warsaw, Poland)

Shortest Superstring

In the Shortest Superstring problem (SS) one has to find a shortest string s containing given strings s1,…,sn as substrings. The problem is NP-hard, so a natural question is that of its approximability.

One natural approach to approximately solving SS is the following GREEDY heuristic: repeatedly merge two strings with the largest overlap until only a single string is left. This heuristic is conjectured to be a 2-approximation, but even after 30 years since the conjecture has been posed, we are still very far from proving it. The situation is better for non-greedy approximation algorithms, where several approaches yielding 2.5-approximation (and better) are known.

In this talk, we will survey the main results in the area, focusing on the fundamental ideas and intuitions.