Python-style string split (and strip / trim) in Haskell
Haskell’s an incredible language but I’m repeatedly struck by (what I perceive as) gaps in its standard libraries (though as proven yesterday I could know them better). For example, there aren’t any cross-platform file path manipulation routines in the style of python’s os.path (though not for long, thanks to Neil Mitchell). Another lacking area seems to be string manipulation.
Today, I needed to split a string up according to some token, where that token is itself a string, not a single character. Thus, a split in the style of python’s strings’ built in split() method. That is, I wanted to be able to do this:
*StringSplit> split ", " "one, two, three, four"
["one","two","three","four"]
*StringSplit> split "an" "banana"
["b","","a"]
To my amazement, this isn’t a standard library function, and I couldn’t find hundreds of examples of how to do it via google. I couldn’t even find one! I could find a number of variants where the token is just a single Char, usually utilising takeWhile and the like. Gladly this led me to a reasonable (I hope) solution. This example splits on Chars but in the comments I was introduced to unfoldr, which looked tantalisingly close to what I wanted. All I needed to do now was write the function unfoldr calls. About half an hour and much hoogling later, hey pesto:
module StringSplit where
import List
-- | String split in style of python string.split()
split :: String -> String -> [String]
split tok splitme = unfoldr (sp1 tok) splitme
where sp1 _ "" = Nothing
sp1 t s = case find (t `isSuffixOf`) (inits s) of
Nothing -> Just (s, "")
Just p -> Just (take ((length p) - (length t)) p,
drop (length p) s)
All of the complexity is in sp1 (short for “split once”), which performs a single split. sp1’s type is String -> String -> Maybe (String, String); the section (sp1 tok) then has the right type for unfoldr (sections just seem to be cropping up everywhere lately, eh?). The point of sp1 is to split s at the first occurrence of t. It does this by using inits to build all prefices of s and finding the first one having t as a suffix. If found, it splits the string at the token, throws the token away, and return the (before, after) pair. If the token isn’t found, it just returns s paired with an empty string; that gets passed by unfoldr to sp1 on the next call, causing sp1 to return Nothing, causing unfoldr to finish – whew!
That last bit sounds a bit complicated, but it was necessary to fix a bug present in my first implementation of sp1:
sp1' t s = case find (t `isSuffixOf`) (inits s) of
Nothing -> Nothing
Just p -> Just (take ((length p) - (length t)) p,
drop (length p) s)
This misses the final substring, so split "n" "banana" would return ["ba", "a"] instead of ["ba", "a", "a"]. Hence the whole empty string/Nothing/stop machinery.
I could not have written this code two weeks ago, I think. Thus, I’m sure there are better ways to do this, and would love to hear them – and any criticism. I was a wee bit worried about using inits and find in this way for finding a substring (no standard substring function?). Haskell’s lazy nature ought to help me here though: inits will only build the entire list of prefices if t isn’t in s at all. For anything I’m planning to use this for, it ought to be fine; it’s probably not the best general way to do it though.
Postscript
For good measure, since I had to implement it later that evening, here’s a string.strip() (aka trim):
strip :: String -> String
strip s = dropWhile ws $ reverse $ dropWhile ws $ reverse s
where ws = (`elem` [' ', '\n', '\t', '\r'])
I’m particularly proud of the ws part. :-) Look at me, all full of Haskell pride. ;-)
Postpostscript
I realised I can generalise split beyond strings, to work on lists of anything of class Eq:
-- | A function inspired by python's string.split(). A list is split
-- on a separator which is itself a list (not a single element).
split :: Eq a => [a] -> [a] -> [[a]]
split tok splitme = unfoldr (sp1 tok) splitme
where sp1 _ [] = Nothing
sp1 t s = case find (t `isSuffixOf`) $ (inits s) of
Nothing -> Just (s, [])
Just p -> Just (take (length p - length t) p,
drop (length p) s)
*StringSplit> split [2,3] [1,2,3,4,3,2,1,2,3,4]
[[1],[4,3,2,1],[4]]
9 Responses to “Python-style string split (and strip / trim) in Haskell”
Leave a reply
You can use HTML, but you don't have to. Formatting tips (for code, quotes, etc.) here.





I think most people just use Text.Regex.splitRegex.
Nice — thanks!
I should definitely put aside some time to get to know the standard libraries better – that much is becoming clear. Still, it was a good exercise, and now at least there’s a good hit if you google for, say, “python string split in haskell”. :-)
Cheers!
For reference, here’s my attempt at the same function. I’m not sure it’s any more elegant, but may be useful if only because it takes a slightly different approach.
split tok = spl [] where
spl ws [] = [ws]
spl ws vs | tok `isPrefixOf` vs = ws : spl [] (drop (length tok) vs)
spl ws (v:vs) = spl ( ws ++ [v] ) vs
Small style point:
Since you don’t change the ‘t’ parameter passed through ’sp1′, you might as well just use ‘tok’ from the outer function, inside of ’sp1′. One less thing to keep track of when reading the function later. So you get:
Good point, Kurt. I think I originally had sp1 factored out into a standalone function. Thanks!
Mattew,
John Goerzen has a library called MissingH. It contains all sorts of useful items that are not present in the standard Haskell libraries. One of which is a Data.String library containing what you have implemented (and more):
http://software.complete.org/missingh/static/doc/Data-String.html
John has a Python background and you can see the influence just by looking at this library.
Pete
I had the exact same need for a string splitting command a while back, except I wanted to use the command from a shell script. I decided to write my solution in Haskell naturally. My solution is here: http://hpaste.org/1929?lines=true, and the sources are here: (http://kmbphoto.net/strsplit.lhs, http://kmbphoto.net/strjoin.lhs) with pdfs from the literate haskell here: (http://kmbphoto.net/strsplit.pdf, http://kmbphoto.net/strjoin.pdf).
Here are the main functions:
Check it out: Bryan O’Sullivan shows me how it’s done.
Just googled ‘haskell split’ because i needed that functionality, read your post, then came up with my own solution:
Prelude> foldl (\a b -> if b==’n’ then (a ++ [[]]) else (init a) ++ [(last a) ++ [b]]) [[]] “banana”
["ba","a","a"]
Well, not as fancy, but alot smaller.