Friday, February 14, 2014

Hidden Markov Models for Natural Language Tagging, in Haskell

I became intrigued in Hidden Markov Models after reading Kurzweil, who claims they can be used to model the thinking process in the brain. There is much debate about that, but these are interesting AI structures. This page I think has a good introduction.

I was working though the (partial) online book on Natural Language Processing with Haskell, and thought of combining the two. I used Mike Izbicki's hmm library for a one order Hidden Markov Model implementation. Once I initialized the model properly using the training data, I got around 91% accuracy on tagging, which is on par with the rules based approach presented in the nlpwp book.

I used the strategy outline in this paper to deal with unknown words (words in the test set not met in the training set): replace these words with a token that is also used for low frequency words in the training set. So far I've used only one token but I suppose being a bit more fined grained (to distinguish words starting with a capital letter, currency amounts, numbers) will improve results.

Performance is not very good even with some parallelism, so I think I need to spend more time on it, but it's definitely encouraging. It'll be a little bit of time till I have a thinking brain, though, but there is hope!

Sunday, February 02, 2014

Beginning Haskell: the Book!

It's my pleasure to relay the news of the publication of a new Haskell book, Beginning Haskell. It's written by one of EclipseFP contributors, Alejandro, and I've had the pleasure to be a technical reviewer of the book. I learned a lot, so I can heartily recommend it! It takes the reader from the very basics of functional programming to some intermediate and advanced Haskell techniques. Alejandro of course talks about it better than me.

Happy Haskell Reading! Thanks Alejandro for the great work!