Re: [Snowball-discuss] Spanish stemmer with accents stripped before stemming

From: Martin Porter (martin.porter@grapeshot.co.uk)
Date: Mon May 21 2007 - 12:49:51 BST


I find I can't connect to 200.67.231.185, so I'm not too sure what's
going on here. Obviously, for us it is a bit easier to look at the problem
from the Snowball angle, rather than think about the generated Java
after it's been put inside Lucene! As far as the Snowball script is
concerned, I believe you could strip out the accents from the source,
eliminate the duplicate strings in the amongs(..) that would result, and
recompile, getting the effect you want.
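
As a rough illustration of the "strip accents, then eliminate duplicates"
step, here is a minimal Python sketch. It is not part of Snowball: the
function names and the sample suffix list are hypothetical, and the choice
to drop only the acute accent and diaeresis (keeping the tilde so that "ñ"
survives) is an assumption about what "accents" should cover.

```python
import unicodedata

# Combining marks to drop: acute accent (U+0301) and diaeresis (U+0308).
# The tilde (U+0303) is deliberately kept so "ñ" survives; this is an
# assumption, not something the Snowball script prescribes.
STRIP_MARKS = {"\u0301", "\u0308"}


def strip_accents(text):
    """Return text with acute accents and diaereses removed."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in STRIP_MARKS)
    return unicodedata.normalize("NFC", kept)


def dedupe_suffixes(suffixes):
    """Strip accents from each string and drop duplicates, keeping order."""
    seen = set()
    result = []
    for suffix in suffixes:
        plain = strip_accents(suffix)
        if plain not in seen:
            seen.add(plain)
            result.append(plain)
    return result


# Hypothetical sample, not the actual contents of the Spanish stemmer.
sample = ["ación", "acion", "ía", "ia", "año"]
print(dedupe_suffixes(sample))  # ['acion', 'ia', 'año']
```

The strings from each among in the script would be passed through
something like dedupe_suffixes() by hand or by a small script before
recompiling; the input text would need the same accent stripping applied
before it reaches the stemmer.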

(Incidentally, I have hit this problem with Spanish stemming before, but
it was a long while ago, before the development of Snowball.)

Also, I'm not familiar with the generated Java output. I don't know
if Richard Boulton (who wrote the Java code generator) has anything more
to add at this stage?

Martin


