This explains (or tries to explain) how I deal with finding plural words in my various lexicons. I wanted to do this so my polygon solver can provide options to exclude plural words, or in some cases, plural words that end in 's'.
My data sources are just lists of words - I have no idea which words in these lists are plurals. So I needed to find a way to automate the process of deciding whether (or not) a word is a plural. My list of 224000 words was way too long to go through by hand.
Then I found this (rather amazing) Python module, inflect.py. Its engine object has a singular_noun() method which indicates whether a potentially plural noun has a singular version. For example, singular_noun("brethren") gives me "brother", which is great. However, pure inflection does not always work: e.g. singular_noun("asparagus") gives me "asparagu", which is not a word (at least not one that I'm familiar with).
To overcome this, I additionally check that the result from singular_noun(potential_plural) is also in my dictionary. If it is, then potential_plural is added to my list of plural nouns.
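The check described above can be sketched roughly as follows. This is only an illustration, not the actual script: the function name find_plurals and the sample word list are made up for the example, and the singularizer is passed in as a parameter so the sketch does not depend on inflect being installed. With inflect, singular_noun() returns False when it finds no singular form, which is why the result is tested for truthiness first.

```python
from typing import Callable, Iterable, List, Union


def find_plurals(
    words: Iterable[str],
    singularize: Callable[[str], Union[str, bool]],
) -> List[str]:
    """Return the words whose proposed singular is also in the word list.

    `find_plurals` is a hypothetical helper name used for this sketch.
    """
    lexicon = set(words)  # a set keeps membership tests fast for a large list
    plurals = []
    for word in sorted(lexicon):
        singular = singularize(word)
        # A falsy result means the singularizer found no singular form.
        # A non-word result such as "asparagu" fails the lexicon check.
        if singular and singular in lexicon:
            plurals.append(word)
    return plurals


if __name__ == "__main__":
    try:
        import inflect  # third-party package: pip install inflect
    except ImportError:
        inflect = None

    if inflect is not None:
        engine = inflect.engine()
        sample = ["brother", "brethren", "asparagus", "cat", "cats"]
        print(find_plurals(sample, engine.singular_noun))
```

Passing the singularizer in as an argument also makes it easy to try the same dictionary check with a different inflection library or a hand-written rule.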
This seemed to work out pretty well, except that I noticed that a small fraction (~0.1%) of the resulting words (mostly ending with 'ss') were clearly not plurals. After a little clean-up, I have not had to greatly modify the list thus obtained (61500 plural words), but it has undergone some improvements thanks to helpful suggestions from people out there. If you do find any anomalies then *please* let me know!
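One way such a clean-up could be organised (purely a sketch, and not necessarily how the list was actually tidied) is to set aside every candidate ending in 'ss' for a manual look, since that is where most of the false positives showed up. The helper name split_for_review is hypothetical.

```python
from typing import Iterable, List, Tuple


def split_for_review(candidates: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split candidate plurals into (kept, needs_review).

    Words ending in 'ss' go to needs_review for a manual check; everything
    else is kept as is. This is a heuristic, not a rule about English plurals.
    """
    kept: List[str] = []
    needs_review: List[str] = []
    for word in candidates:
        if word.endswith("ss"):
            needs_review.append(word)
        else:
            kept.append(word)
    return kept, needs_review
```

The review pile should be small enough to check by hand, unlike the full word list.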
All in all, I think the algorithm for detecting plural words is highly (99%) accurate but by no means foolproof.
I hope all this makes at least some sort of sense!