This explains (or tries to explain) how I deal with finding plural words in my various lexicons. I wanted to do this so my polygon solver can provide options to exclude plural words, or in some cases, plural words that end in 's'.
My data sources are just lists of words - I have no idea which words in these lists are plurals. So I needed a way to automate deciding whether a word is a plural. My list of 224000 words was way too long to go through by hand.
Then I found this (rather amazing) Python module inflect.py. This module has a singular_noun() method which indicates whether a potentially plural noun has a singular version, and returns that singular if so. For example, singular_noun("brethren") gives me "brother". Which is great. However, pure inflection does not always work: e.g. singular_noun("asparagus") gives me "asparagu", which is not a word (at least not one that I'm familiar with).
To overcome this, I additionally check that the result from singular_noun(potential_plural) is also in my dictionary. If it is, then potential_plural is added to my list of plural nouns. I have not (yet) had to manually edit the list thus obtained (61600 plural words). If you do find any anomalies then please let me know.
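
The dictionary check described above could look roughly like the sketch below. This is an illustration rather than my actual script: the function name find_plurals and its parameters are made up for the example, and the singularizing function is passed in as an argument so the logic can be tried without inflect installed. It assumes the function returns a false value when it finds no singular form, which is how inflect's singular_noun() behaves.

```python
def find_plurals(words, singular_noun):
    """Return the words whose singular form is itself in the word list.

    words         -- iterable of words (the lexicon)
    singular_noun -- hypothetical parameter: any callable that returns the
                     singular of a plural noun, or a false value if it
                     finds none (inflect's singular_noun() works this way)
    """
    words = list(words)
    lexicon = set(words)  # set gives fast membership tests on a big list
    plurals = []
    for word in words:
        singular = singular_noun(word)
        # Keep the word only if a singular was produced AND that singular
        # is a real entry in the lexicon (this rejects "asparagu").
        if singular and singular in lexicon:
            plurals.append(word)
    return plurals


# Usage with inflect (assumes the package is installed):
#   import inflect
#   plurals = find_plurals(my_words, inflect.engine().singular_noun)
```

Building the set once matters with a list of this size, since testing membership against a plain list for every word would be very slow.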