Generating fake words
2020-09-23
Selected words generated by some code I stayed up late to write:
- ovem
- derombithere
- tumby
- itopod
- beimaran
- hators
- tathe
- keed
- thinche
Except for really short ones, almost all of the generated words are pronounceable, which is cool! Up next: upgrading from a bigram to a trigram model to see if the words improve, maybe generalizing to n-grams, and training on different languages to see if humans can recognize what language a fake word is in.
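The post doesn't show the bigram code itself. As a rough illustration, here is a minimal Python sketch of a character-level bigram model. It assumes words are built letter by letter, with each letter sampled in proportion to how often it followed the previous letter in the training words. The `^`/`$` boundary markers and the names `train_bigrams` and `generate` are hypothetical, not taken from the original source.

```python
import random
from collections import Counter, defaultdict

# Hypothetical boundary markers, assumed never to appear inside a real word.
START, END = "^", "$"

def train_bigrams(words):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for word in words:
        chars = [START] + list(word) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, rng=random):
    """Walk the chain from START, sampling each next character by frequency."""
    out = []
    current = START
    while True:
        followers = counts[current]
        current = rng.choices(list(followers), weights=list(followers.values()))[0]
        if current == END:
            return "".join(out)
        out.append(current)

model = train_bigrams(["the", "there", "then", "tumble", "bother"])
print(generate(model, random.Random(0)))
```

Because `^` and `$` are counted like ordinary symbols, the model also learns which letters tend to start and end words, which is how it decides when a word is finished. It only ever looks one letter back, which fits with the really short outputs being the weakest.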
Ok, I generalized the code to generate words from an n-gram model for any n. A trigram model seems to strike a good balance between producing words that aren't nonsense and not overfitting (i.e. not just spitting back words from the training text). Some post-upgrade words:
- thill
- fallince
- caut
- jand
- yed
- ulcaminds
- cill
- priame
- joilve
- fieforthe
- rachat
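One way the generalization to any n could look, again as a hypothetical sketch rather than the author's actual code: the context becomes the previous n - 1 characters instead of just one, and each word is padded at the start with n - 1 `^` markers so the first letters have a context too. The names `train_ngrams` and `generate` and the marker characters are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict

# Hypothetical boundary markers, assumed never to appear inside a real word.
START, END = "^", "$"

def train_ngrams(words, n=3):
    """Map each (n-1)-character context to counts of the character that follows it."""
    counts = defaultdict(Counter)
    for word in words:
        chars = [START] * (n - 1) + list(word) + [END]
        for i in range(n - 1, len(chars)):
            context = tuple(chars[i - n + 1 : i])
            counts[context][chars[i]] += 1
    return counts

def generate(counts, n=3, rng=random):
    """Sample characters one at a time, conditioning on the last n-1 symbols."""
    context = (START,) * (n - 1)
    out = []
    while True:
        followers = counts[context]
        nxt = rng.choices(list(followers), weights=list(followers.values()))[0]
        if nxt == END:
            return "".join(out)
        out.append(nxt)
        # Slide the window: drop the oldest symbol, append the new one
        # (for n == 1 the context stays empty).
        context = (context + (nxt,))[1:]

words = ["the", "there", "then", "tumble", "bother"]
trigrams = train_ngrams(words, n=3)
print([generate(trigrams, n=3, rng=random.Random(s)) for s in range(5)])
```

The overfitting trade-off shows up directly here: once n - 1 is at least as long as the longest training word, every context encodes the entire prefix so far, and the model can only reproduce words it has already seen. Smaller n gives more novel words but also more gibberish, which is why something in the middle, like a trigram model, can work well.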
Spanish:
- señorecísies
- donre
- mentomo
- reyeno
- golunchabe
- wamentó
- hagrarrio
- ínsabocesgo
- halmetigar
- gañor
- órino
- últuro
- antralles
German:
- maler
- ung
- parbsch
- frilleschon
- wein
- trös
- vereitt
- ält
- grich
- zumpfstissel
- zaugn
- bäuer
- lauer
- täur
- magdaß
- chwort
Selected responses:
- "derombitheres, itopods, and beimarans are definitely all extinct genera of animals. derombitheres are a genus of mammals somewhere between horses and elephants, itopods are an extremely bizarre Cambrian clade which are probably a cousin to Hallucigenia (though this is disputed), beimarans are a type of protist that flourished briefly during the Triassic"
- "foucault's homies definitely called him caut back in the day"
- "That’s thill as hell"
- "fallince for sure sounds like a word i wanna learn how to use"
See the source code.
Notes from the future:
The models are trained on the Iliad (I don't know which translation) for English, Don Quixote for Spanish, and Der G'wissenswurm for German.
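The post doesn't say how the books were split into training words. A plausible minimal approach, sketched here purely as an assumption: pull out runs of letters with a Unicode-aware regex so that characters like ñ, é, ä and ß survive, and lowercase everything. The names `WORD_RE`, `corpus_words` and `load_corpus`, and the idea of reading each book from a UTF-8 plain-text file, are all hypothetical.

```python
import re
import unicodedata

# Runs of letters: \w minus digits and underscore. For str patterns, Python's re
# is Unicode-aware by default, so ñ, é, ä, ü and ß count as letters.
WORD_RE = re.compile(r"[^\W\d_]+")

def corpus_words(text):
    """Split raw text into lowercase, letters-only training words."""
    # NFC composes e.g. "e" + a combining accent into a single "é", so
    # decomposed accents aren't treated as word breaks.
    text = unicodedata.normalize("NFC", text)
    return [w.lower() for w in WORD_RE.findall(text)]

def load_corpus(path):
    """Read a UTF-8 plain-text book (hypothetical file path) into a word list."""
    with open(path, encoding="utf-8") as f:
        return corpus_words(f.read())

print(corpus_words("¿Qué tal, Señor? I don't know. Straße 1874"))
# → ['qué', 'tal', 'señor', 'i', 'don', 't', 'know', 'straße']
```

One side effect of this choice: apostrophes split words, so "don't" turns into `don` and `t`, and those fragments end up in the training data as if they were real words.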