Generating fake words

2020-09-23

Selected words generated by some code I stayed up to write:

- ovem
- derombithere
- tumby
- itopod
- beimaran
- hators
- tathe
- keed
- thinche

Except for really short ones, almost all of the generated words are pronounceable, which is cool!

Up next: upgrading from a bigram to a trigram model to see if the words improve, generalizing to n-gram?, training on different languages to see if humans can recognize what language a fake word is in.

Ok, generalized to generate words based on an n-gram model for any n. Using a trigram model seems to be a good balance between getting things that aren't nonsense and not overfitting.

Some post-upgrade words:

- thill
- fallince
- caut
- jand
- yed
- ulcaminds
- cill
- priame
- joilve
- fieforthe
- rachat

Spanish:

- señorecísies
- donre
- mentomo
- reyeno
- golunchabe
- wamentó
- hagrarrio
- ínsabocesgo
- halmetigar
- gañor
- órino
- últuro
- antralles

German:

- maler
- ung
- parbsch
- frilleschon
- wein
- trös
- vereitt
- ält
- grich
- zumpfstissel
- zaugn
- bäuer
- lauer
- täur
- magdaß
- chwort

Selected responses:

- "derombitheres, itopods, and beimarans are definitely all extinct genera of animals. derombitheres are a genus of mammals somewhere between horses and elephants, itopods are an extremely bizarre Cambrian clade which are probably a cousin to Hallucigenia (though this is disputed), beimarans are a type of protist that flourished briefly during the Triassic"
- "foucault's homies definitely called him caut back in the day"
- "That’s thill as hell"
- "fallince for sure sounds like a word i wanna learn how to use"

See the source code.

Notes from the future: The models are trained on the Iliad (I don't know which translation) for English, Don Quixote for Spanish, and Der Gwissenswurm for German.
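
For anyone curious how a character-level n-gram word generator like the one described in this post might work, here is a minimal sketch. It is not the linked source code, only one plausible way to do it. The names `train`, `generate`, `START`, and `END` are made up for illustration, and the sketch assumes a non-empty training word list whose words never contain the `^` or `$` characters.

```python
import random
from collections import Counter, defaultdict

START = "^"  # hypothetical padding symbol marking the start of a word
END = "$"    # hypothetical symbol marking the end of a word


def train(words, n=3):
    """Count which character follows each (n-1)-character context."""
    model = defaultdict(Counter)
    for word in words:
        # Pad so the first letters also have a full-length context.
        padded = START * (n - 1) + word.lower() + END
        for i in range(len(padded) - n + 1):
            context = padded[i:i + n - 1]
            next_char = padded[i + n - 1]
            model[context][next_char] += 1
    return model


def generate(model, n=3, rng=random, max_len=20):
    """Sample one character at a time until END is drawn or max_len is hit."""
    context = START * (n - 1)
    letters = []
    while len(letters) < max_len:
        counts = model[context]
        chars = list(counts)
        # Pick the next character in proportion to how often it was seen.
        next_char = rng.choices(chars, weights=[counts[c] for c in chars])[0]
        if next_char == END:
            break
        letters.append(next_char)
        # Slide the context window forward by one character.
        context = (context + next_char)[1:]
    return "".join(letters)


if __name__ == "__main__":
    # Toy word list, just to show the calls; a real run would use a whole book.
    toy_words = ["anger", "goddess", "danger", "ranger", "address", "dress"]
    model = train(toy_words, n=3)
    print([generate(model, n=3) for _ in range(5)])
```

With `n=2` this is a bigram model and with `n=3` a trigram model. A larger `n` means each context has fewer possible continuations, so the output drifts toward copying real training words, which is the overfitting trade-off mentioned above.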