Midterm-project.github.io

Midterm Project DHML

Midterm Project

Amanda Lin, Anna Muyan Li

Finding the locations or countries in the poems of Marianne Moore using machine learning and natural language processing.

  1. Word2Vec worked well. Our own dataset of poems was small, so we used a pre-trained model loaded through Gensim that was trained on Google News data. That much larger corpus includes place names.
    This allowed us to find associations between a particular place name and the objects associated with it. We defined an association function that goes through the list of poems and returns the words with a high association to that place name, as well as the poems in which that place name appears.
  2. Our LDA model did not work. We could not find good values for the number of passes and the number of topics. Additionally, because we used only the list of poems as our data, the model could not determine which nouns were location names.
    To improve this model, we could try using a pre-trained model.
  3. If we had more time, we would try Named Entity Recognition (NER). This would let us recognize which nouns are proper place names, so that we could identify the locations.
    This could probably work hand in hand with our Word2Vec model. Once we had found all the place names, we could plug them into our association function to find the related features or locations in that country or near that location.
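
The association step from item 1 can be sketched roughly as follows. This is a minimal illustration, not our actual code: the function name `associations`, the `threshold` value, the toy vectors, and the two sample poems are all made up for the example. In the project the vectors came from the pre-trained Google News model; one way to load such a model with Gensim is `gensim.downloader.load("word2vec-google-news-300")`, whose `similarity` method could stand in for the `cosine` helper below.

```python
import math
import re

# Toy 3-dimensional vectors standing in for a pre-trained embedding.
# These numbers are invented for illustration only.
TOY_VECTORS = {
    "egypt": [0.9, 0.1, 0.0],
    "pyramid": [0.8, 0.2, 0.1],
    "nile": [0.85, 0.15, 0.05],
    "teacup": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def associations(place, poems, vectors, threshold=0.9):
    """Map each poem title to the words in it that are similar to `place`.

    `poems` is a dict of title -> text. Words missing from `vectors` are
    skipped. Poems with no word above `threshold` are left out.
    """
    place = place.lower()
    place_vec = vectors[place]
    results = {}
    for title, text in poems.items():
        # Lowercase and tokenize, then drop the place name itself.
        words = set(re.findall(r"[a-z]+", text.lower())) - {place}
        scored = [(w, cosine(place_vec, vectors[w])) for w in words if w in vectors]
        hits = sorted((p for p in scored if p[1] >= threshold), key=lambda p: -p[1])
        if hits:
            results[title] = hits
    return results


# Hypothetical sample texts, not lines from Marianne Moore's poems.
sample_poems = {
    "Poem A": "The pyramid by the Nile in Egypt",
    "Poem B": "A teacup on the table",
}
found = associations("Egypt", sample_poems, TOY_VECTORS)
for title, hits in found.items():
    print(title, [word for word, _ in hits])
```

With place names supplied by an NER step instead of by hand, the same function could be called once per detected place name.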

Our resources

NLTK Library
Gensim Library
spaCy Library
re Library

View Our Code!