An analysis of Lincoln's words

2013-02-12 7 min read

    On Saturday, I finished Team of Rivals and, while looking at my calendar, noticed that Lincoln’s birthday was also this week. What better way to celebrate than to analyze his speeches and letters? I downloaded the seven-volume set of his speeches, letters, and essays from Project Gutenberg and spent a few hours on Sunday cleaning the text and writing a parsing script. On Monday, I started analyzing the text to see if I could make sense of it.

    I was able to get 1,458 documents containing almost 16,500 sentences and a little over 547,000 words. I tried getting the date each letter was written or speech was given but was only able to get it for 60% of the documents. That was enough to get some insights.
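    For readers who want to try something similar, here is a minimal sketch of how counts like these and a per-document date could be pulled out of plain text. It is not the script from the repo: the function names, the naive sentence splitter, and the assumption that dates look like "June 4, 1860" are all mine, and a date format that only covers some documents is one plausible reason a chunk of them end up undated.

```python
import re

# Assumption: dates appear as "Month D, YYYY" somewhere in the document text.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_RE = re.compile(rf"\b({MONTHS})\s+(\d{{1,2}}),\s+(18\d{{2}})\b")


def extract_year(text):
    """Return the year of the first 'Month D, YYYY' date found, or None."""
    match = DATE_RE.search(text)
    return int(match.group(3)) if match else None


def count_sentences(text):
    """Naive count: split after '.', '!' or '?' followed by whitespace.

    Abbreviations such as 'A. Lincoln' will be over-counted.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def count_words(text):
    """Count runs of letters (and apostrophes) as words."""
    return len(re.findall(r"[A-Za-z']+", text))


def corpus_stats(documents):
    """Summarize a list of document strings."""
    dated = sum(1 for d in documents if extract_year(d) is not None)
    return {
        "documents": len(documents),
        "sentences": sum(count_sentences(d) for d in documents),
        "words": sum(count_words(d) for d in documents),
        "dated_share": dated / len(documents) if documents else 0.0,
    }
```

    Calling corpus_stats on the list of parsed documents gives the document, sentence, and word totals plus the share of documents where a date was found.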

    Number of speeches/letters by year

    I suspect a lot of his early writings and speeches were lost, since they just weren't preserved as well as his later speeches and letters.

    Trend of phrases

    I wanted to examine the phrases that he most commonly used over time in order to see whether there were any noticeable changes and whether they meant something. Turns out there was some interesting stuff here that's highlighted in green.

    • Slavery - There are references to slavery across the entire date range with the Dred Scott decision and the Missouri Compromise appearing as common phrases in the 1850s.
    • Civil War Generals - You can trace the careers of the generals during the Civil War based on their mentions. General Hooker was mentioned in 1862 and 1863; General Meade in 1863 and 1864; and General Grant in 1864 and 1865. This echoes history: General Meade replaced General Hooker in June 1863, and General Grant took command of the Union's western armies in October 1863 before becoming general-in-chief of all Union armies in March 1864.
    • The Presidency - When Lincoln was elected president in 1860, he started finishing his letters with the phrase "Lincoln, President of." During the presidency we also see mentions of his cabinet: Stanton and Seward.

    *The table below was generated by taking the top 20 three-word phrases used in each year range and consolidating them into a top 100 list across the entire dataset. The X indicates that the phrase was in the top 20 three-word phrases for that year range. I highlighted the interesting rows in green.

    the united states             XXXXXXXXX
    of the united                 XXXXXXXXX
    i do not                      XXXXXXXXX
    the secretary of              XXXXXXX
    secretary of war              XXXXXX
    in regard to                  XXXXXXXX
    the people of                 XXXXXXXXX
    of the people                 XXXXXXXX
    president of the              XXXXXXXXX
    in favor of                   XXXXXXXX
    my dear sir                   XXXXXXXX
    as well as                    XXXXXXXX
    so far as                     XXXXXXXXX
    dred scott decision           X
    there is no                   XXXXXXXXX
    by the president              XXXXXXXX
    the supreme court             XXXXX
    united states and             XXXXXXXX
    of the union                  XXXXXXXXX
    that it is                    XXXXXXXXX
    it is a                       XXXXXXXX
    that judge douglas            X
    the dred scott                X
    that there is                 XXXXXXXXX
    institution of slavery        XXXX
    secretary of state            XXXXXXXX
    the missouri compromise       XX
    to say that                   XXXXXXXX
    of the state                  XXXXXXXXX
    the state of                  XXXXXXXXX
    of the government             XXXXXXXXX
    major general mcclellan       XXX
    of the country                XXXXXXXX
    secretary of the              XXXXXXX
    of the army                   XXXXXX
    it is not                     XXXXXXXXX
    of the potomac                XXXXX
    part of the                   XXXXXXXX
    one of the                    XXXXXXXX
    united states to              XXXXXXX
    washington d c                XXXXX
    house of representatives      XXXXXXXXX
    as to the                     XXXXXXXXX
    harper s ferry                XXXXX
    the public safety             XXXX
    major general hooker          XX
    the gentleman from            XXX
    lieutenant general grant      XX
    major general halleck         XXX
    major general meade           XX
    of the enemy                  XXXXXX
    the union and                 XXXXXXX
    the day of                    XXXXXXXXX
    the president of              XXXXXXXX
    the rio grande                X
    the senate and                XXXXXXX
    to the senate                 XXXXXX
    army of the                   XXXXXX
    city point va                 XX
    and house of                  XXXXX
    executive mansion washington  XXXXX
    of the treasury               XXXXXX
    of the secretary              XXXXX
    of the bank                   XX
    of the public                 XXXXXX
    of the war                    XXXXXXX
    yours very truly              XXXXXX
    as may be                     XXXXXXXX
    he did not                    XXXX
    lincoln president of          XXXXXX
    m stanton secretary           XXXX
    stanton secretary of          XXXX
    the war department            XXXXXX
    i shall be                    XXXXXXXXX
    william h seward              XXXXXX
    edwin m stanton               XXXX
    for the purpose               XXXXXXXXX
    general grant city            XX
    i have been                   XXXXXXXX
    is to be                      XXXXXXX
    it will be                    XXXXXXXX
    it would be                   XXXXXXXX
    of all the                    XXXXXXXXX
    of the department             XXXXX
    the post office               XXXXXX
    the public lands              XXXXX
    yours of the                  XXXXXXX
    at p m                        XXXX
    grant city point              XX
    h seward secretary            XXXXXX
    i have no                     XXXXXXXXX
    in relation to                XXXXXXXX
    seward secretary of           XXXXXX
    that i have                   XXXXXXXXX
    as follows to                 XXXXX
    dear sir yours                XXXXXX
    sir yours of                  XXXXXXX
    dear sir i                    XXXXXXX
    ought to be                   XXXXXXXXX
    of the is                     XXXXXXXX
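
    The table above could be produced along these lines. This is a rough sketch rather than the code from the repo: the function names and the fixed-width year buckets (range_size) are assumptions of mine, and the consolidation step here simply takes the union of each bucket's top phrases instead of trimming to a top 100. The tokenizer keeps only runs of letters, which would explain phrases like "harper s ferry" and "washington d c" in the table.

```python
import re
from collections import Counter, defaultdict


def trigrams(text):
    """Lowercase the text, keep runs of letters, and return three-word phrases."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def top_phrases_by_range(dated_docs, range_size=5, top_n=20):
    """Group (year, text) pairs into year buckets and keep each bucket's top phrases.

    range_size is an assumed bucket width; the post does not say which
    year ranges were used.
    """
    counts = defaultdict(Counter)
    for year, text in dated_docs:
        start = year - year % range_size  # first year of the bucket
        counts[start].update(trigrams(text))
    return {
        start: {phrase for phrase, _ in counter.most_common(top_n)}
        for start, counter in counts.items()
    }


def presence_table(top_by_range):
    """Map each phrase to one cell per bucket: 'X' if it made that bucket's top list."""
    ranges = sorted(top_by_range)
    phrases = sorted(set().union(*top_by_range.values()))
    return {
        phrase: ["X" if phrase in top_by_range[r] else "" for r in ranges]
        for phrase in phrases
    }
```

    Feeding the dated documents through top_phrases_by_range and then presence_table gives one row per phrase with an X for every year range in which it ranked.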

    Phrase word clouds

    I tried visualizing the table above as word clouds, but in hindsight I don't think it was the best way to display the data. It did give me an excuse to play around with the D3 library, though.

    As usual, the code’s up on GitHub.