Hillary Clinton and Presidential Language Fingerprints

Abstract

“Hillary Clinton & Presidential Language” is designed to answer the question: Does Hillary Clinton Speak Presidential? In the current race for the 2016 presidential election, Hillary Clinton is one of the candidates with the most political experience, yet she belongs to a demographic whose speech has been heavily scrutinized by linguists. As a woman, she is criticized for speaking more tentatively, using the tag questions and hedges that Robin Lakoff identified as markers of “women’s language” in the essay “Language and Woman’s Place” (Lakoff). Hillary’s use of these markers was documented in a 2007 article entitled “Gender Differences in Media Interviews of Bill and Hillary Clinton,” in which Suleiman and O’Connell observed that she used the phrases “so” and “you know” more frequently than her husband Bill (Suleiman and O’Connell). That data, collected from TV interviews, reflects neither the poised politician she is in formal settings nor the monumental career growth she has experienced since the study was published.

Hillary has held the positions of Senator and Secretary of State, and today she is the Democratic frontrunner for the 2016 presidential election. Because of her changing roles as a leader and politician, it is both timely and relevant to analyze her “linguistic fingerprint” to determine whether a woman presidential candidate displays different linguistic characteristics than her counterparts who are men. It is hypothesized that Hillary Clinton’s “linguistic fingerprint” will not reveal a gendered difference between politicians, and that she speaks very similarly to men who are politicians and past presidents. This study should support the viability of a woman president and show that the linguistic critiques of Hillary Clinton in the media are built upon stereotypes and sexism.

Method

This project includes five main corpora, each examined using unsupervised clustering. Each corpus was processed separately, and each speech was broken down into a frequency table. The collection of speech frequency tables provides insight into the high frequency word features of each speaker: function words such as “the,” “of,” “and,” “he,” “she,” and “our,” among many others. The frequencies of these features are then used to identify a “linguistic fingerprint” for each speaker.
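As a quick illustration, the sketch below applies the same idea to a made-up sentence rather than to any of the corpus files; the loop defined later performs these steps for every speech.

sample_text <- "we the people of the united states and the people of the world"
sample_words <- unlist(strsplit(tolower(sample_text), "\\W"))  # split on non-word characters
sample_words <- sample_words[sample_words != ""]               # drop empty strings
sample_freqs_t <- table(sample_words)                          # raw counts
100 * (sample_freqs_t / sum(sample_freqs_t))                   # relative frequencies (percent)

Function words such as “the” dominate the resulting table, which is exactly why they serve as the features of the “linguistic fingerprint.”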

Each speech within the corpora is compared to the others. To compare them, the Euclidean distance between every pair of speeches is calculated. This yields a distance between speeches based upon the similarity or difference in each speaker’s use of high frequency words, their “linguistic fingerprint.” The results are clustered and plotted in a cluster dendrogram to reveal which speakers are similar and which are outliers.
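To show how the distance and clustering steps behave, the toy sketch below uses invented relative frequencies for three hypothetical speakers and four high frequency words; none of these numbers come from the corpora in this study.

toy_m <- rbind(
  SpeakerA = c(the = 6.1, of = 3.2, and = 2.8, we = 1.9),
  SpeakerB = c(the = 6.0, of = 3.1, and = 2.9, we = 2.0),
  SpeakerC = c(the = 4.2, of = 2.0, and = 3.8, we = 0.7)
)
toy_d <- dist(toy_m)          # pairwise Euclidean distances between speakers
toy_cluster <- hclust(toy_d)  # hierarchical clustering (complete linkage by default)
plot(toy_cluster, main = "Toy Cluster Dendrogram")

In the resulting dendrogram, SpeakerA and SpeakerB join at a low height because their word-usage rates are nearly identical, while SpeakerC attaches later as a relative outlier; the corpus dendrograms below are read the same way.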

library(dplyr)

Famous Political Speeches by Presidents and Politicians

This corpus contains twelve speeches by politicians and Presidents.

The local file path to the folder of speeches is saved in the variable corpus_dir. files_v then collects every file in that directory whose name matches the pattern “.txt”. Once both are defined, all twelve speeches exist in one vector of file names.

corpus_dir <- "/Users/emmahimes/Desktop/SCHOOL/FallSemester/TextAnalysis/TextAnalysisWithR/Speech"
files_v <- dir(path = corpus_dir, pattern = ".*txt")

This user defined function builds a table of relative word frequencies for every text. A for loop is the most convenient way to process every text with only a few lines of code. For each file name in the previously defined vector, the loop scans in the text line by line, collapses the lines into a single string, and converts all characters to lowercase. The string is then split on a regular expression matching non-word characters; because strsplit produces a list, the list must be unlisted to become a vector. Finally, all blank elements are removed and the words are tabled. After tabling the words, the relative frequency of each word is calculated and stored in a new object. The result is a list with one entry per speech, and each entry is a table of relative word frequencies.

make_file_word_v_l <- function(files_v, output_dir){
  text_word_vector_l <- list()
  for(i in 1:length(files_v)){
    text_v <- scan(paste(corpus_dir, files_v[i], sep = "/"),
                   what = "character", sep = "\n")
    text_v <- paste(text_v, collapse = " ")
    text_lower_v <- tolower(text_v)
    text_words_v <- strsplit(text_lower_v, "\\W")
    text_words_v <- unlist(text_words_v)
    text_words_v <- text_words_v[which(text_words_v != "")]
    book_freqs_t <- table(text_words_v)
    book_freqs_rel_t <- 100*(book_freqs_t/sum(book_freqs_t))
    text_word_vector_l[[files_v[i]]] <- book_freqs_rel_t
  }
  return(text_word_vector_l)
}

my_corpus_l <- make_file_word_v_l(files_v, corpus_dir)

The word frequency data is taken from the corpus list and converted into individual data frame objects using the mapply function. The information is then bound together into a single three column data frame, which is a long form table. Using xtabs, the data frame is pivoted into a wide form table in which each row is a speech and each column is a word feature.

freqs_l <- mapply(data.frame,
                  ID = seq_along(my_corpus_l),
                  my_corpus_l, SIMPLIFY = FALSE,
                  MoreArgs = list(stringsAsFactors = FALSE))
freqs_df <- do.call(rbind, freqs_l)
result <- xtabs(Freq ~ ID+text_words_v, data = freqs_df)

The wide form table is then converted to a numeric matrix using apply.

Then, for three feature-set sizes (all words, a smaller set of higher-frequency words, and an even smaller set), the distances between speeches are calculated, clustered, and plotted.

final_m <- apply(result, 2, as.numeric)
dim(final_m)
## [1] 12 5602
d_final_m <- dist(final_m)
cluster_final <- hclust(d_final_m)
cluster_final$labels <- names(my_corpus_l)
plot(cluster_final, main = "Total Cluster Dendrogram: Politicians & Presidents", sub = "Popular Political Speeches by Politicians, Judges, and Presidents")

smaller_m <- final_m[, apply(final_m, 2, mean) >= .01]
dim(smaller_m)
## [1] 12 1203
d_smaller_m <- dist(smaller_m)
cluster_smaller <- hclust(d_smaller_m)
cluster_smaller$labels <- names(my_corpus_l)
plot(cluster_smaller, main = "Small Cluster Dendrogram: Politicians & Presidents", sub = "Popular Political Speeches by Politicians, Judges, and Presidents")

smallest_m <- final_m[, apply(final_m, 2, mean) >= .1]
dim(smallest_m)
## [1] 12 129
d_smallest_m <- dist(smallest_m)
cluster_smallest <- hclust(d_smallest_m)
cluster_smallest$labels <- names(my_corpus_l)
plot(cluster_smallest, main = "Smallest Cluster Dendrogram: Politicians & Presidents", sub = "Popular Political Speeches by Politicians, Judges, and Presidents")
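A possible follow-up step, not part of the original analysis, is to cut the dendrogram into a fixed number of groups with cutree so that cluster memberships can be read programmatically rather than visually. The choice of three groups and the object name groups_v below are arbitrary and only for illustration.

groups_v <- cutree(cluster_final, k = 3)  # assign each speech to one of three clusters
split(names(groups_v), groups_v)          # list the speeches belonging to each cluster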


Commencement Speeches by Presidents and their Wives

All of the following code has been repeated with a different corpus:

This corpus contains eight speeches: one commencement speech from each of the four most recent Presidents of the United States and one from each of their wives.

As before, the folder path is saved in mcorpus_dir and the .txt file names are collected into mfiles_v.

mcorpus_dir <- "/Users/emmahimes/Desktop/SCHOOL/FallSemester/TextAnalysis/TextAnalysisWithR/mSpeech"
mfiles_v <- dir(path = mcorpus_dir, pattern = ".*txt")

The same processing function defined above is recreated for this corpus (with an m prefix): each speech is scanned in, lowercased, split into words, and converted into a table of relative word frequencies.

mmake_file_word_v_l <- function(mfiles_v, output_dir){
  mtext_word_vector_l <- list()
  for(i in 1:length(mfiles_v)){
    mtext_v <- scan(paste(mcorpus_dir, mfiles_v[i], sep = "/"),
                    what = "character", sep = "\n")
    mtext_v <- paste(mtext_v, collapse = " ")
    mtext_lower_v <- tolower(mtext_v)
    mtext_words_v <- strsplit(mtext_lower_v, "\\W")
    mtext_words_v <- unlist(mtext_words_v)
    mtext_words_v <- mtext_words_v[which(mtext_words_v != "")]
    mbook_freqs_t <- table(mtext_words_v)
    mbook_freqs_rel_t <- 100*(mbook_freqs_t/sum(mbook_freqs_t))
    mtext_word_vector_l[[mfiles_v[i]]] <- mbook_freqs_rel_t
  }
  return(mtext_word_vector_l)
}

mmy_corpus_l <- mmake_file_word_v_l(mfiles_v, mcorpus_dir)

The frequency tables are again combined into a long form data frame and pivoted with xtabs into a wide form table of speeches by word features.

mfreqs_l <- mapply(data.frame,
                   ID = seq_along(mmy_corpus_l),
                   mmy_corpus_l, SIMPLIFY = FALSE,
                   MoreArgs = list(stringsAsFactors = FALSE))
mfreqs_df <- do.call(rbind, mfreqs_l)
m_result <- xtabs(Freq ~ ID+mtext_words_v, data = mfreqs_df)

The wide form table is then converted to a numeric matrix using apply.

Then, for three feature-set sizes (all words, a smaller set of higher-frequency words, and an even smaller set), the distances between speeches are calculated, clustered, and plotted.

m_final_m <- apply(m_result, 2, as.numeric)
dim(m_final_m)
## [1] 8 3524
m_d_final_m <- dist(m_final_m)
m_cluster_final <- hclust(m_d_final_m)
m_cluster_final$labels <- names(mmy_corpus_l)
plot(m_cluster_final, main = "Total Cluster Dendrogram: Presidential Pairs", sub = "Presidential Commencement Speeches by Marital Pairs")

m_smaller_m <- m_final_m[, apply(m_final_m, 2, mean) >= .01]
dim(m_smaller_m)
## [1] 8 1339
m_d_smaller_m <- dist(m_smaller_m)
m_cluster_smaller <- hclust(m_d_smaller_m)
m_cluster_smaller$labels <- names(mmy_corpus_l)
plot(m_cluster_smaller, main = "Small Cluster Dendrogram: Presidential Pairs", sub = "Presidential Commencement Speeches by Marital Pairs")

m_smallest_m <- m_final_m[, apply(m_final_m, 2, mean) >= .1]
dim(m_smallest_m)
## [1] 8 134
m_d_smallest_m <- dist(m_smallest_m)
m_cluster_smallest <- hclust(m_d_smallest_m)
m_cluster_smallest$labels <- names(mmy_corpus_l)
plot(m_cluster_smallest, main = "Smallest Cluster Dendrogram: Presidential Pairs", sub = "Presidential Commencement Speeches by Marital Pairs")


Presidential State of the Union Addresses and Hillary Clinton’s Speech

All of the following code has been repeated with a different corpus:

This corpus contains eleven presidential State of the Union addresses and one rally speech by Hillary Clinton.

As before, the folder path is saved in soucorpus_dir and the .txt file names are collected into soufiles_v.

soucorpus_dir <- "/Users/emmahimes/Desktop/SCHOOL/FallSemester/TextAnalysis/TextAnalysisWithR/souSpeech"
soufiles_v <- dir(path = soucorpus_dir, pattern = ".*txt")

The same processing function defined above is recreated for this corpus (with a sou prefix): each speech is scanned in, lowercased, split into words, and converted into a table of relative word frequencies.

soumake_file_word_v_l <- function(soufiles_v, output_dir){
  soutext_word_vector_l <- list()
  for(i in 1:length(soufiles_v)){
    soutext_v <- scan(paste(soucorpus_dir, soufiles_v[i], sep = "/"),
                      what = "character", sep = "\n")
    soutext_v <- paste(soutext_v, collapse = " ")
    soutext_lower_v <- tolower(soutext_v)
    soutext_words_v <- strsplit(soutext_lower_v, "\\W")
    soutext_words_v <- unlist(soutext_words_v)
    soutext_words_v <- soutext_words_v[which(soutext_words_v != "")]
    soubook_freqs_t <- table(soutext_words_v)
    soubook_freqs_rel_t <- 100*(soubook_freqs_t/sum(soubook_freqs_t))
    soutext_word_vector_l[[soufiles_v[i]]] <- soubook_freqs_rel_t
  }
  return(soutext_word_vector_l)
}

soumy_corpus_l <- soumake_file_word_v_l(soufiles_v, soucorpus_dir)

The frequency tables are again combined into a long form data frame and pivoted with xtabs into a wide form table of speeches by word features.

soufreqs_l <- mapply(data.frame,
                     ID = seq_along(soumy_corpus_l),
                     soumy_corpus_l, SIMPLIFY = FALSE,
                     MoreArgs = list(stringsAsFactors = FALSE))
soufreqs_df <- do.call(rbind, soufreqs_l)
sou_result <- xtabs(Freq ~ ID+soutext_words_v, data = soufreqs_df)

The wide form table is then converted to a numeric matrix using apply.

Then, for three feature-set sizes (all words, a smaller set of higher-frequency words, and an even smaller set), the distances between speeches are calculated, clustered, and plotted.

sou_final_m <- apply(sou_result, 2, as.numeric)
dim(sou_final_m)
## [1] 11 6257
sou_d_final_m <- dist(sou_final_m)
sou_cluster_final <- hclust(sou_d_final_m)
sou_cluster_final$labels <- names(soumy_corpus_l)
plot(sou_cluster_final, main = "Total Cluster Dendrogram: State of the Unions", sub = "Presidential State of the Unions")

sou_smaller_m <- sou_final_m[, apply(sou_final_m, 2, mean) >= .01]
dim(sou_smaller_m)
## [1] 11 1219
sou_d_smaller_m <- dist(sou_smaller_m)
sou_cluster_smaller <- hclust(sou_d_smaller_m)
sou_cluster_smaller$labels <- names(soumy_corpus_l)
plot(sou_cluster_smaller, main = "Small Cluster Dendrogram: State of the Unions", sub = "Presidential State of the Unions")

sou_smallest_m <- sou_final_m[, apply(sou_final_m, 2, mean) >= .1]
dim(sou_smallest_m)
## [1] 11 117
sou_d_smallest_m <- dist(sou_smallest_m)
sou_cluster_smallest <- hclust(sou_d_smallest_m)
sou_cluster_smallest$labels <- names(soumy_corpus_l)
plot(sou_cluster_smallest, main = "Smallest Cluster Dendrogram: State of the Unions", sub = "Presidential State of the Unions")

2016 Candidates

All of the following code has been repeated with a different corpus:

This corpus contains thirteen announcement speeches: twelve by candidates announcing their runs for the 2016 presidency, plus one by Kanye West (discussed below).

As before, the folder path is saved in ccorpus_dir and the .txt file names are collected into cfiles_v.

ccorpus_dir <- "/Users/emmahimes/Desktop/SCHOOL/FallSemester/TextAnalysis/TextAnalysisWithR/cSpeech"
cfiles_v <- dir(path = ccorpus_dir, pattern = ".*txt")

The same processing function defined above is recreated for this corpus (with a c prefix): each speech is scanned in, lowercased, split into words, and converted into a table of relative word frequencies.

cmake_file_word_v_l <- function(cfiles_v, output_dir){
  ctext_word_vector_l <- list()
  for(i in 1:length(cfiles_v)){
    ctext_v <- scan(paste(ccorpus_dir, cfiles_v[i], sep = "/"),
                    what = "character", sep = "\n")
    ctext_v <- paste(ctext_v, collapse = " ")
    ctext_lower_v <- tolower(ctext_v)
    ctext_words_v <- strsplit(ctext_lower_v, "\\W")
    ctext_words_v <- unlist(ctext_words_v)
    ctext_words_v <- ctext_words_v[which(ctext_words_v != "")]
    cbook_freqs_t <- table(ctext_words_v)
    cbook_freqs_rel_t <- 100*(cbook_freqs_t/sum(cbook_freqs_t))
    ctext_word_vector_l[[cfiles_v[i]]] <- cbook_freqs_rel_t
  }
  return(ctext_word_vector_l)
}

cmy_corpus_l <- cmake_file_word_v_l(cfiles_v, ccorpus_dir)

The frequency tables are again combined into a long form data frame and pivoted with xtabs into a wide form table of speeches by word features.

cfreqs_l <- mapply(data.frame,
                   ID = seq_along(cmy_corpus_l),
                   cmy_corpus_l, SIMPLIFY = FALSE,
                   MoreArgs = list(stringsAsFactors = FALSE))
cfreqs_df <- do.call(rbind, cfreqs_l)
c_result <- xtabs(Freq ~ ID+ctext_words_v, data = cfreqs_df)

The wide form table is then converted to a numeric matrix using apply.

Then, for three feature-set sizes (all words, a smaller set of higher-frequency words, and an even smaller set), the distances between speeches are calculated, clustered, and plotted.

c_final_m <- apply(c_result, 2, as.numeric)
dim(c_final_m)
## [1] 13 4546
c_d_final_m <- dist(c_final_m)
c_cluster_final <- hclust(c_d_final_m)
c_cluster_final$labels <- names(cmy_corpus_l)
plot(c_cluster_final, main = "Total Cluster Dendrogram: Presidential Nominees", sub = "Presidential Nominee Announcement Speeches")

c_smaller_m <- c_final_m[, apply(c_final_m, 2, mean) >= .01]
dim(c_smaller_m)
## [1] 13 1110
c_d_smaller_m <- dist(c_smaller_m)
c_cluster_smaller <- hclust(c_d_smaller_m)
c_cluster_smaller$labels <- names(cmy_corpus_l)
plot(c_cluster_smaller, main = "Small Cluster Dendrogram: Presidential Nominees", sub = "Presidential Nominee Announcement Speeches")

c_smallest_m <- c_final_m[, apply(c_final_m, 2, mean) >= .1]
dim(c_smallest_m)
## [1] 13 143
c_d_smallest_m <- dist(c_smallest_m)
c_cluster_smallest <- hclust(c_d_smallest_m)
c_cluster_smallest$labels <- names(cmy_corpus_l)
plot(c_cluster_smallest, main = "Smallest Cluster Dendrogram: Presidential Nominees", sub = "Presidential Nominee Announcement Speeches")

Presidents and 2016 Candidates

All of the following code has been repeated with a different corpus:

This corpus contains twelve speeches by candidates announcing their runs for the presidency, along with eleven State of the Union addresses from US Presidents.

As before, the folder path is saved in pwcorpus_dir and the .txt file names are collected into pwfiles_v.

pwcorpus_dir <- "/Users/emmahimes/Desktop/SCHOOL/FallSemester/TextAnalysis/TextAnalysisWithR/pwSpeech"
pwfiles_v <- dir(path = pwcorpus_dir, pattern = ".*txt")

The same processing function defined above is recreated for this corpus (with a pw prefix): each speech is scanned in, lowercased, split into words, and converted into a table of relative word frequencies.

pwmake_file_word_v_l <- function(pwfiles_v, output_dir){
  pwtext_word_vector_l <- list()
  for(i in 1:length(pwfiles_v)){
    pwtext_v <- scan(paste(pwcorpus_dir, pwfiles_v[i], sep = "/"),
                     what = "character", sep = "\n")
    pwtext_v <- paste(pwtext_v, collapse = " ")
    pwtext_lower_v <- tolower(pwtext_v)
    pwtext_words_v <- strsplit(pwtext_lower_v, "\\W")
    pwtext_words_v <- unlist(pwtext_words_v)
    pwtext_words_v <- pwtext_words_v[which(pwtext_words_v != "")]
    pwbook_freqs_t <- table(pwtext_words_v)
    pwbook_freqs_rel_t <- 100*(pwbook_freqs_t/sum(pwbook_freqs_t))
    pwtext_word_vector_l[[pwfiles_v[i]]] <- pwbook_freqs_rel_t
  }
  return(pwtext_word_vector_l)
}

pwmy_corpus_l <- pwmake_file_word_v_l(pwfiles_v, pwcorpus_dir)

The frequency tables are again combined into a long form data frame and pivoted with xtabs into a wide form table of speeches by word features.

pwfreqs_l <- mapply(data.frame,
                    ID = seq_along(pwmy_corpus_l),
                    pwmy_corpus_l, SIMPLIFY = FALSE,
                    MoreArgs = list(stringsAsFactors = FALSE))
pwfreqs_df <- do.call(rbind, pwfreqs_l)
pw_result <- xtabs(Freq ~ ID+pwtext_words_v, data = pwfreqs_df)

The wide form table is then converted to a numeric matrix using apply.

Then, for three feature-set sizes (all words, a smaller set of higher-frequency words, and an even smaller set), the distances between speeches are calculated, clustered, and plotted.

pw_final_m <- apply(pw_result, 2, as.numeric)
dim(pw_final_m)
## [1] 23 8018
pw_d_final_m <- dist(pw_final_m)
pw_cluster_final <- hclust(pw_d_final_m)
pw_cluster_final$labels <- names(pwmy_corpus_l)
plot(pw_cluster_final, main = "Total Cluster Dendrogram: SOTU & Nominees", sub = "Presidential State of the Unions & Presidential Nominee Announcement Speeches")

pw_smaller_m <- pw_final_m[, apply(pw_final_m, 2, mean) >= .01]
dim(pw_smaller_m)
## [1] 23 1115
pw_d_smaller_m <- dist(pw_smaller_m)
pw_cluster_smaller <- hclust(pw_d_smaller_m)
pw_cluster_smaller$labels <- names(pwmy_corpus_l)
plot(pw_cluster_smaller, main = "Small Cluster Dendrogram: SOTU & Nominees", sub = "Presidential State of the Unions & Presidential Nominee Announcement Speeches")

pw_smallest_m <- pw_final_m[, apply(pw_final_m, 2, mean) >= .1]
dim(pw_smallest_m)
## [1] 23 130
pw_d_smallest_m <- dist(pw_smallest_m)
pw_cluster_smallest <- hclust(pw_d_smallest_m)
pw_cluster_smallest$labels <- names(pwmy_corpus_l)
plot(pw_cluster_smallest, main = "Smallest Cluster Dendrogram: SOTU & Nominees", sub = "Presidential State of the Unions & Presidential Nominee Announcement Speeches")

Discussion

The first comparison of twelve famous political speeches immediately undercut the idea of gendered political speech. The results are fairly mixed across genders, and Hillary Clinton and Bill Clinton spoke very similarly. The subject matter of the speeches could have greatly influenced the results; however, the high frequency words still did not indicate a major difference. The similarity between Hillary and Bill motivated the next study of Presidents and their wives, to test whether living proximity influenced the similarities.

The second comparison, of Presidents and their wives, used a corpus of eight commencement speeches, because not all of the wives had given political speeches before Hillary Clinton. The results did not show any clustering by marital pair.

The third comparison used a corpus of eleven presidential State of the Union addresses and one rally speech by Hillary Clinton, to establish whether her language is “Presidential.” The rally speech was included in this corpus because it covers many similar topics, so that subject matter alone would not make it an outlier. Hillary’s “linguistic fingerprint” was once again very similar to Bill Clinton’s; this can partially be explained by the time period, but it is also very possible that Hillary and Bill are simply similar linguistically. The largest split in this dendrogram can be explained by the era in which the presidents served: the most recent presidents spoke most similarly to one another. After comparing Hillary’s rally speech, it was essential to compare the rally speeches of other top candidates.

The fourth comparison included twelve speeches by 2016 presidential candidates, all taken from campaign announcements; Kanye West was included as well because he announced a 2020 run for office this year at the VMAs. The results here are not divided along gender lines, nor are they separated by political party affiliation. The smallest cluster reveals that many of the more cartoonish candidates group together: West, Trump, Carson, and Christie.

The fifth and final comparison uses a large corpus of twenty-three speeches: twelve 2016 candidate announcement speeches and eleven presidential State of the Union addresses. The two major outliers in the smallest sample are Trump and West. They are followed by a group of outliers defined by time period: Monroe, Wilson, Lincoln, and Truman. There is no clear split between political parties or genders in the cluster dendrogram, and Hillary Clinton and Bill Clinton are not closely connected either. Cruz and Rubio are closely distanced, which could reflect shared linguistic features among Cuban-Americans. Otherwise, the results suggest that there is a very basic political “linguistic fingerprint.”

This political “linguistic fingerprint” may be the work of speech writers; however, each speech act is treated as reflecting the speaker rather than a possible speech writer. The results indicate that political discourse is not gendered or split along party lines, although there are clear outliers who do not follow the model.


Sources

Jockers, Matthew Lee. “Chapter 11: Clustering Data.” Text Analysis with R for Students of Literature. Springer, 2014. Print.

Lakoff, Robin. “Language and Woman’s Place.” Language in Society 2.01 (1973): 45-79. Cambridge University Press. Web. 15 Nov. 2015.

Suleiman, Camelia, and Daniel C. O’Connell. “Gender Differences in the Media Interviews of Bill and Hillary Clinton.” Journal of Psycholinguistic Research 37.1 (2007): 33-48. Web. 15 Nov. 2015.