Introduction to Text Analysis with Python

Digilab Tutorial Series, Feb 11, 2021

Katie Ireland Kuiper
katherine.kuiper25@uga.edu

Import the Python libraries for today

In [38]:
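import pandas as pd  # data frames for tabular data
import nltk  # Natural Language Toolkit
import codecs  # encoding-aware file reading/writing
import gensim  # topic modeling and word embeddings
from nltk.collocations import *  # collocation finders such as BigramCollocationFinder
from nltk import FreqDist  # frequency distribution of tokens
from nltk import pos_tag, word_tokenize  # part-of-speech tagging and word tokenizing
import matplotlib.pyplot as plt  # plotting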
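Of the tools imported above, `FreqDist` is the one that counts how often each token occurs. NLTK implements `FreqDist` as a subclass of Python's `collections.Counter`, so this minimal sketch uses `Counter` to show the same idea without any downloads; the token list is a made-up example.

```python
from collections import Counter

# A small, made-up list of lowercase tokens
sample_tokens = ["text", "analysis", "is", "fun", "text", "is", "useful"]

# Counter maps each distinct token to the number of times it appears
counts = Counter(sample_tokens)

print(counts["text"])         # 2
print(counts["missing"])      # 0 (unseen tokens count as zero)
print(counts.most_common(2))  # the two most frequent tokens with their counts
```

`FreqDist(tokens)` accepts the same kind of list and, because it inherits from `Counter`, also offers `.most_common()`.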
In [40]:
nltk.download('stopwords')
from nltk.corpus import stopwords
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/kikuiper/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
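The stopwords corpus downloaded above is usually applied by dropping very common function words from a token list. With NLTK the English list comes from `stopwords.words('english')`; the sketch below substitutes a small hand-written set (a stand-in, not NLTK's actual list) so it runs without the download.

```python
# Hypothetical stand-in for set(stopwords.words("english")); NLTK's real list is much longer
stop_words = {"is", "the", "a", "of", "and", "to", "in"}

tokens_example = ["Text", "analysis", "is", "fun"]

# Lowercase each token before the lookup, because the stopword list is lowercase
content_words = [t for t in tokens_example if t.lower() not in stop_words]
print(content_words)  # ['Text', 'analysis', 'fun']
```

Converting the list to a `set` first makes each membership check fast, which matters once the token list holds a whole book.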

Write your first print statement!

In [125]:
print("Let's start analyzing some text!")
Let's start analyzing some text!

Create a string variable

In [126]:
textanalysis = ("Let's start analyzing some text!")
In [127]:
print(textanalysis)
Let's start analyzing some text!
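The parentheses around the string in `textanalysis = (...)` are optional: they only group the expression, so the variable is still a plain `str`. A trailing comma is what turns parentheses into a tuple, as this quick check shows.

```python
plain = ("Let's start analyzing some text!")      # grouping parentheses: still a str
one_item = ("Let's start analyzing some text!",)  # trailing comma: a 1-element tuple

print(type(plain).__name__)     # str
print(type(one_item).__name__)  # tuple
print(len(plain), len(one_item))  # characters in the string vs. items in the tuple
```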

Test different string methods

In [128]:
textanalysis2 = "Text analysis  "
isfun = "is fun   "
sentence = textanalysis2+isfun
print(sentence)
print(textanalysis2.upper())
print(textanalysis2.lower())
Text analysis  is fun   
TEXT ANALYSIS  
text analysis  
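Note that `upper()` and `lower()` do not change `textanalysis2` itself: Python strings are immutable, so each method returns a new string. To keep the result, assign it to a variable, as in this small example.

```python
phrase = "Text analysis  "

shouted = phrase.upper()  # a new string; phrase itself is unchanged
print(repr(phrase))       # 'Text analysis  '
print(repr(shouted))      # 'TEXT ANALYSIS  '

# casefold() is a more aggressive lower() intended for caseless comparisons
print("Straße".casefold() == "STRASSE".casefold())  # True
```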
In [9]:
#remove trailing whitespace from the first string, then join the two with a single space
sentence = textanalysis2.rstrip()+" "+isfun
print(sentence)
Text analysis is fun   
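`rstrip()` only trims the right-hand end of one string, which is why the output above still ends in the spaces from `isfun`. A common way to collapse every run of whitespace, including leading and trailing spaces, is to split on whitespace and rejoin with single spaces:

```python
part1 = "Text analysis  "
part2 = "is fun   "

# str.split() with no argument splits on any run of whitespace and drops empty pieces
normalized = " ".join((part1 + part2).split())
print(repr(normalized))  # 'Text analysis is fun'

# strip() trims both ends; lstrip() and rstrip() trim only one end
print(repr("  padded  ".strip()))   # 'padded'
print(repr("  padded  ".lstrip()))  # 'padded  '
```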
In [7]:
ta  = "textanalysis"
print(ta+"\t")
ta = ta.partition("text") #partition splits at the first occurrence of the separator ("text" here) and returns a 3-tuple: (before, separator, after)
print(ta)
textanalysis	
('', 'text', 'analysis')
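`partition(sep)` splits at the first occurrence of `sep` only and always returns a 3-tuple `(before, sep, after)`; if `sep` is not found, the last two items are empty strings. `rpartition()` searches from the right instead, while `split()` cuts at every occurrence and returns a list:

```python
word = "textanalysis"

print(word.partition("a"))    # ('text', 'a', 'nalysis')
print(word.rpartition("a"))   # ('textan', 'a', 'lysis')
print(word.partition("xyz"))  # ('textanalysis', '', '')
print(word.split("a"))        # ['text', 'n', 'lysis']
```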

Test tokenizing with NLTK

In [129]:
tokens = nltk.wordpunct_tokenize(sentence)
tokens
Out[129]:
['Text', 'analysis', 'is', 'fun']
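`wordpunct_tokenize` is a regular-expression tokenizer: NLTK's `WordPunctTokenizer` is built on the pattern `\w+|[^\w\s]+`, meaning runs of word characters or runs of punctuation. The sketch below reuses that pattern with Python's `re` module to show why an apostrophe splits a word apart; the helper `simple_wordpunct` is a hypothetical name, and it approximates rather than replaces NLTK's function.

```python
import re

# Runs of word characters, or runs of characters that are neither word characters nor whitespace
WORDPUNCT = re.compile(r"\w+|[^\w\s]+")

def simple_wordpunct(text):
    """Approximate nltk.wordpunct_tokenize using the same regular expression."""
    return WORDPUNCT.findall(text)

print(simple_wordpunct("Let's start analyzing some text!"))
# ['Let', "'", 's', 'start', 'analyzing', 'some', 'text', '!']
```

Splitting "Let's" into three tokens is expected for this tokenizer. `word_tokenize`, imported earlier, follows Penn Treebank conventions instead and typically yields 'Let' and "'s".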
In [130]:
len(tokens) #number of tokens in the list
Out[130]:
4
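`len(tokens)` counts every token, repeats included. The number of distinct tokens (types) comes from a set, and dividing types by tokens gives the type-token ratio, a rough measure of lexical variety. A minimal sketch on a made-up token list:

```python
sample = ["the", "cat", "saw", "the", "dog"]

num_tokens = len(sample)      # every token, repeats included
num_types = len(set(sample))  # distinct tokens only (case-sensitive, so lowercase first if needed)
ttr = num_types / num_tokens  # type-token ratio

print(num_tokens, num_types, ttr)  # 5 4 0.8
```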
In [12]:
#loading your own text file(s)
In [131]:
#open the A Modest Proposal text file; the file declares UTF-8 as its character set
modestprop = open("/Users/kikuiper/Documents/data_dh/amodestproposal.txt", encoding="utf-8")
#readlines() returns a list of strings, one per line of the file (each keeps its trailing "\n")
modestproplines = modestprop.readlines()
#number of lines
len(modestproplines)
Out[131]:
742
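The first element of `modestproplines` is `'\ufeff\n'`, a byte-order mark, and Swift's essay is wrapped in Project Gutenberg header and license lines. A sketch of a cleaner load, assuming a UTF-8 file that uses the `*** START OF` and `*** END OF` marker lines visible in the output: opening with `encoding="utf-8-sig"` drops the BOM, and slicing between the markers drops the boilerplate. The helper `load_gutenberg_body` and the demo file are hypothetical.

```python
import os
import tempfile

def load_gutenberg_body(path):
    """Return the lines between Gutenberg's START and END marker lines (hypothetical helper)."""
    # utf-8-sig strips a leading byte-order mark (\ufeff) if one is present
    with open(path, encoding="utf-8-sig") as f:
        lines = f.readlines()
    # next() raises StopIteration if a marker line is missing
    start = next(i for i, line in enumerate(lines) if line.startswith("*** START OF"))
    end = next(i for i, line in enumerate(lines) if line.startswith("*** END OF"))
    return lines[start + 1:end]

# Build a tiny demo file that mimics the layout of the real one, BOM included
demo = "\ufeffHeader text\n*** START OF THIS EBOOK ***\nA Modest Proposal\n*** END OF THIS EBOOK ***\nLicense text\n"
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write(demo)
    demo_path = tmp.name

body = load_gutenberg_body(demo_path)
os.remove(demo_path)
print(body)  # ['A Modest Proposal\n']
```

On the real file, the returned lines would still begin with a few blank lines and the "Produced by" credit, which can be filtered out afterwards.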
In [132]:
modestproplines
Out[132]:
['\ufeff\n',
 'The Project Gutenberg EBook of A Modest Proposal, by Jonathan Swift\n',
 '\n',
 'This eBook is for the use of anyone anywhere at no cost and with\n',
 'almost no restrictions whatsoever.  You may copy it, give it away or\n',
 're-use it under the terms of the Project Gutenberg License included\n',
 'with this eBook or online at www.gutenberg.org\n',
 '\n',
 '\n',
 'Title: A Modest Proposal\n',
 '       For preventing the children of poor people in Ireland,\n',
 '              from being a burden on their parents or country, and for\n',
 '              making them beneficial to the publick - 1729\n',
 '\n',
 'Author: Jonathan Swift\n',
 '\n',
 'Release Date: July 27, 2008 [EBook #1080]\n',
 'Last Updated: October 17, 2019\n',
 '\n',
 'Language: English\n',
 '\n',
 'Character set encoding: UTF-8\n',
 '\n',
 '*** START OF THIS PROJECT GUTENBERG EBOOK A MODEST PROPOSAL ***\n',
 '\n',
 '\n',
 '\n',
 '\n',
 'Produced by An Anonymous Volunteer, and David Widger\n',
 '\n',
 '\n',
 '\n',
 '\n',
 'A Modest Proposal\n',
 '\n',
 'For preventing the children of poor people in Ireland,\n',
 'from being a burden on their parents or country,\n',
 'and for making them beneficial to the publick.\n',
 '\n',
 'by Dr. Jonathan Swift\n',
 '\n',
 '1729\n',
 '\n',
 '\n',
 '\n',
 '\n',
 'It is a melancholy object to those, who walk through this great town,\n',
 'or travel in the country, when they see the streets, the roads, and\n',
 'cabbin-doors crowded with beggars of the female sex, followed by three,\n',
 'four, or six children, all in rags, and importuning every passenger for\n',
 'an alms. These mothers, instead of being able to work for their honest\n',
 'livelihood, are forced to employ all their time in stroling to beg\n',
 'sustenance for their helpless infants who, as they grow up, either turn\n',
 'thieves for want of work, or leave their dear native country, to fight\n',
 'for the Pretender in Spain, or sell themselves to the Barbadoes.\n',
 '\n',
 'I think it is agreed by all parties, that this prodigious number of\n',
 'children in the arms, or on the backs, or at the heels of their\n',
 'mothers, and frequently of their fathers, is in the present deplorable\n',
 'state of the kingdom, a very great additional grievance; and therefore\n',
 'whoever could find out a fair, cheap and easy method of making these\n',
 'children sound and useful members of the commonwealth, would deserve so\n',
 'well of the publick, as to have his statue set up for a preserver of\n',
 'the nation.\n',
 '\n',
 'But my intention is very far from being confined to provide only for\n',
 'the children of professed beggars: it is of a much greater extent, and\n',
 'shall take in the whole number of infants at a certain age, who are\n',
 'born of parents in effect as little able to support them, as those who\n',
 'demand our charity in the streets.\n',
 '\n',
 'As to my own part, having turned my thoughts for many years upon this\n',
 'important subject, and maturely weighed the several schemes of our\n',
 'projectors, I have always found them grossly mistaken in their\n',
 'computation. It is true, a child just dropt from its dam, may be\n',
 'supported by her milk, for a solar year, with little other nourishment:\n',
 'at most not above the value of two shillings, which the mother may\n',
 'certainly get, or the value in scraps, by her lawful occupation of\n',
 'begging; and it is exactly at one year old that I propose to provide\n',
 'for them in such a manner, as, instead of being a charge upon their\n',
 'parents, or the parish, or wanting food and raiment for the rest of\n',
 'their lives, they shall, on the contrary, contribute to the feeding,\n',
 'and partly to the clothing of many thousands.\n',
 '\n',
 'There is likewise another great advantage in my scheme, that it will\n',
 'prevent those voluntary abortions, and that horrid practice of women\n',
 'murdering their bastard children, alas! too frequent among us,\n',
 'sacrificing the poor innocent babes, I doubt, more to avoid the expence\n',
 'than the shame, which would move tears and pity in the most savage and\n',
 'inhuman breast.\n',
 '\n',
 'The number of souls in this kingdom being usually reckoned one million\n',
 'and a half, of these I calculate there may be about two hundred\n',
 'thousand couple, whose wives are breeders; from which number I subtract\n',
 'thirty thousand couple, who are able to maintain their own children,\n',
 '(although I apprehend there cannot be so many under the present\n',
 'distresses of the kingdom) but this being granted, there will remain a\n',
 'hundred and seventy thousand breeders. I again subtract fifty thousand,\n',
 'for those women who miscarry, or whose children die by accident or\n',
 'disease within the year. There only remain a hundred and twenty\n',
 'thousand children of poor parents annually born. The question therefore\n',
 'is, How this number shall be reared and provided for? which, as I have\n',
 'already said, under the present situation of affairs, is utterly\n',
 'impossible by all the methods hitherto proposed. For we can neither\n',
 'employ them in handicraft or agriculture; they neither build houses, (I\n',
 'mean in the country) nor cultivate land: they can very seldom pick up a\n',
 'livelihood by stealing till they arrive at six years old; except where\n',
 'they are of towardly parts, although I confess they learn the rudiments\n',
 'much earlier; during which time they can however be properly looked\n',
 'upon only as probationers; as I have been informed by a principal\n',
 'gentleman in the county of Cavan, who protested to me, that he never\n',
 'knew above one or two instances under the age of six, even in a part of\n',
 'the kingdom so renowned for the quickest proficiency in that art.\n',
 '\n',
 'I am assured by our merchants, that a boy or a girl, before twelve\n',
 'years old, is no saleable commodity, and even when they come to this\n',
 'age, they will not yield above three pounds, or three pounds and half a\n',
 'crown at most, on the exchange; which cannot turn to account either to\n',
 'the parents or kingdom, the charge of nutriments and rags having been\n',
 'at least four times that value.\n',
 '\n',
 'I shall now therefore humbly propose my own thoughts, which I hope will\n',
 'not be liable to the least objection.\n',
 '\n',
 'I have been assured by a very knowing American of my acquaintance in\n',
 'London, that a young healthy child well nursed, is, at a year old, a\n',
 'most delicious nourishing and wholesome food, whether stewed, roasted,\n',
 'baked, or boiled; and I make no doubt that it will equally serve in a\n',
 'fricasee, or a ragoust.\n',
 '\n',
 'I do therefore humbly offer it to publick consideration, that of the\n',
 'hundred and twenty thousand children, already computed, twenty thousand\n',
 'may be reserved for breed, whereof only one fourth part to be males;\n',
 'which is more than we allow to sheep, black cattle, or swine, and my\n',
 'reason is, that these children are seldom the fruits of marriage, a\n',
 'circumstance not much regarded by our savages, therefore, one male will\n',
 'be sufficient to serve four females. That the remaining hundred\n',
 'thousand may, at a year old, be offered in sale to the persons of\n',
 'quality and fortune, through the kingdom, always advising the mother to\n',
 'let them suck plentifully in the last month, so as to render them\n',
 'plump, and fat for a good table. A child will make two dishes at an\n',
 'entertainment for friends, and when the family dines alone, the fore or\n',
 'hind quarter will make a reasonable dish, and seasoned with a little\n',
 'pepper or salt, will be very good boiled on the fourth day, especially\n',
 'in winter.\n',
 '\n',
 'I have reckoned upon a medium, that a child just born will weigh 12\n',
 'pounds, and in a solar year, if tolerably nursed, encreaseth to 28\n',
 'pounds.\n',
 '\n',
 'I grant this food will be somewhat dear, and therefore very proper for\n',
 'landlords, who, as they have already devoured most of the parents, seem\n',
 'to have the best title to the children.\n',
 '\n',
 'Infant’s flesh will be in season throughout the year, but more\n',
 'plentiful in March, and a little before and after; for we are told by a\n',
 'grave author, an eminent French physician, that fish being a prolifick\n',
 'dyet, there are more children born in Roman Catholick countries about\n',
 'nine months after Lent, than at any other season; therefore, reckoning\n',
 'a year after Lent, the markets will be more glutted than usual, because\n',
 'the number of Popish infants, is at least three to one in this kingdom,\n',
 'and therefore it will have one other collateral advantage, by lessening\n',
 'the number of Papists among us.\n',
 '\n',
 'I have already computed the charge of nursing a beggar’s child (in\n',
 'which list I reckon all cottagers, labourers, and four-fifths of the\n',
 'farmers) to be about two shillings per annum, rags included; and I\n',
 'believe no gentleman would repine to give ten shillings for the carcass\n',
 'of a good fat child, which, as I have said, will make four dishes of\n',
 'excellent nutritive meat, when he hath only some particular friend, or\n',
 'his own family to dine with him. Thus the squire will learn to be a\n',
 'good landlord, and grow popular among his tenants, the mother will have\n',
 'eight shillings neat profit, and be fit for work till she produces\n',
 'another child.\n',
 '\n',
 'Those who are more thrifty (as I must confess the times require) may\n',
 'flay the carcass; the skin of which, artificially dressed, will make\n',
 'admirable gloves for ladies, and summer boots for fine gentlemen.\n',
 '\n',
 'As to our City of Dublin, shambles may be appointed for this purpose,\n',
 'in the most convenient parts of it, and butchers we may be assured will\n',
 'not be wanting; although I rather recommend buying the children alive,\n',
 'and dressing them hot from the knife, as we do roasting pigs.\n',
 '\n',
 'A very worthy person, a true lover of his country, and whose virtues I\n',
 'highly esteem, was lately pleased in discoursing on this matter, to\n',
 'offer a refinement upon my scheme. He said, that many gentlemen of this\n',
 'kingdom, having of late destroyed their deer, he conceived that the\n',
 'want of venison might be well supplied by the bodies of young lads and\n',
 'maidens, not exceeding fourteen years of age, nor under twelve; so\n',
 'great a number of both sexes in every county being now ready to starve\n',
 'for want of work and service: and these to be disposed of by their\n',
 'parents if alive, or otherwise by their nearest relations. But with due\n',
 'deference to so excellent a friend, and so deserving a patriot, I\n',
 'cannot be altogether in his sentiments; for as to the males, my\n',
 'American acquaintance assured me from frequent experience, that their\n',
 'flesh was generally tough and lean, like that of our schoolboys, by\n',
 'continual exercise, and their taste disagreeable, and to fatten them\n',
 'would not answer the charge. Then as to the females, it would, I think,\n',
 'with humble submission, be a loss to the publick, because they soon\n',
 'would become breeders themselves: and besides, it is not improbable\n',
 'that some scrupulous people might be apt to censure such a practice,\n',
 '(although indeed very unjustly) as a little bordering upon cruelty,\n',
 'which, I confess, hath always been with me the strongest objection\n',
 'against any project, how well soever intended.\n',
 '\n',
 'But in order to justify my friend, he confessed, that this expedient\n',
 'was put into his head by the famous Psalmanaazor, a native of the\n',
 'island Formosa, who came from thence to London, above twenty years ago,\n',
 'and in conversation told my friend, that in his country, when any young\n',
 'person happened to be put to death, the executioner sold the carcass to\n',
 'persons of quality, as a prime dainty; and that, in his time, the body\n',
 'of a plump girl of fifteen, who was crucified for an attempt to poison\n',
 'the Emperor, was sold to his imperial majesty’s prime minister of\n',
 'state, and other great mandarins of the court in joints from the\n',
 'gibbet, at four hundred crowns. Neither indeed can I deny, that if the\n',
 'same use were made of several plump young girls in this town, who\n',
 'without one single groat to their fortunes, cannot stir abroad without\n',
 'a chair, and appear at a playhouse and assemblies in foreign fineries\n',
 'which they never will pay for, the kingdom would not be the worse.\n',
 '\n',
 'Some persons of a desponding spirit are in great concern about that\n',
 'vast number of poor people, who are aged, diseased, or maimed; and I\n',
 'have been desired to employ my thoughts what course may be taken, to\n',
 'ease the nation of so grievous an incumbrance. But I am not in the\n',
 'least pain upon that matter, because it is very well known, that they\n',
 'are every day dying, and rotting, by cold and famine, and filth, and\n',
 'vermin, as fast as can be reasonably expected. And as to the young\n',
 'labourers, they are now in almost as hopeful a condition. They cannot\n',
 'get work, and consequently pine away from want of nourishment, to a\n',
 'degree, that if at any time they are accidentally hired to common\n',
 'labour, they have not strength to perform it, and thus the country and\n',
 'themselves are happily delivered from the evils to come.\n',
 '\n',
 'I have too long digressed, and therefore shall return to my subject. I\n',
 'think the advantages by the proposal which I have made are obvious and\n',
 'many, as well as of the highest importance.\n',
 '\n',
 'For first, as I have already observed, it would greatly lessen the\n',
 'number of Papists, with whom we are yearly overrun, being the principal\n',
 'breeders of the nation, as well as our most dangerous enemies, and who\n',
 'stay at home on purpose with a design to deliver the kingdom to the\n',
 'Pretender, hoping to take their advantage by the absence of so many\n',
 'good Protestants, who have chosen rather to leave their country, than\n',
 'stay at home and pay tithes against their conscience to an episcopal\n',
 'curate.\n',
 '\n',
 'Secondly, The poorer tenants will have something valuable of their own,\n',
 'which by law may be made liable to a distress, and help to pay their\n',
 'landlord’s rent, their corn and cattle being already seized, and money\n',
 'a thing unknown.\n',
 '\n',
 'Thirdly, Whereas the maintainance of a hundred thousand children, from\n',
 'two years old, and upwards, cannot be computed at less than ten\n',
 'shillings a piece per annum, the nation’s stock will be thereby\n',
 'encreased fifty thousand pounds per annum, besides the profit of a new\n',
 'dish, introduced to the tables of all gentlemen of fortune in the\n',
 'kingdom, who have any refinement in taste. And the money will circulate\n',
 'among our selves, the goods being entirely of our own growth and\n',
 'manufacture.\n',
 '\n',
 'Fourthly, The constant breeders, besides the gain of eight shillings\n',
 'sterling per annum by the sale of their children, will be rid of the\n',
 'charge of maintaining them after the first year.\n',
 '\n',
 'Fifthly, This food would likewise bring great custom to taverns, where\n',
 'the vintners will certainly be so prudent as to procure the best\n',
 'receipts for dressing it to perfection; and consequently have their\n',
 'houses frequented by all the fine gentlemen, who justly value\n',
 'themselves upon their knowledge in good eating; and a skilful cook, who\n',
 'understands how to oblige his guests, will contrive to make it as\n',
 'expensive as they please.\n',
 '\n',
 'Sixthly, This would be a great inducement to marriage, which all wise\n',
 'nations have either encouraged by rewards, or enforced by laws and\n',
 'penalties. It would encrease the care and tenderness of mothers towards\n',
 'their children, when they were sure of a settlement for life to the\n',
 'poor babes, provided in some sort by the publick, to their annual\n',
 'profit instead of expence. We should soon see an honest emulation among\n',
 'the married women, which of them could bring the fattest child to the\n',
 'market. Men would become as fond of their wives, during the time of\n',
 'their pregnancy, as they are now of their mares in foal, their cows in\n',
 'calf, or sows when they are ready to farrow; nor offer to beat or kick\n',
 'them (as is too frequent a practice) for fear of a miscarriage.\n',
 '\n',
 'Many other advantages might be enumerated. For instance, the addition\n',
 'of some thousand carcasses in our exportation of barrel’d beef: the\n',
 'propagation of swine’s flesh, and improvement in the art of making good\n',
 'bacon, so much wanted among us by the great destruction of pigs, too\n',
 'frequent at our tables; which are no way comparable in taste or\n',
 'magnificence to a well grown, fat yearling child, which roasted whole\n',
 'will make a considerable figure at a Lord Mayor’s feast, or any other\n',
 'publick entertainment. But this, and many others, I omit, being\n',
 'studious of brevity.\n',
 '\n',
 'Supposing that one thousand families in this city, would be constant\n',
 'customers for infants flesh, besides others who might have it at merry\n',
 'meetings, particularly at weddings and christenings, I compute that\n',
 'Dublin would take off annually about twenty thousand carcasses; and the\n',
 'rest of the kingdom (where probably they will be sold somewhat cheaper)\n',
 'the remaining eighty thousand.\n',
 '\n',
 'I can think of no one objection, that will possibly be raised against\n',
 'this proposal, unless it should be urged, that the number of people\n',
 'will be thereby much lessened in the kingdom. This I freely own, and\n',
 'was indeed one principal design in offering it to the world. I desire\n',
 'the reader will observe, that I calculate my remedy for this one\n',
 'individual Kingdom of Ireland, and for no other that ever was, is, or,\n',
 'I think, ever can be upon Earth. Therefore let no man talk to me of\n',
 'other expedients: Of taxing our absentees at five shillings a pound: Of\n',
 'using neither clothes, nor houshold furniture, except what is of our\n',
 'own growth and manufacture: Of utterly rejecting the materials and\n',
 'instruments that promote foreign luxury: Of curing the expensiveness of\n',
 'pride, vanity, idleness, and gaming in our women: Of introducing a vein\n',
 'of parsimony, prudence and temperance: Of learning to love our country,\n',
 'wherein we differ even from Laplanders, and the inhabitants of\n',
 'Topinamboo: Of quitting our animosities and factions, nor acting any\n',
 'longer like the Jews, who were murdering one another at the very moment\n',
 'their city was taken: Of being a little cautious not to sell our\n',
 'country and consciences for nothing: Of teaching landlords to have at\n',
 'least one degree of mercy towards their tenants. Lastly, of putting a\n',
 'spirit of honesty, industry, and skill into our shopkeepers, who, if a\n',
 'resolution could now be taken to buy only our native goods, would\n',
 'immediately unite to cheat and exact upon us in the price, the measure,\n',
 'and the goodness, nor could ever yet be brought to make one fair\n',
 'proposal of just dealing, though often and earnestly invited to it.\n',
 '\n',
 'Therefore I repeat, let no man talk to me of these and the like\n',
 'expedients, till he hath at least some glympse of hope, that there will\n',
 'ever be some hearty and sincere attempt to put them into practice.\n',
 '\n',
 'But, as to myself, having been wearied out for many years with offering\n',
 'vain, idle, visionary thoughts, and at length utterly despairing of\n',
 'success, I fortunately fell upon this proposal, which, as it is wholly\n',
 'new, so it hath something solid and real, of no expence and little\n',
 'trouble, full in our own power, and whereby we can incur no danger in\n',
 'disobliging England. For this kind of commodity will not bear\n',
 'exportation, and flesh being of too tender a consistence, to admit a\n',
 'long continuance in salt, although perhaps I could name a country,\n',
 'which would be glad to eat up our whole nation without it.\n',
 '\n',
 'After all, I am not so violently bent upon my own opinion, as to reject\n',
 'any offer, proposed by wise men, which shall be found equally innocent,\n',
 'cheap, easy, and effectual. But before something of that kind shall be\n',
 'advanced in contradiction to my scheme, and offering a better, I desire\n',
 'the author or authors will be pleased maturely to consider two points.\n',
 'First, As things now stand, how they will be able to find food and\n',
 'raiment for a hundred thousand useless mouths and backs. And secondly,\n',
 'There being a round million of creatures in humane figure throughout\n',
 'this kingdom, whose whole subsistence put into a common stock, would\n',
 'leave them in debt two million of pounds sterling, adding those who are\n',
 'beggars by profession, to the bulk of farmers, cottagers and labourers,\n',
 'with their wives and children, who are beggars in effect; I desire\n',
 'those politicians who dislike my overture, and may perhaps be so bold\n',
 'to attempt an answer, that they will first ask the parents of these\n',
 'mortals, whether they would not at this day think it a great happiness\n',
 'to have been sold for food at a year old, in the manner I prescribe,\n',
 'and thereby have avoided such a perpetual scene of misfortunes, as they\n',
 'have since gone through, by the oppression of landlords, the\n',
 'impossibility of paying rent without money or trade, the want of common\n',
 'sustenance, with neither house nor clothes to cover them from the\n',
 'inclemencies of the weather, and the most inevitable prospect of\n',
 'intailing the like, or greater miseries, upon their breed for ever.\n',
 '\n',
 'I profess in the sincerity of my heart, that I have not the least\n',
 'personal interest in endeavouring to promote this necessary work,\n',
 'having no other motive than the publick good of my country, by\n',
 'advancing our trade, providing for infants, relieving the poor, and\n',
 'giving some pleasure to the rich. I have no children, by which I can\n',
 'propose to get a single penny; the youngest being nine years old, and\n',
 'my wife past child-bearing.\n',
 '\n',
 '\n',
 '\n',
 '\n',
 '\n',
 '\n',
 '\n',
 'End of the Project Gutenberg EBook of A Modest Proposal, by Jonathan Swift\n',
 '\n',
 '*** END OF THIS PROJECT GUTENBERG EBOOK A MODEST PROPOSAL ***\n',
 '\n',
 '***** This file should be named 1080-0.txt or 1080-0.zip *****\n',
 'This and all associated files of various formats will be found in:\n',
 '        http://www.gutenberg.org/1/0/8/1080/\n',
 '\n',
 'Produced by An Anonymous Volunteer, and David Widger\n',
 '\n',
 'Updated editions will replace the previous one--the old editions will\n',
 'be renamed.\n',
 '\n',
 'Creating the works from print editions not protected by U.S. copyright\n',
 'law means that no one owns a United States copyright in these works,\n',
 'so the Foundation (and you!) can copy and distribute it in the United\n',
 'States without permission and without paying copyright\n',
 'royalties. Special rules, set forth in the General Terms of Use part\n',
 'of this license, apply to copying and distributing Project\n',
 'Gutenberg-tm electronic works to protect the PROJECT GUTENBERG-tm\n',
 'concept and trademark. Project Gutenberg is a registered trademark,\n',
 'and may not be used if you charge for the eBooks, unless you receive\n',
 'specific permission. If you do not charge anything for copies of this\n',
 'eBook, complying with the rules is very easy. You may use this eBook\n',
 'for nearly any purpose such as creation of derivative works, reports,\n',
 'performances and research. They may be modified and printed and given\n',
 'away--you may do practically ANYTHING in the United States with eBooks\n',
 'not protected by U.S. copyright law. Redistribution is subject to the\n',
 'trademark license, especially commercial redistribution.\n',
 '\n',
 'START: FULL LICENSE\n',
 '\n',
 'THE FULL PROJECT GUTENBERG LICENSE\n',
 'PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK\n',
 '\n',
 'To protect the Project Gutenberg-tm mission of promoting the free\n',
 'distribution of electronic works, by using or distributing this work\n',
 '(or any other work associated in any way with the phrase "Project\n',
 'Gutenberg"), you agree to comply with all the terms of the Full\n',
 'Project Gutenberg-tm License available with this file or online at\n',
 'www.gutenberg.org/license.\n',
 '\n',
 'Section 1. General Terms of Use and Redistributing Project\n',
 'Gutenberg-tm electronic works\n',
 '\n',
 '1.A. By reading or using any part of this Project Gutenberg-tm\n',
 'electronic work, you indicate that you have read, understand, agree to\n',
 'and accept all the terms of this license and intellectual property\n',
 '(trademark/copyright) agreement. If you do not agree to abide by all\n',
 'the terms of this agreement, you must cease using and return or\n',
 'destroy all copies of Project Gutenberg-tm electronic works in your\n',
 'possession. If you paid a fee for obtaining a copy of or access to a\n',
 'Project Gutenberg-tm electronic work and you do not agree to be bound\n',
 'by the terms of this agreement, you may obtain a refund from the\n',
 'person or entity to whom you paid the fee as set forth in paragraph\n',
 '1.E.8.\n',
 '\n',
 '1.B. "Project Gutenberg" is a registered trademark. It may only be\n',
 'used on or associated in any way with an electronic work by people who\n',
 'agree to be bound by the terms of this agreement. There are a few\n',
 'things that you can do with most Project Gutenberg-tm electronic works\n',
 'even without complying with the full terms of this agreement. See\n',
 'paragraph 1.C below. There are a lot of things you can do with Project\n',
 'Gutenberg-tm electronic works if you follow the terms of this\n',
 'agreement and help preserve free future access to Project Gutenberg-tm\n',
 'electronic works. See paragraph 1.E below.\n',
 '\n',
 '1.C. The Project Gutenberg Literary Archive Foundation ("the\n',
 'Foundation" or PGLAF), owns a compilation copyright in the collection\n',
 'of Project Gutenberg-tm electronic works. Nearly all the individual\n',
 'works in the collection are in the public domain in the United\n',
 'States. If an individual work is unprotected by copyright law in the\n',
 'United States and you are located in the United States, we do not\n',
 'claim a right to prevent you from copying, distributing, performing,\n',
 'displaying or creating derivative works based on the work as long as\n',
 'all references to Project Gutenberg are removed. Of course, we hope\n',
 'that you will support the Project Gutenberg-tm mission of promoting\n',
 'free access to electronic works by freely sharing Project Gutenberg-tm\n',
 'works in compliance with the terms of this agreement for keeping the\n',
 'Project Gutenberg-tm name associated with the work. You can easily\n',
 'comply with the terms of this agreement by keeping this work in the\n',
 'same format with its attached full Project Gutenberg-tm License when\n',
 'you share it without charge with others.\n',
 '\n',
 '1.D. The copyright laws of the place where you are located also govern\n',
 'what you can do with this work. Copyright laws in most countries are\n',
 'in a constant state of change. If you are outside the United States,\n',
 'check the laws of your country in addition to the terms of this\n',
 'agreement before downloading, copying, displaying, performing,\n',
 'distributing or creating derivative works based on this work or any\n',
 'other Project Gutenberg-tm work. The Foundation makes no\n',
 'representations concerning the copyright status of any work in any\n',
 'country outside the United States.\n',
 '\n',
 '1.E. Unless you have removed all references to Project Gutenberg:\n',
 '\n',
 '1.E.1. The following sentence, with active links to, or other\n',
 'immediate access to, the full Project Gutenberg-tm License must appear\n',
 'prominently whenever any copy of a Project Gutenberg-tm work (any work\n',
 'on which the phrase "Project Gutenberg" appears, or with which the\n',
 'phrase "Project Gutenberg" is associated) is accessed, displayed,\n',
 'performed, viewed, copied or distributed:\n',
 '\n',
 '  This eBook is for the use of anyone anywhere in the United States and\n',
 '  most other parts of the world at no cost and with almost no\n',
 '  restrictions whatsoever. You may copy it, give it away or re-use it\n',
 '  under the terms of the Project Gutenberg License included with this\n',
 '  eBook or online at www.gutenberg.org. If you are not located in the\n',
 "  United States, you'll have to check the laws of the country where you\n",
 '  are located before using this ebook.\n',
 '\n',
 '1.E.2. If an individual Project Gutenberg-tm electronic work is\n',
 'derived from texts not protected by U.S. copyright law (does not\n',
 'contain a notice indicating that it is posted with permission of the\n',
 'copyright holder), the work can be copied and distributed to anyone in\n',
 'the United States without paying any fees or charges. If you are\n',
 'redistributing or providing access to a work with the phrase "Project\n',
 'Gutenberg" associated with or appearing on the work, you must comply\n',
 'either with the requirements of paragraphs 1.E.1 through 1.E.7 or\n',
 'obtain permission for the use of the work and the Project Gutenberg-tm\n',
 'trademark as set forth in paragraphs 1.E.8 or 1.E.9.\n',
 '\n',
 '1.E.3. If an individual Project Gutenberg-tm electronic work is posted\n',
 'with the permission of the copyright holder, your use and distribution\n',
 'must comply with both paragraphs 1.E.1 through 1.E.7 and any\n',
 'additional terms imposed by the copyright holder. Additional terms\n',
 'will be linked to the Project Gutenberg-tm License for all works\n',
 'posted with the permission of the copyright holder found at the\n',
 'beginning of this work.\n',
 '\n',
 '1.E.4. Do not unlink or detach or remove the full Project Gutenberg-tm\n',
 'License terms from this work, or any files containing a part of this\n',
 'work or any other work associated with Project Gutenberg-tm.\n',
 '\n',
 '1.E.5. Do not copy, display, perform, distribute or redistribute this\n',
 'electronic work, or any part of this electronic work, without\n',
 'prominently displaying the sentence set forth in paragraph 1.E.1 with\n',
 'active links or immediate access to the full terms of the Project\n',
 'Gutenberg-tm License.\n',
 '\n',
 '1.E.6. You may convert to and distribute this work in any binary,\n',
 'compressed, marked up, nonproprietary or proprietary form, including\n',
 'any word processing or hypertext form. However, if you provide access\n',
 'to or distribute copies of a Project Gutenberg-tm work in a format\n',
 'other than "Plain Vanilla ASCII" or other format used in the official\n',
 'version posted on the official Project Gutenberg-tm web site\n',
 '(www.gutenberg.org), you must, at no additional cost, fee or expense\n',
 'to the user, provide a copy, a means of exporting a copy, or a means\n',
 'of obtaining a copy upon request, of the work in its original "Plain\n',
 'Vanilla ASCII" or other form. Any alternate format must include the\n',
 'full Project Gutenberg-tm License as specified in paragraph 1.E.1.\n',
 '\n',
 '1.E.7. Do not charge a fee for access to, viewing, displaying,\n',
 'performing, copying or distributing any Project Gutenberg-tm works\n',
 'unless you comply with paragraph 1.E.8 or 1.E.9.\n',
 '\n',
 '1.E.8. You may charge a reasonable fee for copies of or providing\n',
 'access to or distributing Project Gutenberg-tm electronic works\n',
 'provided that\n',
 '\n',
 '* You pay a royalty fee of 20% of the gross profits you derive from\n',
 '  the use of Project Gutenberg-tm works calculated using the method\n',
 '  you already use to calculate your applicable taxes. The fee is owed\n',
 '  to the owner of the Project Gutenberg-tm trademark, but he has\n',
 '  agreed to donate royalties under this paragraph to the Project\n',
 '  Gutenberg Literary Archive Foundation. Royalty payments must be paid\n',
 '  within 60 days following each date on which you prepare (or are\n',
 '  legally required to prepare) your periodic tax returns. Royalty\n',
 '  payments should be clearly marked as such and sent to the Project\n',
 '  Gutenberg Literary Archive Foundation at the address specified in\n',
 '  Section 4, "Information about donations to the Project Gutenberg\n',
 '  Literary Archive Foundation."\n',
 '\n',
 '* You provide a full refund of any money paid by a user who notifies\n',
 '  you in writing (or by e-mail) within 30 days of receipt that s/he\n',
 '  does not agree to the terms of the full Project Gutenberg-tm\n',
 '  License. You must require such a user to return or destroy all\n',
 '  copies of the works possessed in a physical medium and discontinue\n',
 '  all use of and all access to other copies of Project Gutenberg-tm\n',
 '  works.\n',
 '\n',
 '* You provide, in accordance with paragraph 1.F.3, a full refund of\n',
 '  any money paid for a work or a replacement copy, if a defect in the\n',
 '  electronic work is discovered and reported to you within 90 days of\n',
 '  receipt of the work.\n',
 '\n',
 '* You comply with all other terms of this agreement for free\n',
 '  distribution of Project Gutenberg-tm works.\n',
 '\n',
 '1.E.9. If you wish to charge a fee or distribute a Project\n',
 'Gutenberg-tm electronic work or group of works on different terms than\n',
 'are set forth in this agreement, you must obtain permission in writing\n',
 'from both the Project Gutenberg Literary Archive Foundation and The\n',
 'Project Gutenberg Trademark LLC, the owner of the Project Gutenberg-tm\n',
 'trademark. Contact the Foundation as set forth in Section 3 below.\n',
 '\n',
 '1.F.\n',
 '\n',
 '1.F.1. Project Gutenberg volunteers and employees expend considerable\n',
 'effort to identify, do copyright research on, transcribe and proofread\n',
 'works not protected by U.S. copyright law in creating the Project\n',
 'Gutenberg-tm collection. Despite these efforts, Project Gutenberg-tm\n',
 'electronic works, and the medium on which they may be stored, may\n',
 'contain "Defects," such as, but not limited to, incomplete, inaccurate\n',
 'or corrupt data, transcription errors, a copyright or other\n',
 'intellectual property infringement, a defective or damaged disk or\n',
 'other medium, a computer virus, or computer codes that damage or\n',
 'cannot be read by your equipment.\n',
 '\n',
 '1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except for the "Right\n',
 'of Replacement or Refund" described in paragraph 1.F.3, the Project\n',
 'Gutenberg Literary Archive Foundation, the owner of the Project\n',
 'Gutenberg-tm trademark, and any other party distributing a Project\n',
 'Gutenberg-tm electronic work under this agreement, disclaim all\n',
 'liability to you for damages, costs and expenses, including legal\n',
 'fees. YOU AGREE THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT\n',
 'LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT EXCEPT THOSE\n',
 'PROVIDED IN PARAGRAPH 1.F.3. YOU AGREE THAT THE FOUNDATION, THE\n',
 'TRADEMARK OWNER, AND ANY DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE\n',
 'LIABLE TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE OR\n',
 'INCIDENTAL DAMAGES EVEN IF YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH\n',
 'DAMAGE.\n',
 '\n',
 '1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you discover a\n',
 'defect in this electronic work within 90 days of receiving it, you can\n',
 'receive a refund of the money (if any) you paid for it by sending a\n',
 'written explanation to the person you received the work from. If you\n',
 'received the work on a physical medium, you must return the medium\n',
 'with your written explanation. The person or entity that provided you\n',
 'with the defective work may elect to provide a replacement copy in\n',
 'lieu of a refund. If you received the work electronically, the person\n',
 'or entity providing it to you may choose to give you a second\n',
 'opportunity to receive the work electronically in lieu of a refund. If\n',
 'the second copy is also defective, you may demand a refund in writing\n',
 'without further opportunities to fix the problem.\n',
 '\n',
 '1.F.4. Except for the limited right of replacement or refund set forth\n',
 "in paragraph 1.F.3, this work is provided to you 'AS-IS', WITH NO\n",
 'OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT\n',
 'LIMITED TO WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.\n',
 '\n',
 '1.F.5. Some states do not allow disclaimers of certain implied\n',
 'warranties or the exclusion or limitation of certain types of\n',
 'damages. If any disclaimer or limitation set forth in this agreement\n',
 'violates the law of the state applicable to this agreement, the\n',
 'agreement shall be interpreted to make the maximum disclaimer or\n',
 'limitation permitted by the applicable state law. The invalidity or\n',
 'unenforceability of any provision of this agreement shall not void the\n',
 'remaining provisions.\n',
 '\n',
 '1.F.6. INDEMNITY - You agree to indemnify and hold the Foundation, the\n',
 'trademark owner, any agent or employee of the Foundation, anyone\n',
 'providing copies of Project Gutenberg-tm electronic works in\n',
 'accordance with this agreement, and any volunteers associated with the\n',
 'production, promotion and distribution of Project Gutenberg-tm\n',
 'electronic works, harmless from all liability, costs and expenses,\n',
 'including legal fees, that arise directly or indirectly from any of\n',
 'the following which you do or cause to occur: (a) distribution of this\n',
 'or any Project Gutenberg-tm work, (b) alteration, modification, or\n',
 'additions or deletions to any Project Gutenberg-tm work, and (c) any\n',
 'Defect you cause.\n',
 '\n',
 'Section 2. Information about the Mission of Project Gutenberg-tm\n',
 '\n',
 'Project Gutenberg-tm is synonymous with the free distribution of\n',
 'electronic works in formats readable by the widest variety of\n',
 'computers including obsolete, old, middle-aged and new computers. It\n',
 'exists because of the efforts of hundreds of volunteers and donations\n',
 'from people in all walks of life.\n',
 '\n',
 'Volunteers and financial support to provide volunteers with the\n',
 "assistance they need are critical to reaching Project Gutenberg-tm's\n",
 'goals and ensuring that the Project Gutenberg-tm collection will\n',
 'remain freely available for generations to come. In 2001, the Project\n',
 'Gutenberg Literary Archive Foundation was created to provide a secure\n',
 'and permanent future for Project Gutenberg-tm and future\n',
 'generations. To learn more about the Project Gutenberg Literary\n',
 'Archive Foundation and how your efforts and donations can help, see\n',
 'Sections 3 and 4 and the Foundation information page at\n',
 'www.gutenberg.org\n',
 '\n',
 '\n',
 '\n',
 'Section 3. Information about the Project Gutenberg Literary Archive Foundation\n',
 '\n',
 'The Project Gutenberg Literary Archive Foundation is a non profit\n',
 '501(c)(3) educational corporation organized under the laws of the\n',
 'state of Mississippi and granted tax exempt status by the Internal\n',
 "Revenue Service. The Foundation's EIN or federal tax identification\n",
 'number is 64-6221541. Contributions to the Project Gutenberg Literary\n',
 'Archive Foundation are tax deductible to the full extent permitted by\n',
 "U.S. federal laws and your state's laws.\n",
 '\n',
 "The Foundation's principal office is in Fairbanks, Alaska, with the\n",
 'mailing address: PO Box 750175, Fairbanks, AK 99775, but its\n',
 'volunteers and employees are scattered throughout numerous\n',
 'locations. Its business office is located at 809 North 1500 West, Salt\n',
 'Lake City, UT 84116, (801) 596-1887. Email contact links and up to\n',
 "date contact information can be found at the Foundation's web site and\n",
 'official page at www.gutenberg.org/contact\n',
 '\n',
 'For additional contact information:\n',
 '\n',
 '    Dr. Gregory B. Newby\n',
 '    Chief Executive and Director\n',
 '    gbnewby@pglaf.org\n',
 '\n',
 'Section 4. Information about Donations to the Project Gutenberg\n',
 'Literary Archive Foundation\n',
 '\n',
 'Project Gutenberg-tm depends upon and cannot survive without wide\n',
 'spread public support and donations to carry out its mission of\n',
 'increasing the number of public domain and licensed works that can be\n',
 'freely distributed in machine readable form accessible by the widest\n',
 'array of equipment including outdated equipment. Many small donations\n',
 '($1 to $5,000) are particularly important to maintaining tax exempt\n',
 'status with the IRS.\n',
 '\n',
 'The Foundation is committed to complying with the laws regulating\n',
 'charities and charitable donations in all 50 states of the United\n',
 'States. Compliance requirements are not uniform and it takes a\n',
 'considerable effort, much paperwork and many fees to meet and keep up\n',
 'with these requirements. We do not solicit donations in locations\n',
 'where we have not received written confirmation of compliance. To SEND\n',
 'DONATIONS or determine the status of compliance for any particular\n',
 'state visit www.gutenberg.org/donate\n',
 '\n',
 'While we cannot and do not solicit contributions from states where we\n',
 'have not met the solicitation requirements, we know of no prohibition\n',
 'against accepting unsolicited donations from donors in such states who\n',
 'approach us with offers to donate.\n',
 '\n',
 'International donations are gratefully accepted, but we cannot make\n',
 'any statements concerning tax treatment of donations received from\n',
 'outside the United States. U.S. laws alone swamp our small staff.\n',
 '\n',
 'Please check the Project Gutenberg Web pages for current donation\n',
 'methods and addresses. Donations are accepted in a number of other\n',
 'ways including checks, online payments and credit card donations. To\n',
 'donate, please visit: www.gutenberg.org/donate\n',
 '\n',
 'Section 5. General Information About Project Gutenberg-tm electronic works.\n',
 '\n',
 'Professor Michael S. Hart was the originator of the Project\n',
 'Gutenberg-tm concept of a library of electronic works that could be\n',
 'freely shared with anyone. For forty years, he produced and\n',
 'distributed Project Gutenberg-tm eBooks with only a loose network of\n',
 'volunteer support.\n',
 '\n',
 'Project Gutenberg-tm eBooks are often created from several printed\n',
 'editions, all of which are confirmed as not protected by copyright in\n',
 'the U.S. unless a copyright notice is included. Thus, we do not\n',
 'necessarily keep eBooks in compliance with any particular paper\n',
 'edition.\n',
 '\n',
 'Most people start at our Web site which has the main PG search\n',
 'facility: www.gutenberg.org\n',
 '\n',
 'This Web site includes information about Project Gutenberg-tm,\n',
 'including how to make donations to the Project Gutenberg Literary\n',
 'Archive Foundation, how to help produce our new eBooks, and how to\n',
 'subscribe to our email newsletter to hear about new eBooks.\n',
 '\n',
 '\n']
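Each string that readlines() returns still ends in '\n', and blank lines show up as a bare '\n'. Here's a minimal sketch of tidying a list like that with plain string methods. clean_lines is a hypothetical helper written for this example, and the sample lines are copied from the output above.

```python
def clean_lines(lines):
    """Strip trailing newlines and drop lines that are empty or whitespace-only."""
    cleaned = []
    for line in lines:
        stripped = line.rstrip("\n")
        if stripped.strip():          # skip blank lines such as '\n' or '   \n'
            cleaned.append(stripped)
    return cleaned


# A few lines shaped like the readlines() output above
sample = [
    "1.E. Unless you have removed all references to Project Gutenberg:\n",
    "\n",
    "  This eBook is for the use of anyone anywhere in the United States and\n",
    "\n",
]
print(clean_lines(sample))
```

Leading spaces are kept on purpose, since only the newline and the empty lines are noise here.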
In [16]:
# You can see that this isn't the cleanest file, so let's separate out
# the words in this document and apply some string functions to clean it up.
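Much of what readlines() printed is the Project Gutenberg license rather than Swift's essay. The token list further down shows a marker line that reads '*** start of this project gutenberg ebook a modest proposal ***' once lowercased. A minimal sketch that keeps only the text between the opening marker and a closing one. It assumes the closing marker line begins with '*** END OF', so check your own file. strip_gutenberg_boilerplate is a hypothetical helper, not part of this tutorial or any library.

```python
def strip_gutenberg_boilerplate(raw_text):
    """Keep only the lines between the START and END marker lines.

    Assumes the markers start with '*** START OF' and '*** END OF'
    (matched case-insensitively). A missing marker leaves that end untrimmed.
    """
    lines = raw_text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        marker = line.strip().upper()
        if marker.startswith("*** START OF"):
            start = i + 1   # the body begins on the line after the START marker
        elif marker.startswith("*** END OF"):
            end = i         # the body stops just before the END marker
            break
    return "\n".join(lines[start:end]).strip()


# Hypothetical miniature of a Gutenberg file
sample = (
    "Title: A Modest Proposal\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK A MODEST PROPOSAL ***\n"
    "It is a melancholy object to those, who walk through this great town\n"
    "*** END OF THIS PROJECT GUTENBERG EBOOK A MODEST PROPOSAL ***\n"
    "1.E. Unless you have removed all references to Project Gutenberg:\n"
)
print(strip_gutenberg_boilerplate(sample))
```

Because the match ignores case, the same call works on the raw text or on the lowercased string built in the next cell.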
In [133]:
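# Re-open the corpus and read the whole file into one string;
# the with-block closes the file automatically when it finishes
with open("/Users/kikuiper/Documents/data_dh/amodestproposal.txt", 'r') as modestprop_cleaned:
    text = modestprop_cleaned.read()

# Cleaning the document: lowercase everything so "The" and "the" count as the same token
text = text.lower()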
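The '\ufeff' that shows up as the first token is a byte-order mark (BOM), a character that sometimes appears at the very beginning of UTF-8 files. Python's 'utf-8-sig' codec strips a leading BOM while reading, so another option is to pass encoding='utf-8-sig' to open(). A minimal sketch using a temporary file (the filename is made up for the example). On the real corpus this would remove the '\ufeff' token, so the counts that follow would differ slightly from the ones shown here.

```python
import os
import tempfile

# Write a small example file that begins with a UTF-8 byte-order mark
path = os.path.join(tempfile.mkdtemp(), "bom_example.txt")
with open(path, "w", encoding="utf-8-sig") as f:   # utf-8-sig writes the BOM first
    f.write("The Project Gutenberg EBook of A Modest Proposal")

with open(path, encoding="utf-8") as f:
    plain = f.read()      # the BOM is kept as the character '\ufeff'

with open(path, encoding="utf-8-sig") as f:
    no_bom = f.read()     # the leading BOM is removed while decoding

print(repr(plain[:5]))
print(repr(no_bom[:5]))
```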
In [134]:
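# split() with no argument breaks the string on any run of whitespace;
# punctuation stays attached to the neighboring word
tokens = text.split()
tokens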
Out[134]:
['\ufeff',
 'the',
 'project',
 'gutenberg',
 'ebook',
 'of',
 'a',
 'modest',
 'proposal,',
 'by',
 'jonathan',
 'swift',
 'this',
 'ebook',
 'is',
 'for',
 'the',
 'use',
 'of',
 'anyone',
 'anywhere',
 'at',
 'no',
 'cost',
 'and',
 'with',
 'almost',
 'no',
 'restrictions',
 'whatsoever.',
 'you',
 'may',
 'copy',
 'it,',
 'give',
 'it',
 'away',
 'or',
 're-use',
 'it',
 'under',
 'the',
 'terms',
 'of',
 'the',
 'project',
 'gutenberg',
 'license',
 'included',
 'with',
 'this',
 'ebook',
 'or',
 'online',
 'at',
 'www.gutenberg.org',
 'title:',
 'a',
 'modest',
 'proposal',
 'for',
 'preventing',
 'the',
 'children',
 'of',
 'poor',
 'people',
 'in',
 'ireland,',
 'from',
 'being',
 'a',
 'burden',
 'on',
 'their',
 'parents',
 'or',
 'country,',
 'and',
 'for',
 'making',
 'them',
 'beneficial',
 'to',
 'the',
 'publick',
 '-',
 '1729',
 'author:',
 'jonathan',
 'swift',
 'release',
 'date:',
 'july',
 '27,',
 '2008',
 '[ebook',
 '#1080]',
 'last',
 'updated:',
 'october',
 '17,',
 '2019',
 'language:',
 'english',
 'character',
 'set',
 'encoding:',
 'utf-8',
 '***',
 'start',
 'of',
 'this',
 'project',
 'gutenberg',
 'ebook',
 'a',
 'modest',
 'proposal',
 '***',
 'produced',
 'by',
 'an',
 'anonymous',
 'volunteer,',
 'and',
 'david',
 'widger',
 'a',
 'modest',
 'proposal',
 'for',
 'preventing',
 'the',
 'children',
 'of',
 'poor',
 'people',
 'in',
 'ireland,',
 'from',
 'being',
 'a',
 'burden',
 'on',
 'their',
 'parents',
 'or',
 'country,',
 'and',
 'for',
 'making',
 'them',
 'beneficial',
 'to',
 'the',
 'publick.',
 'by',
 'dr.',
 'jonathan',
 'swift',
 '1729',
 'it',
 'is',
 'a',
 'melancholy',
 'object',
 'to',
 'those,',
 'who',
 'walk',
 'through',
 'this',
 'great',
 'town,',
 'or',
 'travel',
 'in',
 'the',
 'country,',
 'when',
 'they',
 'see',
 'the',
 'streets,',
 'the',
 'roads,',
 'and',
 'cabbin-doors',
 'crowded',
 'with',
 'beggars',
 'of',
 'the',
 'female',
 'sex,',
 'followed',
 'by',
 'three,',
 'four,',
 'or',
 'six',
 'children,',
 'all',
 'in',
 'rags,',
 'and',
 'importuning',
 'every',
 'passenger',
 'for',
 'an',
 'alms.',
 'these',
 'mothers,',
 'instead',
 'of',
 'being',
 'able',
 'to',
 'work',
 'for',
 'their',
 'honest',
 'livelihood,',
 'are',
 'forced',
 'to',
 'employ',
 'all',
 'their',
 'time',
 'in',
 'stroling',
 'to',
 'beg',
 'sustenance',
 'for',
 'their',
 'helpless',
 'infants',
 'who,',
 'as',
 'they',
 'grow',
 'up,',
 'either',
 'turn',
 'thieves',
 'for',
 'want',
 'of',
 'work,',
 'or',
 'leave',
 'their',
 'dear',
 'native',
 'country,',
 'to',
 'fight',
 'for',
 'the',
 'pretender',
 'in',
 'spain,',
 'or',
 'sell',
 'themselves',
 'to',
 'the',
 'barbadoes.',
 'i',
 'think',
 'it',
 'is',
 'agreed',
 'by',
 'all',
 'parties,',
 'that',
 'this',
 'prodigious',
 'number',
 'of',
 'children',
 'in',
 'the',
 'arms,',
 'or',
 'on',
 'the',
 'backs,',
 'or',
 'at',
 'the',
 'heels',
 'of',
 'their',
 'mothers,',
 'and',
 'frequently',
 'of',
 'their',
 'fathers,',
 'is',
 'in',
 'the',
 'present',
 'deplorable',
 'state',
 'of',
 'the',
 'kingdom,',
 'a',
 'very',
 'great',
 'additional',
 'grievance;',
 'and',
 'therefore',
 'whoever',
 'could',
 'find',
 'out',
 'a',
 'fair,',
 'cheap',
 'and',
 'easy',
 'method',
 'of',
 'making',
 'these',
 'children',
 'sound',
 'and',
 'useful',
 'members',
 'of',
 'the',
 'commonwealth,',
 'would',
 'deserve',
 'so',
 'well',
 'of',
 'the',
 'publick,',
 'as',
 'to',
 'have',
 'his',
 'statue',
 'set',
 'up',
 'for',
 'a',
 'preserver',
 'of',
 'the',
 'nation.',
 'but',
 'my',
 'intention',
 'is',
 'very',
 'far',
 'from',
 'being',
 'confined',
 'to',
 'provide',
 'only',
 'for',
 'the',
 'children',
 'of',
 'professed',
 'beggars:',
 'it',
 'is',
 'of',
 'a',
 'much',
 'greater',
 'extent,',
 'and',
 'shall',
 'take',
 'in',
 'the',
 'whole',
 'number',
 'of',
 'infants',
 'at',
 'a',
 'certain',
 'age,',
 'who',
 'are',
 'born',
 'of',
 'parents',
 'in',
 'effect',
 'as',
 'little',
 'able',
 'to',
 'support',
 'them,',
 'as',
 'those',
 'who',
 'demand',
 'our',
 'charity',
 'in',
 'the',
 'streets.',
 'as',
 'to',
 'my',
 'own',
 'part,',
 'having',
 'turned',
 'my',
 'thoughts',
 'for',
 'many',
 'years',
 'upon',
 'this',
 'important',
 'subject,',
 'and',
 'maturely',
 'weighed',
 'the',
 'several',
 'schemes',
 'of',
 'our',
 'projectors,',
 'i',
 'have',
 'always',
 'found',
 'them',
 'grossly',
 'mistaken',
 'in',
 'their',
 'computation.',
 'it',
 'is',
 'true,',
 'a',
 'child',
 'just',
 'dropt',
 'from',
 'its',
 'dam,',
 'may',
 'be',
 'supported',
 'by',
 'her',
 'milk,',
 'for',
 'a',
 'solar',
 'year,',
 'with',
 'little',
 'other',
 'nourishment:',
 'at',
 'most',
 'not',
 'above',
 'the',
 'value',
 'of',
 'two',
 'shillings,',
 'which',
 'the',
 'mother',
 'may',
 'certainly',
 'get,',
 'or',
 'the',
 'value',
 'in',
 'scraps,',
 'by',
 'her',
 'lawful',
 'occupation',
 'of',
 'begging;',
 'and',
 'it',
 'is',
 'exactly',
 'at',
 'one',
 'year',
 'old',
 'that',
 'i',
 'propose',
 'to',
 'provide',
 'for',
 'them',
 'in',
 'such',
 'a',
 'manner,',
 'as,',
 'instead',
 'of',
 'being',
 'a',
 'charge',
 'upon',
 'their',
 'parents,',
 'or',
 'the',
 'parish,',
 'or',
 'wanting',
 'food',
 'and',
 'raiment',
 'for',
 'the',
 'rest',
 'of',
 'their',
 'lives,',
 'they',
 'shall,',
 'on',
 'the',
 'contrary,',
 'contribute',
 'to',
 'the',
 'feeding,',
 'and',
 'partly',
 'to',
 'the',
 'clothing',
 'of',
 'many',
 'thousands.',
 'there',
 'is',
 'likewise',
 'another',
 'great',
 'advantage',
 'in',
 'my',
 'scheme,',
 'that',
 'it',
 'will',
 'prevent',
 'those',
 'voluntary',
 'abortions,',
 'and',
 'that',
 'horrid',
 'practice',
 'of',
 'women',
 'murdering',
 'their',
 'bastard',
 'children,',
 'alas!',
 'too',
 'frequent',
 'among',
 'us,',
 'sacrificing',
 'the',
 'poor',
 'innocent',
 'babes,',
 'i',
 'doubt,',
 'more',
 'to',
 'avoid',
 'the',
 'expence',
 'than',
 'the',
 'shame,',
 'which',
 'would',
 'move',
 'tears',
 'and',
 'pity',
 'in',
 'the',
 'most',
 'savage',
 'and',
 'inhuman',
 'breast.',
 'the',
 'number',
 'of',
 'souls',
 'in',
 'this',
 'kingdom',
 'being',
 'usually',
 'reckoned',
 'one',
 'million',
 'and',
 'a',
 'half,',
 'of',
 'these',
 'i',
 'calculate',
 'there',
 'may',
 'be',
 'about',
 'two',
 'hundred',
 'thousand',
 'couple,',
 'whose',
 'wives',
 'are',
 'breeders;',
 'from',
 'which',
 'number',
 'i',
 'subtract',
 'thirty',
 'thousand',
 'couple,',
 'who',
 'are',
 'able',
 'to',
 'maintain',
 'their',
 'own',
 'children,',
 '(although',
 'i',
 'apprehend',
 'there',
 'cannot',
 'be',
 'so',
 'many',
 'under',
 'the',
 'present',
 'distresses',
 'of',
 'the',
 'kingdom)',
 'but',
 'this',
 'being',
 'granted,',
 'there',
 'will',
 'remain',
 'a',
 'hundred',
 'and',
 'seventy',
 'thousand',
 'breeders.',
 'i',
 'again',
 'subtract',
 'fifty',
 'thousand,',
 'for',
 'those',
 'women',
 'who',
 'miscarry,',
 'or',
 'whose',
 'children',
 'die',
 'by',
 'accident',
 'or',
 'disease',
 'within',
 'the',
 'year.',
 'there',
 'only',
 'remain',
 'a',
 'hundred',
 'and',
 'twenty',
 'thousand',
 'children',
 'of',
 'poor',
 'parents',
 'annually',
 'born.',
 'the',
 'question',
 'therefore',
 'is,',
 'how',
 'this',
 'number',
 'shall',
 'be',
 'reared',
 'and',
 'provided',
 'for?',
 'which,',
 'as',
 'i',
 'have',
 'already',
 'said,',
 'under',
 'the',
 'present',
 'situation',
 'of',
 'affairs,',
 'is',
 'utterly',
 'impossible',
 'by',
 'all',
 'the',
 'methods',
 'hitherto',
 'proposed.',
 'for',
 'we',
 'can',
 'neither',
 'employ',
 'them',
 'in',
 'handicraft',
 'or',
 'agriculture;',
 'they',
 'neither',
 'build',
 'houses,',
 '(i',
 'mean',
 'in',
 'the',
 'country)',
 'nor',
 'cultivate',
 'land:',
 'they',
 'can',
 'very',
 'seldom',
 'pick',
 'up',
 'a',
 'livelihood',
 'by',
 'stealing',
 'till',
 'they',
 'arrive',
 'at',
 'six',
 'years',
 'old;',
 'except',
 'where',
 'they',
 'are',
 'of',
 'towardly',
 'parts,',
 'although',
 'i',
 'confess',
 'they',
 'learn',
 'the',
 'rudiments',
 'much',
 'earlier;',
 'during',
 'which',
 'time',
 'they',
 'can',
 'however',
 'be',
 'properly',
 'looked',
 'upon',
 'only',
 'as',
 'probationers;',
 'as',
 'i',
 'have',
 'been',
 'informed',
 'by',
 'a',
 'principal',
 'gentleman',
 'in',
 'the',
 'county',
 'of',
 'cavan,',
 'who',
 'protested',
 'to',
 'me,',
 'that',
 'he',
 'never',
 'knew',
 'above',
 'one',
 'or',
 'two',
 'instances',
 'under',
 'the',
 'age',
 'of',
 'six,',
 'even',
 'in',
 'a',
 'part',
 'of',
 'the',
 'kingdom',
 'so',
 'renowned',
 'for',
 'the',
 'quickest',
 'proficiency',
 'in',
 'that',
 'art.',
 'i',
 'am',
 'assured',
 'by',
 'our',
 'merchants,',
 'that',
 'a',
 'boy',
 'or',
 'a',
 'girl,',
 'before',
 'twelve',
 'years',
 'old,',
 'is',
 'no',
 'saleable',
 'commodity,',
 'and',
 'even',
 'when',
 'they',
 'come',
 'to',
 'this',
 'age,',
 'they',
 'will',
 'not',
 'yield',
 'above',
 'three',
 'pounds,',
 'or',
 'three',
 'pounds',
 'and',
 'half',
 'a',
 'crown',
 'at',
 'most,',
 'on',
 'the',
 'exchange;',
 'which',
 'cannot',
 'turn',
 'to',
 'account',
 'either',
 'to',
 'the',
 'parents',
 'or',
 'kingdom,',
 'the',
 'charge',
 'of',
 'nutriments',
 'and',
 'rags',
 'having',
 'been',
 'at',
 'least',
 'four',
 'times',
 'that',
 'value.',
 'i',
 'shall',
 'now',
 'therefore',
 'humbly',
 'propose',
 'my',
 'own',
 'thoughts,',
 'which',
 'i',
 'hope',
 'will',
 'not',
 'be',
 'liable',
 'to',
 'the',
 'least',
 'objection.',
 'i',
 'have',
 'been',
 'assured',
 'by',
 'a',
 'very',
 'knowing',
 'american',
 'of',
 'my',
 'acquaintance',
 'in',
 'london,',
 'that',
 'a',
 'young',
 'healthy',
 'child',
 'well',
 'nursed,',
 'is,',
 'at',
 ...]
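Splitting on whitespace leaves punctuation glued to words, so 'proposal,' and 'proposal' count as different tokens. A minimal sketch that trims punctuation from both ends of each token with str.strip() and string.punctuation. strip_punctuation is a hypothetical helper; string.punctuation covers ASCII punctuation only, so the '\ufeff' character and curly quotes are not removed. nltk.wordpunct_tokenize, used earlier in the tutorial, takes a different approach and splits punctuation off into separate tokens.

```python
import string

def strip_punctuation(tokens):
    """Strip leading/trailing punctuation from each token; drop tokens that end up empty."""
    stripped = (token.strip(string.punctuation) for token in tokens)
    return [token for token in stripped if token]


# A few tokens copied from the output above
sample = ['a', 'modest', 'proposal,', 'whatsoever.', '***', 're-use', '[ebook', '#1080]']
print(strip_punctuation(sample))
```

Internal punctuation such as the hyphen in 're-use' is kept, because strip() only looks at the ends of the string.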
In [135]:
#notice anything that needs to be removed from list of tokens?
cleaned_tokens = [word.replace("\ufeff", "") for word in tokens]
cleaned_tokens
Out[135]:
['',
 'the',
 'project',
 'gutenberg',
 'ebook',
 'of',
 'a',
 'modest',
 'proposal,',
 'by',
 'jonathan',
 'swift',
 'this',
 'ebook',
 'is',
 'for',
 'the',
 'use',
 'of',
 'anyone',
 'anywhere',
 'at',
 'no',
 'cost',
 'and',
 'with',
 'almost',
 'no',
 'restrictions',
 'whatsoever.',
 'you',
 'may',
 'copy',
 'it,',
 'give',
 'it',
 'away',
 'or',
 're-use',
 'it',
 'under',
 'the',
 'terms',
 'of',
 'the',
 'project',
 'gutenberg',
 'license',
 'included',
 'with',
 'this',
 'ebook',
 'or',
 'online',
 'at',
 'www.gutenberg.org',
 'title:',
 'a',
 'modest',
 'proposal',
 'for',
 'preventing',
 'the',
 'children',
 'of',
 'poor',
 'people',
 'in',
 'ireland,',
 'from',
 'being',
 'a',
 'burden',
 'on',
 'their',
 'parents',
 'or',
 'country,',
 'and',
 'for',
 'making',
 'them',
 'beneficial',
 'to',
 'the',
 'publick',
 '-',
 '1729',
 'author:',
 'jonathan',
 'swift',
 'release',
 'date:',
 'july',
 '27,',
 '2008',
 '[ebook',
 '#1080]',
 'last',
 'updated:',
 'october',
 '17,',
 '2019',
 'language:',
 'english',
 'character',
 'set',
 'encoding:',
 'utf-8',
 '***',
 'start',
 'of',
 'this',
 'project',
 'gutenberg',
 'ebook',
 'a',
 'modest',
 'proposal',
 '***',
 'produced',
 'by',
 'an',
 'anonymous',
 'volunteer,',
 'and',
 'david',
 'widger',
 'a',
 'modest',
 'proposal',
 'for',
 'preventing',
 'the',
 'children',
 'of',
 'poor',
 'people',
 'in',
 'ireland,',
 'from',
 'being',
 'a',
 'burden',
 'on',
 'their',
 'parents',
 'or',
 'country,',
 'and',
 'for',
 'making',
 'them',
 'beneficial',
 'to',
 'the',
 'publick.',
 'by',
 'dr.',
 'jonathan',
 'swift',
 '1729',
 'it',
 'is',
 'a',
 'melancholy',
 'object',
 'to',
 'those,',
 'who',
 'walk',
 'through',
 'this',
 'great',
 'town,',
 'or',
 'travel',
 'in',
 'the',
 'country,',
 'when',
 'they',
 'see',
 'the',
 'streets,',
 'the',
 'roads,',
 'and',
 'cabbin-doors',
 'crowded',
 'with',
 'beggars',
 'of',
 'the',
 'female',
 'sex,',
 'followed',
 'by',
 'three,',
 'four,',
 'or',
 'six',
 'children,',
 'all',
 'in',
 'rags,',
 'and',
 'importuning',
 'every',
 'passenger',
 'for',
 'an',
 'alms.',
 'these',
 'mothers,',
 'instead',
 'of',
 'being',
 'able',
 'to',
 'work',
 'for',
 'their',
 'honest',
 'livelihood,',
 'are',
 'forced',
 'to',
 'employ',
 'all',
 'their',
 'time',
 'in',
 'stroling',
 'to',
 'beg',
 'sustenance',
 'for',
 'their',
 'helpless',
 'infants',
 'who,',
 'as',
 'they',
 'grow',
 'up,',
 'either',
 'turn',
 'thieves',
 'for',
 'want',
 'of',
 'work,',
 'or',
 'leave',
 'their',
 'dear',
 'native',
 'country,',
 'to',
 'fight',
 'for',
 'the',
 'pretender',
 'in',
 'spain,',
 'or',
 'sell',
 'themselves',
 'to',
 'the',
 'barbadoes.',
 'i',
 'think',
 'it',
 'is',
 'agreed',
 'by',
 'all',
 'parties,',
 'that',
 'this',
 'prodigious',
 'number',
 'of',
 'children',
 'in',
 'the',
 'arms,',
 'or',
 'on',
 'the',
 'backs,',
 'or',
 'at',
 'the',
 'heels',
 'of',
 'their',
 'mothers,',
 'and',
 'frequently',
 'of',
 'their',
 'fathers,',
 'is',
 'in',
 'the',
 'present',
 'deplorable',
 'state',
 'of',
 'the',
 'kingdom,',
 'a',
 'very',
 'great',
 'additional',
 'grievance;',
 'and',
 'therefore',
 'whoever',
 'could',
 'find',
 'out',
 'a',
 'fair,',
 'cheap',
 'and',
 'easy',
 'method',
 'of',
 'making',
 'these',
 'children',
 'sound',
 'and',
 'useful',
 'members',
 'of',
 'the',
 'commonwealth,',
 'would',
 'deserve',
 'so',
 'well',
 'of',
 'the',
 'publick,',
 'as',
 'to',
 'have',
 'his',
 'statue',
 'set',
 'up',
 'for',
 'a',
 'preserver',
 'of',
 'the',
 'nation.',
 'but',
 'my',
 'intention',
 'is',
 'very',
 'far',
 'from',
 'being',
 'confined',
 'to',
 'provide',
 'only',
 'for',
 'the',
 'children',
 'of',
 'professed',
 'beggars:',
 'it',
 'is',
 'of',
 'a',
 'much',
 'greater',
 'extent,',
 'and',
 'shall',
 'take',
 'in',
 'the',
 'whole',
 'number',
 'of',
 'infants',
 'at',
 'a',
 'certain',
 'age,',
 'who',
 'are',
 'born',
 'of',
 'parents',
 'in',
 'effect',
 'as',
 'little',
 'able',
 'to',
 'support',
 'them,',
 'as',
 'those',
 'who',
 'demand',
 'our',
 'charity',
 'in',
 'the',
 'streets.',
 'as',
 'to',
 'my',
 'own',
 'part,',
 'having',
 'turned',
 'my',
 'thoughts',
 'for',
 'many',
 'years',
 'upon',
 'this',
 'important',
 'subject,',
 'and',
 'maturely',
 'weighed',
 'the',
 'several',
 'schemes',
 'of',
 'our',
 'projectors,',
 'i',
 'have',
 'always',
 'found',
 'them',
 'grossly',
 'mistaken',
 'in',
 'their',
 'computation.',
 'it',
 'is',
 'true,',
 'a',
 'child',
 'just',
 'dropt',
 'from',
 'its',
 'dam,',
 'may',
 'be',
 'supported',
 'by',
 'her',
 'milk,',
 'for',
 'a',
 'solar',
 'year,',
 'with',
 'little',
 'other',
 'nourishment:',
 'at',
 'most',
 'not',
 'above',
 'the',
 'value',
 'of',
 'two',
 'shillings,',
 'which',
 'the',
 'mother',
 'may',
 'certainly',
 'get,',
 'or',
 'the',
 'value',
 'in',
 'scraps,',
 'by',
 'her',
 'lawful',
 'occupation',
 'of',
 'begging;',
 'and',
 'it',
 'is',
 'exactly',
 'at',
 'one',
 'year',
 'old',
 'that',
 'i',
 'propose',
 'to',
 'provide',
 'for',
 'them',
 'in',
 'such',
 'a',
 'manner,',
 'as,',
 'instead',
 'of',
 'being',
 'a',
 'charge',
 'upon',
 'their',
 'parents,',
 'or',
 'the',
 'parish,',
 'or',
 'wanting',
 'food',
 'and',
 'raiment',
 'for',
 'the',
 'rest',
 'of',
 'their',
 'lives,',
 'they',
 'shall,',
 'on',
 'the',
 'contrary,',
 'contribute',
 'to',
 'the',
 'feeding,',
 'and',
 'partly',
 'to',
 'the',
 'clothing',
 'of',
 'many',
 'thousands.',
 'there',
 'is',
 'likewise',
 'another',
 'great',
 'advantage',
 'in',
 'my',
 'scheme,',
 'that',
 'it',
 'will',
 'prevent',
 'those',
 'voluntary',
 'abortions,',
 'and',
 'that',
 'horrid',
 'practice',
 'of',
 'women',
 'murdering',
 'their',
 'bastard',
 'children,',
 'alas!',
 'too',
 'frequent',
 'among',
 'us,',
 'sacrificing',
 'the',
 'poor',
 'innocent',
 'babes,',
 'i',
 'doubt,',
 'more',
 'to',
 'avoid',
 'the',
 'expence',
 'than',
 'the',
 'shame,',
 'which',
 'would',
 'move',
 'tears',
 'and',
 'pity',
 'in',
 'the',
 'most',
 'savage',
 'and',
 'inhuman',
 'breast.',
 'the',
 'number',
 'of',
 'souls',
 'in',
 'this',
 'kingdom',
 'being',
 'usually',
 'reckoned',
 'one',
 'million',
 'and',
 'a',
 'half,',
 'of',
 'these',
 'i',
 'calculate',
 'there',
 'may',
 'be',
 'about',
 'two',
 'hundred',
 'thousand',
 'couple,',
 'whose',
 'wives',
 'are',
 'breeders;',
 'from',
 'which',
 'number',
 'i',
 'subtract',
 'thirty',
 'thousand',
 'couple,',
 'who',
 'are',
 'able',
 'to',
 'maintain',
 'their',
 'own',
 'children,',
 '(although',
 'i',
 'apprehend',
 'there',
 'cannot',
 'be',
 'so',
 'many',
 'under',
 'the',
 'present',
 'distresses',
 'of',
 'the',
 'kingdom)',
 'but',
 'this',
 'being',
 'granted,',
 'there',
 'will',
 'remain',
 'a',
 'hundred',
 'and',
 'seventy',
 'thousand',
 'breeders.',
 'i',
 'again',
 'subtract',
 'fifty',
 'thousand,',
 'for',
 'those',
 'women',
 'who',
 'miscarry,',
 'or',
 'whose',
 'children',
 'die',
 'by',
 'accident',
 'or',
 'disease',
 'within',
 'the',
 'year.',
 'there',
 'only',
 'remain',
 'a',
 'hundred',
 'and',
 'twenty',
 'thousand',
 'children',
 'of',
 'poor',
 'parents',
 'annually',
 'born.',
 'the',
 'question',
 'therefore',
 'is,',
 'how',
 'this',
 'number',
 'shall',
 'be',
 'reared',
 'and',
 'provided',
 'for?',
 'which,',
 'as',
 'i',
 'have',
 'already',
 'said,',
 'under',
 'the',
 'present',
 'situation',
 'of',
 'affairs,',
 'is',
 'utterly',
 'impossible',
 'by',
 'all',
 'the',
 'methods',
 'hitherto',
 'proposed.',
 'for',
 'we',
 'can',
 'neither',
 'employ',
 'them',
 'in',
 'handicraft',
 'or',
 'agriculture;',
 'they',
 'neither',
 'build',
 'houses,',
 '(i',
 'mean',
 'in',
 'the',
 'country)',
 'nor',
 'cultivate',
 'land:',
 'they',
 'can',
 'very',
 'seldom',
 'pick',
 'up',
 'a',
 'livelihood',
 'by',
 'stealing',
 'till',
 'they',
 'arrive',
 'at',
 'six',
 'years',
 'old;',
 'except',
 'where',
 'they',
 'are',
 'of',
 'towardly',
 'parts,',
 'although',
 'i',
 'confess',
 'they',
 'learn',
 'the',
 'rudiments',
 'much',
 'earlier;',
 'during',
 'which',
 'time',
 'they',
 'can',
 'however',
 'be',
 'properly',
 'looked',
 'upon',
 'only',
 'as',
 'probationers;',
 'as',
 'i',
 'have',
 'been',
 'informed',
 'by',
 'a',
 'principal',
 'gentleman',
 'in',
 'the',
 'county',
 'of',
 'cavan,',
 'who',
 'protested',
 'to',
 'me,',
 'that',
 'he',
 'never',
 'knew',
 'above',
 'one',
 'or',
 'two',
 'instances',
 'under',
 'the',
 'age',
 'of',
 'six,',
 'even',
 'in',
 'a',
 'part',
 'of',
 'the',
 'kingdom',
 'so',
 'renowned',
 'for',
 'the',
 'quickest',
 'proficiency',
 'in',
 'that',
 'art.',
 'i',
 'am',
 'assured',
 'by',
 'our',
 'merchants,',
 'that',
 'a',
 'boy',
 'or',
 'a',
 'girl,',
 'before',
 'twelve',
 'years',
 'old,',
 'is',
 'no',
 'saleable',
 'commodity,',
 'and',
 'even',
 'when',
 'they',
 'come',
 'to',
 'this',
 'age,',
 'they',
 'will',
 'not',
 'yield',
 'above',
 'three',
 'pounds,',
 'or',
 'three',
 'pounds',
 'and',
 'half',
 'a',
 'crown',
 'at',
 'most,',
 'on',
 'the',
 'exchange;',
 'which',
 'cannot',
 'turn',
 'to',
 'account',
 'either',
 'to',
 'the',
 'parents',
 'or',
 'kingdom,',
 'the',
 'charge',
 'of',
 'nutriments',
 'and',
 'rags',
 'having',
 'been',
 'at',
 'least',
 'four',
 'times',
 'that',
 'value.',
 'i',
 'shall',
 'now',
 'therefore',
 'humbly',
 'propose',
 'my',
 'own',
 'thoughts,',
 'which',
 'i',
 'hope',
 'will',
 'not',
 'be',
 'liable',
 'to',
 'the',
 'least',
 'objection.',
 'i',
 'have',
 'been',
 'assured',
 'by',
 'a',
 'very',
 'knowing',
 'american',
 'of',
 'my',
 'acquaintance',
 'in',
 'london,',
 'that',
 'a',
 'young',
 'healthy',
 'child',
 'well',
 'nursed,',
 'is,',
 'at',
 ...]
In [136]:
num_tokens = len(cleaned_tokens)
num_tokens
Out[136]:
6533
In [137]:
#find the unique words (each distinct token listed once)
#start with an empty list called unique

unique = []
for token in cleaned_tokens:
    if token not in unique:
        unique.append(token)

#sort
unique.sort()

#print
print(unique)
['', '"defects,"', '"information', '"plain', '"project', '"right', '#1080]', '$5,000)', "'as-is',", '("the', '($1', '(801)', '(a)', '(although', '(and', '(any', '(as', '(b)', '(c)', '(does', '(i', '(if', '(in', '(or', '(trademark/copyright)', '(where', '(www.gutenberg.org),', '*', '***', '*****', '-', '1.', '1.a.', '1.b.', '1.c', '1.c.', '1.d.', '1.e', '1.e.', '1.e.1', '1.e.1.', '1.e.2.', '1.e.3.', '1.e.4.', '1.e.5.', '1.e.6.', '1.e.7', '1.e.7.', '1.e.8', '1.e.8.', '1.e.9.', '1.f.', '1.f.1.', '1.f.2.', '1.f.3,', '1.f.3.', '1.f.4.', '1.f.5.', '1.f.6.', '1080-0.txt', '1080-0.zip', '12', '1500', '17,', '1729', '2.', '20%', '2001,', '2008', '2019', '27,', '28', '3', '3.', '30', '4', '4,', '4.', '5.', '50', '501(c)(3)', '596-1887.', '60', '64-6221541.', '750175,', '809', '84116,', '90', '99775,', '[ebook', 'a', 'abide', 'able', 'abortions,', 'about', 'above', 'abroad', 'absence', 'absentees', 'accept', 'accepted', 'accepted,', 'accepting', 'access', 'accessed,', 'accessible', 'accident', 'accidentally', 'accordance', 'account', 'acquaintance', 'acting', 'active', 'actual,', 'adding', 'addition', 'additional', 'additions', 'address', 'address:', 'addresses.', 'admirable', 'admit', 'advanced', 'advancing', 'advantage', 'advantage,', 'advantages', 'advising', 'affairs,', 'after', 'after;', 'again', 'against', 'age', 'age,', 'aged,', 'agent', 'ago,', 'agree', 'agreed', 'agreement', 'agreement,', 'agreement.', 'agriculture;', 'ak', 'alas!', 'alaska,', 'alive,', 'all', 'all,', 'allow', 'almost', 'alms.', 'alone', 'alone,', 'already', 'also', 'alteration,', 'alternate', 'although', 'altogether', 'always', 'am', 'american', 'among', 'an', 'and', 'animosities', 'annual', 'annually', 'annum', 'annum,', 'anonymous', 'another', 'answer', 'answer,', 'any', 'any)', 'anyone', 'anyone.', 'anything', 'anywhere', 'appear', 'appearing', 'appears,', 'applicable', 'apply', 'appointed', 'apprehend', 'approach', 'apt', 'archive', 'are', 'arise', 'arms,', 'array', 'arrive', 'art', 'art.', 
'artificially', 'as', 'as,', 'ascii"', 'ask', 'assemblies', 'assistance', 'associated', 'associated)', 'assured', 'at', 'attached', 'attempt', 'author', 'author,', 'author:', 'authors', 'available', 'avoid', 'avoided', 'away', 'away--you', 'b.', 'babes,', 'backs,', 'backs.', 'bacon,', 'baked,', 'barbadoes.', 'barrel’d', 'based', 'bastard', 'be', 'bear', 'beat', 'because', 'become', 'beef:', 'been', 'before', 'beg', 'beggars', 'beggars:', 'beggar’s', 'begging;', 'beginning', 'being', 'believe', 'below.', 'beneficial', 'bent', 'besides', 'besides,', 'best', 'better,', 'binary,', 'black', 'bodies', 'body', 'boiled', 'boiled;', 'bold', 'boots', 'bordering', 'born', 'born.', 'both', 'bound', 'box', 'boy', 'breach', 'breast.', 'breed', 'breed,', 'breeders', 'breeders,', 'breeders.', 'breeders;', 'brevity.', 'bring', 'brought', 'build', 'bulk', 'burden', 'business', 'but', 'but,', 'butchers', 'buy', 'buying', 'by', 'cabbin-doors', 'calculate', 'calculated', 'calf,', 'came', 'can', 'cannot', 'carcass', 'carcass;', 'carcasses', 'carcasses;', 'card', 'care', 'carry', 'catholick', 'cattle', 'cattle,', 'cause', 'cause.', 'cautious', 'cavan,', 'cease', 'censure', 'certain', 'certainly', 'chair,', 'change.', 'character', 'charge', 'charge.', 'charges.', 'charitable', 'charities', 'charity', 'cheap', 'cheap,', 'cheaper)', 'cheat', 'check', 'checks,', 'chief', 'child', 'child,', 'child-bearing.', 'child.', 'children', 'children,', 'children.', 'choose', 'chosen', 'christenings,', 'circulate', 'circumstance', 'city', 'city,', 'claim', 'clearly', 'clothes', 'clothes,', 'clothing', 'codes', 'cold', 'collateral', 'collection', 'collection.', 'come', 'come.', 'commercial', 'committed', 'commodity', 'commodity,', 'common', 'commonwealth,', 'comparable', 'compilation', 'compliance', 'compliance.', 'comply', 'complying', 'compressed,', 'computation.', 'compute', 'computed', 'computed,', 'computer', 'computers', 'computers.', 'conceived', 'concept', 'concern', 'concerning', 'condition.', 
'confess', 'confess,', 'confessed,', 'confined', 'confirmation', 'confirmed', 'conscience', 'consciences', 'consequential,', 'consequently', 'consider', 'considerable', 'consideration,', 'consistence,', 'constant', 'contact', 'contain', 'containing', 'continual', 'continuance', 'contract', 'contradiction', 'contrary,', 'contribute', 'contributions', 'contrive', 'convenient', 'conversation', 'convert', 'cook,', 'copied', 'copies', 'copy', 'copy,', 'copying', 'copying,', 'copyright', 'corn', 'corporation', 'corrupt', 'cost', 'cost,', 'costs', 'cottagers', 'cottagers,', 'could', 'countries', 'country', 'country)', 'country,', 'county', 'couple,', 'course', 'course,', 'court', 'cover', 'cows', 'created', 'creating', 'creation', 'creatures', 'credit', 'critical', 'crowded', 'crown', 'crowns.', 'crucified', 'cruelty,', 'cultivate', 'curate.', 'curing', 'current', 'custom', 'customers', 'dainty;', 'dam,', 'damage', 'damage.', 'damaged', 'damages', 'damages,', 'damages.', 'danger', 'dangerous', 'data,', 'date', 'date:', 'david', 'day', 'day,', 'days', 'dealing,', 'dear', 'dear,', 'death,', 'debt', 'deductible', 'deer,', 'defect', 'defective', 'defective,', 'deference', 'degree', 'degree,', 'deletions', 'delicious', 'deliver', 'delivered', 'demand', 'deny,', 'depends', 'deplorable', 'derivative', 'derive', 'derived', 'described', 'deserve', 'deserving', 'design', 'desire', 'desired', 'despairing', 'despite', 'desponding', 'destroy', 'destroyed', 'destruction', 'detach', 'determine', 'devoured', 'die', 'differ', 'different', 'digressed,', 'dine', 'dines', 'direct,', 'directly', 'director', 'disagreeable,', 'disclaim', 'disclaimer', 'disclaimers', 'discontinue', 'discoursing', 'discover', 'discovered', 'disease', 'diseased,', 'dish,', 'dishes', 'disk', 'dislike', 'disobliging', 'display,', 'displayed,', 'displaying', 'displaying,', 'disposed', 'distress,', 'distresses', 'distribute', 'distributed', 'distributed:', 'distributing', 'distributing,', 'distribution', 
'distributor', 'do', 'does', 'domain', 'donate', 'donate,', 'donate.', 'donation', 'donations', 'donations.', 'donors', 'doubt', 'doubt,', 'downloading,', 'dr.', 'dressed,', 'dressing', 'dropt', 'dublin', 'dublin,', 'due', 'during', 'dyet,', 'dying,', 'e-mail)', 'each', 'earlier;', 'earnestly', 'earth.', 'ease', 'easily', 'easy', 'easy,', 'easy.', 'eat', 'eating;', 'ebook', 'ebook,', 'ebook.', 'ebooks', 'ebooks,', 'ebooks.', 'edition.', 'editions', 'editions,', 'educational', 'effect', 'effect;', 'effectual.', 'effort', 'effort,', 'efforts', 'efforts,', 'eight', 'eighty', 'ein', 'either', 'elect', 'electronic', 'electronically', 'electronically,', 'email', 'eminent', 'emperor,', 'employ', 'employee', 'employees', 'emulation', 'encoding:', 'encouraged', 'encrease', 'encreased', 'encreaseth', 'end', 'endeavouring', 'enemies,', 'enforced', 'england.', 'english', 'ensuring', 'entertainment', 'entertainment.', 'entirely', 'entity', 'enumerated.', 'episcopal', 'equally', 'equipment', 'equipment.', 'errors,', 'especially', 'esteem,', 'even', 'ever', 'ever.', 'every', 'evils', 'exact', 'exactly', 'exceeding', 'excellent', 'except', 'exchange;', 'exclusion', 'executioner', 'executive', 'exempt', 'exercise,', 'exists', 'expected.', 'expedient', 'expedients,', 'expedients:', 'expence', 'expence.', 'expend', 'expense', 'expenses,', 'expensive', 'expensiveness', 'experience,', 'explanation', 'explanation.', 'exportation', 'exportation,', 'exporting', 'express', 'extent', 'extent,', 'facility:', 'factions,', 'fair', 'fair,', 'fairbanks,', 'families', 'family', 'famine,', 'famous', 'far', 'farmers)', 'farmers,', 'farrow;', 'fast', 'fat', 'fathers,', 'fatten', 'fattest', 'fear', 'feast,', 'federal', 'fee', 'feeding,', 'fees', 'fees,', 'fees.', 'fell', 'female', 'females,', 'females.', 'few', 'fifteen,', 'fifthly,', 'fifty', 'fight', 'figure', 'file', 'files', 'filth,', 'financial', 'find', 'fine', 'fineries', 'first', 'first,', 'fish', 'fit', 'fitness', 'five', 'fix', 'flay', 
'flesh', 'flesh,', 'foal,', 'follow', 'followed', 'following', 'fond', 'food', 'food,', 'for', 'for,', 'for?', 'forced', 'fore', 'foreign', 'form', 'form,', 'form.', 'format', 'formats', 'formosa,', 'forth', 'fortunately', 'fortune', 'fortune,', 'fortunes,', 'forty', 'found', 'foundation', 'foundation"', "foundation's", 'foundation,', 'foundation.', 'foundation."', 'four', 'four,', 'four-fifths', 'fourteen', 'fourth', 'fourthly,', 'free', 'freely', 'french', 'frequent', 'frequented', 'frequently', 'fricasee,', 'friend,', 'friends,', 'from', 'from.', 'fruits', 'full', 'furniture,', 'further', 'future', 'gain', 'gaming', 'gbnewby@pglaf.org', 'general', 'generally', 'generations', 'generations.', 'gentleman', 'gentlemen', 'gentlemen,', 'gentlemen.', 'get', 'get,', 'gibbet,', 'girl', 'girl,', 'girls', 'give', 'given', 'giving', 'glad', 'gloves', 'glutted', 'glympse', 'goals', 'gone', 'good', 'goodness,', 'goods', 'goods,', 'govern', 'grant', 'granted', 'granted,', 'gratefully', 'grave', 'great', 'greater', 'greatly', 'gregory', 'grievance;', 'grievous', 'groat', 'gross', 'grossly', 'group', 'grow', 'grown,', 'growth', 'guests,', 'gutenberg', 'gutenberg"', 'gutenberg"),', 'gutenberg-tm', "gutenberg-tm's", 'gutenberg-tm,', 'gutenberg-tm.', 'gutenberg:', 'half', 'half,', 'handicraft', 'happened', 'happily', 'happiness', 'harmless', 'hart', 'has', 'hath', 'have', 'having', 'he', 'head', 'healthy', 'hear', 'heart,', 'hearty', 'heels', 'help', 'help,', 'helpless', 'her', 'highest', 'highly', 'him.', 'hind', 'hired', 'his', 'hitherto', 'hold', 'holder', 'holder),', 'holder,', 'holder.', 'home', 'honest', 'honesty,', 'hope', 'hope,', 'hopeful', 'hoping', 'horrid', 'hot', 'house', 'houses', 'houses,', 'houshold', 'how', 'however', 'however,', 'http://www.gutenberg.org/1/0/8/1080/', 'humane', 'humble', 'humbly', 'hundred', 'hundreds', 'hypertext', 'i', 'identification', 'identify,', 'idle,', 'idleness,', 'if', 'immediate', 'immediately', 'imperial', 'implied', 'implied,', 
'importance.', 'important', 'importuning', 'imposed', 'impossibility', 'impossible', 'improbable', 'improvement', 'in', 'in:', 'inaccurate', 'incidental', 'inclemencies', 'include', 'included', 'included.', 'included;', 'includes', 'including', 'incomplete,', 'increasing', 'incumbrance.', 'incur', 'indeed', 'indemnify', 'indemnity', 'indicate', 'indicating', 'indirect,', 'indirectly', 'individual', 'inducement', 'industry,', 'inevitable', 'infants', 'infants,', 'infant’s', 'information', 'information:', 'informed', 'infringement,', 'inhabitants', 'inhuman', 'innocent', 'innocent,', 'instance,', 'instances', 'instead', 'instruments', 'intailing', 'intellectual', 'intended.', 'intention', 'interest', 'internal', 'international', 'interpreted', 'into', 'introduced', 'introducing', 'invalidity', 'invited', 'ireland,', 'irs.', 'is', 'is,', 'island', 'it', 'it,', 'it.', 'its', 'jews,', 'joints', 'jonathan', 'july', 'just', 'justify', 'justly', 'keep', 'keeping', 'kick', 'kind', 'kind,', 'kingdom', 'kingdom)', 'kingdom,', 'kingdom.', 'knew', 'knife,', 'know', 'knowing', 'knowledge', 'known,', 'labour,', 'labourers,', 'ladies,', 'lads', 'lake', 'land:', 'landlord,', 'landlords', 'landlords,', 'landlord’s', 'language:', 'laplanders,', 'last', 'lastly,', 'late', 'lately', 'law', 'law.', 'lawful', 'laws', 'laws.', 'lean,', 'learn', 'learning', 'least', 'leave', 'legal', 'legally', 'length', 'lent,', 'less', 'lessen', 'lessened', 'lessening', 'let', 'liability', 'liability,', 'liable', 'library', 'license', 'license,', 'license.', 'licensed', 'lieu', 'life', 'life.', 'like', 'like,', 'likewise', 'limitation', 'limited', 'linked', 'links', 'list', 'literary', 'little', 'livelihood', 'livelihood,', 'lives,', 'llc,', 'located', 'locations', 'locations.', 'london,', 'long', 'longer', 'looked', 'loose', 'lord', 'loss', 'lot', 'love', 'lover', 'luxury:', 'machine', 'made', 'magnificence', 'maidens,', 'mailing', 'maimed;', 'main', 'maintain', 'maintainance', 'maintaining', 
'majesty’s', 'make', 'makes', 'making', 'male', 'males,', 'males;', 'man', 'mandarins', 'manner', 'manner,', 'manufacture.', 'manufacture:', 'many', 'many,', 'march,', 'mares', 'marked', 'market.', 'markets', 'marriage,', 'married', 'materials', 'matter,', 'maturely', 'maximum', 'may', 'may,', 'mayor’s', 'me', 'me,', 'mean', 'means', 'measure,', 'meat,', 'medium', 'medium,', 'meet', 'meetings,', 'melancholy', 'members', 'men', 'men,', 'merchantability', 'merchants,', 'mercy', 'merry', 'met', 'method', 'methods', 'michael', 'middle-aged', 'might', 'milk,', 'million', 'minister', 'miscarriage.', 'miscarry,', 'miseries,', 'misfortunes,', 'mission', 'mississippi', 'mistaken', 'modest', 'modification,', 'modified', 'moment', 'money', 'month,', 'months', 'more', 'mortals,', 'most', 'most,', 'mother', 'mothers', 'mothers,', 'motive', 'mouths', 'move', 'much', 'murdering', 'must', 'must,', 'my', 'myself,', 'name', 'named', 'nation', 'nation,', 'nation.', 'nations', 'nation’s', 'native', 'nearest', 'nearly', 'neat', 'necessarily', 'necessary', 'need', 'negligence,', 'neither', 'network', 'never', 'new', 'new,', 'newby', 'newsletter', 'nine', 'no', 'non', 'nonproprietary', 'nor', 'north', 'not', 'nothing:', 'notice', 'notifies', 'nourishing', 'nourishment,', 'nourishment:', 'now', 'number', 'numerous', 'nursed,', 'nursing', 'nutriments', 'nutritive', 'object', 'objection', 'objection,', 'objection.', 'oblige', 'observe,', 'observed,', 'obsolete,', 'obtain', 'obtaining', 'obvious', 'occupation', 'occur:', 'october', 'of', 'off', 'offer', 'offer,', 'offered', 'offering', 'offers', 'office', 'official', 'often', 'old', 'old,', 'old;', 'omit,', 'on', 'on,', 'one', 'one--the', 'online', 'only', 'opinion,', 'opportunities', 'opportunity', 'oppression', 'or', 'or,', 'order', 'organized', 'original', 'originator', 'other', 'others', 'others,', 'others.', 'otherwise', 'our', 'out', 'outdated', 'outside', 'overrun,', 'overture,', 'owed', 'own', 'own,', 'owner', 'owner,', 'owns', 
'page', 'pages', 'paid', 'pain', 'paper', 'paperwork', 'papists', 'papists,', 'paragraph', 'paragraphs', 'parents', 'parents,', 'parish,', 'parsimony,', 'part', 'part,', 'particular', 'particularly', 'parties,', 'partly', 'parts', 'parts,', 'party', 'passenger', 'past', 'patriot,', 'pay', 'paying', 'payments', 'penalties.', 'penny;', 'people', 'people,', 'pepper', 'per', 'perfection;', 'perform', 'perform,', 'performances', 'performed,', 'performing,', 'perhaps', 'periodic', 'permanent', 'permission', 'permission.', 'permitted', 'perpetual', 'person', 'person,', 'personal', 'persons', 'pg', 'pglaf),', 'phrase', 'physical', 'physician,', 'pick', 'piece', 'pigs,', 'pigs.', 'pine', 'pity', 'place', 'playhouse', 'please', 'please.', 'pleased', 'pleasure', 'plentiful', 'plentifully', 'plump', 'plump,', 'po', 'points.', 'poison', 'politicians', 'poor', 'poor,', 'poorer', 'popish', 'popular', 'possessed', 'possession.', 'possibility', 'possibly', 'posted', 'pound:', 'pounds', 'pounds,', 'pounds.', 'power,', 'practically', 'practice', 'practice)', 'practice,', 'practice.', 'pregnancy,', 'prepare', 'prepare)', 'prescribe,', 'present', 'preserve', 'preserver', 'pretender', 'pretender,', 'prevent', 'preventing', 'previous', 'price,', 'pride,', 'prime', 'principal', 'print', 'printed', 'probably', 'probationers;', 'problem.', 'processing', 'procure', 'prodigious', 'produce', 'produced', 'produces', 'production,', 'profess', 'professed', 'profession,', 'professor', 'proficiency', 'profit', 'profit,', 'profits', 'prohibition', 'project', 'project,', 'projectors,', 'prolifick', 'prominently', 'promote', 'promoting', 'promotion', 'proofread', 'propagation', 'proper', 'properly', 'property', 'proposal', 'proposal,', 'propose', 'proposed', 'proposed.', 'proprietary', 'prospect', 'protect', 'protected', 'protestants,', 'protested', 'provide', 'provide,', 'provided', 'providing', 'provision', 'provisions.', 'prudence', 'prudent', 'psalmanaazor,', 'public', 'publick', 'publick,', 
'publick.', 'punitive', 'purpose', 'purpose,', 'purpose.', 'put', 'putting', 'quality', 'quality,', 'quarter', 'question', 'quickest', 'quitting', 'ragoust.', 'rags', 'rags,', 'raiment', 'raised', 'rather', 're-use', 'reaching', 'read', 'read,', 'readable', 'reader', 'reading', 'ready', 'real,', 'reared', 'reason', 'reasonable', 'reasonably', 'receipt', 'receipts', 'receive', 'received', 'receiving', 'reckon', 'reckoned', 'reckoning', 'recommend', 'redistribute', 'redistributing', 'redistribution', 'redistribution.', 'references', 'refinement', 'refund', 'refund"', 'refund.', 'regarded', 'registered', 'regulating', 'reject', 'rejecting', 'relations.', 'release', 'relieving', 'remain', 'remaining', 'remedies', 'remedy', 'remove', 'removed', 'removed.', 'renamed.', 'render', 'renowned', 'rent', 'rent,', 'repeat,', 'repine', 'replace', 'replacement', 'reported', 'reports,', 'representations', 'request,', 'require', 'require)', 'required', 'requirements', 'requirements,', 'requirements.', 'research', 'research.', 'reserved', 'resolution', 'rest', 'restrictions', 'return', 'returns.', 'revenue', 'rewards,', 'rich.', 'rid', 'right', 'roads,', 'roasted', 'roasted,', 'roasting', 'roman', 'rotting,', 'round', 'royalties', 'royalties.', 'royalty', 'rudiments', 'rules', 'rules,', 's.', 's/he', 'sacrificing', 'said,', 'sale', 'saleable', 'salt', 'salt,', 'same', 'savage', 'savages,', 'scattered', 'scene', 'scheme,', 'scheme.', 'schemes', 'schoolboys,', 'scraps,', 'scrupulous', 'search', 'season', 'season;', 'seasoned', 'second', 'secondly,', 'section', 'sections', 'secure', 'see', 'seem', 'seized,', 'seldom', 'sell', 'selves,', 'send', 'sending', 'sent', 'sentence', 'sentence,', 'sentiments;', 'serve', 'service.', 'service:', 'set', 'settlement', 'seventy', 'several', 'sex,', 'sexes', 'shall', 'shall,', 'shambles', 'shame,', 'share', 'shared', 'sharing', 'she', 'sheep,', 'shillings', 'shillings,', 'shopkeepers,', 'should', 'since', 'sincere', 'sincerity', 'single', 'site', 
'situation', 'six', 'six,', 'sixthly,', 'skilful', 'skill', 'skin', 'small', 'so', 'soever', 'solar', 'sold', 'solicit', 'solicitation', 'solid', 'some', 'something', 'somewhat', 'soon', 'sort', 'souls', 'sound', 'sows', 'spain,', 'special', 'specific', 'specified', 'spirit', 'spread', 'squire', 'staff.', 'stand,', 'start', 'start:', 'starve', 'state', "state's", 'state,', 'statements', 'states', 'states,', 'states.', 'statue', 'status', 'stay', 'stealing', 'sterling', 'sterling,', 'stewed,', 'stir', 'stock', 'stock,', 'stored,', 'streets,', 'streets.', 'strength', 'strict', 'stroling', 'strongest', 'studious', 'subject', 'subject,', 'subject.', 'submission,', 'subscribe', 'subsistence', 'subtract', 'success,', 'such', 'suck', 'sufficient', 'summer', 'supplied', 'support', 'support.', 'supported', 'supposing', 'sure', 'survive', 'sustenance', 'sustenance,', 'swamp', 'swift', 'swine,', 'swine’s', 'synonymous', 'table.', 'tables', 'tables;', 'take', 'taken', 'taken,', 'taken:', 'takes', 'talk', 'taste', 'taste.', 'taverns,', 'tax', 'taxes.', 'taxing', 'teaching', 'tears', 'temperance:', 'ten', 'tenants', 'tenants,', 'tenants.', 'tender', 'tenderness', 'terms', 'texts', 'than', 'that', 'that,', 'the', 'their', 'them', 'them,', 'themselves', 'themselves:', 'then', 'thence', 'there', 'thereby', 'therefore', 'therefore,', 'these', 'they', 'thieves', 'thing', 'things', 'think', 'think,', 'thirdly,', 'thirty', 'this', 'this,', 'those', 'those,', 'though', 'thoughts', 'thoughts,', 'thousand', 'thousand,', 'thousand.', 'thousands.', 'three', 'three,', 'thrifty', 'through', 'through,', 'throughout', 'thus', 'thus,', 'till', 'time', 'time,', 'times', 'tithes', 'title', 'title:', 'to', 'to,', 'told', 'tolerably', 'too', 'topinamboo:', 'tough', 'towardly', 'towards', 'town,', 'trade,', 'trademark', 'trademark,', 'trademark.', 'transcribe', 'transcription', 'travel', 'treatment', 'trouble,', 'true', 'true,', 'turn', 'turned', 'twelve', 'twelve;', 'twenty', 'two', 'types', 'u.s.', 
'under', 'understand,', 'understands', 'unenforceability', 'uniform', 'unite', 'united', 'unjustly)', 'unknown.', 'unless', 'unlink', 'unprotected', 'unsolicited', 'up', 'up,', 'updated', 'updated:', 'upon', 'upwards,', 'urged,', 'us', 'us,', 'us.', 'use', 'used', 'useful', 'useless', 'user', 'user,', 'using', 'usual,', 'usually', 'ut', 'utf-8', 'utterly', 'vain,', 'valuable', 'value', 'value.', 'vanilla', 'vanity,', 'variety', 'various', 'vast', 'vein', 'venison', 'vermin,', 'version', 'very', 'viewed,', 'viewing,', 'vintners', 'violates', 'violently', 'virtues', 'virus,', 'visionary', 'visit', 'visit:', 'void', 'voluntary', 'volunteer', 'volunteer,', 'volunteers', 'walk', 'walks', 'want', 'wanted', 'wanting', 'wanting;', 'warranties', 'warranty', 'warranty,', 'was', 'was,', 'way', 'ways', 'we', 'wearied', 'weather,', 'web', 'weddings', 'weigh', 'weighed', 'well', 'were', 'west,', 'what', 'whatsoever.', 'when', 'whenever', 'where', 'whereas', 'whereby', 'wherein', 'whereof', 'whether', 'which', 'which,', 'while', 'who', 'who,', 'whoever', 'whole', 'wholesome', 'wholly', 'whom', 'whose', 'wide', 'widest', 'widger', 'wife', 'will', 'winter.', 'wise', 'wish', 'with', 'within', 'without', 'wives', 'wives,', 'women', 'women,', 'women:', 'word', 'work', 'work,', 'work.', 'works', 'works,', 'works.', 'world', 'world.', 'worse.', 'worthy', 'would', 'would,', 'writing', 'written', 'www.gutenberg.org', 'www.gutenberg.org.', 'www.gutenberg.org/contact', 'www.gutenberg.org/donate', 'www.gutenberg.org/license.', 'year', 'year,', 'year.', 'yearling', 'yearly', 'years', 'years,', 'yet', 'yield', 'you', 'you!)', "you'll", 'young', 'youngest', 'your']
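The loop above checks token not in unique against a list that keeps growing, so it slows down on longer texts. Here is a minimal sketch of the same idea using a Python set. The short token list is a hypothetical stand-in for cleaned_tokens:

```python
# Hypothetical sample standing in for cleaned_tokens
tokens = ["the", "question", "therefore", "is,", "how", "this", "number",
          "shall", "be", "reared", "and", "provided", "for?", "the", "number"]

# Loop version, as in the cell above: each "in" check scans the whole list
unique_loop = []
for token in tokens:
    if token not in unique_loop:
        unique_loop.append(token)
unique_loop.sort()

# Set version: a set keeps one copy of each token, sorted() returns a sorted list
unique_set = sorted(set(tokens))

print(unique_set == unique_loop)  # True
print(len(tokens), len(unique_set))  # total tokens vs. distinct tokens
```

On the full corpus, sorted(set(cleaned_tokens)) should give the same list as the loop.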

Get frequencies of a specific word

In [138]:
the_count = cleaned_tokens.count('the')
the_count
Out[138]:
351
In [143]:
boy_count = cleaned_tokens.count('boy')
boy_count
Out[143]:
1
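The unique-word list shows that cleaned_tokens keeps punctuation attached to words, so 'kingdom', 'kingdom,', 'kingdom.' and 'kingdom)' are separate tokens, and .count('kingdom') only finds the bare form. One possible fix is to strip punctuation from both ends of each token. The sketch below does this with Python's string.punctuation on a hypothetical sample. string.punctuation covers ASCII characters only, so curly quotes such as ’ would have to be added by hand:

```python
import string

# Hypothetical sample: whitespace-split tokens keep attached punctuation
tokens = ["kingdom", "kingdom,", "the", "kingdom.", "(although", "kingdom)", "boy"]

print(tokens.count("kingdom"))  # 1: only the bare form matches

# Strip leading/trailing ASCII punctuation, then drop any tokens left empty
stripped = [t.strip(string.punctuation) for t in tokens]
stripped = [t for t in stripped if t]

print(stripped.count("kingdom"))  # 4: all forms now match
```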

Get frequencies of all the tokens in the text

In [144]:
fdistmp = FreqDist(cleaned_tokens) #FreqDist (from nltk) counts how often each token occurs
fdistmp
Out[144]:
FreqDist({'the': 351, 'of': 259, 'and': 186, 'to': 185, 'a': 152, 'in': 129, 'or': 109, 'project': 83, 'this': 73, 'for': 72, ...})
In [145]:
fdistmptoptoken = fdistmp.max() #the most frequent token in the corpus
fdistmptoptoken
Out[145]:
'the'
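NLTK's FreqDist is built on Python's collections.Counter, so the counting step can also be sketched with the standard library alone. The token list below is hypothetical. most_common() works the same way on both, and most_common(1)[0][0] plays the role of FreqDist.max():

```python
from collections import Counter

# Hypothetical sample standing in for cleaned_tokens
tokens = ["the", "question", "therefore", "is", "how", "this", "number",
          "shall", "be", "reared", "and", "provided", "for", "the", "number", "the"]

counts = Counter(tokens)               # token -> number of occurrences
print(counts["the"])                   # 3
print(counts.most_common(2))           # [('the', 3), ('number', 2)]

top_token = counts.most_common(1)[0][0]  # like fdistmp.max()
print(top_token)                       # the
```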
In [146]:
fdistmp.plot() #notice anything weird about this plot?
Out[146]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd220822eb0>
In [147]:
fdistmp.most_common(10) #get the ten most common tokens; change the number to see more or fewer
Out[147]:
[('the', 351),
 ('of', 259),
 ('and', 186),
 ('to', 185),
 ('a', 152),
 ('in', 129),
 ('or', 109),
 ('project', 83),
 ('this', 73),
 ('for', 72)]
In [148]:
fdistmp.plot(10) #plot the top 10 tokens 
Out[148]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd20ec1e400>

Utilizing Stopwords

In [149]:
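nltk.download('stopwords')
from nltk.corpus import stopwords
stopwords = stopwords.words("english") #English stopword list from nltk; note this reuses the name stopwords, replacing the imported module with a plain list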
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/kikuiper/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
In [150]:
print(stopwords)
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]
In [151]:
#create your own stopwords list
#note: cleaned_tokens is all lowercase, so a capitalized "I" will not match any token
my_stopwords = ["I", "the", "me", "a", "and", "an", "to", "from", "but", "or"]
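Every other word in my_stopwords is already in NLTK's list, so the custom list adds little as written. The FreqDist output shows 'project' among the most common tokens, most likely from the Project Gutenberg license text, which suggests that words specific to this corpus may be more useful additions. Below is one way to build a combined list that lowercases the custom entries and removes duplicates. The shortened NLTK list and the choice of 'project', 'gutenberg' and 'ebook' are hypothetical examples:

```python
# Short hypothetical subset of NLTK's English list (the real list is much longer)
nltk_stopwords = ["i", "me", "my", "the", "a", "and", "of", "to"]

# Custom additions: "I" is capitalized; the last three are example
# corpus-specific words from the Project Gutenberg license text
my_stopwords = ["I", "the", "project", "gutenberg", "ebook"]

# Lowercase every custom entry so it matches lowercased tokens; a set drops duplicates
stopwords_combined = set(nltk_stopwords) | {w.lower() for w in my_stopwords}

print("I" in stopwords_combined)  # False: only the lowercase form is kept
print("i" in stopwords_combined)  # True
print(sorted(stopwords_combined - set(nltk_stopwords)))  # ['ebook', 'gutenberg', 'project']
```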
In [152]:
#or add your stopwords to the nltk stopwords
stopwords2 = stopwords+my_stopwords
In [153]:
print(stopwords2)
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't", 'I', 'the', 'me', 'a', 'and', 'an', 'to', 'from', 'but', 'or']
In [157]:
#use a loop to build a new token list with the stopwords removed

newtokens_withstop = []

for token in cleaned_tokens: #cleaned_tokens is the lowercased token list created earlier
    if token not in stopwords2: #nltk stopwords plus our custom additions
        newtokens_withstop.append(token)
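The same filtering can also be written as a list comprehension. Converting the stopword list to a set turns each "not in" check into a fast lookup, and adding "if t" drops empty strings like the '' at the start of the unique-word list earlier. Here is a sketch with small hypothetical lists standing in for stopwords2 and cleaned_tokens:

```python
# Hypothetical stand-ins for stopwords2 (with duplicates) and cleaned_tokens
stopwords2 = ["i", "me", "the", "a", "and", "of", "to", "the", "a"]
cleaned_tokens = ["", "project", "gutenberg", "ebook", "of", "a", "modest", "proposal,"]

stop_set = set(stopwords2)  # duplicates collapse; membership is a hash lookup

# Keep a token only if it is non-empty and not a stopword
newtokens_withstop = [t for t in cleaned_tokens if t and t not in stop_set]
print(newtokens_withstop)  # ['project', 'gutenberg', 'ebook', 'modest', 'proposal,']
```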
In [158]:
print(newtokens_withstop)
['', 'project', 'gutenberg', 'ebook', 'modest', 'proposal,', 'jonathan', 'swift', 'ebook', 'use', 'anyone', 'anywhere', 'cost', 'almost', 'restrictions', 'whatsoever.', 'may', 'copy', 'it,', 'give', 'away', 're-use', 'terms', 'project', 'gutenberg', 'license', 'included', 'ebook', 'online', 'www.gutenberg.org', 'title:', 'modest', 'proposal', 'preventing', 'children', 'poor', 'people', 'ireland,', 'burden', 'parents', 'country,', 'making', 'beneficial', 'publick', '-', '1729', 'author:', 'jonathan', 'swift', 'release', 'date:', 'july', '27,', '2008', '[ebook', '#1080]', 'last', 'updated:', 'october', '17,', '2019', 'language:', 'english', 'character', 'set', 'encoding:', 'utf-8', '***', 'start', 'project', 'gutenberg', 'ebook', 'modest', 'proposal', '***', 'produced', 'anonymous', 'volunteer,', 'david', 'widger', 'modest', 'proposal', 'preventing', 'children', 'poor', 'people', 'ireland,', 'burden', 'parents', 'country,', 'making', 'beneficial', 'publick.', 'dr.', 'jonathan', 'swift', '1729', 'melancholy', 'object', 'those,', 'walk', 'great', 'town,', 'travel', 'country,', 'see', 'streets,', 'roads,', 'cabbin-doors', 'crowded', 'beggars', 'female', 'sex,', 'followed', 'three,', 'four,', 'six', 'children,', 'rags,', 'importuning', 'every', 'passenger', 'alms.', 'mothers,', 'instead', 'able', 'work', 'honest', 'livelihood,', 'forced', 'employ', 'time', 'stroling', 'beg', 'sustenance', 'helpless', 'infants', 'who,', 'grow', 'up,', 'either', 'turn', 'thieves', 'want', 'work,', 'leave', 'dear', 'native', 'country,', 'fight', 'pretender', 'spain,', 'sell', 'barbadoes.', 'think', 'agreed', 'parties,', 'prodigious', 'number', 'children', 'arms,', 'backs,', 'heels', 'mothers,', 'frequently', 'fathers,', 'present', 'deplorable', 'state', 'kingdom,', 'great', 'additional', 'grievance;', 'therefore', 'whoever', 'could', 'find', 'fair,', 'cheap', 'easy', 'method', 'making', 'children', 'sound', 'useful', 'members', 'commonwealth,', 'would', 'deserve', 'well', 'publick,', 
'statue', 'set', 'preserver', 'nation.', 'intention', 'far', 'confined', 'provide', 'children', 'professed', 'beggars:', 'much', 'greater', 'extent,', 'shall', 'take', 'whole', 'number', 'infants', 'certain', 'age,', 'born', 'parents', 'effect', 'little', 'able', 'support', 'them,', 'demand', 'charity', 'streets.', 'part,', 'turned', 'thoughts', 'many', 'years', 'upon', 'important', 'subject,', 'maturely', 'weighed', 'several', 'schemes', 'projectors,', 'always', 'found', 'grossly', 'mistaken', 'computation.', 'true,', 'child', 'dropt', 'dam,', 'may', 'supported', 'milk,', 'solar', 'year,', 'little', 'nourishment:', 'value', 'two', 'shillings,', 'mother', 'may', 'certainly', 'get,', 'value', 'scraps,', 'lawful', 'occupation', 'begging;', 'exactly', 'one', 'year', 'old', 'propose', 'provide', 'manner,', 'as,', 'instead', 'charge', 'upon', 'parents,', 'parish,', 'wanting', 'food', 'raiment', 'rest', 'lives,', 'shall,', 'contrary,', 'contribute', 'feeding,', 'partly', 'clothing', 'many', 'thousands.', 'likewise', 'another', 'great', 'advantage', 'scheme,', 'prevent', 'voluntary', 'abortions,', 'horrid', 'practice', 'women', 'murdering', 'bastard', 'children,', 'alas!', 'frequent', 'among', 'us,', 'sacrificing', 'poor', 'innocent', 'babes,', 'doubt,', 'avoid', 'expence', 'shame,', 'would', 'move', 'tears', 'pity', 'savage', 'inhuman', 'breast.', 'number', 'souls', 'kingdom', 'usually', 'reckoned', 'one', 'million', 'half,', 'calculate', 'may', 'two', 'hundred', 'thousand', 'couple,', 'whose', 'wives', 'breeders;', 'number', 'subtract', 'thirty', 'thousand', 'couple,', 'able', 'maintain', 'children,', '(although', 'apprehend', 'cannot', 'many', 'present', 'distresses', 'kingdom)', 'granted,', 'remain', 'hundred', 'seventy', 'thousand', 'breeders.', 'subtract', 'fifty', 'thousand,', 'women', 'miscarry,', 'whose', 'children', 'die', 'accident', 'disease', 'within', 'year.', 'remain', 'hundred', 'twenty', 'thousand', 'children', 'poor', 'parents', 'annually', 'born.', 
'question', 'therefore', 'is,', 'number', 'shall', 'reared', 'provided', 'for?', 'which,', 'already', 'said,', 'present', 'situation', 'affairs,', 'utterly', 'impossible', 'methods', 'hitherto', 'proposed.', 'neither', 'employ', 'handicraft', 'agriculture;', 'neither', 'build', 'houses,', '(i', 'mean', 'country)', 'cultivate', 'land:', 'seldom', 'pick', 'livelihood', 'stealing', 'till', 'arrive', 'six', 'years', 'old;', 'except', 'towardly', 'parts,', 'although', 'confess', 'learn', 'rudiments', 'much', 'earlier;', 'time', 'however', 'properly', 'looked', 'upon', 'probationers;', 'informed', 'principal', 'gentleman', 'county', 'cavan,', 'protested', 'me,', 'never', 'knew', 'one', 'two', 'instances', 'age', 'six,', 'even', 'part', 'kingdom', 'renowned', 'quickest', 'proficiency', 'art.', 'assured', 'merchants,', 'boy', 'girl,', 'twelve', 'years', 'old,', 'saleable', 'commodity,', 'even', 'come', 'age,', 'yield', 'three', 'pounds,', 'three', 'pounds', 'half', 'crown', 'most,', 'exchange;', 'cannot', 'turn', 'account', 'either', 'parents', 'kingdom,', 'charge', 'nutriments', 'rags', 'least', 'four', 'times', 'value.', 'shall', 'therefore', 'humbly', 'propose', 'thoughts,', 'hope', 'liable', 'least', 'objection.', 'assured', 'knowing', 'american', 'acquaintance', 'london,', 'young', 'healthy', 'child', 'well', 'nursed,', 'is,', 'year', 'old,', 'delicious', 'nourishing', 'wholesome', 'food,', 'whether', 'stewed,', 'roasted,', 'baked,', 'boiled;', 'make', 'doubt', 'equally', 'serve', 'fricasee,', 'ragoust.', 'therefore', 'humbly', 'offer', 'publick', 'consideration,', 'hundred', 'twenty', 'thousand', 'children,', 'already', 'computed,', 'twenty', 'thousand', 'may', 'reserved', 'breed,', 'whereof', 'one', 'fourth', 'part', 'males;', 'allow', 'sheep,', 'black', 'cattle,', 'swine,', 'reason', 'is,', 'children', 'seldom', 'fruits', 'marriage,', 'circumstance', 'much', 'regarded', 'savages,', 'therefore,', 'one', 'male', 'sufficient', 'serve', 'four', 'females.', 'remaining', 
'hundred', 'thousand', 'may,', 'year', 'old,', 'offered', 'sale', 'persons', 'quality', 'fortune,', 'kingdom,', 'always', 'advising', 'mother', 'let', 'suck', 'plentifully', 'last', 'month,', 'render', 'plump,', 'fat', 'good', 'table.', 'child', 'make', 'two', 'dishes', 'entertainment', 'friends,', 'family', 'dines', 'alone,', 'fore', 'hind', 'quarter', 'make', 'reasonable', 'dish,', 'seasoned', 'little', 'pepper', 'salt,', 'good', 'boiled', 'fourth', 'day,', 'especially', 'winter.', 'reckoned', 'upon', 'medium,', 'child', 'born', 'weigh', '12', 'pounds,', 'solar', 'year,', 'tolerably', 'nursed,', 'encreaseth', '28', 'pounds.', 'grant', 'food', 'somewhat', 'dear,', 'therefore', 'proper', 'landlords,', 'who,', 'already', 'devoured', 'parents,', 'seem', 'best', 'title', 'children.', 'infant’s', 'flesh', 'season', 'throughout', 'year,', 'plentiful', 'march,', 'little', 'after;', 'told', 'grave', 'author,', 'eminent', 'french', 'physician,', 'fish', 'prolifick', 'dyet,', 'children', 'born', 'roman', 'catholick', 'countries', 'nine', 'months', 'lent,', 'season;', 'therefore,', 'reckoning', 'year', 'lent,', 'markets', 'glutted', 'usual,', 'number', 'popish', 'infants,', 'least', 'three', 'one', 'kingdom,', 'therefore', 'one', 'collateral', 'advantage,', 'lessening', 'number', 'papists', 'among', 'us.', 'already', 'computed', 'charge', 'nursing', 'beggar’s', 'child', '(in', 'list', 'reckon', 'cottagers,', 'labourers,', 'four-fifths', 'farmers)', 'two', 'shillings', 'per', 'annum,', 'rags', 'included;', 'believe', 'gentleman', 'would', 'repine', 'give', 'ten', 'shillings', 'carcass', 'good', 'fat', 'child,', 'which,', 'said,', 'make', 'four', 'dishes', 'excellent', 'nutritive', 'meat,', 'hath', 'particular', 'friend,', 'family', 'dine', 'him.', 'thus', 'squire', 'learn', 'good', 'landlord,', 'grow', 'popular', 'among', 'tenants,', 'mother', 'eight', 'shillings', 'neat', 'profit,', 'fit', 'work', 'till', 'produces', 'another', 'child.', 'thrifty', '(as', 'must', 'confess', 
'times', 'require)', 'may', 'flay', 'carcass;', 'skin', 'which,', 'artificially', 'dressed,', 'make', 'admirable', 'gloves', 'ladies,', 'summer', 'boots', 'fine', 'gentlemen.', 'city', 'dublin,', 'shambles', 'may', 'appointed', 'purpose,', 'convenient', 'parts', 'it,', 'butchers', 'may', 'assured', 'wanting;', 'although', 'rather', 'recommend', 'buying', 'children', 'alive,', 'dressing', 'hot', 'knife,', 'roasting', 'pigs.', 'worthy', 'person,', 'true', 'lover', 'country,', 'whose', 'virtues', 'highly', 'esteem,', 'lately', 'pleased', 'discoursing', 'matter,', 'offer', 'refinement', 'upon', 'scheme.', 'said,', 'many', 'gentlemen', 'kingdom,', 'late', 'destroyed', 'deer,', 'conceived', 'want', 'venison', 'might', 'well', 'supplied', 'bodies', 'young', 'lads', 'maidens,', 'exceeding', 'fourteen', 'years', 'age,', 'twelve;', 'great', 'number', 'sexes', 'every', 'county', 'ready', 'starve', 'want', 'work', 'service:', 'disposed', 'parents', 'alive,', 'otherwise', 'nearest', 'relations.', 'due', 'deference', 'excellent', 'friend,', 'deserving', 'patriot,', 'cannot', 'altogether', 'sentiments;', 'males,', 'american', 'acquaintance', 'assured', 'frequent', 'experience,', 'flesh', 'generally', 'tough', 'lean,', 'like', 'schoolboys,', 'continual', 'exercise,', 'taste', 'disagreeable,', 'fatten', 'would', 'answer', 'charge.', 'females,', 'would,', 'think,', 'humble', 'submission,', 'loss', 'publick,', 'soon', 'would', 'become', 'breeders', 'themselves:', 'besides,', 'improbable', 'scrupulous', 'people', 'might', 'apt', 'censure', 'practice,', '(although', 'indeed', 'unjustly)', 'little', 'bordering', 'upon', 'cruelty,', 'which,', 'confess,', 'hath', 'always', 'strongest', 'objection', 'project,', 'well', 'soever', 'intended.', 'order', 'justify', 'friend,', 'confessed,', 'expedient', 'put', 'head', 'famous', 'psalmanaazor,', 'native', 'island', 'formosa,', 'came', 'thence', 'london,', 'twenty', 'years', 'ago,', 'conversation', 'told', 'friend,', 'country,', 'young', 
'person', 'happened', 'put', 'death,', 'executioner', 'sold', 'carcass', 'persons', 'quality,', 'prime', 'dainty;', 'that,', 'time,', 'body', 'plump', 'girl', 'fifteen,', 'crucified', 'attempt', 'poison', 'emperor,', 'sold', 'imperial', 'majesty’s', 'prime', 'minister', 'state,', 'great', 'mandarins', 'court', 'joints', 'gibbet,', 'four', 'hundred', 'crowns.', 'neither', 'indeed', 'deny,', 'use', 'made', 'several', 'plump', 'young', 'girls', 'town,', 'without', 'one', 'single', 'groat', 'fortunes,', 'cannot', 'stir', 'abroad', 'without', 'chair,', 'appear', 'playhouse', 'assemblies', 'foreign', 'fineries', 'never', 'pay', 'for,', 'kingdom', 'would', 'worse.', 'persons', 'desponding', 'spirit', 'great', 'concern', 'vast', 'number', 'poor', 'people,', 'aged,', 'diseased,', 'maimed;', 'desired', 'employ', 'thoughts', 'course', 'may', 'taken,', 'ease', 'nation', 'grievous', 'incumbrance.', 'least', 'pain', 'upon', 'matter,', 'well', 'known,', 'every', 'day', 'dying,', 'rotting,', 'cold', 'famine,', 'filth,', 'vermin,', 'fast', 'reasonably', 'expected.', 'young', 'labourers,', 'almost', 'hopeful', 'condition.', 'cannot', 'get', 'work,', 'consequently', 'pine', 'away', 'want', 'nourishment,', 'degree,', 'time', 'accidentally', 'hired', 'common', 'labour,', 'strength', 'perform', 'it,', 'thus', 'country', 'happily', 'delivered', 'evils', 'come.', 'long', 'digressed,', 'therefore', 'shall', 'return', 'subject.', 'think', 'advantages', 'proposal', 'made', 'obvious', 'many,', 'well', 'highest', 'importance.', 'first,', 'already', 'observed,', 'would', 'greatly', 'lessen', 'number', 'papists,', 'yearly', 'overrun,', 'principal', 'breeders', 'nation,', 'well', 'dangerous', 'enemies,', 'stay', 'home', 'purpose', 'design', 'deliver', 'kingdom', 'pretender,', 'hoping', 'take', 'advantage', 'absence', 'many', 'good', 'protestants,', 'chosen', 'rather', 'leave', 'country,', 'stay', 'home', 'pay', 'tithes', 'conscience', 'episcopal', 'curate.', 'secondly,', 'poorer', 'tenants', 
'something', 'valuable', 'own,', 'law', 'may', 'made', 'liable', 'distress,', 'help', 'pay', 'landlord’s', 'rent,', 'corn', 'cattle', 'already', 'seized,', 'money', 'thing', 'unknown.', 'thirdly,', 'whereas', 'maintainance', 'hundred', 'thousand', 'children,', 'two', 'years', 'old,', 'upwards,', 'cannot', 'computed', 'less', 'ten', 'shillings', 'piece', 'per', 'annum,', 'nation’s', 'stock', 'thereby', 'encreased', 'fifty', 'thousand', 'pounds', 'per', 'annum,', 'besides', 'profit', 'new', 'dish,', 'introduced', 'tables', 'gentlemen', 'fortune', 'kingdom,', 'refinement', 'taste.', 'money', 'circulate', 'among', 'selves,', 'goods', 'entirely', 'growth', 'manufacture.', 'fourthly,', 'constant', 'breeders,', 'besides', 'gain', 'eight', 'shillings', 'sterling', 'per', 'annum', 'sale', 'children,', 'rid', 'charge', 'maintaining', 'first', 'year.', 'fifthly,', 'food', 'would', 'likewise', 'bring', 'great', 'custom', 'taverns,', 'vintners', 'certainly', 'prudent', 'procure', 'best', 'receipts', 'dressing', 'perfection;', 'consequently', 'houses', 'frequented', 'fine', 'gentlemen,', 'justly', 'value', 'upon', 'knowledge', 'good', 'eating;', 'skilful', 'cook,', 'understands', 'oblige', 'guests,', 'contrive', 'make', 'expensive', 'please.', 'sixthly,', 'would', 'great', 'inducement', 'marriage,', 'wise', 'nations', 'either', 'encouraged', 'rewards,', 'enforced', 'laws', 'penalties.', 'would', 'encrease', 'care', 'tenderness', 'mothers', 'towards', 'children,', 'sure', 'settlement', 'life', 'poor', 'babes,', 'provided', 'sort', 'publick,', 'annual', 'profit', 'instead', 'expence.', 'soon', 'see', 'honest', 'emulation', 'among', 'married', 'women,', 'could', 'bring', 'fattest', 'child', 'market.', 'men', 'would', 'become', 'fond', 'wives,', 'time', 'pregnancy,', 'mares', 'foal,', 'cows', 'calf,', 'sows', 'ready', 'farrow;', 'offer', 'beat', 'kick', '(as', 'frequent', 'practice)', 'fear', 'miscarriage.', 'many', 'advantages', 'might', 'enumerated.', 'instance,', 'addition', 
'thousand', 'carcasses', 'exportation', 'barrel’d', 'beef:', 'propagation', 'swine’s', 'flesh,', 'improvement', 'art', 'making', 'good', 'bacon,', 'much', 'wanted', 'among', 'us', 'great', 'destruction', 'pigs,', 'frequent', 'tables;', 'way', 'comparable', 'taste', 'magnificence', 'well', 'grown,', 'fat', 'yearling', 'child,', 'roasted', 'whole', 'make', 'considerable', 'figure', 'lord', 'mayor’s', 'feast,', 'publick', 'entertainment.', 'this,', 'many', 'others,', 'omit,', 'studious', 'brevity.', 'supposing', 'one', 'thousand', 'families', 'city,', 'would', 'constant', 'customers', 'infants', 'flesh,', 'besides', 'others', 'might', 'merry', 'meetings,', 'particularly', 'weddings', 'christenings,', 'compute', 'dublin', 'would', 'take', 'annually', 'twenty', 'thousand', 'carcasses;', 'rest', 'kingdom', '(where', 'probably', 'sold', 'somewhat', 'cheaper)', 'remaining', 'eighty', 'thousand.', 'think', 'one', 'objection,', 'possibly', 'raised', 'proposal,', 'unless', 'urged,', 'number', 'people', 'thereby', 'much', 'lessened', 'kingdom.', 'freely', 'own,', 'indeed', 'one', 'principal', 'design', 'offering', 'world.', 'desire', 'reader', 'observe,', 'calculate', 'remedy', 'one', 'individual', 'kingdom', 'ireland,', 'ever', 'was,', 'is,', 'or,', 'think,', 'ever', 'upon', 'earth.', 'therefore', 'let', 'man', 'talk', 'expedients:', 'taxing', 'absentees', 'five', 'shillings', 'pound:', 'using', 'neither', 'clothes,', 'houshold', 'furniture,', 'except', 'growth', 'manufacture:', 'utterly', 'rejecting', 'materials', 'instruments', 'promote', 'foreign', 'luxury:', 'curing', 'expensiveness', 'pride,', 'vanity,', 'idleness,', 'gaming', 'women:', 'introducing', 'vein', 'parsimony,', 'prudence', 'temperance:', 'learning', 'love', 'country,', 'wherein', 'differ', 'even', 'laplanders,', 'inhabitants', 'topinamboo:', 'quitting', 'animosities', 'factions,', 'acting', 'longer', 'like', 'jews,', 'murdering', 'one', 'another', 'moment', 'city', 'taken:', 'little', 'cautious', 'sell', 
'country', 'consciences', 'nothing:', 'teaching', 'landlords', 'least', 'one', 'degree', 'mercy', 'towards', 'tenants.', 'lastly,', 'putting', 'spirit', 'honesty,', 'industry,', 'skill', 'shopkeepers,', 'who,', 'resolution', 'could', 'taken', 'buy', 'native', 'goods,', 'would', 'immediately', 'unite', 'cheat', 'exact', 'upon', 'us', 'price,', 'measure,', 'goodness,', 'could', 'ever', 'yet', 'brought', 'make', 'one', 'fair', 'proposal', 'dealing,', 'though', 'often', 'earnestly', 'invited', 'it.', 'therefore', 'repeat,', 'let', 'man', 'talk', 'like', 'expedients,', 'till', 'hath', 'least', 'glympse', 'hope,', 'ever', 'hearty', 'sincere', 'attempt', 'put', 'practice.', 'but,', 'myself,', 'wearied', 'many', 'years', 'offering', 'vain,', 'idle,', 'visionary', 'thoughts,', 'length', 'utterly', 'despairing', 'success,', 'fortunately', 'fell', 'upon', 'proposal,', 'which,', 'wholly', 'new,', 'hath', 'something', 'solid', 'real,', 'expence', 'little', 'trouble,', 'full', 'power,', 'whereby', 'incur', 'danger', 'disobliging', 'england.', 'kind', 'commodity', 'bear', 'exportation,', 'flesh', 'tender', 'consistence,', 'admit', 'long', 'continuance', 'salt,', 'although', 'perhaps', 'could', 'name', 'country,', 'would', 'glad', 'eat', 'whole', 'nation', 'without', 'it.', 'all,', 'violently', 'bent', 'upon', 'opinion,', 'reject', 'offer,', 'proposed', 'wise', 'men,', 'shall', 'found', 'equally', 'innocent,', 'cheap,', 'easy,', 'effectual.', 'something', 'kind', 'shall', 'advanced', 'contradiction', 'scheme,', 'offering', 'better,', 'desire', 'author', 'authors', 'pleased', 'maturely', 'consider', 'two', 'points.', 'first,', 'things', 'stand,', 'able', 'find', 'food', 'raiment', 'hundred', 'thousand', 'useless', 'mouths', 'backs.', 'secondly,', 'round', 'million', 'creatures', 'humane', 'figure', 'throughout', 'kingdom,', 'whose', 'whole', 'subsistence', 'put', 'common', 'stock,', 'would', 'leave', 'debt', 'two', 'million', 'pounds', 'sterling,', 'adding', 'beggars', 
'profession,', 'bulk', 'farmers,', 'cottagers', 'labourers,', 'wives', 'children,', 'beggars', 'effect;', 'desire', 'politicians', 'dislike', 'overture,', 'may', 'perhaps', 'bold', 'attempt', 'answer,', 'first', 'ask', 'parents', 'mortals,', 'whether', 'would', 'day', 'think', 'great', 'happiness', 'sold', 'food', 'year', 'old,', 'manner', 'prescribe,', 'thereby', 'avoided', 'perpetual', 'scene', 'misfortunes,', 'since', 'gone', 'through,', 'oppression', 'landlords,', 'impossibility', 'paying', 'rent', 'without', 'money', 'trade,', 'want', 'common', 'sustenance,', 'neither', 'house', 'clothes', 'cover', 'inclemencies', 'weather,', 'inevitable', 'prospect', 'intailing', 'like,', 'greater', 'miseries,', 'upon', 'breed', 'ever.', 'profess', 'sincerity', 'heart,', 'least', 'personal', 'interest', 'endeavouring', 'promote', 'necessary', 'work,', 'motive', 'publick', 'good', 'country,', 'advancing', 'trade,', 'providing', 'infants,', 'relieving', 'poor,', 'giving', 'pleasure', 'rich.', 'children,', 'propose', 'get', 'single', 'penny;', 'youngest', 'nine', 'years', 'old,', 'wife', 'past', 'child-bearing.', 'end', 'project', 'gutenberg', 'ebook', 'modest', 'proposal,', 'jonathan', 'swift', '***', 'end', 'project', 'gutenberg', 'ebook', 'modest', 'proposal', '***', '*****', 'file', 'named', '1080-0.txt', '1080-0.zip', '*****', 'associated', 'files', 'various', 'formats', 'found', 'in:', 'http://www.gutenberg.org/1/0/8/1080/', 'produced', 'anonymous', 'volunteer,', 'david', 'widger', 'updated', 'editions', 'replace', 'previous', 'one--the', 'old', 'editions', 'renamed.', 'creating', 'works', 'print', 'editions', 'protected', 'u.s.', 'copyright', 'law', 'means', 'one', 'owns', 'united', 'states', 'copyright', 'works,', 'foundation', '(and', 'you!)', 'copy', 'distribute', 'united', 'states', 'without', 'permission', 'without', 'paying', 'copyright', 'royalties.', 'special', 'rules,', 'set', 'forth', 'general', 'terms', 'use', 'part', 'license,', 'apply', 'copying', 
'distributing', 'project', 'gutenberg-tm', 'electronic', 'works', 'protect', 'project', 'gutenberg-tm', 'concept', 'trademark.', 'project', 'gutenberg', 'registered', 'trademark,', 'may', 'used', 'charge', 'ebooks,', 'unless', 'receive', 'specific', 'permission.', 'charge', 'anything', 'copies', 'ebook,', 'complying', 'rules', 'easy.', 'may', 'use', 'ebook', 'nearly', 'purpose', 'creation', 'derivative', 'works,', 'reports,', 'performances', 'research.', 'may', 'modified', 'printed', 'given', 'away--you', 'may', 'practically', 'anything', 'united', 'states', 'ebooks', 'protected', 'u.s.', 'copyright', 'law.', 'redistribution', 'subject', 'trademark', 'license,', 'especially', 'commercial', 'redistribution.', 'start:', 'full', 'license', 'full', 'project', 'gutenberg', 'license', 'please', 'read', 'distribute', 'use', 'work', 'protect', 'project', 'gutenberg-tm', 'mission', 'promoting', 'free', 'distribution', 'electronic', 'works,', 'using', 'distributing', 'work', '(or', 'work', 'associated', 'way', 'phrase', '"project', 'gutenberg"),', 'agree', 'comply', 'terms', 'full', 'project', 'gutenberg-tm', 'license', 'available', 'file', 'online', 'www.gutenberg.org/license.', 'section', '1.', 'general', 'terms', 'use', 'redistributing', 'project', 'gutenberg-tm', 'electronic', 'works', '1.a.', 'reading', 'using', 'part', 'project', 'gutenberg-tm', 'electronic', 'work,', 'indicate', 'read,', 'understand,', 'agree', 'accept', 'terms', 'license', 'intellectual', 'property', '(trademark/copyright)', 'agreement.', 'agree', 'abide', 'terms', 'agreement,', 'must', 'cease', 'using', 'return', 'destroy', 'copies', 'project', 'gutenberg-tm', 'electronic', 'works', 'possession.', 'paid', 'fee', 'obtaining', 'copy', 'access', 'project', 'gutenberg-tm', 'electronic', 'work', 'agree', 'bound', 'terms', 'agreement,', 'may', 'obtain', 'refund', 'person', 'entity', 'paid', 'fee', 'set', 'forth', 'paragraph', '1.e.8.', '1.b.', '"project', 'gutenberg"', 'registered', 'trademark.', 'may', 
'used', 'associated', 'way', 'electronic', 'work', 'people', 'agree', 'bound', 'terms', 'agreement.', 'things', 'project', 'gutenberg-tm', 'electronic', 'works', 'even', 'without', 'complying', 'full', 'terms', 'agreement.', 'see', 'paragraph', '1.c', 'below.', 'lot', 'things', 'project', 'gutenberg-tm', 'electronic', 'works', 'follow', 'terms', 'agreement', 'help', 'preserve', 'free', 'future', 'access', 'project', 'gutenberg-tm', 'electronic', 'works.', 'see', 'paragraph', '1.e', 'below.', '1.c.', 'project', 'gutenberg', 'literary', 'archive', 'foundation', '("the', 'foundation"', 'pglaf),', 'owns', 'compilation', 'copyright', 'collection', 'project', 'gutenberg-tm', 'electronic', 'works.', 'nearly', 'individual', 'works', 'collection', 'public', 'domain', 'united', 'states.', 'individual', 'work', 'unprotected', 'copyright', 'law', 'united', 'states', 'located', 'united', 'states,', 'claim', 'right', 'prevent', 'copying,', 'distributing,', 'performing,', 'displaying', 'creating', 'derivative', 'works', 'based', 'work', 'long', 'references', 'project', 'gutenberg', 'removed.', 'course,', 'hope', 'support', 'project', 'gutenberg-tm', 'mission', 'promoting', 'free', 'access', 'electronic', 'works', 'freely', 'sharing', 'project', 'gutenberg-tm', 'works', 'compliance', 'terms', 'agreement', 'keeping', 'project', 'gutenberg-tm', 'name', 'associated', 'work.', 'easily', 'comply', 'terms', 'agreement', 'keeping', 'work', 'format', 'attached', 'full', 'project', 'gutenberg-tm', 'license', 'share', 'without', 'charge', 'others.', '1.d.', 'copyright', 'laws', 'place', 'located', 'also', 'govern', 'work.', 'copyright', 'laws', 'countries', 'constant', 'state', 'change.', 'outside', 'united', 'states,', 'check', 'laws', 'country', 'addition', 'terms', 'agreement', 'downloading,', 'copying,', 'displaying,', 'performing,', 'distributing', 'creating', 'derivative', 'works', 'based', 'work', 'project', 'gutenberg-tm', 'work.', 'foundation', 'makes', 'representations', 
'concerning', 'copyright', 'status', 'work', 'country', 'outside', 'united', 'states.', '1.e.', 'unless', 'removed', 'references', 'project', 'gutenberg:', '1.e.1.', 'following', 'sentence,', 'active', 'links', 'to,', 'immediate', 'access', 'to,', 'full', 'project', 'gutenberg-tm', 'license', 'must', 'appear', 'prominently', 'whenever', 'copy', 'project', 'gutenberg-tm', 'work', '(any', 'work', 'phrase', '"project', 'gutenberg"', 'appears,', 'phrase', '"project', 'gutenberg"', 'associated)', 'accessed,', 'displayed,', 'performed,', 'viewed,', 'copied', 'distributed:', 'ebook', 'use', 'anyone', 'anywhere', 'united', 'states', 'parts', 'world', 'cost', 'almost', 'restrictions', 'whatsoever.', 'may', 'copy', 'it,', 'give', 'away', 're-use', 'terms', 'project', 'gutenberg', 'license', 'included', 'ebook', 'online', 'www.gutenberg.org.', 'located', 'united', 'states,', 'check', 'laws', 'country', 'located', 'using', 'ebook.', '1.e.2.', 'individual', 'project', 'gutenberg-tm', 'electronic', 'work', 'derived', 'texts', 'protected', 'u.s.', 'copyright', 'law', '(does', 'contain', 'notice', 'indicating', 'posted', 'permission', 'copyright', 'holder),', 'work', 'copied', 'distributed', 'anyone', 'united', 'states', 'without', 'paying', 'fees', 'charges.', 'redistributing', 'providing', 'access', 'work', 'phrase', '"project', 'gutenberg"', 'associated', 'appearing', 'work,', 'must', 'comply', 'either', 'requirements', 'paragraphs', '1.e.1', '1.e.7', 'obtain', 'permission', 'use', 'work', 'project', 'gutenberg-tm', 'trademark', 'set', 'forth', 'paragraphs', '1.e.8', '1.e.9.', '1.e.3.', 'individual', 'project', 'gutenberg-tm', 'electronic', 'work', 'posted', 'permission', 'copyright', 'holder,', 'use', 'distribution', 'must', 'comply', 'paragraphs', '1.e.1', '1.e.7', 'additional', 'terms', 'imposed', 'copyright', 'holder.', 'additional', 'terms', 'linked', 'project', 'gutenberg-tm', 'license', 'works', 'posted', 'permission', 'copyright', 'holder', 'found', 'beginning', 
'work.', '1.e.4.', 'unlink', 'detach', 'remove', 'full', 'project', 'gutenberg-tm', 'license', 'terms', 'work,', 'files', 'containing', 'part', 'work', 'work', 'associated', 'project', 'gutenberg-tm.', '1.e.5.', 'copy,', 'display,', 'perform,', 'distribute', 'redistribute', 'electronic', 'work,', 'part', 'electronic', 'work,', 'without', 'prominently', 'displaying', 'sentence', 'set', 'forth', 'paragraph', '1.e.1', 'active', 'links', 'immediate', 'access', 'full', 'terms', 'project', 'gutenberg-tm', 'license.', '1.e.6.', 'may', 'convert', 'distribute', 'work', 'binary,', 'compressed,', 'marked', 'up,', 'nonproprietary', 'proprietary', 'form,', 'including', 'word', 'processing', 'hypertext', 'form.', 'however,', 'provide', 'access', 'distribute', 'copies', 'project', 'gutenberg-tm', 'work', 'format', '"plain', 'vanilla', 'ascii"', 'format', 'used', 'official', 'version', 'posted', 'official', 'project', 'gutenberg-tm', 'web', 'site', '(www.gutenberg.org),', 'must,', 'additional', 'cost,', 'fee', 'expense', 'user,', 'provide', 'copy,', 'means', 'exporting', 'copy,', 'means', 'obtaining', 'copy', 'upon', 'request,', 'work', 'original', '"plain', 'vanilla', 'ascii"', 'form.', 'alternate', 'format', 'must', 'include', 'full', 'project', 'gutenberg-tm', 'license', 'specified', 'paragraph', '1.e.1.', '1.e.7.', 'charge', 'fee', 'access', 'to,', 'viewing,', 'displaying,', 'performing,', 'copying', 'distributing', 'project', 'gutenberg-tm', 'works', 'unless', 'comply', 'paragraph', '1.e.8', '1.e.9.', '1.e.8.', 'may', 'charge', 'reasonable', 'fee', 'copies', 'providing', 'access', 'distributing', 'project', 'gutenberg-tm', 'electronic', 'works', 'provided', '*', 'pay', 'royalty', 'fee', '20%', 'gross', 'profits', 'derive', 'use', 'project', 'gutenberg-tm', 'works', 'calculated', 'using', 'method', 'already', 'use', 'calculate', 'applicable', 'taxes.', 'fee', 'owed', 'owner', 'project', 'gutenberg-tm', 'trademark,', 'agreed', 'donate', 'royalties', 'paragraph', 'project', 
'gutenberg', 'literary', 'archive', 'foundation.', 'royalty', 'payments', 'must', 'paid', 'within', '60', 'days', 'following', 'date', 'prepare', '(or', 'legally', 'required', 'prepare)', 'periodic', 'tax', 'returns.', 'royalty', 'payments', 'clearly', 'marked', 'sent', 'project', 'gutenberg', 'literary', 'archive', 'foundation', 'address', 'specified', 'section', '4,', '"information', 'donations', 'project', 'gutenberg', 'literary', 'archive', 'foundation."', '*', 'provide', 'full', 'refund', 'money', 'paid', 'user', 'notifies', 'writing', '(or', 'e-mail)', 'within', '30', 'days', 'receipt', 's/he', 'agree', 'terms', 'full', 'project', 'gutenberg-tm', 'license.', 'must', 'require', 'user', 'return', 'destroy', 'copies', 'works', 'possessed', 'physical', 'medium', 'discontinue', 'use', 'access', 'copies', 'project', 'gutenberg-tm', 'works.', '*', 'provide,', 'accordance', 'paragraph', '1.f.3,', 'full', 'refund', 'money', 'paid', 'work', 'replacement', 'copy,', 'defect', 'electronic', 'work', 'discovered', 'reported', 'within', '90', 'days', 'receipt', 'work.', '*', 'comply', 'terms', 'agreement', 'free', 'distribution', 'project', 'gutenberg-tm', 'works.', '1.e.9.', 'wish', 'charge', 'fee', 'distribute', 'project', 'gutenberg-tm', 'electronic', 'work', 'group', 'works', 'different', 'terms', 'set', 'forth', 'agreement,', 'must', 'obtain', 'permission', 'writing', 'project', 'gutenberg', 'literary', 'archive', 'foundation', 'project', 'gutenberg', 'trademark', 'llc,', 'owner', 'project', 'gutenberg-tm', 'trademark.', 'contact', 'foundation', 'set', 'forth', 'section', '3', 'below.', '1.f.', '1.f.1.', 'project', 'gutenberg', 'volunteers', 'employees', 'expend', 'considerable', 'effort', 'identify,', 'copyright', 'research', 'on,', 'transcribe', 'proofread', 'works', 'protected', 'u.s.', 'copyright', 'law', 'creating', 'project', 'gutenberg-tm', 'collection.', 'despite', 'efforts,', 'project', 'gutenberg-tm', 'electronic', 'works,', 'medium', 'may', 'stored,', 'may', 
'contain', '"defects,"', 'as,', 'limited', 'to,', 'incomplete,', 'inaccurate', 'corrupt', 'data,', 'transcription', 'errors,', 'copyright', 'intellectual', 'property', 'infringement,', 'defective', 'damaged', 'disk', 'medium,', 'computer', 'virus,', 'computer', 'codes', 'damage', 'cannot', 'read', 'equipment.', '1.f.2.', 'limited', 'warranty,', 'disclaimer', 'damages', '-', 'except', '"right', 'replacement', 'refund"', 'described', 'paragraph', '1.f.3,', 'project', 'gutenberg', 'literary', 'archive', 'foundation,', 'owner', 'project', 'gutenberg-tm', 'trademark,', 'party', 'distributing', 'project', 'gutenberg-tm', 'electronic', 'work', 'agreement,', 'disclaim', 'liability', 'damages,', 'costs', 'expenses,', 'including', 'legal', 'fees.', 'agree', 'remedies', 'negligence,', 'strict', 'liability,', 'breach', 'warranty', 'breach', 'contract', 'except', 'provided', 'paragraph', '1.f.3.', 'agree', 'foundation,', 'trademark', 'owner,', 'distributor', 'agreement', 'liable', 'actual,', 'direct,', 'indirect,', 'consequential,', 'punitive', 'incidental', 'damages', 'even', 'give', 'notice', 'possibility', 'damage.', '1.f.3.', 'limited', 'right', 'replacement', 'refund', '-', 'discover', 'defect', 'electronic', 'work', 'within', '90', 'days', 'receiving', 'it,', 'receive', 'refund', 'money', '(if', 'any)', 'paid', 'sending', 'written', 'explanation', 'person', 'received', 'work', 'from.', 'received', 'work', 'physical', 'medium,', 'must', 'return', 'medium', 'written', 'explanation.', 'person', 'entity', 'provided', 'defective', 'work', 'may', 'elect', 'provide', 'replacement', 'copy', 'lieu', 'refund.', 'received', 'work', 'electronically,', 'person', 'entity', 'providing', 'may', 'choose', 'give', 'second', 'opportunity', 'receive', 'work', 'electronically', 'lieu', 'refund.', 'second', 'copy', 'also', 'defective,', 'may', 'demand', 'refund', 'writing', 'without', 'opportunities', 'fix', 'problem.', '1.f.4.', 'except', 'limited', 'right', 'replacement', 'refund', 'set', 
'forth', 'paragraph', '1.f.3,', 'work', 'provided', "'as-is',", 'warranties', 'kind,', 'express', 'implied,', 'including', 'limited', 'warranties', 'merchantability', 'fitness', 'purpose.', '1.f.5.', 'states', 'allow', 'disclaimers', 'certain', 'implied', 'warranties', 'exclusion', 'limitation', 'certain', 'types', 'damages.', 'disclaimer', 'limitation', 'set', 'forth', 'agreement', 'violates', 'law', 'state', 'applicable', 'agreement,', 'agreement', 'shall', 'interpreted', 'make', 'maximum', 'disclaimer', 'limitation', 'permitted', 'applicable', 'state', 'law.', 'invalidity', 'unenforceability', 'provision', 'agreement', 'shall', 'void', 'remaining', 'provisions.', '1.f.6.', 'indemnity', '-', 'agree', 'indemnify', 'hold', 'foundation,', 'trademark', 'owner,', 'agent', 'employee', 'foundation,', 'anyone', 'providing', 'copies', 'project', 'gutenberg-tm', 'electronic', 'works', 'accordance', 'agreement,', 'volunteers', 'associated', 'production,', 'promotion', 'distribution', 'project', 'gutenberg-tm', 'electronic', 'works,', 'harmless', 'liability,', 'costs', 'expenses,', 'including', 'legal', 'fees,', 'arise', 'directly', 'indirectly', 'following', 'cause', 'occur:', '(a)', 'distribution', 'project', 'gutenberg-tm', 'work,', '(b)', 'alteration,', 'modification,', 'additions', 'deletions', 'project', 'gutenberg-tm', 'work,', '(c)', 'defect', 'cause.', 'section', '2.', 'information', 'mission', 'project', 'gutenberg-tm', 'project', 'gutenberg-tm', 'synonymous', 'free', 'distribution', 'electronic', 'works', 'formats', 'readable', 'widest', 'variety', 'computers', 'including', 'obsolete,', 'old,', 'middle-aged', 'new', 'computers.', 'exists', 'efforts', 'hundreds', 'volunteers', 'donations', 'people', 'walks', 'life.', 'volunteers', 'financial', 'support', 'provide', 'volunteers', 'assistance', 'need', 'critical', 'reaching', 'project', "gutenberg-tm's", 'goals', 'ensuring', 'project', 'gutenberg-tm', 'collection', 'remain', 'freely', 'available', 'generations', 
'come.', '2001,', 'project', 'gutenberg', 'literary', 'archive', 'foundation', 'created', 'provide', 'secure', 'permanent', 'future', 'project', 'gutenberg-tm', 'future', 'generations.', 'learn', 'project', 'gutenberg', 'literary', 'archive', 'foundation', 'efforts', 'donations', 'help,', 'see', 'sections', '3', '4', 'foundation', 'information', 'page', 'www.gutenberg.org', 'section', '3.', 'information', 'project', 'gutenberg', 'literary', 'archive', 'foundation', 'project', 'gutenberg', 'literary', 'archive', 'foundation', 'non', 'profit', '501(c)(3)', 'educational', 'corporation', 'organized', 'laws', 'state', 'mississippi', 'granted', 'tax', 'exempt', 'status', 'internal', 'revenue', 'service.', "foundation's", 'ein', 'federal', 'tax', 'identification', 'number', '64-6221541.', 'contributions', 'project', 'gutenberg', 'literary', 'archive', 'foundation', 'tax', 'deductible', 'full', 'extent', 'permitted', 'u.s.', 'federal', 'laws', "state's", 'laws.', "foundation's", 'principal', 'office', 'fairbanks,', 'alaska,', 'mailing', 'address:', 'po', 'box', '750175,', 'fairbanks,', 'ak', '99775,', 'volunteers', 'employees', 'scattered', 'throughout', 'numerous', 'locations.', 'business', 'office', 'located', '809', 'north', '1500', 'west,', 'salt', 'lake', 'city,', 'ut', '84116,', '(801)', '596-1887.', 'email', 'contact', 'links', 'date', 'contact', 'information', 'found', "foundation's", 'web', 'site', 'official', 'page', 'www.gutenberg.org/contact', 'additional', 'contact', 'information:', 'dr.', 'gregory', 'b.', 'newby', 'chief', 'executive', 'director', 'gbnewby@pglaf.org', 'section', '4.', 'information', 'donations', 'project', 'gutenberg', 'literary', 'archive', 'foundation', 'project', 'gutenberg-tm', 'depends', 'upon', 'cannot', 'survive', 'without', 'wide', 'spread', 'public', 'support', 'donations', 'carry', 'mission', 'increasing', 'number', 'public', 'domain', 'licensed', 'works', 'freely', 'distributed', 'machine', 'readable', 'form', 'accessible', 
'widest', 'array', 'equipment', 'including', 'outdated', 'equipment.', 'many', 'small', 'donations', '($1', '$5,000)', 'particularly', 'important', 'maintaining', 'tax', 'exempt', 'status', 'irs.', 'foundation', 'committed', 'complying', 'laws', 'regulating', 'charities', 'charitable', 'donations', '50', 'states', 'united', 'states.', 'compliance', 'requirements', 'uniform', 'takes', 'considerable', 'effort,', 'much', 'paperwork', 'many', 'fees', 'meet', 'keep', 'requirements.', 'solicit', 'donations', 'locations', 'received', 'written', 'confirmation', 'compliance.', 'send', 'donations', 'determine', 'status', 'compliance', 'particular', 'state', 'visit', 'www.gutenberg.org/donate', 'cannot', 'solicit', 'contributions', 'states', 'met', 'solicitation', 'requirements,', 'know', 'prohibition', 'accepting', 'unsolicited', 'donations', 'donors', 'states', 'approach', 'us', 'offers', 'donate.', 'international', 'donations', 'gratefully', 'accepted,', 'cannot', 'make', 'statements', 'concerning', 'tax', 'treatment', 'donations', 'received', 'outside', 'united', 'states.', 'u.s.', 'laws', 'alone', 'swamp', 'small', 'staff.', 'please', 'check', 'project', 'gutenberg', 'web', 'pages', 'current', 'donation', 'methods', 'addresses.', 'donations', 'accepted', 'number', 'ways', 'including', 'checks,', 'online', 'payments', 'credit', 'card', 'donations.', 'donate,', 'please', 'visit:', 'www.gutenberg.org/donate', 'section', '5.', 'general', 'information', 'project', 'gutenberg-tm', 'electronic', 'works.', 'professor', 'michael', 's.', 'hart', 'originator', 'project', 'gutenberg-tm', 'concept', 'library', 'electronic', 'works', 'could', 'freely', 'shared', 'anyone.', 'forty', 'years,', 'produced', 'distributed', 'project', 'gutenberg-tm', 'ebooks', 'loose', 'network', 'volunteer', 'support.', 'project', 'gutenberg-tm', 'ebooks', 'often', 'created', 'several', 'printed', 'editions,', 'confirmed', 'protected', 'copyright', 'u.s.', 'unless', 'copyright', 'notice', 'included.', 
'thus,', 'necessarily', 'keep', 'ebooks', 'compliance', 'particular', 'paper', 'edition.', 'people', 'start', 'web', 'site', 'main', 'pg', 'search', 'facility:', 'www.gutenberg.org', 'web', 'site', 'includes', 'information', 'project', 'gutenberg-tm,', 'including', 'make', 'donations', 'project', 'gutenberg', 'literary', 'archive', 'foundation,', 'help', 'produce', 'new', 'ebooks,', 'subscribe', 'email', 'newsletter', 'hear', 'new', 'ebooks.']
In [159]:
fdist_withstop = FreqDist(newtokens_withstop) 
fdist_withstop
Out[159]:
FreqDist({'project': 83, 'gutenberg-tm': 54, 'work': 36, 'electronic': 27, 'gutenberg': 25, 'may': 25, 'works': 22, 'terms': 21, 'copyright': 19, 'would': 17, ...})
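The FreqDist display above shows only the first ten entries and then `...`. Here is a minimal sketch of listing the top entries explicitly. NLTK's `FreqDist` is a subclass of Python's `collections.Counter`, so `most_common(n)` returns `(word, count)` pairs, highest count first. The toy token list is hypothetical and stands in for `newtokens_withstop`. `Counter` is used so the sketch runs without NLTK.

```python
from collections import Counter

# Hypothetical toy tokens standing in for newtokens_withstop
tokens = ["project", "gutenberg", "project", "work", "project", "gutenberg", "may"]

# Counter counts tokens the same way nltk's FreqDist does (FreqDist subclasses Counter)
fdist = Counter(tokens)

# The 3 most frequent tokens as (token, count) pairs; ties keep first-seen order
top_three = fdist.most_common(3)
print(top_three)
```

On the notebook's data, `fdist_withstop.most_common(20)` would list the twenty most frequent tokens. This is a quick way to find more candidates for the custom stopword list.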
In [160]:
#a few more words need to be added to our custom stopword list:
#project, gutenberg, gutenberg-tm, ebook, electronic, archive, copyright, ***, #1080
#create your own stopwords list and combine it with the existing one
more_stopwords = ["***", "copyright", "gutenberg-tm", "archive", "project", "ebook", "#1080", "electronic", "gutenberg"]
stopwords2 = more_stopwords + stopwords2
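Joining two lists with `+` works, but it can leave duplicate entries, and a `token in some_list` check scans the whole list for every token. Here is a sketch of an alternative that stores the combined stopwords in a Python set, assuming order does not matter. The short `base_stopwords` list is hypothetical and stands in for `stopwords2`.

```python
# Hypothetical base list standing in for stopwords2 (e.g. a general English list)
base_stopwords = ["the", "and", "of", "to"]
more_stopwords = ["***", "copyright", "gutenberg-tm", "archive", "project",
                  "ebook", "#1080", "electronic", "gutenberg"]

# Set union drops duplicates and gives fast membership tests
stop_set = set(base_stopwords) | set(more_stopwords)

print(len(stop_set))
print("project" in stop_set, "modest" in stop_set)
```

The filtering loop in the next cell works unchanged if `stop_set` is used in place of `stopwords2`.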
In [161]:
#rerun the filtering loop
#rebuild the token list, this time filtering with the updated custom stopword list

newtokens_withstop = []

for token in cleaned_tokens: #cleaned_tokens is the cleaned version of the corpus created earlier
    if token not in stopwords2: #our custom stopword list
        newtokens_withstop.append(token)
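The same filtering can be written in one line as a list comprehension. This sketch produces the same result as the loop above. The small `cleaned_tokens` and `stopwords2` values are hypothetical toy data, not the notebook's corpus.

```python
# Hypothetical inputs standing in for cleaned_tokens and stopwords2
cleaned_tokens = ["a", "modest", "proposal,", "project", "gutenberg", "jonathan", "swift"]
stopwords2 = {"a", "project", "gutenberg"}

# Keep only tokens that are not in the stopword collection, preserving order
newtokens_withstop = [token for token in cleaned_tokens if token not in stopwords2]
print(newtokens_withstop)
```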
In [106]:
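#for comparison: filter with the standard stopword list only
#(stopwords must hold a list of words here, e.g. stopwords.words('english'), not the nltk corpus reader itself)
newtokens2 = []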

for token in cleaned_tokens:
    if token not in stopwords:
        newtokens2.append(token)
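With two filtered versions of the corpus, you can check what the custom stopwords add. Here is a sketch, using hypothetical toy data in place of the notebook's variables. It counts how many tokens each list removed and which words only the custom list filtered out.

```python
# Hypothetical toy data: a default stopword set versus an extended custom set
cleaned_tokens = ["the", "project", "gutenberg", "ebook", "of", "a", "modest", "proposal"]
default_stops = {"the", "of", "a"}
custom_stops = default_stops | {"project", "gutenberg", "ebook"}

newtokens2 = [t for t in cleaned_tokens if t not in default_stops]
newtokens_withstop = [t for t in cleaned_tokens if t not in custom_stops]

# Number of tokens removed by each list
removed_default = len(cleaned_tokens) - len(newtokens2)
removed_custom = len(cleaned_tokens) - len(newtokens_withstop)
print(removed_default, removed_custom)

# Words that only the custom list filtered out
print(sorted(set(newtokens2) - set(newtokens_withstop)))
```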
In [162]:
print(newtokens_withstop)
['', 'modest', 'proposal,', 'jonathan', 'swift', 'use', 'anyone', 'anywhere', 'cost', 'almost', 'restrictions', 'whatsoever.', 'may', 'copy', 'it,', 'give', 'away', 're-use', 'terms', 'license', 'included', 'online', 'www.gutenberg.org', 'title:', 'modest', 'proposal', 'preventing', 'children', 'poor', 'people', 'ireland,', 'burden', 'parents', 'country,', 'making', 'beneficial', 'publick', '-', '1729', 'author:', 'jonathan', 'swift', 'release', 'date:', 'july', '27,', '2008', '[ebook', '#1080]', 'last', 'updated:', 'october', '17,', '2019', 'language:', 'english', 'character', 'set', 'encoding:', 'utf-8', 'start', 'modest', 'proposal', 'produced', 'anonymous', 'volunteer,', 'david', 'widger', 'modest', 'proposal', 'preventing', 'children', 'poor', 'people', 'ireland,', 'burden', 'parents', 'country,', 'making', 'beneficial', 'publick.', 'dr.', 'jonathan', 'swift', '1729', 'melancholy', 'object', 'those,', 'walk', 'great', 'town,', 'travel', 'country,', 'see', 'streets,', 'roads,', 'cabbin-doors', 'crowded', 'beggars', 'female', 'sex,', 'followed', 'three,', 'four,', 'six', 'children,', 'rags,', 'importuning', 'every', 'passenger', 'alms.', 'mothers,', 'instead', 'able', 'work', 'honest', 'livelihood,', 'forced', 'employ', 'time', 'stroling', 'beg', 'sustenance', 'helpless', 'infants', 'who,', 'grow', 'up,', 'either', 'turn', 'thieves', 'want', 'work,', 'leave', 'dear', 'native', 'country,', 'fight', 'pretender', 'spain,', 'sell', 'barbadoes.', 'think', 'agreed', 'parties,', 'prodigious', 'number', 'children', 'arms,', 'backs,', 'heels', 'mothers,', 'frequently', 'fathers,', 'present', 'deplorable', 'state', 'kingdom,', 'great', 'additional', 'grievance;', 'therefore', 'whoever', 'could', 'find', 'fair,', 'cheap', 'easy', 'method', 'making', 'children', 'sound', 'useful', 'members', 'commonwealth,', 'would', 'deserve', 'well', 'publick,', 'statue', 'set', 'preserver', 'nation.', 'intention', 'far', 'confined', 'provide', 'children', 'professed', 'beggars:', 'much', 
'greater', 'extent,', 'shall', 'take', 'whole', 'number', 'infants', 'certain', 'age,', 'born', 'parents', 'effect', 'little', 'able', 'support', 'them,', 'demand', 'charity', 'streets.', 'part,', 'turned', 'thoughts', 'many', 'years', 'upon', 'important', 'subject,', 'maturely', 'weighed', 'several', 'schemes', 'projectors,', 'always', 'found', 'grossly', 'mistaken', 'computation.', 'true,', 'child', 'dropt', 'dam,', 'may', 'supported', 'milk,', 'solar', 'year,', 'little', 'nourishment:', 'value', 'two', 'shillings,', 'mother', 'may', 'certainly', 'get,', 'value', 'scraps,', 'lawful', 'occupation', 'begging;', 'exactly', 'one', 'year', 'old', 'propose', 'provide', 'manner,', 'as,', 'instead', 'charge', 'upon', 'parents,', 'parish,', 'wanting', 'food', 'raiment', 'rest', 'lives,', 'shall,', 'contrary,', 'contribute', 'feeding,', 'partly', 'clothing', 'many', 'thousands.', 'likewise', 'another', 'great', 'advantage', 'scheme,', 'prevent', 'voluntary', 'abortions,', 'horrid', 'practice', 'women', 'murdering', 'bastard', 'children,', 'alas!', 'frequent', 'among', 'us,', 'sacrificing', 'poor', 'innocent', 'babes,', 'doubt,', 'avoid', 'expence', 'shame,', 'would', 'move', 'tears', 'pity', 'savage', 'inhuman', 'breast.', 'number', 'souls', 'kingdom', 'usually', 'reckoned', 'one', 'million', 'half,', 'calculate', 'may', 'two', 'hundred', 'thousand', 'couple,', 'whose', 'wives', 'breeders;', 'number', 'subtract', 'thirty', 'thousand', 'couple,', 'able', 'maintain', 'children,', '(although', 'apprehend', 'cannot', 'many', 'present', 'distresses', 'kingdom)', 'granted,', 'remain', 'hundred', 'seventy', 'thousand', 'breeders.', 'subtract', 'fifty', 'thousand,', 'women', 'miscarry,', 'whose', 'children', 'die', 'accident', 'disease', 'within', 'year.', 'remain', 'hundred', 'twenty', 'thousand', 'children', 'poor', 'parents', 'annually', 'born.', 'question', 'therefore', 'is,', 'number', 'shall', 'reared', 'provided', 'for?', 'which,', 'already', 'said,', 'present', 
'situation', 'affairs,', 'utterly', 'impossible', 'methods', 'hitherto', 'proposed.', 'neither', 'employ', 'handicraft', 'agriculture;', 'neither', 'build', 'houses,', '(i', 'mean', 'country)', 'cultivate', 'land:', 'seldom', 'pick', 'livelihood', 'stealing', 'till', 'arrive', 'six', 'years', 'old;', 'except', 'towardly', 'parts,', 'although', 'confess', 'learn', 'rudiments', 'much', 'earlier;', 'time', 'however', 'properly', 'looked', 'upon', 'probationers;', 'informed', 'principal', 'gentleman', 'county', 'cavan,', 'protested', 'me,', 'never', 'knew', 'one', 'two', 'instances', 'age', 'six,', 'even', 'part', 'kingdom', 'renowned', 'quickest', 'proficiency', 'art.', 'assured', 'merchants,', 'boy', 'girl,', 'twelve', 'years', 'old,', 'saleable', 'commodity,', 'even', 'come', 'age,', 'yield', 'three', 'pounds,', 'three', 'pounds', 'half', 'crown', 'most,', 'exchange;', 'cannot', 'turn', 'account', 'either', 'parents', 'kingdom,', 'charge', 'nutriments', 'rags', 'least', 'four', 'times', 'value.', 'shall', 'therefore', 'humbly', 'propose', 'thoughts,', 'hope', 'liable', 'least', 'objection.', 'assured', 'knowing', 'american', 'acquaintance', 'london,', 'young', 'healthy', 'child', 'well', 'nursed,', 'is,', 'year', 'old,', 'delicious', 'nourishing', 'wholesome', 'food,', 'whether', 'stewed,', 'roasted,', 'baked,', 'boiled;', 'make', 'doubt', 'equally', 'serve', 'fricasee,', 'ragoust.', 'therefore', 'humbly', 'offer', 'publick', 'consideration,', 'hundred', 'twenty', 'thousand', 'children,', 'already', 'computed,', 'twenty', 'thousand', 'may', 'reserved', 'breed,', 'whereof', 'one', 'fourth', 'part', 'males;', 'allow', 'sheep,', 'black', 'cattle,', 'swine,', 'reason', 'is,', 'children', 'seldom', 'fruits', 'marriage,', 'circumstance', 'much', 'regarded', 'savages,', 'therefore,', 'one', 'male', 'sufficient', 'serve', 'four', 'females.', 'remaining', 'hundred', 'thousand', 'may,', 'year', 'old,', 'offered', 'sale', 'persons', 'quality', 'fortune,', 'kingdom,', 'always', 
'advising', 'mother', 'let', 'suck', 'plentifully', 'last', 'month,', 'render', 'plump,', 'fat', 'good', 'table.', 'child', 'make', 'two', 'dishes', 'entertainment', 'friends,', 'family', 'dines', 'alone,', 'fore', 'hind', 'quarter', 'make', 'reasonable', 'dish,', 'seasoned', 'little', 'pepper', 'salt,', 'good', 'boiled', 'fourth', 'day,', 'especially', 'winter.', 'reckoned', 'upon', 'medium,', 'child', 'born', 'weigh', '12', 'pounds,', 'solar', 'year,', 'tolerably', 'nursed,', 'encreaseth', '28', 'pounds.', 'grant', 'food', 'somewhat', 'dear,', 'therefore', 'proper', 'landlords,', 'who,', 'already', 'devoured', 'parents,', 'seem', 'best', 'title', 'children.', 'infant’s', 'flesh', 'season', 'throughout', 'year,', 'plentiful', 'march,', 'little', 'after;', 'told', 'grave', 'author,', 'eminent', 'french', 'physician,', 'fish', 'prolifick', 'dyet,', 'children', 'born', 'roman', 'catholick', 'countries', 'nine', 'months', 'lent,', 'season;', 'therefore,', 'reckoning', 'year', 'lent,', 'markets', 'glutted', 'usual,', 'number', 'popish', 'infants,', 'least', 'three', 'one', 'kingdom,', 'therefore', 'one', 'collateral', 'advantage,', 'lessening', 'number', 'papists', 'among', 'us.', 'already', 'computed', 'charge', 'nursing', 'beggar’s', 'child', '(in', 'list', 'reckon', 'cottagers,', 'labourers,', 'four-fifths', 'farmers)', 'two', 'shillings', 'per', 'annum,', 'rags', 'included;', 'believe', 'gentleman', 'would', 'repine', 'give', 'ten', 'shillings', 'carcass', 'good', 'fat', 'child,', 'which,', 'said,', 'make', 'four', 'dishes', 'excellent', 'nutritive', 'meat,', 'hath', 'particular', 'friend,', 'family', 'dine', 'him.', 'thus', 'squire', 'learn', 'good', 'landlord,', 'grow', 'popular', 'among', 'tenants,', 'mother', 'eight', 'shillings', 'neat', 'profit,', 'fit', 'work', 'till', 'produces', 'another', 'child.', 'thrifty', '(as', 'must', 'confess', 'times', 'require)', 'may', 'flay', 'carcass;', 'skin', 'which,', 'artificially', 'dressed,', 'make', 'admirable', 
'gloves', 'ladies,', 'summer', 'boots', 'fine', 'gentlemen.', 'city', 'dublin,', 'shambles', 'may', 'appointed', 'purpose,', 'convenient', 'parts', 'it,', 'butchers', 'may', 'assured', 'wanting;', 'although', 'rather', 'recommend', 'buying', 'children', 'alive,', 'dressing', 'hot', 'knife,', 'roasting', 'pigs.', 'worthy', 'person,', 'true', 'lover', 'country,', 'whose', 'virtues', 'highly', 'esteem,', 'lately', 'pleased', 'discoursing', 'matter,', 'offer', 'refinement', 'upon', 'scheme.', 'said,', 'many', 'gentlemen', 'kingdom,', 'late', 'destroyed', 'deer,', 'conceived', 'want', 'venison', 'might', 'well', 'supplied', 'bodies', 'young', 'lads', 'maidens,', 'exceeding', 'fourteen', 'years', 'age,', 'twelve;', 'great', 'number', 'sexes', 'every', 'county', 'ready', 'starve', 'want', 'work', 'service:', 'disposed', 'parents', 'alive,', 'otherwise', 'nearest', 'relations.', 'due', 'deference', 'excellent', 'friend,', 'deserving', 'patriot,', 'cannot', 'altogether', 'sentiments;', 'males,', 'american', 'acquaintance', 'assured', 'frequent', 'experience,', 'flesh', 'generally', 'tough', 'lean,', 'like', 'schoolboys,', 'continual', 'exercise,', 'taste', 'disagreeable,', 'fatten', 'would', 'answer', 'charge.', 'females,', 'would,', 'think,', 'humble', 'submission,', 'loss', 'publick,', 'soon', 'would', 'become', 'breeders', 'themselves:', 'besides,', 'improbable', 'scrupulous', 'people', 'might', 'apt', 'censure', 'practice,', '(although', 'indeed', 'unjustly)', 'little', 'bordering', 'upon', 'cruelty,', 'which,', 'confess,', 'hath', 'always', 'strongest', 'objection', 'project,', 'well', 'soever', 'intended.', 'order', 'justify', 'friend,', 'confessed,', 'expedient', 'put', 'head', 'famous', 'psalmanaazor,', 'native', 'island', 'formosa,', 'came', 'thence', 'london,', 'twenty', 'years', 'ago,', 'conversation', 'told', 'friend,', 'country,', 'young', 'person', 'happened', 'put', 'death,', 'executioner', 'sold', 'carcass', 'persons', 'quality,', 'prime', 'dainty;', 
'that,', 'time,', 'body', 'plump', 'girl', 'fifteen,', 'crucified', 'attempt', 'poison', 'emperor,', 'sold', 'imperial', 'majesty’s', 'prime', 'minister', 'state,', 'great', 'mandarins', 'court', 'joints', 'gibbet,', 'four', 'hundred', 'crowns.', 'neither', 'indeed', 'deny,', 'use', 'made', 'several', 'plump', 'young', 'girls', 'town,', 'without', 'one', 'single', 'groat', 'fortunes,', 'cannot', 'stir', 'abroad', 'without', 'chair,', 'appear', 'playhouse', 'assemblies', 'foreign', 'fineries', 'never', 'pay', 'for,', 'kingdom', 'would', 'worse.', 'persons', 'desponding', 'spirit', 'great', 'concern', 'vast', 'number', 'poor', 'people,', 'aged,', 'diseased,', 'maimed;', 'desired', 'employ', 'thoughts', 'course', 'may', 'taken,', 'ease', 'nation', 'grievous', 'incumbrance.', 'least', 'pain', 'upon', 'matter,', 'well', 'known,', 'every', 'day', 'dying,', 'rotting,', 'cold', 'famine,', 'filth,', 'vermin,', 'fast', 'reasonably', 'expected.', 'young', 'labourers,', 'almost', 'hopeful', 'condition.', 'cannot', 'get', 'work,', 'consequently', 'pine', 'away', 'want', 'nourishment,', 'degree,', 'time', 'accidentally', 'hired', 'common', 'labour,', 'strength', 'perform', 'it,', 'thus', 'country', 'happily', 'delivered', 'evils', 'come.', 'long', 'digressed,', 'therefore', 'shall', 'return', 'subject.', 'think', 'advantages', 'proposal', 'made', 'obvious', 'many,', 'well', 'highest', 'importance.', 'first,', 'already', 'observed,', 'would', 'greatly', 'lessen', 'number', 'papists,', 'yearly', 'overrun,', 'principal', 'breeders', 'nation,', 'well', 'dangerous', 'enemies,', 'stay', 'home', 'purpose', 'design', 'deliver', 'kingdom', 'pretender,', 'hoping', 'take', 'advantage', 'absence', 'many', 'good', 'protestants,', 'chosen', 'rather', 'leave', 'country,', 'stay', 'home', 'pay', 'tithes', 'conscience', 'episcopal', 'curate.', 'secondly,', 'poorer', 'tenants', 'something', 'valuable', 'own,', 'law', 'may', 'made', 'liable', 'distress,', 'help', 'pay', 'landlord’s', 'rent,', 
'corn', 'cattle', 'already', 'seized,', 'money', 'thing', 'unknown.', 'thirdly,', 'whereas', 'maintainance', 'hundred', 'thousand', 'children,', 'two', 'years', 'old,', 'upwards,', 'cannot', 'computed', 'less', 'ten', 'shillings', 'piece', 'per', 'annum,', 'nation’s', 'stock', 'thereby', 'encreased', 'fifty', 'thousand', 'pounds', 'per', 'annum,', 'besides', 'profit', 'new', 'dish,', 'introduced', 'tables', 'gentlemen', 'fortune', 'kingdom,', 'refinement', 'taste.', 'money', 'circulate', 'among', 'selves,', 'goods', 'entirely', 'growth', 'manufacture.', 'fourthly,', 'constant', 'breeders,', 'besides', 'gain', 'eight', 'shillings', 'sterling', 'per', 'annum', 'sale', 'children,', 'rid', 'charge', 'maintaining', 'first', 'year.', 'fifthly,', 'food', 'would', 'likewise', 'bring', 'great', 'custom', 'taverns,', 'vintners', 'certainly', 'prudent', 'procure', 'best', 'receipts', 'dressing', 'perfection;', 'consequently', 'houses', 'frequented', 'fine', 'gentlemen,', 'justly', 'value', 'upon', 'knowledge', 'good', 'eating;', 'skilful', 'cook,', 'understands', 'oblige', 'guests,', 'contrive', 'make', 'expensive', 'please.', 'sixthly,', 'would', 'great', 'inducement', 'marriage,', 'wise', 'nations', 'either', 'encouraged', 'rewards,', 'enforced', 'laws', 'penalties.', 'would', 'encrease', 'care', 'tenderness', 'mothers', 'towards', 'children,', 'sure', 'settlement', 'life', 'poor', 'babes,', 'provided', 'sort', 'publick,', 'annual', 'profit', 'instead', 'expence.', 'soon', 'see', 'honest', 'emulation', 'among', 'married', 'women,', 'could', 'bring', 'fattest', 'child', 'market.', 'men', 'would', 'become', 'fond', 'wives,', 'time', 'pregnancy,', 'mares', 'foal,', 'cows', 'calf,', 'sows', 'ready', 'farrow;', 'offer', 'beat', 'kick', '(as', 'frequent', 'practice)', 'fear', 'miscarriage.', 'many', 'advantages', 'might', 'enumerated.', 'instance,', 'addition', 'thousand', 'carcasses', 'exportation', 'barrel’d', 'beef:', 'propagation', 'swine’s', 'flesh,', 'improvement', 'art', 
'making', 'good', 'bacon,', 'much', 'wanted', 'among', 'us', 'great', 'destruction', 'pigs,', 'frequent', 'tables;', 'way', 'comparable', 'taste', 'magnificence', 'well', 'grown,', 'fat', 'yearling', 'child,', 'roasted', 'whole', 'make', 'considerable', 'figure', 'lord', 'mayor’s', 'feast,', 'publick', 'entertainment.', 'this,', 'many', 'others,', 'omit,', 'studious', 'brevity.', 'supposing', 'one', 'thousand', 'families', 'city,', 'would', 'constant', 'customers', 'infants', 'flesh,', 'besides', 'others', 'might', 'merry', 'meetings,', 'particularly', 'weddings', 'christenings,', 'compute', 'dublin', 'would', 'take', 'annually', 'twenty', 'thousand', 'carcasses;', 'rest', 'kingdom', '(where', 'probably', 'sold', 'somewhat', 'cheaper)', 'remaining', 'eighty', 'thousand.', 'think', 'one', 'objection,', 'possibly', 'raised', 'proposal,', 'unless', 'urged,', 'number', 'people', 'thereby', 'much', 'lessened', 'kingdom.', 'freely', 'own,', 'indeed', 'one', 'principal', 'design', 'offering', 'world.', 'desire', 'reader', 'observe,', 'calculate', 'remedy', 'one', 'individual', 'kingdom', 'ireland,', 'ever', 'was,', 'is,', 'or,', 'think,', 'ever', 'upon', 'earth.', 'therefore', 'let', 'man', 'talk', 'expedients:', 'taxing', 'absentees', 'five', 'shillings', 'pound:', 'using', 'neither', 'clothes,', 'houshold', 'furniture,', 'except', 'growth', 'manufacture:', 'utterly', 'rejecting', 'materials', 'instruments', 'promote', 'foreign', 'luxury:', 'curing', 'expensiveness', 'pride,', 'vanity,', 'idleness,', 'gaming', 'women:', 'introducing', 'vein', 'parsimony,', 'prudence', 'temperance:', 'learning', 'love', 'country,', 'wherein', 'differ', 'even', 'laplanders,', 'inhabitants', 'topinamboo:', 'quitting', 'animosities', 'factions,', 'acting', 'longer', 'like', 'jews,', 'murdering', 'one', 'another', 'moment', 'city', 'taken:', 'little', 'cautious', 'sell', 'country', 'consciences', 'nothing:', 'teaching', 'landlords', 'least', 'one', 'degree', 'mercy', 'towards', 'tenants.', 
'lastly,', 'putting', 'spirit', 'honesty,', 'industry,', 'skill', 'shopkeepers,', 'who,', 'resolution', 'could', 'taken', 'buy', 'native', 'goods,', 'would', 'immediately', 'unite', 'cheat', 'exact', 'upon', 'us', 'price,', 'measure,', 'goodness,', 'could', 'ever', 'yet', 'brought', 'make', 'one', 'fair', 'proposal', 'dealing,', 'though', 'often', 'earnestly', 'invited', 'it.', 'therefore', 'repeat,', 'let', 'man', 'talk', 'like', 'expedients,', 'till', 'hath', 'least', 'glympse', 'hope,', 'ever', 'hearty', 'sincere', 'attempt', 'put', 'practice.', 'but,', 'myself,', 'wearied', 'many', 'years', 'offering', 'vain,', 'idle,', 'visionary', 'thoughts,', 'length', 'utterly', 'despairing', 'success,', 'fortunately', 'fell', 'upon', 'proposal,', 'which,', 'wholly', 'new,', 'hath', 'something', 'solid', 'real,', 'expence', 'little', 'trouble,', 'full', 'power,', 'whereby', 'incur', 'danger', 'disobliging', 'england.', 'kind', 'commodity', 'bear', 'exportation,', 'flesh', 'tender', 'consistence,', 'admit', 'long', 'continuance', 'salt,', 'although', 'perhaps', 'could', 'name', 'country,', 'would', 'glad', 'eat', 'whole', 'nation', 'without', 'it.', 'all,', 'violently', 'bent', 'upon', 'opinion,', 'reject', 'offer,', 'proposed', 'wise', 'men,', 'shall', 'found', 'equally', 'innocent,', 'cheap,', 'easy,', 'effectual.', 'something', 'kind', 'shall', 'advanced', 'contradiction', 'scheme,', 'offering', 'better,', 'desire', 'author', 'authors', 'pleased', 'maturely', 'consider', 'two', 'points.', 'first,', 'things', 'stand,', 'able', 'find', 'food', 'raiment', 'hundred', 'thousand', 'useless', 'mouths', 'backs.', 'secondly,', 'round', 'million', 'creatures', 'humane', 'figure', 'throughout', 'kingdom,', 'whose', 'whole', 'subsistence', 'put', 'common', 'stock,', 'would', 'leave', 'debt', 'two', 'million', 'pounds', 'sterling,', 'adding', 'beggars', 'profession,', 'bulk', 'farmers,', 'cottagers', 'labourers,', 'wives', 'children,', 'beggars', 'effect;', 'desire', 'politicians', 
'dislike', 'overture,', 'may', 'perhaps', 'bold', 'attempt', 'answer,', 'first', 'ask', 'parents', 'mortals,', 'whether', 'would', 'day', 'think', 'great', 'happiness', 'sold', 'food', 'year', 'old,', 'manner', 'prescribe,', 'thereby', 'avoided', 'perpetual', 'scene', 'misfortunes,', 'since', 'gone', 'through,', 'oppression', 'landlords,', 'impossibility', 'paying', 'rent', 'without', 'money', 'trade,', 'want', 'common', 'sustenance,', 'neither', 'house', 'clothes', 'cover', 'inclemencies', 'weather,', 'inevitable', 'prospect', 'intailing', 'like,', 'greater', 'miseries,', 'upon', 'breed', 'ever.', 'profess', 'sincerity', 'heart,', 'least', 'personal', 'interest', 'endeavouring', 'promote', 'necessary', 'work,', 'motive', 'publick', 'good', 'country,', 'advancing', 'trade,', 'providing', 'infants,', 'relieving', 'poor,', 'giving', 'pleasure', 'rich.', 'children,', 'propose', 'get', 'single', 'penny;', 'youngest', 'nine', 'years', 'old,', 'wife', 'past', 'child-bearing.', 'end', 'modest', 'proposal,', 'jonathan', 'swift', 'end', 'modest', 'proposal', '*****', 'file', 'named', '1080-0.txt', '1080-0.zip', '*****', 'associated', 'files', 'various', 'formats', 'found', 'in:', 'http://www.gutenberg.org/1/0/8/1080/', 'produced', 'anonymous', 'volunteer,', 'david', 'widger', 'updated', 'editions', 'replace', 'previous', 'one--the', 'old', 'editions', 'renamed.', 'creating', 'works', 'print', 'editions', 'protected', 'u.s.', 'law', 'means', 'one', 'owns', 'united', 'states', 'works,', 'foundation', '(and', 'you!)', 'copy', 'distribute', 'united', 'states', 'without', 'permission', 'without', 'paying', 'royalties.', 'special', 'rules,', 'set', 'forth', 'general', 'terms', 'use', 'part', 'license,', 'apply', 'copying', 'distributing', 'works', 'protect', 'concept', 'trademark.', 'registered', 'trademark,', 'may', 'used', 'charge', 'ebooks,', 'unless', 'receive', 'specific', 'permission.', 'charge', 'anything', 'copies', 'ebook,', 'complying', 'rules', 'easy.', 'may', 'use', 
'nearly', 'purpose', 'creation', 'derivative', 'works,', 'reports,', 'performances', 'research.', 'may', 'modified', 'printed', 'given', 'away--you', 'may', 'practically', 'anything', 'united', 'states', 'ebooks', 'protected', 'u.s.', 'law.', 'redistribution', 'subject', 'trademark', 'license,', 'especially', 'commercial', 'redistribution.', 'start:', 'full', 'license', 'full', 'license', 'please', 'read', 'distribute', 'use', 'work', 'protect', 'mission', 'promoting', 'free', 'distribution', 'works,', 'using', 'distributing', 'work', '(or', 'work', 'associated', 'way', 'phrase', '"project', 'gutenberg"),', 'agree', 'comply', 'terms', 'full', 'license', 'available', 'file', 'online', 'www.gutenberg.org/license.', 'section', '1.', 'general', 'terms', 'use', 'redistributing', 'works', '1.a.', 'reading', 'using', 'part', 'work,', 'indicate', 'read,', 'understand,', 'agree', 'accept', 'terms', 'license', 'intellectual', 'property', '(trademark/copyright)', 'agreement.', 'agree', 'abide', 'terms', 'agreement,', 'must', 'cease', 'using', 'return', 'destroy', 'copies', 'works', 'possession.', 'paid', 'fee', 'obtaining', 'copy', 'access', 'work', 'agree', 'bound', 'terms', 'agreement,', 'may', 'obtain', 'refund', 'person', 'entity', 'paid', 'fee', 'set', 'forth', 'paragraph', '1.e.8.', '1.b.', '"project', 'gutenberg"', 'registered', 'trademark.', 'may', 'used', 'associated', 'way', 'work', 'people', 'agree', 'bound', 'terms', 'agreement.', 'things', 'works', 'even', 'without', 'complying', 'full', 'terms', 'agreement.', 'see', 'paragraph', '1.c', 'below.', 'lot', 'things', 'works', 'follow', 'terms', 'agreement', 'help', 'preserve', 'free', 'future', 'access', 'works.', 'see', 'paragraph', '1.e', 'below.', '1.c.', 'literary', 'foundation', '("the', 'foundation"', 'pglaf),', 'owns', 'compilation', 'collection', 'works.', 'nearly', 'individual', 'works', 'collection', 'public', 'domain', 'united', 'states.', 'individual', 'work', 'unprotected', 'law', 'united', 'states', 
'located', 'united', 'states,', 'claim', 'right', 'prevent', 'copying,', 'distributing,', 'performing,', 'displaying', 'creating', 'derivative', 'works', 'based', 'work', 'long', 'references', 'removed.', 'course,', 'hope', 'support', 'mission', 'promoting', 'free', 'access', 'works', 'freely', 'sharing', 'works', 'compliance', 'terms', 'agreement', 'keeping', 'name', 'associated', 'work.', 'easily', 'comply', 'terms', 'agreement', 'keeping', 'work', 'format', 'attached', 'full', 'license', 'share', 'without', 'charge', 'others.', '1.d.', 'laws', 'place', 'located', 'also', 'govern', 'work.', 'laws', 'countries', 'constant', 'state', 'change.', 'outside', 'united', 'states,', 'check', 'laws', 'country', 'addition', 'terms', 'agreement', 'downloading,', 'copying,', 'displaying,', 'performing,', 'distributing', 'creating', 'derivative', 'works', 'based', 'work', 'work.', 'foundation', 'makes', 'representations', 'concerning', 'status', 'work', 'country', 'outside', 'united', 'states.', '1.e.', 'unless', 'removed', 'references', 'gutenberg:', '1.e.1.', 'following', 'sentence,', 'active', 'links', 'to,', 'immediate', 'access', 'to,', 'full', 'license', 'must', 'appear', 'prominently', 'whenever', 'copy', 'work', '(any', 'work', 'phrase', '"project', 'gutenberg"', 'appears,', 'phrase', '"project', 'gutenberg"', 'associated)', 'accessed,', 'displayed,', 'performed,', 'viewed,', 'copied', 'distributed:', 'use', 'anyone', 'anywhere', 'united', 'states', 'parts', 'world', 'cost', 'almost', 'restrictions', 'whatsoever.', 'may', 'copy', 'it,', 'give', 'away', 're-use', 'terms', 'license', 'included', 'online', 'www.gutenberg.org.', 'located', 'united', 'states,', 'check', 'laws', 'country', 'located', 'using', 'ebook.', '1.e.2.', 'individual', 'work', 'derived', 'texts', 'protected', 'u.s.', 'law', '(does', 'contain', 'notice', 'indicating', 'posted', 'permission', 'holder),', 'work', 'copied', 'distributed', 'anyone', 'united', 'states', 'without', 'paying', 'fees', 
'charges.', 'redistributing', 'providing', 'access', 'work', 'phrase', '"project', 'gutenberg"', 'associated', 'appearing', 'work,', 'must', 'comply', 'either', 'requirements', 'paragraphs', '1.e.1', '1.e.7', 'obtain', 'permission', 'use', 'work', 'trademark', 'set', 'forth', 'paragraphs', '1.e.8', '1.e.9.', '1.e.3.', 'individual', 'work', 'posted', 'permission', 'holder,', 'use', 'distribution', 'must', 'comply', 'paragraphs', '1.e.1', '1.e.7', 'additional', 'terms', 'imposed', 'holder.', 'additional', 'terms', 'linked', 'license', 'works', 'posted', 'permission', 'holder', 'found', 'beginning', 'work.', '1.e.4.', 'unlink', 'detach', 'remove', 'full', 'license', 'terms', 'work,', 'files', 'containing', 'part', 'work', 'work', 'associated', 'gutenberg-tm.', '1.e.5.', 'copy,', 'display,', 'perform,', 'distribute', 'redistribute', 'work,', 'part', 'work,', 'without', 'prominently', 'displaying', 'sentence', 'set', 'forth', 'paragraph', '1.e.1', 'active', 'links', 'immediate', 'access', 'full', 'terms', 'license.', '1.e.6.', 'may', 'convert', 'distribute', 'work', 'binary,', 'compressed,', 'marked', 'up,', 'nonproprietary', 'proprietary', 'form,', 'including', 'word', 'processing', 'hypertext', 'form.', 'however,', 'provide', 'access', 'distribute', 'copies', 'work', 'format', '"plain', 'vanilla', 'ascii"', 'format', 'used', 'official', 'version', 'posted', 'official', 'web', 'site', '(www.gutenberg.org),', 'must,', 'additional', 'cost,', 'fee', 'expense', 'user,', 'provide', 'copy,', 'means', 'exporting', 'copy,', 'means', 'obtaining', 'copy', 'upon', 'request,', 'work', 'original', '"plain', 'vanilla', 'ascii"', 'form.', 'alternate', 'format', 'must', 'include', 'full', 'license', 'specified', 'paragraph', '1.e.1.', '1.e.7.', 'charge', 'fee', 'access', 'to,', 'viewing,', 'displaying,', 'performing,', 'copying', 'distributing', 'works', 'unless', 'comply', 'paragraph', '1.e.8', '1.e.9.', '1.e.8.', 'may', 'charge', 'reasonable', 'fee', 'copies', 'providing', 'access', 
'distributing', 'works', 'provided', '*', 'pay', 'royalty', 'fee', '20%', 'gross', 'profits', 'derive', 'use', 'works', 'calculated', 'using', 'method', 'already', 'use', 'calculate', 'applicable', 'taxes.', 'fee', 'owed', 'owner', 'trademark,', 'agreed', 'donate', 'royalties', 'paragraph', 'literary', 'foundation.', 'royalty', 'payments', 'must', 'paid', 'within', '60', 'days', 'following', 'date', 'prepare', '(or', 'legally', 'required', 'prepare)', 'periodic', 'tax', 'returns.', 'royalty', 'payments', 'clearly', 'marked', 'sent', 'literary', 'foundation', 'address', 'specified', 'section', '4,', '"information', 'donations', 'literary', 'foundation."', '*', 'provide', 'full', 'refund', 'money', 'paid', 'user', 'notifies', 'writing', '(or', 'e-mail)', 'within', '30', 'days', 'receipt', 's/he', 'agree', 'terms', 'full', 'license.', 'must', 'require', 'user', 'return', 'destroy', 'copies', 'works', 'possessed', 'physical', 'medium', 'discontinue', 'use', 'access', 'copies', 'works.', '*', 'provide,', 'accordance', 'paragraph', '1.f.3,', 'full', 'refund', 'money', 'paid', 'work', 'replacement', 'copy,', 'defect', 'work', 'discovered', 'reported', 'within', '90', 'days', 'receipt', 'work.', '*', 'comply', 'terms', 'agreement', 'free', 'distribution', 'works.', '1.e.9.', 'wish', 'charge', 'fee', 'distribute', 'work', 'group', 'works', 'different', 'terms', 'set', 'forth', 'agreement,', 'must', 'obtain', 'permission', 'writing', 'literary', 'foundation', 'trademark', 'llc,', 'owner', 'trademark.', 'contact', 'foundation', 'set', 'forth', 'section', '3', 'below.', '1.f.', '1.f.1.', 'volunteers', 'employees', 'expend', 'considerable', 'effort', 'identify,', 'research', 'on,', 'transcribe', 'proofread', 'works', 'protected', 'u.s.', 'law', 'creating', 'collection.', 'despite', 'efforts,', 'works,', 'medium', 'may', 'stored,', 'may', 'contain', '"defects,"', 'as,', 'limited', 'to,', 'incomplete,', 'inaccurate', 'corrupt', 'data,', 'transcription', 'errors,', 'intellectual', 
'property', 'infringement,', 'defective', 'damaged', 'disk', 'medium,', 'computer', 'virus,', 'computer', 'codes', 'damage', 'cannot', 'read', 'equipment.', '1.f.2.', 'limited', 'warranty,', 'disclaimer', 'damages', '-', 'except', '"right', 'replacement', 'refund"', 'described', 'paragraph', '1.f.3,', 'literary', 'foundation,', 'owner', 'trademark,', 'party', 'distributing', 'work', 'agreement,', 'disclaim', 'liability', 'damages,', 'costs', 'expenses,', 'including', 'legal', 'fees.', 'agree', 'remedies', 'negligence,', 'strict', 'liability,', 'breach', 'warranty', 'breach', 'contract', 'except', 'provided', 'paragraph', '1.f.3.', 'agree', 'foundation,', 'trademark', 'owner,', 'distributor', 'agreement', 'liable', 'actual,', 'direct,', 'indirect,', 'consequential,', 'punitive', 'incidental', 'damages', 'even', 'give', 'notice', 'possibility', 'damage.', '1.f.3.', 'limited', 'right', 'replacement', 'refund', '-', 'discover', 'defect', 'work', 'within', '90', 'days', 'receiving', 'it,', 'receive', 'refund', 'money', '(if', 'any)', 'paid', 'sending', 'written', 'explanation', 'person', 'received', 'work', 'from.', 'received', 'work', 'physical', 'medium,', 'must', 'return', 'medium', 'written', 'explanation.', 'person', 'entity', 'provided', 'defective', 'work', 'may', 'elect', 'provide', 'replacement', 'copy', 'lieu', 'refund.', 'received', 'work', 'electronically,', 'person', 'entity', 'providing', 'may', 'choose', 'give', 'second', 'opportunity', 'receive', 'work', 'electronically', 'lieu', 'refund.', 'second', 'copy', 'also', 'defective,', 'may', 'demand', 'refund', 'writing', 'without', 'opportunities', 'fix', 'problem.', '1.f.4.', 'except', 'limited', 'right', 'replacement', 'refund', 'set', 'forth', 'paragraph', '1.f.3,', 'work', 'provided', "'as-is',", 'warranties', 'kind,', 'express', 'implied,', 'including', 'limited', 'warranties', 'merchantability', 'fitness', 'purpose.', '1.f.5.', 'states', 'allow', 'disclaimers', 'certain', 'implied', 'warranties', 
'exclusion', 'limitation', 'certain', 'types', 'damages.', 'disclaimer', 'limitation', 'set', 'forth', 'agreement', 'violates', 'law', 'state', 'applicable', 'agreement,', 'agreement', 'shall', 'interpreted', 'make', 'maximum', 'disclaimer', 'limitation', 'permitted', 'applicable', 'state', 'law.', 'invalidity', 'unenforceability', 'provision', 'agreement', 'shall', 'void', 'remaining', 'provisions.', '1.f.6.', 'indemnity', '-', 'agree', 'indemnify', 'hold', 'foundation,', 'trademark', 'owner,', 'agent', 'employee', 'foundation,', 'anyone', 'providing', 'copies', 'works', 'accordance', 'agreement,', 'volunteers', 'associated', 'production,', 'promotion', 'distribution', 'works,', 'harmless', 'liability,', 'costs', 'expenses,', 'including', 'legal', 'fees,', 'arise', 'directly', 'indirectly', 'following', 'cause', 'occur:', '(a)', 'distribution', 'work,', '(b)', 'alteration,', 'modification,', 'additions', 'deletions', 'work,', '(c)', 'defect', 'cause.', 'section', '2.', 'information', 'mission', 'synonymous', 'free', 'distribution', 'works', 'formats', 'readable', 'widest', 'variety', 'computers', 'including', 'obsolete,', 'old,', 'middle-aged', 'new', 'computers.', 'exists', 'efforts', 'hundreds', 'volunteers', 'donations', 'people', 'walks', 'life.', 'volunteers', 'financial', 'support', 'provide', 'volunteers', 'assistance', 'need', 'critical', 'reaching', "gutenberg-tm's", 'goals', 'ensuring', 'collection', 'remain', 'freely', 'available', 'generations', 'come.', '2001,', 'literary', 'foundation', 'created', 'provide', 'secure', 'permanent', 'future', 'future', 'generations.', 'learn', 'literary', 'foundation', 'efforts', 'donations', 'help,', 'see', 'sections', '3', '4', 'foundation', 'information', 'page', 'www.gutenberg.org', 'section', '3.', 'information', 'literary', 'foundation', 'literary', 'foundation', 'non', 'profit', '501(c)(3)', 'educational', 'corporation', 'organized', 'laws', 'state', 'mississippi', 'granted', 'tax', 'exempt', 'status', 
'internal', 'revenue', 'service.', "foundation's", 'ein', 'federal', 'tax', 'identification', 'number', '64-6221541.', 'contributions', 'literary', 'foundation', 'tax', 'deductible', 'full', 'extent', 'permitted', 'u.s.', 'federal', 'laws', "state's", 'laws.', "foundation's", 'principal', 'office', 'fairbanks,', 'alaska,', 'mailing', 'address:', 'po', 'box', '750175,', 'fairbanks,', 'ak', '99775,', 'volunteers', 'employees', 'scattered', 'throughout', 'numerous', 'locations.', 'business', 'office', 'located', '809', 'north', '1500', 'west,', 'salt', 'lake', 'city,', 'ut', '84116,', '(801)', '596-1887.', 'email', 'contact', 'links', 'date', 'contact', 'information', 'found', "foundation's", 'web', 'site', 'official', 'page', 'www.gutenberg.org/contact', 'additional', 'contact', 'information:', 'dr.', 'gregory', 'b.', 'newby', 'chief', 'executive', 'director', 'gbnewby@pglaf.org', 'section', '4.', 'information', 'donations', 'literary', 'foundation', 'depends', 'upon', 'cannot', 'survive', 'without', 'wide', 'spread', 'public', 'support', 'donations', 'carry', 'mission', 'increasing', 'number', 'public', 'domain', 'licensed', 'works', 'freely', 'distributed', 'machine', 'readable', 'form', 'accessible', 'widest', 'array', 'equipment', 'including', 'outdated', 'equipment.', 'many', 'small', 'donations', '($1', '$5,000)', 'particularly', 'important', 'maintaining', 'tax', 'exempt', 'status', 'irs.', 'foundation', 'committed', 'complying', 'laws', 'regulating', 'charities', 'charitable', 'donations', '50', 'states', 'united', 'states.', 'compliance', 'requirements', 'uniform', 'takes', 'considerable', 'effort,', 'much', 'paperwork', 'many', 'fees', 'meet', 'keep', 'requirements.', 'solicit', 'donations', 'locations', 'received', 'written', 'confirmation', 'compliance.', 'send', 'donations', 'determine', 'status', 'compliance', 'particular', 'state', 'visit', 'www.gutenberg.org/donate', 'cannot', 'solicit', 'contributions', 'states', 'met', 'solicitation', 
'requirements,', 'know', 'prohibition', 'accepting', 'unsolicited', 'donations', 'donors', 'states', 'approach', 'us', 'offers', 'donate.', 'international', 'donations', 'gratefully', 'accepted,', 'cannot', 'make', 'statements', 'concerning', 'tax', 'treatment', 'donations', 'received', 'outside', 'united', 'states.', 'u.s.', 'laws', 'alone', 'swamp', 'small', 'staff.', 'please', 'check', 'web', 'pages', 'current', 'donation', 'methods', 'addresses.', 'donations', 'accepted', 'number', 'ways', 'including', 'checks,', 'online', 'payments', 'credit', 'card', 'donations.', 'donate,', 'please', 'visit:', 'www.gutenberg.org/donate', 'section', '5.', 'general', 'information', 'works.', 'professor', 'michael', 's.', 'hart', 'originator', 'concept', 'library', 'works', 'could', 'freely', 'shared', 'anyone.', 'forty', 'years,', 'produced', 'distributed', 'ebooks', 'loose', 'network', 'volunteer', 'support.', 'ebooks', 'often', 'created', 'several', 'printed', 'editions,', 'confirmed', 'protected', 'u.s.', 'unless', 'notice', 'included.', 'thus,', 'necessarily', 'keep', 'ebooks', 'compliance', 'particular', 'paper', 'edition.', 'people', 'start', 'web', 'site', 'main', 'pg', 'search', 'facility:', 'www.gutenberg.org', 'web', 'site', 'includes', 'information', 'gutenberg-tm,', 'including', 'make', 'donations', 'literary', 'foundation,', 'help', 'produce', 'new', 'ebooks,', 'subscribe', 'email', 'newsletter', 'hear', 'new', 'ebooks.']
In [163]:
fdist_withstop = FreqDist(newtokens_withstop) 
fdist_withstop
Out[163]:
FreqDist({'work': 36, 'may': 25, 'works': 22, 'terms': 21, 'would': 17, 'one': 16, 'upon': 15, 'number': 14, 'full': 14, 'foundation': 14, ...})
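`FreqDist` behaves much like Python's built-in `collections.Counter`; in NLTK 3 it is a subclass of `Counter`. Below is a minimal stdlib sketch of the same counting. It uses a small hypothetical token list rather than the Modest Proposal tokens.

```python
from collections import Counter

# Hypothetical tokens standing in for the cleaned corpus tokens
tokens = ["work", "may", "work", "terms", "work", "may", "one"]

# Count every token, as FreqDist(tokens) would
freq = Counter(tokens)

# The two most frequent tokens with their counts
print(freq.most_common(2))  # [('work', 3), ('may', 2)]

# Relative frequency of one token (FreqDist offers .freq() for this)
print(freq["work"] / sum(freq.values()))
```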
In [164]:
fdist_withstop.plot(25) 
Out[164]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd220822dc0>

Testing Corpora from NLTK

In [165]:
nltk.download('brown')
from nltk.corpus import brown
[nltk_data] Downloading package brown to /Users/kikuiper/nltk_data...
[nltk_data]   Package brown is already up-to-date!
In [166]:
brown.words()
Out[166]:
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
In [167]:
fdistbrown = nltk.FreqDist(brown.words())
fdistbrown
Out[167]:
FreqDist({'the': 62713, ',': 58334, '.': 49346, 'of': 36080, 'and': 27915, 'to': 25732, 'a': 21881, 'in': 19536, 'that': 10237, 'is': 10011, ...})
In [168]:
fdistbrown.most_common(20) 
Out[168]:
[('the', 62713),
 (',', 58334),
 ('.', 49346),
 ('of', 36080),
 ('and', 27915),
 ('to', 25732),
 ('a', 21881),
 ('in', 19536),
 ('that', 10237),
 ('is', 10011),
 ('was', 9777),
 ('for', 8841),
 ('``', 8837),
 ("''", 8789),
 ('The', 7258),
 ('with', 7012),
 ('it', 6723),
 ('as', 6706),
 ('he', 6566),
 ('his', 6466)]
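The raw Brown counts keep 'The' and 'the' separate. They also include punctuation tokens such as ',' and the quote marks '``' and "''". The stdlib sketch below lowercases the tokens and keeps only alphabetic ones before counting. The sample tokens are hypothetical; with the real corpus you would pass `brown.words()` instead.

```python
from collections import Counter

# Hypothetical sample in the style of brown.words()
words = ["The", "jury", "said", ",", "``", "the", "election", "''", "was", "the", "best", "."]

# Lowercase, then keep only purely alphabetic tokens (drops ',', '.', '``', "''").
# Note: isalpha() also drops hyphenated words such as 'term-end'.
normalized = [w.lower() for w in words if w.isalpha()]

counts = Counter(normalized)
print(counts.most_common(2))  # [('the', 3), ('jury', 1)]
```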
In [169]:
# The Brown Corpus was carefully constructed to sample different genres of American English;
# categories() lists its 15 genre labels
brown.categories()  
Out[169]:
['adventure',
 'belles_lettres',
 'editorial',
 'fiction',
 'government',
 'hobbies',
 'humor',
 'learned',
 'lore',
 'mystery',
 'news',
 'religion',
 'reviews',
 'romance',
 'science_fiction']
In [170]:
brown.tagged_words(categories='news')
Out[170]:
[('The', 'AT'), ('Fulton', 'NP-TL'), ...]
In [85]:
print(brown.tagged_words())
[('The', 'AT'), ('Fulton', 'NP-TL'), ...]
In [86]:
print(brown.sents())
[['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an', 'investigation', 'of', "Atlanta's", 'recent', 'primary', 'election', 'produced', '``', 'no', 'evidence', "''", 'that', 'any', 'irregularities', 'took', 'place', '.'], ['The', 'jury', 'further', 'said', 'in', 'term-end', 'presentments', 'that', 'the', 'City', 'Executive', 'Committee', ',', 'which', 'had', 'over-all', 'charge', 'of', 'the', 'election', ',', '``', 'deserves', 'the', 'praise', 'and', 'thanks', 'of', 'the', 'City', 'of', 'Atlanta', "''", 'for', 'the', 'manner', 'in', 'which', 'the', 'election', 'was', 'conducted', '.'], ...]
In [87]:
print(brown.tagged_sents())
[[('The', 'AT'), ('Fulton', 'NP-TL'), ('County', 'NN-TL'), ('Grand', 'JJ-TL'), ('Jury', 'NN-TL'), ('said', 'VBD'), ('Friday', 'NR'), ('an', 'AT'), ('investigation', 'NN'), ('of', 'IN'), ("Atlanta's", 'NP$'), ('recent', 'JJ'), ('primary', 'NN'), ('election', 'NN'), ('produced', 'VBD'), ('``', '``'), ('no', 'AT'), ('evidence', 'NN'), ("''", "''"), ('that', 'CS'), ('any', 'DTI'), ('irregularities', 'NNS'), ('took', 'VBD'), ('place', 'NN'), ('.', '.')], [('The', 'AT'), ('jury', 'NN'), ('further', 'RBR'), ('said', 'VBD'), ('in', 'IN'), ('term-end', 'NN'), ('presentments', 'NNS'), ('that', 'CS'), ('the', 'AT'), ('City', 'NN-TL'), ('Executive', 'JJ-TL'), ('Committee', 'NN-TL'), (',', ','), ('which', 'WDT'), ('had', 'HVD'), ('over-all', 'JJ'), ('charge', 'NN'), ('of', 'IN'), ('the', 'AT'), ('election', 'NN'), (',', ','), ('``', '``'), ('deserves', 'VBZ'), ('the', 'AT'), ('praise', 'NN'), ('and', 'CC'), ('thanks', 'NNS'), ('of', 'IN'), ('the', 'AT'), ('City', 'NN-TL'), ('of', 'IN-TL'), ('Atlanta', 'NP-TL'), ("''", "''"), ('for', 'IN'), ('the', 'AT'), ('manner', 'NN'), ('in', 'IN'), ('which', 'WDT'), ('the', 'AT'), ('election', 'NN'), ('was', 'BEDZ'), ('conducted', 'VBN'), ('.', '.')], ...]
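Tag frequencies can be counted the same way as word frequencies. The stdlib sketch below uses the first tagged sentence from the output above. With the full corpus, you could instead pass `(tag for _, tag in brown.tagged_words(categories='news'))` to `nltk.FreqDist`.

Collapsing suffixed tags is my own simplification. In the Brown tagset, a suffix such as -TL marks a word inside a title, so 'NN-TL' is still a noun.

```python
from collections import Counter

# First sentence of brown.tagged_sents(), copied from the output above
tagged = [('The', 'AT'), ('Fulton', 'NP-TL'), ('County', 'NN-TL'), ('Grand', 'JJ-TL'),
          ('Jury', 'NN-TL'), ('said', 'VBD'), ('Friday', 'NR'), ('an', 'AT'),
          ('investigation', 'NN'), ('of', 'IN'), ("Atlanta's", 'NP$'), ('recent', 'JJ'),
          ('primary', 'NN'), ('election', 'NN'), ('produced', 'VBD'), ('``', '``'),
          ('no', 'AT'), ('evidence', 'NN'), ("''", "''"), ('that', 'CS'), ('any', 'DTI'),
          ('irregularities', 'NNS'), ('took', 'VBD'), ('place', 'NN'), ('.', '.')]

# Count the raw tags
tag_counts = Counter(tag for _, tag in tagged)
print(tag_counts.most_common(3))  # [('NN', 5), ('AT', 3), ('VBD', 3)]

# Collapse suffixed tags such as 'NN-TL' into their base tag 'NN'
base_counts = Counter(tag.split("-")[0] for _, tag in tagged)
print(base_counts["NN"])  # 7
```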
In [88]:
#NLTK also features SemCor, which is a subset of Brown that is tagged with WordNet senses and named entities.
#Both kinds of lexical items include multiword units, which are encoded as chunks 
#(senses and part-of-speech tags pertain to the entire chunk).
In [171]:
nltk.download('semcor')
from nltk.corpus import semcor
semcor.words()
[nltk_data] Downloading package semcor to /Users/kikuiper/nltk_data...
[nltk_data]   Package semcor is already up-to-date!
Out[171]:
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
In [172]:
semcor.chunks()
Out[172]:
[['The'], ['Fulton', 'County', 'Grand', 'Jury'], ...]
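Because SemCor delivers each multiword unit as one chunk, the chunks can be turned into single tokens for counting. The sketch below uses the two chunks shown above. Joining with an underscore is my own choice for illustration, not a SemCor convention.

```python
# First two chunks from semcor.chunks(), copied from the output above
chunks = [['The'], ['Fulton', 'County', 'Grand', 'Jury']]

# Join each chunk into one token, using '_' to keep multiword units intact
units = ["_".join(chunk) for chunk in chunks]
print(units)  # ['The', 'Fulton_County_Grand_Jury']

# Count how many chunks span more than one word
multiword = [c for c in chunks if len(c) > 1]
print(len(multiword))  # 1
```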
In [173]:
list(map(str, semcor.tagged_chunks(tag='both')[:10]))
Out[173]:
['(DT The)',
 "(Lemma('group.n.01.group') (NE (NNP Fulton County Grand Jury)))",
 "(Lemma('state.v.01.say') (VB said))",
 "(Lemma('friday.n.01.Friday') (NN Friday))",
 '(DT an)',
 "(Lemma('probe.n.01.investigation') (NN investigation))",
 '(IN of)',
 "(Lemma('atlanta.n.01.Atlanta') (NN Atlanta))",
 "(POS 's)",
 "(Lemma('late.s.03.recent') (JJ recent))"]
In [93]:
#sentiment analysis with NLTK
In [174]:
# import the VADER sentiment analyzer from NLTK; it needs the vader_lexicon resource
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
In [175]:
#initialize VADER 
sid = SentimentIntensityAnalyzer()
In [176]:
# the variable 'test_text' contains the text we will analyze.
test_text = ("Text analysis is cool. I am excited to learn about more python libraries and options for text analysis. Next week we will begin learning about R for text analysis, too!")
test_text
Out[176]:
'Text analysis is cool. I am excited to learn about more python libraries and options for text analysis. Next week we will begin learning about R for text analysis, too!'
In [177]:
# Calling polarity_scores on sid with test_text returns a dictionary of negative, neutral, positive, and compound scores for the input text
scores = sid.polarity_scores(test_text)
In [178]:
# Here we loop through the keys contained in scores (pos, neu, neg, and compound scores) and print the key-value pairs on the screen
for key in sorted(scores):
    print('{0}: {1}, '.format(key, scores[key]), end='')
compound: 0.6114, neg: 0.0, neu: 0.839, pos: 0.161, 
In [ ]:
# now apply it to Jonathan Swift's A Modest Proposal (the variable `text`)
In [179]:
scores = sid.polarity_scores(text)
In [180]:
# use the same loop to print the scores for A Modest Proposal
for key in sorted(scores):
    print('{0}: {1}, '.format(key, scores[key]), end='')
compound: 1.0, neg: 0.065, neu: 0.793, pos: 0.142, 
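The compound score of 1.0 for the whole of A Modest Proposal follows from how VADER builds that score. Roughly, it sums the valence of every scored word, then squashes the sum into [-1, 1] with x / sqrt(x**2 + alpha), where alpha = 15 in VADER's source. Over a long document the sum grows large, so the compound saturates near 1 (or -1) whatever the overall tone. VADER was also designed for short social media texts and is not built to detect irony, so Swift's satire can still score as positive.

The stdlib sketch below reproduces that normalization. It assumes alpha = 15 and the four-decimal rounding that `polarity_scores` applies to the compound score.

```python
import math

def normalize(score, alpha=15):
    """Map an unbounded valence sum into [-1, 1], in the form VADER uses."""
    norm = score / math.sqrt(score * score + alpha)
    # Clamp to [-1, 1] as VADER's code does (mathematically the ratio is already inside it)
    return max(-1.0, min(1.0, norm))

# A short text with a small valence sum stays well inside the range
print(round(normalize(1.0), 4))     # 0.25
# A long document with a large summed valence saturates
print(round(normalize(1000.0), 4))  # 1.0
```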
In [111]:
#Testing Topic Modeling
In [182]:
text0 = "I love to read books. My favorite library at UGA is the science library."
text1 = "The main library at UGA is located on North Campus."
text2 = "The UGA Arch is on North Campus."
text3 = "A text could be a book; it could also be any instance of natural language, whether from social media or transcribed spoken language."
text4 = "The University of Georgia was founded in 1785."


all_texts = [text0, text1, text2, text3, text4]
In [183]:
nltk.download('wordnet')
from nltk.corpus import stopwords
from nltk.stem.wordnet import WordNetLemmatizer
import string
stop = set(stopwords.words('english'))
exclude = set(string.punctuation)
lemma = WordNetLemmatizer()
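def clean(text):
    # drop stopwords (compared against lowercased, whitespace-split tokens)
    stop_free = " ".join([i for i in text.lower().split() if i not in stop])
    # strip punctuation characters
    punc_free = ''.join(ch for ch in stop_free if ch not in exclude)
    # reduce each word to its WordNet lemma (treated as a noun by default, e.g. 'media' -> 'medium')
    normalized = " ".join(lemma.lemmatize(word) for word in punc_free.split())
    return normalized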

text_clean = [clean(text).split() for text in all_texts]
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/kikuiper/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
In [184]:
text_clean
Out[184]:
[['love', 'read', 'book', 'favorite', 'library', 'uga', 'science', 'library'],
 ['main', 'library', 'uga', 'located', 'north', 'campus'],
 ['uga', 'arch', 'north', 'campus'],
 ['text',
  'could',
  'book',
  'could',
  'also',
  'instance',
  'natural',
  'language',
  'whether',
  'social',
  'medium',
  'transcribed',
  'spoken',
  'language'],
 ['university', 'georgia', 'founded', '1785']]
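In `clean()`, stopwords are removed before punctuation is stripped. A stopword glued to punctuation (for example 'it,' or 'is.') therefore slips past the stoplist and survives as 'it' or 'is'.

The stdlib sketch below reverses the order. It uses a small hypothetical stoplist in place of NLTK's English list and leaves out the lemmatization step. The names `sample_stop` and `clean_punct_first` are my own, chosen so they don't overwrite the notebook's `stop` and `clean`.

```python
import string

# Hypothetical stoplist; the notebook uses set(stopwords.words('english'))
sample_stop = {"a", "it", "is", "the", "of"}
# Translation table that deletes every ASCII punctuation character
strip_punct = str.maketrans("", "", string.punctuation)

def clean_punct_first(text):
    """Lowercase, strip punctuation, then drop stopwords (no lemmatization)."""
    words = text.lower().translate(strip_punct).split()
    return [w for w in words if w not in sample_stop]

print(clean_punct_first("It is, it is. The book!"))  # ['book']
```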
In [185]:
# Importing Gensim
import gensim
from gensim import corpora

# Create the term dictionary of our corpus, where every unique term is assigned an index. 
dictionary = corpora.Dictionary(text_clean)

# Convert the corpus into a document-term matrix (DTM): each document becomes a list of (term id, count) pairs.
doc_term_matrix = [dictionary.doc2bow(text) for text in text_clean]
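`doc2bow` turns each document into a sparse bag of words: a list of (term id, count) pairs. Terms that are absent from a document are simply omitted.

The plain-Python sketch below illustrates the idea on the first two cleaned documents. The id numbering is my own simple scheme and may not match what `corpora.Dictionary` assigns, so check `dictionary.token2id` rather than assuming identical ids.

```python
from collections import Counter

# The first two cleaned documents from text_clean above
sample_docs = [
    ['love', 'read', 'book', 'favorite', 'library', 'uga', 'science', 'library'],
    ['main', 'library', 'uga', 'located', 'north', 'campus'],
]

# Assign each new term the next free integer id (in sorted order within each document)
token2id = {}
for doc in sample_docs:
    for term in sorted(set(doc)):
        if term not in token2id:
            token2id[term] = len(token2id)

def to_bow(doc):
    """Return a sorted list of (term id, count) pairs for one document."""
    counts = Counter(doc)
    return sorted((token2id[t], n) for t, n in counts.items() if t in token2id)

bows = [to_bow(doc) for doc in sample_docs]
print(bows[0])  # [(0, 1), (1, 1), (2, 2), (3, 1), (4, 1), (5, 1), (6, 1)]
```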
In [186]:
# LDA model object using gensim library
Lda = gensim.models.ldamodel.LdaModel

# Running and Training LDA model on DTM
ldamodel = Lda(doc_term_matrix, num_topics=3, id2word = dictionary, passes=5)
In [187]:
print(ldamodel.print_topics(num_topics=3, num_words=3))
[(0, '0.098*"could" + 0.098*"language" + 0.058*"book"'), (1, '0.142*"library" + 0.101*"uga" + 0.058*"love"'), (2, '0.077*"north" + 0.077*"campus" + 0.076*"founded"')]
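`print_topics` returns each topic as a formatted string of weight*"word" terms. To work with the weights as numbers, gensim's `ldamodel.show_topic(topicid, topn)` returns (word, probability) pairs directly. Alternatively, the stdlib sketch below parses one topic string from the output above.

LDA starts from a random initialization. Unless `random_state` is fixed, rerunning the cell will generally give different topics and weights.

```python
import re

# One topic string as printed above by ldamodel.print_topics()
topic = '0.142*"library" + 0.101*"uga" + 0.058*"love"'

# Capture each weight and the quoted word that follows it, as (word, weight) pairs
pairs = [(word, float(weight)) for weight, word in re.findall(r'([\d.]+)\*"([^"]+)"', topic)]
print(pairs)  # [('library', 0.142), ('uga', 0.101), ('love', 0.058)]
```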
In [188]:
#now test this on the brown corpus

brown_docs = []
for fileid in brown.fileids():
    # join each file's words into one string; a new name avoids overwriting `text` (A Modest Proposal)
    brown_docs.append(" ".join(brown.words(fileid)))
In [189]:
browndoc_clean = [clean(doc).split() for doc in brown_docs]
In [190]:
# Create the term dictionary of our corpus, where every unique term is assigned an index. 
browndictionary = corpora.Dictionary(browndoc_clean)

# Convert the corpus into a document-term matrix (DTM): each document becomes a list of (term id, count) pairs.
brown_dtm = [browndictionary.doc2bow(doc) for doc in browndoc_clean]
In [191]:
# LDA model object using gensim library
Lda = gensim.models.ldamodel.LdaModel

# Running and Training LDA model on DTM
ldamodel = Lda(brown_dtm, num_topics=20, id2word = browndictionary, passes=5) # training on all of Brown is slow; lower passes or num_topics if it takes too long
In [192]:
print(ldamodel.print_topics(num_topics=20, num_words=10))
[(0, '0.005*"alfred" + 0.004*"spencer" + 0.004*"one" + 0.004*"men" + 0.004*"would" + 0.003*"chandler" + 0.003*"alexander" + 0.003*"captain" + 0.003*"said" + 0.002*"knew"'), (1, '0.007*"would" + 0.006*"one" + 0.004*"first" + 0.004*"year" + 0.003*"president" + 0.003*"could" + 0.003*"force" + 0.003*"time" + 0.003*"made" + 0.003*"american"'), (2, '0.005*"pansy" + 0.004*"plant" + 0.004*"avocado" + 0.003*"seed" + 0.003*"one" + 0.002*"fruit" + 0.002*"flower" + 0.002*"like" + 0.002*"soil" + 0.002*"blooming"'), (3, '0.010*"mr" + 0.006*"one" + 0.006*"state" + 0.005*"court" + 0.004*"federal" + 0.004*"poem" + 0.004*"year" + 0.004*"hardy" + 0.003*"would" + 0.003*"first"'), (4, '0.008*"would" + 0.007*"one" + 0.007*"said" + 0.006*"could" + 0.005*"time" + 0.004*"like" + 0.004*"man" + 0.004*"back" + 0.003*"get" + 0.003*"even"'), (5, '0.007*"state" + 0.007*"year" + 0.005*"one" + 0.005*"new" + 0.005*"school" + 0.004*"would" + 0.004*"program" + 0.004*"may" + 0.003*"time" + 0.003*"said"'), (6, '0.007*"said" + 0.006*"one" + 0.006*"would" + 0.005*"back" + 0.004*"time" + 0.003*"could" + 0.003*"first" + 0.003*"like" + 0.002*"even" + 0.002*"two"'), (7, '0.008*"one" + 0.007*"mr" + 0.006*"said" + 0.005*"would" + 0.004*"could" + 0.004*"like" + 0.004*"time" + 0.003*"well" + 0.003*"day" + 0.003*"get"'), (8, '0.006*"student" + 0.005*"bridge" + 0.005*"college" + 0.004*"faculty" + 0.004*"year" + 0.003*"carleton" + 0.003*"brannon" + 0.003*"first" + 0.003*"two" + 0.003*"one"'), (9, '0.006*"one" + 0.005*"said" + 0.004*"mike" + 0.004*"small" + 0.003*"man" + 0.003*"would" + 0.003*"business" + 0.003*"phil" + 0.003*"time" + 0.003*"back"'), (10, '0.012*"af" + 0.007*"state" + 0.005*"one" + 0.005*"system" + 0.004*"point" + 0.004*"policy" + 0.004*"may" + 0.004*"must" + 0.003*"line" + 0.003*"school"'), (11, '0.006*"state" + 0.006*"may" + 0.005*"united" + 0.004*"form" + 0.004*"1" + 0.004*"per" + 0.004*"shall" + 0.004*"one" + 0.004*"government" + 0.004*"section"'), (12, '0.006*"one" + 0.006*"year" + 
0.006*"would" + 0.005*"man" + 0.005*"state" + 0.003*"new" + 0.003*"said" + 0.003*"class" + 0.003*"first" + 0.003*"two"'), (13, '0.013*"af" + 0.005*"one" + 0.004*"would" + 0.003*"said" + 0.003*"two" + 0.003*"first" + 0.003*"time" + 0.002*"may" + 0.002*"mr" + 0.002*"new"'), (14, '0.006*"one" + 0.005*"would" + 0.004*"man" + 0.003*"life" + 0.003*"world" + 0.003*"new" + 0.003*"could" + 0.003*"said" + 0.002*"time" + 0.002*"two"'), (15, '0.006*"one" + 0.005*"u" + 0.004*"man" + 0.004*"would" + 0.003*"say" + 0.003*"could" + 0.003*"time" + 0.003*"first" + 0.003*"like" + 0.003*"god"'), (16, '0.007*"one" + 0.005*"may" + 0.004*"new" + 0.003*"many" + 0.003*"would" + 0.003*"time" + 0.003*"even" + 0.003*"church" + 0.003*"people" + 0.003*"must"'), (17, '0.006*"one" + 0.006*"would" + 0.005*"said" + 0.004*"could" + 0.003*"time" + 0.003*"new" + 0.003*"like" + 0.003*"day" + 0.003*"back" + 0.002*"year"'), (18, '0.005*"would" + 0.004*"af" + 0.004*"one" + 0.003*"anode" + 0.003*"new" + 0.003*"shelter" + 0.003*"time" + 0.003*"two" + 0.003*"surface" + 0.003*"pressure"'), (19, '0.006*"one" + 0.006*"stress" + 0.006*"wine" + 0.005*"would" + 0.005*"man" + 0.004*"time" + 0.003*"men" + 0.003*"could" + 0.003*"back" + 0.003*"like"')]
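Many of the Brown topics are dominated by the same high-frequency words ('one', 'would', 'said', 'time'). These words occur in most documents and blur the topics together. gensim's `Dictionary.filter_extremes(no_below=..., no_above=...)` drops terms by document frequency, and the DTM can then be rebuilt from the filtered dictionary.

The stdlib sketch below performs the same kind of filtering on hypothetical documents. The thresholds are chosen only for illustration.

```python
from collections import Counter

# Hypothetical cleaned documents
topic_docs = [
    ["one", "would", "court", "state"],
    ["one", "would", "poem", "hardy"],
    ["one", "said", "wine", "stress"],
    ["one", "would", "court", "federal"],
]

def filter_by_doc_freq(docs, no_below=2, no_above=0.5):
    """Keep terms found in at least `no_below` docs and in at most `no_above` (a fraction) of all docs."""
    # Document frequency: how many documents contain each term at least once
    df = Counter(term for doc in docs for term in set(doc))
    max_docs = no_above * len(docs)
    keep = {t for t, n in df.items() if no_below <= n <= max_docs}
    return [[t for t in doc if t in keep] for doc in docs]

print(filter_by_doc_freq(topic_docs))  # [['court'], [], [], ['court']]
```

Here 'one' and 'would' are dropped for appearing in too many documents, and the one-off words for appearing in too few.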


Works Cited

Bansal, Shivam. 2016. Beginners Guide to Topic Modeling in Python.
Bird, Steven, Ewan Klein, and Edward Loper. 2019. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit.
Gensim 3.8.3 documentation. 2020.
Han, Na-Rae. Python 3 tutorials.
Malik, Usman. 2021. Removing Stop Words from Strings in Python.
Matthes, Eric. 2016. Python Crash Course: A Hands-On, Project-Based Introduction to Programming.
Munir, Samira. 2019. Basic Sentiment Analysis using NLTK.
NLTK 3.5 documentation.
pandas 1.2.2 documentation.
Prabhakaran, Selva. Topic Modeling with Gensim (Python).
Saldaña, Zoë Wilkinson. 2018. Sentiment Analysis for Exploratory Data Analysis. The Programming Historian.
Shukla et al. 2020. "Natural Language Processing (NLP) with Python — Tutorial." Towards AI.
