CSCE 2014 - Homework 1
Due Date - 09/15/2009 at 11:59 PM

1. Problem Statement:

The goal of this assignment is to design and implement a program that can store and look up the top 500 most commonly used English words, and use this information to process sample documents to evaluate their content.

You will be given an ASCII file, words.txt, that contains the 500 most commonly used English words. You must create a C++ Dictionary class that has methods to read this file into a data structure in memory and to search that data structure to determine whether a query word is in the list.
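One possible interface for the Dictionary class is sketched below. It is not a required design. It assumes words.txt holds whitespace-separated lowercase words (for example, one per line), and the method names Read, Search and Size are hypothetical. std::set is used only as one simple choice of data structure.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <set>
#include <string>

// A minimal sketch of one possible Dictionary interface.
// Assumption: words.txt holds whitespace-separated lowercase words.
// Read, Search and Size are hypothetical method names.
class Dictionary {
public:
    // Reads all words from the named file.
    // Returns false if the file cannot be opened.
    bool Read(const std::string& filename) {
        std::ifstream in(filename);
        if (!in) {
            return false;
        }
        Read(in);
        return true;
    }

    // Reads all whitespace-separated words from an already open stream.
    void Read(std::istream& in) {
        std::string word;
        while (in >> word) {
            words_.insert(word);  // duplicates are stored only once
        }
    }

    // Returns true if the query word is in the list.
    bool Search(const std::string& word) const {
        return words_.count(word) > 0;
    }

    // Returns the number of distinct words stored.
    std::size_t Size() const {
        return words_.size();
    }

private:
    std::set<std::string> words_;  // kept sorted, so lookups take O(log n) time
};
```

The std::istream overload is there so the class can be exercised from a small test program without touching the real words.txt.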

You must then create a main program that reads in a sample document in raw ASCII format (not a Microsoft Word document). Your program must create two output files. One file will contain the text of the document with all of the top 500 words removed. The other file will contain the text of the document with only the top 500 words; in this file, all words that are not in the top 500 should be removed.

For example, consider the opening words of the first sentence of Charles Dickens's famous novel "A Tale of Two Cities". If we manually remove punctuation and convert uppercase letters to lowercase, we get the following input file:

   it was the best of times it was the worst of times it was the 
   age of wisdom it was the age of foolishness

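When we look up each of these words, some will be in the list of the top 500 English words, and some will not. Your program should write them to two separate text files: the first contains only the words that are not in the list, and the second contains only the words that are in the list: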

   times worst times wisdom foolishness

   it was the best of it was the of it was the age of it was the age of
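The example above can be reproduced with a short sketch. It stands in a std::set<std::string> for the Dictionary lookup and assumes that, of the words in this sentence, only it, was, the, best, of, and age are in words.txt, which matches the outputs shown. NormalizeWord and SplitText are hypothetical names. NormalizeWord also automates the manual cleanup step by lowercasing each token and dropping punctuation.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: lowercases a raw token and drops every non-letter,
// so "Times," becomes "times". Assumption: words.txt is all lowercase with
// no punctuation. Apostrophes are dropped too, so "don't" becomes "dont".
std::string NormalizeWord(const std::string& token) {
    std::string word;
    for (char c : token) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (std::isalpha(uc)) {
            word += static_cast<char>(std::tolower(uc));
        }
    }
    return word;
}

// Hypothetical helper: splits a document's text into the two outputs.
//   first  = words NOT in `common` (the "top 500 removed" file)
//   second = words that ARE in `common` (the "top 500 only" file)
// Words keep their original order and are separated by single spaces.
std::pair<std::string, std::string> SplitText(const std::string& text,
                                              const std::set<std::string>& common) {
    std::istringstream in(text);
    std::string token;
    std::string removed;  // text with the top 500 words removed
    std::string kept;     // text with only the top 500 words
    while (in >> token) {
        std::string word = NormalizeWord(token);
        if (word.empty()) {
            continue;  // token was pure punctuation, e.g. "--"
        }
        std::string& target = common.count(word) > 0 ? kept : removed;
        if (!target.empty()) {
            target += ' ';
        }
        target += word;
    }
    return {removed, kept};
}
```

Called on the original sentence with its capital letter and commas ("It was the best of times, it was the worst of times, ...") and that six-word set, SplitText returns the two lines shown above. Each string can then be written to its own output file with std::ofstream.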

2. Design:

You will need to design the interface of the Dictionary class to meet the requirements listed above. You will also need to select a data structure that can store the word information and support the search operation. Hint: keep your data structure as simple as possible. Once you have worked this out, you are ready to start implementing your program.
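As one illustration of a simple data structure, the sketch below keeps the words in a sorted std::vector and searches it with a hand-written binary search. This is only one option (an unsorted array with linear search, or std::set, would also work), and PrepareWords and ContainsWord are hypothetical names. With 500 words, a binary search examines at most 9 entries, since 2^9 = 512.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sorts the word list once, after it has been read in.
void PrepareWords(std::vector<std::string>& words) {
    std::sort(words.begin(), words.end());
}

// Binary search over a sorted vector; returns true if `query` is present.
bool ContainsWord(const std::vector<std::string>& words, const std::string& query) {
    std::size_t low = 0;
    std::size_t high = words.size();  // search the half-open range [low, high)
    while (low < high) {
        std::size_t mid = low + (high - low) / 2;
        if (words[mid] == query) {
            return true;
        } else if (words[mid] < query) {
            low = mid + 1;   // query can only be in the upper half
        } else {
            high = mid;      // query can only be in the lower half
        }
    }
    return false;
}
```

Sorting costs O(n log n) once, and every lookup afterwards costs O(log n), which suits a program that loads the list once and then searches it for every word in a document.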

3. Implementation:

You can implement this program from the bottom-up or from the top-down. If you go for a bottom-up approach, start by creating the Dictionary class, and test the methods using a simple main program that calls each method. When this is working, you can create the main program that uses the Dictionary class to process input files as described above.

If you go for a top-down approach, start by creating your main program that reads an input document, and calls empty functions to look up each word to see if it is in the top 500 or not. This way, you will get an idea of how the whole program will work before you dive into the details of implementing the Dictionary class.
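A top-down start might look like the sketch below: the lookup is an empty stub that always answers "not in the top 500", so the document-reading loop can be compiled and run before the Dictionary class exists. IsCommonWord and ProcessDocument are hypothetical names, and counting words stands in for writing the two output files, which would be added later.

```cpp
#include <fstream>
#include <string>

// Stub for the lookup. It will later call the Dictionary class; returning
// false for now lets the rest of the program compile and run.
bool IsCommonWord(const std::string& word) {
    (void)word;   // not used until the real lookup is written
    return false;
}

// Reads the document word by word and counts how many words the lookup
// accepts and rejects. Returns false if the document cannot be opened.
bool ProcessDocument(const std::string& filename, int& commonCount, int& otherCount) {
    std::ifstream in(filename);
    if (!in) {
        return false;
    }
    commonCount = 0;
    otherCount = 0;
    std::string word;
    while (in >> word) {
        if (IsCommonWord(word)) {
            ++commonCount;
        } else {
            ++otherCount;
        }
    }
    return true;
}
```

With the stub in place every word is counted as "other"; once IsCommonWord calls the real Dictionary search, the same loop starts sorting words into the two groups without any other changes.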

Regardless of which technique you choose, you should develop your code incrementally, adding, compiling, and debugging a little bit of code at a time. This way, you always have a program that "does something" even if it is not complete.

When you think you are about halfway through the program, upload a copy of your source code and your program output at that point. Be sure to hand in something that compiles, even if it does not do much when it runs.

4. Testing:

Test your program to check that it operates correctly for all of the requirements listed above. Also check the error handling capabilities of the code, for example what happens if words.txt or the input document cannot be opened. Try your program on 2-3 input documents, and save your testing output in text files for submission on the program due date.

5. Documentation:

When you have completed your C++ program, write a short report (less than one page long) describing what the objectives were, what you did, and the status of the program. Does it work properly for all test cases? Are there any known problems? Save this report in a separate text file to be submitted electronically.

6. Project Submission:

In this class, we will be using electronic project submission to make sure that all students hand in their programming projects and labs on time, and to perform automatic analysis of all programs that are submitted. When you have completed the tasks above, go to the class web site to "submit" your documentation, C++ program, and testing files.

The dates on your electronic submission will be used to verify that you met the due date above. All late projects will receive reduced credit (50% off if less than 24 hours late, no credit if more than 24 hours late), so hand in your best effort on the due date.

You should also PRINT a copy of these files and hand them in to your teaching assistant at your next lab. Include a title page with your name and UAID, and attach your handwritten design notes from above.