Homework 5
Caution
  • You are expected to work individually.
  • Due: Friday March 29 at 11pm (Baltimore time).
  • This assignment is worth 70 points.
Late-day Deadline
It looks like a "ugrad" outage is planned on Sat 3/30. For this reason, if you are planning to use late days on this homework, Saturday will not be counted as a late day. In other words, you get an extra late day if you are using late days on this homework. The original deadline remains the same Friday 3/29, 11pm.

Learning Objectives

Objectives

To practice with:

  • C++ STL containers
  • the `string` class
  • file I/O
  • command-line arguments
  • input validation
Individual Assignment

This is an individual assignment. This means you must NOT show your working code to another student, and should discuss with each other only the assignment requirements and expectations. See course staff for coding help.

Overview

Caution

Before you start working on this homework, make sure you do a `git pull` on the `public` repo to get a copy of the starter code that comes with this assignment. The starter code is minimal: it consists of a single file named `digraph_analyzer.cpp`. For this homework, you do not need to do any error checking beyond what is already included in the instructions and starter code!

In this assignment, you will write a program to analyze digraphs and trigraphs in an input text file. A digraph/trigraph is a combination of two or three letters that forms one sound in a word (e.g., ch in character or sch in schindler). The format of an input text file is as follows:

The file begins with a positive integer n indicating how many digraphs/trigraphs will be processed, followed by a list of n digraphs/trigraphs. After this list comes the text: a sequence of words with only a limited set of punctuation marks. The possible punctuation marks are period, comma, exclamation point and question mark. You can assume no other punctuation appears in the text.
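The input format described above could be read along the following lines. This is only a minimal sketch under stated assumptions: the names `normalize`, `ParsedInput` and `parse_input` are hypothetical and not part of the starter code, and it reads from any `std::istream` so it works with an `std::ifstream` as well as an in-memory stream.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: lower-case a token and drop the four allowed
// punctuation marks (period, comma, exclamation point, question mark).
std::string normalize(const std::string& token) {
    std::string out;
    for (unsigned char c : token) {
        if (c == '.' || c == ',' || c == '!' || c == '?') continue;
        out += static_cast<char>(std::tolower(c));
    }
    return out;
}

// Hypothetical container for the two parts of the input file.
struct ParsedInput {
    std::vector<std::string> digraphs;  // the n digraphs/trigraphs, lower-cased
    std::vector<std::string> words;     // normalized words, in text order
};

// Reads n, then n digraphs/trigraphs, then every remaining word.
ParsedInput parse_input(std::istream& in) {
    ParsedInput parsed;
    int n = 0;
    in >> n;
    std::string token;
    for (int i = 0; i < n && (in >> token); ++i) {
        parsed.digraphs.push_back(normalize(token));
    }
    while (in >> token) {
        std::string word = normalize(token);
        if (!word.empty()) parsed.words.push_back(word);  // skip tokens that were only punctuation
    }
    return parsed;
}
```

Because `operator>>` splits on any whitespace, line breaks in the text need no special handling in this sketch.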

Example of a valid input file (input.txt):

5 ch ou ee sch wh

I was lucky. All of a sudden I thought of something that helped make me know I was
getting the hell out. I suddenly remembered this time in around October that I and
Robert Tichener and Paul Campbell were chucking a football around in front of the
academic building. They were nice guys, especially Tichener. It was just before
dinner and it was getting pretty dark out but we kept chucking the ball around
anyway. It kept getting darker and darker and we could hardly see the ball any more
but we did not want to stop doing what we were doing. Finally we had to. This
teacher that taught biology Mr. Zambesis stuck his head out of this window in the
academic building and told us to go back to the dorm and get ready for dinner. If
get a chance to remember that kind of stuff I can get a goodby when I need one, 
at least most of the time I can. As soon as I got it, I turned around and started
running down the other side of the hill toward old Spencer house. He did not live on
the campus. He lived on Anthony Wayne Avenue.

The program should count how many times each of the expected digraphs/trigraphs occurs in the text, case-insensitively. Then, it should print to standard output the list of all the digraphs/trigraphs and their containing words (in order of their appearance in the text) in lower case, with the digraphs/trigraphs sorted in one of three ways, as specified by a command-line argument. The possible arguments are: a (ASCII order), r (reverse ASCII order), and c (count, ordered from largest to smallest, with ties broken by ASCII order).
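One possible way to realize the three orderings is sketched below. It assumes, purely for illustration, that the results are kept in a `std::map` from each digraph/trigraph to its list of containing words; the alias `Occurrences` and the function `ordered_keys` are hypothetical names. A `std::map` with `std::string` keys already iterates in ASCII order, so the other two orders can be derived from it.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical alias: digraph -> containing words in order of appearance.
using Occurrences = std::map<std::string, std::vector<std::string>>;

// Returns the digraph keys in the order selected by mode:
// 'a' = ASCII, 'r' = reverse ASCII, 'c' = count descending, ties by ASCII.
std::vector<std::string> ordered_keys(const Occurrences& occ, char mode) {
    std::vector<std::string> keys;
    for (const auto& entry : occ) keys.push_back(entry.first);  // ASCII order from the map
    if (mode == 'r') {
        std::reverse(keys.begin(), keys.end());
    } else if (mode == 'c') {
        // stable_sort keeps the existing ASCII order among equal counts.
        std::stable_sort(keys.begin(), keys.end(),
                         [&occ](const std::string& a, const std::string& b) {
                             return occ.at(a).size() > occ.at(b).size();
                         });
    }
    return keys;
}
```

Using a stable sort on keys that are already in ASCII order is one simple way to get the tie-breaking rule without writing a two-level comparator.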

Example:

./digraph_analyzer input.txt r

will run your executable named digraph_analyzer on the input file named input.txt (given above) located in the current folder and output the digraphs/trigraphs in reverse-ASCII order (e.g., digraph ou would be printed before digraph ee). Note that punctuation is removed from the words and all output is lower case. [Recall: the possible punctuation marks are limited to comma, exclamation point, question mark and period.] This will be the output of the above command:

wh: [what, when]
sch: []
ou: [thought, out, around, around, out, around, could, out, around, house]
ee: [see, need]
ch: [tichener, chucking, tichener, chucking, teacher, chance]
q?>

As can be seen, the program then waits for the user to enter queries, prompting with q?>. The user can input 1) a number, 2) a digraph/trigraph, or 3) the word exit. If a number is entered, the program should list all the digraphs/trigraphs (in ASCII order) that occur that many times and their corresponding containing words (in order of their appearance in the text), or print None if none exists. If a digraph/trigraph is entered, the program should list how many times the digraph/trigraph occurs and in which words (in order of their appearance in the text), or print No such digraph if it is not in the list of input digraphs/trigraphs; 0 would be printed for digraphs/trigraphs that are among the input list but not found in any word of the text. The program terminates when the word exit is typed in. All input queries should be accepted as either upper or lower case, and handled as if lower case.

Sample query runs on the input.txt example:

./digraph_analyzer input.txt r
wh: [what, when]
sch: []
ou: [thought, out, around, around, out, around, could, out, around, house]
ee: [see, need]
ch: [tichener, chucking, tichener, chucking, teacher, chance]
q?>6
ch: [tichener, chucking, tichener, chucking, teacher, chance]
q?>0
sch: []
q?>ch
6: [tichener, chucking, tichener, chucking, teacher, chance]
q?>sch
0: []
q?>CH
6: [tichener, chucking, tichener, chucking, teacher, chance]
q?>ck
No such digraph
q?>exit
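The query behavior shown in the sample run might be structured roughly as follows. This is a sketch, not the required design: `Occurrences`, `format_words` and `run_queries` are hypothetical names, the data is assumed to live in a `std::map` from digraph/trigraph to containing words, and a query made up entirely of digits is assumed to be a count small enough for `std::stoul`.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical alias: digraph -> containing words in order of appearance.
using Occurrences = std::map<std::string, std::vector<std::string>>;

// Formats a word list as "[w1, w2, ...]" to match the sample output.
std::string format_words(const std::vector<std::string>& words) {
    std::string out = "[";
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i > 0) out += ", ";
        out += words[i];
    }
    return out + "]";
}

// Prompts with "q?>" until "exit" (any case) or end of input.
void run_queries(std::istream& in, std::ostream& out, const Occurrences& occ) {
    std::string query;
    while (out << "q?>" && in >> query) {
        std::transform(query.begin(), query.end(), query.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (query == "exit") break;
        bool is_number = std::all_of(query.begin(), query.end(),
                                     [](unsigned char c) { return std::isdigit(c) != 0; });
        if (is_number) {
            // Assumption: the number fits in an unsigned long.
            std::size_t count = std::stoul(query);
            bool found = false;
            for (const auto& entry : occ) {  // map iteration is in ASCII order
                if (entry.second.size() == count) {
                    out << entry.first << ": " << format_words(entry.second) << '\n';
                    found = true;
                }
            }
            if (!found) out << "None\n";
        } else {
            auto it = occ.find(query);
            if (it == occ.end()) {
                out << "No such digraph\n";
            } else {
                out << it->second.size() << ": " << format_words(it->second) << '\n';
            }
        }
    }
}
```

In a real program this would be called as `run_queries(std::cin, std::cout, occ)`; taking the streams as parameters simply makes the loop easy to exercise with string streams.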
Important Note

The program must use container classes from the C++ Standard Template Library (STL) to keep track of digraphs, words, and counts. You must at least use `std::string`, `std::vector` and `std::map`, but you are free to use others as well. Take the time to understand the STL containers; selecting the right ones will make your code cleaner and easier to write and debug.

Special Cases
  • If a certain digraph/trigraph is contained more than once in a word, count that appearance of the word only once. For example, the digraph `ch` is in the word `chacha` twice, but should only be counted once.
  • If a digraph is found in a word that occurs more than once in the text, that word should be counted as many times as it occurs in the text.
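The two special cases above fall out naturally if each word is tested once per digraph/trigraph with `std::string::find`. The sketch below illustrates this; the function name `collect_occurrences` is hypothetical, and it assumes the words have already been lower-cased and stripped of punctuation.

```cpp
#include <map>
#include <string>
#include <vector>

// Maps each digraph/trigraph to the words that contain it, in text order.
// A word is recorded at most once per occurrence in the text, no matter how
// many times the digraph appears inside it; repeated words are recorded again.
std::map<std::string, std::vector<std::string>> collect_occurrences(
    const std::vector<std::string>& digraphs,
    const std::vector<std::string>& words) {
    std::map<std::string, std::vector<std::string>> occ;
    for (const auto& d : digraphs) occ[d];  // create an empty list even if never found
    for (const auto& w : words) {
        for (auto& entry : occ) {
            if (w.find(entry.first) != std::string::npos) {
                entry.second.push_back(w);
            }
        }
    }
    return occ;
}
```

With this layout, the count for a digraph/trigraph is just the size of its word list, so no separate counter has to be kept in sync.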

Git log

In the assignments folder of your private repository, create a new subfolder named `hw5`. Do your work in that subfolder and use `git add`, `git commit` and `git push` regularly to back up your work as you make progress!

README

You need to submit a file called `README` (not `README.txt` or `README.md`, etc.; just `README`), including information about additional changes you made (besides the program specification) and anything the graders should know about your submission. In your `README`, write your Hopkins ID (random 6-character code) at the top of the file and briefly justify the structure of your program (why you defined the functions you did, etc.). If applicable, tell the graders if you couldn't do everything: Where did you stop? What did you get stuck on? Which parts do you already know do not work according to the requirements?

Specific Requirements

Hints and Suggestions

Makefile

Important Note

You need to write your own Makefile. Make sure you have defined the target `digraph_analyzer` properly to compile your program. We will run `make digraph_analyzer` to compile your program and produce an executable named `digraph_analyzer`. Failure to comply with this requirement will result in a zero score.

Your submission to Gradescope

Create a .zip file named hw5.zip containing your source/header files, Makefile, gitlog.txt, and README. Do not include any other .txt files, and never submit any executable or object files!

Copy the hw5.zip file to your local machine (using scp or pscp), and submit it to Gradescope. When you submit, Gradescope conducts a series of automatic tests. These do basic checks, e.g. to check that you submitted the right files. If you see error messages (in red), address them and resubmit. You may resubmit any number of times prior to the deadline; only your latest submission will be graded. Review the course syllabus for late submission policies (grace period and late days).

Danger

Remember that if your final submitted code does not compile, you will earn a zero score for the assignment.

Info

Two notes regarding automatic checks for programming assignments:

  • Passing an automatic check is not itself worth points. (There might be a nominal, low point value like 0.01 associated with a check, but that won’t count in the end.) The checks exist to help you and the graders find obvious errors.
  • The automatic checks cover some of the requirements set out in the assignment, but not all. It is up to you to test your own work and ensure your programs satisfy all stated requirements. Passing all the automatic checks does not mean you have earned all the points.