Exercise 24
Info

This is an in-class exercise. An exercise page like this one will contain a brief description but is intended to be supplemented by discussion during our meeting time. Complete the exercise to the best of your ability in the time given. Feel free to talk with other students as you work, and do not be afraid to ask questions if you get stuck. Aim to complete as much as possible during our meeting, and submit on Gradescope to check your solution. You are encouraged to work at home to complete what you do not get through today, and ask questions over Piazza or in office hours.

Learning Objectives

Reinforces concepts learned in today meeting:

  • C++ basics: I/O using iostream library, how to compile/link/execute
  • Using the C++ Standard Template Library (std::vector, std::map, std::sort) and iterators
Tasks
  • Complete the implementation of code which counts the number of occurrences of particular words in a file, and then reports the results in different orderings by utilizing std::map.
  • Compare your implementation number sorting from ex23 to the implementation provided by the Standard Template Library.

Part 1

Get started by running git pull to update your clone of the public repository, and then copying the exercises/ex24 directory into your personal git repository. You’ll see that we have provided map-simple.cpp, input.txt, main.cpp, sort.cpp, and monkeys.txt.

Part 2

The supplied file map-simple.cpp is a completed program that takes in input pairs from standard input representing phone contacts made of a name string and a phone number string, and inserts each into a std::map structure. The name is the key in the map, and the phone number for that person is the value associated with that key in the map. This program will serve as a reference for you to follow as you work with std::map to complete the remainder of the exercise.

First, compile and link map-simple.cpp by typing:

g++ -std=c++11 -Wall -Wextra -pedantic map-simple.cpp -o map-simple

Next, run the executable map-simple using the file input.txt in place of interactive input using the pipe command-line operation by typing:

cat input.txt | ./map-simple

Now, look inside input.txt and then the code in map-simple.cpp and try to to understand how it works. Here are a few items we want to highlight:

Tip

A map will always organize its entries in ascending order by key.

Before moving on to the next portion, make sure you follow what is happening in this code. Rushing to the next part without understanding this code will not help you finish this exercise any more quickly! Please ask questions if something does not make sense.

Part 3

Now, you will work to edit main.cpp to work with maps in a more sophisticated way.

In this part, you’ll add code to read in words from standard input until end of file occurs, and populate the map<string, int> named counters so that each entry has a key which is a word collected and the corresponding value is the number of times that word appears in the file. The code to output the counters map contents is already present in this file.

Once you’ve modified it, you’ll compile main.cpp using:

g++ -std=c++11 -Wall -Wextra -pedantic main.cpp -o main

Next, run the executable main using the file monkeys.txt in place of interactive input by typing:

cat monkeys.txt | ./main

Note that the contents of monkeys.txt are as shown below:

and Mama called the doctor and the doctor said no more monkeys

So, if your code works correctly, you should see the following output:

word Mama has 1 occurrences 
word and has 2 occurrences 
word called has 1 occurrences 
word doctor has 2 occurrences 
word monkeys has 1 occurrences 
word more has 1 occurrences 
word no has 1 occurrences 
word said has 1 occurrences 
word the has 2 occurrences

Part 4

Once you have the counters map properly populated, and you confirmed that your output is correct, you can now work on rearranging that data you collected during Part 3.

In this part, you need to write code that will go through the simple counters map you populated and copy over the data into a new map called words_by_freq, but rearrange your data so that each entry in the new map has an integer key, and has an entire vector of strings as its value. The vector will hold all strings with the frequency indicated by the key integer. Recall that the push_back method in std::vector allows you to add an item at the end of a vector.

Once you have done this properly, if run with monkeys.txt as input, your words_by_freq map will contain entries like this:

key (an int) value (a vector of strings)
1 Mama, called, monkeys, more, no, said
2 and, doctor, the

You can think of the words_by_freq map as a notebook of information, where each entry is a page, and at the top of the page, you wrote an integer frequency, and on the rest of the page, you’ve written a list of all the words you encountered with that frequency.

To confirm that your code is correct, move onto Part 5 where you will write the code to output its contents.

Part 5

Using const_iterators (not just one but two!), write code to output the contents of the words_by_freq map you populated. This means that for each entry in the map, you will need to output the frequency key, and then iterate separately over the strings in the vector. (Happily, iterators can be used over vectors as well as maps!)

So, if the input words are the contents of monkeys.txt, then the output would from this part should be:

Frequency: 1
Mama
called
monkeys 
more 
no 
said                                                                    
Frequency: 2 
and 
doctor 
the  

If some frequency does not exist in the table, then your code should not output it. For example, if your input was the following words:

the tall tall tall giraffe

Then the output from this part would be:

Frequency: 1
giraffe
the
Frequency: 3
tall
Tip

The words for each frequency shown in the sample output above are shown in alphabetical (dictionary) ordering. Recall that unlike std::map a std::vector is not automatically stored in order based on <, so think about why this is.
Hint: we did not use std:sort or even go out of our way at all to make this happen!

Part 6

Copy your implementation of sort from ex23 into sort.cpp.

Part 7

Invoke STL’s sort implementation to sort the contents in the vector vec2.

Tip

Be sure to include any necessary headers.

Part 8

Compare the performance of your sort implementation to the one provided by STL.

To do this you will need to compile

g++ -std=c++11 -Wall -Wextra -pedantic sort.cpp -o sort

and then run

./sort

your code.

Tip

To get a reasonable sense of the relative performance, it makes sense to use a large random vector (e.g. with millions of entries).

Reminder

Remember to add and commit to your local repo copy as your work. Push to your remote repo when finished. Also scp and submit to Gradescope to check your solution. Use exit to logout from your ugrad account when finished. If you continue to work on the program after class, make sure to keep your repo updated as well!