Homework 3
Caution
  • You are expected to work individually.
  • Due: Friday, September 26th at 11pm EST (Baltimore time).
  • This assignment is worth 60 points.

Learning Objectives

Objectives

To practice with:

  • 2D arrays
  • strings
  • file I/O
  • command-line arguments
  • Makefiles
  • program organization
  • testing with assert
Individual Assignment

This is an individual assignment. This means you must not show your working code to another student, but may discuss with each other the assignment requirements and expectations. We strongly recommend utilizing office hours for help! You may collaborate with others and/or AI systems on *small* snippets of code, not your entire assignment. Remember to document these mini-collaborations with comments in your code as specified in the syllabus.

Project Scaffolding (“Starter Code”)

In the class public git repository, the folder cs220-f25-public/homework/hw3/ contains several files you must use as a starting point for this assignment. From within the public repo, type git status to confirm your local copy is in good shape, then type git pull to copy this folder. Then copy these files to your personal cs220 repository to begin your work. If you compile the files as posted, you’ll see a number of unused parameter errors reported, because the files contain incomplete function definitions. As you work on filling in the function definitions, you’ll eventually eliminate these warnings. By the time you submit your code for grading, none of these warnings (or any others) should be reported.

This program will find the valid word (based on a provided word list) in a 2D grid of characters that has the highest total point value, based on Scrabble letter points (https://en.wikipedia.org/wiki/Scrabble_letter_distributions). In order to do this, your program will accept two command-line arguments: the name of a text file containing the character grid and the name of a text file containing words that will be considered valid.

The program only needs to search for words that are contained in one row, in the forward only direction. In this respect the problem is much simpler than a traditional word search in which words may appear in multiple directions and dimensions. However, you have the additional twist of needing to find the highest point word in the grid, without knowing in advance what words the grid contains. Instead your program will need to efficiently search for each possible substring of each row to see if it appears in the word list. If it does, compute the total point value and save this as the best so far if it is higher than any previous words. Ties: in the case of multiple words with the same highest value, the first one found should be the result, where the searching begins from the top row, left-most character.

We have specific requirements detailed below regarding the structure of your solution code (see Implementation Requirements), test code that must be submitted (see Testing) and use of a makefile (see Makefile). Please read through the entire homework description (including Hints) before you begin.

Required Program Functionality

Unless otherwise noted, your program should exit with return value 0.

The program expects two filenames as command-line arguments. If either or both are missing, the program should print “Error: program usage requires two files names - grid and word list” to stdout, advance to the next line, and exit with return value 1.

If either or both cannot be opened for reading, the program should print “Error: could not open input file” to stdout, advance to the next line, and exit with return value 2.

A proper grid file contains an initial line with two integers indicating the number of rows (N) and columns (M) in the grid. This is followed by N lines of M lower case alphabet letters each. Note that N and M can be different values but are always less than or equal to 20 (MAX_GRID). You can assume there are no spaces between the letters in a row. Anything below the character grid in the input file can be ignored.

There are only 2 types of invalid grids that your program must detect and report. [Note that many other error conditions may exist, but you are not required to handle them.] In either of the following cases, your program should print “Error: malformed grid file” to stdout, advance to the next line, and exit with return value 3:

Valid words should be read only once from the second text file and stored in your program. Words must be all lower case with at most 15 letters and no spaces. Each line of the file contains a unique word in alphabetical order. You can assume there are at most 500000 words in the file. The program should continue reading words until it reaches the end of the file. Although many different types of errors may occur when collecting this input, your program only needs to detect whether any word is longer than 15 characters. In this case, your program should print “Error: malformed words file” to stdout, advance to the next line, and exit with return value 4. [Hint: read into a very large string and then copy to your list if valid.]

Note
You may assume the contents of the input grid file and the input word list are in lower case!

For each appearance of a valid word found in the grid, your program should print to stdout a line of text containing the matched word, the row number of the end of the word, the column number of the end of the word, and the total point value of the word using the format of this example:

--- found word: par with 5 pts ending at 3,2

Number the rows and columns shown in the output starting with 0.

If multiple copies of a valid word are present in the grid, the program must report each of them. Also note that some words found may be substrings of longer words on that line, for example:

--- found word: par with 5 pts ending at 0,2
--- found word: parse with 7 pts ending at 0,4
--- found word: par with 5 pts ending at 2,3

Found words should be reported in order of occurrence of the first letter in the search word, when the grid is read row-by-row from left to right, starting with row 0.

After displaying each new word found, one per line, the program should print an empty line, followed by the overall best word found and its total points using this format:

Best word is parse with 7 points

The program must also create an output file that reports on each word found that is better than the previous “best”. The name of the output file must be the same as that of the input file but with “best_” prepended; e.g. if the first command-line argument supplied to the program is “input_grid.txt”, then the output file should be named “best_input_grid.txt”. After listing each new “best” word found, one per line, the program should also display the overall best word and its point value, exactly as it appeared in the standard output (see above). Lastly you must write the character grid to the file, but with the best word CAPITALIZED to make it easy to visually locate.

Note that when reporting a new “best” word, it must only print the best word possible from a particular starting cell in the grid, and only if its point total is higher than the previous best total. You will need to strategically initialize variables in order to find this maximum. For the example above, it wouldn’t report “par” as a best word because “parse” starts in the same location but is better.

We are providing several sample input files, program runs (which include stdout output), and corresponding output files to help you understand these program requirements. Make sure to study them carefully so that you understand the nuances involved. Ask questions in class or on Piazza!

Recursion Requirement

You are required to use recursion when searching the grid for a best word. We have provided a method header to help you structure this search, based on a current position. Here is the method header:

/* Find best word in grid based on current position [r,c],
   updating the searched grid to capitalize and the word itself.
   Parameter list holds the valid words and size is how many are in it.
   Return the total point value of the word, or 0 if none found.
*/
int find_best(char crnt_grid[][MAX_GRID], int cols, char word[], int r, int c,
              char list[][MAX_LENGTH], int size);

The crnt_grid parameter can be updated with capitalized letters for the best word found at this position. The cols parameter indicates how many actual columns are used in this character grid. The word parameter should be used to (recursively) build the longest found word. Note that you will need to undo updates to crnt_grid and word when you hit some dead ends during the recursive search. The indices of the current position are indicated by r and c (row and column, respectively). The list contains the list of valid words to use and size is how many words it holds.

The function should return the total point value of the word that ends at position [r,c]. This is somewhat counter-intuitive because when it is called from main, the current position [r,c] is the start of the search. In that case, the word should be the empty string. However, as the recursion proceeds, the word is the longest word found to the left of the current position. In each iteration, the goal is to add the current character to the word if that is still a valid word. Then try to recursively build further by searching one position to the right in the array if you are not at the end of a row.

Here is an algorithm that you can follow.

  // try adding current letter to word                                                 

  // if the resulting word is valid, calculate points & print                          

  // if c is not the last column (implicit base case), try to extend:
     // recurse using cell to the right                                                
     // if result is better                                                            
        // capitalize the current letter                                               
        // return these better points
     // otherwise (extend fails) reset extend attempt:
       // remove next character from word                                              
       // uncapitalize next character from grid                                        

  // if valid new word ends at r, c:
    // capitalize current character in grid                                            
    // return points                                                                   
  // otherwise undo word inclusion of current character & return 0   

Examples

Here is a list of provided input files, resulting sample runs (terminal output) and created output files. We will also test your solution with other files to cover situations not included in these examples. It is really important that you study these to understand program operation and also that you follow the output formats exactly as they appear.

Grid File Word File Sample Run Output File
small_grid.txt small_words.txt run_small_small.txt best_small_grid.txt
tiny_grid.txt small_words.txt run_tiny_small.txt best_tiny_grid.txt
large_grid.txt clean_large_words.txt run_large_clean.txt best_large_grid.txt

We have also included sample runs for two error conditions. In these cases an output file may or may not be created. Make sure you understand which conditions they test, since you also must create some distinct error condition checks yourself (see Testing section).

Grid File Word File Sample Run
small_words.txt small_grid.txt run_words_grid.txt
small_grid.txt large_words.txt run_small_large.txt

Implementation Requirements

Testing

You are expected to fully test your solution, not only with our provided examples, but also with other examples you create to check its operation in specific cases. For this assignment you must include a README file (plain text, but without any file extension) that describes two error conditions not included in our examples and how you would test to see if your solution handles them correctly.

In addition, you must write and submit a C file named test_word_funcs.c that will contain a tester main function and #include word_funcs.h. This file must have a helper test function to test the declared points function and another helper test function for the find_word function you must add. The job of each helper function is to call the corresponding function in word_funcs.c multiple times to test at least three different situations that each of them will need to handle. Use assert statements in each tester function to make sure the functions return expected results. Include comments in your code to describe the situations being tested. The main function in test_word_funcs.c should call each helper test function, and if all the tests pass, output “Passed word function tests!!” on stdout, advancing to the next line.

Makefile

You must submit a working Makefile that has a target called hw3 which will be the executable program that is run to test your main program solution. You must also include a target called test that can be used to execute your test code in test_word_funcs.c. Create intermediary targets for necessary object files *.o. Also include a clean rule to remove *.o files and the executables hw3 and test.

To be considered fully functional, your Makefile should correctly build the executables with the appropriate flags when (and only when) one of the relevant source files has changed. The Makefile must be named Makefile (spelling and case are important).

Git Log

You are expected to use git to backup your progress on this project. Include in your submission a copy of the output of git log showing at least five commits to the repository of code from this assignment, with meaningful commit messages. (You should commit early and often, likely every time you have a version that compiles, so you should probably have many more than five of these). Just before submitting your work, save the output into a file called gitlog.txt by executing:

git log > gitlog.txt

Hints and Tips

Here are some further hints and tips:

  • Review the in-class exercise from day 7 for hints on writing a recursive binary search when you create and implement a `find_word` function.
  • Make use of gdb to help debug your program.
  • Write your function tester functions early, rather than waiting until the last minute. Their purpose is to help you catch bugs in the functions early on, so the bugs don't cause problems when used by main. Using this approach as intended will help you track down bugs more easily.
  • If in testing your program's output, you want to determine whether two files are identical (e.g. when one file contains your program's actual output and the other contains the expected output) you can use the unix `diff` function: `diff file1 file2` will display symbols, line and character numbers along with the differing characters. Remember you can use the manual pages to get more information on unix commands (`man diff`).
  • In addition to writing tests for individual helper functions that are included in your submission, you should always create and utilize "end-to-end" tests for your complete word search as well. By "end-to-end" testing, we simply mean running the entire program and checking if it behaves as desired on a number of different inputs. Your end-to-end tests do not need to be handed in.

Grading

The 60 points for this assignment will be applied as follows (tbd):

Submission

Your Gradescope submission should contain at least the following files:

* main.c
* word_funcs.h
* word_funcs.c
* test_word_funcs.c
* README
* Makefile
* gitlog.txt
* any data files created by you that your tester functions require

Create a .zip file named hw3.zip which contains all the requested files mentioned above. (Do not zip your entire hw3 folder - only the files.) Copy the hw3.zip file to your local machine and submit it as Homework 3 on Gradescope.

Caution

When you submit, Gradescope conducts a series of automatic tests. These tests do basic checks like making sure that you submitted the right files and that your `.c` files compile properly. If you see error messages here (look for red), address them and resubmit.

Two notes regarding automatic checks for programming assignments:

Tips and Hints
  • You may re-submit any number of times prior to the deadline; only your latest submission will be graded.
  • Review the course syllabus and Piazza for late submission policies.
No-compile policy

Remember that if your final submitted code does not compile, you will earn a zero score for the assignment.
Code Styling - Style Matters!

You should always make sure that your code has good style. You can look at the [coding style guidelines here](https://jhucsf.github.io/fall2025/resources/style.html) from a course you will take later that also applies to this course. In brief, you should make sure that your submission is well formed:
  • it is not overcommented or undercommented
  • there are no ambiguous variable names
  • there is proper/consistent bracket placements and indentation
  • there are no global variables
  • lines and functions are a reasonable length