CSC 209 assignment three questions and answers

The general notes at the beginning of the assignment two Q&A page also apply to assignment three, and probably to programming in general.
I'd also like to repeat another item from the assignment two Q&A page:

In general, don't check whether operations will succeed; just try to do them and get an appropriate error if applicable. For example, if you're about to fopen() a file, don't do a stat() and try to determine whether the file exists and/or is readable. Just do the fopen() and check for error. This results in a simpler program, and also one which behaves more correctly in the inevitable case that you have omitted some check, so that you think the operation is going to succeed but it doesn't. And there can always be unexpected i/o errors, etc.
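As a rough illustration of "just try it and check", here is a minimal sketch of a helper in a main program (a library routine would return the error to its caller instead of calling perror()). The name count_lines is made up for this example and is not part of any assignment.

```c
#include <stdio.h>

/*
 * Count the lines in a file. Returns the count, or -1 if the file
 * cannot be opened or read. (count_lines is a hypothetical helper.)
 */
long count_lines(const char *path)
{
    FILE *fp;
    long n = 0;
    int c;

    /* No stat() or access() first: just try the fopen() and check it. */
    if ((fp = fopen(path, "r")) == NULL) {
        perror(path);
        return -1;
    }
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            n++;
    /* getc() returns EOF for errors too, so ask which one it was. */
    if (ferror(fp)) {
        perror(path);
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return n;
}
```

Note that there is no separate "does it exist?" test anywhere; a missing file, an unreadable file, and a directory all end up in one of the two error branches.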

Q: When I compile my program I get the following warning message: [...]
Is this ok?

A: Well, if the due time is approaching, you should just hand in what you've got. But more generally, no, your program should compile with gcc -Wall with no warning or error messages. Almost all of the warning or error messages which gcc -Wall can output represent potentially-serious problems, and you need to fix them. I am willing to decode error messages by e-mail (although not generally to fix your bugs, obviously).

Remember that you only submit the .c files. Your submitted .c file must compile and work with the original myscandir.h.

Q: Do we have to check for errors from fork(), wait(), etc?

A: Yes. You must check the return values of all system calls, with two exceptions: close() and dup2() when doing i/o redirection, and the extremely rare case (which does not occur in assignment three) in which there is nothing you could do about the error. In almost all cases you can at a minimum print an error message and exit, or stop doing something which no longer makes sense given the failure of the first part.

Unfortunately there is a difficulty in testing many such error checks (i.e. arranging a test which actually exercises the error-handling code). It's hard to arrange to make a fork() fail, for example, unless there is a per-user process limit you can arrange to bump into. And making malloc() run out of memory is quite difficult.

So in some cases you need to think of some error checks as theoretical exercises, just as if we were writing the program with pen and paper and never actually running it.

But, in real life, some day one of your programs will run into an obscure error condition, and it will make a difference whether or not it performs appropriately under the circumstances. So make it good even though you can't test it. Testing is not the ultimate check of computer program behaviour anyway; it catches some kinds of errors, but misses others.

(We don't put an 'if' around the execl() or execve() call, but this does not constitute failing to do error-handling — we still do the error-handling, we just don't have to handle the success case because if an exec-family call succeeds, this program is overwritten with the new program so we won't be here.)
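The pattern described above might look something like the following sketch. The function name run_sh and the choice of running /bin/sh are assumptions made for illustration only; waitpid() is used here where wait() would also do, since there is only one child.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run "sh -c cmd" in a child process and return its exit status,
 * or -1 on failure. (run_sh is a hypothetical name.)
 */
int run_sh(const char *cmd)
{
    int status;
    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child. The cast on the final NULL is the one cast execl() needs. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        /* No 'if' needed: we only get here if execl() failed. */
        perror("/bin/sh");
        _exit(127);
    }
    /* Parent: wait for the child, and check that call too. */
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    if (!WIFEXITED(status))
        return -1;    /* killed by a signal rather than exiting */
    return WEXITSTATUS(status);
}
```

The fork() failure branch is one of those checks you probably can't test, but it is still there.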

Q: What's the best way to write "true" and "false" as constants in C?
(this may or may not come up in assignment three, or anywhere, but I often get asked this question at around this time in this course)

A: There are many silly ideas about this topic out there. You should avoid complex constructions for simple ideas.

I recommend using "0" for false and "1" for true, rather than #defines or anything weirder. People who know C know how booleans work in C, but they don't know whatever additional constructions you create.

C99 introduces "true" and "false" (not really keywords; they are macros defined in <stdbool.h>), but most people still don't use them because we already have 0 and 1 and they're perfectly fine.

Don't use casts (except for NULL in execl()).

In Java, casts are fairly safe because if the cast produces meaningless results you will generally get a runtime exception.

In C, casts are very unsafe. Basically, they turn off error messages. Error messages are good. Don't turn them off.

Only use a cast in C if you have a very good understanding of the situation. A cast is always needed for the last parameter to execl(), but I don't believe that anything in assignment three calls for a cast.

Q: Various segfault problems ("Segmentation exception").

A: Please see the Q&A entry about segfaults and debugging from assignment two.

Q: How do we compile testscandir.c together with myscandir.c?

A: gcc -Wall testscandir.c myscandir.c

Q: Do we really need to copy myscandir.h? Can't we tell gcc where to find it?

A: Indeed, you do not have to copy myscandir.h. But the following is only for people who are interested. At this point, I suspect that most students will simply copy myscandir.h to their own directories.

One thing you must NOT do is #include "/u/csc209h/summer/pub/a3/myscandir.h". This is messy, and requires editing the source code if myscandir.h is moved. On the test machine, myscandir.h will be in the current directory, but it won't be at this teach.cs-specific path name.

There is an option to gcc to tell it where to find #includes which aren't in the same directory as the program which includes them. That option is -I.

So, you could simply say #include "myscandir.h" without copying it, and then compile with (for example):

	gcc -Wall -I /u/csc209h/summer/pub/a3 /u/csc209h/summer/pub/a3/testscandir.c myscandir.c

In a Makefile, you'd probably set a variable to be /u/csc209h/summer/pub/a3, like so:

	SRC = /u/csc209h/summer/pub/a3

	testscandir: myscandir.c $(SRC)/myscandir.h $(SRC)/testscandir.c
		gcc -Wall -o testscandir -I $(SRC) $(SRC)/testscandir.c myscandir.c

Q: Since there's no "filter" argument to myscandir(), does this mean it also stores "." and ".." in the array?

A: Yes. (And so does the real scandir() if you pass NULL for the filter argument.)

myscandir.c does not include a main(); you are writing a library routine. For testing, you can compile with any test main() you write in a different .c file, or my supplied testscandir.c. You submit only myscandir.c, which must work with the original myscandir.h, and with any valid main() which calls myscandir() and myfreescandir() appropriately.

Library routines such as myscandir() don't call perror() upon error. myscandir() does not output to stderr at all (nor to stdout). Library routines like this return an error code to the caller, who can decide whether perror() is appropriate or not. See the sample call in testscandir.c.

Don't call memcpy or bcopy in myscandir(). Just use an assignment statement. Casts and memcpy and such bypass the type information, which is unnecessarily error-prone — the type system helps you as a programmer.

Similarly, don't cast the return value of malloc() or realloc().

Q: How do we know how many struct-dirent items to malloc in myscandir before we read the directory?

A: Obviously, you don't. This is why you want to use realloc. Please see the beginning of the lecture of July 19.
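For those who want a reminder of the general shape, here is a minimal sketch of a growable array, not the assignment solution. The types struct entry and struct entrylist and the function entrylist_add are hypothetical stand-ins (struct entry plays the role of struct dirent). It also shows a whole struct being copied with a plain assignment, and realloc() used without a cast.

```c
#include <stdlib.h>

struct entry {             /* hypothetical; stands in for struct dirent */
    long id;
    char name[64];
};

struct entrylist {
    struct entry *items;   /* malloc'd array, or NULL when empty */
    size_t used;           /* number of items stored */
    size_t alloc;          /* number of items there is room for */
};

/* Append a copy of *e. Returns 0, or -1 if out of memory. */
int entrylist_add(struct entrylist *l, const struct entry *e)
{
    if (l->used == l->alloc) {
        size_t newalloc = l->alloc ? l->alloc * 2 : 16;
        /* Keep the old pointer until realloc() is known to have worked. */
        struct entry *p = realloc(l->items, newalloc * sizeof *p);
        if (p == NULL)
            return -1;     /* l->items is still valid and still owned by l */
        l->items = p;
        l->alloc = newalloc;
    }
    /* Struct assignment copies every member, including the char array. */
    l->items[l->used++] = *e;
    return 0;
}
```

Start with a list initialized to { NULL, 0, 0 }; realloc() of a NULL pointer behaves like malloc(), so there is no special first-time case. The caller frees items when done.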

Q: May I modify the supplied myscandir.h to declare helper functions?

A: No; you only submit the .c files. Just declare the helper functions in the .c file. You only need a .h file to share declarations between .c files.

Q: Does "tree" take a '-v' option? Is this required? What does it do?

A: This is not required, but there's no reason for you to remove it from the starter code either. Without -v, the key will be in argv[1] and the value in argv[2] as the assignment handout specifies. This is the normal operation and is all that will be used by the automated test programs.

But you might find '-v' useful in debugging. If you put "if (verbose)" around debugging printf statements, they will only be executed when -v is specified on the command-line, thus going away when you don't need them any more; and easily coming back if you find you want that debugging information after all.

But do be sure to test your program without -v too!

Q: Is there a distinction between a key's having a value which is the empty string, and a key not existing at all?

A: No. So it's ok if a request for a non-existent key actually creates a new node for that key. (It's also ok if it doesn't.)

More notes about tree.c:

In tree.c, be careful about which end of the pipe is for reading and which for writing!

malloc() is not involved in tree.c, since each process has its own variables.

Q: Does tree.c use dup2()? What should be the standard input?

A: No, there is no need for dup2() in tree.c. You can just do your i/o to the appropriate file descriptors directly. The time you need dup2() is when you're setting up i/o redirection for some other program, which is going to read from stdin and write to stdout. You are writing all of tree.c, so you just read and write from the appropriate files.

However, you will still have to close "the wrong side of the pipes" appropriately to get end-of-file on the pipes properly.

Q: How do I use read()?

A: Somewhat similar to fgets() in that you pass in a pointer to char which is like the zeroth element of an array of char, and the maximum number of bytes to read. Except instead of a FILE*, you pass in an integer file descriptor, which you may have received from open() or pipe(), or you can use 0 for stdin.

The argument order is: file descriptor; pointer to char; maximum byte count.

read() returns the number of bytes read, or zero for end of file, or negative for error. In the case of error, you can call perror().

Note that read() operates on raw bytes, not strings. It does not put a terminating \0 in the string — it just gives you the bytes which were read from that file descriptor, unmodified. This is fine for you for tree.c, since you will be reading and writing "struct pair"s instead of strings.
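Putting the last few paragraphs together, a read() of one struct might be sketched like this. The layout of struct pair below is invented for the example (use the one from the assignment), and read_pair is a hypothetical helper name.

```c
#include <stdio.h>
#include <unistd.h>

struct pair {              /* hypothetical layout; use the assignment's own */
    int key;
    char value[32];
};

/*
 * Read one struct pair from fd.
 * Returns 1 if a pair was read, 0 on end of file, -1 on error.
 */
int read_pair(int fd, struct pair *p)
{
    ssize_t want = sizeof *p;
    /* Argument order: file descriptor; pointer; maximum byte count. */
    ssize_t n = read(fd, p, want);

    if (n < 0) {
        perror("read");
        return -1;
    }
    if (n == 0)
        return 0;          /* end of file */
    if (n != want) {
        /* read() may legitimately return fewer bytes than requested. */
        fprintf(stderr, "read: short read\n");
        return -1;
    }
    return 1;
}
```

Treating a short read as an error is a simplification chosen for this sketch; a more thorough version would loop until the whole struct has arrived.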

Q: Um, the pointer to char points to something "like" the zeroth element of an array??

A: It could point to the zeroth element of an array. It could also point into the middle of an array to start filling up the array at that point. Or, if the size is 1, it could be a pointer to a simple char which might not be part of an array at all, but only if the size is 1. (Some of these cases might come up in assignment four, but they won't in assignment three.)

A common problem when programming with pipes is that you may see a process block on a read from a pipe even when you are quite sure that nobody will be writing to it. This happens when not all of the write file descriptors on the pipe have been closed: read() only returns end-of-file once every copy of the write end, in every process, is closed. I demonstrated this in class by removing one of the 'close' calls from pipe-example.c (see the comments at the end of that file about this).
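The following is a small sketch of the closing discipline, separate from pipe-example.c and not taken from it. The function name count_from_child is made up; it forks a child which writes some bytes, and the parent counts what it reads up to end of file.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that writes nbytes bytes into a pipe, and count how many
 * bytes the parent reads before end of file. Returns the count, or -1
 * on error. (count_from_child is a hypothetical name.)
 */
long count_from_child(int nbytes)
{
    int fd[2], i, status;
    long total = 0;
    ssize_t n;
    char buf[64];
    pid_t pid;

    if (pipe(fd) == -1) {
        perror("pipe");
        return -1;
    }
    if ((pid = fork()) == -1) {
        perror("fork");
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: fd[1] is the write end; the read end is not ours. */
        close(fd[0]);
        for (i = 0; i < nbytes; i++)
            if (write(fd[1], "x", 1) != 1)
                _exit(1);
        _exit(0);    /* exiting closes the child's copy of fd[1] */
    }
    /*
     * Parent: close our copy of the write end before reading.
     * Without this close, the loop below never sees end of file,
     * because the parent itself still counts as a possible writer.
     */
    close(fd[1]);
    while ((n = read(fd[0], buf, sizeof buf)) > 0)
        total += n;
    if (n < 0) {
        perror("read");
        total = -1;
    }
    close(fd[0]);
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return total;
}
```

If you comment out the parent's close(fd[1]) and run this, the parent should hang in read() after the child has exited, which is exactly the symptom described above.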

Another common problem when programming with pipes is to see a "Broken pipe" message. This happens when a process writes to a pipe whose read side has already been closed by every process which had it open (the writer gets a SIGPIPE signal, which by default kills it). In the case of tree.c, that would always be a programming error, because the reading side of the pipe (in the parent) should not be closed until all of the data has been read from the child.

Q: Why does benode() return int when its value isn't used in the sample main()?

A: Sorry, this was inconsistent in my starter code. It was meant to be the process exit status, so the last two lines of main() should instead be

	return(benode(-1, -1, atoi(argv[optind]), argv[optind+1]));

Or you could just change benode to return type void instead of int, which is probably simpler.
