The first question that might come to mind is, "How do we know which commands are needed?" It is possible to just start with cat and ls then install other commands as we discover a need for them. But this is terribly inefficient. We need a plan or a blueprint to work from. For this we can turn to the Filesystem Hierarchy Standard (FHS) available from http://www.pathname.com/fhs/. The FHS dictates which commands should be present on a Linux system and where they should be placed in the directory structure.
The next logical question is, "Now that we know what we need, where do we get the source code?" One way to answer this question is to check the manpages. We can either search the manpages included with one of the popular GNU/Linux distributions or use one of the manpage search engines listed at http://www.tldp.org/docs.html#man. A good clue to where the source code for a particular command lives is the email address listed for reporting bugs. For example, the cat manpage lists bug-textutils@gnu.org. From this email address we can deduce that cat is part of the textutils package from GNU.
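The step from bug-report address to package name is mechanical enough to sketch in a few lines of Python. The function name package_from_bug_address is made up for illustration, and the sketch assumes the address follows the bug-PACKAGE@gnu.org pattern seen in the cat manpage. Addresses in any other form simply yield None.

```python
import re

# ASSUMPTION: GNU bug addresses look like bug-<package>@gnu.org,
# as in the cat manpage example. Other maintainers may use other forms.
BUG_ADDRESS = re.compile(r"^bug-([a-z0-9.+-]+)@gnu\.org$", re.IGNORECASE)


def package_from_bug_address(address):
    """Return the package name implied by a bug-report address, or None."""
    match = BUG_ADDRESS.match(address.strip())
    return match.group(1).lower() if match else None


print(package_from_bug_address("bug-textutils@gnu.org"))  # textutils
```

A None result means the address gives no hint and the manpage has to be read more closely.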
So let's look at the FHS requirements for the /bin directory. The first few commands in the list are cat, chgrp, chmod, chown and cp. We already know that cat is part of GNU's textutils. Using the next few commands as keywords in a manpage search, we discover that we need GNU's fileutils package for chgrp, chmod, chown and cp. In fact, quite a few of the commands in /bin come from GNU's fileutils. The date command comes from another GNU package, sh-utils. So a good way to tackle the problem of finding source code might be to group the commands together by package as shown below.
The BASH shell -- echo, false, pwd, sh, true
GNU textutils -- cat
GNU fileutils -- chgrp, chmod, chown, cp, dd, df, ln, ls, mkdir, mknod, mv, rm, rmdir, sync
GNU sh-utils -- date, hostname, stty, su, uname
These four packages do not contain all of the commands in the /bin directory, but they do represent over 70% of them. That should be enough to accomplish our goal of adding some of the commonly used external commands. We can worry about the other commands in later phases of the project.
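The grouping above can also be written down as a small lookup table, which makes it easy to check the coverage figure and to see which commands are still unaccounted for. This is only a sketch: the FHS_BIN list is an assumption based on the required /bin commands in FHS 2.3 and should be checked against whichever version of the standard is in use, and the names PACKAGES, FHS_BIN and package_of are invented for the example.

```python
# Package-to-command grouping, copied from the list above.
PACKAGES = {
    "bash": ["echo", "false", "pwd", "sh", "true"],
    "textutils": ["cat"],
    "fileutils": ["chgrp", "chmod", "chown", "cp", "dd", "df", "ln", "ls",
                  "mkdir", "mknod", "mv", "rm", "rmdir", "sync"],
    "sh-utils": ["date", "hostname", "stty", "su", "uname"],
}

# ASSUMPTION: required /bin commands as listed in FHS 2.3.
# Other versions of the standard may differ slightly.
FHS_BIN = [
    "cat", "chgrp", "chmod", "chown", "cp", "date", "dd", "df", "dmesg",
    "echo", "false", "hostname", "kill", "ln", "login", "ls", "mkdir",
    "mknod", "more", "mount", "mv", "ps", "pwd", "rm", "rmdir", "sed",
    "sh", "stty", "su", "sync", "true", "umount", "uname",
]


def package_of(command):
    """Return the package that provides a command, or None if unknown."""
    for package, commands in PACKAGES.items():
        if command in commands:
            return package
    return None


covered = {cmd for cmds in PACKAGES.values() for cmd in cmds}
missing = sorted(set(FHS_BIN) - covered)
coverage = len(covered & set(FHS_BIN)) / len(FHS_BIN)

print(f"covered: {coverage:.0%}")
print("still missing:", ", ".join(missing))
```

The commands printed as still missing are the ones left for later phases of the project.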
To fetch the source code we simply need to connect to GNU's FTP site and navigate to the appropriate package directory.
When we get to the directory for textutils, there are several versions available. There is also a note informing us that the package has been renamed to coreutils. The same note appears in the fileutils and sh-utils directories. So instead of downloading three separate packages, we can get everything in one convenient bundle in the coreutils directory.
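For anyone who would rather script the browsing than do it by hand, the following sketch uses Python's standard ftplib module to list a package directory and pick out the source tarballs. The host name ftp.gnu.org, the /gnu/PACKAGE directory layout and anonymous FTP access are assumptions, as are the helper names list_package_dir and tarballs. Adjust them if the site is organized differently or if a mirror is used.

```python
from ftplib import FTP

# ASSUMPTION: archive suffixes commonly used for source tarballs.
TARBALL_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar.xz")


def list_package_dir(package, host="ftp.gnu.org"):
    """Return the file names in /gnu/<package> on the given FTP host.

    ASSUMPTION: the host allows anonymous logins and keeps each
    package in its own directory under /gnu.
    """
    with FTP(host) as ftp:
        ftp.login()                 # anonymous login
        ftp.cwd(f"/gnu/{package}")
        return ftp.nlst()


def tarballs(names, package):
    """Keep only source tarballs for the package, dropping notes and signatures."""
    prefix = package + "-"
    return [name for name in names
            if name.startswith(prefix) and name.endswith(TARBALL_SUFFIXES)]
```

Calling tarballs(list_package_dir("coreutils"), "coreutils") on a machine with network access should return the available versions, from which one can be chosen for download.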