Use the cmd line to run a python program on every file in a directory
It took me several hours to get the command line right to run a python program on every file in a directory and send the results to the same output file.
I am jotting down my notes here because it seems that much of the advice on the internet is plain wrong. This is what worked for me.
- We are working on a regular computer using Windows
- Python is already installed
- We have a Python program called myprog.py
- We have data we want to analyse with that program
- We have many, many data files in a directory and we want to analyze them one-by-one and put the results in one file.
- The program file expects one or more data files as sys.argv. That is, the program expects us to tell it where to find the data files in the cmd line using this format.
- myprog.py sys.argv[1] sys.argv[2] sys.argv[3] >output.txt
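For anyone unsure what "expects data files as sys.argv" looks like inside the program, here is a minimal sketch. It is not my real myprog.py: the summarise function and its line-counting "analysis" are made up for illustration. It avoids newer syntax so it should run on Python 2.7 as well as Python 3.

```python
import io
import sys


def summarise(path):
    """Hypothetical analysis: return the file name and its number of lines."""
    with io.open(path, encoding="utf-8", errors="replace") as handle:
        line_count = sum(1 for _ in handle)
    return "{}\t{}".format(path, line_count)


def main(argv):
    # argv[0] is the program name; the data files start at argv[1].
    for path in argv[1:]:
        # Anything printed here is what ends up in output.txt
        # once the output is redirected on the command line.
        print(summarise(path))


if __name__ == "__main__":
    main(sys.argv)
```

The point is only that the program loops over sys.argv[1:] and prints its results, so redirecting the output on the command line captures everything.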
If we only have a few data files, or we are only using a few files in a directory, then we don’t need this procedure. It is easier simply to type out the command as shown above.
If we want to run the program on a thousand files, say, it would be tedious to type out the filenames (in place of sys.argv).
Gist of procedure to use cmd line to run a python program on every file in a directory
To run a python program on every file in a directory, we use the for command. Here are the steps in an orderly way.
#1 Locate the directories of the python program and the data
Python program physically sits in c:\python27\myprog. The name of the program is myprog.py.
Data physically sits on a removable drive in e:\mydata
#2 Decide where we are going to send the results
I always send my results to the same directory as the program but that is not necessary. The reason I do that is that there is less chance of a typo if I don’t have to type in the directory name, and it is relatively easy to move a file to another directory manually.
In my example
My results will go to the same directory as the program, c:\python27\myprog
#3 Open the cmd line in Windows
We are going to type commands directly into the Windows command prompt (cmd.exe). It looks and behaves much like DOS, which you may dimly remember from 1980’s computers. Yes, a DOS-style command line still ships with Windows machines.
- Go to Start (bottom left – where you normally switch off your computer)
- Select Run
- Type cmd<enter> into the box : a black square should pop up.
#4 Set the directory of our program as our working directory
In the “command line window”, you should see something like c:\mydocuments. Windows machines try to make us work in My Documents. I change this to my program directory using
cd c:\python27\myprog
That means change directory to c:\python27\myprog
#5 Use the cmd line to tell python to run myprog iteratively with every file in the data directory
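for %f in (e:\mydata\*.txt) do myprog.py %f >>output.txt
- Look at the syntax : for variable in (set) do myprog.py variable >>outputfilename
- The variable name must be a percentage sign followed by one letter, such as %f or %x or %t. Everything else throws up an error. The variable name must be the same in both uses. (If you put the command in a .bat file instead of typing it, double the percentage sign: %%f.)
- The directory holding the data, plus the pattern for the files we want, is put in brackets : (e:\mydata\*.txt)
- >>output.txt appends the results to a text file in the same directory as our program. Use >> rather than a single >, because the redirection is applied on every pass through the loop and a single > would overwrite the file each time, leaving only the results for the last data file. For the same reason, delete any old output.txt before starting a new run. The file does not have to be a .txt file, but .txt opens easily in any text editor.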
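If the for syntax feels opaque, the same loop can be sketched in Python. This is an alternative I am adding for illustration, not the procedure described above, and the function name run_on_every_file is made up. It runs the program once per matching file and appends all the output to one results file.

```python
import glob
import subprocess
import sys


def run_on_every_file(program, pattern, output_path):
    """Run `program` once per file matching `pattern`; append all output to one file."""
    paths = sorted(glob.glob(pattern))
    # "a" means append, so every run adds to the file instead of replacing it.
    with open(output_path, "a") as output:
        for path in paths:
            # Equivalent to typing: python myprog.py <path> >>output.txt
            subprocess.call([sys.executable, program, path], stdout=output)
    return len(paths)
```

With the directories from my example, the call would be run_on_every_file("myprog.py", r"e:\mydata\*.txt", "output.txt"), run from c:\python27\myprog. Passing each path as its own list item also means file names containing spaces are handled without extra quoting.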
Conclusion: Use cmd line to run a python program on all the files in directory
And remember that if the program takes 1 second to run for one file, then for 1800 files, as I am running now, the whole batch takes 30 minutes. If the original program takes 2 seconds per file, the total batch will take an hour.
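The arithmetic is simple enough to check before starting a long run. A small sketch, assuming every file takes roughly the same time (the function name estimated_minutes is my own):

```python
def estimated_minutes(file_count, seconds_per_file):
    """Total batch time in minutes, assuming each file takes about the same time."""
    return file_count * seconds_per_file / 60.0


print(estimated_minutes(1800, 1))  # 30.0
print(estimated_minutes(1800, 2))  # 60.0
```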