Saturday, March 09, 2013

Just use grep! Don't use find + grep (+ xargs + whatever)...

I've been looking for an alternative to find + grep under Linux so that I can run queries similar to what 'findstr' does under Windows, such as:

findstr /sic:"my_function" *.php

That recursively searches PHP files for the string "my_function". The typical response to "how to search for some type of file containing some text" under Linux is usually along the lines of:

find . -name "*.php" -exec grep -H "my_function" {} \;

Not only is that cryptic, longer, and harder to type, it also fires off a new, separate process (grep) for every PHP file it finds. I've seen a zillion incarnations of the above. Every time I run it, I end up waiting ten times longer than I would have waited for 'findstr' to do the same operation. find + grep performance is terrible. So I said to myself today, "Hmmm...maybe grep has the solution already?" Lo and behold, it does:

grep -nir --include="*.php" "my_function" .

-nir is a mashup of the -n (line number), -i (case insensitive pattern search), and -r (recursive) flags. Depending on the system, the -H flag might be necessary as well to get filenames to display. The '--include' option works around the shell limitation that typically causes people to reach for 'find': a bare *.php is expanded by the shell against the current directory only, so it can't pick out PHP files in subdirectories. Also, the performance is comparable to 'findstr'. If you don't care about the file type and don't mind scanning images and other binary data:

grep -nir "my_function" .
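To see what '--include' actually filters, here is a small sketch that builds a throwaway directory tree and runs the recursive search against it. It assumes GNU grep (where '--include' is available) and 'mktemp -d'; the file names are made up for the demo. It uses -l instead of -n so that only the matching file names are printed.

```shell
#!/bin/sh
# Build a scratch tree; every file name below is hypothetical.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/lib/sub"
printf '<?php\nfunction my_function() {}\n' > "$demo_dir/index.php"
printf '<?php\nMY_FUNCTION();\n' > "$demo_dir/lib/sub/util.php"
printf 'my_function is documented here\n' > "$demo_dir/notes.txt"

# Recursive (-r), case-insensitive (-i) search restricted to *.php files.
# -l lists only the names of files that match; sort keeps the order stable.
matches=$(cd "$demo_dir" && grep -rli --include="*.php" "my_function" . | sort)
printf '%s\n' "$matches"

# Clean up the scratch tree.
rm -rf "$demo_dir"
```

The two PHP files are listed, including the one two directories down, while notes.txt is skipped even though it contains the string. If scanning binary data is a concern, GNU grep also has -I, which treats binary files as if they contained no match.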

But the above probably isn't what you stopped by for. I'm mostly posting it as a reminder to myself in the future. The FUTUUUUUUURRRRRE!

4 comments:

  1. Well Thomas, you have officially changed my life. thank you. `find / | grep ^find$ | xargs rm -f`

    oh the irony...

    1. LOL. I seriously hope that no one runs that...rm -f on every file containing the string 'find' in its filename - starting at the root of the system. Deleting core system binaries usually doesn't result in happy times. :)

      Ben Friedberg...still awesome. Oh, and you're welcome.

    2. I was hoping the -d option was harmless, because then I could have used 'grep -nird "my_function" .' The -o option is kind of harmless, so maybe 'grep -iron "my_function" .' or 'grep -roni "my_function" .' After all, flag-based acronyms that spell real-ish words are easier to remember.

    3. For reference, I only deleted the binaries with find as the ONLY part of the file name (hence the ^...$). Such a small difference between >:-) and *<:o)

      Maybe I should have just done
      echo 'grep -nirH --include="$1" "$2"' >> $HOME/bin/find
      chmod +x $HOME/bin/find
      ...

      Also, since my Ubuntu grep requires -H, I can also include -G (basic regexp) to be -GrinsH :-D

