---
title: Follow-Up on Command-Line Finding and Filtering
subtitle: >
  A simpler solution that doesn't require <code>tr</code> if you have GNU utils or other alternatives.
date: 2020-05-04T09:15:00-0600
updated: 2020-07-26T12:05:00-0600
summary: >
  You can use a variant flag with GNU grep and ripgrep to filter with null characters.
tags:
  - things I learned
  - command line
  - software development
qualifiers:
  audience: >
    90% myself in the future, when I (inevitably) ask this question again, but also anyone else who hits this particular question about command-line invocations.
  epistemic: >
    Slightly higher than [the *previous* post on the subject](https://v5.chriskrycho.com/journal/find-grep-xargs-newlines-null/), courtesy of the requested reader feedback!
---

In my [previous post](https://v5.chriskrycho.com/journal/find-grep-xargs-newlines-null/), I used the `tr` utility to transform newlines into null characters. However, just as I hoped when I asked for a better way to do it in my <b>Epistemic Status</b> qualifier, a reader emailed me with a better solution!

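
For comparison, here is a rough sketch of a `tr`-based pipeline. It is a reconstruction under stated assumptions, not necessarily the exact command from the previous post: it assumes Markdown notes under `notes/` with a per-year `notes/2020/` subdirectory, and `count_2020_words` is a hypothetical name used only for illustration.

```sh
# Sketch of a tr-based pipeline (not necessarily the previous post's exact command).
# Assumes Markdown notes under notes/ with a notes/2020/ subdirectory;
# count_2020_words is a hypothetical helper name.
#   find  -> newline-separated paths
#   grep  -> keep only the 2020 paths (still newline-separated)
#   tr    -> turn every newline into a NUL byte
#   xargs -> split on NUL (-0), so spaces in file names stay intact
count_2020_words() {
  find notes -name "*.md" |
    grep "notes/2020" |
    tr '\n' '\0' |
    xargs -0 wc -w
}
```

Run it from the directory that contains `notes/`. Because `find` and `grep` still separate paths with newlines here, a file name that itself contains a newline would still be split apart; keeping the data NUL-separated from end to end avoids that.
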
If you're using the GNU version of `grep`, it has a `--null-data` (shortened as `-z`) flag, which makes `grep` treat both its input and its output as null-character-separated. You can combine that with `find`'s `-print0` flag to get the same results I got with `tr` (presumably with better performance, because it doesn't require doing the replacement in another tool):

```sh
$ find notes -name "*.md" -print0 |\
  grep --null-data "notes/2020" |\
  xargs -0 wc -w
```

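
To see why keeping everything null-separated matters, here is a small sketch that builds a throwaway `notes/` tree (the directory and file names are made up for illustration) containing a file with a space in its name, then runs a newline-separated pipeline and the null-separated one side by side. It assumes GNU `grep` for `--null-data`.

```sh
# Throwaway demo tree; every directory and file name here is made up.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/notes/2020" "$demo_dir/notes/2019"
printf 'alpha beta\n' > "$demo_dir/notes/2020/with space.md"
printf 'gamma\n' > "$demo_dir/notes/2020/plain.md"
printf 'delta\n' > "$demo_dir/notes/2019/old.md"
cd "$demo_dir" || exit 1

# Newline-separated: plain xargs splits on blanks, so "with space.md" reaches
# wc as two bogus arguments. `|| true` keeps the demo going past that error.
find notes -name "*.md" | grep "notes/2020" | xargs wc -w || true

# NUL-separated end to end (find -print0, GNU grep -z, xargs -0):
# each path reaches wc as a single argument.
find notes -name "*.md" -print0 |
  grep --null-data "notes/2020" |
  xargs -0 wc -w
```

The first pipeline makes `wc` complain about files that don't exist; the second counts both 2020 files and skips `notes/2019` entirely.
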
This reminded me that [ripgrep] has the same feature, with the same `--null-data` flag. Similarly, [fd] has a `--print0` (`-0`) option. You can combine *these*, and (if you like) [cw][cw][^cw] in place of `wc`, to get the same effect:

```sh
$ fd --print0 --extension md . notes |\
  rg --null-data 'notes/2020' |\
  xargs -0 cw -w
```

Huzzah for versions of tools that understand these things and make this simpler than the solution I posted yesterday (and thanks to my reader for sending in that note)!

[^cw]: `cw` is nice because with especially large sets of data, the fact that it can spread the work across multiple threads becomes very handy. If I word-count *all* of my notes with it (currently 667 files and just shy of 150,000 words), using 4 threads instead of 1 (the default, and all you get with `wc`) takes about 68 milliseconds off the run time. Not important at *this* scale, but if you're dealing with *very* large amounts of data, it might be.

[ripgrep]: https://github.com/BurntSushi/ripgrep
[fd]: https://github.com/sharkdp/fd
[cw]: https://github.com/Freaky/cw