# Sloc Cloc and Code (scc)

![SCC illustration](./scc.jpg)

A tool similar to cloc, sloccount and tokei, for counting the lines of code, blank lines, comment lines, and physical lines of source code in many programming languages.

The goal is to be the fastest code counter possible, while also performing COCOMO calculations like sloccount, LOCOMO estimates of LLM-based development costs, code complexity estimates similar to cyclomatic complexity calculators, and unique lines of code (ULOC) or DRYness metrics. In short, one tool to rule them all.

It also has a very short name that is easy to type: `scc`.

If you don't like Sloc Cloc and Code, feel free to use the name `Succinct Code Counter`.

[![Go](https://github.com/boyter/scc/actions/workflows/go.yml/badge.svg)](https://github.com/boyter/scc/actions/workflows/go.yml)
[![Go Report Card](https://goreportcard.com/badge/github.com/boyter/scc)](https://goreportcard.com/report/github.com/boyter/scc)
[![Coverage Status](https://coveralls.io/repos/github/boyter/scc/badge.svg?branch=master)](https://coveralls.io/github/boyter/scc?branch=master)
[![Scc Count Badge](https://sloc.xyz/github/boyter/scc/)](https://github.com/boyter/scc/)
![Scc count downloads](https://img.shields.io/github/downloads/boyter/scc/total?label=downloads%20%28GH%29)
[![Mentioned in Awesome Go](https://awesome.re/mentioned-badge.svg)](https://github.com/avelino/awesome-go)

Licensed under the MIT licence.

## Table of Contents

- [Install](#install)
- [Background](#background)
- [Pitch](#pitch)
- [Usage](#usage)
- [Complexity Estimates](#complexity-estimates)
- [Unique Lines of Code (ULOC)](#unique-lines-of-code-uloc)
- [COCOMO](#cocomo)
- [LOCOMO](#locomo)
- [Output Formats](#output-formats)
- [Performance](#performance)
- [Development](#development)
- [MCP Server Mode](#mcp-server-mode)
- [Adding/Modifying Languages](#addingmodifying-languages)
- [Issues](#issues)
- [Badges](#badges)
- [Language Support](LANGUAGES.md)
- [Citation](#citation)

### scc for Teams & Enterprise

While scc will always be a free tool for individual developers, companies and businesses, we are exploring an enhanced version designed for teams and businesses. scc Enterprise will build on the core scc engine to provide historical analysis, team-level dashboards, and policy enforcement to help engineering leaders track code health, manage technical debt, and forecast project costs.

We are currently gathering interest for a private beta. If you want to visualize your codebase's evolution, integrate quality gates into your CI/CD pipeline, and get a big-picture view across all your projects, sign up for the early access list [here](https://docs.google.com/forms/d/e/1FAIpQLScIBKy3y2m0rKu89L67qwe26Xyn9Scu0gW-HQX9lC0qEAx9nQ/viewform).

### Install

#### Go Install

You can install `scc` using the standard Go toolchain.

To install the latest stable version of scc:

`go install github.com/boyter/scc/v3@latest`

To install a development version:

`go install github.com/boyter/scc/v3@master`

Note that `scc` needs Go version >= 1.25.

#### Snap

A [snap install](https://snapcraft.io/scc) exists thanks to [Ricardo](https://feliciano.tech/).

`$ sudo snap install scc`

*NB* Snap-installed applications cannot run outside of `/home` (<https://askubuntu.com/questions/930437/permission-denied-error-when-running-apps-installed-as-snap-packages-ubuntu-17>), so you may encounter issues if you use snap and attempt to run `scc` outside this directory.

#### Homebrew

Or if you have [Homebrew](https://brew.sh/) installed:

`$ brew install scc`

#### Fedora

Fedora Linux users can use a [COPR repository](https://copr.fedorainfracloud.org/coprs/lihaohong/scc/):

`$ sudo dnf copr enable lihaohong/scc && sudo dnf install scc`

#### MacPorts

On macOS, you can also install via [MacPorts](https://www.macports.org):

`$ sudo port install scc`

#### Scoop

Or if you are using
[Scoop](https://scoop.sh/) on Windows:

`$ scoop install scc`

#### Chocolatey

Or if you are using [Chocolatey](https://chocolatey.org/) on Windows:

`$ choco install scc`

#### WinGet

Or if you are using [WinGet](https://github.com/microsoft/winget-cli) on Windows:

`winget install --id benboyter.scc --source winget`

#### FreeBSD

On FreeBSD, scc is available as a package:

`$ pkg install scc`

Or, if you prefer to build from source, you can use the ports tree:

`$ cd /usr/ports/devel/scc && make install clean`

#### Run in Docker

Go to the directory you want to run scc from.

Run the command below to run scc from the `master` image against your current working directory:

```bash
docker run --rm -it -v "$PWD:/pwd" ghcr.io/boyter/scc:master scc /pwd
```

#### Manual

Binaries for Windows, GNU/Linux and macOS for both i386 and x86_64 machines are available from the [releases](https://github.com/boyter/scc/releases) page.

#### GitLab

<https://about.gitlab.com/blog/2023/02/15/code-counting-in-gitlab/>

#### Other

If you would like to assist with getting `scc` added into apt/chocolatey/etc.,
please submit a PR or at least raise an issue with instructions.

### Background

Read all about how it came to be, along with performance benchmarks:

- <https://boyter.org/posts/sloc-cloc-code/>
- <https://boyter.org/posts/why-count-lines-of-code/>
- <https://boyter.org/posts/sloc-cloc-code-revisited/>
- <https://boyter.org/posts/sloc-cloc-code-performance/>
- <https://boyter.org/posts/sloc-cloc-code-performance-update/>

Some reviews of `scc`:

- <https://nickmchardy.com/2018/10/counting-lines-of-code-in-koi-cms.html>
- <https://www.feliciano.tech/blog/determine-source-code-size-and-complexity-with-scc/>
- <https://metaredux.com/posts/2019/12/13/counting-lines.html>

Setting up `scc` in GitLab:

- <https://about.gitlab.com/blog/2023/02/15/code-counting-in-gitlab/>

A talk given at the first GopherCon AU about `scc` (press S to see speaker notes):

- <https://boyter.org/static/gophercon-syd-presentation/>
- <https://www.youtube.com/watch?v=jd-sjoy3GZo>

For performance, see the [Performance](https://github.com/boyter/scc#performance) section.

Other similar projects:

- [SLOCCount](https://www.dwheeler.com/sloccount/), the original sloc counter
- [cloc](https://github.com/AlDanial/cloc), inspired by SLOCCount; implemented in Perl for portability
- [gocloc](https://github.com/hhatto/gocloc), a sloc counter in Go inspired by tokei
- [loc](https://github.com/cgag/loc), a Rust implementation similar to tokei but often faster
- [loccount](https://gitlab.com/esr/loccount), a Go implementation written and maintained by ESR
- [polyglot](https://github.com/vmchale/polyglot), an ATS sloc counter
- [tokei](https://github.com/XAMPPRocky/tokei), fast, accurate and written in Rust
- [sloc](https://github.com/flosse/sloc), a CoffeeScript code counter
- [stto](https://github.com/mainak55512/stto), a new Go code counter with a focus on performance

Interesting reading about the other code counting projects tokei, loc, polyglot
and loccount:

- <https://www.reddit.com/r/rust/comments/59bm3t/a_fast_cloc_replacement_in_rust/>
- <https://www.reddit.com/r/rust/comments/82k9iy/loc_count_lines_of_code_quickly/>
- <http://blog.vmchale.com/article/polyglot-comparisons>
- <http://esr.ibiblio.org/?p=8270>

Further reading about the performance of processing files on disk:

- <https://blog.burntsushi.net/ripgrep/>

Using `scc` to process 40 TB of files from GitHub/Bitbucket/GitLab:

- <https://boyter.org/posts/an-informal-survey-of-10-million-github-bitbucket-gitlab-projects/>

### Pitch

Why use `scc`?

- It is very fast and gets faster the more CPU you throw at it
- Accurate
- Works very well across multiple platforms without slowdown (Windows, Linux, macOS)
- Large language support
- Can ignore duplicate files
- Has complexity estimations
- You need to tell the difference between Coq and Verilog in the same directory
- cloc YAML output support, so potentially a drop-in replacement for some users
- Can identify or ignore minified files
- Able to identify many #! files ADVANCED! <https://github.com/boyter/scc/issues/115>
- Can ignore large files by lines or bytes
- Can calculate the ULOC or unique lines of code by file, language or project
- Supports multiple output formats for integration: CSV, SQL, JSON, HTML and more

Why not use `scc`?

- You don't like Go for some reason
- It cannot count D source with different nested multi-line comments correctly <https://github.com/boyter/scc/issues/27>

### Differences

There are some important differences between `scc` and other tools out there. Here are a few for you to consider.

Blank lines inside comments are counted as comments. While such a line is technically blank, the decision was made that once inside a comment, everything should be considered a comment until that comment is ended.
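
A simplified line classifier shows how this rule falls out of tracking a single "inside a block comment" flag. It is an illustrative Go sketch, not scc's actual state machine: it only understands `//` and `/* */` comments, ignores string literals entirely, and labels any line that begins with a comment marker, or begins inside a block comment, as a comment even if it also contains code. The helper names `classify` and `opensBlock` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// opensBlock reports whether s leaves a /* ... */ comment unterminated.
func opensBlock(s string) bool {
	return strings.LastIndex(s, "/*") > strings.LastIndex(s, "*/")
}

// classify labels each line of C-like source as "code", "comment" or "blank".
// Simplified sketch: quotes are not tracked, so comment markers inside
// string literals will confuse it.
func classify(src string) []string {
	var kinds []string
	inBlock := false
	for _, line := range strings.Split(src, "\n") {
		trimmed := strings.TrimSpace(line)
		switch {
		case inBlock:
			// Inside an open block comment every line is a comment,
			// blank or not, until the closing */ is seen.
			kinds = append(kinds, "comment")
			if end := strings.Index(trimmed, "*/"); end >= 0 {
				inBlock = opensBlock(trimmed[end+2:])
			}
		case trimmed == "":
			kinds = append(kinds, "blank")
		case strings.HasPrefix(trimmed, "//"):
			kinds = append(kinds, "comment")
		case strings.HasPrefix(trimmed, "/*"):
			kinds = append(kinds, "comment")
			inBlock = opensBlock(trimmed)
		default:
			kinds = append(kinds, "code")
			// Drop a trailing // comment before checking for an unclosed /*.
			if i := strings.Index(trimmed, "//"); i >= 0 {
				trimmed = trimmed[:i]
			}
			inBlock = opensBlock(trimmed)
		}
	}
	return kinds
}

func main() {
	// The C snippet from this section: all 4 lines are comments.
	fmt.Println(classify("/* blank lines follow\n\n\n*/"))
	// A blank line outside a comment is still blank.
	fmt.Println(classify("int x = 1;\n\n// done"))
}
```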
For example, the following C snippet is counted as 4 lines of comments:

```c
/* blank lines follow


*/
```

This is noticeable when comparing scc's output to other tools on large repositories.

`scc` is able to count verbatim strings correctly. For example, in C# the following:

```C#
private const string BasePath = @"a:\";
// The below is returned to the user as a version
private const string Version = "1.0.0";
```

Because of the prefixed @, the first string ends at the trailing `"` (the `\` is not treated as an escape character), so this should be counted as 2 code lines and 1 comment. Some tools are unable to deal with this and instead treat the string as running on until the opening quote of "1.0.0", which can cause the middle comment to be counted as code rather than a comment.

`scc` will also tell you the number of bytes it has processed (for most output formats), allowing you to estimate the cost of running some static analysis tools.

### Usage

Command line usage of `scc` is designed to be as simple as possible. Full details can be found in `scc --help` or `scc -h`. Note that the output below reflects the state of master, not a release, so some features listed here may be missing from your installation.

```text
Sloc, Cloc and Code. Count lines of code in a directory with complexity estimation.
Version 3.5.0 (beta)
Ben Boyter <ben@boyter.org> + Contributors

Usage:
  scc [flags] [files or directories]

Flags:
      --avg-wage int                       average wage value used for basic COCOMO calculation (default 56286)
      --binary                             disable binary file detection
      --by-file                            display output for every file
  -m, --character                          calculate max and mean characters per line
      --ci                                 enable CI output settings where stdout is ASCII
      --cocomo-project-type string         change COCOMO model type [organic, semi-detached, embedded, "custom,1,1,1,1"] (default "organic")
      --count-as string                    count extension as language [e.g. jsp:htm,chead:"C Header" maps extension jsp to html and chead to C Header]
      --count-ignore                       set to allow .gitignore and .ignore files to be counted
      --currency-symbol string             set currency symbol (default "$")
      --debug                              enable debug output
      --directory-walker-job-workers int   controls the maximum number of workers which will walk the directory tree (default 8)
  -a, --dryness                            calculate the DRYness of the project (implies --uloc)
      --eaf float                          the effort adjustment factor derived from the cost drivers (1.0 if rated nominal) (default 1)
      --exclude-dir strings                directories to exclude (default [.git,.hg,.svn])
  -x, --exclude-ext strings                ignore file extensions (overrides include-ext) [comma separated list: e.g. go,java,js]
  -n, --exclude-file strings               ignore files with matching names (default [package-lock.json,Cargo.lock,yarn.lock,pubspec.lock,Podfile.lock,pnpm-lock.yaml])
      --file-gc-count int                  number of files to parse before turning the GC on (default 10000)
      --file-list-queue-size int           the size of the queue of files found and ready to be read into memory (default 8)
      --file-process-job-workers int       number of goroutine workers that process files collecting stats (default 8)
      --file-summary-job-queue-size int    the size of the queue used to hold processed file statistics before formatting (default 8)
  -f, --format string                      set output format [tabular, wide, json, json2, csv, csv-stream, cloc-yaml, html, html-table, sql, sql-insert, openmetrics] (default "tabular")
      --format-multi string                have multiple format output overriding --format [e.g. tabular:stdout,csv:file.csv,json:file.json]
      --gen                                identify generated files
      --generated-markers strings          string markers in head of generated files (default [do not edit,<auto-generated />])
  -h, --help                               help for scc
  -i, --include-ext strings                limit to file extensions [comma separated list: e.g. go,java,js]
      --include-symlinks                   if set will count symlink files
  -l, --languages                          print supported languages and extensions
      --large-byte-count int               number of bytes a file can contain before being removed from output (default 1000000)
      --large-line-count int               number of lines a file can contain before being removed from output (default 40000)
      --locomo                             enable LOCOMO (LLM Output COst MOdel) cost estimation
      --locomo-config string               LOCOMO power-user config "tokensPerLine,inputPerLine,complexityWeight,iterations,iterationWeight"
      --locomo-cycles float                override estimated LLM iteration cycles (default: calculated from complexity)
      --locomo-input-price float           LOCOMO cost per 1M input tokens in dollars (overrides preset)
      --locomo-output-price float          LOCOMO cost per 1M output tokens in dollars (overrides preset)
      --locomo-preset string               LOCOMO model preset [large, medium, small, local] (default "medium")
      --locomo-review float                human review minutes per line of code for LOCOMO estimate (default 0.01)
      --locomo-tps float                   LOCOMO output tokens per second (overrides preset)
      --cost-comparison                    show both COCOMO and LOCOMO estimates side by side
      --min                                identify minified files
  -z, --min-gen                            identify minified or generated files
      --min-gen-line-length int            number of bytes per average line for file to be considered minified or generated (default 255)
      --mcp                                start as an MCP (Model Context Protocol) server over stdio
      --no-cocomo                          remove COCOMO calculation output
  -c, --no-complexity                      skip calculation of code complexity
  -d, --no-duplicates                      remove duplicate files from stats and output
      --no-gen                             ignore generated files in output (implies --gen)
      --no-gitignore                       disables .gitignore file logic
      --no-gitmodule                       disables .gitmodules file logic
      --no-hborder                         remove horizontal borders between sections
      --no-ignore                          disables .ignore file logic
      --no-large                           ignore files over certain byte and line size set by large-line-count and large-byte-count
      --no-min                             ignore minified files in output (implies --min)
      --no-min-gen                         ignore minified or generated files in output (implies --min-gen)
      --no-scc-ignore                      disables .sccignore file logic
      --no-size                            remove size calculation output
  -M, --not-match stringArray              ignore files and directories matching regular expression
  -o, --output string                      output filename (default stdout)
      --overhead float                     set the overhead multiplier for corporate overhead (facilities, equipment, accounting, etc.) (default 2.4)
  -p, --percent                            include percentage values in output
      --remap-all string                   inspect every file and remap by checking for a string and remapping the language [e.g. "-*- C++ -*-":"C Header"]
      --remap-unknown string               inspect files of unknown type and remap by checking for a string and remapping the language [e.g. "-*- C++ -*-":"C Header"]
      --size-unit string                   set size unit [si, binary, mixed, xkcd-kb, xkcd-kelly, xkcd-imaginary, xkcd-intel, xkcd-drive, xkcd-bakers] (default "si")
      --sloccount-format                   print a more SLOCCount like COCOMO calculation
  -s, --sort string                        column to sort by [files, name, lines, blanks, code, comments, complexity] (default "files")
      --sql-project string                 use supplied name as the project identifier for the current run. Only valid with the --format sql or sql-insert option
  -t, --trace                              enable trace output (not recommended when processing multiple files)
  -u, --uloc                               calculate the number of unique lines of code (ULOC) for the project
  -v, --verbose                            verbose output
      --version                            version for scc
  -w, --wide                               wider output with additional statistics (implies --complexity)
```

Output should look something like the following for the redis project:

```text
$ scc redis
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
C                          437   267,353   31,103    45,998  190,252     48,269
JSON                       406    25,392        4         0   25,388          0
C Header                   288    48,831    5,648    11,302   31,881      3,097
TCL                        215    66,943    7,330     4,651   54,962      3,816
Shell                       75     1,626      239       343    1,044        185
Python                      34     4,802      694       498    3,610        621
Markdown                    26     4,647    1,226         0    3,421          0
Autoconf                    22    11,732    1,124     1,420    9,188      1,016
Lua                         20       525       69        71      385         89
Makefile                    20     1,956      368       170    1,418         85
YAML                        20     2,696      147        53    2,496          0
MSBuild                     11     1,995        2         0    1,993        160
Plain Text                  10     1,773      313         0    1,460          0
Ruby                         9       817       73       105      639        123
C++                          8       546       85        43      418         43
HTML                         5     9,658    2,928        12    6,718          0
License                      3        90       17         0       73          0
CMake                        2       298       49         5      244         12
CSS                          2       107       16         0       91          0
Systemd                      2        80        6         0       74          0
BASH                         1       143       16         5      122         38
Batch                        1        28        2         0       26          3
C++ Header                   1         9        1         3        5          0
Extensible Styleshe…         1        10        0         0       10          0
JavaScript                   1        31        1         0       30          5
Module-Definition            1    11,375    2,116         0    9,259        167
SVG                          1         1        0         0        1          0
Smarty Template              1        44        1         0       43          5
m4                           1       951      218        64      669          0
───────────────────────────────────────────────────────────────────────────────
Total                    1,624   464,459   53,796    64,743  345,920     57,734
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $12,517,562
Estimated Schedule Effort (organic) 35.93 months
Estimated People Required (organic) 30.95
───────────────────────────────────────────────────────────────────────────────
Processed 16601962 bytes, 16.602 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────
```

Note that you don't have to specify the directory you want to run against. Running `scc` without one will run it against the current directory.

You can also run against multiple files or directories, e.g. `scc directory1 directory2 file1 file2`, with the results aggregated in the output.

Since `scc` writes to standard output, there are many ways to easily share the results. For example, using [netcat](https://manpages.org/nc) and [one of many pastebins](https://paste.c-net.org/) gives a public URL:

```bash
$ scc | nc paste.c-net.org 9999
https://paste.c-net.org/Example
```

### Ignore Files

`scc` mostly supports `.ignore` files inside the directories it scans, similar to how ripgrep, ag and tokei work. `.ignore` files use exactly the same syntax as `.gitignore` files, and `scc` will ignore the files and directories listed in them. You can add `.ignore` files to skip things like checked-in vendored dependencies.
The idea is to let you add a file or folder to git while still having it ignored in the count.

It also supports its own ignore file, `.sccignore`, for when you want `scc` to ignore things that ripgrep, ag, tokei and other tools should still process.

### Interesting Use Cases

Used inside the Intel Nemu hypervisor to track code changes between revisions <https://github.com/intel/nemu/blob/topic/virt-x86/tools/cloc-change.sh#L9>.
It also appears to be used inside <http://codescoop.com/>, <https://pinpoint.com/> and <https://github.com/chaoss/grimoirelab-graal>.

It is also used to count code and guess language types in <https://searchcode.com/>, which makes it one of the most frequently run code counters in the world.

You can also hook scc into your GitLab pipeline: <https://gitlab.com/guided-explorations/ci-cd-plugin-extensions/ci-cd-plugin-extension-scc>

Used by the following products and services:

- [GitHub CodeQL](https://github.com/boyter/scc/pull/317) - The CodeQL engine uses `scc` for line counting
- [JetBrains Qodana](https://github.com/JetBrains/qodana-cli) - The Qodana CLI leverages `scc` as a command-line helper for code analysis
- [Scaleway](https://twitter.com/Scaleway/status/1488087029476995074?s=20&t=N2-z6O-ISDdDzULg4o4uVQ) - Cloud provider using `scc`
- [Linux Foundation LFX Insights](https://docs.linuxfoundation.org/lfx/insights/v3-beta-version-current/getting-started/landing-page/cocomo-cost-estimation-simplified) - COCOMO cost estimation
- [OpenEMS](https://openems.io/)

### Features

`scc` uses a small state machine to determine what state the code is in when it reaches a newline `\n`. As such, it is aware of and able to count:

- Single Line Comments
- Multi Line Comments
- Strings
- Multi Line Strings
- Blank lines

Because of this, it can accurately determine whether a comment marker is inside a string or is actually a comment.

It also attempts to count the complexity of code.
This is done by checking for branching operations in the code. For example, in Java each occurrence of any of `for if switch while else || && != ==` increments that file's complexity by one.

### Complexity Estimates

Let's take a minute to discuss the complexity estimate itself.

The complexity estimate is really just a number that is only comparable between files in the same language. It should not be used to compare languages directly without weighting them. The reason is that it is calculated by looking for branch and loop statements in the code and incrementing a counter for that file.

Because some languages don't have loops and instead use recursion, they can have a lower complexity count. Does this mean they are less complex? Probably not, but the tool cannot see this because it does not build an AST of the code; it only scans through it.

Generally though, the complexity figure is there to help compare projects written in the same language, or to find the most complex file in a project (`scc --by-file -s complexity`). This can be useful when you are estimating how hard something is to maintain, or when looking for files that should probably be refactored.

As for how it works:

It's my own definition, but it tries to approximate cyclomatic complexity <https://en.wikipedia.org/wiki/Cyclomatic_complexity>, although only at the file level.

The reason it's an approximation is that it's calculated almost for free from a CPU point of view (since it's a cheap lookup when counting), whereas a real cyclomatic complexity count would need to parse the code. It gives a reasonable guess in practice, even if it fails to identify recursive methods.
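
As a rough illustration of this file-level approximation, the Go sketch below adds one for every Java branch keyword or operator it finds. It is a simplified stand-in, not scc's implementation: the keyword and operator lists are copied from the Java example above (scc's real per-language lists live in `languages.json`), the input is assumed to contain only code with comments and strings already removed, and `complexity` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Branch keywords and operators for Java, copied from the README example.
var branchKeywords = map[string]bool{
	"for": true, "if": true, "switch": true, "while": true, "else": true,
}
var branchOperators = []string{"||", "&&", "!=", "=="}

// complexity adds one for every branch keyword or operator found in code.
// It builds no AST, so recursion and other structure are invisible to it.
func complexity(code string) int {
	count := 0
	// Split on anything that cannot appear in an identifier, so that "if"
	// inside a name such as "diff" is not counted.
	words := strings.FieldsFunc(code, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_'
	})
	for _, w := range words {
		if branchKeywords[w] {
			count++
		}
	}
	// Operators are counted as plain, non-overlapping substrings.
	for _, op := range branchOperators {
		count += strings.Count(code, op)
	}
	return count
}

func main() {
	src := `for (int i = 0; i < n; i++) {
    if (a == b && c != d) {
        x++;
    } else {
        diff--;
    }
}`
	// for, if, else plus ==, &&, != gives 6.
	fmt.Println(complexity(src))
}
```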
The goal was never for the estimate to be exact.

In short, when scc is looking through what it has identified as code, it increments a counter whenever it notices what are usually branch conditions.

The conditions it looks for are compiled into the code, and you can get an idea of them by looking at the JSON inside the repository. See <https://github.com/boyter/scc/blob/master/languages.json#L3869> for an example of what it looks for in a Java file.

The increment happens for each of the matching conditions and produces the number you see.

### Unique Lines of Code (ULOC)

ULOC stands for Unique Lines of Code and represents the unique lines across languages, files and the project itself. This idea was taken from <https://cmcenroe.me/2018/12/14/uloc.html>, where the calculation is presented using standard Unix tools: `sort -u *.h *.c | wc -l`. This metric is there to assist with estimating the complexity within the project. Quoting the source:

> In my opinion, the number this produces should be a better estimate of the complexity of a project. Compared to SLOC, not only are blank lines discounted, but so are close-brace lines and other repetitive code such as common includes. On the other hand, ULOC counts comments, which require just as much maintenance as the code around them does, while avoiding inflating the result with license headers which appear in every file, for example.

You can obtain the ULOC by supplying the `-u` or `--uloc` argument to `scc`.

It has a corresponding metric, `DRYness %`, which is the ratio of ULOC to total lines, i.e. `DRYness = ULOC / SLOC`, where SLOC here is the total physical line count (the Lines column). The higher the number, the more DRY (don't repeat yourself) the project can be considered. In general a higher value here is better, as it indicates less duplicated code.
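
A minimal Go sketch of both metrics, assuming lines are compared byte-for-byte exactly as `sort -u` compares them (scc itself may normalise lines differently) and using the total physical line count as the DRYness denominator, which is consistent with the redis figures in this section (149,892 / 267,353 ≈ 0.56). The function name `ulocAndDryness` and the two tiny C files are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ulocAndDryness mirrors `sort -u file... | wc -l`: every distinct line across
// all files is counted once. DRYness is ULOC divided by the total number of
// physical lines. Lines are compared exactly, with no whitespace trimming.
// (A line longer than bufio.Scanner's 64 KiB default would stop the scan;
// that is ignored here for brevity.)
func ulocAndDryness(files map[string]string) (int, float64) {
	unique := make(map[string]struct{})
	total := 0
	for _, content := range files {
		sc := bufio.NewScanner(strings.NewReader(content))
		for sc.Scan() {
			unique[sc.Text()] = struct{}{}
			total++
		}
	}
	if total == 0 {
		return 0, 0
	}
	return len(unique), float64(len(unique)) / float64(total)
}

func main() {
	files := map[string]string{
		"a.c": "#include <stdio.h>\nint main(void) {\n}\n",
		"b.c": "#include <stdio.h>\nint helper(void) {\n}\n",
	}
	// 6 physical lines, of which the include and the closing brace repeat.
	uloc, dry := ulocAndDryness(files)
	fmt.Printf("ULOC %d, DRYness %.2f\n", uloc, dry)
}
```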
The DRYness metric was taken from a comment by minimax <https://lobste.rs/s/has9r7/uloc_unique_lines_code>.

To obtain the DRYness metric, use the `-a` or `--dryness` argument to `scc`, which implicitly sets `--uloc`.

Note that there is a performance penalty when calculating the ULOC metrics, which can double the runtime.

Running the ULOC and DRYness calculations against the C code in a clone of redis produces output like the following:

```text
$ scc -a -i c redis
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
C                          437   267,353   31,103    45,998  190,252     48,269
(ULOC)                            149892
───────────────────────────────────────────────────────────────────────────────
Total                      437   267,353   31,103    45,998  190,252     48,269
───────────────────────────────────────────────────────────────────────────────
Unique Lines of Code (ULOC)       149892
DRYness %                           0.56
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $6,681,762
Estimated Schedule Effort (organic) 28.31 months
Estimated People Required (organic) 20.97
───────────────────────────────────────────────────────────────────────────────
Processed 9390815 bytes, 9.391 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────
```

Further reading about the ULOC calculation can be found at <https://boyter.org/posts/sloc-cloc-code-new-metic-uloc/>.

Interpreting DRYness:

- 75% (High Density): Very terse, expressive code. Every line counts. (Example: Clojure, Haskell)
- 60% - 70% (Standard): A healthy balance of logic and structural ceremony.
(Example: Java, Python)
- < 55% (High Boilerplate): High repetition, likely due to mandatory error handling, auto-generated code, or verbose configuration. (Example: C#, CSS)

See <https://boyter.org/posts/boilerplate-tax-ranking-popular-languages-by-density/> for more details.

### COCOMO

The COCOMO statistics displayed at the bottom of any command line run can be configured as needed.

```text
Estimated Cost to Develop (organic) $664,081
Estimated Schedule Effort (organic) 11.772217 months
Estimated People Required (organic) 5.011633
```

To change the COCOMO parameters, you can use one of the default COCOMO models:

```text
scc --cocomo-project-type organic
scc --cocomo-project-type semi-detached
scc --cocomo-project-type embedded
```

If you are familiar with COCOMO, you can also supply your own parameters as follows:

```text
scc --cocomo-project-type "custom,1,1,1,1"
```

See below for details about the model choices and the parameters they use. The four numbers after the model name are the basic COCOMO coefficients a, b, c and d, used as `Effort = a × KLOC^b` (person-months, with KLOC in thousands of lines of code) and `Schedule = c × Effort^d` (months).

Organic – A software project is said to be organic if the required team size is adequately small, the problem is well understood and has been solved in the past, and the team members have nominal experience regarding the problem.

`scc --cocomo-project-type "organic,2.4,1.05,2.5,0.38"`

Semi-detached – A software project is said to be semi-detached if its vital characteristics, such as team size, experience and knowledge of the various programming environments, lie in between those of organic and embedded projects. Projects classified as semi-detached are comparatively less familiar and more difficult to develop than organic ones, and require more experience, better guidance and more creativity.
Compilers and various embedded systems are examples of semi-detached projects.

`scc --cocomo-project-type "semi-detached,3.0,1.12,2.5,0.35"`

Embedded – A software project requiring the highest level of complexity, creativity and experience falls under this category. Such software requires a larger team than the other two models, and the developers need to be sufficiently experienced and creative to develop such complex systems.

`scc --cocomo-project-type "embedded,3.6,1.20,2.5,0.32"`

### LOCOMO

LOCOMO (LLM Output COst MOdel) estimates the cost to regenerate a codebase using a large language model. It is the LLM-era counterpart to COCOMO: a rough ballpark estimator, not a project planning tool.

Note: LOCOMO was developed as part of `scc` and is not an industry-standard model. Unlike COCOMO, which is based on decades of empirical research by Barry Boehm, LOCOMO is an experimental heuristic designed to give a useful order-of-magnitude estimate for LLM-assisted development costs. Treat its output as a conversation starter, not a definitive answer.

**Important distinction:** LOCOMO estimates the cost to **regenerate** known code, essentially "given this exact codebase, how much would it cost to have an LLM produce it?" This is fundamentally different from the cost to **create** something from scratch, which involves exploration, architectural decisions, dead ends, debugging, and iteration that can cost orders of magnitude more. COCOMO estimates the human *creation* cost; LOCOMO estimates the LLM *regeneration* cost. They answer different questions.

LOCOMO is opt-in.
Enable it with `--locomo`, or use `--cost-comparison` to display both COCOMO and LOCOMO side by side.

```
$ scc --locomo .
...
LOCOMO LLM Cost Estimate (medium)
  Tokens Required (in/out) 3.0M / 0.7M
  Cost to Generate $20
  Estimated Cycles 2.1
  Generation Time (serial) 3.9 hours
  Human Review Time 5.9 hours
  Disclaimer: rough ballpark for regenerating code using a LLM.
  Does not account for context reuse, test generation, or heavy debugging.
```

#### How it works

LOCOMO uses the SLOC and complexity data that `scc` already computes. The model works per file and then aggregates the results:

1. **Output tokens**: each line of code maps to roughly 10 LLM output tokens (configurable).
2. **Input tokens**: the estimated prompting cost, scaled by code complexity. More complex code (higher branch density) requires more detailed prompts. The scaling uses the square root of complexity density, so estimates do not run away on very complex files.
3. **Iteration factor**: LLMs rarely produce correct code on the first try. A retry multiplier grows with complexity, again on a square-root curve so it grows sub-linearly.
4. **Dollar cost**: input and output tokens multiplied by per-token pricing.
5. **Generation time**: total serial output tokens divided by tokens-per-second throughput.
6. **Human review time**: an estimated per-line overhead for planning, review, testing, and integration.

#### Model presets

Presets are tier-based rather than tied to specific models, so they don't go stale as models are retired or renamed. Use `--locomo-preset` to select a tier:

| Preset | Represents | Input $/1M | Output $/1M | TPS |
|--------|-----------|-----------|-------------|-----|
| `large` | Frontier models (Opus, GPT-5.3, Gemini 3.1 Pro, etc.) | 10.00 | 30.00 | 30 |
| `medium` (default) | Balanced models (Sonnet, Gemini Flash, etc.) | 3.00 | 15.00 | 50 |
| `small` | Fast/cheap models (Haiku, GPT-4o-mini, etc.) | 0.50 | 2.00 | 100 |
| `local` | Self-hosted models (Llama, Mistral, Qwen, etc.) | 0.00 | 0.00 | 15 |

For `local`, the cost is $0, but generation time is still reported to capture the compute and time investment. Preset pricing reflects approximate tier rates as of early 2026 and can be overridden with explicit flags.

```
scc --locomo --locomo-preset large .
scc --locomo --locomo-preset local .
```

#### Overriding preset values

You can override individual preset values for pricing or throughput:

```
scc --locomo --locomo-input-price 1.0 --locomo-output-price 5.0 .
scc --locomo --locomo-tps 100 .
```

#### Human review time

The `--locomo-review` flag controls the estimated human review minutes per line of code (default: 0.01, i.e. 0.6 seconds per line). This is intentionally optimistic and assumes light oversight.

For mission-critical, security-sensitive, or complex algorithmic code, you should increase this:

```
scc --locomo --locomo-review 0.05 .
scc --locomo --locomo-review 0.1 .
```

#### Power-user configuration

The five internal model parameters can be overridden with a single comma-separated config string:

```
scc --locomo --locomo-config "tokensPerLine,inputPerLine,complexityWeight,iterations,iterationWeight"
```

The defaults are `"10,20,5,1.5,2"`.
Here is what each parameter controls:

| Position | Name | Default | Description |
|----------|------|---------|-------------|
| 1 | tokensPerLine | 10 | Average LLM output tokens per line of code |
| 2 | inputPerLine | 20 | Base LLM input (prompt) tokens per output line |
| 3 | complexityWeight | 5 | How much complexity density scales input tokens: `inputFactor = 1 + sqrt(density) * weight` |
| 4 | iterations | 1.5 | Base iteration/retry cycles before complexity adjustment |
| 5 | iterationWeight | 2 | How much complexity density adds extra cycles: `cycles = iterations + sqrt(density) * weight` |

The iteration factor (cycles) scales both input and output tokens; it represents how many generation attempts the LLM needs. Simple code (~0.05 complexity density) produces ~1.9 cycles, while complex code (~0.3 density) produces ~2.6 cycles. Use `--locomo-cycles` to override this with a fixed value.

For example, to model a cheaper, faster LLM that needs fewer tokens but more retries:

```
scc --locomo --locomo-config "8,15,3,2.0,1.5"
```

#### Comparing COCOMO and LOCOMO

Use `--cost-comparison` to show both estimates side by side. This enables COCOMO (if it was disabled) and LOCOMO together:

```
scc --cost-comparison .
```

#### What LOCOMO does not account for

LOCOMO is a rough estimator with known limitations:

- **No context reuse.** Real LLM-assisted development shares context across files.
  The per-file model overestimates input tokens for large projects with shared patterns.
- **Boilerplate vs algorithmic code.** A 500-line CRUD controller and a 500-line compression algorithm have very different real costs, but the model only differentiates them via complexity density.
- **Code that LLMs can't write well.** Complex concurrency, platform-specific edge cases, and security-critical crypto need human authoring, not just review.
- **No test generation cost.** The model estimates source code generation only, not test suites.
- **Pricing changes.** LLM pricing drops rapidly. Preset defaults will become stale; use explicit price flags for current estimates.

#### All LOCOMO flags

| Flag | Default | Description |
|------|---------|-------------|
| `--locomo` | false | Enable LOCOMO output |
| `--cost-comparison` | false | Show COCOMO + LOCOMO side by side |
| `--locomo-preset` | medium | Model tier preset for pricing and throughput |
| `--locomo-input-price` | (preset) | Override: cost per 1M input tokens ($) |
| `--locomo-output-price` | (preset) | Override: cost per 1M output tokens ($) |
| `--locomo-tps` | (preset) | Override: output tokens per second |
| `--locomo-review` | 0.01 | Human review minutes per line of code |
| `--locomo-cycles` | (calculated) | Override estimated LLM iteration cycles |
| `--locomo-config` | 10,20,5,1.5,2 | Power-user config: tokensPerLine, inputPerLine, complexityWeight, iterations, iterationWeight |

### Large File Detection

You can have `scc` exclude large files from the output.

The option to do so is `--no-large`, which by default excludes files over 1,000,000 bytes or 40,000 lines.

You can change either threshold using `--large-byte-count` or `--large-line-count`.

For example, to exclude files over 1,000 lines or 50,000 bytes, you could use the following:

`scc --no-large --large-byte-count 50000 --large-line-count 1000`

### Minified/Generated File Detection

You can have `scc` identify files that appear to be minified or generated, and optionally remove them from the output.

Enable this with the `-z` flag, as in `scc -z`. By default, any file whose average line length is 255 bytes or more is identified as minified.

Minified files appear in the output like so:

```text
$ scc --no-cocomo -z ./examples/minified/jquery-3.1.1.min.js
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
JavaScript (min)             1         4        0         1        3         17
───────────────────────────────────────────────────────────────────────────────
Total                        1         4        0         1        3         17
───────────────────────────────────────────────────────────────────────────────
Processed 86709 bytes, 0.087 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────
```

Minified files are indicated with the text `(min)` after the language name.

Generated files are indicated with the text `(gen)` after the language name.

You can change the average line length threshold using `--min-gen-line-length`, for example `scc -z --min-gen-line-length 1`. Note that you still need `-z`, because setting this value does not enable minified detection by itself.

You can exclude minified files from the count entirely using the `--no-min-gen` flag. Files that match the minified check will be excluded from the output.

### Remapping

Some files may not have an extension. These are checked to see whether they start with a `#!` (shebang) line; if so, the language is remapped accordingly. Otherwise, the file is not processed.

However, you may want to remap such files based on a string they contain.
To do so, you can use `--remap-unknown`:

```bash
scc --remap-unknown "-*- C++ -*-":"C Header"
```

The above inspects any file without an extension for the string `-*- C++ -*-` and, if found, counts the file using the C Header rules.
You can supply multiple remap rules if required:

```bash
scc --remap-unknown "-*- C++ -*-":"C Header","other":"Java"
```

There is also the `--remap-all` parameter, which works the same way but inspects every file, not only those without an extension.

Note that in all cases, if no remap rule matches, the normal `#!` rules apply.

### Output Formats

By default `scc` writes to the console, but it can also produce output in other formats.

The available formats, selected with `-f, --format`, are `tabular`, `wide`, `json`, `csv`, `csv-stream`, `cloc-yaml`, `html`, `html-table`, `sql`, `sql-insert` and `openmetrics`.

You can write `scc` output to a file using the `-o, --output` option. For example, `scc -f html -o output.html` runs `scc` against the current directory and writes the results as HTML to `output.html`.

You can also write to multiple output files, or multiple formats to stdout, using the `--format-multi` option. This is most useful in CI/CD systems where you want HTML reports as an artifact while also displaying the counts on stdout.

```bash
scc --format-multi "tabular:stdout,html:output.html,csv:output.csv"
```

The above runs against the current directory, writing the default tabular output to standard output, HTML to `output.html` and CSV to `output.csv`.

#### Tabular

This is the default output format when `scc` is run.

#### Wide

Wide adds the complexity/lines metric to the output. This can be useful for identifying the most complex file in a project based on the complexity estimate.

#### JSON

JSON produces JSON output.
It is mostly designed to let `scc` feed into other programs.

Note that this format includes the byte size of every file `scc` reads, giving you a breakdown of the number of bytes processed.

#### CSV

CSV is a good option for importing into a spreadsheet for analysis.

Note that this format includes the byte size of every file `scc` reads, giving you a breakdown of the number of bytes processed. Also note that CSV respects `--by-file`, so it returns a summary unless that flag is set.

#### CSV-Stream

`csv-stream` is useful for processing very large repositories where you are likely to run into memory issues. Its output format is identical to CSV.

Do not use this with the `--format-multi` option: `csv-stream` always prints to standard output, and combining the two negates the memory savings this option provides. Note that no sorting is applied with this option.

#### cloc-yaml

This format is a drop-in replacement for cloc's YAML output option.
It is often used for passing into other build systems and can help when replacing cloc.

```text
$ scc -f cloc-yml processor
# https://github.com/boyter/scc/
header:
  url: https://github.com/boyter/scc/
  version: 2.11.0
  elapsed_seconds: 0.008
  n_files: 21
  n_lines: 6562
  files_per_second: 2625
  lines_per_second: 820250
Go:
  name: Go
  code: 5186
  comment: 273
  blank: 1103
  nFiles: 21
SUM:
  code: 5186
  comment: 273
  blank: 1103
  nFiles: 21

$ cloc --yaml processor
      21 text files.
      21 unique files.
       0 files ignored.

---
# http://cloc.sourceforge.net
header :
  cloc_url           : http://cloc.sourceforge.net
  cloc_version       : 1.60
  elapsed_seconds    : 0.196972846984863
  n_files            : 21
  n_lines            : 6562
  files_per_second   : 106.613679608407
  lines_per_second   : 33314.2364566841
Go:
  nFiles: 21
  blank: 1137
  comment: 606
  code: 4819
SUM:
  blank: 1137
  code: 4819
  comment: 606
  nFiles: 21
```

#### HTML and HTML-TABLE

The HTML output options produce a minimal HTML report built around a table, either as a standalone page (`html`) or as just the table (`html-table`), which can be injected into your own HTML pages. The only difference between the two is that the `html` option includes HTML head and body tags with minimal styling.

The markup is designed to allow your own custom styles to be applied.
An example report [is here to view](SCC-OUTPUT-REPORT.html).

Note that the HTML options follow the command line options, so you can use `scc --by-file -f html` to produce a report with every file and not just the summary.

Note that with the `--by-file` option, this format includes the byte size of every file `scc` reads, giving you a breakdown of the number of bytes processed.

#### SQL and SQL-Insert

The SQL output format is "mostly" compatible with cloc's SQL output format <https://github.com/AlDanial/cloc#sql->.

While all queries in the cloc documentation should work as expected, you will not be able to append output from `scc` and `cloc` into the same database, because the table format differs slightly to account for the complexity counts and byte sizes that `scc` includes.

The difference between `sql` and `sql-insert` is that `sql` includes the table creation statements, while `sql-insert` contains only the insert commands.

Usage is the same as any other `scc` command, but SQL output always contains per-file details. You can compute totals yourself using SQL; COCOMO calculations appear in the metadata table as the columns `estimated_cost`, `estimated_schedule_months` and `estimated_people`.

The following runs `scc` against the current directory, names the project `scc`, and pipes the output to `sqlite3` to store it in the database `code.db`:

```bash
scc --format sql --sql-project scc . | sqlite3 code.db
```

To then append another project:

```bash
scc --format sql-insert --sql-project redis . | sqlite3 code.db
```

You can then run SQL against the database:

```bash
sqlite3 code.db 'select project,file,max(nCode) as nL from t
                         group by project order by nL desc;'
```

See the cloc documentation for more examples.

#### OpenMetrics

[OpenMetrics](https://openmetrics.io/) is a metric reporting format specification extending the Prometheus exposition text format.

The produced output is natively supported by [Prometheus](https://prometheus.io/) and [GitLab CI](https://docs.gitlab.com/ee/ci/testing/metrics_reports.html).

Note that OpenMetrics output respects `--by-file`, so it returns a summary by default.

The output includes a metadata header containing definitions of the returned metrics:

```text
# TYPE scc_files count
# HELP scc_files Number of sourcecode files.
# TYPE scc_lines count
# UNIT scc_lines lines
# HELP scc_lines Number of lines.
# TYPE scc_code count
# HELP scc_code Number of lines of actual code.
# TYPE scc_comments count
# HELP scc_comments Number of comments.
# TYPE scc_blanks count
# HELP scc_blanks Number of blank lines.
# TYPE scc_complexity count
# HELP scc_complexity Code complexity.
# TYPE scc_bytes count
# UNIT scc_bytes bytes
# HELP scc_bytes Size in bytes.
```

The header is followed by the metric data, either in language summary form:

```text
scc_files{language="Go"} 1
scc_lines{language="Go"} 1000
scc_code{language="Go"} 1000
scc_comments{language="Go"} 1000
scc_blanks{language="Go"} 1000
scc_complexity{language="Go"} 1000
scc_bytes{language="Go"} 1000
```

or, if `--by-file` is present, in per-file form:

```text
scc_lines{language="Go",file="./bbbb.go"} 1000
scc_code{language="Go",file="./bbbb.go"} 1000
scc_comments{language="Go",file="./bbbb.go"} 1000
scc_blanks{language="Go",file="./bbbb.go"} 1000
scc_complexity{language="Go",file="./bbbb.go"} 1000
scc_bytes{language="Go",file="./bbbb.go"} 1000
```

### Performance

Generally `scc` is the fastest code counter of any I am aware of and have compared against. The comparisons below are against the fastest alternative counters. See `Other similar projects` above for all of the other code counters compared against. It is designed to scale to as many CPU cores as you can provide.

If you want greater performance and have RAM to spare, you can disable the garbage collector, for example on Linux with `GOGC=-1 scc .`, which should speed things up considerably. For some repositories, turning off the code complexity calculation via `-c` can reduce runtime as well.

Benchmarks were run on a fresh 32-core CPU Optimised Vultr Ocean virtual machine on 2026/03/05, all using [hyperfine](https://github.com/sharkdp/hyperfine).

See <https://github.com/boyter/scc/blob/master/benchmark.sh> for how the benchmarks are run.

#### Valkey <https://github.com/valkey-io/valkey>

```shell
Benchmark 1: scc valkey
  Time (mean ± σ):      27.7 ms ±   2.1 ms    [User: 175.7 ms, System: 87.0 ms]
  Range (min … max):    23.1 ms …  32.1 ms    96 runs

Benchmark 2: scc -c valkey
  Time (mean ± σ):      23.0 ms ±   1.5 ms    [User: 131.7 ms, System: 84.0 ms]
  Range (min … max):    19.5 ms …  31.4 ms    130 runs

Benchmark 3: tokei valkey
  Time (mean ± σ):      74.0 ms ±  13.0 ms    [User: 394.2 ms, System: 245.1 ms]
  Range (min … max):    49.1 ms …  92.5 ms    37 runs

Benchmark 4: polyglot valkey
  Time (mean ± σ):      41.1 ms ±   1.2 ms    [User: 54.2 ms, System: 103.3 ms]
  Range (min … max):    37.5 ms …  47.0 ms    69 runs

Summary
  scc -c valkey ran
    1.20 ± 0.12 times faster than scc valkey
    1.78 ± 0.13 times faster than polyglot valkey
    3.21 ± 0.61 times faster than tokei valkey
```

#### CPython <https://github.com/python/cpython>

```shell
Benchmark 1: scc cpython
  Time (mean ± σ):      80.8 ms ±   2.6 ms    [User: 751.1 ms, System: 265.6 ms]
  Range (min … max):    75.7 ms …  87.4 ms    36 runs

Benchmark 2: scc -c cpython
  Time (mean ± σ):      70.5 ms ±   2.4 ms    [User: 592.6 ms, System: 254.7 ms]
  Range (min … max):    66.2 ms …  77.6 ms    40 runs

Benchmark 3: tokei cpython
  Time (mean ± σ):     450.2 ms ±  36.1 ms    [User: 1822.0 ms, System: 1246.9 ms]
  Range (min … max):   378.6 ms … 491.2 ms    10 runs

Benchmark 4: polyglot cpython
  Time (mean ± σ):     149.9 ms ±   5.8 ms    [User: 199.2 ms, System: 326.2 ms]
  Range (min … max):   138.3 ms … 164.1 ms    19 runs

Summary
  scc -c cpython ran
    1.15 ± 0.05 times faster than scc cpython
    2.13 ± 0.11 times faster than polyglot cpython
    6.39 ± 0.56 times faster than tokei cpython
```

#### Linux Kernel <https://github.com/torvalds/linux>

```shell
Benchmark 1: scc linux
  Time (mean ± σ):     907.2 ms ±  17.1 ms    [User: 13764.7 ms, System: 2957.0 ms]
  Range (min … max):   878.2 ms … 925.0 ms    10 runs

Benchmark 2: scc -c linux
  Time (mean ± σ):     842.5 ms ±  17.2 ms    [User: 9363.3 ms, System: 2977.0 ms]
  Range (min … max):   819.4 ms … 874.0 ms    10 runs

Benchmark 3: tokei linux
  Time (mean ± σ):      1.422 s ±  0.089 s    [User: 13.292 s, System: 9.582 s]
  Range (min … max):    1.176 s …  1.471 s    10 runs

Benchmark 4: polyglot linux
  Time (mean ± σ):      1.862 s ±  0.046 s    [User: 3.802 s, System: 3.543 s]
  Range (min … max):    1.800 s …  1.935 s    10 runs

Summary
  scc -c linux ran
    1.08 ± 0.03 times faster than scc linux
    1.69 ± 0.11 times faster than tokei linux
    2.21 ± 0.07 times faster than polyglot linux
```

#### Sourcegraph <https://github.com/SINTEF/sourcegraph.git>

Sourcegraph has gone dark since I last ran these benchmarks, hence using a
clone taken before this occurred.
It is kept in order to track what appears to be a performance regression in tokei.

```shell
Benchmark 1: scc sourcegraph
  Time (mean ± σ):     108.2 ms ±   3.5 ms    [User: 559.4 ms, System: 323.6 ms]
  Range (min … max):   100.5 ms … 115.9 ms    26 runs

Benchmark 2: scc -c sourcegraph
  Time (mean ± σ):      99.7 ms ±   4.2 ms    [User: 503.1 ms, System: 316.8 ms]
  Range (min … max):    91.4 ms … 109.4 ms    29 runs

Benchmark 3: tokei sourcegraph
  Time (mean ± σ):     21.359 s ±  1.025 s    [User: 57.252 s, System: 411.480 s]
  Range (min … max):   19.371 s … 22.741 s    10 runs

Benchmark 4: polyglot sourcegraph
  Time (mean ± σ):     135.1 ms ±   5.0 ms    [User: 198.6 ms, System: 543.7 ms]
  Range (min … max):   126.0 ms … 144.8 ms    21 runs

Summary
  scc -c sourcegraph ran
    1.08 ± 0.06 times faster than scc sourcegraph
    1.36 ± 0.08 times faster than polyglot sourcegraph
  214.26 ± 13.64 times faster than tokei sourcegraph
```

If you enable duplicate detection, expect `scc` performance to fall by about 20%.

Performance is tracked for some releases and presented below.

![scc performance on Linux kernel](./performance-over-time.png)

The drop in performance from the 3.3.0 release onward was due to adding accurate `.gitignore`, `.ignore` and `.gitmodules` support.
Current work is focused on resolving this.

### CI/CD Support

Some CI/CD systems, which will remain nameless, do not work well with the box-drawing lines used by `scc`.
To better support those systems, the `--ci` option changes the default output to ASCII only:

```text
$ scc --ci main.go
-------------------------------------------------------------------------------
Language                 Files     Lines   Blanks  Comments     Code Complexity
-------------------------------------------------------------------------------
Go                           1       272        7         6      259          4
-------------------------------------------------------------------------------
Total                        1       272        7         6      259          4
-------------------------------------------------------------------------------
Estimated Cost to Develop $6,539
Estimated Schedule Effort 2.268839 months
Estimated People Required 0.341437
-------------------------------------------------------------------------------
Processed 5674 bytes, 0.006 megabytes (SI)
-------------------------------------------------------------------------------
```

The `--format-multi` option is especially useful in CI/CD, where you may want multiple output formats for storage or reporting.

### Development

If you want to hack away, feel free! PRs are accepted. Some things to keep in mind: if you want to change a language definition, you need to update `languages.json` and then run `go generate`, which converts it into the `processor/constants.go` file.

For all other changes, ensure you run all tests before submitting, using `go test ./...`. For maximum coverage, please run `test-all.sh`, which runs `gofmt`, the unit tests, the race detector and then all of the integration tests. All of these must pass to ensure a stable release.

### API Support

The core of `scc`, its counting engine, is exposed publicly so it can be integrated into other Go applications.
See <https://github.com/pinpt/ripsrc> for an example of how to do this.

It also powers all of the code calculations displayed on <https://searchcode.com/>, such as <https://searchcode.com/file/169350674/main.go/>, making it one of the more widely used code counters in the world.

As a quick start, consider the following.

Note that you must set the number of bytes in the content (`Bytes`) to ensure it is counted!

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/boyter/scc/v3/processor"
)

// statsProcessor receives a callback for every line scc classifies.
type statsProcessor struct{}

func (p *statsProcessor) ProcessLine(job *processor.FileJob, currentLine int64, lineType processor.LineType) bool {
	switch lineType {
	case processor.LINE_BLANK:
		fmt.Println(currentLine, "lineType", "BLANK")
	case processor.LINE_CODE:
		fmt.Println(currentLine, "lineType", "CODE")
	case processor.LINE_COMMENT:
		fmt.Println(currentLine, "lineType", "COMMENT")
	}
	return true
}

func main() {
	// Required to load the language information; only needs to be done once.
	processor.ProcessConstants()

	bts, err := os.ReadFile("somefile.go")
	if err != nil {
		log.Fatal(err)
	}

	filejob := &processor.FileJob{
		Filename: "somefile.go",
		Language: "Go",
		Content:  bts,
		Callback: &statsProcessor{},
		Bytes:    int64(len(bts)), // must be set or the content will not be counted
	}
	processor.CountStats(filejob)
}
```

#### Per-Byte Content Classification

For library consumers who need finer granularity than per-line classification, `scc` supports opt-in per-byte content classification. When enabled, `CountStats` populates a byte slice classifying every byte in the file as code, comment, string, or blank.
This is useful for stripping comments from source files, extracting only comments, or building syntax-aware tools without reimplementing language parsing.

To enable it, set `ClassifyContent: true` on the `FileJob` before calling `CountStats`. When it is disabled (the default), there is no performance impact.

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/boyter/scc/v3/processor"
)

func main() {
	processor.ProcessConstants()

	bts, err := os.ReadFile("main.go")
	if err != nil {
		log.Fatal(err)
	}
	filejob := &processor.FileJob{
		Filename:        "main.go",
		Language:        "Go",
		Content:         bts,
		Bytes:           int64(len(bts)),
		ClassifyContent: true, // Enable per-byte classification
	}
	processor.CountStats(filejob)

	// ContentByteType has one entry per byte with values:
	//   processor.ByteTypeBlank   (0) - blank lines / leading whitespace
	//   processor.ByteTypeCode    (1) - code
	//   processor.ByteTypeComment (2) - comments (including docstrings)
	//   processor.ByteTypeString  (3) - string literals

	// Example: extract only code, replacing everything else with spaces
	codeOnly := filejob.FilterContentByType(processor.ByteTypeCode)
	fmt.Println(string(codeOnly))

	// Example: extract only comments
	commentsOnly := filejob.FilterContentByType(processor.ByteTypeComment)
	fmt.Println(string(commentsOnly))

	// Example: keep both code and strings, strip comments
	noComments := filejob.FilterContentByType(processor.ByteTypeCode, processor.ByteTypeString)
	fmt.Println(string(noComments))
}
```

`FilterContentByType` returns a copy of the content with non-matching bytes replaced by spaces. Newlines are always preserved regardless of type, so the output keeps the same line structure as the original file.
It returns `nil` if classification was not enabled.

Note that at syntax marker boundaries (e.g., `//`, `/*`, `"`), the first byte of the marker may be classified as the preceding state. This 1-byte approximation is acceptable for content filtering use cases.

### MCP Server Mode

`scc` can run as an [MCP (Model Context Protocol)](https://modelcontextprotocol.io/) server over stdio, allowing LLM tools like Claude Desktop, Claude Code, Cursor, and others to use it as a code analysis tool.

```shell
scc --mcp
```

#### Claude Code Configuration

Run the following in your terminal to add it for the current project:

```shell
claude mcp add scc -- scc --mcp
```

Or globally for all projects:

```shell
claude mcp add scc --scope user -- scc --mcp
```

Alternatively, add it to your `.mcp.json`:

```json
{
  "mcpServers": {
    "scc": {
      "command": "scc",
      "args": ["--mcp"]
    }
  }
}
```

#### Claude Desktop Configuration

Add the following to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "scc": {
      "command": "/path/to/scc",
      "args": ["--mcp"]
    }
  }
}
```

#### Exposed Tools

The MCP server exposes one tool:

**`analyze`**: counts lines of code, comments and blanks, and estimates complexity for a project directory or file.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `path` | string | no | Directory or file path to analyze. Defaults to the current directory. |
| `sort` | string | no | Column to sort by: `files`, `name`, `lines`, `blanks`, `code`, `comments`, `complexity`, `bytes`. Default: `files`. |
| `by_file` | boolean | no | If true, return per-file results instead of a per-language summary. |
| `include_ext` | string | no | Comma-separated file extensions to include (e.g. `go,java,js`). |
| `exclude_ext` | string | no | Comma-separated file extensions to exclude (e.g. `json,xml`). |
| `no_duplicates` | boolean | no | Remove duplicate files from stats. |
| `no_min_gen` | boolean | no | Ignore minified or generated files. |
| `locomo` | boolean | no | Include LOCOMO (LLM cost) estimation in results. |
| `locomo_preset` | string | no | LOCOMO model preset: `large`, `medium`, `small`, `local`. Default: `medium`. |

Results are returned as JSON with a per-language breakdown (files, lines, code, comments, blanks, complexity, bytes), totals, and COCOMO cost/schedule estimates. When `locomo` is enabled, LOCOMO estimates (token counts, cost, generation time, review hours) are also included.

### Adding/Modifying Languages

To add or modify a language, edit the `languages.json` file in the root of the project and then run `go generate` to build it into the application. You can then `go install` or `go build` as normal to produce a binary with your modifications.

### Issues

It's possible that the counts vary between runs. This usually means one of two things: either something is changing or locking the files under `scc`, or you are hitting ulimit restrictions. To change the ulimit, see the following links:

- <https://superuser.com/questions/261023/how-to-change-default-ulimit-values-in-mac-os-x-10-6#306555>
- <https://unix.stackexchange.com/questions/108174/how-to-persistently-control-maximum-system-resource-consumption-on-mac/221988#221988>
- <https://access.redhat.com/solutions/61334>
- <https://serverfault.com/questions/356962/where-are-the-default-ulimit-values-set-linux-centos>
- <https://www.tecmint.com/increase-set-open-file-limits-in-linux/>

To help identify this issue, run `scc -v .` and look for the message `too many open files` in the output.
If it is there, you can rectify it by raising your ulimit.

### Low Memory

If you are running `scc` in a low-memory environment (under 512 MB of RAM), you may need to set `--file-gc-count` to a lower value such as `0` to force the garbage collector to be on at all times.

A sign that this is required is `scc` crashing with panic errors.

### Tests

`scc` is well tested, with many unit tests, integration tests and benchmarks to ensure that it is fast and complete.

### Package

Packaging as of version v3.1.0 is done through <https://goreleaser.com/>.

### Containers

Note that if you plan to run `scc` in Alpine containers, you will need to build with `CGO_ENABLED=0`.

The Dockerfile below, based on issue <https://github.com/boyter/scc/issues/208>, shows one way to achieve this.

```Dockerfile
FROM golang AS scc-get

ENV GOOS=linux \
    GOARCH=amd64 \
    CGO_ENABLED=0

# Release tag to build, passed with --build-arg VERSION=<tag>
ARG VERSION
RUN git clone --branch $VERSION --depth 1 https://github.com/boyter/scc
WORKDIR /go/scc
RUN go build -ldflags="-s -w"

FROM alpine
COPY --from=scc-get /go/scc/scc /bin/
ENTRYPOINT ["scc"]
```

### Badges

You can use `scc` to provide badges on your public GitHub, Bitbucket, GitLab or sr.ht repositories. For example, [![Scc Count Badge](https://sloc.xyz/github/boyter/scc/)](https://github.com/boyter/scc/)

The format to do so is:

<https://sloc.xyz/PROVIDER/USER/REPO>

An example of the badge for `scc` is included below, and is used on this page.

```Markdown
[![Scc Count Badge](https://sloc.xyz/github/boyter/scc/)](https://github.com/boyter/scc/)
```

By default the badge shows the repository's line count.
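
If you generate badges for several repositories, the documented `https://sloc.xyz/PROVIDER/USER/REPO` format is easy to build programmatically. The Go sketch below is illustrative only: `badgeURL` and `badgeMarkdown` are hypothetical helpers, the trailing slash copies the examples on this page, and the optional `params` map can carry the query-string options described below.

```go
package main

import (
	"fmt"
	"net/url"
)

// badgeURL builds https://sloc.xyz/PROVIDER/USER/REPO/ plus optional query parameters.
func badgeURL(provider, user, repo string, params map[string]string) string {
	u := url.URL{
		Scheme: "https",
		Host:   "sloc.xyz",
		Path:   "/" + provider + "/" + user + "/" + repo + "/",
	}
	q := url.Values{}
	for k, v := range params {
		q.Set(k, v)
	}
	u.RawQuery = q.Encode() // Encode sorts by key, so the output is deterministic
	return u.String()
}

// badgeMarkdown wraps a badge image in a link, matching the Markdown example above.
func badgeMarkdown(badge, link string) string {
	return fmt.Sprintf("[![Scc Count Badge](%s)](%s)", badge, link)
}

func main() {
	repos := []struct{ provider, user, repo, link string }{
		{"github", "boyter", "scc", "https://github.com/boyter/scc/"},
		{"gitlab", "esr", "loccount", "https://gitlab.com/esr/loccount"},
	}
	for _, r := range repos {
		fmt.Println(badgeMarkdown(badgeURL(r.provider, r.user, r.repo, nil), r.link))
	}
}
```

For instance, `badgeURL("github", "boyter", "scc", map[string]string{"category": "code"})` returns `https://sloc.xyz/github/boyter/scc/?category=code`.
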
You can also make the badge show a different category by using the `?category=` query string.

Valid values include `code`, `blanks`, `lines`, `comments`, `cocomo` and `effort`; examples of their appearance are included below.

[![Scc Count Badge](https://sloc.xyz/github/boyter/scc/?category=code)](https://github.com/boyter/scc/)
[![Scc Count Badge](https://sloc.xyz/github/boyter/scc/?category=blanks)](https://github.com/boyter/scc/)
[![Scc Count Badge](https://sloc.xyz/github/boyter/scc/?category=lines)](https://github.com/boyter/scc/)
[![Scc Count Badge](https://sloc.xyz/github/boyter/scc/?category=comments)](https://github.com/boyter/scc/)
[![Scc Count Badge](https://sloc.xyz/github/boyter/scc/?category=cocomo)](https://github.com/boyter/scc/)
[![Scc Count Badge](https://sloc.xyz/github/boyter/scc/?category=effort)](https://github.com/boyter/scc/)

For `cocomo`, you can also set the `avg-wage` value, similar to `scc` itself. For example:

<https://sloc.xyz/github/boyter/scc/?category=cocomo&avg-wage=1>
<https://sloc.xyz/github/boyter/scc/?category=cocomo&avg-wage=100000>

Note that `avg-wage` must be a positive integer; otherwise it reverts to the default value of 56286.

You can also configure the look and feel of the badge using the following parameters:

- `?lower=true` lowercases the title text, so "Total lines" becomes "total lines"

The following parameters control the colors of shadows, fonts and badges. Colors can be specified as either hex codes or named colors (similar to shields.io):

- `?font-color=fff`
- `?font-shadow-color=010101`
- `?top-shadow-accent-color=bbb`
- `?title-bg-color=555`
- `?badge-bg-color=4c1`

#### Named Colors

For convenience, you can use named colors instead of hex codes.
The following named colors are supported:

**Shields.io colors:** `brightgreen`, `green`, `yellowgreen`, `yellow`, `orange`, `red`, `blue`, `lightgrey`, `blueviolet`

**Semantic aliases:** `success`, `important`, `critical`, `informational`, `inactive`

**CSS colors:** `white`, `black`, `silver`, `gray`, `maroon`, `purple`, `fuchsia`, `lime`, `olive`, `navy`, `teal`, `aqua`, `cyan`, `magenta`, `pink`, `coral`, `salmon`, `gold`, `khaki`, `violet`, `indigo`, `crimson`, `turquoise`, `tan`, `brown`, and many more standard CSS color names.

For example, instead of `?badge-bg-color=007ec6` you can use `?badge-bg-color=blue`.

An example of using some of these parameters to produce an admittedly ugly result:

[![Scc Count Badge](https://sloc.xyz/github/boyter/scc?font-color=ff0000&badge-bg-color=0000ff&lower=true)](https://github.com/boyter/scc/)

An example using named colors for a slightly nicer result:

[![Scc Count Badge](https://sloc.xyz/github/boyter/scc?title-bg-color=navy&badge-bg-color=blue&font-color=white)](https://github.com/boyter/scc/)

*NB* it may not work for VERY large repositories (it has been tested on Apache Hadoop/Spark without issue).

You can find the source code for badges in the repository at <https://github.com/boyter/scc/blob/master/cmd/badges/main.go>

#### An example for each supported provider

- GitHub - <https://sloc.xyz/github/boyter/scc/>
- sr.ht - <https://sloc.xyz/sr.ht/~nektro/magnolia-desktop/>
- Bitbucket - <https://sloc.xyz/bitbucket/boyter/decodingcaptchas>
- GitLab - <https://sloc.xyz/gitlab/esr/loccount>

### Languages

List of supported languages. The master version of `scc` supports 322 languages at last count. Note that this list assumes you built from master, and it may trail behind what is actually supported.
To see what your version of `scc` supports, run `scc --languages`.

[Click here to view all languages supported by master](LANGUAGES.md)

### Citation

Please use the following BibTeX entry to cite scc in a publication:

<pre>
@software{scc,
  author       = {Ben Boyter},
  title        = {scc: v3.5.0},
  month        = ...,
  year         = ...,
  publisher    = {...},
  version      = {v3.5.0},
  doi          = {...},
  url          = {...}
}
</pre>

You may need to check the release page <https://github.com/boyter/scc/releases> to find the correct year and month for the release you are using.

### Release Checklist

- Update version
- Push code with release number
- Tag off
- Release via goreleaser
- Update Dockerfile