Analyze your Dev Team's Programming Language Usage


Gitential can help through filetype correlation statistics

As an engineering manager, have you ever wanted to track which programming languages your developers have worked with over a period of time? How closely do your assessments of project requirements align with the actual work and technologies involved? Can you track these things, and how accurately? As it turns out, filetype extensions hold the key.

After examining almost a thousand different filetype extensions, we thought you might be curious about the results. How accurately do we associate the files you work on with the programming languages you use? After running a lot of tests, we discovered two things. First, Gitential is currently about 99% accurate in correlating file types with programming languages. Second, we need to develop more tests for that last 1%. We’re happy to roll out this update, although in practice it is already reflected in your analytics. It lets us pay down a bit of our own technical debt while, we hope, giving you extra confidence in the software analytics we provide.

Why Do Filetype Classifications Matter?

If you’re already using Gitential, you’re familiar with the variety of reports you can run. If you’re not, we welcome you to try out our Demo. Filetype extensions are likely the easiest way to track which technologies your developers have worked on over any period of time.

The programming language statistics should align closely with what you would expect from each developer’s assignments. If your work projections are off and potentially contributing to technical debt, evaluating work by file type is one way to find out why: perhaps the projections themselves were off, the developer had difficulties with their task, or the task proved more complex than anticipated. A higher frequency of bugs or bad code in particular file types or languages can also guide further examination. Combined with other reports, these statistics can help you identify better pairings for code reviews.

The first snapshot below reflects the different languages our “sample team” has been working on over the past few months:

Languages used

Our second snapshot shows what one developer, Catherine, has been working on. If you looked through the rest of the team’s work, you’d see that Catherine and Michael are responsible for the lion’s share of work with JavaServer Pages (.jsp files).

Technology stack
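Gitential’s own pipeline isn’t described in this post, but the basic idea of tracking languages by filetype can be sketched in a few lines of Python. The sketch below assumes you already have (author, file path) pairs for the files each developer changed, for example pulled from your commit history. The extension table, the function names, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath

# Hypothetical extension-to-language table (a real one would be far larger).
EXTENSION_LANGUAGE = {
    ".jsp": "JavaServer Pages",
    ".java": "Java",
    ".js": "JavaScript",
    ".py": "Python",
    ".sql": "SQL",
}


def language_for(path):
    """Map a file path to a language by its extension, or 'Unknown'."""
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_LANGUAGE.get(suffix, "Unknown")


def languages_by_developer(changes):
    """Count changed files per language for each developer.

    `changes` is an iterable of (author, path) pairs, e.g. extracted from
    the commit history for the period of interest.
    """
    tally = defaultdict(Counter)
    for author, path in changes:
        tally[author][language_for(path)] += 1
    return tally


# Hypothetical sample data, for illustration only.
changes = [
    ("Catherine", "web/login.jsp"),
    ("Catherine", "web/cart.jsp"),
    ("Catherine", "src/Cart.java"),
    ("Michael", "web/admin.jsp"),
    ("Daniel", "scripts/report.py"),
]
for author, counts in languages_by_developer(changes).items():
    print(author, counts.most_common())
```

Counting changed files is only one possible measure; weighting each file by lines changed would give a different, and sometimes more telling, picture.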

The Three Filetype Categories in Our Analytics

For transparency: we analyzed the file extensions with the help of recognition programs, which investigated each file’s contents and correlated it with one or more languages. The results showed that the roughly one thousand file types we examined fall into three categories.

  • Category 1. Explicit File Types. File extensions associated with only one programming language. When Gitential’s analytics encounter these unambiguous extensions, they automatically associate them with the corresponding language. This is the case for nearly 97% (968) of the file types examined. These are listed in alphabetical order in Appendix C (below).
  • Category 2. Ambiguous File Types. File extensions that can belong to more than one programming language. For example, a .frag file may contain either JavaScript or GLSL. In these cases, the pattern recognition software measures how well the file’s contents match each candidate language’s patterns (its hit rate) and assigns the filetype accordingly.

    In most of these cases, pattern recognition consistently associated the file with the correct programming language. Our test sample, however, was not large enough to completely rule out a wrong classification. We’re working to improve recognition for .inc and .shader files, as noted in Appendix B.

  • Category 3. Rare Files. Some file extensions are so rare that we didn’t have recognition patterns for them. For these cases, we’ve created a Priority Catalog and assigned each extension its most likely filetype, as listed in Appendix A.

    Within Category 3, the .cgi extension is something of an exception, since it can be used by any programming language.
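The three categories amount to a tiered lookup. Here is a minimal Python sketch of that idea, not Gitential’s actual implementation: an explicit table for Category 1, a content-pattern hit rate for Category 2, and a priority catalog (first entry wins, as in Appendix A) as the fallback. We assume “hit rate” means the fraction of a language’s patterns that match the file; the extension tables, regular expressions, and function names are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Category 1 (hypothetical subset): extensions that map to exactly one language.
EXPLICIT = {".java": "Java", ".jsp": "JavaServer Pages", ".py": "Python"}

# Category 2 (hypothetical patterns): content regexes per candidate language.
PATTERNS = {
    ".frag": {
        "GLSL": [r"^\s*#version\s+\d+", r"\bgl_FragColor\b", r"\buniform\s+\w+"],
        "JavaScript": [r"\bfunction\s*\w*\s*\(", r"\b(?:var|let|const)\s+\w+\s*=", r"\brequire\("],
    },
}

# Priority catalog: the first language listed is the default (as in Appendix A).
PRIORITY = {
    ".frag": ["JavaScript", "GLSL"],
    ".h": ["C", "C++"],
}


def hit_rate(text, patterns):
    """Fraction of a language's patterns that match somewhere in the text."""
    hits = sum(1 for p in patterns if re.search(p, text, re.MULTILINE))
    return hits / len(patterns)


def classify(path, text=""):
    """Return a language name for the file, or None if the extension is unknown."""
    ext = PurePosixPath(path).suffix.lower()

    # Category 1: unambiguous extension, no need to read the file.
    if ext in EXPLICIT:
        return EXPLICIT[ext]

    # Category 2: score the contents against each candidate language.
    candidates = PATTERNS.get(ext)
    if candidates and text:
        ranked = sorted(
            ((hit_rate(text, pats), lang) for lang, pats in candidates.items()),
            reverse=True,
        )
        best_score, best_lang = ranked[0]
        runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
        # Accept only a clear winner; ties or zero hits fall through.
        if best_score > runner_up:
            return best_lang

    # Category 3 / fallback: take the first entry in the priority catalog.
    if ext in PRIORITY:
        return PRIORITY[ext][0]
    return None


glsl_src = "#version 330\nuniform vec4 color;\nvoid main() { gl_FragColor = color; }\n"
js_src = "const shader = require('./frag');\nfunction build() { return shader; }\n"
print(classify("effects/blur.frag", glsl_src))
print(classify("effects/loader.frag", js_src))
print(classify("effects/empty.frag"))
print(classify("include/util.h"))
```

When two candidates tie, or no pattern matches at all, the sketch falls back to the priority catalog rather than guessing, which keeps the result deterministic.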

Our team is investigating Category 2 and 3 file types to create new or better pattern sets. At present, we’re confident that 99% of filetype extensions are accurately attributed by our software, especially in terms of overall file volume. Pinning down that last 1%, as we all know, will require more effort. We’re committed to making our file recognition as precise as possible.

Do you have any questions about specific file types, or do you see other important types that aren’t included in our report? Please let us know at We’d love to answer any questions you may have about how you can use Gitential to gather deep insight into your own software development efforts. Again, we welcome you to try our free demo, or sign up for a free trial; no credit card is needed.

Appendix A:

Extensions with a preselected filetype (the first filetype in the list will be the result):

Extension Possible file types
cake (CoffeeScript, C#)
cp (C++, Component Pascal)
cps (CPS, Component Pascal)
cs (C#, Smalltalk)
deface (HTML+ERB, Haml)
e (Eiffel, E)
eclxml (ECL, XML)
erb (HTML+ERB, NetLinx)
f (Fortran, Forth, Filebench WML)
fr (Forth, Text)
for (Fortran, Formatted, Forth)
frag (JavaScript, GLSL)
gml (Game Maker Language, Graph Modeling Language)
h (C, C++)
hl (HTML, Clojure)
in (Shell, CMake)
inc (PAWN, C++, HTML, Assembly, PHP, POV-Ray SDL, Pascal, SQL)
m (MATLAB, Limbo, MUF, Mathematica, Mercury)
ml (Standard ML, OCaml)
mm (XML, C++)
mod (Modula-2, AMPL, Linux Kernel Module)
mysql (SQL, YAML)
re (Reason, C++)
shader (GLSL, ShaderLab)
st (Smalltalk, HTML)

Appendix B:

Extensions whose filetype is determined by investigating the file content:

Extension Possible file types Recognition rate
.erb (NetLinx, HTML+ERB) 100%
.inc (C++, HTML, PAWN, Pascal, SQL, POV-Ray) 76% 🡨 needs more development
.mm (C++, XML) 100%
.frag (GLSL, JavaScript) 100%
.shader (GLSL, ShaderLab) 60% 🡨 needs more development
.f (FilebenchWML, Fortran77) 100%
.fr (GLSL, F#, Formatted) 100%
.cgi (can be any programming language) 100%
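The recognition rates above are, in essence, the share of test files per extension that were classified correctly. A minimal sketch of that calculation, assuming labeled test results are available as (extension, expected, predicted) triples; the sample results, the function names, and the 90% threshold for flagging an extension are hypothetical, made up for illustration.

```python
from collections import defaultdict


def recognition_rates(results):
    """Per-extension percentage of test files classified correctly.

    `results` holds (extension, expected_language, predicted_language)
    triples from a labeled test set.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for ext, expected, predicted in results:
        totals[ext] += 1
        if predicted == expected:
            correct[ext] += 1
    return {ext: 100.0 * correct[ext] / totals[ext] for ext in totals}


def needs_more_development(rates, threshold=90.0):
    """Extensions whose recognition rate falls below a chosen threshold."""
    return sorted(ext for ext, rate in rates.items() if rate < threshold)


# Hypothetical labeled results, for illustration only.
results = [
    (".shader", "GLSL", "GLSL"),
    (".shader", "ShaderLab", "GLSL"),
    (".shader", "ShaderLab", "ShaderLab"),
    (".shader", "GLSL", "GLSL"),
    (".mm", "XML", "XML"),
    (".mm", "C++", "C++"),
]
rates = recognition_rates(results)
print(rates)
print(needs_more_development(rates))
```

A small test set can easily show 100% for an extension by chance, which is why a larger sample is needed before ruling out misclassification.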

Appendix C:

.1in .di .ipf .os .smt2
.1m .diff .ipp .owl .soy
.1x .djs .ipynb .ox .sp
.2 .dlm .irbrc .oxh .spacemacs
.3 .dm .irclog .oxo .sparql
.3in .do .iss .oxygene .spc
.3m .dockerfile .j .oz .spec
.3qt .doh .jade .p .spin
.3x .dot .java .p4 .sps
.4 .dpatch .jbuilder .p6 .sqf
.4th .dpr .jfif .p6l .sql
.5 .druby .jflex .p6m .sra
.6 .dtx .jif .pan .src
.6pl .duby .jinja .parrot .srt
.6pm .dwl .jison .pas .sru
.7 .dyalog .jisonlex .pascal .srw
.8 .dyl .jl .pasm .ss
.8xk .dylan .jpe .pat .sss
.8xk.txt .e .jpeg .patch .st
.8xp .eb .jpg .pb .stan
.8xp.txt .ebnf .jq .pbi .sthlp
.9 .ebuild .js .pbt .ston
.9fs .ec .jscsrc .pck .sty
._coffee .ecl .jshintrc .pcss .styl
._ls .eclass .jslintrc .pd .sublime-build
.a51 .eclxml .json .pd_lua .sublime-commands
.abap .ecr .json-tmlanguage .pde .sublime-completions
.abbrev_defs .edc .json5 .pep .sublime-keymap
.abnf .editorconfig .jsonl .perl .sublime-macro
.ada .edn .jsonld .pfa .sublime-menu
.adb .eex .jsp .ph .sublime-mousemap
.ado .eh .jsx .php .sublime-project
.adoc .ejs .kicad_mod .php3 .sublime-settings
.adp .el .kicad_pcb .php4 .sublime-syntax
.ads .eliom .kicad_wks .php5 .sublime-theme
.afm .eliomi .kid .phtml .sublime-workspace
.agc .elm .kit .pic .sublime_metrics
.agda .em .kojo .pig .sublime_session
.ahk .emacs .krl .pike .sv
.ahkl .emacs.desktop .ksh .pir .svg
.aj .emberscript .kt .pkb .svh
.al .emf .ktm .pkgbuild .swf
.als .epj .kts .pkl .swift
.ampl .eps .l .pks .syntax
.angelscript .eq .lagda .pl .t
.anim .erb.deface .las .pl6 .tac
.apacheconf .erl .lasso .plb .tcl
.apib .escript .lasso8 .plot .tcsh
.apl .ex .lasso9 .pls .tea
.app.src .exs .latte .plsql .tern-config
.applescript .eye .lbx .plt .tern-project
.arc .f03 .ld .plx .tesc
.arcconfig .f08 .ldml .pm .tese
.arpa .f77 .lds .pm6 .tex
.as .f90 .lean .pmod .textile
.asax .f95 .less .png .tf
.asc .factor .lex .po .tfstate
.asciidoc .factor-boot-rc .lfe .pod .tfstate.backup
.ascx .factor-rc .lgt .podsl .tfvars
.asd .fan .lhs .podspec .tga
.ash .fancypack .lid .pogo .thor
.ashx .fea .lidr .pony .thrift
.asm .feature .liquid .pot .thy
.asmx .fish .lisp .pov .tif
.asn .flex .litcoffee .pp .tiff
.asp .flux .lkml .pprx .tl
.aspx .fnc .ll .prefab .tla
.asset .for .lmi .prefs .tm
.au3 .forth .login .prg .tmac
.aug .fp .logtalk .pri .tmux
.auk .fpp .lol .pro .toc
.aux .fr .lookml .profile .toml
.avsc .frg .lpr .prolog .tool
.awk .frm .ls .properties .topojson
.axd .frt .lsl .proto .tpb
.b .frx .lslp .prw .tpl
.babelrc .fsh .lsp .pryrc .tps
.backup .fshader .ltx .ps .trg
.bal .fsi .lua .ps1 .tst
.bas .fsscript .lvproj .psc .ttl
.bash .fsx .ly .psd1 .tu
  .fth .m .psgi .twig
.bash_history .ftl .m4 .psm1 .txl
.bash_logout .fun .ma .pub .txt
.bash_profile .fx .mak .pug .uc
.bashrc .fxh .make .purs .udo
.bat .fy .mako .pwn .unity
.bats .g .man .pxd .uno
.bb .g4 .mao .pxi .upc
.bbx .gap .markdown .py .ur
.bdy .gawk .marko .py3 .urs
.befunge .gbl .mask .pyde .v
.bf .gbo .mat .pyi .vala
.bib .gbp .mata .pyp .vapi
.bison .gbr .matah .pyt .vark
.blade .gbs .mathematica .pytb .vb
.blade.php .gclient .matlab .pyw .vba
.bmp .gco .mawk .pyx .vbhtml
.bmx .gcode .maxhelp .qbs .vbs
.boo .gd .maxpat .qml .vcl
.boot .gdb .maxproj .r .veo
.brd .gdbinit .mcr .r2 .vert
.bro .gemrc .md .r3 .vh
.brs .gemspec .mdown .rabl .vhd
.bsl .geo .mdwn .rake .vhdl
.bsv .geojson .me .raml .vhf
.builder .geom .mediawiki .raw .vhi
.bzl .gf .meta .rb .vho
.c .gi .metadata .rbbas .vhost
.c++ .gif .metal .rbfrm .vhs
.c++-objdump .gitconfig .minid .rbmnu .vht
.c++objdump .gko .mir .rbres .vhw
.c-objdump .glf .mirah .rbtbar .view.lkml
.cake .glsl .mk .rbuild .vim
.cbl .glslv .mkd .rbuistate .vimrc
.cbx .gltf .mkdn .rbw .viper
.cc .gml .mkdown .rbx .volt
.ccp .gms .mkfile .rbxs .vrx
.cdf .gn .mkii .rd .vsh
.cdr .gni .mkiv .rdoc .vshader
.ceylon .gnu .mkvi .re .vue
.cfc .gnuplot .ml .reb .vw
.cfg .gnus .ml4 .rebol .w
.cfm .go .mli .red .wast
.cfml .god .mll .reds .wat
.cginc .golo .mly .reek .watchr
.ch .gp .mmk .regex .wdl
.chem .gpb .mms .regexp .webapp
.chpl .gpt .mo .rei .webidl
.chs .gql .mod .rest .webmanifest
.cirru .grace .model.lkml .rest.txt .weechatlog
.cjsx .gradle .monkey .rex .wiki
.ck .gradlew .monkey2 .rexx .wisp
.cl .graphql .moo .rg .wl
.cl2 .groovy .moon .rhn .wlt
.clang-format .grt .mq4 .rhtml .wlua
.clang-tidy .gshader .mq5 .ring .wmf
.click .gsp .mqh .rkt .workbook
.clj .gst .ms .rktd .wsgi
.cljc .gsx .mspec .rktl .x
.cljs .gtl .mss .rl .x10
.cljs.hl .gto .mt .rmd .xc
.cljscm .gtp .mtl .rnh .xcompose
.cljx .gtpl .mtml .rno .xht
.clp .gts .mu .robot .xhtml
.cls .gv .muf .roff .xi
.clw .gvimrc .mustache .ron .xm
.cmake .gvy .mxt .rpy .xml
  .gyp .mysql .rq .xojo_code
.cmd .gypi .myt .rs .xojo_menu
.cob .h .n .xojo_report
.cobol .haml .nasm .rsc .xojo_script
.coffee .handlebars .nawk .rsh .xojo_toolbar
.com .hats .nb .rst .xojo_window
.command .hb .nbp .rst.txt .xpl
.conll .hbs .nc .rsx .xpm
.conllu .hcl .ncl .ru .xproc
.coq .hh .ne .ruby .xpy
.cp .hic .nearley .rviz .xq
.cpp .hl .nf .s .xql
.cpp-objdump .hlean .nginxconf .sage .xqm
.cppobjdump .hlsl .ni .sagews .xquery
.cps .hlsli .nim .sas .xqy
.cpy .hpp .nimrod .sass .xrl
.cr .hqf .ninja .sats .xs
.creole .hrl .nit .sbt .xsl
.cs .hs .nix .sc .xslt
.csd .hsc .njk .scad .xsp-config
.csh .htaccess .nl .scala .xsp.metadata
.cshrc .htm .nlogo .scaml .xtend
.cshtml .html .no .scd .y
.cson .htmlhintrc .nqp .sce .yacc
.css .http .nr .sci .yaml
.csv .hx .nse .scm .yaml-tmlanguage
.csx .hxsl .nsh .sco .yang
.cu .hxx .nsi .scpt .yap
.cuh .hy .nu .scrbl .yar
.cw .i7x .numpy .scss .yara
.cwl .iced .numpyw .self .yml
.cxx .icl .numsc .sexp .yrl
.cxx-objdump .idr .nut .sfd .yy
.cy .ihlp .nvimrc .sh .zep
.d .ijs .ny .sh-session .zimpl
.dae .ik .obj .sig .zlogin
.darcspatch .ily .objdump .sj .zlogout
.dart .in .ol .sl .zmpl
.dats .ini .omgrofl .sld .zone
.dcl .inl .ooc .slim .zpl
.decls .ino .opa .sls .zprofile
.deface .ins .opal .sma .zsh
.desktop .intr .opencl .smali .zshenv
  .io .orc .sml .zshrc
.dfm .iol .org .smt  

Gitential Team