Emacs, forcing semantic to parse all source files under a directory

Elisp code that forces Emacs's Semantic package to parse all source files matching a particular regular expression under a given directory.

Note: The setup detailed in this post is a bit outdated. You can find a more interesting C/C++ configuration for Emacs in this blog post.

Long story short, I have been using CEDET for my C/C++ development in Emacs. The most awesome part of CEDET is Semantic, which among other things can act as a lexical analyzer and preprocessor, a database of tags, a parser and a parser generator. For a very nice intro to Semantic and CEDET in general, I would suggest "A Gentle Introduction to CEDET" by Alex Ott.

But for all its usefulness, it has some annoying quirks. You can specify a project directory and have CEDET parse all header files by using EDE and (ede-cpp-root …) and/or (global-ede-mode t). This, in conjunction with the various semantic options, can successfully parse all header files under your project’s directories most of the time. The problem is that it totally ignores source files unless you explicitly open them yourself. This screws up operations like (semantic-analyze-proto-impl-toggle), which toggles between the declaration and the implementation of a function, and others that need tags from the implementation files.

My elisp skills are not that great, since I only use it for my Emacs needs, but I have devised the following solution: an elisp function that walks all the files under a given directory, picks the ones that match a regular expression and feeds them to semantic. Here is the elisp file of the implementation along with some helper functions:

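;; The trailing \\' anchors the match at the end of the file name, so that
;; names such as foo.cache or foo.html are not picked up by accident.
(defvar c-files-regex ".*\\.\\(c\\|cpp\\|h\\|hpp\\)\\'"
  "A regular expression to match any C/C++ related files under a directory.")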
 
(defun my-semantic-parse-dir (root regex)
  "Force semantic to parse all source files under a root directory.
Arguments:
-- ROOT: The full path to the root directory
-- REGEX: A regular expression against which to match all files in the directory"
  (let* (;; make sure that root has a trailing slash and is a dir
         (root (file-name-as-directory root))
         ;; full paths of all entries, without current dir and parent dir
         (files (directory-files root t directory-files-no-dot-files-regexp)))
    ;; remove any known version control directories
    (setq files (delete (concat root ".git") files))
    (setq files (delete (concat root ".hg") files))
    ;; `dolist' binds file locally instead of setting a global variable
    (dolist (file files)
      (if (file-accessible-directory-p file)
          ;; if it's a directory, recurse into it
          (my-semantic-parse-dir file regex)
        ;; else if it's a file that matches the regex we seek
        (when (string-match-p regex file)
          (save-excursion
            (semanticdb-file-table-object file)))))))
 
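(defun my-semantic-parse-current-dir (regex)
  "Parse all files under the current directory matching REGEX."
  ;; fall back to `default-directory' when the buffer visits no file
  (my-semantic-parse-dir (if buffer-file-name
                             (file-name-directory buffer-file-name)
                           default-directory)
                         regex))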
 
(defun lk-parse-curdir-c ()
  "Parse all the C/C++ related files under the current directory.
Their data is fed into semantic."
  (interactive)
  (my-semantic-parse-current-dir c-files-regex))
 
(defun lk-parse-dir-c (dir)
  "Prompt the user for a directory and parse all C/C++ related files under it."
  (interactive (list (read-directory-name "Provide the directory to search in: ")))
  (my-semantic-parse-dir (expand-file-name dir) c-files-regex))
 
(provide 'lk-file-search)

The function is quite simple to use. You can either call it directly from your own elisp code like so:
(my-semantic-parse-dir "path/to/dir/root/" ".*regex")
or press M-x lk-parse-curdir-c from a buffer to recursively scan all C/C++ related files under the directory of the file that buffer is visiting. Alternatively, press M-x lk-parse-dir-c, which prompts you for the directory to scan.
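As a rough sketch of how this could be wired into an init file: the snippet below assumes the code above has been saved as lk-file-search.el (matching its provide form) in a directory on the load path. The directory and both key bindings are placeholders of mine, not something the code above requires.

```elisp
;; Assumption: lk-file-search.el lives in this directory; adjust as needed.
(add-to-list 'load-path (expand-file-name "lisp" user-emacs-directory))
(require 'lk-file-search)

;; Hypothetical key bindings, active only in C-like buffers.
(add-hook 'c-mode-common-hook
          (lambda ()
            ;; parse everything under the directory of the visited file
            (local-set-key (kbd "C-c s") #'lk-parse-curdir-c)
            ;; prompt for a directory and parse everything under it
            (local-set-key (kbd "C-c S") #'lk-parse-dir-c)))
```

Binding the commands in c-mode-common-hook keeps them out of buffers where they make no sense; a global binding would work just as well.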

Needless to say, you can fine-tune which files get picked up by playing with the elisp regular expression at the top of the file or by adding your own.
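As an illustration of such tuning, here is a sketch of two alternative patterns. Both variable names are hypothetical and not part of the file above. The first adds a few more common C++ extensions; the second matches paths whose last component has no dot at all, which would cover suffixless headers such as algorithm but also files like Makefile or README, so it is probably only useful on include directories.

```elisp
;; Hypothetical names; pass either variable as the REGEX argument.
(defvar my-c-files-regex-extended
  "\\.\\(c\\|cc\\|cpp\\|cxx\\|h\\|hh\\|hpp\\|hxx\\)\\'"
  "Match C/C++ files by extension, anchored at the end of the file name.")

(defvar my-suffixless-regex
  "/[^./]+\\'"
  "Match full paths whose last component contains no dot.")

;; Quick checks that can be evaluated with C-x C-e:
(string-match-p my-c-files-regex-extended "/src/foo.cc")          ; non-nil
(string-match-p my-c-files-regex-extended "/src/foo.cache")       ; nil
(string-match-p my-suffixless-regex "/usr/include/c++/algorithm") ; non-nil
(string-match-p my-suffixless-regex "/src/foo.c")                 ; nil
```

Since the parsing function matches the regex against the full path of each file, the patterns are anchored at the end of the string with \\' rather than at the beginning.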

The above function has the advantage of not needing to visit any buffers. It simply feeds files to semantic. There is another implementation, let’s call it version 2, that is much slower but uses the same function that semantic would call in order to fetch the tags of the current buffer. I don’t think that it’s needed, but for completeness’ sake I am adding it to the post, since this is how I started experimenting with this feature.

(defun my-semantic-parse-dir-v2 (root regex)
  "Force semantic to parse all source files under a root directory.
Unlike the first version, every matching file is opened in a buffer.
Arguments:
-- ROOT: The full path to the root directory
-- REGEX: A regular expression against which to match all files in the directory"
  (let* (;; make sure that root has a trailing slash and is a dir
         (root (file-name-as-directory root))
         ;; full paths of all entries, without current dir and parent dir
         (files (directory-files root t directory-files-no-dot-files-regexp)))
    ;; remove any known version control directories
    (setq files (delete (concat root ".git") files))
    (setq files (delete (concat root ".hg") files))
    (dolist (file files)
      (if (file-accessible-directory-p file)
          ;; if it's a directory, recurse using this same version
          (my-semantic-parse-dir-v2 file regex)
        ;; else if it's a file that matches the regex we seek
        (when (string-match-p regex file)
          ;; get the buffer if it is open, or open it if it's not
          (let* ((existing (get-file-buffer file))
                 (buff (or existing (find-file-noselect file))))
            ;; parse the buffer with semantic
            (with-current-buffer buff
              (semantic-fetch-tags))
            ;; if we opened the buffer, clean up before proceeding
            (unless existing
              (kill-buffer buff))))))))

It’s easy to see that this function is much more complicated and slower. The reason is that (semantic-fetch-tags) only works on the current buffer, which means we have to open all matching files as buffers, visit them, fetch the tags and then clean up.
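The open, visit, fetch and clean up cycle can also be isolated in a small helper. This is only a sketch: my-call-in-file-buffer is a hypothetical name of mine, not something semantic provides, and it assumes that killing the buffers it opened itself is always what you want.

```elisp
;; Hypothetical helper; not part of semantic or of the code above.
(defun my-call-in-file-buffer (file fn)
  "Call FN with the buffer visiting FILE current and return FN's result.
If FILE was not already open, kill the buffer that was created for it."
  (let* ((existing (get-file-buffer file))
         (buffer (or existing (find-file-noselect file))))
    (unwind-protect
        (with-current-buffer buffer
          (funcall fn))
      ;; only clean up buffers that this call opened itself,
      ;; even if FN signalled an error
      (unless existing
        (kill-buffer buffer)))))

;; Usage, assuming semantic is loaded and enabled for the file's mode:
;; (my-call-in-file-buffer "/path/to/file.c" #'semantic-fetch-tags)
```

Using with-current-buffer instead of switching windows keeps the window layout untouched while the files are being processed.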

So for anyone having the same problem, I would suggest using the first version I showed in this blog post. As I stated at the beginning of the post, I am no elisp expert, and as such I would greatly appreciate any and all comments and suggestions to improve the code and its interface with semantic.

5 thoughts on “Emacs, forcing semantic to parse all source files under a directory”

  1. Thanks a lot! I’ve been using CEDET/Semantic/Company for awhile now but it always bugged me that it didn’t find all include files. And I don’t find it a good solution to use EDE projects because I have loads of projects. So your implementation works very well. 🙂

  2. Thanks, very useful!!
    What would the elisp regular expression be to add the suffixless files of the STL (algorithm, etc.)?

  3. Just drop the suffix part and instead use the < and > as start and end point of the regex.
