Wandering Thoughts archives

2019-08-30

How I'm dealing with my Python indentation problem in GNU Emacs

The current (cultural) standard for indentation in Python is four space indent levels and indenting only with spaces, never tabs; this is what GNU Emacs' python mode defaults to and what YAPF and other code formatters use. Our new and updated Python 3 code is written in this official standard, as is some relatively recent Python 2 code. However, I spent a very long time writing Python code using 8-space indent levels and tab-based indentation, which means that I have a great deal of existing Python code in this style, including almost all of our existing Python 2 code at work and all of DWiki. For various reasons I don't want to reformat or reindent all of this code, so I want to work on existing code in its current style, whatever that is. Because Python 3 doesn't like it when you mix spaces and tabs, this should include the use of tabs in indentation.
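As a small illustration of the Python 3 restriction, here is a minimal sketch that asks Python 3 itself whether a piece of source mixes tabs and spaces inconsistently. The helper name mixes_tabs_and_spaces and the sample strings are made up for this example and assume the code is run under Python 3.

```python
def mixes_tabs_and_spaces(source):
    """Return True if Python 3 rejects `source` with a TabError.

    Hypothetical helper, for illustration only.
    """
    try:
        compile(source, "<example>", "exec")
    except TabError:
        return True
    return False

# One block whose first line is indented with a tab and whose second
# line is indented with eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"
# The same block indented consistently with tabs.
tabs_only = "if True:\n\tx = 1\n\ty = 2\n"

print(mixes_tabs_and_spaces(mixed))      # True
print(mixes_tabs_and_spaces(tabs_only))  # False
```

This is why editing a tab-indented file with a spaces-only editor setup is risky: a single new line indented with spaces can make the whole file fail to compile.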

My existing .emacs settings for this across various different systems were basically an inconsistent mess. On my desktop I was reflexively clinging to my old indentation style with various python mode settings; on our Ubuntu login servers, I'd stopped overriding the python mode defaults due to shifting toward the standard style, but that left me with the tab problem. Today, as part of dealing with my .emacs in general, I decided that I wanted to have the same .emacs everywhere, and that drove me to actively work out a solution.

First, I realized that if I was willing to really commit to shifting my indentation style to the standard one, the only real problem I had was with tabs. GNU Emacs's python mode will automatically detect the current indentation level for existing Python code, and for new files I'll use 4-space indents with spaces no matter what style the other files in the project use. For tabs, I want to continue using tabs if and only if the file is already in my old 8-space tab-based indentation style, so the only problem is detecting this.

As far as I can tell there are no existing GNU Emacs features or functions to do this, so I wrote some ELisp to be run as a python-mode hook (which means it happens on a per-file basis). I won't claim it's very good ELisp, but here it is:

(defun cks/leading-tabs-p ()
  "Detect if the current buffer has a line with leading tab(s)."
  (save-excursion
    (save-restriction
      (widen)
      (goto-char (point-min))
      (if (re-search-forward "^\t+" nil t)
          t
        nil))))

(add-hook 'python-mode-hook
          (lambda ()
            (if (and (= python-indent-offset 8) (cks/leading-tabs-p))
                (setq indent-tabs-mode t))))

The detection of tab-indented lines here is highly imperfect and can be fooled by all sorts of things, but for my purposes it's good enough; misfires are unlikely in practice. I'm not sure I even have any Python code that uses 8-space indentation but without tabs.
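To show one way the check can be fooled, here is a rough Python analogue of the same test (any line that starts with one or more tabs). It is only a sketch of the idea, not a translation of the ELisp above, and the names LEADING_TABS and has_leading_tabs are hypothetical.

```python
import re

# Same idea as the ELisp regexp "^\t+": a line that begins with tab(s).
LEADING_TABS = re.compile(r"^\t+", re.MULTILINE)

def has_leading_tabs(source):
    """Return True if any line of `source` starts with a tab."""
    return LEADING_TABS.search(source) is not None

tab_indented = "def f():\n\treturn 1\n"
space_indented = "def f():\n    return 1\n"
# Space-indented code with a tab at the start of a line inside a string.
fooled = 'def f():\n    return """\n\tnot indentation\n"""\n'

print(has_leading_tabs(tab_indented))    # True
print(has_leading_tabs(space_indented))  # False
print(has_leading_tabs(fooled))          # True, although the code uses spaces
```

The last case is a false positive: the tab is inside a multi-line string, not in the code's indentation.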

(The start of cks/leading-tabs-p is copied directly from the python-mode function that scans the buffer to determine the indentation level it currently uses. The function naming is superstition based on what I've seen around the Internet.)

I also decided to write some ELisp functions to toggle back and forth between the modern style and my old style and to report the indentation state of a buffer:

(defun cks/python-toggle ()
  "Toggle between old-style Python 2 and modern Python 3 settings."
  (interactive)
  (if (= python-indent-offset 8)
      (progn (setq indent-tabs-mode nil) (setq python-indent-offset 4)
             (message "Set to modern Python 3 (4-level spaces)"))
    (progn (setq indent-tabs-mode t) (setq python-indent-offset 8)
           (message "Set to ancient Python 2 with tabs"))))

(defun cks/rep-python ()
  "Report the Python indentation status of the current buffer."
  (interactive)
  (message "Python indentation is %d-space indents with %s %s" python-indent-offset
           (if (eq indent-tabs-mode t) "tabs" "spaces only")
           (cond ((and (= python-indent-offset 4) (eq indent-tabs-mode nil))
                  "(Python 3 standard)")
                 ((and (= python-indent-offset 8) (eq indent-tabs-mode t))
                  "(my Python 2 style)")
                 (t "(something weird)"))))

It's deliberate that after cks/python-toggle, I'm in one or the other of my standard indentation styles, even if the buffer started out in some weird style.

PS: Both python-indent-offset and indent-tabs-mode are buffer-local variables by the time I get my hands on them, so I can just directly use setq and so on. There may be a better way to do this these days, but my ELisp knowledge is old and rusty.

EmacsPythonIndentation written at 00:13:43

2019-08-18

Early notes on using LSP-based editing in GNU Emacs for Python

The two languages that I most use GNU Emacs for these days are Go and Python. After I got LSP-based editing working for Go, I decided to take a run at getting it to work for Python as well. Python is one of the languages that lsp-mode supports, through pyls, so I was hoping that it would be an install and go experience. The reality was not quite so smooth and I've wound up with some open questions and uncertainties.

As I usually do with pip-based install instructions, I opted to use 'pip install --user', which puts the resulting programs in ~/.local/bin. Since this isn't on my regular $PATH, I had to arrange for GNU Emacs to be able to see the pyls program before lsp-mode could do anything. Once it did, warnings popped up all over the Python code that I tried it out on, because I'd installed it as 'python-language-server[all]', which installs all linters and checkers. I must regretfully report that my code is not at all clean to all of them; for example, I frequently use short variable names that are all in lower case. After poking at this a bit I decided that I didn't want any linters right now. Some of the linters apparently could be disabled by 'pip uninstall', but others have standard Ubuntu versions and it's not clear how to tell lsp-mode to tell pyls to turn them off, and anyway some of them may be used to detect outright syntax errors, which I would like flagged.

Talking of syntax errors brings up the next issue, which is Python 2 versus Python 3. While we're moving towards Python 3, we still have plenty of Python 2 code, and so I would like an LSP-based setup that works smoothly with both. Unfortunately, as far as I can see pyls is at least partially specific to the version of Python you install it for. I actually used pip3 to install the Python 3 versions of things (since that's our future and seems the right choice if I have to pick one). This still seems to at least partially work for some test Python 2 code, in that simple navigation works, but various syntax warnings and so on appear and there may be other LSP things that don't.

(As far as I can tell, pyls has no particular provisions for picking Python versions, which is not surprising. Some things I've read suggest that most people who have to deal with this use per-project virtualenvs, and Python 2 projects would then have the Python 2 version of pyls installed in their virtualenv. Manually starting GNU Emacs with a $PATH that finds the Python 2 version of pyls first does seem to work right, and I may be able to partially automate this with a frontend script for pyls that tries to figure out which Python version is more likely for the current context.)
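One crude heuristic such a frontend script could use is sketched below. It is purely an assumption for illustration, not something pyls or lsp-mode provides: it assumes the script itself runs under Python 3 and treats source that fails to compile there as probably Python 2. The function name probably_python2 is hypothetical.

```python
def probably_python2(source):
    """Guess that `source` is Python 2 if Python 3 cannot compile it.

    Hypothetical heuristic; assumes this runs under Python 3.
    """
    try:
        compile(source, "<guess>", "exec")
    except SyntaxError:
        return True
    return False

print(probably_python2('print "hello"\n'))   # True
print(probably_python2('print("hello")\n'))  # False
```

The guess is one-sided. Plenty of Python 2 code compiles cleanly as Python 3, and a Python 3 file with a genuine syntax error would be misclassified, so a real script would need more context than this.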

All of this makes me fairly uncertain about whether lsp-mode is currently worth it for my Python programming. It does give me nice things like completions, but it's probably not going to be a set and forget thing the way it is for Go. Probably I'm going to be shaving more yaks before I have clear answers.

(There are various writeups on the net of using Python with lsp-mode but they seem to mostly come from people who already know a lot of Emacs, which is not me these days. Reading them and flailing away at my .emacs has been a humbling experience.)

PS: As usual, writing this entry pushed me to go further, try more things, and do more experimentation than I had at the start, which is a good thing.

PythonEmacsLSPNotes written at 22:56:47

2019-08-17

A situation where Python has undefined values

In most of Python, either a name has a value or it doesn't exist and attempts to access it will fail with some variation of 'that's not defined'. You get NameError for globals and AttributeError for attributes of objects, classes, and interestingly also for modules. Similarly, accessing a nonexistent key in a dictionary gets you a KeyError, also saying that 'this doesn't exist'.

(This means that code inside a module gets a different error for a nonexistent module variable than code outside it. I think this is just an artifact of how the name is accessed.)
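The different errors can be seen side by side in a short sketch. The module example_mod, the class Thing, and the helper error_name are all made up for the example.

```python
import types

def error_name(thunk):
    """Call thunk() and return the name of the exception it raises, if any."""
    try:
        thunk()
    except Exception as exc:
        return type(exc).__name__
    return None

class Thing:
    pass

# A hypothetical module with one function that reads a module-level
# name that was never defined.
mod = types.ModuleType("example_mod")
exec("def peek():\n    return missing\n", mod.__dict__)

def read_missing_global():
    return no_such_global  # never defined anywhere

results = {
    "global": error_name(read_missing_global),
    "instance attribute": error_name(lambda: Thing().missing),
    "class attribute": error_name(lambda: Thing.missing),
    "dict key": error_name(lambda: {}["missing"]),
    "module variable, from outside": error_name(lambda: mod.missing),
    "module variable, from inside": error_name(mod.peek),
}
for what, err in results.items():
    print(what, "->", err)
```

The last two entries are the same missing module variable; it is an AttributeError when reached as mod.missing from outside and a NameError when code inside the module uses the bare name.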

But local variables in functions are different and special:

>>> def afunc():
...   print(a)
...   a = 10
... 
>>> afunc()
[...]
UnboundLocalError: local variable 'a' referenced before assignment

When we do the print(), the name a exists as a local variable (at least in some sense), but its value is undefined (and accessing it is an error) instead of being, say, None. If a were not a local variable at all, we would either get some variant of 'name not defined' or access a global a if one existed.

(I say that a exists in some sense because it doesn't fully exist; for example, it is not in the dictionary that locals() will return.)
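A minimal sketch of this half-existence, using a variant of afunc written for the example: the compiler already lists a among the function's local variable names, but locals() does not report it until a value has been bound.

```python
def afunc():
    before = dict(locals())  # copy taken before anything is assigned
    a = 10
    after = dict(locals())   # copy taken after 'a' has a value
    return before, after

before, after = afunc()
print("a" in before)                      # False
print(after["a"])                         # 10
print("a" in afunc.__code__.co_varnames)  # True
```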

At one level this is a straightforward consequence of how local variables are implemented in CPython. All references to local variables within a function use the same fast access method, whether or not a value has been bound to the local variable. When no value has been set, you get an error.
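One way to look at this is to disassemble the function with the standard dis module and pick out the instructions that touch a. This is only a sketch; the exact opcode names are a CPython implementation detail and differ between versions (newer versions may show a checked variant such as LOAD_FAST_CHECK for the read).

```python
import dis

def afunc():
    print(a)
    a = 10

# Collect the opcodes of every instruction whose argument is the name 'a'.
ops_on_a = [ins.opname for ins in dis.get_instructions(afunc)
            if ins.argval == "a"]
print(ops_on_a)
```

Both the read in print(a) and the later assignment show up as 'fast' local variable operations; the read is never compiled as a global or general name lookup.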

At another level, this is a sensible language design decision regardless of the specifics of the implementation. Python has decided that it has lexically scoped local variables, and this opens up the possibility of accessing a local variable before it's had a value set (unlike globals and attributes). When this happens, you have three choices: you can invent an arbitrary 'unset' value, such as None, you can generate a 'name does not exist' error, or you can generate a unique error. Python doesn't have zero values in the way that a language like Go does (fundamentally because the meaning of variables is different in the two languages), so the first choice would be unusual. The second choice would be a confusing pretense, because the name actually does exist and is in fact blocking you from accessing a global version of the name. That leaves the third choice of a unique error, which is at least clear even if it's unusual.
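The 'blocking' effect can be shown with a short sketch. The function names are made up for the example; the global a stands in for any module-level name.

```python
a = "global value"

def reads_global():
    return a           # no assignment to 'a' in this function, so it's the global

def shadows_global():
    try:
        return a       # 'a' is local here because of the assignment below
    except UnboundLocalError as exc:
        return type(exc).__name__
    a = "local value"  # never reached, but still makes 'a' a local name

print(reads_global())    # global value
print(shadows_global())  # UnboundLocalError
print(issubclass(UnboundLocalError, NameError))  # True
```

The mere presence of an assignment anywhere in the function is enough to hide the global. Python's unique error is also still a kind of 'name not defined' error, since UnboundLocalError is a subclass of NameError.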

(This sprang from a Twitter thread.)

UndefinedLocalVariables written at 23:31:36



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.