2019-08-30
How I'm dealing with my Python indentation problem in GNU Emacs
The current (cultural) standard for indentation in Python is four space indent levels and indenting only with spaces, never tabs; this is what GNU Emacs' python mode defaults to and what YAPF and other code formatters use. Our new and updated Python 3 code is written in this official standard, as is some relatively recent Python 2 code. However, I spent a very long time writing Python code using 8-space indent levels and tab-based indentation, which means that I have a great deal of existing Python code in this style, including almost all of our existing Python 2 code at work and all of DWiki. For various reasons I don't want to reformat or reindent all of this code, so I want to work on existing code in its current style, whatever that is. Because Python 3 doesn't like it when you mix spaces and tabs, this should include the use of tabs in indentation.
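To make the tab problem concrete, here is a minimal sketch of Python 3 rejecting a block where one line is indented with eight spaces and the next with a tab. The helper name `mixes_tabs_and_spaces` is made up for this illustration.

```python
def mixes_tabs_and_spaces(source):
    """Return True if Python 3 refuses `source` with a TabError."""
    try:
        compile(source, "<example>", "exec")
    except TabError:
        return True
    return False

# One line indented with eight spaces, the next with a single tab.
mixed = "if True:\n        x = 1\n\ty = 2\n"
# The same block indented only with tabs.
tabs_only = "if True:\n\tx = 1\n\ty = 2\n"

print(mixes_tabs_and_spaces(mixed))
print(mixes_tabs_and_spaces(tabs_only))
```

The first call should report True and the second False, which is why a tab-indented file needs to keep being edited with tabs.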
My existing .emacs settings for this across various different systems were basically an inconsistent mess. On my desktop I was reflexively clinging to my old indentation style with various python mode settings; on our Ubuntu login servers, I'd stopped overriding the python mode defaults due to shifting toward the standard style, but that left me with the tab problem.
Today, as part of dealing with my .emacs in general, I decided that I wanted to have the same .emacs everywhere, and that drove me to actively work out a solution.
First, I realized that if I was willing to really commit to shifting my indentation style to the standard one, the only real problem I had was with tabs. GNU Emacs's python mode will automatically detect the current indentation level for existing Python code, and for new files I'll use 4-space indents with spaces no matter what style the other files in the project use. For tabs, I want to continue using tabs if and only if the file is already in my old 8-space tab-based indentation style, so the only problem is detecting this.
As far as I can tell there are no existing GNU Emacs features or functions to do this, so I wrote some ELisp to be run as a python-mode hook (which means it happens on a per-file basis). I won't claim it's very good ELisp, but here it is:
    (defun cks/leading-tabs-p ()
      "Detect if the current buffer has a line with leading tab(s)."
      (save-excursion
        (save-restriction
          (widen)
          (goto-char (point-min))
          (if (re-search-forward "^\t+" nil t) t nil))))

    (add-hook 'python-mode-hook
              (lambda ()
                (if (and (= python-indent-offset 8)
                         (cks/leading-tabs-p))
                    (setq indent-tabs-mode t))))
The detection of tab-indented lines here is highly imperfect and can be fooled by all sorts of things, but for my purposes it's good enough; misfires are unlikely in practice. I'm not sure I even have any Python code that uses 8-space indentation but without tabs.
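As a rough illustration of how the check can be fooled, here is the same heuristic transcribed into Python. This is only a sketch; the name `has_leading_tabs` and the sample strings are invented for the example, and the real check is the ELisp one.

```python
import re

# Same idea as the ELisp regexp: one or more tabs at the start of any line.
LEADING_TABS = re.compile(r"^\t+", re.MULTILINE)

def has_leading_tabs(text):
    """Return True if any line in `text` starts with a tab."""
    return LEADING_TABS.search(text) is not None

tab_indented = "def f():\n\treturn 1\n"
space_indented = "def f():\n    return 1\n"
# Space-indented code whose triple-quoted string contains a line starting with a tab.
fooled = 'def f():\n    return """\n\tnot indentation\n"""\n'

print(has_leading_tabs(tab_indented))
print(has_leading_tabs(space_indented))
print(has_leading_tabs(fooled))
```

The third sample is indented purely with spaces but still matches, because the pattern has no idea that the tab sits inside a string literal. In the ELisp version the extra requirement that the detected indent offset is 8 filters out many such cases.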
(The start of cks/leading-tabs-p is copied directly from the python-mode function that scans the buffer to determine the indentation level it currently uses. The function naming is superstition based on what I've seen around the Internet.)
I also decided to write some ELisp functions to toggle back and forth between the modern style and my old style and to report the indentation state of a buffer:
    (defun cks/python-toggle ()
      "Toggle between old-style Python 2 and modern Python 3 settings."
      (interactive)
      (if (= python-indent-offset 8)
          (progn
            (setq indent-tabs-mode nil)
            (setq python-indent-offset 4)
            (message "Set to modern Python 3 (4-level spaces)"))
        (progn
          (setq indent-tabs-mode t)
          (setq python-indent-offset 8)
          (message "Set to ancient Python 2 with tabs"))))

    (defun cks/rep-python ()
      "Report the Python indentation status of the current buffer."
      (interactive)
      (message "Python indentation is %d-space indents with %s %s"
               python-indent-offset
               (if (eq indent-tabs-mode t) "tabs" "spaces only")
               (cond
                ((and (= python-indent-offset 4) (eq indent-tabs-mode nil))
                 "(Python 3 standard)")
                ((and (= python-indent-offset 8) (eq indent-tabs-mode t))
                 "(my Python 2 style)")
                (t "(something weird)"))))
It's deliberate that after cks/python-toggle, I'm in one or the other of my standard indentation styles, even if the buffer started out in some weird style.
PS: Both python-indent-offset and indent-tabs-mode are buffer-local variables by the time I get my hands on them, so I can just directly use setq and so on. There may be a better way to do this these days, but my ELisp knowledge is old and rusty.
2019-08-18
Early notes on using LSP-based editing in GNU Emacs for Python
The two languages that I most use GNU Emacs for these days are Go and Python. After I got LSP-based editing working for Go, I decided to take a run at getting it to work for Python as well. Python is one of the languages that lsp-mode supports, through pyls, so I was hoping that it would be an install and go experience. The reality was not quite so smooth and I've wound up with some open questions and uncertainties.
As I usually do with pip-based install instructions, I opted to use 'pip install --user', which puts the resulting programs in ~/.local/bin. Since this isn't on my regular $PATH, I had to arrange for GNU Emacs to be able to see the pyls program before lsp-mode could do anything. Once it did, warnings popped up all over the Python code that I tried it out on, because I'd installed it as 'python-language-server[all]', which installs all linters and checkers. I must regretfully report that my code is not at all clean to all of them; for example, I frequently use short variable names that are all in lower case. After poking at this a bit I decided that I didn't want any linters right now. Some of the linters apparently could be disabled by 'pip uninstall', but others have standard Ubuntu versions and it's not clear how to tell lsp-mode to tell pyls to turn them off, and anyway some of them may be used to detect outright syntax errors, which I would like flagged.
Talking of syntax errors brings up the next issue, which is Python 2 versus Python 3. While we're moving towards Python 3, we still have plenty of Python 2 code, and so I would like an LSP-based setup that works smoothly with both. Unfortunately, as far as I can see pyls is at least partially specific to the version of Python you install it for. I actually used pip3 to install the Python 3 versions of things (since that's our future and seems the right choice if I have to pick one). This still seems to at least partially work for some test Python 2 code, in that simple navigation works, but various syntax warnings and so on appear and there may be other LSP things that don't work.
(As far as I can tell, pyls has no particular provisions for picking Python versions, which is not surprising. Some things I've read suggest that most people who have to deal with this use per-project virtualenvs, and Python 2 projects would then have the Python 2 version of pyls installed in their virtualenv. Manually starting GNU Emacs with a $PATH that finds the Python 2 version of pyls first does seem to work right, and I may be able to partially automate this with a frontend script for pyls that tries to figure out which Python version is more likely for the current context.)
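One conceivable heuristic for such a frontend script is to look at a file's '#!' line. The sketch below is entirely hypothetical: the function `guess_python_major` is invented here, treating a bare 'python' as Python 2 is an assumption about the local systems, and how a frontend would learn which file or project is involved is left open.

```python
import re

def guess_python_major(source_text, default=3):
    """Guess 2 or 3 from a '#!' line, falling back to `default`."""
    first_line = source_text.split("\n", 1)[0]
    if not first_line.startswith("#!"):
        return default
    match = re.search(r"python(\d)?", first_line)
    if match is None:
        return default
    if match.group(1) is None:
        # Assumption: an unversioned 'python' means Python 2 on these machines.
        return 2
    return int(match.group(1))

print(guess_python_major("#!/usr/bin/python3\nprint('hi')\n"))
print(guess_python_major("#!/usr/bin/env python\nprint 'hi'\n"))
print(guess_python_major("import sys\n"))
```

A real script would then need to run whichever pyls matches the guess, for instance by adjusting $PATH before starting it.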
All of this makes me fairly uncertain about whether lsp-mode is currently worth it for my Python programming. It does give me nice things like completions, but it's probably not going to be a set and forget thing the way it is for Go. Probably I'm going to be shaving more yaks before I have clear answers.
(There are various writeups on the net of using Python with lsp-mode but they seem to mostly come from people who already know a lot of Emacs, which is not me these days. Reading them and flailing away at my .emacs has been a humbling experience.)
PS: As usual, writing this entry pushed me to go further, try more things, and do more experimentation than I had at the start, which is a good thing.
2019-08-17
A situation where Python has undefined values
In most of Python, either a name has a value or it doesn't exist and attempts to access it will fail with some variation of 'that's not defined'. You get NameError for globals and AttributeError for attributes of objects, classes, and interestingly also for modules. Similarly, accessing a nonexistent key in a dictionary gets you a KeyError, also saying that 'this doesn't exist'.
(This means that code inside a module gets a different error for a nonexistent module variable than code outside it. I think this is just an artifact of how the name is accessed.)
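A small sketch of the inside-versus-outside difference, using a throwaway module built with `types.ModuleType`. The module name `example_mod`, the name `missing_name`, and the helper `error_name` are all made up for the illustration.

```python
import types

# Build a throwaway module whose function refers to a name that is never defined.
mod = types.ModuleType("example_mod")
exec("def inside():\n    return missing_name\n", mod.__dict__)

def error_name(func):
    """Call `func` and return the name of the exception it raises, if any."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__
    return None

from_inside = error_name(mod.inside)                  # global lookup inside the module
from_outside = error_name(lambda: mod.missing_name)   # attribute lookup from outside
from_dict = error_name(lambda: {}["missing_name"])    # dictionary lookup

print(from_inside, from_outside, from_dict)
```

The same missing module variable shows up as a NameError from code inside the module and as an AttributeError from code outside it, with KeyError for the plain dictionary case.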
But local variables in functions are different and special:
    >>> def afunc():
    ...     print(a)
    ...     a = 10
    ...
    >>> afunc()
    [...]
    UnboundLocalError: local variable 'a' referenced before assignment
When we do the print(), the name a exists as a local variable (at least in some sense), but its value is undefined (and an error) instead of being, say, None. If a was not even a local variable, we should get either some variant of 'name not defined' or we'd access a global a if it existed.
(I say that a exists in some sense because it doesn't fully exist; for example, it is not in the dictionary that locals() will return.)
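The half-existence can be seen in a short sketch. In CPython, the function's code object already lists a among its local variable names (`co_varnames`), while locals() leaves it out until it has a value. The variable name `before` is just for the example.

```python
def afunc():
    # Copy locals() before anything in this function has been assigned.
    before = dict(locals())
    a = 10
    return before

snapshot = afunc()
print("a" in snapshot)                    # is 'a' in locals() before assignment?
print("a" in afunc.__code__.co_varnames)  # does the compiler consider 'a' a local?
```

The first check should come out False and the second True: the compiler has decided that a is local, but there is nothing for locals() to report yet.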
At one level this is a straightforward consequence of how local variables are implemented in CPython. All references to local variables within a function use the same fast access method, whether or not a value has been bound to the local variable. When no value has been set, you get an error.
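One way to peek at this in CPython is the `dis` module. The sketch below compares how the same name is loaded when it is a local and when it is a global. Exact opcode names differ between CPython versions, so it only looks at names starting with LOAD; the helper `load_ops` is invented for the example.

```python
import dis

def uses_local():
    print(a)   # 'a' is local here because of the assignment below
    a = 10

def uses_global():
    print(a)   # no assignment, so 'a' is looked up as a global

def load_ops(func, name):
    """Return the names of the LOAD-style opcodes in `func` that refer to `name`."""
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.argval == name and ins.opname.startswith("LOAD")
    ]

print(load_ops(uses_local, "a"))
print(load_ops(uses_global, "a"))
```

The local version is compiled to some member of the LOAD_FAST family even though no value has been bound yet at that point, while the global version uses LOAD_GLOBAL.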
At another level, this is a sensible language design decision regardless of the specifics of the implementation. Python has decided that it has lexically scoped local variables, and this opens up the possibility of accessing a local variable before it's had a value set (unlike globals and attributes). When this happens, you have three choices; you can invent an arbitrary 'unset' value, such as None, you can generate a 'name does not exist' error, or you can generate a unique error. Python doesn't have zero values in the way that a language like Go does (fundamentally because the meaning of variables is different in the two languages), so the first choice would be unusual. The second choice would be a confusing pretense, because the name actually does exist and is in fact blocking you from accessing a global version of the name. That leaves the third choice of a unique error, which is at least clear even if it's unusual.
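The blocking of a global can be shown with a small sketch. The function names and the values are made up; the point is only that an assignment anywhere in the function, even one that never runs, makes the name local throughout.

```python
a = "global value"

def reads_global():
    return a

def blocked():
    try:
        return a
    except UnboundLocalError as exc:
        return type(exc).__name__
    a = "local value"  # never runs, but still makes 'a' local to the whole function

print(reads_global())
print(blocked())
print(issubclass(UnboundLocalError, NameError))
```

The second function never sees the global a and reports UnboundLocalError instead. As a compromise of sorts, UnboundLocalError is a subclass of NameError, so code that catches NameError will catch this case too, while the message still makes it clear what actually happened.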
(This sprang from a Twitter thread.)