I recently signed up to stdlib-sig so I could nod in agreement with the people who suggested that the stdlib needs to evolve. In the discussions that ensued, the backwards compatibility argument came up often. I don't think it's a valid argument for this specific discussion, though. Here are my thoughts.

What is stdlib?

The standard library is a set of packages and modules that get shipped by default with the Python interpreter. Despite what I thought when I was starting with Python, it doesn't represent the so-called "best-practices" modules. There is no overarching design that dictates why things are the way they are. Instead, it just represents a set of modules that have been picked up at some point in time. Some of them are close to the language (such as collections, itertools and others) and some are domain-specific tools. The difference is significant.

Here is an attempt to divide the modules visible in the docs. Of course, I'm choosing based on personal preference, YMMV. Here we go:

Language

String Services:

string, re, struct, StringIO, cStringIO, codecs, unicodedata

Data types:

datetime, collections, heapq, bisect, array, sets, sched, mutex, queue, weakref, UserDict, UserList, UserString, types, new, copy, pprint, repr

Numeric and Mathematical Modules:

numbers, math, cmath, decimal, fractions, random, itertools, functools, operator

(why itertools, functools and operator are here is beyond me)

File and Directory Access:

os.path, stat, statvfs, filecmp, tempfile, glob, fnmatch, shutil

Data Persistence:

pickle, cPickle, copy_reg, shelve, marshal

Data Compression and Archiving:

zlib, gzip, bz2, zipfile, tarfile

  • this category can be argued to be both domain-specific and close to the language.

Cryptographic Services:

hashlib, hmac, md5, sha

Generic Operating System Services:

os, io, time, getpass, platform, errno, ctypes

Optional Operating System Services:

select, threading, thread, dummy_threading, dummy_thread, multiprocessing, mmap

Interprocess Communication and Networking:

subprocess, socket, ssl, signal, popen2, asyncore, asynchat

  • asyncore and asynchat can be said to be domain-specific, but async I/O is fundamental IMO

Internet Data Handling:

base64, binhex, binascii, uu

  • these should probably live alongside codecs

Internet Protocols and Support:

wsgiref, uuid

  • wsgi is meant as an interop protocol, so I put it close to the language.

Internationalization:

gettext, locale

Development Tools:

pydoc, doctest, unittest, 2to3, test

Debugging and Profiling

bdb, pdb, hotshot, timeit, trace

Python Runtime Services

sys, __builtin__, future_builtins, __main__, warnings, contextlib, abc, atexit, traceback, __future__, gc, inspect, site, user, fpectl

Importing Modules:

imp, imputil, zipimport, pkgutil, modulefinder, runpy

Python Language Services:

parser, ast, symtable, symbol, token, keyword, tokenize, tabnanny, py_compile, compileall, dis, pickletools, distutils, pyclbr, compiler

  • a lot of these are arguably domain-specific, but given the domain is Python...
  • the compiler package is included here as well

Domain specific

String Services:

difflib, textwrap, stringprep, fpformat

Data types:

calendar

File and Directory Access:

fileinput, linecache, dircache, macpath

Data Persistence:

anydbm, whichdb, dbm, gdbm, dbhash, bsddb, dumbdbm, sqlite3

File Formats:

csv, ConfigParser, robotparser, netrc, xdrlib, plistlib

Generic Operating System Services:

optparse, getopt, logging, curses, curses.*

Optional Operating System Services:

readline, rlcompleter

Internet Data Handling:

email, json, mailcap, mailbox, mhlib, mimetools, mimetypes, MimeWriter, mimify, multifile, rfc822, quopri

Structured Markup Processing Tools:

HTMLParser, sgmllib, htmllib, htmlentitydefs, xml.*

Internet Protocols and Support:

webbrowser, cgi, cgitb, urllib, urllib2, httplib, ftplib, poplib, imaplib, nntplib, smtplib, smtpd, telnetlib, urlparse, SocketServer, BaseHTTPServer, SimpleHTTPServer, CGIHTTPServer, cookielib, Cookie, xmlrpclib, SimpleXMLRPCServer, DocXMLRPCServer

Multimedia Services:

audioop, imageop, aifc, sunau, wave, chunk, colorsys, imghdr, sndhdr, ossaudiodev

Program Frameworks

cmd, shlex

GUI with Tk:

Tkinter, Tix, ScrolledText, turtle, IDLE, Others

Custom Python Interpreters:

code, codeop

Restricted Execution:

rexec, Bastion

  • Both have been removed from Python 3.0

Miscellaneous Services:

formatter

MS Windows Specific Services:

msilib, msvcrt, _winreg, winsound

Unix Specific Services:

posix, pwd, spwd, grp, crypt, dl, termios, tty, pty, fcntl, pipes, posixfile, resource, nis, syslog, commands

Mac OS X specific services:

ic, MacOS, macostools, findertools, EasyDialogs, Framework, autoGIL, ColorPicker

MacPython OSA Modules:

gensuitemodule, aetools, aepack, aetypes, MiniAEFrame

SGI IRIX Specific Services:

al, AL, cd, fl, FL, flp, fm, gl, DEVICE, GL, imgfile, jpeg

SunOS Specific Services:

sunaudiodev, SUNAUDIODEV

GRAND TOTAL

130 language-related

151 domain-specific

Damn, isn't that a lot of packages? For reference, PyPI currently hosts ~7500 packages. Truly, Python has a lot of batteries.

A platform, or a framework?

I can easily see a neat split there - the first half is Python, the platform. The second half is the batteries. However, nowadays the batteries are not enough. While you may be able to write a quick and dirty script with them, if you're doing web stuff you're probably using another framework, and if you're doing desktop stuff you're probably using another toolkit as well. Of course, there are other uses I probably don't know anything about, and for them Python becomes the framework.

I know that many frameworks built on top of Python-the-platform start with the batteries and then write their own implementations to fix bugs or add features. Django-the-framework runs on Python-the-platform 2.3-2.6, so it can't rely on features being present or bugs being fixed in the batteries - it carries its own.

Backwards compatible

My issue with the backwards compatibility argument is this: no one forces anyone to update to any version of Python. Developers make a conscious decision to develop software for a specific version (or a range of versions) of Python, and for specific versions of all the other libraries they depend on. Any change to the dependencies of a piece of software may lead to breakage. I see no reason why Python should be treated differently in this respect.

I can't see backwards compatibility as an argument against upgrading Python, adding features, deprecating and removing modules, and of course fixing bugs. (Aside: Microsoft is so backwards compatible that it will emulate bugs if important programs depend on them. We don't want to do that!) Instead, I see backwards compatibility as an argument for better isolation of Python-the-framework. If a program needs specific versions of Python and libraries, it should be trivial to guard them against change. If an operating system depends on a specific version of Python, it should hide it away and not allow modifications.
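Something like the following would be enough on the program side - a minimal sketch, where the Python version and the library name/version are purely illustrative, not real requirements:

import sys

# Hypothetical guard: suppose this program was only ever tested on Python 2.6.
if sys.version_info[:2] != (2, 6):
    sys.exit("untested Python version: %d.%d" % sys.version_info[:2])

# Pin an (illustrative) third-party dependency via setuptools, if available.
try:
    import pkg_resources
    pkg_resources.require("SomeLibrary==1.2")
except ImportError:
    pass  # setuptools isn't installed; skip the check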

On the other hand, I would argue against radical changes to Python-the-platform. Of course, this has been the case so far, with one exception in Python 3.0 to fix issues that needed to be fixed. In fact, there's a nice forwards-compatibility feature for changes to the platform - __future__. People have been upgrading to new Python features with only minor complaints, so I don't see why changing the batteries part of the stdlib bothers people so much.
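For example, a Python 2.6 codebase can opt into pieces of the 3.x behaviour one feature at a time, well before actually switching interpreters:

# Opt into Python 3 semantics from Python 2.6, per feature.
from __future__ import print_function     # print() becomes a function
from __future__ import unicode_literals   # string literals become unicode

print("this line already behaves like Python 3")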

Best of breed

When I started with Python, I only used modules from the stdlib - I had no idea about PyPI, and I assumed that things from python-core would be of higher quality. However, this is only true for the language modules, not the domain-specific modules. The reason is simple - python-core are experts on Python and language design, but not experts on the numerous domains the batteries cover. There are now replacements for most, if not all (OS-specific stuff probably excluded), of the domain-specific modules. It's bad when people try to get a GUI running with Tk without knowing about wx, Qt, Gtk, or the platform-specific choices. It's bad when people try to do image manipulation without knowing about PIL.

I would argue that the domain-specific parts should be spun off the stdlib and released as separate PyPI packages. We can keep Python-the-framework going by providing a download with the kitchen sink included (as Jesse Noller proposed), and by cooperating with packagers/distributions so that they can fortify their installations against change.

Conclusion

The argument on stdlib-sig is huge, and thankfully it seems that something is getting done in the end. I expect some people to agree with me, and some to disagree. Writing my thoughts down makes me think, so please keep in mind that I am willing to be persuaded otherwise, given the right arguments.

As far as my day-to-day use is concerned, 99% of the batteries could disappear from my site-packages and I would not care. Of course, packages I actually use and import (twisted, pyobjc) would care (actually, both of those will most likely use their own batteries). I wonder how many people are in a similar situation. Find your imports, and see what the results are.
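If you want to try that on your own code, here is a rough sketch (assuming your sources live under the current directory) that walks a tree and counts the top-level modules it imports; you can then check the names against the stdlib docs or PyPI:

import ast
import os
from collections import defaultdict

def imported_modules(root="."):
    # Count how often each top-level module is imported under `root`.
    counts = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path) as f:
                    tree = ast.parse(f.read(), filename=path)
            except SyntaxError:
                continue  # skip files that don't parse
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    for alias in node.names:
                        counts[alias.name.split(".")[0]] += 1
                elif isinstance(node, ast.ImportFrom) and node.module:
                    counts[node.module.split(".")[0]] += 1
    return counts

if __name__ == "__main__":
    for module, count in sorted(imported_modules().items()):
        print("%4d %s" % (count, module))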

September 16, 2009

Comments

1

Comment by jesse, 5 years, 1 month ago:

fwiw - some of the modules you list are marked with a deprecation warning. I trimmed those from my initial count.

2

Comment by Orestis Markou, 5 years, 1 month ago:

Yes, but only a depressingly low number. I started marking those as such, but it didn't make much difference.

3

Comment by dgou, 5 years, 1 month ago:

Yep, I find the whole "I want to upgrade and yet nothing will change" argument very puzzling.

In my experience with corporate users, upgrading is a -huge- deal logistically, and if there is nothing new/improved/better to get, they won't do it. Even if there is, they're not likely to do it.

There has to be something very compelling. Transition strategy? Yes. Stagnation? No.

4

Comment by Paddy3118, 5 years, 1 month ago:

I think new additions should either be very innovative, tackling areas that no other module comes close to providing a solution for, or a drop-in replacement for one (and hopefully more) existing modules, giving more functionality and/or a simplified interface.

In any case, additions should be mature. The standard library should not be seen as a place to hold change for change's sake. The discussion about the addition of a new optparse replacement covers an area already served, and is not backwards compatible. How old is it? From what I saw, it relied on the type annotations of Python 3. What about Python 2.x?

Standards, almost by definition, are slow to change, and this is one of their strengths. I would hate for people to lose confidence in the standard library because the lifetime of its contents was seen as fleeting.

- Paddy.

5

Comment by Marius Gedminas, 5 years, 1 month ago:

"Nobody is forcing you to upgrade to a newer version of Python" is a specious argument. Perhaps I need a newer version because it has a critical bugfix to stdlib package X that I use. Perhaps I'm upgrading my whole OS because I need a (non-Python) package Z, or because the security support is now discontinued, and the new version of the OS no longer has Python 2.3 or whatever.

I'm against Microsoft-style bug-compatibility, but discarding backwards compatibility altogether is not a very good idea.

6

Comment by Lee, 5 years, 1 month ago:

Marius has the right of it. Enough said.

7

Comment by Michael Foord, 5 years, 1 month ago:

I think backwards compatibility *is* important - otherwise you get gratuitous breakage and give people strong incentives *not* to upgrade.

The bar for adding new modules to the standard library is deliberately high, and I don't think that should change. When a new module is added the core-development team takes on the responsibility of maintaining and developing it.

Having said that, there are several modules *still* in the standard library that clearly shouldn't be there (calendar, anyone?) and modules whose API design makes it hard or impossible to evolve them to meet new use cases.

I think obsolete, broken, sub-standard and unmaintained modules *should* be removed.

Modules that are useful, or potentially useful, but have API designs that make them hard or impossible to update should be replaced, or some small amount of backwards incompatibility should be allowed. *Preferably* with the normal deprecation processes followed (although that means that some API improvements may have to wait five years or more, which may be unacceptable).

Whatever happens, changes will need to be made carefully and decided on a case-by-case basis.

8

Comment by nnis, 5 years, 1 month ago:

Hey don't mess with calendar, I use it to get the name of the months :-)

I am all in favor of slow change. I don't see the point of rushing out to swap a library for the latest shiny thing. But slow change is not the same as no change. Why can't we say: let's add the better library in the next version, keep both the old and the new one for two versions, and then deprecate the old one? That gives people four years to switch, a pretty reasonable time.

Stdlib changes should be more prevalent than language changes, especially if the library to be deprecated will still be maintained on PyPI.
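Mechanically, the "deprecate" step usually just amounts to the old module emitting a warning when it is imported - a rough sketch, with purely illustrative module names:

# At the top of the old module.
import warnings

warnings.warn("the oldthing module is deprecated; use newthing instead",
              DeprecationWarning, stacklevel=2)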

9

Comment by j_king, 5 years, 1 month ago:

Notwithstanding your misinterpretation of what a "straw man" argument is, I'm a supporter of backwards compatibility.

Having the primary implementation and its standard library become a moving target is bad. It will lock down software written in Python to the underlying implementation. No long-lasting language does this.

Python has already shot itself in the foot by ensuring its implementation is the primary reference specification. For all the work that has gone into Jython, IronPython, and others; it's really frustrating that Python software can't be written once and run on any of them. The software has to be written specifically for each one of them.

"Evolving" the standard library is just inviting complications. It also defeats the purpose of having a standardized library: a collection of modules one can reasonably expect to be available in all environments in which Python is installed. Hundreds, if not thousands, of software systems rely on that standard library to be available. If one module gets removed, many years of effort have just been wasted and each one of those projects now has to go back to the drawing board. If I was one of those sorry bastards maintaining one of those projects I'd be rightfully pissed off.

There's no great reason to alter the standard library in any way. It's not broken. Ergo, there's nothing to fix.

10

Comment by Orestis Markou, 5 years, 1 month ago:

I like the idea of keeping two modules side by side with different names.

Django did that with forms/oldforms/newforms:

from django import oldforms as forms
from django import newforms as forms
from django import forms

Certainly for some modules that need to be replaced this approach can work well.
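A guarded import is one way application code can straddle the two names while both exist - a sketch, assuming the Django module names above:

# Use the new API wherever it happens to live: as `newforms` on older
# releases, or as plain `forms` once the rename has happened.
try:
    from django import newforms as forms
except ImportError:
    from django import forms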

