Ben at QCon

On-stage at QCon London this week, Dion and I gave a demo of Bespin. After spending a few minutes building up to the demo, emphasizing how fast Bespin performs, how it scales to tens of thousands of lines, beats DOM-based editor approaches–the usual litany of claims–I proceeded to mash on the keyboard and demonstrate that Bespin’s performance was… decent on a hundred-line file and, as it turned out, “slow as molasses” on thousand-line files. (Still, the most embarrassing moment of the talk was to come later, when Dion talked of his hurrying home after school every day as a youth to “twiddle his paddle”; that’s another story.)

Dion's Paddles

Just days before, Bespin was wicked fast. What happened?

I thought it might be interesting to explain this little performance regression.

Tabs

Shortly before launching Bespin, we decided to cheat on supporting tabs in the editor. Our initial code didn’t support properly displaying tabs so we decided to convert any tabs in a file to a fixed amount of whitespace when such files were imported. Obviously, this is unacceptable and we knew we had to get to real tab support in the future.

To be clear, I hate tabs. They are evil, loathsome little control characters, and the programmers who use them are the sorts of people who go out on weekends to weird nightclubs where they hang out with the hippies who put curly braces on new lines.

Jamie Zawinski

Speaking of nightclubs, Netscape alumnus and nightclub owner Jamie Zawinski has written an interesting piece that discusses tab support in editors. Essentially, he concludes that tab characters should be removed entirely from files in favor of spaces–that editors should transparently remove them on save.

As much as I agree with this perspective, and as much as I don’t want to add tab support to Bespin, one of our goals with the project is to be a great editor for all kinds of programmers; even the tab-using ones. So, I have to stop whining and support them.

There are two basic ways to add this support. Tabs can either be present in the editor and rendered as a certain amount of white space, or they can not be in the editor but instead be inserted when the file is written to disk as a sort of compression scheme (i.e., one tab character is used to represent one or more spaces).

Of those two approaches, the latter would be quite easy for us to implement. We’d simply scan each line in the file for spaces when saving and when one or more spaces occurs immediately before a tab stop, replace it with a tab character in the file. Ten minutes work.

But would you believe that there are some people who, in addition to wanting real tab characters in their files, actually mix tabs and spaces? Yes, it’s true. There are even file formats which depend on this sort of thing. Horrors!

So I can’t do the easy thing and just deal in terms of spaces at run-time in the editor. I have to support putting real tab characters in the file during the edit session, and distinguishing such characters from spaces.

Implementing Tab Support

Bespin backs each row in the editor with a string array; each character is represented by an element in the array. Because Bespin is limited to fixed-width fonts, we’ve been able to get away with easy measuring operations, like calculating the pixel width of a line by multiplying the size of the line’s array by a standard character pixel width:

var lineWidthInPixels = lineArray.length * characterWidthInPixels;
Pseudo-code for measuring a line’s desired width

We then use this measurement to determine whether the line will entirely fit in the editor without horizontal scrolling.

As it turns out, we need to measure the width of a line quite often. In fact, we have to know the width of every line in the editor, regardless of how few lines are currently visible in the window, in order to properly render the horizontal scrollbar. That’s because every editor I’ve seen sizes the horizontal scrollbar according to the widest line in the entire file, regardless of what’s currently visible.

Because obtaining the length of an array and multiplying it by some value is so cheap, we’ve been able to get away with measuring each line in the file very, very often.

The Performance Regression Explained

And, this is what drove our vaunted performance right into the floor. You see, with tabs, measuring the width of a line is no longer quite so cheap. We now have to scan each character in the line to see if it is a tab character and then work out how much whitespace the tab character represents. Here’s roughly how we do that now:

var length = 0;

// check for tabs and handle them
for (var i = 0; i < lineArray.length; i++) {
    // check if the current character is a tab
    if (lineArray[i].charCodeAt(0) == 9) {
        // since the current character is a tab, we potentially need to 
        // insert some blank space between the tab character
        // and the next tab stop
        var toInsert = tabstop - (i % tabstop);

        // increment the length to correspond to the new space
        length += toInsert;
    } else {
        length++;
    }
}

var lineWidthInPixels = length * characterWidthInPixels;
Modified extract from Bespin for measuring lines with tabs

Hey, wouldn’t you know, executing this code once per every line on every repaint is considerably more expensive than the previous code sample.

Making It Fast Again

While perhaps we could find more efficient ways to measure the length of a tab-riddled line, even when the algorithm was based on the length of the line’s backing array this represented a bottleneck that would have manifested itself eventually. Performance was degrading linearly in inverse proportion to the length of the file; tabs makes the degradation exponential, but either way, it’s unacceptable. We didn’t care about the degradation at 50,000 lines, but at 5,000,000 lines, we probably would have.

Maybe a better answer would be to measure each line once and then re-measure the line only when it changes. Genius! But, in reviewing our current code that marks lines as “dirty”, I discovered that it’s incomplete; there are many line mutations that merrily skip the dirty codepath. I didn’t want to get distracted by fixing this bit of infrastructure (which might have introduce yet more side-effects), so I moved on past this option and searched for another accelerator.

The next obvious approach would seem to be to only measure those lines that are presently visible. The disadvantage is that the horizontal scrollbar will size itself only according to what you can see; so if all of the visible lines fit in the editor window but somewhere a few pages down there are some super-long lines, you’d see no horizontal scrollbar at all. Better this than our present slower-than-CRuby-on-a-good-day behavior (har har), so I implemented it.

Dion and I were a bit intrigued by the result. We expected to be a bit off-put by a horizontal scrollbar whose width changes as you scroll vertically, but because Bespin’s scrollbars are a bit subtle with the transparency and all, it actually seems more cool than jarring. And, in fact, as we think about it, we can imagine some users preferring this behavior over the traditional behavior.

Horizontal Scrollbar 1

Horizontal Scrollbar 2

Horizontal Scrollbar 3

What do you think?

(The public Bespin instance doesn’t yet have this behavior; it will in the next day or so. By the way, all the inflammatory statements in this post, such as tabs are evil, etc., are just jokes. Mostly.)

UPDATE

I modified the section on Jamie Zawinski after Ted Mielczarek rightly pointed out that I completely misquoted him. I wrote this post on a plane without access to Jamie’s piece, and my memory was obviously faulty. Still, the spirit was the same–we both hate tabs.

[1] Pictures of Ben and Dion cribbed from the JAOO blog (http://blog.jaoo.dk/2009/03/11/sirtonyhoareatqcon/)

[2] Picture of Jamie Zawinski credited to Yetta Howard/Salon.com (http://archive.salon.com/tech/feature/2000/02/10/zawinski/)

37 thoughts on “Of Tabs and Performance

  1. Why not store the tab character *and* its expansion to spaces? Then finding the line width and finding visual positions of characters remains trivial. When writing out the file you can eliminate these expansion spaces. The only hard part would be making sure the caret never gets positioned inside the expansion spaces.

  2. I think this actually sounds like a particularly useful feature. It’s a simple way of telling the user how much there is off screen at the moment, i.e. how much there is that you’re not seeing where you’re looking. Because this is browser-based and prone to be loaded on all sorts of systems, possibly devices with small screens, this could be very helpful when the file doesn’t fit on screen very well.

  3. From what I can tell, you had previously been counting a tab as one character (or, at least, you could have), correct?

    If you aren’t already, I think that it would be best if you at least approximated the line length of non-visible lines, and adjust the scrollbar according the longest line in the file character-wise, if not screenspace-wise. Is that reasonable?

  4. I guess I like the behaviour of having the horizontal scrollbar match the currently visible lines. I often grunted at the horizontal bar being useless because a line somewhere in the file was HUGE.

    If it was doing a smooth transition on scrolling, that’d be even nicer, so if you did some lookahead, and weighted non-visible lines, the scrollbar wouldn’t jump. Still would give you a finite number of lines to look at.

    Btw, the performance regressed code is still O(1/N), AFAICT. More precisely O(1/N*1/M) when M would be some measure of line length. (O(1/N*1/Mmin + 1/Mmax)? Anyway.) But from what I see it’s far from exponential. It’s just more costly per line, but O() doesn’t show that.

  5. I wonder what happens if I go to the longest line in the file, scroll to the right, move one page up (where my current horizontal position is outside of what the scrollbar allows) and then move one page down again. Will I still be at the same position horizontally or will my horizontal position be reset? If it is the latter this will make quite an annoying bug IMO.

  6. kde’s editor kate used to do that too, also for performance. I don’t know if they still do that.

    I just checked and gvim on windows also does it.

    So i guess i was more surprised that you actually managed to keep it fast WITHOUT doing this.

    otherwise personally i prefer tabs.

  7. As Bespin runs in browser I think the scrollbar behavior should match the one of browser, i.e. the scrollbar fits the longest line in the whole file.

    Also, Wladimir Palant has a point, I think….

  8. The Microsoft Windows Edit control only keeps track of the longest line it’s ever seen, until you delete all text in the control, at which point it turns the horizontal scrollbar off again.

    Would it be possible to have a flag on each line that contains TAB characters to make you take the slow path?

  9. I second @Robert’s approach. Store the tab character, and then, if the tab character expands to more than one space, pick some other non-visible character (e.g. 0x00) and pad out the array elements with that.

    Then your data structures still map nicely to the display.
    You need some extra fiddly code when moving the cursor to skip the “special” chars.
    And you need some extra code in the saveFile() path.
    And probably some extra code in the load() path to filter out any such junk that might be present in the input file.
    And I don’t know how well this will play with Unicode support.
    But hey, who said performance was going to easy 🙂

  10. Upon more consideration, in addition to giving a rough estimate of maximum line length (as described above), I think the scrollbar should also remember its length when/if it expands the tabs, too. That way, if you come across the long lines in the file, but then scroll past them, you don’t lose the length of the ta lines.

    Continuous jumping of the scrollbar can be jarring, but it can be less so if preparations are made to reduce the it (even without the performance loss).

  11. I think you misquoted jwz:
    “My opinion is that the best way to solve the technical issues is to mandate that the ASCII #9 TAB character never appear in disk files”

    Of course this doesn’t actually work with Makefiles, but who wants to edit Makefiles on the web anyway? (I don’t even want to edit them on my local machine!)

  12. @Robert: That was actually my first approach, but I abandoned it because: (1) I didn’t want to deal with changing the mutation-related APIs to check if the insertion was at a valid position, (2) I would have to use a non-space character to represent tab expansion spaces, otherwise I would probably break the tab/space mix scenarios (i.e., how would I reliably guarantee that spaces were explicit versus inserted by the expansion?) and therefore I would have to replace those characters every time I painted the row and that seemed like a pain, and (3) I didn’t want to have to change the entire array on every keypress which I would have to do in that case as the number of spaces represented by a tab are variable depending on column alignment. #3 is the weakest of these points and none of them are show-stoppers. I wonder if you’re right and I should have kept going in this direction…

  13. @Gordon: That’s an interesting compromise; continue to measure the array and bet that the difference wouldn’t be material–and then by the time the actual line is visible, the scrollbar could subtly adjust to the real width, and the adjustment would be so small that you wouldn’t notice it.

    Hmm… not a bad idea.

  14. @Wladimir: Good question. At present, the editor snaps the visible region back to show visible content if you scroll way out horizontally and then move vertically up or down. Not sure if this is a good thing.

  15. @Neil, re Windows Edit: Interesting.

    re Flag for tab. I thought about this, mebbe using regex’s to search for a \t before scanning the line. But I want to get out of the business of measuring each line in general.

  16. You can tell which characters are expansion spaces by looking backwards to the previous tab stop; if you find a tab character, the space is an expansion space.

  17. @Robert: In addition to changing the API to potentially reject an insertion attempt as having occurred in an invalid region (in the “expanded spaces”) and updating callers of the API to perform the checks–a non-trivial change but not a big deal–my dopey reasoning was that I wanted to avoid the complexity of (1) having to re-evaluate the inserted spaces every time a character was inserted before the last position in the line (i.e., behind an existing tab), (2) having to change each line every time the user changed the number of spaces a tabstop represented, and (3) the numerous side-effects for undo/redo. For example, if a user performs an action and then changes the tabstop property, all of the existing undo operations will have to be changed to reference new coordinates in the model. An alternative would be to make changing the tabstop property a undoable operation, but that breaks the existing semantics of properties in Bespin and I didn’t want tabstop to be a weird one-off (i.e., changing properties generally is outside of the undo queue but in this one case, it is in the queue).

    Weighing all of this together made me feel that it was a lot of leaky complexity, whereas the alternative approach allowed me to encapsulate the complexity inside a single module (a new CursorManager abstraction) that handles converting between the “screen” cursor position and the “model” cursor position, etc.

    Having said all that, maybe I did goof by going the way I did. I’ll think about it some more…

  18. You could also do a pascal-string style thing and to each line array affix expando metadata (27 characters, 5 tabs), or even take over the first N characters of each line array for such things. This might give you an additional speedup since the JS interpreter won’t have to do strlen until you edit a line or load a file from some system that doesn’t keep this data.

  19. I have been working with editors before that only showed the horizontal scrollbar for the widest visible line, and I never found it distracting, so don’t worry too much about it.

  20. What’s the behavior when scrolled all the way to the right horizontally for a long line, and then scrolling vertically after the line goes out of the visible range? It seems you’d either have to change the horizontal scroll position so you’re fully scrolled to the current visible line (Bad, think scanning through log files or tab-delimited files) or have a mismatch between the width of the scrollbar and your current scroll position (Bad, because it breaks the mental model of proportional scrollbars) or maintain both and readjust the horizontal scrollbar only after the user has scrolled back to the left (not bad, but kinda weird, but that’s how the OS X Finder does it in a similar situation).

    I like Robert’s solution, but I think you might be able to take a sorta weird approach: Replace every tab with four (or whatever) tabs internally, and have the “width” of a tab equal to one space but make the cursor go forward and back four spaces on arrowing. In other words, treat any group of four tabs in the internal representation as a single tab in the display. When the file is saved, copied, etc., you simply grep /\t{4}/\t/g (or again, however many spaces a tab occupies) on the internal representation. Obviously, this method falls apart if a tab is somehow entered without Bespin’s awareness, which I think is the issue you were referring to with tracking dirty lines. But you’ll be back to being able to use the string.length * character width to get your scroll size.

    Of course, one of these days, you’re going to have to support proportional fonts. I tried doing that with something I worked on long, long ago. It made me cry.

  21. Actually, you have to forget what I said. I just realized tabs don’t work that way. Consider these two lines:

    AA{tab}
    {tab}AA

    … the tab itself isn’t equal to four spaces, the tab-stop is. So in the first line, the line length is four characters, and in the second, it’s six characters. Looking at your code, you obviously have figured this out already. The truth is, I don’t think you have any reasonable option but to measure character by character and store that length in pixels with the line or draw the line offscreen, measure it and store that length. Either way, I don’t see anyway to increase performance without either abandoning proportional horizontal scrolling or reducing how often you measure each line. The good news is that if you go the hard way and do offscreen measurement, you’ve solved the same problem for proportional fonts.

  22. @Mark Kawakami: You and I think alike, my friend. The initial implementation for tabs just replaced them with a constant number of spaces–and then I realized afterwards, like you, that tabs don’t work that way.

  23. @endolith: Absolutely, we’re not done with implementation yet. A tab character should be painted and the background should be subtly different. Or something.

  24. Facilities within hospitals meet world class expectations.
    It is meant to be flexible and to supply a protective
    region for the major organs like the heart and the liver.
    Dont keep the same mattress for more than 10 years as it is bound to lose
    its shape and structure and the wear and tear of the coils through the years will give you
    an uncomfortable feeling.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s