Random technical stuff: libreoffice

Showing posts with label libreoffice. Show all posts

Wednesday, September 8, 2021

Separating metafile processing from rendering in OutputDevice

The OutputDevice class is what we use for rendering a whole host of different things, and other classes inherit from it to do their rendering. Notably a little while ago Noel Grandin separated OutputDevice from Window, which is a huge boon for the codebase!

In an attempt to further separate the concerns of OutputDevice, I noticed a long time ago that not only does it do rendering, but it records actions in a metafile. These are two related and yet quite separate things. My goal is to shift out the rendering into a RenderContext class. I have already got a PoC on GitHub, which has already done much of the work, albeit in a messier way (basically it was a test to see if this was even feasible).

Obviously migrating rendering functionality into its own class is not something to take lightly, so I have been writing smaller patches that test out the code. When I say "small", I really mean this. Currently I am testing out a bunch of code that sets state in OutputDevice, which can be found in the outdevstate topic branch in gerrit.

Along with test the state functions of OutputDevice, I am also adding some tests to the bitmap functionality of OutputDevice. I recently landed a patch to move GetDownsampledBitmap() out of OutputDevice and into BitmapTools.cxx because it really didn't need to completely rely on OutputDevice. This bit of code will hopefully one day migrate it's way into the VCL backend modules. You can monitor the progress I'm making via the bitmap topic branch in gerrit.

I have to say that writing tests is not always easy. Often there is no easy way of accessing the functions, especially if they are private, but luckily most of the functions are protected so I can at least write a mock class to expose these functions without having to modify the code.

Currently I have the following topic branches:

Wednesday, November 13, 2019

Drawing in OutputDevice

For a long time now I have noticed that OutputDevice is a class that is tightly coupled to drawing primitives such a pixels, lines, rectangles, etc. To draw new primitives in OutputDevice, you need to change the interface by adding another function, often you need to add new private functions, etc.

I have never been entirely comfortable with this - I believe that we shouldn't vary the OutputDevice class, but instead the functionality should be implemented in a command pattern. In a command pattern, you use an object to encapsulate the functionality used to perform an action. What this means is that OutputDevice no longer needs to know how to directly draw a line, pixel, rectangle or any other primitive we throw at it - this is all done in the command object. I call these OutputDevice Drawables. It turns out, I find it easier to test a command object.

In fact, the more I rewrote code to use the command pattern, the more I found that we were using a particular sequence of actions in most of our drawing functions:

Add action to GDI metafile
Can we draw? No - exit
Initialize the clipping region if possible
Initialize the object's color
Initialize the object's fill color
Acquire the SalGraphics instance
Do the actual drawing!
Paint alpha virtual device

When I realised this, I decided to follow the principle of encapsulating what was varying - point 7 - the actual drawing. Everything else was the same - at least so I initially thought. It turns out this is not the case all the time, so I allowed two things - a function that can be overridden that tells the Drawable that the step should be bypassed, and in the case where we need to through all this scaffolding out the window a way of bypassing it entirely. However, this is an example of a template method pattern.

You can see the base work for this in a gerrit patch I submitted today. To use the actually command pattern is very easy: if there is not an existing function like DrawPixel in OutputDevice, you just have to invoke the command object via OutputDevice::Draw(Drawable* pDrawable). An example can be found in the unit test.

To implement a new command, you just derive from Drawable - the simplest Drawable class you can define only needs a constructor with the parameters you would normally send to a Draw command in OutputDevice (e.g. PixelDrawable only needs a Point sent to the constructor), and you implement the actually drawing in DrawCommand(OutputDevice* pRenderContext).

A few things I found helped me when I created Drawables - I quite like passing parameters to functions, I'm not a huge fan of a lot of state in the class. Drawables work because they essentially work by calling the Execute function, this handles all the basic drawing for you on the state you set in the constructor. I have found in some patches that I have on github that the functionality is somewhat complex, for this I have extract function to simplify test and readability of the code (an example of this is GradientDrawable - see GradientDrawable.cxx and GradientDrawableHelper.cxx).

Wednesday, March 16, 2016

Tips for LibreOffice newbies

Li Haoyi has written an excellent blog post entitled "Diving Into Other People's Code" about diving into an unfamilar codebase (HN discussion here).

I think this is really very helpful for anyone who wants to look at the LibreOffice source for the first time. Many of the things he mentions are directly relatable to LibreOffice - in particular getting your dev environment setup is particularly relatable.

Getting Started

To develop in LibreOffice, you really need some level of C++ skill, an ability to read other people's code, and be willing to learn and improve over time. However, you can still contribute Java and Python code to parts of the project (though in terms of unit tests, we really encourage converting Java unit tests to C++ tests eventually).

Any newbies should start here:

https://wiki.documentfoundation.org/Development/GetInvolved

and then have a look at:

https://wiki.documentfoundation.org/Development/How_to_build

One easy way of setting up your development environment is to use lode, which is a project that helps you setup a base environment to build LibreOffice:

https://wiki.documentfoundation.org/Development/lode

Once it is installed, check out how to use gerrit:

https://wiki.documentfoundation.org/Development/gerrit

Be warned, depending on your setup this can take quite a bit of time and can be at times frustrating. We do our best to make things as easy as possible, but even building LibreOffice can take hours and hours to complete.

Learn git

A bit of git experience is very helpful - luckily you only ever have to do a git clone once, but you should learn about branching, rebasing, pulling and pushing. I would strongly suggest using the logerrit scripts. I personally use the following to pull in changes onto my current working branch:

./g pull -r

I strongly recommend spending a few hours on the following graphical tutorial on git, Learn Git Branching. This is well worth doing regardless of whether you contribute to LibreOffice - knowing how git works to the degree that this shows you is a skill that is immensely valuable. This tutorial demystified a lot of git concepts for me, and helped me "get" how git does things. Eventually you will get stuck in some git mess, so knowing how to reset is very valuable.

I also recommend learning how to use interactive rebase in git, which is done via:

git rebase -i

What this does is take your changes are change the "base" of the commit in question. You can move the commit up or down the version history, reword specific commit messages, "squash" commits into the parent commit (be very careful about doing this) or edit the commit itself. When you run git rebase -i it opens up your text editor - just read what it states in the editor to learn how to use it.

If you do ever mess up a rebase, incidentally, you can use the git reflog to back out of your change. Check out the Atlassian reflog tutorial on this, could be a lifesaver. Another tutorial I used before I found the Atlassian one can be found here, try the Atlassian one first though.

Hot tip for vim fans

If you are a vim fan, then I suggest the following setup for your .vimrc file:

filetype indent on

set tabstop=4

set shiftwidth=4

set expandtab

highlight ExtraWhitespace ctermbg=red guibg=red

match ExtraWhitespace /\s\+$/

augroup trailing_whitespace

autocmd!

autocmd BufWritePre *.c :%s/\s\+$//e

autocmd BufWritePre *.h :%s/\s\+$//e

autocmd BufWritePre *.cxx :%s/\s\+$//e

autocmd BufWritePre *.hxx :%s/\s\+$//e

augroup END

We don't use tabs, we use 4 spaced-tabs. Some time ago I discovered a way of showing extra whitespace at the end of lines directly in vim, and then later after some discussion on the LibreOffice developer mailing list I discovered how to safely remove trailing whitespace on write.

Start Hacking!

When you are modifying the LibreOffice codebase, you'll find that you'll spend more time reading and grokking the code than actually modifying it. I'd consider this normal, so if you find this is occurring don't fret - we all have to do this. The LibreOffice codebase is massive, I think from memory it has more lines of code than the Linux codebase and has been in continuous development since 1988.

When I started on LibreOffice hacking, I found I was most interested in finding the code entry point. Frankly, I was only initially reading the code for pleasure (yes, I'm strange that way) and so I used http://opengrok.libreoffice.org to browse the code.

However, many years before I had actually attempted to compile OpenOffice.org - and failed rather spectacularly as I ran out of disk space. Whilst this was occuring I did a lot of reading of OpenOffice.org developer manuals - I really recommend reading OpenOffice.org's Developer Guide. Don't be put off by the fact that it is hosted on the competing Apache OpenOffice site, it's really very high quality and whilst the LibreOffice codebase is more actively developed there is still a lot of really useful information on the Apache OpenOffice sites.

Once you have read enough, have a look at our EasyHacks list. These are a list of things that need to be done in the LibreOffice codebase that are designed for newbies - and if you ever want to participate in Google Summer of Code you'll need to demonstrate that you've worked on a few bugs or submitted a few commits to deal with easy hacks. You'll also know that you've contributed to a codebase that literally millions of people use every day, and your name will be in the commit history. :-)

Join in with the development community

LibreOffice is a very friendly environment. We mostly go out of our way to help out newbies, as we want to encourage as many participants as possible. A such, you can join us on the #libreoffice-dev IRC channel on Freenode. We don't largely answer user questions (for that there is #libreoffice), but if you are getting stuck on something feel free to ask on the channel.

The other way of joining in is to subscribe to the developer mailing list. You can either subscribe to the digest, or get every message directly - in which case I suggest setting up a filter or else you might get drowned out in emails. If you do get digest messages, please don't reply to the digest directly but copy the message you want to respond to into the email, and retain the original subject line, prefixed with "Re:". We suggest setting up the Reply-To header on your mail client, but it's good practice to ensure that the original mail receipients are cc'ed into your responses.

(I'm also a chronic abuser, but please don't top-post. Don't be me.)

Commit messages

A final tip: commit messages are really important. I personally tend to write longer messages detailing the work that I've done, but that's just my own style. The most important thing about commit messages is they must be to the point and explain what you have done, not what the code used to do. If your code change is complicated or not necessarily easy to follow, then it's best to try to leave an explanation of what you are doing. The commit title should be clear enough that someone browsing the git log in cgit can understand the purpose of the commit.

If your commit is easy to understand or minor, then only a commit message title is needed. If you need to explain things in more detail in the commit message body, as I said, you should state what you did.

An example that shows both what you should and shouldn't do is where I fixed an issue where the PPI wasn't being exported to JPEGs correctly.

Good things about my commit message:

Note that as I fixed a bug, I prefixed the commit message short title with tdf#85761. There is a bot that reviews the git logs and if it finds this in the message title it automatically updates the bugzilla bug. You can only reference one bug however.
I summarised my change that we don't hardcode JPEGs to 96 DPI any more but use mapmode pref size to get the DPI, and moved the scaling function from the EPS filter code to the MapMode class.
It explains what the problem actually is, as it was a bit difficult to understand the cause of the issue from the original bug report.

Bad thing:

I still spent too much time explaining what the code originally did (thus violating my own recommendation I give in this blog post).

FWIW, it can still be useful in explaining what the code did, but I'd leave that to a note in the last paragraph - and I recommend you make it brief.

Friday, January 22, 2016

Why I love hacking at LibreOffice

The LibreOffice codebase is, to be frank, messy. This isn't a criticism of previous developers - it's still an amazing product and an amazing feat of programming given the number of platforms it runs on. The StarView guys, and later OpenOffice.org development team, did a great job. For instance, I was reading up on the font mapping code and I often saw Herbert Duerr's name, and I've got nothing but respect for the work that he did and his dedication to the project.

Unfortunately, the messiness is the result of a codebase that was first created in 1988 and has been actively worked on and changed directions a number of times. So... what is it that I love about working on LibreOffice?

In a word: the people. I've never met any of my fellow hackers in person, only a few phone calls to Michael Meeks and listened in on some Collabora meetings when I did some work for them. Mostly I stick around the IRC channel and keep an eye on the mailing list.

On the IRC channel, however, I've noticed the good humour and tolerance of the developers towards those not as expert as themselves. For example, Stephan Bergman (sberg) is the go to guy in terms of anything C++ related - and the other day he helped me out with a tricky bit of code (well, I thought it was tricky) related to the equality operator. Tor Lillqvist, as another example, is a diamond in the rough IMO and has often pointed me in the right direction when I'm online. And I cannot forget to mention Norbert Thiebaud (shm_get on IRC) for his constant maintenance of Jenkins (a thankless task!) and gerrit. These are literally just some of the folks who I interact with all the time and just a small sample of the folks who work on the LibreOffice project.

Without the goodwill and encouragement of the folks who hack away daily on the LibreOffice project, I doubt I'd be able to work on the small part of the codebase that I have chosen to improve. In fact, there is an active outreach program currently being run lead by Jan Iverson (JanIV on IRC) and there are quite a few commits coming in from newbies of all stripes. This is leading to some real improvements in the code, for instance new developer Dipankar Niranjan did a whole series of commits to remove the "dog-tag" code from the VCL module (the "dog-tags" was a way of attempting to ensure that Windows weren't deleted at the wrong time, but was a total hack and has been solved by VclPtr) - this has recently allowed sberg to remove ImplSVEvent::maDelData. Seeing the code evolve is also highly enjoyable :-)

There are so many more examples of this it's just not possible for me to list them in this post. It's interesting that LibreOffice has made so much traction in the last year in terms of code quality - whilst the UI hasn't changed hugely (though that is going to change I suspect in the next year!) the incremental improvements have been increasing at a rate of knots and folks are beginning to notice the positive effects. One user, who will remain nameless but is an active follower of the LibreOffice development process, told me that he noticed that the 5.x release was very much more responsive and stable that the 4.x series... and amusingly commented that "it's remarkable how interesting it is to watch the development of such an intrinsically boring type of software".

So there you have it, the reason I love hacking at LibreOffice is the people who work on it! May it ever be this way.

Thursday, January 7, 2016

Restricting Doxygen indexing to particular LibreOffice modules

I have committed a code change to the LibreOffice tree to allow developers to specify what modules they want Doxygen to index. I have done this because indexing every module is quite time consuming, and you may only want to review a subset of modules (in my case I am interested primarily in the vcl module).

To index specific modules only, you now export the $INPUT_PROJECTS environment variable and run make docs. For example, if you want to index only the vcl and svtools modules, you would do the following:

chris@libreoffice-ia64:~/repos/libreoffice$ export INPUT_PROJECTS="vcl svtools"
chris@libreoffice-ia64:~/repos/libreoffice$ make docs
chris@libreoffice-ia64:~/repos/libreoffice$ unset INPUT_PROJECTS

You can(not) have a void namespace in C++...

Update: So this was discussed on IRC, and it turns out I've got this all entirely wrong. It's actually much simpler - it turns out you don't need a space before the scope operator thus you can legally have:

void::TheClass::TheFunction(int n1, int n2) {}

This is the same as:

void ::TheClass::TheFunction(int n1, int n2) {}

and:

void ::TheClass ::TheFunction(int n1, int n2) {}

--------

I ran doxygen over the vcl module in LibreOffice and I got a lot of warnings and errors. I am now making a concerted effort to fix these warnings because I actually find doxygen generates good info on the classes, and I rather like the collaboration and inheritance diagrams it produces.

However, a strange error that it produced was:

/home/chris/repos/libreoffice/vcl/source/control/combobox.cxx:1322 warning: no uniquely matching class member found for void::ComboBox::Impl::ImplUserDrawHandler(UserDrawEvent *pEvent)

"A void namespace in the LibreOffice codebase?" I thought. How could this be? How is this even possible?

Sure enough, the definition was void::ComboBox::Impl:ImplUserDrawHandler(UserDrawEvent*)!

In an effort to work out if this was what was intended (which I was very doubtful of) I tracked down the commit, which converted the old style IMPL_LINK macro to a boost signal. It was definitely not intended!

I then tracked down where the C++ standard is held and reviewed the C++14 draft (I can't afford to spend US$216 on the official ISO standard). Section 7.3 deals with namespaces:

Amazingly (to my mind) there is no restriction on the namespace identifier, thus it is entirely possible to accidentally create a void, int, float or any other reserved C++ keyword as a namespace!

I wonder if it might be worthwhile for compilers to spit out a warning if they see this? Perhaps we need to write a clang plugin to detect this sort of thing in LibreOffice. The only reason this hasn't had an impact is that it doesn't appear that anything is currently using user-defined draw signals on comboboxes.

I pushed commit 86338fbb94324ed to fix this.

Tuesday, January 5, 2016

Possible 15 year old regression

A few days ago, I filed bug 96826 ("Typewriter attribute not given enough weight when finding font based on attributes") against LibreOffice. Basically, I've been refactoring VCL font code and I was very interested to see what my predecessors had been doing before me. This meant that I actually read pretty much all of the commits I could find all the way back to the year 2000 for that file, which held much of the font code for VCL.

One of the functions was originally in a class called ImplFontCache, which had a get() function that worked as a font mapper. It basically does a running total of a number of weighted values depending on the "strength" or "quality" of the font matching attribute. One of those attributes was to check if the font is a fixed-width font, then check to see if the font is a typewriter style font, giving a deliberately higher weighting to typewriter style fonts.

Anyway, on the 29th June 2001 someone with the initials th (no idea who this was, nor is it really important) did some tweaks to the function in an attempt to improve font mapping in commit 925806de6. To ensure that the typewriter style was given a weighted value, they have multiplied the value by 2. However, they appear to have accidentally removed a zero from the weighting, thus reducing the weighting of the typewriter style fonts, not increased it!

As I was reading the code, I thought "whoops, that was unfortunate" and assumed I'd see the error picked up somewhere down the track and fixed. Now either the weighting has little effect on these fonts, or not many people have problems with typewriter fonts not being cleanly mapped by the underlying platform, but this has never been fixed.

The code has since migrated its way into PhysicalFontCollection::FindFontFamilyByAttributes(), and it's still there!

    610         // test MONOSPACE+TYPEWRITER attributes
    611         if( nSearchType & ImplFontAttrs::Fixed )
    612         {
    613             if( nMatchType & ImplFontAttrs::Fixed )
    614                 nTestMatch += 1000000*2;
    615             // a typewriter attribute is even better
    616             if( ImplFontAttrs::None == ((nSearchType ^ nMatchType) & ImplFontAttrs::Typewriter) )
    617                 nTestMatch += 10000*2;
    618         }

As with any codebase with a 30 year pedigree, I've not committed a fix because I just don't know if this is something I want to touch. Hence I've logged the bug to see if anyone can shed any light on it. I'll probably follow up on the mailing list in a few weeks time to ping any more experienced developers, but until then this potential bug, which is in the middle of going through puberty, will remain.

Monday, March 2, 2015

OSI membership

I've just joined the OSI as a member - in AUD it is about $50, but it's well worth it!

The reason I became a member was because I thought it was high time that I did this. I contribute to LibreOffice, and I truly believe that open source and open culture is very important to society at large. I believe that open source gives maximum freedom to those in society who are not necessarily empowered due to economic or social circumstances. It levels the playing field, and what I most love is that it really gives transparency to those who use software so that they can verify and improve upon the work of those who have gone before them, without restricting the ability of others to use the work to improve the conditions and lives of others in society.

I encourage others to also join the OSI, as it is really is a force of good in the world.

Friday, October 17, 2014

Refactoring LibreOffice: VCL FontCharMap

I have been looking at the VCL in LibreOffice, which is its cross-platform widget and window library. Whilst reading the SalGraphics class (more on this in a future post) I noticed a class called ImplFontCharMap . Curious, I looked more into it. Why the "Impl"-prefix? What about FontCharMap?

As it turns out, ImplFontCharMap is a Pimpl for FontCharMap. Now normally a Pimpl has very little code and is not directly accessible by any class other than the class that uses it. A Pimpl allows for better builds in C++, and a number of other reasons. In this case ImplFontCharMap was doing a LOT.

A font character map in VCL allows a font to be mapped to Unicode codepoints. The VCL font charmap allows you to find a character in the font based on the Unicode codepoint, find the next and previous character supported by the font (these are not necessarily contiguous) and the first and last characters supported. There is also a default charmap, which maps the BMP plane, including surrogates.

ImplFontCharMap had an internal reference counting mechanism to optimise sharing of charmaps. However, this was better changed to boost's intrusive_ptr, because frankly that implementation is far more well engineered and tested, not to mention I'm not a fan of maintaining code that isn't really specifically addressing VCL needs. (incidentally, the best rundown I found of intrusive_ptr can be found at this man's blog) The commit can be found here. You can see that I've been able to immediately remove AddReference() and DeReference().

You will notice, however, that there are a few classes who rely on ImplFontCharMap (now ImplFontCharMapPtr, a typedef to an intrusive_ptr) directly. In particular, SalGraphics was relying on returning the Pimpl! Frankly, that's madness in my view. As I've said, Pimpls are really intended to be tightly coupled to a particular class and should never be used directly. The class that needs the Pimpl should be used! You can see other side effects, because the Pimpl is really duplicating code that should be in FontCharMap. This is clearly a bit if a code maintenance nightmare.

Given that a Pimpl is one of the very few concepts in C++ that relies on tightly coupling two classes, I made FontCharMap a friend class of ImplFontCharMap and moved most public functions from ImplFontCharMap to FontCharMap. I kept the destructor and the functions getDefaultCharMap() and isCharMap() but you'll notice I made them private, hence the lowercase first letter of the function names. I do NOT want VCL based code to access the Pimpl! I thought this was a necessary compromise because the logic really was more entwined with the data itself. There is a function FontCharMap::GetDefaultCharMap() although it's not normally necessary as the default charmap is shared via intrusive_ptr and the default FontCharMap constructor just returns a reference to the default charmap. I have provided it because you can get a default font charmap that excludes symbols.

I realised at this point, after a chat with Kendy on IRC, that I had dealt with managing the Pimpl for FontCharMap, but now I was returning raw FontCharMap pointers. This was defeating my refactor, so I made the typedef FontCharMapPtr, which is an intrusive_ptr to a FontCharMap instance. I then refactored the code to use this instead of the raw pointer.

The second commit that implemented this can be found here.

Finally, I have to have a big shout-out to Caolán McNamara from RedHat who found a rather embarassing bug I introduced and fixed it. The issue was that I didn't initialize the ref counter variables, except where I did where I set it to 1... which was rather silly of me, though I was reliably informed by Michael Meeks that he has done the same thing in the past.

Anyway, this is how I refactored a relatively small class. It actually took a lot of effort to do. In the end, I created a unit test to ensure that FontCharMap was still working correctly.

Sunday, March 23, 2014

How to build only vcl modules from scratch in LibreOffice

After reviewing the component diagram on the LibreOffice wiki, the following will make the VCL module in LibreOffice:

make sal salhelper store cppuhelper cppu xmlreader registry unoidl dtrans \
     binaryurp dtrans animations jvmfwk jvmaccess javaunohelper stoc i18nlangtag \
     ucbhelper comphelper basegfx tools unotools i18nutil i18npool sot svl vcl

Update: this doesn't always work. Turns out that there is some sort of circular dependency between i18npool and another module, which make sorts out itself.

I'm now trying:

make CppunitTest_i18npool_test_breakiterator ucb configmgr vcl

Random technical stuff