Codestin Search App

Showing posts with label stack. Show all posts

Sunday, August 14, 2016

The Four Flaws of Haskell

Summary: Last year I made a list of four flaws with Haskell. Most have improved significantly over the last year.

No language/compiler ecosystem is without its flaws. A while ago I made a list of the flaws I thought might harm people using Haskell in an industrial setting. These are not flaws that impact beginners, or flaws that stop people from switching to Haskell, but those that might harm a big project. Naturally, everyone would come up with a different list, but here is mine.

Package Management: Installing a single consistent set of packages used across a large code base used to be difficult. Upgrading packages within that set was even worse. On Windows, anything that required a new network package was likely to end in tears. The MinGHC project solved the network issue. Stackage solved the consistent set of packages, and Stack made it even easier. I now consider Haskell package management a strength for large projects, not a risk.

IDE: The lack of an IDE certainly harms Haskell. There are a number of possibilities, but whenever I've tried them they have come up lacking. The fact that every Haskell programmer has an entrenched editor choice doesn't make it an easier task to solve. Fortunately, with Ghcid there is at least something near the minimum acceptable standard for everyone. At the same time various IDE projects have appeared, notably the Haskell IDE Engine and Intero. With Ghcid the lack of an IDE stops being a risk, and with the progress on other fronts I hope the Haskell IDE story continues to improve.

Space leaks: As Haskell programs get bigger, the chance of hitting a space leak increases, becoming an inevitability. While I am a big fan of laziness, space leaks are the big downside. Realising space leaks were on my flaws list, I started investigating methods for detecting space leaks, coming up with a simple detection method that works well. I've continued applying this method to other libraries and tools. I'll be giving a talk on space leaks at Haskell eXchange 2016. With these techniques space leaks don't go away, but they can be detected with ease and solved relatively simply - no longer a risk to Haskell projects.

Array/String libraries: When working with strings/arrays, the libraries that tend to get used are vector, bytestring, text and utf8-string. While each are individually nice projects, they don't work seamlessly together. The utf8-string provides UTF8 semantics for bytestring, which provides pinned byte arrays. The text package provides UTF16 encoded unpinned Char arrays. The vector package provides mutable and immutable vectors which can be either pinned or unpinned. I think the ideal situation would be a type that was either pinned or unpinned based on size, where the string was just a UTF8 encoded array with a newtype wrapping. Fortunately the foundation library provides exactly that. I'm not brave enough to claim a 0.0.1 package released yesterday has derisked Haskell projects, but things are travelling in the right direction.

It has certainly been possible to use Haskell for large projects for some time, but there were some risks. With the improvements over the last year the remaining risks have decreased markedly. In contrast, the risks of using an untyped or impure language remain significant, making Haskell a great choice for new projects.

Monday, July 18, 2016

Why did Stack stop using Shake?

Summary: Stack originally used Shake. Now it doesn't. There are reasons for that.

The Stack tool originally used the Shake build system, as described on the page about Stack's origins. Recently Edward Yang asked why doesn't Stack still use Shake - a very interesting question. I've taken the information shared in that mailing list thread and written it up, complete with my comments and distortions/inferences.

Stack is all about building Haskell code, in ways that obey dependencies and perform minimal rebuilds. Already in Haskell the dependency story is somewhat muddied. GHC (as available through ghc --make) does advanced dependency tracking, including header includes and custom Template Haskell dependency directives. You can also run ghc in single-shot mode, compiling a file at a time, but the result is about 3x slower and GHC will still do some dependency tracking itself anyway. Layered on top of ghc --make is Cabal which is responsible for tracking dependencies with .cabal files, configured Cabal information and placing things into the GHC package database. Layered on top of that is Stack, which has multiple projects and needs to track information about which Stackage snapshot is active and shared build dependencies.

Shake is good at taking complex dependencies and hiding all the messy details. However, for Stack many of these messy details were the whole purpose of the project. When Michael Snoyman and Chris Done were originally writing Stack they didn't have much experience with Shake, and opted to go for simplicity and directly managing the pieces, which they viewed to be less risky.

Now that Stack is written, and works nicely, the question changes to if it is worth changing existing working code to make use of Shake. Interestingly, at the heart of Stack there is a "Shake-lite" - see Control.Concurrent.Execute. This piece could certainly be replaced by Shake, but what would the benefit be? Looking at it with my Shake implementers hat on, there are a few things that spring to mind:

This existing code is O(n^2) in lots of places. For the size of Stack projects, compared to the time required to compile Haskell, that probably doesn't matter.
Shake persists the dependencies, but the Stack code does not seem to. Would that be useful? Or is the information already persisted elsewhere? Would Shake persisting the information make stack builds which had nothing to do go faster? (The answer is almost certainly yes.)
Since the code is only used on one project it probably isn't as well tested as Shake, which has a lot of tests. On the other hand, it has a lot less features, so a lot less scope for bugs.
The code makes a lot of assumptions about the information fed to it. Shake doesn't make such assumptions, and thus invalid input is less likely to fail silently.
Shake has a lot of advanced dependency forms such as resources. Stack currently blocks when simultaneous configures are tried, whereas Shake would schedule other tasks to run.
Shake has features such as profiling that are not worth creating for a single project, but that when bundled in the library can be a useful free feature.

In some ways Stack as it stands avoids a lot of the best selling points about Shake:

If you have lots of complex interdependencies, Shake lets you manage
them nicely. That's not really the case for Stack, but is in large
heterogeneous build systems, e.g. the GHC build system.
If you are writing things quickly, Shake lets you manage
exceptions/retries/robustness quickly. For a project which has the
effort invested that Stack does, that's less important, but for things
like MinGHC (something Stack killed), it was critically important because no one cared enough to do all this nasty engineering.
If you are experimenting, Shake provides a lot of pieces (resources,
parallelism, storage) that help explore the problem space without
having to do lots of work at each iteration. That might mean Shake is
more of a benefit at the start of a project than in a mature project.

If you are writing a version of Stack from scratch, I'd certainly recommend thinking about using Shake. I suspect it probably does make sense for Stack to switch to Shake eventually, to simplify ongoing maintenance, but there's no real hurry.

Thursday, March 03, 2016

Compiling GHC on Windows

Summary: How to compile GHC on Windows using Stack and the new Shake-based GHC build system.

Regularly updated instructions are at https://github.com/snowleopard/hadrian/blob/master/doc/windows.md - use them.

Here are a list of instructions to compile GHC, from source, on Windows. I tested these instructions on a clean machine using the free Windows 10 VirtualBox image (I bumped the VM CPUs to 4, and RAM to 4096Mb).

The first step is to install Stack (I just accepted all the defaults), then open a command prompt and run:

stack setup
stack install happy alex
stack exec -- pacman -S gcc binutils git automake-wrapper tar make patch autoconf --noconfirm
stack exec -- git clone --recursive git://git.haskell.org/ghc.git
cd ghc
stack exec -- git clone git://github.com/snowleopard/hadrian
stack build --stack-yaml=hadrian/stack.yaml --only-dependencies
stack exec --stack-yaml=hadrian/stack.yaml -- hadrian/build.bat -j

The entire process (after the VM has downloaded) takes a bit less than an hour. These steps use the Stack supplied tools (MinGW, Git), and the new Shake-based build system. The hope is that by using the isolation Stack provides, combined with the portability improvements from writing the build system in Haskell, these instructions will work robustly on many Windows machines.

I have not tried these instructions on other platforms, but suspect that removing the pacman line might be sufficient to get it to work.

Update: Instructions simplified following improvements to the build system.

Neil Mitchell's Blog (Haskell etc)