Of course, with all of these package managers, it can be tough for an open source library author to decide which one to support. Having a single universal package manager for C++ would be very desirable, but the fact that we don’t have a universal build system makes it seem less likely that we would get a universal package manager. Also, different package managers focus on different things: direct integration in CMake, reproducible builds, a shared binary server, SAT-solving dependencies, and more.
Instead of focusing on unifying every package manager into one universal one, let’s try to create a standard for describing dependencies and installing packages. This would give a C++ author a simple format for supporting many package managers, and it would allow a package manager to focus on what makes it unique instead of converting libraries to be consumed by its own format. This comes in two parts: a package specification and a toolchain specification.
For any standardization to be successful, it should build on top of current practice. Right now there is no common way for a package to define its dependencies, so that will need some coordination among the current package managers. However, there is a common way a library (or package) is built and installed, even across different build tools, and most package managers build on top of this workflow. What C++ needs is a common format to communicate a package’s requirements among different package manager tools.
In this post, I will discuss what this may look like.
Now, a package manager’s job is to install dependencies and to decide which versions to install. It’s not the package manager’s job to provide usage requirements. First, the package manager doesn’t know the usage requirements. It can make a guess from the dependencies used, but that guess won’t be complete. Second, the build scripts do know the usage requirements, so the build script should be the one to install them, which keeps the build script and the package manager decoupled.
Now, the build script could communicate the usage requirements to the package manager (perhaps through some query step in the build), but that is not how build scripts currently do it. Currently, build scripts generate package configuration files that are consumed by downstream build scripts. There are two formats for these package configuration files: CMake’s and pkgconfig. Since usage requirements are beyond the scope of a package manager, I won’t be discussing what these should look like, but an obvious solution would seem to be the build-independent pkgconfig files.
A package specification would be a file that would describe details about the package. This would include such fields as:
- Possible build scheme, describing which build system to use to build the package. This may be better than relying on the package manager deducing the build system, which sometimes can be ambiguous.
- A list of requirements to run the package, including version constraints, and possibly a way to specify which requirements are needed only for building or testing.
- A mapping from a package name in the requirements to a URL (or possibly a URL to a package that has these mappings). This is useful for package managers that may not have a package index, or that want reproducible builds.
- A list of packages that this would conflict with.
- A list of packages that this would replace.
Ideally, this would be stored in the library itself, perhaps at the top-level, and package managers could index these files for quick access.
Also, not all libraries may provide this package file, so there needs to be a way to provide this package file non-intrusively. In this case, it would make sense to have additional fields:
- A URL to download the package.
- A possible build script to be used, which can be used when a build script is not provided or the original build script is not sufficient.
In particular, the non-intrusive format can define a standard way to build a package that can be used across all package managers. Currently, every package manager has its own format, and each package manager re-implements its own package “recipes”.
One possible format for this information could be similar to pkgconfig, but with different keywords. It could even include the variable definitions and substitutions used in pkgconfig files. It would need to be extended to support conditions, which are useful for optional dependencies. An example of such a package file could look like this:
    Name: foo
    Description: A foo library
    Version: 1.0
    Requires: zlib > 1.5
    Dependencies: zlib = http://zlib.net/zlib-1.2.11.tar.gz
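To make the idea concrete, here is a minimal sketch of how a package manager might read such a file. Everything beyond the field names in the sample is an assumption made for illustration: the `Package` and `Requirement` types, the rule that `Requires` entries are comma-separated, the rule that each `Dependencies` line holds one `name = URL` mapping, and the purely numeric version comparison.

```python
import operator
import re
from dataclasses import dataclass, field

# Example text using the field names from the sample package file.
PACKAGE_TEXT = """\
Name: foo
Description: A foo library
Version: 1.0
Requires: zlib > 1.5
Dependencies: zlib = http://zlib.net/zlib-1.2.11.tar.gz
"""

@dataclass
class Requirement:            # hypothetical type, not part of any real tool
    name: str
    op: str = ""              # empty when no version constraint is given
    version: str = ""

@dataclass
class Package:                # hypothetical type, not part of any real tool
    fields: dict = field(default_factory=dict)    # Name, Version, ...
    requires: list = field(default_factory=list)  # list of Requirement
    sources: dict = field(default_factory=dict)   # package name -> URL

REQ_RE = re.compile(r"^\s*([\w.+-]+)\s*(?:(<=|>=|<|>|=)\s*([\w.+-]+))?\s*$")
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq}

def parse_package(text):
    """Parse 'Key: value' lines into a Package."""
    pkg = Package()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'Key: value', got {line!r}")
        key, value = key.strip(), value.strip()
        if key == "Requires":
            # Assumption: requirements are comma-separated.
            for item in filter(None, (i.strip() for i in value.split(","))):
                m = REQ_RE.match(item)
                if m is None:
                    raise ValueError(f"bad requirement {item!r}")
                pkg.requires.append(
                    Requirement(m.group(1), m.group(2) or "", m.group(3) or ""))
        elif key == "Dependencies":
            # Assumption: one 'name = URL' mapping per Dependencies line.
            name, eq, url = value.partition("=")
            if not eq:
                raise ValueError(f"expected 'name = URL', got {value!r}")
            pkg.sources[name.strip()] = url.strip()
        else:
            pkg.fields[key] = value
    return pkg

def version_key(version, width=4):
    """Turn '1.2.11' into (1, 2, 11, 0); assumes purely numeric versions."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))

def satisfies(req, candidate):
    """Check whether a candidate version meets the requirement's constraint."""
    if not req.op:
        return True
    return OPS[req.op](version_key(candidate), version_key(req.version))
```

A package manager could call `parse_package` on the file it finds at the top level of a library, then use `satisfies` to filter the candidate versions it knows about. A real specification would also need to pin down how versions are ordered, which this sketch sidesteps.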
Now, there is currently an issue of duplication between the package file and the build system, as the dependencies would need to be requested again in the build script. Hopefully, in the future, build systems could read the same package file to know which dependencies to search for. This integration becomes much more likely if the format is standardized.
Now, with the package file, a package manager knows where to get the dependencies. The next step is for the package manager to build and install the package. We could try to create a standard build script, but some builds are too complicated for a single script format to cover every build requirement out there. CMake tries to be a high-level build script that generates other build scripts, but even this is sometimes not sufficient, and authors turn to other build tools.
Rather than trying to standardize some all-encompassing build script, we can focus on standardizing how build systems are invoked, which is much simpler. The common way a build is invoked is with configure, build, and install steps. This is pretty easy to standardize. However, the key part for a package manager is communicating the build “environment”, or toolchain, to the build system. Currently every build system has a different format: CMake uses a toolchain file, Meson has a cross file and environment variables, Boost.Build has a user-config.jam file, and makefiles or autotools use a set of environment variables.
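As a rough sketch of what a standardized invocation could look like, the snippet below maps a build scheme name to the three phases of configure, build, and install. The function name `build_commands`, the `Step` type, and the scheme names are assumptions made for this example. The command lines themselves use the usual front ends of each tool (`cmake -S/-B` and `cmake --install` need a reasonably recent CMake), and paths are assumed to be absolute.

```python
import os
import subprocess
from typing import NamedTuple

class Step(NamedTuple):       # hypothetical type for this sketch
    phase: str                # "configure", "build", or "install"
    argv: list                # command line to run
    cwd: str                  # directory to run it in

def build_commands(scheme, source_dir, build_dir, prefix):
    """Return the configure/build/install steps for a build scheme."""
    if scheme == "cmake":
        return [
            Step("configure", ["cmake", "-S", source_dir, "-B", build_dir,
                               f"-DCMAKE_INSTALL_PREFIX={prefix}"], source_dir),
            Step("build", ["cmake", "--build", build_dir], source_dir),
            Step("install", ["cmake", "--install", build_dir], source_dir),
        ]
    if scheme == "meson":
        return [
            Step("configure", ["meson", "setup", build_dir, source_dir,
                               f"--prefix={prefix}"], source_dir),
            Step("build", ["meson", "compile", "-C", build_dir], source_dir),
            Step("install", ["meson", "install", "-C", build_dir], source_dir),
        ]
    if scheme == "autotools":
        # Out-of-tree build: run the source tree's configure from build_dir.
        configure = os.path.join(source_dir, "configure")
        return [
            Step("configure", [configure, f"--prefix={prefix}"], build_dir),
            Step("build", ["make"], build_dir),
            Step("install", ["make", "install"], build_dir),
        ]
    raise ValueError(f"unknown build scheme: {scheme!r}")

def run_steps(steps):
    """Run each step in order, stopping at the first failure."""
    for step in steps:
        os.makedirs(step.cwd, exist_ok=True)
        subprocess.run(step.argv, cwd=step.cwd, check=True)
```

The point of the sketch is that the three phases are the same for every build system; only the spelling of each command differs. What it leaves out is the hard part the text describes: telling each tool which compiler, flags, and search paths to use.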
So there needs to be a standardized format to describe the toolchain. This can include:
- Compiler flags
- Linker flags
- Cross compiling
- Build type such as debug or release
- Library type such as shared or static
- Include directories
- Preprocessor definitions
- Options to be used for compiling
- Options to be used when linking for shared, static, or executable
- List of paths to find dependencies
- Root paths (a.k.a. sysroots) to use when cross-compiling
This could be a simple format such as variable assignment. Furthermore, each variable should be accessible in the package file, so optional dependencies could be decided based on the toolchain.
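As an illustration of making toolchain variables accessible in the package file, the sketch below expands pkgconfig-style `${name}` references using Python's `string.Template`, which happens to use the same syntax. The `expand` and `optional_requirements` helpers, and the idea of keying optional dependencies on the `system` variable, are assumptions for this example. The package name `winsock-compat` is made up, and the post does not define a condition syntax.

```python
from string import Template

def expand(value, variables):
    """Replace ${name} references with toolchain values.

    Raises KeyError when a referenced variable is not defined.
    """
    return Template(value).substitute(variables)

def optional_requirements(by_system, variables):
    """Pick extra requirements for the toolchain's target system.

    `by_system` maps a system name to a list of requirement strings;
    this stands in for whatever condition syntax a standard would define.
    """
    return list(by_system.get(variables.get("system", ""), []))

# Hypothetical toolchain values, written as plain variable assignments.
toolchain = {"system": "windows", "root_path": "/usr/x86_64-w64-mingw32"}

lib_dir = expand("${root_path}/lib", toolchain)
extras = optional_requirements({"windows": ["winsock-compat"]}, toolchain)
```

Using plain substitution keeps the package file declarative. Anything richer, such as boolean conditions, would need to be agreed on by the package managers that adopt the format.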
This standardized toolchain can help make building and installing consistent across toolchains and build systems. It can also help ensure that build systems are mature enough to handle the build scenarios that a package manager needs.
A sample toolchain file for MinGW might look like this:
    system = windows
    cross_compile = true
    c_compiler = x86_64-w64-mingw32-gcc
    cxx_compiler = x86_64-w64-mingw32-g++
    rc_compiler = x86_64-w64-mingw32-windres
    root_path = /usr/x86_64-w64-mingw32
    emulator = wine
    install_prefix = ~/packages
    prefix_path = ~/packages
Until build systems support the toolchain file directly, it would be fairly easy to write wrappers that convert this format to each build tool’s native format. Of course, not every build tool may understand every option, so it would be helpful to warn the user when an option is not supported.
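Such a wrapper could be quite small. The sketch below converts the variable-assignment format into a CMake toolchain file plus extra configure arguments. The CMake variable names are real CMake variables, but the mapping from the proposed keys to them, the function names, and the decision to pass the install prefix on the command line rather than in the toolchain file are assumptions of this example.

```python
import os
import warnings

TOOLCHAIN_TEXT = """\
system = windows
cross_compile = true
c_compiler = x86_64-w64-mingw32-gcc
cxx_compiler = x86_64-w64-mingw32-g++
rc_compiler = x86_64-w64-mingw32-windres
root_path = /usr/x86_64-w64-mingw32
emulator = wine
install_prefix = ~/packages
prefix_path = ~/packages
"""

# Assumed mapping from the proposed keys to existing CMake variables.
CMAKE_NAMES = {
    "c_compiler": "CMAKE_C_COMPILER",
    "cxx_compiler": "CMAKE_CXX_COMPILER",
    "rc_compiler": "CMAKE_RC_COMPILER",
    "root_path": "CMAKE_FIND_ROOT_PATH",
    "emulator": "CMAKE_CROSSCOMPILING_EMULATOR",
    "prefix_path": "CMAKE_PREFIX_PATH",
}
SYSTEM_NAMES = {"windows": "Windows", "linux": "Linux", "darwin": "Darwin"}
PATH_KEYS = {"root_path", "prefix_path", "install_prefix"}

def parse_toolchain(text):
    """Parse 'key = value' lines into a dict, preserving order."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got {line!r}")
        settings[key.strip()] = value.strip()
    return settings

def to_cmake(settings):
    """Return (toolchain file text, extra configure arguments) for CMake.

    Values are assumed not to contain quotes or backslashes.
    """
    lines, args = [], []
    for key, value in settings.items():
        if key in PATH_KEYS:
            value = os.path.expanduser(value)  # CMake does not expand '~'
        if key == "system":
            name = SYSTEM_NAMES.get(value, value)
            lines.append(f'set(CMAKE_SYSTEM_NAME "{name}")')
        elif key == "cross_compile":
            # CMake infers cross-compiling once CMAKE_SYSTEM_NAME is set
            # in the toolchain file, so nothing is emitted here.
            continue
        elif key == "install_prefix":
            args.append(f"-DCMAKE_INSTALL_PREFIX={value}")
        elif key in CMAKE_NAMES:
            lines.append(f'set({CMAKE_NAMES[key]} "{value}")')
        else:
            warnings.warn(f"toolchain option {key!r} is not supported by this CMake wrapper")
    return "\n".join(lines) + "\n", args
```

A package manager would write the returned text to a file, pass it to CMake with `-DCMAKE_TOOLCHAIN_FILE=<path>`, and append the extra arguments to the configure step. Equivalent wrappers for Meson cross files or autotools environment variables would follow the same shape with a different mapping table.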
As the author of a package manager (i.e., cget), I have my own format to describe a package’s dependencies, but having a standardized C++ package specification will help collaboration and interoperability across different build and package tools. Furthermore, trying to build a universal build tool and package manager (as build2 does) is very much an uphill battle, and adoption will be very slow. Instead, we should focus on standardizing existing practice as much as possible so users will not have to rewrite their build scripts.