
Polyglot
========

Description
-----------

Although **cdist** itself is written in **Python**, it features a
*language-agnostic* (and hence *polyglot*) extension system.

As such, **cdist** can be extended with a mix-and-match of
**any scripting language** in addition to the usual (and recommended)
**POSIX shell** (`sh`): `bash`, `perl`, `python`, `ruby`, `node`, ... whatever.

This is true for all extension mechanisms available for **cdist**, namely:

.. list-table::

    * - :doc:`manifests <cdist-manifest>`
      - (including :ref:`manifest/init <cdist-manifest#initial-and-type-manifests>`
        and :ref:`type manifests <cdist-type#manifest>`)

    * - :doc:`explorers <cdist-explorer>`
      - (both **global** and :ref:`type explorers <cdist-type#explorers>`)

    * - :ref:`gencode-* scripts <cdist-type#gencode-scripts>`
      - (both :program:`gencode-local` and :program:`gencode-remote`)

    * - and even :ref:`generated code <cdist-type#gencode-scripts>`
      - (i.e. the outputs from
        :ref:`gencode-* scripts <cdist-type#gencode-scripts>`)


.. raw:: html

    <details>
    <summary>
        <a>You do not have to commit to any single language...</a>
    </summary>

.. container::

    .. note::

        It's indeed possible (though not necessarily recommended)
        to **mix-and-match** different
        languages when extending **cdist**, for example:

        A **type** could, in principle, have a `manifest` and an **explorer** written
        in **POSIX shell**, a `gencode-remote` in **Python**
        (which could generate code in **POSIX shell**) and a `gencode-local`
        in **Perl** (which could generate code in **Perl**,
        or some other language), while you are at it...

        Just don't expect to submit such a hodge-podge as a candidate for being
        distributed with **cdist** itself, though... :-)
        especially if it turns out to be something that can be achieved with
        reasonable effort in **POSIX shell**.

        In practice, you would at least want to enforce some consistency, if only for
        code maintainability and your own sanity, in addition to
        the `CAVEATS`_ mentioned down below.

.. raw:: html

    </details>
    <br/>

Needless to say, just because you *can* do something,
doesn't mean you *should* be doing it, or it's even a *good idea* to do so.

As a general rule of thumb, when extending **cdist**,
there are many good reasons in favor of sticking with the **POSIX shell**
wherever you can, and very few in favor of opting for some other
scripting language.

This is particularly true for any code that is meant to be run *remotely*
on **target hosts** (such as **explorers**),
where it is usually important to keep assumptions and requirements/dependencies
to a bare minimum. See the `CAVEATS`_ down below.

That being said, the **polyglot** capabilities of **cdist** can come
in quite handy when you really need this sort of thing,
provided that you are ready to bear the consequences,
including the burden of extra dependencies,
which is usually not that hard to handle for code run *locally* on the **master**
(`manifests`, `gencode-*` scripts, and code generated by `gencode-local`).

In any case, the mere fact of knowing we *can* escape the POSIX hatch
if we really have to, can be quite comforting for those of us suffering
from POSIX claustrophobia... which *is* of course a real health hazard
associated with high anxiety levels and all,
in case you didn't already know... ;-)


Writing polyglot extensions for **cdist**
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Whatever the kind of script (`manifest`, explorer, ...) you are writing,
you need to ensure that all 3 conditions below are met:

1.  your script starts with an appropriate **shebang** line, such as::

      #!/usr/bin/env bash

    .. comment: It would have been nice to make use of an extension
        (such as `"sphinx_design"`) which provides a `.. dropdown::`
        directive (for toggling visibility) which is the reason for
        the ugly `.. raw:: html` stuff below...

    .. raw:: html

        <details>
        <summary><a>It's usually preferable to rely on the <b>env</b> program...</a></summary>

    .. container::

        It's usually preferable to rely on the :program:`env` program,
        like in the example above, to find the interpreter by searching the PATH.

        The :program:`env` program is almost guaranteed to exist, even on a rudimentary
        UNIX/Linux system, at a pretty stable location: `/usr/bin/env`.

        It is, of course, also possible to write down a **hard coded** path
        for the interpreter, if you are certain that it will always be
        located at that location, like so::

            #!/bin/bash

        This may sometimes be desirable, for example when you want to make sure
        you are using a specific version of an interpreter or when you are unsure about
        what might get found through the PATH.

    .. raw:: html

        </details>

2.  your script has "*execute*" permissions set (in the Unix/Linux sense),
    like so::

        chmod a+x /path/to/your/script

    This is essentially what matters to **cdist**, which it will take as a
    clue for invoking your script *directly* (instead of passing it
    to a shell as an argument).

    For **generated code**, `cdist` will automatically take care of setting
    *execute* permissions for you,
    based on the presence of a leading **shebang** within the generated code.

3.  the **interpreter** referenced by the **shebang** is available on any host(s)
    where your code will run.
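
The first two conditions can be checked mechanically before a script is handed
to **cdist**. The following is only a minimal sketch in **Python**, not
something shipped with **cdist**: the helper names (`read_shebang`,
`check_script`) are hypothetical, and the third condition is left out on
purpose because it depends on the host where the script will eventually run.

.. code-block:: python

    import os


    def read_shebang(path):
        """Return the words of the shebang line, or [] if there is none."""
        with open(path, "rb") as fd:
            first_line = fd.readline()
        if not first_line.startswith(b"#!"):
            return []
        # A bare "#!" with nothing after it also yields [] (treated as missing).
        return first_line[2:].decode("utf-8", "replace").split()


    def check_script(path):
        """Return a list of problems (an empty list means both checks pass)."""
        problems = []
        if not read_shebang(path):
            problems.append("missing shebang")
        if not os.access(path, os.X_OK):
            problems.append("not executable")
        return problems

Calling `check_script("/path/to/your/script")` returns an empty list when the
script has both a shebang and execute permissions for the current user.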

.. raw:: html

    <details>
    <summary>
        <a>
        Even for the <b>POSIX shell</b>,
        it is still recommended to <b>follow the same guidelines</b> outlined above.
        </a>
    </summary>

.. note::

    Even if you are just writing for the **POSIX shell**,
    it is still recommended to follow the same guidelines outlined above.

    At the very least, make sure your script has a proper **shebang**.

    -   If you have been following the usual **cdist** advice:
            you probably already have a proper **shebang** at the very beginning
            of your POSIX shell scripts.


    -   If (and *only* if) your POSIX shell script *does* contain a proper **shebang**:
            you are also encouraged to give it *"execute"* permissions,
            so that your **shebang** will actually get honored.

.. raw:: html

    </details>
    <br/>


That's pretty much it... except...

.. seealso:: The `CAVEATS`_ below.


CAVEATS
^^^^^^^^^^^^

Shebang and execute permissions
"""""""""""""""""""""""""""""""""
In general, the first two conditions above are trivial to satisfy:
Just make sure you put in a **shebang** and mark your script as *executable*.


**Beware**, however, that:

.. attention::

    -   If your script lacks `execute` permissions (regardless of any **shebang**):
            **cdist** will end up passing your script to `/bin/sh -e`
            (or to `local_shell` / `remote_shell`,
            if one is configured for the current context),
            which may or may not be what you want.

    -   If your script *does* have `execute` permissions but *lacks* a **shebang**:
            you can no longer be sure which interpreter (if any) will end up running your script.

            What is certain, on the other hand, is that there is a wide range of
            different things that could happen in such a case, depending on the OS and the chain
            of execution up to that point...

            It is possible (but not certain) that, in such a case, your script may
            end up getting fed into `/bin/sh` or the default shell
            (whatever it happens to be for the current user).

            Legend even has it that `csh` may get a chance to feed
            on your script, and then proceed to burn your barn...

            So, don't do that.


Interpreter availability
"""""""""""""""""""""""""""""""""

For the last condition (interpreter availability),
your mileage may vary for languages other than the **POSIX shell**.

- For scripts meant to be run *locally* on the **master**, things remain relatively easy:
    All you may need, if anything,
    is a one-time installation of the interpreter and any modules it needs.

    So, things should be relatively easy when it comes to :file:`manifest` and :file:`gencode-*` scripts themselves, as well as any code generated by :file:`gencode-local`.


- For scripts meant to be run *remotely* on **target hosts**, things might get quite tricky,
    depending on how likely it is
    for the desired **interpreter** to be installed by default
    on the **target system**.

    This is an important concern for :file:`explorer` scripts
    and any code generated by :file:`gencode-remote`.

    .. warning::

        Apart from the POSIX shell (`/bin/sh`), there aren't many interpreters out
        there that are likely to have a guaranteed presence on a pristine system.

        At the very least, you would have to make sure that the required interpreter
        (and any extra modules/libraries your script might depend on)
        are indeed available on those host(s)
        before your script is invoked...
        which kind of goes against the near-zero-dependency philosophy embraced
        by **cdist**.

        Depending on the target host OS, you might get lucky with
        `bash`, `perl`, or `python` being preinstalled.
        Even then, those may not necessarily be the version you expect
        or have the extra modules/libraries your script might require.

        **You have been warned.**
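
If you do go down that road, a script can at least fail early and loudly when
its own requirements are not met on the host where it runs. Below is a hedged
sketch of such a guard for a script written in **Python**; the function name
`missing_requirements` and the way problems are reported are assumptions made
for this illustration, not a **cdist** convention.

.. code-block:: python

    import importlib.util
    import sys


    def missing_requirements(min_version, modules):
        """Return descriptions of unmet requirements (empty list if none)."""
        problems = []
        # Compare (major, minor) of the running interpreter with the minimum.
        if sys.version_info[:2] < tuple(min_version):
            problems.append("python >= %d.%d" % tuple(min_version))
        for name in modules:
            # Top-level module names only; dotted names are not handled here.
            if importlib.util.find_spec(name) is None:
                problems.append("module " + name)
        return problems

A script could call this first, write any reported problems to STDERR and exit
with a non-zero status, instead of failing halfway through with a less obvious
error.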


More details
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As mentioned earlier, **cdist** itself mostly cares about the script
being marked as an *executable*, which it will take as a clue for invoking
that script *directly* (instead of passing it to a shell as an argument).

The **shebang** magic is handled by the usual process `exec` mechanisms
of the host OS (where the script is invoked) that will take over from
that point on.


Here is a simplified summary:

+-------------+---------------+------------------------------+--------------+--------------------------------------------------------+
| executable? | shebang       | invocation resembles         | interpreter  | remarks                                                |
+=============+===============+==============================+==============+========================================================+
| yes         | `#!/bin/sh`   | `/path/to/script`            | `/bin/sh`    | shebang **honored** by OS                              |
+-------------+---------------+------------------------------+--------------+--------------------------------------------------------+
| yes         | `#!/bin/bash` | `/path/to/script`            | `/bin/bash`  | shebang **honored** by OS                              |
+-------------+---------------+------------------------------+--------------+--------------------------------------------------------+
| yes         |               | `/path/to/script`            | *uncertain*  | shebang **absent**                                     |
+-------------+---------------+------------------------------+--------------+--------------------------------------------------------+
| no          | `#!/bin/sh`   | `/bin/sh -e /path/to/script` | `/bin/sh -e` | shebang **irrelevant** (as script is not "executable") |
+-------------+---------------+------------------------------+--------------+--------------------------------------------------------+
| no          | `#!/bin/bash` | `/bin/sh -e /path/to/script` | `/bin/sh -e` | shebang **irrelevant** (as script is not "executable") |
+-------------+---------------+------------------------------+--------------+--------------------------------------------------------+
| no          |               | `/bin/sh -e /path/to/script` | `/bin/sh -e` | shebang **irrelevant** (as script is not "executable") |
+-------------+---------------+------------------------------+--------------+--------------------------------------------------------+

In fact, it's a little bit more involved than the above. Remember:

- As a special case, for any **generated code** (output by `gencode-*` scripts),
  **cdist** will solely rely on the presence (or absence) of a leading **shebang**,
  and set the executable bits accordingly, for obvious reasons.

- In the end, if a script is NOT marked as "executable",
  it will simply be passed as an argument to the configured shell
  that corresponds to the relevant context (i.e. `local_shell` or `remote_shell`),
  if one is defined within the **cdist** configuration,
  or else to `/bin/sh -e`, as a fallback in both cases.
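
The decision described by the table and the two remarks above can be modelled
in a few lines. This is merely an illustrative sketch of the documented
behaviour, not the actual **cdist** implementation: the function names are
hypothetical, and the `shell` argument stands in for whatever `local_shell` or
`remote_shell` resolves to in the relevant context.

.. code-block:: python

    def plan_invocation(script, executable, shell=("/bin/sh", "-e")):
        """Return the argument list used to run ``script``."""
        if executable:
            # Invoked directly: the OS takes care of the shebang (if any).
            return [script]
        # Not executable: the script is passed to a shell as an argument.
        return list(shell) + [script]


    def generated_code_is_executable(code):
        """Generated code counts as executable only if it starts with '#!'."""
        return code.startswith("#!")

For instance, `plan_invocation("/path/to/script", False)` yields
`["/bin/sh", "-e", "/path/to/script"]`, which corresponds to the last three
rows of the table.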

Well, there are also some gory implementation details
(related to how environment variables get propagated),
but those should normally have no relevance to this discussion.


The API between **cdist** and any polyglot extensions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Conceptually, the API, based on well-known UNIX constructs,
remains exactly the same as it is for
any extension written for the **POSIX shell**.

Basically, you are all set as long as your scripting language is capable of:

- accessing **environment variables**;
- reading from and writing to the **filesystem** (files, directories, ...);
- reading from :file:`STDIN` and writing to :file:`STDOUT` (and possibly to :file:`STDERR`);
- **executing** other programs/commands;
- **exiting** with an appropriate **status code** (where 0=>success).

As far as we know, no serious scripting language out there
is missing any of these basics.

The actual syntax and mechanisms will obviously be different,
the shell idioms usually being much more concise for this sort of thing,
as expected.
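
To make the list above concrete, here is a small **Python** sketch that
exercises each of the five capabilities once. It is only an illustration
under stated assumptions: `main` is a hypothetical function, the child
process simply re-runs the current interpreter as a stand-in for "some other
program", and the only **cdist**-specific name used is the `__global`
environment variable.

.. code-block:: python

    import os
    import subprocess
    import sys


    def main(environ, stdin, stdout, stderr):
        """Exercise each capability once; return the exit status to use."""
        # 1. Environment variables: this is how context gets handed over.
        workdir = environ.get("__global")
        if workdir is None:
            # Diagnostics go to STDERR.
            stderr.write("__global is not set\n")
            # 5. A non-zero status reports failure to the caller.
            return 1

        # 2. Filesystem: list the entries found under that directory.
        entries = sorted(os.listdir(workdir))

        # 3. STDIN is read here; STDOUT is written further down.
        payload = stdin.read()

        # 4. Executing another program (the interpreter itself, as a stand-in).
        child = subprocess.run(
            [sys.executable, "-c", "print('hello')"],
            capture_output=True, text=True, check=True,
        )

        stdout.write("entries=%s\n" % ",".join(entries))
        stdout.write("stdin_chars=%d\n" % len(payload))
        stdout.write("child=%s\n" % child.stdout.strip())
        return 0

A real script would finish with
``sys.exit(main(os.environ, sys.stdin, sys.stdout, sys.stderr))``.
Passing the streams in as arguments is just a design choice that keeps the
function easy to try out in isolation.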

See the example below entitled "`Interacting with the cdist API`_".


Examples
-------------------

Interacting with the cdist API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As an API example, here's an excerpt from a **cdist** `type manifest`,
written for the POSIX shell, showing how one would get at the name
of the kernel on the **target host**::

    kernel_name=$(cat "${__global}/explorer/kernel_name")

    # ... do something with kernel_name ...


In a nutshell, the above snippet gives the general idea about the cdist API:

Basically, we are stuffing a shell variable with the contents of a file...
which happens to contain the output from the `kernel_name` explorer...

Before invoking our `manifest` script, **cdist** would have, among other things,
run all **global explorers** on the **target host**,
collected and copied their outputs under a temporary directory on the **master**, and
set a specific environment variable (`$__global`)
to the path of a specific subdirectory of that temporary working area.

At this point, that file (which contains the kernel name) is sitting there,
ready to be slurped... which can obviously be done from any language
that can access environment variables and read files from the filesystem...

Here's how you could do the same thing in **Python**:

.. code-block:: python

    #!/usr/bin/env python

    import os

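    def read_file(path):
        """Return the file's contents without trailing newlines ("" on error)."""
        try:
            with open(path, "r") as fd:
                # Like $(...) in the shell, drop trailing newlines.
                return fd.read().rstrip('\n')
        except EnvironmentError:
            # Missing or unreadable file: fall back to an empty string.
            return ""

    # $__global is set by cdist before the manifest is invoked.
    kernel_name = read_file(os.path.join(os.environ['__global'], 'explorer', 'kernel_name'))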

    # ... do something with kernel_name ...


And in **Perl**, it could look like:

.. code-block:: perl

    #!/usr/bin/env perl

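    # Return the file's contents without trailing newlines (undef on error).
    sub read_file {
        my ($path) = @_;
        # Three-argument open: the path is never interpreted as a mode.
        open( my $fh, '<', $path ) or return;
        local $/;    # slurp mode: read the whole file in one go
        my $content = <$fh>;
        close($fh);
        # Like $(...) in the shell, drop trailing newlines.
        $content =~ s/\n+\z// if defined $content;
        return $content;
    }

    my $kernel_name = read_file("$ENV{__global}/explorer/kernel_name");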

    # ... do something with kernel_name ...


Incidentally, this example also helps appreciate some aspects of programming
for the shell... which was designed for this sort of thing in the first place...

A polyglot type explorer (in Perl)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here's an imaginary type explorer written in **Perl**
that outputs the version of the perl interpreter running on the target host:

.. code-block:: perl

    #!/usr/bin/env perl

    use English;

    print "${PERL_VERSION}\n";

If the path to the intended interpreter can be ascertained, you can
put that down directly on the **shebang**, like so::

     #!/usr/bin/perl

However, more often than not, you would want to rely
on the `env` program (`/usr/bin/env`) to
invoke the first interpreter with the given name (`perl`, in this case)
found on the current PATH, like in the above example.

Don't forget to set *execute* permissions on the script file::

    chmod a+x ...

Or else **cdist** will feed it to a shell instance...
which may burn your barn... :-)
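
The same approach carries over to `gencode-*` scripts. As a further hedged
sketch, here is what the core of a `gencode-remote` written in **Python**
might look like when it generates **POSIX shell** code with a leading
**shebang**. The `message` parameter and the `generate` helper are
hypothetical and exist only for this illustration; the sketch assumes that
type parameters are available as files under a :file:`parameter` directory
inside the object directory.

.. code-block:: python

    import os
    import shlex


    def generate(object_dir):
        """Return shell code that prints the (hypothetical) 'message' parameter."""
        path = os.path.join(object_dir, "parameter", "message")
        try:
            with open(path) as fd:
                message = fd.read().rstrip("\n")
        except OSError:
            # No such parameter: generate no code at all.
            return ""
        # The leading shebang marks the generated code as directly executable.
        # shlex.quote keeps the message intact when the shell parses it.
        return "#!/bin/sh\nprintf '%s\\n' " + shlex.quote(message) + "\n"

The script itself would then write the result to STDOUT, for example with
``sys.stdout.write(generate(os.environ["__object"]))``.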